NSF CDI meeting - San Diego State University

MAPPING CYBERSPACE TO REALSPACE

Dipak Gupta (Political Science) Brian Spitzberg (Communication) Ming-Hsiang Tsou (Geography) Li An (Geography) Jean Mark Gawron (Linguistics)

San Diego State University

Funded by the NSF Cyber-Enabled Discovery and Innovation (CDI) program, award #1028177 (4 years, 2010-2014).
http://mappingideas.sdsu.edu/

PI: Dr. Ming-Hsiang (Ming) Tsou (Geography) [email protected]

Co-PI: Dr. Dipak K Gupta (Political Science), [email protected]

Co-PI: Dr. Jean Mark Gawron (Linguistics), [email protected]

Co-PI: Dr. Brian Spitzberg (Communication) [email protected]

Senior Personnel: Dr. Li An (Geography) [email protected]

Graduate Research Assistants: Ick Hoi (Rick) Kim, Sarah Wandersee, Sri Tulasi Peddola, Kellen Stephens, Jennifer Smith, Amit Nagesh, Vickie Mellos, & Ting-Hwan Lee

REALSPACE VS. CYBERSPACE

• The spread of ideas in the age of the Internet is a double-edged sword: it can enhance our collective welfare, but it can also produce forces that destabilize the world.
• This project aims to understand the process by which the impact of a single event or idea diffuses throughout the world over time and space.

IDEAS AND VIOLENT ACTIONS

The world has seen four “waves” of violent actions, each energized by a core idea:
• 1880 – 1920: The Anarchist movement
• 1920 – 1960: Anti-colonial movements
• 1960 – 1990: New Left movements
• 1990 – ????: Religious fundamentalism

STEPS TOWARD MAPPING IDEAS

• Identify exemplars: potentially significant event episodes (e.g., Jihadi terrorism, hate group/militia activities, natural disasters, disease outbreaks, etc.).
• Develop a semantic map: identify words and phrases that characterize relevant sites. Computational linguistics becomes critical at this point.
• Collect web data on how these phrases spread over time and space. The data are converted to an Excel file listing the relevant websites with their geolocations and times.

STEPS TOWARD MAPPING IDEAS

• Spatio-temporal analyses: statistical analyses and interpretation seek reasons for the particular trajectories along which an idea spreads (i.e., identify factors that account for diffusion “susceptibility” to and “immunity” from particular concepts).
• Pattern analysis: by plotting chronological geographic paths, we test the hypothesis that the spread of ideas is not random. That is, some places are more prone than others to host these sites (and to accept and spread an idea) over time.

REPRESENTING THE PROCESS

Spatial Web Automatic Reasoning and Mapping System (SWARMS) flowchart

GEOSPATIAL MAP VISUALIZATION

WEB SEARCH ENGINE & SEMANTIC DATABASES

• Microsoft SQL Server with web-based geolocating services.
• Access to the Bing and Yahoo search engines (1000 results per search).

CONVERTING URLS TO GEOLOCATIONS

• ‘WHOIS’ databases
• Street address of the host registrant
• Latitude/longitude
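The first part of this chain, from a WHOIS response to a registrant street address, might look roughly like the sketch below. It is an illustration under stated assumptions, not the project's implementation: the `Registrant ...` field names vary between registrars, the sample record is made up, and the helper names `parse_whois` and `registrant_address` are hypothetical. The final geocoding step (address to latitude/longitude) is omitted because it depends on an external service.

```python
# Hypothetical sketch: extract the registrant's street address from raw WHOIS text.
# Field names differ between registrars; the ones listed here are assumed.

ADDRESS_FIELDS = [
    "Registrant Street",
    "Registrant City",
    "Registrant State/Province",
    "Registrant Postal Code",
    "Registrant Country",
]

def parse_whois(text):
    """Turn 'Key: Value' lines of a WHOIS response into a dictionary."""
    record = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            # Keep only the first occurrence of each key.
            record.setdefault(key.strip(), value.strip())
    return record

def registrant_address(text):
    """Join the registrant address fields into one string ready for geocoding."""
    record = parse_whois(text)
    parts = [record[field] for field in ADDRESS_FIELDS if field in record]
    return ", ".join(parts)

# Made-up sample record, for illustration only.
SAMPLE_WHOIS = """\
Domain Name: EXAMPLE.ORG
Registrant Street: 5500 Campanile Drive
Registrant City: San Diego
Registrant State/Province: CA
Registrant Postal Code: 92182
Registrant Country: US
"""

print(registrant_address(SAMPLE_WHOIS))
```

The resulting address string would then be passed to a geocoding service to obtain the latitude/longitude used for mapping.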

EXEMPLAR

• “White Power” keyword search in Yahoo (Nov. 5, 2010)

CREATING INFORMATION LANDSCAPES

The kernel point density function was performed in ArcGIS using a 3 map unit threshold (radius) and a 0.5 map unit output scale. 1 map unit ≈ 50 miles.

Search result rankings serve as the "popularity" measure, i.e., the "population" parameter in the kernel density algorithm: Population = 1001 - rank. A website ranked #1 is assigned a population of 1000 (1001 - 1).
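The rank weighting and kernel density step can be sketched in plain Python as follows. This is a rough illustration rather than the ArcGIS tool itself: the quartic kernel is an assumption about what the tool computes, the function names are hypothetical, the site coordinates are made up, and projection and edge handling are ignored.

```python
import math

def rank_to_population(rank, n_results=1000):
    """Weight from the slides: Population = 1001 - rank for 1000 results."""
    return (n_results + 1) - rank

def kernel_density(points, x, y, radius=3.0):
    """Quartic-kernel density at (x, y).

    points: iterable of (px, py, population), coordinates in map units.
    Points at or beyond `radius` contribute nothing.
    """
    total = 0.0
    for px, py, pop in points:
        d2 = (px - x) ** 2 + (py - y) ** 2
        if d2 < radius ** 2:
            total += (3.0 / math.pi) * pop * (1.0 - d2 / radius ** 2) ** 2
    return total / radius ** 2

def density_grid(points, xmin, ymin, ncols, nrows, cell=0.5, radius=3.0):
    """Evaluate the density at the centre of every output cell (row 0 = bottom)."""
    return [
        [
            kernel_density(points, xmin + (c + 0.5) * cell, ymin + (r + 0.5) * cell, radius)
            for c in range(ncols)
        ]
        for r in range(nrows)
    ]

# Made-up sites: (x, y, search rank); coordinates are in map units.
sites = [(2.0, 2.0, 1), (2.5, 2.0, 10), (8.0, 8.0, 900)]
points = [(x, y, rank_to_population(rank)) for x, y, rank in sites]

# 3 map unit radius and 0.5 map unit output cells, as on the slide.
grid = density_grid(points, xmin=0.0, ymin=0.0, ncols=20, nrows=20)
```

Because highly ranked pages carry larger population values, cells near the top-ranked sites end up with a much higher density than cells near low-ranked ones.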

Compare two keywords, e.g.:

“Jerry Sanders” (San Diego Mayor) vs. “Antonio Villaraigosa” (L.A. Mayor)

CREATING INFORMATION LANDSCAPES

RED: comparatively higher web page density for “Jerry”

BLUE: comparatively higher web page density for “Antonio”

Map Algebra (raster-based): Differential Value = (Keyword A / Maximum Kernel Value of Keyword A) - (Keyword B / Maximum Kernel Value of Keyword B)

Red hotspots indicate that "Jerry Sanders" is more popular than "Antonio Villaraigosa", whereas blue areas indicate that "Antonio Villaraigosa" is more popular than "Jerry Sanders". The differential information landscape map illustrates geospatial fingerprints hidden in the text-based web search results, depending on the context of the selected keywords.
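The map algebra step can be illustrated with a small cell-by-cell sketch. It assumes both keyword rasters share the same grid; the function names and the 2x3 sample values are made up for illustration.

```python
def normalize(raster):
    """Divide every cell by the raster's maximum kernel value (result in 0..1)."""
    peak = max(max(row) for row in raster)
    if peak == 0:
        return [[0.0 for _ in row] for row in raster]
    return [[value / peak for value in row] for row in raster]

def differential_landscape(raster_a, raster_b):
    """Differential Value = A / max(A) - B / max(B), computed cell by cell."""
    norm_a = normalize(raster_a)
    norm_b = normalize(raster_b)
    return [
        [a - b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(norm_a, norm_b)
    ]

# Made-up 2x3 kernel density rasters for two keywords.
keyword_a = [[0.0, 40.0, 80.0],
             [10.0, 20.0, 0.0]]
keyword_b = [[50.0, 25.0, 0.0],
             [5.0, 10.0, 0.0]]

diff = differential_landscape(keyword_a, keyword_b)
# Positive cells (red) favour keyword A; negative cells (blue) favour keyword B.
```

Dividing each raster by its own maximum puts both keywords on a 0 to 1 scale, so the difference always falls between -1 and 1 regardless of how many pages each search returns.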

SPATIAL SCALE DEPENDENCY

The following kernel density thresholds were used to detect spatial fingerprints at different map scales:
• 6-8 map units for state-level spatial fingerprints.
• 2-3 map units for county-level spatial fingerprints.
• 0.5-1 map units for city-level spatial fingerprints.
• 0.1-0.2 map units for ZIP-code-level spatial fingerprints.
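For a sense of scale, the thresholds can be converted to miles. The sketch below assumes that the conversion given with the earlier kernel settings (1 map unit is roughly 50 miles) also applies here; the dictionary and function names are hypothetical.

```python
# Assumed conversion, taken from the earlier kernel settings: 1 map unit ~ 50 miles.
MILES_PER_MAP_UNIT = 50

# (min, max) kernel radius in map units for each fingerprint level listed above.
RADIUS_BY_LEVEL = {
    "state": (6.0, 8.0),
    "county": (2.0, 3.0),
    "city": (0.5, 1.0),
    "zipcode": (0.1, 0.2),
}

def radius_in_miles(level):
    """Return the (min, max) kernel radius for a level, converted to miles."""
    low, high = RADIUS_BY_LEVEL[level]
    return (low * MILES_PER_MAP_UNIT, high * MILES_PER_MAP_UNIT)

for level in RADIUS_BY_LEVEL:
    low, high = radius_in_miles(level)
    print(f"{level}: about {low:g}-{high:g} miles")
```

Under that assumption, a state-level search radius spans roughly 300 to 400 miles, while a ZIP-code-level radius spans only about 5 to 10 miles.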

EXEMPLAR:

Global web page density map for “Osama bin Laden” (English version).

EXEMPLAR:

Top 1000 hits for “Osama bin Laden” searched in different languages:
• English: “Osama bin Laden”
• Chinese (Simplified): “萨马本拉登”
• Arabic: “أسامة بن لادن”

EXEMPLAR:

“Osama bin Laden” (Geronimo) – (minus) Background Constant

Note 1: Hotspots in San Francisco and New York.

RED: high density of web pages related to “Osama bin Laden” (compared to the average web page density in the U.S.)

BLUE: low density of web pages related to “Osama bin Laden” (compared to the average web page density in the U.S.)
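The slides do not spell out how the background constant is computed. One plausible reading, sketched below purely as an assumption, is that density rasters from a set of random keyword searches are normalized, averaged into a background surface, and subtracted from the normalized keyword raster. All function names and sample values are hypothetical.

```python
def normalize(raster):
    """Scale a raster so that its largest cell equals 1."""
    peak = max(max(row) for row in raster)
    if peak == 0:
        return [[0.0 for _ in row] for row in raster]
    return [[value / peak for value in row] for row in raster]

def mean_raster(rasters):
    """Cell-by-cell average of equally sized rasters."""
    n = len(rasters)
    rows, cols = len(rasters[0]), len(rasters[0][0])
    return [
        [sum(r[i][j] for r in rasters) / n for j in range(cols)]
        for i in range(rows)
    ]

def minus_background(keyword_raster, background_rasters):
    """Normalized keyword density minus the averaged normalized background."""
    background = mean_raster([normalize(r) for r in background_rasters])
    keyword = normalize(keyword_raster)
    return [
        [k - b for k, b in zip(k_row, b_row)]
        for k_row, b_row in zip(keyword, background)
    ]

# Made-up 2x2 rasters; a real background would average many random-keyword searches.
keyword = [[10.0, 2.0], [1.0, 1.0]]
random_keyword_rasters = [
    [[4.0, 4.0], [2.0, 0.0]],
    [[2.0, 4.0], [0.0, 2.0]],
]

result = minus_background(keyword, random_keyword_rasters)
# Positive cells (red): denser than the background; negative cells (blue): sparser.
```

Subtracting a background of this kind removes hotspots that appear simply because an area hosts many websites of every kind, leaving only the areas where the keyword is over- or under-represented.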

EXEMPLAR:

“Ayman al-Zawahiri” (Al-Qaeda No. 2) – (minus) Background Constant

Note 1: Hotspots in New York & Washington, D.C.

RED: higher density of web pages related to Zawahiri (compared to the average web page density in the U.S.).

BLUE: lower density of web pages related to Zawahiri (compared to the average web page density in the U.S.).

Different pattern: only New York and Washington, D.C. show interest. Most other areas show little interest in this keyword (person).

EXEMPLAR:

“Burn Koran” Yahoo search (1.30.11): The kernel density of “burn Koran” keyword search results and 1000 associated websites (red dots) with weighted ranks (radius: 3.0 map units, output grid: 0.5 map units).

Standardize information landscapes:
• Compare two similar keyword maps.
• Standardize by population density (U.S. maps).

EXEMPLAR:

“Burn Koran” (1.30.11): The U.S. population density map was used to standardize the popularity density. After standardization, the red hot spots indicate that San Jose, Houston, and the middle of Kansas are areas where the "burn Koran" keywords are popular. The blue areas indicate negative values (less popular). Why the hotspot in Kansas? It lies near the city of Topeka: after the original event happened at the church in Gainesville, FL (green symbol), another church in Topeka, KS claimed that it would continue the “burn Koran” action.

EXEMPLAR:

“Burn Koran” time comparison: the “burn Koran” map from 1.30.11 compared with the map from 4.3.11, immediately after the Florida Koran burning incident.

Hot spots: Saint Louis, Pittsburgh, Philadelphia – NEW trends?

RED: increased density of web pages on April 3, 2011 (compared to 1.30.11)

BLUE: decreased density of web pages on April 3, 2011 (compared to 1.30.11)

EXEMPLAR:

“Faisal Shahzad” (Times Square bomber) – Background Constant (300 random keywords)

Note 1: Hot spots for “Shahzad”: New York & Chicago.

Note 2: Why Chicago? (link to David Headley?)

EXEMPLAR:

“Faisal Shahzad” (Times Square bomber)

Querying the link between Chicago & Faisal Shahzad …

EXEMPLAR:

“Faisal Shahzad” (Times Square bomber)

GLOBAL VIEW: Keyword search on 5.6.11 for “Faisal Shahzad” – Background Constant (300 random keywords)

SUMMARY:

This innovative, multidisciplinary project has wide application in many fields from security studies to the spread of epidemics. It can also be used to track marketing of a new product.

Project Website: http://mappingideas.sdsu.edu