Transcript Slide 1

Getting Connected:
Social Science in the Age of Networks
2006-07 ISS “Theme Project”
The “Thememates”
• John Abowd, Industrial and Labor Relations; CISER, Director
• Jon Kleinberg, Department of Computer Science
• Michael Macy, Department of Sociology, Chair
• Kathleen O'Connor, Johnson Graduate School of Management
• Larry Blume, Department of Economics
• David Easley, Department of Economics
• Geri Gay, Department of Communication, Chair
• Jeffrey Prince, Department of Applied Economics and Management
• Dan Huttenlocher, Computer Science, Johnson School
• David Strang, Department of Sociology
Events Planned for Next Year
• Colloquia by visiting scholars
• Year-long discussion group
• Two week-long workshops
– Leading researchers from outside Cornell
– Cornell faculty and students in the social and
information sciences
– Interested ISS affiliates across all disciplines
Search and Diffusion on Networks
• Nov. 8-11, 2006
423 ILR Conference Center
Workshop will focus on the transmission of
information, beliefs, technologies, and behavior.
Thematic issues include the identification and
design of network structures that enhance
search and diffusion, challenges in network
measurement and inference, the interactions
between network topologies and the material
that passes over them, and variation in
mechanisms across empirical domains.
Self-Organizing Online Communities
• March 28-31, 2007
423 ILR Conference Center
• An international workshop, co-sponsored by Microsoft
Research, to examine the new research opportunities for
studying social interactions that leave a digital trace, with
a special focus on the dynamics of self-organization in
cyberspace. The rapid growth of news groups, blogs,
and wikis opens up unprecedented possibilities to
observe the recruitment process in real time, including
the network structures through which it is mediated.
These communities also represent research sites for the
study of on-line governance and the emergence of
norms and institutions.
Outline
• What makes this “the age of networks”?
• What can network tools reveal about
social life?
• What is “new” in the “New Science of
Networks”?
• Overview of research by “theme project”
members
Why “Age of Networks”?
• Changes in society
– emergence of an information economy
– cheap computers linked in a broadband network
– opportunities for evolutionary self-organization
through p2p production:
  • Blogs, newsgroups
  • Open-source software
  • Wikipedia
  • Self-policing communities (on-line reputation systems)
  • Status hierarchies (search engines)
Changes in Social Science
• We know lots about
– Individuals (surveys)
– Aggregated as groups and populations
• Far less is known about the interactions among
individuals
– Structure of interaction, especially in large groups
– How and why network structure changes
– The content of interaction: influence, imitation,
exchange, association.
Why Networks Matter
• Marx: classes are defined by categorizing
together those individuals with a similar
relationship to the means of production
• But are these individuals tied to one
another?
– Do they interact?
– What are the patterns of interaction?
– Do classes differ in density, clustering,
hierarchy, degree distribution?
The Network Challenge
• Very hard to observe relations. You can
interview friends, but you cannot interview a
friendship.
– fleeting
– hard to observe
– tedious to record
• It is much easier to
– measure and aggregate attributes of individuals.
– model social life as correlations among individual
attributes (e.g. age, race, gender, education, income)
Why This is Changing
• Computer-mediated interaction leaves a
digital trace.
– Web pages, email, blogs, news groups, wikis
– Automatic data collection with many millions
of nodes at multiple time points
• Inexpensive computers with the storage
and processing power to analyze and
visualize network data.
So What Exactly is a Network?
Nodes (vertices)   Relations (edges, arcs)
                   Affiliations      Interactions
Individuals        Protest march     Neighbors
Webpages           News events       Page links
Computers          Applications      Ethernet
Actors             Movies            Sex & violence
Cities             Airports          Highways
Nouns              Sentences         Verbs
Interlock and Affiliation Networks
[Figure: a bi-partite graph (members and groups); the group interlock network (common members); the member network (shared affiliations)]
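The three panels are related by projection: the group interlock network and the member network can both be derived from the bi-partite member-group graph. A minimal Python sketch, assuming memberships are given as a dict from each member to the set of groups it belongs to; the board names and members are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def project(memberships):
    """Project a bi-partite member-group graph onto its two one-mode networks.

    memberships: dict mapping each member to the set of groups it belongs to.
    Returns (group_interlock, member_network): dicts mapping an unordered pair
    (a frozenset) to the number of shared members or shared groups.
    """
    members_of = defaultdict(set)
    for member, groups in memberships.items():
        for g in groups:
            members_of[g].add(member)

    # Group interlock network: two groups are tied if they have common members.
    group_interlock = {}
    for g1, g2 in combinations(sorted(members_of), 2):
        shared = len(members_of[g1] & members_of[g2])
        if shared:
            group_interlock[frozenset((g1, g2))] = shared

    # Member network: two members are tied if they share affiliations.
    member_network = {}
    for m1, m2 in combinations(sorted(memberships), 2):
        shared = len(memberships[m1] & memberships[m2])
        if shared:
            member_network[frozenset((m1, m2))] = shared

    return group_interlock, member_network

# Hypothetical data: people and the boards they sit on.
memberships = {
    "ann": {"board_A", "board_B"},
    "bob": {"board_B"},
    "cy": {"board_B", "board_C"},
    "dee": {"board_C"},
}
groups, members = project(memberships)
for pair, shared in groups.items():
    print(sorted(pair), shared)
# ['board_A', 'board_B'] 1
# ['board_B', 'board_C'] 1
```

Edge values count shared members (or shared groups), so the projections keep tie strength that a plain 0/1 projection would discard.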
Structure of Relations
• Most social networks
are neither random …
nor regular
but complex …
Source: James Moody, 2000
Properties of Nodes
• Fixed states (e.g. demographics)
• Variable states (beliefs, opinions, etc.)
• Activation thresholds (critical fraction of
neighbors in a particular state)
• Network location (centrality)
Centrality
• Distinguishes “insiders” from “outsiders,” and gauges the
impact of removing a node.
• Degree: number of ties (to, from) a node.
• Closeness
– Takes into account not only node i’s direct ties but also
its indirect ties (neighbors of neighbors, and so on).
– The reciprocal of the sum of geodesic distances between a
node and all other nodes.
• Betweenness: the number of geodesics between other pairs
of nodes that pass through a node.
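A minimal sketch of the three centrality measures for an unweighted, undirected, connected graph stored as a dict of neighbor lists (the representation and function names are assumptions). Betweenness here is the common fractional version: each geodesic between a pair s, t counts as 1 divided by the number of geodesics between s and t, which equals the simple count when geodesics are unique. The brute-force pair loop is cubic in the number of nodes and only suits small examples; Brandes' algorithm is the usual faster choice.

```python
from collections import deque
from itertools import permutations

def bfs(adj, source):
    """Return (dist, sigma): geodesic distance and number of geodesics from source."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]   # every geodesic to u extends to v
    return dist, sigma

def centralities(adj):
    nodes = list(adj)
    info = {s: bfs(adj, s) for s in nodes}
    degree = {v: len(adj[v]) for v in nodes}
    # Closeness: reciprocal of the summed geodesic distances (connected graph assumed).
    closeness = {v: 1 / sum(info[v][0].values()) for v in nodes}
    betweenness = {v: 0.0 for v in nodes}
    for s, t in permutations(nodes, 2):
        dist_s, sigma_s = info[s]
        if t not in dist_s:
            continue
        for v in nodes:
            if v in (s, t):
                continue
            dist_v, sigma_v = info[v]
            # v lies on a geodesic from s to t iff d(s,v) + d(v,t) == d(s,t).
            if v in dist_s and t in dist_v and dist_s[v] + dist_v[t] == dist_s[t]:
                betweenness[v] += sigma_s[v] * sigma_v[t] / sigma_s[t]
    # Each unordered pair was visited twice (s->t and t->s).
    betweenness = {v: b / 2 for v, b in betweenness.items()}
    return degree, closeness, betweenness

# Hypothetical star: "c" sits between every pair of leaves.
star = {"c": ["a", "b", "d"], "a": ["c"], "b": ["c"], "d": ["c"]}
degree, closeness, betweenness = centralities(star)
print(degree["c"], closeness["a"], betweenness["c"])  # 3 0.2 3.0
```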
Properties of Edges
• Symmetric, undirected (e.g. marriage)
• Asymmetric, directed (e.g. employment)
• Strength, value (e.g. frequency of interaction)
• Valence (positive or negative)
• Contraction (next shortest path length)
Properties of Networks
• Connectivity (integration) and clustering
(differentiation)
• Connectivity -- how easy is it to get from one
node to another?
– A graph is connected if every node is reachable from
every other node, that is, if there is a chain of contact
between every pair of nodes.
– Geodesic: Given that one node is reachable from the
other, what is the shortest path between them?
Degrees of Separation
Actor “a” is:
1 step from 4 actors
2 steps from 5
3 steps from 4
4 steps from 3
5 steps from 1
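Connectivity, geodesics and degrees of separation all follow from breadth-first search. A minimal sketch for an undirected graph stored as a dict of neighbor lists; the toy graph is hypothetical and is not the network in the figure.

```python
from collections import Counter, deque

def undirected(edges):
    """Build a dict of neighbor lists from (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj

def geodesics_from(adj, source):
    """Breadth-first search: geodesic distance and predecessor of each reachable node."""
    dist, parent = {source: 0}, {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                parent[v] = u
                queue.append(v)
    return dist, parent

def is_connected(adj):
    """Undirected graph: connected iff every node is reachable from any one node."""
    dist, _ = geodesics_from(adj, next(iter(adj)))
    return len(dist) == len(adj)

def shortest_path(adj, a, b):
    """One geodesic from a to b, or None if b is not reachable from a."""
    dist, parent = geodesics_from(adj, a)
    if b not in dist:
        return None
    path = [b]
    while path[-1] != a:
        path.append(parent[path[-1]])
    return path[::-1]

def degrees_of_separation(adj, actor):
    """How many actors lie 1, 2, 3, ... steps from `actor`."""
    dist, _ = geodesics_from(adj, actor)
    return dict(sorted(Counter(d for d in dist.values() if d > 0).items()))

edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e"), ("e", "f")]
adj = undirected(edges)
print(is_connected(adj))                # True
print(shortest_path(adj, "a", "f"))     # ['a', 'b', 'd', 'e', 'f']
print(degrees_of_separation(adj, "a"))  # {1: 2, 2: 1, 3: 1, 4: 1}
```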
Measures of Connectivity
– Mean geodesic: the average path length over
all pairs of connected nodes.
– Redundancy: How many different paths
connect each pair?
– Density: the number of ties divided by the
number possible
– Completeness: Density=1.
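Mean geodesic and density can be sketched the same way, assuming an undirected graph in a dict of neighbor lists (the example graphs are hypothetical). Density counts ties (edges) against the n(n-1)/2 possible ties; redundancy is not computed here.

```python
from collections import deque

def distances_from(adj, source):
    """Breadth-first search distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def mean_geodesic(adj):
    """Average path length over all pairs of connected nodes (unconnected pairs skipped)."""
    nodes = list(adj)
    total = pairs = 0
    for i, u in enumerate(nodes):
        dist = distances_from(adj, u)
        for v in nodes[i + 1:]:
            if v in dist:
                total += dist[v]
                pairs += 1
    return total / pairs if pairs else 0.0

def density(adj):
    """Observed ties divided by the n(n-1)/2 possible ties (undirected)."""
    n = len(adj)
    ties = sum(len(nbrs) for nbrs in adj.values()) // 2
    return ties / (n * (n - 1) / 2) if n > 1 else 0.0

# Hypothetical graph: a triangle a-b-c plus a pendant node d attached to c.
adj = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
print(round(density(adj), 3))        # 0.667
print(round(mean_geodesic(adj), 3))  # 1.333
complete = {v: [u for u in "wxyz" if u != v] for v in "wxyz"}
print(density(complete))             # 1.0, i.e. the graph is complete
```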
Clustering
• Cluster: density within > density between
• Clique: a cluster that is complete.
• Clustering coefficient: the likelihood that
two neighbors of a node are neighbors
themselves.
• Cohesion: takes into account the weights
of valued edges (e.g. frequency of
interaction).
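A sketch of the local clustering coefficient and its network-wide average for an undirected, unweighted graph stored as neighbor sets. Nodes with fewer than two neighbors are assigned 0, which is one common convention (an assumption); cohesion, which would weight the edges, is not covered.

```python
from itertools import combinations

def clustering_coefficient(adj, node):
    """Fraction of pairs of node's neighbors that are themselves tied."""
    nbrs = list(adj[node])
    k = len(nbrs)
    if k < 2:
        return 0.0  # undefined for degree < 2; reported as 0 by convention here
    linked = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return linked / (k * (k - 1) / 2)

def average_clustering(adj):
    return sum(clustering_coefficient(adj, v) for v in adj) / len(adj)

# Hypothetical graph: triangle a-b-c plus a pendant node d attached to c.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(clustering_coefficient(adj, "a"))            # 1.0
print(round(clustering_coefficient(adj, "c"), 3))  # 0.333
print(round(average_clustering(adj), 3))           # 0.583
```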
“Old School” Network Analysis
• Structuralism
• Interest in patterning of ties,
– Not what goes on in the ties
– Ties might be affiliations, not interactions
• Motivations and intentions are unimportant.
– What matters are constraints and opportunities.
– White: movement of ministers in the Episcopal
Church depends on vacancy chains, not personal
preferences.
“Old School” Network Analysis
• Networks usually small (data constraint)
• Ties are often binary (0,1), positive, static
• Focus remains on aggregation into
classes or blocks
– But based on structural equivalence
– Not attributes of the nodes
• Interest in network determinants of class,
status, and power
Equivalence Classes
[Figure: nodes i1 and i2 and nodes j1, j2, j3, j4, illustrating equivalence classes]
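Strict structural equivalence can be checked by grouping nodes that have identical sets of neighbors. The exact ties in the original figure are not recoverable, so this sketch assumes a hypothetical pattern in which i1 and i2 are each tied to j1 through j4.

```python
from collections import defaultdict

def structural_equivalence_classes(adj):
    """Group nodes with exactly the same neighbor set (strict structural equivalence)."""
    classes = defaultdict(list)
    for node, nbrs in adj.items():
        classes[frozenset(nbrs)].append(node)
    return sorted(sorted(members) for members in classes.values())

# Hypothetical ties: each i-node is tied to every j-node, and vice versa.
i_nodes = ["i1", "i2"]
j_nodes = ["j1", "j2", "j3", "j4"]
adj = {i: set(j_nodes) for i in i_nodes}
adj.update({j: set(i_nodes) for j in j_nodes})
print(structural_equivalence_classes(adj))
# [['i1', 'i2'], ['j1', 'j2', 'j3', 'j4']]
```

The classes come from the pattern of ties alone, not from any attribute of the nodes, which is the point of blockmodeling by structural equivalence.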
Network Power and Dependence
[Figure: positions labeled high power, low power, low power]
Network Power and Dependence
[Figure: positions labeled equal power, equal power, equal power]
E-mail Exchanges in the Reagan White House
Source: James Moody, 2000, from Tom Blanton (National Security Archive), 1995.
Mark Lombardi’s Network Art
George W. Bush, Harken Energy and Jackson Stephens c. 1979-90, 5th Version, 1999
What is “New” in the “New Science of Networks”?
• Relations can be valued, including negative
• Nodes can alter their relations by choosing
partners.
• Structures shape behaviors by changing the
costs and benefits of actions.
• Greater interest in
– what goes on in the ties (processes of interaction)
– distribution of thresholds (“The Tipping Point”)
– propagation of contagions (“Small Worlds”)
– global search via local information
The Puzzle of Six Degrees
• The path length between any two randomly chosen
people on the planet (N=6.5 billion) is only about six.
– Frigyes Karinthy, Chains, 1929
– A knows B who knows C who knows D who knows E who
knows F
• Easy to explain if social ties were highly random
• But in fact most social networks are not random
– If A is friends with B and C, then B is likely to be friends with C
– Triad closure generates highly clustered networks
• How then is it possible that there are only six degrees of
separation among billions (and billions) of people?
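The "highly random" case can be checked with back-of-the-envelope arithmetic. Assume (hypothetically) that everyone has about 100 acquaintances drawn at random, so that successive circles of acquaintances do not overlap; then roughly 100^d people are reachable in d steps.

```python
def steps_to_reach(population, contacts_per_person):
    """Smallest d with contacts_per_person ** d >= population.

    Idealized random-graph assumption: d-step neighborhoods never overlap.
    """
    steps, reach = 0, 1
    while reach < population:
        reach *= contacts_per_person
        steps += 1
    return steps

print(steps_to_reach(6_500_000_000, 100))  # 5
print(steps_to_reach(6_500_000_000, 50))   # 6
```

Triad closure breaks the no-overlap assumption: many of a friend's friends are already one's own friends, so each step reaches far fewer new people than this estimate.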
Answer: a few long ties
• A few bridge ties between otherwise distant nodes
– Create “shortcuts” across the graph
– While preserving
the clustering of a
“small world.”
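The "few long ties" argument can be checked on a small example in the spirit of the Watts and Strogatz model, which rewires lattice ties at random. This deterministic sketch instead adds two hand-picked bridge ties (an assumption) to a ring lattice of 20 nodes, each tied to its two nearest neighbors on either side, and compares mean geodesic and average clustering.

```python
from collections import deque
from itertools import combinations

def ring_lattice(n, k):
    """Each node is tied to its k/2 nearest neighbors on each side of a ring."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for step in range(1, k // 2 + 1):
            u = (v + step) % n
            adj[v].add(u)
            adj[u].add(v)
    return adj

def mean_geodesic(adj):
    """Average BFS distance over all ordered pairs (graph assumed connected)."""
    total = pairs = 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

def average_clustering(adj):
    total = 0.0
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k >= 2:
            links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
            total += links / (k * (k - 1) / 2)
    return total / len(adj)

lattice = ring_lattice(20, 4)
shortcut = ring_lattice(20, 4)
for a, b in [(0, 10), (5, 15)]:  # two hypothetical long-range bridge ties
    shortcut[a].add(b)
    shortcut[b].add(a)

print(round(mean_geodesic(lattice), 2), round(average_clustering(lattice), 2))  # 2.89 0.5
print(round(mean_geodesic(shortcut), 2), round(average_clustering(shortcut), 2))
```

In this example clustering drops only from 0.50 to 0.46, while no path can get longer and the bridged pairs move from 5 steps apart to 1.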
Current Research with Damon Centola
• Do network structures that increase
exposure to information also increase the
willingness to act on it?
• Having information is not the same thing
as acting on it.
– Credibility, legitimacy, effectiveness tend to
increase with the number of prior adopters.
– Changing behavior requires social
reinforcement from multiple sources.
Maybe It’s Not Such a Small World After All?
• Paradoxically, “short cuts” facilitate
growing awareness of new ideas but
impede the spread of adoptions.
– A single “distant” source is insufficient to
trigger adoption
– Social reinforcement requires local clustering,
which is disrupted by randomization.
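The gap between awareness and adoption can be sketched with a fractional threshold model: a node adopts once at least a given fraction of its neighbors have adopted. On a clustered ring lattice, a threshold of 0.25 behaves like simple contagion (one source suffices), while 0.5 requires reinforcement from two neighbors. The lattice size, the thresholds and the synchronous update rule are assumptions chosen for illustration.

```python
def ring_lattice(n, k):
    """Each node is tied to its k/2 nearest neighbors on each side of a ring."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for step in range(1, k // 2 + 1):
            u = (v + step) % n
            adj[v].add(u)
            adj[u].add(v)
    return adj

def cascade(adj, seeds, threshold):
    """Synchronous threshold model; adoption is irreversible. Returns final adopters."""
    active = set(seeds)
    while True:
        newly = {
            v for v in adj
            if v not in active and adj[v]
            and sum(u in active for u in adj[v]) / len(adj[v]) >= threshold
        }
        if not newly:
            return active
        active |= newly

lattice = ring_lattice(20, 4)
print(len(cascade(lattice, {0}, 0.25)))    # 20: a single source is enough
print(len(cascade(lattice, {0}, 0.5)))     # 1: a single source is not enough
print(len(cascade(lattice, {0, 1}, 0.5)))  # 20: adjacent seeds reinforce each other
```

The two adjacent seeds succeed only because their neighbors overlap; randomizing ties breaks up exactly that local overlap.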
John Abowd
• Dynamic bipartite networks, based on
quarterly snapshots of employers (physical
location, industry, employment, payroll,
sales, etc.) and employees (residential
address, sex, age, race, ethnicity, etc.),
where the edge is defined by earnings. He
is primarily interested in developing
models that show how the network
structure affects earnings outcomes.
Geri Gay
• Studying students' interaction patterns when
collaborating through wireless computer
networking tools. Findings show that prestige
strongly affects the likelihood and the extent to
which information posted in the CSCL (computer-supported
collaborative learning) environment is shared by peers. Geographical
distribution and group assignment also influence
the formation and persistence of both task-related and friendship ties.
Jon Kleinberg
• Modeling how the underlying network of market
participants affects prices (with David Easley,
Larry Blume, and Eva Tardos).
• How communities form and grow in on-line
social networks like LiveJournal.
• Small-world properties of networks, how social
networks are embedded in physical space, and
how people are able to find short chains of
connections in these networks using only local
knowledge.
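Finding short chains with only local knowledge can be sketched as greedy routing: each holder forwards the message to whichever of its contacts is closest to the target. The sketch assumes a 20 by 20 grid in which every node has its lattice neighbors plus one long-range contact drawn with probability proportional to distance^-r; r = 2 is the exponent that Kleinberg's grid model identifies as allowing efficient greedy search in two dimensions. Grid size, seed and function names are hypothetical.

```python
import random

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def build_grid(n, r=2.0, seed=0):
    """n x n grid: lattice neighbors plus one long-range contact per node,
    chosen with probability proportional to manhattan distance ** -r."""
    rng = random.Random(seed)
    nodes = [(x, y) for x in range(n) for y in range(n)]
    contacts = {}
    for u in nodes:
        local = [(u[0] + dx, u[1] + dy)
                 for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                 if 0 <= u[0] + dx < n and 0 <= u[1] + dy < n]
        others = [v for v in nodes if v != u]
        weights = [manhattan(u, v) ** -r for v in others]
        long_range = rng.choices(others, weights=weights, k=1)[0]
        contacts[u] = local + [long_range]
    return contacts

def greedy_route(contacts, source, target):
    """Forward to the contact closest to the target, using only local knowledge."""
    path = [source]
    while path[-1] != target:
        current = path[-1]
        path.append(min(contacts[current], key=lambda v: manhattan(v, target)))
    return path

contacts = build_grid(20)
path = greedy_route(contacts, (0, 0), (19, 19))
print(len(path) - 1, "steps; lattice distance is", manhattan((0, 0), (19, 19)))
```

Because some lattice neighbor is always one step closer to the target, greedy routing never needs more steps than the lattice distance; long-range contacts can only shorten the route.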
Kathleen O'Connor
• Experiments to see how people’s self-perceptions as independent of vs.
interdependent with others affect their
awareness of local network structure.
The network perceptions of “independents” are
less detailed, less nuanced, less
elaborate.
Jeff Prince
• Focuses on the diffusion of high
technology, especially personal computers
and the use of the internet, with an
emphasis on population heterogeneity,
inter-temporal decision-making, and
differences between early and late
adopters.
David Strang
• Examining reference groups and network-oriented imitation and learning in the
corporate community.
• Study of network contagion in the diffusion
of innovations (participatory improvement
teams, municipal resolutions, and relations
with consultants).
“Cyberspace: The Movie”
• (Well, maybe only a slide show for now…)
• Based on snapshots of the Web taken every two
months for nearly ten years by the Internet
Archive.
• The “Cybortools* Team”
– William Arms, Geri Gay, Dan Huttenlocher, Jon
Kleinberg, Michael Macy, David Strang
– Several dozen graduate students and post-docs in
social, information, and computer science.
*Cybertools is copyrighted by Cybertools, Inc.
The Internet Archive
• Launched (and financed) by Brewster Kahle
• Complete crawls of the Web, every two months
since 1996, with some gaps:
– About 600 TByte (compressed)
– Rate of increase is about 1 TByte/day (compressed)
– No data from sites that are protected by robots.txt or
where owners have requested not to be archived
– Metadata contains format, links, anchor text, file types
– Organized to facilitate historical access to known URL
(Wayback Machine)
(Cornell’s webpage on June 5, 1997)
(Cornell’s webpage on April 1, 2005)
(Cornell’s webpage this week)
The Way Forward Machine
• The Wayback Machine is useful for
tracking a single URL.
• Not designed for network analysis.
• What if somebody put the Internet
Archive into a relational database?
• Unprecedented opportunities for analysis
of social and information networks.
From Archive to Database
• We are copying the data to Cornell over
Internet2 at 300-500 GByte per day
• Hope to have about one-third transferred by the
end of the two-year grant.
• Storing the data in a relational format that allows
users to download small subsets based on user-specified criteria (e.g. time frame, keywords)
• We anticipate using NLP and machine learning tools to
train algorithms for intelligent search of page
content.
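A relational layout turns this kind of subsetting into a single query. The slides do not describe the actual schema, so the pages and links tables, their columns and the example URLs are hypothetical; the sketch uses Python's built-in sqlite3 module with parameterized queries.

```python
import sqlite3

# Hypothetical schema: one row per crawled page snapshot and one per hyperlink.
SCHEMA = """
CREATE TABLE pages (url TEXT, crawl_date TEXT, content TEXT);
CREATE TABLE links (src_url TEXT, dst_url TEXT, crawl_date TEXT);
"""

def subset(conn, start, end, keyword):
    """Links crawled in [start, end] whose source page mentions keyword."""
    query = """
        SELECT DISTINCT l.src_url, l.dst_url, l.crawl_date
        FROM links AS l
        JOIN pages AS p
          ON p.url = l.src_url AND p.crawl_date = l.crawl_date
        WHERE l.crawl_date BETWEEN ? AND ?
          AND p.content LIKE ?
        ORDER BY l.crawl_date, l.src_url, l.dst_url
    """
    return conn.execute(query, (start, end, f"%{keyword}%")).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO pages VALUES (?, ?, ?)", [
    ("a.example", "1997-06-05", "campus news and blogs"),
    ("b.example", "1997-06-05", "weather"),
    ("a.example", "2005-04-01", "blogs everywhere"),
])
conn.executemany("INSERT INTO links VALUES (?, ?, ?)", [
    ("a.example", "b.example", "1997-06-05"),
    ("b.example", "a.example", "1997-06-05"),
    ("a.example", "b.example", "2005-04-01"),
])
print(subset(conn, "1997-01-01", "1999-12-31", "blogs"))
# [('a.example', 'b.example', '1997-06-05')]
```

Storing dates as ISO-8601 strings keeps BETWEEN comparisons correct, since such strings sort in chronological order.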
Possibilities for Web-based Research
• Diffusion of innovation:
– What is the probability an individual will adopt
a new behavior, as a function of the number
of his/her friends who are adopters?
– New behavior could be: adopting a new
technology, joining a social movement,
believing a rumor, joining an on-line
community.
– Most standard models predict S-shaped
probability curves.
Probability of Joining When K Friends Are Already Members
Source: Backstrom, Huttenlocher, Kleinberg & Lan, 2006
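A curve like this one tabulates a conditional probability: among people with k friends already in a community at one snapshot, what fraction joined by the next? A minimal sketch of that tabulation, with made-up observations (not data from the study); it is not a reproduction of the authors' exact procedure.

```python
from collections import defaultdict

def adoption_curve(observations):
    """observations: iterable of (k, adopted) pairs, where k is the number of a
    person's friends who were already members at the start of a period and
    adopted is True if the person joined by the end of it.
    Returns {k: fraction who joined}, sorted by k."""
    counts = defaultdict(lambda: [0, 0])  # k -> [joined, total]
    for k, adopted in observations:
        counts[k][0] += int(adopted)
        counts[k][1] += 1
    return {k: joined / total for k, (joined, total) in sorted(counts.items())}

# Hypothetical observations, for illustration only.
obs = [(0, False), (0, False), (0, False), (0, True),
       (1, False), (1, True),
       (2, True), (2, True), (2, False), (2, True)]
print(adoption_curve(obs))  # {0: 0.25, 1: 0.5, 2: 0.75}
```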
Finding the Dogs that Don’t Bark
• Diffusion research tends to focus on
innovations that spread successfully.
• What about those that
– Were nipped in the bud?
– Spread widely and then crashed?
• Run the tape back to the start and
compare the “take-off” trajectories and
early link structures of winners and losers.
Network Dynamics
• Is there a “Matthew Effect” in changes
over time in the distribution of degree?
• Does degree homophily emerge over
time?
• What is the probability page i will link to
page j, as a function of
– the number of pages linked to both i and j?
– the network structure of these pages?
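One simple way to make the last question concrete is to count, for every pair of pages that are not yet linked, how many pages are adjacent to both; a later snapshot then shows which pairs actually became linked. This sketch treats links as undirected and uses hypothetical page names; both are assumptions.

```python
from itertools import combinations

def common_neighbor_counts(links):
    """links: dict page -> set of pages it links to (treated as undirected).
    Returns {(i, j): number of pages adjacent to both} for each unlinked pair i < j."""
    adj = {}
    for src, dsts in links.items():
        adj.setdefault(src, set())
        for dst in dsts:
            adj[src].add(dst)
            adj.setdefault(dst, set()).add(src)
    scores = {}
    for i, j in combinations(sorted(adj), 2):
        if j not in adj[i]:
            scores[(i, j)] = len(adj[i] & adj[j])
    return scores

links = {"p1": {"hub", "p2"}, "p3": {"hub", "p2"}, "p4": {"hub"}}
print(common_neighbor_counts(links))
# {('hub', 'p2'): 2, ('p1', 'p3'): 2, ('p1', 'p4'): 1, ('p2', 'p4'): 0, ('p3', 'p4'): 1}
```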
A Parting Thought to Take Home
• Our immediate goal is to advance
knowledge of networks by promoting
collaborations across disciplines and
institutions.
• The hidden agenda: Cornell is poised to
become a leading center for computational
social science by linking the talents, tools,
and expertise of social, computer and
information scientists.