Transcript Slide 1
Getting Connected: Social Science in the Age of Networks
2006-07 ISS “Theme Project”

The “Thememates”
• John Abowd, Industrial and Labor Relations; CISER, Director
• Larry Blume, Department of Economics
• David Easley, Department of Economics
• Geri Gay, Dept of Communication, Chair
• Dan Huttenlocher, Computer Science, Johnson School
• Jon Kleinberg, Department of Computer Science
• Michael Macy, Department of Sociology, Chair
• Kathleen O'Connor, Johnson Graduate School of Management
• Jeffrey Prince, Department of Applied Economics and Management
• David Strang, Department of Sociology

Events Planned for Next Year
• Colloquia by visiting scholars
• Year-long discussion group
• Two week-long workshops
  – Leading researchers from outside Cornell
  – Cornell faculty and students in the social and information sciences
  – Interested ISS affiliates across all disciplines

Search and Diffusion on Networks
• Nov. 8-11, 2006, 423 ILR Conference Center
• The workshop will focus on the transmission of information, beliefs, technologies, and behavior. Thematic issues include the identification and design of network structures that enhance search and diffusion, challenges in network measurement and inference, the interactions between network topologies and the material that passes over them, and variation in mechanisms across empirical domains.

Self-Organizing Online Communities
• March 28-31, 2007, 423 ILR Conference Center
• An international workshop, co-sponsored by Microsoft Research, to examine the new research opportunities for studying social interactions that leave a digital trace, with a special focus on the dynamics of self-organization in cyberspace. The rapid growth of news groups, blogs, and wikis opens up unprecedented possibilities to observe the recruitment process in real time, including the network structures through which it is mediated. These communities also represent research sites for the study of on-line governance and the emergence of norms and institutions.

Outline
• What makes this “the age of networks”?
• What can network tools reveal about social life?
• What is “new” in the “New Science of Networks”?
• Overview of research by “theme project” members

Why “Age of Networks”?
• Changes in society
  – emergence of an information economy
  – cheap computers linked in a broadband network
  – opportunities for evolutionary self-organization through p2p production:
    • Blogs, newsgroups
    • Open-source software
    • Wikipedia
    • Self-policing communities (on-line reputation systems)
    • Status hierarchies (search engines)

Changes in Social Science
• We know lots about
  – Individuals (surveys)
  – Aggregated as groups and populations
• Far less is known about the interactions among individuals
  – Structure of interaction, especially in large groups
  – How and why network structure changes
  – The content of interaction: influence, imitation, exchange, association.

Why Networks Matter
• Marx: classes are defined by categorizing together those individuals with a similar relationship to the means of production
• But are these individuals tied to one another?
  – Do they interact?
  – What are the patterns of interaction?
  – Do classes differ in density, clustering, hierarchy, degree distribution?

The Network Challenge
• Very hard to observe relations. You can interview friends, but you cannot interview a friendship.
  – fleeting
  – hard to observe
  – tedious to record
• It is much easier to
  – measure and aggregate attributes of individuals.
  – model social life as correlations among individual attributes (e.g. age, race, gender, education, income)

Why This is Changing
• Computer-mediated interaction leaves a digital trace.
  – Web pages, email, blogs, news groups, wikis
  – Automatic data collection with many millions of nodes at multiple time points
• Inexpensive computers with the storage and processing power to analyze and visualize network data.

So What Exactly is a Network?
Nodes (vertices)    Relations (edges, arcs)
                    Affiliations      Interactions
Individuals         Protest march     Neighbors
Webpages            News events       Page links
Computers           Applications      Ethernet
Actors              Movies            Sex & violence
Cities              Airports          Highways
Nouns               Sentences         Verbs

Interlock and Affiliation Networks
• Bipartite Graph (members and groups)
• Group Interlock Network (common members)
• Member Network (shared affiliations)

Structure of Relations
• Most social networks are neither random … nor regular … but complex …
Source: James Moody, 2000

Properties of Nodes
• Fixed states (e.g. demographics)
• Variable states (beliefs, opinions, etc.)
• Activation thresholds (critical fraction of neighbors in a particular state)
• Network location (centrality)

Centrality
• Distinguishes “insiders” from “outsiders,” or the impact of removing a node.
• Degree: number of ties (to, from) a node.
• Closeness
  – Takes into account not only node i’s degree but also the degree of i’s neighbors.
  – The reciprocal of the sum of geodesic distances between a node and all other nodes.
• Betweenness: the number of geodesics that pass through a node.

Properties of Edges
• Symmetric, undirected (e.g. marriage)
• Asymmetric, directed (e.g. employment)
• Strength, value (e.g. frequency of interaction)
• Valence (positive or negative)
• Contraction (next shortest path length)

Properties of Networks
• Connectivity (integration) and clustering (differentiation)
• Connectivity: how easy is it to get from one node to another?
  – A graph is connected if every node is reachable from every other node, that is, if there is a chain of contact between every pair of nodes.
  – Geodesic: Given that one node is reachable from the other, what is the shortest path between them?
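
The definitions above (degree, geodesic, connectedness, closeness) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the presenters' code: the toy graph and all function names are made up for this example, ties are treated as undirected and unweighted, and closeness is computed as the reciprocal of the summed geodesic distances, which only makes sense when the graph is connected.

```python
from collections import deque

def geodesic_lengths(graph, source):
    """Breadth-first search: geodesic (shortest path) length from
    `source` to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def is_connected(graph):
    """A graph is connected if every node is reachable from any one node
    (sufficient for undirected graphs)."""
    if not graph:
        return True
    start = next(iter(graph))
    return len(geodesic_lengths(graph, start)) == len(graph)

def degree(graph, node):
    """Number of ties attached to a node."""
    return len(graph[node])

def closeness(graph, node):
    """Reciprocal of the sum of geodesic distances to all other nodes.
    Assumes a connected graph; returns 0.0 for an isolated node."""
    total = sum(geodesic_lengths(graph, node).values())
    return 1 / total if total else 0.0

# Hypothetical toy network: a chain a-b-c-d with e attached to b.
toy = {
    "a": ["b"],
    "b": ["a", "c", "e"],
    "c": ["b", "d"],
    "d": ["c"],
    "e": ["b"],
}

if __name__ == "__main__":
    print(geodesic_lengths(toy, "a"))
    print(is_connected(toy), degree(toy, "b"), closeness(toy, "b"))
```

In this toy graph node b has the highest degree and the highest closeness, which matches the intuition that it is the "insider". Betweenness needs a count of all geodesics through each node and is left out to keep the sketch short.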

Degrees of Separation
Actor “a” is:
• 1 step from 4
• 2 steps from 5
• 3 steps from 4
• 4 steps from 3
• 5 steps from 1

Measures of Connectivity
– Mean geodesic: the average path length over all pairs of connected nodes.
– Redundancy: How many different paths connect each pair?
– Density: the number of ties divided by the number possible.
– Completeness: Density=1.

Clustering
• Cluster: density within > density between
• Clique: a cluster that is complete.
• Clustering coefficient: the likelihood that two neighbors of a node are neighbors themselves.
• Cohesion: takes into account the weights of valued edges (e.g. frequency of interaction).

“Old School” Network Analysis
• Structuralism
• Interest in patterning of ties,
  – Not what goes on in the ties
  – Ties might be affiliations, not interactions
• Motivations and intentions are unimportant.
  – What matters are constraints and opportunities.
  – White: movement of ministers in the Episcopal Church depends on vacancy chains, not personal preferences.

“Old School” Network Analysis
• Networks usually small (data constraint)
• Ties are often binary (0,1), positive, static
• Focus remains on aggregation into classes or blocks
  – But based on structural equivalence
  – Not attributes of the nodes
• Interest in network determinants of class, status, and power

Equivalence Classes
(Figure: nodes i1, i2, j1, j2, j3, j4)

Network Power and Dependence
(Figure labels: high power, low power, low power)

Network Power and Dependence
(Figure labels: equal power, equal power, equal power)

E-mail Exchanges in the Reagan White House
Source: James Moody, 2000, from Tom Blanton (National Security Archive), 1995.

Mark Lombardi’s Network Art
George W. Bush, Harken Energy and Jackson Stephens c. 1979-90, 5th Version, 1999

What is “New” in the “New Science of Networks?”
• Relations can be valued, including negative
• Nodes can alter their relations by choosing partners.
• Structures shape behaviors by changing the costs and benefits of actions.
• Greater interest in
  – what goes on in the ties (processes of interaction)
  – distribution of thresholds (“The Tipping Point”)
  – propagation of contagions (“Small Worlds”)
  – global search via local information

The Puzzle of Six Degrees
• The path length between any two randomly chosen people on the planet (N=6.5 billion) is about six.
  – Frigyes Karinthy, Chains, 1929
  – A knows B who knows C who knows D who knows E who knows F
• Easy to explain if social ties were highly random
• But in fact most social networks are not random
  – If A is friends with B and C, then B is likely to be friends with C
  – Triad closure generates highly clustered networks
• How then is it possible that there are only six degrees of separation among billions (and billions) of people?

Answer: a few long ties
• A few bridge ties between otherwise distant nodes
  – Create “shortcuts” across the graph
  – While preserving the clustering of a “small world.”

Current Research with Damon Centola
• Do network structures that increase exposure to information also increase the willingness to act on it?
• Having information is not the same thing as acting on it.
  – Credibility, legitimacy, effectiveness tend to increase with the number of prior adopters.
  – Changing behavior requires social reinforcement from multiple sources.

Maybe It’s Not Such a Small World After All?
• Paradoxically, “short cuts” facilitate growing awareness of new ideas but impede the spread of adoptions.
  – A single “distant” source is insufficient to trigger adoption
  – Social reinforcement requires local clustering, which is disrupted by randomization.

John Abowd
• Dynamic bipartite networks, based on quarterly snapshots of employers (physical location, industry, employment, payroll, sales, etc.) and employees (residential address, sex, age, race, ethnicity, etc.), where the edge is defined by earnings. He is primarily interested in developing models that show how the network structure affects earnings outcomes.

Geri Gay
• Studying students' interaction patterns when collaborating through wireless computer networking tools. Findings show that prestige strongly affects the likelihood and the extent to which information posted in the CSCL environment is shared by peers. Geographical distribution and group assignment also influence the formation and persistence of both task-related and friendship ties.

Jon Kleinberg
• Modeling how the underlying network of market participants affects prices (with David Easley, Larry Blume, and Eva Tardos).
• How communities form and grow in on-line social networks like LiveJournal.
• Small-world properties of networks, how social networks are embedded in physical space, and how people are able to find short chains of connections in these networks using only local knowledge.

Kathleen O'Connor
• Experiments to see how people’s self-perceptions as independent of vs. interdependent with others affect their awareness of local network structure. “Independents’” network perceptions are less detailed, less nuanced, less elaborate.

Jeff Prince
• Focuses on the diffusion of high technology, especially personal computers and the use of the internet, with an emphasis on population heterogeneity, inter-temporal decision-making, and differences between early and late adopters.

David Strang
• Examining reference groups and network-oriented imitation and learning in the corporate community.
• Study of network contagion in the diffusion of innovations (participatory improvement teams, municipal resolutions, and relations with consultants).

“Cyberspace: The Movie”
• (Well, maybe only a slide show for now…)
• Based on snapshots of the Web taken every two months for nearly ten years by the Internet Archive.
• The “Cybortools* Team”
  – William Arms, Geri Gay, Dan Huttenlocher, Jon Kleinberg, Michael Macy, David Strang
  – Several dozen graduate students and post-docs in social, information, and computer science.
*Cybertools is copyrighted by Cybertools, Inc.

The Internet Archive
• Launched (and financed) by Brewster Kahle
• Complete crawls of the Web, every two months since 1996, with some gaps:
  – About 600 TByte (compressed)
  – Rate of increase is about 1 TByte/day (compressed)
  – No data from sites that are protected by robots.txt or where owners have requested not to be archived
  – Metadata contains format, links, anchor text, file types
  – Organized to facilitate historical access to a known URL (Wayback Machine)

(Cornell’s webpage on June 5, 1997)
(Cornell’s webpage on April 1, 2005)
(Cornell’s webpage this week)

The Way Forward Machine
• The Wayback Machine is useful for tracking a single URL.
• Not designed for network analysis.
• What if somebody put the Internet Archive into a relational database?
• Unprecedented opportunities for analysis of social and information networks.

From Archive to Database
• We are copying the data to Cornell over Internet2 at 300-500 GByte per day
• Hope to have about one-third transferred by the end of the two-year grant.
• Storing the data in a relational format that allows users to download small subsets based on user-specified criteria (e.g. time frame, keywords)
• Anticipate NLP and machine learning tools to train algorithms for intelligent search of page content.
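
To make the "relational format" idea concrete, here is a minimal sketch of how crawl metadata might be stored and then subset by time frame and keyword. Everything in it is an assumption made for illustration: the table and column names, the example.org URLs, and the use of SQLite are not taken from the project, which does not describe its schema here.

```python
import sqlite3

def build_demo_db():
    """Create an in-memory database with a hypothetical two-table layout:
    one row per crawled page snapshot, one row per outgoing link."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE pages (
            page_id    INTEGER PRIMARY KEY,
            url        TEXT NOT NULL,
            crawl_date TEXT NOT NULL,   -- ISO date, so text comparison sorts correctly
            file_type  TEXT
        );
        CREATE TABLE links (
            src_id      INTEGER NOT NULL REFERENCES pages(page_id),
            dst_url     TEXT NOT NULL,
            anchor_text TEXT
        );
    """)
    # Made-up rows for illustration only.
    conn.executemany("INSERT INTO pages VALUES (?, ?, ?, ?)", [
        (1, "http://a.example.org/", "1997-06-05", "text/html"),
        (2, "http://a.example.org/", "2005-04-01", "text/html"),
        (3, "http://b.example.org/", "2005-04-01", "text/html"),
    ])
    conn.executemany("INSERT INTO links VALUES (?, ?, ?)", [
        (1, "http://b.example.org/", "network research"),
        (2, "http://b.example.org/", "network research"),
        (2, "http://c.example.org/", "campus map"),
        (3, "http://a.example.org/", "social network study"),
    ])
    return conn

def select_links(conn, start, end, keyword):
    """Return (source URL, target URL, crawl date) for links crawled in
    [start, end] whose anchor text contains `keyword`."""
    cursor = conn.execute(
        """
        SELECT p.url, l.dst_url, p.crawl_date
        FROM links AS l
        JOIN pages AS p ON p.page_id = l.src_id
        WHERE p.crawl_date BETWEEN ? AND ?
          AND l.anchor_text LIKE ?
        ORDER BY p.crawl_date, p.url, l.dst_url
        """,
        (start, end, f"%{keyword}%"),
    )
    return cursor.fetchall()

if __name__ == "__main__":
    db = build_demo_db()
    for row in select_links(db, "2005-01-01", "2005-12-31", "network"):
        print(row)
```

The result of such a query is already an edge list with a time stamp, which is the form network-analysis tools expect. A real archive at this scale would need a very different storage engine; the sketch only shows the shape of the query.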

Possibilities for Web-based Research
• Diffusion of innovation:
  – What is the probability an individual will adopt a new behavior, as a function of the number of his/her friends who are adopters?
  – New behavior could be: adopting a new technology, joining a social movement, believing a rumor, joining an on-line community.
  – Most standard models predict S-shaped probability curves.

Probability of Joining When K Friends Are Already Members
Source: Backstrom, Huttenlocher, Kleinberg & Lan, 2006

Finding the Dogs that Don’t Bark
• Diffusion research tends to focus on innovations that spread successfully.
• What about those that
  – Were nipped in the bud?
  – Spread widely and then crashed?
• Run the tape back to the start and compare the “take-off” trajectories and early link structures of winners and losers.

Network Dynamics
• Is there a “Matthew Effect” in changes over time in the distribution of degree?
• Does degree homophily emerge over time?
• What is the probability page i will link to page j, as a function of
  – the number of pages linked to both i and j?
  – the network structure of these pages?

A Parting Thought to Take Home
• Our immediate goal is to advance knowledge of networks by promoting collaborations across disciplines and institutions.
• The hidden agenda: Cornell is poised to become a leading center for computational social science by linking the talents, tools, and expertise of social, computer and information scientists.
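
As a footnote to the diffusion question under “Possibilities for Web-based Research,” the probability of adopting as a function of the number of friends who have already adopted can be estimated with a simple tally. This is a minimal sketch, not the method of the cited study: the function name and the toy observations are hypothetical, and each observation is assumed to be a pair (number of adopting friends, whether the person then adopted).

```python
from collections import defaultdict

def adoption_curve(observations):
    """Estimate P(adopt | k friends already adopted) as the share of
    people observed with k adopting friends who went on to adopt."""
    exposed = defaultdict(int)   # people observed at each k
    adopted = defaultdict(int)   # of those, how many adopted
    for k, did_adopt in observations:
        exposed[k] += 1
        if did_adopt:
            adopted[k] += 1
    return {k: adopted[k] / exposed[k] for k in sorted(exposed)}

# Made-up observations for illustration: (k, adopted?)
toy_observations = [
    (0, False), (0, False),
    (1, False), (1, True),
    (2, True), (2, True), (2, False),
    (3, True),
]

if __name__ == "__main__":
    for k, p in adoption_curve(toy_observations).items():
        print(k, round(p, 2))
```

Plotting the resulting values against k gives the kind of curve shown on the “Probability of Joining” slide; whether it comes out S-shaped is an empirical question the tally does not prejudge.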