Online Social Networks
Thomas Karagiannis
Microsoft Research
How many people in the room have a
profile in an Online Social Network (OSN)?
Real life…
...and the networking community
Multicast and Anycast, Control mechanisms, WWW, Performance analysis, Routing, TCP, Tracing and Measurement, Header Processing
Network geometry and design, Inference of network properties, Multihoming and overlays, Wireless, Secure networks, Troubleshooting, Congestion control, Router design, DNS
Routing, Security, Data Center Networking, Management, Wireless, Router Primitives, Incentives, Measurement, P2P
Social networking services
• Social communities
– Bebo, MySpace, Facebook, etc.
• Content sharing
– YouTube, Flickr, MSN Soapbox, etc.
• Corporate
– LinkedIn, Plaxo, etc.
• Portals
– MSN, Yahoo 360, etc.
• Recommendation engines
– Last.fm, StumbleUpon, Digg, Me.dium, etc.
• Bookmarking/Tagging
– Del.icio.us, CiteULike, Furl, etc.
• Discussion groups
– Blogs, forums, chat, messaging, Live QnA, etc.
• Mobile social networks
– Vipera, Nokia “MOSH”, etc.
• Virtual worlds
– Second Life
Social Network Sites: History
[Boyd et al., 2007]





• SixDegrees.com: the first recognizable OSN
– Profiles and lists of friends
– Combined existing features!
– Failed: nothing to do after accepting friend requests
• OSN wave after 2001
• Friendster:
– Technical and social difficulties with scale!
– “Fakesters” diluted the community
• MySpace:
– Capitalized on Friendster’s problems
– Bands and fans
– Allowed personalization of profiles
• Facebook:
– Growth: Harvard-only => university-only => high schools & professionals => everyone
– Introduced applications (provided APIs)
Social networking services

Source: Bebo, Social Media – ‘getting your message across’
Shift in online communities

OSNs are organized around people

“Egocentric” networks

WEB: world composed of groups

OSNs: world composed of networks
What do social networks enable?
Leveraging the “community” in traditional applications
• Content/information sharing
• Search
• Information management
• Recommendations
• Advertisements
Research topics of interest
• Identification of communities and their evolution in time
– Measurement and analysis of online communities
– Social media analysis: blogs and friendship networks
• Recommendation / collaborative filtering systems
– Rating, review, reputation, and trust systems
– Expertise / interest tracking
• Information sharing and forwarding
– Search strategies in social networks
– Viral marketing strategies
• Implications on network and distributed systems design
– System design for social networks
– Mobile social networks
• Privacy and anonymity
This lecture
• Social networks
— Sociological studies & basic concepts
— Small worlds, weak ties, degrees, centralities
• Analysis and measurements of OSNs
— Structure and properties
— Impact of OSNs on traditional applications and user activity
— Information dissemination, viral marketing, privacy, tagging
Networks..
• …an interconnected system
• …a series of points or nodes interconnected by
communication paths
• … a collection of computers connected to each other
..and networks
• …relations, social structure among a set of actors (i.e., individuals)
• …nodes (which are generally individuals or organizations) that are tied
by one or more specific types of interdependency, such as values,
visions, ideas, financial exchange, friendship
Sociological studies
• How are groups of people connected?
— To what degree does every member of a given group know every
other member?
— Six degrees of separation and the small world phenomenon
• How many people do you know?
— Ego networks
• Communities and interactions
— Zachary’s karate club
• The strength of weak ties
— Bridges and structural holes
Six Degrees of Separation
• How are groups of people connected?
[Milgram 1967]
• Arbitrary “starting persons” were selected to forward a
letter to a first-name acquaintance with the final goal of
reaching an “arbitrary” target person
– Target: Stockbroker in Boston, MA.
– Starters:
• Random sample (n=100) of Boston residents
• Random sample (n=96) from all Nebraska residents
• Sample (n=100) of share-owning Nebraska residents
Six Degrees of Separation
Six hops on the average to reach the target!
• 64 / 296 reached the target
• Forwarding by exploiting the target’s address: 6.1 hops
• Forwarding by exploiting the target’s job: 4.6 hops
• Chains overlap as they converge on the target
– Only 26 individuals in the last hop
– 16 copies delivered from one person alone
• Incomplete chains
• Chances of forwarding increase with the number of intermediaries
How many people do you know?
Ego networks consist of a focal node ("ego"), the nodes to which ego is directly connected ("alters"), and the ties among them
• Acquaintances: ~5,000
• Immediate contacts: ~100-200
• Regular contacts: ~20 per week
• Confidants: ~3
[Ithiel de Sola Pool 1978]
[Freeman and Thomson 1989]
[Heran 1988]
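The ego-network definition can be sketched in a few lines. This is a minimal illustration, assuming an undirected graph stored as a dict of neighbour sets; the function name `ego_network` and the toy graph are hypothetical.

```python
def ego_network(adj, ego):
    """Return (alters, ties) for the ego network of `ego`.

    adj: dict mapping each node to the set of its neighbours (undirected).
    ties: every edge whose two endpoints are both ego or alters.
    """
    alters = set(adj[ego])
    members = alters | {ego}
    ties = set()
    for u in members:
        for v in adj[u]:
            if v in members and v != u:
                ties.add(frozenset((u, v)))
    return alters, ties


# Hypothetical toy graph, for illustration only.
adj = {
    "ann": {"bob", "cat", "dan"},
    "bob": {"ann", "cat"},
    "cat": {"ann", "bob"},
    "dan": {"ann", "eve"},
    "eve": {"dan"},
}
alters, ties = ego_network(adj, "ann")
```

Here "eve" is two hops from "ann", so neither she nor the dan-eve tie belongs to the ego network.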
Communities and interactions
• “Friendship” network between karate
club students
• During the study, a dispute arose and the
club split in two
• Split was the minimum cut!
[Zachary 1977]
Bridges and the strength of weak ties
[Granovetter 1973]
• Social relationships are of varying “strength”
– Duration, emotional intensity, intimacy, exchange of
services (backscratching)
• Strength of ties reveals different social processes
– Strong ties tend to form cliques
Bridges and the strength of weak ties
[Granovetter 1973]
• Weak ties “bridge” strongly connected components
• Weak ties enable the sharing of information
• Weak ties are related to “structural holes” [Burt 1992]
– Separation between non-redundant contacts
– Efficiency of ego’s network (i.e., social capital) inversely
proportional to the redundancy in the network
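A bridge in the graph-theoretic sense is an edge whose removal disconnects the network. The sketch below finds bridges by brute force (remove each edge, then test connectivity). It assumes a small undirected graph stored as a dict of neighbour sets; the helper names and the toy graph are hypothetical, and faster linear-time algorithms exist.

```python
from collections import deque


def is_connected(adj, skip_edge=None):
    """BFS connectivity check, optionally ignoring one undirected edge."""
    nodes = list(adj)
    if not nodes:
        return True
    skipped = set(skip_edge) if skip_edge is not None else None
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if skipped is not None and {u, v} == skipped:
                continue
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == len(nodes)


def bridges(adj):
    """Return the sorted list of edges whose removal disconnects the graph."""
    edges = {tuple(sorted((u, v))) for u in adj for v in adj[u]}
    return sorted(e for e in edges if not is_connected(adj, skip_edge=e))


# Two tightly knit triangles joined by a single tie (toy data).
adj = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
found = bridges(adj)
```

In this toy graph the only bridge is the c-d tie that links the two cliques.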
Centralities
• Centrally positioned nodes are “privileged”
– Hubs where power concentrates
• Different viewpoints:
– Degree centrality
– Closeness centrality
– Betweenness centrality
[Freeman 1979]
Degree centrality
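• Centrality according to the number of connections
– Degree: Number of direct links
• For vertex u: $C_D(u) = \deg(u) / (n - 1)$, where $n = |V|$
• For a graph G(V,E): $C_D(G) = \sum_{v \in V} [C_D(v^*) - C_D(v)] / (n - 2)$, where $v^*$ is the vertex with the highest degree centrality
– C = 1: a node dominates
– C = 0: all nodes have equal centrality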
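A minimal sketch of degree centrality and graph-level centralization. It assumes Freeman-style normalization: vertex centrality deg(u)/(n-1), and centralization equal to the summed gap to the most central vertex divided by n-2 (the value that sum takes on a star). Function names and toy graphs are hypothetical.

```python
def degree_centrality(adj):
    """Normalised degree centrality: deg(u) / (n - 1)."""
    n = len(adj)
    return {u: len(adj[u]) / (n - 1) for u in adj}


def degree_centralization(adj):
    """Summed gap to the most central vertex, divided by n - 2 (needs n >= 3)."""
    n = len(adj)
    c = degree_centrality(adj)
    c_max = max(c.values())
    return sum(c_max - cu for cu in c.values()) / (n - 2)


# A star (one dominating node) and a ring (all nodes equal), 5 nodes each.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
ring = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}

star_c = degree_centralization(star)
ring_c = degree_centralization(ring)
```

The star reaches the maximum centralization of 1 and the ring the minimum of 0, matching the two extreme cases on the slide.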
Closeness centrality
• Degree centrality only measures the number of connections
– Nodes 2, 3, 4, 1 are equivalent
• Closeness centrality refers to the closeness of a node to all other network members
– Node 1 is fewer hops away from peripheral nodes
Closeness centrality
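• Closeness is based on the mean geodesic distance (i.e., shortest-path length) from u to all other vertices
• For vertex u: $C_C(u) = (n - 1) / \sum_{v \neq u} d(u, v)$, where $d(u, v)$ is the geodesic distance and $n = |V|$
– As closeness increases, an individual’s access to information, power, prestige, etc. increases.
[Leavitt 1951, Coleman 1973, Burt 1982]
• For a graph G(V,E): $C_C(G) = \sum_{v \in V} [C_C(v^*) - C_C(v)] / [(n^2 - 3n + 2) / (2n - 3)]$, where $v^*$ is the vertex with the highest closeness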
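A minimal sketch of closeness centrality, assuming the inverse-mean-distance form (n-1 divided by the sum of geodesic distances) so that larger values mean a more central vertex. It also assumes a connected, undirected, unweighted graph; names and data are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distance from `source` to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness_centrality(adj):
    """(n - 1) / sum of geodesic distances; assumes a connected graph."""
    n = len(adj)
    result = {}
    for u in adj:
        total = sum(bfs_distances(adj, u).values())
        result[u] = (n - 1) / total
    return result


# Path graph a - b - c - d - e: the middle vertex is closest to everyone.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "e"}, "e": {"d"}}
closeness = closeness_centrality(path)
```

On the path graph, the middle vertex "c" scores highest and the endpoints lowest.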
Betweenness centrality
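• Betweenness measures the individual’s intermediary value to all members of a network
– Reflects the number of geodesics through a node
– Stricter measure of centrality
• Number of geodesics through i: $g_{jk}(i)$, the number of geodesics between j and k that pass through i, out of $g_{jk}$ geodesics between j and k
• For vertex u: $C_B(u) = \frac{2}{(n - 1)(n - 2)} \sum_{j < k;\ j, k \neq u} g_{jk}(u) / g_{jk}$
• For a graph G(V,E): $C_B(G) = \sum_{v \in V} [C_B(v^*) - C_B(v)] / (n - 1)$, where $v^*$ is the vertex with the highest betweenness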
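A minimal sketch of betweenness centrality for an undirected, unweighted graph. It uses Brandes-style accumulation (one BFS per source, then back-propagation of pair dependencies), which is an implementation choice here, not something the slides prescribe. The scores are unnormalised pair counts; dividing by (n-1)(n-2)/2 would scale them to [0, 1].

```python
from collections import deque


def betweenness_centrality(adj):
    """Unnormalised betweenness: sum over pairs of the share of geodesics via u."""
    score = {u: 0.0 for u in adj}
    for s in adj:
        # BFS from s, counting shortest paths (sigma) and recording predecessors.
        dist = {s: 0}
        sigma = {u: 0 for u in adj}
        sigma[s] = 1
        preds = {u: [] for u in adj}
        order = []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        # Back-propagate dependencies, farthest vertices first.
        delta = {u: 0.0 for u in adj}
        for w in reversed(order):
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted once from each endpoint.
    return {u: b / 2 for u, b in score.items()}


path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "e"}, "e": {"d"}}
betweenness = betweenness_centrality(path)
```

On the path a-b-c-d-e, vertex "c" lies on the geodesics of four pairs, "b" on three, and the endpoints on none.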
The meaning of centralities
• Degree centrality:
– Capacity to develop communication within a network
• Closeness and betweenness centrality:
– Capacity to control communication in a network
– Closeness less accurate
• Strong closeness or betweenness:
– Minority of actors control communications
• Centralities do not account for the volume of communication
– Flow betweenness
Measurement of Online Social Networks
[Mislove et al, IMC-2007]
• Crawls of several online social networks
– Flickr: photo sharing
– LiveJournal: blogging site
– Orkut: social networking site
– YouTube: video sharing
Measurement of Online Social Networks: Degree Distributions
[Mislove et al, IMC 2007]
Measurement of Online Social Networks: Degree Distributions
[Leskovec et al, WWW 2008]
180M nodes
1.3B edges
Corporate email social networks
and degree distributions
• Email exchanges form a social graph
– Corporate email graphs of particular interest
– Problem: What constitutes an edge?
• Studies:
– HP Labs: 430 individuals, 6 emails as a threshold, 3 months
[Adamic et al, Social Networks-2005]
– Microsoft: 150K employees, varying thresholds, 3 months
[Karagiannis et al, MSR-TR 2008]
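The "what constitutes an edge?" question can be made concrete with a thresholded email graph. This is a minimal sketch, assuming a directed edge sender -> recipient is kept when at least `threshold` emails were sent in that direction. The exact rule used in each study may differ, and the function names and log format are hypothetical.

```python
from collections import Counter


def email_graph(messages, threshold):
    """Keep a directed edge (sender, recipient) if it carries >= threshold emails.

    messages: iterable of (sender, recipient) pairs, one per email.
    """
    counts = Counter(messages)
    return {pair for pair, c in counts.items() if c >= threshold}


def degrees(edges):
    """Return (in_degree, out_degree) counters for a set of directed edges."""
    out_deg = Counter(sender for sender, _ in edges)
    in_deg = Counter(recipient for _, recipient in edges)
    return in_deg, out_deg


# Hypothetical toy log: a -> b three times, b -> a once, a -> c once.
log = [("a", "b")] * 3 + [("b", "a"), ("a", "c")]
loose = email_graph(log, threshold=1)
strict = email_graph(log, threshold=3)
in_deg, out_deg = degrees(loose)
```

Raising the threshold from 1 to 3 leaves only the a -> b edge, which shows how strongly the resulting degree distribution depends on the chosen cut-off.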
Corporate email social networks
and degree distributions
[Adamic et al, 2005]

• Distribution appears exponential!
• Structure of the graph directly affects its searchability
– Biasing towards high-degree nodes may not be as efficient in enterprise email graphs

Corporate email social networks
and degree distributions
[Karagiannis et al, MSR-TR 2008]



• Distribution appears to depend on the view
– In-degree vs. out-degree
• Median in-degree:
– 50 (threshold equal to 1)
– 2 (threshold equal to 10)
• Median out-degree:
– 25 (threshold equal to 1)
– 2 (threshold equal to 10)
Measurement of Online Social Networks
[Mislove et al, IMC 2007]
Link symmetry

Why?
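Link symmetry can be quantified as the fraction of directed links whose reverse link also exists. The sketch below assumes that definition, which the slide does not spell out; the function name and edge list are hypothetical.

```python
def link_symmetry(edges):
    """Fraction of directed edges (u, v) for which (v, u) is also present."""
    edges = set(edges)
    if not edges:
        return 0.0
    symmetric = sum(1 for (u, v) in edges if (v, u) in edges)
    return symmetric / len(edges)


# Toy directed graph: only the a <-> b pair is reciprocated.
edges = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "d")]
symmetry = link_symmetry(edges)
```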
Small world and six degrees revisited

• Eccentricity is the maximum shortest-path distance from a vertex to any other vertex
• Radius:
– Minimum eccentricity of any vertex
• Diameter:
– Maximum eccentricity of any vertex
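These three definitions translate directly into breadth-first searches. A minimal sketch, assuming a connected, undirected, unweighted graph stored as a dict of neighbour sets; the names are hypothetical.

```python
from collections import deque


def eccentricity(adj, source):
    """Largest hop distance from `source` to any other vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())


def radius_and_diameter(adj):
    """Minimum and maximum eccentricity over all vertices."""
    ecc = [eccentricity(adj, u) for u in adj]
    return min(ecc), max(ecc)


path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "e"}, "e": {"d"}}
radius, diameter = radius_and_diameter(path)
```

For the five-vertex path, the middle vertex has eccentricity 2 (the radius) and the endpoints have eccentricity 4 (the diameter). Running one BFS per vertex is only feasible for small graphs; large crawls typically estimate these values from a sample of sources.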
Strength of ties
• Impact of strong ties
– What happens to the social graph when
strong/weak ties are removed?
– What is a strong tie?
• Examine the size of the largest connected
component when certain nodes are removed
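The removal experiment can be sketched as follows: sort ties by strength, remove them one at a time, and record the size of the largest connected component after each removal. The strength values, function names, and toy graph are hypothetical, and this sketch removes ties (edges) rather than nodes.

```python
def largest_component_size(nodes, edges):
    """Size of the largest connected component of an undirected graph."""
    adj = {u: set() for u in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen = set()
    best = 0
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        size = 0
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best


def removal_curve(nodes, weighted_edges, weakest_first=True):
    """Largest-component size after removing 0, 1, 2, ... ties in strength order."""
    ordered = sorted(weighted_edges, key=lambda e: e[2], reverse=not weakest_first)
    remaining = [(u, v) for u, v, _ in ordered]
    return [largest_component_size(nodes, remaining[i:])
            for i in range(len(remaining) + 1)]


# Two strongly tied triangles joined by one weak tie (toy strengths).
nodes = ["a", "b", "c", "d", "e", "f"]
ties = [("a", "b", 5), ("a", "c", 5), ("b", "c", 5),
        ("d", "e", 5), ("d", "f", 5), ("e", "f", 5), ("c", "d", 1)]
weak_first = removal_curve(nodes, ties, weakest_first=True)
strong_first = removal_curve(nodes, ties, weakest_first=False)
```

In this toy graph, removing the single weak tie first splits the network in half, while removing one strong tie leaves it connected. The measured graphs on the following slides do not always behave this way.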
Strength of ties
Different viewpoints:
Strength of ties
Email graph
• Strength defined based on volume
• Removal of weak ties does not affect the global connectivity
• Strong connectivity may be the result of the imposed organizational structure
Strength of ties
[Shi et al, Physica-2007]
AOL IM Friend Lists
• Strength defined as participation in triangles
• Giant component shrinks gradually
– Overlapping communities
– Bridges unlikely
• Shortest path does increase
– Weak ties = shortcuts
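One way to read "participation in triangles" is to score each tie by the number of triangles it belongs to, that is, the number of neighbours its two endpoints share. The sketch below assumes that reading; the exact definition in Shi et al. may differ, and the names and toy graph are hypothetical.

```python
def triangle_strength(adj):
    """Map each undirected edge (u, v), u < v, to the number of common neighbours."""
    strength = {}
    for u in adj:
        for v in adj[u]:
            if u < v:
                strength[(u, v)] = len(adj[u] & adj[v])
    return strength


# Two triangles joined by the tie c - d, which lies in no triangle.
adj = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
strength = triangle_strength(adj)
weak_ties = sorted(e for e, s in strength.items() if s == 0)
```

Under this definition the c-d tie has strength 0 and acts as the shortcut between the two groups.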
Sociability and number of friends
Guestbook activity network
[Chun et al, IMC-2008]
• 2 years’ worth of data
• How do activity graphs compare with friendship graphs?
• How does friendship affect sociability?
Sociability and number of friends
[Chun et al, IMC-2008]
Capacity cap
• Node strength (sociability) increases with the number of friends up to a limit
• Is 200 a capacity cap?
• Authors argue that the limit could be connected to Dunbar’s number
• Node strength: sum of messages across all direct edges
Dunbar (1998): Limit of manageable relationships is 150
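Node strength as defined here (sum of messages across all direct edges) is straightforward to compute. A minimal sketch, assuming message counts are stored per undirected edge; the function name and counts are hypothetical.

```python
from collections import defaultdict


def node_strength(message_counts):
    """Sum, for each node, the message counts on all edges touching it.

    message_counts: dict mapping an edge (u, v) to its number of messages.
    """
    strength = defaultdict(int)
    for (u, v), count in message_counts.items():
        strength[u] += count
        strength[v] += count
    return dict(strength)


# Toy guestbook activity between three users.
counts = {("a", "b"): 3, ("a", "c"): 2, ("b", "c"): 1}
strength = node_strength(counts)
```

Plotting strength against the number of friends per node is what would reveal a capacity cap of the kind discussed above.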
Online marketplaces and social networks
Hypothesis: Transactions with friends will have
higher satisfaction
[Swamynathan et al, WOSN-2008]
• Overstock Auctions
– Similar to eBay
– Incorporates social components: friends, ratings, message boards
• Two networks
– Personal: connecting friends
– Business: based on transactions
Online marketplaces and social networks

• 82% of users have less than 1% overlap between the two networks
• Business network has lower degree
– 50% of users have fewer than 10 friends or transaction partners

Online marketplaces and social networks
• 17K transactions studied
– Only 22% are between partners connected in the social network
– High success rate: ~80% for paths up to six hops
• Satisfaction does not hold at long distances in the partner network
– Expected (?)
Viral marketing and social networks
[Lerman et al, WOSN-2008]
Hypothesis: Social interactions may be exploited to
promote content
• User-submitted news stories
• Digg promotes stories to the
front page
• Allows social networking:
– Friends vs. fans
• B is A’s friend if A is watching B
• B is A’s fan if B is watching A
Viral marketing and social networks

• In-network votes
– From fans of previous voters
• Patterns of vote diffusion?
• Predict story popularity?
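Counting in-network votes from a time-ordered vote list can be sketched as below. It assumes the first entry is the submitter and that a vote is "in-network" when the voter is a fan of anyone who voted earlier; the function name, data layout, and user ids are hypothetical.

```python
def count_in_network_votes(voters, fans):
    """Count votes cast by fans of earlier voters.

    voters: user ids in time order, submitter first.
    fans: dict mapping a user to the set of users watching that user.
    """
    reached = set()  # fans of everyone who has voted so far
    in_network = 0
    for position, user in enumerate(voters):
        if position > 0 and user in reached:
            in_network += 1
        reached |= fans.get(user, set())
    return in_network


# Toy data: x and y watch the submitter s, and z watches x.
fans = {"s": {"x", "y"}, "x": {"z"}}
votes = ["s", "x", "q", "z", "y"]
in_network = count_in_network_votes(votes, fans)
```

Here x, z and y vote in-network, while q finds the story independently.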
Viral marketing and social networks
Data by scraping Digg:
– 900 newly submitted stories (2006)
– 200 front page stories
– Time-ordered votes, user ids, etc.
Viral marketing and social networks

• Large number of early in-network votes is negatively correlated with the eventual popularity of the story
• Intuition: If a story is truly interesting, it will be discovered by “independent” individuals
Cascades in social networks
[Cha et al, WOSN-2008]
How do photo bookmarks spread through social links?
• Crawled Flickr
– 2.5M users, 33M friend links, 100 days
– 34M bookmarks (11M distinct photos)
• Methodology: Did a particular bookmark spread through social links?
– No: if a user bookmarks a photo and none of his friends have previously bookmarked the photo
– Yes: if a user bookmarks a photo after one of his friends bookmarked the photo
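The Yes/No rule above can be applied to a bookmark log as follows. This is a minimal sketch, assuming events are (time, user, photo) tuples and that the friend lists are static over the observation period; the names and toy data are hypothetical.

```python
def classify_bookmarks(events, friends):
    """Label each bookmark as spread via a social link (True) or not (False).

    events: iterable of (time, user, photo) tuples.
    friends: dict mapping a user to the set of that user's friends.
    """
    earlier = {}  # photo -> users who have already bookmarked it
    labels = []
    for _, user, photo in sorted(events):
        previous = earlier.setdefault(photo, set())
        via_link = bool(previous & friends.get(user, set()))
        labels.append((user, photo, via_link))
        previous.add(user)
    return labels


# Toy data: a and b are friends, c has no friends.
friends = {"a": {"b"}, "b": {"a"}, "c": set()}
events = [(1, "a", "p1"), (2, "b", "p1"), (3, "c", "p1"), (4, "b", "p2")]
labels = classify_bookmarks(events, friends)
```

Only b's bookmark of p1 counts as social spread, because friend a bookmarked the same photo earlier.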
Cascades in social networks
• Hypothesis: Photos propagate like
diseases through human contacts
• Model:
– k: node degree, σ0: adoption rate
• Known R0: HIV (2-5), Measles (12-18)
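A minimal sketch of how a reproduction number could be estimated from a degree sequence. It assumes the standard epidemiological estimate for heterogeneous contact networks, $R_0 = \sigma_0 \langle k^2 \rangle / \langle k \rangle$. This form is an assumption and may differ from the exact model used by Cha et al.; the function name and the numbers are hypothetical.

```python
def basic_reproduction_number(degrees, sigma0):
    """Estimate R0 = sigma0 * <k^2> / <k> (assumed form, see lead-in).

    degrees: list of node degrees k.
    sigma0: adoption (transmission) rate per contact.
    """
    n = len(degrees)
    mean_k = sum(degrees) / n
    mean_k2 = sum(k * k for k in degrees) / n
    return sigma0 * mean_k2 / mean_k


# Same mean degree (2), different heterogeneity.
r0_regular = basic_reproduction_number([2, 2, 2, 2], sigma0=0.5)
r0_skewed = basic_reproduction_number([1, 1, 1, 5], sigma0=0.5)
```

Under this assumed form, a skewed degree distribution raises the estimate even when the mean degree is unchanged. This is one reason high-degree users matter for propagation.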
Cascades in social networks

• Finding:
– Model can describe photo propagation
• Potential use:
– Predicting popularity
Privacy in social networks
[Krishnamurthy et al, WOSN-2008]
• Users are encouraged to share personal information
– Most users unaware
– External applications require users to grant access to
personal info
Privacy in social networks

• Finding:
– Strong negative correlation between network size and viewable profile and friend lists
– Users more sensitive about their profiles
Privacy in social networks

• Finding:
– Information leaks to third parties, as on the Web
Privacy in social networks
[Guha et al, WOSN-2008]
• How do you ensure the “social network” experience
and keep your data private?
– NOYB (None of your business)
• Ensuring trust
– Do you trust your OSN provider?
– If yes, who else can see your data?
• Main idea:
– Profiles are composed of multiple fields
– If separated, these fields do not mean much
Privacy in social networks

Not a long term strategy!
Ranking and suggested candidate items
[Vojnovic et al, 2008]
• Collaborative information tagging
Ranking and suggested candidate items
• How to suggest tags?
– Goal: Learn the true popularity ranking
– Tags could be used for information retrieval
• Problem:
– Users tend to imitate!
Summary
• Degrees in OSNs
– Power law distributions
– Exponential distributions in corporate email graphs
• Small world phenomenon
– Present in OSNs (short paths/diameters)
– Average shortest path close to 6
• Weak ties
– Networks robust to removal of weak ties
• Findings:
– Capacity cap of 200
– Significant symmetry of links
– Marketplaces: Social links not exploited, but their usage appears promising
– Digg: “In-network” votes negatively correlate with story popularity
– Flickr: Photo bookmarks propagate similarly to diseases
– Privacy: Concerns correlate with network size
– Tagging: Users imitate, biasing rankings
Thank you!