Knowledge Management,
Semantic Web and
Social Networking
Social Networks
Dr. Bhavani Thuraisingham
June 2010
OUTLINE OF PART I
What are Social Networks
Social Network Views: Science, Technology, Culture
Social Network Concepts
Social Networks and Knowledge Management
Social Networks and Semantic Web
Applications
Directions
References:
ce.sharif.edu/~m_jamali/resources/WI06_SNA.ppt (WI 2006)
ic.ucsc.edu/~wsack/fdm20c/fall2008/Lectures/social-networks.ppt
SOCIAL NETWORKS
HTTP://WWW.FLAIRANDSQUARE.COM/ARCHIVES/167
A social network site allows people who share interests to build
a ‘trusted’ network or online community. A social network site will
usually provide various ways for users to interact, such as IM
(chat/instant messaging), email, video sharing, file sharing,
blogging, discussion groups, etc.
The main types of social networking sites each have a ‘theme’: they
allow users to connect through image or video collections
online (like Flickr or YouTube) or music (like MySpace or
Last.fm). Most contain libraries/directories of some categories,
such as former classmates, old work colleagues, and so on (like
Facebook, Friends Reunited, LinkedIn, etc.). They provide a
means to connect with friends (by allowing users to create a
detailed profile page), and recommender systems linked to
trust.
POPULAR SOCIAL NETWORKS
Facebook - A social networking website. Initially, membership was restricted
to students of Harvard University. It was originally based on what first-year
students were given, called the “face book”, which was a way to get to know
other students on campus. As of July 2007, there were over 34 million active
members worldwide. From September 2006 to September 2007 it rose from
60th to 6th most visited web site, and it was the number one site
for photos in the United States.
Twitter - A free social networking and micro-blogging service that allows users to
send “updates” (text-based posts, up to 140 characters long) via SMS, instant
messaging, email, the Twitter website, or an application/widget within a
space of your choice, like MySpace, Facebook, a blog, or an RSS
aggregator/reader.
MySpace - A popular social networking website offering an interactive, user-submitted network of friends, personal profiles, blogs, groups, photos, music
and videos internationally. According to Alexa Internet, MySpace is currently the
world’s sixth most popular English-language website and the sixth most
popular website in any language, and the third most popular website in the
United States, though it has topped the chart in various weeks. As of
September 7, 2007, there were over 200 million accounts.
SOCIAL NETWORKS:
INTERDISCIPLINARY FIELD
Social network analysis is an interdisciplinary social science.
Sociologists, computer scientists, physicists and mathematicians have
made large contributions to understanding networks in general (as graphs)
and have thus contributed to an understanding of social networks.
[Social network analysis] is grounded in the observation that social actors
[i.e., people] are interdependent and that the links [i.e., relationships]
among them have important consequences for every individual [and for all
of the individuals together]. ... [Relationships] provide individuals with
opportunities and, at the same time, potential constraints on their behavior.
... Social network analysis involves theorizing, model building and empirical
research focused on uncovering the patterning of links among actors. It is
concerned also with uncovering the antecedents and consequences of
recurrent patterns. (from Linton C. Freeman)
SOCIAL NETWORKS: HISTORY
“Sociograms” were invented in 1933 by Moreno.
In a sociogram, the actors are represented as points in a two-dimensional
space. The location of each actor is significant. E.g. a “central actor” is plotted
in the center, and others are placed in concentric rings according to “distance”
from this actor.
Actors are joined with lines representing ties, as in a social network. In other
words a social network is a graph, and a sociogram is a particular 2D
embedding of it.
These days, sociograms are rarely used (most examples on the web are not
sociograms at all, but networks). But methods like MDS (Multi-Dimensional
Scaling) can be used to lay out Actors, given a vector of attributes about them.
Social Networks were studied early by researchers in graph theory (Harary et al.
1950s). Some social network properties can be computed directly from the
graph.
Others depend on an adjacency matrix representation (Actors index rows and
columns of a matrix, matrix elements represent the tie strength between them).
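A minimal Python sketch of the adjacency-matrix representation. The actors, tie strengths, and the helper name `adjacency_matrix` are hypothetical; ties are treated as undirected unless `directed=True`. Row sums then give each actor's total tie strength, one example of a property read off the matrix rather than the drawing.

```python
def adjacency_matrix(actors, ties, directed=False):
    """Entry [i][j] holds the strength of the tie from actors[i] to actors[j]."""
    index = {name: i for i, name in enumerate(actors)}
    n = len(actors)
    matrix = [[0.0] * n for _ in range(n)]
    for a, b, strength in ties:
        i, j = index[a], index[b]
        matrix[i][j] = strength
        if not directed:
            matrix[j][i] = strength  # an undirected tie is symmetric
    return matrix

# Hypothetical actors and tie strengths (e.g. number of emails exchanged).
actors = ["Ann", "Bob", "Cy"]
ties = [("Ann", "Bob", 3.0), ("Bob", "Cy", 1.0)]
m = adjacency_matrix(actors, ties)

# Row sums: each actor's total tie strength (weighted degree).
strength = [sum(row) for row in m]
for name, row, s in zip(actors, m, strength):
    print(name, row, s)
```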
SOCIAL NETWORKS AS TECHNOLOGY
email, newsgroups, and weblogs
search engines: e.g., Google (http://google.com)
Google’s PageRank algorithm gives more weight to popular
webpages.
A webpage is considered popular if many other webpages
link to it.
collaborative filtering and/or recommender systems;
e.g., amazon.com’s feature: “People who bought this
book also bought...”
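The PageRank idea can be sketched with power iteration. This is a simplified textbook version under stated assumptions (a tiny hypothetical link graph, the commonly cited damping factor 0.85, a fixed number of iterations), not Google's production algorithm: each page splits its current score evenly among the pages it links to, and a page with no outgoing links spreads its score over all pages.

```python
def pagerank(links, damping=0.85, iterations=100):
    """links maps each page to the list of pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share, then collects votes from in-links.
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for t in pages:
                    new[t] += damping * rank[p] / n
        rank = new
    return rank

# Hypothetical link graph: a, b and d all link to c; c links back to a.
links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

Page `c`, which three pages link to, ends up with the highest score, while `b` and `d`, which nobody links to, keep only the baseline (1 - 0.85) / 4.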
TECHNOLOGY: LINKEDIN
What is Your Network?
When your connections invite their connections, your Network starts to grow.
Your Network is your connections, their connections, and so on out from you
at the center.
How do you classify users?
Your Network contains professionals out to “three degrees”, that is, friends-of-friends-of-friends. If each person had 10 connections (and some have
many more), then your network would contain up to 10 + 100 + 1,000 = 1,110 professionals.
How do you see who is in your Network?
LinkedIn lets you see your network as one large group of searchable
professional profiles.
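The size of such a network grows geometrically with the number of degrees. A quick check of the arithmetic, as a sketch assuming every person has exactly 10 connections and no two paths reach the same person (so the result is a ceiling; real networks overlap heavily). The helper name is hypothetical.

```python
def network_size_upper_bound(connections_per_person, degrees):
    """Count people at each degree, assuming nobody is reachable by two different paths."""
    per_degree = [connections_per_person ** d for d in range(1, degrees + 1)]
    return per_degree, sum(per_degree)

per_degree, total = network_size_upper_bound(10, 3)
print(per_degree, total)  # [10, 100, 1000] 1110
```

Each extra degree multiplies the count by 10, so the outermost degree dominates the total.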
SOCIAL NETWORKS AS
POPULAR CULTURE
e.g., Six Degrees of Kevin Bacon
Bacon number: definition
http://en.wikipedia.org/wiki/Six_Degrees_of_Kevin_Bacon
Kevin Bacon has a Bacon number of 0
an actor, A, has a Bacon number of 1 if s/he
appeared in a movie with Kevin Bacon
an actor, B, has a Bacon number of 2 if s/he appeared
in a movie with A (but not with Kevin Bacon)
social software, e.g., Facebook, Friendster, Orkut
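Bacon numbers are shortest-path distances in the co-star graph, so breadth-first search computes them. A sketch assuming a hypothetical cast list (not real filmographies); actors with no chain of co-stars back to Kevin Bacon are simply omitted, which corresponds to an infinite Bacon number.

```python
from collections import deque

def bacon_numbers(movies, center="Kevin Bacon"):
    """movies maps a title to its cast list; returns {actor: Bacon number}.
    Actors with no chain of co-stars back to `center` are left out."""
    costars = {}
    for cast in movies.values():
        for actor in cast:
            costars.setdefault(actor, set()).update(a for a in cast if a != actor)
    numbers = {center: 0}
    queue = deque([center])
    while queue:  # breadth-first search: nearer actors are labelled first
        actor = queue.popleft()
        for other in costars.get(actor, ()):
            if other not in numbers:
                numbers[other] = numbers[actor] + 1
                queue.append(other)
    return numbers

# Hypothetical casts, not real filmographies.
movies = {
    "Movie 1": ["Kevin Bacon", "A"],
    "Movie 2": ["A", "B"],
    "Movie 3": ["B", "C"],
    "Movie 4": ["D", "E"],
}
print(bacon_numbers(movies))
```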
SOCIAL NETWORKS:
MORE FORMAL DEFINITION
A structural approach to
understanding social interaction.
Networks consist of Actors and the
Ties between them.
We represent social networks
as graphs whose vertices are
the actors and whose edges
are the ties.
Edges are usually weighted to
show the strength of the tie.
In the simplest networks, an Actor is
an individual person.
A tie might be “is acquainted with”. Or
it might represent the amount of email
exchanged between persons A and B.
SOCIAL NETWORK EXAMPLES
Effects of urbanization on individual wellbeing
World political and economic system
Community elite decision-making
Social support, Group problem solving
Diffusion and adoption of innovations
Belief systems, Social influence
Markets, Sociology of science
Exchange and power
Email, Instant messaging, Newsgroups
Co-authorship, Citation, Co-citation
SocNet software, Friendster
Blogs and diaries, Blog quotes and links
SOCIAL NETWORKS BASIC QUESTIONS
Balance: important in exchange networks
In a two-person network (dyad), exchange of goods, services and cash
should be balanced.
More generally, exchanges of “favors” or “support” are likely to be quite
balanced.
Role: what role does the actor perform in the network?
Role is defined in terms of Actors’ neighborhoods.
The neighborhood is the set of ties and actors connected directly to the
current actor.
Actors with similar or identical neighborhoods are assigned the same
role.
What is the related idea from semiotics?
Paradigm: interchangeability. Actors with the same role are
interchangeable in the network.
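A sketch of role assignment under the strictest reading, structural equivalence: actors whose neighbor sets are exactly identical share a role. The edge list and helper name are hypothetical, and the “similar” neighborhoods mentioned above would need a similarity threshold instead of exact matching. Under this strict test, a tie between two otherwise equivalent actors makes their neighborhoods differ.

```python
from collections import defaultdict

def structural_roles(edges):
    """Group actors whose neighbor sets are identical (strict structural equivalence)."""
    neighbors = defaultdict(set)
    for a, b in edges:  # undirected ties
        neighbors[a].add(b)
        neighbors[b].add(a)
    roles = defaultdict(list)
    for actor, nbrs in neighbors.items():
        roles[frozenset(nbrs)].append(actor)
    return [sorted(group) for group in roles.values()]

# Hypothetical ties: one supervisor linked to three workers.
edges = [("boss", "w1"), ("boss", "w2"), ("boss", "w3")]
print(sorted(structural_roles(edges)))  # [['boss'], ['w1', 'w2', 'w3']]
```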
SOCIAL NETWORKS BASIC QUESTIONS
Prestige: How important is the actor in the network?
Related notions are status and centrality.
Centrality reifies the notion of “peripheral vs. central
participation” from communities of practice.
Key notions of centrality were developed in the 1970s, e.g.
“eigenvector centrality” by Bonacich.
Most of these measures were rediscovered as quality
measures for web pages:
Indegree
PageRank ≈ eigenvector centrality (with damping)
HITS ?= two-mode eigenvector centrality
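Eigenvector centrality scores an actor by the scores of the actors it is tied to, which is why PageRank and HITS can be seen as relatives of it. A power-iteration sketch for a connected, undirected graph given as a 0/1 matrix (the star graph is a hypothetical example). Multiplying by A + I instead of A is an implementation choice: it keeps the same leading eigenvector but avoids the oscillation plain power iteration shows on bipartite graphs such as stars.

```python
def eigenvector_centrality(adj, iterations=200):
    """adj: symmetric 0/1 adjacency matrix (list of lists) of a connected graph.
    Returns scores scaled so the most central actor scores 1."""
    n = len(adj)
    x = [1.0] * n
    for _ in range(iterations):
        # New score = own score + sum of neighbours' scores, i.e. multiply by (A + I).
        y = [x[i] + sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
        top = max(y)
        x = [v / top for v in y]
    return x

# A hypothetical star: actor 0 is tied to actors 1, 2 and 3.
star = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
print([round(v, 3) for v in eigenvector_centrality(star)])
```

For the star, the center scores 1 and each leaf about 0.577 (1/√3), matching the leading eigenvector of the star's adjacency matrix.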
SOCIAL NETWORK CONCEPTS
Actor
Modes
An “actor” is a basic component for SNs. Actors can be:
Individual people, Corporations, Nation-States, Social groups
If all the actors are of the same type, the network is called a one-mode network. If there are two types of actors, then it is a two-mode network.
E.g. an affiliation network is a two-mode network. One mode is
individuals, the other is groups to which they belong. Ties represent
the relation: person A is a member of group B.
Ties
A tie is the relation between two actors. Common types of ties
include:
Friendship, Amount of communication, Goods exchanged, Familial
relation (kinship), Institutional relations
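An affiliation network is often analysed by projecting it onto one mode. A sketch, assuming hypothetical (person, group) membership pairs and a hypothetical helper name: two people become tied if they share a group, with the tie weighted by how many groups they share.

```python
from collections import defaultdict
from itertools import combinations

def project_onto_people(memberships):
    """memberships: (person, group) pairs from a two-mode affiliation network.
    Returns one-mode ties between people, weighted by the number of shared groups."""
    members = defaultdict(set)
    for person, group in memberships:
        members[group].add(person)
    ties = defaultdict(int)
    for people in members.values():
        for a, b in combinations(sorted(people), 2):
            ties[(a, b)] += 1
    return dict(ties)

# Hypothetical memberships.
memberships = [("Ann", "ACM"), ("Bob", "ACM"), ("Ann", "IEEE"), ("Bob", "IEEE"), ("Cy", "IEEE")]
print(project_onto_people(memberships))
# {('Ann', 'Bob'): 2, ('Ann', 'Cy'): 1, ('Bob', 'Cy'): 1}
```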
PRACTICAL ISSUES: BOUNDARIES AND
SAMPLES
Because human relations are rich and unbounded, drawing
meaningful boundaries for network analysis is a challenge.
There are two main approaches:
Realist: boundaries perceived by actors themselves, e.g. gang
members or ACM members.
Nominalist: Boundaries created by researcher: e.g. people who
publish in ACM CHI.
To deal with large networks, sampling is necessary. Unfortunately,
randomly sampled graphs will typically have completely different
structure. Why?
One approach to this is “snowballing”. You start with a random
sample. Then extend with all actors connected by a tie. Then extend
with all actors connected to the previous set by a tie…
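A sketch of snowball sampling on a hypothetical chain of ties. The seed is drawn at random, as the text describes; each wave then adds every actor tied to the previous wave. The helper name and the `waves` parameter are illustrative, not a standard API.

```python
import random

def snowball_sample(neighbors, seeds, waves):
    """neighbors: actor -> iterable of tied actors. Each wave adds every actor
    tied to the previous wave that is not already in the sample."""
    sample = set(seeds)
    frontier = set(seeds)
    for _ in range(waves):
        frontier = {n for actor in frontier for n in neighbors.get(actor, ())} - sample
        if not frontier:
            break  # no new actors reachable
        sample |= frontier
    return sample

# Hypothetical chain of ties a-b-c-d-e.
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
seeds = random.Random(7).sample(sorted(neighbors), 1)  # random starting sample
print(seeds, snowball_sample(neighbors, seeds, waves=1))
```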
THE WEB AS A SOCIAL NETWORK
Social networks are formed between Web pages by
hyperlinking to other Web pages.
A hyperlink is usually an explicit indicator that one
Web page author believes that another page is related
or relevant.
The ability to publish and gather personal
information has been a major factor in the success of the Web.
Two Major Tasks
Social Network Extraction from the Web
Social Network Analysis
Social Networking Services (SNS).
Friendster; Orkut
INFERRING COMMUNITIES ON THE WEB
Bibliographic Metrics
bibliographic coupling
co-citation coupling
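The two bibliographic metrics can be stated precisely: two documents are bibliographically coupled when they cite the same items (counted by shared references), and two items are co-cited when the same documents cite both (counted by citing documents). On the Web the same counts apply to hyperlinks. A sketch with a hypothetical citation map and helper name:

```python
from itertools import combinations

def coupling_and_cocitation(cites):
    """cites: document -> set of documents it cites (or pages it links to).
    Returns two dicts keyed by sorted document pairs."""
    docs = sorted(set(cites) | {d for refs in cites.values() for d in refs})
    coupling, cocitation = {}, {}
    for a, b in combinations(docs, 2):
        # Bibliographic coupling: how many references a and b have in common.
        coupling[(a, b)] = len(cites.get(a, set()) & cites.get(b, set()))
        # Co-citation: how many documents cite both a and b.
        cocitation[(a, b)] = sum(1 for refs in cites.values() if a in refs and b in refs)
    return coupling, cocitation

# Hypothetical citations: papers P1-P3 cite references R1 and R2.
cites = {"P1": {"R1", "R2"}, "P2": {"R1", "R2"}, "P3": {"R1"}}
coupling, cocitation = coupling_and_cocitation(cites)
print(coupling[("P1", "P2")], cocitation[("R1", "R2")])  # 2 2
```

High counts for either metric suggest the two documents (or pages) belong to the same community.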
BLOGOSPHERE AS A SOCIAL NETWORK
Weblogs have become prominent social media on
the Internet that enable users to quickly and easily
publish content, including highly personal thoughts.
Bloggers might list one another’s blogs in a blogroll
and might read, link to, or comment on
other blogs’ posts (a post is the smallest part of a
blog; it has some content that readers can
comment on, and a publication date).
SEMANTIC WEB AND SOCIAL NETWORK
Semantic Web: having data on the Web defined and
linked in a way that it can be used by people and
processed by machines in a “wide variety of new and
exciting applications”
SW and SN models support each other:
Semantic Web enables online and explicitly represented
social information
social networks, especially trust networks, provide a new
paradigm for knowledge management in which users
“outsource” knowledge and beliefs via their social networks
SEMANTIC WEB AND SOCIAL NETWORK
Drawbacks to Centralized Social Networks
the information is under the control of the database owner
centralized systems do not allow users to control the information
they provide on their own terms
The friend-of-a-friend (FOAF) project is a first attempt at a formal,
machine processable representation of user profiles and friendship
networks.
The Swoogle Ontology Dictionary shows that the class foaf:Person
currently has nearly one million instances spread over about 45,000
Web documents.
The FOAF ontology is not the only one used to publish social information
on the Web.
For example, Swoogle identifies more than 360 RDFS or OWL classes
defined with the local name “person”.
SW AND SNA (ISSUES)
Knowledge representation.
Small number of common ontologies
Knowledge management.
efficient and effective mechanisms for accessing
knowledge, especially social networks, on the Semantic Web
Social network extraction, integration and analysis
extracting social networks correctly from the noisy and
incomplete knowledge on the (Semantic) Web
Provenance and trust aware distributed inference.
manage and reduce the complexity of distributed inference
by utilizing provenance of knowledge
SOCIAL NETWORKS AND KMS
Why Social Networks in KMS?
[Diagram: KM at the overlap of People, Technology, Organization and Processes]
Knowledge Management involves people, technology, and processes in
overlapping parts.
SOCIAL NETWORKS AND KMS
Why are we studying social networks?
What ties Information Architecture,
Knowledge Management and
Social Network Analysis more
closely together is the reciprocal
relationship between people and
content.
[Diagram: Social Networks, Information Architecture, Knowledge Management Systems]
SOCIAL NETWORK ANALYSIS
Social network analysis [SNA] is the mapping and measuring of
relationships and flows between people, groups, organizations,
computers or other information/knowledge processing entities.
The nodes in the network are the people and groups while the links
show relationships or flows between the nodes.
SOCIAL NETWORK ANALYSIS (SNA)
We measure a social network in terms of:
1. Degree Centrality:
The number of direct connections a node has. What really matters is where
those connections lead to and how they connect the otherwise
unconnected.
2. Betweenness Centrality:
How often a node lies on the shortest paths between other nodes. A node with high betweenness has great influence over what flows in the
network, indicating important links and a single point of failure.
3. Closeness Centrality:
How close a node is to everyone else. Through the pattern of direct and indirect ties, nodes with high closeness can reach any other node in
the network more quickly than anyone else: they have the shortest paths to all others.
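The three measures can be computed with breadth-first search. The sketch below assumes an unweighted, undirected graph stored as an adjacency list; betweenness uses Brandes' accumulation of shortest-path dependencies, and closeness is (number of reachable nodes) / (sum of distances to them), one common convention among several. The graph and helper names are hypothetical: a triangle a-b-c attached through c to a chain d-e.

```python
from collections import deque

def bfs(adj, source):
    """Distances, shortest-path counts, predecessors and visit order from source."""
    dist, sigma, preds = {source: 0}, {source: 1}, {source: []}
    order, queue = [], deque([source])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj[v]:
            if w not in dist:
                dist[w], sigma[w], preds[w] = dist[v] + 1, 0, []
                queue.append(w)
            if dist[w] == dist[v] + 1:  # v is on a shortest path to w
                sigma[w] += sigma[v]
                preds[w].append(v)
    return dist, sigma, preds, order

def centralities(adj):
    degree = {v: len(adj[v]) for v in adj}
    closeness = {}
    betweenness = dict.fromkeys(adj, 0.0)
    for s in adj:
        dist, sigma, preds, order = bfs(adj, s)
        total = sum(dist.values())
        closeness[s] = (len(dist) - 1) / total if total else 0.0
        # Brandes: accumulate dependencies, farthest nodes first.
        delta = dict.fromkeys(order, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                betweenness[w] += delta[w]
    # In an undirected graph each pair was counted from both endpoints.
    betweenness = {v: b / 2 for v, b in betweenness.items()}
    return degree, betweenness, closeness

adj = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c", "e"], "e": ["d"]}
degree, betweenness, closeness = centralities(adj)
for v in adj:
    print(v, degree[v], betweenness[v], round(closeness[v], 3))
```

Node c sits on every shortest path between {a, b} and {d, e}, so it has the highest betweenness (4) and closeness (0.8): removing it would disconnect the graph.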
Application of SNA: Building the 9/11 al-Qaeda Network.
DIRECTIONS
Reduce Complexity
Geo-social networks
Integrating concepts from semantic web, social network, and
knowledge management
Geo-social semantic web
Visualizing social networks
Security and Privacy
Mining and analysis of social networks
Predicting what the members would do next
OUTLINE OF PART II
Social Networks
Social Networks and 9/11 Terrorists
Social Networks and Baseball Drug Use
Social Networks and Expert Finder
SOCIAL NETWORK ANALYSIS OF 9/11 TERRORISTS
(WWW.ORGNET.COM)
Early in 2000, the CIA was informed of two terrorist suspects linked to al-Qaeda.
Nawaf Alhazmi and Khalid Almihdhar were photographed attending a meeting of
known terrorists in Malaysia. After the meeting they returned to Los Angeles,
where they had already set up residence in late 1999.
SOCIAL NETWORK ANALYSIS OF 9/11 TERRORISTS
What do you do with these suspects? Arrest or deport them immediately?
No, we need to use them to discover more of the al-Qaeda network.
Once suspects have been discovered, we can use their daily activities to
uncloak their network. Just like they used our technology against us, we
can use their planning process against them. Watch them, and listen to
their conversations to see...
•who they call / email
•who visits with them locally and in other cities
•where their money comes from
The structure of their extended network begins to emerge as data is
discovered via surveillance.
SOCIAL NETWORK ANALYSIS OF 9/11 TERRORISTS
A suspect being monitored may have many contacts -- both accidental and intentional. We must
always be wary of 'guilt by association'. Accidental contacts, like the mail delivery person, the grocery
store clerk, and a neighbor, may not be viewed with investigative interest.
Intentional contacts are like the late afternoon visitor, whose car license plate is traced back to a
rental company at the airport, where we discover he arrived from Toronto (got to notify the
Canadians) and his name matches a cell phone number (with a Buffalo, NY area code) that our
suspect calls regularly. This intentional contact is added to our map and we start tracking his
interactions -- where do they lead? As data comes in, a picture of the terrorist organization slowly
comes into focus.
How do investigators know whether they are on to something big? Often they don't. Yet in this case
there was another strong clue that Alhazmi and Almihdhar were up to no good -- the attack on the
USS Cole in October of 2000. One of the chief suspects in the Cole bombing [Khallad] was also
present [along with Alhazmi and Almihdhar] at the terrorist meeting in Malaysia in January 2000.
Once we have their direct links, the next step is to find their indirect ties -- the 'connections of their
connections'. Discovering the nodes and links within two steps of the suspects usually starts to reveal
much about their network. Key individuals in the local network begin to stand out. In viewing the
network map in Figure 2, most of us will focus on Mohammed Atta because we now know his history.
The investigator uncloaking this network would not be aware of Atta's eventual importance. At this
point he is just another node to be investigated.
Figure 2 shows the two suspects and
SOCIAL NETWORK ANALYSIS OF 9/11 TERRORISTS
We now have enough data for two key conclusions:
• All 19 hijackers were within 2 steps of the two original suspects uncovered in 2000!
• Social network metrics reveal Mohammed Atta emerging as the local leader.
With hindsight, we have now mapped enough of the 9/11 conspiracy to stop it. Again, the
investigators are never sure they have uncovered enough information while they are in the
process of uncloaking the covert organization. They also have to contend with superfluous data.
This data was gathered after the event, so the investigators knew exactly what to look for.
Before an event it is not so easy.
As the network structure emerges, a key dynamic that needs to be closely monitored is the activity
within the network. Network activity spikes when a planned event approaches. Is there an
increase of flow across known links? Are new links rapidly emerging between known nodes?
Are money flows suddenly going in the opposite direction? When activity reaches a certain
pattern and threshold, it is time to stop monitoring the network, and time to start removing
nodes.
The author argues that this bottom-up approach of uncloaking a network is more effective than a top-down
search for the terrorist needle in the public haystack -- and it is less invasive of the
general population, resulting in far fewer "false positives".
SOCIAL NETWORK ANALYSIS OF STEROID USAGE IN
BASEBALL (WWW.ORGNET.COM)
When the Mitchell Report on steroid use in Major League Baseball [MLB] was published, people were surprised at
who and how many players were mentioned. The diagram below shows a human network created from data found in
the Mitchell Report. Baseball players are shown as green nodes. Those who were found to be providers of steroids
and other illegal performance-enhancing substances appear as red nodes. The links reveal the flow of chemicals from provider to player.
SOCIAL NETWORKING FOR KNOWLEDGE MANAGEMENT
EXAMPLES
WWW.ORGNET.COM
Managing the 21st Century Organization
Networks of Adaptive/Agile Organizations
Best Practice: Organizational Network Mapping
Discovering Communities of Practice
Data-Mining E-mail
Finding Leaders on your Team
Post-Merger Integration
Knowledge Sharing in Organizations
Innovation happens at the Intersections
Partnerships and Alliances in Industry
Decision-Making in Organizations
New Organizational Structures
KNOWLEDGE SHARING NETWORK: FINDING EXPERTS
(WWW.ORGNET.COM)
Organizational leaders are preparing for the potential loss of expertise and knowledge flow due to
turnover, downsizing, outsourcing, and the coming retirements of the baby boom generation. The
model network (previous chart) is used to illustrate the knowledge continuity analysis process.
Each node in this sample network (previous chart) represents a person that works in a knowledge
domain. Some people have more / different knowledge than others. Employees who will retire in 2
years or less have their nodes colored red. Those who will retire in 3-4 years are colored yellow.
Those retiring in 5 years or later are colored green.
A gray, directed line is drawn from the seeker of knowledge to the source of expertise. A-->B indicates
that A seeks expertise / advice from B. Those with many arrows pointing to them are sought often for
assistance.
The top subject matter experts -- SMEs -- in this group are nodes 29, 46, 100, 41, 36 and 55.
The SMEs were discovered using a network metric in InFlow that is similar to how the
Google search engine ranks web pages -- using both direct and indirect links.
Of the top six SMEs in this group, half are colored red [100] or yellow [46, 55]. The loss of person 46
has the greatest potential for knowledge loss. 90% of the network is within
3 steps of accessing this key knowledge source.
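The “within 3 steps” figure can be checked with a breadth-first search that follows the advice arrows backwards from the expert. InFlow's own metric is not reproduced here; this sketch, with a hypothetical edge list and helper name, only measures reach. It assumes “the network” means everyone except the expert, which is a choice about the denominator.

```python
from collections import deque

def share_within_steps(advice_edges, expert, max_steps):
    """advice_edges: (seeker, source) pairs, where seeker -> source means the
    seeker asks the source for advice. Returns the share of everyone else who can
    reach `expert` by following at most `max_steps` advice arrows."""
    askers = {}  # source -> people who ask that source
    people = set()
    for seeker, source in advice_edges:
        askers.setdefault(source, []).append(seeker)
        people.update((seeker, source))
    dist = {expert: 0}
    queue = deque([expert])
    while queue:  # BFS backwards along the arrows, starting at the expert
        node = queue.popleft()
        if dist[node] == max_steps:
            continue
        for seeker in askers.get(node, []):
            if seeker not in dist:
                dist[seeker] = dist[node] + 1
                queue.append(seeker)
    return (len(dist) - 1) / (len(people) - 1)

# Hypothetical advice network; 46 plays the role of the key expert.
edges = [(1, 46), (2, 1), (3, 2), (4, 3), (5, 46)]
print(share_within_steps(edges, 46, 3))  # 0.8
```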
KNOWLEDGE SHARING IN ORGANIZATIONS: FINDING
EXPERTS
OTHER APPLICATIONS
Detecting coalitions and subgroups
Conducting a political campaign
Marketing a drug by a pharmaceutical company
Forming a travel network
Many more...
OUTLINE OF PART IV
Introduction to Social Networks
Properties of Social Networks
Social Network Analysis Basics
Examples
Data Privacy Basics
Privacy and Social Networks
SOCIAL NETWORKS
Social networks have important implications for our daily
lives.
Spread of Information
Spread of Disease
Economics
Marketing
Social network analysis could be used for many activities
related to information and security informatics.
Terrorist network analysis
ENRON SOCIAL GRAPH*
* http://jheer.org/enron/
SOCIAL NETWORKS
ROMANTIC RELATIONS AT “JEFFERSON HIGH SCHOOL”
“SMALL-WORLD” EXAMPLE: SIX DEGREES OF
KEVIN BACON
SOCIAL NETWORK MINING
Social network data is represented as a graph
Individuals are represented as nodes
Nodes may have attributes to represent personal traits
Relationships are represented as edges
Edges may have attributes to represent relationship types
Edges may be directed
Common Social Network Mining tasks
Node classification
Link Prediction
GRAPH MODEL
Lindamood et al. 09 &
Heatherly et al. 09
Graph represented by a set of homogeneous
vertices and a set of homogeneous edges
Each node also has a set of Details, one of
which is considered private.
COLLECTIVE INFERENCE
Lindamood et al. 09 &
Heatherly et al. 09
Collection of techniques that use node
attributes and the link structure to refine
classifications.
Uses local classifiers to establish a set of priors
for each node
Uses traditional relational classifiers as the
iterative step in classification
RELATIONAL CLASSIFIERS
Lindamood et al. 09 &
Heatherly et al. 09
Class Distribution Relational Neighbor
Weighted-Vote Relational Neighbor
Network-only Bayes Classifier
Network-only Link-based Classification
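A sketch of collective inference with the weighted-vote relational neighbor (wvRN) classifier, assuming binary labels stored as P(liberal), priors from some local classifier, and observed labels held fixed. Each round updates every unlabelled node simultaneously from its neighbors' previous estimates, in the spirit of relaxation labeling; fuller relaxation-labeling procedures often add a damping schedule, which this simplified version omits. Graph, weights and the helper name are hypothetical.

```python
def wvrn_relaxation(neighbors, priors, known, iterations=50):
    """neighbors: node -> {neighbor: tie weight} (symmetric).
    priors: node -> P(liberal) from a local classifier (assumed given).
    known: node -> observed P(liberal), 1.0 or 0.0, held fixed throughout.
    Returns P(liberal) for every node after simultaneous wvRN updates."""
    p = {v: known.get(v, priors[v]) for v in neighbors}
    for _ in range(iterations):
        new_p = {}
        for v, nbrs in neighbors.items():
            total = sum(nbrs.values())
            if v in known:
                new_p[v] = known[v]
            elif total == 0:
                new_p[v] = p[v]  # isolated node keeps its prior
            else:
                # Weighted vote: weighted average of neighbors' current estimates.
                new_p[v] = sum(w * p[u] for u, w in nbrs.items()) / total
        p = new_p  # every node used the previous round's estimates
    return p

# Hypothetical graph: u1 is a known liberal, u2 a known conservative.
neighbors = {
    "u1": {"x": 2.0},
    "u2": {"x": 1.0},
    "x": {"u1": 2.0, "u2": 1.0, "y": 1.0},
    "y": {"x": 1.0},
}
priors = {"u1": 0.5, "u2": 0.5, "x": 0.5, "y": 0.5}
known = {"u1": 1.0, "u2": 0.0}
p = wvrn_relaxation(neighbors, priors, known)
print({v: round(prob, 3) for v, prob in p.items()})
```

Both x and y settle at 2/3, the fixed point of the weighted votes (x = (2·1 + 1·0 + 1·y) / 4 with y = x).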
EXPERIMENTAL DATA
Lindamood et al. 09 &
Heatherly et al. 09
167,000 profiles from the Facebook
online social network
Restricted to public profiles in the
Dallas/Fort Worth network
Over 3 million links
GENERAL DATA PROPERTIES
Diameter of the largest component: 16
Number of nodes: 167,390
Number of friendship links: 3,342,009
Total number of listed traits: 4,493,436
Total number of unique traits: 110,407
Number of components: 18
Probability Liberal: .45
Probability Conservative: .55
Lindamood et al. 09 &
Heatherly et al. 09
INFERENCE METHODS
Lindamood et al. 09 &
Heatherly et al. 09
Details only: Uses Naïve Bayes classifier to
predict attribute
Links Only: Uses only the link structure to
predict attribute
Average: Classifies based on an average of the
probabilities computed by Details and Links
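A sketch of the three inference methods, under simplifying assumptions: the Details classifier is a naive Bayes variant that scores only the traits a user lists (with add-one smoothing), Links is the label share among a user's labelled friends, and Average takes the mean of the two distributions. The profiles, friend labels and helper names are hypothetical toy data; the trait strings are borrowed from the trait tables.

```python
import math
from collections import Counter

def train_details_classifier(profiles):
    """profiles: list of (set of traits, label). A simplified naive Bayes that scores
    only the traits a user lists, with add-one (Laplace) smoothing."""
    label_counts = Counter(label for _, label in profiles)
    trait_counts = {label: Counter() for label in label_counts}
    for traits, label in profiles:
        trait_counts[label].update(traits)

    def predict(traits):
        log_scores = {}
        for label, n in label_counts.items():
            score = math.log(n / len(profiles))  # class prior
            for t in traits:
                score += math.log((trait_counts[label][t] + 1) / (n + 2))
            log_scores[label] = score
        top = max(log_scores.values())  # normalize in a numerically stable way
        weights = {lbl: math.exp(s - top) for lbl, s in log_scores.items()}
        total = sum(weights.values())
        return {lbl: w / total for lbl, w in weights.items()}

    return predict

def links_only(neighbor_labels, labels):
    """Share of a user's labelled friends in each class (uniform if none)."""
    counts = Counter(neighbor_labels)
    total = sum(counts.values())
    return {lbl: counts[lbl] / total if total else 1 / len(labels) for lbl in labels}

def average(p_details, p_links):
    return {lbl: (p_details[lbl] + p_links[lbl]) / 2 for lbl in p_details}

profiles = [
    ({"college republicans"}, "conservative"),
    ({"the democratic party"}, "liberal"),
    ({"the democratic party", "amnesty international"}, "liberal"),
    ({"college republicans", "texas conservatives"}, "conservative"),
]
details = train_details_classifier(profiles)
p_d = details({"the democratic party"})
p_l = links_only(["conservative", "liberal", "conservative"], ["liberal", "conservative"])
p_avg = average(p_d, p_l)
print(p_d, p_l, p_avg)
```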
PREDICTING PRIVATE DETAILS
Lindamood et al. 09 &
Heatherly et al. 09
Attempt to predict the value of the political
affiliation attribute
Three Inference Methods used as the local
classifier
Relaxation labeling used as the Collective
Inference method
REMOVING DETAILS
Lindamood et al. 09 &
Heatherly et al. 09
Ensures that no ‘false’ information is
added to the network: all details in the
released graph were entered by the user
Details that have the highest global
probability of indicating political affiliation
are removed from the network
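A sketch of the detail-removal step, assuming each trait already has a global weight toward the private attribute. The weights below are copied from the trait tables; merging liberal and conservative weights into one ranking is a simplification, and the helper name is hypothetical. Because traits are only deleted, every released profile is a subset of what the user entered.

```python
def remove_top_details(profiles, trait_weight, k):
    """Remove from every profile the k traits with the highest global weight toward
    revealing the private attribute. Traits are only deleted, never altered or added."""
    banned = set(sorted(trait_weight, key=trait_weight.get, reverse=True)[:k])
    return {user: traits - banned for user, traits in profiles.items()}

# Weights from the trait tables; one combined ranking is a simplifying assumption.
trait_weight = {
    "legalize same sex marriage": 46.16066789,
    "college republicans": 40.51122488,
    "the democratic party": 32.12011605,
}
profiles = {  # hypothetical users
    "u1": {"legalize same sex marriage", "the democratic party", "hiking"},
    "u2": {"college republicans", "football"},
}
released = remove_top_details(profiles, trait_weight, k=2)
print(released)
```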
REMOVING LINKS
Lindamood et al. 09 &
Heatherly et al. 09
Ensures that the link structure of the
released graph is a subset of the original
graph
Removes, from each node, the links to the
neighbors that are most like the current node
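A sketch of the link-removal step. The similarity measure is an assumption (Jaccard overlap of trait sets); the source says only that the removed links are those to the friends most like the node. Ties are broken by name so the output is deterministic, the helper names are hypothetical, and only removals happen, so the released graph is a subset of the original.

```python
def jaccard(a, b):
    """Overlap of two trait sets (an assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def remove_most_similar_links(neighbors, traits, k):
    """neighbors: node -> set of friends (undirected). Each node drops its links to
    its k most similar friends; ties in similarity are broken by name."""
    removed = set()
    for v in sorted(neighbors):
        ranked = sorted(neighbors[v], key=lambda u: (-jaccard(traits[v], traits[u]), u))
        for u in ranked[:k]:
            removed.add(frozenset((v, u)))
    # Keep only links that were not removed: the result is a subgraph of the input.
    return {v: {u for u in friends if frozenset((v, u)) not in removed}
            for v, friends in neighbors.items()}

# Hypothetical friendships and traits.
neighbors = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"a", "c"}}
traits = {"a": {"x", "y"}, "b": {"x", "y"}, "c": {"x", "z"}, "d": {"w"}}
released_links = remove_most_similar_links(neighbors, traits, k=1)
print(released_links)
```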
MOST LIBERAL TRAITS
Lindamood et al. 09 &
Heatherly et al. 09
Trait Name | Trait Value | Weight Liberal
Group | legalize same sex marriage | 46.16066789
Group | every time i find out a cute boy is conservative a little part of me dies | 39.68599463
Group | equal rights for gays | 33.83786875
Group | the democratic party | 32.12011605
Group | not a bush fan | 31.95260895
Group | people who cannot understand people who voted for bush | 30.80812425
Group | government religion disaster | 29.98977927
MOST CONSERVATIVE TRAITS
Lindamood et al. 09 &
Heatherly et al. 09
Trait Name | Trait Value | Weight Conservative
Group | george w bush is my homeboy | 45.88831329
Group | college republicans | 40.51122488
Group | texas conservatives | 32.23171423
Group | bears for bush | 30.86484689
Group | kerry is a fairy | 28.50250433
Group | aggie republicans | 27.64720818
Group | keep facebook clean | 23.653477
Group | i voted for bush | 23.43173116
Group | protect marriage one man one woman | 21.60830487
MOST LIBERAL TRAITS PER TRAIT NAME
Lindamood et al. 09 &
Heatherly et al. 09
Trait Name | Trait Value | Weight Liberal
activities | amnesty international | 4.659100601
Employer | hot topic | 2.753844959
favorite tv shows | queer as folk | 9.762900035
grad school | computer science | 1.698146579
hometown | mumbai | 3.566007713
Relationship Status | in an open relationship | 1.617950632
religious views | agnostic | 3.15756412
looking for | whatever i can get | 1.703651985
EXPERIMENTS
Lindamood et al. 09 &
Heatherly et al. 09
Conducted on 35,000 nodes which recorded
political affiliation
Tests removing 0 details and 0 links, 10 details
and 0 links, 0 details and 10 links, and 10
details and 10 links
Varied Training Set size from 10% of available
nodes to 90%
Results are documented in the papers