Transcript P - Data Communication and Data Management Laboratory
Social Network
Something Interesting
Ruizhi Gao
Contents
• The Birth of Social Networks
• New Types of Social Networks
• My Social Networks
• Research Related
History
Early Years
They are client-based applications that let you add friends and keep a friends list. You can communicate with each other by sending text messages, images, and other information.
AIM
Problems:
You have to remember the account ID if you want to add new friends.
Your friends list is not visible.
It’s more like a communication tool than an SNS.
ICQ
SixDegrees.com
Combines the functions of many different tools such as ICQ and AIM, and provides a new search approach based on users’ own information.
SixDegrees.com was a social network service website that lasted from 1997 to 2001 and was based on the Web of Contacts model of social networking. It was named after the six degrees of separation concept and allowed users to list friends, family members and acquaintances both on the site and externally; external contacts were invited to join the site. Users could send messages and post bulletin board items to people in their first, second, and third degrees, and see their connection to any other user on the site. It was one of the first manifestations of social networking websites in the format now seen today. Six Degrees was followed by more successful social networking sites based on the "social-circles network model" such as Friendster, MySpace, LinkedIn, XING and Facebook.
----- Wikipedia http://en.wikipedia.org/wiki/SixDegrees.com
LiveJournal.com
It offers a new approach: people mark others as Friends to follow their journals and manage privacy settings.
SNS on business networks
• Ryze
• Tribe.net
Friendster
MySpace
• MySpace could grow rapidly because many other SNSs were trying to collect fees.
• MySpace opened public pages to famous bands and singers so that their fans could follow them.
• MySpace did not restrict users from adding HTML code to the forms to make their own pages special.
Today
Twitter is an online social networking service and microblogging service that enables its users to send and read text-based messages of up to 140 characters, known as "tweets".
Harvard-only, high school networks, corporate networks
Reference
• Wikipedia
• Boyd, d. m., & Ellison, N. B. (2007). Social network sites: Definition, history, and scholarship. Journal of Computer-Mediated Communication, 13(1), article 11.
• Visual Academy http://www.onlineschools.org/visual-academy/history-of-social-networking/
New Types of Social Networks
• General: Facebook, Twitter …
• School, college: Classmates.com, Friends Reunited (UK) …
• Art community: deviantART, Taltopia
• Movies, TV series: YouTube, douban, Flixster, Filmow …
• Photo sharing: Flickr, Fotki, DailyBooth …
• …
SoLoMo
• SoLoMo, short for social-local-mobile, refers to a more mobile-centric version of the addition of local entries to search engine results (GPS).
• SoLoMo-based social networks: the most famous one is Foursquare.
Foursquare
You can check in at different places (restaurants, movie theaters, …) and leave comments. The more you check in, the more points you earn. You can also become the “mayor” of a place if you check in there many times.
Private Social Network
• A general social network may not allow you to upload “sensitive” or private information. A private social network, however, allows you to create a group-centric SNS in which you can share more if you want.
Sgrouples
• Sgrouples (sounds like ‘scruples’) is our very own private, group-centric social network designed to mimic how small groups of people interact in their real lives. Sgrouples allows you to easily post content to different groups based on your real life interests – friends, family, work, sports teams, and hobby groups.
Reference
• http://blog.sgrouples.com/new-social-networking-sites/
• http://en.wikipedia.org/wiki/List_of_social_networking_websites
Experience
X-land Project
• We designed the Xland project as a 3D immersive blog.
• Xland was part of the CHIPS (CHina Innovation Program for Students) program sponsored by Sun Microsystems and the Chinese Education Department.
It may be better described as a social space than as a blog. Every user has their own room, and we provide an open space like a community. People can decorate their own rooms and share information with each other. Most importantly, it is not client-based but can be accessed from your web browser, which was very hard at that time. You need to consider resource usage very carefully.
Trailer
Functions
• Friend list and live chat
• HUD
• Keyboard piano
• Decorate your room
• Albums
Functions
• Background music of your room
• Diary
• Visitor
We tried to turn the “my page” in Facebook into “my room” in a 3D world. A good idea but impractical, like what Kaifu Li did (3D browser).
Our blog: http://blogs.openwonderland.org/2010/07/23/xland-3d-blog/
AlienSandal
• A real start-up project: a SoLoMo-based social network built in cooperation with students at UC Berkeley.
• AlienSandal is a social platform based on Google Maps, which enables users to trace and customize their Life Track and connect with people who have similar tracks.
• Trailer http://www.youtube.com/watch?v=ACYFEsodwrY
• Different types of check-in
• Free drawing on the map
Difficulties
• What is the most important thing in an SNS?
• Why do you want to use an SNS application?
• What can an SNS do?
• Answer from VC: “friends, money and …”
• How to make your SNS popular?
RECOMMENDATION
Clustering
Why clustering is useful
• Grouping users in social networks is an important process that improves matching and recommendation activities in social networks. The data mining methods of clustering can be used in grouping the users in social networks [1].
[1] S. Alsaleh, R. Nayak, Y. Xu, “Grouping People in Social Networks Using a Weighted Multi-Constraints Clustering Method,” in Proceedings of WCCI 2012, June 10-15, Brisbane, Australia.
[2] G. Gan, C. Ma, J. Wu (2007). Data Clustering: Theory, Algorithms, and Applications. ASA-SIAM Series on Statistics and Applied Probability, 20, 219-230.
[3] R. Nayak (2011). Utilizing past relations and user similarities in a social matching system. Advances in Knowledge Discovery and Data Mining, 99-110.
First Question
• How many clusters?
K-means, Fuzzy C-means, K-medoids …
Rule of thumb [1]
But this is not reliable.
[1] Kanti Mardia et al. (1979). Multivariate Analysis. Academic Press.
Cluster Estimation 1
• Stephen L. Chiu, “Fuzzy Model Identification Based on Cluster Estimation,” Journal of Intelligent and Fuzzy Systems, Vol. 2, pp. 267-278, 1994.
• Suppose we have a collection of n data points {$x_1, x_2, \ldots, x_n$} (in our case, the rankings). For each data point, we can assign a potential value:

$$P_i = \sum_{j=1}^{n} e^{-\alpha \|x_i - x_j\|^2}, \qquad \alpha = \frac{4}{r_a^2}$$

$r_a$ is a positive constant. We have $r_a = 0.5$ here (suggested by paper).
$\|x_i - x_j\|$ is the distance function between data points $x_i$ and $x_j$.
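As a rough illustration of the potential formula above, the following Python sketch computes $P_i$ from a precomputed distance matrix. The function name `potentials` and the nested-list representation are assumptions made for this example; the matrix `D` is copied from the Kendall tau distance table given in Running Example 2.

```python
import math

def potentials(dist, r_a=0.5):
    """Potential of each point: P_i = sum_j exp(-alpha * d_ij^2), alpha = 4 / r_a^2."""
    alpha = 4.0 / (r_a ** 2)
    return [sum(math.exp(-alpha * d * d) for d in row) for row in dist]

# Distance matrix for R1..R4 as tabulated in Running Example 2.
D = [
    [0.0,      0.095238, 0.904762, 0.904762],
    [0.095238, 0.0,      1.0,      0.809524],
    [0.904762, 1.0,      0.0,      0.190476],
    [0.904762, 0.809524, 0.190476, 0.0],
]

for r_a in (0.5, 0.7):
    P = potentials(D, r_a)
    print(r_a, [round(p, 4) for p in P])
```

With this matrix, R2 gets the highest potential for both settings. In this sketch the potentials listed in Running Example 1 (1.9311, 1.9336, 1.7451, 1.7496) are matched to about three decimals by $r_a = 0.7$, while $r_a = 0.5$ gives lower values (about 1.86 for R1 and R2, about 1.56 for R3 and R4), so the value of $r_a$ behind those numbers may be worth double-checking.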
Kendall tau Distance
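• Kendall tau distance
Example: two sample rankings
R1: 1,2,3,4,5
R2: 3,4,1,2,5
Comparing every pair of positions (* marks a disorder):

Positions   R1    R2    Count
(1,2)       1<2   3<4
(1,3)       1<3   3>1   *
(1,4)       1<4   3>2   *
(1,5)       1<5   3<5
(2,3)       2<3   4>1   *
(2,4)       2<4   4>2   *
(2,5)       2<5   4<5
(3,4)       3<4   1<2
(3,5)       3<5   1<5
(4,5)       4<5   2<5

There are 4 disorders between R1 and R2, so the distance is $\frac{4}{\frac{1}{2} \cdot 5 \cdot (5-1)} = 0.4$.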
Euclidean Distance
Two sample rankings
R1: 1,2,3,4,5
R2: 3,4,1,2,5
• $D_e = \sqrt{(1-3)^2 + (2-4)^2 + (3-1)^2 + (4-2)^2 + (5-5)^2} = 4$
Hamming Distance
Two sample rankings
R1: 1,2,3,4,5
R2: 3,4,1,2,5
• $D_H = 1+1+1+1+0 = 4$
1 and 3 are different
2 and 4 are different
3 and 1 are different
4 and 2 are different
5 and 5 are the same
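The three distances above can be sketched in a few lines of Python. The function names are chosen for this example only, and the Kendall tau distance is normalized by the number of pairs, n(n-1)/2, which is an assumption consistent with the fractional distances used in the running examples.

```python
import math
from itertools import combinations

def kendall_tau_distance(r1, r2, normalize=True):
    """Count item pairs that appear in a different order in r1 and r2."""
    pos1 = {item: i for i, item in enumerate(r1)}
    pos2 = {item: i for i, item in enumerate(r2)}
    disorder = sum(
        1
        for a, b in combinations(r1, 2)
        if (pos1[a] - pos1[b]) * (pos2[a] - pos2[b]) < 0
    )
    if not normalize:
        return disorder
    n = len(r1)
    return disorder / (n * (n - 1) / 2)

def euclidean_distance(r1, r2):
    """Square root of the summed squared position-wise differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(r1, r2)))

def hamming_distance(r1, r2):
    """Number of positions at which the two rankings differ."""
    return sum(1 for a, b in zip(r1, r2) if a != b)

R1 = [1, 2, 3, 4, 5]
R2 = [3, 4, 1, 2, 5]
print(kendall_tau_distance(R1, R2, normalize=False))  # 4
print(kendall_tau_distance(R1, R2))                   # 0.4
print(euclidean_distance(R1, R2))                     # 4.0
print(hamming_distance(R1, R2))                       # 4
```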
Cluster Estimation 1
• After we have a potential value for each data point, we select the data point with the highest potential as the first cluster center. Let $x_1^*$ be the data point of the first cluster center and $P_1^*$ be its potential value.
Then we revise the potential of each data point $x_i$ by

$$P_i \Leftarrow P_i - P_1^* e^{-\beta \|x_i - x_1^*\|^2}$$

in which we have $\beta = \frac{4}{r_b^2}$ and $r_b = 1.5\, r_a$.
Cluster Estimation 1
• Next we select the data point with the highest remaining potential as the second cluster center. We further reduce the potential of each data point according to its distance to the second cluster center. In general, after the $k$th cluster center has been obtained, we revise the potential of each data point by

$$P_i \Leftarrow P_i - P_k^* e^{-\beta \|x_i - x_k^*\|^2}$$

where $x_k^*$ is the location of the $k$th cluster center and $P_k^*$ is its potential value.
Cluster Estimation 1
• Every time we get $P_k^*$, we need to decide whether to select the corresponding point as a new center. We have the following process:

if $P_k^* > \bar{\varepsilon} P_1^*$
    Accept $x_k^*$ as a cluster center and continue.
else if $P_k^* < \underline{\varepsilon} P_1^*$
    Reject $x_k^*$ and end the selecting process.
else
    Let $d_{\min}$ = [shortest of the distances between $x_k^*$ and all previously found cluster centers]
    if $\frac{d_{\min}}{r_a} + \frac{P_k^*}{P_1^*} \geq 1$
        Accept $x_k^*$ as a cluster center and continue the whole process.
    else
        Reject $x_k^*$ and set the potential at $x_k^*$ to 0.
        Select the data point with the next highest potential as the new $x_k^*$ and re-test.
    end if
end if

In which we have $\bar{\varepsilon} = 0.5$ and $\underline{\varepsilon} = 0.15$ (suggested in paper).
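The whole selection procedure (initial potentials, revision after each accepted center, and the accept/reject test) can be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's implementation: the function name `estimate_centers`, the parameter names `eps_upper` and `eps_lower`, and the use of a precomputed distance matrix are choices made for this example. The matrix `D` is the Kendall tau table from Running Example 2.

```python
import math

def estimate_centers(dist, r_a=0.5, eps_upper=0.5, eps_lower=0.15):
    """Return the indices of the cluster centers selected from a distance matrix."""
    n = len(dist)
    alpha = 4.0 / r_a ** 2
    r_b = 1.5 * r_a
    beta = 4.0 / r_b ** 2

    # Initial potential of every data point.
    P = [sum(math.exp(-alpha * d * d) for d in row) for row in dist]

    # The point with the highest potential is the first center.
    first = max(range(n), key=lambda i: P[i])
    P_first = P[first]
    centers = [first]
    k, P_k = first, P_first

    while True:
        # Subtract the influence of the most recently accepted center.
        P = [P[i] - P_k * math.exp(-beta * dist[i][k] ** 2) for i in range(n)]

        # Test candidates until one is accepted or the process ends.
        while True:
            k = max(range(n), key=lambda i: P[i])
            P_k = P[k]
            if P_k > eps_upper * P_first:
                break                      # accept
            if P_k < eps_lower * P_first:
                return centers             # reject and end
            d_min = min(dist[k][c] for c in centers)
            if d_min / r_a + P_k / P_first >= 1:
                break                      # accept
            P[k] = 0.0                     # reject, re-test with next highest
        centers.append(k)

# Distance matrix for R1..R4 as tabulated in Running Example 2.
D = [
    [0.0,      0.095238, 0.904762, 0.904762],
    [0.095238, 0.0,      1.0,      0.809524],
    [0.904762, 1.0,      0.0,      0.190476],
    [0.904762, 0.809524, 0.190476, 0.0],
]
print(estimate_centers(D))  # [1, 2], i.e. R2 and R3
```

Passing distances rather than raw data points keeps the sketch independent of the distance function, so Kendall tau, Euclidean, or Hamming distance can be plugged in without changing the selection logic.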
Running Example 1
• Suppose we have the following set of rankings (which may represent the different pages you viewed: 1 is page 1, 2 is page 2):
R1 = {1,2,3,4,5,6,7}, R2 = {1,2,4,3,5,7,6}, R3 = {7,6,4,5,3,1,2}, R4 = {7,6,5,4,1,3,2}.
First we assign a potential value to each of them by

$$P_i = \sum_{j=1}^{n} e^{-\alpha \|x_i - x_j\|^2}$$

We have $P$ for each of them:
$P_1$ = 1.9311
$P_2$ = 1.9336
$P_3$ = 1.7451
$P_4$ = 1.7496
Running Example 1
• We choose the highest value, $P_2$ = 1.9336; that is, we choose R2 as our first center and $P_1^*$ = 1.9336.
• After this, we revise the potential value for each ranking by

$$P_i \Leftarrow P_i - P_1^* e^{-\beta \|x_i - x_1^*\|^2}$$

So we get:
$P_1$ = 0.06006
$P_2$ = 0.0
$P_3$ = 1.69382
$P_4$ = 1.57027
We choose $P_3$ and continue the process.
Running Example 1
• We have $\bar{\varepsilon} P_1^* = 0.966833$ and $\underline{\varepsilon} P_1^* = 0.290049$. $P_3$ = 1.69382 is greater than $\bar{\varepsilon} P_1^* = 0.966833$, so refer to:

if $P_k^* > \bar{\varepsilon} P_1^*$
    Accept $x_k^*$ as a cluster center and continue.

So R3 will be our second center, with $P_2^*$ = 1.69382.
Then we revise the potential values again by

$$P_i \Leftarrow P_i - P_2^* e^{-\beta \|x_i - x_2^*\|^2}$$

Then we have:
$P_1$ = -0.02683
$P_2$ = -0.04499
$P_3$ = 0.0
$P_4$ = 0.085356
So $P_4$ will be our next choice.
Running Example 1
• We have $\bar{\varepsilon} P_1^* = 0.966833$ and $\underline{\varepsilon} P_1^* = 0.290049$. $P_4$ = 0.085356 is less than $\underline{\varepsilon} P_1^* = 0.290049$, so refer to:

if $P_k^* < \underline{\varepsilon} P_1^*$
    Reject $x_k^*$ and end the selecting process.

So we stop the whole process.
• Out of 4 data points, we have estimated 2 clusters with 2 centers, R2 and R3. This result makes sense.
Cluster Estimation 2: Max-min Distance
• ZHOU Shi-bing, XU Zhen-yuan, TANG Xu-qing, “New method for determining optimal number of clusters in K-means clustering algorithm,” Computer Engineering and Applications, Vol. 46, pp. 27-31, 2010.
• Beliakov, Gleb and King, Matthew 2006, Density based fuzzy c-means clustering of non-convex patterns, European Journal of Operational Research, vol. 173, no. 3, pp. 717-728.
• Suppose we have a collection of n data points {$x_1, x_2, \ldots, x_n$}; in our case, they are the ranks. For each pair of data points, we calculate the distance. (We use the Kendall tau distance here.)
Step 1 (in paper): Randomly choose one data point as the first center $c_1$.
Step 2 (in paper): Choose the data point which has the largest distance from the first center $c_1$ as the second center $c_2$.
Cluster Estimation 2: Max-min Distance
• Step 3: Get the distance between the rest of the data and all the centers:

$$d_{ij} = \|x_i - c_j\| \quad (j \in \text{centers})$$

then we get

$$d_i = \min(d_{i1}, d_{i2}, \ldots, d_{ik}) \quad (i \in \text{data})$$

If $D = \max_i\{d_i\} > \theta \, \|c_1 - c_2\|$, we choose the corresponding $x_i$ as our next center. We choose $\theta = 0.6$.
Repeat this process until we cannot find any more centers.
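Steps 1 to 3 can be sketched as below. This is an illustrative sketch rather than the paper's code: the function name `max_min_centers` is made up for this example, the first center is passed in as an index (`first`) instead of being drawn at random so that the result is reproducible, and the matrix `D` is the Kendall tau table from Running Example 2.

```python
def max_min_centers(dist, first=0, theta=0.6):
    """Select cluster centers with the max-min distance rule on a distance matrix."""
    n = len(dist)
    centers = [first]

    # Step 2: the point farthest from the first center is the second center.
    second = max(range(n), key=lambda i: dist[i][first])
    if dist[second][first] == 0:
        return centers  # all points coincide with the first center
    centers.append(second)
    base = dist[first][second]  # ||c1 - c2||

    # Step 3: keep adding the point whose nearest center is farthest away,
    # as long as that distance exceeds theta * ||c1 - c2||.
    while len(centers) < n:
        rest = [i for i in range(n) if i not in centers]
        d = {i: min(dist[i][c] for c in centers) for i in rest}
        candidate = max(rest, key=lambda i: d[i])
        if d[candidate] > theta * base:
            centers.append(candidate)
        else:
            break
    return centers

# Distance matrix for R1..R4 as tabulated in Running Example 2.
D = [
    [0.0,      0.095238, 0.904762, 0.904762],
    [0.095238, 0.0,      1.0,      0.809524],
    [0.904762, 1.0,      0.0,      0.190476],
    [0.904762, 0.809524, 0.190476, 0.0],
]
print(max_min_centers(D, first=1))  # [1, 2], i.e. R2 and R3
```

Starting from a different first center can change the result (for example, `first=0` yields R1 and R3 here), which illustrates the sensitivity to the initial center.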
Running Example 2
• Suppose we have the following set of rankings.
R1 = {1,2,3,4,5,6,7}, R2 = {1,2,4,3,5,7,6}, R3 = {7,6,4,5,3,1,2}, R4 = {7,6,5,4,1,3,2}.
• The Kendall tau distances between them are:

      R1        R2        R3        R4
R1    0         0.095238  0.904762  0.904762
R2    0.095238  0         1         0.809524
R3    0.904762  1         0         0.190476
R4    0.904762  0.809524  0.190476  0
Running Example 2
• The largest distance is between R2 and R3, which is 1.
So we choose these two as the two centers, $c_1$ = R2 and $c_2$ = R3. Then
$d_1 = \min(d_{11}, d_{12}) = 0.095238$
$d_4 = \min(d_{41}, d_{42}) = 0.190476$
Since $D = 0.190476 < 0.6 \, \|c_1 - c_2\| = 0.6$, we stop the process and choose R2 and R3 as the centers.
Pros and Cons
• Estimation 1: the parameters $r_a$, $\bar{\varepsilon}$, $\underline{\varepsilon}$ are application sensitive. For now, there is no adaptive way to adjust such parameters.
• Estimation 2: the problem lies in the initial center selection. The result will be affected by noisy data points.
• Question