Transcript Short URL

we.b : The web of short URLs
Demetris Antoniades, Iasonas Polakis, Georgios Kontaxis, Elias Athanasopoulos,
Sotiris Ioannidis, Evangelos P. Markatos, Thomas Karagiannis
FORTH-ICS, FORTH-ICS, FORTH-ICS, FORTH-ICS, FORTH-ICS, FORTH-ICS,
Microsoft Research
WWW 2011
March 30 2011
Presented by Somin Kim
Outline
 Introduction
 URL Shortening Services
 Data Collection
 The Web of Short URLs
 Evolution and Lifetime
 Publishers
 Short URLs and Web Performance
 Conclusion
2/36
Introduction
 URL shortening services make URLs easier to share by providing a short
equivalent
 Short URLs have seen a significant increase in usage
– A result of their extensive use in Online Social Networks (OSNs)
 Understanding the usage of short URLs is important
– To provide insight into the interests of OSN and IM users
– To assess the performance, scalability, and reliability of URL shortening services
– To define the proper architecture for URL shortening services
3/36
URL Shortening Services(1/3)
 URL Shortening Services
[Figure: a long URL (http://www.this.is.a.long.url.com/indeed.html) is submitted to a URL shortening service (bit.ly), which publishes a short URL (http://bit.ly/dv82ka); accessing the short URL redirects to the original URL]
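The publish/access flow above can be illustrated with a minimal in-memory sketch. It is not how bit.ly is implemented: the counter-based base-62 hashes, the `Shortener` class, and the `sho.rt` domain are all assumptions made for illustration.

```python
import itertools
import string

ALPHABET = string.digits + string.ascii_letters  # assumed base-62 alphabet


class Shortener:
    """Toy in-memory shortener (hypothetical, for illustration only)."""

    def __init__(self, domain="sho.rt"):  # hypothetical domain
        self.domain = domain
        self._counter = itertools.count(1)
        self._table = {}  # hash -> long URL

    def _encode(self, n):
        # Convert a positive integer to a base-62 string.
        chars = []
        while n:
            n, rem = divmod(n, len(ALPHABET))
            chars.append(ALPHABET[rem])
        return "".join(reversed(chars))

    def publish(self, long_url):
        # Allocate the next hash and remember where it points.
        key = self._encode(next(self._counter))
        self._table[key] = long_url
        return f"http://{self.domain}/{key}"

    def access(self, short_url):
        # A real service would answer with an HTTP redirect;
        # here we simply return the stored long URL.
        key = short_url.rsplit("/", 1)[-1]
        return self._table[key]
```

Calling `publish` on a long URL returns a short one, and `access` on that short URL returns the original.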
 Popularity of URL shortening services
– The rapid adoption of OSNs has led to an increased demand for short URLs
– Short URLs are also useful in traditional systems such as IM, SMS, and e-mail
5/36
URL Shortening Services(2/3)
 Some of these services provide statistics about the accesses of
these URLs
– The number of hits
– The referrer sites the hits came from
– The visitors’ countries
– …
 Users can create many short URLs for the same long URL
– Each time a user shortens a long URL that already has a short URL, the
service creates a different hash and gives it to that user
– For each unique long URL, bit.ly also keeps one unique global hash with an
information page
– Overall statistics are kept on the global hash’s information page
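The relationship between per-user hashes and the global hash can be sketched as follows. This is only an illustration under assumptions: the `AggregatingShortener` class, its method names, and the `h1`, `h2`, ... hash format are hypothetical and do not reflect bit.ly's actual data model.

```python
import itertools
from collections import defaultdict


class AggregatingShortener:
    """Hypothetical model: many hashes per long URL, one global hash."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._target = {}              # any hash -> long URL
        self._global_of = {}           # long URL -> its global hash
        self._hits = defaultdict(int)  # hash -> hits on that hash

    def _new_hash(self):
        return f"h{next(self._ids)}"   # placeholder hash format

    def shorten(self, long_url):
        # The first request for a long URL also creates its global hash.
        if long_url not in self._global_of:
            g = self._new_hash()
            self._global_of[long_url] = g
            self._target[g] = long_url
        # Every request gets a fresh hash of its own.
        h = self._new_hash()
        self._target[h] = long_url
        return h

    def access(self, h):
        self._hits[h] += 1
        return self._target[h]

    def global_hits(self, long_url):
        # Overall statistics: hits summed over every hash of this long URL.
        return sum(n for h, n in self._hits.items()
                   if self._target[h] == long_url)
```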
6/36
URL Shortening Services(3/3)
Global information
(Presenter note: show a screenshot of the statistics page here?)
7/36
Outline
 Introduction
 URL Shortening Services
 Data Collection
– Collection methodology
– Collected data
 The Web of Short URLs
 Evolution and Lifetime
 Publishers
 Short URLs and Web Performance
 Conclusion
8/36
Data Collection(1/3)
Collection Methodology
 Twitter crawling
– Twitter crawling returns links “gossiped” in a social network
– We collected tweets that contain HTTP URLs
– Only 13% of the HTTP URLs were not shortened by any URL shortening
services
– 50% of the HTTP URLs from Twitter were bit.ly URLs
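A breakdown like the one above could be computed roughly as follows. This is a sketch, not the study's crawler: the `SHORTENERS` set and the URL regular expression are assumptions, and the tweets are expected as plain strings.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Assumed list of shortener domains; the list used in the study is not given here.
SHORTENERS = {"bit.ly", "ow.ly", "fb.me", "tinyurl.com", "goo.gl"}
URL_RE = re.compile(r"https?://\S+")


def shortener_share(tweets):
    """Fraction of HTTP URLs per shortening service, plus 'not shortened'."""
    counts = Counter()
    for text in tweets:
        for url in URL_RE.findall(text):
            host = (urlparse(url).hostname or "").lower()
            counts[host if host in SHORTENERS else "not shortened"] += 1
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}
```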
9/36
Data Collection(2/3)
Collection Methodology
 Brute-Force
– We can get hashes irrespective of their published medium and recency
– We gathered metadata provided by the shortening service
– We monitored the evolution of the keyspace in the ow.ly system
 ow.ly serially iterates over the available short URL space
 About 70,000 new short URLs are created each day
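When hashes are allocated serially, both enumerating the keyspace and estimating its daily growth reduce to simple arithmetic. The sketch below assumes a base-36 alphabet (digits and lowercase letters); the actual ow.ly alphabet is not stated here, and all function names are hypothetical.

```python
import itertools
import string

ALPHABET = string.digits + string.ascii_lowercase  # assumed base-36 alphabet


def to_hash(n):
    """Encode a non-negative counter value as a serial allocator might."""
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, rem = divmod(n, len(ALPHABET))
        out.append(ALPHABET[rem])
    return "".join(reversed(out))


def from_hash(h):
    """Decode a base-36 hash back to its counter value."""
    return int(h, 36)


def candidate_hashes(length):
    """Yield every hash of the given length, in alphabet order."""
    for combo in itertools.product(ALPHABET, repeat=length):
        yield "".join(combo)


def created_between(newest_yesterday, newest_today):
    """Estimate the URLs created in a day from the newest hash seen on each day."""
    return from_hash(newest_today) - from_hash(newest_yesterday)
```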
10/36
Data Collection(3/3)
Collected Data
 For the twitter and bitly traces, all accompanying metadata for each
short URL was also collected
11/36
Outline
 Introduction
 URL Shortening Services
 Data Collection
 The Web of Short URLs
– Where do short URLs come from?
– Where do short URLs point to?
– Location
– Popularity
 Evolution and Lifetime
 Publishers
 Short URLs and Web Performance
 Conclusion
12/36
The Web of Short URLs(1/7)
Where do short URLs come from?
 Short URLs do not frequently appear in traditional web pages
– The vast majority of users arrive at bit.ly from non-web applications
– Users who access through web applications mostly come from social
networking channels (Twitter, Facebook)
13/36
The Web of Short URLs(2/7)
Where do short URLs point to?
 Most popular types of short URL contents
– News and informative content come first
– 4% of the most accessed URLs in the owly trace pointed to other URL shortening services
 Spammers wrap short URLs inside other short URLs to avoid exposing
the long URL
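Nested short URLs can be unwrapped by following redirects until a non-shortener URL is reached. The sketch below replaces real HTTP redirects with a `lookup` dictionary so that it runs offline; the `SHORTENERS` set, `resolve_chain`, and the hop limit are assumptions.

```python
from urllib.parse import urlparse

SHORTENERS = {"bit.ly", "ow.ly"}  # assumed set of shortener domains


def resolve_chain(url, lookup, max_hops=10):
    """Follow short URLs until a non-shortener URL is reached.

    `lookup` stands in for the HTTP redirect a real service would issue.
    The hop limit guards against redirect loops.
    """
    chain = [url]
    while urlparse(url).hostname in SHORTENERS:
        if len(chain) > max_hops:
            raise RuntimeError("too many nested redirects")
        url = lookup[url]
        chain.append(url)
    return chain
```

The length of the returned chain minus one is the nesting depth, which could be used to flag suspicious links.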
14/36
The Web of Short URLs(3/7)
Location
 The geographic penetration of short URL use differs significantly from
that of the Internet/web
– Most accesses come from the United States, Japan, and Great Britain
– No accesses from China or India were observed
 Both China and India rank among the top-5 countries by number of
Internet users
15/36
The Web of Short URLs(4/7)
Popularity
 URL popularity
– Large systems that provide content to users typically exhibit power-law
behavior
 A small fraction of the content is very popular
 Most of it is considered uninteresting
16/36
The Web of Short URLs(5/7)
Popularity
 URL popularity (cont.)
– We split short URLs into active and inactive
 Inactive: no hit was observed during the last 7 days of the trace
– 10% of the short URLs are responsible for about 90% of the total hits seen
in the trace
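Both the active/inactive split and the concentration of hits are easy to compute from per-URL data. The sketch below assumes days are numbered as integers and that hit counts are available as a plain list; the function names are hypothetical.

```python
def is_inactive(last_hit_day, trace_end_day, window=7):
    """True when no hit falls within the last `window` days of the trace."""
    return trace_end_day - last_hit_day >= window


def top_share(hits, fraction=0.10):
    """Share of total hits received by the most popular `fraction` of URLs."""
    ranked = sorted(hits, reverse=True)
    k = max(1, round(len(ranked) * fraction))
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0
```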
17/36
The Web of Short URLs(6/7)
Popularity
 Content popularity
– Besides familiar websites, less well-known or less popular websites were also observed
 Pollpigeon.com (short opinion polls), Mashable.com (social media news),
Twibbon.com (Twitter campaigns)
– Short polls are popular content
 They are very common on social networking sites
18/36
The Web of Short URLs(7/7)
Popularity
 Content popularity (cont.)
– Do popular web sites significantly change over time?
 About 6 sites appear in the top-100 on every single day of April 2010
– 22 sites for March 2010
 About 400 sites enjoy short bursts of popularity
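Stable and bursty sites can be separated by counting on how many days each site enters the daily top-N. In this sketch each day's data is a dict mapping site to hits; the `max_days` burst threshold is an assumption, since the slides do not define a short burst.

```python
from collections import Counter


def top_sites(day_hits, n=100):
    """Return the n most-hit sites for one day (day_hits: site -> hits)."""
    return {site for site, _ in Counter(day_hits).most_common(n)}


def always_in_top(daily_hits, n=100):
    """Sites that are in the top n on every day of the period."""
    tops = [top_sites(d, n) for d in daily_hits]
    return set.intersection(*tops) if tops else set()


def bursty(daily_hits, n=100, max_days=2):
    """Sites that reach the top n on at most `max_days` days (assumed threshold)."""
    seen = Counter(site for d in daily_hits for site in top_sites(d, n))
    return {site for site, days in seen.items() if days <= max_days}
```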
19/36
Outline
 Introduction
 URL Shortening Services
 Data Collection
 The Web of Short URLs
 Evolution and Lifetime
– Life span of short URLs
– Temporal evolution
 Publishers
 Short URLs and Web Performance
 Conclusion
20/36
Evolution and Lifetime(1/5)
Life span of short URLs
 The lifetime of a short URL is the number of days between its first and
last observed hits
 Lifetime CDF of the traces (twitter2, bitly)
– 50% of the short URLs are not ephemeral
– Inactive URLs have a shorter lifespan
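The lifetime definition and its CDF can be written down directly. This is a sketch assuming the first and last hit of each short URL are available as `datetime.date` values; the function names are hypothetical.

```python
from datetime import date


def lifetime_days(first_hit, last_hit):
    """Number of days between the first and last observed hit."""
    return (last_hit - first_hit).days


def empirical_cdf(values):
    """Return sorted (value, fraction of samples <= value) pairs."""
    ordered = sorted(values)
    n = len(ordered)
    cdf = {}
    for i, v in enumerate(ordered, start=1):
        cdf[v] = i / n  # later duplicates overwrite, keeping the largest rank
    return sorted(cdf.items())
```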
21/36
Evolution and Lifetime(2/5)
Temporal evolution
 The daily change in the number of hits for each short URL
– The number of accesses for a typical short URL varies by as much as 40%
from one day to the next
– As less popular URLs are included, larger daily changes are observed
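One way to quantify the daily change is the relative difference between consecutive days. The exact definition used in the study is not given in the slides, so the formula below is an assumption.

```python
from statistics import median


def daily_changes(hits_per_day):
    """Relative change between consecutive days, skipping days with zero hits."""
    return [abs(today - prev) / prev
            for prev, today in zip(hits_per_day, hits_per_day[1:])
            if prev > 0]


def typical_daily_change(hits_per_day):
    """Median relative change; needs at least one usable pair of days."""
    return median(daily_changes(hits_per_day))
```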
22/36
Evolution and Lifetime(3/5)
Temporal evolution
 The evolution of hit rate across the lifetime of the short URLs
– Inactive URLs
 On average, 60% of hits are observed during their first day
 After that, the hit rate drops sharply
– Active URLs
 The first-day effect is also evident
 A significant hit rate is also observed for recent days
23/36
Evolution and Lifetime(4/5)
Temporal evolution
 Daily hit rate as a function of lifetime, for inactive short URLs
– The daily hit rate shows no obvious dependence on a short URL’s lifetime
24/36
Evolution and Lifetime(5/5)
Temporal evolution
 Total number of hits as a function of the short URL’s lifetime
– Active short URLs (bottom) appear to exhibit a linear relationship on a
log-log scale
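A straight line on a log-log plot means total hits grow roughly as a power of lifetime, and the exponent is the slope of that line. The sketch below estimates it with ordinary least squares on the logarithms; the fitting method is an assumption, not necessarily the one used in the study.

```python
import math


def loglog_fit(lifetimes, total_hits):
    """Least-squares slope and intercept of log10(hits) against log10(lifetime)."""
    # Points with a zero value cannot be placed on a log scale and are skipped.
    pts = [(math.log10(x), math.log10(y))
           for x, y in zip(lifetimes, total_hits) if x > 0 and y > 0]
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    slope = sxy / sxx
    return slope, my - slope * mx
```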
25/36
Publishers(1/4)
 Twitter effect
– Short URLs referred from Twitter enjoy significantly higher popularity
27/36
Publishers(2/4)
 CCDF of posted short URLs per Twitter user
– Most users published a handful of tweets with short URLs
– The majority of tweets with short URLs are original Twitter messages (not
retweets)
28/36
Publishers(3/4)
 User’s daily publish rate of short URLs
– Median rate is 1 short URL per day
– 98% of the users publish no more than 5 short URLs per day
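Per-user publish rates like these can be derived from a list of (user, day) records, one per published short URL. In this sketch the rate is averaged over the days on which a user posted at least once; that definition is an assumption.

```python
from collections import Counter
from statistics import median


def daily_publish_rates(posts):
    """posts: iterable of (user, day) pairs, one per published short URL.

    Returns user -> average short URLs per active day (assumed definition).
    """
    per_user_day = Counter(posts)
    totals, days = Counter(), Counter()
    for (user, _day), n in per_user_day.items():
        totals[user] += n
        days[user] += 1
    return {u: totals[u] / days[u] for u in totals}


def fraction_at_most(rates, limit=5):
    """Fraction of users whose daily rate does not exceed `limit`."""
    return sum(r <= limit for r in rates.values()) / len(rates)
```

The median rate is then `median(rates.values())`.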
29/36
Publishers(4/4)
 Correlation between a user’s publish rate and total number of hits
– As the number of URLs published by a poster increases, the expected hit
rate drops
 Spamming-type behavior
 Only a few short URLs from each publisher enjoy high hit rates
30/36
Outline
 Introduction
 URL Shortening Services
 Data Collection
 The Web of Short URLs
 Evolution and Lifetime
 Publishers
 Short URLs and Web Performance
– Space reduction
– Latency
 Conclusion
31/36
Short URLs and Web Performance(1/3)
Space reduction
 Space gain for the short URL
– URL shortening services are quite effective at reducing URL size
 For roughly 50% of the URLs, a 91% reduction in size is observed
– In the twitter trace, only 31% of the long versions of the short URLs remained
under the character limit
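The space gain is the fraction of characters saved by the short URL. The sketch below also checks whether a message would still fit a character limit with the long URL substituted; the 140-character default and that reading of the limit check are assumptions.

```python
def size_reduction(long_url, short_url):
    """Fraction of characters saved by using the short URL."""
    return 1 - len(short_url) / len(long_url)


def fits_limit(text_without_url, url, limit=140):  # 140 is an assumed limit
    """True when the text plus the URL stays within the character limit."""
    return len(text_without_url) + len(url) <= limit
```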
32/36
Short URLs and Web Performance(2/3)
Latency
 URL shortening services impose additional overhead on the user’s web
request
 We periodically accessed the 10 most popular short URLs
– fb.me and ow.ly exhibit bimodal behavior
– bit.ly appears to be the slowest but shows more consistent behavior
33/36
Short URLs and Web Performance(3/3)
Latency
 The redirection overhead of bit.ly
– For more than 50% of the accesses, the URL shortening redirection imposes a
relative overhead of 54%
– In a significant fraction of accesses, this additional delay is comparable
to the access time of the final web page
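Relative overhead can be read as the redirection time divided by the access time of the final page. The slides do not spell out the definition, so this reading, the function names, and the use of seconds are assumptions.

```python
from statistics import median


def relative_overhead(redirect_s, page_s):
    """Redirection time as a fraction of the final page's access time."""
    return redirect_s / page_s


def median_overhead(samples):
    """samples: iterable of (redirect seconds, page seconds) measurements."""
    return median(relative_overhead(r, p) for r, p in samples)
```

An overhead near 1.0 means the redirect takes about as long as loading the final page itself.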
34/36
Conclusion
 We have presented a large-scale study of URL shortening services
– Exploring traces from services themselves and Twitter
 Summary
– Short URLs appear mostly in ephemeral media, with profound effects on
their popularity, lifetime, and access patterns
– A small number of short URLs account for a very large number of accesses
– A large percentage of short URLs are not ephemeral
– The most popular web sites change slowly over time
– These web sites differ from the sites that are popular among the broader
web community
– URL shortening services are extremely effective at saving space but
increase the overhead of accessing the web page
36/36