we.b : The web of short URLs
Demetris Antoniades, Iasonas Polakis, Georgios Kontaxis, Elias Athanasopoulos,
Sotiris Ioannidis, Evangelos P. Markatos, Thomas Karagiannis
FORTH-ICS, FORTH-ICS, FORTH-ICS, FORTH-ICS, FORTH-ICS, FORTH-ICS,
Microsoft Research
WWW 2011
March 30 2011
Presented by Somin Kim
Outline
Introduction
URL Shortening Services
Data Collection
The Web of Short URLs
Evolution and Lifetime
Publishers
Short URLs and Web Performance
Conclusion
2/36
Introduction
URL shortening services make URLs easier to share by providing a short
equivalent
Usage of short URLs has increased significantly
– A result of their extensive use in Online Social Networks (OSNs)
Understanding the usage of short URLs is important
– To gain insight into the interests of OSN and instant messaging (IM) communities
– To assess the performance, scalability, and reliability of URL shortening services
– To define the proper architecture for URL shortening services
3/36
URL Shortening Services(1/3)
URL Shortening Services
A long URL (http://www.this.is.a.long.url.com/indeed.html) is submitted to a
URL shortening service (bit.ly)
The service publishes a short URL (http://bit.ly/dv82ka)
Accessing the short URL redirects to the original URL
Popularity of URL shortening services
– The rapid adoption of OSNs has led to an increased demand for short URLs
– Short URLs are also useful in traditional systems
such as IM, SMS, and e-mail
5/36
URL Shortening Services(2/3)
Some of these services provide statistics about the accesses of
these URLs
– The number of hits
– The referrer sites the hits came from
– The visitors' countries
– …
Users can create many short URLs for the same long URL
– If a user shortens a long URL that already has a short URL, the service
creates a different hash and gives it to that user
– For each unique long URL, bit.ly provides a unique global hash with an
information page
– Overall statistics are still kept on the global hash's information page
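The per-user and global hashes described above can be sketched as follows. This is a minimal illustration, not bit.ly's implementation: the class name `Shortener`, the base-62 alphabet, the 6-character hash length, and the use of SHA-256 are all assumptions made for the example, and hash collisions are ignored.

```python
import hashlib
from collections import defaultdict

# Assumed alphabet and hash length; the slides do not specify bit.ly's scheme.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


def short_hash(text, length=6):
    """Map text to a base-62 string of the given length (illustrative only)."""
    n = int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest(), "big")
    chars = []
    for _ in range(length):
        n, r = divmod(n, len(ALPHABET))
        chars.append(ALPHABET[r])
    return "".join(chars)


class Shortener:
    """Hypothetical service: one hash per (user, long URL), one global hash per long URL."""

    def __init__(self):
        self.long_of = {}             # any hash -> long URL
        self.global_of = {}           # any hash -> global hash of its long URL
        self.hits = defaultdict(int)  # global hash -> total hits

    def shorten(self, long_url, user):
        global_hash = short_hash(long_url)
        user_hash = short_hash(user + "|" + long_url)
        for key in (global_hash, user_hash):
            self.long_of[key] = long_url
            self.global_of[key] = global_hash
        return user_hash

    def access(self, any_hash):
        # Hits on every per-user hash are aggregated under the global hash.
        self.hits[self.global_of[any_hash]] += 1
        return self.long_of[any_hash]

    def global_hits(self, long_url):
        return self.hits[short_hash(long_url)]
```

Two users shortening the same long URL receive different hashes, while `global_hits` reports the combined number of accesses.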
6/36
URL Shortening Services(3/3)
Global information
(Presenter note: consider showing a screenshot of the statistics page)
7/36
Outline
Introduction
URL Shortening Services
Data Collection
– Collection methodology
– Collected data
The Web of Short URLs
Evolution and Lifetime
Publishers
Short URLs and Web Performance
Conclusion
8/36
Data Collection(1/3)
Collection Methodology
Twitter crawling
– Twitter crawling returns links “gossiped” in a social network
– We collected tweets that contain HTTP URLs
– Only 13% of the HTTP URLs were not shortened by any URL shortening
service
– 50% of the HTTP URLs from Twitter were bit.ly URLs
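A breakdown like the one above could be computed from crawled tweets roughly as follows. This is a sketch under stated assumptions: the `SHORTENERS` set is a small hypothetical list rather than the one used in the study, and the regular expression is a simplification of real URL extraction.

```python
import re
from collections import Counter
from urllib.parse import urlparse

URL_RE = re.compile(r"https?://\S+")

# Assumed, incomplete list of shortening-service hosts.
SHORTENERS = {"bit.ly", "ow.ly", "fb.me", "tinyurl.com", "goo.gl"}


def service_shares(tweets):
    """Return the fraction of HTTP URLs per shortening service.

    URLs whose host is not a known shortener are counted as "not shortened".
    """
    counts = Counter()
    for tweet in tweets:
        for url in URL_RE.findall(tweet):
            host = (urlparse(url).hostname or "").lower()
            counts[host if host in SHORTENERS else "not shortened"] += 1
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()} if total else {}
```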
9/36
Data Collection(2/3)
Collection Methodology
Brute-Force
– Brute-force search yields hashes irrespective of the medium they were
published in and their recency
– We gathered the metadata provided by the shortening service
– We monitored the evolution of the keyspace in the ow.ly system
Ow.ly serially iterates over the available short URL space
About 70,000 new short URLs are created each day
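Because ow.ly assigns hashes serially, the keyspace can be enumerated in order, and the distance between the newest hashes seen on two days gives the number of URLs created in between. The sketch below assumes a base-62 alphabet in digit, lowercase, uppercase order and fixed-length hashes; the slides do not state ow.ly's actual alphabet or ordering, and the function names are hypothetical.

```python
from itertools import islice, product

# Assumed alphabet and ordering.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


def enumerate_hashes(length):
    """Yield every hash of the given length in serial (lexicographic) order."""
    for chars in product(ALPHABET, repeat=length):
        yield "".join(chars)


def hash_to_index(short_hash):
    """Position of a hash in the serial order of hashes with the same length."""
    index = 0
    for char in short_hash:
        index = index * len(ALPHABET) + ALPHABET.index(char)
    return index


def new_urls_between(older_hash, newer_hash):
    """Number of hashes issued between two observations of the newest hash."""
    return hash_to_index(newer_hash) - hash_to_index(older_hash)


# First three candidates a brute-force crawler would probe for 2-character hashes.
first_candidates = list(islice(enumerate_hashes(2), 3))
```

Sampling the newest hash once per day and calling `new_urls_between` on consecutive samples gives a daily creation count of the kind reported above.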
10/36
Data Collection(3/3)
Collected Data
For the twitter and bitly traces, all accompanying metadata for each
short URL was also collected
11/36
Outline
Introduction
URL Shortening Services
Data Collection
The Web of Short URLs
– Where do short URLs come from?
– Where do short URLs point to?
– Location
– Popularity
Evolution and Lifetime
Publishers
Short URLs and Web Performance
Conclusion
12/36
The Web of Short URLs(1/7)
Where do short URLs come from?
Short URLs do not frequently appear in traditional web pages
– The vast majority of users arrive at bit.ly from non-web applications
– Users who arrive through web applications mostly come from social
networking channels (Twitter, Facebook)
13/36
The Web of Short URLs(2/7)
Where do short URLs point to?
Most popular types of short URL contents
– News and informative content come first
– 4% of the most accessed URLs in the owly trace pointed to other shortening services
Spammers pack short URLs inside other short URLs to avoid exposing
the long URL
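Nested short URLs can be detected by following the redirection chain until it reaches a URL that no longer redirects. The sketch below works on an in-memory `redirects` table, which is a hypothetical stand-in for issuing real HTTP requests; the function names and the hop limit are assumptions for illustration.

```python
def resolve_chain(url, redirects, max_hops=10):
    """Follow redirects from url and return every URL visited, in order.

    redirects maps a short URL to its target; max_hops guards against loops.
    """
    chain = [url]
    while url in redirects and len(chain) <= max_hops:
        url = redirects[url]
        chain.append(url)
    return chain


def is_nested(chain):
    """True when more than one redirection was needed to reach the final URL."""
    return len(chain) > 2
```

For example, a bit.ly hash that points to an ow.ly hash, which in turn points to the final page, produces a chain of three URLs and is flagged as nested.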
14/36
The Web of Short URLs(3/7)
Location
The penetration of short URL use is significantly different from
that of the Internet/web
– Most of these accesses come from the United States, Japan, and Great
Britain
– No accesses from China or India were seen
China and India rank among the top five countries by number of
Internet users
15/36
The Web of Short URLs(4/7)
Popularity
URL popularity
– Large systems that provide content to users typically exhibit power-law behavior
A small fraction of the content is very popular
Most of it is considered uninteresting
16/36
The Web of Short URLs(5/7)
Popularity
URL popularity (cont.)
– We split short URLs into active and inactive
Inactive: no hit was observed during the last 7 days of the trace
– 10% of the short URLs are responsible for about 90% of the total hits seen
in the trace
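Both measurements above are simple to express in code. The following sketch assumes the trace has already been reduced to a last-hit date and a total hit count per short URL; the function names and input shapes are hypothetical.

```python
from datetime import date, timedelta


def split_active(last_hit, trace_end, window_days=7):
    """Split URLs into (active, inactive) sets.

    A URL is inactive when it received no hit during the final
    window_days days of the trace, ending on trace_end inclusive.
    """
    cutoff = trace_end - timedelta(days=window_days)
    active = {url for url, day in last_hit.items() if day > cutoff}
    inactive = set(last_hit) - active
    return active, inactive


def top_share(hit_counts, fraction=0.10):
    """Share of all hits received by the most popular fraction of URLs."""
    ranked = sorted(hit_counts, reverse=True)
    k = max(1, round(len(ranked) * fraction))
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0
```

A `top_share` value near 0.9 for `fraction=0.10` corresponds to the skew reported on this slide.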
17/36
The Web of Short URLs(6/7)
Popularity
Content popularity
– Besides familiar websites, less well-known websites were also observed
Pollpigeon.com (short opinion polls), Mashable.com (social media news),
Twibbon.com (Twitter campaigns)
– Short polls are popular content
They are very common on social networking sites
18/36
The Web of Short URLs(7/7)
Popularity
Content popularity (cont.)
– Do popular web sites significantly change over time?
About 6 sites appear in the top-100 on every single day of April 2010
– 22 sites do so for March 2010
About 400 sites enjoy short bursts of popularity
19/36
Outline
Introduction
URL Shortening Services
Data Collection
The Web of Short URLs
Evolution and Lifetime
– Life span of short URLs
– Temporal evolution
Publishers
Short URLs and Web Performance
Conclusion
20/36
Evolution and Lifetime(1/5)
Life span of short URLs
The lifetime of a URL is the number of days between its first and last
observed hit
Lifetime CDF of the traces (twitter2, bitly)
– 50% of the short URLs are not ephemeral
– Inactive URLs have a shorter lifespan
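The lifetime definition and its CDF can be sketched as follows, assuming each short URL comes with the list of dates on which it was hit. The function names are hypothetical.

```python
from datetime import date


def lifetime_days(hit_dates):
    """Days between the first and last observed hit of one short URL."""
    return (max(hit_dates) - min(hit_dates)).days


def empirical_cdf(values):
    """Map each distinct value v to the fraction of values that are <= v."""
    ordered = sorted(values)
    n = len(ordered)
    cdf = {}
    for rank, value in enumerate(ordered, start=1):
        cdf[value] = rank / n  # later duplicates overwrite earlier ranks
    return cdf
```

Reading the CDF at a small lifetime, such as one day, gives the fraction of URLs that could be called ephemeral.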
21/36
Evolution and Lifetime(2/5)
Temporal evolution
The daily change in the number of hits for each short URL
– The number of accesses for a typical short URL varies by as much as 40%
from one day to the next
– As less popular URLs are included, larger daily changes are observed
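One way to quantify the daily change is the relative difference between consecutive daily hit counts. This definition is an assumption; the slides do not spell out the exact formula used in the study.

```python
def daily_relative_changes(daily_hits):
    """Relative change |h[t] - h[t-1]| / h[t-1] for consecutive days.

    Days that follow a day with zero hits are skipped, since the ratio is undefined.
    """
    changes = []
    for previous, current in zip(daily_hits, daily_hits[1:]):
        if previous > 0:
            changes.append(abs(current - previous) / previous)
    return changes
```

A URL with 100 hits on one day and 60 on the next shows a 40% change, the magnitude mentioned above.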
22/36
Evolution and Lifetime(3/5)
Temporal evolution
The evolution of hit rate across the lifetime of the short URLs
– Inactive URLs
On average, 60% of hits are observed during their first day
After that, the hit rate drops sharply
– Active URLs
The first-day effect is also evident
A significant hit rate is also observed for recent days
23/36
Evolution and Lifetime(4/5)
Temporal evolution
Daily hit rate versus lifetime for inactive short URLs
– The daily hit rate shows no obvious dependence on a short URL's
lifetime
24/36
Evolution and Lifetime(5/5)
Temporal evolution
Total number of hits as a function of the short URL’s lifetime
– Active short URLs (bottom) appear to exhibit a linear relationship on a log-log
scale
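A linear relationship on a log-log scale means total hits grow roughly as a power of lifetime. A least-squares fit on the logarithms recovers the exponent. The sketch below uses plain Python and requires strictly positive inputs and at least two distinct lifetimes; the function name is hypothetical and no fitted values from the study are implied.

```python
import math


def loglog_fit(lifetimes, total_hits):
    """Least-squares line through (log10 lifetime, log10 hits).

    Returns (slope, intercept); hits are then approximately
    10**intercept * lifetime**slope. All inputs must be > 0.
    """
    xs = [math.log10(value) for value in lifetimes]
    ys = [math.log10(value) for value in total_hits]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```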
25/36
Publishers(1/4)
Twitter effect
– Short URLs referred from Twitter enjoy significantly higher popularity
27/36
Publishers(2/4)
CCDF of posted short URLs per Twitter user
– Most users published a handful of tweets with short URLs
– The majority of tweets with short URLs are original Twitter messages (not
retweets)
28/36
Publishers(3/4)
User’s daily publish rate of short URLs
– Median rate is 1 short URL per day
– 98% of users publish no more than 5 short URLs per day
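A per-user daily publish rate can be derived from (user, day) pairs, one pair per posted short URL. The sketch assumes the rate is averaged over the days on which the user posted at least one short URL; the slides do not state how days without posts were handled.

```python
from collections import defaultdict
from statistics import median


def daily_publish_rates(posts):
    """posts: iterable of (user, day) pairs, one per published short URL.

    Returns user -> average number of short URLs per day with at least one post.
    """
    per_user_day = defaultdict(lambda: defaultdict(int))
    for user, day in posts:
        per_user_day[user][day] += 1
    return {
        user: sum(days.values()) / len(days)
        for user, days in per_user_day.items()
    }


def median_rate(posts):
    """Median of the per-user daily publish rates."""
    return median(daily_publish_rates(posts).values())
```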
29/36
Publishers(4/4)
Correlation between a user’s publish rate and total number of hits
– As the number of URLs published by a poster increases, the expected hit
rate drops
Spamming-type behavior
Only a few short URLs from each publisher enjoy high hit rates
30/36
Outline
Introduction
URL Shortening Services
Data Collection
The Web of Short URLs
Evolution and Lifetime
Publishers
Short URLs and Web Performance
– Space reduction
– Latency
Conclusion
31/36
Short URLs and Web Performance(1/3)
Space reduction
Space gain for the short URL
– URL shortening services are quite effective at reducing URL size
For roughly 50% of the URLs, a 91% reduction in size is observed
– In the twitter trace, only 31% of the long versions of the short URLs remained
under the character limit
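The space gain is the fraction of characters saved by the short URL. A minimal sketch follows; the 140-character default reflects Twitter's message limit at the time and is passed as a parameter, since the slide does not state the number.

```python
def space_gain(long_url, short_url):
    """Fraction of characters saved by replacing long_url with short_url."""
    return (len(long_url) - len(short_url)) / len(long_url)


def fits_limit(url, limit=140):
    """True when the URL alone fits within the character limit (assumed 140)."""
    return len(url) <= limit
```

A 200-character long URL replaced by the 20-character http://bit.ly/dv82ka yields a gain of 0.9, close to the 91% figure above.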
32/36
Short URLs and Web Performance(2/3)
Latency
URL shortening services impose an additional overhead in the
user’s web request
We periodically accessed the 10 most popular short URLs
– Fb.me and ow.ly exhibit a bimodal behavior
– Bit.ly appears to be the slowest but shows more consistent behavior
33/36
Short URLs and Web Performance(3/3)
Latency
The redirection overhead of bit.ly
– For more than 50% of the accesses, the URL shortening redirection imposes a
relative overhead of 54%
– This additional delay turns out to be comparable to the final web page
access time in a significant fraction of cases
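Relative overhead can be computed from paired timing samples. The sketch assumes overhead is defined as redirection time divided by the access time of the final page; this definition and the function names are assumptions, and the timings would come from the periodic measurements described earlier.

```python
import math
from statistics import median


def relative_overheads(samples):
    """samples: (redirect_seconds, page_seconds) pairs.

    Returns redirect time as a fraction of final page access time,
    skipping samples with a non-positive page time.
    """
    return [redirect / page for redirect, page in samples if page > 0]


def median_relative_overhead(samples):
    """Overhead that half of the accesses stay at or below."""
    return median(relative_overheads(samples))
```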
34/36
Conclusion
We have presented a large-scale study of URL shortening services
– Exploring traces from services themselves and Twitter
Summary
– Short URLs appear mostly in ephemeral media, with profound effects on
their popularity, lifetime, and access patterns
– A small number of URLs have a very large number of accesses
– A large percentage of short URLs are not ephemeral
– The most popular websites change slowly over time
– The web sites reached through short URLs differ from the sites that are
popular among the broader web community
– URL shortening services are extremely effective at saving space but
add overhead to accessing the web page
36/36