Transcript Slide Deck

Analyzing Browse Patterns of Mobile Clients
Lili Qiu
Joint work with Atul Adya and Victor Bahl
{adya,bahl,liliq}@microsoft.com
Microsoft Research
ACM SIGCOMM Measurement Workshop
San Francisco, CA, November 2001
1
Outline
Overview
Related work
Analysis of a popular mobile Web site
    Document popularity analysis
    User behavior analysis
    System load analysis
    Content analysis
Summary and implications
2
Motivation
Phenomenal growth in the cellular industry and in handheld devices
Crucial to understand the performance of the wireless Web
Limited understanding of how wireless Web services are being used
3
Related Work
Workload of clients at wireline networks
    Server-based studies: [ABC+96], [AW96], [MS97], [AJ99], [PQ00]
    Proxy-based studies: [BCF+99], [DMF97], [GB97], [VDA+99], [WVS+99]
    Client-based studies: [CBC95] and [BBB+98]
Workload of wireless clients
    [KBZ+2000]: only 80K requests over seven months
4
Overview
A popular mobile Web site
    Content: news, weather, stock quotes, email, yellow pages, travel reservations, entertainment, etc.
    Period studied: August 15, 2000 – August 26, 2000
    33 million accesses in 12 days
Type of analyses
    This paper is part of a larger analysis study
    Analysis of browse patterns
    Analysis of notification logs
    Correlation between how browsing and notification services are being used
5
Overview: Types of Analysis
Document popularity analysis
User behavior analysis
System load analysis
Content analysis
6
Overview: User Categories
Cellular users
    Browse the Web in real time on cellular phones
Offline users
    Download content onto their PDAs for later (offline) browsing, e.g. AvantGo
Desktop users
    Sign up for services and specify preferences
Many more users now

User Type    # Users    # Requests
Cellular      58,432     2,210,758
Offline       50,968    20,508,272
Desktop      639,971     7,342,206
Misc.          1,634     2,944,708
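As a quick check on the user-type figures above, the request counts can be turned into shares of the total. This is only a sketch of the arithmetic; the variable names are illustrative and not taken from the study.

```python
# Request counts per user type, copied from the user-type figures above.
requests_by_type = {
    "Cellular": 2_210_758,
    "Offline": 20_508_272,
    "Desktop": 7_342_206,
    "Misc.": 2_944_708,
}

total_requests = sum(requests_by_type.values())

# Percentage of all requests issued by each user type.
request_share = {
    user_type: 100.0 * count / total_requests
    for user_type, count in requests_by_type.items()
}

for user_type, share in request_share.items():
    print(f"{user_type:<9}{share:6.1f}%")
```

The total comes to roughly 33 million requests, with offline users contributing a little over 60% and cellular users a little under 7%.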
7
Document Popularity
Previous Web research has found that Web accesses follow a Zipf-like distribution
(i.e. request frequency ∝ 1/i^α, where i is the popularity rank of the document)
Two definitions of document
    URL
    <URL, parameter> (i.e. query)
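One common way to check whether a workload is Zipf-like is to rank documents by request count and fit a straight line to log(count) against log(rank); the negated slope estimates the exponent (called alpha here). The sketch below assumes each log entry has already been reduced to a (URL, parameter) pair. The sample requests and function names are hypothetical, and this is not necessarily how the authors did their fit.

```python
import math
from collections import Counter


def rank_frequency(requests):
    """Return request counts sorted from most to least popular document."""
    return sorted(Counter(requests).values(), reverse=True)


def fit_zipf_exponent(counts):
    """Least-squares slope of log(count) against log(rank), negated.

    A Zipf-like workload is roughly a straight line on a log-log plot,
    and the returned value estimates alpha in frequency ~ 1/i^alpha.
    """
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(count) for count in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx


# Hypothetical log entries: each request is a (URL, parameter) pair.
sample_requests = (
    [("/stocks", "sym=MSFT")] * 12
    + [("/news", "")] * 6
    + [("/weather", "zip=98052")] * 4
    + [("/sports", "")] * 3
)

counts = rank_frequency(sample_requests)
alpha = fit_zipf_exponent(counts)
print(counts, round(alpha, 3))
```

Using only the URL as the key instead of the (URL, parameter) pair gives the first of the two document definitions.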
8
Document Popularity (Cont.)
[Figure: two log-log plots of # requests against popularity ranking, one for URLs and one for URL and parameter pairs]
Document popularity does not closely follow a Zipf-like distribution.
9
Document Popularity (Cont.)
[Figure: percentage of requests vs. percentage of documents]
Majority of the requests are concentrated on a small number of documents
0.1% - 0.5% of URL and parameter combinations (i.e. 112 – 442) account for 90% of requests
Very small amount of memory needed to cache popular query results.
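The concentration claim can be checked by sorting documents by request count and finding how many of the most popular ones are needed to reach 90% of all requests. The sketch below is a minimal illustration; the sample counts and the function name are hypothetical and unrelated to the site's actual logs.

```python
def documents_needed(counts, target_percent=90):
    """Return how many of the most popular documents cover target_percent of requests."""
    ordered = sorted(counts, reverse=True)
    total = sum(ordered)
    covered = 0
    for needed, count in enumerate(ordered, start=1):
        covered += count
        # Integer comparison avoids floating-point rounding at the boundary.
        if covered * 100 >= target_percent * total:
            return needed
    return len(ordered)


# Hypothetical per-document request counts.
sample_counts = [500, 300, 100, 40, 30, 10, 10, 5, 3, 2]

needed = documents_needed(sample_counts)
fraction_of_documents = needed / len(sample_counts)
print(needed, fraction_of_documents)
```

The number returned is also the number of query results a cache would need to hold to serve that share of requests, assuming popularity stays stable over time.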
10
User Behavior Analysis
Understand how long a wireless user stays on the channel as he/she browses the Web
Determine user sessions
    Intuition: if a session is idle for a sufficiently long time, we say it has ended.
    Heuristic to determine the session inactivity period
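The session intuition above can be sketched as follows: walk through one user's request times in order and start a new session whenever the gap since the previous request exceeds the inactivity period. Timestamps in seconds, the sample values, and the function name are assumptions made for illustration.

```python
def split_into_sessions(timestamps, inactivity_period):
    """Group one user's request times (in seconds) into sessions.

    A new session starts whenever the gap since the previous request
    exceeds inactivity_period.
    """
    sessions = []
    for t in sorted(timestamps):
        if sessions and t - sessions[-1][-1] <= inactivity_period:
            sessions[-1].append(t)
        else:
            sessions.append([t])
    return sessions


# Hypothetical request times for a single user, in seconds.
sample_times = [0, 10, 25, 200, 215, 900]

sessions = split_into_sessions(sample_times, inactivity_period=45)
durations = [session[-1] - session[0] for session in sessions]
print(sessions, durations)
```

The session time here is measured from the first to the last request of a session, which is one possible definition.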
11
User Behavior Analysis (Cont.)
Determine the session inactivity period (s)
    Too small s => too many sessions
    Too large s => too few sessions
    An appropriate value is at the knee point
    The knee point is between 30 and 45 seconds
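The slides do not say how the knee point was located. One simple heuristic, used here purely as an illustration, is to count sessions for a range of candidate values of s and pick the point that lies farthest from the straight line joining the two ends of the curve. The sample data and function names below are hypothetical.

```python
def count_sessions(users, inactivity_period):
    """Total sessions across users: one per active user plus one per long gap."""
    total = 0
    for timestamps in users:
        ordered = sorted(timestamps)
        if not ordered:
            continue
        gaps = (b - a for a, b in zip(ordered, ordered[1:]))
        total += 1 + sum(gap > inactivity_period for gap in gaps)
    return total


def knee_point(xs, ys):
    """Return the x whose point is farthest from the chord joining the endpoints.

    Both axes are rescaled to [0, 1] first, so the chord runs from (0, 1)
    to (1, 0) for a decreasing curve. Assumes the endpoints differ in x and y.
    """
    x_span = xs[-1] - xs[0]
    y_span = ys[0] - ys[-1]
    best_x, best_distance = xs[0], -1.0
    for x, y in zip(xs, ys):
        nx = (x - xs[0]) / x_span
        ny = (y - ys[-1]) / y_span
        distance = abs(nx + ny - 1.0)  # proportional to distance from the chord
        if distance > best_distance:
            best_x, best_distance = x, distance
    return best_x


# Hypothetical request times (seconds) for three users.
sample_users = [
    [0, 5, 12, 40, 400, 410],
    [0, 20, 35, 700],
    [0, 8, 300, 330],
]
candidates = [5, 15, 30, 60, 120, 300, 600]

session_counts = [count_sessions(sample_users, s) for s in candidates]
knee = knee_point(candidates, session_counts)
print(session_counts, knee)
```

Other knee-detection rules (for example, a threshold on the slope) would be equally reasonable and may give a slightly different value.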
[Figure: No. of sessions vs. session inactivity period (secs)]
95% of users
    Have session times of less than 3 minutes
    Initiated fewer than 35 sessions during the 12 days
We can reclaim IP addresses more quickly than the 90 seconds used previously in [KBZ+2000].
12
System Load Analysis
Understand how to optimize Web server for better performance
Small replies
    98% to wireless users < 3 KB
    99% to offline users < 6.3 KB
Diurnal pattern and weekday vs. weekend variation
Over 60% of browsing requests are from offline PDA users, and less than 7% are from wireless users.
[Figure: CDF for No. of entries vs Reply Size/100, with curves for Wireless, Offline, All, and Desktop]
1) Highly optimize sending small replies.
2) Identify what type of user issued the request, and prioritize the request according to the user type.
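The second recommendation can be sketched with a priority queue keyed on user type. The slides do not give a priority order, so the one below (interactive wireless requests first, offline sync requests last) is an assumption, as are the class name and the sample requests. How the user type is identified is left outside the sketch.

```python
import heapq
import itertools

# Assumed priority order (lower number is served first); not specified in the slides.
PRIORITY = {"wireless": 0, "desktop": 1, "offline": 2}


class RequestQueue:
    """Serve requests by user-type priority, first come first served within a type."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()  # tie-breaker that preserves arrival order

    def push(self, user_type, request):
        # Unknown user types go behind every known type.
        priority = PRIORITY.get(user_type, max(PRIORITY.values()) + 1)
        heapq.heappush(self._heap, (priority, next(self._arrival), request))

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


# Hypothetical requests arriving in this order.
queue = RequestQueue()
queue.push("offline", "sync /news")
queue.push("wireless", "GET /stocks")
queue.push("desktop", "POST /signup")
queue.push("wireless", "GET /weather")

served = [queue.pop() for _ in range(len(queue))]
print(served)
```

A strict priority order like this can starve low-priority requests under sustained load, so a real server would likely combine it with aging or per-type quotas.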
13
Content Analysis
Important to content providers: what content is interesting to users

User Type    Rank #1         Rank #2    Rank #3
Wireless     Stock quotes    News       Yellow pages
Offline      Help            News       Stock quotes
Desktop      Sign-ups        Email      Sports

Top three preferences for different kinds of users
14
Summary of Results and Implications
Fact: 0.1% - 0.5% of queries (i.e. 121-442) account for 90% of requests.
    Implication: Caching the results of popular queries can be very effective.
Fact: A large fraction of requests come from automated sync programs.
    Implication: System designers should prioritize requests according to user type.
15
Summary of Results and Implications (Cont.)
Fact: Most of the replies are short (< 3 KB for wireless users, and < 6.3 KB for offline users).
    Implication: Wireless Web servers should highly optimize sending short replies.
Fact: The session inactivity period is between 30 and 45 seconds.
    Implication: We may reclaim IP addresses more quickly than the 90 seconds used previously.
16