Spamscatter - UCSD CSE - Systems and Networking


Spamscatter
Introduction
Spamscatter:
Characterizing Internet
Scam Hosting Infrastructure
David S. Anderson, Chris Fleizach,
Stefan Savage, and Geoffrey M. Voelker
University of California, San Diego
Aug. 9th, 2007
Usenix Security 2007
Motivation
• 70 billion spam messages are sent every day for a simple reason: to advertise websites.
• A scam, then, is any website marketed using spam
• This online resource is directly implicated in the spam profit cycle, which makes it a rarer and more valuable resource than the machines that send spam
• Characterizing the scam infrastructure helps to
– Reveal the dynamics and business pressures exerted on spammers
– Identify means to reduce unwanted sites and spam
Spamscatter Approach
• Mine a large quantity of spam
– Extract URLs
– Probe machines hosting the scams
• This works because URLs must be correct
– Follow the scent of money…
• All we need is a reliable, large source of spam
– We have access to a four-letter, top-level domain that receives 150K spam messages per day
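The URL-extraction step can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the authors' tooling: the regular expression `URL_RE` and the helper `extract_hosts` are hypothetical, message bodies are assumed to be plain text, and real spam often obfuscates links in ways a single pattern will miss.

```python
import re
from urllib.parse import urlsplit

# Hypothetical, deliberately simple pattern: http(s) URLs up to whitespace or
# common delimiters. Obfuscated spam links would need more robust parsing.
URL_RE = re.compile(r"https?://[^\s<>\"')]+", re.IGNORECASE)

def extract_hosts(message_body: str) -> list[str]:
    """Return the distinct hostnames (lowercased) linked from one spam body."""
    hosts = []
    for url in URL_RE.findall(message_body):
        host = urlsplit(url).hostname  # lowercased; None if there is no host part
        if host and host not in hosts:
            hosts.append(host)
    return hosts

spam = "Cheap software! Visit http://Example-Soft.com/buy now or https://example-soft.com/."
print(extract_hosts(spam))  # ['example-soft.com']
```

Because the hostname is what gets resolved and probed, two differently written URLs for the same site collapse to a single entry here.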
Understanding scams
• Are scams distributed across different servers?
• Do different scams share the same server?
• How long do scams stay active? How reliable
is their hosting?
• Where are scam servers located?
• Why is it useful to study these characteristics?
Methodology
[Figure: Spamscatter and the Scam]
Methodology
• Data collection
– Extract links from large spam feed
– Probe links every 3 hours for 7 days
– Record browser redirection
– Save screenshots
• Analysis
– Identify scams across servers and domains
– Report on distributed and shared infrastructure,
lifetime, stability, and location
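As a rough illustration of the probing budget, a link probed every 3 hours for 7 days receives 7 × 24 / 3 = 56 probes. The sketch below assumes probing starts when a link is first seen and stops before the 7-day mark; `probe_schedule` is a hypothetical helper, not part of the original system.

```python
from datetime import datetime, timedelta

PROBE_INTERVAL = timedelta(hours=3)  # from the slide: probe every 3 hours
PROBE_WINDOW = timedelta(days=7)     # ... for 7 days

def probe_schedule(first_seen: datetime) -> list[datetime]:
    """Probe times for one link, starting when the link is first seen.
    Assumed convention: the first probe is at first_seen and no probe is
    scheduled at or after first_seen + 7 days."""
    times = []
    t = first_seen
    while t < first_seen + PROBE_WINDOW:
        times.append(t)
        t += PROBE_INTERVAL
    return times

schedule = probe_schedule(datetime(2000, 1, 1))  # arbitrary start time
print(len(schedule), schedule[-1])  # 56 2000-01-07 21:00:00
```

Under this convention, a link first seen late in a one-week collection period is still being probed up to a week afterwards.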
Identifying Scams
• Goal: Identify multiple hosts in the same scam, since many
scams are spread across different IPs and domain names
• Naïve Approaches:
1. Correlate independent spam emails
2. Use HTML content returned from the webserver
• Limitations:
– Spam contains too much chaff and obfuscation
– HTML is uninformative and mostly composed of images
– Web crawlers fail on frames, iframes, and JavaScript
Image Shingling
• Solution: Use rendered screenshots of web pages for correlation.
– How do we compare upwards of 10,000 images?
• Image shingling – based on the text shingling idea [BRO97]
– Fragment images into blocks and hash the blocks
– Two images are similar if T% of their hashed blocks are the same (T = 70–80%)
– Shingling allows us to compare all images in essentially O(N lg N)
– Resilient to small variations among images
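A minimal sketch of image shingling in Python follows. Assumptions not taken from the slides: screenshots are already decoded into grayscale rows of 0–255 integers, blocks are non-overlapping squares of a hypothetical size `BLOCK`, MD5 is the block hash, and similarity is the number of shared block hashes divided by the larger shingle set. Sorting the (hash, image) pairs brings identical blocks together, so only images that share at least one block are ever compared; that sort is where the roughly O(N lg N) cost comes from when few images share any given block.

```python
import hashlib
from collections import Counter
from itertools import combinations, groupby

BLOCK = 4  # hypothetical block edge in pixels; real screenshots would use larger blocks

def shingles(img, block=BLOCK):
    """Hash each non-overlapping block x block tile of a grayscale image
    (list of rows of 0-255 ints) and return the set of tile hashes."""
    out = set()
    for r in range(0, len(img) - block + 1, block):
        for c in range(0, len(img[0]) - block + 1, block):
            tile = bytes(v for row in img[r:r + block] for v in row[c:c + block])
            out.add(hashlib.md5(tile).hexdigest())
    return out

def similar_pairs(images, threshold=0.7):
    """Return image-name pairs sharing at least `threshold` of their shingles."""
    sets = {name: shingles(img) for name, img in images.items()}
    # Sort (hash, name) entries so that images sharing a tile are adjacent.
    entries = sorted((h, name) for name, s in sets.items() for h in s)
    shared = Counter()
    for _, group in groupby(entries, key=lambda e: e[0]):
        names = sorted(n for _, n in group)
        for a, b in combinations(names, 2):
            shared[(a, b)] += 1
    return {pair for pair, k in shared.items()
            if k / max(len(sets[pair[0]]), len(sets[pair[1]])) >= threshold}

def make_image(tile_values):
    """Toy 8x8 image made of four uniform 4x4 tiles (demonstration only)."""
    return [[tile_values[(r // 4) * 2 + (c // 4)] for c in range(8)] for r in range(8)]

images = {
    "A": make_image([10, 20, 30, 40]),
    "B": make_image([10, 20, 30, 50]),  # one tile differs from A
    "C": make_image([60, 70, 80, 90]),  # unrelated page
}
print(similar_pairs(images))  # {('A', 'B')}
```

Pages A and B share 3 of 4 blocks (75%), so they match at a 70% threshold despite the small difference; keeping shingles as sets means repeated blocks, such as a uniform background, count only once.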
An Example Scam
An Example Scam: “Downloadable Software”
Scam Perspective
• 99 observed virtual hosts
• 3 IP addresses
• Operated for months
• 85 senders
• No forwarding used
• 5535 probes (97% successful)
Clustering with
Image Shingling
• Images differ slightly
• Some pages rotate content
Location
• 2 web servers in China; 1 web server in Russia
• 85 senders from 30 countries (28 from the US)
Map legend: blue = web servers hosting “Downloadable Software”; red = spam relays (hosts that sent us spam)
Shared Infrastructure
• One of the IPs (221.4.246.3) hosting
“Downloadable Software” was also hosting
“Toronto Pharmacy”
• Server located in Guangzhou, China
Results
Summary Statistics
1 week of spam collection – Nov. 28th – Dec. 4th
2 weeks of probing – Nov. 28th – Dec. 11th
1,087,711 spam messages
319,700 contain links (30%)
36,390 distinct links (11.3%)
7,029 unique IP addresses (19.3%)
2,334 distinct scams (33.2%)
Each percentage is relative to the row above it.
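The funnel percentages can be recomputed from the stage counts, each relative to the stage before it. In this sketch `stage_ratios` is a hypothetical helper; the computed 29.4% and 11.4% differ slightly from the slide's 30% and 11.3%, presumably because of rounding or truncation on the slide.

```python
# Stage counts copied from the summary-statistics slide.
funnel = [
    ("spam messages", 1_087_711),
    ("messages containing links", 319_700),
    ("distinct links", 36_390),
    ("unique IP addresses", 7_029),
    ("distinct scams", 2_334),
]

def stage_ratios(stages):
    """Percentage of each stage relative to the stage immediately before it."""
    return [100 * cur / prev for (_, prev), (_, cur) in zip(stages, stages[1:])]

for (name, _), pct in zip(funnel[1:], stage_ratios(funnel)):
    print(f"{name}: {pct:.1f}% of previous stage")
print(f"scams per spam message overall: {100 * funnel[-1][1] / funnel[0][1]:.2f}%")
```

Overall, only about 0.21% as many distinct scams as spam messages remain, which is the coalescing the summary slide describes.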
Results - Infrastructure
Distributed Infrastructure
To what extent is the infrastructure distributed for scams?
• Most scams are not distributed:
– 94% used one IP
• The top three distributed scams were extensive
– 22, 30, and 45 IPs
• Top three virtual-hosted scams
– 110, 695, and 3029 domain names
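Once each probed URL has been assigned to a scam cluster, the distribution questions reduce to counting distinct IPs and domain names per scam. The records below are made up (addresses come from documentation ranges), and `infrastructure` is a hypothetical helper, not the authors' analysis code.

```python
from collections import defaultdict

# Hypothetical probe records: (scam_id, domain, ip). In practice the scam id
# would come from image-shingling clusters.
records = [
    ("pharmacy", "pills-a.example", "192.0.2.1"),
    ("pharmacy", "pills-b.example", "192.0.2.1"),   # virtual hosting: 2 domains, 1 IP
    ("software", "soft.example", "198.51.100.7"),
    ("software", "soft.example", "198.51.100.8"),   # distributed: 1 domain, 2 IPs
    ("watches", "watch.example", "203.0.113.5"),
]

def infrastructure(records):
    """Map each scam to its set of IPs and its set of domain names."""
    ips, domains = defaultdict(set), defaultdict(set)
    for scam, domain, ip in records:
        ips[scam].add(ip)
        domains[scam].add(domain)
    return ips, domains

ips, domains = infrastructure(records)
single_ip = sum(1 for s in ips if len(ips[s]) == 1) / len(ips)
print(f"{single_ip:.0%} of scams use one IP")
print("most domains:", sorted(((len(d), s) for s, d in domains.items()), reverse=True)[:3])
```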
Shared Infrastructure
To what extent do multiple scams share infrastructure?
• 38% of scams hosted on a machine with at least one other scam
• 10 IPs hosted 10 or more scams
• Top three shared IPs
– 15, 18, and 22 scams
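Sharing is the inverse view: group scams by IP and count how many scams sit on a machine that also hosts another scam. The observations are hypothetical, and `sharing_stats` is an illustrative helper under the assumption that one shared IP is enough for a scam to count as "shared".

```python
from collections import defaultdict

# Hypothetical (scam_id, ip) pairs observed while probing.
observations = [
    ("pharmacy", "192.0.2.1"), ("software", "192.0.2.1"),
    ("watches", "192.0.2.1"), ("loans", "198.51.100.7"),
    ("pharmacy", "203.0.113.5"),
]

def sharing_stats(observations):
    """Return (ip -> scams hosted on it, fraction of scams sharing a machine)."""
    scams_on = defaultdict(set)
    for scam, ip in observations:
        scams_on[ip].add(scam)
    all_scams = {scam for scam, _ in observations}
    # A scam counts as shared if any of its IPs also hosts a different scam.
    shared = {s for hosted in scams_on.values() if len(hosted) > 1 for s in hosted}
    return scams_on, len(shared) / len(all_scams)

scams_on, frac_shared = sharing_stats(observations)
busiest = max(scams_on.items(), key=lambda kv: len(kv[1]))
print(f"{frac_shared:.0%} of scams share a server; busiest IP {busiest[0]}")
```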
Results - Lifetime
Scam Lifetime & Stability
How long are scams active, and how reliable are the hosts?
Scam web hosts seem to be taken down shortly after the scams disappear.
Overall scam lifetime approached two weeks.
Reliability is usually high (> 97%).
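Both quantities on this slide can be derived from the periodic probe log of a host. The definitions in this sketch are assumptions: lifetime runs from the first to the last successful probe, and reliability is the fraction of probes within that span that succeeded. `lifetime_and_reliability` is a hypothetical helper.

```python
from datetime import datetime, timedelta

def lifetime_and_reliability(probes):
    """probes: (timestamp, succeeded) pairs for one scam host.
    Assumed definitions: lifetime runs from the first to the last successful
    probe; reliability is the fraction of probes in that span that succeeded."""
    ok = sorted(t for t, succeeded in probes if succeeded)
    if not ok:
        return timedelta(0), 0.0
    first, last = ok[0], ok[-1]
    in_span = [succeeded for t, succeeded in probes if first <= t <= last]
    return last - first, sum(in_span) / len(in_span)

t0 = datetime(2000, 1, 1)  # arbitrary start time
results = [True, True, False, True, False]  # hypothetical outcomes, one per 3-hour probe
probes = [(t0 + timedelta(hours=3 * i), r) for i, r in enumerate(results)]
lifetime, reliability = lifetime_and_reliability(probes)
print(lifetime, f"{reliability:.0%}")  # 9:00:00 75%
```

The trailing failed probe falls outside the lifetime span, so it is read as the site going away rather than as an outage.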
Spam campaign lifetime
How long do spam campaigns last for a scam?
• 137 spam messages per scam on average
• Most spam campaigns are relatively short: 88% last 20 hours or less
• Only 8% last more than 2 days
• Scam lifetimes are considerably longer: one week on average
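A campaign's duration can be measured from the spam itself, independently of probing the site. This sketch assumes a campaign runs from the first to the last spam message advertising a scam; the arrival times and the helpers `campaign_duration` and `fraction_at_most` are hypothetical.

```python
from datetime import datetime, timedelta

def campaign_duration(spam_times):
    """Assumed definition: time from the first to the last spam message
    seen advertising a given scam."""
    return max(spam_times) - min(spam_times)

def fraction_at_most(durations, limit):
    """Fraction of campaigns lasting no longer than `limit`."""
    return sum(d <= limit for d in durations) / len(durations)

t0 = datetime(2000, 1, 1)  # arbitrary start time
campaigns = {  # hypothetical spam arrival times per scam
    "a": [t0, t0 + timedelta(hours=5)],
    "b": [t0, t0 + timedelta(hours=12), t0 + timedelta(hours=20)],
    "c": [t0, t0 + timedelta(days=3)],
}
durations = [campaign_duration(ts) for ts in campaigns.values()]
print(f"{fraction_at_most(durations, timedelta(hours=20)):.0%} last 20 hours or less")
print(f"{1 - fraction_at_most(durations, timedelta(days=2)):.0%} last more than 2 days")
```

Comparing these durations with the probe-based lifetimes is what reveals that sites outlive the spam campaigns that advertise them.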
Results - Location
Location
Where are scam hosting servers located?
Map legend: blue = web servers; red = spam relays
Location
Web Servers
Country   Count   Percent
1. usa     5884   57.40%
2. chn      741    7.23%
3. can      379    3.70%
4. gbr      315    3.07%
5. fra      314    3.06%
6. deu      258    2.52%
7. rus      185    1.80%
8. kor      181    1.77%

Spam Relays
Country   Count   Percent
1. usa    54159   14.50%
2. fra    26371    7.06%
3. esp    25196    6.75%
4. chn    24833    6.65%
5. pol    21199    5.68%
6. ind    20235    5.42%
7. deu    18678    5.00%
8. kor    17446    4.67%
Results - Categorization
Scam Categorization
Scam category                  % of scams
Uncategorized                  29.57%
Information Technology         16.67%
Dynamic Content                11.52%
Business and Economy            6.23%
Shopping                        4.30%
Financial Data and Services     3.61%
Illegal or Questionable         2.15%
Adult                           1.80%
Message Boards and Clubs        1.80%
Web Hosting                     1.63%
Lifetime of scams with Categorization
More than 40% of malicious scams disappear before 120 hours.
By contrast, fewer than 15% of all scams disappear that quickly.
Conclusion
Summary
• Started with over 1M spam messages and coalesced them into fewer than 2,500 scams
• Image shingling allowed us to scalably determine whether two sites were part of the same scam
• Most scams use one web server (vulnerable to blacklisting)
– Scams may use many virtual domains that point to one IP
• Most scams are not malicious per se
• Compared with spam senders, scam infrastructure is more stable, longer-lived, and more concentrated in the US
Spammers beware: these boffins are on the prowl
Questions and Answers
Supplementary Information
Spamscope Visibility
• Collected spam from news.admin.net-abuse.sightings, a newsgroup for contributing spam
• Over a 3-day period, we saw
– 6,977 spam messages from the newsgroup → 205 scams
– 113,216 spam messages from our feed → 1,687 scams
• 12% of the newsgroup scams also appeared in our feed
• The “largest” scams (most emails and most domains/IPs) were seen in both feeds
Results - Blacklisting
Blacklists
Host type    Classification   % of hosts
Spam relay   Open proxy       72.27%
Spam relay   Spam host         5.86%
Scam host    Open proxy        2.06%
Scam host    Spam host        14.86%
9.7% of the scam hosts also sent us spam
Supplementary Information
Web Server OS
Rank  OS fingerprint             Percent
1     Linux recent 2.4 (1)       11.97%
2     Windows 2000 (SP1+)        11.05%
3     Akamai ???                 10.86%
4     Windows 2000 SP4            8.25%
5     Linux recent 2.4 (2)        7.84%
6     FreeBSD 4.6-4.8             7.72%
7     Slashdot or BusinessWeek    7.04%
8     FreeBSD 5.0                 6.49%
9     Windows XP SP1              5.90%
10    Linux older 2.4             5.56%
URL Classification
Category                                      Percent
WISP Dynamic Content                          17.931%
WISP Uncategorized                            13.965%
WISP Illegal or Questionable                  10.306%
WISP Information Technology                    9.051%
WISP Shopping                                  4.872%
WISP Business and Economy                      4.733%
WISP Financial Data and Services               4.626%
WISP Personals and Dating                      1.867%
WISP Advertisements                            1.249%
WISP Educational Institutions                  1.247%
WISP Pay-to-Surf                               1.022%
WISP Search Engines and Portals                0.884%
WISP Supplements and Unregulated Compounds     0.865%
WISP Sex                                       0.862%
Image Clustering
1 week of spam collection – Nov. 28th – Dec. 4th
2 weeks of probing – Nov. 28th – Dec. 11th
2,541,486 total probes
250,864 captured images (9.8% of probes result in a captured image)
9,572 first screenshots (3.8% of screenshots are the first screenshot for a scam)
2,334 clusters detected by image shingling
Image Shingling
For a typical day of screenshots, we tested various thresholds.
A 70% threshold provided a good balance between flexibility and accuracy.
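To see how the threshold trades flexibility against accuracy, the sketch below counts clusters at several thresholds. Assumptions: pairwise matches are merged into clusters as connected components with union-find, similarity is shared shingles divided by the larger set, and every pair is compared for brevity (O(N²), unlike a sorted-shingle approach). The shingle sets and `cluster_count` are hypothetical.

```python
from itertools import combinations

def cluster_count(shingle_sets, threshold):
    """Number of clusters when two images are linked if they share at least
    `threshold` of their shingles (shared / larger set), using union-find."""
    parent = {name: name for name in shingle_sets}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(shingle_sets, 2):
        sa, sb = shingle_sets[a], shingle_sets[b]
        if len(sa & sb) / max(len(sa), len(sb), 1) >= threshold:
            parent[find(a)] = find(b)
    return len({find(name) for name in shingle_sets})

# Hypothetical shingle sets; each letter stands in for one block hash.
pages = {
    "A": set("abcdefghij"),
    "B": set("abcdefghkl"),  # shares 8 of 10 shingles with A
    "C": set("abcdefmnop"),  # shares 6 of 10 with A and with B
    "D": set("qrstuvwxyz"),  # unrelated
}
print({t: cluster_count(pages, t) for t in (0.5, 0.7, 0.9)})  # {0.5: 2, 0.7: 3, 0.9: 4}
```

A low threshold merges C into the A/B cluster, while a high one separates even the near-identical A and B, which is the trade-off the threshold sweep is meant to balance.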
Overlap of pairs of scams on the same server
For scams running on the same server, how much time do they overlap?
• 96% of all scam pairs on the same server were active at the same time for at least part of their lifetimes
• Only 10% of scam pairs fully overlapped each other
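Pairwise overlap of two scams' active periods is an interval-intersection computation. In this sketch, "full overlap" is assumed to mean that one scam's active period lies entirely within the other's; the helpers and lifetimes are hypothetical.

```python
from datetime import datetime, timedelta

def overlap(a, b):
    """Time during which both (start, end) activity intervals are active."""
    return max(timedelta(0), min(a[1], b[1]) - max(a[0], b[0]))

def fully_overlaps(a, b):
    """Assumed reading of 'full overlap': one interval contains the other."""
    return (a[0] <= b[0] and b[1] <= a[1]) or (b[0] <= a[0] and a[1] <= b[1])

t0 = datetime(2000, 1, 1)  # arbitrary start time
day = timedelta(days=1)
x = (t0, t0 + 5 * day)            # hypothetical scam lifetimes on one server
y = (t0 + day, t0 + 3 * day)
z = (t0 + 4 * day, t0 + 9 * day)
for other in (y, z):
    print(overlap(x, other), fully_overlaps(x, other))
```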
IP ranges
What are the network locations of scams and spam relays?
• The cumulative distribution of IP addresses is highly nonuniform
• The majority of spam relays (60%) fall between 58.* and 91.*
• Half of the scams (50%) fall between 64.* and 72.*
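Statements such as "50% fall between 64.* and 72.*" read off the rise of an empirical CDF over first octets. The helpers below are hypothetical and the addresses are made up; the standard-library `ipaddress` module is used only to validate and parse dotted-quad addresses.

```python
import ipaddress

def first_octet(ip: str) -> int:
    """Leading octet of a dotted-quad IPv4 address (validates the input)."""
    return int(ipaddress.IPv4Address(ip)) >> 24

def fraction_in_range(ips, lo, hi):
    """Fraction of addresses whose first octet lies in [lo, hi], i.e. the rise
    of the first-octet CDF between lo.* and hi.*."""
    return sum(lo <= first_octet(ip) <= hi for ip in ips) / len(ips)

def cdf_at(ips, octet):
    """Empirical CDF: fraction of addresses whose first octet is <= octet."""
    return sum(first_octet(ip) <= octet for ip in ips) / len(ips)

# Made-up addresses for illustration only.
scam_hosts = ["10.0.0.1", "64.1.2.3", "66.0.0.1", "72.9.9.9", "80.1.1.1", "200.1.1.1"]
print(fraction_in_range(scam_hosts, 64, 72))  # 0.5
print(cdf_at(scam_hosts, 63))
```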