Your Botnet is my Botnet: Analysis of a Botnet Takeover


Your Botnet is my Botnet:
Analysis of a Botnet Takeover
BRETT STONE-GROSS, MARCO COVA, LORENZO
CAVALLARO, BOB GILBERT, MARTIN SZYDLOWSKI,
RICHARD KEMMERER, CHRISTOPHER KRUEGEL,
AND GIOVANNI VIGNA
PRESENTATION BY SAM KLOCK
Background
 Botnet: network of
machines compromised
by malware (bots) under
control of external agent
 Botmaster: agent
controlling a botnet
 Command and control
(C&C): mechanism by
which botmaster controls
a botnet
Motivation
 Botnets: big and growing security issue on the Internet
   More broadband Internet access makes them easier to build
   Wealth of information transported makes them profitable
   Sizeable botnets can participate in large-scale malicious acts
 We want to know more about them
   How do they grow?
   What can they do?
   How do we address the threats existing and potential botnets pose?
   How do we preempt their growth (address user vulnerabilities)?
Problem
 Analyzing botnets is difficult
   Topologies vary: top-down, P2P, random
   Protocols and goals vary
   Sizes vary widely
 Several techniques are typical
   Passive analysis: collect spam likely sent from bots; analyze query patterns to DNS/DNSBL; examine network traffic
     Can’t scale to entire Internet
     Some metrics only work for botnets engaging in certain activities
   Infiltration: join the botnet and monitor
     Most botnets avoid supplying information to member bots
Images: Wang, Sparks, Zou, “An Advanced Hybrid Peer-to-Peer Botnet”, in
IEEE Transactions on Dependable and Secure Computing, 7(2): 113-127.
Approach
 Hijack the botnet
   Idea: investigate botnet C&C, then tamper with it
   Learn about botnet behavior from perspective of botmaster
 Two ways to accomplish
   Seize botmaster’s C&C machines
     Law enforcement’s job
   Better: collaborate with DNS providers
     Goal: redirect C&C traffic to us
     Then mimic C&C behavior
 Approach depends on targeted botnet
Target: Torpig
 “One of the most advanced pieces of crimeware ever created”
   Mainly harvests personal information
   Opens ports for HTTP and SOCKS on victim machines
     Useful for anonymous browsing, sending spam
     Not yet clear what Torpig does with them
 Good candidate for DNS-based hijacking
 Centralized C&C
 Bots identify C&C via domain names
 Communication via HTTP
Torpig vs. Others
 Torpig has interesting characteristics
 Domain flux
 Bot identifiers
 A lot of harvested information
 Implementable protocol
 Past attempts:
   Conficker: no bot identifiers, protocol authentication
     Size estimation is hard
     No authentication → no data
   Kraken: no data collection (spam sending)
     Little insight into data harvesting
Torpig Characteristics
 Basic idea: rootkit-based Trojan horse
   Uses Mebroot
   Attack via drive-by download
     Vulnerable web server
     Vulnerable client/OS
   Install Mebroot, then install Torpig malware
(0) Inject URL
(1) Client HTTP GET
(2) Deliver injected URL
(3) Client HTTP GET from DbD server
(4) Download & run Mebroot
Torpig Characteristics (cont’d)
(5) Fetch Torpig libraries
(6) Configure, monitor
(7) Execute man-in-the-browser phishing
Bot Behavior
 Periodic C&C communication
   ~20 minutes
   Uploads harvested data
   Server responds okn or okc
 Man-in-the-browser more complex
   List of targeted URLs
   Requests sent to special injection server
   Bypasses SSL, certificates, etc.
   Can be hijacked (not attempted here)
Hijacking Torpig
 Domain flux
 Related to fast flux
 C&C hidden behind shifting
domains
 Bots generate list of domains
to check periodically
 Iterate through list; stop on
valid response
 Domain generation algorithm (DGA) reverse-engineered
   Botmaster didn’t register domains in advance: big weakness
Figure: Pseudocode for daily DGA
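The domain-flux idea can be sketched in a few lines. This is not Torpig's reverse-engineered algorithm: the hash, the seed, and the reserved TLDs are assumptions chosen for illustration. The sketch only shows the property the takeover relied on: the list is a deterministic function of the date, so anyone who knows the function can compute upcoming domains before they are registered.

```python
import hashlib
from datetime import date, timedelta
from typing import Callable, Iterable, List, Optional

# Reserved TLDs (RFC 2606) so the sketch never names a real domain.
TLDS = ("example", "test", "invalid")


def candidate_domains(day: date, seed: str = "hypothetical-seed") -> List[str]:
    """Derive a deterministic list of candidate domains from a date."""
    digest = hashlib.sha256(f"{seed}:{day.isoformat()}".encode()).hexdigest()
    # Map the first ten hex digits to letters a..p to form a hostname label.
    label = "".join(chr(ord("a") + int(c, 16)) for c in digest[:10])
    return [f"{label}.{tld}" for tld in TLDS]


def first_responsive(domains: Iterable[str],
                     is_valid: Callable[[str], bool]) -> Optional[str]:
    """Iterate through the list and stop at the first valid response."""
    for domain in domains:
        if is_valid(domain):
            return domain
    return None


def upcoming(start: date, days: int) -> List[str]:
    """All candidates for the next `days` days, computable ahead of time."""
    out: List[str] = []
    for offset in range(days):
        out.extend(candidate_domains(start + timedelta(days=offset)))
    return out
```

Calling `upcoming(date(2009, 1, 25), 7)` yields the full week's candidates in advance, which is the step a defender (or a competing attacker) would take before registering them.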
Hijacking Torpig (cont’d)
 Conceptually simple with DGA, protocol, botmaster carelessness
   Register domains first
   Mimic protocol (encryption easily broken)
 Not a general approach
   Conficker: 50,000 domains per day
   Nondeterministic
   Estimated cost: > $91.3m per year
 In practice:
   Two different hosting providers, two different registrars
     Redundancy
   Apache handled requests
   Collected data downloaded, then deleted from the hosts
   Total: 8.7 GB Apache logs, 69 GB pcap
   Up for three weeks; data collected for ten days
Hijacking Torpig (cont’d)
 Legal/ethical implications
   Botnet is a criminal instrument
   Precedent in past research
 Follow-through:
   No new config (okn only)
   Shared data with DoD, FBI, ISPs
Torpig Data Format
 Communication via HTTP
POST
 URL: bot ID (nid), header
 Body: stolen data
 Header info:
 ts
 ip
 hport, sport
 os, cn
 bld, ver
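The header fields listed above can be modeled as a small record. This is a sketch only: it assumes the header has already been decoded into an ordinary query string, which is not the real wire format (the slides note the traffic was encrypted). The URL, host name, and all field values are made up.

```python
from dataclasses import dataclass
from urllib.parse import parse_qs, urlsplit


@dataclass(frozen=True)
class Header:
    nid: str    # bot identifier
    ts: int     # timestamp field
    ip: str     # IP address reported in the header
    hport: int  # presumably the HTTP proxy port opened on the victim
    sport: int  # presumably the SOCKS proxy port opened on the victim
    os: str
    cn: str
    bld: str    # build name
    ver: str    # version


def parse_header(url: str) -> Header:
    """Parse a decoded submission URL (hypothetical layout) into a Header."""
    query = parse_qs(urlsplit(url).query)

    def field(name: str) -> str:
        return query[name][0]

    return Header(
        nid=field("nid"), ts=int(field("ts")), ip=field("ip"),
        hport=int(field("hport")), sport=int(field("sport")),
        os=field("os"), cn=field("cn"), bld=field("bld"), ver=field("ver"),
    )


# Hypothetical example; every value here is invented for illustration.
EXAMPLE_URL = ("http://sinkhole.example/submit?nid=0123ABCD&ts=0&ip=192.0.2.7"
               "&hport=8080&sport=1080&os=5.1.2600&cn=US&bld=dxtrbc&ver=229")
```

Keeping the record immutable (`frozen=True`) makes it usable as a dictionary key or set member, which is convenient when deduplicating submissions.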
Torpig Data Collected
Analysis: Botnet Size
 nid may be used to count bots
   Computed from HDD model/serial
   Not completely unique: couple with os, cn, bld, ver
   Subtract researchers, probes, casual machines
 Found 182,800 likely infected hosts
 Identifying researchers
   Intuition: analyze in controlled environment
   Use virtual machine
   VMs have default hardware specs (HDD model/serial)
   Eliminate nids computed from VM defaults
   Discounted 40 hosts
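The counting rule can be sketched as follows, assuming each submission header is available as a mapping with the field names from the data-format slide. The set of nids derived from VM default hardware is a hypothetical input here; how those nids are computed is not shown in the slides.

```python
from typing import FrozenSet, Iterable, Mapping, Tuple

# nid alone is not completely unique, so it is coupled with these fields.
KEY_FIELDS = ("nid", "os", "cn", "bld", "ver")


def bot_key(header: Mapping[str, str]) -> Tuple[str, ...]:
    """Composite identifier for one infected host."""
    return tuple(header[name] for name in KEY_FIELDS)


def count_bots(headers: Iterable[Mapping[str, str]],
               vm_default_nids: FrozenSet[str] = frozenset()) -> int:
    """Count distinct hosts, skipping nids computed from VM defaults."""
    keys = {
        bot_key(h) for h in headers
        if h["nid"] not in vm_default_nids  # likely researchers' sandboxes
    }
    return len(keys)
```

Two submissions with the same nid but a different os or bld count as two hosts, while repeated submissions from the same host collapse into one.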
Analysis: Botnet Size (cont’d)
 Much more accurate than IP counting
   DHCP churn causes overcount
     706 machines: > 100 IPs
     One host: 694 unique IPs
   NAT causes undercount
   1,247,642 unique IPs vs. 182,800 est. bots
 Traffic characteristics
   Peaks at 9am PST, troughs 9pm PST
   Within hour: unique IPs ≈ unique bots
   Within day: unique IPs > unique bots
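A small sketch of why IP counting misleads in both directions. The input is assumed to be (bot identifier, IP) pairs taken from the logs; the sample data is invented. With the slide's totals, the overall ratio works out to roughly 6.8 unique IPs per estimated bot.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def ip_vs_bot_counts(observations: Iterable[Tuple[str, str]]) -> Tuple[int, int, int]:
    """Return (unique IPs, unique bots, most IPs seen for a single bot)."""
    ips_per_bot: Dict[str, Set[str]] = defaultdict(set)
    all_ips: Set[str] = set()
    for bot, ip in observations:
        ips_per_bot[bot].add(ip)
        all_ips.add(ip)
    worst = max((len(s) for s in ips_per_bot.values()), default=0)
    return len(all_ips), len(ips_per_bot), worst


# DHCP churn: one bot seen from three addresses inflates the IP count.
churn = [("bot1", "198.51.100.1"), ("bot1", "198.51.100.2"), ("bot1", "198.51.100.3")]
# NAT: three bots behind one address deflates the IP count.
nat = [("bot1", "203.0.113.9"), ("bot2", "203.0.113.9"), ("bot3", "203.0.113.9")]

# Ratio from the slide's totals: unique IPs per estimated bot.
ips_per_estimated_bot = 1_247_642 / 182_800
```

Running the same function over one-hour windows instead of whole days would show the two counts converging, since an address rarely changes within an hour.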
Analysis: Botnet Growth
 Most bots in U.S., Germany, Italy
   Intuition: targeted websites mainly English, German, Italian
   IP counting overestimates Italian/German infections
 Found 49,294 new infections
   Most on Jan 25, 27
   How? ts = 0
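The ts = 0 heuristic can be sketched like this. The reading assumed here is that a bot checking in with ts equal to 0 has not yet received a configuration and is therefore newly infected; the slide does not spell this out, so treat it as an assumption. Submissions are modeled as (day, nid, ts) tuples with invented values.

```python
from collections import Counter
from datetime import date
from typing import Iterable, Set, Tuple


def new_infections_per_day(
    submissions: Iterable[Tuple[date, str, int]]
) -> Counter:
    """Count bots whose first observed submission carries ts == 0."""
    seen: Set[str] = set()
    per_day: Counter = Counter()
    for day, nid, ts in sorted(submissions):  # process chronologically
        if nid not in seen and ts == 0:
            per_day[day] += 1
        seen.add(nid)  # later submissions from this nid are never "new"
    return per_day
```

A bot first seen with a nonzero ts is treated as an existing infection, even if a later submission from it reports ts = 0.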
Analysis: Botnet as Service
 Why bld?
   Twelve different values
   Some values more active than others
     dxtrbc: 5,432,528 submissions
     mentat: 1,582,547 submissions
   Features do not seem to differ from build to build
 Explanation: customers
   Treat bld as identifier for customers
   Output can be processed according to what each customer pays for and wants
 Q: Paper doesn’t mention distribution of builds over members. Could build activity be attributable to that?
Analysis: Stolen Data
 Institutional data
 8,310 accounts, 410
institutions
 PayPal (1,770)
 Poste Italiane (765)
 Capital One (314)
 E*Trade (304)
 Chase (214)
 310 institutions: < 10
accounts
 Notifying victims:
complicated
 38% of credentials stolen from browser password managers
Analysis: Stolen Data (cont’d)
 Credit cards
   Checked prefixes, used Luhn heuristic
   Found 1,660 unique debit/credit card numbers
     1,056 Visa
     447 MasterCard
     81 American Express
     36 Maestro
     24 Discover
   49% in U.S., 12% in Italy, 8% in Spain, rest in 40 other countries
   86% of cases: only one card number
   One case: 30 numbers
 Value (via Symantec):
   $0.10 to $25 per card
   $10 to $1000 per account
   $83k to $8.3m over ten days: profitable
   Assumes all data is fresh
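The prefix-plus-Luhn check mentioned above can be sketched as follows. The Luhn checksum is standard; the prefix table is a deliberately simplified assumption (real issuer ranges are broader, and Maestro is omitted), and the sample numbers in the usage note are well-known test numbers rather than stolen data.

```python
from typing import Optional


def luhn_valid(number: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, mod 10."""
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:       # every second digit, counting from the right
            digit *= 2
            if digit > 9:        # same as summing the two digits of the product
                digit -= 9
        total += digit
    return total % 10 == 0


def card_brand(number: str) -> Optional[str]:
    """Very rough brand guess from the leading digits (simplified table)."""
    n = "".join(c for c in number if c.isdigit())
    if n.startswith("4"):
        return "Visa"
    if n[:2] in {"51", "52", "53", "54", "55"}:
        return "MasterCard"
    if n[:2] in {"34", "37"}:
        return "American Express"
    if n.startswith("6011"):
        return "Discover"
    return None
```

For example, `luhn_valid("4111 1111 1111 1111")` accepts the common Visa test number, while changing its last digit makes the check fail. Passing the Luhn test does not prove a number is a real card, only that it is not an obvious false positive.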
Analysis: Proxies and Other Uses
 HTTP/SOCKS proxies
   20.2% of machines publicly accessible
   Looked at 10,000 most active IPs
     Most likely to be used
   Checked IPs against Spamhaus list
     One is known spammer
     244 flagged as proxies or malware-infected
   Conclusion: usable, but can’t claim current use
 Distributed denial-of-service (DDoS)
   Question: how much bandwidth?
   Looked up connection types for IPs via ip2location
   65% of analyzable IPs used cable/DSL
   Low baseline of 435 kbps upstream: 19 Gbps total
   Add in corporate connections (22%): much higher
   Caveat: could not look up connection type for two-thirds of hosts
Analysis: Passwords
 Sophos poll (March 2009): 33% of Internet users use poor password practices (n = 676)
   Torpig supplied a lot of passwords: we can validate
   297,262 user/password pairs from 52,540 machines
   28% of victims reused passwords across 368,501 sites, similar to Sophos
 Password strength
   Fed 173,686 unique passwords to John the Ripper
   65 minutes: ~56,000 cracked (simple replacement)
   +10 minutes: ~14,000 cracked (wordlist)
   +24 hours: ~30,000 cracked (brute force)
   40% cracked in < 75 mins
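The 40% figure follows directly from the stage counts on the slide. A short sketch of the cumulative arithmetic, using only the slide's rounded numbers:

```python
from typing import List, Tuple

UNIQUE_PASSWORDS = 173_686

# (stage, additional minutes, approximate passwords cracked in that stage)
STAGES = [
    ("simple replacement", 65, 56_000),
    ("wordlist", 10, 14_000),
    ("brute force", 24 * 60, 30_000),
]


def cumulative(stages, total: int) -> List[Tuple[str, int, int, float]]:
    """Running totals: (stage, elapsed minutes, cracked so far, fraction)."""
    rows = []
    minutes = cracked = 0
    for name, extra_minutes, count in stages:
        minutes += extra_minutes
        cracked += count
        rows.append((name, minutes, cracked, cracked / total))
    return rows


for name, minutes, cracked, fraction in cumulative(STAGES, UNIQUE_PASSWORDS):
    print(f"{name:>18}: {minutes:5d} min, {cracked:7,d} cracked, {fraction:.1%}")
```

After the first two stages (75 minutes) about 70,000 of the 173,686 passwords are recovered, which is the roughly 40% quoted; adding the 24-hour brute-force stage brings the total to about 100,000.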
Conclusion
 Contributions:
   Comprehensive analysis of Torpig
   Insight into victims
   Usability of botnets for fun, profit, attack
 Lessons:
   IP counting is wildly imprecise. Do not use it
   User culture is a big problem
     Lots of passwords were guessed easily in this sample
     Intuition: users do not understand usage risks
     Solution: educate, educate, educate
   Coordination with registrars, hosting facilities, victim institutions, law enforcement is hard
     Makes redressing victims difficult
     Solution: regulatory intervention
