Botnet Behavior and
Detection Strategies
Brad Wilder
Overview
What are botnets?
What security threats are associated with botnets?
How prevalent are botnets?
What strategies can we use to detect and contain botnets?
What Are Botnets?
Basic Definitions
Botnets are networks of malware-infected machines that can be
controlled by a remote adversary. They consist of:
– Bots: malware-infected machines
– Botmaster (aka bot herder): the attacker who controls the network
– Command and Control (C&C) channel: the communication channel over which
the botmaster communicates with and issues commands to the bots
– Bot client: the particular malware on which the bot is based
The malware may be a virus, worm, Trojan horse, spyware, rootkit, or any
other malicious/unwelcome software
What Are Botnets?
Basic Architecture
Botnet Security Threats
Information Infiltration
Intellectual property and personal information theft
– Trade secrets
– Military intelligence
– Banking credentials
– Usernames and passwords
– Information on personal preferences and habits
Key logging
Phishing/man-in-the-browser
Forms the basis for massive copyright infringement and identity theft
Botnet Security Threats
Information Infiltration (cont’d)
Stone-Gross et al. hijacked the Torpig botnet, used for spam and phishing
attacks, for 10 days in early 2009, during which time they collected:
– ~70GB of data from more than 180,000 victims
– 8310 financial account credentials at 410 different institutions
– ~300,000 username-password pairs from 52,540 different infected machines
28% of the victims reused the same credentials at other Web sites, which
exposed another 368,501 Web site accounts
Mariposa botnet, taken down in Spain in March
– Sensitive information from 800,000 users, including half of the Fortune 1000
companies and more than 40 of the world’s major banks
Botnet Security Threats
DDoS
Distributed Denial of Service attacks
– Denial of Service attacks involve flooding a server with so much traffic that it
crashes due to the unexpected load
– Botnets distribute the workload among many bots
– Can be used to take down critical infrastructure
– Also used in extortion plots
– DDoS attacks are far more difficult to stop than DoS attacks, since blocking one
IP address does not stop the others
Torpig had an aggregate bandwidth of 17 Gbps, not counting
corporate networks, which accounted for 22% of the total
Botnet Security Threats
Spam
95% of all spam is thought to originate from botnets
Spam represents 90% of all email traffic
160 billion spam messages per day!
Spam is not just irritating; it causes noticeable effects for the end user
– Slows connection speed
– Can steal contact information from your email inbox
– Is a conduit for spreading infections
Spam is virtually free to send, but costs time for the recipient to sift through, and
even more if a malware payload is delivered successfully
Botnet Security Threats
Cyber Attack Sophistication vs. Cyber Criminal Sophistication
[Figure: attack sophistication and intruder knowledge, each on a scale from low to high, plotted over time from 1980 to 2000+. Tools and techniques labeled on the chart: password guessing, self-replicating code, password cracking, exploiting known vulnerabilities, disabling audits, back doors, hijacking sessions, burglaries, sweepers, sniffers, packet spoofing, GUI, network mgmt. diagnostics, automated probes/scans, denial of service, www attacks, “stealth” / advanced scanning techniques, distributed attack tools, staged attack, cross site scripting, bots.]
Source: CERT
How Prevalent Are Botnets?
Size Estimation Is Difficult
Botnet footprint: total number of bots that have come under the botnet’s
control over time
Botnet live population: number of bots under the botnet’s control at the
same time
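To make the two definitions concrete, here is a minimal Python sketch. The sighting log, the bot IDs, and the one-hour window are all hypothetical and chosen only for illustration; real measurements would come from C&C or sinkhole data.

```python
from collections import defaultdict

# Hypothetical sightings: (bot_id, hour in which the bot contacted the C&C server).
sightings = [
    ("bot-a", 0), ("bot-b", 0),
    ("bot-a", 1), ("bot-c", 1),
    ("bot-d", 2),
]

def footprint(sightings):
    """Number of distinct bots seen at any time during the observation."""
    return len({bot_id for bot_id, _ in sightings})

def live_population(sightings):
    """Number of distinct bots seen in each time window (here, each hour)."""
    per_hour = defaultdict(set)
    for bot_id, hour in sightings:
        per_hour[hour].add(bot_id)
    return {hour: len(bots) for hour, bots in sorted(per_hour.items())}

print(footprint(sightings))        # 4
print(live_population(sightings))  # {0: 2, 1: 2, 2: 1}
```

The footprint (4) exceeds every hourly live population because bots come and go during the observation period.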
There is no single reliable way to measure the size of a botnet; common
approaches include:
– Analyzing DNS traffic for bots locating a C&C server or querying DNS
blacklists to see if they have been flagged
– Redirecting C&C traffic into sinkholes
– Infiltrating the botnet C&C server
Most methods rely on counting bot IDs or IP addresses
How Prevalent Are Botnets?
Size Estimation Is Difficult (cont’d)
Bot IDs can be changed at the whim of the botmaster, and may be inflated
to make the botnet appear larger
IP addresses do not map one-to-one onto machines
– One of the shortcomings of IPv4
– DHCP: dynamic allocation of IP addresses; the same machine may appear under
different IP addresses over time, which inflates the size estimate
– NAT: allows multiple machines on the same private network to share a single
public IP address, which deflates the size estimate
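A small Python sketch can illustrate both effects. The machine names and addresses below are made up (the addresses come from ranges reserved for documentation); the point is only that counting distinct IP addresses can land above or below the true machine count.

```python
# Hypothetical observations: (machine, source IP address seen by the observer).
observations = [
    # DHCP churn: one machine appears under three addresses over time.
    ("laptop-1", "198.51.100.10"),
    ("laptop-1", "198.51.100.57"),
    ("laptop-1", "198.51.100.99"),
    # NAT: three machines share one public address.
    ("office-pc-1", "203.0.113.5"),
    ("office-pc-2", "203.0.113.5"),
    ("office-pc-3", "203.0.113.5"),
]

def count_unique(observations, field):
    """Count distinct values in a tuple position (0 = machine, 1 = IP address)."""
    return len({obs[field] for obs in observations})

dhcp_only = observations[:3]
nat_only = observations[3:]

print(count_unique(dhcp_only, 0), count_unique(dhcp_only, 1))  # 1 3
print(count_unique(nat_only, 0), count_unique(nat_only, 1))    # 3 1
```

In the DHCP case an IP-based count reports three hosts where there is one; in the NAT case it reports one host where there are three.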
How Prevalent Are Botnets?
Sizes May Be Overrepresented
Sizes are often erroneously reported
The Mariposa botnet was widely reported to have claimed more than 12 million
hosts
– The original statement referred to 12 million IP addresses, not hosts
– It still must have compromised hundreds of thousands and possibly millions of
computers
– What is most surprising is the botmasters’ utter lack of proficiency
How Prevalent Are Botnets?
Torpig Case Study
Stone-Gross et al. showed that IP address information may give a basis
for estimating the size of a botnet
– Over 10 days, they observed 1.2 million IP addresses
– Determined later that the botnet had a footprint of 182,800 bots
– Estimated an average live population of ~49,000 bots, based on the rate of new
IP addresses used
– Found that the IP address count overstated the botnet footprint by roughly an
order of magnitude
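The figures on this slide can be checked with a few lines of Python. The numbers are taken from the bullets above; the variable names are only for illustration, and the live population is itself an approximate figure.

```python
unique_ips = 1_200_000  # IP addresses observed over 10 days
footprint = 182_800     # bots in the botnet footprint
avg_live = 49_000       # estimated average live population (approximate)

ips_per_bot = unique_ips / footprint
live_fraction = avg_live / footprint

print(f"IP addresses per bot: {ips_per_bot:.1f}")                # 6.6
print(f"Average live share of footprint: {live_fraction:.0%}")   # 27%
```

Counting IP addresses alone would therefore overstate the footprint by a factor of about 6.6, and on average only about a quarter of the footprint was live at once.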
How Prevalent Are Botnets?
Total Number Of Bots In Operation
The difficulty in estimating the size of a single botnet further compounds
the difficulty of quantifying the entire botnet problem
There may be significant overlap among botnets, leading to
overestimation
Current estimates diverge widely
– Very conservative estimates put the total number in the hundreds of thousands
– More convincing estimates put the total number in the millions or tens of
millions, spread across perhaps thousands of botnets
What Can Be Done
Prevention Strategies
Difference between prevention and detection
– Prevention involves stopping the spread of malware
– Detection is a reactive approach
PEBKAC (problem exists between keyboard and chair)
– Only 46% of computer users always update their AV software
– 30-60% of users have little to no knowledge about basic security issues
– Almost half of users who open spam do so intentionally
Zero-day viruses
– 20% of malware goes undetected even in the best of cases
What Can Be Done
Basic Detection Strategies
Blacklisting domain names or IPs that exhibit problematic behavior
– Use honeypots: software traps into which malware can be lured
– Spam boxes can be used to study spam behavior
Early (naive) attempts were overly simplistic
– Listening on particular ports: port assignments are only conventions, and bots
need not follow them
– Examining packet contents: does not work if the transmission is encrypted, or if
the bot commands are not known ahead of time
What Can Be Done
IRC-Based Detection Strategies
Traffic analysis seems to be the most promising method for detecting
botnet C&C activity
Strayer et al. showed how a pipeline of successive filters could be used to
distill network traffic
They started with a base pool of over 9 million traffic flows collected over a
4-month period (>600 GB of TCP/IP header information alone), then added
42 botnet C&C flows that they generated with a bot under their control
over the course of hours
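The idea of a pipeline of successive filters can be sketched in Python as follows. The transcript does not list the researchers' actual filter criteria, so the `Flow` fields, the three predicates, and their thresholds are all assumptions made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class Flow:
    src: str
    dst: str
    protocol: str
    packets: int
    duration_s: float

# Hypothetical filter predicates; each returns True if the flow should be kept.
def is_tcp(flow):
    return flow.protocol == "tcp"

def has_enough_packets(flow):
    return flow.packets >= 2

def is_long_lived(flow):
    return flow.duration_s >= 60.0

def run_pipeline(flows, filters):
    """Apply each filter in order and record how many flows survive each stage."""
    counts = [len(flows)]
    for keep in filters:
        flows = [f for f in flows if keep(f)]
        counts.append(len(flows))
    return flows, counts

flows = [
    Flow("10.0.0.1", "10.0.0.9", "tcp", 400, 3600.0),
    Flow("10.0.0.2", "10.0.0.9", "tcp", 1, 0.1),
    Flow("10.0.0.3", "10.0.0.8", "udp", 50, 120.0),
    Flow("10.0.0.4", "10.0.0.7", "tcp", 30, 5.0),
]
survivors, counts = run_pipeline(flows, [is_tcp, has_enough_packets, is_long_lived])
print(counts)  # [4, 3, 2, 1]
```

Each stage is cheap and discards flows that cannot be C&C traffic, so the more expensive later stages only see a small fraction of the original pool.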
What Can Be Done
IRC-Based Detection Strategies (cont’d)
Classifier stage is used to group flows into classes of communication
– Interactive
– Bulk data transfers
– Streaming
– Transactional
Even though this seemed promising, the researchers omitted this stage in
their implementation because there were too many false negatives
What Can Be Done
IRC-Based Detection Strategies (cont’d)
Correlator stage attempts to take flows that occur very close to each other
according to some metric and correlate them
Want to find flows that are the product of similar applications, that
demonstrate a causal relationship with one another, and that follow the
multicast model of communication
Associate with each flow a vector quantifying certain metrics
– Based on temporal qualities in their implementation
Group flows pairwise and plot each pair against the distance between the
two flows’ vectors
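As a rough illustration of the correlator idea, the Python sketch below computes the distance between every pair of flow vectors and keeps the pairs that are close. The two temporal metrics, the sample values, the use of Euclidean distance, and the threshold are all assumptions for the example, not the researchers' actual choices.

```python
from itertools import combinations
import math

# Hypothetical per-flow vectors of temporal metrics, e.g.
# (mean seconds between packets, flow duration in minutes).
flow_vectors = {
    "flow-1": (30.0, 55.0),
    "flow-2": (31.0, 54.0),
    "flow-3": (0.2, 3.0),
}

def pairwise_distances(vectors):
    """Euclidean distance between every pair of flow vectors, smallest first."""
    pairs = [
        (math.dist(vectors[a], vectors[b]), a, b)
        for a, b in combinations(sorted(vectors), 2)
    ]
    return sorted(pairs)

def correlated_pairs(vectors, threshold):
    """Pairs of flows whose vectors are closer than the assumed threshold."""
    return [(a, b) for d, a, b in pairwise_distances(vectors) if d < threshold]

print(correlated_pairs(flow_vectors, threshold=5.0))  # [('flow-1', 'flow-2')]
```

Flows with similar timing behavior end up close together, which is what one would expect of bots receiving the same commands over the same channel.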
What Can Be Done
IRC-Based Detection Strategies (cont’d)
Topological Analyzer stage attempts to find a common node that would
indicate the C&C channel
The researchers were able to identify 9 out of 10 bots, and find the C&C
channel
Confirmed their original hypothesis that IRC-based C&C flows are highly
correlated
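One simple way to look for a common node is to count how often each endpoint appears among the correlated flows. The Python sketch below does this on made-up flows with documentation-range addresses; it is a simplified stand-in, and the researchers' actual topological analysis may differ.

```python
from collections import Counter

# Hypothetical correlated flows as (source, destination) endpoint pairs.
correlated_flows = [
    ("10.0.0.11", "192.0.2.50"),
    ("10.0.0.12", "192.0.2.50"),
    ("10.0.0.13", "192.0.2.50"),
    ("10.0.0.14", "198.51.100.7"),
]

def most_common_endpoint(flows):
    """Return the endpoint that appears in the most flows, with its count."""
    counts = Counter()
    for src, dst in flows:
        counts.update({src, dst})  # count each endpoint once per flow
    return counts.most_common(1)[0]

hub, degree = most_common_endpoint(correlated_flows)
print(hub, degree)  # 192.0.2.50 3
```

An endpoint shared by many correlated flows is a candidate for the C&C server, and the hosts on the other end of those flows are candidate bots.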
What Can Be Done
Fast-Flux And Domain Flux
Fast-flux: bots query a given domain looking for the C&C server, but the
domain is mapped onto a set of constantly changing IP addresses
– Researchers have been able to counter this because the domain is still a single
point of entry
– Detection is based on DNS traffic similarity
– The method was strongly affected by experimental parameters, and dependent
on blacklists
Domain flux: the domain names themselves change; each bot contains a
domain generation algorithm
– Stone-Gross et al. countered this when hijacking Torpig
– The arms race is stacked against defenders; countering it domain by domain is
not scalable
– This technique is used by Conficker to generate 50,000 domains a day
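The sketch below shows, in Python, why domain flux is hard to counter. It is a toy generator written for this example, not the algorithm used by Torpig or Conficker; the seed string, the hashing scheme, and the reserved `.example` suffix are all assumptions.

```python
import hashlib
from datetime import date

def candidate_domains(seed, day, count, tld=".example"):
    """Toy DGA: derive `count` deterministic names from a seed and a date."""
    names = []
    for i in range(count):
        material = f"{seed}:{day.isoformat()}:{i}".encode()
        digest = hashlib.sha256(material).hexdigest()
        names.append(digest[:12] + tld)
    return names

day_one = candidate_domains("demo-seed", date(2009, 1, 25), count=5)
day_two = candidate_domains("demo-seed", date(2009, 1, 26), count=5)

# The same inputs always give the same list, so bots and botmaster agree
# on where to meet without communicating; the list changes every day.
print(day_one == candidate_domains("demo-seed", date(2009, 1, 25), count=5))  # True
print(set(day_one).isdisjoint(day_two))
```

A defender who knows the algorithm can compute the same list in advance, but has to register or block every candidate, while the botmaster only needs one of them to work. At tens of thousands of candidates per day, that asymmetry is what makes the approach unscalable for defenders.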
What Can Be Done
Current And Future Trends
P2P botnets: distributed architecture makes them much more resilient
– Probably best countered with traffic flow analysis, though this remains an area
of intense research
Smaller size
More bandwidth
Decentralized C&C channels
Advanced customized encryption
IP disguising
Conclusion
What are botnets?
What security threats are associated with botnets?
How prevalent are botnets?
What strategies can we use to detect and contain botnets?
Thank You
Q&A
QUESTIONS???