Polygraph: Automatic Signature Generation for Polymorphic Worms
Polygraph: Automatically Generating
Signatures for Polymorphic Worms
James Newsome, Brad Karp, and
Dawn Song
Carnegie Mellon University
Presented by Ryan Gates
Overview
Goal
Composition of a worm
Invariant bytes and Tokens
Types of signatures
◦ Conjunction
◦ Token Subsequence
◦ Bayes
Polygraph Signature Generator
Metrics
Results
Evaluation
Goal
Automate the generation of worm
signatures
◦ Specifically polymorphic worms
Prevent polymorphic worms from going
undetected
◦ Including perfectly polymorphic instances
Decomposition of a worm
Figure 1. Polymorphed ApacheKnacker
Invariant bytes
Wildcard bytes
Code bytes
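To make the three byte classes concrete, here is a toy instance generator. It is a minimal sketch that assumes a simplified layout: the constants, the layout, and the names `INVARIANTS`, `random_bytes`, and `make_instance` are hypothetical and do not reproduce the real ApacheKnacker exploit. Invariant bytes stay fixed while wildcard and code bytes change from one instance to the next.

```python
import random

# Hypothetical invariant bytes, modeled on the slides' examples:
# protocol framing ("GET ", " HTTP/1.1\r\n") and an overwrite value ("\xff\xbf").
INVARIANTS = (b"GET ", b" HTTP/1.1\r\n", b"\xff\xbf")


def random_bytes(rng: random.Random, n: int) -> bytes:
    """Return n pseudo-random bytes."""
    return bytes(rng.randrange(256) for _ in range(n))


def make_instance(rng: random.Random) -> bytes:
    """Build one toy polymorphic instance (hypothetical layout)."""
    wildcard = random_bytes(rng, rng.randint(4, 12))  # wildcard bytes: any value works
    code = random_bytes(rng, rng.randint(16, 32))     # stand-in for obfuscated code bytes
    prefix, framing, overwrite = INVARIANTS
    return prefix + wildcard + framing + code + overwrite


rng = random.Random(0)  # fixed seed so the example is reproducible
instances = [make_instance(rng) for _ in range(5)]
# Every instance contains all invariant bytes, yet the instances differ.
assert all(inv in inst for inst in instances for inv in INVARIANTS)
```

Only the invariant bytes are shared across instances, which is why a signature generator has to build on them.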
Invariant Bytes
Invariant framing
◦ Reserved keywords or well-known binary
constants that are part of the wire protocol
◦ For example, "HTTP" or "GET"
Invariant overwrite values
◦ High-order bytes of the overwritten address
◦ For example, in BIND-TSIG, "\xFF\xBF"
Many invariant substrings are too short
on their own to avoid false positives.
The solution is to represent each run of
invariant bytes as a token and to combine
several tokens into one signature
Tokens
A token must not be a substring of
another token
◦ For example, "HTTP", not "TTP"
Conjunction Signature
Token Subsequence Signature
Bayes Signature
◦ Each token's score reflects how much more
likely the token is to appear in an actual worm
flow than in an innocuous flow.
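Token extraction, the step that turns the invariant bytes shared by worm samples into tokens subject to the substring rule above, can be sketched by brute force for small pools. This is a minimal sketch, not Polygraph's implementation: `extract_tokens`, `min_len`, and `min_cover` are hypothetical names, and applying the substring rule by comparing sample coverage is an assumption (one way to read "HTTP, not TTP").

```python
def extract_tokens(samples: list[bytes], min_len: int = 2, min_cover: int = 2) -> list[bytes]:
    """Return candidate tokens: substrings of length >= min_len that occur in at
    least min_cover samples. A substring is dropped when a longer frequent
    substring contains it and occurs in exactly the same samples."""
    coverage: dict[bytes, int] = {}
    for sample in samples:
        # Collect each distinct substring once per sample, so coverage counts samples.
        seen = {sample[i:j]
                for i in range(len(sample))
                for j in range(i + min_len, len(sample) + 1)}
        for sub in seen:
            coverage[sub] = coverage.get(sub, 0) + 1

    frequent = {sub: c for sub, c in coverage.items() if c >= min_cover}
    tokens = []
    for sub, count in frequent.items():
        # Every sample containing a longer string also contains sub, so equal
        # counts mean identical samples: sub adds no information and is dropped.
        if any(other != sub and sub in other and frequent[other] == count
               for other in frequent):
            continue
        tokens.append(sub)
    return sorted(tokens)


samples = [b"GET /a HTTP/1.1", b"GET /bb HTTP/1.1"]
print(extract_tokens(samples, min_len=3))  # [b' HTTP/1.1', b'GET /']
```

Here "TTP" is discarded because it only ever occurs inside " HTTP/1.1". A real system would need a more efficient substring structure than this quadratic enumeration.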
Conjunction Signatures
Every token in the conjunction signature
must be found in the payload for there to
be a match, in any order
Requiring all tokens to match reduces
false positives
For example, in the ApacheKnacker
signature, ‘GET’, ‘HTTP/1.1\r\n’, and ‘:’ are
tokens in a conjunction signature
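Matching a conjunction signature amounts to checking that every token occurs somewhere in the payload. The sketch below is illustrative; `conjunction_match` and `signature` are hypothetical names, and the three tokens are the ApacheKnacker tokens listed on this slide.

```python
def conjunction_match(tokens: list[bytes], payload: bytes) -> bool:
    """A conjunction signature matches when every token appears somewhere
    in the payload, in any order."""
    return all(token in payload for token in tokens)


# Tokens taken from the ApacheKnacker example on this slide.
signature = [b"GET", b"HTTP/1.1\r\n", b":"]

print(conjunction_match(signature, b"GET /x HTTP/1.1\r\nHost: a\r\n"))  # True
print(conjunction_match(signature, b"GET /x HTTP/1.0\r\nHost: a\r\n"))  # False: token missing
print(conjunction_match(signature, b"Host: a HTTP/1.1\r\n GET"))        # True: order is ignored
```

The last call shows what the conjunction ignores: token order, which the token subsequence signature adds.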
Token Subsequence Signatures
Similar to the conjunction signature, but
more restrictive.
All tokens must be present in the correct
order, which further reduces false positives
Typically modeled using regular
expressions
For example, in the ApacheKnacker signature,
“GET.*HTTP/1.1\r\n.*…”
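A token subsequence can be matched either with an ordered search or with the equivalent regular expression `token1.*token2.*...`. In this minimal sketch (the function names are hypothetical), taking the leftmost occurrence of each token is enough, because it leaves the most room for the tokens that follow; `re.DOTALL` is needed so that `.*` also spans the `\r\n` bytes inside payloads.

```python
import re


def subsequence_match(tokens: list[bytes], payload: bytes) -> bool:
    """Match if the tokens occur in the payload in the given order, without overlapping."""
    pos = 0
    for token in tokens:
        idx = payload.find(token, pos)
        if idx < 0:
            return False
        pos = idx + len(token)  # the next token must start after this one ends
    return True


def to_regex(tokens: list[bytes]) -> re.Pattern:
    """Express the same signature as a regular expression: token1.*token2.*..."""
    return re.compile(b".*".join(re.escape(t) for t in tokens), re.DOTALL)


signature = [b"GET", b"HTTP/1.1\r\n", b":"]
pattern = to_regex(signature)
for payload in (b"GET /x HTTP/1.1\r\nHost: a", b"Host: a HTTP/1.1\r\n GET"):
    # Both forms agree: only the in-order payload matches.
    print(subsequence_match(signature, payload), pattern.search(payload) is not None)
```

The second payload contains all three tokens, so a conjunction signature would accept it, but the subsequence rejects it because "GET" comes last.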
Bayes Signature
Set of tokens, each with a score
If the sum of the scores of the tokens present
in a flow exceeds a threshold, the flow is
considered a match.
A sample signature includes the token
‘\x00\x00\xFA’ with score 1.7574
Benefits
◦ Less rigid: tokens that are common in innocuous
traffic get low scores, which helps prevent false positives.
◦ Higher-quality signatures when the suspicious
pool is more diverse.
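Scoring and matching can be sketched together. The slides do not give the scoring formula, so this sketch assumes a naive-Bayes-style log ratio, log(P(token in suspicious flow) / P(token in innocuous flow)), with a small smoothing constant `eps` (also an assumption). The pools, the threshold 5.0, and the names `bayes_scores` and `bayes_match` are hypothetical; the slide's 1.7574 came from real data and is not reproduced here.

```python
import math


def bayes_scores(tokens, suspicious, innocuous, eps=1e-3):
    """Score each token by log(P(token in suspicious flow) / P(token in innocuous flow)).
    eps is an assumed smoothing constant that keeps the ratio finite."""
    scores = {}
    for token in tokens:
        p_s = sum(token in flow for flow in suspicious) / len(suspicious)
        p_i = sum(token in flow for flow in innocuous) / len(innocuous)
        scores[token] = math.log((p_s + eps) / (p_i + eps))
    return scores


def bayes_match(scores, threshold, payload):
    """Sum the scores of the tokens present in the payload and compare with the threshold."""
    return sum(s for token, s in scores.items() if token in payload) > threshold


suspicious = [b"\x00\x00\xfa\xff\xbf abc", b"\x00\x00\xfa\xff\xbf xyz"]
innocuous = [b"\x00\x00\xfa normal", b"plain query"]
scores = bayes_scores([b"\x00\x00\xfa", b"\xff\xbf"], suspicious, innocuous)
# The token that also shows up in innocuous traffic gets the lower score.
print(scores)
print(bayes_match(scores, 5.0, b"..\x00\x00\xfa.."))  # False
print(bayes_match(scores, 5.0, b"..\xff\xbf.."))      # True
```

Because the common token contributes little on its own, a flow that only shares innocuous-looking bytes stays below the threshold.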
Limitations of Signature Types
The Bayes signature is unaffected by noise
until the noise grows beyond 80% of the
suspicious pool, at which point it produces
100% false negatives
◦ At that noise level, the flow classifier is
doing a very poor job of classifying flows.
Conjunction and Token Subsequence signatures
cannot handle a pool containing multiple types of worms
◦ The solution is to use clustering to separate
the flows into manageable clusters
Clustering
Clustering helps the conjunction and
token subsequence signatures deal with
a variety of worms
Divides the suspicious pool into several
clusters, each containing one type of flow
◦ Clusters should not be too general (their
signatures would match innocuous flows)
◦ Clusters should not be too specific (they
would produce too many narrow signatures)
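One way to realize this trade-off is greedy agglomerative (hierarchical) clustering. The sketch below rests on several assumptions: the candidate tokens are given, each cluster's signature is the conjunction of the candidates shared by all of its flows, the pair whose merged signature has the lowest false positive rate on the innocuous pool is merged first, and merging stops once every possible merge exceeds `max_fp`. All names and flows are hypothetical, and this is a simplified illustration rather than Polygraph's exact procedure.

```python
def shared_tokens(cluster, candidates):
    """Candidate tokens present in every flow of the cluster."""
    return [t for t in candidates if all(t in flow for flow in cluster)]


def fp_rate(tokens, innocuous):
    """Fraction of innocuous flows matched by the conjunction of tokens."""
    if not tokens:
        return 1.0  # an empty conjunction would match every flow
    return sum(all(t in flow for t in tokens) for flow in innocuous) / len(innocuous)


def cluster_flows(suspicious, innocuous, candidates, max_fp=0.0):
    """Repeatedly merge the pair of clusters whose combined signature has the
    lowest false positive rate; stop when every merge would exceed max_fp."""
    clusters = [[flow] for flow in suspicious]
    while True:
        best = None  # (fp, i, j) of the best merge found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                fp = fp_rate(shared_tokens(clusters[i] + clusters[j], candidates), innocuous)
                if fp <= max_fp and (best is None or fp < best[0]):
                    best = (fp, i, j)
        if best is None:
            return clusters
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected


worm_a = [b"GET /x1 HTTP/1.1\r\n\xff\xbf", b"GET /y2 HTTP/1.1\r\n\xff\xbf"]
worm_b = [b"\x00\x00\xfa zz \xde\xad", b"\x00\x00\xfa qq \xde\xad"]
innocuous = [b"GET /index.html HTTP/1.1\r\nHost: a\r\n", b"\x00\x00\xfa normal dns"]
candidates = [b"GET /", b" HTTP/1.1\r\n", b"\xff\xbf", b"\x00\x00\xfa", b"\xde\xad"]
clusters = cluster_flows(worm_a + worm_b, innocuous, candidates)
print(len(clusters))  # 2: one cluster per worm
```

Merging a flow from each worm leaves no shared tokens (too general, so every innocuous flow would match), so the two families stay in separate clusters.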
Polygraph Signature Generator
The Polygraph monitor must be able to observe
the network's packet flows.
An imperfect flow classifier sorts network
flows into either the suspicious pool or the
innocuous pool.
Polygraph Signature Generator
The classifier does not distinguish between
different worms; it only separates suspicious
flows from innocuous flows.
The flow classifier is reliable, but imperfect.
The result is noise: innocuous flows that end up
in the suspicious pool.
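The sketch below shows how an imperfect classifier produces noise. The flows, their ground-truth labels, and the classifier heuristic (flag any payload containing a non-ASCII byte) are all hypothetical, as are the names `split_pools`, `noise_fraction`, and `toy_classifier`; a real flow classifier would be far more involved and would never see the labels.

```python
def split_pools(flows, is_suspicious):
    """Sort flows into (suspicious, innocuous) pools using a classifier predicate."""
    suspicious, innocuous = [], []
    for flow in flows:
        (suspicious if is_suspicious(flow) else innocuous).append(flow)
    return suspicious, innocuous


def noise_fraction(suspicious, is_worm):
    """Fraction of the suspicious pool that is not actually worm traffic."""
    if not suspicious:
        return 0.0
    return sum(not is_worm(flow) for flow in suspicious) / len(suspicious)


# Each flow is a hypothetical (payload, truly_a_worm) pair; the label is ground
# truth that a real classifier would not see.
flows = [
    (b"GET /a HTTP/1.1\r\n\xff\xbf", True),
    (b"GET /b HTTP/1.1\r\n\xff\xbf", True),
    (b"GET /caf\xc3\xa9 HTTP/1.1\r\n", False),  # innocuous, but has non-ASCII bytes
    (b"GET /index.html HTTP/1.1\r\n", False),
]


def toy_classifier(flow):
    """Hypothetical heuristic: flag any payload that contains a non-ASCII byte."""
    return any(byte >= 0x80 for byte in flow[0])


suspicious, innocuous = split_pools(flows, toy_classifier)
print(len(suspicious), len(innocuous))                   # 3 1
print(noise_fraction(suspicious, lambda flow: flow[1]))  # one third of the pool is noise
```

The misclassified innocuous request is exactly the kind of noise the signature generator must tolerate.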
Polygraph Signature Generator
Uses the samples in the suspicious pool to
generate signatures for the worms present
in that pool.
Resilient to noise in the suspicious pool
Metrics
Quality
◦ Low percentage of false positives and false negatives
Efficiency in generation
◦ Low computational cost
Efficiency in matching
◦ Matching should not slow down network traffic
Generate small signature sets
◦ Limit the number of signatures
Robustness
◦ Yield high-quality signatures even with noise and a
variety of worms
◦ Resistance to clever evasion by worms
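The quality metric can be computed directly from labeled evaluation traffic: the false negative rate is the fraction of worm flows a signature misses, and the false positive rate is the fraction of innocuous flows it matches. The signature, the flows, and the names `fp_fn_rates` and `signature` below are hypothetical.

```python
def fp_fn_rates(matches, worm_flows, innocuous_flows):
    """Return (false positive rate, false negative rate) for a signature predicate."""
    false_negatives = sum(not matches(flow) for flow in worm_flows)
    false_positives = sum(matches(flow) for flow in innocuous_flows)
    return false_positives / len(innocuous_flows), false_negatives / len(worm_flows)


# Hypothetical conjunction signature and labeled evaluation flows.
TOKENS = [b"GET", b"\xff\xbf"]


def signature(flow):
    """Match when every token is present (a conjunction signature)."""
    return all(t in flow for t in TOKENS)


worms = [b"GET /a \xff\xbf", b"GET /b \xff\xbf", b"GET /c"]         # last one is missed
benign = [b"GET /index.html", b"POST /form", b"GET /\xff\xbf.bin"]  # last one is flagged
print(fp_fn_rates(signature, worms, benign))  # one third false positives, one third false negatives
```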
Results | ApacheKnacker
Table 1. ApacheKnacker signatures. These signatures
were successfully generated for suspicious pools
containing at least 3 worm samples.
The best performer was Token Subsequence
The ordering used in the Token Subsequence
signature helps reduce the number of false positives.
Results | BIND-TSIG
Table 2. BIND-TSIG signatures. These signatures
were successfully generated for suspicious
pools containing at least 3 worm samples.
The best performers were Conjunction and
Token Subsequence.
The Bayes signature's quality degrades when its
tokens are also common in innocuous flows.
Results | Coincidental Pattern
The coincidental-pattern attack injects
seemingly invariant bytes into the wildcard
bytes to confuse the signature generator.
Contribution
Polygraph helps to automate signature
generation
Examined how worm polymorphism affects
worm signature generation and matching.
Accounted for imperfect classification of
network flows (noise)
Limitations
Cannot handle worms that lack invariant bytes
Requires a flow classifier and at least 3
worm samples
If the innocuous pool is too diverse,
there will be too many false positives.
Improvements and Future Work
Take advantage of multiple cores.
Incorporate the design of an efficient
flow classifier
Determine how feasible it is to inspect
network traffic
Develop an algorithm to choose the best
signature type to use
References
J. Newsome, B. Karp, and D. Song.
Polygraph: Automatically generating
signatures for polymorphic worms. In
IEEE Symposium on Security and Privacy,
2005.