Polygraph: Automatic Signature Generation for Polymorphic Worms

Polygraph: Automatically Generating Signatures for Polymorphic Worms
James Newsome, Brad Karp, and Dawn Song
Carnegie Mellon University
Presented by Ryan Gates
Overview
• Goal
• Composition of a worm
• Invariant bytes and tokens
• Types of signatures
◦ Conjunction
◦ Token subsequence
◦ Bayes
• Polygraph signature generator
• Metrics
• Results
• Evaluation
Goal
• Automate the generation of worm signatures
◦ Specifically for polymorphic worms
• Prevent polymorphic worms from going undetected
◦ Including perfectly polymorphic instances
Decomposition of a worm
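Figure 1. Polymorphed ApacheKnacker
• Invariant bytes: fixed values the exploit needs in order to succeed
• Wildcard bytes: can take any value without affecting the exploit
• Code bytes: the worm's payload, which polymorphic techniques such as encryption can vary between instances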

Invariant Bytes

• Invariant framing
◦ Reserved keywords or well-known binary constants that are part of the wire protocol
◦ For example, "HTTP" or "GET"
• Invariant overwrite values
◦ High-order bytes of the overwritten address
◦ For example, "\xFF\xBF" in BIND-TSIG
• Many invariant substrings are too short to avoid false positives on their own
• The solution is to represent each invariant substring as a token and build signatures from multiple tokens
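The overwrite-value example can be illustrated with a short sketch. It assumes a 32-bit little-endian (x86-style) target and two hypothetical stack addresses in the 0xBFFF.... range; neither address comes from the paper. If the exploit only needs to land somewhere in the right region, the low-order bytes can vary between instances while the high-order bytes, which little-endian order stores last, stay fixed.

```python
import struct


def as_stored(address: int) -> bytes:
    """Return a 32-bit address as it appears in memory on a little-endian machine."""
    return struct.pack("<I", address)


# Hypothetical return addresses used by two worm instances; both point into
# the same stack region, so only their low-order bytes differ.
instance_a = as_stored(0xBFFFF6A0)
instance_b = as_stored(0xBFFFE010)

# Little-endian order puts the high-order bytes last, so the shared
# invariant bytes are the final two bytes of each encoding.
invariant = instance_a[2:]
assert invariant == instance_b[2:]
```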
Tokens

• A token must not be a substring of another token
◦ For example, "HTTP" is a token but "TTP" is not
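A naive way to find tokens in a set of suspicious samples is sketched below. It assumes a token is a substring of at least `min_len` bytes that appears in at least `min_samples` samples, and it drops a candidate when it sits inside a longer candidate found in exactly the same samples (so "TTP" is dropped in favor of "HTTP"). The function name and both thresholds are hypothetical, and the brute-force search only suits tiny inputs; a practical implementation would need a more efficient structure such as a suffix tree.

```python
def extract_tokens(samples: list[bytes], min_len: int = 2, min_samples: int = 2) -> list[bytes]:
    # Map every substring of length >= min_len to the set of samples containing it.
    coverage: dict[bytes, set[int]] = {}
    for index, sample in enumerate(samples):
        for start in range(len(sample)):
            for end in range(start + min_len, len(sample) + 1):
                coverage.setdefault(sample[start:end], set()).add(index)

    # Keep only substrings shared by enough samples.
    frequent = {sub: idx for sub, idx in coverage.items() if len(idx) >= min_samples}

    tokens = []
    for sub, idx in frequent.items():
        # Skip sub if a longer frequent substring contains it and occurs in
        # exactly the same samples (e.g. b"TTP" inside b"HTTP").
        redundant = any(
            len(other) > len(sub) and sub in other and frequent[other] == idx
            for other in frequent
        )
        if not redundant:
            tokens.append(sub)
    # Longest tokens first; ties broken by byte value for a stable order.
    return sorted(tokens, key=lambda t: (-len(t), t))


samples = [b"GET /a HTTP/1.1", b"GET /bb HTTP/1.1", b"GET /ccc HTTP/1.1"]
print(extract_tokens(samples, min_samples=3))  # [b' HTTP/1.1', b'GET /']
```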
Types of Signatures
• Conjunction signature
• Token subsequence signature
• Bayes signature
◦ Each token is assigned a score reflecting how likely it is to appear in a worm flow rather than in an innocuous flow
Conjunction Signatures
• Every token in a conjunction signature must be found in the payload for there to be a match
• Requiring all tokens to match reduces false positives
• For example, in the ApacheKnacker signature, ‘GET’, ‘HTTP/1.1\r\n’, and ‘:’ are tokens in a conjunction signature
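A minimal sketch of conjunction matching, assuming each flow is represented as its raw payload bytes and a signature is a plain list of tokens. The helper name `matches_conjunction` and the example payloads are hypothetical.

```python
def matches_conjunction(signature: list[bytes], payload: bytes) -> bool:
    # A conjunction signature matches only if every token occurs somewhere
    # in the payload; the order and position of the tokens are ignored.
    return all(token in payload for token in signature)


# Tokens from the ApacheKnacker example on this slide.
apache_knacker = [b"GET", b"HTTP/1.1\r\n", b":"]

print(matches_conjunction(apache_knacker, b"GET /index.html HTTP/1.1\r\nHost: example\r\n"))  # True
print(matches_conjunction(apache_knacker, b"GET /index.html HTTP/1.0\r\n"))  # False: tokens missing
```

Because order is ignored, a payload such as b"HTTP/1.1\r\n: GET" also matches, which is the gap the token subsequence signature closes.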

Token Subsequence Signatures
• Similar to the conjunction signature, but more restrictive
• All tokens must be present in the correct order, which reduces false positives
• Typically modeled using regular expressions
• For example, the ApacheKnacker signature “GET.*HTTP/1.1\r\n.*…”
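The regular-expression form can be sketched as follows, assuming the ordered tokens are joined with `.*` so that any bytes may appear between them. The helper names are hypothetical. Unlike a conjunction signature, the same three tokens no longer match a payload in which they appear out of order.

```python
import re


def compile_subsequence(tokens: list[bytes]) -> re.Pattern:
    # Escape each token so bytes such as '.' are matched literally, then
    # allow any run of bytes (.*) between consecutive tokens.
    pattern = b".*".join(re.escape(token) for token in tokens)
    # DOTALL lets '.' also match newlines, which appear throughout HTTP payloads.
    return re.compile(pattern, re.DOTALL)


def matches_subsequence(signature: re.Pattern, payload: bytes) -> bool:
    return signature.search(payload) is not None


sig = compile_subsequence([b"GET", b"HTTP/1.1\r\n", b":"])
print(matches_subsequence(sig, b"GET /x HTTP/1.1\r\nHost: a\r\n"))  # True
print(matches_subsequence(sig, b"HTTP/1.1\r\n: GET"))  # False: tokens out of order
```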

Bayes Signature
• A set of tokens, each with a score
• If the sum of the scores of the tokens present in a flow exceeds a threshold, the flow is considered a match
• A sample signature would include ‘\x00\x00\xFA’: 1.7574
• Benefits
◦ Less rigid: tokens that are common in innocuous traffic receive low scores, which helps prevent false positives
◦ Higher-quality signatures when the suspicious pool is more diverse
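A Bayes-style signature can be sketched as below. It assumes each token's score is the log-ratio of its estimated frequency in the suspicious pool to its frequency in the innocuous pool, with add-one smoothing. The helper names, the threshold of 1.0, and the sample flows are all hypothetical, and the computed scores are not meant to reproduce the 1.7574 value above.

```python
import math


def train_bayes(tokens, suspicious, innocuous):
    """Score each token by how much more often it appears in suspicious flows than in innocuous ones."""
    scores = {}
    for token in tokens:
        # Add-one (Laplace) smoothing avoids division by zero and log(0);
        # this smoothing choice is an assumption of the sketch.
        p_suspicious = (sum(token in f for f in suspicious) + 1) / (len(suspicious) + 2)
        p_innocuous = (sum(token in f for f in innocuous) + 1) / (len(innocuous) + 2)
        scores[token] = math.log(p_suspicious / p_innocuous)
    return scores


def matches_bayes(scores, threshold, payload):
    # Sum the scores of the tokens present in the payload and compare with the threshold.
    return sum(score for token, score in scores.items() if token in payload) > threshold


suspicious = [b"GET \x00\x00\xfa one", b"GET \x00\x00\xfa two"]
innocuous = [b"GET /index.html", b"GET /a", b"GET /b"]
scores = train_bayes([b"GET", b"\x00\x00\xfa"], suspicious, innocuous)

# b"GET" appears in every flow, so its score is slightly negative;
# the rarer token gets a high score and dominates the sum.
print(matches_bayes(scores, 1.0, b"GET \x00\x00\xfa three"))  # True
print(matches_bayes(scores, 1.0, b"GET /home"))  # False
```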
Limitations of Signature Types

• Bayes signatures are unaffected by noise until it exceeds 80% of the suspicious pool; beyond that point, they produce 100% false negatives
◦ At that noise level, the flow classifier is doing a very poor job of classifying flows
• Conjunction and token subsequence signatures cannot handle multiple types of worms in the suspicious pool
◦ The solution is to use clustering to separate the worms into manageable clusters
Clustering
• Clustering helps conjunction and token subsequence signatures deal with variety in the suspicious pool
• Divides the suspicious pool into several clusters, each containing one type of flow
◦ Clusters should not be too general (a signature covering unrelated flows causes false positives)
◦ Clusters should not be too specific (many narrow signatures miss new worm instances)
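One plausible way to form such clusters is greedy bottom-up (agglomerative) clustering, sketched below under several assumptions: each cluster's signature is a conjunction of the maximal substrings shared by all of its flows, a merge is allowed only if that signature's false positive rate on the innocuous pool stays at or below a hypothetical `max_fp` budget, and the flows are made up. Refusing merges that exceed the budget keeps clusters from becoming too general, while merging whenever the budget allows keeps them from staying too specific.

```python
from itertools import combinations


def common_tokens(flows: list[bytes], min_len: int = 2) -> list[bytes]:
    # Maximal substrings (length >= min_len) shared by every flow in the cluster.
    first = flows[0]
    candidates = {first[i:j] for i in range(len(first)) for j in range(i + min_len, len(first) + 1)}
    shared = {c for c in candidates if all(c in f for f in flows[1:])}
    return [c for c in shared if not any(c != d and c in d for d in shared)]


def false_positive_rate(signature: list[bytes], innocuous: list[bytes]) -> float:
    # Conjunction matching against a non-empty innocuous pool;
    # an empty signature would match every flow.
    if not signature:
        return 1.0
    hits = sum(all(token in flow for token in signature) for flow in innocuous)
    return hits / len(innocuous)


def cluster(suspicious: list[bytes], innocuous: list[bytes], max_fp: float = 0.0) -> list[list[bytes]]:
    # Start with one cluster per suspicious flow and greedily merge pairs.
    clusters = [[flow] for flow in suspicious]
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            fp = false_positive_rate(common_tokens(clusters[a] + clusters[b]), innocuous)
            # Only merges within the false positive budget qualify; keep the lowest.
            if fp <= max_fp and (best is None or fp < best[0]):
                best = (fp, a, b)
        if best is None:
            break  # any further merge would make a signature too general
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]  # safe: combinations yields a < b, so index a is unaffected
    return clusters


suspicious = [
    b"GET /x HTTP/1.1 \xff\xbfAAAA",
    b"GET /y HTTP/1.1 \xff\xbfBBBB",
    b"\x01\x02TSIG\xff\xffCCCC",
    b"\x01\x02TSIG\xff\xffDDDD",
]
innocuous = [b"GET /index.html HTTP/1.1 ", b"\x01\x02 query"]
for group in cluster(suspicious, innocuous):
    print(sorted(common_tokens(group)))
```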
Polygraph Signature Generator


• The Polygraph monitor must have access to the network's packet flows
• An imperfect flow classifier sorts flows into either the suspicious pool or the innocuous pool
• The classifier does not distinguish between different worms; it only separates suspicious flows from innocuous flows
• The flow classifier is reliable but imperfect
• Misclassified flows (e.g., innocuous flows that land in the suspicious pool) become noise
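The effect of noise on a signature built only from bytes shared by every suspicious flow can be seen in a small sketch. The flows and helper names are hypothetical: with two worm samples, the shared substrings include the overwrite bytes \xFF\xBF, but adding a single misclassified innocuous flow leaves only generic HTTP fragments, which would also match ordinary traffic.

```python
def shared_substrings(flows: list[bytes], min_len: int = 2) -> set[bytes]:
    # Every substring of the first flow (length >= min_len) that also occurs in all other flows.
    first = flows[0]
    candidates = {first[i:j] for i in range(len(first)) for j in range(i + min_len, len(first) + 1)}
    return {c for c in candidates if all(c in f for f in flows[1:])}


def maximal(subs: set[bytes]) -> set[bytes]:
    # Drop substrings contained in a longer shared substring.
    return {s for s in subs if not any(s != t and s in t for t in subs)}


worms = [b"GET /a HTTP/1.1\r\n\xff\xbf", b"GET /bb HTTP/1.1\r\n\xff\xbf"]
noise = b"POST /form HTTP/1.0\r\n"  # hypothetical innocuous flow misclassified as suspicious

print(sorted(maximal(shared_substrings(worms))))
print(sorted(maximal(shared_substrings(worms + [noise]))))
```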

• Uses the flow samples to generate signatures for the worms present in the suspicious pool
• Resilient to noise in the system

Metrics

• Quality
◦ Low percentage of false positives and false negatives
• Efficiency in generation
◦ Low computational cost
• Efficiency in matching
◦ Matching should not slow down network traffic
• Small signature sets
◦ Limit the number of signatures
• Robustness
◦ Yield high-quality signatures even with noise and a variety of worms
◦ Resist clever evasion by worms
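The quality metric can be made concrete with a small sketch that measures a signature's false positive and false negative rates on labeled flows. It assumes a signature is any function from payload bytes to a match decision; the conjunction signature, helper names, and flows are hypothetical.

```python
from typing import Callable, Sequence


def evaluate(
    matches: Callable[[bytes], bool],
    worm_flows: Sequence[bytes],
    innocuous_flows: Sequence[bytes],
) -> tuple[float, float]:
    # False negative rate: fraction of worm flows the signature misses.
    fn_rate = sum(not matches(f) for f in worm_flows) / len(worm_flows)
    # False positive rate: fraction of innocuous flows the signature wrongly matches.
    fp_rate = sum(matches(f) for f in innocuous_flows) / len(innocuous_flows)
    return fp_rate, fn_rate


SIGNATURE = [b"GET", b"\xff\xbf"]


def conjunction(payload: bytes) -> bool:
    # Hypothetical conjunction signature: every token must be present.
    return all(token in payload for token in SIGNATURE)


worms = [b"GET /x HTTP/1.1\r\n\xff\xbf", b"GET /y HTTP/1.1\r\n\xff\xbf"]
innocuous = [b"GET / HTTP/1.1\r\n", b"GET /img\xff\xbf"]
print(evaluate(conjunction, worms, innocuous))  # (0.5, 0.0)
```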
Results | ApacheKnacker
Table 1. ApacheKnacker signatures. These signatures were successfully generated for suspicious pools containing at least 3 worm samples.
• The best performer was the token subsequence signature
• The ordering used in the token subsequence signature helps reduce the number of false positives

Results | BIND-TSIG
Table 2. BIND-TSIG signatures. These signatures were successfully generated for suspicious pools containing at least 3 worm samples.
• The best performers were the conjunction and token subsequence signatures
• Bayes signature quality degrades when its tokens are also common in innocuous flows

Results | Coincidental Pattern

The coincidental-pattern attack injects bytes that look invariant into the wildcard bytes to confuse the signature generator.
Contribution

• Polygraph helps automate signature generation
• Examined how polymorphism in worms affects worm signature generation and matching
• Accounted for imperfect classification of network flows (noise)
Limitations

• Cannot handle worms that lack invariant bytes
• Requires a flow classifier and at least 3 worm samples
• If the innocuous pool is too diverse, there will be too many false positives
Improvements and Future Work
• Take advantage of multiple cores
• Incorporate the design of an efficient flow classifier
• Determine how feasible it is to inspect network traffic
• Determine an algorithm to choose the best signature to use

References

J. Newsome, B. Karp, and D. Song. Polygraph: Automatically generating signatures for polymorphic worms. In IEEE Symposium on Security and Privacy, 2005.