Intrusion Detection using Sequences of System Calls
By S. Hofmeyr & S. Forrest
Overview
Focus: privileged processes
Discriminator: system call sequences
Building a database: defining “normal”
Detecting anomalies: how to measure
Results: promising numbers
Concerns: remaining doubts
Extensions of research: Jones, Li & Lin
Inspiration
Human immune system
Recognition of self
Rejection of nonself
How would we describe “self” for a software system, or a program?
Focus and Motivation
Focus on privileged processes
Exploitation can give a user root access
They provide a natural boundary
e.g. telnet daemon, login daemon
Privileged processes are easier to track
Specific, limited function
Stable over time
Contrast with the diversity of user actions
Where do we look?
Need to distinguish when:
Privileged process runs normally
Privileged process exhibits an anomaly
The discriminator is the observable entity used to distinguish between these two
Use sequences of system calls as the discriminator, the signature
How much detail?
Discriminator is sequences of system calls
Simple temporal ordering is chosen
Ignore parameters
Ignore specific timing information
Ignore everything else!
Why? As much as possible, work with simple assumptions
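As a sketch of how much detail gets thrown away, the reduction from a raw trace to the discriminator can be illustrated as below. The strace-style line format and the regex are assumptions for illustration; only the call names survive.

```python
import re

# Illustrative sketch: reduce strace-like trace lines to bare system-call
# names, discarding timestamps, arguments, and return values (all of which
# the discriminator ignores). The input line format is an assumption.
LINE = re.compile(r"^\s*(?:\d+\.\d+\s+)?(\w+)\(")

def call_names(lines):
    return [m.group(1) for line in lines if (m := LINE.match(line)) is not None]

raw = [
    '0.000123 open("/etc/passwd", O_RDONLY) = 3',
    '0.000456 read(3, "root:x...", 4096) = 512',
]
print(call_names(raw))  # ['open', 'read']
```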
Is it “enough”?
Is it enough detail?
Does the discriminator include enough detail for this hypothesis to hold?
The answer seems to be yes!
Extra complication: due to the variability in configuration and use of individual systems, the set of “normal” sequences of system calls will be different on different systems
Design Decisions
Remember temporal ordering of calls
Not total sequence, but sequences of length k
What size should k be?
Long enough to detect anomalies, but as short as possible
Empirical observation: length 6 to 10 is sufficient
So “self” is a database of (unordered) short call sequences
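The sliding-window construction can be sketched in a few lines; the trace contents and the value of k here are illustrative, not from the paper.

```python
# Sketch: "self" as the set of all length-k sliding windows over a
# system-call trace. Trace contents and k are illustrative assumptions.
def window_database(trace, k=6):
    """Return the set of distinct length-k call sequences in the trace."""
    return {tuple(trace[i:i + k]) for i in range(len(trace) - k + 1)}

trace = ["open", "read", "mmap", "mmap", "open", "read", "mmap"]
db = window_database(trace, k=3)
# the repeated window ("open", "read", "mmap") is stored only once
print(len(db))  # 4
```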
Building the “normal” database
Synthetic
Assurance that the normal database contains no intrusions; reproducible
But does not reflect any particular real user activity
Actual use
Necessary to generate from actual use in order to have a unique “self”
How long to accumulate? Is it clean?
The normal database
Database of normal sequences does not contain all legal sequences
If it did, anomalies would not be detected
Some rare sequences will not be used during database initialization
Database is stored as a forest to save space
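One way the forest can be realized is sketched below with nested dicts standing in for tree nodes: one tree per distinct first call, with shared prefixes stored once. The call names come from the slide's example; the representation itself is an assumption, not the paper's implementation.

```python
# Sketch of the space-saving forest: one tree per distinct first call,
# shared prefixes stored once. Nested dicts stand in for tree nodes.
def insert(forest, seq):
    node = forest
    for call in seq:
        node = node.setdefault(call, {})

def contains(forest, seq):
    node = forest
    for call in seq:
        if call not in node:
            return False
        node = node[call]
    return True

forest = {}
insert(forest, ("fopen", "fread", "strcmp"))
insert(forest, ("fopen", "fread", "fread"))
# the shared prefix fopen -> fread is stored only once; it has two children
print(len(forest["fopen"]["fread"]))  # 2
```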
Signature Database
Structure (length 3)
[Figure: a forest of length-3 call sequences with shared prefixes, built from calls such as fopen, fread, and strcmp]
Derive Robust Signature Database
[Figure: “Robust Signature Database” — database size (0–600) vs. total sequences scanned (0–10,000)]
Detecting anomalies
A call sequence not in the database is an anomalous sequence
Strength of that anomalous sequence is measured by the “Hamming distance” to the closest normal sequence (called dmin)
Any call trace with an anomalous sequence is an anomalous trace
Detecting anomalies
Strength of an anomalous trace is the maximum dmin of the trace, normalized by k (the length of sequences in the database):
ŜA = max{dmin values for the trace} / k
Value is between 0 and 1
By adjusting the threshold value for ŜA, false positives can be reduced
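The measure follows directly from these definitions and can be sketched as below; the normal database and traces are illustrative assumptions.

```python
# Sketch of the anomaly measures: Hamming distance between equal-length
# sequences, dmin to the closest normal sequence, and the trace signal
# S_A = max(dmin) / k. Database contents here are illustrative assumptions.
def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def d_min(seq, database):
    return min(hamming(seq, s) for s in database)

def anomaly_signal(trace, database, k):
    windows = (tuple(trace[i:i + k]) for i in range(len(trace) - k + 1))
    return max(d_min(w, database) for w in windows) / k

normal = {("open", "read", "close"), ("read", "close", "exit")}
print(anomaly_signal(["open", "read", "close", "exit"], normal, 3))  # 0.0
# replacing one call makes each window mismatch in one position: S_A = 1/3
print(anomaly_signal(["open", "mmap", "close", "exit"], normal, 3))
```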
Efficiency
Complexity of computing dmin is O(k(R_A·N + 1)), where k is the sequence length, R_A is the ratio of anomalous to normal sequences, and N is the number of sequences in the database
dmin is calculated after every system call
The constant associated with this algorithm is very important
Not yet running in real time
Results (synthetic)
Sanity test: if different programs were not distinguishable, anomalies within one program certainly would not be either
Easy to distinguish between programs; mismatches on well more than 50% of the call sequences (and ŜA ≥ 0.6)
All intrusions (both attempted & successful)
produced anomalies of varying strengths
Results (real environment)
The conjecture of unique normal databases
Experiments in two configurations (at UNM and MIT) had very different databases for the same program (lpr)
Is this typical?
Closing concerns
False positives vs false negatives
If forced to choose, UNM prefers to have false negatives, because layering can mitigate them
Saw 1 per 100 print jobs (lpr), due to system problems
Is ŜA a good measure?
It could inflate false positives: a single extra system call might make ŜA = 0.5
Annex Material
Some UVa experiments
S. Li, Y. Lin, and A. Jones
Illustrated by two attacks on Apache
Varied sequence length from 2 to 30
We chose length 10 to have margin of error
Signature Length Has Little Effect
[Figure: normalized anomaly signal (0–1.2) vs. sequence length (0–40)]
Effectiveness: Buffer Overflow
High normalized anomaly signals indicate attacks

Attack                   #Mismatches  %Mismatches  Normalized Anomaly Signal
Stack Overwrite          467          3.5          0.7
Realpath Vulnerability   569          2.7          0.6

Successfully detected buffer overflow attacks against wu-ftpd
Works well because attacker code adds new sequences of library calls
Effectiveness: Denial of Service
Simulated DOS attack that uses up all available memory
As the attack progresses, library calls requesting memory return abnormally and are re-issued
The DOS attack caused the application to invoke a new library call, fsync
Program: vi

Run         #Mismatches  %Mismatches  Normalized Anomaly Signal
Normal Run  0            0            0
DOS Attack  101          2.6          0.6

Normal run: no intrusion detected; DOS attack: high normalized anomaly signal indicates attack