Reactive Patching: a viable worm defence strategy ? Milan Vojnović & Ayalvadi Ganesh Microsoft Research Cambridge, United Kingdom Tutorial Performance 2005 Juan-le-Pins, France, Oct 4, ‘05


Reactive Patching:
a viable worm defence
strategy ?
Milan Vojnović & Ayalvadi Ganesh
Microsoft Research
Cambridge, United Kingdom
Tutorial
Performance 2005
Juan-le-Pins, France, Oct 4, ‘05
1
Who is this tutorial
intended for?
Security non-specialist
Learn some strategies of worm spread &
effectiveness of some countermeasures
Security specialist
Fundamental limits of patching
Who is it not for?
Those interested in gory details of particular
worms and vulnerabilities they exploit
2
What is a Worm?
Self-replicating malicious code that
Exploits a known or unknown software vulnerability
(e.g. buffer overflow)
To gain (partial?) control over the host
Uses the host to propagate copies of itself
Typically, does not require human intervention
Unlike viruses
Contributes to speed of spread
3
Buffer overflow vulnerability
Example: Web form
Name
Data
Program
Name
Worm data
Overwrites program
4
Motivation for studying worms
Self-replicating malicious code spreads very
quickly
Code Red v2: 360,000 hosts in ~ 24 hours
Slammer: 75,000 hosts in ~ 10 minutes
Causes huge economic damage
Backbone saturation, cleanup
Things could be worse!
Hard or impossible to eliminate all
vulnerabilities in code
5
Roadmap
Worm spread strategies
Target discovery mechanisms
SI epidemic model of worm spread
Patch spread strategies
Analysis of patching
Patching
Patching with filtering
Candidate strategies: PUSH, P2P
Summary and conclusions
6
Target discovery (1)
No a priori knowledge of vulnerable hosts
Random scanning
Generate IP address at random
If vulnerable host found at that address,
infect it
Commonly used in current generation of
worms, e.g., Code Red I, Slammer
Not very smart, but can still be fast
7
Host population types
= S (susceptible)
= I (infected)
= P (patched)
8
Random scanning worm

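[Figure: an infected host probes a randomly chosen IP address; a scan hit occurs when a vulnerable host sits at that address]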
9
Random scanning worm
1

10
Target discovery (2)
Local preference /subnet biased scanning
Ex. Code Red II, Zotob
Infected hosts split their scanning effort
between
Local subnet (IP addresses with the same
1-2-3 octet prefix) &
Global Internet
Why ?
11
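Subnet preference: Code Red II
Probability 1/2: scan an address within the same /8
Probability 3/8: scan an address within the same /16
Probability 1/8: scan an address anywhere in the IP address space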
12
Subnet preference: Zotob
Scans local /16 address space until
512 consecutive scans miss, or
Until 32 scans miss if there has been no
success
Then switches to random scanning of entire IP
address space
13
Target discovery (3)
Hit lists: Worm seeded with list of vulnerable
targets identified in advance
Carried in worm payload, or
Looked up from external server, e.g., metaserver for games
Objective: accelerate initial spread
14
Target discovery (4)
Topological worms
Target lists obtained from data residing on
the host, e.g.,
Local DNS cache
Instant messenger contact lists
Neighbour lists of P2P applications
Potentially very fast, and hard to distinguish
from legitimate use
15
Roadmap
Worm spread strategies
Target discovery mechanisms
SI epidemic model of worm spread
Patch spread strategies
Analysis of patching
Patching
Patching with filtering
Candidate strategies: PUSH, P2P
Summary and conclusions
16
Model of random scanning
Address space of size Ω = 2^32
N vulnerable hosts, occupy fraction N/Ω of
address space
Infected hosts scan addresses randomly at
rate η
Code Red: η = 360 per minute
Slammer (UDP): η = 4000 per second
If scan locates a vulnerable host, it is infected
17
Random scanning (in pictures)
[Figure: illustration of random scanning of the address space]

18
Stochastic epidemic model
Infected hosts scan IP address space at
points of Poisson process of rate η
Independent at distinct hosts
Rate at which scans hit vulnerable hosts:
β=ηN/Ω
I(t) : Number of infected hosts, evolves as a
Markov process
High-level model: ignores network
congestion, latency
19
Deterministic epidemic model
Large population limit:
N→∞, η/Ω fixed
i(t) = I(t)/N : fraction of hosts infected
i(t) : density dependent Markov chain
Converges to limit deterministic ODE
i’(t) = β i(t) [1-i(t)]
20
Modelling Code Red
[Figure: number of infected hosts (0 to 360,000; axis in units of 10^5) versus time in hours (0 to 24)]
21
Characteristic time scale
Epidemic growth follows logistic curve
Initially exponential, then saturates
Time to infect most of the susceptible
population is a small multiple of 1/β:
~ 40 minutes for Code Red
~ 10 seconds for Slammer
Time scale for network-wide infection is hours
for Code Red, minutes for Slammer
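The time scale 1/β can be checked directly from β = ηN/Ω. The sketch below is only a back-of-the-envelope calculation: it assumes that the number of vulnerable hosts N equals the outbreak sizes quoted earlier (360,000 for Code Red, 75,000 for Slammer), which the slides do not state explicitly. The function name is illustrative.

```python
# Characteristic time scale 1/beta of a random scanning worm,
# using beta = eta * N / Omega from the stochastic epidemic model.

OMEGA = 2 ** 32  # size of the IPv4 address space


def infection_rate(eta_per_sec: float, n_vulnerable: int, omega: int = OMEGA) -> float:
    """Rate beta (per second) at which one infective's scans hit vulnerable hosts."""
    return eta_per_sec * n_vulnerable / omega


# Scan rates are the ones on the slides. The population sizes are ASSUMED
# to equal the observed outbreak sizes (not stated as N in the slides).
code_red_beta = infection_rate(eta_per_sec=360 / 60, n_vulnerable=360_000)
slammer_beta = infection_rate(eta_per_sec=4000, n_vulnerable=75_000)

print(f"Code Red: 1/beta = {1 / code_red_beta / 60:.0f} minutes")
print(f"Slammer:  1/beta = {1 / slammer_beta:.0f} seconds")
```

Under these assumptions the script gives about 33 minutes for Code Red and about 14 seconds for Slammer, the same order as the figures quoted on the slide.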
22
Not considered: non-constant
per-infective scan rate
Some random scanning worms result in a
non-constant per-infective scan rate
per-infective scan rate = (observed scans per unit time) / (number of infected hosts)
Example: Slammer (2003)
[Moore+04]
plausible cause:
bandwidth-saturation
see “extras”
23
Roadmap
Worm spread strategies
target discovery mechanisms
SI epidemic model of worm spread
Patch spread strategies
Analysis of patching
patching
patching with filtering
candidate strategies: PUSH, P2P
Summary and conclusions
24
Patching strategies
Patching: Identify vulnerabilities and develop
patches
Filtering: Detect and quarantine infected
hosts / subnets
Other: ensure code has no vulnerabilities
(non-trivial in practice)
25
Current approaches
Vulnerability is found & patch developed first
Patch is released
Worm is subsequently reverse engineered
from patch
Patch needs to be installed before worm is
released – hours to days
26
Example: Zotob
Aug 9-05: MS05-039 public disclosure
Plug-and-play vulnerability affecting mostly Win2k
Aug 12-05: Exploit code released
Aug 14-05: Zotob worm discovered
Followed by > 10 variants
27
Deficiencies
Zero-day worms: vulnerability not known or
patch not yet available
Requires automatic response, involving
Detection of worm spread
Automatic patch generation
Automatic patch dissemination
Human reaction times too slow
28
Example: Vigilante [Costa+05]
Detectors distributed through network
Detect worms by analysis of stack in code
execution
Can be combined with honeypots etc
Generate self-certifying alerts (SCAs) proving
vulnerability
Disseminate to hosts which verify SCA and
create filters (patches)
29
Problems we address
Architecture for alert dissemination
Vigilante uses structured overlay
interconnecting all end hosts
We propose a hierarchical scheme
Analysis of competing spread of worm and
patch
To establish if patching is feasible, and
quantify system requirements
30
Roadmap
Worm spread strategies
target discovery mechanisms
SI epidemic model of worm spread
Patch spread strategies
Analysis of patching
patching
patching with filtering
candidate strategies: PUSH, P2P
Summary and conclusions
31
Patching
Hierarchical dissemination:
Phase 1: among patching servers
Phase 2: patching-servers to hosts
32
Network partitioned into subnets
[Figure: network partitioned into subnets 1, 2, …, j, …, J]
33
Patching server in each subnet
[Figure: one patching server in each of the subnets 1, 2, …, j, …, J]
patching servers termed superhosts
34
Superhosts interconnected by an
overlay

[Figure: superhosts connected by an overlay]
Alerts or patches disseminate over the overlay
With alerts, the patch is generated at the superhosts
(this distinction is not essential in the modelling)
35
PULL
hosts poll a superhost with unit rate
superhost service rate = m
results in a patched host,
if the polling host was susceptible
s(t) = fraction of susceptible
hosts at time t
patching rate at time t = m s(t)
36
Host population dynamics
Patching system:
$\frac{d}{dt} i(t) = \beta\, i(t)\, s(t)$
$\frac{d}{dt} s(t) = -\beta\, s(t)\, i(t) - m\, s(t)$
Patch susceptible hosts only
Assumes the worm prevents patching of an infected host
Plausible assumption for automatic patching (no human intervention)
If m = 0, standard logistic
In general:
$\frac{d}{dt} \log i(t) = \beta\,(i(0) + s(0)) - m \log\frac{i(t)}{i(0)} - \beta\, i(t)$
37
Limit host population
Result
$i(\infty) + \frac{m}{\beta} \log\frac{i(\infty)}{i(0)} = i(0) + s(0)$
Implication
$i(\infty) \le i(0)\, e^{\frac{\beta}{m}(i(0) + s(0))}$
Tight bound whenever infection rate β sufficiently small
(final fraction of infectives small)
Exponential in the infection to patch rate ratio!
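The implicit equation for the final fraction of infectives has no closed form, but it is easy to solve numerically. A minimal sketch, assuming the one-subnet result i(∞) + (m/β) log(i(∞)/i(0)) = i(0) + s(0) and using plain bisection (function names and parameter values are illustrative, not from the slides):

```python
import math


def final_infected(i0: float, s0: float, beta: float, m: float) -> float:
    """Solve x + (m/beta) * log(x / i0) = i0 + s0 for x = i(infinity)."""
    def f(x: float) -> float:
        return x + (m / beta) * math.log(x / i0) - (i0 + s0)

    # f is increasing, f(i0) = -s0 <= 0 and f(i0 + s0) >= 0.
    lo, hi = i0, i0 + s0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def upper_bound(i0: float, s0: float, beta: float, m: float) -> float:
    """Bound i(0) * exp((beta/m) * (i(0) + s(0))) from the slide."""
    return i0 * math.exp((beta / m) * (i0 + s0))


if __name__ == "__main__":
    i0, s0, beta = 1e-4, 1 - 1e-4, 0.1   # illustrative values
    for m in (0.05, 0.1, 0.5, 1.0):
        exact = final_infected(i0, s0, beta, m)
        print(f"beta/m={beta / m:4.1f}  i(inf)={exact:.3e}  bound={upper_bound(i0, s0, beta, m):.3e}")
```

For small final fractions the solved value and the bound nearly coincide, which is the sense in which the bound is tight.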
38
Limit host population (example)
10000 vulnerable hosts
β = 0.1
$i(\infty) \le i(0)\, e^{\frac{\beta}{m}(i(0) + s(0))}$
dots = Monte Carlo
39
Subnets (cont’d)
[Figure: subnets 1, 2, …, j, …, J, with the alerted subnets marked]
patching with rate m only in alerted subnets
g(t) = fraction of alerted subnets at time t
40
Broadcast curve
g(t) = fraction of alerted subnets at time t
T = broadcast time
Natural candidate: logistic function
[Figure: two sketches of g(t) against t, rising from 0 to 1, with the broadcast time T marked on the time axis]
Many-superhosts limit for random gossip: T = O(log(J))
Same order for standard overlays
41
Broadcast curve (example)
Example:
Pastry overlay of J
superhosts
Topology = GaTech
Broadcasting = Flooding
Exhibits logistic growth
(Such overlays randomly
constructed: locally tree)
42
Minimum Broadcast Curve
A curve m(t) such that at any time t, fraction of alerted superhosts ≥ m(t)
Comparison: minimum broadcast curve yields an upper bound on the fraction of infectives
Example: flooding on Pastry overlay & logistic minimum broadcast curve
[Figure: fraction of alerted superhosts under flooding, together with the minimum broadcast curve m(t)]
43
Host population dynamics
i(t), s(t) = fractions of infectives and susceptibles
i^1(t) = fraction of infectives in alerted subnets
s^1(t) = fraction of susceptibles in alerted subnets
$\frac{d}{dt} i(t) = \beta\, i(t)\, s(t)$
$\frac{d}{dt} s(t) = -\beta\, s(t)\, i(t) - m\, s^1(t)$
$\frac{d}{dt} i^1(t) = \beta\, i(t)\, s^1(t) + g(t)\,(i(t) - i^1(t))$
$\frac{d}{dt} s^1(t) = -(\beta\, i(t) + m)\, s^1(t) + g(t)\,(s(t) - s^1(t))$
$\frac{d}{dt} g(t) = g(t)\,(1 - g(t))$
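The five-dimensional system can be explored numerically. The sketch below integrates it with a hand-written fourth-order Runge-Kutta step; the initial condition (alerted subnets hold a fraction g(0) of both infectives and susceptibles) and all parameter values are assumptions made for illustration only.

```python
def derivatives(state, beta, m):
    """Right-hand side of the (i, s, i1, s1, g) system with unit alert rate."""
    i, s, i1, s1, g = state
    return (
        beta * i * s,                           # infectives, all subnets
        -beta * s * i - m * s1,                 # susceptibles, all subnets
        beta * i * s1 + g * (i - i1),           # infectives in alerted subnets
        -(beta * i + m) * s1 + g * (s - s1),    # susceptibles in alerted subnets
        g * (1.0 - g),                          # fraction of alerted subnets
    )


def rk4_step(state, dt, beta, m):
    """One classical Runge-Kutta step."""
    def shifted(k, h):
        return tuple(x + h * dx for x, dx in zip(state, k))

    k1 = derivatives(state, beta, m)
    k2 = derivatives(shifted(k1, dt / 2), beta, m)
    k3 = derivatives(shifted(k2, dt / 2), beta, m)
    k4 = derivatives(shifted(k3, dt), beta, m)
    return tuple(
        x + dt / 6 * (a + 2 * b + 2 * c + d)
        for x, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate(beta, m, i0, g0, t_end=200.0, dt=0.01):
    """Integrate from t = 0 to t_end and return the final state."""
    # ASSUMED initial condition: hosts spread evenly over subnets.
    state = (i0, 1.0 - i0, g0 * i0, g0 * (1.0 - i0), g0)
    for _ in range(round(t_end / dt)):
        state = rk4_step(state, dt, beta, m)
    return state


if __name__ == "__main__":
    i, s, i1, s1, g = simulate(beta=0.1, m=1.0, i0=1e-4, g0=0.01)
    print(f"final infectives = {i:.3e}, remaining susceptibles = {s:.3e}")
```

With time measured in units of the alert rate, the run length of 200 time units is long enough for the alert to reach every subnet and for patching to remove the susceptibles in this example.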
44
The migrations
$g(t)\,(i(t) - i^1(t)) = J\, g(t)\,(1 - g(t)) \cdot \frac{i(t) - i^1(t)}{J\,(1 - g(t))}$
Assume at time t, J(1-g(t)) = 5 non-alerted subnets, with I1, I2, I3, I4, I5 infectives
Pick a subnet at random
M := # infectives in randomly picked subnet
E(M) = (I1+…+I5) / 5
45
Host population dynamics

$\frac{d}{dt} i(t) = \beta\, i(t)\, s(t)$
$\frac{d}{dt} s(t) = -\beta\, s(t)\, i(t) - m\, w(t)\, s(t)$
Familiar one-subnet patching system, but with patching rate = m w(t), where w(t) = s^1(t)/s(t)
$\frac{d}{dt} w(t) = m\, w(t)^2 - (m + g(t))\, w(t) + g(t)$
The last ODE: Riccati
Use substitution: w = 1 + 1/z
(w = 1, a particular solution)
46
Per-susceptible patching rate
$w(t) = 1 - \frac{e^{mt}}{1 - g(0) + g(0)\, e^{t}} \cdot \frac{1}{\phi_m(t) + 1/(1 - w(0))}$
$\phi_m(t) = \int_1^{e^{mt}} \frac{dx}{1 - g(0) + g(0)\, x^{1/m}}$
$w(\infty) = 1$ if m ≤ 1: "bottleneck" is patching within subnets
$w(\infty) = 1/m$ if m > 1: "bottleneck" is alerts over the overlay
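The two regimes of the limit can be checked by integrating the Riccati equation directly. This is a sketch under the assumption of a logistic alert curve with unit rate, g' = g(1 − g); the starting values g(0) = w(0) = 0.01 are illustrative.

```python
import math


def alerted_fraction(t: float, g0: float) -> float:
    """Logistic alert curve solving g' = g (1 - g) with g(0) = g0."""
    return 1.0 / (1.0 + (1.0 - g0) / g0 * math.exp(-t))


def riccati_rhs(t: float, w: float, m: float, g0: float) -> float:
    """dw/dt = m w^2 - (m + g(t)) w + g(t)."""
    g = alerted_fraction(t, g0)
    return m * w * w - (m + g) * w + g


def limit_w(m: float, g0: float = 0.01, w0: float = 0.01,
            t_end: float = 80.0, dt: float = 0.01) -> float:
    """Integrate the Riccati ODE with RK4 and return w(t_end)."""
    t, w = 0.0, w0
    for _ in range(round(t_end / dt)):
        k1 = riccati_rhs(t, w, m, g0)
        k2 = riccati_rhs(t + dt / 2, w + dt / 2 * k1, m, g0)
        k3 = riccati_rhs(t + dt / 2, w + dt / 2 * k2, m, g0)
        k4 = riccati_rhs(t + dt, w + dt * k3, m, g0)
        w += dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += dt
    return w


if __name__ == "__main__":
    for m in (0.5, 4.0):
        print(f"m = {m}: w(80) = {limit_w(m):.4f}, min(1, 1/m) = {min(1.0, 1.0 / m):.4f}")
```

The case m = 1 is left out on purpose: the two fixed points coincide there and convergence is much slower.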
47
Per-susceptible patching rate
bottleneck = patching within subnets
bottleneck = alerts over overlay
48
Overlays that satisfy a logistic
minimum broadcast curve
Fast-overlay asymptotic: small m, β and m/β fixed
$i(\infty) + \frac{m}{\beta}\log\frac{i(\infty)}{i(0)} \sim i(0) + s(0) + m\,\frac{s(0)\,(1 - w(0))}{1 - g(0)}\,\log\frac{1}{g(0)}$
log(1/g(0)) = overlay "diameter"
Heuristics: take g(0)=1/J, log(1/g(0)) = log(J)
Intuitive: T replaced with log(1/g(0))
49
Known broadcast time T
Result
$i(\infty) + \frac{m}{\beta}\log\frac{i(\infty)}{i(0)} = i(0) + s(0) + \frac{m}{\beta}\log\frac{i(0) + s(0)}{i(0) + s(0)\, e^{-\beta\,(i(0) + s(0))\, T}}$
Implies:
$i(\infty) \le i(0)\, e^{\beta \left(\frac{1}{m} + T\right)(i(0) + s(0))}$
If T = 0, then consistent with one-subnet patching
Uses minimum broadcast curve m(t) = 1{t ≥ T}
No patching until all subnets alerted
[Figure: g(t) as a step function of t, jumping from 0 to 1 at t = T]
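The effect of the broadcast time T on the final fraction of infectives can be tabulated from the result for a known broadcast time. A minimal sketch, assuming no patching before T and one-subnet patching afterwards; the function names and parameter values are illustrative.

```python
import math


def final_infected_with_delay(i0, s0, beta, m, T):
    """Solve x + (m/beta) log(x/i0) = c + (m/beta) log(c / (i0 + s0 e^{-beta c T})),
    where c = i0 + s0, for x = i(infinity) by bisection."""
    c = i0 + s0
    rhs = c + (m / beta) * math.log(c / (i0 + s0 * math.exp(-beta * c * T)))

    def f(x):
        return x + (m / beta) * math.log(x / i0) - rhs

    lo, hi = i0, c          # f(lo) <= 0 <= f(hi), and f is increasing
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def delay_bound(i0, s0, beta, m, T):
    """Bound i(0) * exp(beta * (1/m + T) * (i(0) + s(0)))."""
    return i0 * math.exp(beta * (1.0 / m + T) * (i0 + s0))


if __name__ == "__main__":
    i0, s0, beta, m = 1e-4, 1 - 1e-4, 0.1, 0.5   # illustrative values
    for T in (0.0, 5.0, 10.0, 20.0):
        x = final_infected_with_delay(i0, s0, beta, m, T)
        print(f"T={T:5.1f}  i(inf)={x:.3e}  bound={delay_bound(i0, s0, beta, m, T):.3e}")
```

Setting T = 0 reproduces the one-subnet patching value, and each extra unit of broadcast time multiplies the bound by exp(β(i(0) + s(0))).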
50
Roadmap
Worm spread strategies
target discovery mechanisms
SI epidemic model of worm spread
Patch spread strategies
Analysis of patching
patching
patching with filtering
candidate strategies: PUSH, P2P
Summary and conclusions
51
Patching with filtering
Assume each subnet applies edge filtering
An alerted subnet blocks scans in & out
=> a scan between two distinct subnets can succeed only
if both subnets are non alerted
[Figure: alerted subnets BLOCK scans at their edge; a scan between two non alerted subnets is successful if it hits a susceptible]
52
Host population dynamics (1)
Population dynamics under NON alerted subnets
i^0(t) = fraction of infectives in NON alerted subnets
s^0(t) = fraction of susceptibles in NON alerted subnets
$\frac{d}{dt} i^0(t) = \beta\, i^0(t)\, s^0(t) - g(t)\, i^0(t)$
$\frac{d}{dt} s^0(t) = -\beta\, s^0(t)\, i^0(t) - g(t)\, s^0(t)$
$\frac{d}{dt} g(t) = g(t)\,(1 - g(t))$
53
Host population dynamics (2)

Result
$i^0(t) = \frac{1 - g(0)\,u(t)}{1 - g(0)} \cdot \frac{i^0(0)\, u(t)^{\beta'}}{\frac{i^0(0)}{i^0(0) + s^0(0)}\, u(t)^{\beta'} + \frac{s^0(0)}{i^0(0) + s^0(0)}}$
$s^0(t) = \frac{1 - g(0)\,u(t)}{1 - g(0)}\,\left(i^0(0) + s^0(0)\right) - i^0(t)$
u(t) = g(t)/g(0)
β' = β (i^0(0) + s^0(0)) / (1 − g(0))
54
Fraction of infected hosts in non
alerted subnets
alert rate = 1
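A curve of this kind can be reproduced from the closed form for the infectives in non alerted subnets. The sketch below evaluates that closed form and cross-checks it against a direct Runge-Kutta integration of the (i^0, s^0, g) system with unit alert rate; the parameter values in the usage part are illustrative and not taken from the slide.

```python
import math


def alerted_fraction(t, g0):
    """Logistic alert curve solving g' = g (1 - g) with g(0) = g0."""
    return 1.0 / (1.0 + (1.0 - g0) / g0 * math.exp(-t))


def i0_closed_form(t, i00, s00, g0, beta):
    """Closed form for the fraction of infectives in non alerted subnets."""
    u = alerted_fraction(t, g0) / g0
    beta_prime = beta * (i00 + s00) / (1.0 - g0)
    growth = u ** beta_prime
    shrink = (1.0 - g0 * u) / (1.0 - g0)   # share of subnets still not alerted
    return shrink * (i00 + s00) * i00 * growth / (s00 + i00 * growth)


def i0_numeric(t_end, i00, s00, g0, beta, dt=1e-3):
    """RK4 integration of the (i0, s0, g) system, for comparison."""
    def rhs(y):
        i, s, g = y
        return (beta * i * s - g * i, -beta * s * i - g * s, g * (1.0 - g))

    y = (i00, s00, g0)
    for _ in range(round(t_end / dt)):
        k1 = rhs(y)
        k2 = rhs(tuple(a + dt / 2 * b for a, b in zip(y, k1)))
        k3 = rhs(tuple(a + dt / 2 * b for a, b in zip(y, k2)))
        k4 = rhs(tuple(a + dt * b for a, b in zip(y, k3)))
        y = tuple(a + dt / 6 * (p + 2 * q + 2 * r + w)
                  for a, p, q, r, w in zip(y, k1, k2, k3, k4))
    return y[0]


if __name__ == "__main__":
    g0, i00, beta = 0.01, 1e-3, 0.5       # illustrative values
    s00 = 1.0 - g0 - i00
    for t in (1.0, 3.0, 5.0, 8.0):
        print(t, i0_closed_form(t, i00, s00, g0, beta), i0_numeric(t, i00, s00, g0, beta))
```

The fraction first grows with the worm and then falls as subnets are alerted and leave the non alerted population.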
55
Further example with random
realizations
100 superhosts
Pastry overlay
Broadcast: flooding
Topology: GaTech
i(0) = 1/1000
hosts per subnet = 1000
β = 0.1
56
Patching vs. Patching & Filtering
with fast patching within subnets
Bottleneck is overlay
Within subnets patching assumed instantaneous
Patching:
$\frac{d}{dt} i^0(t) = \beta\, i(t)\, s^0(t) - g(t)\, i^0(t)$
$\frac{d}{dt} s^0(t) = -\beta\, s^0(t)\, i(t) - g(t)\, s^0(t)$
$\frac{d}{dt} i^1(t) = g(t)\, i^0(t)$
$\frac{d}{dt} g(t) = g(t)\,(1 - g(t))$
Patching & filtering:
$\frac{d}{dt} i^0(t) = \beta\, i^0(t)\, s^0(t) - g(t)\, i^0(t)$
$\frac{d}{dt} s^0(t) = -\beta\, s^0(t)\, i^0(t) - g(t)\, s^0(t)$
$\frac{d}{dt} i^1(t) = g(t)\, i^0(t)$
$\frac{d}{dt} g(t) = g(t)\,(1 - g(t))$
Diffs: the infection terms use i(t) under patching and i^0(t) under patching & filtering
57
Continued
For patching & filtering: a closed-form
$i^1(t) = i^1(0) + g(0)\, i^0(0) \int_1^{u(t)} \left[ i^0(0) + s^0(0)\, u^{-\beta'} \right]^{-1} du$
Binomial integral with a series solution
Simple bound
$i^1(\infty) \le i^1(0) + \frac{g(0)\, i^0(0)}{s^0(0)\,(1 + \beta')}\, e^{(1 + \beta')\,\log\frac{1}{g(0)}}$
log(1/g(0)) = "diameter" of the overlay
58
Example
i(0) = 10^{-5}
i^1(0) = r i(0)
s^0(0) = 1 − g(0) − (1 − r) i(0)
g(0) = 10^{-3}
Note: i^1(0) + p(0) = g(0)
59
Roadmap
Worm spread strategies
target discovery mechanisms
SI epidemic model of worm spread
Patch spread strategies
Analysis of patching
patching
patching with filtering
candidate strategies: PUSH, P2P
Summary and conclusions
60
PUSH
superhost maintains inventory list of
hosts
superhost service rate = m
serves hosts in order
…
s(t) = fraction of susceptible
hosts at time t
patching rate at time t = m/ (1-mt) s(t)
61
Host population dynamics
$\frac{d}{dt} i(t) = \beta\, i(t)\, s(t)$
$\frac{d}{dt} s(t) = -\beta\, s(t)\, i(t) - \frac{m}{1 - mt}\, s(t)$, for t < 1/m
By comparison: patching rate per susceptible is
larger, so starting from same initial value, the
fraction of infectives is smaller than with PULL
No estimate of the ultimate fraction of infectives is claimed; see
numerics …
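A numerical comparison of the two schemes can be sketched as follows. PULL uses the implicit one-subnet result, solved by bisection. For PUSH, the substitution s(t) = (1 − mt) v(t) is introduced here (it is not on the slides) to remove the singular rate m/(1 − mt): it gives v' = −β v i and i' = β i (1 − mt) v on [0, 1/m], after which no susceptibles remain. Parameter values are illustrative.

```python
import math


def pull_final(i0, s0, beta, m):
    """i(infinity) under PULL: solve x + (m/beta) log(x/i0) = i0 + s0."""
    def f(x):
        return x + (m / beta) * math.log(x / i0) - (i0 + s0)

    lo, hi = i0, i0 + s0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def push_final(i0, s0, beta, m, n_steps=20000):
    """i(1/m) under PUSH, integrating (i, v) with RK4 where s = (1 - m t) v."""
    def rhs(t, i, v):
        return beta * i * (1.0 - m * t) * v, -beta * v * i

    dt = (1.0 / m) / n_steps
    t, i, v = 0.0, i0, s0
    for _ in range(n_steps):
        a1, b1 = rhs(t, i, v)
        a2, b2 = rhs(t + dt / 2, i + dt / 2 * a1, v + dt / 2 * b1)
        a3, b3 = rhs(t + dt / 2, i + dt / 2 * a2, v + dt / 2 * b2)
        a4, b4 = rhs(t + dt, i + dt * a3, v + dt * b3)
        i += dt / 6 * (a1 + 2 * a2 + 2 * a3 + a4)
        v += dt / 6 * (b1 + 2 * b2 + 2 * b3 + b4)
        t += dt
    return i


if __name__ == "__main__":
    i0, s0 = 1e-5, 1 - 1e-5
    for beta, m in ((0.1, 0.05), (0.1, 0.02)):
        print(f"beta/m={beta / m:4.1f}  PULL={pull_final(i0, s0, beta, m):.3e}"
              f"  PUSH={push_final(i0, s0, beta, m):.3e}")
```

In these runs PUSH ends with fewer infectives than PULL, in line with the comparison argument above.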
62
Patching rate
Address space: 1, 2, …, Ω
Susceptibles: S(1), S(2), …, S(Ω)
Probability to pick a susceptible at the
k-th push:
k = 1: S(1) / Ω
k = 2: S(2) / (Ω − 1)
…
k = Ω: S(Ω) / 1
63
PULL vs. PUSH
i(0) = 10^{-5}
Yes, PUSH superior to PULL
But less than an order of magnitude
64
PULL vs. PUSH
(same but wider range)
[Figure: i(+∞) / i(0) (0 to 10, in units of 10^4) versus β/m (0 to 200), with one curve for PULL and one for PUSH]
65
Worm-like patch dissemination (1)
66
Worm-like patch dissemination (2)
Two epidemics:
$\frac{d}{dt} i(t) = \beta\, i(t)\,(1 - i(t) - p(t))$
$\frac{d}{dt} p(t) = m\, p(t)\,(1 - i(t) - p(t))$
Patch epidemics with larger spread rate m
$i(\infty) = i(0) \left(\frac{1 - i(\infty)}{p(0)}\right)^{\beta/m}$
$i(\infty) \le i(0)\, e^{\frac{\beta}{m}\log\frac{1}{p(0)}}$
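The fixed-point equation for the worm-like patch can be solved by bisection and compared with the bound. A minimal sketch, assuming the relation i(∞) = i(0) ((1 − i(∞))/p(0))^(β/m) derived from the two coupled epidemics; names and values are illustrative.

```python
import math


def final_infected_p2p(i0, p0, beta, m):
    """Solve x = i0 * ((1 - x) / p0) ** (beta / m) for x in [0, 1]."""
    ratio = beta / m

    def h(x):
        return x - i0 * ((1.0 - x) / p0) ** ratio

    lo, hi = 0.0, 1.0       # h(0) < 0 < h(1), and h is increasing
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if h(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def p2p_bound(i0, p0, beta, m):
    """Bound i(0) * exp((beta/m) * log(1/p(0)))."""
    return i0 * math.exp((beta / m) * math.log(1.0 / p0))


if __name__ == "__main__":
    i0 = p0 = 1e-5          # illustrative: worm and patch start from equal seeds
    for ratio in (0.25, 0.5, 0.75):
        x = final_infected_p2p(i0, p0, beta=ratio, m=1.0)
        print(f"beta/m={ratio}  i(inf)={x:.3e}  bound={p2p_bound(i0, p0, ratio, 1.0):.3e}")
```

The bound depends on the patch seed p(0) through log(1/p(0)), so a larger initial set of patched hosts directly reduces the final infection.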
67
Roadmap
Worm spread strategies
target discovery mechanisms
SI epidemic model of worm spread
Patch spread strategies
Analysis of patching
patching
patching with filtering
candidate strategies: PUSH, P2P
Summary and conclusions
68
Conclusions (1)
Containment of random scanning worms
Can be effective by patching only if patching rate
sufficiently larger than worm infection rate
Achievable at reasonable patching rate if scan
rates are constrained
No feasibility problems, but largely engineering &
security issues
Smarter worms
?
69
Conclusions (2)
Looking ahead
Worms evolve, so must network immune system
Analysis in its infancy
Need solid theoretical understanding of worm
strategies
Informs design of countermeasures
& lets us know their limitations
Beyond dynamics description & numerical
solving
Analytical estimates
70
This slide deck
& related references
http://research.microsoft.com/~milanv …
… /immunology.htm
71
References
[Moore+04] Inside the Slammer Worm, D. Moore, V. Paxson, S. Savage, C. Shannon, S. Staniford, N. Weaver, IEEE Security & Privacy, 2004.
[Costa+05] Vigilante: End-to-End Containment of Internet Worms, M. Costa, J. Crowcroft, M. Castro, A. Rowstron, L. Zhou, L. Zhang, P. Barham, SOSP’05, 2005.
[Kesidis+04] Coupled Kermack-McKendrick Models for Randomly Scanning and Bandwidth Saturating Internet Worms, QoS-IP, Feb 2005.
[V-G05a] On the Effectiveness of Automatic Patching, V.-G., WORM’05, Nov 2005.
[V-G05b] On the Race of Worms, Alerts, and Patches, V.-G., Microsoft Research TR TR-2005-13, Feb 2005.
… many others
(see some on the website of the previous slide)
72
Extras
Why did per-infective scan rate of Slammer
decrease ?
Comparison result for patching system
Patching & filtering
73
Why did per-infective scan rate of
Slammer decrease ? (1)
Bandwidth-saturation model [Kesidis+04]
N subnets, each with at most K vulnerable hosts
Each subnet with outbound rate s
A1: one infective saturates the outbound link
A2: ignore worm scans from local subnet
okay for many fixed-size subnets
Per-infective scan rate:
$\eta(t) = \frac{s\,(1 - n_0(t))}{\sum_{k=1}^{K} k\, n_k(t)}$
n_k(t) = fraction of subnets with k infectives
74
Why did per-infective scan rate of
Slammer decrease ? (2)
Dynamics:
$\frac{d}{dt} n_0 = -\beta\,(1 - n_0)\, K\, n_0$
$\frac{d}{dt} n_k = \beta\,(1 - n_0)\,[(K - k + 1)\, n_{k-1} - (K - k)\, n_k]$
$\frac{d}{dt} n_K = \beta\,(1 - n_0)\, n_{K-1}$
Non-linear system, but …
75
Why did per-infective scan rate of
Slammer decrease ? (3)
Closed-form solution
Change time: du(t) = (1 − n_0(t)) dt
Makes it a system of linear ODEs
$n_k(t) = \sum_{j=0}^{k} \binom{K-j}{k-j}\, n_j(0)\, e^{-\beta (K-k)\, u(t)} \left(1 - e^{-\beta u(t)}\right)^{k-j}, \quad k = 0, 1, \ldots, K$
$e^{\beta u(t)} = \left[ n_0(0) + (1 - n_0(0))\, e^{\beta K t} \right]^{1/K}$

76
Why did per-infective scan rate of
Slammer decrease ? (4)
Special: n_1(0) = 1 − n_0(0)
Number of infectives at time t:
$N \sum_{k=1}^{K} k\, n_k(t) = N K \left[ 1 - \left(1 - \frac{1 - n_0(0)}{K}\right) e^{-\beta u(t)} \right]$
Per-infective scan rate:
$\eta(t) = \frac{s \left(1 - n_0(0)\, e^{-\beta K u(t)}\right)}{K \left[ 1 - \left(1 - \frac{1 - n_0(0)}{K}\right) e^{-\beta u(t)} \right]}$
fit to Slammer data as in [Kesidis+04]
Similarly for heterogeneous outbound links; see [Kesidis+04]
[Figure: scan rate per infective (0 to 16000) versus time in seconds (0 to 350)]
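The shape of the per-infective scan rate in the special case can be sketched numerically. The code below evaluates the closed form for the case n_1(0) = 1 − n_0(0); every parameter value (outbound rate s, subnet size K, initial fraction n_0(0), rate β) is a hypothetical placeholder and not the fit reported in [Kesidis+04].

```python
import math


def per_infective_scan_rate(t, s, K, n0_init, beta):
    """Per-infective scan rate eta(t) in the special case n_1(0) = 1 - n_0(0)."""
    x = math.exp(-beta * K * t)                       # e^{-beta K t}; avoids overflow for large t
    e_minus_bKu = x / (n0_init * x + (1.0 - n0_init))  # e^{-beta K u(t)}
    e_minus_bu = e_minus_bKu ** (1.0 / K)              # e^{-beta u(t)}
    # Mean number of infectives per subnet (all subnets counted).
    infectives_per_subnet = K * (1.0 - (1.0 - (1.0 - n0_init) / K) * e_minus_bu)
    return s * (1.0 - n0_init * e_minus_bKu) / infectives_per_subnet


if __name__ == "__main__":
    # HYPOTHETICAL parameters, chosen only to show the qualitative shape.
    s, K, n0_init, beta = 16000.0, 10, 0.999, 0.01
    for t in (0, 50, 100, 150, 200, 300):
        print(t, round(per_infective_scan_rate(t, s, K, n0_init, beta), 1))
```

The rate starts at s, when each infected subnet holds a single infective that saturates the link alone, and tends to s/K once all K hosts of every subnet share the same outbound link.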
77
Comparison for patching system
$\frac{d}{dt} i_j(t) = \beta\, i_j(t)\, s_j(t)$
$\frac{d}{dt} s_j(t) = -\beta\, s_j(t)\, i_j(t) - f_j(t)\, s_j(t)$
j = 1, 2
Result: if f1(t) ≤ f2(t), for all t, then starting
from same initial point, the fraction of
infectives with f2 is at most that with f1
See [V-G05b] for a precise statement
78
Patching & filtering
Note
$\frac{d}{dt}\left(i^0(t) + s^0(t)\right) = -g(t)\left(i^0(t) + s^0(t)\right)$
Follows
$\frac{d}{dt} i^0(t) = i^0(t)\left(a(t) - \beta\, i^0(t)\right)$   (i0)
with
$a(t) = \beta\,(i^0(0) + s^0(0))\, e^{-\int_0^t g(u)\,du} - g(t)$
(i0) = generalized logistic ODE (in the variable β i^0)
closed-form solution next slide
79
A general result for generalized
logistic ODE
Generalized logistic ODE:
$\frac{d}{dt} y(t) = y(t)\,(a(t) - y(t))$
Let $A(t) = \int_0^t a(u)\,du$. Assume A(t) < +∞.
Result:
$y(t) = y(0)\,\frac{e^{A(t)}}{1 + y(0) \int_0^t e^{A(u)}\,du}, \quad t \ge 0$
Proof: Look at ODE for Y, primitive of y. Use separation of variables.
80
Ultimately infected hosts (1)
Consider a subnet j alerted at a time Tj
Before subnet j is alerted
$\frac{d}{dt} i_j(t) = \beta\, i^0(t)\, s_j(t)$
$\frac{d}{dt} s_j(t) = -\beta\, i^0(t)\, s_j(t)$
for 0 ≤ t ≤ T_j
$i_j(t) = n_j - \varphi(t)\,(n_j - i_j(0)), \quad 0 \le t \le T_j$, where n_j = i_j(0) + s_j(0)
$\varphi(t) = \exp\left(-\beta \int_0^t i^0(v)\,dv\right) = \frac{i^0(0) + s^0(0)}{s^0(0) + i^0(0)\, u(t)^{\beta'}}$
81
Ultimately infected hosts (2)
After subnet j is alerted
$\frac{d}{dt} i_j(t) = \beta\, i_j(t)\, s_j(t)$
$\frac{d}{dt} s_j(t) = -\beta\, i_j(t)\, s_j(t) - m\, s_j(t)$
for t > T_j
Familiar patching system
$i_j(\infty) + \frac{m}{\beta} \log\frac{i_j(\infty)}{i_j(T_j)} = n_j$
82