Building High Availability Distributed Services


SITAR: A Scalable Intrusion Tolerant Architecture for Distributed Services
Feiyi Wang
MCNC
Kishor S. Trivedi
Duke University

Reality and Desire

Reality
• Most “commercial off-the-shelf” (COTS) servers have inherent vulnerabilities
• Are a trusted computing base (TCB); authentication, authorization, and access control (AAA); and intrusion detection (ID) enough?
• No assurance can be assumed once the system is (even partially) compromised; more tolerance is needed

Desire
• Provide a minimal level of service, or continuation of operation, despite active attacks and partially compromised components
• More “tolerance” in addition to “detection”
• Enhance/support COTS servers
7/24/2001
DARPA Summer PI Meeting

SITAR Approach Overview
• SITAR's intrusion tolerance capability focuses on a generic class of services as the target of protection
• Develop an intrusion tolerant system (ITS) architecture by leveraging basic fault tolerance techniques (redundancy, diversity, acceptance testing, and dynamic reconfiguration)
• Use both model-based and measurement-based approaches to quantitatively evaluate the intrusion tolerance capability of this architecture and to carry out cost-benefit trade-off studies
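
Redundancy and diversity pay off only if the redundant responses are compared. The slides do not say how SITAR's ballot monitors vote, so the Python sketch below assumes a simple majority vote over SHA-256 digests of the response bodies; `majority_vote` is a hypothetical name.

```python
import hashlib
from collections import Counter
from typing import Optional, Sequence


def majority_vote(bodies: Sequence[bytes], quorum: Optional[int] = None) -> Optional[bytes]:
    """Return the response body that at least `quorum` replicas agree on.

    Bodies are compared by SHA-256 digest. The default quorum is a strict
    majority of the replicas; if no body reaches it, None is returned so the
    caller can treat the request as suspect.
    """
    if not bodies:
        return None
    if quorum is None:
        quorum = len(bodies) // 2 + 1
    digests = [hashlib.sha256(b).hexdigest() for b in bodies]
    winner, count = Counter(digests).most_common(1)[0]
    if count < quorum:
        return None
    # Hand back the first body whose digest matches the winning digest.
    return bodies[digests.index(winner)]
```

With diverse servers (e.g., Apache and IIS), equivalent responses can differ byte for byte (headers, timestamps), so in practice some normalization would be needed before hashing.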
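
SITAR Architecture
Figure: users/clients exchange requests and responses with the proxy servers (P1 to Pu), which stand in front of the protected COTS servers (S1 to Sn); the other components are the acceptance monitors (A1 to Am), the ballot monitors (B1 to Bv), and the audit control and adaptive reconfiguration modules, which exchange control messages with the rest of the architecture.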

Proxy Server: Architecture
Figure: an outside client connects to the primary proxy, which writes proxied requests into a shared space (e.g., connections), takes responses from the acceptance monitor and ballot monitor, and forwards them to the client; backup proxies stand by and share the proxy resources (e.g., the IP address).

Proxy Failure and Address Takeover
Figure: on failure, resources migrate between the primary and the backup, and they are restored on recovery.
Takeover techniques, with pros and cons:
• MAC address takeover. Pros: almost instantaneous. Cons: messy.
• IP address takeover. Pros: relatively easy. Cons: slower.
• DNS reconfiguration. Pros: can add a load-balancing property. Cons: slower, less reliable.

MAC Address Takeover: ARP Spoofing
Figure: hosts on the LAN send an ARP request to the broadcast address; both the primary server and the backup server may answer with an ARP reply, and the hosts' subsequent IP packets go to whichever reply they accept (“winner or loser?”).
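
MAC address takeover relies on the backup announcing that the shared IP address now maps to its own MAC address. A minimal Python sketch, assuming the announcement is a gratuitous ARP reply (the slide does not say which ARP variant is used), builds the 42-byte Ethernet frame with `struct`. The helper names and the MAC address in the example are hypothetical; 152.45.4.105 is just an example virtual IP.

```python
import socket
import struct

BROADCAST_MAC = b"\xff" * 6
ETH_P_ARP = 0x0806  # EtherType for ARP


def mac_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    raw = bytes.fromhex(mac.replace(":", ""))
    if len(raw) != 6:
        raise ValueError(f"bad MAC address: {mac}")
    return raw


def gratuitous_arp_reply(virtual_ip: str, new_owner_mac: str) -> bytes:
    """Build an Ethernet frame in which new_owner_mac claims virtual_ip."""
    mac = mac_bytes(new_owner_mac)
    ip = socket.inet_aton(virtual_ip)
    ethernet = struct.pack("!6s6sH", BROADCAST_MAC, mac, ETH_P_ARP)
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1,              # hardware type: Ethernet
        0x0800,         # protocol type: IPv4
        6,              # hardware address length
        4,              # protocol address length
        2,              # opcode 2 = reply
        mac, ip,        # sender: backup's MAC and the shared virtual IP
        BROADCAST_MAC,  # target hardware address: broadcast
        ip,             # target protocol address equals sender's (gratuitous)
    )
    return ethernet + arp
```

Actually sending the frame needs a raw packet socket (on Linux, `socket.AF_PACKET` with `socket.SOCK_RAW`, usually as root). Whether a host updates its ARP cache from an unsolicited reply depends on its operating system, which is exactly the “winner or loser?” question.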

Generic Resource and Fault Management
Figure: two proxy servers (IP address A and IP address B) update and hold leases on entries in a shared space (e.g., connections).
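
The lease mechanism can be sketched in Python: a proxy may use a shared resource, such as the virtual IP address, only while it holds an unexpired lease on it, so a backup can take over once the primary stops renewing. The in-memory `SharedSpace` class is a hypothetical stand-in, not the JavaSpaces API, and time is passed in explicitly to keep the example deterministic.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Lease:
    holder: str        # proxy that currently owns the resource
    expires_at: float  # absolute expiry time


class SharedSpace:
    """Hypothetical in-memory stand-in for the shared space."""

    def __init__(self) -> None:
        self.leases: Dict[str, Lease] = {}

    def acquire_or_renew(self, resource: str, holder: str,
                         now: float, duration: float) -> bool:
        """Grant or renew a lease if the resource is free, expired, or already ours."""
        lease = self.leases.get(resource)
        if lease is None or lease.expires_at <= now or lease.holder == holder:
            self.leases[resource] = Lease(holder, now + duration)
            return True
        return False

    def owner(self, resource: str, now: float) -> Optional[str]:
        """Return the current holder, or None if the lease has lapsed."""
        lease = self.leases.get(resource)
        if lease is not None and lease.expires_at > now:
            return lease.holder
        return None
```

The primary renews its lease well before `duration` elapses; a backup that polls `acquire_or_renew` succeeds only after the primary has stopped renewing, and would then perform the address takeover. A real deployment also needs the shared space itself to be fault tolerant and a margin for clock skew between proxies.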

Proxy Server: Load Balancing
• Round robin: distribute jobs equally among the real servers.
• Weighted round robin: distribute more jobs to servers with greater capacity. Capacity is indicated by the user-assigned weight, which is adjusted upward or downward by dynamic load information.
• Least connections: distribute more jobs to real servers with fewer active connections.
• Weighted least connections: distribute more jobs to servers with fewer active connections relative to their capacity. Capacity is indicated by the user-assigned weight, which is adjusted upward or downward by dynamic load information.
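
The two weighted policies can be sketched in Python. `WeightedRoundRobin` follows the interleaved, gcd-based weighted round-robin scheme used by the Linux Virtual Server scheduler, and `weighted_least_connections` picks the server with the smallest ratio of active connections to weight. Both names are hypothetical, and the dynamic adjustment of weights from load information is not shown.

```python
from functools import reduce
from math import gcd
from typing import Optional, Sequence


class WeightedRoundRobin:
    """Interleaved weighted round robin (LVS-style scheduler)."""

    def __init__(self, servers: Sequence[str], weights: Sequence[int]) -> None:
        if not servers or len(servers) != len(weights) or min(weights) < 0:
            raise ValueError("need one non-negative integer weight per server")
        self.servers = list(servers)
        self.weights = list(weights)
        self.i = -1                            # index of the last server chosen
        self.cw = 0                            # current weight threshold
        self.step = reduce(gcd, self.weights)  # threshold decrement per full pass
        self.max_w = max(self.weights)

    def next(self) -> Optional[str]:
        if self.max_w == 0:
            return None                        # every server is quiesced
        while True:
            self.i = (self.i + 1) % len(self.servers)
            if self.i == 0:
                self.cw -= self.step
                if self.cw <= 0:
                    self.cw = self.max_w
            if self.weights[self.i] >= self.cw:
                return self.servers[self.i]


def weighted_least_connections(servers: Sequence[str], weights: Sequence[int],
                               active: Sequence[int]) -> Optional[str]:
    """Pick the server with the smallest active/weight ratio (zero weights skipped)."""
    best = None  # (server, weight, active connections)
    for server, w, conns in zip(servers, weights, active):
        if w <= 0:
            continue
        # conns / w < best_conns / best_w, compared without division
        if best is None or conns * best[1] < best[2] * w:
            best = (server, w, conns)
    return best[0] if best is not None else None
```

With weights 2, 1, 1 for 172.16.1.2, 172.16.1.3, and 172.16.1.4, eight consecutive `next()` calls return 172.16.1.2 four times and each of the other two servers twice, interleaved rather than in bursts.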
Figure: the redirect decision combines each server's performability (CPU, memory, disk speed, current load), the COTS server confidence level, and the threat level.

A Sample Configuration
Protocol | Virtual IP address:port | Real IP address:port | Weight/Performability
TCP | 152.45.4.105:80 | 172.16.1.2:80 | 2
TCP | 152.45.4.105:80 | 172.16.1.3:8080 | 1
TCP | 152.45.4.105:21 | 172.16.1.4:21 | 1

NAT example, request forwarded to a real server:
FROM 202.14.4.105:3407 TO 152.45.4.105:80 is rewritten as FROM 202.14.4.105:3407 TO 172.16.1.3:8080
NAT example, reply returned to the client:
FROM 172.16.1.3:8080 TO 202.14.4.105:3407 is rewritten as FROM 152.45.4.105:80 TO 202.14.4.105:3407
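
The address rewriting in the NAT example can be sketched in Python. `NatProxy` is a hypothetical class that keeps a connection table so that all packets of one client connection go to the same real server; checksum updates and connection expiry, which a real NAT implementation must handle, are omitted.

```python
from typing import Dict, Tuple

Endpoint = Tuple[str, int]  # (IP address, port)


class NatProxy:
    """Hypothetical NAT front end for one virtual service."""

    def __init__(self, virtual: Endpoint) -> None:
        self.virtual = virtual
        self.conn_table: Dict[Endpoint, Endpoint] = {}  # client endpoint -> real server

    def inbound(self, src: Endpoint, dst: Endpoint,
                real: Endpoint) -> Tuple[Endpoint, Endpoint]:
        """Rewrite client -> virtual into client -> real.

        `real` is the scheduler's choice; it is only used for a new connection,
        so an existing connection keeps the server it was first assigned.
        """
        if dst != self.virtual:
            raise ValueError("packet is not addressed to the virtual service")
        chosen = self.conn_table.setdefault(src, real)
        return src, chosen

    def outbound(self, src: Endpoint, dst: Endpoint) -> Tuple[Endpoint, Endpoint]:
        """Rewrite real -> client into virtual -> client."""
        if self.conn_table.get(dst) != src:
            raise ValueError("reply does not belong to a tracked connection")
        return self.virtual, dst
```

Replaying the example: the request from 202.14.4.105:3407 to 152.45.4.105:80 is sent on to 172.16.1.3:8080, and the reply from 172.16.1.3:8080 leaves the proxy with source address 152.45.4.105:80.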

Proxy Server Function Comparison
Figure (SITAR proxy vs. traditional proxy): the features compared include single point of entry, flexible resource pool, firewall, input checking, load balancing, caching, extension to serve overall security, proprietary communication and membership management, and JINI/JavaSpace-based resource management.

Acceptance Testing
• In the traditional fault tolerance context, an acceptance test is a developer-provided error detection measure for a software module
• In the SITAR project, we perform both reactive and proactive acceptance tests
• Reactive testing
  • Testing methodology
  • Classification of Bugtraq vulnerabilities (41 Apache, 84 Microsoft IIS) from 1996 to March 2001
  • Testing criteria that cover as many vulnerabilities as possible

Acceptance Testing Methodology
• Requirement test: checks the conditions that must hold for a task to complete
• Reasonableness test: used for policy, environmental, and known-range testing
• Timing test: used for profiling service response time
• Coding test: used to detect integrity compromises by comparing data signatures
• Accounting test: used for transaction-based applications
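
Two of these tests can be sketched in Python. The reasonableness test checks a response against policy ranges, and the timing test compares its response time with a profile of earlier response times. The `Response` record, the allowed status codes, the size limit, and the mean-plus-k-standard-deviations rule are all assumptions made for illustration.

```python
import statistics
from dataclasses import dataclass
from typing import FrozenSet, Sequence


@dataclass
class Response:
    status: int        # HTTP status code returned by the COTS server
    body_len: int      # size of the response body in bytes
    elapsed_ms: float  # observed service time


def reasonableness_test(r: Response,
                        allowed_status: FrozenSet[int] = frozenset({200, 304, 404}),
                        max_body: int = 10_000_000) -> bool:
    """Policy / known-range check on the response."""
    return r.status in allowed_status and 0 <= r.body_len <= max_body


def timing_test(r: Response, profile_ms: Sequence[float], k: float = 3.0) -> bool:
    """Accept responses no slower than mean + k * stdev of the profiled times."""
    mean = statistics.mean(profile_ms)
    stdev = statistics.stdev(profile_ms)  # needs at least two profiled samples
    return r.elapsed_ms <= mean + k * stdev


def accept(r: Response, profile_ms: Sequence[float]) -> bool:
    """A response passes only if every individual test passes."""
    return reasonableness_test(r) and timing_test(r, profile_ms)
```

For the profile 10, 12, 11, 9, 13 ms, the mean is 11 ms and the sample standard deviation is about 1.58 ms, so with k = 3 a response is accepted on timing if it takes at most about 15.7 ms.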

Vulnerabilities from IIS and Apache
Figure: number of vulnerabilities reported on Bugtraq per year, 1996 through March 2001, for IIS and Apache (totals: IIS 84, Apache 41).

Web Server Faults, Errors, and Compromises
Figure: faults introduced during design, implementation, or operation lead to errors (access validation errors, input validation errors, mishandling of exceptional conditions, race condition errors, unknown errors), which attackers exploit to cause compromises (integrity, confidentiality, DoS, command execution).

Example: Apache Vulnerability Analysis
Figure: Apache vulnerabilities, listed by Bugtraq ID, cross-classified by error type (input validation, boundary condition, access validation, failure to handle exception, configuration, design, race condition, unknown, others) and by compromise (DoS, integrity, command execution, confidentiality).

Web Server Compromise Classification
• Confidentiality compromises: path revealing/directory listing, file disclosure, IP address disclosure
• Integrity compromises: file modification, file creation, file deletion
• Command execution: root privilege, other
• DoS compromises: stop responding, refuse connection, memory consumption, CPU overload, system crash

Example: Coding Test
Figure: client requests pass through acceptance testing to the COTS server cluster; the acceptance test compares each response with the expected hash from a database of hashed values, guided by the policy database and the system configuration.
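
A coding test can be sketched as a digest comparison: hash each response body and compare it with the digest recorded while the content was known to be good. The Python below assumes SHA-256 digests kept in an in-memory dictionary standing in for the database of hashed values; `CodingTest` is a hypothetical name, and rejecting paths with no recorded digest is an assumed policy.

```python
import hashlib
import hmac
from typing import Dict


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class CodingTest:
    """Hypothetical coding acceptance test based on SHA-256 digests."""

    def __init__(self, expected: Dict[str, str]) -> None:
        self.expected = expected  # URL path -> hex digest of known-good content

    def check(self, path: str, body: bytes) -> bool:
        digest = self.expected.get(path)
        if digest is None:
            return False  # assumed policy: content with no recorded digest is rejected
        # Constant-time comparison of the recorded and observed digests.
        return hmac.compare_digest(digest, sha256_hex(body))
```

This fits static content such as web pages that should never change between deployments; dynamically generated responses would need the other acceptance tests instead.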

Proactive Management of COTS Servers
• Objective
  • High availability through proactive management of software aging
  • Proactive security
    • Cumulative, longer-term behavior that might be detectable as consistent, spaced-out, or low-overhead attacks (e.g., port scanning, DoS)
    • Trending, long-term noise smoothing, and other techniques
• Monitoring tool
  • Monitor COTS servers periodically and send alerts to the Adaptive Reconfiguration Module (ARM)

Software Aging
Definition, causes, manifestation, and examples
• Error conditions accumulating over time, leading to performance degradation or a crash
• Not related to an application program becoming obsolete due to changing requirements or maintenance
• What constitutes aging?
  • Deterioration in the availability of OS resources, data corruption, numerical error accumulation
• How does it manifest itself?
  • Performance degradation, transient failures
• Examples
  • Netscape, xrn, Windows 95, file system aging [Smith & Seltzer], gradual service degradation in the AT&T transaction processing system [Avritzer et al.], error accumulation in the Patriot missile system's software

Environment Diversity
Software Rejuvenation
• Proactive approach
• Software rejuvenation
  • Involves occasionally stopping the running software, “cleaning” its internal state, and restarting it
  • Examples: garbage collection, defragmentation, flushing kernel and file server tables, etc.
  • See www.software-rejuvenation.com for more information on this topic
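
The slides do not say when the rejuvenation agent restarts a server. One option, assumed here, is to estimate the depletion trend of a monitored resource and rejuvenate before its projected exhaustion. The Python sketch uses Sen's nonparametric slope estimate; `rejuvenation_time` and its `safety` factor are hypothetical names chosen for illustration.

```python
from statistics import median
from typing import List, Optional


def sens_slope(times: List[float], values: List[float]) -> float:
    """Sen's slope: the median of the slopes between all pairs of samples."""
    slopes = [
        (values[j] - values[i]) / (times[j] - times[i])
        for i in range(len(times))
        for j in range(i + 1, len(times))
        if times[j] != times[i]
    ]
    return median(slopes)


def rejuvenation_time(times: List[float], free: List[float],
                      floor: float, safety: float = 0.5) -> Optional[float]:
    """Schedule a restart before a depleting resource is projected to hit `floor`.

    Fits intercept + slope * t (slope by Sen's method, intercept as the median
    of value - slope * t), projects when the line crosses `floor`, and returns
    a time that leaves a safety margin. Returns None if there is no downward trend.
    """
    slope = sens_slope(times, free)
    if slope >= 0:
        return None
    intercept = median(v - slope * t for t, v in zip(times, free))
    exhaustion = (floor - intercept) / slope  # time when the fitted line reaches floor
    now = times[-1]
    remaining = max(0.0, exhaustion - now)
    return now + safety * remaining
```

For free memory sampled as 100, 90, 80, 70, 60 at times 0 to 4 and a floor of 20, the fitted line reaches the floor at time 8, and with `safety=0.5` the restart is scheduled at time 6.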

Monitoring Tool
• Data acquisition
  • Linux: /proc directory (memory, file descriptors, inodes, swap space, network-related data, CPU usage, etc.)
  • Data are logged on disk and used by analysis algorithms
• Data analysis
  • Monitor resources and security-related parameters, and send alerts of intrusions, security violations, or impending vulnerable states
  • Monitor workload for load balancing
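
Data acquisition from /proc can be sketched in Python. The `/proc/meminfo` field names (`MemFree`, `SwapFree`) and the three-number layout of `/proc/sys/fs/file-nr` are standard on Linux; which fields the SITAR monitoring tool actually records is not stated, so the selection in `sample()` is an assumption.

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo lines such as 'MemFree:  1234 kB' into {field: value}.

    Most values are in kB; HugePages_* counts carry no unit.
    """
    info: Dict[str, int] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            info[name.strip()] = int(fields[0])
    return info


def parse_file_nr(text: str) -> Dict[str, int]:
    """Parse /proc/sys/fs/file-nr: allocated handles, free allocated handles, maximum."""
    allocated, free, maximum = (int(x) for x in text.split())
    return {"allocated": allocated, "free": free, "max": maximum}


def sample() -> Dict[str, int]:
    """Take one sample on a Linux host; returns {} if /proc is unavailable."""
    try:
        with open("/proc/meminfo") as f:
            mem = parse_meminfo(f.read())
        with open("/proc/sys/fs/file-nr") as f:
            files = parse_file_nr(f.read())
    except OSError:
        return {}
    return {
        "mem_free_kb": mem.get("MemFree", 0),
        "swap_free_kb": mem.get("SwapFree", 0),
        "file_handles": files["allocated"],
    }
```

Each periodic sample would be appended, with a timestamp, to the on-disk log that the trend analysis reads.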

Software Aging
Figure: non-parametric regression smoothing of two monitored resources over time, real memory free and file table size.
Trend detection: seasonal Kendall test for trend
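
The seasonal Kendall test looks for a monotonic trend while allowing for periodic (e.g., daily) workload patterns: the Mann-Kendall statistic S and its variance are computed separately for each season, summed, and compared with a normal distribution. A minimal Python sketch, which omits the correction for tied values and assumes the samples are already grouped by season:

```python
from math import sqrt
from statistics import NormalDist
from typing import List, Sequence, Tuple


def kendall_s(x: Sequence[float]) -> int:
    """Mann-Kendall statistic S: sum of sign(x[j] - x[i]) over all pairs i < j."""
    n = len(x)
    return sum(
        (x[j] > x[i]) - (x[j] < x[i])
        for i in range(n)
        for j in range(i + 1, n)
    )


def kendall_var(n: int) -> float:
    """Variance of S under the no-trend hypothesis, assuming no ties."""
    return n * (n - 1) * (2 * n + 5) / 18


def seasonal_kendall(seasons: List[Sequence[float]]) -> Tuple[int, float, float]:
    """Return (S, Z, two-sided p-value) with S and Var(S) summed over seasons."""
    s = sum(kendall_s(x) for x in seasons)
    var = sum(kendall_var(len(x)) for x in seasons)
    if var == 0:
        return s, 0.0, 1.0
    if s > 0:
        z = (s - 1) / sqrt(var)   # continuity correction
    elif s < 0:
        z = (s + 1) / sqrt(var)
    else:
        z = 0.0
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return s, z, p
```

A significantly negative Z for a series such as real memory free points to aging. For two made-up seasons of five strictly decreasing samples each, S = -20 and Z is about -3.29.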

Quantitative Analysis of ITS - Previous Work
• Uncertainty in security [B. Littlewood et al., 1993]
• Quantitative model of attacker behavior [Jonsson, Olovsson, 1997]
• Privilege graphs; mean effort to security failure [Ortalo, Deswarte, Kaaniche, 1999]
• Concept of survivability, CERT/SEI [Ellison et al., 1999]
• Vulnerability testing using fault-injection techniques [Voas, Ghosh et al., 1996], [Du, Mathur, 1998]

Quantitative Analysis of ITS: State Transition Diagram
Figure: state transition diagram for an intrusion tolerant system.
States:
• G - good state (system free of the vulnerability)
• V - vulnerable state
• A - active attack state
• MC - masked compromised state
• UC - undetected compromised state
• TR - triage state
• FS - fail secure state
• GD - graceful degradation state
• F - failed state
Transition labels: entered V state (by accident or pre-attack actions); detected before attack; exploit begin; undetected but masked; undetected, non-maskable; intrusion tolerance triggered; fail-secure measure; graceful degradation; fail with alarm; transparent recovery; recovery without degradation; restoration/reconfiguration/evolution.

Security and Dependability
• Attributes of dependability
  • availability - readiness for usage
  • reliability - continuity of service
  • safety - non-occurrence of catastrophic consequences
  • integrity - data and programs are modified or destroyed only in a specified and authorized manner
  • confidentiality - sensitive information is not disclosed to unauthorized recipients
• See the book by Laprie for more details

Security and Dependability - Cont’d
• Associating integrity and availability with respect to authorized actions, together with confidentiality, leads to security
• Computer security has traditionally been treated as a binary term: at any moment in time, the system is either secure (safe) or compromised

Compromise of Confidentiality
Figure: the state transition diagram for confidentiality compromises, with states G, V, A, UC, TR, FS, GD, and F (no masked compromised state).

Compromise of Integrity
Figure: the state transition diagram for integrity compromises, which uses all of the states G, V, A, MC, UC, TR, FS, GD, and F.

Availability - DoS
Figure: the state transition diagram for denial-of-service attacks on availability, with states G, V, A, UC, TR, GD, and F (no MC or FS state).

Stochastic Models of ITS
• To an attacker with incomplete knowledge of the system, the effects of an attack are uncertain
• To the system owner, the type, frequency, intensity, and duration of an attack are uncertain, as is whether a particular attack would result in a security breach

Describing Transitions
• Events that trigger transitions among states need to be described in terms of rates, probabilities, or cumulative distribution functions (CDFs)
• Modeling the attacker's behavior
  • Doubly stochastic process: an effort process as a function of time, with attacks conditional upon the effort process
  • Phase-type distributions: further refinement of the current model (G, V, and A states)
  • General distribution functions
• Modeling system behavior
  • Detection, coverage, and recovery probabilities/delays for the ITS
Figure: the states G, V, A, MC, UC, TR, FS, GD, and F of the state transition diagram.
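
As one example of the phase-type refinement, one could assume (the slide does not specify this) that an attacker must pass through k sequential, exponentially distributed phases, each with rate λ, before an exploit begins. The time to move from V to A is then Erlang-k distributed:

```latex
F_{V \to A}(t) = 1 - \sum_{n=0}^{k-1} e^{-\lambda t} \frac{(\lambda t)^n}{n!},
\qquad
E[T_{V \to A}] = \frac{k}{\lambda},
\qquad t \ge 0 .
```

Expanding V into k phase states keeps the overall model a continuous-time Markov chain; k = 1 recovers the exponential case.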

Parameterization
• Educated guess based on experience with previous similar systems
• Experimentation
  • Attacker and system behavior
    • Monitoring actual system behavior (only if the security requirements are relatively modest)
    • Attempting to break into the system using simulated or actual human attackers (red team): a black-box approach
  • System behavior
    • Fault injection technique (deliberate insertion of faults into the system; well known in testing fault-tolerant systems): a white-box approach

Model Evaluation
• Measures to be evaluated
  • Confidentiality
  • Integrity
  • Availability
  • Performance
  • Performability
  • Survivability

Confidentiality, Integrity, and Availability
• Confidentiality - probability that sensitive information is not disclosed to unauthorized users
• Integrity - probability that data and programs are not modified or destroyed by unauthorized users
• Availability - probability that the system is functioning properly
  • Instantaneous availability A(t)
  • Steady-state availability A - the limiting value of A(t) as t approaches infinity (the fraction of time spent in up states)
  • The system functions properly in states G, V, A, MC, and GD; it might be available in the UC state, but it delivers invalid replies
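
If the state transition diagram is treated as a continuous-time Markov chain (CTMC) with generator Q, the steady-state probabilities π solve πQ = 0 with Σπ_i = 1, and steady-state availability is the sum of π_i over the up states G, V, A, MC, and GD. The Python sketch assumes exponentially distributed sojourn times, takes the arrows from one reading of the diagram's labels (each restoration/reconfiguration/evolution arc is assumed to return to G), and uses made-up rates; none of the numbers come from SITAR measurements.

```python
from typing import Dict, List, Tuple

STATES = ["G", "V", "A", "MC", "UC", "TR", "FS", "GD", "F"]
UP_STATES = {"G", "V", "A", "MC", "GD"}  # states where the system functions properly

# Hypothetical rates per hour: illustrative placeholders, not measured values.
RATES: Dict[Tuple[str, str], float] = {
    ("G", "V"): 1 / 720,   # a vulnerability appears about once a month
    ("V", "G"): 1 / 48,    # detected before an attack
    ("V", "A"): 1 / 24,    # exploit begins
    ("A", "MC"): 0.15,     # undetected but masked
    ("A", "UC"): 0.05,     # undetected, non-maskable
    ("A", "TR"): 0.30,     # intrusion tolerance triggered
    ("TR", "G"): 0.80,     # recovery without degradation
    ("TR", "GD"): 0.60,    # graceful degradation
    ("TR", "FS"): 0.40,    # fail-secure measure
    ("TR", "F"): 0.20,     # fail with alarm
    ("MC", "G"): 1 / 4,    # restoration/reconfiguration/evolution
    ("UC", "G"): 1 / 72,
    ("FS", "G"): 1 / 2,
    ("GD", "G"): 1 / 8,
    ("F", "G"): 1 / 24,
}


def generator(states: List[str], rates: Dict[Tuple[str, str], float]) -> List[List[float]]:
    """Build the CTMC generator matrix Q (each row sums to zero)."""
    idx = {s: i for i, s in enumerate(states)}
    q = [[0.0] * len(states) for _ in states]
    for (src, dst), rate in rates.items():
        q[idx[src]][idx[dst]] += rate
        q[idx[src]][idx[src]] -= rate
    return q


def steady_state(q: List[List[float]]) -> List[float]:
    """Solve pi Q = 0 with sum(pi) = 1 by Gauss-Jordan elimination."""
    n = len(q)
    # Row j encodes sum_i pi_i * Q[i][j] = 0; the last row becomes sum_i pi_i = 1.
    a = [[q[i][j] for i in range(n)] + [0.0] for j in range(n)]
    a[-1] = [1.0] * n + [1.0]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                for c in range(col, n + 1):
                    a[r][c] -= factor * a[col][c]
    return [a[i][n] / a[i][i] for i in range(n)]


pi = dict(zip(STATES, steady_state(generator(STATES, RATES))))
availability = sum(p for s, p in pi.items() if s in UP_STATES)
```

Replacing one balance equation with the normalization condition is valid here because every state can reach every other (all recovery arcs lead back to G), so the chain is irreducible and the linear system has a unique solution.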

Performance
• The level of productivity of a system
  • Full service in states G, V, A, and MC
  • Reduced service in GD
  • No service in FS and F

Performability
• Availability - failure and repair behavior, disregarding different performance levels
• Performance - how well the system is working, given that it does not fail
• Performability - combined measures of performance and availability; characterizes degradable systems in terms of their ability to provide a given amount of useful work in a given period of time
  • A higher-level model represents failure/repair behavior
  • Performance models for each state in the higher-level model provide rewards
  • See the monograph on performability edited by Haverkort et al.
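
In Markov reward terms, the higher-level model supplies state probabilities and each state's performance model supplies a reward rate r_i. One way to write the resulting measures, assuming the reward rates follow the Performance slide (r_i = 1 for full service, 0 < r_GD < 1 for reduced service, r_i = 0 for no service), is:

```latex
\begin{aligned}
E[X_{ss}] &= \sum_{i \in \mathcal{S}} r_i \, \pi_i,
  \qquad \mathcal{S} = \{G, V, A, MC, UC, TR, FS, GD, F\}, \\
r_G = r_V = r_A = r_{MC} &= 1, \qquad 0 < r_{GD} < 1, \qquad r_{FS} = r_F = 0, \\
E[Y(t)] &= \int_0^t \sum_{i \in \mathcal{S}} r_i \, p_i(u) \, du .
\end{aligned}
```

Here π_i is the steady-state probability of state i, p_i(u) is its transient probability at time u, and E[Y(t)] is the expected useful work accumulated in [0, t]. The Performance slide assigns no reward to UC or TR, so the modeler must choose those; setting r_i = 1 on the up states and 0 elsewhere turns E[X_ss] into steady-state availability.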

Survivability
• Survivability is the capability of a system to fulfill its mission, in a timely manner, in the presence of attacks, failures, or accidents
• Key to the concept of survivability is the identification of essential services
  • Central to the delivery of essential services is the capability of a system to maintain essential properties (i.e., specified levels of reliability, availability, integrity, confidentiality, performance, and other quality attributes) in the presence of attacks, failures, or accidents

Survivability and Security
• Security, in its traditional sense, does not address maintaining services during and after intrusions
• The system may have one set of requirements defining its survivability goals and another defining its security goals
• Some requirements in these two sets may be similar, some complementary, and some conflicting
  • For example, reverting to the fail-secure state limits the damage (a security goal) but interrupts essential functionality (a survivability goal)
  • Trade-offs must be made

Survivability and Performability
• Survivable systems are designed to continue working even in the presence of attacks and failures, guaranteeing a minimum level of performance
• Survivability, like performability, is a composite measure of performance and dependability

Summary
• We have set up a test-bed with diverse operating systems (Linux, FreeBSD, Windows NT) and deployed JINI/JavaSpace; we are ready to build a basic prototype
• We have implemented a software rejuvenation agent
• Models for quantitative analysis of security and survivability are in progress
• We are progressing toward a preliminary architecture and quantitative methods report, due in August