Autonomic Computing
The vision of autonomic computing, J.
Kephart and D. Chess, IEEE Computer, Jan.
2003.
Also
- A.G. Ganek and T.A. Corbi, “The dawning of the
autonomic computing era”, IBM Systems Journal, 42 (1),
2003.
- R. Want, T. Pering and D. Tennenhouse, “Comparing
autonomic and proactive computing”, IBM Systems
Journal, 42 (1), 2003.
Fabián E. Bustamante, Winter 2006
The problem
The main obstacle to further progress in the IT industry
– Not a change in Moore’s law, but
– Looming software complexity crisis
• Beyond administering single environments, to integration into intra- and
inter-corporate computing systems
“Complexity is the business we are in, and complexity is what
limits us.” Fred Brooks Jr.
Better programming won’t do it
Consider
– ~1/3 to ½ of a company’s total IT budget goes to preventing
and recovering from crashes
– “For every dollar to purchase storage, you spend $9 to have
someone manage it.”, N. Tabellion, CTO Fujitsu Softek
– ~40% of computer outages are caused by operator errors
– Average downtime impact for IT ~ $1.4 million in revenue per hour
CS 395/495 Autonomic Computing Systems
EECS, Northwestern University
The answer/hope – Autonomic computing
Autonomic systems – can manage themselves given
high-level objectives from admins.
~ autonomic nervous system
An autonomic system
– Knows itself
– Knows its environment & the context surrounding its activity
– (Re)configures itself under varying and unpredictable
conditions
– Is always looking to optimize its operation
– Is able to protect and heal itself
– Anticipates the optimized resources needed to meet a user’s
information needs
To incorporate these characteristics, it must have the
following properties/features …
Self-* properties
Self-configuration
– Current: Data centers made of components from/for multiple
vendors/platforms; installation, configuration & integration is
time consuming & error prone
– Autonomic: Automated configuration based on high-level policies; the
host system adjusts itself automatically and seamlessly
Self-optimization
– Current: Hundreds of manually set, nonlinear tuning knobs
– Autonomic: Components and system continually seek
optimization opportunities
Self-healing
– Current: e.g. problem determination can take weeks
– Autonomic: self detection, diagnosis, and repair for HW&SW
Self-protection
– Current: Detection & recovery from attacks & cascading
failures is manual
– Autonomic: Self-defense using early warning to anticipate &
prevent system-wide failures
Autonomic element
Autonomic systems –
interactive collection of
autonomic elements
Autonomic element
– 1+ managed elements +
autonomic manager that
controls it
– Functions at many levels,
from disk drives to entire
enterprises
– Fixed behavior,
connections and
relationships give way
to increased dynamism
and flexibility, expressed as
high-level goals
[Figure: an autonomic manager (Monitor, Analyze, Plan, Execute, Knowledge) controlling a managed element]
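The structure of an autonomic element can be sketched in code. The following is a minimal illustration, not taken from the slides or the cited papers: the worker-pool example, the class names ManagedElement and AutonomicManager, the "max_backlog" goal, and all numbers are hypothetical. It only shows one way a manager could run Monitor, Analyze, Plan and Execute steps over shared knowledge while touching the managed element only through a sensor and an effector.

```python
import math
from dataclasses import dataclass, field


@dataclass
class ManagedElement:
    """Hypothetical managed resource: a worker pool with a pending queue."""
    workers: int = 2
    queue_length: int = 40

    def sense(self) -> dict:
        # Sensor: report the current backlog per worker.
        return {"backlog_per_worker": self.queue_length / self.workers}

    def add_workers(self, n: int) -> None:
        # Effector: the only way the manager changes the element.
        self.workers += n


@dataclass
class AutonomicManager:
    element: ManagedElement
    # Knowledge: a high-level goal (assumed here) plus past observations.
    knowledge: dict = field(
        default_factory=lambda: {"max_backlog": 10.0, "history": []}
    )

    def monitor(self) -> dict:
        reading = self.element.sense()
        self.knowledge["history"].append(reading)
        return reading

    def analyze(self, reading: dict) -> bool:
        # Is the goal currently violated?
        return reading["backlog_per_worker"] > self.knowledge["max_backlog"]

    def plan(self, reading: dict) -> int:
        # Number of extra workers needed to bring the backlog within the goal.
        total_backlog = reading["backlog_per_worker"] * self.element.workers
        needed = math.ceil(total_backlog / self.knowledge["max_backlog"])
        return max(0, needed - self.element.workers)

    def execute(self, extra_workers: int) -> None:
        if extra_workers:
            self.element.add_workers(extra_workers)

    def step(self) -> None:
        # One pass of the control loop.
        reading = self.monitor()
        if self.analyze(reading):
            self.execute(self.plan(reading))


element = ManagedElement()
manager = AutonomicManager(element)
manager.step()
print(element.workers)  # prints 4
```

In this sketch the administrator would change only the goal held in the knowledge dictionary; the manager works out the concrete action itself.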
Evolution to autonomic systems
Level 1, Basic
– Multiple sources of system-generated data
– Requires extensive, highly skilled IT staff
Level 2, Managed
– Consolidation of data through management tools
– IT staff analyzes and takes actions
– Greater system awareness; improved productivity
Level 3, Predictive
– System monitors, correlates, and recommends actions
– IT staff approves and initiates actions
– Reduced dependency on deep skills; faster and better decision making
Level 4, Adaptive
– System monitors, correlates, and takes actions
– IT staff manages performance against Service Level Agreements (SLAs)
– IT agility and resiliency with minimal human interaction
Level 5, Autonomic
– Integrated components dynamically managed by business rules/policies
– IT staff focuses on enabling business needs
– Business policy drives IT management; business agility and resilience
Engineering challenges
Design, test and verification
Installation and configuration
Monitoring, problem determination, upgrading
Managing the life cycle
– Autonomic systems will have multiple elements at different
stages, handling multiple tasks, … how to handle them all?
Relationships among autonomic elements
– Specification of services needed/provided; ways to locate
providers; ways to establish SLAs; …
Robustness against self-management-based attacks
Goal specification and robustness to wrongly specified
goals
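As an illustration of the "relationships among autonomic elements" challenge above, here is a minimal sketch in which elements advertise the services they provide and a requester locates a provider that meets a latency requirement. Everything in it is an assumption made for the example: the Offer and Registry names, the single latency figure standing in for an SLA, and the choice to prefer the tightest promise. Real agreements would need negotiation, monitoring and penalties, which this sketch leaves out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Offer:
    """Hypothetical advertisement of a service by an autonomic element."""
    provider: str
    service: str
    max_latency_ms: float  # latency the provider promises to stay under


class Registry:
    """Hypothetical directory where elements publish and look up offers."""

    def __init__(self) -> None:
        self._offers: List[Offer] = []

    def advertise(self, offer: Offer) -> None:
        self._offers.append(offer)

    def locate(self, service: str, required_latency_ms: float) -> Optional[Offer]:
        # Keep only offers for this service that satisfy the requirement.
        candidates = [
            o for o in self._offers
            if o.service == service and o.max_latency_ms <= required_latency_ms
        ]
        # Pick the tightest promise, or None if nobody qualifies.
        return min(candidates, key=lambda o: o.max_latency_ms, default=None)


registry = Registry()
registry.advertise(Offer("disk-array-a", "storage", 50.0))
registry.advertise(Offer("disk-array-b", "storage", 20.0))
match = registry.locate("storage", required_latency_ms=30.0)
print(match.provider if match else "no provider")  # prints disk-array-b
```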
Scientific challenges
How to understand, control, and design emergent
behavior
– Understanding the mapping from local to global behavior is
not enough
Develop a theory of robustness
– Beginning with a definition
Learning and optimization theory
– Machine learning by a single element in a static environment is
just the basics; the challenge is multiagent systems in dynamic environments
Negotiation theory
– How should the multiple elements negotiate?
Automated statistical modeling
– Statistical modeling for detection/prediction of performance
problems; ways to aggregate statistical variables to reduce
dimensionality
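To make the statistical-modeling item concrete, the sketch below flags a sensor reading as anomalous when it lies more than a chosen number of standard deviations from the mean of a recent window. This is a deliberately simple stand-in (a sliding-window z-score test), not a method proposed in the slides; the class name ZScoreDetector, the window size, the threshold and the sample latencies are all assumptions.

```python
from collections import deque
from statistics import mean, stdev


class ZScoreDetector:
    """Hypothetical detector: flags values far from the recent window mean."""

    def __init__(self, window: int = 20, threshold: float = 3.0):
        self.samples = deque(maxlen=window)  # recent normal readings
        self.threshold = threshold           # allowed deviation, in std devs

    def observe(self, value: float) -> bool:
        """Return True if value deviates strongly from the recent window."""
        anomalous = False
        if len(self.samples) >= 2:  # stdev needs at least two samples
            mu = mean(self.samples)
            sigma = stdev(self.samples)
            if sigma > 0 and abs(value - mu) / sigma > self.threshold:
                anomalous = True
        if not anomalous:
            # Learn only from normal readings so outliers do not skew the baseline.
            self.samples.append(value)
        return anomalous


detector = ZScoreDetector(window=10, threshold=3.0)
latencies_ms = [100, 102, 98, 101, 99, 100, 103, 97, 250, 101]
flags = [detector.observe(x) for x in latencies_ms]
print(flags.index(True))  # prints 8, the position of the 250 ms reading
```

A per-metric test like this does not address the dimensionality problem mentioned above; it only shows the detection half on a single variable.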