Challenges in distributed energy adaptive computing, Keynote at

Challenges in Distributed Energy Adaptive Computing
K. Kant
NSF and GMU
K. Kant, Modeling Challenges in Distributed Energy Adaptive Computing
Information & Communication Technology (ICT) has a problem
Performance centric → Energy & sustainability centric
How do we get there?
ICT Power Growth until 2020
• Increase in spite of power efficient designs
– Clients: 8x in number, 3X in power
– Data Centers: > 2X increase
– Network: 3X increase
[Figure: ICT power growth by category: network; transmission, conversion & distribution; clients; data center]
Current State
Unsustainable Computing
Data Center Infrastructure
• Resource intensive: Water, cabling, metal, …
• ~50% power wasted before getting to racks
Distribution Infrastructure
~10% distribution loss + high carbon impact
[Figure: power distribution chain from the 115 kV utility feed through 13.2 kV, 480 V and 208 V stages to the IT load, with a UPS and a 2.5 MW generator (~180 gallons/hour). Per-stage labels in the figure: 6% loss (94% efficient); 1.0% loss (99.0% efficient); 0.5% loss (99.5% efficient); 0.3% loss (99.7% efficient); ~1% loss in switchgear and conductors.]
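As a rough cross-check of the ~10% figure, the per-stage losses labeled on this slide can be multiplied out. This is only a sketch: it assumes the five labeled stages sit in series and that each is traversed exactly once, which the extracted slide text does not confirm. The name `stage_losses` is illustrative.

```python
import math

# Per-stage loss fractions as labeled on the slide.
# ASSUMPTION: the stages are in series and each is traversed once.
stage_losses = [0.06, 0.01, 0.005, 0.003, 0.01]

# The end-to-end efficiency of a series chain is the product of stage efficiencies.
chain_efficiency = math.prod(1.0 - loss for loss in stage_losses)
total_loss = 1.0 - chain_efficiency

print(f"end-to-end efficiency: {chain_efficiency:.3f}")
print(f"total distribution loss: {total_loss:.1%}")
```

Under these assumptions the chain loses roughly 8.6% of the input power, which is in line with the slide's "~10%" headline.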
~50% Rack Power Wasted
Component           Total (W)  Used (W)  Comments
CPU                 80         60        Operating at 100% utilization
Fans                50         25        Temp. directed fan at 100% util
Memory (32 GB)      88         24        2GB DIMMs, 4W idle, 19W active
Hard drives         40         10        6 SATA drives, 25% busy
I/O adapters        20         4         25% disk, 15% network
Motherboard         22         12        N/S bridges & devices, VRs, …
Total DC power      300        135
Power supply loss   50         7         14% → 5% loss of AC input pwr
AC input power      350        142       > 50% of power is wasted
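The totals and the "> 50% wasted" claim can be re-derived from the per-component rows. The sketch below assumes the figures are in watts (suggested by the "4W idle, 19W active" comment) and that "Used" means usefully consumed power; the variable names are illustrative.

```python
# (component, total watts, used watts) taken from the slide's table.
# ASSUMPTION: the unit is watts, inferred from the "4W idle, 19W active" comment.
rack_power = [
    ("CPU", 80, 60),
    ("Fans", 50, 25),
    ("Memory (32 GB)", 88, 24),
    ("Hard drives", 40, 10),
    ("I/O adapters", 20, 4),
    ("Motherboard", 22, 12),
]
psu_loss_total, psu_loss_used = 50, 7

# DC power is the sum over components; AC input adds the power supply loss.
dc_total = sum(total for _, total, _ in rack_power)
dc_used = sum(used for _, _, used in rack_power)
ac_total = dc_total + psu_loss_total
ac_used = dc_used + psu_loss_used

wasted_fraction = 1 - ac_used / ac_total
print(f"DC power: {dc_total} W total, {dc_used} W used")
print(f"AC input: {ac_total} W total, {ac_used} W used")
print(f"wasted: {wasted_fraction:.1%}")
print(f"PSU loss share: {psu_loss_total / ac_total:.0%} vs {psu_loss_used / ac_used:.0%}")
```

The rows sum to the slide's 300 W and 135 W DC totals, and the wasted share of AC input comes out at about 59%.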
Sustainable Computing
Renewable Energy Push
• Limit energy draw from grid
– Less infrastructure
– Fewer losses
– But variable supply
Need better power adaptability
High Temperature DCs
• Chiller-less operation
– Less energy/materials, but space inefficient
• High temperature operation
– Smaller T_outlet − T_inlet
– More throttling
– More failure prone (?)
Need smarter thermal adaptability
Overdesign
• Overdesign is the norm today
– Huge power supplies, fans, heat sinks, server cases, high rack capacity, UPS capacity, …
– Engineered for worst case → rarely encountered
– Huge power wastage, waste of materials, energy, …
• What if we right-size everything?
• Highly energy efficient but needs smarter control
[Figure: PSU efficiency vs. load. Horizontal axis: output load, 0 to 100; vertical axis: efficiency, 50 to 90; regions marked "Low eff" and "High eff".]
Better energy adaptability to deal w/ frugal design
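To illustrate why an oversized power supply wastes energy, the sketch below compares the AC input needed for the same DC load on an oversized and a right-sized PSU. The efficiency curve, the 150 W load and both PSU ratings are hypothetical values chosen only to span the 50 to 90 range of the slide's plot; they are not read from the figure.

```python
from bisect import bisect_right

# HYPOTHETICAL (load fraction, efficiency) points, not taken from the slide.
CURVE = [(0.0, 0.50), (0.2, 0.75), (0.4, 0.85), (0.6, 0.88), (0.8, 0.90), (1.0, 0.90)]


def psu_efficiency(load_fraction: float) -> float:
    """Linearly interpolate PSU efficiency at a load fraction in [0, 1]."""
    if not 0.0 <= load_fraction <= 1.0:
        raise ValueError("load fraction must be within [0, 1]")
    loads = [load for load, _ in CURVE]
    # Index of the right-hand point of the segment containing load_fraction.
    i = min(bisect_right(loads, load_fraction), len(CURVE) - 1)
    (x0, y0), (x1, y1) = CURVE[i - 1], CURVE[i]
    return y0 + (y1 - y0) * (load_fraction - x0) / (x1 - x0)


def ac_input_w(dc_load_w: float, psu_rating_w: float) -> float:
    """AC power drawn to deliver dc_load_w from a PSU of the given rating."""
    return dc_load_w / psu_efficiency(dc_load_w / psu_rating_w)


oversized = ac_input_w(150, 750)    # PSU runs at 20% load
right_sized = ac_input_w(150, 250)  # PSU runs at 60% load
print(f"oversized: {oversized:.1f} W, right-sized: {right_sized:.1f} W")
```

With these made-up numbers the right-sized supply draws about 30 W less for the same work, but it also leaves less headroom, which is why the slide pairs right-sizing with smarter control.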
Energy Adaptive Computing
• EAC strives for dynamic end-to-end adaptation:
– Workload adaptation for graceful QoS degradation under energy limitations
– Infrastructure adaptation to cope with temporary energy deficiencies
• Requires coordinated power/thermal mgmt of computation, network & storage
• Enhances sustainability of IT infrastructure
EAC Instances
Client-server EAC
• Transparently adapt to client energy states
– State = {on-AC, normal, low-battery, …}
– Service contract Ci = {setup QoS, operational QoS}
• Adaptation Challenges
– Communicating & enforcing contracts
– Group adaptation of clients forced by network/servers?
Cluster EAC
• Adaptation to intra & inter-DC limits
– Multi-level: Server, rack & DC levels
• Adaptation Challenges
– Estimate & collect power deficits/surplus at multiple levels
– Coordination across large range of devices
• Location based services
• Coordination across levels
– Simultaneously handle client-server loop
P2P EAC
• Adaptation based on “available energy”
• Content: video resolution, audio coding, …
• Network: modulate wireless radio usage (?)
• Energy proportional use of peer resources
• Energy driven content replication & reorganization
• Adaptation Challenges
– Satisfying QoS?
– Balancing src/dest usage vs. relay node energy usage?
Challenges
Some specific Issues
Power Estimation Challenges
• Notion of effective power?
– Additive relationship: Workload → power
– Why is this hard? Interference
• Available power
– Determined by power, thermal & perhaps other issues (noise)
– Required at multiple levels: facility, enclosure, machine, …
Network Role in EAC
• Energy Adaptation
– Aggressive control of switch/router ports
• Speed, state & width controls
– Traffic consolidation across paths
• Adaptation induced congestion
– Propagation (e.g., ECN, EBCN) & response
• Computation – communication tradeoff ?
• Redirection ?
• Network protocol support for adaptation?
Other Issues
• EAC Security
– Attacks on power sources
– Energy Attacks on IT, e.g.,
• Demanding too much, cyclic demands, …
• Storage adaptation
– Storage devices, controllers & network.
• Coordinated end to end control is hard!
• Formal models to understand impact of energy adaptation
Energy Adaptation in Data Centers
Adaptation Methods
• Workload Adaptation
– Coarse grain: Shut down low priority tasks
– Fine grain: Graceful QoS degradation, e.g.,
• Batched service, poorer resolution, …
• Infrastructure Adaptation
– Operation at lower speeds (DVFS)
– Effective use of low power modes & “width” control
• Workload adaptation always done first
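The "workload first" ordering can be sketched as a two-phase planner: QoS is degraded task by task until the power budget is met, and infrastructure adaptation (DVFS) is applied only if that is not enough. This is an illustrative sketch, not the method from the talk; the `Task` fields, the lowest-priority-first order and all wattages are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    priority: int          # larger = more important (illustrative convention)
    power_w: int           # power at full QoS (hypothetical)
    degraded_power_w: int  # power at reduced QoS (hypothetical)


def plan_adaptation(tasks, budget_w, dvfs_saving_w):
    """Return (actions, resulting power) for a given power budget."""
    power = {t.name: t.power_w for t in tasks}
    actions = []
    # Phase 1: workload adaptation, lowest priority first (assumed order).
    for t in sorted(tasks, key=lambda t: t.priority):
        if sum(power.values()) <= budget_w:
            break
        power[t.name] = t.degraded_power_w
        actions.append(f"degrade {t.name}")
    total = sum(power.values())
    # Phase 2: infrastructure adaptation only if phase 1 was not enough.
    if total > budget_w:
        total -= dvfs_saving_w
        actions.append("lower speed (DVFS)")
    return actions, total


tasks = [Task("batch", 1, 100, 40), Task("web", 2, 120, 90), Task("db", 3, 80, 70)]
mild = plan_adaptation(tasks, budget_w=250, dvfs_saving_w=20)
severe = plan_adaptation(tasks, budget_w=190, dvfs_saving_w=20)
print(mild)
print(severe)
```

A mild deficit is absorbed by degrading only the lowest-priority task, while a severe one exhausts workload adaptation and falls through to DVFS.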
Infrastructure Adaptation
• Need a multilevel scheme
– Individual “assets” up to entire data center
• Need both supply & demand side adaptations
Supply Side Adaptation
• Supply side Limits
– Hard caps at higher levels (true limit) vs. “soft” (artificial) caps at lower levels
– Limits may be a result of thermal/cooling issues
• Load consolidation
– An essential part of energy efficient operation
– Load consolidation vs. soft capping
• Need to address workload adaptation changes as a result of supply increase & decrease
Demand Side Adaptation
• Adaptation to fluctuating demand
– Transactional workload: Migrate queries or app VMs?
• Issues w/ combined supply & demand side adaptations
– Imbalance: One node squeezed while other has surplus power
– Ping-pong control: Oscillatory migration of workload
– Error accumulation down the hierarchy
A Proposed Algorithm
• Unidirectional control
– Load migration moves up the hierarchy, from local to global
– Local migrations are temporary & do not trigger changes to “soft” caps on supply
• Target Node selection
– Based on bin packing (best-fit decreasing)
– Allows for more imbalance, which can be exploited for workload consolidation
• Properties
– Avoids ping-pong, attempts to minimize imbalance
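Target node selection by best-fit decreasing can be sketched as follows. This is a generic version of the heuristic, not the algorithm's actual implementation: workloads are taken in decreasing order of power demand, and each goes to the node whose remaining power surplus is the tightest fit. The function name, dictionaries and wattages are hypothetical.

```python
def best_fit_decreasing(demands, surplus):
    """Place each demand on the node whose remaining surplus fits it most tightly.

    demands: {workload: power needed}, surplus: {node: spare power}.
    Returns (placement, remaining surplus per node).
    """
    remaining = dict(surplus)
    placement = {}
    # Largest demands first, so big items are placed while room still exists.
    for item, need in sorted(demands.items(), key=lambda kv: kv[1], reverse=True):
        fits = [(left - need, node) for node, left in remaining.items() if left >= need]
        if not fits:
            # No node at this level can host it; in a hierarchy this case
            # would be handed to the next level up (assumption).
            placement[item] = None
            continue
        _, node = min(fits)  # smallest leftover = best fit
        remaining[node] -= need
        placement[item] = node
    return placement, remaining


demands = {"app1": 30, "app2": 20, "app3": 10}  # hypothetical watts
surplus = {"s1": 35, "s2": 50, "s3": 12}        # hypothetical watts
placement, remaining = best_fit_decreasing(demands, surplus)
print(placement)
print(remaining)
```

Tight fits leave the largest surplus concentrated on a few nodes (here `s2` keeps 30 W spare), which is the kind of imbalance the slide says can be exploited for consolidation.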
Experimental Results
• Scenario
– 3 levels, 18 identical servers (4+4 + 5+5)
– 3 applications, total of 25 app instances
– Any app can run on any server
– Demand Poisson (active power ∝ utilization)
Migration Frequency
• Migration drivers: consolidation vs. energy deficiency
– Low util → Consolidation, High util → Energy deficiency
• Other characteristics
– Migration frequency low in all cases
– No ping-pong observed
Thermal Impacts
• Additional Issues
– Energy consumption limited by thermal/cooling issues, not energy availability
– Migrations required to limit temperature
• Temperature & power have nonlinear relationship
• Need to account for both power & thermal effects
Results w/ Thermal Effects
• Imbalanced cooling
– Servers 1-14: Ta = 25 °C, Servers 15-18: Ta = 40 °C
– Temperature limit: 65 °C
• Power demand is adjusted by the alg. to account for higher temperature
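To give a feel for how ambient temperature shrinks the usable power budget, the sketch below uses a deliberately simplified steady-state model, T = Ta + R·P. This linear model and the thermal resistance `R_C_PER_W` are assumptions made for illustration; the talk states that the real relationship is nonlinear, so this is not the model used by the algorithm.

```python
T_LIMIT_C = 65.0  # temperature limit from the slide
R_C_PER_W = 0.2   # HYPOTHETICAL thermal resistance in degrees C per watt


def thermal_power_cap(ambient_c: float) -> float:
    """Max sustained power (W) keeping T = Ta + R * P at or below the limit.

    ASSUMPTION: simplified linear steady-state model, for illustration only.
    """
    return max(0.0, (T_LIMIT_C - ambient_c) / R_C_PER_W)


cool_cap = thermal_power_cap(25.0)  # servers 1-14 in the scenario
hot_cap = thermal_power_cap(40.0)   # servers 15-18 in the scenario
print(f"cap at 25 C: {cool_cap:.0f} W, cap at 40 C: {hot_cap:.0f} W")
print(f"hot servers get {hot_cap / cool_cap:.1%} of the cool servers' budget")
```

Under this model the ratio of the two caps, (65 − 40) / (65 − 25) = 62.5%, does not depend on the assumed thermal resistance.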
Conclusions
• Need to go beyond energy efficiency
– Design devices/systems to minimize life-cycle energy footprint
– Creatively adapt to available energy to operate “at the edge”
• Ongoing/future work
– Coordinated server, network & storage mgmt.
– Explore tradeoffs between QoS, power savings and admission control performance
Thank you!
Power Inefficiencies
[Figure: power delivery path from the rack supply (280V) through the server PSU (±12, ±5V) and voltage regulators to the CPU, DRAM & memory controller, fans, storage and adapters. Efficiency labels shown: 90-95%, 70-90% and 95%. Annotations: wasted leakage & clock power; idle wasted power.]
Operating Regimes
[Figure: performance vs. relative power requirements (axis ticks 0 to 4.0), showing four regimes: energy deficient computing, energy adaptive computing, energy efficient computing and energy plenty computing.]
So, What’s the Problem?
• Local constraints & controls → end-to-end impacts
– DC to DC load shift
• Service disruption & post-shift impact
– Client request to alter content
• Less or more work for server
• Potential conflicting controls
[Figure: two clients connected through a network and a core network to data centers DC1 and DC2, each with a server (Server1, Server2) and storage.]