CoolAir: Temperature- and Variation-Aware Management for Free-Cooled Datacenters
Íñigo Goiri, Thu D. Nguyen, and Ricardo Bianchini
Hybrid datacenter: typical + free cooling
[Figure: typical cooling with a cooling tower, water chiller, and air handling unit serving the server racks; free cooling with outside air, filters, an evaporative cooler, and fans serving the server racks. Example: Microsoft datacenter in Chicago]
Free cooling limitations
• Potentially negative impact on hardware reliability, especially disks
  • High temperature
  • Wide temperature variation
  • High humidity
[Figure: outside, inlet, and disk temperatures over time]
Outside temperature directly impacts inlet and disk temperatures
Daily temperature variation can be large
Roadmap
• Motivation and background
• CoolAir: Managing free-cooled datacenters
  • Cooling modeling
  • Cooling management
  • Compute management
• CoolAir for Parasol
• Evaluation and general lessons
• Conclusions
CoolAir: Managing free-cooled datacenters
• Energy-aware management of cooling and workload
• Minimize hardware reliability issues
  • Limit temperature and relative humidity
  • Reduce temperature variation
• Major tasks
  1. Predict conditions and energy
  2. Select best cooling settings
  3. Apply cooling settings
  4. Place and schedule load
[Figure: CoolAir architecture. CoolAir comprises a Cooling Modeler, a Cooling Manager, and a Compute Manager; it takes a weather forecast as input and controls the datacenter's cooling and servers]
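The four major tasks can be read as one periodic control step. The sketch below is a minimal illustration under our own assumptions, not CoolAir's implementation: `control_step`, `Prediction`, the setting names, and the callback parameters are hypothetical, and the utility function is supplied by the caller.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Prediction:
    """Hypothetical container for what the cooling model predicts."""
    inlet_temp_c: float
    humidity_pct: float
    cooling_power_w: float


def control_step(
    settings: List[str],
    predict: Callable[[str], Prediction],
    utility: Callable[[Prediction], float],
    apply_setting: Callable[[str], None],
    place_load: Callable[[], None],
) -> str:
    """Run one iteration of the four major tasks and return the chosen setting."""
    # 1. Predict conditions and energy for every candidate cooling setting.
    predictions: Dict[str, Prediction] = {s: predict(s) for s in settings}
    # 2. Select the setting whose predicted outcome has the highest utility.
    best = max(settings, key=lambda s: utility(predictions[s]))
    # 3. Apply the chosen cooling setting to the datacenter.
    apply_setting(best)
    # 4. Place and schedule the load on the servers.
    place_load()
    return best
```

For example, with a utility of `lambda p: -p.cooling_power_w`, the step picks whichever setting the model predicts will draw the least cooling power; a realistic utility would also weigh temperature and humidity.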
Cooling modeling
• Predictions based on linear regression models
  • Temperature inside: temperature outside, location in the datacenter, datacenter utilization, cooling setting
  • Humidity inside: temperature inside/outside, humidity outside, cooling setting
  • Cooling power: cooling setting, cooling operation
[Figure: the Cooling Learner builds the Cooling Model from the datacenter's historic data]
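A linear regression of this kind can be fit with ordinary least squares. The sketch below is an illustration, not CoolAir's learner: it fits only the inside-temperature model, encodes "location in the datacenter" as a hypothetical numeric pod position and the cooling setting as a fan speed in percent, and solves the normal equations in pure Python to stay dependency-free.

```python
from typing import List, Sequence


def fit_linear_model(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares via the normal equations (X^T X) w = X^T y.

    A bias column of ones is prepended, so w[0] is the intercept and
    w[1:] are the per-feature coefficients.
    """
    rows = [[1.0, *map(float, x)] for x in X]
    n = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * float(t) for r, t in zip(rows, y))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("features are linearly dependent; add more varied samples")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


def predict(w: Sequence[float], x: Sequence[float]) -> float:
    """Evaluate the fitted model: intercept plus weighted features."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
```

Usage: `w = fit_linear_model(samples, inside_temps)`, then `predict(w, (outside_c, pod_position, utilization, fan_pct))`. The humidity and cooling-power models could be fit the same way with their own inputs.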
Cooling management
• Use predictions from the cooling model
• Reduce variation with a temperature band based on the expected outside temperature
  • Middle of the band: forecast outside temperature + offset
  • Maintain temperature within the band
• Periodically
  • Predict environmental conditions and energy
  • Select the best settings using a utility function
  • Apply cooling settings
[Figure: band selection example. Temperature vs. hour of day (0 to 24), showing the outside temperature forecast, its average, and the band placed at an offset from the average]
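The band and the utility-based selection can be sketched as follows. This is a minimal sketch under stated assumptions: the band width, the offset, the humidity limit, and the form of `utility` (penalize degrees outside the band and humidity above a limit, then subtract weighted cooling power) are hypothetical choices for illustration, not values or formulas from CoolAir. Centering on the forecast average follows the band selection figure.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, Sequence, Tuple

Band = Tuple[float, float]


@dataclass
class Outcome:
    """Hypothetical predicted outcome of one cooling setting."""
    inlet_temp_c: float
    humidity_pct: float
    cooling_power_w: float


def select_band(forecast_c: Sequence[float], offset_c: float, width_c: float) -> Band:
    """Center the band on the average forecast outside temperature plus an offset."""
    middle = mean(forecast_c) + offset_c
    return middle - width_c / 2, middle + width_c / 2


def utility(o: Outcome, band: Band, max_humidity_pct: float, energy_weight: float) -> float:
    """Higher is better: penalize leaving the band, excess humidity, and cooling power."""
    low, high = band
    band_violation = max(0.0, low - o.inlet_temp_c) + max(0.0, o.inlet_temp_c - high)
    humidity_violation = max(0.0, o.humidity_pct - max_humidity_pct)
    return -(band_violation + humidity_violation) - energy_weight * o.cooling_power_w


def best_setting(predicted: Dict[str, Outcome], band: Band,
                 max_humidity_pct: float = 80.0, energy_weight: float = 0.001) -> str:
    """Pick the cooling setting whose predicted outcome has the highest utility."""
    return max(predicted,
               key=lambda s: utility(predicted[s], band, max_humidity_pct, energy_weight))
```

Calling `best_setting` on fresh predictions each period corresponds to the "Periodically" steps; in this sketch the band is recomputed only when a new forecast arrives.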
Compute management
• Spatial placement
  • Distribute load to servers
  • Group servers into "pods" of similar behavior
    • Reduces solving and modeling complexity
  • Favor pods with higher heat recirculation
    • Against common practice in non-free-cooled DCs
    • Lower-recirculation pods are closer to the cooling and therefore see more temperature variation
• Temporal scheduling
  • When to execute deferrable loads (see paper)
[Figure: front view of Parasol's racks (Rack 1 and Rack 2), showing servers, sensors, and pods]
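The spatial-placement rule, favoring pods with higher heat recirculation, can be sketched as a greedy fill. This is an illustrative sketch, not CoolAir's solver: `Pod`, its `recirculation` score (assumed to be measured or modeled per pod), and the greedy order are our assumptions. The point it shows is only that load lands first on high-recirculation pods, which sit farther from the cooling and see less temperature variation.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Pod:
    name: str
    servers: int
    recirculation: float  # hypothetical score: higher means more hot-air recirculation


def place_load(pods: List[Pod], servers_needed: int) -> Dict[str, int]:
    """Greedily activate servers in pods ordered by decreasing recirculation.

    Returns the number of active servers per pod; pods not in the result stay idle.
    """
    plan: Dict[str, int] = {}
    remaining = servers_needed
    for pod in sorted(pods, key=lambda p: p.recirculation, reverse=True):
        if remaining <= 0:
            break
        take = min(pod.servers, remaining)
        plan[pod.name] = take
        remaining -= take
    if remaining > 0:
        available = servers_needed - remaining
        raise ValueError(f"need {servers_needed} servers but only {available} exist")
    return plan
```

Working at pod granularity means sorting 8 pods rather than 64 servers in a Parasol-sized setup, which mirrors the slide's point about reducing solving complexity.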
Roadmap
• Motivation and background
• CoolAir: Managing free-cooled datacenters
• CoolAir for Parasol
• Evaluation and general lessons
• Conclusions
Case study: Parasol
• Default cooling controller:
  • Outside temperature ≤ 30°C → free cooling with variable fan speed
  • Outside temperature > 30°C → AC cycling with hysteresis
[Figure: external view of Parasol (air conditioner, free cooling unit, cooling controller, relays, door, partition) and internal layout, top view (Rack 1 and Rack 2 with cold aisle, hot aisle, air duct, and exhaust)]
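The default controller's logic can be sketched as below. Only the 30°C threshold comes from the slide; the setpoint, the hysteresis width, the minimum fan speed, the fan ramp, and turning the free-cooling fan off while the AC runs are hypothetical values chosen for illustration, not Parasol's actual parameters.

```python
from typing import Tuple

FREE_COOLING_LIMIT_C = 30.0  # from the slide: free cooling up to 30°C outside


def default_controller(outside_c: float, inside_c: float, ac_on: bool,
                       setpoint_c: float = 27.0, hysteresis_c: float = 2.0,
                       min_fan_pct: float = 15.0) -> Tuple[str, float, bool]:
    """Return (mode, fan_speed_pct, ac_on) for one control period.

    setpoint_c, hysteresis_c, min_fan_pct and the fan ramp are hypothetical.
    """
    if outside_c <= FREE_COOLING_LIMIT_C:
        # Free cooling: ramp the fan by 20 points per degree above the setpoint.
        excess_c = inside_c - setpoint_c
        fan_pct = min(100.0, max(min_fan_pct, min_fan_pct + 20.0 * excess_c))
        return "free_cooling", fan_pct, False
    # Too hot outside: cycle the AC inside a hysteresis band around the setpoint.
    if inside_c >= setpoint_c + hysteresis_c / 2:
        ac_on = True
    elif inside_c <= setpoint_c - hysteresis_c / 2:
        ac_on = False
    # Between the two thresholds the previous AC state is kept, avoiding rapid cycling.
    return "ac", 0.0, ac_on
```

The hysteresis is what makes this controller reactive: the inside temperature swings between the two thresholds rather than being held near a band chosen ahead of time.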
CoolAir for Parasol
• Data collection and model learning
  • Historical sensor data for two months
  • Generated extreme settings to learn faster
  • Accuracy: >90% of predictions with <0.5°C error
• Cooling configurer
  • Interface with Parasol's "thermostat"
  • Control fan speed and AC
• Compute configurer for Hadoop
  • Send idle worker nodes to sleep while keeping data available
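The accuracy figure can be computed as the fraction of predictions whose absolute error is below a tolerance. A minimal sketch, assuming the figure counts individual temperature predictions against sensor readings; the function name and the input format are hypothetical.

```python
from typing import Sequence


def fraction_within(predicted_c: Sequence[float], measured_c: Sequence[float],
                    tolerance_c: float = 0.5) -> float:
    """Fraction of predictions whose absolute error is below tolerance_c."""
    if len(predicted_c) != len(measured_c) or not predicted_c:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(1 for p, m in zip(predicted_c, measured_c) if abs(p - m) < tolerance_c)
    return hits / len(predicted_c)
```

A model meets the ">90% with <0.5°C error" bar when `fraction_within(predicted, measured) > 0.9`.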
Example of CoolAir on Parasol
Evaluation methodology
• Parasol as the baseline system
  • 64 Atom servers: 8 pods in 2 racks
• Hadoop workloads: non-deferrable Facebook workload (see paper for others)
• Real experiments and validated simulations
• Evaluated policies (see paper for others):
Policy     Temperature       Humidity   Energy   Spatial placement
Baseline   Reactive <30°C    ✔          ✔        ✘
CoolAir    Adaptive band     ✔          ✔        High recirculation
Baseline vs CoolAir
[Figure: Baseline vs. CoolAir in Newark, Chad, Santiago, Iceland, and Singapore. Panels: maximum daily temperature variation (temperature range, °C), absolute temperature violation (°C), and power efficiency (PUE, Power Usage Effectiveness). Annotations: "Up to 4°C reduction", "Up to 60% reduction", "Warmer locations are more inefficient"]
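The two temperature metrics in this comparison can be computed from a sensor trace roughly as follows. This is a sketch under assumptions: we take "maximum daily temperature variation" as the largest (max minus min) range observed within any single day, and "absolute temperature violation" as how far the hottest reading exceeds a limit; the paper may define both more precisely, and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import date, datetime
from typing import Dict, Iterable, List, Tuple


def max_daily_range(samples: Iterable[Tuple[datetime, float]]) -> float:
    """Largest (max - min) temperature range seen within a single calendar day."""
    per_day: Dict[date, List[float]] = defaultdict(list)
    for ts, temp_c in samples:
        per_day[ts.date()].append(temp_c)
    if not per_day:
        raise ValueError("no samples")
    return max(max(temps) - min(temps) for temps in per_day.values())


def max_violation(temps_c: Iterable[float], limit_c: float) -> float:
    """How far the hottest reading exceeds limit_c (0.0 if it never does)."""
    return max(0.0, max(temps_c) - limit_c)
```

Grouping by calendar day matters: a trace that drifts slowly over a week can have a large overall range but a small maximum daily range.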
Multiple geographical locations
[Figure: power efficiency (PUE) improvement and reduction in maximum temperature range across locations]
• Improves PUE in warmer locations, where PUE is worse
• Reduces variation the most in colder locations, where variation is highest
• Sacrifices PUE slightly in cold locations (PUE improvement of -0.02 to -0.01)
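PUE is total facility energy divided by IT equipment energy, so 1.0 is ideal and lower is better. The sketch below assumes, as our own reading of the figure, that "PUE improvement" means baseline PUE minus CoolAir PUE, so a negative value such as -0.01 or -0.02 means CoolAir spent slightly more overhead energy per unit of IT energy. The function names are hypothetical.

```python
def pue(it_kwh: float, cooling_kwh: float, other_overhead_kwh: float = 0.0) -> float:
    """Power Usage Effectiveness: total facility energy / IT equipment energy."""
    if it_kwh <= 0:
        raise ValueError("IT energy must be positive")
    return (it_kwh + cooling_kwh + other_overhead_kwh) / it_kwh


def pue_improvement(baseline_pue: float, coolair_pue: float) -> float:
    """Assumed sign convention: positive means CoolAir is more efficient."""
    return baseline_pue - coolair_pue
```

For instance, 100 kWh of IT energy plus 20 kWh of cooling gives a PUE of 1.2, and going from a baseline of 1.10 to 1.12 is an improvement of -0.02 under this convention.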
Principles and lessons learned
• Variation management requires fine-grain cooling and load control
• Management challenges depend on the climate
  • Warm: managing absolute temperature costs more than managing variation
  • Cold: managing temperature variation is more critical and more successful
• Temperature band and spatial placement are key; temporal scheduling is not
• Other lessons in the paper
Conclusions
• CoolAir successfully manages
  • Absolute temperature and temperature variation
  • Relative humidity
  • Energy
• CoolAir broadens the set of areas where free cooling can be used
• Principles should apply to larger datacenters