Transcript PAC 3.2: Using CFD Modeling to Predict Failure Mode Scenarios
Paul Bemis, President, Applied Math Modeling Inc.
Wednesday, April 22, 2015
Data Center World – Certified Vendor Neutral
Each presenter is required to certify that their presentation will be vendor-neutral. As an attendee you have a right to enforce this policy of having no sales pitch within a session by alerting the speaker if you feel the session is not being presented in a vendor-neutral fashion. If the issue continues to be a problem, please alert Data Center World staff after the session is complete.
Using CFD Modeling to Predict Failure Mode Scenarios
The decision to add new IT equipment to a data center is often made using a “rule of thumb” analysis to determine whether the data center can handle the additional load. But what happens when a cooling unit must be taken offline for service or repair? Modeling these scenarios before encountering them can save time and money.
Outline
• Introduce Terminology
• Review Traditional Raised Floor Room Return
  • Review effects of CRAC unit failure
  • Deploy Cold Aisle Containment
  • Deploy Hot Aisle Containment
• Review and Questions
PUE and DCiE
Power Usage Effectiveness (PUE)
PUE = Total Facility Energy / Total IT Energy
Interpretation of PUE
Typical    Good    Excellent
1.8        1.4     1.2
Data Center Infrastructure Efficiency (DCiE)
DCiE = (Total IT Energy / Total Facility Energy) x 100%
Interpretation of DCiE
Typical    Good    Excellent
55%        70%     85%
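The two definitions above can be sketched in a few lines of Python. The function names and the sample energy figures are illustrative assumptions, not values from the presentation; the only thing taken from the slide is the pair of formulas.

```python
def pue(total_facility_energy: float, total_it_energy: float) -> float:
    """PUE = Total Facility Energy / Total IT Energy (same units for both)."""
    if total_it_energy <= 0:
        raise ValueError("total_it_energy must be positive")
    return total_facility_energy / total_it_energy


def dcie_percent(total_facility_energy: float, total_it_energy: float) -> float:
    """DCiE = (Total IT Energy / Total Facility Energy) x 100%."""
    if total_facility_energy <= 0:
        raise ValueError("total_facility_energy must be positive")
    return total_it_energy / total_facility_energy * 100.0


# Hypothetical example: 1800 kWh drawn by the facility, 1000 kWh of it by IT gear.
example_pue = pue(1800.0, 1000.0)
example_dcie = dcie_percent(1800.0, 1000.0)
print(f"PUE  = {example_pue:.2f}")
print(f"DCiE = {example_dcie:.1f}%")
```

Because DCiE is the reciprocal of PUE expressed as a percentage, the two metrics carry the same information; a PUE of 1.8 corresponds to a DCiE of roughly 56%.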
Data Center Metrics
Return Temperature Index (RTI) = (Rack Flow Rate / Air Handler Flow Rate) x 100
or
Return Temperature Index (RTI) = (Air Handler Delta T / Rack Delta T) x 100
RTI is a measure of net by-pass air or net recirculation air in the data center
Interpretation of RTI
Interpretation            RTI Value
Balanced                  100%
Net Rack Recirculation    > 100%
Net By-Pass Airflow       < 100%
Return Temperature Index (RTI) is a Trademark of ANCIS Incorporated (www.ancis.us). All rights reserved. Used under authorization
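As a minimal sketch of the two RTI forms and the interpretation table, the following Python may help. The function names, the `tolerance` parameter used to decide what counts as "balanced", and the sample flow rates are assumptions made for illustration only.

```python
def rti_from_flow(rack_flow: float, air_handler_flow: float) -> float:
    """RTI (%) from total rack airflow and total air handler airflow."""
    return rack_flow / air_handler_flow * 100.0


def rti_from_delta_t(air_handler_delta_t: float, rack_delta_t: float) -> float:
    """RTI (%) from air handler temperature drop and rack temperature rise."""
    return air_handler_delta_t / rack_delta_t * 100.0


def interpret_rti(rti: float, tolerance: float = 0.5) -> str:
    """Map an RTI value to the interpretation table.

    `tolerance` (in percentage points) is a hypothetical band around 100%.
    """
    if abs(rti - 100.0) <= tolerance:
        return "Balanced"
    if rti > 100.0:
        return "Net Rack Recirculation"
    return "Net By-Pass Airflow"


# Hypothetical flows: racks pull 8,000 CFM while the air handlers deliver 10,000 CFM.
rti = rti_from_flow(8000.0, 10000.0)
print(f"RTI = {rti:.1f}% -> {interpret_rti(rti)}")
```

The two forms agree when the heat picked up across the racks equals the heat removed by the air handlers, since airflow and temperature difference are then inversely proportional.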
Data Center Metrics
Rack Cooling Index (RCI HI) = (1 - Total Over-Temp / Max Allowable Over-Temp) x 100
Rack Cooling Index (RCI LO) = (1 - Total Under-Temp / Max Allowable Under-Temp) x 100
ASHRAE Specifications:
Recommended    64.4 – 80.6 F
Allowable      59 – 89.6 F
RCI is a measure of compliance with the ASHRAE thermal guideline
Interpretation of RCI
Interpretation    RCI Value
Ideal             100%
Good              95% to 100%
Acceptable        90% to 95%
Poor              < 90%
Rack Cooling Index (RCI) is a Registered Trademark of ANCIS Incorporated (www.ancis.us). All rights reserved. Used under authorization
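The slide does not spell out how the over- and under-temperature totals are formed. The sketch below assumes the commonly used definition: "Total Over-Temp" is the sum, over all rack intakes, of degrees above the recommended maximum, and "Max Allowable Over-Temp" is the gap between the allowable and recommended maxima multiplied by the number of intakes (and symmetrically for the low side). The function names and sample temperatures are hypothetical.

```python
from typing import Sequence

# ASHRAE limits quoted on the slide, in degrees Fahrenheit.
REC_MIN_F, REC_MAX_F = 64.4, 80.6
ALLOW_MIN_F, ALLOW_MAX_F = 59.0, 89.6


def rci_hi(inlet_temps_f: Sequence[float]) -> float:
    """RCI HI (%), assuming over-temperature is summed across all intakes."""
    if not inlet_temps_f:
        raise ValueError("need at least one inlet temperature")
    total_over = sum(max(0.0, t - REC_MAX_F) for t in inlet_temps_f)
    max_allowable_over = (ALLOW_MAX_F - REC_MAX_F) * len(inlet_temps_f)
    return (1.0 - total_over / max_allowable_over) * 100.0


def rci_lo(inlet_temps_f: Sequence[float]) -> float:
    """RCI LO (%), assuming under-temperature is summed across all intakes."""
    if not inlet_temps_f:
        raise ValueError("need at least one inlet temperature")
    total_under = sum(max(0.0, REC_MIN_F - t) for t in inlet_temps_f)
    max_allowable_under = (REC_MIN_F - ALLOW_MIN_F) * len(inlet_temps_f)
    return (1.0 - total_under / max_allowable_under) * 100.0


# Hypothetical intake temperatures for a heavily overcooled room.
overcooled = [50.0, 50.0]
print(f"RCI HI = {rci_hi(overcooled):.1f}%")
print(f"RCI LO = {rci_lo(overcooled):.1f}%")
```

Under this definition the index is not bounded below by zero: when intakes sit well beneath the allowable minimum, the total under-temperature exceeds the maximum allowable value and RCI LO goes negative.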
Data Center Case Summary
• Classical raised floor room, “downflow” CRAC design
• 6200 sq ft data center, 18 inch plenum
• Total Cooling Capacity: 8 CRACs, 630 kW capacity (180 tons)
  • 2 - 105 kW (30 Ton) units
  • 6 - 70 kW (20 Ton) units
• Total IT load: 529 kW
• Should have enough extra cooling capacity to qualify as an N+1 design.
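A quick nameplate check of the N+1 claim can be worked from the numbers in the case summary. This is only arithmetic on rated capacities; the helper name is an assumption, and the check says nothing about how the remaining air is distributed, which is what the CFD runs address.

```python
# Nameplate capacities from the case summary, in kW: two 105 kW and six 70 kW units.
crac_capacities_kw = [105, 105, 70, 70, 70, 70, 70, 70]
it_load_kw = 529


def capacity_with_one_unit_off(capacities_kw, failed_index):
    """Total nameplate capacity remaining when one unit is switched off."""
    return sum(capacities_kw) - capacities_kw[failed_index]


total_kw = sum(crac_capacities_kw)

# Margin over the IT load when one unit of each size is off.
margins_kw = {
    size: capacity_with_one_unit_off(
        crac_capacities_kw, crac_capacities_kw.index(size)
    ) - it_load_kw
    for size in sorted(set(crac_capacities_kw))
}

print(total_kw)    # 630
print(margins_kw)  # {70: 31, 105: -4}
```

On rated capacity alone, losing a 70 kW unit leaves 31 kW of margin, while losing a 105 kW unit leaves the room about 4 kW short of the 529 kW load. Which unit fails therefore matters even before airflow distribution is considered.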
Base Case
Data Center Metrics
Data Center Energy Report
Parameter                                             Value
COP (Coefficient of Performance) (User-Specified)     1.25
PUE (Power Usage Effectiveness)                       1.8
DCIE (Data Center Infrastructure Efficiency) (%)      56
RTI (Return Temperature Index) (%)                    84.34
RCI_HI (Rack Cooling Index - High) (%)                99.96
RCI_LO (Rack Cooling Index - Low) (%)                 < 0
Total Facility Power (kW)                             954
Annual Operating Cost ($)                             835,704
Operating with too much airflow and at too low a supply air temperature
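The figures in the energy report can be cross-checked against each other. The sketch below is an inference from the reported numbers, not something the presentation states: it assumes continuous operation (8,760 hours per year) to back out an electricity rate, and it assumes cooling is the only non-IT overhead when relating COP to PUE.

```python
HOURS_PER_YEAR = 8760  # assumes the facility runs continuously

# Values copied from the energy report.
cop = 1.25
reported_pue = 1.8
total_facility_power_kw = 954.0
annual_operating_cost_usd = 835_704.0

# Electricity rate implied by the reported annual cost (an inference).
annual_energy_kwh = total_facility_power_kw * HOURS_PER_YEAR
implied_rate_usd_per_kwh = annual_operating_cost_usd / annual_energy_kwh

# IT power and DCiE implied by the reported PUE.
implied_it_power_kw = total_facility_power_kw / reported_pue
dcie_from_pue = 100.0 / reported_pue

# PUE if cooling power (IT load / COP) were the only overhead (an assumption).
pue_from_cop = 1.0 + 1.0 / cop

print(f"Implied rate:     ${implied_rate_usd_per_kwh:.2f}/kWh")
print(f"Implied IT power: {implied_it_power_kw:.0f} kW")
print(f"DCiE from PUE:    {dcie_from_pue:.0f}%")
print(f"PUE from COP:     {pue_from_cop:.2f}")
```

Under these assumptions the report is self-consistent: the cost corresponds to $0.10 per kWh, the implied IT power of about 530 kW is close to the 529 kW stated in the case summary, and both the PUE and DCIE values follow from the user-specified COP.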
Base Model Rack Inlet Temperatures
Bypass Airflow
Air Supply From Perforated Tiles
Air Supply From Perforated Tiles
High Velocity Air Has Low Static Pressure and High Momentum
Cooling Unit Failure Analysis
• Switch off each CRAC and run a CFD analysis to determine the subsequent impact on rack inlet temperatures.
• Results will vary depending on many things including CRAC airflow and geometric placement in the room.
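The one-unit-at-a-time procedure described above can be sketched as a simple loop. Everything here is a hypothetical stand-in: `run_cfd` represents whatever CFD tool is in use, `toy_solver` returns made-up temperatures purely so the loop can execute, and the unit names A, B, and C do not correspond to any CRAC in the case study.

```python
from typing import Callable, Dict, List

REC_MAX_F = 80.6  # ASHRAE recommended upper inlet limit quoted in the deck


def failure_sweep(
    crac_ids: List[str],
    run_cfd: Callable[[List[str]], List[float]],
) -> Dict[str, dict]:
    """Switch off each CRAC in turn and summarize rack inlet temperatures.

    `run_cfd` is a hypothetical callable that takes the list of active units
    and returns predicted rack inlet temperatures in degrees Fahrenheit.
    """
    results = {}
    for failed in crac_ids:
        active = [c for c in crac_ids if c != failed]
        inlet_temps = run_cfd(active)
        results[failed] = {
            "max_inlet_f": max(inlet_temps),
            "racks_over_recommended": sum(t > REC_MAX_F for t in inlet_temps),
        }
    return results


def toy_solver(active_cracs: List[str]) -> List[float]:
    """Placeholder for a real CFD run; the numbers are arbitrary."""
    served_by = {"rack1": "A", "rack2": "B", "rack3": "B"}
    return [72.0 if crac in active_cracs else 85.0 for crac in served_by.values()]


sweep = failure_sweep(["A", "B", "C"], toy_solver)
for failed, summary in sweep.items():
    print(failed, summary)
```

Collecting one summary per failed unit makes it easy to rank which outages are most harmful, which is the comparison the example cases that follow make visually.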
Example Case 1: CFA, CRAC 45 off
CRAC 45
Data Center Metrics
RTI Improved, but RCI declined
Amount of air into room is now better, but distribution is poor
Example Case 2: CFA, CRAC 10 off
CRAC 10
Racks out of ASHRAE compliance in other areas of room
Data Center Metrics
Amount of Air into Room is Perfect, but SAT (Supply Air Temperature) too low
Example Case 3: CFA, CRAC 9 off
Data Center Metrics
How Does Containment Affect CRAC Failure?
• Even with hot or cold aisle containment, CRAC failure can cause racks to fall out of compliance
• Hot aisle containment generally provides better “assurances” against airflow issues.
  • Room can act as a general reservoir of cool air
  • Also provides improved “human” conditions
• The investment in a ceiling plenum return is well worth it for the longer term.
Cold Aisle Containment: CFA, CRAC 10 off, Partially Contained
Contained Aisles
Cold Aisle Case 2: CFA, CRAC 10 off, Partially Contained
Pressure Loss From Open Tiles Causes Airflow Issues
CRAC 10 Off: Rack Inlets Still Out of Specification
CRAC 10
Cold Aisle Contained: CFA, CRAC 10 off, Fully Contained
Added Containment
Revised Design: Full Cold Aisle Containment CRAC 10 off
Racks in Compliance, but SAT still too low
Raising Hot Air Return Temperatures Increases Cooling Capacity
But in Cold Aisle Contained Designs This Becomes Difficult
Resulting Room Ambient Temperature
Remaining Room Becomes the Hot Aisle
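The link between return temperature and cooling capacity follows from the air-side sensible heat balance, Q = density x specific heat x airflow x (return temperature - supply temperature). The sketch below uses typical room-condition air properties and hypothetical airflow and temperature values; it ignores coil and refrigerant or chilled-water limits, so it illustrates the trend rather than any particular unit's rating.

```python
AIR_DENSITY_KG_M3 = 1.2    # approximate density of air at room conditions
AIR_CP_KJ_PER_KG_K = 1.005  # approximate specific heat of air


def sensible_cooling_kw(
    airflow_m3_per_s: float, return_temp_c: float, supply_temp_c: float
) -> float:
    """Sensible heat removed from the air stream, in kW."""
    delta_t = return_temp_c - supply_temp_c
    return AIR_DENSITY_KG_M3 * AIR_CP_KJ_PER_KG_K * airflow_m3_per_s * delta_t


# Hypothetical unit moving 5 m^3/s with an 18 C supply temperature.
mixed_return_kw = sensible_cooling_kw(5.0, return_temp_c=28.0, supply_temp_c=18.0)
hot_return_kw = sensible_cooling_kw(5.0, return_temp_c=33.0, supply_temp_c=18.0)

print(f"Return at 28 C: {mixed_return_kw:.1f} kW")
print(f"Return at 33 C: {hot_return_kw:.1f} kW")
```

At the same airflow, the warmer return removes about 90 kW instead of about 60 kW. This is why designs that keep hot exhaust from mixing with room air on its way back to the cooling units get more out of the same equipment.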
Hot Aisle Contained Design
Hot Aisle Design: CRAC 10 off
Hot Aisle Contained: CRAC 10 off
Hot Aisle Contained: CRAC 10 off
Data Center Metrics
RTI and RCI HI are Optimal; Now SAT Can Be Increased
Summary
• Although the overall design might be “N+1”, until some models are run, it is unclear what will happen in a failure condition.
• The health of the servers will vary depending on which CRAC is down.
• Failure mode plans may need to be developed and deployed.
• Having a model of your data center is an important tool for predicting the conditions of a failure or service condition.
Summary
• CRAC Failure Analysis is important to run when performing modifications to existing designs, as well as new designs
• Often a design that is optimized for normal operation will perform poorly in a CRAC failure mode scenario
• It is important to understand these conditions before they occur so appropriate action can be taken prior to failure mode.
Questions Regarding This Presentation and/or Modeling Methods
Paul Bemis [email protected]
Thank you