PAC 3.2: Using CFD Modeling to Predict Failure Mode Scenarios

Paul Bemis, President, Applied Math Modeling Inc.
Wednesday, April 22, 2015

Data Center World – Certified Vendor Neutral

Each presenter is required to certify that their presentation will be vendor-neutral. As an attendee, you have the right to enforce this no-sales-pitch policy by alerting the speaker if you feel the session is not being presented in a vendor-neutral fashion. If the issue continues, please alert Data Center World staff after the session is complete.


Using CFD Modeling to Predict Failure Mode Scenarios

The decision to add new IT equipment to a data center is often made using a rule-of-thumb analysis to determine whether the data center can handle the additional load. But what happens when a cooling unit must be taken offline for service or repair? Using modeling techniques to predict these scenarios before encountering them can save time and money.


Outline

• Introduce terminology
• Review traditional raised-floor room return
• Review effects of CRAC unit failure
• Deploy cold aisle containment
• Deploy hot aisle containment
• Review and questions

PUE and DCiE

Power Usage Effectiveness (PUE)

$$\mathrm{PUE} = \frac{\text{Total Facility Energy}}{\text{Total IT Energy}}$$

Interpretation of PUE
Typical: 1.8
Good: 1.4
Excellent: 1.2

Data Center Infrastructure Efficiency (DCiE)

$$\mathrm{DCiE} = \frac{\text{Total IT Energy}}{\text{Total Facility Energy}} \times 100\%$$

Interpretation of DCiE
Typical: 55%
Good: 70%
Excellent: 85%
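The two ratios are reciprocals of each other, so DCiE = 100% / PUE. A minimal Python sketch of both follows, assuming power readings in kW averaged over the same period; the function names and the band edges used in `pue_rating` are illustrative assumptions, not part of the slide.

```python
def pue(total_facility_kw: float, it_kw: float) -> float:
    """Power Usage Effectiveness: total facility power divided by IT power."""
    if it_kw <= 0:
        raise ValueError("IT power must be positive")
    return total_facility_kw / it_kw


def dcie(total_facility_kw: float, it_kw: float) -> float:
    """Data Center Infrastructure Efficiency in percent (equals 100 / PUE)."""
    if total_facility_kw <= 0:
        raise ValueError("facility power must be positive")
    return 100.0 * it_kw / total_facility_kw


def pue_rating(value: float) -> str:
    """Map a PUE onto the slide's bands (the exact band edges are an assumption)."""
    if value <= 1.2:
        return "Excellent"
    if value <= 1.4:
        return "Good"
    if value <= 1.8:
        return "Typical"
    return "Worse than typical"


if __name__ == "__main__":
    # Base case figures from this session: 954 kW facility power, 529 kW IT load.
    print(f"PUE  = {pue(954, 529):.2f}")
    print(f"DCiE = {dcie(954, 529):.1f}%")
```

With the base case figures, the PUE comes out at about 1.80, in line with the energy report later in the session.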

Data Center Metrics

Return Temperature Index (RTI)

$$\mathrm{RTI} = \frac{\text{Rack Flow Rate}}{\text{Air Handler Flow Rate}} \times 100\%$$

or

$$\mathrm{RTI} = \frac{\Delta T_{\text{Air Handler}}}{\Delta T_{\text{Rack}}} \times 100\%$$

RTI is a measure of net bypass air or net recirculation air in the data center.

Interpretation of RTI
Balanced: 100%
Net rack recirculation: > 100%
Net bypass airflow: < 100%

Return Temperature Index (RTI) is a trademark of ANCIS Incorporated (www.ancis.us). All rights reserved. Used under authorization.
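Both forms of RTI reduce to simple ratios. The sketch below assumes the temperatures are flow-weighted averages across all air handlers and all racks; the function names and the optional `tolerance` dead band in `interpret_rti` are hypothetical, since the slide treats exactly 100% as balanced.

```python
def rti_from_flows(rack_flow_cfm: float, air_handler_flow_cfm: float) -> float:
    """RTI (%) = total rack airflow / total air-handler airflow x 100."""
    if air_handler_flow_cfm <= 0:
        raise ValueError("air-handler flow must be positive")
    return 100.0 * rack_flow_cfm / air_handler_flow_cfm


def rti_from_temperatures(supply_f: float, return_f: float,
                          rack_inlet_f: float, rack_exhaust_f: float) -> float:
    """RTI (%) = air-handler delta T / rack delta T x 100 (flow-weighted averages)."""
    air_handler_dt = return_f - supply_f
    rack_dt = rack_exhaust_f - rack_inlet_f
    if rack_dt <= 0:
        raise ValueError("rack exhaust must be warmer than rack inlet")
    return 100.0 * air_handler_dt / rack_dt


def interpret_rti(rti_percent: float, tolerance: float = 0.0) -> str:
    """Classify RTI; `tolerance` (percentage points) is a hypothetical dead band."""
    if abs(rti_percent - 100.0) <= tolerance:
        return "balanced"
    if rti_percent > 100.0:
        return "net rack recirculation"
    return "net bypass airflow"


if __name__ == "__main__":
    print(interpret_rti(84.34))  # base case RTI from the energy report
```

Applied to the base case RTI of 84.34%, the classification is net bypass airflow, which matches the "too much airflow" finding for that case.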

Data Center Metrics

Rack Cooling Index (RCI)

$$\mathrm{RCI}_{\mathrm{HI}} = \left(1 - \frac{\text{Total Over-Temp}}{\text{Max Allowable Over-Temp}}\right) \times 100\%$$

$$\mathrm{RCI}_{\mathrm{LO}} = \left(1 - \frac{\text{Total Under-Temp}}{\text{Max Allowable Under-Temp}}\right) \times 100\%$$

RCI is a measure of compliance with the ASHRAE thermal guideline.

ASHRAE specifications (rack inlet temperature):
Recommended: 64.4 to 80.6 °F
Allowable: 59 to 89.6 °F

Interpretation of RCI
Ideal: 100%
Good: 95% to 100%
Acceptable: 90% to 95%
Poor: < 90%

Rack Cooling Index (RCI) is a registered trademark of ANCIS Incorporated (www.ancis.us). All rights reserved. Used under authorization.
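A minimal sketch of both RCI values, assuming "Max Allowable Over-Temp" means (allowable maximum minus recommended maximum) times the number of rack intakes, and likewise for the low side. That is our reading of the ANCIS definition and should be checked against its documentation. It also shows why RCI_LO can go negative: the total under-temperature can exceed the allowable budget.

```python
from typing import Sequence

# Rack-inlet limits quoted on the slide (degrees F).
REC_MIN_F, REC_MAX_F = 64.4, 80.6      # ASHRAE recommended range
ALLOW_MIN_F, ALLOW_MAX_F = 59.0, 89.6  # ASHRAE allowable range


def rci_hi(inlet_temps_f: Sequence[float]) -> float:
    """RCI_HI (%): 100 means no rack inlet is above the recommended maximum."""
    if not inlet_temps_f:
        raise ValueError("need at least one rack inlet temperature")
    total_over = sum(t - REC_MAX_F for t in inlet_temps_f if t > REC_MAX_F)
    max_allowable_over = (ALLOW_MAX_F - REC_MAX_F) * len(inlet_temps_f)
    return (1.0 - total_over / max_allowable_over) * 100.0


def rci_lo(inlet_temps_f: Sequence[float]) -> float:
    """RCI_LO (%): 100 means no rack inlet is below the recommended minimum."""
    if not inlet_temps_f:
        raise ValueError("need at least one rack inlet temperature")
    total_under = sum(REC_MIN_F - t for t in inlet_temps_f if t < REC_MIN_F)
    max_allowable_under = (REC_MIN_F - ALLOW_MIN_F) * len(inlet_temps_f)
    return (1.0 - total_under / max_allowable_under) * 100.0


if __name__ == "__main__":
    overcooled = [55.0] * 4  # illustrative inlets, all well below 64.4 F
    print(f"RCI_HI = {rci_hi(overcooled):.1f}%, RCI_LO = {rci_lo(overcooled):.1f}%")
```

Inlets that are uniformly too cold leave RCI_HI at 100% but drive RCI_LO below zero, the same pattern the base case report shows.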

Data Center Case Summary
• Classic raised-floor room, “downflow” CRAC design
• 6,200 sq ft data center, 18 inch plenum
• Total cooling capacity: 8 CRACs, 630 kW (180 tons)
  - 2 × 105 kW (30 ton) units
  - 6 × 70 kW (20 ton) units
• Total IT load: 529 kW
• Should have enough extra cooling capacity to qualify as an N+1 design.
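A nameplate-only N+1 check is easy to script. This sketch assumes the cooling load equals the IT load and that every unit delivers its full nameplate capacity regardless of its location. It says nothing about airflow distribution, which is what the CFD failure runs address. The function name is hypothetical.

```python
# Nameplate capacities from the case summary (kW): 2 x 30-ton and 6 x 20-ton units.
CRAC_CAPACITIES_KW = [105] * 2 + [70] * 6
IT_LOAD_KW = 529


def single_failure_margins(capacities_kw, load_kw):
    """Spare capacity (kW) left when each unit is off in turn; negative means short."""
    total = sum(capacities_kw)
    return [total - failed - load_kw for failed in capacities_kw]


margins = single_failure_margins(CRAC_CAPACITIES_KW, IT_LOAD_KW)
print("worst-case margin:", min(margins), "kW")
print("best-case margin:", max(margins), "kW")
```

With these numbers, losing one 70 kW unit leaves 31 kW of spare capacity, but losing a 105 kW unit leaves 525 kW against a 529 kW load. Even on paper, the N+1 claim is marginal.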


Base Case

Data Center Metrics: Data Center Energy Report

COP (Coefficient of Performance) (user-specified): 1.25
PUE (Power Usage Effectiveness): 1.8
DCiE (Data Center Infrastructure Efficiency) (%): 56
RTI (Return Temperature Index) (%): 84.34
RCI_HI (Rack Cooling Index, High) (%): 99.96
RCI_LO (Rack Cooling Index, Low) (%): < 0
Total Facility Power (kW): 954
Annual Operating Cost ($): 835,704

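Operating with too much airflow (an RTI below 100% indicates net bypass air) and at too low a supply air temperature (an RCI_LO below zero means the total under-temperature at rack inlets exceeds the maximum allowable)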
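The report's figures are consistent with two simple relationships. If cooling is the only non-IT load and cooling power equals IT load divided by COP, then PUE = 1 + 1/COP. For the user-specified COP of 1.25, this gives 1.8. The annual cost equals exactly 954 kW × 8,760 h × $0.10/kWh, so the report appears to assume about $0.10 per kWh. Both relationships are our reading of the table, not something the slides state. The following is a sketch under those assumptions, with hypothetical function names.

```python
HOURS_PER_YEAR = 8760


def pue_from_cop(cop: float) -> float:
    """Assumes cooling is the only non-IT load: PUE = (IT + IT / COP) / IT."""
    if cop <= 0:
        raise ValueError("COP must be positive")
    return 1.0 + 1.0 / cop


def annual_cost_usd(facility_kw: float, usd_per_kwh: float) -> float:
    """Energy cost of running at a constant facility power for one year."""
    return facility_kw * HOURS_PER_YEAR * usd_per_kwh


print(f"PUE from COP 1.25: {pue_from_cop(1.25):.2f}")
print(f"Annual cost at 954 kW and $0.10/kWh: ${annual_cost_usd(954, 0.10):,.0f}")
```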


Base Model Rack Inlet Temperatures

Bypass Airflow

Air Supply From Perforated Tiles

Air Supply From Perforated Tiles

High-Velocity Air Has Low Static Pressure and High Momentum


Cooling Unit Failure Analysis

• Switch off each CRAC in turn and run a CFD analysis to determine the resulting impact on rack inlet temperatures.
• Results will vary depending on many factors, including CRAC airflow and the unit's location in the room.
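The one-CRAC-at-a-time procedure is a simple loop around whatever CFD tool is in use. In the sketch below, `solve` is a hypothetical callback that would wrap a real CFD run. It takes the set of running units and returns rack inlet temperatures. The `toy_solver` is a made-up stand-in used only to show the harness working; it is not a thermal model.

```python
from typing import Callable, Dict, FrozenSet, List, Sequence

REC_MAX_F = 80.6  # ASHRAE recommended maximum rack inlet temperature (F)

# Hypothetical interface: given the set of running CRACs, return rack inlet temps (F).
Solver = Callable[[FrozenSet[str]], Sequence[float]]


def failure_sweep(crac_ids: List[str], solve: Solver) -> Dict[str, dict]:
    """Turn off one CRAC at a time and summarize the resulting rack inlet temperatures."""
    results = {}
    for failed in crac_ids:
        running = frozenset(c for c in crac_ids if c != failed)
        inlets = solve(running)
        results[failed] = {
            "max_inlet_f": max(inlets),
            "racks_over_recommended": sum(1 for t in inlets if t > REC_MAX_F),
        }
    return results


def toy_solver(running: FrozenSet[str]) -> List[float]:
    """Made-up stand-in for a CFD run, for illustration only.

    Racks 0-3 are assumed to be served by "CRAC 9" and racks 4-7 by "CRAC 10";
    a rack inlet rises from 72 F to 84 F when its unit is off.
    """
    serving = ["CRAC 9"] * 4 + ["CRAC 10"] * 4
    return [72.0 if unit in running else 84.0 for unit in serving]


if __name__ == "__main__":
    for crac, summary in failure_sweep(["CRAC 9", "CRAC 10"], toy_solver).items():
        print(crac, "off:", summary)
```

In practice, each call to `solve` is a full CFD solution, so the sweep costs one model run per cooling unit.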


Example Case 1: CFA, CRAC 45 off

Data Center Metrics

RTI improved, but RCI declined: the amount of air into the room is now better, but its distribution is poor.


Example Case 2: CFA, CRAC 10 off

Racks fall out of ASHRAE compliance in other areas of the room


Data Center Metrics

Amount of Air into the Room Is Perfect, but SAT (Supply Air Temperature) Is Too Low

Example Case 3: CFA, CRAC 9 off

Data Center Metrics


How Does Containment Affect CRAC Failure?

• Even with hot or cold aisle containment, CRAC failure can cause racks to fall out of compliance.
• Hot aisle containment generally provides better “assurances” against airflow issues:
  - The room can act as a general reservoir of cool air.
  - It also provides improved “human” conditions.
• The investment in a ceiling plenum return is well worth it for the longer term.


Cold Aisle Containment: CFA, CRAC 10 off, Partially Contained

Contained Aisles


Cold Aisle Case 2: CFA, CRAC 10 off, Partially Contained

Pressure Loss From Open Tiles Causes Airflow Issues
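Why open tiles hurt a partially contained aisle can be seen from a deliberately crude model. It assumes a uniform plenum pressure, a fixed total CRAC airflow, and orifice-style tiles (Q = Cd · A · √(2ΔP/ρ)) with an assumed discharge coefficient. Real plenums are not uniform, which is exactly why the CFD model is needed. All names and numbers here are hypothetical. Every extra open tile outside the contained aisle lowers plenum pressure and takes its share of the flow.

```python
import math
from typing import List

AIR_DENSITY_KG_M3 = 1.2  # approximate density of room air
DISCHARGE_COEFF = 0.65   # assumed tile discharge coefficient (illustrative)


def plenum_pressure_pa(total_flow_m3s: float, open_areas_m2: List[float]) -> float:
    """Uniform plenum pressure needed to push the total flow through all open tiles."""
    effective_area = DISCHARGE_COEFF * sum(open_areas_m2)
    velocity = total_flow_m3s / effective_area
    return 0.5 * AIR_DENSITY_KG_M3 * velocity ** 2


def tile_flows_m3s(total_flow_m3s: float, open_areas_m2: List[float]) -> List[float]:
    """Orifice relation per tile: Q = Cd * A * sqrt(2 * dP / rho)."""
    dp = plenum_pressure_pa(total_flow_m3s, open_areas_m2)
    velocity = math.sqrt(2.0 * dp / AIR_DENSITY_KG_M3)
    return [DISCHARGE_COEFF * area * velocity for area in open_areas_m2]


if __name__ == "__main__":
    contained = [0.09] * 20  # tiles inside the contained aisle (m^2 open area each)
    leaks = [0.09] * 5       # open tiles elsewhere in the room
    for label, tiles in [("contained only", contained), ("with open tiles", contained + leaks)]:
        flows = tile_flows_m3s(5.0, tiles)
        print(f"{label}: plenum {plenum_pressure_pa(5.0, tiles):.1f} Pa, "
              f"contained aisle gets {sum(flows[:20]):.2f} m^3/s")
```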


CRAC 10 Off: Rack Inlets Still Out of Specification

Cold Aisle Contained: CFA, CRAC 10 off, Fully Contained

Added Containment


Revised Design: Full Cold Aisle Containment, CRAC 10 off

Racks in Compliance, but SAT still too low


Raising Hot Air Return Temperatures Increases Cooling Capacity
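For a fixed airflow and supply temperature, a cooling unit's air-side sensible capacity scales linearly with the return-minus-supply temperature difference. This is the usual Q [BTU/h] = 1.08 × CFM × ΔT rule for standard air. The sketch below uses that rule and ignores coil and compressor limits, which cap real units. The 12,000 CFM unit is hypothetical.

```python
BTU_PER_HR_PER_KW = 3412.14


def sensible_capacity_kw(airflow_cfm: float, return_f: float, supply_f: float) -> float:
    """Air-side sensible heat removal using Q[BTU/h] = 1.08 x CFM x delta T (standard air)."""
    btu_per_hr = 1.08 * airflow_cfm * (return_f - supply_f)
    return btu_per_hr / BTU_PER_HR_PER_KW


if __name__ == "__main__":
    # Hypothetical 12,000 CFM unit with a fixed 60 F supply temperature.
    for return_f in (75.0, 85.0, 95.0):
        print(f"return {return_f:.0f} F -> {sensible_capacity_kw(12_000, return_f, 60.0):.1f} kW")
```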


But in Cold Aisle Contained Designs, This Becomes Difficult

Resulting Room Ambient Temperature

Remaining Room Becomes the Hot Aisle


Hot Aisle Contained Design


Hot Aisle Design: CRAC 10 off

Hot Aisle Contained: CRAC 10 off

Hot Aisle Contained: CRAC 10 off

Data Center Metrics

RTI and RCI_HI Are Optimal, So SAT Can Now Be Increased

Summary

• Although the overall design might be “N+1”, until some models are run it is unclear what will happen in a failure condition.
• The health of the servers will vary depending on which CRAC is down.
• Failure mode plans may need to be developed and deployed.
• Having a model of your data center is an important tool for predicting conditions during a failure or service event.


Summary

• CRAC failure analysis is important to run when modifying existing designs, as well as for new designs.
• A design that is optimized for normal operation will often perform poorly in a CRAC failure scenario.
• It is important to understand these conditions before they occur, so that appropriate action can be taken before a failure.


Questions Regarding This Presentation and/or Modeling Methods

Paul Bemis [email protected]


Thank you
