Transcript Slide 1

Toward Parallel Space Radiation Analysis

TG08 Dr. Liwen Shih, Thomas K. Gederberg, Karthik Katikaneni, Ahmed Khan, Sergio J. Larrondo, Susan Strausser, Travis Gilbert, Victor Shum, Romeo Chua University of Houston Clear Lake [email protected]


This project continues the Space Radiation Research work performed last year by Dr. Liwen Shih's students to investigate HZETRN code optimization options. This semester we will analyze the HZETRN code using standard static analysis tools and runtime analysis tools. In addition, we will examine code parallelization options for the most-called numerical method in the source code: the PHI function.

[email protected]

[Figure: Runtime profile of HZETRN functions. The PHI interpolation function dominates, with 34.5% of total runtime and 13,220,184 calls made over the program run. Other shares: logf 30.43%, expf 8.72%, iuni 4.36%, anu 4.26%, texp 2.91%, prpli 1.97%, powf 1.35%, prpgt 0.93%, od 0.73%, cvtas_s_to_a 0.73%, remaining functions 9.03%.]

What is Space Radiation?


Two major sources:

- Galactic cosmic rays (GCR)
- Solar energetic particles (SEP)

GCR are ever-present and more energetic, and are therefore able to penetrate much thicker materials than SEP. To evaluate the space radiation risk and to design spacecraft and habitats for better radiation protection, space radiation transport codes, which depend on the input physics of nuclear interactions, have been developed.


Space Radiation and the Earth

Earth protected from space radiation. Animation source: Rice University, Connections Program.

TG08 [email protected]

This image shows how the Earth's magnetic field causes electrons to drift one way around the Earth, while protons drift in the opposite direction.

Original clips provided courtesy of Professor Patricia Reiff, Rice University, Connections Program.

What about Galactic Cosmic Radiation (GCR)?


A typical high-energy particle of radiation found in the space environment is itself ionized; as it passes through material such as human tissue, it disrupts the electron clouds of the constituent molecules and leaves a path of ionization in its wake. These particles are either singly charged protons or more highly charged nuclei called "HZE" particles.

[email protected]

5

HZETRN Space Radiation Nuclear Transport Code

HZETRN: High Charge and Energy Nuclear Transport Code
Language: FORTRAN-77
Written: 1992
Environment: VAX mainframe

Code metrics:
- Files: 3
- Lines: 9665
- Code lines: 6803
- Comment lines: 2859
- Declarative statements: 780
- Executable statements: 6563
- Comment/code ratio: 0.42

The three included source code files are:
1. NUCFRAG.FOR - generates nuclear absorption and reaction cross sections.
2. GEOMAG.FOR - defines the GCR transmission coefficient cutoff effects within the magnetosphere.
3. HZETRN.FOR - propagates the user-defined GCR environments through two layers of user-supplied materials. The current version is set up to propagate through aluminum, tissue (H2O), CH2, and LH2.

TG08 [email protected]

6

HZETRN Numerical Method

TG08 [email protected]

7

HZETRN Calculates:

Radiation fluence of HZE particles:
the time-integrated flux of HZE particles per unit area.

Energy absorbed per gram (absorbed dose):
determined by first measuring the amount of energy deposited by the radiation in question, and then the amount and type of material.

Dose equivalent:
the amount of any type of radiation absorbed in biological tissue, expressed as a standardized value (see the relation below).
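For reference (this relation is not stated explicitly on the slide, but is the standard one in radiation protection), the dose equivalent is the absorbed dose weighted by a quality factor:

\[ H \;=\; Q \cdot D \]

where D is the absorbed dose (energy deposited per unit mass, in gray), Q is the quality factor for the radiation type, and H is the dose equivalent (in sievert).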

TG08 [email protected]

8

HZETRN Algorithm

TG08 [email protected]

9


HZETRN used for Mars Mission

NASA has a new vision for space exploration in the 21st century encompassing a broad range of human and robotic missions, including missions to the Moon, Mars, and beyond. As a result, there is a focus on long-duration space missions. NASA, as much as ever, is committed to the safety of the missions and the crew. Exposure to the hazards of severe space radiation in deep-space long-duration missions is 'the show stopper.'

Thus, protection from the hazards of severe space radiation is of paramount importance for the new vision. There is an overwhelming emphasis on reliability issues for the mission and the habitat. Accurate risk assessments critically depend on the accuracy of the input information about the interaction of ions with materials, electronics, and tissues.


Martian Radiation Climate Modeling Using HZETRN Code

- Calculations of the skin dose equivalent for astronauts on the surface of Mars near solar minimum.
- The variation in the dose with respect to altitude is shown.
- Higher altitudes (such as Olympus Mons) offer less shielding.

Mars Radiation Environment (Source: Wilson et al., http://marie.jsc.nasa.gov)


HZETRN Model vs. Actual Mars Radiation Climate: HZETRN Underestimates!

- Dose data are underestimated, partly because of code inefficiency.
- Dose rate measured by the MARIE spacecraft during the transit period from April 2001 to August 2001, compared with HZETRN-calculated doses.
- Spike in May due to an SPE.
- Differences between the observed (red) and predicted (black) doses vary by a factor of 1 to 3.

Graph source: Alenia Spazio, European Space Agency Report, 2004.


Project Goal:

Speedup of runtime via analysis and modification of the HZETRN code numerical algorithm.

[Figure: Runtime profile of HZETRN functions, repeated from slide 2. PHI interpolation function: 34.5% of total runtime, 13,220,184 calls over the program run; logf: 30.43%.]

The major space radiation code bottleneck lies inside the function call to the PHI interpolation function.


Code Optimization Options

PHI function source (excerpt):

C **************************************************************
C
      FUNCTION PHI(R0,N,R,P,X)
C
C     FUNCTION PHI INTERPOLATES IN P(N) ARRAY DEFINED OVER R(N) ARRAY
C     ASSUMES P IS LIKE A POWER OF R OVER SUBINTERVALS
C
      DIMENSION R(N),P(N)
C
      SAVE
C
      XT=X
      PHI=P(1)
      INC=((R(2)-R(1))/ABS(R(2)-R(1)))*1.01
      IF(X.LE.R(1).AND.R(1).LT.R(2))RETURN
C
      DO 1 I=3,N-1
      IL=I
      IF(XT*INC.LT.R(I)*INC)GO TO 2
    1 CONTINUE
C
      IL=N-1
    2 CONTINUE
      PHI=0.

Options:
1. Fix inefficient code
2. Fix/remove unnecessary function calls (TEXP), SAVE, and dummy arguments
3. Use the optimized ALOG function
4. Use a lookup table instead
5. Investigate parallelization of the interpolation statements
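As an illustration of options 1 and 3, here is a minimal Fortran 90 sketch of one way to avoid recomputing logarithms inside the interpolation: since the abscissa grid is fixed, its logarithms could be taken once per run and reused. The module and routine names (phi_sketch, build_log_grid, loglin_interp) are hypothetical, and the interpolation shown is a simplified log-log linear one, not the actual 3rd-order HZETRN PHI routine.

      ! Illustrative sketch only, not the HZETRN PHI routine.
      module phi_sketch
         implicit none
         real, allocatable :: logr(:), logp(:)
      contains
         subroutine build_log_grid(r, p)
            real, intent(in) :: r(:), p(:)
            allocate(logr(size(r)), logp(size(p)))
            logr = log(r)               ! grid logs computed once per run
            logp = log(p)
         end subroutine build_log_grid

         real function loglin_interp(x) result(y)
            real, intent(in) :: x
            real :: lx, t
            integer :: i
            lx = log(x)                 ! only the query point is logged per call
            i = 1
            do while (i < size(logr) - 1 .and. logr(i+1) < lx)
               i = i + 1                ! locate the bracketing interval
            end do
            t = (lx - logr(i)) / (logr(i+1) - logr(i))
            y = exp((1.0 - t) * logp(i) + t * logp(i+1))
         end function loglin_interp
      end module phi_sketch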


Code Optimization

- Improve code structure
- Use the faster ALOG function (LOG)
- Remove extraneous function calls


Steps toward a faster HZETRN

Step 1: Review algorithm.
  Purpose: understand the underlying numerical algorithm.
  Result: the HZETRN algorithm is complex and needs further review; the overall functions of the code are understood.

Step 2: Analyze source code and data files.
  Purpose: understand code structure and function.
  Result: review of the code and data files reveals that much of the code is inefficient, with redundant elements and an archaic structure; the data files contain sparse matrices amenable to performance improvement.

Step 3: Portability study.
  Purpose: attempt to port the HZETRN code to various HPC platforms and compilers.
  Result: the portability study revealed problems with the code and additional requirements for optimization.

Step 4: Static analysis.
  Purpose: develop an understanding of the program structure; document the code for optimization and reporting.
  Result: we generated a detailed HTML report documenting the HZETRN source code functions and the structure of subroutine calls.

Step 5: Runtime analysis.
  Purpose: target runtime bottlenecks and determine the most-called functions/subroutines.
  Result: revealed that the PHI interpolation function is the major bottleneck; the natural logarithm intrinsic function is also a performance issue.

Step 6: Serial optimization of the code, starting with the PHI function.
  Purpose: remove extraneous function calls and clean up 'messy' code.
  Result: runtime performance improvement (initially a 10% overall improvement).


Parallel Space Radiation Analysis

The goal of the project was to speed up the execution of the HZETRN code using parallel processing.

The Message Passing Interface (MPI) standard library was to be used to perform the parallel processing across a cluster with distributed memory.
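For orientation, here is a minimal MPI skeleton in Fortran of the kind such a port builds on: initialize MPI, query the rank and size, time a region with MPI_WTIME, and finalize. The work-distribution comment is only a placeholder, not the actual HZETRN decomposition.

      ! Minimal MPI skeleton (illustrative only).
      program mpi_skeleton
         use mpi
         implicit none
         integer :: ierr, rank, nprocs
         double precision :: t0, t1
         call MPI_INIT(ierr)
         call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
         call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)
         t0 = MPI_WTIME()
         ! ... each rank would work on its share of independent
         !     isotopes or energy grid points here ...
         t1 = MPI_WTIME()
         if (rank == 0) print *, 'ranks:', nprocs, ' elapsed seconds:', t1 - t0
         call MPI_FINALIZE(ierr)
      end program mpi_skeleton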

TG08 [email protected]

17

Computing Resources Used

- Itanium 2 cluster (Atlantis) at the Texas Learning & Computation Center (TLC2), University of Houston.
- Atlantis is a cluster of 152 dual-Itanium2 (1.3 GHz) compute nodes networked via a Myrinet 2000 interconnect. Atlantis runs Red Hat Linux version 5.1.
- The Intel Fortran compiler (version 10.0) and OpenMPI (an open-source MPI-2 implementation) are being used.
- In addition, a home PC running Linux (Ubuntu 7.10) with the Sun Studio 12 Fortran 90 compiler and MPICH2 was used.
- TeraGrid has just started being used.


PHI Routine (Lagrangian Interpolation)

- The figure below shows the HZETRN runtime profile. Most time is spent in the function PHI, a 3rd-order Lagrangian interpolation.
- The PHI function is heavily called by the propagation and integration routines: typically 229,380 calls at each depth.
- Early focus: optimizing the PHI routine.
- The PHI routine takes the natural log of the input ordinates and abscissas prior to performing the Lagrangian interpolation, and returns the exponential of the interpolated ordinate.
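In general form (the exact interval selection used by PHI is not reproduced here), a third-order Lagrangian interpolation over four grid points (r_i, p_i), carried out in log space and exponentiated at the end as described above, is

\[
\ln \Phi(x) \;=\; \sum_{j=0}^{3} \ln p_{i+j} \prod_{\substack{k=0 \\ k \ne j}}^{3} \frac{\ln x - \ln r_{i+k}}{\ln r_{i+j} - \ln r_{i+k}},
\qquad \Phi(x) \;=\; e^{\ln \Phi(x)} .
\]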

[Figure: Runtime profile of HZETRN functions, repeated from slide 2. PHI interpolation function: 34.5% of total runtime, 13,220,184 calls over the program run; logf: 30.43%.]

(Source: Shih, Larrondo, et al., High-Performance Martian Space Radiation Mapping, NASA/UHCL/UH-ISSO, pp. 121-122)

Removing the calls to the natural log and exponential functions resulted in a 21% (Atlantis) to 45% (home PC) speedup, but had a negative impact on the numerical results (see next page), since the functions being interpolated are logarithmic.

TG08 [email protected]

19

PHI Routine Needs LOG/TEXP

Significantly different results when comparing runs with and without the calls to LOG/TEXP.


PHI Routine Optimization

- Because the bottleneck PHI routine is called so heavily, the message-passing overhead of parallelizing it would be prohibitive.
- Simple code optimizations of the PHI routine resulted in:
  – 11.4% speedup on a home PC running Linux, compiled with the Sun Studio 12 Fortran compiler.
  – 3.85% speedup on an Atlantis node using the Intel Fortran compiler.
  – The smaller speedup on Atlantis may be because the Intel compiler was already generating more optimized code.

TG08 [email protected]

21

PHI Routine FPGA Prototype

- Implementing the bottleneck routines (the PHI routine and/or the logarithm/exponential routines) in an FPGA could result in a significant speedup.
- A reduced-precision floating-point FPGA prototype was developed, with an estimated ~325 times faster PHI computation in hardware.

TG08 [email protected]

22

HZETRN Main Program Flow

Basic flow of HZETRN (a structural sketch follows):
– Step 1: Call MATTER to obtain the material properties (density, atomic weight, and atomic number of each element) of the shield.
– Step 2: Generate the energy grid.
– Step 3: Dosimetry and propagation in the shield material:
  - Call DMETRIC to compute dosimetric quantities at the current depth.
  - Call PRPGT to propagate the GCRs to the next depth.
  - Repeat Step 3 until the target material is reached.
– Step 4: Dosimetry and propagation in the target material:
  - Call DMETRIC to compute dosimetric quantities at the current depth.
  - Call PRPGT to propagate the GCRs to the next depth.
  - Repeat Step 4 until the required depth is reached.
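A compact structural sketch of this flow in Fortran. The subroutine names match the slide, but their argument lists (omitted here) and the depth loop counts are placeholders, not the actual HZETRN interfaces.

      ! Structural sketch only; the real MATTER, DMETRIC, and PRPGT
      ! routines take many arguments not shown here.
      program hzetrn_flow_sketch
         implicit none
         integer :: idepth
         call matter()                    ! Step 1: material properties of the shield
         call energy_grid()               ! Step 2: generate the energy grid
         do idepth = 1, 5                 ! Step 3: shield (placeholder depth count)
            call dmetric()                ! dosimetry at the current depth
            call prpgt()                  ! propagate GCRs to the next depth
         end do
         do idepth = 1, 5                 ! Step 4: target (placeholder depth count)
            call dmetric()
            call prpgt()
         end do
      contains
         subroutine matter()
         end subroutine matter
         subroutine energy_grid()
         end subroutine energy_grid
         subroutine dmetric()
         end subroutine dmetric
         subroutine prpgt()
         end subroutine prpgt
      end program hzetrn_flow_sketch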

TG08 [email protected]

23

DMETRIC Routine

- The subroutine DMETRIC is called by the main program at each user-specified depth in the shield and target to compute dosimetric quantities.
- There are six main DO loops in the routine. Approximately 60% of DMETRIC's processing time is spent in loop 2, and 39% is spent in loop 5.
- To check whether these loops could be done in parallel, the loop order was reversed to test for data dependency (see the sketch below). The results were identical, so there is no data dependency between the dosimetric calculations for the individual isotopes.
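A minimal sketch of that kind of dependence check. The array names and the per-isotope expression are illustrative stand-ins, not the actual DMETRIC arithmetic.

      ! Illustrative only: if forward and reversed loop orders give
      ! identical results, there is no loop-carried dependence.
      program reversal_check
         implicit none
         integer, parameter :: niso = 59
         real :: flux(niso), qf(niso), dose_f(niso), dose_r(niso)
         integer :: j
         call random_number(flux)
         call random_number(qf)
         do j = 1, niso                   ! original order
            dose_f(j) = flux(j) * qf(j)
         end do
         do j = niso, 1, -1               ! reversed order
            dose_r(j) = flux(j) * qf(j)
         end do
         print *, 'identical results: ', all(dose_f == dose_r)
      end program reversal_check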


DMETRIC Routine - Dependent?

- To determine if loop 5 is parallelizable, the outer loop was first changed to decrement from II to 1 rather than increment from 1 to II. The results were identical, so the outer loop of loop 5 should be parallelizable.
- Next, the inner loop was changed to decrement from IJ to 2 rather than increment from 2 to IJ. Differences appear in the last significant digit (see next page).
- These differences are due to floating-point rounding differences in the four summations.


DMETRIC Routine - Not Dependent

Minor differences in results when the order of the inner loop of loop 5 is changed.


Parallel DMETRIC Routine

- Since there is no data dependency in the dosimetric calculations for each of the 59 isotopes, these computations could be done in parallel.
- Timing statements (using MPI's wall-clock function MPI_WTIME) were inserted to measure the amount of time spent in each subroutine.
- Approximately 17% of the processing time is spent in subroutine DMETRIC, about 82% in subroutine PRPGT, and less than 1% in the remainder of the program.
- Even assuming DMETRIC were parallelized perfectly, the maximum overall runtime improvement would therefore be about 17% (a speedup factor of roughly 1.2; see below).
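That bound is just Amdahl's law with a parallelizable fraction of f ≈ 0.17 and p processors:

\[
S_{\max} \;=\; \frac{1}{(1-f) + f/p} \;\le\; \frac{1}{1-f} \;\approx\; \frac{1}{1-0.17} \;\approx\; 1.2 .
\]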

TG08 [email protected]

27

PRPGT Routine

- PRPGT propagates the GCRs through the shielding and the target.
- ~82% of HZETRN processing time is spent in PRPGT or the routines it calls.
- At each propagation step from one depth to the next in the shield or target, the propagation for each of the 59 isotopes is performed in two stages:
  – The first stage computes the energy shift due to propagation.
  – The second stage computes the attenuation and the secondary particle production due to collisions.
- To test whether the propagation for each of the 59 ions could be done in parallel, the loop was broken up into four pieces (a J loop from 20 to 30, from 1 to 19, from 41 to 59, and from 31 to 40), as sketched below.
- If the loop can be performed in parallel, then the results from these four loops should be the same as those from the single loop from 1 to 59.
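A minimal sketch of that check. The per-ion update is a stand-in expression, not the actual PRPGT physics.

      ! Illustrative only: run the J loop in four out-of-order chunks and
      ! compare against the single 1..59 loop.
      program split_check
         implicit none
         integer, parameter :: nion = 59
         integer :: starts(4) = (/ 20,  1, 41, 31 /)
         integer :: ends(4)   = (/ 30, 19, 59, 40 /)
         real :: a(nion), ref(nion), split(nion)
         integer :: c, j
         call random_number(a)
         do j = 1, nion                   ! original single loop
            ref(j) = 2.0 * a(j)
         end do
         do c = 1, 4                      ! four chunks, out of order
            do j = starts(c), ends(c)
               split(j) = 2.0 * a(j)
            end do
         end do
         print *, 'identical results: ', all(ref == split)
      end program split_check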

TG08 [email protected]

28

PRPGT Routine - Check Dependency

- The following compares the results of breaking the main loop into four loops (on the left) with the original results.
- The significantly different results demonstrate that the propagation cannot be parallelized across the 59 ions.

TG08 [email protected]

29

PRPGT Routine - Data Dependent

- Reversing the inner (stage I) loops of the 1st and 2nd stages gave results identical to the original, so it should be possible to parallelize the 1st or 2nd stage individually.
- However, to test data dependence from the 1st stage to the 2nd stage, the main J loop was divided into two loops (one for the 1st stage and one for the 2nd stage). The results changed, so the 2nd stage is dependent on the 1st stage.
- A barrier would be needed to prevent execution of the 2nd stage until the 1st stage completes.
- 24% of the HZETRN processing time is spent in the 1st stage, while less than 2% is spent in the 2nd stage. Therefore, parallel processing of the two stages does not appear worthwhile.

TG08 [email protected]

30

Parallel PRPLI Routine

- PRPLI is called by PRPGT after the 1st- and 2nd-stage propagation has been completed for each of the 59 isotopes.
- PRPLI performs the propagation of the six light ions (ions with Z < 5).
- ~53% of total HZETRN time is spent on light-ion propagation.
- PRPLI propagates a 45 x 6 fluence matrix named PSI (fluence: the number of particles crossing a unit area), with 45 energy points for each of the 6 light ions.
- Analysis has shown that there is no data dependency among the energy grid points.
- It should, therefore, be possible to parallelize the PRPLI code across the 45 energy grid points, as sketched below.
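A minimal MPI sketch of how the 45 energy grid points might be divided among ranks in a block fashion. The per-point update is a placeholder, not the actual PRPLI computation, and the gather step is only indicated in a comment.

      ! Illustrative block distribution of 45 energy grid points over ranks.
      program prpli_distribution_sketch
         use mpi
         implicit none
         integer, parameter :: npts = 45, nions = 6
         integer :: ierr, rank, nprocs, i, j, lo, hi, chunk
         real :: psi(npts, nions)
         call MPI_INIT(ierr)
         call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
         call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)
         psi = 1.0
         chunk = (npts + nprocs - 1) / nprocs      ! grid points per rank, rounded up
         lo = rank * chunk + 1
         hi = min(npts, (rank + 1) * chunk)
         do i = lo, hi                             ! each rank updates its own block
            do j = 1, nions
               psi(i, j) = 2.0 * psi(i, j)         ! placeholder for PRPLI work
            end do
         end do
         ! the full PSI matrix would then be reassembled, e.g. with MPI_ALLGATHERV
         call MPI_FINALIZE(ierr)
      end program prpli_distribution_sketch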

TG08 [email protected]

31

General HZETRN Recommendations

- Arrays in Fortran are stored in column-major order, so it is more efficient to access them in column order rather than row order.
- HZETRN uses the old Fortran technique of alternate entry points; the use of alternate entry points is discouraged.
- HZETRN uses COMMON blocks for global memory; Fortran 90 MODULEs should be used instead (see the sketch below).
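A brief sketch of two of these recommendations. The variable and array names are illustrative, not taken from HZETRN: a Fortran 90 module in place of a COMMON block, and a loop nest that keeps the first (column) index innermost.

      ! Illustrative only: module-based globals and column-order access.
      module shield_globals                 ! replaces e.g. COMMON /SHIELD/ rho, awt
         implicit none
         real :: rho, awt
      end module shield_globals

      program column_order_demo
         use shield_globals
         implicit none
         real :: flux(45, 59)
         integer :: i, j
         rho = 2.7
         awt = 27.0
         do j = 1, 59                       ! second index in the outer loop
            do i = 1, 45                    ! first index innermost: contiguous access
               flux(i, j) = rho * real(i + j)
            end do
         end do
         print *, 'sum of flux = ', sum(flux)
      end program column_order_demo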

TG08 [email protected]

32

Conclusions & Future Work

- HZETRN, written in Fortran 77 in the early 1990s, can be improved via simple code optimizations and parallel processing using MPI.
- A maximum speedup of about 50% is expected with the current HZETRN.
- Additional performance improvements could be obtained by implementing the 3rd-order Lagrangian interpolation routine (PHI), or the natural log (LOG) and exponential (TEXP) functions, on an FPGA.

TG08 [email protected]

33

          

References

J.W. Wilson, F.F. Badavi, F. A. Cucinotta, J.L. Shinn, G.D. Badhwar, R. Silberberg, C.H. Tsao, L.W. Townsend, R.K. Tripathi, HZETRN: Description of a Free-Space Ion and Nucleon Transport Shielding Computer Program , NASA Technical Paper 3495, May 1995.

J. W. Wilson, J.L. Shinn, R. C. Singleterry, H. Tai, S. A. Thibeault, L.C. Simmons, Improved Spacecraft Materials for Radiation Shielding , NASA Langley Research Center. spacesciene.spaceref.com/colloquia/mmsm/wilson_pos.pdf

NASA Facts: Understanding Space Radiation , FS-2002-10-080-JSC, October 2002.

P. S. Pacheco, Parallel Programming with MPI, Morgan Kaufmann Publishers Inc.: San Francisco, 1997.

S. J. Chapman, Fortran 90/95 for Scientists and Engineers, 2nd edition. McGraw Hill: New York, 2004.

L. Shih, S. Larrondo, K. Katikaneni, A. Khan, T. Gilbert, S. Kodali, A. Kadari, High-Performance Martian Space Radiation Mapping, NASA/UHCL/UH-ISSO, pp. 121-122.

L. Shih, Efficient Space Radiation Computation with Parallel FPGA, Y2006 – ISSO Annual Report, pp. 56-61.

Gilbert, T. and L. Shih. "High-Performance Martian Space Radiation Mapping," IEEE/ACM/UHCL Computer Application Conference, University of Houston-Clear Lake, Houston, TX, April 29, 2005.

Kadari, A., S. Kodali, T. Gilbert, and L. Shih. "Space Radiation Analysis with FPGA," IEEE/ACM/UHCL Computer Application Conference, University of Houston-Clear Lake, Houston, TX, April 29, 2005.

F. A. Cucinotta, "Space Radiation Biology," NASA-M. D. Anderson Cancer Center Mini-Retreat, Jan. 25, 2002 <http://advtech.jsc.nasa.gov/presentation_portal.shtm>.

Space Radiation Health Project, May 3, 2005, NASA-JSC, March 7, 2005 <http://srhp.jsc.nasa.gov/>.


Acknowledgements

- NASA LaRC: Robert C. Singleterry Jr., PhD
- NASA JSC/CARR, PVA&M: Premkumar B. Saganti, PhD
- TeraGrid, TACC
- TLC2: Mark Huang & Erik Engquist
- Texas Space Grant Consortium
- ISSO
