The Sharing and Training of HPC Resources at the University of Arkansas
Amy Apon, Ph.D.
Oklahoma Supercomputing Symposium
October 4, 2006
Outline of Talk
• HPC at the University of Arkansas
  – Current status
• A New Mechanism for Sharing Resources
  – AREON
• HPC Training
  – New course delivery via HDTV, in collaboration with LSU
• Collaboration opportunities and challenges
  – GPNGrid and SURAGrid
  – Resource allocation issues

High Performance Computing Resources at the University of Arkansas
• Red Diamond supercomputer
  – NSF MRI grant, August 2004
    • Substantial University match
    • Substantial gift from Dell
  – First supercomputer in Arkansas
    • Number 379 on the Top 500 list, June 2005
    • 128 nodes (256 processors), 1.349 TFlops

More Resources
• Prospero cluster
  – 30 dual-processor PIII nodes
  – SURAGrid resource
• Ace cluster
  – 4 dual-processor Opterons
  – Our entry point to the GPNGrid/Open Science Grid
• Trillion cluster
  – 48 dual-processor Opterons
  – Owned by Mechanical Engineering
  – About 1 TFlop

How are we doing?

We are seeing research results
• Computational Chemistry and Materials Science (NSF)
  – New formulas for new drugs
  – Nanomaterials
  – Chemistry, Physics, Mechanical Engineering
  – Over 95% of our usage is in these areas

Research results in other areas, also
• Multiscale Modeling
• DNA Computing
• Middleware and HPC Infrastructure
  – Tools for managing data for large-scale applications (NSF)
  – Performance modeling of grid systems (Acxiom)
We have done some significant upgrades
• For the first year we used SGE on half of the computer; the other half ran self-scheduled PVM jobs
• LSF scheduler installed May 2006
• About 60 users, about 10 very active users

Thanks to LSF, we are busy
[Chart: LSF daily pending parallel job statistics by queue (jobs waiting)]

And jobs have to wait
[Chart: LSF hourly turnaround time of the normal queue]

We have something exciting to share

AREON: Arkansas Research and Education Optical Network
[Map, 25-July-2006: AREON sites at Fayetteville, Jonesboro, Russellville, Fort Smith, Conway, Little Rock, Pine Bluff, Arkadelphia, Monticello, and Magnolia, with external connections to Tulsa, Memphis, Dallas, and Monroe]

AREON: Arkansas Research and Education Optical Network
• The first bond issue (last fall) failed
• Governor Huckabee of Arkansas granted $6.4M (PI Zimmerman)
• On the MBO loop between Tulsa and Fayetteville, fiber is in place and network hardware is being shipped
• The campus (last-mile) connections are in progress
• All is on target for a demo to the Governor on 12/5/06!

AREON: Arkansas Research and Education Optical Network
• This fall, Uark will have connectivity to Internet2 and the National Lambda Rail
• The bond issue is on the ballot again this coming fall
• If it passes, the other research institutions will be connected to AREON. We hope this happens!
• The timeframe for this is about a year and a half
Opportunities for collaboration with OneNet, LEARN, LONI, GPN, and others
[Map: LEARN topology for Texas, showing Denton, Lubbock, Dallas, Longview, Waco, El Paso, College Station, Austin, Houston, Beaumont, San Antonio, Galveston, and Corpus Christi; legend: NLR topology, leased lambda, LEARN site, metro interconnect, city]

A Demonstration Application
• High Performance Computing: a new course in Spring 2007
  – In collaboration with LSU and Dr. Thomas Sterling
  – We are exploring new methods of course delivery using streaming high-definition TV
  – We expect about 40 students at five locations this time
  – Taught live via Access Grid and HDTV over AREON and LONI, …
  – A test run for future delivery of HPC education

Collaboration via GPN Grid
• Active middleware collaboration for almost 3 years
• GPNGrid is in the process of applying as a new Virtual Organization in Open Science Grid
• Sponsored by the University of Nebraska-Lincoln; includes participants from Arkansas, UNL, Missouri, KU, KSU, and OU
• A hardware grant from Sun and NSF provides 4 small Opteron clusters for the starting grid environment
• Applications are in the process of being defined

Collaboration via SURA Grid
• Uark has a 30-node Pentium cluster in SURAGrid
• Some differences with GPN
  – The CA is different
  – The account-management and discovery stacks are different
  – The AUP policy is different
• SURA Grid applications are increasing. Uark can run coastal modeling and is open to running other SURA applications
More Collaboration Mechanisms
• Arkansas is participating in the recently awarded CI-TEAM award to OU (PI Neeman)
  – Will deploy Condor across Oklahoma and with participating collaborators
• LSF MultiCluster provides another mechanism for collaboration
• AREON will give the University of Arkansas great bandwidth

UofA Current HPC Challenges
• We have some I/O infrastructure challenges
  – The system was designed to have a large amount of storage, but it is not fast
• Supercomputing operations
  – AC, power, and UPS need to be upgraded
• Funding models for ongoing operations
  – How will basic systems administration and the project director be funded?

Collaboration and sharing bring a challenge
• Usage policies
  – How do you partition usage fairly among existing users?
  – How do you incorporate usage from new faculty?
• Current policy uses fair-share scheduling:
  Dynamic Priority = (# shares) / (#slots*F1 + cpu_time*F2 + run_time*F3)
• Shares are divided among the largest user groups: chem, phys, others

Collaboration and sharing bring a challenge
• Are max run times needed?
  – Almost everyone has them
  – They require checkpointing of jobs, which is hard to do with our current I/O infrastructure
  – They require user education and a change of culture
• Are user allocations and accounting of usage needed?
• Your suggestions here

Questions?
Contact information:
http://hpc.uark.edu
http://comp.uark.edu/~aapon
[email protected]
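The fair-share dynamic-priority formula in the usage-policy slide can be sketched in a few lines of Python. This is a minimal illustration of the arithmetic only: the weighting factors F1-F3 are site-tunable scheduler parameters, and the sample share and usage values below are assumptions, not the cluster's actual LSF settings.

```python
def dynamic_priority(shares, slots, cpu_time, run_time,
                     f1=1.0, f2=1.0, f3=1.0):
    """Fair-share dynamic priority: higher value means scheduled sooner.

    shares   -- fair-share allocation for the user group
    slots    -- job slots currently occupied by the group
    cpu_time -- accumulated CPU time (any consistent unit)
    run_time -- accumulated wall-clock run time (same unit)
    f1..f3   -- site-tunable weighting factors (illustrative defaults)
    """
    return shares / (slots * f1 + cpu_time * f2 + run_time * f3)

# A group that has consumed resources sees its priority decay relative
# to an otherwise-idle group holding the same share allocation.
busy = dynamic_priority(shares=50, slots=20, cpu_time=100.0, run_time=80.0)
idle = dynamic_priority(shares=50, slots=1, cpu_time=0.0, run_time=0.0)
print(busy < idle)  # prints True: the idle group now ranks higher
```

The decay behavior is the point of the formula: as a group's slot usage and accumulated CPU/run time grow, its priority shrinks, letting lighter users move up the queue.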