The Sharing and Training of HPC Resources at the University of Arkansas Amy Apon, Ph.D. Oklahoma Supercomputing Symposium October 4, 2006


The Sharing and Training of
HPC Resources at the
University of Arkansas
Amy Apon, Ph.D.
Oklahoma Supercomputing Symposium
October 4, 2006
Outline of Talk
• HPC at the University of Arkansas
– Current status
• A New Mechanism for Sharing Resources
– AREON
• HPC Training
– New course delivery via HDTV collaboration with
LSU
• Collaboration opportunities and challenges
– GPNGrid and SURAGrid
– Resource allocation issues
Amy Apon, Ph.D. ● University of Arkansas ● October 4, 2006
High Performance Computing Resources
at the University of Arkansas
• Red Diamond supercomputer
– NSF MRI grant, August, 2004
• Substantial University match
• Substantial gift from Dell
– First supercomputer in Arkansas
• Number 379 on the Top 500 list, June 2005
• 128 nodes (256 processors), 1.349 TFlops
More Resources
• Prospero cluster
– 30 dual processor PIII nodes
– SURAGrid resource
• Ace cluster
– 4 dual processor Opteron
– Our entry point to the GPNGrid/Open Science Grid
• Trillion cluster
– 48 dual processor Opteron
– Owned by Mechanical Engineering
– About 1TFlop
How are we doing?
We are seeing research results
• Computational Chemistry and Materials
Science (NSF)
– New formulas for new drugs
– Nanomaterials
– Chemistry, Physics, Mechanical Engineering
– Over 95% of our usage is in these areas
Research results in other areas, also
• Multiscale Modeling
• DNA Computing
• Middleware and HPC Infrastructure
– Tools for managing data for large-scale applications (NSF)
– Performance modeling of grid systems (Acxiom)
We have done some significant upgrades
• For the first year we used SGE on half the machine, while the other half ran self-scheduled PVM jobs
• LSF scheduler installed May 2006
• About 60 users, about 10 very active users
Thanks to LSF, we are busy
[Chart: LSF daily pending parallel job statistics by queue (jobs waiting)]
And jobs have to wait
[Chart: LSF hourly turnaround time of the normal queue]
We have something exciting to share
AREON
Arkansas Research and Education Optical Network
[Map: AREON sites at Fayetteville, Jonesboro, Russellville, Fort Smith, Conway, Little Rock, Pine Bluff, Arkadelphia, Monticello, and Magnolia, with external connections toward Tulsa, Memphis, Dallas, and Monroe; dated 25-July-2006]
AREON
Arkansas Research and Education Optical Network
• The first bond issue (last fall) failed
• Governor Huckabee of Arkansas granted $6.4M (PI Zimmerman)
• Fiber for the MBO loop between Tulsa and Fayetteville is in place; network hardware is being shipped
• The campus (last mile) connections are in progress
All is on target for a demo to the Governor on 12/5/06!
AREON
Arkansas Research and Education Optical Network
• This fall, Uark will have connectivity to Internet2 and the National Lambda Rail
• The bond issue is on the ballot again this coming fall
• If it passes then the other research institutions
will be connected to AREON
We hope this happens!
• The timeframe for this is about a year and a half
Opportunities for collaboration with
OneNet, LEARN, LONI, GPN, and others
LEARN Topology for State
[Map: LEARN topology for Texas, showing leased lambdas, LEARN sites, and metro interconnect cities (Dallas, Denton, Lubbock, Longview, Waco, El Paso, College Station, Austin, Houston, Beaumont, San Antonio, Galveston, Corpus Christi), with the NLR topology overlaid]
A Demonstration Application
• High Performance Computing: new course in Spring 2007
– In collaboration with LSU and Dr. Thomas Sterling
– We are exploring new methods of course delivery using streaming high-definition TV
– We expect about 40 students at five locations this time
– Taught live via Access Grid and HDTV over AREON and LONI, …
– A test run for future delivery of HPC education
Collaboration via GPN Grid
– Active middleware collaboration for almost 3 years
– GPNGrid is in the process of applying to become a new Virtual Organization in the Open Science Grid
– Sponsored by the University of Nebraska–Lincoln; includes participants from Arkansas, UNL, Missouri, KU, KSU, and OU
– A hardware grant from Sun and NSF provides 4 small Opteron clusters for the starting grid environment
– Applications are in the process of being defined
Collaboration via SURA Grid
• Uark has a 30-node Pentium cluster in SURAGrid
• Some differences with GPN
– CA is different
– Account management and discovery stacks are different
– AUP is different
• SURAGrid applications are increasing; Uark can run coastal modeling and is open to running other SURA applications
More Collaboration Mechanisms
• Arkansas is participating in the recently awarded CI-TEAM project at OU (PI Neeman)
– It will deploy Condor across Oklahoma and among participating collaborators
• LSF Multicluster provides another mechanism
for collaboration
• AREON will give the University of Arkansas
great bandwidth
UofA Current HPC Challenges
• We have some I/O infrastructure challenges
– The system was designed to have a large amount
of storage, but it is not fast
• Supercomputing operations
– AC, power, and UPS need to be upgraded
• Funding models for on-going operations
– How will basic systems administration and project direction be funded?
Collaboration and sharing bring a challenge
• Usage policies
– How do you partition usage fairly
among existing users?
– How do you incorporate usage from new faculty?
Current policy uses a fair-share scheduling policy:
Dynamic priority = (# shares) / (# slots × F1 + cpu_time × F2 + run_time × F3)
Shares are divided among the largest user groups: chem, phys, others
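As an illustration only, the fair-share priority formula above can be sketched in a few lines of Python; the factor values and the usage numbers for the example groups are hypothetical, not the cluster's actual settings:

```python
# Hypothetical sketch of the slide's fair-share dynamic-priority formula.
# F1, F2, F3 and all usage figures below are illustrative values only.
def dynamic_priority(shares, slots, cpu_time, run_time,
                     F1=1.0, F2=0.001, F3=0.001):
    """Priority rises with shares and falls with recent resource usage."""
    return shares / (slots * F1 + cpu_time * F2 + run_time * F3)

# Two groups with equal shares: the lighter user gets higher priority.
chem = dynamic_priority(shares=100, slots=64, cpu_time=50000, run_time=20000)
phys = dynamic_priority(shares=100, slots=8, cpu_time=1000, run_time=500)
```

With equal shares, the group that has consumed fewer slots and less CPU time is scheduled first, which is the intent of dividing shares among the chem, phys, and other groups.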
Collaboration and sharing bring a challenge
• Are max run times needed?
– Almost everyone has them
– Requires checkpointing of jobs, which is hard to do with our current I/O infrastructure
– Requires user education and a change of culture
• Are user allocations and accounting of usage
needed?
• Your suggestions here
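To make the checkpointing requirement concrete, here is a minimal sketch of application-level checkpointing under a max-run-time policy; the file path, state layout, and checkpoint interval are all hypothetical, and real HPC jobs would use far larger state and a parallel filesystem:

```python
# Minimal application-level checkpointing sketch (hypothetical names/paths).
# A job periodically saves its state so it can resume after being killed
# by a max-run-time limit. A fresh run starts from step 0.
import json, os, tempfile

CKPT = os.path.join(tempfile.gettempdir(), "job_state.json")

def save_checkpoint(state, path=CKPT):
    # Write to a temp file, then rename atomically, so a kill mid-write
    # never leaves a corrupt checkpoint behind.
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)

def load_checkpoint(path=CKPT):
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "total": 0}

state = load_checkpoint()            # resume if a checkpoint exists
for step in range(state["step"], 10):
    state["total"] += step           # stand-in for the real computation
    state["step"] = step + 1
    if state["step"] % 5 == 0:       # checkpoint every 5 steps
        save_checkpoint(state)
save_checkpoint(state)               # final checkpoint at job end
```

The I/O cost of writing state at every interval is exactly why a slow storage system makes this hard, as the slide notes.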
Questions?
Contact information:
http://hpc.uark.edu
http://comp.uark.edu/~aapon
[email protected]