Crays, Clusters, Centers and Grids
Gordon Bell ([email protected])
Bay Area Research Center, Microsoft Corporation
Copyright Gordon Bell, Clusters & Grids
Summary
• Sequential & data parallelism using shared-memory Fortran computers, 1960-90
• Search for parallelism to exploit micros, 1985-95
• Users adapted to clusters aka multicomputers via the lowest-common-denominator (LCD) programming model, MPI, >1995
• Beowulf standardized clusters of standard hardware and software, >1998
• "Do-it-yourself" Beowulfs impede new structures and threaten centers, >2000
• High-speed nets kicking in to enable the Grid
Outline
• Retracing scientific computing evolution: Cray, DARPA SCI & "killer micros"; clusters kick in
• Current taxonomy: cluster flavors
• Déjà vu, the rise of commodity computing: Beowulfs are a replay of VAXen c1980
• Centers
• Role of Grid and peer-to-peer
• Will commodities drive out new ideas?
DARPA Scalable Computing Initiative c1985-1995; ASCI
• Motivated by the Japanese 5th Generation project
• Realization that "killer micros" were …
• Custom VLSI and its potential
• Lots of ideas to build various high-performance computers
• Threat and potential sale to the military
[Photo] Steve Squires & G. Bell at our "Cray" at the start of DARPA's SCI.
Dead Supercomputer Society
ACRI
Alliant
American Supercomputer
Ametek
Applied Dynamics
Astronautics
BBN
CDC
Convex
Cray Computer
Cray Research
Culler-Harris
Culler Scientific
Cydrome
Dana/Ardent/Stellar/Stardent
Denelcor
Elexsi
ETA Systems
Evans and Sutherland Computer
Floating Point Systems
Galaxy YH-1
Goodyear Aerospace MPP
Gould NPL
Guiltech
Intel Scientific Computers
International Parallel Machines
Kendall Square Research
Key Computer Laboratories
MasPar
Meiko
Multiflow
Myrias
Numerix
Prisma
Tera
Thinking Machines
Saxpy
Scientific Computer Systems (SCS)
Soviet Supercomputers
Supertek
Supercomputer Systems
Suprenum
Vitesse Electronics
DARPA Results
• Many research and construction efforts … virtually all failed.
• DARPA-directed purchases screwed up the market, including the many VC-funded efforts.
• No software funding.
• Users responded to the massive power potential with LCD software.
• Clusters, clusters, clusters using MPI.
• It's not scalar vs. vector, it's memory bandwidth!
  – 6-10 scalar processors = 1 vector unit
  – 16-64 scalars = a 2-6 processor SMP
[Timeline chart, 1960-2000: The evolution of vector supercomputers. Labels include: CDC 1604, 6600, 7600, Star, 205, ETA 10; Cray Research vector and SMPvector (Cray 1, XMP, 2, YMP, C, T, SVs), MPPs (DEC/Compaq Alpha), SMP (Sparc), sold to SUN; SGI MIPS SMP & scalable SMP buy & sell Cray Research; Cray Inc.; Tera Computer (Multi-Thread Arch.), HEP@Denelcor, MTA1,2; Cray Computer (Cray 3, 4); SRC Company (Intel-based shared-memory multiprocessor), SRC1; Fujitsu vector (VP 100 …); Hitachi vector (Hitachi 810 …); NEC vector (SX1 … SX5); IBM vector (2938 vector processor, 3090 vector processing); other parallel (Illiac IV, TI ASC); Intel microprocessors (8008, 8086/8, 286, 386, 486, Pentium, Itanium).]
[Timeline chart, 1960-2000: The evolution of Cray Inc. Labels include: CDC 1604, 6600, 7600, Star, 205, ETA 10; Cray Research vector and SMPvector (Cray 1, XMP, 2, YMP, C, T, SVs), MPPs (DEC/Compaq Alpha), SMP (Sparc), sold to SUN; SGI MIPS SMP & scalable SMP buy & sell Cray Research; Cray Inc.; Tera Computer (Multi-Thread Arch.), HEP@Denelcor, MTA1,2; Cray Computer (Cray 3, 4); SRC Company (Intel-based shared-memory multiprocessor), SRC1.]
Top500 taxonomy… everything is a cluster aka multicomputer
• Clusters are the ONLY scalable structure
  – Cluster: n inter-connected computer nodes operating as one system. Nodes: uni- or SMP. Processor types: scalar or vector.
• MPP = miscellaneous, not massive (>1000), SIMD, or something we couldn't name
• Cluster types. Implied message passing.
  – Constellations = clusters of >=16 P, SMP
  – Commodity clusters of uni or <=4 P, SMP
  – DSM: NUMA (and COMA) SMPs and constellations
  – DMA clusters (direct memory access) vs. message passing
  – Uni- and SMPvector clusters: vector clusters and vector constellations
The Challenge leading to Beowulf
• NASA HPCC Program begun in 1992
• Comprised Computational Aero-Science and Earth and Space Science (ESS)
• Driven by need for post-processing data manipulation and visualization of large data sets
• Conventional techniques imposed long user response times and shared-resource contention
• Cost low enough for a dedicated single-user platform
• Requirement:
  – 1 Gflops peak, 10 Gbyte, < $50K
• Commercial systems: $1,000/Mflops, i.e. $1M/Gflops, so a 1 Gflops system would cost roughly 20x the $50K target
Linux - a web phenomenon
• Linus Torvalds - bored Finnish graduate student writes news reader for his PC, uses the Unix model
• Puts it on the internet for others to play with
• Others add to it, contributing to open-source software
• Beowulf adopts early Linux
• Beowulf adds Ethernet drivers for essentially all NICs
• Beowulf adds channel bonding to the kernel
• Red Hat distributes Linux with Beowulf software
• Low-level Beowulf cluster management tools added
Courtesy of Dr. Thomas Sterling, Caltech
The Virtuous Economic Cycle drives the PC industry… & Beowulf
[Cycle diagram: Standards → attracts users → creates apps, tools, training → attracts suppliers → greater availability @ lower cost → back to standards.]
BEOWULF-CLASS SYSTEMS
• Cluster of PCs
  – Intel x86
  – DEC Alpha
  – Mac Power PC
• Pure M2COTS
• Unix-like O/S with source
  – Linux, BSD, Solaris
• Message passing programming model (see the sketch below)
  – PVM, MPI, BSP, homebrew remedies
• Single-user environments
• Large science and engineering applications
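To make the message-passing model concrete, here is a minimal sketch of the kind of program these clusters run: an MPI ping-pong between two nodes. It is illustrative only; the compiler wrapper (mpicc) and launcher (mpirun) named in the comment are assumptions that vary by MPI distribution.

/* Minimal MPI ping-pong sketch: rank 0 sends a buffer to rank 1,
 * which sends it back.  Build/run names are assumptions, e.g.:
 *   mpicc pingpong.c -o pingpong && mpirun -np 2 ./pingpong     */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, buf[256] = {0};
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        MPI_Send(buf, 256, MPI_INT, 1, 0, MPI_COMM_WORLD);
        MPI_Recv(buf, 256, MPI_INT, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 0: round trip complete\n");
    } else if (rank == 1) {
        MPI_Recv(buf, 256, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Send(buf, 256, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}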
Interesting "cluster" in a cabinet
• 366 servers per 44U cabinet
  – Single processor
  – 2 - 30 GB/computer (24 TBytes)
  – 2 - 100 Mbps Ethernets
• ~10x perf*, power, disk, I/O per cabinet
• ~3x price/perf
• Network services… Linux based
* vs. 42, 2 processors, 84 Ethernet, 3 TBytes
Lessons from Beowulf
• An experiment in parallel computing systems
• Established vision: low-cost, high-end computing
• Demonstrated effectiveness of PC clusters for some (not all) classes of applications
• Provided networking software
• Provided cluster management tools
• Conveyed findings to the broad community
• Tutorials and the book
• Provided design standard to rally community!
• Standards beget: books, trained people, software … virtuous cycle that allowed apps to form
• Industry begins to form beyond a research project
Courtesy, Thomas Sterling, Caltech.
Direction and concerns
• Commodity clusters are evolving to be mainline supers
• The Beowulf do-it-yourself effect is like VAXen … clusters have taken a long time.
• Will they drive out or undermine centers? Or is computing so complex as to require a center to manage and support complexity?
• Centers:
  – Data warehouses
  – Community centers, e.g. weather
• Will they drive out a diversity of ideas? Assuming there are some?
Grids: Why now?
The virtuous cycle of bandwidth supply and demand
[Cycle diagram: Standards → create new service → increased demand → increase capacity (circuits & bw) → lower response time → back to standards. Successive services around the cycle: Telnet & FTP, EMAIL, WWW, Audio, Voice!, Video.]
[Map of Gray & Bell results: single-thread, single-stream TCP/IP, desktop-to-desktop, Win 2K out-of-the-box performance. Redmond/Seattle, WA to New York via 7 hops; Arlington, VA; San Francisco, CA; 5626 km, 10 hops.]
The Promise of SAN/VIA: 10x in 2 years (http://www.ViArch.org/)
• Yesterday:
  – 10 MBps (100 Mbps Ethernet)
  – ~20 MBps TCP/IP saturates 2 CPUs
  – round-trip latency ~250 µs
• Now:
  – Wires are 10x faster: Myrinet, Gbps Ethernet, ServerNet, …
  – Fast user-level communication
    - TCP/IP ~100 MBps at 10% CPU
    - round-trip latency is 15 µs
• 1.6 Gbps demoed on a WAN
[Bar chart: time (µs) to send 1 KB, broken into transmit, receiver CPU, and sender CPU, for 100 Mbps Ethernet, Gbps Ethernet, and SAN; y-axis 0-250 µs.]
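As a rough illustration of why both bandwidth and per-message overhead matter, the sketch below estimates the time to move 1 KB under the two regimes quoted on the slide. The overhead numbers are assumptions derived from the slide's round-trip latencies, not measurements.

/* Rough model: time = per-message overhead + bytes / bandwidth.
 * Overheads assumed: ~125 us one-way for 100 Mbps/TCP, ~7.5 us for SAN/VIA. */
#include <stdio.h>

static double send_time_us(double overhead_us, double bandwidth_MBps, double bytes)
{
    return overhead_us + bytes / (bandwidth_MBps * 1e6) * 1e6;  /* seconds -> us */
}

int main(void)
{
    double kb = 1024.0;
    printf("100 Mbps Ethernet + TCP/IP: ~%.0f us\n", send_time_us(125.0,  10.0, kb));
    printf("SAN/VIA user-level comm:    ~%.1f us\n", send_time_us(  7.5, 100.0, kb));
    return 0;
}

With these assumed numbers the model gives roughly 230 µs vs. 18 µs per 1 KB message, consistent with the ~10x promise on the slide.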
SNAP … c1995
Scalable Network And Platforms: A View of Computing in 2000+
We all missed the impact of the WWW!
Platform: Gordon Bell. Network: Jim Gray.
How Will Future Computers Be Built?
Thesis: SNAP - Scalable Networks and Platforms
• Upsize from desktop to world-scale computer
• based on a few standard components: Platform, Network
Because:
• Moore's law: exponential progress
• Standardization & commoditization
• Stratification and competition
When: Sooner than you think!
• Massive standardization gives massive use
• Economic forces are enormous
[Diagram: Computing SNAP built entirely from PCs. A wide-area global network and mobile nets connect wide & local area networks for terminals, PCs, workstations & servers; person servers (PCs); TC=TV+PC home (CATV or ATM or satellite); portables; legacy mainframes & minicomputers with their terminals become servers; centralized & departmental uni- & mP servers (UNIX & NT) become scalable computers built from PCs. A space, time (bandwidth), & generation scalable environment.]
SNAP Architecture
GB plumbing from the baroque: evolving from the 2 dance-hall model
(PMS notation: Mp = primary memory, Pc = central processor, S = switch, Ms = secondary/mass storage.)

Mp ---- S --- Pc
        |—— S.fiber ch. — Ms
        |—— S.Cluster
        |—— S.WAN —

vs.

MpPcMs — S.Lan/Cluster/Wan —
Grids: Why?
• The problem or community dictates a Grid
• Economics… thief or scavenger
• Research funding… that's where the problems are
The Grid… including P2P
• GRID was/is an exciting concept …
  – They can/must work within a community, organization, or project. What binds it?
  – "Necessity is the mother of invention."
• Taxonomy… interesting vs. necessity
  – Cycle scavenging and object evaluation (e.g. seti@home, QCD, factoring)
  – File distribution/sharing aka IP theft (e.g. Napster, Gnutella)
  – Databases &/or programs and experiments (astronomy, genome, NCAR, CERN)
  – Workbenches: web workflow chem, bio…
  – Single, large problem pipeline… e.g. NASA
  – Exchanges… many sites operating together
  – Transparent web access aka load balancing
  – Facilities: managed PCs operating as a cluster!
Some observations
• Clusters are purchased, managed, and used as a single, one-room facility.
• Clusters are the "new" computers. They present unique, interesting, and critical problems… then Grids can exploit them.
• Clusters & Grids have little to do with one another… Grids use clusters!
• Clusters should be a good simulation of tomorrow's Grid.
• Distributed PCs: Grids or Clusters?
• Perhaps some clusterable problems can be solved on a Grid… but it's unlikely.
  – Lack of understanding of clusters & variants
  – Socio-, political-, and economic issues wrt the Grid
Déjà vu
• ARPAnet: c1969
  – To use remote programs & data
  – Got FTP & mail. Machines & people overloaded.
• NREN: c1988
  – BW => faster FTP for images, data
  – Latency => got http://www…
  – Tomorrow => Gbit communication BW, latency
• <'90: mainframes, minis, PCs/WSs
• >'90: very large, departmental, & personal clusters
• VAX: c1979, one computer per scientist
• Beowulf: c1995, one cluster (∑ PCs) per scientist
• 1960s batch: opti-use, allocate, schedule, $
• 2000s GRID: opti-use, allocate, schedule, $ (… security, management, etc.)
The end
Modern scalable switches… also hide a supercomputer
• Scale from <1 to 120 Tbps
• 1 Gbps Ethernet switches scale to 10s of Gbps, scaling upward
• SP2 scales from 1.2 …
CMOS Technology Projections
• 2001
  – logic: 0.15 µm, 38 Mtr, 1.4 GHz
  – memory: 1.7 Gbits, 1.18 access
• 2005
  – logic: 0.10 µm, 250 Mtr, 2.0 GHz
  – memory: 17.2 Gbits, 1.45 access
• 2008
  – logic: 0.07 µm, 500 Mtr, 2.5 GHz
  – memory: 68.7 Gbits, 1.63 access
• 2011
  – logic: 0.05 µm, 1300 Mtr, 3.0 GHz
  – memory: 275 Gbits, 1.85 access
Future Technology Enablers
• SOCs: systems-on-a-chip
• GHz processor clock rates
• VLIW
• 64-bit processors
  – scientific/engineering application address spaces
• Gbit DRAMs
• Micro-disks on a board
• Optical fiber and wave-division-multiplexing communications (free space?)
The End
How can GRIDs become a non-ad-hoc computer structure?
Get yourself an application community!
[Chart: performance vs. price. Volume drives simple, standard platforms to low cost/price. Categories: stand-alone desktops; distributed workstations & PCs; clustered computers (1-4 processor mP) with high-speed interconnect; MPPs (1-20 processor mP).]
In 5-10 years we can/will have:
• More powerful personal computers
  – processing 10-100x; multiprocessors-on-a-chip
  – 4x-resolution (2K x 2K) displays to impact paper
  – large, wall-sized and watch-sized displays
  – low-cost storage of one terabyte for personal use
• Adequate networking? PCs now operate at 1 Gbps
  – ubiquitous access = today's fast LANs
  – competitive wireless networking
• One-chip, networked platforms, e.g. light bulbs, cameras
• Some well-defined platforms that compete with the PC for mind (time) and market share: watch, pocket, body implant, home (media, set-top)
• Inevitable, continued cyberization… the challenge… interfacing platforms and people.
Linus's & Stallman's Law: Linux everywhere (aka Torvalds stranglehold)
• Software is or should be free
• All source code is "open"
• Everyone is a tester
• Everything proceeds a lot faster when everyone works on one code
• Anyone can support and market the code for any price
• Zero-cost software attracts users!
• All the developers write code
ISTORE Hardware Vision
• System-on-a-chip enables computer + memory without significantly increasing the size of the disk
• 5-7 year target: MicroDrive, 1.7" x 1.4" x 0.2"
  – 1999: 340 MB, 5400 RPM, 5 MB/s, 15 ms seek
  – 2006: 9 GB, 50 MB/s? (1.6x/yr capacity, 1.4x/yr BW)
• Integrated IRAM processor, 2x height
  – 16 Mbytes; 1.6 Gflops; 6.4 Gops
  – connected via crossbar switch growing like Moore's law
• 10,000+ nodes in one rack! 100/board = 1 TB; 0.16 Tf
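The per-board figures follow from simple arithmetic on the slide's assumed 2006 MicroDrive and IRAM numbers; a quick, illustrative check:

/* Back-of-envelope check of the ISTORE board figures, using the
 * slide's assumed 2006 numbers (9 GB and 1.6 Gflops per node). */
#include <stdio.h>

int main(void)
{
    const int    nodes_per_board = 100;
    const double gb_per_node     = 9.0;   /* assumed 2006 MicroDrive capacity */
    const double gflops_per_node = 1.6;   /* assumed IRAM node performance    */

    printf("capacity/board: ~%.1f TB\n",     nodes_per_board * gb_per_node / 1000.0);
    printf("compute/board:  ~%.2f Tflops\n", nodes_per_board * gflops_per_node / 1000.0);
    return 0;
}

This prints roughly 0.9 TB and 0.16 Tflops per 100-node board, matching the "1 TB; 0.16 Tf" on the slide.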
The Disk Farm? or a System On a Card?
• The 14" card: the 500 GB disc card, an array of discs
• Can be used as
  – 100 discs
  – 1 striped disc
  – 50 FT discs
  – … etc.
• LOTS of accesses/second and of bandwidth
• A few disks are replaced by 10s of Gbytes of RAM and a processor to run apps!!