THsort PennySort Award Ceremony, Beijing, China, 19 October 2002. Peng Liu, Yao Shi, Li Zhang, Kuo Zhang, Tian Wang, ZunChong Tian, Hao Wang, Xiaoge Wang. Trophy presentation by Jim Gray.

THsort
PennySort
Award Ceremony
Beijing China
19 October 2002
Peng Liu,
Yao Shi,
Li Zhang,
Kuo Zhang,
Tian Wang,
ZunChong Tian,
Hao Wang,
Xiaoge Wang
Trophy presentation by Jim Gray
1
Outline
• Penny Sort history and Award
• The need for long-range research
• Some long-range systems research goals.
• What I have been doing.
2
Benchmark History
[Figure: timeline of benchmarks by decade, 1970 to 2000. Labels shown: IBM TP 1-7 (CA and Tony Lukes); Debit Credit (Gray); Wisconsin (Bitton, Boral, DeWitt, Turbyfill); Sort; Datamation (Anon et al); TPC-A; TPC-B; TPC-C; TPC-D; TPC-W ?; MCC (Boral &...); Teradata (Bollinger &...); PennySort; MinuteSort.]
3
A Short History of Sort
• April Fools 1985: Datamation Sort
– Sort 1M 100 B records
– An IO benchmark: 15-min to 1 hr!
• 1993: {Minute | Penny}x{Daytona | Indy}
• 1998: TeraByte Sort
• Web site:
http://research.Microsoft.com/barc/SortBenchmark/
4
Ground Rules
• How much can you sort for a penny (in a minute).
– Hardware and Software cost
– Depreciated over 3 years
– 1M$ system gets about 1 second,
– 1K$ system gets about 1,000 seconds.
– Time (seconds) = 946,080 / SystemPrice ($)
• Input and output are disk resident
• Input is
– 100-byte records (random data)
– key is first 10 bytes.
• Must create output file
and fill with sorted version of input file.
• Daytona (product) and Indy (special) categories
5
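The one-penny time budget can be worked out directly from the depreciation rule. The following is a minimal sketch, assuming a 365-day year (which is what yields the constant 946,080); the function and constant names are illustrative and do not come from the talk.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 (365-day year, an assumption)
DEPRECIATION_YEARS = 3                 # from the ground rules
PENNIES_PER_DOLLAR = 100


def penny_time_budget(system_price_dollars):
    """Seconds of machine time that one penny buys on a system of this price."""
    if system_price_dollars <= 0:
        raise ValueError("system price must be positive")
    lifetime_seconds = DEPRECIATION_YEARS * SECONDS_PER_YEAR  # 94,608,000
    # One penny is 1/100 of a dollar, so it buys 1/(100 * price) of the lifetime.
    return lifetime_seconds / (PENNIES_PER_DOLLAR * system_price_dollars)


if __name__ == "__main__":
    for price in (1_000_000, 1_000):
        print(price, round(penny_time_budget(price), 3))
```

A 1M$ system gets a little under one second and a 1K$ system a little under 1,000 seconds, which matches the "about 1 second" and "about 1,000 seconds" figures on the slide.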
PennySort
• Hardware
– 266 MHz Intel PPro
– 64 MB SDRAM (10ns)
– Dual Fujitsu DMA 3.2GB EIDE disks
• Software
– NT workstation 4.3
– NT 5 sort
• Performance
– sort 15 M 100-byte records (~1.5 GB)
– Disk to disk
– elapsed time 820 sec
• cpu time = 404 sec
[Pie chart: PennySort Machine (1107$) cost breakdown: cpu 32%; Disk 25%; board 13%; Memory 8%; Other 22% (Network, Video, floppy 9%; Cabinet + Assembly 7%; Software 6%).]
6
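To make the disk-to-disk task concrete, here is a minimal sketch of sorting a file of 100-byte records on the first 10 bytes of each record. It is a one-pass, in-memory illustration only, not the NT sort used for the result above; all function names and file names are hypothetical.

```python
import os
import tempfile

RECORD_SIZE = 100  # bytes per record
KEY_SIZE = 10      # the key is the first 10 bytes of each record


def write_random_input(path, n_records):
    """Create an input file of n_records random 100-byte records."""
    with open(path, "wb") as f:
        for _ in range(n_records):
            f.write(os.urandom(RECORD_SIZE))


def read_records(path):
    """Read a file and split it into fixed-size records."""
    with open(path, "rb") as f:
        data = f.read()
    if len(data) % RECORD_SIZE != 0:
        raise ValueError("file size is not a multiple of the record size")
    return [data[i:i + RECORD_SIZE] for i in range(0, len(data), RECORD_SIZE)]


def sort_file(in_path, out_path):
    """Read every record, sort by key, and write a new output file."""
    records = read_records(in_path)
    records.sort(key=lambda r: r[:KEY_SIZE])
    with open(out_path, "wb") as f:
        f.writelines(records)


def is_sorted(path):
    """Check that the keys in the file are in non-decreasing order."""
    keys = [r[:KEY_SIZE] for r in read_records(path)]
    return all(a <= b for a, b in zip(keys, keys[1:]))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "input.dat")
        dst = os.path.join(tmp, "output.dat")
        write_random_input(src, 10_000)
        sort_file(src, dst)
        print(is_sorted(dst), os.path.getsize(dst))
```

A real entry has to handle inputs larger than memory, which is why the larger sorts in this talk are described in terms of passes over disk.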
1999 PennySort
• Daytona & Indy:
2.58 GB in 917 sec
• HMsort:
Brad Helmkamp,
Keith McCready,
Stenograph LLC
• Intel 400Mhz
2 IDE disks
7
1998 TB Sort
• Chris Nyberg
Nsort
SGI 32x Origin2000
151 Minutes
8
1999 Terabyte Sort
• Daytona:
David Cossock, Sam Fineberg,
Pankaj Mehra, John Peck
Tandem/Sandia TSort:
68 CPU ServerNet
47 minutes
• Indy:
IBM SPsort
488 nodes, 1952 cpu, 2168 disks
17.6 minutes = 1057 sec
(all for 1/3 of 94M$; slice price is 64k$ for 4 cpu, 2GB ram, 6 9GB disks + interconnect)
9
SP sort
• 2 – 4 GBps!
[Figure: I/O rate in GB/s (0.0 to 4.0) versus elapsed time in seconds (0 to 900), with four series: GPFS read, GPFS write, Local read, Local write.]
488 nodes, 55 racks
1952 processors, 732 GB RAM, 2168 disks
– Storage: 56 nodes, 18 racks
– Compute: 432 nodes, 37 racks
56 storage nodes manage 1680 4GB disks
336 4+P twin tail RAID5 arrays (30/node)
Compute rack:
16 nodes, each has
4x332Mhz PowerPC604e
1.5 GB RAM
1 32x33 PCI bus
9 GB scsi disk
150MBps full duplex SP switch
Storage rack:
8 nodes, each has
4x332Mhz PowerPC604e
1.5 GB RAM
3 32x33 PCI bus
30x4 GB scsi disk (4+1 RAID5)
150MBps full duplex SP switch
10
2002 Sort Records
• Penny, Daytona: 9.8 GB in 1098 seconds
– 105 million records on a $857 Linux/Intel system
– THsort, report as doc (128KB) or pdf (33KB)
– Peng Liu, Yao Shi, Li Zhang, Kuo Zhang, Tian Wang, ZunChong Tian, Hao Wang, Xiaoge Wang
– High Performance Institute, Dept. of Computer Science and Technology, Tsinghua University, Beijing 100084, China
• Penny, Indy: 11.6 GB in 1380 seconds
– 125 million records on a $672 Linux/Intel system
– DMsort pdf (660KB), ps (950KB)
– Aaron Darling, Alex Mohr, U. Wisconsin, Madison
• Minute, Daytona: 12 GB in 60 seconds
– Ordinal Nsort
– SGI 32 cpu Origin IRIX
• Minute, Indy: 21.8 GB in 56.51 sec
– 218 million records
– NOW+HPVMsort 64 nodes WinNT pdf.
– Luis Rivera, Andrew Chien UCSD
• TeraByte, Daytona: 49 minutes
– David Cossock, Sam Fineberg, Pankaj Mehra, John Peck
– 68x2 Compaq & Sandia Labs
• TeraByte, Indy: 1057 seconds
– SPsort 1952 SP cluster 2168 disks
– Jim Wyllie
– PDF SPsort.pdf (80KB)
11
The THsort Team
(and friend)
12
2x/year!
• Partly hardware
• Partly software
• Partly economics
[Figure: two log-scale charts over 1985 to 2000: "Records Sorted per Second Doubles Every Year" and "GB Sorted per Dollar Doubles Every Year", with THsort marked at ~1TB/$.]
13
Progress on Sorting
• Speedup comes from Moore’s law 40%/year
• Processor/Disk/Network arrays: 60%/year (this is a software speedup).
[Figure: "Sort Records/second vs Time", log scale, 1985 to 2000: Records Sorted per Second Doubles Every Year. Labeled systems: Bitton M68000, Tandem, Kitsuregawa Hardware Sorter, Sequent, Intel HyperCube, Cray YMP, IBM 3090, IBM RS6000, Alpha, NOW, Ordinal+SGI, NT/PennySort, Compaq/NT, Sandia/Compaq/NT, Penny NT sort, SPsort/ IB, SPsort.]
[Figure: GB Sorted per Dollar Doubles Every Year, log scale, 1985 to 2000, with THsort marked at ~1TB/$.]
14
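The two annual rates quoted for this slide compound to roughly a doubling per year. The sketch below assumes the 40% and 60% figures are independent multiplicative annual improvements, which is one reading of the slide rather than something it states; the function names are illustrative.

```python
import math


def combined_growth(annual_rates):
    """Multiply independent annual improvement rates into one yearly factor."""
    factor = 1.0
    for rate in annual_rates:
        factor *= 1.0 + rate
    return factor


def doubling_time_years(yearly_factor):
    """Years needed to double at a constant yearly growth factor."""
    return math.log(2) / math.log(yearly_factor)


if __name__ == "__main__":
    factor = combined_growth([0.40, 0.60])  # Moore's law, then arrays
    print(round(factor, 2), round(doubling_time_years(factor), 2))
```

Under that assumption the combined factor is 1.4 x 1.6 = 2.24 per year, a doubling time of a bit under one year.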
Musings: PennySort=TBsort
• Sorts 1TB in 1Minute
• 2 pass so 3TB of disk
• = 10 disks if 330GB/disk
• = 5GBps (if each disk is 50Mbps)
• So, 600 seconds (3TB/5GBps)
• So, node costs 1.5k$
• Costs 100x that today
• maybe in 4 years?
15
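Part of the arithmetic on this slide can be traced step by step. The sketch below takes the slide's 3TB and 5GBps figures as given (it does not derive the bandwidth from a per-disk rate) and assumes the penny budget of 946,080 seconds per dollar of system price, that is, 1% of three 365-day years. Names are illustrative.

```python
import math

TB = 10**12
GB = 10**9
PENNY_SECONDS_PER_DOLLAR = 946_080  # 1% of three 365-day years, in seconds


def disks_needed(total_bytes, bytes_per_disk):
    """Whole disks required to hold total_bytes."""
    return math.ceil(total_bytes / bytes_per_disk)


def elapsed_seconds(total_bytes, bytes_per_second):
    """Time to move total_bytes at a given aggregate bandwidth."""
    return total_bytes / bytes_per_second


def max_price_for_budget(seconds):
    """Highest system price (dollars) whose one-penny budget covers `seconds`."""
    return PENNY_SECONDS_PER_DOLLAR / seconds


if __name__ == "__main__":
    disks = disks_needed(3 * TB, 330 * GB)
    seconds = elapsed_seconds(3 * TB, 5 * GB)  # 5 GBps taken from the slide
    print(disks, seconds, round(max_price_for_budget(seconds), 1))
```

This gives 10 disks, 600 seconds, and a price ceiling of roughly 1.5k$, in line with the slide's figures.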
Outline
• Penny Sort history and Award
• The need for long-range research
• Some long-range systems research goals.
• What I have been doing.
16
Properties of a Research Goal
• Simple to state.
• Not obvious how to do it.
• Clear benefit.
• Can be broken into smaller steps
– So that you can see intermediate progress.
• Progress and solution is testable.
17
I was motivated by a simple goal
1. Devise an architecture that scales up:
Grow the system without limits*.
* This is impossible (without limits?), but... scaleup: 1,000,000 : 1
This meant
automatic parallelism,
automatic management,
distributed,
fault tolerant,
high performance
• Benefits:
– long term vision guides research problems
– simple to state, so attracts colleagues and support
– Can tell your friends & family what it is that you do.
18
Three Seminal Papers
• Babbage: Computers
• Bush: Automatic Information storage & access
• Turing: Intelligent Machines
• Note:
– Previous Turing lectures described several “theory” problems.
– Problems here are “systems” problems.
– Some include an “and prove it” clause.
– They are enabling technologies, not applications.
– Newell’s: Intelligent Universe (Ubiquitous computing) is missing because I could not find “simple-to-state” problems.
19
Charles Babbage (1791-1871)
• Babbage’s computing goals have been realized
– But we still need better algorithms & faster machines
• What happens when
– Computers are free and infinitely powerful?
– Bandwidth and storage is free and infinite?
• Remaining limits:
– Content: the core asset of cyberspace
– Software: Bugs, >100$ per line of code (!)
– Operations: > 1,000 $/node/year
20
ops/s/$ Had Three Growth Curves
1890-1990
• 1890-1945: Mechanical, Relay; 7-year doubling
• 1945-1985: Tube, transistor,..; 2.3 year doubling
• 1985-2000: Microprocessor; 1.0 year doubling
Combination of Hans Moravec + Larry Roberts + Gordon Bell
WordSize*ops/s/sysprice
[Figure: ops per second/$ on a log scale (1.E-06 to 1.E+09) versus year (1880 to 2000); doubles every 7.5 years, then every 2.3 years, then every 1.0 years.]
21
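A doubling time translates into a total improvement over an era as 2 raised to (years / doubling time). The sketch below applies that to the three eras; it uses the 7.5-year figure from the chart label for the first era, which is a choice made here, and the era lengths are simply the differences between the years shown.

```python
def growth_factor(years, doubling_time_years):
    """Total improvement over `years` at a constant doubling time."""
    return 2 ** (years / doubling_time_years)


# (label, length in years, doubling time in years)
ERAS = [
    ("1890-1945 mechanical/relay", 55, 7.5),
    ("1945-1985 tube/transistor", 40, 2.3),
    ("1985-2000 microprocessor", 15, 1.0),
]

if __name__ == "__main__":
    for label, years, doubling in ERAS:
        print(label, f"{growth_factor(years, doubling):.3g}")
```

On these assumptions the 15 microprocessor years contribute a factor of 2^15 = 32,768, far more than the 55 mechanical and relay years.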
Trouble-Free Appliances
• Appliance just works. TV, PDA, desktop, ...
• State replicated in safe place (somewhere else)
• If hardware fails, or is lost or stolen,
replacement arrives next day (plug&play).
• If software faults,
software and state refresh from server.
• If you buy a new appliance, it plugs in and refreshes
from the server (as though the old one failed)
• Most vendors are building towards this vision.
• Browsers come close to working this way.
22
Trouble-Free Systems
• Manager
– Sets goals
– Sets policy
– Sets budget
– System does the rest.
• Everyone is a CIO (Chief Information Officer)
9. Build a system
– used by millions of people each day
– Administered and managed by a ½ time person.
• On hardware fault, order replacement part
• On overload, order additional equipment
• Upgrade hardware and software automatically.
23
Trustworthy Systems
• Build a system used by millions of people that
10. Only services authorized users
• Service cannot be denied (can’t destroy data or power).
• Information cannot be stolen.
11. Is always available: (out less than 1 second per 100 years = 8 9’s of availability)
• 1950’s: 90% availability,
• Today: 99% uptime for web sites, 99.99% for well managed sites (50 minutes/year)
– 3 extra 9s in 45 years.
• Goal: 5 more 9s: 1 second per century.
And prove it.
24
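The availability figures can be converted between "nines" and downtime with a few lines of arithmetic. This is a minimal sketch assuming a 365.25-day year; the function names are illustrative.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # 31,557,600 (an assumption)


def downtime_seconds_per_year(availability):
    """Expected seconds of outage per year at a given availability (0 to 1)."""
    return (1.0 - availability) * SECONDS_PER_YEAR


def nines(unavailability):
    """Number of nines implied by an unavailability fraction (0.0001 -> 4)."""
    return -math.log10(unavailability)


if __name__ == "__main__":
    for availability in (0.90, 0.99, 0.9999):
        minutes = downtime_seconds_per_year(availability) / 60
        print(availability, round(minutes, 1), "minutes/year")
    one_second_per_century = 1.0 / (100 * SECONDS_PER_YEAR)
    print(round(nines(one_second_per_century), 1), "nines")
```

On this arithmetic 99.99% is a little over 50 minutes of outage per year, and one second per century works out to somewhat more than nine nines, so the slide's counts of nines are best read as round numbers.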
100 $ line of code? 1 bug per thousand lines?
• 20 $ to design and write it.
• 30 $ to test and document it.
• 50 $ to maintain it.
100$ total
• The only thing in Cyber Space that is getting MORE expensive & LESS reliable
Solution so far:
• Write fewer lines: High level languages
• Non Procedural
• 10x not 1,000x better
Very domain specific
• Application generators: Web sites, Databases, ...
• Semi-custom apps: SAP, PeopleSoft,..
• Scripting & Objects: JavaScript & DOM
25
Automatic Programming
Do What I Mean (not 100$ Line of code!, no programming bugs)
The holy grail of programming languages & systems
12. Devise a specification language or UI
1. That is easy for people to express designs (1,000x easier),
2. That computers can compile, and
3. That can describe all applications (is complete).
• System should “reason” about application
– Ask about exception cases.
– Ask about incomplete specification.
– But not be onerous.
• This already exists in domain-specific areas. (i.e. 2 out of 3 already exists)
• An imitation game for a programming staff.
26
Outline
• Penny Sort history and Award
• The need for long-range research
• Some long-range systems research goals.
• What I have been doing.
27
What I Have Been Doing
• Traveling & Talking
• Helping Alex Build the SkyServer
• Loading data
• Helping build the Virtual Observatory
• Doing spatial geometry in SQL (no kidding)!
• Learning about web services (and implementing some)
28