Rules of Thumb in Data Engineering Jim Gray International Conference on Data Engineering San Diego, CA 4 March 2000 [email protected], http://research.Microsoft.com/~Gray/Talks/

Credits & Thank You!!
Prashant Shenoy, U. Mass Amherst: analysis of the web caching rules. [email protected]
Terence Kelly, U. Michigan: lots of advice on fixing the paper ([email protected]); interesting work on caching at http://ai.eecs.umich.edu/~tpkelly/papers/wcp.pdf
Dave Lomet, Paul Larson, Surajit Chaudhuri: how big should database pages be?
Remzi Arpaci-Dusseau, Kim Keeton, Erik Riedel: discussions about balanced systems and IO
Windsor Hsu, Alan Smith, & Honesty Young: also studied TPC-C and balanced systems (very nice work!) http://golem.cs.berkeley.edu/~windsorh/DBChar/
Anastassia Ailamaki, Kim Keeton: CPI measurements
Gordon Bell: discussions on balanced systems.
…and an Apology
The printed/published paper has MANY bugs!
Conclusions OK (sort of), but typos, flaws, errors,…
Revised version at http://research.microsoft.com/~Gray/ and in CoRR and the MS Research tech report archive, by 15 March 2000.
Sorry!
Outline
Moore’s Law and consequences
Storage rules of thumb
Balanced systems rules revisited
Networking rules of thumb
Caching rules of thumb
Trends: Moore's Law
Performance/price doubles every 18 months: 100x per decade.
Progress in the next 18 months = ALL previous progress:
  New storage = sum of all old storage (ever)
  New processing = sum of all old processing
(E. coli doubles every 20 minutes!)
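The arithmetic behind these claims can be checked directly; a minimal sketch, with the 18-month doubling period as the only input:

```python
# Moore's law: performance/price doubles every 18 months.
DOUBLING_MONTHS = 18

def growth_factor(months):
    """Improvement factor after the given number of months."""
    return 2 ** (months / DOUBLING_MONTHS)

decade = growth_factor(120)   # 10 years = 120 months
print(round(decade))          # ~100x per decade (102, exactly)

# "Progress in the next 18 months = ALL previous progress":
# capacity doubles, so the new half equals everything shipped before.
```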
Trends: ops/s/$ Had Three Growth Phases
1890-1945: mechanical, relay; 7-year doubling
1945-1985: tube, transistor,…; 2.3-year doubling
1985-2000: microprocessor; 1.0-year doubling
[Figure: ops per second/$ vs time, 1880-2000 (log scale, 1.E-06 to 1.E+09): doubles every 7.5 years, then every 2.3 years, then every 1.0 years.]
Trends: Gilder's Law:
3x bandwidth/year for 25 more years
Today:
  10 Gbps per channel
  4 channels per fiber: 40 Gbps
  32 fibers/bundle = 1.2 Tbps/bundle
In lab: 3 Tbps/fiber (400 x WDM)
In theory: 25 Tbps per fiber
1 Tbps = USA 1996 WAN bisection bandwidth
Aggregate bandwidth doubles every 8 months!
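The per-bundle numbers above are simple multiplication; a quick sketch using the slide's figures:

```python
channel_gbps = 10          # 10 Gbps per WDM channel
channels_per_fiber = 4     # 4-way WDM
fibers_per_bundle = 32

fiber_gbps = channel_gbps * channels_per_fiber        # 40 Gbps per fiber
bundle_tbps = fiber_gbps * fibers_per_bundle / 1000   # 1.28 Tbps per bundle
print(fiber_gbps, bundle_tbps)   # the slide rounds 1.28 to 1.2 Tbps
```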
Trends: Magnetic Storage Densities
Amazing progress, but the ratios have changed:
  Capacity grows 60%/year
  Access speed grows 10x more slowly
[Figure: Magnetic Disk Parameters vs Time, 1984-2004 (log scale): tpi, kbpi, MBps, Gbpsi.]
Trends: Density Limits
Bit density: the end is near!
  Products: 11 Gbpsi
  Lab: 35 Gbpsi
  "Limit": 60 Gbpsi (the superparamagnetic limit)
But the limit keeps rising, and there are alternatives:
  NEMS? Fluorescent? Holographic? DNA?
[Figure: Density vs Time (b/µm² & Gb/in²), 1990-2008: CD, DVD, ODD against the wavelength limit; magnetic recording against the superparamagnetic limit. Figure adapted from Franco Vitaliano, "The NEW new media: the growing attraction of nonmagnetic storage", Data Storage, Feb 2000, pp 21-32, www.datastorage.com]
Trends: promises
NEMS (Nano Electro Mechanical Systems)
(http://www.nanochip.com/), also Cornell, IBM, CMU,…
  250 Gbpsi by using a tunneling electron microscope
  Disk replacement
  Capacity: 180 GB now, 1.4 TB in 2 years
  Transfer rate: 100 MB/sec read & write
  Latency: 0.5 msec
  Power: 23 W active, 0.05 W standby
  10k$/TB now, 2k$/TB in 2002
Consequence of Moore's law:
Need an address bit every 18 months.
Moore's law gives you 2x more in 18 months.
RAM:
  Today we have 10 MB to 100 GB machines (24-36 bits of addressing);
  in 9 years we will need 6 more bits: 30-42 bit addressing (4 TB RAM).
Disks:
  Today we have 10 GB to 100 TB file systems/DBs (33-47 bit file addresses);
  in 9 years, we will need 6 more bits: 40-53 bit file addresses (100 PB files).
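The addressing arithmetic is just log₂ of capacity; a sketch using the slide's example sizes:

```python
import math

def address_bits(nbytes):
    """Bits needed to address nbytes distinct bytes."""
    return math.ceil(math.log2(nbytes))

GB = 2 ** 30
print(address_bits(100 * GB))   # ~37 bits for a 100 GB machine today

# Moore's law doubles capacity every 18 months, i.e. one more address
# bit per 18 months, so 9 years of growth needs 6 more bits.
print(9 * 12 // 18)             # 6
```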
Architecture could change this
1-level store:
  System/38, AS400 has a 1-level store.
  Never re-uses an address.
  Needs 96-bit addressing today.
NUMAs and clusters:
  Willing to buy a 100 M$ computer?
  Then add 6 more address bits.
Only the 1-level store pushes us beyond 64 bits.
Still, these are "logical" addresses; 64-bit physical will last many years.
Outline
Moore’s Law and consequences
Storage rules of thumb
Balanced systems rules revisited
Networking rules of thumb
Caching rules of thumb
Storage Latency: How Far Away is the Data?
(access time in clock ticks, with a travel-time analogy)
  Registers             1       My Head
  On Chip Cache         2       This Room     1 min
  On Board Cache        10      This Hotel    10 min
  Memory                100     Olympia       1.5 hr
  Disk                  10^6    Pluto         2 Years
  Tape/Optical Robot    10^9    Andromeda     2,000 Years
Storage Hierarchy: Speed & Capacity vs Cost Tradeoffs
[Figure: Size vs Speed: typical system capacity (bytes, up to 10^15) vs access time (10^-9 to 10^3 seconds) for cache, main memory, secondary (disc), online tape, nearline tape, offline tape.]
[Figure: Price vs Speed: $/MB vs access time (10^-9 to 10^3 seconds) for the same hierarchy.]
Disks: Today
  Disk is 8 GB to 80 GB
  10-30 MBps
  5k-15k rpm (6ms-2ms rotational latency)
  12ms-7ms seek
  7k$/IDE-TB, 20k$/SCSI-TB
For shared disks, most time is spent waiting in queue for access to the arm/controller.
[Figure: service time = wait + seek + rotate + transfer.]
Standard Storage Metrics
Capacity:
  RAM:  MB and $/MB: today at 512 MB and 3$/MB
  Disk: GB and $/GB: today at 40 GB and 20$/GB
  Tape: TB and $/TB: today at 40 GB and 10k$/TB (nearline)
Access time (latency):
  RAM:  100 ns
  Disk: 15 ms
  Tape: 30 second pick, 30 second position
Transfer rate:
  RAM:  1-10 GB/s
  Disk: 20-30 MB/s (arrays can go to 10 GB/s)
  Tape: 5-15 MB/s (arrays can go to 1 GB/s)
New Storage Metrics: Kaps, Maps, SCAN
Kaps: how many kilobyte objects served per second
  The file server, transaction processing metric
  This is the OLD metric.
Maps: how many megabyte objects served per second
  The multi-media metric
SCAN: how long to scan all the data
  The data mining and utility metric
And the cost-normalized versions: Kaps/$, Maps/$, TBscan/$
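These metrics are easy to compute for any drive; a sketch for a hypothetical year-2000 disk (the ~8 ms random access time and 20 MB/s sequential rate are assumed illustrative values, not a spec sheet):

```python
access_time = 1 / 120      # ~8.3 ms per random access (seek + rotate), assumed
seq_mb_per_s = 20          # sequential transfer rate, assumed
capacity_mb = 40 * 1024    # a 40 GB drive

# Kaps: 1 KB objects are access-time bound.
kaps = 1 / (access_time + 0.001 / seq_mb_per_s)     # ~119/s
# Maps: 1 MB objects pay one access plus 50 ms of transfer.
maps = 1 / (access_time + 1.0 / seq_mb_per_s)       # ~17/s
# SCAN: one arm reading sequentially end to end.
scan_minutes = capacity_mb / seq_mb_per_s / 60      # ~34 min
print(round(kaps), round(maps), round(scan_minutes))
```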
Storage Ratios Changed
  10x better access time
  10x more bandwidth
  100x more capacity
  Data 25x cooler (1 Kaps/20 MB vs 1 Kaps/500 MB)
  4,000x lower media price
  20x to 100x lower disk price
  Scan takes 10x longer (3 min vs 45 min)
DRAM/disk media price ratio changed:
  1970-1990: 100:1
  1990-1995: 10:1
  1995-1997: 50:1
  today:     100:1 (~0.03$/MB disk vs 3$/MB DRAM)
[Figure: Storage Price vs Time: megabytes per kilo-dollar, 1980-2000.]
[Figure: Disk Performance vs Time, 1980-2000: capacity (GB), seeks per second, bandwidth (MB/s).]
[Figure: Disk accesses/second vs Time.]
Data on Disk Can Move to RAM in 10 years
[Figure: Storage Price vs Time (MB/k$, 1980-2000): the 100:1 DRAM:disk price gap corresponds to 10 years of price decline.]
More Kaps and Kaps/$ but….
Disk accesses got much less expensive:
  better disks, cheaper disks!
But: disk arms are expensive, the scarce resource:
  45 minute scan today vs 5 minutes in 1990.
[Figure: Kaps/disk and Kaps/$ over time, 1970-2000 (log scale, 1.E+0 to 1.E+6); today: 100 GB, 30 MB/s.]
Disk vs Tape
Disk:
  40 GB
  20 MBps
  5 ms seek time
  3 ms rotate latency
  7$/GB for drive, 3$/GB for ctlrs/cabinet
  4 TB/rack
  1 hour scan
Tape:
  40 GB
  10 MBps
  10 sec pick time
  30-120 second seek time
  2$/GB for media, 8$/GB for drive+library
  10 TB/rack
  1 week scan
Guesstimates: CERN: 200 TB on 3480 tapes; 2 col = 50 GB; rack = 1 TB = 20 drives.
The price advantage of tape is narrowing, and the performance advantage of disk is growing.
At 10k$/TB, disk is competitive with nearline tape.
It's Hard to Archive a Petabyte
It takes a LONG time to restore it: at 1 GBps it takes 12 days!
Store it in two (or more) places online (on disk?): a geo-plex.
Scrub it continuously (look for errors).
On failure:
  use the other copy until the failure is repaired,
  refresh the lost copy from the safe copy.
Can organize the two copies differently (e.g.: one by time, one by space).
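The 12-day figure is worth checking; a one-line sketch:

```python
petabyte = 10 ** 15        # bytes
rate = 10 ** 9             # restore at 1 GBps
days = petabyte / rate / 86_400
print(round(days, 1))      # 11.6 days -- the slide rounds to 12
```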
The "Absurd" 10x (= 5 year) Disk
  1 TB, 100 MB/s, 200 Kaps
  2.5 hr scan time (poor sequential access)
  1 aps / 5 GB (VERY cold data)
It's a tape!
How to cool disk data:
Cache data in main memory
  See the 5 minute rule later in this presentation.
Fewer, larger transfers
  Larger pages (512 B -> 8 KB -> 256 KB)
Sequential rather than random access
  Random 8 KB IO is 1.5 MBps
  Sequential IO is 30 MBps (and the 20:1 ratio is growing)
RAID1 (mirroring) rather than RAID5 (parity).
Stripes, Mirrors, Parity (RAID 0, 1, 5)
RAID 0: Stripes
  bandwidth
RAID 1: Mirrors, Shadows,…
  Fault tolerance
  Reads faster, writes 2x slower
RAID 5: Parity
  Fault tolerance
  Reads faster
  Writes 4x or 6x slower
[Figure: block layouts: striping (0,3,6,.. | 1,4,7,.. | 2,5,8,..), mirroring (0,1,2,.. duplicated), rotated parity (0,2,P2,.. | 1,P1,4,.. | P0,3,5,..).]
RAID 10 (stripes of mirrors) Wins
"wastes space, saves arms"
RAID 5 (6 disks, 1 volume):
  Performance: 675 reads/sec, 210 writes/sec
  Write = 4 logical IOs: 2 seeks + 1.7 rotates
  SAVES SPACE; performance degrades on failure
RAID 1 (6 disks, 3 pairs):
  Performance: 750 reads/sec, 300 writes/sec
  Write = 2 logical IOs: 2 seeks + 0.7 rotate
  SAVES ARMS; performance improves on failure
Auto Manage Storage
1980 rule of thumb:
  A DataAdmin per 10 GB, a SysAdmin per MIPS.
2000 rule of thumb:
  A DataAdmin per 5 TB;
  a SysAdmin per 100 clones (varies with app).
Problem:
  5 TB is 60k$ today, 10k$ in a few years.
  Admin cost >> storage cost!!!!
Challenge:
  Automate ALL storage admin tasks.
Summarizing storage rules of thumb (1)
  Moore's law: 4x every 3 years, 100x more per decade;
  implies 2 bits of addressing every 3 years.
  Storage capacities increase 100x/decade.
  Storage costs drop 100x per decade.
  Storage throughput increases 10x/decade.
  Data cools 10x/decade.
  Disk page sizes increase 5x per decade.
Summarizing storage rules of thumb (2)
  RAM:disk and disk:tape cost ratios are 100:1 and 3:1.
  So, in 10 years, disk data can move to RAM, since prices decline 100x per decade.
  A person can administer a million dollars of disk storage: that is 1 TB - 100 TB today.
  Disks are replacing tapes as backup devices.
  You can't backup/restore a petabyte quickly, so geoplex it.
  Mirroring rather than parity, to save disk arms.
Outline
Moore’s Law and consequences
Storage rules of thumb
Balanced systems rules revisited
Networking rules of thumb
Caching rules of thumb
Standard Architecture (today)
[Figure: CPUs and memory on a system bus, bridging to PCI Bus 1 and PCI Bus 2.]
Amdahl's Balance Laws
Parallelism law: if a computation has a serial part S and a parallel component P, then the maximum speedup is (S+P)/S.
Balanced system law: a system needs a bit of IO per second per instruction per second: about 8 MIPS per MBps.
Memory law: α = 1: the MB/MIPS ratio (called alpha (α)) in a balanced system is 1.
IO law: programs do one IO per 50,000 instructions.
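The four laws turn into one-line formulas; a minimal sketch (the example inputs below are illustrative, not from the talk):

```python
def max_speedup(serial, parallel):
    """Amdahl's parallelism law: best-case speedup is (S+P)/S."""
    return (serial + parallel) / serial

def balanced_io_mbps(mips):
    """Balanced system law: ~8 MIPS per MBps of IO."""
    return mips / 8

def balanced_memory_mb(mips, alpha=1):
    """Memory law: MB of RAM = alpha * MIPS, with alpha ~ 1."""
    return alpha * mips

def ios_per_second(mips):
    """IO law: one IO per 50,000 instructions."""
    return mips * 1_000_000 / 50_000

print(max_speedup(1, 9))       # 10.0: at best 10x with 10% serial code
print(balanced_io_mbps(262))   # ~33 MBps of IO for a 262-mips cpu
print(ios_per_second(262))     # 5240.0 IOs/second
```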
Amdahl's Laws Valid 35 Years Later?
Parallelism law is algebra: so SURE!
Balanced system laws?
  Look at TPC results (TPC-C, TPC-H) at http://www.tpc.org/
Some imagination needed:
  What's an instruction (CPI varies from 1-3)?
    RISC, CISC, VLIW, … clocks per instruction,…
  What's an I/O?
TPC systems
Normalize for CPI (clocks per instruction):
  TPC-C has about 7 ins/byte of IO
  TPC-H has 3 ins/byte of IO
TPC-H needs ½ as many disks (sequential vs random).
Both use 9 GB 10 krpm disks (need arms, not bytes).

                    MHz/cpu  CPI  mips  KB/IO  IO/s/disk  Disks  Disks/cpu  MB/s/cpu  Ins/Byte
Amdahl                    1    1     1      6          -      -          -         -         8
TPC-C  (random)         550  2.1   262      8        100    397         50        40         7
TPC-H  (sequential)     550  1.2   458     64        100    176         22       141         3
TPC systems: What's alpha (= MB/MIPS)?
Hard to say:
  Intel: 32-bit addressing (= 4 GB limit), known CPI.
  IBM, HP, Sun have a 64 GB limit, unknown CPI.
Look at both; guess the CPI for IBM, HP, Sun.
Alpha is between 1 and 6:

              Mips                 Memory   Alpha
Amdahl        1                    1 MB     1
tpcC Intel    8x262 = 2 Gips       4 GB     2
tpcH Intel    8x458 = 4 Gips       4 GB     1
tpcC IBM      24 cpus ?= 12 Gips   64 GB    6
tpcH HP       32 cpus ?= 16 Gips   32 GB    2
Amdahl's Balance Laws Revised
Laws right, just need "interpretation" (imagination?):
Balanced system law:
  A system needs 8 MIPS/MBps of IO, but the instruction rate must be measured on the workload:
    sequential workloads have low CPI (clocks per instruction);
    random workloads tend to have higher CPI.
Memory law:
  Alpha (the MB/MIPS ratio) is rising from 1 to 6; this trend will likely continue.
IO law:
  One random IO per 50k instructions.
  Sequential IOs are larger: one sequential IO per 200k instructions.
PAP vs RAP
Peak Advertised Performance vs Real Application Performance
(application -> file system -> data path):

             PAP                       RAP
CPU          550 x 4 Mips = 2 Bips     1-3 CPI = 170-550 mips
System bus   1600 MBps                 500 MBps
PCI          133 MBps                  90 MBps
SCSI         160 MBps                  90 MBps
Disks        66 MBps                   25 MBps
Outline
Moore’s Law and consequences
Storage rules of thumb
Balanced systems rules revisited
Networking rules of thumb
Caching rules of thumb
Ubiquitous 10 GBps SANs in 5 years
1 Gbps Ethernet is a reality now (= 120 MBps).
  Also FiberChannel, MyriNet, GigaNet, ServerNet, ATM,…
10 Gbps x4 WDM deployed now (OC192) = 1 GBps.
3 Tbps WDM working in the lab.
In 5 years, expect 10x, wow!!
[Figure: today's link speeds: 5 MBps, 20 MBps, 40 MBps, 80 MBps, 120 MBps (1 Gbps).]
Networking
WANs are getting faster than LANs:
  G8 = OC192 = 8 Gbps is "standard".
  Link bandwidth improves 4x per 3 years.
Speed of light (60 ms round trip in the US).
Software stacks have always been the problem:
  Time = SenderCPU + ReceiverCPU + bytes/bandwidth
  The two CPU terms have been the problem.
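The time equation above can be sketched directly; the per-message stack costs below are assumed values for illustration, chosen only to show how software dominates short transfers:

```python
def send_time(nbytes, sender_cpu_s, receiver_cpu_s, bandwidth_bps):
    """Time = SenderCPU + ReceiverCPU + bytes/bandwidth."""
    return sender_cpu_s + receiver_cpu_s + nbytes / bandwidth_bps

# 1 KB over Gbps Ethernet: wire time is ~8 us. If each software stack
# costs ~100 us (assumed), software dominates the total by ~25x.
bytes_per_s = 1e9 / 8
wire_only = send_time(1024, 0, 0, bytes_per_s)
with_stacks = send_time(1024, 100e-6, 100e-6, bytes_per_s)
print(round(wire_only * 1e6, 1), round(with_stacks * 1e6, 1))  # 8.2 208.2
```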
The Promise of SAN/VIA: 10x in 2 years
http://www.ViArch.org/
Yesterday:
  10 MBps (100 Mbps Ethernet)
  ~20 MBps tcp/ip saturates 2 cpus
  round-trip latency ~250 µs
Now:
  Wires are 10x faster: Myrinet, Gbps Ethernet, ServerNet,…
  Fast user-level communication:
    tcp/ip ~ 100 MBps at 10% cpu
    round-trip latency is 15 µs
  1.6 Gbps demoed on a WAN
[Figure: time (µs) to send 1 KB over 100 Mbps Ethernet, Gbps Ethernet, and a SAN, split into sender cpu, receiver cpu, and transmit time (0-250 µs scale).]
How much does wire-time cost? $/MByte?

                     Cost      Time
Gbps Ethernet        0.2 µ$    10 ms
100 Mbps Ethernet    0.3 µ$    100 ms
OC12 (650 Mbps)      0.003$    20 ms
DSL                  0.0006$   25 sec
POTS                 0.002$    200 sec
Wireless             0.80$     500 sec

In detail ($/MB = seat cost amortized over 3 years of bandwidth; 94,608,000 seconds in 3 years; Time = seconds to send 1 MB):

           Seat cost $/3y   Bandwidth B/s   $/MB     Time
GBpsE      2,000            1.00E+08        2.E-07   0.010
100MbpsE   700              1.00E+07        7.E-07   0.100
OC12       12,960,000       5.00E+07        3.E-03   0.020
OC3        3,132,000        3.00E+06        1.E-02   0.333
T1         28,800           1.00E+05        3.E-03   10.000
DSL        2,300            4.00E+04        6.E-04   25.000
POTS       1,180            5.00E+03        2.E-03   200.000
Wireless   ?                2.00E+03        8.E-01   500.000
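The $/MB column is the 3-year seat cost amortized over the bytes the link can carry in that time; a sketch reproducing two rows of the table:

```python
SECONDS_3Y = 94_608_000   # seconds in 3 years, as on the slide

def dollars_per_mb(seat_cost_3y, bytes_per_second):
    """Amortize the 3-year seat cost over 3 years of carried traffic."""
    total_mb = bytes_per_second * SECONDS_3Y / 1e6
    return seat_cost_3y / total_mb

print(dollars_per_mb(2_000, 1e8))    # Gbps Ethernet: ~2e-7 $/MB
print(dollars_per_mb(28_800, 1e5))   # T1: ~3e-3 $/MB
```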
Outline
Moore’s Law and consequences
Storage rules of thumb
Balanced systems rules revisited
Networking rules of thumb
Caching rules of thumb
The Five Minute Rule
Trade DRAM for Disk Accesses
Cost of a disk access: Drive_Cost / Accesses_per_second
Cost of a DRAM page:   $/MB / Pages_per_MB
Break-even has two terms, a technology term and an economic term:

  BreakEvenReferenceInterval =
    (PagesPerMBofDRAM / AccessesPerSecondPerDisk) x (PricePerDiskDrive / PricePerMBofDRAM)

Page sizes grew to compensate for the changing ratios.
Now at 5 minutes for random IO, 10 seconds for sequential.
The 5 Minute Rule Derived
T = time between references to the page.
Break even when the RAM rent for the page equals the amortized disk-access cost:

  RAM_$_Per_MB / PagesPerMB = DiskPrice / (T x AccessesPerSecond)

so

  T = (DiskPrice x PagesPerMB) / (RAM_$_Per_MB x AccessesPerSecond)
Plugging in the Numbers
BreakEvenReferenceInterval =
  (PagesPerMBofDRAM / AccessesPerSecondPerDisk) x (PricePerDiskDrive / PricePerMBofDRAM)

              PPM/aps         disk$/RAM$       Break even
Random        128/120 ~ 1     1000/3 ~ 300     5 minutes
Sequential    1/30 ~ 0.03     ~300             10 seconds

Trend is toward longer times, because disk$ is not changing much while RAM$ declines 100x/decade.
The 5 minute & 10 second rules.
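The break-even formula drops straight into code; plugging in the slide's numbers reproduces both rules:

```python
def break_even_seconds(pages_per_mb, accesses_per_sec, disk_price, ram_price_mb):
    """BreakEvenReferenceInterval =
    (PagesPerMB / AccessesPerSec) * (DiskPrice / RAM_$_PerMB)."""
    return (pages_per_mb / accesses_per_sec) * (disk_price / ram_price_mb)

# Random 8 KB pages: 128 pages/MB, ~120 accesses/s, 1000$ drive, 3$/MB DRAM.
random_t = break_even_seconds(128, 120, 1000, 3)
# Sequential 1 MB pages: 1 page/MB, ~30 sequential IOs/s.
seq_t = break_even_seconds(1, 30, 1000, 3)
print(round(random_t / 60, 1), round(seq_t, 1))  # ~5.9 minutes, ~11.1 seconds
```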
When to Cache Web Pages
Caching saves user time.
Caching saves wire time.
Caching costs storage.
Caching only works sometimes:
  New pages are a miss.
  Stale pages are a miss.
The 10 Instruction Rule
Spend up to 10 instructions per second to save 1 byte.
Cost of an instruction: I = ProcessorCost / (MIPS x LifeTime)
Cost of a byte: B = RAM_$_Per_B / LifeTime
Breakeven: N x I = B
  N = B/I = (RAM_$_B x MIPS) / ProcessorCost
  ~ (3E-6 x 5E8)/500 = 3 ins/B for Intel
  ~ (3E-6 x 3E8)/10 = 10 ins/B for ARM
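The break-even instruction count follows the same rent-vs-rent pattern as the five-minute rule; a sketch using the slide's Intel numbers (only that case is computed here):

```python
def instructions_per_byte(ram_dollars_per_byte, ins_per_second, processor_cost):
    """N = B/I = (RAM_$_per_B * instruction rate) / ProcessorCost.
    Spend up to N instructions to save one byte of RAM."""
    return ram_dollars_per_byte * ins_per_second / processor_cost

# Slide's Intel case: 3E-6 $/byte of DRAM, 5E8 ins/s, 500$ processor.
print(instructions_per_byte(3e-6, 5e8, 500))   # 3.0 instructions per byte
```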
Web Page Caching Saves People Time
Assume people cost 20$/hour (or 0.2$/hr ???).
Assume a 20% hit rate in the browser, 40% in the proxy.
Assume 3 second server time.
Caching saves people time: 28$/year to 150$/year of people time (or 0.28$ to 1.5$/year at the low wage).

connection   cache     R_remote (s)   R_local (s)   H (hit rate)   People savings (¢/page)
LAN          proxy     3              0.3           0.4            0.6
LAN          browser   3              0.1           0.2            0.3
Modem        proxy     5              2             0.4            0.7
Modem        browser   5              0.1           0.2            0.5
Mobile       proxy     13             10            0.4            0.7
Mobile       browser   13             0.1           0.2            1.4
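Each savings cell in the table is hit_rate x (R_remote - R_local) x wage; a sketch that reproduces two of the rows:

```python
def cents_saved_per_page(hit_rate, r_remote_s, r_local_s, wage_per_hour=20):
    """People-time saved by caching, in cents per page viewed."""
    cents_per_second = wage_per_hour * 100 / 3600
    return hit_rate * (r_remote_s - r_local_s) * cents_per_second

print(round(cents_saved_per_page(0.4, 3, 0.3), 1))   # LAN proxy: 0.6 cents/page
print(round(cents_saved_per_page(0.2, 13, 0.1), 1))  # Mobile browser: 1.4 cents/page
```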
Web Page Caching Saves Resources
Wire cost is a penny (wireless) to 100 µ$ (LAN).
Storage is 8 µ$/month.
Breakeven: wire cost = storage rent: 4 to 7 months.
Add people cost: breakeven is ~4 years.
"Cheap people" (0.2$/hr): 6 to 8 months.

               A: $/10KB   B: $/10KB    C: people $    Break even     Break even
               download    storage/mo   of download    (Time = A/B)   (Time = (A+C)/B)
Internet/LAN   1.E-04      8.E-06       0.02           18 months      15 years
Modem          2.E-04      8.E-06       0.03           36 months      21 years
Wireless       1.E-02      2.E-04       0.07           300 years      >99959 years
Caching
Disk caching:
  5 minute rule for random IO
  11 second rule for sequential IO
Web page caching: if the page will be re-referenced within
  18 months (with free users) or
  15 years (with valuable users),
then cache the page in the client/proxy.
Challenge:
  guessing which pages will be re-referenced
  detecting stale pages (page velocity)
Outline
Moore’s Law and consequences
Storage rules of thumb
Balanced systems rules revisited
Networking rules of thumb
Caching rules of thumb