Computers are Free, Now What?
Premise: You're a Fortune 1,000 CIO. I'm a DB+OS guy selling CyberBricks. What can I say in an hour that you do not know? How can I help you plan for CyberBricks?
Jim Gray, Microsoft Research
[email protected]
http://research.Microsoft.com/~Gray
415 778 8222

Outline
• Why cost per transaction dropped 100,000x in 10 years
• How does that change things?
• What next (technology trends)
• Clusters of Hardware and Software CyberBricks

Systems 30 Years Ago
• MegaBuck per Mega Instruction Per Second (MIPS)
• MegaBuck per MegaByte
• Sys admin & data admin per MegaBuck

Disks of 30 Years Ago
• 10 MB
• Failed every few weeks

1988: IBM DB2 + CICS Mainframe, 65 tps
• IBM 4391
• Simulated network of 800 clients
• 2 M$ computer
• Staff of 6 to do the benchmark
• 2 x 3725 network controllers, a refrigerator-sized CPU, and a 16 GB disk farm (4 x 8 x .5 GB)

1987: Tandem Mini @ 256 tps
• 14 M$ computer (Tandem)
• A dozen people (1.8 M$/y): admin expert, performance expert, hardware experts, network expert, DB expert, OS expert, auditor, manager
• False floor, 2 rooms of machines
• 32-node processor array
• Simulated 25,600 clients
• 40 GB disk array (80 drives)

1997: 9 Years Later, 1 Person and 1 Box = 1,250 tps
• 1 breadbox, ~5x the 1987 machine room
• 23 GB is hand-held
• One person does all the work (the hardware, OS, net, DB, and app expert are all the same person)
• Cost/tps is 100,000x less: 5 micro-dollars per transaction
• 4 x 200 MHz CPUs, 1/2 GB DRAM, 12 x 4 GB disks, 3 x 7 x 4 GB disk arrays

Cost Per Transaction
• The industry uses $/tps (or $/tpm): the 5-year cost of hardware and software to get 1 tps.
• There are about 1 million seconds in 3 years. So, if $/tps is 1 $, $/t is 1 micro-dollar. (This rule of thumb is mechanized in the sketch after this segment.)
• 1988: mini 50 K$/tps, mainframe 150 K$/tps: 5 cents to 15 cents per transaction
• 1998: micro 30 $/tpmC = 50 ¢/tpsC: 5 micro-dollars per transaction (note: it is actually 6x less than this, since TPC-C is 6x TPC-A)

UNIX vs Windows NT
• Solaris on SPARC ranges from 11,559 tpmC @ 57 $/tpmC (Sybase) to 51,871 tpmC @ 135 $/tpmC (Oracle).
• SQL Server on NT/Compaq ranges from 11,748 tpmC @ 27 $/tpmC to 18,129 tpmC @ 27 $/tpmC.
• NT price per transaction is 2x to 4x less; peak performance per node is 3x less.
• The markup is in the TPC price/tpmC: Sun + Oracle, 52 KtpmC @ 134 $/tpmC, vs. HP + NT4 + SQL Server, 16.2 KtpmC @ 33 $/tpmC (disk and DRAM prices are comparable).
[Chart: $/tpmC broken down into processor, disk, software, net, and total/10 for the Oracle/SPARC and NT/SQL Server systems.]
• Note: current NT prices are 27 $/tpmC, not 33 $/tpmC, so 23% lower than shown.
• UNIX is 5x less than MVS according to David Matthews, "Large Server TCO: The UNIX Advantage", UNIX Review, Feb 1998 Reseller Supplement, pp. 3-11.

What Happened? Where Did the 100,000x Come From?
• Moore's law: 100x (at most)
• Software improvements: 10x (at most)
• Commodity pricing: 100x (at least)
• Total: 100,000x
• The 100x from commodity pricing:
  – a DBMS was 100 K$ to start; now it is 1 K$ to start
  – an IBM 390 MIPS is 7.5 K$ today; an Intel MIPS is 10 $ today
  – commodity disk is 50 $/GB vs 1,500 $/GB
  – ...

Outline
• Why cost per transaction has dropped 100,000x in 10 years
• How does that change things?
• What next (technology trends)
• Clusters of Hardware and Software CyberBricks

What Does 1 μ$/t Mean?
• Human attention is the precious resource.
• Content is the precious resource.
• Impressions (eyeballs) sell for 10,000 μ$ to 100,000 μ$.
• All cost (and value) is in content and administration.
• Aside: this month the TerraServer got 400 M hits and 40 M impressions, a 2 M$/mo asset (for satellite photos).
• That's why everyone is hot on portals.
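Two hedges before mechanizing the slide's arithmetic: taken literally, 3 years is closer to 100 million seconds than 1 million, so the slide's figure is best read as a rule of thumb that counts only busy seconds; and the straight division puts the 1998 micro at about half a micro-dollar per transaction, the same order of magnitude as the talk's "5 micro-dollars" headline. With that said, a minimal sketch:

```python
# Mechanizing the "Cost Per Transaction" rule of thumb: amortize the $/tps
# price over about a million transactions per tps of capacity.

TXNS_PER_TPS = 1_000_000  # the slide's "about 1 Million seconds in 3 years"

def dollars_per_transaction(price_per_tps):
    return price_per_tps / TXNS_PER_TPS

for label, price_per_tps in [("1988 mini", 50_000.0),
                             ("1988 mainframe", 150_000.0),
                             ("1998 micro (50 cents/tpsC)", 0.50)]:
    cost = dollars_per_transaction(price_per_tps)
    print(f"{label:28s} {cost:.7f} $/txn  ({cost * 1e6:>9,.1f} micro-$)")
```

Running it reproduces the 1988 figures exactly (5 and 15 cents) and lands the 1998 micro in the micro-dollar range.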
Administration Costs
• Vendor rule of thumb (1970s mainframe):
  – one systems programmer per MIPS
  – one data admin per 10 GB
• Data center rule of thumb:
  – hardware & facilities are 40%; labor is 60%
  – => 100 sys programmers and 1 data admin per laptop!
• A 1995 Federal study of their data centers found 1 to 3 MIPS per admin! (http://research.microsoft.com/~gray/NC_Servers.doc)
• Thin client:
  – move admin to the server
  – claim: save admin costs
  – reality: move admin costs to expensive fixed staff
  – time will tell

Content Costs
• For most web sites, most staff are doing content; admin is a small fraction of content.
• RULE OF THUMB: hardware/software/facilities/admin is 10% of cost; content is 90%.
  – This seems to apply to microsoft.com, MSN, WebTV, HotMail, and Inktomi.
• MAIN CONCLUSION:
  – Hardware, software, and admin are in the micro-$/t range.
  – UNIX and mainframes are 2x or 10x more micro-dollars. Who cares? The cost is in content.
  – Look for content creation/management tools.

Legacy Latency: A Personal Tale
• 1970s: helped company X convert to IMS/Fast Path.
• 1980s: helped company X experiment with Tandem mini-computers.
• 1990s: visit and ask, "Why are you still buying those mainframes?" The answers:
  1. They are up all the time (99.99% up).
  2. 25 years ago the ROI was 18 months; now it is 1 week.
  3. A rewrite would cost more than it would ever save.
  4. My career would not survive a rewrite.
  5. The devil you know is better than the devil you don't.

Put Another Way
• You are AT&T, or the airline industry, or... You do 300 M transactions/day.
• The capital cost of these transactions is:
  – 300 $/day on NT
  – 1,000 $/day on Solaris
  – 10,000 $/day on MVS
• Who cares? Revenue and costs are 200,000,000 $/day, so the transaction cost is .01% or .0001%. (A sketch of this arithmetic appears at the end of this segment.)
• But if productivity is higher on Solaris or NT, or if tools exist on them, or if the staff cost of a 2nd or 3rd environment is huge, then...
• New apps should not go on MVS!
• Investing in SNA? Investing in IMS? Investing in TPF?...

What Happens Next
• The last 10 years brought a 100,000x improvement. The next 10 years: ????
• Today (on the 1985-1995-2005 timeline): text and image servers are free; at 25 m$/hit, advertising pays for them.
• Future: video, audio, ... servers are free. "You ain't seen nothing yet!"

And So...
• Traditional transaction processing is a zero-billion-dollar industry. Growth is in the new apps:

                 Point-to-Point         Broadcast
  Immediate      conversation, money    lecture, concert
  Time-shifted   mail                   book, newspaper

• Underneath both axes sit the network and the database.
• It's ALL going electronic. The immediate is being stored for analysis (so it is ALL database).
• Analysis & automatic processing are being added.

Why Put Everything in Cyberspace?
• Low rent: minimum $/byte.
• Shrinks time: now or later (immediate OR time-delayed).
• Shrinks space: here or there (point-to-point OR broadcast).
• Automated processing: knowbots on the network and database locate, process, analyze, and summarize.

Some Tera-Byte Databases
• The Web: 1 TB of HTML
• TerraServer: 1 TB of images
• Many 1 TB (file) servers
• HotMail: 7 TB of email
• Sloan Digital Sky Survey: 40 TB raw, 2 TB cooked
• EOS/DIS (a picture of the planet each week): 15 PB by 2007
• Federal clearing house (images of checks): 15 PB by 2006 (7-year history)
• Nuclear Stockpile Stewardship Program: 10 exabytes (???!!)

[Chart: how much information is there? The scale runs Kilo, Mega, Giga, Tera, Peta, Exa, Zetta, Yotta, with examples: a letter, a novel, a movie, the Library of Congress (text), LoC (sound + cinema), LoC (image), all photos, all disks, all tapes, ... all information!]
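As promised under "Put Another Way", the arithmetic is easy to check. A minimal sketch, using only the slide's numbers (the percentages land at the orders of magnitude the slide quotes):

```python
# Checking the "Put Another Way" slide: per-transaction capital cost and
# the share of daily revenue it represents, per platform. All inputs are
# the slide's numbers, not independent estimates.

TXNS_PER_DAY = 300_000_000
REVENUE_PER_DAY = 200_000_000.0  # dollars

daily_capital_cost = {"NT": 300.0, "Solaris": 1_000.0, "MVS": 10_000.0}

for platform, cost in daily_capital_cost.items():
    micro_per_txn = cost / TXNS_PER_DAY * 1e6
    revenue_share = 100.0 * cost / REVENUE_PER_DAY
    print(f"{platform:8s} {micro_per_txn:6.1f} micro-$/txn, "
          f"{revenue_share:.4f}% of daily revenue")
```

Even on the most expensive platform the capital cost is about 0.005% of revenue, which is the slide's point: the platform choice turns on productivity, tools, and staffing, not on hardware cost.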
Michael Lesk's Points (www.lesk.com/mlesk/ksg97/ksg.html)
• Soon everything can be recorded and kept.
• Most data will never be seen by humans.
• The precious resource is human attention: auto-summarization and auto-search will be key enabling technologies.

Outline
• Why cost per transaction has dropped 100,000x in 10 years
• How does that change things?
• What next (technology trends)
• Clusters of Hardware and Software CyberBricks

Technology (Hardware) NOW
• CPU: nearing 1 BIPS, but CPI is rising fast (2-10), so less than 100 effective MIPS; 1 $/MIPS to 10 $/MIPS
• DRAM: 3 $/MB
• Disk: 30 $/GB
• Tape: 20 GB/tape, 6 MBps; 2 $/GB offline, 15 $/GB nearline

2003 Forecast (10x better)
• CPU: 1 BIPS real (SMP); 0.1 $ to 1 $/MIPS
• DRAM: 1 Gb chip; 0.1 $/MB
• Disk: 10 GB smart cards, 500 GB RAID packs (NT inside); 3 $/GB
• Tape: lags disk; ?

System On A Chip
• Integrate processing with memory on one chip:
  – a chip is 75% memory now
  – 1 MB of cache >> 1960s supercomputers
  – a 256 Mb memory chip is 32 MB!
  – IRAM, CRAM, PIM, ... projects abound
• Integrate networking with processing on one chip:
  – the system bus is a kind of network
  – ATM, FiberChannel, Ethernet, ... logic on chip
  – direct IO (no intermediate bus)
• Functionally specialized cards shrink to a chip.

Thesis: Many Little Beat Few Big
[Diagram: the price spectrum from mainframe ($1 million) through mini ($100 K) and micro ($10 K) down to nano and pico processors; the storage hierarchy from 10-picosecond RAM (1 MB) through 10-nanosecond RAM (100 MB), 10-microsecond RAM (10 GB), and 10-millisecond disc (1 TB) to a 10-second tape archive (100 TB); disk form factors shrinking 14", 9", 5.25", 3.5", 2.5", 1.8". The endpoint is a "smoking, hairy golf ball": 1 M SPECmarks, 1 TFLOP, 10^6 clocks to bulk RAM, event horizon on chip, VM reincarnated, multi-program cache, on-chip SMP.]
• How to connect the many little parts?
• How to program the many little parts?
• Fault tolerance?

Storage Latency: How Far Away Is the Data?
Scaling one clock (a register reference) to one minute:
• registers: 1 clock (my head, 1 min)
• on-chip cache: 2 clocks (this room)
• on-board cache: 10 clocks (this campus, 10 min)
• memory: 100 clocks (Sacramento, 1.5 hr)
• disk: 10^6 clocks (Pluto, 2 years)
• tape/optical robot: 10^9 clocks (Andromeda, 2,000 years)

Gilder's Telecosm Law: 3x bandwidth/year for 25 more years
• Today: 10 Gbps per channel; 4 channels per fiber = 40 Gbps; 32 fibers/bundle = 1.2 Tbps/bundle
• In the lab: 3 Tbps/fiber (400 x WDM)
• In theory: 25 Tbps per fiber
• 1 Tbps = USA 1996 WAN bisection bandwidth

Networking: BIG Changes Coming!
• Technology:
  – 10 GBps buses "now"; 1 Gbps links "now"; 1 Tbps links in 10 years
  – fast & cheap switches; standard interconnects; smart NICs someday
• CHALLENGE: reduce the software tax on messages (see the sketch after this segment)
  – today: 30 K instructions + 10 instructions/byte
  – goal: 1 K instructions + .01 instructions/byte
• Best bet: SAN/VIA
  – processor-to-processor and processor-to-device (= processor)
  – a special protocol with user-level net IO (like disk IO)
• Deregulation WILL work.

What If Networking Were as Cheap as Disk IO?
• TCP/IP: Unix/NT takes 100% of a CPU @ 40 MBps.
• Disk: Unix/NT takes 8% of a CPU @ 40 MBps.
• Why the difference? For TCP/IP the host does the packetizing, checksums, and flow control with small buffers; for disk, the host bus adapter does the SCSI packetizing, checksums, and flow control, and DMAs the data.

The Promise of SAN/VIA: 10x Better in 2 Years
• Today: wires are 10 MBps (100 Mbps Ethernet); ~20 MBps of TCP/IP saturates 2 CPUs; round-trip latency is ~300 μs.
• In two years: wires are 100 MBps (1 Gbps Ethernet, ServerNet, ...); TCP/IP runs at ~100 MBps using 10% of each processor; round-trip latency is 20 μs.
[Chart: bandwidth, latency, and overhead, now vs. soon.]
• It works in the lab today using the Winsock2 API. See http://www.viarch.org/
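The "software tax on messages" above implies concrete throughput ceilings. A minimal sketch, assuming an 8 KB message and a 200-MIPS processor (both illustrative assumptions; the instruction counts are the slide's):

```python
# CPU cost per message, today vs. the goal, and the resulting per-CPU
# messaging throughput. Instruction counts are from the slide; the 8 KB
# message size and 200-MIPS processor are illustrative assumptions.

MSG_BYTES = 8 * 1024
CPU_MIPS = 200  # assumed processor speed

def instructions_per_message(fixed, per_byte, nbytes=MSG_BYTES):
    return fixed + per_byte * nbytes

for label, fixed, per_byte in [("today", 30_000, 10.0),
                               ("goal ", 1_000, 0.01)]:
    cost = instructions_per_message(fixed, per_byte)
    msgs_per_sec = CPU_MIPS * 1e6 / cost
    mb_per_sec = msgs_per_sec * MSG_BYTES / 1e6
    print(f"{label}: {cost:>9,.0f} instructions/msg -> "
          f"{msgs_per_sec:>9,.0f} msgs/s (~{mb_per_sec:,.0f} MB/s per CPU)")
```

At today's tax the whole processor moves only ~15 MB/s, consistent with the slide's "~20 MBps of TCP/IP saturates 2 CPUs"; at the goal tax, the wire, not the CPU, becomes the bottleneck.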
SAN: Standard Interconnect
• Gbps Ethernet: 110 MBps; PCI: 70 MBps; UW SCSI: 40 MBps; FW SCSI: 20 MBps; SCSI: 5 MBps
• The LAN is faster than the memory bus?
• 1 GBps links are in the lab.
• 100 $ port cost soon.
• The port is a computer.

Data Gravity: Processing Moves to Transducers
• Move processing to the data sources.
• Move to where the power (and sheet metal) is.
• Put a processor in everything:
  – modems
  – displays
  – microphones (speech recognition) & cameras (vision)
  – storage (data storage and analysis)

CyberBricks: Functionally Specialized Cards
• Storage, network, and display ASICs, each with P MIPS of processor and M MB of DRAM on board.
• Today: P = 20 MIPS, M = 2 MB. In a few years: P = 200 MIPS, M = 64 MB.

With Tera-Byte Interconnect and Super-Computer Adapters
• Processing is incidental to networking, storage, and UI.
• The disk controller/NIC is faster than its device and close to its device, and can borrow the device's package & power.
• So use the idle capacity for computation: run the app in the device.

All Device Controllers Will Be Cray-1's
• TODAY: a disk controller is a 10 MIPS RISC engine with 2 MB of DRAM; a NIC is similar power.
• SOON: they will become 100 MIPS systems with 100 MB of DRAM.
• They are nodes in a federation (you could run Oracle on NT in the disk controller).
• Advantages:
  – uniform programming model
  – great tools
  – security
  – economics (CyberBricks)
  – move computation to the data (minimize traffic)

It's Already True of Printers: Peripheral = CyberBrick
• You buy a printer, and you get several network interfaces, a PostScript engine (CPU, memory, software, and soon a spooler), and... a print engine.

Disk = Node
• It has magnetic storage (100 GB?).
• It has a processor & DRAM.
• It has a SAN attachment.
• It has an execution environment: applications and services (DBMS, RPC, ...) over a file system, SAN driver, disk driver, and OS kernel.
• (A toy disk-node speaking a high-level protocol is sketched at the end of this segment.)

Outline
• Why cost per transaction has dropped 100,000x in 10 years
• How does that change things?
• What next (technology trends): CyberBricks
• Clusters of Hardware and Software CyberBricks

All God's Children Have Clusters! Buying Computing by the Slice
• People are buying computers by the dozens (computers only cost 1 K$/slice!) and clustering them together.

A Cluster Is a Cluster Is a Cluster
• It's so natural, even mainframes cluster!
• Looking closer at usage patterns, a few models emerge.
• Looking closer at sites, you see hierarchies, bunches, and functional specialization.

"Commercial" NT Clusters
• A 16-node Tandem cluster: 64 CPUs, 2 TB of disk, decision support.
• A 45-node Compaq cluster: 140 CPUs, 14 GB DRAM, 4 TB of RAID disk, OLTP (debit-credit), 1 B tpd (14 K tps).

Tandem Oracle/NT
• 27,383 tpmC @ 71.50 $/tpmC
• 4 x 6 CPUs, 384 disks = 2.7 TB
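As promised under "Disk = Node" above, here is a toy sketch of that idea: the storage device is a little server with its own execution environment, and the host sends it high-level RPCs instead of block reads. Python's stock XML-RPC stands in for whatever protocol a real device would speak (SQL, HTTP, DCOM, CORBA); the SmartDisk class and its get/put interface are invented for this sketch, not a real device API.

```python
# A toy "Disk = Node": the app runs *in* the device, and the host is just
# another member of the federation talking a high-level protocol to it.
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy
import threading

class SmartDisk:
    """A disk that runs the data service itself: get/put by key."""
    def __init__(self):
        self.store = {}  # stands in for the magnetic storage

    def put(self, key, value):
        self.store[key] = value
        return True

    def get(self, key):
        return self.store.get(key, "")

disk = SmartDisk()
server = SimpleXMLRPCServer(("localhost", 0), logRequests=False)
server.register_function(disk.put, "put")
server.register_function(disk.get, "get")
threading.Thread(target=server.serve_forever, daemon=True).start()

# The host sends RPCs to the device instead of issuing block reads.
port = server.server_address[1]
device = ServerProxy(f"http://localhost:{port}/")
device.put("account:17", "balance=100")
print(device.get("account:17"))  # -> balance=100
```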
Microsoft.com: ~150 x 4-CPU Nodes
[Diagram: the Microsoft.com site, December 1997. Building 11 holds the internal WWW, staging servers, log processing, and SQL feeders (all accessible from corpnet). The main site sits behind FDDI rings and primary/secondary Gigaswitches and serves www, home, premium, register, search, support, activex, cdm, msid, FTP/HTTP download, and SQL servers; typical boxes are 4 x P5 or 4 x P6 with 256 MB to 1 GB of RAM and 12 to 180 GB of disk, costing $24 K to $128 K each. European and Japan data centers replicate the core services. The Internet feeds are 13 DS3s (45 Mb/sec each) and 2 OC3s.]

The Microsoft TerraServer Hardware
• Compaq AlphaServer 8400: 8 x 400 MHz Alpha CPUs, 10 GB DRAM
• 324 StorageWorks disks of 9.2 GB each: 3 TB raw, 2.4 TB of RAID5
• STK 9710 tape robot (4 TB)
• Windows NT 4 EE and SQL Server 7.0

HotMail: ~400 Computers

Inktomi (HotBot) and WebTV: >200 Nodes
• Inktomi: ~250 UltraSPARCs
  – crawl the web; index the crawled web and save the index; return search results on demand; track ads and click-throughs
  – ACID vs BASE (Basic Availability, Serialized Eventually)
• WebTV: ~200 UltraSPARCs render pages and provide email, with ~4 Network Appliance NFS file servers and a large Oracle app tracking customers.
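Inktomi's crawl/index/serve loop above fits in a few lines. A toy sketch: the three hard-coded "pages" and the whitespace tokenizer are stand-ins, and a real engine partitions this index across hundreds of nodes.

```python
# Crawl pages, build an inverted index, answer search queries from it.
from collections import defaultdict

crawled = {  # url -> page text (the "web crawl" result, assumed data)
    "a.html": "cheap mips and cheap disks",
    "b.html": "clusters of commodity nodes",
    "c.html": "commodity mips in clusters",
}

index = defaultdict(set)  # word -> set of urls (the saved index)
for url, text in crawled.items():
    for word in text.split():
        index[word].add(url)

def search(*words):
    """Return the urls containing every query word (results on demand)."""
    hits = [index.get(w, set()) for w in words]
    return sorted(set.intersection(*hits)) if hits else []

print(search("commodity"))          # -> ['b.html', 'c.html']
print(search("commodity", "mips"))  # -> ['c.html']
```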
Loki: Pentium Clusters for Science (http://loki-www.lanl.gov/)
• 16 Pentium Pro processors x 5 Fast Ethernet interfaces + 2 GB RAM + 50 GB disk + 2 Fast Ethernet switches + Linux = 1.2 real Gflops for $63,000 (but that is the 1996 price).
• The Beowulf project is similar: http://cesdis.gsfc.nasa.gov/pub/people/becker/beowulf.html
• Scientists want cheap MIPS.

Your Tax Dollars at Work: ASCI for Stockpile Stewardship
• Intel/Sandia: 9,000 x 1-node Pentium Pro
• LLNL/IBM: 512 x 8-way PowerPC (SP2)
• LNL/Cray: ?
• Maui Supercomputer Center: 512 x 1-node SP2

Berkeley NOW (Network of Workstations) Project (http://now.cs.berkeley.edu/)
• 105 nodes: Sun UltraSPARC 170s, 128 MB, 2 x 2 GB disk
• Myrinet interconnect (2 x 160 MBps per node), limited by the SBus (30 MBps)
• GLUNIX layer above Solaris
• Runs Inktomi (HotBot search), the NAS parallel benchmarks, and a crypto cracker; sorts 9 GB per second

Wisconsin COW
• 40 UltraSPARCs: 64 MB + 2 x 2 GB disk + Myrinet
• SunOS
• Used as a compute engine

Andrew Chien's JBOB (http://www-csag.cs.uiuc.edu/individual/achien.html)
• 48 nodes: 36 HP Kayak boxes (2 x PII, 128 MB, 1 disk) and 10 Compaq Workstation 6000s (2 x PII, 128 MB, 1 disk)
• 32 connected by Myrinet, 16 by ServerNet
• Operational, all running NT

NCSA Super Cluster (http://access.ncsa.uiuc.edu/CoverStories/SuperCluster/super.html)
• National Center for Supercomputing Applications, University of Illinois @ Urbana
• 512 Pentium II CPUs, 2,096 disks, SAN
• Compaq + HP + Myricom + Windows NT
• A supercomputer for 3 M$
• Classic Fortran/MPI programming, plus a DCOM programming model

1.2 B tpd
• 1 B tpd ran for 24 hrs on out-of-the-box software and off-the-shelf hardware. AMAZING!
• 20x smaller than the Microsoft Internet Data Center (amazing!)
• Sized for 30 days, linear growth, 5 micro-dollars per transaction.

Scalability (1 billion transactions, 100 million web hits, 4 terabytes of data, 1.8 million mail messages)
• Scale UP to large SMP nodes.
• Scale OUT to clusters of SMP nodes.

The Bricks of Cyberspace: 4B PCs (1 BIPS, .1 GB DRAM, 10 GB disk, 1 Gbps net; B = G)
• Cost 1,000 $.
• Come with NT, a DBMS, a high-speed net, system management, GUI/OOUI, and tools.
• Compatible with everyone else.
• CyberBricks.

Super Server: the 4T Machine
• An array of 1,000 4B machines: 1 BIPS processors, 1 BB of DRAM, 10 BB of disks, 1 Bbps comm lines, and a 1 TB tape robot, for a few megabucks.
• A CyberBrick is a 4B machine: CPU, 5 GB RAM, 50 GB disc.
• The challenge: manageability, programmability, security, availability, scaleability, affordability; as easy as a single system.
• Future servers are CLUSTERS of processors and discs. Distributed database techniques make clusters work.

Cluster Vision: Buying Computers by the Slice
• Rack & stack: mail-order components, plugged into the cluster.
• Modular growth without limits: grow by adding small modules.
• Fault tolerance: spare modules mask failures.
• Parallel execution & data search: use multiple processors and disks.
• Clients and servers are made from the same stuff: inexpensive, built with commodity CyberBricks.

Nostalgia: Behemoth in the Basement
• Today's PC is yesterday's supercomputer, and you can use LOTS of them.
• The main apps changed: from scientific to commercial to web; web & transaction servers; data mining and web farming.

SMP -> nUMA: BIG FAT SERVERS
• Directory-based caching and 64-bit addressing let you build large SMPs.
• Every vendor is building a HUGE SMP: 256-way, with remote memory 3x slower and an 8-level memory hierarchy (L1 and L2 cache; DRAM; remote DRAM at 3, 6, 9, ... hops; disk cache; disk; tape cache; tape). (A rough access-time model follows this segment.)
• Needs a nUMA-sensitive OS (not clear who will do it), or a hypervisor like IBM LSF or Stanford Disco (www-flash.stanford.edu/Hive/papers.html).
• You get an expensive cluster-in-a-box with a very fast network.
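As promised, a back-of-envelope effective-access-time model for the nUMA hierarchy above, abridged to five levels. Only the "remote DRAM is ~3x local DRAM" ratio comes from the slide; the latencies (in clocks) and hit fractions are invented for illustration.

```python
# Effective memory access time = sum over levels of latency x hit fraction.

levels = [  # (name, latency in clocks, fraction of references served here)
    ("L1 cache",    2,          0.9000),
    ("L2 cache",    10,         0.0600),
    ("local DRAM",  100,        0.0300),
    ("remote DRAM", 300,        0.0099),  # ~3x local, per the slide
    ("disk",        10_000_000, 0.0001),  # assumed latency
]

amat = sum(latency * frac for _, latency, frac in levels)
for name, latency, frac in levels:
    print(f"  {name:12s} {latency:>12,} clocks, serves {frac:.2%}")
print(f"effective access time: ~{amat:,.0f} clocks per reference")
```

Even with a 99.99% memory-or-better hit rate, the rare disk reference dominates the average; that is why the hierarchy bottoms out in disk and tape caches.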
Great Debate: Shared What?
• Shared memory (SMP): easy to program, difficult to build, difficult to scale. (SGI, Sun, Sequent)
• Shared disk: (VMScluster, Sysplex)
• Shared nothing (network): hard to program, easy to build, easy to scale. (Tandem, Teradata, SP2, NT)
• All three serve CLIENTS. nUMA blurs the distinction, but has its own problems.

Technology Drivers: Plug & Play Software
• RPC is standardizing (DCOM, IIOP, HTTP):
  – gives huge TOOL LEVERAGE
  – solves the hard problems for you: naming, security, directory service, operations, ...
• Commoditized programming environments:
  – FreeBSD, Linux, Solaris, ... + tools
  – NetWare + tools
  – WinCE, WinNT, ... + tools
  – JavaOS + tools
• Apps gravitate to data.
• A general-purpose OS on the controller runs the apps.

Restatement
• The huge clusters we saw are prototypes for CyberBrick systems: a federation of functionally specialized nodes.
• Each node shrinks to a "point" device with embedded processing.
• Each node/device is autonomous and talks a high-level protocol.

Outline
• Clusters of Hardware CyberBricks:
  – all nodes are very intelligent; processing migrates to where the power is
  – disk, network, and display controllers have a full-blown OS
  – send RPCs (SQL, Java, HTTP, DCOM, CORBA) to them
  – the computer is a federated distributed system
• Software CyberBricks:
  – a standard way to interconnect intelligent nodes
  – needs an execution model
  – needs parallelism

Software CyberBricks: Objects!
• It's a zoo.
• Objects and 3-tier computing (transactions):
  – give natural distribution & parallelism
  – give remote management!
  – TP & Web: dispatch RPCs to a pool of object servers
  – components are a 1 B$ business today!
• We need a parallel & distributed computing model.

The COMponent Promise
• Objects are Software CyberBricks:
  – a productivity breakthrough (plug-ins)
  – a manageability breakthrough (modules)
• Microsoft: DCOM + ActiveX. IBM/Sun/Oracle/Netscape: CORBA + Java Beans.
• Both promise parallel distributed execution and centralized management of distributed systems.
• Both camps share key goals:
  – encapsulation: hide the implementation
  – polymorphism: generic ops, the key to GUIs and reuse
  – uniform naming
  – discovery: finding a service
  – fault handling: transactions
  – versioning: allow upgrades
  – transparency: local/remote
  – security: who has authority
  – shrink-wrap: minimal inheritance
  – automation: easy

The OO Points So Far
• Objects are Software CyberBricks.
• Object interconnect standards are emerging.
• CyberBricks become federated systems.
• Put processing close to the data.
• Next point: do parallel processing.

Kinds of Parallel Execution
• Pipeline: any sequential program feeds its output to the next sequential program.
• Partition: inputs split N ways and outputs merge M ways; each partition runs an ordinary sequential program.

Object-Oriented Programming: Parallelism from Many Little Jobs
• Gives location transparency.
• The ORB / web server / TP monitor multiplexes clients to servers.
• Enables distribution.
• Exploits embarrassingly parallel apps (transactions).
• HTTP and RPC (DCOM, CORBA, RMI, IIOP, ...) are the basis.

Why Parallel Access to Data?
• At 10 MB/s, it takes 1.2 days to scan 1 terabyte.
• With 1,000-way parallelism, the scan takes 100 seconds.
• Parallelism: divide a big problem into many smaller ones to be solved in parallel.
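The scan arithmetic above checks out exactly. A minimal sketch, using only the slide's numbers:

```python
# "Why Parallel Access to Data": one 10 MB/s scanner vs. 1,000 partitions
# scanned in parallel.

TERABYTE = 1e12  # bytes
RATE = 10e6      # 10 MB/s per scanner

def scan_seconds(nbytes, scanners=1):
    return nbytes / (RATE * scanners)

serial = scan_seconds(TERABYTE)
parallel = scan_seconds(TERABYTE, scanners=1000)
print(f"1 scanner:      {serial:>9,.0f} s  (~{serial / 86_400:.1f} days)")
print(f"1,000 scanners: {parallel:>9,.0f} s")
```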
Partitioned Execution
• Spreads computation and IO among processors.
• Example: a Count runs over a table partitioned A...E, F...J, K...N, O...S, T...Z; each partition counts locally and the counts merge.
• Partitioned data gives NATURAL parallelism. (A sketch of this dataflow closes this section.)

N x M-Way Parallelism
• N inputs and M outputs, with no bottlenecks: sorts on the A...E, F...J, K...N, O...S, and T...Z partitions feed joins, which feed merges.
• Partitioned data; partitioned and pipelined dataflows.

Summary
• Clusters of Hardware CyberBricks:
  – all nodes are very intelligent; processing migrates to where the power is
  – disk, network, and display controllers have a full-blown OS
  – send RPCs (SQL, Java, HTTP, DCOM, CORBA) to them
  – the computer is a federated distributed system
• Software CyberBricks:
  – a standard way to interconnect intelligent nodes
  – needs an execution model
  – needs parallelism

Summary
• Why cost per transaction has dropped 100,000x in 12 years.
• How does that change things?
• What next (technology trends).
• Hardware and Software CyberBricks.

What I'm Doing
• TerraServer: a photo of the planet on the web
  – a database (not a file system); 1 TB now, 15 PB in 10 years
  – http://www.TerraServer.microsoft.com/
• Sloan Digital Sky Survey: a picture of the universe
  – just getting started; CyberBricks for astronomers
  – http://www.sdss.org/
• Sorting:
  – one node: PennySort (http://research.microsoft.com/barc/SortBenchmark/)
  – multi-node: NT Cluster Sort (shows off the SAN and DCOM)

What I'm Doing (continued)
• NT Clusters:
  – failover: fault tolerance within a cluster
  – NT Cluster Sort: a balanced IO, CPU, and network benchmark
  – AlwaysUp: geographic fault tolerance
• RAGS: random testing of SQL systems (a bug finder)
• Telepresence:
  – working with Gordon Bell on "the killer app"
  – FileCast and PowerCast
  – Cyberversity (an international, on-demand, free university)
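Finally, the partitioned-count dataflow promised under "Partitioned Execution": a toy sketch in which each key-range partition is counted by its own worker (a thread pool standing in for separate nodes) and the partial counts merge. The five-row "table" is made up.

```python
# Partition the table by key range, count each partition independently,
# then merge the partial counts.
from concurrent.futures import ThreadPoolExecutor

RANGES = [("A", "E"), ("F", "J"), ("K", "N"), ("O", "S"), ("T", "Z")]
table = ["Adams", "Baker", "Garcia", "Katz", "Lopez", "Smith", "Young"]

def count_partition(key_range):
    """Count the rows whose key falls in this partition's range."""
    lo, hi = key_range
    return sum(1 for row in table if lo <= row[0].upper() <= hi)

with ThreadPoolExecutor(max_workers=len(RANGES)) as pool:
    partial = list(pool.map(count_partition, RANGES))  # one count per node

for key_range, n in zip(RANGES, partial):
    print(f"{key_range[0]}...{key_range[1]}: {n}")
print("merged count:", sum(partial))  # equals len(table)
```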