Transcript Overview
The Role of Design in the Internet and Other Complex Systems
David Alderson
February 10, 2004
Joint work with J. Doyle, W. Willinger, and L. Li

My challenge
Use models of Internet topology as a case study to illustrate many of the themes of this week
– “How to make complex systems still complex but experimentally accessible?”
– Importance/interpretation of high variability in complex systems
– Modeling debate: design vs. randomness
– Understanding the “robust, yet fragile” aspects of the Internet
– “Closing the loop” between modeling and analysis
– Similarity to models in biology?

The Internet as a Case Study
• To the user, it creates the illusion of a simple, robust, homogeneous resource enabling endless varieties and types of technologies, physical infrastructures, virtual networks, and applications (heterogeneous).
• Its complexity is starting to approach that of simple biological systems.
• Our understanding of the underlying technology together with the ability to perform detailed measurements means that most conjectures about its large-scale properties can be unambiguously resolved, though often not without substantial effort.

A Theory for the Internet?
General Approach: Use an engineering design perspective to understand, explain the complex structure observed. Take a single layer in isolation and assume that the other layers are handled near optimally.
[Figure: protocol stack (Applications, TCP/AQM, IP, Link) annotated with: source coding, FAST, routing, HOT topology]

A Theory for the Internet?
If TCP/AQM is the answer, what is the question?
$$\max_{x_s \ge 0} \sum_s U_s(x_s) \quad \text{subject to} \quad Rx \le c$$
Primal/dual model of TCP/AQM congestion control…
[Figure: protocol stack (Applications, TCP/AQM, IP, Link) with a question mark]

A Theory for the Internet?
If the current topology of the Internet is the answer, what is the question?
[Figure: protocol stack (Applications, TCP/AQM, IP, Link) with a question mark]
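Returning to the TCP/AQM question above: the primal problem maximizes total utility subject to link capacities. A minimal sketch of how a price-based (dual) iteration can solve a toy instance is given below. Everything specific here is an assumption for illustration and not taken from the slides: the reading of the symbols (x_s as source rates, R as a link-by-source routing matrix, c as link capacities), the logarithmic utilities U_s(x) = log x, the two-link, three-source network, and the function name solve_num_log_utility.

```python
def solve_num_log_utility(R, c, step=0.1, iters=5000):
    """Dual (link-price) iteration for: max sum_s log(x_s) subject to R x <= c.

    R[l][s] is 1 if source s uses link l, else 0 (assumed routing matrix).
    c[l] is the capacity of link l.
    """
    n_links, n_src = len(R), len(R[0])
    p = [1.0] * n_links   # link prices (dual variables), arbitrary start
    x = [0.0] * n_src     # source rates
    for _ in range(iters):
        # Source update: with U_s = log, U_s'(x_s) = q_s gives x_s = 1 / q_s,
        # where q_s is the sum of prices along the source's path.
        for s in range(n_src):
            q = sum(R[l][s] * p[l] for l in range(n_links))
            x[s] = 1.0 / max(q, 1e-9)
        # Link update: raise the price if the link is overloaded, lower it otherwise.
        for l in range(n_links):
            y = sum(R[l][s] * x[s] for s in range(n_src))
            p[l] = max(0.0, p[l] + step * (y - c[l]))
    return x, p


# Toy network (hypothetical): source 0 crosses both links,
# source 1 uses only link 0, source 2 uses only link 1.
R = [[1, 1, 0],
     [1, 0, 1]]
c = [1.0, 1.0]
rates, prices = solve_num_log_utility(R, c)
print([round(r, 3) for r in rates])
```

For this toy instance the optimum can be checked by hand: by symmetry the two single-link sources get the same rate, and maximizing log x0 + 2 log(1 - x0) gives x0 = 1/3, so the long-path source receives half the rate of the others.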
The Internet hourglass
[Figure: hourglass of protocol layers]
Applications: Web, FTP, Mail, News, Video, Audio, ping, napster
Transport protocols: TCP, SCTP, UDP, ICMP
IP
Link technologies: Ethernet, 802.11, Power lines, ATM, Optical, Satellite, Bluetooth
Everything on IP
IP on everything

Network protocols.
[Figure: Network protocols. Labels: Files, HTTP, TCP, IP, packets, Links, Sources, Hosts, Routers]

Modeling Network Topology
Why does it matter?
1. Performance evaluation of protocols
2. Provisioning
   • Topology constrains the applications and services that run on top of it
3. Understanding large-scale properties
   • Reliability and robustness to accidents, failures, and attacks on network components
4. Insight into other network systems
   • To the extent that the network model is “universal”

The Internet
• Full of “high variability”
  – Link bandwidth: Kbps to Gbps
  – File sizes: a few bytes to Mega/Gigabytes
  – Flows: a few packets to 100,000+ packets
  – In/out-degree (Web graph): 1 to 100,000+
  – Delay: Milliseconds to seconds and beyond
• How should we think about the incredible scaling ability of the Internet?
• Is there something “universal” about its structure?

Topology Modeling
• Direct inspection generally not possible
• Recent trend: generative models follow empirical measurement studies
• But…
  – So many things to measure
  – Incredible variability in so many aspects
  – How to determine what matters?
• We will focus on router-level topology

Router-Level Topology
[Figure: routers and hosts]
• Nodes are machines (routers or hosts) running IP protocol
• Measurements taken from traceroute experiments that infer topology from traffic sent over network
• Subject to sampling errors and bias
• Requires careful interpretation

Power Laws and Internet Topology
[Figure: node degree plot. A few nodes have lots of connections; most nodes have few connections. Source: Faloutsos et al. (1999)]
• How to account for high variability in node degree?
• Can we develop an explanatory model for the current network topology?

Power laws are ubiquitous
• This is no surprise, and requires no “special” explanation.
• Gaussian (“Normal”) distributions are attractors for averaging (e.g., the Central Limit Theorem), so they are also ubiquitous.
• Power laws are attractors for averaging too, but are also the only distributions invariant under maximizing, marginalization, and mixtures.
• For high variability data subject to these operations, power laws should be expected (Power laws as “more normal than Normal”?)

20th Century’s 100 largest disasters worldwide
[Figure: Log(rank) vs. Log(size) for Technological ($10B), Natural ($100B), and US Power outages (10M of customers, 1985-1997). Slope = -1 (α = 1)]
A large event is not inconsistent with statistics.

Our Perspective
• Must consider the explicit design of the Internet
  – Protocol layers on top of a physical infrastructure
  – Physical infrastructure constrained by technological and economic limitations
  – Emphasis on network performance
  – Critical role of feedback at all levels
• We seek a theory for Internet topology that is explanatory and not merely descriptive.
• Consider the ability to match large scale statistics (e.g. power laws) as secondary evidence of having accounted for key factors affecting design

HOT
Highly / Heavily / Heuristically
Optimized / Organized
Tolerance / Tradeoffs
• Based on ideas of Carlson and Doyle
• Complex structure (including power laws) of highly engineered technology (and biological) systems is viewed as the natural by-product of tradeoffs between system-specific objectives and constraints
• Non-generic, highly engineered configurations are extremely unlikely to occur by chance

Heuristic Network Design
What factors dominate network design?
• Economic constraints
  – User demands
  – Link costs
  – Equipment costs
• Technology constraints
  – Router capacity
  – Link capacity

Internet End-User Bandwidths
[Figure: Connection Speed (Mbps, 1e-2 to 1e4) vs. Rank (number of users, 1 to 1e8). User classes: high performance computing, academic and corporate, residential and small business. Technologies: POS/Ethernet 1-10Gbps, Ethernet 10-100Mbps, Broadband Cable/DSL ~500Kbps, Dial-up ~56Kbps. A few users have very high speed connections; most users have low speed connections.]
How to build a network that satisfies these end user demands?

Economic Constraints
• Network operators have a limited budget to construct and maintain their networks
• Links are tremendously expensive
• Tremendous drive to operate network so that traffic shares the same links
  – Enabling technology: multiplexing
  – Resulting feature: traffic aggregation at edges
  – Diversity of technologies at network edge (Ethernet, DSL, broadband cable, wireless) is evidence of the drive to provide connectivity and aggregation using many media types

Heuristically Optimal Network
[Figure: Cores, Edges, Hosts]
Mesh-like core of fast, low degree routers
High degree nodes are at the edges.

Heuristically Optimal Network
Claim: economic considerations alone yield
• Mesh-like core of high-speed, low degree routers
• High degree, low-speed nodes at the edge
• Is this consistent with technology capability?
• Is this consistent with real network design?

Cisco 12000 Series Routers
• Modular in design, creating flexibility in configuration.
• Router capacity is constrained by the number and speed of line cards inserted in each slot.

Chassis   Rack size   Slots   Switching Capacity
12416     Full        16      320 Gbps
12410     1/2         10      200 Gbps
12406     1/4         6       120 Gbps
12404     1/8         4       80 Gbps
Source: www.cisco.com

Cisco 12000 Series Routers
Technology constrains the number and capacity of line cards that can be installed, creating a feasible region.

Cisco 12000 Series Routers
Pricing info: State of Washington Master Contract, June 2002 (http://techmall.dis.wa.gov/master_contracts/intranet/routers_switches.asp)
[Figure: bandwidth (155Mb, 625Mb, 2.5Gb, 10Gb, 160Gb) vs. degree (1, 16, 256), log/log. Labels: Technically feasible, Technological advance. Prices shown: $2,762,500; $1,667,500; $932,400; $560,500; $602,500; $381,500; $212,400; $128,500]

Technologically Feasible Region
[Figure: Bandwidth (Mbps, 0.01 to 1000000) vs. degree (1 to 10000). Region labels: Core backbone, High-end gateways, Older/cheaper technology, Edge, Shared media (LAN, DSL, Cable, Wireless, Dial-up). Devices plotted: cisco 12416, cisco 12410, cisco 12406, cisco 12404, cisco 7500, cisco 7200, cisco 3600/3700, cisco 2600, linksys 4-port router, uBR7246 cmts (cable), cisco 6260 dslam (DSL), cisco AS5850 (dialup)]

Sprint backbone
[Figure: Sprint backbone]

Abilene Backbone Physical Connectivity (as of December 16, 2003)
[Figure: map of Abilene backbone nodes and the regional networks and institutions attached to them. Link legend: OC-3 (155 Mb/s), OC-12 (622 Mb/s), GE (1 Gb/s), OC-48 (2.5 Gb/s), OC-192/10GE (10 Gb/s)]

CENIC Backbone (as of January 2004)
[Figure: CENIC backbone nodes (COR, OAK, SAC, FRG, SVL, FRE, SOL, BAK, SLO, LAX, TUS, SDG) with links to Abilene at Sunnyvale and Los Angeles. Legend: OC-3 (155 Mb/s), OC-12 (622 Mb/s), GE (1 Gb/s), OC-48 (2.5 Gb/s), 10GE (10 Gb/s); Cisco 750X, Cisco 12008, Cisco 12410]
The backbone topologies of both Abilene and CENIC are built as meshes of high-speed, low-degree routers.
As one moves from the core out toward the edge, connectivity gets higher, and speeds get lower.

CENIC Backbone for Southern California
[Figure: Southern California portion of the CENIC backbone (SLO, BAK, LAX, TUS, SDG) with the colleges, universities, school districts, and peer networks attached at each node, and links to Sunnyvale, Fremont, Soledad, and Sacramento]

Heuristically Optimal Network
• Mesh-like core of high-speed, low degree routers
• High degree, low-speed nodes at the edge
• Claim: consistent with drivers of topology design
  – Economic considerations (traffic aggregation)
  – End user demands
• Claim: consistent with technology constraints
• Claim: consistent with real observed networks
Question: How could anyone imagine anything else?

Two opposite views of complexity
Physics:
• Pattern formation by reaction/diffusion
• Edge-of-chaos
• Order for free
• Self-organized criticality
• Phase transitions
• Scale-free networks
• Equilibrium, linear
• Nonlinear, heavy tails as exotica
Engineering and math:
• Constraints
• Tradeoffs
• Structure
• Organization
• Optimality
• Robustness/fragility
• Verification
• Far from equilibrium
• Nonlinear, heavy tails as tool

Models of Internet Topology
• Random graphs [Waxman ’88]
• Explicit hierarchy [Calvert/Zegura ’96]
• Power laws [Faloutsos et al. ’99]

Random Networks
Two methods for generating random networks having power law distributions in node degree
• Preferential attachment (“scale-free” networks)
  – Inspired by statistical physics
  – Barabasi et al.; 1999
• Power Law Random Graph (PLRG)
  – Inspired by graph theory
  – Aiello, Chung, and Lu; 2000
Common features:
• Ignore all system-specific details
• Central core of high-degree, hub-like nodes

Summary of Scale-Free Story
• Fact: Scale-free networks have roughly power law degree distributions
• Claim:
  – If the Internet has power law degree distribution
  – Then it must be scale-free (oops)
  – Therefore, it has the properties of a scale-free network

One of the most-read papers ever on the Internet!
Scientists spot Achilles heel of the Internet
• "The reason this is so is because there are a couple of very big nodes and all messages are going through them.
But if someone maliciously takes down the biggest nodes you can harm the system in incredible ways. You can very easily destroy the function of the Internet," he added.
• Barabasi, whose research is published in the science journal Nature, compared the structure of the Internet to the airline network of the United States.

Complexity Digest 2004.06, Feb. 09, 2004
Archive: http://www.comdig.org
13. Accurately Modeling the Internet Topology, arXiv
Abstract: To model the behavior of a network it is crucial to obtain a good des […] topology because structure affects function. When studying the topological prop […] Internet, we found out that there are two mechanisms which are necessary for th […] of the Internet: a nonlinear preferential growth, where the growth is described […] positive-feedback mechanism, and the appearance of new links between already ex […] show that the Positive-Feedback Preference (PFP) model, which is based on the a […] reproduces topological properties of the Internet such as: degree distribution, […] (rich-club connectivity), shortest path length, neighbor clustering, network re […] and rectangle coefficient), disassortative mixing (nearest-neighbors average de […] information flow pattern (betweenness centrality). We believe that these growth […] further study because they provide a novel insight into the evolutionary dynamics of networks.
* [38] Accurately Modeling the Internet Topology, Shi Zhou, Raul J. Mondragon, 2004-02-05, arXiv

Key Points
• The scale-free story is based critically on the implied relationship between power laws and a network structure that has highly connected “central hubs”
  – Not all networks with power law degree distributions have properties of scale-free networks. (The Internet is just one example!)
  – Building a model to replicate power law data is no more than curve fitting (descriptive, not explanatory)
• The scale-free models ignore all system-specific details in making their claims
  – Ignore architecture (e.g. hardware, protocol stack)
  – Ignore objectives (e.g. performance)
  – Ignore constraints (e.g. geography, economics)

End Result
The scale-free claims of the Internet are not merely wrong; they suggest properties that are the opposite of the real thing.
Fundamental difference: random vs. designed

Internet topologies
nodes = routers, edges = links
25 interior routers, 818 end systems
“scale-rich” vs. scale-free
[Figure: two topologies, one with a low degree mesh-like core and one with a high degree hub-like core, and a rank vs. degree plot showing identical power-law degrees]
How to characterize / compare these two networks?

Network Performance
Given realistic technology constraints on routers, how well is the network able to carry traffic?
Step 1: Constrain to be feasible
[Figure: Abstracted Technologically Feasible Region. Bandwidth (Mbps) vs. degree]
Step 2: Compute traffic demand
$$x_{ij} \propto B_i B_j$$
Step 3: Compute max flow
$$\max \sum_{i,j} x_{ij} \quad \text{s.t.} \quad \sum_{i,j:\, k \in r_{ij}} x_{ij} \le B_k, \;\; \forall k$$

Network Likelihood
How likely is a particular graph (having given node degree distribution) to be constructed?
• Notion of likelihood depends on defining an appropriate probability space for random graphs.
• Many methods (all based on probabilistic preferential attachment) for randomly generating graphs having power law degree distributions:
  – Power Law Random Graph (PLRG) [Aiello et al.]
  – Random rewiring (Markov chains)
In both cases,
$$\text{LogLikelihood (LLH)} \propto \sum_{i,j \text{ connected}} d_i d_j$$

Why such striking differences with same node degree distribution?
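The likelihood measure on the Network Likelihood slide (a sum of degree products over connected pairs) is easy to compute directly. The sketch below is an illustration only: the two eight-node graphs, the function names, and the pairwise edge-swap used for "random rewiring" are assumptions, not the construction used in the talk. Both toy graphs have the same degree sequence (3, 3, 2, 2, 1, 1, 1, 1); they differ only in whether the two highest-degree nodes are adjacent.

```python
import random
from collections import Counter


def degrees(edges):
    """Node degrees of an undirected simple graph given as (u, v) pairs."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def degree_product_sum(edges):
    """Sum of d_i * d_j over connected pairs; the slide's LLH is proportional to it."""
    deg = degrees(edges)
    return sum(deg[u] * deg[v] for u, v in edges)


def rewire(edges, swaps, seed=0):
    """Degree-preserving rewiring (assumed form): repeatedly pick two edges
    (a, b), (c, d) and replace them with (a, d), (c, b), rejecting any swap
    that would create a self-loop or a duplicate edge."""
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    for _ in range(swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if len({a, b, c, d}) < 4:
            continue  # shared endpoint: swap is a self-loop or a no-op
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present:
            continue  # would duplicate an existing edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
    return edges


# Hypothetical toy graphs with identical degrees.
hubs_adjacent = [(0, 1), (0, 2), (0, 4), (1, 3), (1, 5), (2, 6), (3, 7)]
hubs_apart = [(0, 2), (2, 1), (0, 4), (0, 5), (1, 6), (1, 3), (3, 7)]

print(degree_product_sum(hubs_adjacent))  # 31
print(degree_product_sum(hubs_apart))     # 29
rewired = rewire(hubs_adjacent, swaps=100)
print(degrees(rewired) == degrees(hubs_adjacent))  # True
```

Connecting the high-degree nodes to each other raises the sum, which is in line with the talk's point that hub-like cores are the "likely" configurations while the degree sequence alone does not determine the structure. Note that this simple rewiring preserves degrees but does not guarantee the graph stays connected.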
[Figure: Performance (slow to fast) vs. Likelihood (low to high) for the two networks]
[Figure: Bandwidth (Mbps) vs. degree for the two networks. One has a fast core and a high-degree edge; the other has a slow core and a slower edge]

HOT scale-rich
• Core: Mesh-like, low degree
• Edge: High degree
• Robust to random
• Robust to “attack”
Scale-free
• Core: Hub-like, high degree
• Edge: Low degree
• Robust to random
• Fragile to “attack”

+ objectives and constraints
HOT scale-rich
• High performance
• Low link costs
• Unlikely, rare, designed
• Destroyed by rewiring
• Similar to real Internet
Scale-free
• Low performance
• High link costs
• Highly likely, generic
• Preserved by rewiring
• Opposite of real Internet

[Figure: Performance vs. Likelihood. Labels: HOT; Low Likelihood; Low Performance; Random; Hierarchical Scale-Free (HSF); Most Likely]

Universal features of complex networks
The only functional biological or technological networks are highly organized, robust, efficient, and very unlikely to arise at random.
[Figure: Robust, Efficient to Fragile, Wasteful vs. Likelihood (low to high). Labels: HOT; scale-free, critical, SOC, edge-of-chaos]
HOT = Highly Organized/Optimized Tradeoffs/Tolerance

[Figure: Rank of all metabolites vs. number of reactions. Labels: Carriers, Catabolism, Precursors, Amino acids, Nucleotides, Lipids & fatty acids, Cofactors]
[Figure: H. Pylori. Reactions and carriers]

Bowtie architecture
[Figure: Nutrients, Catabolism, Precursors, Carriers, Biosynthesis, Amino acids, Nucleotides, Fatty acids and Lipids, Cofactors]

Stoichiometry Matrix
[Figure: two reactions, one involving S1, S2, ATP, ADP and one involving S3, S4, NADH, NAD, with labels Substrate, Carrier, Reaction, Enzyme. The stoichiometry matrix has rows S1, S2, S3, S4, ATP, ADP, NADH, NAD and one column per reaction; each column has entries of magnitude 1 for the four species in its reaction and 0 elsewhere]

[Figure: stoichiometry matrix of the metabolic network, with metabolites grouped as carriers, precursors, and amino acids]

Wild type
[Figure: wild type network]
• Wild type (WT) is highly organized, structured
• Simple reactions
• Long assembly lines
• Universal common carriers
• Precursors and carriers are universal common currencies

Random
[Figure: randomly rewired network]
• Randomly rewire to get “scale-free” version
• Preserve
  – degree
  – carrier and enzyme
• Destroys structure
• Only one useful pathway remains

[Figure: Wild type vs. Random]

“Closing the loop”
[Figure: loop of Modeling, Measurement, Analysis, Validation]
[Figure: PLRG, Preferential Attachment, and HOT]

Internet Routing Technologies
[Figure: Total Router Bandwidth (Mbps, 1 to 1e6) vs. Degree (number of connections, 1 to 1e4). Labels: Core Routers, High-End Gateways, Older Cheaper Technology, Access Edge Routers, Shared Media, Abstracted Feasible Region, Per link bandwidth]

Internet Link Speeds
[Figure: Bandwidth / Link (Mbps, 1e-2 to 1e4) vs. Degree (number of connections, 1 to 1e4). Labels: Core Routers 10Gbps, Core/Edge Routers 1Gbps, Local Area Ethernet 10-100Mbps, Broadband Cable ~500Kbps, DSL ~500Kbps, Dial-up ~56Kbps]
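The three steps on the Network Performance slide (feasible router bandwidths, demand proportional to B_i B_j, maximum total traffic subject to each router's bandwidth) can be sketched in a few lines. The version below is a rough illustration under assumptions that are not in the slides: demand exists between every pair of nodes, traffic follows one hop-count shortest path, a node's bandwidth constrains all traffic on paths through it including its own endpoints, the graph is connected, and the star network and function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def shortest_path(adj, src, dst):
    """Breadth-first search; returns one hop-count shortest path as a node list.
    Assumes dst is reachable from src."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            break
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    path, node = [], dst
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


def max_throughput(adj, bandwidth):
    """Largest total traffic sum_{i<j} rho * B_i * B_j such that the traffic
    routed through every node k does not exceed B_k."""
    load = {k: 0.0 for k in adj}  # traffic through k per unit of rho
    total = 0.0                   # total demand per unit of rho
    for i, j in combinations(sorted(adj), 2):
        demand = bandwidth[i] * bandwidth[j]
        total += demand
        for k in shortest_path(adj, i, j):
            load[k] += demand
    # The tightest node constraint fixes the common scaling factor rho.
    rho = min(bandwidth[k] / load[k] for k in adj if load[k] > 0)
    return rho * total


# Hypothetical star: one router with bandwidth 10 serving three nodes of bandwidth 1.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
bw = {0: 10.0, 1: 1.0, 2: 1.0, 3: 1.0}
print(max_throughput(star, bw))
```

In this toy case the leaves, not the center, are the binding constraint: each leaf carries 12 units of demand per unit of rho against a bandwidth of 1, so rho = 1/12 and the total is 33/12. Running the same function on two topologies with the same degree sequence is one way to make the performance axis of the comparison concrete.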