Transcript Telegraph Status - Massachusetts Institute of Technology
Telegraph Status
Joe Hellerstein
Overview
• Telegraph Design Goals, Current Status
• First Application: FFF (Deep Web)
• Budding Application: Traffic Sensor Data
• Moving Forward
Telegraph: Adaptive Dataflow
• Dataflow
  – Siphon data from the “deep web”
  – Harness data streaming from sensors/traces
  – Flow through code
  – The API and Architecture for ubiquitous computing
• Why adaptive?
  – Sensor nets & wide area internet: volatile!
  – Like Telegraph Avenue, need to roll w/the changes
  – Adaptive techniques for routing data to machines & code
Demos Delivered!
• The big push: FFF Election 2000 demo 10/2000
  – Got Telegraph off the ground and live
  – Shows power of analysis & integration on web
    • It’s not just search any more!
  – Served thousands of live, long-running queries
• Initial Sensor Demo
  – UCB Institute for Transportation Studies data
  – Various web cams
  – Project for SIMS InfoVis class
• A harness for more sensor-oriented work in Telegraph
Telegraph v1 (alpha) infrastructure
• Single-site (multi-source) dataflow engine
  – All Java: some lessons here (paper in preparation)
• Numerous dataflow operators built
  – TeSS (Telegraph Screen Scraper)
  – File reader
  – Relational ops (filters, joins, grouping, aggregation)
  – Some simple sequence analysis ops
  – Eddy: adaptive flow ordering operator
• Key architectural theme: gain adaptivity via new operators
  • Not changes to dataflow infrastructure!
  • This is our upgrade strategy to parallelism/distribution
• SQL-to-Dataflow parser
  – SQL is a fine dataflow language for many tasks
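The idea of an eddy as an "adaptive flow ordering operator" can be illustrated with a small sketch. This is not Telegraph's code (which is Java) and the slides do not describe the routing policy; the sketch assumes a simple greedy policy that sends each tuple first to whichever filter has the lowest observed pass rate. The names `FilterOp` and `Eddy` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class FilterOp:
    """A filter operator that tracks how many tuples it has seen and passed."""
    name: str
    predicate: Callable[[dict], bool]
    seen: int = 0
    passed: int = 0

    def pass_rate(self) -> float:
        # Optimistic default of 1.0 until the operator has seen any data.
        return self.passed / self.seen if self.seen else 1.0

    def apply(self, tup: dict) -> bool:
        self.seen += 1
        ok = bool(self.predicate(tup))
        self.passed += int(ok)
        return ok


class Eddy:
    """Routes each tuple through all filters, choosing the order per tuple.

    Assumed policy (not from the slides): try the filter with the lowest
    observed pass rate first, so tuples are dropped as early as possible.
    """

    def __init__(self, ops: Iterable[FilterOp]):
        self.ops: List[FilterOp] = list(ops)

    def route(self, tup: dict) -> bool:
        pending = list(self.ops)
        while pending:
            op = min(pending, key=FilterOp.pass_rate)
            pending.remove(op)
            if not op.apply(tup):
                return False  # tuple dropped; remaining filters are skipped
        return True

    def run(self, tuples: Iterable[dict]) -> List[dict]:
        return [t for t in tuples if self.route(t)]
```

Because the ordering is recomputed for every tuple from running statistics, the flow order shifts when the data changes, without any change to the surrounding dataflow infrastructure.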
Upcoming Telegraph Operators
• Goal: Further adaptivity through competition
  – Multiple mirrored sources
    • Handle rate changes, failures, parallelism
  – Multiple alternate operators
  – STeM operator manages tradeoffs
    • STate Module, unifies caches, rendezvous buffers, join state
    • Competitive sources/operators share building/using STeMs
Vijayshankar Raman
[Figure: static dataflow; eddy; eddy + stem]
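A rough sketch of how a state module might serve as shared join state: each source builds into its own STeM and probes the other one, which gives a symmetric join that works in whatever order tuples arrive. The class and function names (`STeM`, `symmetric_join`) and the build/probe interface are assumptions for illustration, not Telegraph's actual API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


class STeM:
    """Hypothetical state module: a keyed store supporting build and probe."""

    def __init__(self, key: str):
        self.key = key
        self.index: Dict[object, List[dict]] = defaultdict(list)

    def build(self, tup: dict) -> None:
        # Insert a tuple into the state, indexed on the join key.
        self.index[tup[self.key]].append(tup)

    def probe(self, value: object) -> List[dict]:
        # Return all stored tuples matching the value (empty list if none).
        return list(self.index.get(value, []))


def symmetric_join(
    stream: Iterable[Tuple[str, dict]], key: str
) -> Iterator[Tuple[dict, dict]]:
    """Join sources "R" and "S" on `key`, emitting (r, s) pairs as they match.

    `stream` yields (source_name, tuple) pairs in arrival order.
    """
    stems = {"R": STeM(key), "S": STeM(key)}
    other = {"R": "S", "S": "R"}
    for source, tup in stream:
        stems[source].build(tup)  # remember this tuple for later arrivals
        for match in stems[other[source]].probe(tup[key]):
            yield (tup, match) if source == "R" else (match, tup)
```

The same `STeM` could back a cache or a rendezvous buffer, since all three only need "store by key" and "look up by key".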
Telegraph Nuts and Bolts 2
• Parallelism & Fault Tolerance
  – Continuous/long-running flows need fault-tolerance
  – Big flows need parallelism
• Adaptive Load-Balancing req’d
  – FLUX operator: Exchange plus…
    • Adaptive flow partitioning – River
    • Mobile operator state for full Load Balancing
    • Replicated flows & redundant state (RAID for operators)
    • Load rebalancing vs. vulnerability
Mehul Shah & Sirish Chandrasekaran
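One way to picture adaptive flow partitioning: hash tuples into many small buckets, map buckets to workers, and move a bucket from the busiest worker to the idlest one when load drifts. The slides do not specify FLUX's mechanism, so everything here (`AdaptivePartitioner`, the bucket scheme, the rebalancing rule) is an illustrative assumption. A real system would also ship the bucket's operator state to the new owner; this sketch only remaps ownership.

```python
from collections import Counter
from typing import Dict, Hashable, List, Optional


class AdaptivePartitioner:
    """Hypothetical exchange-style router with movable hash buckets."""

    def __init__(self, workers: List[str], num_buckets: int = 16):
        self.workers = list(workers)
        self.num_buckets = num_buckets
        # Initial round-robin assignment of buckets to workers.
        self.owner: Dict[int, str] = {
            b: self.workers[b % len(self.workers)] for b in range(num_buckets)
        }
        self.bucket_load: Counter = Counter()

    def bucket_of(self, key: Hashable) -> int:
        # Note: Python's hash() is stable for ints but salted per process for str.
        return hash(key) % self.num_buckets

    def route(self, key: Hashable) -> str:
        """Record one tuple for `key` and return the worker that should get it."""
        b = self.bucket_of(key)
        self.bucket_load[b] += 1
        return self.owner[b]

    def worker_load(self) -> Dict[str, int]:
        load = {w: 0 for w in self.workers}
        for b, n in self.bucket_load.items():
            load[self.owner[b]] += n
        return load

    def rebalance(self) -> Optional[int]:
        """Move one bucket from the busiest to the idlest worker, if that helps.

        Returns the bucket moved, or None if no move would narrow the gap.
        """
        load = self.worker_load()
        hot = max(self.workers, key=load.get)
        cold = min(self.workers, key=load.get)
        gap = load[hot] - load[cold]
        # Moving a bucket of load x narrows the gap only when 0 < x < gap.
        candidates = [
            b for b, w in self.owner.items()
            if w == hot and 0 < self.bucket_load[b] < gap
        ]
        if not candidates:
            return None
        best = min(candidates, key=lambda b: abs(self.bucket_load[b] - gap / 2))
        self.owner[best] = cold
        return best
```

Using many more buckets than workers keeps each move small, which is what makes incremental rebalancing possible.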
Further Directions & Goals
• Deep Web Trawling & Privacy Issues
  – We’re about to crawl web DBs (What? How much?)
  – Can do some fascinating/creepy things
  – Consider privacy & accuracy: countermeasures, incentives, etc.
Mehul Shah (w/Varian, Papadimitriou, L. Hellerstein & T. Suel)
• Data Dissemination & Continuous Queries
  – Franklin’s XFILTER: XML pub/sub
  – New automata-based techniques from CS262
  – Extend/integrate for pub/sub on general Telegraph flows
Yanlei Diao/Asha Tarachandani
• Sensor/Trace Data Apps
  – Bay Area traffic. Would like to do TinyOS (nobody on it yet)
  – Software traces? OceanStore?
Sam Madden