Telegraph Status - Massachusetts Institute of Technology



Telegraph Status

Joe Hellerstein

Overview

• Telegraph Design Goals, Current Status
• First Application: FFF (Deep Web)
• Budding Application: Traffic Sensor Data
• Moving Forward

Telegraph: Adaptive Dataflow

• Dataflow
  – Siphon data from the “deep web”
  – Harness data streaming from sensors/traces
  – Flow through code
  – The API and Architecture for ubiquitous computing
• Why adaptive?
  – Sensor nets & wide area internet: volatile!
  – Like Telegraph Avenue, need to roll w/the changes
  – Adaptive techniques for routing data to machines & code

Demos Delivered!

• The big push: FFF Election 2000 demo 10/2000
  – Got Telegraph off the ground and live
  – Shows power of analysis & integration on web
    • It’s not just search any more!
  – Served thousands of live, long-running queries
• Initial Sensor Demo
  – UCB Institute for Transportation Studies data
  – Various web cams
  – Project for SIMS InfoVis class
• A harness for more sensor-oriented work in Telegraph

Telegraph v1 (alpha) infrastructure

• Single-site (multi-source) dataflow engine
  – All Java: some lessons here (paper in preparation)
• Numerous dataflow operators built
  – TeSS (Telegraph Screen Scraper)
  – File reader
  – Relational ops (filters, joins, grouping, aggregation)
  – Some simple sequence analysis ops
  – Eddy: adaptive flow ordering operator
• Key architectural theme: gain adaptivity via new operators
  – Not changes to dataflow infrastructure!
    • This is our upgrade strategy to parallelism/distribution
• SQL-to-Dataflow parser
  – SQL is a fine dataflow language for many tasks
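The idea of an operator that reorders the flow adaptively can be sketched in a few lines. This is a minimal illustration, not Telegraph's eddy: the `Op` class, the `eddy` function, and the "lowest observed pass rate first" routing rule are all assumptions made for the example (the slides do not specify a routing policy).

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List


@dataclass
class Op:
    """A filter operator plus the running statistics kept for it (hypothetical)."""
    name: str
    predicate: Callable[[dict], bool]
    seen: int = 0
    passed: int = 0

    def pass_rate(self) -> float:
        # Smoothed estimate; an operator that has seen nothing starts at 1.0.
        return (self.passed + 1) / (self.seen + 1)


def eddy(tuples: Iterable[dict], ops: List[Op]) -> Iterator[dict]:
    """Route each tuple through all filters, most selective (so far) first.

    The order is re-chosen for every tuple from the observed pass rates,
    so it follows changes in the data instead of being fixed up front.
    """
    for t in tuples:
        alive = True
        for op in sorted(ops, key=Op.pass_rate):
            op.seen += 1
            if op.predicate(t):
                op.passed += 1
            else:
                alive = False  # tuple is dropped; skip the remaining operators
                break
        if alive:
            yield t
```

The result is the same as applying the filters in any fixed order; only the amount of work changes. Adaptivity lives entirely inside this one operator, and the surrounding dataflow plumbing is untouched.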

Upcoming Telegraph Operators

• Goal: Further adaptivity through competition
  – Multiple mirrored sources
    • Handle rate changes, failures, parallelism
  – Multiple alternate operators
  – STeM operator manages tradeoffs
    • STate Module, unifies caches, rendezvous buffers, join state
    • Competitive sources/operators share building/using STeMs

Vijayshankar Raman

[Figure legend: eddy + stem, eddy, static dataflow]
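One way to picture a state module that holds join state: each source gets a keyed store with a build (insert) and a probe (lookup) operation, and a join falls out of building into your own store and probing the other one. The sketch below is an illustration under that assumption; the `SteM` class, its method names, and `symmetric_join` are hypothetical and not taken from the Telegraph code.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


class SteM:
    """Keyed store for the tuples of one source (hypothetical interface)."""

    def __init__(self, key: str) -> None:
        self.key = key
        self.index: Dict[object, List[dict]] = defaultdict(list)

    def build(self, row: dict) -> None:
        # Remember the row so that later arrivals from the other side can find it.
        self.index[row[self.key]].append(row)

    def probe(self, value: object) -> List[dict]:
        # Return stored rows with a matching key (empty list if none).
        return list(self.index.get(value, []))


def symmetric_join(stream: Iterable[Tuple[str, dict]], key: str) -> Iterator[Tuple[dict, dict]]:
    """Join two interleaved sources, tagged "L" and "R", in arrival order."""
    stems = {"L": SteM(key), "R": SteM(key)}
    for source, row in stream:
        other = "R" if source == "L" else "L"
        stems[source].build(row)                    # build into own state
        for match in stems[other].probe(row[key]):  # probe the other side's state
            yield (row, match) if source == "L" else (match, row)
```

Because the state sits in separate modules rather than inside a join operator, the same stores could also serve as a cache for a slow source or as a shared buffer for competing operators, which is the kind of sharing the slide alludes to.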

Telegraph Nuts and Bolts 2

• Parallelism & Fault Tolerance
  – Continuous/long-running flows need fault-tolerance
  – Big flows need parallelism
    • Adaptive Load-Balancing req’d
  – FLUX operator: Exchange plus…
    • Adaptive flow partitioning (River)
    • Mobile operator state for full Load Balancing
    • Replicated flows & redundant state (RAID for operators)
    • Load rebalancing vs. vulnerability

Mehul Shah & Sirish Chandrasekaran
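Adaptive flow partitioning can be illustrated with a small sketch: hash tuples into more buckets than there are workers, track per-bucket load, and move a bucket (along with whatever operator state belongs to it) from the busiest worker to the idlest one. This is not the FLUX algorithm; the `Partitioner` class, the bucket scheme, and the rebalancing rule are assumptions made for the example.

```python
import zlib
from typing import List, Optional, Tuple


class Partitioner:
    """Hash-partitions keys into buckets and reassigns buckets to balance load."""

    def __init__(self, n_workers: int, n_buckets: int = 16) -> None:
        self.n_workers = n_workers
        self.n_buckets = n_buckets
        self.owner: List[int] = [b % n_workers for b in range(n_buckets)]
        self.bucket_load: List[int] = [0] * n_buckets

    def route(self, key: object) -> int:
        # crc32 gives a hash that is stable across runs (unlike str hash()).
        b = zlib.crc32(repr(key).encode()) % self.n_buckets
        self.bucket_load[b] += 1
        return self.owner[b]

    def worker_load(self) -> List[int]:
        load = [0] * self.n_workers
        for b, n in enumerate(self.bucket_load):
            load[self.owner[b]] += n
        return load

    def rebalance(self) -> Optional[Tuple[int, int, int]]:
        """Move one bucket from the busiest to the idlest worker, if that helps.

        Returns (bucket, from_worker, to_worker), or None if no move helps.
        """
        load = self.worker_load()
        hot = max(range(self.n_workers), key=load.__getitem__)
        cold = min(range(self.n_workers), key=load.__getitem__)
        gap = load[hot] - load[cold]
        # Only a bucket lighter than the gap narrows the hot/cold difference.
        candidates = [b for b in range(self.n_buckets)
                      if self.owner[b] == hot and 0 < self.bucket_load[b] < gap]
        if not candidates:
            return None
        b = min(candidates, key=lambda c: abs(self.bucket_load[c] - gap / 2))
        self.owner[b] = cold  # a real system would ship the bucket's state too
        return (b, hot, cold)
```

Replicating each bucket on a second worker would add the fault-tolerance side of the picture; the slide's "load rebalancing vs. vulnerability" points at the cost of moving state while it is replicated.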

Further Directions & Goals

• Deep Web Trawling & Privacy Issues
  – We’re about to crawl web DBs (What? How much?)
  – Can do some fascinating/creepy things
  – Consider privacy & accuracy: countermeasures, incentives, etc.
  Mehul Shah (w/ Varian, Papadimitriou, L. Hellerstein & T. Suel)
• Data Dissemination & Continuous Queries
  – Franklin’s XFILTER: XML pub/sub
  – New automata-based techniques from CS262
  – Extend/integrate for pub/sub on general Telegraph flows
  Yanlei Diao/Asha Tarachandani
• Sensor/Trace Data Apps
  – Bay Area traffic. Would like to do TinyOS (nobody on it yet)
  – Software traces? OceanStore?

Sam Madden