ICDE2009 - Renmin University of China


ICDE2009 Keynotes
Summary
Shanghai, China, 3.29-4.2
Li Yukun
Outline
Keynotes
• Search Computing (Stefano Ceri)
• Data Management in the Cloud (Raghu Ramakrishnan)
• Why Can't I Find My Data the Way I Find My Dinner? (David Carlson)
Keynote 1
Search Computing
Stefano Ceri
Dipartimento di Elettronica e Informazione, Politecnico di Milano
Piazza L. Da Vinci 32, 20133 Milano, Italy
[email protected]
Motivation
• “Who are the strongest European competitors on software ideas?
• Who is the best doctor to cure insomnia in a nearby hospital?
• Where can I attend an interesting conference in my field close to a sunny beach?”
This information is available on the Web, but no software
system can accept such queries nor compute the answer.
Core model for search computing
• Conventional services
  Are abstracted as systems producing sets of equal-weight answers.
• Service computing
  A cross-discipline that covers the science and technology of bridging the gap between business services and IT services.
  Its goal is to enable IT services and computing technology to perform business services more efficiently and effectively.
• Search services
  Can be abstracted as systems producing ranked lists of answers.
• Search computing
  A new paradigm in which ranking is the dominant factor for composing services.
  A multi-domain query is answered by a constellation of cooperating search services, possibly dynamically selected.
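As a rough illustration of composing services by ranking, the sketch below joins the ranked answers of two search services on a shared attribute and keeps the best combined pairs. It is not the SeCo method: the Hit type, the join_ranked function, the product scoring rule and all sample data are assumptions made up for this example.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    item: str     # what the service returned
    city: str     # attribute shared by both services (the join key)
    score: float  # service-specific relevance in [0, 1], higher is better


def join_ranked(left, right, k=3):
    """Join two ranked result lists on city and keep the k best pairs."""
    pairs = [
        (a.score * b.score, a.item, b.item, a.city)
        for a in left
        for b in right
        if a.city == b.city
    ]
    # Best combined score first; item names break ties deterministically.
    pairs.sort(key=lambda p: (-p[0], p[1], p[2]))
    return pairs[:k]


# Made-up answers from two hypothetical search services.
conferences = [Hit("DB Conf", "Nice", 0.9), Hit("Web Conf", "Oslo", 0.8),
               Hit("IR Conf", "Bari", 0.6)]
beaches = [Hit("Sunny Bay", "Bari", 0.9), Hit("Blue Coast", "Nice", 0.7),
           Hit("Fjord Beach", "Oslo", 0.2)]

top = join_ranked(conferences, beaches, k=2)
for combined, conference, beach, city in top:
    print(f"{city}: {conference} + {beach} ({combined:.2f})")
```

A real engine would avoid materializing every pair and would pull results from each service incrementally; the nested loop is only meant to show why the ranking of both inputs matters to the composed answer.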
CHAPTERS OF SEARCH COMPUTING
• Theory for search computing
  Select the best abstractions covering the concepts
  Design basic operations on services and algorithms
  Compute time and space complexity
• Statistical models for search services
  Build statistical estimators of the number and quality of the results
• Optimization methods for search computing
• Description abstractions for search services
  Expose ranking-specific properties of search services
• Language abstractions for search computing
  Incorporate the ranking aspects and strategies for dealing with rankings
CHAPTERS OF SEARCH COMPUTING
• Human-computer interfaces
  Expressing ranking preferences
  Light-weight user interaction
• Semantics
  Merging the results of heterogeneous search services
  Semantic “join” of search services
• Higher-order ranking
  “Ranking of rankings” is essential for selecting and prioritizing search services
  A multi-level ranking
• Managing individual and social searching
  Adapting search strategies to user profiling or to past user interactions
  Societal recommendation and evaluation
  Thus, individual and societal aspects are key ingredients for search computing
CHAPTERS OF SEARCH COMPUTING
• Search computing engineering
  Designing, assembling and deploying search computing software applications
• Economy of search computing
  Suitable business models, based upon advertising schemes, pay-per-query, subscription fees, micro-billing, and so on
• Security and privacy of search computing
  Control of how data is used
  For instance, use of a search service could be granted to a service computing application, provided that the service’s owners can trace all queries involving their data and limit the kind of information that is made visible to the queries.
PROJECT ORGANIZATION
• Funded by the European Research Council in the framework of the IDEAS Advanced Grants
• Started on Nov. 1, 2008 and will last five years
PROJECT ORGANIZATION
• The project involves about 30 researchers at Politecnico di Milano:
  Abdan Abid, Edoardo Amaldi, Alessandro Bozzon, Daniele Maria Braga, Marco Brambilla, Tommaso Buganza, Alessandro Campi, Sofia Ceppi, Sara Comai, Emanuele Della Valle, Piero Fraternali, Nicola Gatti, Michael Grossniklaus, Ma’moun Abu Hellu, Pier Luca Lanzi, Davide Martinenghi, Marco Masseroli, Maristella Matera, Davide Mazza, Giuseppe Pozzi, Stefania Ronchi, Roberto Verganti, Marco Tagliasacchi, Massimo Tisi.
• SeCo has an advisory board:
  Edoardo Amaldi (Operations Research)
  Fabio Casati (Service Computing)
  Georg Gottlob (Theory)
  Ioana Manolescu (Systems and Performance)
  Roberto Verganti (Business Models)
  Gerhard Weikum (Information Retrieval for the Web)
  Jennifer Widom (Languages and Paradigms)
• Seven teams:
  Concept team
  Theory and methods
  Service registration and management
  Query processing
  Interaction design
  Tools and prototypes
  Business models and technology watch
More information on SeCo is available on the project’s Web site:
http://home.dei.polimi.it/ceri/seco/index.html
Keynote 2: Data Management in the Cloud
Yahoo! Research
CCDI
Raghu Ramakrishnan
Brian Cooper
Utkarsh Srivastava
Adam Silberstein
Nick Puz
Rodrigo Fonseca
Chuck Neerdaels
P.P.S. Narayan
Kevin Athey
Toby Negrin
Plus Dev/QA teams
Pie-in-the-sky
SCENARIOS
Living in the Clouds
We want to start a new website, FredsList.com
Our site will provide listings of items for sale, jobs, etc.
As time goes on, we’ll add more features to illustrate how more cloud capabilities are used as needed
The list of capabilities/components is illustrative, not exhaustive
Step 1: Listings
FredsList wants to store listings as (key, category, description)
FredsList.com application
Example listings:
(1234323, transportation, "For sale: one bicycle, barely used")
(5523442, childcare, "Nanny available in San Jose")
(215534, wanted, "Looking for issue 1 of Superman comic book")
Simple Web Service APIs
Sherpa (Database)
DECLARE DATASET Listings AS
( ID String PRIMARY KEY,
  Category String,
  Description Text )
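To make the (key, category, description) layout concrete, here is a minimal in-memory sketch of such a record store. It is not Sherpa's API: the Listing and ListingStore names and their put/get methods are hypothetical stand-ins for whatever the hosted web service exposes.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    id: str            # primary key
    category: str
    description: str


class ListingStore:
    """In-memory stand-in for a hosted record store (hypothetical API)."""

    def __init__(self):
        self._records = {}

    def put(self, listing):
        # Insert or overwrite the record stored under its primary key.
        self._records[listing.id] = listing

    def get(self, listing_id):
        # Return the record, or None when the key is unknown.
        return self._records.get(listing_id)


store = ListingStore()
store.put(Listing("1234323", "transportation", "For sale: one bicycle, barely used"))
store.put(Listing("5523442", "childcare", "Nanny available in San Jose"))
store.put(Listing("215534", "wanted", "Looking for issue 1 of Superman comic book"))

print(store.get("5523442"))
```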
Step 2: Search
FredsList’s customers quickly ask for keyword search
FredsList.com application
Example keyword queries: "dvd's", "bicycle", "nanny"
ALTER Listings
SET Description SEARCHABLE
Simple Web Service APIs
Components: Sherpa (Database), Vespa (Search), YMB (Messaging)
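Keyword search over the Description field can be pictured as an inverted index from terms to listing keys. The sketch below is a toy version under that assumption; it does not reflect how Vespa works internally, and the KeywordIndex class and tokenize helper are invented for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class KeywordIndex:
    """Toy inverted index: term -> set of listing keys."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, key, description):
        for term in tokenize(description):
            self._postings[term].add(key)

    def search(self, query):
        """Return the keys whose description contains every query term."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return result


index = KeywordIndex()
index.add("1234323", "For sale: one bicycle, barely used")
index.add("5523442", "Nanny available in San Jose")
index.add("215534", "Looking for issue 1 of Superman comic book")

print(index.search("bicycle"))
print(index.search("nanny san jose"))
```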
Step 3: Photos
FredsList decides to add photos to listings
FredsList.com application
ALTER Listings
ADD Photo BLOB
Simple Web Service APIs
Components: Sherpa (Database), MObStor (Storage), Vespa (Search), YMB (Messaging)
Foreign key: photo → listing
Step 4: Data Analysis
FredsList wants to analyze its listings to get statistics about category, do geocoding, etc.
FredsList.com application
Analysis jobs:
• Pig query to analyze categories
• Hadoop program to geocode data
• Hadoop program to generate fancy pages for listings
ALTER Listings
MAKE ANALYZABLE
Simple Web Service APIs
Components: Grid (Compute), Sherpa (Database), MObStor (Storage), Vespa (Search), YMB (Messaging)
Foreign key: photo → listing
Batch export
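The category statistics mentioned in this step boil down to a group-and-count over exported listings. The sketch below mimics the map and reduce phases in plain Python; it is not Pig or Hadoop code, and the function names and the fourth sample row are assumptions added for illustration.

```python
from collections import defaultdict

# Exported listings as (key, category, description); the last row is made up.
listings = [
    ("1234323", "transportation", "For sale: one bicycle, barely used"),
    ("5523442", "childcare", "Nanny available in San Jose"),
    ("215534", "wanted", "Looking for issue 1 of Superman comic book"),
    ("777", "wanted", "Wanted: used dvd player"),
]


def map_phase(records):
    """Emit one (category, 1) pair per listing."""
    for _key, category, _description in records:
        yield category, 1


def reduce_phase(pairs):
    """Sum the emitted counts per category."""
    totals = defaultdict(int)
    for category, count in pairs:
        totals[category] += count
    return dict(totals)


counts = reduce_phase(map_phase(listings))
print(counts)
```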
Step 5: Performance
FredsList wants to reduce its data access latency
FredsList.com application
ALTER Listings
MAKE CACHEABLE
Simple Web Service APIs
Components: Grid (Compute), Sherpa (Database), MObStor (Storage), Vespa (Search), memcached (Caching), YMB (Messaging)
Foreign key: photo → listing
Batch export
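One common way a cache reduces access latency is the cache-aside pattern: read from the cache first, fall back to the database on a miss, and invalidate on write. The sketch below shows that pattern with plain dictionaries standing in for memcached and the database; the CachedStore class and its counter are hypothetical, and the slide does not say which caching policy is actually used.

```python
class CachedStore:
    """Cache-aside wrapper around a dict-like backend (illustrative only)."""

    def __init__(self, backend):
        self._backend = backend   # stands in for the database
        self._cache = {}          # stands in for memcached
        self.backend_reads = 0    # counts how often the slow path is taken

    def get(self, key):
        if key in self._cache:
            return self._cache[key]          # fast path: cache hit
        self.backend_reads += 1
        value = self._backend.get(key)       # slow path: go to the backend
        if value is not None:
            self._cache[key] = value
        return value

    def put(self, key, value):
        self._backend[key] = value
        self._cache.pop(key, None)           # invalidate the stale entry


db = {"1234323": "For sale: one bicycle, barely used"}
cached = CachedStore(db)

first = cached.get("1234323")    # miss, reads the backend
second = cached.get("1234323")   # hit, served from the cache
reads_before_update = cached.backend_reads

cached.put("1234323", "SOLD")
third = cached.get("1234323")    # miss again after invalidation
```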
Motherhood-and-Apple-Pie
EYES TO THE SKIES
Requirements for Cloud Services
• Multitenant
  A cloud service must support multiple, organizationally distant customers.
• Elasticity
  Tenants should be able to negotiate and receive resources/QoS on-demand.
• Resource Sharing
  Ideally, spare cloud resources should be transparently applied when a tenant’s negotiated QoS is insufficient.
• Horizontal scaling
  It should be possible to add cloud capacity in small increments; this should be transparent to the tenants.
• Metering
  A cloud service must support accounting that reasonably ascribes operational and capital expenditures to each of the tenants of the service.
• Security
  A cloud service should be secure in that tenants are not made vulnerable because of loopholes in the cloud.
• Availability
  A cloud service should be highly available.
• Operability
  A cloud service should be easy to operate.
Types of Cloud Services
Two kinds of cloud services:
• Horizontal Cloud Services
  Functionality enabling tenants to build applications or new services on top of the cloud
• Functional Cloud Services
  Functionality that is useful in and of itself to tenants, e.g., various SaaS instances such as Salesforce.com, Google Analytics and Yahoo!’s IndexTools, and Yahoo! properties aimed at end-users and small businesses, e.g., Flickr, Groups, Mail, News, Shopping
  Yahoo! has been offering these for a long while (e.g., Mail for SMB, Groups, Flickr, BOSS, Ad exchanges)
SHERPA
To Help You Scale Your Mountains of Data
The Sherpa Solution
The next generation global-scale record store
• Record-orientation: routing and data storage optimized for low-latency record access
• Scale out: add machines to scale throughput (while keeping latency low)
• Asynchrony: pub-sub replication to far-flung datacenters to mask propagation delay
• Consistency model: reduce the complexity of asynchrony for the application programmer
• Cloud deployment model: hosted, managed service to reduce app time-to-market and enable on-demand scale and elasticity
QUERY PROCESSING
Accessing Data
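[Figure: read path for a single record. (1) The client sends "Get key k" to a router; (2) the router forwards "Get key k" to the storage unit (SU) holding k; (3) the SU returns the record for key k; (4) the record for key k is returned to the client.]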
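Bulk Read
[Figure: bulk read. (1) The client sends the key set {k1, k2, … kn} to a scatter/gather server; (2) the server issues Get k1, Get k2, Get k3, … to the storage units (SUs) and gathers the records.]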
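The bulk read can be sketched as a scatter/gather step: group the requested keys by storage unit, fetch each group in parallel, and merge the partial results. The code below is a toy model, not Sherpa's implementation; the three in-memory units, the CRC-based placement rule and the put/bulk_get names are all assumptions for illustration.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

NUM_UNITS = 3
units = [dict() for _ in range(NUM_UNITS)]   # in-memory stand-ins for SUs


def unit_of(key):
    """Pick a storage unit deterministically (assumed placement rule)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_UNITS


def put(key, record):
    units[unit_of(key)][key] = record


def bulk_get(keys):
    """Scatter the keys to their storage units, then gather the records."""
    by_unit = defaultdict(list)
    for key in keys:
        by_unit[unit_of(key)].append(key)

    def fetch(unit_id, unit_keys):
        unit = units[unit_id]
        return {k: unit[k] for k in unit_keys if k in unit}

    results = {}
    with ThreadPoolExecutor(max_workers=NUM_UNITS) as pool:
        futures = [pool.submit(fetch, uid, ks) for uid, ks in by_unit.items()]
        for future in futures:
            results.update(future.result())
    return results


for n in range(1, 6):
    put(f"k{n}", f"record {n}")

found = bulk_get(["k1", "k3", "missing"])
print(found)
```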
Range Queries in YDOT
• Clustered, ordered retrieval of records
[Figure: range query over an ordered table. The router maps key ranges to storage units: from Apple to storage unit 1, from Cantaloupe to storage unit 3, from Lime to storage unit 2, from Strawberry to storage unit 1. Storage unit 1 holds Apple, Avocado, Banana, Blueberry and Strawberry, Tomato, Watermelon; storage unit 3 holds Cantaloupe, Grape, Kiwi, Lemon; storage unit 2 holds Lime, Mango, Orange. The query "Grapefruit…Pear?" is split at the router into "Grapefruit…Lime?" (storage unit 3) and "Lime…Pear?" (storage unit 2).]
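The router's job in the range query can be sketched with a sorted list of tablet boundaries and a binary search. The boundaries and storage-unit numbers below are taken from the slide's example; the function names and the half-open range convention are assumptions, and this is not the actual router code.

```python
import bisect

# Each tablet starts at the given key and runs up to the next boundary.
BOUNDARIES = ["Apple", "Cantaloupe", "Lime", "Strawberry"]
UNITS = [1, 3, 2, 1]   # storage unit serving each tablet


def tablet_index(key):
    """Index of the tablet whose range contains key."""
    return max(bisect.bisect_right(BOUNDARIES, key) - 1, 0)


def split_range(start, end):
    """Split [start, end) into (sub_start, sub_end, storage_unit) pieces."""
    pieces = []
    i = tablet_index(start)
    lo = start
    while lo < end:
        upper = BOUNDARIES[i + 1] if i + 1 < len(BOUNDARIES) else None
        piece_end = end if upper is None or end <= upper else upper
        pieces.append((lo, piece_end, UNITS[i]))
        lo = piece_end
        i += 1
    return pieces


plan = split_range("Grapefruit", "Pear")
for sub_start, sub_end, unit in plan:
    print(f"{sub_start}..{sub_end} -> storage unit {unit}")
```

With these boundaries the query from the slide is split at Lime, so each storage unit only scans the part of the range it holds.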
Updates
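[Figure: update path, steps 1 to 8. A "Write key k" request travels from the client through the routers to a storage unit (SU), which publishes it to the message brokers; once the brokers report SUCCESS, the write is propagated to the other SUs, and a sequence number for key k is returned through the router to the client.]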
ASYNCHRONOUS REPLICATION AND CONSISTENCY
Asynchronous Replication
Consistency Model
• Goal: make it easier for applications to reason about updates and cope with asynchrony
• What happens to a record with primary key “Brian”?
[Figure: timeline of the record. It is inserted as v. 1, then successive updates produce v. 2 through v. 8, all within Generation 1; the timeline also shows a delete.]
Consistency Model: operations on the timeline
(In each figure, v. 1 to v. 7 are stale versions and v. 8 is the current version, all in Generation 1.)
• Read: may return a stale version
• Read up-to-date: returns the current version, v. 8
• Read ≥ v.6: returns a version no older than v. 6
• Write: creates a new version after the current one
• Write if = v.7: ERROR, because the current version is already v. 8
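The five operations on the version timeline can be mimicked with a small class. The method names follow the call names used in the PNUTS paper listed under Further Reading (read-any, read-latest, read-critical, write, test-and-set-write); everything else, including the single in-memory history and the replica_version parameter that models a lagging replica, is an assumption for illustration.

```python
class VersionedRecord:
    """Toy record with a linear version history (one generation)."""

    def __init__(self, first_value):
        self._values = [first_value]   # _values[i] is the value of version i + 1

    @property
    def current(self):
        return len(self._values)

    def read_any(self, replica_version):
        """Return whatever version a (possibly lagging) replica holds."""
        return replica_version, self._values[replica_version - 1]

    def read_latest(self):
        """Return the current version."""
        return self.current, self._values[-1]

    def read_critical(self, required, replica_version):
        """Return a version >= required; use the latest if the replica lags."""
        if replica_version >= required:
            return self.read_any(replica_version)
        return self.read_latest()

    def write(self, value):
        self._values.append(value)
        return self.current

    def test_and_set_write(self, expected, value):
        """Write only if the current version is the one the caller read."""
        if expected != self.current:
            raise ValueError(f"expected v.{expected}, current is v.{self.current}")
        return self.write(value)


record = VersionedRecord("value 1")
for n in range(2, 9):
    record.write(f"value {n}")        # the record is now at v. 8

stale = record.read_any(replica_version=5)
fresh = record.read_latest()
at_least_6 = record.read_critical(required=6, replica_version=7)
try:
    record.test_and_set_write(expected=7, value="value 9")
    conflict = False
except ValueError:
    conflict = True                   # mirrors the ERROR case on the slide
```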
Index Maintenance
How to have lots of interesting indexes,
without killing performance?
Solution: Asynchrony!
Indexes updated asynchronously when base
table updated
Planned functionality
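Asynchronous index maintenance can be sketched as a queue of pending index updates that is drained separately from the base-table write. The slide marks this as planned functionality, so the Table class, its category index and the apply_index_updates method below are purely hypothetical.

```python
from collections import deque


class Table:
    """Base table with a secondary index that is updated lazily."""

    def __init__(self):
        self.rows = {}            # key -> category (the base table)
        self.by_category = {}     # category -> set of keys (secondary index)
        self.pending = deque()    # index updates not yet applied

    def put(self, key, category):
        """Write the base table now and queue the index update for later."""
        old = self.rows.get(key)
        self.rows[key] = category
        self.pending.append((key, old, category))

    def apply_index_updates(self):
        """Drain the queue; in a real system this would run in the background."""
        while self.pending:
            key, old, new = self.pending.popleft()
            if old is not None:
                self.by_category.get(old, set()).discard(key)
            self.by_category.setdefault(new, set()).add(key)


table = Table()
table.put("215534", "wanted")
index_before = dict(table.by_category)      # index lags behind the write

table.apply_index_updates()
index_after = {c: set(keys) for c, keys in table.by_category.items()}

table.put("215534", "collectibles")         # move the listing to a new category
table.apply_index_updates()
```

The write returns before the index is touched, which keeps write latency low at the cost of a window in which index lookups can be stale.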
SHERPA IN CONTEXT
MObStor
• Yahoo!’s next-generation globally replicated, virtualized media object storage service
• Better provisioning, easy migration, replication, better BCP, and performance
• New features (Evergreen URLs, CDN integration, REST API, …)
• The object metadata problem is addressed using Sherpa, though MObStor is focused on blob storage.
Storage & Delivery Stack
The World Has Changed
Web applications need
• Scalability
• Geographic distribution
• High availability
• Reliable storage
Web applications are a poor fit for
• Complicated queries
• Strong transactions
Web Data Management
Large data analysis (Hadoop)
• Scan oriented workloads
• Focus on sequential disk I/O
• $ per CPU cycle
Structured record storage (PNUTS)
• CRUD
• Point lookups and short scans
• Index organized table and random I/Os
• $ per latency
Blob storage (SAN/NAS)
• Object retrieval and streaming
• Scalable file storage
• $ per GB
Application Design Space
[Figure: systems placed along two axes, "Get a few things" versus "Scan everything" and "Records" versus "Files". Systems shown: Sherpa, MySQL, Oracle, BigTable, Everest, MObStor, YMDB, Filer, Hadoop.]
Further Reading
Efficient Bulk Insertion into a Distributed Ordered Table (SIGMOD 2008)
Adam Silberstein, Brian Cooper, Utkarsh Srivastava, Erik Vee,
Ramana Yerneni, Raghu Ramakrishnan
PNUTS: Yahoo!'s Hosted Data Serving Platform (VLDB 2008)
Brian Cooper, Raghu Ramakrishnan, Utkarsh Srivastava,
Adam Silberstein, Phil Bohannon, Hans-Arno Jacobsen,
Nick Puz, Daniel Weaver, Ramana Yerneni
Keynote 3
Why Can’t I Find My Data the Way I Find My Dinner?
David Carlson
Director, International Polar Year International Programme Office
Cambridge, UK
[email protected]
International Polar Year (IPY)
One can find almost every
discipline represented in the IPY
projects, and funding has come
from geophysical, biological and
social agencies and programs.
IPY data
• Open access data policy
• Display and access of IPY data
  We have component systems, within nations, disciplines, or existing data service centers, that provide access examples for portions of the IPY data set.
  We have unprecedented bandwidth for real-time data transmission.
  But how can these data sets be accessed easily?
Enormous challenges
• Financial barriers
• Social and technical barriers
This talk focuses on the latter.
Example
To understand and predict the health of migratory bird populations in the polar environment, we need ornithological, toxicological, ecological, meteorological, hydrological, climatological, geomagnetic, and sociological data.
These data will cover a broad range of space and time scales, often in disparate (or at least inconsistent) space and time coordinate systems.
Problems
• Data access
  For a larger population of curious users, the specialized data services associated with subsets of the IPY data will not provide easy, friendly, or even accessible interfaces.
• Interfaces
  No familiar interfaces will provide integrated discovery and browse services.
• No long-term plan
  On longer time scales, and even as data storage capabilities grow rapidly, most of the IPY data sets do not, at present, have acceptable long-term archive plans, even for passive storage without continued discovery services.
Research issues
• Smart search engines, pattern recognition, data mining tools, multi-gigabyte personal storage devices, and advanced animation capabilities, coupled with almost unlimited mobile bandwidth, offer many citizens expansive and amazing access to commercial, recreational, financial, and personal data and data services.
• What changes in strategy, technology, funding and individual and collective behavior need to occur in the world of scientific data to allow me to browse, view and access IPY data on my iTouch?
Thanks