Planning for the Web II
Execution & Service Integration
Dan Weld
University of Washington
June, 2003
© Daniel S. Weld, PLANET 2003 Tutorial on Data Integration
Acknowledgements
• Oren Etzioni
• Yolanda Gil
• Keith Golden
• Alon Halevy
• Zack Ives
• Tal Shaked
Caveat
Outline
• Execution for Data Integration
Coping with incomplete statistics, latency
Interleaved planning & execution
Convergent query processing
• Service Integration
Web service composition
• Background
• Representational issues
• Planning algorithms
Automated data analysis
Optimization and Execution
• Problem:
Few and unreliable statistics about the data.
Unexpected (possibly bursty) network transfer
rates.
Generally, unpredictable environment.
• General solution: (research area)
Adaptive query processing.
Interleave optimization and execution. As you
get to know more about your data, you can
improve your plan.
Adaptivity & Incremental Processing: Query Performance
[Figure: Tukwila components. A user's query goes through query translation to become a query over sources. Convergent query processing exchanges cost models, query plans, and execution stats with execution. Adaptive query execution operators and X-Scan incremental pattern matching consume XML sources reached over the Internet/WAN: an XML wrapper, an XML source, and a legacy source with an XML exporter.]
Evaluated within the Tukwila system [Ives PhD]
Query Optimization: Model Query Plans’ Execution & Choose the Best
[Figure: two candidate plans for joining Restock (R, 100 tuples), Orders (O, 50 tuples), and Shipping (S, 90 tuples). One plan joins R and O first (RO ~30 tuples); the other joins O and S first (OS ~15 tuples). Both produce ROS ~270 tuples; the estimated times shown are 50 sec and 30 sec.]
• From source sizes, stats, estimate result sizes, costs
• Estimates, assumptions introduce error:
Exponential increase in estimation error with each join
[Ioannidis & Christodoulakis 91] [Antoshenkov 93,96]
Worse if no detailed statistics
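The compounding of estimation error can be illustrated with a small calculation. This is a minimal sketch, not a model from the cited papers: it assumes a left-deep plan, and it assumes every selectivity estimate is off by the same multiplicative factor. The function names and the selectivity values are hypothetical, chosen only to reproduce the ~30 and ~270 tuple estimates on the slide.

```python
def estimated_cardinality(sizes, selectivities):
    """Estimate a left-deep join's result size as the product of the
    input sizes and the per-join selectivities."""
    result = sizes[0]
    for size, sel in zip(sizes[1:], selectivities):
        result *= size * sel
    return result


def compounded_error(per_join_error, num_joins):
    """Relative error factor after num_joins joins, ASSUMING every
    selectivity estimate is off by the same multiplicative factor."""
    return per_join_error ** num_joins


# Source sizes from the slide: Restock 100, Orders 50, Shipping 90.
# The selectivities are hypothetical, picked to match ~30 and ~270 tuples.
sizes = [100, 50, 90]
selectivities = [30 / (100 * 50), 270 / (30 * 90)]
print(round(estimated_cardinality(sizes, selectivities)))

# A 2x error per join grows to 4x after two joins and 32x after five.
for joins in (1, 2, 5):
    print(joins, compounded_error(2.0, joins))
```

Under these assumptions a modest per-join error becomes a large error on a five-join query, which is why plans chosen up front from weak statistics can be far from the best one.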
Why Does Data Integration Make
Optimization Harder?
Query optimization estimates costs using
knowledge about environment and data:
Data source sizes (“cardinalities”)
Often unavailable or not meaningful in data integration
Histograms
Too expensive to maintain in data integration
I/O costs
Network I/O costs fluctuate
Need a way to gain this sort of knowledge!
Some Solutions
1. Adaptive operators
2. Mid-query reoptimization
3. Convergent query processing
4. Query scrambling [Franklin et al.]
5. Eddies [Hellerstein et al.]
Tukwila Data Integration System
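[Figure: Tukwila architecture. A query enters the reformulator, which uses source mappings to produce a logical plan. The optimizer ((re-)optimizer, MemAlloc, fragmenter), backed by the catalog, sends an exec plan to the execution engine (event handler, query operators, temp store). The engine reads data from the sources, returns exec results to the optimizer, and delivers the answer.]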
Novel components:
Event handler
Optimization-execution loop
Adaptive operators
Double Pipelined Join
Hybrid Hash Join
• No output until build relation read
• Asymmetric (build vs. probe): optimization requires source behavior knowledge
Double Pipelined Hash Join
• Outputs data immediately
• Symmetric: requires less source knowledge to optimize
• Threads overlap I/O, computation
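The symmetric behavior can be sketched in a few lines. This is a simplified, single-threaded illustration rather than Tukwila's implementation: it omits the threads, queues, and memory-overflow handling, and the function name, the stream encoding ("L"/"R" tags), and the sample tuples are all assumptions made for the example.

```python
from collections import defaultdict


def double_pipelined_join(stream, key_left, key_right):
    """Symmetric hash join over an interleaved stream of arrivals.

    `stream` yields ("L", row) or ("R", row) in arrival order. Each
    arriving row is inserted into its own side's hash table and then
    probed against the other side's table, so a result is emitted as
    soon as both matching rows have arrived.
    """
    left_table = defaultdict(list)
    right_table = defaultdict(list)
    for side, row in stream:
        if side == "L":
            k = key_left(row)
            left_table[k].append(row)
            for match in right_table.get(k, []):
                yield row, match
        else:
            k = key_right(row)
            right_table[k].append(row)
            for match in left_table.get(k, []):
                yield match, row


# Hypothetical arrival order: neither input has to be read completely
# before the first result appears.
arrivals = [
    ("L", (1, "order-a")),
    ("R", (1, "ship-x")),
    ("R", (2, "ship-y")),
    ("L", (2, "order-b")),
]
results = list(double_pipelined_join(arrivals, lambda r: r[0], lambda r: r[0]))
for pair in results:
    print(pair)
```

Because both sides build and probe, there is no build/probe role to choose, which is the sense in which the operator is easier to optimize than a hybrid hash join.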
Performance on Networked Data
Join of 3 tables sent via JDBC over 10Mb Ethernet:
TPC-H Lineitem ⋈ Supplier ⋈ Order
[Figure: time (sec, 0 to 800) against tuples output (1000s, 1 to 251) for three plans: Double Pipelined, Hybrid - (Lineitem ⋈ Supplier) ⋈ Order, and Hybrid - (Supplier ⋈ Lineitem) ⋈ Order.]
Double Pipelined Join in Summary
Benefits:
Easier to optimize (symmetric)
Sub-operations scheduled flexibly
Allows overlap of I/O and computation
Incurs some overhead:
Threading, queues
Required extensions to intelligently handle
overflow:
• Same hash function, number of buckets for each side
• Approaches: flush buckets on left side or flush
symmetrically
Some Solutions
1. Adaptive operators
2. Mid-query reoptimization
• Interleaved planning and execution
3. Convergent query processing
4. Query scrambling
5. Eddies
Mid-query reoptimization
[Figure: join plan over A, B, C, D. Materialization point: write A ⋈ B to disk, then continue with the joins involving C and D, whose order may change.]
If actual ≠ predicted statistics → replan
[Kabra & DeWitt]
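The check at a materialization point can be sketched as follows. This is an illustrative simplification, not the decision rule from [Kabra & DeWitt]: the ratio-based `tolerance` threshold, the stage tuples, and the `replan` callback are all hypothetical.

```python
def should_replan(predicted_rows, actual_rows, tolerance=2.0):
    """True when the observed size differs from the prediction by more
    than a factor of `tolerance` (an assumed, simplistic trigger)."""
    if predicted_rows <= 0 or actual_rows <= 0:
        return predicted_rows != actual_rows
    ratio = actual_rows / predicted_rows
    return ratio > tolerance or ratio < 1.0 / tolerance


def run_plan(stages, replan, tolerance=2.0):
    """Run stages in order. Each stage is (name, predicted_rows, execute),
    where execute() materializes the stage and returns its actual row
    count. After a stage whose statistics were badly predicted, the
    remaining stages are handed to `replan` before execution continues."""
    log = []
    remaining = list(stages)
    while remaining:
        name, predicted, execute = remaining.pop(0)
        actual = execute()
        log.append((name, predicted, actual))
        if remaining and should_replan(predicted, actual, tolerance):
            remaining = replan(remaining, actual)
            log.append(("replan", None, None))
    return log


stages = [
    ("A join B (materialized)", 100, lambda: 1000),
    ("join C", 50, lambda: 50),
    ("join D", 20, lambda: 20),
]
# Hypothetical replanner: just reverse the order of the remaining joins.
log = run_plan(stages, replan=lambda remaining, actual: list(reversed(remaining)))
for entry in log:
    print(entry)
```

Only the part of the plan after the materialization point is revised here; work already done on the prefix is kept.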
Some Solutions
1. Adaptive operators
2. Mid-query reoptimization
3. Convergent query processing
4. Query scrambling
5. Eddies
Convergent Query Processing
• Instead of adapting remainder of plan
after executing all data on plan prefix
• Adapt whole plan
after executing whole plan on part of data
• Can better gather information this way…
Convergent Query Processing in Action: Changing Join Plans in Mid-Stream
Join Restock, Orders, Shipping (R ⋈ O ⋈ S)
[Figure: R ⋈ O ⋈ S is computed as the union of three phase plans and a “cleanup” query plan. Each phase joins one subset per table (R0, O0, S0; then R1, O1, S1; then R2, O2, S2) and may use a different join order.]
Breaking a Join into Phases: One
Subset per Table, Each Phase
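[Figure: grid of Restock (R) subsets against Orders (O) subsets. Phase 0 joins R0 with O0, Phase 1 joins R1 with O1, and the cleanup phase covers the remaining combinations.]
$$T_1 \bowtie \cdots \bowtie T_m \;=\; \bigcup_{1 \le c_1 \le n,\ \ldots,\ 1 \le c_m \le n} \left( T_1^{c_1} \bowtie \cdots \bowtie T_m^{c_m} \right)$$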
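The identity behind the phases can be checked on toy data: joining every combination of one subset per table and taking the union gives the same result as joining the whole tables. This is a minimal sketch; the relation contents, the round-robin partitioning, and all function names are assumptions made for illustration. An actual CQP run would evaluate only the matching-index combinations in the phases and leave the others to the cleanup plan.

```python
from itertools import product


def join(r, s):
    """Natural join of two relations given as lists of dicts."""
    out = []
    for a in r:
        for b in s:
            shared = set(a) & set(b)
            if all(a[k] == b[k] for k in shared):
                out.append({**a, **b})
    return out


def join_all(tables):
    """Join a list of relations left to right."""
    result = tables[0]
    for t in tables[1:]:
        result = join(result, t)
    return result


def partition(table, n):
    """Split a table into n disjoint subsets (round robin)."""
    return [table[i::n] for i in range(n)]


def phased_join(tables, n):
    """Union of joins over every combination of one subset per table."""
    parts = [partition(t, n) for t in tables]
    out = []
    for combo in product(*parts):
        out.extend(join_all(list(combo)))
    return out


def canon(rows):
    """Order-independent form of a result, for comparison."""
    return sorted(tuple(sorted(row.items())) for row in rows)


# Hypothetical toy relations standing in for Restock, Orders, Shipping.
R = [{"item": i} for i in range(4)]
O = [{"item": i % 4, "order": i} for i in range(6)]
S = [{"order": i, "carrier": "c%d" % i} for i in range(6)]

print(canon(phased_join([R, O, S], 3)) == canon(join_all([R, O, S])))
```

Because the subsets of each table are disjoint and cover the table, every result tuple is produced by exactly one combination, so no duplicate elimination is needed.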
The Cleanup Plan Reuses Previous
Work Where Possible
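Restock ⋈ Orders ⋈ Shipping
[Figure: the final result is the union of the phase results (R0 ⋈ O0 ⋈ S0, R1 ⋈ O1 ⋈ S1, R2 ⋈ O2 ⋈ S2) and a cleanup plan over all subsets R0, R1, R2, S0, S1, S2, O0, O1, O2. The cleanup plan reuses stored intermediate results such as R2 ⋈ O2 and O1 ⋈ S1. Annotations: "Exclude R0S0O0, R1S1O1, R2S2O2" on the cleanup output and "Exclude R2O2" on a lower join.]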
CQP on a 100Mbps LAN: Nearly “Optimal” Performance
866MHz P-III, 256MB buffer pool, re-optimization every 10sec
[Figure: bar chart of completion time (sec, 0 to 300) for four queries: TPC-H Q3 (2 joins); TPC-H Q5 (5 joins); TPC-H Q10 (3 joins, queryable sources); Query: Count People Met (group & 2 joins). Series: Traditional - no statistics; CQP - no statistics; Traditional - cardinalities; CQP - cardinalities; Best plan - all CQP statistics. Annotation: cost to parse XML.]
Slow WAN, Faster CPU: CQP Reduces Work
1GHz P-III, 256MB, re-optimization every 10sec. 1Mbps network, RTT ~50msec
[Figure: bar chart of completion time (sec, 0 to 300) for four queries: TPC-H Q3 (1/10th size ORDERS, local LINEITEM); TPC-H Q5 (1/10th size ORDERS, local LINEITEM); TPC-H Q10 (queryable sources); Query: Count People Met (group & 2 joins). Series: Traditional - cardinalities; CQP - no statistics; CQP - cardinalities.]
Outline
• Execution for Data Integration
Coping with incomplete statistics, latency
Interleaved planning & execution
Convergent query processing
• Service Integration
Web service composition
• Background
• Representational issues
• Planning algorithms
Automated data analysis
What is a Web Service
• A web service is a network accessible interface to
application functionality, built using standard
Internet protocols (TCP/IP, XML, SOAP, WSDL, …).
Clients of a web service do NOT need to know how it is
implemented.
• Why interesting?
Increased automation
[Figure: application client ↔ network ↔ web service ↔ application code]
Case Study: Amazon
• Services Exported
Product details (short, long, images, samples)
Purchase functionality
Ratings, reviews, collaborative filtering data, lists, …
• Examples
Store builder tools
Amazon Browser – visualization tool
Windows desktop interfaces – drag-n-drop…
MP3 Piranha
Games
Automatic review writer??
Case Study: Google
• Services Exported
Search interface
Limits on items returned, queries / day
• Examples
Metacrawler functionality
Geosearch ‘nearby thai restaurants’
• TIGER, FIPs -> lat,long of pages
Robust hyperlinks
• Creates a signature for destination pages & tracks
with query
Case Study: Fed Express
• Shipment tracking
• Proof of delivery
• Invoice reviewed, adjusted, settled
• Schedule pickup time, location
Outgoing or returns
• Order supplies (airbills, envelopes, boxes)
• Review shipping history
• Rate requests
Location, package size
• International trade
Required documents, duties, taxes
Case Study: Hailstorm /
MyServices
• Web Services
MyDocuments
MyAddressbook
MyWallet
MyNotifications ….
• Scenario
Wallet keeps receipts, arranges product return
Expedia uses notifications to warn of canceled
flight
• Reality
Ebay, AmEx, Groove, …
Case Study: OAA
• Common schema for travel industry
• Reservations
Flights, trains, rental cars, hotels
• Time & distances
• Payment, deposits, vouchers
• Vacation Packages
Web Service Technology Stack
[Figure: the stack's layers are Discovery (UDDI), Description (WSDL), Packaging (SOAP), Transport, and Network. A web service client asks discovery for a "shopping web service?" and receives WSDL URIs, retrieves the web service's WSDL description, and then uses a proxy to exchange a SOAP pkg request and SOAP pkg response with the service.]
SOAP (Simple Object Access
Protocol)
• SOAP Messages
XML Payload
• Using SOAP as RPC (Remote Procedure Call) Messages
[Figure: the SOAP client sends a request message to the SOAP server, which returns a response message.]
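A request message of this kind can be sketched with Python's standard XML library. This is a minimal illustration of an RPC-style SOAP 1.1 envelope, not a complete SOAP client: nothing is sent over the network, and the operation name `foo`, its argument, and the `http://tempuri.org/message/` namespace are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

# Namespace of the SOAP 1.1 envelope.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"


def build_request(operation, namespace, params):
    """Build an RPC-style SOAP request envelope as an XML string."""
    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{namespace}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def find_text(xml_text, local_name):
    """Return the text of the first element with the given local name."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if elem.tag.split("}")[-1] == local_name:
            return elem.text
    return None


# Hypothetical call to an operation named "foo" with one integer argument.
request = build_request("foo", "http://tempuri.org/message/", {"arg": 41})
print(request)
print(find_text(request, "arg"))
```

The XML payload carries the operation name and its parameters; the response message would use the same envelope structure with a result element in the body.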
If a WS were a Phone Call…
• XML
represents the conversation,
• SOAP
describes the rules for how to call someone
• UDDI
is the phone book.
• WSDL
describes what the phone call is about and how
you can participate.
WSDL for int foo(int arg);
<types>
<schema targetNamespace="http://tempuri.org/xsd"
xmlns="http://www.w3.org/2001/XMLSchema"
xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/"
xmlns:wsdl="http://schemas...l/" elementFormDefault="qualified" >
</schema>
</types>
<message name="Simple.foo">
<part name="arg" type="xsd:int"/>
</message>
<message name="Simple.fooResponse">
<part name="result" type="xsd:int"/>
</message>
<portType name="SimplePortType">
<operation name="foo" parameterOrder="arg" >
<input message="wsdlns:Simple.foo"/>
<output message="wsdlns:Simple.fooResponse"/>
</operation>
</portType>
DISCO
• If you know the URL for a service,
• DISCO lets you query it
• and get back a WSDL description
• But what if you don’t know the right URL?
UDDI
• Hosted Registries
Microsoft, IBM, HP, SAP, NTT, BEA
• Entries defined with
Business information
• Name, contacts, descriptions, identifier, yellow pages category
Service information
• Entities, each of which describes a family of related services
which together implement a business process
Binding information
• How to invoke: URI, required parameters, options, & Tmodel
Service specifications (Tmodel)
• As a symbol – fingerprint to recognize a known service
• Decomposable to find WSDL description
Acronyms (W3C, MSFT, IBM)
• UDDI
Discover, describe, register services
SOAP-based service for locating WSDL-formatted service descriptions
• DISCO
Discover / retrieve SCL+SDL descrips
• SDL / NASSL
SOAP description lang – get params / types
• SCL
SOAP contract lang – extends SDL – orchestration of msgs
• WSDL
Describe abstract interface and protocol bindings of arbitrary network services (extends SCL)
• XLANG / WSFL / BPEL4WS
lang for biz processes used in BizTalk
Biz process execution language for web services
• MSFT, IBM, BEA proposal
[Figure: diagram relating the languages SDL, NASSL, SCL, WSDL, WSFL, XLANG, and BPEL4WS.]
The Layer Cake [TBL,XML2000]
RDF (Resource Description Framework)
Way to describe resources via metadata
Makes no assumptions about a particular application domain
Based on XML
Another one?
Standard for semantic web
Restricts resource descriptions to triplets
(subject,predicate,object)
Provides a lightweight ontology system
Subproperty, Subclass, Domain & Range
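The triple model and the subclass part of the lightweight ontology can be sketched with plain tuples. This is an illustrative toy, not an RDF library: the `ex:` resource names are hypothetical, and only `rdf:type` and `rdfs:subClassOf` are interpreted.

```python
def subclass_closure(triples):
    """All (sub, super) pairs implied by rdfs:subClassOf statements,
    including the transitive consequences."""
    pairs = {(s, o) for s, p, o in triples if p == "rdfs:subClassOf"}
    changed = True
    while changed:
        changed = False
        for a, b in list(pairs):
            for c, d in list(pairs):
                if b == c and (a, d) not in pairs:
                    pairs.add((a, d))
                    changed = True
    return pairs


def types_of(resource, triples):
    """Classes a resource belongs to, following subclass links."""
    direct = {o for s, p, o in triples if s == resource and p == "rdf:type"}
    closure = subclass_closure(triples)
    return direct | {sup for sub, sup in closure if sub in direct}


# Every statement is a (subject, predicate, object) triple.
triples = [
    ("ex:Tutorial", "rdfs:subClassOf", "ex:Document"),
    ("ex:Document", "rdfs:subClassOf", "ex:Resource"),
    ("ex:planet2003", "rdf:type", "ex:Tutorial"),
    ("ex:planet2003", "ex:author", "Dan Weld"),
]
print(sorted(types_of("ex:planet2003", triples)))
```

Restricting descriptions to triples keeps the data model uniform, so schema statements (subclass links) and instance statements (types, properties) live in the same structure.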
DAML+OIL (www.daml.org)
• DAML extends RDF and RDFS with richer
modeling primitives.
disjointWith, intersectionOf, oneOf, cardinality
• Able to provide properties of properties
uniqueness, transitivity, etc.
DAML-S
DAML+OIL ontology describing Web Services
Complements low level descriptions like WSDL
Describes what a service does and why,
not just how to communicate with it.
Goals: Discovery, Invocation, Composition,
Verification, Execution Monitoring
(mapping to WSDL)
Outline
• Execution for Data Integration
Coping with incomplete statistics, latency
Interleaved planning & execution
Convergent query processing
• Service Integration
Web service composition
• Background
• Representational issues
• Planning algorithms
Automated data analysis
Partial Survey of Planners
• UW Internet Softbot
Planners: SENSp / XII / PUCCINI
Repr. languages: UWL / SADL ; LCW
• PKS
Planning at the knowledge level
• McDermott
Forward-chaining search w/ GRG guidance
• McIlraith et al.
ConGolog (procs, loops, conditionals, w/ nondet)
• Papazoglou, Traverso et al.
Stratified service arch; XSRL language; MBP
• Finin; Srivastava; Knoblock; Ambite; Nau…
Planning for image processing tasks
• Many fielded systems
Lansky’s COLLAGE , Chien et al. MVP/ASIP,
Golden ADLIM, Blythe GRID…
• Spatial representations important
[Figure: data-processing pipeline. Inputs (MODIS LAI, MODIS FPAR, GOES, RUC2 GRIB) pass through filters (daily composite, 8-day mosaic, ReLAZEA project, WGRIB bin, drilldown) and statistics (min and max temp, mean precip., mean wind) into land surface models, with outputs NPP, phenology, and false color visualization.]
Motivating Scenarios
Planning a trip
Yahoo maps -> driving time -> travel prefs
Automatic expense form filing
Purchasing a group of items
Aggregation from multiple vendors
Select for: payment types, stock level, deliv
Local & 3rd party reputation services (BBB)
Monitoring marketplace
Auction sites
Events (check calendar / notification service)
UW Internet Softbot
• Software robot
• Effectors mv, ftp, chmod, cd, lpr, rm, ...
• Sensors ls, finger, INSPEC, netfind, wc, ...
• Say what we want, not how to do it
Find phone numbers, fetch/print online papers, …
• Integrate multiple resources
Motivation/Contributions
• Represent actions like ls, finger
• Represent goals such as
“Rename paper.tex to kr.tex”
“Print all files in directory papers.”
(even with incomplete information)
• No previous system could express
The Middle Ground
[Figure: two spectra running from tractability to expressiveness, each with a complete-information row and an incomplete-information row.]
1. Action Representation: STRIPS, ADL, UWL, Situation Calculus, Moore et al
2. Knowledge Representation: Closed World Assumption (CWA), OWA, Circumscription
Softbot Architecture
[Figure: architecture diagram with components Task Manager, SADL Actions, LCW Knowledge, PUCCINI Planner, Sensors, Effectors, and UNIX shell & WWW.]
SADL Family Tree
STRIPS [Fikes & Nilsson, 71]
UWL [Etzioni et al, 92]: incomplete info, noise-free sensors
ADL [Pednault, 89]: ∀, conditional effects
SADL [Golden & Weld, 96]: represents ls, “Rename”, finger...
SADL/UWL Annotations
Goal annotations:
satisfy = achieve by any means
hands-off = don’t change (maintenance)
Effect annotations
cause = change world
observe = change agent’s knowledge
“Delete the file named junk”
satisfy (name (ƒ, junk)) ∧ satisfy (deleted (ƒ))
Information Goals are Temporal
• Two time points
When proposition sampled
When reply given
• “Tell me now who was President in 1883”
• “Tell me tomorrow who is President now”
• “Identify (ASAP) the file now named `junk’”
Information Goals are Temporal
“Rename paper.tex to kr.tex”
designator (name) changes
UWL can’t express
SADL solution
initially = time goal was posed
initially (name (ƒ, paper.tex)) ∧ satisfy (name (ƒ, kr.tex))
initially (name (ƒ, core)) ∧ satisfy (deleted (ƒ))
Compare to more general temporal representation
Tidiness Goals
“Print paper, but don’t leave it uncompressed.”
initially (compressed (paper), tv) ∧
satisfy (printed (paper)) ∧
satisfy (compressed (paper), tv)
State of paper.ps may change temporarily but must be restored
Compare to more general goal lang, e.g. LTL
Unbounded Information Gain
action ls (d)
precondition: satisfy (current.shell (csh)) ∧
satisfy (readable (d))
effect:
∀ f when in.dir (f, d)
∃ l, n, d observe (length (f, l)) ∧
observe (name (f, n)) ∧
observe (in.dir (f, d))
Compare PKS Representation
Initial State:
Kf = {(= (pwd) root), (indir papers root), (indir planner root),
(dir root), (dir papers), (dir planner), (file paper_tex)}
Kx = {((indir paper_tex planner) | (indir paper_tex papers))}
Goal:
K(indir paper_tex (pwd))
The Internet Softbot
[Figure: architecture diagram with components Task Manager, SADL Actions, LCW Knowledge, PUCCINI Planner, Sensors, Effectors, and UNIX shell & WWW.]
Knowledge Representation
• Closed World Assumption (CWA)
Made by classical planners
Anything not recorded as true is false
• Open World Assumption (OWA)
Anything not recorded true or false is unknown
Sensor abuse
Can’t handle ∀ goals
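The difference between the two assumptions shows up in how a query about an unrecorded fact is answered. This is a minimal sketch: the model contents and the tuple encoding of literals are hypothetical, and `None` stands for "unknown".

```python
def query_cwa(model, literal):
    """Closed world: anything not recorded as true is false."""
    return model.get(literal) is True


def query_owa(model, literal):
    """Open world: anything not recorded true or false is unknown.
    Returns True, False, or None (unknown)."""
    return model.get(literal)


# Hypothetical agent model: one fact known true, one known false.
model = {
    ("in.dir", "paper.tex", "papers"): True,
    ("in.dir", "core", "papers"): False,
}
unrecorded = ("in.dir", "junk", "papers")
print(query_cwa(model, unrecorded))  # treated as false
print(query_owa(model, unrecorded))  # treated as unknown
```

Under the open world reading the agent can never conclude that it has seen every file, which is what leads to repeated sensing and blocks universally quantified goals.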
Sensor Abuse
• OWA: Don’t know when to stop sensing
Many ways to find same information
Many plans containing same action
• After executing find / -name foo, should know
ls bin won’t reveal more files named foo
ls tex won’t reveal more files named foo
Google may reveal more files named foo
How Classical Planners Handle ∀
∀x block (x) ⇒ OnTable (x)
replaced with:
OnTable (A) ∧ OnTable (B) ∧ OnTable (C)
• Relies on CWA
Must know all blocks
OWA can never be sure
[Figure: blocks A, B, C]
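The expansion of a universally quantified goal into ground goals can be sketched as follows. It is a toy illustration that assumes the closed world assumption holds, meaning the object table lists every object; the table contents and function name are hypothetical.

```python
def ground_universal_goal(known_objects, type_pred, goal_pred):
    """Expand 'for all x: type_pred(x) implies goal_pred(x)' into a list
    of ground goals. Only sound if known_objects is complete (CWA)."""
    return [
        (goal_pred, obj)
        for obj, types in known_objects.items()
        if type_pred in types
    ]


# Hypothetical object table; under the CWA these are ALL the objects.
known_objects = {
    "A": {"block"},
    "B": {"block"},
    "C": {"block"},
    "Table": {"surface"},
}
print(ground_universal_goal(known_objects, "block", "OnTable"))
```

If a fourth block could exist without being recorded, the conjunction would no longer be equivalent to the quantified goal, which is why an open-world agent cannot use this rewrite.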
Local Closed World Knowledge
• Complete info over restricted domain
All blocks on table, all products at Amazon
• Local Closed World Knowledge (LCW)
Restricted form of circumscription
Provides fast closed world inference
Allows fast updates
Suited to planner action representations.
LCW Semantics
“I know all files in directory bin”
LCW(in.dir(f, bin))
LCW(in.dir(f, bin)) ≡
∀f ⊨ in.dir(f, bin) ∨
⊨ ¬in.dir(f, bin)
LCW Representation
• M: Ground literals in agent’s model
in.dir(icaps03, papers)
in.dir(junk, papers)
executable(core)
• L: LCW formulas in agent’s model
LCW(in.dir(f, papers))
• If P ∉ M, and L ⊢ LCW(P), then ¬P
Conclude: ¬in.dir(foo, papers)
LCW Reasoning
• Inference
If I know all files in tex, and I know the
size of every file, then do I know the size
of every file in tex?
• Updates
If I know the size of every file in tex, and
I remove a file from tex, do I still know
the size of every file in tex?
What if I add a file to tex?
LCW Reasoning is Hard
Theorem:
If LCW formulas can contain and then
answering an LCW query is NP-hard.
But we need fast inference!
• Solution: restrict representation
• Positive first-order conjunctions
• Fast polynomial time inference/updates
[Etzioni et al. AIJ]
[Levy VLDB96]
[Friedman & Weld IJCAI97]
LCW Updates
• L must be updated when M changes.
• All changes to M fall into one of four
categories:
Information loss: Δ(φ, {T,F} → U)
Information gain: Δ(φ, U → {T,F})
Domain growth: Δ(φ, F → T)
Domain contraction: Δ(φ, T → F)
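The four categories follow directly from the old and new truth values of the changed literal. A minimal sketch, with `None` standing for U (unknown); the function name is hypothetical.

```python
def classify_change(old, new):
    """Classify a change to a literal's truth value in M.
    Values are True (T), False (F), or None (U, unknown)."""
    if old is new:
        return "no change"
    if new is None:
        return "information loss"      # {T,F} -> U
    if old is None:
        return "information gain"      # U -> {T,F}
    if new is True:
        return "domain growth"         # F -> T
    return "domain contraction"        # T -> F


for old, new in [(True, None), (None, False), (False, True), (True, False)]:
    print(old, "->", new, ":", classify_change(old, new))
```

Each category then calls for a different update to L.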
Domain Growth
Adding core to bin invalidates
LCW(in.dir(f, bin) ∧ size(f,c))
unless the size of core is known!
Theorem:
If Δ(φ, F → T) then
L’ ← L − MREL(φ)
MREL(φ) = {Φ ∈ REL(φ) | ⊬ LCW(Φ−X)θ}
REL(φ) = {Φ ∈ L | ∃(X ∈ Φ, θ, α) Xθ = φα ∧ ⊬ (Φ−X)θ}
LCW Updates
Change               Transition   Update to L          Example actions
Information loss     T∨F → U      L’ ← L − REL(φ)      compress
Information gain     U → T∨F      L’ ← L ∪ LCW(φ)      ls, wc
Domain growth        F → T        L’ ← L − MREL(φ)     cp
Domain contraction   T → F        L’ ← L               rm
Pruning Redundant Sensing
[Figure: time (CPU seconds) against experience (problems attempted).]
The Internet Softbot
[Figure: architecture diagram with components Task Manager, SADL Actions, LCW Knowledge, PUCCINI Planner, Sensors, Effectors, and UNIX shell & WWW.]
XII / Puccini Planner
• Based on UCPOP
Generative, Partial-Order, Causal-Link
I.e. much like Gerevini’s LPG
• Efficient sensing (LCW control)
• Lifted support of ∀ goals
[Golden et al. 94, Golden PhD]
Satisfying ∀ Goals
Link Directly to ∀ Effect
rm * → ∀f Satisfy(Deleted(f))
Subgoal on LCW; Then Expand to Ground Form
ls → LCW
lpr foo, lpr bar → ∀f Satisfy(Printed(f))
Partition
Threats to LCW, ∀
[Figure: a causal link from "ls -l /tex" to the goal carries LCW(in.dir(f, /tex) & size(f, l)). Two actions threaten it.
Threat = “Information Loss”: compress /tex/paper, cause(length(paper), U)
Threat = “Domain Growth”: mv junk /tex/, cause(in.dir(junk, /tex), T)
Resolutions shown: Promote, Demote, Confront, Shrink, Enlarge.]
Softbot Status
• Fully Implemented (1997)
• Hundreds of Unix, Internet Actions
• Daunting Combinatorics
Declarative Search Control
Laborious, Brittle
• Hence...
? Improved Declarative Control
? Reactive Control
? Less Expressive Language
[Figure: related systems named on the slide: Rodney, SIMS, Simon, MetaCrawler, Info Manifold, BargainFinder, ILA, Ahoy, ShopBot, Occam.]
PG-based Heuristics / Sensing
[Shaked03]
[Figure: planning graph with proposition levels 0, 1, and 2. Propositions include (subject PlayGo go), (not (own PlayGo)), (own PlayGo), (atStore *b amazon), (subject *b chess), (LCW((atStore !b amazon) (subject !b chess))), (own *b), (subject MySystem chess), and (atStore MySystem amazon). Actions include (trade PlayGo *b amazon), (search amazon chess), and (order MySystem amazon). Two nodes are marked "?".]
Using the Graph
• LPG-like search (local search on POP)
• Propagating sensing action links
• Executing to reach ‘better’ states
• Sophisticated heuristics!
Conclusion
• Planning for the web is ripe for progress
• Data integration
Modeling sources: GAV, LAV, …
Answering queries using views
Interleaved planning and execution, eddies, CQP
• Service integration
Web service composition
Representing unbounded information gain
Latest heuristic search techniques => fast!
PKS
• Contingent, forward-chaining planner
Constructs a complete, correct plan
Separates plan-time and execution-time effects
• Less Expressive
No universal quantification
• Still needs search control heuristics
[Petrick & Bacchus KR00, AIPS02]