
TU/e
eindhoven university of technology
Web Information Systems
Engineering
Flavius Frasincar
[email protected]
/department of mathematics and computer science
April 17, 2003
ISA
Contents
• What is a Web Information System (WIS)?
• WIS Features
• Problem: Data Management in WIS
• Solution: Model-Driven Methodology (with Tasks Separation)
• Methodologies for WIS:
– Strudel Methodology
– Hera Methodology
• Summary
World Wide Web
• 1990: Tim Berners-Lee invents the World Wide Web
• The Web's success is based on:
– hypermedia (link) nature: links allow natural and
flexible access to information, matching the
associative nature of the human mind
– global availability
– interoperability
– simplicity
– free of charge etc.
Web Information Systems (WISs)
• 1998: Tomas Isakowitz et al. coined the term Web
Information Systems for “information systems
that are based on Web technology”
• WISs differ from traditional information
systems in that they “have the potential of reaching a
wider audience” through different platforms
• There is an even greater need to integrate data,
as the data sources are distributed over the Web
and are possibly heterogeneous
Three Generations of WISs
• First generation: based on hand-crafted HTML
– Difficult to maintain (update)
• Second generation: generate HTML on demand by
automatically filling templates
– Data is machine readable/transformable
– Difficult to make the data machine understandable
• Third generation: Semantic Web Information
Systems (SWISs) are WISs based on Semantic Web
technology (RDF, OWL etc.)
– Data is machine understandable
Present the Deep Web
Deep Web vs. Surface Web:
• 500 times larger
• 1000 times better quality
WIS Features
• Data-intensive: integrate data from multiple
heterogeneous sources
• Pervasive: support different platforms
e.g. network (T1, 128K, 56K), display (PC, Palm, WAP Phone)
• User Adaptable: consider user’s preferences and
user’s state of mind while interacting with the
system
• Flexible: support semistructured data
• Automatic: need little or no human intervention
• User interactive: e.g. online shops (Amazon)
Problem: Data Management
• WISs are hard to specify and implement
• Methodologies exist for manual WIS design but
few of them target automation
• Difficult tasks to perform:
– Multiplatform support
– Automatic updates
– Automatic site reconstruction (WIS Adaptation)
– Optimize WIS performance (WIS Optimization)
– Enforce WIS integrity constraints (WIS Analysis)
– Achieve flexibility, extensibility etc.
Semistructured Data
• It is characterized by:
– Irregular structure: missing or additional attributes,
multiple attributes
– Few type constraints: attributes with different types in
different objects, heterogeneous collections
– Rapidly evolving schema or missing schema
• It is typically modeled by a DLG (Directed
Labeled Graph)
• Examples: HTML, XML, RDF, LaTeX Bib etc.
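
A minimal sketch of how semistructured data can be held as a directed labeled graph. The `Graph` class and its method names are illustrative assumptions, not part of any system named on these slides; the sample nodes follow the publications example used on the Strudel slides.

```python
from collections import defaultdict

class Graph:
    """Directed labeled graph: every edge is a (source, label, target) triple."""

    def __init__(self):
        self._out = defaultdict(list)  # source -> [(label, target), ...]

    def add(self, source, label, target):
        self._out[source].append((label, target))

    def targets(self, source, label):
        """All targets reachable from `source` over edges named `label`."""
        return [t for (l, t) in self._out[source] if l == label]

g = Graph()
g.add("Root", "publications", "pubs")
g.add("pubs", "pub", "pub1")
g.add("pubs", "pub", "pub2")
# Irregular structure: pub1 has a journal, pub2 has a booktitle instead.
g.add("pub1", "year", 2000)
g.add("pub1", "journal", "VLDB")
g.add("pub2", "year", 1998)
g.add("pub2", "booktitle", "SIGMOD")
# Multiple attributes with the same label are allowed.
g.add("pub1", "author", "Mary Fernandez")
g.add("pub1", "author", "Dan Suciu")

print(g.targets("pubs", "pub"))
print(g.targets("pub1", "booktitle"))  # a missing attribute is just an empty list
```

No schema is declared anywhere: a missing or repeated attribute is simply the absence or repetition of an edge.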
Solution: Tasks Separation
• Isolate and automate common tasks for WIS
design:
– Choose and access the data (data integration and
retrieval) to be presented
– Design the navigational structure for this data
– Design the visual aspects of the presentation
• Use a model-driven approach for task specification
(the fairy says it brings “wisdom” [theory],
“richness” [money], and “beauty” [judge it
yourself] – Stefano Ceri)
WIS Presentation Generation Strategies
• Static (eager approach): presentations are
materialized completely, each page is
precomputed
• Dynamic or On-demand (lazy approach):
after each link “click” the next page to be
presented is computed
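
The difference between the two strategies can be sketched as follows. The `SITE` dictionary, the page ids and the `render` function are hypothetical stand-ins for a real page-generation step; the counter only makes visible how much work each strategy does.

```python
# Hypothetical site content: page id -> (heading, linked page ids).
SITE = {
    "root": ("Publications", ["y1998", "y2000"]),
    "y1998": ("Papers from 1998", []),
    "y2000": ("Papers from 2000", []),
}

render_calls = 0  # counts how often a page is actually computed

def render(page_id):
    """Compute the HTML for one page (stands in for an expensive step)."""
    global render_calls
    render_calls += 1
    heading, links = SITE[page_id]
    anchors = "".join(f'<a href="{target}.html">{target}</a>' for target in links)
    return f"<html><h1>{heading}</h1>{anchors}</html>"

def generate_static():
    """Eager: materialize every page before any request arrives."""
    return {page_id: render(page_id) for page_id in SITE}

def serve_dynamic(page_id):
    """Lazy: compute a page only when its link is followed."""
    return render(page_id)

pages = generate_static()
eager_calls = render_calls   # one call per page in the site

render_calls = 0
serve_dynamic("root")
serve_dynamic("y2000")
lazy_calls = render_calls    # one call per page actually visited
```

The eager variant pays for all pages up front; the lazy variant pays per click and never computes pages nobody visits.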
Methodologies
• Dexter-based: HDM (Hypermedia Design Method)
• ER-based: RMM (Relationship Management
Methodology)
• OMT-based: OOHDM
• UML-based: OO-H (Conallen), UWE (UML-based
Web Engineering), W2000 (HDM extension)
• RDF-based: XWMF (eXtensible Web Modeling
Framework), Hera
• Other: Strudel, Araneus, WebML (Web Modeling
Language), Autoweb, Trellis, XAHM (XML-based
Adaptive Hypermedia Model), WSDM, W3DT etc.
Strudel Methodology
http://www.research.att.com/~mff/strudel
AT&T
Strudel Architecture
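[Figure: Strudel architecture. Data sources (object-oriented database, relational database, XML database, …) are exposed through a uniform data model; a STRUQL query over it produces the site graph; HTML templates applied to the site graph produce the HTML presentation.]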
Input Data
<publications>
  <pub id="pub1">
    <title>Declarative spec…</title>
    <author>Mary Fernandez</author>
    <author>Dan Suciu</author>
    <year>2000</year>
    <journal>VLDB</journal>
    <abstract>Strudel is a …</abstract>
    <category>Languages</category>
    <category>Methods</category>
  </pub>
  …
  <pub id="pub2">
    <title>Catching the …</title>
    <author>Mary Fernandez</author>
    <author>Daniela Florescu</author>
    <year>1998</year>
    <booktitle>SIGMOD</booktitle>
    <abstract>The Strudel …</abstract>
    <category>WIS</category>
    …
  </pub>
  …
</publications>
Semistructured Data Model
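[Figure: the input data as a Directed Labeled Graph (DLG). Root has a "publications" edge to a node with "pub" edges to pub1 and pub2; each pub node has "year" and "author" edges, e.g. year 2000 and author M. Fernandez for pub1, year 1998 and author M. Fernandez for pub2; further edges are elided (…).]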
STRUQL
(Site TRansformation Und Query Language)
where Root -> "publications" -> r, r -> "pub" -> x, x -> l -> v
{ where l = "year"
  link YearPage(v) -> "year" -> v,
       YearPage(v) -> "paperPage" -> x,
       RootPage() -> "yearPage" -> YearPage(v)
  collect RootPage{RootPage()},
          YearPage{YearPage(v)}
}…
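
One way to read the query above is as nested loops over path bindings, with `YearPage(v)` and `RootPage()` acting as node constructors that return the same node for the same arguments. The sketch below is an illustration under that reading, not Strudel's actual evaluator; the `DATA` triples and the helper `out` are assumptions.

```python
# Data graph as (source, label, target) triples (hypothetical sample).
DATA = [
    ("Root", "publications", "pubs"),
    ("pubs", "pub", "pub1"),
    ("pubs", "pub", "pub2"),
    ("pub1", "year", 2000),
    ("pub1", "journal", "VLDB"),
    ("pub2", "year", 1998),
    ("pub2", "booktitle", "SIGMOD"),
]

def out(source, label=None):
    """(label, target) pairs leaving `source`, optionally filtered by label."""
    return [(l, t) for (s, l, t) in DATA
            if s == source and (label is None or l == label)]

def build_site_graph():
    links = set()  # site graph edges; a set drops duplicate edges
    collections = {"RootPage": set(), "YearPage": set()}
    # where Root -> "publications" -> r, r -> "pub" -> x, x -> l -> v
    for _, r in out("Root", "publications"):
        for _, x in out(r, "pub"):
            for l, v in out(x):
                if l != "year":          # nested block: where l = "year"
                    continue
                year_page = ("YearPage", v)  # one node per distinct value of v
                root_page = ("RootPage",)
                links.add((year_page, "year", v))
                links.add((year_page, "paperPage", x))
                links.add((root_page, "yearPage", year_page))
                collections["RootPage"].add(root_page)
                collections["YearPage"].add(year_page)
    return links, collections

links, collections = build_site_graph()
```

Because page nodes are identified by constructor name and arguments, two papers from the same year would share one `YearPage` node rather than create two.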
Site Graph
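[Figure: site graph. RootPage() has "yearPage" edges to YearPage(2000) and YearPage(1998); each YearPage has a "year" edge to its year value and "paperPage" edges to paper pages, e.g. YearPage(2000) to PaperPage(pub1) and YearPage(1998) to PaperPage(pub2); further edges are elided (…).]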
STRUDEL Template Language
• RootPage collection:
<html>
<sfor p in yearPage order=ascend key=year>
<sfmt @p [email protected]>
</sfor>
</html>
• YearPage collection:
<h1><sfmt year></h1>
<ul>
<sfor p in paperPage>
<li><sfmt @p></li>
</sfor>
</ul>
• PaperPage collection:
<i>
<sif booktitle>
<sfmt booktitle>
<selse>
<sfmt journal>
</sif>
</i><br>
<sfor p in author>
<sfmt @p>,
</sfor><br>
<sfmt year><br>
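
As a rough illustration of what the PaperPage template does, the sketch below mirrors its three parts in a plain function: a conditional on `booktitle` with `journal` as fallback, a loop over `author`, and the `year`. Reading `<sif>`, `<sfor>` and `<sfmt>` as conditional, loop and value formatting is an interpretation of the slide; the function `render_paper` and the dictionaries are hypothetical and are not the Strudel template engine.

```python
def render_paper(paper):
    """Mirror of the PaperPage template: venue, authors, year."""
    # <sif booktitle> ... <selse> ... </sif>: fall back to the journal
    venue = paper["booktitle"] if "booktitle" in paper else paper.get("journal", "")
    # <sfor p in author> <sfmt @p>, </sfor>: one entry per author edge
    authors = "".join(f"{author}, " for author in paper.get("author", []))
    return f"<i>{venue}</i><br>{authors}<br>{paper['year']}<br>"

pub1 = {"journal": "VLDB", "author": ["Mary Fernandez", "Dan Suciu"], "year": 2000}
pub2 = {"booktitle": "SIGMOD", "author": ["Mary Fernandez"], "year": 1998}
html1 = render_paper(pub1)
html2 = render_paper(pub2)
```

The conditional is what lets one template serve irregular data: conference papers and journal papers share a page layout even though they carry different attributes.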
STRUDEL +/-
+ :
• Tasks separation (content and presentation)
• Declarative specifications (enables presentation content
adaptation)
• Verification of integrity constraints (e.g. “All paper pages
are reachable from RootPage”)
- :
• Intermixes schema and content definition in the data graph
• Does not separate navigation from visual details of the
presentation
• Does not use standard technologies
Hera Methodology
http://wwwis.win.tue.nl/~hera
TU/e
Hera Architecture
[Figure: Hera architecture, in two parts. Information Retrieval: data sources (object-oriented database, relational database, XML database, …) are accessed through wrappers (ODB-XML wrapper, RDB-XML wrapper) and combined by a mediator/integrator that answers a query. Hypermedia Presentation: a logical presentation, subject to user/platform adaptation, is turned into Logical-HTML, Logical-WML, Logical-SMIL, … presentations and then into HTML, WML, SMIL, … presentations.]
Hera Presentation Methodology
[Figure: Hera presentation methodology. Conceptual Design yields the Conceptual Model, Application Design the Application Model, and Presentation Design the Presentation Model; Adaptation Design and two Transformation steps also appear in the diagram.]
Conceptual Model (CM)
• Provides a uniform semantic view over different
data sources that are integrated within a given
Web application
• Consists of hierarchies of concepts relevant
within the given domain
• Concept relationships are:
– Attribute relationships: refer to literal values that
characterize a concept
– Reference relationships: refer to other concepts
Example: CM
[Figure: example CM as a graph. Concepts: Technique, Artifact, Creator, Painting, Painter. Attributes: name, description, biography (String), year (Integer), picture (Image). Relationships: exemplifies / exemplified_by, creates / created_by, paints / painted_by. Legend: Property, subClassOf, subPropertyOf.]
Example: CM in RDF/XML
<rdfs:Class rdf:ID="Creator"/>
<rdfs:Class rdf:ID="Artifact"/>
<rdfs:Class rdf:ID="Painter">
  <rdfs:subClassOf rdf:resource="#Creator"/>
</rdfs:Class>
<rdfs:Class rdf:ID="Painting">
  <rdfs:subClassOf rdf:resource="#Artifact"/>
</rdfs:Class>
<rdf:Property rdf:ID="year">
  <rdfs:domain rdf:resource="#Artifact"/>
  <rdfs:range rdf:resource="#Integer"/>
</rdf:Property>
<rdf:Property rdf:ID="picture">
  <rdfs:domain rdf:resource="#Painting"/>
  <rdfs:range rdf:resource="#Image"/>
</rdf:Property>
<rdf:Property rdf:ID="creates"
    sys:cardinality="multiple"
    sys:inverse="created_by">
  <rdfs:domain rdf:resource="#Creator"/>
  <rdfs:range rdf:resource="#Artifact"/>
</rdf:Property>
Application Model (AM)
• Captures the logical (navigational) aspects of the
presentation
• Based on the concept of slice which contains
attributes and possibly other slices
– A slice is a meaningful presentation unit
– A slice is associated with a concept from the CM
• Slice relationships are:
– Aggregation relationships: embed a set of slices
(abstraction for index, tour, indexed guided tour etc.)
– Reference relationships: link abstraction with an
anchor specified
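
The slice notion can be sketched as a small recursive data structure. The `Slice` class, its field names and the `flatten` helper are assumptions made for illustration (Hera itself represents the AM in RDF); the slice names loosely follow the painting example used in this deck.

```python
from dataclasses import dataclass, field

@dataclass
class Slice:
    name: str    # e.g. "painting.main"
    owner: str   # the CM concept this slice is associated with
    attributes: list = field(default_factory=list)  # CM attributes shown
    children: list = field(default_factory=list)    # aggregation: embedded slices
    links: dict = field(default_factory=dict)       # reference: anchor name -> target slice

    def flatten(self):
        """All attributes shown by this slice, including embedded slices."""
        shown = [f"{self.owner}.{attribute}" for attribute in self.attributes]
        for child in self.children:
            shown.extend(child.flatten())
        return shown

picture = Slice("painting.picture", "Painting", attributes=["picture"])
painting_main = Slice("painting.main", "Painting",
                      attributes=["name", "year"], children=[picture])
technique_main = Slice("technique.main", "Technique",
                       attributes=["name", "description"], children=[picture])
# Reference relationship: the picture slice is the anchor of a link
# to the main slice of the painting.
picture.links["painting.picture"] = painting_main
```

Aggregation is followed when a slice is rendered (`flatten` descends into children), whereas a reference is only a link target and is not expanded in place.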
Example: AM
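[Figure: example AM. Labels in the diagram: slices technique (main) and painting (main); attributes name, description, picture, year; nested slices for the painting picture and the painter name; relationships exemplified_by and painted_by; a Set aggregation.]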
Example: AM in RDF/XML
<rdfs:Class rdf:ID="Slice.technique.main"
    slice:owner="CM#Technique"
    slice:main="Yes">
  <rdfs:subClassOf rdf:resource="#Slice"/>
</rdfs:Class>
<rdfs:Class rdf:ID="S.painting.picture"
    slice:owner="CM#Painting"
    slice:attr-ref="CM#picture">
  <rdfs:subClassOf rdf:resource="#Slice"/>
</rdfs:Class>
<rdf:Property rdf:ID="media">
  <rdfs:domain rdf:resource="#S.p.picture"/>
  <rdfs:range rdf:resource="#Image"/>
</rdf:Property>
<rdfs:Class rdf:ID="Slice.painting.main"
    slice:owner="CM#Painting">
  <rdfs:subClassOf rdf:resource="#Slice"/>
</rdfs:Class>
<rdf:Property rdf:ID="slice-ref">
  <slice:prop-ref rdf:resource="CM#ex_by"/>
  <rdfs:domain rdf:resource="#S.t.main"/>
  <rdfs:range rdf:resource="#S.p.picture"/>
</rdf:Property>
<rdf:Property rdf:ID="link_1">
  <rdfs:subPropertyOf rdf:resource="#link"/>
  <rdfs:domain rdf:resource="#S.p.picture"/>
  <rdfs:range rdf:resource="#S.p.main"/>
</rdf:Property>
Adaptation
• Captures two kinds of adaptation
– Adaptability takes into account the device capabilities
and user preferences (UAProf = User Agent Profile)
– Adaptivity means that the presentation changes itself
according to the “state of the user’s mind” while being
browsed (UM = User Model)
• Adaptation is based on conditioning the appearance
of slices using UAProf and/or UM
• Adaptivity uses AHAM (Adaptive Hypermedia
Application Model) update rules for updating UM
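
Conditioning the appearance of slices can be sketched as evaluating a simple condition per slice against a profile. The attribute names `prf:ImageCapable` and `um:Painting` appear on the "Adapted Application Model" slide; which slice each condition guards, the tuple encoding of conditions, and the `visit` update rule are assumptions of this sketch and do not reproduce AHAM rule syntax.

```python
import operator

OPS = {"=": operator.eq, "<": operator.lt, ">": operator.gt}

def visible(condition, profile):
    """Evaluate one (attribute, op, value) condition against a profile."""
    if condition is None:  # unconditioned slices always appear
        return True
    attribute, op, value = condition
    return OPS[op](profile[attribute], value)

# Hypothetical slice names mapped to appearance conditions.
CONDITIONS = {
    "painting.picture": ("prf:ImageCapable", "=", "Yes"),  # adaptability (UAProf)
    "painting.main": ("um:Painting", "<", 10),             # adaptivity (UM)
    "painting.name": None,
}

def adapt(profile):
    """Names of the slices that remain visible for this profile."""
    return [name for name, cond in CONDITIONS.items() if visible(cond, profile)]

def visit(profile, concept):
    """Toy update rule: count a visit to `concept` in the user model."""
    updated = dict(profile)
    updated[f"um:{concept}"] = updated.get(f"um:{concept}", 0) + 1
    return updated

wap_phone = {"prf:ImageCapable": "No", "um:Painting": 0}
desktop = {"prf:ImageCapable": "Yes", "um:Painting": 0}
```

Device conditions are fixed for a session, while user-model values change as the user browses, so the same slice can disappear after enough calls to `visit`.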
Adapted Application Model
[Figure: the example AM annotated with appearance conditions: prf:ImageCapable = Yes, um:Technique < 10, um:Painting < 10.]
Presentation Model
• Defines the physical appearance of the presentation
• Based on the concept of region which contains
attributes and possibly other regions:
– Each region has a rectangular area associated
– Slices are translated to regions; one slice can be mapped
to several regions
• Slice relationships are materialized with:
– Navigational relationships
– Spatial relationships
– Temporal relationships
Presentation Model
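[Figure: example presentation model and its screen rendering. Regions: bookcase, shelf, painting. Attributes: P.picture and P.name, associated with a certain painting P. Spatial relationships (right, below, xy) carry priorities 0, 1, 2, …; priority 0 is always fulfilled. A navigational relationship is also shown. The screen rendering shows bookcase regions holding paintings P1 to P7, …, and painting P1 with its name ‘Stone Bridge’ and year 1638.]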
Presentation in Browsers
• HTML: HyperText Markup Language
• SMIL: Synchronized Multimedia Integration Language
• WML: Wireless Markup Language
Implementation
• Models are represented in RDF and they are
serialized in RDF/XML
• User Agent Profile (UAProf): a Composite
Capability/Preference Profiles (CC/PP) vocabulary
to model device capabilities and user preferences
• XSLT processor for transforming between different
model instances (stylesheet-based transformation)
– Xalan (XSLT 1.0)
– Saxon (XSLT 2.0): multiple output files support
Data Transformations
• Step 0: Preparation
– Substep 0.1: Application Model Unfolding creates the
skeleton of an AM instance
– Substep 0.2: Application Model Adaptation adds
slice visibility conditions to the previous skeleton
– Substep 0.3: Main Transformation Specification
Generation builds the specification for the next step
• Step 1: Main Transformation populates the AM
with the input CM instance
• Step 2: Presentation Generation produces code
for different browsers (HTML, WML, SMIL)
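
The steps above can be sketched as a chain of functions. Hera performs them with XSLT stylesheets over RDF/XML; the plain Python functions, the dictionary encodings, the file name `bridge.jpg` and the point at which visibility conditions are evaluated are all assumptions of this sketch.

```python
def unfold(application_model):
    """Substep 0.1: build the skeleton of an AM instance (one entry per slice)."""
    return [{"slice": s["name"], "attrs": s["attrs"], "cond": None}
            for s in application_model]

def add_conditions(skeleton, conditions):
    """Substep 0.2: attach slice visibility conditions to the skeleton."""
    return [dict(entry, cond=conditions.get(entry["slice"])) for entry in skeleton]

def make_transformation(adapted):
    """Substep 0.3: build the specification used by the main transformation."""
    def transform(cm_instance, profile):
        # Step 1: populate the visible slices with values from the CM instance.
        result = []
        for entry in adapted:
            cond = entry["cond"]
            if cond is not None and profile.get(cond[0]) != cond[1]:
                continue  # condition not met: the slice is left out
            result.append({a: cm_instance[a] for a in entry["attrs"]})
        return result
    return transform

def to_html(am_instance):
    """Step 2: generate presentation code for one target (HTML only here)."""
    rows = "".join(f"<p>{k}: {v}</p>" for s in am_instance for k, v in s.items())
    return f"<html><body>{rows}</body></html>"

am = [{"name": "painting.main", "attrs": ["name", "year"]},
      {"name": "painting.picture", "attrs": ["picture"]}]
conditions = {"painting.picture": ("prf:ImageCapable", "Yes")}
painting = {"name": "Stone Bridge", "year": 1638, "picture": "bridge.jpg"}

transform = make_transformation(add_conditions(unfold(am), conditions))
html = to_html(transform(painting, {"prf:ImageCapable": "No"}))
```

Step 0 depends only on the models, so its result can be reused for every input instance; only Steps 1 and 2 run per CM instance and per target format.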
Data Transformations
[Figure: data transformation pipeline. Layers: application independent, application dependent, input dependent. Vocabularies (rdfs): CC/PP user/platform, UAProf, user profile, conceptual model, system media, application model. Models and inputs: conceptual model (rdfs), application model (rdfs), conceptual model instance (rdf), user/platform profile (rdf). Steps: (0.1) application model unfolded (rdf); (0.2) adaptation, giving application model unfolded, adapted (rdf); (0.3) rdf2xsl (xsl), giving cmi2ami (xsl); (1) cmi2ami (xsl), giving the application model instance (rdf); (2) ami2html, ami2wml and ami2smil (xsl), giving HTML, WML and SMIL. Arrow legend: reference, instantiation, XSLT transf.]
Hera +/-
+ :
• Tasks separation (content, navigation, and presentation)
• Model-based specifications (enables presentation content
adaptation)
• Uses standard technology: RDF, RDF/XML, XSLT
- (Future Work):
• Specifications are semi-formal (difficult to check
integrity constraints)
• Does not (yet) support user interaction
Summary
• What is a Web Information System (WIS)
• Features of WIS: data intensive, pervasive etc.
• Design methodologies for WIS:
– Strudel (from industry)
– Hera (from university)
• Model-based approach for WIS design
• WIS design tasks separation:
– Data Selection
– Navigation
– Presentation