Web-Base Management Systems
Aaron Brown and David Oppenheimer
CS294-7
February 11, 1999
Slide 1
Introduction
• Online data is stored in both databases (relational)
and web sites (hypertext)
• Need single framework to manage both types of data
and present integrated views
• Solution: Web Base Management Systems (WBMSs)
– 2 challenges
1) querying and extracting structure from semi-structured web
data, transforming it, and presenting custom views
2) mapping structured database data to the web (adding
navigational access paths, redundancy, ...)
– To address these challenges, we need a data model that
maps between relational and hypertextual models
Slide 2
ARANEUS Data Models
[Diagram: Relational, ADM, HTML, with labels Structure and Navigational access]
Slide 3
ARANEUS Data Model
• ADM = Logical data model for web hypertexts
– Based on page schemes and navigational access paths
– Page scheme = logical structure shared by a set of pages
» Like a “class”
– Web page = instance of page scheme
» Like an “object” with identifier (URL) + attributes
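The class/object analogy above can be sketched in a few lines of Python. This is only an illustration of the analogy, not ARANEUS code: the names `PageScheme` and `WebPage`, the example URL, and the attribute names are all assumptions made for the sketch.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PageScheme:
    """Logical structure shared by a set of pages (plays the role of a class)."""
    name: str
    attributes: tuple  # attribute names every instance must supply


@dataclass
class WebPage:
    """One page: an instance of a scheme, identified by its URL."""
    scheme: PageScheme
    url: str
    values: dict = field(default_factory=dict)

    def __post_init__(self):
        # Reject pages whose attributes do not match the scheme.
        missing = set(self.scheme.attributes) - set(self.values)
        extra = set(self.values) - set(self.scheme.attributes)
        if missing or extra:
            raise ValueError(
                f"{self.url}: missing={sorted(missing)} extra={sorted(extra)}"
            )


# Hypothetical scheme and page, for illustration only.
author_page = PageScheme("AuthorPage", ("Name", "WorkList"))
page = WebPage(
    author_page,
    "http://example.org/authors/davinci.html",
    {"Name": "Leonardo Da Vinci", "WorkList": []},
)
```

The URL acts as the object identifier, and the check in `__post_init__` is one simple way to express "every page of a scheme shares the same structure".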
Slide 4
ADM Example Fragment
Slide 5
Adding Structure to HTML
[Diagram: Relational, ADM, HTML, with labels Structure and Navigational access]
Slide 6
EDITOR: Structuring HTML
• EDITOR starts with an existing ADM scheme
– Generated by inspection of web site
• EDITOR maps web page text to attributes of an ADM
page scheme
– “Wrapping” a web page
– Imposes structure on web pages
• EDITOR uses a procedural language to guide the
wrapping process
– Each page seen as object with extraction methods
» One method for each attribute of page
» Method accesses page’s HTML source, extracts value of
corresponding attribute
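A rough sketch of the "page as object with one extraction method per attribute" idea follows. EDITOR has its own procedural language, which these slides do not show, so Python regular expressions stand in for it here. The class name, the HTML layout (name in `<h1>`, works in `<li>`), and the sample page are assumptions.

```python
import re


class AuthorPageWrapper:
    """Wraps the HTML source of one page; one extraction method per attribute."""

    def __init__(self, html_source: str):
        self.html = html_source

    def name(self) -> str:
        # Assumes the author name is the text of the first <h1> element.
        match = re.search(r"<h1>(.*?)</h1>", self.html, re.S | re.I)
        return match.group(1).strip() if match else ""

    def work_list(self) -> list:
        # Assumes each work is one <li> element.
        items = re.findall(r"<li>(.*?)</li>", self.html, re.S | re.I)
        return [item.strip() for item in items]

    def extract(self) -> dict:
        """Run every extraction method and return the page's attribute values."""
        return {"Name": self.name(), "WorkList": self.work_list()}


# Hypothetical page source.
sample = (
    "<html><h1> Leonardo Da Vinci </h1>"
    "<ul><li>Paper A</li><li>Paper B</li></ul></html>"
)
record = AuthorPageWrapper(sample).extract()
```

Regular expressions are brittle against layout changes, which is the same weakness the wrapping step faces on real sites.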
Slide 7
Querying ADM-Structured Hypertext
[Diagram: Relational, ADM, HTML, with labels Structure and Navigational access]
Slide 8
ULIXES: A Navigational Query Language
• Language for defining relational views over hypertext
that follows an ADM scheme
– Based on navigational expressions (path expressions)
• DEFINE TABLE statement creates relational views
based on page schemes
– local materialized view (tuples) or
– virtual view
» user can then pose SQL queries across multiple views
» optimizer chooses optimal navigation path through site to satisfy
query
• fetches hypertext pages and extracts attributes via EDITOR
wrappers
• cost metric is number of HTML page fetches
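To make the cost metric concrete, here is a minimal sketch of choosing among candidate navigation paths by estimated page fetches. The slides state only the metric; the fan-out-based estimate, the function names, and the numbers are assumptions, not the actual ULIXES optimizer.

```python
def estimated_fetches(fanouts):
    """Estimate pages fetched along a path.

    Counts one entry page, then every page reached at each step.
    fanouts[i] is the average number of links followed from each page at step i.
    """
    total, frontier = 1, 1
    for fanout in fanouts:
        frontier *= fanout
        total += frontier
    return total


def cheapest_path(candidates):
    """Return the (name, fanouts) pair with the fewest estimated fetches."""
    return min(candidates.items(), key=lambda item: estimated_fetches(item[1]))


# Hypothetical alternatives for reaching one author's papers.
candidates = {
    "via search form": [1],           # entry page -> 1 author page
    "via conference list": [20, 50],  # entry -> 20 conferences -> 50 papers each
}
best_name, _ = cheapest_path(candidates)
```

Under these assumed fan-outs the form-based path costs 2 fetches and the conference path 1021, so the form-based path is chosen.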
Slide 9
ULIXES Example
DEFINE TABLE VLDBPapers (Authors, Title, Reference)
AS    AuthorSearchPage.NameForm.Submit -> AuthorPage.WorkList
IN    DBLPScheme
USING AuthorPage.WorkList.Authors,
      AuthorPage.WorkList.Title,
      AuthorPage.WorkList.Reference
WHERE AuthorSearchPage.NameForm.Name = 'Leonardo Da Vinci'
      AuthorPage.WorkList.Reference LIKE '%VLDB%'
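The ULIXES definition reads as: submit the name form, land on the author page, and keep the work-list rows whose reference mentions VLDB. A small Python emulation of that navigation may help. It runs against an in-memory stand-in for the site; the sample rows, `AUTHOR_PAGES`, and both function names are invented for illustration.

```python
# Hypothetical in-memory stand-in for the site: form input -> author page.
AUTHOR_PAGES = {
    "Leonardo Da Vinci": {
        "WorkList": [
            {"Authors": "L. Da Vinci", "Title": "Paper A", "Reference": "VLDB 1996"},
            {"Authors": "L. Da Vinci", "Title": "Paper B", "Reference": "SIGMOD 1997"},
        ]
    }
}


def submit_name_form(name):
    """Stands in for AuthorSearchPage.NameForm.Submit -> AuthorPage."""
    return AUTHOR_PAGES.get(name, {"WorkList": []})


def vldb_papers(name):
    """Rows of the view: (Authors, Title, Reference) with Reference LIKE '%VLDB%'."""
    page = submit_name_form(name)
    return [
        (work["Authors"], work["Title"], work["Reference"])
        for work in page["WorkList"]
        if "VLDB" in work["Reference"]  # substring test plays the role of LIKE
    ]


rows = vldb_papers("Leonardo Da Vinci")
```

In the real system the page would be fetched over HTTP and its attributes extracted by EDITOR wrappers; here both steps are collapsed into a dictionary lookup.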
Slide 10
Generating ADM from existing DB
[Diagram: Relational, ADM, HTML, with labels Structure and Navigational access]
Slide 11
The ARANEUS Design Methodology
[Diagram: design steps]
• Database Conceptual Design (ER)
• Database Logical Design (relational)
• Hypertext Conceptual Design (NCM)
• Hypertext Logical Design (ADM)
• DB Mapping (PENELOPE) + Page Design (HTML)
• Web Site Generation
Slide 12
Database Conceptual Model
• Starting point for database design
• Conceptual description of a domain
• Represents essential properties of data abstractly
• Entity-Relationship Model
– Based on entities and relationships among entities
– Rectangles = entity sets
» Associated attributes are connected with lines
– Diamonds = relationship sets
» Lines connect entity sets via relationship sets
Slide 13
ER Example
Slide 14
Hypertext Conceptual Design
• ER not suitable for modeling hypertext
– no directed paths (links)
– hypertext access paths not modeled (web page hierarchies)
– no way to group related entities into a single “macroentity”
• Navigational Conceptual Model (NCM) describes these
conceptual properties of hypertext
– macroentities (groups of related ER entities) model hypertext
nodes
» associated with simple (atomic) or complex (structured)
attributes, either mono- or multi-valued
– directed relationships model links (may be bidirectional)
– union nodes model link targets that can be of different types
– aggregations model hierarchical access paths
Slide 15
Mapping ER to NCM: Example
[Figure: an ER model and the corresponding NCM model. Labels in the figure: Department, Room (Room#), Seminar (Title, Speaker, Date), Professor (Name, Phone), General, People, Education, Research; relationships SPlace and Responsible; cardinalities 1:1 and 1:N.]
Slide 16
Mapping NCM to ADM
1) macroentity -> one or more pages
single-valued attribute -> ADM simple attribute
multi-valued attribute -> ADM list
2) directed relationship -> link to another page scheme
– anchor = a descriptive key of target macroentity
– reference = URL of target page scheme
3) aggregation node -> ADM “unique” page scheme
– unique page scheme = page scheme with only one instance
4) long lists -> forms
– list items retrieved through program running on server
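The four mapping rules can be sketched as two small functions. This is an illustrative encoding only: the dictionary layout, the function names, and the `long_list` threshold that decides when a list counts as "long" are assumptions, since the slides give no numeric cutoff.

```python
def macroentity_to_page_scheme(name, attributes, links=(), long_list=100):
    """Apply rules 1, 2 and 4 to one macroentity.

    attributes: dict of attribute name -> (is_multi_valued, expected_size)
    links: names of macroentities reached by directed relationships
    long_list: assumed cutoff above which a list is served through a form
    """
    scheme = {"name": name, "unique": False, "attributes": {}}
    for attr, (multi_valued, expected_size) in attributes.items():
        if not multi_valued:
            scheme["attributes"][attr] = "simple"  # rule 1: single-valued
        elif expected_size > long_list:
            scheme["attributes"][attr] = "form"    # rule 4: long list -> form
        else:
            scheme["attributes"][attr] = "list"    # rule 1: multi-valued
    for target in links:
        # Rule 2: a directed relationship becomes a link to the target scheme.
        scheme["attributes"]["To" + target] = ("link", target)
    return scheme


def aggregation_to_page_scheme(name, members):
    """Rule 3: an aggregation node becomes a page scheme with one instance."""
    return {
        "name": name,
        "unique": True,
        "attributes": {"To" + member: ("link", member) for member in members},
    }


# Hypothetical inputs loosely based on the Professor/Seminar example.
professor = macroentity_to_page_scheme(
    "Professor", {"Name": (False, 1), "Phone": (True, 2)}, links=["Seminar"]
)
department = aggregation_to_page_scheme("Department", ["People", "Research"])
```

Anchors and reference URLs are left out of the link tuples to keep the sketch short; a fuller version would carry both, as rule 2 describes.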
Slide 17
Mapping NCM to ADM: Example
Slide 18
The ARANEUS Design Methodology
[Diagram: design steps]
• Database Conceptual Design (ER)
• Database Logical Design (relational)
• Hypertext Conceptual Design (NCM)
• Hypertext Logical Design (ADM)
• DB Mapping (PENELOPE) + Page Design (HTML)
• Web Site Generation
Slide 19
Generating web site from ADM + DB
[Diagram: Relational, ADM, HTML, with labels Structure and Navigational access]
Slide 20
Hypertext Views of DB Data
• Given a database and an ADM scheme for it
– database may be local
» derived from design methodology
» uses derived ADM scheme
– or database may be composed from one or more remote sites
» derived from integrated relational view produced by one or more
ULIXES queries
» uses new ADM scheme concocted to match integrated view
• PENELOPE language used to integrate ADM and DB in
a generated hypertext
– PENELOPE description = ADM augmented with URLs and
references to database fields
Slide 21
PENELOPE Description
• Query: reorganize (Da Vinci’s VLDB) papers based on year
DEFINE PAGE YearPage
AS   URL       URL(<Year>);
     Year:     TEXT <Year>;
     WorkList: LIST OF (Authors:   TEXT <Authors>;
                        Title:     TEXT <Title>;
                        Reference: TEXT <Reference>;
                        ToRefPage: LINK TO ConferencePage UNION
                                   JournalPage <ToRefPage>);
FROM DaVinciPapers

DEFINE PAGE DaVinciYearsPage UNIQUE
AS   URL       'result.html';
     YearList: LIST OF (Year:       TEXT <Year>;
                        ToYearPage: LINK TO YearPage
                                    (URL(<Year>)));
FROM DaVinciPapers
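What the two page definitions produce can be approximated in Python: one page per year, plus a single index page at `result.html` linking to them. This is a sketch of the effect, not of PENELOPE itself. The sample rows, the `year_url` naming scheme (the slides write only `URL(<Year>)`), and the HTML layout are assumptions, and the `ToRefPage` link is omitted for brevity.

```python
from html import escape
from itertools import groupby

# Hypothetical rows of the DaVinciPapers relation.
DA_VINCI_PAPERS = [
    {"Year": 1996, "Authors": "L. Da Vinci", "Title": "Paper A", "Reference": "VLDB 1996"},
    {"Year": 1997, "Authors": "L. Da Vinci", "Title": "Paper B", "Reference": "VLDB 1997"},
    {"Year": 1997, "Authors": "L. Da Vinci", "Title": "Paper C", "Reference": "VLDB 1997"},
]


def year_url(year):
    # Assumed URL scheme standing in for URL(<Year>).
    return f"year_{year}.html"


def generate_site(rows):
    """Return {url: html}: one YearPage per year plus the unique index page."""
    pages = {}
    years = []
    ordered = sorted(rows, key=lambda row: row["Year"])  # groupby needs sorted input
    for year, group in groupby(ordered, key=lambda row: row["Year"]):
        years.append(year)
        items = "".join(
            f"<li>{escape(r['Authors'])}: {escape(r['Title'])} "
            f"({escape(r['Reference'])})</li>"
            for r in group
        )
        pages[year_url(year)] = f"<h1>{year}</h1><ul>{items}</ul>"
    links = "".join(f'<li><a href="{year_url(y)}">{y}</a></li>' for y in years)
    pages["result.html"] = f"<ul>{links}</ul>"  # the UNIQUE page: one instance
    return pages


site = generate_site(DA_VINCI_PAPERS)
```

Each distinct `Year` value yields one `YearPage` instance, which mirrors how the page definition is keyed by `URL(<Year>)`.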
Slide 22
Derived Hypertext View
Slide 23
Resulting Web Pages
Slide 24
Retrospective
• Exceptions during wrapping
– Logically homogeneous pages may be physically heterogeneous
» Different ways of laying out the same information
» Errors masked by browsers
• ULIXES syntax is difficult for beginners
– Alternatives
» Fill out forms corresponding to pre-determined ULIXES queries
» Developed POLYPHEMUS query interface
• User selects path for query by clicking on graphical
representation of ADM page schemes
• Push vs. Pull
– Either supported; hybrid model preferred
– Dealing with updates
» each DB update generates a mixed transaction that updates
both the DB and any pushed (static) HTML pages
• Managing internal sites
– PENELOPE-generated HTML includes description of page
scheme and tags attributes
» Like XML but uses HTML comments
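The idea of tagging attributes inside HTML comments can be sketched as below. The slides do not show the actual comment format PENELOPE emits, so the `scheme:` and `attr:` markers here are an invented format, and the function names are assumptions.

```python
import re


def tag(attr, value):
    # Assumed comment format; the real one is not shown in the slides.
    return f"<!-- attr:{attr} -->{value}<!-- /attr:{attr} -->"


def render(scheme, values):
    """Emit HTML whose comments name the page scheme and delimit each attribute."""
    body = "".join(f"<p>{tag(attr, value)}</p>" for attr, value in values.items())
    return f"<!-- scheme:{scheme} -->{body}"


def unwrap(html_source):
    """Recover the scheme name and attribute values from the comments alone."""
    scheme = re.search(r"<!-- scheme:(\w+) -->", html_source).group(1)
    pairs = re.findall(
        r"<!-- attr:(\w+) -->(.*?)<!-- /attr:\1 -->", html_source, re.S
    )
    return scheme, dict(pairs)


# Hypothetical page content.
page = render("YearPage", {"Year": "1997", "Title": "Paper A"})
scheme_name, attributes = unwrap(page)
```

Because browsers ignore comments, the page renders normally, while a wrapper for an internally generated site reduces to a fixed pattern match instead of a hand-written extraction procedure.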
Slide 25
Conclusion
• ARANEUS provides database-like functionality for
mixed web/relational DB data
• More to be filled in later...
[Diagram: Relational, ADM, HTML]
Slide 26