EasyQuerier: A Keyword Interface in Web Database

EasyQuerier: A Keyword Interface in
Web Database Integration System
Xian Li¹, Weiyi Meng², Xiaofeng Meng¹
¹ WAMDM Lab, RUC
² SUNY Binghamton
Traditional Integrated Interface
[Figure: Attribute and value numbers in the integrated interface. X-axis: domains (job hunting, real estate, book, airfare, auto sales). Series: number of attributes, number of values.]
[Diagram: Domain list (selected manually) → Integrated interface of Job (query Q filled in manually)]
[Figure: Survey of value convergence. X-axis: number of sources. Y-axis: number of distinct values. Panels: Airfare (airflight title, cabin, departure city), Auto (car brand, car class, car body style), Job seek (job preference, industry, job title). Reference series: max number of provided values.]
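What does EasyQuerier look like?
[Diagram: in a traditional system, the user manually selects a domain (Book, ……, Job, House) and manually fills in its integrated interface; with EasyQuerier, a keyword query Q is automatically mapped to a domain and automatically translated to that domain's integrated interface (here, Job).]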
New Features of EasyQuerier

Automatic domain mapping
  Users do not need to select a domain from a long list
More flexible keyword query
  Different kinds of data types
    Text, numeric, currency, date
  More logic relations covered
    "and", "or", "between…and"
  Q1: New York or Washington, education, $2000-$3000
    u_1 = {New York, Washington}, logic: or
    u_2 = {education}
    u_3 = {$2000, $3000}, logic: range
Automatic query translation
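The decomposition of Q1 into units with a logic relation can be illustrated with a small sketch. This is not the parser used by EasyQuerier; the function names `parse_unit` and `parse_query`, the rule that commas separate units, and the regular expressions are all assumptions made for illustration.

```python
import re

NUMBER = r"\$?\d+(?:\.\d+)?"  # assumed shape of a numeric or currency value


def parse_unit(text):
    """Split one unit into its values and a logic relation (hypothetical helper)."""
    text = text.strip()
    # "between X and Y" or "X-Y" with numeric/currency endpoints -> range
    m = re.fullmatch(r"between\s+(.+?)\s+and\s+(.+)", text, flags=re.I)
    if m is None:
        m = re.fullmatch(rf"({NUMBER})\s*-\s*({NUMBER})", text)
    if m:
        return {"values": [m.group(1), m.group(2)], "logic": "range"}
    # "X or Y" / "X and Y" -> several values joined by one logic relation
    for word in ("or", "and"):
        parts = re.split(rf"\s+{word}\s+", text, flags=re.I)
        if len(parts) > 1:
            return {"values": parts, "logic": word}
    # a single value has no logic relation
    return {"values": [text], "logic": None}


def parse_query(query):
    """Assumption: units of a keyword query are separated by commas."""
    return [parse_unit(u) for u in query.split(",")]


units = parse_query("New York or Washington, education, $2000-$3000")
for u in units:
    print(u)
```

Under these assumptions the three printed units correspond to the three units listed for Q1: an "or" unit, a single-value unit, and a range unit.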
EasyQuerier: overview

User inputs his/her query
Part 1: Domain mapping
  Collect the domain knowledge from candidate domains
  Similarity-based domain mapping strategy
Part 2: Query translation
  Partial keyword-attribute mapping
  Holistic keyword-attribute mapping
[Diagram: Query → Domain mapping → Selected integrated interface → Query translation; a Domain Knowledge Collector gathers knowledge from the domains into the domain knowledge base.]
Challenge 1: Domain Mapping

Problem statement
  Map a user query to the correct domain automatically, without requiring the user to enter domain information separately
Our solution
  Domain representation model
  Term weight assignment
  Query-domain similarity
Domain mapping(1)

Domain representation model

  D = <d_ID; CT; AT; VT>
  d_ID: unique domain identifier
  CT = {ct_i | i = 1, 2, …}: a set of Conceptual Terms, which describe the whole domain concept
  AT = ∪_{A_i ∈ D} DAL(d_ID, A_i): a set of Attribute Label Terms, consisting of the attribute labels of the products in this domain
    InteLabel, LocalLabel, OtherLabel
  VT = ∪_{A_i ∈ D} DAV(d_ID, A_i): a set of Value Terms associated with the products' attributes in the domain
    Text attribute: InteValue, LocalValue, OtherValue
    Non-text attribute: VT can be characterized by the predefined ranges available on the integrated interfaces
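As a rough illustration of the representation model, the tuple can be held in a small data structure in which AT and VT are computed as unions over the per-attribute term sets. The class name `Domain`, its field names, and the sample terms below are hypothetical and are not taken from the system itself.

```python
from dataclasses import dataclass, field


@dataclass
class Domain:
    """Sketch of D = <d_ID; CT; AT; VT> (field names are illustrative)."""
    d_id: str
    ct: set = field(default_factory=set)      # conceptual terms
    dal: dict = field(default_factory=dict)   # attribute -> label terms, DAL(d_ID, A_i)
    dav: dict = field(default_factory=dict)   # attribute -> value terms, DAV(d_ID, A_i)

    @property
    def at(self):
        """AT: union of the label terms of every attribute."""
        return set().union(*self.dal.values())

    @property
    def vt(self):
        """VT: union of the value terms of every attribute."""
        return set().union(*self.dav.values())


# Made-up sample content for a job domain
job = Domain(
    d_id="job",
    ct={"job", "career"},
    dal={"location": {"location", "city"}, "industry": {"industry", "category"}},
    dav={"location": {"new york", "washington"}, "industry": {"education"}},
)
print(sorted(job.at))
print(sorted(job.vt))
```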
Domain mapping(2)

Different terms differ in their ability to differentiate the domains
  "price" is less powerful than "title" in differentiating the book domain from others
Term weight assignment
  Adopt the idea of CVV (cue validity variance), which measures the skew of the distribution of terms across all document databases
  if_ij: the number of times term t_j appears in either AT or VT of domain D_i
  CVV_j: the CVV of term t_j
  Weight(D_i, t_j) = CVV_j * if_ij
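The weighting rule can be sketched as follows. The slides do not give the CVV formula, so the code assumes a common formulation: the cue validity of a term for a domain compares its relative frequency inside the domain with its relative frequency in all other domains, and CVV is the population variance of these cue validities across domains. The function names and the toy counts are illustrative.

```python
from statistics import pvariance


def cue_validity_variance(term, counts):
    """counts: {domain: {term: if_ij}}. Assumed CVV formulation, see lead-in."""
    sizes = {d: sum(c.values()) for d, c in counts.items()}
    cvs = []
    for d, c in counts.items():
        inside = c.get(term, 0) / sizes[d] if sizes[d] else 0.0
        other_size = sum(s for k, s in sizes.items() if k != d)
        other_hits = sum(v.get(term, 0) for k, v in counts.items() if k != d)
        outside = other_hits / other_size if other_size else 0.0
        total = inside + outside
        cvs.append(inside / total if total else 0.0)
    return pvariance(cvs)


def term_weight(domain, term, counts):
    """Weight(D_i, t_j) = CVV_j * if_ij."""
    return cue_validity_variance(term, counts) * counts[domain].get(term, 0)


# Toy term counts: "price" occurs equally in both domains, "title" only in book
counts = {
    "book": {"title": 8, "price": 2},
    "auto": {"brand": 8, "price": 2},
}
print(term_weight("book", "title", counts))
print(term_weight("book", "price", counts))
```

With these toy counts, "price" is spread evenly over both domains and receives a zero weight, while "title" is concentrated in the book domain and receives a positive weight, which matches the intuition stated on the slide.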
Domain mapping(3)

Q = {u_1, u_2, …, u_n}, u_i = {v_i1, v_i2, …}
Q1 example
  u_1 = {New York, Washington}, v_11 = New York, v_12 = Washington
For each query value, among the terms t_j in VT or AT, we only record the best-matching term t_j
[Equation: query-domain similarity]
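Since the similarity formula itself is missing from the transcript, the following is only a sketch of one plausible reading: for every query value, keep the best-matching term among the domain's AT and VT terms, multiply that match by the term's weight, and sum the results. The token-overlap (Jaccard) similarity, the function names, and the sample weights are assumptions, not the measure used in the paper.

```python
def jaccard(a, b):
    """Assumed stand-in for the term similarity: word-level Jaccard overlap."""
    x, y = set(a.lower().split()), set(b.lower().split())
    return len(x & y) / len(x | y) if x | y else 0.0


def query_domain_similarity(query, weights):
    """query: list of units (lists of values); weights: {term: Weight(D, term)}."""
    total = 0.0
    for unit in query:
        for value in unit:
            # keep only the best-matching term for this value
            best = max(weights, key=lambda t: jaccard(value, t), default=None)
            if best is not None:
                total += jaccard(value, best) * weights[best]
    return total


def map_domain(query, domains):
    """Return the domain whose weighted terms best match the query."""
    return max(domains, key=lambda d: query_domain_similarity(query, domains[d]))


# Made-up term weights for two domains
domains = {
    "job": {"new york": 1.0, "education": 2.0, "salary": 1.5},
    "book": {"title": 2.0, "author": 2.0, "education": 0.5},
}
query = [["New York", "Washington"], ["education"]]
print(map_domain(query, domains))
```

In this toy setting the query matches two weighted terms of the job domain and only one low-weight term of the book domain, so the job domain is selected.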
Challenge 2: Query translation

Problem statement
  Translate the query to the integrated interface
  Just like filling in the integrated interface with a set of keywords
Computation model
  Def 4.1 (Keyword-Attribute Matching (KAM)). KAM(u, A).
  Def 4.2 (Degree of Matching (DM)). Each KAM has a matching degree.
  Def 4.3 (Query Translation Solution (QTS)). A QTS represents a strategy for filling in the query interface. A QTS comprises several KAMs.
  Def 4.4 (Conviction). This measurement determines whether a QTS is reasonable. The larger the DM of a KAM, the more reasonable the KAM is. Combining such KAMs generates the optimal QTS.
Query translation(1)

Computation of DM

For Q = {u_1, u_2, …, u_n}, u_i = {v_i1, v_i2, …}, Sim(v_xi, A_j) is the maximum of all Sim(v_xi, t_j)
  where t_j ranges over the terms in the VT of A_j, and Sim(v_xi, t_j) is the same term similarity used in domain mapping
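The maximum over the value terms of an attribute can be written in a few lines. The term similarity is not specified in the transcript, so a word-level Jaccard overlap is assumed here as a stand-in; the function names and the sample value terms are illustrative.

```python
def jaccard(a, b):
    """Assumed stand-in for Sim(v, t): word-level Jaccard overlap."""
    x, y = set(a.lower().split()), set(b.lower().split())
    return len(x & y) / len(x | y) if x | y else 0.0


def sim_value_attribute(value, value_terms, sim=jaccard):
    """Sim(v, A_j): the best similarity between v and any term in the VT of A_j."""
    return max((sim(value, t) for t in value_terms), default=0.0)


# Made-up value terms for two attributes of a job interface
vt = {
    "location": ["new york", "washington"],
    "industry": ["education", "finance"],
}
scores = {attr: sim_value_attribute("New York", terms) for attr, terms in vt.items()}
print(scores)
```

How the per-value scores are combined into the DM of a whole unit is not stated on the slide, so the sketch stops at the per-value level.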
Query translation(2)

Conviction


The conviction value of a QTS is a weighted sum of the DMs of the related KAMs
Why weight?
  If an attribute appears in more local interfaces of a domain, it is more important in the domain
  For an attribute within the domain D, assign a weight w(A_j) to each attribute A_j based on its interface frequency (if)
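A minimal sketch of the weighted sum and of choosing the QTS with the highest conviction follows. Several points are assumptions rather than details from the slides: the weight is taken as the fraction of local interfaces that contain the attribute, each unit is matched to a distinct attribute, and candidate QTSs are enumerated exhaustively. All names and numbers are illustrative.

```python
from itertools import permutations


def attribute_weight(interface_count, total_interfaces):
    """Assumed weight: share of the domain's local interfaces containing the attribute."""
    return interface_count / total_interfaces


def conviction(qts, dm, w):
    """Weighted sum of the DMs of the KAMs (unit, attribute) in a QTS."""
    return sum(w[attr] * dm.get((unit, attr), 0.0) for unit, attr in qts)


def best_qts(units, attributes, dm, w):
    """Try every assignment of units to distinct attributes, keep the best one."""
    candidates = (
        list(zip(units, attrs)) for attrs in permutations(attributes, len(units))
    )
    return max(candidates, key=lambda qts: conviction(qts, dm, w))


# Made-up interface counts (out of 50 local interfaces) and matching degrees
w = {
    "location": attribute_weight(45, 50),
    "industry": attribute_weight(30, 50),
    "salary": attribute_weight(40, 50),
}
dm = {
    ("u1", "location"): 1.0, ("u1", "industry"): 0.2,
    ("u2", "industry"): 0.9, ("u2", "location"): 0.3,
}
solution = best_qts(["u1", "u2"], list(w), dm, w)
print(solution, conviction(solution, dm, w))
```

Exhaustive enumeration is only practical for the handful of units in a typical keyword query; a real system would likely prune candidates, but that is outside what the slides describe.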
Experiment

Settings



9 domains, each covering 50 Web databases
10 students, 20 keyword queries for each domain
Measurement


Correct/acceptable/wrong
Overall/with domain/with attribute label/value only
Fig1: domain mapping accuracy
Fig2: query translation accuracy
Conclusion


In this paper, we proposed a novel keyword-based interface system, EasyQuerier, for ordinary users to query structured data in various Web databases.
We developed solutions to two technical challenges:
  mapping a keyword query to the appropriate domain
  translating the keyword query into a query for the integrated search interface of the domain

Thank you~