EasyQuerier: A Keyword Interface in
Web Database Integration System
Xian Li (1), Weiyi Meng (2), Xiaofeng Meng (1)
(1) WAMDM Lab, RUC
(2) SUNY Binghamton
Traditional Integrated Interface
[Figure: number of attributes and number of values in the integrated interface, per domain (job hunting, real estate, book, airfare, auto sales)]
[Diagram: the user manually selects a domain from the domain list and then poses query Q]
Survey of value convergence
[Figure: number of distinct values versus number of sources, in three panels: Airfare (airflight title, cabin, departure city), Auto (car brand, car class, car body style), Job seek (job preference, industry, job title); legend: number of distinct values, max number of provided values]
What does EasyQuerier look like
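[Diagram: traditionally, the user manually chooses a domain and poses query Q on that domain's integrated interface; with EasyQuerier, query Q is automatically mapped to a domain (Book, ……, Job, House) and automatically translated onto its integrated interface, e.g. the integrated interface of Job]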
New Features of EasyQuerier
Automatic domain mapping
Users do not need to select a domain from a long list
More flexible keyword queries
Different kinds of data types: text, numeric, currency, date
More logic relations covered: “and”, “or”, “between…and”
Q1: New York or Washington, education, $2000-$3000
U1 = {New York, Washington}, logic: or
U2 = {education}
U3 = {$2000, $3000}, logic: range
Automatic query translation
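The decomposition of Q1 into units U1, U2, U3 can be illustrated with a small sketch. It assumes that commas separate query units and that "or", "and", "between…and", and a dash between two numbers mark the logic relation; these splitting rules and the name `parse_query` are illustrative assumptions, not the parser used in EasyQuerier.

```python
import re

# Assumed surface forms for a range: "$2000-$3000" or "between X and Y".
# Amounts with thousands separators ("$2,000") are not handled by this sketch.
RANGE_DASH = re.compile(r"(\$?\d[\d.]*)\s*-\s*(\$?\d[\d.]*)")
RANGE_BETWEEN = re.compile(r"between\s+(.+?)\s+and\s+(.+)", re.IGNORECASE)


def parse_query(query):
    """Split a keyword query into units, each with its values and logic relation."""
    units = []
    for raw in query.split(","):
        raw = raw.strip()
        if not raw:
            continue
        m = RANGE_BETWEEN.fullmatch(raw) or RANGE_DASH.fullmatch(raw)
        if m:
            units.append({"values": [m.group(1), m.group(2)], "logic": "range"})
            continue
        for keyword in ("or", "and"):
            parts = re.split(rf"\s+{keyword}\s+", raw, flags=re.IGNORECASE)
            if len(parts) > 1:
                units.append({"values": parts, "logic": keyword})
                break
        else:
            # No connective found: a single-value unit with no logic relation.
            units.append({"values": [raw], "logic": None})
    return units


q1 = parse_query("New York or Washington, education, $2000-$3000")
```

Here `q1` holds three units that correspond to U1, U2, and U3 on the slide.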
EasyQuerier: overview
The user inputs his/her query
Part 1: Domain mapping
Domain Knowledge Collector: collects the domain knowledge from candidate domains into the domain knowledge base
Similarity-based domain mapping strategy
Part 2: Query translation
Translates the query onto the selected integrated interface
Partial keyword-attribute mapping
Holistic keyword-attribute mapping
Challenge 1: Domain Mapping
Problem statement
Automatically map a user query to the correct domain,
without requiring domain information to be entered separately.
Our solution
Domain representation model
Term weight assignment
Query-domain similarity
Domain mapping(1)
Domain representation model
$D = \langle d\_ID,\ CT,\ AT,\ VT \rangle$
d_ID: unique domain identifier.
$CT = \{ct_i \mid i = 1, 2, \ldots\}$ is a set of Conceptual Terms, which
describe the concept of the whole domain
$AT = \bigcup_{A_i \in D} DAL(d\_ID, A_i)$ is a set of Attribute Label Terms
consisting of the attribute labels of the products in this domain
InteLabel, LocalLabel, OtherLabel
$VT = \bigcup_{A_i \in D} DAV(d\_ID, A_i)$ is a set of Value Terms
associated with the products’ attributes in the domain
Text attribute: InteValue, LocalValue, OtherValue
Non-text attribute: VT can be characterized by the pre-defined
ranges available on the integrated interfaces.
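The domain representation can be pictured as a small data structure. The sketch below is only an illustration of the tuple D = <d_ID, CT, AT, VT>: the class names `Attribute` and `Domain`, the field names, and the sample job-domain terms are assumptions, and AT and VT are derived as unions over per-attribute label and value sets.

```python
from dataclasses import dataclass, field


@dataclass
class Attribute:
    """One attribute A_i of a domain (hypothetical structure)."""
    labels: set[str] = field(default_factory=set)   # DAL(d_ID, A_i)
    values: set[str] = field(default_factory=set)   # DAV(d_ID, A_i)


@dataclass
class Domain:
    d_id: str                                       # unique domain identifier
    ct: set[str]                                    # conceptual terms
    attributes: dict[str, Attribute] = field(default_factory=dict)

    @property
    def at(self) -> set[str]:
        """AT: union of the label terms of all attributes."""
        return set().union(*(a.labels for a in self.attributes.values()))

    @property
    def vt(self) -> set[str]:
        """VT: union of the value terms of all attributes."""
        return set().union(*(a.values for a in self.attributes.values()))


# Illustrative data only; not taken from the EasyQuerier knowledge base.
job = Domain(
    d_id="job",
    ct={"job", "career"},
    attributes={
        "location": Attribute({"location", "city"}, {"new york", "washington"}),
        "industry": Attribute({"industry"}, {"education"}),
    },
)
```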
Domain mapping(2)
Different terms have different ability to
differentiate the domains.
“price” is less powerful than “title” in differentiating
the book from others
Term weight assignment
Adopt the idea of CVV (cue validity variance), which measures the skew of the
distribution of terms across all document databases
$if_{ij}$ is the number of times term $t_j$ appears in either AT or VT of domain $D_i$;
$CVV_j$ is the CVV of $t_j$
$Weight(D_i, t_j) = CVV_j \cdot if_{ij}$
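The weighting scheme can be sketched as follows. The slides do not give the CVV formula, so this sketch substitutes a simple stand-in: the population variance of each term's share of occurrences across domains. That substitution, the function name `term_weights`, and the sample counts are all assumptions; only the final product of skew and frequency follows the Weight formula on the slide.

```python
from statistics import pvariance


def term_weights(if_counts):
    """if_counts maps domain -> {term: occurrences in that domain's AT or VT}.

    Returns domain -> {term: weight}, with weight = skew(term) * frequency.
    The skew used here is a stand-in for CVV, not the published definition.
    """
    domains = list(if_counts)
    terms = {t for counts in if_counts.values() for t in counts}
    weights = {d: {} for d in domains}
    for t in terms:
        freqs = [if_counts[d].get(t, 0) for d in domains]
        total = sum(freqs)
        shares = [f / total if total else 0.0 for f in freqs]
        skew = pvariance(shares)  # 0 when the term is spread evenly
        for d, f in zip(domains, freqs):
            weights[d][t] = skew * f
    return weights


# Illustrative counts: "price" is evenly spread, "title" occurs only for books.
w = term_weights({
    "book": {"title": 40, "price": 30},
    "auto": {"price": 30},
})
```

With these counts, "price" receives weight 0 in both domains because it does not discriminate between them, while "title" receives a positive weight in the book domain, which matches the intuition on the slide.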
Domain mapping(3)
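$Q = \{u_1, u_2, \ldots, u_n\}$, $u_i = \{v_{i1}, v_{i2}, \ldots\}$
Q1 example
$u_1 = \{\text{New York}, \text{Washington}\}$, $v_{11} = \text{New York}$,
$v_{12} = \text{Washington}$
Among the terms $t_j$ in VT or AT,
we record only the best-matching term $t_j$
[Equation: query-domain similarity]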
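A rough sketch of similarity-based domain mapping follows. The similarity formula itself is not available in this transcript, so everything here is an assumption: string similarity comes from Python's `difflib.SequenceMatcher`, each query value keeps only its best-matching term, and the per-value scores are summed after multiplying by the term weight. The names `best_match`, `query_domain_similarity`, `map_domain`, and the sample weights are hypothetical.

```python
from difflib import SequenceMatcher


def best_match(value, terms):
    """Return (term, score) for the term most similar to the query value."""
    scored = [
        (t, SequenceMatcher(None, value.lower(), t.lower()).ratio())
        for t in terms
    ]
    return max(scored, key=lambda pair: pair[1])


def query_domain_similarity(query_units, weights):
    """Sum, over all query values, the best-match score times the term weight.

    weights maps each term in the domain's AT or VT to its weight.
    The summation is an assumed aggregation, not the formula from the slides.
    """
    total = 0.0
    for unit in query_units:
        for value in unit:
            term, score = best_match(value, weights)
            total += score * weights[term]
    return total


def map_domain(query_units, domains):
    """Pick the domain with the highest query-domain similarity."""
    return max(domains, key=lambda d: query_domain_similarity(query_units, domains[d]))


# Illustrative term weights for two domains.
domains = {
    "job": {"education": 2.0, "new york": 1.0, "salary": 1.5},
    "book": {"title": 2.0, "author": 2.0},
}
chosen = map_domain([["New York", "Washington"], ["education"]], domains)
```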
Challenge 2: Query translation
Problem statement
Translate the query to the integrated interface
Just like filling the integrated interface with a set of keywords
Computation model
Def 4.1 (Keyword-Attribute Matching (KAM)). KAM(u, A) matches a query unit u to an attribute A.
Def 4.2 (Degree of Matching (DM)). Each KAM has a
matching degree.
Def 4.3 (Query Translation Solution (QTS)). A QTS represents a
strategy for filling in the query interface. A QTS comprises
several KAMs.
Def 4.4 (Conviction). This measure determines whether a
QTS is reasonable. The larger the DM of a KAM, the more
reasonable the KAM is. Combining such KAMs
yields the optimal QTS.
Query translation(1)
Computation of DM
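For $Q = \{u_1, u_2, \ldots, u_n\}$, $u_i = \{v_{i1}, v_{i2}, \ldots\}$:
$Sim(v_{xi}, A_j) = \max_{t_k \in VT(A_j)} Sim(v_{xi}, t_k)$
where $t_k$ ranges over the terms in the VT of $A_j$, and $Sim(v_{xi}, t_k)$ is the same
term similarity used in domain mapping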
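The maximum over an attribute's value terms can be written in a few lines. In this sketch `difflib.SequenceMatcher` stands in for the term similarity Sim(v, t), which the slides do not define; the function names and the sample values are assumptions.

```python
from difflib import SequenceMatcher


def term_sim(value, term):
    """Stand-in for Sim(v, t): case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, value.lower(), term.lower()).ratio()


def attribute_sim(value, attribute_values):
    """Sim(v, A): the maximum of Sim(v, t) over the value terms t of attribute A."""
    return max((term_sim(value, t) for t in attribute_values), default=0.0)


# Illustrative value terms of a hypothetical "location" attribute.
location_values = ["Boston", "new york", "Washington"]
score = attribute_sim("New York", location_values)
```

An attribute with no value terms yields 0.0 here, so it never wins a match by default.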
Query translation(2)
Conviction
The conviction value of a QTS is a weighted sum of the DMs of the
related KAMs
Why weight?
If an attribute appears in more local interfaces of a domain, it is
more important in the domain.
For an attribute $A_j$ within the domain D, a weight $w(A_j)$ is assigned
based on its interface frequency $if_j$
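The weighted sum can be sketched as below. The sketch assumes that a QTS assigns each query unit to a distinct attribute and that candidates are enumerated exhaustively; the names `conviction` and `best_qts`, the DM values, and the attribute weights are illustrative assumptions rather than values from the paper.

```python
from itertools import permutations


def conviction(qts, dm, weight):
    """Weighted sum of the DMs of the KAMs in a QTS.

    qts is a sequence of (unit, attribute) pairs; dm maps such pairs to a
    degree of matching; weight maps attributes to w(A).
    """
    return sum(weight[a] * dm.get((u, a), 0.0) for u, a in qts)


def best_qts(units, attributes, dm, weight):
    """Enumerate one-to-one unit-to-attribute assignments and keep the best."""
    candidates = (
        tuple(zip(units, attrs))
        for attrs in permutations(attributes, len(units))
    )
    return max(candidates, key=lambda qts: conviction(qts, dm, weight))


# Illustrative numbers only.
dm = {
    ("u1", "location"): 0.9, ("u1", "industry"): 0.2,
    ("u2", "industry"): 0.8, ("u2", "location"): 0.1,
}
weight = {"location": 0.9, "industry": 0.6, "salary": 0.7}
solution = best_qts(["u1", "u2"], ["location", "industry", "salary"], dm, weight)
```

Exhaustive enumeration is only practical for the handful of units in a keyword query; a larger interface would need pruning, which this sketch does not attempt.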
Experiment
Settings
9 domains, each covering 50 Web databases
10 students, 20 keyword queries for each domain
Measurement
Correct/acceptable/wrong
Overall/with domain/with attribute label/value only
Fig1: domain mapping accuracy
Fig2: query translation accuracy
Conclusion
In this paper, we proposed a novel keyword-based
interface system, EasyQuerier, for ordinary
users to query structured data in various Web
databases.
We developed solutions to two technical
challenges:
mapping a keyword query to the appropriate domain
translating the keyword query into a query for the
integrated search interface of that domain
Thank you~