a4academics.com

Download Report

Transcript a4academics.com

Supporting of search-as-you-type using
sql in databases
Guoliang Li, Chen Li

Click to edit the
outline text format

Second Outline
Level

Third Outline
Level

Fourth
Outline
Level
 Fifth
introduction
•
•
•
•
A search as you type system computes answer on
the fly as user types in keyword query character by
character
Finds the solution for both multi-keyword queries
and single keyword queries
Finds how to support search as you type in
relational database using SQL
Example:is Netflix
contents
•
Introduction
•
Preliminaries
•
Problem formulation
•
Exact search for single keyword
•
Fuzzy search for single keyword
•
Supporting multi keyword queries
•
Supporting first N queries
•
Supporting updates efficiently
•
Related work
preliminaries
•
Let T be the relational table
•
A ,A ,…….,A are the attributes
•
R={r ,r …..,r } is collection of records
•
ri[A j]-denotes content of the record ri in the
1
2
1
l
2
n
attribute Aj
•
W is the set of tokenized keywords in R
Relational table(T)
ID
Title
Authors
Book title Year
r1
K-Auto morphism: A General Framework for Privacy
Preserving Network
Publication
Lei Zou, Lei Chen, M. Tamer O¨ zsu
PVLDB
2009
r2
r3
r4
Privacy-Preserving Singular Value Decomposition
Shuguo Han, Wee Keong Ng,
Philip S. Yu
ICDE
2009
Privacy Preservation of Aggregates in Hidden
Databases: Why and How?
Arjun Dasgupta, Nan Zhang,
Gautam Das, Surajit Chaudhuri
SIGMOD
2009
Privacy-preserving Indexing of Documents on the
Network
Mayank Bawa, Roberto J.
Bayardo, Rakesh Agrawal,
Jaideep Vaidya
VLDBJ
2009
r5
r6
r7
On Anti-Corruption Privacy Preserving Publication
Yufei Tao, Xiaokui Xiao, Jiexing Li,
Donghui Zhang
ICDE
2008
Preservation of Proximity Privacy in Publishing
Numerical Sensitive Data
Jiexing Li, Yufei Tao, Xiaokui Xiao
SIGMOD
2008
Hiding in the Crowd: Privacy Preservation on
Evolving Streams through
Correlation Tracking
Feifei Li, Jimeng Sun, Spiros
Papadimitriou, George A.
Mihaila, Ioana Stanoi
SIGIR
2007
Search as you type for single keyword queries
•
Exact search: as user types a single partial key
word w character by character,search as you type
system finds the record that contain keyword with a
prefix w
•
It is also known as prefix search
•
eg: if user types in a query “sig”, the system returns
Returns records r3,r6 and r7
search as you type for single keyword queries
•
•
•
•
Fuzzy search: As user types in a single partial
keyword w character by character ,system finds
records with keywords similar to the query keyword
Eg:when we types in query “corel” ,record r7 is
relevant answer since it contain
keyword”correlation”
Edit distance measure is used for finding similarity
between strings
ed(s1,s2)
Search as you type for multi-keyword queries
•
•
•
Exact search: given a multi-keyword query Q with
m keywords w1,w2,…wm
Wm is considered as partial keyword and others are
complete keyword
Fuzzy search:finds the record that contains
keyword similar to the complete keyword and a
keyword with a prefix similar to the partial keyword
w
•
Wm is the partial keyword
•
Using the edit distance
Different approaches for search as you type
1.Use separate application layer

It can achieve high performance
2.Use database extenders

not safe method to query engine

Depends on API of specific database
3.Use SQL


More compatible since it is using standard SQL
More portable to different platforms than other
method
Exact search for single keyword
•
No index methods
•
Index based methods
No index methods
Issues a SQL query that scans each records and
verifies whether the record is an answer to the
query
1.Calling user defined functions (UDF)
Add functions to the database
2.Using LIKE predicate
Index based method
•
Building auxiliary tables as index structures
1.
Inverted index table
2.
Prefix table
Inverted- index table
•
Inverted Index table (IT)- records are in the
form<kid,rid>
•
Assigning unique id to the keywords in Table(T)
•
Kid-id of the keyword
Prefix table
•
Build prefix table with records in the
form<p,lkid,ukid>
•
P-prefix of the keyword
•
lkid-smallest id of the keyword with prefix p
•
Ukid-largest id of the keyword with prefix p
•
•
Prefix table can be used to find the range of
keyword with the prefix
Eg:the ids of the keyword with prefix “sig” must be
In the range [k6,k7]
Steps for exact search
•
For given partial keyword w
•
Get the keyword range [lkid,ukid] using prefix table
•
•
Find the records that have a keyword in the range
through the inverted index table IT
Then use the following SQL to answer the prefixsearch query w
SELECT T .* FROM PT , IT , T
WHERE PT .prefix = “w” AND
PT .ukid ≥ IT .kid AND PT .lkid ≤ IT .kid AND
The inverted-index table and prefix
table.
kid
keyword
kid
rid
prefix lkid
ukid
k1
icde
K2
r10
ic
k1
k2
K2
icdt
K5
r6
p
k3
k6
K3
preserving
K5
r8
pr
k3
k4
K4
privacy
K5
r10
pri
k4
K4
K5
publishing
k6
r1
pu
k5
k5
K6
sigmoid
K7
r9
pv
k6
K6
k7
sigir
k8
r3
pvl
k6
K6
fuzzy search for single keyword
No index methods
•
•
•
•
•
Using UDF(user defined function) for supporting
prefix search
LIKE predicate does not support fuzzy search
PED (w,s)-returns minimal edit distance between w
and the prefix of the keyword in s
Eg:PED(“pvb”,r10[title])=PED(“pvb”, “privacy in
database publishing”)=1
Uses the edit distance threshold and forms new UDF
Index based method
•
Using the inverted index table and prefix table to
support search as you type
•
Find the similar prefixes from the prefix table(PT)
•
Finds the keyword ranges of similar prefixes
•
Finds the answer based on the keyword ranges
using inverted index table
•
Method to finding similar prefixes
•
Using UDF
•
Gram-based method
Using UDF
•
•
Using a UDF to find similar prefixes
Issues an SQL query which scans each prefix in PT
and calls the UDF to check if the prefix is similar to w
Gram based method
•
•
q-gram based methods are used for string
matching
For a string “s” its q-grams are substrings with length
q
Neighborhood generation method
•
i-deletion neighborhood- for a given keyword w,
the substring of w by deleting ”i” characters is
called i-deletion neighborhood
•
Di(w) denotes i-deletion neighborhood
•
For eg: for a keyword “pvldb”
•
D0(pvldb)={pvldb}
Supporting multi keyword queries
Computing answer from scratch
Q –multi keyword query with m keyword w1,w2,..wm
1.
Using an “INTERSECT” operator
•
first find the record for each keyword
•
use the “INTERSECT”operator to join the records
2. Using full text indexes
•
•
first find the records matching the first m-1
complete keyword
Find the record matching last prefix keyword
Word level incremental computation
•
•
•
•
User types query Q with keyword w1,w2..wm
Temporary table CQ is created to cache the record
ids of the query Q
if user types new keyword and submit a new query
Q1 ,temporary table is used to incrementally answer
the query
two types of word level incremental computation
•
Exact search
•
Fuzzy search
Supporting first n queries
•
As user types in a query character by character
system gives the first-N results as instant feedback
Exact first-N queries
•
•
“LIMIT “Syntax is used which returns first N results
from the database
E.g.:”LIMIT n1,n2” which returns n2 rows starting from
n1th row
Fuzzy first-N queries
Fuzzy search for single keyword query w, find the
result with edit distance=0
Supporting updates efficiently
•
Data updates includes deleting and insertion of
records
Insertion
•
•
•
When a record is inserted then first assign it a new
record id
Insert each keyword in the inverted index table
If prefix is not in the prefix table add prefixes to the
prefix table
Deletion
•
If a record is deleted inverted index table use a bit to
Related work
1.Auto completion and search as you type
•
An auto completion system can predict a word or
phrase that a user may type in next based on
partial string the user has already typed
2.Approximate string search and similarity join
•
In this method, Given a set of String and query
string,all string in the set are similar to the query
string
3.Keyword search in database
conclusion
•
Need to leverage the existing DBMS functionalities
to meet the high performance requirement achieve
an interactive speed
•
•
•
To support prefix matching,there is proposed
solutions that use auxiliary tables as index
structures
Extended the techniques in the case of fuzzy
queries
Proposed incremental computation techniques to
answer multi-keyword queries
references
•
•
•
•
[1] S. Agrawal, K. Chakrabarti, S. Chaudhuri, and V.
Ganti. Scalable ad-hoc
entity extraction from text collections. PVLDB,
1(1):945–957, 2008.
[2] S. Agrawal, S. Chaudhuri, and G. Das. Dbxplorer:
A system for
keyword-based search over relational databases.
In ICDE, pages 5–16,2002.
thank you