Transcript ppt

Conference Attendance Report
干艳桃
2013/11/15
Contents
Overview
Keynotes
Industry Talk
Paper Awards
Workshops
Picture Time
CIKM 2013
2
of
89
2013/11/15
22nd ACM International Conference on Information and
Knowledge Management (CIKM 2013)
DB track
Search, data streams and probabilistic queries, data streams and ranking, graphs and social networks, graphs and storage systems, query processing and privacy
IR track
Retrieval models, entities, search engines, networks, evaluation, data classification, applications, ranking, users
KM track
Social networks, mining topics, pattern mining and applications, mining big data, ontologies, mobile and event mining, graphs and networks, clusters-topics and similarity, networks, mining reviews and wiki, learning and applications, similarity clustering and outlier mining, social networks and media, text, extraction and text mining, community and web mining, entities-tags and time series, mining and learning
Welcome to San Francisco
• Golden Gate Bridge
• Fisherman's Wharf
• Cable Car
• Haight-Ashbury
• Lombard Street
CIKM 2013
1. Submissions to research track: 848
2. Full papers accepted: 143 (16.86%)
3. Short papers accepted: 106 (29.36% cumulative)
4. Authors in total: 3,327
5. Registrations: 750+
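The two percentages can be checked directly from the counts. The worked arithmetic below reads "cumulative" as full plus short papers over all research-track submissions; that reading is an interpretation, not something the slide states.

```latex
\[
\frac{143}{848} \approx 16.86\%,
\qquad
\frac{143 + 106}{848} = \frac{249}{848} \approx 29.36\%
\]
```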
Highest registrations among all past 22 CIKMs
[Chart: Submissions and Acceptance Rate (% x 10); vertical axis from 0 to 1200]
Sessions & Events
52 paper sessions
Keynotes
• C. Lee Giles (Pennsylvania State University): Scholarly Big Data: Information Extraction and Data Mining
• Alon Y. Halevy (Google Research): Structured Data in Web Search
• Ronald Fagin (IBM Research - Almaden): Applying Theory to Practice
• Carlos Guestrin (University of Washington): Usability in Machine Learning at Scale with GraphLab
Applying Theory to Practice
(and Practice to Theory)
at IBM
Ron Fagin
IBM Research—Almaden
Purpose of This Talk
• Encourage collaboration between theoreticians and system builders
– via two case studies
• One initiated by the system builders, and one by the theoreticians
• For theoreticians:
– How to apply theory to practice
– Why applying theory to practice can lead to better theory
• For system builders:
– The value of theory
– The value of involving theoreticians
First Case Study: Garlic (1996)
Garlic
Laura Haas: Mr. Database Theoretician, we’ve got a problem with Garlic, our multimedia database system!
• What was the problem?
Garlic
Example databases:
. . .
• The answers to queries in DB/2 are sets
• The answers to queries in QBIC are sorted lists
• How do you combine the results?
Example
• Searching a CD database for Artist = “Beatles” yields a set, via, say, DB/2
MusicBrainz has 12 million recordings in its DB
Example
• AlbumColor = “Red” yields a sorted list, via, say, QBIC
Redness scores (sorted): .697, .683, .670, .659, .629
What Was My Solution?
• These weren’t just sorted lists: they were scored lists
• Can view sets as scored lists (scores 0 or 1)
• This reminded me of fuzzy logic
• In fuzzy logic, conjunction (∧) is min, and disjunction (∨) is max
Use fuzzy logic
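A minimal sketch of how the fuzzy-logic view combines a set with a scored list. The album names and the assignment of redness scores to albums are hypothetical; only the min/max rules and the 0-or-1 encoding of sets come from the slides.

```python
def fuzzy_and(*scores):
    """Fuzzy conjunction: the minimum of the scores."""
    return min(scores)


def fuzzy_or(*scores):
    """Fuzzy disjunction: the maximum of the scores."""
    return max(scores)


# A set answer (Artist = "Beatles") viewed as a scored list with scores 0 or 1.
# Album identifiers are made up for illustration.
artist_is_beatles = {"album_a": 1.0, "album_b": 0.0, "album_c": 1.0}

# A scored-list answer (AlbumColor = "Red"); the pairing of scores to albums is assumed.
redness = {"album_a": 0.697, "album_b": 0.683, "album_c": 0.629}

# Artist = "Beatles" AND AlbumColor = "Red"
combined = {
    album: fuzzy_and(artist_is_beatles[album], redness[album])
    for album in redness
}
ranked = sorted(combined.items(), key=lambda item: item[1], reverse=True)
print(ranked)
```

Albums outside the set get a combined score of 0 and sink to the bottom, while albums inside the set keep their redness score, so the conjunction behaves like a filter followed by a ranking.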
Laura Haas: I like your solution. But we also need an efficient algorithm that can find the top k results while minimizing database accesses
⋮
Ron Fagin: I have an algorithm that finds the top k with only √n database accesses
Laura Haas: Good, that beats linear! But we database people are spoiled, and are used to only log n accesses. Be smarter and get me a log n algorithm
⋮
Ron Fagin: I proved that you can’t do better than √n
Influence
Algorithm implemented in Garlic
Influenced other IBM products, including
• Watson Bundled Search system
• InfoSphere Federation Server
• WebSphere Commerce
Paper introducing my algorithm has around 800 citations (Google Scholar)
The Threshold Algorithm
Amnon Lotem, Moni Naor, Ron Fagin
• In 2001, we found the Threshold Algorithm
– Optimal not just in the worst case or the average case, but in every case!
The Problem
• There are m attributes
• Each object in a database has a score x_i for attribute i
• The objects are given in m sorted lists, one list per attribute
• Goal: Find the top k objects according to a monotone scoring function, while minimizing access to the lists
Multimedia Example
REDNESS        ROUNDNESS
177: 0.993     235: 0.999
139: 0.991     666: 0.996
702: 0.982     820: 0.992
...            ...
235: 0.325     177: 0.406
...            ...
Threshold Algorithm
• Do sorted access in parallel to each of the m sorted lists.
• As each object R is seen under sorted access:
– Do random access to retrieve all of its scores x_1, …, x_m
– Compute its overall score f(x_1, …, x_m)
– If this is one of the top k answers so far, remember it
• For each list i, let t_i be the score of the last object seen under sorted access
• Define the threshold value T to be f(t_1, …, t_m). When k objects have been seen whose overall score is at least T, stop
• Return the top k answers
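The steps above can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' implementation: the function name, the use of dictionaries to model random access, and the assumption that every object appears in every list are choices made here. The data are the six objects visible in the slides' REDNESS/ROUNDNESS example, with the elided "..." rows left out.

```python
import heapq


def threshold_algorithm(lists, k, f=min):
    """Top-k objects under a monotone scoring function f.

    lists: m lists of (object_id, score) pairs, each sorted by score, descending.
    Assumes every object appears in every list and all lists have equal length.
    """
    # Random access is modelled as a dict lookup per attribute (an assumption).
    lookup = [dict(lst) for lst in lists]
    overall = {}  # object_id -> overall score of every object seen so far

    for row in zip(*lists):  # one round of parallel sorted access
        for obj, _ in row:
            if obj not in overall:  # avoid repeating random accesses
                overall[obj] = f([table[obj] for table in lookup])
        # Threshold T = f(t_1, ..., t_m) over the last score seen in each list.
        threshold = f([score for _, score in row])
        top = heapq.nlargest(k, overall.items(), key=lambda item: item[1])
        if len(top) == k and top[-1][1] >= threshold:
            return top  # halting rule: k objects with overall score >= T
    return heapq.nlargest(k, overall.items(), key=lambda item: item[1])


# The six objects visible on the slides; the "..." rows are omitted.
redness = [(177, 0.993), (139, 0.991), (702, 0.982),
           (820, 0.973), (235, 0.325), (666, 0.317)]
roundness = [(235, 0.999), (666, 0.996), (820, 0.992),
             (702, 0.990), (139, 0.702), (177, 0.406)]

top1 = threshold_algorithm([redness, roundness], k=1)
top2 = threshold_algorithm([redness, roundness], k=2)
print(top1)
print(top2)
```

On this data, with min as the scoring function, the sketch stops after three rounds of sorted access for k = 1 (object 702, score 0.982, equals the threshold 0.982) and after four rounds for k = 2, without reading the lists to the end.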
Threshold Algorithm: Example (using min)
REDNESS        ROUNDNESS
177: 0.993     235: 0.999
• A scoring function f is monotone if whenever x_i ≤ y_i for every i, then f(x_1, …, x_m) ≤ f(y_1, …, y_m)
Scoring function is min
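To make the monotonicity condition concrete, the sketch below tests it on a small grid of score vectors in [0, 1]. The helper names and the grid are assumptions chosen for illustration, and a passing grid check is evidence, not a proof. It compares min with max, the mean, and a deliberately non-monotone function.

```python
from itertools import product


def is_monotone_on_grid(f, m=2, steps=5):
    """Check f(x) <= f(y) for all grid points x, y in [0, 1]^m with x <= y componentwise."""
    grid = [i / (steps - 1) for i in range(steps)]
    points = list(product(grid, repeat=m))
    return all(
        f(x) <= f(y)
        for x in points
        for y in points
        if all(a <= b for a, b in zip(x, y))
    )


def mean(scores):
    return sum(scores) / len(scores)


def difference(scores):
    # Deliberately non-monotone: raising the second score lowers the result.
    return scores[0] - scores[1]


checks = {
    "min": is_monotone_on_grid(min),
    "max": is_monotone_on_grid(max),
    "mean": is_monotone_on_grid(mean),
    "difference": is_monotone_on_grid(difference),
}
print(checks)
```

Functions such as min, max and the mean pass the check, while the difference of two scores fails it, so it would not qualify as a scoring function in the sense defined above.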
Threshold Algorithm: Example (using min), step 1
REDNESS        ROUNDNESS
177: 0.993     235: 0.999
...            ...
235: 0.325     177: 0.406
...            ...
Overall score for 177: min(0.993, 0.406) = .406
Overall score for 235: min(0.325, 0.999) = .325
Threshold value: min(0.993, 0.999) = .993
Seen so far: 177: .406, 235: .325, ...
T = .993
Threshold Algorithm: Example (using min), step 2
REDNESS        ROUNDNESS
177: 0.993     235: 0.999
139: 0.991     666: 0.996
...            ...
235: 0.325     139: 0.702
666: 0.317     177: 0.406
...            ...
Overall score for 139: min(0.991, 0.702) = .702
Overall score for 666: min(0.317, 0.996) = .317
Threshold value: min(0.991, 0.996) = .991
Seen so far: 177: .406, 235: .325, 139: .702, 666: .317, ...
T = .991
Threshold Algorithm: Example (using min), step 3
REDNESS        ROUNDNESS
177: 0.993     235: 0.999
139: 0.991     666: 0.996
702: 0.982     820: 0.992
820: 0.973     702: 0.990
235: 0.325     139: 0.702
666: 0.317     177: 0.406
...            ...
Overall score for 702: min(0.982, 0.990) = .982
Overall score for 820: min(0.973, 0.992) = .973
Threshold value: min(0.982, 0.992) = .982
Seen so far: 177: .406, 235: .325, 139: .702, 666: .317, 702: .982, 820: .973, ...
T = .982
Correctness of the Halting Rule
Suppose the current top k objects have scores at least T (the current threshold).
Assume (by way of contradiction): R unseen; S in current top k; f(R) > f(S)
R has scores x_1, …, x_m
⇒ x_i ≤ t_i for every i (as R has not been seen)
⇒ f(R) = f(x_1, …, x_m) ≤ f(t_1, …, t_m) = T ≤ f(S) (the first inequality holds because f is monotone)
⇒ contradiction!
Influence
• We submitted the paper to PODS ’01
• I was worried that the Threshold Algorithm was
so simple that the paper would be rejected
– So I called it a “remarkably simple algorithm”
– The paper won the PODS Best Paper Award!
• The paper was very influential
– Around 1400 citations (Google Scholar)
– PODS Test of Time Award in 2011
– IEEE Technical Achievement Award in 2011
Applications of TA
• relational databases
• multimedia databases
• music databases
• semistructured databases
• text databases
• uncertain databases
• probabilistic databases
• graph databases
• spatial databases
• spatio-temporal databases
• web-accessible databases
• XML data
• web text data
• semantic web
• high-dimensional datasets
• information retrieval
• fuzzy data sets
• data streams
• search auctions
• wireless sensor networks
• distributed sensor networks
• distributed networks
• social-tagging networks
• document tagging systems
• peer-to-peer systems
• recommender systems
• personal information management systems
• group recommendation systems
Second Case Study: Clio (2003)
Composing Schema Mappings
[Diagram: mapping S12 from Schema S1 to Schema S2, mapping S23 from Schema S2 to Schema S3, and mapping S13 from Schema S1 to Schema S3]
• With Phokion Kolaitis, Lucian Popa, and Wang-Chiew Tan, we studied composition of schema mappings
• Our initial paper has around 800 citations, and
won the International Conference on Database
Theory Test of Time Award in 2013
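For readers new to the notion, composition of schema mappings is usually defined semantically, as the composition of the binary relations that the mappings induce on database instances. The notation below (M_{12}, Inst, I_1, I_2, I_3) is chosen here for illustration and does not appear on the slides, which label the mappings S12, S23 and S13.

```latex
\[
\mathrm{Inst}(M_{13}) \;=\;
\bigl\{\, (I_1, I_3) \;:\; \exists\, I_2 \ \text{such that}\
(I_1, I_2) \in \mathrm{Inst}(M_{12}) \ \text{and}\
(I_2, I_3) \in \mathrm{Inst}(M_{23}) \,\bigr\}
\]
```

Here each I_j ranges over instances of schema S_j, so M_{13} relates a source instance of S1 directly to a target instance of S3.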
Conclusions
for System Builders
• Consult with theoreticians
– Explaining the problem is useful by itself
– Principled approaches can improve your product
– Better or new algorithms can differentiate your product
– Algorithm analysis can provide performance expectations and provide product guarantees
– Abstractions can expand the function of your product
for Theoreticians
• Involvement with system builders can help your theory!
– Novel questions will be asked
– New models and new, interesting areas of study will arise
– Implementation can reveal weaknesses in the theory
– Theory will be relevant
– Practical impact!
Industry Talk
• Large-scale Deep Learning at Baidu
• The Online Revolution: Education for Everyone
• Clustering: Probably Approximately Useless?
• Challenges in Commerce Search
• Online Learning from Streaming Data
• Computational Advertising: The LinkedIn Way
• Which Learning Algorithms Really Matter?
• Leveraging Data to Change Industry Paradigms
• From Big Data to Big Knowledge
• Beyond Data: From User Information to Business Value Through Personalized
• Knowledge Vault
– Knowledge Management
– Secure, Social Business
– Modern Learning Management
– Everything Everywhere
Paper Awards
BEST PAPER AWARD:
Penguins in Sweaters, or Serendipitous Entity Search on User-generated Content
Ilaria Bordino, Yelena Mejova, Mounia Lalmas (Yahoo! Research)
BEST STUDENT PAPER AWARD:
Mining a Search Engine’s Corpus Without a Query Pool
Mingyang Zhang, Nan Zhang (GWU), Gautam Das (UTA)
HONORABLE MENTION PAPERS:
Robust Question Answering over the Web of Linked Data
Mohamed Yahya, Klaus Berberich (MPII), Shady Elbassuoni (AUB), Gerhard Weikum (MPII)
Graph-of-word and TW-IDF: New Approach to Ad Hoc IR
Francois Rousseau, Michalis Vazirgiannis (LIX, École Polytechnique, France)
Programming with Personalized PageRank: A Locally Groundable First-Order Probabilistic Logic
William Yang Wang, Kathryn Mazaitis, William W. Cohen (CMU)
BEST POSTER AWARD:
Software Plagiarism Detection: A Graph-based Approach
Dong-kyu Chae, Jiwoon Ha, Boo-Joong Kang, Eul-Gyu Im (Hanyang University)
Workshops
DARE
DTMBIO
CloudDB
MNLP
AKBC
Web-KR
ESAIR
PIKM
LL
DOLAP
CSTA
DUBMOD
UEO
PLEAD
Picture Time
La vie est ailleurs (Life is elsewhere)
Art is long, Life is short
Bay Bridge
Wharf
Crab
Lobster
Golden Gate Bridge
Caltrain
Halloween
Stanford
Lombard Street
THANK YOU
Conference Attendance Report
CIKM 2013
Contact 干艳桃
Email:[email protected]