Transcript ppt
Conference Attendance Report
干艳桃, 2013/11/15

Contents
• Overview
• Keynotes
• Industry Talk
• Paper Awards
• Workshops
• Picture Time
CIKM 2013 2 of 89 2013/11/15

Overview
22nd ACM International Conference on Information and Knowledge Management (CIKM 2013)
• DB track: search, data streams and probabilistic queries, data streams and ranking, graphs and social networks, graphs and storage systems, query processing and privacy
• IR track: retrieval models, entities, search engines, networks, evaluation, data classification, applications, ranking, users
• KM track: social networks, mining topics, pattern mining and applications, mining big data, ontologies, mobile and event mining, graphs and networks, clusters-topics and similarity, networks, mining reviews and wiki, learning and applications, similarity clustering and outlier mining, social networks and media, text, extraction and text mining, community and web mining, entities-tags and time series, mining and learning

Welcome to San Francisco
• Golden Gate Bridge
• Fisherman's Wharf
• Cable Car
• Haight Ashbury
• Lombard Street

CIKM 2013 in numbers
1. Submissions to the research track: 848
2. Full papers: 143 (16.86%)
3. Short papers: 106 (29.36% cumulative)
4. Authors in total: 3327
5. Registrations: 750+ (highest registrations among all past 22 CIKMs)
[Chart: Submissions and Acceptance Rate (% x 10); vertical axis from 0 to 1200]

Sessions & Events
• 52 paper sessions

Keynotes
• C. Lee Giles (Pennsylvania State University): Scholarly Big Data: Information Extraction and Data Mining
• Alon Y.
Halevy (Google Research): Structured Data in Web Search
• Ronald Fagin (IBM Research - Almaden): Applying Theory to Practice
• Carlos Guestrin (University of Washington): Usability in Machine Learning at Scale with GraphLab

Applying Theory to Practice (and Practice to Theory) at IBM
Ron Fagin, IBM Research - Almaden

Purpose of This Talk
• Encourage collaboration between theoreticians and system builders
  – via two case studies
• One initiated by the system builders, and one by the theoreticians
• For theoreticians:
  – How to apply theory to practice
  – Why applying theory to practice can lead to better theory
• For system builders:
  – The value of theory
  – The value of involving theoreticians

First Case Study: Garlic (1996)
Laura Haas: Mr. Database Theoretician, we’ve got a problem with Garlic, our multimedia database system!

What was the problem?
• The answers to queries in DB/2 are sets
• The answers to queries in QBIC are sorted lists
• How do you combine the results?

Example
• Searching a CD database for Artist = “Beatles” yields a set, via, say, DB/2
• Musicbrainz has 12 million recordings in its DB
• AlbumColor = “Red” yields a sorted list, via, say, QBIC
• Redness scores of the sorted results: .697, .683, .670, .659, .629

What Was My Solution?
• These weren’t just sorted lists: they were scored lists
• Can view sets as scored lists (scores 0 or 1)
• This reminded me of fuzzy logic
• In fuzzy logic, conjunction (∧) is min, and disjunction (∨) is max

Ron Fagin: Use fuzzy logic.
Laura Haas: I like your solution. But we also need an efficient algorithm that can find the top k results while minimizing database accesses.
⋮
Ron Fagin: I have an algorithm that finds the top k with only √n database accesses.
Laura Haas: Good, that beats linear! But we database people are spoiled, and are used to only log n accesses.
Be smarter and get me a log n algorithm.
⋮
Ron Fagin: I proved that you can’t do better than √n.

Influence
• Algorithm implemented in Garlic
• Influenced other IBM products, including
  – Watson Bundled Search system
  – InfoSphere Federation Server
  – WebSphere Commerce
• Paper introducing my algorithm has around 800 citations (Google Scholar)

The Threshold Algorithm
Amnon Lotem, Moni Naor, Ron Fagin
• In 2001, we found the Threshold Algorithm
  – Optimal not just in the worst case or the average case, but in every case!

The Problem
• There are m attributes
• Each object in a database has a score x_i for attribute i
• The objects are given in m sorted lists, one list per attribute
• Goal: Find the top k objects according to a monotone scoring function, while minimizing access to the lists

Multimedia Example
REDNESS      ROUNDNESS
177: 0.993   235: 0.999
139: 0.991   666: 0.996
702: 0.982   820: 0.992
...          ...
235: 0.325   177: 0.406
...          ...

Threshold Algorithm
• Do sorted access in parallel to each of the m sorted lists.
• As each object R is seen under sorted access:
  – Do random access to retrieve all of its scores x_1, …, x_m
  – Compute its overall score f(x_1, …, x_m)
  – If this is one of the top k answers so far, remember it
• For each list i, let t_i be the score of the last object seen under sorted access
• Define the threshold value T to be f(t_1, …, t_m). When k objects have been seen whose overall score is at least T, stop
• Return the top k answers

Threshold Algorithm: Example (using min)
• A scoring function f is monotone if whenever x_i ≤ y_i for every i, then f(x_1, …, x_m) ≤ f(y_1, …, y_m)
• Scoring function is min

First round of sorted access:
REDNESS      ROUNDNESS
177: 0.993   235: 0.999
...          ...
235: 0.325   177: 0.406
...          ...
Overall score for 177: min(0.993, 0.406) = .406
Overall score for 235: min(0.325, 0.999) = .325
Threshold value: min(0.993, 0.999) = .993
Scores so far: 177: .406, 235: .325; T = .993

Threshold Algorithm: Example (using min), second round of sorted access
REDNESS      ROUNDNESS
177: 0.993   235: 0.999
139: 0.991   666: 0.996
...          ...
235: 0.325   139: 0.702
666: 0.317   177: 0.406
...          ...
Overall score for 139: min(0.991, 0.702) = .702
Overall score for 666: min(0.317, 0.996) = .317
Threshold value: min(0.991, 0.996) = .991
Scores so far: 177: .406, 235: .325, 139: .702, 666: .317; T = .991

Threshold Algorithm: Example (using min), third round of sorted access
REDNESS      ROUNDNESS
177: 0.993   235: 0.999
139: 0.991   666: 0.996
702: 0.982   820: 0.992
820: 0.973   702: 0.990
235: 0.325   139: 0.702
666: 0.317   177: 0.406
...          ...
Scores so far: 177: .406, 235: .325, 139: .702, 666: .317, 702: .982, 820: .973
...
Overall score for 702: min(0.982, 0.990) = .982
Overall score for 820: min(0.973, 0.992) = .973
Threshold value: min(0.982, 0.992) = .982
T = .982

Correctness of the Halting Rule
Suppose the current top k objects have scores at least T (the current threshold).
Assume (by way of contradiction): R unseen; S in current top k; f(R) > f(S)
R has scores x_1, …, x_m
⇒ x_i ≤ t_i for every i (as R has not been seen)
⇒ f(R) = f(x_1, …, x_m) ≤ f(t_1, …, t_m) = T ≤ f(S)
⇒ contradiction!

Influence
• We submitted the paper to PODS ’01
• I was worried that the Threshold Algorithm was so simple that the paper would be rejected
  – So I called it a “remarkably simple algorithm”
  – The paper won the PODS Best Paper Award!
• The paper was very influential
  – Around 1400 citations (Google Scholar)
  – PODS Test of Time Award in 2011
  – IEEE Technical Achievement Award in 2011

Applications of TA
• relational databases
• multimedia databases
• music databases
• semistructured databases
• text databases
• uncertain databases
• probabilistic databases
• graph databases
• spatial databases
• spatio-temporal databases
• web-accessible databases
• XML data
• web text data
• semantic web
• high-dimensional datasets
• information retrieval
• fuzzy data sets
• data streams
• search auctions
• wireless sensor networks
• distributed sensor networks
• distributed networks
• social-tagging networks
• document tagging systems
• peer-to-peer systems
• recommender systems
• personal information management systems
• group recommendation systems

Second Case Study: Clio (2003)
Composing Schema Mappings
[Diagram: Schema S1, Schema S2, and Schema S3, with mappings S12, S23, and S13]
• With Phokion Kolaitis, Lucian Popa, and Wang-Chiew Tan, we studied composition of schema mappings
• Our initial paper has around 800 citations, and won the International Conference on Database Theory Test of Time Award in 2013

Conclusions for System Builders
• Consult with theoreticians
  – Explaining the problem is useful by itself
  – Principled approaches can improve your
product
  – Better or new algorithms can differentiate your product
  – Algorithm analysis can provide performance expectations and provide product guarantees
  – Abstractions can expand the function of your product

Conclusions for Theoreticians
• Involvement with system builders can help your theory!
  – Novel questions will be asked
  – New models and new, interesting areas of study will arise
  – Implementation can reveal weaknesses in the theory
  – Theory will be relevant
  – Practical impact!

Industry Talk
• Large-scale Deep Learning at Baidu
• The Online Revolution: Education for EveryOne
• Clustering: Probably Approximately Useless?
• Challenges in Commerce Search
• Online Learning from Streaming Data
• Computational Advertising: The LinkedIn Way
• Which Learning Algorithms Really Matter?
• Leveraging Data to Change Industry Paradigms
• From Big Data to Big Knowledge
• Beyond Data: From User Information to Business Value Through Personalized
Knowledge Vault Knowledge Management Secure, Social Business Modern Learning Management Everything Everywhere

Paper Awards
BEST PAPER AWARD
• Penguins in Sweaters, or Serendipitous Entity Search on User-generated Content. Ilaria Bordino, Yelena Mejova, Mounia Lalmas (Yahoo! Research)
BEST STUDENT PAPER AWARD
• Mining a Search Engine’s Corpus Without a Query Pool. Mingyang Zhang, Nan Zhang (GWU), Gautam Das (UTA)
HONORABLE MENTION PAPERS
• Robust Question Answering over the Web of Linked Data. Mohamed Yahya, Klaus Berberich (MPII), Shady Elbassuoni (AUB), Gerhard Weikum (MPII)
• Graph-of-word and TW-IDF: New Approach to Ad Hoc IR. Francois Rousseau, Michalis Vazirgiannis (LIX, École Polytechnique, France)
• Programming with Personalized PageRank: A Locally Groundable First-Order Probabilistic Logic. William Yang Wang, Kathryn Mazaitis, William W.
Cohen (CMU)
BEST POSTER AWARD
• Software Plagiarism Detection: A Graph-based Approach. Dong-kyu Chae, Jiwoon Ha, Boo-Joong Kang, Eul-Gyu Im (Hanyang University)

Workshops
DARE, DTMBIO, CloudDB, MNLP, AKBC, Web-KR, ESAIR, PIKM, LL, DOLAP, CSTA, DUBMOD, UEO, PLEAD

Picture Time
La vie est ailleurs (Life is elsewhere)
Art is long, Life is short
Photos: Bay Bridge, Wharf, Crab, lobster, Golden Gate Bridge, Caltrain, Halloween, Stanford, Lombard Street

THANK YOU
Conference Attendance Report, CIKM 2013
Contact: 干艳桃
Email: [email protected]
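
Appendix: sketch of the Threshold Algorithm from the Fagin keynote
The steps listed on the Threshold Algorithm slide can be sketched in Python as below. This is a minimal illustration rather than the implementation used in Garlic. The function name threshold_algorithm, the dictionary-based list representation, and the returned round counter are assumptions made for the example. The data contains only the six objects whose scores appear on the slides; the entries elided there as "..." are left out.

```python
import heapq

# Scores shown on the slides (object id -> score); elided "..." rows are omitted.
REDNESS = {177: 0.993, 139: 0.991, 702: 0.982, 820: 0.973, 235: 0.325, 666: 0.317}
ROUNDNESS = {235: 0.999, 666: 0.996, 820: 0.992, 702: 0.990, 139: 0.702, 177: 0.406}


def threshold_algorithm(lists, k, f=min):
    """Return (top-k [(object, overall score)], rounds of sorted access used).

    lists: one dict per attribute, each holding a score for every object.
    f:     monotone scoring function applied to a list of scores (min here).
    """
    # Sorted access order for each attribute: highest score first.
    ranked = [sorted(lst.items(), key=lambda kv: kv[1], reverse=True)
              for lst in lists]
    overall = {}  # overall score of every object seen so far
    top = []
    rounds = min(len(r) for r in ranked)
    for depth in range(rounds):
        last_seen = []  # t_i: score of the last object seen in list i
        for r in ranked:
            obj, score = r[depth]
            last_seen.append(score)
            if obj not in overall:
                # Random access: fetch the object's score in every list.
                overall[obj] = f([lst[obj] for lst in lists])
        threshold = f(last_seen)  # T = f(t_1, ..., t_m)
        top = heapq.nlargest(k, overall.items(), key=lambda kv: kv[1])
        # Halt once k seen objects have overall score at least T.
        if len(top) == k and top[-1][1] >= threshold:
            return top, depth + 1
    return top, rounds


if __name__ == "__main__":
    print(threshold_algorithm([REDNESS, ROUNDNESS], k=1))
```

With k = 1 the sketch halts in the third round of sorted access, when object 702 reaches an overall score equal to the threshold .982, matching the walkthrough on the slides.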