A Risk Minimization Framework for Information Retrieval

CS598CXZ Course Summary
ChengXiang Zhai
Department of Computer Science
University of Illinois, Urbana-Champaign
Course Goal
• Advanced (graduate-level) introduction to the field of information retrieval (IR)
• Goal
– Provide an overview of IR research in the past several decades
– Systematically review the core research topics in IR
– Discuss the most recent research progress (customized toward the interests of students)
– Give students enough training for doing research in IR or applying advanced IR techniques to applications
• More in-depth treatment of topics than CS410: less emphasis on practical skills, more on understanding of principles, models, and algorithms
IR Research Topics (Broad View)
[Figure: diagram of IR research topics. Labels: Users; Retrieval Applications; Visualization; Summarization; Filtering; Information Access; Analytics; Applications; Mining; Information Organization; Search; Categorization; Extraction; Clustering; Natural Language Content Analysis; Text Acquisition; Text Mining; Text]
IR Topics (narrow view)
[Figure: search system diagram. Components: docs; INDEXING; Doc Rep; Query Rep; SEARCHING; Ranking Models; INTERFACE; User; query; results; judgments; Feedback; QUERY MODIFICATION; LEARNING]
1. Evaluation
2. Retrieval (Ranking)
3. Document representation/structure
4. Efficiency & scalability
5. Search result summarization/presentation
6. User interface (browsing)
7. Feedback/Learning
Our focus: 1, 2, 7
IR Topics covered & Related Topics
[Figure: the search system diagram and the seven numbered topics from the narrow view, annotated with related fields: Parallel Prog.; ML; HCI; NLP]
Our focus: 1, 2, 7
Core Knowledge that You Should Know
• IR Evaluation Methodology (Cranfield Lab Test)
– Emphasis on realistic task modeling
– Test set creation/sharing
– Measures
– Comparative analysis of components
– Statistical significance test
• Retrieval Models
– Vector-Space (retrieval heuristics)
– Probabilistic (language models, statistical estimation)
– Machine learning (basic idea)
• Topic models
– EM algorithm
Check out the midterm topics for details
You’ll likely find these to be useful for your research in general
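As one illustration of the "Measures" item above, here is a minimal Python sketch of average precision and its mean over queries (MAP), a measure commonly reported in Cranfield-style lab tests. The function names and the toy document ids are illustrative assumptions, not course material.

```python
def average_precision(ranked_doc_ids, relevant_ids):
    """Average precision of one ranked list against a set of relevant documents."""
    relevant_ids = set(relevant_ids)
    if not relevant_ids:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_doc_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            # Precision at this rank, counted only where a relevant doc appears.
            precision_sum += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant_ids)


def mean_average_precision(runs):
    """Mean of average precision over queries.

    runs: list of (ranked_doc_ids, relevant_ids) pairs, one pair per query.
    """
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


if __name__ == "__main__":
    # Hypothetical toy ranking: relevant documents appear at ranks 1 and 3.
    print(average_precision(["d1", "d2", "d3", "d4"], {"d1", "d3"}))
```

Comparing two systems would then mean computing per-query average precision for each system on the same test set and applying a statistical significance test to the paired values.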
Be Familiar with Some Frontier Topics
• Document Representation and Content Analysis (e.g., text representation, document structure, linguistic analysis, non-English IR, cross-lingual IR, information extraction, sentiment analysis, clustering, classification, topic models, facets)
• Queries and Query Analysis (e.g., query representation, query intent, query log analysis, question answering, query suggestion, query reformulation)
• Users and Interactive IR (e.g., user models, user studies, user feedback, search interface, summarization, task models, personalized search)
• Retrieval Models and Ranking (e.g., IR theory, language models, probabilistic retrieval models, feature-based models, learning to rank, combining searches, diversity)
• Search Engine Architectures and Scalability (e.g., indexing, compression, MapReduce, distributed IR, P2P IR, mobile devices)
• Filtering and Recommending (e.g., content-based filtering, collaborative filtering, recommender systems, profiles)
• Evaluation (e.g., test collections, effectiveness measures, experimental design)
• Web IR and Social Media Search (e.g., link analysis, query logs, social tagging, social network analysis, advertising and search, blog search, forum search, CQA, adversarial IR, vertical and local search)
• IR and Structured Data (e.g., XML search, ranking in databases, desktop search, entity search)
• Multimedia IR (e.g., image search, video search, speech/audio search, music IR)
• Other Applications (e.g., digital libraries, enterprise search, genomics IR, legal IR, patent search, text reuse)
Learn more about this from project presentation!
Beyond Information Retrieval:
Take Other Related Courses
[Figure: diagram relating Information Retrieval to neighboring areas. Labels: Models; Machine Learning; Pattern Recognition; Data Mining; Statistics; Optimization; Natural Language Processing; Algorithms; Applications; Web, Bioinformatics…; Library & Info Science; Systems; Databases; Software engineering; Computer systems]
Remaining Tasks for You
• Work on your projects: Let me know ASAP if you
need help
• Present your project at 1:30-4:30pm on Friday,
Dec. 12 (room 1304 SC)
• Submit your reports by Dec. 19, Friday