Special Topics in Computer Science Advanced Topics in Information Retrieval Chapter 1: Introduction Alexander Gelbukh www.Gelbukh.com.
Special Topics in Computer Science
Advanced Topics in Information Retrieval
Chapter 1: Introduction
Alexander Gelbukh
www.Gelbukh.com
Motivation
First for libraries, but now — WWW!!!
Info: representation, storage, organization, access
Search Engines (IR systems)
User information need
o Plain English description → query
Concerns of modern IR:
o modeling
o classification, categorization, filtering
o system architecture
o user interfaces, visualization, query languages
Data vs. Information Retrieval
Data Retrieval            Information Retrieval
Precise description       Vague information need
Well-structured data      Natural language, images, ...
                          Semantic interpretation
Precise results           Approximate results
Yes-or-no results         Relevance ranking
Science                   Art!
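The contrast above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the function names, and the choice of the Jaccard coefficient as a relevance score are all hypothetical and do not come from the slides.

```python
# Hypothetical records; the authors and titles are made up for illustration.
RECORDS = [
    {"author": "smith", "title": "clay tablets and early indexes"},
    {"author": "jones", "title": "ranking web pages by relevance"},
    {"author": "smith", "title": "relevance feedback on the web"},
]


def data_retrieval(records, field, value):
    """Exact match on a field: each record either qualifies or it does not."""
    return [r for r in records if r[field] == value]


def information_retrieval(records, need):
    """Rank records by word overlap with a free-text need.

    The score is the Jaccard coefficient, an assumed stand-in for relevance.
    """
    need_words = set(need.lower().split())
    ranked = []
    for r in records:
        title_words = set(r["title"].split())
        score = len(need_words & title_words) / len(need_words | title_words)
        if score > 0:
            ranked.append((round(score, 2), r["title"]))
    # Best match first: an ordering, not a yes-or-no answer.
    return sorted(ranked, reverse=True)


print(data_retrieval(RECORDS, "author", "smith"))
print(information_retrieval(RECORDS, "relevance on the web"))
```

The first call returns exactly the records whose author field equals the value. The second returns every record with some overlap, ordered by an approximate score.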
Basic Concepts
User task (search)
o Can formulate what they need: Retrieval (classical)
o Cannot (or do not know what they need): Browsing (new to IR)
Still not very well integrated
o Filtering (user passive, contents active)
Logical view of docs, from slow but good (top) to fast but bad (bottom):
o ... Added linguistic info ... not clear if it helps
o Full text
o Text operations: reduce complexity to index terms
Keywords, stopwords
Stemming, noun groups (linguistic processing needed)
o Categories
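As a minimal sketch of the text operations named above (keywords, stopwords, stemming), the following Python reduces full text to index terms. The stopword list and the suffix-stripping rule are hypothetical and deliberately crude; this is not the Porter stemmer or any other published algorithm.

```python
import re

# Hypothetical, tiny stopword list; real systems use much longer ones.
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "the", "to"}

# Hypothetical suffix list for a deliberately crude stemmer.
SUFFIXES = ("ing", "es", "s")


def crude_stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def index_terms(text):
    """Full text -> lowercase tokens -> drop stopwords -> crude stems."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOPWORDS]


print(index_terms("Searching the libraries for indexes of documents"))
```

The output is shorter and more uniform than the full text, which is the tradeoff the scale describes: cheaper to index and search, at the cost of lost detail (note how the crude rule turns "libraries" into "librari").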
Past, Present, and Future
Since clay tablets
o Alphabetical index (formal)
o Table of Contents (by storing order)
o Classifications (by meaning)
Libraries
o Automation of classical techniques. Catalogs.
o Search by fields (exact match: author, title, keywords)
Web & Digital Libraries: interactive
o Cheaper → huge amount of data
o Networks → remote access, wider audience
o Free publishing → unprepared, heterogeneous data
Artificial Intelligence and Linguistic methods
Main concerns
Open audience
o Help people to formulate their information need
o Improve retrieval quality. Intelligent methods
Efficiency (speed)
o Development of fast techniques
Interaction
o Watch user behavior to improve quality
o Privacy!
Open content
o Legal issues. Copyright. Responsibility for info quality
o Intelligent methods
Retrieval process
Database
o Define the logical view: text operations, text model
Index (e.g., inverted file)
User query
o Query operations (users are not good at this!)
Retrieved docs
o Ranked by likelihood (relevance)
Feedback cycle
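The steps above (index, user query, ranked documents) can be sketched as a toy inverted file in Python. The collection, the function names, and the scoring rule (summed term frequency) are assumptions made for illustration; the slides do not prescribe a ranking formula, and the feedback cycle is left out.

```python
from collections import Counter, defaultdict

# Hypothetical toy collection; ids and texts are made up for illustration.
DOCS = {
    1: "information retrieval for libraries",
    2: "web search engines rank web pages",
    3: "libraries automate catalogs and search",
}


def build_inverted_file(docs):
    """Map each term to {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, freq in Counter(text.lower().split()).items():
            index[term][doc_id] = freq
    return index


def search(index, query):
    """Score each document by the summed frequency of the query terms."""
    scores = Counter()
    for term in query.lower().split():
        for doc_id, freq in index.get(term, {}).items():
            scores[doc_id] += freq
    # Descending score, then ascending doc id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


INDEX = build_inverted_file(DOCS)
print(search(INDEX, "web search"))
```

Only the postings of the query terms are touched at query time, which is why an inverted file stays fast as the collection grows.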
The Textbook: Text IR
Models and Evaluation
o Modeling (basic concepts)
o Retrieval Evaluation
Improvements on Retrieval
o Query Languages
o Query Operations
o Text Languages and Properties
o Text Operations
Efficiency
o Indexing and Searching
Conferences & Journals
Conferences on IR
o IR
o ACM SIGIR
o TREC
o SPIRE
Journal
o IR
General conferences on text processing
o ACL
o COLING
o CICLing
o DEXA (databases)
o NLDB
Conclusions
User Information Need
o Vague
o Semantic, not formal
Document Relevance
o Order, not retrieve
Huge amount of information
o Efficiency concerns
o Tradeoffs
IR is more art than science
Thank you!