Special Topics in Computer Science Advanced Topics in Information Retrieval Chapter 1: Introduction Alexander Gelbukh www.Gelbukh.com.

Special Topics in Computer Science
Advanced Topics in Information Retrieval
Chapter 1: Introduction
Alexander Gelbukh
www.Gelbukh.com
Motivation
• First for libraries, but now: WWW!!!
• Info: representation, storage, organization, access
• Search Engines (IR systems)
• User information need
  o Plain English description → query
• Concerns of modern IR:
  o modeling
  o classification, categorization, filtering
  o system architecture
  o user interfaces, visualization, query languages
Data vs. Information Retrieval
Data Retrieval          | Information Retrieval
Precise description     | Vague information need
Well-structured data    | Natural language, images, ...
                        | Semantic interpretation
Precise results         | Approximate results
Yes-or-no results       | Relevance ranking
Science                 | Art!
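The contrast in the table above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy collection `DOCS`, the function names, and the word-overlap score are all invented here and are not from the slides. Real IR systems use far richer relevance models.

```python
# Toy collection; the document texts are invented for illustration.
DOCS = {
    "d1": "information retrieval ranks documents by relevance",
    "d2": "a database query returns exact matches",
    "d3": "web search engines retrieve relevant documents",
}


def data_retrieval(docs, term):
    """Yes-or-no result: ids of documents that contain the exact term."""
    return sorted(doc_id for doc_id, text in docs.items() if term in text.split())


def information_retrieval(docs, query):
    """Approximate result: every document, ordered by word overlap with the query."""
    query_words = set(query.lower().split())
    scores = {
        doc_id: len(query_words & set(text.split())) for doc_id, text in docs.items()
    }
    # Highest score first; ties are broken by document id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


print(data_retrieval(DOCS, "documents"))
print(information_retrieval(DOCS, "relevant documents on the web"))
```

The first call returns an unordered yes-or-no set (here `d1` and `d3`). The second returns all documents in order of estimated relevance, with `d3` first because it shares the most words with the vaguely worded query.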
Basic Concepts
• User task (search)
  o Can formulate what they need: Retrieval (classical)
  o Can't formulate it (or don't know what they need): Browsing (new to IR)
    - Still not very well integrated
  o Filtering (user passive, contents active)
• Logical view of docs, from slow but good to fast but bad:
  o ... Added linguistic info (not clear if it helps)
  o Full text
  o Text operations: reduce complexity to index terms
    - Keywords, stopwords
    - Stemming, noun groups (linguistic processing needed)
  o Categories
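The text operations listed above (full text reduced to index terms via stopword removal and stemming) can be sketched as follows. This is a minimal illustration, not the textbook's method. The stopword list is a tiny invented sample, and `crude_stem` is a hypothetical suffix stripper standing in for a real stemmer such as Porter's.

```python
import re

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"a", "an", "and", "are", "by", "for", "in", "is", "of", "the", "to"}

# Crude suffix stripping, standing in for a real stemming algorithm.
SUFFIXES = ("ing", "es", "s")


def crude_stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def index_terms(text):
    """Full text -> lowercase tokens -> stopwords removed -> stemmed terms."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(token) for token in tokens if token not in STOPWORDS]


print(index_terms("The indexing of documents is reducing complexity"))
# prints ['index', 'document', 'reduc', 'complexity']
```

The output shows the trade-off named on the slide: the representation is smaller and faster to search, but stems such as `reduc` lose information that the full text carried.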
Past, Present, and Future
• Since clay tablets
  o Alphabetical index (formal)
  o Table of Contents (by storing order)
  o Classifications (by meaning)
• Libraries
  o Automation of classical techniques. Catalogs.
  o Search by fields (exact match: author, title, keywords)
• Web & Digital Libraries: interactive
  o Cheaper → huge amount of data
  o Networks → remote access, wider audience
  o Free publishing → unprepared, heterogeneous data
• Artificial Intelligence and Linguistic methods
Main concerns
• Open audience
  o Help people to formulate their information need
  o Improve retrieval quality. Intelligent methods
• Efficiency (speed)
  o Development of fast techniques
• Interaction
  o Watch user behavior to improve quality
  o Privacy!
• Open content
  o Legal issues. Copyright. Responsibility for info quality
  o Intelligent methods
Retrieval process
• Database
  o Define the logical view: text operations, text model
• Index (e.g., inverted file)
• User query
  o Query operations (users are not good at this!)
• Retrieved docs
  o Ranked by likelihood (relevance)
• Feedback cycle
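The steps above (index the collection, process a query, rank the retrieved docs) can be sketched as below. This is a minimal illustration under stated assumptions: whitespace tokenization, a tf-idf sum as the ranking score (one common weighting, not something the slides prescribe), and invented names and documents throughout. The feedback cycle is not implemented here.

```python
import math
from collections import Counter, defaultdict


def build_inverted_file(docs):
    """Index step: map each term to {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, freq in Counter(text.lower().split()).items():
            index[term][doc_id] = freq
    return index


def search(index, num_docs, query):
    """Query step: return (doc_id, score) pairs ranked by a simple tf-idf sum."""
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, {})
        if not postings:
            continue  # term does not occur in the collection
        idf = math.log(num_docs / len(postings))  # rarer terms weigh more
        for doc_id, freq in postings.items():
            scores[doc_id] += freq * idf
    # Highest score first; ties are broken by document id.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Toy collection, invented for illustration.
DOCS = {
    "d1": "inverted file index for text retrieval",
    "d2": "user query operations and relevance feedback",
    "d3": "ranking retrieved text by relevance",
}
INDEX = build_inverted_file(DOCS)
RANKED = search(INDEX, len(DOCS), "text relevance ranking")
print([doc_id for doc_id, _ in RANKED])  # prints ['d3', 'd1', 'd2']
```

Only the posting lists of the query terms are touched, which is why an inverted file keeps search fast on large collections. A feedback cycle would take the user's judgments on `RANKED` and use them to reweight or expand the query before searching again.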
The Textbook: Text IR
• Models and Evaluation
  o Modeling (basic concepts)
  o Retrieval Evaluation
• Improvements on Retrieval
  o Query Languages
  o Query Operations
  o Text Languages and Properties
  o Text Operations
• Efficiency
  o Indexing and Searching
Conferences & Journals
• Confs on IR
  o IR
  o ACM SIGIR
  o TREC
  o SPIRE
• Journal
  o IR
• General conferences on text processing
  o ACL
  o COLING
  o CICLing
  o DEXA (databases)
  o NLDB
Conclusions
• User Information Need
  o Vague
  o Semantic, not formal
• Document Relevance
  o Order (rank), not just retrieve
• Huge amount of information
  o Efficiency concerns
  o Tradeoffs
• IR is art more than science
Thank you!