Special Topics in Computer Science
The Art of Information Retrieval
Chapter 1: Introduction
Alexander Gelbukh
www.Gelbukh.com
Motivation
- Information: representation, storage, organization, access
- Search engines (IR systems)
- User information need
  - Plain-English description → query
- First for libraries, but now the WWW!
- Modern IR:
  - modeling
  - classification, categorization, filtering
  - system architecture
  - user interfaces, visualization, query languages
Data vs. Information Retrieval
Data Retrieval          | Information Retrieval
Precise description     | Vague information need
Well-structured data    | Natural language, images, ...
Precise results         | Semantic interpretation; approximate results
Yes-or-no results       | Relevance ranking
Science                 | Art!
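The contrast in the table can be made concrete with a toy sketch (Python is chosen here only for illustration; the records, documents, and the functions `data_retrieval` and `ir_rank` are hypothetical). Data retrieval filters structured records with an exact condition, so each record is simply in or out. The IR side scores free text with a deliberately naive count of query-term occurrences and returns a ranked list; real IR systems use far better relevance models than this count.

```python
from collections import Counter

# Data retrieval: structured records, precise query, yes-or-no answer per record.
records = [
    {"id": 1, "author": "Smith", "year": 1999},
    {"id": 2, "author": "Jones", "year": 1999},
    {"id": 3, "author": "Smith", "year": 2005},
]


def data_retrieval(records, **conditions):
    """Return ids of records matching every field condition exactly (no ranking)."""
    return [r["id"] for r in records
            if all(r.get(field) == value for field, value in conditions.items())]


# Information retrieval: unstructured text, vague query, ranked answer.
docs = {
    "d1": "retrieval of text documents by relevance",
    "d2": "relational database query processing",
    "d3": "ranking web documents for a user query",
}


def ir_rank(docs, query):
    """Rank documents by the number of query-term occurrences they contain (toy score)."""
    q_terms = query.lower().split()
    scores = {}
    for doc_id, text in docs.items():
        counts = Counter(text.lower().split())
        score = sum(counts[t] for t in q_terms)
        if score > 0:  # documents sharing no query term are not returned
            scores[doc_id] = score
    # Higher score first; ties broken by document id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

Here `data_retrieval(records, author="Smith")` returns `[1, 3]`: every record either matches or does not. `ir_rank(docs, "ranking documents by relevance")` returns `["d1", "d3"]`: `d1` contains three query terms and `d3` two, while `d2` shares none and is left out.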
Basic Concepts
- User task (search)
  - Can formulate what they need: retrieval (classical)
  - Can't (or doesn't know what they need): browsing (new to IR; still not very well integrated)
  - Filtering (user passive, contents active)
- Logical view of docs
  - ... (added linguistic info)
  - Full text: slow, good
  - Text operations: reduce complexity to index terms
    - Keywords, stopwords
    - Stemming, noun groups. Linguistic processing!
  - Categories: fast, bad
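A minimal sketch of the text operations listed above, assuming a tiny hand-picked stopword list and a few hypothetical suffix rules. The rules are a crude stand-in for a real stemmer such as Porter's, not the method the book prescribes, and noun-group detection is omitted. All names are illustrative.

```python
import re

# Tiny illustrative stopword list; real systems use much longer lists.
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to", "with"}

# Hypothetical (suffix, replacement) rules, tried in order; a crude stand-in
# for a real stemming algorithm.
SUFFIX_RULES = (("ies", "y"), ("ing", ""), ("ed", ""), ("es", ""), ("s", ""))
MIN_STEM = 3  # keep at least this many characters before the suffix


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def stem(word):
    """Apply the first rule whose suffix matches and leaves a long enough stem."""
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: -len(suffix)] + replacement
    return word


def index_terms(text):
    """Full text -> tokens -> stopwords removed -> stems (the index terms)."""
    return [stem(t) for t in tokenize(text) if t not in STOPWORDS]
```

For example, `index_terms("The indexing of documents and queries")` yields `["index", "document", "query"]`: six tokens of full text reduce to three index terms. The minimum stem length is an arbitrary guard against over-stripping short words (so `"uses"` becomes `"use"`, not `"us"`).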
Past, Present, and Future
- Since clay tablets
  - Alphabetical index (formal)
  - Table of contents (by order)
  - Classifications (by meaning)
- Libraries
  - Automation of classical techniques. Catalogs.
  - Search by fields (author, title, keywords)
- Web. Digital libraries: interactive
  - Cheaper → huge amount of data
  - Networks → remote access, wider audience
  - Free publishing → unprepared, heterogeneous data
- Artificial intelligence and linguistic methods
Main concerns
- Open audience
  - Help people formulate their information need
  - Improve retrieval quality. Intelligent methods
- Efficiency (speed)
  - Development of fast techniques
- Interaction
  - Watch user behavior to improve quality
  - Privacy!
- Open content
  - Legal issues. Copyright. Responsibility for info quality
  - Intelligent methods
Retrieval process
- Database
  - Define the logical view: text operations, text model
- Index (e.g., inverted file)
- User query
  - Query operations (users are not good at this!)
- Retrieved docs
  - Ranked by likelihood (relevance)
- Feedback cycle
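The indexing and searching steps of this process can be sketched in a few lines. This is a toy under stated assumptions, not the book's system: text operations are reduced to lowercase tokenization, the index is an in-memory inverted file mapping each term to `{doc_id: term frequency}`, and ranking uses a plain tf-idf sum with idf = log(N / df). The names `build_inverted_index` and `search` are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def terms(text):
    """Text operations, kept minimal here: lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_inverted_index(docs):
    """Map each term to a postings dict {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(terms(text)).items():
            index[term][doc_id] = tf
    return index


def search(index, n_docs, query):
    """Rank documents by a tf-idf sum, visiting only postings of query terms."""
    scores = defaultdict(float)
    # Query operations: apply the same normalization used for the documents.
    for term in set(terms(query)):
        postings = index.get(term, {})  # .get does not insert into the defaultdict
        if not postings:
            continue
        idf = math.log(n_docs / len(postings))  # rarer terms weigh more
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Higher score first; ties broken by document id.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

For example, with `docs = {"d1": "inverted index construction for an inverted file", "d2": "user interfaces for search", "d3": "an index of terms"}`, `search(build_inverted_index(docs), len(docs), "inverted index")` returns `["d1", "d3"]`: `d1` contains the rare term "inverted" twice, `d3` only the more common "index", and `d2` is never scored because it is in neither postings list. Touching only those postings, rather than every document, is what makes the inverted file fast. A feedback cycle would rewrite the query (for instance, adding terms from documents the user marks as relevant) and call `search` again.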
Topics
- Text IR
- Interfaces
- Multimedia IR
- Applications
We will not consider:
- Parallel and Distributed IR
- Multimedia IR: Models and Languages
- Multimedia IR: Indexing and Searching
The Book
Chapters: Text IR
- Models and Evaluation
  - Modeling (basic concepts)
  - Retrieval Evaluation
- Improvements on Retrieval
  - Query Languages
  - Query Operations
  - Text Languages and Properties
  - Text Operations
- Efficiency
  - Indexing and Searching
Chapters: Interfaces, Applications
- Interfaces
  - User Interfaces and Visualization
- Applications
  - Searching the Web
  - Libraries and Bibliographical Systems
  - Digital Libraries
Book’s web page
sunsite.dcc.uchile.cl/irbook/
- Errata
- Test data
- Other courses, papers, and a lot more
- Korean version is NOT recommended. Read in English!
Conferences
- General conferences on text processing
  - ACL
  - COLING
  - CICLing
  - DEXA (databases)
  - NLDB
- Conferences on IR
  - ACM SIGIR
  - TREC
  - SPIRE
Conclusions
- User information need
  - Vague
  - Semantic, not formal
- Document relevance
  - Order, not retrieve
- Huge amount of information
  - Efficiency concerns
  - Tradeoffs
- Art more than science
Thank you!
Till September 25