Special Topics in Computer Science The Art of Information Retrieval Chapter 1: Introduction Alexander Gelbukh www.Gelbukh.com.

Special Topics in Computer Science

The Art of Information Retrieval

Chapter 1: Introduction

Alexander Gelbukh

www.Gelbukh.com

Motivation

• Info: representation, storage, organization, access
• Search Engines (IR systems)
• User information need
  o Plain English description → query
• First for libraries, but now: the WWW!
• Modern IR:
  o modeling
  o classification, categorization, filtering
  o system architecture
  o user interfaces, visualization, query languages

Data vs. Information Retrieval

• Data Retrieval
  o Precise description
  o Well-structured data
  o Precise results
  o Yes-or-no results
  o Science
• Information Retrieval
  o Vague information need
  o Natural language, images, ...
  o Semantic interpretation
  o Approximate results
  o Relevance ranking
  o Art!

Basic Concepts

• User task (search)
  o Can formulate what they need: Retrieval (classical)
  o Can't (or does not know): Browsing (new to IR)
    - Still not very well integrated
  o Filtering (user passive, contents active)
• Logical view of docs
  o ... (added linguistic info)
  o Full text: slow, good
  o Text operations: reduce complexity to index terms
    - Keywords, stopwords
    - Stemming, noun groups. Linguistic processing!
  o Categories: fast, bad
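The text operations listed above (stopword removal and stemming, reducing full text to index terms) can be sketched roughly as follows. This is a minimal illustration, not the book's method: the stopword list, the suffix list, and the helper names `naive_stem` and `index_terms` are all assumptions made up for this example, and the suffix stripping is a crude stand-in for a real stemmer.

```python
import re

# Toy stopword list, assumed for illustration; real lists are much longer.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "are", "for"}

# Crude suffix stripping, a stand-in for a real stemming algorithm.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def index_terms(text: str) -> list[str]:
    """Reduce full text to index terms: tokenize, drop stopwords, stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOPWORDS]


print(index_terms("The indexing of documents is needed for searching"))
# → ['index', 'document', 'need', 'search']
```

The trade-off from the slide shows up even here: the index terms are far fewer than the full text, so they are faster to search, but some information (word order, function words, exact word forms) is lost.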

Past, Present, and Future

• Since clay tablets
  o Alphabetical index (formal)
  o Table of Contents (by order)
  o Classifications (by meaning)
• Libraries
  o Automation of classical techniques. Catalogs.
  o Search by fields (author, title, keywords)
• Web. Digital Libraries: interactive
  o Cheaper → huge amount of data
  o Networks → remote access, wider audience
  o Free publishing → unprepared, heterogeneous data
• Artificial Intelligence and Linguistic methods

Main concerns

• Open audience
  o Help people to formulate their information need
  o Improve retrieval quality. Intelligent methods
• Efficiency (speed)
  o Development of fast techniques
• Interaction
  o Watch user behavior to improve quality
  o Privacy!
• Open content
  o Legal issues. Copyright. Responsibility for info quality
  o Intelligent methods

Retrieval process

• Database
  o Define the logical view: text operations, text model
• Index (e.g., inverted file)
• User query
  o Query operations (users are not good at this!)
• Retrieved docs
  o Ranked by likelihood (relevance)
• Feedback cycle
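As a rough illustration of the index, query, and ranking steps above, the sketch below builds a tiny inverted file and ranks documents by summed term frequency. Everything in it is an assumption for the sake of the example: the three-document collection `DOCS`, the function names `build_index` and `search`, and the scoring rule, which is only a simple stand-in for a real relevance model. The feedback cycle is not covered.

```python
from collections import Counter, defaultdict

# Hypothetical three-document collection, made up for illustration.
DOCS = {
    1: "information retrieval ranks documents by relevance",
    2: "data retrieval returns exact matches",
    3: "web search engines are information retrieval systems",
}


def build_index(docs: dict[int, str]) -> dict[str, Counter]:
    """Inverted file: term -> {doc_id: term frequency}."""
    index: dict[str, Counter] = defaultdict(Counter)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term][doc_id] += 1
    return index


def search(index: dict[str, Counter], query: str) -> list[tuple[int, int]]:
    """Return (doc_id, score) pairs, best first; score = summed term frequency."""
    scores: Counter = Counter()
    for term in query.lower().split():
        for doc_id, tf in index.get(term, {}).items():
            scores[doc_id] += tf
    # Sort by descending score, then by doc_id for a stable order.
    return sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))


index = build_index(DOCS)
print(search(index, "information retrieval"))
# → [(1, 2), (3, 2), (2, 1)]
```

Note that document 2 is still returned, only ranked lower: the result is an ordering by estimated relevance, not a yes-or-no answer.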

The Book

• Topics
  o Text IR
  o Interfaces
  o Multimedia IR
  o Applications
• We will not consider:
  o Parallel and Distributed IR
  o Multimedia IR: Models and Languages
  o Multimedia IR: Indexing and Searching

Chapters: Text IR

• Models and Evaluation
  o Modeling (basic concepts)
  o Retrieval Evaluation
• Improvements on Retrieval
  o Query Languages
  o Query Operations
  o Text Languages and Properties
  o Text Operations
• Efficiency
  o Indexing and Searching

Chapters: Interfaces, Applications

• Interfaces
  o User Interfaces and Visualization
• Applications
  o Searching the Web
  o Libraries and Bibliographical Systems
  o Digital Libraries

Book’s web page

• sunsite.dcc.uchile.cl/irbook/
• Errata
• Test data
• Other courses, papers, and a lot more
• Korean version is NOT recommended. Read in English!

Conferences

• General conferences on text processing
  o ACL
  o COLING
  o CICLing
  o DEXA (databases)
  o NLDB
• Confs on IR
  o ACM SIGIR
  o TREC
  o SPIRE

Conclusions

• User Information Need
  o Vague
  o Semantic, not formal
• Document Relevance
  o Order, not retrieve
• Huge amount of information
  o Efficiency concerns
  o Tradeoffs
• Art more than science

Thank you!

Till September 25
