Special Topics in Computer Science
Advanced Topics in Information Retrieval
Chapter 1: Introduction
Alexander Gelbukh
www.Gelbukh.com

Motivation
- First for libraries, but now the WWW!
- Information: representation, storage, organization, access
- Search engines (IR systems)
- User information need
  o Plain-English description → query
- Concerns of modern IR:
  o Modeling
  o Classification, categorization, filtering
  o System architecture
  o User interfaces, visualization, query languages

Data vs. Information Retrieval
Data Retrieval        | Information Retrieval
----------------------+------------------------------
Precise description   | Vague information need
Well-structured data  | Natural language, images, ...
                      | Semantic interpretation
Precise results       | Approximate results
Yes-or-no results     | Relevance ranking
Science               | Art!

Basic Concepts
- User task (search)
  o Users can formulate what they need: retrieval (classical)
  o Users can't (or don't know what they need): browsing (new to IR), still not very well integrated
  o Filtering (user passive, content active)
- Logical view of documents, from slow but good (top) to fast but bad (bottom):
  o ... added linguistic information (not clear whether it helps)
  o Full text
  o Text operations reduce complexity to index terms: keywords, stopwords; stemming, noun groups (linguistic processing needed)
  o Categories

Past, Present, and Future
- Since clay tablets
  o Alphabetical index (formal)
  o Table of contents (by storage order)
  o Classifications (by meaning)
- Libraries
  o Automation of classical techniques: catalogs
  o Search by fields (exact match: author, title, keywords)
- Web and digital libraries: interactive
  o Lower cost → huge amounts of data
  o Networks → remote access, wider audience
  o Free publishing → unprepared, heterogeneous data
- Artificial intelligence and linguistic methods

Main concerns
- Open audience
  o Help people formulate their information need
  o Improve retrieval quality: intelligent methods
- Efficiency (speed)
  o Development of fast techniques
- Interaction
  o Watch user behavior to improve quality
  o Privacy!
- Open content
  o Legal issues, copyright
- Responsibility for information quality
  o Intelligent methods

Retrieval process
- Database
  o Define the logical view: text operations, text model
- Index (e.g., inverted file)
- User query
  o Query operations (users are not good at writing queries!)
- Retrieved documents
  o Ranked by likelihood of relevance
- Feedback cycle

The Textbook: Text IR
- Models and Evaluation
  o Modeling (basic concepts)
  o Retrieval Evaluation
- Improvements on Retrieval
  o Query Languages
  o Query Operations
  o Text Languages and Properties
  o Text Operations
- Efficiency
  o Indexing and Searching

Conferences & Journals
- Conferences on IR
  o IR
  o ACM SIGIR
  o TREC
  o SPIRE
- Journal
  o IR
- General conferences on text processing
  o ACL
  o COLING
  o CICLing
  o DEXA (databases)
  o NLDB

Conclusions
- User information need
  o Vague
  o Semantic, not formal
- Document relevance
  o Rank documents, don't just retrieve them
- Huge amount of information
  o Efficiency concerns
  o Tradeoffs
- IR is more art than science
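The "Logical view of docs" slide says text operations reduce full text to index terms via keywords, stopwords, and stemming. A minimal sketch of that pipeline follows, assuming a tiny hypothetical stopword list and a few hypothetical suffix rules; the crude stemmer is a stand-in for a real one such as Porter's, not that algorithm.

```python
import re

# Hypothetical, deliberately tiny stopword list; real systems use far larger ones.
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to"}

# Hypothetical suffix rules (suffix, replacement), tried in order.
# A crude stand-in for a real stemmer such as Porter's, not that algorithm.
SUFFIX_RULES = (("ies", "y"), ("ing", ""), ("ed", ""), ("s", ""))


def crude_stem(word: str) -> str:
    """Apply the first matching suffix rule, keeping a stem of at least 3 letters."""
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: len(word) - len(suffix)] + replacement
    return word


def index_terms(text: str) -> list[str]:
    """Reduce full text to index terms: tokenize, drop stopwords, stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(token) for token in tokens if token not in STOPWORDS]


print(index_terms("Indexing the documents and searching for queries"))
# → ['index', 'document', 'search', 'query']
```

Rules this crude also mangle words ("news" becomes "new"), which illustrates why the slide notes that stemming and noun groups need real linguistic processing.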
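The "Data vs. Information Retrieval" table contrasts yes-or-no results with relevance ranking, and the "Retrieval process" slide names an inverted file as the index. The sketch below builds a toy inverted index over four hypothetical documents and answers one query both ways: a Boolean AND (exact match) and a TF-IDF ranking. TF-IDF is one common weighting chosen here for illustration; the slides do not prescribe it, and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Four hypothetical documents, already reduced to index terms.
DOCS = {
    "d1": ["information", "retrieval", "ranking"],
    "d2": ["data", "retrieval", "exact", "match"],
    "d3": ["vague", "information", "need"],
    "d4": ["information", "filtering", "information"],
}


def build_inverted_index(docs):
    """Inverted file: term -> {doc_id: term frequency in that document}."""
    index = defaultdict(dict)
    for doc_id, terms in docs.items():
        for term, tf in Counter(terms).items():
            index[term][doc_id] = tf
    return index


def boolean_and(index, query_terms):
    """Data-retrieval style: a document matches all query terms or it is out."""
    postings = [set(index.get(term, {})) for term in query_terms]
    return set.intersection(*postings) if postings else set()


def rank_tfidf(index, query_terms, n_docs):
    """IR style: score any document sharing a term (tf * idf), best first."""
    scores = defaultdict(float)
    for term in query_terms:
        postings = index.get(term, {})
        if not postings:
            continue
        idf = math.log(n_docs / len(postings))  # rarer terms weigh more
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


index = build_inverted_index(DOCS)
query = ["information", "retrieval"]
print(boolean_and(index, query))  # → {'d1'}
print([doc_id for doc_id, _ in rank_tfidf(index, query, len(DOCS))])
# → ['d1', 'd2', 'd4', 'd3']
```

Both answers come from the same inverted file; only the matching rule differs. The Boolean query discards d2, d3, and d4 outright, while the ranking still returns them below d1 as partial matches.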
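The "Retrieval process" slide ends with a feedback cycle. One classical way to close that loop, not named on the slides, is Rocchio's relevance-feedback formula: it moves the query vector toward the centroid of documents the user marked relevant and away from the centroid of those marked non-relevant. The sketch assumes sparse term-weight dictionaries, hypothetical example judgments, and the weights alpha=1.0, beta=0.75, gamma=0.15, values often suggested in textbooks rather than anything from this course.

```python
from collections import defaultdict


def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio relevance feedback over sparse term-weight dicts.

    new_query = alpha * query
                + beta  * centroid(relevant)
                - gamma * centroid(nonrelevant)
    Terms whose weight ends up at or below zero are dropped.
    The default weights are assumptions, not values from the slides.
    """
    new_query = defaultdict(float)
    for term, weight in query.items():
        new_query[term] += alpha * weight
    for docs, coeff in ((relevant, beta), (nonrelevant, -gamma)):
        for doc in docs:
            for term, weight in doc.items():
                # Dividing by len(docs) averages the vectors (the centroid).
                new_query[term] += coeff * weight / len(docs)
    return {term: w for term, w in new_query.items() if w > 0}


# Hypothetical query and user judgments from one round of the feedback cycle.
query = {"information": 1.0, "retrieval": 1.0}
relevant = [{"retrieval": 1.0, "ranking": 1.0}]
nonrelevant = [{"data": 1.0, "retrieval": 1.0}]
print(rocchio(query, relevant, nonrelevant))
# 'retrieval' rises to about 1.6, 'ranking' enters at 0.75, 'data' is dropped
```

The reformulated query is then run again, so the user only has to judge results rather than write a better query, which matters given the slide's remark that users are not good at query operations.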