Information Retrieval Lab

DiSCo – University of Milan Bicocca
Viale Sarca 336 U14
Head: Prof. Gabriella Pasi
IR Lab people
• Gabriella Pasi
Associate Professor and Head of the Laboratory
• Silvia Calegari
Post-doc DISCO
• Stefania Marrara
Post-doc UNIMI
• Célia Cristina Pereira
Post-doc UNIMI
IR Lab numbers
• Small but active!
▫ Two people (since January 2009)
▫ Two external collaborators (since 2008)
▫ Three workplaces for Students and Collaborators
▫ About 60 articles in proceedings of international
conferences and in international journals in the
last three years
▫ 4-5 master students per year
The IR Lab in brief
The Information Retrieval Group (IRG) was established in 2005 at
DiSCo, University of Milan Bicocca.
FOCUS: as the amount of information available on the Web has
increased enormously in recent years, there is a need for effective
systems that allow easy and flexible access to information
relevant to a specific user’s needs. Flexibility here means the
capability of the system both to manage imperfect (vague and/or
uncertain) information and to personalise its behaviour to
the user context.
AIM: the research activity undertaken by the IRG is
aimed at defining models and techniques that overcome the
limitations of current information access systems, with the
main aim of offering personalised and flexible solutions to the
problem of locating information relevant to a specific user’s needs.
Research in IR:
some main issues
• Improving indexing
text representation is usually based on keyword extraction and
weighting
▫ how to improve document representations?
▪ Conceptual indexing based on the use of conceptual structures
▪ Latent semantic indexing
▪ Metadata and the Semantic Web
• Modeling user preferences in query formulation
usually based on selection criteria specified by terms
▫ how to formulate queries that capture real users’ needs?
▪ Modeling the user’s context
▪ Accounting for vagueness
▪ Defining mechanisms for query reformulation and relevance feedback
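As a point of reference for the "keyword extraction and weighting" baseline mentioned above, here is a minimal sketch of one common weighting scheme, TF-IDF. The slides do not say which scheme the lab uses, so the formula, the naive whitespace tokenizer, the function names and the sample documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    weight = (relative term frequency in the document)
             * log(number of documents / number of documents containing the term)
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical sample documents, not taken from the slides.
docs = [
    "fuzzy query languages for xml retrieval",
    "personalized retrieval of news",
    "xml indexing strategies",
]
index = tf_idf(docs)
```

In this sketch a term that occurs in every document gets weight 0, while terms concentrated in few documents are weighted higher. Conceptual or latent semantic indexing would replace the raw terms with concepts or latent dimensions.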
Research in IR:
some main issues
• Improving relevance estimation
usually based on a measure of topicality and, more recently, popularity (in search
engines)
▫ It should be based on additional criteria:
▪ Novelty
▪ Trust in information sources
▪ Timeliness
▪ Contextual information (geographic location, date, author, etc.)
▫ It should be learnt on the basis of users’ needs and behavior
▪ Application of machine learning techniques
▪ Query reformulation
• Text classification
• Text summarization
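One simple way to combine several relevance criteria such as those listed above is a weighted mean of per-criterion scores. The sketch below is only an illustration under that assumption: the slides do not specify an aggregation model, and the criterion weights, document names and scores are hypothetical.

```python
def aggregate_relevance(scores, weights):
    """Weighted mean of per-criterion scores, each assumed to lie in [0, 1].

    Criteria missing from `scores` are treated as scoring 0.0.
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(w * scores.get(criterion, 0.0)
                       for criterion, w in weights.items())
    return weighted_sum / total_weight


# Hypothetical importance of each criterion for one user.
criterion_weights = {
    "topicality": 0.5,
    "novelty": 0.2,
    "trust": 0.2,
    "timeliness": 0.1,
}

# Hypothetical per-criterion scores for two documents.
documents = {
    "doc_a": {"topicality": 0.9, "novelty": 0.1, "trust": 0.8, "timeliness": 0.3},
    "doc_b": {"topicality": 0.7, "novelty": 0.9, "trust": 0.9, "timeliness": 0.9},
}

ranking = sorted(
    documents,
    key=lambda name: aggregate_relevance(documents[name], criterion_weights),
    reverse=True,
)
```

Here doc_b outranks doc_a even though doc_a is more topical, because the other criteria also count. Learning the weights from user behavior, as the slide suggests, would replace the hand-set `criterion_weights`.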
IR Lab Activity
• Research areas:
▫ Information Retrieval
▫ Information Filtering
▫ Document Clustering
▫ Personalization
▫ XML Retrieval
• Application Domains:
▫ Large document repositories
▫ World Wide Web
Ongoing and future research
• Definition of conceptual approaches to IR.
• Definition of flexible query languages for semi-structured documents (XML).
• Definition of models for multi-dimensional relevance
assessment
• Definition of text clustering techniques
• Definition of techniques for assessment of text quality
and their use for relevance assessment
• Web Service Retrieval
Personalized Information Access
• Personalization is the process of customizing search results
according to the user’s interests and context.
• Approach: generation of user-tailored ontologies
• Aim: to model and learn the user context to
personalize the search process at distinct levels:
▫ Document indexing
▫ Query formulation
▫ Relevance assessment
XML Retrieval
• In XML collections it is important to retrieve documents
based on users’ constraints on both documents’ content
and structure.
• Approach: 1) application of fuzzy set theory to define
flexible extensions of existing XML query languages. 2)
definition of ad hoc indexing strategies.
• Aim: to propose advanced solutions for storing,
managing and retrieving semi-structured documents.
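To illustrate what a flexible constraint on both content and structure could look like, the sketch below grades, rather than accepts or rejects, how well an XML document matches a query. It is not the lab's query language: the degree functions, the `step` penalty, the use of `min` as fuzzy AND and all names are assumptions chosen for the example.

```python
import xml.etree.ElementTree as ET


def structure_degree(context, target_tag, step=0.25):
    """Degree in [0, 1] to which `target_tag` lies 'close below' `context`.

    A direct child scores 1.0; each extra nesting level costs `step`.
    Returns (degree, element), or (0.0, None) if no descendant is close enough.
    """
    best = (0.0, None)
    frontier = [(child, 1) for child in context]
    while frontier:
        elem, depth = frontier.pop()
        if elem.tag == target_tag:
            degree = max(0.0, 1.0 - step * (depth - 1))
            if degree > best[0]:
                best = (degree, elem)
        frontier.extend((child, depth + 1) for child in elem)
    return best


def content_degree(elem, terms):
    """Fraction of query terms found in the element's text (case-insensitive)."""
    if not terms:
        return 1.0
    words = set(" ".join(elem.itertext()).lower().split())
    return sum(t.lower() in words for t in terms) / len(terms)


def match(doc_xml, context_tag, target_tag, terms):
    """Degree to which a document satisfies both constraints (fuzzy AND = min)."""
    root = ET.fromstring(doc_xml)
    best = 0.0
    for context in root.iter(context_tag):
        s_degree, elem = structure_degree(context, target_tag)
        if elem is not None:
            best = max(best, min(s_degree, content_degree(elem, terms)))
    return best


# Hypothetical documents: the same title, nested at different depths.
exact = "<article><title>fuzzy xml retrieval</title></article>"
nested = "<article><front><title>fuzzy xml retrieval</title></front></article>"

exact_score = match(exact, "article", "title", ["fuzzy", "xml"])
nested_score = match(nested, "article", "title", ["fuzzy", "xml"])
```

A crisp query for a title directly under an article would reject the nested document outright. With graded degrees it is still retrieved, only with a lower score.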
Projects
• Past.
▫ STREP Project: PENG (Personalised News Content Programming) (Gabriella
Pasi, Project Coordinator) (2004 – 2006)
• Submitted
▫ PRIN. Title: What, Where, When? (W3?): Recommendation of Information
concerning specific topics and spatio-temporal contexts characterized by dynamicity and
imprecision.
▫ FIRB. Title: A Cloud Service Stack for Personalized Semantic Information
Retrieval.
▫ Spanish Project: High Performance processing for large data sets represented as
Graphs (HIPERGRAPH) (Principal Investigator: Ricardo Baeza-Yates – Yahoo!
Research)
▫ COST Action "Combining Soft Computing Techniques and Statistical Methods to
Improve Data Analysis Solutions", coordinated by ECSC (ONGOING)
Collaborations
At D.I.S.Co:
• Davide Ciucci
• Fabio Farina
• ITIS – SEQUOIAS (Information Quality; Web Service
Retrieval)
External Collaborations:
• CNR – IDPA, Italy
• European Center for Soft Computing (ECSC), Spain
• IRIT – Toulouse, France
• Iona College, NY, USA
• Università La Coruna, Spain
Conferences and Events
• Organization of:
▫ The 2009 IEEE / WIC / ACM International Conferences on Web
Intelligence (WI'09) and Intelligent Agent Technology (IAT'09), Milano,
Italy, 15-18 September 2009
▫ International Workshop on “Managing Vagueness and Uncertainty in the
Semantic Web (VUSW’09)”, Milano, Italy, 15 September 2009
▫ Program Chair of the International conference RIAO 2010, Paris
▫ Poster co-chair of ACM SIGIR 2010
-----------------------
• Some Past Events (since 2005)
▫ “Special Track on Information Access and Retrieval Systems”, within the “ACM
Symposium on Applied Computing” (Fortaleza, Ceará, Brazil, March 16-20, 2008; Dijon, France,
March 2006; Santa Fe, New Mexico, 13-17 March 2005; Cyprus, 14-17 March 2004;
Melbourne, Florida, 9-12 March 2003; Madrid, 10-14 March 2002). IAR2008
▫ International Workshop on Fuzzy Logic and Applications (WILF 2007), Hotel
Portofino Kulm, Portofino Vetta - Ruta di Camogli, Genova (Italy), July 7-10, 2007
▫ PhD School on Web Information Retrieval, WebBar 2007, Varenna, Italy, 26th August -
1st September 2007
▫ Seventh International Conference on Flexible Query Answering Systems (FQAS
2006), Milano, 2-10 June 2006
▫ “3rd International Summer School on Aggregation Operators”, Università della
Svizzera Italiana (USI-Lugano), Lugano, 10-15 July 2005
Publications from 2005 …some numbers
• Papers in International Journals: 22
• Special Issues in International Journals: 4
• Edited Volumes: 3
• Chapters of International Books: 10
• Proceedings for International Conferences: 40