Transcript: UCAIR Project
Xuehua Shen, Bin Tan, ChengXiang Zhai
http://sifaka.cs.uiuc.edu/ir/ucair/
Outline
• Motivation
• Progress
  – Framework
  – Model
  – System
  – Evaluation
• Road ahead
  – Continuing work
  – New directions
Problem of Context-Independent Search
The query "Jaguar" is ambiguous: it may mean Apple software, the animal, the car, or chemistry software.
Put Search in Context
Context can disambiguate a query such as "Apple software":
• Query history
• Clickthrough
• Other context info: dwelling time, mouse movement, hobby, …
A Decision Theoretic Framework
• Model interactive IR as an "action dialog": cycles of user action and system response.
  User action           System response
  Submit a new query →  Retrieve new documents
  View a document    →  Rerank documents
A Decision Theoretic Framework (cont.)
• Search for the optimal system response given a new user action:

  r_t* = argmin_{r ∈ R(a_t)} ∫_M L(a_t, r, m_t) P(m_t | U, H_{t-1}, a_t) dm_t

  which is approximated in two steps via a point estimate of the user model:

  m_t* = argmax_{m_t} P(m_t | U, H_{t-1}, a_t)
  r_t* ≈ argmin_{r ∈ R(a_t)} L(a_t, r, m_t*)

  (a_t: current user action; R(a_t): possible responses; H_{t-1}: interaction history; U: user; m_t ∈ M: user model; L: loss function)
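The two-step approximation above can be sketched in a few lines of Python. The posterior and loss tables here are toy values invented for illustration, not part of the UCAIR system.

```python
# Toy sketch of the two-step decision rule: take a point estimate m* of
# the user model, then pick the response with minimal loss under m*.
# The posterior and loss tables below are illustrative values only.

def best_response(responses, models, posterior, loss):
    # Step 1: m* = argmax_m P(m | U, H, a_t)  (point estimate of user model)
    m_star = max(models, key=lambda m: posterior[m])
    # Step 2: r* = argmin_{r in R(a_t)} L(a_t, r, m*)
    return min(responses, key=lambda r: loss[(r, m_star)])

# Hypothetical scenario: history suggests the user wants software results,
# so reranking toward software pages has the lowest expected loss.
posterior = {"need_car": 0.2, "need_software": 0.8}
loss = {
    ("rerank", "need_car"): 0.9, ("rerank", "need_software"): 0.1,
    ("retrieve_new", "need_car"): 0.4, ("retrieve_new", "need_software"): 0.6,
}
choice = best_response(["rerank", "retrieve_new"],
                       ["need_car", "need_software"], posterior, loss)
print(choice)  # rerank
```

The full Bayesian rule integrates the loss over all user models; the point estimate is what makes the response computable in an interactive system.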
User Models
• Components of the user model m_t ∈ M:
  – User information need x
  – User-viewed documents S
  – User actions A_t and system responses R_{t-1}
  – …
  so m_t = (x, S, A_t, R_{t-1}, …), inferred from P(m_t | U, H_{t-1}, a_t).
Loss Functions
• Loss function for result reranking:
  r* = argmin_{r ∈ R(a)} L(a, r, m_t)
  which amounts to ranking the unseen documents d_1, …, d_k by their estimated relevance under the current user model.
• Loss function for query expansion:
  r* = argmin_{r ∈ R(a)} L(a, r, m_t) = argmin_{f(q)} L(q, f(q), m_t)
  i.e., choose the query modification f(q) with minimal expected loss.
Implicit User Modeling
• Update the user information need x when a new query q arrives: re-estimate p(w | x) from q and the interaction history.
• Learn a better user model when the user skips the top n documents and views the (n+1)-th: the skipped summaries s_1, …, s_n carry implicit evidence about what the user does not want.
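One simple way to realize such an update is linear interpolation between the current query model and a unigram model of the viewed result's summary. The function names, the weight 0.5, and the example texts below are illustrative assumptions, not the exact UCAIR estimator.

```python
from collections import Counter

def lm(text):
    """Maximum-likelihood unigram language model of a text."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def update_need(query_model, viewed_summary, alpha=0.5):
    """Fold the viewed result's summary into the current need model:
    p(w|x) = alpha * p(w|q) + (1 - alpha) * p(w|summary)."""
    doc = lm(viewed_summary)
    words = set(query_model) | set(doc)
    return {w: alpha * query_model.get(w, 0.0) + (1 - alpha) * doc.get(w, 0.0)
            for w in words}

# Hypothetical example: the click on a Mac OS X page sharpens "apple software".
need = update_need(lm("apple software"), "mac os x software downloads")
```

Because both inputs are proper distributions, the interpolated model still sums to one, and shared terms ("software") gain weight over terms seen only once.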
Four Contextual Language Models
• ? — the hidden user information need
• Q_1 — user query (e.g., "Apple software")
• C_1 = {C_{1,1}, C_{1,2}, C_{1,3}, …} — user clickthrough for Q_1
  (e.g., "Apple - Mac OS X: The Apple Mac OS X product page. Describes features in the current version of Mac OS X, a screenshot gallery, latest software downloads, and a directory of …")
• Q_2, C_2 = {C_{2,1}, C_{2,2}, C_{2,3}, …}, and so on up to the current query Q_k
How do we model and use all this information?
Retrieval Model
• Basis: unigram language model + KL divergence.
• Contextual search: update the query model using the user's query and clickthrough history,
  p(w | θ_{Q_k}) estimated from (Q_k, Q_1, …, Q_{k-1}, C_1, …, C_{k-1})
• Similarity measure: rank each document D by the score
  -D(θ'_{Q_k} || θ_D)
  where θ'_{Q_k} is the updated query model and θ_D is the document model.
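As a sketch, the KL-divergence ranking reduces to scoring each document model by -D(θ_Q || θ_D). The tiny models below are made up, and a real system smooths θ_D against a background collection model rather than using a crude floor ε.

```python
import math

def kl_score(query_model, doc_model, eps=1e-10):
    """Rank value -D(theta_Q || theta_D); higher means more similar.
    eps is a crude floor standing in for proper background smoothing."""
    return -sum(p * math.log(p / max(doc_model.get(w, 0.0), eps))
                for w, p in query_model.items() if p > 0)

# Illustrative models: a document about Apple software vs. one about cars.
query = {"apple": 0.5, "software": 0.5}
on_topic = {"apple": 0.4, "software": 0.4, "mac": 0.2}
off_topic = {"jaguar": 0.5, "car": 0.5}
```

Only terms with nonzero query-model probability contribute, which is why updating θ_Q with history terms (next slides) changes the ranking of unseen documents.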
Fixed Coefficient Interpolation (FixInt)
• Average the user's clickthrough history C_1 … C_{k-1}:
  p(w | θ_{H_C}) = (1/(k-1)) Σ_{i=1}^{k-1} p(w | C_i)
• Average the user's query history Q_1 … Q_{k-1}:
  p(w | θ_{H_Q}) = (1/(k-1)) Σ_{i=1}^{k-1} p(w | Q_i)
• Linearly interpolate the two history models:
  p(w | θ_H) = β p(w | θ_{H_C}) + (1-β) p(w | θ_{H_Q})
• Linearly interpolate the current query and the history model:
  p(w | θ_{Q_k}) = α p(w | Q_k) + (1-α) p(w | θ_H)
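A minimal sketch of FixInt over toy unigram models (the example distributions are invented; a real implementation would estimate them from query text and clicked summaries):

```python
def average(models):
    """Uniform average of unigram models (theta_HQ / theta_HC above)."""
    words = set().union(*models)
    return {w: sum(m.get(w, 0.0) for m in models) / len(models) for w in words}

def mix(m1, m2, w1):
    """w1 * m1 + (1 - w1) * m2, over the union vocabulary."""
    words = set(m1) | set(m2)
    return {w: w1 * m1.get(w, 0.0) + (1 - w1) * m2.get(w, 0.0) for w in words}

def fixint(query_model, query_history, click_history, alpha, beta):
    """theta_H = beta*theta_HC + (1-beta)*theta_HQ;
    theta_Qk = alpha*p(w|Qk) + (1-alpha)*theta_H."""
    history = mix(average(click_history), average(query_history), beta)
    return mix(query_model, history, alpha)

# Toy history: one past query model and one clickthrough model.
theta = fixint({"jaguar": 1.0},
               [{"apple": 0.5, "software": 0.5}],
               [{"mac": 0.5, "os": 0.5}],
               alpha=0.1, beta=1.0)
```

With β=1.0 the history model comes entirely from clickthrough, and a small α lets the history dominate a one-word query — the setting reported later in the evaluation.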
Bayesian Interpolation (BayesInt)
• Average the user's query and clickthrough history as in FixInt to obtain θ_{H_Q} and θ_{H_C}.
• Intuition: the longer the current query Q_k is, the more we should trust Q_k.
• Dirichlet prior: use the history models as pseudo counts μ and ν,
  p(w | θ_{Q_k}) = ( c(w, Q_k) + μ p(w | θ_{H_Q}) + ν p(w | θ_{H_C}) ) / ( |Q_k| + μ + ν )
  so the history's influence shrinks as the query length |Q_k| grows.
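A sketch of the BayesInt estimate; the counts and history distributions below are illustrative toys.

```python
def bayesint(query_counts, hq_model, hc_model, mu, nu):
    """p(w|theta_Qk) = (c(w,Qk) + mu*p(w|theta_HQ) + nu*p(w|theta_HC))
                       / (|Qk| + mu + nu)."""
    qlen = sum(query_counts.values())
    words = set(query_counts) | set(hq_model) | set(hc_model)
    denom = qlen + mu + nu
    return {w: (query_counts.get(w, 0)
                + mu * hq_model.get(w, 0.0)
                + nu * hc_model.get(w, 0.0)) / denom
            for w in words}

# Toy example with the parameters reported in the evaluation (mu=0.2, nu=5.0).
theta = bayesint({"jaguar": 1, "software": 1},
                 {"apple": 1.0}, {"mac": 0.5, "os": 0.5},
                 mu=0.2, nu=5.0)
```

Because raw query counts sit in the numerator, a longer query automatically outweighs the fixed pseudo counts μ and ν, which is exactly the stated intuition.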
Online Bayesian Update (OnlineUp)
• Intuition: continuously update the belief about the user's information need as each query Q_i and clickthrough C_i arrives.
• Each query is smoothed against the model after the previous clickthrough (Dirichlet parameter μ), and each clickthrough then revises that model (parameter ν):
  p(w | θ'_i) = ( c(w, Q_i) + μ p(w | θ_{i-1}) ) / ( |Q_i| + μ )
  p(w | θ_i)  = ( c(w, C_i) + ν p(w | θ'_i) ) / ( |C_i| + ν )
Batch Bayesian Update (BatchUp)
• Intuition: clickthrough data may not decay, so queries are folded in online while all clickthrough is added in one batch.
• Queries are updated online with Dirichlet parameter μ:
  p(w | θ_i) = ( c(w, Q_i) + μ p(w | θ_{i-1}) ) / ( |Q_i| + μ )
• All clickthrough C_1 … C_{k-1} is then added in one batch with parameter ν:
  p(w | θ'_k) = ( Σ_{j=1}^{k-1} c(w, C_j) + ν p(w | θ_k) ) / ( Σ_{j=1}^{k-1} |C_j| + ν )
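A sketch of BatchUp with toy query and clickthrough counts (the word counts are invented; OnlineUp differs only in folding each C_i in immediately instead of pooling them at the end).

```python
from collections import Counter

def mle(counts):
    """Maximum-likelihood unigram model from raw counts."""
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def dirichlet_update(counts, prior, weight):
    """p(w|theta) = (c(w) + weight * p(w|prior)) / (|text| + weight)."""
    total = sum(counts.values())
    words = set(prior) | set(counts)
    return {w: (counts.get(w, 0) + weight * prior.get(w, 0.0)) / (total + weight)
            for w in words}

def batchup(queries, clicks, mu=2.0, nu=15.0):
    """Fold queries in one by one (online), then all clickthrough in one batch."""
    theta = mle(queries[0])
    for q in queries[1:]:
        theta = dirichlet_update(q, theta, mu)
    pooled = Counter()
    for c in clicks:
        pooled.update(c)
    return dirichlet_update(pooled, theta, nu) if pooled else theta

# Toy session: two queries about Apple, one clickthrough about Mac OS.
theta = batchup([Counter(apple=1, software=1), Counter(apple=1, mac=1)],
                [Counter(mac=2, os=1)])
```

Pooling the clickthrough counts means early clicks never get down-weighted by later smoothing steps, which is the "clickthrough does not decay" intuition.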
UCAIR Toolbar Architecture (http://sifaka.cs.uiuc.edu/ir/ucair/download.html)
The toolbar sits between the user and the search engine (e.g., Google): query modification turns the user's query into a UCAIR query; user modeling maintains a search history log (e.g., past queries, clicked results); result re-ranking reorders results held in a result buffer before presenting them; and clickthrough feeds back into the user model.
System Characteristics
• Client-side personalization
  – Privacy
  – Distribution of computation
  – More clues about the user
• Implicit user modeling
• Bayesian decision theory and statistical language models
User Actions
• Submit a keyword query
• View a document
• Click the "Back" button
• Click the "Next" link
System Responses
• Decide the relatedness of neighboring queries and perform query expansion
• Update the user model according to clickthrough
• Rerank unseen documents
TREC-Style Evaluation – Data Set
• Data collection: TREC AP88-90
• Topics: 30 hard topics from TREC topics 1-150
• System: search engine + RDBMS
• Context: query and clickthrough history of 3 participants (http://sifaka.cs.uiuc.edu/ir/ucair/QCHistory.zip)
Experiment Design
• Models: FixInt, BayesInt, OnlineUp, and BatchUp
• Performance comparison: Q_k vs. Q_k + H_Q + H_C
• Evaluation metrics: MAP and pr@20
Overall Effect of Search Context

Model                     Metric  Q3      Q3+HQ+HC  Improve  Q4      Q4+HQ+HC  Improve
FixInt (α=0.1, β=1.0)     MAP     0.0421  0.0726    72.4%    0.0536  0.0891    66.2%
                          pr@20   0.1483  0.1967    32.6%    0.1933  0.2233    15.5%
BayesInt (μ=0.2, ν=5.0)   MAP     0.0421  0.0816    93.8%    0.0536  0.0955    78.2%
                          pr@20   0.1483  0.2067    39.4%    0.1933  0.2317    19.9%
OnlineUp (μ=5.0, ν=15.0)  MAP     0.0421  0.0706    67.7%    0.0536  0.0792    47.8%
                          pr@20   0.1483  0.1783    20.2%    0.1933  0.2067    6.9%
BatchUp (μ=2.0, ν=15.0)   MAP     0.0421  0.0810    92.4%    0.0536  0.0950    77.2%
                          pr@20   0.1483  0.2067    39.4%    0.1933  0.2250    16.4%

• Interaction history helps the system improve retrieval accuracy.
• BayesInt is better than FixInt; BatchUp is better than OnlineUp.
Using Clickthrough Data Only
BayesInt (μ=0.0, ν=5.0)

Clickthrough data can improve retrieval accuracy of unseen relevant docs:
Query    MAP     pr@20
Q3       0.0331  0.125
Q3+HC    0.0661  0.178
Improve  99.7%   42.4%
Q4       0.0442  0.165
Q4+HC    0.0739  0.188
Improve  67.2%   13.9%

On all documents:
Query    MAP     pr@20
Q3       0.0421  0.1483
Q3+HC    0.0766  0.2033
Improve  81.9%   37.1%
Q4       0.0536  0.1930
Q4+HC    0.0925  0.2283
Improve  72.6%   18.1%

Clickthrough data corresponding to non-relevant docs is still useful for feedback:
Query    MAP     pr@20
Q3       0.0421  0.1483
Q3+HC    0.0521  0.1820
Improve  23.8%   23.0%
Q4       0.0536  0.1930
Q4+HC    0.0620  0.1850
Improve  15.7%   -4.1%
Sensitivity of BatchUp Parameters
[Figures: MAP (0–0.1) for Q2+HQ+HC, Q3+HQ+HC, and Q4+HQ+HC as μ varies over 0–10 and ν over 0–500 in the BatchUp model.]
• BatchUp is stable across different parameter settings.
• Best performance is achieved at μ=2.0, ν=15.0.
A User Study of Personalized Search
• Six participants use the UCAIR toolbar to do web search.
• Topics are selected from the TREC web track and terabyte track.
• Participants explicitly evaluate the relevance of the top 30 search results from Google and UCAIR.
Precision at Top N Documents

Ranking Method  prec@5  prec@10  prec@20  prec@30
Google          0.538   0.472    0.377    0.308
UCAIR           0.581   0.556    0.453    0.375
Improvement     8.0%    17.8%    20.2%    21.8%

More user interaction yields a better user model and better retrieval accuracy.
Precision-Recall Curve
[Figure: precision-recall curves.]
Decision Theoretic Framework
• User model
  – Include more factors (e.g., readability)
  – Represent the information need in a multi-theme way
  – Learn the user model from data accurately
  – Compute the user model efficiently
• Loss functions that go beyond relevance
• Synergy of short-term and long-term context
Retrieval Models
• Bridge existing retrieval models and the decision theoretic framework (likewise for active feedback work)
• Derive new retrieval models from the decision theoretic framework
• Find effective and efficient retrieval models
Retrieval Models (cont.)
• Study specific parameter settings for personalized web search (e.g., ranking of snippets)
• Utilize context information at a finer granularity (e.g., query relationships and relative judgments from clickthrough data)
System
• Make the system more robust and more efficient
• Enrich the user profile (bookmarks, local files, etc.)
• Study user interface design
  – How many results to personalize
  – Aggressive vs. conservative personalization
  – Result representation
  – …
• Study session boundary detection algorithms
System (cont.)
• Add new features to the UCAIR toolbar
  – Incorporate clustering into the system
  – Predict user preference based on non-textual features (e.g., website, document format)
• Analyze logs
  – Simple statistics
  – Query similarity within a community
• Distribute the toolbar
Evaluation
• Build an evaluation data set for contextual search (utilizing the TREC interactive track)
• Conduct a large-scale user study of contextual search
• Study privacy issues of the UCAIR toolbar
• Study how to share user logs
• Study when personalization is more effective than non-personalization, and vice versa
Application
• Apply the techniques in different domains
  – Personalized tutoring systems
  – Personalized bioinformatics systems
• Collaborative filtering applications
  – Goodies for connecting people
  – Social networks?
• Combine client and server for personalization
"Personalization is a dead end"
— Raul Valdes-Perez, CEO of Vivisimo, November 2004
• People are not static
• Surfing data is weak
• The whole web page is misleading
• Home computers are shared by family members
• Queries are short
• The best personalization is done by individuals themselves
• The Vivisimo way: cluster results, then let users explore themselves
"Personalization is the Holy Grail for search"
— Jerry Yang, co-founder of Yahoo!, March 2005: "One size does not fit all."
From a CNN report: [Yang] also said that the key challenge for Yahoo! and all search companies going forward will be to find ways to increase the personalization of results, i.e., making sure that a user truly finds what he or she is looking for when typing in a keyword search.
"The relevance of search is still the Holy Grail for any search application," Yang said.
The End
Thank you!