
UCAIR Project

Xuehua Shen, Bin Tan, ChengXiang Zhai http://sifaka.cs.uiuc.edu/ir/ucair/

Outline

• Motivation
• Progress
  - Framework
  - Model
  - System
  - Evaluation
• Road ahead
  - Continuous work
  - New direction

Problem of Context-Independent Search

Example: the query "Jaguar" is ambiguous without context: it could refer to Apple software (Mac OS X), the animal, the car, or the chemistry software.

Put Search in Context

Context can disambiguate the query (e.g., toward "Apple software"):
• Query history
• Clickthrough
• Other context info: dwelling time, mouse movement, hobby, …

Outline: Motivation · Progress (Framework, Model, System, Evaluation) · Road ahead (Continuous work, New direction)

A Decision Theoretic Framework

Model interactive IR as an "action dialog": cycles of user action and system response.

User action        | System response
Submit a new query | Retrieve new documents
View a document    | Rerank documents

A Decision Theoretic Framework (cont.)

Choose the optimal system response $r_t^*$ to a new user action $A_t$ by minimizing the expected loss over possible user models:

$$r_t^* = \arg\min_{r_t} \int_{m_t} L(r_t, A_t, m_t)\, P(m_t \mid U, A_t, R_{t-1})\, dm_t$$

Using the most probable user model as a point estimate, $m_t^* = \arg\max_{m_t} P(m_t \mid U, A_t, R_{t-1})$, this simplifies to

$$r_t^* \approx \arg\min_{r_t} L(r_t, A_t, m_t^*)$$

User Models

Components of the user model $M$:
• User information need $x$
• User viewed documents $S$
• User actions $A_t$ and system responses $R_{t-1}$

The user model $M_t$ is inferred from the actions and responses observed up to time $t$.

Loss Functions

• Loss function for result reranking: the expected loss sums, over the ranked documents $d_i$, a per-document loss weighted by the probability that $d_i$ is relevant under the user model; minimizing it amounts to ranking the unseen documents by their estimated relevance to the inferred information need.
• Loss function for query expansion:

$$r^* = \arg\min_{r_t} L(a, r, m_t) = \arg\min_{f(q)} L\big(q, f(q), m_t\big)$$

i.e., choose the expanded query $f(q)$ that minimizes the loss under the current user model.

Implicit User Modeling

• Update the user information need model when a new query is submitted
• Learn a better user model when the user skips the top n results and views the (n+1)-th: interpolate the current query model with the language models of the result summaries $s_1, \ldots, s_k$ the user has examined,

$$p(w \mid x) = \alpha\, p(w \mid q) + (1-\alpha)\, \frac{1}{k} \sum_{i=1}^{k} p(w \mid s_i)$$

Outline: Motivation · Progress (Framework, Model, System, Evaluation) · Road ahead (Continuous work, New direction)

Four Contextual Language Models

[Figure] The user's hidden information need ("?") generates a sequence of queries $Q_1, Q_2, \ldots, Q_k$, each with an associated clickthrough set $C_i = \{C_{i,1}, C_{i,2}, C_{i,3}, \ldots\}$. For example, $Q_1$ = "Apple software", and a clicked result in $C_1$ is "Apple - Mac OS X: The Apple Mac OS X product page. Describes features in the current version of Mac OS X, a screenshot gallery, latest software downloads, and a directory of ...".

How to model and use all the information?

Retrieval Model

• Basis: unigram language models + KL-divergence retrieval
• Contextual search: update the query model using the user's query and clickthrough history

Documents are ranked by the similarity measure $-D(\theta'_{Q_k} \,\|\, \theta_D)$ between the updated query model $\theta'_{Q_k}$ and the document model $\theta_D$, where

$$\theta'_{Q_k} = f\big(Q_k,\; Q_1, \ldots, Q_{k-1},\; C_1, \ldots, C_{k-1}\big)$$

combines the current query with the query history and the clickthrough history.
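A minimal sketch of this ranking function (illustrative only; the Dirichlet smoothing of the document model and the parameter mu are assumptions, not details given on the slide):

```python
import math
from collections import Counter

def unigram_model(text):
    """Maximum-likelihood unigram language model of a piece of text."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def kl_score(query_model, doc_text, collection_model, mu=2000.0):
    """Rank score for a document: sum over query-model terms of
    p(w | theta'_Qk) * log p(w | theta_D), which equals -D(theta'_Qk || theta_D)
    up to a query-only constant. theta_D uses Dirichlet-prior smoothing."""
    doc_counts = Counter(doc_text.lower().split())
    doc_len = sum(doc_counts.values())
    score = 0.0
    for w, p_q in query_model.items():
        p_d = (doc_counts.get(w, 0) + mu * collection_model.get(w, 1e-9)) / (doc_len + mu)
        score += p_q * math.log(p_d)
    return score
```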

Fixed Coefficient Interpolation (FixInt)

• Average the user's query history and clickthrough history:

$$p(w \mid H_Q) = \frac{1}{k-1} \sum_{i=1}^{k-1} p(w \mid Q_i), \qquad p(w \mid H_C) = \frac{1}{k-1} \sum_{i=1}^{k-1} p(w \mid C_i)$$

• Linearly interpolate the two history models, then linearly interpolate the current query model with the combined history model:

$$p(w \mid \theta'_{Q_k}) = \alpha\, p(w \mid Q_k) + (1-\alpha)\,\big[\beta\, p(w \mid H_C) + (1-\beta)\, p(w \mid H_Q)\big]$$

Bayesian Interpolation (BayesInt)

• Intuition: the longer the current query $Q_k$ is, the more we should trust it.
• Average the query history and clickthrough history as in FixInt, then use them as a Dirichlet prior on the current query:

$$p(w \mid \theta'_{Q_k}) = \frac{c(w, Q_k) + \mu\, p(w \mid H_Q) + \nu\, p(w \mid H_C)}{|Q_k| + \mu + \nu}$$

Since the query length $|Q_k|$ appears in the denominator alongside the fixed prior mass $\mu + \nu$, a longer query automatically receives relatively more weight than the history.
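A corresponding sketch of BayesInt (again illustrative; h_q and h_c are the averaged history models, e.g. built with average_model from the FixInt sketch, and the defaults mirror the mu = 0.2, nu = 5.0 setting reported later):

```python
from collections import Counter

def bayesint(query_text, h_q, h_c, mu=0.2, nu=5.0):
    """BayesInt: combine raw current-query counts with the history models through
    a Dirichlet prior; a longer query (larger |Q_k|) automatically outweighs the
    fixed prior mass mu + nu."""
    q_counts = Counter(query_text.lower().split())
    q_len = sum(q_counts.values())
    denom = q_len + mu + nu
    vocab = set(q_counts) | set(h_q) | set(h_c)
    return {w: (q_counts.get(w, 0) + mu * h_q.get(w, 0.0) + nu * h_c.get(w, 0.0)) / denom
            for w in vocab}
```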

Online Bayesian Update (OnlineUp)

• Intuition: continuously update the belief about the user's information need as each query and clickthrough arrives.
• Each query $Q_i$ is folded in with the previous belief as a Dirichlet prior (weight $\mu$), and each clickthrough $C_i$ with the resulting model as prior (weight $\nu$):

$$p(w \mid \theta'_i) = \frac{c(w, Q_i) + \mu\, p(w \mid \sigma_{i-1})}{|Q_i| + \mu}, \qquad p(w \mid \sigma_i) = \frac{c(w, C_i) + \nu\, p(w \mid \theta'_i)}{|C_i| + \nu}$$

The model used for $Q_k$ is $\theta'_k$; evidence further back in the history passes through more updates and therefore decays more.
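A sketch of the OnlineUp recursion as reconstructed above (illustrative names; queries are Q_1 ... Q_k and clickthroughs C_1 ... C_{k-1} as plain text, with defaults matching the mu = 5.0, nu = 15.0 setting reported later):

```python
from collections import Counter

def ml_model(text):
    """Maximum-likelihood unigram model of a text."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def dirichlet_update(text, prior_model, prior_weight):
    """p(w) = (c(w, text) + prior_weight * p_prior(w)) / (|text| + prior_weight)."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    vocab = set(counts) | set(prior_model)
    return {w: (counts.get(w, 0) + prior_weight * prior_model.get(w, 0.0))
               / (total + prior_weight) for w in vocab}

def onlineup(queries, clickthroughs, mu=5.0, nu=15.0):
    """OnlineUp: start from the first query, then alternately fold in each
    clickthrough (prior weight nu) and the next query (prior weight mu).
    Evidence further in the past passes through more updates and decays more."""
    belief = ml_model(queries[0])                       # theta'_1
    for c_text, q_text in zip(clickthroughs, queries[1:]):
        belief = dirichlet_update(c_text, belief, nu)   # sigma_i from C_i
        belief = dirichlet_update(q_text, belief, mu)   # theta'_{i+1} from Q_{i+1}
    return belief                                       # theta'_k
```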

Batch Bayesian Update (BatchUp)

• Intuition: clickthrough data may not decay the way older queries do.
• Chain through the query history with a Dirichlet prior of weight $\mu$, then fold in all accumulated clickthrough at once with weight $\nu$:

$$p(w \mid \theta'_i) = \frac{c(w, Q_i) + \mu\, p(w \mid \theta'_{i-1})}{|Q_i| + \mu}, \qquad p(w \mid \psi_k) = \frac{\sum_{j=1}^{k-1} c(w, C_j) + \nu\, p(w \mid \theta'_k)}{\sum_{j=1}^{k-1} |C_j| + \nu}$$
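BatchUp differs only in when the clickthrough is folded in; a sketch reusing ml_model and dirichlet_update from the OnlineUp sketch (defaults matching the mu = 2.0, nu = 15.0 setting reported later):

```python
def batchup(queries, clickthroughs, mu=2.0, nu=15.0):
    """BatchUp: chain through the query history with prior weight mu (old queries
    decay), then fold in all accumulated clickthrough at once with weight nu
    (clickthrough evidence does not decay)."""
    belief = ml_model(queries[0])                        # theta'_1
    for q_text in queries[1:]:
        belief = dirichlet_update(q_text, belief, mu)    # theta'_2 ... theta'_k
    all_clicks = " ".join(clickthroughs)
    return dirichlet_update(all_clicks, belief, nu)      # psi_k
```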

Outline: Motivation · Progress (Framework, Model, System, Evaluation) · Road ahead (Continuous work, New direction)

UCAIR Toolbar Architecture (http://sifaka.cs.uiuc.edu/ir/ucair/download.html)

[Architecture diagram] The user's query passes through Query Modification to form the UCAIR query sent to the search engine (e.g., Google); returned results go into a Result Buffer and through Result Re-Ranking before being shown to the user. User Modeling draws on the Search History Log (e.g., past queries, clicked results) and on clickthrough.
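To make the re-ranking step of this loop concrete, here is a small hypothetical sketch (the result fields "id" and "snippet" and the helper names are assumptions; kl_score is the ranking function from the retrieval-model sketch, and the query model could come from any of the update schemes above):

```python
def rerank_unseen(result_buffer, seen_ids, query_model, collection_model):
    """Rescore the buffered results the user has not viewed yet against the
    current (implicitly updated) query model and return them in the new order."""
    unseen = [r for r in result_buffer if r["id"] not in seen_ids]
    return sorted(unseen,
                  key=lambda r: kl_score(query_model, r["snippet"], collection_model),
                  reverse=True)
```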

System Characteristics

• Client-side personalization
  - Privacy
  - Distribution of computation
  - More clues about the user
• Implicit user modeling
• Bayesian decision theory and statistical language models

User Actions

• Submit a keyword query
• View a document
• Click the "Back" button
• Click the "Next" link

System Responses

• Decide the relatedness of neighboring queries and do query expansion
• Update the user model according to clickthrough
• Rerank unseen documents

Outline: Motivation · Progress (Framework, Model, System, Evaluation) · Road ahead (Continuous work, New direction)

TREC Style Evaluation – Data Set

• Data collection: TREC AP88-90
• Topics: 30 hard topics from TREC topics 1-150
• System: search engine + RDBMS
• Context: query and clickthrough history of 3 participants (http://sifaka.cs.uiuc.edu/ir/ucair/QCHistory.zip)

Experiment Design

• Models: FixInt, BayesInt, OnlineUp, and BatchUp
• Performance comparison: $Q_k$ vs. $Q_k + H_Q + H_C$
• Evaluation metrics: MAP and Pr@20 (precision at the top 20 documents)
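For reference, the two metrics can be computed as follows (a standard sketch, not the evaluation scripts used in the project):

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision of one ranked list against the set of relevant documents."""
    hits, ap = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            ap += hits / rank
    return ap / len(relevant_ids) if relevant_ids else 0.0

def precision_at(ranked_ids, relevant_ids, n=20):
    """Pr@n: fraction of the top n results that are relevant."""
    return sum(1 for doc_id in ranked_ids[:n] if doc_id in relevant_ids) / n

def mean_average_precision(runs):
    """MAP over topics; runs is a list of (ranked_ids, relevant_ids) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```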

Overall Effect of Search Context

Model                    | Metric | Q3     | Q3+HQ+HC | Improve | Q4     | Q4+HQ+HC | Improve
FixInt (α=0.1, β=1.0)    | MAP    | 0.0421 | 0.0726   | 72.4%   | 0.0536 | 0.0891   | 66.2%
                         | pr@20  | 0.1483 | 0.1967   | 32.6%   | 0.1933 | 0.2233   | 15.5%
BayesInt (μ=0.2, ν=5.0)  | MAP    | 0.0421 | 0.0816   | 93.8%   | 0.0536 | 0.0955   | 78.2%
                         | pr@20  | 0.1483 | 0.2067   | 39.4%   | 0.1933 | 0.2317   | 19.9%
OnlineUp (μ=5.0, ν=15.0) | MAP    | 0.0421 | 0.0706   | 67.7%   | 0.0536 | 0.0792   | 47.8%
                         | pr@20  | 0.1483 | 0.1783   | 20.2%   | 0.1933 | 0.2067   | 6.9%
BatchUp (μ=2.0, ν=15.0)  | MAP    | 0.0421 | 0.0810   | 92.4%   | 0.0536 | 0.0950   | 77.2%
                         | pr@20  | 0.1483 | 0.2067   | 39.4%   | 0.1933 | 0.2250   | 16.4%

• Interaction history helps the system improve retrieval accuracy
• BayesInt is better than FixInt; BatchUp is better than OnlineUp

Using Clickthrough Data Only

All runs use BayesInt (μ=0.0, ν=5.0), i.e., only the clickthrough history H_C.

Clickthrough data can improve retrieval accuracy of unseen relevant docs:

Query   | MAP    | pr@20
Q3      | 0.0331 | 0.125
Q3+HC   | 0.0661 | 0.178
Improve | 99.7%  | 42.4%
Q4      | 0.0442 | 0.165
Q4+HC   | 0.0739 | 0.188
Improve | 67.2%  | 13.9%

Overall performance:

Query   | MAP    | pr@20
Q3      | 0.0421 | 0.1483
Q3+HC   | 0.0766 | 0.2033
Improve | 81.9%  | 37.1%
Q4      | 0.0536 | 0.1930
Q4+HC   | 0.0925 | 0.2283
Improve | 72.6%  | 18.1%

Clickthrough data corresponding to non-relevant docs are still useful for feedback:

Query   | MAP    | pr@20
Q3      | 0.0421 | 0.1483
Q3+HC   | 0.0521 | 0.1820
Improve | 23.8%  | 23.0%
Q4      | 0.0536 | 0.1930
Q4+HC   | 0.0620 | 0.1850
Improve | 15.7%  | -4.1%

Sensitivity of BatchUp Parameters

[Two plots: MAP of Q2+H_Q+H_C, Q3+H_Q+H_C, and Q4+H_Q+H_C in the BatchUp model, as μ varies from 0 to 10 and as ν varies from 0 to 500]

• BatchUp is stable across different parameter settings
• Best performance is achieved at μ = 2.0, ν = 15.0

A User Study of Personalized Search

• Six participants use the UCAIR toolbar to do web search
• Topics are selected from the TREC web track and terabyte track
• Participants explicitly evaluate the relevance of the top 30 search results from Google and UCAIR

Precision at Top N Documents

Ranking Method | prec@5 | prec@10 | prec@20 | prec@30
Google         | 0.538  | 0.472   | 0.377   | 0.308
UCAIR          | 0.581  | 0.556   | 0.453   | 0.375
Improvement    | 8.0%   | 17.8%   | 20.2%   | 21.8%

More user interaction leads to a better user model and better retrieval accuracy.

Precision-Recall Curve

[Figure: precision-recall curves]

Outline: Motivation · Progress (Framework, Model, System, Evaluation) · Road ahead (Continuous work, New direction)

Decision Theoretic Framework

• User model
  - Include more factors (e.g., readability)
  - Represent the information need in a multi-theme way
  - Learn the user model from data accurately
  - Compute the user model efficiently
• Loss functions that go beyond relevance
• Synergize short-term context with long-term context

Retrieval Models

• Bridge existing retrieval models and the decision-theoretic framework (likewise for the active feedback work)
• Derive new retrieval models from the decision-theoretic framework
• Find effective and efficient retrieval models

Retrieval Models (cont.)

• Study specific parameter settings for personalized web search (e.g., ranking of snippets)
• Utilize context information at a finer granularity (e.g., query relationships and relative judgments from clickthrough data)

System

• Make the system more robust and more efficient
• Enrich the user profile (bookmarks, local files, etc.)
• Study user interface design
  - How many results are personalized
  - Aggressive vs. conservative personalization
  - Result representation
• Study session boundary detection algorithms

System (cont.)

• Add new features to the UCAIR toolbar
  - Incorporate clustering into the system
  - Predict user preference based on non-textual features (e.g., website, document format)
• Analyze logs
  - Simple statistics
  - Query similarity in a community
• Distribute the toolbar

Evaluation

• Build an evaluation data set for contextual search (utilizing the TREC interactive track)
• Conduct a large-scale user study of contextual search
• Study privacy issues of the UCAIR toolbar
• Study how to share user logs
• Determine when personalization is more effective than non-personalization, and vice versa

Outline: Motivation · Progress (Framework, Model, System, Evaluation) · Road ahead (Continuous work, New direction)

Application

• Apply the techniques in different domains
  - Personalized tutoring systems
  - Personalized bioinformatics systems
  - Collaborative filtering applications
• Goodies for connecting people
  - Social network?
• Combination of client and server for personalization

"Personalization is a dead end"
(Raul Valdes-Perez, CEO of Vivisimo, Nov. 2004)

• People are not static
• Surfing data is weak
• The whole web page is misleading
• Home computers are shared by family members
• Queries are short
• The best personalization is done by individuals themselves
• The Vivisimo way: clustering, then users explore for themselves

"Personalization is the Holy Grail for search"
(Jerry Yang, co-founder of Yahoo!, March 2005)

One size does not fit all. From a CNN report:

"[Yang] also said that the key challenge for Yahoo! and all search companies going forward will be to find ways to increase the personalization of results, i.e. making sure that a user truly finds what he or she is looking for when typing in a keyword search."

"The relevance of search is still the Holy Grail for any search application," Yang said.

The End

Thank you!