Boolean Model
Boolean Model
• A document is represented as a set of keywords.
• Queries are Boolean expressions of keywords, connected by AND, OR, and NOT, with brackets indicating scope.
– [[Rio & Brazil] | [Hilo & Hawaii]] & hotel & !Hilton   (& = AND, | = OR, ! = NOT)
• Output: a document is either relevant or not. There are no partial matches and no ranking.
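The model on this slide can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: documents are stored as sets of lowercase keywords, and queries are held in a small expression tree. The `Term`, `And`, `Or`, `Not` classes and the `matches` function are hypothetical names for illustration, not part of any library. Each document gets a plain yes/no answer, with no score.

```python
from dataclasses import dataclass
from typing import FrozenSet, Union

# Hypothetical query tree for illustration: a keyword, or AND / OR / NOT over sub-queries.
@dataclass(frozen=True)
class Term:
    word: str

@dataclass(frozen=True)
class And:
    left: "Query"
    right: "Query"

@dataclass(frozen=True)
class Or:
    left: "Query"
    right: "Query"

@dataclass(frozen=True)
class Not:
    operand: "Query"

Query = Union[Term, And, Or, Not]

def matches(q: Query, doc: FrozenSet[str]) -> bool:
    """Return True if the document (a set of keywords) satisfies the query."""
    if isinstance(q, Term):
        return q.word in doc
    if isinstance(q, And):
        return matches(q.left, doc) and matches(q.right, doc)
    if isinstance(q, Or):
        return matches(q.left, doc) or matches(q.right, doc)
    if isinstance(q, Not):
        return not matches(q.operand, doc)
    raise TypeError(f"unknown query node: {q!r}")

# [[Rio & Brazil] | [Hilo & Hawaii]] & hotel & !Hilton  (keywords lowercased)
query = And(
    And(Or(And(Term("rio"), Term("brazil")), And(Term("hilo"), Term("hawaii"))),
        Term("hotel")),
    Not(Term("hilton")),
)

# Toy documents, made up for this example.
docs = {
    "d1": frozenset({"rio", "brazil", "hotel", "beach"}),
    "d2": frozenset({"hilo", "hawaii", "hotel", "hilton"}),
    "d3": frozenset({"rio", "hotel"}),
}

relevant = sorted(name for name, kws in docs.items() if matches(query, kws))
print(relevant)  # ['d1']
```

Here d2 is rejected because of the NOT clause, and d3 because it lacks "brazil"; neither receives a partial credit.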
Boolean Model
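• Simple model based on set theory.
• Queries are specified as Boolean expressions:
– precise semantics;
– neat formalism;
– e.g. $q = k_a \wedge (k_b \vee \neg k_c)$.
• Terms are either present or absent. Thus, $w_{ij} \in \{0,1\}$, where $w_{ij}$ is the weight of term $k_i$ in document $d_j$.
• Consider:
– $q = k_a \wedge (k_b \vee \neg k_c)$
– $\vec{q}_{dnf} = (1,1,1) \vee (1,1,0) \vee (1,0,0)$, the disjunctive normal form of $q$ over $(k_a, k_b, k_c)$
– $\vec{q}_{cc} = (1,1,0)$ is a conjunctive component.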
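The disjunctive normal form of $q = k_a \wedge (k_b \vee \neg k_c)$ can be checked mechanically: enumerate every binary assignment to $(k_a, k_b, k_c)$ and keep those that satisfy $q$; each survivor is one conjunctive component. A minimal Python sketch; encoding the query as a Python predicate is an illustrative assumption.

```python
from itertools import product

# q = ka AND (kb OR NOT kc), as a predicate over the binary weights (a, b, c)
# of the index terms ka, kb, kc.
def q(a: int, b: int, c: int) -> bool:
    return bool(a) and (bool(b) or not c)

# Every binary vector over (ka, kb, kc) that satisfies q is a conjunctive
# component; together they form q's disjunctive normal form.
q_dnf = [v for v in product((1, 0), repeat=3) if q(*v)]
print(q_dnf)  # [(1, 1, 1), (1, 1, 0), (1, 0, 0)]
```

The vector (1, 0, 1) is excluded because $k_b$ is absent while $k_c$ is present, which violates $k_b \vee \neg k_c$.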
Boolean Model
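• $q = k_a \wedge (k_b \vee \neg k_c)$
[Figure: Venn diagram of $k_a$, $k_b$, $k_c$ marking the conjunctive components (1,0,0), (1,1,0), and (1,1,1) of $q$]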
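• Similarity of query $q$ and document $d_j$:
$$
\mathrm{sim}(q,d_j) =
\begin{cases}
1 & \text{if } \exists\, \vec{q}_{cc} \mid (\vec{q}_{cc} \in \vec{q}_{dnf}) \wedge (\forall k_i,\ g_i(\vec{d}_j) = g_i(\vec{q}_{cc})) \\
0 & \text{otherwise}
\end{cases}
$$
where $g_i(\vec{v})$ returns the weight (0 or 1) of term $k_i$ in vector $\vec{v}$. In words, $d_j$ is relevant exactly when its pattern of term occurrences equals one of the conjunctive components of $q$.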
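A sketch of this similarity function in Python, assuming the DNF of $q$ has already been computed and that vectors are compared only over the terms occurring in $q$ (the same coordinates as the DNF vectors). `TERMS`, `Q_DNF`, `g`, and `sim` are hypothetical names chosen for illustration.

```python
from typing import Set, Tuple

# Hypothetical fixed ordering of the query terms; DNF vectors use these coordinates.
TERMS = ("ka", "kb", "kc")

# Conjunctive components of q = ka AND (kb OR NOT kc).
Q_DNF: Set[Tuple[int, ...]] = {(1, 1, 1), (1, 1, 0), (1, 0, 0)}

def g(doc_terms: Set[str]) -> Tuple[int, ...]:
    """Binary weight g_i(d_j) of each query term k_i in the document."""
    return tuple(1 if t in doc_terms else 0 for t in TERMS)

def sim(q_dnf: Set[Tuple[int, ...]], doc_terms: Set[str]) -> int:
    """1 if the document's vector equals some conjunctive component, else 0."""
    return 1 if g(doc_terms) in q_dnf else 0

print(sim(Q_DNF, {"ka", "kc"}))        # 0, vector (1, 0, 1)
print(sim(Q_DNF, {"ka", "kb", "kx"}))  # 1, vector (1, 1, 0); kx is not a query term
```

Because the result is only ever 0 or 1, every retrieved document receives the same score.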
Boolean Retrieval Model
• A popular retrieval model because it is:
– easy to understand for simple queries;
– based on a clean formalism.
• Boolean models can be extended to include ranking.
• Reasonably efficient implementations are possible for typical queries.
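The slides do not say how an efficient implementation works. One common approach, assumed here, is an inverted index that maps each keyword to the set of documents containing it, so that AND, OR, and NOT become set intersection, union, and complement against the full collection. A minimal Python sketch; `build_index`, `postings`, and the toy documents are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set

def build_index(docs: Dict[int, Iterable[str]]) -> Dict[str, Set[int]]:
    """Map each keyword to the set of document IDs that contain it (its postings)."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, words in docs.items():
        for w in words:
            index[w].add(doc_id)
    return index

# Toy collection, made up for this example.
docs = {
    1: ["rio", "brazil", "hotel"],
    2: ["hilo", "hawaii", "hotel", "hilton"],
    3: ["rio", "carnival"],
    4: ["hilo", "hawaii", "hotel"],
}
index = build_index(docs)
all_ids = set(docs)

def postings(term: str) -> Set[int]:
    # .get avoids inserting empty entries into the defaultdict.
    return index.get(term, set())

# [[rio & brazil] | [hilo & hawaii]] & hotel & !hilton
result = (
    ((postings("rio") & postings("brazil")) | (postings("hilo") & postings("hawaii")))
    & postings("hotel")
    & (all_ids - postings("hilton"))
)
print(sorted(result))  # [1, 4]
```

Only the postings of the query terms are touched, rather than every document, which is what keeps typical queries cheap.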
Boolean Models  Problems
• Retrieval is based on a binary decision criterion with no notion of partial matching;
• No ranking of the documents is provided (absence of a grading scale);
• Very rigid: AND means all; OR means any.
• The information need has to be translated into a Boolean expression, which most users find awkward;
• The Boolean queries formulated by users are most often too simplistic;
• It is difficult to express complex user requests.
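To make the rigidity concrete: under pure Boolean matching, a document containing two of three AND-ed terms is rejected exactly like one containing none, and under OR a one-term match is indistinguishable from a full match. A small Python sketch; `boolean_and`, `boolean_or`, and the toy documents are hypothetical.

```python
from typing import List, Set

def boolean_and(query_terms: List[str], doc: Set[str]) -> bool:
    # AND means all: every query term must be present.
    return all(t in doc for t in query_terms)

def boolean_or(query_terms: List[str], doc: Set[str]) -> bool:
    # OR means any: a single query term is enough.
    return any(t in doc for t in query_terms)

query_terms = ["rio", "hotel", "beach"]
near_miss = {"rio", "hotel"}             # 2 of 3 terms
unrelated = {"opera", "tickets"}         # 0 of 3 terms
full_match = {"rio", "hotel", "beach"}   # 3 of 3 terms
weak_match = {"beach"}                   # 1 of 3 terms

# AND rejects the near miss exactly as it rejects the unrelated document.
print(boolean_and(query_terms, near_miss), boolean_and(query_terms, unrelated))  # False False
# OR accepts the one-term match exactly as it accepts the full match.
print(boolean_or(query_terms, weak_match), boolean_or(query_terms, full_match))  # True True
```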
Boolean Models  Problems
• As a consequence, the Boolean model frequently returns either too few or too many documents in response to a user query.
• Difficult to control the number of documents retrieved.
– All matched documents will be returned.
• Difficult to rank output.
– All matched documents satisfy the query equally, as a matter of logic.
• Difficult to perform relevance feedback.
– If the user marks a document as relevant or irrelevant, how should the query be modified?