3. Probabilistic Model.
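The probabilistic model was formulated by Stephen Robertson and Karen Spärck Jones in 1977. The model starts from the premise that the IR process is inherently imprecise: the system cannot know for certain which documents satisfy the user's information need, so it can only estimate how likely each document is to be relevant.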
The model works as follows. A user submits a query to the system in search of some information, and the model estimates, for each document in the collection, the probability that the document is relevant to that query. If we denote the query by C and any document by D, this probability can be written as $P(\mathrm{rel} \mid D, C)$.
The model attempts to obtain the set of relevant documents (called R) that maximizes the probability of relevance. A document D is considered relevant if its probability of being relevant is greater than its probability of not being relevant:
$$P(\mathrm{rel} \mid D, C) > P(\mathrm{not\ rel} \mid D, C)$$
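The text does not fix how these probabilities are estimated. One classic instantiation is the Binary Independence Model, where each document is a set of terms and each query term receives a log-odds weight. The minimal sketch below assumes that, before any relevance information exists, p = 0.5 for every query term and u (the chance that a non-relevant document contains the term) is approximated by the smoothed fraction of documents containing it. The function names `term_weights` and `rank` and the toy collection are hypothetical. The summed weight is rank-equivalent to the odds of relevance, so sorting by it orders documents by probability of relevance, but it is not itself a calibrated probability.

```python
import math

def term_weights(query, docs, p=0.5):
    """Log-odds weight for each query term when no documents have been judged yet.

    p: assumed probability that a relevant document contains the term (0.5 = no information).
    u: probability that a non-relevant document contains the term, approximated by
       the smoothed fraction of documents that contain it.
    """
    N = len(docs)
    weights = {}
    for t in set(query):
        n_t = sum(1 for d in docs if t in d)
        u = (n_t + 0.5) / (N + 1)  # +0.5 / +1 smoothing keeps 0 < u < 1
        weights[t] = math.log(p / (1 - p)) + math.log((1 - u) / u)
    return weights

def rank(query, docs, p=0.5):
    """Return (score, doc_index) pairs sorted by decreasing score, ties by index."""
    weights = term_weights(query, docs, p)
    # Only presence or absence of a term matters (binary independence assumption).
    scores = [(sum((w for t, w in weights.items() if t in d), 0.0), i)
              for i, d in enumerate(docs)]
    return sorted(scores, key=lambda s: (-s[0], s[1]))

# Hypothetical toy collection: each document is a set of index terms.
docs = [
    {"probabilistic", "retrieval", "model"},
    {"boolean", "retrieval", "model"},
    {"vector", "space", "model"},
    {"cooking", "recipes"},
    {"travel", "guide"},
]
for score, i in rank(["probabilistic", "retrieval"], docs):
    print(i, round(score, 3))  # document 0 is printed first
```

Document 0 contains both query terms and ranks first; document 1 contains only the more common term "retrieval" and ranks second.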
The probabilistic model relies on relevance feedback. The process begins with an initial estimate of the set of relevant documents, which is refined iteratively using the user's judgments about which of the retrieved documents are relevant and which are not.
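The feedback step can be sketched with the standard Binary Independence Model re-estimation, chosen here as an illustration because the text does not give formulas. Assume r_t of the R documents judged relevant contain term t, and n_t of all N documents contain it. Then p is re-estimated as (r_t + 0.5) / (R + 1) and u as (n_t − r_t + 0.5) / (N − R + 1). This simple version treats every document not judged relevant as non-relevant, rather than using explicit "not relevant" judgments separately. The names `feedback_weights`, `rank`, `feedback_loop` and the `judge` callback standing in for the user are hypothetical.

```python
import math

def feedback_weights(query, docs, relevant):
    """Re-estimate term weights from the documents the user judged relevant.

    Documents not in `relevant` are treated as non-relevant.
    With relevant == set() this reduces to p = 0.5 and u = smoothed document frequency.
    """
    N, R = len(docs), len(relevant)
    weights = {}
    for t in set(query):
        n_t = sum(1 for d in docs if t in d)            # docs containing t
        r_t = sum(1 for i in relevant if t in docs[i])  # judged-relevant docs containing t
        p = (r_t + 0.5) / (R + 1)                       # P(t present | relevant)
        u = (n_t - r_t + 0.5) / (N - R + 1)             # P(t present | non-relevant)
        weights[t] = math.log(p / (1 - p)) + math.log((1 - u) / u)
    return weights

def rank(weights, docs):
    """Return (score, doc_index) pairs sorted by decreasing score, ties by index."""
    scores = [(sum((w for t, w in weights.items() if t in d), 0.0), i)
              for i, d in enumerate(docs)]
    return sorted(scores, key=lambda s: (-s[0], s[1]))

def feedback_loop(query, docs, judge, rounds=2, k=2):
    """Show the top-k documents, record the ones `judge` accepts, re-rank; repeat."""
    relevant = set()
    ranking = rank(feedback_weights(query, docs, relevant), docs)
    for _ in range(rounds):
        for _, i in ranking[:k]:
            if judge(i):
                relevant.add(i)
        ranking = rank(feedback_weights(query, docs, relevant), docs)
    return ranking, relevant

# Hypothetical collection and user who only considers document 0 relevant.
docs = [
    {"probabilistic", "retrieval", "model"},
    {"boolean", "retrieval", "model"},
    {"vector", "space", "model"},
    {"cooking", "recipes"},
    {"travel", "guide"},
]
ranking, relevant = feedback_loop(["probabilistic", "retrieval"], docs, lambda i: i == 0)
```

After document 0 is judged relevant, the weights of the terms it contains rise. For example, the weight of "probabilistic" grows from log 3 to log 27, so documents sharing vocabulary with the judged-relevant set move up in the next ranking.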
3.1 Advantages
- Provides an ordering of documents based on their probability of relevance
3.2 Disadvantages
- The need to start the model from an initial estimate of the set of relevant documents.
- The number of times each term appears in a document (its term frequency) is not taken into account when estimating the document's probability of relevance.
- The results are not much better than those obtained with the Boolean model.
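The term-frequency limitation follows from representing each document only by the set of terms it contains. In the minimal sketch below, the fixed term weights and the `binary_score` function are hypothetical. A document that mentions "retrieval" four times receives exactly the same score as one that mentions it once.

```python
from collections import Counter

def binary_score(weights, text):
    # The model sees only which terms occur (a set), not how often they occur.
    present = set(text.split())
    return sum(w for t, w in weights.items() if t in present)

weights = {"retrieval": 1.2, "model": 0.4}  # hypothetical fixed term weights
once = "a retrieval model"
often = "retrieval retrieval retrieval retrieval model"

print(Counter(once.split())["retrieval"], Counter(often.split())["retrieval"])  # 1 4
print(binary_score(weights, once) == binary_score(weights, often))  # True
```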