Retrieval Models


To give an overview of the characteristics of well-known retrieval models and to compare them, we have developed the interactive map shown below. Clicking a model acronym in the map displays a short description of that retrieval model on the right-hand side. In our map, a retrieval model is of empirical, probabilistic, or language-model type. Below each model's acronym you find a code in the form of a quadruple, [1 2 3 4], which hints at the model's characteristics along four dimensions:
(1) Feature type, which defines the basic principle used to capture a document's content; possible values are document terms [T], latent or explicit taxonomic concepts [C], or special features [S] produced by an (often NLP-based) method.
(2) Foundation of the retrieval status value (RSV) computation; possible values are feature vector similarity [φ], relevance assessment [ρ], or the ability of a document to generate [γ] a query.
(3) Dependency on a closed world; possible values are open [∪], where the document collection need not be completely given, and closed [∩], where the complete collection must be given to compute global characteristics.
(4) External knowledge, if used at all; possible values are none [∅], user feedback [✓], e.g., for relevance assessment, and an additional document collection [+], e.g., for computing collection-relative document similarities.

Our scheme is not meant to capture every particularity of a model; rather, it is meant to pinpoint the strengths and weaknesses of retrieval models. If you find it useful, have suggestions for improving it, or detect incorrect statements, please send us an email. Finally, we kindly ask you to cite the related publication below when referring to this overview.
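To make the RSV foundations of dimension (2) and the closed-world dependency of dimension (3) more concrete, here is a minimal Python sketch on a toy collection. It assumes whitespace tokenization and picks one common instantiation per foundation: cosine similarity of tf-idf vectors for [φ], the binary independence model with Robertson/Sparck Jones weights and user relevance judgments for [ρ], and Jelinek-Mercer-smoothed query likelihood for [γ]. The collection, the function names, and the smoothing parameter `lam` are hypothetical choices for illustration; they are not the definitions used for any model in the map.

```python
import math
from collections import Counter

# Hypothetical toy collection used only for illustration.
DOCS = [
    "information retrieval models rank documents",
    "language models generate text",
    "vector space model uses term vectors",
]


def tokenize(text):
    # Assumption: plain lowercase whitespace tokenization.
    return text.lower().split()


def idf(term, docs):
    # Closed world [∩]: the document frequency needs the complete collection.
    df = sum(1 for d in docs if term in tokenize(d))
    return math.log(len(docs) / df) if df else 0.0


def rsv_similarity(query, doc, docs):
    # [φ] Cosine similarity of tf-idf weighted term vectors.
    q_tf, d_tf = Counter(tokenize(query)), Counter(tokenize(doc))
    vocab = set(q_tf) | set(d_tf)
    q_vec = {t: q_tf[t] * idf(t, docs) for t in vocab}
    d_vec = {t: d_tf[t] * idf(t, docs) for t in vocab}
    dot = sum(q_vec[t] * d_vec[t] for t in vocab)
    norm = (math.sqrt(sum(v * v for v in q_vec.values()))
            * math.sqrt(sum(v * v for v in d_vec.values())))
    return dot / norm if norm else 0.0


def rsv_relevance(query, doc, docs, relevant):
    # [ρ] Binary independence model with Robertson/Sparck Jones weights;
    # `relevant` holds indices of documents the user judged relevant [✓].
    n_docs, n_rel = len(docs), len(relevant)
    score = 0.0
    for t in set(tokenize(query)) & set(tokenize(doc)):
        n = sum(1 for d in docs if t in tokenize(d))             # docs containing t
        r = sum(1 for i in relevant if t in tokenize(docs[i]))  # relevant docs containing t
        score += math.log(((r + 0.5) / (n_rel - r + 0.5))
                          / ((n - r + 0.5) / (n_docs - n - n_rel + r + 0.5)))
    return score


def rsv_generation(query, doc, docs, lam=0.5):
    # [γ] Log-probability that the document's unigram language model,
    # linearly smoothed with the collection model, generates the query.
    d_tokens = tokenize(doc)
    c_tokens = [t for d in docs for t in tokenize(d)]
    d_tf, c_tf = Counter(d_tokens), Counter(c_tokens)
    score = 0.0
    for t in tokenize(query):
        p = (1 - lam) * d_tf[t] / len(d_tokens) + lam * c_tf[t] / len(c_tokens)
        if p == 0:
            return float("-inf")  # term does not occur anywhere in the collection
        score += math.log(p)
    return score
```

For example, `rsv_similarity("vector space model", DOCS[2], DOCS)` is positive while the same query scores 0 against `DOCS[1]`. Note that `idf` cannot be computed until the complete collection is known, which is what a closed-world [∩] model requires, and that `rsv_relevance` additionally depends on the user's relevance judgments.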

Models in the map: pLSI, MixtUnigram, LDA, LM, BeliefNet, BestMatch, Inquery, BII, BIM, 2-Poisson, ProbIndex, ESA, CL-ESA, WebGenre, DivRand, SuffixTree, Genre, LSI, GVSM, FuzzySet, VSM, Boolean
Legend [1 2 3 4]
(1) Feature type
  • [T] terms
  • [C] concepts
  • [S] special
(2) RSV computation
  • [φ] similarity
  • [ρ] relevance
  • [γ] generation
(3) Closed world
  • [∪] open collection
  • [∩] closed collection
(4) External knowledge
  • [∅] none
  • [✓] user feedback
  • [+] additional collection
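The legend above amounts to a small data structure: one symbol per dimension. The following minimal Python sketch shows one way to encode and parse such a code. It assumes exactly one symbol per dimension, written space-separated in brackets as in the legend; the class and method names are hypothetical, and the example code string is not taken from any model in the map.

```python
from dataclasses import dataclass
from enum import Enum


class Feature(Enum):       # (1) Feature type
    TERMS = "T"
    CONCEPTS = "C"
    SPECIAL = "S"


class RSV(Enum):           # (2) RSV computation
    SIMILARITY = "φ"
    RELEVANCE = "ρ"
    GENERATION = "γ"


class World(Enum):         # (3) Closed world
    OPEN = "∪"
    CLOSED = "∩"


class Knowledge(Enum):     # (4) External knowledge
    NONE = "∅"
    USER_FEEDBACK = "✓"
    ADDITIONAL_COLLECTION = "+"


@dataclass(frozen=True)
class ModelCode:
    feature: Feature
    rsv: RSV
    world: World
    knowledge: Knowledge

    @classmethod
    def parse(cls, code):
        # Assumes the bracketed, space-separated form "[1 2 3 4]".
        parts = code.strip().strip("[]").split()
        if len(parts) != 4:
            raise ValueError(f"expected four symbols, got {code!r}")
        # Enum lookup by value raises ValueError for unknown symbols.
        return cls(Feature(parts[0]), RSV(parts[1]), World(parts[2]), Knowledge(parts[3]))

    def __str__(self):
        return f"[{self.feature.value} {self.rsv.value} {self.world.value} {self.knowledge.value}]"
```

For instance, `ModelCode.parse("[T φ ∩ ∅]")` yields a term-based, similarity-based, closed-world code without external knowledge, and `str()` turns it back into the legend notation. If a model in the map combined several kinds of external knowledge, `knowledge` would need to become a set instead of a single value.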