Incorporating Relevance Feedback
for Information-Seeking Retrieval
using Few-Shot Document Re-Ranking

Re-ranking search results using a handful of user-selected relevant documents, via few-shot learning and meta-learning.

¹ UKP Lab, TU Darmstadt & hessian.AI · ² cohere.ai

TL;DR

Neural re-ranking models ignore relevance feedback that users naturally provide through clicks and explicit judgments. We integrate this feedback directly into re-rankers using kNN similarity, per-query fine-tuning, and MAML meta-learning—requiring only a handful of labeled documents. Fusing our best neural re-ranker with lexical retrieval outperforms all baselines by 5.2 nDCG@20 across four IR benchmarks.

Pairing a lexical retriever with a neural re-ranking model has set state-of-the-art performance on large-scale information retrieval datasets. This pipeline covers scenarios like question answering or navigational queries; in information-seeking scenarios, however, users often indicate whether a document is relevant to their query in the form of clicks or explicit feedback. In this work, we therefore explore how relevance feedback can be directly integrated into neural re-ranking models by adopting few-shot and parameter-efficient learning techniques. Specifically, we introduce a kNN approach that re-ranks documents based on their similarity with the query and with the documents the user considers relevant. Further, we explore Cross-Encoder models that we pre-train using meta-learning and subsequently fine-tune for each query, training only on the feedback documents. To evaluate our different integration strategies, we transform four existing information retrieval datasets into the relevance feedback scenario. Extensive experiments demonstrate that integrating relevance feedback directly into neural re-ranking models improves their performance, and fusing lexical ranking with our best-performing neural re-ranker outperforms all other methods by 5.2 nDCG@20.

Method Overview

Our pipeline has three stages: retrieve an initial set of documents and collect user feedback, then use that feedback for both query expansion and re-ranker fine-tuning, and finally fuse the lexical and neural rankings.

1 Retrieval & Relevance Feedback: BM25 retrieves the top-1000 documents for the query q, and the user marks a few of them as relevant (feedback set R).

2 Query Expansion & Re-Ranker Fine-Tuning: the feedback drives two tracks. In the retrieval track, the query q is expanded with R and BM25 is re-run (BM25-QE). In the re-ranking track, the 2k feedback documents are used to few-shot fine-tune the Cross-Encoder (CE) re-ranker.

3 Re-Ranking & Fusion: the BM25-QE ranking and the neural ranking (CE or kNN) are merged with Reciprocal Rank Fusion (RRF) into the final result.
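The final RRF fusion step can be sketched in a few lines. This is a generic Reciprocal Rank Fusion implementation, not the repository's code; the constant k = 60 is the conventional default from the RRF literature, not a value stated on this page, and the document ids are made up.

```python
# Reciprocal Rank Fusion: each document's fused score is the sum of
# 1 / (k + rank) over all input rankings it appears in, so documents
# ranked highly by several rankers rise to the top.

def rrf(rankings, k=60):
    """Fuse several rankings (lists of doc ids, best first) into one."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_qe = ["d1", "d2", "d3"]   # lexical ranking (toy example)
neural = ["d1", "d3", "d2"]    # neural re-ranker's ranking (toy example)
fused = rrf([bm25_qe, neural])  # "d1" is ranked first by both, so it wins
```

Because RRF only uses ranks, it needs no score calibration between the lexical and neural systems, which is why it is a common choice for this kind of fusion.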

How We Integrate Feedback

kNN Re-Ranking: similarity-based, no training


Score each candidate by cosine similarity to the query and relevant feedback documents. No training needed—uses precomputed MiniLM embeddings.

sᵢ = f(dᵢ, q) + ∑_{dⱼ ∈ R⁺} f(dᵢ, dⱼ)    (1)
Key properties
  • Zero weight updates—extremely fast at inference
  • Scores against k+1 individual points (like Prototypical Networks)
  • +2.6 nDCG@20 when fused with BM25-QE
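A minimal sketch of the scoring rule in Eq. 1, using random toy vectors in place of the precomputed MiniLM embeddings (384 dimensions, as in MiniLM sentence encoders); f is cosine similarity here, and all documents and the feedback set are invented for illustration.

```python
import numpy as np

def cos(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def knn_scores(query_emb, cand_embs, feedback_embs):
    # Eq. 1: similarity to the query plus the sum of similarities
    # to the positive feedback documents R+.
    return [
        cos(d, query_emb) + sum(cos(d, r) for r in feedback_embs)
        for d in cand_embs
    ]

rng = np.random.default_rng(0)
q = rng.normal(size=384)                     # toy query embedding
feedback = [q + 0.1 * rng.normal(size=384)]  # a doc close to the query
cands = np.stack([
    q + 0.1 * rng.normal(size=384),          # candidate similar to q and R+
    rng.normal(size=384),                    # unrelated candidate
])
scores = knn_scores(q, cands, feedback)      # first candidate scores higher
```

Since the embeddings are precomputed once per corpus, re-ranking a candidate list is just a handful of dot products per document.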

CE MAML + Query FT: meta-learned Cross-Encoder

[Diagram: MAML training. The inner loop (Eq. 2) adapts θ to θ′ on task T₁ (query q₁ + 2k feedback docs) with BCE loss, updating bias parameters only. The outer loop (Eq. 3) meta-updates the initialisation via θ′'s loss on a second task T₂, testing generalisation. At inference, the meta-learned initialisation θ″ is adapted to a new query q with one gradient step on its 2k feedback docs, and the adapted model re-ranks the documents.]

Meta-train a Cross-Encoder with MAML so it adapts to any new query in one gradient step using only the feedback docs.

θ′ = θ − α ∇θ ℒ(gθ; T₁)    (2)
θ″ = θ − α ∇θ ℒ(gθ′; T₂)    (3)
Key properties
  • Only 0.11% of parameters updated (bias layers)
  • +0.5 nDCG@20 over supervised pre-training
  • Best overall: 0.4973 avg. nDCG@20 with BM25-QE fusion
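The two updates in Eqs. 2 and 3 can be illustrated on a toy 1-D scorer g(x) = w·x + b with squared loss, where the inner loop adapts only the bias (mirroring the bias-only adaptation above). Note this sketch uses the first-order MAML approximation rather than the full second-order meta-gradient, and the two toy tasks are invented.

```python
import numpy as np

def grad_and_loss(w, b, xs, ys):
    """Gradient of mean squared error w.r.t. the bias b, plus the loss."""
    preds = w * xs + b
    return 2 * np.mean(preds - ys), np.mean((preds - ys) ** 2)

alpha = 0.1
w, b = 1.0, 0.0  # theta: weight is frozen, only the bias is learned
t1_x, t1_y = np.array([1.0, 2.0]), np.array([2.0, 3.0])  # task T1: y = x + 1
t2_x, t2_y = np.array([0.0, 1.0]), np.array([1.0, 2.0])  # task T2: y = x + 1

for _ in range(50):
    # Inner loop (Eq. 2): adapt the bias on T1
    g1, _ = grad_and_loss(w, b, t1_x, t1_y)
    b_adapted = b - alpha * g1
    # Outer loop (Eq. 3), first-order: update the initialisation
    # using the adapted parameters' gradient on T2
    g2, _ = grad_and_loss(w, b_adapted, t2_x, t2_y)
    b = b - alpha * g2

# At "inference": one inner-loop step from the meta-learned init,
# then evaluate on the held-out task
g1, _ = grad_and_loss(w, b, t1_x, t1_y)
_, final_loss = grad_and_loss(w, b - alpha * g1, t2_x, t2_y)
```

The meta-learned bias converges toward a value from which a single gradient step already fits a new task well, which is exactly the property the Cross-Encoder needs when it sees only a handful of feedback documents per query.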

Key Contributions

Learn from Few Feedback Docs

Fine-tune a re-ranker per query using only 4–16 feedback documents—no large labeled dataset required.
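As a hypothetical sketch of per-query fine-tuning with frozen scores: suppose the Cross-Encoder's relevance scores for the 2k feedback documents are kept fixed and only a scalar bias is trained with BCE loss. The paper's bias-only fine-tuning updates the model's internal bias terms; a single scalar bias and made-up scores/labels are used here purely to keep the example tiny.

```python
import numpy as np

# Frozen toy relevance scores for 2k = 8 feedback docs (4 relevant, 4 not).
scores = np.array([0.1, -0.2, 0.3, 0.0, -2.1, -1.8, -2.2, -1.9])
labels = np.array([1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0])

def bce_loss(b):
    p = 1.0 / (1.0 + np.exp(-(scores + b)))
    return -np.mean(labels * np.log(p) + (1 - labels) * np.log(1 - p))

b, lr = 0.0, 0.5
loss_init = bce_loss(b)
for _ in range(200):
    p = 1.0 / (1.0 + np.exp(-(scores + b)))
    b -= lr * np.mean(p - labels)  # gradient of BCE w.r.t. the bias
loss_final = bce_loss(b)           # lower than loss_init
```

Even this single trainable parameter recalibrates the miscalibrated toy scorer on eight examples; the same idea, applied to all bias terms of a Cross-Encoder, is what makes per-query fine-tuning feasible with so few labels.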

Speed vs. Accuracy Trade-Off

kNN re-ranking needs no training and runs instantly from precomputed embeddings. CE MAML requires meta-learning but achieves the best performance.

Outperforms All Baselines

Fusing neural re-ranking with BM25 query expansion yields +5.2 nDCG@20 over the strongest individual method.

Datasets

Dataset Domain Description Docs Queries
Robust04 News News articles, complex topics 528k 148
TREC-Covid Biomedical COVID-19 research paper retrieval from CORD-19 191k 50
TREC-News News Washington Post background linking task 595k 34
Webis-Touché Debates Argument retrieval from web debates 383k 49

Results

+5.2 nDCG@20

Our best model (Rank Fusion: CE MAML + Query FT & BM25-QE) reaches 0.4973 average nDCG@20 across four datasets, +5.2 nDCG@20 over the strongest individual method, by fusing neural re-ranking with relevance feedback.

| Method | Robust04 | TREC-Covid | TREC-News | Touché | Avg. |
|---|---|---|---|---|---|
| BM25-QE | 0.4964 | 0.6106 | 0.3924 | 0.2714 | 0.4427 |
| kNN | 0.4433 | 0.6863 | 0.3652 | 0.1749 | 0.4175 |
| CE Zero-Shot | 0.4152 | 0.7028 | 0.3143 | 0.1766 | 0.4022 |
| CE Query-FT | 0.4846 | 0.7231 | 0.3350 | 0.1981 | 0.4352 |
| CE MAML + Query FT | 0.5060 | 0.7359 | 0.3147 | 0.2235 | 0.4450 |
| RF: kNN & BM25-QE | 0.5076 | 0.7077 | **0.4123** | 0.2482 | 0.4689 |
| RF: CE MAML + QFT & BM25-QE | **0.5707** | **0.7402** | 0.4055 | **0.2727** | **0.4973** |

nDCG@20 test set results averaged over three seeds and feedback documents k ∈ {2, 4, 8}. Best per column in bold. RF = Rank Fusion, QFT = Query Fine-Tuning.

Getting Started

1 Setup Environment

```shell
# Create and activate virtual environment
python -m venv .venv
source .venv/bin/activate

# Install dependencies
make install
```

2 Download & Index Data

```shell
# Download datasets (Robust04, TREC-Covid, TREC-News, Touché)
make download

# Index corpora in Elasticsearch (starts Docker if available)
make index
```

3 Run Experiments

```shell
# kNN re-ranking
make knn args="--dataset robust04"

# Cross-Encoder with Query Fine-Tuning
make args="--dataset robust04 --num_samples 4"

# CE MAML + Query Fine-Tuning
make args="--dataset robust04 --num_samples 4 --meta_learning"

# Rank Fusion
make args="--dataset robust04 --num_samples 4 \
  --result_files results/knn.json results/ce.json"
```

Full argument reference in inc_rel/args.py. Requires Python 3.10+ and Elasticsearch (Docker recommended).

Citation

If you find this work useful, please cite our paper:

@inproceedings{baumgartner-etal-2022-incorporating,
    title = "{{Incorporating Relevance Feedback for Information-Seeking
             Retrieval using Few-Shot Document Re-Ranking}}",
    author = {Baumg{\"a}rtner, Tim  and
      Ribeiro, Leonardo F. R.  and
      Reimers, Nils  and
      Gurevych, Iryna},
    booktitle = "Proceedings of the 2022 Conference on Empirical Methods
                 in Natural Language Processing",
    month = dec,
    year = "2022",
    address = "Abu Dhabi, United Arab Emirates",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.emnlp-main.614",
    pages = "8988--9005",
}