Re-ranking search results using a handful of user-selected relevant documents, via few-shot learning and meta-learning.
Neural re-ranking models typically ignore the relevance feedback that users naturally provide through clicks and explicit judgments. We integrate this feedback directly into re-rankers via kNN similarity, per-query fine-tuning, and MAML meta-learning, requiring only a handful of labeled documents. Fusing our best neural re-ranker with lexical retrieval outperforms all baselines by 5.2 nDCG@20 points across four IR benchmarks.
Our pipeline has three stages:
1. Retrieve an initial set of documents and collect user feedback.
2. Use that feedback for both query expansion and re-ranker fine-tuning.
3. Fuse the lexical and neural rankings.
Score each candidate by cosine similarity to the query and the relevant feedback documents. No training needed; it runs on precomputed MiniLM embeddings.
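The kNN scoring rule can be sketched as follows. This is an illustrative, dependency-free version: `knn_rerank`, the toy 2-d embeddings, and the mean-similarity aggregation are assumptions for clarity, not the repo's exact implementation (which uses precomputed MiniLM embeddings).

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def knn_rerank(query_emb, feedback_embs, candidates):
    """Score each candidate document by its mean cosine similarity to the
    query embedding and the embeddings of the user-selected relevant docs,
    then return candidates sorted by that score (highest first)."""
    refs = [query_emb] + feedback_embs
    scored = []
    for doc_id, emb in candidates.items():
        score = sum(cosine(emb, r) for r in refs) / len(refs)
        scored.append((doc_id, score))
    return sorted(scored, key=lambda t: t[1], reverse=True)
```

Because the embeddings are precomputed, this is a pure lookup-and-sort step at query time.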
Meta-train a Cross-Encoder with MAML so it adapts to any new query in one gradient step using only the feedback docs.
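The "one gradient step" adaptation can be illustrated on a toy linear relevance scorer. This sketch shows only the MAML inner loop (adapting meta-learned weights `w` on the feedback docs); the outer meta-training loop that learns a good initialization, and the actual Cross-Encoder, are omitted, and the squared loss here is an assumption for simplicity.

```python
def adapt_one_step(w, feedback, lr=0.1):
    """One inner-loop gradient step (the MAML adaptation) on a linear
    relevance scorer score(d) = w . d, using squared loss against the
    user-provided relevance label of each feedback doc.

    feedback: list of (doc_embedding, relevance_label) pairs.
    Returns the adapted weight vector; w itself is not modified."""
    grad = [0.0] * len(w)
    for emb, label in feedback:
        pred = sum(wi * xi for wi, xi in zip(w, emb))
        err = pred - label
        for i, xi in enumerate(emb):
            # Gradient of mean squared error w.r.t. w[i]
            grad[i] += 2.0 * err * xi / len(feedback)
    return [wi - lr * g for wi, g in zip(w, grad)]
```

Meta-training optimizes the initialization `w` so that this single step already yields a good query-specific ranker.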
Fine-tune a re-ranker per query using only 4–16 feedback documents; no large labeled dataset required.
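Per-query fine-tuning can be sketched with a linear scorer standing in for the Cross-Encoder: a few epochs of gradient descent on only the handful of labeled feedback docs. The logistic loss, learning rate, and epoch count here are illustrative assumptions.

```python
import math

def finetune_per_query(w, feedback, lr=0.5, epochs=20):
    """Per-query fine-tuning sketch: SGD with logistic loss on a linear
    scorer over doc embeddings, trained only on the 4-16 labeled
    feedback docs for this query (labels are 1 = relevant, 0 = not).

    Returns the fine-tuned, query-specific weight vector."""
    for _ in range(epochs):
        for emb, label in feedback:
            z = sum(wi * xi for wi, xi in zip(w, emb))
            p = 1.0 / (1.0 + math.exp(-z))   # predicted relevance prob.
            g = p - label                     # d(log-loss)/dz
            w = [wi - lr * g * xi for wi, xi in zip(w, emb)]
    return w
```

With so few examples, overfitting the query is the point: the ranker specializes to exactly this information need.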
kNN re-ranking needs no training and runs instantly from precomputed embeddings. CE MAML requires meta-learning but achieves the best performance.
Fusing neural re-ranking with BM25 query expansion yields +5.2 nDCG@20 over the strongest individual method.
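One common way to fuse a lexical and a neural ranking is reciprocal rank fusion (RRF); the sketch below uses it for illustration, and the repo's exact fusion formula may differ.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids: each doc scores
    sum over lists of 1 / (k + rank), higher is better.
    k dampens the influence of top ranks; 60 is a conventional default."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

RRF only needs the two rank orders, so no score normalization between BM25 and the neural re-ranker is required.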
| Dataset | Domain | Description | Docs | Queries |
|---|---|---|---|---|
| Robust04 | News | News articles, complex topics | 528k | 148 |
| TREC-Covid | Biomedical | COVID-19 research paper retrieval from CORD-19 | 191k | 50 |
| TREC-News | News | Washington Post background linking task | 595k | 34 |
| Webis-Touché | Debates | Argument retrieval from web debates | 383k | 49 |
Our best model (Rank Fusion: CE MAML + Query FT & BM25-QE) fuses neural re-ranking with relevance feedback and outperforms BM25 with query expansion by +5.2 nDCG@20 averaged across four datasets.
| Method | Robust04 | TREC-Covid | TREC-News | Touché | Avg. |
|---|---|---|---|---|---|
| BM25-QE | 0.4964 | 0.6106 | 0.3924 | 0.2714 | 0.4427 |
| kNN | 0.4433 | 0.6863 | 0.3652 | 0.1749 | 0.4175 |
| CE Zero-Shot | 0.4152 | 0.7028 | 0.3143 | 0.1766 | 0.4022 |
| CE Query-FT | 0.4846 | 0.7231 | 0.3350 | 0.1981 | 0.4352 |
| CE MAML + Query FT | 0.5060 | 0.7359 | 0.3147 | 0.2235 | 0.4450 |
| RF: kNN & BM25-QE | 0.5076 | 0.7077 | **0.4123** | 0.2482 | 0.4689 |
| RF: CE MAML + QFT & BM25-QE | **0.5707** | **0.7402** | 0.4055 | **0.2727** | **0.4973** |
nDCG@20 test set results averaged over three seeds and feedback documents k ∈ {2, 4, 8}. Best per column in bold. RF = Rank Fusion, QFT = Query Fine-Tuning.
```shell
# Create and activate virtual environment
python -m venv .venv
source .venv/bin/activate

# Install dependencies
make install

# Download datasets (Robust04, TREC-Covid, TREC-News, Touché)
make download

# Index corpora in Elasticsearch (starts Docker if available)
make index
```
```shell
# kNN re-ranking
make knn args="--dataset robust04"

# Cross-Encoder with Query Fine-Tuning
make args="--dataset robust04 --num_samples 4"

# CE MAML + Query Fine-Tuning
make args="--dataset robust04 --num_samples 4 --meta_learning"

# Rank Fusion
make args="--dataset robust04 --num_samples 4 \
    --result_files results/knn.json results/ce.json"
```
Full argument reference in `inc_rel/args.py`. Requires Python 3.10+ and Elasticsearch (Docker recommended).
If you find this work useful, please cite our paper:
@inproceedings{baumgartner-etal-2022-incorporating,
title = "{{Incorporating Relevance Feedback for Information-Seeking
Retrieval using Few-Shot Document Re-Ranking}}",
author = {Baumg{\"a}rtner, Tim and
Ribeiro, Leonardo F. R. and
Reimers, Nils and
Gurevych, Iryna},
booktitle = "Proceedings of the 2022 Conference on Empirical Methods
in Natural Language Processing",
month = dec,
year = "2022",
address = "Abu Dhabi, United Arab Emirates",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2022.emnlp-main.614",
pages = "8988--9005",
}