GRITHopper
Decomposition-Free Multi-Hop Dense Retrieval

UKP Lab · TU Darmstadt

GRITHopper is a state-of-the-art multi-hop dense retriever and the first decoder-based model to perform multi-hop retrieval in an encoder-only fashion, as MDR (Xiong et al., 2021) and BeamRetriever (Zhang et al., 2024) do. Unlike previous approaches, which struggle with longer reasoning chains and out-of-distribution data, GRITHopper achieves robust performance by combining dense retrieval with generative training objectives.
How Decomposition-Free Retrieval Works
GRITHopper retrieves recursively: at each hop it embeds the query together with the passages retrieved so far, retrieves the next passage, and appends it to the context before the next hop.
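A conceptual sketch of this loop (not the released inference code): `embed` stands in for GRITHopper's single embedding forward pass and `wants_to_stop` for its generative, ReAct-style stop decision; both are hypothetical helpers here.

```python
import numpy as np

def multi_hop_retrieve(query, corpus, corpus_emb, embed, wants_to_stop, max_hops=4):
    """Decomposition-free multi-hop retrieval: re-embed the growing context at each hop."""
    context, chain = query, []
    for _ in range(max_hops):
        q = embed(context)                      # one forward pass per hop, no sub-questions
        scores = corpus_emb @ q                 # dense (dot-product) search over the corpus
        best = corpus[int(np.argmax(scores))]
        chain.append(best)
        context = context + "\n" + best         # expand the context with the retrieved passage
        if wants_to_stop(context):              # the model signals completion generatively
            break
    return chain
```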

[Figure: Hits@1 of GRITHopper-7B (multi-hop dense embedder) on the MultiHop-RAG benchmark (Tang et al., 2024).]
Open Retrieval Performance (Hits@1)

| Model | MuSiQue H1 | H2 | H3 | H4 | Avg | HoVer H1 | H2 | H3 | H4 | Avg | ExFever H1 | H2 | H3 | Avg | MoreHopQA* H1 | H2 | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GRITHopper (ours) | 94.25 | 76.13 | 55.45 | 32.10 | 76.42 | 95.86 | 91.56 | 91.69 | 92.31 | 93.88 | 96.88 | 92.20 | 85.38 | 93.02 | 96.96 | 93.92 | 95.44 |
| GRITLM-7B | 91.15 | 57.51 | 22.32 | 5.43 | 60.51 | 95.81 | 88.09 | 83.95 | 88.46 | 91.81 | 91.13 | 54.88 | 17.28 | 63.83 | 98.75 | 95.53 | 97.14 |
| BeamRetriever | 88.75 | 60.70 | 30.73 | 12.84 | 62.80 | 98.04 | 88.96 | 85.96 | 76.92 | 93.42 | - | - | - | - | 97.85 | 93.02 | 95.44 |
| MDR | 81.75 | 45.18 | - | - | 63.47 | 84.77 | 65.69 | - | - | 77.10 | 92.93 | 77.16 | - | 85.13 | 88.73 | 75.58 | 82.16 |
| Decomposition-based (LLM + retriever) |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| Qwen2.5-32B + GRITLM | 82.62 | 45.72 | 13.91 | 1.48 | 51.06 | 75.38 | 61.44 | 50.43 | 46.15 | 67.69 | 63.24 | 29.88 | 11.93 | 40.90 | 96.24 | 55.19 | 75.72 |
| GPT-4o + GRITLM | 81.96 | 48.53 | 13.39 | 1.98 | 51.81 | - | - | - | - | - | - | - | - | - | - | - | - |
*MoreHopQA is a zero-shot (out-of-distribution) benchmark. H1-H4 = hop depth. MultiHop-RAG results are shown in the figure above.
Key Strengths
Encoder-Only Efficiency
Each retrieval iteration requires only a single forward pass, rather than multiple autoregressive steps.
OOD Robustness
State-of-the-art performance among decomposition-free methods on multiple out-of-distribution benchmarks.
Unified Training
Combines dense retrieval with generative objectives, exploring how post-retrieval generation loss improves dense retrieval.
Self-Stopping
Uses its generative capabilities, ReAct-style, to control its own state and decide when to stop retrieving via causal next-token prediction.
Quick Start
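A minimal usage sketch, assuming the `gritlm` package's `GritLM` wrapper and a hypothetical checkpoint identifier (check the official model card for the released name and the exact embedding instruction format):

```python
# pip install gritlm
import numpy as np
from gritlm import GritLM

# Hypothetical model id; replace with the identifier from the official model card.
model = GritLM("UKPLab/GRITHopper-7B", mode="embedding", torch_dtype="auto")

query = "Which award did the director of the film win?"
passages = [
    "The film was directed by Jane Doe.",
    "Jane Doe won the Academy Award for Best Director.",
]

q_emb = np.asarray(model.encode([query]))       # (1, dim) query embedding
p_emb = np.asarray(model.encode(passages))      # (n, dim) passage embeddings

scores = p_emb @ q_emb[0]                       # dot-product relevance scores
print(passages[int(scores.argmax())])           # top-ranked passage for the first hop
```

For subsequent hops, the retrieved passage is appended to the query before re-encoding, as in the loop sketched above.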
Training GRITHopper
GRITHopper uses a joint training objective combining contrastive learning for embedding similarity and causal language modeling for next-token prediction:
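A sketch of the objective in GRIT-style notation (the mixing weights $\lambda_{\text{rep}}, \lambda_{\text{gen}}$, the temperature $\tau$, and the similarity function are notational assumptions, not values from the paper):

$$\mathcal{L} \;=\; \lambda_{\text{rep}}\,\mathcal{L}_{\text{contrastive}} \;+\; \lambda_{\text{gen}}\,\mathcal{L}_{\text{LM}}$$

$$\mathcal{L}_{\text{contrastive}} = -\log\frac{\exp\!\big(\mathrm{sim}(q, d^{+})/\tau\big)}{\sum_{d \in \mathcal{D}}\exp\!\big(\mathrm{sim}(q, d)/\tau\big)}, \qquad \mathcal{L}_{\text{LM}} = -\sum_{t}\log p_{\theta}\big(x_{t}\mid x_{<t}\big)$$

where $q$ is the embedded query plus retrieved context, $d^{+}$ is the gold next-hop passage among candidates $\mathcal{D}$, and the language-modeling loss is restricted to post-retrieval tokens, as described next.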
Post-retrieval language modeling refers to predicting tokens that appear after the retrieval chain (e.g., the final answer). By keeping the retrieval sequence identical for both losses and only appending post-retrieval tokens to the generative objective, we ensure any performance gains come from learning what information is useful, not from extra computation or thinking tokens.
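A minimal sketch of how such a training sequence could be assembled (assumed layout, not the released training code): the retrieval chain stays intact, only the appended post-retrieval tokens receive a language-modeling target, and chain tokens are masked out of the cross-entropy.

```python
import torch

def build_lm_inputs(chain_ids: torch.Tensor, post_ids: torch.Tensor):
    """chain_ids: query + retrieved-passage tokens; post_ids: post-retrieval tokens (e.g. the answer)."""
    input_ids = torch.cat([chain_ids, post_ids])
    labels = input_ids.clone()
    labels[: chain_ids.numel()] = -100   # ignored by the cross-entropy loss (HF convention)
    return input_ids, labels
```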

| Post-retrieval objective | MuSiQue Hits@1 (distractor setting) |
|---|---|
| Answers + Reward | 82.32 |
| Answers Only | 82.08 |
| No Post-Retrieval LM | 80.78 |
| Contrastive Only | 78.02 |

| Dataset | Answers + Reward | Answers Only | No Post-Retrieval LM |
|---|---|---|---|
| MuSiQue | 76.16 | 75.95 | 75.22 |
| ExFever | 87.10 | 91.81 | 89.69 |
| HoVer | 93.34 | 94.29 | 94.36 |
| MultiHop-RAG | 51.74 | 54.03 | 51.13 |
| MoreHopQA | 96.14 | 95.80 | 94.68 |
Answer prediction always helps: Adding the final answer to the generative loss teaches the model what information is needed to solve the query, improving retrieval quality (+4.06 Hits@1 on MuSiQue).
Reward modeling trade-off: While observing causal negatives improves discrimination on handcrafted distractors (82.32 Hits@1), it overfits to these specific negatives. In open retrieval, reward modeling causes a 7.32% drop vs only 5.09% for answers-only, indicating that learning to reject specific negatives hurts generalization to unseen corpora.
Citation
```bibtex
@inproceedings{erker2026grithopper,
  title={{GRITHopper}: Decomposition-Free Multi-Hop Dense Retrieval},
  author={Erker, Justus-Jonas and Reimers, Nils and Gurevych, Iryna},
  booktitle={Proceedings of the 2026 Conference of the European Chapter of the Association for Computational Linguistics (EACL)},
  year={2026},
  url={https://arxiv.org/abs/2503.07519}
}
```