EACL 2026 Main Conference

GRITHopper

Decomposition-Free Multi-Hop Dense Retrieval

1UKP Lab · TU Darmstadt
2Cohere

GRITHopper is a state-of-the-art multi-hop dense retriever and the first decoder-based model to perform multi-hop retrieval in an encoder-only fashion, similar to MDR (Xiong et al., 2021) and BeamRetriever (Zhang et al., 2024). Unlike previous approaches that struggle with longer reasoning chains and out-of-distribution data, GRITHopper achieves robust performance by combining dense retrieval with generative training objectives.
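The decomposition-free loop can be illustrated with a toy sketch: instead of splitting the question into sub-queries, the full context (query plus all documents retrieved so far) is re-encoded each hop and matched against the corpus. The `encode` function below is a hash-seeded stand-in for the actual GRITHopper embedder, not the real model.

```python
import numpy as np

def encode(text: str, dim: int = 8) -> np.ndarray:
    """Stand-in for the GRITHopper embedder: maps any text to a unit vector.
    (Deterministic toy encoder, NOT the real model.)"""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

def multi_hop_retrieve(query: str, corpus: list[str], max_hops: int = 4) -> list[str]:
    """Decomposition-free multi-hop retrieval: re-encode the expanded
    context (query + retrieved docs) at every hop."""
    doc_matrix = np.stack([encode(d) for d in corpus])
    context, chain = query, []
    for _ in range(max_hops):
        q_vec = encode(context)                  # one forward pass per hop
        scores = doc_matrix @ q_vec              # dense similarity search
        scores[[corpus.index(d) for d in chain]] = -np.inf  # skip retrieved docs
        best = corpus[int(np.argmax(scores))]
        chain.append(best)
        context = context + " " + best           # expand context, no decomposition
    return chain
```

The key design point is that each hop costs exactly one embedding forward pass; there is no intermediate query-rewriting step.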

How Decomposition-Free Retrieval Works

GRITHopper retrieves documents recursively, expanding its context with each hop:

1. Input: a multi-hop query, e.g. "Where does the body of water by the city that shares a border with Elizabeth Berg's birthplace and Ohio River meet?"
2. Encoding: GRITHopper-7B, a multi-hop dense embedder, encodes the query together with all documents retrieved so far.
3. Search: the resulting embedding is matched against document vectors in embedding space; the top document is returned, appended to the context, and the loop repeats (up to four hops).
MultiHop-RAG Benchmark

Hits@1 on MultiHop-RAG (Tang et al., 2024), comparing GRITHopper-7B (ours) against GRITLM-7B (Muennighoff et al., 2024), BeamRetriever (Zhang et al., NAACL 2024), and the decomposition-based pipelines GPT-4o + GRITLM and Qwen2.5-32B + GRITLM.

Open Retrieval Performance (Hits@1)

| Model | MuSiQue H1 | H2 | H3 | H4 | Avg | HoVer H1 | H2 | H3 | H4 | Avg | ExFever H1 | H2 | H3 | Avg | MoreHopQA* H1 | H2 | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GRITHopper (ours) | 94.25 | 76.13 | 55.45 | 32.10 | 76.42 | 95.86 | 91.56 | 91.69 | 92.31 | 93.88 | 96.88 | 92.20 | 85.38 | 93.02 | 96.96 | 93.92 | 95.44 |
| GRITLM-7B | 91.15 | 57.51 | 22.32 | 5.43 | 60.51 | 95.81 | 88.09 | 83.95 | 88.46 | 91.81 | 91.13 | 54.88 | 17.28 | 63.83 | 98.75 | 95.53 | 97.14 |
| BeamRetriever | 88.75 | 60.70 | 30.73 | 12.84 | 62.80 | 98.04 | 88.96 | 85.96 | 76.92 | 93.42 | - | - | - | - | 97.85 | 93.02 | 95.44 |
| MDR | 81.75 | 45.18 | - | - | 63.47 | 84.77 | 65.69 | - | - | 77.10 | 92.93 | 77.16 | - | 85.13 | 88.73 | 75.58 | 82.16 |
| Decomposition-based (LLM + retriever) | | | | | | | | | | | | | | | | | |
| Qwen2.5-32B + GRITLM | 82.62 | 45.72 | 13.91 | 1.48 | 51.06 | 75.38 | 61.44 | 50.43 | 46.15 | 67.69 | 63.24 | 29.88 | 11.93 | 40.90 | 96.24 | 55.19 | 75.72 |
| GPT-4o + GRITLM | 81.96 | 48.53 | 13.39 | 1.98 | 51.81 | - | - | - | - | - | - | - | - | - | - | - | - |

*MoreHopQA is a zero-shot (out-of-distribution) benchmark. H1-H4 = Hop depth. MultiHop-RAG results shown in graph above.

Key Strengths

Encoder-Only Efficiency

Each retrieval iteration requires only a single forward pass, rather than multiple autoregressive steps.

OOD Robustness

State-of-the-art performance compared to other decomposition-free methods on multiple out-of-distribution benchmarks.

Unified Training

Combines dense retrieval with generative objectives, exploring how post-retrieval generation loss improves dense retrieval.

Self-Stopping

Utilizes generative capabilities via ReAct to control its own state, stopping itself through causal next-token prediction.
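The self-stopping behavior can be sketched as a control loop in which the model's greedy next-token prediction decides whether to keep retrieving or to stop and answer. The action strings and the stopping rule below are illustrative stand-ins, not GRITHopper's exact ReAct vocabulary or decision mechanism.

```python
# Toy sketch of ReAct-style self-stopping (token names are illustrative,
# not the exact strings GRITHopper emits).
def next_action(context: str) -> str:
    """Stand-in for greedy next-token prediction: decide whether to keep
    retrieving. Here the toy rule stops once two evidence docs are present."""
    return "Answer" if context.count("[DOC]") >= 2 else "Retrieve"

def controlled_retrieve(query: str, docs: list[str], max_hops: int = 4) -> list[str]:
    """Retrieval loop whose termination is controlled by the model itself."""
    context, chain = query, []
    for hop in range(max_hops):
        if next_action(context) == "Answer":   # model stops itself
            break
        doc = docs[hop]                        # placeholder for dense search
        chain.append(doc)
        context += " [DOC] " + doc
    return chain
```

Because the stop decision is a single causal next-token prediction on the same model, no separate controller or query-decomposition LLM is needed.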

Quick Start
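A minimal sketch of how inputs might be prepared for iterative retrieval. The template and the commented model-loading call are assumptions (the model id is a placeholder); consult the official repository for the actual interface and prompt format.

```python
# Hypothetical usage with a GRITLM-style embedding API (package and model id
# are placeholders, not confirmed by this page):
#
#   from gritlm import GritLM
#   model = GritLM("<GRITHopper-7B model id>", mode="embedding")
#   vec = model.encode([build_hop_input(query, retrieved_so_far)])

def build_hop_input(query: str, retrieved: list[str]) -> str:
    """Format the query plus previously retrieved passages as the next
    retrieval input (template illustrative, not the exact training format)."""
    parts = [f"Query: {query}"]
    parts += [f"Document {i}: {doc}" for i, doc in enumerate(retrieved, start=1)]
    return "\n".join(parts)
```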

Training GRITHopper

GRITHopper uses a joint training objective combining contrastive learning for embedding similarity and causal language modeling for next-token prediction:

L = L_rep + L_gen

Post-retrieval language modeling refers to predicting tokens that appear after the retrieval chain (e.g., the final answer). By keeping the retrieval sequence identical for both losses and only appending post-retrieval tokens to the generative objective, we ensure any performance gains come from learning what information is useful, not from extra computation or thinking tokens.
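The two terms of the joint objective can be sketched on toy tensors: an InfoNCE-style contrastive loss for L_rep and a cross-entropy next-token loss over the post-retrieval tokens for L_gen. Temperature, dimensions, and vocabulary size below are arbitrary illustration choices, not the paper's hyperparameters.

```python
import numpy as np

def info_nce(anchor, positive, negatives, tau=0.05):
    """Contrastive representation loss L_rep (cosine similarities / temperature)."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    sims = np.array([cos(anchor, positive)] + [cos(anchor, n) for n in negatives]) / tau
    sims -= sims.max()                      # numerical stability
    return -np.log(np.exp(sims[0]) / np.exp(sims).sum())

def causal_lm_loss(logits, targets):
    """Generative loss L_gen: mean cross-entropy over post-retrieval tokens only."""
    logits = logits - logits.max(axis=-1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

# Joint objective L = L_rep + L_gen on random toy tensors
rng = np.random.default_rng(0)
anchor, pos = rng.standard_normal(8), rng.standard_normal(8)
negs = rng.standard_normal((4, 8))
logits = rng.standard_normal((5, 32))       # 5 post-retrieval tokens, vocab 32
targets = rng.integers(0, 32, 5)
total = info_nce(anchor, pos, negs) + causal_lm_loss(logits, targets)
```

Note that L_gen here is computed only on the appended post-retrieval tokens, mirroring the setup described above where the retrieval sequence itself is shared by both losses.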

Contrastive: embedding similarity loss. At each hop, the anchor (the query context Q) pulls the positive document D1 closer in embedding space and pushes the hard negative D1_N (drawn from distractors) away.

No Post-Retrieval LM: same sequence for both losses. Input Q D1 D2; output via next-token prediction. We append 'Eval: Relevant' to match sequence length with the other variants. Since it is always the same token, it provides no discriminative signal, isolating whether gains come from actual post-retrieval information rather than extra compute tokens.

+ Answer: post-retrieval answer tokens. Input Q D1 D2; the final answer is appended as a post-retrieval signal. This teaches the model what information leads to correct answers, improving retrieval.

+ Reward: causal negative observation. Input Q D1 Distractor; hard negatives are observed causally with an 'Irrelevant' label. This improves distractor discrimination but can overfit.
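The difference between the variants reduces to which tokens are appended to the shared retrieval sequence for the generative loss. A small sketch, with template strings that are illustrative rather than the exact training format:

```python
def build_target(query: str, docs: list[str], variant: str, answer: str = "") -> str:
    """Build the generative training target per ablation variant.
    The retrieval sequence (query + docs) is identical across variants;
    only the appended post-retrieval tokens differ."""
    seq = query + " " + " ".join(docs)
    if variant == "no_post":    # constant token: length-matched, no signal
        return seq + " Eval: Relevant"
    if variant == "answer":     # post-retrieval answer tokens
        return seq + " Eval: Relevant Answer: " + answer
    if variant == "reward":     # causal negative observation on a distractor
        return seq + " Eval: Irrelevant"
    raise ValueError(f"unknown variant: {variant}")
```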
Ablation Results (Hits@1)

MuSiQue Distractor Setting

| Variant | Hits@1 |
|---|---|
| Answers + Reward | 82.32 |
| Answers Only | 82.08 |
| No Post-Retrieval LM | 80.78 |
| Contrastive Only | 78.02 |

Open Retrieval (avg. 2 seeds)

| Dataset | Ans+Rew | Ans | No Post |
|---|---|---|---|
| MuSiQue | 76.16 | 75.95 | 75.22 |
| ExFever | 87.10 | 91.81 | 89.69 |
| HoVer | 93.34 | 94.29 | 94.36 |
| MultiHop-RAG | 51.74 | 54.03 | 51.13 |
| MoreHopQA | 96.14 | 95.80 | 94.68 |
Key Findings

Answer prediction always helps: Adding the final answer to the generative loss teaches the model what information is needed to solve the query, improving retrieval quality (+4.06 Hits@1 on MuSiQue).

Reward modeling trade-off: While observing causal negatives improves discrimination on handcrafted distractors (82.32 Hits@1), it overfits to those specific negatives. In open retrieval, reward modeling causes a 7.32% drop versus only 5.09% for answers-only, indicating that learning to reject specific negatives hurts generalization to unseen corpora.

Citation

BibTeX
@inproceedings{erker2026grithopper,
  title={{GRITHopper}: Decomposition-Free Multi-Hop Dense Retrieval},
  author={Erker, Justus-Jonas and Reimers, Nils and Gurevych, Iryna},
  booktitle={Proceedings of the 2026 Conference of the European Chapter
             of the Association for Computational Linguistics (EACL)},
  year={2026},
  url={https://arxiv.org/abs/2503.07519}
}