In-depth Research Impact Summarization through Fine-Grained Temporal Citation Analysis

¹UKP Lab, TU Darmstadt · ²HUJI · ³Allen Institute for AI
2025

Abstract

Understanding the impact of scientific publications is crucial for identifying breakthroughs and guiding future research. Traditional metrics based on citation counts often miss the nuanced ways a paper contributes to its field. In this work, we propose a new task: generating nuanced, expressive, and time-aware impact summaries that capture both praise (confirmation citations) and critique (correction citations) through the evolution of fine-grained citation intents. We introduce an evaluation framework tailored to this task, showing moderate to strong human correlation on subjective metrics such as insightfulness. Expert feedback from professors reveals a strong interest in these summaries and suggests future improvements.

We need better ways to describe scientific impact!

  • 📊 Citation counts and other quantitative metrics are a common proxy for research impact.
  • ⚠️ But they offer only a shallow view — they don't explain how a paper influenced later work.
  • ❓ A raw count doesn't tell us if the paper was:
    • 📌 Foundational
    • 🔁 Extended
    • 🧐 Refined
    • 💬 Just mentioned in passing
  • 🔍 Truly understanding impact requires examining the context of citations.
  • 📝 Citation context = text surrounding a citation
  • 💡 This means analyzing how a paper's ideas are discussed, applied, and evolved over time.
  • 🚫 Manual tracking of this across large, diverse literature is not practical.


Citation profiling

Detecting impact-revealing citations with in-context learning

Adding few-shot examples helps with detecting impact-revealing citations (i.e., citations suitable for the task of impact summarization), reaching 90% recall.
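The few-shot setup can be sketched roughly as below. This is a hypothetical illustration, not the paper's actual prompt: the example contexts, labels, and `build_prompt` helper are made up for exposition, with `[CIT]` standing in for the citation marker.

```python
# Illustrative few-shot prompt for impact-revealing citation detection.
# The labeled examples below are invented for demonstration purposes.
EXAMPLES = [
    ("We build directly on the framework of [CIT], extending it to "
     "multilingual settings.", "impact-revealing"),
    ("Prior work has studied citation analysis [CIT].", "other (incidental)"),
]

def build_prompt(target_context: str) -> str:
    """Assemble a few-shot prompt: task instruction, labeled examples,
    then the unlabeled target context for the model to classify."""
    lines = ["Classify each citation context as 'impact-revealing' "
             "or 'other (incidental)'.\n"]
    for text, label in EXAMPLES:
        lines.append(f"Context: {text}\nLabel: {label}\n")
    lines.append(f"Context: {target_context}\nLabel:")
    return "\n".join(lines)

prompt = build_prompt("Our method corrects an error in the proof of [CIT].")
```

The prompt string would then be sent to an LLM, whose completion after the final `Label:` gives the predicted intent.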


🗂️ A new dataset: 4k citation contexts with classified fine-grained intents


Comparison with existing intent classifiers


How does citation intent vary across research fields?

Legend: impact-revealing vs. other (incidental) citations

📚 All: 70k citation contexts in total

🕒 Recent (vs. older): published in the last 5 years

🔝 Highly cited (vs. less cited): top 20% by citation count
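The three slices above can be expressed as simple filters over paper metadata. A minimal sketch, assuming a hypothetical list of paper records (the fields and counts are made up; only the thresholds follow the text):

```python
# Partition papers into the comparison slices described above:
# "recent" = published in the last 5 years, "highly cited" = top 20%
# by citation count. Sample records are invented for illustration.
CURRENT_YEAR = 2025

papers = [
    {"id": "p1", "year": 2024, "citations": 350},
    {"id": "p2", "year": 2016, "citations": 40},
    {"id": "p3", "year": 2022, "citations": 12},
    {"id": "p4", "year": 2010, "citations": 900},
    {"id": "p5", "year": 2023, "citations": 5},
]

# Recent slice: published within the last 5 years.
recent = [p for p in papers if CURRENT_YEAR - p["year"] <= 5]

# Highly-cited slice: sort descending and keep the top fifth.
by_citations = sorted(papers, key=lambda p: p["citations"], reverse=True)
cutoff = max(1, len(papers) // 5)
highly_cited = by_citations[:cutoff]
```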

🧠 Psychology leans toward impact-revealing citations

🗣️ Psychology citations often use a more subjective tone, e.g., “controversial assumptions”, “researchers disagree”

💻 Computer Science (CS) skews toward other (incidental) citations, except in recent papers, likely reflecting the novelty and immediate relevance of current AI research.


Generating impact summaries

Ablations


Faith: faithfulness, Cov: coverage, Cyc: citation year compliance, Insi: insightfulness, Trend: trend awareness, Spec: specificity.

Human evaluation

Researchers found our summaries to be insightful and relevant.

9 professors (gender: 5 male, 4 female; country: 2 DE, 2 BR, 1 US, 1 CZ, 1 AL; research focus: AI, NLP, knowledge graphs, psychology, computational social science) evaluated summaries about their own papers.

🎯 63% preference on relevance
(Which summary better reflects the actual impact?)

💡 75% preference on insightfulness
(Which summary has information you didn't already know about how your paper was used?)


The summaries surfaced information about how the professors' papers were used that they had not previously been aware of.


Perceived usefulness results for: (a) all papers, (b) papers in the top 10% for number of impact-revealing citations, (c) papers in the top 10% for total number of citations.

BibTeX

@misc{arnaout2025indepthresearchimpactsummarization,
      title={In-depth Research Impact Summarization through Fine-Grained Temporal Citation Analysis}, 
      author={Hiba Arnaout and Noy Sternlicht and Tom Hope and Iryna Gurevych},
      year={2025},
      eprint={2505.14838},
      archivePrefix={arXiv},
      primaryClass={cs.DL},
      url={https://arxiv.org/abs/2505.14838}, 
}