The Nature of NLP: Analyzing Contributions in NLP Papers

1UKP Lab, Technische Universität Darmstadt
2IT:U - Interdisciplinary Transformation University
3IBM Research Europe
4National Research Council Canada
The 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025)
TL;DR
  • New 8-label taxonomy for what counts as a contribution in NLP (datasets, methods, tasks ↔ knowledge about each, plus language & people).
  • NLPContributions: 1,995 abstracts manually annotated for those labels.
  • SciBERT-based classifier that scales the taxonomy to ~29k ACL Anthology papers (1974–Feb 2024).
  • Findings: early NLP = language & people-centric; 1990s–2010s = method/task/dataset boom; post-2020 = renaissance of human & linguistic studies alongside the LLM frenzy.
  • All code + data openly released: a playground for meta-research.

Abstract

Natural Language Processing (NLP) is an established and dynamic field. Despite this, what constitutes NLP research remains debated. In this work, we address the question by quantitatively examining NLP research papers. We propose a taxonomy of research contributions and introduce NLPContributions, a dataset of nearly 2k NLP research paper abstracts, carefully annotated to identify scientific contributions and classify their types according to this taxonomy. We also introduce a novel task of automatically identifying contribution statements and classifying their types from research papers. We present experimental results for this task and apply our model to ∼29k NLP research papers to analyze their contributions, aiding in the understanding of the nature of NLP research. We show that NLP research has taken a winding path, with the focus on language and human-centric studies being prominent in the 1970s and 80s, tapering off in the 1990s and 2000s, and starting to rise again since the late 2010s. Alongside this revival, we observe a steady rise in dataset and methodological contributions since the 1990s, such that today, on average, individual NLP papers contribute in more ways than ever before. Our dataset and analyses offer a powerful lens for tracing research trends and offer potential for generating informed, data-driven literature surveys.

Why this paper matters

NLP is maturing at breakneck speed, yet we still debate what counts as an NLP paper. We tackle that question head-on with a data-driven approach:

  1. Define a contribution taxonomy broad enough for all of NLP yet precise enough for automatic labeling.
  2. Build the first gold dataset that tags sentences, not whole papers.
  3. Automate contribution mining and push it to a historical corpus.
  4. Analyze 50 years of ACL output to surface field-level trends we’ve only anecdotally sensed.

For anyone working on scientometrics, survey automation, benchmarking, or simply curious about our field’s trajectory, this is gold.

The Taxonomy in 60 Seconds

Taxonomy Overview

Type: Knowledge
  • k-dataset: Describes new knowledge about datasets, such as their new properties or characteristics. Example: "Furthermore, our thorough analysis demonstrates the average distance between aspect and opinion words are shortened by at least 19% on the standard SemEval Restaurant14 dataset." (Zhou et al., 2021)
  • k-language: Presents new knowledge about language, such as a new property or characteristic of language. Example: "In modern Chinese articles or conversations, it is very popular to involve a few English words, especially in emails and Internet literature." (Zhao et al., 2012)
  • k-method: Describes new knowledge or analysis about NLP models or methods (which predominantly draw from machine learning). Example: "Different generative processes identify specific failure modes of the underlying model." (Deng et al., 2022)
  • k-people: Presents new knowledge about people, humankind, society, or human civilization. Example: "Combating the outcomes of this infodemic is not only a question of identifying false claims, but also reasoning about the decisions individuals make." (Pacheco et al., 2022)
  • k-task: Describes new knowledge about NLP tasks. Example: "We show that these bilingual features outperform the monolingual features used in prior work for the task of classifying translation direction." (Eetemadi and Toutanova, 2014)

Type: Artifact
  • a-dataset: Introduces a new NLP dataset (i.e., textual resources such as corpora or lexicons). Example: "We present a new corpus of Weibo messages annotated for both name and nominal mentions." (Peng and Dredze, 2015)
  • a-method: Introduces or proposes a new NLP method or model (primarily to solve NLP tasks). Example: "The paper also describes a novel method, EXEMPLAR, which adapts ideas from SRL to less costly NLP machinery, resulting in substantial gains both in efficiency and effectiveness, over binary and n-ary relation extraction tasks." (Mesquita et al., 2013)
  • a-task: Introduces or proposes a new NLP task (i.e., a well-defined NLP problem). Example: "We formulate a task that represents a hybrid of slot-filling information extraction and named entity recognition and annotate data from four different forums." (Durrett et al., 2017)

A single sentence can carry multiple labels (≈58% actually do), capturing the multifaceted nature of modern papers.

Building NLPContributions & Scaling Up

  • 1,995 abstracts (1974-2024) → 5,890 contribution sentences.
  • Two expert annotators, κ ≈ 0.71 (solid for eight labels).
  • Public release: data + Label-Studio config + guidelines.
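The κ ≈ 0.71 above is inter-annotator agreement; per label, with two annotators, this reduces to Cohen's kappa, which corrects raw agreement for chance. A minimal sketch of the computation (the annotation vectors below are hypothetical, not drawn from the released data):

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa for two annotators' label sequences."""
    assert len(a) == len(b) and a
    n = len(a)
    # Observed agreement: fraction of items both annotators labeled the same.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of each annotator's marginal label rates.
    ca, cb = Counter(a), Counter(b)
    expected = sum((ca[l] / n) * (cb[l] / n) for l in set(a) | set(b))
    return (observed - expected) / (1 - expected)

# Hypothetical per-sentence decisions for one label (1 = contribution, 0 = not)
ann1 = [1, 0, 1, 1, 0, 1, 0, 0]
ann2 = [1, 0, 1, 0, 0, 1, 0, 1]
print(cohens_kappa(ann1, ann2))  # → 0.5 (6/8 observed agreement, 0.5 by chance)
```

Raw agreement here is 75%, yet kappa is only 0.5, which is why kappa is the standard report for annotation quality.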

A fine-tuned SciBERT classifier (macro-F1 ≈ 0.80) then tags 28,937 ACL Anthology papers.
Now we can ask field-level questions without months of manual coding.
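Because a sentence can carry several of the eight labels at once, the classifier is naturally framed as multi-label. A minimal sketch of the standard decision rule, per-label sigmoid scores thresholded independently; the logits and the 0.5 threshold are illustrative assumptions, not the paper's reported configuration:

```python
import math

# The eight taxonomy labels, in a fixed order matching the model's output head.
LABELS = ["k-dataset", "k-language", "k-method", "k-people",
          "k-task", "a-dataset", "a-method", "a-task"]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict_labels(logits, threshold=0.5):
    """Multi-label decision: keep every label whose sigmoid score clears the threshold."""
    return [lab for lab, z in zip(LABELS, logits) if sigmoid(z) >= threshold]

# Hypothetical per-label logits for one abstract sentence
logits = [-2.1, 0.3, 1.8, -1.5, -0.4, -3.0, 2.4, -1.1]
print(predict_labels(logits))  # → ['k-language', 'k-method', 'a-method']
```

Unlike softmax classification, the sigmoid scores are independent, so zero, one, or several labels can fire for the same sentence, exactly what the ≈58% multi-label rate requires.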

What the numbers say

1. The long arc of NLP

[Figures: temporal label distributions for Knowledge and Artifact contribution types]
  • 1970s-80s: heavy on language & people studies (think discourse, dialogue).
  • 1990s-2000s: statistical turn → datasets & methods skyrocket; human-centric work fades.
  • Late 2010s onward: LLM era revives interest in sociolinguistics & bias → k-language and k-people trend up again, but methods still dominate.

2. Venue personalities (or lack thereof)

[Figure: contribution profiles by venue]
  • Classic ACL/EMNLP/NAACL now share nearly identical contribution profiles.
  • Computational Linguistics (journal) keeps a distinctive linguistic/human flavor.
  • Newcomers (Findings, AACL) mirror ACL from day 1 — fast institutional convergence.

3. Citations & incentives

  • Methodological papers tend to attract more citations; dataset papers fewer.
  • k-language and k-people papers are cited more than a-methods, but less than a-datasets.
  • k-dataset and k-task papers are the least cited, but still important for field growth.

Open questions ripe for follow-up

  • Beyond ACL: does the arXiv NLP flood follow the same patterns?
  • Full-paper granularity: how do contributions differ between abstract and body?
  • Cross-discipline transfer: can the taxonomy bootstrap similar studies in CV or IR?

Takeaway

The Nature of NLP seeds a new branch of meta-science for our field: evidence-based self-reflection. With the dataset and code released, the barrier to entry is minimal — the next wave of analyses is yours to run.
“Individual NLP papers contribute in more ways than ever before.”
— Let’s keep that diversity alive, measure it, and learn from it.

BibTeX

@inproceedings{pramanick-etal-2025-nlpcontributions,
  title     = {The Nature of NLP: Analyzing Contributions in NLP Papers},
  author    = {Pramanick, Aniket and Hou, Yufang and Mohammad, Saif and Gurevych, Iryna},
  booktitle = {The 63rd Annual Meeting of the Association for Computational Linguistics},
  year      = {2025},
  url       = {https://arxiv.org/abs/2409.19505}
}