M4FC: a Multimodal, Multilingual, Multicultural, Multitask fact-checking dataset

1MBZUAI  2UKP Lab, TU Darmstadt   3Electrical Engineering, KU Leuven 
4Computer Science, KU Leuven 

A novel real-world dataset for multimodal fact-checking

🕵️ We introduce M4FC, a novel real-world multimodal fact-checking dataset comprising 4,982 images and 6,980 claims.
Each multimodal claim is available in one or two out of ten languages, and the dataset represents a diverse range of cultures from around the world.
🤖 M4FC is the first dataset to provide labels for six real-world multimodal fact-checking tasks.

An instance from M4FC with labels for all six tasks.

Abstract

Existing real-world datasets for multimodal automated fact-checking have multiple limitations: they contain few instances, focus on only one or two languages and tasks, suffer from evidence leakage, or depend on external sets of news articles for sourcing true claims. To address these shortcomings, we introduce M4FC, a new real-world dataset comprising 4,982 images paired with 6,980 claims. The images, verified by professional fact-checkers from 22 organizations, represent diverse cultural and geographic contexts. Each claim is available in one or two out of ten languages. M4FC spans six multimodal fact-checking tasks: visual claim extraction, claimant intent prediction, fake detection, image contextualization, location verification, and verdict prediction. We provide baseline results for all tasks and analyze how combining intermediate tasks influences downstream verdict prediction performance.

New tasks

M4FC includes two new tasks: visual claim extraction and location verification.
Visual claim extraction aims to extract a verifiable claim from a multimodal social media post.
Location verification aims to validate a candidate location for an image by comparing it with satellite imagery and maps.
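To make the six tasks concrete, here is a minimal sketch of what a single M4FC instance with all task labels might look like. The field names and values below are illustrative assumptions for exposition only, not the dataset's actual schema.

```python
# Hypothetical sketch of one M4FC instance carrying labels for all six tasks.
# All field names and values are illustrative assumptions, not the real schema.
instance = {
    "image_id": "img_00001",
    "languages": ["en", "fr"],                     # one or two of ten languages
    "visual_claim": "Photo shows recent flooding in the city center.",  # visual claim extraction
    "claimant_intent": "mislead",                  # claimant intent prediction
    "is_fake": False,                              # fake detection
    "context": {"date": "2023-05-01", "event": "storm aftermath"},      # image contextualization
    "location": {"candidate": "Jakarta", "verified": True},             # location verification
    "verdict": "miscaptioned",                     # verdict prediction
}

def task_labels(inst):
    """Collect the six task-specific labels from one instance."""
    return (
        inst["visual_claim"],
        inst["claimant_intent"],
        inst["is_fake"],
        inst["context"],
        inst["location"],
        inst["verdict"],
    )
```

A sketch like this also illustrates why the tasks compose: outputs of the intermediate tasks (claim, intent, context, location) can be fed as inputs to verdict prediction.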

Experiments

💡 We provide several results in the paper, including an analysis of how the outputs of intermediate tasks impact downstream verdict prediction performance. Learn more by reading our paper, and stay tuned for future work!

BibTeX

@article{geng2025m4fc,
  title={M4FC: a Multimodal, Multilingual, Multicultural, Multitask real-world Fact-Checking Dataset},
  author={Geng, Jiahui and Tonglet, Jonathan and Gurevych, Iryna},
  journal={arXiv preprint arXiv:2510.23508},
  year={2025},
  url={https://arxiv.org/abs/2510.23508},
  doi={10.48550/arXiv.2510.23508}
}