We assess the vulnerability of multimodal large language models (MLLMs) to misleading visualizations: charts that distort the underlying data with deceptive techniques such as truncated or inverted axes, leading readers to draw inaccurate conclusions that may support disinformation. While MLLMs have improved steadily on standard chart-reasoning benchmarks, our analysis reveals that their question-answering accuracy on misleading visualizations remains close to the random baseline. To mitigate this vulnerability, we introduce six inference-time methods that improve the question-answering performance of MLLMs on misleading visualizations while preserving their higher accuracy on non-misleading ones. The most effective approach consists of (1) extracting the underlying data table and (2) using a text-only large language model to answer the question based on that table. This method improves question-answering performance on misleading visualizations by 15.4 to 19.6 percentage points.
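As a concrete illustration of the two-step approach described above, here is a minimal sketch of the pipeline, assuming an OpenAI-compatible chat API. The model names, prompts, and input file are illustrative placeholders, not the paper's exact setup; the key idea is that step 2 sees only the extracted table, never the (potentially misleading) chart itself.

```python
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def extract_table(image_path: str) -> str:
    """Step 1: ask a vision-capable MLLM to transcribe the chart's
    underlying data table as plain text."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder MLLM, not the paper's specific model
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Extract the data table underlying this chart. "
                         "Return only the table as plain text."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content


def answer_from_table(table: str, question: str) -> str:
    """Step 2: answer the question from the table alone, so axis
    distortions in the rendered chart cannot bias the answer."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder text-only reader
        messages=[{
            "role": "user",
            "content": f"Data table:\n{table}\n\nQuestion: {question}\n"
                       "Answer using only the table above.",
        }],
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    table = extract_table("chart.png")  # hypothetical input chart
    print(answer_from_table(table, "Which category has the highest value?"))
```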
@article{tonglet2025misleadingvisualizations,
  title={Protecting multimodal LLMs against misleading visualizations},
  author={Tonglet, Jonathan and Tuytelaars, Tinne and Moens, Marie-Francine and Gurevych, Iryna},
  journal={arXiv preprint arXiv:2502.20503},
  year={2025},
  url={https://arxiv.org/abs/2502.20503},
  doi={10.48550/arXiv.2502.20503}
}