Protecting multimodal large language models against misleading visualizations

UKP Lab, TU Darmstadt · Electrical Engineering, KU Leuven · Computer Science, KU Leuven

What are misleading visualizations?

Charts are useful tools for communicating data insights. However, deceptive design patterns, such as truncated, inverted, or dual axes, can lead readers to inaccurate interpretations of the data. Such misleading visualizations have been used to propagate and increase belief in misinformation during crises, and they are effective at deceiving human readers.
What about multimodal large language models (MLLMs)? Are they vulnerable too? Yes! 😧 Their chart question-answering accuracy drops to the level of the random baseline, up to 65.5 percentage points lower than their accuracy on the ChartQA benchmark.

Two examples of misleading visualizations with QA pairs.

Abstract

We assess the vulnerability of multimodal large language models to misleading visualizations: charts that distort the underlying data using techniques such as truncated or inverted axes, leading readers to draw inaccurate conclusions that may support misinformation or conspiracy theories. Our analysis shows that these distortions severely harm multimodal large language models, reducing their question-answering accuracy to the level of the random baseline. To mitigate this vulnerability, we introduce six inference-time methods that improve the performance of MLLMs on misleading visualizations while preserving their accuracy on non-misleading ones. The most effective approach involves (1) extracting the underlying data table and (2) using a text-only large language model to answer questions based on the table. This method improves performance on misleading visualizations by 15.4 to 19.6 percentage points.

MLLMs are vulnerable to misleading visualizations

We compare the chart question-answering accuracy of 13 MLLMs on three datasets: (1) a collection of misleading visualizations, (2) a collection of non-misleading visualizations, and (3) the reference ChartQA benchmark, which contains non-misleading visualizations. MLLMs perform much worse on misleading visualizations than on non-misleading ones; in fact, on average they perform no better than random.
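
As a rough illustration of the evaluation protocol, the sketch below scores a chart-QA model on each dataset split using a ChartQA-style relaxed-accuracy metric (numeric answers within 5% of the gold value count as correct). The metric details, model interface, and dataset loading here are our own assumptions for illustration, not the exact setup of the paper.

from typing import Callable, List, Tuple

def relaxed_match(pred: str, gold: str, tol: float = 0.05) -> bool:
    # ChartQA-style relaxed accuracy: numeric predictions within 5% of the gold value count as correct;
    # non-numeric answers fall back to case-insensitive exact match.
    try:
        p, g = float(pred), float(gold)
        return p == g if g == 0 else abs(p - g) <= tol * abs(g)
    except ValueError:
        return pred.strip().lower() == gold.strip().lower()

def qa_accuracy(answer_fn: Callable[[str, str], str],
                examples: List[Tuple[str, str, str]]) -> float:
    # examples: (chart_image_path, question, gold_answer) triples for one dataset split.
    correct = sum(relaxed_match(answer_fn(image, question), gold)
                  for image, question, gold in examples)
    return correct / len(examples)

# answer_fn is any callable wrapping an MLLM (e.g. a locally served vision-language model).
# Comparing splits then amounts to:
#   for name, split in [("misleading", misleading), ("non-misleading", non_misleading), ("ChartQA", chartqa)]:
#       print(name, qa_accuracy(answer_fn, split))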

How to mitigate this vulnerability?

We propose six inference-time correction methods that mitigate the negative effects of misleading visualizations while preserving high performance on non-misleading ones.

Table-based QA is the best correction method

Among all correction methods, the most promising is to extract the underlying table with an MLLM and then provide the table, without the chart image, to a text-only LLM, framing the task as Table-QA. However, gains are modest (roughly 15 to 20 percentage points) and depend on the quality of table extraction, highlighting the need for further research! Learn more by reading our paper.
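
A minimal sketch of this two-stage pipeline is shown below, using the OpenAI Python client as a stand-in for both the extraction MLLM and the text-only reader. The model names and prompts are placeholders for illustration, not the configuration used in the paper.

import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set; any MLLM/LLM pair could be substituted

def encode_image(path: str) -> str:
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

def extract_table(chart_path: str) -> str:
    # Stage 1: ask a multimodal model to transcribe the chart's underlying data table.
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Extract the underlying data table of this chart. Output only a markdown table."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{encode_image(chart_path)}"}},
            ],
        }],
    )
    return response.choices[0].message.content

def answer_from_table(table: str, question: str) -> str:
    # Stage 2: answer from the extracted table only (no chart image), framed as Table-QA.
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder text-only reader
        messages=[{
            "role": "user",
            "content": f"Answer the question using only this table.\n\n{table}\n\nQuestion: {question}\nAnswer concisely.",
        }],
    )
    return response.choices[0].message.content

# Usage: answer_from_table(extract_table("chart.png"), "Which country had the highest value in 2020?")

Because the reader never sees the chart image, deceptive visual design (truncated or inverted axes, dual axes) cannot bias the answer; the trade-off is that any error in the extracted table propagates to the final answer.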

BibTeX

@article{tonglet2025misleadingvisualizations,
  title={Protecting multimodal LLMs against misleading visualizations},
  author={Tonglet, Jonathan and Tuytelaars, Tinne and Moens, Marie-Francine and Gurevych, Iryna},
  journal={arXiv preprint arXiv:2502.20503},
  year={2025},
  url={https://arxiv.org/abs/2502.20503},
  doi={10.48550/arXiv.2502.20503}
}