Protecting multimodal large language models against misleading visualizations

UKP Lab, TU Darmstadt · Electrical Engineering, KU Leuven · Computer Science, KU Leuven

What are misleading visualizations?

Charts are useful tools for communicating data insights. However, deceptive design patterns, such as truncated, inverted, or dual axes, can lead readers to inaccurate interpretations of the underlying data. Such misleading visualizations have been used to propagate and increase belief in disinformation during crises, and they are effective at deceiving humans.
What about multimodal large language models (MLLMs)? Are they vulnerable too? Yes! 😧 Their chart QA performance drops to the level of the random baseline, up to 65.5 percentage points below their performance on the ChartQA benchmark.

Three examples of misleading visualizations with QA pairs.

Abstract

We assess the vulnerability of multimodal large language models (MLLMs) to misleading visualizations, charts that distort the underlying data table using deceptive techniques such as truncated or inverted axes, leading readers to draw inaccurate conclusions that may support disinformation. While MLLMs have shown steady improvement on standard chart reasoning benchmarks, our analysis reveals that their performance on misleading visualizations remains close to the level of the random baseline. To mitigate this vulnerability, we introduce six inference-time methods to improve the question-answering performance of MLLMs on misleading visualizations while preserving their higher accuracy on non-misleading ones. The most effective approach consists of (1) extracting the underlying data table and (2) using a text-only large language model to answer the question based on the table. This method improves question-answering performance on misleading visualizations by 15.4 to 19.6 percentage points.

MLLMs are vulnerable to misleading visualizations

We compare the chart question-answering accuracy of 16 MLLMs on three datasets: (1) a collection of misleading visualizations, (2) a collection of non-misleading visualizations, and (3) the reference ChartQA benchmark, which contains non-misleading visualizations. MLLMs perform much worse on misleading visualizations than on non-misleading ones; on average, they do not perform better than the random baseline.
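
For concreteness, below is a minimal Python sketch of this accuracy comparison. The dataset format, the exact-match scoring rule, and the function names are illustrative assumptions, not the evaluation code used in the paper.

import random

def accuracy(answer_fn, dataset):
    """dataset: list of (chart_image, question, gold_answer, options) items (assumed format)."""
    correct = sum(
        answer_fn(image, question, options) == gold
        for image, question, gold, options in dataset
    )
    return correct / len(dataset)

def random_baseline(image, question, options):
    # Guess uniformly among the candidate answers for this question.
    return random.choice(options)

# Compare, e.g., accuracy(mllm_answer_fn, misleading_set)
# against accuracy(random_baseline, misleading_set).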

How to mitigate this vulnerability?

We propose six inference-time correction methods that mitigate the negative effects of misleading visualizations while preserving high accuracy on non-misleading ones.

Table-based QA is the best correction method

Among the six correction methods, the most effective is to extract the underlying data table with an MLLM, then provide the table, without the chart image, to a text-only LLM, framing the task as Table-QA. However, the gains remain modest (15.4 to 19.6 percentage points) and depend on the quality of the extracted tables, highlighting the need for further research. Learn more by reading our paper and stay tuned for future work!
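
As an illustration, here is a minimal Python sketch of this two-stage pipeline. The helpers query_mllm and query_llm are hypothetical placeholders for your own model clients, and the prompts are illustrative rather than the exact ones used in the paper.

EXTRACTION_PROMPT = (
    "Extract the underlying data table of this chart. "
    "Return it as a markdown table with a header row."
)

def query_mllm(image_path: str, prompt: str) -> str:
    """Placeholder: call a multimodal LLM on (chart image, prompt)."""
    raise NotImplementedError("plug in your MLLM client here")

def query_llm(prompt: str) -> str:
    """Placeholder: call a text-only LLM on the prompt."""
    raise NotImplementedError("plug in your LLM client here")

def table_based_qa(image_path: str, question: str) -> str:
    # Stage 1: recover the underlying data table from the chart image.
    table = query_mllm(image_path, EXTRACTION_PROMPT)
    # Stage 2: answer from the extracted table only; the (possibly
    # misleading) chart image is never shown to the answering model.
    qa_prompt = (
        f"Here is a data table:\n{table}\n\n"
        f"Answer the question using only this table: {question}"
    )
    return query_llm(qa_prompt)

The key design choice is that the answering model never sees the deceptive chart rendering, only the recovered data, so axis tricks cannot bias its answer.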

BibTeX

@article{tonglet2025misleadingvisualizations,
  title={Protecting multimodal LLMs against misleading visualizations},
  author={Tonglet, Jonathan and Tuytelaars, Tinne and Moens, Marie-Francine and Gurevych, Iryna},
  journal={arXiv preprint arXiv:2502.20503},
  year={2025},
  url={https://arxiv.org/abs/2502.20503},
  doi={10.48550/arXiv.2502.20503}
}