A Survey on Multimodal Reasoning
Reasoning categories
In this paper, we focus on the reasoning abilities of Multimodal Large Language Models (MLLMs). The reasoning these models perform falls under the category of informal reasoning. There are two main reasons for this. First, they articulate the steps and conclusions of the reasoning process in natural language. Second, their reasoning mechanisms tolerate a certain degree of imprecision. We concentrate on three reasoning types: deductive, abductive, and analogical reasoning. We highlight these types because they are widely applied in real-world reasoning tasks, particularly those within the scope of current MLLMs.
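To make the distinction between the three reasoning types concrete, the following toy Python sketch shows the logical form of each on symbolic facts. It is an illustration under stated assumptions, not a model of how MLLMs reason, since MLLMs reason informally in natural language. The `Rule` class, the `RULES` and `RELATIONS` data, and the functions `deduce`, `abduce`, and `analogize` are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical toy rule: if `premise` holds, then `conclusion` holds."""
    premise: str
    conclusion: str


# Illustrative rule base (assumed for this example only).
RULES = [
    Rule("raining", "ground_wet"),
    Rule("sprinkler_on", "ground_wet"),
    Rule("ground_wet", "slippery"),
]

# Illustrative relations for analogy: relation name -> set of (x, y) pairs.
RELATIONS = {
    "capital_of": {("Paris", "France"), ("Tokyo", "Japan")},
}


def deduce(facts, rules):
    """Deduction: apply rules forward from known facts until nothing new follows.
    Every derived conclusion is guaranteed by the facts and rules."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            if rule.premise in known and rule.conclusion not in known:
                known.add(rule.conclusion)
                changed = True
    return known


def abduce(observation, rules):
    """Abduction: list premises that would explain an observation.
    The results are plausible hypotheses, not guaranteed truths."""
    return sorted(rule.premise for rule in rules if rule.conclusion == observation)


def analogize(a, b, c, relations):
    """Analogy: find d such that the relation linking a to b also links c to d
    (a : b :: c : d). Returns None if no such relation or target exists."""
    relation = next(
        (name for name, pairs in relations.items() if (a, b) in pairs), None
    )
    if relation is None:
        return None
    for x, y in relations[relation]:
        if x == c:
            return y
    return None


if __name__ == "__main__":
    print(sorted(deduce({"raining"}, RULES)))
    print(abduce("ground_wet", RULES))
    print(analogize("Paris", "France", "Tokyo", RELATIONS))
```

In this sketch, deduction yields conclusions that necessarily follow from the rules, abduction yields competing hypotheses (both `raining` and `sprinkler_on` explain a wet ground), and analogy transfers a relation observed in a known pair to a new one.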
Summary of MLLMs
Conclusion and Future Directions
Reasoning ability is pivotal in the pursuit of Strong Artificial Intelligence (Strong AI) or Artificial General Intelligence (AGI), a goal that various scientific disciplines have pursued for several decades. With recent advances in both models and evaluation benchmarks, the reasoning abilities of current LLMs and MLLMs are increasingly debated. In this work, we present the different types of reasoning and discuss the models, data, and evaluation methods that existing studies use to measure and understand these abilities. We hope this survey clarifies where the field currently stands in this research direction and inspires further exploration of the reasoning abilities of future models.