Belangrijkste concepten
Visual knowledge, which encompasses visual concepts, relations, operations, and reasoning, can empower large AI models to overcome their limitations in transparency, reasoning, and catastrophic forgetting, and advance artificial intelligence closer to human-level general intelligence.
Samenvatting
The article discusses the significance of visual knowledge in the era of large AI models, also known as "big models" or "foundation models". It first provides an overview of the origins and core definitions of visual knowledge, which is a form of knowledge representation that differs from traditional symbolic and sub-symbolic approaches. Visual knowledge is grounded in cognitive psychology and involves the representation of visual concepts, relations, operations, and reasoning.
The article then reviews recent research on visual knowledge in the pre-big model era, highlighting progress and remaining challenges in areas such as visual concept modeling, visual relation understanding, visual operation generation, and visual reasoning. It notes that while some advancements have been made, numerous core issues remain challenging and underexplored.
The article then explores the prospect of visual knowledge in the big model era. It argues that visual knowledge can empower big models to overcome their limitations in transparency, reasoning, and catastrophic forgetting, and advance artificial intelligence closer to human-level general intelligence. Conversely, it also discusses how big models can boost the development of visual knowledge, given the significant challenges of establishing visual knowledge.
Specifically, the article suggests that by integrating visual knowledge into big models, these models can become more transparent, accountable, and effective in reasoning and problem-solving. Visual knowledge can provide big models with a more structured and interpretable representation of visual information, enabling them to better understand and manipulate visual concepts, relations, and operations. This, in turn, can lead to improved reasoning capabilities, better generalization, and more reliable outputs.
At the same time, the article acknowledges the significant challenges in constructing visual knowledge, and proposes that big models can aid in this endeavor. The vast scale and broad applicability of big models can facilitate the large-scale acquisition and learning of visual knowledge, overcoming the data and computational limitations that have historically hindered progress in this area.
Overall, the article highlights the synergistic potential of visual knowledge and big models, and calls for interdisciplinary collaboration to advance this promising direction of research, which can ultimately bring artificial intelligence closer to human-level general intelligence.