A Framework for Leveraging LLMs for Scene Analysis and Cognitive Processing
Journal article   Open access   Peer reviewed

Catarina Moreira, Jeffrey Cockburn and Monica S. Castelhano
Proceedings of the ACM on Computer Graphics and Interactive Techniques, Vol. 8(2), pp. 1-18
06/2025
DOI: 10.1145/3729414
URL: https://doi.org/10.1145/3729414
Published (Version of record) Open Access

Abstract

In everyday visual search tasks, humans rely on prior knowledge of object placements in scenes to efficiently locate target objects. This ability is evidenced by eye movement patterns, where individuals focus on areas that are more likely to contain the target, such as searching for a cup on a table or shoes on the floor. Building on this, we propose a new annotation pipeline that leverages these priors by extracting a knowledge graph from images based on automatically annotated objects. This knowledge graph is then used with large language models (LLMs) to predict the most likely locations of a specific target object in an image. Our approach is the first instance of using LLMs to identify relevant prior knowledge in images and to bridge the gap between human scene understanding and computational models.
Computing methodologies; Human computer interaction (HCI); Human-centered computing; Image segmentation; Knowledge representation and reasoning
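
The abstract describes the pipeline only at a high level: annotated objects, then a knowledge graph, then an LLM query about a target object. The sketch below illustrates that general idea and is not the authors' implementation. The `Box` type, the geometric rule for an "on" relation, the `tolerance` value, the example boxes, and the prompt wording are all hypothetical, and no LLM is actually called; the code stops at the text that could be passed to one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box for one annotated object (image coords, y grows downward)."""
    label: str
    x0: float
    y0: float
    x1: float
    y1: float


def horizontal_overlap(a: Box, b: Box) -> float:
    """Width of the x-range shared by two boxes (0 if they do not overlap)."""
    return max(0.0, min(a.x1, b.x1) - max(a.x0, b.x0))


def extract_triples(boxes, tolerance=10.0):
    """Derive (subject, relation, object) triples from box geometry.

    Hypothetical rule: A is 'on' B when the boxes overlap horizontally and
    A's bottom edge lies within `tolerance` pixels of B's top edge.
    """
    triples = []
    for a in boxes:
        for b in boxes:
            if a is b:
                continue
            if horizontal_overlap(a, b) > 0 and abs(a.y1 - b.y0) <= tolerance:
                triples.append((a.label, "on", b.label))
    return triples


def build_prompt(triples, target):
    """Serialize the triples as plain-text facts followed by a location question."""
    facts = "\n".join(f"- {s} {r} {o}" for s, r, o in triples)
    return (
        "Scene facts:\n"
        f"{facts}\n"
        f"Question: where in this scene is a {target} most likely to be found?"
    )


# Made-up annotations for a small indoor scene.
boxes = [
    Box("floor", 0, 400, 640, 480),
    Box("table", 100, 250, 400, 405),
    Box("lamp", 150, 150, 200, 255),
]
triples = extract_triples(boxes)
prompt = build_prompt(triples, "cup")
print(prompt)
```

Keeping the graph as plain subject-relation-object triples makes it easy to serialize into text for a language model; a richer relation set (for example "next to" or "inside") would follow the same pattern.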
