A Framework for Leveraging LLMs for Scene Analysis and Cognitive Processing

Catarina Moreira; Jeffrey Cockburn; Monica S. Castelhano

doi:10.1145/3729414

Journal article

Open access

Peer reviewed

Proceedings of the ACM on Computer Graphics and Interactive Techniques, Vol. 8(2), pp. 1-18

06/2025

https://doi.org/10.1145/3729414

Published (Version of record) Open Access

Abstract

In everyday visual search tasks, humans rely on prior knowledge of object placements in scenes to efficiently locate target objects. This ability is evidenced by eye movement patterns, where individuals focus on areas that are more likely to contain the target, such as searching for a cup on a table or shoes on the floor. Building on this, we propose a new annotation pipeline that leverages these priors by extracting a knowledge graph from images based on automatically annotated objects. This knowledge graph is then used with large language models (LLMs) to predict the most likely locations of a specific target object in an image. Our approach is the first instance of using LLMs to identify relevant prior knowledge in images and to bridge the gap between human scene understanding and computational models.
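The pipeline summarized in the abstract (annotated objects, then a knowledge graph, then an LLM query about a target object) can be illustrated with a minimal sketch. This record does not describe the authors' implementation, so everything below is an assumption made for illustration: the names `SceneObject`, `spatial_relation`, `build_knowledge_graph`, and `build_prompt` are hypothetical, the two relations ("inside" and "above") are a deliberately coarse stand-in for whatever relations the paper extracts, and no LLM is actually called; the sketch stops at producing the prompt text.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


@dataclass(frozen=True)
class SceneObject:
    """Hypothetical annotated object: a label plus a bounding box.

    The box is (x_min, y_min, x_max, y_max) in image coordinates,
    with y growing downward.
    """
    label: str
    box: Tuple[float, float, float, float]


def spatial_relation(a: SceneObject, b: SceneObject) -> Optional[str]:
    """Return a coarse relation of `a` with respect to `b`, or None.

    Assumed rules (not taken from the paper):
    - "inside": a's box is fully contained in b's box.
    - "above":  the boxes overlap horizontally and a's bottom edge
                is at or above b's top edge.
    """
    ax0, ay0, ax1, ay1 = a.box
    bx0, by0, bx1, by1 = b.box
    if ax0 >= bx0 and ay0 >= by0 and ax1 <= bx1 and ay1 <= by1:
        return "inside"
    horizontal_overlap = min(ax1, bx1) - max(ax0, bx0)
    if horizontal_overlap > 0 and ay1 <= by0:
        return "above"
    return None


def build_knowledge_graph(objects: List[SceneObject]) -> List[Triple]:
    """Collect (subject, relation, object) triples over all ordered pairs."""
    triples = []
    for a, b in permutations(objects, 2):
        relation = spatial_relation(a, b)
        if relation is not None:
            triples.append((a.label, relation, b.label))
    return triples


def build_prompt(triples: List[Triple], target: str) -> str:
    """Serialize the graph into a plain-text question for an LLM."""
    facts = "\n".join(f"- {s} {r} {o}" for s, r, o in triples)
    return (
        "The scene contains these object relations:\n"
        f"{facts}\n"
        f"Where in this scene would a {target} most likely be found?"
    )


# Toy scene with made-up coordinates.
scene = [
    SceneObject("table", (100, 300, 500, 400)),
    SceneObject("cup", (200, 250, 240, 300)),
    SceneObject("floor", (0, 400, 640, 480)),
]
graph = build_knowledge_graph(scene)
prompt = build_prompt(graph, target="mug")
print(prompt)
```

The resulting prompt string would be passed to whichever LLM is in use. Keeping the graph as plain triples makes it straightforward to swap in richer relations or a different serialization without touching the rest of the sketch.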

Computing methodologies

Human computer interaction (HCI)

Human-centered computing

Image segmentation

Knowledge representation and reasoning

Details

Title: A Framework for Leveraging LLMs for Scene Analysis and Cognitive Processing
Creators: Catarina Moreira - Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento
Jeffrey Cockburn - University of Iowa
Monica S. Castelhano - Queen's University
Resource Type: Journal article
Publication Details: Proceedings of the ACM on Computer Graphics and Interactive Techniques, Vol. 8(2), pp. 1-18
DOI: 10.1145/3729414
ISSN: 2577-6193
eISSN: 2577-6193
Publisher: ACM
Number of pages: 18
Grant note: 10.54499/UIDB/50021/2020, 10.54499/DL57/2016/CP1368/CT0002, 2022.09212.PTDC (XAVIER) / Fundação para a Ciência e Tecnologia (10.54499/UIDB/50021/2020); RGPAS-2018-522460, RGPIN-2018-05166 / Natural Sciences and Engineering Research Council of Canada (https://doi.org/10.13039/501100000038)
Alternative title: A Framework for Leveraging LLMs for Scene Analysis and Cognitive Processing
Language: English
Date published: 06/2025
Academic Unit: Psychological and Brain Sciences
Record Identifier: 9984825638002771
