Journal article
A Framework for Leveraging LLMs for Scene Analysis and Cognitive Processing
Proceedings of the ACM on Computer Graphics and Interactive Techniques, Vol.8(2), pp.1-18
06/2025
DOI: 10.1145/3729414
Abstract
In everyday visual search tasks, humans rely on prior knowledge of object placements in scenes to efficiently locate target objects. This ability is evidenced by eye movement patterns, where individuals focus on areas that are more likely to contain the target, such as searching for a cup on a table or shoes on the floor. Building on this, we propose a new annotation pipeline that leverages these priors by extracting a knowledge graph from images based on automatically annotated objects. This knowledge graph is then used with large language models (LLMs) to predict the most likely locations of a specific target object in an image. Our approach is the first instance of using LLMs to identify relevant prior knowledge in images and to bridge the gap between human scene understanding and computational models.
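The pipeline described in the abstract (annotated objects, then a knowledge graph, then an LLM query about a target object) might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the graph schema, the relation rules, or the prompt wording, so the `Box` type, the crude "on" rule, the `tolerance` parameter, and the prompt text are all hypothetical. No LLM is called; the sketch only builds the text that would be sent to one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in pixel coordinates (y grows downward)."""
    label: str
    x0: float
    y0: float
    x1: float
    y1: float


def horizontal_overlap(a: Box, b: Box) -> float:
    """Fraction of a's width that overlaps b horizontally."""
    overlap = min(a.x1, b.x1) - max(a.x0, b.x0)
    width = a.x1 - a.x0
    return max(0.0, overlap) / width if width > 0 else 0.0


def extract_triples(boxes, tolerance=10.0):
    """Return (subject, relation, object) triples using a crude 'on' rule.

    Hypothetical rule: a is 'on' b when a's bottom edge lies within
    `tolerance` pixels of b's top edge and at least half of a's width
    overlaps b horizontally.
    """
    triples = []
    for a in boxes:
        for b in boxes:
            if a is b:
                continue
            touching = abs(a.y1 - b.y0) <= tolerance
            if touching and horizontal_overlap(a, b) >= 0.5:
                triples.append((a.label, "on", b.label))
    return triples


def build_prompt(triples, target):
    """Serialize the relation triples into a plain-text question for an LLM."""
    facts = "\n".join(f"- {s} {r} {o}" for s, r, o in triples)
    return (
        "Scene relations:\n"
        f"{facts}\n"
        f"Question: where is a {target} most likely to be found in this scene?"
    )


# Made-up annotations for illustration only.
boxes = [
    Box("table", 100, 300, 500, 450),
    Box("plate", 200, 270, 300, 305),
    Box("floor", 0, 450, 640, 480),
]
triples = extract_triples(boxes)
print(build_prompt(triples, "cup"))
```

Representing the scene as explicit triples keeps the prior knowledge inspectable: the same list can be checked by hand, stored as a graph, or rendered into a prompt.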
Details
- Title
- A Framework for Leveraging LLMs for Scene Analysis and Cognitive Processing
- Creators
- Catarina Moreira - Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento
- Jeffrey Cockburn - University of Iowa
- Monica S. Castelhano - Queen's University
- Resource Type
- Journal article
- Publication Details
- Proceedings of the ACM on Computer Graphics and Interactive Techniques, Vol.8(2), pp.1-18
- DOI
- 10.1145/3729414
- ISSN
- 2577-6193
- eISSN
- 2577-6193
- Publisher
- ACM
- Number of pages
- 18
- Grant note
- 10.54499/UIDB/50021/2020, 10.54499/DL57/2016/CP1368/CT0002, 2022.09212.PTDC (XAVIER) / Fundação para a Ciência e Tecnologia (10.54499/UIDB/50021/2020)
- RGPAS-2018-522460, RGPIN-2018-05166 / Natural Sciences and Engineering Research Council of Canada (https://doi.org/10.13039/501100000038)
- Language
- English
- Date published
- 06/2025
- Academic Unit
- Psychological and Brain Sciences
- Record Identifier
- 9984825638002771