Information
Title: Image Understanding Support Method for Visually Impaired Users via Multi-Region Caption Generation
Authors: Yiling Xu, Junjie Shan, Megumi Yasuo, and Yoko Nishihara
Abstract: This study develops an interactive system to help visually impaired users achieve a deeper understanding of image content, addressing the challenge that a holistic descriptive sentence often fails to convey complex compositions. We propose a sub-image captioning approach, implementing two methods: (1) a simple grid-based division and (2) a semantically-aware method using Dense Captioning to generate numerous region candidates, then selecting the one with the largest overlap for each grid cell. Users can selectively query descriptions for any region. To evaluate effectiveness, 20 participants drew 300 sketches based on the generated descriptions. An LLM calculated a similarity score between sketches and original images as an objective measure of “perception and comprehension,” supplemented by a subjective “imaginability” rating. The results validate our approach. Notably, method (2) achieved higher objective scores in eight out of 10 image categories, particularly for images with complex scenes and small but critical elements, and a similar trend was observed in the subjective evaluations. Conversely, for images where a single item occupies most of the frame, a holistic description sometimes proved more effective, indicating that the optimal strategy is content-dependent. This study thus offers a novel methodology for enhancing image accessibility and suggests future work in dynamically adapting the description strategy based on both image content and user feedback.
Bibliographic information: The 30th International Conference on Technologies and Applications of Artificial Intelligence, pp.??–??
Presentation date: December 13, 2025
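
The region selection step of method (2) in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that "overlap" means intersection area between a grid cell and a candidate box (the paper may use a different measure, such as IoU), that the Dense Captioning model has already produced (box, caption) pairs, and that a cell with no overlapping candidate gets no caption. The function names `grid_cells`, `intersection_area`, and `select_captions` are hypothetical.

```python
from typing import List, Optional, Tuple

# A box is (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def grid_cells(width: float, height: float, rows: int, cols: int) -> List[Box]:
    """Divide the image into rows x cols equal cells, listed row by row."""
    cell_w = width / cols
    cell_h = height / rows
    return [
        (c * cell_w, r * cell_h, (c + 1) * cell_w, (r + 1) * cell_h)
        for r in range(rows)
        for c in range(cols)
    ]


def intersection_area(a: Box, b: Box) -> float:
    """Area shared by two boxes; 0.0 when they do not overlap."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0.0) * max(h, 0.0)


def select_captions(
    cells: List[Box], candidates: List[Tuple[Box, str]]
) -> List[Optional[str]]:
    """For each cell, keep the caption of the candidate with the largest overlap.

    Assumption: overlap is measured as raw intersection area, and a cell
    that no candidate touches maps to None.
    """
    selected: List[Optional[str]] = []
    for cell in cells:
        best_caption: Optional[str] = None
        best_area = 0.0
        for box, caption in candidates:
            area = intersection_area(cell, box)
            if area > best_area:
                best_area = area
                best_caption = caption
        selected.append(best_caption)
    return selected


if __name__ == "__main__":
    # Made-up candidates standing in for Dense Captioning output.
    candidates = [
        ((0, 0, 60, 40), "a dog on grass"),
        ((55, 10, 100, 100), "a red bicycle"),
    ]
    cells = grid_cells(100, 100, rows=2, cols=2)
    print(select_captions(cells, candidates))
```

In this sketch a user query for a grid cell would return the selected caption for that cell. A real system would also need a fallback for cells that map to None, for example the simple grid-based caption of method (1).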
