Despite outstanding progress in robotic perception, in tasks such as accurate mapping and supervised object detection, robots lack the versatility of the human vision system and its capacity to operate in diverse environments while performing complex tasks. Motivated by these challenges, in this project we aim to study the modeling of human visual attention with the goal of advancing robot perception. Our hypothesis is that modeling human visual attention mechanisms enables the development of safer and more robust autonomous robotic operation.
Visual attention mechanisms include, among others, the processes that manage the overwhelming amount of visual data entering the brain during everyday tasks. Specifically, we are interested in how spatial attention works, i.e., the ability to focus on a specific region of the visual field. Humans make rapid eye movements known as saccades and fixate on certain regions of the visual scene; this behavior can be visualized as a scan path of eye fixations over a 2D/3D image. By a fixation we broadly mean a point that the eye attends to for a sufficient amount of time. Within this framework, we study where one looks next in the visual field, model this mechanism, and then combine it with robot vision algorithms for tasks of interest.
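To make the notion of a fixation concrete, one standard way to extract fixations (and hence scan paths) from raw gaze samples is dispersion-threshold identification (I-DT), which groups consecutive samples that stay within a small spatial window for a minimum duration. The sketch below is illustrative only, not the model developed in this project; the function name and threshold values are assumptions chosen for the example.

```python
import numpy as np

def _dispersion(window):
    # Horizontal plus vertical spread of a set of gaze points.
    return np.ptp(window[:, 0]) + np.ptp(window[:, 1])

def detect_fixations(gaze, timestamps, max_dispersion=0.02, min_duration=0.1):
    """Dispersion-threshold (I-DT style) fixation detection (illustrative).

    gaze: (N, 2) array of gaze points (normalized image coordinates).
    timestamps: (N,) array of sample times in seconds.
    Returns the scan path as a list of (x, y, t_start, t_end) fixations.
    """
    fixations = []
    n, start = len(gaze), 0
    while start < n:
        # Grow a window forward until it spans at least min_duration.
        end = start
        while end < n and timestamps[end] - timestamps[start] < min_duration:
            end += 1
        if end >= n:
            break
        if _dispersion(gaze[start:end + 1]) <= max_dispersion:
            # The eye dwelled here: extend while the spread stays small,
            # then record the centroid as one fixation of the scan path.
            while end + 1 < n and _dispersion(gaze[start:end + 2]) <= max_dispersion:
                end += 1
            x, y = gaze[start:end + 1].mean(axis=0)
            fixations.append((x, y, timestamps[start], timestamps[end]))
            start = end + 1
        else:
            start += 1  # likely mid-saccade: slide past the first sample
    return fixations

# Minimal usage: two synthetic dwell points at 100 Hz yield two fixations.
t = np.arange(100) * 0.01
g = np.r_[np.full((50, 2), 0.3), np.full((50, 2), 0.7)]
print(detect_fixations(g + 0.001 * np.random.randn(100, 2), t))
```

Sweeping max_dispersion and min_duration trades off how broadly a fixation is defined, which matches the deliberately loose definition given above.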
While most studies modeling the human visual attention system have concentrated on experiments in which human subjects look at still images without a specific task or purpose in mind, our goal is to collect data that include recordings of eye fixations during movement, i.e., while the subject is engaged in a specific task. We use the recorded data to model scan paths of eye fixations and then integrate these models into robotic perception algorithms. The derived models will be evaluated against the state of the art in visual attention modeling and other robot vision techniques.
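As one illustration of how a learned scan-path model could plug into robot perception, a predicted fixation sequence can be used to prioritize where a detector spends its compute: crop the frame around the first few predicted fixations and run detection only there. This is a hypothetical integration sketch, not the project's actual pipeline; the function and parameter names are placeholders.

```python
import numpy as np

def attention_guided_crops(frame, predicted_fixations, crop=128, budget=3):
    """Crop patches around predicted fixation points so an object
    detector can be run only on attended regions (hypothetical sketch).

    frame: (H, W, C) image array.
    predicted_fixations: (x, y) pixel coordinates, ordered by the
    scan-path model's predicted fixation order.
    """
    h, w = frame.shape[:2]
    crops = []
    for x, y in predicted_fixations[:budget]:
        # Clamp the crop window so it stays inside the frame.
        x0 = int(np.clip(x - crop // 2, 0, w - crop))
        y0 = int(np.clip(y - crop // 2, 0, h - crop))
        crops.append(frame[y0:y0 + crop, x0:x0 + crop])
    return crops

# Example: a 480x640 frame and three predicted fixations.
frame = np.zeros((480, 640, 3), dtype=np.uint8)
patches = attention_guided_crops(frame, [(320, 240), (50, 50), (600, 400)])
print([p.shape for p in patches])  # three (128, 128, 3) patches
```

Under such a scheme, the fixation model effectively decides where the robot's limited perceptual budget is spent.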