Institut de Neurosciences Cognitives et Intégratives d'Aquitaine (UMR5287)

Aquitaine Institute for Cognitive and Integrative Neuroscience

Université de Bordeaux

Zone nord Bat 2 2ème étage
146, rue Léo Saignat
33076 Bordeaux cedex



Pérez De San Roman P, Benois-Pineau J, Domenger J-P, Cattaert D, Paclet F, de Rugy A (2017)

by Loïc Grattier

Saliency Driven Object Recognition in Egocentric Videos with Deep CNN: toward application in assistance to Neuroprostheses. Computer Vision and Image Understanding. doi: 10.1016/j.cviu.2017.03.001.

The problem of object recognition in natural scenes has recently been addressed successfully with Deep Convolutional Neural Networks (CNNs), yielding a significant breakthrough in recognition scores. The computational efficiency of Deep CNNs, as a function of their depth, allows their use in real-time applications. A key issue here is reducing the number of windows selected from images to be submitted to a Deep CNN. This is usually solved by preliminary segmentation and selection of specific windows with outstanding "objectness" or other indicators of likely object locations. In this paper we propose a Deep CNN approach and a general framework for recognizing objects in a real-time scenario and from an egocentric perspective. Here the window of interest is built on the basis of a visual attention map computed over gaze fixations measured by a glasses-worn eye-tracker. The target application of this set-up is an interactive, user-friendly environment for upper-limb amputees: vision must help the subject control the worn neuroprosthesis when few muscles remain and EMG control becomes inefficient. Recognition results on a specifically recorded corpus of 151 videos with simple geometrical objects show a mean Average Precision (mAP) of 64.6%, with a generalization time shorter than the duration of a visual fixation on the object of interest.
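The core idea of the abstract — building a window of interest from gaze fixations rather than exhaustively scanning the frame — can be illustrated with a minimal sketch. The paper's actual pipeline (eye-tracker calibration, saliency model, Deep CNN classifier) is not reproduced here; the function names, the Gaussian attention model, and all parameter values below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def attention_map(fixations, shape, sigma=30.0):
    """Accumulate a Gaussian bump around each gaze fixation (x, y).

    `fixations` are (x, y) pixel coordinates from a hypothetical
    eye-tracker; `shape` is the (height, width) of the video frame.
    """
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    amap = np.zeros(shape, dtype=float)
    for fx, fy in fixations:
        amap += np.exp(-((xs - fx) ** 2 + (ys - fy) ** 2) / (2 * sigma ** 2))
    return amap / amap.max()  # normalize to [0, 1]

def window_of_interest(fixations, shape, size=64, sigma=30.0):
    """Crop a size x size box centred on the attention peak, clipped to the frame.

    In the paper's setting, only this single window (not thousands of
    proposals) would be submitted to the Deep CNN for recognition.
    """
    amap = attention_map(fixations, shape, sigma)
    cy, cx = np.unravel_index(np.argmax(amap), amap.shape)
    half = size // 2
    x0 = int(np.clip(cx - half, 0, shape[1] - size))
    y0 = int(np.clip(cy - half, 0, shape[0] - size))
    return (x0, y0, x0 + size, y0 + size)

# Three fixations clustered around pixel (120, 80) in a 240 x 320 frame
box = window_of_interest([(118, 82), (122, 78), (120, 80)], (240, 320))
print(box)  # a 64 x 64 box containing the fixation cluster
```

Because only one attention-driven window is classified per fixation, the per-frame cost is dominated by a single CNN forward pass, which is what makes the sub-fixation-time recognition reported in the abstract plausible.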