- Ortega-Bastida, Javier; Gallego, Antonio-Javier; Pertusa, Antonio
"Multimodal Object Recognition Using Deep Learning Representations Extracted from Images and Smartphone Sensors"
Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications, ISBN: 978-3-030-13469-3, pp. 521--529
In this work, we present a multimodal approach to object recognition from photographs taken with smartphones. The proposed method extracts neural codes from the input image using a Convolutional Neural Network (CNN) and combines them with metadata gathered from the smartphone sensors when the picture was taken. These metadata complement the visual content and can provide additional information for determining the target class. We also apply feature selection and metadata preprocessing, encoding textual features, such as the kind of place where a picture was taken, with Doc2Vec in order to preserve their semantics. The deep representations extracted from images and metadata are combined with early fusion to classify samples using different machine learning methods (k-Nearest Neighbors, Random Forests and Support Vector Machines). Results show that metadata preprocessing is beneficial, that SVM outperforms kNN when using neural codes on the visual information, and that the combination of neural codes and metadata improves the results only slightly when the images are classified into very general categories.
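The early-fusion step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`l2_normalize`, `early_fusion`, `knn_predict`), the per-modality L2 normalization, and the toy vectors are all assumptions made for illustration. In practice the neural codes would come from a CNN layer and the metadata vector from the preprocessing stage (for example, Doc2Vec embeddings). Only kNN is shown, using the Python standard library.

```python
import math
from collections import Counter


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def early_fusion(neural_code, metadata):
    """Early fusion: normalize each modality, then concatenate into one feature vector."""
    return l2_normalize(neural_code) + l2_normalize(metadata)


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples."""
    nearest = sorted(
        range(len(train_X)),
        key=lambda i: math.dist(train_X[i], query),
    )[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Toy data (made up for illustration): 3-d "neural codes" and 2-d metadata vectors.
samples = [
    ([0.9, 0.1, 0.0], [1.0, 0.0], "food"),
    ([0.8, 0.2, 0.1], [0.9, 0.1], "food"),
    ([0.1, 0.9, 0.2], [0.0, 1.0], "vehicle"),
    ([0.2, 0.8, 0.1], [0.1, 0.9], "vehicle"),
]
train_X = [early_fusion(code, meta) for code, meta, _ in samples]
train_y = [label for _, _, label in samples]

# Fuse the query's two modalities the same way before classifying it.
query = early_fusion([0.85, 0.15, 0.05], [0.95, 0.05])
prediction = knn_predict(train_X, train_y, query, k=3)
print(prediction)
```

Because fusion happens before classification, the same fused vectors could be passed to any other classifier, such as a Random Forest or an SVM, without changing the feature-construction code.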
author = "Ortega-Bastida, Javier and Gallego, Antonio-Javier and Pertusa, Antonio",
title = "Multimodal Object Recognition Using Deep Learning Representations Extracted from Images and Smartphone Sensors",
booktitle = "Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications",
isbn = "978-3-030-13469-3",
pages = "521--529",
year = "2019"