WGANVO: Monocular Visual Odometry Based on Generative Adversarial Networks

Authors

  • Javier Cremona, CIFASIS, Centro Internacional Franco-Argentino de Ciencias de la Información y de Sistemas (CONICET-UNR), https://orcid.org/0000-0002-7699-4262
  • Lucas Uzal, CIFASIS, Centro Internacional Franco-Argentino de Ciencias de la Información y de Sistemas (CONICET-UNR)
  • Taihú Pire, CIFASIS, Centro Internacional Franco-Argentino de Ciencias de la Información y de Sistemas (CONICET-UNR)

DOI:

https://doi.org/10.4995/riai.2022.16113

Keywords:

Localization, Neural Networks, Mobile Robots

Abstract

Traditional visual odometry (VO) systems, whether direct or feature-based, are prone to matching errors between images. Moreover, monocular configurations can only estimate localization up to a scale factor, which rules out their immediate use in robotics or virtual reality applications. Recently, several Computer Vision problems have been successfully tackled with Deep Learning algorithms. In this work we present WGANVO, a Deep Learning-based monocular visual odometry system. Specifically, we train a GAN-based neural network to regress a motion estimate: the resulting model takes a pair of images and estimates the relative motion between them. We train the network with a semi-supervised approach. Unlike traditional geometry-based monocular systems, our Deep Learning-based method can estimate the absolute scale of the scene without extra information or prior knowledge. We evaluate WGANVO on the well-known KITTI dataset. We show that the system runs in real time, and the accuracy obtained encourages further development of Deep Learning-based localization systems.
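The pipeline the abstract describes is frame-to-frame: for each consecutive image pair the network regresses a relative rigid-body transform, and the camera trajectory is recovered by chaining those transforms. Below is a minimal sketch of that accumulation step only (the function name and the toy constant-velocity data are illustrative, not taken from the paper):

```python
import numpy as np

def chain_relative_poses(relative_poses):
    """Chain frame-to-frame relative SE(3) transforms (4x4 homogeneous
    matrices) into absolute camera poses, with the world frame fixed
    at the first camera."""
    pose = np.eye(4)
    trajectory = [pose.copy()]
    for T in relative_poses:
        pose = pose @ T  # accumulate the per-pair motion estimate
        trajectory.append(pose.copy())
    return trajectory

# Toy stand-in for the network output: pure forward motion,
# one unit along the camera z-axis per frame.
step = np.eye(4)
step[2, 3] = 1.0

traj = chain_relative_poses([step] * 3)
print(traj[-1][2, 3])  # prints 3.0
```

Because each pose is obtained by multiplying the previous estimate, per-pair errors accumulate along the sequence; this drift is the usual cost of pure odometry without loop closure.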




Published

28-12-2021

How to Cite

Cremona, J., Uzal, L. and Pire, T. (2021) «WGANVO: odometría visual monocular basada en redes adversarias generativas», Revista Iberoamericana de Automática e Informática industrial, 19(2), pp. 144–153. doi: 10.4995/riai.2022.16113.

Issue

Section

Articles