WGANVO: Monocular Visual Odometry Based on Generative Adversarial Networks
DOI: https://doi.org/10.4995/riai.2022.16113

Keywords: Localization, Neural Networks, Mobile Robots

Abstract
Traditional visual odometry (VO) systems, whether direct or feature-based, are prone to image-correspondence errors. Moreover, monocular configurations can only estimate localization up to a scale factor, which rules out their immediate use in robotics or virtual-reality applications. Recently, several Computer Vision problems have been successfully addressed by Deep Learning algorithms. In this work we present WGANVO, a Deep Learning based monocular visual odometry system. Specifically, we train a GAN-based neural network to regress a motion estimate: the resulting model takes a pair of images and estimates the relative motion between them. The network is trained with a semi-supervised approach. Unlike traditional geometry-based monocular systems, our Deep Learning based method is able to estimate the absolute scale of the scene without extra information or prior knowledge. We evaluate WGANVO on the well-known KITTI dataset. We show that the system runs in real time and that its accuracy encourages further development of Deep Learning based localization systems.
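The abstract's core step, a network that takes two consecutive frames and regresses the relative motion between them, can be illustrated with a minimal sketch. The code below is an assumption for illustration only, not the authors' implementation (the paper couples such a regressor with a Wasserstein-GAN critic; the authors' code is at https://github.com/CIFASIS/wganvo): layer sizes, the 6-component motion parametrization (3 translation + 3 rotation) and all names are hypothetical.

    # Hypothetical sketch (PyTorch): regress a 6-DoF relative motion from a
    # pair of consecutive frames stacked along the channel axis. This is NOT
    # the authors' architecture; it only illustrates the input/output
    # contract described in the abstract.
    import torch
    import torch.nn as nn

    class PoseRegressor(nn.Module):
        def __init__(self, in_channels: int = 2):  # two stacked grayscale frames
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv2d(in_channels, 32, kernel_size=5, stride=2, padding=2),
                nn.LeakyReLU(0.2),
                nn.Conv2d(32, 64, kernel_size=5, stride=2, padding=2),
                nn.LeakyReLU(0.2),
                nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1),
                nn.LeakyReLU(0.2),
                nn.AdaptiveAvgPool2d(1),      # global pooling -> fixed-size feature
            )
            self.head = nn.Linear(128, 6)     # 3 translation + 3 rotation components

        def forward(self, frame_pair: torch.Tensor) -> torch.Tensor:
            features = self.encoder(frame_pair).flatten(1)
            return self.head(features)        # one 6-DoF motion estimate per pair

    # Usage: one pair of frames in, one relative-motion estimate out.
    model = PoseRegressor()
    pair = torch.randn(1, 2, 128, 416)        # batch of 1, two frames, H x W
    motion = model(pair)                      # shape (1, 6)

In the semi-supervised GAN setting the abstract describes, a pose-regression loss of this kind would be combined with an adversarial term; because the network learns from training data, it can absorb the absolute scale of the training scenes, which geometry-only monocular methods cannot recover.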
License

Copyright 2021 Revista Iberoamericana de Automática e Informática industrial

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.