Abstract
Monocular depth estimation from a single RGB image is an inherently ill-posed task because three-dimensional information is lost when a scene is projected onto an image. Despite the remarkable recent performance of deep learning-based models, most existing techniques are deterministic and do not model uncertainty well, which restricts their application in complex indoor settings. To overcome this shortcoming, this paper introduces a hybrid probabilistic–deep learning architecture for indoor monocular depth estimation that combines Bayesian uncertainty representation with supervised convolutional neural network (CNN) refinement. In the proposed solution, a Bayesian network models the probabilistic dependence between image features and depth, generating pixel-wise posterior depth distributions that explicitly reflect estimation uncertainty. The resulting probabilistic depth priors are then used to train a supervised encoder-decoder CNN with an uncertainty-aware loss formulation that allows for smoother predictions of metric depth while retaining uncertainty information. This two-stage approach is less sensitive to ambiguous visual information and enhances depth estimation stability at a lower computational cost. On the NYU Depth V2 dataset, the proposed approach achieves an AbsRel of 0.080, an RMSE of 0.290, and an RMSE-log of 0.044, along with a threshold accuracy of 93.0% for δ < 1.25. These results are comparable to those of existing methods while offering enhanced stability and uncertainty prediction for indoor depth estimation.
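For readers unfamiliar with the reported metrics, the following is a minimal sketch of how AbsRel, RMSE, RMSE-log, and the δ < 1.25 threshold accuracy are commonly computed in the depth estimation literature. It is not the authors' evaluation code. The function name `depth_metrics`, the flat-list inputs, and the rule of skipping non-positive depths are assumptions made for illustration. The natural logarithm is assumed for RMSE-log; the abstract does not state which logarithm base is used.

```python
import math


def depth_metrics(pred, gt):
    """Common monocular depth metrics over two equal-length depth sequences.

    Hypothetical helper for illustration: pixels whose ground-truth or
    predicted depth is not strictly positive are treated as invalid and skipped.
    """
    pairs = [(p, g) for p, g in zip(pred, gt) if p > 0 and g > 0]
    if not pairs:
        raise ValueError("no valid pixels to evaluate")
    n = len(pairs)

    # AbsRel: mean absolute error relative to the ground-truth depth.
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n

    # RMSE: root mean squared error in metric depth units.
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)

    # RMSE-log: RMSE of log-depth differences (natural log is an assumption).
    rmse_log = math.sqrt(
        sum((math.log(p) - math.log(g)) ** 2 for p, g in pairs) / n
    )

    # Threshold accuracy: fraction of pixels with max(p/g, g/p) < 1.25.
    delta1 = sum(1 for p, g in pairs if max(p / g, g / p) < 1.25) / n

    return {"abs_rel": abs_rel, "rmse": rmse, "rmse_log": rmse_log, "delta1": delta1}


# Toy example with three pixels (depths in metres).
metrics = depth_metrics([1.0, 2.0, 4.0], [1.0, 2.0, 2.0])
```

Lower values are better for the three error metrics, and higher is better for the threshold accuracy. In practice these quantities are averaged over all valid pixels of the test images.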

Journal of Innovative Image Processing