Video Compression for Surveillance Application using Deep Neural Network

Keywords

Video compression
Motion Estimation
Auto-encoder
Rate-Distortion minimization
Bitrate Estimation

How to Cite

Dhungel, Prasanga, Prashant Tandan, Sandesh Bhusal, Sobit Neupane, and Subarna Shakya. 2020. “Video Compression for Surveillance Application Using Deep Neural Network”. Journal of Artificial Intelligence and Capsule Networks 2 (2): 131-45. https://doi.org/10.36548/jaicn.2020.2.006.

Abstract

We present a new approach to video compression for video surveillance that addresses the shortcomings of the conventional approach by substituting each traditional component with its neural network counterpart. The proposed framework consists of motion estimation, compression, and compensation, together with residue compression, all learned end-to-end to optimize the rate-distortion trade-off. The whole model is jointly optimized using a single loss function. Our work builds on the standard method of exploiting the spatio-temporal redundancy in video frames to reduce the bit rate while minimizing distortion in the decoded frames. We implement a neural network version of the conventional video compression approach and encode redundant frames with fewer bits. Although our approach is aimed primarily at surveillance, it can easily be extended to general-purpose videos. Experiments show that our technique is efficient and outperforms standard MPEG encoding at comparable bitrates while preserving visual quality.
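
As a rough illustration of what a single rate-distortion loss can look like, the sketch below combines a distortion term (mean squared error between the original and decoded frame) with a rate term (estimated bits per pixel for the quantized motion and residue symbols). It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the weighting parameter `lam`, the use of MSE, and the empirical-entropy bit estimate are all hypothetical choices made for illustration.

```python
import numpy as np


def estimated_bits(symbols: np.ndarray) -> float:
    """Estimate the bits needed to code an integer symbol array.

    Assumption: uses the empirical entropy of the symbols as a stand-in
    for a learned bitrate estimator.
    """
    _, counts = np.unique(symbols, return_counts=True)
    probs = counts / counts.sum()
    # Each symbol with probability p costs -log2(p) bits.
    return float(-(counts * np.log2(probs)).sum())


def rate_distortion_loss(frame, decoded, motion_symbols, residual_symbols, lam=256.0):
    """Hypothetical joint loss: lam * distortion + rate (bits per pixel)."""
    # Distortion: mean squared error between original and decoded frame.
    distortion = float(np.mean((frame - decoded) ** 2))
    # Rate: bits for motion and residue symbols, normalized by frame area.
    num_pixels = frame.shape[0] * frame.shape[1]
    total_bits = estimated_bits(motion_symbols) + estimated_bits(residual_symbols)
    return lam * distortion + total_bits / num_pixels


if __name__ == "__main__":
    frame = np.zeros((4, 4))
    decoded = np.ones((4, 4))
    motion = np.zeros(8, dtype=int)      # constant symbols cost no bits
    residual = np.array([0, 1, 0, 1])    # two equiprobable symbols, 1 bit each
    print(rate_distortion_loss(frame, decoded, motion, residual, lam=2.0))
```

A larger `lam` favors reconstruction quality at the cost of more bits, while a smaller `lam` favors a lower bitrate. In a trained model the rate term would come from a differentiable estimator rather than a symbol histogram.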
