Abstract
Image generation is the task of automatically synthesizing an image from an input latent vector z. In recent years, understanding and manipulating this latent vector has attracted growing attention due to its potential applications. Previous work has shown promising results in interpreting the latent space of a pre-trained generator G to produce images up to 256×256, using both supervised and unsupervised techniques. This paper addresses the challenge of interpreting the latent space of a pre-trained generator G for high-resolution images, i.e., images with resolution up to 1024×1024. We tackle this problem with a new framework that builds upon the Cyclic Reverse Generator (CRG), upgrading the encoder E in CRG to handle high-resolution images. The resulting model successfully interprets the latent space of complex generative models such as the Progressive Growing Generative Adversarial Network (PGGAN) and StyleGAN. The framework then maps the input vector z to the image attributes defined in the dataset, giving precise control over the generator's output. Such control is highly useful in computer vision applications such as photo editing and face manipulation. One limitation of the framework is its reliance on a comprehensive attribute-labeled dataset, which restricts its applicability.
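The latent-space attribute control described above can be illustrated with a minimal sketch. All names here are hypothetical, and NumPy arrays stand in for the latent codes of an actual pre-trained generator G: a linear attribute direction is estimated from latent codes with known attribute labels, and an edit moves a code along that direction before it would be decoded by G.

```python
import numpy as np

def attribute_direction(latents, labels):
    """Estimate a linear attribute direction in latent space as the
    normalized difference between the mean latent codes of positive
    and negative samples (a simple stand-in for a learned boundary)."""
    latents, labels = np.asarray(latents), np.asarray(labels)
    d = latents[labels == 1].mean(axis=0) - latents[labels == 0].mean(axis=0)
    return d / np.linalg.norm(d)

def edit_latent(z, direction, strength):
    """Move a latent code along an attribute direction; the edited code
    would then be fed to the pre-trained generator G to render the edit."""
    return z + strength * direction
```

In this sketch the edit strength directly trades off attribute intensity against identity preservation, which is the kind of precise control over the generator output the framework targets.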
References
Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, et al. Image-to-image translation with conditional adversarial networks. In CVPR, 2017.
Binod Bhattarai and Tae-Kyun Kim. Inducing optimal attribute representations for conditional GANs. In ECCV, 2020.
Yunjey Choi, Minje Choi, Munyoung Kim, et al. StarGAN: Unified generative adversarial networks for multi-domain image-to-image translation. In CVPR, 2018.
Po-Wei Wu, Yu-Jing Lin, Che-Han Chang, et al. RelGAN: Multi-domain image-to-image translation via relative attributes. In ICCV, 2019.
Anders Boesen Lindbo Larsen et al. Autoencoding beyond pixels using a learned similarity metric. arXiv:1512.09300, 2016.
Elad Richardson, Yuval Alaluf, Or Patashnik, Yotam Nitzan, Yaniv Azar, Stav Shapiro, and Daniel Cohen-Or. Encoding in style: a stylegan encoder for image-to-image translation. In CVPR, 2021.
Omer Tov, Yuval Alaluf, Yotam Nitzan, Or Patashnik, and Daniel Cohen-Or. Designing an encoder for StyleGAN image manipulation. TOG, 2021.
Martin Pernuš et al. High-Resolution Face Editing with Masked GAN Latent Code Optimization. 2021.
Yue Gao et al. High-Fidelity and Arbitrary Face Editing. In CVPR, 2021, pp. 16110–16119.
Yahya Dogan and Hacer Yalim Keles. Iterative Facial Image Inpainting using Cyclic Reverse Generator. arXiv:2101.07036, 2021.
X. Wu, K. Xu, and P. Hall. A survey of image synthesis and editing with generative adversarial networks. Tsinghua Science and Technology, 22(6):660–674, December 2017. doi:10.23919/TST.2017.8195348.
Wenqi Xian, Patsorn Sangkloy, Varun Agrawal, Amit Raj, Jingwan Lu, Chen Fang, Fisher Yu, and James Hays. TextureGAN: Controlling deep image synthesis with texture patches. In CVPR, 2018, pp. 8456–8465.
J. Dong, J. Liu, K. Yao, M. Chantler, L. Qi, H. Yu, and M. Jian. Survey of Procedural Methods for Two-Dimensional Texture Generation. Sensors, 20(4):1135, 2020.
O. Elharrouss, N. Almaadeed, S. Al-Maadeed, et al. Image Inpainting: A Review. Neural Processing Letters, 51:2007–2028, 2020.
Tero Karras et al. Progressive Growing of GANs for Improved Quality, Stability, and Variation. arXiv:1710.10196, 2018.
Tero Karras et al. Analyzing and Improving the Image Quality of StyleGAN. In CVPR, 2020, pp. 8107–8116.
Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, et al. Generative adversarial nets. In NeurIPS, 2014, pp. 2672–2680.
M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In NeurIPS, 2017.
Muhammad Muneeb Saad et al. A Survey on Training Challenges in Generative Adversarial Networks for Biomedical Image Analysis. arXiv:2201.07646, 2022.
A. Jahanian, L. Chai, and P. Isola. On the "steerability" of generative adversarial networks. In ICLR, 2020.
Weihao Xia et al. GAN Inversion: A Survey. arXiv:2101.05278, 2021.
J.-Y. Zhu, P. Krähenbühl, E. Shechtman, and A. A. Efros. Generative visual manipulation on the natural image manifold. In ECCV, 2016.
G. Perarnau, J. van de Weijer, B. Raducanu, and J. M. Álvarez. Invertible conditional GANs for image editing. arXiv:1611.06355, 2016.
D. Bau, J.-Y. Zhu, J. Wulff, W. Peebles, H. Strobelt, B. Zhou, and A. Torralba. Inverting layers of a large generator. In ICLR Workshop, 2019.
R. Abdal, Y. Qin, and P. Wonka. Image2StyleGAN: How to embed images into the StyleGAN latent space? In ICCV, 2019.
R. Abdal et al. Image2StyleGAN++: How to edit the embedded images? In CVPR, 2020.
A. Creswell and A. A. Bharath. Inverting the generator of a generative adversarial network. TNNLS, 2018.
J. Zhu, Y. Shen, D. Zhao, and B. Zhou. In-domain GAN inversion for real image editing. In ECCV, 2020.
M. Rosca, B. Lakshminarayanan, D. Warde-Farley, and S. Mohamed. Variational approaches for auto-encoding generative adversarial networks. arXiv preprint.
D. Ulyanov, A. Vedaldi, and V. Lempitsky. It takes (only) two: Adversarial generator-encoder networks. In AAAI, 2018.
V. Dumoulin, I. Belghazi, B. Poole, O. Mastropietro, A. Lamb, M. Arjovsky, and A. Courville. Adversarially learned inference. arXiv preprint.
Daniel Roich et al. Pivotal Tuning for Latent-based Editing of Real Images. arXiv:2106.05744, 2021.
Yuval Alaluf et al. ReStyle: A Residual-Based StyleGAN Encoder via Iterative Refinement. arXiv:2104.02699, 2021.
Tero Karras et al. Alias-Free Generative Adversarial Networks. arXiv:2106.12423, 2021.
David Bau et al. Semantic photo manipulation with a generative image prior. ACM Transactions on Graphics (TOG), 38:1–11, 2019.
