Séminaire de Statistique et Optimisation

Autoencoders and Generative Adversarial Networks for Image Understanding and Editing

by Alasdair Newson (Télécom Paris, LTCI)

Salle K. Johnson, 1st floor (1R3)

Description

Deep generative models are neural networks that can produce random samples of very high-dimensional and complex data, such as images. They include, in particular, Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs). The core idea of these models is to learn a latent representation of the data, synthesise the data in this latent space, and then project back to the data space. In general, the latent space is designed to be more compact than the data space, which makes the representation more powerful. When the projection from the data space to the latent space is also learned, the network is called an autoencoder. Finally, generative models can also be exploited to achieve complex modifications of the output data (e.g. editing).
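To make the encode/decode idea concrete, here is a minimal sketch of a convolutional autoencoder in PyTorch, assuming small 32x32 grayscale images and a 16-dimensional latent space. It only illustrates the general principle (compact latent space, projection to and from the data space), not the specific architectures discussed in the talk.

```python
# Minimal convolutional autoencoder sketch (illustrative only).
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, latent_dim: int = 16):
        super().__init__()
        # Encoder: data space (1x32x32 image) -> compact latent space
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=4, stride=2, padding=1),   # 32 -> 16
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2, padding=1),  # 16 -> 8
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 8 * 8, latent_dim),
        )
        # Decoder: latent space -> data space
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 32 * 8 * 8),
            nn.Unflatten(1, (32, 8, 8)),
            nn.ConvTranspose2d(32, 16, kernel_size=4, stride=2, padding=1),  # 8 -> 16
            nn.ReLU(),
            nn.ConvTranspose2d(16, 1, kernel_size=4, stride=2, padding=1),   # 16 -> 32
            nn.Sigmoid(),
        )

    def forward(self, x):
        z = self.encoder(x)      # projection to the latent space
        return self.decoder(z)   # projection back to the data space

# Reconstruction loss on a batch of images (e.g. binary disks)
model = Autoencoder(latent_dim=16)
x = torch.rand(8, 1, 32, 32)
loss = nn.functional.mse_loss(model(x), x)
loss.backward()
```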

In this presentation, I will discuss three topics concerning generative models applied to image data. Firstly, I will look at exactly how an autoencoder can learn an optimal latent space in the case of simple images of centred, binary disks. The goal of this work is to understand the inner workings of autoencoders in the simplest possible setting. Secondly, I will discuss how an autoencoder can be designed to imitate Principal Component Analysis (PCA). The associated architecture and loss function organise the latent space into independent axes that represent different attributes of the data (for example, shape or rotation). These attributes are learned in a completely unsupervised manner.
Finally, I will present a method that uses a pre-trained GAN to achieve high-level editing of facial images with labelled attributes. For example, this allows us to modify the hair style or smile of a person's face. The approach is completely generic and could therefore be applied to any type of data, as long as a pre-trained GAN is available.
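As a rough illustration of this kind of editing (a generic sketch, not necessarily the method presented in the talk), one common approach is to move a latent code along a direction associated with a labelled attribute before regenerating the image with the pre-trained generator. The names `generator`, `direction` and `pretrained_generator` below are placeholders, not part of any specific library.

```python
# Generic latent-space editing with a pre-trained GAN (illustrative sketch).
import torch

def edit_attribute(generator, z, direction, strength=1.0):
    """Shift a latent code along an attribute direction and regenerate.

    `generator` stands for any pre-trained GAN generator mapping latent codes
    to images; `direction` is a vector in its latent space associated with a
    labelled attribute (e.g. obtained by fitting a linear classifier on latent
    codes and taking the normal of its decision boundary).
    """
    z_edited = z + strength * direction
    with torch.no_grad():
        return generator(z_edited)

# Hypothetical usage, assuming a generator with a 512-dimensional latent space:
# z = torch.randn(1, 512)
# smile_direction = torch.nn.functional.normalize(torch.randn(1, 512), dim=1)
# edited = edit_attribute(pretrained_generator, z, smile_direction, strength=2.0)
```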