The “magic” of Generative Adversarial Networks (GAN-s)

This woman has been entirely generated by a computer program. Curious to know what else this technology can do?

Andrey Gizdov
Jul 9, 2022

Generative Adversarial Networks (GAN-s) — sounds complicated, doesn’t it? Do not worry. It is a lot simpler than it sounds.

In this article, I will intuitively explain how those programs work, what they are used for, and my view on their future applications. Without further ado, let’s get into it.

In a nutshell — what is it & how does it work?

Imagine the following situation:

We have two people. One of them is a policeman and the other a cheater:

The Policeman vs The Cheater

The cheater wants to print banknotes that are indistinguishable from real money. The value for him in achieving this goal is obvious: he would become rich. The policeman, on the other hand, wants to identify every fake banknote the cheater attempts to produce. The value for the policeman in doing so is that he increases his reputation & gets closer to a promotion.

GAN-s essentially consist of two computer programs representing this scenario: a “policeman” program & a “cheater” program. The “cheater” program attempts to fool the “policeman” program over and over again, producing all sorts of different banknotes while doing so. Meanwhile, the “policeman” program also keeps learning from the banknotes it sees, so the “cheater” has to keep improving. The process ends when the “cheater” produces banknotes that the “policeman” can no longer tell apart from real ones, doing no better than a coin flip.
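In the original GAN recipe, the “feedback” is a number. The “policeman” outputs the probability that an item is real, and both programs are scored with a cross-entropy loss on that probability. The minimal sketch below computes those scores for one real and one fake item. The function names are made up for illustration. The cheater's score uses the common “non-saturating” form (minimize the negative log of the policeman's belief that the fake is real) rather than the original min-max form.

```python
import math


def policeman_loss(p_real: float, p_fake: float) -> float:
    """Discriminator ("policeman") loss: binary cross-entropy.

    p_real: the policeman's probability that a real banknote is real.
    p_fake: the policeman's probability that a fake banknote is real.
    Low when p_real is near 1 and p_fake is near 0.
    """
    return -(math.log(p_real) + math.log(1.0 - p_fake))


def cheater_loss(p_fake: float) -> float:
    """Generator ("cheater") loss, non-saturating form.

    Low when the policeman believes the fake is real (p_fake near 1).
    """
    return -math.log(p_fake)


# A sharp policeman: confident on the real note, suspicious of the fake.
print(round(policeman_loss(0.9, 0.1), 3))  # 0.211
print(round(cheater_loss(0.1), 3))         # 2.303
# A policeman who can only guess outputs 0.5 for everything.
print(round(policeman_loss(0.5, 0.5), 3))  # 1.386
print(round(cheater_loss(0.5), 3))         # 0.693
```

When the policeman can only guess, its loss sits at 2·ln 2 ≈ 1.386. That stalemate is what training aims for.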

Real — the goal vs Fake

The example I just gave talks about fake banknotes. In reality, though, the “cheater” program can be made to fake all sorts of things, & the “policeman” to identify all types of fake items.

For example, the “cheater” program can be made to create fake human faces. The “cheater” will keep producing human faces — each with a different set of characteristics — until he manages to fool the “policeman” (who has been well trained to know how a real face looks). This is the principle through which this guy has been created:

AI-generated human face

This might be the evolution of the cheater’s attempts. As you can see, the policeman’s feedback progressively improves the quality of the produced fakes:

Evolution of the cheater’s attempts

In the image above, I’ve skipped a few iterations between the 3rd and 4th image, but you get the point: practice makes perfect — especially when guided by feedback.

Some GAN types also allow you, the user, to tell the “cheater” what characteristics his fake should have. In the context of human faces, this might mean generating one with a specific hair color: blonde, brown, black, etc. These GAN types are very applicable to real-world scenarios, as we’ll see below.
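In the research literature, GAN-s that accept such a request are usually called conditional GAN-s. One common way to pass the request along is to attach an encoding of it to the random numbers the “cheater” starts from. The “policeman” typically receives the same condition, so it can also check that the fake matches the request. The sketch below shows only that input-building step. The list of hair colors, the noise size of 8 and the function names are assumptions for illustration. In a real system this vector would be fed into a neural network.

```python
from __future__ import annotations

import random

HAIR_COLORS = ["blonde", "brown", "black"]  # hypothetical list of conditions


def one_hot(label: str, classes: list[str]) -> list[float]:
    """Encode a label as a vector with a single 1.0 at the label's position."""
    vec = [0.0] * len(classes)
    vec[classes.index(label)] = 1.0
    return vec


def generator_input(label: str, noise_dim: int = 8,
                    rng: random.Random | None = None) -> list[float]:
    """Build the cheater's input: random noise followed by the requested condition.

    The noise makes every fake different; the one-hot part tells the generator
    which characteristic (here, hair color) the fake should have.
    """
    if rng is None:
        rng = random.Random()
    noise = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    return noise + one_hot(label, HAIR_COLORS)


x = generator_input("brown", rng=random.Random(42))
print(len(x))   # 11 (8 noise values + 3 condition values)
print(x[-3:])   # [0.0, 1.0, 0.0]
```

Asking for a different hair color changes only the last three numbers. The random part is what lets the same request produce many different faces.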

In a nutshell

A “cheater” program is given the task of creating fakes, be it faces, banknotes, clothes, etc. The “policeman” program provides the “cheater” with feedback on his attempts: why he was caught. After many repetitions, the “cheater” comes up with a strategy for manufacturing fakes that successfully fools the “policeman”. This is an intuitive way to think about GAN-s in their basic form.
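The repeat-and-improve loop described above can be sketched in plain Python. This is a deliberately tiny, hypothetical setup, not how image GAN-s are built:

- The “real banknotes” are just numbers scattered around 3.0.
- The “cheater” can only shift its random numbers by a learned amount `theta`.
- The “policeman” is a one-variable logistic regression.

The two take turns improving by gradient descent on cross-entropy losses, with the cheater using the common non-saturating form. The function name, the starting values and all settings (`lr`, `batch`, `steps`, the seed) are assumptions chosen for illustration.

```python
import math
import random


def sigmoid(s: float) -> float:
    """Numerically stable logistic function: the policeman's "probability real"."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def train_toy_gan(real_mean=3.0, steps=2000, batch=64, lr=0.05, seed=0):
    """Let a 1-D policeman and cheater take turns improving (hypothetical toy).

    Real items:  samples from N(real_mean, 1).
    Cheater:     G(z) = theta + z with z ~ N(0, 1); it only learns the shift theta.
    Policeman:   D(x) = sigmoid(w * x + c), its belief that x is real.
    Returns the final (theta, w, c).
    """
    rng = random.Random(seed)
    theta, w, c = 0.0, 0.0, 0.0
    for _ in range(steps):
        # 1) Policeman step: gradient descent on -log D(real) - log(1 - D(fake)).
        real = [rng.gauss(real_mean, 1.0) for _ in range(batch)]
        fake = [theta + rng.gauss(0.0, 1.0) for _ in range(batch)]
        grad_w = grad_c = 0.0
        for x in real:
            d = sigmoid(w * x + c)
            grad_w -= (1.0 - d) * x
            grad_c -= 1.0 - d
        for x in fake:
            d = sigmoid(w * x + c)
            grad_w += d * x
            grad_c += d
        w -= lr * grad_w / batch
        c -= lr * grad_c / batch

        # 2) Cheater step: gradient descent on -log D(fake). The gradient is the
        #    "feedback": it says which way to shift the fakes to look more real.
        grad_theta = 0.0
        for _ in range(batch):
            d = sigmoid(w * (theta + rng.gauss(0.0, 1.0)) + c)
            grad_theta -= (1.0 - d) * w
        theta -= lr * grad_theta / batch
    return theta, w, c


theta, w, c = train_toy_gan()
print(f"cheater's shift: {theta:.2f} (real mean: 3.0)")
```

With these settings the cheater's shift should settle near the real mean of 3.0, where the policeman's best guess is close to 50/50. Real GAN-s swap both one-line models for large neural networks, and their training is far less predictable than in this toy.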

Why should you care?

Hold onto your chair. Here comes the exciting part. Below, you’ll find 3 of the most impressive uses of GAN-s that I think hold immense future potential. Most of these are not commercial products (yet), but this is the purpose of AI For Busy People: to give you a taste of what the future might hold.

#1. Image-to-Image Translation

Essentially, these GAN-s are able to transform a very rough input representing an object, such as a sketch, into a higher-quality image containing color, texture & fine details. For example, take this scribble of a bag:

Bag scribble

Can you guess what it transforms into — automatically?

Semi-real bag

I don’t know about you, but this would have taken me a good half an hour to draw — at minimum. Is it perfect? Far from it. But it is something to start with — more on this in The Future.

There are many types of Image-to-Image GAN-s, including one which can output an entire cat from a scribble:

Scribble cat vs Semi-real cat

Do you want to turn your own scribble into a cat? Who doesn’t? You can do so over here.

The Future

In my opinion, one of the first uses these GAN-s will be put to is architectural design:

As you can see, even without being a commercial product, this type of GAN is already producing some impressive architectural results.

Suppose someone collected many 3D models of buildings at a basic stage, with just walls (no paint, furniture, lighting, etc.), together with the same buildings in finished form, with paint, furniture & lighting. A GAN would then be able to learn the relationship between the two.

Essentially, such a GAN could suggest what furniture should go where, what colors the building should be & where the lighting should go, all from just the basic structure of the building!

Even if not perfect in designing the interior, this GAN at least makes the job of the designer much easier. Editing (something non-terrible) is much faster than creating. The same applies to the clothing example you saw at the beginning.

My prediction is that such sketch-to-image functionality will probably make its way into Photoshop & other design software within the next decade, perhaps starting with architecture.

The Limitation

The main limitation with this type of GAN is the need for data. Essentially, the “policeman” from the introduction needs to see a lot of building interiors to know how a decent one looks.

In turn, the “cheater” will need to see how a lot of basic buildings translate into buildings with interiors, so that he knows how to prepare your custom interior design from a plain skeleton. Believe it or not, getting thousands of examples of buildings with & without interiors, in matching pairs, is difficult. Hence, The Limitation.

The same applies to any type of object this GAN creates: for it to look decent, it needs a lot of examples.
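Why pairs specifically? In paired image-to-image GAN-s (the pix2pix model is a well-known example), the cheater is typically scored on two things at once. It must fool the policeman, and it must stay close to the known finished version of the very same input. That second term cannot be computed without a matching “after” example for every “before” example. The sketch below treats images as flat lists of pixel values. The function names, the toy 4-pixel images and the weight of 100 are assumptions for illustration.

```python
from __future__ import annotations

import math


def l1_distance(a: list[float], b: list[float]) -> float:
    """Mean absolute difference between two equally sized images (flat pixel lists)."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def paired_cheater_loss(p_fake: float, fake: list[float], target: list[float],
                        weight: float = 100.0) -> float:
    """Cheater loss for paired image-to-image translation (illustrative sketch).

    p_fake: the policeman's probability that (input, fake) is a real pair.
    fake:   the cheater's output for one plain input (e.g. a bare building).
    target: the finished version of that same input, taken from the training pairs.
    weight: how much staying close to the target matters (an assumed value).
    """
    adversarial = -math.log(p_fake)             # try to fool the policeman
    reconstruction = l1_distance(fake, target)  # stay close to the known answer
    return adversarial + weight * reconstruction


target = [0.2, 0.8, 0.5, 0.1]   # hypothetical 4-pixel "finished" image
close = [0.2, 0.7, 0.5, 0.1]
far = [0.9, 0.1, 0.0, 0.6]
print(paired_cheater_loss(0.5, close, target) < paired_cheater_loss(0.5, far, target))  # True
```

Without a `target` for each input, the second term simply cannot be computed. That is why collecting before-and-after pairs is the bottleneck.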

#2. Face Frontal View Generator

Essentially, given a side view of someone’s face, this type of GAN can generate a facial frontal view:

Input — facial side view vs AI-generated image — facial front view

Considering that the image on the right has only been generated based on the side view from the left, the result is very impressive. Are you curious to see how the original face looks in comparison to the AI-generated one?

Try to guess using these examples:

Column I vs Column II vs Column III

Which column do you think contains the real faces — Column II or III?

Stop here if you want to think about it…

The second column is generated and the third one is real. They look very similar indeed, so I don’t blame you if you didn’t guess correctly.

The Future

Usually, criminals don’t stand in front of the security camera for a good facial shot. What police officers have to work with in order to identify criminals is, at best, of low quality & at a weird angle:

Police camera footage

Imagine you were a police officer & were looking for this suspect. Wouldn’t it be useful to have a clearer picture of them? This is indeed what this type of GAN can do with enough examples — more on this in The Limitation.

The principle behind this GAN is not limited to generating frontal faces. Following the same paradigm, one could build a face-mask remover: something of immense practical utility in the current COVID era, in which masks make tracking criminals that much harder.

If this technology were made reliable & accurate, police departments around the world would gladly deploy it in their identification procedures.

My prediction is that such face-matching software probably won’t become commonplace in the next decade, but experiments will surely be undertaken — particularly by more authoritarian governments. Perhaps this will first be deployed in China.

The Limitation

This type of GAN, in its current stage, can generate a decent frontal face view from a very high-quality image at a good angle. In reality, criminals will not stand like this in front of the security camera (duh):

Expectation vs Reality

The solution?

You guessed it.

The main limitation here is gathering enough human faces with large variability. The “cheater” needs to learn how to fake the frontal view for people of all ethnicities & at all angles. This means that someone will need to photograph both the side & frontal views of people from many ethnicities & at many angles:

Multiple ethnicities & angles

In any case, with enough data, this is certainly possible.

#3. Text-to-image translation

The cherry on the cake. Essentially, this type of GAN is able to take a sentence describing an object and turn it into an actual image of that object. Take, for example, this description of a flower:

This flower has overlapping pink pointed petals surrounding a ring of short yellow filaments

Personally, I can’t even imagine such a flower, let alone draw it. Can you guess what this GAN transforms the description into, automatically?

Visualization of flower description

Pretty decent. Not perfect by any stretch of the imagination, but it is something to start with.

There are many GAN-s of this type, able to transform a description into a visualization. Here is how the same type of GAN performs on descriptions of birds:

This bird is white with some black on its head and wings, and has a long orange beak

Visualization of bird description
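How does a sentence get into the “cheater”? A real text-to-image GAN first turns it into a fixed-length vector with a learned text encoder. It then feeds that vector to the cheater together with random noise, and usually to the policeman as well. The sketch below swaps the learned encoder for a toy one, a hashed bag-of-words, so that the mechanics fit in a few lines. `encode_description`, the 16 buckets and the noise size of 8 are assumptions for illustration.

```python
from __future__ import annotations

import random
import zlib


def encode_description(text: str, dim: int = 16) -> list[float]:
    """Toy sentence encoder: count words into `dim` buckets picked by a stable hash."""
    vec = [0.0] * dim
    for word in text.lower().replace(",", " ").split():
        bucket = zlib.crc32(word.encode("utf-8")) % dim
        vec[bucket] += 1.0
    return vec


def text_to_generator_input(text: str, noise_dim: int = 8,
                            rng: random.Random | None = None) -> list[float]:
    """Cheater input for text-to-image: fresh noise plus the encoded description."""
    if rng is None:
        rng = random.Random()
    noise = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    return noise + encode_description(text)


description = ("This bird is white with some black on its head and wings, "
               "and has a long orange beak")
rng = random.Random(0)
a = text_to_generator_input(description, rng=rng)
b = text_to_generator_input(description, rng=rng)
# Same description, different noise: the text part matches, the noise part differs.
print(a[8:] == b[8:], a[:8] == b[:8])   # True False
```

Keeping the description fixed and redrawing the noise is what would let such a system produce many different rough designs for a single request.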

The Future

The main application for those types of GAN-s will likely be as an aid to the designer & the customer.

Imagine you were making a chair. “I want it to resemble an avocado” you might think in the initial stages. Where do you start? There are literally hundreds, if not more, possible chair designs which “resemble an avocado”.

What’s more: imagine that a client gave you this description for an order they want completed by next Friday. That’s not good. How do you know if the design you make will be to your client’s liking?

Well, generally, you don’t. You make a few basic sketches & send them to your client — who then tells you which one you should build. This is precisely the problem which those GAN-s can address — making dozens of “basic sketches” for your client instead of you:

chair that resembles an avocado

Are the produced “basic sketches” perfect? Not at all. However, they are a lot better than nothing, and this is key. Coming up with the basic design concept is usually the hardest part, and having something do this for you is a huge time-saver.

My prediction for this particular application of GAN-s is that it will probably be used commercially within the next couple of years, with design software being the first to get it.

The Limitation

For this GAN to be able to produce an image of an object — be it bag, bird, car, chair — the “cheater” would need to have seen many times how a description of this object type translates to an image.

Essentially, this means that if you want the “cheater” to generate bird images, you’ll need to have a lot of sentences describing birds and their corresponding visualizations.

The same applies for chairs: you’ll need a lot of descriptions & their corresponding chairs to make this work.

With enough text-to-image examples, however, almost anything can be visualized. It is only a matter of time before someone smart capitalizes on this. In fact, this startup is already capitalizing on something similar:

What would you design if you could just use your voice?

Conclusions

There you go: some food for thought. Did you learn something new? I hope so — or else I wouldn’t have fulfilled my intention.

If you have a minute, let me know what was good and what not so much — it helps me out tremendously to shape this website!

Which GAN application would you like to see materialize first? Let me know in the comments below.

StAI smart!


Andrey Gizdov

Computer Vision Researcher @ Weizmann AI Center | Aspiring tech-entrepreneur | Sport enthusiast