To understand GANs, we need to think about payoffs and equilibria in the context of machine learning. If we can identify an equilibrium in a GAN game, we can use that equilibrium as a defining characteristic to better understand how the game works.
Most machine learning models we have seen so far are based on optimization. We define some model parameters, write down a cost function of those parameters, and then minimize that cost. If we imagine plotting this, the parameters lie along the horizontal axes and the cost function is shown on the vertical axis. The goal of the learning algorithm is to find a very low value of the cost function. In practice, we usually do not reach a global or even a local minimum. Instead, we often get stuck in regions where numerical issues make further progress difficult, but the optimization algorithm still manages to find a point with reasonably low cost.
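This optimization view can be made concrete with a minimal sketch: gradient descent walking downhill on a toy one-dimensional cost surface (the cost function and learning rate here are illustrative choices, not tied to any particular model):

```python
# Gradient descent on a toy cost C(theta) = (theta - 3)**2.
# The learning algorithm simply walks downhill on the cost surface
# toward the minimizer at theta = 3.
theta, lr = 0.0, 0.1
for _ in range(200):
    grad = 2 * (theta - 3.0)  # dC/dtheta
    theta -= lr * grad
print(round(theta, 4))  # ends up very close to 3.0
```

In a real model the parameter vector is high-dimensional and the surface is non-convex, which is why, as noted above, we typically settle for a point of reasonably low cost rather than a true minimum.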
GANs are different because there are two players: the generator and the discriminator. Each has its own cost function. In a simple and easy-to-visualize version of GANs, the discriminator’s cost is just the negative of the generator’s cost. In this case, we can describe the entire interaction using a single value function. The generator tries to minimize this value function, while the discriminator tries to maximize it.
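The standard value function is V(G, D) = E_x[log D(x)] + E_z[log(1 − D(G(z)))]. Here is a minimal NumPy sketch of a Monte Carlo estimate of it; the discriminator `d` and the sample arrays are placeholders for illustration, not any particular GAN implementation:

```python
import numpy as np

def value_function(d, real_samples, fake_samples):
    """Monte Carlo estimate of V(G, D) = E_x[log D(x)] + E_z[log(1 - D(G(z)))].
    `d` maps samples to probabilities in (0, 1); `fake_samples` are
    assumed to already be generator outputs G(z)."""
    eps = 1e-12  # guard against log(0)
    return (np.mean(np.log(d(real_samples) + eps))
            + np.mean(np.log(1.0 - d(fake_samples) + eps)))

# Sanity check: a "blind" discriminator that always outputs 1/2 gives
# V = log(1/2) + log(1/2) = -2 log 2 ~ -1.386, regardless of the samples.
d_blind = lambda x: np.full(len(x), 0.5)
v = value_function(d_blind, np.random.randn(1000), np.random.randn(1000))
print(round(v, 3))  # -1.386
```

The discriminator adjusts its parameters to push this quantity up; the generator adjusts its parameters to push it down.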
Recall that a Nash equilibrium occurs when neither player can improve their own outcome by changing only their own strategy. In this setting, that happens at a point that is a maximum for the discriminator and a minimum for the generator. Such a point is called a saddle point. If we imagine a graph of the game's value function, the equilibrium lies at the saddle in the middle of the surface.
One horizontal axis represents the generator’s parameters. If we take a cross-section along this axis, the equilibrium appears as a minimum of the value function. The other horizontal axis represents the discriminator’s parameters. Along that axis, the equilibrium appears as a maximum of the value function.
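The cross-section picture is easy to verify on a toy surface. Taking V(g, d) = g² − d² as a stand-in value function (with a single scalar parameter per player), the point (0, 0) is a minimum along the generator's axis and a maximum along the discriminator's axis:

```python
import numpy as np

# V(g, d) = g**2 - d**2 has a saddle point at (0, 0):
# the generator (g) sees a minimum there, the discriminator (d) a maximum.
V = lambda g, d: g**2 - d**2

g_slice = [V(g, 0.0) for g in np.linspace(-1, 1, 5)]  # cross-section along g
d_slice = [V(0.0, d) for d in np.linspace(-1, 1, 5)]  # cross-section along d

print(min(g_slice) == V(0, 0))  # True: minimum of the generator's slice
print(max(d_slice) == V(0, 0))  # True: maximum of the discriminator's slice
```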
This perspective helps us understand games played by deep networks. We aim to find a point in the joint parameter space that is simultaneously a local minimum for each player’s cost with respect to that player’s own parameters.
When we analyze the GAN game, the discriminator reaches a local maximum when it correctly estimates the probability that an input is real rather than fake. This probability is given by the ratio between the data density at that input and the sum of the data density and the generator’s model density at that input. Intuitively, this ratio measures how much of the probability mass in a region comes from real data rather than from the generator.
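That ratio, D*(x) = p_data(x) / (p_data(x) + p_model(x)), can be computed directly when both densities are known in closed form. The sketch below uses two hypothetical 1-D Gaussians standing in for the data and model densities:

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Hypothetical 1-D densities: real data vs. an imperfect generator.
p_data  = lambda x: gaussian_pdf(x, mu=0.0, sigma=1.0)
p_model = lambda x: gaussian_pdf(x, mu=1.0, sigma=1.0)

def optimal_discriminator(x):
    """D*(x) = p_data(x) / (p_data(x) + p_model(x))."""
    return p_data(x) / (p_data(x) + p_model(x))

# Where the two densities are equal (x = 0.5, by symmetry), D* = 1/2;
# where real data dominates (x far below 0), D* approaches 1.
print(round(optimal_discriminator(0.5), 3))  # 0.5
print(optimal_discriminator(-3.0) > 0.95)    # True
```

If the generator ever matched the data density exactly, this ratio would equal 1/2 everywhere, which is the equilibrium described next.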
Using tools from game theory, we can show that if both networks are sufficiently expressive, there exists an equilibrium where the generator’s density matches the true data density. At that point, the discriminator outputs a constant value everywhere (for example, 1/2 in the standard formulation).
However, even though such an equilibrium exists, finding it in practice is difficult. We usually train GANs by running two optimization algorithms simultaneously, each minimizing its own player’s cost with respect to that player’s parameters. Unfortunately, these optimization procedures do not necessarily converge to the game’s equilibrium.
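A classic toy illustration of this non-convergence is the bilinear game V(g, d) = g · d, whose unique equilibrium is (0, 0). Simultaneous gradient steps (the generator descending on V, the discriminator ascending) spiral away from the equilibrium rather than toward it:

```python
# Simultaneous gradient steps on the bilinear game V(g, d) = g * d.
# The generator takes a descent step (-lr * dV/dg = -lr * d) while the
# discriminator simultaneously takes an ascent step (+lr * dV/dd = +lr * g).
g, d, lr = 1.0, 1.0, 0.1
start = g**2 + d**2  # squared distance from the equilibrium (0, 0)
for _ in range(100):
    g, d = g - lr * d, d + lr * g  # simultaneous (not sequential) updates
print(g**2 + d**2 > start)  # True: the iterates drift away from (0, 0)
```

Each step multiplies the squared distance from the origin by (1 + lr²), so the dynamics orbit outward no matter how small the learning rate, even though the equilibrium exists and is unique.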
A common failure case occurs when the data distribution has multiple clusters, or modes. The generator may learn to produce samples from only one cluster, a behavior often called mode collapse. The discriminator then learns to reject that cluster as fake, prompting the generator to switch to a different cluster, and so on. Instead of covering all clusters at once, the generator keeps cycling between them.
Ideally, we would like a training algorithm that reliably finds the equilibrium where the generator captures all clusters simultaneously. Designing such an algorithm for high-dimensional, continuous, non-convex games remains one of the key research challenges in GANs. Solving this problem could unlock new applications that depend on highly reliable learning algorithms.