📚 Module 1: Fundamentals of Generative AI — Beyond Classification

1.1 What is Generative AI?

Generative AI is the branch of artificial intelligence whose goal is to learn the underlying distribution of a dataset (landscape images, portraits, oil paintings, and so on) and, from what it has learned, generate new samples that belong to the same distribution but were never seen during training. In other words: to create something new that looks real.

Unlike discriminative models, which learn to separate classes (e.g., dog vs. cat) by modeling the probability of a label given an input, generative models learn the distribution of the data itself, which lets them simulate or reconstruct it from statistical patterns. This capability makes them especially useful when data is scarce, expensive to obtain, or simply nonexistent, since synthesizing artificial examples can enrich the creative or training process.
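To make the contrast concrete, here is a minimal sketch using NumPy and toy 1D data (the class names and numeric values are made up purely for illustration): the discriminative view only needs a decision boundary, while the generative view fits the data distribution itself and can then draw new samples from it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1D data: some measurement for two classes, "cats" and "dogs" (illustrative values only)
cats = rng.normal(loc=25.0, scale=3.0, size=500)
dogs = rng.normal(loc=60.0, scale=8.0, size=500)

# Discriminative view: learn only the boundary between classes.
# Here a single threshold midway between the class means is enough to classify.
threshold = (cats.mean() + dogs.mean()) / 2
predict = lambda x: "dog" if x > threshold else "cat"
print(predict(30.0))  # -> "cat"

# Generative view: learn the distribution of the data itself (here a Gaussian fit),
# which lets us *sample* new, unseen examples from the same distribution.
mu, sigma = dogs.mean(), dogs.std()
new_dogs = rng.normal(loc=mu, scale=sigma, size=5)  # synthetic "dog" measurements
print(new_dogs)
```

The same asymmetry holds for real models: a classifier can only label inputs it is given, whereas a fitted generative model can keep producing fresh examples on demand.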

1.2 Types of Generative Models: A Historical and Technical Overview

Over the past decade, several families of generative models have emerged, each with its own strengths, weaknesses, and preferred application areas. Understanding this landscape helps contextualize why diffusion models have become so popular.

➤ Autoregressive Models

Autoregressive models generate data sequentially, predicting the next element (pixel, token, word) from all previous elements. Classic examples include PixelRNN and PixelCNN for images, and GPT for text; the core sampling loop is sketched after the list below.

  • Advantages: High quality in sequences, excellent local coherence.
  • Disadvantages: Extremely slow generation (not parallelizable), difficult to scale to high resolutions for images.
  • Typical Application: Text generation, music, small images.
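To make the "sequential, not parallelizable" point concrete, here is a minimal sketch of an autoregressive sampling loop. The next_token_probs function is a hypothetical stand-in for a trained model such as GPT or PixelCNN; everything else follows the standard pattern of drawing one element at a time, conditioned on the growing prefix.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = ["<eos>", "the", "cat", "sat", "on", "mat"]

def next_token_probs(prefix):
    # Stand-in for a trained model: returns a distribution over the vocabulary
    # given the prefix. In practice this would be a neural network conditioned
    # on everything generated so far.
    probs = np.ones(len(VOCAB))
    probs[0] = 0.5 + len(prefix)          # make <eos> more likely as the sequence grows
    return probs / probs.sum()

def sample_sequence(max_len=20):
    prefix = []
    for _ in range(max_len):
        probs = next_token_probs(prefix)         # p(next element | all previous elements)
        token = rng.choice(len(VOCAB), p=probs)  # draw the next element
        if VOCAB[token] == "<eos>":
            break
        prefix.append(VOCAB[token])              # the prefix grows; the loop cannot be parallelized
    return " ".join(prefix)

print(sample_sequence())
```

Because each step depends on the output of the previous one, generation time grows linearly with sequence length, which is exactly why scaling to large images is painful for this family.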

➤ Generative Adversarial Networks (GANs)

Introduced in 2014 by Ian Goodfellow and collaborators, GANs consist of two neural networks competing against each other: a generator that creates fake samples and a discriminator that tries to tell real samples from fake ones. Both networks improve iteratively until, ideally, the discriminator can no longer tell them apart; a minimal training loop is sketched after the list below.

  • Advantages: Very fast generation once trained, high-resolution images with extreme realism.
  • Disadvantages: Difficult to train (instability, mode collapse), requires delicate balance between generator and discriminator, subjective evaluation.
  • Typical Application: Digital art, deepfakes, face synthesis (StyleGAN), fashion design.
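The adversarial loop can be summarized in a short PyTorch sketch. The architectures, hyperparameters, and 1D toy data here are arbitrary choices made for brevity, not a faithful GAN recipe; the point is the alternation between the discriminator step and the generator step.

```python
import torch
import torch.nn as nn

# Toy setup: the "real" data is 1D samples from N(4, 1); G maps noise vectors to fake samples.
G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(2000):
    real = torch.randn(64, 1) + 4.0              # real samples
    fake = G(torch.randn(64, 8))                 # fake samples from the generator

    # --- Discriminator step: push D(real) toward 1 and D(fake) toward 0 ---
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # --- Generator step: push D(fake) toward 1, i.e. fool the discriminator ---
    g_loss = bce(D(G(torch.randn(64, 8))), torch.ones(64, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()

print(G(torch.randn(5, 8)).detach().squeeze())   # samples should drift toward 4.0
```

The delicate balance mentioned above is visible even here: if the discriminator learns much faster than the generator (or vice versa), the losses stop providing a useful signal, which is one source of the instability and mode collapse GANs are known for.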

➤ Flow-based Models

These models use invertible, differentiable transformations to map input data to a simple distribution (e.g., a Gaussian) and back again. Examples include RealNVP and Glow; the coupling layer at their core is sketched after the list below.

  • Advantages: Exact likelihood evaluation, reversible generation, stable training.
  • Disadvantages: Restrictive architecture, lower scalability, often inferior visual quality compared to GANs or diffusion.
  • Typical Application: Density modeling, controlled generation.
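Here is a minimal NumPy sketch of the key building block behind RealNVP and Glow, the affine coupling layer. The scale and shift functions are fixed stand-ins for the small neural networks a real model would learn; the essential properties shown are exact invertibility and a cheap log-determinant, which is what makes exact likelihood evaluation possible.

```python
import numpy as np

def coupling_forward(x, s_net, t_net):
    # Split dimensions in half; transform the second half conditioned on the first.
    x1, x2 = x[:, :1], x[:, 1:]
    s, t = s_net(x1), t_net(x1)
    y2 = x2 * np.exp(s) + t
    log_det = s.sum(axis=1)          # exact log|det Jacobian| of the transformation
    return np.concatenate([x1, y2], axis=1), log_det

def coupling_inverse(y, s_net, t_net):
    y1, y2 = y[:, :1], y[:, 1:]
    s, t = s_net(y1), t_net(y1)      # same networks, evaluated on the untouched half
    x2 = (y2 - t) * np.exp(-s)       # exact inverse, no iterative solving needed
    return np.concatenate([y1, x2], axis=1)

# Stand-ins for small neural networks predicting scale and shift from x1.
s_net = lambda h: 0.5 * h
t_net = lambda h: h + 1.0

x = np.random.default_rng(0).normal(size=(4, 2))
y, log_det = coupling_forward(x, s_net, t_net)
x_back = coupling_inverse(y, s_net, t_net)
print(np.allclose(x, x_back))        # True: the mapping is exactly invertible
```

The restriction that every layer must stay invertible with a tractable Jacobian is precisely the "restrictive architecture" drawback listed above.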

➤ Diffusion Models

Diffusion models, popularized since 2020, are based on a thermodynamics-inspired process: Gaussian noise is gradually added to the data until it is indistinguishable from pure noise, and a neural network is then trained to reverse this process, removing noise step by step to reconstruct a coherent image. Both halves are sketched after the list below.

  • Advantages: Stable training, high sample quality and diversity, excellent handling of fine details, easy to combine with conditions (text, edges, masks).
  • Disadvantages: Slower generation than GANs (though accelerations have been proposed), higher computational cost during inference.
  • Typical Application: Text-guided image generation, image editing, super-resolution, inpainting, animation.
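The two halves of the process can be written down compactly. The sketch below uses standard DDPM-style notation with an assumed linear beta schedule and a hypothetical denoiser with signature model(x_t, t); it shows the closed-form forward noising step and the training loss in which the network regresses the noise that was added.

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)            # assumed linear noise schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0)   # cumulative product, "alpha bar"

def forward_diffuse(x0, t):
    # Closed-form forward process: x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * noise
    eps = torch.randn_like(x0)
    a_bar = alphas_bar[t].view(-1, 1, 1, 1)
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * eps
    return x_t, eps

def training_loss(model, x0):
    # Sample a random timestep per image, noise the image, and regress the true noise.
    t = torch.randint(0, T, (x0.shape[0],))
    x_t, eps = forward_diffuse(x0, t)
    eps_pred = model(x_t, t)                     # the network predicts the added noise
    return torch.mean((eps - eps_pred) ** 2)

# Hypothetical usage with a tiny stand-in "model" that ignores the timestep:
model = lambda x_t, t: torch.zeros_like(x_t)
x0 = torch.randn(8, 3, 32, 32)                   # a fake batch of 32x32 RGB images
print(training_loss(model, x0))
```

Sampling then runs the learned denoiser backwards over many timesteps, which is why inference is slower than a single GAN forward pass, and why so much work has gone into reducing the number of steps.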
