📚 Module 2: PEFT — The Efficient Fine-Tuning Paradigm

2.1 Introduction to PEFT

PEFT (Parameter-Efficient Fine-Tuning) is a set of techniques that allow adapting large models by updating only a small fraction of their parameters while keeping the rest frozen. This drastically reduces memory requirements, accelerates training, and mitigates the risk of catastrophic forgetting.

The fundamental principle of PEFT is simple: not all model parameters need to be updated for the model to learn a new task. In many cases, introducing small structured modifications — such as low-rank matrices, adapter layers, or activation shifts — is sufficient to guide the model toward the desired behavior.
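To make the low-rank idea concrete, here is a minimal sketch, in plain PyTorch with placeholder dimensions, of a LoRA-style linear layer: the pretrained weight stays frozen, and only a small trainable low-rank update B·A is learned on top of it.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA-style layer: output = frozen base(x) + scaled low-rank update."""

    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad = False  # freeze the pretrained weight
        # Low-rank factors: A projects down to rank r, B projects back up.
        self.lora_A = nn.Linear(in_features, r, bias=False)
        self.lora_B = nn.Linear(r, out_features, bias=False)
        nn.init.zeros_(self.lora_B.weight)  # the update starts at zero
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + self.lora_B(self.lora_A(x)) * self.scaling

# With hidden size 768 and r=8, only ~2% of the layer's parameters train.
layer = LoRALinear(768, 768, r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable: {trainable}/{total} ({100 * trainable / total:.2f}%)")
```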

PEFT is not a single method but a family of techniques including:

  • LoRA (Low-Rank Adaptation)
  • Adapter Layers
  • Prefix Tuning / Prompt Tuning
  • QLoRA (Quantized LoRA)

Among these, LoRA and QLoRA are currently the most popular and widely adopted in the community, thanks to their simplicity, effectiveness, and excellent support in libraries like Hugging Face PEFT.
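As an illustration, here is a minimal sketch of applying LoRA with the Hugging Face PEFT library. The base model and hyperparameter values are placeholder choices, not recommendations.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Placeholder base model; swap in whichever checkpoint you are adapting.
model = AutoModelForCausalLM.from_pretrained("gpt2")

config = LoraConfig(
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=16,              # scaling factor applied to the update
    target_modules=["c_attn"],  # fused attention projection in GPT-2
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, config)
model.print_trainable_parameters()  # reports trainable vs. total parameters
```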

2.2 Advantages of PEFT Over Full Fine-Tuning

| Feature | Full Fine-Tuning | PEFT (LoRA/QLoRA) |
|---|---|---|
| Trainable parameters | 100% | ~0.1%–1% |
| VRAM requirement | Very high (tens of GB) | Low to moderate (can run on a 16 GB GPU) |
| Training time | Long | Short to moderate |
| Risk of catastrophic forgetting | High | Low |
| Model portability | Entire model must be saved | Only the PEFT parameters (small files) are saved |
| Reusability of base model | Not directly possible | Yes: multiple adapters can be loaded onto the same base model |
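The last two rows are worth seeing in code. Below is a sketch, using Hugging Face PEFT, of saving only the adapter weights and then attaching several adapters to one shared base model; the paths and adapter names are hypothetical.

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# After training, saving the PEFT model writes only the adapter weights
# (typically a few megabytes), not the full base model:
# peft_model.save_pretrained("adapters/summarization")

base = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder base model

# Attach a first adapter, then load a second one onto the same base.
model = PeftModel.from_pretrained(
    base, "adapters/summarization", adapter_name="summarization"
)
model.load_adapter("adapters/qa", adapter_name="qa")

# Switch the active adapter without reloading the base model.
model.set_adapter("qa")
```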

2.3 When to Use PEFT

PEFT is ideal in the following scenarios:

  • Limited hardware is available (a GPU with 16 GB of VRAM or less).
  • The fine-tuning dataset is small (<10,000 examples).
  • You want to preserve the base model’s general knowledge.
  • You plan to adapt the same base model for multiple tasks (multi-task learning).
  • You aim to reduce training and storage costs.
  • You are prototyping rapidly or conducting iterative experimentation.

PEFT is not recommended when:

  • The fine-tuning dataset is extremely large and diverse (in which case full fine-tuning may yield better results).
  • A radical change in the model’s output distribution is required (e.g., completely changing vocabulary or linguistic domain).
  • Unlimited access to high-end hardware is available and time is not a limiting factor.