PEFT (Parameter-Efficient Fine-Tuning) is a set of techniques that allow adapting large models by updating only a small fraction of their parameters while keeping the rest frozen. This drastically reduces memory requirements, accelerates training, and mitigates the risk of catastrophic forgetting.
The fundamental principle of PEFT is simple: not all model parameters need to be updated for the model to learn a new task. In many cases, introducing small structured modifications — such as low-rank matrices, adapter layers, or activation shifts — is sufficient to guide the model toward the desired behavior.
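To make the low-rank idea concrete, here is a minimal, framework-free sketch (all shapes and values are illustrative): instead of updating a frozen `d × k` weight matrix `W`, we train two small matrices `B` (`d × r`) and `A` (`r × k`) with `r << min(d, k)`, and the model uses `W_eff = W + B @ A` in its forward pass.

```python
def matmul(X, Y):
    """Multiply two matrices represented as lists of rows."""
    rows, inner, cols = len(X), len(Y), len(Y[0])
    return [[sum(X[i][t] * Y[t][j] for t in range(inner)) for j in range(cols)]
            for i in range(rows)]

def add(X, Y):
    """Element-wise sum of two same-shaped matrices."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

d, k, r = 64, 64, 2                 # illustrative sizes; r << min(d, k)
W = [[0.0] * k for _ in range(d)]   # frozen pretrained weight (stays untouched)
B = [[0.0] * r for _ in range(d)]   # trainable, zero-initialised: the adapter
                                    # starts as a no-op, so training begins
                                    # exactly at the pretrained model
A = [[0.1] * k for _ in range(r)]   # trainable

W_eff = add(W, matmul(B, A))        # effective weight seen by the model

full_params = d * k                 # parameters a full update would touch
lora_params = d * r + r * k         # parameters the low-rank update trains
print(full_params, lora_params)     # → 4096 256
```

With these toy sizes the low-rank update trains only 256 of 4096 values; for real weight matrices (e.g. 4096 × 4096 attention projections) the ratio shrinks well below 1%.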
PEFT is not a single method but a family of techniques, including:

- **LoRA** (Low-Rank Adaptation): injects trainable low-rank matrices alongside existing weight matrices.
- **QLoRA**: LoRA applied on top of a 4-bit quantized base model, reducing memory requirements even further.
- **Adapters**: small bottleneck layers inserted between the frozen layers of the model.
- **Prefix and Prompt Tuning**: learn task-specific vectors prepended to the input or to the attention states.
- **IA³**: learns per-activation scaling vectors (the "activation shifts" mentioned above).
Among these, LoRA and QLoRA are currently the most popular and widely adopted in the community, thanks to their simplicity, effectiveness, and excellent support in libraries like Hugging Face PEFT.
| Feature | Full Fine-Tuning | PEFT (LoRA/QLoRA) |
|---|---|---|
| Trainable Parameters | 100% | ~0.1% - 1% |
| VRAM Requirement | Very High (tens of GB) | Low to Moderate (can run on a single 16 GB GPU) |
| Training Time | Long | Short to Moderate |
| Risk of Catastrophic Forgetting | High | Low |
| Model Portability | Entire model must be saved | Only PEFT parameters (small files) are saved |
| Reusability of Base Model | Not directly possible | Yes: multiple adapters can be loaded onto the same base model |
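The last two rows of the table can be illustrated with a toy sketch (plain Python, hypothetical weight names and values): one frozen set of base weights is shared across tasks, and switching tasks only swaps in a small per-task delta, which is what makes adapters cheap to store and distribute.

```python
# Frozen base weights, loaded once and shared by every task.
base_weights = {"attn.q_proj": [1.0, 2.0, 3.0]}

# Each adapter is a small per-task delta; in practice each would be
# saved as its own small file, independent of the base checkpoint.
adapters = {
    "summarization": {"attn.q_proj": [0.5, 0.0, -0.5]},
    "sql":           {"attn.q_proj": [-0.25, 0.25, 0.0]},
}

def effective_weights(task):
    """Combine the frozen base with the chosen task's adapter delta."""
    delta = adapters[task]
    return {name: [w + d for w, d in zip(ws, delta[name])]
            for name, ws in base_weights.items()}

print(effective_weights("summarization")["attn.q_proj"])  # → [1.5, 2.0, 2.5]
print(effective_weights("sql")["attn.q_proj"])            # → [0.75, 2.25, 3.0]
```

Switching from one task to another never touches `base_weights`, mirroring how multiple LoRA adapters can be loaded onto the same frozen base model.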
PEFT is ideal in the following scenarios:

- GPU memory is limited (e.g., a single consumer GPU with 16 GB of VRAM or less).
- You need several task-specific variants of the same base model, since small adapters can be swapped onto one shared set of frozen weights.
- Preserving the base model's general capabilities matters, given the low risk of catastrophic forgetting.
- Checkpoints must stay small and easy to version and distribute.
PEFT is not recommended when:

- The target domain differs so much from the pretraining data that most of the model's weights genuinely need to change.
- Maximum quality on a single task is the priority and the compute and memory budget for full fine-tuning is available.
- The model architecture itself must be modified rather than merely adapted.