📚 Module 9: Resource Management and Common Issues

9.1 Memory Optimization in Limited Environments (Colab)

Even with QLoRA, memory exhaustion in Colab is possible. Strategies:

a) Reduce per_device_train_batch_size

Start with 1 or 2 and compensate by raising gradient_accumulation_steps, so the effective batch size (and training dynamics) stays roughly the same.
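The trade-off can be checked with simple arithmetic; the values below are illustrative, not a recommendation:

```python
# Gradients are accumulated over several small steps, so memory use follows
# the per-device batch while the optimizer sees the effective batch.
per_device_train_batch_size = 2   # small batch that fits in Colab memory
gradient_accumulation_steps = 8   # accumulate before each optimizer update
num_devices = 1                   # Colab typically exposes a single GPU

effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)
print(effective_batch_size)
```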

b) Reduce max_seq_length

Lower from 512 to 384 or 256 if your examples are short enough; memory use grows with sequence length, so this is one of the cheapest savings available.
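If you train with trl's SFTTrainer, the limit is set in its config. A minimal sketch, assuming a trl version whose SFTConfig accepts max_seq_length (the parameter name has changed across releases):

```python
from trl import SFTConfig

# Illustrative values; check the token length of your samples before cutting.
config = SFTConfig(
    output_dir="out",       # hypothetical path
    max_seq_length=384,     # lowered from 512; longer samples get truncated
)
```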

c) Use torch.compile (experimental)

model = torch.compile(model)

May accelerate training and reduce memory, but is not always stable, especially with quantized or PEFT-wrapped models; run a short trial before committing to a full training job.

d) Clear CUDA Cache

torch.cuda.empty_cache()

Useful after loading the model or between experiments. Note that it only returns cached, unused blocks to the GPU; tensors that are still referenced (e.g. an old model variable) must be deleted first.
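A small helper (hypothetical name) that combines Python garbage collection with the CUDA cache flush; delete your own references first (del model, del trainer), since empty_cache cannot free memory that is still reachable:

```python
import gc

def release_cuda_cache():
    """Collect Python garbage, then release cached CUDA blocks if possible.

    Returns True if the CUDA cache was flushed, False otherwise.
    """
    gc.collect()  # drop unreachable Python objects holding GPU tensors
    try:
        import torch
    except ImportError:
        return False  # PyTorch unavailable: nothing to release
    if torch.cuda.is_available():
        torch.cuda.empty_cache()  # return cached blocks to the driver
        return True
    return False
```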

9.2 Common Errors and Solutions

Error: CUDA out of memory

  • Reduce batch size.
  • Increase gradient_accumulation_steps.
  • Reduce max_seq_length.
  • Restart Colab runtime and reload everything.
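The first three mitigations map directly onto transformers' TrainingArguments; a sketch with illustrative values:

```python
from transformers import TrainingArguments

# Memory-saving configuration; values are illustrative, tune for your GPU.
args = TrainingArguments(
    output_dir="out",                # hypothetical path
    per_device_train_batch_size=1,   # smallest per-step batch
    gradient_accumulation_steps=16,  # preserves an effective batch of 16
)
```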

Error: Some weights of the model checkpoint ... were not used

Usually benign: it appears when the checkpoint contains weights the loaded architecture does not use, which is common when loading with trust_remote_code=True or using PEFT. Not critical.

Error: ValueError: Attempting to unscale FP16 gradients.

This typically occurs when mixed-precision training tries to unscale gradients of weights already stored in FP16. Use optim="adamw_bnb_8bit" or optim="paged_adamw_8bit" in TrainingArguments.
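A sketch of the fix, assuming bitsandbytes is installed (both optimizer names come from the workaround above; the paged variant also reduces optimizer-state memory):

```python
from transformers import TrainingArguments

# 8-bit AdamW from bitsandbytes sidesteps the FP16 gradient-unscaling path.
args = TrainingArguments(
    output_dir="out",            # hypothetical path
    optim="paged_adamw_8bit",    # or "adamw_bnb_8bit"
    fp16=True,                   # mixed precision, as in the error scenario
)
```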

Warning: The model is not in eval mode

Safe to ignore: Trainer calls model.train() and model.eval() at the appropriate points automatically.