To run QLoRA on the free tier of Google Colab, follow these steps:
```python
# Install dependencies
!pip install -q bitsandbytes transformers accelerate peft trl

# Verify GPU
import torch
print(f"GPU available: {torch.cuda.is_available()}")
print(f"GPU name: {torch.cuda.get_device_name(0)}")
```
Note: In Colab, make sure a T4 GPU is selected (Runtime → Change runtime type → Hardware accelerator: T4 GPU).
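Optionally, you can also confirm from code which GPU was assigned and how much memory it has (a minimal check using standard PyTorch calls):

```python
# Optional: print the assigned GPU and its total memory
import torch

props = torch.cuda.get_device_properties(0)
print(f"{props.name}: {props.total_memory / 1024**3:.1f} GB VRAM")
```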
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

# T4 GPUs (free Colab tier) have no native bfloat16 support, so fall back to float16
compute_dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16

# Quantization configuration
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # Store weights in 4-bit precision
    bnb_4bit_quant_type="nf4",             # NormalFloat4 quantization
    bnb_4bit_compute_dtype=compute_dtype,  # dtype used for matmuls at runtime
    bnb_4bit_use_double_quant=True,        # Also quantize the quantization constants
)

# Load model and tokenizer
model_name = "Qwen/Qwen2.5-0.5B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",       # Automatically distributes layers across GPU/CPU
    trust_remote_code=True,  # Only needed for models that ship custom code; harmless here
)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
```
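Before attaching any adapters, a short generation call verifies that the 4-bit model loaded correctly. This is only a smoke-test sketch (the prompt and generation settings are arbitrary) and assumes the cell above has run:

```python
# Smoke test: generate a few tokens with the quantized base model
import torch

messages = [{"role": "user", "content": "Say hello in one short sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=30)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```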
```python
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=8,                                  # Low-rank matrix dimension
    lora_alpha=16,                        # Scaling factor (typically 2x r)
    target_modules=["q_proj", "v_proj"],  # Modules to apply LoRA to
    lora_dropout=0.05,                    # Regularization dropout
    bias="none",                          # Do not train biases
    task_type="CAUSAL_LM",                # Task type: causal language modeling
)

# Apply PEFT to the model
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```
Expected output:

```
trainable params: 327,680 || all params: 510,550,016 || trainable%: 0.0642
```

This confirms that only about 0.06% of the parameters are trained; the remaining ~510 million stay frozen.
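As a sanity check, the same figures can be recomputed directly from `requires_grad`. This is only a verification sketch; note that 4-bit weights are stored packed, so raw `numel()` totals may differ slightly from the logical parameter count:

```python
# Recount trainable vs. total parameters by hand (verification sketch)
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable:,} || all: {total:,} || trainable%: {100 * trainable / total:.4f}")
```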
`target_modules` specifies the layers into which the LoRA matrices are inserted. Choosing them well is crucial for performance. Common options:

- `["q_proj", "v_proj"]`: the Query and Value projections in the attention layers.
- `["q_proj", "k_proj", "v_proj", "o_proj"]`: all attention projections (more parameters, possible improvement on complex tasks).
- `["gate_proj", "up_proj", "down_proj"]` (in Llama-style models) or `["fc1", "fc2"]` (in others): the feed-forward (MLP) layers.
```python
# List all linear modules in the model
from peft.utils import TRANSFORMERS_MODELS_TO_LORA_TARGET_MODULES_MAPPING
print(TRANSFORMERS_MODELS_TO_LORA_TARGET_MODULES_MAPPING.get("qwen2"))  # default targets (key may differ by peft version)

# Or inspect manually (4-bit layers are bnb.nn.Linear4bit, not torch.nn.Linear)
import bitsandbytes as bnb
for name, module in model.named_modules():
    if isinstance(module, (torch.nn.Linear, bnb.nn.Linear4bit)):
        print(name)
```
Tip: Start with `["q_proj", "v_proj"]`. If performance is insufficient, experiment by adding more modules (see the sketch below).
Guidelines for the main LoRA hyperparameters:

- `r` (rank): start with 8. If the model doesn't learn well, try 16 or 32; if it overfits, reduce to 4.
- `lora_alpha`: usually set to 2 * r, so r=8 gives alpha=16. Controls the "strength" of the LoRA updates (illustrated below).
- `lora_dropout`: 0.05 or 0.1 for small datasets, 0.0 for large datasets.
- `bias`: "none" is the most common choice; "all" or "lora_only" rarely improve performance.
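To make the `lora_alpha` / `r` relationship concrete, here is a toy sketch of how the LoRA update is scaled (dimensions are illustrative and not tied to Qwen2.5; this mimics, but is not, the peft internals):

```python
import torch

# Toy illustration: the adapted weight is W + (lora_alpha / r) * (B @ A)
d_out, d_in, r, lora_alpha = 1024, 1024, 8, 16  # illustrative dimensions
A = 0.01 * torch.randn(r, d_in)  # A: small random init (the library uses Kaiming uniform)
B = torch.zeros(d_out, r)        # B: zero init, so the update starts at exactly zero
scaling = lora_alpha / r         # = 2.0 here; raising r without raising alpha weakens each update
delta_W = scaling * (B @ A)      # same shape as the frozen weight it modifies
print(delta_W.shape, scaling)
```

Keeping `lora_alpha` at roughly 2 * r keeps this scaling factor constant as you experiment with different ranks.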