📚 Module 8: Monitoring Training and Evaluation

8.1 Monitoring with Weights & Biases (wandb)

Weights & Biases (wandb) is a widely used tool for experiment tracking. It provides real-time visualization of:

  • Training and validation loss.
  • GPU and memory usage.
  • Learning rate.
  • Generated examples during training.

Configuration:

import wandb

# Log in (requires a free API key)
wandb.login()

# Configure project
wandb.init(project="fine-tuning-qwen-lora", name="experiment-1")

During training, SFTTrainer sends metrics to wandb automatically when report_to="wandb" is set in TrainingArguments.
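The reporting settings can be collected in one place before the training arguments are built. The following is a minimal sketch, not the course's own configuration: build_logging_kwargs is a hypothetical helper, and the output_dir path and logging_steps value are illustrative assumptions. The keys themselves (output_dir, report_to, run_name, logging_steps) are standard TrainingArguments parameters.

```python
def build_logging_kwargs(run_name, logging_steps=10):
    """Return keyword arguments for TrainingArguments that enable wandb reporting.

    Hypothetical helper: the output_dir path and the default logging_steps
    are assumptions chosen for illustration.
    """
    return {
        "output_dir": "./outputs",       # assumed path; use your own
        "report_to": "wandb",            # send training metrics to Weights & Biases
        "run_name": run_name,            # run name displayed in the wandb UI
        "logging_steps": logging_steps,  # log metrics every N optimizer steps
    }


logging_kwargs = build_logging_kwargs("experiment-1")

# With transformers installed, the dictionary would be unpacked like this:
# from transformers import TrainingArguments
# training_args = TrainingArguments(**logging_kwargs)
```

Keeping the settings in a dictionary makes it easy to print or version the exact logging configuration alongside each run.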

8.2 Evaluation During Training

SFTTrainer can evaluate periodically during training if an eval_dataset is provided. Prepare a separate validation set that the model never sees during training.
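One simple way to obtain such a validation set is to hold out a random fraction of the examples before training. The sketch below assumes the data is a plain list of dictionaries; split_train_validation, the 10% fraction, the seed, and the toy examples are all hypothetical choices for illustration. If the data is a Hugging Face Dataset, its train_test_split method serves the same purpose.

```python
import random


def split_train_validation(examples, validation_fraction=0.1, seed=42):
    """Shuffle a copy of the examples and hold out a fraction for validation."""
    if not 0.0 < validation_fraction < 1.0:
        raise ValueError("validation_fraction must be between 0 and 1")
    shuffled = list(examples)              # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    n_val = max(1, int(len(shuffled) * validation_fraction))
    return shuffled[n_val:], shuffled[:n_val]


# Toy data standing in for real instruction/response pairs
examples = [{"instruction": f"task {i}", "output": f"answer {i}"} for i in range(20)]
train_examples, validation_examples = split_train_validation(examples)
```

Fixing the seed keeps the split identical across runs, so eval_loss curves from different experiments remain comparable.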

# Assume we have a validation dataset
eval_dataset = ... # Similar to training set, unseen during training

# Modify trainer
trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,           # Add evaluation dataset
    formatting_func=formatting_prompts_func,
    max_seq_length=512,
    tokenizer=tokenizer,
    packing=False,
)

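Evaluation runs every eval_steps steps when the evaluation strategy in TrainingArguments is set to "steps" (the parameter is eval_strategy, named evaluation_strategy in older transformers releases), and each run records eval_loss. Custom metrics (e.g., ROUGE, BLEU, accuracy) can be added by passing a compute_metrics function to the trainer.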
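To make the metrics more concrete, here is a small sketch of two quantities often derived at evaluation time: next-token accuracy and perplexity. Both helper names are hypothetical. The accuracy function assumes predictions have already been reduced to token ids (for example by taking the argmax of the logits) and are given as plain nested lists; label positions set to -100, the ignore index Hugging Face uses for masked labels, are skipped. The perplexity function assumes eval_loss is the mean per-token cross-entropy in nats. Wiring this into a real compute_metrics function, which receives arrays from the trainer, is not shown.

```python
import math


def token_accuracy(pred_ids, label_ids, ignore_index=-100):
    """Fraction of next-token predictions that match the labels.

    For a causal language model, the prediction at position i is scored
    against the label at position i + 1, hence the one-step shift.
    """
    correct = 0
    total = 0
    for pred_row, label_row in zip(pred_ids, label_ids):
        for pred, label in zip(pred_row[:-1], label_row[1:]):
            if label == ignore_index:
                continue  # masked position (e.g. prompt or padding)
            total += 1
            correct += int(pred == label)
    return correct / total if total else 0.0


def perplexity_from_loss(eval_loss):
    """Convert a mean cross-entropy loss (in nats) to perplexity."""
    return math.exp(eval_loss)


# Toy example: one sequence of four positions
preds = [[5, 6, 7, 0]]
labels = [[-100, 5, 6, 9]]
accuracy = token_accuracy(preds, labels)
```

Perplexity is often easier to compare across runs than raw loss, since a value of 1.0 corresponds to a model that predicts every token with certainty.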

8.3 Manual Post-Training Evaluation

After training, test the model with new prompts.

def generate_response(instruction, input_text=""):
    prompt = f"### Instruction:\n{instruction}\n\n### Input:\n{input_text}\n\n### Response:\n"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # do_sample=True is required for temperature and top_p to take effect;
    # without it, generate() decodes greedily and ignores both settings
    outputs = model.generate(
        **inputs, max_new_tokens=200, do_sample=True, temperature=0.7, top_p=0.9
    )
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    # Extract only the response (after "### Response:")
    response_text = response.split("### Response:")[-1].strip()
    return response_text

# Test
instruction = "Write a short description for a technology product."
input_text = "Product: Smartwatch with GPS and heart rate monitor. Price: $199.99."

print(generate_response(instruction, input_text))
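Checking a single prompt by eye gives only a rough impression, so it can help to run a small fixed set of prompts and keep the outputs for side-by-side comparison between checkpoints. The sketch below is one possible harness, not part of the original workflow: run_manual_eval, fake_generate, and the word-count field are hypothetical. The stand-in generator exists only so the example runs without a loaded model; in practice, generate_response would be passed instead.

```python
def run_manual_eval(generate_fn, test_cases):
    """Run generate_fn on each test case and collect the responses."""
    results = []
    for case in test_cases:
        response = generate_fn(case["instruction"], case.get("input_text", ""))
        results.append({
            **case,
            "response": response,
            "num_words": len(response.split()),  # rough length check
        })
    return results


def fake_generate(instruction, input_text=""):
    """Stand-in for generate_response so the sketch runs without a model."""
    return f"Echo: {instruction} {input_text}".strip()


test_cases = [
    {
        "instruction": "Write a short description for a technology product.",
        "input_text": "Product: Smartwatch with GPS and heart rate monitor. Price: $199.99.",
    },
    {"instruction": "Summarize the benefits of regular exercise."},
]

results = run_manual_eval(fake_generate, test_cases)
for row in results:
    print(row["instruction"], "->", row["response"])
```

Because sampling is stochastic, outputs from the real model will differ between runs unless a seed is fixed or sampling is disabled.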