Estimated duration of this module: 1.5 - 2 hours
Objective: Build a complete extractive question-answering system that takes a context text and a question, and returns the most probable answer extracted directly from the text.
Requirements: Only what you learned in previous modules + a code editor (or Google Colab).
Before we start, let’s clarify the type of QA we’ll build.
🔹 Generative QA:
The model invents a new answer, in its own words.
Question: “What is a Transformer?”
Generated Answer: “A Transformer is a neural network architecture that uses attention mechanisms to process sequences...”
🔹 Extractive QA (what we’ll do):
The model extracts a literal fragment from the context text.
Question: “What is a Transformer?”
Context: “...the Transformer, introduced in 2017, is an architecture based on attention that processes all words simultaneously...”
Extracted Answer: “an architecture based on attention that processes all words simultaneously”
✅ Advantages of Extractive QA:
- The answer always comes verbatim from the source text, so the model cannot invent facts.
- It is verifiable: the start/end positions point to exactly where the answer appears.
- It works well with small, fast encoder-only models; no text generation is needed.
For this project, we’ll use an encoder-only model, fine-tuned specifically for extractive QA.
🔹 Chosen Model: deepset/roberta-base-squad2
🌐 You can view it on the Model Hub: https://huggingface.co/deepset/roberta-base-squad2
First, install the required libraries (if you haven’t already: pip install transformers torch) and load the model and tokenizer.
from transformers import AutoTokenizer, AutoModelForQuestionAnswering, pipeline
# Model name
model_name = "deepset/roberta-base-squad2"
# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForQuestionAnswering.from_pretrained(model_name)
# Optional: create a pipeline to simplify
qa_pipeline = pipeline("question-answering", model=model, tokenizer=tokenizer)
We’ll use an example text. It can be anything: an article, a book excerpt, a manual, etc.
context = """
Transformers are a neural network architecture introduced in 2017 by Vaswani et al.
in the paper "Attention Is All You Need". Unlike recurrent networks,
Transformers process all words in a sequence simultaneously,
using a mechanism called "attention" that allows each word to relate
to any other in the sentence. This architecture is the basis for models like
BERT, GPT, T5, and many others dominating natural language processing today.
"""
question = "What mechanism do Transformers use to relate words?"
result = qa_pipeline(question=question, context=context)
print("Question:", question)
print("Answer:", result['answer'])
print("Score:", result['score'])
print("Start:", result['start'])
print("End:", result['end'])
Expected output (your exact score and character offsets may differ slightly):
Question: What mechanism do Transformers use to relate words?
Answer: attention
Score: 0.9321
Start: 234
End: 242
It works! The model extracted the word “attention” as the answer.
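The start and end fields are character offsets into the context string, so you can slice the context yourself to recover the answer:
# start/end are character positions in `context`, so this slice
# reproduces the extracted answer
print(context[result['start']:result['end']])  # attention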
Now, let’s do it step by step, as in Module 5, to see what happens internally.
import torch

# Tokenize question + context (together)
inputs = tokenizer(question, context, return_tensors="pt", truncation=True)
# Pass through the model
outputs = model(**inputs)
# Get start and end logits
start_logits = outputs.start_logits
end_logits = outputs.end_logits
# Find positions with highest probability
start_index = torch.argmax(start_logits)
end_index = torch.argmax(end_logits)
# Convert tokens back to text
answer_tokens = inputs.input_ids[0][start_index:end_index + 1]
answer = tokenizer.decode(answer_tokens, skip_special_tokens=True).strip()
print("Answer (manual):", answer)
Output:
Answer (manual): attention
🔹 What does the model do?
It predicts two things:
- start_logits: for each token, a score for being the start of the answer.
- end_logits: for each token, a score for being the end of the answer.
Then, it extracts all tokens between those two positions.
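You can confirm this yourself: each logits tensor holds one score per token of the (question + context) input from the previous code block:
# Each logits tensor has shape (batch_size, sequence_length)
print(start_logits.shape)  # one "start" score per input token
print(end_logits.shape)    # one "end" score per input token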
Sometimes, the model can make mistakes if it only takes the start_index and end_index with the highest score. A better practice is to consider valid combinations (start <= end) and choose the one with the highest combined score.
import torch

def get_best_answer(start_logits, end_logits, input_ids, tokenizer, top_k=5):
    # Work with the first (and only) item in the batch
    start_logits = start_logits[0]
    end_logits = end_logits[0]
    # Get top_k candidate indices for start and end
    start_probs, start_indices = torch.topk(start_logits, top_k)
    end_probs, end_indices = torch.topk(end_logits, top_k)
    best_score = -float('inf')
    best_answer = ""
    # Test valid combinations
    for i in range(top_k):
        for j in range(top_k):
            start = start_indices[i].item()
            end = end_indices[j].item()
            if start <= end:  # valid span
                score = (start_probs[i] + end_probs[j]).item()
                if score > best_score:
                    best_score = score
                    answer_tokens = input_ids[0][start:end + 1]
                    best_answer = tokenizer.decode(answer_tokens, skip_special_tokens=True)
    return best_answer, best_score
# Use the function
answer, score = get_best_answer(start_logits, end_logits, inputs.input_ids, tokenizer)
print("Best answer:", answer)
print("Best score:", score)
This makes the system more robust against occasional errors.
Now it’s your turn to experiment! Try:
context2 = """
Generative artificial intelligence enables the creation of new content: text, images, music, code.
Models like DALL-E, Stable Diffusion, and GPT-4 are popular examples.
These models learn patterns from large datasets and then generate original outputs
based on prompts or instructions given by the user.
"""
question2 = "What kind of content can generative AI create?"
result2 = qa_pipeline(question=question2, context=context2)
print(result2['answer']) # Expected: "text, images, music, code"
Or with another example:
context_es = """
Barcelona is a city located on the Mediterranean coast of Spain.
It is known for its unique architecture, especially the works of Antoni Gaudí,
such as the Sagrada Familia and Park Güell. It is also famous for its cuisine,
beaches, and vibrant cultural life.
"""
question_en = "Which architect is famous in Barcelona?"
result_en = qa_pipeline(question=question_en, context=context_es)
print(result_en['answer']) # Expected: "Antoni Gaudí"
🔹 Limitation 1: The model can only answer if the answer is in the text.
If you ask “What is the capital of France?” and the text doesn’t mention Paris, the model will still extract some fragment from the text, usually with a low confidence score, and the answer will be wrong.
🔹 Solution:
Pass handle_impossible_answer=True to the pipeline (supported by models trained on SQuAD 2.0, like this one), and/or reject low-confidence answers with a score threshold:
if result['score'] < 0.1:
    print("I couldn't find a reliable answer in the text.")
else:
    print("Answer:", result['answer'])
🔹 Limitation 2: Context has a length limit (~512 tokens).
If the text is very long, it gets truncated and information is lost.
🔹 Solution:
Split the long document into overlapping chunks, run the QA system on each chunk, and keep the highest-scoring answer (see the sketch below). The pipeline can also handle this splitting for you through its max_seq_len and doc_stride parameters.
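A minimal manual version, assuming the qa_pipeline defined earlier (the window and stride sizes here are arbitrary illustration values):
def answer_long_text(question, long_context, window_words=250, stride_words=125):
    # Slide an overlapping window of words over the text and keep the best answer
    words = long_context.split()
    best = {"answer": "", "score": 0.0}
    for start in range(0, len(words), stride_words):
        chunk = " ".join(words[start:start + window_words])
        result = qa_pipeline(question=question, context=chunk)
        if result["score"] > best["score"]:
            best = result
    return best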
Now, a final exercise:
1. Take a Wikipedia article (or a book chapter) you like.
2. Copy 3-4 paragraphs as context.
3. Formulate 5 different questions (easy, hard, ambiguous).
4. Run the QA system and evaluate:
- How many answers are correct?
- Where does it fail?
- How could you improve it?
[Question + Context] → Tokenizer → input_ids → QA Model → start_logits + end_logits
→ Select best start-end combination → Extract tokens → Decode → [Final Answer]

- Question + Context: plain text
- Tokenizer: converts to tokens + separators
- input_ids: token IDs (question + context)
- QA Model: predicts start and end positions
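To tie the whole flow together, here is a minimal sketch of a single helper that goes from question + context to the final answer, reusing the tokenizer, model, and get_best_answer defined earlier:
def answer_question(question, context):
    # Plain text -> token IDs (question + context, with separator tokens)
    inputs = tokenizer(question, context, return_tensors="pt", truncation=True)
    # Token IDs -> one start logit and one end logit per token
    with torch.no_grad():
        outputs = model(**inputs)
    # Logits -> best valid start-end span -> decoded answer text
    return get_best_answer(outputs.start_logits, outputs.end_logits, inputs.input_ids, tokenizer)

answer, score = answer_question("What mechanism do Transformers use to relate words?", context)
print("Answer:", answer)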
Congratulations! You’ve just built a functional artificial intelligence system, based on one of the world’s most advanced architectures (the Transformer), without training anything, without expensive GPUs, and in less than 50 lines of code.
This system can be the foundation for:
- A chatbot for technical manuals.
- An assistant for reading scientific articles.
- A tutor that answers questions about a book.
And best of all: you now know how it works inside! It’s not a black box. You know what an embedding is, what attention does, how position is encoded, and how the model chooses the answer.