Understand that accuracy isn’t everything. Learn to choose, calculate, and interpret appropriate metrics based on the problem—especially in imbalanced contexts like fraud detection.
Imagine a fraud dataset in which 99% of transactions are legitimate and only 1% are fraudulent.
A model that always predicts “NO FRAUD” will have 99% accuracy… but it’s useless! It detects ZERO fraud.
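To see the trap in code, here is a minimal sketch using scikit-learn's DummyClassifier; the names X_train, y_train, X_test, y_test are placeholders for any 99/1 train/test split of such a dataset:
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, recall_score

# A "model" that always predicts the majority class (NO FRAUD)
dummy = DummyClassifier(strategy='most_frequent')
dummy.fit(X_train, y_train)
y_dummy = dummy.predict(X_test)

print("Accuracy:", accuracy_score(y_test, y_dummy))  # about 0.99 on a 99/1 split
print("Recall:  ", recall_score(y_test, y_dummy))    # 0.0, no fraud detected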
The confusion matrix makes this visible by breaking predictions down into true negatives, false positives, false negatives, and true positives:
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sns

# Predictions from the trained model (variable name kept from the previous module)
y_pred = modelo.predict(X_test)

# Rows are the actual classes, columns the predicted classes
cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=['Legitimate', 'Fraud'],
            yticklabels=['Legitimate', 'Fraud'])
plt.title("Confusion Matrix")
plt.ylabel("Actual")
plt.xlabel("Predicted")
plt.show()
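To work with the four cells programmatically, they can be unpacked with ravel(); a small sketch reusing the cm computed above (this assumes fraud is the positive class, labeled 1):
# For binary labels, confusion_matrix returns [[TN, FP], [FN, TP]]
tn, fp, fn, tp = cm.ravel()
print("True Negatives: ", tn)  # legitimate transactions correctly passed
print("False Positives:", fp)  # legitimate transactions wrongly flagged as fraud
print("False Negatives:", fn)  # frauds the model missed
print("True Positives: ", tp)  # frauds correctly detected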
Of all transactions I flagged as fraud, how many were actually fraud?
Precision = TP / (TP + FP)
✅ Important when false positive cost is high (e.g., blocking a legitimate card).
Of all actual frauds, how many did I detect?
Recall = TP / (TP + FN)
✅ CRITICAL in fraud detection. You want to minimize FN (undetected fraud).
The F1-score is the harmonic mean of precision and recall. Useful when you need a single number that balances both.
F1 = 2 * (Precision * Recall) / (Precision + Recall)
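As a sanity check, the three formulas can be computed by hand from the tn, fp, fn, tp counts unpacked earlier; a minimal sketch (the values should match the scikit-learn functions used next):
# Manual computation from the confusion-matrix counts
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * (precision * recall) / (precision + recall)
print(f"Precision: {precision:.3f}")
print(f"Recall:    {recall:.3f}")
print(f"F1-Score:  {f1:.3f}")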
from sklearn.metrics import precision_score, recall_score, f1_score
print("Precision:", precision_score(y_test, y_pred))
print("Recall: ", recall_score(y_test, y_pred))
print("F1-Score: ", f1_score(y_test, y_pred))
The ROC curve shows the trade-off between True Positive Rate (Recall) and False Positive Rate at various decision thresholds.
AUC (Area Under the Curve) summarizes the ROC curve as a single number between 0 and 1: 0.5 means the model ranks no better than random guessing, and 1.0 means a perfect classifier.
from sklearn.metrics import roc_auc_score, roc_curve
y_proba = modelo.predict_proba(X_test)[:, 1] # probability of positive class
auc = roc_auc_score(y_test, y_proba)
print("AUC-ROC:", auc)
# Plot curve
fpr, tpr, thresholds = roc_curve(y_test, y_proba)
plt.plot(fpr, tpr, label=f'ROC Curve (AUC = {auc:.2f})')
plt.plot([0,1], [0,1], 'k--', label='Random Classifier')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
plt.legend()
plt.grid()
plt.show()
✅ AUC advantage: it measures how well the model ranks fraud above legitimate transactions across all thresholds, so it is far less misleading than accuracy on imbalanced data. For heavily imbalanced problems like fraud, complement it with the precision-recall curve (see the exercise below).
Scikit-learn provides a ready-made summary of all these metrics, per class:
from sklearn.metrics import classification_report

print(classification_report(y_test, y_pred,
                            target_names=['Legitimate', 'Fraud']))
Typical output:
              precision    recall  f1-score   support

  Legitimate       0.99      1.00      0.99     19800
       Fraud       0.85      0.50      0.63       200

    accuracy                           0.99     20000
   macro avg       0.92      0.75      0.81     20000
weighted avg       0.99      0.99      0.99     20000
Dataset: fraud_preprocessed.csv (preprocessed from previous module)
Tasks:
1. Generate predictions with predict_proba and a 0.3 decision threshold.
2. Use precision_recall_curve to plot the precision-recall trade-off (useful when negatives are the majority).
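A minimal sketch of how these tasks might be approached, assuming the same modelo, X_test and y_test used throughout this section (one possible solution, not the only one):
from sklearn.metrics import precision_recall_curve, precision_score, recall_score
import matplotlib.pyplot as plt

# Task 1: predictions with a 0.3 decision threshold instead of the default 0.5
y_proba = modelo.predict_proba(X_test)[:, 1]
y_pred_03 = (y_proba >= 0.3).astype(int)
print("Precision @ 0.3:", precision_score(y_test, y_pred_03))
print("Recall    @ 0.3:", recall_score(y_test, y_pred_03))

# Task 2: precision-recall trade-off across all thresholds
precision, recall, thresholds = precision_recall_curve(y_test, y_proba)
plt.plot(recall, precision)
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.title('Precision-Recall Curve')
plt.grid()
plt.show()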