
Fraud Detection – Technical Challenge (Data Science)

Executive Summary

This project presents an end-to-end fraud detection solution developed as part of a technical assessment for a Data Scientist role at VOM. The objective is to design, evaluate, and document a machine learning model capable of identifying fraudulent card transactions under real-time decision constraints and asymmetric business costs.

The work follows the CRISP-DM methodology, covering business understanding, exploratory analysis, feature preparation, modeling, evaluation, and a proposed deployment and monitoring strategy. Two models were implemented and compared: a class-weighted Logistic Regression baseline and a Gradient Boosting model (XGBoost).

The XGBoost model achieved near-perfect performance on the synthetic dataset, with strong recall and precision while minimizing false positives. Threshold analysis highlights the importance of aligning technical decisions with business risk tolerance and operational capacity.

Beyond predictive performance, the project emphasizes production-oriented considerations such as governance, monitoring, interpretability, and scalability. This repository is intended as a practical reference for applied machine learning in risk and decision systems.

Project Overview

This repository documents a technical challenge proposed by VOM as part of a Data Scientist recruitment process. The objective is to design and evaluate a fraud detection model for card transactions using a structured data science methodology.

The project is published for educational and reference purposes, allowing candidates and practitioners to learn from the problem framing, modeling decisions, and analytical workflow.

Project Motivation

This repository documents a complete and realistic data science workflow based on a real technical challenge. Although the project originated from a recruitment process, its publication is intended for educational purposes.

Many technical assignments remain private, limiting collective learning and transparency around practical problem-solving. By open-sourcing this project, the goal is to provide a concrete reference for:

  • Structuring an end-to-end machine learning project using CRISP-DM.
  • Translating business objectives into measurable modeling goals.
  • Handling class imbalance and asymmetric cost problems.
  • Evaluating and comparing models beyond accuracy metrics.
  • Incorporating deployment and monitoring considerations early in the design process.

The repository may serve as a template or learning resource for similar fraud detection and risk modeling problems. All data used is synthetic and does not represent real production systems.

Business Context

VOM provides a low-code decision engine that enables companies to create, manage, and evolve automated decision policies (e.g., credit approval, fraud prevention, pricing, and risk management).

In this challenge, the role of the Data Scientist is to propose a fraud detection solution that could be integrated into such a decision engine. Transactions are evaluated in real time, and the model must balance fraud prevention with customer experience.

Approving a fraudulent transaction is significantly more costly than incorrectly declining a legitimate one, creating an asymmetric cost structure that directly influences model evaluation, metric selection, and decision threshold tuning.

Methodology

The project explicitly follows the CRISP-DM methodology:

  1. Business Understanding
  2. Data Understanding
  3. Data Preparation
  4. Modeling
  5. Evaluation
  6. Deployment and Monitoring (conceptual proposal)

This structure ensures traceability between business objectives, analytical decisions, and operational considerations.

Dataset Description

The dataset is synthetic and represents individual card transactions with behavioral and contextual features and a fraud label.

Features

  • distance_from_home — Distance between customer residence and transaction location.
  • distance_from_last_transaction — Distance between current and previous transaction.
  • ratio_to_median_purchase_price — Transaction amount relative to historical median.
  • repeat_retailer — Whether the merchant was previously used by the customer.
  • used_chip — Whether the card chip was used.
  • used_pin_number — Whether a PIN was used.
  • online_order — Whether the transaction occurred online.
  • fraud (target) — Binary label (1 = fraud, 0 = legitimate).
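
The schema above can be made concrete with a small pandas frame. This is a sketch with toy, illustrative rows (not the real data) just to show the expected columns and types:

```python
import pandas as pd

# Toy rows mirroring the dataset schema (values are illustrative, not real data).
df = pd.DataFrame({
    "distance_from_home": [5.2, 120.7, 0.8],
    "distance_from_last_transaction": [0.3, 85.1, 0.1],
    "ratio_to_median_purchase_price": [1.1, 6.4, 0.9],
    "repeat_retailer": [1, 0, 1],
    "used_chip": [1, 0, 1],
    "used_pin_number": [1, 0, 0],
    "online_order": [0, 1, 0],
    "fraud": [0, 1, 0],
})

# Split into model inputs and the binary target.
features = df.drop(columns="fraud")
target = df["fraud"]
print(features.shape)   # (3, 7): seven predictive features per transaction
```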

Objective

The primary goal is to maximize fraud detection recall while maintaining an acceptable false positive rate to preserve customer experience and operational efficiency.

Model evaluation and threshold selection explicitly reflect the asymmetric business cost of fraud versus false declines.
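
The two headline quantities (fraud recall and false positive rate) can be computed directly from a confusion matrix with scikit-learn. A minimal sketch using toy labels:

```python
from sklearn.metrics import confusion_matrix, recall_score

# Toy ground truth and predictions (1 = fraud, 0 = legitimate).
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 1, 0, 1, 1, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
recall = recall_score(y_true, y_pred)  # fraud caught: tp / (tp + fn)
fpr = fp / (fp + tn)                   # legitimate transactions wrongly declined
print(f"recall={recall:.2f}, false positive rate={fpr:.2f}")
```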

Disclaimer

This project uses synthetic data and a simplified business scenario intended solely for technical assessment and learning purposes. It does not represent real production systems or operational constraints.

Tech Stack

  • Python 3.x
  • pandas, numpy
  • scikit-learn
  • XGBoost
  • matplotlib, seaborn
  • Jupyter Notebook

Results Summary

Two supervised classification models were evaluated: a class-weighted Logistic Regression baseline and an XGBoost model.

Logistic Regression (Baseline)

  • ROC-AUC: ~0.98
  • Fraud Recall: ~0.95
  • Fraud Precision: ~0.58

The model achieved strong discriminative power and high fraud recall, but at ~0.58 precision nearly half of the transactions it flagged (~48%) were legitimate, i.e., a high false discovery rate. While suitable as a baseline, this behavior increases operational costs and customer friction, limiting its suitability for high-volume production environments.
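
A minimal sketch of such a class-weighted baseline. Synthetic data from `make_classification` stands in for the actual transaction features, with the ~9% fraud rate reproduced via the `weights` parameter:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score

# Imbalanced synthetic stand-in (~9% positives, as in the challenge dataset).
X, y = make_classification(n_samples=5000, n_features=7, weights=[0.91],
                           random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

# class_weight="balanced" reweights the loss inversely to class frequency,
# so the minority (fraud) class is not drowned out during fitting.
clf = LogisticRegression(class_weight="balanced", max_iter=1000)
clf.fit(X_tr, y_tr)

y_hat = clf.predict(X_te)
print(f"recall={recall_score(y_te, y_hat):.2f}, "
      f"precision={precision_score(y_te, y_hat):.2f}")
```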

XGBoost

  • ROC-AUC: ~0.999
  • Fraud Recall: ~1.00
  • Fraud Precision: ~0.98

The XGBoost model achieved near-perfect performance on the validation set, detecting almost all fraud cases while keeping false positives very low. This balance delivers strong business value by reducing financial losses and minimizing unnecessary transaction declines.

Feature importance indicates that transaction amount relative to historical behavior, distance metrics, online transactions, and merchant recurrence are strong predictive signals.

Threshold Analysis

Threshold tuning demonstrates the trade-off between recall and precision. Lower thresholds increase fraud capture but raise false positives, while higher thresholds reduce false alarms at the risk of missed fraud. Threshold selection should therefore be driven by business risk tolerance and operational capacity rather than default model settings.
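
One way to operationalize this is to sweep the precision–recall curve and pick the highest threshold that still satisfies a recall floor set by the business. A sketch on synthetic stand-in data (the 0.95 recall floor is an assumed example policy):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve

X, y = make_classification(n_samples=4000, n_features=7, weights=[0.91],
                           random_state=0)
clf = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X, y)
scores = clf.predict_proba(X)[:, 1]

precision, recall, thresholds = precision_recall_curve(y, scores)

# Example policy: highest threshold that still keeps fraud recall >= 0.95.
# (recall has one more entry than thresholds, so align with recall[:-1].)
ok = recall[:-1] >= 0.95
chosen = thresholds[ok][-1] if ok.any() else thresholds[0]
print(f"chosen threshold={chosen:.3f}")
```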

Recommended Model

XGBoost is recommended for production deployment due to its superior predictive performance, robustness to nonlinear relationships, and favorable balance between fraud prevention and customer experience.

Lessons Learned

1. Business Objectives Must Drive Metrics

Fraud detection involves asymmetric costs. Recall for fraud must be prioritized while controlling false positives. Accuracy alone is insufficient for decision-making.

2. Class Imbalance Requires Explicit Treatment

Strong class imbalance (~9% fraud) was effectively handled using class weighting. Oversampling methods should be applied cautiously and validated carefully.
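
The "balanced" heuristic simply weights each class inversely to its frequency: `n_samples / (n_classes * n_c)`. A quick check of that formula against scikit-learn's own implementation, using toy labels with the dataset's ~9% fraud rate:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Toy labels mirroring the ~9% fraud rate.
y = np.array([0] * 91 + [1] * 9)

# "balanced" weight for class c: n_samples / (n_classes * n_c).
manual = {c: len(y) / (2 * (y == c).sum()) for c in (0, 1)}
auto = compute_class_weight("balanced", classes=np.array([0, 1]), y=y)

print(manual)                     # minority class gets ~10x the weight
print(dict(zip([0, 1], auto)))
```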

3. Model Choice Impacts Operational Outcomes

While Logistic Regression offers interpretability, it produced excessive false positives. Tree-based ensembles captured nonlinear patterns more effectively and delivered superior operational performance.

4. Threshold Selection Is a Business Decision

Thresholds directly affect fraud capture, manual review workload, and customer experience. They should be tuned collaboratively with business stakeholders and monitored continuously.

5. Synthetic Data Can Mask Real-World Complexity

Near-perfect performance likely overestimates real-world behavior. Production systems face concept drift, noisy data, and adversarial dynamics, requiring continuous monitoring and retraining.
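
One common way to make such monitoring concrete is a Population Stability Index (PSI) check comparing each feature's live distribution against its training-time reference. PSI is my suggested example here, not something specified in the challenge; a minimal numpy sketch:

```python
import numpy as np

def psi(expected, actual, bins=10):
    """Population Stability Index between a reference and a live sample."""
    # Bin edges from reference quantiles; open the ends to catch outliers.
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf
    e_frac = np.histogram(expected, bins=edges)[0] / len(expected)
    a_frac = np.histogram(actual, bins=edges)[0] / len(actual)
    # Clip to avoid log(0) on empty bins.
    e_frac, a_frac = np.clip(e_frac, 1e-6, None), np.clip(a_frac, 1e-6, None)
    return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))

rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, 10_000)  # training-time distribution
shifted = rng.normal(0.5, 1.0, 10_000)    # drifted live distribution

print(f"no drift: {psi(reference, reference):.4f}")
print(f"drifted:  {psi(reference, shifted):.4f}")
```

A common rule of thumb treats PSI above ~0.2 as material drift warranting investigation or retraining.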

6. Production Readiness Goes Beyond Model Accuracy

Reliable systems require deployment pipelines, model versioning, monitoring, logging, alerting, and safe rollout strategies (e.g., shadow mode or A/B testing).

7. Interpretability Remains Important

Explainability supports regulatory compliance, operational trust, and debugging. Feature importance and model interpretation should be part of production workflows.
