
Fraud Detection – Technical Challenge (Data Science)

Executive Summary

This project presents an end-to-end fraud detection solution developed as part of a technical assessment for a Data Scientist role at VOM. The objective is to design, evaluate, and document a machine learning model capable of identifying fraudulent card transactions under real-time decision constraints and asymmetric business costs.

The work follows the CRISP-DM methodology, covering business understanding, exploratory analysis, feature preparation, modeling, evaluation, and a proposed deployment and monitoring strategy. Two models were implemented and compared: a class-weighted Logistic Regression baseline and a Gradient Boosting model (XGBoost).

The XGBoost model achieved near-perfect performance on the synthetic dataset, with strong recall and precision while minimizing false positives. Threshold analysis highlights the importance of aligning technical decisions with business risk tolerance and operational capacity.

Beyond predictive performance, the project emphasizes production-oriented considerations such as governance, monitoring, interpretability, and scalability. This repository is intended as a practical reference for applied machine learning in risk and decision systems.

Project Overview

This repository documents a technical challenge proposed by VOM as part of a Data Scientist recruitment process. The objective is to design and evaluate a fraud detection model for card transactions using a structured data science methodology.

The project is published for educational and reference purposes, allowing candidates and practitioners to learn from the problem framing, modeling decisions, and analytical workflow.

Project Motivation

This repository documents a complete and realistic data science workflow based on a real technical challenge. Although the project originated from a recruitment process, its publication is intended for educational purposes.

Many technical assignments remain private, limiting collective learning and transparency around practical problem-solving. By open-sourcing this project, the goal is to provide a concrete reference for:

  • Structuring an end-to-end machine learning project using CRISP-DM.
  • Translating business objectives into measurable modeling goals.
  • Handling class imbalance and asymmetric cost problems.
  • Evaluating and comparing models beyond accuracy metrics.
  • Incorporating deployment and monitoring considerations early in the design process.

The repository may serve as a template or learning resource for similar fraud detection and risk modeling problems. All data used is synthetic and does not represent real production systems.

Business Context

VOM provides a low-code decision engine that enables companies to create, manage, and evolve automated decision policies (e.g., credit approval, fraud prevention, pricing, and risk management).

In this challenge, the role of the Data Scientist is to propose a fraud detection solution that could be integrated into such a decision engine. Transactions are evaluated in real time, and the model must balance fraud prevention with customer experience.

Approving a fraudulent transaction is significantly more costly than incorrectly declining a legitimate one, creating an asymmetric cost structure that directly influences model evaluation, metric selection, and decision threshold tuning.

Methodology

The project explicitly follows the CRISP-DM methodology:

  1. Business Understanding
  2. Data Understanding
  3. Data Preparation
  4. Modeling
  5. Evaluation
  6. Deployment and Monitoring (conceptual proposal)

This structure ensures traceability between business objectives, analytical decisions, and operational considerations.

Dataset Description

The dataset is synthetic and represents individual card transactions with behavioral and contextual features and a fraud label.

Features

  • distance_from_home — Distance between customer residence and transaction location.
  • distance_from_last_transaction — Distance between current and previous transaction.
  • ratio_to_median_purchase_price — Transaction amount relative to historical median.
  • repeat_retailer — Whether the merchant was previously used by the customer.
  • used_chip — Whether the card chip was used.
  • used_pin_number — Whether a PIN was used.
  • online_order — Whether the transaction occurred online.
  • fraud (target) — Binary label (1 = fraud, 0 = legitimate).
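
The schema above can be made concrete with a small pandas frame. This is a sketch with toy, illustrative rows (not the real data) just to show the expected columns and types:

```python
import pandas as pd

# Toy rows mirroring the dataset schema (values are illustrative, not real data).
df = pd.DataFrame({
    "distance_from_home": [5.2, 120.7, 0.8],
    "distance_from_last_transaction": [0.3, 85.1, 0.1],
    "ratio_to_median_purchase_price": [1.1, 6.4, 0.9],
    "repeat_retailer": [1, 0, 1],
    "used_chip": [1, 0, 1],
    "used_pin_number": [1, 0, 0],
    "online_order": [0, 1, 0],
    "fraud": [0, 1, 0],
})

# Split into model inputs and the binary target.
features = df.drop(columns="fraud")
target = df["fraud"]
print(features.shape)   # (3, 7): seven predictive features per transaction
```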

Objective

The primary goal is to maximize fraud detection recall while maintaining an acceptable false positive rate to preserve customer experience and operational efficiency.

Model evaluation and threshold selection explicitly reflect the asymmetric business cost of fraud versus false declines.
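
The two headline quantities (fraud recall and false positive rate) can be computed directly from a confusion matrix with scikit-learn. A minimal sketch using toy labels:

```python
from sklearn.metrics import confusion_matrix, recall_score

# Toy ground truth and predictions (1 = fraud, 0 = legitimate).
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 1, 0, 1, 1, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
recall = recall_score(y_true, y_pred)  # fraud caught: tp / (tp + fn)
fpr = fp / (fp + tn)                   # legitimate transactions wrongly declined
print(f"recall={recall:.2f}, false positive rate={fpr:.2f}")
```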

Disclaimer

This project uses synthetic data and a simplified business scenario intended solely for technical assessment and learning purposes. It does not represent real production systems or operational constraints.

Tech Stack

  • Python 3.x
  • pandas, numpy
  • scikit-learn
  • XGBoost
  • matplotlib, seaborn
  • Jupyter Notebook

Results Summary

Two supervised classification models were evaluated: a class-weighted Logistic Regression baseline and an XGBoost model.

Logistic Regression (Baseline)

  • ROC-AUC: ~0.98
  • Fraud Recall: ~0.95
  • Fraud Precision: ~0.58

The model achieved strong discriminative power and high fraud recall, but at ~0.58 precision nearly half of the transactions it flagged (~48%) were legitimate, i.e., a high false discovery rate. While suitable as a baseline, this behavior increases operational costs and customer friction, limiting its suitability for high-volume production environments.
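
A minimal sketch of such a class-weighted baseline. Synthetic data from `make_classification` stands in for the actual transaction features, with the ~9% fraud rate reproduced via the `weights` parameter:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score

# Imbalanced synthetic stand-in (~9% positives, as in the challenge dataset).
X, y = make_classification(n_samples=5000, n_features=7, weights=[0.91],
                           random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

# class_weight="balanced" reweights the loss inversely to class frequency,
# so the minority (fraud) class is not drowned out during fitting.
clf = LogisticRegression(class_weight="balanced", max_iter=1000)
clf.fit(X_tr, y_tr)

y_hat = clf.predict(X_te)
print(f"recall={recall_score(y_te, y_hat):.2f}, "
      f"precision={precision_score(y_te, y_hat):.2f}")
```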

XGBoost

  • ROC-AUC: ~0.999
  • Fraud Recall: ~1.00
  • Fraud Precision: ~0.98

The XGBoost model achieved near-perfect performance on the validation set, detecting almost all fraud cases while keeping false positives very low. This balance delivers strong business value by reducing financial losses and minimizing unnecessary transaction declines.

Feature importance indicates that transaction amount relative to historical behavior, distance metrics, online transactions, and merchant recurrence are strong predictive signals.

Threshold Analysis

Threshold tuning demonstrates the trade-off between recall and precision. Lower thresholds increase fraud capture but raise false positives, while higher thresholds reduce false alarms at the risk of missed fraud. Threshold selection should therefore be driven by business risk tolerance and operational capacity rather than default model settings.
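
One way to operationalize this is to sweep the precision–recall curve and pick the highest threshold that still satisfies a recall floor set by the business. A sketch on synthetic stand-in data (the 0.95 recall floor is an assumed example policy):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve

X, y = make_classification(n_samples=4000, n_features=7, weights=[0.91],
                           random_state=0)
clf = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X, y)
scores = clf.predict_proba(X)[:, 1]

precision, recall, thresholds = precision_recall_curve(y, scores)

# Example policy: highest threshold that still keeps fraud recall >= 0.95.
# (recall has one more entry than thresholds, so align with recall[:-1].)
ok = recall[:-1] >= 0.95
chosen = thresholds[ok][-1] if ok.any() else thresholds[0]
print(f"chosen threshold={chosen:.3f}")
```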

Recommended Model

XGBoost is recommended for production deployment due to its superior predictive performance, robustness to nonlinear relationships, and favorable balance between fraud prevention and customer experience.

Lessons Learned

1. Business Objectives Must Drive Metrics

Fraud detection involves asymmetric costs. Recall for fraud must be prioritized while controlling false positives. Accuracy alone is insufficient for decision-making.

2. Class Imbalance Requires Explicit Treatment

Strong class imbalance (~9% fraud) was effectively handled using class weighting. Oversampling methods should be applied cautiously and validated carefully.
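
The "balanced" heuristic simply weights each class inversely to its frequency: `n_samples / (n_classes * n_c)`. A quick check of that formula against scikit-learn's own implementation, using toy labels with the dataset's ~9% fraud rate:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Toy labels mirroring the ~9% fraud rate.
y = np.array([0] * 91 + [1] * 9)

# "balanced" weight for class c: n_samples / (n_classes * n_c).
manual = {c: len(y) / (2 * (y == c).sum()) for c in (0, 1)}
auto = compute_class_weight("balanced", classes=np.array([0, 1]), y=y)

print(manual)                     # minority class gets ~10x the weight
print(dict(zip([0, 1], auto)))
```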

3. Model Choice Impacts Operational Outcomes

While Logistic Regression offers interpretability, it produced excessive false positives. Tree-based ensembles captured nonlinear patterns more effectively and delivered superior operational performance.

4. Threshold Selection Is a Business Decision

Thresholds directly affect fraud capture, manual review workload, and customer experience. They should be tuned collaboratively with business stakeholders and monitored continuously.

5. Synthetic Data Can Mask Real-World Complexity

Near-perfect performance likely overestimates real-world behavior. Production systems face concept drift, noisy data, and adversarial dynamics, requiring continuous monitoring and retraining.
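
One common way to make such monitoring concrete is a Population Stability Index (PSI) check comparing each feature's live distribution against its training-time reference. PSI is my suggested example here, not something specified in the challenge; a minimal numpy sketch:

```python
import numpy as np

def psi(expected, actual, bins=10):
    """Population Stability Index between a reference and a live sample."""
    # Bin edges from reference quantiles; open the ends to catch outliers.
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf
    e_frac = np.histogram(expected, bins=edges)[0] / len(expected)
    a_frac = np.histogram(actual, bins=edges)[0] / len(actual)
    # Clip to avoid log(0) on empty bins.
    e_frac, a_frac = np.clip(e_frac, 1e-6, None), np.clip(a_frac, 1e-6, None)
    return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))

rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, 10_000)  # training-time distribution
shifted = rng.normal(0.5, 1.0, 10_000)    # drifted live distribution

print(f"no drift: {psi(reference, reference):.4f}")
print(f"drifted:  {psi(reference, shifted):.4f}")
```

A common rule of thumb treats PSI above ~0.2 as material drift warranting investigation or retraining.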

6. Production Readiness Goes Beyond Model Accuracy

Reliable systems require deployment pipelines, model versioning, monitoring, logging, alerting, and safe rollout strategies (e.g., shadow mode or A/B testing).

7. Interpretability Remains Important

Explainability supports regulatory compliance, operational trust, and debugging. Feature importance and model interpretation should be part of production workflows.
