costakevinn/costakevinn

Kevin Mota da Costa

Machine Learning & Data Engineer

Brazil

Building statistically grounded machine learning systems and structured data platforms for real-world problems.

Portfolio • LinkedIn • Email


About

I design and implement machine learning and data systems with a strong foundation in statistics, optimization, and relational data architecture.

My work spans:

  • Probabilistic modeling and uncertainty-aware learning
  • Deep learning systems built from first principles
  • Bayesian inference and MCMC sampling
  • Time series modeling on real-world datasets
  • Structured SQL data platforms with integrity enforcement

I combine mathematical rigor with production-style engineering to build reliable, reproducible ML workflows.


Engineering Philosophy

My approach to ML and Data Engineering is guided by:

  • Statistical rigor over unnecessary model complexity
  • Reproducibility over ad-hoc experimentation
  • Data architecture as part of the modeling lifecycle
  • Validation and reconciliation as first-class system components

I treat models and databases as systems — not scripts.


Selected Projects

Machine Learning & Probabilistic Systems

  • FilinGPT: Byte-level financial language model built from scratch in NumPy with structured ETL and training pipeline.
  • ProbNN: Heteroscedastic probabilistic neural network for uncertainty-aware regression using likelihood-based optimization.
  • GPredict: Gaussian Process regression framework implementing Bayesian non-parametric modeling and posterior inference.
  • ParamInsight: Custom Metropolis–Hastings MCMC engine for Bayesian parameter inference and posterior diagnostics.
  • Probabilistic ML Thesis: Unified probabilistic ML pipeline integrating neural networks, Gaussian processes, Bayesian inference, and MCMC sampling.
  • OptLearn: Numerical optimization framework benchmarking SGD, Momentum, RMSProp, and Adam using finite-difference gradients.
  • Time Series Distance Estimation: Large-scale irregular time series processing and regression pipeline validated against benchmark datasets.

Data Engineering & Analytical Systems

  • ChinookAnalytics: Layered SQL analytical platform (stg → core → marts) with financial reconciliation and executive reporting.
  • RetailSQL: Normalized relational data platform enforcing business rules and integrity at the storage layer.

Core Competencies

Machine Learning

  • Supervised Learning (Regression & Classification)
  • Deep Learning & Neural Networks
  • Probabilistic Modeling & Uncertainty Quantification
  • Gaussian Processes & Bayesian Inference
  • Time Series Analysis
  • Likelihood-Based Optimization

Data Engineering & Analytics

  • Relational Modeling (3NF)
  • SQL Data Architecture
  • ETL / ELT Pipelines
  • Data Validation & Reconciliation
  • Analytics Engineering

Systems & Optimization

  • End-to-End ML Pipelines
  • Reproducible Experimentation
  • Modular Architecture
  • Gradient-Based Optimization
  • Performance & Numerical Stability

Tech Stack

Machine Learning: PyTorch • TensorFlow • Keras • Scikit-learn

Scientific Computing: NumPy • SciPy • Pandas • Matplotlib

Data & Databases: SQL • PostgreSQL

Infrastructure & Workflow: Docker • Git • Linux • Jupyter


Education

B.Sc. in Physics
Federal University of Espírito Santo (UFES), Brazil (2018–2023)
Thesis in Probabilistic Machine Learning and Statistical Modeling. CNPq-funded applied research in predictive modeling and time series analysis.

Technical Degree in IT Support & Systems
Federal Institute of Espírito Santo (IFES), Brazil (2016–2017)
Training in systems architecture, infrastructure, and structured technical problem-solving.

