Glad to have you on board for our predictive modeling project! While you get familiar with it, here are a few things I'd like you all to brush up on:
For this project, I plan to start with Linear Regression, as it is a fundamental part of machine learning and also one of the easier techniques to grasp.
Resource: Codecademy - Simple Linear Regression Course (free!)
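If you want a feel for what simple linear regression actually computes before starting the course, here is a minimal sketch in plain Python. It fits a straight line by ordinary least squares on toy data I made up for illustration; the function names are hypothetical and this is not the project's model code.

```python
from statistics import mean


def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel out.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(x, slope, intercept):
    """Predict y for a single x using the fitted line."""
    return slope * x + intercept


# Toy data generated from y = 2x + 1 (illustrative only, not project data).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]
slope, intercept = fit_simple_linear_regression(xs, ys)
```

On real, noisy data the fitted line will not pass through every point; the slope and intercept are simply the values that minimize the squared prediction error.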
I can't stress this one enough. Understanding how to use Git is absolutely fundamental, so much so that I want to take time during our next standup to show you all the workflow we're looking for.
IF YOU'RE UNFAMILIAR WITH GIT: Please check out this resource. It's a great interactive tool that will teach you what Git is and why it is crucial to our project - Learn Git Branching
General workflow:
create feature branch → make edits → pull from main → open pull request → merge → repeat
Don't forget to pull from main regularly to stay up to date.
Important: Commit Message Standards
Please follow this format for all commit messages:
[COMMIT TYPE]: [COMMIT MESSAGE]
Commit types:
- feat - New features
- fix - Bug fixes
- doc - Documentation changes
- refactor - Code refactoring
- test - Test additions or modifications
Example: feat: add data preprocessing pipeline
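If you'd like to sanity-check a commit message before committing, the format above can be sketched as a small Python check. This is only an illustration: the function name is hypothetical, and the single space after the colon and the lowercase type names are assumptions taken from the example line, not rules stated elsewhere.

```python
import re

# Commit types listed in the standards above.
COMMIT_TYPES = ("feat", "fix", "doc", "refactor", "test")

# "<type>: <message>": assumes one space after the colon and a non-empty message.
COMMIT_PATTERN = re.compile(
    r"^(?P<type>" + "|".join(COMMIT_TYPES) + r"): (?P<message>\S.*)$"
)


def is_valid_commit_message(subject: str) -> bool:
    """Return True if the first line of a commit message follows the format."""
    return COMMIT_PATTERN.match(subject) is not None
```

For example, `is_valid_commit_message("feat: add data preprocessing pipeline")` passes, while a message with an unlisted type such as `"feature: add pipeline"` or with no type at all does not.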
Pretty obvious and not too hard. There are lots of great resources out there, but this one is a personal favorite of mine:
Resource: futurecoder.io
We have automated code quality checks that run on every pull request. Here's what you need to know:
Every time you open a PR, GitHub Actions will automatically run:
- Ruff format - Ensures code formatting consistency
- Ruff lint - Checks code quality and style
- mypy - Type checking for better code reliability
- bandit - Security scanning
- pytest - Runs all tests
Install the development tools:
pip install ruff mypy pytest bandit pre-commit
Install pre-commit hooks (highly recommended - runs checks before each commit):
pre-commit install
Before pushing your code, you can run these commands to catch issues early:
ruff format . # Auto-format your code
ruff check --fix . # Lint and auto-fix issues
mypy . # Check types
pytest # Run tests
Or run all pre-commit hooks manually:
pre-commit run --all-files
All tests go in the tests/ directory. Example test structure:
def test_data_preprocessing():
    """Test that preprocessing removes null values."""
    raw_data = load_sample_data()
    processed = preprocess(raw_data)
    assert not processed.isnull().any().any()
For ML projects, focus on testing:
- Data preprocessing functions
- Model input/output validation
- Feature engineering logic
- Edge cases and data quality
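To make the list above more concrete, here is a rough sketch of what input validation and edge-case tests could look like. `validate_features` and `raises_value_error` are hypothetical helpers written just for this example, not functions from our codebase, and the rules they enforce (fixed row width, no NaN or infinity) are assumptions.

```python
import math


def validate_features(rows, expected_width):
    """Hypothetical input check: reject empty input, ragged rows, non-finite values."""
    if not rows:
        raise ValueError("no rows to score")
    for i, row in enumerate(rows):
        if len(row) != expected_width:
            raise ValueError(
                f"row {i} has {len(row)} features, expected {expected_width}"
            )
        if any(not math.isfinite(value) for value in row):
            raise ValueError(f"row {i} contains NaN or infinity")
    return rows


def raises_value_error(func, *args):
    """Return True if func(*args) raises ValueError (stdlib stand-in for pytest.raises)."""
    try:
        func(*args)
    except ValueError:
        return True
    return False


def test_accepts_well_formed_input():
    rows = [[1.0, 2.0], [3.0, 4.0]]
    assert validate_features(rows, 2) == rows


def test_rejects_empty_input():
    assert raises_value_error(validate_features, [], 2)


def test_rejects_wrong_width():
    assert raises_value_error(validate_features, [[1.0, 2.0, 3.0]], 2)


def test_rejects_missing_values():
    assert raises_value_error(validate_features, [[1.0, float("nan")]], 2)
```

In the actual tests/ directory you would normally use `pytest.raises(ValueError)` as a context manager instead of the small helper; the helper only keeps this sketch free of third-party imports.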
This is to be updated...
TBD!