This is the content for the talk "Going from Notebooks to Scalable Systems", presented at PyCon US 2025.
The data used in this talk is the "Palmer Penguins" dataset.
I'm the author of "Software Engineering for Data Scientists", published by O'Reilly Media in 2024. It's a guide for data professionals to level up their Python coding skills, especially for early- to mid-career folks. You can read much more about the topics in this talk in my book!
- Read it on the O'Reilly Platform
- Buy it on Amazon
- Buy it from your local bookstore
from_notebooks.pdf contains the slides for the PyCon 2025 talk.
penguins_notebook.ipynb is a typical "data science" style notebook, with few functions and many instances where the data is displayed. It contains a training pipeline for a model to predict penguin species.
penguins_refactored.py is the same code, but refactored into a script that is robust and reproducible.
test_penguins_refactored.py contains unit tests for the functions in penguins_refactored.py.
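To give a flavor of the notebook-to-script refactor and its tests, here is a minimal sketch. It is not the actual content of penguins_refactored.py: the function name `drop_incomplete_rows` and the sample rows are hypothetical, and plain dictionaries stand in for a DataFrame so the example has no dependencies. The column names are assumed to follow the Palmer Penguins dataset.

```python
from typing import Any

# Assumed column names; adjust to whatever the real pipeline requires.
REQUIRED_COLUMNS = ("species", "bill_length_mm", "body_mass_g")


def drop_incomplete_rows(
    rows: list[dict[str, Any]],
    required: tuple[str, ...] = REQUIRED_COLUMNS,
) -> list[dict[str, Any]]:
    """Return only the rows with a non-missing value for every required column."""
    return [
        row
        for row in rows
        if all(row.get(column) is not None for column in required)
    ]


def test_drop_incomplete_rows() -> None:
    # Made-up sample rows: one complete, one with a None, one missing a key.
    rows = [
        {"species": "Adelie", "bill_length_mm": 39.1, "body_mass_g": 3750},
        {"species": "Gentoo", "bill_length_mm": None, "body_mass_g": 5000},
        {"species": "Chinstrap", "body_mass_g": 3700},
    ]
    cleaned = drop_incomplete_rows(rows)
    assert [row["species"] for row in cleaned] == ["Adelie"]
    # The input list is left untouched, which keeps the step reproducible.
    assert len(rows) == 3
```

With pytest's default discovery, a function named `test_*` in a file named `test_*.py` is collected automatically when you run `pytest` in the project directory.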
requirements.txt contains the dependencies for this code.
The folder files_for_presentation contains the draft files I used to prepare the slides.
- Jupytext can convert notebooks to paired Python scripts.
- Use nbconvert to convert a notebook to a script.
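The usual command-line forms are `jupyter nbconvert --to script penguins_notebook.ipynb` and `jupytext --to py penguins_notebook.ipynb`. To show roughly what such a conversion amounts to, here is a simplified standard-library sketch that pulls the code cells out of a notebook's JSON. It is not how either tool is implemented; the function name `notebook_to_script` and the tiny in-memory notebook are made up for illustration.

```python
import json


def notebook_to_script(notebook_json: str) -> str:
    """Concatenate the code cells of a notebook (given as JSON text) into one script."""
    notebook = json.loads(notebook_json)
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        # The notebook format stores source as a string or a list of strings.
        if isinstance(source, list):
            source = "".join(source)
        if source.strip():
            chunks.append(source.rstrip())
    return "\n\n".join(chunks) + "\n"


# Hypothetical notebook with one markdown cell and two code cells.
example = json.dumps(
    {
        "cells": [
            {"cell_type": "markdown", "source": ["# Penguins"]},
            {"cell_type": "code", "source": ["import math\n", "x = math.sqrt(16)"]},
            {"cell_type": "code", "source": "print(x)"},
        ],
        "nbformat": 4,
        "nbformat_minor": 5,
    }
)
script = notebook_to_script(example)
print(script)
```

The real tools do considerably more, such as carrying markdown cells over as comments and handling IPython magics, so prefer them for actual conversions.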
Another great perspective on this topic: https://transferlab.ai/trainings/beyond-jupyter/
How to make great slides: https://ines.io/blog/beginners-guide-beautiful-slides-talks/