catherinenelson1/from_notebooks_to_scalable

This is the content for the talk "Going from Notebooks to Scalable Systems", presented at PyCon US 2025.

The data used in this talk is the "Palmer Penguins" dataset.

About me

I'm the author of "Software Engineering for Data Scientists", published by O'Reilly Media in 2024. It's a guide for data professionals to level up their Python coding skills, especially for early- to mid-career folks. You can read much more about the topics in this talk in my book!

Files in this repository

from_notebooks.pdf contains the slides for the PyCon 2025 talk.

penguins_notebook.ipynb is a typical "data science" style notebook, with few functions and many instances where the data is displayed. It contains a training pipeline for a model to predict penguin species.

penguins_refactored.py is the same code, but refactored into a script that is robust and reproducible.

test_penguins_refactored.py contains unit tests for the functions in penguins_refactored.py.

requirements.txt contains the dependencies for this code.

The folder files_for_presentation contains the draft files I used to prepare the slides.
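
To illustrate the pattern these files follow (logic moved out of notebook cells into small functions, each with a unit test), here is a minimal sketch. The function `drop_incomplete_rows`, its test, and the sample values are hypothetical and are not taken from penguins_refactored.py; the sketch uses plain Python records rather than whatever data structures the real script uses.

```python
def drop_incomplete_rows(rows):
    """Return only the records that have no missing (None) values.

    Hypothetical helper: stands in for a cleaning step that might
    otherwise live inline in a notebook cell.
    """
    return [
        row for row in rows
        if all(value is not None for value in row.values())
    ]


def test_drop_incomplete_rows():
    # Made-up records, loosely shaped like the Palmer Penguins columns.
    rows = [
        {"species": "Adelie", "bill_length_mm": 39.1},
        {"species": "Gentoo", "bill_length_mm": None},
    ]

    cleaned = drop_incomplete_rows(rows)

    # The record with a missing measurement is removed.
    assert cleaned == [{"species": "Adelie", "bill_length_mm": 39.1}]
    # The input list is left unchanged.
    assert len(rows) == 2
```

Because the test is a plain function built on `assert`, a test runner such as pytest can discover and run it, and it can also be called directly.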

Links and references

Tools in this talk

  • Jupytext can convert notebooks to paired Python scripts.

  • Use nbconvert to convert a notebook to a script.
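
In practice these tools are usually run from the command line, for example `jupyter nbconvert --to script penguins_notebook.ipynb` or `jupytext --to py penguins_notebook.ipynb`. To show roughly what such a conversion involves, here is a minimal sketch using only the standard library. It is not how either tool is implemented: `notebook_to_script` and the tiny in-memory notebook are hypothetical, and the sketch ignores outputs, metadata, and cell magics.

```python
import json


def notebook_to_script(notebook_json: str) -> str:
    """Join a notebook's code cells into one script.

    Markdown cells are kept as comments. Hypothetical helper for
    illustration only; real tools handle many more cases.
    """
    notebook = json.loads(notebook_json)
    chunks = []
    for cell in notebook.get("cells", []):
        source = cell.get("source", "")
        # The notebook format stores source as a string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        if cell.get("cell_type") == "code":
            chunks.append(source)
        elif cell.get("cell_type") == "markdown":
            commented = "\n".join("# " + line for line in source.splitlines())
            chunks.append(commented)
    return "\n\n".join(chunks) + "\n"


# A tiny made-up notebook with one markdown cell and one code cell.
example_notebook = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["Load the data"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": ["x = 1\n", "y = x + 1"],
        },
    ],
})

script = notebook_to_script(example_notebook)
print(script)
```

The result is a flat script with cells in notebook order, which is a starting point for refactoring into functions rather than a finished, reproducible program.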

Further reading

Another great perspective on this topic: https://transferlab.ai/trainings/beyond-jupyter/

How to make great slides: https://ines.io/blog/beginners-guide-beautiful-slides-talks/
