diff --git a/STYLE.md b/STYLE.md index 9c59746ff..ee4c1a683 100644 --- a/STYLE.md +++ b/STYLE.md @@ -10,25 +10,94 @@ TLDR: We want to decouple metrics and analysis from the core codebase as much as possible. Metric- and figure-generation code should be encapsulated in `spd/metrics.py` and `spd/figures.py`. +### Pure Functions & I/O +- Keep I/O at the edges of your code +- Make as many functions as possible pure (no side effects, deterministic output from inputs) +- Push file operations, logging, and network calls to the highest reasonable level in the call stack + +### Data Structures +- **Don't use bare dictionaries for heterogeneous structures** + - **Good**: `{user_id: user_data}` - homogeneous values indexed by keys + - **Bad**: `{"tokens": [...], "loss": 0.5, "metadata": {...}}` - use a dataclass or typed dict instead +- This makes code more maintainable and catches errors at type-check time rather than runtime + ### Fail Fast (Negative Space Programming) Code should fail immediately when assumptions are violated, preventing bugs from propagating. -If there's an assumption you're making while writing code, assert it. -- If you were right, then it won't matter -- If you were wrong, then the code **should** fail. +**Assert liberally:** +- If you have an invariant in your head, assert it +- If you're afraid to assert, your program might already be broken +- Never soft-fail or recover silently from unexpected states + +**Don't handle errors gracefully without good reason:** +- If your program isn't working correctly, it shouldn't be running—you should be fixing it +- Exceptions should be exceptional; let the program crash and surface the real issue ## Type Annotations - Use jaxtyping for tensor shapes (though for now we don't do runtime checking) -- Always use the PEP 604 typing format of `|` for unions and `type | None` over `Optional`. +- Always use the PEP 604 typing format of `|` for unions and `type | None` over `Optional` - Use `dict`, `list` and `tuple` not `Dict`, `List` and `Tuple` -- Don't add type annotations when they're redundant. (i.e. `my_thing: Thing = Thing()` or `name: str = "John Doe"`) +- Don't add type annotations when they're redundant (i.e. `my_thing: Thing = Thing()` or `name: str = "John Doe"`) - Prefer `match` over `if/elif/else` chains when dispatching on conditions - more declarative and makes cases explicit +### Encode Invariants in Types +Write your invariants into types as much as possible to offload validation to the type checker: + +- **Joint dependencies**: If you need both `a` and `b` together, or neither: + ```python + # Bad: independently optional + a: str | None + b: int | None + + # Good: coupled in the type system + config: tuple[str, int] | None + ``` + +- **Make invalid states unrepresentable**: If tracking dependent states requires lots of assertions, encode the dependency in types instead + +### Distinguish None from Empty Collections +Differentiate "no data" from "empty data" when semantically meaningful: + +```python +# Bad: ambiguous +def process(items: list[str] | None = None): ... + +# Good: explicit semantics +def process(items: list[str] | None): + # None = data not loaded yet + # [] = loaded but empty + ... +``` + ## Tensor Operations -- Try to use einops by default for clarity. +- Try to use einops by default for clarity - Assert shapes liberally - Document complex tensor manipulations +## Function Design + +### Default Arguments +Default arguments are a code smell more often than they're useful: + +- You should have a **very good reason** for having a default value for an argument +- Especially problematic when a caller also defaults to the same thing—this creates hidden coupling +- Keep defaults high in the call stack rather than deep in implementation details +- Often it's better to be explicit at the call site than to hide behavior in defaults + +**Ask yourself**: "Does this default encode domain logic that should be visible to the caller?" + +## Code Hygiene + +### Delete Unused Code +- If an argument is always the same value, strongly consider removing it as a parameter +- Delete dead code immediately—don't comment it out +- If code isn't being used, the repository history will preserve it + +### Remove Unnecessary Logs +- Logs should provide actionable information +- Remove debug logs that don't help users understand what's happening +- Keep logs at module boundaries and for genuine diagnostics + ## Comments Your first instinct should be: "If I couldn't write any comments, how would I write this code?"