Commit

add svgs for docs
amakelov committed Jul 2, 2024
1 parent 28fa53b commit 7d80f5e
Showing 23 changed files with 2,432 additions and 974 deletions.
2 changes: 1 addition & 1 deletion .gitignore
@@ -10,7 +10,7 @@
*.png
*.tif
*.gif
-*.svg
+# *.svg
*.dot
*.gv
# *.mp4
13 changes: 8 additions & 5 deletions docs/docs/01_storage_and_ops.md
@@ -17,7 +17,7 @@ from mandala._next.imports import Storage
storage = Storage(
# omit for an in-memory storage
db_path='my_persistent_storage.db',
-# omit to disable automatic dependency tracking
+# omit to disable automatic dependency tracking & versioning
# use "__main__" to only track functions defined in the current session
deps_path='__main__',
)
@@ -63,8 +63,10 @@ The objects (e.g. `s`) returned by `@op`s are always instances of a subclass of
composition of `@op`s that created this ref.

Two `Ref`s with the same `cid` may have different `hid`s, and `hid` is the
-unique identifier of `Ref`s in the storage.
+unique identifier of `Ref`s in the storage. However, only 1 copy per unique
+`cid` is stored to avoid duplication in the storage.
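The `cid`/`hid` split described in this hunk can be sketched in plain Python. This is a toy `Ref` and `digest` of my own, not mandala's real classes: hashing only the value yields a shared `cid`, hashing the value together with its computational history yields distinct `hid`s, and a store keyed by `cid` keeps a single copy.

```python
import hashlib
from dataclasses import dataclass

def digest(s: str) -> str:
    # short stable hash, standing in for mandala's real ID scheme
    return hashlib.sha256(s.encode()).hexdigest()[:8]

@dataclass
class Ref:
    cid: str  # content ID: hash of the value itself
    hid: str  # history ID: hash of the value *and* how it was computed

def make_ref(value, history: str) -> Ref:
    return Ref(cid=digest(repr(value)), hid=digest(repr(value) + history))

# the same value produced by two different computations
a = make_ref(42, "inc(41)")
b = make_ref(42, "double(21)")
assert a.cid == b.cid   # same content...
assert a.hid != b.hid   # ...different provenance

# a storage keyed by cid holds only one copy of the value
store = {r.cid: 42 for r in (a, b)}
assert len(store) == 1
```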

+### `Ref`s can be in memory or not
Additionally, `Ref`s have the `in_memory` property, which indicates if the
underlying object is present in the `Ref` or if this is a "lazy" `Ref` which
only contains metadata. **`Ref`s are only loaded in memory when needed for a new
@@ -94,7 +96,7 @@ storage.unwrap(s) # loads from storage only if necessary
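The lazy-loading behavior of `unwrap` can be imitated with a toy class (hypothetical names of my own; mandala's actual `Ref`/`Storage` machinery is more involved): the value is fetched from the backing store only on first access.

```python
class LazyRef:
    """Toy stand-in for a Ref that may or may not hold its value in memory."""
    def __init__(self, key, store):
        self.key = key
        self._store = store    # key -> value mapping, standing in for the db
        self._value = None
        self.in_memory = False

    def unwrap(self):
        # load from "storage" only if the value is not already in memory
        if not self.in_memory:
            self._value = self._store[self.key]
            self.in_memory = True
        return self._value

backing = {"s": 0.84}
ref = LazyRef("s", backing)    # starts out as metadata only
assert not ref.in_memory
assert ref.unwrap() == 0.84    # first access loads the value
assert ref.in_memory
```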



-Other useful methods of the `Storage` include:
+### Other useful `Storage` methods

- `Storage.attach(inplace: bool)`: like `unwrap`, but puts the objects in the
`Ref`s if they are not in-memory.
@@ -121,8 +123,9 @@ version at the time of the call, and the `cid`s of the inputs
- `Call.hid`: a history ID for the call, the same as `Call.cid`, but using the
`hid`s of the inputs.

-**Every `Ref` history ID has at most one `Call` that it is an output of**, and
-if it exists, this call can be found by calling `storage.get_ref_creator`:
+**For every `Ref` history ID, there's at most one `Call` that has an output with
+this history ID**, and if it exists, this call can be found by calling
+`storage.get_ref_creator()`:


```python
6 changes: 3 additions & 3 deletions docs/docs/02_retracing.md
@@ -89,7 +89,7 @@ with storage:
```

AtomRef(hid='d0f...', cid='908...', in_memory=False) AtomRef(hid='f1a...', cid='69f...', in_memory=False)
-AtomRef(hid='caf...', cid='d80...', in_memory=False)
+AtomRef(hid='caf...', cid='f35...', in_memory=False)
AtomRef(hid='d16...', cid='12a...', in_memory=False)


@@ -134,7 +134,7 @@ with storage:
Loading data
Training model
Getting accuracy
-AtomRef(0.84, hid='158...', cid='6c4...')
+AtomRef(0.82, hid='158...', cid='238...')
Training model
Getting accuracy
AtomRef(0.9, hid='214...', cid='24c...')
@@ -185,5 +185,5 @@ with storage:
print(storage.unwrap(acc), storage.unwrap(model))
```

-0.84 RandomForestClassifier(max_depth=2, n_estimators=5)
+0.82 RandomForestClassifier(max_depth=2, n_estimators=5)

42 changes: 21 additions & 21 deletions docs/docs/03_cf.md
@@ -1,7 +1,9 @@
# Query the Storage with `ComputationFrame`s
## Why `ComputationFrame`s?
The `ComputationFrame` data structure **formalizes the natural/intuitive way you
-think of the "web" of saved `@op` calls**.
+think of the "web" of saved `@op` calls**. It gives you a "grammar" in which
+operations over persisted computation graphs that are easy to think of are also
+easy to implement.

In computational projects, all queries boil down to how some variables depend on
other variables: e.g., in ML you often care about what input parameters lead to
@@ -12,9 +14,9 @@ represents the "web" of saved `@op` calls, linked by how the outputs of one

The `ComputationFrame` (CF) is the data structure used to explore and query this
web of calls. It's a high-level view of a collection of `@op` calls, so that
-calls that serve the same role are grouped together. The groups of calls form a
+calls that serve the same role are grouped together. **The groups of calls form a
computational graph of variables and functions, which enables effective &
-natural high-level operations over storage.
+natural high-level operations over storage**.

This section covers basic tools to get up to speed with CFs. For more advanced
usage, see [Advanced `ComputationFrame` tools](06_advanced_cf.md)
@@ -29,9 +31,9 @@ limited view of storage because it will involve few (0 or 1) `@op`s
context to the CF by adding new function nodes containing the calls that
produced/used some variable(s). The goal of this stage is to incorporate in the
CF all variables whose relationships you're interested in.
-- **selection**: restrict the values of the variables in the CF
-variables by some predicates. This lets you focus on specific parameters before
-making expensive calls to the storage.
+- **combination & restriction**: merge multiple CFs, restrict to subgraphs or
+specific values of the variables along some predicates. This lets you focus on
+the computations you want before making expensive calls to the storage.
- **[conversion to a `pandas.DataFrame`](#extracting-dataframes-from-computationframes)**: finally,
extract a table representing the relationships between the variables in the CF
for downstream analysis.
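As a rough illustration of this exploration-then-extraction workflow (toy dicts of my own invention standing in for `Call`s and `Ref`s; the real `cf.df()` returns a `pandas.DataFrame`), joining the saved calls of two `@op`s on their shared variable yields one row per end-to-end computation:

```python
# Saved calls of two hypothetical @ops, recorded as variable -> value.
generate_calls = [{"random_seed": 42, "X_train": "X@42"}]
train_calls = [
    {"X_train": "X@42", "n_estimators": 10, "model": "rf_10"},
    {"X_train": "X@42", "n_estimators": 80, "model": "rf_80"},
]

# Link calls where an output of generate_dataset is an input of train_model;
# each merged dict is one row of the extracted table.
rows = [
    {**g, **t}
    for g in generate_calls
    for t in train_calls
    if g["X_train"] == t["X_train"]
]
assert len(rows) == 2
assert rows[0] == {"random_seed": 42, "X_train": "X@42",
                   "n_estimators": 10, "model": "rf_10"}
```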
@@ -244,13 +246,13 @@ print(cf.df(values='refs').to_markdown())
```

Extracting tuples from the computation graph:
-output_0@output_0, output_1@output_1 = train_model(n_estimators=n_estimators, X_train=X_train, y_train=y_train)
-| | n_estimators | y_train | X_train | train_model | output_0 | output_1 |
+output_0@output_0, output_1@output_1 = train_model(n_estimators=n_estimators, y_train=y_train, X_train=X_train)
+| | y_train | X_train | n_estimators | train_model | output_1 | output_0 |
|---:|:-----------------------------------------------------|:-----------------------------------------------------|:-----------------------------------------------------|:----------------------------------------------|:-----------------------------------------------------|:-----------------------------------------------------|
-| 0 | AtomRef(hid='98c...', cid='29d...', in_memory=False) | AtomRef(hid='faf...', cid='83f...', in_memory=False) | AtomRef(hid='efa...', cid='a6d...', in_memory=False) | Call(train_model, cid='c4f...', hid='5f7...') | AtomRef(hid='b25...', cid='462...', in_memory=False) | AtomRef(hid='760...', cid='46b...', in_memory=False) |
-| 1 | AtomRef(hid='120...', cid='9bc...', in_memory=False) | AtomRef(hid='faf...', cid='83f...', in_memory=False) | AtomRef(hid='efa...', cid='a6d...', in_memory=False) | Call(train_model, cid='3be...', hid='e60...') | AtomRef(hid='522...', cid='d5a...', in_memory=False) | AtomRef(hid='646...', cid='acb...', in_memory=False) |
-| 2 | AtomRef(hid='235...', cid='c04...', in_memory=False) | AtomRef(hid='faf...', cid='83f...', in_memory=False) | AtomRef(hid='efa...', cid='a6d...', in_memory=False) | Call(train_model, cid='204...', hid='c55...') | AtomRef(hid='208...', cid='c75...', in_memory=False) | AtomRef(hid='5b7...', cid='f0a...', in_memory=False) |
-| 3 | AtomRef(hid='9fd...', cid='4ac...', in_memory=False) | AtomRef(hid='faf...', cid='83f...', in_memory=False) | AtomRef(hid='efa...', cid='a6d...', in_memory=False) | Call(train_model, cid='5af...', hid='514...') | AtomRef(hid='331...', cid='e64...', in_memory=False) | AtomRef(hid='784...', cid='238...', in_memory=False) |
+| 0 | AtomRef(hid='faf...', cid='83f...', in_memory=False) | AtomRef(hid='efa...', cid='a6d...', in_memory=False) | AtomRef(hid='9fd...', cid='4ac...', in_memory=False) | Call(train_model, cid='5af...', hid='514...') | AtomRef(hid='784...', cid='238...', in_memory=False) | AtomRef(hid='331...', cid='e64...', in_memory=False) |
+| 1 | AtomRef(hid='faf...', cid='83f...', in_memory=False) | AtomRef(hid='efa...', cid='a6d...', in_memory=False) | AtomRef(hid='235...', cid='c04...', in_memory=False) | Call(train_model, cid='204...', hid='c55...') | AtomRef(hid='5b7...', cid='f0a...', in_memory=False) | AtomRef(hid='208...', cid='c75...', in_memory=False) |
+| 2 | AtomRef(hid='faf...', cid='83f...', in_memory=False) | AtomRef(hid='efa...', cid='a6d...', in_memory=False) | AtomRef(hid='120...', cid='9bc...', in_memory=False) | Call(train_model, cid='3be...', hid='e60...') | AtomRef(hid='646...', cid='acb...', in_memory=False) | AtomRef(hid='522...', cid='d5a...', in_memory=False) |
+| 3 | AtomRef(hid='faf...', cid='83f...', in_memory=False) | AtomRef(hid='efa...', cid='a6d...', in_memory=False) | AtomRef(hid='98c...', cid='29d...', in_memory=False) | Call(train_model, cid='c4f...', hid='5f7...') | AtomRef(hid='760...', cid='46b...', in_memory=False) | AtomRef(hid='b25...', cid='462...', in_memory=False) |


##
@@ -494,16 +496,14 @@ print(cf.df().drop(columns=['X_train', 'y_train']).to_markdown())

Extracting tuples from the computation graph:
X_train@output_0, y_train@output_2 = generate_dataset(random_seed=random_seed)
-output_0@output_0, output_1@output_1 = train_model(n_estimators=n_estimators, X_train=X_train, y_train=y_train)
+output_0@output_0, output_1@output_1 = train_model(n_estimators=n_estimators, y_train=y_train, X_train=X_train)
output_0_0@output_0 = eval_model(model=output_0)
-| | n_estimators | random_seed | generate_dataset | train_model | output_0 | eval_model | output_0_0 | output_1 |
-|---:|---------------:|--------------:|:---------------------------------------------------|:----------------------------------------------|:-----------------------------------------------------|:---------------------------------------------|-------------:|-----------:|
-| 0 | 10 | 42 | Call(generate_dataset, cid='19a...', hid='c3f...') | Call(train_model, cid='c4f...', hid='5f7...') | | | nan | 0.74 |
-| 1 | 80 | 42 | Call(generate_dataset, cid='19a...', hid='c3f...') | Call(train_model, cid='3be...', hid='e60...') | RandomForestClassifier(max_depth=2, n_estimators=80) | Call(eval_model, cid='137...', hid='d32...') | 0.82 | 0.83 |
-| 2 | 20 | 42 | Call(generate_dataset, cid='19a...', hid='c3f...') | Call(train_model, cid='204...', hid='c55...') | | | nan | 0.8 |
-| 3 | 40 | 42 | Call(generate_dataset, cid='19a...', hid='c3f...') | Call(train_model, cid='5af...', hid='514...') | RandomForestClassifier(max_depth=2, n_estimators=40) | Call(eval_model, cid='38f...', hid='5d3...') | 0.81 | 0.82 |
-| 4 | 20 | 42 | Call(generate_dataset, cid='19a...', hid='c3f...') | Call(train_model, cid='204...', hid='c55...') | RandomForestClassifier(max_depth=2, n_estimators=20) | | nan | nan |
-| 5 | 10 | 42 | Call(generate_dataset, cid='19a...', hid='c3f...') | Call(train_model, cid='c4f...', hid='5f7...') | RandomForestClassifier(max_depth=2, n_estimators=10) | | nan | nan |
+| | random_seed | generate_dataset | n_estimators | train_model | output_1 | output_0 | eval_model | output_0_0 |
+|---:|--------------:|:---------------------------------------------------|---------------:|:----------------------------------------------|-----------:|:-----------------------------------------------------|:---------------------------------------------|-------------:|
+| 0 | 42 | Call(generate_dataset, cid='19a...', hid='c3f...') | 80 | Call(train_model, cid='3be...', hid='e60...') | 0.83 | RandomForestClassifier(max_depth=2, n_estimators=80) | Call(eval_model, cid='137...', hid='d32...') | 0.82 |
+| 1 | 42 | Call(generate_dataset, cid='19a...', hid='c3f...') | 20 | Call(train_model, cid='204...', hid='c55...') | 0.8 | RandomForestClassifier(max_depth=2, n_estimators=20) | | nan |
+| 2 | 42 | Call(generate_dataset, cid='19a...', hid='c3f...') | 40 | Call(train_model, cid='5af...', hid='514...') | 0.82 | RandomForestClassifier(max_depth=2, n_estimators=40) | Call(eval_model, cid='38f...', hid='5d3...') | 0.81 |
+| 3 | 42 | Call(generate_dataset, cid='19a...', hid='c3f...') | 10 | Call(train_model, cid='c4f...', hid='5f7...') | 0.74 | RandomForestClassifier(max_depth=2, n_estimators=10) | | nan |


Importantly, we see that some computations only partially follow the full

