Save/load config settings #108


Open · wants to merge 3 commits into base: butter_filter

Conversation

@MarcelMB (Contributor) commented Feb 22, 2025

I assume this PR is going to be more controversial

  • first, I based it off butter_filter (my other PR I just opened); maybe I should have started it from feature_process instead of nesting it
  • second, it is failing some tests:
    ERROR tests/test_io.py
    ERROR tests/test_models/test_model_process.py
    ERROR tests/test_stream_daq.py

and here I probably need help, because I don't know how to integrate it exactly or what's best practice.

  1. The idea is that since we now have a bunch of preprocessing steps, we need to keep track of what values we used for a specific file to make the preprocessing repeatable. We have these values in denoise_example.yml, so I save the most important values in a JSON file (not the ones that are just for visualization).
  2. We then want to have a CLI command that uses this saved JSON file to update the parameters in denoise_example.yaml:

mio process denoise -i video.avi -c denoise_example -u 'relativepath to' processing_log.json
the -u stands for update and uses the saved .json file

one can still run just the normal command without updating the config yaml, just using whatever is in there:
mio process denoise -i video.avi -c denoise_example

this all works well on my end, just playing around with changing things in the json and using -u to update it on the next run.
But I am failing tests, since changing the CLI probably doesn't make all the tests happy. But let me know.


📚 Documentation preview 📚: https://miniscope-io--108.org.readthedocs.build/en/108/

@t-sasatani (Collaborator) commented Feb 22, 2025

I'm not sure if it's nice to have JSON and YAMLs that basically contain the same thing but are used differently. Might be more straightforward to have a yaml_export method in the config model and save the YAML file with the export video (with a hash or identifier if necessary if uniqueness/history is an issue)?
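
A minimal sketch of what that yaml_export idea could look like, assuming a pydantic v2 config model (the class, fields, and method name here are illustrative, not the actual mio API):

    from pathlib import Path

    import yaml
    from pydantic import BaseModel


    class DenoiseConfig(BaseModel):
        """Stand-in for the real denoise config model."""

        spatial_LPF_cutoff_radius: int = 15
        vertical_BEF_cutoff: int = 2

        def yaml_export(self, path: Path) -> None:
            """Dump the validated model as YAML next to the exported video."""
            with open(path, "w") as f:
                yaml.safe_dump(self.model_dump(), f, sort_keys=False)


    # save the exact parameters used for this run alongside the output
    DenoiseConfig().yaml_export(Path("output/denoise_used.yaml"))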

@MarcelMB (Contributor, Author)

________________________ ERROR collecting tests/test_io.py ________________________
ImportError while importing test module '/Users/mbrosch/Documents/GitKraken_mac/miniscope-io/tests/test_io.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/homebrew/Cellar/[email protected]/3.11.11/Frameworks/Python.framework/Versions/3.11/lib/python3.11/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/test_io.py:15: in <module>
    from mio.utils import hash_file, hash_video
E   ImportError: cannot import name 'hash_file' from 'mio.utils' (/Users/mbrosch/Documents/GitKraken_mac/miniscope-io/mio/utils/__init__.py)

________________ ERROR collecting tests/test_models/test_model_process.py ________________
ImportError while importing test module '/Users/mbrosch/Documents/GitKraken_mac/miniscope-io/tests/test_models/test_model_process.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/homebrew/Cellar/[email protected]/3.11.11/Frameworks/Python.framework/Versions/3.11/lib/python3.11/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/test_models/test_model_process.py:6: in <module>
    from mio.utils import hash_video, hash_file
E   ImportError: cannot import name 'hash_video' from 'mio.utils' (/Users/mbrosch/Documents/GitKraken_mac/miniscope-io/mio/utils/__init__.py)

____________________ ERROR collecting tests/test_stream_daq.py ____________________
ImportError while importing test module '/Users/mbrosch/Documents/GitKraken_mac/miniscope-io/tests/test_stream_daq.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/homebrew/Cellar/[email protected]/3.11.11/Frameworks/Python.framework/Versions/3.11/lib/python3.11/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/test_stream_daq.py:15: in <module>
    from mio.utils import hash_video, hash_file
E   ImportError: cannot import name 'hash_video' from 'mio.utils' (/Users/mbrosch/Documents/GitKraken_mac/miniscope-io/mio/utils/__init__.py)

The tests seem to fail for some hash functions. I am not sure how this could be related to the json file saving and loading functionality?!

@MarcelMB (Contributor, Author)

save the YAML file with the export video (with a hash or identifier if necessary if uniqueness/history is an issue)
so you mean instead of saving a json save a yaml?

i mean, the file format is of course not that important. Just saving a file that can always live with the video files in a folder, to basically know what the config was, or to re-run it if necessary on the same file or other files by another person.

@t-sasatani (Collaborator) commented Feb 22, 2025

Oh, the quotes and sentences got mixed up lol

so you mean instead of saving a json save a yaml?

No, the file extension isn't so important.

I agree that saving config files is nice. But my question is, why don't we copy the config file into the export folder, or make a model export that can be used directly as a config file, instead of manually dumping everything into a file that can't be directly plugged in? So a comment about implementation/interface rather than concept.
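
For the copy option, a rough sketch (the paths here are illustrative):

    import shutil
    from pathlib import Path

    config_path = Path("config/denoise_example.yml")  # config actually used for the run
    output_dir = Path("output")                       # folder holding the exported .avi files
    output_dir.mkdir(parents=True, exist_ok=True)

    # keep a byte-for-byte copy of the config next to the outputs
    shutil.copy2(config_path, output_dir / config_path.name)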

@t-sasatani (Collaborator)

I guess it'll be ideal if we can throw this stuff or some metadata into the video container itself, but it seems tricky with .avi.

@MarcelMB (Contributor, Author) commented Feb 22, 2025

make a model export that can be used directly as a config file

sure, that makes sense. Would this file still live in the user output folder, or, when loaded, replace the actual denoise config yaml in the config folder?

for now I never change the actual config file; it stays how it was downloaded and only temporarily updates from the json file if -u is used. Otherwise it does whatever is in there.

@MarcelMB (Contributor, Author) commented Mar 7, 2025

ideas to change this before merge:

  • copy the yaml config file into the user output folder (with the .avi files)
  • have -c denoise_example -u 'relativepath to' processing_log.json replace denoise_example instead of adding another yaml file

@sneakers-the-rat (Collaborator) commented Mar 7, 2025

sorry just seeing this.

If i'm getting this right, we want to have some root configuration that might have a bunch of minor variations, e.g. for each folder in some project directory?

there are a few planned changes to configs that are relevant here:

  • first, as @t-sasatani said above, the plan is to have a command to create a new config from an existing one. The config files themselves already satisfy the need for per-file or per-project config, so that would work like

    # create new config with id `my-wireless` in user config directory
    mio config copy wireless-200px my-wireless
    # use it
    mio stream capture -c my-wireless
    
    # create a new config with id `my-wireless-2` in current directory
    mio config copy wireless-200px my-wireless-2 -o ./config.yaml
    # use it
    mio stream capture -c ./config.yaml
    
  • second, it is definitely true that a system like mio config copy would be annoying: what if you need to change a value? do you need to change all the configs???? in working on the pipelining stuff, i have made a few improvements to the config system that give it some of the nice features of a templating system. so far i've implemented an include-like thing where you can basically import another config into the current one by referring to it by id where config values would normally go, e.g. here:

    layout: "wirefree-sd-layout"
    where the WireFreePipeline model specifies that it needs an SDFileSourceConfig (implicitly, working on that part), the loader then injects the contents of wirefree-sd-layout by reference.

    The inverse of includes is extends, where rather than importing another config into the current one as some key, you say that another config is the base config that you are extending/overriding. This would make it easier to have one "root" config that doesn't change, and then have more minimal configs that only change a subset of the values, but you only need to edit the single parent config to update it.

    so, e.g.

    parent.yaml

    id: parent-config
    some_value: 1
    another_value: 2
    changing_value: "hello"

    child.yaml

    id: child-config
    extends: parent-config 
    changing_value: "sup"

    child-config would get loaded as {"some_value": 1, "another_value": 2, "changing_value": "sup"} , as desired.
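
    a rough sketch of how that resolution could work (resolve_config and load_config_by_id are illustrative names, not the actual loader):

        def resolve_config(config: dict, load_config_by_id) -> dict:
            """Recursively merge a config onto the chain of configs it extends."""
            parent_id = config.pop("extends", None)
            if parent_id is None:
                return config
            parent = resolve_config(load_config_by_id(parent_id), load_config_by_id)
            # child keys override parent keys; the parent fills in the rest
            return {**parent, **config}


        configs = {
            "parent-config": {"id": "parent-config", "some_value": 1,
                              "another_value": 2, "changing_value": "hello"},
            "child-config": {"id": "child-config", "extends": "parent-config",
                             "changing_value": "sup"},
        }
        resolved = resolve_config(dict(configs["child-config"]),
                                  lambda cid: dict(configs[cid]))
        # -> {"id": "child-config", "some_value": 1, "another_value": 2,
        #     "changing_value": "sup"}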


The reasons I prefer those over something like this are that

  • it keeps the config system uniform: juggling paths is a pain, especially when you start getting into dependencies between things. e.g. for the extends example, it would be a pain in the ass if you had to e.g. specify the parent config as ../parent-config.yaml and then make sure that you never changed the relative folder structure of your data. absolute paths are a no-go because they are not portable and not publishable in addition to the other problems with being dependent on paths. If all configs are always referenced by ids, they are always in one place (yes, you can supply a path via the cli if that is easier to do, e.g. if you always have a mio_config.yaml in each project directory), they are always yaml, and they are always backed by a pydantic model, it's a lot easier to build intuition.
  • this version mixes concerns a bit: if we wanted to update the values in a config file, we would want that to be like mio config set -c my-config some_value=2 to set a value from the cli, and then we would assume that the file itself was changed. having this be in the capture function overloads it, and introduces cli inconsistencies -- it would make sense that we should be able to do that anywhere we refer to a config
  • discourages people from making their own configs: we want to encourage people to make their own config files for projects so that they can be versioned along with any project code and then publishable. so we want people to use something like mio config copy, or a system like extends where we preserve the automatically stamped mio_version (and can thus resolve the state of what the extended config was, at least coarsely).

@sneakers-the-rat (Collaborator) left a comment

already commented that i think this should be implemented differently, tho i totally understand the need. just commenting here to show some stuff that could be done differently for learning purposes

Comment on lines +792 to +869
metadata = {
    "timestamp": datetime.now().isoformat(),
    "input_video": str(video_path),
    "output_directory": str(output_dir),
    "processing_config": {
        "noise_patch": {
            "enabled": config.noise_patch.enable if config.noise_patch else False,
            "method": config.noise_patch.method if config.noise_patch else None,
            "gradient_config": (
                {
                    "threshold": (
                        config.noise_patch.gradient_config.threshold
                        if config.noise_patch
                        else None
                    ),
                }
                if config.noise_patch and config.noise_patch.gradient_config
                else None
            ),
            "black_area_config": (
                {
                    "consecutive_threshold": (
                        config.noise_patch.black_area_config.consecutive_threshold
                        if config.noise_patch
                        else None
                    ),
                    "value_threshold": (
                        config.noise_patch.black_area_config.value_threshold
                        if config.noise_patch
                        else None
                    ),
                }
                if config.noise_patch and config.noise_patch.black_area_config
                else None
            ),
        },
        "frequency_masking": {
            "enabled": config.frequency_masking.enable if config.frequency_masking else False,
            "cast_float32": (
                config.frequency_masking.cast_float32 if config.frequency_masking else False
            ),
            "cutoff_radius": (
                config.frequency_masking.spatial_LPF_cutoff_radius
                if config.frequency_masking
                else None
            ),
            "vertical_BEF": (
                config.frequency_masking.vertical_BEF_cutoff
                if config.frequency_masking
                else None
            ),
            "horizontal_BEF": (
                config.frequency_masking.horizontal_BEF_cutoff
                if config.frequency_masking
                else None
            ),
        },
        "butterworth": {
            "enabled": config.butter_filter.enable,
            "order": config.butter_filter.order,
            "cutoff_frequency": config.butter_filter.cutoff_frequency,
            "sampling_rate": config.butter_filter.sampling_rate,
        },
    },
    "output_files": {
        "noise_patch": f"output_{pathstem}_patch.avi",
        "freq_mask": f"output_{pathstem}_freq_mask.avi",
        "butter_filter": f"output_{pathstem}_butter_filter.avi",
        "butter_plot": f"{pathstem}_butter_filter_intensity_plot.png",
    },
}

# Save as JSON (overwrite any existing file)
log_file = output_dir / "processing_log.json"
logs = {"processing_runs": [metadata]}  # Just create new log with single run

with open(log_file, "w") as f:  # "w" mode overwrites existing file
    json.dump(logs, f, indent=2)
@sneakers-the-rat (Collaborator):

I don't think we should do this for a few reasons, not to belabor the point but we should not need to log what our config was after the fact, we should already know that because we already supplied it when we invoked it! it might be nice to offer, but i would really want to just make a structured model for outputs and have this be a part of that. ad-hoc layouts and formats are not gr8 in general.

but anyway, you'd want this to be like:

Suggested change: replace the entire metadata construction and JSON dump above with
log_file = output_dir / "processing_log.json"
logs = {
    "timestamp": datetime.now().isoformat(),
    "input_video": str(video_path),
    "output_directory": str(output_dir),
    "config": config.model_dump(),
}

with open(log_file, "w") as f:  # "w" mode overwrites existing file
    json.dump(logs, f, indent=2)

no reason to change the keys like this when we're saving the config, might as well just dump the whole thing which has 100% of the information that the config has by definition, and is in a format that can be roundtripped if needed. doing it like this would mean that every time we changed the model, we would need to remember to update this.
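
for instance, the roundtrip that model_dump enables (ButterworthConfig here is a stand-in for the real config model):

    from pydantic import BaseModel


    class ButterworthConfig(BaseModel):
        enable: bool = True
        order: int = 4
        cutoff_frequency: float = 3.5


    config = ButterworthConfig(order=2)
    dumped = config.model_dump()                         # model -> plain dict, JSON-serializable
    restored = ButterworthConfig.model_validate(dumped)  # dict -> validated model again
    assert restored == config                            # lossless roundtrip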

@@ -0,0 +1 @@
"""Utility functions for miniscope-io."""
@sneakers-the-rat (Collaborator):

i have blanket objections to the existence of utils subpackages, as they are, as a rule, junk-drawer subpackages that make maintenance unnecessarily hard. for an advanced case of how this plays out, see linkml/linkml#2371

it is true that sometimes things don't have an obvious place, so a single, short utils.py module is fine, but once it starts ballooning...

Comment on lines +11 to +12
with open(json_path) as f:
    log_data = json.load(f)
@sneakers-the-rat (Collaborator):

ideally all i/o operations are separated from processing actions (so, e.g., we wouldn't need to write to a temporary tmp.json file and then pass the path to this function if we already had a dict).
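
a small sketch of that separation (the function names here are illustrative):

    import json
    from pathlib import Path


    def load_log(json_path: Path) -> dict:
        """I/O only: read the log file into a dict."""
        with open(json_path) as f:
            return json.load(f)


    def apply_log(config: dict, proc_config: dict) -> dict:
        """Pure function: no file access, takes and returns plain data."""
        return {**config, **proc_config}


    # callers that already have a dict skip the file entirely:
    # updated = apply_log(config, proc_config)
    # callers starting from a file compose the two:
    # updated = apply_log(config, load_log(Path("processing_log.json")))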

Comment on lines +17 to +52
# Update noise patch config
config.noise_patch.enable = proc_config["noise_patch"]["enabled"]
if config.noise_patch.enable:
    config.noise_patch.method = proc_config["noise_patch"]["method"]
    if proc_config["noise_patch"]["gradient_config"]:
        config.noise_patch.gradient_config.threshold = proc_config["noise_patch"][
            "gradient_config"
        ]["threshold"]
    if proc_config["noise_patch"]["black_area_config"]:
        config.noise_patch.black_area_config.consecutive_threshold = proc_config["noise_patch"][
            "black_area_config"
        ]["consecutive_threshold"]
        config.noise_patch.black_area_config.value_threshold = proc_config["noise_patch"][
            "black_area_config"
        ]["value_threshold"]

# Update frequency masking config
config.frequency_masking.enable = proc_config["frequency_masking"]["enabled"]
if config.frequency_masking.enable:
    config.frequency_masking.cast_float32 = proc_config["frequency_masking"]["cast_float32"]
    config.frequency_masking.spatial_LPF_cutoff_radius = proc_config["frequency_masking"][
        "cutoff_radius"
    ]
    config.frequency_masking.vertical_BEF_cutoff = proc_config["frequency_masking"][
        "vertical_BEF"
    ]
    config.frequency_masking.horizontal_BEF_cutoff = proc_config["frequency_masking"][
        "horizontal_BEF"
    ]

# Update butterworth config
config.butter_filter.enable = proc_config["butterworth"]["enabled"]
if config.butter_filter.enable:
    config.butter_filter.order = proc_config["butterworth"]["order"]
    config.butter_filter.cutoff_frequency = proc_config["butterworth"]["cutoff_frequency"]
    config.butter_filter.sampling_rate = proc_config["butterworth"]["sampling_rate"]
@sneakers-the-rat (Collaborator):

same problem as above, lack of generality, need for maintenance if model changes.

Suggested change: replace all of the per-key updates above with
config = config.model_validate({**config.model_dump(), **proc_config})

@t-sasatani (Collaborator) commented Mar 7, 2025

I think the primary need for this is to record the configs with the output files. So I think the key dump @sneakers-the-rat posted is pretty much enough to cover short-term needs.

For the design of configs in general, the denoise and device configs probably have different needs. I'm not exactly sure what's a good interface that fits both:

  • denoise configs are pretty dynamic. We're now playing around with filter parameters (cutoff, etc.), so numerous minor config adjustments exist. I'm unsure if we should have an ID for each, but I want to preserve raw processing parameters with the output. The denoising may require users to play around with parameters, even in the final form of mio. It might be better to interact with notebooks if that's something people are comfortable with, or a GUI.
  • device configs are matched one-to-one with the device (or at least the firmware version) and will be much more static. So, I guess keeping the ID is usually sufficient. Also, I think the child config method @sneakers-the-rat posted above can cover most of the tool developer's needs.

@sneakers-the-rat (Collaborator)

i think recording the configs with the output is a really good idea - i would just like to spec that out a bit more explicitly :) that is the primary place that downstream analysis and data format tools will interact with this package, so it's worth having that be explicit and declarative. e.g. see the nwb miniscope extension - it's very dependent on the structure of the config model and output directories: https://github.com/catalystneuro/ndx-miniscope/tree/main/src/pynwb/ndx_miniscope

the denoise and device configs probably have different needs. I'm not exactly sure what's a good interface that fits both:

this is super true, and one of the reasons i decided it was worth the complexity to have an extends and include-style system in the configs. it seemed pretty clear that we have different kinds of configs, but all of them need to be editable, so it should be possible for someone to e.g. take one and just tweak it slightly without needing to modify or copy the slow-moving device-specific config.

hard agree we need to have a GUI for this, and hopefully all the data modeling makes the GUI part a lot less work. a decent amount of what i've been thinking about re: making configs not care about paths is in prep for making GUIs on top of this package.

just want to reiterate i totally understand and agree with the need for this PR, but think that we probably want to do it in a different way, and since i'm the one that's saying that i'm willing to put up the work to make that happen. i'll have increasing time to work on mio stuff in the coming weeks.
