Save/load config settings #108


Open · wants to merge 3 commits into base: butter_filter

Conversation

@MarcelMB (Contributor) commented Feb 22, 2025

I assume this PR is going to be more controversial

  • first, I based it off butter_filter (my other PR I just opened); maybe I should have started it from feature_process instead of nesting it
  • second, it is failing some tests:
    ERROR tests/test_io.py
    ERROR tests/test_models/test_model_process.py
    ERROR tests/test_stream_daq.py

and here I probably need help, because I don't know how to integrate it exactly or what's best practice.

  1. The idea is that since we now have a bunch of preprocessing steps, we need to keep track of what values we used for a specific file to make the preprocessing repeatable. We have these values in denoise_example.yml, so I save the most important values in a JSON file (not the ones that are just for visualization).
  2. We then want to have a CLI command that uses this saved JSON file to update the parameters in denoise_example.yaml:

mio process denoise -i video.avi -c denoise_example -u 'relativepath to' processing_log.json
the -u stands for update and uses the saved .json file

one can still run just the normal command without updating the config yaml, just using whatever is in there:
mio process denoise -i video.avi -c denoise_example

this all works well on my end, just playing around with changing things in the json and using -u to update it on the next run.
But I am failing tests, since changing the CLI probably doesn't make all the tests happy. But let me know.


📚 Documentation preview 📚: https://miniscope-io--108.org.readthedocs.build/en/108/

@t-sasatani (Collaborator) commented Feb 22, 2025

I'm not sure if it's nice to have JSON and YAMLs that basically contain the same thing but are used differently. Might be more straightforward to have a yaml_export method in the config model and save the YAML file with the export video (with a hash or identifier if necessary if uniqueness/history is an issue)?
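
A minimal sketch of what that yaml_export idea could look like, assuming a pydantic v2 config model (the class, fields, and method name here are illustrative, not the actual mio API):

    from pathlib import Path

    import yaml
    from pydantic import BaseModel


    class DenoiseConfig(BaseModel):
        """Stand-in for the real denoise config model."""

        spatial_LPF_cutoff_radius: int = 15
        vertical_BEF_cutoff: int = 2

        def yaml_export(self, path: Path) -> None:
            """Dump the validated model as YAML next to the exported video."""
            with open(path, "w") as f:
                yaml.safe_dump(self.model_dump(), f, sort_keys=False)


    # save the exact parameters used for this run alongside the output
    DenoiseConfig().yaml_export(Path("output/denoise_used.yaml"))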

@MarcelMB (Contributor, Author)

________________________ ERROR collecting tests/test_io.py ________________________
ImportError while importing test module '/Users/mbrosch/Documents/GitKraken_mac/miniscope-io/tests/test_io.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/homebrew/Cellar/[email protected]/3.11.11/Frameworks/Python.framework/Versions/3.11/lib/python3.11/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/test_io.py:15: in <module>
    from mio.utils import hash_file, hash_video
E   ImportError: cannot import name 'hash_file' from 'mio.utils' (/Users/mbrosch/Documents/GitKraken_mac/miniscope-io/mio/utils/__init__.py)

________________ ERROR collecting tests/test_models/test_model_process.py ________________
ImportError while importing test module '/Users/mbrosch/Documents/GitKraken_mac/miniscope-io/tests/test_models/test_model_process.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/homebrew/Cellar/[email protected]/3.11.11/Frameworks/Python.framework/Versions/3.11/lib/python3.11/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/test_models/test_model_process.py:6: in <module>
    from mio.utils import hash_video, hash_file
E   ImportError: cannot import name 'hash_video' from 'mio.utils' (/Users/mbrosch/Documents/GitKraken_mac/miniscope-io/mio/utils/__init__.py)

____________________ ERROR collecting tests/test_stream_daq.py ____________________
ImportError while importing test module '/Users/mbrosch/Documents/GitKraken_mac/miniscope-io/tests/test_stream_daq.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/homebrew/Cellar/[email protected]/3.11.11/Frameworks/Python.framework/Versions/3.11/lib/python3.11/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/test_stream_daq.py:15: in <module>
    from mio.utils import hash_video, hash_file
E   ImportError: cannot import name 'hash_video' from 'mio.utils' (/Users/mbrosch/Documents/GitKraken_mac/miniscope-io/mio/utils/__init__.py)

The tests seem to fail for some hash functions. I am not sure how this could be related to the json file saving and loading functionality?!

@MarcelMB (Contributor, Author)

save the YAML file with the export video (with a hash or identifier if necessary if uniqueness/history is an issue)
so you mean instead of saving a json save a yaml?

i mean, the file format is of course not that important. Just saving a file that can always live with the video files in a folder, to basically know what the config was, or to re-run it if necessary on the same file or other files by another person.

@t-sasatani (Collaborator) commented Feb 22, 2025

Oh, the quotes and sentences got mixed up lol

so you mean instead of saving a json save a yaml?

No, the file extension isn't so important.

I agree that saving config files is nice. But my question is, why don't we copy the config file into the export folder, or make a model export that can be used directly as a config file, instead of manually dumping everything into a file that can't be directly plugged in? So a comment about implementation/interface rather than concept.
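
For the copy option, a rough sketch (the paths here are illustrative):

    import shutil
    from pathlib import Path

    config_path = Path("config/denoise_example.yml")  # config actually used for the run
    output_dir = Path("output")                       # folder holding the exported .avi files
    output_dir.mkdir(parents=True, exist_ok=True)

    # keep a byte-for-byte copy of the config next to the outputs
    shutil.copy2(config_path, output_dir / config_path.name)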

@t-sasatani (Collaborator)

I guess it'll be ideal if we can throw this stuff or some metadata into the video container itself, but it seems tricky with .avi.

@MarcelMB (Contributor, Author) commented Feb 22, 2025

make a model export that can be used directly as a config file

sure, that makes sense. Would this file still live in the user output folder, or, when loaded, replace the actual denoise config yaml in the config folder?

for now I never change the actual config file; it stays how it was downloaded and only temporarily updates from the json file if -u is used. Otherwise it does whatever is in there.

@MarcelMB (Contributor, Author) commented Mar 7, 2025

ideas to change this before merge:

  • copy the yaml config file into the user output folder (with the .avi files)
  • have -c denoise_example -u 'relativepath to' processing_log.json replace denoise_example instead of adding another yaml file

@sneakers-the-rat (Collaborator) commented Mar 7, 2025

sorry just seeing this.

If i'm getting this right, we want to have some root configuration that might have a bunch of minor variations, e.g. for each folder in some project directory?

there are a few planned changes to configs that are relevant here:

  • first, as @t-sasatani said above, the plan is to have a command to create a new config from an existing one. The config files themselves already satisfy the need for per-file or per-project config, so that would work like

    # create new config with id `my-wireless` in user config directory
    mio config copy wireless-200px my-wireless
    # use it
    mio stream capture -c my-wireless
    
    # create a new config with id `my-wireless-2` in current directory
    mio config copy wireless-200px my-wireless-2 -o ./config.yaml
    # use it
    mio stream capture -c ./config.yaml
    
  • second, it is definitely true that a system like mio config copy would be annoying: what if you need to change a value? do you need to change all the configs???? in working on the pipelining stuff, i have made a few improvements to the config system that give it some of the nice features of a templating system. so far i've implemented an include-like thing where you can basically import another config into the current one by referring to it by id where config values would normally go, e.g. here:

    layout: "wirefree-sd-layout"
    where the WireFreePipeline model specifies that it needs an SDFileSourceConfig (implicitly, working on that part), the loader then injects the contents of wirefree-sd-layout by reference.

    The inverse of includes is extends, where rather than importing another config into the current one as some key, you say that another config is the base config that you are extending/overriding. This would make it easier to have one "root" config that doesn't change, and then have more minimal configs that only change a subset of the values, but you only need to edit the single parent config to update it.

    so, e.g.

    parent.yaml

    id: parent-config
    some_value: 1
    another_value: 2
    changing_value: "hello"

    child.yaml

    id: child-config
    extends: parent-config 
    changing_value: "sup"

    child-config would get loaded as {"some_value": 1, "another_value": 2, "changing_value": "sup"} , as desired.
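
    a rough sketch of how that resolution could work (resolve_config and load_config_by_id are illustrative names, not the actual loader):

        def resolve_config(config: dict, load_config_by_id) -> dict:
            """Recursively merge a config onto the chain of configs it extends."""
            parent_id = config.pop("extends", None)
            if parent_id is None:
                return config
            parent = resolve_config(load_config_by_id(parent_id), load_config_by_id)
            # child keys override parent keys; the parent fills in the rest
            return {**parent, **config}


        configs = {
            "parent-config": {"id": "parent-config", "some_value": 1,
                              "another_value": 2, "changing_value": "hello"},
            "child-config": {"id": "child-config", "extends": "parent-config",
                             "changing_value": "sup"},
        }
        resolved = resolve_config(dict(configs["child-config"]),
                                  lambda cid: dict(configs[cid]))
        # -> {"id": "child-config", "some_value": 1, "another_value": 2,
        #     "changing_value": "sup"}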


The reasons I prefer those over something like this are that

  • it keeps the config system uniform: juggling paths is a pain, especially when you start getting into dependencies between things. e.g. for the extends example, it would be a pain in the ass if you had to e.g. specify the parent config as ../parent-config.yaml and then make sure that you never changed the relative folder structure of your data. absolute paths are a no-go because they are not portable and not publishable in addition to the other problems with being dependent on paths. If all configs are always referenced by ids, they are always in one place (yes, you can supply a path via the cli if that is easier to do, e.g. if you always have a mio_config.yaml in each project directory), they are always yaml, and they are always backed by a pydantic model, it's a lot easier to build intuition.
  • this version mixes concerns a bit: if we wanted to update the values in a config file, we would want that to be like mio config set -c my-config some_value=2 to set a value from the cli, and then we would assume that the file itself was changed. having this be in the capture function overloads it, and introduces cli inconsistencies -- it would make sense that we should be able to do that anywhere we refer to a config
  • discourages people from making their own configs: we want to encourage people to make their own config files for projects so that they can be versioned along with any project code and then publishable. so we want people to use something like mio config copy, or a system like extends where we preserve the automatically stamped mio_version (and can thus resolve the state of what the extended config was, at least coarsely).

@sneakers-the-rat (Collaborator) left a comment

already commented that i think this should be implemented differently, tho i totally understand the need. just commenting here to show some stuff that could be done differently for learning purposes

Comment on lines +792 to +869
metadata = {
    "timestamp": datetime.now().isoformat(),
    "input_video": str(video_path),
    "output_directory": str(output_dir),
    "processing_config": {
        "noise_patch": {
            "enabled": config.noise_patch.enable if config.noise_patch else False,
            "method": config.noise_patch.method if config.noise_patch else None,
            "gradient_config": (
                {
                    "threshold": (
                        config.noise_patch.gradient_config.threshold
                        if config.noise_patch
                        else None
                    ),
                }
                if config.noise_patch and config.noise_patch.gradient_config
                else None
            ),
            "black_area_config": (
                {
                    "consecutive_threshold": (
                        config.noise_patch.black_area_config.consecutive_threshold
                        if config.noise_patch
                        else None
                    ),
                    "value_threshold": (
                        config.noise_patch.black_area_config.value_threshold
                        if config.noise_patch
                        else None
                    ),
                }
                if config.noise_patch and config.noise_patch.black_area_config
                else None
            ),
        },
        "frequency_masking": {
            "enabled": config.frequency_masking.enable if config.frequency_masking else False,
            "cast_float32": (
                config.frequency_masking.cast_float32 if config.frequency_masking else False
            ),
            "cutoff_radius": (
                config.frequency_masking.spatial_LPF_cutoff_radius
                if config.frequency_masking
                else None
            ),
            "vertical_BEF": (
                config.frequency_masking.vertical_BEF_cutoff
                if config.frequency_masking
                else None
            ),
            "horizontal_BEF": (
                config.frequency_masking.horizontal_BEF_cutoff
                if config.frequency_masking
                else None
            ),
        },
        "butterworth": {
            "enabled": config.butter_filter.enable,
            "order": config.butter_filter.order,
            "cutoff_frequency": config.butter_filter.cutoff_frequency,
            "sampling_rate": config.butter_filter.sampling_rate,
        },
    },
    "output_files": {
        "noise_patch": f"output_{pathstem}_patch.avi",
        "freq_mask": f"output_{pathstem}_freq_mask.avi",
        "butter_filter": f"output_{pathstem}_butter_filter.avi",
        "butter_plot": f"{pathstem}_butter_filter_intensity_plot.png",
    },
}

# Save as JSON (overwrite any existing file)
log_file = output_dir / "processing_log.json"
logs = {"processing_runs": [metadata]}  # Just create new log with single run

with open(log_file, "w") as f:  # "w" mode overwrites existing file
    json.dump(logs, f, indent=2)
@sneakers-the-rat (Collaborator):

I don't think we should do this for a few reasons, not to belabor the point but we should not need to log what our config was after the fact, we should already know that because we already supplied it when we invoked it! it might be nice to offer, but i would really want to just make a structured model for outputs and have this be a part of that. ad-hoc layouts and formats are not gr8 in general.

but anyway, you'd want this to be like:

Suggested change: replace the entire metadata construction and JSON dump above with
log_file = output_dir / "processing_log.json"
logs = {
    "timestamp": datetime.now().isoformat(),
    "input_video": str(video_path),
    "output_directory": str(output_dir),
    "config": config.model_dump(),
}

with open(log_file, "w") as f:  # "w" mode overwrites existing file
    json.dump(logs, f, indent=2)

no reason to change the keys like this when we're saving the config, might as well just dump the whole thing which has 100% of the information that the config has by definition, and is in a format that can be roundtripped if needed. doing it like this would mean that every time we changed the model, we would need to remember to update this.
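
for instance, the roundtrip that model_dump enables (ButterworthConfig here is a stand-in for the real config model):

    from pydantic import BaseModel


    class ButterworthConfig(BaseModel):
        enable: bool = True
        order: int = 4
        cutoff_frequency: float = 3.5


    config = ButterworthConfig(order=2)
    dumped = config.model_dump()                         # model -> plain dict, JSON-serializable
    restored = ButterworthConfig.model_validate(dumped)  # dict -> validated model again
    assert restored == config                            # lossless roundtrip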

@@ -0,0 +1 @@
"""Utility functions for miniscope-io."""
@sneakers-the-rat (Collaborator):

i have blanket objections to the existence of utils subpackages, as they are, as a rule, junk-drawer subpackages that make maintenance unnecessarily hard. for an advanced case of how this plays out, see linkml/linkml#2371

it is true that sometimes things don't have an obvious place, so a single, short utils.py module is fine, but once it starts ballooning...

Comment on lines +11 to +12
with open(json_path) as f:
    log_data = json.load(f)
@sneakers-the-rat (Collaborator):

ideally all i/o operations are separated from processing actions (so, e.g., we wouldn't need to write to a temporary tmp.json file and then pass the path to this function if we already had a dict).
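
a small sketch of that separation (the function names here are illustrative):

    import json
    from pathlib import Path


    def load_log(json_path: Path) -> dict:
        """I/O only: read the log file into a dict."""
        with open(json_path) as f:
            return json.load(f)


    def apply_log(config: dict, proc_config: dict) -> dict:
        """Pure function: no file access, takes and returns plain data."""
        return {**config, **proc_config}


    # callers that already have a dict skip the file entirely:
    # updated = apply_log(config, proc_config)
    # callers starting from a file compose the two:
    # updated = apply_log(config, load_log(Path("processing_log.json")))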

Comment on lines +17 to +52
# Update noise patch config
config.noise_patch.enable = proc_config["noise_patch"]["enabled"]
if config.noise_patch.enable:
    config.noise_patch.method = proc_config["noise_patch"]["method"]
    if proc_config["noise_patch"]["gradient_config"]:
        config.noise_patch.gradient_config.threshold = proc_config["noise_patch"][
            "gradient_config"
        ]["threshold"]
    if proc_config["noise_patch"]["black_area_config"]:
        config.noise_patch.black_area_config.consecutive_threshold = proc_config["noise_patch"][
            "black_area_config"
        ]["consecutive_threshold"]
        config.noise_patch.black_area_config.value_threshold = proc_config["noise_patch"][
            "black_area_config"
        ]["value_threshold"]

# Update frequency masking config
config.frequency_masking.enable = proc_config["frequency_masking"]["enabled"]
if config.frequency_masking.enable:
    config.frequency_masking.cast_float32 = proc_config["frequency_masking"]["cast_float32"]
    config.frequency_masking.spatial_LPF_cutoff_radius = proc_config["frequency_masking"][
        "cutoff_radius"
    ]
    config.frequency_masking.vertical_BEF_cutoff = proc_config["frequency_masking"][
        "vertical_BEF"
    ]
    config.frequency_masking.horizontal_BEF_cutoff = proc_config["frequency_masking"][
        "horizontal_BEF"
    ]

# Update butterworth config
config.butter_filter.enable = proc_config["butterworth"]["enabled"]
if config.butter_filter.enable:
    config.butter_filter.order = proc_config["butterworth"]["order"]
    config.butter_filter.cutoff_frequency = proc_config["butterworth"]["cutoff_frequency"]
    config.butter_filter.sampling_rate = proc_config["butterworth"]["sampling_rate"]
@sneakers-the-rat (Collaborator):

same problem as above, lack of generality, need for maintenance if model changes.

Suggested change: replace all of the per-key updates above with
config = config.model_validate({**config.model_dump(), **proc_config})

@t-sasatani (Collaborator) commented Mar 7, 2025

I think the primary need for this is to record the configs with the output files. So I think the key dump @sneakers-the-rat posted is pretty much enough to cover short-term needs.

For the design of configs in general, the denoise and device configs probably have different needs. I'm not exactly sure what's a good interface that fits both:

  • denoise configs are pretty dynamic. We're now playing around with filter parameters (cutoff, etc.), so numerous minor config adjustments exist. I'm unsure if we should have an ID for each, but I want to preserve raw processing parameters with the output. The denoising may require users to play around with parameters, even in the final form of mio. It might be better to interact with notebooks if that's something people are comfortable with, or a GUI.
  • device configs are matched one-to-one with the device (or at least the firmware version) and will be much more static. So, I guess keeping the ID is usually sufficient. Also, I think the child config method @sneakers-the-rat posted above can cover most of the tool developer's needs.

@sneakers-the-rat (Collaborator)

i think recording the configs with the output is a really good idea - i would just like to spec that out a bit more explicitly :) that is the primary place that downstream analysis and data format tools will interact with this package, so it's worth having that be explicit and declarative. e.g. see the nwb miniscope extension - it's very dependent on the structure of the config model and output directories: https://github.com/catalystneuro/ndx-miniscope/tree/main/src/pynwb/ndx_miniscope

the denoise and device configs probably have different needs. I'm not exactly sure what's a good interface that fits both:

this is super true, and one of the reasons i decided it was worth the complexity to have an extends and include-style system in the configs. it seemed pretty clear that we have different kinds of configs, but all of them need to be editable, so it should be possible for someone to e.g. take one and just tweak it slightly without needing to modify or copy the slow-moving device-specific config.

hard agree we need to have a GUI for this, and hopefully all the data modeling makes the GUI part a lot less work. a decent amount of what i've been thinking about re: making configs not care about paths is in prep for making GUIs on top of this package.

just want to reiterate i totally understand and agree with the need for this PR, but think that we probably want to do it in a different way, and since i'm the one that's saying that i'm willing to put up the work to make that happen. i'll have increasing time to work on mio stuff in the coming weeks.
