Pipeline should prioritize `save=True` before subsequent tasks #283

ljgray · 2024-12-09T19:25:51Z

In certain cases, it's possible that the dataset produced by a task will be modified by another task before the task save method is run, resulting in the wrong dataset state being saved.

This happens when a task runs its process method, passes the output on to a subsequent task, and the subsequent task runs its process method before the finish method of the first task is called. If the second task modifies the data in-place, both tasks reference the same data, so the file saved by the first task will include the modifications made by the second task. In practice, this has primarily been seen when a mask is applied.
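A minimal sketch of the failure mode, for illustration only. The `Producer` and `Masker` classes, the `saved` attribute, and the dict-based dataset are hypothetical stand-ins, not the pipeline's actual task API; the only assumption carried over from the description above is that the save happens in `finish`, after the downstream task has already run.

```python
import copy


class Producer:
    """Hypothetical task: emits a dataset in process(), saves it later in finish()."""

    def __init__(self):
        self._output = None
        self.saved = None  # stands in for the file written to disk

    def process(self):
        self._output = {"vis": [1.0, 2.0, 3.0]}
        # The same object is handed downstream; no copy is made.
        return self._output

    def finish(self):
        # Deferred save: records whatever the dataset looks like at this point.
        self.saved = copy.deepcopy(self._output)


class Masker:
    """Hypothetical downstream task that applies a mask in-place."""

    def process(self, data):
        data["vis"][1] = 0.0  # modifies the shared object
        return data


producer, masker = Producer(), Masker()
data = producer.process()
masker.process(data)  # runs before the producer's finish()
producer.finish()     # the "saved" output now contains the mask
```

Here `producer.saved["vis"]` ends up as `[1.0, 0.0, 3.0]` rather than the `[1.0, 2.0, 3.0]` that the first task actually produced.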

If `save=True` for a task, the pipeline should recognize that it needs to save that task's dataset before the dataset is used by any other task.
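One way to picture the requested behaviour is a runner that records each `save=True` output immediately after it is produced, before handing it on. This is a rough sketch under stated assumptions: `run_chain`, the `(name, func, save)` tuples, and the `saved` dict are all hypothetical and do not reflect how the pipeline schedules tasks.

```python
import copy


def run_chain(tasks, saved):
    """Run tasks in order, snapshotting each save=True output
    before the next task gets a chance to modify it."""
    data = None
    for name, func, save in tasks:
        data = func(data)
        if save:
            # Stand-in for writing to disk at this point in the chain.
            saved[name] = copy.deepcopy(data)
    return data


def produce(_):
    # Hypothetical first task: creates the dataset.
    return {"vis": [1.0, 2.0, 3.0]}


def mask(data):
    # Hypothetical second task: masks the dataset in-place.
    data["vis"][1] = 0.0
    return data


saved = {}
result = run_chain([("produce", produce, True), ("mask", mask, False)], saved)
```

With this ordering, `saved["produce"]` keeps the unmasked values while `result` carries the mask.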

ljgray · 2025-03-10T18:29:37Z

I've been thinking about this one on and off, and I don't think it's realistic to include this sort of logic in the pipeline. Once #278 is finished and merged, the logic to avoid this issue will be straightforward.

ljgray self-assigned this Dec 9, 2024

ljgray closed this as completed Mar 10, 2025

ljgray reopened this Mar 10, 2025

ljgray closed this as not planned Mar 10, 2025