Allow to regenerate memo files for images imported after a specific date #155

Open
wants to merge 2 commits into base: master

@sbesson sbesson commented Mar 19, 2025

Related to #148

The changes made in #150 aimed to improve the distribution of the memo file regeneration to make better use of the resources allocated to the process. This used the initialization time as measured at import time and stored in the database as a proxy to assess the processing time for each fileset.

While reasonable, these assumptions might be challenged by storage realities, e.g. deployments applying data management policies that move data across storage tiers. In these cases, regenerating all the memo files involves reading all the data back from archived or slow storage, which might have real implications in terms of both time and billing.

This PR proposes to mitigate this scenario by specifying an import time cut-off and regenerating only memo files for images imported after this cut-off.

The logic uses the image creation time retrieved from the database, together with PSQL variable assignment, to filter entries based on the import date. The default behavior is unmodified and should return a CSV file covering all images.
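
As a rough illustration of the PSQL variable mechanism, the sketch below mimics how a cut-off passed with `psql -v since=2024-01-01` would be rendered in a query. It is not the query from this PR: the `image_list` table, the `created` column and both helper functions are hypothetical. In psql, a bare `:since` pastes the raw text while `:'since'` pastes a quoted SQL literal. The unquoted form is one plausible way to hit the "operator does not exist: timestamp without time zone > integer" error, since PostgreSQL reads `2024-01-01` as integer arithmetic.

```shell
#!/usr/bin/env bash
# Hypothetical illustration only; table and column names are assumptions.

# Mimics `WHERE created > :since` (raw text substitution).
render_unquoted() {
  printf 'SELECT id FROM image_list WHERE created > %s;\n' "$1"
}

# Mimics `WHERE created > :'since'::timestamp` (quoted literal substitution).
render_quoted() {
  printf "SELECT id FROM image_list WHERE created > '%s'::timestamp;\n" "$1"
}

# PostgreSQL would parse 2024-01-01 here as the integer expression 2024 - 1 - 1.
render_unquoted 2024-01-01
# Here the value is a string literal cast to a timestamp.
render_quoted 2024-01-01
```

Quoting the variable keeps the comparison between two timestamps, whatever date format the caller passes.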

To test this PR, compare the outcome of the process with different variants of the --since argument:

./regen-memo-files.sh --cache-options /OMERO/BioFormatsCache.full
./regen-memo-files.sh --cache-options /OMERO/BioFormatsCache.since2024 --since 2024-01-01
./regen-memo-files.sh --cache-options /OMERO/BioFormatsCache.since2025 --since 2025-01-01
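
For readers unfamiliar with this kind of option, here is a minimal sketch of how a `--since` cut-off could be parsed and handed over as a PSQL variable. It is not the code from this PR: the function names, the `YYYY-MM-DD` validation and the `1970-01-01` default (chosen so that omitting `--since` matches every image) are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the actual regen-memo-files.sh logic.

# Assumed default: a cut-off far in the past, so no image is filtered out.
since="1970-01-01"

parse_since() {
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --since)
        [ "$#" -ge 2 ] || { echo "missing value for --since" >&2; return 1; }
        since="$2"
        shift 2
        ;;
      *)
        # Other options (e.g. --cache-options) are ignored in this sketch.
        shift
        ;;
    esac
  done
  # Accept only values shaped like YYYY-MM-DD.
  case "$since" in
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
    *) echo "invalid date: $since" >&2; return 1 ;;
  esac
}

# Print the psql variable assignment corresponding to the cut-off.
psql_since_arg() {
  printf -- '-v since=%s\n' "$since"
}
```

Calling `parse_since "$@"` followed by `psql_since_arg` would then yield the argument to append to the `psql` invocation.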

Future extensions might take advantage of the image.archived flag which has been proposed to communicate the archival state of a particular image - see ome/openmicroscopy#6390.

/cc @erindiel @atTODO

The logic uses the image creation time retrieved from the database as well
as PSQL variable assignment to filter entries based on the import date.
The default behavior is unmodified and should return a CSV file covering
all images.
@sbesson sbesson requested a review from stick March 19, 2025 11:55
This should prevent PSQL errors of the type
"operator does not exist: timestamp without time zone > integer"
2 participants