DM-45988: Support storing init-outputs in Prompt Processing central repo #4436

kfindeisen · 2025-03-28T17:44:48Z

This PR adds a Job and CronJob that create init-outputs and output collections for the Prompt Processing service. By design, these jobs share much of their config with the primary service and must be updated together in order to work correctly.

charts/prompt-keda/templates/_service-init.tpl

charts/prompt-keda/templates/service-init.yaml

dspeck1

LGTM. There are a couple of comments.

The initializer is a small-scale process that creates all the pipeline init-outputs that the main service might need, and commits them to the central repo exactly once. As such, it doesn't need much memory and no storage, but *must* run before the service is viable. The partial template allows the job definition to be shared between a Job and a CronJob, ensuring consistency.
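One way to share a definition between a Job and a CronJob is a named Helm template that both manifests include. The sketch below only illustrates that pattern: the template name `prompt-keda.initJobSpec`, the values keys, the resource names, and the resource figures are assumptions, not the chart's actual contents.

```yaml
# _service-init.tpl (sketch): named template holding the shared Job spec.
{{- define "prompt-keda.initJobSpec" -}}
template:
  spec:
    restartPolicy: Never
    containers:
      - name: init-outputs
        image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"  # hypothetical values keys
        resources:
          requests:
            cpu: "1"
            memory: 1Gi   # illustrative figure; no volumes are mounted
{{- end }}

# service-init.yaml (sketch): both resources render the same spec.
---
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ .Release.Name }}-init
spec:
  {{- include "prompt-keda.initJobSpec" . | nindent 2 }}
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: {{ .Release.Name }}-init-daily
spec:
  schedule: {{ .Values.initializer.schedule | quote }}  # hypothetical values key
  jobTemplate:
    spec:
      {{- include "prompt-keda.initJobSpec" . | nindent 6 }}
```

Because the spec is written once and only indented differently at each include site, the Job and the CronJob cannot drift apart.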

kfindeisen requested a review from dspeck1 March 28, 2025 17:44

dspeck1 reviewed Mar 28, 2025

charts/prompt-keda/templates/_service-init.tpl (outdated)

charts/prompt-keda/templates/service-init.yaml

dspeck1 approved these changes Mar 28, 2025

kfindeisen force-pushed the tickets/DM-45988 branch from 75addfd to f2e5a53 on April 7, 2025 17:06

kfindeisen added 7 commits April 8, 2025 11:43

Add Sasquatch credentials to PP init service.

3847956

The service does not currently use Sasquatch, but it could in principle send performance metrics.

Add Kafka alert stream credentials to PP init service.

de38da5

The init service constructs all tasks in all pipelines, and PackageAlertsTask tries to connect to the Kafka server at construction time.
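Since task construction alone triggers the Kafka connection, the initializer's container needs the same credentials the main service uses. A minimal sketch of injecting them from a Kubernetes Secret follows; the environment variable names, the Secret name, and its keys are hypothetical placeholders, not the chart's real configuration.

```yaml
# Container fragment (sketch): inject Kafka credentials from a Secret.
env:
  - name: ALERT_STREAM_USERNAME          # hypothetical variable name
    valueFrom:
      secretKeyRef:
        name: alert-stream-credentials   # hypothetical Secret name
        key: username
  - name: ALERT_STREAM_PASSWORD          # hypothetical variable name
    valueFrom:
      secretKeyRef:
        name: alert-stream-credentials
        key: password
```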

Set up CronJob to run Prompt Processing init every day_obs.

5fb7b1d

Rerunning the initializer daily is necessary because the run collection names include the day_obs (and this feature is very useful for anybody trying to analyze or reproduce the data later).
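As I understand the Rubin convention, day_obs is the calendar date in the UTC-12 time zone, so it rolls over at 12:00 UTC. A daily CronJob can therefore be pinned to UTC so that each run creates the collections for the new day_obs. The schedule, resource name, and image below are assumptions chosen for illustration, not the values used in this PR.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: prompt-init-daily          # hypothetical name
spec:
  schedule: "5 12 * * *"           # assumed: shortly after the 12:00 UTC day_obs rollover
  timeZone: "Etc/UTC"              # evaluate the schedule in UTC, not the controller's local zone
  concurrencyPolicy: Forbid        # skip a run if the previous one is still active
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: init-outputs
              image: example/prompt-init:latest   # placeholder image
```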

Turn off init-outputs CronJob for Prompt Processing dev services.

4ad04d5

Knative HSC-gpu, LSSTCam, and LSSTComCam, and Keda LSSTComCam are currently unused.

Configure init Prompt Processing job to run on every sync.

3ea4337

The job needs to prepare version- and configuration-specific settings for the main service. By default, Argo only runs a job if no existing job object exists.
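Argo CD resource hooks are one way to get this behavior: a Job annotated as a hook runs on each sync, and the `BeforeHookCreation` delete policy removes the previous Job object before the new one is created. This is a sketch of that general mechanism, not necessarily the annotations this PR uses; the Job name and image are placeholders.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: prompt-init                # hypothetical name
  annotations:
    argocd.argoproj.io/hook: Sync                              # run as part of every sync
    argocd.argoproj.io/hook-delete-policy: BeforeHookCreation  # delete the old Job object first
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: init-outputs
          image: example/prompt-init:latest   # placeholder image
```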

Block Prompt Processing deployment on successful init job.

761967d

The service depends on the init job to correctly and self-consistently initialize the run collections for the day. Without the block, the service would receive visits but then fail during preprocessing.
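Argo CD sync waves are one mechanism for this kind of ordering: resources in a lower wave must be healthy before the next wave is applied, and Argo CD treats a Job as healthy only once it has completed successfully. The fragment below sketches that idea under the assumption that the main service stays in the default wave 0; it is not confirmed to be the approach taken in this PR, and the Job name and image are placeholders.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: prompt-init                # hypothetical name
  annotations:
    argocd.argoproj.io/sync-wave: "-1"   # applied, and must succeed, before wave 0 resources
spec:
  backoffLimit: 0                  # fail the sync promptly instead of retrying
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: init-outputs
          image: example/prompt-init:latest   # placeholder image
```

If the Job fails, the sync stops before the service manifests are applied, so the previously deployed service keeps running with its existing configuration.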

kfindeisen force-pushed the tickets/DM-45988 branch from f2e5a53 to 761967d on April 8, 2025 18:44

kfindeisen added this pull request to the merge queue Apr 8, 2025

Merged via the queue into main with commit fcdb88c Apr 8, 2025
6 checks passed

kfindeisen deleted the tickets/DM-45988 branch April 8, 2025 18:49
