GarNet / Redis / Other Storage Options #510
Replies: 5 comments 15 replies
-
Our prod workflows also move files (.zip) around (from a generator to consumers) but never carry any workload data directly. Instead, workflows command a 'transfer-service' to move a file from a source URL (i.e. an HTTP endpoint served by the generator, known to the workflow instance) to a MinIO/S3 object-storage bucket (also known to the workflow). The 'transfer-service' is quite simple (given a source URL, download the zip file locally, then push it to MinIO/S3) but may require "double" authentication (1. access to the transfer-service's endpoint and 2. access to the file's source endpoint), so the workflow definition uses this awesome nested authentication feature ...
```yaml
- transferContentToLdsS3:
    call: openapi
    with:
      document:
        endpoint:
          uri: ${ "\( $context.environmentVariables.S3_MANAGER_URL )/openapi.json" }
      operationId: s3-manager
      parameters:
        body:
          transfers:
            - bucketName: ${ if $language != "ENU" then (($context.form.qualifiedName | split(" ") | join("-") | ascii_downcase) + "-" + ($language | ascii_downcase)) else ($context.form.qualifiedName | split(" ") | join("-") | ascii_downcase) end}
              objectName: SVN.zip
              makeBucket: true
              unzip: false
              source:
                url: ${ "\( $context.form.generator.baseUrl )/api/file/download/package/\( $context.packages[] | select(.name? // "" | contains($package)) | .[$language]? // {} | .fileId // "" )" }
                headers:
                  - name: Authorization
                    value: ${ "Bearer \( $authorization.parameter )" } # <<<<< Authenticate on the Generator's API!
      authentication:
        use: generator-oauth2 # <<<< Authenticate on the s3-manager (aka transfer-service) API!
...
```
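Stripped down, the transfer-service itself does little more than the sketch below (Python here purely for illustration; the endpoint names, credential wiring and bucket handling are simplified placeholders, not our actual implementation):

```python
# Stripped-down sketch of what a transfer-service endpoint does: download the zip
# from the source URL (using the forwarded Authorization header), then push it to
# a MinIO/S3 bucket. All names and the client wiring are placeholders.
import io

import requests
from minio import Minio

s3 = Minio("minio.internal:9000", access_key="...", secret_key="...", secure=False)


def transfer(source_url: str, auth_header: str, bucket: str, object_name: str,
             make_bucket: bool = True) -> str:
    # 1. Download the file from the generator's endpoint.
    response = requests.get(source_url, headers={"Authorization": auth_header}, timeout=120)
    response.raise_for_status()

    # 2. Push it to object storage, creating the bucket if requested.
    if make_bucket and not s3.bucket_exists(bucket):
        s3.make_bucket(bucket)
    data = response.content
    s3.put_object(bucket, object_name, io.BytesIO(data), length=len(data))

    # 3. Hand back a reference the workflow can carry around.
    return f"s3://{bucket}/{object_name}"
```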
-
Thanks @bvandewe, so it seems you're using an API to store all data consistently. In my world, that would e.g. be a GCS bucket. Is there any way I could access the value of a secret? (e.g. tried …)
-
Thanks @bvandewe, let me clarify a bit: our first workflow should …
My initial idea was to do it this way:
I have a PoC Python script for accessing images on GCS (authenticating with a key file mounted via the secrets directory), but I think you are proposing a completely different approach: outsourcing the heavy lifting to a deployed service, which in turn is called from the serverless workflows.
Does this correctly summarize your idea?
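For reference, my PoC is roughly along these lines: a small Python script that authenticates with the key file mounted via the secrets directory and pulls an object from GCS (bucket name, object name and key-file path below are placeholders, not our actual values):

```python
# Rough sketch of the GCS PoC: download one image, authenticating with a
# service-account key file mounted into the container via the secrets directory.
# Bucket, object and key-file path are placeholders.
from google.cloud import storage

KEY_FILE = "/run/secrets/gcs-service-account.json"
BUCKET = "my-image-bucket"
OBJECT = "images/example.png"


def download_image(bucket_name: str, object_name: str) -> bytes:
    """Download a single object from GCS and return its raw bytes."""
    client = storage.Client.from_service_account_json(KEY_FILE)
    blob = client.bucket(bucket_name).blob(object_name)
    return blob.download_as_bytes()


if __name__ == "__main__":
    data = download_image(BUCKET, OBJECT)
    print(f"downloaded {len(data)} bytes from gs://{BUCKET}/{OBJECT}")
```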
-
On a side note: as mentioned, I have a PoC for my initial flow using the GCS client in a Python script. When I add the base64 representation of the image data (51 KB raw image size, potentially larger once base64-encoded) to the output (stdout), the workflow never finishes (the runner keeps running forever). When I remove that particular field from the response, the workflow finishes successfully. I understand this approach is not the way to go, but I'd like to better understand which limits apply.
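To make the distinction concrete: the problematic run simply embedded the base64 string in stdout, whereas an output that only references the object avoids carrying the payload at all (and sidesteps the size question entirely, since base64 alone inflates the 51 KB image to roughly 68 KB). A rough sketch with placeholder names:

```python
# Sketch of a "reference only" output: print a small JSON document pointing at
# the object in GCS instead of embedding the base64-encoded bytes.
import json


def make_output(bucket_name: str, object_name: str, size_bytes: int) -> str:
    """Build a compact workflow output that references the image instead of carrying it."""
    return json.dumps({
        "image": {
            "uri": f"gs://{bucket_name}/{object_name}",
            "sizeBytes": size_bytes,
        }
    })


if __name__ == "__main__":
    # A downstream step can resolve the URI itself instead of receiving ~68 KB of base64.
    print(make_output("my-image-bucket", "images/example.png", 51 * 1024))
```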
-
@cdavernas I actually tried to do this, and at least locally in Docker I am not able to specify a bind mount in the runner template.
I found this code (and the definition here?), which seems to attempt to extract volumes and add them to binds. However, somehow this does not work for me (the container won't start with …)
-
Hi Devs
We're currently evaluating Synapse for production use. I have replaced Garnet with a GCP Redis instance (Cloud Memorystore), as this integrates more seamlessly with our infrastructure stacks (e.g. no persistent disk needed for the Garnet container) and reduces maintenance effort.
Our workflows will (among other things) download images from GCS and forward them to other workflows/steps in the output.
I currently see two options for doing that: passing the image data along in the workflow output, or keeping the whole flow (and the files) inside a single runner container.
The issue I see with the first is the large amount of data that would be exposed in logs/events etc.; the issue with the second is the constraint of staying inside a single runner container for the whole flow.
I am wondering whether there are better options to avoid congested logs/events, and whether a different storage solution could be used (e.g. a NoSQL DB). I have seen some code available here, but I am not sure how it could be used without changing the codebase.
So, the question is: how would you approach the challenge of large amounts of data, and how could I replace Redis with another persistence option (or combine both)?