
Support Lazy Loading of Pod Templates #17701

Merged: 11 commits, Feb 19, 2025

Conversation

@GWphua (Contributor) commented Feb 6, 2025

Description

There may be circumstances where we want to change the config of the task pod, such as changing the CPU requests/limits of default task pods by editing the base YAML pod template file. In my case, I am editing something that looks like this example ConfigMap to make changes directly to the pod templates.

However, since the code only updates the HashMap<String, PodTemplate> templates during Overlord initialization, the changes are not immediately reflected.

This PR aims to solve this problem and make Druid more adaptable in Kubernetes environments. Instead of internally keeping track of the deserialized PodTemplates read from all template files, we now run the deserialization process whenever a new task pod is to be created. Note that this PR does not support reading from newly created pod template files (e.g., if we create a newPodTemplate.yaml, we still need to let the Overlord run template initialization for the new file to take effect).

The original fail-fast behavior for invalid pod templates is retained by validating all defined pod templates during initialization.
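For illustration, here is a minimal sketch of the approach in Java. This is not the PR's actual code: the class name, the loadPodTemplate helper, and the use of fabric8's Serialization utility are assumptions made for the example.

```java
import io.fabric8.kubernetes.api.model.PodTemplate;
import io.fabric8.kubernetes.client.utils.Serialization;

import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

public class LazyPodTemplateRegistry
{
  // Suppliers re-read the template file on every get(), so edits to the
  // file (e.g. via a mounted ConfigMap) take effect on the next task launch.
  private final Map<String, Supplier<PodTemplate>> podTemplates = new HashMap<>();

  public LazyPodTemplateRegistry(Map<String, Path> templateFiles)
  {
    for (Map.Entry<String, Path> entry : templateFiles.entrySet()) {
      Path file = entry.getValue();
      // Fail fast: deserialize once during initialization so an invalid
      // template is caught at Overlord startup rather than at task launch.
      loadPodTemplate(file);
      podTemplates.put(entry.getKey(), () -> loadPodTemplate(file));
    }
  }

  public PodTemplate getTemplate(String name)
  {
    Supplier<PodTemplate> supplier = podTemplates.get(name);
    if (supplier == null) {
      // Newly created template files are not picked up here; they still
      // require an Overlord restart, as noted in the description above.
      throw new IllegalArgumentException("No such pod template: " + name);
    }
    return supplier.get(); // re-reads and re-deserializes the file
  }

  private static PodTemplate loadPodTemplate(Path file)
  {
    try (InputStream in = Files.newInputStream(file)) {
      // Assumes fabric8's YAML/JSON deserialization; throws on malformed input.
      return Serialization.unmarshal(in, PodTemplate.class);
    }
    catch (IOException e) {
      throw new UncheckedIOException("Failed to read pod template: " + file, e);
    }
  }
}
```

The trade-off of this design is that every task launch pays a file read and deserialization cost, which is what the Performance Considerations section below quantifies.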

Performance Considerations

Given that an ingestion task will create n task pods, and we have k pod templates defined, we will read the pod template files n + k times instead of just k times (e.g., 100 task pods with 5 templates means 105 reads rather than 5).

Release note

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added comments explaining the "why" and the intent of the code wherever it would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@FrankChen021 (Member) left a comment

I like this feature and it looks good to me

@FrankChen021 (Member) commented

I think we need a one- or two-sentence description of this feature in the docs, so that people don't need to dive into the code or do some testing to know whether this is supported or not.

@GWphua could you update it?

@FrankChen021 (Member) left a comment

minor suggestion on doc

```diff
-private HashMap<String, PodTemplate> podTemplates;
-private Supplier<KubernetesTaskRunnerDynamicConfig> dynamicConfigRef;
+private final Supplier<KubernetesTaskRunnerDynamicConfig> dynamicConfigRef;
+private HashMap<String, Supplier<PodTemplate>> podTemplates;
```
A Member commented on the diff above:

Add a one-line comment to explain why the Supplier is defined here.
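For illustration, the kind of comment being requested might look like this (a hypothetical sketch; the wording actually merged in the PR may differ):

```java
import io.fabric8.kubernetes.api.model.PodTemplate;

import java.util.HashMap;
import java.util.function.Supplier;

class KubernetesTaskRunnerSketch
{
  // Hold Suppliers rather than materialized PodTemplates: each get()
  // re-reads and re-deserializes the template file, so file edits take
  // effect on the next task launch without restarting the Overlord.
  private HashMap<String, Supplier<PodTemplate>> podTemplates;
}
```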

@FrankChen021 merged commit fd45182 into apache:master on Feb 19, 2025
75 checks passed
@cryptoe (Contributor) commented Feb 19, 2025

Earlier, the pod template was deserialized and kept in memory only once, no? Now, for each task invocation, the supplier gets called, which reads the contents from the file, no? Is that correct? @GWphua?

I am kind of on the fence about this. It's easier to debug when stuff is immutable. Earlier I just used to check whether the template name was the same or not, but now templates with the same name can be different.

@GWphua (Contributor, Author) commented Feb 20, 2025

> Earlier, the pod template was deserialized and kept in memory only once, no? Now, for each task invocation, the supplier gets called, which reads the contents from the file, no? Is that correct? @GWphua?
>
> I am kind of on the fence about this. It's easier to debug when stuff is immutable. Earlier I just used to check whether the template name was the same or not, but now templates with the same name can be different.

Hi @cryptoe, you are right about how the logic works. Regarding ease of debugging, the case you describe is indeed possible: if the user and the cluster both have a podTemplateFile.yaml but their contents differ, there may be some confusion.

However, we can still debug by going into the pod and running cat /path/to/podTemplateFile.yaml to inspect the template file. The main motivation behind this PR is to let users easily reconfigure task pod resources without restarting the Overlord.

I also suggest taking a look at "Example 2: Using a ConfigMap to upload the Pod Template file", which makes it easy to maintain a single source of truth and identify issues more easily.

GWphua added a commit to GWphua/druid that referenced this pull request Feb 20, 2025
* Lazy loading for PodTemplate to allow changing template files without restarting.

* Revert accidental changes to inspectionProfiles/Druid.xml

* Checkstyle

* Add another example for pod template configuration, and for lazy loading of pod templates

* Add tls port to example

* Add unit test for lazy pod template loading

* Fix spell-checks

* Allow k8s-jobs.md to dynamically take in Druid Version

* Update docs/development/extensions-core/k8s-jobs.md

Co-authored-by: Frank Chen <[email protected]>

* Update docs/development/extensions-core/k8s-jobs.md

Co-authored-by: Frank Chen <[email protected]>

* Add description for why Supplier is used

---------

Co-authored-by: Frank Chen <[email protected]>