Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GOBBLIN-2174] GoT YarnService Integration with DynamicScaling #4077

Merged
merged 23 commits into from
Dec 9, 2024

Conversation

Blazer-007
Copy link
Member

@Blazer-007 Blazer-007 commented Nov 23, 2024

Dear Gobblin maintainers,

Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!

JIRA

[ ] My PR addresses the following Gobblin JIRA issues and references them in the PR title. For example, "[GOBBLIN-XXX] My Gobblin PR"
- https://issues.apache.org/jira/browse/GOBBLIN-2174

Description

☑️ Here are some details about my PR, including screenshots (if applicable):

Tests

☑️ My PR adds the following unit tests OR does not need testing for this extremely good reason:

  • Tested E2E using newly created DummyDynamicScalingYarnServiceManager which uses DummyScalingDirectiveSource which returns set of fixed profiles and their set points
  • YarnServiceTest
    • testBaselineWorkerProfileCreatedWithPassedConfigs and testBuildContainerCommand
  • DynamicScalingYarnServiceManagerTest

Logs from container

2024-11-29 05:22:33 PST [,,] INFO  [DynamicScalingYarnService STARTING] org.apache.gobblin.temporal.yarn.YarnService  - Requesting initial containers using baselineWorkerProfile
2024-11-29 05:22:33 PST [,,] INFO  [DynamicScalingYarnService STARTING] org.apache.gobblin.temporal.yarn.YarnService  - Requesting 2 containers with resource=<memory:8192, vCores:2> and allocation request id = Optional.of(0)
2024-11-29 05:22:59 PST [,,] INFO  [DynamicScalingExecutor] org.apache.gobblin.temporal.yarn.DynamicScalingYarnService  - Requesting 2 new containers for profile secondProfile having currently 0 containers
2024-11-29 05:22:59 PST [,,] INFO  [DynamicScalingExecutor] org.apache.gobblin.temporal.yarn.YarnService  - Requesting 2 containers with resource=<memory:2048, vCores:2> and allocation request id = Optional.of(1)
2024-11-29 05:22:59 PST [,,] INFO  [DynamicScalingExecutor] org.apache.gobblin.temporal.yarn.DynamicScalingYarnService  - Requesting 3 new containers for profile firstProfile having currently 0 containers
2024-11-29 05:22:59 PST [,,] INFO  [DynamicScalingExecutor] org.apache.gobblin.temporal.yarn.YarnService  - Requesting 3 containers with resource=<memory:2048, vCores:2> and allocation request id = Optional.of(2)
...
...
2024-11-29 05:23:59 PST [,,] INFO  [DynamicScalingExecutor] org.apache.gobblin.temporal.yarn.DynamicScalingYarnService  - Requesting 1 new containers for profile secondProfile having currently 2 containers
2024-11-29 05:23:59 PST [,,] INFO  [DynamicScalingExecutor] org.apache.gobblin.temporal.yarn.YarnService  - Requesting 1 containers with resource=<memory:2048, vCores:2> and allocation request id = Optional.of(3)
2024-11-29 05:23:59 PST [,,] INFO  [DynamicScalingExecutor] org.apache.gobblin.temporal.yarn.DynamicScalingYarnService  - Requesting 2 new containers for profile firstProfile having currently 3 containers
2024-11-29 05:23:59 PST [,,] INFO  [DynamicScalingExecutor] org.apache.gobblin.temporal.yarn.YarnService  - Requesting 2 containers with resource=<memory:2048, vCores:2> and allocation request id = Optional.of(4)

Commits

✔️ My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
1. Subject is separated from body by a blank line
2. Subject is limited to 50 characters
3. Subject does not end with a period
4. Subject uses the imperative mood ("add", not "adding")
5. Body wraps at 72 characters
6. Body explains "what" and "why", not "how"

@Blazer-007
Copy link
Member Author

Previous PR review done here - phet#1

Also please ignore initial commit history as those got added while rebasing

@Blazer-007 Blazer-007 changed the title [GOBBLIN-] GoT YarnService Integration with DynamicScaling [GOBBLIN-2174] GoT YarnService Integration with DynamicScaling Nov 23, 2024
Copy link
Contributor

@phet phet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks really good. are there any mocks that would help test YarnService?

@Blazer-007 Blazer-007 marked this pull request as ready for review November 29, 2024 13:56
@Blazer-007
Copy link
Member Author

Blazer-007 commented Nov 29, 2024

are there any mocks that would help test YarnService?

Have added unit test for buildContainerCommand

Copy link
Contributor

@phet phet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this YARN integration is coming along nicely and seems very close!

testDynamicScalingYarnServiceManager.startUp();
Thread.sleep(5000); // 5 seconds sleep so that GetScalingDirectivesRunnable.run() is called for 5 times
testDynamicScalingYarnServiceManager.shutDown();
Mockito.verify(mockDynamicScalingYarnService, Mockito.times(3)).reviseWorkforcePlanAndRequestNewContainers(Mockito.anyList());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm tempted to verify further what it actually did. e.g. that it called requestContainers. what do you think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i dont think it will be useful or good to do as we are working with mock of dynamicscalingyarn service and requestContainers is called from internal of reviseWorkforcePlanAndRequestNewContainers.

Keeping in mind added DynamicScalingYarnServiceTest to test reviseWorkforcePlanAndRequestNewContainers

Comment on lines 24 to 27
/**
* {@link DummyScalingDirectiveSource} based implementation of {@link AbstractDynamicScalingYarnServiceManager}.
* This class is meant to be used for testing purposes only.
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how is this to be used? are you hard coding when making a private build or passing by config? if not the latter yet, would it be worth getting it to work that way?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes for doing e2e testing for which logs i have added in pr description is passed through config -
gobblin.yarn.app.master.serviceClasses=org.apache.gobblin.temporal.loadgen.dynamic.DummyDynamicScalingYarnServiceManager in prod we can simply replace this with FsScaling one

Comment on lines +32 to +33
public final static String DYNAMIC_SCALING_DIRECTIVES_DIR = GobblinTemporalConfigurationKeys.DYNAMIC_SCALING_PREFIX + "directives.dir";
public final static String DYNAMIC_SCALING_ERRORS_DIR = GobblinTemporalConfigurationKeys.DYNAMIC_SCALING_PREFIX + "errors.dir";
Copy link
Contributor

@phet phet Dec 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are fine for now, but I suspect we'll move to basing these directories from the JobStateUtils::getWorkDirRoot

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, will move it.
added a todo for reminder

Copy link
Contributor

@phet phet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NOTE: checkstyle Errors:

[ERROR] /home/runner/work/gobblin/gobblin/gobblin-temporal/src/main/java/org/apache/gobblin/temporal/yarn/DynamicScalingYarnService.java:23:8: Unused import - org.apache.hadoop.yarn.api.records.Resource. [UnusedImports]

> Task :gobblin-temporal:checkstyleMain FAILED
[ERROR] /home/runner/work/gobblin/gobblin/gobblin-temporal/src/main/java/org/apache/gobblin/temporal/yarn/DynamicScalingYarnService.java:26:8: Unused import - com.google.common.base.Optional. [UnusedImports]

UPDATE: sorry, NVM, not an issue... my browser hadn't updated at first to indicate a successful exec happened since.

Copy link
Contributor

@phet phet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

excellent work here!

@codecov-commenter
Copy link

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 38.77%. Comparing base (e5d897e) to head (2e78702).
Report is 1 commits behind head on master.

❗ There is a different number of reports uploaded between BASE (e5d897e) and HEAD (2e78702). Click for more details.

HEAD has 1 upload less than BASE
Flag BASE (e5d897e) HEAD (2e78702)
2 1
Additional details and impacted files
@@              Coverage Diff              @@
##             master    #4077       +/-   ##
=============================================
- Coverage     51.52%   38.77%   -12.76%     
+ Complexity     7563     1598     -5965     
=============================================
  Files          1387      388      -999     
  Lines         52132    16006    -36126     
  Branches       5724     1586     -4138     
=============================================
- Hits          26863     6206    -20657     
+ Misses        22979     9301    -13678     
+ Partials       2290      499     -1791     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@phet phet merged commit 7309060 into apache:master Dec 9, 2024
6 checks passed
@Blazer-007 Blazer-007 deleted the dynamic-scaling-test-branch branch December 9, 2024 17:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants