CB-17015 Ephemeral Disk in Data Mart and Real-time Data Mart Data Hub Clusters… #13191

Merged
merged 1 commit into from
Aug 24, 2022

Vatsal23082000
Contributor

  • Added the Ephemeral disk storage size value to IMPALA_DATACACHE_CAPACITY_PARAM.

Testing

  • Covered changes with unit tests.

See detailed description in the commit message.
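
The idea in the description can be pictured with a minimal Java sketch. This is not the code from this PR: the class `DataCacheCapacitySketch`, the method `dataCacheCapacity`, the `GB` suffix, and the rule of leaving the parameter unset when there is no ephemeral storage are all assumptions made for illustration. Only the idea of feeding the ephemeral disk storage size into the data cache capacity parameter comes from the description.

```java
import java.util.Optional;

final class DataCacheCapacitySketch {

    // Hypothetical helper: turns the ephemeral (instance store) disk size in GB
    // into a capacity string. Returns empty when there is no ephemeral storage,
    // so the caller can leave the capacity parameter unset.
    static Optional<String> dataCacheCapacity(Integer instanceStorageSizeGb) {
        if (instanceStorageSizeGb == null || instanceStorageSizeGb <= 0) {
            return Optional.empty();
        }
        // The "GB" suffix is an assumed format, not a confirmed one.
        return Optional.of(instanceStorageSizeGb + "GB");
    }

    public static void main(String[] args) {
        System.out.println(dataCacheCapacity(300));
        System.out.println(dataCacheCapacity(0));
        System.out.println(dataCacheCapacity(null));
    }
}
```

Returning an empty `Optional` rather than a zero capacity keeps the "no ephemeral disk" case explicit for whoever builds the cluster configuration.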

@Vatsal23082000 Vatsal23082000 requested a review from a team as a code owner August 4, 2022 14:09
@Vatsal23082000 Vatsal23082000 requested a review from drorke August 4, 2022 14:19
-- Migration SQL that makes the change goes here.

ALTER TABLE template ADD COLUMN IF NOT EXISTS instancestoragesize INTEGER;
UPDATE template SET instancestoragesize = 0 WHERE instancestoragesize IS NULL;
Contributor

Do you have stats on how many records are affected?
Please avoid setting defaults for new fields (and avoid NOT NULL constraints as well). Handle missing values in the backend code instead.
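
One way to follow this suggestion, sketched in Java under assumptions: leave `instancestoragesize` nullable in the database and map `null` to `0` when the value is read. The `Template` class and the `instanceStorageSizeOrZero` helper below are hypothetical stand-ins, not the project's actual entity or method.

```java
final class NullableColumnSketch {

    // Hypothetical stand-in for the entity behind the `template` table.
    static final class Template {
        // Stays null for rows written before the column was added.
        private Integer instanceStorageSize;

        Integer getInstanceStorageSize() {
            return instanceStorageSize;
        }

        void setInstanceStorageSize(Integer instanceStorageSize) {
            this.instanceStorageSize = instanceStorageSize;
        }
    }

    // Null handling lives in backend code, so the migration needs no default.
    static int instanceStorageSizeOrZero(Template template) {
        Integer size = template.getInstanceStorageSize();
        return size != null ? size : 0;
    }

    public static void main(String[] args) {
        Template legacy = new Template();
        Template sized = new Template();
        sized.setInstanceStorageSize(300);
        System.out.println(instanceStorageSizeOrZero(legacy));
        System.out.println(instanceStorageSizeOrZero(sized));
    }
}
```

In this sketch the `ALTER TABLE ... ADD COLUMN` statement alone would be enough, and no rows need to be rewritten by an `UPDATE`.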

@@ -28,6 +28,11 @@ public Integer mapInstanceTypeToInstanceStoreCountNullHandled(String instanceTyp
return instanceStoreCount != null ? instanceStoreCount : 0;
}

public Integer mapInstanceTypeToInstanceSizeNullHandled(String instanceType) {
Integer instanceSize = instaceStoreConfigMap.getOrDefault(instanceType, VolumeParameterConfig.EMPTY).minimumSize();
Contributor

Did you test the functionality on Azure as well as AWS? AFAIK there are Azure instance types that, besides the one ephemeral volume that comes with every Azure instance, also have additional NVMe ephemeral volumes, which can differ in size from the default ephemeral volume. I think the Azure implementation of the instance store metadata collection is not prepared for this scenario and only returns information about the one default ephemeral storage.

Contributor Author

@Vatsal23082000 Vatsal23082000 Aug 5, 2022

Yes, I discussed this with @juhi-09. We decided to make the changes for AWS first and then look into the changes for the Azure part.

Contributor

So this means that on Azure clusters the Impala cache will be placed on the one generic ephemeral volume and the extra NVMe ephemeral volumes won't be used, right?
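
To make the scenario concrete, here is a small Java sketch under assumptions. The `EphemeralVolume` type, its `nvme` flag, and the example sizes are invented for illustration and do not reflect the real Azure instance store metadata collection. It compares the capacity seen when only the default ephemeral volume is reported with the capacity when the extra NVMe volumes are included.

```java
import java.util.Arrays;
import java.util.List;

final class AzureEphemeralVolumeSketch {

    // Hypothetical description of one ephemeral volume on an instance.
    static final class EphemeralVolume {
        final int sizeGb;
        final boolean nvme;

        EphemeralVolume(int sizeGb, boolean nvme) {
            this.sizeGb = sizeGb;
            this.nvme = nvme;
        }
    }

    // Capacity when only the default (non-NVMe) ephemeral volume is counted.
    static int defaultOnlyCapacityGb(List<EphemeralVolume> volumes) {
        return volumes.stream()
                .filter(volume -> !volume.nvme)
                .mapToInt(volume -> volume.sizeGb)
                .sum();
    }

    // Capacity when every ephemeral volume, NVMe included, is counted.
    static int totalCapacityGb(List<EphemeralVolume> volumes) {
        return volumes.stream()
                .mapToInt(volume -> volume.sizeGb)
                .sum();
    }

    public static void main(String[] args) {
        // Made-up sizes: one default volume plus two larger NVMe volumes.
        List<EphemeralVolume> volumes = Arrays.asList(
                new EphemeralVolume(80, false),
                new EphemeralVolume(1900, true),
                new EphemeralVolume(1900, true));
        System.out.println(defaultOnlyCapacityGb(volumes));
        System.out.println(totalCapacityGb(volumes));
    }
}
```

If the metadata collection behaves like `defaultOnlyCapacityGb`, any cache size derived from it would ignore the NVMe capacity, which is the gap this question points at.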

Contributor

@drorke Is it OK if we use the generic ephemeral volume and ignore the extra NVMe ephemeral volumes when calculating the Impala params?

Contributor

@bergerdenes bergerdenes left a comment

You will need to rebase on top of master too.

@Vatsal23082000
Contributor Author

I have made the changes for the Azure part. The images below show a Data Mart cluster on Azure where ephemeral volumes are used.
[Image: resolve 1]
[Image: resolve 2]

The images below show a Data Mart cluster on Azure where ephemeral volumes are not used and the attached volumes are used instead.

[Image: resolve3]

[Image: resolve4]

The node types which are used in DM are:

  1. master - Standard_E8_v3
  2. coordinator/executor - Standard_E16_v3

The node types which are used in RT-DM are:

  1. master - Standard_D16_v3
  2. coordinator/executor - Standard_D16_v3

… Clusters

- Added the Ephemeral disk storage size value to IMPALA_DATACACHE_CAPACITY_PARAM.

Testing
- Covered changes with unit tests.

- Made changes as per the review.
Contributor

@juhi-09 juhi-09 left a comment

LGTM

Contributor

@nadamdb nadamdb left a comment

LGTM

@lacikaaa lacikaaa merged commit 36857ad into master Aug 24, 2022
@lacikaaa lacikaaa deleted the CB-17015_2 branch August 24, 2022 07:27
5 participants