[PLUGIN-1717]: Added implementation for openFile and openFileWithOptions #1543

Open: wants to merge 1 commit into base: develop

@prince-cs prince-cs commented May 5, 2025

Fix encrypted file reader for DP 2.1 and above

Bug Tracker

PLUGIN-1717

Description

Currently, reading an encrypted file on Dataproc 2.1 and 2.2 produces garbage binary data, because the newer Hadoop version on those images opens files through different methods.

This PR overrides the new file-open methods (openFile and openFileWithOptions) and extends the existing decryption logic to cover them.
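
To illustrate the failure mode described above, here is a minimal, self-contained sketch. It is a toy model, not the code in this PR and not the Hadoop API: the classes `RawFs`, `PartialDecryptingFs`, and `FullDecryptingFs` are hypothetical, and an XOR with a fixed key stands in for the real decryption. It only shows why a decrypting wrapper returns raw bytes when callers use an open path the wrapper does not override.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CompletableFuture;

/** Toy model (not Hadoop): a decrypting wrapper must cover every open path. */
public class OpenPathSketch {

  /** Hypothetical stand-in for a file system with an older and a newer open path. */
  static class RawFs {
    private final byte[] stored;

    RawFs(byte[] stored) {
      this.stored = stored;
    }

    // Older entry point: synchronous open.
    byte[] open() {
      return stored.clone();
    }

    // Newer entry point: asynchronous open, assumed to be what newer readers call.
    CompletableFuture<byte[]> openFileWithOptions() {
      return CompletableFuture.completedFuture(stored.clone());
    }
  }

  /** XOR with a fixed key; a placeholder for real decryption (applying it twice is a no-op). */
  static byte[] xor(byte[] data, byte key) {
    byte[] out = new byte[data.length];
    for (int i = 0; i < data.length; i++) {
      out[i] = (byte) (data[i] ^ key);
    }
    return out;
  }

  /** Wrapper that decrypts only on the older path. */
  static class PartialDecryptingFs extends RawFs {
    final byte key;

    PartialDecryptingFs(byte[] stored, byte key) {
      super(stored);
      this.key = key;
    }

    @Override
    byte[] open() {
      return xor(super.open(), key);
    }
  }

  /** Wrapper that also decrypts on the newer path. */
  static class FullDecryptingFs extends PartialDecryptingFs {
    FullDecryptingFs(byte[] stored, byte key) {
      super(stored, key);
    }

    @Override
    CompletableFuture<byte[]> openFileWithOptions() {
      // Apply the same decryption step once the raw bytes are available.
      return super.openFileWithOptions().thenApply(bytes -> xor(bytes, key));
    }
  }

  static String readViaOldPath(RawFs fs) {
    return new String(fs.open(), StandardCharsets.UTF_8);
  }

  static String readViaNewPath(RawFs fs) {
    return new String(fs.openFileWithOptions().join(), StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    byte key = 0x5A;
    byte[] encrypted = xor("id,name".getBytes(StandardCharsets.UTF_8), key);

    // Old path is decrypted by both wrappers.
    System.out.println(readViaOldPath(new PartialDecryptingFs(encrypted, key)));
    // New path bypasses the partial wrapper, so the caller sees ciphertext.
    System.out.println(readViaNewPath(new PartialDecryptingFs(encrypted, key)));
    // New path is decrypted once the wrapper overrides it.
    System.out.println(readViaNewPath(new FullDecryptingFs(encrypted, key)));
  }
}
```

In this sketch the first and third reads return the original text, while the second returns the still-encrypted bytes, which mirrors the garbage output reported on the newer images.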

Code change

  • Modified EncryptedFileSystem.java
    • Added the new file-open methods.
  • Modified pom.xml
    • Bumped the client library version to pick up the new open methods.

Tests

Meta

  • Test case uses 2 CSV files
    • 100 records
    • 100K records (~10 MB)
  • Each pipeline is run 4 times, once per cluster type:
    • Ephemeral Cluster
    • Existing Cluster (Dataproc Image 2.0)
    • Existing Cluster (Dataproc Image 2.1)
    • Existing Cluster (Dataproc Image 2.2)

Test Case [100 - Ephemeral]

Screenshot 2025-06-03 at 2 06 20 AM

Test Case [100K - Ephemeral]

Screenshot 2025-06-03 at 2 02 49 AM

Test Case [100 - DP2.0]

Screenshot 2025-06-03 at 2 17 10 AM

Test Case [100K - DP2.0]

Screenshot 2025-06-03 at 1 49 21 AM

Test Case [100 - DP2.1]

Screenshot 2025-06-03 at 2 10 43 AM

Test Case [100K - DP2.1]

Screenshot 2025-06-03 at 1 57 12 AM

Test Case [100 - DP2.2]

Test Case [100K - DP2.2]

@prince-cs prince-cs added the build Trigger unit test build label May 6, 2025
@psainics psainics removed the build Trigger unit test build label May 19, 2025
@psainics psainics self-assigned this May 19, 2025
@psainics psainics added the build Trigger unit test build label Jun 1, 2025
pom.xml Outdated
@@ -89,7 +89,7 @@
 <google.tink.version>1.3.0-rc3</google.tink.version>
 <guava.version>27.0.1-jre</guava.version>
 <hadoop.version>3.3.6</hadoop.version>
-<hbase-shaded-client.version>1.4.13</hbase-shaded-client.version>
+<hbase-shaded-client.version>2.6.2-hadoop3</hbase-shaded-client.version>
 <hbase-shaded-server.version>1.4.13</hbase-shaded-server.version>
Member

Why not also update the server version?

Member

@itsankit-google itsankit-google left a comment

Also add e2e tests for this change.

@itsankit-google
Member

Bigtable e2e tests are failing currently:
image
