Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-43057: [C++] Thread-safe AesEncryptor / AesDecryptor #44990

Open
wants to merge 36 commits into
base: main
Choose a base branch
from

Conversation

EnricoMi
Copy link
Contributor

@EnricoMi EnricoMi commented Dec 10, 2024

Rationale for this change

OpenSSL encryption / decryption is wrapped by AesEncryptor / AesDencryptor, which is used by multiple threads of a single scanner or by multiple concurrent scanners when scanning a dataset. Some thread may call WipeOut while other threads still use the instance.

What changes are included in this PR?

  • Remove the WipeOut methods and related datastructures entirely.
  • Each call into CtrEncrypt / CtrDecrypt and GcmEncrypt / GcmDecrypt uses its own EVP_CIPHER_CTX instance, making this thread-safe.

After fixing this "AesDecryptor was wiped out" issue, two other segmentation faults surfaced: GH-44988. This has also been addressed here as it can only be exposed after fixing the wipe-out issue.

Fixes GH-43057.
Fixes GH-44852.
Fixes GH-44988.

Are these changes tested?

A unit test that scans a dataset concurrently reproduced the initial issue in 30% of the test runs.

Are there any user-facing changes?

No.

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Dec 11, 2024
@adamreeve
Copy link
Contributor

The test failures look like they might be caused by openssl/openssl#21955

The failing ASAN test run is using OpenSSL 3.02, but builds with other versions are passing. Eg. the Ubuntu 20.04 build uses OpenSSL 1.1.1f and the Conda build has OpenSSL 3.3.2.

The fix for this was backported to the 3.0 branch and released in 3.0.13 (openssl/openssl#23102), but Ubuntu 22.04 still uses 3.0.2. If there's not an easy way around this, maybe we need a slower code path for 3.0.2 that creates a new context each time?

@pitrou
Copy link
Member

pitrou commented Dec 11, 2024

If there's not an easy way around this, maybe we need a slower code path for 3.0.2 that creates a new context each time?

Let's use the "slower" code path everywhere? I'm skeptical it will be that slow...

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for stepping in and submitting this @EnricoMi . You'll find a number of comments below.

@EnricoMi EnricoMi force-pushed the thread-safe-endecryptor branch from a2e3992 to 108c8cf Compare December 11, 2024 19:49
@EnricoMi
Copy link
Contributor Author

I have addressed all comments except for the test related ones. Will go over them tomorrow.

Thanks for your input, that is highly appreciated!

@adamreeve
Copy link
Contributor

I tested concurrent scans of a larger dataset with uniform encryption, and this change doesn't completely fix that scenario:

--- a/cpp/src/arrow/dataset/file_parquet_encryption_test.cc
+++ b/cpp/src/arrow/dataset/file_parquet_encryption_test.cc
@@ -289,6 +289,8 @@ TEST_F(DatasetEncryptionTest, ReadSingleFile) {
 // processing encrypted datasets over 2^15 rows in multi-threaded mode.
 class LargeRowEncryptionTest : public DatasetEncryptionTestBase {
  public:
+  LargeRowEncryptionTest() : DatasetEncryptionTestBase(true) {}
+
   // The dataset is partitioned using a Hive partitioning scheme.
   void PrepareTableAndPartitioning() override {
     // Specifically chosen to be greater than batch size for triggering prefetch.
@@ -307,7 +309,7 @@ class LargeRowEncryptionTest : public DatasetEncryptionTestBase {
 
 // Test for writing and reading encrypted dataset with large row count.
 TEST_F(LargeRowEncryptionTest, ReadEncryptLargeRows) {
-  ASSERT_NO_FATAL_FAILURE(TestScanDataset());
+  ASSERT_NO_FATAL_FAILURE(TestScanDataset(true));
 }
 
 }  // namespace dataset

This gives errors like:

failed with IOError: Failed decryption finalization

I believe this is due to the decryptor AAD being mutable. It's updated for each data page read, so concurrent use will result in incorrect AADs being used in some threads.

Rather than mutating the decryptor state, it might be possible to refactor this so that the AAD values are passed in to the decrypt method as a parameter.

I think this should probably be addressed as a separate PR though as the changes should be orthogonal.

@EnricoMi
Copy link
Contributor Author

EnricoMi commented Dec 12, 2024

failed with IOError: Failed decryption finalization

Confirmed in ec9b054. Multi-threaded uniform encryption read fails.

@EnricoMi EnricoMi force-pushed the thread-safe-endecryptor branch 2 times, most recently from 33ff64b to 6508c5a Compare December 12, 2024 09:49
@EnricoMi
Copy link
Contributor Author

failed with IOError: Failed decryption finalization

Confirmed in ec9b054. Multi-threaded uniform encryption read fails.

Attempt to fix multi-threaded AAD update: d77a081

The PageReader uses its own copy of the data_decryptor / data_decryptor. Happy to move into separate PR.

@mapleFU
Copy link
Member

mapleFU commented Jan 21, 2025

Any updates?

@EnricoMi EnricoMi force-pushed the thread-safe-endecryptor branch from 4fc5b42 to b652e06 Compare January 21, 2025 08:55
@EnricoMi EnricoMi force-pushed the thread-safe-endecryptor branch from b652e06 to fbe3a1c Compare January 29, 2025 18:25
Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@EnricoMi Sorry for the delay and thanks a lot for persevering on this. I think the fix is still not 100% correct, see comments below. But we're nearing a solution.

Also, I can help on this PR if you want.

@EnricoMi EnricoMi force-pushed the thread-safe-endecryptor branch from 85c55ba to 4d388fa Compare February 6, 2025 17:17
@pitrou

This comment was marked as outdated.

This comment was marked as outdated.

EnricoMi and others added 29 commits February 25, 2025 17:30
@pitrou pitrou force-pushed the thread-safe-endecryptor branch from 4249ce4 to 7019158 Compare February 25, 2025 16:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants