INTPYTHON-527 Add Queryable Encryption support #329

Open
wants to merge 8 commits into
base: main
from

Conversation

aclark4life
Collaborator

@aclark4life aclark4life commented Jun 27, 2025

(see previous attempts in #318, #319 and #323 for additional context)

With this PR I am able to get Django to create an encrypted collection when the schema code runs create_model on an EncryptedModel containing an EncryptedCharField, e.g. see db.enxcol_.encryption__person.ecoc below:

Enterprise a49c6bfb-b6b3-4711-bd5d-c6ecf0611a4c [direct: secondary] test> use test_djangotests
switched to db test_djangotests
Enterprise a49c6bfb-b6b3-4711-bd5d-c6ecf0611a4c [direct: secondary] test_djangotests> db.
db.__proto__                        db.constructor                      db.hasOwnProperty                   db.isPrototypeOf
db.propertyIsEnumerable             db.toLocaleString                   db.toString                         db.valueOf
db.getMongo                         db.getName                          db.getCollectionNames               db.getCollectionInfos
db.runCommand                       db.adminCommand                     db.aggregate                        db.getSiblingDB
db.getCollection                    db.dropDatabase                     db.createUser                       db.updateUser
db.changeUserPassword               db.logout                           db.dropUser                         db.dropAllUsers
db.auth                             db.grantRolesToUser                 db.revokeRolesFromUser              db.getUser
db.getUsers                         db.createCollection                 db.createEncryptedCollection        db.createView
db.createRole                       db.updateRole                       db.dropRole                         db.dropAllRoles
db.grantRolesToRole                 db.revokeRolesFromRole              db.grantPrivilegesToRole            db.revokePrivilegesFromRole
db.getRole                          db.getRoles                         db.currentOp                        db.killOp
db.shutdownServer                   db.fsyncLock                        db.fsyncUnlock                      db.version
db.serverBits                       db.isMaster                         db.hello                            db.serverBuildInfo
db.serverStatus                     db.stats                            db.hostInfo                         db.serverCmdLineOpts
db.rotateCertificates               db.printCollectionStats             db.getProfilingStatus               db.setProfilingLevel
db.setLogLevel                      db.getLogComponents                 db.commandHelp                      db.listCommands
db.printSecondaryReplicationInfo    db.getReplicationInfo               db.printReplicationInfo             db.watch
db.sql                              db.auth_group_permissions           db.django_session                   db.auth_user
db.enxcol_.encryption__person.ecoc  db.auth_group                       db.django_site                      db.django_migrations
db.django_content_type              db.auth_user_groups                 db.enxcol_.encryption__person.esc   db.auth_permission
db.auth_user_user_permissions       db.django_admin_log

Questions

  • To manage both encrypted and unencrypted connections, should we keep the _nodb_cursor functionality in this PR, do something in init_connection_state as @timgraham suggests, or do something else?
  • As @ShaneHarvey suggests, ask the encryption folks about "command not supported for auto encryption: buildinfo", which happens when Django attempts to get the server version via an encrypted connection and is the reason we need to manage both encrypted and unencrypted connections. Are most commands supported for auto encryption or not?
  • What does EncryptedModel support for EmbeddedModel look like? What are the specific use cases for integrating EncryptedModel and EmbeddedModel? Should we be able to mix in EncryptedModel and EmbeddedModel, then include that model in an EmbeddedModelField?

Todo

  • Helpers need a home
  • Add additional encrypted fields
    • EncryptedCharField
  • Migrations
  • Querying
  • Docs
    • Limitations
    • Mention pymongocrypt wheel includes crypt_shared library!
  • More tests
  • More KMS support ("local" only in this PR)

Helpers

The included helpers are also used by the test runner, e.g.:

import os

from django_mongodb_backend import encryption, parse_uri

kms_providers = encryption.get_kms_providers()
auto_encryption_opts = encryption.get_auto_encryption_opts(
    kms_providers=kms_providers,
)

DATABASE_URL = os.environ.get("MONGODB_URI", "mongodb://localhost:27017/djangotests")
DATABASES = {
    "default": parse_uri(
        DATABASE_URL, options={"auto_encryption_opts": auto_encryption_opts}
    ),
}

DEFAULT_AUTO_FIELD = "django_mongodb_backend.fields.ObjectIdAutoField"
PASSWORD_HASHERS = ("django.contrib.auth.hashers.MD5PasswordHasher",)
SECRET_KEY = "django_tests_secret_key"
USE_TZ = False


# Build a map of encrypted fields
encrypted_fields = {
    "fields": {
Collaborator Author

Add query conditions
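
For reference, here is a minimal sketch of what an encrypted fields map with query conditions could look like. The helper `build_encrypted_fields` and the field names (`ssn`, `name`) are hypothetical and not part of this PR; the overall shape (a `fields` list whose entries carry `path`, `bsonType`, and an optional `queries` entry) follows the `encryptedFields` document format as described in the MongoDB Queryable Encryption docs, so treat it as an assumption about where this code is heading.

```python
def build_encrypted_fields(field_specs):
    """Build an encryptedFields-style map from (path, bson_type, query_type) tuples.

    query_type may be None for fields that are encrypted but not queryable.
    """
    fields = []
    for path, bson_type, query_type in field_specs:
        entry = {"path": path, "bsonType": bson_type}
        if query_type is not None:
            # Query conditions declare which query types the field supports.
            entry["queries"] = {"queryType": query_type}
        fields.append(entry)
    return {"fields": fields}


encrypted_fields = build_encrypted_fields(
    [
        ("ssn", "string", "equality"),  # hypothetical queryable field
        ("name", "string", None),  # hypothetical non-queryable field
    ]
)
```

Only fields that declare `queries` would be usable in filters; the rest are encrypted at rest but not queryable.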

return ClientEncryption(kms_providers, key_vault_namespace, encrypted_connection, codec_options)


def get_auto_encryption_opts(crypt_shared_lib_path=None, kms_providers=None):
Collaborator Author

@aclark4life aclark4life Jun 27, 2025

crypt_shared library is in the pymongocrypt wheel, which is much easier than downloading separately and telling MongoClient where it is.

Collaborator Author

@aclark4life aclark4life Jun 28, 2025

More to this story:

  • libmongocrypt is in the pymongocrypt wheel, not crypt_shared which must always be downloaded and configured manually.
  • libmongocrypt works because mongocryptd is running on enterprise.

We should document this.

(via @ShaneHarvey, thanks!)

Comment on lines 434 to 435
self.connection.features.supports_encryption
and self.connection._settings_dict.get("OPTIONS", {}).get("auto_encryption_opts")
Collaborator

I don't think we want encrypted models to silently fall back to working as unencrypted models.

Collaborator Author

No, we don't, but I'm not sure why you are making that comment here … as of 65bd15a I'm creating two connections and using the encrypted_connection only when needed. Is there a fallback scenario I'm missing? It seems that with two connections we're going to have to check every use of self.connection to make sure we're using the right one.

Collaborator

In _create_collection() you're guarding the creation of an encrypted model based on this method, so if features.supports_encryption = False but the model has encrypted fields, it's going to incorrectly use create_collection() instead.
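
To illustrate the concern, here is a minimal sketch of a guard that refuses to downgrade silently. Everything in it is hypothetical (the function names, the stand-in exception, and the string return values are not from this PR); it only shows the idea of checking for encrypted fields first and raising when the connection cannot honor them.

```python
class ImproperlyConfigured(Exception):
    """Stand-in for django.core.exceptions.ImproperlyConfigured."""


class PlainField:
    """Hypothetical unencrypted field."""


class SecretField:
    """Hypothetical encrypted field."""

    encrypted = True


def has_encrypted_fields(field_list):
    # A field counts as encrypted if it carries a truthy `encrypted` attribute.
    return any(getattr(f, "encrypted", False) for f in field_list)


def choose_creation_path(field_list, supports_encryption, auto_encryption_opts):
    """Return the creation path to use, refusing to fall back silently."""
    if not has_encrypted_fields(field_list):
        return "create_collection"
    if not (supports_encryption and auto_encryption_opts):
        raise ImproperlyConfigured(
            "Model has encrypted fields but the connection is not "
            "configured for encryption."
        )
    return "create_encrypted_collection"
```

With this shape, a model with encrypted fields on an unconfigured connection fails loudly instead of ending up in a plain collection.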

class EncryptedCharField(models.CharField):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.encrypted = True
Collaborator

I'd think this could be a class-level variable.
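
A minimal sketch of the class-level variant, using a stand-in `CharField` so the snippet runs without Django (the real field would subclass `models.CharField` as in the diff above):

```python
class CharField:
    """Minimal stand-in for django.db.models.CharField."""

    def __init__(self, max_length=None):
        self.max_length = max_length


class EncryptedCharField(CharField):
    # Class-level flag: shared by all instances, so no __init__ override is needed.
    encrypted = True


field = EncryptedCharField(max_length=50)
```

The flag is then readable both on the class and on instances, without each instance storing its own copy.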

# Use the encrypted connection and auto_encryption_opts to create an encrypted client
encrypted_client = get_encrypted_client(auto_encryption_opts, encrypted_connection)

with contextlib.suppress(EncryptedCollectionError):
Collaborator

Need a comment about why the error should be suppressed.

Collaborator Author

Collaborator

There shouldn't be a case where we're trying to create a collection that already exists. It would be correct to surface that error to the user because their migrations are out of sync with their database.
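
A small sketch of the difference between the two behaviors. The exception class here is a local stand-in for `pymongo.errors.EncryptedCollectionError`, and the creator function and the `existing` set are hypothetical, used only to simulate a collection that is already present.

```python
import contextlib


class EncryptedCollectionError(Exception):
    """Stand-in for pymongo.errors.EncryptedCollectionError."""


def create_encrypted_collection(existing, name):
    # Hypothetical creator: fails if the collection is already present.
    if name in existing:
        raise EncryptedCollectionError(f"collection {name!r} already exists")
    existing.add(name)


def create_suppressed(existing, name):
    # The error is swallowed, so an out-of-sync migration goes unnoticed.
    with contextlib.suppress(EncryptedCollectionError):
        create_encrypted_collection(existing, name)


def create_strict(existing, name):
    # The error propagates, telling the user their migrations are out of sync.
    create_encrypted_collection(existing, name)
```

Calling `create_suppressed` on an existing collection returns quietly, while `create_strict` raises.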

@aclark4life
Collaborator Author

Wrong commit message for 65bd15a and I don't want to force push yet. It should have said:

"Only create an encrypted connection once then reuse it."

I'm aware that _nodb_cursor is slated for removal but in the meantime I can keep going with other fixes with this approach, and it does satisfy the design we all agree on (I think) of maintaining two simultaneous connections:

  • An unencrypted connection, used unless encryption is needed
  • An encrypted connection, used only when it is needed

@timgraham
Collaborator

timgraham commented Jun 27, 2025

> I'm aware that _nodb_cursor is slated for removal but in the meantime I can keep going with other fixes with this approach, and it does satisfy the design we all agree on (I think) of maintaining two simultaneous connections:

It's not working as you think it is. As I said elsewhere, _nodb_cursor is not used by this backend.

Does this fix the "command not supported for auto encryption: buildinfo" error? If so, it's perhaps because self.settings_dict["OPTIONS"].pop("auto_encryption_opts") is having the side effect of altering settings_dict before DatabaseWrapper.connection is initialized.

I'd suggest using my patch as a starting point for maintaining two connections. self.connection should be the encrypted version (secure by default) with a fallback to a non-encrypted connection only as needed (e.g. for commands like buildInfo). At least it will help us understand whether that's a viable approach. As I mentioned in the design doc, I'm not sure if using an encrypted connection for non-encrypted collections is problematic. If so, we'll have to go back to the drawing board on the design.
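
A rough sketch of the "encrypted by default, plain only as a fallback" idea. The classes `FakeClient` and `DualConnection` and the set of unsupported commands are hypothetical stand-ins, not this backend's API or the patch referred to above; the only command assumed to be rejected is the one reported in this thread.

```python
# Commands assumed (from this thread) to be rejected under auto encryption.
UNSUPPORTED_WITH_AUTO_ENCRYPTION = {"buildInfo"}


class FakeClient:
    """Stand-in for a MongoClient; records the commands it ran."""

    def __init__(self, encrypted):
        self.encrypted = encrypted
        self.commands = []

    def command(self, name):
        if self.encrypted and name in UNSUPPORTED_WITH_AUTO_ENCRYPTION:
            raise RuntimeError(f"command not supported for auto encryption: {name}")
        self.commands.append(name)
        return {"ok": 1.0}


class DualConnection:
    """Hypothetical wrapper: encrypted by default, plain only as a fallback."""

    def __init__(self, encrypted_client, plain_client):
        self.encrypted_client = encrypted_client
        self.plain_client = plain_client

    def command(self, name):
        if name in UNSUPPORTED_WITH_AUTO_ENCRYPTION:
            return self.plain_client.command(name)
        return self.encrypted_client.command(name)


conn = DualConnection(FakeClient(encrypted=True), FakeClient(encrypted=False))
conn.command("buildInfo")  # routed to the plain client
conn.command("ping")  # routed to the encrypted client
```

Routing by an explicit allowlist keeps the unencrypted path narrow, which matches the secure-by-default goal.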

@aclark4life
Collaborator Author

> It's not working as you think it is. As I said elsewhere, _nodb_cursor is not used by this backend.

I don't disagree, but it feels a lot like _start_transaction_under_autocommit, which gets called by start_transaction_under_autocommit because autocommit is False. Django appears to stumble into _nodb_cursor when the encrypted connection fails to get the database version, and while we don't use a cursor in this backend, we do have a "nosql" cursor that has __enter__ and __exit__ (I assume) to meet Django's expectations, so we get an opportunity to modify the connection. @Jibola mentioned yesterday that this design is suspect, and I agree with both of you, particularly with regard to the desire to start with and maintain an encrypted connection first.

> Does this fix the "command not supported for auto encryption: buildinfo" error? If so, it's perhaps because self.settings_dict["OPTIONS"].pop("auto_encryption_opts") is having the side effect of altering settings_dict before DatabaseWrapper.connection is initialized.

Yes, it works by design, not as a side effect. I'm deep-copying settings_dict when DatabaseWrapper is initialized, so when DatabaseWrapper.connection is initialized it's unencrypted. When the schema needs encryption later, it's retrieved from _settings_dict.

> I'd suggest using my patch as a starting point for maintaining two connections. self.connection should be the encrypted version (secure by default) with a fallback to a non-encrypted connection only as needed (e.g. for commands like buildInfo). At least it will help us understand whether that's a viable approach. As I mentioned in the design doc, I'm not sure if using an encrypted connection for non-encrypted collections is problematic. If so, we'll have to go back to the drawing board on the design.

I'd made a few passes at it but did not get anywhere. I'll try again, though.
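
The pop-versus-deep-copy distinction discussed above can be shown with plain dictionaries. This is only an illustration of the Python semantics; the settings values are placeholders and `make_settings` is a hypothetical helper, not code from the PR.

```python
import copy


def make_settings():
    # Placeholder settings; the real dict comes from Django's DATABASES.
    return {
        "NAME": "djangotests",
        "OPTIONS": {"auto_encryption_opts": {"kms_providers": {}}},
    }


# 1. Popping in place mutates the shared settings dict.
shared = make_settings()
shared["OPTIONS"].pop("auto_encryption_opts")
shared_still_has_opts = "auto_encryption_opts" in shared["OPTIONS"]

# 2. A shallow copy is not enough: the nested OPTIONS dict is still shared.
original = make_settings()
shallow = dict(original)
shallow["OPTIONS"].pop("auto_encryption_opts")
shallow_leaked = "auto_encryption_opts" not in original["OPTIONS"]

# 3. A deep copy keeps the original intact for later encrypted use.
preserved = make_settings()
working = copy.deepcopy(preserved)
working["OPTIONS"].pop("auto_encryption_opts")
preserved_has_opts = "auto_encryption_opts" in preserved["OPTIONS"]
```

Only the third variant leaves a copy of the options available for building an encrypted connection later.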

@timgraham
Collaborator

Your "stumble" theory of how it's working isn't correct. _nodb_cursor is only used in one place: to create the test database. As I said, I could imagine that perhaps this method causes connection to later be initialized without auto_encryption_opts because of self.settings_dict["OPTIONS"].pop("auto_encryption_opts"). The connection that's created in your _nodb_cursor is never used.

@aclark4life
Collaborator Author

aclark4life commented Jun 28, 2025

> Your "stumble" theory of how it's working isn't correct. _nodb_cursor is only used in one place: to create the test database. As I said, I could imagine that perhaps this method causes connection to later be initialized without auto_encryption_opts because of self.settings_dict["OPTIONS"].pop("auto_encryption_opts"). The connection that's created in your _nodb_cursor is never used.

Copy that, thanks!

I've removed _nodb_cursor in 8e83ada and discovered the version check is the only time that error occurs. I now get errors like:

Traceback (most recent call last):
  File "/Users/alexclark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/encryption.py", line 124, in _wrap_encryption_errors
    yield
  File "/Users/alexclark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/encryption.py", line 466, in encrypt
    encrypted_cmd = self._auto_encrypter.encrypt(database, encoded_cmd)
  File "/Users/alexclark/Developer/django-mongodb-cli/.venv/lib/python3.13/site-packages/pymongocrypt/synchronous/auto_encrypter.py", line 44, in encrypt
    return run_state_machine(ctx, self.callback)
  File "/Users/alexclark/Developer/django-mongodb-cli/.venv/lib/python3.13/site-packages/pymongocrypt/synchronous/state_machine.py", line 136, in run_state_machine
    result = callback.mark_command(ctx.database, mongocryptd_cmd)
  File "/Users/alexclark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/encryption.py", line 286, in mark_command
    res = self.mongocryptd_client[database].command(
        inflated_cmd, codec_options=DEFAULT_RAW_BSON_OPTIONS
    )
  File "/Users/alexclark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/_csot.py", line 125, in csot_wrapper
    return func(self, *args, **kwargs)
  File "/Users/alexclark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/database.py", line 930, in command
    return self._command(
           ~~~~~~~~~~~~~^
        connection,
        ^^^^^^^^^^^
    ...<7 lines>...
        **kwargs,
        ^^^^^^^^^
    )
    ^
  File "/Users/alexclark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/database.py", line 770, in _command
    return conn.command(
           ~~~~~~~~~~~~^
        self._name,
        ^^^^^^^^^^^
    ...<8 lines>...
        client=self._client,
        ^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/Users/alexclark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/helpers.py", line 47, in inner
    return func(*args, **kwargs)
  File "/Users/alexclark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/pool.py", line 414, in command
    return command(
        self,
    ...<20 lines>...
        write_concern=write_concern,
    )
  File "/Users/alexclark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/network.py", line 212, in command
    helpers_shared._check_command_response(
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        response_doc,
        ^^^^^^^^^^^^^
    ...<2 lines>...
        parse_write_concern_error=parse_write_concern_error,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/Users/alexclark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/helpers_shared.py", line 250, in _check_command_response
    raise OperationFailure(errmsg, code, response, max_wire_version)
pymongo.errors.OperationFailure: Non-empty 'let' field is not allowed in the $lookup aggregation stage over an encrypted collection., full error: RawBSONDocument(b"\xa7\x00\x00\x00\x01ok\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02errmsg\x00d\x00\x00\x00Non-empty 'let' field is not allowed in the $lookup aggregation stage over an encrypted collection.\x00\x10code\x00\x08\xc8\x00\x00\x02codeName\x00\x0e\x00\x00\x00Location51208\x00\x00", codec_options=CodecOptions(document_class=<class 'bson.raw_bson.RawBSONDocument'>, tz_aware=False, uuid_representation=UuidRepresentation.UNSPECIFIED, unicode_decode_error_handler='strict', tzinfo=None, type_registry=TypeRegistry(type_codecs=[], fallback_encoder=None), datetime_conversion=DatetimeConversion.DATETIME))

Still working on an unencrypted connection, but perhaps the only time we need it is for the version check.
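
For context on the error above, here is a sketch of the two `$lookup` shapes involved, plus a small hypothetical helper (`lookups_with_nonempty_let`, not part of this PR or of PyMongo) that flags stages carrying a non-empty `let`. The collection and field names are made up, and whether the simpler `localField`/`foreignField` form is accepted over an encrypted collection has not been verified here.

```python
def lookups_with_nonempty_let(pipeline):
    """Return the $lookup stages in `pipeline` that carry a non-empty `let`."""
    return [
        stage
        for stage in pipeline
        if "$lookup" in stage and stage["$lookup"].get("let")
    ]


# Correlated form: uses `let` plus a sub-pipeline (the shape the server rejected).
correlated = [
    {
        "$lookup": {
            "from": "book",
            "let": {"parent_id": "$_id"},
            "pipeline": [
                {"$match": {"$expr": {"$eq": ["$author_id", "$$parent_id"]}}}
            ],
            "as": "books",
        }
    }
]

# Simple equality form: no `let` at all.
simple = [
    {
        "$lookup": {
            "from": "book",
            "localField": "_id",
            "foreignField": "author_id",
            "as": "books",
        }
    }
]
```

A check like this could be used in tests to find which generated pipelines would trip the server-side restriction.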

2 participants