[SPARK] Set Auto Bucket Partitioner to be the default partitioning strategy #252

Merged
merged 6 commits into from
May 12, 2025

@rachel-mack rachel-mack commented May 8, 2025

Pull Request Info

PR Reviewing Guidelines

JIRA - https://jira.mongodb.org/browse/DOCSP-49645

Staging Links

  • batch-mode/batch-read-config
  • release-notes

Self-Review Checklist

    • Is this free of any warnings or errors in the RST?
    • Did you run a spell-check?
    • Did you run a grammar-check?
    • Are all the links working?
    • Are the facets and meta keywords accurate?
    • Are the page titles greater than 20 characters long and SEO relevant?

    netlify bot commented May 8, 2025

    Deploy Preview for docs-spark-connector ready!

    Name Link
    🔨 Latest commit fef704a
    🔍 Latest deploy log https://app.netlify.com/sites/docs-spark-connector/deploys/681d005d77d56b0008b53e5d
    😎 Deploy Preview https://deploy-preview-252--docs-spark-connector.netlify.app

    @rachel-mack rachel-mack marked this pull request as ready for review May 8, 2025 14:40

    @stephmarie17 stephmarie17 left a comment

    LGTM with a small fix.

    Comment on lines +198 to +207
    The ``AutoBucketPartitioner`` is the default partitioner configuration. It
    samples the data to generate partitions and uses
    the :manual:`$bucketAuto </reference/operator/aggregation/bucketAuto/>`
    aggregation stage to paginate. By using this configuration, you can partition
    the data across single or multiple fields, including nested fields.

    .. note:: Compound Keys

       The ``AutoBucketPartitioner`` configuration requires {+mdb-server+} version
       7.0 or higher to support compound keys.
    Contributor Author

    I moved the entire AutoBucketPartitioner section to the top because it's the default, so the whole section shows up as new. This highlighted section and the SamplePartitioner section below are the only content I've changed.

    Comment on lines +252 to +256
    The ``SamplePartitioner`` configuration is similar to the
    :ref:`AutoBucketPartitioner <conf-autobucketpartitioner>` configuration, but
    does not use the ``$bucketAuto`` aggregation stage. This
    configuration lets you specify a partition field, partition size, and number of
    samples per partition.
    Contributor Author

    Also modified, see note above.
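
For readers following the two partitioner descriptions quoted above, here is a minimal sketch of how each might be selected through read options. It is illustrative only and not part of this PR's diff. The helper names `auto_bucket_options` and `sample_options` are hypothetical, and the option keys and the partitioner package path are assumptions drawn from the connector's batch read configuration page, so verify them against the connector version you use.

```python
# Hypothetical helpers that build read-option dictionaries for the two
# partitioners discussed in this PR. The option keys and the package path
# are assumptions; check them against your connector version.
PARTITIONER_PACKAGE = "com.mongodb.spark.sql.connector.read.partitioner"


def auto_bucket_options(field_list=("_id",), chunk_size_mb=64):
    """Options selecting AutoBucketPartitioner over one or more fields."""
    return {
        "partitioner": f"{PARTITIONER_PACKAGE}.AutoBucketPartitioner",
        # Comma-separated field names; nested fields use dot notation.
        "partitioner.options.partition.fieldList": ",".join(field_list),
        "partitioner.options.partition.chunkSize": str(chunk_size_mb),
    }


def sample_options(field="_id", size_mb=64, samples_per_partition=10):
    """Options selecting SamplePartitioner on a single partition field."""
    return {
        "partitioner": f"{PARTITIONER_PACKAGE}.SamplePartitioner",
        "partitioner.options.partition.field": field,
        "partitioner.options.partition.size": str(size_mb),
        "partitioner.options.samples.per.partition": str(samples_per_partition),
    }


# Possible usage with an existing SparkSession named `spark`, with the
# connection URI, database, and collection configured elsewhere:
#   df = (spark.read.format("mongodb")
#         .options(**auto_bucket_options(("region", "address.zip")))
#         .load())
```

Because `AutoBucketPartitioner` is the default after this change, the explicit `partitioner` entry in the first helper would only matter when overriding a different setting.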

    Member

    @rozza rozza left a comment

    LGTM!

    @rachel-mack rachel-mack merged commit c879f28 into mongodb:master May 12, 2025
    6 checks passed
    @rachel-mack rachel-mack deleted the DOCSP-49645 branch May 12, 2025 12:04