[rule enhancement] partition rules: add a tie-breaker — for read-heavy workloads, query-pattern alignment beats raw cardinality (a bare /id is not automatically the best key) #201

Description

@jaydestro

Type: enhancement to existing rules (partition-high-cardinality, partition-query-patterns) — conflicting guidance with no precedence
Category: Partitioning (partition-)
Severity: Medium (leads to a defensible-but-suboptimal key; turns the dominant query into a cross-partition scan; higher RU/latency at scale)
Affected: data modeling for any SQL→NoSQL migration or new container design, all SDKs
Doc reference: Partitioning overview · Choose a partition key


Summary

Two CRITICAL rules in the kit point in different directions for a common case, and there is
no guidance on which wins:

  • partition-high-cardinality — "Select partition keys with many unique values." Its
    ✅ GOOD examples are CustomerId, TenantId, DeviceId — i.e. a bare per-entity id.
    Read literally, a document's own /id is the maximum-cardinality choice and looks ideal.
  • partition-query-patterns — "Choose a partition key that supports your most frequent
    queries." Its anti-pattern is a product partitioned by one field while most queries filter
    by another.

For a read-heavy, rarely-written dataset that is almost always filtered by a field
(a product catalog filtered by category/brand; an orders-by-customer read model; a
content library filtered by type), these two rules disagree:

  • High-cardinality says: use the unique id (perfect distribution).
  • Query-patterns says: use the field you filter on (single-partition reads).

An agent following the high-cardinality rule will choose /id, partition every document into
its own logical partition, and turn every WHERE category = @c / WHERE brand = @b query
into a fan-out cross-partition scan — the exact thing query-patterns warns against. The
choice "looks" correct and even cites the right principle ("even distribution, efficient
single-item reads"), so it passes review.

The missing piece is a precedence/tie-breaker: when write volume is low and reads are
dominated by a filter, query alignment should win over raw cardinality. A bare /id is
the right key primarily when the dominant access pattern is a point read by that id, or
when write throughput is so high that write distribution is the binding constraint.

Scope (important — /id is not always wrong)

The Microsoft docs are explicit that /id is a great partition key for two cases, and
this issue is not asking to contradict that:

"For small read-heavy containers or write-heavy containers of any size, the item ID
(/id) is naturally a great choice for the partition key."
Partitioning and horizontal scaling › Use item ID as the partition key

The same page adds the caveat that pins down exactly where /id stops being a good fit:

"If you have a read-heavy container with many physical partitions, queries are more
efficient if they have an equality filter with the item ID."

So the gap is narrow and specific: a read-heavy container that grows past one physical
partition and is filtered by a non-id field. There, /id turns the dominant query into a
cross-partition fan-out, while the kit's two CRITICAL rules still give no rule for which one
wins. The ask is a tie-breaker for that case, not a blanket "avoid /id."

Benchmark evidence — reproduced end-to-end with the kit loaded

This is a real agent run (not a synthetic test): the kit was loaded and read, the agent
understood the access pattern, and still chose /id.

The eShop Catalog SQL→Cosmos migration task was run with claude-opus-4.7 and the
cosmosdb-best-practices kit installed (compiled AGENTS.md baked into the working dir;
load verified — hook install lines present, AGENTS.md pulled into the session 8×, Azure MCP
connected). The run passed 13 of 14 independent checks; the only failure was
partition_key_grouping.

The agent's own header comment in the generated Program.cs (verbatim) shows it had already
worked out the dominant access pattern — it indexed exactly the filter fields — and still
partitioned by /id:

//   * Container "items":   one document per product, partition key /id.
// Indexing on "items":
//   * Include /name, /catalogTypeId, /catalogBrandId (the filter paths)
var itemsContainerProps = new ContainerProperties("items", "/id");   // the graded miss

So with the kit present, the agent recognized that reads filter by type/brand (it built a
composite index and included /catalogTypeId + /catalogBrandId as "the filter paths"), yet
chose the per-item /id partition key — turning the dominant filtered query into a
cross-partition fan-out. The decision "looks" principled and cites high cardinality / even
distribution, exactly as predicted above.

Whole-kit rule audit (why a faithful agent lands on /id)

Auditing the installed kit (119 rules) for the catalog partition decision: four rules point
at /id, only one ambiguous rule points away, and nothing ranks them.

| Rule | Impact | Stance on a per-item /id key |
| --- | --- | --- |
| partition-high-cardinality | CRITICAL | Blesses it: "thousands to millions of unique values… distribute writes evenly." A per-item /id is the maximum; no carve-out warns that per-item granularity fragments reads. |
| partition-key-length | | Endorses it: "Prefer short GUIDs, IDs, or codes … for partition keys." |
| partition-immutable-key | | Satisfied: id never changes. |
| partition-avoid-hotspots | | Satisfied: per-item ⇒ zero hot partitions. |
| partition-query-patterns | CRITICAL | The lone counter, but its anti-pattern partitions by Category (never shows /id as wrong), all "correct" examples use an obvious parent entity (Seller/Customer/Conversation), and it explicitly permits "for less common queries, accept cross-partition." |

A whole-kit search found zero rules that warn against a per-item /id partition key and
zero rules that give precedence when high cardinality conflicts with query alignment. With
four rules blessing /id, one ambiguous counter-rule, and no tie-breaker, an agent optimizing
the stated principles chooses /id and passes its own review. This is the precise gap the
precedence note below closes.

Verified against the live SDK + emulator

The same outcome reproduces deterministically at the SDK level with Microsoft.Azure.Cosmos
3.46.1 against the Cosmos DB Linux (vNext) emulator. Two containers, identical 30-item data
set, only the partition key differs:

truth = 10 items WHERE c.category = 'Footwear'

/category container, query scoped to PartitionKey('Footwear')  -> 10 items  (single-partition, correct)
/id       container, query scoped to PartitionKey('Footwear')  ->  0 items  (cannot be served from one partition)
/id       container, cross-partition (no PartitionKey)          -> 10 items  (correct ONLY when fanned out)

The /id container can return the correct result for a category filter only by fanning out
across partitions — there is no single logical partition that holds "all Footwear," because
the partition key is the per-item id. That is the cross-partition scan the partition-query-patterns
rule warns about, reached by following partition-high-cardinality to the letter.
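
To make the routing behaviour concrete without an emulator, here is a minimal in-memory sketch. It is not the SDK or the emulator run: a "container" is modelled as a lookup from partition key value to items, and the helper names (BuildContainer, CountScoped, CountFanOut) and the two non-Footwear category names are assumptions made up for illustration. The data shape (30 items, 10 of them Footwear) mirrors the run above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy data set: 30 items, 10 per category (i % 3 == 0 maps to "Footwear").
string[] categories = { "Footwear", "Apparel", "Accessories" };
var items = Enumerable.Range(1, 30)
    .Select(i => (Id: $"item-{i}", Category: categories[i % 3]))
    .ToList();

// Model a container as: partition key value -> items in that logical partition.
ILookup<string, (string Id, string Category)> BuildContainer(
    Func<(string Id, string Category), string> partitionKeyOf) =>
    items.ToLookup(partitionKeyOf);

// Query scoped to one partition key value: only that logical partition is read.
// A lookup returns an empty sequence for a key that does not exist.
int CountScoped(ILookup<string, (string Id, string Category)> container,
                string partitionKeyValue, string category) =>
    container[partitionKeyValue].Count(item => item.Category == category);

// Cross-partition query: every logical partition is visited.
int CountFanOut(ILookup<string, (string Id, string Category)> container,
                string category) =>
    container.Sum(partition => partition.Count(item => item.Category == category));

var byCategory = BuildContainer(item => item.Category);
var byId = BuildContainer(item => item.Id);

Console.WriteLine(CountScoped(byCategory, "Footwear", "Footwear")); // 10
Console.WriteLine(CountScoped(byId, "Footwear", "Footwear"));       // 0
Console.WriteLine(CountFanOut(byId, "Footwear"));                   // 10
Console.WriteLine(byId.Count);                                      // 30 logical partitions
```

The model only captures which logical partitions a query has to visit; it says nothing about RU charges or latency, which depend on the real service.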

Concrete example (read-heavy catalog)

// Access patterns:
//   ~85% : "list products in category X" / "list products for brand Y"   (filtered reads)
//   ~10% : "get product by id"                                            (point read)
//   ~5%  : writes (occasional catalog edits)

// ❌ High-cardinality choice: /id  — perfect distribution, but every filtered list is a
//    cross-partition scan (the 85% case is now the slow, RU-expensive path).
new ContainerProperties("items", "/id");

// ✅ Query-aligned choice: /category (or /brandId) — the 85% filtered reads become
//    single-partition queries; the 10% point read still works via (id + partition key).
new ContainerProperties("items", "/category");
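
As a rough back-of-the-envelope check on the mix above, the sketch below estimates how many physical partitions an average request touches under each key. Everything beyond the 85/10/5 split is an assumption: the physical partition count (10) is an illustrative value, a filtered query on the /id container is assumed to visit every physical partition, and point reads and writes are assumed to touch exactly one partition (for the /category container this means the caller supplies the category along with the id). Partitions touched is only a proxy, not an RU measurement.

```csharp
using System;

// Workload mix from the example above, in percent.
const int filteredReadsPct = 85, pointReadsPct = 10, writesPct = 5;

// Assumption: the container has grown to this many physical partitions.
const int physicalPartitions = 10;

// Weighted average of partitions touched per request.
// Point reads and writes are assumed to touch exactly one partition.
double AvgPartitionsTouched(int partitionsPerFilteredRead) =>
    (filteredReadsPct * partitionsPerFilteredRead
     + pointReadsPct * 1
     + writesPct * 1) / 100.0;

double withIdKey = AvgPartitionsTouched(physicalPartitions); // filtered reads fan out
double withCategoryKey = AvgPartitionsTouched(1);            // filtered reads stay local

Console.WriteLine($"/id key:       {withIdKey} partitions per request on average");
Console.WriteLine($"/category key: {withCategoryKey} partitions per request on average");
```

Under these assumptions the /id container averages about 8.65 partitions per request against 1 for /category, and the gap widens as the physical partition count grows.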

Recommended guidance to add

Add a short precedence note to both rules (cross-linked):

Cardinality vs. query alignment. High cardinality matters most when write distribution
is the binding constraint (write-heavy, high-throughput). For read-heavy
workloads dominated by a filter on one field, prefer the field you filter on as the
partition key even though its cardinality is lower than /id: single-partition reads beat
a perfectly even write spread you don't need. Choose a bare /id when the dominant access
is a point read by that id, or when write throughput genuinely requires maximal spread.
If the filter field's cardinality is too low (hot-partition risk), use a synthetic or
hierarchical key that leads with the filter field (e.g. /category then /id).

And soften the partition-high-cardinality GOOD examples so a unique id is shown as
one good option conditioned on access pattern, not as unconditionally ideal.
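
The proposed precedence note can be read as a small decision procedure. The sketch below is one possible encoding, not part of the kit: the function name ChoosePartitionKey and its four inputs are hypothetical, and in practice "write-heavy" and "cardinality too low" are judgment calls rather than booleans. It returns the key path(s) in order, so two entries stand for a hierarchical key that leads with the filter field.

```csharp
using System;

// Hypothetical helper: encodes the precedence note as ordered checks.
// Returns one path for a simple key, or two paths for a hierarchical key.
string[] ChoosePartitionKey(
    bool writeThroughputIsBindingConstraint,
    bool dominantAccessIsPointReadById,
    string dominantFilterPath,          // e.g. "/category"; empty if no dominant filter
    bool filterCardinalityTooLow)       // hot-partition risk on the filter field
{
    // Bare /id: write spread is what matters, or reads are point reads by id.
    if (writeThroughputIsBindingConstraint
        || dominantAccessIsPointReadById
        || string.IsNullOrEmpty(dominantFilterPath))
    {
        return new[] { "/id" };
    }

    // Read-heavy and filter-dominated, but too few distinct values:
    // lead with the filter field, then /id.
    if (filterCardinalityTooLow)
    {
        return new[] { dominantFilterPath, "/id" };
    }

    // Read-heavy and filter-dominated: query alignment wins over raw cardinality.
    return new[] { dominantFilterPath };
}

Console.WriteLine(string.Join(" then ", ChoosePartitionKey(false, false, "/category", false))); // /category
Console.WriteLine(string.Join(" then ", ChoosePartitionKey(false, false, "/category", true)));  // /category then /id
Console.WriteLine(string.Join(" then ", ChoosePartitionKey(true, false, "/category", false)));  // /id
```

For the read-heavy catalog in this issue, the first call is the relevant one: low write volume, a dominant filter on category, and so /category rather than /id.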

Related: SQL → NoSQL migration note

This bites hardest during relational-to-Cosmos migrations. The instinct is to carry the
table's integer primary key over as the document id and partition by it. Worth a
one-line callout (here or in a model- rule):

When migrating a relational table to Cosmos DB, partition by the dominant query
dimension (the column you filter on most), not the surrogate primary key carried over
as id.

Suggested placement

Precedence note added to partition-high-cardinality.md and partition-query-patterns.md
(both impact: CRITICAL); optional migration one-liner in a model- rule or
partition-synthetic-keys.
