Type: enhancement to existing rules (partition-high-cardinality, partition-query-patterns) — conflicting guidance with no precedence
Category: Partitioning (partition-)
Severity: Medium (leads to a defensible-but-suboptimal key; turns the dominant query into a cross-partition scan; higher RU/latency at scale)
Affected: data modeling for any SQL→NoSQL migration or new container design, all SDKs
Doc reference: Partitioning overview · Choose a partition key
Summary
Two CRITICAL rules in the kit point in different directions for a common case, and there is
no guidance on which wins:
- partition-high-cardinality: "Select partition keys with many unique values." Its
  ✅ GOOD examples are CustomerId, TenantId, DeviceId, i.e. a bare per-entity id.
  Read literally, a document's own /id is the maximum-cardinality choice and looks ideal.
- partition-query-patterns: "Choose a partition key that supports your most frequent
  queries." Its anti-pattern is a product partitioned by one field while most queries filter
  by another.
For a read-heavy, rarely-written dataset that is almost always filtered by a field
(a product catalog filtered by category/brand; an orders-by-customer read model; a
content library filtered by type), these two rules disagree:
- High-cardinality says: use the unique id (perfect distribution).
- Query-patterns says: use the field you filter on (single-partition reads).
An agent following the high-cardinality rule will choose /id, partition every document into
its own logical partition, and turn every WHERE category = @c / WHERE brand = @b query
into a fan-out cross-partition scan — the exact thing query-patterns warns against. The
choice "looks" correct and even cites the right principle ("even distribution, efficient
single-item reads"), so it passes review.
The missing piece is a precedence/tie-breaker: when write volume is low and reads are
dominated by a filter, query alignment should win over raw cardinality. A bare /id is
the right key primarily when the dominant access pattern is a point read by that id, or
when write throughput is so high that write distribution is the binding constraint.
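The tie-breaker described above can be written down as a small decision function. This is only a sketch of the proposed precedence, not anything the kit or the SDK provides: the `Workload` record, its fields, and the simple "largest share wins" comparison are all assumptions made for illustration.

```csharp
using System;

// Hypothetical workload for the read-heavy catalog case (85% filtered reads).
var catalog = new Workload(0.85, 0.10, 0.05, false, "/category");
Console.WriteLine(PartitionKeyTieBreaker.Choose(catalog)); // /category

// Hypothetical workload summary; field names and shares are illustrative only.
public record Workload(
    double FilteredReadShare,   // fraction of requests filtering on one non-id field
    double PointReadShare,      // fraction of requests reading one item by id
    double WriteShare,          // fraction of requests that are writes
    bool WriteThroughputBound,  // true when write distribution is the binding constraint
    string FilterField);        // path of the dominant filter field, e.g. "/category"

public static class PartitionKeyTieBreaker
{
    public static string Choose(Workload w)
    {
        // Write-bound: maximal spread matters most, so a bare /id remains a good key.
        if (w.WriteThroughputBound) return "/id";

        // Read-heavy and dominated by a filter: align the key with that filter.
        if (w.FilteredReadShare > w.PointReadShare && w.FilteredReadShare > w.WriteShare)
            return w.FilterField;

        // Otherwise point reads by id dominate, which /id serves directly.
        return "/id";
    }
}
```

A real rule would also need to check that the filter field's cardinality is high enough to avoid hot partitions; that check is left out here to keep the precedence itself visible.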
Scope (important — /id is not always wrong)
The Microsoft docs are explicit that /id is a great partition key for two cases, and
this issue is not asking to contradict that:
"For small read-heavy containers or write-heavy containers of any size, the item ID
(/id) is naturally a great choice for the partition key."
— Partitioning and horizontal scaling › Use item ID as the partition key
The same page adds the caveat that pins down exactly where /id stops being a good fit:
"If you have a read-heavy container with many physical partitions, queries are more
efficient if they have an equality filter with the item ID."
So the gap is narrow and specific: a read-heavy container that grows past one physical
partition and is filtered by a non-id field. There, /id turns the dominant query into a
cross-partition fan-out, while the kit's two CRITICAL rules still give no rule for which one
wins. The ask is a tie-breaker for that case, not a blanket "avoid /id."
Benchmark evidence — reproduced end-to-end with the kit loaded
This is a real agent run (not a synthetic test): the kit was loaded and read, the agent
understood the access pattern, and still chose /id.
The eShop Catalog SQL→Cosmos migration task was run with claude-opus-4.7 and the
cosmosdb-best-practices kit installed (compiled AGENTS.md baked into the working dir;
load verified — hook install lines present, AGENTS.md pulled into the session 8×, Azure MCP
connected). The run passed 13 of 14 independent checks; the only failure was
partition_key_grouping.
The agent's own header comment in the generated Program.cs (verbatim) shows it had already
worked out the dominant access pattern — it indexed exactly the filter fields — and still
partitioned by /id:
// * Container "items": one document per product, partition key /id.
// Indexing on "items":
// * Include /name, /catalogTypeId, /catalogBrandId (the filter paths)
var itemsContainerProps = new ContainerProperties("items", "/id"); // the graded miss
So with the kit present, the agent recognized that reads filter by type/brand (it built a
composite index and included /catalogTypeId + /catalogBrandId as "the filter paths"), yet
chose the per-item /id partition key — turning the dominant filtered query into a
cross-partition fan-out. The decision "looks" principled and cites high cardinality / even
distribution, exactly as predicted above.
Whole-kit rule audit (why a faithful agent lands on /id)
Auditing the installed kit (119 rules) for the catalog partition decision: four rules point
at /id, only one ambiguous rule points away, and nothing ranks them.
| Rule | Impact | Stance on a per-item /id key |
|---|---|---|
| partition-high-cardinality | CRITICAL | Blesses it: "thousands to millions of unique values… distribute writes evenly." A per-item /id is the maximum; no carve-out warns that per-item granularity fragments reads. |
| partition-key-length | - | Endorses it: "Prefer short GUIDs, IDs, or codes … for partition keys." |
| partition-immutable-key | - | Satisfied: id never changes. |
| partition-avoid-hotspots | - | Satisfied: per-item ⇒ zero hot partitions. |
| partition-query-patterns | CRITICAL | The lone counter, but its anti-pattern partitions by Category (never shows /id as wrong), all "correct" examples use an obvious parent entity (Seller/Customer/Conversation), and it explicitly permits "for less common queries, accept cross-partition." |
A whole-kit search found zero rules that warn against a per-item /id partition key and
zero rules that give precedence when high-cardinality conflicts with query-alignment. With
four rules blessing /id, one ambiguous counter-rule, and no tie-breaker, an agent optimizing
the stated principles chooses /id and passes its own review. This is the precise gap the
precedence note below closes.
Verified against the live SDK + emulator
The same outcome reproduces deterministically at the SDK level with Microsoft.Azure.Cosmos
3.46.1 against the Cosmos DB Linux (vNext) emulator. Two containers, identical 30-item data
set, only the partition key differs:
truth = 10 items WHERE c.category = 'Footwear'
/category container, query scoped to PartitionKey('Footwear') -> 10 items (single-partition, correct)
/id container, query scoped to PartitionKey('Footwear') -> 0 items (cannot be served from one partition)
/id container, cross-partition (no PartitionKey) -> 10 items (correct ONLY when fanned out)
The /id container can return the correct result for a category filter only by fanning out
across partitions — there is no single logical partition that holds "all Footwear," because
the partition key is the per-item id. That is the cross-partition scan the partition-query-patterns
rule warns about, reached by following partition-high-cardinality to the letter.
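The 10 / 0 / 10 result can be illustrated without the emulator by modelling a logical partition as "all items that share one partition key value". The sketch below is an in-memory model, not the SDK run reported above. The data set is hypothetical: only 'Footwear' comes from the original run, and the other category names and the ids are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical 30-item data set: 10 items in each of three categories.
string[] categories = { "Footwear", "Apparel", "Accessories" };
var items = Enumerable.Range(1, 30)
    .Select(i => new Item(i.ToString(), categories[i % 3]))
    .ToList();

// Each lookup key stands for one logical partition.
var byCategory = items.ToLookup(x => x.Category); // models the /category container
var byId = items.ToLookup(x => x.Id);             // models the /id container

// Query scoped to one partition key value: only that partition is searched.
int Scoped(ILookup<string, Item> container, string partitionKeyValue) =>
    container[partitionKeyValue].Count(x => x.Category == "Footwear");

// Cross-partition query: every logical partition is visited.
int FanOut(ILookup<string, Item> container) =>
    container.Sum(partition => partition.Count(x => x.Category == "Footwear"));

Console.WriteLine(Scoped(byCategory, "Footwear")); // 10
Console.WriteLine(Scoped(byId, "Footwear"));       // 0
Console.WriteLine(FanOut(byId));                   // 10
Console.WriteLine(byId.Count);                     // 30 logical partitions

public record Item(string Id, string Category);
```

The last line shows why the scoped query cannot work under /id: the 30 items sit in 30 logical partitions, none of which is keyed by a category value.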
Concrete example (read-heavy catalog)
// Access patterns:
// ~85% : "list products in category X" / "list products for brand Y" (filtered reads)
// ~10% : "get product by id" (point read)
// ~5% : writes (occasional catalog edits)
// ❌ High-cardinality choice: /id — perfect distribution, but every filtered list is a
// cross-partition scan (the 85% case is now the slow, RU-expensive path).
new ContainerProperties("items", "/id");
// ✅ Query-aligned choice: /category (or /brandId) — the 85% filtered reads become
// single-partition queries; the 10% point read still works via (id + partition key).
new ContainerProperties("items", "/category");
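For the query-aligned container, the two read paths might look like the sketch below. It uses the Microsoft.Azure.Cosmos v3 calls `GetItemQueryIterator` and `ReadItemAsync`. The `Product` class, its property names, and the `CatalogReads` helper are assumptions for illustration and are not taken from the benchmark code. Note that the point read now needs the item's category as well as its id, so callers must carry both values.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

// Hypothetical document shape; property names are assumptions for this sketch.
public class Product
{
    public string id { get; set; }
    public string category { get; set; }
    public string name { get; set; }
}

public static class CatalogReads
{
    // Filtered list: scoped to one logical partition of a /category container.
    public static async Task<List<Product>> ListByCategoryAsync(Container items, string category)
    {
        var query = new QueryDefinition("SELECT * FROM c WHERE c.category = @c")
            .WithParameter("@c", category);
        var options = new QueryRequestOptions { PartitionKey = new PartitionKey(category) };

        var results = new List<Product>();
        using FeedIterator<Product> iterator =
            items.GetItemQueryIterator<Product>(query, requestOptions: options);
        while (iterator.HasMoreResults)
        {
            // Each page of results is enumerable as Product.
            results.AddRange(await iterator.ReadNextAsync());
        }
        return results;
    }

    // Point read: the caller must supply the category as well as the id.
    public static async Task<Product> GetByIdAsync(Container items, string id, string category)
    {
        ItemResponse<Product> response =
            await items.ReadItemAsync<Product>(id, new PartitionKey(category));
        return response.Resource;
    }
}
```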
Recommended guidance to add
Add a short precedence note to both rules (cross-linked):
Cardinality vs. query alignment. High cardinality matters most when write
distribution is the binding constraint (write-heavy, high-throughput). For read-heavy
workloads dominated by a filter on one field, prefer the field you filter on as the
partition key even though its cardinality is lower than /id — single-partition reads beat
a perfectly even write spread you don't need. Choose a bare /id when the dominant access
is a point read by that id, or when write throughput genuinely requires maximal spread.
If the filter field's cardinality is too low (hot-partition risk), use a synthetic or
hierarchical key that leads with the filter field (e.g. /category then /id).
And soften the partition-high-cardinality GOOD examples so a unique id is shown as
one good option conditioned on access pattern, not as unconditionally ideal.
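The hierarchical key mentioned in the note (/category then /id) could be declared roughly as follows. This is a configuration sketch, assuming a Microsoft.Azure.Cosmos version that supports hierarchical partition keys. The container name matches the examples above, while the key values "Footwear" and "42" are placeholders.

```csharp
using System.Collections.Generic;
using Microsoft.Azure.Cosmos;

// Hierarchical key: /category is the first level, /id the second.
var properties = new ContainerProperties(
    "items",
    new List<string> { "/category", "/id" });

// Full key for a point read or write: both levels, in declaration order.
PartitionKey fullKey = new PartitionKeyBuilder()
    .Add("Footwear") // placeholder category value
    .Add("42")       // placeholder item id
    .Build();
```

Leading with /category keeps the filter field as the routing prefix, while the /id level lets a single large category spread beyond one logical partition.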
Related: SQL → NoSQL migration note
This bites hardest during relational-to-Cosmos migrations. The instinct is to carry the
table's integer primary key over as the document id and partition by it. Worth a
one-line callout (here or in a model- rule):
When migrating a relational table to Cosmos DB, partition by the dominant query
dimension (the column you filter on most), not the surrogate primary key carried over
as id.
Suggested placement
Precedence note added to partition-high-cardinality.md and partition-query-patterns.md
(both impact: CRITICAL); optional migration one-liner in a model- rule or
partition-synthetic-keys.