Add per-field dimension to caches #309
Caches gain an optional `dimension` field — a whitelist of allowed values for that field. When set, only edges whose value for the field is in the list are written to the cache; other edges are silently skipped at write time. Read-side code is unchanged: filtered rows are simply absent from the wide row, so existing range filters and pagination keep working.

`dimension` is held as a `Set` (membership-only, dedup'd by Jackson at deserialization). Empty sets are rejected at construction to surface DDL mistakes. V2 `Cache.Field` mirrors the V3 `IndexField` shape, including defensive immutability so caller mutations can't drift the whitelist.

Test plan:
- core: `EdgeMutationBuilderTest` cases for in/out-of-dimension writes, null dimension regression, UPDATED transitions across the boundary, and empty-set construction reject.
- server: `EdgeCacheQueryE2ETest` and `ActionbaseQueryE2ETest` exercise the write-time filter via direct seek and multi-hop CACHE.
- testFixtures: align `PrettyObjectWriter` with production `ActionbaseObjectMapper`'s `NON_NULL` inclusion so schema round-trip fixtures don't regress on new nullable fields.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
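A minimal sketch of the write-time check described above. `passesAllDimensions` appears later in this thread, but the signature, the prop map, and the `null`-means-no-filter convention here are assumptions for illustration, not the PR's actual API:

```java
import java.util.*;

// Sketch: an edge is written to a cache only if every dimensioned
// field's value is inside that field's whitelist; otherwise the cache
// record is silently skipped at write time. All names are illustrative.
final class DimensionFilterSketch {
    // field name -> allowed values; a null whitelist means "no filter"
    static boolean passesAllDimensions(Map<String, Set<Object>> dimensions,
                                       Map<String, Object> edgeProps) {
        return dimensions.entrySet().stream()
                .allMatch(e -> e.getValue() == null
                        || e.getValue().contains(edgeProps.get(e.getKey())));
    }
}
```

Read paths need no counterpart to this check: a skipped write simply never produces a row.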
Defensive copy via `new HashSet<>(dimension)` is enough — the surrounding `Cache.fields` list isn't unmodifiable either, so wrapping only `dimension` was inconsistent. `HashSet` over `LinkedHashSet`: a whitelist is order-independent by definition; preserving JSON input order would contradict the Set semantics we already chose.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
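The construction rules above, sketched under assumed names (`CacheFieldSketch` and `allows` are illustrative; only the empty-set reject and the `new HashSet<>(dimension)` copy come from the thread):

```java
import java.util.*;

// Sketch of a cache field whose `dimension` whitelist is defensively
// copied, and explicitly-empty whitelists fail fast to surface DDL mistakes.
final class CacheFieldSketch {
    final String field;
    final Set<Object> dimension; // null => no filter

    CacheFieldSketch(String field, Set<Object> dimension) {
        if (dimension != null && dimension.isEmpty()) {
            throw new IllegalArgumentException("empty dimension on field " + field);
        }
        this.field = field;
        // Defensive copy: later mutation of the caller's set can't drift the whitelist.
        // Plain HashSet, not LinkedHashSet: a whitelist is order-independent.
        this.dimension = dimension == null ? null : new HashSet<>(dimension);
    }

    boolean allows(Object value) {
        return dimension == null || dimension.contains(value);
    }
}
```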
Cache field values are direction-independent, so computing them once per cache (alongside the dimension match check) instead of once per direction saves an allocation pair on `BOTH` mutations. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
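The hoist described above, as a toy model (all names and the counter are illustrative; only the once-per-cache-instead-of-once-per-direction shape comes from the comment):

```java
import java.util.*;

// Sketch: field values are direction-independent, so they are resolved
// once per cache and reused for both the OUT and IN records of a BOTH
// mutation, instead of being re-resolved inside the direction loop.
final class HoistSketch {
    static int resolveCalls = 0; // demo counter, not part of the PR

    static Object resolveFieldValue(String field, Map<String, Object> props) {
        resolveCalls++;
        return props.get(field);
    }

    static List<String> buildBothRecords(List<String> fields, Map<String, Object> props) {
        // Hoisted: values computed once per cache, before the direction loop.
        List<Object> values = new ArrayList<>();
        for (String f : fields) values.add(resolveFieldValue(f, props));
        List<String> records = new ArrayList<>();
        for (String dir : List.of("OUT", "IN")) records.add(dir + ":" + values);
        return records;
    }
}
```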
```kotlin
jacksonObjectMapper().apply {
    setDefaultPrettyPrinter(PrettyPrinter())
    configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false)
    setSerializationInclusion(JsonInclude.Include.NON_NULL)
}
```
@em3s
Surfaced by nullable dimension — aligns test mapper with production NON_NULL so nullable fields don't break round-trip fixtures.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Two cases — matching and non-matching edges against a `dimension` whitelist on `permission` — confirming the filter applies through `BulkEdgeEncoder.bulkEncodeAll` without affecting the hash/indexed/counter rows. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Adding context from our offline discussion — this is groundwork for per-dimension Pinning, with that set configured in the label.
```kotlin
val field: String,
val order: Order,
val dimension: Set<Any>? = null,
)
```
`dimension` is EdgeCache-only and stays `null` on `EdgeIndex` — time to split off a `CacheField`.
Left a review comment. Please take a look. @zipdoki
`dimension` is EdgeCache-only and stays `null` on EdgeIndex, so move it to a dedicated `CacheField` and keep `IndexField` to its index shape. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
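The proposed split, sketched with Java records (the shapes and the `Order` values are assumptions; the PR's real codec types may differ): `IndexField` keeps only the index concerns, while `CacheField` carries the cache-only `dimension` with the same empty-set reject and defensive copy.

```java
import java.util.*;

// Sketch of the IndexField / CacheField split. Names follow the review
// comment; record shapes are illustrative.
enum Order { ASC, DESC }

record IndexField(String field, Order order) {}

record CacheField(String field, Order order, Set<Object> dimension) {
    CacheField {
        if (dimension != null && dimension.isEmpty()) {
            throw new IllegalArgumentException("empty dimension on field " + field);
        }
        // Immutable defensive copy of the whitelist.
        dimension = dimension == null ? null : Set.copyOf(dimension);
    }
}
```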
As also discussed offline:
Adds a `dimensionValue` byte/string field to `KeyFieldValue`/`TypedKeyFieldValue` and threads it through the cache-edge write path. The value encodes only the dimensioned fields (in declared order) and is shared across OUT/IN directions.

`Cache.Field.hasDimension()` treats `null` and `[]` as equivalent (no filter), preserving JSON round-trip while unifying behavior. `Cache` precomputes `dimensionedFields` so the hot path avoids re-filtering, and `passesAllDimensions` / `encodeDimensionValue` iterate the precomputed list directly.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
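A toy model of that precompute, with string joining standing in for the real byte encoding (`hasDimension`, `dimensionedFields`, and `encodeDimensionValue` are names from the comment; everything else here is assumed):

```java
import java.util.*;
import java.util.stream.*;

// Sketch: `hasDimension` unifies null and [] as "no filter", and the
// cache precomputes its dimensioned fields once so the hot write path
// doesn't re-filter per edge. Encoding preserves declared field order.
final class DimensionValueSketch {
    record Field(String name, Set<Object> dimension) {
        boolean hasDimension() {
            return dimension != null && !dimension.isEmpty();
        }
    }

    final List<Field> fields;
    final List<Field> dimensionedFields; // precomputed once at construction

    DimensionValueSketch(List<Field> fields) {
        this.fields = List.copyOf(fields);
        this.dimensionedFields = fields.stream()
                .filter(Field::hasDimension)
                .collect(Collectors.toUnmodifiableList());
    }

    // Encodes only dimensioned fields, in declared order; null when none.
    String encodeDimensionValue(Map<String, Object> props) {
        if (dimensionedFields.isEmpty()) return null;
        return dimensionedFields.stream()
                .map(f -> String.valueOf(props.get(f.name())))
                .collect(Collectors.joining("/"));
    }
}
```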
@em3s

1. Splitting
2. Threading

Three outcomes have to coexist: skip on mismatch, emit with a non-null `dimensionValue`, and emit with it `null` when the cache has no dimensioned fields. Merged into a single pass:

```java
.flatMap(cache -> {
    List<Cache.Field> dimensionedFields = cache.getDimensionedFields();
    Object[] values = new Object[dimensionedFields.size()];
    for (int i = 0; i < dimensionedFields.size(); i++) {
        Cache.Field f = dimensionedFields.get(i);
        Object v = resolveFieldValue(f.getField(), ts, src, tgt, props);
        if (!f.getDimension().contains(v)) return Stream.empty();
        values[i] = v;
    }
    T dimensionValue = dimensionedFields.isEmpty()
        ? null
        : encodeBufferAsT(buffer -> { /* encode values[i] */ });
    return dirType.getDirs().stream()
        .map(dir -> encodeCacheEdge(...).withDimensionValue(dimensionValue));
})
```

Costs of merging:
What the current shape pays for that:

```java
.filter(cache -> passesAllDimensions(cache, ts, src, tgt, props))
.flatMap(cache -> {
    T dimensionValue = encodeDimensionValue(cache, ts, src, tgt, props);
    return dirType.getDirs().stream()
        .map(dir -> encodeCacheEdge(...).withDimensionValue(dimensionValue));
})
```
Added in 4c9c586 to keep schema fixtures stable against the new nullable `IndexField.dimension`. After 77bbb5b split `CacheField` off `IndexField`, `IndexField` has no nullable field again and no current fixture serializes a `CacheField` with `dimension` set, so the inclusion override no longer affects any test. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Existing dimension tests only exercise 2-field caches (one dimensioned field plus `created_at`). Adds a 3-field case (`permission`, `memo`, `created_at`) that verifies the encoder output that downstream per-dimension top-N depends on:
- four (permission, memo) buckets emit distinct `dimensionValue` tags,
- same-bucket edges with different `created_at` share a `dimensionValue`,
- configured field order is preserved in the byte encoding, so same-permission siblings share a longer prefix than cross-permission pairs.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Once the three-field tests pass and the bulkload/seek paths we're testing internally are green, we can merge and follow up with the codec-java release. @zipdoki
`testCacheDimensionFourBucketsHaveDistinctOrderedTags` previously checked distinctness, stability, and field-order prefix grouping. Adds an explicit byte-wise comparison so ASC encoding is verified to sort the four buckets in (permission, memo) lexicographic order: me/a < me/b < others/a < others/b. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
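The byte-wise comparison in that test boils down to unsigned lexicographic order over the encoded tags. A sketch under assumed names (the comparator below is illustrative; `java.util.Arrays.compareUnsigned(byte[], byte[])` does the same thing on Java 9+):

```java
// Sketch: ASC-encoded tags sort in (permission, memo) order exactly when
// plain unsigned lexicographic byte comparison agrees with field order.
final class TagOrderSketch {
    static int compareUnsignedLex(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            // Mask to 0..255 so bytes >= 0x80 compare as unsigned.
            int d = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (d != 0) return d;
        }
        return a.length - b.length; // shorter prefix sorts first
    }
}
```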
Summary
As discussed in the offline meeting, adds an optional `dimension` whitelist to cache fields. When set, only edges whose value for that field is in the whitelist are written to the cache; other edges are skipped at write time. Read paths are unchanged — filtered rows are simply absent from the wide row.

Changes
- `IndexField` (V3) and `Cache.Field` (V2 codec) gain `dimension` with empty-set validation.
- `EdgeMutationBuilder.buildCacheRecords` skips the cache record when any field is outside its dimension.
- `AbstractEdgeEncoder.encodeAllCacheEdges` applies the same filter on V2 bulk writes.
- `PrettyObjectWriter` matches the production mapper's `NON_NULL` policy.

How to Test
```shell
./gradlew :core:test :codec-java:test :server:test :engine:test
```
- `EdgeMutationBuilderTest.CacheDimensionFilter` — in/out-of-dimension, null dimension, UPDATED transitions, empty-set reject.
- `EdgeCacheQueryE2ETest`, `ActionbaseQueryE2ETest` — direct seek and multi-hop CACHE.

AI Assistance