Add per-field dimension to caches#309

Open
zipdoki wants to merge 14 commits into main from feat/cache-dimension

Conversation

@zipdoki
Contributor

@zipdoki zipdoki commented May 7, 2026

Summary

As discussed in an offline meeting, this PR adds an optional dimension whitelist to cache fields. When set, only edges whose value for that field is in the whitelist are written to the cache; other edges are skipped at write time. Read paths are unchanged — filtered rows are simply absent from the wide row.

Changes

  • IndexField (V3) and Cache.Field (V2 codec) gain dimension with empty-set validation.
  • EdgeMutationBuilder.buildCacheRecords skips the cache record when any field is outside its dimension.
  • AbstractEdgeEncoder.encodeAllCacheEdges applies the same filter on V2 bulk writes.
  • Test PrettyObjectWriter matches production mapper's NON_NULL policy.
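The write-time skip described above can be sketched as follows. This is a minimal standalone model: `DimensionFilterSketch`, its `Field` record, and the `passesAllDimensions` helper are hypothetical stand-ins for the real `Cache.Field` and builder code, not the PR's actual classes.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

class DimensionFilterSketch {
    // Simplified stand-in for a cache field: a name plus an optional whitelist.
    record Field(String name, Set<?> dimension) {}

    // An edge is written to the cache only if every dimensioned field's value
    // is inside that field's whitelist; otherwise the cache record is skipped.
    static boolean passesAllDimensions(List<Field> fields, Map<String, ?> edgeProps) {
        for (Field f : fields) {
            if (f.dimension() == null) continue; // null dimension = no filter
            if (!f.dimension().contains(edgeProps.get(f.name()))) return false;
        }
        return true;
    }
}
```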

How to Test

  • ./gradlew :core:test :codec-java:test :server:test :engine:test
  • EdgeMutationBuilderTest.CacheDimensionFilter — in/out-of-dimension, null dimension, UPDATED transitions, empty-set reject.
  • EdgeCacheQueryE2ETest, ActionbaseQueryE2ETest — direct seek and multi-hop CACHE.

AI Assistance

  • This PR was written largely with AI assistance.
    • Tool / model: Claude Code (Opus 4.7, 1M context)

zipdoki and others added 3 commits May 7, 2026 18:55
Caches gain an optional `dimension` field — a whitelist of allowed values
for that field. When set, only edges whose value for the field is in the
list are written to the cache; other edges are silently skipped at write
time. Read-side code is unchanged: filtered rows are simply absent from
the wide row, so existing range filters and pagination keep working.

`dimension` is held as a `Set` (membership-only, dedup'd by Jackson at
deserialization). Empty sets are rejected at construction to surface DDL
mistakes. V2 `Cache.Field` mirrors the V3 `IndexField` shape, including
defensive immutability so caller mutations can't drift the whitelist.

Test plan:
- core: `EdgeMutationBuilderTest` cases for in/out-of-dimension writes,
  null dimension regression, UPDATED transitions across the boundary,
  and empty-set construction reject.
- server: `EdgeCacheQueryE2ETest` and `ActionbaseQueryE2ETest` exercise
  the write-time filter via direct seek and multi-hop CACHE.
- testFixtures: align `PrettyObjectWriter` with production
  `ActionbaseObjectMapper`'s `NON_NULL` inclusion so schema round-trip
  fixtures don't regress on new nullable fields.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
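The construction-time behavior described above (empty-set rejection, defensive copying) could look roughly like this. `CacheFieldSketch` is a hypothetical shape for illustration, not the actual `Cache.Field` source.

```java
import java.util.HashSet;
import java.util.Set;

class CacheFieldSketch {
    private final String field;
    private final Set<?> dimension; // null = no whitelist

    CacheFieldSketch(String field, Set<?> dimension) {
        if (dimension != null && dimension.isEmpty()) {
            // An empty whitelist would silently drop every edge;
            // rejecting it at construction surfaces DDL mistakes early.
            throw new IllegalArgumentException("dimension must not be empty");
        }
        this.field = field;
        // Defensive copy: later mutation of the caller's set cannot drift
        // the whitelist. HashSet suffices, since membership is order-independent.
        this.dimension = dimension == null ? null : new HashSet<>(dimension);
    }

    String getField() { return field; }

    Set<?> getDimension() { return dimension; }
}
```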
Defensive copy via `new HashSet<>(dimension)` is enough — the surrounding
`Cache.fields` list isn't unmodifiable either, so wrapping only `dimension`
was inconsistent. `HashSet` over `LinkedHashSet`: a whitelist is order-
independent by definition; preserving JSON input order would contradict
the Set semantics we already chose.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Cache field values are direction-independent, so computing them once per
cache (alongside the dimension match check) instead of once per direction
saves an allocation pair on `BOTH` mutations.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
@zipdoki zipdoki self-assigned this May 7, 2026
@zipdoki zipdoki requested a review from em3s as a code owner May 7, 2026 10:24
@zipdoki zipdoki removed the request for review from em3s May 7, 2026 10:24
@dosubot dosubot Bot added size:L This PR changes 100-499 lines, ignoring generated files. enhancement New feature or request labels May 7, 2026
jacksonObjectMapper().apply {
setDefaultPrettyPrinter(PrettyPrinter())
configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false)
setSerializationInclusion(JsonInclude.Include.NON_NULL)
Contributor Author


@em3s
Surfaced by nullable dimension — aligns test mapper with production NON_NULL so nullable fields don't break round-trip fixtures.

zipdoki and others added 2 commits May 7, 2026 19:29
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Two cases — matching and non-matching edges against a `dimension` whitelist
on `permission` — confirming the filter applies through `BulkEdgeEncoder.bulkEncodeAll`
without affecting the hash/indexed/counter rows.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
@em3s
Contributor

em3s commented May 7, 2026

Adding context from our offline discussion — this is groundwork for per-dimension limit distribution on multi-field EdgeCache. Today, with a two-field cache (field1, field2) and limit 100, both bulkload and pruner spend the budget in leading-field order, so we can't allocate it across field1 buckets — the bucket set isn't known up front.

Pinning that set in the label via dimension gives bulkload and pruner a shared source, enabling per-dimension top-N.

val field: String,
val order: Order,
val dimension: Set<Any>? = null,
)
Contributor


dimension is EdgeCache-only and stays null on EdgeIndex — time to split off a CacheField.

@em3s
Contributor

em3s commented May 7, 2026

Left a review comment. Please take a look. @zipdoki

`dimension` is EdgeCache-only and stays `null` on EdgeIndex, so move
it to a dedicated `CacheField` and keep `IndexField` to its index shape.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
@em3s
Contributor

em3s commented May 8, 2026

@zipdoki

As also discussed offline: instead of matches, pass the actual dimensionValue through to the caller — so bulkload (and similar) can compute top-N on rowkey + dimensionValue rather than rowkey alone.

Adds a `dimensionValue` byte/string field to `KeyFieldValue`/`TypedKeyFieldValue` and threads it through the cache-edge write path. The value encodes only the dimensioned fields (in declared order) and is shared across OUT/IN directions.

`Cache.Field.hasDimension()` treats `null` and `[]` as equivalent (no filter), preserving JSON round-trip while unifying behavior. `Cache` precomputes `dimensionedFields` so the hot path avoids re-filtering, and `passesAllDimensions` / `encodeDimensionValue` iterate the precomputed list directly.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
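The `hasDimension()` convention and the precomputed `dimensionedFields` list described above can be sketched as follows. `DimensionValueSketch` is a simplified stand-in for the real `Cache` and `Cache.Field` classes; only the null/empty unification and the one-time filtering are modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class DimensionValueSketch {
    record Field(String name, Set<?> dimension) {
        // null and [] are treated the same: no filter. JSON round-trip may
        // produce either form, so behavior is unified here.
        boolean hasDimension() { return dimension != null && !dimension.isEmpty(); }
    }

    // Precompute the dimensioned subset once so the hot write path
    // doesn't re-filter the full field list per edge.
    static List<Field> dimensionedFields(List<Field> all) {
        List<Field> out = new ArrayList<>();
        for (Field f : all) if (f.hasDimension()) out.add(f);
        return out;
    }
}
```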
@zipdoki
Contributor Author

zipdoki commented May 8, 2026

@em3s
Thanks for both reviews.

1. Splitting CacheField off IndexField — done in the "Split CacheField off IndexField" commit. dimension now lives only on CacheField.

2. Threading dimensionValue instead of matches — tried, reverted.

Three outcomes have to coexist: skip on mismatch, emit with dimensionValue = null, emit with an encoded value. Merging the predicate and the encoder into one return path looks like:

.flatMap(cache -> {
  List<Cache.Field> dimensionedFields = cache.getDimensionedFields();
  Object[] values = new Object[dimensionedFields.size()];
  for (int i = 0; i < dimensionedFields.size(); i++) {
    Cache.Field f = dimensionedFields.get(i);
    Object v = resolveFieldValue(f.getField(), ts, src, tgt, props);
    if (!f.getDimension().contains(v)) return Stream.empty();
    values[i] = v;
  }
  T dimensionValue = dimensionedFields.isEmpty()
      ? null
      : encodeBufferAsT(buffer -> { /* encode values[i] */ });
  return dirType.getDirs().stream()
      .map(dir -> encodeCacheEdge(...).withDimensionValue(dimensionValue));
})

Costs of merging:

  • A per-cache Object[] allocation just to carry resolved values between the validate phase and the encode phase — extra GC pressure on the hot path that the separated form avoids entirely.
  • Object[] + Stream.empty() sentinel to encode the three outcomes in a single return.
  • Lambda body grows to ~15 lines, vs. two ~5-line helpers.
  • Loses the natural filter (predicate) / flatMap (value + expand) mapping onto stream operators.
  • Harder to unit-test in isolation.

What the current shape pays for that:

.filter(cache -> passesAllDimensions(cache, ts, src, tgt, props))
.flatMap(cache -> {
  T dimensionValue = encodeDimensionValue(cache, ts, src, tgt, props);
  return dirType.getDirs().stream()
      .map(dir -> encodeCacheEdge(...).withDimensionValue(dimensionValue));
})

resolveFieldValue runs twice per dimensioned field, and the two helpers must stay aligned on iteration. Both iterate cache.getDimensionedFields(), so drift risk is low, and resolveFieldValue is a switch + HashMap.get — sub-100ns. Happy to revisit if that cost ever gets heavier.

zipdoki and others added 2 commits May 8, 2026 17:56
Added in 4c9c586 to keep schema fixtures stable against the new
nullable `IndexField.dimension`. After 77bbb5b split `CacheField` off
`IndexField`, `IndexField` has no nullable field again and no current
fixture serializes a `CacheField` with `dimension` set, so the
inclusion override no longer affects any test.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Existing dimension tests only exercise 2-field caches (one dimensioned
field plus `created_at`). Adds a 3-field case
(`permission`, `memo`, `created_at`) verifying the encoder output that
downstream per-dimension top-N depends on:

- four (permission, memo) buckets emit distinct `dimensionValue` tags,
- same-bucket edges with different `created_at` share a `dimensionValue`,
- configured field order is preserved in the byte encoding so
  same-permission siblings share a longer prefix than cross-permission
  pairs.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
@em3s
Copy link
Copy Markdown
Contributor

em3s commented May 8, 2026

Once the three-field tests pass and the bulkload/seek paths we're testing internally are green, we can merge and follow up with the codec-java release. @zipdoki

zipdoki and others added 5 commits May 8, 2026 18:27
`testCacheDimensionFourBucketsHaveDistinctOrderedTags` previously checked
distinctness, stability, and field-order prefix grouping. Adds an explicit
byte-wise comparison so ASC encoding is verified to sort the four buckets
in (permission, memo) lexicographic order: me/a < me/b < others/a < others/b.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
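The bucket ordering asserted above can be illustrated with plain UTF-8 bytes. This is a simplification: `tag` and its 0x00 separator are assumptions for illustration, not the real encoder's byte format, but unsigned lexicographic comparison orders the four buckets the same way.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class BucketOrderSketch {
    // Encode (permission, memo) in declared field order; a 0x00 separator
    // keeps same-permission siblings sharing a longer common prefix than
    // cross-permission pairs.
    static byte[] tag(String permission, String memo) {
        byte[] p = permission.getBytes(StandardCharsets.UTF_8);
        byte[] m = memo.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[p.length + 1 + m.length];
        System.arraycopy(p, 0, out, 0, p.length);
        out[p.length] = 0x00;
        System.arraycopy(m, 0, out, p.length + 1, m.length);
        return out;
    }

    // Byte-wise unsigned comparison, as an ASC-encoded rowkey would sort.
    static int cmp(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }
}
```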
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
This reverts commit 99dc9ec, reversing
changes made to 5f351b2.
