Commit b91f42b

ADR-53: JetStream Read-after-Write
Signed-off-by: Maurice van Veen <[email protected]>
1 parent bd3db72 commit b91f42b

File tree

5 files changed

+183
-21
lines changed

README.md

Lines changed: 7 additions & 0 deletions
@@ -21,6 +21,7 @@ This repository captures Architecture, Design Specifications and Feature Guidanc
2121
|[ADR-49](adr/ADR-49.md)|jetstream, spec, 2.12|JetStream Distributed Counter CRDT|
2222
|[ADR-50](adr/ADR-50.md)|jetstream, server, client, 2.12|JetStream Batch Publishing|
2323
|[ADR-52](adr/ADR-52.md)|jetstream, client, refinement, 2.12|No Headers support for Direct Get (updating [ADR-31](adr/ADR-31.md))|
24+
|[ADR-53](adr/ADR-53.md)|jetstream, kv, objectstore, server, client, refinement, 2.12|JetStream Read-after-Write (updating [ADR-8](adr/ADR-8.md), [ADR-17](adr/ADR-17.md), [ADR-20](adr/ADR-20.md), [ADR-31](adr/ADR-31.md), [ADR-37](adr/ADR-37.md))|
2425

2526
## Client
2627

@@ -56,6 +57,7 @@ This repository captures Architecture, Design Specifications and Feature Guidanc
5657
|[ADR-48](adr/ADR-48.md)|jetstream, client, kv, refinement, 2.11|TTL Support for Key-Value Buckets (updating [ADR-8](adr/ADR-8.md))|
5758
|[ADR-50](adr/ADR-50.md)|jetstream, server, client, 2.12|JetStream Batch Publishing|
5859
|[ADR-52](adr/ADR-52.md)|jetstream, client, refinement, 2.12|No Headers support for Direct Get (updating [ADR-31](adr/ADR-31.md))|
60+
|[ADR-53](adr/ADR-53.md)|jetstream, kv, objectstore, server, client, refinement, 2.12|JetStream Read-after-Write (updating [ADR-8](adr/ADR-8.md), [ADR-17](adr/ADR-17.md), [ADR-20](adr/ADR-20.md), [ADR-31](adr/ADR-31.md), [ADR-37](adr/ADR-37.md))|
5961

6062
## Jetstream
6163

@@ -87,6 +89,7 @@ This repository captures Architecture, Design Specifications and Feature Guidanc
8789
|[ADR-49](adr/ADR-49.md)|jetstream, spec, 2.12|JetStream Distributed Counter CRDT|
8890
|[ADR-50](adr/ADR-50.md)|jetstream, server, client, 2.12|JetStream Batch Publishing|
8991
|[ADR-52](adr/ADR-52.md)|jetstream, client, refinement, 2.12|No Headers support for Direct Get (updating [ADR-31](adr/ADR-31.md))|
92+
|[ADR-53](adr/ADR-53.md)|jetstream, kv, objectstore, server, client, refinement, 2.12|JetStream Read-after-Write (updating [ADR-8](adr/ADR-8.md), [ADR-17](adr/ADR-17.md), [ADR-20](adr/ADR-20.md), [ADR-31](adr/ADR-31.md), [ADR-37](adr/ADR-37.md))|
9093

9194
## Kv
9295

@@ -95,13 +98,15 @@ This repository captures Architecture, Design Specifications and Feature Guidanc
9598
|[ADR-8](adr/ADR-8.md)|jetstream, client, kv, spec|JetStream based Key-Value Stores|
9699
|[ADR-19](adr/ADR-19.md)|jetstream, client, kv, objectstore|API prefixes for materialized JetStream views|
97100
|[ADR-48](adr/ADR-48.md)|jetstream, client, kv, refinement, 2.11|TTL Support for Key-Value Buckets (updating [ADR-8](adr/ADR-8.md))|
101+
|[ADR-53](adr/ADR-53.md)|jetstream, kv, objectstore, server, client, refinement, 2.12|JetStream Read-after-Write (updating [ADR-8](adr/ADR-8.md), [ADR-17](adr/ADR-17.md), [ADR-20](adr/ADR-20.md), [ADR-31](adr/ADR-31.md), [ADR-37](adr/ADR-37.md))|
98102

99103
## Objectstore
100104

101105
|Index|Tags|Description|
102106
|-----|----|-----------|
103107
|[ADR-19](adr/ADR-19.md)|jetstream, client, kv, objectstore|API prefixes for materialized JetStream views|
104108
|[ADR-20](adr/ADR-20.md)|jetstream, client, objectstore, spec|JetStream based Object Stores|
109+
|[ADR-53](adr/ADR-53.md)|jetstream, kv, objectstore, server, client, refinement, 2.12|JetStream Read-after-Write (updating [ADR-8](adr/ADR-8.md), [ADR-17](adr/ADR-17.md), [ADR-20](adr/ADR-20.md), [ADR-31](adr/ADR-31.md), [ADR-37](adr/ADR-37.md))|
105110

106111
## Observability
107112

@@ -122,6 +127,7 @@ This repository captures Architecture, Design Specifications and Feature Guidanc
122127
|-----|----|-----------|
123128
|[ADR-48](adr/ADR-48.md)|jetstream, client, kv, refinement, 2.11|TTL Support for Key-Value Buckets (updating [ADR-8](adr/ADR-8.md))|
124129
|[ADR-52](adr/ADR-52.md)|jetstream, client, refinement, 2.12|No Headers support for Direct Get (updating [ADR-31](adr/ADR-31.md))|
130+
|[ADR-53](adr/ADR-53.md)|jetstream, kv, objectstore, server, client, refinement, 2.12|JetStream Read-after-Write (updating [ADR-8](adr/ADR-8.md), [ADR-17](adr/ADR-17.md), [ADR-20](adr/ADR-20.md), [ADR-31](adr/ADR-31.md), [ADR-37](adr/ADR-37.md))|
125131

126132
## Security
127133

@@ -160,6 +166,7 @@ This repository captures Architecture, Design Specifications and Feature Guidanc
160166
|[ADR-43](adr/ADR-43.md)|jetstream, client, server, 2.11|JetStream Per-Message TTL|
161167
|[ADR-44](adr/ADR-44.md)|jetstream, server, 2.11|Versioning for JetStream Assets|
162168
|[ADR-50](adr/ADR-50.md)|jetstream, server, client, 2.12|JetStream Batch Publishing|
169+
|[ADR-53](adr/ADR-53.md)|jetstream, kv, objectstore, server, client, refinement, 2.12|JetStream Read-after-Write (updating [ADR-8](adr/ADR-8.md), [ADR-17](adr/ADR-17.md), [ADR-20](adr/ADR-20.md), [ADR-31](adr/ADR-31.md), [ADR-37](adr/ADR-37.md))|
163170

164171
## Spec
165172

adr-template.md

Lines changed: 0 additions & 4 deletions
@@ -24,10 +24,6 @@
2424

2525
[If this is a specification or actual design, write something here.]
2626

27-
## Decision
28-
29-
[Maybe this was just an architectural decision...]
30-
3127
## Consequences
3228

3329
[Any consequences of this design, such as breaking change or Vorpal Bunnies]

adr/ADR-31.md

Lines changed: 20 additions & 13 deletions
@@ -7,11 +7,12 @@
77
| Status | Implemented |
88
| Tags | jetstream, client, server, 2.11 |
99

10-
| Revision | Date | Author | Info |
11-
|----------|------------|------------|----------------------------------------------------------|
12-
| 1 | 2022-08-08 | @tbeets | Initial design |
13-
| 2 | 2024-03-06 | @ripienaar | Adds Multi and Batch behaviors for Server 2.11 |
14-
| 3 | 2025-06-19 | @ripienaar | Support surpressing headers in replies using `NoHeaders` |
10+
| Revision | Date | Author | Info | Refinement | Server Requirement |
11+
|----------|------------|-----------------|----------------------------------------------------------|------------|--------------------|
12+
| 1 | 2022-08-08 | @tbeets | Initial design | | |
13+
| 2 | 2024-03-06 | @ripienaar | Adds Multi and Batch behaviors for Server 2.11 | | |
14+
| 3 | 2025-06-19 | @ripienaar | Support suppressing headers in replies using `NoHeaders` | | |
15+
| 4 | 2025-07-11 | @MauriceVanVeen | Update on Read-after-Write guarantee | ADR-53 | |
1516

1617
## Context and motivation
1718

@@ -42,14 +43,20 @@ clients. Also, read availability can be enhanced as mirrors may be available to
4243

4344
###### A note on read-after-write coherency
4445

45-
The existing Get API `$JS.API.STREAM.MSG.GET.<stream>` provides read-after-write coherency by routing requests to a
46-
stream's current peer leader (R>1) or single server (R=1). A client that publishes a message to stream (with ACK) is
47-
assured that a subsequent call to the Get API will return that message as the read will go a server that defines
48-
_most current_.
46+
The existing Get API `$JS.API.STREAM.MSG.GET.<stream>` as well as _Direct Get_ do NOT provide any read-after-write
47+
guarantees by default. The existing Get API only guarantees read-after-write if the underlying stream is not
48+
replicated (R=1).
4949

50-
In contrast, _Direct Get_ does not assure read-after-write coherency as responders may be non-leader stream servers
51-
(that may not have yet applied the latest consensus writes) or MIRROR downstream servers that have not yet _consumed_
52-
the latest consensus writes from upstream.
50+
_Direct Get_ does not assure read-after-write coherency as responders may be non-leader stream servers (that may not
51+
have yet applied the latest consensus writes) or MIRROR downstream servers that have not yet _consumed_ the latest
52+
consensus writes from upstream.
53+
54+
The Get API routes requests to a stream's current peer leader (R>1). A client that publishes multiple messages to a
55+
stream (with ACK) is assured that they will be properly ordered by sequence, regardless of which peer leader was active
56+
at that time. However, during and after leader elections, calls to the Get API could still be served by a server that
57+
still thinks it's the leader even if a new leader was elected in the meantime (but it doesn't know yet).
58+
59+
Read-after-write guarantees can be opted into with [ADR-53](ADR-53.md).
5360

5461
## Implementation
5562

@@ -61,7 +68,7 @@ the latest consensus writes from upstream.
6168
based on `max_msgs_per_subject`
6269

6370
> Allow Direct is set automatically based on the inferred use case of the stream. Maximum messages per subject is a
64-
tell-tale of a stream that is a KV bucket.
71+
> tell-tale of a stream that is a KV bucket.
6572
6673
### Direct Get API
6774

adr/ADR-53.md

Lines changed: 151 additions & 0 deletions
@@ -0,0 +1,151 @@
1+
# JetStream Read-after-Write
2+
3+
| Metadata | Value |
4+
|----------|--------------------------------------------------------------|
5+
| Date | 2025-07-11 |
6+
| Author | @MauriceVanVeen |
7+
| Status | Proposed |
8+
| Tags | jetstream, kv, objectstore, server, client, refinement, 2.12 |
9+
| Updates | ADR-8, ADR-17, ADR-20, ADR-31, ADR-37 |
10+
11+
| Revision | Date | Author | Info |
12+
|----------|------------|-----------------|----------------|
13+
| 1 | 2025-07-11 | @MauriceVanVeen | Initial design |
14+
15+
## Problem Statement
16+
17+
JetStream does NOT support read-after-write or monotonic reads. This can be especially problematic when
18+
using [ADR-8 JetStream based Key-Value Stores](ADR-8.md), primarily but not limited to the use of _Direct Get_.
19+
20+
Specifically, we have no way to guarantee a write like `kv.Put` can be observed by a subsequent `kv.Get` or `kv.Watch`,
21+
especially when the KV/stream is replicated or mirrored.
22+
23+
## Context
24+
25+
The topic of immediate consistency within NATS JetStream can sometimes be a bit confusing. In our docs we claim we
26+
maintain immediate consistency (as opposed to eventual consistency) even in the face of failures. Which is true, but
27+
as with anything, it depends.
28+
29+
- **Monotonic writes**, all writes to a single stream (replicated or not) are monotonic. It's ordered regardless of
30+
publisher by the stream sequence.
31+
- **Monotonic reads**, if you're using consumers. All reads for a consumer (replicated or not) are monotonic. It's
32+
ordered by consumer delivery sequence. (Messages can be redelivered on failure, but this also depends on which
33+
settings are used)
34+
35+
Those paths are immediately consistent, but they are not immediately consistent with respect to each other. This is no
36+
problem for publishers and consumers of a stream, because they observe all operations to be monotonic.
37+
But, if you use the KV abstraction for example, you're more often going to use single message gets through `kv.Get`.
38+
Since those rely on `DirectGet`, even followers can answer, which means we (by default) can't guarantee read-after-write
39+
or even monotonic reads. Such message GET requests get served randomly by all servers within the peer group (or even
40+
mirrors if enabled). Those obviously can't be made immediately consistent, since both replication and mirroring are
41+
async.
42+
43+
Also, when following up a `kv.Create` with `kv.Keys`, you might expect read-after-write such that the returned keys
44+
contain the key you've just written. This also requires read-after-write.
45+
46+
## Design
47+
48+
Before sharing the proposed design, let's look at an alternative. Read-after-write could be achieved by having reads (on
49+
an opt-in basis) go through Raft replication first. This has several disadvantages:
50+
51+
- Reads will become significantly slower, due to requiring replication first.
52+
- Reads require quorum, due to replication, disallowing any reads when there's downtime or temporarily no leader.
53+
- Only the stream leader can answer reads, as it is the first one to know that it can answer the request. (Followers
54+
replicate asynchronously, so letting them answer would make the response take even longer to return.)
55+
- Mirrors can still answer `DirectGet` requests, the transparency of mirrors answering read requests will violate any
56+
read-after-write guarantees (as the client will not know). This would mean mirrors must not be enabled if this
57+
guarantee should be kept.
58+
- Read-after-write guarantees could temporarily be violated when scaling streams up or down.
59+
- This is not a compatible approach for consumers, meaning they could not have these guarantees based on this approach.
60+
It would require limiting consumer creation to R1 on the stream leader, which is not possible since the assignment is
61+
done by the meta leader that has no knowledge about the stream leader. A replicated consumer could violate the
62+
requirement if the consumer leader changes to an outdated follower in between, and it would not work at all when creating
63+
a consumer on a mirrored stream.
64+
65+
Although having reads be served through Raft does (mostly) offer a strong guarantee of read-after-write and monotonic
66+
reads, the disadvantages outweigh the advantages. Ideally, the solution has the following advantages:
67+
68+
- It's explicitly defined, either in configuration or in code.
69+
- Works for both replicated and non-replicated streams. (Scale up/down has no influence, and implementation is not
70+
replication-specific)
71+
- Incurs no slowdown, just as fast as reads that don't guarantee read-after-write (no prior replication required).
72+
- Let followers, and even mirrors, answer read requests as long as they can make the guarantee.
73+
- Let followers, and mirrors, inform the client when they can't make the guarantee. The guarantee is always kept, but
74+
an error is returned that can be retried (to get a successful read). This can be tuned by disabling reads on mirrors
75+
or followers.
76+
77+
Now, on to the proposed design which has the above advantages.
78+
79+
The write and read paths remain eventually consistent, as they are now. But one can opt in to immediate consistency to
80+
guarantee read-after-write and monotonic reads, for both direct/msg read requests as well as consumers.
81+
82+
- **Read-after-write** is achieved because all writes through `js.Publish`, `kv.Put`, etc. return the sequence
83+
(inherently last sequence) of the stream. In `DirectGet` requests those observed last sequences can be used for read
84+
requests.
85+
- **Monotonic reads** is achieved by collecting the highest sequence seen in read requests and using that sequence for
86+
subsequent read requests.
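
As a minimal sketch of the client-side bookkeeping described above: the client keeps the highest sequence it has observed, whether from a write ack or from a read, and sends that value as `MinLastSeq` on the next read. The `seqTracker` type and its methods are hypothetical names used for illustration only. They are not part of this ADR or of any NATS client, and the sketch is not safe for concurrent use.

```go
package main

import "fmt"

// seqTracker is a hypothetical client-side helper. It remembers the highest
// stream sequence this client has observed so far.
type seqTracker struct {
	minLastSeq uint64
}

// Observe records a sequence returned by a write ack or seen in a read reply.
// Lower sequences never move the tracked value backwards.
func (t *seqTracker) Observe(seq uint64) {
	if seq > t.minLastSeq {
		t.minLastSeq = seq
	}
}

// MinLastSeq returns the value to attach to the next read request.
func (t *seqTracker) MinLastSeq() uint64 {
	return t.minLastSeq
}

func main() {
	var t seqTracker
	t.Observe(10) // a write was acked with stream sequence 10
	t.Observe(7)  // a read returned an older sequence; ignored
	t.Observe(12) // a later read observed sequence 12
	fmt.Println(t.MinLastSeq()) // prints 12
}
```

Feeding write acks into the tracker gives read-after-write, and feeding read replies into it gives monotonic reads, so one value can cover both guarantees.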
87+
88+
This can be implemented with an additional `MinLastSeq` field in `JSApiMsgGetRequest` and `ConsumerConfig`.
89+
90+
- This ensures the server only replies with data if it can actually 100% guarantee immediate consistency. This is done
91+
  by confirming that the `LastSeq` it has for its local stream is at least the `MinLastSeq` specified.
92+
- Side-note: although `MsgGet` is only answered by the leader, technically an old leader could still respond and serve
93+
stale reads. Although this shouldn't happen often in practice, until now we couldn't guarantee it. The error can be
94+
detected on the old leader, and it can delay the error response, allowing for the real leader to send the actual
95+
answer.
96+
- Followers that can't satisfy the `MinLastSeq` redirect the request to the leader for it to answer instead. This allows
97+
followers to still serve reads and share the load if they can, but if they can't, they defer to the leader to not
98+
require a client to retry on what would otherwise be an error.
99+
- Mirrors reject the read request if they can't satisfy the `MinLastSeq`. But can serve reads and share the load
100+
otherwise. Mirrors don't redirect requests to a leader, not even to the stream leader if the mirror is replicated.
101+
- Leaders/followers/mirrors don't reject a request immediately, but delay this error response to make sure clients don't
102+
spam these requests while allowing the underlying resources to try and become up-to-date enough in the meantime.
103+
- Rejected read requests have the error code returned as a header, e.g. `NATS/1.0 412 Min Last Sequence`.
104+
- Consumers don't start delivering messages until the `MinLastSeq` is reached, and don't reject the consumer creation.
105+
This allows consumers to be created successfully, even on outdated followers or mirrors, while waiting to ensure
106+
`pending` counts are correct when following up `kv.Create` with `kv.Keys` for example.
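
The per-role behaviour for read requests listed above can be summarised as a small decision function. This is only a sketch of the rules as stated in this ADR, not server code: the `role`, `outcome`, and `decide` names are invented for illustration, and the delay applied before an error response is not modelled.

```go
package main

import "fmt"

// role is the position of the answering server relative to the stream.
type role int

const (
	leader role = iota
	follower
	mirror
)

// outcome is what the server does with a read request.
type outcome string

const (
	serve    outcome = "serve"
	redirect outcome = "redirect to leader"
	reject   outcome = "412 Min Last Sequence"
)

// decide models how a server whose local stream is at lastSeq handles a read
// request carrying minLastSeq. A minLastSeq of 0 means no guarantee was asked for.
func decide(r role, lastSeq, minLastSeq uint64) outcome {
	if lastSeq >= minLastSeq {
		return serve // local state is recent enough
	}
	if r == follower {
		return redirect // let the leader answer instead of failing the client
	}
	// Stale leaders and mirrors reject; per the ADR this error is delayed.
	return reject
}

func main() {
	fmt.Println(decide(follower, 9, 10)) // redirect to leader
	fmt.Println(decide(mirror, 9, 10))   // 412 Min Last Sequence
	fmt.Println(decide(mirror, 10, 10))  // serve
}
```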
107+
108+
In terms of API, it can look like this:
109+
110+
```go
111+
// Write
112+
r, err := kv.Put(ctx, "key", []byte("value"))
113+
114+
// Read request
115+
kve, err := kv.Get(ctx, "key", jetstream.MinLastRevision(r))
116+
117+
// Watch/consumer
118+
kl, err := kv.ListKeys(ctx, jetstream.MinLastRevision(r))
119+
```
120+
121+
By specifying the `MinLastRevision` (or `MinLastSequence` when using a stream normally), you can be sure your read
122+
request will be rejected if it can't be satisfied, or the follower/mirror will wait to deliver you messages from
123+
the consumer until it's up-to-date. Followers redirect requests, that would otherwise error, to the leader to not
124+
require the client to retry in these cases.
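
When a mirror rejects a read, the client is expected to retry. The sketch below simulates that loop under stated assumptions: `mirror`, `get`, `getWithRetry`, and `errMinLastSeq` are hypothetical names, the mirror catches up by one message per rejected attempt purely to keep the example deterministic, and a real client would also wait between attempts.

```go
package main

import (
	"errors"
	"fmt"
)

// errMinLastSeq stands in for a reply carrying "NATS/1.0 412 Min Last Sequence".
var errMinLastSeq = errors.New("412 Min Last Sequence")

// mirror is a toy model of a mirror that lags behind its upstream stream.
type mirror struct {
	lastSeq     uint64
	upstreamSeq uint64
}

// get rejects the read while the mirror is behind minLastSeq. Each rejected
// call also advances the mirror by one message to mimic asynchronous catch-up.
func (m *mirror) get(minLastSeq uint64) (uint64, error) {
	if m.lastSeq < minLastSeq {
		if m.lastSeq < m.upstreamSeq {
			m.lastSeq++
		}
		return 0, errMinLastSeq
	}
	return m.lastSeq, nil
}

// getWithRetry retries rejected reads up to maxAttempts times and reports how
// many attempts were made.
func getWithRetry(m *mirror, minLastSeq uint64, maxAttempts int) (uint64, int, error) {
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		seq, err := m.get(minLastSeq)
		if err == nil {
			return seq, attempt, nil
		}
		if !errors.Is(err, errMinLastSeq) {
			return 0, attempt, err // not retryable
		}
	}
	return 0, maxAttempts, errMinLastSeq
}

func main() {
	m := &mirror{lastSeq: 3, upstreamSeq: 5}
	seq, attempts, err := getWithRetry(m, 5, 5)
	fmt.Println(seq, attempts, err) // prints: 5 3 <nil>
}
```

The guarantee holds on every attempt: the read either fails with the retryable error or returns data at or beyond the requested sequence.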
125+
126+
This satisfies read-after-write and monotonic reads when combining the write and read paths, as well as when only
127+
performing reads.
128+
129+
### A note about message deletion and purges
130+
131+
JetStream allows in-place deletion of messages through a "message delete" or "purge" request. These don't write new
132+
messages, and thus don't increase the last sequence. This means there are no read-after-write or monotonic read guarantees after a
133+
message is deleted or purged. For example, after deleting a message or purging the stream, multiple requests can flip
134+
between returning the original messages and returning them as deleted.
135+
136+
Although this is a downside of the proposed approach, guarantees across deletes and purges could only be supported when using a replicated stream that's not mirrored, which
137+
would be too restrictive. Whereas with the proposed approach, all followers and mirrors can contribute to providing the
138+
guarantee, regardless of replication or topology (which is valued more highly).
139+
140+
When deleting or purging messages is still desired AND you want to rely on read-after-write or monotonic reads, rollups
141+
can be used instead. The `Nats-Rollup` header can be used to purge messages where the subject equals, or purge the whole
142+
stream. Because a rollup message increases the last sequence, these guarantees can be relied upon again. However, the
143+
client application will need to interpret this rollup message as a "delete/purge" similar to how KV uses delete and
144+
purge markers. Therefore, the KV abstraction still has these guarantees since it places a new message for its
145+
`kv.Delete` and uses a rollup message for its `kv.Purge`.
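
The difference between an in-place delete and a rollup can be shown with a toy in-memory stream. This is a simplified model for illustration only: the `stream` type and its methods are hypothetical, and real rollups are requested with the `Nats-Rollup` header rather than a method call.

```go
package main

import "fmt"

// stream is a toy model that tracks only sequences and subjects.
type stream struct {
	lastSeq uint64
	msgs    map[uint64]string // sequence -> subject
}

func newStream() *stream {
	return &stream{msgs: make(map[uint64]string)}
}

// publish appends a message and returns its sequence.
func (s *stream) publish(subject string) uint64 {
	s.lastSeq++
	s.msgs[s.lastSeq] = subject
	return s.lastSeq
}

// deleteMsg removes a message in place. The last sequence does not change,
// so a reader has no sequence to wait for.
func (s *stream) deleteMsg(seq uint64) {
	delete(s.msgs, seq)
}

// rollup purges earlier messages on the subject and then stores a new message,
// which advances the last sequence.
func (s *stream) rollup(subject string) uint64 {
	for seq, subj := range s.msgs {
		if subj == subject {
			delete(s.msgs, seq)
		}
	}
	return s.publish(subject)
}

func main() {
	s := newStream()
	first := s.publish("orders.1")
	s.publish("orders.1")

	s.deleteMsg(first)
	fmt.Println("last sequence after delete:", s.lastSeq) // still 2

	marker := s.rollup("orders.1")
	fmt.Println("last sequence after rollup:", marker) // 3
}
```

Because the rollup returns a new sequence, a client can pass it as `MinLastSeq` and be sure that any server answering has already applied the purge.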
146+
147+
## Consequences
148+
149+
Since this is an opt-in on a read request or consumer create basis, this is not a breaking change. Depending on client
150+
implementation, this could be harder to implement. But given it's just another field in the `JSApiMsgGetRequest` and
151+
`ConsumerConfig`, each client should have no trouble supporting it.

adr/ADR-8.md

Lines changed: 5 additions & 4 deletions
@@ -22,6 +22,7 @@
2222
| 7 | 2025-01-23 | Add Max Age limit Markers, remove non direct gets | ADR-48 | 2.11.0 |
2323
| 8 | 2025-02-17 | Add Metadata | | 2.10.0 |
2424
| 9 | 2025-04-09 | Document max_age and duplicate_window requirements | | |
25+
| 10 | 2025-07-11 | Update on Read-after-Write guarantee | ADR-53 | |
2526

2627
## Context
2728

@@ -291,12 +292,12 @@ The features to support KV is in NATS Server 2.6.0.
291292

292293
#### Consistency Guarantees
293294

294-
We do not provide read-after-write consistency. Reads are performed directly to any replica, including out
295-
of date ones. If those replicas do not catch up multiple reads of the same key can give different values between
296-
reads. If the cluster is healthy and performing well most reads would result in consistent values, but this should not
295+
We do not provide read-after-write consistency by default. Reads are performed directly to any replica, including
296+
out-of-date ones. If those replicas do not catch up, multiple reads of the same key can give different values between
297+
reads. If the cluster is healthy and performing well, most reads would result in consistent values, but this should not
297298
be relied on to be true.
298299

299-
Historically we had read-after-write consistency, this has been deprecated and retained here for historical record only.
300+
Read-after-write guarantees can be opted into with [ADR-53](ADR-53.md).
300301

301302
#### Buckets
302303
