Symptom
ci/circleci: examples-windows (and examples-linux on most runs) has been failing on every main commit since 027d38ef (PR #1887, merged 2026-05-26 05:16 UTC) — 9 consecutive reds at the time of filing, no greens since.
Both jobs fail at the same step, "Rust Git Dataflow example" (cargo run --example rust-dataflow-git), with a CircleCI no_output_timeout: 30m cap:
[success ] 19.7m Build examples + CLI binary
[success ] 10.4m Rust Dataflow example
[timedout] 28.9m Rust Git Dataflow example ← here
[timedout] 0.0m Build timed out
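Sample failing runs:
- examples-windows job 12101 on commit 9f4242b6 (#1884 merge, "docs: add debugging error lookup table"): https://app.circleci.com/workflow/54012652-3e29-4d73-a288-89f95b3bfe7c
- examples-linux job 12115, same workflow. The Linux side takes 48.8m before the job-level wall fires.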
Bisect
| Commit | PR | examples-windows |
|---|---|---|
| 536c5116 | #1918 | ✅ SUCCESS (last green) |
| 027d38ef | #1887 "Stop properly on stop message" | ❌ first red |
| 7aad98c7 | #1888 | ❌ |
| 4c376497 | #1929 | ❌ |
| 8e2f0233 | #1931 | ❌ |
| 630eec2b | #1935 | ❌ |
| eab18ab0 | #1787 (zenoh-routing rewrite) | ❌ |
| 2a69ad72 | #1941 | ❌ |
| f8aebe18 | #1883 | ❌ |
| 9f4242b6 | #1884 | ❌ |
Root cause hypothesis
examples/rust-dataflow-git/dataflow.yml pins each node to a specific dora commit via the rev: key:
nodes:
  - id: rust-node
    git: https://github.com/dora-rs/dora.git
    rev: 10cf7fe9c082caaa90679bcca48c873cdc16311b
    ...
10cf7fe9 is from 2026-04-28 — 74 commits behind current main. The file's own comment names exactly this scenario:
Smoke-tests git-sourced nodes. Pins to a dora commit (not a released tag) so the example runs against matching message-format versions without needing a release. Update rev: when a message-format-breaking change lands on main — otherwise the CI job catches the mismatch and signals a compatibility break, which is the whole point of this test.
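Between 10cf7fe9 and current main, message-format-adjacent files have churned heavily (1598 ins, 349 del across libraries/message/ and node/daemon source). The most likely culprits:
- eab18ab0 (#1787, "node: route all data via zenoh, switch to callback subscribers"): substantial node-to-daemon protocol rewrite. Landed 2026-05-26 17:20 UTC.
- 81ba1bc1 (#1909, "fix(daemon): treat SIGTERM-induced exit during planned stop as clean", closes #1882) onward: node_to_daemon.rs gained 9 lines, running_dataflow.rs 52, spawn/prepared.rs 32.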
That 027d38ef is the first red is most likely coincidental: it is simply the first merge to land after the last green run on 5/24. The drift was already accumulating, and the pin reached the breaking threshold around then. PR #1887 itself only touched the README, the Python sender, and smoke tests, so it almost certainly isn't the direct cause despite its bisect position.
Why it's slipped through the merge queue
Branch protection on main requires only these checks: Format, Clippy, Check, Typos, Audit (cargo-audit + cargo-deny), Unwrap budget, License check. examples-windows and examples-linux aren't in the required list, so Trunk Merge Queue doesn't gate on them. PRs land regardless.
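To confirm which jobs the merge queue actually gates on, the required-check list can be compared against the job names that matter. The sketch below is only an illustration: missing_required is a hypothetical helper, and the context names passed to it (such as "ci/circleci: examples-windows") are assumed to match how CircleCI reports them on this repo.

```shell
#!/bin/sh
# missing_required (hypothetical helper): reads the required check names on
# stdin, one per line, and prints every argument that is NOT in that list.
missing_required() {
    required=$(cat)
    for job in "$@"; do
        # -x: match the whole line, -F: treat the job name as a fixed string.
        printf '%s\n' "$required" | grep -qxF -- "$job" || printf '%s\n' "$job"
    done
}

# Example (not run here; assumes the GitHub CLI is installed and the token
# is allowed to read branch protection):
#   gh api repos/dora-rs/dora/branches/main/protection/required_status_checks \
#       --jq '.contexts[]' \
#     | missing_required "ci/circleci: examples-windows" "ci/circleci: examples-linux"
```

Any name it prints is a job that can go red without blocking a merge.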
Reproduction
# On current main (or any commit since 027d38ef):
cargo run --example rust-dataflow-git
# Hangs indefinitely; nodes built from pinned rev 10cf7fe9 don't exit.
On Linux it hangs for ~48m before hitting the CircleCI wall; on Windows, ~29m. The hang itself probably starts much earlier; the no-output-timeout is what eventually kills it.
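Since the example hangs rather than fails, a local reproduction is easier to work with under a time limit. A minimal sketch, assuming GNU coreutils timeout is available (it exits with status 124 when the limit expires); run_bounded is a hypothetical wrapper, and the 10m limit in the example is an arbitrary choice:

```shell
#!/bin/sh
# run_bounded (hypothetical helper): run a command under a time limit and
# report whether it exited by itself or had to be killed.
# Assumes GNU coreutils `timeout`, which exits with status 124 on expiry.
run_bounded() {
    limit=$1
    shift
    status=0
    timeout "$limit" "$@" || status=$?
    if [ "$status" -eq 124 ]; then
        echo "HANG: no exit within $limit"
    else
        echo "EXIT: status $status"
    fi
    return "$status"
}

# Example (not run here):
#   run_bounded 10m cargo run --example rust-dataflow-git
```

Note that timeout only signals the command it launched, so any node processes spawned by the example may still need to be cleaned up by hand.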
Proposed fixes
Short-term (unblock CI immediately)
Bump rev: in examples/rust-dataflow-git/dataflow.yml to a current main commit (e.g., 9f4242b6 or whatever's latest at fix time). One-line change in YAML. CI should go green on the next push.
If post-bump CI is STILL red at the same step, that confirms the regression isn't purely message-format drift and there's a real bug to chase — but I expect the bump alone is enough.
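The bump itself can be scripted so that every rev: line in the file moves together. This is a sketch under assumptions: bump_rev is a hypothetical helper, and it assumes each pin is written as a full 40-character lowercase SHA on its own rev: line, as in the excerpt above.

```shell
#!/bin/sh
# bump_rev (hypothetical helper): rewrite every `rev: <40-hex-sha>` line in
# a dataflow file to point at a new commit.
bump_rev() {
    file=$1
    new_rev=$2
    # Reject anything that is not a full 40-character lowercase hex SHA.
    case $new_rev in
        *[!0-9a-f]*) echo "not a hex SHA: $new_rev" >&2; return 1 ;;
    esac
    [ "${#new_rev}" -eq 40 ] || { echo "need a full 40-char SHA" >&2; return 1; }
    tmp=$(mktemp) || return 1
    # Keep the indentation and the `rev:` key, swap only the SHA.
    sed -E "s/^([[:space:]]*rev:[[:space:]]*)[0-9a-f]{40}[[:space:]]*\$/\\1${new_rev}/" \
        "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Write back in place so the original file's permissions are kept.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Example (not run here):
#   bump_rev examples/rust-dataflow-git/dataflow.yml "$(git rev-parse origin/main)"
```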
Medium-term (prevent recurrence)
Either:
1. Add examples-windows (or at least the examples-linux subset) to the required-checks list for main. This ensures Trunk Merge Queue gates on it. Risk: any flake blocks all PRs.
2. Set up a scheduled job that runs cargo run --example rust-dataflow-git against HEAD of main once per day and auto-bumps the pin on success. Removes the manual maintenance step the file's comment relies on.
3. Document in CONTRIBUTING.md (or the dataflow.yml comment more prominently) that any PR touching libraries/message/, apis/rust/node/src/, or daemon protocol code MUST bump the pin in the same PR. Trusts the contributor, costs nothing structurally.
(1) is the cheapest to implement but adds load to every PR. (3) is the cheapest in CI cost but trusts process. (2) is the most automated.
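If option (3) is chosen, the rule could also be backed by a cheap mechanical check instead of relying on documentation alone. The sketch below is an assumption-laden illustration, not an existing script: check_pin_bumped is a hypothetical helper, it only covers the two paths named above (the daemon protocol paths are not listed here, so they are left out), and it treats "dataflow.yml was modified" as a proxy for "the pin was bumped".

```shell
#!/bin/sh
# check_pin_bumped (hypothetical helper): reads changed file paths on stdin,
# one per line, and fails if protocol code changed without the pin file.
check_pin_bumped() {
    touched_protocol=0
    touched_pin=0
    while IFS= read -r path; do
        case $path in
            libraries/message/*|apis/rust/node/src/*) touched_protocol=1 ;;
            examples/rust-dataflow-git/dataflow.yml) touched_pin=1 ;;
        esac
    done
    if [ "$touched_protocol" -eq 1 ] && [ "$touched_pin" -eq 0 ]; then
        echo "protocol paths changed but the rev: pin was not bumped" >&2
        return 1
    fi
    return 0
}

# Example (not run here):
#   git diff --name-only origin/main...HEAD | check_pin_bumped
```

Because it only inspects file names, this runs in well under a second and would add little load to the required checks.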
Severity
Medium. The bug is in CI-only test infrastructure; doesn't affect users. But:
- Every PR's CI looks "red" on examples-windows/linux, which trains maintainers to ignore those statuses (bad habit).
- The whole point of this example is to catch message-format breaks. A persistently-red detector that everyone tunes out is worse than no detector.
- After 9 consecutive reds without anyone addressing it, the "is the test wrong, or is the code wrong?" question gets harder to answer for the next real break.
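Acceptance criteria
- examples/rust-dataflow-git/dataflow.yml rev: updated to a current main commit
- examples-windows and examples-linux ✅
- CONTRIBUTING.md snippet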
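Related
- 536c5116 (#1918, "ci: fix test-c-cpp-libraries.yml glob escape for c++ path", 2026-05-24)
- 027d38ef (#1887, "Stop properly on stop message", 2026-05-26)
- 10cf7fe9 was last bumped by 5dccae78 (#1780, "Fix git dataflow example", pre-1.0 consolidation)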
cc @phil-opp (author of #1887, also touched the python-dataflow sender — though the bisect almost certainly fingers the wrong PR)