26 changes: 26 additions & 0 deletions docs/debugging.md
@@ -8,6 +8,7 @@ This guide covers how to debug, record, replay, and monitor dora dataflows. It i

- [Prerequisites](#prerequisites)
- [Quick Debugging Checklist](#quick-debugging-checklist)
- [Common Error Messages](#common-error-messages)
- [Record and Replay](#record-and-replay)
- [Recording a Dataflow](#recording-a-dataflow)
- [Recording Specific Topics](#recording-specific-topics)
@@ -113,6 +114,31 @@ dora record dataflow.yml -o debug-capture.drec

---

## Common Error Messages

Use this table as a first stop when a command or log line already contains a
recognizable error fragment. The "Fix or next step" column names the command
that usually gives the next useful signal.

| Error message (or fragment) | Likely cause | Fix or next step |
|---|---|---|
| `Could not connect to the daemon` | The node tried to connect to the local daemon port, but no daemon is listening or the port is wrong. | Start the runtime with `dora up`, or check `DORA_DAEMON_LOCAL_LISTEN_PORT` when using a custom port. |
| `failed to request node config from daemon` | A manually started node reached the daemon, but the daemon could not return the node configuration. | Start the node through `dora start`/`dora run`, or confirm the node ID exists in the running dataflow. |
| `failed to register node with dora-daemon` | The daemon rejected a node registration request. | Check `dora logs --daemon`, then confirm the node ID and dataflow ID match the current run. |
| `no running dataflow with ID ...` | A CLI command is using a stale dataflow ID from a previous run. | Run `dora list` and retry with the current dataflow ID. If old runs are still listed, stop them with `dora stop`. |
| `node ... not connected` | A command targeted a node that has not connected, has crashed, or has already shut down. | Run `dora node info -d <dataflow> <node>` and inspect `dora logs <dataflow> <node>`. |
| `node ... channel full` | The node is not draining control messages fast enough, so the daemon cannot enqueue another event. | Check `dora top` for queue pressure, inspect the node logs, and reduce the input rate or restart the node. |
| `node ... channel closed` | The node's control channel closed, usually because the node process exited. | Use `dora logs <dataflow> <node>` to find the crash or shutdown reason, then restart the node. |
| `failed to serialize param value for node ...` | A runtime parameter update could not be encoded for delivery to the target node. | Check the value passed to `dora param set`, then compare it with the parameter type shown by `dora param list`. |
| `unexpected ... reply` | The coordinator, daemon, or node API versions disagree about the expected protocol reply. | Confirm all binaries come from the same Dora build, then restart the coordinator, daemon, and dataflow. |
| `coordinator heartbeat timeout (20s)` | The daemon stopped receiving coordinator heartbeats. | Inspect coordinator and daemon logs for disconnects, then restart with `dora down` followed by `dora up`. |
| `there is already a running dataflow with ID ...` | A new start request reused an ID that is still active. | Run `dora list`, stop the existing dataflow if it is stale, or start with a different name/ID. |
| `failed to infer JSON schema` | JSON supplied to a topic or interactive input could not be mapped to an Arrow schema. | Check that the JSON is valid and homogeneous, or publish data with an explicit schema-aware producer. |
| `Arrow IPC stream contained no record batches` | A producer sent an empty or invalid Arrow IPC payload. | Verify the upstream node output and replay/recording file before debugging downstream consumers. |
| `zenoh publish failed` | Direct Zenoh publishing failed before the node could deliver data over the fast path. | Check Zenoh/network configuration and node logs; Dora may fall back to the daemon path for some publish failures. |

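If you triage many log lines at once, the table can be turned into a simple
lookup. The sketch below is a hypothetical helper, not part of Dora:
`KNOWN_ERRORS`, `fragment_to_regex`, and `triage` are illustrative names, the
hints are abbreviated from the table, and `...` in a fragment is assumed to
stand for an ID or other variable text.

```python
import re

# Hypothetical triage helper (not part of Dora). Each entry pairs an error
# fragment from the "Common Error Messages" table with an abbreviated next step.
# "..." in a fragment is treated as a placeholder for variable text (e.g. an ID).
KNOWN_ERRORS = [
    ("Could not connect to the daemon", "run `dora up`; check DORA_DAEMON_LOCAL_LISTEN_PORT"),
    ("failed to request node config from daemon", "start the node via `dora start`/`dora run`"),
    ("failed to register node with dora-daemon", "check `dora logs --daemon`"),
    ("no running dataflow with ID ...", "run `dora list` and use the current ID"),
    ("node ... not connected", "run `dora node info -d <dataflow> <node>`"),
    ("node ... channel full", "check `dora top` and the node logs"),
    ("node ... channel closed", "run `dora logs <dataflow> <node>`"),
    ("failed to serialize param value for node ...", "compare with `dora param list`"),
    ("unexpected ... reply", "use binaries from a single Dora build"),
    ("coordinator heartbeat timeout (20s)", "run `dora down`, then `dora up`"),
    ("there is already a running dataflow with ID ...", "run `dora list`"),
    ("failed to infer JSON schema", "check the JSON is valid and homogeneous"),
    ("Arrow IPC stream contained no record batches", "verify the upstream node output"),
    ("zenoh publish failed", "check Zenoh/network configuration"),
]


def fragment_to_regex(fragment):
    """Compile a table fragment, turning each "..." into a non-greedy wildcard."""
    parts = [re.escape(part) for part in fragment.split("...")]
    return re.compile(".+?".join(parts))


# Precompile once; order follows the table, and the first match wins.
PATTERNS = [(fragment_to_regex(frag), frag, hint) for frag, hint in KNOWN_ERRORS]


def triage(log_line):
    """Return (fragment, next step) for the first known error in the line, else None."""
    for pattern, fragment, hint in PATTERNS:
        if pattern.search(log_line):
            return fragment, hint
    return None


if __name__ == "__main__":
    print(triage("ERROR node `camera` channel full"))
```

Matching is substring-based, so surrounding log prefixes such as timestamps or
log levels do not prevent a match.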
---

## Record and Replay

Record captures live dataflow messages to a file. Replay replaces source nodes with recorded data, letting you reproduce behavior without hardware.
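Conceptually, a replayed source can be pictured as a stand-in node that
re-emits captured messages in timestamp order. The sketch below is a
hypothetical model for building intuition, not Dora's replay implementation or
the `.drec` file format: the `Recorded` layout, the `speed` factor, and the
choice to preserve the recorded gaps between messages are all assumptions.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class Recorded:
    """One captured message (hypothetical layout, not the .drec format)."""
    timestamp_ns: int  # capture time of the message
    output_id: str     # output the original source node produced it on
    payload: bytes


def replay_delays(records: Iterable[Recorded], speed: float = 1.0) -> List[Tuple[float, Recorded]]:
    """Pair each record with the delay in seconds to wait before emitting it.

    Assumption: replay keeps the recorded spacing between messages, divided by
    `speed` (2.0 replays twice as fast). The first message is emitted at once.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    ordered = sorted(records, key=lambda r: r.timestamp_ns)
    schedule = []
    previous_ns = None
    for record in ordered:
        gap_ns = 0 if previous_ns is None else record.timestamp_ns - previous_ns
        schedule.append((gap_ns / 1e9 / speed, record))
        previous_ns = record.timestamp_ns
    return schedule


def replay(records: Iterable[Recorded],
           emit: Callable[[str, bytes], None],
           speed: float = 1.0,
           sleep: Callable[[float], None] = time.sleep) -> None:
    """Stand-in source node: wait for each gap, then hand the payload to `emit`."""
    for delay, record in replay_delays(records, speed):
        sleep(delay)
        emit(record.output_id, record.payload)
```

Injecting `sleep` keeps the model testable without real waiting; downstream
consumers only see `emit` calls, which is the property that lets replay stand
in for hardware-backed sources.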