diff --git a/.dockerignore b/.dockerignore index 93a84bc3c9..fd9e6d0e35 100644 --- a/.dockerignore +++ b/.dockerignore @@ -34,3 +34,4 @@ accounts genesis.dat miden-store.* store.* +/rust-toolchain.toml diff --git a/CHANGELOG.md b/CHANGELOG.md index 8bdd77a0d1..85da940bec 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,5 +1,15 @@ # Changelog +## v0.15.0 (TBD) + +- [BREAKING] Changed `GetBlockByNumber` to accept a `BlockRequest` (with an optional `include_proof` flag) and to return a response containing the block and an optional block proof ([#1864](https://github.com/0xMiden/node/pull/1864)). +- Network monitor now auto-regenerates accounts after persistent increment failures instead of staying unhealthy indefinitely ([#1942](https://github.com/0xMiden/node/pull/1942)). +- [BREAKING] Renamed `GetNoteError` endpoint to `GetNetworkNoteStatus` and extended it to return the full lifecycle status of a network note (`Pending`, `Processed`, `Discarded`, `Committed`) instead of only error information. Consumed notes are now retained in the database after block commit instead of being deleted ([#1892](https://github.com/0xMiden/node/pull/1892)). +- Extended `ValidatorStatus` proto response with `chain_tip`, `validated_transactions_count`, and `signed_blocks_count`; added Validator card to the network monitor dashboard ([#1900](https://github.com/0xMiden/node/pull/1900)). +- Updated the RocksDB SMT backend to use budgeted deserialization for bytes read from disk, ported from `0xMiden/crypto` PR [#846](https://github.com/0xMiden/crypto/pull/846) ([#1923](https://github.com/0xMiden/node/pull/1923)). +- [BREAKING] Network monitor `/status` endpoint now emits a single `RemoteProverStatus` entry per remote prover that bundles status, workers, and test results, instead of separate entries ([#1980](https://github.com/0xMiden/node/pull/1980)).
+- Refactored the validator gRPC API implementation to use the new per-method trait implementations ([#1959](https://github.com/0xMiden/node/pull/1959)). + ## v0.14.9 (2026-04-21) - Simplified network monitor counter script loading by linking the counter module directly via `with_linked_module` instead of assembling a standalone library ([#1957](https://github.com/0xMiden/node/pull/1957)). @@ -12,6 +22,7 @@ ## v0.14.7 (2026-04-15) - [BREAKING] Aligned proto `TransactionHeader` with domain type and exposed erased notes in `SyncTransactions` ([#1941](https://github.com/0xMiden/node/pull/1941)). +- Improved LargeSmt RocksDB defaults, added per-DB memory-budget controls, and exposed durability mode selection ([#1947](https://github.com/0xMiden/node/pull/1947)). ## v0.14.6 (2026-04-10) @@ -32,14 +43,14 @@ ## v0.14.2 (2026-04-07) -- Added inclusion proofs to `SyncTransactions` output notes ([#1893](https://github.com/0xMiden/node/pull/1893)). - Added `block_header` field to `SyncChainMmrResponse` so clients can obtain the `block_to` block header without a separate request ([#1881](https://github.com/0xMiden/node/pull/1881)). +- Added inclusion proofs to `SyncTransactions` output notes ([#1893](https://github.com/0xMiden/node/pull/1893)). ## v0.14.1 (2026-04-02) - Fixed batch building issue with unauthenticated notes consumed in the same batch as they were created ([#1875](https://github.com/0xMiden/node/issues/1875)). -## v0.14.0 (2025-04-01) +## v0.14.0 (2026-04-01) ### Enhancements @@ -80,6 +91,7 @@ - [BREAKING] Modified `TransactionHeader` serialization to allow converting back into the native type after serialization ([#1759](https://github.com/0xMiden/node/issues/1759)). - Removed `chain_tip` requirement from mempool subscription request ([#1771](https://github.com/0xMiden/node/pull/1771)). - Moved bootstrap procedure to `miden-node validator bootstrap` command ([#1764](https://github.com/0xMiden/node/pull/1764)). 
+- [BREAKING] Removed `bundled` command; each component is now started as a separate process. Added `ntx-builder` CLI subcommand. Added `docker-compose.yml` for local multi-process deployment ([#1765](https://github.com/0xMiden/node/pull/1765)). - NTX Builder now deactivates network accounts which crash repeatedly (configurable via `--ntx-builder.max-account-crashes`, default 10) ([#1712](https://github.com/0xMiden/miden-node/pull/1712)). - Removed gRPC reflection v1-alpha support ([#1795](https://github.com/0xMiden/node/pull/1795)). - [BREAKING] Rust requirement bumped from `v1.91` to `v1.93` ([#1803](https://github.com/0xMiden/node/pull/1803)). @@ -89,8 +101,6 @@ ### Fixes - Fixed network monitor looping on stale wallet nonce after node restarts by re-syncing wallet state from RPC after repeated failures ([#1748](https://github.com/0xMiden/node/pull/1748)). -- Fixed `bundled start` panicking due to duplicate `data_directory` clap argument name between `BundledCommand::Start` and `NtxBuilderConfig` ([#1732](https://github.com/0xMiden/node/pull/1732)). -- Fixed `bundled bootstrap` requiring `--validator.key.hex` or `--validator.key.kms-id` despite a default key being configured ([#1732](https://github.com/0xMiden/node/pull/1732)). - Fixed incorrectly classifying private notes with the network attachment as network notes ([#1738](https://github.com/0xMiden/node/pull/1738)). - Fixed accept header version negotiation rejecting all pre-release versions; pre-release label matching is now lenient, accepting any numeric suffix within the same label (e.g. `alpha.3` accepts `alpha.1`) ([#1755](https://github.com/0xMiden/node/pull/1755)). - Fixed `GetAccount` returning an internal error for `AllEntries` requests on storage maps where all entries are in a single block (e.g. genesis accounts) ([#1816](https://github.com/0xMiden/node/pull/1816)). 
diff --git a/Cargo.lock b/Cargo.lock index 006e46f007..875987a9c3 100644 --- a/Cargo.lock +++ b/Cargo.lock @@ -1063,6 +1063,15 @@ dependencies = [ "cc", ] +[[package]] +name = "codegen" +version = "0.3.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "573800db6c3319bc125ddbf9b9cb001ad1602957f53642ba8d09ff3ddd4da7f1" +dependencies = [ + "indexmap", +] + [[package]] name = "colorchoice" version = "1.0.5" @@ -2675,6 +2684,7 @@ dependencies = [ "libc", "libz-sys", "lz4-sys", + "zstd-sys", ] [[package]] @@ -3058,7 +3068,7 @@ dependencies = [ [[package]] name = "miden-genesis" -version = "0.14.9" +version = "0.15.0" dependencies = [ "anyhow", "clap", @@ -3077,7 +3087,7 @@ dependencies = [ [[package]] name = "miden-large-smt-backend-rocksdb" -version = "0.14.9" +version = "0.15.0" dependencies = [ "miden-crypto", "miden-node-rocksdb-cxx-linkage-fix", @@ -3138,7 +3148,7 @@ dependencies = [ [[package]] name = "miden-network-monitor" -version = "0.14.9" +version = "0.15.0" dependencies = [ "anyhow", "axum", @@ -3166,7 +3176,7 @@ dependencies = [ [[package]] name = "miden-node" -version = "0.14.9" +version = "0.15.0" dependencies = [ "anyhow", "clap", @@ -3186,7 +3196,7 @@ dependencies = [ [[package]] name = "miden-node-block-producer" -version = "0.14.9" +version = "0.15.0" dependencies = [ "anyhow", "assert_matches", @@ -3221,7 +3231,7 @@ dependencies = [ [[package]] name = "miden-node-db" -version = "0.14.9" +version = "0.15.0" dependencies = [ "deadpool", "deadpool-diesel", @@ -3234,7 +3244,7 @@ dependencies = [ [[package]] name = "miden-node-grpc-error-macro" -version = "0.14.9" +version = "0.15.0" dependencies = [ "quote", "syn 2.0.117", @@ -3242,7 +3252,7 @@ dependencies = [ [[package]] name = "miden-node-ntx-builder" -version = "0.14.9" +version = "0.15.0" dependencies = [ "anyhow", "build-rs", @@ -3265,7 +3275,6 @@ dependencies = [ "thiserror 2.0.18", "tokio", "tokio-stream", - "tokio-util", "tonic", "tonic-reflection", "tower-http", 
@@ -3275,11 +3284,12 @@ dependencies = [ [[package]] name = "miden-node-proto" -version = "0.14.9" +version = "0.15.0" dependencies = [ "anyhow", "assert_matches", "build-rs", + "codegen", "fs-err", "hex", "http 1.4.0", @@ -3291,6 +3301,7 @@ dependencies = [ "miette", "proptest", "prost", + "prost-types", "thiserror 2.0.18", "tonic", "tonic-prost", @@ -3300,9 +3311,10 @@ dependencies = [ [[package]] name = "miden-node-proto-build" -version = "0.14.9" +version = "0.15.0" dependencies = [ "build-rs", + "codegen", "fs-err", "miette", "protox", @@ -3311,11 +3323,11 @@ dependencies = [ [[package]] name = "miden-node-rocksdb-cxx-linkage-fix" -version = "0.14.9" +version = "0.15.0" [[package]] name = "miden-node-rpc" -version = "0.14.9" +version = "0.15.0" dependencies = [ "anyhow", "futures", @@ -3347,7 +3359,7 @@ dependencies = [ [[package]] name = "miden-node-store" -version = "0.14.9" +version = "0.15.0" dependencies = [ "anyhow", "assert_matches", @@ -3393,7 +3405,7 @@ dependencies = [ [[package]] name = "miden-node-stress-test" -version = "0.14.9" +version = "0.15.0" dependencies = [ "clap", "fs-err", @@ -3421,7 +3433,7 @@ dependencies = [ [[package]] name = "miden-node-utils" -version = "0.14.9" +version = "0.15.0" dependencies = [ "anyhow", "bytes", @@ -3454,7 +3466,7 @@ dependencies = [ [[package]] name = "miden-node-validator" -version = "0.14.9" +version = "0.15.0" dependencies = [ "anyhow", "aws-config", @@ -3590,7 +3602,7 @@ dependencies = [ [[package]] name = "miden-remote-prover" -version = "0.14.9" +version = "0.15.0" dependencies = [ "anyhow", "assert_matches", @@ -3627,7 +3639,7 @@ dependencies = [ [[package]] name = "miden-remote-prover-client" -version = "0.14.9" +version = "0.15.0" dependencies = [ "build-rs", "fs-err", @@ -7615,3 +7627,13 @@ name = "zmij" version = "1.0.21" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "b8848ee67ecc8aedbaf3e4122217aff892639231befc6a1b58d29fff4c2cabaa" + +[[package]] +name = "zstd-sys" 
+version = "2.0.16+zstd.1.5.7" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "91e19ebc2adc8f83e43039e79776e3fda8ca919132d68a1fed6a5faca2683748" +dependencies = [ + "cc", + "pkg-config", +] diff --git a/Cargo.toml b/Cargo.toml index 567ca37ca9..192acc98a6 100644 --- a/Cargo.toml +++ b/Cargo.toml @@ -32,7 +32,7 @@ license = "MIT" readme = "README.md" repository = "https://github.com/0xMiden/node" rust-version = "1.93" -version = "0.14.9" +version = "0.15.0" # Optimize the cryptography for faster tests involving account creation. [profile.test.package.miden-crypto] @@ -43,22 +43,22 @@ debug = true [workspace.dependencies] # Workspace crates. -miden-large-smt-backend-rocksdb = { path = "crates/large-smt-backend-rocksdb", version = "0.14" } -miden-node-block-producer = { path = "crates/block-producer", version = "0.14" } -miden-node-db = { path = "crates/db", version = "0.14" } -miden-node-grpc-error-macro = { path = "crates/grpc-error-macro", version = "0.14" } -miden-node-ntx-builder = { path = "crates/ntx-builder", version = "0.14" } -miden-node-proto = { path = "crates/proto", version = "0.14" } -miden-node-proto-build = { path = "proto", version = "0.14" } -miden-node-rpc = { path = "crates/rpc", version = "0.14" } -miden-node-store = { path = "crates/store", version = "0.14" } +miden-large-smt-backend-rocksdb = { path = "crates/large-smt-backend-rocksdb", version = "0.15" } +miden-node-block-producer = { path = "crates/block-producer", version = "0.15" } +miden-node-db = { path = "crates/db", version = "0.15" } +miden-node-grpc-error-macro = { path = "crates/grpc-error-macro", version = "0.15" } +miden-node-ntx-builder = { path = "crates/ntx-builder", version = "0.15" } +miden-node-proto = { path = "crates/proto", version = "0.15" } +miden-node-proto-build = { path = "proto", version = "0.15" } +miden-node-rpc = { path = "crates/rpc", version = "0.15" } +miden-node-store = { path = "crates/store", version = "0.15" } 
miden-node-test-macro = { path = "crates/test-macro" } -miden-node-utils = { path = "crates/utils", version = "0.14" } -miden-node-validator = { path = "crates/validator", version = "0.14" } -miden-remote-prover-client = { path = "crates/remote-prover-client", version = "0.14" } +miden-node-utils = { path = "crates/utils", version = "0.15" } +miden-node-validator = { path = "crates/validator", version = "0.15" } +miden-remote-prover-client = { path = "crates/remote-prover-client", version = "0.15" } # Temporary workaround until # is part of `rocksdb-rust` release -miden-node-rocksdb-cxx-linkage-fix = { path = "crates/rocksdb-cxx-linkage-fix", version = "0.14" } +miden-node-rocksdb-cxx-linkage-fix = { path = "crates/rocksdb-cxx-linkage-fix", version = "0.15" } # miden-protocol dependencies. These should be updated in sync. miden-agglayer = { version = "0.14" } @@ -78,6 +78,7 @@ assert_matches = { version = "1.5" } async-trait = { version = "0.1" } build-rs = { version = "0.3" } clap = { features = ["derive"], version = "4.5" } +codegen = { version = "0.3" } deadpool = { default-features = false, version = "0.12" } deadpool-diesel = { version = "0.6" } deadpool-sync = { default-features = false, version = "0.1" } @@ -97,6 +98,7 @@ pretty_assertions = { version = "1.4" } # lockstep, nor are they adhering to semver semantics. We keep this # to avoid future breakage. 
prost = { default-features = false, version = "=0.14.3" } +prost-types = { default-features = false, version = "=0.14.3" } protox = { version = "=0.9.1" } rand = { version = "0.9" } rand_chacha = { default-features = false, version = "0.9" } diff --git a/Makefile b/Makefile index 33ab72a885..31b5c9c88f 100644 --- a/Makefile +++ b/Makefile @@ -9,6 +9,7 @@ help: WARNINGS=RUSTDOCFLAGS="-D warnings" CONTAINER_RUNTIME ?= docker STRESS_TEST_DATA_DIR ?= stress-test-store-$(shell date +%Y%m%d-%H%M%S) +COMPOSE_FILES = -f docker-compose.yml -f compose/telemetry.yml -f compose/monitor.yml # -- linting -------------------------------------------------------------------------------------- @@ -127,6 +128,24 @@ install-network-monitor: ## Installs network monitor binary # --- docker -------------------------------------------------------------------------------------- +.PHONY: compose-genesis +compose-genesis: ## Wipes node volumes and creates a fresh genesis block + $(CONTAINER_RUNTIME) compose $(COMPOSE_FILES) down --volumes --remove-orphans + $(CONTAINER_RUNTIME) volume rm -f miden-node_genesis-data miden-node_store-data miden-node_validator-data miden-node_ntx-builder-data miden-node_accounts + $(CONTAINER_RUNTIME) compose $(COMPOSE_FILES) --profile genesis run --rm genesis + +.PHONY: compose-up +compose-up: ## Starts all node components, telemetry, and monitor via docker compose + $(CONTAINER_RUNTIME) compose $(COMPOSE_FILES) up -d + +.PHONY: compose-down +compose-down: ## Stops and removes all containers via docker compose + $(CONTAINER_RUNTIME) compose $(COMPOSE_FILES) down + +.PHONY: compose-logs +compose-logs: ## Follows logs for all components via docker compose + $(CONTAINER_RUNTIME) compose $(COMPOSE_FILES) logs -f + .PHONY: docker-build-node docker-build-node: ## Builds the Miden node using Docker (override with CONTAINER_RUNTIME=podman) @CREATED=$$(date) && \ @@ -138,6 +157,12 @@ docker-build-node: ## Builds the Miden node using Docker (override with CONTAINE -f 
bin/node/Dockerfile \ -t miden-node-image . +.PHONY: docker-build-monitor +docker-build-monitor: ## Builds the network monitor using Docker (override with CONTAINER_RUNTIME=podman) + $(CONTAINER_RUNTIME) build \ + -f bin/network-monitor/Dockerfile \ + -t miden-network-monitor-image . + .PHONY: docker-run-node docker-run-node: ## Runs the Miden node as a Docker container (override with CONTAINER_RUNTIME=podman) $(CONTAINER_RUNTIME) volume create miden-db diff --git a/bin/network-monitor/.env b/bin/network-monitor/.env index 38ffd6ba94..8b2e04c849 100644 --- a/bin/network-monitor/.env +++ b/bin/network-monitor/.env @@ -24,3 +24,5 @@ MIDEN_MONITOR_COUNTER_LATENCY_TIMEOUT=2m MIDEN_MONITOR_EXPLORER_URL=https://scan-backend-devnet-miden.eu-central-8.gateway.fm/graphql # note transport checks MIDEN_MONITOR_NOTE_TRANSPORT_URL=https://transport.miden.io +# validator checks +MIDEN_MONITOR_VALIDATOR_URL= diff --git a/bin/network-monitor/Dockerfile b/bin/network-monitor/Dockerfile new file mode 100644 index 0000000000..b954f73f58 --- /dev/null +++ b/bin/network-monitor/Dockerfile @@ -0,0 +1,39 @@ +FROM rust:1.93-slim-bookworm AS chef +# Install build dependencies. RocksDB is compiled from source by librocksdb-sys. +RUN apt-get update && \ + apt-get -y upgrade && \ + apt-get install -y \ + llvm \ + clang \ + libclang-dev \ + cmake \ + pkg-config \ + libssl-dev \ + libsqlite3-dev \ + ca-certificates && \ + rm -rf /var/lib/apt/lists/* +RUN cargo install cargo-chef +WORKDIR /app + +FROM chef AS planner +COPY . . +RUN cargo chef prepare --recipe-path recipe.json + +FROM chef AS builder +COPY --from=planner /app/recipe.json recipe.json +# Build dependencies - this is the caching Docker layer! +RUN cargo chef cook --release --recipe-path recipe.json +# Build application +COPY . . 
+RUN cargo build --release --locked --bin miden-network-monitor + +FROM debian:bookworm-slim AS runtime +RUN apt-get update && \ + apt-get -y upgrade && \ + apt-get install -y --no-install-recommends sqlite3 ca-certificates \ + && rm -rf /var/lib/apt/lists/* +COPY --from=builder /app/target/release/miden-network-monitor /usr/local/bin/miden-network-monitor + +EXPOSE 3000 + +CMD ["miden-network-monitor"] diff --git a/bin/network-monitor/README.md b/bin/network-monitor/README.md index 03a7f884dd..604e054c88 100644 --- a/bin/network-monitor/README.md +++ b/bin/network-monitor/README.md @@ -33,6 +33,7 @@ miden-network-monitor start --faucet-url http://localhost:8080 --enable-otel - `--faucet-url`: Faucet service URL for testing. If omitted, faucet testing is disabled. - `--explorer-url`: Explorer service GraphQL endpoint. If omitted, explorer checks are disabled. - `--note-transport-url`: Note transport service URL for health checking. If omitted, note transport checks are disabled. +- `--validator-url`: Validator service URL for status checking. If omitted, validator checks are disabled. - `--disable-ntx-service`: Disable the network transaction service checks (enabled by default). The network transaction service consists of two components: counter increment (sending increment transactions) and counter tracking (monitoring counter value changes). - `--remote-prover-test-interval`: Interval at which to test the remote provers services (default: `2m`) - `--faucet-test-interval`: Interval at which to test the faucet services (default: `2m`) @@ -58,6 +59,7 @@ If command-line arguments are not provided, the application falls back to enviro - `MIDEN_MONITOR_FAUCET_URL`: Faucet service URL for testing. If unset, faucet testing is disabled. - `MIDEN_MONITOR_EXPLORER_URL`: Explorer service GraphQL endpoint. If unset, explorer checks are disabled. - `MIDEN_MONITOR_NOTE_TRANSPORT_URL`: Note transport service URL for health checking. If unset, note transport checks are disabled. 
+- `MIDEN_MONITOR_VALIDATOR_URL`: Validator service URL for status checking. If unset, validator checks are disabled. - `MIDEN_MONITOR_DISABLE_NTX_SERVICE`: Set to `true` to disable the network transaction service checks (enabled by default). This affects both counter increment and tracking components. - `MIDEN_MONITOR_REMOTE_PROVER_TEST_INTERVAL`: Interval at which to test the remote provers services - `MIDEN_MONITOR_FAUCET_TEST_INTERVAL`: Interval at which to test the faucet services @@ -83,6 +85,7 @@ Starts the network monitoring service with the web dashboard. RPC status is alwa - Faucet testing: enabled when `--faucet-url` (or `MIDEN_MONITOR_FAUCET_URL`) is provided - Network transaction service: enabled when `--disable-ntx-service=false` or unset (or `MIDEN_MONITOR_DISABLE_NTX_SERVICE=false` or unset) - Note transport checks: enabled when `--note-transport-url` (or `MIDEN_MONITOR_NOTE_TRANSPORT_URL`) is provided +- Validator checks: enabled when `--validator-url` (or `MIDEN_MONITOR_VALIDATOR_URL`) is provided ```bash # Start with default configuration (RPC only) @@ -216,6 +219,14 @@ The monitor application provides real-time status monitoring for the following M - Service URL - gRPC serving status (Serving, NotServing, Unknown) +### Validator +- **Service Health**: Checks the validator service via its gRPC Status endpoint +- **Metrics**: + - Service URL and version + - Chain tip (highest signed block number) + - Validated transactions count (total transactions validated via `SubmitProvenTransaction`) + - Signed blocks count (total blocks signed via `SignBlock`) + ## User Interface The web dashboard provides a clean, responsive interface with the following features: diff --git a/bin/network-monitor/assets/index.css b/bin/network-monitor/assets/index.css index b3d86d791a..b0cbc26037 100644 --- a/bin/network-monitor/assets/index.css +++ b/bin/network-monitor/assets/index.css @@ -4,10 +4,37 @@ box-sizing: border-box; } +:root { + --color-accent: #ff5500; + 
--color-healthy: #22C55D; + --color-unhealthy: #ff5500; + --color-warning: #ff8c00; + --color-text-primary: #333; + --color-text-secondary: rgba(0, 0, 0, 0.6); + --color-text-muted: rgba(0, 0, 0, 0.4); + --color-text-faint: #999; + --color-text-meta: #666; + --color-bg: #fafafa; + --color-bg-card: white; + --color-bg-hover: #f5f5f5; + --color-border: #e0e0e0; + --color-border-hover: #d0d0d0; + --color-border-light: #f0f0f0; + --color-error-bg: #f8d7da; + --color-error-text: #721c24; + --color-error-inline: #dc2626; + --color-badge-healthy-bg: #d4edda; + --color-badge-healthy-text: #155724; + --color-badge-unhealthy-bg: #f8d7da; + --color-badge-unhealthy-text: #721c24; + --color-badge-unknown-bg: #fff3cd; + --color-badge-unknown-text: #856404; +} + body { font-family: "DM Mono", monospace; - background-color: #fafafa; - color: rgba(0, 0, 0, 0.4); + background-color: var(--color-bg); + color: var(--color-text-muted); font-weight: 500; letter-spacing: -0.04em; min-height: 100vh; @@ -45,11 +72,7 @@ body { .logo-text { font-size: 24px; - color: #333; -} - -.logo-text .highlight { - color: #ff5500; + color: var(--color-text-primary); } @@ -62,19 +85,12 @@ body { .form-title { font-size: 28px; font-weight: 500; - color: #333; + color: var(--color-text-primary); text-align: center; margin-bottom: 32px; letter-spacing: -0.02em; } -.button-group { - margin-top: 36px; - display: flex; - flex-direction: column; - gap: 12px; -} - .button { font-family: "DM Mono", monospace; font-weight: 500; @@ -82,9 +98,9 @@ body { font-size: 16px; width: 100%; height: 48px; - border: 1px solid #e0e0e0; - background-color: white; - color: #ff5500; + border: 1px solid var(--color-border); + background-color: var(--color-bg-card); + color: var(--color-accent); cursor: pointer; display: flex; align-items: center; @@ -95,7 +111,7 @@ body { .button:hover { background-color: #eeeeee; - border-color: #d0d0d0; + border-color: var(--color-border-hover); } .button-icon { @@ -111,21 +127,13 @@ body { } 
.highlight { - color: #ff5500; -} - -.loading-section { - display: flex; - align-items: center; - justify-content: center; - position: relative; - height: 60px; + color: var(--color-accent); } .loader { width: 60px; height: 60px; - border: 3px solid #ff5500; + border: 3px solid var(--color-accent); border-bottom-color: transparent; border-radius: 50%; display: inline-block; @@ -140,7 +148,7 @@ body { } .footer-content { - color: rgba(0, 0, 0, 0.4); + color: var(--color-text-muted); padding: 30px 24px; display: flex; justify-content: space-between; @@ -151,7 +159,7 @@ body { .footer-line { height: 17px; - background-color: #ff5500; + background-color: var(--color-accent); width: 100%; } @@ -165,7 +173,7 @@ body { } .footer-value { - color: rgba(0, 0, 0, 0.4); + color: var(--color-text-muted); font-weight: 400; font-size: 12px; } @@ -179,8 +187,8 @@ body { } .error-message { - background-color: #f8d7da; - color: #721c24; + background-color: var(--color-error-bg); + color: var(--color-error-text); padding: 12px 16px; border-radius: 6px; margin-top: 16px; @@ -191,27 +199,10 @@ body { grid-column: 1 / -1; /* Span the full width of the grid when in grid context */ } -@media (max-width: 768px) { - .container { - padding: 40px 16px 0; - } - - .footer-content { - flex-direction: column; - gap: 16px; - text-align: center; - align-items: center; - } - - .tokens-section { - text-align: center; - } -} - /* Network Monitor Styles */ .service-card { - background: white; - border: 0.5px solid #e0e0e0; + background: var(--color-bg-card); + border: 0.5px solid var(--color-border); border-radius: 8px; padding: 20px; transition: all 0.2s; @@ -220,16 +211,16 @@ body { } .service-card:hover { - border-color: #d0d0d0; + border-color: var(--color-border-hover); box-shadow: 0 2px 8px rgba(0, 0, 0, 0.1); } .service-card.healthy { - border-left: 4px solid #22C55D; + border-left: 4px solid var(--color-healthy); } .service-card.unhealthy { - border-left: 4px solid #ff5500; + border-left: 4px 
solid var(--color-accent); } .service-header { @@ -243,7 +234,7 @@ body { .service-name { font-size: 16px; font-weight: 500; - color: #333; + color: var(--color-text-primary); } .service-status { @@ -253,16 +244,6 @@ body { letter-spacing: 0.5px; } -.service-error { - background-color: #fef2f2; - color: #dc2626; - padding: 8px 12px; - border-radius: 4px; - font-size: 12px; - margin-bottom: 12px; - border: 1px solid #fecaca; -} - .service-content { flex: 1; margin-bottom: 8px; @@ -272,7 +253,7 @@ body { .service-details { margin: 12px 0; font-size: 12px; - color: rgba(0, 0, 0, 0.6); + color: var(--color-text-secondary); } .detail-item { @@ -282,15 +263,14 @@ body { } .detail-item strong { - color: #333; + color: var(--color-text-primary); margin-right: 8px; min-width: 80px; } .genesis-value { - font-family: "DM Mono", monospace; font-size: 11px; - color: #666; + color: var(--color-text-meta); margin-right: 8px; } @@ -304,12 +284,12 @@ body { align-items: center; justify-content: center; transition: all 0.2s; - color: #999; + color: var(--color-text-faint); } .copy-button:hover { - background-color: #f5f5f5; - color: #ff5500; + background-color: var(--color-bg-hover); + color: var(--color-accent); } .copy-icon { @@ -325,18 +305,18 @@ body { .nested-status { margin-top: 8px; padding-left: 12px; - border-left: 2px solid #e0e0e0; + border-left: 2px solid var(--color-border); font-size: 11px; } .service-timestamp { font-size: 11px; - color: #999; + color: var(--color-text-faint); margin-top: auto; padding-top: 8px; font-style: italic; flex-shrink: 0; - border-top: 1px solid #f0f0f0; + border-top: 1px solid var(--color-border-light); } #status-container { @@ -357,44 +337,25 @@ body { grid-column: 1 / -1; /* Span the full width of the grid when in grid context */ } -/* Responsive adjustments for service cards */ -@media (max-width: 768px) { - .service-header { - flex-direction: column; - align-items: flex-start; - gap: 8px; - } - - .service-status { - align-self: flex-end; 
- } - - .detail-item { - flex-direction: column; - align-items: flex-start; - } - - .detail-item strong { - margin-bottom: 2px; - } -} - .worker-status { margin: 4px 0; padding: 4px 8px; background: rgba(0, 0, 0, 0.05); border-radius: 4px; font-size: 12px; - font-family: "DM Mono", monospace; + display: flex; + align-items: center; + gap: 8px; + flex-wrap: wrap; } .worker-name { font-weight: 500; - color: #333; + color: var(--color-text-primary); } .worker-version { - color: #666; + color: var(--color-text-meta); } .worker-status-badge { @@ -406,18 +367,18 @@ body { } .worker-status-badge.healthy { - background: #d4edda; - color: #155724; + background: var(--color-badge-healthy-bg); + color: var(--color-badge-healthy-text); } .worker-status-badge.unhealthy { - background: #f8d7da; - color: #721c24; + background: var(--color-badge-unhealthy-bg); + color: var(--color-badge-unhealthy-text); } .worker-status-badge.unknown { - background: #fff3cd; - color: #856404; + background: var(--color-badge-unknown-bg); + color: var(--color-badge-unknown-text); } /* Transaction Prover Test Styles */ @@ -429,12 +390,12 @@ body { .test-metrics.healthy { background: rgba(34, 197, 93, 0.05); - border-left: 3px solid #22C55D; + border-left: 3px solid var(--color-healthy); } .test-metrics.unhealthy { background: rgba(255, 85, 0, 0.05); - border-left: 3px solid #ff5500; + border-left: 3px solid var(--color-accent); } .metric-row { @@ -446,21 +407,17 @@ body { } .metric-label { - color: #333; + color: var(--color-text-primary); font-weight: 500; } .metric-value { - font-family: "DM Mono", monospace; font-weight: 500; } .metric-value.warning-delta, .warning-text { - color: #ff8c00; -} - -.warning-text { + color: var(--color-warning); font-weight: 500; font-size: 12px; } @@ -470,15 +427,15 @@ body { padding: 8px 12px; border-radius: 4px; background: rgba(255, 85, 0, 0.08); - border-left: 3px solid #ff8c00; + border-left: 3px solid var(--color-warning); } .test-metrics.healthy .metric-value 
{ - color: #22C55D; + color: var(--color-healthy); } .test-metrics.unhealthy .metric-value { - color: #ff5500; + color: var(--color-accent); } /* Refresh button container */ @@ -487,24 +444,6 @@ body { margin-top: 20px; } -/* Responsive adjustments for test metrics */ -@media (max-width: 768px) { - .metric-row { - flex-direction: column; - align-items: flex-start; - gap: 2px; - } - - /* Adjust grid for mobile */ - #status-container { - grid-template-columns: 1fr; - } - - .main-content { - padding: 0 16px; - } -} - /* Additional responsive breakpoints */ @media (max-width: 480px) { #status-container { @@ -527,14 +466,14 @@ body { .probe-section { margin-top: 12px; padding-top: 8px; - border-top: 1px dashed #e0e0e0; + border-top: 1px dashed var(--color-border); } .probe-spinner { width: 12px; height: 12px; border: 2px solid #ccc; - border-top-color: #666; + border-top-color: var(--color-text-meta); border-radius: 50%; animation: spin 0.8s linear infinite; } @@ -556,21 +495,21 @@ body { .probe-result.probe-ok { background-color: rgba(34, 197, 93, 0.1); - border-left: 3px solid #22C55D; + border-left: 3px solid var(--color-healthy); } .probe-result.probe-failed { background-color: rgba(255, 85, 0, 0.1); - border-left: 3px solid #ff5500; + border-left: 3px solid var(--color-accent); } .probe-result.probe-pending { background-color: rgba(150, 150, 150, 0.1); - border-left: 3px solid #999; + border-left: 3px solid var(--color-text-faint); } .probe-pending .probe-status-badge { - color: #666; + color: var(--color-text-meta); text-transform: none; } @@ -581,20 +520,19 @@ body { } .probe-ok .probe-status-badge { - color: #22C55D; + color: var(--color-healthy); } .probe-failed .probe-status-badge { - color: #ff5500; + color: var(--color-accent); } .probe-latency { - font-family: "DM Mono", monospace; - color: #666; + color: var(--color-text-meta); } .probe-error { - color: #dc2626; + color: var(--color-error-inline); font-size: 10px; max-width: 200px; overflow: hidden; @@ 
-603,12 +541,61 @@ body { } .probe-time { - color: #999; + color: var(--color-text-faint); font-size: 10px; margin-left: auto; } +/* Responsive: 768px and below */ @media (max-width: 768px) { + .container { + padding: 40px 16px 0; + } + + .footer-content { + flex-direction: column; + gap: 16px; + text-align: center; + align-items: center; + } + + .tokens-section { + text-align: center; + } + + .service-header { + flex-direction: column; + align-items: flex-start; + gap: 8px; + } + + .service-status { + align-self: flex-end; + } + + .detail-item { + flex-direction: column; + align-items: flex-start; + } + + .detail-item strong { + margin-bottom: 2px; + } + + .metric-row { + flex-direction: column; + align-items: flex-start; + gap: 2px; + } + + #status-container { + grid-template-columns: 1fr; + } + + .main-content { + padding: 0 16px; + } + .probe-result { flex-direction: column; align-items: flex-start; diff --git a/bin/network-monitor/assets/index.js b/bin/network-monitor/assets/index.js index 369ec67d6a..842a9567c8 100644 --- a/bin/network-monitor/assets/index.js +++ b/bin/network-monitor/assets/index.js @@ -5,6 +5,11 @@ let statusData = null; let updateInterval = null; const EXPLORER_LAG_TOLERANCE = 20; // max allowed block delta vs RPC, roughly 1 minute +// Read theme colors from CSS custom properties +const rootStyle = getComputedStyle(document.documentElement); +const COLOR_HEALTHY = rootStyle.getPropertyValue('--color-healthy').trim(); +const COLOR_UNHEALTHY = rootStyle.getPropertyValue('--color-unhealthy').trim(); + // Store gRPC-Web probe results keyed by service URL const grpcWebProbeResults = new Map(); @@ -167,10 +172,11 @@ function collectGrpcWebEndpoints() { }); } // Remote Prover service - if (service.details.RemoteProverStatus && service.details.RemoteProverStatus.url) { + const proverUrl = service.details.RemoteProverStatus?.status?.url; + if (proverUrl) { endpoints.push({ - serviceKey: service.details.RemoteProverStatus.url, - baseUrl: 
service.details.RemoteProverStatus.url, + serviceKey: proverUrl, + baseUrl: proverUrl, grpcPath: '/remote_prover.ProxyStatusApi/Status', }); } @@ -298,55 +304,6 @@ async function fetchStatus() { } } -// Merge Remote Prover status and test entries into a single card per prover. -function mergeProverStatusAndTests(services) { - const testsByName = new Map(); - const merged = []; - const usedTests = new Set(); - - services.forEach(service => { - if (service.details && service.details.RemoteProverTest) { - testsByName.set(service.name, service); - } - }); - - services.forEach(service => { - if (service.details && service.details.RemoteProverStatus) { - const test = testsByName.get(service.name); - if (test) { - usedTests.add(service.name); - } - merged.push({ - ...service, - testDetails: test?.details?.RemoteProverTest ?? null, - testStatus: test?.status ?? null, - testError: test?.error ?? null - }); - } else if (!(service.details && service.details.RemoteProverTest)) { - // Non-prover entries pass through unchanged - merged.push(service); - } - }); - - // Add orphaned tests (in case a test arrives before a status) - testsByName.forEach((test, name) => { - if (!usedTests.has(name)) { - merged.push({ - name, - status: test.status, - last_checked: test.last_checked, - error: test.error, - details: null, - testDetails: test.details.RemoteProverTest, - testStatus: test.status, - testError: test.error - }); - } - }); - - return merged; -} - function updateDisplay() { if (!statusData) return; @@ -359,34 +316,33 @@ function updateDisplay() { const lastUpdateTime = new Date(statusData.last_updated * 1000); lastUpdated.textContent = lastUpdateTime.toLocaleString(); - // Group remote prover status + test into single cards - const processedServices = mergeProverStatusAndTests(statusData.services); - const rpcService = processedServices.find(s => s.details && s.details.RpcStatus); + const services = statusData.services; + const rpcService = services.find(s => s.details && 
s.details.RpcStatus); const rpcChainTip = rpcService?.details?.RpcStatus?.store_status?.chain_tip ?? rpcService?.details?.RpcStatus?.block_producer_status?.chain_tip ?? null; - // Compute effective health for a service, considering all signals for remote provers. + // Compute effective health const isServiceHealthy = (s) => { - if (s.details && s.details.RemoteProverStatus) { - const statusOk = s.status === 'Healthy'; - const testOk = s.testStatus == null || s.testStatus === 'Healthy'; - const probeResult = grpcWebProbeResults.get(s.details.RemoteProverStatus.url); - const probeOk = !probeResult || probeResult.ok; - return statusOk && testOk && probeOk; + if (s.status !== 'Healthy') return false; + const probeUrl = s.details?.RemoteProverStatus?.status?.url + ?? s.details?.RpcStatus?.url; + if (probeUrl) { + const probe = grpcWebProbeResults.get(probeUrl); + if (probe && !probe.ok) return false; } - return s.status === 'Healthy'; + return true; }; // Count healthy vs unhealthy services - const healthyServices = processedServices.filter(isServiceHealthy).length; - const totalServices = processedServices.length; + const healthyServices = services.filter(isServiceHealthy).length; + const totalServices = services.length; const allHealthy = healthyServices === totalServices; // Update footer overallStatus.textContent = allHealthy ? 'All Systems Operational' : `${healthyServices}/${totalServices} Services Healthy`; - overallStatus.style.color = allHealthy ? '#22C55D' : '#ff5500'; + overallStatus.style.color = allHealthy ? COLOR_HEALTHY : COLOR_UNHEALTHY; servicesCount.textContent = `${totalServices} Services`; // Update network name in logo @@ -399,9 +355,9 @@ function updateDisplay() { } // Generate status cards - const serviceCardsHtml = processedServices.map(service => { + const serviceCardsHtml = services.map(service => { const isHealthy = isServiceHealthy(service); - const statusColor = isHealthy ? '#22C55D' : '#ff5500'; + const statusColor = isHealthy ? 
COLOR_HEALTHY : COLOR_UNHEALTHY; const statusIcon = isHealthy ? '✓' : '✗'; const numOrDash = value => isHealthy ? (value?.toLocaleString?.() ?? value ?? '-') : '-'; const timeOrDash = ts => { @@ -494,24 +450,32 @@ function updateDisplay() { ` : ''} ` : ''} - ${details.RemoteProverStatus ? ` -
URL: ${details.RemoteProverStatus.url}${renderCopyButton(details.RemoteProverStatus.url, 'URL')}
-
Version: ${details.RemoteProverStatus.version}
-
Proof Type: ${details.RemoteProverStatus.supported_proof_type}
- ${renderGrpcWebProbeSection(details.RemoteProverStatus.url)} - ${details.RemoteProverStatus.workers && details.RemoteProverStatus.workers.length > 0 ? ` -
- Workers (${details.RemoteProverStatus.workers.length}): - ${details.RemoteProverStatus.workers.map(worker => ` -
- ${worker.name} - - ${worker.version} - - ${worker.status} -
- `).join('')} -
- ` : ''} - ` : ''} + ${details.RemoteProverStatus ? (() => { + const p = details.RemoteProverStatus.status; + return ` +
URL: ${p.url}${renderCopyButton(p.url, 'URL')}
+
Version: ${p.version}
+
Proof Type: ${p.supported_proof_type}
+ ${renderGrpcWebProbeSection(p.url)} + ${p.workers && p.workers.length > 0 ? ` +
+ Workers (${p.workers.length}): + ${p.workers.map(worker => { + const nameDisplay = worker.name.length > 20 + ? `${worker.name.substring(0, 20)}...${renderCopyButton(worker.name, 'worker name')}` + : worker.name; + return ` +
+ ${nameDisplay} + ${worker.version} + ${worker.status} +
+ `; + }).join('')} +
+ ` : ''} + `; + })() : ''} ${details.FaucetTest ? `
Faucet: @@ -651,25 +615,56 @@ function updateDisplay() {
` : ''} - ${service.testDetails ? ` + ${details.ValidatorStatus ? `
- Proof Generation Testing (${service.testDetails.proof_type}): -
+ Validator: +
- Success Rate: - ${formatSuccessRate(service.testDetails.success_count, service.testDetails.failure_count)} + URL: + ${details.ValidatorStatus.url}${renderCopyButton(details.ValidatorStatus.url, 'URL')}
- Last Response Time: - ${service.testDetails.test_duration_ms}ms + Version: + ${details.ValidatorStatus.version} +
+
+ Chain Tip: + ${numOrDash(details.ValidatorStatus.chain_tip)} +
+
+ Validated Transactions: + ${numOrDash(details.ValidatorStatus.validated_transactions_count)}
- Last Proof Size: - ${(service.testDetails.proof_size_bytes / 1024).toFixed(2)} KB + Signed Blocks: + ${numOrDash(details.ValidatorStatus.signed_blocks_count)}
` : ''} + ${details.RemoteProverStatus?.test ? (() => { + const t = details.RemoteProverStatus.test; + const ts = details.RemoteProverStatus.test_status; + return ` +
+ Proof Generation Testing (${t.proof_type}): +
+
+ Success Rate: + ${formatSuccessRate(t.success_count, t.failure_count)} +
+
+ Last Response Time: + ${t.test_duration_ms}ms +
+
+ Last Proof Size: + ${(t.proof_size_bytes / 1024).toFixed(2)} KB +
+
+
+ `; + })() : ''}
`; } @@ -796,7 +791,7 @@ async function copyToClipboard(text, event) { // Show a brief success indicator const originalContent = button.innerHTML; button.innerHTML = ''; - button.style.color = '#22C55D'; + button.style.color = COLOR_HEALTHY; setTimeout(() => { button.innerHTML = originalContent; @@ -805,7 +800,7 @@ async function copyToClipboard(text, event) { } catch (err) { console.error('Failed to copy to clipboard:', err); // Show error feedback on button - button.style.color = '#ff5500'; + button.style.color = COLOR_UNHEALTHY; setTimeout(() => { button.style.color = ''; }, 2000); @@ -832,4 +827,3 @@ window.addEventListener('beforeunload', () => { clearInterval(grpcWebProbeInterval); } }); - diff --git a/bin/network-monitor/src/commands/start.rs b/bin/network-monitor/src/commands/start.rs index 880fa242f2..924df7e46b 100644 --- a/bin/network-monitor/src/commands/start.rs +++ b/bin/network-monitor/src/commands/start.rs @@ -55,6 +55,13 @@ pub async fn start_monitor(config: MonitorConfig) -> Result<()> { None }; + // Initialize the validator status checker task. + let validator_rx = if config.validator_url.is_some() { + Some(tasks.spawn_validator_checker(&config).await?) + } else { + None + }; + // Initialize the prover checkers & tests tasks, only if URLs were provided. 
let prover_rxs = if config.remote_prover_urls.is_empty() { debug!(target: COMPONENT, "No remote prover URLs configured, skipping prover tasks"); @@ -93,6 +100,7 @@ pub async fn start_monitor(config: MonitorConfig) -> Result<()> { ntx_tracking: ntx_tracking_rx, explorer: explorer_rx, note_transport: note_transport_rx, + validator: validator_rx, monitor_version: env!("CARGO_PKG_VERSION").to_string(), network_name: config.network_name.clone(), }; diff --git a/bin/network-monitor/src/config.rs b/bin/network-monitor/src/config.rs index 83326fa611..0c34ed32c6 100644 --- a/bin/network-monitor/src/config.rs +++ b/bin/network-monitor/src/config.rs @@ -183,6 +183,14 @@ pub struct MonitorConfig { )] pub note_transport_url: Option<Url>, + /// The URL of the validator service. + #[arg( + long = "validator-url", + env = "MIDEN_MONITOR_VALIDATOR_URL", + help = "The URL of the validator service" + )] + pub validator_url: Option<Url>, + /// Maximum time without a chain tip update before marking RPC as unhealthy. /// /// If the chain tip does not increment within this duration, the RPC service will be diff --git a/bin/network-monitor/src/counter.rs b/bin/network-monitor/src/counter.rs index c1bb55867b..5db834cfd7 100644 --- a/bin/network-monitor/src/counter.rs +++ b/bin/network-monitor/src/counter.rs @@ -6,7 +6,7 @@ use std::path::Path; use std::sync::Arc; use std::sync::atomic::{AtomicU64, Ordering}; -use std::time::Instant; +use std::time::{Duration, Instant}; use anyhow::{Context, Result}; use miden_node_proto::clients::RpcClient; @@ -42,7 +42,12 @@ use tracing::{error, info, instrument, warn}; /// Number of consecutive increment failures before re-syncing the wallet account from the RPC. const RESYNC_FAILURE_THRESHOLD: usize = 3; -use crate::COMPONENT; +/// Number of consecutive increment failures before regenerating accounts from scratch. +const REGENERATE_FAILURE_THRESHOLD: usize = 10; + +/// Minimum time between account regeneration attempts.
+const REGENERATE_COOLDOWN: Duration = Duration::from_secs(3600); + use crate::config::MonitorConfig; use crate::deploy::counter::COUNTER_SLOT_NAME; use crate::deploy::{MonitorDataStore, create_genesis_aware_rpc_client}; @@ -52,8 +57,8 @@ use crate::status::{ PendingLatencyDetails, ServiceDetails, ServiceStatus, - Status, }; +use crate::{COMPONENT, current_unix_timestamp_secs}; #[derive(Debug, Default, Clone)] pub struct LatencyState { @@ -390,16 +395,17 @@ pub async fn run_increment_task( let ( mut details, mut wallet_account, - counter_account, - block_header, + mut counter_account, + mut block_header, mut data_store, - increment_script, - secret_key, + mut increment_script, + mut secret_key, ) = setup_increment_task(config.clone(), &mut rpc_client).await?; let mut rng = ChaCha20Rng::from_os_rng(); let mut interval = tokio::time::interval(config.counter_increment_interval); let mut consecutive_failures: usize = 0; + let mut last_regeneration: Option<Instant> = None; loop { interval.tick().await; @@ -443,16 +449,41 @@ consecutive_failures += 1; last_error = Some(handle_increment_failure(&mut details, &e)); - if consecutive_failures >= RESYNC_FAILURE_THRESHOLD { - if try_resync_wallet_account( + if consecutive_failures >= RESYNC_FAILURE_THRESHOLD + && try_resync_wallet_account( &mut rpc_client, &mut wallet_account, &mut data_store, ) .await .is_ok() - { - consecutive_failures = 0; + { + consecutive_failures = 0; + } + + // If re-sync keeps failing, regenerate accounts from scratch (rate-limited).
+ let cooldown_elapsed = + last_regeneration.is_none_or(|t| t.elapsed() >= REGENERATE_COOLDOWN); + if consecutive_failures >= REGENERATE_FAILURE_THRESHOLD && cooldown_elapsed { + warn!( + consecutive_failures, + "re-sync ineffective, regenerating accounts from scratch" + ); + last_regeneration = Some(Instant::now()); + match try_regenerate_accounts(&config, &mut rpc_client).await { + Ok(new_state) => { + ( + details, + wallet_account, + counter_account, + block_header, + data_store, + increment_script, + secret_key, + ) = new_state; + consecutive_failures = 0; + }, + Err(regen_err) => error!("account regeneration failed: {regen_err:?}"), } } }, @@ -530,6 +561,46 @@ async fn try_resync_wallet_account( Ok(()) } +/// Regenerate accounts from scratch when re-sync is ineffective. +/// +/// Creates fresh wallet and counter accounts, deploys them to the network, and re-initializes +/// the increment task state. This is a last resort after [`REGENERATE_FAILURE_THRESHOLD`] +/// consecutive failures. +#[instrument( + parent = None, + target = COMPONENT, + name = "network_monitor.counter.try_regenerate_accounts", + skip_all, + level = "warn", + err, +)] +async fn try_regenerate_accounts( + config: &MonitorConfig, + rpc_client: &mut RpcClient, +) -> Result<( + IncrementDetails, + Account, + Account, + BlockHeader, + MonitorDataStore, + NoteScript, + SecretKey, +)> { + crate::deploy::force_recreate_accounts( + &config.wallet_filepath, + &config.counter_filepath, + &config.rpc_url, + ) + .await + .context("failed to regenerate accounts")?; + + // Re-initialize the full task state from the newly-created account files. + let state = setup_increment_task(config.clone(), rpc_client).await?; + + info!("account regeneration completed, increment task re-initialized"); + Ok(state) +} + /// Handle the failure path when creating/submitting the network note fails. 
fn handle_increment_failure(details: &mut IncrementDetails, error: &anyhow::Error) -> String { error!("Failed to create and submit network note: {:?}", error); @@ -539,24 +610,21 @@ fn handle_increment_failure(details: &mut IncrementDetails, error: &anyhow::Erro /// Build a `ServiceStatus` snapshot from the current increment details and last error. fn build_increment_status(details: &IncrementDetails, last_error: Option<String>) -> ServiceStatus { - let status = if last_error.is_some() { - // If the most recent attempt failed, surface the service as unhealthy so the - // dashboard reflects that the increment pipeline is not currently working. - Status::Unhealthy - } else if details.failure_count == 0 { - Status::Healthy - } else if details.success_count == 0 { - Status::Unhealthy + let service_details = ServiceDetails::NtxIncrement(details.clone()); + + // If the most recent attempt failed, surface the service as unhealthy so the + // dashboard reflects that the increment pipeline is not currently working. + // Also unhealthy if we've never succeeded but have failures.
+ if let Some(err) = last_error { + ServiceStatus::unhealthy("Local Transactions", err, service_details) + } else if details.success_count == 0 && details.failure_count > 0 { + ServiceStatus::unhealthy( + "Local Transactions", + format!("no successful increments ({} failures)", details.failure_count), + service_details, + ) } else { - Status::Healthy - }; - - ServiceStatus { - name: "Local Transactions".to_string(), - status, - last_checked: crate::monitor::tasks::current_unix_timestamp_secs(), - error: last_error, - details: ServiceDetails::NtxIncrement(details.clone()), + ServiceStatus::healthy("Local Transactions", service_details) } } @@ -595,7 +663,7 @@ pub async fn run_counter_tracking_task( create_genesis_aware_rpc_client(&config.rpc_url, config.request_timeout).await?; // Load counter account to get the account ID - let counter_account = match load_counter_account(&config.counter_filepath) { + let mut counter_account = match load_counter_account(&config.counter_filepath) { Ok(account) => account, Err(e) => { error!("Failed to load counter account: {:?}", e); @@ -617,6 +685,17 @@ loop { poll_interval.tick().await; + // The increment task may regenerate accounts when re-syncing doesn't fix the problem; reload from + // the account file so tracking follows the new account. + reload_counter_account_if_changed( + &config, + &mut counter_account, + &mut rpc_client, + &expected_counter_value, + &mut details, + ) + .await; + let last_error = poll_counter_once( + &mut rpc_client, ..."

Hmm wait, the original shows: "+ .await; + let last_error = poll_counter_once( &mut rpc_client, &counter_account, @@ -631,6 +710,40 @@ } } +/// Reload the counter account from disk and re-initialize tracking state if its ID changed. +/// +/// The increment task regenerates accounts on persistent failure and rewrites the account +/// file. Without this the tracking task would keep polling the stale account ID forever.
+async fn reload_counter_account_if_changed( + config: &MonitorConfig, + counter_account: &mut Account, + rpc_client: &mut RpcClient, + expected_counter_value: &Arc<AtomicU64>, + details: &mut CounterTrackingDetails, +) { + let reloaded = match load_counter_account(&config.counter_filepath) { + Ok(account) => account, + Err(e) => { + warn!(err = ?e, "failed to reload counter account file"); + return; + }, + }; + + if reloaded.id() == counter_account.id() { + return; + } + + info!( + old.id = %counter_account.id(), + new.id = %reloaded.id(), + "counter account file changed, resetting tracking state", + ); + *counter_account = reloaded; + *details = CounterTrackingDetails::default(); + initialize_counter_tracking_state(rpc_client, counter_account, expected_counter_value, details) + .await; +} + /// Initialize tracking state by fetching the current counter value from the node. /// /// Populates `expected_counter_value` and seeds `details` with the latest observed @@ -646,7 +759,7 @@ async fn initialize_counter_tracking_state( expected_counter_value.store(initial_value, Ordering::Relaxed); details.current_value = Some(initial_value); details.expected_value = Some(initial_value); - details.last_updated = Some(crate::monitor::tasks::current_unix_timestamp_secs()); + details.last_updated = Some(current_unix_timestamp_secs()); info!("Initialized counter tracking with value: {}", initial_value); }, Ok(None) => { @@ -673,7 +786,7 @@ async fn poll_counter_once( config: &MonitorConfig, ) -> Option<String> { let mut last_error = None; - let current_time = crate::monitor::tasks::current_unix_timestamp_secs(); + let current_time = current_unix_timestamp_secs(); match fetch_counter_value(rpc_client, counter_account.id()).await { Ok(Some(value)) => { @@ -773,22 +886,16 @@ fn build_tracking_status( details: &CounterTrackingDetails, last_error: Option<String>, ) -> ServiceStatus { - let status = if last_error.is_some() { - // If the latest poll failed, surface the service as unhealthy even if we have - // a
previously cached value, so the dashboard shows that tracking is degraded. - Status::Unhealthy + let service_details = ServiceDetails::NtxTracking(details.clone()); + + // If the latest poll failed, surface the service as unhealthy even if we have + // a previously cached value, so the dashboard shows that tracking is degraded. + if let Some(err) = last_error { + ServiceStatus::unhealthy("Network Transactions", err, service_details) } else if details.current_value.is_some() { - Status::Healthy + ServiceStatus::healthy("Network Transactions", service_details) } else { - Status::Unknown - }; - - ServiceStatus { - name: "Network Transactions".to_string(), - status, - last_checked: crate::monitor::tasks::current_unix_timestamp_secs(), - error: last_error, - details: ServiceDetails::NtxTracking(details.clone()), + ServiceStatus::unknown("Network Transactions", service_details) } } diff --git a/bin/network-monitor/src/deploy/mod.rs b/bin/network-monitor/src/deploy/mod.rs index cc1e774e81..5a998bd636 100644 --- a/bin/network-monitor/src/deploy/mod.rs +++ b/bin/network-monitor/src/deploy/mod.rs @@ -137,6 +137,28 @@ pub async fn ensure_accounts_exist( save_counter_account(&counter_account, counter_filepath) } +/// Unconditionally creates fresh wallet and counter accounts, deploys the counter, and saves both +/// to disk. Unlike [`ensure_accounts_exist`], this always replaces existing account files. +/// +/// Used by the increment task when accounts are fundamentally outdated (e.g., after a network +/// reset) and re-syncing from the RPC is not sufficient. 
+pub async fn force_recreate_accounts( + wallet_filepath: &Path, + counter_filepath: &Path, + rpc_url: &Url, +) -> Result<()> { + tracing::warn!("Regenerating monitor accounts (force recreate)"); + + let (wallet_account, secret_key) = create_wallet_account()?; + let counter_account = create_counter_account(wallet_account.id())?; + + deploy_counter_account(&counter_account, rpc_url).await?; + tracing::info!("Successfully recreated and deployed accounts"); + + save_wallet_account(&wallet_account, &secret_key, wallet_filepath)?; + save_counter_account(&counter_account, counter_filepath) +} + /// Deploy counter account to the network. /// /// This function creates a counter program account, diff --git a/bin/network-monitor/src/explorer.rs b/bin/network-monitor/src/explorer.rs index da154a229e..3d3499980c 100644 --- a/bin/network-monitor/src/explorer.rs +++ b/bin/network-monitor/src/explorer.rs @@ -11,8 +11,8 @@ use tokio::time::MissedTickBehavior; use tracing::{info, instrument}; use url::Url; -use crate::status::{ExplorerStatusDetails, ServiceDetails, ServiceStatus, Status}; -use crate::{COMPONENT, current_unix_timestamp_secs}; +use crate::COMPONENT; +use crate::status::{ExplorerStatusDetails, ServiceDetails, ServiceStatus}; const LATEST_BLOCK_QUERY: &str = " query LatestBlock { @@ -79,13 +79,10 @@ pub async fn run_explorer_status_task( loop { interval.tick().await; - let current_time = current_unix_timestamp_secs(); - let status = check_explorer_status( &mut explorer_client, explorer_url.clone(), name.clone(), - current_time, request_timeout, ) .await; @@ -121,7 +118,6 @@ pub(crate) async fn check_explorer_status( explorer_client: &mut Client, explorer_url: Url, name: String, - current_time: u64, request_timeout: Duration, ) -> ServiceStatus { let resp = explorer_client @@ -134,61 +130,96 @@ pub(crate) async fn check_explorer_status( let body = match resp { Ok(resp) => match resp.text().await { Ok(body) => body, - Err(e) => return unhealthy(&name, current_time, &e), 
+ Err(e) => return ServiceStatus::error(&name, e), }, - Err(e) => return unhealthy(&name, current_time, &e), + Err(e) => return ServiceStatus::error(&name, e), }; let value: serde_json::Value = match serde_json::from_str(&body) { Ok(value) => value, Err(e) => { - let msg = format!("{e}: {body}"); - return unhealthy(&name, current_time, &msg); + return ServiceStatus::error(&name, format!("{e}: {body}")); }, }; - let details = ExplorerStatusDetails::try_from(value); - - match details { - Ok(details) => ServiceStatus { - name: name.clone(), - status: Status::Healthy, - last_checked: current_time, - error: None, - details: ServiceDetails::ExplorerStatus(details), - }, - Err(e) => unhealthy(&name, current_time, &e), - } -} - -/// Returns an unhealthy service status. -fn unhealthy(name: &str, current_time: u64, err: &impl ToString) -> ServiceStatus { - ServiceStatus { - name: name.to_owned(), - status: Status::Unhealthy, - last_checked: current_time, - error: Some(err.to_string()), - details: ServiceDetails::Error, + match ExplorerStatusDetails::try_from(value) { + Ok(details) => ServiceStatus::healthy(name, ServiceDetails::ExplorerStatus(details)), + Err(e) => ServiceStatus::error(&name, e), } } #[derive(Debug)] pub enum ExplorerStatusError { - MissingField(String), + /// A required field was not present in the response. + NotPresent { field: String, response: String }, + /// A field was present but had an unexpected type. 
+ TypeMismatch { + field: String, + expected: &'static str, + got: String, + }, } impl Display for ExplorerStatusError { fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result { match self { - ExplorerStatusError::MissingField(field) => write!(f, "missing field: {field}"), + ExplorerStatusError::NotPresent { field, response } => { + write!(f, "field '{field}': not present in response (got: {response})") + }, + ExplorerStatusError::TypeMismatch { field, expected, got } => { + write!(f, "field '{field}': expected {expected}, got {got}") + }, } } } -/// Extracts a u64 from a JSON value that may be either a number or a -/// string-encoded number (as returned by the Explorer's GraphQL API). -fn value_as_u64(value: &serde_json::Value) -> Option<u64> { - value.as_u64().or_else(|| value.as_str().and_then(|s| s.parse().ok())) +/// Extracts a u64 from a named field. +/// +/// Accepts both numeric values and string-encoded numbers (as returned by the Explorer's +/// GraphQL API). +fn require_u64(node: &serde_json::Value, field: &str) -> Result<u64, ExplorerStatusError> { + let value = node.get(field).ok_or_else(|| ExplorerStatusError::NotPresent { + field: field.into(), + response: truncate_json(node), + })?; + + value + .as_u64() + .or_else(|| value.as_str().and_then(|s| s.parse().ok())) + .ok_or_else(|| ExplorerStatusError::TypeMismatch { + field: field.into(), + expected: "u64-compatible value", + got: truncate_json(value), + }) +} + +/// Extracts a string from a named field. +fn require_str(node: &serde_json::Value, field: &str) -> Result<String, ExplorerStatusError> { + let value = node.get(field).ok_or_else(|| ExplorerStatusError::NotPresent { + field: field.into(), + response: truncate_json(node), + })?; + + value + .as_str() + .map(String::from) + .ok_or_else(|| ExplorerStatusError::TypeMismatch { + field: field.into(), + expected: "string", + got: truncate_json(value), + }) +} + +/// Returns a short string representation of a JSON value for error messages.
+/// +/// Truncates the JSON string to at most 60 characters, appending "..." if truncated. +/// Truncation is done at a character boundary to avoid panicking on multi-byte characters. +fn truncate_json(value: &serde_json::Value) -> String { + let s = value.to_string(); + match s.char_indices().nth(60) { + Some((idx, _)) => format!("{}...", &s[..idx]), + None => s, + } } impl TryFrom<serde_json::Value> for ExplorerStatusDetails { @@ -196,71 +227,137 @@ fn try_from(value: serde_json::Value) -> Result<Self, Self::Error> { let node = value.pointer("/data/blocks/edges/0/node").ok_or_else(|| { - ExplorerStatusError::MissingField("data.blocks.edges[0].node".to_string()) + ExplorerStatusError::NotPresent { + field: "data.blocks.edges[0].node".to_string(), + response: truncate_json(&value), + } })?; - let block_number = node - .get("block_number") - .and_then(value_as_u64) - .ok_or_else(|| ExplorerStatusError::MissingField("block_number".to_string()))?; - let timestamp = node - .get("timestamp") - .and_then(value_as_u64) - .ok_or_else(|| ExplorerStatusError::MissingField("timestamp".to_string()))?; - - let number_of_transactions = - node.get("number_of_transactions").and_then(value_as_u64).ok_or_else(|| { - ExplorerStatusError::MissingField("number_of_transactions".to_string()) - })?; - let number_of_nullifiers = node - .get("number_of_nullifiers") - .and_then(value_as_u64) - .ok_or_else(|| ExplorerStatusError::MissingField("number_of_nullifiers".to_string()))?; - let number_of_notes = node - .get("number_of_notes") - .and_then(value_as_u64) - .ok_or_else(|| ExplorerStatusError::MissingField("number_of_notes".to_string()))?; - let number_of_account_updates = - node.get("number_of_account_updates").and_then(value_as_u64).ok_or_else(|| { - ExplorerStatusError::MissingField("number_of_account_updates".to_string()) - })?; - - let block_commitment = node - .get("block_commitment") - .and_then(|v| v.as_str()) - .ok_or_else(||
ExplorerStatusError::MissingField("block_commitment".to_string()))? - .to_string(); - let chain_commitment = node - .get("chain_commitment") - .and_then(|v| v.as_str()) - .ok_or_else(|| ExplorerStatusError::MissingField("chain_commitment".to_string()))? - .to_string(); - let proof_commitment = node - .get("proof_commitment") - .and_then(|v| v.as_str()) - .ok_or_else(|| ExplorerStatusError::MissingField("proof_commitment".to_string()))? - .to_string(); - Ok(Self { - block_number, - timestamp, - number_of_transactions, - number_of_nullifiers, - number_of_notes, - number_of_account_updates, - block_commitment, - chain_commitment, - proof_commitment, + block_number: require_u64(node, "block_number")?, + timestamp: require_u64(node, "timestamp")?, + number_of_transactions: require_u64(node, "number_of_transactions")?, + number_of_nullifiers: require_u64(node, "number_of_nullifiers")?, + number_of_notes: require_u64(node, "number_of_notes")?, + number_of_account_updates: require_u64(node, "number_of_account_updates")?, + block_commitment: require_str(node, "block_commitment")?, + chain_commitment: require_str(node, "chain_commitment")?, + proof_commitment: require_str(node, "proof_commitment")?, }) } } pub(crate) fn initial_explorer_status() -> ServiceStatus { - ServiceStatus { - name: "Explorer".to_string(), - status: Status::Unknown, - last_checked: current_unix_timestamp_secs(), - error: None, - details: ServiceDetails::ExplorerStatus(ExplorerStatusDetails::default()), + ServiceStatus::unknown( + "Explorer", + ServiceDetails::ExplorerStatus(ExplorerStatusDetails::default()), + ) +} + +// TESTS +// ================================================================================================ + +#[cfg(test)] +mod tests { + use serde_json::json; + + use super::*; + + // truncate_json tests + // -------------------------------------------------------------------------------------------- + + #[test] + fn truncate_json_short_value_is_not_truncated() { + let value = 
json!({"key": "short"}); + let result = truncate_json(&value); + assert_eq!(result, value.to_string()); + assert!(!result.ends_with("...")); + } + + #[test] + fn truncate_json_long_value_is_truncated() { + let long_string = "a".repeat(100); + let value = json!(long_string); + let result = truncate_json(&value); + assert!(result.ends_with("...")); + // 60 chars + "..." + assert_eq!(result.chars().count(), 63); + } + + #[test] + fn truncate_json_multibyte_chars_are_handled() { + // Each 'é' is 2 bytes in UTF-8. Build a string whose serialized JSON form + // exceeds 60 characters, ensuring truncation lands on a char boundary. + let multibyte_string = "é".repeat(80); + let value = json!(multibyte_string); + // Should not panic and should still truncate correctly. + let result = truncate_json(&value); + assert!(result.ends_with("...")); + } + + #[test] + fn truncate_json_exactly_60_chars_is_not_truncated() { + // Build a JSON string whose serialized form is exactly 60 characters. + // json!("x".repeat(58)) serializes as `"xxx...xxx"` (58 chars + 2 quotes = 60). + let value = json!("x".repeat(58)); + let result = truncate_json(&value); + assert_eq!(result.chars().count(), 60); + assert!(!result.ends_with("...")); + } + + // require_u64 tests + // -------------------------------------------------------------------------------------------- + + #[test] + fn require_u64_from_number() { + let node = json!({"block_number": 42}); + assert_eq!(require_u64(&node, "block_number").unwrap(), 42); + } + + #[test] + fn require_u64_from_string() { + let node = json!({"block_number": "42"}); + assert_eq!(require_u64(&node, "block_number").unwrap(), 42); + } + + #[test] + fn require_u64_missing_field() { + let node = json!({}); + let err = require_u64(&node, "block_number").unwrap_err(); + assert!( + matches!(err, ExplorerStatusError::NotPresent { field, .. 
} if field == "block_number") + ); + } + + #[test] + fn require_u64_wrong_type() { + let node = json!({"block_number": [1, 2, 3]}); + let err = require_u64(&node, "block_number").unwrap_err(); + assert!( + matches!(err, ExplorerStatusError::TypeMismatch { field, .. } if field == "block_number") + ); + } + + // require_str tests + // -------------------------------------------------------------------------------------------- + + #[test] + fn require_str_valid() { + let node = json!({"name": "hello"}); + assert_eq!(require_str(&node, "name").unwrap(), "hello"); + } + + #[test] + fn require_str_missing_field() { + let node = json!({}); + let err = require_str(&node, "name").unwrap_err(); + assert!(matches!(err, ExplorerStatusError::NotPresent { field, .. } if field == "name")); + } + + #[test] + fn require_str_wrong_type() { + let node = json!({"name": 123}); + let err = require_str(&node, "name").unwrap_err(); + assert!(matches!(err, ExplorerStatusError::TypeMismatch { field, .. } if field == "name")); } } diff --git a/bin/network-monitor/src/faucet.rs b/bin/network-monitor/src/faucet.rs index b66bde7871..75c223a9a3 100644 --- a/bin/network-monitor/src/faucet.rs +++ b/bin/network-monitor/src/faucet.rs @@ -17,8 +17,8 @@ use tokio::time::MissedTickBehavior; use tracing::{debug, info, instrument, warn}; use url::Url; -use crate::status::{ServiceDetails, ServiceStatus, Status}; -use crate::{COMPONENT, current_unix_timestamp_secs}; +use crate::COMPONENT; +use crate::status::{ServiceDetails, ServiceStatus}; // CONSTANTS // ================================================================================================ @@ -110,8 +110,6 @@ pub async fn run_faucet_test_task( loop { interval.tick().await; - let current_time = current_unix_timestamp_secs(); - let start_time = std::time::Instant::now(); match perform_faucet_test(&client, &faucet_url).await { @@ -131,25 +129,18 @@ pub async fn run_faucet_test_task( let test_duration_ms = start_time.elapsed().as_millis() as u64; - 
let test_details = FaucetTestDetails { + let details = ServiceDetails::FaucetTest(FaucetTestDetails { url: faucet_url.to_string(), test_duration_ms, success_count, failure_count, last_tx_id: last_tx_id.clone(), faucet_metadata: faucet_metadata.clone(), - }; + }); - let status = ServiceStatus { - name: "Faucet".to_string(), - status: if last_error.is_some() { - Status::Unhealthy - } else { - Status::Healthy - }, - last_checked: current_time, - error: last_error.clone(), - details: ServiceDetails::FaucetTest(test_details), + let status = match &last_error { + Some(err) => ServiceStatus::unhealthy("Faucet", err, details), + None => ServiceStatus::healthy("Faucet", details), }; // Send the status update; exit if no receivers (shutdown signal) @@ -202,8 +193,8 @@ async fn perform_faucet_test( let response_text: String = response.text().await?; debug!("Faucet PoW response: {}", response_text); - let challenge_response: PowChallengeResponse = serde_json::from_str(&response_text) - .with_context(|| format!("Failed to parse PoW response: {response_text}"))?; + let challenge_response: PowChallengeResponse = + serde_json::from_str(&response_text).context("unexpected response from /pow")?; debug!( "Received PoW challenge: target={}, challenge={}...", @@ -230,9 +221,10 @@ async fn perform_faucet_test( let response = client.get(tokens_url).send().await?; let response_text: String = response.text().await?; + debug!("Faucet /get_tokens response: {}", response_text); - let tokens_response: GetTokensResponse = serde_json::from_str(&response_text) - .with_context(|| format!("Failed to parse tokens response: {response_text}"))?; + let tokens_response: GetTokensResponse = + serde_json::from_str(&response_text).context("unexpected response from /get_tokens")?; // Step 4: Get faucet metadata let metadata_url = faucet_url.join("/get_metadata")?; @@ -240,9 +232,10 @@ async fn perform_faucet_test( let response = client.get(metadata_url).send().await?; let response_text = 
response.text().await?; + debug!("Faucet /get_metadata response: {}", response_text); - let metadata: GetMetadataResponse = serde_json::from_str(&response_text) - .with_context(|| format!("Failed to parse metadata response: {response_text}"))?; + let metadata: GetMetadataResponse = + serde_json::from_str(&response_text).context("unexpected response from /get_metadata")?; Ok((tokens_response, metadata)) } diff --git a/bin/network-monitor/src/frontend.rs b/bin/network-monitor/src/frontend.rs index 1d905c5e23..320a2efe9f 100644 --- a/bin/network-monitor/src/frontend.rs +++ b/bin/network-monitor/src/frontend.rs @@ -12,7 +12,7 @@ use tracing::{info, instrument}; use crate::COMPONENT; use crate::config::MonitorConfig; -use crate::status::{NetworkStatus, ServiceStatus}; +use crate::status::{NetworkStatus, RemoteProverDetails, ServiceDetails, ServiceStatus, Status}; // SERVER STATE // ================================================================================================ @@ -27,6 +27,7 @@ pub struct ServerState { pub ntx_tracking: Option>, pub explorer: Option>, pub note_transport: Option>, + pub validator: Option>, pub monitor_version: String, pub network_name: String, } @@ -85,10 +86,9 @@ async fn get_status( services.push(faucet_rx.borrow().clone()); } - // Collect all remote prover statuses + // Collect all remote prover statuses, merging status + test into a single entry per prover. 
    for (prover_status_rx, prover_test_rx) in &server_state.provers {
-        services.push(prover_status_rx.borrow().clone());
-        services.push(prover_test_rx.borrow().clone());
+        services.push(merge_prover(&prover_status_rx.borrow(), &prover_test_rx.borrow()));
     }
 
     // Collect explorer status if available
@@ -111,6 +111,11 @@ async fn get_status(
         services.push(note_transport_rx.borrow().clone());
     }
 
+    // Collect validator status if available
+    if let Some(validator_rx) = &server_state.validator {
+        services.push(validator_rx.borrow().clone());
+    }
+
     let network_status = NetworkStatus {
         services,
         last_updated: current_time,
@@ -144,3 +149,50 @@ async fn serve_favicon() -> Response {
     )
     .into_response()
 }
+
+/// Merges the status and test receivers for a single remote prover into one `ServiceStatus`.
+///
+/// The combined status is `Unhealthy` if either the status check or the test failed, `Unknown`
+/// if the status checker has not yet seen the prover, and `Healthy` otherwise. The test result
+/// is only attached once the test task has produced an actual `ProverTestResult` (before
+/// the first test completes, the test channel holds the initial prover status and should not be
+/// surfaced as a test).
+fn merge_prover(status: &ServiceStatus, test: &ServiceStatus) -> ServiceStatus {
+    // Extract prover status details, or pass through the raw status if the prover is down
+    // (details will be `ServiceDetails::Error` in that case).
+    let status_details = match &status.details {
+        ServiceDetails::ProverStatusCheck(d) => d.clone(),
+        _ => return status.clone(),
+    };
+
+    // Only attach test details once the test task has produced a real result.
+ let (test_details, test_status, test_error) = match &test.details { + ServiceDetails::ProverTestResult(d) => { + (Some(d.clone()), Some(test.status.clone()), test.error.clone()) + }, + _ => (None, None, None), + }; + + let details = ServiceDetails::RemoteProverStatus(RemoteProverDetails { + status: status_details, + test: test_details, + test_status: test_status.clone(), + test_error: test_error.clone(), + }); + + let name = &status.name; + let base = match (&status.status, &test_status) { + (Status::Unhealthy, _) | (_, Some(Status::Unhealthy)) => { + let error = status + .error + .clone() + .or(test_error) + .unwrap_or_else(|| "prover is unhealthy".to_string()); + ServiceStatus::unhealthy(name, error, details) + }, + (Status::Unknown, _) => ServiceStatus::unknown(name, details), + _ => ServiceStatus::healthy(name, details), + }; + + base.with_last_checked(status.last_checked) +} diff --git a/bin/network-monitor/src/main.rs b/bin/network-monitor/src/main.rs index 80244a47a1..e0dba4b51f 100644 --- a/bin/network-monitor/src/main.rs +++ b/bin/network-monitor/src/main.rs @@ -18,12 +18,14 @@ pub mod frontend; mod monitor; pub mod note_transport; pub mod remote_prover; +pub mod service_status; pub mod status; +pub mod validator; // Re-exports for cleaner imports use cli::Cli; // Re-export for other modules -pub use monitor::tasks::current_unix_timestamp_secs; +pub use service_status::current_unix_timestamp_secs; /// Component identifier for structured logging and tracing pub const COMPONENT: &str = "miden-network-monitor"; @@ -37,5 +39,5 @@ pub const COMPONENT: &str = "miden-network-monitor"; #[tokio::main] async fn main() -> Result<()> { let cli = Cli::parse(); - cli.execute().await + Box::pin(cli.execute()).await } diff --git a/bin/network-monitor/src/monitor/tasks.rs b/bin/network-monitor/src/monitor/tasks.rs index 57f5ce3950..1db8e6525a 100644 --- a/bin/network-monitor/src/monitor/tasks.rs +++ b/bin/network-monitor/src/monitor/tasks.rs @@ -3,7 +3,6 @@ use 
std::collections::HashMap; use std::sync::Arc; use std::sync::atomic::AtomicU64; -use std::time::{Duration, SystemTime, UNIX_EPOCH}; use anyhow::Result; use miden_node_proto::clients::{ @@ -21,18 +20,23 @@ use crate::config::MonitorConfig; use crate::counter::{LatencyState, run_counter_tracking_task, run_increment_task}; use crate::deploy::ensure_accounts_exist; use crate::explorer::{initial_explorer_status, run_explorer_status_task}; -use crate::faucet::run_faucet_test_task; +use crate::faucet::{FaucetTestDetails, run_faucet_test_task}; use crate::frontend::{ServerState, serve}; use crate::note_transport::{initial_note_transport_status, run_note_transport_status_task}; use crate::remote_prover::{ProofType, generate_prover_test_payload, run_remote_prover_test_task}; use crate::status::{ + CounterTrackingDetails, + IncrementDetails, + ServiceDetails, ServiceStatus, StaleChainTracker, check_remote_prover_status, check_rpc_status, + current_unix_timestamp_secs, run_remote_prover_status_task, run_rpc_status_task, }; +use crate::validator::{initial_validator_status, run_validator_status_task}; /// Task management structure that encapsulates `JoinSet` and component names. #[derive(Default)] @@ -176,6 +180,38 @@ impl Tasks { Ok(rx) } + /// Spawn the validator status checker task. 
+ #[instrument(target = COMPONENT, name = "tasks.spawn-validator-checker", skip_all)] + pub async fn spawn_validator_checker( + &mut self, + config: &MonitorConfig, + ) -> Result> { + let validator_url = config.validator_url.clone().expect("Validator URL exists"); + let name = "Validator".to_string(); + let status_check_interval = config.status_check_interval; + let request_timeout = config.request_timeout; + let (tx, rx) = watch::channel(initial_validator_status()); + + let id = self + .handles + .spawn(async move { + run_validator_status_task( + validator_url, + name, + tx, + status_check_interval, + request_timeout, + ) + .await; + }) + .id(); + self.names.insert(id, "validator-checker".to_string()); + + println!("Spawned validator status checker task"); + + Ok(rx) + } + /// Spawn prover status and test tasks for all configured provers. #[instrument( parent = None, @@ -205,13 +241,10 @@ impl Tasks { .without_otel_context_injection() .connect_lazy::(); - let current_time = current_unix_timestamp_secs(); - let initial_prover_status = check_remote_prover_status( &mut remote_prover, name.clone(), prover_url.to_string(), - current_time, ) .await; @@ -241,7 +274,7 @@ impl Tasks { // Extract proof_type directly from the service status // If the prover is not available during startup, skip spawning test tasks - let proof_type = if let crate::status::ServiceDetails::RemoteProverStatus(details) = + let proof_type = if let ServiceDetails::ProverStatusCheck(details) = &initial_prover_status.details { Some(details.supported_proof_type.clone()) @@ -312,15 +345,10 @@ impl Tasks { ret(level = "debug") )] pub fn spawn_faucet(&mut self, config: &MonitorConfig) -> Receiver { - let current_time = current_unix_timestamp_secs(); - // Create initial faucet test status - let initial_faucet_status = ServiceStatus { - name: "Faucet".to_string(), - status: crate::status::Status::Unknown, - last_checked: current_time, - error: None, - details: 
crate::status::ServiceDetails::FaucetTest(crate::faucet::FaucetTestDetails { + let initial_faucet_status = ServiceStatus::unknown( + "Faucet", + ServiceDetails::FaucetTest(FaucetTestDetails { url: config.faucet_url.as_ref().expect("faucet URL exists").to_string(), test_duration_ms: 0, success_count: 0, @@ -328,7 +356,7 @@ impl Tasks { last_tx_id: None, faucet_metadata: None, }), - }; + ); // Spawn the faucet testing task let (faucet_tx, faucet_rx) = watch::channel(initial_faucet_status); @@ -366,8 +394,6 @@ impl Tasks { ensure_accounts_exist(&config.wallet_filepath, &config.counter_filepath, &config.rpc_url) .await?; - let current_time = current_unix_timestamp_secs(); - // Create shared atomic counter for tracking expected counter value let expected_counter_value = Arc::new(AtomicU64::new(0)); let latency_state = Arc::new(Mutex::new(LatencyState::default())); @@ -375,34 +401,26 @@ impl Tasks { let latency_state_for_tracking = latency_state.clone(); // Create initial increment status - let initial_increment_status = ServiceStatus { - name: "Local Transactions".to_string(), - status: crate::status::Status::Unknown, - last_checked: current_time, - error: None, - details: crate::status::ServiceDetails::NtxIncrement(crate::status::IncrementDetails { + let initial_increment_status = ServiceStatus::unknown( + "Local Transactions", + ServiceDetails::NtxIncrement(IncrementDetails { success_count: 0, failure_count: 0, last_tx_id: None, last_latency_blocks: None, }), - }; + ); // Create initial tracking status - let initial_tracking_status = ServiceStatus { - name: "Network Transactions".to_string(), - status: crate::status::Status::Unknown, - last_checked: current_time, - error: None, - details: crate::status::ServiceDetails::NtxTracking( - crate::status::CounterTrackingDetails { - current_value: None, - expected_value: None, - last_updated: None, - pending_increments: None, - }, - ), - }; + let initial_tracking_status = ServiceStatus::unknown( + "Network Transactions", + 
ServiceDetails::NtxTracking(CounterTrackingDetails { + current_value: None, + expected_value: None, + last_updated: None, + pending_increments: None, + }), + ); // Spawn the increment task let (increment_tx, increment_rx) = watch::channel(initial_increment_status); @@ -493,14 +511,3 @@ impl Tasks { Err(err.context(format!("component {component_name} failed"))) } } - -/// Gets the current Unix timestamp in seconds. -/// -/// This function is infallible - if the system time is somehow before Unix epoch -/// (extremely unlikely), it returns 0. -pub fn current_unix_timestamp_secs() -> u64 { - SystemTime::now() - .duration_since(UNIX_EPOCH) - .unwrap_or_else(|_| Duration::from_secs(0)) // Fallback to 0 if before Unix epoch - .as_secs() -} diff --git a/bin/network-monitor/src/note_transport.rs b/bin/network-monitor/src/note_transport.rs index 4556f8bf10..7fc618243b 100644 --- a/bin/network-monitor/src/note_transport.rs +++ b/bin/network-monitor/src/note_transport.rs @@ -11,8 +11,8 @@ use tonic_health::pb::{HealthCheckRequest, health_check_response}; use tracing::{info, instrument}; use url::Url; -use crate::status::{NoteTransportStatusDetails, ServiceDetails, ServiceStatus, Status}; -use crate::{COMPONENT, current_unix_timestamp_secs}; +use crate::COMPONENT; +use crate::status::{NoteTransportStatusDetails, ServiceDetails, ServiceStatus}; /// Creates a `tonic` channel for the given URL, enabling TLS for `https` schemes. 
fn create_channel(url: &Url, timeout: Duration) -> Result<Channel> {
@@ -42,15 +42,8 @@ pub async fn run_note_transport_status_task(
     loop {
         interval.tick().await;
 
-        let current_time = current_unix_timestamp_secs();
-
-        let status = check_note_transport_status(
-            &mut health_client,
-            url.to_string(),
-            name.clone(),
-            current_time,
-        )
-        .await;
+        let status =
+            check_note_transport_status(&mut health_client, url.to_string(), name.clone()).await;
 
         if status_sender.send(status).is_err() {
             info!("No receivers for note transport status updates, shutting down");
@@ -70,7 +63,6 @@ pub(crate) async fn check_note_transport_status(
     health_client: &mut HealthClient<Channel>,
     url: String,
     name: String,
-    current_time: u64,
 ) -> ServiceStatus {
     let request = HealthCheckRequest { service: String::new() };
 
@@ -78,42 +70,30 @@
         Ok(response) => {
             let serving_status = response.into_inner().status();
             let is_serving = serving_status == health_check_response::ServingStatus::Serving;
-
-            let status = if is_serving { Status::Healthy } else { Status::Unhealthy };
             let serving_status_str = format!("{serving_status:?}");
 
-            ServiceStatus {
-                name,
-                status,
-                last_checked: current_time,
-                error: None,
-                details: ServiceDetails::NoteTransportStatus(NoteTransportStatusDetails {
+            if is_serving {
+                let details = ServiceDetails::NoteTransportStatus(NoteTransportStatusDetails {
+                    url,
+                    serving_status: serving_status_str,
+                });
+                ServiceStatus::healthy(name, details)
+            } else {
+                let error = format!("serving status: {serving_status_str}");
+                let details = ServiceDetails::NoteTransportStatus(NoteTransportStatusDetails {
                     url,
                     serving_status: serving_status_str,
-                }),
+                });
+                ServiceStatus::unhealthy(name, error, details)
             }
         },
-        Err(e) => unhealthy(&name, current_time, &e),
-    }
-}
-
-/// Returns an unhealthy service status.
-fn unhealthy(name: &str, current_time: u64, err: &impl ToString) -> ServiceStatus { - ServiceStatus { - name: name.to_owned(), - status: Status::Unhealthy, - last_checked: current_time, - error: Some(err.to_string()), - details: ServiceDetails::Error, + Err(e) => ServiceStatus::error(name, e), } } pub(crate) fn initial_note_transport_status() -> ServiceStatus { - ServiceStatus { - name: "Note Transport".to_string(), - status: Status::Unknown, - last_checked: current_unix_timestamp_secs(), - error: None, - details: ServiceDetails::NoteTransportStatus(NoteTransportStatusDetails::default()), - } + ServiceStatus::unknown( + "Note Transport", + ServiceDetails::NoteTransportStatus(NoteTransportStatusDetails::default()), + ) } diff --git a/bin/network-monitor/src/remote_prover.rs b/bin/network-monitor/src/remote_prover.rs index ffa2a2724e..27d0a99705 100644 --- a/bin/network-monitor/src/remote_prover.rs +++ b/bin/network-monitor/src/remote_prover.rs @@ -22,8 +22,8 @@ use tonic::Request; use tracing::{info, instrument}; use url::Url; -use crate::status::{ServiceDetails, ServiceStatus, Status}; -use crate::{COMPONENT, current_unix_timestamp_secs}; +use crate::COMPONENT; +use crate::status::{ServiceDetails, ServiceStatus}; // PROOF TYPE // ================================================================================================ @@ -115,14 +115,11 @@ pub async fn run_remote_prover_test_task( loop { interval.tick().await; - let current_time = current_unix_timestamp_secs(); - let status = test_remote_prover( &mut client, name, &proof_type, &serialized_request_payload, - current_time, &mut success_count, &mut failure_count, ) @@ -147,7 +144,6 @@ pub async fn run_remote_prover_test_task( /// * `name` - The name of the remote prover. /// * `proof_type` - The type of proof to test. /// * `serialized_request_payload` - The serialized request payload to send to the remote prover. -/// * `current_time` - The current time in seconds since UNIX epoch. 
/// * `success_count` - Mutable reference to the success counter. /// * `failure_count` - Mutable reference to the failure counter. /// @@ -167,7 +163,6 @@ async fn test_remote_prover( name: &str, proof_type: &ProofType, serialized_request_payload: &proto::remote_prover::ProofRequest, - current_time: u64, success_count: &mut u64, failure_count: &mut u64, ) -> ServiceStatus { @@ -184,36 +179,31 @@ async fn test_remote_prover( *success_count += 1; - ServiceStatus { - name: name.to_string(), - status: Status::Healthy, - last_checked: current_time, - error: None, - details: ServiceDetails::RemoteProverTest(ProverTestDetails { + ServiceStatus::healthy( + name, + ServiceDetails::ProverTestResult(ProverTestDetails { test_duration_ms: duration.as_millis() as u64, proof_size_bytes: response_inner.payload.len(), success_count: *success_count, failure_count: *failure_count, proof_type: proof_type.clone(), }), - } + ) }, Err(e) => { *failure_count += 1; - ServiceStatus { - name: name.to_string(), - status: Status::Unhealthy, - last_checked: current_time, - error: Some(tonic_status_to_json(&e)), - details: ServiceDetails::RemoteProverTest(ProverTestDetails { + ServiceStatus::unhealthy( + name, + tonic_status_to_json(&e), + ServiceDetails::ProverTestResult(ProverTestDetails { test_duration_ms: 0, proof_size_bytes: 0, success_count: *success_count, failure_count: *failure_count, proof_type: proof_type.clone(), }), - } + ) }, } } diff --git a/bin/network-monitor/src/service_status.rs b/bin/network-monitor/src/service_status.rs new file mode 100644 index 0000000000..689b976bf7 --- /dev/null +++ b/bin/network-monitor/src/service_status.rs @@ -0,0 +1,387 @@ +//! Service status types and constructors for the network monitor. +//! +//! This module defines the data model for service health reporting: the [`ServiceStatus`] struct +//! with its builder methods, the [`ServiceDetails`] enum covering all monitored service types, +//! and the corresponding detail structs. 
+
+use std::time::{Duration, SystemTime, UNIX_EPOCH};
+
+use miden_node_proto::generated as proto;
+use miden_node_proto::generated::rpc::{BlockProducerStatus, RpcStatus, StoreStatus};
+use serde::{Deserialize, Serialize};
+
+use crate::faucet::FaucetTestDetails;
+use crate::remote_prover::{ProofType, ProverTestDetails};
+
+// STATUS
+// ================================================================================================
+
+#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq)]
+pub enum Status {
+    Healthy,
+    Unhealthy,
+    Unknown,
+}
+
+impl From<String> for Status {
+    fn from(value: String) -> Self {
+        match value.as_str() {
+            "HEALTHY" | "connected" => Status::Healthy,
+            "UNHEALTHY" | "disconnected" => Status::Unhealthy,
+            _ => Status::Unknown,
+        }
+    }
+}
+
+impl From<proto::remote_prover::WorkerHealthStatus> for Status {
+    fn from(value: proto::remote_prover::WorkerHealthStatus) -> Self {
+        match value {
+            proto::remote_prover::WorkerHealthStatus::Unknown => Status::Unknown,
+            proto::remote_prover::WorkerHealthStatus::Healthy => Status::Healthy,
+            proto::remote_prover::WorkerHealthStatus::Unhealthy => Status::Unhealthy,
+        }
+    }
+}
+
+// SERVICE STATUS
+// ================================================================================================
+
+/// Status of a service.
+///
+/// Captures the health of a service, the last time it was checked, and any error that occurred,
+/// along with service-specific details in a [`ServiceDetails`] variant.
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct ServiceStatus {
+    pub name: String,
+    pub status: Status,
+    pub last_checked: u64,
+    pub error: Option<String>,
+    pub details: ServiceDetails,
+}
+
+impl ServiceStatus {
+    /// Creates a healthy service status with the current timestamp.
+    pub fn healthy(name: impl Into<String>, details: ServiceDetails) -> Self {
+        Self {
+            name: name.into(),
+            status: Status::Healthy,
+            last_checked: current_unix_timestamp_secs(),
+            error: None,
+            details,
+        }
+    }
+
+    /// Creates an unhealthy service status with the current timestamp and an error message.
+    #[expect(clippy::needless_pass_by_value)]
+    pub fn unhealthy(
+        name: impl Into<String>,
+        error: impl ToString,
+        details: ServiceDetails,
+    ) -> Self {
+        Self {
+            name: name.into(),
+            status: Status::Unhealthy,
+            last_checked: current_unix_timestamp_secs(),
+            error: Some(error.to_string()),
+            details,
+        }
+    }
+
+    /// Creates an unknown service status with the current timestamp.
+    pub fn unknown(name: impl Into<String>, details: ServiceDetails) -> Self {
+        Self {
+            name: name.into(),
+            status: Status::Unknown,
+            last_checked: current_unix_timestamp_secs(),
+            error: None,
+            details,
+        }
+    }
+
+    /// Creates an unhealthy service status with [`ServiceDetails::Error`] details.
+    #[expect(clippy::needless_pass_by_value)]
+    pub fn error(name: impl Into<String>, error: impl ToString) -> Self {
+        Self {
+            name: name.into(),
+            status: Status::Unhealthy,
+            last_checked: current_unix_timestamp_secs(),
+            error: Some(error.to_string()),
+            details: ServiceDetails::Error,
+        }
+    }
+
+    /// Overrides the `last_checked` timestamp on an existing status.
+    ///
+    /// Useful when composing a new status from pre-existing data where we want to preserve the
+    /// original check timestamp instead of using the moment of construction.
+    #[must_use]
+    pub fn with_last_checked(mut self, ts: u64) -> Self {
+        self.last_checked = ts;
+        self
+    }
+}
+
+// SERVICE DETAILS
+// ================================================================================================
+
+/// Details of a service.
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub enum ServiceDetails {
+    RpcStatus(RpcStatusDetails),
+    /// Remote prover status combined with its most recent test result.
+    RemoteProverStatus(RemoteProverDetails),
+    /// Internal: raw output of a remote prover status check task.
+    ProverStatusCheck(RemoteProverStatusDetails),
+    /// Internal: raw output of a remote prover test task.
+    ProverTestResult(ProverTestDetails),
+    FaucetTest(FaucetTestDetails),
+    NtxIncrement(IncrementDetails),
+    NtxTracking(CounterTrackingDetails),
+    ExplorerStatus(ExplorerStatusDetails),
+    NoteTransportStatus(NoteTransportStatusDetails),
+    ValidatorStatus(ValidatorStatusDetails),
+    Error,
+}
+
+/// Remote prover status combined with its most recent test result.
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct RemoteProverDetails {
+    pub status: RemoteProverStatusDetails,
+    pub test: Option<ProverTestDetails>,
+    pub test_status: Option<Status>,
+    pub test_error: Option<String>,
+}
+
+/// Details of the increment service.
+#[derive(Debug, Clone, Serialize, Deserialize, Default)]
+pub struct IncrementDetails {
+    /// Number of successful counter increments.
+    pub success_count: u64,
+    /// Number of failed counter increments.
+    pub failure_count: u64,
+    /// Last transaction ID (if available).
+    pub last_tx_id: Option<String>,
+    /// Last measured latency in blocks from submission to state update.
+    pub last_latency_blocks: Option<u32>,
+}
+
+/// Details about an in-flight latency measurement.
+#[derive(Debug, Clone, Serialize, Deserialize, Default)]
+pub struct PendingLatencyDetails {
+    /// Block height returned when the transaction was submitted.
+    pub submit_height: u32,
+    /// Counter value we expect to see once the transaction is applied.
+    pub target_value: u64,
+}
+
+/// Details of the counter tracking service.
+#[derive(Debug, Clone, Serialize, Deserialize, Default)]
+pub struct CounterTrackingDetails {
+    /// Current counter value observed on-chain (if available).
+    pub current_value: Option<u64>,
+    /// Expected counter value based on successful increments sent.
+    pub expected_value: Option<u64>,
+    /// Last time the counter value was successfully updated.
+    pub last_updated: Option<u64>,
+    /// Number of pending increments (expected - current).
+    pub pending_increments: Option<u64>,
+}
+
+/// Details of the explorer service.
+#[derive(Debug, Clone, Serialize, Deserialize, Default)]
+pub struct ExplorerStatusDetails {
+    pub block_number: u64,
+    pub timestamp: u64,
+    pub number_of_transactions: u64,
+    pub number_of_nullifiers: u64,
+    pub number_of_notes: u64,
+    pub number_of_account_updates: u64,
+    pub block_commitment: String,
+    pub chain_commitment: String,
+    pub proof_commitment: String,
+}
+
+/// Details of the note transport service.
+#[derive(Debug, Clone, Serialize, Deserialize, Default)]
+pub struct NoteTransportStatusDetails {
+    pub url: String,
+    pub serving_status: String,
+}
+
+/// Details of the validator service.
+#[derive(Debug, Clone, Serialize, Deserialize, Default)]
+pub struct ValidatorStatusDetails {
+    pub url: String,
+    pub version: String,
+    pub chain_tip: u32,
+    pub validated_transactions_count: u64,
+    pub signed_blocks_count: u64,
+}
+
+// RPC STATUS DETAILS
+// ================================================================================================
+
+/// Details of an RPC service.
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct RpcStatusDetails {
+    /// The URL of the RPC service (used by the frontend for gRPC-Web probing).
+    pub url: String,
+    pub version: String,
+    pub genesis_commitment: Option<String>,
+    pub store_status: Option<StoreStatusDetails>,
+    pub block_producer_status: Option<BlockProducerStatusDetails>,
+}
+
+/// Details of a store service.
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct StoreStatusDetails {
+    pub version: String,
+    pub status: Status,
+    pub chain_tip: u32,
+}
+
+/// Details of a block producer service.
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct BlockProducerStatusDetails {
+    pub version: String,
+    pub status: Status,
+    /// The block producer's current view of the chain tip height.
+    pub chain_tip: u32,
+    /// Mempool statistics for this block producer.
+    pub mempool: MempoolStatusDetails,
+}
+
+/// Details about the block producer's mempool.
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct MempoolStatusDetails {
+    /// Number of transactions currently in the mempool waiting to be batched.
+    pub unbatched_transactions: u64,
+    /// Number of batches currently being proven.
+    pub proposed_batches: u64,
+    /// Number of proven batches waiting for block inclusion.
+    pub proven_batches: u64,
+}
+
+// REMOTE PROVER STATUS DETAILS
+// ================================================================================================
+
+/// Details of a remote prover service.
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct RemoteProverStatusDetails {
+    pub url: String,
+    pub version: String,
+    pub supported_proof_type: ProofType,
+    pub workers: Vec<WorkerStatusDetails>,
+}
+
+/// Details of a worker service.
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct WorkerStatusDetails {
+    pub name: String,
+    pub version: String,
+    pub status: Status,
+}
+
+// NETWORK STATUS
+// ================================================================================================
+
+/// Status of the entire network, aggregating all service statuses.
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct NetworkStatus {
+    pub services: Vec<ServiceStatus>,
+    pub last_updated: u64,
+    pub monitor_version: String,
+    pub network_name: String,
+}
+
+// FROM IMPLEMENTATIONS
+// ================================================================================================
+
+impl From<StoreStatus> for StoreStatusDetails {
+    fn from(value: StoreStatus) -> Self {
+        Self {
+            version: value.version,
+            status: value.status.into(),
+            chain_tip: value.chain_tip,
+        }
+    }
+}
+
+impl From<BlockProducerStatus> for BlockProducerStatusDetails {
+    fn from(value: BlockProducerStatus) -> Self {
+        // We assume all supported nodes expose mempool statistics.
+        let mempool_stats = value
+            .mempool_stats
+            .expect("block producer status must include mempool statistics");
+
+        Self {
+            version: value.version,
+            status: value.status.into(),
+            chain_tip: value.chain_tip,
+            mempool: MempoolStatusDetails {
+                unbatched_transactions: mempool_stats.unbatched_transactions,
+                proposed_batches: mempool_stats.proposed_batches,
+                proven_batches: mempool_stats.proven_batches,
+            },
+        }
+    }
+}
+
+impl From<proto::remote_prover::ProxyWorkerStatus> for WorkerStatusDetails {
+    fn from(value: proto::remote_prover::ProxyWorkerStatus) -> Self {
+        let status =
+            proto::remote_prover::WorkerHealthStatus::try_from(value.status).unwrap().into();
+
+        Self {
+            name: value.name,
+            version: value.version,
+            status,
+        }
+    }
+}
+
+impl RemoteProverStatusDetails {
+    pub fn from_proxy_status(status: proto::remote_prover::ProxyStatus, url: String) -> Self {
+        let proof_type = proto::remote_prover::ProofType::try_from(status.supported_proof_type)
+            .unwrap()
+            .into();
+
+        let workers: Vec<WorkerStatusDetails> =
+            status.workers.into_iter().map(WorkerStatusDetails::from).collect();
+
+        Self {
+            url,
+            version: status.version,
+            supported_proof_type: proof_type,
+            workers,
+        }
+    }
+}
+
+impl RpcStatusDetails {
+    /// Creates `RpcStatusDetails` from a gRPC `RpcStatus` response and the configured URL.
+    pub fn from_rpc_status(status: RpcStatus, url: String) -> Self {
+        Self {
+            url,
+            version: status.version,
+            genesis_commitment: status.genesis_commitment.as_ref().map(|gc| format!("{gc:?}")),
+            store_status: status.store.map(StoreStatusDetails::from),
+            block_producer_status: status.block_producer.map(BlockProducerStatusDetails::from),
+        }
+    }
+}
+
+// UTILITIES
+// ================================================================================================
+
+/// Gets the current Unix timestamp in seconds.
+///
+/// This function is infallible - if the system time is somehow before Unix epoch
+/// (extremely unlikely), it returns 0.
+pub fn current_unix_timestamp_secs() -> u64 {
+    SystemTime::now()
+        .duration_since(UNIX_EPOCH)
+        .unwrap_or_else(|_| Duration::from_secs(0))
+        .as_secs()
+}
diff --git a/bin/network-monitor/src/status.rs b/bin/network-monitor/src/status.rs
index a15f4c3d10..d0a6724d89 100644
--- a/bin/network-monitor/src/status.rs
+++ b/bin/network-monitor/src/status.rs
@@ -1,7 +1,9 @@
-//! Network monitor status checker.
+//! Network monitor status checker tasks.
 //!
 //! This module contains the logic for checking the status of network services.
 //! Individual status checker tasks send updates via watch channels to the web server.
+//!
+//! Type definitions live in [`crate::service_status`] and are re-exported here for convenience.
 
 use std::time::Duration;
 
@@ -10,17 +12,13 @@ use miden_node_proto::clients::{
     RemoteProverProxyStatusClient,
     RpcClient,
 };
-use miden_node_proto::generated as proto;
-use miden_node_proto::generated::rpc::{BlockProducerStatus, RpcStatus, StoreStatus};
-use serde::{Deserialize, Serialize};
 use tokio::sync::watch;
 use tokio::time::MissedTickBehavior;
 use tracing::{debug, info, instrument};
 use url::Url;
 
-use crate::faucet::FaucetTestDetails;
-use crate::remote_prover::{ProofType, ProverTestDetails};
-use crate::{COMPONENT, current_unix_timestamp_secs};
+use crate::COMPONENT;
+pub use crate::service_status::*;
 
 // STALE CHAIN TIP TRACKER
 // ================================================================================================
@@ -74,287 +72,6 @@ impl StaleChainTracker {
     }
 }
 
-// STATUS
-// ================================================================================================
-
-#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq)]
-pub enum Status {
-    Healthy,
-    Unhealthy,
-    Unknown,
-}
-
-impl From<String> for Status {
-    fn from(value: String) -> Self {
-        match value.as_str() {
-            "HEALTHY" | "connected" => Status::Healthy,
-            "UNHEALTHY" | "disconnected" => Status::Unhealthy,
-            _ => Status::Unknown,
-        }
-    }
-}
-
-impl From<proto::remote_prover::WorkerHealthStatus> for Status {
-    fn from(value: proto::remote_prover::WorkerHealthStatus) -> Self {
-        match value {
-            proto::remote_prover::WorkerHealthStatus::Unknown => Status::Unknown,
-            proto::remote_prover::WorkerHealthStatus::Healthy => Status::Healthy,
-            proto::remote_prover::WorkerHealthStatus::Unhealthy => Status::Unhealthy,
-        }
-    }
-}
-
-// SERVICE STATUS
-// ================================================================================================
-
-/// Status of a service.
-///
-/// This struct contains the status of a service, the last time it was checked, and any errors that
-/// occurred. It also contains the details of the service, which is a union of the details of the
-/// service.
-#[derive(Debug, Clone, Serialize, Deserialize)]
-pub struct ServiceStatus {
-    pub name: String,
-    pub status: Status,
-    pub last_checked: u64,
-    pub error: Option<String>,
-    pub details: ServiceDetails,
-}
-
-/// Details of the increment service.
-#[derive(Debug, Clone, Serialize, Deserialize, Default)]
-pub struct IncrementDetails {
-    /// Number of successful counter increments.
-    pub success_count: u64,
-    /// Number of failed counter increments.
-    pub failure_count: u64,
-    /// Last transaction ID (if available).
-    pub last_tx_id: Option<String>,
-    /// Last measured latency in blocks from submission to state update.
-    pub last_latency_blocks: Option<u32>,
-}
-
-/// Details about an in-flight latency measurement.
-#[derive(Debug, Clone, Serialize, Deserialize, Default)]
-pub struct PendingLatencyDetails {
-    /// Block height returned when the transaction was submitted.
-    pub submit_height: u32,
-    /// Counter value we expect to see once the transaction is applied.
-    pub target_value: u64,
-}
-
-/// Details of the counter tracking service.
-#[derive(Debug, Clone, Serialize, Deserialize, Default)]
-pub struct CounterTrackingDetails {
-    /// Current counter value observed on-chain (if available).
-    pub current_value: Option<u64>,
-    /// Expected counter value based on successful increments sent.
-    pub expected_value: Option<u64>,
-    /// Last time the counter value was successfully updated.
-    pub last_updated: Option<u64>,
-    /// Number of pending increments (expected - current).
-    pub pending_increments: Option<u64>,
-}
-
-/// Details of the explorer service.
-#[derive(Debug, Clone, Serialize, Deserialize, Default)]
-pub struct ExplorerStatusDetails {
-    pub block_number: u64,
-    pub timestamp: u64,
-    pub number_of_transactions: u64,
-    pub number_of_nullifiers: u64,
-    pub number_of_notes: u64,
-    pub number_of_account_updates: u64,
-    pub block_commitment: String,
-    pub chain_commitment: String,
-    pub proof_commitment: String,
-}
-
-/// Details of the note transport service.
-#[derive(Debug, Clone, Serialize, Deserialize, Default)]
-pub struct NoteTransportStatusDetails {
-    pub url: String,
-    pub serving_status: String,
-}
-
-/// Details of a service.
-#[derive(Debug, Clone, Serialize, Deserialize)]
-pub enum ServiceDetails {
-    RpcStatus(RpcStatusDetails),
-    RemoteProverStatus(RemoteProverStatusDetails),
-    RemoteProverTest(ProverTestDetails),
-    FaucetTest(FaucetTestDetails),
-    NtxIncrement(IncrementDetails),
-    NtxTracking(CounterTrackingDetails),
-    ExplorerStatus(ExplorerStatusDetails),
-    NoteTransportStatus(NoteTransportStatusDetails),
-    Error,
-}
-
-/// Details of an RPC service.
-///
-/// This struct contains the details of an RPC service, which is a union of the details of the RPC
-/// service.
-#[derive(Debug, Clone, Serialize, Deserialize)]
-pub struct RpcStatusDetails {
-    /// The URL of the RPC service (used by the frontend for gRPC-Web probing).
-    pub url: String,
-    pub version: String,
-    pub genesis_commitment: Option<String>,
-    pub store_status: Option<StoreStatusDetails>,
-    pub block_producer_status: Option<BlockProducerStatusDetails>,
-}
-
-/// Details of a store service.
-///
-/// This struct contains the details of a store service, which is a union of the details of the
-/// store service.
-#[derive(Debug, Clone, Serialize, Deserialize)]
-pub struct StoreStatusDetails {
-    pub version: String,
-    pub status: Status,
-    pub chain_tip: u32,
-}
-
-/// Details of a block producer service.
-///
-/// This struct contains the details of a block producer service, which is a union of the details
-/// of the block producer service.
-#[derive(Debug, Clone, Serialize, Deserialize)]
-pub struct BlockProducerStatusDetails {
-    pub version: String,
-    pub status: Status,
-    /// The block producer's current view of the chain tip height.
-    pub chain_tip: u32,
-    /// Mempool statistics for this block producer.
-    pub mempool: MempoolStatusDetails,
-}
-
-/// Details about the block producer's mempool.
-#[derive(Debug, Clone, Serialize, Deserialize)]
-pub struct MempoolStatusDetails {
-    /// Number of transactions currently in the mempool waiting to be batched.
-    pub unbatched_transactions: u64,
-    /// Number of batches currently being proven.
-    pub proposed_batches: u64,
-    /// Number of proven batches waiting for block inclusion.
-    pub proven_batches: u64,
-}
-
-/// Details of a remote prover service.
-///
-/// This struct contains the details of a remote prover service, which is a union of the details
-/// of the remote prover service.
-#[derive(Debug, Clone, Serialize, Deserialize)]
-pub struct RemoteProverStatusDetails {
-    pub url: String,
-    pub version: String,
-    pub supported_proof_type: ProofType,
-    pub workers: Vec<WorkerStatusDetails>,
-}
-
-/// Details of a worker service.
-///
-/// This struct contains the details of a worker service, which is a union of the details of the
-/// worker service.
-#[derive(Debug, Clone, Serialize, Deserialize)]
-pub struct WorkerStatusDetails {
-    pub name: String,
-    pub version: String,
-    pub status: Status,
-}
-
-/// Status of a network.
-///
-/// This struct contains the status of a network, which is a union of the status of the network.
-#[derive(Debug, Clone, Serialize, Deserialize)]
-pub struct NetworkStatus {
-    pub services: Vec<ServiceStatus>,
-    pub last_updated: u64,
-    pub monitor_version: String,
-    pub network_name: String,
-}
-
-// FROM IMPLEMENTATIONS
-// ================================================================================================
-
-/// From implementations for converting gRPC types to domain types
-///
-/// This implementation converts a `StoreStatus` to a `StoreStatusDetails`.
-impl From<StoreStatus> for StoreStatusDetails {
-    fn from(value: StoreStatus) -> Self {
-        Self {
-            version: value.version,
-            status: value.status.into(),
-            chain_tip: value.chain_tip,
-        }
-    }
-}
-
-impl From<BlockProducerStatus> for BlockProducerStatusDetails {
-    fn from(value: BlockProducerStatus) -> Self {
-        // We assume all supported nodes expose mempool statistics.
-        let mempool_stats = value
-            .mempool_stats
-            .expect("block producer status must include mempool statistics");
-
-        Self {
-            version: value.version,
-            status: value.status.into(),
-            chain_tip: value.chain_tip,
-            mempool: MempoolStatusDetails {
-                unbatched_transactions: mempool_stats.unbatched_transactions,
-                proposed_batches: mempool_stats.proposed_batches,
-                proven_batches: mempool_stats.proven_batches,
-            },
-        }
-    }
-}
-
-impl From<proto::remote_prover::ProxyWorkerStatus> for WorkerStatusDetails {
-    fn from(value: proto::remote_prover::ProxyWorkerStatus) -> Self {
-        let status =
-            proto::remote_prover::WorkerHealthStatus::try_from(value.status).unwrap().into();
-
-        Self {
-            name: value.name,
-            version: value.version,
-            status,
-        }
-    }
-}
-
-impl RemoteProverStatusDetails {
-    pub fn from_proxy_status(status: proto::remote_prover::ProxyStatus, url: String) -> Self {
-        let proof_type = proto::remote_prover::ProofType::try_from(status.supported_proof_type)
-            .unwrap()
-            .into();
-
-        let workers: Vec<WorkerStatusDetails> =
-            status.workers.into_iter().map(WorkerStatusDetails::from).collect();
-
-        Self {
-            url,
-            version: status.version,
-            supported_proof_type: proof_type,
-            workers,
-        }
-    }
-}
-
-impl RpcStatusDetails {
-    /// Creates 
`RpcStatusDetails` from a gRPC `RpcStatus` response and the configured URL. - pub fn from_rpc_status(status: RpcStatus, url: String) -> Self { - Self { - url, - version: status.version, - genesis_commitment: status.genesis_commitment.as_ref().map(|gc| format!("{gc:?}")), - store_status: status.store.map(StoreStatusDetails::from), - block_producer_status: status.block_producer.map(BlockProducerStatusDetails::from), - } - } -} - // RPC STATUS CHECKER // ================================================================================================ @@ -461,36 +178,22 @@ pub(crate) async fn check_rpc_status( stale_duration_secs = stale_duration, "Chain tip is stale" ); - return ServiceStatus { - name: "RPC".to_string(), - status: Status::Unhealthy, - last_checked: current_time, - error: Some(format!( + return ServiceStatus::unhealthy( + "RPC", + format!( "Chain tip {} has not changed for {} seconds", store_status.chain_tip, stale_duration - )), - details: ServiceDetails::RpcStatus(rpc_details), - }; + ), + ServiceDetails::RpcStatus(rpc_details), + ); } } - ServiceStatus { - name: "RPC".to_string(), - status: Status::Healthy, - last_checked: current_time, - error: None, - details: ServiceDetails::RpcStatus(rpc_details), - } + ServiceStatus::healthy("RPC", ServiceDetails::RpcStatus(rpc_details)) }, Err(e) => { debug!(target: COMPONENT, error = %e, "RPC status check failed"); - ServiceStatus { - name: "RPC".to_string(), - status: Status::Unhealthy, - last_checked: current_time, - error: Some(e.to_string()), - details: ServiceDetails::Error, - } + ServiceStatus::error("RPC", e) }, } } @@ -537,15 +240,8 @@ pub async fn run_remote_prover_status_task( loop { interval.tick().await; - let current_time = current_unix_timestamp_secs(); - - let status = check_remote_prover_status( - &mut remote_prover, - name.clone(), - url_str.clone(), - current_time, - ) - .await; + let status = + check_remote_prover_status(&mut remote_prover, name.clone(), url_str.clone()).await; // Send the 
status update; exit if no receivers (shutdown signal) if status_sender.send(status).is_err() { @@ -564,7 +260,6 @@ pub async fn run_remote_prover_status_task( /// * `remote_prover` - The remote prover client. /// * `name` - The name of the remote prover. /// * `url` - The URL of the remote prover. -/// * `current_time` - The current time. /// /// # Returns /// @@ -581,7 +276,6 @@ pub(crate) async fn check_remote_prover_status( remote_prover: &mut miden_node_proto::clients::RemoteProverProxyStatusClient, display_name: String, url: String, - current_time: u64, ) -> ServiceStatus { match remote_prover.status(()).await { Ok(response) => { @@ -592,31 +286,32 @@ pub(crate) async fn check_remote_prover_status( // Determine overall health based on worker statuses. // All workers must be healthy for the prover to be considered healthy. - let overall_health = if remote_prover_details.workers.is_empty() { - Status::Unknown - } else if remote_prover_details.workers.iter().all(|w| w.status == Status::Healthy) { - Status::Healthy + let no_workers = remote_prover_details.workers.is_empty(); + let all_healthy = + remote_prover_details.workers.iter().all(|w| w.status == Status::Healthy); + let unhealthy_worker_names: Vec<_> = remote_prover_details + .workers + .iter() + .filter(|w| w.status != Status::Healthy) + .map(|w| w.name.clone()) + .collect(); + let details = ServiceDetails::ProverStatusCheck(remote_prover_details); + + if no_workers { + ServiceStatus::unknown(display_name, details) + } else if all_healthy { + ServiceStatus::healthy(display_name, details) } else { - Status::Unhealthy - }; - - ServiceStatus { - name: display_name.clone(), - status: overall_health, - last_checked: current_time, - error: None, - details: ServiceDetails::RemoteProverStatus(remote_prover_details), + ServiceStatus::unhealthy( + display_name, + format!("unhealthy workers: {}", unhealthy_worker_names.join(", ")), + details, + ) } }, Err(e) => { debug!(target: COMPONENT, prover_name = %display_name, 
error = %e, "Remote prover status check failed");
-            ServiceStatus {
-                name: display_name,
-                status: Status::Unhealthy,
-                last_checked: current_time,
-                error: Some(e.to_string()),
-                details: ServiceDetails::Error,
-            }
+            ServiceStatus::error(display_name, e)
         },
     }
 }
diff --git a/bin/network-monitor/src/validator.rs b/bin/network-monitor/src/validator.rs
new file mode 100644
index 0000000000..2376ebd475
--- /dev/null
+++ b/bin/network-monitor/src/validator.rs
@@ -0,0 +1,83 @@
+// VALIDATOR STATUS CHECKER
+// ================================================================================================
+
+use std::time::Duration;
+
+use miden_node_proto::clients::{Builder as ClientBuilder, ValidatorClient};
+use tokio::sync::watch;
+use tokio::time::MissedTickBehavior;
+use tracing::{info, instrument};
+use url::Url;
+
+use crate::COMPONENT;
+use crate::status::{ServiceDetails, ServiceStatus, ValidatorStatusDetails};
+
+/// Runs a task that continuously checks validator status and updates a watch channel.
+pub async fn run_validator_status_task(
+    url: Url,
+    name: String,
+    status_sender: watch::Sender<ServiceStatus>,
+    status_check_interval: Duration,
+    request_timeout: Duration,
+) {
+    let mut validator = ClientBuilder::new(url.clone())
+        .with_tls()
+        .expect("TLS is enabled")
+        .with_timeout(request_timeout)
+        .without_metadata_version()
+        .without_metadata_genesis()
+        .without_otel_context_injection()
+        .connect_lazy::<ValidatorClient>();
+
+    let mut interval = tokio::time::interval(status_check_interval);
+    interval.set_missed_tick_behavior(MissedTickBehavior::Skip);
+
+    loop {
+        interval.tick().await;
+
+        let status = check_validator_status(&mut validator, &url, name.clone()).await;
+
+        if status_sender.send(status).is_err() {
+            info!("No receivers for validator status updates, shutting down");
+            return;
+        }
+    }
+}
+
+/// Checks the status of the validator service via its gRPC Status endpoint.
+#[instrument( + target = COMPONENT, + name = "check-status.validator", + skip_all, + ret(level = "info") +)] +pub(crate) async fn check_validator_status( + validator: &mut ValidatorClient, + url: &Url, + name: String, +) -> ServiceStatus { + match validator.status(()).await { + Ok(response) => { + let status = response.into_inner(); + + ServiceStatus::healthy( + name, + ServiceDetails::ValidatorStatus(ValidatorStatusDetails { + url: url.to_string(), + version: status.version, + chain_tip: status.chain_tip, + validated_transactions_count: status.validated_transactions_count, + signed_blocks_count: status.signed_blocks_count, + }), + ) + }, + Err(e) => ServiceStatus::error(name, e), + } +} + +pub(crate) fn initial_validator_status() -> ServiceStatus { + ServiceStatus::unknown( + "Validator", + ServiceDetails::ValidatorStatus(ValidatorStatusDetails::default()), + ) +} diff --git a/bin/node/.env b/bin/node/.env index 51a04794f9..51cfaa3f1e 100644 --- a/bin/node/.env +++ b/bin/node/.env @@ -1,8 +1,7 @@ # For more info use -h on the relevant commands: -# miden-node bundled start -h +# miden-node start -h MIDEN_NODE_BLOCK_PRODUCER_URL= MIDEN_NODE_VALIDATOR_URL= -MIDEN_NODE_NTX_BUILDER_URL= MIDEN_NODE_BATCH_PROVER_URL= MIDEN_NODE_BLOCK_PROVER_URL= MIDEN_NODE_NTX_PROVER_URL= diff --git a/bin/node/Dockerfile b/bin/node/Dockerfile index 04cb6783cd..5986451a28 100644 --- a/bin/node/Dockerfile +++ b/bin/node/Dockerfile @@ -28,7 +28,7 @@ COPY . . RUN cargo build --release --locked --bin miden-node # Base line runtime image with runtime dependencies installed. 
-FROM debian:bullseye-slim AS runtime-base +FROM debian:bookworm-slim AS runtime-base RUN apt-get update && \ apt-get -y upgrade && \ apt-get install -y --no-install-recommends sqlite3 \ diff --git a/bin/node/src/commands/bundled.rs b/bin/node/src/commands/bundled.rs deleted file mode 100644 index 2b4c69dd30..0000000000 --- a/bin/node/src/commands/bundled.rs +++ /dev/null @@ -1,391 +0,0 @@ -use std::collections::HashMap; -use std::num::NonZeroUsize; -use std::path::PathBuf; - -use anyhow::Context; -use miden_node_block_producer::BlockProducer; -use miden_node_rpc::Rpc; -use miden_node_store::{DEFAULT_MAX_CONCURRENT_PROOFS, Store}; -use miden_node_utils::clap::{GrpcOptionsExternal, StorageOptions}; -use miden_node_utils::grpc::UrlExt; -use miden_node_validator::{Validator, ValidatorSigner}; -use miden_protocol::crypto::dsa::ecdsa_k256_keccak::SecretKey; -use miden_protocol::utils::serde::Deserializable; -use tokio::net::TcpListener; -use tokio::task::JoinSet; -use url::Url; - -use super::{ENV_DATA_DIRECTORY, ENV_RPC_URL}; -use crate::commands::{ - BlockProducerConfig, - BundledValidatorConfig, - ENV_BLOCK_PROVER_URL, - ENV_ENABLE_OTEL, - ENV_GENESIS_CONFIG_FILE, - NtxBuilderConfig, - ValidatorKey, -}; - -#[derive(clap::Subcommand)] -#[expect(clippy::large_enum_variant, reason = "This is a single use enum")] -pub enum BundledCommand { - /// Bootstraps the blockchain database with the genesis block. - /// - /// The genesis block contains a single public faucet account. The private key for this - /// account is written to the `accounts-directory` which can be used to control the account. - /// - /// This key is not required by the node and can be moved. - Bootstrap { - /// Directory in which to store the database and raw block data. - #[arg(long, env = ENV_DATA_DIRECTORY, value_name = "DIR")] - data_directory: PathBuf, - // Directory to write the account data to. 
-        #[arg(long, value_name = "DIR")]
-        accounts_directory: PathBuf,
-        /// Constructs the genesis block from the given toml file.
-        #[arg(long, env = ENV_GENESIS_CONFIG_FILE, value_name = "FILE")]
-        genesis_config_file: Option<PathBuf>,
-        /// Configuration for the Validator key used to sign genesis block.
-        #[command(flatten)]
-        validator_key: ValidatorKey,
-    },
-
-    /// Runs all three node components in the same process.
-    ///
-    /// The internal gRPC endpoints for the store and block-producer will each be assigned a random
-    /// open port on localhost (127.0.0.1:0).
-    Start {
-        /// Url at which to serve the RPC component's gRPC API.
-        #[arg(long = "rpc.url", env = ENV_RPC_URL, value_name = "URL")]
-        rpc_url: Url,
-
-        /// The remote block prover's gRPC url. If not provided, a local block prover will be used.
-        #[arg(long = "block-prover.url", env = ENV_BLOCK_PROVER_URL, value_name = "URL")]
-        block_prover_url: Option<Url>,
-
-        /// Directory in which the Store component should store the database and raw block data.
-        #[arg(long = "data-directory", env = ENV_DATA_DIRECTORY, value_name = "DIR")]
-        data_directory: PathBuf,
-
-        #[command(flatten)]
-        block_producer: BlockProducerConfig,
-
-        #[command(flatten)]
-        ntx_builder: NtxBuilderConfig,
-
-        #[command(flatten)]
-        validator: BundledValidatorConfig,
-
-        /// Enables the exporting of traces for OpenTelemetry.
-        ///
-        /// This can be further configured using environment variables as defined in the official
-        /// OpenTelemetry documentation. See our operator manual for further details.
-        #[arg(long = "enable-otel", default_value_t = false, env = ENV_ENABLE_OTEL, value_name = "BOOL")]
-        enable_otel: bool,
-
-        /// Maximum number of concurrent block proofs to be scheduled.
-        #[arg(
-            long = "max-concurrent-proofs",
-            default_value_t = DEFAULT_MAX_CONCURRENT_PROOFS,
-            value_name = "NUM"
-        )]
-        max_concurrent_proofs: NonZeroUsize,
-
-        #[command(flatten)]
-        grpc_options: GrpcOptionsExternal,
-
-        #[command(flatten)]
-        storage_options: StorageOptions,
-    },
-}
-
-impl BundledCommand {
-    pub async fn handle(self) -> anyhow::Result<()> {
-        match self {
-            BundledCommand::Bootstrap {
-                data_directory,
-                accounts_directory,
-                genesis_config_file,
-                validator_key,
-            } => {
-                // Run validator bootstrap to create genesis block + account files.
-                crate::commands::validator::ValidatorCommand::bootstrap_genesis(
-                    &data_directory,
-                    &accounts_directory,
-                    &data_directory,
-                    genesis_config_file.as_ref(),
-                    validator_key,
-                )
-                .await
-                .context("failed to bootstrap genesis block")?;
-
-                // Feed the genesis block file into the store bootstrap.
-                let genesis_block_path =
-                    data_directory.join(crate::commands::validator::GENESIS_BLOCK_FILENAME);
-                crate::commands::store::bootstrap_store(&data_directory, &genesis_block_path)
-                    .context("failed to bootstrap the store component")
-            },
-            BundledCommand::Start {
-                rpc_url,
-                block_prover_url,
-                data_directory,
-                block_producer,
-                ntx_builder,
-                validator,
-                enable_otel: _,
-                grpc_options,
-                max_concurrent_proofs,
-                storage_options,
-            } => {
-                Self::start(
-                    rpc_url,
-                    block_prover_url,
-                    data_directory,
-                    block_producer,
-                    ntx_builder,
-                    validator,
-                    grpc_options,
-                    max_concurrent_proofs,
-                    storage_options,
-                )
-                .await
-            },
-        }
-    }
-
-    #[expect(clippy::too_many_lines, clippy::too_many_arguments)]
-    async fn start(
-        rpc_url: Url,
-        block_prover_url: Option<Url>,
-        data_directory: PathBuf,
-        block_producer: BlockProducerConfig,
-        ntx_builder: NtxBuilderConfig,
-        validator: BundledValidatorConfig,
-        grpc_options: GrpcOptionsExternal,
-        max_concurrent_proofs: NonZeroUsize,
-        storage_options: StorageOptions,
-    ) -> anyhow::Result<()> {
-        // Start listening on all gRPC urls so that inter-component connections can be 
created - // before each component is fully started up. - // - // This is required because `tonic` does not handle retries nor reconnections and our - // services expect to be able to connect on startup. - let grpc_rpc = rpc_url.to_socket().context("Failed to to RPC gRPC socket")?; - let grpc_rpc = TcpListener::bind(grpc_rpc) - .await - .context("Failed to bind to RPC gRPC endpoint")?; - - let (block_producer_url, block_producer_address) = { - let socket_addr = TcpListener::bind("127.0.0.1:0") - .await - .context("Failed to bind to block-producer gRPC endpoint")? - .local_addr() - .context("Failed to retrieve the block-producer's gRPC address")?; - let url = Url::parse(&format!("http://{socket_addr}")) - .context("Failed to parse Block Producer URL")?; - (url, socket_addr) - }; - - // Validator URL is either specified remote, or generated local. - let (validator_url, validator_socket_address) = validator.to_addresses().await?; - - // Store addresses for each exposed API - let store_rpc_listener = TcpListener::bind("127.0.0.1:0") - .await - .context("Failed to bind to store RPC gRPC endpoint")?; - let store_ntx_builder_listener = TcpListener::bind("127.0.0.1:0") - .await - .context("Failed to bind to store ntx-builder gRPC endpoint")?; - let store_block_producer_listener = TcpListener::bind("127.0.0.1:0") - .await - .context("Failed to bind to store block-producer gRPC endpoint")?; - let store_rpc_address = store_rpc_listener - .local_addr() - .context("Failed to retrieve the store's RPC gRPC address")?; - let store_block_producer_address = store_block_producer_listener - .local_addr() - .context("Failed to retrieve the store's block-producer gRPC address")?; - let store_ntx_builder_address = store_ntx_builder_listener - .local_addr() - .context("Failed to retrieve the store's ntx-builder gRPC address")?; - - let mut join_set = JoinSet::new(); - // Start store. The store endpoint is available after loading completes. 
- let data_directory_clone = data_directory.clone(); - let store_id = join_set - .spawn(async move { - Store { - rpc_listener: store_rpc_listener, - block_producer_listener: store_block_producer_listener, - ntx_builder_listener: store_ntx_builder_listener, - data_directory: data_directory_clone, - block_prover_url, - grpc_options: grpc_options.into(), - max_concurrent_proofs, - storage_options, - } - .serve() - .await - .context("failed while serving store component") - }) - .id(); - - let should_start_ntx_builder = !ntx_builder.disabled; - - // Start block-producer. The block-producer's endpoint is available after loading completes. - let block_producer_id = { - let validator_url = validator_url.clone(); - join_set - .spawn({ - let store_url = Url::parse(&format!("http://{store_block_producer_address}")) - .context("Failed to parse URL")?; - async move { - BlockProducer { - block_producer_address, - store_url, - validator_url, - batch_prover_url: block_producer.batch_prover_url, - batch_interval: block_producer.batch_interval, - block_interval: block_producer.block_interval, - max_batches_per_block: block_producer.max_batches_per_block, - max_txs_per_batch: block_producer.max_txs_per_batch, - grpc_options: grpc_options.into(), - mempool_tx_capacity: block_producer.mempool_tx_capacity, - } - .serve() - .await - .context("failed while serving block-producer component") - } - }) - .id() - }; - - // Prepare network transaction builder (bind listener + config before starting RPC, - // so that the ntx-builder URL is available for the RPC proxy). 
- let mut ntx_builder_url_for_rpc = None; - let ntx_builder_prepared = if should_start_ntx_builder { - let store_ntx_builder_url = Url::parse(&format!("http://{store_ntx_builder_address}")) - .context("Failed to parse URL")?; - let block_producer_url = block_producer_url.clone(); - let validator_url = validator_url.clone(); - - let builder_config = ntx_builder.into_builder_config( - store_ntx_builder_url, - block_producer_url, - validator_url, - &data_directory, - ); - - // Bind a listener for the ntx-builder gRPC server. - let ntx_builder_listener = TcpListener::bind("127.0.0.1:0") - .await - .context("Failed to bind to ntx-builder gRPC endpoint")?; - let ntx_builder_address = ntx_builder_listener - .local_addr() - .context("Failed to retrieve the ntx-builder's gRPC address")?; - ntx_builder_url_for_rpc = Some( - Url::parse(&format!("http://{ntx_builder_address}")) - .context("Failed to parse ntx-builder URL")?, - ); - - Some((builder_config, ntx_builder_listener)) - } else { - None - }; - - // Start RPC component. - let rpc_id = { - let block_producer_url = block_producer_url.clone(); - let validator_url = validator_url.clone(); - let ntx_builder_url = ntx_builder_url_for_rpc; - join_set - .spawn(async move { - let store_url = Url::parse(&format!("http://{store_rpc_address}")) - .context("Failed to parse URL")?; - Rpc { - listener: grpc_rpc, - store_url, - block_producer_url: Some(block_producer_url), - validator_url, - ntx_builder_url, - grpc_options, - } - .serve() - .await - .context("failed while serving RPC component") - }) - .id() - }; - - // Lookup table so we can identify the failed component. - let mut component_ids = HashMap::from([ - (store_id, "store"), - (block_producer_id, "block-producer"), - (rpc_id, "rpc"), - ]); - - // Start network transaction builder. 
- if let Some((builder_config, ntx_builder_listener)) = ntx_builder_prepared { - let id = join_set - .spawn(async move { - builder_config - .build() - .await - .context("failed to initialize ntx builder")? - .run(Some(ntx_builder_listener)) - .await - .context("failed while serving ntx builder component") - }) - .id(); - component_ids.insert(id, "ntx-builder"); - } - - // Start the Validator if we have bound a socket. - if let Some(address) = validator_socket_address { - let secret_key_bytes = hex::decode(validator.validator_key)?; - let signer = SecretKey::read_from_bytes(&secret_key_bytes)?; - let signer = ValidatorSigner::new_local(signer); - let id = join_set - .spawn({ - async move { - Validator { - address, - grpc_options: grpc_options.into(), - signer, - data_directory, - } - .serve() - .await - .context("failed while serving validator component") - } - }) - .id(); - component_ids.insert(id, "validator"); - } - - // SAFETY: The joinset is definitely not empty. - let component_result = join_set.join_next_with_id().await.unwrap(); - - // We expect components to run indefinitely, so we treat any return as fatal. - // - // Map all outcomes to an error, and provide component context. - let (id, err) = match component_result { - Ok((id, Ok(_))) => (id, Err(anyhow::anyhow!("Component completed unexpectedly"))), - Ok((id, Err(err))) => (id, Err(err)), - Err(join_err) => (join_err.id(), Err(join_err).context("Joining component task")), - }; - let component = component_ids.get(&id).unwrap_or(&"unknown"); - - // We could abort and gracefully shutdown the other components, but since we're crashing the - // node there is no point. - err.context(format!("Component {component} failed")) - } - - pub fn is_open_telemetry_enabled(&self) -> bool { - if let Self::Start { enable_otel, .. 
} = self { - *enable_otel - } else { - false - } - } -} diff --git a/bin/node/src/commands/mod.rs b/bin/node/src/commands/mod.rs index bdc5e29df3..4297bc594f 100644 --- a/bin/node/src/commands/mod.rs +++ b/bin/node/src/commands/mod.rs @@ -1,9 +1,6 @@ -use std::net::SocketAddr; use std::num::NonZeroUsize; -use std::path::{Path, PathBuf}; use std::time::Duration; -use anyhow::Context; use miden_node_block_producer::{ DEFAULT_BATCH_INTERVAL, DEFAULT_BLOCK_INTERVAL, @@ -14,11 +11,10 @@ use miden_node_utils::clap::duration_to_human_readable_string; use miden_node_validator::ValidatorSigner; use miden_protocol::crypto::dsa::ecdsa_k256_keccak::SecretKey; use miden_protocol::utils::serde::Deserializable; -use tokio::net::TcpListener; use url::Url; pub mod block_producer; -pub mod bundled; +pub mod ntx_builder; pub mod rpc; pub mod store; pub mod validator; @@ -50,7 +46,6 @@ const ENV_NTX_DATA_DIRECTORY: &str = "MIDEN_NODE_NTX_DATA_DIRECTORY"; const ENV_NTX_BUILDER_URL: &str = "MIDEN_NODE_NTX_BUILDER_URL"; const ENV_NTX_MAX_CYCLES: &str = "MIDEN_NTX_MAX_CYCLES"; -const DEFAULT_NTX_TICKER_INTERVAL: Duration = Duration::from_millis(200); const DEFAULT_NTX_IDLE_TIMEOUT: Duration = Duration::from_secs(5 * 60); const DEFAULT_NTX_SCRIPT_CACHE_SIZE: NonZeroUsize = NonZeroUsize::new(1000).unwrap(); const DEFAULT_NTX_MAX_CYCLES: u32 = 1 << 18; @@ -101,149 +96,6 @@ impl ValidatorKey { } } -/// Configuration for the Validator component when run in the bundled mode. -#[derive(clap::Args)] -pub struct BundledValidatorConfig { - /// Insecure, hex-encoded validator secret key for development and testing purposes. - /// Only used when the Validator URL argument is not set. - #[arg( - long = "validator.key", - env = ENV_VALIDATOR_KEY, - value_name = "VALIDATOR_KEY", - default_value = INSECURE_VALIDATOR_KEY_HEX - )] - validator_key: String, - - /// The remote Validator's gRPC URL. If unset, will default to running a Validator - /// in-process. If set, the insecure key argument is ignored. 
-    #[arg(long = "validator.url", env = ENV_VALIDATOR_URL, value_name = "URL")]
-    validator_url: Option<Url>,
-}
-
-impl BundledValidatorConfig {
-    /// Converts the [`BundledValidatorConfig`] into a URL and an optional [SocketAddr].
-    ///
-    /// If the `validator_url` is set, it returns the URL and `None` for the [SocketAddr].
-    ///
-    /// If `validator_url` is not set, it binds to a random port on localhost, creates a URL,
-    /// and returns the URL and the bound [SocketAddr].
-    async fn to_addresses(&self) -> anyhow::Result<(Url, Option<SocketAddr>)> {
-        if let Some(url) = &self.validator_url {
-            Ok((url.clone(), None))
-        } else {
-            let socket_addr = TcpListener::bind("127.0.0.1:0")
-                .await
-                .context("Failed to bind to validator gRPC endpoint")?
-                .local_addr()
-                .context("Failed to retrieve the validator's gRPC address")?;
-            let url = Url::parse(&format!("http://{socket_addr}"))
-                .context("Failed to parse Validator URL")?;
-            Ok((url, Some(socket_addr)))
-        }
-    }
-}
-
-/// Configuration for the Network Transaction Builder component.
-#[derive(clap::Args)]
-pub struct NtxBuilderConfig {
-    /// Disable spawning the network transaction builder.
-    #[arg(long = "no-ntx-builder", default_value_t = false)]
-    pub disabled: bool,
-
-    /// The remote transaction prover's gRPC url, used for the ntx builder. If unset,
-    /// will default to running a prover in-process which is expensive.
-    #[arg(long = "tx-prover.url", env = ENV_NTX_PROVER_URL, value_name = "URL")]
-    pub tx_prover_url: Option<Url>,
-
-    /// Interval at which to run the network transaction builder's ticker.
-    #[arg(
-        long = "ntx-builder.interval",
-        default_value = &duration_to_human_readable_string(DEFAULT_NTX_TICKER_INTERVAL),
-        value_parser = humantime::parse_duration,
-        value_name = "DURATION"
-    )]
-    pub ticker_interval: Duration,
-
-    /// Number of note scripts to cache locally.
-    ///
-    /// Note scripts not in cache must first be retrieved from the store.
-    #[arg(
-        long = "ntx-builder.script-cache-size",
-        env = ENV_NTX_SCRIPT_CACHE_SIZE,
-        value_name = "NUM",
-        default_value_t = DEFAULT_NTX_SCRIPT_CACHE_SIZE
-    )]
-    pub script_cache_size: NonZeroUsize,
-
-    /// Duration after which an idle network account will deactivate.
-    ///
-    /// An account is considered idle once it has no viable notes to consume.
-    /// A deactivated account will reactivate if targeted with new notes.
-    #[arg(
-        long = "ntx-builder.idle-timeout",
-        default_value = &duration_to_human_readable_string(DEFAULT_NTX_IDLE_TIMEOUT),
-        value_parser = humantime::parse_duration,
-        value_name = "DURATION"
-    )]
-    pub idle_timeout: Duration,
-
-    /// Maximum number of crashes before an account deactivated.
-    ///
-    /// Once this limit is reached, no new transactions will be created for this account.
-    #[arg(
-        long = "ntx-builder.max-account-crashes",
-        default_value_t = 10,
-        value_name = "NUM"
-    )]
-    pub max_account_crashes: usize,
-
-    /// Maximum number of VM execution cycles allowed for a single network transaction.
-    ///
-    /// Network transactions that exceed this limit will fail. Defaults to 2^18 (262.144) cycles.
-    #[arg(
-        long = "ntx-builder.max-cycles",
-        env = ENV_NTX_MAX_CYCLES,
-        default_value_t = DEFAULT_NTX_MAX_CYCLES,
-        value_name = "NUM",
-    )]
-    pub max_tx_cycles: u32,
-
-    /// Directory for the ntx-builder's persistent database.
-    ///
-    /// If not set, defaults to the node's data directory.
-    #[arg(long = "ntx-builder.data-directory", env = ENV_NTX_DATA_DIRECTORY, value_name = "DIR")]
-    pub ntx_data_directory: Option<PathBuf>,
-}
-
-impl NtxBuilderConfig {
-    /// Converts this CLI config into the ntx-builder's internal config.
-    ///
-    /// The `node_data_directory` is used as the default location for the ntx-builder's database
-    /// if `--ntx-builder.data-directory` is not explicitly set.
- pub fn into_builder_config( - self, - store_url: Url, - block_producer_url: Url, - validator_url: Url, - node_data_directory: &Path, - ) -> miden_node_ntx_builder::NtxBuilderConfig { - let data_dir = self.ntx_data_directory.unwrap_or_else(|| node_data_directory.to_path_buf()); - let database_filepath = data_dir.join("ntx-builder.sqlite3"); - - miden_node_ntx_builder::NtxBuilderConfig::new( - store_url, - block_producer_url, - validator_url, - database_filepath, - ) - .with_tx_prover_url(self.tx_prover_url) - .with_script_cache_size(self.script_cache_size) - .with_idle_timeout(self.idle_timeout) - .with_max_account_crashes(self.max_account_crashes) - .with_max_cycles(self.max_tx_cycles) - } -} - /// Configuration for the Block Producer component #[derive(clap::Args)] pub struct BlockProducerConfig { diff --git a/bin/node/src/commands/ntx_builder.rs b/bin/node/src/commands/ntx_builder.rs new file mode 100644 index 0000000000..c9279668e3 --- /dev/null +++ b/bin/node/src/commands/ntx_builder.rs @@ -0,0 +1,161 @@ +use std::num::NonZeroUsize; +use std::path::PathBuf; +use std::time::Duration; + +use anyhow::Context; +use miden_node_utils::clap::duration_to_human_readable_string; +use miden_node_utils::grpc::UrlExt; +use tokio::net::TcpListener; +use url::Url; + +use super::{ + DEFAULT_NTX_IDLE_TIMEOUT, + DEFAULT_NTX_MAX_CYCLES, + DEFAULT_NTX_SCRIPT_CACHE_SIZE, + ENV_BLOCK_PRODUCER_URL, + ENV_ENABLE_OTEL, + ENV_NTX_DATA_DIRECTORY, + ENV_NTX_MAX_CYCLES, + ENV_NTX_PROVER_URL, + ENV_NTX_SCRIPT_CACHE_SIZE, + ENV_STORE_NTX_BUILDER_URL, + ENV_VALIDATOR_URL, +}; +use crate::commands::ENV_NTX_BUILDER_URL; + +#[derive(clap::Subcommand)] +pub enum NtxBuilderCommand { + /// Starts the network transaction builder component. + Start { + /// Url at which to serve the ntx-builder's gRPC API. + #[arg(long = "url", env = ENV_NTX_BUILDER_URL, value_name = "URL")] + url: Option, + + /// The store's ntx-builder service gRPC url. 
+ #[arg(long = "store.url", env = ENV_STORE_NTX_BUILDER_URL, value_name = "URL")] + store_url: Url, + + /// The block-producer's gRPC url. + #[arg(long = "block-producer.url", env = ENV_BLOCK_PRODUCER_URL, value_name = "URL")] + block_producer_url: Url, + + /// The validator's gRPC url. + #[arg(long = "validator.url", env = ENV_VALIDATOR_URL, value_name = "URL")] + validator_url: Url, + + /// The remote transaction prover's gRPC url. If unset, will default to running a + /// prover in-process which is expensive. + #[arg(long = "tx-prover.url", env = ENV_NTX_PROVER_URL, value_name = "URL")] + tx_prover_url: Option, + + /// Number of note scripts to cache locally. + /// + /// Note scripts not in cache must first be retrieved from the store. + #[arg( + long = "script-cache-size", + env = ENV_NTX_SCRIPT_CACHE_SIZE, + value_name = "NUM", + default_value_t = DEFAULT_NTX_SCRIPT_CACHE_SIZE + )] + script_cache_size: NonZeroUsize, + + /// Duration after which an idle network account will deactivate. + /// + /// An account is considered idle once it has no viable notes to consume. + /// A deactivated account will reactivate if targeted with new notes. + #[arg( + long = "idle-timeout", + default_value = &duration_to_human_readable_string(DEFAULT_NTX_IDLE_TIMEOUT), + value_parser = humantime::parse_duration, + value_name = "DURATION" + )] + idle_timeout: Duration, + + /// Maximum number of crashes before an account is deactivated. + /// + /// Once this limit is reached, no new transactions will be created for this account. + #[arg(long = "max-account-crashes", default_value_t = 10, value_name = "NUM")] + max_account_crashes: usize, + + /// Maximum number of VM execution cycles allowed for a single network transaction. + /// + /// Network transactions that exceed this limit will fail. Defaults to 2^18 (262,144) + /// cycles. 
+ #[arg( + long = "max-cycles", + env = ENV_NTX_MAX_CYCLES, + default_value_t = DEFAULT_NTX_MAX_CYCLES, + value_name = "NUM", + )] + max_tx_cycles: u32, + + /// Directory for the ntx-builder's persistent database. + #[arg(long = "data-directory", env = ENV_NTX_DATA_DIRECTORY, value_name = "DIR")] + data_directory: PathBuf, + + /// Enables the exporting of traces for OpenTelemetry. + /// + /// This can be further configured using environment variables as defined in the official + /// OpenTelemetry documentation. See our operator manual for further details. + #[arg(long = "enable-otel", default_value_t = false, env = ENV_ENABLE_OTEL, value_name = "BOOL")] + enable_otel: bool, + }, +} + +impl NtxBuilderCommand { + pub async fn handle(self) -> anyhow::Result<()> { + let Self::Start { + url, + store_url, + block_producer_url, + validator_url, + tx_prover_url, + script_cache_size, + idle_timeout, + max_account_crashes, + max_tx_cycles, + data_directory, + enable_otel: _, + } = self; + + let listener = if let Some(url) = url { + let addr = url + .to_socket() + .context("Failed to extract socket address from ntx-builder URL")?; + Some( + TcpListener::bind(addr) + .await + .context("Failed to bind to ntx-builder's gRPC URL")?, + ) + } else { + None + }; + + let database_filepath = data_directory.join("ntx-builder.sqlite3"); + + let config = miden_node_ntx_builder::NtxBuilderConfig::new( + store_url, + block_producer_url, + validator_url, + database_filepath, + ) + .with_tx_prover_url(tx_prover_url) + .with_script_cache_size(script_cache_size) + .with_idle_timeout(idle_timeout) + .with_max_account_crashes(max_account_crashes) + .with_max_cycles(max_tx_cycles); + + config + .build() + .await + .context("failed to initialize ntx builder")? + .run(listener) + .await + .context("failed while running ntx builder component") + } + + pub fn is_open_telemetry_enabled(&self) -> bool { + let Self::Start { enable_otel, .. 
} = self; + *enable_otel + } +} diff --git a/bin/node/src/commands/rpc.rs b/bin/node/src/commands/rpc.rs index a0ecd455d9..37607d1c90 100644 --- a/bin/node/src/commands/rpc.rs +++ b/bin/node/src/commands/rpc.rs @@ -34,8 +34,7 @@ pub enum RpcCommand { #[arg(long = "validator.url", env = ENV_VALIDATOR_URL, value_name = "URL")] validator_url: Url, - /// The network transaction builder's gRPC url. If unset, the `GetNoteError` endpoint - /// will be unavailable. + /// The network transaction builder's gRPC url. #[arg(long = "ntx-builder.url", env = ENV_NTX_BUILDER_URL, value_name = "URL")] ntx_builder_url: Option, diff --git a/bin/node/src/commands/store.rs b/bin/node/src/commands/store.rs index 7fb62278dc..d3ea6038fc 100644 --- a/bin/node/src/commands/store.rs +++ b/bin/node/src/commands/store.rs @@ -7,7 +7,7 @@ use miden_node_store::{DEFAULT_MAX_CONCURRENT_PROOFS, Store}; use miden_node_utils::clap::{GrpcOptionsInternal, StorageOptions}; use miden_node_utils::fs::ensure_empty_directory; use miden_node_utils::grpc::UrlExt; -use miden_protocol::block::ProvenBlock; +use miden_protocol::block::SignedBlock; use miden_protocol::utils::serde::Deserializable; use url::Url; @@ -175,10 +175,10 @@ impl StoreCommand { pub fn bootstrap_store(data_directory: &Path, genesis_block_path: &Path) -> anyhow::Result<()> { // Read and deserialize the genesis block file. 
let bytes = fs_err::read(genesis_block_path).context("failed to read genesis block")?; - let proven_block = ProvenBlock::read_from_bytes(&bytes) + let signed_block = SignedBlock::read_from_bytes(&bytes) .context("failed to deserialize genesis block from file")?; let genesis_block = - GenesisBlock::try_from(proven_block).context("genesis block validation failed")?; + GenesisBlock::try_from(signed_block).context("genesis block validation failed")?; Store::bootstrap(genesis_block, data_directory) } diff --git a/bin/node/src/commands/validator.rs b/bin/node/src/commands/validator.rs index 8e1e9fdf9f..24bc15bad1 100644 --- a/bin/node/src/commands/validator.rs +++ b/bin/node/src/commands/validator.rs @@ -170,9 +170,7 @@ impl ValidatorCommand { /// Bootstraps the genesis block: creates accounts, signs the block, and writes artifacts to /// disk. - /// - /// This is extracted as a free function so it can be reused by the bundled bootstrap command. - pub async fn bootstrap_genesis( + async fn bootstrap_genesis( genesis_block_directory: &Path, accounts_directory: &Path, data_directory: &Path, diff --git a/bin/node/src/main.rs b/bin/node/src/main.rs index 72582382bb..93789b97e9 100644 --- a/bin/node/src/main.rs +++ b/bin/node/src/main.rs @@ -33,15 +33,13 @@ pub enum Command { #[command(subcommand)] BlockProducer(commands::block_producer::BlockProducerCommand), - // Commands related to the node's validator component. + /// Commands related to the node's validator component. #[command(subcommand)] Validator(commands::validator::ValidatorCommand), - /// Commands relevant to running all components in the same process. - /// - /// This is the recommended way to run the node at the moment. + /// Commands related to the node's network transaction builder component. 
#[command(subcommand)] - Bundled(Box), + NtxBuilder(commands::ntx_builder::NtxBuilderCommand), } impl Command { @@ -54,7 +52,7 @@ impl Command { Command::Rpc(subcommand) => subcommand.is_open_telemetry_enabled(), Command::BlockProducer(subcommand) => subcommand.is_open_telemetry_enabled(), Command::Validator(subcommand) => subcommand.is_open_telemetry_enabled(), - Command::Bundled(subcommand) => subcommand.is_open_telemetry_enabled(), + Command::NtxBuilder(subcommand) => subcommand.is_open_telemetry_enabled(), } { OpenTelemetry::Enabled } else { @@ -68,7 +66,7 @@ impl Command { Command::Store(store_command) => store_command.handle().await, Command::BlockProducer(block_producer_command) => block_producer_command.handle().await, Command::Validator(validator) => validator.handle().await, - Command::Bundled(node) => node.handle().await, + Command::NtxBuilder(ntx_builder) => ntx_builder.handle().await, } } } diff --git a/bin/stress-test/src/seeding/mod.rs b/bin/stress-test/src/seeding/mod.rs index 0b860838ca..79b4618489 100644 --- a/bin/stress-test/src/seeding/mod.rs +++ b/bin/stress-test/src/seeding/mod.rs @@ -28,7 +28,6 @@ use miden_protocol::block::{ BlockNumber, FeeParameters, ProposedBlock, - ProvenBlock, SignedBlock, }; use miden_protocol::crypto::dsa::ecdsa_k256_keccak::SecretKey as EcdsaSecretKey; @@ -111,12 +110,12 @@ pub async fn seed_store( let accounts_filepath = data_directory.join(ACCOUNTS_FILENAME); let data_directory = miden_node_store::DataDirectory::load(data_directory).expect("data directory should exist"); - let genesis_header = genesis_state.into_block().await.unwrap().into_inner(); + let genesis_block = genesis_state.into_block().await.unwrap().into_inner(); let metrics = generate_blocks( num_accounts, public_accounts_percentage, faucet, - genesis_header, + genesis_block, &store_client, data_directory, accounts_filepath, @@ -137,7 +136,7 @@ async fn generate_blocks( num_accounts: usize, public_accounts_percentage: u8, mut faucet: Account, - 
genesis_block: ProvenBlock, + genesis_block: SignedBlock, store_client: &StoreClient, data_directory: DataDirectory, accounts_filepath: PathBuf, diff --git a/bin/stress-test/src/store/mod.rs b/bin/stress-test/src/store/mod.rs index 7c68b025e8..cedd3e23a3 100644 --- a/bin/stress-test/src/store/mod.rs +++ b/bin/stress-test/src/store/mod.rs @@ -469,8 +469,8 @@ async fn sync_chain_mmr( block_to: u32, ) -> SyncChainMmrRun { let sync_request = proto::rpc::SyncChainMmrRequest { - block_range: Some(proto::rpc::BlockRange { block_from, block_to: Some(block_to) }), - finality: proto::rpc::Finality::Committed.into(), + block_from, + upper_bound: Some(proto::rpc::sync_chain_mmr_request::UpperBound::BlockNum(block_to)), }; let start = Instant::now(); diff --git a/compose/grafana/dashboards.yml b/compose/grafana/dashboards.yml new file mode 100644 index 0000000000..deecbcd28d --- /dev/null +++ b/compose/grafana/dashboards.yml @@ -0,0 +1,12 @@ +apiVersion: 1 + +providers: + - name: default + orgId: 1 + folder: '' + type: file + disableDeletion: false + editable: true + options: + path: /var/lib/grafana/dashboards + foldersFromFilesStructure: false diff --git a/compose/grafana/dashboards/miden-node.json b/compose/grafana/dashboards/miden-node.json new file mode 100644 index 0000000000..085e3dc974 --- /dev/null +++ b/compose/grafana/dashboards/miden-node.json @@ -0,0 +1,123 @@ +{ + "annotations": { + "list": [] + }, + "editable": true, + "fiscalYearStartMonth": 0, + "graphTooltip": 0, + "links": [], + "panels": [ + { + "datasource": { + "type": "tempo", + "uid": "tempo" + }, + "fieldConfig": { + "defaults": {}, + "overrides": [] + }, + "gridPos": { + "h": 10, + "w": 24, + "x": 0, + "y": 0 + }, + "id": 1, + "targets": [ + { + "datasource": { + "type": "tempo", + "uid": "tempo" + }, + "queryType": "traceqlSearch", + "limit": 20, + "tableType": "traces", + "filters": [ + { + "id": "service-name", + "tag": "service.name", + "operator": "=", + "scope": "resource" + } + ] + } + ], + 
"title": "Recent Traces", + "type": "table" + }, + { + "datasource": { + "type": "tempo", + "uid": "tempo" + }, + "fieldConfig": { + "defaults": {}, + "overrides": [] + }, + "gridPos": { + "h": 10, + "w": 24, + "x": 0, + "y": 10 + }, + "id": 2, + "targets": [ + { + "datasource": { + "type": "tempo", + "uid": "tempo" + }, + "queryType": "traceql", + "query": "{name =~ \"block_builder.*\"}", + "limit": 20, + "tableType": "traces" + } + ], + "title": "Block Building Traces", + "type": "table" + }, + { + "datasource": { + "type": "tempo", + "uid": "tempo" + }, + "fieldConfig": { + "defaults": {}, + "overrides": [] + }, + "gridPos": { + "h": 10, + "w": 24, + "x": 0, + "y": 20 + }, + "id": 3, + "targets": [ + { + "datasource": { + "type": "tempo", + "uid": "tempo" + }, + "queryType": "traceql", + "query": "{name =~ \"batch_builder.*\"}", + "limit": 20, + "tableType": "traces" + } + ], + "title": "Batch Building Traces", + "type": "table" + } + ], + "schemaVersion": 39, + "tags": ["miden"], + "templating": { + "list": [] + }, + "time": { + "from": "now-30m", + "to": "now" + }, + "title": "Miden Node", + "uid": "miden-node", + "version": 2 +} diff --git a/compose/grafana/datasources.yml b/compose/grafana/datasources.yml new file mode 100644 index 0000000000..257e5903d0 --- /dev/null +++ b/compose/grafana/datasources.yml @@ -0,0 +1,11 @@ +apiVersion: 1 + +datasources: + - name: Tempo + type: tempo + access: proxy + url: http://tempo:3200 + isDefault: true + version: 1 + editable: false + uid: tempo diff --git a/compose/monitor.yml b/compose/monitor.yml new file mode 100644 index 0000000000..6d31d2650e --- /dev/null +++ b/compose/monitor.yml @@ -0,0 +1,19 @@ +services: + monitor: + image: miden-network-monitor-image + pull_policy: if_not_present + command: + - miden-network-monitor + - start + environment: + - MIDEN_MONITOR_RPC_URL=http://localhost:57291 + - MIDEN_MONITOR_PORT=3001 + - MIDEN_MONITOR_NETWORK_NAME=Localhost + - MIDEN_MONITOR_DISABLE_NTX_SERVICE=true + - 
MIDEN_MONITOR_ENABLE_OTEL=true + - OTEL_EXPORTER_OTLP_ENDPOINT=http://tempo:4317 + - OTEL_SERVICE_NAME=monitor + extra_hosts: + - "localhost:host-gateway" + ports: + - "3001:3001" diff --git a/compose/telemetry.yml b/compose/telemetry.yml new file mode 100644 index 0000000000..b8302fff87 --- /dev/null +++ b/compose/telemetry.yml @@ -0,0 +1,54 @@ +services: + tempo: + image: grafana/tempo:2.10.4 + volumes: + - ./compose/tempo/tempo.yml:/etc/tempo.yaml + command: ["-config.file=/etc/tempo.yaml"] + ports: + - "3200:3200" + - "4317:4317" + + grafana: + image: grafana/grafana:12.4 + volumes: + - ./compose/grafana/datasources.yml:/etc/grafana/provisioning/datasources/datasources.yaml + - ./compose/grafana/dashboards.yml:/etc/grafana/provisioning/dashboards/dashboards.yaml + - ./compose/grafana/dashboards:/var/lib/grafana/dashboards + environment: + - GF_AUTH_ANONYMOUS_ENABLED=true + - GF_AUTH_ANONYMOUS_ORG_ROLE=Admin + - GF_DASHBOARDS_DEFAULT_HOME_DASHBOARD_PATH=/var/lib/grafana/dashboards/miden-node.json + ports: + - "3000:3000" + depends_on: + - tempo + + store: + environment: + - MIDEN_NODE_ENABLE_OTEL=true + - OTEL_EXPORTER_OTLP_ENDPOINT=http://tempo:4317 + - OTEL_SERVICE_NAME=store + + validator: + environment: + - MIDEN_NODE_ENABLE_OTEL=true + - OTEL_EXPORTER_OTLP_ENDPOINT=http://tempo:4317 + - OTEL_SERVICE_NAME=validator + + block-producer: + environment: + - MIDEN_NODE_ENABLE_OTEL=true + - OTEL_EXPORTER_OTLP_ENDPOINT=http://tempo:4317 + - OTEL_SERVICE_NAME=block-producer + + rpc: + environment: + - MIDEN_NODE_ENABLE_OTEL=true + - OTEL_EXPORTER_OTLP_ENDPOINT=http://tempo:4317 + - OTEL_SERVICE_NAME=rpc + + ntx-builder: + environment: + - MIDEN_NODE_ENABLE_OTEL=true + - OTEL_EXPORTER_OTLP_ENDPOINT=http://tempo:4317 + - OTEL_SERVICE_NAME=ntx-builder diff --git a/compose/tempo/tempo.yml b/compose/tempo/tempo.yml new file mode 100644 index 0000000000..8c5c7bed55 --- /dev/null +++ b/compose/tempo/tempo.yml @@ -0,0 +1,31 @@ +stream_over_http_enabled: true + +server: + 
http_listen_port: 3200 + +distributor: + receivers: + otlp: + protocols: + grpc: + endpoint: "0.0.0.0:4317" + +storage: + trace: + backend: local + local: + path: /var/tempo/traces + wal: + path: /var/tempo/wal + +metrics_generator: + registry: + external_labels: + source: tempo + storage: + path: /var/tempo/metrics/wal + traces_storage: + path: /var/tempo/metrics/traces + +overrides: + metrics_generator_processors: [service-graphs, span-metrics, local-blocks] # enables metrics generator diff --git a/crates/db/src/lib.rs b/crates/db/src/lib.rs index 7000f131d1..cd35cc65af 100644 --- a/crates/db/src/lib.rs +++ b/crates/db/src/lib.rs @@ -46,10 +46,13 @@ impl Db { .await .map_err(|e| DatabaseError::ConnectionPoolObtainError(Box::new(e)))?; - conn.interact(|conn| <_ as diesel::Connection>::transaction::(conn, query)) - .in_current_span() - .await - .map_err(|err| E::from(DatabaseError::interact(&msg.to_string(), &err)))? + let span = tracing::Span::current(); + conn.interact(move |conn| { + let _guard = span.enter(); + <_ as diesel::Connection>::transaction::(conn, query) + }) + .await + .map_err(|err| E::from(DatabaseError::interact(&msg.to_string(), &err)))? 
} /// Run the query _without_ a transaction @@ -67,7 +70,9 @@ impl Db { .await .map_err(|e| DatabaseError::ConnectionPoolObtainError(Box::new(e)))?; + let span = tracing::Span::current(); conn.interact(move |conn| { + let _guard = span.enter(); let r = query(conn)?; Ok(r) }) diff --git a/crates/large-smt-backend-rocksdb/Cargo.toml b/crates/large-smt-backend-rocksdb/Cargo.toml index 5109ecc9c3..4499dffaa0 100644 --- a/crates/large-smt-backend-rocksdb/Cargo.toml +++ b/crates/large-smt-backend-rocksdb/Cargo.toml @@ -18,7 +18,7 @@ workspace = true miden-crypto = { features = ["concurrent", "std"], workspace = true } miden-protocol = { features = ["std"], workspace = true } rayon = { version = "1.10" } -rocksdb = { default-features = false, features = ["bindgen-runtime", "lz4"], version = "0.24" } +rocksdb = { default-features = false, features = ["bindgen-runtime", "lz4", "zstd"], version = "0.24" } [build-dependencies] miden-node-rocksdb-cxx-linkage-fix = { workspace = true } diff --git a/crates/large-smt-backend-rocksdb/src/lib.rs b/crates/large-smt-backend-rocksdb/src/lib.rs index 563439c9f4..eaca341fdc 100644 --- a/crates/large-smt-backend-rocksdb/src/lib.rs +++ b/crates/large-smt-backend-rocksdb/src/lib.rs @@ -56,4 +56,12 @@ pub use miden_protocol::{ merkle::{EmptySubtreeRoots, InnerNodeInfo, MerkleError, NodeIndex, SparseMerklePath}, }, }; -pub use rocksdb::{RocksDbConfig, RocksDbStorage}; +pub use rocksdb::{ + RocksDbBloomFilterBitsPerKey, + RocksDbConfig, + RocksDbDurabilityMode, + RocksDbMemoryBudget, + RocksDbStorage, + RocksDbTuningOptions, + RocksDbWriteBufferManagerBudget, +}; diff --git a/crates/large-smt-backend-rocksdb/src/rocksdb.rs b/crates/large-smt-backend-rocksdb/src/rocksdb.rs index 1a54feb475..1310ae6dff 100644 --- a/crates/large-smt-backend-rocksdb/src/rocksdb.rs +++ b/crates/large-smt-backend-rocksdb/src/rocksdb.rs @@ -20,6 +20,8 @@ use rocksdb::{ Options, ReadOptions, WriteBatch, + WriteBufferManager, + WriteOptions, }; use super::{SmtStorage, 
StorageError, StorageUpdateParts, StorageUpdates, SubtreeUpdate}; @@ -27,6 +29,12 @@ use crate::helpers::{insert_into_leaf, map_rocksdb_err, remove_from_leaf}; use crate::{EMPTY_WORD, Word}; const IN_MEMORY_DEPTH: u8 = 24; +const DEFAULT_CACHE_SIZE: usize = 1 << 30; +const DEFAULT_MAX_OPEN_FILES: i32 = 512; +const DEFAULT_BLOCK_SIZE: usize = 16 << 10; +const DEFAULT_MAX_TOTAL_WAL_SIZE: u64 = 512 * 1024 * 1024; +const DEFAULT_BOTTOMMOST_ZSTD_MAX_TRAIN_BYTES: i32 = 1 << 20; +const DEFAULT_BLOOM_FILTER_BITS_PER_KEY: f64 = 10.0; /// The name of the `RocksDB` column family used for storing SMT leaves. const LEAVES_CF: &str = "leaves"; @@ -70,6 +78,7 @@ const ENTRY_COUNT_KEY: &[u8] = b"entry_count"; #[derive(Debug, Clone)] pub struct RocksDbStorage { db: Arc, + durability_mode: RocksDbDurabilityMode, } impl RocksDbStorage { @@ -80,10 +89,20 @@ impl RocksDbStorage { /// and applies various RocksDB options for performance, such as caching, bloom filters, /// and compaction strategies tailored for SMT workloads. /// + /// The default profile uses: + /// - a 1 GiB block cache shared by this database's column families + /// - up to 512 open files + /// - 16 KiB block-based table blocks with cached index/filter blocks + /// - 128 MiB write buffers with up to 3 memtables per write-heavy column family + /// - LZ4 compression for active data and ZSTD for bottommost files + /// /// # Errors /// Returns `StorageError::Backend` if the database cannot be opened or configured, /// for example, due to path issues, permissions, or RocksDB internal errors. 
+ #[expect(clippy::too_many_lines)] pub fn open(config: RocksDbConfig) -> Result { + let tuning_options = &config.tuning_options; + // Base DB options let mut db_opts = Options::default(); // Create DB if it doesn't exist @@ -99,90 +118,130 @@ impl RocksDbStorage { // Parallelize flush/compaction up to CPU count db_opts.set_max_background_jobs(rayon::current_num_threads() as i32); // Maximum WAL size - db_opts.set_max_total_wal_size(512 * 1024 * 1024); + db_opts.set_max_total_wal_size(tuning_options.max_total_wal_size); - // Shared block cache across all column families + // Cache and optional write-buffer manager are shared across this DB's column families. let cache = Cache::new_lru_cache(config.cache_size); + let write_buffer_manager = config.write_buffer_manager(&cache); // Common table options for bloom filtering and cache let mut table_opts = BlockBasedOptions::default(); - // Use shared LRU cache for block data - table_opts.set_block_cache(&cache); - table_opts.set_bloom_filter(10.0, false); - // Enable whole-key bloom filtering (better with point lookups) - table_opts.set_whole_key_filtering(true); - // Pin L0 filter and index blocks in cache (improves performance) - table_opts.set_pin_l0_filter_and_index_blocks_in_cache(true); + configure_block_table_options( + &mut table_opts, + &cache, + tuning_options, + tuning_options.bloom_filter_bits_per_key.leaves, + ); // Column family for leaves let mut leaves_opts = Options::default(); leaves_opts.set_block_based_table_factory(&table_opts); - // 128 MB memtable - leaves_opts.set_write_buffer_size(128 << 20); - // Allow up to 3 memtables - leaves_opts.set_max_write_buffer_number(3); - leaves_opts.set_min_write_buffer_number_to_merge(1); - // Do not retain flushed memtables in memory - leaves_opts.set_max_write_buffer_size_to_maintain(0); - // Use level-based compaction - leaves_opts.set_compaction_style(DBCompactionStyle::Level); - // 512 MB target file size - leaves_opts.set_target_file_size_base(512 << 20); - 
leaves_opts.set_target_file_size_multiplier(2); - // LZ4 compression - leaves_opts.set_compression_type(DBCompressionType::Lz4); - // Set level-based compaction parameters - leaves_opts.set_level_zero_file_num_compaction_trigger(8); - - // Helper to build subtree CF options with correct prefix length + configure_smt_cf_options(&mut leaves_opts); + if let Some(write_buffer_manager) = write_buffer_manager.as_ref() { + db_opts.set_write_buffer_manager(write_buffer_manager); + leaves_opts.set_write_buffer_manager(write_buffer_manager); + } + + // Helper to build subtree CF options with the tuned block-table profile #[expect(clippy::items_after_statements)] - fn subtree_cf(cache: &Cache, bloom_filter_bits: f64) -> Options { - let mut tbl = BlockBasedOptions::default(); - // Use shared LRU cache for block data - tbl.set_block_cache(cache); - // Set bloom filter for subtree lookups - tbl.set_bloom_filter(bloom_filter_bits, false); - // Enable whole-key bloom filtering - tbl.set_whole_key_filtering(true); - // Pin L0 filter and index blocks in cache - tbl.set_pin_l0_filter_and_index_blocks_in_cache(true); + fn subtree_cf( + cache: &Cache, + tuning_options: &RocksDbTuningOptions, + bloom_filter_bits: f64, + write_buffer_manager: Option<&WriteBufferManager>, + ) -> Options { + let mut table_opts = BlockBasedOptions::default(); + configure_block_table_options( + &mut table_opts, + cache, + tuning_options, + bloom_filter_bits, + ); let mut opts = Options::default(); - opts.set_block_based_table_factory(&tbl); - // 128 MB memtable - opts.set_write_buffer_size(128 << 20); - opts.set_max_write_buffer_number(3); - opts.set_min_write_buffer_number_to_merge(1); - // Do not retain flushed memtables in memory - opts.set_max_write_buffer_size_to_maintain(0); - // Use level-based compaction - opts.set_compaction_style(DBCompactionStyle::Level); - // 512 MB target file size - opts.set_target_file_size_base(512 << 20); - opts.set_target_file_size_multiplier(2); - // LZ4 compression - 
opts.set_compression_type(DBCompressionType::Lz4); - // Set level-based compaction parameters - opts.set_level_zero_file_num_compaction_trigger(8); + opts.set_block_based_table_factory(&table_opts); + configure_smt_cf_options(&mut opts); + if let Some(write_buffer_manager) = write_buffer_manager { + opts.set_write_buffer_manager(write_buffer_manager); + } opts } + // Depth-24 cache column family + let mut depth24_table_opts = BlockBasedOptions::default(); + configure_block_table_options( + &mut depth24_table_opts, + &cache, + tuning_options, + tuning_options.bloom_filter_bits_per_key.depth_24, + ); + let mut depth24_opts = Options::default(); depth24_opts.set_compression_type(DBCompressionType::Lz4); - depth24_opts.set_block_based_table_factory(&table_opts); + depth24_opts.set_bottommost_compression_type(DBCompressionType::Zstd); + // Enable the bottommost compression setting; selecting ZSTD alone is not enough. + depth24_opts + .set_bottommost_zstd_max_train_bytes(DEFAULT_BOTTOMMOST_ZSTD_MAX_TRAIN_BYTES, true); + depth24_opts.set_block_based_table_factory(&depth24_table_opts); + if let Some(write_buffer_manager) = write_buffer_manager.as_ref() { + depth24_opts.set_write_buffer_manager(write_buffer_manager); + } // Metadata CF with no compression let mut metadata_opts = Options::default(); metadata_opts.set_compression_type(DBCompressionType::None); + if let Some(write_buffer_manager) = write_buffer_manager.as_ref() { + metadata_opts.set_write_buffer_manager(write_buffer_manager); + } // Define column families with tailored options let cfs = vec![ ColumnFamilyDescriptor::new(LEAVES_CF, leaves_opts), - ColumnFamilyDescriptor::new(SUBTREE_24_CF, subtree_cf(&cache, 8.0)), - ColumnFamilyDescriptor::new(SUBTREE_32_CF, subtree_cf(&cache, 10.0)), - ColumnFamilyDescriptor::new(SUBTREE_40_CF, subtree_cf(&cache, 10.0)), - ColumnFamilyDescriptor::new(SUBTREE_48_CF, subtree_cf(&cache, 12.0)), - ColumnFamilyDescriptor::new(SUBTREE_56_CF, subtree_cf(&cache, 12.0)), + 
ColumnFamilyDescriptor::new( + SUBTREE_24_CF, + subtree_cf( + &cache, + tuning_options, + tuning_options.bloom_filter_bits_per_key.subtree_24, + write_buffer_manager.as_ref(), + ), + ), + ColumnFamilyDescriptor::new( + SUBTREE_32_CF, + subtree_cf( + &cache, + tuning_options, + tuning_options.bloom_filter_bits_per_key.subtree_32, + write_buffer_manager.as_ref(), + ), + ), + ColumnFamilyDescriptor::new( + SUBTREE_40_CF, + subtree_cf( + &cache, + tuning_options, + tuning_options.bloom_filter_bits_per_key.subtree_40, + write_buffer_manager.as_ref(), + ), + ), + ColumnFamilyDescriptor::new( + SUBTREE_48_CF, + subtree_cf( + &cache, + tuning_options, + tuning_options.bloom_filter_bits_per_key.subtree_48, + write_buffer_manager.as_ref(), + ), + ), + ColumnFamilyDescriptor::new( + SUBTREE_56_CF, + subtree_cf( + &cache, + tuning_options, + tuning_options.bloom_filter_bits_per_key.subtree_56, + write_buffer_manager.as_ref(), + ), + ), ColumnFamilyDescriptor::new(METADATA_CF, metadata_opts), ColumnFamilyDescriptor::new(DEPTH_24_CF, depth24_opts), ]; @@ -190,7 +249,24 @@ impl RocksDbStorage { // Open the database with our tuned CFs let db = DB::open_cf_descriptors(&db_opts, config.path, cfs).map_err(map_rocksdb_err)?; - Ok(Self { db: Arc::new(db) }) + Ok(Self { + db: Arc::new(db), + durability_mode: config.durability_mode, + }) + } + + fn write_options(&self) -> WriteOptions { + let mut write_opts = WriteOptions::default(); + write_opts.set_sync(self.should_sync_writes()); + write_opts + } + + fn write_batch(&self, batch: WriteBatch) -> Result<(), StorageError> { + self.db.write_opt(batch, &self.write_options()).map_err(map_rocksdb_err) + } + + fn should_sync_writes(&self) -> bool { + self.durability_mode == RocksDbDurabilityMode::Sync } /// Syncs the RocksDB database to disk. @@ -376,7 +452,7 @@ impl SmtStorage for RocksDbStorage { batch.put_cf(metadata_cf, ENTRY_COUNT_KEY, current_entry_count.to_be_bytes()); // Atomically write all changes (leaf data and metadata counts). 
- self.db.write(batch).map_err(map_rocksdb_err)?; + self.write_batch(batch)?; Ok(value_to_return) } @@ -423,7 +499,7 @@ impl SmtStorage for RocksDbStorage { } batch.put_cf(metadata_cf, LEAF_COUNT_KEY, leaf_count.to_be_bytes()); batch.put_cf(metadata_cf, ENTRY_COUNT_KEY, entry_count.to_be_bytes()); - self.db.write(batch).map_err(map_rocksdb_err)?; + self.write_batch(batch)?; Ok(current_value) } @@ -437,7 +513,7 @@ impl SmtStorage for RocksDbStorage { let key = Self::index_db_key(index); match self.db.get_cf(cf, key).map_err(map_rocksdb_err)? { Some(bytes) => { - let leaf = SmtLeaf::read_from_bytes(&bytes)?; + let leaf = SmtLeaf::read_from_bytes_with_budget(&bytes, bytes.len())?; Ok(Some(leaf)) }, None => Ok(None), @@ -470,7 +546,7 @@ impl SmtStorage for RocksDbStorage { let metadata_cf = self.cf_handle(METADATA_CF)?; batch.put_cf(metadata_cf, LEAF_COUNT_KEY, leaf_count.to_be_bytes()); batch.put_cf(metadata_cf, ENTRY_COUNT_KEY, entry_count.to_be_bytes()); - self.db.write(batch).map_err(map_rocksdb_err)?; + self.write_batch(batch)?; Ok(()) } @@ -492,9 +568,11 @@ impl SmtStorage for RocksDbStorage { let key = Self::index_db_key(index); let cf = self.cf_handle(LEAVES_CF)?; let old_bytes = self.db.get_cf(cf, key).map_err(map_rocksdb_err)?; - self.db.delete_cf(cf, key).map_err(map_rocksdb_err)?; - Ok(old_bytes - .map(|bytes| SmtLeaf::read_from_bytes(&bytes).expect("failed to deserialize leaf"))) + self.db.delete_cf_opt(cf, key, &self.write_options()).map_err(map_rocksdb_err)?; + Ok(old_bytes.map(|bytes| { + SmtLeaf::read_from_bytes_with_budget(&bytes, bytes.len()) + .expect("failed to deserialize leaf") + })) } /// Retrieves multiple SMT leaf nodes by their logical `indices` using RocksDB's `multi_get_cf`. 
@@ -510,7 +588,9 @@ impl SmtStorage for RocksDbStorage { results .into_iter() .map(|result| match result { - Ok(Some(bytes)) => Ok(Some(SmtLeaf::read_from_bytes(&bytes)?)), + Ok(Some(bytes)) => { + Ok(Some(SmtLeaf::read_from_bytes_with_budget(&bytes, bytes.len())?)) + }, Ok(None) => Ok(None), Err(e) => Err(map_rocksdb_err(e)), }) @@ -665,7 +745,7 @@ impl SmtStorage for RocksDbStorage { batch.put_cf(depth24_cf, hash_key, root_hash.to_bytes()); } - self.db.write(batch).map_err(map_rocksdb_err)?; + self.write_batch(batch)?; Ok(()) } @@ -701,7 +781,7 @@ impl SmtStorage for RocksDbStorage { } } - self.db.write(batch).map_err(map_rocksdb_err)?; + self.write_batch(batch)?; Ok(()) } @@ -724,7 +804,7 @@ impl SmtStorage for RocksDbStorage { batch.delete_cf(depth24_cf, hash_key); } - self.db.write(batch).map_err(map_rocksdb_err)?; + self.write_batch(batch)?; Ok(()) } @@ -911,10 +991,7 @@ impl SmtStorage for RocksDbStorage { batch.put_cf(metadata_cf, ENTRY_COUNT_KEY, new_entry_count.to_be_bytes()); } - let mut write_opts = rocksdb::WriteOptions::default(); - // Disable immediate WAL sync to disk for better performance - write_opts.set_sync(false); - self.db.write_opt(batch, &write_opts).map_err(map_rocksdb_err)?; + self.write_batch(batch)?; Ok(()) } @@ -975,7 +1052,7 @@ impl SmtStorage for RocksDbStorage { let (key_bytes, value_bytes) = item.map_err(map_rocksdb_err)?; let index = index_from_key_bytes(&key_bytes)?; - let hash = Word::read_from_bytes(&value_bytes)?; + let hash = Word::read_from_bytes_with_budget(&value_bytes, value_bytes.len())?; hashes.push((index, hash)); } @@ -1017,7 +1094,8 @@ impl Iterator for RocksDbDirectLeafIterator<'_> { self.iter.find_map(|result| { let (key_bytes, value_bytes) = result.ok()?; let leaf_idx = index_from_key_bytes(&key_bytes).ok()?; - let leaf = SmtLeaf::read_from_bytes(&value_bytes).ok()?; + let leaf = + SmtLeaf::read_from_bytes_with_budget(&value_bytes, value_bytes.len()).ok()?; Some((leaf_idx, leaf)) }) } @@ -1094,6 +1172,43 @@ impl 
Iterator for RocksDbSubtreeIterator<'_> { } } +fn configure_smt_cf_options(opts: &mut Options) { + // 128 MB memtable + opts.set_write_buffer_size(128 << 20); + // Allow up to 3 memtables + opts.set_max_write_buffer_number(3); + opts.set_min_write_buffer_number_to_merge(1); + // Do not retain flushed memtables in memory + opts.set_max_write_buffer_size_to_maintain(0); + // Use level-based compaction + opts.set_compaction_style(DBCompactionStyle::Level); + // 512 MB target file size + opts.set_target_file_size_base(512 << 20); + opts.set_target_file_size_multiplier(2); + // LZ4 compression for active files, ZSTD for bottommost files + opts.set_compression_type(DBCompressionType::Lz4); + opts.set_bottommost_compression_type(DBCompressionType::Zstd); + // Enable the bottommost compression setting; selecting ZSTD alone is not enough. + opts.set_bottommost_zstd_max_train_bytes(DEFAULT_BOTTOMMOST_ZSTD_MAX_TRAIN_BYTES, true); + // Set level-based compaction parameters + opts.set_level_zero_file_num_compaction_trigger(8); +} + +fn configure_block_table_options( + table_opts: &mut BlockBasedOptions, + cache: &Cache, + tuning_options: &RocksDbTuningOptions, + bloom_bits_per_key: f64, +) { + // Keep all block-based column families on the same cache and metadata policy. + table_opts.set_block_cache(cache); + table_opts.set_cache_index_and_filter_blocks(true); + table_opts.set_bloom_filter(bloom_bits_per_key, false); + table_opts.set_block_size(tuning_options.block_size); + table_opts.set_whole_key_filtering(true); + table_opts.set_pin_l0_filter_and_index_blocks_in_cache(true); +} + // ROCKSDB CONFIGURATION // -------------------------------------------------------------------------------------------- @@ -1102,7 +1217,7 @@ impl Iterator for RocksDbSubtreeIterator<'_> { /// This struct contains the essential configuration parameters needed to initialize /// and optimize RocksDB for SMT storage operations. 
It provides sensible defaults /// while allowing customization for specific performance requirements. -#[derive(Debug, Clone)] +#[derive(Debug, Clone, PartialEq)] pub struct RocksDbConfig { /// The filesystem path where the RocksDB database will be stored. /// @@ -1124,6 +1239,15 @@ pub struct RocksDbConfig { /// process. Higher values may improve performance for databases with many SST files but /// increase resource usage. Default: 512 files pub(crate) max_open_files: i32, + + /// Optional per-DB write-buffer manager shared by this DB's column families. + pub(crate) write_buffer_manager: Option<RocksDbWriteBufferManagerBudget>, + + /// Tunable RocksDB profile values. + pub(crate) tuning_options: RocksDbTuningOptions, + + /// Write durability mode for RocksDB write operations. + pub(crate) durability_mode: RocksDbDurabilityMode, } impl RocksDbConfig { @@ -1136,6 +1260,9 @@ impl RocksDbConfig { /// # Default Settings /// * `cache_size`: 1GB (1,073,741,824 bytes) /// * `max_open_files`: 512 + /// * `write_buffer_manager`: disabled + /// * `tuning_options`: [`RocksDbTuningOptions::default()`] + /// * `durability_mode`: [`RocksDbDurabilityMode::Relaxed`] /// /// # Examples /// ``` @@ -1146,8 +1273,11 @@ impl RocksDbConfig { pub fn new<P: Into<PathBuf>>(path: P) -> Self { Self { path: path.into(), - cache_size: 1 << 30, - max_open_files: 512, + cache_size: DEFAULT_CACHE_SIZE, + max_open_files: DEFAULT_MAX_OPEN_FILES, + write_buffer_manager: None, + tuning_options: RocksDbTuningOptions::default(), + durability_mode: RocksDbDurabilityMode::default(), } } @@ -1173,6 +1303,22 @@ impl RocksDbConfig { self } + /// Sets the RocksDB memory budget for this database instance. + /// + /// This controls the block cache size and optional write-buffer manager created by + /// [`RocksDbStorage::open`] for one DB and its column families. It is not a process-wide + /// budget across multiple RocksDB instances. + /// + /// # Arguments + /// * `memory_budget` - Memory budget settings for RocksDB. 
+ #[must_use] + pub fn with_memory_budget(mut self, memory_budget: RocksDbMemoryBudget) -> Self { + let RocksDbMemoryBudget { block_cache_size, write_buffer_manager } = memory_budget; + self.cache_size = block_cache_size; + self.write_buffer_manager = write_buffer_manager; + self + } + /// Sets the maximum number of files that RocksDB can have open simultaneously. /// /// This setting affects both memory usage and the number of file descriptors used by the @@ -1194,6 +1340,163 @@ impl RocksDbConfig { self.max_open_files = count; self } + + /// Sets the RocksDB tuning options. + #[must_use] + pub fn with_tuning_options(mut self, tuning_options: RocksDbTuningOptions) -> Self { + self.tuning_options = tuning_options; + self + } + + /// Sets the RocksDB write durability mode. + /// + /// The default is [`RocksDbDurabilityMode::Relaxed`], matching RocksDB's default non-sync + /// writes. + #[must_use] + pub fn with_durability_mode(mut self, durability_mode: RocksDbDurabilityMode) -> Self { + self.durability_mode = durability_mode; + self + } + + fn write_buffer_manager(&self, cache: &Cache) -> Option<WriteBufferManager> { + self.write_buffer_manager.as_ref().map(|budget| { + if budget.charge_to_block_cache { + WriteBufferManager::new_write_buffer_manager_with_cache( + budget.buffer_size, + budget.allow_stall, + cache.clone(), + ) + } else { + WriteBufferManager::new_write_buffer_manager(budget.buffer_size, budget.allow_stall) + } + }) + } +} + +#[derive(Debug, Clone, Copy, Eq, PartialEq, Default)] +pub enum RocksDbDurabilityMode { + #[default] + Relaxed, + Sync, +} + +#[derive(Debug, Clone, Copy, PartialEq, Eq)] +pub struct RocksDbMemoryBudget { + /// Block cache size for one RocksDB instance. + pub block_cache_size: usize, + /// Optional write-buffer manager for one RocksDB instance. 
+ pub write_buffer_manager: Option<RocksDbWriteBufferManagerBudget>, +} + +impl Default for RocksDbMemoryBudget { + fn default() -> Self { + Self { + block_cache_size: DEFAULT_CACHE_SIZE, + write_buffer_manager: None, + } + } +} + +#[derive(Debug, Clone, Copy, PartialEq, Eq)] +pub struct RocksDbWriteBufferManagerBudget { + pub buffer_size: usize, + pub allow_stall: bool, + pub charge_to_block_cache: bool, +} + +#[derive(Debug, Clone, PartialEq)] +pub struct RocksDbTuningOptions { + pub block_size: usize, + pub max_total_wal_size: u64, + pub bloom_filter_bits_per_key: RocksDbBloomFilterBitsPerKey, +} + +impl Default for RocksDbTuningOptions { + fn default() -> Self { + Self { + block_size: DEFAULT_BLOCK_SIZE, + max_total_wal_size: DEFAULT_MAX_TOTAL_WAL_SIZE, + bloom_filter_bits_per_key: RocksDbBloomFilterBitsPerKey::default(), + } + } +} + +#[derive(Debug, Clone, PartialEq)] +pub struct RocksDbBloomFilterBitsPerKey { + pub leaves: f64, + pub depth_24: f64, + pub subtree_24: f64, + pub subtree_32: f64, + pub subtree_40: f64, + pub subtree_48: f64, + pub subtree_56: f64, +} + +impl Default for RocksDbBloomFilterBitsPerKey { + fn default() -> Self { + Self { + leaves: DEFAULT_BLOOM_FILTER_BITS_PER_KEY, + depth_24: DEFAULT_BLOOM_FILTER_BITS_PER_KEY, + subtree_24: DEFAULT_BLOOM_FILTER_BITS_PER_KEY, + subtree_32: DEFAULT_BLOOM_FILTER_BITS_PER_KEY, + subtree_40: DEFAULT_BLOOM_FILTER_BITS_PER_KEY, + subtree_48: DEFAULT_BLOOM_FILTER_BITS_PER_KEY, + subtree_56: DEFAULT_BLOOM_FILTER_BITS_PER_KEY, + } + } +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn rocksdb_config_builders_update_independent_knobs() { + let memory_budget = RocksDbMemoryBudget { + block_cache_size: 512 << 20, + write_buffer_manager: Some(RocksDbWriteBufferManagerBudget { + buffer_size: 64 << 20, + allow_stall: true, + charge_to_block_cache: true, + }), + }; + let tuning_options = RocksDbTuningOptions { + block_size: 8 << 10, + max_total_wal_size: 2 << 30, + bloom_filter_bits_per_key: RocksDbBloomFilterBitsPerKey { + leaves: 
11.0, + depth_24: 12.0, + subtree_24: 13.0, + subtree_32: 14.0, + subtree_40: 15.0, + subtree_48: 16.0, + subtree_56: 17.0, + }, + }; + + let config = RocksDbConfig::new("/tmp/smt") + .with_memory_budget(memory_budget) + .with_max_open_files(1024) + .with_tuning_options(tuning_options.clone()) + .with_durability_mode(RocksDbDurabilityMode::Sync); + + assert_eq!( + config, + RocksDbConfig { + path: PathBuf::from("/tmp/smt"), + cache_size: 512 << 20, + max_open_files: 1024, + write_buffer_manager: memory_budget.write_buffer_manager, + tuning_options, + durability_mode: RocksDbDurabilityMode::Sync, + } + ); + } + + #[test] + fn rocksdb_config_defaults_to_relaxed_durability() { + assert_eq!(RocksDbConfig::new("/tmp/smt").durability_mode, RocksDbDurabilityMode::Relaxed); + } } // SUBTREE DB KEY diff --git a/crates/ntx-builder/Cargo.toml b/crates/ntx-builder/Cargo.toml index c4e2a991cc..b8b2909e7d 100644 --- a/crates/ntx-builder/Cargo.toml +++ b/crates/ntx-builder/Cargo.toml @@ -30,7 +30,6 @@ miden-tx = { features = ["concurrent"], workspace = true } thiserror = { workspace = true } tokio = { features = ["rt-multi-thread"], workspace = true } tokio-stream = { features = ["net"], workspace = true } -tokio-util = { workspace = true } tonic = { workspace = true } tonic-reflection = { workspace = true } tower-http = { workspace = true } diff --git a/crates/ntx-builder/src/actor/candidate.rs b/crates/ntx-builder/src/actor/candidate.rs index a5429a6602..56602e70ec 100644 --- a/crates/ntx-builder/src/actor/candidate.rs +++ b/crates/ntx-builder/src/actor/candidate.rs @@ -3,8 +3,7 @@ use std::sync::Arc; use miden_protocol::account::Account; use miden_protocol::block::BlockHeader; use miden_protocol::transaction::PartialBlockchain; - -use crate::inflight_note::InflightNetworkNote; +use miden_standards::note::AccountTargetNetworkNote; // TRANSACTION CANDIDATE // ================================================================================================ @@ -19,7 +18,7 @@ pub 
struct TransactionCandidate { pub account: Account, /// A set of notes addressed to this network account. - pub notes: Vec<InflightNetworkNote>, + pub notes: Vec<AccountTargetNetworkNote>, /// The latest locally committed block header. /// diff --git a/crates/ntx-builder/src/actor/execute.rs b/crates/ntx-builder/src/actor/execute.rs index 9718056eea..1a53bfee22 100644 --- a/crates/ntx-builder/src/actor/execute.rs +++ b/crates/ntx-builder/src/actor/execute.rs @@ -32,6 +32,7 @@ use miden_protocol::transaction::{ }; use miden_protocol::vm::FutureMaybeSend; use miden_remote_prover_client::RemoteTransactionProver; +use miden_standards::note::AccountTargetNetworkNote; use miden_tx::auth::UnreachableAuth; use miden_tx::{ DataStore, @@ -205,7 +206,8 @@ impl NtxContext { ); // Filter notes. - let notes = notes.into_iter().map(Note::from).collect::<Vec<_>>(); + let notes = + notes.into_iter().map(AccountTargetNetworkNote::into_note).collect::<Vec<_>>(); let (successful_notes, failed_notes) = self.filter_notes(&data_store, notes).await?; diff --git a/crates/ntx-builder/src/actor/mod.rs b/crates/ntx-builder/src/actor/mod.rs index 983d14e02e..4bd1d42ff7 100644 --- a/crates/ntx-builder/src/actor/mod.rs +++ b/crates/ntx-builder/src/actor/mod.rs @@ -12,14 +12,12 @@ use miden_node_proto::domain::account::NetworkAccountId; use miden_node_utils::ErrorReport; use miden_node_utils::lru_cache::LruCache; use miden_protocol::Word; -use miden_protocol::account::{Account, AccountDelta}; use miden_protocol::block::BlockNumber; use miden_protocol::note::{NoteScript, Nullifier}; use miden_protocol::transaction::TransactionId; use miden_remote_prover_client::RemoteTransactionProver; use miden_tx::FailedNote; use tokio::sync::{Notify, RwLock, Semaphore, mpsc}; -use tokio_util::sync::CancellationToken; use crate::NoteError; use crate::chain_state::ChainState; @@ -44,12 +42,12 @@ pub enum ActorRequest { CacheNoteScript { script_root: Word, script: NoteScript }, } -// ACCOUNT ACTOR CONFIG +// ACTOR SUB-STRUCTS +// 
================================================================================================ -/// Contains miscellaneous resources that are required by all account actors. +/// gRPC clients used by an account actor to interact with the node's services. #[derive(Clone)] -pub struct AccountActorContext { +pub struct GrpcClients { /// Client for interacting with the store in order to load account state. pub store: StoreClient, /// Client for interacting with the block producer. @@ -59,26 +57,45 @@ /// Client for remote transaction proving. If `None`, transactions will be proven locally, /// which is undesirable due to the performance impact. pub prover: Option<RemoteTransactionProver>, - /// The latest chain state that account all actors can rely on. A single chain state is shared - /// among all actors. - pub chain_state: Arc<RwLock<ChainState>>, +} + +/// Shared state read (and written, in the case of `db`) by all account actors. +#[derive(Clone)] +pub struct State { + /// Local database for account state, notes, and transaction tracking. + pub db: Db, + /// The latest chain state. A single chain state is shared among all actors. + pub chain: Arc<RwLock<ChainState>>, /// Shared LRU cache for storing retrieved note scripts to avoid repeated store calls. - /// This cache is shared across all account actors to maximize cache efficiency. pub script_cache: LruCache<Word, NoteScript>, +} + +/// Per-actor configuration knobs. +#[derive(Debug, Clone, Copy)] +pub struct ActorConfig { /// Maximum number of notes per transaction. pub max_notes_per_tx: NonZeroUsize, /// Maximum number of note execution attempts before dropping a note. pub max_note_attempts: usize, /// Duration after which an idle actor will deactivate. pub idle_timeout: Duration, - /// Database for persistent state. - pub db: Db, - /// Channel for sending requests to the coordinator (via the builder event loop). - pub request_tx: mpsc::Sender<ActorRequest>, /// Maximum number of VM execution cycles for network transactions. 
pub max_cycles: u32, } +// ACCOUNT ACTOR CONTEXT +// ================================================================================================ + +/// Contains resources shared by all account actors. The coordinator uses this to spawn new actors. +#[derive(Clone)] +pub struct AccountActorContext { + pub clients: GrpcClients, + pub state: State, + pub config: ActorConfig, + /// Channel for sending requests to the coordinator (via the builder event loop). + pub request_tx: mpsc::Sender<ActorRequest>, +} + #[cfg(test)] impl AccountActorContext { /// Creates a minimal `AccountActorContext` suitable for unit tests. @@ -101,57 +118,24 @@ impl AccountActorContext { let (request_tx, _request_rx) = mpsc::channel(1); Self { - block_producer: BlockProducerClient::new(url.clone()), - validator: ValidatorClient::new(url.clone()), - prover: None, - chain_state, - store: StoreClient::new(url), - script_cache: LruCache::new(NonZeroUsize::new(1).unwrap()), - max_notes_per_tx: NonZeroUsize::new(1).unwrap(), - max_note_attempts: 1, - idle_timeout: Duration::from_secs(60), - db: db.clone(), + clients: GrpcClients { + store: StoreClient::new(url.clone()), + block_producer: BlockProducerClient::new(url.clone()), + validator: ValidatorClient::new(url), + prover: None, + }, + state: State { + db: db.clone(), + chain: chain_state, + script_cache: LruCache::new(NonZeroUsize::new(1).unwrap()), + }, + config: ActorConfig { + max_notes_per_tx: NonZeroUsize::new(1).unwrap(), + max_note_attempts: 1, + idle_timeout: Duration::from_secs(60), + max_cycles: 1 << 18, + }, request_tx, - max_cycles: 1 << 18, - } - } -} - -// ACCOUNT ORIGIN -// ================================================================================================ - -/// The origin of the account which the actor will use to initialize the account state. -#[derive(Debug)] -pub enum AccountOrigin { - /// Accounts that have just been created by a transaction but have not been committed to the - /// store yet. 
- Transaction(Box<Account>), /// Accounts that already exist in the store. - Store(NetworkAccountId), -} - -impl AccountOrigin { - /// Returns an [`AccountOrigin::Transaction`] if the account is a network account. - pub fn transaction(delta: &AccountDelta) -> Option<Self> { - let account = Account::try_from(delta).ok()?; - if account.is_network() { - Some(AccountOrigin::Transaction(account.clone().into())) - } else { - None - } - } - - /// Returns an [`AccountOrigin::Store`]. - pub fn store(account_id: NetworkAccountId) -> Self { - AccountOrigin::Store(account_id) - } - - /// Returns the [`NetworkAccountId`] of the account. - pub fn id(&self) -> NetworkAccountId { - match self { - AccountOrigin::Transaction(account) => NetworkAccountId::try_from(account.id()) - .expect("actor accounts are always network accounts"), - AccountOrigin::Store(account_id) => *account_id, } } } @@ -187,68 +171,50 @@ enum ActorMode { /// /// ## Lifecycle /// -/// 1. **Initialization**: Checks DB for available notes to determine initial mode. +/// 1. **Initialization**: Waits for committed account state, then checks DB for available notes. /// 2. **Event Loop**: Continuously processes mempool events and executes transactions. /// 3. **Transaction Processing**: Selects, executes, and proves transactions, and submits them to /// block producer. /// 4. **State Updates**: Event effects are persisted to DB by the coordinator before actors are /// notified. -/// 5. **Shutdown**: Terminates gracefully when cancelled or encounters unrecoverable errors. +/// 5. **Shutdown**: Terminates gracefully on idle timeout, or returns an error on unrecoverable +/// failures. /// /// ## Concurrency /// /// Each actor runs in its own async task and communicates with other system components through -/// channels and shared state. The actor uses a cancellation token for graceful shutdown -/// coordination. 
The coordinator signals state changes by notifying a shared [`Notify`]; the +/// actor exits of its own accord when idle for longer than [`ActorConfig::idle_timeout`]. pub struct AccountActor { - origin: AccountOrigin, - store: StoreClient, - db: Db, - mode: ActorMode, + /// The network account this actor is responsible for. + account_id: NetworkAccountId, + /// gRPC clients used by the actor. + clients: GrpcClients, + /// Shared state accessed by the actor. + state: State, + /// Per-actor configuration knobs. + config: ActorConfig, + /// Notification signal from the coordinator indicating that DB state relevant to this actor + /// may have changed. The actor re-evaluates its state from the DB on each notification. notify: Arc, - cancel_token: CancellationToken, - block_producer: BlockProducerClient, - validator: ValidatorClient, - prover: Option, - chain_state: Arc>, - script_cache: LruCache, - /// Maximum number of notes per transaction. - max_notes_per_tx: NonZeroUsize, - /// Maximum number of note execution attempts before dropping a note. - max_note_attempts: usize, - /// Duration after which an idle actor will deactivate. - idle_timeout: Duration, /// Channel for sending requests to the coordinator. - request_tx: mpsc::Sender, - /// Maximum number of VM execution cycles for network transactions. - max_cycles: u32, + request: mpsc::Sender, } impl AccountActor { /// Constructs a new account actor with the given configuration. 
pub fn new( - origin: AccountOrigin, + account_id: NetworkAccountId, actor_context: &AccountActorContext, notify: Arc<Notify>, - cancel_token: CancellationToken, ) -> Self { Self { - origin, - store: actor_context.store.clone(), - db: actor_context.db.clone(), - mode: ActorMode::NoViableNotes, + account_id, + clients: actor_context.clients.clone(), + state: actor_context.state.clone(), + config: actor_context.config, notify, - cancel_token, - block_producer: actor_context.block_producer.clone(), - validator: actor_context.validator.clone(), - prover: actor_context.prover.clone(), - chain_state: actor_context.chain_state.clone(), - script_cache: actor_context.script_cache.clone(), - max_notes_per_tx: actor_context.max_notes_per_tx, - max_note_attempts: actor_context.max_note_attempts, - idle_timeout: actor_context.idle_timeout, - request_tx: actor_context.request_tx.clone(), - max_cycles: actor_context.max_cycles, + request: actor_context.request_tx.clone(), } } @@ -256,26 +222,35 @@ impl AccountActor { /// /// The return value signals the shutdown category to the coordinator: /// - /// - `Ok(())`: intentional shutdown (idle timeout, cancellation, or account removal). + /// - `Ok(())`: intentional shutdown (idle timeout or account removal). /// - `Err(_)`: crash (database error, semaphore failure, or any other bug). - pub async fn run(mut self, semaphore: Arc<Semaphore>) -> anyhow::Result<()> { - let account_id = self.origin.id(); + pub async fn run(self, semaphore: Arc<Semaphore>) -> anyhow::Result<()> { + let account_id = self.account_id; + + // Wait for the account to be committed to the DB. For newly created accounts, + // the creation transaction must be committed before we start processing notes. + if !self.wait_for_committed_account(account_id).await? { + return Ok(()); + } // Determine initial mode by checking DB for available notes. 
- let block_num = self.chain_state.read().await.chain_tip_header.block_num(); + let block_num = self.state.chain.read().await.chain_tip_header.block_num(); let has_notes = self + .state .db - .has_available_notes(account_id, block_num, self.max_note_attempts) + .has_available_notes(account_id, block_num, self.config.max_note_attempts) .await .context("failed to check for available notes")?; - if has_notes { - self.mode = ActorMode::NotesAvailable; - } + let mut mode = if has_notes { + ActorMode::NotesAvailable + } else { + ActorMode::NoViableNotes + }; loop { // Enable or disable transaction execution based on actor mode. - let tx_permit_acquisition = match self.mode { + let tx_permit_acquisition = match mode { // Disable transaction execution. ActorMode::NoViableNotes | ActorMode::TransactionInflight(_) => { std::future::pending().boxed() @@ -286,31 +261,29 @@ impl AccountActor { // Idle timeout timer: only ticks when in NoViableNotes mode. // Mode changes cause the next loop iteration to create a fresh sleep or pending. - let idle_timeout_sleep = match self.mode { - ActorMode::NoViableNotes => tokio::time::sleep(self.idle_timeout).boxed(), + let idle_timeout_sleep = match mode { + ActorMode::NoViableNotes => tokio::time::sleep(self.config.idle_timeout).boxed(), _ => std::future::pending().boxed(), }; tokio::select! { - _ = self.cancel_token.cancelled() => { - return Ok(()); - } // Handle coordinator notifications. On notification, re-evaluate state from DB. _ = self.notify.notified() => { - match self.mode { + match mode { ActorMode::TransactionInflight(awaited_id) => { // Check DB: is the inflight tx still pending? 
let exists = self + .state .db .transaction_exists(awaited_id) .await .context("failed to check transaction status")?; if exists { - self.mode = ActorMode::NotesAvailable; + mode = ActorMode::NotesAvailable; } }, _ => { - self.mode = ActorMode::NotesAvailable; + mode = ActorMode::NotesAvailable; } } }, @@ -319,7 +292,7 @@ let _permit = permit.context("semaphore closed")?; // Read the chain state. - let chain_state = self.chain_state.read().await.clone(); + let chain_state = self.state.chain.read().await.clone(); // Query DB for latest account and available notes. let tx_candidate = self.select_candidate_from_db( @@ -328,10 +301,10 @@ ).await?; if let Some(tx_candidate) = tx_candidate { - self.execute_transactions(account_id, tx_candidate).await; + mode = self.execute_transactions(account_id, tx_candidate).await; } else { // No transactions to execute, wait for events. - self.mode = ActorMode::NoViableNotes; + mode = ActorMode::NoViableNotes; } } // Idle timeout: actor has been idle too long, deactivate account. @@ -350,11 +323,12 @@ chain_state: ChainState, ) -> anyhow::Result<Option<TransactionCandidate>> { let block_num = chain_state.chain_tip_header.block_num(); - let max_notes = self.max_notes_per_tx.get(); + let max_notes = self.config.max_notes_per_tx.get(); let (latest_account, notes) = self + .state .db - .select_candidate(account_id, block_num, self.max_note_attempts) + .select_candidate(account_id, block_num, self.config.max_note_attempts) .await .context("failed to query DB for transaction candidate")?; @@ -377,31 +351,79 @@ })) } + /// Waits until a committed account state exists in the DB. + /// + /// For accounts that are being created by an inflight transaction, this will idle + /// until the transaction is committed. 
Returns `true` when the account is ready, or + /// `false` if no commit arrived within [`ActorConfig::idle_timeout`] — in which case + /// the coordinator will respawn a new actor when the account reappears through + /// [`Coordinator::send_targeted`](crate::coordinator::Coordinator::send_targeted) or the + /// account loader. + async fn wait_for_committed_account( + &self, + account_id: NetworkAccountId, + ) -> anyhow::Result<bool> { + // Check if the account is already committed. + if self + .state + .db + .has_committed_account(account_id) + .await + .context("failed to check for committed account")? + { + return Ok(true); + } + + loop { + tokio::select! { + _ = self.notify.notified() => { + if self + .state + .db + .has_committed_account(account_id) + .await + .context("failed to check for committed account")? + { + tracing::info!(account.id=%account_id, "Account committed, starting normal operation"); + return Ok(true); + } + } + _ = tokio::time::sleep(self.config.idle_timeout) => { + tracing::info!( + %account_id, + "Account actor deactivated while waiting for account commit", + ); + return Ok(false); + } + } + } + } + /// Execute a transaction candidate and mark notes as failed as required. /// - /// Updates the state of the actor based on the execution result. + /// Returns the new actor mode based on the execution result. #[tracing::instrument(name = "ntx.actor.execute_transactions", skip(self, tx_candidate))] async fn execute_transactions( - &mut self, + &self, account_id: NetworkAccountId, tx_candidate: TransactionCandidate, - ) { + ) -> ActorMode { let block_num = tx_candidate.chain_tip_header.block_num(); // Execute the selected transaction. 
let context = execute::NtxContext::new( - self.block_producer.clone(), - self.validator.clone(), - self.prover.clone(), - self.store.clone(), - self.script_cache.clone(), - self.db.clone(), - self.max_cycles, + self.clients.block_producer.clone(), + self.clients.validator.clone(), + self.clients.prover.clone(), + self.clients.store.clone(), + self.state.script_cache.clone(), + self.state.db.clone(), + self.config.max_cycles, ); let notes = tx_candidate.notes.clone(); let account_id = tx_candidate.account.id(); - let note_ids: Vec<_> = notes.iter().map(|n| n.to_inner().as_note().id()).collect(); + let note_ids: Vec<_> = notes.iter().map(|n| n.as_note().id()).collect(); tracing::info!( %account_id, ?note_ids, @@ -423,7 +445,7 @@ impl AccountActor { let failed_notes = log_failed_notes(failed); self.mark_notes_failed(&failed_notes, block_num).await; } - self.mode = ActorMode::TransactionInflight(tx_id); + ActorMode::TransactionInflight(tx_id) }, // Transaction execution failed. Err(err) => { @@ -434,7 +456,6 @@ impl AccountActor { err = %error_msg, "network transaction failed", ); - self.mode = ActorMode::NoViableNotes; // For `AllNotesFailed`, use the per-note errors which contain the // specific reason each note failed (e.g. consumability check details). 
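The hunks above change `execute_transactions` from mutating `self.mode` behind `&mut self` to returning the next mode, so the event loop's local `mode` variable becomes the single owner of actor state. A stripped-down sketch of that style (simplified, hypothetical types; not the actual actor):

```rust
/// Simplified version of the actor mode from the diff; the transaction id is
/// reduced to a plain integer for illustration.
#[derive(Debug, Clone, Copy, PartialEq)]
#[allow(dead_code)]
enum Mode {
    NoViableNotes,
    NotesAvailable,
    TransactionInflight(u64),
}

struct Actor;

impl Actor {
    /// The handler borrows `&self` and *returns* the mode to adopt: inflight if a
    /// transaction was submitted, otherwise back to waiting for viable notes.
    /// The caller's event loop assigns the result to its one mutable `mode` binding.
    fn after_execution(&self, submitted_tx: Option<u64>) -> Mode {
        match submitted_tx {
            Some(tx_id) => Mode::TransactionInflight(tx_id),
            None => Mode::NoViableNotes,
        }
    }
}
```

Returning the mode keeps handlers free of `&mut self`, which is what allows `run` to take `self` by value and drop the `mut` in the refactor.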
@@ -446,17 +467,18 @@ impl AccountActor { .iter() .map(|note| { tracing::info!( - note.id = %note.to_inner().as_note().id(), - nullifier = %note.nullifier(), + note.id = %note.as_note().id(), + nullifier = %note.as_note().nullifier(), err = %error_msg, "note failed: transaction execution error", ); - (note.nullifier(), error.clone()) + (note.as_note().nullifier(), error.clone()) }) .collect() }, }; self.mark_notes_failed(&failed_notes, block_num).await; + ActorMode::NoViableNotes }, } } @@ -465,7 +487,7 @@ impl AccountActor { async fn cache_note_scripts(&self, scripts: Vec<(Word, NoteScript)>) { for (script_root, script) in scripts { if self - .request_tx + .request .send(ActorRequest::CacheNoteScript { script_root, script }) .await .is_err() @@ -485,7 +507,7 @@ impl AccountActor { ) { let (ack_tx, ack_rx) = tokio::sync::oneshot::channel(); if self - .request_tx + .request .send(ActorRequest::NotesFailed { failed_notes: failed_notes.to_vec(), block_num, diff --git a/crates/ntx-builder/src/builder.rs b/crates/ntx-builder/src/builder.rs index 3e3581cbe8..12bc442a8b 100644 --- a/crates/ntx-builder/src/builder.rs +++ b/crates/ntx-builder/src/builder.rs @@ -14,7 +14,7 @@ use tokio_stream::StreamExt; use tonic::Status; use crate::NtxBuilderConfig; -use crate::actor::{AccountActorContext, AccountOrigin, ActorRequest}; +use crate::actor::{AccountActorContext, ActorRequest}; use crate::chain_state::ChainState; use crate::clients::StoreClient; use crate::coordinator::Coordinator; @@ -90,7 +90,7 @@ impl NetworkTransactionBuilder { /// Runs the network transaction builder event loop until a fatal error occurs. /// /// If a `TcpListener` is provided, a gRPC server is also spawned to expose the - /// `GetNoteError` endpoint. + /// `GetNetworkNoteStatus` endpoint. /// /// This method: /// 1. Optionally starts a gRPC server for note error queries @@ -109,7 +109,7 @@ impl NetworkTransactionBuilder { // Start the gRPC server if a listener is provided. 
if let Some(listener) = listener { - let server = NtxBuilderRpcServer::new(self.db.clone()); + let server = NtxBuilderRpcServer::new(self.db.clone(), self.config.max_note_attempts); join_set.spawn(async move { server.serve(listener).await.context("ntx-builder gRPC server failed") }); @@ -147,7 +147,7 @@ impl NetworkTransactionBuilder { result = self.coordinator.next() => { if let Some(account_id) = result? { self.coordinator - .spawn_actor(AccountOrigin::store(account_id), &self.actor_context); + .spawn_actor(account_id, &self.actor_context); } }, // Handle mempool events. @@ -210,8 +210,7 @@ impl NetworkTransactionBuilder { .await .context("failed to sync account to DB")?; - self.coordinator - .spawn_actor(AccountOrigin::store(account_id), &self.actor_context); + self.coordinator.spawn_actor(account_id, &self.actor_context); Ok(()) } @@ -226,21 +225,17 @@ impl NetworkTransactionBuilder { .await .context("failed to write TransactionAdded to DB")?; - // Handle account deltas in case an account is being created. + // Spawn new actors for newly created network accounts. if let Some(AccountUpdateDetails::Delta(delta)) = account_delta { - // Handle account deltas for network accounts only. - if let Some(network_account) = AccountOrigin::transaction(delta) { - // Spawn new actors if a transaction creates a new network account. 
- let is_creating_account = delta.is_full_state(); - if is_creating_account { - self.coordinator.spawn_actor(network_account, &self.actor_context); + if delta.is_full_state() { + if let Ok(network_id) = NetworkAccountId::try_from(delta.id()) { + self.coordinator.spawn_actor(network_id, &self.actor_context); } } } let inactive_targets = self.coordinator.send_targeted(&event); for account_id in inactive_targets { - self.coordinator - .spawn_actor(AccountOrigin::store(account_id), &self.actor_context); + self.coordinator.spawn_actor(account_id, &self.actor_context); } Ok(()) }, diff --git a/crates/ntx-builder/src/coordinator.rs b/crates/ntx-builder/src/coordinator.rs index 87aa5edcd1..7fdb6f0d5d 100644 --- a/crates/ntx-builder/src/coordinator.rs +++ b/crates/ntx-builder/src/coordinator.rs @@ -7,9 +7,8 @@ use miden_node_proto::domain::mempool::MempoolEvent; use miden_protocol::account::delta::AccountUpdateDetails; use tokio::sync::{Notify, Semaphore}; use tokio::task::JoinSet; -use tokio_util::sync::CancellationToken; -use crate::actor::{AccountActor, AccountActorContext, AccountOrigin}; +use crate::actor::{AccountActor, AccountActorContext}; use crate::db::Db; // WRITE EVENT RESULT @@ -24,16 +23,39 @@ pub struct WriteEventResult { // ACTOR HANDLE // ================================================================================================ -/// Handle to account actors that are spawned by the coordinator. +/// Handle to an account actor spawned by the coordinator. #[derive(Clone)] struct ActorHandle { + /// [`Notify`] shared with the actor. The coordinator calls [`Notify::notify_one`] when DB + /// state relevant to the actor may have changed; the actor awaits [`Notify::notified`] and + /// re-evaluates its state on wake-up. 
notify: Arc<Notify>, - cancel_token: CancellationToken, } impl ActorHandle { - fn new(notify: Arc<Notify>, cancel_token: CancellationToken) -> Self { - Self { notify, cancel_token } + fn new(notify: Arc<Notify>) -> Self { + Self { notify } + } + + /// Signals the actor that DB state may have changed. Notifications coalesce when one is + /// already pending. + fn notify(&self) { + self.notify.notify_one(); + } + + /// Returns `true` if a notification is queued but not yet consumed by the actor. + /// + /// Used after an actor has shut down to detect the race where a notification arrived just + /// as the actor timed out. If so, the coordinator should respawn the actor. + fn has_pending_notification(&self) -> bool { + use futures::FutureExt; + if self.notify.notified().now_or_never().is_some() { + // Restore the permit so the respawned actor still sees the notification. + self.notify.notify_one(); + true + } else { + false + } + } } @@ -54,10 +76,10 @@ impl ActorHandle { /// - Monitors actor tasks through a join set to detect completion or errors. /// /// ## Event Notification -/// - Notifies actors via [`Notify`] when state may have changed. +/// - Notifies actors via a shared [`Notify`] when state may have changed. /// - The DB is the source of truth: actors re-evaluate their state from DB on notification. -/// - Notifications are coalesced: multiple notifications while an actor is busy result in a single -/// wake-up. +/// - Notifications are coalesced: [`Notify`] stores at most one permit, so multiple notifications +/// while an actor is busy result in a single wake-up. /// /// ## Resource Management /// - Controls transaction concurrency across all network accounts using a semaphore. @@ -76,7 +98,7 @@ impl ActorHandle { /// 3. Actor completion/failure events are monitored and handled. /// 4. Failed or completed actors are cleaned up from the registry. pub struct Coordinator { - /// Mapping of network account IDs to their notification handles and cancellation tokens.
+ /// Mapping of network account IDs to their notification handles. /// /// This registry serves as the primary directory for notifying active account actors. /// When actors are spawned, they register their notification handle here. When events need @@ -133,10 +155,12 @@ impl Coordinator { /// This method creates a new [`AccountActor`] instance for the specified account origin /// and adds it to the coordinator's management system. The actor will be responsible for /// processing transactions and managing state for the network account. - #[tracing::instrument(name = "ntx.builder.spawn_actor", skip(self, origin, actor_context))] - pub fn spawn_actor(&mut self, origin: AccountOrigin, actor_context: &AccountActorContext) { - let account_id = origin.id(); - + #[tracing::instrument(name = "ntx.builder.spawn_actor", skip(self, actor_context))] + pub fn spawn_actor( + &mut self, + account_id: NetworkAccountId, + actor_context: &AccountActorContext, + ) { // Skip spawning if the account has been deactivated due to repeated crashes. if let Some(&count) = self.crash_counts.get(&account_id) { if count >= self.max_account_crashes { @@ -149,19 +173,19 @@ impl Coordinator { } } - // If an actor already exists for this account ID, something has gone wrong. - if let Some(handle) = self.actor_registry.remove(&account_id) { + // If an actor already exists for this account ID, something has gone wrong. Reject the + // spawn rather than replacing. 
+ if self.actor_registry.contains_key(&account_id) { tracing::error!( account_id = %account_id, "Account actor already exists" ); - handle.cancel_token.cancel(); + return; } let notify = Arc::new(Notify::new()); - let cancel_token = tokio_util::sync::CancellationToken::new(); - let actor = AccountActor::new(origin, actor_context, notify.clone(), cancel_token.clone()); - let handle = ActorHandle::new(notify, cancel_token); + let actor = AccountActor::new(account_id, actor_context, notify.clone()); + let handle = ActorHandle::new(notify); // Run the actor. Actor reads state from DB on startup. let semaphore = self.semaphore.clone(); @@ -180,7 +204,7 @@ impl Coordinator { pub fn notify_accounts(&self, account_ids: &[NetworkAccountId]) { for account_id in account_ids { if let Some(handle) = self.actor_registry.get(account_id) { - handle.notify.notify_one(); + handle.notify(); } } } @@ -200,15 +224,13 @@ impl Coordinator { let actor_result = self.actor_join_set.join_next().await; match actor_result { Some(Ok((account_id, Ok(())))) => { - // Actor shut down intentionally (idle timeout, cancelled, account removed). + // Actor shut down intentionally (idle timeout or account removed). // Remove from registry and check if a notification arrived just as it shut // down. If so, the caller should respawn it. - let should_respawn = - self.actor_registry.remove(&account_id).is_some_and(|handle| { - let notified = handle.notify.notified(); - tokio::pin!(notified); - notified.enable() - }); + let should_respawn = self + .actor_registry + .remove(&account_id) + .is_some_and(|handle| handle.has_pending_notification()); Ok(should_respawn.then_some(account_id)) }, @@ -278,7 +300,7 @@ impl Coordinator { // Notify target actors. 
for account_id in &target_account_ids { if let Some(handle) = self.actor_registry.get(account_id) { - handle.notify.notify_one(); + handle.notify(); } } @@ -341,22 +363,17 @@ impl Coordinator { #[cfg(test)] mod tests { - use std::sync::Arc; - use miden_node_proto::domain::mempool::MempoolEvent; use super::*; - use crate::actor::{AccountActorContext, AccountOrigin}; + use crate::actor::AccountActorContext; use crate::db::Db; use crate::test_utils::*; /// Registers a dummy actor handle (no real actor task) in the coordinator's registry. fn register_dummy_actor(coordinator: &mut Coordinator, account_id: NetworkAccountId) { let notify = Arc::new(Notify::new()); - let cancel_token = CancellationToken::new(); - coordinator - .actor_registry - .insert(account_id, ActorHandle::new(notify, cancel_token)); + coordinator.actor_registry.insert(account_id, ActorHandle::new(notify)); } // SEND TARGETED TESTS @@ -403,7 +420,7 @@ mod tests { // Simulate the account having reached the crash threshold. coordinator.crash_counts.insert(account_id, max_crashes); - coordinator.spawn_actor(AccountOrigin::Store(account_id), &actor_context); + coordinator.spawn_actor(account_id, &actor_context); assert!( !coordinator.actor_registry.contains_key(&account_id), @@ -423,7 +440,7 @@ mod tests { // Set crash count below the threshold. 
coordinator.crash_counts.insert(account_id, max_crashes - 1); - coordinator.spawn_actor(AccountOrigin::Store(account_id), &actor_context); + coordinator.spawn_actor(account_id, &actor_context); assert!( coordinator.actor_registry.contains_key(&account_id), diff --git a/crates/ntx-builder/src/db/migrations/2026020900000_setup/up.sql b/crates/ntx-builder/src/db/migrations/2026020900000_setup/up.sql index 4a1480b08d..46d71689c0 100644 --- a/crates/ntx-builder/src/db/migrations/2026020900000_setup/up.sql +++ b/crates/ntx-builder/src/db/migrations/2026020900000_setup/up.sql @@ -36,7 +36,7 @@ CREATE INDEX idx_accounts_tx ON accounts(transaction_id) WHERE transaction_id IS -- Notes: committed, inflight, and nullified — all in one table. -- created_by = NULL means committed note; non-NULL means created by inflight tx. -- consumed_by = NULL means unconsumed; non-NULL means consumed by inflight tx. --- Row is deleted once consumption is committed. +-- committed_at = block number when the consuming transaction was committed on-chain. CREATE TABLE notes ( -- Nullifier bytes (32 bytes). Primary key. nullifier BLOB PRIMARY KEY, @@ -56,9 +56,13 @@ CREATE TABLE notes ( created_by BLOB, -- NULL if unconsumed; transaction ID of the consuming inflight tx. consumed_by BLOB, + -- Block number at which the note's consuming transaction was committed. + -- NULL while the note is still pending or in-flight; set on block commit. 
+ committed_at INTEGER, CONSTRAINT notes_attempt_count_non_negative CHECK (attempt_count >= 0), - CONSTRAINT notes_last_attempt_is_u32 CHECK (last_attempt BETWEEN 0 AND 0xFFFFFFFF) + CONSTRAINT notes_last_attempt_is_u32 CHECK (last_attempt BETWEEN 0 AND 0xFFFFFFFF), + CONSTRAINT notes_committed_at_is_u32 CHECK (committed_at BETWEEN 0 AND 0xFFFFFFFF) ) WITHOUT ROWID; CREATE INDEX idx_notes_account ON notes(account_id); diff --git a/crates/ntx-builder/src/db/mod.rs b/crates/ntx-builder/src/db/mod.rs index 28f1fef933..a743e09d47 100644 --- a/crates/ntx-builder/src/db/mod.rs +++ b/crates/ntx-builder/src/db/mod.rs @@ -14,7 +14,6 @@ use tracing::{info, instrument}; use crate::db::migrations::apply_migrations; use crate::db::models::queries; -use crate::inflight_note::InflightNetworkNote; use crate::{COMPONENT, NoteError}; pub(crate) mod models; @@ -84,13 +83,22 @@ impl Db { .await } + /// Returns `true` if a committed account state exists for the given account. + pub async fn has_committed_account(&self, account_id: NetworkAccountId) -> Result<bool> { + self.inner + .query("has_committed_account", move |conn| { + Ok(queries::get_committed_account(conn, account_id)?.is_some()) + }) + .await + } + /// Returns the latest account state and available notes for the given account. pub async fn select_candidate( &self, account_id: NetworkAccountId, block_num: BlockNumber, max_note_attempts: usize, - ) -> Result<(Option<Account>, Vec<InflightNetworkNote>)> { + ) -> Result<(Option<Account>, Vec<AccountTargetNetworkNote>)> { self.inner .query("select_candidate", move |conn| { let account = queries::get_account(conn, account_id)?; @@ -115,11 +123,11 @@ impl Db { .await } - /// Returns the latest execution error for a note identified by its note ID. - pub async fn get_note_error(&self, note_id: NoteId) -> Result<Option<NoteErrorRow>> { + /// Returns the status for a note identified by its note ID.
+ pub async fn get_note_status(&self, note_id: NoteId) -> Result<Option<NoteStatusRow>> { let note_id_bytes = models::conv::note_id_to_bytes(&note_id); self.inner - .query("get_note_error", move |conn| queries::get_note_error(conn, &note_id_bytes)) + .query("get_note_status", move |conn| queries::get_note_status(conn, &note_id_bytes)) .await } diff --git a/crates/ntx-builder/src/db/models/queries/accounts.rs b/crates/ntx-builder/src/db/models/queries/accounts.rs index 7d52c6554c..79035918e0 100644 --- a/crates/ntx-builder/src/db/models/queries/accounts.rs +++ b/crates/ntx-builder/src/db/models/queries/accounts.rs @@ -103,6 +103,33 @@ pub fn get_account( .transpose() } +/// Returns the committed account state (`transaction_id IS NULL`), ignoring any inflight rows. +/// +/// # Raw SQL +/// +/// ```sql +/// SELECT account_data +/// FROM accounts +/// WHERE account_id = ?1 AND transaction_id IS NULL +/// LIMIT 1 +/// ``` +pub fn get_committed_account( + conn: &mut SqliteConnection, + account_id: NetworkAccountId, +) -> Result<Option<Account>, DatabaseError> { + let account_id_bytes = conversions::network_account_id_to_bytes(account_id); + + let row: Option<AccountRow> = schema::accounts::table + .filter(schema::accounts::account_id.eq(&account_id_bytes)) + .filter(schema::accounts::transaction_id.is_null()) + .select(AccountRow::as_select()) + .first(conn) + .optional()?; + + row.map(|AccountRow { account_data, .. }| conversions::account_from_bytes(&account_data)) + .transpose() +} + /// Returns `true` when an inflight account row exists with the given `transaction_id`.
/// /// # Raw SQL diff --git a/crates/ntx-builder/src/db/models/queries/mod.rs b/crates/ntx-builder/src/db/models/queries/mod.rs index 3d71f3a94d..1bf74a66ba 100644 --- a/crates/ntx-builder/src/db/models/queries/mod.rs +++ b/crates/ntx-builder/src/db/models/queries/mod.rs @@ -47,7 +47,7 @@ mod tests; /// /// DELETE FROM notes WHERE created_by IS NOT NULL /// -/// UPDATE notes SET consumed_by = NULL WHERE consumed_by IS NOT NULL +/// UPDATE notes SET consumed_by = NULL WHERE consumed_by IS NOT NULL AND committed_at IS NULL /// ``` pub fn purge_inflight(conn: &mut SqliteConnection) -> Result<(), DatabaseError> { // Delete inflight account rows. @@ -58,10 +58,14 @@ pub fn purge_inflight(conn: &mut SqliteConnection) -> Result<(), DatabaseError> diesel::delete(schema::notes::table.filter(schema::notes::created_by.is_not_null())) .execute(conn)?; - // Un-nullify notes consumed by inflight transactions. - diesel::update(schema::notes::table.filter(schema::notes::consumed_by.is_not_null())) - .set(schema::notes::consumed_by.eq(None::<Vec<u8>>)) - .execute(conn)?; + // Un-nullify notes consumed by inflight transactions (skip committed notes).
+ diesel::update( + schema::notes::table + .filter(schema::notes::consumed_by.is_not_null()) + .filter(schema::notes::committed_at.is_null()), + ) + .set(schema::notes::consumed_by.eq(None::<Vec<u8>>)) + .execute(conn)?; Ok(()) } @@ -154,6 +158,7 @@ pub fn add_transaction( last_error: None, created_by: Some(tx_id_bytes.clone()), consumed_by: None, + committed_at: None, }; diesel::insert_or_ignore_into(schema::notes::table) .values(&insert) @@ -194,8 +199,8 @@ pub fn add_transaction( /// UPDATE accounts SET transaction_id = NULL /// WHERE account_id = ?1 AND transaction_id = ?2 /// -/// -- Delete consumed notes -/// DELETE FROM notes WHERE consumed_by = ?1 +/// -- Mark consumed notes as committed +/// UPDATE notes SET committed_at = ?block_num WHERE consumed_by = ?1 /// /// -- Promote inflight-created notes to committed /// UPDATE notes SET created_by = NULL WHERE created_by = ?1 @@ -242,7 +247,7 @@ pub fn commit_block( .execute(conn)?; } - // Collect accounts of notes consumed by this tx before deleting them. + // Collect accounts of notes consumed by this tx. let consumed_note_accounts: Vec<Vec<u8>> = schema::notes::table .filter(schema::notes::consumed_by.eq(&tx_id_bytes)) .select(schema::notes::account_id) .load(conn)?; affected_accounts.insert(conversions::network_account_id_from_bytes(account_id_bytes)?); } - // Delete consumed notes (consumed_by = tx_id). - diesel::delete(schema::notes::table.filter(schema::notes::consumed_by.eq(&tx_id_bytes))) + // Mark consumed notes as committed (set committed_at = block_num). + let block_num_val = conversions::block_num_to_i64(block_num); + diesel::update(schema::notes::table.filter(schema::notes::consumed_by.eq(&tx_id_bytes))) + .set(schema::notes::committed_at.eq(Some(block_num_val))) + .execute(conn)?; // Promote inflight-created notes to committed (set created_by = NULL).
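Aside: the coordinator changes earlier in this diff drop per-actor `CancellationToken`s in favor of coalesced `tokio::sync::Notify` wake-ups, where `notify_one` stores at most one permit and `has_pending_notification` peeks at that permit and restores it. A hedged, std-only model of that permit behavior (the `PermitSlot` type below is an illustrative assumption, not part of the codebase, and ignores async waiters entirely):

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Illustrative stand-in for tokio's Notify permit: notify_one() stores at most
// one permit (repeated calls coalesce), and a consumer takes it exactly once.
struct PermitSlot(AtomicBool);

impl PermitSlot {
    fn new() -> Self {
        Self(AtomicBool::new(false))
    }

    // Store a permit; calling this while one is pending is a no-op (coalescing).
    fn notify_one(&self) {
        self.0.store(true, Ordering::SeqCst);
    }

    // Consume the pending permit, if any.
    fn consume(&self) -> bool {
        self.0.swap(false, Ordering::SeqCst)
    }

    // Peek-and-restore, mirroring the coordinator's post-shutdown respawn check:
    // if a permit arrived just as the actor shut down, put it back so the
    // respawned actor still observes it, and report true.
    fn has_pending_notification(&self) -> bool {
        if self.consume() {
            self.notify_one();
            true
        } else {
            false
        }
    }
}
```

In the real code the same shape is achieved with `notify.notified().now_or_never()` followed by `notify.notify_one()`, which additionally wakes any task already awaiting `notified()`.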
diff --git a/crates/ntx-builder/src/db/models/queries/notes.rs b/crates/ntx-builder/src/db/models/queries/notes.rs index 54022d774f..384994c74f 100644 --- a/crates/ntx-builder/src/db/models/queries/notes.rs +++ b/crates/ntx-builder/src/db/models/queries/notes.rs @@ -11,7 +11,6 @@ use miden_standards::note::AccountTargetNetworkNote; use crate::NoteError; use crate::db::models::conv as conversions; use crate::db::schema; -use crate::inflight_note::InflightNetworkNote; // MODELS // ================================================================================================ @@ -40,17 +39,20 @@ pub struct NoteInsert { pub last_error: Option<String>, pub created_by: Option<Vec<u8>>, pub consumed_by: Option<Vec<u8>>, + pub committed_at: Option<i64>, } -/// Row returned by `get_note_error()`. +/// Row returned by `get_note_status()`. #[derive(Debug, Clone, Queryable, Selectable)] #[diesel(table_name = schema::notes)] #[diesel(check_for_backend(diesel::sqlite::Sqlite))] -pub struct NoteErrorRow { +pub struct NoteStatusRow { pub note_id: Option<Vec<u8>>, pub last_error: Option<String>, pub attempt_count: i32, pub last_attempt: Option<i64>, + pub consumed_by: Option<Vec<u8>>, + pub committed_at: Option<i64>, } // QUERIES // ================================================================================================ @@ -86,6 +88,7 @@ pub fn insert_committed_notes( last_error: None, created_by: None, consumed_by: None, + committed_at: None, }; diesel::replace_into(schema::notes::table).values(&row).execute(conn)?; } @@ -95,8 +98,7 @@ pub fn insert_committed_notes( /// Returns notes available for consumption by a given account. /// /// Queries unconsumed notes (`consumed_by IS NULL`) for the account that have not exceeded the -/// maximum attempt count, then applies backoff filtering in Rust via -/// `InflightNetworkNote::is_available`. +/// maximum attempt count, then applies backoff and execution hint filtering in Rust.
/// /// # Raw SQL /// /// ```sql /// @@ -106,6 +108,7 @@ pub fn insert_committed_notes( /// WHERE /// account_id = ?1 /// AND consumed_by IS NULL +/// AND committed_at IS NULL /// AND attempt_count < ?2 /// ``` #[expect(clippy::cast_possible_wrap)] pub fn available_notes( conn: &mut SqliteConnection, account_id: NetworkAccountId, block_num: BlockNumber, max_attempts: usize, -) -> Result<Vec<InflightNetworkNote>, DatabaseError> { +) -> Result<Vec<AccountTargetNetworkNote>, DatabaseError> { let account_id_bytes = conversions::network_account_id_to_bytes(account_id); - // Get unconsumed notes for this account that haven't exceeded the max attempt count. + // Get unconsumed, uncommitted notes for this account that haven't exceeded the max + // attempt count. let rows: Vec<NoteRow> = schema::notes::table .filter(schema::notes::account_id.eq(&account_id_bytes)) .filter(schema::notes::consumed_by.is_null()) + .filter(schema::notes::committed_at.is_null()) .filter(schema::notes::attempt_count.lt(max_attempts as i32)) .select(NoteRow::as_select()) .load(conn)?; @@ -129,12 +134,11 @@ pub fn available_notes( for row in rows { #[expect(clippy::cast_sign_loss)] let attempt_count = row.attempt_count as usize; - let note = note_row_to_inflight( - &row.note_data, - attempt_count, - row.last_attempt.map(conversions::block_num_from_i64), - )?; - if note.is_available(block_num) { + let last_attempt = row.last_attempt.map(conversions::block_num_from_i64); + let note = deserialize_note(&row.note_data)?; + + let execution_hint_ok = note.execution_hint().can_be_consumed(block_num).unwrap_or(true); + if execution_hint_ok && has_backoff_passed(block_num, last_attempt, attempt_count) { result.push(note); } } @@ -176,22 +180,22 @@ pub fn notes_failed( Ok(()) } -/// Returns the latest execution error for a note identified by its note ID. +/// Returns the status for a note identified by its note ID.
/// /// # Raw SQL /// /// ```sql -/// SELECT note_id, last_error, attempt_count, last_attempt +/// SELECT note_id, last_error, attempt_count, last_attempt, consumed_by, committed_at /// FROM notes /// WHERE note_id = ?1 /// ``` -pub fn get_note_error( +pub fn get_note_status( conn: &mut SqliteConnection, note_id_bytes: &[u8], -) -> Result<Option<NoteErrorRow>, DatabaseError> { +) -> Result<Option<NoteStatusRow>, DatabaseError> { schema::notes::table .filter(schema::notes::note_id.eq(note_id_bytes)) - .select(NoteErrorRow::as_select()) + .select(NoteStatusRow::as_select()) .first(conn) .optional() .map_err(Into::into) @@ -200,17 +204,74 @@ pub fn get_note_error( // HELPERS // ================================================================================================ -/// Constructs an `InflightNetworkNote` from DB row fields. -fn note_row_to_inflight( - note_data: &[u8], - attempt_count: usize, - last_attempt: Option<BlockNumber>, -) -> Result<InflightNetworkNote, DatabaseError> { +/// Deserializes an [`AccountTargetNetworkNote`] from raw note bytes. +fn deserialize_note(note_data: &[u8]) -> Result<AccountTargetNetworkNote, DatabaseError> { let note = Note::read_from_bytes(note_data) .map_err(|source| DatabaseError::deserialization("failed to parse note", source))?; - let note = AccountTargetNetworkNote::new(note).map_err(|source| { + AccountTargetNetworkNote::new(note).map_err(|source| { DatabaseError::deserialization("failed to convert to network note", source) - })?; + }) +} - Ok(InflightNetworkNote::from_parts(note, attempt_count, last_attempt)) +/// Checks if the backoff block period has passed. +/// +/// The number of blocks passed since the last attempt must be strictly greater than +/// e^(0.25 * `attempt_count`) rounded to the nearest integer. +/// +/// This evaluates to the following: +/// - After 1 attempt, the backoff period is 1 block. +/// - After 3 attempts, the backoff period is 2 blocks. +/// - After 10 attempts, the backoff period is 12 blocks. +/// - After 20 attempts, the backoff period is 148 blocks. +/// - etc...
+#[expect(clippy::cast_precision_loss, clippy::cast_sign_loss)] +fn has_backoff_passed( + chain_tip: BlockNumber, + last_attempt: Option<BlockNumber>, + attempts: usize, +) -> bool { + if attempts == 0 { + return true; + } + // Compute the number of blocks passed since the last attempt. + let blocks_passed = last_attempt + .and_then(|last| chain_tip.checked_sub(last.as_u32())) + .unwrap_or_default(); + + // Compute the exponential backoff threshold: Δ = e^(0.25 * n). + let backoff_threshold = (0.25 * attempts as f64).exp().round() as usize; + + // Check if the backoff period has passed. + blocks_passed.as_usize() > backoff_threshold +} + +#[cfg(test)] +mod tests { + use miden_protocol::block::BlockNumber; + + use super::has_backoff_passed; + + #[rstest::rstest] + #[test] + #[case::all_zero(Some(BlockNumber::GENESIS), BlockNumber::GENESIS, 0, true)] + #[case::no_attempts(None, BlockNumber::GENESIS, 0, true)] + #[case::one_attempt(Some(BlockNumber::GENESIS), BlockNumber::from(2), 1, true)] + #[case::three_attempts(Some(BlockNumber::GENESIS), BlockNumber::from(3), 3, true)] + #[case::ten_attempts(Some(BlockNumber::GENESIS), BlockNumber::from(13), 10, true)] + #[case::twenty_attempts(Some(BlockNumber::GENESIS), BlockNumber::from(149), 20, true)] + #[case::one_attempt_false(Some(BlockNumber::GENESIS), BlockNumber::from(1), 1, false)] + #[case::three_attempts_false(Some(BlockNumber::GENESIS), BlockNumber::from(2), 3, false)] + #[case::ten_attempts_false(Some(BlockNumber::GENESIS), BlockNumber::from(12), 10, false)] + #[case::twenty_attempts_false(Some(BlockNumber::GENESIS), BlockNumber::from(148), 20, false)] + fn backoff_has_passed( + #[case] last_attempt_block_num: Option<BlockNumber>, + #[case] current_block_num: BlockNumber, + #[case] attempt_count: usize, + #[case] backoff_should_have_passed: bool, + ) { + assert_eq!( + backoff_should_have_passed, + has_backoff_passed(current_block_num, last_attempt_block_num, attempt_count) + ); + } } diff --git 
a/crates/ntx-builder/src/db/models/queries/tests.rs b/crates/ntx-builder/src/db/models/queries/tests.rs index 48ec573414..108dcab617 100644 --- a/crates/ntx-builder/src/db/models/queries/tests.rs +++ b/crates/ntx-builder/src/db/models/queries/tests.rs @@ -195,11 +195,12 @@ fn block_committed_promotes_inflight_notes_to_committed() { } #[test] -fn block_committed_deletes_consumed_notes() { +fn block_committed_marks_consumed_notes_as_committed() { let (conn, _dir) = &mut test_conn(); let account_id = mock_network_account_id(); let note = mock_single_target_note(account_id, 10); + let note_id = note.as_note().id(); // Insert a committed note. insert_committed_notes(conn, std::slice::from_ref(&note)).unwrap(); @@ -214,8 +215,13 @@ let header = mock_block_header(block_num); commit_block(conn, &[tx_id], block_num, &header).unwrap(); - // Consumed note should be deleted. - assert_eq!(count_notes(conn), 0); + // Note should still exist but be marked as committed. + assert_eq!(count_notes(conn), 1); + let row = get_note_status(conn, &conversions::note_id_to_bytes(&note_id)) + .unwrap() + .unwrap(); + assert_eq!(row.committed_at, Some(conversions::block_num_to_i64(block_num))); + assert!(row.consumed_by.is_some()); } #[test] @@ -251,6 +257,39 @@ fn block_committed_promotes_inflight_account_to_committed() { assert_eq!(count_inflight_accounts(conn), 0); } +// GET COMMITTED ACCOUNT TESTS +// ================================================================================================ + +#[test] +fn get_committed_account_ignores_inflight() { + let (conn, _dir) = &mut test_conn(); + + let account_id = mock_network_account_id(); + let account = mock_account(account_id); + + // Insert only an inflight account row (simulating account creation).
+ let tx_id = mock_tx_id(1); + let row = AccountInsert { + account_id: conversions::network_account_id_to_bytes(account_id), + transaction_id: Some(conversions::transaction_id_to_bytes(&tx_id)), + account_data: conversions::account_to_bytes(&account), + }; + diesel::insert_into(schema::accounts::table).values(&row).execute(conn).unwrap(); + + // get_committed_account should return None (only inflight exists). + let result = get_committed_account(conn, account_id).unwrap(); + assert!(result.is_none()); + + // Commit the block to promote inflight to committed. + let block_num = BlockNumber::from(1u32); + let header = mock_block_header(block_num); + commit_block(conn, &[tx_id], block_num, &header).unwrap(); + + // Now get_committed_account should return the account. + let result = get_committed_account(conn, account_id).unwrap(); + assert!(result.is_some()); +} + // HANDLE TRANSACTIONS REVERTED TESTS // ================================================================================================ @@ -378,7 +417,7 @@ fn available_notes_filters_consumed_and_exceeded_attempts() { // Only note_good should be available (note_consumed is consumed, note_failed exceeded // attempts). 
assert_eq!(result.len(), 1); - assert_eq!(result[0].nullifier(), note_good.as_note().nullifier()); + assert_eq!(result[0].as_note().nullifier(), note_good.as_note().nullifier()); } #[test] @@ -397,7 +436,7 @@ let result = available_notes(conn, account_id_1, block_num, 30).unwrap(); assert_eq!(result.len(), 1); - assert_eq!(result[0].nullifier(), note_acct1.as_note().nullifier()); + assert_eq!(result[0].as_note().nullifier(), note_acct1.as_note().nullifier()); } // NOTES FAILED TESTS @@ -436,11 +475,11 @@ fn notes_failed_increments_attempt_count() { assert_eq!(last_attempt, Some(conversions::block_num_to_i64(block_num))); } -// GET NOTE ERROR TESTS +// GET NOTE STATUS TESTS // ================================================================================================ #[test] -fn get_note_error_returns_latest_error() { +fn get_note_status_returns_latest_error() { let (conn, _dir) = &mut test_conn(); let account_id = mock_network_account_id(); @@ -450,19 +489,20 @@ // Insert as committed note. insert_committed_notes(conn, std::slice::from_ref(&note)).unwrap(); - // Initially no error. - let result = get_note_error(conn, &conversions::note_id_to_bytes(&note_id)).unwrap(); + // Initially no error, not consumed. + let result = get_note_status(conn, &conversions::note_id_to_bytes(&note_id)).unwrap(); assert!(result.is_some()); let row = result.unwrap(); assert!(row.last_error.is_none()); assert_eq!(row.attempt_count, 0); + assert!(row.consumed_by.is_none()); // Mark as failed.
let block_num = BlockNumber::from(5u32); notes_failed(conn, &[(note.as_note().nullifier(), test_note_error("first error"))], block_num) .unwrap(); - let result = get_note_error(conn, &conversions::note_id_to_bytes(&note_id)).unwrap(); + let result = get_note_status(conn, &conversions::note_id_to_bytes(&note_id)).unwrap(); let row = result.unwrap(); assert_eq!(row.last_error.as_deref(), Some("first error")); assert_eq!(row.attempt_count, 1); @@ -475,21 +515,54 @@ ) .unwrap(); - let result = get_note_error(conn, &conversions::note_id_to_bytes(&note_id)).unwrap(); + let result = get_note_status(conn, &conversions::note_id_to_bytes(&note_id)).unwrap(); let row = result.unwrap(); assert_eq!(row.last_error.as_deref(), Some("second error")); assert_eq!(row.attempt_count, 2); } #[test] -fn get_note_error_returns_none_for_unknown_note() { +fn get_note_status_returns_none_for_unknown_note() { let (conn, _dir) = &mut test_conn(); let unknown_id = vec![0u8; 32]; - let result = get_note_error(conn, &unknown_id).unwrap(); + let result = get_note_status(conn, &unknown_id).unwrap(); assert!(result.is_none()); } +#[test] +fn get_note_status_includes_consumed_by() { + let (conn, _dir) = &mut test_conn(); + + let account_id = mock_network_account_id(); + let note = mock_single_target_note(account_id, 10); + let note_id = note.as_note().id(); + + // Insert as committed note. + insert_committed_notes(conn, &[note]).unwrap(); + + // Initially consumed_by is NULL. + let row = get_note_status(conn, &conversions::note_id_to_bytes(&note_id)) + .unwrap() + .unwrap(); + assert!(row.consumed_by.is_none()); + + // Simulate consumption by setting consumed_by to a dummy transaction ID.
+ let dummy_tx_id = vec![42u8; 32]; + diesel::update( + schema::notes::table + .filter(schema::notes::note_id.eq(conversions::note_id_to_bytes(&note_id))), + ) + .set(schema::notes::consumed_by.eq(Some(&dummy_tx_id))) + .execute(conn) + .unwrap(); + + let row = get_note_status(conn, &conversions::note_id_to_bytes(&note_id)) + .unwrap() + .unwrap(); + assert_eq!(row.consumed_by, Some(dummy_tx_id)); +} + // CHAIN STATE TESTS // ================================================================================================ diff --git a/crates/ntx-builder/src/db/schema.rs b/crates/ntx-builder/src/db/schema.rs index c52ca5f538..9797dca10a 100644 --- a/crates/ntx-builder/src/db/schema.rs +++ b/crates/ntx-builder/src/db/schema.rs @@ -35,6 +35,7 @@ diesel::table! { last_error -> Nullable<Text>, created_by -> Nullable<Binary>, consumed_by -> Nullable<Binary>, + committed_at -> Nullable<BigInt>, } } diff --git a/crates/ntx-builder/src/inflight_note.rs b/crates/ntx-builder/src/inflight_note.rs deleted file mode 100644 index 70af7ded8d..0000000000 --- a/crates/ntx-builder/src/inflight_note.rs +++ /dev/null @@ -1,143 +0,0 @@ -use miden_protocol::block::BlockNumber; -use miden_protocol::note::{Note, Nullifier}; -use miden_standards::note::AccountTargetNetworkNote; - -// INFLIGHT NETWORK NOTE -// ================================================================================================ - -/// An unconsumed network note that may have failed to execute. -/// -/// The block number at which the network note was attempted are approximate and may not -/// reflect the exact block number for which the execution attempt failed. The actual block -/// will likely be soon after the number that is recorded here. -#[derive(Debug, Clone)] -pub struct InflightNetworkNote { - note: AccountTargetNetworkNote, - attempt_count: usize, - last_attempt: Option<BlockNumber>, -} - -impl InflightNetworkNote { - /// Creates a new inflight network note.
- pub fn new(note: AccountTargetNetworkNote) -> Self { - Self { - note, - attempt_count: 0, - last_attempt: None, - } - } - - /// Reconstructs an inflight network note from its constituent parts (e.g., from DB rows). - pub fn from_parts( - note: AccountTargetNetworkNote, - attempt_count: usize, - last_attempt: Option, - ) -> Self { - Self { note, attempt_count, last_attempt } - } - - /// Consumes the inflight network note and returns the inner network note. - pub fn into_inner(self) -> AccountTargetNetworkNote { - self.note - } - - /// Returns a reference to the inner network note. - pub fn to_inner(&self) -> &AccountTargetNetworkNote { - &self.note - } - - /// Returns the number of attempts made to execute the network note. - pub fn attempt_count(&self) -> usize { - self.attempt_count - } - - /// Checks if the network note is available for execution. - /// - /// The note is available if the backoff period has passed. - pub fn is_available(&self, block_num: BlockNumber) -> bool { - self.note.execution_hint().can_be_consumed(block_num).unwrap_or(true) - && has_backoff_passed(block_num, self.last_attempt, self.attempt_count) - } - - /// Registers a failed attempt to execute the network note at the specified block number. - pub fn fail(&mut self, block_num: BlockNumber) { - self.last_attempt = Some(block_num); - self.attempt_count += 1; - } - - pub fn nullifier(&self) -> Nullifier { - self.note.as_note().nullifier() - } -} - -impl From for Note { - fn from(value: InflightNetworkNote) -> Self { - value.into_inner().into_note() - } -} - -// HELPERS -// ================================================================================================ - -/// Checks if the backoff block period has passed. -/// -/// The number of blocks passed since the last attempt must be greater than or equal to -/// e^(0.25 * `attempt_count`) rounded to the nearest integer. -/// -/// This evaluates to the following: -/// - After 1 attempt, the backoff period is 1 block. 
-/// - After 3 attempts, the backoff period is 2 blocks. -/// - After 10 attempts, the backoff period is 12 blocks. -/// - After 20 attempts, the backoff period is 148 blocks. -/// - etc... -#[expect(clippy::cast_precision_loss, clippy::cast_sign_loss)] -fn has_backoff_passed( - chain_tip: BlockNumber, - last_attempt: Option, - attempts: usize, -) -> bool { - if attempts == 0 { - return true; - } - // Compute the number of blocks passed since the last attempt. - let blocks_passed = last_attempt - .and_then(|last| chain_tip.checked_sub(last.as_u32())) - .unwrap_or_default(); - - // Compute the exponential backoff threshold: Δ = e^(0.25 * n). - let backoff_threshold = (0.25 * attempts as f64).exp().round() as usize; - - // Check if the backoff period has passed. - blocks_passed.as_usize() > backoff_threshold -} - -#[cfg(test)] -mod tests { - use miden_protocol::block::BlockNumber; - - use super::has_backoff_passed; - - #[rstest::rstest] - #[test] - #[case::all_zero(Some(BlockNumber::GENESIS), BlockNumber::GENESIS, 0, true)] - #[case::no_attempts(None, BlockNumber::GENESIS, 0, true)] - #[case::one_attempt(Some(BlockNumber::GENESIS), BlockNumber::from(2), 1, true)] - #[case::three_attempts(Some(BlockNumber::GENESIS), BlockNumber::from(3), 3, true)] - #[case::ten_attempts(Some(BlockNumber::GENESIS), BlockNumber::from(13), 10, true)] - #[case::twenty_attempts(Some(BlockNumber::GENESIS), BlockNumber::from(149), 20, true)] - #[case::one_attempt_false(Some(BlockNumber::GENESIS), BlockNumber::from(1), 1, false)] - #[case::three_attempts_false(Some(BlockNumber::GENESIS), BlockNumber::from(2), 3, false)] - #[case::ten_attempts_false(Some(BlockNumber::GENESIS), BlockNumber::from(12), 10, false)] - #[case::twenty_attempts_false(Some(BlockNumber::GENESIS), BlockNumber::from(148), 20, false)] - fn backoff_has_passed( - #[case] last_attempt_block_num: Option, - #[case] current_block_num: BlockNumber, - #[case] attempt_count: usize, - #[case] backoff_should_have_passed: bool, - ) { 
- assert_eq!( - backoff_should_have_passed, - has_backoff_passed(current_block_num, last_attempt_block_num, attempt_count) - ); - } -} diff --git a/crates/ntx-builder/src/lib.rs b/crates/ntx-builder/src/lib.rs index fed307ed6c..4b70137380 100644 --- a/crates/ntx-builder/src/lib.rs +++ b/crates/ntx-builder/src/lib.rs @@ -3,7 +3,7 @@ use std::path::PathBuf; use std::sync::Arc; use std::time::Duration; -use actor::AccountActorContext; +use actor::{AccountActorContext, ActorConfig, GrpcClients, State}; use anyhow::Context; use builder::MempoolEventStream; use chain_state::ChainState; @@ -25,7 +25,6 @@ mod chain_state; mod clients; mod coordinator; pub(crate) mod db; -pub(crate) mod inflight_note; pub mod server; #[cfg(test)] @@ -295,18 +294,24 @@ impl NtxBuilderConfig { let (request_tx, actor_request_rx) = mpsc::channel(1); let actor_context = AccountActorContext { - block_producer: block_producer.clone(), - validator, - prover, - chain_state: chain_state.clone(), - store: store.clone(), - script_cache, - max_notes_per_tx: self.max_notes_per_tx, - max_note_attempts: self.max_note_attempts, - idle_timeout: self.idle_timeout, - db: db.clone(), + clients: GrpcClients { + store: store.clone(), + block_producer: block_producer.clone(), + validator, + prover, + }, + state: State { + db: db.clone(), + chain: chain_state.clone(), + script_cache, + }, + config: ActorConfig { + max_notes_per_tx: self.max_notes_per_tx, + max_note_attempts: self.max_note_attempts, + idle_timeout: self.idle_timeout, + max_cycles: self.max_cycles, + }, request_tx, - max_cycles: self.max_cycles, }; Ok(NetworkTransactionBuilder::new( diff --git a/crates/ntx-builder/src/server.rs b/crates/ntx-builder/src/server.rs index 4c2d921aec..12ef5bfa63 100644 --- a/crates/ntx-builder/src/server.rs +++ b/crates/ntx-builder/src/server.rs @@ -1,6 +1,7 @@ use anyhow::Context; use miden_node_proto::generated::note::NoteId; -use miden_node_proto::generated::ntx_builder::{self, api_server}; +use 
miden_node_proto::generated::ntx_builder::api_server;
+use miden_node_proto::generated::rpc;
 use miden_node_proto_build::ntx_builder_api_descriptor;
 use miden_node_utils::panic::{CatchPanicLayer, catch_panic_layer_fn};
 use miden_node_utils::tracing::grpc::grpc_trace_fn;
@@ -19,15 +20,16 @@ use crate::db::Db;
 
 /// gRPC server for the network transaction builder.
 ///
-/// Exposes endpoints for querying note execution errors, useful for debugging
+/// Exposes endpoints for querying network note status, useful for debugging
 /// network notes that fail to be consumed.
 pub struct NtxBuilderRpcServer {
     db: Db,
+    max_note_attempts: usize,
 }
 
 impl NtxBuilderRpcServer {
-    pub fn new(db: Db) -> Self {
-        Self { db }
+    pub fn new(db: Db, max_note_attempts: usize) -> Self {
+        Self { db, max_note_attempts }
     }
 
     /// Starts the gRPC server on the given listener.
@@ -58,10 +60,10 @@ impl NtxBuilderRpcServer {
 #[tonic::async_trait]
 impl api_server::Api for NtxBuilderRpcServer {
     #[expect(clippy::cast_sign_loss)]
-    async fn get_note_error(
+    async fn get_network_note_status(
         &self,
         request: Request<NoteId>,
-    ) -> Result<Response<ntx_builder::GetNoteErrorResponse>, Status> {
+    ) -> Result<Response<rpc::GetNetworkNoteStatusResponse>, Status> {
         let note_id_proto = request.into_inner();
 
         let note_id_digest: Word = note_id_proto
@@ -73,8 +75,8 @@ impl api_server::Api for NtxBuilderRpcServer {
 
         let note_id = miden_protocol::note::NoteId::from_raw(note_id_digest);
 
-        let row = self.db.get_note_error(note_id).await.map_err(|err| {
-            tracing::error!(err = %err, "failed to query note error from DB");
+        let row = self.db.get_note_status(note_id).await.map_err(|err| {
+            tracing::error!(err = %err, "failed to query note status from DB");
             Status::internal("database error")
         })?;
 
@@ -82,8 +84,16 @@ impl api_server::Api for NtxBuilderRpcServer {
            return Err(Status::not_found("note not found in ntx-builder database"));
        };
 
-        let response = ntx_builder::GetNoteErrorResponse {
-            error: row.last_error,
+        let status = derive_status(
+            row.committed_at.is_some(),
+            row.consumed_by.is_some(),
+
row.attempt_count as usize, + self.max_note_attempts, + ); + + let response = rpc::GetNetworkNoteStatusResponse { + status: status.into(), + last_error: row.last_error, attempt_count: row.attempt_count as u32, last_attempt_block_num: row.last_attempt.map(|v| v as u32), }; @@ -91,3 +101,60 @@ impl api_server::Api for NtxBuilderRpcServer { Ok(Response::new(response)) } } + +// HELPERS +// ================================================================================================ + +/// Derives the lifecycle status of a network note from its DB state. +fn derive_status( + is_committed: bool, + is_consumed: bool, + attempt_count: usize, + max_note_attempts: usize, +) -> rpc::NetworkNoteStatus { + if is_committed { + rpc::NetworkNoteStatus::NullifierCommitted + } else if is_consumed { + rpc::NetworkNoteStatus::NullifierInflight + } else if attempt_count >= max_note_attempts { + rpc::NetworkNoteStatus::Discarded + } else { + rpc::NetworkNoteStatus::Pending + } +} + +#[cfg(test)] +mod tests { + use miden_node_proto::generated::rpc::NetworkNoteStatus; + + use super::*; + + #[test] + fn derive_status_pending() { + assert_eq!(derive_status(false, false, 0, 30), NetworkNoteStatus::Pending); + assert_eq!(derive_status(false, false, 15, 30), NetworkNoteStatus::Pending); + assert_eq!(derive_status(false, false, 29, 30), NetworkNoteStatus::Pending); + } + + #[test] + fn derive_status_processed() { + assert_eq!(derive_status(false, true, 0, 30), NetworkNoteStatus::NullifierInflight); + assert_eq!(derive_status(false, true, 5, 30), NetworkNoteStatus::NullifierInflight); + // consumed_by takes precedence over attempt count + assert_eq!(derive_status(false, true, 30, 30), NetworkNoteStatus::NullifierInflight); + } + + #[test] + fn derive_status_discarded() { + assert_eq!(derive_status(false, false, 30, 30), NetworkNoteStatus::Discarded); + assert_eq!(derive_status(false, false, 100, 30), NetworkNoteStatus::Discarded); + } + + #[test] + fn derive_status_committed() { + 
assert_eq!(derive_status(true, true, 0, 30), NetworkNoteStatus::NullifierCommitted); + assert_eq!(derive_status(true, true, 5, 30), NetworkNoteStatus::NullifierCommitted); + // committed takes precedence over everything + assert_eq!(derive_status(true, false, 30, 30), NetworkNoteStatus::NullifierCommitted); + } +} diff --git a/crates/proto/Cargo.toml b/crates/proto/Cargo.toml index fa48024ce5..42cb8aeb2e 100644 --- a/crates/proto/Cargo.toml +++ b/crates/proto/Cargo.toml @@ -34,9 +34,11 @@ proptest = { version = "1.7" } [build-dependencies] build-rs = { workspace = true } +codegen = { workspace = true } fs-err = { workspace = true } miden-node-proto-build = { features = ["internal"], workspace = true } miette = { version = "7.6" } +prost-types = { workspace = true } tonic-prost-build = { workspace = true } [package.metadata.cargo-machete] diff --git a/crates/proto/build.rs b/crates/proto/build.rs index 4c3d38ab47..8e23179ac4 100644 --- a/crates/proto/build.rs +++ b/crates/proto/build.rs @@ -1,17 +1,20 @@ +use std::collections::HashSet; +use std::io::ErrorKind; use std::path::Path; +use std::process::{Command, ExitStatus}; +use codegen::{Function, Impl, Module, Trait, Type}; use fs_err as fs; use miden_node_proto_build::{ block_producer_api_descriptor, ntx_builder_api_descriptor, remote_prover_api_descriptor, rpc_api_descriptor, - store_block_producer_api_descriptor, - store_ntx_builder_api_descriptor, - store_rpc_api_descriptor, + store_api_descriptor, validator_api_descriptor, }; use miette::{Context, IntoDiagnostic}; +use prost_types::{MethodDescriptorProto, ServiceDescriptorProto}; use tonic_prost_build::FileDescriptorSet; /// Generates Rust protobuf bindings using `miden-node-proto-build`. 
@@ -24,17 +27,33 @@ fn main() -> miette::Result<()> { .into_diagnostic() .wrap_err("creating destination folder")?; - generate_bindings(rpc_api_descriptor(), &dst_dir)?; - generate_bindings(store_rpc_api_descriptor(), &dst_dir)?; - generate_bindings(store_ntx_builder_api_descriptor(), &dst_dir)?; - generate_bindings(store_block_producer_api_descriptor(), &dst_dir)?; - generate_bindings(block_producer_api_descriptor(), &dst_dir)?; - generate_bindings(remote_prover_api_descriptor(), &dst_dir)?; - generate_bindings(validator_api_descriptor(), &dst_dir)?; - generate_bindings(ntx_builder_api_descriptor(), &dst_dir)?; + let descriptor_sets = [ + rpc_api_descriptor(), + store_api_descriptor(), + block_producer_api_descriptor(), + remote_prover_api_descriptor(), + validator_api_descriptor(), + ntx_builder_api_descriptor(), + ]; + + for file_descriptors in &descriptor_sets { + generate_bindings(file_descriptors.clone(), &dst_dir)?; + } + + let server_dst_dir = dst_dir.join("server"); + fs::create_dir_all(&server_dst_dir) + .into_diagnostic() + .wrap_err("creating server destination folder")?; + + generate_server_modules(&descriptor_sets, &server_dst_dir)?; + + generate_mod_rs(&server_dst_dir) + .into_diagnostic() + .wrap_err("generating server mod.rs")?; generate_mod_rs(&dst_dir).into_diagnostic().wrap_err("generating mod.rs")?; + rustfmt_generated(&dst_dir)?; Ok(()) } @@ -54,30 +73,488 @@ fn generate_bindings(file_descriptors: FileDescriptorSet, dst_dir: &Path) -> mie Ok(()) } +fn rustfmt_generated(dir: &Path) -> miette::Result<()> { + let mut rs_files = Vec::new(); + collect_rs_files(dir, &mut rs_files)?; + + if rs_files.is_empty() { + return Ok(()); + } + + let status = match Command::new("rustfmt").args(&rs_files).status() { + Err(e) if e.kind() == ErrorKind::NotFound => { + // rustfmt is not installed, skip without an error + ExitStatus::default() + }, + Err(e) => return Err(e).into_diagnostic().wrap_err("running rustfmt on generated files"), + Ok(status) => status, + 
};
+
+    if !status.success() {
+        miette::bail!("rustfmt failed with status: {status}");
+    }
+
+    Ok(())
+}
+
+fn collect_rs_files(dir: &Path, out: &mut Vec<std::path::PathBuf>) -> miette::Result<()> {
+    for entry in fs_err::read_dir(dir).into_diagnostic()? {
+        let entry = entry.into_diagnostic()?;
+        let path = entry.path();
+        if path.is_dir() {
+            collect_rs_files(&path, out)?;
+        } else if path.extension().is_some_and(|ext| ext == "rs") {
+            out.push(path);
+        }
+    }
+    Ok(())
+}
+
 /// Generate `mod.rs` which includes all files in the folder as submodules.
 fn generate_mod_rs(dst_dir: impl AsRef<Path>) -> std::io::Result<()> {
-    let mod_filepath = dst_dir.as_ref().join("mod.rs");
+    // I couldn't find any `codegen::` function for `mod <name>;`, so we generate it manually.
+    let mut modules = Vec::new();
 
-    // Discover all submodules by iterating over the folder contents.
-    let mut submodules = Vec::new();
     for entry in fs::read_dir(dst_dir.as_ref())? {
         let entry = entry?;
         let path = entry.path();
-        if path.is_file() {
-            let file_stem = path
-                .file_stem()
-                .and_then(|f| f.to_str())
-                .expect("Could not get file name")
-                .to_owned();
-
-            submodules.push(file_stem);
+
+        let module = if path.is_file() {
+            path.file_stem().and_then(|f| f.to_str()).expect("Could not get file name")
+        } else if path.is_dir() {
+            path.file_name().and_then(|f| f.to_str()).expect("Could not get directory name")
+        } else {
+            continue;
+        };
+
+        modules.push(format!("pub mod {module};"));
+    }
+
+    modules.sort();
+    fs::write(dst_dir.as_ref().join("mod.rs"), modules.join("\n"))
+}
+
+/// Generate server facade modules (one per service) from the provided descriptor sets.
+fn generate_server_modules(
+    descriptor_sets: &[FileDescriptorSet],
+    dst_dir: &Path,
+) -> miette::Result<()> {
+    let mut generated: HashSet<(String, String)> = HashSet::new();
+
+    for fds in descriptor_sets {
+        for file in &fds.file {
+            let package = file.package.as_deref().unwrap_or_default();
+            let package = package.replace('.', "_");
+
+            for service in &file.service {
+                let service_name = service.name.as_deref().unwrap_or("Service");
+                let key = (package.clone(), service_name.to_string());
+                if !generated.insert(key) {
+                    continue;
+                }
+
+                let service_name = to_snake_case(service_name);
+                let module_name = format!("{}_{}", &package, service_name);
+
+                let contents =
+                    Service::from_descriptor(service, &package)?.generate().scope().to_string();
+
+                let path = dst_dir.join(format!("{module_name}.rs"));
+                fs::write(path, contents).into_diagnostic().wrap_err("writing server module")?;
+            }
+        }
+    }
+
+    Ok(())
+}
+
+struct Service {
+    name: String,
+    package: String,
+    unary_methods: Vec<UnaryMethod>,
+    server_streams: Vec<ServerStream>,
+}
+
+struct UnaryMethod {
+    name: String,
+    request: String,
+    response: String,
+}
+
+struct ServerStream {
+    name: String,
+    request: String,
+    response: String,
+}
+
+impl Service {
+    fn from_descriptor(descriptor: &ServiceDescriptorProto, package: &str) -> miette::Result<Self> {
+        let name = descriptor.name().to_string();
+        let unary_methods = descriptor
+            .method
+            .iter()
+            .filter(|method| !method.client_streaming() && !method.server_streaming())
+            .map(UnaryMethod::from_descriptor)
+            .collect();
+        let server_streams = descriptor
+            .method
+            .iter()
+            .filter(|method| method.server_streaming())
+            .map(ServerStream::from_descriptor)
+            .collect();
+        let package = package.to_string();
+
+        // We don't have any client streams, so no need to support them.
+        miette::ensure!(
+            !descriptor.method.iter().any(MethodDescriptorProto::client_streaming),
+            "client streams are not supported"
+        );
+
+        Ok(Self {
+            name,
+            package,
+            unary_methods,
+            server_streams,
+        })
+    }
+
+    /// Generates a module containing the service's interface and implementation, including the
+    /// methods.
+    fn generate(&self) -> Module {
+        let mut module = Module::new(&self.name);
+
+        module.push_trait(self.service_trait());
+        module.push_impl(self.blanket_impl());
+        module.push_impl(self.tonic_impl());
+
+        for method in &self.unary_methods {
+            module.push_trait(method.as_trait());
+        }
+
+        for stream in &self.server_streams {
+            module.push_trait(stream.as_trait());
+        }
+
+        module
+    }
+
+    /// The trait describing the service's interface.
+    ///
+    /// This is a super trait consisting of all the gRPC method traits for this service.
+    ///
+    /// ```rust
+    /// trait <Name>Service:
+    ///     method[0]::trait() +
+    ///     method[1]::trait() +
+    ///     ...
+    ///     method[N]::trait(),
+    /// {}
+    /// ```
+    fn service_trait(&self) -> Trait {
+        let mut ret = Trait::new(format!("{}Service", &self.name));
+        ret.vis("pub");
+
+        for method in &self.unary_methods {
+            ret.parent(method.as_trait().ty());
+        }
+
+        for stream in &self.server_streams {
+            ret.parent(stream.as_trait().ty());
+        }
+
+        ret
+    }
+
+    /// The blanket implementation of the service's trait, for all `T` that implement all
+    /// required gRPC methods.
+    ///
+    /// ```rust
+    /// impl <Name>Service for T
+    /// where T:
+    ///     method[0]::trait() +
+    ///     method[1]::trait() +
+    ///     ...
+    ///     method[N]::trait(),
+    /// {}
+    /// ```
+    fn blanket_impl(&self) -> Impl {
+        let mut ret = Impl::new("T");
+        ret.generic("T").impl_trait(self.service_trait().ty());
+
+        for method in &self.unary_methods {
+            ret.bound("T", method.as_trait().ty());
+        }
+
+        for stream in &self.server_streams {
+            ret.bound("T", stream.as_trait().ty());
+        }
+
+        ret
+    }
+
+    /// Blanket implementation for all T that implement our service trait, for the tonic generated
+    /// trait.
+    ///
+    /// ```rust
+    /// #[tonic::async_trait]
+    /// impl tonic::generated::service_trait for T
+    /// where T: <Name>Service + Send + Sync + 'static {
+    ///
+    ///     async fn tonic_method[0](request) -> response {
+    ///         <Self as method[0]>::full(self, request.into_inner()).await.map(tonic::Response::new)
+    ///     }
+    ///
+    ///     ...
+    /// }
+    /// ```
+    fn tonic_impl(&self) -> Impl {
+        let tonic_path = format!(
+            "crate::generated::{}::{}_server::{}",
+            self.package,
+            to_snake_case(&self.name),
+            self.name
+        );
+
+        let mut ret = Impl::new("T");
+        ret.generic("T")
+            .bound("T", self.service_trait().ty())
+            .bound("T", "Send")
+            .bound("T", "Sync")
+            .bound("T", "'static")
+            .impl_trait(tonic_path)
+            .r#macro("#[tonic::async_trait]");
+
+        for method in &self.unary_methods {
+            ret.push_fn(method.tonic_impl());
+        }
+
+        for stream in &self.server_streams {
+            ret.push_fn(stream.tonic_impl());
+            ret.associate_type(stream.associated_type().0, stream.associated_type().1);
+        }
+
+        ret
+    }
+}
+
+impl UnaryMethod {
+    fn from_descriptor(descriptor: &MethodDescriptorProto) -> Self {
+        let name = descriptor.name().to_string();
+
+        let request = grpc_path_to_generated(descriptor.input_type());
+        let response = grpc_path_to_generated(descriptor.output_type());
+
+        Self { name, request, response }
+    }
+
+    /// Function invoking the method handler and mapping from/to tonic's request/response.
+    ///
+    /// ```rust
+    /// async fn <method_name>(
+    ///     request: tonic::Request<Request>,
+    /// ) -> tonic::Result<tonic::Response<Response>> {
+    ///     <Self as Method>::full(self, request.into_inner()).await.map(tonic::Response::new)
+    /// }
+    /// ```
+    fn tonic_impl(&self) -> Function {
+        let mut ret = Function::new(to_snake_case(&self.name));
+        ret.set_async(true)
+            .arg_ref_self()
+            .arg("request", format!("tonic::Request<{}>", self.request))
+            .ret(format!("tonic::Result<tonic::Response<{}>>", self.response))
+            .line("#[allow(clippy::unit_arg)]")
+            .line(format!(
+                "<Self as {}>::full(self, request.into_inner()).await.map(tonic::Response::new)",
+                self.name
+            ));
+
+        ret
+    }
+
+    /// This method's trait definition.
+    ///
+    /// ```rust
+    /// trait <Method> {
+    ///     type Input;
+    ///     type Output;
+    ///
+    ///     fn decode(request: <Request>) -> tonic::Result<Self::Input>;
+    ///     fn encode(output: Self::Output) -> tonic::Result<Response>;
+    ///     async fn handle(&self, input: Self::Input) -> tonic::Result<Self::Output>;
+    ///
+    ///     async fn full(
+    ///         &self,
+    ///         request: <Request>,
+    ///     ) -> tonic::Result<Response> {
+    ///         let input = Self::decode(request)?;
+    ///         let output = self.handle(input).await?;
+    ///         Self::encode(output)
+    ///     }
+    /// }
+    /// ```
+    fn as_trait(&self) -> Trait {
+        let mut ret = Trait::new(&self.name);
+        ret.vis("pub");
+        ret.attr("tonic::async_trait");
+        ret.associated_type("Input");
+        ret.associated_type("Output");
+
+        ret.new_fn("decode")
+            .arg("request", &self.request)
+            .ret("tonic::Result<Self::Input>");
+
+        ret.new_fn("encode")
+            .arg("output", "Self::Output")
+            .ret(format!("tonic::Result<{}>", &self.response));
+
+        ret.new_fn("handle")
+            .set_async(true)
+            .arg_ref_self()
+            .arg("input", "Self::Input")
+            .ret("tonic::Result<Self::Output>");
+
+        ret.new_fn("full")
+            .set_async(true)
+            .arg_ref_self()
+            .arg("request", &self.request)
+            .ret(format!("tonic::Result<{}>", &self.response))
+            .line("let input = Self::decode(request)?;")
+            .line("let output = self.handle(input).await?;")
+            .line("Self::encode(output)");
+
+        ret
+    }
+}
+
+impl ServerStream {
+    fn from_descriptor(descriptor: &MethodDescriptorProto) -> Self {
+        let name = descriptor.name().to_string();
-
-    submodules.sort();
+        let request = grpc_path_to_generated(descriptor.input_type());
+        let response = grpc_path_to_generated(descriptor.output_type());
+
+        Self { name, request, response }
+    }
-
-    let modules = submodules.iter().map(|f| format!("pub mod {f};\n"));
-    let contents = modules.into_iter().collect::<String>();
+    /// This stream's per-method trait definition.
+    ///
+    /// ```rust
+    /// trait <Method> {
+    ///     type Input;
+    ///     type Item;
+    ///     type ItemStream: Stream<Item = tonic::Result<Self::Item>> + Send + 'static;
+    ///
+    ///     fn decode(request: <Request>) -> tonic::Result<Self::Input>;
+    ///     fn encode(item: Self::Item) -> tonic::Result<Response>;
+    ///     async fn handle(&self, input: Self::Input) -> tonic::Result<Self::ItemStream>;
+    ///
+    ///     async fn full(&self, request: <Request>) -> tonic::Result<Pin<Box<dyn Stream<Item = tonic::Result<Response>>>>> {
+    ///         use tokio_stream::StreamExt as _;
+    ///         let input = Self::decode(request)?;
+    ///         let stream = self.handle(input).await?;
+    ///         Ok(Box::pin(stream.map(|item| item.and_then(|i| Self::encode(i)))))
+    ///     }
+    /// }
+    /// ```
+    fn as_trait(&self) -> Trait {
+        let stream_bound =
+            "tonic::codegen::tokio_stream::Stream<Item = tonic::Result<Self::Item>>".to_string();
+        let boxed_stream = format!(
+            "std::pin::Pin<Box<dyn tonic::codegen::tokio_stream::Stream<Item = tonic::Result<{}>> + Send + 'static>>",
+            self.response
+        );
+
+        let mut ret = Trait::new(&self.name);
+        ret.vis("pub");
+        ret.attr("tonic::async_trait");
+        ret.associated_type("Input");
+        ret.associated_type("Item");
+        ret.associated_type("ItemStream")
+            .bound(&stream_bound)
+            .bound("Send")
+            .bound("'static");
+
+        ret.new_fn("decode")
+            .arg("request", &self.request)
+            .ret("tonic::Result<Self::Input>");
+
+        ret.new_fn("encode")
+            .arg("item", "Self::Item")
+            .ret(format!("tonic::Result<{}>", &self.response));
+
+        ret.new_fn("handle")
+            .set_async(true)
+            .arg_ref_self()
+            .arg("input", "Self::Input")
+            .ret("tonic::Result<Self::ItemStream>");
+
+        ret.new_fn("full")
+            .set_async(true)
+            .arg_ref_self()
+            .arg("request", &self.request)
+            .ret(format!("tonic::Result<{boxed_stream}>"))
+            .line("use tonic::codegen::tokio_stream::StreamExt as _;")
+            .line("let input = Self::decode(request)?;")
+            .line("let stream = self.handle(input).await?;")
+            .line("Ok(Box::pin(stream.map(|item| item.and_then(|i| Self::encode(i)))))");
+
+        ret
+    }
+
+    fn tonic_impl(&self) -> Function {
+        let mut ret = Function::new(to_snake_case(&self.name));
+        ret.set_async(true)
+            .arg_ref_self()
+            .arg("request", format!("tonic::Request<{}>", self.request))
+            .ret(format!("tonic::Result<tonic::Response<Self::{}>>", self.associated_type().0))
+
.line("#[allow(clippy::unit_arg)]")
+            .line(format!(
+                "<Self as {}>::full(self, request.into_inner()).await.map(tonic::Response::new)",
+                self.name
+            ));
+
+        ret
+    }
+
+    fn associated_type(&self) -> (String, Type) {
+        (
+            format!("{}Stream", self.name),
+            Type::new(format!(
+                "std::pin::Pin<Box<dyn tonic::codegen::tokio_stream::Stream<Item = tonic::Result<{}>> + Send + 'static>>",
+                self.response
+            )),
+        )
+    }
+}
+
+/// Converts a string to `snake_case`.
+fn to_snake_case(s: &str) -> String {
+    let mut ret = String::new();
+
+    for c in s.chars() {
+        if c.is_uppercase() {
+            if !ret.is_empty() {
+                ret.push('_');
+            }
+        }
+        ret.push(c.to_ascii_lowercase());
+    }
+
+    ret
+}
+
+/// Translates a gRPC protobuf path to the corresponding generated Rust path. This is used to
+/// translate the protobuf type definitions to their tonic generated Rust types.
+///
+/// i.e. `.x.y.z` -> `crate::generated::x::y::z`
+///
+/// It also handles the case where the path is `.google.protobuf.Empty` by returning `()`.
+fn grpc_path_to_generated(path: &str) -> String {
+    if path == ".google.protobuf.Empty" {
+        return "()".to_string();
+    }
-
-    fs::write(mod_filepath, contents)
+    let path = path.trim_start_matches('.').replace('.', "::");
+    format!("crate::generated::{path}")
 }
diff --git a/crates/proto/src/domain/block.rs b/crates/proto/src/domain/block.rs
index 7f2646db0f..19a4bf8bf1 100644
--- a/crates/proto/src/domain/block.rs
+++ b/crates/proto/src/domain/block.rs
@@ -332,6 +332,42 @@ impl From<&FeeParameters> for proto::blockchain::FeeParameters {
     }
 }
 
+// SYNC TARGET
+// ================================================================================================
+
+/// The target block to sync up to in a chain MMR sync request.
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub enum SyncTarget {
+    /// Sync up to a specific block number (inclusive).
+    BlockNumber(BlockNumber),
+    /// Sync up to the latest committed block (chain tip).
+    CommittedChainTip,
+    /// Sync up to the latest proven block.
+    ProvenChainTip,
+}
+
+impl TryFrom<proto::rpc::sync_chain_mmr_request::UpperBound> for SyncTarget {
+    type Error = ConversionError;
+
+    fn try_from(
+        value: proto::rpc::sync_chain_mmr_request::UpperBound,
+    ) -> Result<Self, Self::Error> {
+        use proto::rpc::sync_chain_mmr_request::UpperBound;
+
+        match value {
+            UpperBound::BlockNum(block_num) => Ok(Self::BlockNumber(block_num.into())),
+            UpperBound::ChainTip(tip) => match proto::rpc::ChainTip::try_from(tip) {
+                Ok(proto::rpc::ChainTip::Committed) => Ok(Self::CommittedChainTip),
+                Ok(proto::rpc::ChainTip::Proven) => Ok(Self::ProvenChainTip),
+                // These variants should never be encountered.
+                Ok(proto::rpc::ChainTip::Unspecified) | Err(_) => {
+                    Err(ConversionError::message("unexpected chain tip"))
+                },
+            },
+        }
+    }
+}
+
 // BLOCK RANGE
 // ================================================================================================
 
diff --git a/crates/rpc/src/server/api.rs b/crates/rpc/src/server/api.rs
index 56386570f8..699c813f75 100644
--- a/crates/rpc/src/server/api.rs
+++ b/crates/rpc/src/server/api.rs
@@ -234,7 +234,7 @@ impl api_server::Api for RpcService {
 
     async fn get_block_by_number(
         &self,
-        request: Request,
+        request: Request,
     ) -> Result, Status> {
         let request = request.into_inner();
 
@@ -586,23 +586,19 @@ impl api_server::Api for RpcService {
 
     // -- Note debugging endpoints ----------------------------------------------------------------
 
-    async fn get_note_error(
+    async fn get_network_note_status(
         &self,
         request: Request,
-    ) -> Result<Response<proto::rpc::GetNoteErrorResponse>, Status> {
+    ) -> Result<Response<proto::rpc::GetNetworkNoteStatusResponse>, Status> {
         debug!(target: COMPONENT, request = ?request.get_ref());
 
         let Some(ntx_builder) = &self.ntx_builder else {
            return Err(Status::unavailable("Network transaction builder is not enabled"));
        };
 
-        let response = ntx_builder.clone().get_note_error(request).await?.into_inner();
+        let response = ntx_builder.clone().get_network_note_status(request).await?.into_inner();
 
-        Ok(Response::new(proto::rpc::GetNoteErrorResponse {
-            error: response.error,
-            attempt_count: response.attempt_count,
-
last_attempt_block_num: response.last_attempt_block_num, - })) + Ok(Response::new(response)) } } diff --git a/crates/rpc/src/tests.rs b/crates/rpc/src/tests.rs index 794d9b6042..da19ab6d0b 100644 --- a/crates/rpc/src/tests.rs +++ b/crates/rpc/src/tests.rs @@ -607,8 +607,10 @@ async fn sync_chain_mmr_returns_delta() { let (store_runtime, _data_directory, _genesis, _store_addr) = start_store(store_listener).await; let request = proto::rpc::SyncChainMmrRequest { - block_range: Some(proto::rpc::BlockRange { block_from: 0, block_to: None }), - finality: proto::rpc::Finality::Committed.into(), + block_from: 0, + upper_bound: Some(proto::rpc::sync_chain_mmr_request::UpperBound::ChainTip( + proto::rpc::ChainTip::Committed.into(), + )), }; let response = rpc_client.sync_chain_mmr(request).await.expect("sync_chain_mmr should succeed"); let response = response.into_inner(); diff --git a/crates/store/src/blocks.rs b/crates/store/src/blocks.rs index ad34ff0dd3..749ef02892 100644 --- a/crates/store/src/blocks.rs +++ b/crates/store/src/blocks.rs @@ -1,3 +1,12 @@ +//! File-based storage for raw block data and block proofs. +//! +//! Block data is stored under `{store_dir}/{epoch:04x}/block_{block_num:08x}.dat`, and proof data +//! for proven blocks is stored under `{store_dir}/{epoch:04x}/proof_{block_num:08x}.dat`. +//! +//! The epoch is derived from the 16 most significant bits of the block number (i.e., +//! `block_num >> 16`), and both the epoch and block number are formatted as zero-padded +//! hexadecimal strings. 
+
 use std::io::ErrorKind;
 use std::ops::Not;
 use std::path::PathBuf;
@@ -102,9 +111,9 @@ impl BlockStore {
     #[instrument(
         target = COMPONENT,
         name = "store.block_store.save_proof",
-        skip(self, data),
+        skip_all,
         err,
-        fields(proof_size = data.len())
+        fields(block.number = block_num.as_u32(), proof_size = data.len())
     )]
     pub async fn save_proof(&self, block_num: BlockNumber, data: &[u8]) -> std::io::Result<()> {
         let (epoch_path, proof_path) = self.epoch_proof_path(block_num)?;
@@ -115,6 +124,14 @@ impl BlockStore {
         tokio::fs::write(proof_path, data).await
     }
 
+    pub async fn load_proof(&self, block_num: BlockNumber) -> std::io::Result<Option<Vec<u8>>> {
+        match tokio::fs::read(self.proof_path(block_num)).await {
+            Ok(data) => Ok(Some(data)),
+            Err(err) if err.kind() == std::io::ErrorKind::NotFound => Ok(None),
+            Err(err) => Err(err),
+        }
+    }
+
     // HELPER FUNCTIONS
     // --------------------------------------------------------------------------------------------
 
diff --git a/crates/store/src/db/mod.rs b/crates/store/src/db/mod.rs
index 86581a9d96..b4c6c8ce71 100644
--- a/crates/store/src/db/mod.rs
+++ b/crates/store/src/db/mod.rs
@@ -271,9 +271,8 @@ impl Db {
         // Run migrations.
         apply_migrations(&mut conn).context("failed to apply database migrations")?;
 
-        // Insert genesis block data. Deconstruct into signed block.
-        let (header, body, signature, _proof) = genesis.into_inner().into_parts();
-        let genesis_block = SignedBlock::new_unchecked(header, body, signature);
+        // Insert genesis block data.
+        let genesis_block = genesis.into_inner();
         conn.transaction(move |conn| models::queries::apply_block(conn, &genesis_block, &[], None))
             .context("failed to insert genesis block")?;
         Ok(())
@@ -578,12 +577,15 @@ impl Db {
     ///
     /// Atomically clears `proving_inputs` for the given block, then walks forward from the
     /// current proven-in-sequence tip through consecutive proven blocks, marking each as
-    /// proven-in-sequence. Returns the block numbers that were newly marked in-sequence.
+    /// proven-in-sequence.
+    ///
+    /// Returns the new tip of blocks that are proven in-sequence (which may have been unchanged by
+    /// this function).
     #[instrument(target = COMPONENT, skip_all, err)]
     pub async fn mark_proven_and_advance_sequence(
         &self,
         block_num: BlockNumber,
-    ) -> Result<Vec<BlockNumber>> {
+    ) -> Result<BlockNumber> {
         self.transact("mark block proven", move |conn| {
             mark_proven_and_advance_sequence(conn, block_num)
         })
@@ -619,7 +621,7 @@ impl Db {
     ///
     /// This includes the genesis block, which is not technically proven, but treated as such.
     #[instrument(level = "debug", target = COMPONENT, skip_all, ret(level = "debug"), err)]
-    pub async fn select_latest_proven_in_sequence_block_num(&self) -> Result<BlockNumber> {
+    pub async fn proven_chain_tip(&self) -> Result<BlockNumber> {
         self.transact("select latest proven block num", |conn| {
             models::queries::select_latest_proven_in_sequence_block_num(conn)
         })
@@ -843,41 +845,46 @@
 /// 3. Walks forward from the current proven-in-sequence tip through consecutive proven blocks and
 ///    sets `proven_in_sequence = TRUE` for each.
 ///
+/// Returns the new tip of blocks that are proven in-sequence (which may have been unchanged by this
+/// function).
+///
 /// Returns [`DatabaseError::DataCorrupted`] if any proven-but-not-in-sequence block is found at
 /// or below the current tip, as that indicates a consistency bug.
 pub(crate) fn mark_proven_and_advance_sequence(
     conn: &mut SqliteConnection,
     block_num: BlockNumber,
-) -> Result<Vec<BlockNumber>, DatabaseError> {
+) -> Result<BlockNumber, DatabaseError> {
     // Clear proving_inputs for the specified block.
     models::queries::clear_block_proving_inputs(conn, block_num)?;
 
     // Get the current proven-in-sequence tip (highest in-sequence).
-    let mut tip = models::queries::select_latest_proven_in_sequence_block_num(conn)?;
+    let current_tip = models::queries::select_latest_proven_in_sequence_block_num(conn)?;
+    let mut new_tip = current_tip;
 
     // Get all blocks that are proven but not yet marked in-sequence.
let unsequenced = models::queries::select_proven_not_in_sequence_blocks(conn)?; // Walk forward from the tip through consecutive proven blocks. - let mut newly_in_sequence = Vec::new(); for candidate in unsequenced { - if candidate <= tip { + if candidate <= current_tip { return Err(DatabaseError::DataCorrupted(format!( - "block {candidate} is proven but not marked in-sequence while the tip is at {tip}" + "block {candidate} is proven but not marked in-sequence while the tip is at {current_tip}" ))); } - if candidate == tip + 1 { - tip = candidate; - newly_in_sequence.push(candidate); + if candidate == new_tip.child() { + // Walk the tip forward. + new_tip = candidate; } else { + // Sequence has been broken. Discontinue walking tip forward. break; } } // Mark the newly contiguous blocks as proven-in-sequence. - if let (Some(&from), Some(&to)) = (newly_in_sequence.first(), newly_in_sequence.last()) { - models::queries::mark_blocks_as_proven_in_sequence(conn, from, to)?; + if new_tip > current_tip { + let block_from = current_tip.child(); + models::queries::mark_blocks_as_proven_in_sequence(conn, block_from, new_tip)?; } - Ok(newly_in_sequence) + Ok(new_tip) } diff --git a/crates/store/src/db/tests.rs b/crates/store/src/db/tests.rs index 738889cddd..0375fe1dd6 100644 --- a/crates/store/src/db/tests.rs +++ b/crates/store/src/db/tests.rs @@ -3949,9 +3949,9 @@ fn mark_block_proven_advances_in_sequence_for_consecutive_blocks() { // Mark all three as proven in order. Each call atomically advances the in-sequence tip. for i in 1u32..=3 { - let advanced = + let new_tip = super::mark_proven_and_advance_sequence(&mut conn, BlockNumber::from(i)).unwrap(); - assert_eq!(advanced, vec![BlockNumber::from(i)]); + assert_eq!(new_tip, BlockNumber::from(i)); } let latest = queries::select_latest_proven_in_sequence_block_num(&mut conn).unwrap(); @@ -3969,17 +3969,17 @@ fn mark_block_proven_with_hole_does_not_advance_past_gap() { } // Prove block 1 — advances tip to 1. 
- let advanced = + let new_tip = super::mark_proven_and_advance_sequence(&mut conn, BlockNumber::from(1u32)).unwrap(); - assert_eq!(advanced, vec![BlockNumber::from(1u32)]); + assert_eq!(new_tip, BlockNumber::from(1u32)); // Prove blocks 3, 4 (skipping 2) — cannot advance past the gap. - let advanced = + let new_tip = super::mark_proven_and_advance_sequence(&mut conn, BlockNumber::from(3u32)).unwrap(); - assert!(advanced.is_empty()); - let advanced = + assert_eq!(new_tip, BlockNumber::from(1u32)); + let new_tip = super::mark_proven_and_advance_sequence(&mut conn, BlockNumber::from(4u32)).unwrap(); - assert!(advanced.is_empty()); + assert_eq!(new_tip, BlockNumber::from(1u32)); // Latest proven in sequence should be 1 (blocks 3, 4 are proven but not in sequence). let latest = queries::select_latest_proven_in_sequence_block_num(&mut conn).unwrap(); @@ -3997,28 +3997,25 @@ fn mark_block_proven_filling_hole_advances_through_all_consecutive() { } // Prove blocks out of order: 1, 3, 4 first. - let advanced = + let new_tip = super::mark_proven_and_advance_sequence(&mut conn, BlockNumber::from(1u32)).unwrap(); - assert_eq!(advanced, vec![BlockNumber::from(1u32)]); - let advanced = + assert_eq!(new_tip, BlockNumber::from(1u32)); + let new_tip = super::mark_proven_and_advance_sequence(&mut conn, BlockNumber::from(3u32)).unwrap(); - assert!(advanced.is_empty()); - let advanced = + assert_eq!(new_tip, BlockNumber::from(1u32)); + let new_tip = super::mark_proven_and_advance_sequence(&mut conn, BlockNumber::from(4u32)).unwrap(); - assert!(advanced.is_empty()); + assert_eq!(new_tip, BlockNumber::from(1u32)); assert_eq!( queries::select_latest_proven_in_sequence_block_num(&mut conn).unwrap(), BlockNumber::from(1u32), ); - // Now prove block 2, filling the hole. Should advance through 2, 3, 4. - let advanced = + // Now prove block 2, filling the hole. Should advance tip through to 4. 
+ let new_tip = super::mark_proven_and_advance_sequence(&mut conn, BlockNumber::from(2u32)).unwrap(); - assert_eq!( - advanced, - vec![BlockNumber::from(2u32), BlockNumber::from(3u32), BlockNumber::from(4u32)], - ); + assert_eq!(new_tip, BlockNumber::from(4u32)); // Now all blocks through 4 are proven in sequence. let latest = queries::select_latest_proven_in_sequence_block_num(&mut conn).unwrap(); @@ -4068,9 +4065,9 @@ fn mark_block_proven_is_idempotent_for_in_sequence() { create_unproven_block(&mut conn, BlockNumber::from(1u32)); // First call marks block 1 proven and advances it in-sequence. - let advanced = + let new_tip = super::mark_proven_and_advance_sequence(&mut conn, BlockNumber::from(1u32)).unwrap(); - assert_eq!(advanced, vec![BlockNumber::from(1u32)]); + assert_eq!(new_tip, BlockNumber::from(1u32)); let latest = queries::select_latest_proven_in_sequence_block_num(&mut conn).unwrap(); assert_eq!(latest, BlockNumber::from(1u32)); diff --git a/crates/store/src/genesis/mod.rs b/crates/store/src/genesis/mod.rs index 6c4624fb00..b875aa5f31 100644 --- a/crates/store/src/genesis/mod.rs +++ b/crates/store/src/genesis/mod.rs @@ -9,9 +9,8 @@ use miden_protocol::block::{ BlockHeader, BlockNoteTree, BlockNumber, - BlockProof, FeeParameters, - ProvenBlock, + SignedBlock, }; use miden_protocol::crypto::merkle::mmr::{Forest, MmrPeaks}; use miden_protocol::crypto::merkle::smt::{LargeSmt, MemoryStorage, Smt}; @@ -35,23 +34,23 @@ pub struct GenesisState { } /// A type-safety wrapper ensuring that genesis block data can only be created from -/// [`GenesisState`] or validated from a [`ProvenBlock`] via [`GenesisBlock::try_from`]. -pub struct GenesisBlock(ProvenBlock); +/// [`GenesisState`] or validated from a [`SignedBlock`] via [`GenesisBlock::try_from`]. 
+pub struct GenesisBlock(SignedBlock);

 impl GenesisBlock {
-    pub fn inner(&self) -> &ProvenBlock {
+    pub fn inner(&self) -> &SignedBlock {
         &self.0
     }

-    pub fn into_inner(self) -> ProvenBlock {
+    pub fn into_inner(self) -> SignedBlock {
         self.0
     }
 }

-impl TryFrom<ProvenBlock> for GenesisBlock {
+impl TryFrom<SignedBlock> for GenesisBlock {
     type Error = anyhow::Error;

-    fn try_from(block: ProvenBlock) -> anyhow::Result<Self> {
+    fn try_from(block: SignedBlock) -> anyhow::Result<Self> {
         anyhow::ensure!(
             block.header().block_num() == BlockNumber::GENESIS,
             "expected genesis block number (0), got {}",
@@ -152,15 +151,11 @@ impl GenesisState {
             empty_transactions,
         );

-        let block_proof = BlockProof::new_dummy();
-
         // Sign and assert verification for sanity (no mismatch between frontend and backend signing
         // impls).
         let signature = self.block_signer.sign(&header).await?;
         assert!(signature.verify(header.commitment(), &self.block_signer.public_key()));

-        // SAFETY: Header and accounts should be valid by construction.
-        // No notes or nullifiers are created at genesis, which is consistent with the above empty
-        // block note tree root and empty nullifier tree root.
-        Ok(GenesisBlock(ProvenBlock::new_unchecked(header, body, signature, block_proof)))
+        let signed_block = SignedBlock::new(header, body, signature)?;
+        Ok(GenesisBlock(signed_block))
     }
 }
diff --git a/crates/store/src/lib.rs b/crates/store/src/lib.rs
index a4134aa33c..68278375b4 100644
--- a/crates/store/src/lib.rs
+++ b/crates/store/src/lib.rs
@@ -4,6 +4,7 @@ mod blocks;
 mod db;
 mod errors;
 pub mod genesis;
+mod proven_tip;
 mod server;
 pub mod state;
diff --git a/crates/store/src/proven_tip.rs b/crates/store/src/proven_tip.rs
new file mode 100644
index 0000000000..4505828123
--- /dev/null
+++ b/crates/store/src/proven_tip.rs
@@ -0,0 +1,62 @@
+use std::sync::Arc;
+use std::sync::atomic::{AtomicU32, Ordering};
+
+use miden_protocol::block::BlockNumber;
+
+/// Single-owner handle that can advance the proven chain tip.
+///
+/// Not cloneable — only the proof scheduler should write.
+pub struct ProvenTipWriter(Arc<AtomicU32>);
+
+/// Cheaply cloneable handle for reading the current proven chain tip.
+#[derive(Clone)]
+pub struct ProvenTipReader(Arc<AtomicU32>);
+
+impl ProvenTipWriter {
+    /// Creates a new writer/reader pair initialized to `tip`.
+    pub fn new(tip: BlockNumber) -> (Self, ProvenTipReader) {
+        let inner = Arc::new(AtomicU32::new(tip.as_u32()));
+        (Self(Arc::clone(&inner)), ProvenTipReader(inner))
+    }
+
+    /// Advances the tip to `new_tip` if it is greater than the current value.
+    ///
+    /// This is a no-op when `new_tip` is less than or equal to the existing tip.
+    pub fn advance(&self, new_tip: BlockNumber) {
+        self.0.fetch_max(new_tip.as_u32(), Ordering::Release);
+    }
+}
+
+impl ProvenTipReader {
+    /// Returns the current proven chain tip.
+    pub fn read(&self) -> BlockNumber {
+        BlockNumber::from(self.0.load(Ordering::Acquire))
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    #[test]
+    fn advance_only_increases_tip() {
+        let (writer, reader) = ProvenTipWriter::new(BlockNumber::from(5u32));
+        assert_eq!(reader.read(), BlockNumber::from(5u32));
+
+        // Advancing to a higher value updates the tip.
+        writer.advance(BlockNumber::from(10u32));
+        assert_eq!(reader.read(), BlockNumber::from(10u32));
+
+        // Advancing to a lower value is a no-op.
+        writer.advance(BlockNumber::from(7u32));
+        assert_eq!(reader.read(), BlockNumber::from(10u32));
+
+        // Advancing to the same value is a no-op.
+        writer.advance(BlockNumber::from(10u32));
+        assert_eq!(reader.read(), BlockNumber::from(10u32));
+
+        // Advancing to a higher value again works.
+ writer.advance(BlockNumber::from(15u32)); + assert_eq!(reader.read(), BlockNumber::from(15u32)); + } +} diff --git a/crates/store/src/server/block_producer.rs b/crates/store/src/server/block_producer.rs index 16c2ee4886..0102a19285 100644 --- a/crates/store/src/server/block_producer.rs +++ b/crates/store/src/server/block_producer.rs @@ -26,6 +26,7 @@ use crate::server::api::{ validate_note_commitments, validate_nullifiers, }; +use crate::state::Finality; // BLOCK PRODUCER ENDPOINTS // ================================================================================================ @@ -207,7 +208,7 @@ impl block_producer_server::BlockProducer for StoreApi { .inspect_err(|err| tracing::Span::current().set_error(err)) .map_err(|err| tonic::Status::internal(err.as_report()))?; - let block_height = self.state.latest_block_num().await.as_u32(); + let block_height = self.state.chain_tip(Finality::Committed).await.as_u32(); Ok(Response::new(proto::store::TransactionInputs { account_state: Some(proto::store::transaction_inputs::AccountTransactionInputRecord { diff --git a/crates/store/src/server/mod.rs b/crates/store/src/server/mod.rs index 85292bb726..bcd8689bea 100644 --- a/crates/store/src/server/mod.rs +++ b/crates/store/src/server/mod.rs @@ -6,15 +6,12 @@ use std::time::Duration; use anyhow::Context; use miden_node_proto::generated::store; -use miden_node_proto_build::{ - store_block_producer_api_descriptor, - store_ntx_builder_api_descriptor, - store_rpc_api_descriptor, -}; +use miden_node_proto_build::store_api_descriptor; use miden_node_utils::clap::{GrpcOptionsInternal, StorageOptions}; use miden_node_utils::panic::{CatchPanicLayer, catch_panic_layer_fn}; use miden_node_utils::tracing::grpc::grpc_trace_fn; use tokio::net::TcpListener; +use tokio::sync::watch; use tokio::task::JoinSet; use tokio_stream::wrappers::TcpListenerStream; use tower_http::trace::TraceLayer; @@ -25,6 +22,7 @@ use crate::blocks::BlockStore; use crate::db::Db; use 
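At its core, the `ProvenTipWriter`/`ProvenTipReader` pair is a monotonic atomic counter. This std-only sketch shows the same `fetch_max` plus `Acquire`/`Release` pattern in isolation; `TipWriter`, `TipReader`, and `tip_channel` are hypothetical names with plain `u32` in place of `BlockNumber`:

```rust
use std::sync::Arc;
use std::sync::atomic::{AtomicU32, Ordering};

// Writer is deliberately not Clone: only one task should advance the tip.
struct TipWriter(Arc<AtomicU32>);

#[derive(Clone)]
struct TipReader(Arc<AtomicU32>);

fn tip_channel(initial: u32) -> (TipWriter, TipReader) {
    let inner = Arc::new(AtomicU32::new(initial));
    (TipWriter(Arc::clone(&inner)), TipReader(inner))
}

impl TipWriter {
    // fetch_max makes the update monotonic: lower values are silently ignored,
    // so out-of-order completions can never move the tip backwards.
    fn advance(&self, new_tip: u32) {
        self.0.fetch_max(new_tip, Ordering::Release);
    }
}

impl TipReader {
    fn read(&self) -> u32 {
        self.0.load(Ordering::Acquire)
    }
}

fn main() {
    let (w, r) = tip_channel(5);
    w.advance(10);
    w.advance(7); // no-op: below the current tip
    assert_eq!(r.read(), 10);
}
```

The `Release`/`Acquire` pairing ensures a reader that observes the new tip value also observes all writes the scheduler made before advancing it.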
crate::errors::ApplyBlockError; use crate::genesis::GenesisBlock; +use crate::proven_tip::ProvenTipWriter; use crate::state::State; use crate::{BlockProver, COMPONENT}; @@ -92,35 +90,96 @@ impl Store { block_producer_endpoint=?block_producer_address, ?self.data_directory, ?self.grpc_options.request_timeout, "Loading database"); + // Load initial state. let (termination_ask, mut termination_signal) = tokio::sync::mpsc::channel::(1); - let state = Arc::new( + let (state, tx_proven_tip) = State::load(&self.data_directory, self.storage_options, termination_ask) .await - .context("failed to load state")?, - ); + .context("failed to load state")?; + + // Spawn proof scheduler. + let (proof_scheduler_task, chain_tip_sender) = Self::spawn_proof_scheduler( + &state, + self.block_prover_url, + self.max_concurrent_proofs, + tx_proven_tip, + ) + .await; - // Initialize local or remote block prover. - let block_prover = if let Some(url) = self.block_prover_url { + // Spawn gRPC Servers. + let mut join_set = Self::spawn_grpc_servers( + state, + chain_tip_sender, + self.grpc_options, + self.rpc_listener, + self.ntx_builder_listener, + self.block_producer_listener, + )?; + + // Wait on any workload to finish / error out. + let service = async move { + join_set.join_next().await.expect("joinset is not empty")?.map_err(Into::into) + }; + tokio::select! { + result = service => result, + Some(err) = termination_signal.recv() => { + Err(anyhow::anyhow!("received termination signal").context(err)) + }, + result = proof_scheduler_task => { + match result { + Ok(Ok(())) => Err(anyhow::anyhow!("proof scheduler exited unexpectedly")), + Ok(Err(err)) => Err(err.context("proof scheduler fatal error")), + Err(join_err) => Err(join_err).context("proof scheduler panicked"), + } + } + } + } + + /// Initializes the block prover client and spawns the proof scheduler as a background task. 
+ /// + /// Returns the scheduler task handle and the chain tip sender (needed by gRPC services to + /// notify the scheduler of new blocks). + async fn spawn_proof_scheduler( + state: &State, + block_prover_url: Option, + max_concurrent_proofs: NonZeroUsize, + proven_tip: ProvenTipWriter, + ) -> ( + tokio::task::JoinHandle>, + watch::Sender, + ) { + let block_prover = if let Some(url) = block_prover_url { Arc::new(BlockProver::remote(url)) } else { Arc::new(BlockProver::local()) }; - // Initialize the chain tip watch channel. - let chain_tip = state.latest_block_num().await; - let (chain_tip_sender, chain_tip_rx) = tokio::sync::watch::channel(chain_tip); + let chain_tip = state.chain_tip(crate::state::Finality::Committed).await; + let (chain_tip_tx, chain_tip_rx) = watch::channel(chain_tip); - // Spawn the proof scheduler as a background task. It will immediately pick up any - // unproven blocks from previous runs and begin proving them. - let proof_scheduler_task = proof_scheduler::spawn( + let handle = proof_scheduler::spawn( state.db().clone(), block_prover, state.block_store(), chain_tip_rx, - self.max_concurrent_proofs, + proven_tip, + max_concurrent_proofs, ); + (handle, chain_tip_tx) + } + + /// Spawns the gRPC servers and the DB maintenance background task. 
+ fn spawn_grpc_servers( + state: State, + chain_tip_sender: watch::Sender, + grpc_options: GrpcOptionsInternal, + rpc_listener: TcpListener, + ntx_builder_listener: TcpListener, + block_producer_listener: TcpListener, + ) -> anyhow::Result>> { + let state = Arc::new(state); let rpc_service = store::rpc_server::RpcServer::new(api::StoreApi { state: Arc::clone(&state), chain_tip_sender: chain_tip_sender.clone(), @@ -135,9 +194,7 @@ impl Store { chain_tip_sender, }); let reflection_service = tonic_reflection::server::Builder::configure() - .register_file_descriptor_set(store_rpc_api_descriptor()) - .register_file_descriptor_set(store_ntx_builder_api_descriptor()) - .register_file_descriptor_set(store_block_producer_api_descriptor()) + .register_file_descriptor_set(store_api_descriptor()) .build_v1() .context("failed to build reflection service")?; @@ -150,61 +207,45 @@ impl Store { // // 5 minutes seems like a reasonable interval, where this should have minimal database // IO impact while providing a decent view into table growth over time. - let mut interval = tokio::time::interval(Duration::from_secs(5 * 60)); - let database = Arc::clone(&state); + let mut interval = tokio::time::interval(Duration::from_mins(5)); loop { interval.tick().await; - let _ = database.analyze_table_sizes().await; + let _ = state.analyze_table_sizes().await; } }); - // Build the gRPC server with the API services and trace layer. 
join_set.spawn( tonic::transport::Server::builder() - .timeout(self.grpc_options.request_timeout) + .timeout(grpc_options.request_timeout) .layer(CatchPanicLayer::custom(catch_panic_layer_fn)) .layer(TraceLayer::new_for_grpc().make_span_with(grpc_trace_fn)) .add_service(rpc_service) .add_service(reflection_service.clone()) - .serve_with_incoming(TcpListenerStream::new(self.rpc_listener)), + .serve_with_incoming(TcpListenerStream::new(rpc_listener)), ); join_set.spawn( tonic::transport::Server::builder() - .timeout(self.grpc_options.request_timeout) + .timeout(grpc_options.request_timeout) .layer(CatchPanicLayer::custom(catch_panic_layer_fn)) .layer(TraceLayer::new_for_grpc().make_span_with(grpc_trace_fn)) .add_service(ntx_builder_service) .add_service(reflection_service.clone()) - .serve_with_incoming(TcpListenerStream::new(self.ntx_builder_listener)), + .serve_with_incoming(TcpListenerStream::new(ntx_builder_listener)), ); join_set.spawn( tonic::transport::Server::builder() .accept_http1(true) - .timeout(self.grpc_options.request_timeout) + .timeout(grpc_options.request_timeout) .layer(CatchPanicLayer::custom(catch_panic_layer_fn)) .layer(TraceLayer::new_for_grpc().make_span_with(grpc_trace_fn)) .add_service(block_producer_service) .add_service(reflection_service) - .serve_with_incoming(TcpListenerStream::new(self.block_producer_listener)), + .serve_with_incoming(TcpListenerStream::new(block_producer_listener)), ); - // SAFETY: The joinset is definitely not empty. - let service = async move { join_set.join_next().await.unwrap()?.map_err(Into::into) }; - tokio::select! 
{ - result = service => result, - Some(err) = termination_signal.recv() => { - Err(anyhow::anyhow!("received termination signal").context(err)) - }, - result = proof_scheduler_task => { - match result { - Ok(Ok(())) => Err(anyhow::anyhow!("proof scheduler exited unexpectedly")), - Ok(Err(err)) => Err(err.context("proof scheduler fatal error")), - Err(join_err) => Err(join_err).context("proof scheduler panicked"), - } - } - } + Ok(join_set) } } diff --git a/crates/store/src/server/ntx_builder.rs b/crates/store/src/server/ntx_builder.rs index f7973f51fb..ff1b6eb332 100644 --- a/crates/store/src/server/ntx_builder.rs +++ b/crates/store/src/server/ntx_builder.rs @@ -31,6 +31,7 @@ use crate::server::api::{ read_block_range, read_root, }; +use crate::state::Finality; // NTX BUILDER ENDPOINTS // ================================================================================================ @@ -145,7 +146,7 @@ impl ntx_builder_server::NtxBuilder for StoreApi { ) -> Result, Status> { let request = request.into_inner(); - let mut chain_tip = self.state.latest_block_num().await; + let mut chain_tip = self.state.chain_tip(Finality::Committed).await; let block_range = read_block_range::(Some(request), "GetNetworkAccountIds")? .into_inclusive_range::(&chain_tip)?; @@ -159,7 +160,7 @@ impl ntx_builder_server::NtxBuilder for StoreApi { last_block_included = chain_tip; } - chain_tip = self.state.latest_block_num().await; + chain_tip = self.state.chain_tip(Finality::Committed).await; Ok(Response::new(proto::store::NetworkAccountIdList { account_ids, @@ -251,7 +252,7 @@ impl ntx_builder_server::NtxBuilder for StoreApi { let block_num = if let Some(num) = request.block_num { num.into() } else { - self.state.latest_block_num().await + self.state.chain_tip(Finality::Committed).await }; // Retrieve the asset witnesses. 
@@ -305,7 +306,7 @@ impl ntx_builder_server::NtxBuilder for StoreApi { let block_num = if let Some(num) = request.block_num { num.into() } else { - self.state.latest_block_num().await + self.state.chain_tip(Finality::Committed).await }; // Retrieve the storage map witness. @@ -322,7 +323,7 @@ impl ntx_builder_server::NtxBuilder for StoreApi { key: Some(map_key.into()), proof: Some(proof.into()), }), - block_num: self.state.latest_block_num().await.as_u32(), + block_num: self.state.chain_tip(Finality::Committed).await.as_u32(), })) } } diff --git a/crates/store/src/server/proof_scheduler.rs b/crates/store/src/server/proof_scheduler.rs index 38fae36700..785629895e 100644 --- a/crates/store/src/server/proof_scheduler.rs +++ b/crates/store/src/server/proof_scheduler.rs @@ -8,7 +8,7 @@ //! as proven, the database atomically advances the `proven_in_sequence` column for all blocks //! that now form a contiguous proven sequence from genesis. //! 4. On transient errors (DB reads, prover failures, timeouts), the failed block is retried -//! internally within its proving task. +//! internally within its proving task, subject to an overall per-block time budget. //! 5. On fatal errors (e.g. deserialization failures, missing proving inputs), the scheduler //! returns the error to the caller for node shutdown. @@ -23,19 +23,26 @@ use miden_remote_prover_client::RemoteProverClientError; use thiserror::Error; use tokio::sync::watch; use tokio::task::{JoinHandle, JoinSet}; -use tracing::{error, info, instrument}; +use tracing::{Instrument, info, instrument}; use crate::COMPONENT; use crate::blocks::BlockStore; use crate::db::Db; use crate::errors::{DatabaseError, ProofSchedulerError}; +use crate::proven_tip::ProvenTipWriter; use crate::server::block_prover_client::{BlockProver, StoreProverError}; // CONSTANTS // ================================================================================================ -/// Overall timeout for proving a single block. 
-const BLOCK_PROVE_TIMEOUT: Duration = Duration::from_mins(4);
+/// Timeout for a single block proof attempt (per-retry).
+const BLOCK_PROVE_ATTEMPT_TIMEOUT: Duration = Duration::from_mins(4);
+
+/// Overall timeout for proving a single block (across all retries).
+const BLOCK_PROVE_OVERALL_TIMEOUT: Duration = Duration::from_mins(12);
+
+/// Maximum number of proving attempts per block before giving up.
+const MAX_PROVE_ATTEMPTS: u32 = 3;

 /// Default maximum number of blocks being proven concurrently.
 pub const DEFAULT_MAX_CONCURRENT_PROOFS: NonZeroUsize = NonZeroUsize::new(8).unwrap();
@@ -59,13 +66,16 @@ impl ProofTaskJoinSet {
         db: &Arc<Db>,
         block_prover: &Arc<BlockProver>,
         block_store: &Arc<BlockStore>,
+        proven_tip: &Arc<ProvenTipWriter>,
         block_num: BlockNumber,
     ) {
         let db = Arc::clone(db);
         let block_prover = Arc::clone(block_prover);
         let block_store = Arc::clone(block_store);
-        self.0
-            .spawn(async move { prove_block(&db, &block_prover, &block_store, block_num).await });
+        let proven_tip = Arc::clone(proven_tip);
+        self.0.spawn(async move {
+            prove_block(&db, &block_prover, &block_store, &proven_tip, block_num).await
+        });
     }

     /// Returns the result of the next completed task, or pends forever if the set is empty.
@@ -98,9 +108,18 @@ pub fn spawn(
     block_prover: Arc<BlockProver>,
     block_store: Arc<BlockStore>,
     chain_tip_rx: watch::Receiver<BlockNumber>,
+    proven_tip: ProvenTipWriter,
     max_concurrent_proofs: NonZeroUsize,
 ) -> JoinHandle<anyhow::Result<()>> {
-    tokio::spawn(run(db, block_prover, block_store, chain_tip_rx, max_concurrent_proofs))
+    let proven_tip = Arc::new(proven_tip);
+    tokio::spawn(run(
+        db,
+        block_prover,
+        block_store,
+        chain_tip_rx,
+        proven_tip,
+        max_concurrent_proofs,
+    ))
 }

 /// Main loop of the proof scheduler.
@@ -117,6 +136,7 @@ async fn run( block_prover: Arc, block_store: Arc, mut chain_tip_rx: watch::Receiver, + proven_tip: Arc, max_concurrent_proofs: NonZeroUsize, ) -> anyhow::Result<()> { info!(target: COMPONENT, "Proof scheduler started"); @@ -127,7 +147,7 @@ async fn run( // Highest block number that is in-flight or has been proven. Used to avoid re-querying // blocks we've already scheduled. Initialized from the in-sequence tip so we skip // already-proven blocks on restart. - let mut highest_scheduled = db.select_latest_proven_in_sequence_block_num().await?; + let mut highest_scheduled = db.proven_chain_tip().await?; loop { // Query the DB for unproven blocks beyond what we've already scheduled. @@ -140,7 +160,7 @@ async fn run( } for block_num in unproven { - join_set.spawn(&db, &block_prover, &block_store, block_num); + join_set.spawn(&db, &block_prover, &block_store, &proven_tip, block_num); } } @@ -166,52 +186,71 @@ async fn run( /// Proves a single block, saves the proof to the block store, marks the block as proven in the /// DB, and advances the proven-in-sequence tip. -#[instrument(target = COMPONENT, name = "prove_block", skip_all, fields(block.number=block_num.as_u32()), err)] +#[instrument(target = COMPONENT, name = "prove_block", skip_all, + fields( + block.number=block_num.as_u32(), + proven_chain_tip = tracing::field::Empty + ), err)] async fn prove_block( db: &Db, block_prover: &BlockProver, block_store: &BlockStore, + proven_tip: &ProvenTipWriter, block_num: BlockNumber, ) -> anyhow::Result<()> { - const MAX_RETRIES: u32 = 10; + tokio::time::timeout(BLOCK_PROVE_OVERALL_TIMEOUT, async { + let mut attempt: u32 = 0; + loop { + // Create a span for each attempt. + attempt += 1; + let attempt_span = tracing::info_span!( + target: COMPONENT, + "prove_attempt", + attempt, + error = tracing::field::Empty, + timed_out = tracing::field::Empty, + ); + + // Generate block proof with timeout. 
+ let result = tokio::time::timeout( + BLOCK_PROVE_ATTEMPT_TIMEOUT, + generate_block_proof(db, block_prover, block_num), + ) + .instrument(attempt_span.clone()) + .await; + + match result { + Ok(Ok(proof)) => { + // Save the block proof to file. + block_store.save_proof(block_num, &proof.to_bytes()).await?; + + // Mark the block as proven and advance the sequence in the database. + let tip = db.mark_proven_and_advance_sequence(block_num).await?; + tracing::Span::current().record("proven_chain_tip", tip.as_u32()); + + // Advance the cached proven tip if the new tip is higher. + proven_tip.advance(tip); - for _ in 0..MAX_RETRIES { - match tokio::time::timeout( - BLOCK_PROVE_TIMEOUT, - generate_block_proof(db, block_prover, block_num), - ) - .await - { - Ok(Ok(proof)) => { - // Save the block proof to file. - block_store.save_proof(block_num, &proof.to_bytes()).await?; - - // Mark the block as proven and advance the sequence in the database. - let advanced_in_sequence = db.mark_proven_and_advance_sequence(block_num).await?; - if let Some(&last) = advanced_in_sequence.last() { - info!( - target = COMPONENT, - block.number = %block_num, - proven_in_sequence_tip = %last, - "Block proven and in-sequence advanced", - ); - } else { - info!(target = COMPONENT, block.number = %block_num, "Block proven"); - } + return Ok(()); + }, + Ok(Err(ProveBlockError::Fatal(err))) => Err(err).context("fatal error")?, + Ok(Err(ProveBlockError::Transient(err))) => { + attempt_span.record("error", tracing::field::display(&err)); + }, + Err(elapsed) => { + attempt_span.record("timed_out", elapsed.to_string()); + }, + } - return Ok(()); - }, - Ok(Err(ProveBlockError::Fatal(err))) => Err(err).context("fatal error")?, - Ok(Err(ProveBlockError::Transient(err))) => { - error!(target = COMPONENT, block.number = %block_num, err = ?err, "transient error proving block, retrying"); - }, - Err(elapsed) => { - error!(target = COMPONENT, block.number = %block_num, %elapsed, "block proving timed out, 
retrying"); - }, + if attempt >= MAX_PROVE_ATTEMPTS { + anyhow::bail!("block {} failed after {attempt} attempts", block_num.as_u32()); + } } - } - - anyhow::bail!("maximum retries ({MAX_RETRIES}) exceeded"); + }) + .await + .context(format!( + "block proving overall timeout ({BLOCK_PROVE_OVERALL_TIMEOUT:?}) exceeded" + ))? } /// Generates a block proof by loading inputs from the DB and invoking the block prover. diff --git a/crates/store/src/server/rpc_api.rs b/crates/store/src/server/rpc_api.rs index e684b20a0e..87055936ff 100644 --- a/crates/store/src/server/rpc_api.rs +++ b/crates/store/src/server/rpc_api.rs @@ -1,6 +1,5 @@ use miden_node_proto::convert; -use miden_node_proto::domain::block::InvalidBlockRange; -use miden_node_proto::errors::ConversionError; +use miden_node_proto::domain::block::SyncTarget; use miden_node_proto::generated::store::rpc_server; use miden_node_proto::generated::{self as proto}; use miden_node_utils::limiter::{ @@ -41,6 +40,7 @@ use crate::server::api::{ read_root, validate_nullifiers, }; +use crate::state::Finality; // CLIENT ENDPOINTS // ================================================================================================ @@ -94,7 +94,7 @@ impl rpc_server::Rpc for StoreApi { return Err(SyncNullifiersError::InvalidPrefixLength(request.prefix_len).into()); } - let chain_tip = self.state.latest_block_num().await; + let chain_tip = self.state.chain_tip(Finality::Committed).await; let block_range = read_block_range::(request.block_range, "SyncNullifiersRequest")? .into_inclusive_range::(&chain_tip)?; @@ -129,7 +129,7 @@ impl rpc_server::Rpc for StoreApi { ) -> Result, Status> { let request = request.into_inner(); - let chain_tip = self.state.latest_block_num().await; + let chain_tip = self.state.chain_tip(Finality::Committed).await; let block_range = read_block_range::(request.block_range, "SyncNotesRequest")? 
.into_inclusive_range::(&chain_tip)?; @@ -167,39 +167,27 @@ impl rpc_server::Rpc for StoreApi { request: Request, ) -> Result, Status> { let request = request.into_inner(); - let chain_tip = self.state.latest_block_num().await; - let block_range = request - .block_range - .ok_or_else(|| { - ConversionError::missing_field::("block_range") - }) - .map_err(SyncChainMmrError::DeserializationFailed)?; - - // Determine the effective tip based on the requested finality level. - let effective_tip = match request.finality() { - proto::rpc::Finality::Unspecified | proto::rpc::Finality::Committed => chain_tip, - proto::rpc::Finality::Proven => self - .state - .db() - .select_latest_proven_in_sequence_block_num() - .await - .map_err(SyncChainMmrError::DatabaseError)?, + let block_from = BlockNumber::from(request.block_from); + + // Determine upper bound to sync to or default to last committed block. + let sync_target = request + .upper_bound + .map(SyncTarget::try_from) + .transpose() + .map_err(SyncChainMmrError::DeserializationFailed)? 
+ .unwrap_or(SyncTarget::CommittedChainTip); + + let block_to = match sync_target { + SyncTarget::BlockNumber(block_num) => { + block_num.min(self.state.chain_tip(Finality::Committed).await) + }, + SyncTarget::CommittedChainTip => self.state.chain_tip(Finality::Committed).await, + SyncTarget::ProvenChainTip => self.state.chain_tip(Finality::Proven).await, }; - let block_from = BlockNumber::from(block_range.block_from); - if block_from > effective_tip { - Err(SyncChainMmrError::FutureBlock { chain_tip: effective_tip, block_from })?; - } - - let block_to = - block_range.block_to.map_or(effective_tip, BlockNumber::from).min(effective_tip); - if block_from > block_to { - Err(SyncChainMmrError::InvalidBlockRange(InvalidBlockRange::StartGreaterThanEnd { - start: block_from, - end: block_to, - }))?; + Err(SyncChainMmrError::FutureBlock { chain_tip: block_to, block_from })?; } let block_range = block_from..=block_to; let (mmr_delta, block_header) = @@ -248,19 +236,24 @@ impl rpc_server::Rpc for StoreApi { async fn get_block_by_number( &self, - request: Request, + request: Request, ) -> Result, Status> { let request = request.into_inner(); debug!(target: COMPONENT, ?request); - let block = self - .state - .load_block(request.block_num.into()) - .await - .map_err(GetBlockByNumberError::from)?; + // Load block from state. + let block_num = BlockNumber::from(request.block_num); + let block = self.state.load_block(block_num).await.map_err(GetBlockByNumberError::from)?; + + // Load proof from state. + let proof = if request.include_proof.unwrap_or_default() { + self.state.load_proof(block_num).await.map_err(GetBlockByNumberError::from)? 
+ } else { + None + }; - Ok(Response::new(proto::blockchain::MaybeBlock { block })) + Ok(Response::new(proto::blockchain::MaybeBlock { block, proof })) } async fn get_account( @@ -281,7 +274,7 @@ impl rpc_server::Rpc for StoreApi { request: Request, ) -> Result, Status> { let request = request.into_inner(); - let chain_tip = self.state.latest_block_num().await; + let chain_tip = self.state.chain_tip(Finality::Committed).await; let account_id: AccountId = read_account_id::< proto::rpc::SyncAccountVaultRequest, @@ -343,7 +336,7 @@ impl rpc_server::Rpc for StoreApi { Err(SyncAccountStorageMapsError::AccountNotPublic(account_id))?; } - let chain_tip = self.state.latest_block_num().await; + let chain_tip = self.state.chain_tip(Finality::Committed).await; let block_range = read_block_range::( request.block_range, "SyncAccountStorageMapsRequest", @@ -383,7 +376,7 @@ impl rpc_server::Rpc for StoreApi { Ok(Response::new(proto::rpc::StoreStatus { version: env!("CARGO_PKG_VERSION").to_string(), status: "connected".to_string(), - chain_tip: self.state.latest_block_num().await.as_u32(), + chain_tip: self.state.chain_tip(Finality::Committed).await.as_u32(), })) } @@ -415,7 +408,7 @@ impl rpc_server::Rpc for StoreApi { let request = request.into_inner(); - let chain_tip = self.state.latest_block_num().await; + let chain_tip = self.state.chain_tip(Finality::Committed).await; let block_range = read_block_range::( request.block_range, "SyncTransactionsRequest", diff --git a/crates/store/src/state/loader.rs b/crates/store/src/state/loader.rs index 3863f4afbc..181578d6db 100644 --- a/crates/store/src/state/loader.rs +++ b/crates/store/src/state/loader.rs @@ -15,6 +15,7 @@ use std::path::Path; use miden_crypto::merkle::mmr::Mmr; #[cfg(feature = "rocksdb")] use miden_large_smt_backend_rocksdb::RocksDbStorage; +#[cfg(feature = "rocksdb")] use miden_node_utils::clap::RocksDbOptions; use miden_protocol::block::account_tree::{AccountIdKey, AccountTree}; use 
miden_protocol::block::nullifier_tree::NullifierTree; diff --git a/crates/store/src/state/mod.rs b/crates/store/src/state/mod.rs index 8d9fe376c1..74a41f1807 100644 --- a/crates/store/src/state/mod.rs +++ b/crates/store/src/state/mod.rs @@ -52,6 +52,7 @@ use crate::errors::{ GetCurrentBlockchainDataError, StateInitializationError, }; +use crate::proven_tip::{ProvenTipReader, ProvenTipWriter}; use crate::{COMPONENT, DataDirectory}; mod loader; @@ -69,6 +70,18 @@ use loader::{ mod apply_block; mod sync_state; +// FINALITY +// ================================================================================================ + +/// The finality level for chain tip queries. +#[derive(Debug, Clone, Copy)] +pub enum Finality { + /// The latest committed (but not necessarily proven) block. + Committed, + /// The latest block that has been proven in an unbroken sequence from genesis. + Proven, +} + // STRUCTURES // ================================================================================================ @@ -125,6 +138,9 @@ pub struct State { /// Request termination of the process due to a fatal internal state error. termination_ask: tokio::sync::mpsc::Sender, + + /// The latest proven-in-sequence block number, updated by the proof scheduler. 
+ proven_tip: ProvenTipReader, } impl State { @@ -137,7 +153,7 @@ impl State { data_path: &Path, storage_options: StorageOptions, termination_ask: tokio::sync::mpsc::Sender, - ) -> Result { + ) -> Result<(Self, ProvenTipWriter), StateInitializationError> { let data_directory = DataDirectory::load(data_path.to_path_buf()) .map_err(StateInitializationError::DataDirectoryLoadError)?; @@ -154,18 +170,20 @@ impl State { let blockchain = load_mmr(&mut db).await?; let latest_block_num = blockchain.chain_tip().unwrap_or(BlockNumber::GENESIS); - let account_storage = TreeStorage::create( - data_path, - &storage_options.account_tree.into(), - ACCOUNT_TREE_STORAGE_DIR, - )?; + #[cfg(feature = "rocksdb")] + let (account_storage_config, nullifier_storage_config) = + (storage_options.account_tree.into(), storage_options.nullifier_tree.into()); + #[cfg(not(feature = "rocksdb"))] + let (account_storage_config, nullifier_storage_config) = { + let _ = &storage_options; + ((), ()) + }; + let account_storage = + TreeStorage::create(data_path, &account_storage_config, ACCOUNT_TREE_STORAGE_DIR)?; let account_tree = account_storage.load_account_tree(&mut db).await?; - let nullifier_storage = TreeStorage::create( - data_path, - &storage_options.nullifier_tree.into(), - NULLIFIER_TREE_STORAGE_DIR, - )?; + let nullifier_storage = + TreeStorage::create(data_path, &nullifier_storage_config, NULLIFIER_TREE_STORAGE_DIR)?; let nullifier_tree = nullifier_storage.load_nullifier_tree(&mut db).await?; // Verify that tree roots match the expected roots from the database. @@ -183,14 +201,23 @@ impl State { let writer = Mutex::new(()); let db = Arc::new(db); - Ok(Self { - db, - block_store, - inner, - forest, - writer, - termination_ask, - }) + // Initialize the proven tip from database. 
+ let proven_tip = + db.proven_chain_tip().await.map_err(StateInitializationError::DatabaseError)?; + let (proven_tip_writer, proven_tip) = ProvenTipWriter::new(proven_tip); + + Ok(( + Self { + db, + block_store, + inner, + forest, + writer, + termination_ask, + proven_tip, + }, + proven_tip_writer, + )) } /// Returns the database. @@ -264,7 +291,7 @@ impl State { ) -> Result, GetCurrentBlockchainDataError> { let blockchain = &self.inner.read().await.blockchain; if let Some(number) = block_num - && number == self.latest_block_num().await + && number == self.chain_tip(Finality::Committed).await { return Ok(None); } @@ -836,20 +863,43 @@ impl State { }) } + /// Returns the effective chain tip for the given finality level. + /// + /// - [`Finality::Committed`]: returns the latest committed block number (from in-memory MMR). + /// - [`Finality::Proven`]: returns the latest proven-in-sequence block number (cached via watch + /// channel, updated by the proof scheduler). + pub async fn chain_tip(&self, finality: Finality) -> BlockNumber { + match finality { + Finality::Committed => self + .inner + .read() + .instrument(tracing::info_span!("acquire_inner")) + .await + .latest_block_num(), + Finality::Proven => self.proven_tip.read(), + } + } + /// Loads a block from the block store. Return `Ok(None)` if the block is not found. pub async fn load_block( &self, block_num: BlockNumber, ) -> Result>, DatabaseError> { - if block_num > self.latest_block_num().await { + if block_num > self.chain_tip(Finality::Committed).await { return Ok(None); } self.block_store.load_block(block_num).await.map_err(Into::into) } - /// Returns the latest block number. - pub async fn latest_block_num(&self) -> BlockNumber { - self.inner.read().await.latest_block_num() + /// Loads a block proof from the block store. Returns `Ok(None)` if the proof is not found. 
+ pub async fn load_proof( + &self, + block_num: BlockNumber, + ) -> Result>, DatabaseError> { + if block_num > self.chain_tip(Finality::Proven).await { + return Ok(None); + } + self.block_store.load_proof(block_num).await.map_err(Into::into) } /// Emits metrics for each database table's size. diff --git a/crates/utils/src/clap.rs b/crates/utils/src/clap.rs index 079a619d35..b5a93dbf29 100644 --- a/crates/utils/src/clap.rs +++ b/crates/utils/src/clap.rs @@ -161,10 +161,12 @@ impl StorageOptions { let account_tree = AccountTreeRocksDbOptions { max_open_fds: self::rocksdb::BENCH_ROCKSDB_MAX_OPEN_FDS, cache_size_in_bytes: self::rocksdb::DEFAULT_ROCKSDB_CACHE_SIZE, + durability_mode: None, }; let nullifier_tree = NullifierTreeRocksDbOptions { max_open_fds: BENCH_ROCKSDB_MAX_OPEN_FDS, cache_size_in_bytes: DEFAULT_ROCKSDB_CACHE_SIZE, + durability_mode: None, }; Self { account_tree, nullifier_tree } } diff --git a/crates/utils/src/clap/rocksdb.rs b/crates/utils/src/clap/rocksdb.rs index 572e5b1cf7..d2af9c83bd 100644 --- a/crates/utils/src/clap/rocksdb.rs +++ b/crates/utils/src/clap/rocksdb.rs @@ -2,12 +2,27 @@ use std::path::Path; -use miden_large_smt_backend_rocksdb::RocksDbConfig; +use miden_large_smt_backend_rocksdb::{RocksDbConfig, RocksDbDurabilityMode}; pub(crate) const DEFAULT_ROCKSDB_MAX_OPEN_FDS: i32 = 64; pub(crate) const DEFAULT_ROCKSDB_CACHE_SIZE: usize = 2 << 30; pub(crate) const BENCH_ROCKSDB_MAX_OPEN_FDS: i32 = 512; +#[derive(clap::ValueEnum, Clone, Copy, Debug, PartialEq, Eq)] +pub enum CliRocksDbDurabilityMode { + Relaxed, + Sync, +} + +impl From for RocksDbDurabilityMode { + fn from(value: CliRocksDbDurabilityMode) -> Self { + match value { + CliRocksDbDurabilityMode::Relaxed => Self::Relaxed, + CliRocksDbDurabilityMode::Sync => Self::Sync, + } + } +} + /// Per usage options for rocksdb configuration #[derive(clap::Args, Clone, Debug, PartialEq, Eq)] pub struct NullifierTreeRocksDbOptions { @@ -25,6 +40,13 @@ pub struct NullifierTreeRocksDbOptions { 
value_name = "NULLIFIER_TREE__ROCKSDB__CACHE_SIZE" )] pub cache_size_in_bytes: usize, + #[arg( + id = "nullifier_tree_rocksdb_durability_mode", + long = "nullifier_tree.rocksdb.durability_mode", + value_enum, + value_name = "NULLIFIER_TREE__ROCKSDB__DURABILITY_MODE" + )] + pub durability_mode: Option, } impl Default for NullifierTreeRocksDbOptions { @@ -50,6 +72,13 @@ pub struct AccountTreeRocksDbOptions { value_name = "ACCOUNT_TREE__ROCKSDB__CACHE_SIZE" )] pub cache_size_in_bytes: usize, + #[arg( + id = "account_tree_rocksdb_durability_mode", + long = "account_tree.rocksdb.durability_mode", + value_enum, + value_name = "ACCOUNT_TREE__ROCKSDB__DURABILITY_MODE" + )] + pub durability_mode: Option, } impl Default for AccountTreeRocksDbOptions { @@ -63,6 +92,7 @@ impl Default for AccountTreeRocksDbOptions { pub struct RocksDbOptions { pub max_open_fds: i32, pub cache_size_in_bytes: usize, + pub durability_mode: Option, } impl Default for RocksDbOptions { @@ -70,42 +100,81 @@ impl Default for RocksDbOptions { Self { max_open_fds: DEFAULT_ROCKSDB_MAX_OPEN_FDS, cache_size_in_bytes: DEFAULT_ROCKSDB_CACHE_SIZE, + durability_mode: None, } } } impl From for RocksDbOptions { fn from(value: AccountTreeRocksDbOptions) -> Self { - let AccountTreeRocksDbOptions { max_open_fds, cache_size_in_bytes } = value; - Self { max_open_fds, cache_size_in_bytes } + let AccountTreeRocksDbOptions { + max_open_fds, + cache_size_in_bytes, + durability_mode, + } = value; + Self { + max_open_fds, + cache_size_in_bytes, + durability_mode, + } } } impl From for RocksDbOptions { fn from(value: NullifierTreeRocksDbOptions) -> Self { - let NullifierTreeRocksDbOptions { max_open_fds, cache_size_in_bytes } = value; - Self { max_open_fds, cache_size_in_bytes } + let NullifierTreeRocksDbOptions { + max_open_fds, + cache_size_in_bytes, + durability_mode, + } = value; + Self { + max_open_fds, + cache_size_in_bytes, + durability_mode, + } } } impl From for AccountTreeRocksDbOptions { fn from(value: 
RocksDbOptions) -> Self { - let RocksDbOptions { max_open_fds, cache_size_in_bytes } = value; - Self { max_open_fds, cache_size_in_bytes } + let RocksDbOptions { + max_open_fds, + cache_size_in_bytes, + durability_mode, + } = value; + Self { + max_open_fds, + cache_size_in_bytes, + durability_mode, + } } } impl From for NullifierTreeRocksDbOptions { fn from(value: RocksDbOptions) -> Self { - let RocksDbOptions { max_open_fds, cache_size_in_bytes } = value; - Self { max_open_fds, cache_size_in_bytes } + let RocksDbOptions { + max_open_fds, + cache_size_in_bytes, + durability_mode, + } = value; + Self { + max_open_fds, + cache_size_in_bytes, + durability_mode, + } } } impl RocksDbOptions { pub fn with_path(self, path: &Path) -> RocksDbConfig { - RocksDbConfig::new(path) + let mut config = RocksDbConfig::new(path) .with_cache_size(self.cache_size_in_bytes) - .with_max_open_files(self.max_open_fds) + .with_max_open_files(self.max_open_fds); + + if let Some(durability_mode) = self.durability_mode { + config = config.with_durability_mode(durability_mode.into()); + } + + config } } diff --git a/crates/utils/src/logging.rs b/crates/utils/src/logging.rs index 5893650303..e8164f6e52 100644 --- a/crates/utils/src/logging.rs +++ b/crates/utils/src/logging.rs @@ -2,7 +2,6 @@ use std::str::FromStr; use std::sync::OnceLock; use opentelemetry::trace::TracerProvider as _; -use opentelemetry_otlp::WithTonicConfig; use opentelemetry_sdk::propagation::TraceContextPropagator; use opentelemetry_sdk::trace::SdkTracerProvider; use tracing::subscriber::Subscriber; @@ -114,10 +113,9 @@ pub fn setup_tracing(otel: OpenTelemetry) -> anyhow::Result> { } fn init_tracer_provider() -> anyhow::Result { - let exporter = opentelemetry_otlp::SpanExporter::builder() - .with_tonic() - .with_tls_config(tonic::transport::ClientTlsConfig::new().with_native_roots()) - .build()?; + let builder = opentelemetry_otlp::SpanExporter::builder().with_tonic(); + + let exporter = builder.build()?; 
Ok(opentelemetry_sdk::trace::SdkTracerProvider::builder() .with_batch_exporter(exporter) diff --git a/crates/validator/README.md b/crates/validator/README.md index d3c4571c3b..bb51dd6332 100644 --- a/crates/validator/README.md +++ b/crates/validator/README.md @@ -58,7 +58,7 @@ sequenceDiagram The validator exposes a gRPC API with the following endpoints: -- `Status()` - Returns validator health and version information. +- `Status()` - Returns validator health, version, chain tip, validated transactions count, and signed blocks count. - `SubmitProvenTransaction()` - Validates and stores a proven transaction. - `SignBlock()` - Validates a proposed block and returns a signature. diff --git a/crates/validator/src/db/mod.rs b/crates/validator/src/db/mod.rs index 96c210f8a3..057c5e8ab5 100644 --- a/crates/validator/src/db/mod.rs +++ b/crates/validator/src/db/mod.rs @@ -5,7 +5,7 @@ mod schema; use std::path::PathBuf; use diesel::SqliteConnection; -use diesel::dsl::exists; +use diesel::dsl::{count_star, exists}; use diesel::prelude::*; use miden_node_db::{DatabaseError, Db, SqlTypeConvert}; use miden_protocol::block::{BlockHeader, BlockNumber}; @@ -135,3 +135,17 @@ pub fn load_block_header( }) .transpose() } + +/// Returns the total number of validated transactions in the database. +#[instrument(target = COMPONENT, skip(conn), err)] +pub fn count_validated_transactions(conn: &mut SqliteConnection) -> Result { + let count = schema::validated_transactions::table.select(count_star()).first::(conn)?; + Ok(count) +} + +/// Returns the total number of signed blocks in the database. 
+#[instrument(target = COMPONENT, skip(conn), err)] +pub fn count_signed_blocks(conn: &mut SqliteConnection) -> Result { + let count = schema::block_headers::table.select(count_star()).first::(conn)?; + Ok(count) +} diff --git a/crates/validator/src/server/mod.rs b/crates/validator/src/server/mod.rs index 47bc4b0f30..be2920b457 100644 --- a/crates/validator/src/server/mod.rs +++ b/crates/validator/src/server/mod.rs @@ -1,36 +1,31 @@ use std::net::SocketAddr; use std::path::PathBuf; use std::sync::Arc; +use std::sync::atomic::{AtomicU32, AtomicU64}; use anyhow::Context; use miden_node_db::Db; use miden_node_proto::generated::validator::api_server; -use miden_node_proto::generated::{self as proto}; use miden_node_proto_build::validator_api_descriptor; -use miden_node_utils::ErrorReport; use miden_node_utils::clap::GrpcOptionsInternal; use miden_node_utils::panic::catch_panic_layer_fn; -use miden_node_utils::tracing::OpenTelemetrySpanExt; use miden_node_utils::tracing::grpc::grpc_trace_fn; -use miden_protocol::block::ProposedBlock; -use miden_protocol::transaction::{ProvenTransaction, TransactionInputs}; -use miden_protocol::utils::serde::{Deserializable, Serializable}; use tokio::net::TcpListener; use tokio::sync::Semaphore; use tokio_stream::wrappers::TcpListenerStream; -use tonic::Status; use tower_http::catch_panic::CatchPanicLayer; use tower_http::trace::TraceLayer; -use tracing::{info_span, instrument}; -use crate::block_validation::validate_block; -use crate::db::{insert_transaction, load, load_chain_tip, upsert_block_header}; -use crate::tx_validation::validate_transaction; +use crate::db::{count_signed_blocks, count_validated_transactions, load, load_chain_tip}; use crate::{COMPONENT, ValidatorSigner}; #[cfg(test)] mod tests; +mod sign_block; +mod status; +mod submit_proven_transaction; + // VALIDATOR // ================================================================================ @@ -65,6 +60,17 @@ impl Validator { .await .context("failed to initialize 
validator database")?; + // Load initial metrics from the database for the in-memory counters. + let (initial_chain_tip, initial_tx_count, initial_block_count) = db + .query("load_initial_metrics", |conn| { + let tip = load_chain_tip(conn)?.map_or(0, |h| h.block_num().as_u32()); + let tx_count = u64::try_from(count_validated_transactions(conn)?).unwrap_or(0); + let block_count = u64::try_from(count_signed_blocks(conn)?).unwrap_or(0); + Ok::<_, miden_node_db::DatabaseError>((tip, tx_count, block_count)) + }) + .await + .context("failed to load initial metrics")?; + let listener = TcpListener::bind(self.address) .await .context("failed to bind to block producer address")?; @@ -79,7 +85,13 @@ impl Validator { .layer(CatchPanicLayer::custom(catch_panic_layer_fn)) .layer(TraceLayer::new_for_grpc().make_span_with(grpc_trace_fn)) .timeout(self.grpc_options.request_timeout) - .add_service(api_server::ApiServer::new(ValidatorServer::new(self.signer, db))) + .add_service(api_server::ApiServer::new(ValidatorServer::new( + self.signer, + db, + initial_chain_tip, + initial_tx_count, + initial_block_count, + ))) .add_service(reflection_service) .serve_with_incoming(TcpListenerStream::new(listener)) .await @@ -99,124 +111,29 @@ struct ValidatorServer { /// Serializes `sign_block` requests so that concurrent calls are processed sequentially, /// ensuring consistent chain tip reads and preventing race conditions. sign_block_semaphore: Semaphore, + /// In-memory chain tip, updated atomically after each signed block. + chain_tip: AtomicU32, + /// In-memory count of validated transactions, incremented after each new insert. + validated_transactions_count: AtomicU64, + /// In-memory count of signed blocks, incremented after each signed block. 
+ signed_blocks_count: AtomicU64, } impl ValidatorServer { - fn new(signer: ValidatorSigner, db: Db) -> Self { + fn new( + signer: ValidatorSigner, + db: Db, + initial_chain_tip: u32, + initial_tx_count: u64, + initial_block_count: u64, + ) -> Self { Self { signer, db: db.into(), sign_block_semaphore: Semaphore::new(1), + chain_tip: AtomicU32::new(initial_chain_tip), + validated_transactions_count: AtomicU64::new(initial_tx_count), + signed_blocks_count: AtomicU64::new(initial_block_count), } } } - -#[tonic::async_trait] -impl api_server::Api for ValidatorServer { - /// Returns the status of the validator. - async fn status( - &self, - _request: tonic::Request<()>, - ) -> Result, tonic::Status> { - Ok(tonic::Response::new(proto::validator::ValidatorStatus { - version: env!("CARGO_PKG_VERSION").to_string(), - status: "OK".to_string(), - })) - } - - /// Receives a proven transaction, then validates and stores it. - #[instrument(target = COMPONENT, skip_all, err)] - async fn submit_proven_transaction( - &self, - request: tonic::Request, - ) -> Result, tonic::Status> { - let (tx, inputs) = info_span!("deserialize").in_scope(|| { - let request = request.into_inner(); - let tx = ProvenTransaction::read_from_bytes(&request.transaction).map_err(|err| { - Status::invalid_argument(err.as_report_context("Invalid proven transaction")) - })?; - let inputs = request - .transaction_inputs - .ok_or(Status::invalid_argument("Missing transaction inputs"))?; - let inputs = TransactionInputs::read_from_bytes(&inputs).map_err(|err| { - Status::invalid_argument(err.as_report_context("Invalid transaction inputs")) - })?; - - Result::<_, tonic::Status>::Ok((tx, inputs)) - })?; - - tracing::Span::current().set_attribute("transaction.id", tx.id()); - - // Validate the transaction. - let tx_info = validate_transaction(tx, inputs).await.map_err(|err| { - Status::invalid_argument(err.as_report_context("Invalid transaction")) - })?; - - // Store the validated transaction. 
- self.db - .transact("insert_transaction", move |conn| insert_transaction(conn, &tx_info)) - .await - .map_err(|err| { - Status::internal(err.as_report_context("Failed to insert transaction")) - })?; - Ok(tonic::Response::new(())) - } - - /// Validates a proposed block, verifies chain continuity, signs the block header, and updates - /// the chain tip. - async fn sign_block( - &self, - request: tonic::Request, - ) -> Result, tonic::Status> { - let proposed_block = info_span!("deserialize").in_scope(|| { - let proposed_block_bytes = request.into_inner().proposed_block; - - ProposedBlock::read_from_bytes(&proposed_block_bytes).map_err(|err| { - tonic::Status::invalid_argument(format!( - "Failed to deserialize proposed block: {err}", - )) - }) - })?; - - // Serialize sign_block requests to prevent race conditions between loading the - // chain tip and persisting the validated block header. - let _permit = self.sign_block_semaphore.acquire().await.map_err(|err| { - tonic::Status::internal(format!("sign_block semaphore closed: {err}")) - })?; - - // Load the current chain tip from the database. - let chain_tip = self - .db - .query("load_chain_tip", load_chain_tip) - .await - .map_err(|err| { - tonic::Status::internal(format!("Failed to load chain tip: {}", err.as_report())) - })? - .ok_or_else(|| tonic::Status::internal("Chain tip not found in database"))?; - - // Validate the block against the current chain tip. - let (signature, header) = validate_block(proposed_block, &self.signer, &self.db, chain_tip) - .await - .map_err(|err| { - tonic::Status::invalid_argument(format!( - "Failed to validate block: {}", - err.as_report() - )) - })?; - - // Persist the validated block header. - self.db - .transact("upsert_block_header", move |conn| upsert_block_header(conn, &header)) - .await - .map_err(|err| { - tonic::Status::internal(format!( - "Failed to persist block header: {}", - err.as_report() - )) - })?; - - // Send the signature. 
- let response = proto::blockchain::BlockSignature { signature: signature.to_bytes() }; - Ok(tonic::Response::new(response)) - } -} diff --git a/crates/validator/src/server/sign_block.rs b/crates/validator/src/server/sign_block.rs new file mode 100644 index 0000000000..a9f288b60a --- /dev/null +++ b/crates/validator/src/server/sign_block.rs @@ -0,0 +1,75 @@ +use std::sync::atomic::Ordering; + +use miden_node_proto::generated as grpc; +use miden_node_utils::ErrorReport; +use miden_protocol::block::ProposedBlock; +use miden_protocol::crypto::dsa::ecdsa_k256_keccak::Signature; +use miden_tx::utils::serde::{Deserializable, Serializable}; + +use crate::block_validation::validate_block; +use crate::db::{load_chain_tip, upsert_block_header}; +use crate::server::ValidatorServer; + +#[tonic::async_trait] +impl grpc::server::validator_api::SignBlock for ValidatorServer { + type Input = ProposedBlock; + type Output = Signature; + + fn decode(request: grpc::blockchain::ProposedBlock) -> tonic::Result { + ProposedBlock::read_from_bytes(&request.proposed_block).map_err(|err| { + tonic::Status::invalid_argument( + err.as_report_context("Failed to deserialize proposed block"), + ) + }) + } + + fn encode(output: Self::Output) -> tonic::Result { + Ok(grpc::blockchain::BlockSignature { signature: output.to_bytes() }) + } + + async fn handle(&self, proposed_block: Self::Input) -> tonic::Result { + // Serialize sign_block requests to prevent race conditions between loading the + // chain tip and persisting the validated block header. + let _permit = self.sign_block_semaphore.acquire().await.map_err(|err| { + tonic::Status::internal(format!("sign_block semaphore closed: {err}")) + })?; + + // Load the current chain tip from the database. + let chain_tip = self + .db + .query("load_chain_tip", load_chain_tip) + .await + .map_err(|err| { + tonic::Status::internal(format!("Failed to load chain tip: {}", err.as_report())) + })? 
+ .ok_or_else(|| tonic::Status::internal("Chain tip not found in database"))?; + + // Validate the block against the current chain tip. + let (signature, header) = validate_block(proposed_block, &self.signer, &self.db, chain_tip) + .await + .map_err(|err| { + tonic::Status::invalid_argument(format!( + "Failed to validate block: {}", + err.as_report() + )) + })?; + + // Persist the validated block header. + let new_block_num = header.block_num().as_u32(); + self.db + .transact("upsert_block_header", move |conn| upsert_block_header(conn, &header)) + .await + .map_err(|err| { + tonic::Status::internal(format!( + "Failed to persist block header: {}", + err.as_report() + )) + })?; + + // Update the in-memory counters after successful persistence. + self.chain_tip.store(new_block_num, Ordering::Relaxed); + self.signed_blocks_count.fetch_add(1, Ordering::Relaxed); + + Ok(signature) + } +} diff --git a/crates/validator/src/server/status.rs b/crates/validator/src/server/status.rs new file mode 100644 index 0000000000..078b809113 --- /dev/null +++ b/crates/validator/src/server/status.rs @@ -0,0 +1,33 @@ +use std::sync::atomic::Ordering; + +use miden_node_proto::generated as grpc; + +use crate::server::ValidatorServer; + +#[tonic::async_trait] +impl grpc::server::validator_api::Status for ValidatorServer { + type Input = (); + type Output = (); + + async fn full(&self, _request: ()) -> tonic::Result { + Ok(grpc::validator::ValidatorStatus { + version: env!("CARGO_PKG_VERSION").to_string(), + status: "OK".to_string(), + chain_tip: self.chain_tip.load(Ordering::Relaxed), + validated_transactions_count: self.validated_transactions_count.load(Ordering::Relaxed), + signed_blocks_count: self.signed_blocks_count.load(Ordering::Relaxed), + }) + } + + async fn handle(&self, _input: Self::Input) -> tonic::Result { + unimplemented!() + } + + fn decode(_request: ()) -> tonic::Result { + unimplemented!() + } + + fn encode(_output: Self::Output) -> tonic::Result { + unimplemented!() + } +} 
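The per-method handlers above keep gauge-style metrics (`chain_tip`, `validated_transactions_count`, `signed_blocks_count`) in plain atomics: seeded once from SQLite at startup, bumped only after a successful database write, and read lock-free by `Status()`. A minimal std-only sketch of that pattern (the `Counters` type and method names here are illustrative, not the node's actual API):

```rust
use std::sync::atomic::{AtomicU32, AtomicU64, Ordering};

/// Hypothetical stand-in for the counters embedded in `ValidatorServer`.
pub struct Counters {
    chain_tip: AtomicU32,
    validated_transactions: AtomicU64,
    signed_blocks: AtomicU64,
}

impl Counters {
    /// Seed the counters from values loaded out of the database at startup.
    pub fn new(tip: u32, txs: u64, blocks: u64) -> Self {
        Self {
            chain_tip: AtomicU32::new(tip),
            validated_transactions: AtomicU64::new(txs),
            signed_blocks: AtomicU64::new(blocks),
        }
    }

    /// Called only after the block header was durably persisted,
    /// mirroring the ordering used in `sign_block.rs`.
    pub fn record_signed_block(&self, new_tip: u32) {
        self.chain_tip.store(new_tip, Ordering::Relaxed);
        self.signed_blocks.fetch_add(1, Ordering::Relaxed);
    }

    /// Called after an insert commits, with the number of rows inserted.
    pub fn record_validated_transactions(&self, inserted: u64) {
        self.validated_transactions.fetch_add(inserted, Ordering::Relaxed);
    }

    /// Lock-free snapshot for a status endpoint.
    pub fn snapshot(&self) -> (u32, u64, u64) {
        (
            self.chain_tip.load(Ordering::Relaxed),
            self.validated_transactions.load(Ordering::Relaxed),
            self.signed_blocks.load(Ordering::Relaxed),
        )
    }
}
```

`Relaxed` ordering suffices here because the counters are independent best-effort metrics; nothing synchronizes on them, and durability is guaranteed by the database write that precedes each update.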
diff --git a/crates/validator/src/server/submit_proven_transaction.rs b/crates/validator/src/server/submit_proven_transaction.rs new file mode 100644 index 0000000000..d5d7d9b21d --- /dev/null +++ b/crates/validator/src/server/submit_proven_transaction.rs @@ -0,0 +1,62 @@ +use std::sync::atomic::Ordering; + +use miden_node_proto::generated as grpc; +use miden_node_utils::ErrorReport; +use miden_node_utils::tracing::OpenTelemetrySpanExt; +use miden_protocol::transaction::{ProvenTransaction, TransactionInputs}; +use miden_tx::utils::serde::Deserializable; +use tonic::Status; + +use crate::db::insert_transaction; +use crate::server::ValidatorServer; +use crate::tx_validation::validate_transaction; + +#[tonic::async_trait] +impl grpc::server::validator_api::SubmitProvenTransaction for ValidatorServer { + type Input = Input; + type Output = (); + + async fn handle(&self, input: Self::Input) -> tonic::Result { + tracing::Span::current().set_attribute("transaction.id", input.tx.id()); + + // Validate the transaction. + let tx_info = validate_transaction(input.tx, input.inputs).await.map_err(|err| { + Status::invalid_argument(err.as_report_context("Invalid transaction")) + })?; + + // Store the validated transaction. 
+ let count = self + .db + .transact("insert_transaction", move |conn| insert_transaction(conn, &tx_info)) + .await + .map_err(|err| { + Status::internal(err.as_report_context("Failed to insert transaction")) + })?; + + self.validated_transactions_count.fetch_add(count as u64, Ordering::Relaxed); + Ok(()) + } + + fn decode(request: grpc::transaction::ProvenTransaction) -> tonic::Result { + let tx = ProvenTransaction::read_from_bytes(&request.transaction).map_err(|err| { + Status::invalid_argument(err.as_report_context("Invalid proven transaction")) + })?; + let inputs = request + .transaction_inputs + .ok_or(Status::invalid_argument("Missing transaction inputs"))?; + let inputs = TransactionInputs::read_from_bytes(&inputs).map_err(|err| { + Status::invalid_argument(err.as_report_context("Invalid transaction inputs")) + })?; + + Ok(Self::Input { tx, inputs }) + } + + fn encode(output: Self::Output) -> tonic::Result<()> { + Ok(output) + } +} + +pub struct Input { + tx: ProvenTransaction, + inputs: TransactionInputs, +} diff --git a/crates/validator/src/server/tests.rs b/crates/validator/src/server/tests.rs index e87b821e21..66357ab22e 100644 --- a/crates/validator/src/server/tests.rs +++ b/crates/validator/src/server/tests.rs @@ -44,7 +44,7 @@ impl TestValidator { .unwrap(); Self { - server: ValidatorServer::new(signer, db), + server: ValidatorServer::new(signer, db, 0, 0, 0), chain: PartialBlockchain::default(), chain_tip: genesis_header, } diff --git a/docker-compose.yml b/docker-compose.yml new file mode 100644 index 0000000000..c66667ef9d --- /dev/null +++ b/docker-compose.yml @@ -0,0 +1,128 @@ +services: + genesis: + image: miden-node-image + pull_policy: if_not_present + profiles: + - genesis + volumes: + - genesis-data:/genesis + - store-data:/store + - validator-data:/validator + - accounts:/accounts + entrypoint: ["/bin/sh", "-c"] + command: + - | + set -e + echo "Bootstrapping validator (creating genesis block)..." 
+ miden-node validator bootstrap \ + --data-directory /validator \ + --genesis-block-directory /genesis \ + --accounts-directory /accounts + echo "Bootstrapping store..." + miden-node store bootstrap \ + --data-directory /store \ + --genesis-block /genesis/genesis.dat + + store: + image: miden-node-image + pull_policy: if_not_present + volumes: + - store-data:/data + environment: + - MIDEN_NODE_ENABLE_OTEL=true + - OTEL_EXPORTER_OTLP_ENDPOINT=http://tempo:4317 + - OTEL_SERVICE_NAME=store + command: + - miden-node + - store + - start + - --rpc.url=http://0.0.0.0:50001 + - --ntx-builder.url=http://0.0.0.0:50002 + - --block-producer.url=http://0.0.0.0:50003 + - --data-directory=/data + - --account_tree.rocksdb.max_cache_size=4294967296 + - --account_tree.rocksdb.max_open_fds=512 + - --nullifier_tree.rocksdb.max_cache_size=4294967296 + - --nullifier_tree.rocksdb.max_open_fds=512 + ports: + - "50001:50001" + - "50002:50002" + - "50003:50003" + + validator: + image: miden-node-image + pull_policy: if_not_present + volumes: + - validator-data:/data + environment: + - MIDEN_NODE_ENABLE_OTEL=true + - OTEL_EXPORTER_OTLP_ENDPOINT=http://tempo:4317 + - OTEL_SERVICE_NAME=validator + command: + - miden-node + - validator + - start + - http://0.0.0.0:50101 + - --data-directory=/data + ports: + - "50101:50101" + + block-producer: + image: miden-node-image + pull_policy: if_not_present + environment: + - MIDEN_NODE_ENABLE_OTEL=true + - OTEL_EXPORTER_OTLP_ENDPOINT=http://tempo:4317 + - OTEL_SERVICE_NAME=block-producer + command: + - miden-node + - block-producer + - start + - http://0.0.0.0:50201 + - --store.url=http://store:50003 + - --validator.url=http://validator:50101 + ports: + - "50201:50201" + + rpc: + image: miden-node-image + pull_policy: if_not_present + environment: + - MIDEN_NODE_ENABLE_OTEL=true + - OTEL_EXPORTER_OTLP_ENDPOINT=http://tempo:4317 + - OTEL_SERVICE_NAME=rpc + command: + - miden-node + - rpc + - start + - --url=http://0.0.0.0:57291 + - 
--store.url=http://store:50001 + - --block-producer.url=http://block-producer:50201 + - --validator.url=http://validator:50101 + ports: + - "57291:57291" + + ntx-builder: + image: miden-node-image + pull_policy: if_not_present + volumes: + - ntx-builder-data:/data + environment: + - MIDEN_NODE_ENABLE_OTEL=true + - OTEL_EXPORTER_OTLP_ENDPOINT=http://tempo:4317 + - OTEL_SERVICE_NAME=ntx-builder + command: + - miden-node + - ntx-builder + - start + - --store.url=http://store:50002 + - --block-producer.url=http://block-producer:50201 + - --validator.url=http://validator:50101 + - --data-directory=/data + +volumes: + genesis-data: + store-data: + validator-data: + ntx-builder-data: + accounts: diff --git a/docs/external/src/operator/architecture.md b/docs/external/src/operator/architecture.md index 93a02fe5ad..5dac265c81 100644 --- a/docs/external/src/operator/architecture.md +++ b/docs/external/src/operator/architecture.md @@ -64,5 +64,5 @@ number of failures, preventing resource exhaustion. The threshold can be set wit `--ntx-builder.max-account-crashes` (default: 10). The builder also exposes an internal gRPC server that the RPC component uses to proxy debugging endpoints such as -`GetNoteError`. In bundled mode this is wired automatically; in distributed mode operators must set +`GetNetworkNoteStatus`. In bundled mode this is wired automatically; in distributed mode operators must set `--ntx-builder.url` (or `MIDEN_NODE_NTX_BUILDER_URL`) on the RPC component. 
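Elsewhere in this diff, `State::load` now returns a `ProvenTipWriter` alongside `State`, which keeps a `ProvenTipReader`; the `crate::proven_tip` module itself is not part of the patch. Going by the doc comments ("cached via watch channel, updated by the proof scheduler"), the split is a single-writer/many-reader cell. A rough std-only sketch, using an atomic in place of the watch channel (names and behavior are assumptions):

```rust
use std::sync::Arc;
use std::sync::atomic::{AtomicU32, Ordering};

/// Held by the proof scheduler; the only side that advances the tip.
pub struct ProvenTipWriter(Arc<AtomicU32>);

/// Held by `State`; answers `chain_tip(Finality::Proven)` without locking.
pub struct ProvenTipReader(Arc<AtomicU32>);

impl ProvenTipWriter {
    /// Mirrors the constructor call in `State::load`: seeded from the
    /// database, returning the writer plus a reader for `State`.
    pub fn new(initial: u32) -> (Self, ProvenTipReader) {
        let cell = Arc::new(AtomicU32::new(initial));
        (Self(Arc::clone(&cell)), ProvenTipReader(cell))
    }

    /// Record that blocks are now proven in sequence up to `block_num`.
    pub fn advance(&self, block_num: u32) {
        self.0.store(block_num, Ordering::Release);
    }
}

impl ProvenTipReader {
    pub fn read(&self) -> u32 {
        self.0.load(Ordering::Acquire)
    }
}
```

The real implementation presumably uses `tokio::sync::watch` so that consumers can also await changes; an atomic only covers the polling half shown in `State::chain_tip`.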
diff --git a/docs/external/src/operator/monitoring.md b/docs/external/src/operator/monitoring.md index 9e3ba945cd..fe28c54ffd 100644 --- a/docs/external/src/operator/monitoring.md +++ b/docs/external/src/operator/monitoring.md @@ -118,7 +118,7 @@ The available log levels are `trace`, `debug`, `info` (default), `warn`, `error` export RUST_LOG=debug ``` -The verbosity can also be specified by component (when running them as a single process): +The verbosity can also be specified by component: ```sh export RUST_LOG=warn,block-producer=debug,rpc=error @@ -129,10 +129,12 @@ The above would set the general level to `warn`, and the `block-producer` and `r ## Configuration -The OpenTelemetry trace exporter is enabled by adding the `--enable-otel` flag to the node's start command: +The OpenTelemetry trace exporter is enabled by adding the `--enable-otel` flag to each component's start command: ```sh -miden-node bundled start --enable-otel +miden-node store start --enable-otel +miden-node block-producer start --enable-otel +miden-node rpc start --enable-otel ``` The exporter can be configured using environment variables as specified in the official @@ -153,7 +155,7 @@ This is based off Honeycomb's OpenTelemetry ```sh OTEL_EXPORTER_OTLP_ENDPOINT=https://api.honeycomb.io:443 \ OTEL_EXPORTER_OTLP_HEADERS="x-honeycomb-team=your-api-key" \ -miden-node bundled start --enable-otel +miden-node store start --enable-otel ``` ### Honeycomb queries, triggers and board examples diff --git a/docs/external/src/operator/usage.md b/docs/external/src/operator/usage.md index 840bee2cc4..99ddaa20e8 100644 --- a/docs/external/src/operator/usage.md +++ b/docs/external/src/operator/usage.md @@ -4,10 +4,10 @@ sidebar_position: 4 # Configuration and Usage -As outlined in the [Architecture](./architecture) chapter, the node consists of several components which can be run -separately or as a single bundled process. 
At present, the recommended way to operate a node is in bundled mode and is -what this guide will focus on. Operating the components separately is very similar and should be relatively -straight-forward to derive from these instructions. +As outlined in the [Architecture](./architecture) chapter, the node consists of several components which are run +as separate processes. The recommended way to operate a node locally is using `docker compose`, which starts each +component in its own container after a one-off genesis bootstrap step. Operating the components without Docker +is also straightforward using the individual CLI subcommands. This guide focuses on basic usage. To discover more advanced options we recommend exploring the various help menus which can be accessed by appending `--help` to any of the commands. @@ -15,27 +15,19 @@ which can be accessed by appending `--help` to any of the commands. ## Bootstrapping The first step in starting a new Miden network is to initialize the genesis block data. This is a -one-off operation using the `bootstrap` command and by default the genesis block will contain a single -faucet account. +two-step process: first the validator signs and creates the genesis block, then the store initializes +its database from that block. By default the genesis block will contain a single faucet account. ```sh -# Create a folder to store the node's data. -mkdir data - -# Bootstrap the node. -# -# This creates the node's database and initializes it with the genesis data. -# -# The genesis block currently contains a single public faucet account. The -# secret for this account is stored in the `` -# file. This file is not used by the node and should instead by used wherever -# you intend to operate this faucet account. -# -# For example, you could operate a public faucet using our faucet reference -# implementation whose operation is described in a later section. 
-miden-node bundled bootstrap \ - --data-directory data \ - --accounts-directory . +# Step 1: Validator bootstrap — create the signed genesis block and account files. +miden-node validator bootstrap \ + --genesis-block-directory genesis-data \ + --accounts-directory accounts + +# Step 2: Store bootstrap — initialize the store database from the genesis block. +miden-node store bootstrap \ + --data-directory store-data \ + --genesis-block genesis-data/genesis.dat ``` You can also configure the account and asset data in the genesis block by passing in a toml configuration file. @@ -44,9 +36,9 @@ transactions to achieve the desired state. Any account secrets will be written t the provided `--accounts-directory` path in the process. ```sh -miden-node bundled bootstrap \ - --data-directory data \ - --accounts-directory . \ +miden-node validator bootstrap \ + --genesis-block-directory genesis-data \ + --accounts-directory accounts \ --genesis-config-file genesis.toml ``` @@ -110,18 +102,93 @@ path = "eth_faucet.mac" ## Operation -Start the node with the desired public gRPC server address. +### Using docker compose + +Build the Docker image and start the node. Bootstrap happens automatically on first run. 
+The default `compose-up` target starts all node components along with a telemetry stack +([Tempo](https://grafana.com/oss/tempo/) + [Grafana](https://grafana.com/oss/grafana/)) and +a network monitor: ```sh -miden-node bundled start \ - --data-directory data \ - --rpc.url http://0.0.0.0:57291 +make docker-build-node +make docker-build-monitor +make compose-genesis +make compose-up +``` + +This starts: + +- All node components (store, validator, block-producer, rpc, ntx-builder) with OpenTelemetry tracing enabled +- **Tempo** — receives OTLP traces from the node on port `4317`, HTTP API on port `3200` +- **Grafana** — pre-configured with a Tempo datasource and a Miden Node dashboard, available at `http://localhost:3000` +- **Network monitor** — monitors the local node, available at `http://localhost:3001` + +Follow logs: + +```sh +make compose-logs +``` + +Stop the node: + +```sh +make compose-down +``` + +Teardown and regenesis: + +```sh +make compose-genesis +``` + +### Running components individually + +A convenience script is provided that bootstraps and starts all components as separate processes: + +```sh +export AWS_REGION=eu-north-1 +export KMS_KEY_ID= +./scripts/run-node.sh +``` + +Each component can also be started as a standalone process. 
For example: + +```sh +# Start the store +miden-node store start \ + --rpc.url http://0.0.0.0:50001 \ + --ntx-builder.url http://0.0.0.0:50002 \ + --block-producer.url http://0.0.0.0:50003 \ + --data-directory /tmp/store + +# Start the validator +miden-node validator start http://0.0.0.0:50101 \ + --data-directory /tmp/validator + +# Start the block producer +miden-node block-producer start http://0.0.0.0:50201 \ + --store.url http://127.0.0.1:50003 \ + --validator.url http://127.0.0.1:50101 + +# Start the RPC server +miden-node rpc start \ + --url http://0.0.0.0:57291 \ + --store.url http://127.0.0.1:50001 \ + --block-producer.url http://127.0.0.1:50201 \ + --validator.url http://127.0.0.1:50101 + +# Start the network transaction builder +miden-node ntx-builder start \ + --store.url http://127.0.0.1:50002 \ + --block-producer.url http://127.0.0.1:50201 \ + --validator.url http://127.0.0.1:50101 \ + --data-directory /tmp/ntx-builder ``` ### gRPC server limits and timeouts The RPC component enforces per-request timeouts, per-IP rate limits, and global concurrency caps. Configure these -settings for bundled or standalone RPC with the following options: +settings with the following options: - `--grpc.timeout` (default `10s`): Maximum request duration before the server drops the request. - `--grpc.max_connection_age` (default `30m`): Maximum lifetime of a connection before the server closes it. @@ -153,7 +220,7 @@ are exposed as CLI flags (also available as environment variables): Compaction parallelism is set automatically to the number of available CPU cores. ```sh -miden-node bundled start \ +miden-node store start \ --data-directory data \ --rpc.url http://0.0.0.0:57291 \ --account_tree.rocksdb.max_cache_size 4294967296 \ @@ -165,7 +232,7 @@ miden-node bundled start \ ## Environment variables Most configuration options can also be configured using environment variables as an alternative to providing the values -via the command-line. 
This is useful for certain deployment options like `docker` or `systemd`, where they can be easier +via the command-line. This is useful for certain deployment options like `docker`, where they can be easier to define or inject instead of changing the underlying command line options. These are especially convenient where multiple different configuration profiles are used. Write the environment diff --git a/docs/external/src/rpc.md b/docs/external/src/rpc.md index 9cc736a55d..585388aab1 100644 --- a/docs/external/src/rpc.md +++ b/docs/external/src/rpc.md @@ -16,17 +16,17 @@ The gRPC service definition can be found in the Miden node's `proto` [directory] - [GetBlockByNumber](#getblockbynumber) - [GetBlockHeaderByNumber](#getblockheaderbynumber) - [GetLimits](#getlimits) +- [GetNetworkNoteStatus](#getnetworknotestatus) - [GetNotesById](#getnotesbyid) - [GetNoteScriptByRoot](#getnotescriptbyroot) +- [Status](#status) - [SubmitProvenTransaction](#submitproventransaction) -- [SyncNullifiers](#syncnullifiers) -- [SyncAccountVault](#syncaccountvault) -- [SyncNotes](#syncnotes) - [SyncAccountStorageMaps](#syncaccountstoragemaps) +- [SyncAccountVault](#syncaccountvault) - [SyncChainMmr](#syncchainmmr) +- [SyncNotes](#syncnotes) +- [SyncNullifiers](#syncnullifiers) - [SyncTransactions](#synctransactions) -- [Status](#status) -- [GetNoteError](#getnoteerror) @@ -157,10 +157,49 @@ Request a set of notes. **Limits:** `note_id` (100) +### GetNetworkNoteStatus + +Returns the current lifecycle status of a network note. The status indicates where the note is in its lifecycle: pending execution, processed (consumed by a transaction in the mempool), or discarded after too many failed attempts. The response also includes the latest execution error, if any. + +This endpoint is only available when the network transaction builder is enabled and connected. If it is not configured, the endpoint returns `UNAVAILABLE`. 
+ +#### Request + +```protobuf +message NoteId { + Digest id = 1; // The note ID +} +``` + +#### Response + +```protobuf +enum NetworkNoteStatus { + NETWORK_NOTE_STATUS_UNSPECIFIED = 0; + NETWORK_NOTE_STATUS_PENDING = 1; // Awaiting execution or being retried + NETWORK_NOTE_STATUS_NULLIFIER_INFLIGHT = 2; // Consumed by a transaction sent to block producer + NETWORK_NOTE_STATUS_DISCARDED = 3; // Exceeded max retries, will not be retried + NETWORK_NOTE_STATUS_NULLIFIER_COMMITTED = 4; // Consuming transaction committed on-chain +} + +message GetNetworkNoteStatusResponse { + NetworkNoteStatus status = 1; // Current lifecycle status + optional string last_error = 2; // The latest error message, if any + uint32 attempt_count = 3; // Number of failed execution attempts + optional fixed32 last_attempt_block_num = 4; // Block number of the last failed attempt, if any +} +``` + +If the note is not found in the network transaction builder's database, the endpoint returns `NOT_FOUND`. + ### GetNoteScriptByRoot Request the script for a note by its root. +### Status + +Request the status of the node components. The response contains the current version of the RPC component and the connection status of the other components, including their versions and the number of the most recent block in the chain (chain tip). + ### SubmitProvenTransaction Submit a transaction to the network. @@ -182,15 +221,13 @@ When transaction submission fails, detailed error information is provided throug | `OUTPUT_NOTES_ALREADY_EXIST` | 6 | `INVALID_ARGUMENT` | Output note IDs are already in use | | `TRANSACTION_EXPIRED` | 7 | `INVALID_ARGUMENT` | Transaction has exceeded its expiration block height | -### SyncNullifiers - -Returns nullifier synchronization data for a set of prefixes within a given block range. This method allows clients to efficiently track nullifier creation by retrieving only the nullifiers produced between two blocks. 
+### SyncAccountStorageMaps
 
-Caller specifies the `prefix_len` (currently only 16), the list of prefix values (`nullifiers`), and the block range (`block_from`, optional `block_to`). The response includes all matching nullifiers created within that range, the last block included in the response (`block_num`), and the current chain tip (`chain_tip`).
+Returns storage map synchronization data for a specified public account within a given block range. This method allows clients to efficiently sync the storage map state of an account by retrieving only the changes that occurred between two blocks.
 
-If the response is chunked (i.e., `block_num < block_to`), continue by issuing another request with `block_from = block_num + 1` to retrieve subsequent updates.
+Caller specifies the `account_id` of the public account and the block range (`block_from`, `block_to`) for which to retrieve storage updates. The response includes all storage map key-value updates that occurred within that range, along with the last block included in the sync and the current chain tip.
 
-**Limits:** `nullifier` (1000)
+This endpoint enables clients to maintain an updated view of account storage.
 
 ### SyncAccountVault
 
@@ -198,6 +235,12 @@ Returns information that allows clients to sync asset values for specific public
 For any `[block_from..block_to]` range, the latest known set of assets is returned for the requested account ID. The data can be split and a cutoff block may be selected if there are too many assets to sync. The response contains the chain tip so that the caller knows when it has been reached.
 
+### SyncChainMmr
+
+Returns MMR delta information needed to synchronize the chain MMR within a block range.
+
+Caller specifies the `block_range`, starting from the last block already represented in its local MMR. The response contains the MMR delta for the requested range, but at most up to and including the chain tip.
+
 ### SyncNotes
 
 Iteratively sync data for a given set of note tags.
@@ -210,54 +253,20 @@ A basic note sync can be implemented by repeatedly requesting the previous respo **Limits:** `note_tag` (1000) -### SyncAccountStorageMaps - -Returns storage map synchronization data for a specified public account within a given block range. This method allows clients to efficiently sync the storage map state of an account by retrieving only the changes that occurred between two blocks. - -Caller specifies the `account_id` of the public account and the block range (`block_from`, `block_to`) for which to retrieve storage updates. The response includes all storage map key-value updates that occurred within that range, along with the last block included in the sync and the current chain tip. +### SyncNullifiers -This endpoint enables clients to maintain an updated view of account storage. +Returns nullifier synchronization data for a set of prefixes within a given block range. This method allows clients to efficiently track nullifier creation by retrieving only the nullifiers produced between two blocks. -### SyncChainMmr +Caller specifies the `prefix_len` (currently only 16), the list of prefix values (`nullifiers`), and the block range (`block_from`, optional `block_to`). The response includes all matching nullifiers created within that range, the last block included in the response (`block_num`), and the current chain tip (`chain_tip`). -Returns MMR delta information needed to synchronize the chain MMR within a block range. +If the response is chunked (i.e., `block_num < block_to`), continue by issuing another request with `block_from = block_num + 1` to retrieve subsequent updates. -Caller specifies the `block_range`, starting from the last block already represented in its local MMR. The response contains the MMR delta for the requested range, but at most to (including) the chain tip. +**Limits:** `nullifier` (1000) ### SyncTransactions Returns transaction records for specific accounts within a block range. 
-### Status - -Request the status of the node components. The response contains the current version of the RPC component and the connection status of the other components, including their versions and the number of the most recent block in the chain (chain tip). - -### GetNoteError - -Returns the latest execution error for a network note, if any. This is useful for debugging notes that are failing to be consumed by the network transaction builder. - -This endpoint is only available when the network transaction builder is enabled and connected. If it is not configured, the endpoint returns `UNAVAILABLE`. - -#### Request - -```protobuf -message NoteId { - Digest id = 1; // The note ID -} -``` - -#### Response - -```protobuf -message GetNoteErrorResponse { - optional string error = 1; // The latest error message, if any - uint32 attempt_count = 2; // Number of failed execution attempts - optional fixed32 last_attempt_block_num = 3; // Block number of the last failed attempt, if any -} -``` - -If the note is not found in the network transaction builder's database, the endpoint returns `NOT_FOUND`. - ## Error Handling The Miden node uses standard gRPC error reporting mechanisms. When an RPC call fails, a `Status` object is returned containing: diff --git a/docs/internal/src/ntx-builder.md b/docs/internal/src/ntx-builder.md index 2fcf59ca00..27be99a0b7 100644 --- a/docs/internal/src/ntx-builder.md +++ b/docs/internal/src/ntx-builder.md @@ -39,10 +39,12 @@ coordinator syncs all known network accounts and their unconsumed notes from the monitors the mempool for events (via a gRPC event stream from the block-producer) which would impact network account state. -For each network account that has available notes, the coordinator spawns a dedicated -`AccountActor`. Each actor runs in its own async task and is responsible for creating transactions -that consume network notes targeting its account. 
Actors read their state from the database and -re-evaluate whenever notified by the coordinator. +For each network account, the coordinator spawns a dedicated `AccountActor`. Each actor runs in +its own async task and is responsible for creating transactions that consume network notes targeting +its account. On startup, each actor waits until its account has been committed to the chain before +producing any transactions. This means newly created network accounts will idle until their creation +transaction is included in a block. Once the committed state is available, the actor reads its state +from the database and re-evaluates whenever notified by the coordinator. Actors that have been idle (no available notes to consume) for longer than the **idle timeout** will be deactivated. The idle timeout is configurable via the `--ntx-builder.idle-timeout` CLI @@ -67,6 +69,7 @@ requests to this server. In bundled mode the server is started automatically on wired to the RPC; in distributed mode operators must pass the NTB's address to the RPC via `--ntx-builder.url` (or `MIDEN_NODE_NTX_BUILDER_URL`). -Currently the only endpoint is `GetNoteError(note_id)` which returns the latest execution error -for a given network note, along with the attempt count and the block number of the last attempt. -This is useful for debugging notes that fail to be consumed. +Currently the only endpoint is `GetNetworkNoteStatus(note_id)` which returns the lifecycle status +of a network note (pending, processed, or discarded), along with the latest execution error, +attempt count, and block number of the last attempt. This is useful for debugging notes that fail +to be consumed. 
diff --git a/proto/Cargo.toml b/proto/Cargo.toml index ee79d7adc1..95ddd8ef88 100644 --- a/proto/Cargo.toml +++ b/proto/Cargo.toml @@ -25,6 +25,11 @@ tonic-prost-build = { workspace = true } [build-dependencies] build-rs = { workspace = true } +codegen = { workspace = true } fs-err = { workspace = true } miette = { version = "7.6" } protox = { workspace = true } + +[package.metadata.cargo-machete] +# Machete misses these because they're required in files generated by build.rs. +ignored = ["protox", "tonic-prost-build"] diff --git a/proto/build.rs b/proto/build.rs index c4c2f9b924..e2498808b8 100644 --- a/proto/build.rs +++ b/proto/build.rs @@ -1,87 +1,105 @@ +use std::ffi::OsStr; +use std::path::{Path, PathBuf}; + use fs_err as fs; -use miette::{Context, IntoDiagnostic}; +use miette::{IntoDiagnostic, miette}; use protox::prost::Message; -const RPC_PROTO: &str = "rpc.proto"; -// Unified internal store API (store.Rpc, store.BlockProducer, store.NtxBuilder). -// We compile the same file three times to preserve existing descriptor names. 
-const STORE_RPC_PROTO: &str = "internal/store.proto";
-const STORE_NTX_BUILDER_PROTO: &str = "internal/store.proto";
-const STORE_BLOCK_PRODUCER_PROTO: &str = "internal/store.proto";
-const BLOCK_PRODUCER_PROTO: &str = "internal/block_producer.proto";
-const REMOTE_PROVER_PROTO: &str = "remote_prover.proto";
-const VALIDATOR_PROTO: &str = "internal/validator.proto";
-const NTX_BUILDER_PROTO: &str = "internal/ntx_builder.proto";
-
-const RPC_DESCRIPTOR: &str = "rpc_file_descriptor.bin";
-const STORE_RPC_DESCRIPTOR: &str = "store_rpc_file_descriptor.bin";
-const STORE_NTX_BUILDER_DESCRIPTOR: &str = "store_ntx_builder_file_descriptor.bin";
-const STORE_BLOCK_PRODUCER_DESCRIPTOR: &str = "store_block_producer_file_descriptor.bin";
-const BLOCK_PRODUCER_DESCRIPTOR: &str = "block_producer_file_descriptor.bin";
-const REMOTE_PROVER_DESCRIPTOR: &str = "remote_prover_file_descriptor.bin";
-const VALIDATOR_DESCRIPTOR: &str = "validator_file_descriptor.bin";
-const NTX_BUILDER_DESCRIPTOR: &str = "ntx_builder_file_descriptor.bin";
-
-/// Generates Rust protobuf bindings from .proto files.
+/// Compiles each gRPC service definition into a
+/// [`FileDescriptorSet`](tonic_prost_build::FileDescriptorSet) and exposes it as a function:
 ///
-/// This is done only if `BUILD_PROTO` environment variable is set to `1` to avoid running the
-/// script on crates.io where repo-level .proto files are not available.
+/// ```rust, ignore
+/// fn <name>_api_descriptor() -> FileDescriptorSet;
+/// ```
 fn main() -> miette::Result<()> {
     build_rs::output::rerun_if_changed("./proto");
+    build_rs::output::rerun_if_changed("Cargo.toml");
 
     let out_dir = build_rs::input::out_dir();
-    let crate_root = build_rs::input::cargo_manifest_dir();
-    let proto_src_dir = crate_root.join("proto");
-    let includes = &[proto_src_dir];
-
-    let rpc_file_descriptor = protox::compile([RPC_PROTO], includes)?;
-    let rpc_path = out_dir.join(RPC_DESCRIPTOR);
-    fs::write(&rpc_path, rpc_file_descriptor.encode_to_vec())
-        .into_diagnostic()
-        .wrap_err("writing rpc file descriptor")?;
-
-    let remote_prover_file_descriptor = protox::compile([REMOTE_PROVER_PROTO], includes)?;
-    let remote_prover_path = out_dir.join(REMOTE_PROVER_DESCRIPTOR);
-    fs::write(&remote_prover_path, remote_prover_file_descriptor.encode_to_vec())
-        .into_diagnostic()
-        .wrap_err("writing remote prover file descriptor")?;
-
-    let store_rpc_file_descriptor = protox::compile([STORE_RPC_PROTO], includes)?;
-    let store_rpc_path = out_dir.join(STORE_RPC_DESCRIPTOR);
-    fs::write(&store_rpc_path, store_rpc_file_descriptor.encode_to_vec())
-        .into_diagnostic()
-        .wrap_err("writing store rpc file descriptor")?;
-
-    let store_ntx_builder_file_descriptor = protox::compile([STORE_NTX_BUILDER_PROTO], includes)?;
-    let store_ntx_builder_path = out_dir.join(STORE_NTX_BUILDER_DESCRIPTOR);
-    fs::write(&store_ntx_builder_path, store_ntx_builder_file_descriptor.encode_to_vec())
-        .into_diagnostic()
-        .wrap_err("writing store ntx builder file descriptor")?;
-
-    let store_block_producer_file_descriptor =
-        protox::compile([STORE_BLOCK_PRODUCER_PROTO], includes)?;
-    let store_block_producer_path = out_dir.join(STORE_BLOCK_PRODUCER_DESCRIPTOR);
-    fs::write(&store_block_producer_path, store_block_producer_file_descriptor.encode_to_vec())
-        .into_diagnostic()
-        .wrap_err("writing store block producer file descriptor")?;
-
-    let block_producer_file_descriptor =
protox::compile([BLOCK_PRODUCER_PROTO], includes)?;
-    let block_producer_path = out_dir.join(BLOCK_PRODUCER_DESCRIPTOR);
-    fs::write(&block_producer_path, block_producer_file_descriptor.encode_to_vec())
-        .into_diagnostic()
-        .wrap_err("writing block producer file descriptor")?;
-
-    let validator_file_descriptor = protox::compile([VALIDATOR_PROTO], includes)?;
-    let validator_path = out_dir.join(VALIDATOR_DESCRIPTOR);
-    fs::write(&validator_path, validator_file_descriptor.encode_to_vec())
-        .into_diagnostic()
-        .wrap_err("writing validator file descriptor")?;
-
-    let ntx_builder_file_descriptor = protox::compile([NTX_BUILDER_PROTO], includes)?;
-    let ntx_builder_path = out_dir.join(NTX_BUILDER_DESCRIPTOR);
-    fs::write(&ntx_builder_path, ntx_builder_file_descriptor.encode_to_vec())
-        .into_diagnostic()
-        .wrap_err("writing ntx builder file descriptor")?;
+    let schema_dir = build_rs::input::cargo_manifest_dir().join("proto");
+
+    // Codegen which will hold the file descriptor functions.
+    //
+    // `protox::prost::Message` is a trait which brings into scope the encoding and decoding of file
+    // descriptors. This is required because we serialize the descriptors in code as a `Vec<u8>`
+    // and then decode it again inline.
+    let mut code = codegen::Scope::new();
+    code.import("tonic_prost_build", "FileDescriptorSet");
+    code.import("protox::prost", "Message");
+
+    // We split our gRPC services into public and internal.
+    //
+    // This is easy to do since public services are listed in the root of the schema folder,
+    // and internal services are nested in the `internal` folder.
+    for public_api in proto_files_in_directory(&schema_dir)? {
+        let file_descriptor_fn = generate_file_descriptor(&public_api, &schema_dir)?;
+        code.push_fn(file_descriptor_fn);
+    }
+
+    // Internal gRPC services need an additional feature gate `#[cfg(feature = "internal")]`.
+    for internal_api in proto_files_in_directory(&schema_dir.join("internal"))?
{
+        let mut file_descriptor_fn = generate_file_descriptor(&internal_api, &schema_dir)?;
+        file_descriptor_fn.attr("cfg(feature = \"internal\")");
+        code.push_fn(file_descriptor_fn);
+    }
+
+    fs::write(out_dir.join("file_descriptors.rs"), code.to_string()).into_diagnostic()?;
 
     Ok(())
 }
+
+/// The list of `*.proto` files in the given directory.
+///
+/// Does _not_ recurse into folders; only top level files are returned.
+fn proto_files_in_directory(directory: &Path) -> Result<Vec<PathBuf>, miette::Error> {
+    let mut proto_files = Vec::new();
+    for entry in fs::read_dir(directory).into_diagnostic()? {
+        let entry = entry.into_diagnostic()?;
+
+        // Skip non-files
+        if !entry.file_type().into_diagnostic()?.is_file() {
+            continue;
+        }
+
+        // Skip non-protobuf files
+        if PathBuf::from(entry.file_name()).extension().is_none_or(|ext| ext != "proto") {
+            continue;
+        }
+
+        proto_files.push(entry.path());
+    }
+    Ok(proto_files)
+}
+
+/// Creates a function which emits the file descriptor of the given gRPC service file.
+///
+/// The function looks as follows:
+///
+/// ```rust, ignore
+/// fn <name>_api_descriptor() -> FileDescriptorSet {
+///     FileDescriptorSet::decode(vec![<bytes>].as_slice())
+///         .expect("we just encoded this so it should decode")
+/// }
+/// ```
+///
+/// where `<bytes>` is the bytes of the compiled gRPC service.
+fn generate_file_descriptor(
+    grpc_service: &Path,
+    includes: &Path,
+) -> Result<codegen::Function, miette::Error> {
+    let file_name = grpc_service
+        .file_stem()
+        .and_then(OsStr::to_str)
+        .ok_or_else(|| miette!("invalid file name for {grpc_service:?}"))?;
+
+    let file_descriptor = protox::compile([grpc_service], includes)?;
+    let file_descriptor = file_descriptor.encode_to_vec();
+
+    let mut f = codegen::Function::new(format!("{file_name}_api_descriptor"));
+    f.vis("pub")
+        .ret("FileDescriptorSet")
+        .line(format!("FileDescriptorSet::decode(vec!{file_descriptor:?}.as_slice())"))
+        .line(".expect(\"we just encoded this so it should decode\")");
+
+    Ok(f)
+}
diff --git a/proto/proto/internal/ntx_builder.proto b/proto/proto/internal/ntx_builder.proto
index ecc8514a85..5a12d83dd3 100644
--- a/proto/proto/internal/ntx_builder.proto
+++ b/proto/proto/internal/ntx_builder.proto
@@ -2,6 +2,7 @@ syntax = "proto3";
 
 package ntx_builder;
 
+import "rpc.proto";
 import "types/note.proto";
 
 // NTX BUILDER API
@@ -9,22 +10,12 @@ import "types/note.proto";
 // API for querying network transaction builder state.
 service Api {
-    // Returns the latest execution error for a network note, if any.
+    // Returns the current status of a network note.
     //
-    // This is useful for debugging notes that are failing to be consumed by
-    // the network transaction builder.
-    rpc GetNoteError(note.NoteId) returns (GetNoteErrorResponse) {}
-}
-
-// GET NOTE ERROR
-// ================================================================================================
-
-// Response containing the latest execution error for a network note.
-message GetNoteErrorResponse {
-    // The latest error message, if any.
-    optional string error = 1;
-    // Number of failed execution attempts.
-    uint32 attempt_count = 2;
-    // Block number of the last failed attempt, if any.
- optional fixed32 last_attempt_block_num = 3; + // The status indicates where the note is in its lifecycle: pending execution, + // processed (consumed by a transaction in the mempool), or discarded after too + // many failed attempts. + // + // Returns `NOT_FOUND` if the note ID is not tracked by the network transaction builder. + rpc GetNetworkNoteStatus(note.NoteId) returns (rpc.GetNetworkNoteStatusResponse) {} } diff --git a/proto/proto/internal/store.proto b/proto/proto/internal/store.proto index 7de72ef0d6..d510d4666b 100644 --- a/proto/proto/internal/store.proto +++ b/proto/proto/internal/store.proto @@ -34,8 +34,8 @@ service Rpc { // Returns the latest details the specified account. rpc GetAccount(rpc.AccountRequest) returns (rpc.AccountResponse) {} - // Returns raw block data for the specified block number. - rpc GetBlockByNumber(blockchain.BlockNumber) returns (blockchain.MaybeBlock) {} + // Returns raw block data for the specified block number, optionally including the block proof. + rpc GetBlockByNumber(blockchain.BlockRequest) returns (blockchain.MaybeBlock) {} // Retrieves block header by given block number. Optionally, it also returns the MMR path // and current chain length to authenticate the block's inclusion. diff --git a/proto/proto/internal/validator.proto b/proto/proto/internal/validator.proto index e3bb02a61c..76d22828b8 100644 --- a/proto/proto/internal/validator.proto +++ b/proto/proto/internal/validator.proto @@ -32,4 +32,13 @@ message ValidatorStatus { // The validator's status. string status = 2; + + // The validator's current chain tip (highest signed block number). + fixed32 chain_tip = 3; + + // The total number of transactions validated by this validator. + fixed64 validated_transactions_count = 4; + + // The total number of blocks signed by this validator. 
+ fixed64 signed_blocks_count = 5; } diff --git a/proto/proto/rpc.proto b/proto/proto/rpc.proto index ed7841f574..ede29f1518 100644 --- a/proto/proto/rpc.proto +++ b/proto/proto/rpc.proto @@ -40,8 +40,8 @@ service Api { // Returns the latest details of the specified account. rpc GetAccount(AccountRequest) returns (AccountResponse) {} - // Returns raw block data for the specified block number. - rpc GetBlockByNumber(blockchain.BlockNumber) returns (blockchain.MaybeBlock) {} + // Returns raw block data for the specified block number, optionally including the block proof. + rpc GetBlockByNumber(blockchain.BlockRequest) returns (blockchain.MaybeBlock) {} // Retrieves block header by given block number. Optionally, it also returns the MMR path // and current chain length to authenticate the block's inclusion. @@ -100,11 +100,14 @@ service Api { // NOTE DEBUGGING ENDPOINTS // -------------------------------------------------------------------------------------------- - // Returns the latest execution error for a network note, if any. + // Returns the current status of a network note. // - // This is useful for debugging notes that are failing to be consumed by - // the network transaction builder. - rpc GetNoteError(note.NoteId) returns (GetNoteErrorResponse) {} + // The status indicates where the note is in its lifecycle: pending execution, + // processed (consumed by a transaction in the mempool), or discarded after too + // many failed attempts. + // + // Returns `NOT_FOUND` if the note ID is not tracked by the network transaction builder. + rpc GetNetworkNoteStatus(note.NoteId) returns (GetNetworkNoteStatusResponse) {} } // RPC STATUS @@ -494,30 +497,30 @@ message SyncNotesResponse { // SYNC CHAIN MMR // ================================================================================================ -// The finality level for chain data queries. -enum Finality { - // Return data up to the latest committed block. 
- FINALITY_UNSPECIFIED = 0; - // Return data up to the latest committed block. - FINALITY_COMMITTED = 1; - // Return data only up to the latest proven block. - FINALITY_PROVEN = 2; +// The chain tip variant to sync up to. +enum ChainTip { + CHAIN_TIP_UNSPECIFIED = 0; + // Sync up to the latest committed block (chain tip). + CHAIN_TIP_COMMITTED = 1; + // Sync up to the latest proven block. + CHAIN_TIP_PROVEN = 2; } // Chain MMR synchronization request. message SyncChainMmrRequest { - // Block range from which to synchronize the chain MMR. - // - // The response will contain MMR delta starting after `block_range.block_from` up to - // `block_range.block_to` or the effective tip (whichever is lower). Set `block_from` to the - // last block already present in the caller's MMR so the delta begins at the next block. - BlockRange block_range = 1; + // Block number from which to synchronize (inclusive). Set this to the last block + // already present in the caller's MMR so the delta begins at the next block. + fixed32 block_from = 1; - // The finality level to use when clamping the upper bound of the block range. - // - // When set to `FINALITY_UNSPECIFIED` or `FINALITY_COMMITTED`, the upper bound is clamped to the chain tip. - // When set to `FINALITY_PROVEN`, the upper bound is clamped to the latest proven block. - Finality finality = 2; + // Upper bound for the block range. Determines how far ahead to sync. + oneof upper_bound { + // Sync up to this specific block number (inclusive), clamped to the committed chain tip. + fixed32 block_num = 2; + // Sync up to a chain tip variant (committed or proven). + ChainTip chain_tip = 3; + } + + reserved 4; } // Represents the result of syncing chain MMR. 
@@ -647,17 +650,33 @@ message TransactionRecord { repeated note.NoteInclusionInBlockProof output_note_proofs = 3; } -// GET NOTE ERROR +// GET NETWORK NOTE STATUS // ================================================================================================ -// Response containing the latest execution error for a network note. -message GetNoteErrorResponse { - // The latest error message, if any. - optional string error = 1; +// Lifecycle status of a network note within the transaction builder. +enum NetworkNoteStatus { + // Default / unspecified status. + NETWORK_NOTE_STATUS_UNSPECIFIED = 0; + // The note is awaiting execution or being retried after transient failures. + NETWORK_NOTE_STATUS_PENDING = 1; + // The note has been consumed by a transaction that was sent to the block producer. + NETWORK_NOTE_STATUS_NULLIFIER_INFLIGHT = 2; + // The note exceeded the maximum retry count and will not be retried. + NETWORK_NOTE_STATUS_DISCARDED = 3; + // The note's consuming transaction has been committed on-chain. + NETWORK_NOTE_STATUS_NULLIFIER_COMMITTED = 4; +} + +// Response containing the lifecycle status and latest execution error for a network note. +message GetNetworkNoteStatusResponse { + // Current lifecycle status of the note. + NetworkNoteStatus status = 1; + // The latest error message from execution, if any. + optional string last_error = 2; // Number of failed execution attempts. - uint32 attempt_count = 2; + uint32 attempt_count = 3; // Block number of the last failed attempt, if any. - optional fixed32 last_attempt_block_num = 3; + optional fixed32 last_attempt_block_num = 4; } // RPC LIMITS diff --git a/proto/proto/types/blockchain.proto b/proto/proto/types/blockchain.proto index e87a3648da..e865258768 100644 --- a/proto/proto/types/blockchain.proto +++ b/proto/proto/types/blockchain.proto @@ -21,11 +21,25 @@ message ProposedBlock { bytes proposed_block = 1; } -// Represents a block or nothing. 
+// Request for retrieving a block by its number, optionally including the block proof.
+message BlockRequest {
+  // The block number of the target block.
+  fixed32 block_num = 1;
+  // Whether to include the block proof in the response.
+  optional bool include_proof = 2;
+}
+
+// Response containing the block data and optionally its proof.
+//
+// If the requested block is not found, both fields are empty. Some blocks may not yet be
+// proven, so a block may be returned without a proof even when the proof was requested.
 message MaybeBlock {
   // The requested block data encoded using [miden_serde_utils::Serializable] implementation for
-  // [miden_protocol::block::Block].
+  // [miden_protocol::block::SignedBlock].
   optional bytes block = 1;
+  // The block proof encoded using [miden_serde_utils::Serializable] implementation for
+  // [miden_protocol::block::BlockProof], if requested and available.
+  optional bytes proof = 2;
 }
 
 // Represents a block number.
diff --git a/proto/src/lib.rs b/proto/src/lib.rs
index 6cbc6eb015..f38cc3ad50 100644
--- a/proto/src/lib.rs
+++ b/proto/src/lib.rs
@@ -1,65 +1 @@
-use protox::prost::Message;
-use tonic_prost_build::FileDescriptorSet;
-
-/// Returns the Protobuf file descriptor for the RPC API.
-pub fn rpc_api_descriptor() -> FileDescriptorSet {
-    let bytes = include_bytes!(concat!(env!("OUT_DIR"), "/", "rpc_file_descriptor.bin"));
-    FileDescriptorSet::decode(&bytes[..])
-        .expect("bytes should be a valid file descriptor created by build.rs")
-}
-
-/// Returns the Protobuf file descriptor for the remote prover API.
-pub fn remote_prover_api_descriptor() -> FileDescriptorSet {
-    let bytes = include_bytes!(concat!(env!("OUT_DIR"), "/", "remote_prover_file_descriptor.bin"));
-    FileDescriptorSet::decode(&bytes[..])
-        .expect("bytes should be a valid file descriptor created by build.rs")
-}
-
-/// Returns the Protobuf file descriptor for the store RPC API.
-#[cfg(feature = "internal")] -pub fn store_rpc_api_descriptor() -> FileDescriptorSet { - let bytes = include_bytes!(concat!(env!("OUT_DIR"), "/", "store_rpc_file_descriptor.bin")); - FileDescriptorSet::decode(&bytes[..]) - .expect("bytes should be a valid file descriptor created by build.rs") -} - -/// Returns the Protobuf file descriptor for the store NTX builder API. -#[cfg(feature = "internal")] -pub fn store_ntx_builder_api_descriptor() -> FileDescriptorSet { - let bytes = - include_bytes!(concat!(env!("OUT_DIR"), "/", "store_ntx_builder_file_descriptor.bin")); - FileDescriptorSet::decode(&bytes[..]) - .expect("bytes should be a valid file descriptor created by build.rs") -} - -/// Returns the Protobuf file descriptor for the store block producer API. -#[cfg(feature = "internal")] -pub fn store_block_producer_api_descriptor() -> FileDescriptorSet { - let bytes = - include_bytes!(concat!(env!("OUT_DIR"), "/", "store_block_producer_file_descriptor.bin")); - FileDescriptorSet::decode(&bytes[..]) - .expect("bytes should be a valid file descriptor created by build.rs") -} - -/// Returns the Protobuf file descriptor for the block-producer API. -#[cfg(feature = "internal")] -pub fn block_producer_api_descriptor() -> FileDescriptorSet { - let bytes = include_bytes!(concat!(env!("OUT_DIR"), "/", "block_producer_file_descriptor.bin")); - FileDescriptorSet::decode(&bytes[..]) - .expect("bytes should be a valid file descriptor created by build.rs") -} - -/// Returns the Protobuf file descriptor for the validator API. -pub fn validator_api_descriptor() -> FileDescriptorSet { - let bytes = include_bytes!(concat!(env!("OUT_DIR"), "/", "validator_file_descriptor.bin")); - FileDescriptorSet::decode(&bytes[..]) - .expect("bytes should be a valid file descriptor created by build.rs") -} - -/// Returns the Protobuf file descriptor for the NTX builder API. 
-#[cfg(feature = "internal")] -pub fn ntx_builder_api_descriptor() -> FileDescriptorSet { - let bytes = include_bytes!(concat!(env!("OUT_DIR"), "/", "ntx_builder_file_descriptor.bin")); - FileDescriptorSet::decode(&bytes[..]) - .expect("bytes should be a valid file descriptor created by build.rs") -} +include!(concat!(env!("OUT_DIR"), "/file_descriptors.rs")); diff --git a/scripts/run-node.sh b/scripts/run-node.sh new file mode 100755 index 0000000000..febd5cf965 --- /dev/null +++ b/scripts/run-node.sh @@ -0,0 +1,135 @@ +#!/usr/bin/env bash +set -euo pipefail + +# Configuration +SKIP_BOOTSTRAP="${SKIP_BOOTSTRAP:-false}" +BINARY="${MIDEN_NODE_BIN:-./target/debug/miden-node}" +KMS_KEY_ID="${KMS_KEY_ID:-}" +if [[ -n "$KMS_KEY_ID" ]]; then + AWS_REGION="${AWS_REGION:?error: AWS_REGION environment variable must be set when KMS_KEY_ID is set}" + export AWS_REGION +fi + +GENESIS_CONFIG="crates/store/src/genesis/config/samples/01-simple.toml" +STORE_DIR="/tmp/store" +VALIDATOR_DIR="/tmp/validator" +NTX_BUILDER_DIR="/tmp/ntx-builder" +ACCOUNTS_DIR="/tmp/accounts" + +# Store exposes 3 separate APIs. +STORE_RPC_URL="http://0.0.0.0:50001" +STORE_NTX_BUILDER_URL="http://0.0.0.0:50002" +STORE_BLOCK_PRODUCER_URL="http://0.0.0.0:50003" + +VALIDATOR_URL="http://0.0.0.0:50101" +BLOCK_PRODUCER_URL="http://0.0.0.0:50201" +RPC_URL="http://0.0.0.0:57291" + +PIDS=() + +cleanup() { + echo "Shutting down..." + for pid in "${PIDS[@]}"; do + kill "$pid" 2>/dev/null || true + done + wait + echo "All components stopped." 
+} +trap cleanup EXIT INT TERM + +# --- Kill processes on required ports --- + +PORTS=(50001 50002 50003 50101 50201 57291) +echo "=== Killing processes on required ports ===" +for port in "${PORTS[@]}"; do + pids=$(lsof -ti :"$port" 2>/dev/null || true) + if [[ -n "$pids" ]]; then + for pid in $pids; do + echo "Killing PID $pid on port $port" + kill -9 "$pid" 2>/dev/null || true + done + fi +done +sleep 1 + +# --- Bootstrap --- + +if [[ "$SKIP_BOOTSTRAP" != "true" ]]; then + echo "=== Bootstrapping ===" + + rm -rf "$VALIDATOR_DIR" "$ACCOUNTS_DIR" "$STORE_DIR" "$NTX_BUILDER_DIR" + mkdir -p "$NTX_BUILDER_DIR" + + echo "Bootstrapping validator..." + KMS_BOOTSTRAP_ARGS=() + if [[ -n "$KMS_KEY_ID" ]]; then + KMS_BOOTSTRAP_ARGS+=(--validator.key.kms-id "$KMS_KEY_ID") + fi + + $BINARY validator bootstrap \ + --data-directory "$VALIDATOR_DIR" \ + --genesis-block-directory "$VALIDATOR_DIR" \ + --accounts-directory "$ACCOUNTS_DIR" \ + --genesis-config-file "$GENESIS_CONFIG" \ + "${KMS_BOOTSTRAP_ARGS[@]+"${KMS_BOOTSTRAP_ARGS[@]}"}" + + echo "Bootstrapping store..." + $BINARY store bootstrap \ + --data-directory "$STORE_DIR" \ + --genesis-block "$VALIDATOR_DIR/genesis.dat" +else + echo "=== Skipping bootstrap (SKIP_BOOTSTRAP=true) ===" +fi + +# --- Start components --- + +echo "=== Starting components ===" + +echo "Starting store..." +$BINARY store start \ + --rpc.url "$STORE_RPC_URL" \ + --ntx-builder.url "$STORE_NTX_BUILDER_URL" \ + --block-producer.url "$STORE_BLOCK_PRODUCER_URL" \ + --data-directory "$STORE_DIR" \ + --enable-otel & +PIDS+=($!) + +KMS_START_ARGS=() +if [[ -n "$KMS_KEY_ID" ]]; then + KMS_START_ARGS+=(--key.kms-id "$KMS_KEY_ID") +fi + +echo "Starting validator..." +$BINARY validator start "$VALIDATOR_URL" \ + --enable-otel \ + --data-directory "$VALIDATOR_DIR" \ + "${KMS_START_ARGS[@]+"${KMS_START_ARGS[@]}"}" & +PIDS+=($!) + +# Give store and validator a moment to bind their ports. +sleep 2 + +echo "Starting block producer..." 
+$BINARY block-producer start "$BLOCK_PRODUCER_URL" \ + --store.url "http://127.0.0.1:50003" \ + --validator.url "http://127.0.0.1:50101" & +PIDS+=($!) + +echo "Starting RPC server..." +$BINARY rpc start \ + --url "$RPC_URL" \ + --store.url "http://127.0.0.1:50001" \ + --block-producer.url "http://127.0.0.1:50201" \ + --validator.url "http://127.0.0.1:50101" & +PIDS+=($!) + +echo "Starting network transaction builder..." +$BINARY ntx-builder start \ + --store.url "http://127.0.0.1:50002" \ + --block-producer.url "http://127.0.0.1:50201" \ + --validator.url "http://127.0.0.1:50101" \ + --data-directory "$NTX_BUILDER_DIR" & +PIDS+=($!) + +echo "=== All components running. Ctrl+C to stop. ===" +wait
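
The new `run-node.sh` script guards its optional KMS argument arrays with the `"${KMS_BOOTSTRAP_ARGS[@]+"${KMS_BOOTSTRAP_ARGS[@]}"}"` expansion so that `set -u` does not abort when the array is empty (bash versions before 4.4 treat an empty array as unset under `nounset`). A minimal standalone sketch of that idiom, using a hypothetical `ENABLE_FLAG` variable and `--flag value` arguments in place of the script's KMS options:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Build an optional argument list, mirroring the KMS_BOOTSTRAP_ARGS pattern.
# ENABLE_FLAG and --flag are illustrative names, not part of the node CLI.
EXTRA_ARGS=()
if [[ "${ENABLE_FLAG:-}" == "true" ]]; then
  EXTRA_ARGS+=(--flag value)
fi

# Under `set -u`, expanding "${EXTRA_ARGS[@]}" on an empty array aborts in
# bash < 4.4; the `${arr[@]+word}` form expands to zero words instead.
count_args() { echo "argc=$#"; }
count_args "${EXTRA_ARGS[@]+"${EXTRA_ARGS[@]}"}"
```

With `ENABLE_FLAG` unset the guarded expansion contributes no arguments at all (rather than one empty string), and with `ENABLE_FLAG=true` it passes the two array elements through with their quoting intact.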