⚠️ Work in Progress: Stwo is currently under active development and is not recommended for production use.
Stwo is a next-generation implementation of a Circle STARK (CSTARK) prover and verifier framework, written in Rust 🦀. While it includes example implementations for demonstration purposes, the primary purpose of this repository is to serve as a foundational framework for building more complex, production-grade provers.
The framework provides a high-performance, flexible foundation for generating and verifying STARK proofs using the latest cryptographic research and innovations in STARK proof systems. It is designed to be extended and customized for specific use cases beyond the simple examples included in this repository.
- Circle STARKs: Implementation based on the latest cryptographic research and innovations in STARK proof systems
- High Performance: Designed for extreme speed and efficiency with optimized SIMD operations
- Rust Toolchain: This project requires Rust nightly (see `rust-toolchain.toml` for the specific version)
Clone the repository and build the project:

    git clone https://github.com/starkware-libs/stwo.git
    cd stwo
    cargo build --release

The project includes example implementations in the `crates/examples` directory:

    # Run examples
    cargo run --release --example <example_name>

This is a Rust workspace containing multiple crates:

- `crates/stwo` - Core library implementing the Circle STARK prover and verifier
- `crates/constraint-framework` - Framework for expressing and evaluating constraints
- `crates/air-utils` - Utilities for working with Arithmetic Intermediate Representation (AIR)
- `crates/air-utils-derive` - Derive macros for AIR utilities
- `crates/examples` - Example implementations demonstrating basic usage
- `crates/std-shims` - Standard library shims for `no_std` compatibility

While the examples in this repository demonstrate basic usage patterns, Stwo is designed as a framework for building sophisticated provers that handle complex real-world scenarios. A prominent example of such an application is:
stwo-cairo is a production-oriented prover built on top of the Stwo framework that enables proving Cairo program executions.
Run a single-threaded Poseidon2 hash proof benchmark:

    ./poseidon_benchmark.sh

Run all benchmarks using Cargo:

    cargo bench

Note: to keep benchmark output free of optional `println!` noise, set `STWO_QUIET=1` when running benchmarks, for example `STWO_QUIET=1 cargo bench ...`.
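The note above describes an environment-variable gate around optional output. As a rough illustration of how such a gate could look, here is a minimal sketch. The helper names `quiet_enabled` and `is_quiet`, and the rule that only the exact value `1` enables quiet mode, are assumptions made for this example and are not taken from the Stwo source.

```rust
use std::env;

/// Hypothetical decision logic: quiet mode is on only when the value is exactly "1".
/// Kept separate from the environment lookup so it can be checked in isolation.
fn is_quiet(value: Option<&str>) -> bool {
    matches!(value, Some("1"))
}

/// Hypothetical helper that reads `STWO_QUIET` from the process environment.
fn quiet_enabled() -> bool {
    is_quiet(env::var("STWO_QUIET").ok().as_deref())
}

fn main() {
    // Optional diagnostic output is skipped when quiet mode is enabled.
    if !quiet_enabled() {
        println!("optional diagnostic output");
    }
}
```

Running the program with `STWO_QUIET=1` set suppresses the line; running it without the variable prints it.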
Visual representation of benchmark results can be found at: https://starkware-libs.github.io/stwo/dev/bench/index.html
- GPU: 1 * `NVIDIA GeForce RTX 4090`
- CPU: `AMD EPYC 9224 with 16 cores`
- Memory: `94GB`
- CUDA Toolkit: `13.0.1_580.82.07`

- Blake2s commit channel (Wide Fibonacci example)
  - Tests (Prove)
    - SIMD: `RUSTFLAGS="-C target-cpu=native -C opt-level=3" MIN_LOG=16 MAX_LOG=24 RUST_LOG=info RAYON_NUM_THREADS=16 cargo test test_wide_fib_prove_with_blake_simd --release --features parallel -- --nocapture`
    - GPU: `MIN_LOG=16 MAX_LOG=23 RAYON_NUM_THREADS=16 cargo test --release test_wide_fib_prove_with_blake_cuda --features parallel -- --nocapture`
  - Benchmarks (Throughput)
    - SIMD: `STWO_QUIET=1 LOG_N_INSTANCES=23 RUSTFLAGS="-C target-cpu=native -C opt-level=3" RAYON_NUM_THREADS=16 cargo bench --bench wide_fibonacci_blake2s --features parallel -- --nocapture`
    - GPU: `STWO_QUIET=1 LOG_N_INSTANCES=23 RAYON_NUM_THREADS=16 cargo bench --bench wide_fibonacci_cuda_blake2s --features parallel -- --nocapture`

| Log(Size) | Prove SIMD ms | Prove GPU ms | Speedup | Thr SIMD (Kelem/s) | Thr GPU (Kelem/s) | Thr Speedup |
|---|---|---|---|---|---|---|
| 16 | 199 | 222 | 0.90x | 429 | 2131 | 4.96x |
| 17 | 267 | 34 | 7.85x | 478 | 3844 | 8.05x |
| 18 | 450 | 47 | 9.57x | 671 | 6672 | 9.94x |
| 19 | 757 | 57 | 13.28x | 909 | 10123 | 11.14x |
| 20 | 1390 | 87 | 15.98x | 973 | 12638 | 12.99x |
| 21 | 2670 | 139 | 19.21x | 1127 | 15007 | 13.32x |
| 22 | 5166 | 254 | 20.34x | 870 | 16435 | 18.89x |
| 23 | 11014 | 488 | 22.57x | 783 | 17898 | 22.86x |
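
The Speedup column in the table above appears to be the SIMD prove time divided by the GPU prove time, rounded to two decimals. The following sketch reproduces that arithmetic for a few rows copied from the table. The function names are illustrative only, and the interpretation of the column is an assumption, although it is consistent with the listed values.

```rust
/// Speedup of the GPU prover over the SIMD prover: SIMD time / GPU time.
fn speedup(simd_ms: f64, gpu_ms: f64) -> f64 {
    simd_ms / gpu_ms
}

/// Formats the speedup the way the table does, e.g. "7.85x".
fn format_speedup(simd_ms: f64, gpu_ms: f64) -> String {
    format!("{:.2}x", speedup(simd_ms, gpu_ms))
}

fn main() {
    // (log size, SIMD prove ms, GPU prove ms), copied from the table above.
    let rows = [(16, 199.0, 222.0), (17, 267.0, 34.0), (23, 11014.0, 488.0)];
    for (log_size, simd_ms, gpu_ms) in rows.iter() {
        println!("log size {}: {}", log_size, format_speedup(*simd_ms, *gpu_ms));
    }
}
```

A value below 1.00x, as at log size 16, means the GPU run was slower than the SIMD run at that size.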

- Poseidon commit channel (Wide Fibonacci example)
  - Tests (Prove)
    - SIMD: `RUSTFLAGS="-C target-cpu=native -C opt-level=3" MIN_LOG=16 MAX_LOG=24 RUST_LOG=info RAYON_NUM_THREADS=16 cargo test test_wide_fib_prove_with_poseidon_simd --release --features parallel -- --nocapture`
    - GPU: `MIN_LOG=16 MAX_LOG=23 RAYON_NUM_THREADS=16 cargo test --release test_wide_fib_prove_with_poseidon_cuda --features parallel -- --nocapture`
  - Benchmarks (Throughput)
    - SIMD: `STWO_QUIET=1 LOG_N_INSTANCES=23 RUSTFLAGS="-C target-cpu=native -C opt-level=3" RAYON_NUM_THREADS=16 cargo bench --bench wide_fibonacci_poseidon --features parallel -- --nocapture`
    - GPU: `STWO_QUIET=1 LOG_N_INSTANCES=23 RAYON_NUM_THREADS=16 cargo bench --bench wide_fibonacci_cuda_poseidon --features parallel -- --nocapture`

| Log(Size) | Prove SIMD ms | Prove GPU ms | Speedup | Thr SIMD (Kelem/s) | Thr GPU (Kelem/s) | Thr Speedup |
|---|---|---|---|---|---|---|
| 16 | 2566 | 278 | 9.23x | 34 | 798 | 23.73x |
| 17 | 4919 | 105 | 46.85x | 34 | 1301 | 37.75x |
| 18 | 9745 | 133 | 73.27x | 35 | 1939 | 54.65x |
| 19 | 19146 | 196 | 97.68x | 36 | 2642 | 74.27x |
| 20 | 38325 | 321 | 119.39x | 36 | 3282 | 91.78x |
| 21 | 76322 | 558 | 136.78x | 35 | 3695 | 104.18x |
| 22 | 152660 | 1037 | 147.21x | 36 | 3975 | 111.35x |
| 23 | 305667 | 2010 | 152.07x | 36 | 4124 | 116.10x |

- Blake2s commit channel (Poseidon example)
  - Tests (Prove)
    - SIMD: `RUSTFLAGS="-C target-cpu=native -C opt-level=3" MIN_LOG=16 MAX_LOG=23 RUST_LOG=info RAYON_NUM_THREADS=16 cargo test test_simd_poseidon_prove --release --features parallel -- --nocapture`
    - GPU: `MIN_LOG=16 MAX_LOG=22 RAYON_NUM_THREADS=16 cargo test --release test_poseidon_prove_with_blake_cuda --features parallel -- --nocapture`
  - Benchmarks (Throughput)
    - SIMD: `STWO_QUIET=1 LOG_N_INSTANCES=22 RUSTFLAGS="-C target-cpu=native -C opt-level=3" RAYON_NUM_THREADS=16 cargo bench --bench poseidon_blake2s --features parallel -- --nocapture`
    - GPU: `STWO_QUIET=1 LOG_N_INSTANCES=22 RAYON_NUM_THREADS=16 cargo bench --bench poseidon_cuda_blake2s --features parallel -- --nocapture`

| Log(Size) | Prove SIMD ms | Prove GPU ms | Speedup | Thr SIMD (Kelem/s) | Thr GPU (Kelem/s) | Thr Speedup |
|---|---|---|---|---|---|---|
| 16 | 3545 | 198 | 17.90x | 207 | 1280 | 6.19x |
| 17 | 6653 | 221 | 30.10x | 250 | 2179 | 8.73x |
| 18 | 13355 | 231 | 57.81x | 288 | 3497 | 12.12x |
| 19 | 26097 | 272 | 95.94x | 363 | 4962 | 13.66x |
| 20 | 51518 | 312 | 165.12x | 457 | 6312 | 13.82x |
| 21 | 103045 | 403 | 255.69x | 331 | 7186 | 21.71x |
| 22 | 206096 | 579 | 355.95x | 309 | 7763 | 25.12x |
| 23 | 422364 | N/A | N/A | N/A | N/A | N/A |

- Poseidon commit channel (Poseidon example)
  - Tests (Prove)
    - SIMD: `RUSTFLAGS="-C target-cpu=native -C opt-level=3" MIN_LOG=16 MAX_LOG=23 RUST_LOG=info RAYON_NUM_THREADS=16 cargo test test_simd_poseidon_prove_with_poseidon --release --features parallel -- --nocapture`
    - GPU: `MIN_LOG=16 MAX_LOG=22 RAYON_NUM_THREADS=16 cargo test --release test_poseidon_prove_with_poseidon_cuda --features parallel -- --nocapture`
  - Benchmarks (Throughput)
    - SIMD: `STWO_QUIET=1 LOG_N_INSTANCES=22 RUSTFLAGS="-C target-cpu=native -C opt-level=3" RAYON_NUM_THREADS=16 cargo bench --bench poseidon_poseidon --features parallel -- --nocapture`
    - GPU: `STWO_QUIET=1 LOG_N_INSTANCES=22 RAYON_NUM_THREADS=16 cargo bench --bench poseidon_cuda_poseidon --features parallel -- --nocapture`

| Log(Size) | Prove SIMD ms | Prove GPU ms | Speedup | Thr SIMD (Kelem/s) | Thr GPU (Kelem/s) | Thr Speedup |
|---|---|---|---|---|---|---|
| 16 | 3341 | 245 | 13.64x | 17 | 542 | 31.27x |
| 17 | 6613 | 290 | 22.80x | 18 | 845 | 47.19x |
| 18 | 13384 | 348 | 38.46x | 18 | 1208 | 66.46x |
| 19 | 26085 | 436 | 59.83x | 18 | 1543 | 84.68x |
| 20 | 51397 | 598 | 85.95x | 18 | 1803 | 98.11x |
| 21 | 103973 | 964 | 107.86x | 18 | 1957 | 107.54x |
| 22 | 207684 | 1623 | 127.96x | 18 | 2060 | 112.94x |
| 23 | 411598 | N/A | N/A | N/A | N/A | N/A |
- Added Poseidon commit channel variants for both Wide-Fibonacci and Poseidon examples (SIMD and CUDA), and corresponding benches/tests.
- Renamed previous Blake2s-channel benches to `*_blake2s` to distinguish them from `*_poseidon` benches.
- Tests now reuse shared example prove functions across SIMD/CUDA and Blake2s/Poseidon channels, reducing duplication.

- stwo-gpu: The M31 field arithmetic, extension field operations, FRI operations, and quotient accumulator are inspired by stwo-gpu.
- era-bellman-cuda: Its low-level field arithmetic code served as a reference when implementing the Poseidon252 hash computations.
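
For readers unfamiliar with the M31 field mentioned above: M31 is the prime field modulo the Mersenne prime 2^31 - 1, and reduction can exploit the identity 2^31 ≡ 1 (mod p). The sketch below illustrates addition and multiplication in that field. It is a simplified illustration written for this README, not the code used by Stwo or stwo-gpu, and the type and method names are assumptions.

```rust
/// The Mersenne prime 2^31 - 1.
const P: u32 = (1 << 31) - 1;

/// Illustrative field element; the invariant is that the inner value is < P.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct M31(u32);

/// Reduces a value below 2^62 modulo P by folding the high bits onto the
/// low bits, which is valid because 2^31 ≡ 1 (mod P).
fn reduce(x: u64) -> u32 {
    let folded = (x & P as u64) + (x >> 31); // now below 2^32
    let folded = (folded & P as u64) + (folded >> 31); // now at most P
    if folded == P as u64 { 0 } else { folded as u32 }
}

impl M31 {
    fn new(v: u32) -> Self {
        M31(v % P)
    }

    fn add(self, rhs: Self) -> Self {
        // Both operands are < 2^31, so the u32 sum cannot overflow.
        let sum = self.0 + rhs.0;
        M31(if sum >= P { sum - P } else { sum })
    }

    fn mul(self, rhs: Self) -> Self {
        // The product of two values < 2^31 fits in 62 bits.
        M31(reduce(self.0 as u64 * rhs.0 as u64))
    }
}

fn main() {
    let minus_one = M31::new(P - 1);
    // (-1) * (-1) = 1 in the field.
    println!("{:?}", minus_one.mul(minus_one));
}
```

The folding step replaces a general division with shifts, masks, and one conditional subtraction, which is one reason Mersenne-prime fields are attractive for SIMD and GPU backends.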
This project is licensed under the Apache License 2.0.
See the LICENSE file for more information.
