feat: allow custom lookup paths for nonlinear ops and fix Python GIL deadlock by changshenhan · Pull Request #1023 · zkonduit/ezkl

changshenhan · 2026-03-09T01:33:15Z

Add optional custom lookup table for Sigmoid (PWL from JSON)

Problem

EZKL implements Sigmoid (and other nonlinearities) via fixed built‑in lookup tables. There is currently no way for users to:

Plug in a custom approximation (e.g. a different piecewise‑linear fit, non‑uniform segments, or externally calibrated breakpoints).
Integrate EZKL with existing PWL pipelines/tools.

This makes it harder to:

Experiment with accuracy vs. circuit‑size trade‑offs on real data distributions.
Share and reproduce explicit approximation schemes (e.g. a particular PWL fit used across systems).

Approach

Add an optional run arg:

pub struct RunArgs {
    // ...
    pub custom_lookup_path: Option<String>,
}

When custom_lookup_path is set:

Every ONNX Sigmoid node is implemented as

LookupOp::Custom { scale, path }

instead of

LookupOp::Sigmoid { scale }

The file at path is a JSON document of the following shape (the // comments are annotations for the reader, not part of the file):
```
{
  "breakpoints": [...], // length n+1
  "slopes":      [...], // length n
  "intercepts":  [...]  // length n
}
```
defining a piecewise‑linear map. The lookup table is filled by evaluating this PWL over the configured lookup_range.
The same Halo2 lookup constraint machinery as for LookupOp::Sigmoid is used; only the table contents are user‑defined.

When custom_lookup_path is None or unset, behavior is unchanged from current main (fully backward compatible).
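The table-filling step described above can be sketched in a few lines of Python. This is an illustration only, not the Rust code in src/circuit/ops/lookup.rs: the helper names (`load_pwl`, `eval_pwl`, `fill_table`) are hypothetical, `scale` is treated as a plain multiplier, and the rounding mode is an assumption.

```python
import bisect
import json

def load_pwl(text):
    """Parse a JSON string with the breakpoints/slopes/intercepts schema."""
    params = json.loads(text)
    bps = params["breakpoints"]
    slopes = params["slopes"]
    icepts = params["intercepts"]
    # n segments need n+1 breakpoints, n slopes and n intercepts.
    if not (len(bps) == len(slopes) + 1 == len(icepts) + 1):
        raise ValueError("expected n+1 breakpoints, n slopes, n intercepts")
    return bps, slopes, icepts

def eval_pwl(x, bps, slopes, icepts):
    """Evaluate slope * x + intercept on the segment that contains x."""
    if not bps[0] <= x <= bps[-1]:
        raise ValueError(f"input {x} is outside the breakpoints")
    # Index of the segment whose left breakpoint is <= x; the clamp makes
    # x == bps[-1] fall into the last segment.
    i = min(bisect.bisect_right(bps, x) - 1, len(slopes) - 1)
    return slopes[i] * x + icepts[i]

def fill_table(lookup_range, scale, bps, slopes, icepts):
    """Map every integer input in lookup_range to an integer output.

    Assumption: inputs and outputs share one multiplier `scale`, and
    outputs are rounded with Python's round(); the real code may differ.
    """
    lo, hi = lookup_range
    return {
        q: round(eval_pwl(q / scale, bps, slopes, icepts) * scale)
        for q in range(lo, hi + 1)
    }

# A made-up two-segment file, used only to exercise the helpers.
EXAMPLE = (
    '{"breakpoints": [-2.0, 0.0, 2.0],'
    ' "slopes": [0.25, 0.25], "intercepts": [0.5, 0.5]}'
)
bps, slopes, icepts = load_pwl(EXAMPLE)
table = fill_table((-4, 4), 2.0, bps, slopes, icepts)
```

Only the contents of `table` depend on the user's file. The number of rows is fixed by `lookup_range`, which matches the claim that the constraint layout is unchanged.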

Design choices

| Choice | Reason |
| --- | --- |
| JSON file for PWL | Easy to generate from Python/other languages; no Rust recompile required; matches common PWL representations (breakpoints + slope/intercept per segment). |
| Global cache (`lazy_static` + `Mutex`) for loaded PWL | During prove, layout runs multiple times and can run on worker threads. Re‑reading the file on every pass is redundant and can cause hangs with a thread‑local cache. A single global cache ensures one load per path and safe reuse across passes/threads. |
| Only Sigmoid → Custom for now | Keeps the PR tightly scoped; the same mechanism could later be extended to other nonlinear ops if desired. |
| Python: release GIL during `prove` | `prove()` calls into halo2, which uses rayon for FFT/commit. If the main thread holds the GIL while blocking on rayon, deadlocks can occur. Wrapping the prove call in `py.allow_threads(...)` releases the GIL so that workers can run; this was necessary when using the custom lookup path from Python. |
| Input range enforced within PWL breakpoints | If the circuit’s lookup range (in float) extends outside the user’s breakpoints, we return a clear error instead of extrapolating, to avoid unsound behavior. |
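The "one load per path" behaviour of the global cache can be illustrated with a Python analogue. The PR itself uses Rust's `lazy_static` and `Mutex`; the names below (`get_pwl`, `fake_loader`, `pwl.json`) are hypothetical and exist only for this sketch.

```python
import threading

# One process-wide cache guarded by one lock, shared by all threads.
_PWL_CACHE = {}
_PWL_LOCK = threading.Lock()

def get_pwl(path, loader):
    """Return the parsed parameters for `path`, loading at most once."""
    with _PWL_LOCK:
        if path not in _PWL_CACHE:
            _PWL_CACHE[path] = loader(path)
        return _PWL_CACHE[path]

calls = []

def fake_loader(path):
    """Stand-in for reading and parsing the JSON file."""
    calls.append(path)
    return {"breakpoints": [0.0, 1.0], "slopes": [1.0], "intercepts": [0.0]}

# Eight workers request the same path, as repeated layout passes might.
threads = [
    threading.Thread(target=get_pwl, args=("pwl.json", fake_loader))
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because the check and the insert happen under the same lock, the loader runs once no matter how many threads ask for the path.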

Compatibility with existing `LookupOp`

Constraint system: No structural change. Custom uses the same table layout and lookup argument as LookupOp::Sigmoid; only the way table cells are filled differs (user PWL vs. built‑in σ(x)).
Serialization: LookupOp gains a new variant
```
Custom { scale, path }
```
Existing configs that do not reference Custom are unaffected.
RunArgs: custom_lookup_path is an Option<String> with #[serde(default)], so existing JSON configs and CLI usage remain valid.

Files touched

src/lib.rs: add custom_lookup_path to RunArgs, defaulting to None.
src/bindings/python.rs:
- add custom_lookup_path to PyRunArgs, and pass it through in conversions;
- wrap prove() in py.allow_threads(...) to release the GIL while halo2/rayon runs.
src/circuit/ops/lookup.rs:
- add LookupOp::Custom { scale, path };
- add PwlParams (breakpoints/slopes/intercepts), JSON load + global cache;
- implement f() for Custom by evaluating the PWL map over the integer‑scaled input;
- validate breakpoints strictly increasing and input range within breakpoints; clearer errors and log::warn! for soundness.
src/graph/utilities.rs: in the ONNX "Sigmoid" branch, if run_args.custom_lookup_path is Some(path), emit LookupOp::Custom { scale, path }, otherwise keep the existing LookupOp::Sigmoid { scale }.
src/circuit/table.rs: rely on nonlinearity.f() to produce table contents; for Custom this means evaluating the user PWL. Optional logs for custom lookup.
src/pfsys/mod.rs: add an optional heartbeat during create_proof and fix an mpsc::channel::<()>() type issue.
examples/notebooks/custom_lookup_demo.ipynb: new demo notebook with generate_pwl_json (uniform / curvature / quantile / custom), full pipeline, production tip.
examples/pwl_sigmoid_example.json: example PWL file.
docs/custom_lookup_table.md: JSON schema, step‑by‑step guide, caveats (including input range and margin).
README.md: short “Custom lookup table” line and link to the doc.
tests/py_integration_tests.rs: register custom_lookup_demo.ipynb.
tests/integration_tests.rs: add mock_custom_lookup_1l_sigmoid and optional custom_lookup_path in gen_circuit_settings_and_witness.

Accuracy and performance (clarified)

To avoid confusion: the main benefit of this PR is numerical accuracy and flexibility, not a dramatic asymptotic speedup.

For a small Conv+ReLU+Sigmoid toy model with num_rows ≈ 4,385:

With a conservative logrows = 17, the lookup circuit pays a larger SRS and longer proving time.
With the more natural choice logrows = ceil(log2(num_rows)) = 13, both the built‑in Sigmoid lookup and the custom PWL lookup prove in about 1.1 s on the same machine.
In that “balanced” configuration, the difference is almost entirely in accuracy, not in proving cost.
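As a quick check of the logrows arithmetic above: 2^12 = 4096 is too small for 4,385 rows and 2^13 = 8192 is enough. The helper name `min_logrows` is hypothetical, and any extra rows EZKL may reserve are deliberately ignored in this sketch.

```python
def min_logrows(num_rows):
    """Smallest k with 2**k >= num_rows, computed without floating point."""
    return (num_rows - 1).bit_length()

num_rows = 4385
k = min_logrows(num_rows)

# Ratio of row counts between the conservative and the balanced choice.
oversize_factor = 2**17 // 2**k
```

With `k` equal to 13, the conservative logrows = 17 allocates 16 times as many rows as the balanced setting.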

Single‑Sigmoid input level (real input distribution, same circuit/logrows):

Custom PWL table (pwl_params.json built from real data.json by non‑uniform quantiles):
- max |approx − σ(x)| ≈ 5.4e‑11
- mean |approx − σ(x)| ≈ 5.6e‑12
A quantized “default” lookup model (simulating input/output quantized at 1/128):
- max |approx − σ(x)| ≈ 4.7e‑3
- mean |approx − σ(x)| ≈ 1.9e‑3

Full model output level (Conv+ReLU+Sigmoid branch, same circuit/logrows):

With the custom PWL lookup:
- average absolute error ≈ 4.8e‑5
- max absolute error ≈ 1.8e‑4
With the default lookup (quantized model):
- average absolute error ≈ 1.8e‑3
- max absolute error ≈ 4.6e‑3

So, for this toy model and data, the custom PWL improves end‑to‑end model accuracy by roughly 1–2 orders of magnitude, at essentially the same proving cost when logrows is chosen based on num_rows.
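The "1–2 orders of magnitude" figure follows from the full-model numbers quoted above. The short calculation below uses only those four reported values.

```python
import math

# Full-model absolute errors as reported in this description.
default_err = {"mean": 1.8e-3, "max": 4.6e-3}
custom_err = {"mean": 4.8e-5, "max": 1.8e-4}

# Improvement factor and its base-10 logarithm for each metric.
ratios = {key: default_err[key] / custom_err[key] for key in default_err}
orders = {key: math.log10(value) for key, value in ratios.items()}
```

The mean error improves by a factor of about 37.5 and the max error by about 25.6, both between one and two orders of magnitude.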

Testing

Rust: Added mock_custom_lookup_1l_sigmoid in tests/integration_tests.rs, which runs the 1l_sigmoid example with a PWL file via --custom-lookup-path, then calibrate → compile → gen_witness → mock.
Python: Registered custom_lookup_demo.ipynb in tests/py_integration_tests.rs so the demo notebook is executed with the rest of the notebook suite.
Verified locally: full pipeline with custom_lookup_path set (gen_settings → compile → gen_witness → setup → prove → verify) succeeds; with custom_lookup_path = None, behavior matches upstream (Sigmoid uses built‑in lookup).
Additional experiments in a separate repo compare built‑in vs custom PWL vs true σ(x) on real input distributions and accuracy at both single‑Sigmoid and full‑model level.

Documentation

README: Added a short “Custom lookup table” sentence and link to the doc.
New doc: docs/custom_lookup_table.md with the JSON schema, step‑by‑step guide, caveats (input must be within breakpoints, production/margin tip), and usage for Python/CLI.
Notebook: custom_lookup_demo.ipynb documents how to produce the table (generate_pwl_json with uniform, curvature, quantile, or custom breakpoints) and the full EZKL pipeline.
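For readers who want a feel for what producing the table involves, here is a minimal sketch of the uniform-spacing case. It is not the notebook's generate_pwl_json: the function name `uniform_sigmoid_pwl`, its signature, and the choice to interpolate σ(x) exactly at the breakpoints are all assumptions made for illustration.

```python
import json
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def uniform_sigmoid_pwl(lo, hi, n_segments):
    """Interpolate sigmoid linearly between evenly spaced breakpoints."""
    step = (hi - lo) / n_segments
    bps = [lo + i * step for i in range(n_segments + 1)]
    slopes, intercepts = [], []
    for x0, x1 in zip(bps, bps[1:]):
        # Line through (x0, sigmoid(x0)) and (x1, sigmoid(x1)).
        m = (sigmoid(x1) - sigmoid(x0)) / (x1 - x0)
        slopes.append(m)
        intercepts.append(sigmoid(x0) - m * x0)
    return {"breakpoints": bps, "slopes": slopes, "intercepts": intercepts}

params = uniform_sigmoid_pwl(-8.0, 8.0, 64)
text = json.dumps(params)  # write this string to a file to use it as a PWL file
```

A quantile-based variant would replace the evenly spaced `bps` with quantiles of observed Sigmoid inputs and keep the rest of the loop unchanged.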

Review feedback addressed

Example in examples/notebooks: custom_lookup_demo.ipynb with full pipeline and generate_pwl_json.
Py integration test: custom_lookup_demo.ipynb registered in tests/py_integration_tests.rs.
Rust test: mock_custom_lookup_1l_sigmoid in tests/integration_tests.rs.
Usability (“how to produce the table”): Doc, notebook helper, and production tip; breakpoints can be uniform, curvature‑based, quantile‑based, or custom.
Error handling and soundness: Input range check (error if lookup range outside PWL breakpoints), strictly increasing breakpoints validation, clearer errors with path, and log::warn! on first load.
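The two checks named above (strictly increasing breakpoints, lookup range inside the breakpoints) could look roughly like this. This is a sketch of the idea, not the PR's Rust code: `validate_pwl`, its parameters, the error wording, and the treatment of `scale` as a plain multiplier are all assumptions.

```python
def validate_pwl(bps, lookup_range, scale, path="<pwl.json>"):
    """Raise ValueError if the PWL cannot safely cover the lookup range."""
    if len(bps) < 2:
        raise ValueError(f"{path}: need at least two breakpoints")
    # Every breakpoint must be strictly smaller than its successor.
    if any(b >= c for b, c in zip(bps, bps[1:])):
        raise ValueError(f"{path}: breakpoints must be strictly increasing")
    # Convert the integer lookup range to floats before comparing.
    lo, hi = lookup_range[0] / scale, lookup_range[1] / scale
    if lo < bps[0] or hi > bps[-1]:
        raise ValueError(
            f"{path}: lookup range [{lo}, {hi}] exceeds breakpoints "
            f"[{bps[0]}, {bps[-1]}]; extend the breakpoints or narrow lookup_range"
        )

# Passes silently: the range [-8, 8] lies inside [-10, 10].
validate_pwl([-10.0, 0.0, 10.0], (-1024, 1024), 128.0)
```

Failing early here, rather than extrapolating the outermost segment, is what prevents the table from silently containing values the user never specified.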

changshenhan · 2026-03-13T16:20:05Z

Hi @jasonmorton @alexander-camuto @JSeam2,

Sorry for the ping, but I wanted to share some compelling end-to-end benchmark results I just finished, which might be helpful for the review of this PR.

In a full Conv+ReLU+Sigmoid forward pass, using the custom 1024-segment PWL proposed in this PR:

Precision: Reduced the Mean Absolute Error (MAE) from $1.8 \times 10^{-3}$ (default lookup) to $4.8 \times 10^{-5}$ (this PR), a ~40x improvement in end-to-end accuracy.
Efficiency: Achieved this near double-precision result with zero additional proof overhead (keeping logrows=13 and similar proving time).
Stability: Confirmed that the py.allow_threads fix effectively eliminates the GIL deadlocks we previously encountered during high-concurrency proving.

I believe this significantly enhances ezkl's reliability for high-precision financial or medical use cases. I’d love to hear your thoughts on the design or if any further adjustments are needed to align with the upstream roadmap.

Thanks for your time and for maintaining such a great framework!

JSeam2 · 2026-03-13T19:14:52Z

I have triggered some initial tests; this looks interesting. For this to be mergeable we need a few more things:

An example in examples/notebooks, with the test added to py_integration_tests.
An additional Rust test to check the functionality.

Some caveats about usability:

How would a user produce this lookup table?
Additional error handling and feedback regarding soundness issues.

changshenhan · 2026-03-15T09:00:01Z

Thanks for the encouraging feedback, I really appreciate the guidance on making this more robust for the community. I've addressed your points with a focus on usability, soundness, and seamless integration:

1. Usability & documentation

Python utility: I've added a flexible generate_pwl_json helper in examples/notebooks/custom_lookup_demo.ipynb. It supports uniform, curvature-based, and quantile-based (data-driven) spacing so users can easily reach that ~10⁻¹¹ precision.
Step-by-step guide: Created docs/custom_lookup_table.md with the JSON schema and best practices, including a tip on using safety margins for lookup_range.
Reference repo: For a more complex real-world case, I've prepared a full experiment (Conv + ReLU + Sigmoid) here: ezkl-custom-lookup-experiment.

2. Soundness & error handling

Input range check: Following your suggestion, the circuit now validates that the lookup range stays within the PWL breakpoints. If it exceeds the range, it returns a clear error suggesting breakpoint extension or lookup_range adjustment.
Internal validation: In src/circuit/ops/lookup.rs, breakpoints must be strictly increasing; added log::warn! and file-path context in errors to help with debugging.

3. Testing & integration

Notebook integration: The demo notebook is registered in tests/py_integration_tests.rs so it stays in sync with the test suite.
Rust mock test: Added mock_custom_lookup_1l_sigmoid in tests/integration_tests.rs to cover the full calibrate → compile → witness → mock pipeline.

I've included examples/pwl_sigmoid_example.json as a reference for these tests. All changes are backward compatible.

Please let me know if there are any other areas to refine—looking forward to your thoughts!

changshenhan · 2026-03-26T03:58:34Z

Hi @JSeam2,

I've successfully addressed all the static analysis findings reported by zizmor.

What’s been updated:

Hardened Security: Added environment declarations for jobs accessing secrets to resolve secrets-outside-env warnings.

Refactored Workflows: Replaced several redundant third-party actions with native GitHub Runner commands (e.g., using gh CLI and rustup) to fix superfluous-actions and reduce supply chain surface.

Cleaned up: Removed the zizmor.yml override since the workflows are now natively compliant with the audit.

The CI for Static Analysis is now passing with a clean exit code. Ready for your further review!

Add optional custom lookup table for Sigmoid (PWL from JSON) and release GIL during prove

f51061c

changshenhan marked this pull request as draft March 13, 2026 16:06

changshenhan marked this pull request as ready for review March 13, 2026 16:09

changshenhan changed the title ~~Add optional custom lookup table for Sigmoid (PWL from JSON)~~ feat: allow custom lookup paths for nonlinear ops and fix Python GIL deadlock Mar 13, 2026

changshenhan added 2 commits March 14, 2026 00:43

chore: trigger ci for accuracy verification

5a87633

build: trigger ezkl ci pipeline

5e7a4c8

changshenhan added 4 commits March 15, 2026 16:22

Address review: add notebook example, tests, docs, and error handling

b0b8e4a

Soundness: enforce input range within PWL breakpoints; usability: add generate_pwl_json and 1e-11 note

3ed1bdc

Remove personal link from README and docs; keep link in PR comment only

43a3bb7

Notebook: add quantile spacing and production tip; docs: margin note, no personal link

e4a75ba

changshenhan added 2 commits March 22, 2026 12:06

ci: add zizmor.yml for workflow audit policy

bdebd46

ci: fix zizmor findings in workflows

3670801

Move secret usage into dedicated GitHub Environments and replace runner-superfluous actions with native script steps, so `zizmor .` exits cleanly without relying on a repo-level suppression config. Made-with: Cursor

changshenhan temporarily deployed to evm-verifier March 30, 2026 16:50 — with GitHub Actions Inactive

changshenhan had a problem deploying to evm-verifier March 30, 2026 16:50 — with GitHub Actions Error

changshenhan temporarily deployed to evm-verifier March 30, 2026 16:50 — with GitHub Actions Inactive

JSeam2 requested a review from jasonmorton March 30, 2026 16:51

changshenhan had a problem deploying to evm-verifier March 30, 2026 17:04 — with GitHub Actions Error

changshenhan deployed to evm-verifier March 30, 2026 17:04 — with GitHub Actions Active

changshenhan had a problem deploying to evm-verifier March 30, 2026 17:04 — with GitHub Actions Error

changshenhan wants to merge 9 commits into zkonduit:main from changshenhan:pr-custom-lookup