Optimize quotient decomposition and coset LDEs

Right now, the prover decomposes the quotient as follows:

1. Reshape the quotient into D columns.
2. Perform an iDFT.
3. Manually perform a batch coset shift.
4. Compute the DFT to obtain the evaluations over the LDE domain.
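
For reference, the four steps can be sketched on a toy field. This is only an illustration under stated assumptions, not the prover's code: the field (integers modulo 65537), the naive O(n^2) DFT, the strided column layout in `reshape_columns`, and all function names are hypothetical stand-ins for the Plonky3 types and DFT implementations.

```rust
// Toy field for illustration only: integers modulo the Fermat prime 65537.
const P: u64 = 65537;
const GENERATOR: u64 = 3; // generates the multiplicative group, of order 2^16

fn add(a: u64, b: u64) -> u64 { (a + b) % P }
fn mul(a: u64, b: u64) -> u64 { a * b % P }

fn pow(mut base: u64, mut exp: u64) -> u64 {
    let mut acc = 1;
    while exp > 0 {
        if exp & 1 == 1 { acc = mul(acc, base); }
        base = mul(base, base);
        exp >>= 1;
    }
    acc
}

fn inv(a: u64) -> u64 { pow(a, P - 2) }

/// Primitive root of unity of order 2^log_n.
fn root_of_unity(log_n: u32) -> u64 {
    assert!(log_n <= 16);
    pow(GENERATOR, (P - 1) >> log_n)
}

/// Naive DFT in natural order: out[j] = sum_i values[i] * w^(i*j).
fn naive_dft(values: &[u64], w: u64) -> Vec<u64> {
    (0..values.len() as u64)
        .map(|j| {
            let x = pow(w, j);
            // Horner evaluation of the polynomial with coefficients `values` at x.
            values.iter().rev().fold(0, |acc, &c| add(mul(acc, x), c))
        })
        .collect()
}

/// Step 1 (assumed layout): column c holds entries c, c + d, c + 2d, ...
fn reshape_columns(quotient: &[u64], d: usize) -> Vec<Vec<u64>> {
    assert!(d > 0 && quotient.len() % d == 0);
    (0..d)
        .map(|c| quotient.iter().skip(c).step_by(d).copied().collect())
        .collect()
}

/// Steps 2-4 for one column: evaluations over a subgroup of size n are
/// extended to evaluations over the coset shift * K, with |K| = n << log_blowup.
fn coset_lde_column(evals: &[u64], log_blowup: u32, shift: u64) -> Vec<u64> {
    let n = evals.len();
    assert!(n.is_power_of_two());
    let log_n = n.trailing_zeros();

    // Step 2: iDFT, i.e. a DFT with the inverse root scaled by 1/n.
    let n_inv = inv(n as u64 % P);
    let mut coeffs: Vec<u64> = naive_dft(evals, inv(root_of_unity(log_n)))
        .into_iter()
        .map(|c| mul(c, n_inv))
        .collect();

    // Step 3: coset shift, multiplying coefficient i by shift^i.
    let mut power = 1;
    for c in coeffs.iter_mut() {
        *c = mul(*c, power);
        power = mul(power, shift);
    }

    // Step 4: zero-pad and evaluate over the larger subgroup.
    coeffs.resize(n << log_blowup, 0);
    naive_dft(&coeffs, root_of_unity(log_n + log_blowup))
}
```

Entry j of the result equals the interpolated polynomial evaluated at `shift * w^j`, where `w` generates the larger subgroup, which is a convenient property to check against a direct evaluation.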

Unfortunately, the DFT interface takes its input and returns its output as matrices in natural order, so the internal implementations perform redundant bit-reversals. We could instead keep the bit-reversed order between the two DFT calls and only bit-reverse the `row_base`, which would be slightly more efficient. This may require changes to Plonky3's `TwoAdicSubgroupDft` trait, though.
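
To illustrate the idea, here is a minimal sketch on a toy field (integers modulo 65537). It pairs a decimation-in-frequency inverse transform (natural order in, bit-reversed order out) with a decimation-in-time forward transform (bit-reversed order in, natural order out), so no permutation pass runs between them. Everything here is an assumption for illustration, including the function names and the choice of butterfly networks; it does not use or mirror the Plonky3 API.

```rust
const P: u64 = 65537; // toy Fermat prime, illustration only
const GENERATOR: u64 = 3; // generates the multiplicative group, of order 2^16

fn add(a: u64, b: u64) -> u64 { (a + b) % P }
fn sub(a: u64, b: u64) -> u64 { (a + P - b) % P }
fn mul(a: u64, b: u64) -> u64 { a * b % P }

fn pow(mut base: u64, mut exp: u64) -> u64 {
    let mut acc = 1;
    while exp > 0 {
        if exp & 1 == 1 { acc = mul(acc, base); }
        base = mul(base, base);
        exp >>= 1;
    }
    acc
}

fn inv(a: u64) -> u64 { pow(a, P - 2) }
fn root_of_unity(log_n: u32) -> u64 { pow(GENERATOR, (P - 1) >> log_n) }

fn bit_reverse(k: usize, bits: u32) -> usize {
    if bits == 0 { 0 } else { k.reverse_bits() >> (usize::BITS - bits) }
}

/// Decimation in frequency: natural-order input, bit-reversed output.
fn dif(a: &mut [u64], w: u64) {
    let n = a.len();
    let (mut half, mut w_m) = (n / 2, w); // w_m has order 2 * half
    while half >= 1 {
        for start in (0..n).step_by(2 * half) {
            let mut tw = 1;
            for k in 0..half {
                let (u, v) = (a[start + k], a[start + k + half]);
                a[start + k] = add(u, v);
                a[start + k + half] = mul(sub(u, v), tw);
                tw = mul(tw, w_m);
            }
        }
        w_m = mul(w_m, w_m);
        half /= 2;
    }
}

/// Decimation in time: bit-reversed input, natural-order output.
fn dit(a: &mut [u64], w: u64) {
    let n = a.len();
    let mut half = 1;
    while half < n {
        let w_m = pow(w, (n / (2 * half)) as u64); // order 2 * half
        for start in (0..n).step_by(2 * half) {
            let mut tw = 1;
            for k in 0..half {
                let (u, v) = (a[start + k], mul(a[start + k + half], tw));
                a[start + k] = add(u, v);
                a[start + k + half] = sub(u, v);
                tw = mul(tw, w_m);
            }
        }
        half *= 2;
    }
}

/// Coset LDE of one column without any explicit bit-reversal permutation.
fn coset_lde_no_bitrev(evals: &[u64], log_blowup: u32, shift: u64) -> Vec<u64> {
    let n = evals.len();
    assert!(n.is_power_of_two());
    let log_n = n.trailing_zeros();
    let mut coeffs = evals.to_vec();
    // Position k now holds n * c_i with i = bit_reverse(k).
    dif(&mut coeffs, inv(root_of_unity(log_n)));
    let n_inv = inv(n as u64 % P);
    let mut out = vec![0; n << log_blowup];
    for (k, &c) in coeffs.iter().enumerate() {
        let i = bit_reverse(k, log_n) as u64;
        // Zero-padding in bit-reversed order interleaves zeros: slot k << log_blowup.
        out[k << log_blowup] = mul(mul(c, n_inv), pow(shift, i));
    }
    dit(&mut out, root_of_unity(log_n + log_blowup));
    out
}
```

In this sketch the only place a bit-reversed index appears is the exponent of the shift applied to each coefficient. A production version would read those powers from a table instead of calling `pow` per row.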

Moreover, the current coset LDE implementations used for trace commitments apply the coset shift sequentially, iterating over the powers of the shift. This could be parallelized by first allocating the vector of powers with `shift.powers`, which is optimized to use parallelism and packing, and then scaling the rows in parallel.
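
A rough sketch of that shape, using only the standard library: the powers are materialized once, then disjoint blocks of rows are scaled on separate threads. The toy modulus, the row-major `&mut [u64]` layout, the sequential `shift_powers` helper (a stand-in for `shift.powers`), and the use of `std::thread::scope` in place of whatever parallelism layer the prover uses are all assumptions for illustration.

```rust
use std::thread;

const P: u64 = 65537; // toy modulus, illustration only

fn mul(a: u64, b: u64) -> u64 { a * b % P }

/// Hypothetical stand-in for `shift.powers`: the first n powers of `shift`.
fn shift_powers(shift: u64, n: usize) -> Vec<u64> {
    let mut powers = Vec::with_capacity(n);
    let mut acc = 1;
    for _ in 0..n {
        powers.push(acc);
        acc = mul(acc, shift);
    }
    powers
}

/// Multiplies row r of a row-major matrix by shift^r, one block of rows per thread.
fn scale_rows_parallel(values: &mut [u64], width: usize, shift: u64, num_threads: usize) {
    assert!(width > 0 && values.len() % width == 0);
    let height = values.len() / width;
    let powers = shift_powers(shift, height);
    let threads = num_threads.max(1);
    let rows_per_chunk = ((height + threads - 1) / threads).max(1);

    thread::scope(|s| {
        let row_blocks = values.chunks_mut(rows_per_chunk * width);
        let power_blocks = powers.chunks(rows_per_chunk);
        for (rows, row_powers) in row_blocks.zip(power_blocks) {
            // Each thread owns a disjoint block of rows and reads its own slice of powers.
            s.spawn(move || {
                for (row, &p) in rows.chunks_mut(width).zip(row_powers) {
                    for x in row.iter_mut() {
                        *x = mul(*x, p);
                    }
                }
            });
        }
    });
}
```

Because each thread reads its powers from the precomputed vector, no thread depends on a running product carried over from the previous row, which is what makes the row scaling independent.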

---

Ported from https://github.com/0xMiden/p3-miden/issues/28
