Right now, the prover decomposes the quotient as follows:
- Reshape the quotient into D columns
- Perform an iDFT
- Manually perform a batch coset shift
- Compute the DFT to obtain the evaluations over the LDE domain
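
For illustration, here is a minimal single-column sketch of the last three steps (iDFT, coset shift on the coefficients, DFT over the larger domain). It is not the prover's code: it assumes a toy field F_17 and naive O(n^2) transforms, and `coset_dft`, `idft`, and `coset_lde` are hypothetical names chosen for this example.

```rust
// Toy prime field F_17 (assumption for illustration); 3 generates its
// multiplicative group of order 16, so subgroups of size 1..=16 exist.
const P: u64 = 17;

fn pow_mod(mut base: u64, mut exp: u64) -> u64 {
    let mut acc = 1;
    base %= P;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = acc * base % P;
        }
        base = base * base % P;
        exp >>= 1;
    }
    acc
}

fn inv(x: u64) -> u64 {
    pow_mod(x, P - 2) // Fermat inverse, valid for x != 0
}

/// Naive evaluation of `coeffs` at the points shift * w^i for i in 0..n,
/// where w is a primitive n-th root of unity (n a power of two, n <= 16).
fn coset_dft(coeffs: &[u64], n: usize, shift: u64) -> Vec<u64> {
    let w = pow_mod(3, 16 / n as u64);
    (0..n)
        .map(|i| {
            let x = shift * pow_mod(w, i as u64) % P;
            // Horner evaluation of the polynomial at x.
            coeffs.iter().rev().fold(0, |acc, &c| (acc * x + c) % P)
        })
        .collect()
}

/// Naive inverse DFT: evaluations over the size-n subgroup -> coefficients.
fn idft(evals: &[u64]) -> Vec<u64> {
    let n = evals.len();
    let w_inv = inv(pow_mod(3, 16 / n as u64));
    let n_inv = inv(n as u64);
    (0..n)
        .map(|k| {
            let sum = evals.iter().enumerate().fold(0, |acc, (i, &e)| {
                (acc + e * pow_mod(w_inv, (i * k) as u64)) % P
            });
            sum * n_inv % P
        })
        .collect()
}

/// iDFT, then scale coefficient i by shift^i, then DFT over `lde_size` points.
fn coset_lde(evals: &[u64], lde_size: usize, shift: u64) -> Vec<u64> {
    let mut coeffs = idft(evals);
    let mut s = 1;
    for c in coeffs.iter_mut() {
        *c = *c * s % P; // coset shift: c_i -> c_i * shift^i
        s = s * shift % P;
    }
    coset_dft(&coeffs, lde_size, 1)
}

fn main() {
    let coeffs = [1, 2, 3, 4];
    let evals = coset_dft(&coeffs, 4, 1);
    // The extension agrees with evaluating the polynomial directly on the coset.
    assert_eq!(coset_lde(&evals, 8, 3), coset_dft(&coeffs, 8, 3));
}
```

Scaling coefficient i by shift^i before the forward transform is what moves the evaluation points from the subgroup onto the coset, which is why the shift sits between the two DFT calls.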

Unfortunately, the DFT interface takes and returns matrices in natural order, so the internal implementations perform redundant bit-reversals. We could instead preserve the bit-reversed order between the two DFT calls and only bit-reverse the `row_base`, which would be slightly more efficient. This may require modifications to Plonky3's `TwoAdicSubgroupDft` trait, though.
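
The redundancy comes from bit-reversal being its own inverse: if one transform ends by permuting into natural order and the next begins by permuting back, the two passes cancel. The sketch below only demonstrates that property. `reverse_index_bits` is a hypothetical helper written for this example, and whether a given DFT backend performs exactly these two passes is an assumption.

```rust
/// Return `v` permuted into bit-reversed index order.
/// Hypothetical helper for illustration; `v.len()` must be a power of two.
fn reverse_index_bits<T: Clone>(v: &[T]) -> Vec<T> {
    let n = v.len();
    assert!(n.is_power_of_two());
    let bits = n.trailing_zeros();
    (0..n)
        .map(|i| {
            // Reverse the low `bits` bits of i; guard the length-1 case,
            // where shifting by the full word width would overflow.
            let j = if bits == 0 {
                0
            } else {
                i.reverse_bits() >> (usize::BITS - bits)
            };
            v[j].clone()
        })
        .collect()
}

fn main() {
    let natural: Vec<u32> = (0..8).collect();
    let reversed = reverse_index_bits(&natural);
    assert_eq!(reversed, vec![0, 4, 2, 6, 1, 5, 3, 7]);
    // Applying the permutation twice restores the original order.
    assert_eq!(reverse_index_bits(&reversed), natural);
}
```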

Moreover, the current coset LDE implementations used for trace commitments apply coset shifts sequentially, iterating over powers of the shift. This could be parallelized by first allocating the vector of powers with `shift.powers`, which is optimized to use parallelism and packing, and then scaling the rows in parallel.
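
A rough sketch of the idea, under stated assumptions: a toy field F_17, a row-major `u64` matrix, and scoped threads from the standard library standing in for whatever parallel iterator the real code would use. `powers` and `scale_rows_parallel` are hypothetical names, and here the powers are computed sequentially; only the row scaling is split across threads.

```rust
use std::thread;

// Toy modulus (assumption for illustration only).
const P: u64 = 17;

/// Precompute [1, shift, shift^2, ..., shift^(n-1)] mod P.
fn powers(shift: u64, n: usize) -> Vec<u64> {
    let mut out = Vec::with_capacity(n);
    let mut acc = 1;
    for _ in 0..n {
        out.push(acc);
        acc = acc * shift % P;
    }
    out
}

/// Scale row i of a row-major matrix by shift^i, splitting rows across threads.
fn scale_rows_parallel(values: &mut [u64], width: usize, shift: u64, threads: usize) {
    assert!(width > 0 && values.len() % width == 0);
    let height = values.len() / width;
    let pows = powers(shift, height);
    let t = threads.max(1);
    // Rows per thread, rounded up; at least 1 so chunk sizes are never zero.
    let rows_per_chunk = ((height + t - 1) / t).max(1);
    thread::scope(|s| {
        // Each thread owns a disjoint block of rows and the matching powers.
        for (chunk, chunk_pows) in values
            .chunks_mut(rows_per_chunk * width)
            .zip(pows.chunks(rows_per_chunk))
        {
            s.spawn(move || {
                for (row, &p) in chunk.chunks_mut(width).zip(chunk_pows) {
                    for x in row.iter_mut() {
                        *x = *x * p % P;
                    }
                }
            });
        }
    });
}

fn main() {
    let mut m = vec![1, 1, 1, 1, 1, 1, 1, 1]; // 4 rows, 2 columns
    scale_rows_parallel(&mut m, 2, 3, 2);
    assert_eq!(m, vec![1, 1, 3, 3, 9, 9, 10, 10]);
}
```

Precomputing the powers removes the serial dependency between rows (each row no longer needs the previous row's power), which is what makes the scaling loop safe to split.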

Ported from 0xMiden/p3-miden#28