@hero78119 hero78119 commented Nov 7, 2025

Related to
gpu PR: https://github.com/scroll-tech/ceno-gpu/pull/100/
gkr-backend PR scroll-tech/gkr-backend#17

change scope

  • Add a new prove_generic_sumcheck_gpu_v2 that proves using the DAG expression.
  • Add DAG-related parameters to prove_generic_sumcheck_gpu and defer flow selection to the GPU implementation.

design rationale

Set up the DAG expression alongside the monomial terms in the vk, so that the prover can choose the cheaper evaluation strategy for each circuit.
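
A minimal sketch of how a prover could pick between the two representations once both are stored in the vk. The `OpCount` and `Strategy` types and the weighted cost model are hypothetical and are not part of the ceno API. The sample numbers in the usage note are the monomial (24 add, 51 mul) and DAG (24 add, 25 mul) counts reported for ADD_main later in this thread.

```rust
/// Operation counts for one representation of a layer expression.
/// Hypothetical type for illustration; not part of the ceno API.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct OpCount {
    pub num_add: u64,
    pub num_mul: u64,
}

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Strategy {
    Monomial,
    Dag,
}

/// Weighted cost: `mul_weight` says how many additions one multiplication
/// is worth. The weight is an assumption; a real prover would calibrate it
/// per backend.
pub fn cost(c: OpCount, mul_weight: u64) -> u64 {
    c.num_add + mul_weight * c.num_mul
}

/// Pick the cheaper representation; ties go to the monomial form.
pub fn choose_strategy(monomial: OpCount, dag: OpCount, mul_weight: u64) -> Strategy {
    if cost(dag, mul_weight) < cost(monomial, mul_weight) {
        Strategy::Dag
    } else {
        Strategy::Monomial
    }
}
```

With the ADD_main counts and an assumed `mul_weight` of 4, `choose_strategy` returns `Strategy::Dag`, since the DAG form has the same number of additions and fewer multiplications.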

@hero78119 hero78119 changed the title from "improve arithmetics efficiency of expression" to "WIP improve arithmetics efficiency of expression" Nov 7, 2025
@hero78119 hero78119 force-pushed the feat/efficient_arith branch from 670c499 to 530c90b November 7, 2025 07:24
@hero78119 hero78119 marked this pull request as draft November 7, 2025 07:24

hero78119 commented Nov 10, 2025

Circuit stats for the DAG expression

Layer Name num_add num_mul max_degree max_dag_depth num_scalar
ADD_main 173 87 3 8 44
SUB_main 187 93 3 8 47
AND_main 182 95 2 8 47
OR_main 183 95 2 8 46
XOR_main 182 95 2 8 47
SLL_main 481 405 4 9 107
SRL_main 520 463 4 9 109
SRA_main 528 467 4 9 111
SLT_main 275 232 4 8 59
SLTU_main 271 232 4 8 59
MUL_main 173 87 3 11 47
MULH_main 256 143 3 13 59
MULHSU_main 260 143 3 13 61
MULHU_main 247 137 3 13 59
DIVU_main 576 603 4 13 96
REMU_main 573 603 4 13 96
DIV_main 586 610 4 13 98
REM_main 586 610 4 13 98
ADDI_main 137 71 3 8 41
ANDI_main 145 75 2 8 42
ORI_main 143 75 2 8 41
XORI_main 143 75 2 8 42
SLLI_main 441 382 4 9 102
SRLI_main 479 440 4 9 103
SRAI_main 483 444 4 9 106
SLTI_main 235 214 4 8 57
SLTIU_main 234 214 4 8 57
LUI_main 123 58 2 8 47
AUIPC_main 323 287 3 10 79
BEQ_main 133 84 3 9 40
BNE_main 135 85 3 9 39
BLT_main 243 221 4 9 55
BLTU_main 243 221 4 9 55
BGE_main 248 222 4 9 54
BGEU_main 248 222 4 9 54
JAL_main 95 48 2 8 41
JALR_main 227 176 3 9 67
LW_main 235 168 3 8 69
LHU_main 247 182 3 8 71
LH_main 261 192 3 8 74
LBU_main 276 204 3 9 79
LB_main 292 215 3 9 83
SW_main 244 172 3 8 70
SH_main 253 187 3 8 71
SB_main 306 228 3 9 83
ECALL_HALT_main 86 38 2 8 43
Ecall_Keccak 13178 6758 3 8 2300
weierstrass_add 37156 24177 3 9 1680
weierstrass_double 48042 30842 3 9 1908
weierstrass_add 37170 24177 3 9 1654
weierstrass_double 48036 30842 3 9 1882
weierstrass_decompress 35430 26258 4 9 1357
DYNAMIC_RANGE_18 8 6 2 7 7
DOUBLE_RANGE_DoubleU8 8 6 2 7 8
And_OPS_ROM_TABLE 11 7 2 7 9
Or_OPS_ROM_TABLE 11 7 2 7 9
Xor_OPS_ROM_TABLE 11 7 2 7 9
Ltu_OPS_ROM_TABLE 11 7 2 7 9
RAM_Register_RegTable 10 5 2 7 9
RAM_Memory_StaticMemTable 10 5 2 7 9
RAM_Memory_PubIOTable 8 5 2 7 9
HintsTable_Memory_RAM 10 5 2 7 9
StackTable_Memory_RAM 8 3 2 7 9
HeapTable_Memory_RAM 8 3 2 7 9
LocalRAMTableFinal 13 7 2 7 9
ShardRamCircuit_main 260518 502742 4 8 98805
ECALL_DUMMY_main 190 90 2 8 51
SECP256K1_DECOMPRESS_main 920 435 2 8 157
SHA256_EXTEND_main 3256 1539 2 8 493
BN254_FP_ADD_main 917 435 2 8 157
BN254_FP_MUL_main 924 435 2 8 157
BN254_FP2_ADD_main 1700 803 2 8 269
BN254_FP2_MUL_main 1698 803 2 8 269
PROGRAM 19 11 2 7 13
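
A minimal sketch of how statistics like those in the table above could be collected from an expression DAG. The `Node` arena layout and the exact definitions of degree and depth used here are assumptions for illustration; the PR does not spell out how ceno computes these columns. The code assumes children are stored before their parents, so one forward pass is enough.

```rust
/// A node in an expression DAG stored as an arena: children are indices of
/// earlier nodes. Hypothetical layout, not the ceno representation.
#[derive(Clone, Copy, Debug)]
pub enum Node {
    Wit(usize),  // witness polynomial
    Scalar(i64), // constant or challenge-derived scalar
    Add(usize, usize),
    Mul(usize, usize),
}

#[derive(Debug, Default, PartialEq)]
pub struct Stats {
    pub num_add: usize,
    pub num_mul: usize,
    pub num_scalar: usize,
    pub max_degree: usize,
    pub max_dag_depth: usize,
}

/// One forward pass over the arena. Assumes every child index is smaller
/// than the index of the node that uses it.
pub fn dag_stats(nodes: &[Node]) -> Stats {
    let mut stats = Stats::default();
    let mut degree = vec![0usize; nodes.len()];
    let mut depth = vec![0usize; nodes.len()];
    for (i, node) in nodes.iter().enumerate() {
        match *node {
            Node::Wit(_) => {
                degree[i] = 1;
                depth[i] = 1;
            }
            Node::Scalar(_) => {
                stats.num_scalar += 1;
                depth[i] = 1;
            }
            Node::Add(a, b) => {
                stats.num_add += 1;
                degree[i] = degree[a].max(degree[b]);
                depth[i] = 1 + depth[a].max(depth[b]);
            }
            Node::Mul(a, b) => {
                stats.num_mul += 1;
                // Degrees in the witnesses add under multiplication.
                degree[i] = degree[a] + degree[b];
                depth[i] = 1 + depth[a].max(depth[b]);
            }
        }
        stats.max_degree = stats.max_degree.max(degree[i]);
        stats.max_dag_depth = stats.max_dag_depth.max(depth[i]);
    }
    stats
}
```

Because a shared node is stored once in the arena, it is also counted once, which is the point of measuring on a DAG instead of an expanded tree.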

hero78119 commented Nov 10, 2025

Cached version with an append-only index

Known issue:

  1. This routine panics on ShardRamCircuit because its expression is extremely long.

Layer Name num_add num_mul max_degree max_dag_depth num_scalar
ADD_main 153 74 3 294 44
SUB_main 160 78 3 308 47
AND_main 167 83 2 326 47
OR_main 167 82 2 324 46
XOR_main 168 82 2 326 47
SLL_main 369 306 4 829 107
SRL_main 381 334 4 871 109
SRA_main 389 338 4 885 111
SLT_main 223 154 4 463 59
SLTU_main 222 154 4 462 59
MUL_main 157 76 3 303 47
MULH_main 214 111 3 411 59
MULHSU_main 215 110 3 413 61
MULHU_main 206 104 3 396 59
DIVU_main 397 356 4 889 96
REMU_main 399 356 4 891 96
DIV_main 407 363 4 908 98
REM_main 408 363 4 909 98
ADDI_main 122 61 3 243 41
ANDI_main 136 68 2 271 42
ORI_main 134 68 2 268 41
XORI_main 134 68 2 269 42
SLLI_main 330 288 4 760 102
SRLI_main 345 317 4 805 103
SRAI_main 359 321 4 826 106
SLTI_main 189 140 4 409 57
SLTIU_main 192 140 4 412 57
LUI_main 119 55 2 238 47
AUIPC_main 259 210 3 570 79
BEQ_main 118 73 3 250 40
BNE_main 119 74 3 251 39
BLT_main 191 143 4 412 55
BLTU_main 190 144 4 412 55
BGE_main 194 144 4 415 54
BGEU_main 196 144 4 417 54
JAL_main 94 46 2 195 41
JALR_main 190 137 3 417 67
LW_main 197 130 3 420 69
LHU_main 207 142 3 446 71
LH_main 220 152 3 473 74
LBU_main 230 159 3 497 79
LB_main 244 169 3 526 83
SW_main 206 133 3 433 70
SH_main 212 146 3 454 71
SB_main 257 179 3 549 83
ECALL_HALT_main 83 36 2 169 41
Ecall_Keccak 12589 6318 3 22907 2300
weierstrass_add 32167 20058 3 55841 1680
weierstrass_double 39587 24169 3 67857 1908
weierstrass_add 22728 12577 3 38895 1654
weierstrass_double 28158 15020 3 47253 1882
weierstrass_decompress 21324 13819 4 37878 1357
DYNAMIC_RANGE_18 8 6 2 25 7
DOUBLE_RANGE_DoubleU8 8 6 2 26 8
And_OPS_ROM_TABLE 11 7 2 32 9
Or_OPS_ROM_TABLE 11 7 2 32 9
Xor_OPS_ROM_TABLE 11 7 2 32 9
Ltu_OPS_ROM_TABLE 11 7 2 32 9
RAM_Register_RegTable 10 5 2 28 9
RAM_Memory_StaticMemTable 10 5 2 28 9
RAM_Memory_PubIOTable 8 5 2 26 9
HintsTable_Memory_RAM 10 5 2 28 9
StackTable_Memory_RAM 8 3 2 22 9
HeapTable_Memory_RAM 8 3 2 22 9
LocalRAMTableFinal 13 7 2 35 9
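
A minimal sketch of a structural cache backed by an append-only index (hash-consing), which is one common way to implement the idea named above. The `DagBuilder` type and its methods are hypothetical and are not the code in this PR. The cause of the ShardRamCircuit panic is not stated in the thread; the `depths` method is included only to show that an append-only layout allows a traversal without recursion, which matters when depth reaches tens of thousands as in the table above.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum Node {
    Wit(usize),
    Scalar(i64),
    Add(usize, usize),
    Mul(usize, usize),
}

/// Hypothetical builder: `nodes` only ever grows, so an index stays valid
/// once issued, and `index` maps a node's structure to that index.
#[derive(Default)]
pub struct DagBuilder {
    nodes: Vec<Node>,
    index: HashMap<Node, usize>,
}

impl DagBuilder {
    /// Return the existing index for a structurally identical node,
    /// or append the node and return its new index.
    pub fn intern(&mut self, node: Node) -> usize {
        if let Some(&id) = self.index.get(&node) {
            return id;
        }
        let id = self.nodes.len();
        self.nodes.push(node);
        self.index.insert(node, id);
        id
    }

    pub fn len(&self) -> usize {
        self.nodes.len()
    }

    /// Depth of every node in one forward pass, with no recursion.
    /// Valid because children are always interned before their parents.
    pub fn depths(&self) -> Vec<usize> {
        let mut depth = vec![1usize; self.nodes.len()];
        for (i, node) in self.nodes.iter().enumerate() {
            if let Node::Add(a, b) | Node::Mul(a, b) = *node {
                depth[i] = 1 + depth[a].max(depth[b]);
            }
        }
        depth
    }
}
```

Interning `Mul(w0, w1)` twice returns the same index both times, so a repeated subexpression costs one node regardless of how often it appears.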

hero78119 commented Nov 10, 2025

Removed zero/one expressions (orig = initial DAG stats, opt = after removal)

Layer Name num_add_orig num_add_opt Δ add (%) num_mul_orig num_mul_opt Δ mul (%)
ADD_main 173 127 −26.6% 87 80 −8.0%
SUB_main 187 140 −25.1% 93 86 −7.5%
AND_main 182 139 −23.6% 95 92 −3.2%
OR_main 183 139 −24.0% 95 92 −3.2%
XOR_main 182 139 −23.6% 95 92 −3.2%
SLL_main 481 278 −42.2% 405 304 −24.9%
SRL_main 520 292 −43.8% 463 326 −29.6%
SRA_main 528 299 −43.3% 467 329 −29.6%
SLT_main 275 175 −36.4% 232 172 −25.9%
SLTU_main 271 175 −35.4% 232 172 −25.9%
MUL_main 173 126 −27.2% 87 84 −3.4%
MULH_main 256 177 −30.9% 143 138 −3.5%
MULHSU_main 260 179 −31.2% 143 138 −3.5%
MULHU_main 247 171 −30.8% 137 132 −3.6%
DIVU_main 576 335 −41.8% 603 440 −27.0%
REMU_main 573 335 −41.5% 603 440 −27.0%
DIV_main 586 343 −41.5% 610 447 −26.7%
REM_main 586 343 −41.5% 610 447 −26.7%
ADDI_main 137 98 −28.5% 71 65 −8.5%
ANDI_main 145 108 −25.5% 75 73 −2.7%
ORI_main 143 108 −24.5% 75 73 −2.7%
XORI_main 143 108 −24.5% 75 73 −2.7%
SLLI_main 441 245 −44.5% 382 282 −26.2%
SRLI_main 479 259 −45.9% 440 304 −30.9%
SRAI_main 483 266 −44.9% 444 307 −30.8%
SLTI_main 235 146 −37.9% 214 155 −27.6%
SLTIU_main 234 146 −37.6% 214 155 −27.6%
LUI_main 123 94 −23.6% 58 55 −5.2%
AUIPC_main 323 203 −37.1% 287 221 −23.0%
BEQ_main 133 90 −32.3% 84 70 −16.7%
BNE_main 135 90 −33.3% 85 70 −17.6%
BLT_main 243 150 −38.3% 221 160 −27.6%
BLTU_main 243 150 −38.3% 221 160 −27.6%
BGE_main 248 150 −39.5% 222 160 −27.9%
BGEU_main 248 150 −39.5% 222 160 −27.9%
JAL_main 95 73 −23.2% 48 46 −4.2%
JALR_main 227 145 −36.1% 176 139 −21.0%
LW_main 235 152 −35.3% 168 138 −17.9%
LHU_main 247 160 −35.2% 182 149 −18.1%
LH_main 261 168 −35.6% 192 157 −18.2%
LBU_main 276 178 −35.5% 204 166 −18.6%
LB_main 292 187 −36.0% 215 175 −18.6%
SW_main 244 160 −34.4% 172 142 −17.4%
SH_main 253 168 −33.6% 187 155 −17.1%
SB_main 306 201 −34.3% 228 186 −18.4%
ECALL_HALT_main 86 71 −17.4% 38 36 −5.3%
Ecall_Keccak 13178 9297 −29.4% 6758 6610 −2.2%
weierstrass_add 37156 21156 −43.1% 24177 19114 −20.9%
weierstrass_double 48042 24761 −48.4% 30842 23158 −24.9%
weierstrass_decompress 35430 18295 −48.4% 26258 19592 −25.3%
ShardRamCircuit_main 260518 130248 −50.0% 502742 372609 −25.9%
ECALL_DUMMY_main 190 141 −25.8% 90 86 −4.4%
SECP256K1_DECOMPRESS_main 920 697 −24.2% 435 416 −4.4%
SHA256_EXTEND_main 3256 2473 −24.1% 1539 1472 −4.4%
BN254_FP_ADD_main 917 697 −24.0% 435 416 −4.4%
BN254_FP_MUL_main 924 697 −24.6% 435 416 −4.4%
BN254_FP2_ADD_main 1700 1289 −24.2% 803 768 −4.4%
BN254_FP2_MUL_main 1698 1289 −24.1% 803 768 −4.4%
PROGRAM 19 11 −42.1% 11 11 0.0%
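
A minimal sketch of the kind of zero/one removal that could produce reductions like those above: `x + 0` becomes `x`, `x * 1` becomes `x`, and `x * 0` becomes `0`. The `Expr` type and the helper names are hypothetical, and the sketch works on a small tree with integer constants, whereas the real expressions are over a field. It is recursive for brevity, so it is not suitable as-is for very deep expressions.

```rust
#[derive(Clone, Debug, PartialEq)]
pub enum Expr {
    Const(i64),
    Wit(usize),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

pub fn add(a: Expr, b: Expr) -> Expr {
    Expr::Add(Box::new(a), Box::new(b))
}

pub fn mul(a: Expr, b: Expr) -> Expr {
    Expr::Mul(Box::new(a), Box::new(b))
}

/// Bottom-up removal of additive zeros and multiplicative ones and zeros.
pub fn simplify(e: Expr) -> Expr {
    match e {
        Expr::Add(a, b) => match (simplify(*a), simplify(*b)) {
            (Expr::Const(0), x) | (x, Expr::Const(0)) => x,
            (x, y) => add(x, y),
        },
        Expr::Mul(a, b) => match (simplify(*a), simplify(*b)) {
            (Expr::Const(0), _) | (_, Expr::Const(0)) => Expr::Const(0),
            (Expr::Const(1), x) | (x, Expr::Const(1)) => x,
            (x, y) => mul(x, y),
        },
        leaf => leaf,
    }
}

/// Returns (num_add, num_mul) for the expression tree.
pub fn count_ops(e: &Expr) -> (usize, usize) {
    match e {
        Expr::Add(a, b) => {
            let (a1, m1) = count_ops(a);
            let (a2, m2) = count_ops(b);
            (a1 + a2 + 1, m1 + m2)
        }
        Expr::Mul(a, b) => {
            let (a1, m1) = count_ops(a);
            let (a2, m2) = count_ops(b);
            (a1 + a2, m1 + m2 + 1)
        }
        _ => (0, 0),
    }
}
```

For example, `0 + W1 * (0 + 1 * W0)` has 2 additions and 2 multiplications before simplification and reduces to `W1 * W0`, with 0 additions and 1 multiplication.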

@hero78119

Monomial vs DAG add/mul counts

Layer Name DAG num_add Monomial num_add DAG num_mul Monomial num_mul DAG max_degree DAG max_dag_depth DAG num_scalar
ADD_main 127 24 80 51 3 7 43
SUB_main 140 24 86 51 3 7 46
AND_main 139 28 92 57 2 7 46
OR_main 139 28 92 57 2 7 45
XOR_main 139 28 92 57 2 7 46
SLL_main 278 107 304 286 4 7 106
SRL_main 292 116 326 317 4 7 108
SRA_main 299 116 329 317 4 7 110
SLT_main 175 56 172 149 4 7 58
SLTU_main 175 56 172 149 4 7 58
MUL_main 126 25 84 54 3 8 46
MULH_main 177 36 138 83 3 10 58
MULHSU_main 179 36 138 83 3 10 60
MULHU_main 171 36 132 83 3 10 58
DIVU_main 335 150 440 450 4 10 95
REMU_main 335 150 440 450 4 10 95
DIV_main 343 150 447 450 4 10 97
REM_main 343 150 447 450 4 10 97
ADDI_main 98 20 65 43 3 7 40
ANDI_main 108 24 73 49 2 7 41
ORI_main 108 24 73 49 2 7 40
XORI_main 108 24 73 49 2 7 41
SLLI_main 245 100 282 272 4 7 101
SRLI_main 259 109 304 303 4 7 102
SRAI_main 266 109 307 303 4 7 105
SLTI_main 146 52 155 141 4 7 56
SLTIU_main 146 52 155 141 4 7 56
LUI_main 94 16 55 33 2 7 46
AUIPC_main 203 68 221 184 3 9 78
BEQ_main 90 27 70 64 3 8 39
BNE_main 90 27 70 64 3 7 38
BLT_main 150 53 160 144 4 8 54
BLTU_main 150 53 160 144 4 8 54
BGE_main 150 53 160 144 4 7 53
BGEU_main 150 53 160 144 4 7 53
JAL_main 73 13 46 27 2 7 40
JALR_main 145 45 139 114 3 7 66
LW_main 152 44 138 110 3 7 68
LHU_main 160 49 149 123 3 7 70
LH_main 168 51 157 128 3 7 73
LBU_main 178 55 166 138 3 7 78
LB_main 187 57 175 143 3 7 82
SW_main 160 44 142 110 3 7 69
SH_main 168 49 155 124 3 7 70
SB_main 201 58 186 146 3 7 82
ECALL_HALT_main 71 8 36 17 2 7 41
Ecall_Keccak 9297 2784 6610 5595 3 8 2298
weierstrass_add 21156 4511 19114 11599 3 8 1679
weierstrass_double 24761 5296 23158 13697 3 8 1907
weierstrass_decompress 18295 5331 19592 14681 4 8 1356
DYNAMIC_RANGE_18 5 3 6 7 2 6 6
HeapTable_Memory_RAM 6 1 3 3 2 5 8
LocalRAMTableFinal 6 5 6 11 2 5 7
ShardRamCircuit_main 6003 956 11780 2778 4 8 1724
ECALL_DUMMY_main 141 32 86 65 2 7 50
SECP256K1_DECOMPRESS_main 697 153 416 307 2 7 156
SHA256_EXTEND_main 2473 537 1472 1075 2 7 492
BN254_FP_ADD_main 697 153 416 307 2 7 156
BN254_FP_MUL_main 697 153 416 307 2 7 156
BN254_FP2_ADD_main 1289 281 768 563 2 7 268
BN254_FP2_MUL_main 1289 281 768 563 2 7 268
PROGRAM 11 8 11 17 2 6 12
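
A minimal sketch of one way to count operations for the monomial form. The counting convention is an assumption: each term's scalar is treated as precomputed, a term with k witnesses costs k multiplications, and summing n terms costs n − 1 additions. The PR does not state its convention, but this one reproduces the 1 add / 3 mul reported for the monomial form of StackTable_Memory_RAM in this thread, whose two terms have products [W[1]] and [W[0], W[1]]. The `Term` struct mirrors the shape of the dump in the example, with simplified field types.

```rust
/// One monomial term: a precomputed scalar times a product of witnesses.
/// Field types are simplified assumptions, not the ceno definitions.
pub struct Term {
    pub scalar: i64,
    pub product: Vec<usize>, // witness indices
}

/// Returns (num_add, num_mul) under the assumed convention:
/// n terms need n - 1 additions, and a term with k witnesses needs
/// k multiplications (scalar * w_1 * ... * w_k).
pub fn monomial_op_count(terms: &[Term]) -> (usize, usize) {
    let num_add = terms.len().saturating_sub(1);
    let num_mul: usize = terms.iter().map(|t| t.product.len()).sum();
    (num_add, num_mul)
}
```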

hero78119 commented Nov 11, 2025

Example

StackTable_Memory_RAM zero_expr 0 + WitIn(1) * (0 + Challenge(2) * (2 + WitIn(0) * Challenge(1) + 0*Challenge(1)^2 + 0*Challenge(1)^3 + 0*Challenge(1)^4 + Challenge(0) + (-1)))

StackTable_Memory_RAM monomial term [Term { scalar: (2*Challenge(2) + 1 * Challenge(2) * 0*Challenge(1)^2 + 1 * Challenge(2) * 0*Challenge(1)^3 + 1 * Challenge(2) * 0*Challenge(1)^4 + 1 * Challenge(2) * Challenge(0) + -1*Challenge(2)), product: [W[1]] }, Term { scalar: (0 + 1 * Challenge(2) * Challenge(1)), product: [W[0], W[1]] }]
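
A small check that the two forms in the example agree, written as a sketch under stated assumptions. It uses plain `i64` arithmetic instead of field elements, drops the terms whose coefficient is 0, and the function names are hypothetical. `w` holds values for WitIn(0) and WitIn(1), and `c` holds values for Challenge(0), Challenge(1) and Challenge(2).

```rust
/// Nested form from the example, with the zero-coefficient powers of
/// Challenge(1) dropped.
pub fn eval_nested(w: [i64; 2], c: [i64; 3]) -> i64 {
    w[1] * (c[2] * (2 + w[0] * c[1] + c[0] + (-1)))
}

/// Monomial form: scalar_0 * W[1] + scalar_1 * W[0] * W[1].
pub fn eval_monomial(w: [i64; 2], c: [i64; 3]) -> i64 {
    // Scalar of the term with product [W[1]].
    let scalar_0 = 2 * c[2] + c[2] * c[0] - c[2];
    // Scalar of the term with product [W[0], W[1]].
    let scalar_1 = c[2] * c[1];
    scalar_0 * w[1] + scalar_1 * w[0] * w[1]
}
```

Both functions return the same value for any inputs that do not overflow, since the monomial form is the nested form with Challenge(2) and WitIn(1) distributed over the inner sum.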

@hero78119

Build the DAG from the monomial terms with common-subtree extraction

Layer Name Monomial Add DAG Add Monomial Mul DAG Mul Mul Reduction (%)
ADD_main 24 24 51 25 50.98% ↓
SUB_main 24 24 51 25 50.98% ↓
AND_main 28 28 57 29 49.12% ↓
OR_main 28 28 57 29 49.12% ↓
XOR_main 28 28 57 29 49.12% ↓
SLL_main 107 107 286 108 62.24% ↓
SRL_main 116 116 317 117 63.09% ↓
SRA_main 116 116 317 117 63.09% ↓
SLT_main 56 56 149 57 61.74% ↓
SLTU_main 56 56 149 57 61.74% ↓
MUL_main 25 25 54 26 51.85% ↓
MULH_main 36 36 83 37 55.42% ↓
MULHSU_main 36 36 83 37 55.42% ↓
MULHU_main 36 36 83 37 55.42% ↓
DIVU_main 150 150 450 151 66.44% ↓
REMU_main 150 150 450 151 66.44% ↓
DIV_main 150 150 450 151 66.44% ↓
REM_main 150 150 450 151 66.44% ↓
ADDI_main 20 20 43 21 51.16% ↓
ANDI_main 24 24 49 25 48.98% ↓
ORI_main 24 24 49 25 48.98% ↓
XORI_main 24 24 49 25 48.98% ↓
SLLI_main 100 100 272 101 62.87% ↓
SRLI_main 109 109 303 110 63.69% ↓
SRAI_main 109 109 303 110 63.69% ↓
SLTI_main 52 52 141 53 62.41% ↓
SLTIU_main 52 52 141 53 62.41% ↓
LUI_main 16 16 33 17 48.48% ↓
AUIPC_main 68 68 184 69 62.50% ↓
BEQ_main 27 27 64 28 56.25% ↓
BNE_main 27 27 64 28 56.25% ↓
BLT_main 53 53 144 54 62.50% ↓
BLTU_main 53 53 144 54 62.50% ↓
BGE_main 53 53 144 54 62.50% ↓
BGEU_main 53 53 144 54 62.50% ↓
JAL_main 13 13 27 14 48.15% ↓
JALR_main 45 45 114 46 59.65% ↓
LW_main 44 44 110 45 59.09% ↓
LHU_main 49 49 123 50 59.35% ↓
LH_main 51 51 128 52 59.38% ↓
LBU_main 55 55 138 56 59.42% ↓
LB_main 57 57 143 58 59.44% ↓
SW_main 44 44 110 45 59.09% ↓
SH_main 49 49 124 50 59.68% ↓
SB_main 58 58 146 59 59.59% ↓
ECALL_HALT_main 8 8 17 9 47.06% ↓
Ecall_Keccak 2784 2784 5595 2788 50.17% ↓
weierstrass_add 4511 4511 11599 4512 61.10% ↓
weierstrass_double 5296 5296 13697 5297 61.33% ↓
weierstrass_decompress 5331 5331 14681 5332 63.68% ↓
DYNAMIC_RANGE_18 3 3 7 4 42.86% ↓
And_OPS_ROM_TABLE 4 4 9 5 44.44% ↓
StackTable_Memory_RAM 1 1 3 2 33.33% ↓
ShardRamCircuit_main 956 956 2778 970 65.08% ↓
ECALL_DUMMY_main 32 32 65 33 49.23% ↓
SECP256K1_DECOMPRESS_main 153 153 307 154 49.84% ↓
SHA256_EXTEND_main 537 537 1075 538 49.95% ↓
BN254_FP_ADD_main 153 153 307 154 49.84% ↓
BN254_FP_MUL_main 153 153 307 154 49.84% ↓
BN254_FP2_ADD_main 281 281 563 282 49.91% ↓
BN254_FP2_MUL_main 281 281 563 282 49.91% ↓
PROGRAM 8 8 17 9 47.06% ↓
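
A minimal sketch of how the reduction column can be computed, together with one factoring that matches the 3 to 2 multiplication count for StackTable_Memory_RAM. Whether the extraction in this PR produces exactly this factored shape is an assumption; the functions below are hypothetical and use `i64` in place of field elements. `s` holds the two term scalars and `w` holds WitIn(0) and WitIn(1).

```rust
/// Percentage of multiplications saved by the DAG form.
pub fn mul_reduction_pct(monomial_mul: u64, dag_mul: u64) -> f64 {
    100.0 * (monomial_mul as f64 - dag_mul as f64) / monomial_mul as f64
}

/// Monomial evaluation of s0*W1 + s1*W0*W1: 3 multiplications, 1 addition.
pub fn eval_monomial(s: [i64; 2], w: [i64; 2]) -> i64 {
    s[0] * w[1] + s[1] * w[0] * w[1]
}

/// Factored evaluation W1 * (s0 + s1*W0): 2 multiplications, 1 addition.
/// The shared factor W1 is multiplied in once instead of once per term.
pub fn eval_factored(s: [i64; 2], w: [i64; 2]) -> i64 {
    w[1] * (s[0] + s[1] * w[0])
}
```

The addition count is unchanged by this factoring, which is consistent with the identical Monomial Add and DAG Add columns in the table.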

@hero78119 hero78119 force-pushed the feat/efficient_arith branch from 81f1452 to 87b24ef November 11, 2025 08:21
@hero78119 hero78119 force-pushed the feat/efficient_arith branch from 87b24ef to d09d75b November 11, 2025 08:23
@hero78119 hero78119 marked this pull request as ready for review November 12, 2025 11:28