Skip to content

Conversation

@jonasnick
Copy link
Contributor

This is a draft because I'm not sure about the cleanest way to implement it.

@real-or-random
Copy link
Contributor

This is a draft because I'm not sure about the cleanest way to implement it.

The current approach looks clean. What other approaches do you have in mind?

Copy link
Member

@hebasto hebasto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK.

Copy link
Member

@hebasto hebasto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On my x86_64 system, this PR reduces the memory allocated on the scratch space from 2224 bytes to 1452 bytes per point.

Please ping me once it’s undrafted.

@jonasnick
Copy link
Contributor Author

The current approach requires a temporary array int wnaf_tmp[256]; to provide to secp256k1_ecmult_wnaf which looks unclean. The alternatives are

  1. copy almost all of secp256k1_ecmult_wnaf into secp256k1_ecmult_wnaf_small, or
  2. remove secp256k1_ecmult_wnaf_small and write an secp256k1_ecmult macro.

Both options seem to be worse.

@hebasto
Copy link
Member

hebasto commented Oct 27, 2025

The current approach requires a temporary array int wnaf_tmp[256]; to provide to secp256k1_ecmult_wnaf which looks unclean. The alternatives are

  1. copy almost all of secp256k1_ecmult_wnaf into secp256k1_ecmult_wnaf_small, or

  2. remove secp256k1_ecmult_wnaf_small and write an secp256k1_ecmult macro.

Both options seem to be worse.

I might suggest a third option: hebasto@5c0d6ee.

int wnaf_tmp[256];
int ret, i;

VERIFY_CHECK(2 <= w && w <= 8);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
VERIFY_CHECK(2 <= w && w <= 8);
VERIFY_CHECK(2 <= w && w <= 7);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see why w = 8 wouldn't work. The documentation of wnaf says

 *  - each wnaf[i] is either 0, or an odd integer between -(1<<(w-1) - 1) and (1<<(w-1) - 1)

So for w = 8, wnaf[i] is in [-127, 127] which fits in an int8_t.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, yes, you're right. I was getting confused. secp256k1_ecmult_wnaf itself needs w <= 31 (and not 32), if only because it performs a carry << w shift (for int carry) which is certainly UB if int is 32 bits. (In fact, if carry == 1, then even 1 << 31 is UB. This is another edge case that we should fix! Let me add this to the other issue.)

But since your function only copies the results, everything is fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, if carry == 1, then even 1 << 31 is UB. This is another edge case that we should fix! Let me add this to the other issue

Oh, great catch!

Copy link
Contributor

@real-or-random real-or-random left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the current approach in the PR is good. It may not be elegant to have a tmp array, but it's simple and correct. We'd need to benchmark if the tmp array makes a difference in the end. But I think this PR needs a benchmark in general to make sure that using int8_t does not increase the running time (much).

If we want to avoid it, here's yet another variant: real-or-random@f83731b
It uses a macro to define different variants of secp256k1_ecmult_wnaf parametrized in the output type. The macro is not elegant either, but this variant is better for type safety than just turning secp256k1_ecmult_wnaf into a macro.

In fact, the current secp256k1_ecmult_wnaf needs the unstated and unchecked assumption that int has at least 32 value bits when it VERIFY_CHECKs that w <= 31. In practice, we call it only with WINDOW_A == 5 and WINDOW_G == ECMULT_WINDOW_SIZE where the latter is configurable in the range 2..24.

A consequence of this "bug" is that the code fails on a 16-bit platform if you set ECMULT_WINDOW_SIZE > 16. I don't think we need to support this, but code without unchecked assumptions is bad. So I suggest that we rewrite the function to use int32_t instead of int even if we don't use my macro approach. Alternatively, we could add the assumption that INT_MAX >= INT32_MAX but this forbids 16-bit platforms, and the code seems to work on them in principle; see #792 (comment).

@real-or-random
Copy link
Contributor

I might suggest a third option: hebasto@5c0d6ee.

Sorry, I forgot to comment on that option. That's also clean, but it introduces a lot of code complexity.

The way I see it:

  1. If the current approach is fine performance-wise, let's take it.
  2. If not, use either @hebasto's approach or mine. If you ask me, I prefer mine slightly because it's less code and more "direct" even though it uses a macro.
  3. If none of this is satisfactory, we can still duplicate the code.

@jonasnick
Copy link
Contributor Author

Thanks for demonstrating creative alternative solutions :) If it weren't for the layers of indirection, I'd consider @hebasto's approach to be the most elegant. The PR's current approach is just so much simpler. And I ran benchmarks with bench_ecmult, which showed at most a 0.1us slowdown (for some number of points) on my Intel i7 machine.

@jonasnick jonasnick marked this pull request as ready for review November 12, 2025 19:26
Copy link
Contributor

@real-or-random real-or-random left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK 26166c4

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants