29 changes: 29 additions & 0 deletions .github/DEVELOPMENT.md
@@ -144,6 +144,35 @@ allows static code analysis tools (e.g. Error Prone's `MissingCasesInEnumSwitch`
check) report a problem when the enum definition is updated but the code using
it is not.

### Vector API
It's safe to assume that the JVM has the Vector API
([JEP 508](https://openjdk.org/jeps/508)) enabled and available at runtime, but
not safe to assume that the Vector API implementation will perform faster than
equivalent scalar code on whatever hardware the engine happens to be running on.

Different CPU hardware can exhibit dramatically different performance
characteristics, so it's important to use hardware feature detection to
determine under which scenarios a vectorized approach will be faster for
each implementation. Vectorized code should be tested on AMD, ARM, and Intel
CPUs to verify that the benefits hold on each platform before a given code path
is enabled there. Also note that ARM CPUs can exhibit significant differences
between hardware generations, as well as between Apple Silicon and
datacenter-class CPUs.
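
As a rough illustration of gating a code path by platform, the sketch below classifies the standard `os.arch` system property and enables the vectorized path only on architectures where it has been verified. `VectorPathPolicy`, `CpuArchitecture`, `classify`, and `useVectorized` are hypothetical names, not existing project code. Note that `os.arch` only identifies the instruction set family: it cannot tell AMD from Intel or one ARM generation from another, so finer-grained detection would be needed in practice.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration; none of these names exist in the project.
final class VectorPathPolicy
{
    enum CpuArchitecture { X86_64, AARCH64, OTHER }

    private VectorPathPolicy() {}

    // Classify a value of the standard "os.arch" system property.
    static CpuArchitecture classify(String osArch)
    {
        String arch = osArch.toLowerCase(Locale.ROOT);
        if (arch.equals("amd64") || arch.equals("x86_64")) {
            return CpuArchitecture.X86_64;
        }
        if (arch.equals("aarch64") || arch.equals("arm64")) {
            return CpuArchitecture.AARCH64;
        }
        return CpuArchitecture.OTHER;
    }

    // Select the vectorized path only when the (assumed) configuration flag is on
    // and benchmarks have verified a benefit on the current architecture.
    static boolean useVectorized(boolean enabledByConfig, CpuArchitecture current, Set<CpuArchitecture> verified)
    {
        return enabledByConfig && verified.contains(current);
    }

    public static void main(String[] args)
    {
        CpuArchitecture current = classify(System.getProperty("os.arch", ""));
        // Assumption for the example: only x86-64 has been benchmarked so far.
        boolean vectorized = useVectorized(true, current, Set.of(CpuArchitecture.X86_64));
        System.out.println(current + " -> vectorized=" + vectorized);
    }
}
```

Keeping the set of verified architectures explicit means a new platform stays on the scalar path until someone has measured it.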

When adding implementations that use the Vector API, prefer the following
approach unless the specifics of the situation dictate otherwise:
* Provide an equivalent scalar implementation in code, if one does not already
exist.

@martint (Member), Nov 17, 2025:

We had an offline conversation about this, so let me re-hash my comments here:

  • I would not make such a blanket statement requiring a scalar implementation every time the Vector API is used. If there are cases where an implementation based on the Vector API may run slower than a scalar version (or possibly other variants of the same algorithm using a different subset of the Vector API, or even other scalar variants), then by all means, add multiple versions and make the code able to switch dynamically. We employ similar techniques for data-dependent optimizations (block types, specialized hash tables for certain types, etc.).
  • Functional behavior tests can be implemented via unit tests, just like anything else. Using an equivalent scalar-based implementation is an option, but not necessarily the only approach.

I think these are reasonable guidelines for complex cases. But I would soften the language, as there's no one-size-fits-all (e.g., rephrase as "when adding impls that use the Vector API, consider:").

Member Author:

I've softened the language to "prefer the following approach unless the specifics of the situation dictate otherwise".

* Use configuration flags and hardware support detection to ensure that the
vectorized implementation is selected only when running on hardware where it is
expected to perform better than its scalar equivalent.
* Add tests that ensure the behavior of the vectorized and scalar
implementations match.
* Include micro-benchmarks that demonstrate the performance benefits of the
vectorized implementation compared to the scalar equivalent logic. Ensure that
the benefits hold for all CPU architectures on which the vectorized
implementation is enabled.
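
A minimal sketch of the scalar-plus-alternative pattern and the matching check follows. All names (`SumKernels`, `SumKernel`, `select`, `implementationsMatch`) are hypothetical. To keep the example compilable without the incubating `jdk.incubator.vector` module, an unrolled scalar loop stands in for the Vector API variant; a real implementation would substitute vectorized code behind the same interface.

```java
import java.util.Random;

// Hypothetical names throughout. The unrolled variant is a stand-in for a
// Vector API implementation so that the sketch needs no incubator module.
final class SumKernels
{
    interface SumKernel
    {
        long sum(int[] values);
    }

    private SumKernels() {}

    // Scalar reference implementation.
    static long scalarSum(int[] values)
    {
        long total = 0;
        for (int value : values) {
            total += value;
        }
        return total;
    }

    // Stand-in for the vectorized variant: four accumulators plus a scalar tail loop.
    static long unrolledSum(int[] values)
    {
        long a = 0;
        long b = 0;
        long c = 0;
        long d = 0;
        int i = 0;
        for (; i + 4 <= values.length; i += 4) {
            a += values[i];
            b += values[i + 1];
            c += values[i + 2];
            d += values[i + 3];
        }
        long total = a + b + c + d;
        for (; i < values.length; i++) {
            total += values[i];
        }
        return total;
    }

    // The caller decides (from configuration and hardware detection) which variant to use.
    static SumKernel select(boolean useVectorized)
    {
        return useVectorized ? SumKernels::unrolledSum : SumKernels::scalarSum;
    }

    // Returns true when both variants agree on random inputs of every length below maxLength,
    // which covers the empty array and lengths that are not a multiple of four.
    static boolean implementationsMatch(long seed, int maxLength)
    {
        Random random = new Random(seed);
        for (int length = 0; length < maxLength; length++) {
            int[] values = random.ints(length).toArray();
            if (scalarSum(values) != unrolledSum(values)) {
                return false;
            }
        }
        return true;
    }
}
```

In a unit test, `implementationsMatch(42L, 64)` could serve as the behavioral check. The micro-benchmark bullet is not covered here; a benchmark harness such as JMH comparing both methods on each target architecture would be one way to address it.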

## Keep pom.xml clean and sorted

There are several plugins in place to keep pom.xml clean.