mlohry edited this page Jun 21, 2023 · 5 revisions

Kernel fusion experiments

All runs were executed with the arguments below, with --kernel_variant set to 0, 1, or 2:

--dimension=3 --max_time_steps=1000 --eps2=0.0019974621629115655 --sigma=3.5306509073075123 --domain_x_begin=-3.14159 --domain_x_end=3.14159 --time_integrator=ode23 --converged_rel_tol=1e-6 --kernel_variant=
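To repeat the sweep over kernel variants, the command lines can be generated rather than typed by hand. The following is a minimal sketch, not the project's own tooling: the flags are copied from the argument list above, while the executable name `./solver` and the helper `build_command` are hypothetical placeholders, since this page does not name the binary.

```python
import shlex

# Flags copied verbatim from the argument list on this page.
# Only --kernel_variant changes between runs.
BASE_ARGS = [
    "--dimension=3",
    "--max_time_steps=1000",
    "--eps2=0.0019974621629115655",
    "--sigma=3.5306509073075123",
    "--domain_x_begin=-3.14159",
    "--domain_x_end=3.14159",
    "--time_integrator=ode23",
    "--converged_rel_tol=1e-6",
]


def build_command(executable, kernel_variant):
    """Return the argument vector for one run with the given kernel variant."""
    return [executable, *BASE_ARGS, f"--kernel_variant={kernel_variant}"]


if __name__ == "__main__":
    # "./solver" is a hypothetical executable name; substitute the real binary.
    for variant in (0, 1, 2):
        print(shlex.join(build_command("./solver", variant)))
```

Each printed line can be pasted into a shell, or the list returned by `build_command` can be passed to `subprocess.run` directly.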

GPU

NVIDIA A100

| Variant | Evals per second | Speedup factor |
|---------|------------------|----------------|
| Unfused | 3543             | -              |
| Fused   | 6760             | 1.908x         |

GTX Titan Black

| Variant | Evals per second | Speedup factor |
|---------|------------------|----------------|
| Unfused | 272.1            | -              |
| Fused   | 323.0            | 1.19x          |

CPU

AMD Ryzen 9 3900X 12-core CPU, 12 OpenMP threads

| Variant | Evals per second | Speedup factor |
|---------|------------------|----------------|
| Unfused | 72.45            | -              |
| Fused   | 130.8            | 1.806x         |

Intel(R) Xeon(R) Gold 6242 CPU @ 2.80GHz, 32 OpenMP threads

| Variant | Evals per second | Speedup factor |
|---------|------------------|----------------|
| Unfused | 107.59           | -              |
| Fused   | 141.34           | 1.314x         |
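The speedup factor in each table is the fused rate divided by the unfused rate. The sketch below recomputes it from the figures reported on this page. The names `RESULTS` and `speedup` are illustrative only. Because the published rates are rounded, a recomputed factor may differ from the tabulated one in the last digit.

```python
# Evals per second as reported on this page: (unfused, fused).
RESULTS = {
    "NVIDIA A100": (3543.0, 6760.0),
    "GTX Titan Black": (272.1, 323.0),
    "AMD Ryzen 9 3900X": (72.45, 130.8),
    "Intel Xeon Gold 6242": (107.59, 141.34),
}


def speedup(unfused, fused):
    """Speedup of the fused kernel relative to the unfused baseline."""
    return fused / unfused


if __name__ == "__main__":
    for device, (unfused, fused) in RESULTS.items():
        print(f"{device}: {speedup(unfused, fused):.3f}x")
```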