
Parallel Efficiency Tests

Edmond Chow edited this page Oct 18, 2020 · 5 revisions

Tests on this page were performed on a server with the following hardware and software configuration:

  • 2 * Intel Xeon Gold 6226 CPU @ 2.7GHz (2 * 12 cores, 2 * 12 * 2 threads, hyperthreading disabled)
  • 6 * 32 GB DDR4 memory
  • Red Hat Enterprise Linux 7.6 (kernel 3.10.0-957.12.1.el7)
  • Intel Parallel Studio Cluster version 2019.5
  • ICC optimization flags: -O3 -xHost
  • OpenMP environment variables
    • OMP_NUM_THREADS=1, 2, 4, 8, 12, 24
    • OMP_PLACES=cores
    • OMP_PROC_BIND=close
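
As a rough sketch of how the OpenMP settings listed above could be applied when scripting the runs, the helper below builds one environment per thread count. The function name `openmp_env` and the idea of launching the benchmark through `subprocess` are assumptions made for illustration; the page does not say how the tests were actually driven.

```python
import os

# Thread counts taken from the OMP_NUM_THREADS list on this page.
THREAD_COUNTS = [1, 2, 4, 8, 12, 24]


def openmp_env(n_threads, base_env=None):
    """Return a copy of the environment with the OpenMP settings applied.

    `base_env` defaults to the current process environment; passing an
    explicit dict makes the helper easy to inspect in isolation.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["OMP_NUM_THREADS"] = str(n_threads)
    env["OMP_PLACES"] = "cores"      # one place per physical core
    env["OMP_PROC_BIND"] = "close"   # keep threads on neighbouring places
    return env


if __name__ == "__main__":
    for n in THREAD_COUNTS:
        env = openmp_env(n, base_env={})
        # A real driver might call subprocess.run([...], env=env) here,
        # with the benchmark executable (not named on this page) as the command.
        print(n, env)
```

With `OMP_PLACES=cores` and `OMP_PROC_BIND=close`, the 12-thread run would be expected to fill nearby cores first, although the exact placement depends on how the OpenMP runtime numbers the cores.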

Test point sets: 800,000 uniformly and randomly distributed points in a 3D unit ball
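
For readers who want to reproduce a similar point set, the following is a minimal sketch of one standard way to draw points uniformly from a 3D unit ball: pick a direction from a normalized Gaussian vector and a radius as the cube root of a uniform variate. The function name `sample_unit_ball` and the seed are hypothetical; the page does not state which generator was used for the original tests.

```python
import math
import random


def sample_unit_ball(n_points, seed=0):
    """Return n_points (x, y, z) tuples drawn uniformly from the 3D unit ball."""
    rng = random.Random(seed)
    points = []
    while len(points) < n_points:
        # A normalized Gaussian vector is uniformly distributed on the sphere.
        x, y, z = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        norm = math.sqrt(x * x + y * y + z * z)
        if norm == 0.0:
            continue  # practically never happens; resample to avoid dividing by zero
        # Volume grows like r^3, so the radius is the cube root of a uniform variate.
        radius = rng.random() ** (1.0 / 3.0)
        scale = radius / norm
        points.append((x * scale, y * scale, z * scale))
    return points


if __name__ == "__main__":
    pts = sample_unit_ball(10)
    for p in pts:
        print(p)
```

Calling `sample_unit_ball(800_000)` would give a set of the size used here, though a vectorized generator would be much faster than this pure-Python loop.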

Prescribed QR relative error tolerance: 1e-6

Report H2-build and H2-matvec timings

Test kernel: Coulomb

| Components and Running modes | Parallel Efficiency Using 12 Cores | Parallel Efficiency Using 24 Cores |
| --- | --- | --- |
| H2-construction, AOT | 65.0% | 56.5% |
| H2-matvec, AOT | 68.8% | 58.3% |
| H2-construction, JIT | 71.2% | 56.9% |
| H2-matvec, JIT | 66.7% | 54.8% |
| H2-matvec, JIT, projected by CPU frequency | 89.7% | 73.7% |
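
The page does not spell out how parallel efficiency is computed from the timings. A minimal sketch, assuming the usual definition (single-core time divided by the product of core count and multi-core time), is shown below. The timings in the example are placeholders chosen for easy arithmetic, not measurements from these tests.

```python
def parallel_efficiency(t_serial, t_parallel, n_cores):
    """Parallel efficiency under the usual definition T_1 / (p * T_p).

    t_serial   -- wall time using 1 core (seconds)
    t_parallel -- wall time using n_cores cores (seconds)
    n_cores    -- number of cores used for the parallel run
    """
    if t_parallel <= 0 or n_cores <= 0:
        raise ValueError("t_parallel and n_cores must be positive")
    return t_serial / (n_cores * t_parallel)


if __name__ == "__main__":
    # Placeholder timings (NOT measured data): 100 s on 1 core, 10 s on 20 cores.
    eff = parallel_efficiency(100.0, 10.0, 20)
    print(f"{eff:.1%}")  # prints "50.0%"
```

Under this definition, a value of 100% means the speedup equals the number of cores used.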

The parallel efficiency results above look different from those in our paper because here we used a much larger point set and the computing node has fewer processors. Nevertheless, the conclusions in the paper still hold.
