Skip to content

Using and Writing Kernel Functions

Edmond Chow edited this page Oct 17, 2020 · 8 revisions

In H2Pack, the supported kernel functions K(x,y) should be translationally-invariant (i.e., K(x,y)=K(x-y)) and symmetric (i.e., K(x,y)=K(y,x)). You can either write your own kernel function or use one that is built into H2Pack.

Writing Your Own Kernel Function

To use your own kernel in H2Pack, you need to write a Kernel Matrix Evaluation (KME) function which takes two point sets X0, Y0, and outputs the kernel block K(X0, Y0). Note that the performance of H2Pack depends heavily on the performance of evaluating the kernel function. Your KME function must be defined as follows:

// Input parameters:
//   coord0     : Matrix, size pt_dim-by-ld0, coordinates of the 1st point set
//   ld0        : Leading dimension of coord0, should be >= n0
//   n0         : Number of points in coord0 (each column in coord0 is a coordinate)
//   coord1     : Matrix, size pt_dim-by-ld1, coordinates of the 2nd point set
//   ld1        : Leading dimension of coord1, should be >= n1
//   n1         : Number of points in coord1 (each column in coord0 is a coordinate)
//   ldm        : Leading dimension of the kernel matrix
//   krnl_param : Pointer to kernel function parameter array
// Output parameter:
//   mat : Obtained kernel matrix, size n0-by-ldm
void name_of_your_kernel(
    const DTYPE *coord0, const int ld0, const int n0,
    const DTYPE *coord1, const int ld1, const int n1,
    const void *krnl_param, DTYPE *mat, const int ldm
);

A KME function takes two sets of point coordinates and a parameter array as input and returns a kernel matrix. Two sets of point coordinates: coord0 is a pt_dim-by-ld0 row-major matrix, its first n0 columns are point coordinates, and n0 <= ld0. coord1 is a pt_dim-by-ld1 row-major matrix, its first n1 columns are point coordinates, and n1 <= ld1. The parameter array krnl_param contains the parameters used by the kernel. The returning matrix is a row-major matrix of size n0-by-ldm, its first n0-by-n1 block stores the kernel matrix.

An example of a 3D Gaussian kernel:

void Gaussian_3D_eval_std(
    const DTYPE *coord0, const int ld0, const int n0,
    const DTYPE *coord1, const int ld1, const int n1,
    const void *krnl_param, DTYPE *mat, const int ldm
)
{
    const DTYPE *x0 = coord0 + ld0 * 0, *x1 = coord1 + ld1 * 0;
    const DTYPE *y0 = coord0 + ld0 * 1, *y1 = coord1 + ld1 * 1;
    const DTYPE *z0 = coord0 + ld0 * 2, *z1 = coord1 + ld1 * 2;
    const DTYPE *krnl_param_ = (DTYPE*) krnl_param;
    const DTYPE l = krnl_param_[0];
    for (int i = 0; i < n0; i++)
    {
        const DTYPE x0_i = x0[i], y0_i = y0[i], z0_i = z0[i];
        DTYPE *kmat_i = kmat + i * ldm;
        #pragma omp simd  // Don't forget it! 
        for (int j = 0; j < n1; j++)
        {
            DTYPE dx = x1[j] - x0_i;
            DTYPE dy = y1[j] - y0_i;
            DTYPE dz = z1[j] - z0_i;
            DTYPE r2 = dx * dx + dy * dy + dz * dz;
            kmat_i[j] = DEXP(-l * r2);
        }
    }
}

-construction,-matvec in JIT mode, and -matmul in JIT mode all need to evaluate a large number of kernel matrix blocks. Instead of evaluating the kernel function at one pair of points, the above KME function allows the compiler to vectorize kernel evaluations and thus provides better performance.

Note 1: In the function, we use DEXP() instead of exp(). DEXP is a macro defined in src/include/H2Pack_config.h, and by default DEXP is defined as exp. Using DEXP() instead of exp() allows H2Pack and your kernel to be used in single precision mode without extra work. We recommend users to use the macros provided in H2Pack_config.h.

Note 2: You do not need to and are actually not supposed to parallelize your KME function with multiple threads. A KME function should be single-threaded and should only use variables or memory that can be updated by the current thread.

Advanced Kernel Functions for Better Efficiency:

  • For better -matvec and -matmul performance in JIT mode, users can provide an additional kernel-related function called the Bi-Kernel Matvec (BKM) function. For more details, see Bi-Kernel Matvec (BKM) Functions.

  • For even better performance of -construction, -matvec (JIT mode), and -matmul (JIT mode) performance, vector wrapper functions can be used for writing the KME and/or BKM functions. For more details, see Vector Wrapper Functions For Kernel Functions.

Note: Providing a KME function is both sufficient and necessary for H2Pack to work properly. The above BKM function and vector wrapper functions are used as auxiliary tools to further improve the efficiency of H2Pack.

Using Optimized Kernels in H2Pack

H2Pack provides the following optimized kernels (notation: equation, 2D / 3D means the point coordinate dimension):

  • 2D Laplace:
  • 3D Coulomb:
  • 2D & 3D Gaussian: , is a scalar parameter
  • 2D & 3D Exponential: , is a scalar parameter
  • 2D & 3D Matern 3/2: , is a scalar parameter
  • 2D & 3D Matern 5/2: , is a scalar parameter
  • 2D & 3D Quadratic: , are two scalar parameters
  • 3D Stokes:

To use these optimized kernels, add #include "H2Pack_2D_kernels.h" and #include "H2Pack_3D_kernels.h"to your C/C++ file. Both KME and BKM functions are provided:

  • For KME function pointer kernel_eval_fptr, the optimized scalar kernels (i.e. except Stokes) are named as <kernel_prefix>_eval_intrin_d, the suffix _intrin_d means the function is manually vectorized for double precision. For 3D Stokes kernel, the function is Stokes_eval_std without manual vectorization.
  • For BKM function pointer kernel_bimv_fptr, the optimized kernels are named as <kernel_prefix>_krnl_bimv_intrin_d, the suffix _intrin_d means the function is manually vectorized for double precision.
  • For BKM flop numbers krnl_bimv_flop, the values are named as <kernel_prefix>_krnl_bimv_flop.

Kernel naming prefixes:

Kernel Naming Prefix
2D Laplace Laplace_2D
3D Coulomb Coulomb_3D
2D & 3D Gaussian Gaussian_{2, 3}D
2D & 3D Exponential Expon_{2, 3}D
2D & 3D Matern 3/2 Matern32_{2, 3}D
2D & 3D Matern 5/2 Matern52_{2, 3}D
2D & 3D Quadratic Quadratic_{2, 3}D
3D Stokes Stokes

An example code for using 3D Gaussian kernel:

// Partitioning points
int max_leaf_points = 0;
DTYPE max_leaf_size = 0.0;
char *pp_fname = "./PP_3DCoulomb_1e-6.dat";
H2P_calc_enclosing_box(pt_dim, n_point, coord, pp_fname, &h2pack->root_enbox);
H2P_partition_points(h2pack, n_point, coord, max_leaf_points, max_leaf_size);

// Kernel configuration
DTYPE krnl_param[1] = {2.0}; 
kernel_eval_fptr krnl_eval = Gaussian_3D_eval_intrin_d;
kernel_bimv_fptr krnl_bimv = Gaussian_3D_krnl_bimv_intrin_d;
int krnl_bimv_flop = Gaussian_3D_krnl_bimv_flop;

// Constructing proxy points
H2P_dense_mat_p *pp;
H2P_generate_proxy_point_ID_file(h2pack, krnl_param, krnl_eval, pp_fname, &pp);

// Constructing H2 representation
int BD_JIT = 1;
H2P_build(
    h2pack, pp, BD_JIT, krnl_param, 
    krnl_eval, krnl_bimv, krnl_bimv_flop
);
Clone this wiki locally