Skip to content

Bi Kernel Matvec Functions

Edmond Chow edited this page Oct 18, 2020 · 3 revisions

Bi-Kernel Matvec (BKM) Functions

An optional BKM function can be used to further accelerate matrix-vector multiplication in JIT mode.

Since H2Pack currently focuses on symmetric kernel matrices, for most Bi,j and Di,j kernel blocks, the matrix-vector multiplication needs to multiply one block by a vector and multiply its transposes by another vector. If only a kernel matrix evaluation (KME) function is provided, H2Pack will compute this Bi,j or Di,j block first, and immediately use the block twice for the two multiplications. In this approach, the computed Bi,j or Di,j block needs to be stored in a buffer (usually in L2 cache) and transferred to the processor. A BKM function further eliminates this data movement for better performance.

A BKM function must be defined as follows:

// Input parameters:
//   coord0     : Matrix, size pt_dim-by-ld0, coordinates of the 1st point set
//   ld0        : Leading dimension of coord0, should be >= n0
//   n0         : Number of points in coord0 (each column in coord0 is a coordinate)
//   coord1     : Matrix, size pt_dim-by-ld1, coordinates of the 2nd point set
//   ld1        : Leading dimension of coord1, should be >= n1
//   n1         : Number of points in coord1 (each column in coord0 is a coordinate)
//   krnl_param : Pointer to kernel function parameter array
//   x_in_0     : Vector, size >= krnl_dim * n1, will be left multiplied by kernel_matrix(coord0, coord1)
//   x_in_1     : Vector, size >= krnl_dim * n0, will be left multiplied by kernel_matrix(coord1, coord0).
// Output parameters:
//   x_out_0 : Vector, size >= krnl_dim * n0, x_out_0 += kernel_matrix(coord0, coord1) * x_in_0
//   x_out_1 : Vector, size >= krnl_dim * n1, x_out_1 += kernel_matrix(coord1, coord0) * x_in_1
void your_bkm_function_name (
    const DTYPE *coord0, const int ld0, const int n0,
    const DTYPE *coord1, const int ld1, const int n1,
    const void *krnl_param, 
    const DTYPE *x_in_0, const DTYPE *x_in_1, 
    DTYPE * restrict x_out_0, DTYPE * restrict x_out_1
);

The first 7 input parameters are the same as that of a KME function. x_in_0 is a vector of size krnl_dim * n1 that will be multiplied by the kernel matrix block and the output vector will be accumulated to output vector x_out_0 of size krnl_dim * n0. Similarly, x_in_1 is a vector of size krnl_dim * n0 that will be multiplied by the transpose of kernel matrix block and the output vector will be accumulated to output vector x_out_1 of size krnl_dim * n1.

An example of a KME function for the 3D Gaussian kernel:

void Gaussian_3D_krnl_bimv_std(
    const DTYPE *coord0, const int ld0, const int n0,
    const DTYPE *coord1, const int ld1, const int n1,
    const void *krnl_param, 
    const DTYPE *x_in_0, const DTYPE *x_in_1, 
    DTYPE * restrict x_out_0, DTYPE * restrict x_out_1
)
{
    const DTYPE *x0 = coord0 + ld0 * 0, *x1 = coord1 + ld1 * 0;
    const DTYPE *y0 = coord0 + ld0 * 1, *y1 = coord1 + ld1 * 1;
    const DTYPE *z0 = coord0 + ld0 * 2, *z1 = coord1 + ld1 * 2;
    const DTYPE *krnl_param_ = (DTYPE*) krnl_param;
    const DTYPE l = krnl_param_[0];
    for (int i = 0; i < n0; i++)
    {
        const DTYPE x0_i = x0[i], y0_i = y0[i], z0_i = z0[i];
        const DTYPE x_in_1_i = x_in_1[i];
        DTYPE sum_i = 0.0;
        #pragma omp simd  // Don't forget it! 
        for (int j = 0; j < n1; j++)
        {
            DTYPE dx = x1[j] - x0_i;
            DTYPE dy = y1[j] - y0_i;
            DTYPE dz = z1[j] - z0_i;
            DTYPE r2 = dx * dx + dy * dy + dz * dz;
            DTYPE kval = DEXP(-l * r2);
            sum_i += kval * x_in_0[j];
            x_out_1[j] += kval * x_in_1_i;
        }
        x_out_0[i] += sum_i;
    }
}

This BKM function is very similar to the 3D Gaussian kernel KME function in Using and Writing Kernel Functions. The only difference is that the calculated kernel matrix element kval is consumed and discarded immediately in the BKM function but stored to kmat in the KME function.

For tensor kernels, H2Pack has other requirements for their BKM functions. If you need to write a BKM function for your tensor kernel, please email Hua Huang for further assistance.

Clone this wiki locally