Performance tuning for 2D calculations (Trac #782) #125

@pkienzle

Can save 6 trig functions, 9 multiplications and 5 additions per q point by precomputing the orientation info instead of recomputing it at each q point. In absolute terms, that is about 325k operations on a 128x128 detector. In relative terms, the fcc model uses an additional 4 special functions, 49 multiplications and 16 adds, so this could be a 25% speed up.
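
As a rough check of those figures, the arithmetic can be written out as below. This assumes every operation (trig, special function, multiply, add) costs the same, which is a simplification; the true ratio depends on the relative cost of each operation on the target hardware.

```math
N_{\text{saved}} = 128 \times 128 \times (6 + 9 + 5) = 16384 \times 20 = 327680 \approx 325\text{k}

\frac{\text{saved per point}}{\text{total per point}} = \frac{20}{20 + (4 + 49 + 16)} = \frac{20}{89} \approx 0.22
```

Under that equal-cost assumption the saving is a little over 20%, which is in line with the rough 25% estimate.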

Need to transform:

    q = sqrt(qx*qx + qy*qy);
    const double qxhat = qx/q;
    const double qyhat = qy/q;
    double sin_theta, cos_theta;
    double sin_phi, cos_phi;
    double sin_psi, cos_psi;
    SINCOS(theta*M_PI_180, sin_theta, cos_theta);
    SINCOS(phi*M_PI_180, sin_phi, cos_phi);
    SINCOS(psi*M_PI_180, sin_psi, cos_psi);
    cos_alpha = cos_theta*cos_phi*qxhat + sin_theta*qyhat;
    cos_mu = (-sin_theta*cos_psi*cos_phi - sin_psi*sin_phi)*qxhat + cos_theta*cos_psi*qyhat;
    cos_nu = (-cos_phi*sin_psi*sin_theta + sin_phi*cos_psi)*qxhat + sin_psi*cos_theta*qyhat;

Into a precompute phase:

    double sin_theta, cos_theta;
    double sin_phi, cos_phi;
    double sin_psi, cos_psi;
    SINCOS(theta*M_PI_180, sin_theta, cos_theta);
    SINCOS(phi*M_PI_180, sin_phi, cos_phi);
    SINCOS(psi*M_PI_180, sin_psi, cos_psi);
    alpha_x = cos_theta*cos_phi;
    alpha_y = sin_theta;
    mu_x = -sin_theta*cos_psi*cos_phi - sin_psi*sin_phi;
    mu_y = cos_theta*cos_psi;
    nu_x = -cos_phi*sin_psi*sin_theta + sin_phi*cos_psi;
    nu_y = sin_psi*cos_theta;

and a compute phase:

    q = sqrt(qx*qx + qy*qy);
    const double qxhat = qx/q;
    const double qyhat = qy/q;
    cos_alpha = alpha_x*qxhat + alpha_y*qyhat;
    cos_mu = mu_x*qxhat + mu_y*qyhat;
    cos_nu = nu_x*qxhat + nu_y*qyhat;

For polydisperse systems, need to precompute for each independent (theta,phi,psi) triple, but this can be done in parallel.

For polydisperse systems, can save a sqrt, 4 multiplies and an add by precomputing q, qxhat and qyhat for each point. Again, this can be done in parallel.

Could be implemented using global working memory (ticket SasView/sasview#810).

Migrated from http://trac.sasview.org/ticket/782

{
    "status": "new",
    "changetime": "2019-02-14T03:35:20",
    "_ts": "2019-02-14 03:35:20.686629+00:00",
    "reporter": "pkienzle",
    "cc": "",
    "resolution": "",
    "workpackage": "SasView Bug Fixing",
    "time": "2016-10-14T15:00:42",
    "component": "sasmodels",
    "summary": "Performance tuning for 2D calculations",
    "priority": "minor",
    "keywords": "",
    "milestone": "sasmodels WishList",
    "owner": "",
    "type": "enhancement"
}
