Description
Precomputing the orientation information, instead of recomputing it at each q point, can save 6 trig functions, 9 multiplications and 5 additions per point. In absolute terms, that is about 325k operations on a 128x128 detector. In relative terms, the fcc mode uses an additional 4 special functions, 49 multiplications and 16 additions, so this could be a 25% speedup.
Need to transform:
q = sqrt(qx*qx + qy*qy);
const double qxhat = qx/q;
const double qyhat = qy/q;
double sin_theta, cos_theta;
double sin_phi, cos_phi;
double sin_psi, cos_psi;
SINCOS(theta*M_PI_180, sin_theta, cos_theta);
SINCOS(phi*M_PI_180, sin_phi, cos_phi);
SINCOS(psi*M_PI_180, sin_psi, cos_psi);
cos_alpha = cos_theta*cos_phi*qxhat + sin_theta*qyhat;
cos_mu = (-sin_theta*cos_psi*cos_phi - sin_psi*sin_phi)*qxhat + cos_theta*cos_psi*qyhat;
cos_nu = (-cos_phi*sin_psi*sin_theta + sin_phi*cos_psi)*qxhat + sin_psi*cos_theta*qyhat;
Into a precompute phase:
double sin_theta, cos_theta;
double sin_phi, cos_phi;
double sin_psi, cos_psi;
SINCOS(theta*M_PI_180, sin_theta, cos_theta);
SINCOS(phi*M_PI_180, sin_phi, cos_phi);
SINCOS(psi*M_PI_180, sin_psi, cos_psi);
alpha_x = cos_theta*cos_phi;
alpha_y = sin_theta;
mu_x = -sin_theta*cos_psi*cos_phi - sin_psi*sin_phi;
mu_y = cos_theta*cos_psi;
nu_x = -cos_phi*sin_psi*sin_theta + sin_phi*cos_psi;
nu_y = sin_psi*cos_theta;
and a compute phase:
q = sqrt(qx*qx + qy*qy);
const double qxhat = qx/q;
const double qyhat = qy/q;
cos_alpha = alpha_x*qxhat + alpha_y*qyhat;
cos_mu = mu_x*qxhat + mu_y*qyhat;
cos_nu = nu_x*qxhat + nu_y*qyhat;
For polydisperse systems, the precompute phase must be run for each independent (theta,phi,psi) triple, but this can be done in parallel.
For polydisperse systems, a further sqrt, 4 multiplies and an add can be saved per evaluation by precomputing q, qxhat and qyhat for each point. Again, this can be done in parallel.
Could be implemented using global working memory (ticket SasView/sasview#810).
Migrated from http://trac.sasview.org/ticket/782
{
"status": "new",
"changetime": "2019-02-14T03:35:20",
"_ts": "2019-02-14 03:35:20.686629+00:00",
"description": "Can save 6 trig functions 9 multiplications and 5 additions by precomputing the orientation info for each q point. In absolute terms, it is 325k operations on a 128x128 detector. In relative terms, the fcc mode uses an additional 4 special functions, 49 multiplications and 16 adds, so this could be a 25% speed up. \n\nNeed to transform:\n{{{\n q = sqrt(qx*qx + qy*qy);\n const double qxhat = qx/q;\n const double qyhat = qy/q;\n double sin_theta, cos_theta;\n double sin_phi, cos_phi;\n double sin_psi, cos_psi;\n SINCOS(theta*M_PI_180, sin_theta, cos_theta);\n SINCOS(phi*M_PI_180, sin_phi, cos_phi);\n SINCOS(psi*M_PI_180, sin_psi, cos_psi);\n cos_alpha = cos_theta*cos_phi*qxhat + sin_theta*qyhat;\n cos_mu = (-sin_theta*cos_psi*cos_phi - sin_psi*sin_phi)*qxhat + cos_theta*cos_psi*qyhat;\n cos_nu = (-cos_phi*sin_psi*sin_theta + sin_phi*cos_psi)*qxhat + sin_psi*cos_theta*qyhat;\n}}}\n\nInto a precompute phase:\n{{{\n double sin_theta, cos_theta;\n double sin_phi, cos_phi;\n double sin_psi, cos_psi;\n SINCOS(theta*M_PI_180, sin_theta, cos_theta);\n SINCOS(phi*M_PI_180, sin_phi, cos_phi);\n SINCOS(psi*M_PI_180, sin_psi, cos_psi);\n alpha_x = cos_theta*cos_phi;\n alpha_y = sin_theta;\n mu_x = -sin_theta*cos_psi*cos_phi - sin_psi*sin_phi;\n mu_y = cos_theta*cos_psi;\n nu_x = -cos_phi*sin_psi*sin_theta + sin_phi*cos_psi;\n nu_y = sin_psi*cos_theta;\n}}}\n\nand a compute phase:\n{{{\n q = sqrt(qx*qx + qy*qy);\n const double qxhat = qx/q;\n const double qyhat = qy/q;\n cos_alpha = alpha_x*qxhat + alpha_y*qyhat;\n cos_mu = mu_x*qxhat + mu_y*qyhat;\n cos_nu = nu_x*qxhat + nu_y*qyhat;\n}}}\n\nFor polydisperse systems, need to precompute for each independent (theta,phi,psi) triple, but this can be done in parallel.\n\nFor polydisperse systems, can save a sqrt, 4 multiplies and an add by precomputing q, qxhat and qyhat for each point. Again, this can be done in parallel.\n\nCould be implemented using global working memory (ticket #679).",
"reporter": "pkienzle",
"cc": "",
"resolution": "",
"workpackage": "SasView Bug Fixing",
"time": "2016-10-14T15:00:42",
"component": "sasmodels",
"summary": "Performance tuning for 2D calculations",
"priority": "minor",
"keywords": "",
"milestone": "sasmodels WishList",
"owner": "",
"type": "enhancement"
}