Skip to content

[newchem-cpp] We need to replace the gaussj_g routine (for licensing reasons) #423

@mabruzzo

Description

@mabruzzo

Some important considerations for replacing gaussj_g are described below:

  • While doing some background reading to understand what it does, I realized that this is a slightly modified version of a routine directly provided in "Numerical Recipes."

    • The original Numerical Recipes code can be found here
    • Code from Numerical Recipes CANNOT be merged into Grackle because it is commercially licensed and there are well-known incompatibilities with open source licenses
  • the only place this is called just uses it to solve the system of equations (it doesn't need matrix inversion). There are faster alternatives that don't invert the matrix. In comparison to the current algorithm (for solving a system of linear equations for 1 right-hand vector choice), then:

    1. rewriting Gauss-Jordan without the matrix-inversion logic reduces the number of operations by a factor of 2
    2. Gaussian Elimination with Back Substitution reduces the number of operations by a factor of 3
    3. When LU-decomposition is used, the number of operations is also reduced by a factor of 3.

    These numbers all come from Numerical Recipes. For our purposes, the last 2 choices have equivalent complexity (LU-decomposition is the standard choice since you can do the majority of the algorithmic work ahead of time without knowing the right-hand vector).

  • We could theoretically experiment with using LAPACK's dgesv routine. I think we will probably want to ship a default implementation anyway, so this is something we should consider later. Overall, this opens a large can of worms:

    • For the uninitiated, LAPACK is essentially a specification implemented by various libraries. They are usually implemented by BLAS routines (which are similarly a specification with many implementations). The selling point is that you can get highly optimized linear-algebra operations on different machines
    • The tradeoff is reproducibility. Different implementations adopt different algorithms and achieve different levels of accuracy. If we allowed people to use these accelerated implementations, we should probably make it very easy to dump all of the configuration information if they encounter a bug. (Numpy and scipy both do this).
    • To have any hope of reproducing user-reported errors, we would either need to:
      • ship a default routine as part of Grackle and only let people adopt an external LAPACK implementation if they opt into it (and not promise any support if they don't stick with the default)
      • come up with some sophisticated machinery to let people serialize Grackle's internal state immediately after invoking the LAPACK routine (but before encountering a bug)
    • If we don't ship a default built-in routine and require people to link against an external LAPACK implementation, I worry this will significantly complicate Grackle builds:
      • While CMake has handy machinery for locating a LAPACK implementation (along with its underlying BLAS routine), But I worry it may still be a little tricky for inexperienced students (especially if they don't have a LAPACK implementation already installed on their machine and they don't have administrative privileges)
      • I also don't think we should support linking LAPACK in the classic build-system
      • To upload pygrackle as a binary package on pypi, we will need to ship a precompiled copy of LAPACK (just like numpy and scipy)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    Status

    Todo

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions