
GHMC is slow #22

Open
kyleabeauchamp opened this issue Dec 5, 2014 · 17 comments

@kyleabeauchamp
Collaborator

My crude experiments suggest that, for a 4500-atom system, the GHMC integrator is 3-4 times slower than Langevin.

Given this slowdown, most users would be more inclined to simply reduce their timestep and stick with Langevin. A faster GHMC implementation might change that.

@jchodera
Member

jchodera commented Dec 5, 2014

I've noticed that a number of CustomIntegrator-based integrators are pretty slow, so I'm wondering whether there is some general optimization that can be done.

Were you using CUDA or OpenCL here?

@kyleabeauchamp
Collaborator Author

CUDA

@jchodera
Member

jchodera commented Dec 5, 2014

It certainly shouldn't be 3-4 times slower. Yes, it does require that the force and energy be computed each timestep, but that alone shouldn't make it 4x slower.

One issue is that sigma is recomputed each iteration, rather than just once at the beginning:
https://github.com/choderalab/openmmtools/blob/master/OpenMMTools/integrators.py#L553

If we extended the API to add an addInitializePerDof() method that ensures this is only computed once, that would speed things up.
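
In the meantime, a workaround would be to compute sigma once on the Python side and push it into the integrator as a per-DOF variable, dropping the addComputePerDof("sigma", ...) line from the step program. A rough sketch, assuming the integrator declares a sigma per-DOF variable and the temperature stays fixed (the helper name is hypothetical):

import math
from simtk import openmm, unit

def precompute_sigma(integrator, system, temperature):
    """Set the per-DOF variable 'sigma' = sqrt(kT/m) once, up front,
    instead of re-evaluating it inside the step program every timestep.
    Hypothetical helper; only valid if the temperature is constant."""
    kT = (unit.MOLAR_GAS_CONSTANT_R * temperature).value_in_unit(unit.kilojoule_per_mole)
    sigma = []
    for i in range(system.getNumParticles()):
        m = system.getParticleMass(i).value_in_unit(unit.dalton)
        # sqrt((kJ/mol) / dalton) comes out in nm/ps, OpenMM's internal velocity unit
        s = math.sqrt(kT / m) if m > 0.0 else 0.0  # zero for massless virtual sites
        sigma.append(openmm.Vec3(s, s, s))
    integrator.setPerDofVariableByName("sigma", sigma)

That only saves the per-DOF evaluation, though; the bigger costs are probably elsewhere.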

There might be other hidden issues.

@jchodera
Member

jchodera commented Dec 5, 2014

For now, a viable route would be to adaptively tune the timestep for very high acceptance rates (e.g. 99.999%) with GHMC, then lock in that timestep for VVVR.
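
Something like this tuning loop could do it. This is only a sketch: it assumes the GHMC integrator exposes naccept/ntrials global counters, the function and parameter names are hypothetical, and steps_per_trial would have to be very large to actually resolve a 99.999% target.

def tune_ghmc_timestep(integrator, target_acceptance=0.99999, steps_per_trial=1000, shrink=0.8, max_iterations=50):
    """Shrink the GHMC timestep until the measured acceptance rate clears the
    target, then return that timestep so it can be locked in for VVVR.
    The integrator must already be attached to a Context/Simulation."""
    for _ in range(max_iterations):
        integrator.setGlobalVariableByName("naccept", 0)
        integrator.setGlobalVariableByName("ntrials", 0)
        integrator.step(steps_per_trial)
        acceptance = integrator.getGlobalVariableByName("naccept") / integrator.getGlobalVariableByName("ntrials")
        if acceptance >= target_acceptance:
            break
        integrator.setStepSize(integrator.getStepSize() * shrink)
    return integrator.getStepSize()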

@jchodera
Member

jchodera commented Dec 5, 2014

> CUDA

Could you give OpenCL a try?

@peastman is certainly more familiar with the implementation details, but I believe the CUDA version interprets the various steps, launching kernels on the GPU as it goes, while the OpenCL version actually compiles the integrator into a GPU kernel. But I may be wrong; that may have been an older implementation.

@peastman

peastman commented Dec 5, 2014

The CUDA and OpenCL versions are nearly identical.

Looking at the code, I see several reasons for it to be slower. First, it applies constraints five different times (twice for positions and three times for velocities). The Langevin integrator only applies constraints once. Depending on what sort of constraints your system has, that could have a big impact on speed.

Second, it requires two force/energy evaluations per time step. You use the force and energy, so it has to compute them:

integrator.addComputeGlobal("Eold", "ke + energy")
...
integrator.addComputePerDof("v", "v + 0.5*dt*f/m")

Then you modify the positions, so the forces it computed before are no longer valid:

integrator.addComputePerDof("x", "x + v*dt")

Then you use the force and energy again, so it has to recompute them:

integrator.addComputePerDof("v", "v + 0.5*dt*f/m + (x-x1)/dt")
...
integrator.addComputeGlobal("Enew", "ke + energy")

And then you modify the positions yet again, which again invalidates the forces and means they'll have to be recomputed at the start of the next time step:

integrator.addComputePerDof("x", "x*accept + xold*(1-accept)")

Anything that modifies the positions invalidates the forces, whether it's addComputePerDof("x", ...), addConstrainPositions(), or potentially addUpdateContextState().
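
To make the bookkeeping concrete, here is the Metropolized core of the step program with the evaluation points marked. This is a sketch of the pattern above, not the full integrator; it assumes the usual globals and per-DOF variables have already been declared on the CustomIntegrator.

def add_ghmc_core(integrator):
    # Assumes kT, ke, Eold, Enew, xold, vold, x1, and accept were declared elsewhere.
    integrator.addComputeSum("ke", "0.5*m*v*v")
    integrator.addComputeGlobal("Eold", "ke + energy")              # force/energy evaluation #1
    integrator.addComputePerDof("xold", "x")
    integrator.addComputePerDof("vold", "v")
    integrator.addComputePerDof("v", "v + 0.5*dt*f/m")              # reuses evaluation #1
    integrator.addComputePerDof("x", "x + v*dt")                    # positions change: forces invalidated
    integrator.addComputePerDof("x1", "x")
    integrator.addConstrainPositions()                              # also a position change
    integrator.addComputePerDof("v", "v + 0.5*dt*f/m + (x-x1)/dt")  # force/energy evaluation #2
    integrator.addConstrainVelocities()
    integrator.addComputeSum("ke", "0.5*m*v*v")
    integrator.addComputeGlobal("Enew", "ke + energy")              # reuses evaluation #2
    integrator.addComputeGlobal("accept", "step(exp(-(Enew-Eold)/kT) - uniform)")
    integrator.addComputePerDof("x", "x*accept + xold*(1-accept)")  # invalidates forces again,
    integrator.addComputePerDof("v", "v*accept - vold*(1-accept)")  # so the next step starts cold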

@kyleabeauchamp
Collaborator Author

Is there something like a "stack" that can be used to cache the old forces, energies, and positions without massive recomputation?

@peastman

peastman commented Dec 5, 2014

No, it only caches one set of forces at a time, then throws them out when anything causes them to become invalid.

@jchodera
Member

@kyleabeauchamp Did you want me to take a stab at speeding this up? Or are you on that?

@kyleabeauchamp
Collaborator Author

I think eventually we should do that. My own work is still at the level of running some basic tests on these things.

@kyleabeauchamp
Collaborator Author

I think the easiest way to speed up our GHMC code is to avoid recalculating the energy at every timestep. This is easily done by doing n_steps iterations of Hamiltonian dynamics with every round of GHMC; our current code essentially hard-wires n_steps = 1, which kills us on the energy calculation.
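
Roughly, the builder could take an n_steps argument and unroll that many velocity Verlet steps between the Eold and Enew computations, so the Metropolis test (and its energy read) is paid once per block rather than once per timestep. A sketch of just that inner section, in the same spirit as the current integrator (variable declarations and the velocity randomization omitted):

def add_metropolized_block(integrator, n_steps):
    integrator.addComputeSum("ke", "0.5*m*v*v")
    integrator.addComputeGlobal("Eold", "ke + energy")
    integrator.addComputePerDof("xold", "x")
    integrator.addComputePerDof("vold", "v")
    # n_steps steps of constrained velocity Verlet, unrolled at build time
    for _ in range(n_steps):
        integrator.addComputePerDof("v", "v + 0.5*dt*f/m")
        integrator.addComputePerDof("x", "x + v*dt")
        integrator.addComputePerDof("x1", "x")
        integrator.addConstrainPositions()
        integrator.addComputePerDof("v", "v + 0.5*dt*f/m + (x-x1)/dt")
        integrator.addConstrainVelocities()
    integrator.addComputeSum("ke", "0.5*m*v*v")
    integrator.addComputeGlobal("Enew", "ke + energy")
    integrator.addComputeGlobal("accept", "step(exp(-(Enew-Eold)/kT) - uniform)")
    integrator.addComputePerDof("x", "x*accept + xold*(1-accept)")
    integrator.addComputePerDof("v", "v*accept - vold*(1-accept)")

With the block form, the forces evaluated at the end of one inner step are reused for the first half-kick of the next, so the cost approaches one evaluation per timestep instead of two.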

@jchodera
Member

We can definitely expose the number of steps as a parameter. The acceptance rate falls off with the number of steps, but there is likely an optimum, and there may be other tricks (as you've pointed out) to keep acceptance rates high.

Should I submit a PR?

@kyleabeauchamp
Collaborator Author

Let's punt until this project is our top priority. I believe I already have this code written as well.

@jchodera
Member

Did you experiment with removing the extra velocity/position constraint steps too? I'm really curious why the CustomIntegrator is so slow---is it really just the energy call?

@kyleabeauchamp
Collaborator Author

I did not experiment with the constraint steps yet.

@jchodera
Member

> I think the easiest way to speed up our GHMC code is to avoid recalculating the energy at every timestep. This is easily done by doing n_steps iterations of Hamiltonian dynamics with every round of GHMC; our current code essentially hard-wires n_steps = 1, which kills us on the energy calculation.

Oh, there's another way to avoid this that doesn't involve doing extra Hamiltonian dynamics inside GHMC (which decreases the acceptance rate): have your n_steps parameter run a block of GHMC steps that are unrolled in the integrator, so that the total energy from the last GHMC step is not recomputed for the next one.
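
A rough sketch of the bookkeeping for one of those unrolled blocks. The Uold/Unew globals and the helper are hypothetical, the "..." comments stand in for the velocity randomization and velocity Verlet step as in the current integrator, and note that this only skips the extra energy read at the top of each step; the forces themselves still get recomputed after the accept/reject position update.

def add_unrolled_ghmc_block(integrator, n_steps):
    integrator.addComputeGlobal("Uold", "energy")  # seed the cached potential once per block
    for _ in range(n_steps):
        # ... velocity randomization and xold/vold bookkeeping as before ...
        integrator.addComputeSum("ke", "0.5*m*v*v")
        integrator.addComputeGlobal("Eold", "ke + Uold")  # no 'energy' read here
        # ... one step of constrained velocity Verlet as before ...
        integrator.addComputeSum("ke", "0.5*m*v*v")
        integrator.addComputeGlobal("Unew", "energy")
        integrator.addComputeGlobal("Enew", "ke + Unew")
        integrator.addComputeGlobal("accept", "step(exp(-(Enew-Eold)/kT) - uniform)")
        integrator.addComputePerDof("x", "x*accept + xold*(1-accept)")
        integrator.addComputePerDof("v", "v*accept - vold*(1-accept)")
        # Carry the accepted (or rejected) potential forward so the next GHMC
        # step doesn't recompute the total energy of its starting state.
        integrator.addComputeGlobal("Uold", "accept*Unew + (1-accept)*Uold")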

@kyleabeauchamp
Collaborator Author

I think I like the idea of just exposing n_steps for now, as it's a hyperparameter that can be tuned rather easily. It also cleans up the code base.
