Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cannot access frames or slice trajectories for large or combined dcds #4777

Open
vishlashkari opened this issue Nov 1, 2024 · 1 comment
Labels
Component-Readers Format-DCD more information needed Please reply to requests for information or the issue will be closed.

Comments

@vishlashkari
Copy link

Expected behavior

  1. Slice trajectories as described in the userguide. Eventual goal is to concatenate / combine a bunch of .dcd trajectories (load all at once) and do analysis on every 10th frame, so basically exactly Cannot access frames after a given point in large DCD files  #3075

  2. Use existing workaround to slice individual trajectories into new dcds and then concatenate those.

Actual behavior

  1. When iterating over a dcd (by any more than 1 frame at a time) or when attempting to access frames 939+ (have tried many dcds):
OSError                                   Traceback (most recent call last)
Cell In[3], line 1
----> 1 for ts in u.trajectory[::10]:
      2     print(ts.frame)

File [~\AppData\Local\anaconda3\envs\mdanalysis\Lib\site-packages\MDAnalysis\coordinates\base.py:211](http://localhost:8888/lab/workspaces/auto-W/tree/Desktop/~/AppData/Local/anaconda3/envs/mdanalysis/Lib/site-packages/MDAnalysis/coordinates/base.py#line=210), in FrameIteratorSliced.__iter__(self)
    209 def __iter__(self):
    210     for i in range(self.start, self.stop, self.step):
--> 211         yield self.trajectory[i]
    212     self.trajectory.rewind()

File [~\AppData\Local\anaconda3\envs\mdanalysis\Lib\site-packages\MDAnalysis\coordinates\base.py:833](http://localhost:8888/lab/workspaces/auto-W/tree/Desktop/~/AppData/Local/anaconda3/envs/mdanalysis/Lib/site-packages/MDAnalysis/coordinates/base.py#line=832), in ProtoReader.__getitem__(self, frame)
    831 if isinstance(frame, numbers.Integral):
    832     frame = self._apply_limits(frame)
--> 833     return self._read_frame_with_aux(frame)
    834 elif isinstance(frame, (list, np.ndarray)):
    835     if len(frame) != 0 and isinstance(frame[0], (bool, np.bool_)):
    836         # Avoid having list of bools

File [~\AppData\Local\anaconda3\envs\mdanalysis\Lib\site-packages\MDAnalysis\coordinates\base.py:865](http://localhost:8888/lab/workspaces/auto-W/tree/Desktop/~/AppData/Local/anaconda3/envs/mdanalysis/Lib/site-packages/MDAnalysis/coordinates/base.py#line=864), in ProtoReader._read_frame_with_aux(self, frame)
    863 def _read_frame_with_aux(self, frame):
    864     """Move to *frame*, updating ts with trajectory, transformations and auxiliary data."""
--> 865     ts = self._read_frame(frame)  # pylint: disable=assignment-from-no-return
    866     for aux in self.aux_list:
    867         ts = self._auxs[aux].update_ts(ts)

File [~\AppData\Local\anaconda3\envs\mdanalysis\Lib\site-packages\MDAnalysis\coordinates\DCD.py:198](http://localhost:8888/lab/workspaces/auto-W/tree/Desktop/~/AppData/Local/anaconda3/envs/mdanalysis/Lib/site-packages/MDAnalysis/coordinates/DCD.py#line=197), in DCDReader._read_frame(self, i)
    196 """read frame i"""
    197 self._frame = i - 1
--> 198 self._file.seek(i)
    199 return self._read_next_timestep()

File [~\AppData\Local\anaconda3\envs\mdanalysis\Lib\site-packages\MDAnalysis\lib\formats\libdcd.pyx:400](http://localhost:8888/lab/workspaces/auto-W/tree/Desktop/~/AppData/Local/anaconda3/envs/mdanalysis/Lib/site-packages/MDAnalysis/lib/formats/libdcd.pyx#line=399), in MDAnalysis.lib.formats.libdcd.DCDFile.seek()

OSError: DCD seek failed with DCD error=Normal EOF
  1. When using the workaround and only writing every 10th frame manually with i % 10 == 0 (no slice), I can iterate to the end of the dcd with no issue and write every 10th frame. However, I can only do this if 1 dcd is loaded. If I try to load 2 dcds, then I get the same error. So I can do this on all individual dcds in my overall simulation. Then I can load all of the shortened trajectories at once and avoid the above problem by simply using every frame (the stride/slice has already been applied now). However, the issue is that the ts.time values of the new trajectory are not correct and the original ts.time information seems to have been lost. The frames of the new trajectory are all separated by ~1 picosecond. The time also resets to 0 at the frame where two shortened dcds got concatenated. (e.g. times = 1,2,...,999,1000,0,1,2,...) So I cannot graph my metrics against the original time scale as I intended. A workaround would be to use my step size and frame save rate, but I am wondering if this behavior is expected. This time discrepancy in written dcds is also present on Linux.

In short, the second problem is that when writing a dcd from a loaded dcd, the time of each frame is incorrect/lost.

Code to reproduce the behavior

  1. The test files are not long enough for this issue. My error begins consistently at frame 939. When I put 10x of the test DCD to get 940+ frames, I don't get any of these errors. But in that case I am using the test PSF because the test PDB gives an error with the test DCD. When I run this with my exact same files on Linux, I also don't get this first problem.

Iterating over every frame with no skips produces no errors.

for ts in u.trajectory:
    print(ts.frame)

Directly from the userguide:

u = mda.Universe('test.pdb', 'test.dcd')
fiter = u.trajectory[10::10]
frames = [ts.frame for ts in fiter]
print(frames, u.trajectory.frame)

Out: ... OSError: DCD seek failed with DCD error=Normal EOF

And similarly, this also fails if the 10 is replaced with anything other than 1:

u = mda.Universe('test.pdb', 'test.dcd')
for ts in u.trajectory[::10]:
    print(ts.frame)

Out: ... OSError: DCD seek failed with DCD error=Normal EOF

Accessing any frame 938 or before is fine.

In: u.trajectory[938]
Out: < Timestep 938 with unit cell dimensions [132.12038  120.366974 119.66685   90.        90.        90.      ] >

But for all frames 939+:

In: u.trajectory[939]
Out: ... OSError: DCD seek failed with DCD error=Normal EOF
  1. Workaround with 1 dcd loaded works, but the times of the frames of the written dcd do not match that of the original:
u = mda.Universe("test.pdb", "test.dcd")
ag = u.select_atoms('all')
with mda.Writer('out.dcd', ag.n_atoms) as w:
    for ts in u.trajectory:
        if ts.frame % 10 == 0:  # Write every 10th frame
            w.write(ag.atoms)
            print(ts.frame)
    print('Done')

With multiple dcds loaded, prints Done with no errors but the last frame recorded is 930, and frame times are still incorrect:

u = mda.Universe("test.pdb", "test.dcd", "test.dcd")
ag = u.select_atoms('all')
with mda.Writer('out.dcd', ag.n_atoms) as w:
    for ts in u.trajectory:
        if ts.frame % 10 == 0:  # Write every 10th frame
            w.write(ag.atoms)
            print(ts.frame)
    print('Done')

Current version of MDAnalysis

  • Which version are you using? 2.7.0
  • Which version of Python (python -V)? Python 3.12.0 (using JupyterLab or Spyder)
  • Which operating system? Windows 11 (first issue does not occur on Linux with same pdb and dcd files, frame times still an issue)
@orbeckst
Copy link
Member

orbeckst commented Nov 20, 2024

  1. How did you produce your DCD files? Does your program properly record the number of frames in it?
  2. How big (in GiB) are your DCD files? How many atoms per frame?
  3. You can try the DCD from https://www.mdanalysis.org/MDAnalysisData/adk_equilibrium.html which contains >4000 frames (install MDAnalysisData with e.g. mamba install mdanalysisdata). This works for me:
In [1]: from MDAnalysisData import datasets

In [2]: adk = datasets.fetch_adk_equilibrium()

In [3]: import MDAnalysis as mda

In [4]: u = mda.Universe(adk.topology, adk.trajectory)
~/anaconda3/envs/mda311/lib/python3.11/site-packages/MDAnalysis/coordinates/DCD.py:165: DeprecationWarning: DCDReader currently makes independent timesteps by copying self.ts while other readers update self.ts inplace. This behavior will be changed in 3.0 to be the same as other readers. Read more at https://github.com/MDAnalysis/mdanalysis/issues/3889 to learn if this change in behavior might affect you.
  warnings.warn("DCDReader currently makes independent timesteps"

In [5]: u.trajectory
Out[5]: <DCDReader ~/MDAnalysis_data/adk_equilibrium/1ake_007-nowater-core-dt240ps.dcd with 4187 frames of 3341 atoms>

In [6]: frames = [ts.frame for ts in u.trajectory[10::10]]

In [7]: print(frames, u.trajectory.frame)
[10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, 400, 410, 420, 430, 440, 450, 460, 470, 480, 490, 500, 510, 520, 530, 540, 550, 560, 570, 580, 590, 600, 610, 620, 630, 640, 650, 660, 670, 680, 690, 700, 710, 720, 730, 740, 750, 760, 770, 780, 790, 800, 810, 820, 830, 840, 850, 860, 870, 880, 890, 900, 910, 920, 930, 940, 950, 960, 970, 980, 990, 1000, 1010, 1020, 1030, 1040, 1050, 1060, 1070, 1080, 1090, 1100, 1110, 1120, 1130, 1140, 1150, 1160, 1170, 1180, 1190, 1200, 1210, 1220, 1230, 1240, 1250, 1260, 1270, 1280, 1290, 1300, 1310, 1320, 1330, 1340, 1350, 1360, 1370, 1380, 1390, 1400, 1410, 1420, 1430, 1440, 1450, 1460, 1470, 1480, 1490, 1500, 1510, 1520, 1530, 1540, 1550, 1560, 1570, 1580, 1590, 1600, 1610, 1620, 1630, 1640, 1650, 1660, 1670, 1680, 1690, 1700, 1710, 1720, 1730, 1740, 1750, 1760, 1770, 1780, 1790, 1800, 1810, 1820, 1830, 1840, 1850, 1860, 1870, 1880, 1890, 1900, 1910, 1920, 1930, 1940, 1950, 1960, 1970, 1980, 1990, 2000, 2010, 2020, 2030, 2040, 2050, 2060, 2070, 2080, 2090, 2100, 2110, 2120, 2130, 2140, 2150, 2160, 2170, 2180, 2190, 2200, 2210, 2220, 2230, 2240, 2250, 2260, 2270, 2280, 2290, 2300, 2310, 2320, 2330, 2340, 2350, 2360, 2370, 2380, 2390, 2400, 2410, 2420, 2430, 2440, 2450, 2460, 2470, 2480, 2490, 2500, 2510, 2520, 2530, 2540, 2550, 2560, 2570, 2580, 2590, 2600, 2610, 2620, 2630, 2640, 2650, 2660, 2670, 2680, 2690, 2700, 2710, 2720, 2730, 2740, 2750, 2760, 2770, 2780, 2790, 2800, 2810, 2820, 2830, 2840, 2850, 2860, 2870, 2880, 2890, 2900, 2910, 2920, 2930, 2940, 2950, 2960, 2970, 2980, 2990, 3000, 3010, 3020, 3030, 3040, 3050, 3060, 3070, 3080, 3090, 3100, 3110, 3120, 3130, 3140, 3150, 3160, 3170, 3180, 3190, 3200, 3210, 3220, 3230, 3240, 3250, 3260, 3270, 3280, 3290, 3300, 3310, 3320, 3330, 3340, 3350, 3360, 3370, 3380, 3390, 3400, 3410, 3420, 3430, 3440, 3450, 3460, 3470, 3480, 3490, 3500, 3510, 3520, 3530, 3540, 3550, 3560, 3570, 3580, 3590, 3600, 3610, 3620, 3630, 3640, 3650, 3660, 3670, 3680, 3690, 3700, 3710, 3720, 3730, 3740, 3750, 3760, 3770, 3780, 3790, 3800, 3810, 3820, 3830, 3840, 3850, 3860, 3870, 3880, 3890, 3900, 3910, 3920, 3930, 3940, 3950, 3960, 3970, 3980, 3990, 4000, 4010, 4020, 4030, 4040, 4050, 4060, 4070, 4080, 4090, 4100, 4110, 4120, 4130, 4140, 4150, 4160, 4170, 4180] 0

@orbeckst orbeckst added Component-Readers Format-DCD more information needed Please reply to requests for information or the issue will be closed. labels Nov 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component-Readers Format-DCD more information needed Please reply to requests for information or the issue will be closed.
Projects
None yet
Development

No branches or pull requests

2 participants