This page was generated from examples/iterload.ipynb.
Out-of-core calculations with md.iterload
¶
[1]:
import mdtraj as md
import numpy as np
np.set_printoptions(threshold=50)
Sometimes your molecular dynamics trajectory files are too large to fit into memory. This can make analysis a burden, because you have to be very aware of the size of various objects. This can be a challenge in python because of the language’s automatic memory management.
Fortunately, python provides the iterator protocol that can help us out here. We can “stream through” a trajectory, without loading the entire thing into memory at all. Instead, we’ll process it in chunks.
For the purpose of this example, we’ll use a short trajectory that’s included with MDTraj for testing purposes. When you use this recipe yourself, you probably will want to point your code to your own trajectory file
[2]:
traj_filename = "data/frame0.h5"
First, if you only want a single frame of a trajectory, there’s no reason to load up the whole thing. md.load_frame
can load up a single frame for you. Let’s get the first one.
[3]:
first_frame = md.load_frame(traj_filename, 0)
first_frame
[3]:
<mdtraj.Trajectory with 1 frames, 22 atoms, 3 residues, and unitcells at 0x168ba2b00>
Using md.iterload
, you can iterate through chunks of the trajectory. If you don’t retain a reference to the chunk as you iterate through, then the python garbage collector can recycle the memory.
[4]:
rmsds = []
for chunk in md.iterload(traj_filename, chunk=100):
rmsds.append(md.rmsd(chunk, first_frame))
print(chunk, "\n", chunk.time)
<mdtraj.Trajectory with 100 frames, 22 atoms, 3 residues, and unitcells>
[500.00003 501.00003 502.00003 ... 597. 598. 599. ]
<mdtraj.Trajectory with 100 frames, 22 atoms, 3 residues, and unitcells>
[600. 601. 602. ... 697.00006 698.00006 699.00006]
<mdtraj.Trajectory with 100 frames, 22 atoms, 3 residues, and unitcells>
[700.00006 701.00006 702.00006 ... 797.00006 798.00006 799.00006]
<mdtraj.Trajectory with 100 frames, 22 atoms, 3 residues, and unitcells>
[800.00006 801.00006 802.00006 ... 897.00006 898.00006 899.00006]
<mdtraj.Trajectory with 100 frames, 22 atoms, 3 residues, and unitcells>
[900.00006 901.00006 902.00006 ... 997.00006 998.00006 999.00006]
<mdtraj.Trajectory with 1 frames, 22 atoms, 3 residues, and unitcells>
[1000.00006]
Now, we’ve calculated all of the rmsds chunk by chunk, and we can take a look at them.
[5]:
rmsds = np.concatenate(rmsds)
print(rmsds)
print("max rmsd ", np.max(rmsds), "at index", np.argmax(rmsds))
[0. 0.05957904 0.12331477 ... 0.12567955 0.08629464 0.14820947]
max rmsd 0.18972313 at index 44