Time irreversibility is a fundamental concept in physics, and the analysis of this property can provide insights into the underlying physical laws that govern the universe. However, the study of time irreversibility is often limited to mathematical models and computational simulations, and it can be challenging to gain a deeper understanding of the underlying principles. In this project, we aim to analyze time irreversibility through the lens of neural networks. The approach is to compare the performance of predictive models in both time directions for various physical systems, including Kepler orbital motion, the Lorenz attractor, and the Belousov-Zhabotinsky reaction. A difference in performance between the two directions, or a difference in the architecture needed to reach similar performance, should indicate an asymmetry in the underlying physical laws.
Predicting the trajectory of a dynamical system can be thought of as a time series problem: knowing the positions at moments $t_1, \dots, t_n$, predict the position at the next (or previous) moment.
- Generate time series data (see e.g. `generate_time_series.py`).
- Chop it into sliding windows.
- Train a model to predict a point after and before the window, given the window.
- Compare performance.
- Different performance forward and backward => irreversibility.
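The chunking step can be sketched as follows (a minimal sketch with hypothetical helper names; the project's actual code lives in `generate_time_series.py` and the training scripts):

```python
# Hypothetical sketch of the sliding-window step: each chunk of
# window_len + target_len consecutive points yields one forward sample
# (predict the end from the beginning) and one backward sample
# (predict the beginning from the end).

def make_chunks(series, window_len, target_len=1):
    chunk_len = window_len + target_len
    chunks = [series[i:i + chunk_len] for i in range(len(series) - chunk_len + 1)]
    forward = [(c[:window_len], c[window_len:]) for c in chunks]
    backward = [(c[target_len:], c[:target_len]) for c in chunks]
    return forward, backward

fw, bw = make_chunks(list(range(10)), window_len=3, target_len=1)
# first forward pair:  window [0, 1, 2] -> target [3]
# first backward pair: window [1, 2, 3] -> target [0]
```

Each chunk yields both a forward and a backward sample, so the two directions are trained on exactly the same data.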
Physical and other processes under consideration:
- Brownian particle from the paper in Nature (`brownian_datagen.py`), the stochastic thermodynamic system
- probabilistic time series from the papers by Massimiliano Zanin (`zanins_time_series.ipynb`)
- damped harmonic oscillator, damped double pendulum
- Lorenz attractor, Belousov-Zhabotinsky reaction, Kepler motion
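As an illustration of how such a series can be produced, here is a minimal, self-contained sketch for the Lorenz system (hypothetical; the project's actual generation code is in `generate_time_series.py`):

```python
# Hypothetical sketch: generate a Lorenz time series with classical RK4.
# Parameter values are the standard chaotic ones (sigma=10, rho=28, beta=8/3).

def lorenz_rhs(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)

def rk4_trajectory(rhs, state, dt, n_steps):
    """Integrate with Runge-Kutta 4 and return the list of visited states."""
    traj = [state]
    for _ in range(n_steps):
        k1 = rhs(state)
        k2 = rhs(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
        k3 = rhs(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
        k4 = rhs(tuple(s + dt * k for s, k in zip(state, k3)))
        state = tuple(s + dt / 6.0 * (a + 2 * b + 2 * c + d)
                      for s, a, b, c, d in zip(state, k1, k2, k3, k4))
        traj.append(state)
    return traj

traj = rk4_trajectory(lorenz_rhs, (1.0, 1.0, 1.0), dt=0.01, n_steps=1000)
```

Fixed-step RK4 is just a stand-in here; the real datasets may use different integrators and timesteps.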
- `20230507_distributions/`, `20230626_distributions/`: the serialized learning curves and loss distributions are stored here
- `*.py`: generally, Python files hold helpers and infrastructure-supporting code that is then imported into Jupyter notebooks and the like
- `traintest_*.py`: scripts that are run once to train the model and serialize the training process for future reuse and poking around in Jupyter notebooks
- `bayesian_*.{py,ipynb}`: files in which we attempted to apply Bayesian neural networks to stochastic thermodynamic systems
When we were still experimenting with the hyperparameters, we used tensorboard to store and display learning curves.
Below is the history of experiments, with brief summaries.
Same as tensorboard5, but hidden_layer_size=30 and 10k points with timestep=0.025.
In tensorboard5 the model is probably underfitted, since the loss doesn't improve much, so I tried increasing the model size.
All plots are again similar: the loss starts at 4, reaches ~0.9 by epoch 1, and reaches ~0.6 by epoch 50.
No difference between forward and backward.
New system: double pendulum!
It is chaotic, yet perfectly reversible, and our method should reflect that.
Tried 2 different time series samplings, with hidden_layer_size in (10,20) and window_len in (5,12,25).
- double pendulum, 9k points with `timestep=0.067`, more sparse
- double pendulum, 10k points with `timestep=0.025`, more dense
All plots are very similar: the loss starts at 4, reaches ~1.7 by epoch 5, and reaches ~1 by epoch 50.
There's no qualitative difference between the two sampling strategies.
The data confirms the hypothesis that the double pendulum is reversible: the forward and backward curves are identical.
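The "chaotic yet reversible" claim can be checked numerically with a time-reversible integrator: run the dynamics forward, flip the velocity, run the same integrator again, and you land back at the initial state. A single pendulum stands in for the double pendulum below for brevity (an assumption for illustration; the property is the same):

```python
import math

# Hypothetical sketch: leapfrog (kick-drift-kick) is time-reversible, so
# integrating forward, flipping the velocity, and integrating forward again
# returns to the initial state, up to floating-point roundoff.

def leapfrog(theta, omega, dt, n_steps, g_over_l=9.81):
    """Leapfrog for a pendulum: theta'' = -(g/l) * sin(theta)."""
    for _ in range(n_steps):
        omega += 0.5 * dt * (-g_over_l * math.sin(theta))
        theta += dt * omega
        omega += 0.5 * dt * (-g_over_l * math.sin(theta))
    return theta, omega

theta0, omega0 = 1.2, 0.0
theta1, omega1 = leapfrog(theta0, omega0, dt=0.01, n_steps=5000)
# time reversal: flip the velocity and run the same dynamics forward again
theta2, omega2 = leapfrog(theta1, -omega1, dt=0.01, n_steps=5000)
assert abs(theta2 - theta0) < 1e-6 and abs(-omega2 - omega0) < 1e-6
```

An irreversible process (e.g. the damped oscillator above) would fail this check: flipping the velocity does not undo friction.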
Rerun tensorboard4.1 on Kepler with `hidden_layer_size=10` instead of 5.
- Kepler
- Now that there are way more params, the model probably overfits, getting to 1e-4 loss, but no weird underfitting as with `hidden_layer_size=5`.
- No clear winner `forward` vs `backward`, as expected.
As in tensorboard4, vary window_len and target_len with hidden_layer_size fixed.
Peculiarities:
- `chunk_len = window_len + target_len` is kept constant to see how `window_len` affects prediction without changing the total number of trainable parameters in the model (proportional to `chunk_len`).
- Optimizer is still `Adam` + `ExponentialLR`, but `gamma` changed from 0.95 to 0.96 ($0.96^{n_{\text{epoch}}=50} \approx 0.13$ vs $0.95^{n_{\text{epoch}}=50} \approx 0.08$).
- The grid is much sparser (so it's easier to grasp) and now it's hardcoded.
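The two quoted numbers are just the final learning-rate decay factors $\gamma^{50}$ after 50 epochs of `ExponentialLR`, quick to verify:

```python
# Final ExponentialLR decay factor after 50 epochs for the two gammas
n_epoch = 50
print(round(0.96 ** n_epoch, 2))  # 0.13
print(round(0.95 ** n_epoch, 2))  # 0.08
```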
Tensorboards:
- Kepler
- Lorenz
- Belousov-Zhabotinsky

Observations:
- For Lorenz and Belousov-Zhabotinsky I used `hidden_layer_size=13`, and for Kepler `hidden_layer_size=5`.
- For Kepler, there are several weird curves that stay at 0.4 after epoch 5; the model probably needs more params.
- For Lorenz and Belousov-Zhabotinsky, all learning curves are almost identical: an abrupt loss drop after the first epoch, then `forward` slowly and steadily wins, which you can see better in log scale.
Rerun tensorboard2 on Belousov-Zhabotinsky after I changed the dataset so that it only includes the first period of the periodic motion.
It used to include about 20 identical periods, and I thought it was wrong.
Observations:
- For some reason, learning curves are much smoother than for tensorboard2. It would be pointless to add an `ExponentialLR` scheduler.
- For `hidden_layer_size` equal to 1 or 5, weird things happen, so I assume the model needs more parameters to learn.
- For `hidden_layer_size` equal to 9, 13, 17, `forward` quickly reaches 1e-3 loss, while `backward`'s loss increases and then falls back (why?). `backward` has reliably greater loss than `forward` -- the process is "irreversible".
It is not obvious whether shrinking the dataset to one period was a good idea.
Vary window_len and target_len at hidden_layer_size=13 with (torch.optim.Adam + torch.optim.lr_scheduler.ExponentialLR(gamma=0.95)).
Observations:
- Too many pictures, hard to draw conclusions, and the computation takes too long.
- `backward` has greater loss than `forward`, but often insignificantly. Need a closer look with fewer pictures.
- It might be better to vary `window_len` and `target_len` while keeping their sum (proportional to the total number of parameters in the model) constant.
- The bigger `target_len` is, the greater the typical loss values are. I average the loss over the train dataset, but not over each target point.
- For `window_len > 36` and `target_len > 0.6*window_len`, the loss goes, very roughly speaking, from 90 down to 30. Probably underfitting, probably due to `ExponentialLR` dying out too fast.
- Consider `window_len=76`, `target_len=31`. This amounts to `chunk_len=107`, which is 1% of the 10000 points in the original time series. If you look at the plot in `dataset_review.ipynb`, this is a huge `chunk_len`. If you look in `model_review.ipynb`, with `size=13` the total number of trainable parameters in the model is about 4.5k, half the training dataset size. This is to say, I should've stopped at `window_len=30`.
Same as tensorboard3, except:
- added a 4th optimizer to the comparison: `torch.optim.RMSprop` + `torch.optim.lr_scheduler.ExponentialLR(gamma=0.95)`
- changed `hidden_layer_size` from 10 to 16.

Other parameters remain `window_len=30`, `target_len=1`.

- Lorenz
  - `Adam` is a bit noisy
  - `Adam` + `ExponentialLR` is very smooth; increase `gamma` from 0.95 to make it less smooth ($0.95^{n_{\text{epoch}}=50} \approx 0.08$)
  - `RMSprop` -- model doesn't learn, too noisy
  - `RMSprop` + `ExponentialLR` -- roughly the same as `Adam`, a bit noisy
I compare three different optimizers, all with default parameters:
- `torch.optim.Adam` (was used in all runs before)
- `torch.optim.RMSprop` (turned out to be too noisy)
- `torch.optim.Adam` + `torch.optim.lr_scheduler.ExponentialLR(gamma=0.95)` (maybe optimal, maybe too smooth)
Other parameters are fixed: window_len=30, hidden_layer1_size=hidden_layer2_size=10, target_len=1.
Test dataset is the same as train to avoid randomization and sampling bias observed in tensorboard1.1.
For each system, there are ~50 learning curves with hidden_layer_size going from 1 to 20.
This corresponds to the total number of parameters in ThreeFullyConnectedLayers ranging from ~0.1k to ~2k.
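For a rough sense of scale, the parameter count can be estimated as follows (a sketch assuming, hypothetically, 2-D state vectors flattened into the input, two equal hidden layers, and `window_len=30`; the real `ThreeFullyConnectedLayers` may differ in detail):

```python
# Back-of-envelope parameter count for a three-fully-connected-layer MLP.
# Assumption (hypothetical): 2-D points are flattened, so
# in_features = dim * window_len and out_features = dim * target_len.

def n_params(window_len, hidden, target_len, dim=2):
    sizes = [dim * window_len, hidden, hidden, dim * target_len]
    # weights + biases for each of the three Linear layers
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))

print(n_params(30, 1, 1))   # 67
print(n_params(30, 20, 1))  # 1682
```

This roughly matches the ~0.1k to ~2k range quoted above for hidden sizes 1 to 20.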
I rerun the same learning process 3 times, each run labeled by one of the letters a, b, c, to make up for randomness due to randomized batching in `torch.utils.data.DataLoader` and random initial weights.
An observation: for small `hidden_layer_size`, the loss usually stops at a value > 10, implying the model doesn't learn.
- Kepler
- size 1-4: weird stuff, too few params
- size 10-20: best fit after 5-10 epochs, crazy noise with 1e-2 loss afterwards
- size 5-9: a bit noisy, something in between.
- no clear winner `forward` vs `backward`
- Lorenz
- size 1-4: weird stuff, too few params
- size 5-8: very smooth
- size 5-20: `backward` has greater loss about 80% of the time.
- Belousov-Zhabotinsky
- size 1-3: weird stuff, too few params
- size 6-20: noisy, but `backward` is strictly greater than `forward`, and also much noisier
Redo the exact same plots with a few minor fixes.
For each of three physical systems, I vary (1) the hidden layer size at fixed window_len and (2) window_len and shift_ratio at fixed hidden layer size.
shift_ratio defines which part of the periodic trajectory we consider to be the test and which to be the train data.
The somewhat chaotic results for (2) show that shift_ratio is important.
If you reveal only a region of the periodic orbit for training, the remaining [test] region might be qualitatively different from the training one, and it's unreasonable to expect that the model will make accurate predictions about the hidden part of the orbit.
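One plausible reading of `shift_ratio` (a hypothetical sketch; the real definition lives in the project code) is rotating the periodic trajectory before splitting off the test segment, so that different values expose different regions of the orbit as test data:

```python
# Hypothetical sketch of shift_ratio: rotate the periodic trajectory by
# shift_ratio of its length, then split off a fixed test fraction.

def split_by_shift(trajectory, shift_ratio, test_fraction=0.2):
    n = len(trajectory)
    shift = int(shift_ratio * n)
    rotated = trajectory[shift:] + trajectory[:shift]  # rotate the period
    n_test = int(test_fraction * n)
    return rotated[n_test:], rotated[:n_test]  # train, test

train, test = split_by_shift(list(range(100)), shift_ratio=0.25)
# test covers points 25..44 of the orbit; train is the remaining 80 points
```

Under this reading, a test region that covers a qualitatively different part of the orbit than the training region naturally produces the chaotic-looking results described above.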
