pacemaker crashes when early stopping and maximum iterations flags are raised at the same time #86

Open
HaithamGaafer opened this issue Feb 13, 2025 · 0 comments

Comments

@HaithamGaafer
Contributor

I have been using min_relative_train_loss_per_iter and min_relative_test_loss_per_iter in my fits. During ladder-scheme fitting, when early stopping is triggered at the same time as the maximum number of iterations is reached (I set this to 1000), the run crashes.

This may happen when min_relative_test_loss_per_iter is set to 5e-5 or 1e-4.
These are the other parameters I used in my fits:

kappa = 0.08
nrad_max = [15, 7, 3, 2, 1, 1], l_max = [0, 4, 3, 2, 1, 1] => 374 functions
ladder_type = 'power_order'
ladder_step = [20, 0.1]
batch_size = 1000
early_stopping_patience = 200
max_iter = 1000
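
As a rough illustration of how these thresholds relate to the EARLY STOPPING message in the log below, here is a minimal Python sketch. The rule it encodes (stop when the best per-iteration relative loss change within the patience window is still above the negated threshold) is an assumption inferred from the wording of the log message, not taken from the pyace source, and `should_stop` is a hypothetical helper.

```python
def should_stop(best_change_per_iter, min_relative_loss_per_iter):
    """Assumed early-stopping rule (hypothetical, inferred from the log text).

    best_change_per_iter: most negative relative loss change per iteration
        seen within the patience window (negative means the loss decreased).
    min_relative_loss_per_iter: positive setting such as 1e-4; the log
        reports it with a negative sign as the threshold.
    """
    threshold = -abs(min_relative_loss_per_iter)
    # Stop when even the best change did not reach the threshold.
    return best_change_per_iter > threshold


# Value copied from the log: best=-9.94e-05/iter, threshold=-1.00e-04/iter.
print(should_stop(-9.94e-05, 1e-4))  # True: -9.94e-05 is above -1.00e-04
print(should_stop(-9.94e-05, 5e-5))  # False: -9.94e-05 is below -5.00e-05
```

Under this assumed rule, a best change of -9.94e-05/iter narrowly misses a 1e-4 setting, which matches the message printed right after the 1000th iteration.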

This is the error message I get in the log file after the 1000th iteration:

 --------------------------------------------TEST STATS--------------------------------------------
 Iteration:  #1000Loss:    Total:  2.6584e-05 (100%) 
                          Energy:  1.2040e-05 ( 45%) 
                           Force:  1.3858e-05 ( 52%) 
                              L1:  4.0020e-07 (  2%) 
                              L2:  2.8561e-07 (  1%) 
 Number of params./funcs:    585/100                                  Avg. time:       0.00 mcs/at
 -------------------------------------------------------------------------------------------------
            Energy/at, meV/at   Energy_low/at, meV/at      Force, meV/A        Force_low, meV/A   
    RMSE:            7.70                 3.36                31.63                   13.22
     MAE:            3.90                 2.27                10.31                    6.08
  MAX_AE:          154.11                28.21              1023.20                  208.15
 -------------------------------------------------------------------------------------------------
 2025/02/11 13:15:36 I - Last relative TEST loss change -4.72e-05/iter (averaged over last 50 step(s))
 /cmmc/ptmp/hgaafer/mambaforge/lib/python3.11/site-packages/scipy/optimize/_minimize.py:726: OptimizeWarning: Maximum number of iterations has been exceeded.
   res = _minimize_bfgs(fun, x0, args, jac, callback, **options)
 2025/02/11 13:15:40 I - EARLY STOPPING: Too small or even positive TEST loss change (best=-9.94e-05  / iter, last=+0.00e+00/iter, threshold = -1.00e-04/iter) within last 200 iterations. Stopping
          Current function value: 0.000229
          Iterations: 1000
          Function evaluations: 1028
          Gradient evaluations: 1028
 Fitting took 1767.17 seconds
          Current function value: 0.000065
          Iterations: 1000
          Function evaluations: 1023
          Gradient evaluations: 1023
 Fitting took 1998.61 seconds
          Current function value: 0.000044
          Iterations: 1000
          Function evaluations: 1024
          Gradient evaluations: 1024
 Fitting took 3360.17 seconds
          Current function value: 0.000031
          Iterations: 1000
          Function evaluations: 1010
          Gradient evaluations: 1010
 Fitting took 2765.05 seconds
          Current function value: 0.000024
          Iterations: 1000
          Function evaluations: 1013
          Gradient evaluations: 1013
 Traceback (most recent call last):
   File "/cmmc/ptmp/hgaafer/mambaforge/bin/pacemaker", line 401, in <module>
     main(sys.argv[1:])
   File "/cmmc/ptmp/hgaafer/mambaforge/bin/pacemaker", line 248, in main
     general_fit.fit()
   File "/cmmc/ptmp/hgaafer/mambaforge/lib/python3.11/site-packages/pyace/generalfit.py", line 481, in fit
     self.target_bbasisconfig = self.ladder_fitting(self.initial_bbasisconfig, self.target_bbasisconfig)
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   File "/cmmc/ptmp/hgaafer/mambaforge/lib/python3.11/site-packages/pyace/generalfit.py", line 509, in ladder_fitting
     current_bbasisconfig = self.cycle_fitting(current_bbasisconfig)
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   File "/cmmc/ptmp/hgaafer/mambaforge/lib/python3.11/site-packages/pyace/generalfit.py", line 564, in cycle_fitting
     current_bbasisconfig = self.fit_backend.fit(
                            ^^^^^^^^^^^^^^^^^^^^^
   File "/cmmc/ptmp/hgaafer/mambaforge/lib/python3.11/site-packages/pyace/fitadapter.py", line 129, in fit
     raise e
   File "/cmmc/ptmp/hgaafer/mambaforge/lib/python3.11/site-packages/pyace/fitadapter.py", line 96, in fit
     fit_res = self.run_tensorpot_fit(bbasisconfig, dataframe, loss_spec, fit_config,
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   File "/cmmc/ptmp/hgaafer/mambaforge/lib/python3.11/site-packages/pyace/fitadapter.py", line 236, in run_tensorpot_fit
     self.fitter.fit(dataframe, test_df=test_dataframe, niter=fit_config[FIT_NITER_KW],
   File "/cmmc/ptmp/hgaafer/mambaforge/lib/python3.11/site-packages/tensorpotential/fit.py", line 125, in fit
     self.process_test_metric()
   File "/cmmc/ptmp/hgaafer/mambaforge/lib/python3.11/site-packages/tensorpotential/fit.py", line 318, in process_test_metric
     self.test_metric_callback(curr_test_metrics_data)
   File "/cmmc/ptmp/hgaafer/mambaforge/lib/python3.11/site-packages/pyace/generalfit.py", line 410, in test_metric_callback
     self.detect_early_stopping(mode='test')
   File "/cmmc/ptmp/hgaafer/mambaforge/lib/python3.11/site-packages/pyace/generalfit.py", line 465, in detect_early_stopping
     raise TestLossChangeTooSmallException(msg)
 pyace.generalfit.TestLossChangeTooSmallException: EARLY STOPPING: Too small or even positive TEST loss change (best=-9.94e-05  / iter, last=+0.00e+00/iter, threshold = -1.00e-04/iter) within last 200 iterations. Stopping
 Exception: Potential file output_potential.yaml doesn't existsLoading B-basis from 'output_potential.yaml'
 Traceback (most recent call last):
   File "/cmmc/ptmp/hgaafer/mambaforge/bin/pace_yaml2yace", line 28, in <module>
     bbasis = ACEBBasisSet(input_yaml_filename)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 ValueError: Potential file output_potential.yaml doesn't exists
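
The traceback suggests that the early-stopping exception is raised by the test-metric check that runs once the optimizer has already returned at the iteration limit, so nothing treats it as a normal stop. The self-contained Python sketch below reproduces that pattern on a toy problem and shows one possible way to handle the signal while keeping the last parameters. `ToyFitter`, `LossChangeTooSmall` and `guarded_fit` are hypothetical names; this is not the pyace or tensorpotential implementation.

```python
class LossChangeTooSmall(Exception):
    """Hypothetical early-stopping signal (not the pyace exception class)."""


class ToyFitter:
    """Toy stand-in for a fitter: gradient descent on f(x) = x**2."""

    def __init__(self, x0, threshold):
        self.x = x0
        self.threshold = threshold      # negative, e.g. -0.5
        self.losses = [x0 * x0]

    def fit(self, max_iter, lr=0.1):
        # The "optimizer" runs until the iteration cap is reached.
        for _ in range(max_iter):
            self.x -= lr * 2.0 * self.x
        # Post-fit metric check: runs after the optimizer has returned,
        # so an exception here escapes unless the caller catches it.
        self.losses.append(self.x * self.x)
        rel_change = (self.losses[-1] - self.losses[-2]) / self.losses[-2]
        if rel_change > self.threshold:
            raise LossChangeTooSmall(
                f"relative loss change {rel_change:+.2e} above {self.threshold:+.2e}"
            )
        return self.x


def guarded_fit(fitter, max_iter):
    """Run one fit step; treat early stopping as a normal stop condition."""
    try:
        return fitter.fit(max_iter), False
    except LossChangeTooSmall:
        # The parameters were already updated, so they can still be saved.
        return fitter.x, True


fitter = ToyFitter(x0=1.0, threshold=-0.5)
x_first, stopped_first = guarded_fit(fitter, max_iter=10)   # large improvement
x_second, stopped_second = guarded_fit(fitter, max_iter=1)  # small improvement
print(stopped_first, stopped_second)
```

In this sketch the second step both exhausts its iteration budget and trips the early-stopping check, yet the caller still receives the latest parameters and could write them out before ending the ladder.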