pacemaker crashes when early stopping and maximum iterations flags are raised at the same time #86

Open
@HaithamGaafer

I have been using min_relative_train_loss_per_iter and min_relative_test_loss_per_iter in my fits. During ladder-scheme fitting, the run crashes when early stopping is triggered at the same iteration at which the maximum number of iterations (which I set to 1000) is reached.

This can happen when min_relative_test_loss_per_iter is set to 5e-5 or 1e-4.
These are the other parameters I used in my fits:

kappa = 0.08
nrad_max = [15, 7, 3, 2, 1, 1], l_max = [0, 4, 3, 2, 1, 1] => 374 functions
ladder_type = 'power_order'
ladder_step = [20, 0.1]
batch_size = 1000
early_stopping_patience = 200
max_iter = 1000
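
For reference, the early-stopping check can be sketched from the wording of the EARLY STOPPING message in the log below (best, last, and threshold per iteration, within the last 200 iterations). This is a minimal sketch of that reading, not the pyace implementation: the function name `should_stop_early` and its arguments are hypothetical, and it assumes that the threshold is the negative of min_relative_test_loss_per_iter and that the window length is early_stopping_patience.

```python
def should_stop_early(rel_changes, min_rel_loss_per_iter, patience):
    """Return True if no step in the last `patience` iterations improved enough.

    rel_changes: relative loss change per iteration (negative = improving).
    All names here are hypothetical; this only mirrors the log message.
    """
    if len(rel_changes) < patience:
        return False  # not enough history to judge yet
    window = rel_changes[-patience:]
    best = min(window)  # most negative change = best improvement in the window
    threshold = -abs(min_rel_loss_per_iter)
    # Stop when even the best change is above (less negative than) the threshold
    return best > threshold


# Values taken from the log message: best=-9.94e-05, threshold=-1.00e-04
print(should_stop_early([-9.94e-05] * 200, 1e-4, 200))  # True
print(should_stop_early([-2.0e-04] * 200, 1e-4, 200))   # False
```

Under this reading, a best change of -9.94e-05/iter is just short of the -1.00e-04/iter threshold, which is why the check fires.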

This is the error message from the log file after the 1000th iteration:

 --------------------------------------------TEST STATS--------------------------------------------
 Iteration:  #1000Loss:    Total:  2.6584e-05 (100%) 
                          Energy:  1.2040e-05 ( 45%) 
                           Force:  1.3858e-05 ( 52%) 
                              L1:  4.0020e-07 (  2%) 
                              L2:  2.8561e-07 (  1%) 
 Number of params./funcs:    585/100                                  Avg. time:       0.00 mcs/at
 -------------------------------------------------------------------------------------------------
            Energy/at, meV/at   Energy_low/at, meV/at      Force, meV/A        Force_low, meV/A   
    RMSE:            7.70                 3.36                31.63                   13.22
     MAE:            3.90                 2.27                10.31                    6.08
  MAX_AE:          154.11                28.21              1023.20                  208.15
 -------------------------------------------------------------------------------------------------
 2025/02/11 13:15:36 I - Last relative TEST loss change -4.72e-05/iter (averaged over last 50 step(s))
 /cmmc/ptmp/hgaafer/mambaforge/lib/python3.11/site-packages/scipy/optimize/_minimize.py:726: OptimizeWarning: Maximum number of iterations has been exceeded.
   res = _minimize_bfgs(fun, x0, args, jac, callback, **options)
 2025/02/11 13:15:40 I - EARLY STOPPING: Too small or even positive TEST loss change (best=-9.94e-05  / iter, last=+0.00e+00/iter, threshold = -1.00e-04/iter) within last 200 iterations. Stopping
          Current function value: 0.000229
          Iterations: 1000
          Function evaluations: 1028
          Gradient evaluations: 1028
 Fitting took 1767.17 seconds
          Current function value: 0.000065
          Iterations: 1000
          Function evaluations: 1023
          Gradient evaluations: 1023
 Fitting took 1998.61 seconds
          Current function value: 0.000044
          Iterations: 1000
          Function evaluations: 1024
          Gradient evaluations: 1024
 Fitting took 3360.17 seconds
          Current function value: 0.000031
          Iterations: 1000
          Function evaluations: 1010
          Gradient evaluations: 1010
 Fitting took 2765.05 seconds
          Current function value: 0.000024
          Iterations: 1000
          Function evaluations: 1013
          Gradient evaluations: 1013
 Traceback (most recent call last):
   File "/cmmc/ptmp/hgaafer/mambaforge/bin/pacemaker", line 401, in <module>
     main(sys.argv[1:])
   File "/cmmc/ptmp/hgaafer/mambaforge/bin/pacemaker", line 248, in main
     general_fit.fit()
   File "/cmmc/ptmp/hgaafer/mambaforge/lib/python3.11/site-packages/pyace/generalfit.py", line 481, in fit
     self.target_bbasisconfig = self.ladder_fitting(self.initial_bbasisconfig, self.target_bbasisconfig)
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   File "/cmmc/ptmp/hgaafer/mambaforge/lib/python3.11/site-packages/pyace/generalfit.py", line 509, in ladder_fitting
     current_bbasisconfig = self.cycle_fitting(current_bbasisconfig)
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   File "/cmmc/ptmp/hgaafer/mambaforge/lib/python3.11/site-packages/pyace/generalfit.py", line 564, in cycle_fitting
     current_bbasisconfig = self.fit_backend.fit(
                            ^^^^^^^^^^^^^^^^^^^^^
   File "/cmmc/ptmp/hgaafer/mambaforge/lib/python3.11/site-packages/pyace/fitadapter.py", line 129, in fit
     raise e
   File "/cmmc/ptmp/hgaafer/mambaforge/lib/python3.11/site-packages/pyace/fitadapter.py", line 96, in fit
     fit_res = self.run_tensorpot_fit(bbasisconfig, dataframe, loss_spec, fit_config,
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   File "/cmmc/ptmp/hgaafer/mambaforge/lib/python3.11/site-packages/pyace/fitadapter.py", line 236, in run_tensorpot_fit
     self.fitter.fit(dataframe, test_df=test_dataframe, niter=fit_config[FIT_NITER_KW],
   File "/cmmc/ptmp/hgaafer/mambaforge/lib/python3.11/site-packages/tensorpotential/fit.py", line 125, in fit
     self.process_test_metric()
   File "/cmmc/ptmp/hgaafer/mambaforge/lib/python3.11/site-packages/tensorpotential/fit.py", line 318, in process_test_metric
     self.test_metric_callback(curr_test_metrics_data)
   File "/cmmc/ptmp/hgaafer/mambaforge/lib/python3.11/site-packages/pyace/generalfit.py", line 410, in test_metric_callback
     self.detect_early_stopping(mode='test')
   File "/cmmc/ptmp/hgaafer/mambaforge/lib/python3.11/site-packages/pyace/generalfit.py", line 465, in detect_early_stopping
     raise TestLossChangeTooSmallException(msg)
 pyace.generalfit.TestLossChangeTooSmallException: EARLY STOPPING: Too small or even positive TEST loss change (best=-9.94e-05  / iter, last=+0.00e+00/iter, threshold = -1.00e-04/iter) within last 200 iterations. Stopping
 Exception: Potential file output_potential.yaml doesn't existsLoading B-basis from 'output_potential.yaml'
 Traceback (most recent call last):
   File "/cmmc/ptmp/hgaafer/mambaforge/bin/pace_yaml2yace", line 28, in <module>
     bbasis = ACEBBasisSet(input_yaml_filename)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 ValueError: Potential file output_potential.yaml doesn't exists
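
The order of the log lines (the scipy "Maximum number of iterations" warning first, then the EARLY STOPPING message) and the traceback through `process_test_metric` suggest that the early-stopping exception is raised after the optimizer has already returned. That is an assumption based only on the output above, not on the tensorpotential or pyace source. A minimal, self-contained sketch of that pattern, with hypothetical names throughout (`EarlyStop`, `toy_fit`, `guarded_fit`, `make_check`):

```python
class EarlyStop(Exception):
    """Hypothetical stand-in for TestLossChangeTooSmallException."""


def toy_fit(check, max_iter):
    """Toy optimizer loop; `check(it)` may raise EarlyStop."""
    try:
        for it in range(1, max_iter + 1):
            check(it)
    except EarlyStop:
        # Early stop raised during the loop is handled gracefully
        return f"stopped early at iteration {it}"
    # Final metric evaluation after the loop: outside the try block,
    # so an EarlyStop raised here propagates to the caller.
    check(max_iter + 1)
    return "reached max_iter"


def guarded_fit(check, max_iter):
    """Same fit, but also treats a post-loop EarlyStop as a normal stop."""
    try:
        return toy_fit(check, max_iter)
    except EarlyStop:
        return "stopped early after final iteration"


def make_check(trigger_at):
    """Build a check that raises EarlyStop from iteration `trigger_at` on."""
    def check(it):
        if it >= trigger_at:
            raise EarlyStop(f"triggered at {it}")
    return check


print(toy_fit(make_check(5), 10))       # stopped early at iteration 5
print(guarded_fit(make_check(11), 10))  # stopped early after final iteration
```

In this toy model, `toy_fit(make_check(11), 10)` raises `EarlyStop` instead of returning, which resembles the crash above: the stop condition fires only in the final evaluation, after the iteration limit has ended the loop.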
