@@ -110,19 +110,27 @@ Example YAML Configuration for the Script
110110
111111 pos_file: "mean_md-pos-1.xyz"
112112 frc_file: "mean_md-frc-1.xyz"
113- temperature: 300.0
114- temperature_target: 300
115- temperature_target_surface: 450
116- max_displacement: 2.0
117- max_random_displacement: 0.1
118- surface_atom_types:
119-   - "Cs"
120-   - "Br"
121- clustering_method: "KMeans"
122- num_clusters: 100
123- num_samples_pca: 1200
124- num_samples_pca_surface: 600
125- num_samples_randomization: 200
113+ scaling_factor: 0.4
114+ scaling_surf: 0.6
115+ scaling_core: 0.4
116+ max_random_displacement: 0.15
117+ surface_atom_types:
118+   - "In"
119+   - "P"
120+   - "Cl"
121+ clustering_method: "KMeans"
122+ num_clusters: 100
123+ num_samples_pca: 1200
124+ num_samples_pca_surface: 600
125+ num_samples_randomization: 200
126+ SOAP:
127+   species: ["In", "P", "Cl"]
128+   r_cut: 12.0
129+   n_max: 7
130+   l_max: 3
131+   sigma: 0.1
132+   periodic: False
133+   sparse: False
126134
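Before launching a long run, it can help to cross-check a configuration like the one above. The sketch below is a hypothetical helper, not part of the script: it assumes the YAML has already been parsed into a Python dict (for example with PyYAML's `yaml.safe_load`) and only checks that every surface species also appears in the SOAP `species` list, that the sample counts are positive integers, and that `max_random_displacement` is positive.

```python
# Hypothetical sanity check for the configuration (not part of the script).
# The dict below mirrors the YAML example; in practice it would come from
# something like yaml.safe_load(...) on the config file.

def check_config(cfg: dict) -> list:
    """Return a list of problems found in the configuration (empty if none)."""
    problems = []

    # Every surface species should also be described by SOAP.
    soap_species = set(cfg.get("SOAP", {}).get("species", []))
    for element in cfg.get("surface_atom_types", []):
        if element not in soap_species:
            problems.append(f"surface type {element!r} missing from SOAP species")

    # Cluster and sample counts must be positive integers.
    for key in ("num_clusters", "num_samples_pca",
                "num_samples_pca_surface", "num_samples_randomization"):
        value = cfg.get(key)
        if not isinstance(value, int) or value <= 0:
            problems.append(f"{key} should be a positive integer, got {value!r}")

    # A missing or non-positive random displacement is flagged as well.
    if cfg.get("max_random_displacement", 0) <= 0:
        problems.append("max_random_displacement should be > 0")

    return problems


config = {
    "pos_file": "mean_md-pos-1.xyz",
    "frc_file": "mean_md-frc-1.xyz",
    "max_random_displacement": 0.15,
    "surface_atom_types": ["In", "P", "Cl"],
    "num_clusters": 100,
    "num_samples_pca": 1200,
    "num_samples_pca_surface": 600,
    "num_samples_randomization": 200,
    "SOAP": {"species": ["In", "P", "Cl"], "r_cut": 12.0, "n_max": 7,
             "l_max": 3, "sigma": 0.1, "periodic": False, "sparse": False},
}

print(check_config(config))  # → []
```

Swapping `surface_atom_types` back to the old `["Cs", "Br"]` while keeping the In/P/Cl SOAP species would produce two problems, one per missing element.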
127135 Structure Generation Breakdown
128136~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -149,26 +157,11 @@ Detailed Explanation of YAML Input Keywords
149157 Path to the `.xyz` file containing **corresponding atomic forces**.
150158 - These forces are used to evaluate **structural dynamics**.
151159
152- - **temperature**
153- The temperature at which the **AIMD simulation** was run (typically in **Kelvin**).
154- - This provides context for the **thermal behavior** of the system.
155-
156- - **temperature_target**
157- Desired **temperature for the core atoms** during structure perturbation, ensuring they reflect **realistic thermal motion**.
158-
159- - **temperature_target_surface**
160- Higher **target temperature for surface atoms** to reflect their **increased mobility**, leading to **larger perturbations** compared to core atoms.
161-
162- - **max_displacement**
163- The **maximum allowed atomic displacement** (in Ångströms) during structure perturbation along PCA components.
164- - This limits how much atoms can move, maintaining **realistic structures**.
165-
166160- **max_random_displacement**
167161 The **maximum displacement** applied in the **random sampling step**.
168- - This value is **smaller** than `max_displacement` to introduce **minor random variations** without disrupting structural integrity.
169162
170163- **surface_atom_types**
171- A list of **atomic species** (e.g., `"Cs"`, `"Br"`) considered as **surface atoms**.
164+ A list of **atomic species** (e.g., `"In"`, `"Cl"`) considered as **surface atoms**.
172165 - These atoms are **more prone to movement** and are treated differently during **PCA sampling**.
173166
174167- **clustering_method**
@@ -188,6 +181,13 @@ Detailed Explanation of YAML Input Keywords
188181- **num_samples_randomization**
189182 Number of **randomly perturbed structures** added to the dataset to increase **diversity**.
190183
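184+ **SOAP** refers to **Smooth Overlap of Atomic Positions**, a descriptor of each atom's local environment:
185+
186+ - **species**: the chemical elements included in the descriptor; adjust them to match the elements in your structures.
187+ - **r_cut**: cutoff radius (in Å) that defines the neighbouring environment of each atom.
188+ - **n_max**: number of radial basis functions (RBF).
189+ - **l_max**: maximum degree of the spherical harmonics.
190+ - **sigma**: width of the Gaussian smearing applied to atomic positions.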
191191
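The SOAP settings also determine how long each atom's descriptor vector is, which matters for memory use and for the later clustering step. Assuming the descriptor is computed with DScribe's `SOAP` class (its argument names match the keys above) in its default uncompressed mode, each atom gets `(S * n_max) * (S * n_max + 1) / 2 * (l_max + 1)` features, where `S` is the number of species. The helper below is a hypothetical sketch that only does this arithmetic; with DScribe installed, `get_number_of_features()` on the constructed `SOAP` object should report the same value.

```python
# Hypothetical helper: per-atom SOAP feature count, assuming DScribe's
# default (uncompressed) power-spectrum layout. No DScribe import needed.

def soap_feature_count(n_species: int, n_max: int, l_max: int) -> int:
    """Length of one atom's SOAP vector when all species pairs are kept."""
    radial_channels = n_species * n_max              # one channel per (species, n)
    pairs = radial_channels * (radial_channels + 1) // 2  # symmetric channel pairs
    return pairs * (l_max + 1)                       # one value per angular degree l


soap_cfg = {"species": ["In", "P", "Cl"], "n_max": 7, "l_max": 3}
n_features = soap_feature_count(len(soap_cfg["species"]),
                                soap_cfg["n_max"], soap_cfg["l_max"])
print(n_features)  # → 924
```

Because the count is quadratic in `S * n_max`, doubling `n_max` to 14 for the same three species would raise it to 3612 features per atom.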
192192
193193Output Files and Visualization
@@ -517,14 +517,6 @@ Subset counts:
517517- ``Random``: randomly selected structures for additional diversity.
518518- ``contamination``: fraction of outliers removed by Isolation Forest.
519519
520- **SOAP** refers to **Smooth Overlap of Atomic Positions**:
521-
522- - ``species``: adjust according to your model.
523- - ``r_cut``: a cutoff for the neighbouring environment.
524- - ``n_max``: max number of radial basis functions (RBF).
525- - ``l_max``: max degree of spherical harmonics.
526- - ``sigma``: the width of smearing.
527-
528520The output files contain:
529521 * `consolidated_dataset`: a subset of the dataset containing the most diverse structures (preferred for ML training).
530522 * `MD_random_dataset`: structures picked at random from the MD data.