@@ -110,19 +110,27 @@ Example YAML Configuration for the Script

 pos_file: "mean_md-pos-1.xyz"
 frc_file: "mean_md-frc-1.xyz"
-temperature: 300.0
-temperature_target: 300
-temperature_target_surface: 450
-max_displacement: 2.0
-max_random_displacement: 0.1
-surface_atom_types:
-  - "Cs"
-  - "Br"
-clustering_method: "KMeans"
-num_clusters: 100
-num_samples_pca: 1200
-num_samples_pca_surface: 600
-num_samples_randomization: 200
+scaling_factor: 0.4
+scaling_surf: 0.6
+scaling_core: 0.4
+max_random_displacement: 0.15
+surface_atom_types:
+  - "In"
+  - "P"
+  - "Cl"
+clustering_method: "KMeans"
+num_clusters: 100
+num_samples_pca: 1200
+num_samples_pca_surface: 600
+num_samples_randomization: 200
+SOAP:
+  species: ["In", "P", "Cl"]
+  r_cut: 12.0
+  n_max: 7
+  l_max: 3
+  sigma: 0.1
+  periodic: False
+  sparse: False

 Structure Generation Breakdown
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
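The updated configuration drops the temperature keys and `max_displacement`, and adds three scaling factors plus a `SOAP` block. A minimal Python sketch for reading such a file and sanity-checking it before a run is shown below. It assumes PyYAML for parsing, and the names `load_config` and `validate_config` are hypothetical. The checks themselves (every listed key required, counts positive, surface species also present in `SOAP.species`) are this sketch's assumptions, not rules the script is documented to enforce.

```python
from numbers import Real

# Keys from the example configuration. Treating all of them as mandatory
# is an assumption of this sketch, not documented behaviour of the script.
REQUIRED_KEYS = (
    "pos_file", "frc_file",
    "scaling_factor", "scaling_surf", "scaling_core",
    "max_random_displacement", "surface_atom_types",
    "clustering_method", "num_clusters",
    "num_samples_pca", "num_samples_pca_surface",
    "num_samples_randomization", "SOAP",
)
FLOAT_KEYS = ("scaling_factor", "scaling_surf", "scaling_core", "max_random_displacement")
COUNT_KEYS = ("num_clusters", "num_samples_pca", "num_samples_pca_surface", "num_samples_randomization")


def load_config(path):
    """Parse the YAML file; needs PyYAML, imported only when called."""
    import yaml
    with open(path, encoding="utf-8") as fh:
        return yaml.safe_load(fh)


def validate_config(cfg):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in cfg]
    for key in FLOAT_KEYS:
        value = cfg.get(key)
        # bool is a subclass of int, so exclude it explicitly.
        if key in cfg and (isinstance(value, bool) or not isinstance(value, Real)):
            problems.append(f"{key} must be a number, got {value!r}")
    for key in COUNT_KEYS:
        value = cfg.get(key)
        if key in cfg and (isinstance(value, bool) or not isinstance(value, int) or value <= 0):
            problems.append(f"{key} must be a positive integer, got {value!r}")
    # Assumed consistency rule: surface species must also be SOAP species.
    soap = cfg.get("SOAP") or {}
    species = set(soap.get("species") or [])
    unknown = [s for s in (cfg.get("surface_atom_types") or []) if s not in species]
    if "SOAP" in cfg and unknown:
        problems.append(f"surface_atom_types not in SOAP.species: {unknown}")
    return problems
```

Calling `validate_config(load_config("config.yaml"))` (the file name is hypothetical) returns an empty list when none of these checks fail. Collecting all problems instead of raising on the first one lets a user fix the whole file in one pass.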
@@ -149,26 +157,11 @@ Detailed Explanation of YAML Input Keywords
   Path to the `.xyz` file containing **corresponding atomic forces**.
   - These forces are used to evaluate **structural dynamics**.

-- **temperature**
-  The temperature at which the **AIMD simulation** was run (typically in **Kelvin**).
-  - This provides context for the **thermal behavior** of the system.
-
-- **temperature_target**
-  Desired **temperature for the core atoms** during structure perturbation, ensuring they reflect **realistic thermal motion**.
-
-- **temperature_target_surface**
-  Higher **target temperature for surface atoms** to reflect their **increased mobility**, leading to **larger perturbations** compared to core atoms.
-
-- **max_displacement**
-  The **maximum allowed atomic displacement** (in Ångströms) during structure perturbation along PCA components.
-  - This limits how much atoms can move, maintaining **realistic structures**.
-
 - **max_random_displacement**
   The **maximum displacement** applied in the **random sampling step**.
-  - This value is **smaller** than `max_displacement` to introduce **minor random variations** without disrupting structural integrity.

 - **surface_atom_types**
-  A list of **atomic species** (e.g., `"Cs"`, `"Br"`) considered as **surface atoms**.
+  A list of **atomic species** (e.g., `"In"`, `"Cl"`) considered as **surface atoms**.
   - These atoms are **more prone to movement** and are treated differently during **PCA sampling**.

 - **clustering_method**
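`max_random_displacement` caps how far atoms move in the random sampling step, but the distribution used is not documented here. The following Python sketch assumes one plausible choice: each atom moves along an isotropically random direction by a length drawn uniformly from `[0, max_random_displacement]`. The helper names `random_perturbation` and `random_structures`, and the representation of positions as `(x, y, z)` tuples in Å, are hypothetical.

```python
import math
import random


def random_perturbation(positions, max_disp, rng):
    """Displace every atom by a random vector no longer than max_disp.

    positions: list of (x, y, z) tuples (Angstrom assumed).
    Returns a new list; the input list is not modified.
    """
    perturbed = []
    for x, y, z in positions:
        # Isotropic random direction: a normalised 3D Gaussian vector.
        dx, dy, dz = (rng.gauss(0.0, 1.0) for _ in range(3))
        norm = math.sqrt(dx * dx + dy * dy + dz * dz) or 1.0
        # Random step length in [0, max_disp] (assumed uniform).
        scale = rng.uniform(0.0, max_disp) / norm
        perturbed.append((x + dx * scale, y + dy * scale, z + dz * scale))
    return perturbed


def random_structures(positions, n_samples, max_disp, seed=None):
    """Generate n_samples independently perturbed copies of one structure."""
    rng = random.Random(seed)
    return [random_perturbation(positions, max_disp, rng) for _ in range(n_samples)]
```

With the example values, `random_structures(positions, 200, 0.15, seed=0)` would yield 200 copies, one per `num_samples_randomization`, each atom displaced by at most 0.15. Bounding the length of the displacement vector, rather than each Cartesian component, keeps the cap independent of direction.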
@@ -188,6 +181,13 @@ Detailed Explanation of YAML Input Keywords
 - **num_samples_randomization**
   Number of **randomly perturbed structures** added to the dataset to increase **diversity**.

+**SOAP** refers to **Smooth Overlap of Atomic Positions**:
+
+- **species**: chemical elements covered by the descriptor; adjust this list to the elements in your system.
+- **r_cut**: cutoff radius defining each atom's local neighbourhood.
+- **n_max**: maximum number of radial basis functions (RBF).
+- **l_max**: maximum degree of the spherical harmonics.
+- **sigma**: width of the Gaussian smearing applied to atomic positions.


 Output Files and Visualization
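The keys in the `SOAP` block have the same names as the arguments of the `SOAP` descriptor in the DScribe library (version 2.x: `species`, `r_cut`, `n_max`, `l_max`, `sigma`, `periodic`, `sparse`). Assuming the script forwards them to DScribe, which this page does not state, a minimal Python sketch for validating the block and building the descriptor could look like this. `periodic` and `sparse` are not described above; in DScribe they request periodic boundary handling and sparse output, and both default to `False`. The helper names are hypothetical.

```python
def soap_kwargs(soap_cfg):
    """Validate the SOAP block and return keyword arguments for a descriptor."""
    required = ("species", "r_cut", "n_max", "l_max", "sigma")
    missing = [key for key in required if key not in soap_cfg]
    if missing:
        raise KeyError(f"SOAP block is missing {missing}")
    if not soap_cfg["species"]:
        raise ValueError("species must list at least one element")
    if soap_cfg["r_cut"] <= 0 or soap_cfg["sigma"] <= 0:
        raise ValueError("r_cut and sigma must be positive")
    if soap_cfg["n_max"] < 1 or soap_cfg["l_max"] < 0:
        raise ValueError("n_max must be >= 1 and l_max >= 0")
    return {
        "species": list(soap_cfg["species"]),
        "r_cut": float(soap_cfg["r_cut"]),
        "n_max": int(soap_cfg["n_max"]),
        "l_max": int(soap_cfg["l_max"]),
        "sigma": float(soap_cfg["sigma"]),
        # Optional flags; False matches DScribe's defaults.
        "periodic": bool(soap_cfg.get("periodic", False)),
        "sparse": bool(soap_cfg.get("sparse", False)),
    }


def build_soap(soap_cfg):
    """Create a DScribe SOAP descriptor (DScribe >= 2.0 argument names assumed)."""
    from dscribe.descriptors import SOAP  # imported only when called
    return SOAP(**soap_kwargs(soap_cfg))
```

If DScribe and ASE are installed, `build_soap(cfg["SOAP"]).create(atoms)` would compute the descriptor for an ASE `Atoms` object, for example one read with `ase.io.read("mean_md-pos-1.xyz")`.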
@@ -517,14 +517,6 @@ Subset counts:
 - ``Random``: randomly selected structures for additional diversity.
 - ``contamination``: fraction of outliers removed by Isolation Forest.

-**SOAP** refers to **Smooth Overlap of Atomic Positions**:
-
-- ``species``: adjust according to your model.
-- ``r_cut``: a cutoff for the neighbouring environment.
-- ``n_max``: max number of radial basis functions (RBF).
-- ``l_max``: max degree of spherical harmonics.
-- ``sigma``: the width of smearing.
-
 The output files contain:
 * `consolidated_dataset`: a subset of the dataset containing the most diverse structures (preferred for ML training).
 * `MD_random_dataset`: structures picked at random from the MD data.
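`MD_random_dataset` is described only as random structures picked from the MD data, and `contamination` as the fraction of outliers removed by Isolation Forest. The Python sketch below shows one way to do both. The helper names, the fixed seed, and the use of scikit-learn's `IsolationForest` (whose `contamination` argument is likewise the expected proportion of outliers) are assumptions, not the script's confirmed implementation. scikit-learn is imported only inside `drop_outliers`, so `pick_random_frames` runs with the standard library alone.

```python
import random


def pick_random_frames(n_frames, k, seed=0):
    """Return k distinct, sorted frame indices drawn uniformly from an MD trajectory."""
    if not 0 <= k <= n_frames:
        raise ValueError("k must be between 0 and n_frames")
    return sorted(random.Random(seed).sample(range(n_frames), k))


def drop_outliers(features, contamination, seed=0):
    """Return indices of rows that IsolationForest labels as inliers (+1)."""
    from sklearn.ensemble import IsolationForest  # imported only when called
    model = IsolationForest(contamination=contamination, random_state=seed)
    labels = model.fit_predict(features)  # +1 = inlier, -1 = outlier
    return [i for i, label in enumerate(labels) if label == 1]
```

Fixing the seed makes the random subset reproducible between runs, which helps when comparing models trained on `consolidated_dataset` and `MD_random_dataset`.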