Commit 55c2b55

input.yaml updated.rst
1 parent d7426cc commit 55c2b55

1 file changed: +29 -37 lines changed

docs/Machine_Learning_Force_Fields/Dataset_Preparation.rst

@@ -110,19 +110,27 @@ Example YAML Configuration for the Script

  pos_file: "mean_md-pos-1.xyz"
  frc_file: "mean_md-frc-1.xyz"
- temperature: 300.0
- temperature_target: 300
- temperature_target_surface: 450
- max_displacement: 2.0
- max_random_displacement: 0.1
- surface_atom_types:
-   - "Cs"
-   - "Br"
- clustering_method: "KMeans"
- num_clusters: 100
- num_samples_pca: 1200
- num_samples_pca_surface: 600
- num_samples_randomization: 200
+ scaling_factor: 0.4
+ scaling_surf: 0.6
+ scaling_core: 0.4
+ max_random_displacement: 0.15
+ surface_atom_types:
+   - "In"
+   - "P"
+   - "Cl"
+ clustering_method: "KMeans"
+ num_clusters: 100
+ num_samples_pca: 1200
+ num_samples_pca_surface: 600
+ num_samples_randomization: 200
+ SOAP:
+   species: ["In", "P", "Cl"]
+   r_cut: 12.0
+   n_max: 7
+   l_max: 3
+   sigma: 0.1
+   periodic: False
+   sparse: False

  Structure Generation Breakdown
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
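
The `max_random_displacement` and `num_samples_randomization` keys in the configuration above suggest a random perturbation step. The diff does not show how the script implements it, so the following is only a minimal sketch under stated assumptions: each atom is shifted by a vector drawn uniformly from a ball of radius `max_random_displacement` (assumed to be in Ångströms), and the helper names are hypothetical.

```python
import math
import random


def random_displacement(max_disp, rng):
    """Return a random 3D vector whose length is at most max_disp."""
    # Pick a direction uniformly on the unit sphere from normalized Gaussians.
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 0.0:
            break
    # The cube root makes the points uniform inside the ball, not bunched at the centre.
    radius = max_disp * rng.random() ** (1.0 / 3.0)
    return [radius * c / norm for c in v]


def perturb(positions, max_disp, rng):
    """Shift every atom by an independent random displacement."""
    return [
        [x + d for x, d in zip(pos, random_displacement(max_disp, rng))]
        for pos in positions
    ]


def make_random_samples(positions, max_disp, num_samples, seed=0):
    """Generate num_samples perturbed copies of one structure (hypothetical helper)."""
    rng = random.Random(seed)
    return [perturb(positions, max_disp, rng) for _ in range(num_samples)]


# Values taken from the YAML in this commit; the two-atom geometry is made up.
reference = [[0.0, 0.0, 0.0], [1.0, 2.0, 3.0]]
samples = make_random_samples(reference, max_disp=0.15, num_samples=200)
```

With this scheme no atom moves further than `max_random_displacement` from its reference position, which is the property the keyword description implies.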
@@ -149,26 +157,11 @@ Detailed Explanation of YAML Input Keywords
  Path to the `.xyz` file containing **corresponding atomic forces**.
  - These forces are used to evaluate **structural dynamics**.

- - **temperature**
- The temperature at which the **AIMD simulation** was run (typically in **Kelvin**).
- - This provides context for the **thermal behavior** of the system.
-
- - **temperature_target**
- Desired **temperature for the core atoms** during structure perturbation, ensuring they reflect **realistic thermal motion**.
-
- - **temperature_target_surface**
- Higher **target temperature for surface atoms** to reflect their **increased mobility**, leading to **larger perturbations** compared to core atoms.
-
- - **max_displacement**
- The **maximum allowed atomic displacement** (in Ångströms) during structure perturbation along PCA components.
- - This limits how much atoms can move, maintaining **realistic structures**.
-
  - **max_random_displacement**
  The **maximum displacement** applied in the **random sampling step**.
- - This value is **smaller** than `max_displacement` to introduce **minor random variations** without disrupting structural integrity.

  - **surface_atom_types**
- A list of **atomic species** (e.g., `"Cs"`, `"Br"`) considered as **surface atoms**.
+ A list of **atomic species** (e.g., `"In"`, `"Cl"`) considered as **surface atoms**.
  - These atoms are **more prone to movement** and are treated differently during **PCA sampling**.

  - **clustering_method**
@@ -188,6 +181,13 @@ Detailed Explanation of YAML Input Keywords
  - **num_samples_randomization**
  Number of **randomly perturbed structures** added to the dataset to increase **diversity**.

+ **SOAP** refers to **Smooth Overlap of Atomic Positions**:
+
+ - **species**: adjust according to your model.
+ - **r_cut**: a cutoff for the neighbouring environment.
+ - **n_max**: max number of radial basis functions (RBF).
+ - **l_max**: max degree of spherical harmonics.
+ - **sigma**: the width of smearing.


  Output Files and Visualization
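
The SOAP keys documented in the hunk above (`species`, `r_cut`, `n_max`, `l_max`, `sigma`) can be sanity-checked before a run. The sketch below is not taken from the script. It assumes the keys are forwarded to a SOAP descriptor with the same argument names (DScribe's `SOAP` class uses these names, but the diff does not confirm which library is used), and it estimates the descriptor length with one common power-spectrum convention that includes cross-species terms. The function names are hypothetical.

```python
def soap_feature_count(n_species, n_max, l_max):
    """Power-spectrum length when all species pairs are included.

    Assumed convention: every unordered pair of (species, radial index)
    channels contributes one value per angular degree l = 0..l_max.
    """
    n_channels = n_species * n_max
    return n_channels * (n_channels + 1) // 2 * (l_max + 1)


def check_soap_config(cfg):
    """Return a list of problems found in a SOAP config dict (empty if none)."""
    problems = []
    species = cfg.get("species", [])
    if not species:
        problems.append("species must list at least one element")
    if len(set(species)) != len(species):
        problems.append("species contains duplicates")
    if cfg.get("r_cut", 0) <= 0:
        problems.append("r_cut must be positive")
    if cfg.get("n_max", 0) < 1:
        problems.append("n_max must be at least 1")
    if cfg.get("l_max", -1) < 0:
        problems.append("l_max must be non-negative")
    if cfg.get("sigma", 0) <= 0:
        problems.append("sigma must be positive")
    return problems


# Values copied from the YAML example in this commit.
soap_cfg = {
    "species": ["In", "P", "Cl"],
    "r_cut": 12.0,
    "n_max": 7,
    "l_max": 3,
    "sigma": 0.1,
    "periodic": False,
    "sparse": False,
}

problems = check_soap_config(soap_cfg)
n_features = soap_feature_count(len(soap_cfg["species"]), soap_cfg["n_max"], soap_cfg["l_max"])
print(problems, n_features)  # [] 924 under the convention assumed above
```

Under this convention the descriptor length grows quadratically with `n_max` and with the number of species, which is worth keeping in mind when adjusting `species` for a different system.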
@@ -517,14 +517,6 @@ Subset counts:
  - ``Random``: randomly selected structures for additional diversity.
  - ``contamination``: fraction of outliers removed by Isolation Forest.

- **SOAP** refers to **Smooth Overlap of Atomic Positions**:
-
- - ``species``: adjust according to your model.
- - ``r_cut``: a cutoff for the neighbouring environment.
- - ``n_max``: max number of radial basis functions (RBF).
- - ``l_max``: max degree of spherical harmonics.
- - ``sigma``: the width of smearing.
-
  The output files contain:
  * `consolidated_dataset`: a chunk of dataset with the most diverse structures (preferred for ML training).
  * `MD_random_dataset`: random structures picked from MD data.
