Commit 441f7fb

Update Dataset_Preparation.rst
1 parent c51ef4e commit 441f7fb

1 file changed, +5 -2 lines changed

docs/Machine_Learning_Force_Fields/Dataset_Preparation.rst

@@ -508,26 +508,29 @@ Detailed Explanation of YAML Input :
- ``input_file``: specifies the input file name.
- ``output_prefix``: specifies the prefix of the output files.
- ``sizes``: creates chunks of different sizes.

**Subset counts**

- ``MD``: structures obtained from Molecular Dynamics (MD) simulation. Adjust the count according to your data.
- ``PCA``: structures obtained from Principal Component Analysis (PCA).
- ``PCA_Surface``: surface-focused structures from PCA sampling.
- ``Random``: randomly selected structures for additional diversity.
- ``contamination``: fraction of outliers removed by Isolation Forest.

**SOAP** refers to **Smooth Overlap of Atomic Positions**:

- ``species``: chemical species to include; adjust according to your model.
- ``r_cut``: cutoff radius for the neighbouring environment.
- ``n_max``: maximum number of radial basis functions (RBF).
- ``l_max``: maximum degree of spherical harmonics.
- ``sigma``: width of the Gaussian smearing.
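
Put together, an input file using the keys described above might look like the following minimal sketch. Only the key names are taken from the lists above; the file name, the prefix, every numeric value, and the nesting under ``subset_counts`` and ``soap`` are illustrative assumptions, so check them against the layout your version of the script expects.

.. code-block:: yaml

   # Hypothetical example: all values and both grouping keys are placeholders.
   input_file: trajectory.xyz        # assumed file name
   output_prefix: dataset            # assumed prefix
   sizes: [500, 1000]                # assumed chunk sizes

   subset_counts:                    # assumed grouping key
     MD: 300                         # structures taken from MD
     PCA: 100                        # structures taken from PCA sampling
     PCA_Surface: 50                 # surface-focused PCA structures
     Random: 50                      # randomly selected structures
     contamination: 0.05             # fraction of outliers to remove

   soap:                             # assumed grouping key
     species: [H, O]                 # replace with the elements in your system
     r_cut: 5.0
     n_max: 8
     l_max: 6
     sigma: 0.5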

The output files contain:

* ``consolidated_dataset``: a chunk of the dataset with the most diverse structures (preferred for ML training).
* ``MD_random_dataset``: random structures picked from the MD data.
* ``random_dataset``: random structures from the whole dataset.

Choose the subset best suited to your method and convert the corresponding ``xyz`` file to ``npz`` format using:

.. code-block:: bash
