-
Notifications
You must be signed in to change notification settings - Fork 2
/
Copy pathindex.html
1766 lines (1545 loc) · 62.2 KB
/
index.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>pbqff: push-button quartic force fields</title>
<link rel="icon" type="image/png" href="favicon.ico">
<style>
body {
width:60%;
margin-left: auto;
margin-right: auto;
margin-bottom: 20%;
}
pre {
word-wrap: normal;
overflow-x: auto;
background-color: rgb(245, 245, 245);
padding: 10px;
}
table {
border-collapse: collapse;
border: 2px solid rgb(140 140 140);
font-family: sans-serif;
font-size: 0.8rem;
letter-spacing: 1px;
}
caption {
caption-side: bottom;
padding: 10px;
font-weight: bold;
}
th, td {
padding: 8px 10px;
text-align: left;
}
h2 a, h3 a, h4 a, h5 a {
text-decoration: none;
color: inherit;
}
h2 a:hover, h3 a:hover, h4 a:hover, h5 a:hover {
text-decoration: underline;
}
</style>
<script src=qffbuddy.js></script>
</head>
<body>
<h1>pbqff</h1>
<p>
This page contains the complete documentation for pbqff as a single HTML
page. For a quick reference see
the <a href="https://github.com/ntBre/pbqff/tree/master/man">man</a>
directory in the main repository, where you can find a manual page in both
Unix <code>man</code> and PDF formats.
</p>
<h2>Table of Contents</h2>
<ul>
<li><a href="#introduction">Introduction</a></li>
<li><a href="#program-input">Program Input</a></li>
<ul>
<li><a href="#example-input-file">Example Input File</a></li>
<li><a href="#summary-of-input-options">Summary of Input Options</a></li>
</ul>
<ul>
<li><a href="#geometry-specification-and-optimization">Geometry Specification and Optimization</a></li>
<li><a href="#charge">Charge</a></li>
<li><a href="#step-size">Step Size</a></li>
<li><a href="#sleep-interval">Sleep Interval</a></li>
<li><a href="#job-limit-and-chunk-size">Job Limit and Chunk Size</a></li>
<ul>
<li><a href="#rules-of-thumb">Rules of Thumb</a></li>
</ul>
<li><a href="#coordinate-types">Coordinate Types</a></li>
<ul>
<li><a href="#cartesian-coordinates">Cartesian Coordinates</a></li>
<li><a href="#normal-coordinates">Normal Coordinates</a></li>
<li><a href="#symmetry-internal-coordinates">Symmetry-Internal Coordinates</a></li>
</ul>
<li><a href="#quantum-chemistry-program-types">Quantum Chemistry Program Types</a></li>
<li><a href="#queue-types">Queue Types</a></li>
<li><a href="#quantum-chemistry-program-templates">Quantum Chemistry Program Templates</a></li>
<ul>
<li><a href="#mopac">MOPAC</a></li>
<li><a href="#molpro">Molpro</a></li>
<li><a href="#cfour">CFOUR</a></li>
<li><a href="#dftb+">DFTB+</a></li>
</ul>
<li><a href="#queue-templates">Queue Templates</a></li>
<ul>
<li><a href="#pbs">PBS</a></li>
<ul>
<li><a href="#mopac-">MOPAC </a></li>
<li><a href="#molpro-">Molpro </a></li>
<li><a href="#cfour-">CFOUR </a></li>
<li><a href="#dftb+-">DFTB+ </a></li>
</ul>
<li><a href="#slurm">Slurm</a></li>
<ul>
<li><a href="#mopac--">MOPAC </a></li>
<li><a href="#molpro--">Molpro </a></li>
<li><a href="#cfour--">CFOUR </a></li>
<li><a href="#dftb+--">DFTB+ </a></li>
</ul>
<li><a href="#local">Local</a></li>
<ul>
<li><a href="#mopac---">MOPAC </a></li>
<li><a href="#dftb+---">DFTB+ </a></li>
<li><a href="#molpro---">Molpro </a></li>
<li><a href="#cfour---">CFOUR </a></li>
</ul>
</ul>
<li><a href="#derivative-types">Derivative Types</a></li>
<li><a href="#checkpoints">Checkpoints</a></li>
<li><a href="#atomic-masses">Atomic Masses</a></li>
<ul>
<li><a href="#example">Example</a></li>
</ul>
<li><a href="#dummy-atoms">Dummy Atoms</a></li>
<li><a href="#resuming-normal-coordinate-qffs">Resuming Normal Coordinate QFFs</a></li>
</ul>
<li><a href="#command-line-options">Command Line Options</a></li>
<li><a href="#qffbuddy">qffbuddy</a></li>
</ul>
<h2 id=introduction><a href="#introduction">Introduction</a></h2>
<p>
pbqff is a package (really a collection of packages) for computing
anharmonic vibrational and rotational spectroscopic constants for small
molecules. It does this through the use of quartic force fields (QFFs) to
approximate the molecular potential energy function and second-order
vibrational and rotational perturbation theory, which utilize the
derivative information from the function approximation to compute the
actual spectroscopic constants.
</p>
<p>
Because this is a complex task, pbqff is broken into several reusable
components, only the final piece of which is included in this repository.
Other major components include:
</p>
<ul>
<li>
<a href="https://github.com/ntBre/symm">symm</a> for the
main <code>Molecule</code> data structure and symmetry operations
</li>
<li>
<a href="https://github.com/ntBre/psqs">psqs</a> for interfaces to
quantum chemistry programs and distributed queuing systems
</li>
<li>
<a href="https://github.com/ntBre/spectro">spectro</a> for the final
spectroscopic calculations
</li>
</ul>
<p>
pbqff itself is a relatively thin wrapper around these packages that
primarily handles tasks like generating the grid of displaced geometries
to map out the QFF in various coordinate systems. This fact isn't really
important directly, but it may help to explain some of the potential error
messages you encounter, if they reference source code
in <code>psqs</code>, for example.
</p>
<h2 id=program-input><a href="#program-input">Program Input</a></h2>
<p>
The primary input for pbqff is a <a href="https://toml.io/en/">TOML</a>
file. TOML as a format is widely used in the Rust community and is pretty
easy for humans to write, although you will end up having to type a lot of
quotes compared to something like YAML. To start, here's an example input
file for a <a href="https://github.com/openmopac/mopac">MOPAC</a> QFF on a
PBS queue:
</p>
<h4 id=example-input-file><a href="#example-input-file">Example Input File</a></h4>
<pre><code>geometry = """
O
H 1 OH
H 1 OH 2 HOH
OH = 1.0
HOH = 109.5
"""
optimize = true
charge = 0
step_size = 0.005
sleep_int = 2
job_limit = 2048
chunk_size = 1
coord_type = "norm"
template = "scfcrt=1.D-21 aux(precision=14 comp xp xs xw) PM6 THREADS=1"
queue_template = """
#!/bin/sh
#PBS -N {{.basename}}
#PBS -S /bin/bash
#PBS -j oe
#PBS -o pts/{{.basename}}.out
#PBS -W umask=022
#PBS -l walltime=100:00:00
#PBS -l cput=100:00:00
#PBS -l ncpus=1
#PBS -l mem=16gb
#PBS -q workq
module load openpbs
export WORKDIR=$PBS_O_WORKDIR
export TMPDIR=/tmp/$USER/$PBS_JOBID
export MOPAC_CMD=/ddnlus/r2518/Packages/mopac/build/mopac
export LD_LIBRARY_PATH=/ddnlus/r2518/Packages/mopac/build
cd $WORKDIR
mkdir -p $TMPDIR
trap 'rm -rf $TMPDIR' EXIT
"""
program = "mopac"
queue = "slurm"
findiff = true
check_int = 0</code></pre>
<p>
In addition to the options specified here, there are a few other optional
values like <code>hybrid_template</code> for specifying a separate quantum
chemistry program template for the cubic and quartic force
constants, <code>weights</code> for specifying atomic
weights, <code>dummy_atoms</code> for specifying a number of trailing
dummy atoms, and the very specialized <code>norm_resume_hff</code> for
resuming a normal coordinate QFF from the finished Cartesian harmonic
force field (HFF). A summary of all the possible input options is given in
the table below, followed by a more detailed description of each option.
</p>
<h4 id=summary-of-input-options><a href="#summary-of-input-options">Summary of Input Options</a></h4>
<table>
<tr>
<th>Option</th>
<th>Type</th>
<th>Description</th>
</tr>
<tr>
<td><code>geometry</code></td>
<td>String</td>
<td>The initial molecular geometry as XYZ or Z-matrix</td>
</tr>
<tr>
<td><code>optimize</code></td>
<td>bool</td>
<td>Whether or not to optimize the geometry before running the QFF</td>
</tr>
<tr>
<td><code>charge</code></td>
<td>isize</td>
<td>The molecular charge</td>
</tr>
<tr>
<td><code>step_size</code></td>
<td>f64</td>
<td>The displacement size for the QFF in Å</td>
</tr>
<tr>
<td><code>sleep_int</code></td>
<td>usize</td>
<td>Interval in seconds to sleep between job polling iterations</td>
</tr>
<tr>
<td><code>job_limit</code></td>
<td>usize</td>
<td>Maximum number of jobs to submit to the queue at once</td>
</tr>
<tr>
<td><code>chunk_size</code></td>
<td>usize</td>
<td>Number of jobs to combine into a single queue submission</td>
</tr>
<tr>
<td><code>coord_type</code></td>
<td>CoordType</td>
<td>Displacement type for the QFF. Supported values are sic, cart, and
norm</td>
</tr>
<tr>
<td><code>template</code></td>
<td>TemplateSrc</td>
<td>Quantum chemistry program template. Can either be a literal String
or a map with the key <code>file</code></td>
</tr>
<tr>
<td><code>hybrid_template</code></td>
<td> Option<TemplateSrc></td>
<td>Separate quantum chemistry program template for cubic and quartic
force constants</td>
</tr>
<tr>
<td><code>queue_template</code></td>
<td> Option<TemplateSrc></td>
<td>Queuing system template</td>
</tr>
<tr>
<td><code>program</code></td>
<td>Program</td>
<td>A quantum chemistry program name. Supported values are dftb+, mopac,
molpro, and cfour</td>
</tr>
<tr>
<td><code>queue</code></td>
<td> Queue</td>
<td>A queuing system. Supported values are pbs, slurm, and local</td>
</tr>
<tr>
<td><code>findiff</code></td>
<td> Option<bool></td>
<td>Whether to use finite differences for derivative evaluations. Not
all coordinate types support this</td>
</tr>
<tr>
<td><code>check_int</code></td>
<td>usize</td>
<td>Interval in polling iterations to write checkpoints. A value of 0
disables checkpoints</td>
</tr>
<tr>
<td><code>weights</code></td>
<td> Option<Vec<f64>></td>
<td>An optional sequence of atomic masses for use in normal coordinate
QFFs and spectroscopic calculations</td>
</tr>
<tr>
<td><code>dummy_atoms</code></td>
<td> Option<usize></td>
<td>An optional number of trailing dummy atoms</td>
</tr>
<tr>
<td><code>norm_resume_hff</code></td>
<td>Option<bool></td>
<td>Resume a normal coordinate QFF from a previously-finished HFF
checkpoint</td>
</tr>
</table>
<h3 id=geometry-specification-and-optimization><a href="#geometry-specification-and-optimization">Geometry Specification and Optimization</a></h3>
<p>
The <code>geometry</code> can be specified as either an XYZ geometry with
an optional number of atoms and comment line or as a Z-matrix. Currently,
pbqff cannot automatically convert between Z-matrix and XYZ geometries,
so <code>optimize = true</code> is required if the geometry is provided as
a Z-matrix. This allows pbqff to extract the Cartesian geometry from the
quantum chemistry program's output file and use that for the geometrical
displacements. For completeness, here is an example XYZ geometry for
cyclopropenylidene:
</p>
<pre><code>5
c3h2 geometry
C 0.0000000000 0.0000000000 -0.8887390447
C 0.0000000000 -0.6627588746 0.3682596138
C 0.0000000000 0.6627588746 0.3682596138
H 0.0000000000 -1.5952720351 0.9069548903
H 0.0000000000 1.5952720351 0.9069548903
</code></pre>
<p>
Again, the first two lines are optional, but if either one is present then
both must be present. And an example Z-matrix geometry for water:
</p>
<pre><code>O
H 1 OH
H 1 OH 2 HOH
OH = 1.0
HOH = 109.5
</code></pre>
<p>
Aside from the caveats above about Z-matrix input,
the <code>optimize</code> keyword tells pbqff whether or not to optimize
the geometry before building the rest of the QFF. Because the quality of
the Taylor series approximation underlying the QFF is tied to the
assumption that the input geometry is a minimum on the potential energy
surface (PES), any geometries provided with <code>optimize = false</code>
should definitely have been optimized prior to the pbqff run and at the
same level of theory as the QFF.
</p>
<h3 id=charge><a href="#charge">Charge</a></h3>
<p>
<code>charge</code> is possibly the most straightforward input option as
it is simply the molecular charge as a whole number. As we'll see in later
sections, this won't necessarily be used if you omit
a <code>{{.charge}}</code> entry in your program template, but the
option must be provided in the pbqff input file even in those cases.
</p>
<h3 id=step-size><a href="#step-size">Step Size</a></h3>
<p>
<code>step_size</code> sets the displacement size for the QFF in units of
Ångstroms. 0.005 is a reasonable default, with larger values like 0.010 or
even 0.020 helping occasionally with floppy systems.
</p>
<h3 id=sleep-interval><a href="#sleep-interval">Sleep Interval</a></h3>
<p>
pbqff uses a polling model for running jobs in parallel. On each iteration
of a big loop that runs until all jobs have finished, pbqff checks each
output file for the currently-running jobs to see if any have finished.
For very short single-point energy calculations (like PM6 or HF/STO-3G)
this makes a lot of sense because many jobs are likely to finish
constantly and each iteration will make some progress. On the other hand,
for long-running individual jobs (like CCSD(T)-F12/cc-pCVTZ-F12 on a large
molecule), pbqff will waste a lot of CPU time constantly checking the same
output files that may not have changed meaningfully since the last polling
interval. Thus, <code>sleep_int</code> is an interval in seconds for pbqff
to sleep between such polling attempts. Higher values should cause pbqff
to spend less CPU time reading the same files over and over, but extremely
high values could potentially lead to periods of no jobs running because
submitting more jobs as old ones finish is also part of the polling loop.
In practice, I usually use a minimum of 60 seconds for this and go up to
~600 seconds for what I consider really long jobs. The options in the next
section also have some effect on what to choose here.
</p>
<h3 id=job-limit-and-chunk-size><a href="#job-limit-and-chunk-size">Job Limit and Chunk Size</a></h3>
<p>
These two options, <code>job_limit</code> and <code>chunk_size</code> are
in the same section because they are very closely related. Early versions
of pbqff submitted each single-point energy calculation as a separate
queue submission limited by the <code>job_limit</code> setting. However,
this was inefficient enough to attract the attention of the HPC
administrators who recommended combining multiple program invocations into
a single submission script. This led to the addition of
the <code>chunk_size</code> option, which sets the number of invocations
per submission script (chunk). Hopefully this is already clear, but below
is an illustration of a chunk of MOPAC jobs for a local queue, where "job
submission" is handled by a very simple bash script. This would correspond
to a <code>chunk_size</code> of 5.
</p>
<pre><code>/opt/mopac/mopac job.0000000.mop
/opt/mopac/mopac job.0000001.mop
/opt/mopac/mopac job.0000002.mop
/opt/mopac/mopac job.0000003.mop
/opt/mopac/mopac job.0000004.mop
</code></pre>
<p>
Because of this legacy, <code>job_limit</code> still corresponds to
individual calculations, despite the fact that they are now grouped into
chunks. As a result, if you want to restrict the number of jobs as
reported by your queue manager you should consider the
ratio <code>job_limit / chunk_size</code> rather
than <code>job_limit</code> alone. For example, to target a maximum number
of queued jobs of 500 with a chunk size of 25, you could set
a <code>job_limit</code> of <code>500 * 25 = 12_500</code>. Note the use
of the underscore as a digit separator. This is not required in TOML but
can be handy for large numbers.
</p>
<p>
The main consideration when choosing a <code>chunk_size</code> is the
amount of time spent submitting jobs versus actually running jobs.
According to our system administrators, the cutoff (likely pretty
conservative) is around 5 minutes of job time. This can lead to pretty
crazy chunk sizes, especially for PM6 calculations on small molecules,
where single-point energy computations can take on the order of 0.01
seconds (<code>5 minutes = 300 seconds | 1 job / 0.01 seconds => 30_000
jobs per chunk</code>!). However, more expensive calculations can easily
take five minutes or more each, allowing you to use a chunk size of 1.
Individual jobs give pbqff the greatest freedom to queue new jobs as old
ones finish, but, again, it's possible to spend more time writing and
submitting separate jobs (and taxing the queue system) than actually
running calculations.
</p>
<p>
Of course, the preceding paragraphs all assume that you're running pbqff
on a large molecule with many single-point calculations <i>and</i> that
you're on some kind of distributed compute cluster. If, on the other hand,
you're running pbqff on a personal computer or a single node on a cluster
using the local queue option, you will only be able to run a very small
number of chunks at once anyway (roughly one per CPU or thread you allowed
pbqff to use). In this case, there's no real incentive to shrink
your <code>chunk_size</code> to allow more flexible submission. As long as
your chunk size is small enough to send one chunk to each core you
requested, all of your compute will be saturated already. For example, you
wouldn't want to request a chunk size of 250 for a 1000-job QFF and
request 8 cores because only 4 cores would get chunks to work on, but
there's no real reason to set a chunk size of 25 either. It will just lead
to a lot more shell scripts being written. The logic is similar for a
small molecule like a diatomic where the total number of points may be
smaller than a reasonable chunk size for other calculations.
</p>
<h4 id=rules-of-thumb><a href="#rules-of-thumb">Rules of Thumb</a></h4>
<p>
This table contains my general heuristics for <code>job_limit</code>
and <code>chunk_size</code> requests on our compute cluster, where the
hard limit on queued jobs is around 1000, but most of the time I can only
get ~50 jobs to run at once anyway.
</p>
<table>
<tr>
<th>Level of Theory</th>
<th>Time Per Job (s)</th>
<th>Job Limit</th>
<th>Chunk Size</th>
</tr>
<tr>
<td>PM6 (5 atoms)</td>
<td>0.01</td>
<td>1_500_000</td>
<td>30_000</td>
</tr>
<tr>
<td>F12-DZ (5 atoms)</td>
<td> 1</td>
<td> 15_000</td>
<td> 300</td>
</tr>
<tr>
<td>TZ-cCR (8 atoms)</td>
<td> 300</td>
<td> 50</td>
<td> 1</td>
</tr>
</table>
<h3 id=coordinate-types><a href="#coordinate-types">Coordinate Types</a></h3>
<p>
pbqff originally grew out of two separate projects: <code>go-cart</code>,
which was my first implementation of finite-difference
Cartesian-coordinate QFFs, and pbqff proper, which automated our group's
traditional least-squares fitted symmetry-internal coordinate (SIC) QFFs.
These two (SICs and Cartesian coordinates) are still supported, along with
the newest but already most-used normal-coordinate QFFs. Fully documenting
the construction of a symmetry-internal coordinate system is beyond the
scope of this manual (at least for now but likely forever) because it
involves a lot of practice and manual construction. Thus, the two
recommended coordinate systems for pbqff are Cartesian coordinates,
requested by <code>coord_type = "cart"</code> and normal coordinates,
requested by <code>coord_type = "norm"</code>.
</p>
<h4 id=cartesian-coordinates><a href="#cartesian-coordinates">Cartesian Coordinates</a></h4>
<p>
Cartesian coordinates are the most straightforward coordinate system. The
initial geometry is either already in Cartesian (XYZ) form or can be
converted from a Z-matrix by the quantum chemistry program. Then, each of
these coordinates is displaced in all of the two-, three-, and
four-coordinate combinations needed to map out the QFF. It is
theoretically possible to approximate the PES with a least-squares fit in
Cartesian coordinates, but this is not currently implemented. I tried this
in a previous version of pbqff, and it led to significant bottlenecks in
the least-squares fit due to the much larger number of Cartesian points
compared to SICs without any improvement (and occasional worsening) of the
results. In other words, Cartesian coordinates imply <code>findiff =
true</code>.
</p>
<h4 id=normal-coordinates><a href="#normal-coordinates">Normal Coordinates</a></h4>
<p>
Normal coordinates are generally a clear improvement over Cartesian
coordinates. First, the VPT2 implementation in spectro only requires the
semi-diagonal normal coordinate force constants (Φ<sub>iijj</sub> and
Φ<sub>iiii</sub>), but these require the fully-off-diagonal Cartesian
coordinate force constants like F<sub>ijkl</sub> to compute. Similarly,
and even more straightforwardly, normal coordinates discard the rotational
and translational degrees of freedom present in Cartesian coordinates,
lowering the number of coordinates from <code>3N</code> to <code>3N
− 6</code>. Combined, these factors allow for far fewer single-point
energies to be evaluated in normal coordinates.
</p>
<p>
Further, normal coordinates share the convenience of Cartesian
coordinates: they can be derived automatically from the input geometry.
This may not be clear from this manual because of the limited treatment of
SICs, but this is the primary contrast between SICs and normal coordinates
because SICs also require far fewer points than Cartesians, putting them
on more equal footing with normal coordinates in terms of computational
cost (but still worse for even medium-sized molecules) but certainly not
in convenience.
</p>
<p>
Of course, the similarity to Cartesian coordinates is no accident. Normal
coordinates can only be derived automatically by computing a Cartesian
harmonic force field (HFF) to obtain the Hessian matrix, from which the
mass-weighted eigenvalues can be extracted to yield the normal
coordinates. This separation of the HFF and QFF leads to two additional
options for normal coordinate QFFs: <code>hybrid_template</code>
and <code>norm_resume_hff</code>. These are both discussed more fully in
later sections, but in brief, <code>hybrid_template</code> allows you to
use the plain <code>template</code> as the template for the HFF points and
the <code>hybrid_template</code> for the cubic and quartic points;
and <code>norm_resume_hff</code> allows you to resume a normal coordinate
QFF from a checkpoint after the initial HFF.
</p>
<p>
Normal coordinates are also the only coordinate type for which
the <code>findiff</code> option actually has any effect. SICs are assumed
to be fitted (<code>findiff = false</code>) and Cartesians are assumed to
be finite-difference. However, because of the way normal coordinates are
constructed (via the initial HFF), fitted normal coordinates are not refit
to zero the approximated function gradient as was typical with SICs.
Again, this is likely not to make much sense unless you're familiar with
the SIC procedure itself, but in short, this means normal coordinates
don't really capture the benefits SICs got from performing this fitting,
and finite differences are more economical, so stick with finite
differences (<code>findiff = true</code>) unless you really know what
you're doing.
</p>
<p>
Finally, normal coordinates are not without their limitations. Their main
shortcoming relates to the mass-weighting involved in their definition,
which causes the atomic masses to be baked into the resulting force
constants in a way that is not easy to reverse. Practically, this means
that computing isotopic spectroscopic data with normal coordinate QFFs
requires recomputing the QFF for each combination
of <a href="#atomic-masses">atomic masses</a>. Second, because they are
based on the Cartesian HFF, any numerical issues in the initial HFF can be
dramatically compounded in the QFF. This mostly arises with highly
symmetrical molecules (symmetric and spherical tops) with many resonances,
but it can also pop up for other tricky molecules. It should be pretty
obvious when this happens (anharmonic frequencies of 10,000 or more), but
it's something to keep in the back of your mind.
</p>
<h4 id=symmetry-internal-coordinates><a href="#symmetry-internal-coordinates">Symmetry-Internal Coordinates</a></h4>
<p>
As mentioned above, fully documenting the SIC code could double the size
of this document. All I'll say here is that running an SIC QFF in pbqff
requires the top of an INTDER input file (the simple- and
symmetry-internal coordinate definitions) in a file
named <code>intder.in</code> in the directory where you invoke pbqff. If
there's a resurgence of interest in SICs, I'd like to handle this kind of
input in a more compact way and without requiring a separate file. Feel
free to send me an email or open a GitHub issue if you are interested in
this or try using it and encounter any problems.
</p>
<h3 id=quantum-chemistry-program-types><a href="#quantum-chemistry-program-types">Quantum Chemistry Program Types</a></h3>
<p>
As mentioned in the introduction, pbqff offloads the interface to both
chemistry programs and queuing systems to the psqs library. As listed in
the psqs README, it currently supports writing input files and reading
output files for the following programs:
</p>
<ul>
<li>MOPAC</li>
<li>Molpro</li>
<li>CFOUR</li>
<li>DFTB+</li>
</ul>
<p>
Where these are listed roughly in order of support quality. MOPAC and
Molpro have been tested the most thoroughly, while CFOUR and DFTB+ have
been tested but to a much more limited extent. Each of these can be
requested in the pbqff input file using the <code>program</code> keyword
with valid options being the <b>lowercase</b> versions of each program
name as a TOML
string: <code>"mopac"</code>, <code>"molpro"</code>, <code>"cfour"</code>, <code>"dftb+"</code>.
</p>
<h3 id=queue-types><a href="#queue-types">Queue Types</a></h3>
<p>
Like the <code>program</code> option above, the <code>queue</code> option
depends on the values supported by psqs, which are the following:
</p>
<ul>
<li>PBS</li>
<li>Slurm</li>
<li>Local</li>
</ul>
<p>
PBS and Slurm are popular queuing systems on HPC clusters, while the local
queue is a way to run pbqff on a single computer. Again, these are listed
roughly in order of support. Hundreds of QFFs have been run with pbqff on
a PBS cluster, while a much smaller number have been run on a smaller
Slurm cluster. The local queue is used in all of the pbqff unit tests, so
by that metric it is arguably the best-tested, but it essentially has no
error handling, making it suited really only for these tests and other
short test calculations rather than "production" QFFs.
</p>
<p>
These can be selected by their lowercase TOML string forms with
the <code>queue</code> keyword: <code>"pbs"</code>, <code>"slurm"</code>,
and <code>"local"</code>.
<h3 id=quantum-chemistry-program-templates><a href="#quantum-chemistry-program-templates">Quantum Chemistry Program Templates</a></h3>
<!-- things to cover here include both keyword replacement and default
optimization line additions -->
<p>
The program <code>template</code> is the main way for you to customize how
pbqff runs the single-point energies that compose the QFF. The template
syntax is inspired by
Go's <a href="https://pkg.go.dev/text/template">template</a> library,
where patterns like <code>{{.geom}}</code> are replaced by the
appropriate information (somewhat obviously the geometry in this example).
Currently each supported program has its own set of supported options, but
most of them are the same at least.
</p>
<p>
Most users of pbqff specify both these <code>template</code> values and
the <code>queue_template</code> discussed in the next section as a literal
TOML string, but you can also specify an external file using the syntax
below.
</p>
<pre><code>template = { file = "/path/to/template" }</code></pre>
<p>
All of the examples in the following sections will work for either
plain <code>template</code> input or for the
special <code>hybrid_template</code> option. As mentioned in the section
on <a href="#normal-coordinates">normal
coordinates</a>, <code>hybrid_template</code> allows you to run the HFF
portion of a normal coordinate QFF with one level of theory and the cubic
and quartic portions of the QFF at a second level of theory. As far as I
know, this has only been tested to work with <a href="#molpro">Molpro</a>,
so see that section below for an example input file.
</p>
<h4 id=mopac><a href="#mopac">MOPAC</a></h4>
<p>
MOPAC has the simplest <code>template</code> format because MOPAC uses a
single line at the start of the file to specify all options, followed by
two comment lines and the geometry. As such, pbqff does not handle any
template parameter replacements for MOPAC (the curly brace stuff from the
paragraphs above). Options like the charge and whether or not to run an
optimization are simply appended to the first line of the template, and
the geometry for each single-point energy calculation is appended to the
file after two arbitrary comment lines. As such, the <code>template</code>
value in a MOPAC QFF should be a single-line TOML string like those in the
examples below.
</p>
<pre><code>template = "scfcrt=1.D-21 aux(precision=14 comp xp xs xw) PM6 THREADS=1"</code></pre>
<pre><code>template = """gnorm=0.0 scfcrt=1.D-21 aux(precision=14 comp xp xs xw) \
PM6 SINGLET THREADS=1 external=params.dat"""</code></pre>
<p>
As shown in the second example, you can also use the multi-line TOML
string syntax (triple quotes) to allow very long templates to be broken
across multiple lines with a backslash. Unfortunately this is not
supported by single-line TOML strings with single quotes.
</p>
<p>
MOPAC defaults to running a geometry optimization, so the input
option <code>1SCF</code> is added to the input line to disable this for
the single-point energy computations. As far as I know it's safe to
include any geometry-optimization-specific options
(like <code>gnorm</code> above) in these single-point calculations, so
pbqff doesn't have to do any further processing on the template.
</p>
<h4 id=molpro><a href="#molpro">Molpro</a></h4>
<p>
For Molpro, pbqff supports two template
parameters: <code>{{.geom}}</code> and <code>{{.charge}}</code>,
although it's often easier to supply the charge literally in
your <code>template</code>. The <code>charge</code> option is quite
straightforward as the input <code>charge</code> option is simply
substituted directly for <code>{{.charge}}</code> in the template
string. The geometry handling, on the other hand, is slightly more
complicated. Notice that there is no closing curly brace in the templates
below. The reason for this is simply programming convenience, at a slight
risk of confusion for the user. For Z-matrix geometries, Molpro expects
the main Z-matrix to be inside the curly braces and the variable
definitions to follow the closing curly brace. In contrast, XYZ geometries
should be fully enclosed in curly braces. Omitting the closing curly brace
makes it easier for pbqff to handle this distinction.
</p>
<p>
Here's a somewhat complicated template example that captures the
flexibility of the template input for a TZ-cCR calculation:
</p>
<pre><code>template = """memory,8,g
gthresh,energy=1.d-12,zero=1.d-22,oneint=1.d-22,twoint=1.d-22;
gthresh,optgrad=1.d-8,optstep=1.d-8;
nocompress;
geometry={
{{.geom}}
basis={
default,cc-pCVTZ-f12
}
set,charge={{.charge}}
set,spin=0
hf,accuracy=16,energy=1.0d-10
{CCSD(T)-F12,thrden=1.0d-12,thrvar=1.0d-10,nocheck;core}
{optg,grms=1.d-8,srms=1.d-8}
etz=energy
basis=cc-pvtz-dk
hf,accuracy=16,energy=1.0d-10
{CCSD(T),thrden=1.0d-12,thrvar=1.0d-10,nocheck;}
edk=energy
basis=cc-pvtz-dk
dkroll=1
hf,accuracy=16,energy=1.0d-10
{CCSD(T),thrden=1.0d-12,thrvar=1.0d-10,nocheck;}
edkr=energy
pbqff=etz(2)+edkr-edk
show[1,f20.12],pbqff"""</code></pre>
<p>
In addition to the note about the missing curly brace, also note that
pbqff will only read the output energy if it is named <code>pbqff</code>
and displayed in the output using the <code>show</code> directive. For
simple calculations, this can be achieved using an expression
like <code>pbqff=energy</code>, followed by thte <code>show</code> line in
the example above. However, for F12 calculations in particular, it's
important to remember that the F12 energy in Molpro is a vector, with the
F12a and F12b energies in it. The example above uses the F12b energy
(<code>etz(2)</code>). This might look unusual if you're a programmer used
to indices starting at zero, but Fortran indices actually start at 1,
meaning <code>etz(1)</code> is the F12a energy and <code>etz(2)</code> is
the F12b energy.
</p>
<p>
Another important feature of Molpro template handling is how it adds or
subtracts geometry optimization keywords depending on the type of
calculation requested. There are four possibilities here:
</p>
<ul>
<li>Optimization requested and <code>optg</code> included in template</li>
<li>Optimization requested and no <code>optg</code> included</li>
<li>No optimization requested and <code>optg</code> included</li>
<li>No optimization requested and no <code>optg</code> included</li>
</ul>
<p>
In the first and last cases, pbqff doesn't have to make any changes to
your template. However, in the second and third cases, pbqff needs to
bring the request for an initial geometry optimization into alignment with
the provided template. It does this by either adding a
default <code>optg</code> directive
(<code>{optg,grms=1.d-8,srms=1.d-8}</code>) in the second case, or by
deleting a line matching the regular
expression <code>(?i)^.*optg(,|\s*$)</code>. If you're not familiar with
the intricacies of regular expressions, this looks for a line starting
with some optional junk (such as a curly brace), followed by the literal
word <code>optg</code>, followed by either a comma or some number of
spaces and a newline. While writing this, it doesn't seem quite right, but
it has worked for a couple of years in practice to identify
typical <code>optg</code> lines
like <code>{optg,grms=1.d-8,srms=1.d-8}</code> and skip lines
like <code>gthresh,optgrad=1.d-8,optstep=1.d-8;</code>, which also
includes <code>optg</code> but not in the right way.
</p>
<p>
In summary, if you're picky about how the optimization should be run, be
sure to include a single-line <code>optg</code> directive of your own
choosing, and pbqff will remove it as needed for single-point energy
calculations. Otherwise you can omit an <code>optg</code> line entirely
and pbqff will include a default one as needed too.
</p>
<p>
As one final example, here's a combination <code>template</code>
and <code>hybrid_template</code> input file section for a TZ-cCR+DZ
calculation:
</p>
<pre><code>template = """memory,1,g
gthresh,energy=1.d-12,zero=1.d-22,oneint=1.d-22,twoint=1.d-22;
gthresh,optgrad=1.d-8,optstep=1.d-8;
nocompress;
geometry={
{{.geom}}
basis={
default,cc-pCVTZ-f12
}
set,charge={{.charge}}
set,spin=0
hf,accuracy=16,energy=1.0d-10
{CCSD(T)-F12,thrden=1.0d-12,thrvar=1.0d-10,nocheck;core}
{optg,grms=1.d-8,srms=1.d-8}
etz=energy
basis=cc-pvtz-dk
hf,accuracy=16,energy=1.0d-10
{CCSD(T),thrden=1.0d-12,thrvar=1.0d-10,nocheck;}
edk=energy
basis=cc-pvtz-dk
dkroll=1
hf,accuracy=16,energy=1.0d-10
{CCSD(T),thrden=1.0d-12,thrvar=1.0d-10,nocheck;}
edkr=energy
pbqff=etz(2)+edkr-edk
show[1,f20.12],pbqff"""
hybrid_template = """memory,1,g
gthresh,energy=1.d-12,zero=1.d-22,oneint=1.d-22,twoint=1.d-22;
gthresh,optgrad=1.d-8,optstep=1.d-8;
nocompress;
geometry={
{{.geom}}
basis={
default,cc-pVDZ-f12
}
set,charge={{.charge}}
set,spin=0
hf,accuracy=16,energy=1.0d-10
{CCSD(T)-F12,thrden=1.0d-12,thrvar=1.0d-10}
{optg,grms=1.d-8,srms=1.d-8}
pbqff=energy(2)
show[1,f20.12],pbqff
"""</code></pre>
<p>
If renamed just to <code>template</code>, the <code>hybrid_template</code>
would serve as an F12-DZ template itself.
</p>
<h4 id=cfour><a href="#cfour">CFOUR</a></h4>
<p>
Like with Molpro, pbqff also supports the <code>{{.geom}}</code>
and <code>{{.charge}}</code> keywords for CFOUR. Additionally, pbqff
recognizes a <code>{{.keywords}}</code> template parameter that it
needs to know where to insert things like <code>COORD=CARTESIAN</code> for
the single-point energy calculations. One, admittedly strange, difference
with the CFOUR <code>{{.charge}}</code> parameter is that it also adds
the prefix <code>CHARGE=</code> as expected by CFOUR. With these
considerations in mind, an example CFOUR template for a CCSD/cc-pVTZ QFF
is shown below.
</p>
<pre><code>comment line
{{.geom}}
*CFOUR(CALC=CCSD,BASIS=PVTZ,MEMORY_SIZE=8,MEM_UNIT=GB,REF=RHF,MULT=1
{{.charge}}
{{.keywords}})</code></pre>
<p>
Unlike Molpro, there is no special handling of geometry optimization. If
you request a geometry optimization, pbqff assumes that you have marked
all of the variables in the Z-matrix to optimize with <code>*</code> as
expected by CFOUR and will extract the optimized Cartesian geometry from