Commit 1e5cbcf

mitcharfmharding-hpe
authored and committed
CASMINST-2969: Clarify some install steps; Miscellaneous doc linting
1 parent 15a7629 commit 1e5cbcf

334 files changed, +2326 -2315 lines changed

README.md

+3-3
@@ -15,9 +15,9 @@ for Markdown tools will provide a long list of these tools. Some of the tools ar
1515
at displaying the images and allowing you to follow the navigational links.
1616

1717
The exploration of the CSM documentation begins with the Table of Contents in
18-
the [Cray System Management Installation Guide](index.md) which introduces
18+
the [Cray System Management Installation Guide](index.md) which introduces
1919
topics related to CSM software installation, upgrade, and operational use. Notice that the
20-
previous sentence had a link to the index.md file for the Cray System Management Installation Guide.
20+
previous sentence had a link to the index.md file for the Cray System Management Installation Guide.
2121
If the link does not work, then a better Markdown viewer is needed.
2222

2323
Within this README.md file, these topics are described.
@@ -59,7 +59,7 @@ review serves to keep core contributors in alignment while maintaining coherency
5959
the documentation.
6060

6161
<a name="releases"></a>
62-
### Releases
62+
### Releases
6363

6464
This guide follows a basic release model for enabling amendments and maintenance across releases.
6565

background/certificate_authority.md

+9-9
@@ -13,13 +13,13 @@ installation, there is no supported method to rotate or change the platform CA i
1313
<a name="overview"></a>
1414
## Overview
1515

16-
At *install time*, a PKI certificate authority (CA) can either be generated for a system, or a customer can opt to supply their own (intermediate) CA.
16+
At *install time*, a PKI certificate authority (CA) can either be generated for a system, or a customer can opt to supply their own (intermediate) CA.
1717

18-
> Outside of a new installation, there is currently no supported method to rotate (change) the platform CA. The ability to rotate CAs is anticipated as part of a future release.
18+
> Outside of a new installation, there is currently no supported method to rotate (change) the platform CA. The ability to rotate CAs is anticipated as part of a future release.
1919
2020
Sealed Secrets, part of shasta-cfg, are used by the installation process to inject CA material in an encrypted form. Vault (cray-vault instance) ultimately sources and stores the CA from a K8S secret (result of decrypting the corresponding Sealed Secret).
2121

22-
The resulting CA will be used to sign multiple workloads on the platform (Ingress, mTLS for PostgreSQL Clusters, Spire, ...).
22+
The resulting CA will be used to sign multiple workloads on the platform (Ingress, mTLS for PostgreSQL Clusters, Spire, ...).
2323

2424
> Management of Sealed Secrets should ideally take place on a secure workstation.
2525
@@ -63,21 +63,21 @@ spec:
6363
...
6464
```
6565

66-
> The ```platform_ca``` generator will produce RSA CAs with a 3072-bit modulus, using SHA256 as the base signature algorithm.
66+
> The ```platform_ca``` generator will produce RSA CAs with a 3072-bit modulus, using SHA256 as the base signature algorithm.
6767
6868
<a name="customize_platform_generated_ca"></a>
6969
## Customize Platform Generated CA
7070

71-
The ```platform_ca``` generator inputs can be customized, if desired. Notably, the ```root_days```, ```int_days```, ```root_cn```, and ```int_cn``` fields can be modified. While the shasta-cfg documentation on the use of generators supplies additional detail, the ```*_days``` settings control the validity period and the ```*_cn``` settings control the common name value for the resulting CA certificates. Ensure the Sealed Secret name reference in ```spec.kubernetes.services.cray-vault.sealedSecrets``` is updated if you opt to use a different name.
71+
The ```platform_ca``` generator inputs can be customized, if desired. Notably, the ```root_days```, ```int_days```, ```root_cn```, and ```int_cn``` fields can be modified. While the shasta-cfg documentation on the use of generators supplies additional detail, the ```*_days``` settings control the validity period and the ```*_cn``` settings control the common name value for the resulting CA certificates. Ensure the Sealed Secret name reference in ```spec.kubernetes.services.cray-vault.sealedSecrets``` is updated if you opt to use a different name.
7272

73-
> Outside of a new installation, there is currently no supported method to rotate (change) the platform CA. Please set validity periods accordingly. The ability to rotate CAs is anticipated as part of a future release.
73+
> Outside of a new installation, there is currently no supported method to rotate (change) the platform CA. Please set validity periods accordingly. The ability to rotate CAs is anticipated as part of a future release.
7474
7575
<a name="use_external_ca"></a>
7676
## Use External CA
7777

78-
The ```static_platform_ca``` generator, part of shasta-cfg, can be used to supply an external CA private key, certificate, and associated upstream CAs that form the trust chain. The generator will attempt to prevent you from supplying a root CA. You must also supply the entire trust chain up to the root CA certificate.
78+
The ```static_platform_ca``` generator, part of shasta-cfg, can be used to supply an external CA private key, certificate, and associated upstream CAs that form the trust chain. The generator will attempt to prevent you from supplying a root CA. You must also supply the entire trust chain up to the root CA certificate.
7979

80-
> Outside of a new installation, there is currently no supported method to rotate (change) the platform CA. Please ensure validity periods are set accordingly for external CAs you use in this process. The ability to rotate CAs is anticipated as part of a future release.
80+
> Outside of a new installation, there is currently no supported method to rotate (change) the platform CA. Please ensure validity periods are set accordingly for external CAs you use in this process. The ability to rotate CAs is anticipated as part of a future release.
8181
8282
Here is an example ```customizations.yaml``` snippet illustrating the generator input to inject a static CA:
8383

@@ -203,4 +203,4 @@ spec:
203203
...
204204
```
205205

206-
> Only RSA-based CAs with 3072- or 4096-bit moduli, using RSA256 as a signature/digest algorithm have been tested/are supported. Also note, the generator does not support password-protected private keys.
206+
> Only RSA-based CAs with 3072- or 4096-bit moduli, using RSA256 as a signature/digest algorithm have been tested/are supported. Also note, the generator does not support password-protected private keys.
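
The key-size and digest constraints in the note above can be checked before a CA is handed to the generator. The following is a minimal sketch, not part of shasta-cfg: the helper name `check_ca_text` is hypothetical, it parses the text that `openssl x509 -noout -text` prints, and it assumes that the "RSA256" in the note refers to SHA-256 with RSA (`sha256WithRSAEncryption`).

```bash
#!/usr/bin/env bash
# check_ca_text (hypothetical helper): read `openssl x509 -noout -text`
# output on stdin and report whether the key size and signature algorithm
# match the values quoted in the note (3072- or 4096-bit RSA, SHA-256).
check_ca_text() {
    local text bits sigalg
    text=$(cat)
    # "Public-Key: (3072 bit)" -> 3072
    bits=$(printf '%s\n' "$text" |
        sed -n 's/.*Public-Key: (\([0-9][0-9]*\) bit).*/\1/p' | head -n 1)
    # "Signature Algorithm: sha256WithRSAEncryption" -> sha256WithRSAEncryption
    sigalg=$(printf '%s\n' "$text" |
        sed -n 's/.*Signature Algorithm: *\([A-Za-z0-9-]*\).*/\1/p' | head -n 1)
    case "$bits" in
        3072|4096) ;;
        *) echo "unexpected key size: ${bits:-unknown}"; return 1 ;;
    esac
    if [ "$sigalg" != "sha256WithRSAEncryption" ]; then
        echo "unexpected signature algorithm: ${sigalg:-unknown}"
        return 1
    fi
    echo "ok: RSA ${bits}-bit, ${sigalg}"
}
```

A possible invocation, where `ca.crt` is a placeholder file name, is `openssl x509 -in ca.crt -noout -text | check_ca_text`. The check does not cover the trust chain or whether the private key is password-protected.
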

background/cloud-init_basecamp_configuration.md

+1-1
@@ -268,7 +268,7 @@ data:
268268
<a name="ntp"></a>
269269
## NTP
270270

271-
cloud-init modifications to NTP.
271+
cloud-init modifications to NTP.
272272

273273
> script: `/srv/cray/scripts/metal/set-ntp-config.sh`
274274
---

background/cray_site_init_files.md

+3-3
@@ -2,16 +2,16 @@
22

33
This page describes administrative knowledge around the pre-config files to `csi` or the output files from `csi`.
44

5-
> Information for collecting certain files starts in [Configuration Payload](../install/prepare_configuration_payload.md)
5+
> Information for collecting certain files starts in [Configuration Payload](../install/prepare_configuration_payload.md)
66
77
* [`application_node_config.yaml`](../install/prepare_configuration_payload.md#application_node_config_yaml)
88
* [`cabinets.yaml`](../install/prepare_configuration_payload.md#cabinets_yaml)
99
* [`hmn_connections.json`](../install/prepare_configuration_payload.md#hmn_connections_json)
1010
* [`ncn_metadata.csv`](../install/prepare_configuration_payload.md#ncn_metadata_csv)
1111
* [`switch_metadata.csv`](../install/prepare_configuration_payload.md#switch_metadata_csv)
1212

13-
### Topics:
14-
13+
### Topics:
14+
1515
* [Save-File / Avoiding Parameters](#save-file--avoiding-parameters)
1616

1717
## Details

background/index.md

+2-2
@@ -30,8 +30,8 @@ software, but provides background which might be helpful for troubleshooting an
3030
* [`switch_metadata.csv`](../install/prepare_configuration_payload.md#switch_metadata_csv)
3131

3232
In addition, after running `csi` with those pre-config files, `csi` creates an output `system_config.yaml`
33-
file which can be passed to `csi` when reinstalling this software release.
34-
33+
file which can be passed to `csi` when reinstalling this software release.
34+
3535
See [Cray Site Init Files](cray_site_init_files.md) for more information about these files.
3636

3737
<a name="certificate_authority"></a>

background/ncn_bios.md

+1-1
@@ -12,7 +12,7 @@ This page denotes BIOS settings that are desirable for non-compute nodes.
1212
| Intel® Hyper-Threading (e.g. HT) | `Enabled` | Enables two-threads per physical core. | Leverage the full performance of the CPU, the higher thread-count assists with parallel tasks within the processor(s). | Within the Processor or the PCH Menu.
1313
| Intel® Virtualization Technology (e.g. VT-x, VT) and AMD Virtualization Technology (e.g. AMD-V)| `Enabled` | Enables Virtual Machine extensions. | Provides added CPU support for hypervisors and more for the virtualized plane within Shasta. | Within the Processor or the PCH Menu.
1414
| PXE Retry Count | 1 or 2 (default: 1) | Attempts done on a single boot-menu option (note: 2 should be set for systems with unsolved network congestion). | If networking is working nominally, then the interface either works or does not. Retrying the same NIC should not work, if it does then there are networking problems that need to be addressed. | Within the Networking Menu, and then under Network Boot.
15-
| PXE Timeout | 5 Seconds (or less, never more) | The time that the PXE ROM will wait for a DHCP handshake to complete before moving on to the next boot device. | If DHCP is working nominally, then the DHCP handshake should not take longer than 5 seconds. This timeout could be increased where networking faults cannot be reconciled, but ideally this should be tuned to 3 or 2 seconds. |
15+
| PXE Timeout | 5 Seconds (or less, never more) | The time that the PXE ROM will wait for a DHCP handshake to complete before moving on to the next boot device. | If DHCP is working nominally, then the DHCP handshake should not take longer than 5 seconds. This timeout could be increased where networking faults cannot be reconciled, but ideally this should be tuned to 3 or 2 seconds. |
1616
| Continuous Boot | `Disabled` | Whether boot-group (e.g. all network devices, or all disk devices) should continuously retry. This prevents fall-through to the fallback disks. | We want deterministic nodes in Shasta, if the boot fails the first tier we want the node to try the next tier of boot mediums before failing at a shell or menu for intervention. |
1717

1818
> **`NOTE`** **PCIe** options can be found in [PCIe : Setting Expected Values](switch_pxe_boot_from_onboard_nic_to_pcie.md#setting-expected-values).

background/ncn_boot_workflow.md

+6-6
@@ -23,7 +23,7 @@ Non-compute nodes boot two ways:
2323
There are two different methods for determining whether a management node is booted using disk or
2424
PXE. The method to use will vary depending on the system environment.
2525

26-
1. Check kernel parameters.
26+
1. Check kernel parameters.
2727

2828
```bash
2929
ncn# cat /proc/cmdline
@@ -126,7 +126,7 @@ Setting the boot order with efibootmgr will ensure that the desired network inte
126126
> <a name="hewlett-packard-enterprise"></a>
127127
> #### Hewlett-Packard Enterprise
128128
>
129-
>
129+
>
130130
> ##### Masters
131131
>
132132
> ```bash
@@ -139,7 +139,7 @@ Setting the boot order with efibootmgr will ensure that the desired network inte
139139
> ncn-m# efibootmgr -o $(cat /tmp/bbs* | awk '!x[$0]++' | sed 's/^Boot//g' | awk '{print $1}' | tr -t '*' ',' | tr -d '\n' | sed 's/,$//') | grep -i bootorder
140140
> BootOrder: 0014,0018,0021,0022
141141
> ```
142-
>
142+
>
143143
> ##### Storage
144144
>
145145
> ```bash
@@ -152,7 +152,7 @@ Setting the boot order with efibootmgr will ensure that the desired network inte
152152
> ncn-s# efibootmgr -o $(cat /tmp/bbs* | awk '!x[$0]++' | sed 's/^Boot//g' | awk '{print $1}' | tr -t '*' ',' | tr -d '\n' | sed 's/,$//') | grep -i bootorder
153153
> BootOrder: 001C,001D,0002,0020
154154
> ```
155-
>
155+
>
156156
> ##### Workers
157157
>
158158
> ```bash
@@ -165,7 +165,7 @@ Setting the boot order with efibootmgr will ensure that the desired network inte
165165
> ncn-w# efibootmgr -o $(cat /tmp/bbs* | awk '!x[$0]++' | sed 's/^Boot//g' | awk '{print $1}' | tr -t '*' ',' | tr -d '\n' | sed 's/,$//') | grep -i bootorder
166166
> BootOrder: 0012,0017,0018
167167
> ```
168-
>
168+
>
169169
> <a name="intel-corporation"></a>
170170
> #### Intel Corporation
171171
>
@@ -315,7 +315,7 @@ Reset the BIOS. Refer to vendor documentation for resetting the BIOS or attempt
315315
> ```
316316
317317
1. Reset BIOS with ipmitool
318-
318+
319319
```bash
320320
ncn# ipmitool chassis bootdev none options=clear-cmos
321321
```
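
The `efibootmgr -o $(...)` commands in the hunks above build a comma-separated boot order from the saved `/tmp/bbs*` files. The sketch below runs the same text pipeline against sample input so that each stage can be inspected without touching firmware settings. The sample entry descriptions and the function name `build_boot_order` are made up for illustration; only the entry numbers mirror the `BootOrder` shown for the master nodes.

```bash
#!/usr/bin/env bash
# Hypothetical sample of saved efibootmgr entries, including one duplicate.
sample='Boot0014* UEFI IPv4: Network 00 at Riser 02 Slot 01
Boot0018* UEFI IPv4: Network 01 at Riser 02 Slot 01
Boot0014* UEFI IPv4: Network 00 at Riser 02 Slot 01
Boot0021* UEFI Misc Device
Boot0022* UEFI Misc Device 2'

# build_boot_order: read entries on stdin, print "0014,0018,...".
build_boot_order() {
    awk '!seen[$0]++' |     # drop duplicate lines, keep first-seen order
    sed 's/^Boot//' |       # strip the "Boot" prefix
    awk '{print $1}' |      # keep the first field, e.g. "0014*"
    tr '*' ',' |            # turn the active marker into a separator
    tr -d '\n' |            # join everything onto one line
    sed 's/,$//'            # remove the trailing comma
}

boot_order=$(printf '%s\n' "$sample" | build_boot_order)
echo "$boot_order"
```

The original command uses `tr -t '*' ','`; the `-t` flag is omitted here because both sets are one character long, which makes it unnecessary. Entries that are not marked active (no `*`) would not get a separator from this pipeline, so the saved files are assumed to contain only active entries.
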

background/ncn_images.md

+10-10
@@ -47,7 +47,7 @@ To boot an NCN, you need 3 artifacts for each node-type (kubernetes-manager/work
4747
/var/www/ephemeral/data/ceph:
4848
total 4
4949
drwxr-xr-x 2 root root 4096 Dec 17 21:42 0.0.7
50-
50+
5151
/var/www/ephemeral/data/k8s:
5252
total 4
5353
drwxr-xr-x 2 root root 4096 Dec 17 21:26 0.0.8
@@ -119,57 +119,57 @@ To boot an NCN, you need 3 artifacts for each node-type (kubernetes-manager/work
119119
-rw-r--r-- 1 dnsmasq tftp 700352 Dec 15 09:35 ipxe.efi.stable
120120
-rw-r--r-- 1 root root 6157 Dec 15 05:12 script.ipxe
121121
-rw-r--r-- 1 root root 6284 Dec 17 13:21 script.ipxe.rpmnew
122-
122+
123123
ephemeral:
124124
total 32
125125
drwxr-xr-x 2 root root 4096 Dec 6 22:18 configs
126126
drwxr-xr-x 4 root root 4096 Dec 7 04:29 data
127127
drwx------ 2 root root 16384 Dec 2 04:25 lost+found
128128
drwxr-xr-x 4 root root 4096 Dec 3 02:31 prep
129129
drwxr-xr-x 2 root root 4096 Dec 2 04:45 static
130-
130+
131131
ncn-m001:
132132
total 4
133133
lrwxrwxrwx 1 root root 53 Dec 26 06:11 filesystem.squashfs -> ../ephemeral/data/k8s/0.0.8/kubernetes-0.0.8.squashfs
134134
lrwxrwxrwx 1 root root 47 Dec 26 06:11 initrd.img.xz -> ../ephemeral/data/k8s/0.0.8/initrd.img-0.0.8.xz
135135
lrwxrwxrwx 1 root root 61 Dec 26 06:11 kernel -> ../ephemeral/data/k8s/0.0.8/5.3.18-24.37-default-0.0.8.kernel
136-
136+
137137
ncn-m002:
138138
total 4
139139
lrwxrwxrwx 1 root root 53 Dec 26 06:11 filesystem.squashfs -> ../ephemeral/data/k8s/0.0.8/kubernetes-0.0.8.squashfs
140140
lrwxrwxrwx 1 root root 47 Dec 26 06:11 initrd.img.xz -> ../ephemeral/data/k8s/0.0.8/initrd.img-0.0.8.xz
141141
lrwxrwxrwx 1 root root 61 Dec 26 06:11 kernel -> ../ephemeral/data/k8s/0.0.8/5.3.18-24.37-default-0.0.8.kernel
142-
142+
143143
ncn-m003:
144144
total 4
145145
lrwxrwxrwx 1 root root 53 Dec 26 06:11 filesystem.squashfs -> ../ephemeral/data/k8s/0.0.8/kubernetes-0.0.8.squashfs
146146
lrwxrwxrwx 1 root root 47 Dec 26 06:11 initrd.img.xz -> ../ephemeral/data/k8s/0.0.8/initrd.img-0.0.8.xz
147147
lrwxrwxrwx 1 root root 61 Dec 26 06:11 kernel -> ../ephemeral/data/k8s/0.0.8/5.3.18-24.37-default-0.0.8.kernel
148-
148+
149149
ncn-s001:
150150
total 4
151151
lrwxrwxrwx 1 root root 56 Dec 26 06:11 filesystem.squashfs -> ../ephemeral/data/ceph/0.0.7/storage-ceph-0.0.7.squashfs
152152
lrwxrwxrwx 1 root root 48 Dec 26 06:11 initrd.img.xz -> ../ephemeral/data/ceph/0.0.7/initrd.img-0.0.7.xz
153153
lrwxrwxrwx 1 root root 62 Dec 26 06:11 kernel -> ../ephemeral/data/ceph/0.0.7/5.3.18-24.37-default-0.0.7.kernel
154-
154+
155155
ncn-s002:
156156
total 4
157157
lrwxrwxrwx 1 root root 56 Dec 26 06:11 filesystem.squashfs -> ../ephemeral/data/ceph/0.0.7/storage-ceph-0.0.7.squashfs
158158
lrwxrwxrwx 1 root root 48 Dec 26 06:11 initrd.img.xz -> ../ephemeral/data/ceph/0.0.7/initrd.img-0.0.7.xz
159159
lrwxrwxrwx 1 root root 62 Dec 26 06:11 kernel -> ../ephemeral/data/ceph/0.0.7/5.3.18-24.37-default-0.0.7.kernel
160-
160+
161161
ncn-s003:
162162
total 4
163163
lrwxrwxrwx 1 root root 56 Dec 26 06:11 filesystem.squashfs -> ../ephemeral/data/ceph/0.0.7/storage-ceph-0.0.7.squashfs
164164
lrwxrwxrwx 1 root root 48 Dec 26 06:11 initrd.img.xz -> ../ephemeral/data/ceph/0.0.7/initrd.img-0.0.7.xz
165165
lrwxrwxrwx 1 root root 62 Dec 26 06:11 kernel -> ../ephemeral/data/ceph/0.0.7/5.3.18-24.37-default-0.0.7.kernel
166-
166+
167167
ncn-w002:
168168
total 4
169169
lrwxrwxrwx 1 root root 53 Dec 26 06:11 filesystem.squashfs -> ../ephemeral/data/k8s/0.0.8/kubernetes-0.0.8.squashfs
170170
lrwxrwxrwx 1 root root 47 Dec 26 06:11 initrd.img.xz -> ../ephemeral/data/k8s/0.0.8/initrd.img-0.0.8.xz
171171
lrwxrwxrwx 1 root root 61 Dec 26 06:11 kernel -> ../ephemeral/data/k8s/0.0.8/5.3.18-24.37-default-0.0.8.kernel
172-
172+
173173
ncn-w003:
174174
total 4
175175
lrwxrwxrwx 1 root root 53 Dec 26 06:11 filesystem.squashfs -> ../ephemeral/data/k8s/0.0.8/kubernetes-0.0.8.squashfs
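
Each per-node directory in the listing holds three symbolic links (`filesystem.squashfs`, `initrd.img.xz`, and `kernel`) that point into the versioned artifact directories. A quick way to confirm that none of them dangle could look like the following sketch; the function name `check_ncn_links` is hypothetical and not part of the CSM tooling.

```bash
#!/usr/bin/env bash
# check_ncn_links DIR (hypothetical helper): succeed only if DIR holds the
# three boot artifacts from the listing and each one resolves to a file.
check_ncn_links() {
    local dir=$1 name
    for name in filesystem.squashfs initrd.img.xz kernel; do
        # -e follows symbolic links, so a dangling link fails this test.
        if [ ! -e "$dir/$name" ]; then
            echo "missing or dangling: $dir/$name"
            return 1
        fi
    done
    echo "ok: $dir"
}
```

Run from the directory that contains the `ncn-*` directories, a call such as `check_ncn_links ncn-m001` reports the first artifact that does not resolve.
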

background/ncn_mounts_and_file_systems.md

+7-7
@@ -42,7 +42,7 @@ Partitioning is controlled by two aspects:
4242
| Node Type | No. of "small" disks (0.5 TiB) | No. of "large" disks (1.9 TiB) |
4343
| --- |:---:|:---:|
4444
| k8s-master nodes | 3 | 0
45-
| k8s-worker nodes | 2 | 1
45+
| k8s-worker nodes | 2 | 1
4646
| ceph-storage nodes | 2 | 3+
4747

4848
Disks are chosen by dracut. Kubernetes and storage nodes use different dracut modules.
@@ -54,7 +54,7 @@ Disks are chosen by dracut. Kubernetes and storage nodes use different dracut mo
5454

5555
The master nodes and worker nodes use the same artifacts, and thus have the same dracut modules assimilating disks. Therefore, it is important
5656
to beware of:
57-
- k8s-master nodes with one or more extra "large" disk(s); these disks help but are unnecessary
57+
- k8s-master nodes with one or more extra "large" disk(s); these disks help but are unnecessary
5858
- ceph-storage nodes do not run the same dracut modules because they have different disk demands
5959

6060
<a name="worker-nodes-with-etcd"></a>
@@ -66,7 +66,7 @@ easily.
6666
<a name="disable-luks"></a>
6767
##### Disable Luks
6868

69-
> **`NOTE`** This is broken, use the [expand RAID](#expand-the-raid) option instead. (MTL-1309)
69+
> **`NOTE`** This is broken, use the [expand RAID](#expand-the-raid) option instead. (MTL-1309)
7070
7171
All NCNs (master/worker/storage) have the same kernel parameters, but are not always necessary. This method works by toggling the dependency
7272
for the metal ETCD module, disabling LUKs will disable ETCD bare-metal creation.
@@ -140,7 +140,7 @@ The above table's rows with overlayFS map their "Mount Paths" to the "Upper Dire
140140

141141
> The "OverlayFS Name" is the name used in fstab and seen in the output of `mount`.
142142

143-
| OverlayFS Name | Upper Directory | Lower Directory (or more)
143+
| OverlayFS Name | Upper Directory | Lower Directory (or more)
144144
| --- | --- | --- |
145145
| `etcd_overlayfs` | `/run/lib-etcd` | `/var/lib/etcd` |
146146
| `containerd_overlayfs` | `/run/lib-containerd` | `/var/lib/containerd` |
@@ -164,7 +164,7 @@ There are a few overlays used for NCN image boots. These enable two critical fun
164164
> 2. `losetup -a` will show where the squashFS is mounted from
165165
> 3. `mount | grep ' / '` will show you the overlay being layered atop the squashFS
166166

167-
Let us pick apart the `SQFSRAID` and `ROOTRAID` overlays.
167+
Let us pick apart the `SQFSRAID` and `ROOTRAID` overlays.
168168
- `/run/rootfsbase` is the SquashFS image itself
169169
- `/run/initramfs/live` is the squashFS's storage array, where one or more squashFS can live
170170
- `/run/initramfs/overlayfs` is the overlayFS storage array, where the persistent directories live
@@ -359,7 +359,7 @@ There are two options one can leave enabled to accomplish this:
359359
360360
For long-term usage, `rd.live.overlay.readonly=1` should be added to the command line.
361361
362-
The `reset=1` toggle is usually used to fix a problematic overlay. If you want to refresh
362+
The `reset=1` toggle is usually used to fix a problematic overlay. If you want to refresh
363363
and purge the overlay completely, then use `rd.live.overlay.reset`.
364364
365365
@@ -391,7 +391,7 @@ rd.live.overlay.size=307200
391391
392392
# Use a 1 TiB overlayFS
393393
rd.live.overlay.size=1000000
394-
```
394+
```
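
The `rd.live.overlay.size` values above are plain integers. Assuming the unit is MiB, as the dracut documentation describes for this parameter, 307200 corresponds to 300 GiB, and the 1000000 used for the "1 TiB" example is a rounded figure (an exact binary TiB would be 1048576 MiB). A small sketch of the conversion, with a hypothetical helper name:

```bash
#!/usr/bin/env bash
# overlay_size_mib GIB (hypothetical helper): convert a size in GiB to the
# MiB value expected by rd.live.overlay.size (assumption: unit is MiB).
overlay_size_mib() {
    echo $(( $1 * 1024 ))
}

# Print the kernel parameter for a 300 GiB overlay.
echo "rd.live.overlay.size=$(overlay_size_mib 300)"
```
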
395395
396396
<a name="thin-overlay-feature"></a>
397397
##### Thin Overlay Feature

background/ncn_networking.md

+3-3
@@ -1,6 +1,6 @@
11
# NCN Networking
22

3-
Non-compute nodes and compute nodes have different network interfaces used for booting, this topic focuses on
3+
Non-compute nodes and compute nodes have different network interfaces used for booting, this topic focuses on
44
the network interfaces for management nodes.
55

66
### Topics:
@@ -72,11 +72,11 @@ lspci | grep c6:00.0
7272
```
7373

7474
The Device and Vendor IDs are used in iPXE for bootstrapping the nodes, this allows generators to
75-
swap IDs out for certain systems until smarter logic can be added to cloud-init.
75+
swap IDs out for certain systems until smarter logic can be added to cloud-init.
7676

7777
The following table includes popular vendor and device IDs.
7878

79-
> The bolded numbers are the defaults that live in metal-ipxe's boot script.
79+
> The bolded numbers are the defaults that live in `metal-ipxe`'s boot script.
8080
8181
| Vendor | Model | Device ID | Vendor ID |
8282
| :---- | :---- | :-----: | :---------: |
