Commit d298663

Documentation improvements
Improved the documentation to make it clearer and more readable. Mostly, I added a bunch of commas because, in general, commas make the world a better, safer place. Isn't that right, Grandma?
1 parent 68a8ee4 commit d298663

25 files changed

+248
-191
lines changed

background/ncn_boot_workflow.md

+31-2
Original file line numberDiff line numberDiff line change
@@ -29,15 +29,44 @@ PXE. The method to use will vary depending on the system environment.
2929
ncn# cat /proc/cmdline
3030
```
3131

32-
If it starts with `kernel` then the node network booted. If it starts with `BOOT_IMAGE=(` then it disk booted.
32+
If it starts with `kernel`, then the node network booted. If it starts with `BOOT_IMAGE=(`, then it disk booted.
3333

3434
1. Check output from `efibootmgr`.
3535

3636
```bash
3737
ncn# efibootmgr
3838
```
3939

40-
The `BootCurrent` value should be matched to the list beneath to see if it lines up with a networking option or a `cray sd*)` option for disk boots.
40+
The `BootCurrent` value should be matched to the list beneath it to see if it lines up with a networking option or a `cray sd*)` option for disk boots.
41+
42+
```bash
43+
ncn# efibootmgr
44+
BootCurrent: 0016 <---- BootCurrent
45+
Timeout: 2 seconds
46+
BootOrder: 0000,0011,0013,0014,0015,0016,0017,0005,0007,0018,0019,001A,001B,001C,001D,001E,001F,0020,0021,0012
47+
Boot0000* cray (sda1)
48+
Boot0001* UEFI: Built-in EFI Shell
49+
Boot0005* UEFI: PXE IP4 Mellanox Network Adapter - B8:59:9F:1D:D8:4E
50+
Boot0007* UEFI: PXE IP4 Mellanox Network Adapter - B8:59:9F:1D:D8:4F
51+
Boot0010* UEFI: AMI Virtual CDROM0 1.00
52+
Boot0011* cray (sdb1)
53+
Boot0012* UEFI: Built-in EFI Shell
54+
Boot0013* UEFI OS
55+
Boot0014* UEFI OS
56+
Boot0015* UEFI: AMI Virtual CDROM0 1.00
57+
Boot0016* UEFI: SanDisk <--- Matches here
58+
Boot0017* UEFI: SanDisk, Partition 2
59+
Boot0018* UEFI: HTTP IP4 Intel(R) I350 Gigabit Network Connection
60+
Boot0019* UEFI: PXE IP4 Intel(R) I350 Gigabit Network Connection
61+
Boot001A* UEFI: HTTP IP4 Mellanox Network Adapter - B8:59:9F:1D:D8:4E
62+
Boot001B* UEFI: HTTP IP4 Mellanox Network Adapter - B8:59:9F:1D:D8:4F
63+
Boot001C* UEFI: HTTP IP4 Intel(R) I350 Gigabit Network Connection
64+
Boot001D* UEFI: PXE IP4 Intel(R) I350 Gigabit Network Connection
65+
Boot001E* UEFI: PXE IP6 Intel(R) I350 Gigabit Network Connection
66+
Boot001F* UEFI: PXE IP6 Intel(R) I350 Gigabit Network Connection
67+
Boot0020* UEFI: PXE IP6 Mellanox Network Adapter - B8:59:9F:1D:D8:4E
68+
Boot0021* UEFI: PXE IP6 Mellanox Network Adapter - B8:59:9F:1D:D8:4F
69+
```
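The two checks above can be sketched as small shell helpers. This is only an illustration: the function names `boot_source_from_cmdline` and `current_boot_entry` are hypothetical and not part of any CSM tooling, and the second helper assumes the `efibootmgr` output has the layout shown in the example.

```bash
# Hypothetical helpers (names are illustrative, not part of CSM).

# Classify a kernel command line, such as the contents of /proc/cmdline.
# Prints "network", "disk", or "unknown".
boot_source_from_cmdline() {
    case "$1" in
        kernel*)         echo "network" ;;
        "BOOT_IMAGE=("*) echo "disk" ;;
        *)               echo "unknown" ;;
    esac
}

# Read efibootmgr output on stdin and print the BootXXXX line whose
# number matches the BootCurrent value. Returns non-zero if none matches.
current_boot_entry() {
    awk '
        /^BootCurrent:/ { current = $2 }
        /^Boot[0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f]/ {
            # $1 looks like "Boot0016*"; characters 5-8 are the entry number.
            entries[substr($1, 5, 4)] = $0
        }
        END { if (current in entries) print entries[current]; else exit 1 }
    '
}
```

On a node, these might be used as `boot_source_from_cmdline "$(cat /proc/cmdline)"` and `efibootmgr | current_boot_entry`. The printed entry still has to be read by a person to decide whether it is a networking option or a `cray (sd*)` disk option.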
4170

4271
<a name="set-bmcs-to-dhcp"></a>
4372
### Set BMCs to DHCP

index.md

+2-2
Original file line numberDiff line numberDiff line change
@@ -12,8 +12,8 @@ perform their function as Kubernetes master nodes, Kubernetes worker nodes, or u
1212
nodes with the Ceph storage.
1313

1414
System services on these nodes are provided as containerized micro-services packaged for deployment
15-
as helm charts. These services are orchestrated by Kubernetes to be scheduled on Kubernetes worker
16-
nodes with horizontal scaling to increase or decrease the number of instances of some services as
15+
via Helm charts. Kubernetes orchestrates these services and schedules them on Kubernetes worker
16+
nodes with horizontal scaling. Horizontal scaling increases or decreases the number of service instances as
1717
demand for them varies, such as when booting many compute nodes or application nodes.
1818

1919
This information is intended for system installers, system administrators, and network administrators

install/boot_livecd_virtual_iso.md

+1-1
Original file line numberDiff line numberDiff line change
@@ -115,4 +115,4 @@ tar xf /run/overlay.tar.gz -C /run/overlayfs/rw
115115
mount -o remount /
116116
```
117117

118-
If you excluded the `squashfs` files from the backup you will also need to repopulate them following the configuration section.
118+
If you excluded the `squashfs` files from the backup, you will also need to repopulate them following the configuration section.

install/bootstrap_livecd_remote_iso.md

+15-15
Original file line numberDiff line numberDiff line change
@@ -30,8 +30,8 @@ The LiveCD Remote ISO has known compatibility issues for nodes from certain vend
3030
### 2. Attaching and Booting the LiveCD with the BMC
3131

3232
> **Warning:** If this is a re-installation on a system that still has a USB device from a prior
33-
> installation then that USB device must be wiped before continuing. Failing to wipe the USB, if present, may result in confusion.
34-
> If the USB is booted still then it can wipe itself using the [basic wipe from Wipe NCN Disks for Reinstallation](wipe_ncn_disks_for_reinstallation.md#basic-wipe). If it is not booted, please do so and wipe it _or_ disable the USB ports in the BIOS (not available for all vendors).
33+
> installation, then that USB device must be wiped before continuing. Failing to wipe the USB, if present, may result in confusion.
34+
> If the USB is still booted, then it can wipe itself using the [basic wipe from Wipe NCN Disks for Reinstallation](wipe_ncn_disks_for_reinstallation.md#basic-wipe). If it is not booted, boot it and wipe it _or_ disable the USB ports in the BIOS (not available for all vendors).
3535
3636
Obtain and attach the LiveCD cray-pre-install-toolkit ISO file to the BMC. Depending on the vendor of the node,
3737
the instructions for attaching to the BMC will differ.
@@ -109,7 +109,7 @@ On first login (over SSH or at local console) the LiveCD will prompt the adminis
109109
1. Setup Variables.
110110

111111
```bash
112-
# The IPv4 Address for the nodes external interface(s); this will be provided if not already by the site's network administrator or network authority.
112+
# The IPv4 address for the node's external interface(s); this will be provided, if not already, by the site's network administrator or network authority.
113113
pit# site_ip=172.30.XXX.YYY/20
114114
pit# site_gw=172.30.48.1
115115
pit# site_dns=172.30.84.40
@@ -126,7 +126,7 @@ On first login (over SSH or at local console) the LiveCD will prompt the adminis
126126
pit# /root/bin/csi-setup-lan0.sh $site_ip $site_gw $site_dns $site_nics
127127
```
128128

129-
1. (recommended) print `lan0`, and if it has an IP address then exit console and log in again using SSH. The
129+
1. (recommended) print `lan0`, and if it has an IP address, then exit console and log in again using SSH. The
130130
SSH connection will provide larger window sizes and better buffer handling (screen wrapping).
131131

132132
```bash
@@ -237,7 +237,7 @@ On first login (over SSH or at local console) the LiveCD will prompt the adminis
237237

238238
1. Download and install/upgrade the workaround and documentation RPMs.
239239

240-
If this machine does not have direct Internet access these RPMs will need to be externally downloaded and then copied to the system.
240+
If this machine does not have direct Internet access, these RPMs will need to be externally downloaded and then copied to the system.
241241

242242
**Important:** In an earlier step, the CSM release plus any patches, workarounds, or hotfixes
243243
were downloaded to a system using the instructions in [Check for Latest Workarounds and Documentation Updates](../update_product_stream/index.md#workarounds). Use that set of RPMs rather than downloading again.
@@ -265,14 +265,14 @@ On first login (over SSH or at local console) the LiveCD will prompt the adminis
265265
- `ncn_metadata.csv`
266266
- `switch_metadata.csv`
267267
- `system_config.yaml` (see below)
268-
269-
> The optional `application_node_config.yaml` file may be provided for further defining of settings relating to how application nodes will appear in HSM for roles and subroles. See [Create Application Node YAML](create_application_node_config_yaml.md)
270-
268+
269+
> The optional `application_node_config.yaml` file may be provided to further assign application nodes to roles and subroles in the HSM. See [Create Application Node YAML](create_application_node_config_yaml.md).
270+
271271
> The optional `cabinets.yaml` file allows cabinet naming and numbering as well as some VLAN overrides. See [Create Cabinets YAML](create_cabinets_yaml.md).
272-
273-
> The `system_config.yaml` is required for a reinstall, because it was created during a previous install. For a first time install, the information in it can be provided as command line arguments to `csi config init`.
274-
275-
272+
273+
> The `system_config.yaml` is required for a reinstall because it was created during a previous install. For a first-time install, the information in it can be provided as command line arguments to `csi config init`.
274+
275+
276276
1. Change into the preparation directory.
277277

278278
```bash
@@ -281,9 +281,9 @@ On first login (over SSH or at local console) the LiveCD will prompt the adminis
281281
```
282282

283283
After gathering the files into this working directory, generate your configurations.
284-
285-
1. If doing a reinstall and have the `system_config.yaml` parameter file avail available, then generate the system configuration reusing this parameter file (see [avoiding parameters](../background/cray_site_init_files.md#save-file--avoiding-parameters)).
286-
284+
285+
1. If doing a reinstall and the `system_config.yaml` parameter file is available, then generate the system configuration by reusing this parameter file (see [avoiding parameters](../background/cray_site_init_files.md#save-file--avoiding-parameters)).
286+
287287
If not doing a reinstall of Shasta software, then the `system_config.yaml` file will not be available, so skip the rest of this step.
288288

289289
1. Check for the configuration files. The needed files should be in the current directory.

install/collect_mac_addresses_for_ncns.md

+7-7
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,7 @@
11
# Collect MAC Addresses for NCNs
22

33
Now that the PIT node has been booted with the LiveCD and the management network switches have been configured,
4-
the actual MAC address for the management nodes can be collected. This process will include repetition of some
4+
the actual MAC addresses for the management nodes can be collected. This process will include repetition of some
55
of the steps done up to this point because `csi config init` will need to be run with the proper
66
MAC addresses and some services will need to be restarted.
77

@@ -33,7 +33,7 @@ See [Collecting BMC MAC Addresses](collecting_bmc_mac_addresses.md).
3333
<a name="restart_services_after_bmc_mac_addresses_collected"></a>
3434
### 2. Restart Services after BMC MAC Addresses Collected
3535

36-
The previous step updated `ncn_metadata.csv` with the BMC MAC Addresses so several earlier steps need to be repeated.
36+
The previous step updated `ncn_metadata.csv` with the BMC MAC Addresses, so several earlier steps need to be repeated.
3737

3838
1. Change into the preparation directory.
3939

@@ -132,7 +132,7 @@ making a backup of them, in case they need to be examined at a later time.
132132

133133
1. Check that IP addresses are set for each interface and investigate any failures.
134134

135-
1. Check IP addresses, do not run tests if these are missing and instead start triage.
135+
1. Check IP addresses. If any are missing, do not run the tests; start triaging the issue instead.
136136

137137
```bash
138138
pit# wicked show bond0 vlan002 vlan004 vlan007
@@ -166,7 +166,7 @@ making a backup of them, in case they need to be examined at a later time.
166166
addr: ipv4 10.254.1.4/17 [static]
167167
```
168168

169-
1. Run tests, inspect failures.
169+
1. Run tests; inspect failures.
170170

171171
```bash
172172
pit# csi pit validate --network
@@ -244,7 +244,7 @@ so several earlier steps need to be repeated.
244244
pit# cat ncn_metadata.csv
245245
```
246246

247-
1. Remove the incorrectly generated configs. Before deleting the incorrectly generated configs consider
247+
1. Remove the incorrectly generated configs. Before deleting the incorrectly generated configs, consider
248248
making a backup of them, in case they need to be examined at a later time.
249249

250250
> **`WARNING`** Ensure that the `SYSTEM_NAME` environment variable is correctly set.
@@ -325,7 +325,7 @@ making a backup of them, in case they need to be examined at a later time.
325325
326326
1. Check that IP addresses are set for each interface and investigate any failures.
327327
328-
1. Check IP addresses, do not run tests if these are missing and instead start triage.
328+
1. Check IP addresses. If any are missing, do not run the tests; start triaging the issue instead.
329329
330330
```bash
331331
pit# wicked show bond0 vlan002 vlan004 vlan007
@@ -359,7 +359,7 @@ making a backup of them, in case they need to be examined at a later time.
359359
addr: ipv4 10.254.1.4/17 [static]
360360
```
361361
362-
1. Run tests, inspect failures.
362+
1. Run tests; inspect failures.
363363
364364
```bash
365365
pit# csi pit validate --network

install/collecting_ncn_mac_addresses.md

+10-9
Original file line numberDiff line numberDiff line change
@@ -5,8 +5,8 @@ you will have the MAC addresses needed for the Bootstrap MAC, Bond0 MAC0, and Bo
55

66
The Bootstrap MAC address will be used for identification of this node during the early part of the PXE boot process before the bonded interface can be established.
77
The Bond0 MAC0 and Bond0 MAC1 are the MAC addresses for the physical interfaces that your node will use for the various VLANs.
8-
The Bond0 MAC0 and Bond0 MAC1 should be on the different network cards to establish redundancy for a failed network card.
9-
On the other hand, if the node has only a single network card, then MAC1 and MAC0 will still produce a valid configuration if they do reside on the same physical card.
8+
The Bond0 MAC0 and Bond0 MAC1 should be on different network cards to establish redundancy in case either network card fails.
9+
On the other hand, if the node only has a single network card, then MAC1 and MAC0 will still produce a valid configuration if they reside on the same physical card.
1010

1111
#### Sections
1212

@@ -18,13 +18,13 @@ On the other hand, if the node has only a single network card, then MAC1 and MAC
1818

1919
The easy way to do this leverages the NIC-dump provided by the metal-ipxe package. This page will walk through
2020
booting NCNs and collecting their MACs from the conman console logs.
21-
> The alternative is to use serial cables (or SSH) to collect the MACs from the switch ARP tables, this can become exponentially difficult for large systems.
21+
> The alternative is to use serial cables (or SSH) to collect the MACs from the switch ARP tables, which can become exponentially difficult for large systems.
2222
> If this is the only way, please proceed to the bottom of this page.
2323
2424
<a name="procedure-ipxe-consoles"></a>
2525
## Procedure: iPXE Consoles
2626

27-
This procedure is faster for those with the LiveCD (CRAY Pre-Install Toolkit) it can be used to quickly
27+
This procedure is faster for those with the LiveCD (CRAY Pre-Install Toolkit). It can be used to quickly
2828
boot-check nodes to dump network device information without an operating system. This works by accessing the PCI Configuration Space.
2929

3030
<a name="requirements"></a>
@@ -72,7 +72,7 @@ For help with either of those, see [LiveCD Setup](bootstrap_livecd_remote_iso.md
7272
pit# sleep 10
7373
pit# grep -oP "($mtoken|$stoken|$wtoken)" /etc/dnsmasq.d/statics.conf | sort -u | xargs -t -i ipmitool -I lanplus -U $USERNAME -E -H {} power on
7474
```
75-
4. Now wait for the nodes to netboot. You can follow them with `conman -j ncn-*id*-mgmt` (use `conman -q` to see ). This takes less than 3 minutes, speed depends on how quickly your nodes POST.
75+
4. Now wait for the nodes to netboot. You can follow them with `conman -j ncn-*id*-mgmt` (use `conman -q` to see the list of nodes). This takes less than 3 minutes; the speed depends on how quickly your nodes POST.
7676
5. Print what has been found in the console logs. This snippet omits duplicates from multiple boot attempts:
7777
```bash
7878
pit# for file in /var/log/conman/*; do
@@ -101,7 +101,7 @@ For help with either of those, see [LiveCD Setup](bootstrap_livecd_remote_iso.md
101101
grep -Eoh '(net[0-9] MAC .*)' $file | sort -u | grep PCI | grep -Ev "$did" && echo -----
102102
done
103103
```
104-
7. Examine the output from `grep` to identify the MAC address that make up Bond0 for each management NCN, use the lowest value MAC address per PCIe card.
104+
7. Examine the output from `grep` to identify the MAC addresses that make up Bond0 for each management NCN. Use the lowest-value MAC address per PCIe card.
105105

106106
> example: 1 PCIe card with 2 ports for a total of 2 ports per node.\
107107

@@ -127,7 +127,7 @@ For help with either of those, see [LiveCD Setup](bootstrap_livecd_remote_iso.md
127127
-----
128128
```
129129

130-
The above output identified MAC0 and MAC1 of the bond as 94:40:c9:5f:b5:df and 14:02:ec:da:b9:99 respectively.
130+
The above output identified MAC0 and MAC1 of the bond as 94:40:c9:5f:b5:df and 14:02:ec:da:b9:98 respectively.
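Picking the lowest-value MAC address for one card can be sketched as below. The helper name `lowest_mac` is hypothetical, and the sketch assumes every address is in the same colon-separated, two-hex-digits-per-byte form, so that byte-wise text ordering matches numeric ordering once the case is normalized.

```bash
# Hypothetical helper: read MAC addresses for ONE PCIe card on stdin
# (one per line) and print the lowest one.
lowest_mac() {
    # Normalize hex digits to lowercase, then sort byte-wise so that
    # 0-9 order before a-f regardless of the system locale.
    tr 'A-F' 'a-f' | LC_ALL=C sort | head -n 1
}
```

For example, `printf '14:02:ec:da:b9:99\n14:02:ec:da:b9:98\n' | lowest_mac` prints `14:02:ec:da:b9:98`. Grouping the addresses by card still has to be done by hand from the console log output.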
131131

132132
8. Collect the NCN MAC address for the PIT node. This information will be used to populate the MAC addresses for ncn-m001.
133133

@@ -140,7 +140,7 @@ For help with either of those, see [LiveCD Setup](bootstrap_livecd_remote_iso.md
140140
9. Update `ncn_metadata.csv` with the collected MAC addresses for Bond0 from all of the management NCNs.
141141
> Tip: Mind the index (3, 2, 1.... ; not 1, 2, 3)
142142

143-
For each NCN update the corresponding row in `ncn_metadata` with the values for Bond0 MAC0 and Bond0 MAC1. The Bootstrap MAC should have the same value as the Bond0 MAC0.
143+
For each NCN, update the corresponding row in `ncn_metadata.csv` with the values for Bond0 MAC0 and Bond0 MAC1. The Bootstrap MAC should have the same value as the Bond0 MAC0.
144144

145145
```
146146
Xname,Role,Subrole,BMC MAC,Bootstrap MAC,Bond0 MAC0,Bond0 MAC1
@@ -179,7 +179,8 @@ is quicker.
179179

180180
If you have an incorrect `ncn_metadata.csv` file, you will be unable to deploy the NCNs. This section details a recovery procedure in case that happens.
181181

182-
1. Remove the incorrectly generated configurations. Before deleting the incorrectly generated configurations, consider making a backup of them. In case they need to be examined at a later time.
182+
1. Remove the incorrectly generated configurations. Before deleting the incorrectly generated configurations, consider making a backup of them in case they need to be examined at a later time.
183+
183184

184185
> **`WARNING`** Ensure that the `SYSTEM_NAME` environment variable is correctly set. If `SYSTEM_NAME` is
185186
> not set, the command below could potentially remove the entire prep directory.
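One way to guard against this is to check the variable before running any removal command. The following is a minimal sketch; the function name `require_system_name` is hypothetical and is not part of the documented procedure.

```bash
# Hypothetical guard: fail loudly when SYSTEM_NAME is unset or empty,
# so that a path built from it cannot collapse to its parent directory.
require_system_name() {
    if [ -z "${SYSTEM_NAME:-}" ]; then
        echo "SYSTEM_NAME is not set; refusing to continue" >&2
        return 1
    fi
}
```

Chaining it in front of the removal, as in `require_system_name && <removal command>`, means the removal only runs when the variable has a value. It does not verify that the value is the correct system name, so that still needs to be checked by eye.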

install/configure_administrative_access.md

+6-7
Original file line numberDiff line numberDiff line change
@@ -3,7 +3,7 @@
33
There are several operations which configure administrative access to different parts of the system.
44
Ensuring that the `cray` CLI can be used by administrative credentials enables use of many management
55
services via commands. The management nodes can be locked from accidental manipulation by the
6-
`cray capmc` and `cray fas` commands when then intent is to work on the entire system except the
6+
`cray capmc` and `cray fas` commands when the intent is to work on the entire system except the
77
management nodes. The `cray scsd` command can change the SSH keys, NTP server, syslog server, and
88
BMC/controller passwords.
99

@@ -38,9 +38,9 @@ BMC/controller passwords.
3838
APIs into easily usable commands.
3939

4040
Later procedures in the installation workflow use the `cray` command to interact with multiple services.
41-
The `cray` CLI configuration needs to be initialized for the Linux account and the keycloak user credentials
42-
used in initialization running the procedure needs to be authorized for administrative actions.
43-
41+
The `cray` CLI configuration needs to be initialized for the Linux account. The Keycloak user who initializes the
42+
CLI configuration needs to be authorized for administrative actions.
43+
4444
See [Configure the Cray Command Line Interface (cray CLI)](../operations/configure_cray_cli.md)
4545
<a name="lock_management_nodes"></a>
4646
1. Lock Management Nodes
@@ -53,9 +53,8 @@ BMC/controller passwords.
5353
If a single node is taken down by mistake, it is possible that things will recover. However, if all management
5454
nodes are taken down, or all Kubernetes worker nodes are taken down by mistake, the system is dead and has to be
5555
completely restarted.
56-
57-
Lock the management nodes **now**!
58-
56+
**Lock the management nodes now!**
57+
5958
See [Lock and Unlock Nodes](../operations/hardware_state_manager/Lock_and_Unlock_Management_Nodes.md)
6059
<a name="configure_with_scsd"></a>
6160
1. Configure BMC and Controller Parameters with SCSD

install/configure_management_network.md

+1-1
Original file line numberDiff line numberDiff line change
@@ -9,7 +9,7 @@ It is assumed that the administrator configuring the Management Network has a ba
99

1010
Before configuring or reconfiguring any switches, make sure to get the current running config and save it in case you need to revert the config.
1111

12-
save the output of.
12+
Save the output:
1313
```
1414
show run
1515
```
