Commit 36e91e9

Initial commit (0 parents)

50 files changed (+4270, -0 lines)

.gitignore

+2

# Ignore 999-prefixed pages to allow for transient/unstashed pages.
999*

.version

+1

1.6.1

000-INFO.md

+52

# CRAY Guide Contribution

Contributing is encouraged; anyone can make a pull request.

All this guide asks for is a few guard-rails to keep things organized.

# Page Indexing / Naming

The page name can be anything. This repo follows a loose pattern to assist tab-completion and contextual heuristics:

`[XYZ]-[context]-[memo].md`

Examples:

- `010-UAN-DEPLOY.md`
- `040-CN-DISCOVERY.md`
- `333-LIVCD-RECOVERY-GUIDE.md`
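
A page name can be checked against this loose pattern with a small shell function. This is a sketch under assumptions. The helper name `is_conventional_page_name` is hypothetical. The regular expression is one reading of the pattern: three digits, then one or more hyphen-separated alphanumeric words. That reading also accepts shorter names such as `000-INFO.md`.

```bash
# Hypothetical helper: succeeds when a page name follows the loose
# [XYZ]-[context]-[memo].md convention. Assumed pattern: exactly three
# digits, then one or more hyphen-separated alphanumeric words, then ".md".
is_conventional_page_name() {
  printf '%s\n' "$(basename -- "$1")" |
    grep -Eq '^[0-9]{3}(-[A-Za-z0-9]+)+\.md$'
}
```

The following loop flags pages that drift from the convention: `for f in *.md; do is_conventional_page_name "$f" || echo "unconventional name: $f"; done`.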

# Annotations

This repository may change its annotations; for now, under the Markdown governance, these are the available annotations.

**You must use these to denote the right steps to the right audience.**

These are context clues for steps. If a step carries one of these tags and you are not in that context, you ought to skip it.

> **`AIRGAP/OFFLINE USE`**

This tag should preface any block of offline install steps or procedures, where there is no online/internet connection.

> **`EXTERNAL USE`**

This tag should be used to highlight anything that an internal user should ignore or skip.

> **`INTERNAL USE`**

This tag should precede any block of instructions or text that is only usable or recommended for internal CRAY labs.

External readers (GitHub or customer) should disregard these annotated blocks. They may contain useful information, but they are not intended for external use.


> **`PREFERRED`** use the generated files from your system inputs...
>
> **`MANUAL`** without CPT files generated by CSI...

These tags denote a fork in direction. `MANUAL` marks a fallback method for when the faster `PREFERRED` method is not doable in the given context.



[1]: https://cray.slack.com/messages/docs-csm-install

001-GUIDES.md

+10

# Chapters

These serve as guidelines to keep like pages together.

- 000 - 001 INTRO : Information for contributing to this book.
- 002 - 049 INSTALL : Information for installing and upgrading.
- 050 - 099 PROCS : Procedures referenced by install; help guides, tricks/tips, etc.
- 100 - 150 NCN-META : Technical information for Non-Compute Nodes.
- 250 - 300 Common : Technical information common to all nodes.
- 300 - 350 MFG/SVC : Procedures referenced by service teams.
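
The ranges above can be encoded in a small lookup to show which chapter a page number falls into. This is a sketch, not repository tooling. The function name `chapter_for_page` is hypothetical. Ranges are checked in list order, so page 300, which appears in both the Common and MFG/SVC ranges, resolves to Common. Numbers outside every range, such as 151-249 or the ignored 999 pages, print `UNASSIGNED`.

```bash
# Hypothetical helper: prints the chapter for a page such as 040-CN-DISCOVERY.md,
# using the ranges from the chapter list. Ranges are tested in list order.
chapter_for_page() {
  local prefix
  prefix=$(basename -- "$1" | cut -c1-3)
  case $prefix in
    [0-9][0-9][0-9]) ;;                  # three-digit page number
    *) echo UNKNOWN; return 1 ;;
  esac
  # test(1) compares integers in decimal, so "040" is read as 40, not octal.
  if   [ "$prefix" -le 1 ];   then echo INTRO
  elif [ "$prefix" -le 49 ];  then echo INSTALL
  elif [ "$prefix" -le 99 ];  then echo PROCS
  elif [ "$prefix" -le 150 ]; then echo NCN-META
  elif [ "$prefix" -lt 250 ]; then echo UNASSIGNED
  elif [ "$prefix" -le 300 ]; then echo Common
  elif [ "$prefix" -le 350 ]; then echo MFG/SVC
  else echo UNASSIGNED
  fi
}
```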

002-LIVECD-CREATION.md

+286

# LiveCD Creation

This page will assist you with creating a LiveCD for a CSM install (formerly known as the "CRAY Pre-Install Toolkit") from a laptop or an existing shasta-1.3+ system.

> **`NOTE`** For installs using the remote-mounted LiveCD (no USB stick), pay attention to your memory usage while downloading and extracting artifacts.
>
> Remote LiveCDs run entirely in memory, which fills as artifacts are downloaded and subsequently extracted. In most cases this is fine. When RAM is limited to less than 128GB, however, memory pressure may occur from increasing file-system usage.
>
> Where memory is scarce, an NFS/CIFS or HTTP/S share can be mounted in place of the USB's data partition at `/var/www/ephemeral`. Using the same mount point as the USB data partition will help ward off mistakes when following along.
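
On a remote LiveCD, available memory can be compared against the 128GB guideline before downloading anything. This is a minimal sketch that assumes the standard Linux `/proc/meminfo` format, where `MemTotal` is reported in kB. The function name `check_livecd_memory` is hypothetical. The threshold of 128,000,000 kB is an assumption: it sits a little under 128 GiB, because a 128GB server reports somewhat less than its installed RAM.

```bash
# Hypothetical helper: warns when MemTotal is below the 128GB guideline.
# Reads a meminfo-formatted file (default /proc/meminfo).
# Returns 0 if OK, 1 if memory is low, and 2 if MemTotal could not be read.
check_livecd_memory() {
  local meminfo threshold_kb total_kb
  meminfo=${1:-/proc/meminfo}
  threshold_kb=128000000   # assumed cut-off, slightly under 128 GiB (134217728 kB)
  total_kb=$(awk '/^MemTotal:/ { print $2 }' "$meminfo" 2>/dev/null)
  if [ -z "$total_kb" ]; then
    echo "could not read MemTotal from $meminfo" >&2
    return 2
  fi
  if [ "$total_kb" -lt "$threshold_kb" ]; then
    echo "WARNING: ${total_kb} kB RAM; consider mounting a share at /var/www/ephemeral"
    return 1
  fi
  echo "OK: ${total_kb} kB RAM"
}
```

To check the running system, call it with no arguments on the LiveCD; it then reads `/proc/meminfo`.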

## Requirements

> If you are installing a system that previously had 1.3 installed, move external network connections from ncn-w001 to ncn-m001. See [MOVE-SITE-CONNECTIONS](050-MOVE-SITE-CONNECTIONS.md) for instructions.

1. A USB stick or other block device
    - The block device should be `>=256GB`
2. The device path of that device (e.g. `/dev/sdx`)
3. The number of mountain, hill, and river cabinets in the system.
4. A set of configuration information sufficient to fill out the [listed flags for the `csi config init` command](#configuration-payload)

> **`INTERNAL USE`**

5. Access to stash/bitbucket
6. The system's CCD/SHCD `.xlsx` file
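
Once the device path is known, the 256GB minimum can be checked. This is a sketch under assumptions. `usb_size_ok` is a hypothetical helper, and it interprets 256GB as 256,000,000,000 bytes, the decimal unit drive vendors use. Sticks marketed as 256GB may report slightly less, so the optional second argument overrides the minimum.

```bash
# Hypothetical helper: succeeds when a device size (in bytes) meets the
# minimum. The default minimum assumes 256GB = 256 * 10^9 bytes.
usb_size_ok() {
  local size min
  size=$1
  min=${2:-256000000000}
  case $size in
    ''|*[!0-9]*) return 1 ;;   # not a plain non-negative integer
  esac
  [ "$size" -ge "$min" ]
}
```

For example, run `usb_size_ok "$(lsblk -b -d -n -o SIZE /dev/sdx)" || echo "/dev/sdx is smaller than 256GB"`, where `/dev/sdx` is the device path from requirement 2.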

## Overview

1. [Download and expand the CSM release](#download-and-expand-the-csm-release)
2. [Install `csi`](#install-csi)
3. [Create the Bootable Media](#create-the-bootable-media)
4. [Gather and Create Seed Files](#gather--create-seed-files)
5. [Generate the Configuration Payload](#configuration-payload)
6. [Pre-Populate LiveCD OS Configuration and Daemon Files](#pre-populate-livecd-os-configuration-and-daemon-files)
7. [Pre-Populate the LiveCD Data and Deployment Files](#pre-populate-the-livecd-data-and-deployment-files)

### Download and Expand the CSM Release

Download the CSM software release to the Linux host that will prepare the LiveCD.

> **`INTERNAL USE`** The `ENDPOINT` URL below is for internal use. Customers and external users should
> use the URL for the server hosting their tarball.

```bash
linux# cd ~
linux# export ENDPOINT=https://arti.dev.cray.com/artifactory/shasta-distribution-stable-local/csm
linux# export CSM_RELEASE=csm-x.y.z
linux# wget ${ENDPOINT}/${CSM_RELEASE}.tar.gz
```

Expand the CSM software release:

```bash
linux# tar -zxvf ${CSM_RELEASE}.tar.gz
```
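
Later steps build paths from `CSM_RELEASE`. It can therefore help to confirm that the variable was changed from the `csm-x.y.z` placeholder and that the download is an intact archive. This is a sketch. `check_csm_release` is hypothetical, and the `csm-<major>.<minor>.<patch>` prefix it expects is an assumption based on the placeholder; loosen the pattern if your release names differ.

```bash
# Hypothetical helper: rejects the csm-x.y.z placeholder and checks that the
# tarball can be listed as a gzip-compressed tar archive.
check_csm_release() {
  local release tarball
  release=$1
  tarball=$2
  case $release in
    ''|csm-x.y.z)
      echo "CSM_RELEASE is unset or still the csm-x.y.z placeholder" >&2
      return 1 ;;
  esac
  if ! printf '%s\n' "$release" | grep -Eq '^csm-[0-9]+\.[0-9]+\.[0-9]+'; then
    echo "unexpected release name: $release" >&2
    return 1
  fi
  # Listing reads the whole archive, so this takes a while on a full release.
  if ! tar -tzf "$tarball" >/dev/null 2>&1; then
    echo "$tarball is missing or not a readable .tar.gz" >&2
    return 1
  fi
  echo "OK: $tarball"
}
```

Usage: `check_csm_release "$CSM_RELEASE" "$HOME/${CSM_RELEASE}.tar.gz"`.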

### Install `csi`

> **`IMPORTANT`** If you're using the remote ISO, skip this step and move on to [LiveCD Setup](004-LIVECD-SETUP.md). Return here for "Gather / Create Seed Files" if needed.

Install the included Cray Site Init package from the tarball:

```bash
linux# rpm -Uvh ./${CSM_RELEASE}/rpm/cray/csm/sle-15sp2/x86_64/cray-site-init-*.x86_64.rpm
```

### Create the Bootable Media

1. Identify the USB device.

This example shows that the USB device is `/dev/sdd` on the host.

```bash
linux# lsscsi
[6:0:0:0] disk ATA SAMSUNG MZ7LH480 404Q /dev/sda
[7:0:0:0] disk ATA SAMSUNG MZ7LH480 404Q /dev/sdb
[8:0:0:0] disk ATA SAMSUNG MZ7LH480 404Q /dev/sdc
[14:0:0:0] disk SanDisk Extreme SSD 1012 /dev/sdd
[14:0:0:1] enclosu SanDisk SES Device 1012 -
```

If you are building the LiveCD on ncn-m001 booted from a previous v1.3 install, there will be three real disks/SSDs, and the fourth disk will be the USB device. In this example, the fourth disk is clearly from a different vendor than the others.

```bash
linux# export USB=/dev/sdd
```
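
The next step erases the target device, so two optional pre-flight checks may be worth running first. The first confirms the USB transport with `lsblk`, as a second opinion on the `lsscsi` output. The second makes sure the ISO glob in the format command matches exactly one file. Both functions are hypothetical helpers, not part of `csi`. The transport check assumes `lsblk` reports `usb` in its `TRAN` column for the stick; that is typical, but confirm it on your hardware.

```bash
# Hypothetical helper: reads "NAME TRAN TYPE" lines (as printed by
# `lsblk -d -n -o NAME,TRAN,TYPE`) and prints whole disks on the USB bus.
# Filtering on TYPE=disk skips USB CD-ROMs such as BMC virtual media.
usb_disks_from_lsblk() {
  awk '$2 == "usb" && $3 == "disk" { print "/dev/" $1 }'
}

# Hypothetical helper: prints the single PIT ISO in a release directory, or
# fails if the glob matches zero or several files. Several matches would
# otherwise be passed to the format command as extra arguments.
pick_single_iso() {
  set -- "$1"/cray-pre-install-toolkit-*.iso
  if [ "$#" -ne 1 ] || [ ! -f "$1" ]; then
    echo "expected exactly one ISO, found: $*" >&2
    return 1
  fi
  printf '%s\n' "$1"
}
```

Before formatting, `lsblk -d -n -o NAME,TRAN,TYPE | usb_disks_from_lsblk` should print the same device you exported as `USB`. Likewise, `pick_single_iso "./${CSM_RELEASE}"` should print exactly one path.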

2. Format the USB device.

> **`IMPORTANT`** If you're using the remote ISO, skip this step and move on to [LiveCD Setup](004-LIVECD-SETUP.md). Return here for "Gather / Create Seed Files" if needed.

```bash
linux# csi pit format "$USB" ./${CSM_RELEASE}/cray-pre-install-toolkit-*.iso 50000
```

3. Create the mount points and mount the partitions:

```bash
linux# mkdir -pv /mnt/{cow,pitdata}
linux# mount -L cow /mnt/cow && mount -L PITDATA /mnt/pitdata
```

4. Unpack the release so it's available on the LiveCD:

```bash
linux# tar -zxvf ~/${CSM_RELEASE}.tar.gz -C /mnt/pitdata/
```

### Gather / Create Seed Files

These are the files you currently need to create or find in order to generate the config payload for the system:

1. `ncn_metadata.csv` (NCN configuration)
2. `hmn_connections.json` (RedFish configuration)
3. `switch_metadata.csv` (Switch configuration)
4. `application_node_config.yaml` (Optional: Application node configuration for SLS file generation)

From these files, `csi config init` generates all of the config files needed to begin an install.
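
Before running `csi config init`, a quick check can save a failed run. It confirms that the three required seed files are present and reports whether the optional one is. This is a sketch. `check_seed_files` is a hypothetical helper. It prints the `--application-node-config-yaml` flag from the Configuration Payload section only when `application_node_config.yaml` exists.

```bash
# Hypothetical helper: verifies the required seed files exist and are
# non-empty in a directory (default: current), then prints the extra csi
# flag for the optional application node config if that file is present.
check_seed_files() {
  local dir f missing
  dir=${1:-.}
  missing=0
  for f in ncn_metadata.csv hmn_connections.json switch_metadata.csv; do
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      missing=1
    fi
  done
  [ "$missing" -eq 0 ] || return 1
  if [ -f "$dir/application_node_config.yaml" ]; then
    printf '%s\n' '--application-node-config-yaml application_node_config.yaml'
  fi
}
```

For example: `extra=$(check_seed_files .) && echo "extra csi flags: ${extra:-none}"`.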

#### ncn_metadata.csv

Create `ncn_metadata.csv` by referencing these two pages:

- [NCN Metadata BMC](301-NCN-METADATA-BMC.md)
- [NCN Metadata BONDX](302-NCN-METADATA-BONDX.md)

#### hmn_connections.json

Create [hmn_connections.json](307-HMN-CONNECTIONS.md) by running a container against the CCD/SHCD spreadsheet.

#### switch_metadata.csv

Create [switch_metadata.csv](305-SWITCH-METADATA.md).

#### application_node_config.yaml

Create [application_node_config.yaml](308-APPLICATION-NODE-CONFIG.md). This optional configuration file lets you modify how CSI finds and treats application nodes discovered in the `hmn_connections.json` file when building the SLS input file.

### Configuration Payload

The configuration payload comes from the `csi config init` command below.

1. To execute this command you will need the following:

> The `hmn_connections.json`, `ncn_metadata.csv`, `switch_metadata.csv`, and (optionally) `application_node_config.yaml` files in the current directory, as well as values for the flags listed below.
>
> If you have an `application_node_config.yaml` input file, add the flag `--application-node-config-yaml application_node_config.yaml` to `csi config init` in the example below.
>
> If you have a `system_config.yaml` file from a previous configuration payload generated by CSI, it can supply configuration options to CSI instead of CLI flags. The `system_config.yaml` must be in the current directory.
>
> An example of the command to run with the required options:

```bash
linux# csi config init \
    --bootstrap-ncn-bmc-user root \
    --bootstrap-ncn-bmc-pass changeme \
    --system-name eniac \
    --mountain-cabinets 0 \
    --hill-cabinets 0 \
    --river-cabinets 1 \
    --can-cidr 10.103.11.0/24 \
    --can-gateway 10.103.11.1 \
    --can-static-pool 10.103.11.112/28 \
    --can-dynamic-pool 10.103.11.128/25 \
    --nmn-cidr 10.252.0.0/17 \
    --hmn-cidr 10.254.0.0/17 \
    --ntp-pool time.nist.gov \
    --site-ip 172.30.53.79/20 \
    --site-gw 172.30.48.1 \
    --site-nic p1p2 \
    --site-dns 172.30.84.40 \
    --install-ncn-bond-members p1p1,p10p1
```
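
Several address flags in the example are related. The CAN gateway and both CAN pools fall inside `--can-cidr`, and the site gateway sits on the `--site-ip` subnet. A typo that moves one of them outside is easy to miss. This is a minimal sketch of that sanity check in POSIX shell arithmetic. The helper names `ip_to_int` and `cidr_contains` are hypothetical, and the sketch handles IPv4 only. It assumes octets are written without leading zeros, and it checks containment only, not every constraint `csi` may enforce.

```bash
# Hypothetical helpers for an IPv4 sanity check (not part of csi).

# ip_to_int A.B.C.D: prints the address as an unsigned 32-bit integer.
ip_to_int() {
  local old_ifs
  old_ifs=$IFS
  IFS=.
  set -- $1            # split the dotted quad into $1..$4
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# cidr_contains OUTER INNER: succeeds if INNER (an address or CIDR block) lies
# entirely within OUTER. Host bits in OUTER (e.g. 172.30.53.79/20) are ignored.
cidr_contains() {
  local outer_net outer_len inner_net inner_len mask
  outer_net=${1%/*}; outer_len=${1#*/}
  inner_net=${2%/*}; inner_len=${2#*/}
  if [ "$inner_len" = "$2" ]; then inner_len=32; fi   # bare address = /32
  [ "$inner_len" -ge "$outer_len" ] || return 1
  mask=$(( (0xFFFFFFFF << (32 - outer_len)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$outer_net") & mask )) -eq $(( $(ip_to_int "$inner_net") & mask )) ]
}

# Example values from the csi config init command above.
cidr_contains 10.103.11.0/24 10.103.11.1      && echo "can-gateway inside can-cidr"
cidr_contains 10.103.11.0/24 10.103.11.112/28 && echo "can-static-pool inside can-cidr"
cidr_contains 10.103.11.0/24 10.103.11.128/25 && echo "can-dynamic-pool inside can-cidr"
cidr_contains 172.30.53.79/20 172.30.48.1     && echo "site-gw on the site-ip subnet"
```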

This will generate the following files in a subdirectory named after the system.

```
linux# ls -R eniac
eniac/:
basecamp conman.conf cpt-files credentials dnsmasq.d manufacturing metallb.yaml networks sls_input_file.json system_config

eniac/basecamp:
data.json

eniac/cpt-files:
ifcfg-bond0 ifcfg-lan0 ifcfg-vlan002 ifcfg-vlan004 ifcfg-vlan007

eniac/credentials:
bmc_password.json mgmt_switch_password.json root_password.json

eniac/dnsmasq.d:
CAN.conf HMN.conf mtl.conf NMN.conf statics.conf

eniac/manufacturing:

eniac/networks:
CAN.yaml HMNLB.yaml HMN.yaml HSN.yaml MTL.yaml NMNLB.yaml NMN.yaml
```

If `csi config init` prints warnings similar to the message below, CSI encountered an unknown piece of hardware in the `hmn_connections.json` file. If you do not see this message, you can move on to sub-step 2.

```json
{"level":"warn","ts":1610405168.8705149,"msg":"Found unknown source prefix! If this is expected to be an Application node, please update application_node_config.yaml","row":{"Source":"gateway01","SourceRack":"x3000","SourceLocation":"u33","DestinationRack":"x3002","DestinationLocation":"u48","DestinationPort":"j29"}}
```

If the piece of hardware is expected to be an application node, [follow the procedure to create the application_node_config.yaml](308-APPLICATION-NODE-CONFIG.md) file. Application node source names in `hmn_connections.json` (and the SHCD) are system specific. For that reason, `csi config init` needs this additional configuration file to include these nodes in the SLS input file. Pass it with the argument `--application-node-config-yaml ./application_node_config.yaml`.
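
A large `hmn_connections.json` may produce many such warnings. This sketch collects the unique `Source` names for review. It assumes the one-JSON-object-per-line format shown above. `unknown_sources` is a hypothetical helper, and it uses `grep`/`sed` instead of `jq` so it runs on a minimal host.

```bash
# Hypothetical filter: reads csi log lines on stdin and prints each unique
# "Source" value from the "Found unknown source prefix" warnings.
unknown_sources() {
  grep 'Found unknown source prefix' |
    sed -n 's/.*"Source":"\([^"]*\)".*/\1/p' |
    sort -u
}
```

For example, append `2>&1 | tee csi-config-init.log` when running the `csi config init` command. The log file name is arbitrary, and `2>&1` captures the warnings whether they go to stdout or stderr. Then run `unknown_sources < csi-config-init.log`.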

2. Clone the shasta-cfg repository for the system.

> **IMPORTANT - NOTE FOR `INTERNAL`** - It is recommended to sync with STABLE after cloning if you have not already done so.
>
> **IMPORTANT - NOTE FOR `INTERNAL`** - Configure Cray Datacenter LDAP if this hasn't been done for this system. See the section [Configuring Cray Datacenter LDAP](054-NCN-LDAP.md).

> **IMPORTANT - NOTE FOR `AIRGAP`** - You must do this now, while preparing the USB on your local machine, if your CRAY is airgapped or otherwise cannot reach your local Git server.

```bash
linux# git clone https://stash.us.cray.com/scm/shasta-cfg/eniac.git /mnt/pitdata/prep/site-init
```

If you would like to customize the PKI Certificate Authority (CA) used by the platform, see [Customizing the Platform CA](055-CERTIFICATE-AUTHORITY.md). This is an optional step. Note that the CA cannot be modified after install.

3. Apply workarounds.

Check for workarounds in the `~/${CSM_RELEASE}/fix/csi-config` directory. If there are any workarounds in that directory, run them now. Instructions are in the README files.

```bash
# Example
linux# ls ~/${CSM_RELEASE}/fix/csi-config
casminst-999
```
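
Listing each workaround alongside its README helps make sure none is overlooked. This sketch assumes the layout suggested by the example above: one subdirectory per workaround, such as `casminst-999`, containing a file whose name starts with `README`. `list_workarounds` is a hypothetical helper.

```bash
# Hypothetical helper: prints "<workaround><TAB><README path>" for each
# subdirectory of a fix directory, or a note when no README is found.
list_workarounds() {
  local fixdir w r readme
  fixdir=$1
  if [ ! -d "$fixdir" ]; then
    echo "no workaround directory: $fixdir"
    return 0
  fi
  for w in "$fixdir"/*/; do
    [ -d "$w" ] || continue              # the glob matched nothing
    w=${w%/}
    readme='<no README found>'
    for r in "$w"/README*; do
      if [ -e "$r" ]; then readme=$r; break; fi
    done
    printf '%s\t%s\n' "${w##*/}" "$readme"
  done
}
```

Usage: `list_workarounds "$HOME/${CSM_RELEASE}/fix/csi-config"`, then follow each listed README.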

### Pre-Populate LiveCD OS Configuration and Daemon Files

This is accomplished by populating the cow partition with the config files generated by `csi`:

```bash
# Copy network config files and dnsmasq configs
linux# csi pit populate cow /mnt/cow/ eniac/
config------------------------> /mnt/cow/rw/etc/sysconfig/network/config...OK
ifcfg-bond0-------------------> /mnt/cow/rw/etc/sysconfig/network/ifcfg-bond0...OK
ifcfg-lan0--------------------> /mnt/cow/rw/etc/sysconfig/network/ifcfg-lan0...OK
ifcfg-vlan002-----------------> /mnt/cow/rw/etc/sysconfig/network/ifcfg-vlan002...OK
ifcfg-vlan004-----------------> /mnt/cow/rw/etc/sysconfig/network/ifcfg-vlan004...OK
ifcfg-vlan007-----------------> /mnt/cow/rw/etc/sysconfig/network/ifcfg-vlan007...OK
ifroute-lan0------------------> /mnt/cow/rw/etc/sysconfig/network/ifroute-lan0...OK
ifroute-vlan002---------------> /mnt/cow/rw/etc/sysconfig/network/ifroute-vlan002...OK
CAN.conf----------------------> /mnt/cow/rw/etc/dnsmasq.d/CAN.conf...OK
HMN.conf----------------------> /mnt/cow/rw/etc/dnsmasq.d/HMN.conf...OK
NMN.conf----------------------> /mnt/cow/rw/etc/dnsmasq.d/NMN.conf...OK
mtl.conf----------------------> /mnt/cow/rw/etc/dnsmasq.d/mtl.conf...OK
statics.conf------------------> /mnt/cow/rw/etc/dnsmasq.d/statics.conf...OK

# Copy conman.conf (csi does not copy it)
linux# cp -pv eniac/conman.conf /mnt/cow/rw/etc/conman.conf
```
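
The `...OK` lines report what `csi` copied. A follow-up spot check can confirm that the files are present and non-empty under `/mnt/cow/rw`, including `conman.conf`, which is copied by hand. This is a sketch. `check_cow_files` is hypothetical, and its file list mirrors the example output above. Your system may generate a different set of `ifcfg`/`ifroute` files, so adjust the list to match your `csi` output.

```bash
# Hypothetical helper: checks that expected files exist and are non-empty
# under <cow root>/rw/etc (default cow root: /mnt/cow).
check_cow_files() {
  local root f status
  root=${1:-/mnt/cow}
  status=0
  for f in sysconfig/network/ifcfg-bond0 sysconfig/network/ifcfg-lan0 \
           dnsmasq.d/CAN.conf dnsmasq.d/HMN.conf dnsmasq.d/NMN.conf \
           dnsmasq.d/mtl.conf dnsmasq.d/statics.conf conman.conf; do
    if [ ! -s "$root/rw/etc/$f" ]; then
      echo "missing or empty: $root/rw/etc/$f" >&2
      status=1
    fi
  done
  return "$status"
}
```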

### Pre-Populate the LiveCD Data and Deployment Files

> **`NOTE`** When running on a system that has 1.3 installed, you may hit a collision with an ip rule that prevents access to arti.dev.cray.com. If you cannot ping that name, try removing this ip rule: `ip rule del from all to 10.100.0.0/17 lookup rt_smnet`
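
The removal can be guarded so that it runs only when the rule is present. This is a sketch. `has_smnet_rule` is a hypothetical filter over `ip rule show` output. It assumes the rule is listed in the same form as the `ip rule del` command in the note.

```bash
# Hypothetical filter: reads `ip rule show` output on stdin and succeeds if
# the rule from the note (to 10.100.0.0/17 lookup rt_smnet) is present.
has_smnet_rule() {
  grep -Eq '(^|[[:space:]])to 10\.100\.0\.0/17 lookup rt_smnet([[:space:]]|$)'
}
```

For example: `ip rule show | has_smnet_rule && ip rule del from all to 10.100.0.0/17 lookup rt_smnet`, then retry `ping -c 1 arti.dev.cray.com`.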

Populate your LiveCD with the kernel, initrd, and squashfs images (KIS), as well as the basecamp configs and any other files from your working directory that you want on the LiveCD.

```bash
linux# mkdir -p /mnt/pitdata/configs/
linux# mkdir -p /mnt/pitdata/data/{k8s,ceph}/

# 1. Copy basecamp data
linux# csi pit populate pitdata ~/eniac/ /mnt/pitdata/configs -b
data.json---------------------> /mnt/pitdata/configs/data.json...OK

# 2. Copy k8s KIS
linux# csi pit populate pitdata ~/${CSM_RELEASE}/images/kubernetes/ /mnt/pitdata/data/k8s/ -k
5.3.18-24.37-default-0.0.6.kernel-----------------> /mnt/pitdata/data/k8s/...OK
linux# csi pit populate pitdata ~/${CSM_RELEASE}/images/kubernetes/ /mnt/pitdata/data/k8s/ -i
initrd.img-0.0.6.xz-------------------------------> /mnt/pitdata/data/k8s/...OK
linux# csi pit populate pitdata ~/${CSM_RELEASE}/images/kubernetes/ /mnt/pitdata/data/k8s/ -K
kubernetes-0.0.6.squashfs-------------------------> /mnt/pitdata/data/k8s/...OK

# 3. Copy ceph/storage KIS
linux# csi pit populate pitdata ~/${CSM_RELEASE}/images/storage-ceph/ /mnt/pitdata/data/ceph/ -k
5.3.18-24.37-default-0.0.5.kernel-----------------> /mnt/pitdata/data/ceph/...OK
linux# csi pit populate pitdata ~/${CSM_RELEASE}/images/storage-ceph/ /mnt/pitdata/data/ceph/ -i
initrd.img-0.0.5.xz-------------------------------> /mnt/pitdata/data/ceph/...OK
linux# csi pit populate pitdata ~/${CSM_RELEASE}/images/storage-ceph/ /mnt/pitdata/data/ceph/ -C
storage-ceph-0.0.5.squashfs-----------------------> /mnt/pitdata/data/ceph/...OK

# 4. Copy the CSI config files to the prep dir
linux# cp -r ~/eniac /mnt/pitdata/prep
```
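
After the populate commands, each image directory should hold a kernel, an initrd, and a squashfs. This is a sketch. `check_kis_dir` is hypothetical. Its file-name patterns (`*.kernel`, `initrd.img-*.xz`, `*.squashfs`) are taken from the example output above, so adjust them if your release names its images differently.

```bash
# Hypothetical helper: succeeds if a directory contains at least one file
# matching each KIS pattern; reports every missing pattern on stderr.
check_kis_dir() {
  local dir pattern f found rc
  dir=$1
  rc=0
  for pattern in '*.kernel' 'initrd.img-*.xz' '*.squashfs'; do
    found=no
    for f in "$dir"/$pattern; do      # unquoted $pattern so the glob expands
      if [ -e "$f" ]; then found=yes; break; fi
    done
    if [ "$found" = no ]; then
      echo "no $pattern in $dir" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

Usage: `check_kis_dir /mnt/pitdata/data/k8s && check_kis_dir /mnt/pitdata/data/ceph`.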

### Next: Boot into Your LiveCD

Now you can boot into your LiveCD; see [LiveCD USB Boot](003-LIVECD-USB-BOOT.md).
