- # Cray System Management Installation Guide
+ # Cray System Management Documentation

### Scope and Audience

- The documentation included here describes how to install or upgrade the Cray System Management (CSM)
- software and related supporting operational procedures to manage an HPE Cray EX system. CSM software
- is the foundation upon which other software product streams for the HPE Cray EX system depend.
+ The documentation included here describes the Cray System Management (CSM) software, how to install
+ or upgrade CSM software, and related supporting operational procedures to manage an HPE Cray EX system.
+ CSM software is the foundation upon which other software product streams for the HPE Cray EX system depend.

The CSM installation prepares and deploys a distributed system across a group of management
nodes organized into a Kubernetes cluster which uses Ceph for utility storage. These nodes
@@ -18,7 +18,7 @@ demand for them varies, such as when booting many compute nodes or application n

This information is intended for system installers, system administrators, and network administrators
of the system. It assumes some familiarity with standard Linux and open source tools, such as shell
- scripts, Ansible, YAML, JSON, and TOML file formats, etc.
+ scripts, revision control with Git, configuration management with Ansible, and the YAML, JSON, and TOML file formats.

### Trademarks

@@ -30,98 +30,77 @@ The chapters with topics which need to be done as part of an ordered procedure a

1. [Introduction to CSM Installation](introduction/index.md)

- Topics:
- * [CSM Overview](introduction/index.md#csm_overview)
- * [Scenarios for Shasta v1.5](introduction/index.md#scenarios)
- * [CSM Product Stream Updates](introduction/index.md#product-stream-updates)
- * [CSM Operational Activities](introduction/index.md#operations)
- * [Differences from Previous Release](introduction/index.md#differences)
- * [Documentation Conventions](introduction/index.md#documentation_conventions)
+ This chapter provides an introduction to using the CSM software to manage the HPE Cray EX system. It
+ also describes the scenarios for installation and upgrade of CSM software, how product stream updates
+ for CSM are delivered, the operational activities done after installation for ongoing management
+ of the HPE Cray EX system, differences between the previous release and this release, and conventions
+ used in this documentation.

1. [Update CSM Product Stream](update_product_stream/index.md)

- Topics:
- 1. [Download and Extract CSM Product Release](update_product_stream/index.md#download-and-extract)
- 1. [Apply Patch to CSM Release](update_product_stream/index.md#patch)
- 1. [Check for Latest Workarounds and Documentation Updates](update_product_stream/index.md#workarounds)
- 1. [Check for Field Notices about Hotfixes](update_product_stream/index.md#hotfixes)
+ This chapter explains how to get the CSM product release and any patches, update to the latest
+ documentation and installation workarounds, and check for any Field Notices or Hotfixes.


1. [Install CSM](install/index.md)

- Topics:
- 1. [Validate Management Network Cabling](install/index.md#validate_management_network_cabling)
- 1. [Prepare Configuration Payload](install/index.md#prepare_configuration_payload)
- 1. [Prepare Management Nodes](install/index.md#prepare_management_nodes)
- 1. [Bootstrap PIT Node](install/index.md#bootstrap_pit_node)
- 1. [Configure Management Network Switches](install/index.md#configure_management_network)
- 1. [Collect MAC Addresses for NCNs](install/index.md#collect_mac_addresses_for_ncns)
- 1. [Deploy Management Nodes](install/index.md#deploy_management_nodes)
- 1. [Install CSM Services](install/index.md#install_csm_services)
- 1. [Validate CSM Health Before PIT Node Redeploy](install/index.md#validate_csm_health_before_pit_redeploy)
- 1. [Redeploy PIT Node](install/index.md#redeploy_pit_node)
- 1. [Configure Administrative Access](install/index.md#configure_administrative_access)
- 1. [Validate CSM Health](operations/validate_csm_health.md)
- 1. [Configure Prometheus Alert Notifications](install/index.md#configure_prometheus_alert_notifications)
- 1. [Update Firmware with FAS](operations/firmware/Update_Firmware_with_FAS.md)
- 1. [Prepare Compute Nodes](install/index.md#prepare_compute_nodes)
- 1. [Next Topic](install/index.md#next_topic)
- 1. [Troubleshooting Installation Problems](install/troubleshooting_installation.md)
+ This chapter provides an ordered list of procedures which can be used for CSM software installation or reinstallation
+ and indicates when to do operational tasks as part of the installation workflow. Updating software is covered in another chapter.
+ Installation of the CSM product stream has many steps in multiple procedures which should be done in a
+ specific order. Information about the HPE Cray EX system and the site is used to prepare the configuration
+ payload. The initial node used to bootstrap the installation process is called the PIT node because the
+ Pre-Install Toolkit is installed there. Once the management network switches have been configured, the other
+ management nodes can be deployed with an operating system and the software to create a Kubernetes cluster
+ utilizing Ceph storage. The CSM services provide essential software infrastructure including the API gateway
+ and many micro-services with REST APIs for managing the system. Once administrative access has been configured,
+ the installation of CSM software and nodes can be validated with health checks before doing operational tasks
+ like the check and update of firmware on system components or the preparation of compute nodes.

1. [Upgrade CSM](upgrade/index.md)

- Topics:
- 1. [Prepare for Upgrade](upgrade/index.md#prepare_for_upgrade)
- 1. [Update Management Network Configuration](upgrade/index.md#update_management_network)
- 1. [Upgrade Management Nodes and CSM Services](upgrade/index.md#upgrade_management_nodes_csm_services)
- 1. [Validate CSM Health](upgrade/index.md#validate_csm_health)
- 1. [Update Firmware with FAS](upgrade/index.md#update_firmware_with_fas)
- 1. [Next Topic](upgrade/index.md#next_topic)
+ This chapter provides an ordered list of procedures which can be used to update CSM software and indicates when
+ to do operational tasks as part of the software upgrade workflow. There are procedures to prepare the
+ HPE Cray EX system for the upgrade, and to update the management network, the management nodes, and the CSM services.
+ After the upgrade of CSM software, the CSM health checks are used to validate the system before doing any other
+ operational tasks like the check and update of firmware on system components.

1. [CSM Operational Activities](operations/index.md)

- Topics:
- * [CSM Product Management](operations/index.md#csm-product-management)
- * [Image Management](operations/index.md#image-management)
- * [Boot Orchestration](operations/index.md#boot-orchestration)
- * [System Power Off Procedures](operations/index.md#system-power-off-procedures)
- * [System Power On Procedures](operations/index.md#system-power-on-procedures)
- * [Power Management](operations/index.md#power-management)
- * [Artifact Management](operations/index.md#artifact-management)
- * [Compute Rolling Upgrades](operations/index.md#compute-rolling-upgrades)
- * [Configuration Management](operations/index.md#configuration-management)
- * [Kubernetes](operations/index.md#kubernetes)
- * [Package Repository Management](operations/index.md#package-repository-management)
- * [Security and Authentication](operations/index.md#security-and-authentication)
- * [Resiliency](operations/index.md#resiliency)
- * [ConMan](operations/index.md#conman)
- * [Utility Storage](operations/index.md#utility-storage)
- * [System Management Health](operations/index.md#system-management-health)
- * [System Layout Service (SLS)](operations/index.md#system-layout-service-sls)
- * [System Configuration Service](operations/index.md#system-configuration-service)
- * [Hardware State Manager (HSM)](operations/index.md#hardware-state-manager-hsm)
- * [Node Management](operations/index.md#node-management)
- * [River Endpoint Discovery Service (REDS)](operations/index.md#river-endpoint-discovery-service-reds)
- * [Network](operations/index.md#network)
- * [Update Firmware with FAS](operations/index.md#update-firmware-with-fas)
- * [User Access Service (UAS)](operations/index.md#user-access-service-uas)
-
- 2. [CSM Troubleshooting Information](troubleshooting/index.md)
-
- Topics:
- * [Known Issues](troubleshooting/index.md#known-issues)
-
- 3. [CSM Background Information](background/index.md)
-
- Topics:
- * [Cray Site Init Files](background/cray_site_init_files.md)
- * [Certificate Authority](background/certificate_authority.md)
- * [NCN Images](background/ncn_images.md)
- * [NCN Boot Workflow](background/ncn_boot_workflow.md)
- * [NCN Networking](background/ncn_networking.md)
- * [NCN Mounts and File Systems](background/ncn_mounts_and_file_systems.md)
- * [NCN Packages](background/ncn_packages.md)
- * [NCN Operating System Releases](background/ncn_operating_system_releases.md)
- * [cloud-init Basecamp Configuration](background/cloud-init_basecamp_configuration.md)
-
- 4. [Glossary](glossary.md)
+ This chapter provides an unordered set of administrative procedures required to operate an HPE Cray EX system with CSM software, grouped into several major areas:
+ * CSM Product Management
+ * Artifact Management
+ * Boot Orchestration
+ * Compute Rolling Upgrade
+ * Configuration Management
+ * Console Management
+ * Firmware Management
+ * Hardware State Manager
+ * Image Management
+ * Kubernetes
+ * Network Management
+ * Node Management
+ * Package Repository Management
+ * Power Management
+ * Resiliency
+ * River Endpoint Discovery Service
+ * Security And Authentication
+ * System Configuration Service
+ * System Layout Service
+ * System Management Health
+ * UAS User And Admin Topics
+ * Utility Storage
+ * Validate CSM Health
+
+ 1. [CSM Troubleshooting Information](troubleshooting/index.md)
+
+ This chapter provides information about some known issues in the system and tips for troubleshooting Kubernetes.
+
+ 1. [CSM Background Information](background/index.md)
+
+ This chapter provides background information about the NCNs (non-compute nodes) which function as
+ management nodes for the HPE Cray EX system. This information is not normally needed to install
+ or upgrade software, but provides background which might be helpful for troubleshooting an installation.
+
+ 1. [Glossary](glossary.md)
+
+ This chapter provides explanations of terms and acronyms used throughout the rest of this documentation.