This repository was archived by the owner on Oct 31, 2019. It is now read-only.

Commit a3fcbcd

Update arch image. Move Cluster Verification section to README. Restructure Upgrading cluster in README.
1 parent c03b9d5


4 files changed, +107 -56 lines


README.md

Lines changed: 39 additions & 4 deletions
@@ -21,10 +21,10 @@ Terraform is used to _provision_ the cloud infrastructure and any required local

 - Virtual Cloud Network (VCN) with dedicated subnets for etcd, masters, and workers in each availability domain
 - Dedicated compute instances for etcd, Kubernetes master and worker nodes in each availability domain
-- Public or Private TCP/SSL OCI Load Balancer to to distribute traffic to the Kubernetes Master(s)
+- Public or Private TCP/SSL OCI Load Balancer to distribute traffic to the Kubernetes Master(s)
 - Private OCI Load Balancer to distribute traffic to the node(s) in the etcd cluster
 - _Optional_ NAT instance for Internet-bound traffic on any private subnets
-- 2048-bit SSH RSA Key-Pair for compute instances when not overridden by `ssh_private_key` and `ssh_public_key_openssh` input variabless
+- 2048-bit SSH RSA Key-Pair for compute instances when not overridden by `ssh_private_key` and `ssh_public_key_openssh` input variables
 - Self-signed CA and TLS cluster certificates when not overridden by the input variables `ca_cert`, `ca_key`, etc.

 #### Cluster Configuration
@@ -71,7 +71,7 @@ $ cp terraform.example.tfvars terraform.tfvars

 ### Deploy the cluster

-Initialise Terraform:
+Initialize Terraform:

 ```
 $ terraform init
@@ -91,12 +91,47 @@ $ terraform apply

 ### Access the cluster

-The Kubernetes cluster will be running after the configuration is applied successfully and the cloud-init scripts have been given time to finish asynchronously. Typically this takes around 5 minutes after `terraform apply` and will vary depending on the overall configuration, instance counts, and shapes.
+The Kubernetes cluster will be running after the configuration is applied successfully and the cloud-init scripts have been given time to finish asynchronously. Typically, this takes around 5 minutes after `terraform apply` and will vary depending on the overall configuration, instance counts, and shapes.

 A working kubeconfig can be found in the ./generated folder or generated on the fly using the `kubeconfig` Terraform output variable.

 Your network access settings determine whether your cluster is accessible from the outside. See [Accessing the Cluster](./docs/cluster-access.md) for more details.

+#### Verifying the cluster
+
+If you've chosen to configure a public cluster, you can do a quick and automated verification of your cluster from
+your local machine by running `cluster-check.sh`, located in the `scripts` directory. Note that this script requires your KUBECONFIG environment variable to be set (above), and SSH and HTTPS access to be open to the etcd and worker nodes.
+
+To temporarily open SSH and HTTPS access for `cluster-check.sh`, add the following to your `terraform.tfvars` file:
+
+```bash
+# warning: 0.0.0.0/0 is wide open. remember to undo this.
+etcd_ssh_ingress = "0.0.0.0/0"
+master_ssh_ingress = "0.0.0.0/0"
+worker_ssh_ingress = "0.0.0.0/0"
+master_https_ingress = "0.0.0.0/0"
+worker_nodeport_ingress = "0.0.0.0/0"
+```
+
+```bash
+$ scripts/cluster-check.sh
+```
+```
+[cluster-check.sh] Running some basic checks on Kubernetes cluster....
+[cluster-check.sh] Checking ssh connectivity to each node...
+[cluster-check.sh] Checking whether instance bootstrap has completed on each node...
+[cluster-check.sh] Checking Flannel's etcd key from each node...
+[cluster-check.sh] Checking whether expected system services are running on each node...
+[cluster-check.sh] Checking status of /healthz endpoint at each k8s master node...
+[cluster-check.sh] Checking status of /healthz endpoint at the LB...
+[cluster-check.sh] Running 'kubectl get nodes' a number of times through the master LB...
+
+The Kubernetes cluster is up and appears to be healthy.
+Kubernetes master is running at https://129.146.22.175:443
+KubeDNS is running at https://129.146.22.175:443/api/v1/proxy/namespaces/kube-system/services/kube-dns
+kubernetes-dashboard is running at https://129.146.22.175:443/ui
+```
+
 ### Scale, upgrade, or delete the cluster

 Check out the [example operations](./docs/examples.md) for details on how to use Terraform to scale, upgrade, replace, or delete your cluster.
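
As a quick illustration of the kubeconfig and KUBECONFIG notes in the README hunk above, wiring `kubectl` up might look like the following sketch; the `generated/kubeconfig` file name is an assumption, so use whatever file the scripts actually write under ./generated:

```bash
# option A: use the kubeconfig Terraform wrote into ./generated (assumed file name)
export KUBECONFIG=$(pwd)/generated/kubeconfig

# option B: regenerate it on the fly from the `kubeconfig` output variable
terraform output kubeconfig > generated/kubeconfig

# sanity check before running scripts/cluster-check.sh
kubectl cluster-info
kubectl get nodes
```

Once the verification passes, remember to narrow the temporary `0.0.0.0/0` ingress values in `terraform.tfvars` back down (for example to your own IP as a /32) and re-apply.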

docs/cluster-access.md

Lines changed: 1 addition & 36 deletions
@@ -38,49 +38,14 @@ Note, for easier access, consider setting up an SSH tunnel between your local ho

 ## Access the cluster using Kubernetes Dashboard

-Assuming `kubectl` has access to the Kubernetes Master Load Balancer, you can use use `kubectl proxy` to access the
+Assuming `kubectl` has access to the Kubernetes Master Load Balancer, you can use `kubectl proxy` to access the
 Dashboard:

 ```
 kubectl proxy &
 open http://localhost:8001/ui
 ```

-## Verifying your cluster:
-
-If you've chosen to configure a public cluster, you can do a quick and automated verification of your cluster from
-your local machine by running the `cluster-check.sh` located in the `scripts` directory. Note that this script requires your KUBECONFIG environment variable to be set (above), and SSH and HTTPs access to be open to etcd and worker nodes.
-
-To temporarily open access SSH and HTTPs access for `cluster-check.sh`, add the following to your `terraform.tfvars` file:
-
-```bash
-# warning: 0.0.0.0/0 is wide open. remember to undo this.
-etcd_ssh_ingress = "0.0.0.0/0"
-master_ssh_ingress = "0.0.0.0/0"
-worker_ssh_ingress = "0.0.0.0/0"
-master_https_ingress = "0.0.0.0/0"
-worker_nodeport_ingress = "0.0.0.0/0"
-```
-
-```bash
-$ scripts/cluster-check.sh
-```
-```
-[cluster-check.sh] Running some basic checks on Kubernetes cluster....
-[cluster-check.sh] Checking ssh connectivity to each node...
-[cluster-check.sh] Checking whether instance bootstrap has completed on each node...
-[cluster-check.sh] Checking Flannel's etcd key from each node...
-[cluster-check.sh] Checking whether expected system services are running on each node...
-[cluster-check.sh] Checking status of /healthz endpoint at each k8s master node...
-[cluster-check.sh] Checking status of /healthz endpoint at the LB...
-[cluster-check.sh] Running 'kubectl get nodes' a number or times through the master LB...
-
-The Kubernetes cluster is up and appears to be healthy.
-Kubernetes master is running at https://129.146.22.175:443
-KubeDNS is running at https://129.146.22.175:443/api/v1/proxy/namespaces/kube-system/services/kube-dns
-kubernetes-dashboard is running at https://129.146.22.175:443/ui
-```
-
 ## SSH into OCI Instances

 If you've chosen to launch your control plane instance in _public_ subnets (i.e. `control_plane_subnet_access=public`), you can open
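
For the SSH tunnel mentioned in the hunk header above, a minimal sketch could look like this; the `generated/instances_id_rsa` key path and the `opc` login user are assumptions for illustration, not something this diff confirms:

```bash
# forward a local port to the Kubernetes API through a publicly reachable node,
# then point kubectl (or a browser for /ui) at https://localhost:2443
ssh -i generated/instances_id_rsa -N -L 2443:<private-master-ip>:443 opc@<public-node-ip>
```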

docs/examples.md

Lines changed: 67 additions & 16 deletions
@@ -55,7 +55,7 @@ We can use `terraform taint` to worker instances in a particular AD as "tainted"
 regenerating a misbehaving worker.

 ```bash
-# taint all workers in AD1
+# taint all workers in a particular AD
 terraform taint -module=instances-k8sworker-ad1 oci_core_instance.TFInstanceK8sWorker
 # optionally taint workers in AD2 and AD3 or do so in a subsequent apply
 # terraform taint -module=instances-k8sworker-ad2 oci_core_instance.TFInstanceK8sWorker
@@ -75,7 +75,7 @@ We can also use `terraform taint` to master instances in a particular AD as "tai
 changes or regenerating a misbehaving master.

 ```bash
-# taint all masters in AD1
+# taint all masters in a particular AD
 terraform taint -module=instances-k8smaster-ad1 oci_core_instance.TFInstanceK8sMaster
 # optionally taint masters in AD2 and AD3 or do so in a subsequent apply
 # terraform taint -module=instances-k8smaster-ad2 oci_core_instance.TFInstanceK8sMaster
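
If an instance is tainted by mistake, the mark can usually be cleared before the next apply. A sketch using the same module addressing as above, assuming a Terraform version whose `untaint` command accepts the `-module` flag the way `taint` does here:

```bash
# undo the taint so the next apply leaves the instance in place
terraform untaint -module=instances-k8smaster-ad1 oci_core_instance.TFInstanceK8sMaster
```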
@@ -88,36 +88,87 @@ $ terraform plan
 $ terraform apply
 ```

-## Upgrading cluster using the k8s_ver input variable
+## Upgrading Kubernetes Version

-One way to upgrade your cluster is by incrementally changing the value of the `k8s_ver` input variable on your master and then worker nodes.
+There are a few ways of moving to a new version of Kubernetes in your cluster.
+
+The easiest way to upgrade to a new Kubernetes version is to use the scripts to do a fresh cluster install using an updated `k8s_ver` input variable. The downside with this option is that the new cluster will not have your existing cluster state and deployments.
+
+The other options involve using the `k8s_ver` input variable to _replace_ master and worker instances in your _existing_ cluster. We can replace master and worker instances in the cluster since Kubernetes masters and workers are stateless. This option can either be done all at once or incrementally.
+
+#### Option 1: Do a clean install (easiest overall approach)
+
+Set the `k8s_ver` input variable and follow the original instructions in the [README](../README.md) to install a new cluster. The `label_prefix` variable is useful for installing multiple clusters in a compartment.
+
+#### Option 2: Upgrade cluster all at once (easiest upgrade)
+
+The example `terraform apply` command below will destroy then re-create all master and worker instances using as much parallelism as possible. It's the easiest and quickest upgrade scenario, but will result in some downtime for the workers and masters while they are being re-created. The single example `terraform apply` below will:
+
+1. destroy all worker nodes
+2. destroy all master nodes
+3. destroy all master load-balancer backends that point to old master instances
+4. re-create master instances using Kubernetes 1.7.5
+5. re-create worker nodes using Kubernetes 1.7.5
+6. re-create master load-balancer backends to point to new master node instances

 ```bash
-# preview upgrade of all workers in AD1 to K8s 1.7.5
+# preview upgrade/replace
+$ terraform plan -var k8s_ver=1.7.5
+
+# perform upgrade/replace
+$ terraform apply -var k8s_ver=1.7.5
+```
+
+#### Option 3: Upgrade cluster instances incrementally (most complicated, most control over roll-out)
+
+##### First, upgrade master nodes by AD
+
+If you would rather update the cluster incrementally, we start by upgrading the master nodes in each AD. In this scenario, each `terraform apply` will:
+
+1. destroy all master instances in a particular AD
+2. destroy all master load-balancer backends that point to deleted master instances
+3. re-create master instances in the AD using Kubernetes 1.7.5
+4. re-create master load-balancer backends to point to new master node instances
+
+For example, here is the command to upgrade all the master instances in AD1:
+
+```bash
+# preview upgrade of all masters and their LB backends in AD1
+$ terraform plan -var k8s_ver=1.7.5 -target=module.instances-k8smaster-ad1 -target=module.k8smaster-public-lb
+
+# perform upgrade/replace masters
+$ terraform apply -var k8s_ver=1.7.5 -target=module.instances-k8smaster-ad1 -target=module.k8smaster-public-lb
+```
+
+Be sure to repeat this command for each AD you have masters on.
+
+##### Next, upgrade worker nodes by AD
+
+After upgrading all the master nodes, we upgrade the worker nodes in each AD. Each `terraform apply` will:
+
+1. drain all worker nodes in a particular AD to your nodes in AD2 and AD3
+2. destroy all worker nodes in a particular AD
+3. re-create worker nodes in a particular AD using Kubernetes 1.7.5
+
+For example, here is the command to upgrade the worker instances in AD1:
+
+```bash
+# preview upgrade of all workers in a particular AD to K8s 1.7.5
 $ terraform plan -var k8s_ver=1.7.5 -target=module.instances-k8sworker-ad1

 # perform upgrade/replace workers
 $ terraform apply -var k8s_ver=1.7.5 -target=module.instances-k8sworker-ad1
 ```

-The above command will:
-
-1. drain all worker nodes in AD1 to your nodes in AD2 and AD3
-2. destroy all worker nodes in AD1
-3. re-create worker nodes in AD1 using Kubernetes 1.7.5
-
-If you have more than one worker in an AD, you can upgrade worker nodes individually using the subscript operator
+Like before, repeat `terraform apply` on each AD you have workers on. Note that if you have more than one worker in an AD, you can upgrade worker nodes individually using the subscript operator, e.g.

 ```bash
-# preview upgrade of a single worker in AD1 to K8s 1.7.5
+# preview upgrade of a single worker in a particular AD to K8s 1.7.5
 $ terraform plan -var k8s_ver=1.7.5 -target=module.instances-k8sworker-ad1.oci_core_instance.TFInstanceK8sWorker[1]

 # perform upgrade/replace of worker
 $ terraform apply -var k8s_ver=1.7.5 -target=module.instances-k8sworker-ad1.oci_core_instance.TFInstanceK8sWorker[1]
 ```
-Be sure to smoke test this approach on a stand-by cluster to weed out pitfalls and ensure our scripts are compatible
-with the version of Kubernetes you are trying to upgrade to. We have not tested other versions of Kubernetes other
-than the current default version.

 ## Replacing etcd cluster members using terraform taint
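
To make Option 1 from the hunk above concrete, a fresh side-by-side install might use a `terraform.tfvars` along these lines; the values are illustrative only (`label_prefix` and `k8s_ver` are the variables named in the diff), and the apply should run from a separate working directory or state so the existing cluster is left untouched:

```bash
# terraform.tfvars for a second, clean cluster in the same compartment
label_prefix = "k8s175-"
k8s_ver      = "1.7.5"
```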

docs/images/arch.jpg

Binary file changed (-206 bytes)
