This repository was archived by the owner on Oct 31, 2019. It is now read-only.

Commit a3fcbcd

Update arch image. Move Cluster Verification section to README. Restructure Upgrading cluster in README.
1 parent c03b9d5


4 files changed, +107 -56 lines


README.md

Lines changed: 39 additions & 4 deletions
@@ -21,10 +21,10 @@ Terraform is used to _provision_ the cloud infrastructure and any required local

 - Virtual Cloud Network (VCN) with dedicated subnets for etcd, masters, and workers in each availability domain
 - Dedicated compute instances for etcd, Kubernetes master and worker nodes in each availability domain
-- Public or Private TCP/SSL OCI Load Balancer to to distribute traffic to the Kubernetes Master(s)
+- Public or Private TCP/SSL OCI Load Balancer to distribute traffic to the Kubernetes Master(s)
 - Private OCI Load Balancer to distribute traffic to the node(s) in the etcd cluster
 - _Optional_ NAT instance for Internet-bound traffic on any private subnets
-- 2048-bit SSH RSA Key-Pair for compute instances when not overridden by `ssh_private_key` and `ssh_public_key_openssh` input variabless
+- 2048-bit SSH RSA Key-Pair for compute instances when not overridden by `ssh_private_key` and `ssh_public_key_openssh` input variables
 - Self-signed CA and TLS cluster certificates when not overridden by the input variables `ca_cert`, `ca_key`, etc.

 #### Cluster Configuration
@@ -71,7 +71,7 @@ $ cp terraform.example.tfvars terraform.tfvars

 ### Deploy the cluster

-Initialise Terraform:
+Initialize Terraform:

 ```
 $ terraform init
@@ -91,12 +91,47 @@ $ terraform apply

 ### Access the cluster

-The Kubernetes cluster will be running after the configuration is applied successfully and the cloud-init scripts have been given time to finish asynchronously. Typically this takes around 5 minutes after `terraform apply` and will vary depending on the overall configuration, instance counts, and shapes.
+The Kubernetes cluster will be running after the configuration is applied successfully and the cloud-init scripts have been given time to finish asynchronously. Typically, this takes around 5 minutes after `terraform apply` and will vary depending on the overall configuration, instance counts, and shapes.

 A working kubeconfig can be found in the ./generated folder or generated on the fly using the `kubeconfig` Terraform output variable.

 Your network access settings determine whether your cluster is accessible from the outside. See [Accessing the Cluster](./docs/cluster-access.md) for more details.

+#### Verifying the cluster
+
+If you've chosen to configure a public cluster, you can do a quick and automated verification of your cluster from
+your local machine by running `cluster-check.sh`, located in the `scripts` directory. Note that this script requires your KUBECONFIG environment variable to be set (above), and SSH and HTTPS access to be open to the etcd and worker nodes.
+
+To temporarily open SSH and HTTPS access for `cluster-check.sh`, add the following to your `terraform.tfvars` file:
+
+```bash
+# warning: 0.0.0.0/0 is wide open. remember to undo this.
+etcd_ssh_ingress = "0.0.0.0/0"
+master_ssh_ingress = "0.0.0.0/0"
+worker_ssh_ingress = "0.0.0.0/0"
+master_https_ingress = "0.0.0.0/0"
+worker_nodeport_ingress = "0.0.0.0/0"
+```
+
+```bash
+$ scripts/cluster-check.sh
+```
+```
+[cluster-check.sh] Running some basic checks on Kubernetes cluster....
+[cluster-check.sh] Checking ssh connectivity to each node...
+[cluster-check.sh] Checking whether instance bootstrap has completed on each node...
+[cluster-check.sh] Checking Flannel's etcd key from each node...
+[cluster-check.sh] Checking whether expected system services are running on each node...
+[cluster-check.sh] Checking status of /healthz endpoint at each k8s master node...
+[cluster-check.sh] Checking status of /healthz endpoint at the LB...
+[cluster-check.sh] Running 'kubectl get nodes' a number of times through the master LB...
+
+The Kubernetes cluster is up and appears to be healthy.
+Kubernetes master is running at https://129.146.22.175:443
+KubeDNS is running at https://129.146.22.175:443/api/v1/proxy/namespaces/kube-system/services/kube-dns
+kubernetes-dashboard is running at https://129.146.22.175:443/ui
+```
+
 ### Scale, upgrade, or delete the cluster

 Check out the [example operations](./docs/examples.md) for details on how to use Terraform to scale, upgrade, replace, or delete your cluster.
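
As a quick illustration of the kubeconfig and KUBECONFIG notes in the README hunk above, wiring `kubectl` up might look like the following sketch; the `generated/kubeconfig` file name is an assumption, so use whatever file the scripts actually write under ./generated:

```bash
# option A: use the kubeconfig Terraform wrote into ./generated (assumed file name)
export KUBECONFIG=$(pwd)/generated/kubeconfig

# option B: regenerate it on the fly from the `kubeconfig` output variable
terraform output kubeconfig > generated/kubeconfig

# sanity check before running scripts/cluster-check.sh
kubectl cluster-info
kubectl get nodes
```

Once the verification passes, remember to narrow the temporary `0.0.0.0/0` ingress values in `terraform.tfvars` back down (for example to your own IP as a /32) and re-apply.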

docs/cluster-access.md

Lines changed: 1 addition & 36 deletions
@@ -38,49 +38,14 @@ Note, for easier access, consider setting up an SSH tunnel between your local ho

 ## Access the cluster using Kubernetes Dashboard

-Assuming `kubectl` has access to the Kubernetes Master Load Balancer, you can use use `kubectl proxy` to access the
+Assuming `kubectl` has access to the Kubernetes Master Load Balancer, you can use `kubectl proxy` to access the
 Dashboard:

 ```
 kubectl proxy &
 open http://localhost:8001/ui
 ```

-## Verifying your cluster:
-
-If you've chosen to configure a public cluster, you can do a quick and automated verification of your cluster from
-your local machine by running the `cluster-check.sh` located in the `scripts` directory. Note that this script requires your KUBECONFIG environment variable to be set (above), and SSH and HTTPs access to be open to etcd and worker nodes.
-
-To temporarily open access SSH and HTTPs access for `cluster-check.sh`, add the following to your `terraform.tfvars` file:
-
-```bash
-# warning: 0.0.0.0/0 is wide open. remember to undo this.
-etcd_ssh_ingress = "0.0.0.0/0"
-master_ssh_ingress = "0.0.0.0/0"
-worker_ssh_ingress = "0.0.0.0/0"
-master_https_ingress = "0.0.0.0/0"
-worker_nodeport_ingress = "0.0.0.0/0"
-```
-
-```bash
-$ scripts/cluster-check.sh
-```
-```
-[cluster-check.sh] Running some basic checks on Kubernetes cluster....
-[cluster-check.sh] Checking ssh connectivity to each node...
-[cluster-check.sh] Checking whether instance bootstrap has completed on each node...
-[cluster-check.sh] Checking Flannel's etcd key from each node...
-[cluster-check.sh] Checking whether expected system services are running on each node...
-[cluster-check.sh] Checking status of /healthz endpoint at each k8s master node...
-[cluster-check.sh] Checking status of /healthz endpoint at the LB...
-[cluster-check.sh] Running 'kubectl get nodes' a number or times through the master LB...
-
-The Kubernetes cluster is up and appears to be healthy.
-Kubernetes master is running at https://129.146.22.175:443
-KubeDNS is running at https://129.146.22.175:443/api/v1/proxy/namespaces/kube-system/services/kube-dns
-kubernetes-dashboard is running at https://129.146.22.175:443/ui
-```
-
 ## SSH into OCI Instances

 If you've chosen to launch your control plane instance in _public_ subnets (i.e. `control_plane_subnet_access=public`), you can open
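
For the SSH tunnel mentioned in the hunk header above, a minimal sketch could look like this; the `generated/instances_id_rsa` key path and the `opc` login user are assumptions for illustration, not something this diff confirms:

```bash
# forward a local port to the Kubernetes API through a publicly reachable node,
# then point kubectl (or a browser for /ui) at https://localhost:2443
ssh -i generated/instances_id_rsa -N -L 2443:<private-master-ip>:443 opc@<public-node-ip>
```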

docs/examples.md

Lines changed: 67 additions & 16 deletions
@@ -55,7 +55,7 @@ We can use `terraform taint` to worker instances in a particular AD as "tainted"
 regenerating a misbehaving worker.

 ```bash
-# taint all workers in AD1
+# taint all workers in a particular AD
 terraform taint -module=instances-k8sworker-ad1 oci_core_instance.TFInstanceK8sWorker
 # optionally taint workers in AD2 and AD3 or do so in a subsequent apply
 # terraform taint -module=instances-k8sworker-ad2 oci_core_instance.TFInstanceK8sWorker
@@ -75,7 +75,7 @@ We can also use `terraform taint` to master instances in a particular AD as "tai
 changes or regenerating a misbehaving master.

 ```bash
-# taint all masters in AD1
+# taint all masters in a particular AD
 terraform taint -module=instances-k8smaster-ad1 oci_core_instance.TFInstanceK8sMaster
 # optionally taint masters in AD2 and AD3 or do so in a subsequent apply
 # terraform taint -module=instances-k8smaster-ad2 oci_core_instance.TFInstanceK8sMaster
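
If an instance is tainted by mistake, the mark can usually be cleared before the next apply. A sketch using the same module addressing as above, assuming a Terraform version whose `untaint` command accepts the `-module` flag the way `taint` does here:

```bash
# undo the taint so the next apply leaves the instance in place
terraform untaint -module=instances-k8smaster-ad1 oci_core_instance.TFInstanceK8sMaster
```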
@@ -88,36 +88,87 @@ $ terraform plan
 $ terraform apply
 ```

-## Upgrading cluster using the k8s_ver input variable
+## Upgrading Kubernetes Version

-One way to upgrade your cluster is by incrementally changing the value of the `k8s_ver` input variable on your master and then worker nodes.
+There are a few ways of moving to a new version of Kubernetes in your cluster.
+
+The easiest way to upgrade to a new Kubernetes version is to use the scripts to do a fresh cluster install using an updated `k8s_ver` input variable. The downside with this option is that the new cluster will not have your existing cluster state and deployments.
+
+The other options involve using the `k8s_ver` input variable to _replace_ master and worker instances in your _existing_ cluster. We can replace master and worker instances in the cluster since Kubernetes masters and workers are stateless. This option can either be done all at once or incrementally.
+
+#### Option 1: Do a clean install (easiest overall approach)
+
+Set the `k8s_ver` input variable and follow the original instructions in the [README](../README.md) to install a new cluster. The `label_prefix` variable is useful for installing multiple clusters in a compartment.
+
+#### Option 2: Upgrade cluster all at once (easiest upgrade)
+
+The example `terraform apply` command below will destroy then re-create all master and worker instances using as much parallelism as possible. It's the easiest and quickest upgrade scenario, but will result in some downtime for the workers and masters while they are being re-created. The single example `terraform apply` below will:
+
+1. destroy all worker nodes
+2. destroy all master nodes
+3. destroy all master load-balancer backends that point to old master instances
+4. re-create master instances using Kubernetes 1.7.5
+5. re-create worker nodes using Kubernetes 1.7.5
+6. re-create master load-balancer backends to point to new master node instances

 ```bash
-# preview upgrade of all workers in AD1 to K8s 1.7.5
+# preview upgrade/replace
+$ terraform plan -var k8s_ver=1.7.5
+
+# perform upgrade/replace
+$ terraform apply -var k8s_ver=1.7.5
+```
+
+#### Option 3: Upgrade cluster instances incrementally (most complicated, most control over roll-out)
+
+##### First, upgrade master nodes by AD
+
+If you would rather update the cluster incrementally, we start by upgrading the master nodes in each AD. In this scenario, each `terraform apply` will:
+
+1. destroy all master instances in a particular AD
+2. destroy all master load-balancer backends that point to deleted master instances
+3. re-create master instances in the AD using Kubernetes 1.7.5
+4. re-create master load-balancer backends to point to new master node instances
+
+For example, here is the command to upgrade all the master instances in AD1:
+
+```bash
+# preview upgrade of all masters and their LB backends in AD1
+$ terraform plan -var k8s_ver=1.7.5 -target=module.instances-k8smaster-ad1 -target=module.k8smaster-public-lb
+
+# perform upgrade/replace masters
+$ terraform apply -var k8s_ver=1.7.5 -target=module.instances-k8smaster-ad1 -target=module.k8smaster-public-lb
+```
+
+Be sure to repeat this command for each AD you have masters on.
+
+##### Next, upgrade worker nodes by AD
+
+After upgrading all the master nodes, we upgrade the worker nodes in each AD. Each `terraform apply` will:
+
+1. drain all worker nodes in a particular AD to your nodes in AD2 and AD3
+2. destroy all worker nodes in a particular AD
+3. re-create worker nodes in a particular AD using Kubernetes 1.7.5
+
+For example, here is the command to upgrade the worker instances in AD1:
+
+```bash
+# preview upgrade of all workers in a particular AD to K8s 1.7.5
 $ terraform plan -var k8s_ver=1.7.5 -target=module.instances-k8sworker-ad1

 # perform upgrade/replace workers
 $ terraform apply -var k8s_ver=1.7.5 -target=module.instances-k8sworker-ad1
 ```

-The above command will:
-
-1. drain all worker nodes in AD1 to your nodes in AD2 and AD3
-2. destroy all worker nodes in AD1
-3. re-create worker nodes in AD1 using Kubernetes 1.7.5
-
-If you have more than one worker in an AD, you can upgrade worker nodes individually using the subscript operator
+Like before, repeat `terraform apply` on each AD you have workers on. Note that if you have more than one worker in an AD, you can upgrade worker nodes individually using the subscript operator, e.g.

 ```bash
-# preview upgrade of a single worker in AD1 to K8s 1.7.5
+# preview upgrade of a single worker in a particular AD to K8s 1.7.5
 $ terraform plan -var k8s_ver=1.7.5 -target=module.instances-k8sworker-ad1.oci_core_instance.TFInstanceK8sWorker[1]

 # perform upgrade/replace of worker
 $ terraform apply -var k8s_ver=1.7.5 -target=module.instances-k8sworker-ad1.oci_core_instance.TFInstanceK8sWorker[1]
 ```
-Be sure to smoke test this approach on a stand-by cluster to weed out pitfalls and ensure our scripts are compatible
-with the version of Kubernetes you are trying to upgrade to. We have not tested other versions of Kubernetes other
-than the current default version.

 ## Replacing etcd cluster members using terraform taint
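
To make Option 1 from the hunk above concrete, a fresh side-by-side install might use a `terraform.tfvars` along these lines; the values are illustrative only (`label_prefix` and `k8s_ver` are the variables named in the diff), and the apply should run from a separate working directory or state so the existing cluster is left untouched:

```bash
# terraform.tfvars for a second, clean cluster in the same compartment
label_prefix = "k8s175-"
k8s_ver      = "1.7.5"
```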

docs/images/arch.jpg

Binary file changed (-206 bytes)
