This repository was archived by the owner on Jun 29, 2022. It is now read-only.

Commit fd056ae

docs: Add "How to upgrade etcd"
Signed-off-by: Suraj Deshmukh <suraj@kinvolk.io>
1 parent 524a81d commit fd056ae

1 file changed: +112 -0 lines changed

docs/how-to-guides/upgrade-etcd.md

Lines changed: 112 additions & 0 deletions
@@ -0,0 +1,112 @@
# Upgrading etcd

## Contents

- [Introduction](#introduction)
- [Steps](#steps)
  - [Step 1: Find out the IP and SSH](#step-1-find-out-the-ip-and-ssh)
  - [Step 2: Create necessary directories with correct permissions](#step-2-create-necessary-directories-with-correct-permissions)
  - [Step 3: Upgrade etcd](#step-3-upgrade-etcd)
  - [Step 4: Verify upgrade](#step-4-verify-upgrade)
  - [Step 5: Verify using `etcdctl`](#step-5-verify-using-etcdctl)

## Introduction

[etcd](https://etcd.io/) stores the state of a Kubernetes cluster, which makes it one of the cluster's most critical components.

This document provides a step-by-step guide to upgrading etcd in Lokomotive.

## Steps

Repeat the following steps on every controller node, one node at a time.

### Step 1: Find out the IP and SSH

Find the IP address of the controller node in the cloud provider's dashboard, then SSH into the node:

```bash
ssh core@<IP Address>
```

### Step 2: Create necessary directories with correct permissions

etcd `>= v3.4.10` requires the data directory to have permissions `0700`. Change the permissions accordingly, then verify that the directory listing shows `drwx------`.

> **NOTE**: This step is needed only for Lokomotive clusters deployed with `lokoctl` version `< 0.4.0`.

```bash
sudo chmod 0700 /var/lib/etcd/
sudo ls -ld /var/lib/etcd/
```

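The permission check can also be scripted so that the mode is changed only when needed. The following is a minimal sketch, not part of Lokomotive: `ensure_mode_700` is a hypothetical helper name, `stat -c` assumes GNU coreutils, and the demo runs against a scratch directory rather than `/var/lib/etcd`.

```bash
# Hypothetical helper (not part of Lokomotive): make sure a directory has
# mode 700 and print the resulting mode. Assumes GNU stat (`stat -c`).
ensure_mode_700() {
  dir="$1"
  mode="$(stat -c '%a' "$dir")" || return 1
  if [ "$mode" != "700" ]; then
    chmod 0700 "$dir" || return 1
  fi
  stat -c '%a %A' "$dir"
}

# Demo on a scratch directory so that nothing on the node is modified.
demo_dir="$(mktemp -d)"
chmod 0755 "$demo_dir"
ensure_mode_700 "$demo_dir"
```

On a controller node, the same function could be called from a root shell with `/var/lib/etcd` as its argument.
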
If the node reboots, `systemd-tmpfiles` must not alter the permissions of the data directory. To make the change above persistent, run the following command:

```bash
echo "d /var/lib/etcd 0700 etcd etcd - -" | sudo tee /etc/tmpfiles.d/etcd-wrapper.conf
```

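For reference, the fields of a `tmpfiles.d` line are, in order: type, path, mode, user, group, age, and argument. The sketch below only splits the entry written by the command above into those fields and prints them; it does not touch the system. The variable names are illustrative.

```bash
# Split the tmpfiles.d entry into its seven whitespace-separated fields.
line='d /var/lib/etcd 0700 etcd etcd - -'
set -- $line
entry_type="$1" entry_path="$2" entry_mode="$3"
entry_user="$4" entry_group="$5" entry_age="$6" entry_arg="$7"

# "d" creates the directory if it is missing and applies the given mode and
# ownership; "-" in the age field means no age-based cleanup is configured.
printf 'type=%s path=%s mode=%s owner=%s:%s age=%s argument=%s\n' \
  "$entry_type" "$entry_path" "$entry_mode" \
  "$entry_user" "$entry_group" "$entry_age" "$entry_arg"
```
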
### Step 3: Upgrade etcd

Run the following commands:

> **NOTE**: Before running the other commands, set the `etcd_version` variable to the latest etcd version.

```bash
export etcd_version=<latest etcd version e.g. v3.4.10>

# Rewrite only the tag value so that any closing quote on the line is kept.
sudo sed -i "s,\(ETCD_IMAGE_TAG=\)[^\"]*,\1${etcd_version}," \
  /etc/systemd/system/etcd-member.service.d/40-etcd-cluster.conf
sudo systemctl daemon-reload
sudo systemctl restart etcd-member
```

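Before editing the real drop-in, the substitution can be tried on a scratch file. This is a sketch under assumptions: the sample contents below (an `Environment="ETCD_IMAGE_TAG=..."` line) are a guess at the drop-in's format, and the real file may differ. The pattern used here rewrites only the tag value and stops at a double quote, so a closing quote, if present, is kept. `sed -i` assumes GNU sed.

```bash
# Scratch file standing in for 40-etcd-cluster.conf (contents are assumed).
conf="$(mktemp)"
cat > "$conf" <<'EOF'
[Service]
Environment="ETCD_IMAGE_TAG=v3.4.9"
EOF

etcd_version=v3.4.10

# Replace everything after "ETCD_IMAGE_TAG=" up to the next double quote
# or the end of the line, whichever comes first.
sed -i "s,\(ETCD_IMAGE_TAG=\)[^\"]*,\1${etcd_version}," "$conf"

# Show the resulting line.
grep ETCD_IMAGE_TAG "$conf"
```

Running `grep ETCD_IMAGE_TAG` on the real drop-in after the edit is a quick way to confirm that the line still looks well-formed before restarting the service.
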
### Step 4: Verify upgrade

Verify that the etcd service is in the `active (running)` state:

```bash
sudo systemctl status --no-pager etcd-member
```

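After a restart, the unit may take a moment to become active. A small retry helper can poll for that instead of re-running the status command by hand. `wait_until` is a hypothetical function written for this sketch; `systemctl is-active --quiet` exits with status 0 only when the unit is active.

```bash
# wait_until ATTEMPTS DELAY COMMAND...
# Runs COMMAND until it succeeds, at most ATTEMPTS times, sleeping DELAY
# seconds between tries. Returns 0 on success, 1 if every attempt failed.
wait_until() {
  attempts="$1"
  delay="$2"
  shift 2
  try=1
  while :; do
    "$@" && return 0
    [ "$try" -ge "$attempts" ] && return 1
    try=$((try + 1))
    sleep "$delay"
  done
}

# On a controller node (not run here):
#   wait_until 30 2 sudo systemctl is-active --quiet etcd-member
```

The attempt count and delay in the commented example are arbitrary; pick values that suit how long etcd takes to start on your nodes.
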
Run the following command to see the logs of the process since its last restart:

```bash
sudo journalctl _SYSTEMD_INVOCATION_ID=$(sudo systemctl \
  show -p InvocationID --value etcd-member.service)
```

> **NOTE**: Do not proceed with upgrading the rest of the cluster if you encounter any errors.

The following log line indicates that the etcd daemon has come up without errors:

```log
etcdserver: starting server... [version: 3.4.10, cluster version: to_be_decided]
```

The following log line indicates that etcd has rejoined the cluster without issues:

```log
embed: serving client requests on 10.88.81.1:2379
```

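Looking for the two log lines can be scripted as well. The sketch below assumes the message wording shown in this guide, which may vary between etcd versions; `check_etcd_log` is a hypothetical helper that reads log text on standard input, and the demo feeds it the two sample lines from this step.

```bash
# Hypothetical helper: read log text on stdin and check for both markers.
check_etcd_log() {
  log="$(cat)"
  for marker in 'etcdserver: starting server' 'embed: serving client requests on'; do
    if ! printf '%s\n' "$log" | grep -qF "$marker"; then
      echo "missing log line: $marker"
      return 1
    fi
  done
  echo "both startup log lines found"
}

# Demo with the two sample lines from this guide.
sample_log='etcdserver: starting server... [version: 3.4.10, cluster version: to_be_decided]
embed: serving client requests on 10.88.81.1:2379'
printf '%s\n' "$sample_log" | check_etcd_log
```

On a controller node, the output of the `journalctl` command from this step could be piped into `check_etcd_log` instead of the sample text.
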
### Step 5: Verify using `etcdctl`

Use the `etcdctl` client to verify the state of the etcd cluster.

```bash
# Find the endpoint of this node's etcd:
export endpoint=$(grep ETCD_ADVERTISE_CLIENT_URLS \
  /etc/systemd/system/etcd-member.service.d/40-etcd-cluster.conf | cut -d"=" -f3 | tr -d '"')
export flags="--cacert=/etc/ssl/etcd/etcd-client-ca.crt \
  --cert=/etc/ssl/etcd/etcd-client.crt \
  --key=/etc/ssl/etcd/etcd-client.key"

# Collect the client URLs of all members as a comma-separated list:
endpoints=$(sudo ETCDCTL_API=3 etcdctl member list $flags --endpoints=${endpoint} \
  --write-out=json | jq -r '[.members[].clientURLs[]] | join(",")')

# Verify:
sudo ETCDCTL_API=3 etcdctl member list $flags --endpoints=${endpoint}
sudo ETCDCTL_API=3 etcdctl endpoint health $flags --endpoints=${endpoints}
```

The last command should report that all nodes are healthy. If it does not, use the commands from [Step 4](#step-4-verify-upgrade) to find out what is wrong. If the nodes are healthy, it is safe to move on to the next controller node.
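
To see how the endpoint lookup at the start of the Step 5 commands works, the same `grep | cut | tr` pipeline can be run against a sample line. The sample below is an assumption about how the drop-in stores the value (the `https` scheme in particular is a guess); the address and port are taken from the log line in Step 4.

```bash
# Scratch file with an assumed Environment= line.
conf="$(mktemp)"
cat > "$conf" <<'EOF'
[Service]
Environment="ETCD_ADVERTISE_CLIENT_URLS=https://10.88.81.1:2379"
EOF

# Splitting the line on "=" gives three fields:
#   1) Environment   2) "ETCD_ADVERTISE_CLIENT_URLS   3) https://10.88.81.1:2379"
# cut keeps the third field and tr strips the trailing double quote.
endpoint="$(grep ETCD_ADVERTISE_CLIENT_URLS "$conf" | cut -d"=" -f3 | tr -d '"')"
echo "$endpoint"
```

Because the line is split on `=`, this approach relies on the URL itself not containing an `=` character.
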
