This repository was archived by the owner on Jun 29, 2022. It is now read-only.

Commit f5d24e0

docs: Add "How to upgrade etcd"
Signed-off-by: Suraj Deshmukh <suraj@kinvolk.io>
1 parent 9172b20 commit f5d24e0

1 file changed

+117
-0
lines changed

docs/how-to-guides/upgrade-etcd.md

Lines changed: 117 additions & 0 deletions
@@ -0,0 +1,117 @@
# Upgrading etcd

## Contents

- [Introduction](#introduction)
- [Steps](#steps)
  - [Step 1: Find out the IP and SSH](#step-1-find-out-the-ip-and-ssh)
  - [Step 2: Create necessary directories with correct permissions](#step-2-create-necessary-directories-with-correct-permissions)
  - [Step 3: Upgrade etcd](#step-3-upgrade-etcd)
  - [Step 4: Verify upgrade](#step-4-verify-upgrade)
  - [Step 5: Verify using `etcdctl`](#step-5-verify-using-etcdctl)

## Introduction

[etcd](https://etcd.io/) is one of the most crucial components of a Kubernetes cluster because it stores the cluster state.

This document provides a step-by-step guide to upgrading etcd in Lokomotive.

## Steps

Repeat the following steps on every controller node, one node at a time.

### Step 1: Find out the IP and SSH

Find the IP address of the controller node in the cloud provider dashboard and SSH into it.

```bash
ssh core@<IP Address>
```

### Step 2: Create necessary directories with correct permissions

The latest etcd (`v3.4.10`) requires the data directory permissions to be `0700`. Change the permissions accordingly and verify that they are now `rwx------`.

```bash
sudo chmod 0700 /var/lib/etcd/
sudo ls -ld /var/lib/etcd/
```
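
The `ls -ld` output can also be checked programmatically. The following is a minimal sketch, assuming GNU `stat` is available on the node; the function name `check_etcd_dir_mode` is hypothetical and not part of Lokomotive. It can be called as `check_etcd_dir_mode /var/lib/etcd`.

```bash
# Hypothetical helper: succeed only if the directory mode is exactly 700.
# Assumes GNU stat, where `-c '%a'` prints the permission bits in octal.
check_etcd_dir_mode() {
  local dir="$1"
  local mode
  mode=$(stat -c '%a' "$dir") || return 2
  if [ "$mode" = "700" ]; then
    echo "OK: $dir has mode $mode"
  else
    echo "FAIL: $dir has mode $mode, expected 700" >&2
    return 1
  fi
}
```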
39+
40+
If the node reboots, we need the right settings in place so that `systemd-tmpfile` service does not alter the permissions of the data directory. To make the changes made above persistent run the following command:
41+
42+
```bash
43+
echo "d /var/lib/etcd 0700 etcd etcd - -" | sudo tee /etc/tmpfiles.d/etcd-wrapper.conf
44+
```
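
To confirm that the rule was written as intended, the file can be inspected. The following is a minimal sketch, assuming `awk` is available on the node; `tmpfiles_rule_ok` is a hypothetical helper name. It only checks the text of the file and does not apply the rule.

```bash
# Hypothetical helper: succeed if CONF contains a "d" rule for DIR with mode 0700.
tmpfiles_rule_ok() {
  local conf="$1"
  local dir="$2"
  awk -v dir="$dir" '
    $1 == "d" && $2 == dir && $3 == "0700" { found = 1 }
    END { exit !found }
  ' "$conf"
}

# Example (on a controller node):
#   tmpfiles_rule_ok /etc/tmpfiles.d/etcd-wrapper.conf /var/lib/etcd && echo "rule present"
```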

### Step 3: Upgrade etcd

Run the following commands:

> **NOTE**: Before proceeding to other commands, set the `etcd_version` variable to the latest etcd version.

```bash
export etcd_version=<latest etcd version e.g. v3.4.10>

sudo sed -i "s,ETCD_IMAGE_TAG=.*,ETCD_IMAGE_TAG=${etcd_version}," \
  /etc/systemd/system/etcd-member.service.d/40-etcd-cluster.conf
sudo systemctl daemon-reload
sudo systemctl restart etcd-member
```
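
Because the pattern `ETCD_IMAGE_TAG=.*` matches to the end of the line, it may be worth previewing the substitution before editing the file in place. The following is a minimal sketch, assuming `diff` is available on the node; `preview_tag_change` is a hypothetical helper, and it only prints the would-be changes without modifying the file.

```bash
# Hypothetical helper: show what the sed expression from this step would change.
preview_tag_change() {
  local file="$1"
  local version="$2"
  # Same expression as the sed command above, but without -i.
  # diff exits with status 1 when the inputs differ, so its status is ignored.
  sed "s,ETCD_IMAGE_TAG=.*,ETCD_IMAGE_TAG=${version}," "$file" | diff "$file" - || true
}

# Example (on a controller node):
#   preview_tag_change /etc/systemd/system/etcd-member.service.d/40-etcd-cluster.conf "${etcd_version}"
```

Lines prefixed with `<` are the current content and lines prefixed with `>` are what the file would contain after the change.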

### Step 4: Verify upgrade

Verify that the etcd service is in the `active (running)` state:

```bash
sudo systemctl status --no-pager etcd-member
```

Run the following command to see the logs of the process since the last restart:

```bash
sudo journalctl _SYSTEMD_INVOCATION_ID=$(sudo systemctl \
  show -p InvocationID --value etcd-member.service)
```

The following log line indicates that the etcd daemon has started without errors:

```log
etcdserver: starting server... [version: 3.4.10, cluster version: to_be_decided]
```

The following log line indicates that etcd has rejoined the cluster without issues:

```log
embed: serving client requests on 10.88.81.1:2379
```
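
The two log checks in this step can be scripted by searching the journal output for both messages. The following is a minimal sketch; `etcd_started_ok` is a hypothetical helper that reads log text on standard input and succeeds only if both messages quoted in this step are present.

```bash
# Hypothetical helper: succeed only if both startup messages appear on stdin.
etcd_started_ok() {
  local logs
  logs=$(cat)
  printf '%s\n' "$logs" | grep -q 'etcdserver: starting server' &&
    printf '%s\n' "$logs" | grep -q 'embed: serving client requests on'
}

# Example (on a controller node):
#   sudo journalctl _SYSTEMD_INVOCATION_ID=$(sudo systemctl \
#     show -p InvocationID --value etcd-member.service) | etcd_started_ok && echo "etcd is up"
```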

### Step 5: Verify using `etcdctl`

Use the `etcdctl` client to verify the state of the etcd cluster.

> **NOTE**: Before proceeding to other commands, set the `no_of_controller_nodes` variable to the number of controller nodes in the cluster.

```bash
export no_of_controller_nodes=<no of controller nodes>

# Find the endpoint of etcd0.
# NOTE: the loop below only has an effect if this endpoint contains "etcd0".
export endpoint=$(grep ETCD_ADVERTISE_CLIENT_URLS /etc/systemd/system/etcd-member.service.d/40-etcd-cluster.conf | cut -d"=" -f3 | tr -d '"')
export endpoints="${endpoint}"

# Create list of other endpoints:
for ((n = 1; n < no_of_controller_nodes; n++)); do
  np=$(sed "s|etcd0|etcd${n}|g" <<< "$endpoint")
  endpoints="${endpoints},${np}"
done

export flags="--cacert=/etc/ssl/etcd/etcd-client-ca.crt \
  --cert=/etc/ssl/etcd/etcd-client.crt \
  --key=/etc/ssl/etcd/etcd-client.key \
  --endpoints=${endpoints}"

# Verify:
sudo ETCDCTL_API=3 etcdctl member list $flags
sudo ETCDCTL_API=3 etcdctl endpoint health $flags
```

The last command should report each node as healthy.
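
The endpoint list in this step is derived by replacing `etcd0` in the local endpoint, so the substitution has an effect only when the local endpoint contains `etcd0`. The following sketch shows one possible variant that can be run from any controller node, under the assumption that the endpoint hostnames differ only in the number following `etcd`. The function `build_endpoints` and the hostname in the example are hypothetical.

```bash
# Hypothetical helper: print a comma-separated endpoint list for COUNT nodes,
# given any one node's endpoint. Assumes hostnames look like ...etcd<N>...
build_endpoints() {
  local endpoint="$1"
  local count="$2"
  local n=0
  local list=""
  local next
  while [ "$n" -lt "$count" ]; do
    # Replace the first "etcd<number>" with "etcd<n>".
    next=$(printf '%s\n' "$endpoint" | sed -E "s|etcd[0-9]+|etcd${n}|")
    list="${list:+${list},}${next}"
    n=$((n + 1))
  done
  printf '%s\n' "$list"
}

# Example with a made-up hostname:
#   build_endpoints https://demo-etcd1.example.com:2379 3
```

On a controller node, the result could be assigned with `endpoints=$(build_endpoints "${endpoint}" "${no_of_controller_nodes}")` before building `flags`.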
