Add k0scontrolplane healthcheck-remediation #824
        return fmt.Errorf("failed to filter machines for control plane: %w", err)
    }

    healthyMachines := machines.Filter(collections.Not(isUnhealthy))
Could we use collections.Not(collections.HasUnhealthyCondition) here? If not, could we at least avoid the double negative? E.g. something like machines.Filter(isHealthy)?
👍 fixed using machines.Filter(isHealthy)
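For readers following the thread, here is a minimal, self-contained sketch of the two equivalent filters discussed above. The `Machine`, `Predicate`, `Filter`, and `Not` definitions are hypothetical stand-ins written for this illustration; they only mimic the shape of the Cluster API `collections` helpers and are not the code from this PR.

```go
package main

import "fmt"

// Machine is a hypothetical stand-in for a Cluster API Machine.
type Machine struct {
	Name      string
	Unhealthy bool // stand-in for "marked unhealthy by MachineHealthCheck"
}

// Predicate reports whether a machine matches some criterion.
type Predicate func(Machine) bool

// Filter returns the machines for which pred returns true, in order.
func Filter(machines []Machine, pred Predicate) []Machine {
	var out []Machine
	for _, m := range machines {
		if pred(m) {
			out = append(out, m)
		}
	}
	return out
}

// Not inverts a predicate.
func Not(pred Predicate) Predicate {
	return func(m Machine) bool { return !pred(m) }
}

func isUnhealthy(m Machine) bool { return m.Unhealthy }

// isHealthy states the intent directly instead of negating isUnhealthy.
func isHealthy(m Machine) bool { return !m.Unhealthy }

func main() {
	machines := []Machine{{"cp-0", false}, {"cp-1", true}, {"cp-2", false}}

	viaNot := Filter(machines, Not(isUnhealthy)) // double negative
	direct := Filter(machines, isHealthy)        // same result, easier to read

	fmt.Println(len(viaNot), len(direct))
}
```

Both calls select the same machines; the second form just reads better at the call site.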
makhov
left a comment
Sorry, just noticed a couple of things with annotations.
    // Remove the annotation tracking that a remediation is in progress.
    // A remediation is completed when the replacement machine has been created above.
    delete(kcp.Annotations, cpv1beta1.RemediationInProgressAnnotation)
Shouldn't it be done before creating a machine in kube-api?
Doing it this way, we are sure the machine has been created in kube-api, which means the remediation is done. Otherwise, an error creating the machine in kube-api could let a second remediation start even though the first one was not completed. I think it is safer to make sure the machine is created/remediated first. WDYT?
    // Mark controlplane to track that remediation is in progress and do not proceed until machine is gone.
    // This annotation is removed when new controlplane creates a new machine.
    annotations.AddAnnotations(kcp, map[string]string{
This annotation should probably also be removed from the K0sControlPlane once it has recreated all the machines, somewhere in the updateStatus func or so.
I think removing it before creating the machine is safe, so that we can continue with the next remediations. We could face cases where more than one machine needs to be remediated. The purpose of this annotation is to prevent multiple remediations from running at the same time.
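As an illustration of the "one remediation at a time" role of the annotation, here is a small sketch. The `ControlPlane` type, the `startRemediation` helper, and the annotation key are hypothetical and exist only for this example; the real controller works on the K0sControlPlane object and its own annotation constant.

```go
package main

import "fmt"

// remediationInProgress is a hypothetical key standing in for the
// remediation-in-progress annotation used by the controller.
const remediationInProgress = "example.test/remediation-in-progress"

// ControlPlane is a hypothetical, stripped-down control plane object.
type ControlPlane struct {
	Annotations map[string]string
}

// startRemediation marks the control plane as remediating the given machine.
// It returns false, without changing anything, if a remediation is already
// tracked, so at most one remediation runs at a time.
func startRemediation(cp *ControlPlane, machine string) bool {
	if _, busy := cp.Annotations[remediationInProgress]; busy {
		return false
	}
	if cp.Annotations == nil {
		cp.Annotations = map[string]string{}
	}
	cp.Annotations[remediationInProgress] = machine
	return true
}

func main() {
	cp := &ControlPlane{}

	fmt.Println(startRemediation(cp, "cp-1")) // first unhealthy machine is accepted
	fmt.Println(startRemediation(cp, "cp-2")) // second one has to wait

	// Once the annotation is removed, the next machine can be remediated.
	delete(cp.Annotations, remediationInProgress)
	fmt.Println(startRemediation(cp, "cp-2"))
}
```

When several machines are unhealthy, they are handled one after another: each later machine is picked up on a subsequent reconcile once the marker has been cleared.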
…/mkdocs-3ba6cc2ae5 Bump mkdocs-material from 9.5.47 to 9.5.48 in /docs in the mkdocs group
Signed-off-by: Adrian Pedriza <adripedriza@gmail.com>
@AdrianPedriza looks like something is off with changes in this PR currently
This PR adds reconciliation by k0scontrolplane of machines that are considered unhealthy by the MachineHealthCheck controller. Check the MachineHealthCheck contract for more details.
It basically replicates the behavior of KubeadmControlPlane when handling machines considered unhealthy, except that it does not implement a remediation strategy for more granular control over the process. Currently, machine creation does not take into account the previous state of a machine when it is a replacement, so adding this control would require changes to the machine synchronization process. It can always be added later, but I did not want to compromise that logic in this PR given its sensitivity.