Add k0scontrolplane heathcheck-remediation #824

Merged
apedriza merged 2 commits into k0sproject:main from apedriza:support-machinehealthchecks
Dec 16, 2024

Contributor

@apedriza apedriza commented Nov 22, 2024

This PR adds reconciliation by K0sControlPlane of machines that the MachineHealthCheck controller considers unhealthy. See the MachineHealthCheck contract for more details.

It essentially replicates how KubeadmControlPlane handles machines considered unhealthy, except that it does not implement a remediation strategy, which would give more granular control over the process. Currently, machine creation does not take into account the previous state of a machine when the new machine is a replacement, so adding this control would require changes to the machine synchronization process. It can always be added later, but I did not want to compromise that logic in this PR given its sensitivity.

@apedriza apedriza requested a review from a team as a code owner November 22, 2024 12:10
@apedriza apedriza marked this pull request as draft November 22, 2024 12:10
@apedriza apedriza force-pushed the support-machinehealthchecks branch 14 times, most recently from a90e27f to 4437cae Compare November 27, 2024 13:43
@apedriza apedriza marked this pull request as ready for review November 27, 2024 14:37
return fmt.Errorf("failed to filter machines for control plane: %w", err)
}

healthyMachines := machines.Filter(collections.Not(isUnhealthy))
Contributor

Could we use collections.Not(collections.HasUnhealthyCondition) here? If not, could we at least avoid the double negative, e.g. with something like machines.Filter(isHealthy)?

Contributor Author

👍 fixed using machines.Filter(isHealthy)
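
For readers following the thread, here is a minimal sketch of the two spellings discussed above. It is not the code from this PR: `Machine`, `Predicate`, `Not`, and `Filter` are hypothetical stand-ins for the Cluster API collections helpers, and the `Unhealthy` field is an assumption made only for illustration.

```go
package main

import "fmt"

// Machine is a hypothetical stand-in for a Cluster API Machine; only the
// field needed for this illustration is modeled.
type Machine struct {
	Name      string
	Unhealthy bool // assumed to mirror what a health check reported
}

// Predicate reports whether a machine matches some condition.
type Predicate func(Machine) bool

// Not inverts a predicate.
func Not(p Predicate) Predicate {
	return func(m Machine) bool { return !p(m) }
}

// Filter returns the machines for which p is true, preserving order.
func Filter(machines []Machine, p Predicate) []Machine {
	var out []Machine
	for _, m := range machines {
		if p(m) {
			out = append(out, m)
		}
	}
	return out
}

func isUnhealthy(m Machine) bool { return m.Unhealthy }

// isHealthy states the intent positively, so call sites need no negation.
func isHealthy(m Machine) bool { return !m.Unhealthy }

func main() {
	machines := []Machine{{"cp-0", false}, {"cp-1", true}, {"cp-2", false}}

	viaNot := Filter(machines, Not(isUnhealthy)) // double negative
	direct := Filter(machines, isHealthy)        // same result, easier to read

	fmt.Println(len(viaNot), len(direct))
}
```

Both calls select the same machines; the positive predicate only changes how the call site reads.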

@apedriza apedriza force-pushed the support-machinehealthchecks branch from 4437cae to 096ed96 Compare November 28, 2024 14:45
@apedriza apedriza requested a review from makhov November 28, 2024 18:05
Contributor

@makhov makhov left a comment

Sorry, just noticed a couple of things with annotations.


// Remove the annotation tracking that a remediation is in progress.
// A remediation is completed when the replacement machine has been created above.
delete(kcp.Annotations, cpv1beta1.RemediationInProgressAnnotation)
Contributor

Shouldn't it be done before creating a machine in kube-api?

Contributor Author

This way we are sure the machine has been created in kube-api, which means the remediation is done. Otherwise, an error while creating the machine in kube-api could let a second remediation start even though the first one was not completed. I think it is safer to make sure the machine is created/remediated first. WDYT?
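
The ordering argued for in this reply can be sketched roughly as follows. This is an illustration under stated assumptions, not the controller code: `ControlPlane`, `createReplacement`, and `errCreateFailed` are hypothetical, and the annotation key is a placeholder rather than the real value of `cpv1beta1.RemediationInProgressAnnotation`.

```go
package main

import (
	"errors"
	"fmt"
)

// remediationInProgress is a placeholder key used only in this sketch.
const remediationInProgress = "example.com/remediation-in-progress"

// errCreateFailed simulates a failure while creating the machine in kube-api.
var errCreateFailed = errors.New("simulated create failure")

// ControlPlane is a hypothetical stand-in for the K0sControlPlane object.
type ControlPlane struct {
	Annotations map[string]string
}

// createReplacement calls create and clears the in-progress marker only
// after the replacement machine has been created successfully.
func createReplacement(cp *ControlPlane, create func() error) error {
	if err := create(); err != nil {
		// The marker is left in place, so the unfinished remediation is
		// still visible on the next reconcile.
		return fmt.Errorf("failed to create replacement machine: %w", err)
	}
	delete(cp.Annotations, remediationInProgress)
	return nil
}

func main() {
	cp := &ControlPlane{Annotations: map[string]string{remediationInProgress: "cp-1"}}

	err := createReplacement(cp, func() error { return errCreateFailed })
	_, pending := cp.Annotations[remediationInProgress]
	fmt.Println("create failed:", err != nil, "marker still set:", pending)

	err = createReplacement(cp, func() error { return nil })
	_, pending = cp.Annotations[remediationInProgress]
	fmt.Println("create failed:", err != nil, "marker still set:", pending)
}
```

If the marker were deleted before the create call, a failed create would leave the control plane looking idle while the replacement is still missing.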


// Mark controlplane to track that remediation is in progress and do not proceed until machine is gone.
// This annotation is removed when new controlplane creates a new machine.
annotations.AddAnnotations(kcp, map[string]string{
Contributor

This annotation should probably also be removed from the K0sControlPlane once it has recreated all the machines, somewhere in the updateStatus func or so.

Contributor Author

I think removing it before creating the machine is safe in order to continue with the next remediations. We could face cases where more than one machine needs to be remediated. The annotation exists to prevent multiple remediations from running at the same time.
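
As a rough sketch of the "one remediation at a time" idea described here: the annotation acts as a guard that is checked before a new remediation starts and cleared when the current one finishes. Everything below is hypothetical (`ControlPlane`, `tryStartRemediation`, `finishRemediation`, the placeholder key), and storing the machine name as the annotation value is an assumption for illustration only.

```go
package main

import "fmt"

// remediationInProgress is a placeholder key used only in this sketch.
const remediationInProgress = "example.com/remediation-in-progress"

// ControlPlane is a hypothetical stand-in for the K0sControlPlane object.
type ControlPlane struct {
	Annotations map[string]string
}

// tryStartRemediation records machine as being remediated and returns true.
// It returns false, changing nothing, if another remediation is already recorded.
func tryStartRemediation(cp *ControlPlane, machine string) bool {
	if _, busy := cp.Annotations[remediationInProgress]; busy {
		return false
	}
	if cp.Annotations == nil {
		cp.Annotations = map[string]string{}
	}
	cp.Annotations[remediationInProgress] = machine
	return true
}

// finishRemediation clears the marker so the next unhealthy machine can be handled.
func finishRemediation(cp *ControlPlane) {
	delete(cp.Annotations, remediationInProgress)
}

func main() {
	cp := &ControlPlane{}

	fmt.Println(tryStartRemediation(cp, "cp-1")) // first remediation starts
	fmt.Println(tryStartRemediation(cp, "cp-2")) // blocked while cp-1 is in progress
	finishRemediation(cp)
	fmt.Println(tryStartRemediation(cp, "cp-2")) // allowed once the marker is cleared
}
```

With several unhealthy machines, each one is handled in turn: the second only starts after the marker for the first has been cleared.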

@apedriza apedriza force-pushed the support-machinehealthchecks branch from 096ed96 to 9e25f53 Compare November 29, 2024 15:37
…/mkdocs-3ba6cc2ae5

Bump mkdocs-material from 9.5.47 to 9.5.48 in /docs in the mkdocs group
@apedriza apedriza force-pushed the support-machinehealthchecks branch 2 times, most recently from e082659 to e09c775 Compare December 12, 2024 13:24
Signed-off-by: Adrian Pedriza <adripedriza@gmail.com>
@apedriza apedriza force-pushed the support-machinehealthchecks branch 2 times, most recently from c9b6c59 to fd53dfe Compare December 16, 2024 10:42
Contributor

makhov commented Dec 16, 2024

@AdrianPedriza it looks like something is off with the changes in this PR currently

@apedriza apedriza force-pushed the support-machinehealthchecks branch from fd53dfe to c9b6c59 Compare December 16, 2024 13:17
@apedriza apedriza merged commit 0a411d1 into k0sproject:main Dec 16, 2024
74 of 78 checks passed