Initiate update on machine template ref change#813
Conversation
Signed-off-by: Alexey Makhov <amakhov@mirantis.com>
apedriza
left a comment
There was a problem hiding this comment.
code looks good but I think we have some issue in our machines sync algo. Following the new e2e test added in the PR, if a second update is applied, cluster update gets stuck in this state:

I applied the following update:
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: DockerMachineTemplate
metadata:
name: docker-test-cp-template-new2
namespace: default
spec:
template:
spec:
customImage: kindest/node:v1.31.1
---
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: K0sControlPlane
metadata:
name: docker-test
spec:
replicas: 3
version: v1.31.2+k0s.0
updateStrategy: Recreate
k0sConfigSpec:
k0s:
apiVersion: k0s.k0sproject.io/v1beta1
kind: ClusterConfig
metadata:
name: k0s
spec:
api:
extraArgs:
anonymous-auth: "true"
telemetry:
enabled: false
machineTemplate:
infrastructureRef:
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: DockerMachineTemplate
name: docker-test-cp-template-new2
namespace: defaultabc978e to
d35adea
Compare
|
@apedriza yeah, thanks! I fixed the issues and added second update to the test case. |
| if oldMachines > 0 { | ||
| if clusterIsUpdating { | ||
| log.Log.Info("Cluster is updating", "currentVersion", currentVersion, "newVersion", kcp.Spec.Version, "strategy", kcp.Spec.UpdateStrategy) | ||
| clusterIsUpdating = true |
There was a problem hiding this comment.
| clusterIsUpdating = true | |
at that point it is always true
| machine := machines.Oldest() | ||
| machine := machines.Filter(func(m *clusterv1.Machine) bool { | ||
| return machineNamesToDelete[m.Name] | ||
| }).Newest() |
There was a problem hiding this comment.
should we return newest or oldest ? Can you explain reason one instead of the other? Maybe a comment in this lines explaining why delete oldest/newest helps. (Currently, log message says found oldest and we are retrieving newest)
There was a problem hiding this comment.
It really doesn't matter. I change it back to oldest to keep it consistent
| } | ||
|
|
||
| logger := log.FromContext(ctx).WithValues("controlNode", name) | ||
| //err := clientset.RESTClient(). |
There was a problem hiding this comment.
should we remove this lines?
| i := 0 | ||
| for len(desiredMachineNames) < int(kcp.Spec.Replicas) { | ||
| name := machineName(kcp.Name, i) | ||
| log.Log.Info("desire machine", "name", len(desiredMachineNames)) | ||
| _, ok := machineNamesToDelete[name] | ||
| if !ok { | ||
| _, exists := machines[name] | ||
| desiredMachineNames[name] = exists | ||
| } | ||
| i++ | ||
| } | ||
| log.Log.Info("Desired machines", "count", len(desiredMachineNames)) |
There was a problem hiding this comment.
WDYT about use random suffix at the end of the name instead the current integer? With the current approach we are creating new machines using same name as old ones (check e2e test added) had which maybe could led us in some weird scenario. i dont know if it is possible use random suffix but we definitely would not have these cases
There was a problem hiding this comment.
Yeah, I've been thinking about it and even tried it. Having predictable machine names keeps names consistent across reconciliation loops and it is convenient for tests. So I didn't proceed with those changes, at least for now.
Signed-off-by: Alexey Makhov <amakhov@mirantis.com>
Fixes #808