Description
Let's proceed to the migration (previous migration here #4250)
-
Check network or update it - see module "private_vnet" in https://github.com/jenkins-infra/azure-net/blob/main/vnets.tf (pairing needed)
-
Check VPN routes, should be ok: see https://github.com/jenkins-infra/docker-openvpn/blob/57a6c0d7efa2165e2f641a337a1973373851f631/config.yaml#L6
-
Create the cluster with terraform. Requires 2 steps (at least)
- Create Azure cloud resources (cluster, node pools, resource groups, Entra identities, disks, storage, etc.)
- Then the Kubernetes resources in a secondary PR, as it requires a provider pointing to a running cluster
-
Validate cluster access
- From an admin machine
- From infra.ci agents
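The access validation can be sketched as a small helper that runs the same checks from an admin machine or from an infra.ci agent. This is only a sketch under assumptions: the function name and the kubeconfig context name passed as an argument are hypothetical, and the permission checked with `auth can-i` is an example, not the exact rights the issue requires.

```shell
#!/bin/sh
# Sketch: basic access checks against the new cluster.
# Assumption: $1 is the kubeconfig context name (hypothetical, chosen by the caller).
validate_cluster_access() {
  context="$1"

  # Is the API server reachable with these credentials?
  kubectl --context "$context" cluster-info >/dev/null || return 1

  # Can we read cluster-scoped resources such as nodes?
  kubectl --context "$context" get nodes -o name || return 1

  # Example permission check; exits non-zero when the answer is "no".
  kubectl --context "$context" auth can-i create deployments --all-namespaces
}
```

Calling `validate_cluster_access <context>` from both locations gives a comparable pass/fail result for each.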
-
Set up the new cluster in jenkins-infra/kubernetes-management
- Add the custom crafted kubeconfig in the job jenkins-infra/kubernetes-management (in infra.ci)
- Set up the cluster privatek8s with just datadog [inspired by feat: add privatek8s_sponsorship cluster kubernetes-management#6485/files]
-
Add the other "non running applications" releases
- cert-manager + acme - IMPORTANT note: we need 2 distinct PRs.
  - cert-manager (to install CRDs)
  - acme (to install the ClusterIssuer, which requires CRDs to be installed)
- public-nginx-ingress - https://github.com/jenkins-infra/kubernetes-management/pull/xxx
- private-nginx-ingress - https://github.com/jenkins-infra/kubernetes-management/pull/xxx
- jenkins-infra-jobs - https://github.com/jenkins-infra/kubernetes-management/pull/xxx
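The ordering constraint between cert-manager and acme (CRDs first, ClusterIssuer second) can be checked before merging the second PR. A minimal sketch, assuming a hypothetical helper name and a kubeconfig context supplied by the caller; `clusterissuers.cert-manager.io` is the CRD that cert-manager installs for ClusterIssuer resources.

```shell
#!/bin/sh
# Sketch: gate the "acme" release on the cert-manager CRDs being installed.
# Assumptions: $1 is the kubeconfig context (hypothetical), $2 an optional timeout.
crds_ready() {
  context="$1"
  timeout="${2:-60s}"

  # Succeeds only once the ClusterIssuer CRD is registered and served by the API.
  kubectl --context "$context" wait --for=condition=Established \
    crd/clusterissuers.cert-manager.io --timeout="$timeout"
}
```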
-
Create NS/SA/PVCs/PVs for infra.ci and release.ci in terraform
- Note: the names of these resources were changed: requires a kubernetes-management/config/* update
-
jenkins-release-agents (with updated release: independent of release.ci controller release now)
-
Allow the cluster privatek8s IPs to reach LDAP: jenkins-infra/kubernetes-management@682e98b
-
Announce the operation for the 4 services (release, infra, rss2twitter and github comment ops)
-
Prior to the operation: disable privatek8s from kubernetes-management - feat: migrate infra.ci from privatek8s to privatek8s (CDF) and stop managing privatek8s-sponsorship kubernetes-management#6865
-
Migrate infra.ci:
- (in old cluster) Put controller in "shutdown mode"
- (in old cluster) Scale to zero to close file handles and un-mount disk automatically
- take a snapshot and grab its ID
- Terraform Azure: recreate disk/PV/PVCs with the snapshot ID - feat(infra.ci) add a disk,PV,PVC to import data from the data snapshot azure#1140
- unplanned: fix disk permissions - fix(infra.ci/privatek8s) correct disk permission to allow access from the correct cluster azure#1141
- Create migration pod and copy data:
-
Create migration pod which mounts both PVCs:
Pod definition:

    ---
    apiVersion: v1
    kind: Pod
    metadata:
      name: data-migration
      namespace: infra-ci-jenkins-io
    spec:
      containers:
        - name: data-migration
          image: alpine
          command:
            - sleep
            - infinity
          volumeMounts:
            - name: source
              mountPath: /data/source
            - name: destination
              mountPath: /data/destination
      volumes:
        - name: source
          persistentVolumeClaim:
            claimName: infra-ci-jenkins-io-data-import
        - name: destination
          persistentVolumeClaim:
            claimName: infra-ci-jenkins-io-data
-
Copy data using rsync/cp, and don't forget to set the permissions:
    apk add rsync
    rsync -av /data/source/ /data/destination
    chown -R 1000:1000 /data/destination
    ls -ld /data/destination
    ls -la /data/destination
    # Remove cache and transient data
    rm -rf /data/destination/*cache* /data/destination/plugins/* /data/destination/nodes/* /data/destination/deadlock/* /data/destination/high-load/* /data/destination/slow-requests/*
-
Delete the pod to unmount the volumes
-
- Migrate DNS records - feat(dns-records) migrate infra.ci to CDF privatek8s cluster azure-net#418
- Deploy infra-ci-jenkins-io helm release: - feat: migrate infra.ci from privatek8s to privatek8s (CDF) and stop managing privatek8s-sponsorship kubernetes-management#6865 (applied manually with no probes)
- Reapply probes: jenkins-infra/kubernetes-management@ebe1f0f
- unplanned: fix NSG rules to allow VM ephemeral agents (and cleanup infra.ci resources in sponsored subscription as it is easier to fix issues like this)
- unplanned: update outbound IPs on other projects:
- AWS (access with SSH from infra.ci.jenkins.io agents to pkg.origin.jenkins.io VM when building docker-packaging for keyscan): feat(network) update infra.ci outbound IPs (both cluster and agents subnets) aws#570
- DigitalOcean (management of archives.jenkins.io through SSH from infra.ci.jenkins.io agents): feat(firewalls) update infra.ci outbound IPs (both cluster and agents subnets) digitalocean#248
- AWS sponsorship (access from infra.ci.jenkins.io agents to ci.jenkins.io and all AWS-sponsored account resources for management): Automatic updatecli PR: Update the terraform-aws-modules outbound IPs terraform-aws-sponsorship#299
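The "scale to zero, then take a snapshot and grab its ID" steps of the infra.ci migration could be scripted roughly as follows. This is a sketch under assumptions, not the procedure actually used: the function name, the idea that the controller runs as a StatefulSet, and every resource name passed as an argument are hypothetical. It relies on the `diskState` property that Azure reports for managed disks to confirm the disk is detached before snapshotting.

```shell
#!/bin/sh
# Sketch of "scale to zero, then snapshot" for a controller data disk.
# Assumptions (hypothetical, not from the issue): the controller is a StatefulSet,
# and namespace, resource group, disk and snapshot names come from the caller.
snapshot_controller_disk() {
  namespace="$1"; statefulset="$2"; resource_group="$3"
  disk_name="$4"; snapshot_name="$5"
  max_attempts="${6:-30}"; delay_seconds="${7:-10}"
  state=""

  # Scaling to zero stops the controller so the disk can be detached.
  kubectl -n "$namespace" scale --replicas=0 statefulset "$statefulset" || return 1

  # Poll until Azure reports the managed disk as detached.
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    state="$(az disk show --resource-group "$resource_group" --name "$disk_name" \
      --query diskState --output tsv)" || return 1
    [ "$state" = "Unattached" ] && break
    attempt=$((attempt + 1))
    sleep "$delay_seconds"
  done
  [ "$state" = "Unattached" ] || { echo "disk still in state: $state" >&2; return 1; }

  # Create the snapshot and print its ID for use in the Terraform change.
  az snapshot create --resource-group "$resource_group" --name "$snapshot_name" \
    --source "$disk_name" --query id --output tsv
}
```

The printed ID is the value that the Terraform change recreating the disk/PV/PVC would consume.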
-
Migrate release.ci.jenkins.io:
-
(in old cluster) Put controller in "shutdown mode"
-
(in old cluster) Scale to zero to close file handles and un-mount disk automatically
-
take a snapshot and grab its ID
-
Terraform Azure: recreate disk/PV/PVCs with the snapshot ID - feat(release.ci) create disk/PV/PVC for migration to CDF subscription azure#1143
-
Create migration pod and copy data:
-
Create migration pod which mounts both PVCs:
Pod definition:

    ---
    apiVersion: v1
    kind: Pod
    metadata:
      name: data-migration
      namespace: release-ci-jenkins-io
    spec:
      containers:
        - name: data-migration
          image: alpine
          command:
            - sleep
            - infinity
          volumeMounts:
            - name: source
              mountPath: /data/source
            - name: destination
              mountPath: /data/destination
      volumes:
        - name: source
          persistentVolumeClaim:
            claimName: release-ci-jenkins-io-data-import
        - name: destination
          persistentVolumeClaim:
            claimName: release-ci-jenkins-io-data
-
Copy data using rsync/cp, and don't forget to set the permissions:
    apk add rsync
    rsync -av /data/source/ /data/destination
    chown -R 1000:1000 /data/destination
    ls -ld /data/destination
    ls -la /data/destination
    # Remove cache and transient data
    rm -rf /data/destination/*cache* /data/destination/plugins/* /data/destination/nodes/* /data/destination/deadlock/* /data/destination/high-load/* /data/destination/slow-requests/*
-
Delete the pod to unmount the volumes
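The migration pod lifecycle above (create, copy, delete) could be wrapped in one helper so the pod is always removed once the copy has run. A sketch only: the function name is hypothetical, and it assumes the Pod definition has been saved to a local manifest file whose path is passed by the caller. The cache cleanup step is left out on purpose so it stays a deliberate manual action.

```shell
#!/bin/sh
# Sketch: run the copy inside the temporary migration pod, then remove the pod.
# Assumptions (hypothetical): $2 is a local file containing the Pod definition,
# and the pod is named "data-migration" unless $3 says otherwise.
run_data_migration() {
  namespace="$1"; manifest="$2"; pod="${3:-data-migration}"

  kubectl apply -f "$manifest" || return 1

  # Wait until both PVCs are mounted and the container is running.
  # On timeout the pod is left in place for debugging.
  kubectl -n "$namespace" wait --for=condition=Ready "pod/$pod" --timeout=300s || return 1

  # Same commands as the manual procedure, executed non-interactively.
  kubectl -n "$namespace" exec "$pod" -- sh -c '
    set -e
    apk add rsync
    rsync -av /data/source/ /data/destination
    chown -R 1000:1000 /data/destination
  '
  status=$?

  # Delete the pod even if the copy failed, so the volumes get unmounted.
  kubectl -n "$namespace" delete pod "$pod" || return 1
  return "$status"
}
```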
-
-
Migrate DNS - feat(dns-records) migrate release.ci to Azure CDF susbcription azure-net#419
-
Deploy release-ci-jenkins-io helm release: - feat(privatek8s) migrate release.ci to the Azure CDF subscription kubernetes-management#6870
-
Cleanup old resources: cleanup(privatek8s-sponsorship) remove release.ci resources azure#1144
-
unplanned: update outbound IPs on other projects:
- AWS (access with SSH from release.ci agents to pkg.origin.jenkins.io VM when performing Jenkins releases, during the "packaging" phase): feat(network) update infra.ci outbound IPs (both cluster and agents subnets) aws#570
- DigitalOcean (access from release.ci agents to archives.jenkins.io VM when performing Jenkins releases, during the "packaging" phase through SSH): feat(firewalls) update infra.ci outbound IPs (both cluster and agents subnets) digitalocean#248
-
Run the "test agents jobs" to check autoscaler and PVCs permissions (in progress)
- Initial validation with manual changes: pods are scheduled (e.g. controller Kubernetes permissions, Kubernetes autoscaling, network routes for inbound agents, controller JCasC setup)
- PR in jenkins-infra/release to fix svcaccount naming and infra-health-agents pipeline improvement (in progress)
- PR in jenkins-infra/azure to manage release keyvault and its allowed IPs
- New "health" validation job works as expected
-
-
rss2twitter (can be done on its own)
- Need to scale down to zero the old release (in privatek8s-sponsorship) to avoid concurrency: kubectl -n rss2twitter scale --replicas=0 deployment rss2twitter
- feat(privatek8s) migrate bot applications to Azure CDF subscription kubernetes-management#6871
-
github-comment-ops (can be done on its own)
- Migrate the DNS for the webhooks: cleanup: remove all remnants of privatek8s-sponsorship azure-net#420
- Need to scale down to zero the old release (in privatek8s-sponsorship) to avoid concurrency: kubectl -n github-comment-ops scale --replicas=0 deployment github-comment-ops
- feat(privatek8s) migrate bot applications to Azure CDF subscription kubernetes-management#6871
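For both bots, the scale-down in the old cluster can be followed by a check that the deployment no longer reports replicas before the new release is deployed. A minimal sketch with a hypothetical function name; the attempt count and delay are arbitrary defaults.

```shell
#!/bin/sh
# Sketch: scale a deployment to zero and confirm it reports no replicas.
# Assumptions: kubectl already points at the old cluster (privatek8s-sponsorship);
# $3 and $4 (attempts, delay in seconds) are arbitrary defaults.
scale_to_zero() {
  namespace="$1"; deployment="$2"
  max_attempts="${3:-30}"; delay_seconds="${4:-2}"
  replicas=""

  kubectl -n "$namespace" scale --replicas=0 deployment "$deployment" || return 1

  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # .status.replicas is empty or 0 once the deployment counts no active pod.
    # A pod may still be terminating for a short time after that.
    replicas="$(kubectl -n "$namespace" get deployment "$deployment" \
      -o jsonpath='{.status.replicas}')" || return 1
    case "$replicas" in
      ''|0) return 0 ;;
    esac
    attempt=$((attempt + 1))
    sleep "$delay_seconds"
  done

  echo "deployment $deployment still reports $replicas replica(s)" >&2
  return 1
}
```

For example, `scale_to_zero rss2twitter rss2twitter` and `scale_to_zero github-comment-ops github-comment-ops` mirror the two commands listed above.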
-
Cleanups
- in jenkins-infra/azure:
- Remove infra.ci remnants - cleanup(infra.ci,packer,vnets) remove resources from the sponsored subscription azure#1126
- Remove release.ci remnants - cleanup(privatek8s-sponsorship) remove release.ci resources azure#1144
- Remove privatek8s-sponsorship Kubernetes resources + IP locks first - cleanup(privatek8s-sponsorship) remove all kubernetes/outputs as a first step azure#1145
- Then remove all remnants of privatek8s-sponsorship - cleanup: remove all remaining privatek8s-sponsorship resources azure#1146
- in jenkins-infra/azure-net:
- Remove vnets/subnets/RGs of privatek8s-sponsorship - cleanup: remove all remnants of privatek8s-sponsorship azure-net#420
- in jenkins-infra/kubernetes-management + jenkins-infra/charts-secrets:
- cleanup: remove all remnants of privatek8s kubernetes-management#6872
- Secrets removal for datadog: https://github.com/jenkins-infra/charts-secrets/commit/e3e3ea4e129a7185fafd67aededed2f7f9149e3b
- Secrets removal for infra.ci: https://github.com/jenkins-infra/charts-secrets/commit/e5a8f14e26065077a703c081a025e741fab48328
- terraform states - https://github.com/jenkins-infra/terraform-states/commit/c50962f8dec2a0fb6ec1196224a9bb34cdb55646