This repository was archived by the owner on Jun 29, 2022. It is now read-only.

[WIP] - Baremetal reprovisioning#1402

Closed
ipochi wants to merge 18 commits into master from imran/baremetal-reprovisioning

@ipochi ipochi commented Feb 25, 2021

replacement of #1333

Includes all the changes from the latest master, #1387, and #1405.

This commit restructures the baremetal Terraform module and splits the
Flatcar provisioning into a separate module, `matchbox-flatcar`.

The new module takes care of setting up Matchbox profiles and groups.

This makes it much easier to reuse the provisioning logic in other
projects, and the `matchbox-flatcar` module could live in its own
repository in the future.

Signed-off-by: Imran Pochi <imran@kinvolk.io>
@ipochi ipochi force-pushed the imran/baremetal-reprovisioning branch from edc266b to ac9c2b3 Compare February 25, 2021 15:02
While the kernel arguments can be set for PXE, they were not applied
to the final installation. The Ignition URL was lost as well, because
the config was downloaded in the installer and written to
/usr/share/oem/config.ign through the -i ignition.json flag.
Use the /usr/share/oem/grub.cfg file to permanently set the kernel
arguments and the Ignition URL, so that on each boot the final
installation uses the kernel arguments provided by the user and
Ignition is able to fetch any config changes from Matchbox.
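
To illustrate the idea, here is a minimal Python sketch that renders such a grub.cfg. It is not the code from this PR. The function name, the example console argument, and the Matchbox host are assumptions; `set linux_append=...` is the variable Flatcar's GRUB setup appends to the kernel command line, and `ignition.config.url` is the kernel argument Ignition reads its config URL from.

```python
def render_oem_grub_cfg(kernel_args, ignition_url):
    """Render OEM grub.cfg text that appends kernel arguments on every boot."""
    # ignition.config.url tells Ignition where to fetch its config from.
    args = list(kernel_args) + [f"ignition.config.url={ignition_url}"]
    for arg in args:
        if '"' in arg:
            # Keep the sketch simple: no escaping inside the quoted GRUB string.
            raise ValueError(f"unsupported double quote in kernel argument: {arg!r}")
    # The value of linux_append is added to the kernel command line.
    return 'set linux_append="{}"\n'.format(" ".join(args))


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(render_oem_grub_cfg(
        ["console=ttyS1,57600n8"],
        "http://matchbox.example.com:8080/ignition?mac=52-54-00-a1-9c-ae",
    ), end="")
```

Because the rendered text would be written to /usr/share/oem/grub.cfg, the arguments survive reboots instead of applying only to the PXE-booted installer.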
The default of 10 parallel steps does not limit creation of a small
cluster, but on a cluster with more than 10 nodes it leads to creation
in sequential batches of 10, which slows down the bring-up.
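
The arithmetic behind this can be sketched in a few lines of Python. This takes the "sequential batches" description above at face value; Terraform's real scheduler walks a dependency graph, so treat the result as a rough estimate. The function name is made up for this example, and the default of 10 corresponds to Terraform's `-parallelism` setting.

```python
import math


def creation_batches(node_count, parallelism=10):
    """Estimate how many sequential batches are needed to create all nodes."""
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    # Each batch handles at most `parallelism` nodes at once.
    return math.ceil(node_count / parallelism)


if __name__ == "__main__":
    for nodes in (8, 25):
        print(nodes, "nodes:", creation_batches(nodes), "batch(es) at the default,",
              creation_batches(nodes, parallelism=nodes), "with parallelism raised")
```

Under this model a 25-node cluster needs three rounds at the default, and a single round once the limit is at least the node count.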
@ipochi ipochi force-pushed the imran/baremetal-reprovisioning branch from ac9c2b3 to c25ee94 Compare February 26, 2021 11:28
@ipochi ipochi force-pushed the imran/baremetal-reprovisioning branch from c25ee94 to e574d78 Compare February 26, 2021 12:45
@ipochi ipochi force-pushed the imran/baremetal-reprovisioning branch from e574d78 to 9ec8c47 Compare February 26, 2021 12:53
@ipochi ipochi force-pushed the imran/baremetal-reprovisioning branch from 9ec8c47 to 23c266d Compare March 3, 2021 08:12
pothos and others added 7 commits March 3, 2021 16:05
The PXE provisioning logic should only run if it's either the first
run or if a worker node gets reconfigured. For controllers this is
currently not supported, because etcd state would be lost and quorum
must be held at all times. It should also be robust against any races and
verify that the reprovisioning actually took place. The new way of
doing it works with local state files under the asset folder. These
serve as the source of truth and also give the user a way to force a PXE
install if needed (reminder: this whole exercise is done because we
don't have a Terraform provider doing this for us). A flag file that
Ignition creates and that a validation step removes further helps to
prevent any races where the worker secrets SSH step would already run
before the reprovisioning took place.
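
The decision logic described above could look roughly like the following Python sketch. It is not the implementation in this PR: the state file name, the use of a SHA-256 digest of the rendered config, and raising an error for controllers are all assumptions made for illustration. Only the idea of a local state file in the asset folder acting as the source of truth comes from the commit message.

```python
import hashlib
from pathlib import Path


def _digest(rendered_config):
    return hashlib.sha256(rendered_config.encode("utf-8")).hexdigest()


def _state_file(asset_dir, node_name):
    # Hypothetical layout: one small state file per node in the asset folder.
    return Path(asset_dir) / f"{node_name}.pxe-state"


def needs_pxe_install(asset_dir, node_name, role, rendered_config):
    """Return True if the node should go through PXE provisioning."""
    state = _state_file(asset_dir, node_name)
    if not state.exists():
        # First run, or the user deleted the file to force a PXE install.
        return True
    if state.read_text().strip() == _digest(rendered_config):
        # Nothing changed since the last successful provisioning.
        return False
    if role == "controller":
        # Reprovisioning controllers would lose etcd state.
        raise ValueError(f"reprovisioning controller {node_name} is not supported")
    return True


def record_provisioned(asset_dir, node_name, rendered_config):
    """Call only after validation confirmed that reprovisioning took place."""
    _state_file(asset_dir, node_name).write_text(_digest(rendered_config) + "\n")
```

In this sketch, `record_provisioned` would be called only after the validation step has seen and removed the flag file that Ignition creates, so a half-finished install never gets recorded as done.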
The installer OS also needs to be able to get its network configuration
as a CLC snippet if DHCP is not used.
In case the network only has a temporary problem, the installer service
should start again on failure.
Convert it to a oneshot service to set RemainAfterExit, so that it is
not triggered twice if anything that depends on it pulls it in
again after it has finished.
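
Taken together, the two service changes above amount to a unit along the lines of what this Python sketch renders. The description, the `RestartSec` value, and the `ExecStart` path are placeholders and not taken from this PR. Note also that, as far as I know, combining `Type=oneshot` with `Restart=on-failure` needs a reasonably recent systemd (244 or later).

```python
def render_installer_unit(exec_start, restart_sec="5s"):
    """Render a oneshot unit that retries on failure and stays active afterwards."""
    lines = [
        "[Unit]",
        "Description=Flatcar installer (illustrative sketch)",
        "",
        "[Service]",
        # oneshot: systemd waits for the command to finish.
        "Type=oneshot",
        # Stay "active" after exit so dependents do not start it a second time.
        "RemainAfterExit=yes",
        # Retry if, for example, the network had a temporary problem.
        "Restart=on-failure",
        f"RestartSec={restart_sec}",
        f"ExecStart={exec_start}",
        "",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical script path, for illustration only.
    print(render_installer_unit("/opt/installer"), end="")
```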
When the PXE boot is skipped, the new Ignition config is already
fetched but it does not contain the kernel arguments because these
have to be set outside of Ignition or otherwise they would only
apply to the following boots. The kernel arguments are kept in the
grub config and this needs to be updated out of band (kernel console
and Ignition URL are also part of the kernel arguments).
Signed-off-by: Imran Pochi <imran@kinvolk.io>
Will be removed, once individual commits contain generated assets.

Signed-off-by: Imran Pochi <imran@kinvolk.io>
pothos and others added 4 commits April 7, 2021 13:44
When the cluster is not fully set up and lokoctl is run again to fix that
problem, the user may want to prevent the control plane update from
happening at the same time, because it can cause issues.
Since the disks are not empty, any detected LVM partitions may
automatically be activated on bootup. This can disrupt the installation
and we need to disable them first. This was done for all "vg" prefixes
but we can do it for any volume names. Any non-LVM entries are ignored
with a small error log output.

See flatcar/Flatcar#332
This commit wipes any additional disks attached to the hardware before
proceeding with the Flatcar Container Linux installation.

The reason this is being done is to provide clean and unformatted disk
devices to the Lokomotive components [openebs and rook] for storage
orchestration.

Both rook and openebs fail to use disks that already contain data or a
filesystem. This leads to, among other issues, failed installations
where PVCs get stuck in the Pending state.

Signed-off-by: Imran Pochi <imran@kinvolk.io>
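
As a rough illustration of the disk wiping step, the following Python sketch builds one wipe command per additional disk and leaves the install target alone. The commit message does not say which tool is used; `wipefs --all` is an assumption here, as are the function name and the device paths. The sketch only returns the commands and does not run them.

```python
def wipe_commands(all_disks, install_disk):
    """Build one wipe command per additional disk, skipping the install target."""
    return [
        # wipefs --all erases filesystem, RAID and partition-table signatures.
        ["wipefs", "--all", disk]
        for disk in sorted(all_disks)
        if disk != install_disk
    ]


if __name__ == "__main__":
    # Hypothetical device names, for illustration only.
    for cmd in wipe_commands(["/dev/sda", "/dev/sdb", "/dev/sdc"], "/dev/sda"):
        print(" ".join(cmd))
```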
pothos commented Jun 18, 2021

Will be replaced by #1502
(but don't delete this branch yet)

@pothos pothos closed this Jun 22, 2021