SUSE-AI Node Setup via Ansible

This project automates the setup of a high-availability RKE2 cluster with Rancher on SUSE-based systems, providing a reliable foundation for deploying SUSE AI applications. It streamlines installation through containerized Ansible playbooks and, when enabled, adds GPU support by installing the NVIDIA driver and GPU Operator.

Note: The initial implementation is designed to work with the default configuration options for both RKE2 and Rancher.

Prerequisites

  • Docker or Podman
  • SSH key-based access to all target nodes
  • Target hosts run a SUSE-based OS
  • Proper DNS setup (e.g. rancher.example.com)
  • Target hosts must fulfill prerequisites at https://docs.rke2.io/install/quickstart#prerequisites
  • Python 3.11 or later on the target hosts. Verify with python3 --version that python3 points to version 3.11 or higher.
  • A valid registration key for the SUSE Linux Enterprise distribution, available with your SUSE subscription.
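The Python requirement above can be checked quickly on each target host; a minimal sketch (the 3.11 threshold comes from the prerequisites list):

```shell
# Check that python3 on this host is 3.11 or newer
ver=$(python3 -c 'import sys; print("{}.{}".format(*sys.version_info[:2]))')
major=${ver%%.*}
minor=${ver#*.}
if [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 11 ]; }; then
  echo "python $ver: OK"
else
  echo "python $ver: too old, 3.11+ required"
fi
```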

Components

  • Ansible playbooks for:
    • Optional installation of NVIDIA driver packages
    • RKE2 HA server installation
    • RKE2 agent node installation
    • Optional deployment of Rancher
    • Optional deployment of the NVIDIA GPU Operator
  • Roles for idempotent configuration
  • A Dockerfile to run the playbooks in a container

Inventory Example

This is an example inventory.ini file with 3 RKE2 servers and 2 RKE2 agents.

#inventory.ini.example
[rke2_servers]
rke2_server1 ansible_host=192.168.1.10
rke2_server2 ansible_host=192.168.1.11
rke2_server3 ansible_host=192.168.1.12

[rke2_agents]
rke2_agent1 ansible_host=192.168.1.20
rke2_agent2 ansible_host=192.168.1.21

[all:vars]
ansible_user=<SSH_USER>

This is an example inventory.ini file with a single RKE2 server.

#inventory.ini.onenode.example
[rke2_servers]
rke2_server1 ansible_host=192.168.1.10

[all:vars]
ansible_user=<SSH_USER>

This is an example inventory.ini file where the target host is the localhost.

#inventory.ini.local.example
[rke2_servers]
rke2_server1 ansible_host=localhost

[all:vars]
ansible_user=<SSH_USER>

Notes

  • Mount your SSH keys under ~/.ssh to enable access to target nodes.
  • The load balancer rke2.lb_address provided in the extra_vars.yml must route ports 9345 and 443 to the RKE2 server nodes.
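The load-balancer requirement above can be sanity-checked from any machine with bash; rke2.example.com below is a placeholder for your rke2.lb_address:

```shell
# Probe a TCP port via bash's /dev/tcp (no extra tools required).
# rke2.example.com is a placeholder; substitute your load balancer address.
check_port() {
  # usage: check_port <host> <port>
  if timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null; then
    echo "$1:$2 open"
  else
    echo "$1:$2 closed"
  fi
}
check_port rke2.example.com 9345   # RKE2 supervisor port
check_port rke2.example.com 443    # Rancher / HTTPS
```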

Usage

1. Build the Docker image from source

docker build -t suse-ai-node-ansible-runner -f Dockerfile.local .

2. Create inventory.ini file

cp inventory.ini.example inventory.ini

Update the ansible_host and ansible_user entries in inventory.ini.

3. Create extra_vars.yml

cp extra_vars.yml.example extra_vars.yml

Configure entries in extra_vars.yml accordingly.

4. Run the site.yml playbook

At a high level, this playbook verifies that the target hosts are supported systems and registers them with the SCC if they are not already registered. It installs required packages and, when enabled, the NVIDIA drivers. NVIDIA G06 drivers are installed on servers with NVIDIA GPUs and are supported on Turing and newer architectures. Finally, the playbook reboots the target hosts, runs post-reboot checks, and installs the RKE2 servers, RKE2 agents, Rancher, and the GPU Operator.

docker run --rm \
  -v ~/.ssh/id_rsa:/root/.ssh/id_rsa:ro \
  -v ./inventory.ini:/workspace/inventory.ini \
  -v ./extra_vars.yml:/workspace/extra_vars.yml \
  suse-ai-node-ansible-runner \
  ansible-playbook -i inventory.ini playbooks/site.yml -e "@extra_vars.yml"

If your target ansible_host is localhost, run the playbooks in two stages:

docker run --rm \
  --network host \
  -v ~/.ssh/id_rsa:/root/.ssh/id_rsa:ro \
  -v ./inventory.ini:/workspace/inventory.ini \
  -v ./extra_vars.yml:/workspace/extra_vars.yml \
  suse-ai-node-ansible-runner \
  ansible-playbook -i inventory.ini playbooks/stage1.yml -e "@extra_vars.yml"

docker run --rm \
  -v ~/.ssh/id_rsa:/root/.ssh/id_rsa:ro \
  -v ./inventory.ini:/workspace/inventory.ini \
  -v ./extra_vars.yml:/workspace/extra_vars.yml \
  suse-ai-node-ansible-runner \
  ansible-playbook -i inventory.ini playbooks/stage2.yml -e "@extra_vars.yml"

Note: NVIDIA drivers are not installed when localhost is the target; in that case, install the drivers manually.

5. Troubleshooting

5a. Failed to connect to the host via ssh

Confirm key permissions (~/.ssh 700, private key 600).

Verify that the public key is in ~/.ssh/authorized_keys of the remote user.

Run ssh -v user@host to debug connection and authentication issues.
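The permission fix can be scripted; the paths below are the default key locations used in the docker commands above (adjust if yours differ):

```shell
# Tighten SSH permissions to what sshd expects (default key paths assumed)
mkdir -p ~/.ssh
chmod 700 ~/.ssh
if [ -f ~/.ssh/id_rsa ]; then
  chmod 600 ~/.ssh/id_rsa
fi
echo "ssh permissions set"
```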
