guiand888
Contributor

Fixes certificate validation failures in multi-server setups where api_endpoint (an FQDN) differs from the detected hostname.

Problem

When using FQDNs in inventory (e.g., server1.example.com), k3s HA clusters fail to bootstrap because:

  1. First server generates certificate with SANs based on ansible_hostname (e.g., server1)
  2. Additional servers try to connect using api_endpoint (e.g., server1.example.com)
  3. Certificate validation fails: x509: certificate is valid for server1, not server1.example.com

This occurs when:

  • Using FQDNs in inventory files
  • api_endpoint resolves to a different name than the detected hostname
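
For illustration, an inventory along these lines would trigger the failure described above. This is a minimal sketch that assumes the group layout of the project's sample inventory (k3s_cluster, server, agent) and that each machine's short hostname matches the first label of its FQDN; the host names and the api_endpoint value are placeholders, not taken from this PR.

```yaml
# Illustrative inventory only; host names are placeholders.
k3s_cluster:
  children:
    server:
      hosts:
        # The ansible_hostname fact for this host would be "server1"
        # (assuming the machine's short hostname is server1).
        server1.example.com:
        server2.example.com:
        server3.example.com:
    agent:
      hosts:
        agent1.example.com:
  vars:
    # Joining servers connect to this name, so it has to appear in the
    # SANs of the certificate generated by the first server.
    api_endpoint: server1.example.com
```

Here api_endpoint (server1.example.com) differs from ansible_hostname (server1), which is the mismatch that leads to the x509 error.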

Solution

Automatically add --tls-san={{ api_endpoint }} to k3s server arguments when api_endpoint differs from ansible_hostname.

Changes

  • roles/k3s_server/defaults/main.yml: Add computed TLS SAN logic
  • roles/k3s_server/templates/*.service.j2: Use computed server args

Logic

_computed_tls_sans: "{% if api_endpoint is defined and api_endpoint != ansible_hostname %}--tls-san={{ api_endpoint }}{% endif %}"
_final_server_args: "{{ extra_server_args }} {{ _computed_tls_sans }}"

Testing

Verified with:

  • FQDN inventory (server1.example.com → adds --tls-san=server1.example.com)
  • Hostname inventory (server1 → no additional SAN needed)
  • HA cluster bootstrap now succeeds with certificate validation

@dereknola
Member

Several things:

  1. Please sign your commits to comply with DCO
  2. Don't generate the logic in defaults/main.yml; those files are, by design, very simple defaults. Move the logic into the server and agent tasks and assign new variables with set_fact.
  3. (This is what makes doing it automatically more difficult.) You need to validate that a tls-san flag does not already exist in either the server_config_yaml or the extra_server_args variable (same for agents) before assuming you can assign it.

Your problem is solved just by setting the tls-san flag yourself to whatever external hostname you set up. This is standard K3s practice and it's well documented. That's why there has never been a push to automate this step, as other related user intervention was already required.
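
The manual approach mentioned above could look roughly like the following. This is a sketch, assuming the extra_server_args and server_config_yaml variables are set in the inventory vars and that server_config_yaml is passed through as K3s config-file content; the hostname is a placeholder.

```yaml
# Option A (assumed usage): pass the flag on the k3s server command line.
extra_server_args: "--tls-san=server1.example.com"

# Option B (assumed usage): set it in the K3s config file instead.
# Use one option or the other, not both.
# server_config_yaml: |
#   tls-san:
#     - server1.example.com
```

Either form asks K3s to include the external name in the API server certificate, so servers joining through that name can validate it.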

- Auto-add --tls-san={{ api_endpoint }} when it differs from ansible_hostname
- Prevents 'x509: certificate is valid for hostname, not FQDN' errors
- Ensures first server generates certificate with all required SANs
- Maintains backward compatibility with existing configurations
- Fixes HA cluster bootstrap issues when using FQDNs in inventory

Closes certificate validation failures in multi-server setups where
api_endpoint (FQDN) differs from the detected hostname.

Signed-off-by: Guillaume Andre <[email protected]>
@guiand888 guiand888 force-pushed the fix/tls-san-api-endpoint branch from d6440ce to 24ec0ec Compare August 26, 2025 03:21
Comment on lines 12 to 14
# Auto-computed TLS SANs to prevent certificate validation issues
_computed_tls_sans: "{% if api_endpoint is defined and api_endpoint != ansible_hostname %}--tls-san={{ api_endpoint }}{% endif %}"
_final_server_args: "{{ extra_server_args }} {{ _computed_tls_sans }}"
Member

Move the logic out of these files, we specifically want these to be very simple default files.

@guiand888
Contributor Author

Thank you for your feedback and sorry, I had forgotten to push changes.

I moved the TLS SAN logic from defaults to tasks using set_fact.

The code also validates that tls-san doesn't already exist in server_config_yaml / agent_config_yaml and extra_server_args before adding it, to prevent duplicate entries.
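
A task implementing this kind of check might be sketched as follows. This is not the merged code: the task name and the _api_endpoint_tls_san variable are hypothetical, and the substring test on "tls-san" is a deliberate simplification of the validation described above.

```yaml
# Hypothetical sketch, not the implementation merged in this PR.
- name: Add api_endpoint as a TLS SAN when the user has not set one
  ansible.builtin.set_fact:
    # Hypothetical variable name, chosen for this example only.
    _api_endpoint_tls_san: "--tls-san={{ api_endpoint }}"
  when:
    # Only relevant if api_endpoint is set and differs from the short hostname.
    - api_endpoint is defined
    - api_endpoint != ansible_hostname
    # Skip if the user already configured a tls-san themselves.
    - "'tls-san' not in (extra_server_args | default(''))"
    - "'tls-san' not in (server_config_yaml | default(''))"
```

When any condition is false the fact stays undefined, so a template consuming it would need something like `{{ _api_endpoint_tls_san | default('') }}`. The ansible_hostname comparison also assumes facts have been gathered for the host.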

@guiand888 guiand888 requested a review from dereknola September 6, 2025 14:28
Member

@dereknola dereknola left a comment

Looks good, but I want to separate this new tls-san argument from the existing extra_args. No need to handle injecting the extra_args when we don't have to.

@dereknola
Member

I will wait on merging #442 till after this PR is merged (no need to make you deal with my merge conflict :)

guiand888 and others added 2 commits September 13, 2025 19:16
Applied suggestion from @dereknola

Co-authored-by: Derek Nola <[email protected]>
Signed-off-by: Guillaume A <[email protected]>
Applied suggestion from @dereknola

Co-authored-by: Derek Nola <[email protected]>
Signed-off-by: Guillaume A <[email protected]>
@dereknola dereknola merged commit f2aed3b into k3s-io:master Sep 15, 2025
2 checks passed