@jrasell jrasell commented Nov 17, 2025

When a Nomad client is configured with leave_on_terminate or leave_on_interrupt, together with drain_on_shutdown, and is asked to shut down, it makes an RPC request to Node.UpdateDrain. When ACLs are enabled, this request is authenticated using the node's authentication token, which in 1.11 is the signed JWT.

This RPC did not correctly handle requests made with a node identity token, so nodes attempting to drain themselves received permission denied errors.

The change updates the RPC handler to correctly handle both secret IDs and JWTs. It must support both so that clusters containing a mix of the two kinds of client keep working. A new test exercises node identity calls to the RPC, and the test suite was converted to a table-driven layout so failures are easier to locate in the future.
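For illustration only, the dual-credential check described above could be sketched as follows. This is not the code in this PR: the `credential` type, the `authorizeSelfDrain` function, and the `nodeSecrets` map are hypothetical names, and signature verification of the JWT is assumed to have happened before this point. The sketch only shows the shape of accepting either a node identity claim or a legacy secret ID, with table-driven cases in the same spirit as the updated tests.

```go
package main

import (
	"errors"
	"fmt"
)

// errPermissionDenied stands in for the error a node would receive
// when its drain request is rejected.
var errPermissionDenied = errors.New("permission denied")

// credential is a hypothetical, simplified view of an already
// authenticated request. Exactly one field is expected to be set.
type credential struct {
	secretID       string // legacy path: the node's secret ID
	identityNodeID string // 1.11 path: node ID claimed by a verified identity JWT
}

// authorizeSelfDrain reports whether the caller may update the drain
// state of targetNodeID. nodeSecrets maps node ID to secret ID and is
// an assumption of this sketch, standing in for a state store lookup.
func authorizeSelfDrain(cred credential, targetNodeID string, nodeSecrets map[string]string) error {
	switch {
	case cred.identityNodeID != "":
		// Node identity path: the claim must name the node being drained.
		if cred.identityNodeID == targetNodeID {
			return nil
		}
	case cred.secretID != "":
		// Secret ID path: the secret must belong to the node being drained.
		if secret, ok := nodeSecrets[targetNodeID]; ok && secret == cred.secretID {
			return nil
		}
	}
	return errPermissionDenied
}

func main() {
	secrets := map[string]string{"node-1": "secret-1"}

	// Table-driven cases covering both credential kinds.
	cases := []struct {
		name   string
		cred   credential
		target string
	}{
		{"identity token, own node", credential{identityNodeID: "node-1"}, "node-1"},
		{"identity token, other node", credential{identityNodeID: "node-2"}, "node-1"},
		{"secret ID, own node", credential{secretID: "secret-1"}, "node-1"},
		{"secret ID, wrong secret", credential{secretID: "bogus"}, "node-1"},
		{"no credential", credential{}, "node-1"},
	}

	for _, tc := range cases {
		err := authorizeSelfDrain(tc.cred, tc.target, secrets)
		fmt.Printf("%s: allowed=%t\n", tc.name, err == nil)
	}
}
```

Keeping both branches in a single function means a request from either kind of client is evaluated against the same target node, which is the property the mixed-client requirement depends on.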

Testing & Reproduction steps

This can be tested locally using separate client and server processes. The client should have the following configuration options:

leave_on_terminate = true
leave_on_interrupt = true

client {
  drain_on_shutdown {
    deadline           = "1h"
    force              = true
    ignore_system_jobs = false
  }
}

Links

closes #27104
jira https://hashicorp.atlassian.net/browse/NMD-1069

Contributor Checklist

  • Changelog Entry If this PR changes user-facing behavior, please generate and add a
    changelog entry using the make cl command.
  • Testing Please add tests to cover any new functionality or to demonstrate bug fixes and
    ensure regressions will be caught.
  • Documentation If the change impacts user-facing functionality such as the CLI, API, UI,
    and job configuration, please update the Nomad website documentation to reflect this. Refer to
    the website README for docs guidelines. Please also consider whether the
    change requires notes within the upgrade guide.

Reviewer Checklist

  • Backport Labels Please add the correct backport labels as described by the internal
    backporting document.
  • Commit Type Ensure the correct merge method is selected which should be "squash and merge"
    in the majority of situations. The main exceptions are long-lived feature branches or merges where
    history should be preserved.
  • Enterprise PRs If this is an enterprise only PR, please add any required changelog entry
    within the public repository.

@jrasell jrasell self-assigned this Nov 17, 2025
jrasell added a commit that referenced this pull request Nov 17, 2025
@jrasell jrasell added the backport/1.11.x backport to 1.11.x release line label Nov 17, 2025

@tgross tgross left a comment

LGTM!

Longer term, should we update the Authenticate/AuthenticateClientOnly methods so that the ClientID is set for the node identity code path? Right now this is a one-off authentication, and there's probably a lot of overlap with RPCs like Node.GetClientAllocs that we could use to simplify this.

@jrasell jrasell merged commit a3cab66 into main Nov 18, 2025
38 checks passed
@jrasell jrasell deleted the b-NMD-1069 branch November 18, 2025 06:59
Labels

backport/1.11.x backport to 1.11.x release line

Development

Successfully merging this pull request may close these issues.

drain on shutdown gets permission denied errors (1.11.0)

2 participants