Skip to content

Conversation

@agreaves-ms
Copy link
Collaborator

Description

Adds Azure AKS Workload Identity support, allowing pods to authenticate to Azure Blob Storage without connection strings or managed secrets. Uses DefaultAzureCredential for token-based auth and updates Helm charts to support workload identity configuration.

Changes

Helm charts

  • all charts (service, router, backend-operator, web-ui) now support ServiceAccount configuration:
    • serviceAccount.create - toggle chart-managed ServiceAccount creation
    • serviceAccount.name - specify pre-provisioned SA name
    • serviceAccount.annotations - add workload identity annotations (e.g., azure.workload.identity/client-id)
  • added extraPodLabels to services for workload identity labels (azure.workload.identity/use: "true")
  • made imagePullSecrets optional when using workload identity

Azure storage backend

  • AzureBlobStorageClient supports both connection string and token-based auth
  • added DefaultAzureCredential support for workload identity
  • SAS token generation now uses user delegation keys instead of account keys
  • credential fields (access_key_id, access_key) are now optional

Credentials

  • BasicDataCredential fields are now optional
  • added get_access_key_value() for safe credential retrieval
  • updated callers to use new accessor pattern

PyInstaller

  • added azure.identity, msal, and msal_extensions to Azure hooks for CLI workload identity support

Tests

  • added test coverage for workload identity auth paths

Usage

serviceAccount:
  create: false
  name: "my-workload-identity-sa"
  annotations:
    azure.workload.identity/client-id: "<client-id>"

services:
  service:
    extraPodLabels:
      azure.workload.identity/use: "true"

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@agreaves-ms agreaves-ms requested a review from a team December 11, 2025 21:03
@fernandol-nvidia
Copy link
Contributor

This is awesome, @agreaves-ms! Thank you so much for looking into this.

The team will be going through this change and thinking through what else we might need to change. Some high level thoughts is we probably need to give you more "types" of Credentials (e.g. WorkloadIdentityCredential, DataAccessCredential, ConnectionStringCredential, etc).

Another thing we will look at as a team is changes necessary to our service and osmo-ctrl to support this. Please stay tuned! And again, really appreciate the great work so far!

@agreaves-ms
Copy link
Collaborator Author

This is awesome, @agreaves-ms! Thank you so much for looking into this.

The team will be going through this change and thinking through what else we might need to change. Some high level thoughts is we probably need to give you more "types" of Credentials (e.g. WorkloadIdentityCredential, DataAccessCredential, ConnectionStringCredential, etc).

Another thing we will look at as a team is changes necessary to our service and osmo-ctrl to support this. Please stay tuned! And again, really appreciate the great work so far!

This is a great idea, I wanted to try and make a PR that was small and focused without refactoring to much of your codebase. Mainly to get the ideas through. Having a new credential type would be the most ideal, I'm needing to also alter the client_config.py to support DataSet upload and also DataSet download from both the osmo cli (if I'm a user running locally with az login done and I have the proper roles to be able to access the Storage Accoutn and Blob container) and then also from the osmo service kicking off the workload (needing to download the DataSet input data or upload the output data to the DataSet).

I'll update this PR with that change as well as a separate commit and then point you at it.

I was trying to think through a more agnostic naming scheme for this since Workload Identity is really an Azure and GCP term whereas AWS uses something like IRSA. I was thinking of going with EnvironmentAuthCredential or something similar... Anyways, you'll see that naming in this commit that I have.

Also, feel free to not take this PR, and instead implement the logic however way makes the most sense for your teams architecture that you've planned out. Thank you again for looking into this!

@agreaves-ms
Copy link
Collaborator Author

@fernandol-nvidia I've pushed an extra commit to further address workload identity support for input and output datasets, both on upload from the osmo-cli and then running in the workflow. Please take a look at that as well when considering implementation. I've now fully tested this in my own environment and I'm able to use my authentication with az login before doing an osmo workflow submit and it uses my credentials and access to the storage account to upload the dataset, then in the running pod for the workflow it's using the workload identity that's there to pull in the dataset in the blob.

Feel free to use this work however your team needs. I'm happy to further test any implementation that you all make available as well.

@fernandol-nvidia
Copy link
Contributor

Hey @agreaves-ms, thanks for the update and apologies for the delay. I spent the past couple of days doing a refactor of how OSMO data operations work w.r.t to different credential strategies. I have a pending PR at #159.

The tl;dr is that we are differentiating DataCredential into StaticDataCredential (what we currently have) and DefaultDataCredential (which we delegate to the underlying SDK on credential resolution).

I believe this would give you the necessary abstractions/interface to add workload identity support for Azure. More specifically:

Defining the behavior at

def create_client(data_cred: credentials.DataCredential) -> blob.BlobServiceClient:
"""
Creates a new Azure Blob Storage client.
"""
match data_cred:
case credentials.StaticDataCredential():
return blob.BlobServiceClient.from_connection_string(
conn_str=data_cred.access_key.get_secret_value(),
)
case credentials.DefaultDataCredential():
raise NotImplementedError(
'Default data credentials are not supported yet')
case _ as unreachable:
assert_never(unreachable)

And overriding this method on how to decide if workload identity should be available for a particular backend:

@functools.cached_property
def resolved_data_credential(self) -> credentials.DataCredential:
"""
Resolve the data credential for the storage backend.
Subclasses should override this method to provide custom credential resolution logic.
Returns:
The resolved data credential.
Raises:
OSMOCredentialError: If the data credential is not found.
"""
data_cred = credentials.get_static_data_credential_from_config(self.profile)
if data_cred is None:
raise osmo_errors.OSMOCredentialError(
f'Data credential not found for {self.profile}')
return data_cred

Hopefully, this frees you from having to update so many unrelated files just to get Azure workload identity working :)

Please feel free to add any feedback/suggestions that would be helpful for you. I will keep you updated as things progress and can help you rebase this PR to absorb the new changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants