Using AzureIdentityProvider with WorkloadIdentityCredential causes high CPU usage and object allocations in a token refresh busy-wait #40

@nluk

Hi,
We've been using this extension since beta-1, before support for DefaultAzureCredential was introduced, to authenticate to an Azure Managed Redis instance via an AKS workload identity. In this specific scenario, a mismatch between MSAL's built-in token caching and the TokenManager + RenewalScheduler renewal process causes excessive CPU usage (a busy-wait) and rapid object allocation. The scenario in detail:

  1. Your AzureIdentityProvider delegates to DefaultAzureCredential::getToken:
    accessTokenSupplier = () -> defaultAzureCredential.getToken(ctx).block(Duration.ofMillis(timeout));
  2. DefaultAzureCredential selects WorkloadIdentityCredential from the credentials chain
  3. WorkloadIdentityCredential invokes an IdentityClient::authenticateWithWorkloadIdentityConfidentialClient:
    https://github.com/Azure/azure-sdk-for-java/blob/f39bb5bf7bbea85b4591a01fa4a99c2fcec94072/sdk/identity/azure-identity/src/main/java/com/azure/identity/WorkloadIdentityCredential.java#L113
  4. Further down the invocation chain, that translates to MSAL's AcquireTokenByClientCredentialSupplier:
    https://github.com/AzureAD/microsoft-authentication-library-for-java/blob/5a4f9fcffb9d0bf9d8c2c15e29a056213a967d32/msal4j-sdk/src/main/java/com/microsoft/aad/msal4j/AcquireTokenByClientCredentialSupplier.java#L9
  5. That delegates to an AcquireTokenSilentSupplier, which has a built-in token cache:
    https://github.com/AzureAD/microsoft-authentication-library-for-java/blob/5a4f9fcffb9d0bf9d8c2c15e29a056213a967d32/msal4j-sdk/src/main/java/com/microsoft/aad/msal4j/AcquireTokenSilentSupplier.java#L35

In our experience, the issued access token is valid for 24 hours. The cache won't refresh the token until it is about to expire, i.e. with less than five minutes of validity left (https://github.com/AzureAD/microsoft-authentication-library-for-java/blob/5a4f9fcffb9d0bf9d8c2c15e29a056213a967d32/msal4j-sdk/src/main/java/com/microsoft/aad/msal4j/AcquireTokenSilentSupplier.java#L16).

With the default ratio and lower bound:

public static final float DEFAULT_EXPIRATION_REFRESH_RATIO = 0.75F;

public static final int DEFAULT_LOWER_REFRESH_BOUND_MILLIS = 2 * 60 * 1000;

The calculation in:

public long calculateRenewalDelay(long expireDate, long issueDate) {

resolves to:

  1. Lower bound is 24h - 2m - elapsed 0h = ~23h58m
  2. Ratio is 0.75 * 24h - elapsed 0h = ~18h00m
  3. Min(bound,ratio) = 18h of delay.

That means a refresh attempt is made after 18 hours, with 6 hours left until the token expires. But that attempt results in WorkloadIdentityCredential returning the cached token, since we're not yet inside MSAL's 5-minute refresh window. The same token then goes through the calculation again:

  1. Lower bound is 24h - 2m - elapsed 18h = ~5h58m
  2. Ratio is 0.75 * 24h - elapsed 18h = ~0
  3. Min(bound, ratio) = 0, immediate refresh
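
Both calculations can be reproduced with a small stand-alone sketch. It is reconstructed from the arithmetic above, not copied from the library: the class RenewalDelaySketch, its method names, and the explicit `now` parameter are illustrative assumptions (the real TokenManager reads the system clock), and clamping negative delays to zero is assumed as well.

```java
import java.time.Duration;

// Hypothetical stand-alone model of the renewal-delay arithmetic described in this issue.
// Not the library implementation; all names here are illustrative.
class RenewalDelaySketch {

    static final double RATIO = 0.75;                     // mirrors DEFAULT_EXPIRATION_REFRESH_RATIO
    static final long LOWER_BOUND_MILLIS = 2 * 60 * 1000; // mirrors DEFAULT_LOWER_REFRESH_BOUND_MILLIS

    // Time left until the token is LOWER_BOUND_MILLIS away from expiry.
    static long lowerBoundDelay(long expiresAt, long now) {
        return expiresAt - LOWER_BOUND_MILLIS - now;
    }

    // Time left until RATIO of the token lifetime has elapsed.
    static long ratioDelay(long expiresAt, long issuedAt, long now) {
        long lifetime = expiresAt - issuedAt;
        return issuedAt + (long) (lifetime * RATIO) - now;
    }

    // Assumed: the smaller of the two delays wins, and negative values are clamped to 0.
    static long renewalDelay(long expiresAt, long issuedAt, long now) {
        long delay = Math.min(lowerBoundDelay(expiresAt, now), ratioDelay(expiresAt, issuedAt, now));
        return Math.max(0, delay);
    }

    public static void main(String[] args) {
        long issuedAt = 0;
        long expiresAt = Duration.ofHours(24).toMillis();

        // Fresh token: min(23h58m, 18h)
        long first = renewalDelay(expiresAt, issuedAt, issuedAt);
        // The same (cached) token evaluated again 18 hours later: min(5h58m, 0)
        long second = renewalDelay(expiresAt, issuedAt, Duration.ofHours(18).toMillis());

        System.out.println("first delay:  " + Duration.ofMillis(first));
        System.out.println("second delay: " + Duration.ofMillis(second));
    }
}
```

The second call is the interesting one: because the provider hands back the same issue and expiry timestamps, the ratio term has already run out, so the delay stays at zero for as long as the token is not replaced.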

The immediate refresh is submitted to an executor and hits the cache again, so the same zero delay is calculated and the entire process loops in a busy-wait.
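
The loop can also be illustrated without a real clock or executor. The simulation below is a rough sketch under stated assumptions, not the library's scheduler: BusyWaitSketch, CachedProvider, and the simulated `now` variable are hypothetical, the delay formula is the one inferred from the arithmetic in this issue, and scheduling is modelled as "advance the clock by the computed delay, then ask the provider again".

```java
import java.time.Duration;

// Hypothetical simulation of the renewal loop; names and structure are illustrative only.
class BusyWaitSketch {

    static final double RATIO = 0.75;                       // default refresh ratio
    static final long LOWER_BOUND_MILLIS = 2 * 60 * 1000;   // default lower refresh bound
    static final long CACHE_WINDOW_MILLIS = 5 * 60 * 1000;  // provider only re-issues in the last 5 minutes
    static final long LIFETIME_MILLIS = Duration.ofHours(24).toMillis();

    // Provider that keeps returning the cached token until 5 minutes before expiry.
    static class CachedProvider {
        long issuedAt = 0;
        long expiresAt = LIFETIME_MILLIS;

        void requestToken(long now) {
            if (now > expiresAt - CACHE_WINDOW_MILLIS) {
                issuedAt = now;
                expiresAt = now + LIFETIME_MILLIS;
            }
        }
    }

    // Assumed delay formula: min(lower-bound term, ratio term), clamped to 0.
    static long renewalDelay(long expiresAt, long issuedAt, long now) {
        long lowerBound = expiresAt - LOWER_BOUND_MILLIS - now;
        long ratio = issuedAt + (long) ((expiresAt - issuedAt) * RATIO) - now;
        return Math.max(0, Math.min(lowerBound, ratio));
    }

    // Counts consecutive renewal attempts that are rescheduled with zero delay, capped at maxAttempts.
    static int countZeroDelayAttempts(int maxAttempts) {
        CachedProvider provider = new CachedProvider();
        long now = 0;
        // Initial wait for the fresh token (18h with the defaults).
        now += renewalDelay(provider.expiresAt, provider.issuedAt, now);

        int zeroDelayAttempts = 0;
        for (int i = 0; i < maxAttempts; i++) {
            provider.requestToken(now); // returns the cached token while outside the 5-minute window
            long delay = renewalDelay(provider.expiresAt, provider.issuedAt, now);
            if (delay > 0) {
                break; // a new token was issued and a real wait was scheduled
            }
            zeroDelayAttempts++;
            now += delay; // zero delay: simulated time does not advance
        }
        return zeroDelayAttempts;
    }

    public static void main(String[] args) {
        System.out.println("zero-delay attempts: " + countZeroDelayAttempts(1000));
    }
}
```

In this model every attempt up to the cap is rescheduled immediately, because nothing moves the clock forward. In a real process wall-clock time does advance, so the spinning would presumably last from the 18-hour mark until the provider's 5-minute window opens, which is roughly 5h55m for a 24-hour token.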

JFR method profiling shows a busy-wait in TokenManager:
Image

And rapid allocation of renewed tokens:

Image

Minimal example without MSAL, with a provider that replicates the caching behaviour:

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import redis.clients.authentication.core.IdentityProvider;
import redis.clients.authentication.core.SimpleToken;
import redis.clients.authentication.core.Token;
import redis.clients.authentication.core.TokenListener;
import redis.clients.authentication.core.TokenManager;
import redis.clients.authentication.core.TokenManagerConfig;
import redis.clients.authentication.entraid.EntraIDTokenAuthConfigBuilder;

import java.time.Duration;
import java.time.Instant;
import java.util.Collections;

public class MinimalExample {

    private static final Logger log = LoggerFactory.getLogger(MinimalExample.class);

    static class IdentityProviderWith5MinuteBound implements IdentityProvider {

        private static final Logger log = LoggerFactory.getLogger(IdentityProviderWith5MinuteBound.class);

        private Token token = issueToken();

        @Override
        public Token requestToken() {
            // Mimic MSAL's cache: only issue a new token within the last 5 minutes before expiry
            if (System.currentTimeMillis() > (token.getExpiresAt() - (5 * 60 * 1000))) {
                token = issueToken();
            } else {
                log.info("Returning cached token");
            }
            return token;
        }

        private static Token issueToken() {
            log.info("Issuing token");
            Instant now = Instant.now(); // needed as a reference point-in-time, because TokenManager uses System.currentTimeMillis
            Instant iat = now.minus(Duration.ofHours(19)); // Issued 19hrs ago = we're hitting the default 0.75 ratio of our 24h token
            Instant exp = iat.plus(Duration.ofHours(24));
            return new SimpleToken("user", "value", exp.toEpochMilli(), iat.toEpochMilli(), Collections.emptyMap());
        }
    }

    public static void main(String[] args) throws InterruptedException {
        TokenManagerConfig defaultEntraConfig = EntraIDTokenAuthConfigBuilder.builder().build().getTokenManagerConfig();
        IdentityProviderWith5MinuteBound identityProvider = new IdentityProviderWith5MinuteBound();
        TokenManager tokenManager = new TokenManager(identityProvider, defaultEntraConfig);
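        // Sanity check: log the renewal delay computed for a fresh 24h token (the result was previously discarded)
        long freshTokenDelayMillis = tokenManager.calculateRenewalDelay(Instant.now().plus(Duration.ofHours(24)).toEpochMilli(), Instant.now().toEpochMilli());
        log.info("Renewal delay for a fresh 24h token: {}", Duration.ofMillis(freshTokenDelayMillis));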
        tokenManager.start(new TokenListener() {
            @Override
            public void onTokenRenewed(Token newToken) {
                log.info("Token renewed");
            }

            @Override
            public void onError(Exception reason) {
                log.info("Error", reason);
            }
        }, false);
        Thread.sleep(Duration.ofDays(1).toMillis()); // keep the JVM alive; the long overload also compiles on Java versions before 19
    }
}
