I am trying to configure LiteLLM to work with Google's Vertex AI models in a restrictive corporate environment. My goal is to have a LiteLLM proxy that can route requests to a custom Vertex AI REST endpoint using a dynamically generated authentication token. This would allow me to leverage frameworks like Google ADK, LangGraph, etc., which integrate smoothly with LiteLLM.
Working Direct-SDK Solution
My environment has several constraints:
- No `gcloud` CLI access.
- No Application Default Credentials (ADC).
- No service account `.json` key files.
- Authentication is handled by a custom Python function, `get_api_key()`, which returns a short-lived bearer token as a string.

I have a fully functional setup using the `google-cloud-aiplatform` SDK directly, which proves that my token and endpoint are valid.
Here is the code that works correctly:
```python
import os

import vertexai
from google.oauth2.credentials import Credentials
from vertexai.generative_models import GenerativeModel


def get_api_key():
    """
    Custom internal function to retrieve a temporary auth token.
    (Implementation is internal to my organization)
    """
    # This function returns a raw token string, e.g., "ey..."
    return "your-dynamic-token-string"


# 1. Create a credentials object from the raw token
credentials = Credentials(token=get_api_key())

# 2. Initialize the Vertex AI client with custom parameters
vertexai.init(
    project="my-gcp-project-id",
    api_transport="rest",
    api_endpoint="https://customendpoint/vertex",  # My organization's custom endpoint
    credentials=credentials,
    location="us-central1",
    request_metadata=[("x-user", os.getenv("USERNAME"))],  # Required metadata
)

# 3. Successfully generate content
# This part works perfectly.
model = GenerativeModel(model_name="gemini-1.0-pro")
response = model.generate_content("Hello, this is a test.")
print(response.text)
```
Challenge: Integrating with LiteLLM
I am now trying to replicate this successful authentication flow within LiteLLM. I have attempted to use LiteLLM's `CustomLLM` feature, but I am encountering an error where my custom provider is not being recognized.
What I've Tried
I created a `config.yaml` file and a corresponding Python file, `vertex_org_llm.py`, for my custom logic.
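The relevant part of my `config.yaml` looks roughly like this (the model alias and provider string are the ones that appear in the error further down; project, location, and endpoint are passed through `litellm_params` so the handler can pick them up from `kwargs`):

```yaml
model_list:
  - model_name: vertex-org-model
    litellm_params:
      # Points at the handler class defined in vertex_org_llm.py
      model: vertex_org_llm.VertexOrgLLM/gemini-1.0-pro
      project: my-gcp-project-id
      location: us-central1
      api_endpoint: https://customendpoint/vertex
```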
And `vertex_org_llm.py` (placed in the same directory):

```python
import os

import vertexai
from google.oauth2.credentials import Credentials
from vertexai.generative_models import GenerativeModel

from litellm.llms.custom_llm import CustomLLM


def get_api_key():
    """
    Custom internal function to retrieve a temporary auth token.
    """
    return "your-dynamic-token-string"


class VertexOrgLLM(CustomLLM):
    """
    Custom LiteLLM provider to handle our specific Vertex AI authentication.
    """

    def __init__(self, model: str, **kwargs):
        super().__init__()
        token = get_api_key()
        if not token:
            raise ValueError("Failed to retrieve API token via get_api_key()")
        creds = Credentials(token=token)

        # Initialize vertexai within the class instance
        vertexai.init(
            project=kwargs.get("project"),
            location=kwargs.get("location"),
            api_transport="rest",
            credentials=creds,
            api_endpoint=kwargs.get("api_endpoint"),
            request_metadata=[("x-user", os.getenv("USERNAME"))],
        )
        self.model = GenerativeModel(model)

    def completion(self, messages, **kwargs):
        """
        Handles non-streaming completion requests.
        """
        prompt = "\n".join([m["content"] for m in messages if m["role"] == "user"])
        response = self.model.generate_content([prompt])
        return {
            "choices": [
                {"message": {"content": response.text}}
            ]
        }

    def streaming(self, messages, **kwargs):
        """
        Handles streaming requests.
        """
        # Placeholder for future implementation
        raise NotImplementedError("Streaming not yet implemented for this custom provider.")
```
The Error
When I run the LiteLLM proxy using `litellm --config config.yaml`, the server fails to load my custom model with the following error:
```
LiteLLM: Proxy initialized with Config, Set models:
    vertex-org-model
11:16:05 - LiteLLM Router:ERROR: router.py:4954 - Error creating deployment: Unsupported provider - vertex_org_llm.VertexOrgLLM, ignoring and continuing with other deployments.
Traceback (most recent call last):
  File "xx\site-packages\litellm\router.py", line 4946, in _create_deployment
    deployment = self._add_deployment(deployment=deployment)
  File "xx\site-packages\litellm\router.py", line 5121, in _add_deployment
    raise Exception(f"Unsupported provider - {custom_llm_provider}")
Exception: Unsupported provider - vertex_org_llm.VertexOrgLLM
```
This `Unsupported provider` error suggests that the LiteLLM router cannot register my `VertexOrgLLM` class from the `vertex_org_llm.py` file.
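For what it's worth, the LiteLLM docs on custom handlers appear to expect a module-level instance of the handler plus a `custom_provider_map` entry under `litellm_settings`, roughly like the sketch below. I have not confirmed that this is the piece I am missing, and I am not sure how `project` / `location` / `api_endpoint` would reach my `__init__` with this pattern, so the `vertex-org` provider name and `vertex_org_instance` variable are just illustrative:

```python
# vertex_org_llm.py -- sketch of the module-level instance the docs seem to expect.
# "vertex_org_instance" is an illustrative name, not something in my current code.
vertex_org_instance = VertexOrgLLM(
    model="gemini-1.0-pro",
    project="my-gcp-project-id",
    location="us-central1",
    api_endpoint="https://customendpoint/vertex",
)
```

```yaml
# config.yaml -- sketch following the custom_provider_map pattern from the LiteLLM docs.
model_list:
  - model_name: vertex-org-model
    litellm_params:
      model: vertex-org/gemini-1.0-pro          # "<provider>/<model>"

litellm_settings:
  custom_provider_map:
    - {"provider": "vertex-org", "custom_handler": vertex_org_llm.vertex_org_instance}
```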
My Questions for the Community
1. How can I correctly register a `CustomLLM` provider?
2. Is this the right approach? Is creating a `CustomLLM` class the intended method for this scenario, or is there a more direct way to pass a dynamic bearer token and custom endpoint to LiteLLM's native Vertex AI provider?
3. Is there an alternative configuration? Given my specific constraints, what is the recommended way to configure LiteLLM? For example, can I somehow pass the `credentials` object created from my token directly to LiteLLM? (The sketch after this list shows the kind of thing I am hoping for.)
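To make question 3 concrete, this is the sort of configuration I am hoping is possible with the native `vertex_ai` provider. I have not verified that the provider accepts a raw bearer token or a token callback at all; the token-related lines are purely hypothetical:

```yaml
# Wishful sketch -- the token-related comment below is hypothetical, not a real LiteLLM option.
model_list:
  - model_name: vertex-org-model
    litellm_params:
      model: vertex_ai/gemini-1.0-pro
      vertex_project: my-gcp-project-id
      vertex_location: us-central1
      api_base: https://customendpoint/vertex   # custom REST endpoint
      # Hypothetical: some way to supply Credentials(token=get_api_key()),
      # e.g. a token string, a callable, or per-request Authorization headers.
```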
Any guidance on the correct procedure to solve this would be extremely helpful. Thank you!