This repository was archived by the owner on Jul 4, 2025. It is now read-only.

Add Multi-GPU Support for LlamaCpp Engine #1391

Closed
@nguyenhoangthuan99

Description


We need to implement multi-GPU support in our LlamaCpp wrapper engine so that users can run inference across multiple GPUs, improving performance on multi-GPU systems.

Goals

  • Allow users to choose which available GPUs to use for running the engine
  • Implement load balancing across selected GPUs
  • Maintain compatibility with single-GPU setups

Proposed Implementation

  1. Detect available GPUs on the system
  2. Add a configuration option for users to specify which GPUs to use
  3. Modify the wrapper engine to distribute workload across selected GPUs
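A minimal sketch of steps 1–3, assuming NVIDIA GPUs are detected via `nvidia-smi` and the workload is split proportionally to GPU memory (as llama.cpp's `tensor_split` parameter does). The function names `detect_gpus` and `compute_tensor_split` are illustrative, not an existing engine API:

```python
import shutil
import subprocess


def detect_gpus():
    """Return a list of (index, total_memory_mib) for visible GPUs, or [] if none."""
    if shutil.which("nvidia-smi") is None:
        return []  # no NVIDIA driver/tooling on this machine
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [tuple(int(x) for x in line.split(", ")) for line in out.splitlines()]


def compute_tensor_split(selected, gpus):
    """Compute per-GPU workload fractions for the user-selected GPU ids.

    Splits proportionally to each GPU's memory, and raises ValueError when a
    requested GPU is not available (the error-handling case this issue calls out).
    """
    available = dict(gpus)
    missing = [i for i in selected if i not in available]
    if missing:
        raise ValueError(
            f"GPU(s) {missing} not available; detected {sorted(available)}")
    total = sum(available[i] for i in selected)
    return [available[i] / total for i in selected]
```

The resulting fractions could then be forwarded to llama.cpp's `tensor_split` model parameter, e.g. `compute_tensor_split([0, 1], [(0, 24576), (1, 24576)])` yields an even `[0.5, 0.5]` split, while mixed-memory systems get a proportional one.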

Acceptance Criteria

  • Users can specify which GPUs to use via configuration
  • The engine correctly utilizes all selected GPUs

Additional Considerations

  • Ensure proper error handling for scenarios where specified GPUs are unavailable
  • Will we add this feature to model.yml for model management?
  • Does this feature work for both the CLI and the API?
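If the feature does land in model.yml, the configuration might look like the fragment below. The field names here are hypothetical and pending design, though `tensor_split` and `main_gpu` mirror real llama.cpp model parameters:

```yaml
# Hypothetical model.yml fields for multi-GPU selection; names are not final.
gpus: [0, 1]              # which GPU ids to use; omit for the single-GPU default
tensor_split: [0.6, 0.4]  # optional per-GPU workload fractions (llama.cpp tensor_split)
main_gpu: 0               # GPU that hosts small tensors and scratch buffers
```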

Metadata

Status: Completed