Increase max gpu utilization for 70b models #517
Conversation
model-engine/model_engine_server/domain/use_cases/llm_model_endpoint_use_cases.py (outdated review thread, resolved)
Force-pushed from 5a2bb87 to a59cf19
Force-pushed from bc50329 to 1e17ab4
lgtm
@@ -2198,6 +2199,27 @@ async def execute(self, user: User, request: ModelDownloadRequest) -> ModelDownl
         return ModelDownloadResponse(urls=urls)


+@dataclass
+class VLLMEngineArgs:
Hm I know this is by no means the main offender, but implementation specifics like vLLM aren't supposed to go into the use case layer. Granted, that'd require another layer, which I suspect @yunfeng-scale would find perfunctory 😁
I guess I could just call it LLMEngineArgs. It seems right now we only support batch inference w/ vLLM, so we could try to do a proper abstraction when we decide we need to support it for a different engine?
Yeah I think this is ok for now.
oh 😅 you had a good point, the current code structure doesn't completely fit into clean architecture. In that sense we might want to move all this framework-specific code to another layer
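For illustration only, here is a minimal sketch of the kind of separation the thread is gesturing at: a generic `LLMEngineArgs` container that the use case layer can depend on, with the vLLM-specific translation kept in an infrastructure/inference layer. Aside from `VLLMEngineArgs` (which appears in the diff), every name and field below is an assumption, not the repo's actual code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LLMEngineArgs:
    """Engine-agnostic knobs the use case layer is allowed to know about.

    Hypothetical sketch: field names and defaults are illustrative.
    """
    gpu_memory_utilization: float = 0.9
    max_model_len: Optional[int] = None


def to_vllm_cli_args(args: LLMEngineArgs) -> List[str]:
    """Map the generic args onto vLLM-specific CLI flags.

    In the layering discussed above, this mapping would live outside the
    use case layer so that layer never names vLLM directly.
    """
    flags = [f"--gpu-memory-utilization={args.gpu_memory_utilization}"]
    if args.max_model_len is not None:
        flags.append(f"--max-model-len={args.max_model_len}")
    return flags
```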
Pull Request Summary
What is this PR changing? Why is this change being made? Any caveats you'd like to highlight? Link any relevant documents, links, or screenshots here if applicable.
Raise max GPU memory utilization to 0.95 for 70B models in an attempt to address OOM issues.
https://linear.app/scale-epd/issue/MLI-2309/use-095-gpu-memory-utilization-for-70b-models
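As a rough illustration of the change described above (a sketch only; the helper name, the substring check, and the 0.9 default are assumptions, not the PR's actual diff):

```python
# Hypothetical sketch of bumping GPU memory utilization for large models.
def get_gpu_memory_utilization(model_name: str) -> float:
    # 70B-class models are the ones hitting OOM, so leave them less headroom.
    if "70b" in model_name.lower():
        return 0.95
    return 0.9
```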
Test Plan and Usage Guide
How did you validate that your PR works correctly? How do you run or demo the code? Provide enough detail so a reviewer can reasonably reproduce the testing procedure. Paste example command line invocations if applicable.
Published a test Docker image for batch_inference. Tested with an API request against the local gateway: job ft-cp21h54gfe6g02mlqikg