Validate quantization #315
Conversation
model-engine/model_engine_server/domain/use_cases/llm_model_endpoint_use_cases.py (two resolved review threads, now outdated)
@@ -73,5 +74,5 @@ def get_boolean_env_var(name: str) -> bool:
     logger.warning("LOCAL development & testing mode is ON")


 GIT_TAG: str = os.environ.get("GIT_TAG", "GIT_TAG_NOT_FOUND")
-if GIT_TAG == "GIT_TAG_NOT_FOUND":
+if GIT_TAG == "GIT_TAG_NOT_FOUND" and "pytest" not in sys.modules:
make pytest work without specifying GIT_TAG
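The diff above can be sketched in isolation. The exception type and the wrapping into a helper function are assumptions for illustration, since the rest of the original module is not shown; the key idea is that once pytest starts, the "pytest" module appears in sys.modules, so test runs can be detected without extra configuration:

```python
import os
import sys

# GIT_TAG normally comes from the deployment environment.
GIT_TAG: str = os.environ.get("GIT_TAG", "GIT_TAG_NOT_FOUND")


def require_git_tag(git_tag: str = GIT_TAG) -> None:
    """Fail fast on a missing GIT_TAG, except during a pytest run.

    "pytest" is present in sys.modules once the test runner has been
    imported, so tests no longer need GIT_TAG to be set. (Raising
    ValueError here is an assumption; the original exception is not shown.)
    """
    if git_tag == "GIT_TAG_NOT_FOUND" and "pytest" not in sys.modules:
        raise ValueError("GIT_TAG environment variable must be set")
```

A usage note: checking `sys.modules` avoids threading a "test mode" flag through configuration, at the cost of making the behavior implicit.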
        LLMInferenceFramework.DEEPSPEED: [],
        LLMInferenceFramework.TEXT_GENERATION_INFERENCE: [Quantization.BITSANDBYTES],
        LLMInferenceFramework.VLLM: [Quantization.AWQ],
        LLMInferenceFramework.LIGHTLLM: [],
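The mapping above drives the validation this PR adds. A self-contained sketch of how such a map could be used, with enum values and the validation function name assumed for illustration (the original use case raises a domain exception, shown here as ValueError):

```python
from enum import Enum
from typing import Optional


class Quantization(str, Enum):
    BITSANDBYTES = "bitsandbytes"
    AWQ = "awq"


class LLMInferenceFramework(str, Enum):
    DEEPSPEED = "deepspeed"
    TEXT_GENERATION_INFERENCE = "text_generation_inference"
    VLLM = "vllm"
    LIGHTLLM = "lightllm"


# Mirrors the mapping in the diff: which quantization schemes
# each inference framework supports.
SUPPORTED_QUANTIZATIONS = {
    LLMInferenceFramework.DEEPSPEED: [],
    LLMInferenceFramework.TEXT_GENERATION_INFERENCE: [Quantization.BITSANDBYTES],
    LLMInferenceFramework.VLLM: [Quantization.AWQ],
    LLMInferenceFramework.LIGHTLLM: [],
}


def validate_quantization(
    framework: LLMInferenceFramework, quantize: Optional[Quantization]
) -> None:
    """Reject endpoint creation when the requested quantization scheme
    is not supported by the chosen inference framework."""
    if quantize is not None and quantize not in SUPPORTED_QUANTIZATIONS[framework]:
        raise ValueError(
            f"Quantization {quantize} is not supported for framework {framework}. "
            f"Supported values: {SUPPORTED_QUANTIZATIONS[framework]}."
        )
```

Leaving `quantize=None` valid for every framework keeps unquantized endpoints working unchanged.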
Probably best for a separate PR, but can you update the docs to specify which models in the model zoo support LightLLM as an inference framework?
        )
        if num_shards > gpus:
            raise ObjectHasInvalidValueException(
                f"Num shard {num_shards} must be less than or equal to the number of GPUs {gpus}."
nit: could mention the inference framework in the error msg
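The nit applied to the check above might look like the following sketch; the function name and the use of ValueError (in place of the original domain exception) are assumptions for a runnable example:

```python
def check_num_shards(num_shards: int, gpus: int, framework: str) -> None:
    """Hypothetical version of the shard-count check that names the
    inference framework in the error message, per the review nit."""
    if num_shards > gpus:
        raise ValueError(
            f"Num shard {num_shards} must be less than or equal to the number "
            f"of GPUs {gpus} for framework {framework}."
        )
```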
Validate quantization values when creating endpoints