Ianmacleod/completion sync error throws 4xx #234
Conversation
@@ -636,7 +636,10 @@ def model_output_to_completion_output(
        )
    except Exception as e:
        logger.exception(f"Error parsing text-generation-inference output {model_output}")
        raise e
        if 'generated_text' not in model_output:
            raise ObjectHasInvalidValueException(
Hmm, yeah, I'm not sure it necessarily follows that a user passed a bad input just because 'generated_text' is missing from model_output. It seems like we're doing this empirically instead of going against TGI's API (not sure if they document this behavior). Perhaps the original exception that gets thrown can tell us something? @yunfeng-scale thoughts?
I don't think TGI's error behavior is documented; they just send back a string on error.
Can you change the error handling for streaming as well?
Yeah, I'll update the error handling for streaming as well.
The initial error message from TGI is a stringified JSON message: "{'error': 'Input validation error: inputs must have less than 2048 tokens. Given: 2687', 'error_type': 'validation'}". I could parse this error and return it to the user wrapped as an InputValidationError. Does that sound better? @yixu34 @yunfeng-scale
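For reference, parsing that payload could look roughly like the sketch below, assuming TGI actually returns valid JSON on error (the single quotes above are just how the dict rendered in this comment; this is illustrative, not the PR's code):

import json

# Example TGI error body, shaped like the message quoted above.
raw = '{"error": "Input validation error: inputs must have less than 2048 tokens. Given: 2687", "error_type": "validation"}'

payload = json.loads(raw)
if payload.get("error_type") == "validation":
    # Surface the upstream validation message to the caller as a user error (4xx).
    print(f"Invalid request: {payload['error']}")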
Generally it's not ideal to try to parse strings; it's preferable to rely on structured fields. But in the absence of that, I think this is fine for now. It's at least more precise than inferring semantics the way this PR currently does.
I'd be in favor of doing the parse, putting in a TODO, and revisiting this.
I think we could try to parse it as an error and return more precise content. The current error message, "Error parsing text-generation-inference output", doesn't really capture what the error is and might confuse the user.
Okay, added this logic for completion sync.
model-engine/model_engine_server/domain/use_cases/llm_model_endpoint_use_cases.py
logger.exception(f"Error parsing text-generation-inference output {model_output}. Error message: {json.loads(e)['error']}") | ||
if 'generated_text' not in model_output: | ||
raise ObjectHasInvalidValueException( | ||
f"Error parsing text-generation-inference output {model_output}. Error message: {json.loads(e)['error']}" |
My suggestion is:

if model_output.get("error_type") == "validation":
    raise InvalidRequestException(model_output.get("error"))  # trigger a 400
else:
    raise UpstreamServiceError(model_output.get("error"))  # also change llms_v1.py so it returns a 500 HTTPException and the user can retry
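For context, a rough sketch (not the repo's actual code) of how llms_v1.py could translate these domain exceptions into HTTP responses, assuming FastAPI; the helper name translate_llm_exception is made up for illustration, and the import paths follow the discussion above and may differ:

from fastapi import HTTPException

from model_engine_server.domain.exceptions import (
    InvalidRequestException,
    UpstreamServiceError,
)


def translate_llm_exception(exc: Exception) -> HTTPException:
    # Map domain exceptions to the status codes discussed in this thread.
    if isinstance(exc, InvalidRequestException):
        # Bad user input, e.g. prompt exceeds the model's context window -> 400
        return HTTPException(status_code=400, detail=str(exc))
    if isinstance(exc, UpstreamServiceError):
        # Non-validation TGI failure -> 500 so the caller knows it can retry
        return HTTPException(status_code=500, detail=str(exc))
    # Fallback for anything unexpected
    return HTTPException(status_code=500, detail="Internal error")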
@@ -57,3 +57,8 @@ class ReadOnlyDatabaseException(DomainException):
    """
    Thrown if the server attempted to write to a read-only database.
    """


class InvalidRequestException(DomainException):
It already exists in model_engine_server.domain.exceptions.
Not sure what the difference is between model_engine_server.core.domain_exceptions and model_engine_server.domain.exceptions, though.
Oh, sorry about that. I'll import from model_engine_server.domain.exceptions instead; I can't think of a reason why we'd need this in two places.
Actually, we should probably look into why we have two different sets of exceptions and combine them if possible. I'll look into that.
)
'''
if model_output.get("error_type") == "validation":
    raise InvalidRequestException(model_output.get("error"))  # trigger a 400
I think some changes are needed in llms_v1.py. Are you able to repro the 400 validation error with the current changes?
Yeah, added them so that we handle and return a 400 HTTPException.
if "error" in result: | ||
logger.exception(f"Error parsing text-generation-inference stream output. Error message: {e}") | ||
raise ObjectHasInvalidValueException( | ||
f"Error parsing text-generation-inference stream output." |
I would expect the same error-processing logic here as in the sync use case.
Updated to reflect that now.
raise InvalidRequestException or UpstreamServiceError?
@@ -803,8 +813,10 @@ async def execute(
        ObjectNotAuthorizedException: If the owner does not own the model endpoint.
    """

    logging.debug("do we reach execute()?")
remove
if "error" in result: | ||
logger.exception(f"Error parsing text-generation-inference stream output. Error message: {e}") | ||
raise ObjectHasInvalidValueException( | ||
f"Error parsing text-generation-inference stream output." |
raise InvalidRequestException or UpstreamServiceError?
if model_output.get("error_type") == "validation":
    raise InvalidRequestException(model_output.get("error"))  # trigger a 400
else:
    raise UpstreamServiceError(model_output.get("error"))  # also change llms_v1.py that will return a 500 HTTPException so user can retry
Can you double-check that 500s with proper error messages are returned in this code path?
I'm sorry for re-requesting a review pre-emptively (I should've cleaned up the code better). I'll fix these issues and then reach back out. Sorry @yunfeng-scale, and thanks for the comments!
model-engine/model_engine_server/domain/use_cases/llm_model_endpoint_use_cases.py
    async for message in response:
        yield {"data": message.json()}
except InvalidRequestException as exc:
    yield {
Can you update the docs about how to deal with errors when streaming?
yes, right now it looks like:
stream = Completion.create(
    model="llama-2-7b",
    prompt="Give me a 200 word summary on the current economic events in the US.",
    max_new_tokens=1000,
    temperature=0.2,
    stream=True,
)

for response in stream:
    try:
        if response.output:
            print(response.output.text, end="")
            sys.stdout.flush()
    except:  # an error occurred
        print(stream.text)  # print the error message out
I will add this directly to the llm-engine home page doc here: https://llm-engine.scale.com/getting_started/ and also here https://llm-engine.scale.com/guides/completions/
It sounds like you want to add them in a separate PR, which is fine. Be aware that everything is in the same repo now.
I just added the docs changes in this same PR.
    async for message in response:
        yield {"data": message.json()}
except InvalidRequestException as exc:
    yield {
It sounds like you want to add them in a separate PR, which is fine. Be aware that everything is in the same repo now.
            print(response.output.text, end="")
            sys.stdout.flush()
    except:  # an error occurred
        print(stream.text)  # print the error message out
Hmm, this doesn't seem right to me. Shouldn't we print response.error or something? Printing from stream doesn't feel right.
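A possible revision of the docs snippet along those lines, assuming the llmengine client's Completion.create as in the snippet above and an error field on the streamed chunk (the error attribute name is an assumption, not confirmed in this thread):

import sys

from llmengine import Completion

stream = Completion.create(
    model="llama-2-7b",
    prompt="Give me a 200 word summary on the current economic events in the US.",
    max_new_tokens=1000,
    temperature=0.2,
    stream=True,
)

for response in stream:
    if response.output:
        print(response.output.text, end="")
        sys.stdout.flush()
    elif response.error:  # assumed error field on the streamed chunk
        print(f"Request failed: {response.error}", file=sys.stderr)
        break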
Fixing errors so that we don't run into so many 5xx errors when TGI returns an issue due to the input prompt window being too large.

Tested with just up locally to make sure behavior is consistent when calling with a prompt that exceeds the max number of tokens specified by the TGI tokenizer.