Conversation

ian-scale
Contributor

@ian-scale ian-scale commented Aug 29, 2023

Fixing error handling so that we don't return so many 5xx errors when TGI reports an issue due to the input prompt exceeding the model's context window.

Tested with `just up` locally to make sure behavior is consistent when calling with a prompt that exceeds the max number of tokens specified by the TGI tokenizer.
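
For reference, this is roughly the kind of call used to reproduce the failure locally (a sketch; the model name and prompt are illustrative, not from this PR):

```python
from llmengine import Completion  # llm-engine Python client

# A prompt long enough to exceed the endpoint's max input tokens (e.g. 2048 for this TGI config).
too_long_prompt = "word " * 4000

response = Completion.create(
    model="llama-2-7b",
    prompt=too_long_prompt,
    max_new_tokens=10,
)
# Before this change: a generic 5xx. After: a 400 carrying TGI's validation message.
```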

@@ -636,7 +636,10 @@ def model_output_to_completion_output(
    )
except Exception as e:
    logger.exception(f"Error parsing text-generation-inference output {model_output}")
    raise e
if 'generated_text' not in model_output:
    raise ObjectHasInvalidValueException(
Member

Hmm, yeah, I'm not sure it necessarily follows that a user passed a bad input if we observe `'generated_text' not in model_output`. It seems like we're doing this empirically instead of going off TGI's API (not sure if they document this behavior). Perhaps the original exception that gets thrown can tell us something? @yunfeng-scale thoughts?

Contributor

I don't think TGI's error behavior is documented; they just send back a string when there's an error.

Can you change the error handling for streaming as well?

Contributor Author

Yeah, I'll update the error handling for streaming as well.

Contributor Author

The initial error message from TGI is a stringified JSON message: `{'error': 'Input validation error: inputs must have less than 2048 tokens. Given: 2687', 'error_type': 'validation'}`. I could parse this error and return it to the user wrapped as an InputValidationError? Does that sound better? @yixu34 @yunfeng-scale
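
Roughly, a sketch of what that parsing could look like (assuming the raw TGI response body is valid JSON -- the single-quoted form above is just the Python repr of the decoded payload -- and with `InputValidationError` as the proposed wrapper, not an existing class):

```python
import json


class InputValidationError(Exception):
    """Proposed wrapper for upstream validation failures (illustrative)."""


def raise_if_tgi_error(raw_body: str) -> None:
    # TGI validation failures look like: {"error": "...", "error_type": "validation"}
    payload = json.loads(raw_body)
    if payload.get("error_type") == "validation":
        raise InputValidationError(payload["error"])
```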

Member

Generally it's not ideal to try and parse strings; it's preferable to rely on structured fields. But in the absence of that, I think this is fine for now. It's at least more precise than inferring semantics, which is what's currently being done in this PR.

I'd be in favor of doing the parse, putting a TODO, and revisiting this.

Contributor

I think we could try to parse it as an error and return more precise content. The current error `Error parsing text-generation-inference output` does not really capture what the error is and might confuse the user.

Contributor Author

Okay, added this logic for completion sync:

logger.exception(f"Error parsing text-generation-inference output {model_output}. Error message: {json.loads(e)['error']}")
if 'generated_text' not in model_output:
    raise ObjectHasInvalidValueException(
        f"Error parsing text-generation-inference output {model_output}. Error message: {json.loads(e)['error']}"
    )
Contributor

@yunfeng-scale yunfeng-scale Aug 30, 2023

My suggestion is:

if model_output.get("error_type") == "validation":
    raise InvalidRequestException(model_output.get("error"))  # trigger a 400
else:
    raise UpstreamServiceError(model_output.get("error"))  # also change llms_v1.py so that it returns a 500 HTTPException and the user can retry

@@ -57,3 +57,8 @@ class ReadOnlyDatabaseException(DomainException):
    """
    Thrown if the server attempted to write to a read-only database.
    """


class InvalidRequestException(DomainException):
Contributor

It already exists in `model_engine_server.domain.exceptions`.

Contributor

Not sure about the difference between `model_engine_server.core.domain_exceptions` and `model_engine_server.domain.exceptions`, though.

Contributor Author

Oh, sorry about that -- I'll import from `model_engine_server.domain.exceptions`; I can't think of a reason why we'd need this in two places.

Contributor Author

Actually, we should probably look into why we have two different sets of exceptions and combine them if possible. I'll look into that.

)
'''
if model_output.get("error_type") == "validation":
    raise InvalidRequestException(model_output.get("error"))  # trigger a 400
Contributor

I think some changes are needed in llms_v1.py; are you able to repro a 400 validation error with the current changes?

Contributor Author

Yeah, added them so that we handle this and return a 400 HTTPException.
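
Roughly, the llms_v1.py side maps the domain exceptions to status codes like this (a sketch with illustrative handler and argument names, not the exact diff):

```python
from fastapi import HTTPException

from model_engine_server.domain.exceptions import InvalidRequestException  # per the review, already defined here


class UpstreamServiceError(Exception):
    """Stand-in for the upstream-error exception used in the PR; actual import path not shown here."""


async def create_completion_sync_task(request, use_case):
    try:
        return await use_case.execute(request=request)
    except InvalidRequestException as exc:
        # Bad input (e.g. prompt over TGI's token limit) -> 400; retrying the same request won't help.
        raise HTTPException(status_code=400, detail=str(exc)) from exc
    except UpstreamServiceError as exc:
        # Upstream/TGI failure -> 500; the user can retry.
        raise HTTPException(status_code=500, detail=str(exc)) from exc
```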

if "error" in result:
logger.exception(f"Error parsing text-generation-inference stream output. Error message: {e}")
raise ObjectHasInvalidValueException(
f"Error parsing text-generation-inference stream output."
Contributor

I would expect the same error processing logic here as in the sync use case.

Contributor Author

Updated to reflect that now.

Contributor

raise InvalidRequestException or UpstreamServiceError?

@@ -803,8 +813,10 @@ async def execute(
        ObjectNotAuthorizedException: If the owner does not own the model endpoint.
    """

    logging.debug("do we reach execute()?")
Contributor

remove

if "error" in result:
logger.exception(f"Error parsing text-generation-inference stream output. Error message: {e}")
raise ObjectHasInvalidValueException(
f"Error parsing text-generation-inference stream output."
Contributor

raise InvalidRequestException or UpstreamServiceError?

if model_output.get("error_type") == "validation":
    raise InvalidRequestException(model_output.get("error"))  # trigger a 400
else:
    raise UpstreamServiceError(model_output.get("error"))  # also change llms_v1.py so that it returns a 500 HTTPException and the user can retry
Contributor

Can you double-check that 500s with proper error messages are returned in this code path?

Contributor Author

@ian-scale ian-scale Aug 31, 2023

I'm sorry for re-requesting a review pre-emptively (I should've cleaned up the code better). I'll fix these issues and then reach back out. Sorry @yunfeng-scale, and thanks for the comments!

@ian-scale
Contributor Author

Error messages now look like this for completions-sync and completions-stream:

[Screenshot 2023-09-01 at 11:27:31 AM: error message output for completions-sync and completions-stream]

    async for message in response:
        yield {"data": message.json()}
except InvalidRequestException as exc:
    yield {
Contributor

Can you update the docs on how to deal with errors when streaming?

Contributor Author

@ian-scale ian-scale Sep 1, 2023

Yes, right now it looks like:

stream = Completion.create(
    model="llama-2-7b",
    prompt="Give me a 200 word summary on the current economic events in the US.",
    max_new_tokens=1000,
    temperature=0.2,
    stream=True,
)

for response in stream:
    try:
        if response.output:
            print(response.output.text, end="")
            sys.stdout.flush()
    except:  # an error occurred
        print(stream.text)  # print the error message out

I will add this directly to the llm-engine home page doc here: https://llm-engine.scale.com/getting_started/ and also here https://llm-engine.scale.com/guides/completions/

Contributor

It sounds like you want to add them in a separate PR, which is fine. Be aware everything is in the same repo now.

Contributor Author

@ian-scale ian-scale Sep 1, 2023

I just added the docs changes in this same PR.

@ian-scale ian-scale enabled auto-merge (squash) September 1, 2023 19:46
@ian-scale ian-scale merged commit b5cf6a9 into main Sep 1, 2023
@ian-scale ian-scale deleted the ianmacleod/completion_sync_error_throws_4xx branch September 1, 2023 19:52
            print(response.output.text, end="")
            sys.stdout.flush()
    except:  # an error occurred
        print(stream.text)  # print the error message out
Contributor

Hmm, this doesn't seem right to me; shouldn't we print `response.error` or something? Printing from `stream` doesn't feel right.
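
For illustration, continuing the `Completion.create(..., stream=True)` example above, the shape being suggested would look something like this (assuming the streamed response object exposes an `error` field, which this PR does not confirm):

```python
import sys

for response in stream:
    if response.output:
        print(response.output.text, end="")
        sys.stdout.flush()
    elif response.error:  # assumed field; check the actual streaming response schema
        print(response.error)
        break
```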
