Ianmacleod/completion sync error throws 4xx #234
Conversation
@@ -636,7 +636,10 @@ def model_output_to_completion_output(
        )
    except Exception as e:
        logger.exception(f"Error parsing text-generation-inference output {model_output}")
        raise e
        if 'generated_text' not in model_output:
            raise ObjectHasInvalidValueException(
Hmm, yeah, I'm not sure it necessarily follows that a user passed a bad input just because 'generated_text' is missing from model_output. It seems like we're doing this empirically instead of going against TGI's API (not sure if they document this behavior). Perhaps the original exception that gets thrown can tell us something? @yunfeng-scale thoughts?
I don't think TGI's error behavior is documented; they just send back a string on error.
Can you change the error handling for streaming as well?
Yeah, I'll update the error handling for streaming as well.
The initial error message from TGI is a stringified JSON message: "{'error': 'Input validation error: inputs must have less than 2048 tokens. Given: 2687', 'error_type': 'validation'}". I could parse this error and return it to the user wrapped as an InputValidationError. Does that sound better? @yixu34 @yunfeng-scale
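For reference, parsing that payload could look roughly like the sketch below, assuming TGI actually returns valid JSON on error (the single quotes above are just how the dict rendered in this comment; this is illustrative, not the PR's code):

import json

# Example TGI error body, shaped like the message quoted above.
raw = '{"error": "Input validation error: inputs must have less than 2048 tokens. Given: 2687", "error_type": "validation"}'

payload = json.loads(raw)
if payload.get("error_type") == "validation":
    # Surface the upstream validation message to the caller as a user error (4xx).
    print(f"Invalid request: {payload['error']}")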
Generally it's not ideal to try to parse strings; it's preferable to rely on structured fields. But in the absence of that, I think this is fine for now. It's at least more precise than inferring semantics the way this PR currently does.
I'd be in favor of doing the parse, putting in a TODO, and revisiting this.
I think we could try to parse it as an error and return more precise content. The current error message, "Error parsing text-generation-inference output", doesn't really capture what the error is and might confuse the user.
Okay, added this logic for completion sync.
model-engine/model_engine_server/domain/use_cases/llm_model_endpoint_use_cases.py
logger.exception(f"Error parsing text-generation-inference output {model_output}. Error message: {json.loads(e)['error']}") | ||
if 'generated_text' not in model_output: | ||
raise ObjectHasInvalidValueException( | ||
f"Error parsing text-generation-inference output {model_output}. Error message: {json.loads(e)['error']}" |
My suggestion is:

if model_output.get("error_type") == "validation":
    raise InvalidRequestException(model_output.get("error"))  # trigger a 400
else:
    raise UpstreamServiceError(model_output.get("error"))  # also change llms_v1.py so it returns a 500 HTTPException and the user can retry
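For context, a rough sketch (not the repo's actual code) of how llms_v1.py could translate these domain exceptions into HTTP responses, assuming FastAPI; the helper name translate_llm_exception is made up for illustration, and the import paths follow the discussion above and may differ:

from fastapi import HTTPException

from model_engine_server.domain.exceptions import (
    InvalidRequestException,
    UpstreamServiceError,
)


def translate_llm_exception(exc: Exception) -> HTTPException:
    # Map domain exceptions to the status codes discussed in this thread.
    if isinstance(exc, InvalidRequestException):
        # Bad user input, e.g. prompt exceeds the model's context window -> 400
        return HTTPException(status_code=400, detail=str(exc))
    if isinstance(exc, UpstreamServiceError):
        # Non-validation TGI failure -> 500 so the caller knows it can retry
        return HTTPException(status_code=500, detail=str(exc))
    # Fallback for anything unexpected
    return HTTPException(status_code=500, detail="Internal error")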
@@ -57,3 +57,8 @@ class ReadOnlyDatabaseException(DomainException):
    """
    Thrown if the server attempted to write to a read-only database.
    """


class InvalidRequestException(DomainException):
It already exists in model_engine_server.domain.exceptions.
Not sure what the difference is between model_engine_server.core.domain_exceptions and model_engine_server.domain.exceptions, though.
Oh, sorry about that. I'll import from model_engine_server.domain.exceptions instead; I can't think of a reason why we'd need this in two places.
Actually, we should probably look into why we have two different sets of exceptions and combine them if possible. I'll look into that.
)
'''
if model_output.get("error_type") == "validation":
    raise InvalidRequestException(model_output.get("error"))  # trigger a 400
I think some changes are needed in llms_v1.py. Are you able to repro the 400 validation error with the current changes?
Yeah, added them so that we handle and return a 400 HTTPException.
if "error" in result: | ||
logger.exception(f"Error parsing text-generation-inference stream output. Error message: {e}") | ||
raise ObjectHasInvalidValueException( | ||
f"Error parsing text-generation-inference stream output." |
I would expect the same error-processing logic here as in the sync use case.
Updated to reflect that now.
raise InvalidRequestException or UpstreamServiceError?
@@ -803,8 +813,10 @@ async def execute(
        ObjectNotAuthorizedException: If the owner does not own the model endpoint.
    """

    logging.debug("do we reach execute()?")
remove
if "error" in result: | ||
logger.exception(f"Error parsing text-generation-inference stream output. Error message: {e}") | ||
raise ObjectHasInvalidValueException( | ||
f"Error parsing text-generation-inference stream output." |
raise InvalidRequestException or UpstreamServiceError?
if model_output.get("error_type") == "validation":
    raise InvalidRequestException(model_output.get("error"))  # trigger a 400
else:
    raise UpstreamServiceError(model_output.get("error"))  # also change llms_v1.py that will return a 500 HTTPException so user can retry
Can you double-check that 500s with proper error messages are returned in this code path?
I'm sorry for re-requesting a review pre-emptively (I should've cleaned up the code better). I'll fix these issues and then reach back out. Sorry @yunfeng-scale, and thanks for the comments!
model-engine/model_engine_server/domain/use_cases/llm_model_endpoint_use_cases.py
    async for message in response:
        yield {"data": message.json()}
except InvalidRequestException as exc:
    yield {
Can you update the docs about how to deal with errors when streaming?
yes, right now it looks like:
stream = Completion.create(
    model="llama-2-7b",
    prompt="Give me a 200 word summary on the current economic events in the US.",
    max_new_tokens=1000,
    temperature=0.2,
    stream=True,
)

for response in stream:
    try:
        if response.output:
            print(response.output.text, end="")
            sys.stdout.flush()
    except:  # an error occurred
        print(stream.text)  # print the error message out
I will add this directly to the llm-engine home page doc here: https://llm-engine.scale.com/getting_started/ and also here https://llm-engine.scale.com/guides/completions/
It sounds like you want to add them in a separate PR, which is fine. Be aware that everything is in the same repo now.
I just added the docs changes in this same PR.
    async for message in response:
        yield {"data": message.json()}
except InvalidRequestException as exc:
    yield {
It sounds like you want to add them in a separate PR, which is fine. Be aware that everything is in the same repo now.
            print(response.output.text, end="")
            sys.stdout.flush()
    except:  # an error occurred
        print(stream.text)  # print the error message out
Hmm, this doesn't seem right to me. Shouldn't we print response.error or something? Printing from stream doesn't feel right.
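A possible revision of the docs snippet along those lines, assuming the llmengine client's Completion.create as in the snippet above and an error field on the streamed chunk (the error attribute name is an assumption, not confirmed in this thread):

import sys

from llmengine import Completion

stream = Completion.create(
    model="llama-2-7b",
    prompt="Give me a 200 word summary on the current economic events in the US.",
    max_new_tokens=1000,
    temperature=0.2,
    stream=True,
)

for response in stream:
    if response.output:
        print(response.output.text, end="")
        sys.stdout.flush()
    elif response.error:  # assumed error field on the streamed chunk
        print(f"Request failed: {response.error}", file=sys.stderr)
        break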
Fixing errors so that we don't run into so many 5xx errors when TGI returns an issue due to the input prompt window being too large.

Tested with just up locally to make sure behavior is consistent when calling with a prompt that exceeds the max number of tokens specified by the TGI tokenizer.