
renaming of mxnet-model-server in sagemaker-inference package 1.5.3 causing entrypoint with command serve to fail #88

Open
@RZachLamberty

Description


Describe the bug
sagemaker-inference recently (10/15) released v1.5.3, which included this commit updating the name of the model server artifact and command from mxnet-model-server to multi-model-server.

all containers defined in this repository install sagemaker-inference as a transitive dependency of this repo's own package, via the line

RUN pip install --no-cache-dir "sagemaker-pytorch-inference<2"

and this repo's setup.py has an install_requires entry of sagemaker-inference>=1.3.1. as a result, sagemaker-inference==1.5.3 gets installed.

so while the Dockerfile's CMD value (which calls mxnet-model-server directly) will succeed, attempts to use the ENTRYPOINT with serve as a runtime argument fail with the message:

Traceback (most recent call last):
  File "/usr/local/bin/dockerd-entrypoint.py", line 22, in <module>
    serving.main()
  File "/opt/conda/lib/python3.6/site-packages/sagemaker_pytorch_serving_container/serving.py", line 39, in main
    _start_model_server()
  File "/opt/conda/lib/python3.6/site-packages/retrying.py", line 49, in wrapped_f
    return Retrying(*dargs, **dkw).call(f, *args, **kw)
  File "/opt/conda/lib/python3.6/site-packages/retrying.py", line 206, in call
    return attempt.get(self._wrap_exception)
  File "/opt/conda/lib/python3.6/site-packages/retrying.py", line 247, in get
    six.reraise(self.value[0], self.value[1], self.value[2])
  File "/opt/conda/lib/python3.6/site-packages/six.py", line 703, in reraise
    raise value
  File "/opt/conda/lib/python3.6/site-packages/retrying.py", line 200, in call
    attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
  File "/opt/conda/lib/python3.6/site-packages/sagemaker_pytorch_serving_container/serving.py", line 35, in _start_model_server
    model_server.start_model_server(handler_service=HANDLER_SERVICE)
  File "/opt/conda/lib/python3.6/site-packages/sagemaker_inference/model_server.py", line 94, in start_model_server
    subprocess.Popen(multi_model_server_cmd)
  File "/opt/conda/lib/python3.6/subprocess.py", line 709, in __init__
    restore_signals, start_new_session)
  File "/opt/conda/lib/python3.6/subprocess.py", line 1344, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'multi-model-server': 'multi-model-server'
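the bottom frame is just the standard error subprocess raises when the named executable is not on PATH; a minimal sketch (using a deliberately nonexistent command name, so it reproduces the error anywhere) shows the same failure mode:

```python
import subprocess

# Popen raises FileNotFoundError (errno 2, ENOENT) when the executable is not
# on PATH -- exactly what happens in the container, where the binary is still
# installed under its old name mxnet-model-server. The command name below is
# deliberately nonexistent so the demo fails the same way on any machine.
try:
    subprocess.Popen(["multi-model-server-nonexistent-demo"])
except FileNotFoundError as err:
    print(err.errno)  # 2 (ENOENT: no such file or directory)
```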

To reproduce

  1. build any container
  2. mount a model and inference.py (e.g. half_plus_three) into /opt/ml/model
  3. docker run [tag name] serve

Expected behavior
the model server serves the mounted model / inference.py (the template's "tensorflow serving" wording doesn't apply here — this is the PyTorch serving container)
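one possible stopgap (assuming the binary rename in 1.5.3 is the only incompatibility, which I have not verified) is to pin sagemaker-inference below 1.5.3 when building the image, e.g.:

```dockerfile
# hypothetical workaround, not an official fix: pin sagemaker-inference below
# the release that renamed mxnet-model-server to multi-model-server
RUN pip install --no-cache-dir "sagemaker-pytorch-inference<2" "sagemaker-inference>=1.3.1,<1.5.3"
```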

System information
A description of your system. Please provide:

  • Toolkit version: 2.0.5, but should apply to all versions
  • Framework version: 1.4, but should apply to all versions
  • Python version: 3.7
  • CPU or GPU: cpu, but should apply to both
  • Custom Docker image (Y/N): N
