fix: Fixing issue #82 #83

Merged (2 commits) on Jul 9, 2021
1 change: 1 addition & 0 deletions src/sagemaker_inference/model_server.py
@@ -150,6 +150,7 @@ def _generate_mms_config_properties():
"default_workers_per_model": env.model_server_workers,
"inference_address": "http://0.0.0.0:{}".format(env.inference_http_port),
"management_address": "http://0.0.0.0:{}".format(env.management_http_port),
"vmargs": "-XX:-UseContainerSupport",
Contributor:
Wondering if we should make this configurable via an SM env variable? Not sure if it would require additional changes anywhere else. +@dhanainme
"vmargs": env.vmargs if env.vmargs else "-XX:-UseContainerSupport"

Contributor:

+1. TS_VM_ARGS could be the env variable we pick this up from.
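
If that proposal were adopted, a caller could pass the variable through the container environment from the SageMaker Python SDK, roughly as in the sketch below. TS_VM_ARGS is only a proposed name at this point, and the model values are placeholders, not anything from this PR:

    from sagemaker.pytorch import PyTorchModel

    # Placeholder values; TS_VM_ARGS is the proposed (not yet supported) variable.
    model = PyTorchModel(
        model_data="s3://my-bucket/model.tar.gz",
        role="arn:aws:iam::123456789012:role/SageMakerRole",
        entry_point="inference.py",
        framework_version="1.7.1",
        py_version="py3",
        env={"TS_VM_ARGS": "-XX:-UseContainerSupport"},
    )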

Contributor:

This change needs to happen here in this file

self._module_name = os.environ.get(parameters.USER_PROGRAM_ENV, DEFAULT_MODULE_NAME)

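A rough sketch of what that could look like in the environment module, assuming the variable is named TS_VM_ARGS as suggested above and a new vmargs property is added to the Environment class; neither the variable name nor the property exists in the toolkit yet:

    import os

    # Default mirrors the flag added by this PR.
    DEFAULT_VMARGS = "-XX:-UseContainerSupport"


    class Environment(object):
        def __init__(self):
            # ... existing attributes such as self._module_name ...
            # Assumed variable name: pick up JVM arguments from the environment,
            # falling back to the default above when it is not set.
            self._vmargs = os.environ.get("TS_VM_ARGS", DEFAULT_VMARGS)

        @property
        def vmargs(self):
            return self._vmargs

With something like this in place, _generate_mms_config_properties could set "vmargs": env.vmargs, matching the suggestion earlier in this thread.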

We have tried the fix in this PR, which should solve #82, but we see no difference in the number of CPUs logged in CloudWatch. So not sure if more changes are involved, but this fix as a separate change does not seem to solve the issue. The container we use (PyTorch 1.7.1, TorchServe 0.4.0) runs JDK 11, where UseContainerSupport is enabled by default.

Contributor Author:

I think it won't work for PyTorch >= 1.6 containers, since those use the TorchServe model server rather than MMS.


The reason this doesn't fix PT >= 1.6 is that the PyTorch inference toolkit needs a similar fix.

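For reference, the analogous change in the PyTorch inference toolkit would add the same entry where that toolkit builds TorchServe's config.properties. The helper name and dict layout below are assumed for illustration only, mirroring the _generate_mms_config_properties hunk shown in this PR:

    def _generate_ts_config_properties():
        # Assumed helper in the PyTorch toolkit, mirroring the MMS version above.
        env = environment.Environment()

        properties = {
            "default_workers_per_model": env.model_server_workers,
            "inference_address": "http://0.0.0.0:{}".format(env.inference_http_port),
            "management_address": "http://0.0.0.0:{}".format(env.management_http_port),
            # Same flag as added in this PR, applied to the TorchServe JVM.
            "vmargs": "-XX:-UseContainerSupport",
        }
        # ... remainder as in the existing helper ...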
}

custom_configuration = str()