This repository was archived by the owner on Jun 10, 2024. It is now read-only.

Feature request: implement VPF <-> Triton interoperability #207

Open
@rarzumanyan

Description

The description below is taken from #205 by @philipp-schmidt.

Triton is basically a server that runs your ML models and exposes HTTP or gRPC endpoints for them. Clients send a request, the server invokes the model, and the results come back without the client having to deal with frameworks, implementation details, GPU acceleration, etc. In my opinion it's by far the best tool for scaling ML workloads.

To send a request to the server, there are basically three options for passing the data (e.g. an image to analyze via object detection):

  • pass the data in the HTTP or gRPC payload (the only option if the client is not on the same compute node, i.e. the data travels over the network)
  • register a shared memory region on the host (CPU) that holds the data and pass a reference to it
  • register a shared memory region ON the GPU that holds the data and pass a reference to it

Obviously performance improves as you go down the list, since the number of memory copy operations drops and reaches essentially zero once GPU memory is shared. A minimal sketch of the GPU shared-memory path with the current Triton Python client is shown below.
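
For context, here is roughly what the CUDA shared-memory path looks like with the existing Triton Python client (tritonclient). This is only a sketch: the model name "detector", the input tensor name "input", and the tensor shape/dtype are placeholders, and the frame still originates as a host-side NumPy array here, which is exactly the copy a VPF integration would avoid.

```python
import numpy as np
import tritonclient.grpc as grpcclient
import tritonclient.utils.cuda_shared_memory as cudashm
from tritonclient.utils import np_to_triton_dtype

# Placeholder tensor standing in for a preprocessed video frame (NCHW, float32).
frame = np.zeros((1, 3, 416, 416), dtype=np.float32)
byte_size = frame.nbytes

client = grpcclient.InferenceServerClient("localhost:8001")

# Allocate a CUDA shared memory region, copy the data into it,
# and register it with the server under a name.
shm_handle = cudashm.create_shared_memory_region("input_region", byte_size, 0)
cudashm.set_shared_memory_region(shm_handle, [frame])
client.register_cuda_shared_memory(
    "input_region", cudashm.get_raw_handle(shm_handle), 0, byte_size)

# The request then only references the region instead of carrying the bytes.
infer_input = grpcclient.InferInput("input", list(frame.shape),
                                    np_to_triton_dtype(frame.dtype))
infer_input.set_shared_memory("input_region", byte_size)
result = client.infer("detector", inputs=[infer_input])

client.unregister_cuda_shared_memory("input_region")
cudashm.destroy_shared_memory_region(shm_handle)
```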

It would obviously be very useful if VPF could simply hand Triton Inference Server the pointers to the decoded data in GPU memory. This requires VPF to ship a solid set of GPU processing stages (like the existing scaling and color conversion operations), because most networks expect a very specific input format and the conversion needs to happen on the GPU. Batching (processing multiple images per request) might complicate things further. A sketch of where VPF already exposes the relevant GPU pointers follows.
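
For reference, VPF already exposes the raw device pointer of a decoded and color-converted surface through its Python bindings. Below is a hedged sketch using PyNvCodec; "input.mp4" is a placeholder, and depending on the VPF version PySurfaceConverter.Execute() may take only the source surface rather than a surface plus a ColorspaceConversionContext.

```python
import PyNvCodec as nvc

gpu_id = 0
dec = nvc.PyNvDecoder("input.mp4", gpu_id)

# NV12 -> RGB conversion stays on the GPU; recent VPF builds expect a
# ColorspaceConversionContext alongside the surface in Execute().
conv = nvc.PySurfaceConverter(dec.Width(), dec.Height(),
                              nvc.PixelFormat.NV12, nvc.PixelFormat.RGB, gpu_id)
cc_ctx = nvc.ColorspaceConversionContext(nvc.ColorSpace.BT_601, nvc.ColorRange.MPEG)

nv12_surf = dec.DecodeSingleSurface()
if not nv12_surf.Empty():
    rgb_surf = conv.Execute(nv12_surf, cc_ctx)

    # This is what Triton would ultimately need from VPF: a raw CUDA device
    # pointer plus pitch/size information describing the converted frame.
    plane = rgb_surf.PlanePtr()
    dev_ptr = plane.GpuMem()   # CUdeviceptr as an integer
    pitch = plane.Pitch()
    width = plane.Width()
    height = plane.Height()
    print(dev_ptr, pitch, width, height)
```

Getting that pointer into a Triton request without a detour through host memory is essentially what this feature request asks for.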
