Feature request: implement VPF <-> Triton interoperability #207
Description
The description below is taken from #205 by @philipp-schmidt.
Triton is basically a server that runs your ML networks and offers HTTP or gRPC endpoints for them. Clients can send a request, invoke the networks, and receive results without having to deal with frameworks, implementation details, or GPU acceleration themselves. It's by far the best tool for scaling ML workloads imo.
When sending a request to the server, there are basically three options for passing the data (e.g. an image to analyze via object detection):
- pass your data via the HTTP or gRPC payload (the only option if you are not on the same compute node, i.e. over the network)
- register and pass a pointer to shared memory containing the data
- register and pass a pointer to shared memory ON GPU containing the data
Obviously the performance increases in that order, as the number of memory copies drops to basically zero when sharing GPU memory (a client-side sketch of the third option follows below).
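For context, this is roughly what that third option looks like today with the tritonclient Python API. It is a minimal sketch only; the model name, input/output names, shape, and dtype are made up for illustration, and the host-side `np.zeros` fill is exactly the copy this feature request wants to avoid:

```python
import numpy as np
import tritonclient.grpc as grpcclient
import tritonclient.utils.cuda_shared_memory as cudashm

# Hypothetical model with a single 1x3x224x224 FP32 input; names and shape are
# made up for illustration.
MODEL, INPUT_NAME, OUTPUT_NAME = "detector", "input", "output"
SHAPE = [1, 3, 224, 224]
byte_size = int(np.prod(SHAPE)) * np.dtype(np.float32).itemsize

client = grpcclient.InferenceServerClient("localhost:8001")

# Allocate a CUDA shared memory region on GPU 0 and register it with the server.
shm_handle = cudashm.create_shared_memory_region("input_region", byte_size, 0)
client.register_cuda_shared_memory(
    "input_region", cudashm.get_raw_handle(shm_handle), 0, byte_size)

# In this sketch the data still originates on the host; the point of the feature
# request is to skip this step and have VPF fill the region directly on the GPU.
cudashm.set_shared_memory_region(shm_handle, [np.zeros(SHAPE, dtype=np.float32)])

# Tell Triton to read the input from the registered region instead of the payload.
infer_input = grpcclient.InferInput(INPUT_NAME, SHAPE, "FP32")
infer_input.set_shared_memory("input_region", byte_size)

result = client.infer(MODEL, inputs=[infer_input])
print(result.as_numpy(OUTPUT_NAME).shape)
```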
It would obviously be very useful if VPF could simply hand Triton Inference Server the pointers to the decoded data in GPU memory (see the VPF-side sketch below). This means VPF needs a solid set of processing layers (like the existing scaling and color conversion operations), because most networks expect a very well-defined input and that conversion has to happen on the GPU. And there might be things like batching (running inference on multiple images at once) which could make this more complicated.
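On the VPF side most of the pieces already exist. Here is a rough sketch of what the interop could look like, assuming the PyNvCodec decoder/converter API (the `Execute()` signature differs slightly between VPF versions) and using CuPy only to stand in for the device-to-device copy, since tritonclient does not expose the device pointer of its CUDA shared memory region:

```python
import PyNvCodec as nvc
import cupy as cp

gpu_id = 0
nv_dec = nvc.PyNvDecoder("input.mp4", gpu_id)

# Convert decoded NV12 surfaces to RGB on the GPU, since most detection networks
# expect packed RGB input. Newer VPF versions also take a ColorspaceConversionContext
# as a second Execute() argument.
to_rgb = nvc.PySurfaceConverter(
    nv_dec.Width(), nv_dec.Height(), nvc.PixelFormat.NV12, nvc.PixelFormat.RGB, gpu_id)

surface = nv_dec.DecodeSingleSurface()
if surface.Empty():
    raise RuntimeError("decoder returned an empty surface")
rgb_surface = to_rgb.Execute(surface)

# VPF already exposes the raw CUDA device pointer of a surface plane; this is the
# pointer that would need to be handed to (or copied into) Triton's CUDA region.
plane = rgb_surface.PlanePtr()
src_ptr = plane.GpuMem()
num_bytes = plane.Pitch() * plane.Height()

# Stand-in for the device pointer of the region registered with Triton in the
# client sketch above; a plain CuPy allocation is used here just to demonstrate
# the device-to-device copy with no host round trip.
dst = cp.cuda.alloc(num_bytes)
cp.cuda.runtime.memcpy(dst.ptr, src_ptr, num_bytes,
                       cp.cuda.runtime.memcpyDeviceToDevice)
```

Note that VPF surfaces are pitch-aligned, so a real interop layer would also have to strip the row padding and handle layout conversion, normalization, and batching before the network sees the data, which is exactly the set of processing layers mentioned above.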