Consider an RTC-style web app concurrently running a collection of ML workloads with various response requirements. Assume there aren't capacity issues (the system as a whole has enough horsepower to execute the workloads on average).
| Workload | Period/response requirement | Job completion delay |
|---|---|---|
| Audio capture processing (denoising, etc.) | 50 Hz | 25% |
| Video capture processing (e.g. background blur) | 30 Hz | 10-30% |
| WebSpeech | 500ms? | (continuous) |
| LLM | Bursty | Bursty |
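To make the budgets concrete (assuming the percentages are per-job latency budgets relative to the period, which is my reading rather than something stated above), the deadlines work out to just a few milliseconds:

```ts
// Assumption: "job completion delay" is a per-job latency budget expressed as
// a fraction of the workload's period. Names here are illustrative only.
function deadlineMs(periodHz: number, budgetFraction: number): number {
  const periodMs = 1000 / periodHz;
  return periodMs * budgetFraction;
}

console.log(deadlineMs(50, 0.25)); // audio capture: 20 ms period -> 5 ms per job
console.log(deadlineMs(30, 0.10)); // video capture: ~33 ms period -> ~3.3 ms
console.log(deadlineMs(30, 0.30)); // video capture: ~33 ms period -> ~10 ms
```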
If WebNN is not made aware of these requirements, the concern is that the UA will make poor scheduling decisions and the system will miss deadlines. The effects include:
- robo-sound
- audio glitches
- dropped video frames
- janky video
- increased end-to-end delay (since receiver jitter buffers need to bump their target delay to account for the problems).
The ideal state is that the system understands these requirements. With the right scheduling decisions, LLM queries may be delayed a little and WebSpeech latency may go up somewhat, but as a whole the system meets expectations.
Assuming the backend frameworks are capable of workload prioritization, I see a couple of options to help this case:
- Equip the WebNN API with hints describing the workload being passed (a rough sketch follows this list).
- Auto-detect the type of processing. For example, when audio or video capture wakes a worker, it could carry a token containing periodicity info and let that influence scheduling decisions in the WebNN backend.
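To make the first option concrete, here is a minimal sketch of what such a hint could look like at context creation. Everything named `WorkloadHint`, `workloadHint`, `targetPeriodMs`, and `deadlineFraction` is hypothetical and not in the current WebNN spec; it only illustrates the kind of information the UA/backend would need in order to schedule well.

```ts
// Hypothetical workload hint -- NOT part of the WebNN spec today.
interface WorkloadHint {
  kind: "audio-capture" | "video-capture" | "speech" | "llm";
  targetPeriodMs?: number;   // how often jobs arrive, e.g. 20 for 50 Hz audio
  deadlineFraction?: number; // fraction of the period a single job may consume
}

async function createRealtimeAudioContext(): Promise<unknown> {
  // navigator.ml.createContext() is existing WebNN; the extra `workloadHint`
  // member is the proposed/hypothetical addition sketched in this issue.
  const ml = (navigator as unknown as {
    ml: { createContext(options: object): Promise<unknown> };
  }).ml;
  const hint: WorkloadHint = {
    kind: "audio-capture",
    targetPeriodMs: 20,      // 50 Hz capture
    deadlineFraction: 0.25,  // each job should finish within ~5 ms
  };
  return ml.createContext({
    powerPreference: "high-performance", // existing MLContextOptions member
    workloadHint: hint,                  // hypothetical new member
  });
}
```

Under the second option, the UA could attach an equivalent hint automatically when the worker is driven by an audio or video capture pipeline, so the page wouldn't need to spell it out.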
My guess is that both would need spec updates, or maybe just the first option?
See also crbug.com/456006123 for repros of related issues with technologies other than WebNN. Also note that this issue is not asking for ways to understand total system capacity, since that is impossible to compute without actually trying the workloads out.