Description
@cynthia's Machine Learning in Web Architecture talk makes a (wild) guess:
Less than 1% of users will train in the browser context
The current in-browser efforts (e.g. those pursued by the W3C's Machine Learning for the Web Community Group) focus on inference rather than training, for pragmatic reasons: the platform APIs needed to make training efficient are largely unavailable, and the browser's resource-constrained architecture is not optimized for such a demanding task.
For in-browser inference, a model with a total weight size in the ballpark of ~100 MB starts to be too slow on typical desktop hardware. Training requires even more memory and compute, so models even smaller than that will likely be too slow to train in a browser to be useful in most use cases. @huningxin probably has a couple of pointers to model size vs. performance evaluations.
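As a back-of-envelope illustration of what that ~100 MB figure means in model terms (assuming uncompressed 4-byte float32 weights, which is an assumption on my part, not something stated above):

```javascript
// Rough weight-memory estimate for a model, assuming uncompressed
// float32 (4-byte) parameters. Quantized or compressed formats would
// shrink this considerably.
function weightSizeMB(paramCount, bytesPerParam = 4) {
  return (paramCount * bytesPerParam) / (1024 * 1024);
}

// Under this assumption, a ~25M-parameter model already sits near the
// ~100 MB point where in-browser inference starts to feel slow.
const approxMB = weightSizeMB(25e6); // ≈ 95 MB
```

So the practical ceiling being discussed is on the order of a few tens of millions of float32 parameters, before any quantization tricks.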
My questions:
- Are non-browser JS environments, unhindered by the resource restrictions of a browser client, the only feasible short-term target for JS-based training (discussed in Applicability to non-browser JS environments #62)?
- Assuming we're eventually headed toward a future where in-browser training becomes a thing, are there obvious gaps in browser capabilities and APIs that could be bridged to smooth the path to that future?
For example, the large amounts of data needed to train a model are currently better ingested outside the browser, from a native file system. The Native File System API may address the issue of data ingestion for in-browser usage. What other such API gaps, if closed, would make the memory- and compute-intensive task of training more feasible in the browser context?
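To make the data-ingestion point concrete, here is a minimal sketch of streaming a large local training file into the page via the Native File System API (shipped in browsers as the File System Access API), rather than buffering it whole in memory. The helper name `batchChunks` and the batch-size parameter are illustrative, not from any library, and `showOpenFilePicker` requires a user gesture:

```javascript
// Pure helper: group incoming chunks into fixed-size training batches.
// This part has no browser dependency, so it can be tested anywhere.
async function* batchChunks(chunkIterable, batchSize) {
  let batch = [];
  for await (const chunk of chunkIterable) {
    batch.push(chunk);
    if (batch.length === batchSize) {
      yield batch;
      batch = [];
    }
  }
  if (batch.length > 0) yield batch; // flush the final partial batch
}

// Browser-only wiring (sketch, must run from a user gesture such as a
// button click; window.showOpenFilePicker is the File System Access API).
async function ingestTrainingData(batchSize) {
  const [handle] = await window.showOpenFilePicker();
  const file = await handle.getFile();
  // file.stream() is a ReadableStream<Uint8Array>; recent browsers let
  // you consume it as an async iterable of byte chunks.
  for await (const batch of batchChunks(file.stream(), batchSize)) {
    // Feed `batch` to the training loop here instead of holding the
    // whole dataset in memory at once.
  }
}
```

The streaming shape matters for the question above: without an API like this, the only options are fetching the dataset over the network or a full in-memory `File` read, both of which hit the browser's memory limits well before a useful training-set size.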