Open
Labels
enhancement (New feature or request), future, java (Pull requests that update Java code), optimization, p1 (mid priority), perf (Performance issue)
Description
🚀 The feature
It seems that initial handlers are loaded sequentially across different models (handlers for the same model are loaded in parallel, though). When serving many models in production, this significantly slows down the startup of new servers. Is it possible to load all handlers in parallel? For example, on a 32-core machine, server startup should ideally initialize 32 workers in parallel. This would dramatically decrease startup time and help the server scale up faster during traffic surges.
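To make the request concrete, here is a minimal sketch of what loading across models in parallel could look like, using a fixed-size thread pool sized to the core count. It is not the server's actual startup code: `ParallelModelLoader`, `loadAll`, and the `loader` callback are hypothetical names introduced only for illustration, and the simulated 200 ms load stands in for real handler initialization.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical sketch: illustrates the idea, not the real server implementation.
class ParallelModelLoader {

    // Loads every model concurrently, using at most `parallelism` threads.
    // `loader` is an assumed callback that initializes one model by name.
    static <T> Map<String, T> loadAll(List<String> modelNames, int parallelism,
                                      Function<String, T> loader) {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            // Submit all loads up front so they overlap across models.
            List<Future<T>> futures = new ArrayList<>();
            for (String name : modelNames) {
                Callable<T> task = () -> loader.apply(name);
                futures.add(pool.submit(task));
            }
            // Collect results in submission order; get() blocks until each load finishes.
            Map<String, T> loaded = new LinkedHashMap<>();
            for (int i = 0; i < modelNames.size(); i++) {
                String name = modelNames.get(i);
                try {
                    loaded.put(name, futures.get(i).get());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while loading " + name, e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("Failed to load " + name, e.getCause());
                }
            }
            return loaded;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Size the pool to the machine, e.g. 32 threads on a 32-core host.
        int cores = Runtime.getRuntime().availableProcessors();
        List<String> models = List.of("model_a", "model_b", "model_c");
        Map<String, String> result = loadAll(models, cores, name -> {
            try {
                Thread.sleep(200); // simulated handler initialization
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return name + ":ready";
        });
        System.out.println(result);
    }
}
```

With sequential loading, total startup time grows with the sum of the per-model load times; with a pool like this it is bounded closer to the slowest load, as long as there are enough cores and memory to initialize the workers concurrently.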
Motivation, pitch
see above
Alternatives
No response
Additional context
No response