We have reports of the SparkResourceAdaptor causing a performance regression between 25.02 and 25.04, due to coarse locks being held while we perform expensive STL and metric operations. We have confirmed that shrinking those critical sections improves performance immensely and removes the regression (see the sketch below).
This issue is to fix this properly, by speeding up the processing done inside these locks.
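As a minimal sketch of the kind of change that was confirmed to help, the idea is to move expensive work (string building, metric aggregation) out of the critical section so the mutex is held only for the shared-state update itself. All names below are illustrative, not taken from the actual SparkResourceAdaptor code:

```cpp
#include <cstddef>
#include <mutex>
#include <string>

std::mutex state_mutex;

// Before: expensive string construction happens while the lock is held.
void record_alloc_slow(long thread_id, std::size_t bytes) {
  std::lock_guard<std::mutex> guard(state_mutex);
  std::string msg = "alloc on thread " + std::to_string(thread_id) +
                    " of " + std::to_string(bytes) + " bytes";  // expensive, under lock
  // ... update shared state, emit metrics, log msg ...
}

// After: do the expensive work first, lock only around the shared update.
void record_alloc_fast(long thread_id, std::size_t bytes) {
  std::string msg = "alloc on thread " + std::to_string(thread_id) +
                    " of " + std::to_string(bytes) + " bytes";  // outside the lock
  {
    std::lock_guard<std::mutex> guard(state_mutex);
    // ... update shared state only ...
  }
  // ... emit metrics / log msg outside the lock ...
}
```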
List of changes I am proposing:
- read/write locks in `RmmSpark` against `Rmm.class`: Use Rmm read/write locks in RmmSpark #3924 (sketched after this list)
- logging macros in `SparkResourceAdaptor` that consistently check whether logging is enabled, and stop doing expensive pre-log operations that we do not need: Decouple logger object from spark_resource_adaptor #3931 (sketched after this list)
- make `full_thread_state` a shared pointer: Make full_thread_state a shared pointer #3966 (sketched after this list)
- stop running deadlock detection in the critical path, leaving it to allocation failures and the deadlock watchdog; also stop calling into JNI during deadlock detection, and instead have Java send the thread state to JNI (a cheaper Java -> native path): Only check for deadlocks in deadlock busting thread #3977 (sketched after this list)
- use state-specific collections in the adaptor, especially for blocked threads ordered by priority. This prevents expensive loops over the `threads` map in critical sections (sketched after this list).
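Read/write locks (#3924): this is a Java-side change in `RmmSpark`, which today synchronizes coarsely on `Rmm.class`. The reader/writer pattern it applies is sketched here in C++ for consistency with the other sketches; many readers can proceed concurrently while a writer gets exclusive access. All names are illustrative:

```cpp
#include <mutex>
#include <shared_mutex>

std::shared_mutex rmm_lock;
long total_allocated = 0;

// Readers take a shared lock, so they no longer serialize against each other.
long read_total() {
  std::shared_lock<std::shared_mutex> read_guard(rmm_lock);
  return total_allocated;
}

// Writers take an exclusive lock, blocking readers and other writers.
void add_allocation(long bytes) {
  std::unique_lock<std::shared_mutex> write_guard(rmm_lock);
  total_allocated += bytes;
}
```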
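Logging macros (#3931): a level-gated macro ensures the (potentially expensive) message construction only runs when the level is actually enabled, so pre-log work is skipped entirely on the hot path. This is a minimal sketch; the logger API shown is hypothetical, not the real `spark_resource_adaptor` logger:

```cpp
#include <iostream>
#include <sstream>

enum class log_level { OFF, ERROR, INFO, DEBUG };

inline log_level current_level = log_level::ERROR;

// The stream expression is only evaluated when the level is enabled, so
// expensive formatting is never done for disabled levels.
#define LOG_AT(level, expr)                                              \
  do {                                                                   \
    if (static_cast<int>(level) <= static_cast<int>(current_level)) {    \
      std::ostringstream oss;                                            \
      oss << expr;                                                       \
      std::cerr << oss.str() << "\n";                                    \
    }                                                                    \
  } while (0)

// Usage: with current_level == ERROR, the string building (and any
// function calls inside the message) below is skipped entirely.
// LOG_AT(log_level::DEBUG, "thread " << tid << " state " << expensive_dump());
```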
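Shared pointer (#3966): holding `full_thread_state` behind a `std::shared_ptr` lets a caller grab a reference under a short lock and then work on the state outside the critical section, instead of holding the map lock for the whole operation. A sketch with illustrative members:

```cpp
#include <memory>
#include <mutex>
#include <unordered_map>

struct full_thread_state {
  long task_id;
  // ... blocked status, metrics, etc. ...
};

std::mutex threads_mutex;
std::unordered_map<long, std::shared_ptr<full_thread_state>> threads;

void process_thread(long thread_id) {
  std::shared_ptr<full_thread_state> state;
  {
    std::lock_guard<std::mutex> guard(threads_mutex);
    auto it = threads.find(thread_id);
    if (it == threads.end()) { return; }
    state = it->second;  // cheap refcount bump; lock held only for the lookup
  }
  // Expensive work on *state happens here, outside the lock; the object
  // cannot be destroyed underneath us even if it is concurrently erased
  // from the map.
}
```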
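Deadlock detection (#3977): the idea is that the hot allocation path only records state, while the expensive cycle check is owned by a dedicated watchdog thread (and by allocation-failure handling). Everything below is an illustrative skeleton under those assumptions, not the real adaptor code:

```cpp
#include <atomic>
#include <chrono>
#include <thread>

std::atomic<bool> shutting_down{false};

void check_for_deadlocks() {
  // expensive graph walk over blocked threads (elided in this sketch)
}

// Hot path: just bookkeeping, no cycle detection here.
void on_block_for_alloc(long /*thread_id*/) {
  // record that the thread is blocked
}

// Watchdog thread: owns the expensive check, off the allocation path.
void deadlock_watchdog() {
  while (!shutting_down.load()) {
    std::this_thread::sleep_for(std::chrono::milliseconds(500));
    check_for_deadlocks();
  }
}
```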
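State-specific collections: keeping blocked threads in their own collection, ordered by priority, makes waking the next thread an O(log n) lookup instead of a scan of the whole `threads` map inside a critical section. The field names and priority rule below are illustrative:

```cpp
#include <set>

struct blocked_entry {
  long priority;   // higher priority wakes first
  long thread_id;
};

struct by_priority {
  bool operator()(blocked_entry const& a, blocked_entry const& b) const {
    if (a.priority != b.priority) { return a.priority > b.priority; }
    return a.thread_id < b.thread_id;  // tie-break so entries stay unique
  }
};

// Only blocked threads live here, already in wake order.
std::set<blocked_entry, by_priority> blocked_threads;

// Waking the highest-priority blocked thread no longer loops over every
// entry in the threads map:
// auto it = blocked_threads.begin();
// if (it != blocked_threads.end()) { wake(it->thread_id); blocked_threads.erase(it); }
```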