[SPARK-24441][SS] Expose total estimated size of states in HDFSBackedStateStoreProvider #21469
Conversation
Test build #91347 has finished for PR 21469 at commit
Test build #91375 has finished for PR 21469 at commit
@@ -181,6 +182,12 @@ private[state] class HDFSBackedStateStoreProvider extends StateStoreProvider wit
    }
  }

  def getCustomMetricsForProvider(): Map[StateStoreCustomMetric, Long] = {
tiny nit:
def getCustomMetricsForProvider(): Map[StateStoreCustomMetric, Long] = synchronized {
  Map(metricProviderLoaderMapSize -> SizeEstimator.estimate(loadedMaps))
}
Shall we make the PR title complete? Looks truncated.
Thanks @HyukjinKwon for reviewing. Addressed the PR title as well as fixed the nit.
Test build #91382 has finished for PR 21469 at commit
LGTM. To clarify the description: we expect the memory footprint to be much larger than what the query status reports in situations where the state store is getting a lot of updates?
@jose-torres
@HeartSaVioR, maybe then this should be reported in the "memoryUsedBytes" in the StateOperatorProgress (the value reported in StreamingQueryProgress), or better as a separate custom metric, because currently the usage reported does not reflect the memory used for the cache. Question: in the screenshot, "Estimated size of states cache in provider total" is 3.3 MB whereas "memory used by state total" is 20.6 KB with "total number of state rows" = 2. Is this 150x difference expected with just 2 rows in the state? Were there 100 versions of the map in the sample output you posted?
@arunmahadevan Btw, the cache is only cleaned up when the maintenance operation runs, so there could be more than 100 versions in the map. Not sure why it shows 150x; I couldn't find a missing spot in the patch. Maybe the issue is from SizeEstimator.estimate()? One thing we need to check is how SizeEstimator.estimate() calculates the memory usage when UnsafeRow objects are shared across versions. If SizeEstimator adds the size of an object every time it is referenced, it will report much higher memory usage than the actual usage.
Looks like the size is added only once for the same identity in SizeEstimator.estimate(), so SizeEstimator.estimate() is working correctly in this case. There might be other valid cases, but I'm not sure.
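For anyone who wants to verify that behaviour, here is a minimal, hypothetical check (not part of this PR): SizeEstimator.estimate tracks visited objects by identity within a single call, so a byte array shared between two cached "versions" is counted once. The map layout and sizes below are purely illustrative.

```scala
import java.util.{HashMap => JHashMap}

import org.apache.spark.util.SizeEstimator

object SizeEstimatorSharingCheck {
  def main(args: Array[String]): Unit = {
    // Stands in for an UnsafeRow that is shared across cached versions.
    val sharedRow = new Array[Byte](1024 * 1024)

    val v1 = new JHashMap[String, Array[Byte]]()
    v1.put("key", sharedRow)
    val v2 = new JHashMap[String, Array[Byte]]()
    v2.put("key", sharedRow) // same object referenced from a second "version"

    val loadedMaps = new JHashMap[Long, JHashMap[String, Array[Byte]]]()
    loadedMaps.put(1L, v1)
    loadedMaps.put(2L, v2)

    // Roughly ~1 MB for a single version...
    println(s"one version:   ${SizeEstimator.estimate(v1)} bytes")
    // ...and still roughly ~1 MB for both, since the shared array is only counted once.
    println(s"both versions: ${SizeEstimator.estimate(loadedMaps)} bytes")
  }
}
```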
Maybe this can be reported as a custom metric and kept optional; that way it's not tied to any specific implementation.
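To sketch what "a custom metric, kept optional" could look like, the snippet below leans on the existing StateStoreCustomMetric / StateStoreCustomSizeMetric classes and the provider's supportedCustomMetrics hook; the metric name and description here are illustrative, not necessarily what gets merged.

```scala
import org.apache.spark.sql.execution.streaming.state.{StateStoreCustomMetric, StateStoreCustomSizeMetric}

// Declared inside the provider, so only HDFSBackedStateStoreProvider knows about it.
val metricProviderLoaderMapSize: StateStoreCustomMetric =
  StateStoreCustomSizeMetric(
    "providerLoadedMapSizeBytes",
    "estimated size of states cache in provider")

// The provider advertises it; other providers simply don't, keeping the metric optional:
// override def supportedCustomMetrics: Seq[StateStoreCustomMetric] =
//   metricProviderLoaderMapSize :: Nil
```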
Test build #91486 has finished for PR 21469 at commit
@jose-torres is it good to go?
@arunmahadevan I have to exclude
Nice, LGTM. |
Test build #91503 has finished for PR 21469 at commit
Test build #91509 has finished for PR 21469 at commit
@@ -231,7 +231,7 @@ class StreamingQueryListenerSuite extends StreamTest with BeforeAndAfter {
   test("event ordering") {
     val listener = new EventCollector
     withListenerAdded(listener) {
-      for (i <- 1 to 100) {
+      for (i <- 1 to 50) {
After the patch this test starts failing: it just means more time is needed to run this loop 100 times. It doesn't mean the logic is broken. Decreasing the number works for me.
Makes sense, and I agree with the implicit claim that this slowdown isn't too worrying.
Test build #91523 has finished for PR 21469 at commit
retest this, please
Test build #91526 has finished for PR 21469 at commit
retest this please
Test build #91535 has finished for PR 21469 at commit
ok to test
@tdas Thanks for the review! Addressed review comments.
Test build #93869 has finished for PR 21469 at commit
Retest this, please
Test build #93906 has finished for PR 21469 at commit
retest this please
Test build #93927 has finished for PR 21469 at commit
@tdas A kind reminder.
I am having second thoughts about this. Exposing the entire memory usage of all the loaded maps as another custom metric just adds more confusion, rather than serving the point of the main state metric. I am fine adding the custom metrics for cache hit and miss counts; no questions about that. What do you think?
My series of patches could be possible based on two metrics: IMHO, I'm not 100% sure how much confusion this patch causes for end users, but if the intention of
@HeartSaVioR I think I agree with the second approach that you suggested. So
customMetric.stateOnCurrentVersionSizeBytes to size for memory usage of current version
@tdas Thanks for the feedback! Updated the PR. |
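For readers following along, a hedged sketch of how the resulting metrics could be read from a running query. The metric names ("stateOnCurrentVersionSizeBytes", "loadedMapCacheHitCount", "loadedMapCacheMissCount") are taken from the discussion and the commit note above; treat them as assumptions if your Spark version reports different names.

```scala
import org.apache.spark.sql.streaming.StreamingQuery

// Prints the built-in memory metric alongside the provider-specific custom metrics.
def printStateMetrics(query: StreamingQuery): Unit = {
  Option(query.lastProgress).toSeq.flatMap(_.stateOperators).foreach { op =>
    val custom = op.customMetrics // java.util.Map[String, java.lang.Long]
    println(s"memoryUsedBytes                = ${op.memoryUsedBytes}")
    println(s"stateOnCurrentVersionSizeBytes = ${custom.get("stateOnCurrentVersionSizeBytes")}")
    println(s"loadedMapCacheHitCount         = ${custom.get("loadedMapCacheHitCount")}")
    println(s"loadedMapCacheMissCount        = ${custom.get("loadedMapCacheMissCount")}")
  }
}
```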
Test build #95012 has finished for PR 21469 at commit
LGTM. |
Merged to master. |
Thanks all for reviewing and thanks @tdas for merging this in! |
Unfortunately this PR broke the master build. Looks like some import probably got removed in the other PR I merged, which didn't create any direct conflict.
@tdas Yeah, I can check against the master branch if you would like me to handle it, or please go ahead if you would like to handle it yourself.
@tdas In case you are not working on the patch, I'm working on the fix and will provide a minor PR.
I did. Fixed the import.
[SPARK-24717][SS] Split out max retain version of state for memory in HDFSBackedStateStoreProvider

This patch proposes breaking down the configuration of the retained batch size on state into two pieces: files and in-memory (cache). While this patch reuses the existing configuration for files, it introduces a new configuration, "spark.sql.streaming.maxBatchesToRetainInMemory", to configure the max count of batches to retain in memory. Apply this patch on top of SPARK-24441 (apache#21469); manually tested in various workloads to ensure the overall size of states in memory is around 2x or less of the size of the latest version of state, while it was 10x ~ 80x before applying the patch.

Author: Jungtaek Lim <[email protected]>

Closes apache#21700 from HeartSaVioR/SPARK-24717.
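As a usage note for the follow-up change referenced above, here is a hedged example of setting that configuration; the key is taken straight from the commit message, and the session setup is ordinary boilerplate.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("stateful-streaming-app") // illustrative name
  // Cap how many batch versions of state the provider keeps cached in memory.
  .config("spark.sql.streaming.maxBatchesToRetainInMemory", "2")
  .getOrCreate()
```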
What changes were proposed in this pull request?
This patch exposes the estimated size of the cache (loadedMaps) in HDFSBackedStateStoreProvider as a custom metric of StateStore.
The rationale for the patch is that state backed by HDFSBackedStateStoreProvider will consume more memory than the number we can get from the query status, due to caching multiple versions of the state. The memory footprint can be much larger than what the query status reports in situations where the state store is getting a lot of updates: shallow-copying the map incurs only small additional memory usage for the map entries and references, since row objects are still shared across versions, but if there are lots of updates between batches, fewer row objects are shared and more row objects exist in memory, consuming much more memory than we expect.
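To make the sharing argument concrete, here is an illustrative sketch, not the provider's actual code: Row below is just a stand-in for UnsafeRow, and newVersion mimics how a shallow copy keeps the footprint small only while rows stay shared.

```scala
import java.util.{HashMap => JHashMap}

object VersionSharingSketch {
  type Row = Array[Byte] // stand-in for UnsafeRow

  // Each new version shallow-copies the previous map: entry references are duplicated
  // (cheap), but the Row objects themselves stay shared until a key is overwritten.
  def newVersion(prev: JHashMap[String, Row], updates: Map[String, Row]): JHashMap[String, Row] = {
    val next = new JHashMap[String, Row](prev)
    updates.foreach { case (k, v) => next.put(k, v) } // updated keys now hold fresh Rows
    next
  }
}
```

With few updates per batch most Row objects remain shared across cached versions; with heavy updates each version holds mostly distinct Rows, so the cache can grow to many times the size of the latest version alone.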
While HDFSBackedStateStore refers to loadedMaps in HDFSBackedStateStoreProvider directly, there would be only one StateStoreWriter which refers to a StateStoreProvider, so the value is neither exposed nor aggregated multiple times. Current state metrics are safe to aggregate for the same reason.

How was this patch tested?
Tested manually. Below is a snapshot of the UI page reflecting the patch:
Please refer to "estimated size of states cache in provider total" as well as "count of versions in state cache in provider".