[SPARK-24441][SS] Expose total estimated size of states in HDFSBackedStateStoreProvider #21469
Conversation
Test build #91347 has finished for PR 21469 at commit
Test build #91375 has finished for PR 21469 at commit
@@ -181,6 +182,12 @@ private[state] class HDFSBackedStateStoreProvider extends StateStoreProvider wit
    }
  }

  def getCustomMetricsForProvider(): Map[StateStoreCustomMetric, Long] = {
tiny nit:
def getCustomMetricsForProvider(): Map[StateStoreCustomMetric, Long] = synchronized {
  Map(metricProviderLoaderMapSize -> SizeEstimator.estimate(loadedMaps))
}
Shall we make the PR title complete? Looks truncated.
Thanks @HyukjinKwon for reviewing. Addressed the PR title as well as fixed the nit.
Test build #91382 has finished for PR 21469 at commit
LGTM. To clarify the description: we expect the memory footprint to be much larger than what the query status reports in situations where the state store is getting a lot of updates?
@jose-torres
@HeartSaVioR, maybe then this should be reported in the "memoryUsedBytes" in the StateOperatorProgress (the value reported in StreamingQueryProgress), or better as a separate custom metric, because currently the usage reported does not reflect the memory used for the cache. Question: in the screenshot, "Estimated size of states cache in provider total" is 3.3 MB whereas "memory used by state total" is 20.6 KB with "total number of state rows" = 2. Is this 150x difference expected with just 2 rows in the state? Were there 100 versions of the map in the sample output you posted?
@arunmahadevan Btw, the cache is only cleaned up when the maintenance operation runs, so there could be more than 100 versions in the map. Not sure why it shows 150x; I couldn't find a missing spot in the patch. Maybe the issue is from SizeEstimator.estimate()? One thing we need to check is how SizeEstimator.estimate() calculates the memory usage when UnsafeRow objects are shared across versions. If SizeEstimator adds the size of an object every time it is referenced, it will report much higher memory usage than the actual usage.
Looks like the size is added only once for the same identity in SizeEstimator.estimate(), so SizeEstimator.estimate() is working correctly in this case. There might be other valid cases, but I'm not sure.
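For anyone who wants to verify that behaviour, here is a minimal, hypothetical check (not part of this PR): SizeEstimator.estimate tracks visited objects by identity within a single call, so a byte array shared between two cached "versions" is counted once. The map layout and sizes below are purely illustrative.

```scala
import java.util.{HashMap => JHashMap}

import org.apache.spark.util.SizeEstimator

object SizeEstimatorSharingCheck {
  def main(args: Array[String]): Unit = {
    // Stands in for an UnsafeRow that is shared across cached versions.
    val sharedRow = new Array[Byte](1024 * 1024)

    val v1 = new JHashMap[String, Array[Byte]]()
    v1.put("key", sharedRow)
    val v2 = new JHashMap[String, Array[Byte]]()
    v2.put("key", sharedRow) // same object referenced from a second "version"

    val loadedMaps = new JHashMap[Long, JHashMap[String, Array[Byte]]]()
    loadedMaps.put(1L, v1)
    loadedMaps.put(2L, v2)

    // Roughly ~1 MB for a single version...
    println(s"one version:   ${SizeEstimator.estimate(v1)} bytes")
    // ...and still roughly ~1 MB for both, since the shared array is only counted once.
    println(s"both versions: ${SizeEstimator.estimate(loadedMaps)} bytes")
  }
}
```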
Maybe this can be reported as a custom metric and kept optional; that way it's not tied to any specific implementation.
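To sketch what "a custom metric, kept optional" could look like, the snippet below leans on the existing StateStoreCustomMetric / StateStoreCustomSizeMetric classes and the provider's supportedCustomMetrics hook; the metric name and description here are illustrative, not necessarily what gets merged.

```scala
import org.apache.spark.sql.execution.streaming.state.{StateStoreCustomMetric, StateStoreCustomSizeMetric}

// Declared inside the provider, so only HDFSBackedStateStoreProvider knows about it.
val metricProviderLoaderMapSize: StateStoreCustomMetric =
  StateStoreCustomSizeMetric(
    "providerLoadedMapSizeBytes",
    "estimated size of states cache in provider")

// The provider advertises it; other providers simply don't, keeping the metric optional:
// override def supportedCustomMetrics: Seq[StateStoreCustomMetric] =
//   metricProviderLoaderMapSize :: Nil
```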
Test build #91486 has finished for PR 21469 at commit
@jose-torres is it good to go?
@arunmahadevan I have to exclude
Nice, LGTM. |
Test build #91503 has finished for PR 21469 at commit
Test build #91509 has finished for PR 21469 at commit
@@ -231,7 +231,7 @@ class StreamingQueryListenerSuite extends StreamTest with BeforeAndAfter {
   test("event ordering") {
     val listener = new EventCollector
     withListenerAdded(listener) {
-      for (i <- 1 to 100) {
+      for (i <- 1 to 50) {
After the patch this test starts failing: it just means more time is needed to run this loop 100 times. It doesn't mean the logic is broken. Decreasing the number works for me.
Makes sense, and I agree with the implicit claim that this slowdown isn't too worrying.
Test build #91523 has finished for PR 21469 at commit
retest this, please
Test build #91526 has finished for PR 21469 at commit
retest this please
Test build #91535 has finished for PR 21469 at commit
ok to test
@tdas Thanks for the review! Addressed review comments.
Test build #93869 has finished for PR 21469 at commit
Retest this, please
Test build #93906 has finished for PR 21469 at commit
retest this please
Test build #93927 has finished for PR 21469 at commit
@tdas A kind reminder.
I am having second thoughts about this. Exposing the entire memory usage of all the loaded maps as another custom metric just adds more confusion, rather than serving the point of the main state metric. I am fine adding the custom metrics for cache hit and miss counts; no questions about that. What do you think?
My series of patches could be possible based on two metrics: IMHO, I'm not 100% sure how much confusion this patch causes for end users, but if the intention of
@HeartSaVioR I think I agree with the second approach that you suggested. So
customMetric.stateOnCurrentVersionSizeBytes to size for memory usage of current version
@tdas Thanks for the feedback! Updated the PR. |
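For readers following along, a hedged sketch of how the resulting metrics could be read from a running query. The metric names ("stateOnCurrentVersionSizeBytes", "loadedMapCacheHitCount", "loadedMapCacheMissCount") are taken from the discussion and the commit note above; treat them as assumptions if your Spark version reports different names.

```scala
import org.apache.spark.sql.streaming.StreamingQuery

// Prints the built-in memory metric alongside the provider-specific custom metrics.
def printStateMetrics(query: StreamingQuery): Unit = {
  Option(query.lastProgress).toSeq.flatMap(_.stateOperators).foreach { op =>
    val custom = op.customMetrics // java.util.Map[String, java.lang.Long]
    println(s"memoryUsedBytes                = ${op.memoryUsedBytes}")
    println(s"stateOnCurrentVersionSizeBytes = ${custom.get("stateOnCurrentVersionSizeBytes")}")
    println(s"loadedMapCacheHitCount         = ${custom.get("loadedMapCacheHitCount")}")
    println(s"loadedMapCacheMissCount        = ${custom.get("loadedMapCacheMissCount")}")
  }
}
```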
Test build #95012 has finished for PR 21469 at commit
LGTM. |
Merged to master. |
Thanks all for reviewing and thanks @tdas for merging this in! |
Unfortunately this PR broke the master build. Looks like some import probably got removed in the other PR I merged, which didn't create any direct conflict.
@tdas Yeah, I can check against the master branch if you would like me to handle it, or please go ahead if you would like to handle it yourself.
@tdas In case you are not working on the patch, I'm working on the fix and will provide a minor PR.
I did. Fixed the import.
[SPARK-24717][SS] Split out max retain version of state for memory in HDFSBackedStateStoreProvider

This patch proposes breaking down the configuration of the retained batch size on state into two pieces: files and in-memory (cache). While this patch reuses the existing configuration for files, it introduces a new configuration, "spark.sql.streaming.maxBatchesToRetainInMemory", to configure the max count of batches to retain in memory. Apply this patch on top of SPARK-24441 (apache#21469); manually tested in various workloads to ensure the overall size of states in memory is around 2x or less of the size of the latest version of state, while it was 10x ~ 80x before applying the patch.

Author: Jungtaek Lim <[email protected]>

Closes apache#21700 from HeartSaVioR/SPARK-24717.
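As a usage note for the follow-up change referenced above, here is a hedged example of setting that configuration; the key is taken straight from the commit message, and the session setup is ordinary boilerplate.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("stateful-streaming-app") // illustrative name
  // Cap how many batch versions of state the provider keeps cached in memory.
  .config("spark.sql.streaming.maxBatchesToRetainInMemory", "2")
  .getOrCreate()
```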
What changes were proposed in this pull request?
This patch exposes the estimated size of the cache (loadedMaps) in HDFSBackedStateStoreProvider as a custom metric of StateStore.
The rationale for the patch is that state backed by HDFSBackedStateStoreProvider will consume more memory than the number we can get from the query status, due to caching multiple versions of the state. The memory footprint can be much larger than what the query status reports in situations where the state store is getting a lot of updates: shallow-copying the map incurs only small additional memory usage for the map entries and references, since row objects are still shared across versions, but if there are lots of updates between batches, fewer row objects are shared and more row objects exist in memory, consuming much more memory than we expect.
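To make the sharing argument concrete, here is an illustrative sketch, not the provider's actual code: Row below is just a stand-in for UnsafeRow, and newVersion mimics how a shallow copy keeps the footprint small only while rows stay shared.

```scala
import java.util.{HashMap => JHashMap}

object VersionSharingSketch {
  type Row = Array[Byte] // stand-in for UnsafeRow

  // Each new version shallow-copies the previous map: entry references are duplicated
  // (cheap), but the Row objects themselves stay shared until a key is overwritten.
  def newVersion(prev: JHashMap[String, Row], updates: Map[String, Row]): JHashMap[String, Row] = {
    val next = new JHashMap[String, Row](prev)
    updates.foreach { case (k, v) => next.put(k, v) } // updated keys now hold fresh Rows
    next
  }
}
```

With few updates per batch most Row objects remain shared across cached versions; with heavy updates each version holds mostly distinct Rows, so the cache can grow to many times the size of the latest version alone.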
While HDFSBackedStateStore refers to loadedMaps in HDFSBackedStateStoreProvider directly, there would be only one StateStoreWriter which refers to a StateStoreProvider, so the value is neither exposed nor aggregated multiple times. Current state metrics are safe to aggregate for the same reason.

How was this patch tested?
Tested manually. Below is a snapshot of the UI page reflecting the patch:
Please refer to "estimated size of states cache in provider total" as well as "count of versions in state cache in provider".