SPY-1394: CSD caching policy at block level #189


Merged

merged 2 commits into alteryx:csd-1.6 from ianlcsd:csd-1.6
Jul 25, 2017

Conversation

@ianlcsd ianlcsd commented Jul 21, 2017

The problem originates from the fact that Spark cannot read a cached block larger than 2G from the disk store (disk reads go through a single ByteBuffer, which is capped at Integer.MAX_VALUE bytes). When data is highly skewed across partitions, we see the problems recorded in SPY-1394 with any RDD cached on disk.

The ultimate solution, adapting the partitioning to the skew, is a long shot,
but imposing a CSD policy on cache block size is feasible.

This PR proposes the following policy:

  1. We cache a block in memory if it is smaller than 2G and there is available storage memory.
  2. We don't cache any block over 2G in size.
  3. We drop the block to the disk store when it is smaller than 2G but there is insufficient storage memory.
  4. Introduce a Spark configuration, spark.storage.MemoryStore.csdCacheBlockSizeLimit.
     A negative value disables the policy.
     The default is Integer.MAX_VALUE, but we could choose to make it smaller.
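
To make the decision order concrete, here is a minimal Scala sketch of the policy. CsdCachePolicy, Decision, and hasStorageMemory are hypothetical names for illustration only, not the actual MemoryStore change:

    object CsdCachePolicy {
      // Hypothetical stand-in for reading spark.storage.MemoryStore.csdCacheBlockSizeLimit;
      // the default mirrors the PR (Integer.MAX_VALUE).
      val csdCacheBlockSizeLimit: Long = Int.MaxValue

      sealed trait Decision
      case object CacheInMemory extends Decision
      case object DropToDisk extends Decision
      case object DoNotCache extends Decision

      // Rules 1-3 above; a negative limit disables the policy (rule 4).
      def decide(blockSize: Long, hasStorageMemory: Boolean): Decision = {
        val limitDisabled = csdCacheBlockSizeLimit < 0
        if (!limitDisabled && blockSize > csdCacheBlockSizeLimit) DoNotCache
        else if (hasStorageMemory) CacheInMemory
        else DropToDisk
      }
    }

Checking blockSize against the limit first keeps rule 2 absolute: an oversized block is never cached, regardless of how much storage memory happens to be free.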

By applying the above policy, we ensure every cached block is less than 2G in size, whether it lives in memory or on disk. This braces us against skewed data and other abnormal caching patterns.

@davidnavas @markhamstra

@ianlcsd ianlcsd changed the title SPY-1394: Csd policy on caching block size SPY-1394: CSD caching policy at block level Jul 21, 2017
@ianlcsd ianlcsd closed this Jul 21, 2017
@ianlcsd ianlcsd reopened this Jul 21, 2017
 * sufficient memory in spark's user memory space to be set apart:
 * (2G + overhead) per thread and per operator/RDD compute.
 */
def fetchUntilCsdBlockSizeLimit[T](
ianlcsd (Author) commented on the diff:
This API is challenging because we need to keep all the seen values until reaching the 2G size limit. It is a potential OOM threat and hard to test.
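
A rough sketch of that exposure, with a made-up signature (only the method name comes from the diff; the parameters, size estimator, and Either return type are assumptions):

    import scala.collection.mutable.ArrayBuffer

    // Every value seen so far must stay buffered until the running size
    // estimate crosses the limit, which is the OOM risk described above.
    def fetchUntilCsdBlockSizeLimit[T](
        values: Iterator[T],
        sizeLimit: Long,
        estimateSize: T => Long): Either[Seq[T], Iterator[T]] = {
      val seen = new ArrayBuffer[T]  // grows without bound until the limit is hit
      var total = 0L
      while (values.hasNext && total < sizeLimit) {
        val v = values.next()
        total += estimateSize(v)
        seen += v
      }
      if (total < sizeLimit) Left(seen)     // everything fit under the limit
      else Right(seen.iterator ++ values)   // over the limit: replay buffered values, then the rest
    }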

ianlcsd commented Jul 24, 2017

jttp

@alteryx alteryx deleted a comment from ianlcsd Jul 25, 2017
if (!shouldCache) {
  logBlockSizeLimitMessage(blockId, vector.estimateSize())
} else {
  // We ran out of space while unrolling the values for this block

Not sure I understand this comment for this case. Wouldn't this be the case where it's still cacheable?
Oh I see, this is the original comment. Maybe we need to put a separating line before it to distinguish the cache-limit check from the unroll handling.
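
Something like the following separation, sketched here with hypothetical stand-ins (finishUnroll and the println calls substitute for the real logging and drop-to-disk paths in MemoryStore):

    // Stand-ins for the diff's context; only the branch structure and the
    // original comment come from the hunk above.
    def finishUnroll(shouldCache: Boolean, blockId: String, estimatedSize: Long): Unit = {
      if (!shouldCache) {
        // CSD size-limit case: the block tripped csdCacheBlockSizeLimit, so skip caching.
        println(s"Block $blockId ($estimatedSize bytes) exceeds the cache block size limit")
      } else {
        // Original case: we ran out of space while unrolling the values for this block.
        println(s"Ran out of storage memory while unrolling block $blockId")
      }
    }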

@markhamstra markhamstra merged commit 2b5e422 into alteryx:csd-1.6 Jul 25, 2017
@ianlcsd ianlcsd deleted the csd-1.6 branch April 22, 2018 22:06