kellrott
Contributor

@kellrott kellrott commented Mar 1, 2014

This is a port of a pull request originally targeted at incubator-spark: https://github.com/apache/incubator-spark/pull/180

Essentially, if a user returns a generative iterator (from a flatMap operation), then when trying to persist the data Spark would first unroll the iterator into an ArrayBuffer and only then try to figure out whether it could store the data. In cases where the user-provided iterator generated more data than available memory, this would cause a crash. With this patch, if the user requests a persist with 'StorageLevel.DISK_ONLY', the iterator is unrolled as it is fed into the serializer.

To do this, two changes were made:

  1. The type of the 'values' argument in the putValues method of the BlockStore interface was changed from ArrayBuffer to Iterator (and all code interfacing with this method was modified to connect correctly).
  2. The JavaSerializer now calls the ObjectOutputStream 'reset' method every 1000 objects. This was done because the ObjectOutputStream caches objects (thus preventing them from being GC'd) in order to write a more compact serialization. If reset is never called, memory eventually fills up; if it is called too often, the serialization streams become much larger because of redundant class descriptions.
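
As a rough illustration of the two changes together (a sketch, not the code from this patch), the Java below drains an iterator straight into an `ObjectOutputStream` and calls `reset()` at a fixed interval. The names `ResettingWriter`, `writeAll`, `countObjects`, and `resetInterval` are hypothetical and exist only for this example; only `ObjectOutputStream.reset()` itself is the real JDK call being discussed.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.util.Iterator;

// Illustrative sketch only: class and method names are hypothetical, not Spark's.
final class ResettingWriter {

    // Writes each element as soon as the iterator produces it, so the full
    // data set never has to be buffered in memory before serialization.
    static long writeAll(Iterator<?> values, OutputStream sink, int resetInterval)
            throws IOException {
        long written = 0;
        try (ObjectOutputStream out = new ObjectOutputStream(sink)) {
            while (values.hasNext()) {
                out.writeObject(values.next());
                written++;
                // reset() clears the stream's table of already-written objects,
                // which otherwise keeps every written object strongly reachable.
                if (resetInterval > 0 && written % resetInterval == 0) {
                    out.reset();
                }
            }
        }
        return written;
    }

    // Reads the stream back and counts the objects, to show that the reset
    // markers are transparent to the reader.
    static long countObjects(byte[] data) throws IOException, ClassNotFoundException {
        long count = 0;
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            while (true) {
                in.readObject();
                count++;
            }
        } catch (EOFException endOfStream) {
            return count;
        }
    }
}
```

A smaller `resetInterval` bounds memory more tightly at the cost of re-emitting class descriptors after each reset, which is the trade-off described in item 2.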

…the serializer when a 'DISK_ONLY' persist is called.

This is in response to SPARK-942.
…ffer objects. This was previously done higher up the stack.
Conflicts:
	core/src/main/scala/org/apache/spark/CacheManager.scala
… system variable 'spark.serializer.objectStreamReset', default is not 10000.
…Buffer (rather than an Iterator).

This will allow BlockStores to have slightly different behaviors dependent on whether they get an
Iterator or ArrayBuffer. In the case of the MemoryStore, it needs to duplicate and cache an Iterator
into an ArrayBuffer, but if handed an ArrayBuffer, it can skip the duplication.
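
A minimal sketch of that distinction, assuming a store that keeps its data in a list. `MemoryStoreSketch`, its two `putValues` overloads, `cached`, and `lastPutCopied` are hypothetical names for illustration and are not Spark's BlockStore API; a Java `List` stands in for the ArrayBuffer.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Illustrative sketch only: hypothetical names, not Spark's BlockStore API.
final class MemoryStoreSketch {
    private List<Object> cached = new ArrayList<>();
    private boolean lastPutCopied;

    // Iterator input: an in-memory store has to materialize the elements,
    // because an iterator can only be consumed once.
    void putValues(Iterator<?> values) {
        List<Object> buffer = new ArrayList<>();
        values.forEachRemaining(buffer::add);
        cached = buffer;
        lastPutCopied = true;
    }

    // Buffer input: the data is already materialized, so the store can keep
    // the reference and skip the duplication.
    void putValues(List<Object> values) {
        cached = values;
        lastPutCopied = false;
    }

    List<Object> cached() {
        return cached;
    }

    boolean lastPutCopied() {
        return lastPutCopied;
    }
}
```

A disk-backed store could make the opposite choice and stream the iterator form directly, which is the case this pull request targets.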
…5 seconds. Confirmed that it still crashes an unpatched copy of Spark.
…rs. It doesn't try to invoke an OOM error any more
…. Now using trait 'Values'. Also modified BlockStore.putBytes call to return PutResult, so that it behaves like putValues.
…k into iterator-to-disk

Conflicts:
	core/src/test/scala/org/apache/spark/storage/LargeIteratorSuite.scala
@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build triggered.

@pwendell
Contributor

pwendell commented Mar 6, 2014

Thanks @kellrott for this patch - sorry it took us a long time to review it. I'm going to merge this now.

@pwendell
Contributor

pwendell commented Mar 6, 2014

I've created SPARK-1201 (https://spark-project.atlassian.net/browse/SPARK-1201) to cover optimizations in cases other than DISK_ONLY.

@asfgit asfgit closed this in 40566e1 Mar 6, 2014
jhartlaub referenced this pull request in jhartlaub/spark May 27, 2014
Fix race condition in SparkListenerSuite (fixes SPARK-908).

(cherry picked from commit 215238c)
Signed-off-by: Reynold Xin <[email protected]>
vlad17 pushed a commit to vlad17/spark that referenced this pull request Aug 23, 2016
## What changes were proposed in this pull request?
In Databricks, `SPARK_DIST_CLASSPATH` is used for the driver classpath and `SPARK_JARS_DIR` is empty. So, we need to add `SPARK_DIST_CLASSPATH` to the `LAUNCH_CLASSPATH`. We cannot remove `SPARK_JARS_DIR` because Spark unit tests are actually using it.

Author: Yin Huai <[email protected]>

Closes apache#50 from yhuai/Add-SPARK_DIST_CLASSPATH-toLAUNCH_CLASSPATH.
clockfly pushed a commit to clockfly/spark that referenced this pull request Aug 30, 2016
ash211 added a commit to ash211/spark that referenced this pull request Jan 31, 2017
* Create README to better describe project purpose

* Add links to usage guide and dev docs

* Minor changes
lins05 pushed a commit to lins05/spark that referenced this pull request Apr 23, 2017
erikerlandson pushed a commit to erikerlandson/spark that referenced this pull request Jul 28, 2017
jlopezmalla pushed a commit to jlopezmalla/spark that referenced this pull request Sep 13, 2017
marcosdotps pushed a commit to marcosdotps/spark that referenced this pull request Sep 13, 2017
* Refactor and Test of ConfigSecurity

* [SPK-64] removed ssl tricks on spark-env (apache#50)
jlopezmalla pushed a commit to jlopezmalla/spark that referenced this pull request Nov 3, 2017
* removed ssl tricks on spark-env

* test phase activated

* added changes requested from jlopez-malla

* changed properties and fixed typos

* changed signature for methods
gczsjdy pushed a commit to gczsjdy/spark that referenced this pull request Jul 30, 2018
Igosuki pushed a commit to Adikteev/spark that referenced this pull request Jul 31, 2018
luzhonghao pushed a commit to luzhonghao/spark that referenced this pull request Dec 11, 2018
cloud-fan pushed a commit to cloud-fan/spark that referenced this pull request Jan 16, 2019
mccheah pushed a commit to mccheah/spark that referenced this pull request Feb 14, 2019
hejian991 pushed a commit to growingio/spark that referenced this pull request Jun 24, 2019
bzhaoopenstack pushed a commit to bzhaoopenstack/spark that referenced this pull request Sep 11, 2019
Enable Octavia in LBaaS test of terraform-openstack-provider
jzhuge pushed a commit to jzhuge/spark that referenced this pull request Oct 19, 2019
…-spark:bump_lineage_logging_211 to netflix/2.1.1-unstable

Squashed commit of the following:

commit 347c0be48e6613b07d67b6efa9247e116b3a99b2
Author: Daniel Watson <[email protected]>
Date:   Tue Oct 8 09:55:43 2019 -0700

    NETFLIX-BUILD: Bump lineage-logging to 0.1.20
fishcus pushed a commit to fishcus/spark that referenced this pull request Jul 8, 2020
* apache#49 add more metrics to application-source

* upgrade hadoop to 2.7.1

* apache#49 add request_cores to master json

* Revert "upgrade hadoop to 2.7.1"

This reverts commit 2db019d.

* upgrade kylin to 2.4.1-kylin-r38

* fix ut
microbearz added a commit to microbearz/spark that referenced this pull request Dec 15, 2020
dongjoon-hyun added a commit that referenced this pull request Jul 21, 2025
…ingBuilder`

### What changes were proposed in this pull request?

This PR aims to improve `toString` implementations by relying on `JEP-280` string concatenation instead of `ToStringBuilder`. In addition, `Scalastyle` and `Checkstyle` rules are added to prevent a future regression.

### Why are the changes needed?

Since Java 9, `String Concatenation` has been handled better by default.

| ID | DESCRIPTION |
| - | - |
| JEP-280 | [Indify String Concatenation](https://openjdk.org/jeps/280) |

For example, this PR improves `OpenBlocks` like the following. Both Java source code and byte code are simplified a lot by utilizing JEP-280 properly.

**CODE CHANGE**
```diff
- return new ToStringBuilder(this, ToStringStyle.SHORT_PREFIX_STYLE)
-   .append("appId", appId)
-   .append("execId", execId)
-   .append("blockIds", Arrays.toString(blockIds))
-   .toString();
+ return "OpenBlocks[appId=" + appId + ",execId=" + execId + ",blockIds=" +
+     Arrays.toString(blockIds) + "]";
```

**BEFORE**
```
  public java.lang.String toString();
    Code:
       0: new           #39                 // class org/apache/commons/lang3/builder/ToStringBuilder
       3: dup
       4: aload_0
       5: getstatic     #41                 // Field org/apache/commons/lang3/builder/ToStringStyle.SHORT_PREFIX_STYLE:Lorg/apache/commons/lang3/builder/ToStringStyle;
       8: invokespecial #47                 // Method org/apache/commons/lang3/builder/ToStringBuilder."<init>":(Ljava/lang/Object;Lorg/apache/commons/lang3/builder/ToStringStyle;)V
      11: ldc           #50                 // String appId
      13: aload_0
      14: getfield      #7                  // Field appId:Ljava/lang/String;
      17: invokevirtual #51                 // Method org/apache/commons/lang3/builder/ToStringBuilder.append:(Ljava/lang/String;Ljava/lang/Object;)Lorg/apache/commons/lang3/builder/ToStringBuilder;
      20: ldc           #55                 // String execId
      22: aload_0
      23: getfield      #13                 // Field execId:Ljava/lang/String;
      26: invokevirtual #51                 // Method org/apache/commons/lang3/builder/ToStringBuilder.append:(Ljava/lang/String;Ljava/lang/Object;)Lorg/apache/commons/lang3/builder/ToStringBuilder;
      29: ldc           #56                 // String blockIds
      31: aload_0
      32: getfield      #16                 // Field blockIds:[Ljava/lang/String;
      35: invokestatic  #57                 // Method java/util/Arrays.toString:([Ljava/lang/Object;)Ljava/lang/String;
      38: invokevirtual #51                 // Method org/apache/commons/lang3/builder/ToStringBuilder.append:(Ljava/lang/String;Ljava/lang/Object;)Lorg/apache/commons/lang3/builder/ToStringBuilder;
      41: invokevirtual #61                 // Method org/apache/commons/lang3/builder/ToStringBuilder.toString:()Ljava/lang/String;
      44: areturn
```

**AFTER**
```
  public java.lang.String toString();
    Code:
       0: aload_0
       1: getfield      #7                  // Field appId:Ljava/lang/String;
       4: aload_0
       5: getfield      #13                 // Field execId:Ljava/lang/String;
       8: aload_0
       9: getfield      #16                 // Field blockIds:[Ljava/lang/String;
      12: invokestatic  #39                 // Method java/util/Arrays.toString:([Ljava/lang/Object;)Ljava/lang/String;
      15: invokedynamic #43,  0             // InvokeDynamic #0:makeConcatWithConstants:(Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;)Ljava/lang/String;
      20: areturn
```
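
For readers who want to try the pattern in isolation, here is a small self-contained sketch of a `toString` written with plain `+` concatenation. `OpenBlocksSketch` is a hypothetical stand-in with the same three fields, not Spark's actual `OpenBlocks` class. Per JEP 280, `javac` targeting Java 9 or later compiles such concatenation to an `invokedynamic` call, as in the AFTER listing above.

```java
import java.util.Arrays;

// Illustrative sketch only: a hypothetical stand-in, not Spark's OpenBlocks.
final class OpenBlocksSketch {
    private final String appId;
    private final String execId;
    private final String[] blockIds;

    OpenBlocksSketch(String appId, String execId, String[] blockIds) {
        this.appId = appId;
        this.execId = execId;
        this.blockIds = blockIds;
    }

    // Plain concatenation: no builder object and no third-party dependency.
    @Override
    public String toString() {
        return "OpenBlocks[appId=" + appId + ",execId=" + execId + ",blockIds=" +
            Arrays.toString(blockIds) + "]";
    }
}
```

For example, `new OpenBlocksSketch("app-1", "exec-2", new String[] {"b0", "b1"}).toString()` returns `OpenBlocks[appId=app-1,execId=exec-2,blockIds=[b0, b1]]`.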

### Does this PR introduce _any_ user-facing change?

No. This is a `toString` implementation improvement.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #51572 from dongjoon-hyun/SPARK-52880.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
haoyangeng-db pushed a commit to haoyangeng-db/apache-spark that referenced this pull request Jul 22, 2025