
SKIPME Spark 1.4.1 #64


Merged
merged 40 commits into from
Jul 13, 2015

Conversation

markhamstra

catch up with branch-1.4 bug fixes; Spark 1.4.1 release

Davies Liu and others added 30 commits July 2, 2015 15:58
…rt to 1.4)

This PR backports apache#7199 to branch-1.4

Author: Cheng Lian <[email protected]>

Closes apache#7200 from liancheng/spark-8501-for-1.4 and squashes the following commits:

725e9e3 [Cheng Lian] Addresses comments
0fa25af [Cheng Lian] Avoids reading schema from empty ORC files
I am increasing the PermGen size to 256m.

https://issues.apache.org/jira/browse/SPARK-8776

Author: Yin Huai <[email protected]>

Closes apache#7196 from yhuai/SPARK-8776 and squashes the following commits:

60901b4 [Yin Huai] Fix test.
d44b713 [Yin Huai] Make sparkShell and hiveConsole use 256m PermGen size.
30aaf8e [Yin Huai] Increase the default PermGen size to 256m.

(cherry picked from commit f743c79)
Signed-off-by: Yin Huai <[email protected]>
cc rxin

Having backticks or null as elements causes problems.
Since elements become column names, we have to drop backticks from the elements, as backticks are special characters.
Having null throws exceptions; we can replace nulls with empty strings.

Handling backticks should be improved for 1.5
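For illustration, here is a minimal PySpark sketch of the kind of input that triggered this (the data and setup are hypothetical, not from the patch):

```
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext("local", "crosstab-demo")
sqlContext = SQLContext(sc)

# Distinct values of the second column become column names of the result,
# so backticks must be stripped and nulls replaced with empty strings.
df = sqlContext.createDataFrame(
    [(1, "`a`"), (2, None), (1, "b")], ["id", "key"])
df.stat.crosstab("id", "key").show()
```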

Author: Burak Yavuz <[email protected]>

Closes apache#7201 from brkyvz/weird-ct-elements and squashes the following commits:

e06b840 [Burak Yavuz] fix scalastyle
93a0d3f [Burak Yavuz] added tests for NaN and Infinity
9dba6ce [Burak Yavuz] address cr1
db71dbd [Burak Yavuz] handle special characters in elements in crosstab

(cherry picked from commit 9b23e92)
Signed-off-by: Reynold Xin <[email protected]>

Conflicts:
	sql/core/src/main/scala/org/apache/spark/sql/execution/stat/StatFunctions.scala
JIRA: https://issues.apache.org/jira/browse/SPARK-8463

Currently, on the read path, `DriverRegistry` is used to load the needed JDBC driver on executors. However, on the write path, we also need `DriverRegistry` to load the JDBC driver.
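As a hedged illustration of the write path this touches (the connection details and `df` are placeholders, not from the patch):

```
# Hypothetical connection details; `df` stands for an existing DataFrame.
# The executor-side write path must load the JDBC driver through
# DriverRegistry, just as the read path already does.
url = "jdbc:postgresql://localhost/testdb"
properties = {"driver": "org.postgresql.Driver", "user": "test"}
df.write.jdbc(url, "target_table", mode="append", properties=properties)
```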

Author: Liang-Chi Hsieh <[email protected]>

Closes apache#6900 from viirya/jdbc_write_driver and squashes the following commits:

16cd04b [Liang-Chi Hsieh] Use DriverRegistry to load jdbc driver at writing path.

(cherry picked from commit d4d6d31)
Signed-off-by: Reynold Xin <[email protected]>
This is a workaround for MSHADE-148, which leads to an infinite loop when building Spark with Maven 3.3.x. This was originally caused by apache#6441, which added a bunch of test dependencies on the spark-core test module. Recently, it was surfaced by apache#7193.

This patch adds a `-Prelease` profile. If present, it will set `createDependencyReducedPom` to true. The consequences are:
- If you are releasing Spark with this profile, you are fine as long as you use maven 3.2.x or before.
- If you are releasing Spark without this profile, you will run into SPARK-8781.
- If you are not releasing Spark but you are using this profile, you may run into SPARK-8819.
- If you are not releasing Spark and you did not include this profile, you are fine.

This is all documented in `pom.xml` and tested locally with both versions of maven.

Author: Andrew Or <[email protected]>

Closes apache#7219 from andrewor14/fix-maven-build and squashes the following commits:

1d37e87 [Andrew Or] Merge branch 'master' of github.com:apache/spark into fix-maven-build
3574ae4 [Andrew Or] Review comments
f39199c [Andrew Or] Create a -Prelease profile that flags `createDependencyReducedPom`

(cherry picked from commit 9eae5fa)
Signed-off-by: Andrew Or <[email protected]>
when publishing releases. We named it 'release-profile' because that is
the Maven convention. However, it turns out this special name causes several
other, undesirable things to kick in when we are creating releases.
For instance, it triggers the javadoc plugin to run, which actually fails
in our current build set-up.

The fix is simply to rename this profile so that no
collateral damage is associated with its use.
Otherwise the script will crash with

    - Downloading boto...
    Traceback (most recent call last):
      File "ec2/spark_ec2.py", line 148, in <module>
        setup_external_libs(external_libs)
      File "ec2/spark_ec2.py", line 128, in setup_external_libs
        if hashlib.md5(tar.read()).hexdigest() != lib["md5"]:
      File "/usr/lib/python3.4/codecs.py", line 319, in decode
        (result, consumed) = self._buffer_decode(data, self.errors, final)
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

This happens in the case of a UTF-8 env setting.
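The fix is to open the downloaded tarball in binary mode, so that `hashlib.md5` receives raw bytes instead of text decoded through the locale. A self-contained sketch (the function and parameter names are illustrative, not the script's):

```
import hashlib

def md5_matches(path, expected_md5):
    # "rb" hands raw bytes to hashlib.md5; text mode would try to
    # UTF-8-decode the gzipped tarball first, raising the
    # UnicodeDecodeError shown above.
    with open(path, "rb") as tar:
        return hashlib.md5(tar.read()).hexdigest() == expected_md5
```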

Author: Simon Hafner <[email protected]>

Closes apache#7215 from reactormonk/branch-1.4 and squashes the following commits:

e86957a [Simon Hafner] [SPARK-8821] [EC2] Switched to binary mode
…ts only of NullType columns

https://issues.apache.org/jira/browse/SPARK-8868

Author: Yin Huai <[email protected]>

Closes apache#7262 from yhuai/SPARK-8868 and squashes the following commits:

cb58780 [Yin Huai] Andrew's comment.
e456857 [Yin Huai] Josh's comments.
5122e65 [Yin Huai] If types of all columns are NullTypes, do not use serializer2.

(cherry picked from commit 68a4a16)
Signed-off-by: Josh Rosen <[email protected]>
Author: Sun Rui <[email protected]>

Closes apache#7287 from sun-rui/SPARK-8894 and squashes the following commits:

da63898 [Sun Rui] [SPARK-8894][SPARKR][DOC] Example code errors in SparkR documentation.

(cherry picked from commit bf02e37)
Signed-off-by: Shivaram Venkataraman <[email protected]>
Fail to upload resource to viewfs in spark-1.4
JIRA Link: https://issues.apache.org/jira/browse/SPARK-8657

Author: Tao Li <[email protected]>

Closes apache#7125 from litao-buptsse/SPARK-8657-for-master and squashes the following commits:

65b13f4 [Tao Li] [SPARK-8657] [YARN] Fail to upload resource to viewfs

(cherry picked from commit 26d9b6b)
Signed-off-by: Sean Owen <[email protected]>

# Conflicts:
#	yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
cc pwendell

Author: Shivaram Venkataraman <[email protected]>

Closes apache#7293 from shivaram/sparkr-packages-doc and squashes the following commits:

c91471d [Shivaram Venkataraman] Fix sparkPackages in init documentation

(cherry picked from commit 374c8a8)
Signed-off-by: Shivaram Venkataraman <[email protected]>
…ng-guide#Manually Specifying Options to be in sync with java, python, R version

Author: Alok Singh <[email protected]>

Closes apache#7299 from aloknsingh/aloknsingh_SPARK-8909 and squashes the following commits:

d3c20ba [Alok Singh] fix the file to .parquet from .json
d476140 [Alok Singh] [SPARK-8909][Documentation] Change the scala example in sql-programming-guide#Manually Specifying Options to be in sync with java,python, R version

(cherry picked from commit 8f3cd93)
Signed-off-by: Reynold Xin <[email protected]>
This fixes a bug introduced in the cherry-pick of apache#7201 which led to a NullPointerException when cross-tabulating a data set that contains null values.

Author: Josh Rosen <[email protected]>

Closes apache#7295 from JoshRosen/SPARK-8903 and squashes the following commits:

5489948 [Josh Rosen] [SPARK-8903] Fix bug in cherry-pick of SPARK-8803
With "+" the strings are separate expressions, and format() is called on the last string before concatenation. (So substitution does not happen.) Without "+" the string literals are merged first by the parser, so format() is called on the complete string.

Should I make a JIRA for this?

Author: Daniel Darabos <[email protected]>

Closes apache#7288 from darabos/patch-2 and squashes the following commits:

be0d3b7 [Daniel Darabos] Correctly print hostname in error

(cherry picked from commit 5687f76)
Signed-off-by: Kousuke Saruta <[email protected]>
A couple of descriptions were not inside `<td></td>` and were being displayed immediately under the section title instead of in their rows.

Author: Jonathan Alter <[email protected]>

Closes apache#7292 from jonalter/docs-config and squashes the following commits:

5ce1570 [Jonathan Alter] [DOCS] Format wrong for some config descriptions

(cherry picked from commit 28fa01e)
Signed-off-by: Sean Owen <[email protected]>
Due to the way MiMa works, we currently start a `SQLContext` pretty early on. This causes us to start a `SparkUI` that attempts to bind to port 4040. Because many tests run in parallel on the Jenkins machines, this sometimes causes port contention and fails the MiMa tests.

Note that we already disabled the SparkUI for scalatests. However, the MiMa test is run before we even have a chance to load the default scalatest settings, so we need to explicitly disable the UI ourselves.
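For reference, a hedged sketch of disabling the UI explicitly through the standard `spark.ui.enabled` setting (the actual fix lives in the build configuration, not in application code):

```
from pyspark import SparkConf, SparkContext

# spark.ui.enabled=false stops the SparkUI from binding a port (4040 by
# default), avoiding contention when many tests run in parallel.
conf = SparkConf().set("spark.ui.enabled", "false")
sc = SparkContext(conf=conf)
```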

Author: Andrew Or <[email protected]>

Closes apache#7300 from andrewor14/mima-flaky and squashes the following commits:

b55a547 [Andrew Or] Do not enable SparkUI during tests

(cherry picked from commit 47ef423)
Signed-off-by: Reynold Xin <[email protected]>
sarutak and others added 10 commits July 9, 2015 13:28
…s missing in ScalaTest config.

`spark.unsafe.exceptionOnMemoryLeak` is present in the Surefire config.

```
        <!-- Surefire runs all Java tests -->
        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-surefire-plugin</artifactId>
          <version>2.18.1</version>
          <!-- Note config is repeated in scalatest config -->
...

<spark.unsafe.exceptionOnMemoryLeak>true</spark.unsafe.exceptionOnMemoryLeak>
            </systemProperties>
...
```

but is absent in the ScalaTest config.

Author: Kousuke Saruta <[email protected]>

Closes apache#7308 from sarutak/add-setting-for-memory-leak and squashes the following commits:

95644e7 [Kousuke Saruta] Added a setting for memory leak

(cherry picked from commit aba5784)
Signed-off-by: Kousuke Saruta <[email protected]>
(This reopens a patch that was closed in the past: apache#6248)

When you view the stage page while running the following:
```
sc.parallelize(1 to X, 10000).count()
```
The page never loads, the job is stalled, and you end up running into an OOM:
```
HTTP ERROR 500

Problem accessing /stages/stage/. Reason:
    Server Error
Caused by:
java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2367)
    at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
```
This patch compresses Jetty responses with gzip. The correct long-term fix is to add pagination.
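To see why gzip helps here: a large stage page is dominated by highly repetitive HTML table rows, which compress extremely well. A made-up but runnable illustration:

```
import gzip

# A stage page with many tasks is mostly repeated <tr>...</tr> markup.
row = b"<tr><td>task 0</td><td>SUCCESS</td><td>12 ms</td></tr>"
page = row * 100000
print(len(page), len(gzip.compress(page)))
# The compressed size is orders of magnitude smaller than the raw HTML.
```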

Author: Andrew Or <[email protected]>

Closes apache#7296 from andrewor14/gzip-jetty and squashes the following commits:

a051c64 [Andrew Or] Use GZIP to compress Jetty responses

(cherry picked from commit ebdf585)
Signed-off-by: Andrew Or <[email protected]>
Author: guowei2 <[email protected]>

Closes apache#7254 from guowei2/spark-8865 and squashes the following commits:

48ca17a [guowei2] fix contains key

(cherry picked from commit 8977003)
Signed-off-by: Tathagata Das <[email protected]>
The update function runs for *all* existing keys, and returning `None` removes the key-value pair.
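A minimal PySpark sketch of that behaviour (the names and expiry rule are illustrative; the Scala API returns `Option[S]`, and returning `None` in Python likewise drops the key):

```
def update_func(new_values, running_count):
    # Invoked for *every* key with existing state on each batch,
    # even when no new values arrived for that key.
    if not new_values and running_count is not None:
        return None  # returning None removes this key-value pair from the state
    return (running_count or 0) + sum(new_values)

# Usage on a DStream of (key, value) pairs:
# counts = pairs.updateStateByKey(update_func)
```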

Author: Michael Vogiatzis <[email protected]>

Closes apache#7229 from mvogiatzis/patch-1 and squashes the following commits:

e7a2946 [Michael Vogiatzis] Updated updateStateByKey text
00283ed [Michael Vogiatzis] Removed space
c2656f9 [Michael Vogiatzis] Moved description farther up
0a42551 [Michael Vogiatzis] Added important updateStateByKey details

(cherry picked from commit d538919)
Signed-off-by: Tathagata Das <[email protected]>
…t user specified options (for branch-1.4)

Backports PR apache#7347 (SPARK-8990) to branch-1.4.

Author: Cheng Lian <[email protected]>

Closes apache#7351 from liancheng/spark-8990-for-1.4 and squashes the following commits:

ffb5a73 [Cheng Lian] Backports PR apache#7347 (SPARK-8990) to branch-1.4
markhamstra added a commit that referenced this pull request Jul 13, 2015
@markhamstra markhamstra merged commit 478d9fb into alteryx:csd-1.4 Jul 13, 2015
markhamstra pushed a commit that referenced this pull request Mar 21, 2017
## What changes were proposed in this pull request?

The errors below seem to be caused by unidoc, which does not understand double-commented blocks.

```
[error] .../spark/sql/core/target/java/org/apache/spark/sql/KeyedState.java:69: error: class, interface, or enum expected
[error]  * MapGroupsWithStateFunction&lt;String, Integer, Integer, String&gt; mappingFunction =
[error]                                  ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/KeyedState.java:69: error: class, interface, or enum expected
[error]  * MapGroupsWithStateFunction&lt;String, Integer, Integer, String&gt; mappingFunction =
[error]                                                                       ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/KeyedState.java:70: error: class, interface, or enum expected
[error]  *    new MapGroupsWithStateFunction&lt;String, Integer, Integer, String&gt;() {
[error]                                         ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/KeyedState.java:70: error: class, interface, or enum expected
[error]  *    new MapGroupsWithStateFunction&lt;String, Integer, Integer, String&gt;() {
[error]                                                                             ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/KeyedState.java:72: error: illegal character: '#'
[error]  *      &#64;Override
[error]          ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/KeyedState.java:72: error: class, interface, or enum expected
[error]  *      &#64;Override
[error]              ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/KeyedState.java:73: error: class, interface, or enum expected
[error]  *      public String call(String key, Iterator&lt;Integer&gt; value, KeyedState&lt;Integer&gt; state) {
[error]                ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/KeyedState.java:73: error: class, interface, or enum expected
[error]  *      public String call(String key, Iterator&lt;Integer&gt; value, KeyedState&lt;Integer&gt; state) {
[error]                                                    ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/KeyedState.java:73: error: class, interface, or enum expected
[error]  *      public String call(String key, Iterator&lt;Integer&gt; value, KeyedState&lt;Integer&gt; state) {
[error]                                                                ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/KeyedState.java:73: error: class, interface, or enum expected
[error]  *      public String call(String key, Iterator&lt;Integer&gt; value, KeyedState&lt;Integer&gt; state) {
[error]                                                                                     ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/KeyedState.java:73: error: class, interface, or enum expected
[error]  *      public String call(String key, Iterator&lt;Integer&gt; value, KeyedState&lt;Integer&gt; state) {
[error]                                                                                                 ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/KeyedState.java:76: error: class, interface, or enum expected
[error]  *          boolean shouldRemove = ...; // Decide whether to remove the state
[error]  ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/KeyedState.java:77: error: class, interface, or enum expected
[error]  *          if (shouldRemove) {
[error]  ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/KeyedState.java:79: error: class, interface, or enum expected
[error]  *          } else {
[error]  ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/KeyedState.java:81: error: class, interface, or enum expected
[error]  *            state.update(newState); // Set the new state
[error]  ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/KeyedState.java:82: error: class, interface, or enum expected
[error]  *          }
[error]  ^
[error] .../forked/spark/sql/core/target/java/org/apache/spark/sql/KeyedState.java:85: error: class, interface, or enum expected
[error]  *          state.update(initialState);
[error]  ^
[error] .../forked/spark/sql/core/target/java/org/apache/spark/sql/KeyedState.java:86: error: class, interface, or enum expected
[error]  *        }
[error]  ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/KeyedState.java:90: error: class, interface, or enum expected
[error]  * </code></pre>
[error]  ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/KeyedState.java:92: error: class, interface, or enum expected
[error]  * tparam S User-defined type of the state to be stored for each key. Must be encodable into
[error]            ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/KeyedState.java:93: error: class, interface, or enum expected
[error]  *           Spark SQL types (see {link Encoder} for more details).
[error]                                          ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/KeyedState.java:94: error: class, interface, or enum expected
[error]  * since 2.1.1
[error]           ^
```

And another link seems to be unrecognisable:

```
.../spark/sql/core/target/java/org/apache/spark/sql/KeyedState.java:16: error: reference not found
[error]  * That is, in every batch of the {link streaming.StreamingQuery StreamingQuery},
[error]
```

Note that this PR does not fix the two breaks below:

```
[error] .../spark/sql/core/target/java/org/apache/spark/sql/DataFrameStatFunctions.java:43: error: unexpected content
[error]    * see {link DataFrameStatsFunctions.approxQuantile(col:Str* approxQuantile} for
[error]      ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/DataFrameStatFunctions.java:52: error: bad use of '>'
[error]    * param relativeError The relative target precision to achieve (>= 0).
[error]                                                                     ^
[error]
```

because these will probably be fixed soon in apache#16776, and I intended to avoid potential conflicts.

## How was this patch tested?

Manually via `jekyll build`

Author: hyukjinkwon <[email protected]>

Closes apache#16926 from HyukjinKwon/javadoc-break.