
SKIPME Spark 1.4.1 #64


Merged
merged 40 commits into from
Jul 13, 2015

Conversation

markhamstra

catch up with branch-1.4 bug fixes; Spark 1.4.1 release

Davies Liu and others added 30 commits July 2, 2015 15:58
…rt to 1.4)

This PR backports apache#7199 to branch-1.4

Author: Cheng Lian <[email protected]>

Closes apache#7200 from liancheng/spark-8501-for-1.4 and squashes the following commits:

725e9e3 [Cheng Lian] Addresses comments
0fa25af [Cheng Lian] Avoids reading schema from empty ORC files
I am increasing the PermGen size to 256m.

https://issues.apache.org/jira/browse/SPARK-8776

Author: Yin Huai <[email protected]>

Closes apache#7196 from yhuai/SPARK-8776 and squashes the following commits:

60901b4 [Yin Huai] Fix test.
d44b713 [Yin Huai] Make sparkShell and hiveConsole use 256m PermGen size.
30aaf8e [Yin Huai] Increase the default PermGen size to 256m.

(cherry picked from commit f743c79)
Signed-off-by: Yin Huai <[email protected]>
cc rxin

Having backticks or null as elements causes problems.
Since elements become column names, we have to drop backticks from the elements, as backticks are special characters.
Having null throws exceptions; we can replace nulls with empty strings.

Handling backticks should be improved for 1.5
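For illustration, here is a minimal PySpark sketch of the kind of input that triggered this (the data and setup are hypothetical, not from the patch):

```
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext("local", "crosstab-demo")
sqlContext = SQLContext(sc)

# Distinct values of the second column become column names of the result,
# so backticks must be stripped and nulls replaced with empty strings.
df = sqlContext.createDataFrame(
    [(1, "`a`"), (2, None), (1, "b")], ["id", "key"])
df.stat.crosstab("id", "key").show()
```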

Author: Burak Yavuz <[email protected]>

Closes apache#7201 from brkyvz/weird-ct-elements and squashes the following commits:

e06b840 [Burak Yavuz] fix scalastyle
93a0d3f [Burak Yavuz] added tests for NaN and Infinity
9dba6ce [Burak Yavuz] address cr1
db71dbd [Burak Yavuz] handle special characters in elements in crosstab

(cherry picked from commit 9b23e92)
Signed-off-by: Reynold Xin <[email protected]>

Conflicts:
	sql/core/src/main/scala/org/apache/spark/sql/execution/stat/StatFunctions.scala
JIRA: https://issues.apache.org/jira/browse/SPARK-8463

Currently, on the read path, `DriverRegistry` is used to load the needed JDBC driver on executors. However, on the write path, we also need `DriverRegistry` to load the JDBC driver.
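As a hedged illustration of the write path this touches (the connection details and `df` are placeholders, not from the patch):

```
# Hypothetical connection details; `df` stands for an existing DataFrame.
# The executor-side write path must load the JDBC driver through
# DriverRegistry, just as the read path already does.
url = "jdbc:postgresql://localhost/testdb"
properties = {"driver": "org.postgresql.Driver", "user": "test"}
df.write.jdbc(url, "target_table", mode="append", properties=properties)
```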

Author: Liang-Chi Hsieh <[email protected]>

Closes apache#6900 from viirya/jdbc_write_driver and squashes the following commits:

16cd04b [Liang-Chi Hsieh] Use DriverRegistry to load jdbc driver at writing path.

(cherry picked from commit d4d6d31)
Signed-off-by: Reynold Xin <[email protected]>
This is a workaround for MSHADE-148, which leads to an infinite loop when building Spark with Maven 3.3.x. This was originally caused by apache#6441, which added a bunch of test dependencies on the spark-core test module. Recently, it was surfaced by apache#7193.

This patch adds a `-Prelease` profile. If present, it will set `createDependencyReducedPom` to true. The consequences are:
- If you are releasing Spark with this profile, you are fine as long as you use maven 3.2.x or before.
- If you are releasing Spark without this profile, you will run into SPARK-8781.
- If you are not releasing Spark but you are using this profile, you may run into SPARK-8819.
- If you are not releasing Spark and you did not include this profile, you are fine.

This is all documented in `pom.xml` and tested locally with both versions of maven.

Author: Andrew Or <[email protected]>

Closes apache#7219 from andrewor14/fix-maven-build and squashes the following commits:

1d37e87 [Andrew Or] Merge branch 'master' of github.com:apache/spark into fix-maven-build
3574ae4 [Andrew Or] Review comments
f39199c [Andrew Or] Create a -Prelease profile that flags `createDependencyReducedPom`

(cherry picked from commit 9eae5fa)
Signed-off-by: Andrew Or <[email protected]>
when publishing releases. We named it 'release-profile' because that is
the Maven convention. However, it turns out this special name causes several
other, undesirable things to kick in when we are creating releases.
For instance, it triggers the javadoc plugin to run, which actually fails
in our current build set-up.

The fix is simply to rename this profile so that no
collateral damage is associated with its use.
Otherwise the script will crash with

    - Downloading boto...
    Traceback (most recent call last):
      File "ec2/spark_ec2.py", line 148, in <module>
        setup_external_libs(external_libs)
      File "ec2/spark_ec2.py", line 128, in setup_external_libs
        if hashlib.md5(tar.read()).hexdigest() != lib["md5"]:
      File "/usr/lib/python3.4/codecs.py", line 319, in decode
        (result, consumed) = self._buffer_decode(data, self.errors, final)
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

This happens in the case of a UTF-8 env setting.
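The fix is to open the downloaded tarball in binary mode, so that `hashlib.md5` receives raw bytes instead of text decoded through the locale. A self-contained sketch (the function and parameter names are illustrative, not the script's):

```
import hashlib

def md5_matches(path, expected_md5):
    # "rb" hands raw bytes to hashlib.md5; text mode would try to
    # UTF-8-decode the gzipped tarball first, raising the
    # UnicodeDecodeError shown above.
    with open(path, "rb") as tar:
        return hashlib.md5(tar.read()).hexdigest() == expected_md5
```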

Author: Simon Hafner <[email protected]>

Closes apache#7215 from reactormonk/branch-1.4 and squashes the following commits:

e86957a [Simon Hafner] [SPARK-8821] [EC2] Switched to binary mode
…ts only of NullType columns

https://issues.apache.org/jira/browse/SPARK-8868

Author: Yin Huai <[email protected]>

Closes apache#7262 from yhuai/SPARK-8868 and squashes the following commits:

cb58780 [Yin Huai] Andrew's comment.
e456857 [Yin Huai] Josh's comments.
5122e65 [Yin Huai] If types of all columns are NullTypes, do not use serializer2.

(cherry picked from commit 68a4a16)
Signed-off-by: Josh Rosen <[email protected]>
Author: Sun Rui <[email protected]>

Closes apache#7287 from sun-rui/SPARK-8894 and squashes the following commits:

da63898 [Sun Rui] [SPARK-8894][SPARKR][DOC] Example code errors in SparkR documentation.

(cherry picked from commit bf02e37)
Signed-off-by: Shivaram Venkataraman <[email protected]>
Fail to upload resource to viewfs in spark-1.4
JIRA Link: https://issues.apache.org/jira/browse/SPARK-8657

Author: Tao Li <[email protected]>

Closes apache#7125 from litao-buptsse/SPARK-8657-for-master and squashes the following commits:

65b13f4 [Tao Li] [SPARK-8657] [YARN] Fail to upload resource to viewfs

(cherry picked from commit 26d9b6b)
Signed-off-by: Sean Owen <[email protected]>

# Conflicts:
#	yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
cc pwendell

Author: Shivaram Venkataraman <[email protected]>

Closes apache#7293 from shivaram/sparkr-packages-doc and squashes the following commits:

c91471d [Shivaram Venkataraman] Fix sparkPackages in init documentation

(cherry picked from commit 374c8a8)
Signed-off-by: Shivaram Venkataraman <[email protected]>
…ng-guide#Manually Specifying Options to be in sync with java, python, R version

Author: Alok Singh <[email protected]>

Closes apache#7299 from aloknsingh/aloknsingh_SPARK-8909 and squashes the following commits:

d3c20ba [Alok Singh] fix the file to .parquet from .json
d476140 [Alok Singh] [SPARK-8909][Documentation] Change the scala example in sql-programming-guide#Manually Specifying Options to be in sync with java,python, R version

(cherry picked from commit 8f3cd93)
Signed-off-by: Reynold Xin <[email protected]>
This fixes a bug introduced in the cherry-pick of apache#7201 which led to a NullPointerException when cross-tabulating a data set that contains null values.

Author: Josh Rosen <[email protected]>

Closes apache#7295 from JoshRosen/SPARK-8903 and squashes the following commits:

5489948 [Josh Rosen] [SPARK-8903] Fix bug in cherry-pick of SPARK-8803
With "+" the strings are separate expressions, and format() is called on the last string before concatenation. (So substitution does not happen.) Without "+" the string literals are merged first by the parser, so format() is called on the complete string.

Should I make a JIRA for this?

Author: Daniel Darabos <[email protected]>

Closes apache#7288 from darabos/patch-2 and squashes the following commits:

be0d3b7 [Daniel Darabos] Correctly print hostname in error

(cherry picked from commit 5687f76)
Signed-off-by: Kousuke Saruta <[email protected]>
A couple of descriptions were not inside `<td></td>` and were being displayed immediately under the section title instead of in their rows.

Author: Jonathan Alter <[email protected]>

Closes apache#7292 from jonalter/docs-config and squashes the following commits:

5ce1570 [Jonathan Alter] [DOCS] Format wrong for some config descriptions

(cherry picked from commit 28fa01e)
Signed-off-by: Sean Owen <[email protected]>
Due to the way MiMa works, we currently start a `SQLContext` pretty early on. This causes us to start a `SparkUI` that attempts to bind to port 4040. Because many tests run in parallel on the Jenkins machines, this sometimes causes port contention and fails the MiMa tests.

Note that we already disabled the SparkUI for scalatests. However, the MiMa test is run before we even have a chance to load the default scalatest settings, so we need to explicitly disable the UI ourselves.
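For reference, a hedged sketch of disabling the UI explicitly through the standard `spark.ui.enabled` setting (the actual fix lives in the build configuration, not in application code):

```
from pyspark import SparkConf, SparkContext

# spark.ui.enabled=false stops the SparkUI from binding a port (4040 by
# default), avoiding contention when many tests run in parallel.
conf = SparkConf().set("spark.ui.enabled", "false")
sc = SparkContext(conf=conf)
```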

Author: Andrew Or <[email protected]>

Closes apache#7300 from andrewor14/mima-flaky and squashes the following commits:

b55a547 [Andrew Or] Do not enable SparkUI during tests

(cherry picked from commit 47ef423)
Signed-off-by: Reynold Xin <[email protected]>
sarutak and others added 10 commits July 9, 2015 13:28
…s missing in ScalaTest config.

`spark.unsafe.exceptionOnMemoryLeak` is present in the Surefire config.

```
        <!-- Surefire runs all Java tests -->
        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-surefire-plugin</artifactId>
          <version>2.18.1</version>
          <!-- Note config is repeated in scalatest config -->
...

<spark.unsafe.exceptionOnMemoryLeak>true</spark.unsafe.exceptionOnMemoryLeak>
            </systemProperties>
...
```

but is absent in the ScalaTest config.

Author: Kousuke Saruta <[email protected]>

Closes apache#7308 from sarutak/add-setting-for-memory-leak and squashes the following commits:

95644e7 [Kousuke Saruta] Added a setting for memory leak

(cherry picked from commit aba5784)
Signed-off-by: Kousuke Saruta <[email protected]>
(This reopens a patch that was closed in the past: apache#6248)

When you view the stage page while running the following:
```
sc.parallelize(1 to X, 10000).count()
```
The page never loads, the job is stalled, and you end up running into an OOM:
```
HTTP ERROR 500

Problem accessing /stages/stage/. Reason:
    Server Error
Caused by:
java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2367)
    at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
```
This patch compresses Jetty responses with gzip. The correct long-term fix is to add pagination.
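To see why gzip helps here: a large stage page is dominated by highly repetitive HTML table rows, which compress extremely well. A made-up but runnable illustration:

```
import gzip

# A stage page with many tasks is mostly repeated <tr>...</tr> markup.
row = b"<tr><td>task 0</td><td>SUCCESS</td><td>12 ms</td></tr>"
page = row * 100000
print(len(page), len(gzip.compress(page)))
# The compressed size is orders of magnitude smaller than the raw HTML.
```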

Author: Andrew Or <[email protected]>

Closes apache#7296 from andrewor14/gzip-jetty and squashes the following commits:

a051c64 [Andrew Or] Use GZIP to compress Jetty responses

(cherry picked from commit ebdf585)
Signed-off-by: Andrew Or <[email protected]>
Author: guowei2 <[email protected]>

Closes apache#7254 from guowei2/spark-8865 and squashes the following commits:

48ca17a [guowei2] fix contains key

(cherry picked from commit 8977003)
Signed-off-by: Tathagata Das <[email protected]>
The update function runs for *all* existing keys, and returning `None` removes the key-value pair.
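A minimal PySpark sketch of that behaviour (the names and expiry rule are illustrative; the Scala API returns `Option[S]`, and returning `None` in Python likewise drops the key):

```
def update_func(new_values, running_count):
    # Invoked for *every* key with existing state on each batch,
    # even when no new values arrived for that key.
    if not new_values and running_count is not None:
        return None  # returning None removes this key-value pair from the state
    return (running_count or 0) + sum(new_values)

# Usage on a DStream of (key, value) pairs:
# counts = pairs.updateStateByKey(update_func)
```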

Author: Michael Vogiatzis <[email protected]>

Closes apache#7229 from mvogiatzis/patch-1 and squashes the following commits:

e7a2946 [Michael Vogiatzis] Updated updateStateByKey text
00283ed [Michael Vogiatzis] Removed space
c2656f9 [Michael Vogiatzis] Moved description farther up
0a42551 [Michael Vogiatzis] Added important updateStateByKey details

(cherry picked from commit d538919)
Signed-off-by: Tathagata Das <[email protected]>
…t user specified options (for branch-1.4)

Backports PR apache#7347 (SPARK-8990) to branch-1.4.

Author: Cheng Lian <[email protected]>

Closes apache#7351 from liancheng/spark-8990-for-1.4 and squashes the following commits:

ffb5a73 [Cheng Lian] Backports PR apache#7347 (SPARK-8990) to branch-1.4
markhamstra added a commit that referenced this pull request Jul 13, 2015
@markhamstra markhamstra merged commit 478d9fb into alteryx:csd-1.4 Jul 13, 2015
markhamstra pushed a commit that referenced this pull request Mar 21, 2017
## What changes were proposed in this pull request?

The errors below seem to be caused by unidoc, which does not understand double-commented blocks.

```
[error] .../spark/sql/core/target/java/org/apache/spark/sql/KeyedState.java:69: error: class, interface, or enum expected
[error]  * MapGroupsWithStateFunction&lt;String, Integer, Integer, String&gt; mappingFunction =
[error]                                  ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/KeyedState.java:69: error: class, interface, or enum expected
[error]  * MapGroupsWithStateFunction&lt;String, Integer, Integer, String&gt; mappingFunction =
[error]                                                                       ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/KeyedState.java:70: error: class, interface, or enum expected
[error]  *    new MapGroupsWithStateFunction&lt;String, Integer, Integer, String&gt;() {
[error]                                         ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/KeyedState.java:70: error: class, interface, or enum expected
[error]  *    new MapGroupsWithStateFunction&lt;String, Integer, Integer, String&gt;() {
[error]                                                                             ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/KeyedState.java:72: error: illegal character: '#'
[error]  *      &#64;Override
[error]          ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/KeyedState.java:72: error: class, interface, or enum expected
[error]  *      &#64;Override
[error]              ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/KeyedState.java:73: error: class, interface, or enum expected
[error]  *      public String call(String key, Iterator&lt;Integer&gt; value, KeyedState&lt;Integer&gt; state) {
[error]                ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/KeyedState.java:73: error: class, interface, or enum expected
[error]  *      public String call(String key, Iterator&lt;Integer&gt; value, KeyedState&lt;Integer&gt; state) {
[error]                                                    ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/KeyedState.java:73: error: class, interface, or enum expected
[error]  *      public String call(String key, Iterator&lt;Integer&gt; value, KeyedState&lt;Integer&gt; state) {
[error]                                                                ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/KeyedState.java:73: error: class, interface, or enum expected
[error]  *      public String call(String key, Iterator&lt;Integer&gt; value, KeyedState&lt;Integer&gt; state) {
[error]                                                                                     ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/KeyedState.java:73: error: class, interface, or enum expected
[error]  *      public String call(String key, Iterator&lt;Integer&gt; value, KeyedState&lt;Integer&gt; state) {
[error]                                                                                                 ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/KeyedState.java:76: error: class, interface, or enum expected
[error]  *          boolean shouldRemove = ...; // Decide whether to remove the state
[error]  ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/KeyedState.java:77: error: class, interface, or enum expected
[error]  *          if (shouldRemove) {
[error]  ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/KeyedState.java:79: error: class, interface, or enum expected
[error]  *          } else {
[error]  ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/KeyedState.java:81: error: class, interface, or enum expected
[error]  *            state.update(newState); // Set the new state
[error]  ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/KeyedState.java:82: error: class, interface, or enum expected
[error]  *          }
[error]  ^
[error] .../forked/spark/sql/core/target/java/org/apache/spark/sql/KeyedState.java:85: error: class, interface, or enum expected
[error]  *          state.update(initialState);
[error]  ^
[error] .../forked/spark/sql/core/target/java/org/apache/spark/sql/KeyedState.java:86: error: class, interface, or enum expected
[error]  *        }
[error]  ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/KeyedState.java:90: error: class, interface, or enum expected
[error]  * </code></pre>
[error]  ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/KeyedState.java:92: error: class, interface, or enum expected
[error]  * tparam S User-defined type of the state to be stored for each key. Must be encodable into
[error]            ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/KeyedState.java:93: error: class, interface, or enum expected
[error]  *           Spark SQL types (see {link Encoder} for more details).
[error]                                          ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/KeyedState.java:94: error: class, interface, or enum expected
[error]  * since 2.1.1
[error]           ^
```

And another link seems to be unrecognisable:

```
.../spark/sql/core/target/java/org/apache/spark/sql/KeyedState.java:16: error: reference not found
[error]  * That is, in every batch of the {link streaming.StreamingQuery StreamingQuery},
[error]
```

Note that this PR does not fix the two breaks below:

```
[error] .../spark/sql/core/target/java/org/apache/spark/sql/DataFrameStatFunctions.java:43: error: unexpected content
[error]    * see {link DataFrameStatsFunctions.approxQuantile(col:Str* approxQuantile} for
[error]      ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/DataFrameStatFunctions.java:52: error: bad use of '>'
[error]    * param relativeError The relative target precision to achieve (>= 0).
[error]                                                                     ^
[error]
```

because these will probably be fixed soon in apache#16776, and I intended to avoid potential conflicts.

## How was this patch tested?

Manually via `jekyll build`

Author: hyukjinkwon <[email protected]>

Closes apache#16926 from HyukjinKwon/javadoc-break.