Spark 5529 backport 1.3 #5746

Closed
wants to merge 647 commits into from

Conversation

alexrovner

Still running tests on this branch. I mechanically applied the changes based on #4369 without fully understanding what's actually happening, since I am not familiar with the codebase. Feedback would be appreciated.

viirya and others added 30 commits March 2, 2015 13:11
…tion

It should be `true` instead of `false`?

Author: Liang-Chi Hsieh <[email protected]>

Closes apache#4762 from viirya/doc_fix and squashes the following commits:

2e37482 [Liang-Chi Hsieh] Fix doc.

(cherry picked from commit 3f9def8)
Signed-off-by: Michael Armbrust <[email protected]>
A HiveQL expression like `select count(1) from src tablesample(1 percent);` means selecting from a 1% sample, but in the current version of Spark it means 100%.
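
As a rough illustration of the intended semantics (not the actual patch), a percentage given to `tablesample` would be turned into a sampling fraction by dividing by 100. The helper name `percent_to_fraction` and its range check are assumptions made for this sketch.

```python
def percent_to_fraction(percent: float) -> float:
    """Convert a TABLESAMPLE percentage (0-100) into a fraction (0.0-1.0).

    Hypothetical helper for illustration only; not Spark's actual code.
    """
    if not 0.0 <= percent <= 100.0:
        raise ValueError(f"Sampling percentage {percent} must be in [0, 100]")
    # "1 percent" should sample 1% of the rows, i.e. a fraction of 0.01.
    return percent / 100.0
```

Under this reading, `tablesample(1 percent)` corresponds to a fraction of 0.01 rather than 1.0.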

Author: q00251598 <[email protected]>

Closes apache#4789 from watermen/SPARK-6040 and squashes the following commits:

2453ebe [q00251598] check and adjust the fraction.

(cherry picked from commit 582e5a2)
Signed-off-by: Michael Armbrust <[email protected]>
…ers.

Some YARN configurations return a vcore count for allocated
containers that does not match the requested resource. That means
Spark would always ignore those containers. So relax the matching
of the vcore count to allow the Spark jobs to run.
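
A minimal sketch of what "no vcore matching" could look like, assuming an allocated container is compared to the request on memory only. The `Resource` class and `matches_request` function are hypothetical names for illustration, not YARN or Spark APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Hypothetical stand-in for a container resource description."""
    memory_mb: int
    vcores: int


def matches_request(allocated: Resource, requested: Resource) -> bool:
    """Lax matching: compare memory only and ignore the vcore count.

    Assumption for this sketch: some YARN configurations report a vcore
    count that differs from the request, so vcores are not compared.
    """
    return allocated.memory_mb == requested.memory_mb
```

With this rule, a container reported with 1 vcore would still satisfy a request for 4 vcores as long as the memory matches.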

Author: Marcelo Vanzin <[email protected]>

Closes apache#4818 from vanzin/SPARK-6050 and squashes the following commits:

991c803 [Marcelo Vanzin] Remove config option, standardize on legacy behavior (no vcore matching).
8c9c346 [Marcelo Vanzin] Restrict lax matching to vcores only.
3359692 [Marcelo Vanzin] [SPARK-6050] [yarn] Add config option to do lax resource matching.

(cherry picked from commit 6b348d9)
Signed-off-by: Thomas Graves <[email protected]>
Author: Michael Armbrust <[email protected]>

Closes apache#4855 from marmbrus/explodeBug and squashes the following commits:

a712249 [Michael Armbrust] [SPARK-6114][SQL] Avoid metastore conversions before plan is resolved

(cherry picked from commit 8223ce6)
Signed-off-by: Michael Armbrust <[email protected]>
…hen caching tables

Constructs like Hive `TRANSFORM` may generate malformed rows (via badly authored external scripts for example). I'm a bit hesitant to have this feature, since it introduces per-tuple cost when caching tables. However, considering caching tables is usually a one-time cost, this is probably worth having.

Author: Cheng Lian <[email protected]>

Closes apache#4842 from liancheng/spark-6082 and squashes the following commits:

b05dbff [Cheng Lian] Provides better error message for malformed rows when caching tables

(cherry picked from commit 1a49496)
Signed-off-by: Michael Armbrust <[email protected]>
Some users have reported difficulty in parsing the new event log format. Since we embed the metadata in the beginning of the file, when we compress the event log we need to skip the metadata because we need that information to parse the log later. This means we'll end up with a partially compressed file if event logging compression is turned on. The old format looks like:
```
sparkVersion = 1.3.0
compressionCodec = org.apache.spark.io.LZFCompressionCodec
=== LOG_HEADER_END ===
// actual events, could be compressed bytes
```
The new format in this patch puts the compression codec in the log file name instead. It also removes the metadata header altogether along with the Spark version, which was not needed. The new file name looks something like:
```
app_without_compression
app_123.lzf
app_456.snappy
```
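
To illustrate how a reader of the new format might recover the codec from the file name, here is a small sketch. It assumes the codec short name is the file extension and that the rest of the name contains no dots; the function name `codec_from_log_name` is hypothetical.

```python
import os
from typing import Optional


def codec_from_log_name(log_name: str) -> Optional[str]:
    """Return the codec short name encoded as the file extension, or None.

    Hypothetical helper; assumes names like "app_123.lzf" where the only
    dot in the file name separates the application ID from the codec.
    """
    _, ext = os.path.splitext(os.path.basename(log_name))
    # splitext keeps the leading dot, e.g. ".lzf", so strip it.
    return ext[1:] if ext else None
```

A log without an extension is then treated as uncompressed.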

I tested this with and without compression, using different compression codecs and event logging directories. I verified that both the `Master` and the `HistoryServer` can render both compressed and uncompressed logs as before.

Author: Andrew Or <[email protected]>

Closes apache#4821 from andrewor14/event-log-format and squashes the following commits:

8511141 [Andrew Or] Fix test
654883d [Andrew Or] Add back metadata with Spark version
7f537cd [Andrew Or] Address review feedback
7d6aa61 [Andrew Or] Make codec an extension
59abee9 [Andrew Or] Merge branch 'master' of github.com:apache/spark into event-log-format
27c9a6c [Andrew Or] Address review feedback
519e51a [Andrew Or] Address review feedback
ef69276 [Andrew Or] Merge branch 'master' of github.com:apache/spark into event-log-format
88a091d [Andrew Or] Add tests for new format and file name
f32d8d2 [Andrew Or] Fix tests
8db5a06 [Andrew Or] Embed metadata in the event log file name instead

(cherry picked from commit 6776cb3)
Signed-off-by: Patrick Wendell <[email protected]>
There are multiple issues with translating on set outlined in the JIRA.

This PR reverts the translation logic added to `SparkConf`. In the future, after the 1.3.0 release we will figure out a way to reorganize the internal structure more elegantly. For now, let's preserve the existing semantics of `SparkConf` since it's a public interface. Unfortunately this means duplicating some code for now, but this is all internal and we can always clean it up later.

Author: Andrew Or <[email protected]>

Closes apache#4799 from andrewor14/conf-set-translate and squashes the following commits:

11c525b [Andrew Or] Move warning to driver
10e77b5 [Andrew Or] Add documentation for deprecation precedence
a369cb1 [Andrew Or] Merge branch 'master' of github.com:apache/spark into conf-set-translate
c26a9e3 [Andrew Or] Revert all translate logic in SparkConf
fef6c9c [Andrew Or] Restore deprecation logic for spark.executor.userClassPathFirst
94b4dfa [Andrew Or] Translate on get, not set

(cherry picked from commit 258d154)
Signed-off-by: Patrick Wendell <[email protected]>
`df.dtypes` shows `null` for UDTs. This PR uses `udt` by default and `VectorUDT` overwrites it with `vector`.
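
The default-plus-override behaviour described here can be sketched as follows. The classes below are simplified stand-ins written for illustration, not the real PySpark or Scala classes, and the method name `simple_string` is an assumption.

```python
class UserDefinedType:
    """Simplified stand-in for a UDT base class (illustration only)."""

    def simple_string(self) -> str:
        # Default short name shown for any UDT that does not override it.
        return "udt"


class VectorUDT(UserDefinedType):
    """Simplified stand-in for the vector UDT (illustration only)."""

    def simple_string(self) -> str:
        # The vector type overrides the default with a more specific name.
        return "vector"
```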

jkbradley davies

Author: Xiangrui Meng <[email protected]>

Closes apache#4858 from mengxr/SPARK-6121 and squashes the following commits:

34f0a77 [Xiangrui Meng] simpleString for UDT

(cherry picked from commit 2db6a85)
Signed-off-by: Xiangrui Meng <[email protected]>
This is based on apache#4801 from dbtsai. The linear method guide is re-organized a little bit for this change.

Closes apache#4801

Author: Xiangrui Meng <[email protected]>
Author: DB Tsai <[email protected]>

Closes apache#4861 from mengxr/SPARK-5537 and squashes the following commits:

47af0ac [Xiangrui Meng] update user guide for multinomial logistic regression
cdc2e15 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into AlpineNow-mlor-doc
096d0ca [DB Tsai] first commit

(cherry picked from commit 9d6c5ae)
Signed-off-by: Xiangrui Meng <[email protected]>
davies

Author: Tathagata Das <[email protected]>

Closes apache#4860 from tdas/SPARK-6127 and squashes the following commits:

82de92a [Tathagata Das] Add Kafka to Python api docs

(cherry picked from commit 9eb22ec)
Signed-off-by: Tathagata Das <[email protected]>
… should work when using datasource api

This PR contains the following changes:
1. Add a new method, `DataType.equalsIgnoreCompatibleNullability`, which is the middle ground between DataType's equality check and `DataType.equalsIgnoreNullability`. For two data types `from` and `to`, it performs the `equalsIgnoreNullability` check and also verifies that the nullability of `from` is compatible with that of `to`. For example, the nullability of `ArrayType(IntegerType, containsNull = false)` is compatible with that of `ArrayType(IntegerType, containsNull = true)` (for an array without null values, we can always say it may contain null values). However, the nullability of `ArrayType(IntegerType, containsNull = true)` is incompatible with that of `ArrayType(IntegerType, containsNull = false)` (for an array that may have null values, we cannot say it does not have null values).
2. For the `resolved` field of `InsertIntoTable`, use `equalsIgnoreCompatibleNullability` to replace the equality check of the data types.
3. For our data source write path, when appending data, we always use the schema of the existing table to write the data. This is important for parquet, since nullability directly impacts the way values are encoded and decoded. If we do not do this, we may see corrupted values when reading values from a set of parquet files generated with different nullability settings.
4. When generating a new parquet table, we always set nullable/containsNull/valueContainsNull to true. So, we will not face situations where we cannot append data because containsNull/valueContainsNull in an Array/Map column of the existing table has already been set to `false`. This change makes the whole data pipeline more robust.
5. Update the equality check of JSON relation. Since JSON does not really care about nullability, `equalsIgnoreNullability` seems a better choice for comparing schemata from two JSON tables.
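
The compatibility rule from item 1 can be sketched for the array case as below. This is a simplified Python model written for illustration, not the Scala implementation: the `IntegerType` and `ArrayType` classes and the function name are stand-ins, and struct and map types are left out.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class IntegerType:
    """Stand-in for an atomic type (illustration only)."""


@dataclass(frozen=True)
class ArrayType:
    """Stand-in for an array type with a containsNull flag."""
    element_type: "DataType"
    contains_null: bool


DataType = Union[IntegerType, ArrayType]


def equals_ignore_compatible_nullability(from_type: DataType, to_type: DataType) -> bool:
    """True if the types match and `from` nullability fits into `to`."""
    if isinstance(from_type, ArrayType) and isinstance(to_type, ArrayType):
        # `from` may contain nulls only if `to` also allows them.
        nullability_ok = to_type.contains_null or not from_type.contains_null
        return nullability_ok and equals_ignore_compatible_nullability(
            from_type.element_type, to_type.element_type
        )
    # Atomic types (or mismatched kinds) fall back to plain equality.
    return from_type == to_type
```

The check is deliberately asymmetric: data without nulls can be written into a column that allows nulls, but not the other way around.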

JIRA: https://issues.apache.org/jira/browse/SPARK-5950

Thanks viirya for the initial work in apache#4729.

cc marmbrus liancheng

Author: Yin Huai <[email protected]>

Closes apache#4826 from yhuai/insertNullabilityCheck and squashes the following commits:

3b61a04 [Yin Huai] Revert change on equals.
80e487e [Yin Huai] asNullable in UDT.
587d88b [Yin Huai] Make methods private.
0cb7ea2 [Yin Huai] marmbrus's comments.
3cec464 [Yin Huai] Cheng's comments.
486ed08 [Yin Huai] Merge remote-tracking branch 'upstream/master' into insertNullabilityCheck
d3747d1 [Yin Huai] Remove unnecessary change.
8360817 [Yin Huai] Merge remote-tracking branch 'upstream/master' into insertNullabilityCheck
8a3f237 [Yin Huai] Use equalsIgnoreNullability instead of equality check.
0eb5578 [Yin Huai] Fix tests.
f6ed813 [Yin Huai] Update old parquet path.
e4f397c [Yin Huai] Unit tests.
b2c06f8 [Yin Huai] Ignore nullability in JSON relation's equality check.
8bd008b [Yin Huai] nullable, containsNull, and valueContainsNull will be always true for parquet data.
bf50d73 [Yin Huai] When appending data, we use the schema of the existing table instead of the schema of the new data.
0a703e7 [Yin Huai] Test failed again since we cannot read correct content.
9a26611 [Yin Huai] Make InsertIntoTable happy.
8f19fe5 [Yin Huai] equalsIgnoreCompatibleNullability
4ec17fd [Yin Huai] Failed test.

(cherry picked from commit 1259994)
Signed-off-by: Michael Armbrust <[email protected]>
 - Various Fixes to docs
 - Make data source traits actually interfaces

Based on apache#4862 but with fixed conflicts.

Author: Reynold Xin <[email protected]>
Author: Michael Armbrust <[email protected]>

Closes apache#4868 from marmbrus/pr/4862 and squashes the following commits:

fe091ea [Michael Armbrust] Merge remote-tracking branch 'origin/master' into pr/4862
0208497 [Reynold Xin] Test fixes.
34e0a28 [Reynold Xin] [SPARK-5310][SQL] Various fixes to Spark SQL docs.

(cherry picked from commit 54d1968)
Signed-off-by: Michael Armbrust <[email protected]>
Similar to `MatrixFactorizationModel`, we only need wrappers to support save/load for tree models in Python.

jkbradley

Author: Xiangrui Meng <[email protected]>

Closes apache#4854 from mengxr/SPARK-6097 and squashes the following commits:

4586a4d [Xiangrui Meng] fix more typos
8ebcac2 [Xiangrui Meng] fix python style
91172d8 [Xiangrui Meng] fix typos
201b3b9 [Xiangrui Meng] update user guide
b5158e2 [Xiangrui Meng] support tree model save/load in PySpark/MLlib

(cherry picked from commit 7e53a79)
Signed-off-by: Xiangrui Meng <[email protected]>
Issue: When the Python DecisionTree example in the programming guide is run, it runs out of Java Heap Space when using the default memory settings for the spark shell.

This prints a warning.

CC: mengxr

Author: Joseph K. Bradley <[email protected]>

Closes apache#4864 from jkbradley/dt-save-heap and squashes the following commits:

02e8daf [Joseph K. Bradley] fixed based on code review
7ecb1ed [Joseph K. Bradley] Added warnings about memory when calling tree and ensemble model save with too small a Java heap size

(cherry picked from commit c2fe3a6)
Signed-off-by: Xiangrui Meng <[email protected]>
…ression

Adding more description on top of apache#4861.

Author: DB Tsai <[email protected]>

Closes apache#4866 from dbtsai/doc and squashes the following commits:

37e9d07 [DB Tsai] doc

(cherry picked from commit b196056)
Signed-off-by: Xiangrui Meng <[email protected]>
After apache#2982 (SPARK-4048) we rely on the newer HBase packaging format.
This adds two features:
1. The ability to publish with a different maven version than
   that specified in the release source.
2. Forking of different Zinc instances during the parallel dist
   creation (to help with some stability issues).
…ize to ensure deleting the temp file"

This reverts commit 25fae8e.
…h Java 6

Add warning about building with Java 7+ and running the JAR on early Java 6.

CC andrewor14

Author: Sean Owen <[email protected]>

Closes apache#4874 from srowen/SPARK-1911 and squashes the following commits:

79fa2f6 [Sean Owen] Add warning about building with Java 7+ and running the JAR on early Java 6.

(cherry picked from commit e750a6b)
Signed-off-by: Andrew Or <[email protected]>
…w/ kryo

https://issues.apache.org/jira/browse/SPARK-5949

Author: Imran Rashid <[email protected]>

Closes apache#4877 from squito/SPARK-5949_register_roaring_bitmap and squashes the following commits:

7e13316 [Imran Rashid] style style style
5f6bb6d [Imran Rashid] more style
709bfe0 [Imran Rashid] style
a5cb744 [Imran Rashid] update tests to cover both types of RoaringBitmapContainers
09610c6 [Imran Rashid] formatting
f9a0b7c [Imran Rashid] put primitive array registrations together
97beaf8 [Imran Rashid] SPARK-5949 HighlyCompressedMapStatus needs more classes registered w/ kryo

(cherry picked from commit 1f1fccc)
Signed-off-by: Reynold Xin <[email protected]>
…ce bug

LBFGS and OWLQN in Breeze 0.10 have a convergence check bug.
This is fixed in 0.11; see the description in the Breeze project for details:

scalanlp/breeze#373 (comment)

Author: Xiangrui Meng <[email protected]>
Author: DB Tsai <[email protected]>
Author: DB Tsai <[email protected]>

Closes apache#4879 from dbtsai/breeze and squashes the following commits:

d848f65 [DB Tsai] Merge pull request apache#1 from mengxr/AlpineNow-breeze
c2ca6ac [Xiangrui Meng] upgrade to breeze-0.11.1
35c2f26 [Xiangrui Meng] fix LRSuite
397a208 [DB Tsai] upgrade breeze

(cherry picked from commit 76e20a0)
Signed-off-by: Xiangrui Meng <[email protected]>
…cker-client

Integration test suites in the JDBC data source (`MySQLIntegration` and `PostgresIntegration`) depend on docker-client 2.7.5, which transitively depends on Guava 17.0. Unfortunately, Guava 17.0 is causing test runtime binary compatibility issues when Spark is compiled against Hive 0.12.0, or Hadoop 2.4.

Considering `MySQLIntegration` and `PostgresIntegration` are ignored right now, I'd suggest moving them from the Spark project to the [Spark integration tests] [1] project. This PR removes both the JDBC data source integration tests and the docker-client test dependency.

[1]: https://github.com/databricks/spark-integration-tests

Author: Cheng Lian <[email protected]>

Closes apache#4872 from liancheng/remove-docker-client and squashes the following commits:

1f4169e [Cheng Lian] Removes DockerHacks
159b24a [Cheng Lian] Removed JDBC integration tests which depends on docker-client

(cherry picked from commit 76b472f)
Signed-off-by: Cheng Lian <[email protected]>
…t LongType value in defaultPrimitive

In `CodeGenerator`, the casting on `FloatType` should use `FloatType` instead of `IntegerType`.

Besides, `defaultPrimitive` for `LongType` should be `-1L` instead of `1L`.

Author: Liang-Chi Hsieh <[email protected]>

Closes apache#4870 from viirya/codegen_type and squashes the following commits:

76311dd [Liang-Chi Hsieh] Fix wrong datatype for casting on FloatType. Fix the wrong value for LongType in defaultPrimitive.

(cherry picked from commit aef8a84)
Signed-off-by: Cheng Lian <[email protected]>
The code failed in two modes: it complained when it tried to re-create a directory that already existed, and it was placing some files in the wrong parent directory. The patch fixes both issues.

Author: Marcelo Vanzin <[email protected]>
Author: trystanleftwich <[email protected]>

Closes apache#4894 from vanzin/SPARK-6144 and squashes the following commits:

100b3a1 [Marcelo Vanzin] Style fix.
58266aa [Marcelo Vanzin] Fix fetchHcfs file for directories.
91733b7 [trystanleftwich] [SPARK-6144]When in cluster mode using ADD JAR with a hdfs:// sourced jar will fail

(cherry picked from commit 3a35a0d)
Signed-off-by: Andrew Or <[email protected]>
…dule-scala_2.10

This PR excludes Guava 15.0 from the SBT build, to make Spark SQL CLI (`bin/spark-sql`) work when compiled against Hive 0.12.0.

Author: Cheng Lian <[email protected]>

Closes apache#4890 from liancheng/exclude-guava-15 and squashes the following commits:

91ae9fa [Cheng Lian] Moves Guava 15 exclusion from SBT build to POM
282bd2a [Cheng Lian] Excludes Guava 15 referenced by jackson-module-scala_2.10

(cherry picked from commit 1aa90e3)
Signed-off-by: Patrick Wendell <[email protected]>
JoshRosen and others added 29 commits April 14, 2015 13:41
We should upgrade our snappy-java dependency to 1.1.1.7 in order to include a fix for a bug that results in worse compression in SnappyOutputStream (see xerial/snappy-java#100).

Author: Josh Rosen <[email protected]>

Closes apache#5512 from JoshRosen/snappy-1.1.1.7 and squashes the following commits:

f1ac0f8 [Josh Rosen] Upgrade to snappy-java 1.1.1.7.

(cherry picked from commit 6adb8bc)
Signed-off-by: Josh Rosen <[email protected]>

Conflicts:
	pom.xml
…ed and StreamingListenerBatchStarted (backport to branch 1.3)

Backport SPARK-6766 apache#5414 to branch 1.3

Conflicts:

	streaming/src/main/scala/org/apache/spark/streaming/ui/StreamingJobProgressListener.scala

Author: zsxwing <[email protected]>

Closes apache#5452 from zsxwing/SPARK-6766-branch-1.3 and squashes the following commits:

cb87e44 [zsxwing] [SPARK-6766][Streaming] Fix issue about StreamingListenerBatchSubmitted and StreamingListenerBatchStarted (backport to branch 1.3)
…s f...

...ound.

Author: Marcelo Vanzin <[email protected]>

Closes apache#5515 from vanzin/SPARK-5634 and squashes the following commits:

f74ecf1 [Marcelo Vanzin] [SPARK-5634] [core] Show correct message in HS when no incomplete apps found.

(cherry picked from commit 30a6e0d)
Signed-off-by: Andrew Or <[email protected]>
…amingPage

Because `StreamingPage.render` doesn't hold the `listener` lock when generating the content, different parts of the content may show inconsistent values if `listener` updates its status at the same time, which can confuse people.

This PR added `listener.synchronized` to make sure we have a consistent view of StreamingJobProgressListener when creating the content.
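
The idea of reading all listener state under one lock can be sketched in Python as follows. The class, method, and field names are hypothetical and only mirror the pattern; the real change uses `listener.synchronized` in Scala.

```python
import threading


class ProgressListener:
    """Hypothetical listener whose counters are updated by other threads."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._submitted = 0
        self._completed = 0

    def on_batch_submitted(self) -> None:
        with self._lock:
            self._submitted += 1

    def on_batch_completed(self) -> None:
        with self._lock:
            self._completed += 1

    def snapshot(self) -> tuple:
        # Read every field under the same lock so the values belong together.
        with self._lock:
            return self._submitted, self._completed


def render(listener: ProgressListener) -> str:
    """Build the page text from a single consistent snapshot."""
    submitted, completed = listener.snapshot()
    return f"submitted={submitted} completed={completed} waiting={submitted - completed}"
```

Because both counters come from one locked read, the derived "waiting" value cannot mix an old count with a new one.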

Author: zsxwing <[email protected]>

Closes apache#5470 from zsxwing/SPARK-6860 and squashes the following commits:

cec6f92 [zsxwing] Add missing 'synchronized' in StreamingJobProgressListener
7182498 [zsxwing] Add synchronized to make sure we have a consistent view of StreamingJobProgressListener when creating the content
Set the current directory path in $FWDIR, and likewise in $ASSEMBLY_DIR1 and $ASSEMBLY_DIR2;
otherwise $SPARK_HOME is not visible from spark-env.sh, since no SPARK_HOME variable is assigned there.
I am using the Spark 1.3.0 source code package and came across this when trying to start the master: sbin/start-master.sh

Author: raschild <[email protected]>

Closes apache#5261 from raschild/patch-1 and squashes the following commits:

b9babcd [raschild] Update load-spark-env.sh
Currently, the created broadcast object has the same life cycle as the RDD in Python. For multistage jobs, a PythonRDD will be created in the JVM and the RDD in Python may be GCed, so the broadcast will be destroyed in the JVM before the PythonRDD.

This PR changes to using PythonRDD to track the lifecycle of the broadcast object. It also refactors getNumPartitions() to avoid unnecessary creation of PythonRDD, which could be heavy.

cc JoshRosen

Author: Davies Liu <[email protected]>

Closes apache#5496 from davies/big_closure and squashes the following commits:

9a0ea4c [Davies Liu] fix big closure with shuffle

(cherry picked from commit f11288d)
Signed-off-by: Josh Rosen <[email protected]>
JIRA https://issues.apache.org/jira/browse/SPARK-6800

Author: Liang-Chi Hsieh <[email protected]>

Closes apache#5488 from viirya/fix_jdbc_where and squashes the following commits:

51386c8 [Liang-Chi Hsieh] Update code comment.
1dcc929 [Liang-Chi Hsieh] Update document.
3eb74d6 [Liang-Chi Hsieh] Revert and modify doc.
df11783 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into fix_jdbc_where
3e7db15 [Liang-Chi Hsieh] Fix wrong logic to generate WHERE clause for JDBC.

(cherry picked from commit e3e4e9a)
Signed-off-by: Michael Armbrust <[email protected]>
…not being accepted on c...

Tiny bug in PowerIterationClusteringExample in which the radius was not accepted from the command line

Author: sboeschhuawei <[email protected]>

Closes apache#5531 from javadba/picsub and squashes the following commits:

2aab8cf [sboeschhuawei] Fixed bug in PICExample in which the radius were not being accepted on command line

(cherry picked from commit 557a797)
Signed-off-by: Xiangrui Meng <[email protected]>
sbin/spark-daemon.sh used

    ps -p "$TARGET_PID" -o args=

to figure out whether the process running with the expected PID is actually a Spark
daemon. When running with a large classpath, the output of ps gets
truncated and the check fails spuriously.

This weakens the check to see if it's a java command (which is something
we do in other parts of the script) rather than looking for the specific
main class name. This means that SPARK-4832 might happen under a
slightly broader range of circumstances (a java program happened to
reuse the same PID), but it seems worthwhile compared to failing
consistently with a large classpath.
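
The trade-off between the two checks can be illustrated on the text that `ps` prints. This is a Python sketch of the logic only, not the shell code in `sbin/spark-daemon.sh`; the function names and the sample command line are made up for the example.

```python
import os


def strict_check(ps_args: str, main_class: str) -> bool:
    """Old-style check: the expected main class must appear in the args."""
    return main_class in ps_args


def relaxed_check(ps_args: str) -> bool:
    """Weaker check: the process only has to be a java command."""
    tokens = ps_args.split()
    return bool(tokens) and os.path.basename(tokens[0]) == "java"


# A long classpath can push the main class past the point where ps
# truncates its output, so the strict check fails spuriously.
truncated_args = "/usr/bin/java -cp /opt/lib/a.jar:/opt/lib/b.jar:/opt/lib/c"
```

Here `strict_check` rejects the truncated output while `relaxed_check` accepts it, at the cost of also accepting any unrelated java process that reuses the PID.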

Author: Punya Biswal <[email protected]>

Closes apache#5535 from punya/feature/SPARK-6952 and squashes the following commits:

7ea12d1 [Punya Biswal] Handle long args when detecting PID reuse
This patch includes :
 * adding how to use map after an sql query using javaRDD
 * fixing the first few java examples that were written in Scala

Thank you for your time,

Olivier.

Author: Olivier Girardot <[email protected]>

Closes apache#5564 from ogirardot/branch-1.3 and squashes the following commits:

9f8d60e [Olivier Girardot] SPARK-6988 : Fix documentation regarding DataFrames using the Java API
The `numExecutors` check fails when dynamic allocation is enabled with the default configuration. Details can be seen in [SPARK-6975](https://issues.apache.org/jira/browse/SPARK-6975). sryza, please help me review this; I am not sure whether this is the correct way, and I think you previously changed this part :)

Author: jerryshao <[email protected]>

Closes apache#5551 from jerryshao/SPARK-6975 and squashes the following commits:

4335da1 [jerryshao] Change according to the comments
77bdcbd [jerryshao] Fix argument validation error

(cherry picked from commit d850b4b)
Signed-off-by: Andrew Or <[email protected]>
This patch fixes the Java examples for Spark SQL that programmatically
define a schema and map Rows.

Author: Olivier Girardot <[email protected]>

Closes apache#5569 from ogirardot/branch-1.3 and squashes the following commits:

c29e58d [Olivier Girardot] SPARK-6992 : Fix documentation example for Spark SQL on StructType
Just fixed a doc.

Author: Gaurav Nanda <[email protected]>

Closes apache#5576 from gaurav324/master and squashes the following commits:

8a7323f [Gaurav Nanda] Fixed doc

(cherry picked from commit 729885e)
Signed-off-by: Reynold Xin <[email protected]>
If `StreamingKMeans` is not `Serializable`, we cannot checkpoint applications that use `StreamingKMeans`. So we should make it `Serializable`.

Author: zsxwing <[email protected]>

Closes apache#5582 from zsxwing/SPARK-6998 and squashes the following commits:

67c2a14 [zsxwing] Make StreamingKMeans 'Serializable'

(cherry picked from commit fa73da0)
Signed-off-by: Reynold Xin <[email protected]>
* Fix the page title in Isotonic regression documents (Naive Bayes -> Isotonic regression)
* Add a newline character at the end of the file

Author: dobashim <[email protected]>

Closes apache#5581 from dobashim/master and squashes the following commits:

d54a041 [dobashim] Fix typo of the page title in Isotonic regression documents

(cherry picked from commit 6fe690d)
Signed-off-by: Sean Owen <[email protected]>
The contribution is my original work. I license the work to the project under the project's open source license.

Small typo in the programming guide.

Author: Eric Chiang <[email protected]>

Closes apache#5599 from ericchiang/docs-typo and squashes the following commits:

1177942 [Eric Chiang] fixed doc

(cherry picked from commit 97fda73)
Signed-off-by: Reynold Xin <[email protected]>
The commit message is pretty self-explanatory.

Author: BenFradet <[email protected]>

Closes apache#5600 from BenFradet/master and squashes the following commits:

108492d [BenFradet] [doc][streaming] Fixed broken link in mllib section

(cherry picked from commit 517bdf3)
Signed-off-by: Xiangrui Meng <[email protected]>
…flowError

A simple truncation in integer division (on rates over 1000 messages / second) causes the existing implementation to sleep for 0 milliseconds, then call itself recursively; this causes what is essentially an infinite recursion, since the base case of the calculated amount of time having elapsed can't be reached before available stack space is exhausted. A fix to this truncation error is included in this patch.

However, even with the defect patched, the accuracy of the existing implementation is abysmal (the error bounds of the original test were effectively [-30%, +10%], although this fact was obscured by hard-coded error margins); as such, when the error bounds were tightened down to [-5%, +5%], the existing implementation failed to meet the new, tightened, requirements. Therefore, an industry-vetted solution (from Guava) was used to get the adapted tests to pass.
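
The truncation described in the first paragraph is easy to reproduce. The sketch below assumes the per-message delay is computed as whole milliseconds per second divided by the rate, and shows one possible remedy (a finer time unit). Both function names are hypothetical, and the actual patch ultimately switched to Guava's `RateLimiter` instead.

```python
def sleep_millis_truncated(rate_per_second: int) -> int:
    """Delay between messages in whole milliseconds (integer division)."""
    # For any rate above 1000/s this truncates to 0, so a limiter that
    # sleeps this long and then retries never makes progress on the clock.
    return 1000 // rate_per_second


def sleep_micros(rate_per_second: int) -> int:
    """Same delay in microseconds; stays nonzero for rates up to 1,000,000/s."""
    return 1_000_000 // rate_per_second
```

At 2000 messages per second the millisecond version yields a delay of 0, while the microsecond version yields 500.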

Author: David McGuire <[email protected]>

Closes apache#5559 from dmcguire81/master and squashes the following commits:

d29d2e0 [David McGuire] Back out to +/-5% error margins, for flexibility in timing
8be6934 [David McGuire] Fix spacing per code review
90e98b9 [David McGuire] Address scalastyle errors
29011bd [David McGuire] Further ratchet down the error margins
b33b796 [David McGuire] Eliminate dependency on even distribution by BlockGenerator
8f2934b [David McGuire] Remove arbitrary thread timing / cooperation code
70ee310 [David McGuire] Use Thread.yield(), since Thread.sleep(0) is system-dependent
82ee46d [David McGuire] Replace guard clause with nested conditional
2794717 [David McGuire] Replace the RateLimiter with the Guava implementation
38f3ca8 [David McGuire] Ratchet down the error rate to +/- 5%; tests fail
24b1bc0 [David McGuire] Fix truncation in integer division causing infinite recursion
d6e1079 [David McGuire] Stack overflow error in RateLimiter on rates over 1000/s

(cherry picked from commit 5fea3e5)
Signed-off-by: Sean Owen <[email protected]>
SchemaRDD works with ALS.train in 1.2, so we should continue to support DataFrames for compatibility. coderxiang

Author: Xiangrui Meng <[email protected]>

Closes apache#5619 from mengxr/SPARK-7036 and squashes the following commits:

dfcaf5a [Xiangrui Meng] ALS.train should support DataFrames in PySpark

(cherry picked from commit 686dd74)
Signed-off-by: Xiangrui Meng <[email protected]>
…scala

add missing comma and space

Author: Alain <[email protected]>

Closes apache#5621 from AiHe/tree-node-issue and squashes the following commits:

159a7bb [Alain] [Minor][MLLIB] Fix a minor formatting bug in toString methods in Node.scala
Issue:

https://issues.apache.org/jira/browse/SPARK-7039

Add support for the NVARCHAR column type in SQL Server

java.sql.Types:
http://docs.oracle.com/javase/7/docs/api/java/sql/Types.html

Author: szheng79 <[email protected]>

Closes apache#5618 from szheng79/patch-1 and squashes the following commits:

10da99c [szheng79] Update JDBCRDD.scala
eab0bd8 [szheng79] Add support on type NVARCHAR

(cherry picked from commit fbe7106)
Signed-off-by: Reynold Xin <[email protected]>
This PR converts the java.sql.Date type into Int for JDBCRDD.
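
For intuition, a date-to-integer conversion could look like the sketch below. It assumes the Int is the number of days since the Unix epoch (1970-01-01), which the commit message itself does not state; the function name `date_to_days` is hypothetical.

```python
from datetime import date

# Assumption for this sketch: dates are stored as days since the Unix epoch.
EPOCH = date(1970, 1, 1)


def date_to_days(d: date) -> int:
    """Convert a calendar date to an integer day count since 1970-01-01."""
    return (d - EPOCH).days
```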

Author: Daoyuan Wang <[email protected]>

Closes apache#5590 from adrian-wang/datebug and squashes the following commits:

f897b81 [Daoyuan Wang] add a test case
3c9184c [Daoyuan Wang] fix date type convertion in jdbcrdd

(cherry picked from commit 04525c0)
Signed-off-by: Reynold Xin <[email protected]>

Conflicts:
	sql/core/src/main/scala/org/apache/spark/sql/jdbc/JDBCRDD.scala
jkbradley

Author: Xiangrui Meng <[email protected]>

Closes apache#5649 from mengxr/SPARK-7070 and squashes the following commits:

c66023c [Xiangrui Meng] setBeta should call setTopicConcentration

(cherry picked from commit 1ed46a6)
Signed-off-by: Xiangrui Meng <[email protected]>
Author: Cheng Hao <[email protected]>

Closes apache#5671 from chenghao-intel/transform2 and squashes the following commits:

2237e81 [Cheng Hao] fix the deadlock in ScriptTransform
fix typo

Author: Ken Geis <[email protected]>

Closes apache#5674 from kgeis/patch-1 and squashes the following commits:

5ae67de [Ken Geis] Update sql-programming-guide.md

(cherry picked from commit 67bccbd)
Signed-off-by: Reynold Xin <[email protected]>
turned on hive-thriftserver profile in release script

Author: Misha Chernetsov <[email protected]>

Closes apache#5429 from chernetsov/master and squashes the following commits:

9cc36af [Misha Chernetsov] [SPARK-4925] Publish Spark SQL hive-thriftserver maven artifact turned on hive-thriftserver profile in release script for scala 2.10

(cherry picked from commit 998aac2)
Signed-off-by: Patrick Wendell <[email protected]>
…ioner

Added a check to the SparkContext.union method to check that a partitioner is defined on all RDDs when instantiating a PartitionerAwareUnionRDD.
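
The added check can be sketched as a small decision function: the partitioner-aware union is only chosen when every RDD defines a partitioner and they all agree. `FakeRDD` and `choose_union` are hypothetical names for illustration, with partitioners modelled as plain strings.

```python
from typing import Optional, Sequence


class FakeRDD:
    """Hypothetical stand-in for an RDD with an optional partitioner."""

    def __init__(self, partitioner: Optional[str]) -> None:
        self.partitioner = partitioner


def choose_union(rdds: Sequence[FakeRDD]) -> str:
    """Pick the union implementation name for the given RDDs."""
    partitioners = {rdd.partitioner for rdd in rdds}
    # Every RDD must define a partitioner, and it must be the same one.
    if None not in partitioners and len(partitioners) == 1:
        return "PartitionerAwareUnionRDD"
    return "UnionRDD"
```

If at least one RDD has no partitioner, the sketch falls back to the plain union instead of failing.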

Author: Steven She <[email protected]>

Closes apache#5679 from stevencanopy/SPARK-7103 and squashes the following commits:

5a3d846 [Steven She] SPARK-7103: Fix crash with SparkContext.union when at least one RDD has no partitioner

(cherry picked from commit b9de9e0)
Signed-off-by: Sean Owen <[email protected]>
rxin

Author: Andrew Or <[email protected]>

Closes apache#5734 from andrewor14/ser-deb and squashes the following commits:

e8aad6c [Andrew Or] NonFatal
57d0ef4 [Andrew Or] try catch improveException

(cherry picked from commit bf35edd)
Signed-off-by: Reynold Xin <[email protected]>
@alexrovner alexrovner closed this Apr 28, 2015