Spark 5529 backport 1.3 #5746

alexrovner · 2015-04-28T17:32:58Z

Still running tests on this branch. I mechanically applied the changes based on #4369 without fully understanding what's actually happening, since I am not familiar with the codebase. Feedback would be appreciated.

…tion It should be `true` instead of `false`? Author: Liang-Chi Hsieh <[email protected]> Closes apache#4762 from viirya/doc_fix and squashes the following commits: 2e37482 [Liang-Chi Hsieh] Fix doc. (cherry picked from commit 3f9def8) Signed-off-by: Michael Armbrust <[email protected]>

A HiveQL expression like `select count(1) from src tablesample(1 percent);` means taking a 1% sample to select from, but in the current version of Spark it means 100%. Author: q00251598 <[email protected]> Closes apache#4789 from watermen/SPARK-6040 and squashes the following commits: 2453ebe [q00251598] check and adjust the fraction. (cherry picked from commit 582e5a2) Signed-off-by: Michael Armbrust <[email protected]>
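The fix described above amounts to treating the `TABLESAMPLE` percentage as a fraction and range-checking it. A minimal sketch of that idea in Python, assuming a hypothetical helper name `percent_to_fraction` and error message (this is not Spark's actual parser code):

```python
def percent_to_fraction(percent: float) -> float:
    """Convert a TABLESAMPLE percentage (0..100) into a sampling fraction (0..1).

    Hypothetical helper for illustration only; the name and the error
    message are assumptions, not part of Spark.
    """
    if not 0.0 <= percent <= 100.0:
        raise ValueError(f"Sampling percentage must be in [0, 100], got {percent}")
    # "1 percent" must become 0.01, not be passed through as 1.0 (= 100%).
    return percent / 100.0
```

With this conversion, `tablesample(1 percent)` maps to a fraction of `0.01` rather than being read as a fraction of `1`.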

…ers. Some YARN configurations return a vcore count for allocated containers that does not match the requested resource. That means Spark would always ignore those containers. So relax the matching of the vcore count to allow the Spark jobs to run. Author: Marcelo Vanzin <[email protected]> Closes apache#4818 from vanzin/SPARK-6050 and squashes the following commits: 991c803 [Marcelo Vanzin] Remove config option, standardize on legacy behavior (no vcore matching). 8c9c346 [Marcelo Vanzin] Restrict lax matching to vcores only. 3359692 [Marcelo Vanzin] [SPARK-6050] [yarn] Add config option to do lax resource matching. (cherry picked from commit 6b348d9) Signed-off-by: Thomas Graves <[email protected]>

Author: Michael Armbrust <[email protected]> Closes apache#4855 from marmbrus/explodeBug and squashes the following commits: a712249 [Michael Armbrust] [SPARK-6114][SQL] Avoid metastore conversions before plan is resolved (cherry picked from commit 8223ce6) Signed-off-by: Michael Armbrust <[email protected]>

…hen caching tables Constructs like Hive `TRANSFORM` may generate malformed rows (via badly authored external scripts for example). I'm a bit hesitant to have this feature, since it introduces per-tuple cost when caching tables. However, considering caching tables is usually a one-time cost, this is probably worth having. Author: Cheng Lian <[email protected]> Closes apache#4842 from liancheng/spark-6082 and squashes the following commits: b05dbff [Cheng Lian] Provides better error message for malformed rows when caching tables (cherry picked from commit 1a49496) Signed-off-by: Michael Armbrust <[email protected]>

Some users have reported difficulty in parsing the new event log format. Since we embed the metadata in the beginning of the file, when we compress the event log we need to skip the metadata because we need that information to parse the log later. This means we'll end up with a partially compressed file if event logging compression is turned on. The old format looks like:

```
sparkVersion = 1.3.0
compressionCodec = org.apache.spark.io.LZFCompressionCodec
=== LOG_HEADER_END ===
// actual events, could be compressed bytes
```

The new format in this patch puts the compression codec in the log file name instead. It also removes the metadata header altogether along with the Spark version, which was not needed. The new file name looks something like:

```
app_without_compression
app_123.lzf
app_456.snappy
```

I tested this with and without compression, using different compression codecs and event logging directories. I verified that both the `Master` and the `HistoryServer` can render both compressed and uncompressed logs as before.

Author: Andrew Or <[email protected]> Closes apache#4821 from andrewor14/event-log-format and squashes the following commits: 8511141 [Andrew Or] Fix test 654883d [Andrew Or] Add back metadata with Spark version 7f537cd [Andrew Or] Address review feedback 7d6aa61 [Andrew Or] Make codec an extension 59abee9 [Andrew Or] Merge branch 'master' of github.com:apache/spark into event-log-format 27c9a6c [Andrew Or] Address review feedback 519e51a [Andrew Or] Address review feedback ef69276 [Andrew Or] Merge branch 'master' of github.com:apache/spark into event-log-format 88a091d [Andrew Or] Add tests for new format and file name f32d8d2 [Andrew Or] Fix tests 8db5a06 [Andrew Or] Embed metadata in the event log file name instead (cherry picked from commit 6776cb3) Signed-off-by: Patrick Wendell <[email protected]>
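With the codec carried in the file name, a reader can decide how to open a log before reading any bytes. A rough Python sketch of that lookup, assuming a hypothetical function name `codec_from_log_name` and assuming application names themselves contain no dots (which may not hold in practice):

```python
from typing import Optional


def codec_from_log_name(file_name: str) -> Optional[str]:
    """Return the codec short name encoded as the file extension, or None.

    Hypothetical helper; it only illustrates the naming scheme shown in
    the commit message (app_123.lzf, app_456.snappy).
    """
    base, dot, ext = file_name.rpartition(".")
    # No dot, or nothing on either side of it: treat the log as uncompressed.
    if not dot or not base or not ext:
        return None
    return ext


for name in ["app_without_compression", "app_123.lzf", "app_456.snappy"]:
    print(name, "->", codec_from_log_name(name))
```

The whole file can then be either fully compressed or fully uncompressed, which avoids the partially compressed layout of the old header-based format.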

There are multiple issues with translating on set outlined in the JIRA. This PR reverts the translation logic added to `SparkConf`. In the future, after the 1.3.0 release we will figure out a way to reorganize the internal structure more elegantly. For now, let's preserve the existing semantics of `SparkConf` since it's a public interface. Unfortunately this means duplicating some code for now, but this is all internal and we can always clean it up later. Author: Andrew Or <[email protected]> Closes apache#4799 from andrewor14/conf-set-translate and squashes the following commits: 11c525b [Andrew Or] Move warning to driver 10e77b5 [Andrew Or] Add documentation for deprecation precedence a369cb1 [Andrew Or] Merge branch 'master' of github.com:apache/spark into conf-set-translate c26a9e3 [Andrew Or] Revert all translate logic in SparkConf fef6c9c [Andrew Or] Restore deprecation logic for spark.executor.userClassPathFirst 94b4dfa [Andrew Or] Translate on get, not set (cherry picked from commit 258d154) Signed-off-by: Patrick Wendell <[email protected]>

`df.dtypes` shows `null` for UDTs. This PR uses `udt` by default and `VectorUDT` overwrites it with `vector`. jkbradley davies Author: Xiangrui Meng <[email protected]> Closes apache#4858 from mengxr/SPARK-6121 and squashes the following commits: 34f0a77 [Xiangrui Meng] simpleString for UDT (cherry picked from commit 2db6a85) Signed-off-by: Xiangrui Meng <[email protected]>

This is based on apache#4801 from dbtsai. The linear method guide is re-organized a little bit for this change. Closes apache#4801 Author: Xiangrui Meng <[email protected]> Author: DB Tsai <[email protected]> Closes apache#4861 from mengxr/SPARK-5537 and squashes the following commits: 47af0ac [Xiangrui Meng] update user guide for multinomial logistic regression cdc2e15 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into AlpineNow-mlor-doc 096d0ca [DB Tsai] first commit (cherry picked from commit 9d6c5ae) Signed-off-by: Xiangrui Meng <[email protected]>

davies Author: Tathagata Das <[email protected]> Closes apache#4860 from tdas/SPARK-6127 and squashes the following commits: 82de92a [Tathagata Das] Add Kafka to Python api docs (cherry picked from commit 9eb22ec) Signed-off-by: Tathagata Das <[email protected]>

… should work when using datasource api

This PR contains the following changes:

1. Add a new method, `DataType.equalsIgnoreCompatibleNullability`, which is the middle ground between DataType's equality check and `DataType.equalsIgnoreNullability`. For two data types `from` and `to`, it performs `equalsIgnoreNullability` and also checks whether the nullability of `from` is compatible with that of `to`. For example, the nullability of `ArrayType(IntegerType, containsNull = false)` is compatible with that of `ArrayType(IntegerType, containsNull = true)` (for an array without null values, we can always say it may contain null values). However, the nullability of `ArrayType(IntegerType, containsNull = true)` is incompatible with that of `ArrayType(IntegerType, containsNull = false)` (for an array that may have null values, we cannot say it does not have null values).
2. For the `resolved` field of `InsertIntoTable`, use `equalsIgnoreCompatibleNullability` to replace the equality check of the data types.
3. For our data source write path, when appending data, we always use the schema of the existing table to write the data. This is important for parquet, since nullability directly impacts the way values are encoded and decoded. If we do not do this, we may see corrupted values when reading values from a set of parquet files generated with different nullability settings.
4. When generating a new parquet table, we always set nullable/containsNull/valueContainsNull to true. So, we will not face situations where we cannot append data because containsNull/valueContainsNull in an Array/Map column of the existing table has already been set to `false`. This change makes the whole data pipeline more robust.
5. Update the equality check of JSON relation. Since JSON does not really care about nullability, `equalsIgnoreNullability` seems a better choice to compare schemata from two JSON tables.

JIRA: https://issues.apache.org/jira/browse/SPARK-5950 Thanks viirya for the initial work in apache#4729.
cc marmbrus liancheng Author: Yin Huai <[email protected]> Closes apache#4826 from yhuai/insertNullabilityCheck and squashes the following commits: 3b61a04 [Yin Huai] Revert change on equals. 80e487e [Yin Huai] asNullable in UDT. 587d88b [Yin Huai] Make methods private. 0cb7ea2 [Yin Huai] marmbrus's comments. 3cec464 [Yin Huai] Cheng's comments. 486ed08 [Yin Huai] Merge remote-tracking branch 'upstream/master' into insertNullabilityCheck d3747d1 [Yin Huai] Remove unnecessary change. 8360817 [Yin Huai] Merge remote-tracking branch 'upstream/master' into insertNullabilityCheck 8a3f237 [Yin Huai] Use equalsIgnoreNullability instead of equality check. 0eb5578 [Yin Huai] Fix tests. f6ed813 [Yin Huai] Update old parquet path. e4f397c [Yin Huai] Unit tests. b2c06f8 [Yin Huai] Ignore nullability in JSON relation's equality check. 8bd008b [Yin Huai] nullable, containsNull, and valueContainsNull will be always true for parquet data. bf50d73 [Yin Huai] When appending data, we use the schema of the existing table instead of the schema of the new data. 0a703e7 [Yin Huai] Test failed again since we cannot read correct content. 9a26611 [Yin Huai] Make InsertIntoTable happy. 8f19fe5 [Yin Huai] equalsIgnoreCompatibleNullability 4ec17fd [Yin Huai] Failed test. (cherry picked from commit 1259994) Signed-off-by: Michael Armbrust <[email protected]>
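The one-directional compatibility rule from item 1 of this commit message can be modelled in a few lines. The sketch below is Python, not Spark's Scala implementation; the classes `IntegerType` and `ArrayType` here are simplified stand-ins defined for this example, and only the array case is covered:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntegerType:
    """Stand-in leaf type for this example."""


@dataclass(frozen=True)
class ArrayType:
    """Stand-in array type carrying only what the rule needs."""
    element_type: object
    contains_null: bool


def equals_ignore_compatible_nullability(from_type, to_type) -> bool:
    """True if data of `from_type` may be written where `to_type` is expected."""
    if isinstance(from_type, ArrayType) and isinstance(to_type, ArrayType):
        # An array without nulls fits a slot that allows nulls,
        # but an array that may hold nulls does not fit a no-null slot.
        null_ok = to_type.contains_null or not from_type.contains_null
        return null_ok and equals_ignore_compatible_nullability(
            from_type.element_type, to_type.element_type
        )
    # Leaf types (and mismatched kinds): plain equality.
    return from_type == to_type


no_nulls = ArrayType(IntegerType(), contains_null=False)
may_have_nulls = ArrayType(IntegerType(), contains_null=True)
print(equals_ignore_compatible_nullability(no_nulls, may_have_nulls))
print(equals_ignore_compatible_nullability(may_have_nulls, no_nulls))
```

The first call corresponds to the compatible direction described in the commit message and the second to the incompatible one.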

- Various Fixes to docs - Make data source traits actually interfaces Based on apache#4862 but with fixed conflicts. Author: Reynold Xin <[email protected]> Author: Michael Armbrust <[email protected]> Closes apache#4868 from marmbrus/pr/4862 and squashes the following commits: fe091ea [Michael Armbrust] Merge remote-tracking branch 'origin/master' into pr/4862 0208497 [Reynold Xin] Test fixes. 34e0a28 [Reynold Xin] [SPARK-5310][SQL] Various fixes to Spark SQL docs. (cherry picked from commit 54d1968) Signed-off-by: Michael Armbrust <[email protected]>

Similar to `MatrixFactorizationModel`, we only need wrappers to support save/load for tree models in Python. jkbradley Author: Xiangrui Meng <[email protected]> Closes apache#4854 from mengxr/SPARK-6097 and squashes the following commits: 4586a4d [Xiangrui Meng] fix more typos 8ebcac2 [Xiangrui Meng] fix python style 91172d8 [Xiangrui Meng] fix typos 201b3b9 [Xiangrui Meng] update user guide b5158e2 [Xiangrui Meng] support tree model save/load in PySpark/MLlib (cherry picked from commit 7e53a79) Signed-off-by: Xiangrui Meng <[email protected]>

Issue: When the Python DecisionTree example in the programming guide is run, it runs out of Java Heap Space when using the default memory settings for the spark shell. This prints a warning. CC: mengxr Author: Joseph K. Bradley <[email protected]> Closes apache#4864 from jkbradley/dt-save-heap and squashes the following commits: 02e8daf [Joseph K. Bradley] fixed based on code review 7ecb1ed [Joseph K. Bradley] Added warnings about memory when calling tree and ensemble model save with too small a Java heap size (cherry picked from commit c2fe3a6) Signed-off-by: Xiangrui Meng <[email protected]>

…ression Adding more description on top of apache#4861. Author: DB Tsai <[email protected]> Closes apache#4866 from dbtsai/doc and squashes the following commits: 37e9d07 [DB Tsai] doc (cherry picked from commit b196056) Signed-off-by: Xiangrui Meng <[email protected]>

After apache#2982 (SPARK-4048) we rely on the newer HBase packaging format.

This adds two features: 1. The ability to publish with a different maven version than that specified in the release source. 2. Forking of different Zinc instances during the parallel dist creation (to help with some stability issues).

This reverts commit 2ab0ba0.

This reverts commit f97b0d4.

…ize to ensure deleting the temp file" This reverts commit 25fae8e.

…h Java 6 Add warning about building with Java 7+ and running the JAR on early Java 6. CC andrewor14 Author: Sean Owen <[email protected]> Closes apache#4874 from srowen/SPARK-1911 and squashes the following commits: 79fa2f6 [Sean Owen] Add warning about building with Java 7+ and running the JAR on early Java 6. (cherry picked from commit e750a6b) Signed-off-by: Andrew Or <[email protected]>

…w/ kryo https://issues.apache.org/jira/browse/SPARK-5949 Author: Imran Rashid <[email protected]> Closes apache#4877 from squito/SPARK-5949_register_roaring_bitmap and squashes the following commits: 7e13316 [Imran Rashid] style style style 5f6bb6d [Imran Rashid] more style 709bfe0 [Imran Rashid] style a5cb744 [Imran Rashid] update tests to cover both types of RoaringBitmapContainers 09610c6 [Imran Rashid] formatting f9a0b7c [Imran Rashid] put primitive array registrations together 97beaf8 [Imran Rashid] SPARK-5949 HighlyCompressedMapStatus needs more classes registered w/ kryo (cherry picked from commit 1f1fccc) Signed-off-by: Reynold Xin <[email protected]>

…ce bug LBFGS and OWLQN in Breeze 0.10 have a convergence check bug. This is fixed in 0.11; see the description in the Breeze project for details: scalanlp/breeze#373 (comment) Author: Xiangrui Meng <[email protected]> Author: DB Tsai <[email protected]> Author: DB Tsai <[email protected]> Closes apache#4879 from dbtsai/breeze and squashes the following commits: d848f65 [DB Tsai] Merge pull request apache#1 from mengxr/AlpineNow-breeze c2ca6ac [Xiangrui Meng] upgrade to breeze-0.11.1 35c2f26 [Xiangrui Meng] fix LRSuite 397a208 [DB Tsai] upgrade breeze (cherry picked from commit 76e20a0) Signed-off-by: Xiangrui Meng <[email protected]>

…cker-client Integration test suites in the JDBC data source (`MySQLIntegration` and `PostgresIntegration`) depend on docker-client 2.7.5, which transitively depends on Guava 17.0. Unfortunately, Guava 17.0 is causing test runtime binary compatibility issues when Spark is compiled against Hive 0.12.0, or Hadoop 2.4. Considering `MySQLIntegration` and `PostgresIntegration` are ignored right now, I'd suggest moving them from the Spark project to the [Spark integration tests] [1] project. This PR removes both the JDBC data source integration tests and the docker-client test dependency. [1]: https://github.com/databricks/spark-integration-tests Author: Cheng Lian <[email protected]> Closes apache#4872 from liancheng/remove-docker-client and squashes the following commits: 1f4169e [Cheng Lian] Removes DockerHacks 159b24a [Cheng Lian] Removed JDBC integration tests which depends on docker-client (cherry picked from commit 76b472f) Signed-off-by: Cheng Lian <[email protected]>

…t LongType value in defaultPrimitive In `CodeGenerator`, the casting on `FloatType` should use `FloatType` instead of `IntegerType`. Besides, `defaultPrimitive` for `LongType` should be `-1L` instead of `1L`. Author: Liang-Chi Hsieh <[email protected]> Closes apache#4870 from viirya/codegen_type and squashes the following commits: 76311dd [Liang-Chi Hsieh] Fix wrong datatype for casting on FloatType. Fix the wrong value for LongType in defaultPrimitive. (cherry picked from commit aef8a84) Signed-off-by: Cheng Lian <[email protected]>

The code failed in two modes: it complained when it tried to re-create a directory that already existed, and it was placing some files in the wrong parent directory. The patch fixes both issues. Author: Marcelo Vanzin <[email protected]> Author: trystanleftwich <[email protected]> Closes apache#4894 from vanzin/SPARK-6144 and squashes the following commits: 100b3a1 [Marcelo Vanzin] Style fix. 58266aa [Marcelo Vanzin] Fix fetchHcfs file for directories. 91733b7 [trystanleftwich] [SPARK-6144]When in cluster mode using ADD JAR with a hdfs:// sourced jar will fail (cherry picked from commit 3a35a0d) Signed-off-by: Andrew Or <[email protected]>

…dule-scala_2.10 This PR excludes Guava 15.0 from the SBT build, to make Spark SQL CLI (`bin/spark-sql`) work when compiled against Hive 0.12.0. Author: Cheng Lian <[email protected]> Closes apache#4890 from liancheng/exclude-guava-15 and squashes the following commits: 91ae9fa [Cheng Lian] Moves Guava 15 exclusion from SBT build to POM 282bd2a [Cheng Lian] Excludes Guava 15 referenced by jackson-module-scala_2.10 (cherry picked from commit 1aa90e3) Signed-off-by: Patrick Wendell <[email protected]>

We should upgrade our snappy-java dependency to 1.1.1.7 in order to include a fix for a bug that results in worse compression in SnappyOutputStream (see xerial/snappy-java#100). Author: Josh Rosen <[email protected]> Closes apache#5512 from JoshRosen/snappy-1.1.1.7 and squashes the following commits: f1ac0f8 [Josh Rosen] Upgrade to snappy-java 1.1.1.7. (cherry picked from commit 6adb8bc) Signed-off-by: Josh Rosen <[email protected]> Conflicts: pom.xml

…ed and StreamingListenerBatchStarted (backport to branch 1.3) Backport SPARK-6766 apache#5414 to branch 1.3 Conflicts: streaming/src/main/scala/org/apache/spark/streaming/ui/StreamingJobProgressListener.scala Author: zsxwing <[email protected]> Closes apache#5452 from zsxwing/SPARK-6766-branch-1.3 and squashes the following commits: cb87e44 [zsxwing] [SPARK-6766][Streaming] Fix issue about StreamingListenerBatchSubmitted and StreamingListenerBatchStarted (backport to branch 1.3)

…s f... ...ound. Author: Marcelo Vanzin <[email protected]> Closes apache#5515 from vanzin/SPARK-5634 and squashes the following commits: f74ecf1 [Marcelo Vanzin] [SPARK-5634] [core] Show correct message in HS when no incomplete apps found. (cherry picked from commit 30a6e0d) Signed-off-by: Andrew Or <[email protected]>

…amingPage Because `StreamingPage.render` doesn't hold the `listener` lock when generating the content, the different parts of content may have some inconsistent values if `listener` updates its status at the same time. And it will confuse people. This PR added `listener.synchronized` to make sure we have a consistent view of StreamingJobProgressListener when creating the content. Author: zsxwing <[email protected]> Closes apache#5470 from zsxwing/SPARK-6860 and squashes the following commits: cec6f92 [zsxwing] Add missing 'synchronized' in StreamingJobProgressListener 7182498 [zsxwing] Add synchronized to make sure we have a consistent view of StreamingJobProgressListener when creating the content
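The consistency problem described above is the usual one of reading several related fields while another thread updates them. A small Python sketch of the same remedy, using a lock where the Scala code uses `listener.synchronized`; the class `ProgressListener`, its fields, and `render` are hypothetical stand-ins, not the Spark classes:

```python
import threading


class ProgressListener:
    """Hypothetical listener holding two fields that must be read together."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.completed = 0
        self.total_delay_ms = 0

    def on_batch_completed(self, delay_ms: int) -> None:
        # Both fields change under the lock, so readers never see one
        # updated without the other.
        with self._lock:
            self.completed += 1
            self.total_delay_ms += delay_ms

    def snapshot(self):
        # Take one consistent view of all fields for rendering.
        with self._lock:
            return self.completed, self.total_delay_ms


def render(listener: ProgressListener) -> str:
    completed, total = listener.snapshot()
    avg = total / completed if completed else 0.0
    return f"batches={completed} avg_delay_ms={avg:.1f}"
```

Without the lock in `snapshot`, `render` could pair a new `completed` count with an old `total_delay_ms` and report a misleading average.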

Set the current directory path in $FWDIR and use the same for $ASSEMBLY_DIR1 and $ASSEMBLY_DIR2; otherwise $SPARK_HOME is not visible from spark-env.sh, since no SPARK_HOME variable is assigned there. I am using the Spark 1.3.0 source code package and came across this when trying to start the master: sbin/start-master.sh Author: raschild <[email protected]> Closes apache#5261 from raschild/patch-1 and squashes the following commits: b9babcd [raschild] Update load-spark-env.sh

Currently, the created broadcast object has the same life cycle as the RDD in Python. For multistage jobs, a PythonRDD will be created in the JVM and the RDD in Python may be GCed; the broadcast will then be destroyed in the JVM before the PythonRDD. This PR changes this to use PythonRDD to track the lifecycle of the broadcast object. It also refactors getNumPartitions() to avoid unnecessary creation of PythonRDD, which could be heavy. cc JoshRosen Author: Davies Liu <[email protected]> Closes apache#5496 from davies/big_closure and squashes the following commits: 9a0ea4c [Davies Liu] fix big closure with shuffle (cherry picked from commit f11288d) Signed-off-by: Josh Rosen <[email protected]>

JIRA https://issues.apache.org/jira/browse/SPARK-6800 Author: Liang-Chi Hsieh <[email protected]> Closes apache#5488 from viirya/fix_jdbc_where and squashes the following commits: 51386c8 [Liang-Chi Hsieh] Update code comment. 1dcc929 [Liang-Chi Hsieh] Update document. 3eb74d6 [Liang-Chi Hsieh] Revert and modify doc. df11783 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into fix_jdbc_where 3e7db15 [Liang-Chi Hsieh] Fix wrong logic to generate WHERE clause for JDBC. (cherry picked from commit e3e4e9a) Signed-off-by: Michael Armbrust <[email protected]>

…not being accepted on c... Tiny bug in PowerIterationClusteringExample in which radius not accepted from command line Author: sboeschhuawei <[email protected]> Closes apache#5531 from javadba/picsub and squashes the following commits: 2aab8cf [sboeschhuawei] Fixed bug in PICExample in which the radius were not being accepted on command line (cherry picked from commit 557a797) Signed-off-by: Xiangrui Meng <[email protected]>

sbin/spark-daemon.sh used ps -p "$TARGET_PID" -o args= to figure out whether the process running with the expected PID is actually a Spark daemon. When running with a large classpath, the output of ps gets truncated and the check fails spuriously. This weakens the check to see if it's a java command (which is something we do in other parts of the script) rather than looking for the specific main class name. This means that SPARK-4832 might happen under a slightly broader range of circumstances (a java program happened to reuse the same PID), but it seems worthwhile compared to failing consistently with a large classpath. Author: Punya Biswal <[email protected]> Closes apache#5535 from punya/feature/SPARK-6952 and squashes the following commits: 7ea12d1 [Punya Biswal] Handle long args when detecting PID reuse

This patch includes : * adding how to use map after an sql query using javaRDD * fixing the first few java examples that were written in Scala Thank you for your time, Olivier. Author: Olivier Girardot <[email protected]> Closes apache#5564 from ogirardot/branch-1.3 and squashes the following commits: 9f8d60e [Olivier Girardot] SPARK-6988 : Fix documentation regarding DataFrames using the Java API

`numExecutors` checking fails when dynamic allocation is enabled with the default configuration. Details can be seen in [SPARK-6975](https://issues.apache.org/jira/browse/SPARK-6975). sryza, please help me review this; I am not sure this is the correct way, and I think you previously changed this part :) Author: jerryshao <[email protected]> Closes apache#5551 from jerryshao/SPARK-6975 and squashes the following commits: 4335da1 [jerryshao] Change according to the comments 77bdcbd [jerryshao] Fix argument validation error (cherry picked from commit d850b4b) Signed-off-by: Andrew Or <[email protected]>

This patch is fixing the Java examples for Spark SQL when defining programmatically a Schema and mapping Rows. Author: Olivier Girardot <[email protected]> Closes apache#5569 from ogirardot/branch-1.3 and squashes the following commits: c29e58d [Olivier Girardot] SPARK-6992 : Fix documentation example for Spark SQL on StructType

Just fixed a doc. Author: Gaurav Nanda <[email protected]> Closes apache#5576 from gaurav324/master and squashes the following commits: 8a7323f [Gaurav Nanda] Fixed doc (cherry picked from commit 729885e) Signed-off-by: Reynold Xin <[email protected]>

If `StreamingKMeans` is not `Serializable`, we cannot checkpoint applications that use `StreamingKMeans`. So we should make it `Serializable`. Author: zsxwing <[email protected]> Closes apache#5582 from zsxwing/SPARK-6998 and squashes the following commits: 67c2a14 [zsxwing] Make StreamingKMeans 'Serializable' (cherry picked from commit fa73da0) Signed-off-by: Reynold Xin <[email protected]>

* Fix the page title in Isotonic regression documents (Naive Bayes -> Isotonic regression) * Add a newline character at the end of the file Author: dobashim <[email protected]> Closes apache#5581 from dobashim/master and squashes the following commits: d54a041 [dobashim] Fix typo of the page title in Isotonic regression documents (cherry picked from commit 6fe690d) Signed-off-by: Sean Owen <[email protected]>

The contribution is my original work. I license the work to the project under the project's open source license. Small typo in the programming guide. Author: Eric Chiang <[email protected]> Closes apache#5599 from ericchiang/docs-typo and squashes the following commits: 1177942 [Eric Chiang] fixed doc (cherry picked from commit 97fda73) Signed-off-by: Reynold Xin <[email protected]>

The commit message is pretty self-explanatory. Author: BenFradet <[email protected]> Closes apache#5600 from BenFradet/master and squashes the following commits: 108492d [BenFradet] [doc][streaming] Fixed broken link in mllib section (cherry picked from commit 517bdf3) Signed-off-by: Xiangrui Meng <[email protected]>

…flowError A simple truncation in integer division (on rates over 1000 messages / second) causes the existing implementation to sleep for 0 milliseconds, then call itself recursively; this causes what is essentially an infinite recursion, since the base case of the calculated amount of time having elapsed can't be reached before available stack space is exhausted. A fix to this truncation error is included in this patch. However, even with the defect patched, the accuracy of the existing implementation is abysmal (the error bounds of the original test were effectively [-30%, +10%], although this fact was obscured by hard-coded error margins); as such, when the error bounds were tightened down to [-5%, +5%], the existing implementation failed to meet the new, tightened, requirements. Therefore, an industry-vetted solution (from Guava) was used to get the adapted tests to pass. Author: David McGuire <[email protected]> Closes apache#5559 from dmcguire81/master and squashes the following commits: d29d2e0 [David McGuire] Back out to +/-5% error margins, for flexibility in timing 8be6934 [David McGuire] Fix spacing per code review 90e98b9 [David McGuire] Address scalastyle errors 29011bd [David McGuire] Further ratchet down the error margins b33b796 [David McGuire] Eliminate dependency on even distribution by BlockGenerator 8f2934b [David McGuire] Remove arbitrary thread timing / cooperation code 70ee310 [David McGuire] Use Thread.yield(), since Thread.sleep(0) is system-dependent 82ee46d [David McGuire] Replace guard clause with nested conditional 2794717 [David McGuire] Replace the RateLimiter with the Guava implementation 38f3ca8 [David McGuire] Ratchet down the error rate to +/- 5%; tests fail 24b1bc0 [David McGuire] Fix truncation in integer division causing infinite recursion d6e1079 [David McGuire] Stack overflow error in RateLimiter on rates over 1000/s (cherry picked from commit 5fea3e5) Signed-off-by: Sean Owen <[email protected]>
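The truncation described above is easy to reproduce in isolation. The Python sketch below is a simplified model, assuming the wait per message is computed as `1000 / rate` in integer arithmetic; the function names are hypothetical and this is not the original Scala code:

```python
def sleep_ms_truncating(rate_per_sec: int) -> int:
    """Per-message wait in whole milliseconds, using integer division."""
    # For any rate above 1000 messages/second this rounds down to 0,
    # so a caller that sleeps and then retries never makes progress in time.
    return 1000 // rate_per_sec


def sleep_ms_exact(rate_per_sec: int) -> float:
    """Per-message wait in (fractional) milliseconds, without truncation."""
    return 1000.0 / rate_per_sec


for rate in (500, 1000, 1500):
    print(rate, sleep_ms_truncating(rate), sleep_ms_exact(rate))
```

At 1500 messages per second the truncating version yields a wait of 0 ms, which matches the "sleep for 0 milliseconds, then call itself recursively" behaviour reported in the commit message.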

SchemaRDD works with ALS.train in 1.2, so we should continue support DataFrames for compatibility. coderxiang Author: Xiangrui Meng <[email protected]> Closes apache#5619 from mengxr/SPARK-7036 and squashes the following commits: dfcaf5a [Xiangrui Meng] ALS.train should support DataFrames in PySpark (cherry picked from commit 686dd74) Signed-off-by: Xiangrui Meng <[email protected]>

…scala add missing comma and space Author: Alain <[email protected]> Closes apache#5621 from AiHe/tree-node-issue and squashes the following commits: 159a7bb [Alain] [Minor][MLLIB] Fix a minor formatting bug in toString methods in Node.scala

Issue: https://issues.apache.org/jira/browse/SPARK-7039 Add support to column type NVARCHAR in Sql Server java.sql.Types: http://docs.oracle.com/javase/7/docs/api/java/sql/Types.html Author: szheng79 <[email protected]> Closes apache#5618 from szheng79/patch-1 and squashes the following commits: 10da99c [szheng79] Update JDBCRDD.scala eab0bd8 [szheng79] Add support on type NVARCHAR (cherry picked from commit fbe7106) Signed-off-by: Reynold Xin <[email protected]>

This PR converts the java.sql.Date type into Int for JDBCRDD. Author: Daoyuan Wang <[email protected]> Closes apache#5590 from adrian-wang/datebug and squashes the following commits: f897b81 [Daoyuan Wang] add a test case 3c9184c [Daoyuan Wang] fix date type convertion in jdbcrdd (cherry picked from commit 04525c0) Signed-off-by: Reynold Xin <[email protected]> Conflicts: sql/core/src/main/scala/org/apache/spark/sql/jdbc/JDBCRDD.scala
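The commit message does not say how the date maps to an Int. A common encoding, and the assumption made in this Python sketch, is the number of days since the Unix epoch (1970-01-01); the helper name `date_to_days` is hypothetical:

```python
from datetime import date

# Assumption: the Int is a day count relative to the Unix epoch.
EPOCH = date(1970, 1, 1)


def date_to_days(d: date) -> int:
    """Encode a calendar date as whole days since 1970-01-01."""
    return (d - EPOCH).days


print(date_to_days(date(1970, 1, 2)))
print(date_to_days(date(1971, 1, 1)))
```

Storing dates this way keeps them compact and makes comparisons plain integer comparisons.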

jkbradley Author: Xiangrui Meng <[email protected]> Closes apache#5649 from mengxr/SPARK-7070 and squashes the following commits: c66023c [Xiangrui Meng] setBeta should call setTopicConcentration (cherry picked from commit 1ed46a6) Signed-off-by: Xiangrui Meng <[email protected]>

Author: Cheng Hao <[email protected]> Closes apache#5671 from chenghao-intel/transform2 and squashes the following commits: 2237e81 [Cheng Hao] fix the deadlock in ScriptTransform

fix typo Author: Ken Geis <[email protected]> Closes apache#5674 from kgeis/patch-1 and squashes the following commits: 5ae67de [Ken Geis] Update sql-programming-guide.md (cherry picked from commit 67bccbd) Signed-off-by: Reynold Xin <[email protected]>

turned on hive-thriftserver profile in release script Author: Misha Chernetsov <[email protected]> Closes apache#5429 from chernetsov/master and squashes the following commits: 9cc36af [Misha Chernetsov] [SPARK-4925] Publish Spark SQL hive-thriftserver maven artifact turned on hive-thriftserver profile in release script for scala 2.10 (cherry picked from commit 998aac2) Signed-off-by: Patrick Wendell <[email protected]>

…ioner Added a check to the SparkContext.union method to check that a partitioner is defined on all RDDs when instantiating a PartitionerAwareUnionRDD. Author: Steven She <[email protected]> Closes apache#5679 from stevencanopy/SPARK-7103 and squashes the following commits: 5a3d846 [Steven She] SPARK-7103: Fix crash with SparkContext.union when at least one RDD has no partitioner (cherry picked from commit b9de9e0) Signed-off-by: Sean Owen <[email protected]>
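The added guard can be pictured as a precondition on the inputs of the union. A Python sketch under stated assumptions: `FakeRDD` is a stand-in class for this example, partitioners are represented by plain strings, and the extra requirement that all partitioners be equal is this sketch's assumption rather than something stated in the commit message:

```python
from typing import Optional, Sequence


class FakeRDD:
    """Stand-in for an RDD; only the optional partitioner matters here."""

    def __init__(self, partitioner: Optional[str]) -> None:
        self.partitioner = partitioner


def can_use_partitioner_aware_union(rdds: Sequence[FakeRDD]) -> bool:
    """True only if every RDD has a partitioner and they all agree."""
    partitioners = [r.partitioner for r in rdds]
    # The fix: an RDD with no partitioner rules out the partitioner-aware
    # path instead of causing a crash later.
    if not partitioners or any(p is None for p in partitioners):
        return False
    return len(set(partitioners)) == 1


print(can_use_partitioner_aware_union([FakeRDD("hash-8"), FakeRDD("hash-8")]))
print(can_use_partitioner_aware_union([FakeRDD("hash-8"), FakeRDD(None)]))
```

When the check fails, the caller would fall back to an ordinary union that makes no assumption about partitioning.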

rxin Author: Andrew Or <[email protected]> Closes apache#5734 from andrewor14/ser-deb and squashes the following commits: e8aad6c [Andrew Or] NonFatal 57d0ef4 [Andrew Or] try catch improveException (cherry picked from commit bf35edd) Signed-off-by: Reynold Xin <[email protected]>

viirya and others added 30 commits March 2, 2015 13:11

HOTFIX: Bump HBase version in MapR profiles.

1aa8461

After apache#2982 (SPARK-4048) we rely on the newer HBase packaging format.

BUILD: Minor tweaks to internal build scripts

ae60eb9

This adds two features: 1. The ability to publish with a different maven version than that specified in the release source. 2. Forking of different Zinc instances during the parallel dist creation (to help with some stability issues).

Adding CHANGES.txt for Spark 1.3

ce7158c

Revert "Preparing development version 1.3.1-SNAPSHOT"

4fee08e

This reverts commit 2ab0ba0.

Revert "Preparing Spark release v1.3.0-rc1"

b012ed1

This reverts commit f97b0d4.

Preparing Spark release v1.3.0-rc2

3af2687

Preparing development version 1.3.1-SNAPSHOT

05d5a29

Revert "[SPARK-5423][Core] Cleanup resources in DiskMapIterator.final…

ee4929d

…ize to ensure deleting the temp file" This reverts commit 25fae8e.

JoshRosen and others added 29 commits April 14, 2015 13:41

Fixed doc

1d6e332


[SPARK-7044][SQL] Fix the deadlock in ScriptTransform(for Spark 1.3)

2b340af


SPARK-5529 Backported changes to 1.3

18008f9

alexrovner closed this Apr 28, 2015
