
[SPARK-10400] [SQL] Deprecates SQLConf.PARQUET_FOLLOW_PARQUET_FORMAT_SPEC #7


Closed

Conversation

liancheng (Owner)

Please refer to SPARK-10400 for more details.


holdenk and others added 30 commits September 1, 2015 10:48
Add a python API for the Stop Words Remover.

Author: Holden Karau <[email protected]>

Closes apache#8118 from holdenk/SPARK-9679-python-StopWordsRemover.
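For reference, a minimal usage sketch of the new Python wrapper (the column name and sample data below are illustrative, not taken from the PR):

```python
from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.ml.feature import StopWordsRemover

sc = SparkContext("local", "stopwords-demo")
sqlContext = SQLContext(sc)

# One row whose "raw" column is a list of tokens
df = sqlContext.createDataFrame([(["a", "quick", "brown", "fox"],)], ["raw"])

remover = StopWordsRemover(inputCol="raw", outputCol="filtered")
remover.transform(df).show()  # "a" is dropped as a default English stop word
```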
…ring scripts

Migrate Apache download closer.cgi refs to new closer.lua

This is the bit of the change that affects the project docs; I'm implementing the changes to the Apache site separately.

Author: Sean Owen <[email protected]>

Closes apache#8557 from srowen/SPARK-10398.
SPARK-4223.

Currently we support setting view and modify ACLs, but you have to specify a list of users. It would be nice to support `*`, meaning all users have access.

Manual tests verify that `*` works for any user in:
a. Spark UI: view and kill stage. Done.
b. Spark history server. Done.
c. YARN application killing. Done.

Author: zhuol <[email protected]>

Closes apache#8398 from zhuoliu/4223.
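A sketch of how the wildcard would be used, assuming the standard ACL configuration keys (`spark.acls.enable`, `spark.ui.view.acls`, `spark.modify.acls`); values here are illustrative:

```python
from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .set("spark.acls.enable", "true")
        .set("spark.ui.view.acls", "*")      # any user may view the UI / history server
        .set("spark.modify.acls", "*"))      # any user may kill stages / applications
sc = SparkContext(conf=conf)
```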
…ilter function

This PR addresses [SPARK-10162](https://issues.apache.org/jira/browse/SPARK-10162)
The issue is with the DataFrame `filter()` function when a `datetime.datetime` is passed to it:
* The timezone information of the datetime is ignored
* The datetime is assumed to be in the local timezone, which depends on the OS timezone setting

Fix includes both code change and regression test. Problem reproduction code on master:
```python
import pytz
from datetime import datetime
from pyspark.sql import *
from pyspark.sql.types import *
sqc = SQLContext(sc)
df = sqc.createDataFrame([], StructType([StructField("dt", TimestampType())]))

m1 = pytz.timezone('UTC')
m2 = pytz.timezone('Etc/GMT+3')

df.filter(df.dt > datetime(2000, 01, 01, tzinfo=m1)).explain()
df.filter(df.dt > datetime(2000, 01, 01, tzinfo=m2)).explain()
```
It gives the same timestamp ignoring time zone:
```
>>> df.filter(df.dt > datetime(2000, 01, 01, tzinfo=m1)).explain()
Filter (dt#0 > 946713600000000)
 Scan PhysicalRDD[dt#0]

>>> df.filter(df.dt > datetime(2000, 01, 01, tzinfo=m2)).explain()
Filter (dt#0 > 946713600000000)
 Scan PhysicalRDD[dt#0]
```
After the fix:
```
>>> df.filter(df.dt > datetime(2000, 01, 01, tzinfo=m1)).explain()
Filter (dt#0 > 946684800000000)
 Scan PhysicalRDD[dt#0]

>>> df.filter(df.dt > datetime(2000, 01, 01, tzinfo=m2)).explain()
Filter (dt#0 > 946695600000000)
 Scan PhysicalRDD[dt#0]
```
PR [8536](apache#8536) was accidentally closed by me when I dropped the repo

Author: 0x0FFF <[email protected]>

Closes apache#8555 from 0x0FFF/SPARK-10162.
This PR addresses issue [SPARK-10392](https://issues.apache.org/jira/browse/SPARK-10392)
The problem is that for the "start of epoch" date (01 Jan 1970), the PySpark DateType class returns 0 instead of a `datetime.date`, due to the implementation of its return statement

Issue reproduction on master:
```
>>> from pyspark.sql.types import *
>>> a = DateType()
>>> a.fromInternal(0)
0
>>> a.fromInternal(1)
datetime.date(1970, 1, 2)
```
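The root cause is the truthiness check in the return statement: day 0 is falsy, so the conversion is skipped. A simplified sketch of the problematic pattern and the fix (not the exact Spark source):

```python
import datetime

EPOCH_ORDINAL = datetime.datetime(1970, 1, 1).toordinal()

def from_internal_buggy(v):
    # `0 and ...` short-circuits to 0, so the epoch date is never built
    return v and datetime.date.fromordinal(v + EPOCH_ORDINAL)

def from_internal_fixed(v):
    # An explicit None check lets day 0 (1970-01-01) through
    return datetime.date.fromordinal(v + EPOCH_ORDINAL) if v is not None else None

print(from_internal_buggy(0))   # 0  (the bug)
print(from_internal_fixed(0))   # datetime.date(1970, 1, 1)
```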

Author: 0x0FFF <[email protected]>

Closes apache#8556 from 0x0FFF/SPARK-10392.
…ct on JobHistory UI.

Author: ArcherShao <[email protected]>

Closes apache#5886 from ArcherShao/SPARK-7336.
Before apache#8371, there was a bug for `Sort` on `Aggregate`: we couldn't use aggregate expressions named `_aggOrdering`, and couldn't use more than one ordering expression containing aggregate functions. The reason for this bug is that the aggregate expression in `SortOrder` never gets resolved; we alias it with `_aggOrdering` and call `toAttribute`, which gives us an `UnresolvedAttribute`. So we are actually referencing the aggregate expression by name, not by exprId as we thought. If there is already an aggregate expression named `_aggOrdering`, or there is more than one ordering expression containing aggregate functions, we get conflicting names and can't resolve by name.

However, after apache#8371 got merged, the `SortOrder`s are guaranteed to be resolved and we always reference aggregate expressions by exprId. The bug no longer exists, and this PR adds regression tests for it.

Author: Wenchen Fan <[email protected]>

Closes apache#8231 from cloud-fan/sort-agg.
…n on Aggregate

For example, we can write `SELECT MAX(value) FROM src GROUP BY key + 1 ORDER BY key + 1` in PostgreSQL, and we should support this in Spark SQL.
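A hedged PySpark reproduction of the kind of query this enables (assumes a pyspark 1.x shell where `sqlContext` already exists; table and column names are illustrative):

```python
df = sqlContext.createDataFrame([(1, 10), (1, 20), (2, 30)], ["key", "value"])
df.registerTempTable("src")

# GROUP BY / ORDER BY on an expression (key + 1) rather than a plain attribute
sqlContext.sql("SELECT MAX(value) FROM src GROUP BY key + 1 ORDER BY key + 1").show()
```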

Author: Wenchen Fan <[email protected]>

Closes apache#8548 from cloud-fan/support-order-by-non-attribute.
… data.

To correctly isolate applications, when requests to read shuffle data
arrive at the shuffle service, proper authorization checks need to
be performed. This change makes sure that only the application that
created the shuffle data can read from it.

Such checks are only enabled when "spark.authenticate" is enabled,
otherwise there's no secure way to make sure that the client is really
who it says it is.
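A minimal sketch of the authorization rule described above (names are illustrative; the actual check lives in the external shuffle service code):

```python
def authorize_shuffle_read(requesting_app_id, block_app_id, authenticate_enabled):
    # Without spark.authenticate there is no trustworthy client identity,
    # so no check can be performed.
    if not authenticate_enabled:
        return True
    # Only the application that wrote the shuffle data may read it back.
    return requesting_app_id == block_app_id
```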

Author: Marcelo Vanzin <[email protected]>

Closes apache#8218 from vanzin/SPARK-10004.
The `pyspark.sql.column.Column` object has a `__getitem__` method, which makes it iterable in Python. In fact, `__getitem__` exists to handle the case where the column is a list or dict, so that you can access a particular element of it in the DataFrame API. The ability to iterate over a column is just a side effect that can confuse people who are getting familiar with Spark DataFrames (since you can iterate this way over a pandas DataFrame, for instance).

Issue reproduction:
```
df = sqlContext.jsonRDD(sc.parallelize(['{"name": "El Magnifico"}']))
for i in df["name"]: print i
```
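One way to block accidental iteration is to define `__iter__` so that it fails fast; a toy sketch of that approach (not Spark's actual Column class):

```python
class ColumnLike(object):
    # Stand-in illustrating the idea; not pyspark.sql.column.Column.

    def __getitem__(self, k):
        return ("getItem", k)   # still supports col["field"] / col[0]

    def __iter__(self):
        # Refuse iteration so `for x in df["name"]` fails with a clear error
        raise TypeError("Column is not iterable")

c = ColumnLike()
print(c["name"])                 # works: ('getItem', 'name')
try:
    for x in c:
        pass
except TypeError as e:
    print(e)                     # Column is not iterable
```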

Author: 0x0FFF <[email protected]>

Closes apache#8574 from 0x0FFF/SPARK-10417.
Params.getOrDefault should throw a more meaningful exception than what you get from a bad key lookup.

Author: Holden Karau <[email protected]>

Closes apache#8567 from holdenk/SPARK-9723-params-getordefault-should-throw-more-useful-error.
…edException

The ```Stage``` class now tracks whether there were a sufficient number of consecutive failures of that stage to trigger an abort.

To avoid an infinite loop of stage retries, we abort the job completely after 4 consecutive stage failures for one stage. We still allow more than 4 consecutive stage failures if there is an intervening successful attempt for the stage, so that in very long-lived applications, where a stage may get reused many times, we don't abort the job after failures that have been recovered from successfully.
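A compact sketch of the abort policy described above (the constant and names are illustrative, not the scheduler's actual fields):

```python
MAX_CONSECUTIVE_STAGE_FAILURES = 4

class StageFailureTracker(object):
    def __init__(self):
        self.consecutive_failures = 0

    def on_attempt_failed(self):
        # Returns True when the job should be aborted
        self.consecutive_failures += 1
        return self.consecutive_failures >= MAX_CONSECUTIVE_STAGE_FAILURES

    def on_attempt_succeeded(self):
        # An intervening success resets the count, so long-lived apps that
        # reuse a stage many times aren't aborted for failures already recovered from
        self.consecutive_failures = 0
```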

I've added test cases to exercise the most obvious scenarios.

Author: Ilya Ganelin <[email protected]>

Closes apache#5636 from ilganeli/SPARK-5945.
…rtitions

Added `numPartitions(evaluate: Boolean)` to RDD. With `evaluate = true`, the method is the same as `partitions.length`. With `evaluate = false`, it checks checked-out or already-evaluated partitions in the RDD to get the number of partitions; if neither applies, it returns -1. `RDDInfo.partitionNum` calls `numPartitions` only when it is accessed.

Author: navis.ryu <[email protected]>

Closes apache#7127 from navis/SPARK-8707.
Added fetchUpToMaxBytes() to prevent having to update both code blocks when a change is made.

Author: Evan Racah <[email protected]>

Closes apache#8514 from eracah/master.
…erSuite

This is pretty minor, just trying to improve the readability of `DAGSchedulerSuite`, I figure every bit helps.  Before whenever I read this test, I never knew what "should work" and "should be ignored" really meant -- this adds some asserts & updates comments to make it more clear.  Also some reformatting per a suggestion from markhamstra on apache#7699

Author: Imran Rashid <[email protected]>

Closes apache#8434 from squito/SPARK-10247.
From Jira:
Running spark-submit on YARN with `--num-executors` equal to 0 when not using dynamic allocation should error out.
In Spark 1.5.0 it continues and ends up hanging.
yarn.ClientArguments still has the check, so something else must have changed.
spark-submit --master yarn --deploy-mode cluster --class org.apache.spark.examples.SparkPi --num-executors 0 ....
Spark 1.4.1 errors with:
java.lang.IllegalArgumentException:
Number of executors was 0, but must be at least 1
(or 0 if dynamic executor allocation is enabled).
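A sketch of the kind of validation being restored (the message follows the 1.4.1 error quoted above; this is not the exact Spark code, which lives in yarn.ClientArguments):

```python
def validate_num_executors(num_executors, dynamic_allocation_enabled):
    minimum = 0 if dynamic_allocation_enabled else 1
    if num_executors < minimum:
        raise ValueError(
            "Number of executors was %d, but must be at least 1 "
            "(or 0 if dynamic executor allocation is enabled)." % num_executors)
```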

Author: Holden Karau <[email protected]>

Closes apache#8580 from holdenk/SPARK-10332-spark-submit-to-yarn-executors-0-message.
Spark gives an error message and does not show the output when a field of the result DataFrame contains CJK characters.
I changed SerDe.scala so that Spark supports Unicode characters when writing a string to R.

Author: CHOIJAEHONG <[email protected]>

Closes apache#7494 from CHOIJAEHONG1/SPARK-8951.
Author: Tom Graves <[email protected]>

Closes apache#8585 from tgravescs/SPARK-10432.
…eue to be clear

Author: robbins <[email protected]>

Closes apache#8582 from robbinspg/InputOutputMetricsSuite.
…rting results

Author: robbins <[email protected]>

Closes apache#8589 from robbinspg/InputStreamSuite-fix.
…vars

This contribution is my original work and I license the work to the project under the project's open source license.

Author: Pat Shields <[email protected]>

Closes apache#7979 from pashields/env-loading-on-driver.
…DOperationScope

Author: Vinod K C <[email protected]>

Closes apache#8581 from vinodkc/fix_RDDOperationScope_Hashcode.
…block

[SPARK-9591](https://issues.apache.org/jira/browse/SPARK-9591)
When fetching a broadcast variable, we can fetch the block from several locations. But currently, connecting to a lost block manager (e.g. one that was idle long enough to be removed by the driver when dynamic resource allocation is used) causes the task to fail, and in the worst case causes the whole job to fail.

Author: jeanlyn <[email protected]>

Closes apache#7927 from jeanlyn/catch_exception.
…th R

It's not supported yet so we should error with a clear message.

Author: Andrew Or <[email protected]>

Closes apache#8590 from andrewor14/mesos-cluster-r-guard.
…cies.

This avoids them being mistakenly pulled instead of the newer ones that
Spark actually uses. Spark only depends on these artifacts transitively,
so sometimes maven just decides to pick tachyon's version of the
dependency for whatever reason.

Author: Marcelo Vanzin <[email protected]>

Closes apache#8577 from vanzin/SPARK-10421.
Note: this is not intended to be in Spark 1.5!

This patch rewrites some code in the `DAGScheduler` to make it more readable. In particular
- blocks of code that were unnecessary have been removed for simplicity
- abstractions that were unnecessary and made the code hard to navigate have been removed
- other minor changes

Author: Andrew Or <[email protected]>

Closes apache#8217 from andrewor14/dag-scheduler-readability and squashes the following commits:

57abca3 [Andrew Or] Move comment back into if case
574fb1e [Andrew Or] Merge branch 'master' of github.com:apache/spark into dag-scheduler-readability
64a9ed2 [Andrew Or] Remove unnecessary code + minor code rewrites
ahirreddy and others added 24 commits September 11, 2015 13:06
Previously, project/plugins.sbt explicitly set scalaVersion to 2.10.4. This can cause issues when using a version of sbt that is compiled against a different version of Scala (for example sbt 0.13.9 uses 2.10.5). Removing this explicit setting will cause build files to be compiled and run against the same version of Scala that sbt is compiled against.

Note that this only applies to the project build files (items in project/), it is distinct from the version of Scala we target for the actual spark compilation.

Author: Ahir Reddy <[email protected]>

Closes apache#8709 from ahirreddy/sbt-scala-version-fix.
…se LIBSVM data source instead of MLUtils

I changed the example code in spark.ml to use the LIBSVM data source instead of MLUtils.
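For reference, reading LIBSVM data through the data source API looks roughly like this (assumes a pyspark shell with `sqlContext`; the path is illustrative):

```python
df = sqlContext.read.format("libsvm").load("data/mllib/sample_libsvm_data.txt")
df.show(5)  # columns: label, features (sparse vectors)
```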

Author: y-shimizu <[email protected]>

Closes apache#8697 from y-shimizu/SPARK-10518.
…ion in PySpark

LinearRegression and LogisticRegression lack some Params on the Python side, and some Params are not shared classes, which means we need to write them for each class. These Params are listed here:
```scala
HasElasticNetParam
HasFitIntercept
HasStandardization
HasThresholds
```
Here we implement them as shared params on the Python side and make the LinearRegression/LogisticRegression parameters match their Scala counterparts.
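A hedged usage sketch of the newly exposed Python params (requires an active pyspark session; values are illustrative):

```python
from pyspark.ml.classification import LogisticRegression

lr = LogisticRegression(maxIter=10,
                        elasticNetParam=0.5,    # HasElasticNetParam
                        fitIntercept=True,      # HasFitIntercept
                        standardization=True)   # HasStandardization
# model = lr.fit(training_df)  # training_df needs "features" and "label" columns
```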

Author: Yanbo Liang <[email protected]>

Closes apache#8508 from yanboliang/spark-10026.
…assifier

Add Python API for ```MultilayerPerceptronClassifier```.
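A hedged usage sketch (requires an active pyspark session; layer sizes and parameter values are illustrative):

```python
from pyspark.ml.classification import MultilayerPerceptronClassifier

mlp = MultilayerPerceptronClassifier(maxIter=100, layers=[4, 5, 2],
                                     blockSize=128, seed=1234)
# model = mlp.fit(training_df)  # 4 input features, 2 output classes
```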

Author: Yanbo Liang <[email protected]>

Closes apache#8067 from yanboliang/SPARK-9773.
…nd some minor improvements

We should document options in the public API doc. Otherwise, it is hard to find the options without looking at the code. I tried to make `DefaultSource` private and put the documentation in the package doc. However, since there would then be no public class under `source.libsvm`, the Java package doc doesn't show up in the generated HTML file (http://bugs.java.com/bugdatabase/view_bug.do?bug_id=4492654). So I put the doc in `DefaultSource` instead. There are several minor updates in this PR:

1. Do `vectorType == "sparse"` only once.
2. Update `hashCode` and `equals`.
3. Remove inherited doc.
4. Delete temp dir in `afterAll`.

Lewuathe

Author: Xiangrui Meng <[email protected]>

Closes apache#8699 from mengxr/SPARK-10537.
…dataUtils

Changes:
* Make Scala doc for StringIndexerInverse clearer.  Also remove Scala doc from transformSchema, so that the doc is inherited.
* MetadataUtils.scala: “Helper utilities for tree-based algorithms” -> not just trees anymore

CC: holdenk mengxr

Author: Joseph K. Bradley <[email protected]>

Closes apache#8679 from jkbradley/doc-fixes-1.5.
…s" if it is too flaky

If hadoopFsRelationSuites' "test all data types" test is too flaky, we can disable it for now.

https://issues.apache.org/jira/browse/SPARK-10540

Author: Yin Huai <[email protected]>

Closes apache#8705 from yhuai/SPARK-10540-ignore.
jira: https://issues.apache.org/jira/browse/SPARK-8530

add python API for MinMaxScaler
jira for MinMaxScaler: https://issues.apache.org/jira/browse/SPARK-7514
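A hedged usage sketch of the Python MinMaxScaler API (requires an active pyspark session; column names are illustrative):

```python
from pyspark.ml.feature import MinMaxScaler

scaler = MinMaxScaler(inputCol="features", outputCol="scaledFeatures",
                      min=0.0, max=1.0)
# model = scaler.fit(df)
# scaled = model.transform(df)
```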

Author: Yuhao Yang <[email protected]>

Closes apache#7150 from hhbyyh/pythonMinMax.
See this thread for background:
http://search-hadoop.com/m/q3RTt0rWvIkHAE81

We should check the range of the partition ID and provide a meaningful message through an exception.

Alternatively, we could use abs() and modulo to force the partition ID into a legitimate range. However, the expectation is that the user should correct the logic error in their code.
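A minimal sketch of the range check described above (names are illustrative):

```python
def check_partition_id(partition_id, num_partitions):
    if not 0 <= partition_id < num_partitions:
        raise ValueError(
            "partition ID %d is out of range [0, %d)"
            % (partition_id, num_partitions))
```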

Author: tedyu <[email protected]>

Closes apache#8703 from tedyu/master.
Just fixing a typo in the exception message raised when attempting to pickle SparkContext.

Author: Icaro Medeiros <[email protected]>

Closes apache#8724 from icaromedeiros/master.
When we cast a string to boolean in Hive, it returns `true` if the length of the string is > 0, and Spark SQL follows this behavior.

However, this behavior is very different from other SQL systems:

1. [presto](https://github.com/facebook/presto/blob/master/presto-main/src/main/java/com/facebook/presto/type/VarcharOperators.java#L89-L118) returns `true` for 't' 'true' '1', `false` for 'f' 'false' '0', and throws an exception for others.
2. [redshift](http://docs.aws.amazon.com/redshift/latest/dg/r_Boolean_type.html) returns `true` for 't' 'true' 'y' 'yes' '1', `false` for 'f' 'false' 'n' 'no' '0', and null for others.
3. [postgresql](http://www.postgresql.org/docs/devel/static/datatype-boolean.html) returns `true` for 't' 'true' 'y' 'yes' 'on' '1', `false` for 'f' 'false' 'n' 'no' 'off' '0', and throws an exception for others.
4. [vertica](https://my.vertica.com/docs/5.0/HTML/Master/2983.htm) returns `true` for 't' 'true' 'y' 'yes' '1', `false` for 'f' 'false' 'n' 'no' '0', and null for others.
5. [impala](http://www.cloudera.com/content/cloudera/en/documentation/cloudera-impala/latest/topics/impala_boolean.html) throws an exception when trying to cast a string to boolean.
6. mysql, oracle, and sqlserver don't have a boolean type.

Whether we should change the cast behavior to match other SQL systems is not decided yet; this PR is a test to see how many compatibility tests would fail if we changed it.
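For illustration, a sketch of the kind of stricter parsing being considered (the accepted literals follow the survey above; this is not necessarily what Spark will adopt):

```python
TRUE_LITERALS = {"t", "true", "y", "yes", "1"}
FALSE_LITERALS = {"f", "false", "n", "no", "0"}

def cast_string_to_boolean(s):
    v = s.strip().lower()
    if v in TRUE_LITERALS:
        return True
    if v in FALSE_LITERALS:
        return False
    raise ValueError("invalid boolean literal: %r" % s)
```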

Author: Wenchen Fan <[email protected]>

Closes apache#8698 from cloud-fan/string2boolean.
…er rule. Incorporate review comments

Adding changes suggested by cloud-fan  in apache#5700

cc marmbrus

Author: Yash Datta <[email protected]>

Closes apache#8716 from saucam/bool_simp.
…, sample and intersect operators

This PR is in conflict with apache#8535. I will update this one when apache#8535 gets merged.

Author: zsxwing <[email protected]>

Closes apache#8573 from zsxwing/more-local-operators.
1. Hide `LocalNodeIterator` behind the `LocalNode#asIterator` method
2. Add tests for this

Author: Andrew Or <[email protected]>

Closes apache#8708 from andrewor14/local-hash-join-follow-up.
…l the test

This commit ensures if an assertion fails within a thread, it will ultimately fail the test. Otherwise we end up potentially masking real bugs by not propagating assertion failures properly.

Author: Andrew Or <[email protected]>

Closes apache#8723 from andrewor14/fix-threading-suite.
… operator

This PR addresses [SPARK-9014](https://issues.apache.org/jira/browse/SPARK-9014)
Added functionality: the `Column` object in Python now supports the exponentiation operator `**`
Example:
```
from pyspark.sql import *
df = sqlContext.createDataFrame([Row(a=2)])
df.select(3**df.a,df.a**3,df.a**df.a).collect()
```
Outputs:
```
[Row(POWER(3.0, a)=9.0, POWER(a, 3.0)=8.0, POWER(a, a)=4.0)]
```

Author: 0x0FFF <[email protected]>

Closes apache#8658 from 0x0FFF/SPARK-9014.
…asks important error information

When throwing an IllegalArgumentException in SnappyCompressionCodec.init, chain the existing exception. This allows potentially important debugging info to be passed to the user.
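The Python analogue of the pattern, for illustration (the actual change is in Scala, passing the cause to the exception constructor; the function names here are hypothetical):

```python
def init_codec(load_native_library):
    try:
        load_native_library()
    except OSError as e:
        # Chain the underlying error so the root cause reaches the user
        raise RuntimeError("failed to initialize the Snappy codec") from e
```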

Manual testing shows the exception chained properly, and the test suite still looks fine as well.

This contribution is my original work and I license the work to the project under the project's open source license.

Author: Daniel Imfeld <[email protected]>

Closes apache#8725 from dimfeld/dimfeld-patch-1.
https://issues.apache.org/jira/browse/SPARK-10554

Fixes NPE when ShutdownHook tries to cleanup temporary folders

Author: Nithin Asokan <[email protected]>

Closes apache#8720 from nasokan/SPARK-10554.
Fix a few Java API test style issues: unused generic types, exceptions, wrong assert argument order

Author: Sean Owen <[email protected]>

Closes apache#8706 from srowen/SPARK-10547.
Adding STDDEV support for DataFrame using a 1-pass online/parallel algorithm to compute variance. Please review the code change.
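For intuition, a sketch of a one-pass (online) variance/standard-deviation computation in the spirit of the algorithm mentioned above (a Welford-style update; not the Spark implementation itself):

```python
import math

def online_stddev(values):
    count, mean, m2 = 0, 0.0, 0.0
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)   # uses the updated mean
    return math.sqrt(m2 / (count - 1)) if count > 1 else float("nan")

print(online_stddev([1.0, 2.0, 3.0, 4.0]))  # ~1.2910 (sample stddev)
```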

Author: JihongMa <[email protected]>
Author: Jihong MA <[email protected]>
Author: Jihong MA <[email protected]>
Author: Jihong MA <[email protected]>

Closes apache#6297 from JihongMA/SPARK-SQL.
…obContext methods

This is a followup to apache#8499 which adds a Scalastyle rule to mandate the use of SparkHadoopUtil's JobContext accessor methods and fixes the existing violations.

Author: Josh Rosen <[email protected]>

Closes apache#8521 from JoshRosen/SPARK-10330-part2.
…r of GraphX

Finish deprecating Bagel; remove reference to nonexistent example

Author: Sean Owen <[email protected]>

Closes apache#8731 from srowen/SPARK-10222.
@liancheng liancheng force-pushed the spark-10400/deprecate-follow-parquet-format-spec branch from b3f7877 to 85bbfde Compare September 14, 2015 07:24
@liancheng liancheng closed this Sep 16, 2015
@liancheng liancheng deleted the spark-10400/deprecate-follow-parquet-format-spec branch October 2, 2015 01:04
@liancheng liancheng restored the spark-10400/deprecate-follow-parquet-format-spec branch October 6, 2015 23:11
liancheng pushed a commit that referenced this pull request May 3, 2016
## What changes were proposed in this pull request?

This PR aims to optimize GroupExpressions by removing repeating expressions. `RemoveRepetitionFromGroupExpressions` is added.

**Before**
```scala
scala> sql("select a+1 from values 1,2 T(a) group by a+1, 1+a, A+1, 1+A").explain()
== Physical Plan ==
WholeStageCodegen
:  +- TungstenAggregate(key=[(a#0 + 1)#6,(1 + a#0)#7,(A#0 + 1)#8,(1 + A#0)#9], functions=[], output=[(a + 1)#5])
:     +- INPUT
+- Exchange hashpartitioning((a#0 + 1)#6, (1 + a#0)#7, (A#0 + 1)#8, (1 + A#0)#9, 200), None
   +- WholeStageCodegen
      :  +- TungstenAggregate(key=[(a#0 + 1) AS (a#0 + 1)#6,(1 + a#0) AS (1 + a#0)#7,(A#0 + 1) AS (A#0 + 1)#8,(1 + A#0) AS (1 + A#0)#9], functions=[], output=[(a#0 + 1)#6,(1 + a#0)#7,(A#0 + 1)#8,(1 + A#0)#9])
      :     +- INPUT
      +- LocalTableScan [a#0], [[1],[2]]
```

**After**
```scala
scala> sql("select a+1 from values 1,2 T(a) group by a+1, 1+a, A+1, 1+A").explain()
== Physical Plan ==
WholeStageCodegen
:  +- TungstenAggregate(key=[(a#0 + 1)#6], functions=[], output=[(a + 1)#5])
:     +- INPUT
+- Exchange hashpartitioning((a#0 + 1)#6, 200), None
   +- WholeStageCodegen
      :  +- TungstenAggregate(key=[(a#0 + 1) AS (a#0 + 1)#6], functions=[], output=[(a#0 + 1)#6])
      :     +- INPUT
      +- LocalTableScan [a#0], [[1],[2]]
```

## How was this patch tested?

Pass the Jenkins tests (with a new testcase)

Author: Dongjoon Hyun <[email protected]>

Closes apache#12590 from dongjoon-hyun/SPARK-14830.
liancheng pushed a commit that referenced this pull request Oct 28, 2016

(cherry picked from commit 6e63201)
Signed-off-by: Michael Armbrust <[email protected]>