Merge upstream #299

Merged Jan 25, 2018

358 commits
1abcbed
[SPARK-22763][CORE] SHS: Ignore unknown events and parse through the …
gengliangwang Dec 13, 2017
0bdb4e5
[SPARK-22574][MESOS][SUBMIT] Check submission request parameters
Gschiavon Dec 13, 2017
ba0e79f
[SPARK-22772][SQL] Use splitExpressionsWithCurrentInputs to split cod…
viirya Dec 13, 2017
a83e8e6
[SPARK-22764][CORE] Fix flakiness in SparkContextSuite.
Dec 13, 2017
ef92999
[SPARK-22600][SQL][FOLLOW-UP] Fix a compilation error in TPCDS q75/q77
maropu Dec 13, 2017
bc7e4a9
Revert "[SPARK-22600][SQL][FOLLOW-UP] Fix a compilation error in TPCD…
cloud-fan Dec 14, 2017
2a29a60
Revert "[SPARK-22600][SQL] Fix 64kb limit for deeply nested expressio…
cloud-fan Dec 14, 2017
1e44dd0
[SPARK-3181][ML] Implement huber loss for LinearRegression.
yanboliang Dec 14, 2017
f8c7c1f
[SPARK-22732] Add Structured Streaming APIs to DataSourceV2
jose-torres Dec 14, 2017
c3dd2a2
[SPARK-22779][SQL] Resolve default values for fallback configs.
Dec 14, 2017
7d8e2ca
[SPARK-22775][SQL] move dictionary related APIs from ColumnVector to …
cloud-fan Dec 14, 2017
d095795
[SPARK-22785][SQL] remove ColumnVector.anyNullsSet
cloud-fan Dec 14, 2017
606ae49
[SPARK-22774][SQL][TEST] Add compilation check into TPCDSQuerySuite
kiszk Dec 14, 2017
40de176
[SPARK-16496][SQL] Add wholetext as option for reading text in SQL.
ScrapCodes Dec 14, 2017
6d99940
[SPARK-22660][BUILD] Use position() and limit() to fix ambiguity issu…
kellyzly Dec 14, 2017
2fe1633
[SPARK-22778][KUBERNETES] Added the missing service metadata for Kube…
liyinan926 Dec 14, 2017
59daf91
[SPARK-22733] Split StreamExecution into MicroBatchExecution and Stre…
jose-torres Dec 14, 2017
0ea2d8c
[SPARK-22496][SQL] thrift server adds operation logs
Dec 14, 2017
3fea5c4
[SPARK-22787][TEST][SQL] Add a TPC-H query suite
gatorsmile Dec 15, 2017
3775dd3
[SPARK-22753][SQL] Get rid of dataSource.writeAndRead
xuanyuanking Dec 15, 2017
e58f275
Revert "[SPARK-22496][SQL] thrift server adds operation logs"
gatorsmile Dec 15, 2017
9fafa82
[SPARK-22800][TEST][SQL] Add a SSB query suite
maropu Dec 15, 2017
4677623
[SPARK-22762][TEST] Basic tests for IfCoercion and CaseWhenCoercion
wangyum Dec 15, 2017
0c8fca4
[SPARK-22811][PYSPARK][ML] Fix pyspark.ml.tests failure when Hive is …
MrBago Dec 16, 2017
c2aeddf
[SPARK-22817][R] Use fixed testthat version for SparkR tests in AppVeyor
HyukjinKwon Dec 17, 2017
77988a9
[MINOR][DOC] Fix the link of 'Getting Started'
mcavdar Dec 17, 2017
7f6d10a
[SPARK-22816][TEST] Basic tests for PromoteStrings and InConversion
wangyum Dec 17, 2017
fb3636b
[SPARK-22807][SCHEDULER] Remove config that says docker and replace w…
foxish Dec 18, 2017
772e464
[SPARK-20653][CORE] Add cleaning of old elements from the status store.
Dec 18, 2017
fbfa9be
Revert "Revert "[SPARK-22496][SQL] thrift server adds operation logs""
HyukjinKwon Dec 18, 2017
3a07eff
[SPARK-22813][BUILD] Use lsof or /usr/sbin/lsof in run-tests.py
kiszk Dec 18, 2017
0609dcc
[SPARK-22777][SCHEDULER] Kubernetes mode dockerfile permission and di…
foxish Dec 18, 2017
d4e6959
[MINOR][SQL] Remove Useless zipWithIndex from ResolveAliases
gatorsmile Dec 19, 2017
ab7346f
[SPARK-22673][SQL] InMemoryRelation should utilize existing stats whe…
CodingCat Dec 19, 2017
571aa27
[SPARK-21984][SQL] Join estimation based on equi-height histogram
wzhfy Dec 19, 2017
2831571
[SPARK-22791][SQL][SS] Redact Output of Explain
gatorsmile Dec 19, 2017
b779c93
[SPARK-22815][SQL] Keep PromotePrecision in Optimized Plans
gatorsmile Dec 19, 2017
ee56fc3
[SPARK-18016][SQL] Code Generation: Constant Pool Limit - reduce entr…
kiszk Dec 19, 2017
ef10f45
[SPARK-21652][SQL][FOLLOW-UP] Fix rule conflict caused by InferFilter…
gatorsmile Dec 19, 2017
6129ffa
[SPARK-22821][TEST] Basic tests for WidenSetOperationTypes, BooleanEq…
wangyum Dec 19, 2017
3a7494d
[SPARK-22827][CORE] Avoid throwing OutOfMemoryError in case of except…
Dec 20, 2017
6e36d8d
[SPARK-22829] Add new built-in function date_trunc()
youngbink Dec 20, 2017
13268a5
[SPARK-22649][PYTHON][SQL] Adding localCheckpoint to Dataset API
ferdonline Dec 20, 2017
9962390
[SPARK-22781][SS] Support creating streaming dataset with ORC files
dongjoon-hyun Dec 20, 2017
7570eab
[SPARK-22788][STREAMING] Use correct hadoop config for fs append supp…
Dec 20, 2017
7798c9e
[SPARK-22824] Restore old offset for binary compatibility
jose-torres Dec 20, 2017
d762d11
[SPARK-22832][ML] BisectingKMeans unpersist unused datasets
zhengruifeng Dec 20, 2017
c89b431
[SPARK-22849] ivy.retrieve pattern should also consider `classifier`
gatorsmile Dec 20, 2017
792915c
[SPARK-22830] Scala Coding style has been improved in Spark Examples
chetkhatri Dec 20, 2017
b176014
[SPARK-22847][CORE] Remove redundant code in AppStatusListener while …
Ngone51 Dec 20, 2017
0114c89
[SPARK-22845][SCHEDULER] Modify spark.kubernetes.allocation.batch.del…
foxish Dec 21, 2017
fb0562f
[SPARK-22810][ML][PYSPARK] Expose Python API for LinearRegression wit…
yanboliang Dec 21, 2017
9c289a5
[SPARK-22387][SQL] Propagate session configs to data source read/writ…
jiangxb1987 Dec 21, 2017
d3ae3e1
[SPARK-19634][SQL][ML][FOLLOW-UP] Improve interface of dataframe vect…
WeichenXu123 Dec 21, 2017
cb9fc8d
[SPARK-22848][SQL] Eliminate mutable state from Stack
kiszk Dec 21, 2017
59d5263
[SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0
BryanCutler Dec 21, 2017
0abaf31
[SPARK-22852][BUILD] Exclude -Xlint:unchecked from sbt javadoc flags
easel Dec 21, 2017
4c2efde
[SPARK-22855][BUILD] Add -no-java-comments to sbt docs/scalacOptions
easel Dec 21, 2017
8a0ed5a
[SPARK-22668][SQL] Ensure no global variables in arguments of method …
cloud-fan Dec 21, 2017
d3a1d95
[SPARK-22786][SQL] only use AppStatusPlugin in history server
cloud-fan Dec 21, 2017
4e107fd
[SPARK-22822][TEST] Basic tests for WindowFrameCoercion and DecimalPr…
wangyum Dec 21, 2017
fe65361
[SPARK-22042][FOLLOW-UP][SQL] ReorderJoinPredicates can break when ch…
tejasapatil Dec 21, 2017
7beb375
[SPARK-22861][SQL] SQLAppStatusListener handles multi-job executions.
squito Dec 21, 2017
7ab165b
[SPARK-22648][K8S] Spark on Kubernetes - Documentation
foxish Dec 22, 2017
c0abb1d
[SPARK-22854][UI] Read Spark version from event logs.
Dec 22, 2017
c6f01ca
[SPARK-22750][SQL] Reuse mutable states when possible
mgaido91 Dec 22, 2017
a36b78b
[SPARK-22450][CORE][MLLIB][FOLLOWUP] safely register class for mllib …
zhengruifeng Dec 22, 2017
22e1849
[SPARK-22866][K8S] Fix path issue in Kubernetes dockerfile
foxish Dec 22, 2017
8df1da3
[SPARK-22862] Docs on lazy elimination of columns missing from an enc…
marmbrus Dec 22, 2017
13190a4
[SPARK-22874][PYSPARK][SQL] Modify checking pandas version to use Loo…
ueshin Dec 22, 2017
d23dc5b
[SPARK-22346][ML] VectorSizeHint Transformer for using VectorAssemble…
MrBago Dec 22, 2017
8941a4a
[SPARK-22789] Map-only continuous processing execution
jose-torres Dec 23, 2017
86db9b2
[SPARK-22833][IMPROVEMENT] in SparkHive Scala Examples
chetkhatri Dec 23, 2017
ea2642e
[SPARK-20694][EXAMPLES] Update SQLDataSourceExample.scala
CNRui Dec 23, 2017
f6084a8
[HOTFIX] Fix Scala style checks
HyukjinKwon Dec 23, 2017
aeb45df
[SPARK-22844][R] Adds date_trunc in R API
HyukjinKwon Dec 23, 2017
1219d7a
[SPARK-22889][SPARKR] Set overwrite=T when install SparkR in tests
shivaram Dec 23, 2017
0bf1a74
[SPARK-22465][CORE] Add a safety-check to RDD defaultPartitioner
Dec 24, 2017
fba0313
[SPARK-22707][ML] Optimize CrossValidator memory occupation by models…
WeichenXu123 Dec 25, 2017
33ae243
[SPARK-22893][SQL] Unified the data type mismatch message
wangyum Dec 25, 2017
12d20dd
[SPARK-22874][PYSPARK][SQL][FOLLOW-UP] Modify error messages to show …
ueshin Dec 25, 2017
be03d3a
[SPARK-22893][SQL][HOTFIX] Fix a error message of VersionsSuite
dongjoon-hyun Dec 26, 2017
0e68330
[SPARK-20168][DSTREAM] Add changes to use kinesis fetches from specif…
yashs360 Dec 26, 2017
eb386be
[SPARK-21552][SQL] Add DecimalType support to ArrowWriter.
ueshin Dec 26, 2017
ff48b1b
[SPARK-22901][PYTHON] Add deterministic flag to pyspark UDF
mgaido91 Dec 26, 2017
9348e68
[SPARK-22833][EXAMPLE] Improvement SparkHive Scala Examples
cloud-fan Dec 26, 2017
91d1b30
[SPARK-22894][SQL] DateTimeOperations should accept SQL like string type
wangyum Dec 26, 2017
6674acd
[SPARK-22846][SQL] Fix table owner is null when creating table throug…
Dec 27, 2017
b8bfce5
[SPARK-22324][SQL][PYTHON][FOLLOW-UP] Update setup.py file.
ueshin Dec 27, 2017
774715d
[SPARK-22904][SQL] Add tests for decimal operations and string casts
mgaido91 Dec 27, 2017
753793b
[SPARK-22899][ML][STREAMING] Fix OneVsRestModel transform on streamin…
WeichenXu123 Dec 28, 2017
5683984
[SPARK-18016][SQL][FOLLOW-UP] Code Generation: Constant Pool Limit - …
kiszk Dec 28, 2017
32ec269
[SPARK-22909][SS] Move Structured Streaming v2 APIs to streaming folder
zsxwing Dec 28, 2017
171f6dd
[SPARK-22757][KUBERNETES] Enable use of remote dependencies (http, s3…
liyinan926 Dec 28, 2017
ded6d27
[SPARK-22648][K8S] Add documentation covering init containers and sec…
liyinan926 Dec 28, 2017
76e8a1d
[SPARK-22843][R] Adds localCheckpoint in R
HyukjinKwon Dec 28, 2017
1eebfbe
[SPARK-21208][R] Adds setLocalProperty and getLocalProperty in R
HyukjinKwon Dec 28, 2017
755f2f5
[SPARK-20392][SQL][FOLLOWUP] should not add extra AnalysisBarrier
cloud-fan Dec 28, 2017
2877817
[SPARK-22917][SQL] Should not try to generate histogram for empty/nul…
wzhfy Dec 28, 2017
5536f31
[MINOR][BUILD] Fix Java linter errors
dongjoon-hyun Dec 28, 2017
8f6d573
[SPARK-22875][BUILD] Assembly build fails for a high user id
gerashegalov Dec 28, 2017
9c21ece
[SPARK-22836][UI] Show driver logs in UI when available.
Dec 28, 2017
613b71a
[SPARK-22890][TEST] Basic tests for DateTimeOperations
wangyum Dec 28, 2017
cfcd746
[SPARK-11035][CORE] Add in-process Spark app launcher.
Dec 28, 2017
ffe6fd7
[SPARK-22818][SQL] csv escape of quote escape
Dec 28, 2017
c745730
[SPARK-22905][MLLIB] Fix ChiSqSelectorModel save implementation
WeichenXu123 Dec 29, 2017
796e48c
[SPARK-22313][PYTHON][FOLLOWUP] Explicitly import warnings namespace …
HyukjinKwon Dec 29, 2017
67ea11e
[SPARK-22891][SQL] Make hive client creation thread safe
Dec 29, 2017
d4f0b1d
[SPARK-22834][SQL] Make insertion commands have real children to fix …
gengliangwang Dec 29, 2017
224375c
[SPARK-22892][SQL] Simplify some estimation logic by using double ins…
wzhfy Dec 29, 2017
cc30ef8
[SPARK-22916][SQL] shouldn't bias towards build right if user does no…
Dec 29, 2017
fcf66a3
[SPARK-21657][SQL] optimize explode quadratic memory consumpation
uzadude Dec 29, 2017
dbd492b
[SPARK-22921][PROJECT-INFRA] Choices for Assigning Jira on Merge
squito Dec 29, 2017
11a849b
[SPARK-22370][SQL][PYSPARK][FOLLOW-UP] Fix a test failure when xmlrun…
ueshin Dec 29, 2017
8b49704
[SPARK-20654][CORE] Add config to limit disk usage of the history ser…
Dec 29, 2017
4e9e6ae
[SPARK-22864][CORE] Disable allocation schedule in ExecutorAllocation…
Dec 29, 2017
afc3641
[SPARK-22905][ML][FOLLOWUP] Fix GaussianMixtureModel save
zhengruifeng Dec 29, 2017
66a7d6b
[SPARK-22920][SPARKR] sql functions for current_date, current_timesta…
felixcheung Dec 29, 2017
ccda75b
[SPARK-22921][PROJECT-INFRA] Bug fix in jira assigning
squito Dec 29, 2017
30fcdc0
[SPARK-22922][ML][PYSPARK] Pyspark portion of the fit-multiple API
MrBago Dec 30, 2017
8169630
[SPARK-22734][ML][PYSPARK] Added Python API for VectorSizeHint.
MrBago Dec 30, 2017
2ea17af
[SPARK-22881][ML][TEST] ML regression package testsuite add Structure…
WeichenXu123 Dec 30, 2017
f2b3525
[SPARK-22771][SQL] Concatenate binary inputs into a binary output
maropu Dec 30, 2017
14c4a62
[SPARK-21475][Core]Revert "[SPARK-21475][CORE] Use NIO's Files API to…
zsxwing Dec 30, 2017
234d943
[TEST][MINOR] remove redundant `EliminateSubqueryAliases` in test code
wzhfy Dec 30, 2017
fd7d141
[SPARK-22919] Bump httpclient versions
Dec 30, 2017
ea0a5ee
[SPARK-22924][SPARKR] R API for sortWithinPartitions
felixcheung Dec 30, 2017
ee3af15
[SPARK-22363][SQL][TEST] Add unit test for Window spilling
gaborgsomogyi Dec 31, 2017
cfbe11e
[SPARK-22895][SQL] Push down the deterministic predicates that are af…
gatorsmile Dec 31, 2017
3d8837e
[SPARK-22397][ML] add multiple columns support to QuantileDiscretizer
huaxingao Dec 31, 2017
028ee40
[SPARK-22801][ML][PYSPARK] Allow FeatureHasher to treat numeric colum…
Dec 31, 2017
5955a2d
[MINOR][DOCS] s/It take/It takes/g
jkremser Dec 31, 2017
994065d
[SPARK-13030][ML] Create OneHotEncoderEstimator for OneHotEncoder as …
viirya Dec 31, 2017
f5b7714
[BUILD] Close stale PRs
srowen Jan 1, 2018
7a702d8
[SPARK-21616][SPARKR][DOCS] update R migration guide and vignettes
felixcheung Jan 1, 2018
c284c4e
[MINOR] Fix a bunch of typos
srowen Dec 31, 2017
1c9f95c
[SPARK-22530][PYTHON][SQL] Adding Arrow support for ArrayType
BryanCutler Jan 1, 2018
e734a4b
[SPARK-21893][SPARK-22142][TESTS][FOLLOWUP] Enables PySpark tests for…
HyukjinKwon Jan 1, 2018
e0c090f
[SPARK-22932][SQL] Refactor AnalysisContext
gatorsmile Jan 2, 2018
a6fc300
[SPARK-22897][CORE] Expose stageAttemptId in TaskContext
advancedxy Jan 2, 2018
247a089
[SPARK-22938] Assert that SQLConf.get is accessed only on the driver.
juliuszsompolski Jan 3, 2018
1a87a16
[SPARK-22934][SQL] Make optional clauses order insensitive for CREATE…
gatorsmile Jan 3, 2018
a66fe36
[SPARK-20236][SQL] dynamic partition overwrite
cloud-fan Jan 3, 2018
9a2b65a
[SPARK-22896] Improvement in String interpolation
chetkhatri Jan 3, 2018
b297029
[SPARK-20960][SQL] make ColumnVector public
cloud-fan Jan 3, 2018
7d045c5
[SPARK-22944][SQL] improve FoldablePropagation
cloud-fan Jan 4, 2018
df95a90
[SPARK-22933][SPARKR] R Structured Streaming API for withWatermark, t…
felixcheung Jan 4, 2018
9fa703e
[SPARK-22950][SQL] Handle ChildFirstURLClassLoader's parent
yaooqinn Jan 4, 2018
d5861ab
[SPARK-22945][SQL] add java UDF APIs in the functions object
cloud-fan Jan 4, 2018
5aadbc9
[SPARK-22939][PYSPARK] Support Spark UDF in registerFunction
gatorsmile Jan 4, 2018
6f68316
[SPARK-22771][SQL] Add a missing return statement in Concat.checkInpu…
maropu Jan 4, 2018
93f92c0
[SPARK-21475][CORE][2ND ATTEMPT] Change to use NIO's Files API for ex…
jerryshao Jan 4, 2018
d2cddc8
[SPARK-22850][CORE] Ensure queued events are delivered to all event q…
Jan 4, 2018
95f9659
[SPARK-22948][K8S] Move SparkPodInitContainer to correct package.
Jan 4, 2018
e288fc8
[SPARK-22953][K8S] Avoids adding duplicated secret volumes when init-…
liyinan926 Jan 4, 2018
0428368
[SPARK-22960][K8S] Make build-push-docker-images.sh more dev-friendly.
Jan 5, 2018
df7fc3e
[SPARK-22957] ApproxQuantile breaks if the number of rows exceeds MaxInt
juliuszsompolski Jan 5, 2018
52fc5c1
[SPARK-22825][SQL] Fix incorrect results of Casting Array to String
maropu Jan 5, 2018
cf0aa65
[SPARK-22949][ML] Apply CrossValidator approach to Driver/Distributed…
MrBago Jan 5, 2018
6cff7d1
[SPARK-22757][K8S] Enable spark.jars and spark.files in KUBERNETES mode
liyinan926 Jan 5, 2018
51c33bd
[SPARK-22961][REGRESSION] Constant columns should generate QueryPlanC…
adrian-ionescu Jan 5, 2018
c0b7424
[SPARK-22940][SQL] HiveExternalCatalogVersionsSuite should succeed on…
bersprockets Jan 5, 2018
930b90a
[SPARK-13030][ML] Follow-up cleanups for OneHotEncoderEstimator
jkbradley Jan 5, 2018
ea95683
[SPARK-22914][DEPLOY] Register history.ui.port
gerashegalov Jan 6, 2018
e8af7e8
[SPARK-22937][SQL] SQL elt output binary for binary inputs
maropu Jan 6, 2018
bf65cd3
[SPARK-22960][K8S] Revert use of ARG base_image in images
liyinan926 Jan 6, 2018
f2dd8b9
[SPARK-22930][PYTHON][SQL] Improve the description of Vectorized UDFs…
icexelloss Jan 6, 2018
be9a804
[SPARK-22793][SQL] Memory leak in Spark Thrift Server
Jan 6, 2018
7b78041
[SPARK-21786][SQL] When acquiring 'compressionCodecClassName' in 'Par…
fjh100456 Jan 6, 2018
993f215
[SPARK-22901][PYTHON][FOLLOWUP] Adds the doc for asNondeterministic f…
HyukjinKwon Jan 6, 2018
9a7048b
[HOTFIX] Fix style checking failure
gatorsmile Jan 6, 2018
18e9414
[SPARK-22973][SQL] Fix incorrect results of Casting Map to String
maropu Jan 7, 2018
71d65a3
[SPARK-22985] Fix argument escaping bug in from_utc_timestamp / to_ut…
JoshRosen Jan 8, 2018
3e40eb3
[SPARK-22566][PYTHON] Better error message for `_merge_type` in Panda…
gberger-palantir Jan 8, 2018
8fdeb4b
[SPARK-22979][PYTHON][SQL] Avoid per-record type dispatch in Python d…
HyukjinKwon Jan 8, 2018
2c73d2a
[SPARK-22983] Don't push filters beneath aggregates with empty groupi…
JoshRosen Jan 8, 2018
eb45b52
[SPARK-21865][SQL] simplify the distribution semantic of Spark SQL
cloud-fan Jan 8, 2018
40b983c
[SPARK-22952][CORE] Deprecate stageAttemptId in favour of stageAttemp…
advancedxy Jan 8, 2018
eed82a0
[SPARK-22992][K8S] Remove assumption of the DNS domain
foxish Jan 8, 2018
4f7e758
[SPARK-22912] v2 data source support in MicroBatchExecution
jose-torres Jan 8, 2018
68ce792
[SPARK-22972] Couldn't find corresponding Hive SerDe for data source …
xubo245 Jan 9, 2018
849043c
[SPARK-22990][CORE] Fix method isFairScheduler in JobsTab and StagesTab
gengliangwang Jan 9, 2018
f20131d
[SPARK-22984] Fix incorrect bitmap copying and offset adjustment in G…
JoshRosen Jan 9, 2018
8486ad4
[SPARK-21292][DOCS] refreshtable example
felixcheung Jan 9, 2018
02214b0
[SPARK-21293][SPARKR][DOCS] structured streaming doc update
felixcheung Jan 9, 2018
0959aa5
[SPARK-23000] Fix Flaky test suite DataSourceWithHiveMetastoreCatalog…
gatorsmile Jan 9, 2018
6a4206f
[SPARK-22998][K8S] Set missing value for SPARK_MOUNTED_CLASSPATH in t…
liyinan926 Jan 9, 2018
f44ba91
[SPARK-16060][SQL] Support Vectorized ORC Reader
dongjoon-hyun Jan 9, 2018
2250cb7
[SPARK-22981][SQL] Fix incorrect results of Casting Struct to String
maropu Jan 9, 2018
96ba217
[SPARK-23005][CORE] Improve RDD.take on small number of partitions
gengliangwang Jan 10, 2018
6f169ca
[MINOR] fix a typo in BroadcastJoinSuite
cloud-fan Jan 10, 2018
7bcc266
[SPARK-23018][PYTHON] Fix createDataFrame from Pandas timestamp serie…
BryanCutler Jan 10, 2018
e599837
[SPARK-23009][PYTHON] Fix for non-str col names to createDataFrame fr…
BryanCutler Jan 10, 2018
edf0a48
[SPARK-22982] Remove unsafe asynchronous close() call from FileDownlo…
JoshRosen Jan 10, 2018
eaac60a
[SPARK-16060][SQL][FOLLOW-UP] add a wrapper solution for vectorized o…
cloud-fan Jan 10, 2018
70bcc9d
[SPARK-22993][ML] Clarify HasCheckpointInterval param doc
sethah Jan 10, 2018
f340b6b
[SPARK-22997] Add additional defenses against use of freed MemoryBlocks
JoshRosen Jan 10, 2018
344e3aa
[SPARK-23019][CORE] Wait until SparkContext.stop() finished in SparkL…
gengliangwang Jan 10, 2018
9b33dfc
[SPARK-22951][SQL] fix aggregation after dropDuplicates on empty data…
Jan 10, 2018
a6647ff
[SPARK-22587] Spark job fails if fs.defaultFS and application jar are…
Jan 11, 2018
87c98de
[SPARK-23001][SQL] Fix NullPointerException when DESC a database with…
gatorsmile Jan 11, 2018
1c70da3
[SPARK-20657][CORE] Speed up rendering of the stages page.
Jan 11, 2018
0552c36
[SPARK-22967][TESTS] Fix VersionSuite's unit tests by change Windows …
Ngone51 Jan 11, 2018
76892bc
[SPARK-23000][TEST-HADOOP2.6] Fix Flaky test suite DataSourceWithHive…
gatorsmile Jan 11, 2018
b46e58b
[SPARK-19732][FOLLOW-UP] Document behavior changes made in na.fill an…
gatorsmile Jan 11, 2018
6d230dc
Update PageRank.scala
ddna1021 Jan 11, 2018
0b2eefb
[SPARK-22994][K8S] Use a single image for all Spark containers.
Jan 11, 2018
6f7aaed
[SPARK-22908] Add kafka source and sink for continuous processing.
jose-torres Jan 11, 2018
186bf8f
[SPARK-23046][ML][SPARKR] Have RFormula include VectorSizeHint in pip…
MrBago Jan 11, 2018
b5042d7
[SPARK-23008][ML] OnehotEncoderEstimator python API
WeichenXu123 Jan 12, 2018
cbe7c6f
[SPARK-22986][CORE] Use a cache to avoid instantiating multiple insta…
ho3rexqj Jan 12, 2018
a7d98d5
[SPARK-23008][ML][FOLLOW-UP] mark OneHotEncoder python API deprecated
WeichenXu123 Jan 12, 2018
5050868
[SPARK-23025][SQL] Support Null type in scala reflection
mgaido91 Jan 12, 2018
f5300fb
Update rdd-programming-guide.md
Jan 12, 2018
651f761
[SPARK-23028] Bump master branch version to 2.4.0-SNAPSHOT
gatorsmile Jan 12, 2018
7bd14cf
[MINOR][BUILD] Fix Java linter errors
dongjoon-hyun Jan 12, 2018
03e1901
Merge branch 'master' into rk/merge-upstream
Jan 12, 2018
5427739
[SPARK-22975][SS] MetricsReporter should not throw exception when the…
mgaido91 Jan 12, 2018
66dd9cb
Resolve conflicts keeping our k8s code
Jan 12, 2018
55dbfbc
Revert "[SPARK-22908] Add kafka source and sink for continuous proces…
sameeragarwal Jan 12, 2018
cd9f49a
[SPARK-22980][PYTHON][SQL] Clarify the length of each series is of ea…
HyukjinKwon Jan 13, 2018
628a1ca
[SPARK-23043][BUILD] Upgrade json4s to 3.5.3
shimamoto Jan 13, 2018
fc6fe8a
[SPARK-22870][CORE] Dynamic allocation should allow 0 idle time
wangyum Jan 13, 2018
bd4a21b
[SPARK-23036][SQL][TEST] Add withGlobalTempView for testing
xubo245 Jan 13, 2018
ba891ec
[SPARK-22790][SQL] add a configurable factor to describe HadoopFsRela…
CodingCat Jan 13, 2018
0066d6f
[SPARK-21213][SQL][FOLLOWUP] Use compatible types for comparisons in …
maropu Jan 13, 2018
afae8f2
[SPARK-22959][PYTHON] Configuration to select the modules for daemon …
HyukjinKwon Jan 14, 2018
c3548d1
[SPARK-23063][K8S] K8s changes for publishing scripts (and a couple o…
foxish Jan 14, 2018
7a3d0aa
[SPARK-23038][TEST] Update docker/spark-test (JDK/OS)
dongjoon-hyun Jan 14, 2018
66738d2
[SPARK-23069][DOCS][SPARKR] fix R doc for describe missing text
felixcheung Jan 14, 2018
990f05c
[SPARK-23021][SQL] AnalysisBarrier should override innerChildren to p…
maropu Jan 14, 2018
60eeecd
[SPARK-23051][CORE] Fix for broken job description in Spark UI
smurakozi Jan 14, 2018
42a1a15
[SPARK-22999][SQL] show databases like command' can remove the like k…
Jan 14, 2018
b98ffa4
[SPARK-23054][SQL] Fix incorrect results of casting UserDefinedType t…
maropu Jan 15, 2018
9a96bfc
[SPARK-23049][SQL] `spark.sql.files.ignoreCorruptFiles` should work f…
dongjoon-hyun Jan 15, 2018
2b9fdfe
Consistent versions
Jan 15, 2018
1f2962c
one bouncycastle
Jan 15, 2018
91a6f1f
versions
Jan 15, 2018
b598083
[SPARK-23023][SQL] Cast field data to strings in showString
maropu Jan 15, 2018
a38c887
[SPARK-19550][BUILD][FOLLOW-UP] Remove MaxPermSize for sql module
wangyum Jan 15, 2018
bd08a9e
[SPARK-23070] Bump previousSparkVersion in MimaBuild.scala to be 2.2.0
gatorsmile Jan 15, 2018
6c81fe2
[SPARK-23035][SQL] Fix improper information of TempTableAlreadyExists…
xubo245 Jan 15, 2018
1ef99cf
Merge branch 'master' into rk/merge-upstream
Jan 15, 2018
b8380fd
Fixes
Jan 15, 2018
277425d
need to write as int96
Jan 23, 2018
d37f4ee
we're smaller?
Jan 23, 2018
fab5e5a
no mesos and hive thrift
Jan 23, 2018
bd4056d
change test
Jan 24, 2018
2cf672d
correct assert
Jan 24, 2018
ab93a53
Update ParquetInteroperabilitySuite.scala
robert3005 Jan 25, 2018
5 changes: 3 additions & 2 deletions .travis.yml
@@ -55,8 +55,9 @@ notifications:
# 5. Run maven build before running lints.
install:
- export MAVEN_SKIP_RC=1
- build/mvn ${PHASE} ${PROFILES} ${MODULES} ${ARGS}
# 6. Run lints.
- build/mvn -T 4 -q -DskipTests -Pkubernetes -Pmesos -Pyarn -Pkinesis-asl -Phive -Phive-thriftserver install

# 6. Run lint-java.
script:
- dev/lint-java
- dev/lint-scala
6 changes: 6 additions & 0 deletions NOTICE
@@ -448,6 +448,12 @@ Copyright (C) 2011 Google Inc.
Apache Commons Pool
Copyright 1999-2009 The Apache Software Foundation

This product includes/uses Kubernetes & OpenShift 3 Java Client (https://github.com/fabric8io/kubernetes-client)
Copyright (C) 2015 Red Hat, Inc.

This product includes/uses OkHttp (https://github.com/square/okhttp)
Copyright (C) 2012 The Android Open Source Project

=========================================================================
== NOTICE file corresponding to section 4(d) of the Apache License, ==
== Version 2.0, in this case for the DataNucleus distribution. ==
3 changes: 2 additions & 1 deletion R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
Package: SparkR
Type: Package
Version: 2.3.0
Version: 2.4.0
Title: R Frontend for Apache Spark
Description: Provides an R Frontend for Apache Spark.
Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"),
@@ -59,3 +59,4 @@ Collate:
'window.R'
RoxygenNote: 6.0.1
VignetteBuilder: knitr
NeedsCompilation: no
9 changes: 8 additions & 1 deletion R/pkg/NAMESPACE
@@ -76,7 +76,9 @@ exportMethods("glm",
export("setJobGroup",
"clearJobGroup",
"cancelJobGroup",
"setJobDescription")
"setJobDescription",
"setLocalProperty",
"getLocalProperty")

# Export Utility methods
export("setLogLevel")
@@ -133,6 +135,7 @@ exportMethods("arrange",
"isStreaming",
"join",
"limit",
"localCheckpoint",
"merge",
"mutate",
"na.omit",
@@ -176,6 +179,7 @@ exportMethods("arrange",
"with",
"withColumn",
"withColumnRenamed",
"withWatermark",
"write.df",
"write.jdbc",
"write.json",
@@ -225,11 +229,14 @@ exportMethods("%<=>%",
"crc32",
"create_array",
"create_map",
"current_date",
"current_timestamp",
"hash",
"cume_dist",
"date_add",
"date_format",
"date_sub",
"date_trunc",
"datediff",
"dayofmonth",
"dayofweek",
141 changes: 131 additions & 10 deletions R/pkg/R/DataFrame.R
@@ -2297,6 +2297,7 @@ setClassUnion("characterOrColumn", c("character", "Column"))
#' @param ... additional sorting fields
#' @param decreasing a logical argument indicating sorting order for columns when
#' a character vector is specified for col
#' @param withinPartitions a logical argument indicating whether to sort only within each partition
#' @return A SparkDataFrame where all elements are sorted.
#' @family SparkDataFrame functions
#' @aliases arrange,SparkDataFrame,Column-method
@@ -2312,16 +2313,21 @@ setClassUnion("characterOrColumn", c("character", "Column"))
#' arrange(df, asc(df$col1), desc(abs(df$col2)))
#' arrange(df, "col1", decreasing = TRUE)
#' arrange(df, "col1", "col2", decreasing = c(TRUE, FALSE))
#' arrange(df, "col1", "col2", withinPartitions = TRUE)
#' }
#' @note arrange(SparkDataFrame, Column) since 1.4.0
setMethod("arrange",
signature(x = "SparkDataFrame", col = "Column"),
function(x, col, ...) {
function(x, col, ..., withinPartitions = FALSE) {
jcols <- lapply(list(col, ...), function(c) {
c@jc
})

sdf <- callJMethod(x@sdf, "sort", jcols)
if (withinPartitions) {
sdf <- callJMethod(x@sdf, "sortWithinPartitions", jcols)
} else {
sdf <- callJMethod(x@sdf, "sort", jcols)
}
dataFrame(sdf)
})
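
A minimal sketch of the new withinPartitions flag from an interactive SparkR session (the input data and partition count are illustrative, not part of this PR):

library(SparkR)
sparkR.session()

# Illustrative input: a small SparkDataFrame spread across two partitions.
df <- repartition(createDataFrame(data.frame(col1 = c(3, 1, 4, 2))), 2L)

# Default: a total ordering across all partitions (incurs a shuffle).
globalSorted <- arrange(df, df$col1)

# withinPartitions = TRUE: each partition is sorted independently, which
# dispatches to Dataset.sortWithinPartitions on the JVM side (no shuffle).
locallySorted <- arrange(df, df$col1, withinPartitions = TRUE)
head(locallySorted)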

@@ -2332,7 +2338,7 @@ setMethod("arrange",
#' @note arrange(SparkDataFrame, character) since 1.4.0
setMethod("arrange",
signature(x = "SparkDataFrame", col = "character"),
function(x, col, ..., decreasing = FALSE) {
function(x, col, ..., decreasing = FALSE, withinPartitions = FALSE) {

# all sorting columns
by <- list(col, ...)
Expand All @@ -2356,7 +2362,7 @@ setMethod("arrange",
}
})

do.call("arrange", c(x, jcols))
do.call("arrange", c(x, jcols, withinPartitions = withinPartitions))
})

#' @rdname arrange
@@ -3048,10 +3054,10 @@ setMethod("describe",
#' \item stddev
#' \item min
#' \item max
#' \item arbitrary approximate percentiles specified as a percentage (eg, "75%")
#' \item arbitrary approximate percentiles specified as a percentage (eg, "75\%")
#' }
#' If no statistics are given, this function computes count, mean, stddev, min,
#' approximate quartiles (percentiles at 25%, 50%, and 75%), and max.
#' approximate quartiles (percentiles at 25\%, 50\%, and 75\%), and max.
#' This function is meant for exploratory data analysis, as we make no guarantee about the
#' backward compatibility of the schema of the resulting Dataset. If you want to
#' programmatically compute summary statistics, use the \code{agg} function instead.
@@ -3655,7 +3661,8 @@ setMethod("getNumPartitions",
#' isStreaming
#'
#' Returns TRUE if this SparkDataFrame contains one or more sources that continuously return data
#' as it arrives.
#' as it arrives. A dataset that reads data from a streaming source must be executed as a
#' \code{StreamingQuery} using \code{write.stream}.
#'
#' @param x A SparkDataFrame
#' @return TRUE if this SparkDataFrame is from a streaming source
@@ -3701,7 +3708,17 @@ setMethod("isStreaming",
#' @param df a streaming SparkDataFrame.
#' @param source a name for external data source.
#' @param outputMode one of 'append', 'complete', 'update'.
#' @param ... additional argument(s) passed to the method.
#' @param partitionBy a name or a list of names of columns to partition the output by on the file
#' system. If specified, the output is laid out on the file system similar to Hive's
#' partitioning scheme.
#' @param trigger.processingTime a processing time interval as a string, e.g. '5 seconds',
#' '1 minute'. This is a trigger that runs a query periodically based on the processing
#' time. If the value is '0 seconds', the query runs as fast as possible; this is the
#' default. Only one trigger can be set.
#' @param trigger.once a logical, must be set to \code{TRUE}. This is a trigger that processes only
#' one batch of data in a streaming query then terminates the query. Only one trigger can be
#' set.
#' @param ... additional external data source specific named options.
#'
#' @family SparkDataFrame functions
#' @seealso \link{read.stream}
@@ -3719,7 +3736,8 @@ setMethod("isStreaming",
#' # console
#' q <- write.stream(wordCounts, "console", outputMode = "complete")
#' # text stream
#' q <- write.stream(df, "text", path = "/home/user/out", checkpointLocation = "/home/user/cp")
#' q <- write.stream(df, "text", path = "/home/user/out", checkpointLocation = "/home/user/cp",
#' partitionBy = c("year", "month"), trigger.processingTime = "30 seconds")
#' # memory stream
#' q <- write.stream(wordCounts, "memory", queryName = "outs", outputMode = "complete")
#' head(sql("SELECT * from outs"))
@@ -3731,7 +3749,8 @@ setMethod("isStreaming",
#' @note experimental
setMethod("write.stream",
signature(df = "SparkDataFrame"),
function(df, source = NULL, outputMode = NULL, ...) {
function(df, source = NULL, outputMode = NULL, partitionBy = NULL,
trigger.processingTime = NULL, trigger.once = NULL, ...) {
if (!is.null(source) && !is.character(source)) {
stop("source should be character, NULL or omitted. It is the data source specified ",
"in 'spark.sql.sources.default' configuration by default.")
@@ -3742,12 +3761,43 @@ setMethod("write.stream",
if (is.null(source)) {
source <- getDefaultSqlSource()
}
cols <- NULL
if (!is.null(partitionBy)) {
if (!all(sapply(partitionBy, function(c) { is.character(c) }))) {
stop("All partitionBy column names should be characters.")
}
cols <- as.list(partitionBy)
}
jtrigger <- NULL
if (!is.null(trigger.processingTime) && !is.na(trigger.processingTime)) {
if (!is.null(trigger.once)) {
stop("Multiple triggers not allowed.")
}
interval <- as.character(trigger.processingTime)
if (nchar(interval) == 0) {
stop("Value for trigger.processingTime must be a non-empty string.")
}
jtrigger <- handledCallJStatic("org.apache.spark.sql.streaming.Trigger",
"ProcessingTime",
interval)
} else if (!is.null(trigger.once) && !is.na(trigger.once)) {
if (!is.logical(trigger.once) || !trigger.once) {
stop("Value for trigger.once must be TRUE.")
}
jtrigger <- callJStatic("org.apache.spark.sql.streaming.Trigger", "Once")
}
options <- varargsToStrEnv(...)
write <- handledCallJMethod(df@sdf, "writeStream")
write <- callJMethod(write, "format", source)
if (!is.null(outputMode)) {
write <- callJMethod(write, "outputMode", outputMode)
}
if (!is.null(cols)) {
write <- callJMethod(write, "partitionBy", cols)
}
if (!is.null(jtrigger)) {
write <- callJMethod(write, "trigger", jtrigger)
}
write <- callJMethod(write, "options", options)
ssq <- handledCallJMethod(write, "start")
streamingQuery(ssq)
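
A hedged sketch of how a caller might combine the new partitionBy and trigger arguments validated above; the paths, schema, and column names are hypothetical:

library(SparkR)
sparkR.session()

input <- read.stream("json", path = "/tmp/stream-in",
                     schema = "year INT, month INT, value DOUBLE")

# trigger.processingTime and trigger.once are mutually exclusive; setting
# both hits the "Multiple triggers not allowed." check above.
q <- write.stream(input, "parquet",
                  path = "/tmp/stream-out",
                  checkpointLocation = "/tmp/stream-cp",
                  partitionBy = c("year", "month"),
                  trigger.processingTime = "30 seconds")

# ...and once done:
stopQuery(q)
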
@@ -3782,6 +3832,33 @@ setMethod("checkpoint",
dataFrame(df)
})

#' localCheckpoint
#'
#' Returns a locally checkpointed version of this SparkDataFrame. Checkpointing can be used to
#' truncate the logical plan, which is especially useful in iterative algorithms where the plan
#' may grow exponentially. Local checkpoints are stored in the executors using the caching
#' subsystem and therefore they are not reliable.
#'
#' @param x A SparkDataFrame
#' @param eager whether to locally checkpoint this SparkDataFrame immediately
#' @return a new locally checkpointed SparkDataFrame
#' @family SparkDataFrame functions
#' @aliases localCheckpoint,SparkDataFrame-method
#' @rdname localCheckpoint
#' @name localCheckpoint
#' @export
#' @examples
#'\dontrun{
#' df <- localCheckpoint(df)
#' }
#' @note localCheckpoint since 2.3.0
setMethod("localCheckpoint",
signature(x = "SparkDataFrame"),
function(x, eager = TRUE) {
df <- callJMethod(x@sdf, "localCheckpoint", as.logical(eager))
dataFrame(df)
})
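
A short sketch of the iterative use case the documentation alludes to (the loop is illustrative): each transformation grows the logical plan, and the local checkpoint truncates it so planning cost stays bounded.

df <- createDataFrame(data.frame(x = 1:100))
for (i in 1:10) {
  df <- withColumn(df, "x", df$x + 1)
  # eager = TRUE materializes the checkpoint immediately via the caching
  # subsystem; fast, but not resilient to executor loss.
  df <- localCheckpoint(df, eager = TRUE)
}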

#' cube
#'
#' Create a multi-dimensional cube for the SparkDataFrame using the specified columns.
@@ -3934,3 +4011,47 @@ setMethod("broadcast",
sdf <- callJStatic("org.apache.spark.sql.functions", "broadcast", x@sdf)
dataFrame(sdf)
})

#' withWatermark
#'
#' Defines an event time watermark for this streaming SparkDataFrame. A watermark tracks a point in
#' time before which we assume no more late data is going to arrive.
#'
#' Spark will use this watermark for several purposes:
#' \itemize{
#' \item To know when a given time window aggregation can be finalized and thus can be emitted
#' when using output modes that do not allow updates.
#' \item To minimize the amount of state that we need to keep for on-going aggregations.
#' }
#' The current watermark is computed by looking at the \code{MAX(eventTime)} seen across
#' all of the partitions in the query minus a user specified \code{delayThreshold}. Due to the cost
#' of coordinating this value across partitions, the actual watermark used is only guaranteed
#' to be at least \code{delayThreshold} behind the actual event time. In some cases we may still
#' process records that arrive more than \code{delayThreshold} late.
#'
#' @param x a streaming SparkDataFrame
#' @param eventTime a string specifying the name of the Column that contains the event time of the
#' row.
#' @param delayThreshold a string specifying the minimum delay to wait for data to arrive late,
#' relative to the latest record that has been processed in the form of an
#' interval (e.g. "1 minute" or "5 hours"). NOTE: This should not be negative.
#' @return a SparkDataFrame.
#' @aliases withWatermark,SparkDataFrame,character,character-method
#' @family SparkDataFrame functions
#' @rdname withWatermark
#' @name withWatermark
#' @export
#' @examples
#' \dontrun{
#' sparkR.session()
#' schema <- structType(structField("time", "timestamp"), structField("value", "double"))
#' df <- read.stream("json", path = jsonDir, schema = schema, maxFilesPerTrigger = 1)
#' df <- withWatermark(df, "time", "10 minutes")
#' }
#' @note withWatermark since 2.3.0
setMethod("withWatermark",
signature(x = "SparkDataFrame", eventTime = "character", delayThreshold = "character"),
function(x, eventTime, delayThreshold) {
sdf <- callJMethod(x@sdf, "withWatermark", eventTime, delayThreshold)
dataFrame(sdf)
})
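
Extending the example above into a hedged sketch of a windowed aggregation (the path is hypothetical): with a 10-minute watermark, a 5-minute window can be finalized, and emitted in "append" output mode, once the watermark passes the window's end.

schema <- structType(structField("time", "timestamp"), structField("value", "double"))
events <- read.stream("json", path = "/tmp/events", schema = schema)
events <- withWatermark(events, "time", "10 minutes")

# Count events per 5-minute event-time window; state for a window can be
# dropped once the watermark moves past the end of the window.
counts <- count(groupBy(events, window(events$time, "5 minutes")))
q <- write.stream(counts, "console", outputMode = "append")
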
4 changes: 3 additions & 1 deletion R/pkg/R/SQLContext.R
@@ -727,7 +727,9 @@ read.jdbc <- function(url, tableName,
#' @param schema The data schema defined in structType or a DDL-formatted string, this is
#' required for file-based streaming data source
#' @param ... additional external data source specific named options, for instance \code{path} for
#' file-based streaming data source
#' file-based streaming data source. \code{timeZone} to indicate a timezone to be used to
#' parse timestamps in the JSON/CSV data sources or partition values; If it isn't set, it
#' uses the default value, session local timezone.
#' @return SparkDataFrame
#' @rdname read.stream
#' @name read.stream
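
A hedged sketch of the timeZone option mentioned above (the path and schema are hypothetical); it is passed through the ... options and controls how timestamps in JSON/CSV sources are parsed:

df <- read.stream("csv", path = "/tmp/csv-in",
                  schema = "ts TIMESTAMP, value DOUBLE",
                  timeZone = "UTC", header = "true")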