
Commit 7019a8a

Merge branch 'master' of https://github.com/apache/spark into feature/SPARK-6568-2

2 parents: c7ba6a7 + b5c51c8

123 files changed (+12822, -239 lines)


.gitignore

Lines changed: 2 additions & 0 deletions
@@ -63,6 +63,8 @@ ec2/lib/
 rat-results.txt
 scalastyle.txt
 scalastyle-output.xml
+R-unit-tests.log
+R/unit-tests.out

 # For Hive
 metastore_db/

.rat-excludes

Lines changed: 2 additions & 0 deletions
@@ -67,3 +67,5 @@ logs
 .*scalastyle-output.xml
 .*dependency-reduced-pom.xml
 known_translations
+DESCRIPTION
+NAMESPACE

R/.gitignore

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
*.o
*.so
*.Rd
lib
pkg/man
pkg/html

R/DOCUMENTATION.md

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
# SparkR Documentation

SparkR documentation is generated from in-source comments annotated with
`roxygen2`. After making changes to the documentation, you can regenerate the
man pages by running the following from an R console in the SparkR home directory:

    library(devtools)
    devtools::document(pkg="./pkg", roclets=c("rd"))

You can verify that your changes are good by running

    R CMD check pkg/
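If `devtools` is not already installed, a one-time setup along these lines should make the commands above work; the choice of CRAN mirror is an assumption, any mirror will do:
```
# One-time setup (sketch): devtools drives document(), which uses roxygen2
Rscript -e 'install.packages(c("devtools", "roxygen2"), repos="http://cran.us.r-project.org")'
```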

R/README.md

Lines changed: 67 additions & 0 deletions
@@ -0,0 +1,67 @@
# R on Spark

SparkR is an R package that provides a light-weight frontend to use Spark from R.

### SparkR development

#### Build Spark

Build Spark with [Maven](http://spark.apache.org/docs/latest/building-spark.html#building-with-buildmvn) and include the `-Psparkr` profile to build the R package. For example, to use the default Hadoop versions you can run
```
build/mvn -DskipTests -Psparkr package
```

#### Running sparkR

You can start using SparkR by launching the SparkR shell with

    ./bin/sparkR

The `sparkR` script automatically creates a SparkContext, with Spark running in local mode by default. To specify the Spark master of a cluster for the automatically created SparkContext, you can run

    ./bin/sparkR --master "local[2]"

To set other options, such as driver memory or executor memory, you can pass the [spark-submit](http://spark.apache.org/docs/latest/submitting-applications.html) arguments to `./bin/sparkR`.
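For example, a minimal sketch of forwarding such options (the flag names are standard spark-submit flags; the values are illustrative):
```
# Illustrative values; any spark-submit option can be passed this way
./bin/sparkR --master "local[2]" --driver-memory 2g --executor-memory 2g
```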
#### Using SparkR from RStudio

If you wish to use SparkR from RStudio or other R frontends, you will need to set some environment variables that point SparkR to your Spark installation. For example
```
# Set this to where Spark is installed
Sys.setenv(SPARK_HOME="/Users/shivaram/spark")
# This line loads SparkR from the installed directory
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
library(SparkR)
sc <- sparkR.init(master="local")
```

#### Making changes to SparkR

The [instructions](https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark) for making contributions to Spark also apply to SparkR.
If you only make changes to R files (i.e. no Scala changes), you can simply re-install the R package using `R/install-dev.sh` and test your changes.
Once you have made your changes, please include unit tests for them and run the existing unit tests using the `run-tests.sh` script, as described below.
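Putting these together, the edit-test loop looks roughly like this when run from the Spark source root (a sketch; both scripts are added in this commit):
```
# Re-install the SparkR package after editing R sources
./R/install-dev.sh
# Run the SparkR unit tests (requires the testthat package; see below)
./R/run-tests.sh
```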
#### Generating documentation

The SparkR documentation (Rd files and HTML files) is not part of the source repository. To generate it, you can run the script `R/create-docs.sh`. This script uses `devtools` and `knitr` to generate the docs, so these packages need to be installed on the machine before using the script; a sketch of the full doc build follows.
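A minimal sketch of a doc build from the Spark source root, assuming packages are fetched from `http://cran.us.r-project.org` (any CRAN mirror works):
```
# One-time setup: install the packages the doc build depends on
Rscript -e 'install.packages(c("devtools", "knitr"), repos="http://cran.us.r-project.org")'
# Generate Rd files and HTML docs under R/pkg/html
./R/create-docs.sh
```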
### Examples, Unit tests

SparkR comes with several sample programs in the `examples/src/main/r` directory.
To run one of them, use `./bin/sparkR <filename> <args>`. For example:

    ./bin/sparkR examples/src/main/r/pi.R local[2]

You can also run the unit tests for SparkR (you need to install the [testthat](http://cran.r-project.org/web/packages/testthat/index.html) package first):

    R -e 'install.packages("testthat", repos="http://cran.us.r-project.org")'
    ./R/run-tests.sh

### Running on YARN

The `./bin/spark-submit` and `./bin/sparkR` scripts can also be used to submit jobs to YARN clusters. You will need to set the YARN configuration directory before doing so. For example, on CDH you can run
```
export YARN_CONF_DIR=/etc/hadoop/conf
./bin/spark-submit --master yarn examples/src/main/r/pi.R 4
```

R/WINDOWS.md

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
## Building SparkR on Windows

To build SparkR on Windows, the following steps are required:

1. Install R (>= 3.1) and [Rtools](http://cran.r-project.org/bin/windows/Rtools/). Make sure to include Rtools and R in `PATH`.
2. Install [JDK7](http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html) and set `JAVA_HOME` in the system environment variables.
3. Download and install [Maven](http://maven.apache.org/download.html). Also include Maven's `bin` directory in `PATH`.
4. Set `MAVEN_OPTS` as described in [Building Spark](http://spark.apache.org/docs/latest/building-spark.html).
5. Open a command shell (`cmd`) in the Spark directory and run `mvn -DskipTests -Psparkr package`, as sketched below.
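A sketch of steps 4 and 5 in a single `cmd` session, in the same batch dialect as the `install-dev.bat` script below; the `MAVEN_OPTS` value shown is the one the Building Spark guide recommended for this era of Spark, so treat it as an assumption and adjust it to your machine:
```
rem Step 4 (sketch): give Maven enough memory; values are illustrative
set MAVEN_OPTS=-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m
rem Step 5: build Spark together with the SparkR package
mvn -DskipTests -Psparkr package
```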

R/create-docs.sh

Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
#!/bin/bash

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# Script to create API docs for SparkR
# This requires `devtools` and `knitr` to be installed on the machine.

# After running this script the html docs can be found in
# $SPARK_HOME/R/pkg/html

# Figure out where the script is
export FWDIR="$(cd "`dirname "$0"`"; pwd)"
pushd $FWDIR

# Generate Rd file
Rscript -e 'library(devtools); devtools::document(pkg="./pkg", roclets=c("rd"))'

# Install the package
./install-dev.sh

# Now create HTML files

# knit_rd puts html in current working directory
mkdir -p pkg/html
pushd pkg/html

Rscript -e 'library(SparkR, lib.loc="../../lib"); library(knitr); knit_rd("SparkR")'

popd

popd

R/install-dev.bat

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
@echo off

rem
rem Licensed to the Apache Software Foundation (ASF) under one or more
rem contributor license agreements. See the NOTICE file distributed with
rem this work for additional information regarding copyright ownership.
rem The ASF licenses this file to You under the Apache License, Version 2.0
rem (the "License"); you may not use this file except in compliance with
rem the License. You may obtain a copy of the License at
rem
rem    http://www.apache.org/licenses/LICENSE-2.0
rem
rem Unless required by applicable law or agreed to in writing, software
rem distributed under the License is distributed on an "AS IS" BASIS,
rem WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
rem See the License for the specific language governing permissions and
rem limitations under the License.
rem

rem Install development version of SparkR
rem

set SPARK_HOME=%~dp0..

MKDIR %SPARK_HOME%\R\lib

R.exe CMD INSTALL --library="%SPARK_HOME%\R\lib" %SPARK_HOME%\R\pkg\

R/install-dev.sh

Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
#!/bin/bash

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# This script packages the SparkR source files (R and C files) and
# creates a package that can be loaded in R. The package is by default installed to
# $FWDIR/lib and the package can be loaded by using the following command in R:
#
#   library(SparkR, lib.loc="$FWDIR/lib")
#
# NOTE(shivaram): Right now we use $SPARK_HOME/R/lib to be the installation directory
# to load the SparkR package on the worker nodes.


FWDIR="$(cd `dirname $0`; pwd)"
LIB_DIR="$FWDIR/lib"

mkdir -p $LIB_DIR

# Install the SparkR package
R CMD INSTALL --library=$LIB_DIR $FWDIR/pkg/
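Once the script has run, the development build can be loaded from an R session; this usage sketch assumes R is started from the Spark source root, so the `lib.loc` path is relative to that directory:
```
# Load the freshly installed SparkR package from R/lib
R -e 'library(SparkR, lib.loc="R/lib")'
```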

R/log4j.properties

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# Set everything to be logged to the file R-unit-tests.log
log4j.rootCategory=INFO, file
log4j.appender.file=org.apache.log4j.FileAppender
log4j.appender.file.append=true
log4j.appender.file.file=R-unit-tests.log
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss.SSS} %t %p %c{1}: %m%n

# Ignore messages below warning level from Jetty, because it's a bit verbose
log4j.logger.org.eclipse.jetty=WARN
org.eclipse.jetty.LEVEL=WARN
