Commit 9262a72 (1 parent: b32f21a): update

15,178 files changed: +1770735 −14 lines
_site/CONTRIBUTING.md

Lines changed: 16 additions & 0 deletions
## Contributing to Spark

*Before opening a pull request*, review the
[Contributing to Spark wiki](https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark).
It lists steps that are required before creating a PR. In particular, consider:

- Is the change important and ready enough to ask the community to spend time reviewing?
- Have you searched for existing, related JIRAs and pull requests?
- Is this a new feature that can stand alone as a package on http://spark-packages.org ?
- Is the change being proposed clearly explained and motivated?

When you contribute code, you affirm that the contribution is your original work and that you
license the work to the project under the project's open source license. Whether or not you
state this explicitly, by submitting any copyrighted material via pull request, email, or
other means you agree to license the material under the project's open source license and
warrant that you have the legal authority to do so.

_site/LICENSE

Lines changed: 906 additions & 0 deletions
Large diffs are not rendered by default.

_site/NOTICE

Lines changed: 574 additions & 0 deletions
Large diffs are not rendered by default.

_site/R/DOCUMENTATION.md

Lines changed: 12 additions & 0 deletions
# SparkR Documentation

SparkR documentation is generated from in-source comments annotated using
`roxygen2`. After making changes to the documentation, to generate man pages,
you can run the following from an R console in the SparkR home directory:

    library(devtools)
    devtools::document(pkg="./pkg", roclets=c("rd"))

You can verify that your changes are good by running

    R CMD check pkg/
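The in-source comments follow the roxygen2 annotation style that `devtools::document()` consumes. A hypothetical sketch (the function `add_numbers` is illustrative, not part of SparkR):

```r
#' Add two numbers
#'
#' A toy function showing the roxygen2 style: the tags below
#' (@param, @return, @examples) are turned into an Rd man page
#' by devtools::document().
#'
#' @param x A numeric value
#' @param y A numeric value
#' @return The sum of \code{x} and \code{y}
#' @examples
#' add_numbers(1, 2)
add_numbers <- function(x, y) {
  x + y
}
```

Running `devtools::document()` would emit an `add_numbers.Rd` file under `pkg/man/` for a comment block like this.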

_site/R/README.md

Lines changed: 67 additions & 0 deletions
# R on Spark

SparkR is an R package that provides a lightweight frontend to use Spark from R.

### SparkR development

#### Build Spark

Build Spark with [Maven](http://spark.apache.org/docs/latest/building-spark.html#building-with-buildmvn) and include the `-Psparkr` profile to build the R package. For example, to use the default Hadoop versions you can run

```
build/mvn -DskipTests -Psparkr package
```

#### Running sparkR

You can start using SparkR by launching the SparkR shell with

    ./bin/sparkR

The `sparkR` script automatically creates a SparkContext with Spark in
local mode by default. To specify the Spark master of a cluster for the automatically created
SparkContext, you can run

    ./bin/sparkR --master "local[2]"

To set other options like driver memory, executor memory etc., you can pass in the [spark-submit](http://spark.apache.org/docs/latest/submitting-applications.html) arguments to `./bin/sparkR`.
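For instance, raising the driver memory works the same way as with `spark-submit`; a sketch (the `2g` value is illustrative):

```
./bin/sparkR --master "local[2]" --driver-memory 2g
```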

#### Using SparkR from RStudio

If you wish to use SparkR from RStudio or other R frontends, you will need to set some environment variables that point SparkR to your Spark installation. For example:

```
# Set this to where Spark is installed
Sys.setenv(SPARK_HOME="/Users/shivaram/spark")
# This line loads SparkR from the installed directory
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
library(SparkR)
sc <- sparkR.init(master="local")
```

#### Making changes to SparkR

The [instructions](https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark) for making contributions to Spark also apply to SparkR.
If you only make R file changes (i.e. no Scala changes), you can simply re-install the R package using `R/install-dev.sh` and test your changes.
Once you have made your changes, please include unit tests for them and run the existing unit tests using the `run-tests.sh` script as described below.
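The R-only edit/test loop can be sketched as shell commands (assuming you are in the root directory of an already-built Spark checkout):

```shell
# Re-install the SparkR package after editing R files
R/install-dev.sh

# Run the existing SparkR unit tests
R/run-tests.sh
```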

#### Generating documentation

The SparkR documentation (Rd files and HTML files) is not part of the source repository. To generate it, you can run the script `R/create-docs.sh`. This script uses `devtools` and `knitr` to generate the docs, and these packages need to be installed on the machine before using the script.

### Examples, Unit tests

SparkR comes with several sample programs in the `examples/src/main/r` directory.
To run one of them, use `./bin/sparkR <filename> <args>`. For example:

    ./bin/sparkR examples/src/main/r/dataframe.R

You can also run the unit tests for SparkR (you need to install the [testthat](http://cran.r-project.org/web/packages/testthat/index.html) package first) by running:

    R -e 'install.packages("testthat", repos="http://cran.us.r-project.org")'
    ./R/run-tests.sh

### Running on YARN

The `./bin/spark-submit` and `./bin/sparkR` scripts can also be used to submit jobs to YARN clusters. You will need to set the YARN conf dir before doing so. For example, on CDH you can run

```
export YARN_CONF_DIR=/etc/hadoop/conf
./bin/spark-submit --master yarn examples/src/main/r/dataframe.R
```

_site/R/WINDOWS.md

Lines changed: 13 additions & 0 deletions
## Building SparkR on Windows

To build SparkR on Windows, the following steps are required:

1. Install R (>= 3.1) and [Rtools](http://cran.r-project.org/bin/windows/Rtools/). Make sure to
include Rtools and R in `PATH`.
2. Install
[JDK7](http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html) and set
`JAVA_HOME` in the system environment variables.
3. Download and install [Maven](http://maven.apache.org/download.html). Also include the `bin`
directory in Maven in `PATH`.
4. Set `MAVEN_OPTS` as described in [Building Spark](http://spark.apache.org/docs/latest/building-spark.html).
5. Open a command shell (`cmd`) in the Spark directory and run `mvn -DskipTests -Psparkr package`.

_site/R/create-docs.sh

Lines changed: 46 additions & 0 deletions
#!/bin/bash

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# Script to create API docs for SparkR
# This requires `devtools` and `knitr` to be installed on the machine.

# After running this script the html docs can be found in
# $SPARK_HOME/R/pkg/html

# Figure out where the script is
export FWDIR="$(cd "$(dirname "$0")"; pwd)"
pushd "$FWDIR"

# Generate Rd files
Rscript -e 'library(devtools); devtools::document(pkg="./pkg", roclets=c("rd"))'

# Install the package
./install-dev.sh

# Now create HTML files

# knit_rd puts html in current working directory
mkdir -p pkg/html
pushd pkg/html

Rscript -e 'library(SparkR, lib.loc="../../lib"); library(knitr); knit_rd("SparkR")'

popd

popd

_site/R/install-dev.bat

Lines changed: 27 additions & 0 deletions
@echo off

rem
rem Licensed to the Apache Software Foundation (ASF) under one or more
rem contributor license agreements. See the NOTICE file distributed with
rem this work for additional information regarding copyright ownership.
rem The ASF licenses this file to You under the Apache License, Version 2.0
rem (the "License"); you may not use this file except in compliance with
rem the License. You may obtain a copy of the License at
rem
rem http://www.apache.org/licenses/LICENSE-2.0
rem
rem Unless required by applicable law or agreed to in writing, software
rem distributed under the License is distributed on an "AS IS" BASIS,
rem WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
rem See the License for the specific language governing permissions and
rem limitations under the License.
rem

rem Install development version of SparkR
rem

set SPARK_HOME=%~dp0..

MKDIR %SPARK_HOME%\R\lib

R.exe CMD INSTALL --library="%SPARK_HOME%\R\lib" %SPARK_HOME%\R\pkg\
_site/R/install-dev.sh

Lines changed: 36 additions & 0 deletions
#!/bin/bash

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# This script packages the SparkR source files (R and C files) and
# creates a package that can be loaded in R. The package is by default installed to
# $FWDIR/lib and the package can be loaded by using the following command in R:
#
#   library(SparkR, lib.loc="$FWDIR/lib")
#
# NOTE(shivaram): Right now we use $SPARK_HOME/R/lib to be the installation directory
# to load the SparkR package on the worker nodes.


FWDIR="$(cd "$(dirname "$0")"; pwd)"
LIB_DIR="$FWDIR/lib"

mkdir -p "$LIB_DIR"

# Install the SparkR package into $LIB_DIR
R CMD INSTALL --library="$LIB_DIR" "$FWDIR/pkg/"

_site/R/log4j.properties

Lines changed: 28 additions & 0 deletions
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# Set everything to be logged to the file R-unit-tests.log
log4j.rootCategory=INFO, file
log4j.appender.file=org.apache.log4j.FileAppender
log4j.appender.file.append=true
log4j.appender.file.file=R-unit-tests.log
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss.SSS} %t %p %c{1}: %m%n

# Ignore messages below warning level from Jetty, because it's a bit verbose
log4j.logger.org.eclipse.jetty=WARN
org.eclipse.jetty.LEVEL=WARN
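As a hypothetical variant (not part of this file), sending the same output to the console instead of a file would swap the `FileAppender` for log4j's standard `ConsoleAppender`:

```
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss.SSS} %t %p %c{1}: %m%n
```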
