Commit 78ff03d

Merge pull request #110 from fe2s/spark-shell-docs-2
#109: document spark-shell and pyspark configuration parameters
2 parents bab0584 + 82b33cf commit 78ff03d

File tree

3 files changed: +51 −21 lines


doc/dataframe.md

Lines changed: 2 additions & 2 deletions
````diff
@@ -75,7 +75,7 @@ It is used by spark-redis internally when reading DataFrame back to Spark memory
 
 ### Specifying Redis key
 
-By default, spark-redis generates UUID identifier for each row to ensure
+By default spark-redis generates UUID identifier for each row to ensure
 their uniqueness. However, you can also provide your own column as a key. This is controlled with `key.column` option:
 
 ```scala
@@ -157,7 +157,7 @@ df.write
 
 ### Persistence model
 
-By default, DataFrames are persisted as Redis Hashes. It allows to write data with Spark and query from non-Spark environment.
+By default DataFrames are persisted as Redis Hashes. It allows to write data with Spark and query from non-Spark environment.
 It also enables projection query optimization when only a small subset of columns are selected. On the other hand, there is currently
 a limitation with Hash model - it doesn't support nested DataFrame schema. One option to overcome it is making your DataFrame schema flat.
 If it is not possible due to some constraints, you may consider using Binary persistence model.
````
doc/getting-started.md

Lines changed: 27 additions & 17 deletions
````diff
@@ -20,25 +20,21 @@ cd spark-redis
 mvn clean package -DskipTests
 ```
 
-## Using the library
-Add Spark-Redis to Spark with the `--jars` command line option. For example, use it from spark-shell, include it in the following manner:
+### Using the library with spark shell
+Add Spark-Redis to Spark with the `--jars` command line option.
 
-```
+```bash
 $ bin/spark-shell --jars <path-to>/spark-redis-<version>-jar-with-dependencies.jar
+```
 
-Welcome to
-      ____              __
-     / __/__  ___ _____/ /__
-    _\ \/ _ \/ _ `/ __/ '_/
-   /___/ .__/\_,_/_/ /_/\_\   version 2.3.1
-      /_/
+By default it connects to `localhost:6379` without any password, you can change the connection settings in the following manner:
 
-Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_101)
+```bash
+$ bin/spark-shell --jars <path-to>/spark-redis-<version>-jar-with-dependencies.jar --conf "spark.redis.host=localhost" --conf "spark.redis.port=6379" --conf "spark.redis.auth=passwd"
 ```
 
-The following sections contain code snippets that demonstrate the use of Spark-Redis. To use the sample code, you'll need to replace `your.redis.server` and `6379` with your Redis database's IP address or hostname and port, respectively.
 
-### Configuring Connections to Redis using SparkConf
+### Configuring connection to Redis in a self-contained application
 
 Below is an example configuration of SparkContext with redis configuration:
 
@@ -47,21 +43,33 @@ import com.redislabs.provider.redis._
 
 ...
 
-sc = new SparkContext(new SparkConf()
+val sc = new SparkContext(new SparkConf()
       .setMaster("local")
       .setAppName("myApp")
-
       // initial redis host - can be any node in cluster mode
       .set("spark.redis.host", "localhost")
-
       // initial redis port
       .set("spark.redis.port", "6379")
-
       // optional redis AUTH password
-      .set("spark.redis.auth", "")
+      .set("spark.redis.auth", "passwd")
 )
 ```
 
+The SparkSession can be configured in a similar manner:
+
+```scala
+val spark = SparkSession
+  .builder()
+  .appName("myApp")
+  .master("local[*]")
+  .config("spark.redis.host", "localhost")
+  .config("spark.redis.port", "6379")
+  .config("spark.redis.auth", "passwd")
+  .getOrCreate()
+
+val sc = spark.sparkContext
+```
+
 ### Create RDD
 
 ```scala
@@ -83,6 +91,8 @@ df.write
 ### Create Stream
 
 ```scala
+import com.redislabs.provider.redis._
+
 val ssc = new StreamingContext(sc, Seconds(1))
 val redisStream = ssc.createRedisStream(Array("foo", "bar"),
   storageLevel = StorageLevel.MEMORY_AND_DISK_2)
````
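The fallback behavior this commit documents (connect to `localhost:6379` with no password unless `spark.redis.host`, `spark.redis.port`, or `spark.redis.auth` are set) can be sketched in plain Python; `redis_connection` is an illustrative stand-in, not the connector's internal code:

```python
def redis_connection(conf):
    """Resolve (host, port, auth) from a dict of spark.redis.* settings,
    falling back to the documented defaults: localhost:6379, no AUTH."""
    return (
        conf.get("spark.redis.host", "localhost"),
        int(conf.get("spark.redis.port", "6379")),
        conf.get("spark.redis.auth"),  # None means no password is sent
    )
```

So an empty configuration yields `("localhost", 6379, None)`, while setting the three properties via `--conf` or `SparkConf` overrides each default independently.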

doc/python.md

Lines changed: 22 additions & 2 deletions
````diff
@@ -8,9 +8,16 @@ Here is an example:
 1. Run `pyspark` providing the spark-redis jar file
 
 ```bash
-$ ./bin/pyspark --jars /your/path/to/spark-redis-<version>-jar-with-dependencies.jar
+$ ./bin/pyspark --jars <path-to>/spark-redis-<version>-jar-with-dependencies.jar
 ```
 
+By default it connects to `localhost:6379` without any password, you can change the connection settings in the following manner:
+
+```bash
+$ bin/spark-shell --jars <path-to>/spark-redis-<version>-jar-with-dependencies.jar --conf "spark.redis.host=localhost" --conf "spark.redis.port=6379" --conf "spark.redis.auth=passwd"
+```
+
 2. Read DataFrame from json, write/read from Redis:
 ```python
 df = spark.read.json("examples/src/main/resources/people.json")
@@ -19,7 +26,7 @@ loadedDf = spark.read.format("org.apache.spark.sql.redis").option("table", "peop
 loadedDf.show()
 ```
 
-2. Check the data with redis-cli:
+3. Check the data with redis-cli:
 
 ```bash
 127.0.0.1:6379> hgetall people:Justin
@@ -29,3 +36,16 @@ loadedDf.show()
 4) "Justin"
 ```
 
+The self-contained application can be configured in the following manner:
+
+```python
+SparkSession\
+    .builder\
+    .appName("myApp")\
+    .config("spark.redis.host", "localhost")\
+    .config("spark.redis.port", "6379")\
+    .config("spark.redis.auth", "passwd")\
+    .getOrCreate()
+```
````
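The shell examples in this commit pass each setting as a separate `--conf "key=value"` flag. How those flags end up as configuration entries can be sketched with a small parser; `parse_conf_flags` is an illustrative sketch, not Spark's own argument handling:

```python
def parse_conf_flags(argv):
    """Collect the "key=value" pairs passed via repeated --conf flags
    into a dict, as in the spark-shell/pyspark invocations above."""
    conf, args = {}, iter(argv)
    for arg in args:
        if arg == "--conf":
            key, _, value = next(args).partition("=")
            conf[key] = value
    return conf
```

Applied to the documented command line, the three `spark.redis.*` flags come back as a dict that mirrors the `SparkSession.builder.config(...)` calls shown above.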
