Commit eddeb31

Merge branch 'master' into branch-2.4

2 parents 4e1ab9e + e48c569

34 files changed: +811 −70 lines

.travis.yml

Lines changed: 12 additions & 7 deletions

@@ -1,13 +1,18 @@
 sudo: required
 language: scala
-scala:
-  - 2.11.2
+jdk:
+  - openjdk8
 install:
-  - wget http://download.redis.io/releases/redis-5.0.1.tar.gz
-  - tar -xzvf redis-5.0.1.tar.gz
-  - make -C redis-5.0.1 -j4
-  - export PATH=$PWD/redis-5.0.1/src:$PATH
-script: make test
+  - wget http://download.redis.io/releases/redis-6.0.3.tar.gz
+  - tar -xzvf redis-6.0.3.tar.gz
+  - make -C redis-6.0.3 -j4 BUILD_TLS=yes
+  - export PATH=$PWD/redis-6.0.3/src:$PATH
+script:
+  - make test # test with scala 2.11
+  - sleep 5s # let redis exit gracefully (we use kill, not kill -9 in makefile)
+  - ps aux | grep redis
+  - ./dev/change-scala-version.sh 2.12 # switch to scala 2.12
+  - make test # test with scala 2.12
 cache:
   directories:
     - $HOME/.m2

Makefile

Lines changed: 22 additions & 2 deletions

@@ -9,6 +9,23 @@ appendonly no
 requirepass passwd
 endef

+# STANDALONE REDIS NODE WITH SSL
+define REDIS_STANDALONE_NODE_CONF_SSL
+daemonize yes
+port 0
+pidfile /tmp/redis_standalone_node__ssl_for_spark-redis.pid
+logfile /tmp/redis_standalone_node_ssl_for_spark-redis.log
+save ""
+appendonly no
+requirepass passwd
+tls-auth-clients no
+tls-port 6380
+tls-cert-file ./src/test/resources/tls/redis.crt
+tls-key-file ./src/test/resources/tls/redis.key
+tls-ca-cert-file ./src/test/resources/tls/ca.crt
+tls-dh-params-file ./src/test/resources/tls/redis.dh
+endef
+
 # CLUSTER REDIS NODES
 define REDIS_CLUSTER_NODE1_CONF
 daemonize yes

@@ -44,12 +61,14 @@ cluster-config-file /tmp/redis_cluster_node3_for_spark-redis.conf
 endef

 export REDIS_STANDALONE_NODE_CONF
+export REDIS_STANDALONE_NODE_CONF_SSL
 export REDIS_CLUSTER_NODE1_CONF
 export REDIS_CLUSTER_NODE2_CONF
 export REDIS_CLUSTER_NODE3_CONF

 start-standalone:
 	echo "$$REDIS_STANDALONE_NODE_CONF" | redis-server -
+	echo "$$REDIS_STANDALONE_NODE_CONF_SSL" | redis-server -


 start-cluster:

@@ -72,7 +91,8 @@ start:

 stop-standalone:
 	kill `cat /tmp/redis_standalone_node_for_spark-redis.pid`
-
+	kill `cat /tmp/redis_standalone_node__ssl_for_spark-redis.pid`
+
 stop-cluster:
 	kill `cat /tmp/redis_cluster_node1_for_spark-redis.pid` || true
 	kill `cat /tmp/redis_cluster_node2_for_spark-redis.pid` || true

@@ -93,7 +113,7 @@ test:
 	make start
 	# with --batch-mode maven doesn't print 'Progress: 125/150kB', the progress lines take up 90% of the log and causes
 	# Travis build to fail with 'The job exceeded the maximum log length, and has been terminated'
-	mvn clean test -B
+	mvn clean test -B -DargLine="-Djavax.net.ssl.trustStorePassword=password -Djavax.net.ssl.trustStore=./src/test/resources/tls/clientkeystore -Djavax.net.ssl.trustStoreType=jceks"
 	make stop

 benchmark:
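The `start-standalone` target relies on a shell idiom: the inline config held in a Make variable is piped to `redis-server -`, where the trailing `-` tells redis-server to read its configuration from stdin. A minimal sketch of that pattern, with `cat` standing in for `redis-server` so it runs without Redis installed:

```shell
# Sketch of the Makefile pattern: hold a Redis config in a shell variable
# and pipe it to the server's stdin ("redis-server -" reads config from stdin).
# `cat` stands in for redis-server here so the sketch runs anywhere.
REDIS_CONF='daemonize yes
port 0
tls-port 6380
requirepass passwd'

echo "$REDIS_CONF" | cat -
```

In the Makefile the same pipe uses `$$REDIS_STANDALONE_NODE_CONF_SSL`; the doubled `$$` is Make's escape for a literal `$`, so the shell (not Make) expands the exported variable.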

README.md

Lines changed: 10 additions & 7 deletions

@@ -1,8 +1,9 @@
 [![license](https://img.shields.io/github/license/RedisLabs/spark-redis.svg)](https://github.com/RedisLabs/spark-redis)
 [![GitHub issues](https://img.shields.io/github/release/RedisLabs/spark-redis.svg)](https://github.com/RedisLabs/spark-redis/releases/latest)
 [![Build Status](https://travis-ci.org/RedisLabs/spark-redis.svg)](https://travis-ci.org/RedisLabs/spark-redis)
-[![Maven Central](https://maven-badges.herokuapp.com/maven-central/com.redislabs/spark-redis/badge.svg)](https://maven-badges.herokuapp.com/maven-central/com.redislabs/spark-redis)
-[![Javadocs](https://www.javadoc.io/badge/com.redislabs/spark-redis.svg)](https://www.javadoc.io/doc/com.redislabs/spark-redis)
+[![Maven Central](https://maven-badges.herokuapp.com/maven-central/com.redislabs/spark-redis_2.11/badge.svg)](https://maven-badges.herokuapp.com/maven-central/com.redislabs/spark-redis_2.11)
+[![Javadocs](https://www.javadoc.io/badge/com.redislabs/spark-redis_2.11.svg)](https://www.javadoc.io/doc/com.redislabs/spark-redis_2.11)
+[![Gitter](https://badges.gitter.im/RedisLabs/spark-redis.svg)](https://gitter.im/RedisLabs/spark-redis?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge)
 <!--[![Codecov](https://codecov.io/gh/RedisLabs/spark-redis/branch/master/graph/badge.svg)](https://codecov.io/gh/RedisLabs/spark-redis)-->

 # Spark-Redis

@@ -19,11 +20,11 @@ Spark-Redis also supports Spark Streaming (DStreams) and Structured Streaming.
 The library has several branches, each corresponds to a different supported Spark version. For example, 'branch-2.3' works with any Spark 2.3.x version.
 The master branch contains the recent development for the next release.

-| Spark-Redis | Spark         | Redis            | Supported Scala Versions |
-| ----------- | ------------- | ---------------- | ------------------------ |
-| 2.4         | 2.4.x         | >=2.9.0          | 2.11                     |
-| 2.3         | 2.3.x         | >=2.9.0          | 2.11                     |
-| 1.4         | 1.4.x         |                  | 2.10                     |
+| Spark-Redis                                                      | Spark         | Redis            | Supported Scala Versions |
+| ----------------------------------------------------------------| ------------- | ---------------- | ------------------------ |
+| [2.4](https://github.com/RedisLabs/spark-redis/tree/branch-2.4) | 2.4.x         | >=2.9.0          | 2.11, 2.12               |
+| [2.3](https://github.com/RedisLabs/spark-redis/tree/branch-2.3) | 2.3.x         | >=2.9.0          | 2.11                     |
+| [1.4](https://github.com/RedisLabs/spark-redis/tree/branch-1.4) | 1.4.x         |                  | 2.10                     |


 ## Known limitations

@@ -35,6 +36,8 @@ This library is a work in progress so the API may change before the official rel

 ## Documentation

+Please make sure you use documentation from the correct branch ([2.4](https://github.com/RedisLabs/spark-redis/tree/branch-2.4#documentation), [2.3](https://github.com/RedisLabs/spark-redis/tree/branch-2.3#documentation), etc).
+
 - [Getting Started](doc/getting-started.md)
 - [RDD](doc/rdd.md)
 - [Dataframe](doc/dataframe.md)

dev/change-scala-version.sh

Lines changed: 61 additions & 0 deletions

@@ -0,0 +1,61 @@
+#!/usr/bin/env bash
+
+set -e
+
+VALID_VERSIONS=( 2.11 2.12 )
+
+SCALA_211_MINOR_VERSION="12"
+SCALA_212_MINOR_VERSION="9"
+
+usage() {
+  echo "Usage: $(basename $0) [-h|--help] <version>
+where :
+  -h| --help   Display this help text
+  valid version values : ${VALID_VERSIONS[*]}
+" 1>&2
+  exit 1
+}
+
+if [[ ($# -ne 1) || ( $1 == "--help") ||  $1 == "-h" ]]; then
+  usage
+fi
+
+TO_MAJOR_VERSION=$1
+
+check_scala_version() {
+  for i in ${VALID_VERSIONS[*]}; do [ $i = "$1" ] && return 0; done
+  echo "Invalid Scala version: $1. Valid versions: ${VALID_VERSIONS[*]}" 1>&2
+  exit 1
+}
+
+check_scala_version "$TO_MAJOR_VERSION"
+
+if [ $TO_MAJOR_VERSION = "2.12" ]; then
+  FROM_MAJOR_VERSION="2.11"
+  FROM_MINOR_VERSION=$SCALA_211_MINOR_VERSION
+  TO_MINOR_VERSION=$SCALA_212_MINOR_VERSION
+else
+  FROM_MAJOR_VERSION="2.12"
+  FROM_MINOR_VERSION=$SCALA_212_MINOR_VERSION
+  TO_MINOR_VERSION=$SCALA_211_MINOR_VERSION
+fi
+
+sed_i() {
+  sed -e "$1" "$2" > "$2.tmp" && mv "$2.tmp" "$2"
+}
+
+export -f sed_i
+
+# change <artifactId>
+BASEDIR=$(dirname $0)/..
+find "$BASEDIR" -name 'pom.xml' -not -path '*target*' -print \
+  -exec bash -c "sed_i 's/\(artifactId.*\)_'$FROM_MAJOR_VERSION'/\1_'$TO_MAJOR_VERSION'/g' {}" \;
+
+# change <scala.major.version>
+find "$BASEDIR" -name 'pom.xml' -not -path '*target*' -print \
+  -exec bash -c "sed_i 's/\(<scala.major.version>\)'$FROM_MAJOR_VERSION'/\1'$TO_MAJOR_VERSION'/g' {}" \;

+# change <scala.complete.version>
+find "$BASEDIR" -name 'pom.xml' -not -path '*target*' -print \
+  -exec bash -c "sed_i 's/\(<scala.complete.version>.*\.\)'$FROM_MINOR_VERSION'/\1'$TO_MINOR_VERSION'/g' {}" \;
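The script's `sed_i` helper is a portable stand-in for `sed -i` (whose syntax differs between GNU and BSD sed): it writes to a temp file and moves it back. A runnable sketch of the rewrite it performs on each `pom.xml`, using a toy file (`/tmp/pom-example.xml` and its contents are made up for illustration):

```shell
# Portable "in-place" sed, as defined in dev/change-scala-version.sh.
sed_i() {
  sed -e "$1" "$2" > "$2.tmp" && mv "$2.tmp" "$2"
}

# Toy pom.xml fragment (hypothetical, for illustration only).
cat > /tmp/pom-example.xml <<'EOF'
<artifactId>spark-redis_2.11</artifactId>
<scala.major.version>2.11</scala.major.version>
EOF

# Same substitutions the script applies, with 2.11 -> 2.12 spelled out:
sed_i 's/\(artifactId.*\)_2.11/\1_2.12/g' /tmp/pom-example.xml
sed_i 's/\(<scala.major.version>\)2.11/\12.12/g' /tmp/pom-example.xml
cat /tmp/pom-example.xml
```

The capture group `\(...\)` keeps the prefix intact, so only the Scala suffix changes; in the replacement, `\12.12` is parsed as backreference `\1` followed by the literal `2.12`.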

doc/configuration.md

Lines changed: 1 addition & 0 deletions

@@ -10,6 +10,7 @@ topology from the initial node, so there is no need to provide the rest of the c
 * `spark.redis.timeout` - connection timeout in ms, 2000 ms by default
 * `spark.redis.max.pipeline.size` - the maximum number of commands per pipeline (used to batch commands). The default value is 100.
 * `spark.redis.scan.count` - count option of SCAN command (used to iterate over keys). The default value is 100.
+* `spark.redis.ssl` - set to true to use tls
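The new `spark.redis.ssl` flag pairs with the TLS listener the Makefile opens on port 6380. A hedged sketch of how a SparkSession might be configured to use it; the host, port, and password values are assumptions taken from the test setup in this commit, not prescribed by the library:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical sketch: pointing Spark at a TLS-enabled Redis.
// `spark.redis.ssl` is the option added in doc/configuration.md; the
// port (6380) and password (passwd) mirror the Makefile's SSL test config.
val spark = SparkSession.builder()
  .appName("spark-redis-tls-example")
  .config("spark.redis.host", "localhost")
  .config("spark.redis.port", "6380")   // tls-port from the test config
  .config("spark.redis.auth", "passwd") // requirepass from the test config
  .config("spark.redis.ssl", "true")
  .getOrCreate()
```

Note that, as the Makefile's `test` target shows, the JVM also needs a truststore containing the server's certificate (`-Djavax.net.ssl.trustStore=...`) for the TLS handshake to succeed.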
doc/dataframe.md

Lines changed: 5 additions & 0 deletions

@@ -341,6 +341,11 @@ root
 | max.pipeline.size      | maximum number of commands per pipeline (used to batch commands)               | `Int`    | 100  |
 | scan.count             | count option of SCAN command (used to iterate over keys)                       | `Int`    | 100  |
 | iterator.grouping.size | the number of items to be grouped when iterating over underlying RDD partition | `Int`    | 1000 |
+| host                   | overrides `spark.redis.host` configured in SparkSession (if set, any other connection setting from SparkSession is ignored )    | `String` | `localhost` |
+| port                   | overrides `spark.redis.port` configured in SparkSession (if set, any other connection setting from SparkSession is ignored )    | `Int`    | `6379` |
+| auth                   | overrides `spark.redis.auth` configured in SparkSession (if set, any other connection setting from SparkSession is ignored )    | `String` | - |
+| dbNum                  | overrides `spark.redis.db` configured in SparkSession (if set, any other connection setting from SparkSession is ignored )      | `Int`    | `0` |
+| timeout                | overrides `spark.redis.timeout` configured in SparkSession (if set, any other connection setting from SparkSession is ignored ) | `Int`    | `2000` |
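The options added to this table let a single DataFrame target a different Redis instance than the one configured on the SparkSession. A hedged sketch of their use; the data source name `org.apache.spark.sql.redis` and the table name are assumptions for illustration, not taken from this diff:

```scala
// Hypothetical sketch: per-DataFrame connection overrides. Per the table
// above, once any of these is set, the other connection settings from the
// SparkSession are ignored, so every needed override is given explicitly.
df.write
  .format("org.apache.spark.sql.redis")
  .option("table", "person")       // assumed table name for illustration
  .option("host", "redis-host-2")  // overrides spark.redis.host
  .option("port", "6379")          // overrides spark.redis.port
  .option("auth", "passwd")        // overrides spark.redis.auth
  .option("dbNum", "1")            // overrides spark.redis.db
  .option("timeout", "2000")       // overrides spark.redis.timeout
  .save()
```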

doc/dev.md

Lines changed: 9 additions & 0 deletions

@@ -25,3 +25,12 @@ To build Spark-Redis skipping tests, run:
 ```
 mvn clean package -DskipTests
 ```
+
+To change scala version use `./dev/change-scala-version.sh` script. It will change scala version in `pom.xml`. For example:
+```
+./dev/change-scala-version.sh 2.12
+```
+
+```
+./dev/change-scala-version.sh 2.11
+```

doc/getting-started.md

Lines changed: 20 additions & 2 deletions

@@ -6,12 +6,30 @@
 <dependencies>
     <dependency>
         <groupId>com.redislabs</groupId>
-        <artifactId>spark-redis</artifactId>
-        <version>2.4.0</version>
+        <artifactId>spark-redis_2.11</artifactId>
+        <version>2.4.2</version>
     </dependency>
 </dependencies>
 ```

+Or
+
+```xml
+<dependencies>
+    <dependency>
+        <groupId>com.redislabs</groupId>
+        <artifactId>spark-redis_2.12</artifactId>
+        <version>2.4.2</version>
+    </dependency>
+</dependencies>
+```
+
+### SBT
+
+```scala
+libraryDependencies += "com.redislabs" %% "spark-redis" % "2.4.2"
+```
+
 ### Build form source
 You can download the library's source and build it:
 ```

doc/rdd.md

Lines changed: 65 additions & 0 deletions

@@ -1,3 +1,4 @@
+
 # RDD

 - [The keys RDD](#the-keys-rdd)

@@ -114,12 +115,35 @@ For String values, your RDD should consist of the key-value pairs that are to be
 sc.toRedisKV(stringRDD)
 ```

+In order to set an expiry on the key, we can pass in the `ttl` (in seconds) as an additional argument:
+
+```scala
+sc.toRedisKV(stringRDD, ttl)
+```
+
+By default, Strings won't have any expiry set.
+
 #### Hashes
 To store a Redis Hash, the RDD should consist of its field-value pairs. If the RDD is called `hashRDD`, the following should be used for storing it in the key name specified by `hashName`:

 ```scala
 sc.toRedisHASH(hashRDD, hashName)
 ```
+In order to set an expiry on the key, we can pass in the `ttl` (in seconds) as an additional argument:
+
+```scala
+sc.toRedisHASH(hashRDD, hashName, ttl)
+```
+
+By default, Hashes won't have any expiry set.
+
+Use the following to store an RDD into multiple hashs:
+
+```scala
+sc.toRedisHASHes(hashRDD, ttl)
+```
+
+The `hashRDD` is a rdd of tuples (`hashname`, `map[field name, field value]`)

 #### Lists
 Use the following to store an RDD in a Redis List:

@@ -137,6 +161,31 @@ sc.toRedisFixedLIST(listRDD, listName, listSize)
 The `listRDD` is an RDD that contains all of the list's string elements in order, and `listName` is the list's key name.
 `listSize` is an integer which specifies the size of the Redis list; it is optional, and will default to an unlimited size.

+Use the following to store an RDD in multiple Redis Lists:
+
+```scala
+sc.toRedisLISTs(rdd)
+```
+
+The `rdd` is an RDD of tuples (`list name`, `list values`)
+
+Use the following to store an RDD of binary values in multiple Redis Lists:
+
+```scala
+sc.toRedisByteLISTs(byteListRDD)
+```
+
+The `byteListRDD` is an RDD of tuples (`list name`, `list values`) represented as byte arrays.
+
+Expiry can be set on Lists by passing in an additional argument called `ttl` (in seconds) to the above methods except `toRedisFixedLIST`:
+```scala
+sc.toRedisLIST(listRDD, listName, ttl)
+sc.toRedisLISTs(rdd, ttl)
+sc.toRedisByteLISTs(byteListRDD, ttl)
+```
+
+By default, Lists won't have any expiry set.
+
 #### Sets
 For storing data in a Redis Set, use `toRedisSET` as follows:

@@ -146,13 +195,29 @@ sc.toRedisSET(setRDD, setName)

 Where `setRDD` is an RDD with the set's string elements and `setName` is the name of the key for that set.

+In order to set an expiry on the key, we can pass in the `ttl` (in seconds) as an additional argument:
+
+```scala
+sc.toRedisSET(setRDD, setName, ttl)
+```
+
+By default, Sets won't have any expiry set.
+
 #### Sorted Sets
 ```scala
 sc.toRedisZSET(zsetRDD, zsetName)
 ```

 The above example demonstrates storing data in Redis in a Sorted Set. The `zsetRDD` in the example should contain pairs of members and their scores, whereas `zsetName` is the name for that key.

+In order to set an expiry on the key, we can pass in the `ttl` (in seconds) as an additional argument:
+
+```scala
+sc.toRedisZSET(zsetRDD, zsetName, ttl)
+```
+
+By default, Sorted Sets won't have any expiry set.
+
 ### Read and write configuration options

 Some [configuration options](configuration.md) can be overridden for a particular RDD:

doc/streaming.md

Lines changed: 2 additions & 0 deletions

@@ -119,6 +119,7 @@ import com.redislabs.provider.redis.streaming._
 val ssc = new StreamingContext(sc, Seconds(1))
 val redisStream = ssc.createRedisStream(Array("foo", "bar"), storageLevel = StorageLevel.MEMORY_AND_DISK_2)
 redisStream.print()
+ssc.start()
 ssc.awaitTermination()
 ```

@@ -132,5 +133,6 @@ import com.redislabs.provider.redis.streaming._
 val ssc = new StreamingContext(sc, Seconds(1))
 val redisStream = ssc.createRedisStreamWithoutListname(Array("foo", "bar"), storageLevel = StorageLevel.MEMORY_AND_DISK_2)
 redisStream.print()
+ssc.start()
 ssc.awaitTermination()
 ```
