
Commit 0acb087

Merge pull request #116 from RedisLabs/stream
#115 Spark Streaming: Redis Stream data structure
2 parents: 374de3c + e22407a

17 files changed: +566 −16 lines

.travis.yml

Lines changed: 4 additions & 3 deletions
```diff
@@ -2,10 +2,11 @@ sudo: required
 language: scala
 scala:
   - 2.11.2
-before_install:
-  - git clone https://github.com/antirez/redis.git redis_for_spark-redis_test || true
 install:
-  - make -C redis_for_spark-redis_test -j4
+  - wget http://download.redis.io/releases/redis-5.0.1.tar.gz
+  - tar -xzvf redis-5.0.1.tar.gz
+  - make -C redis-5.0.1 -j4
+  - export PATH=$PWD/redis-5.0.1/src:$PATH
 script: make test
 cache:
   directories:
```

doc/streaming.md

Lines changed: 113 additions & 4 deletions
````diff
@@ -1,6 +1,115 @@
 ### Streaming
-Spark-Redis support streaming data from Redis instance/cluster, currently streaming data are fetched from Redis' List by the `blpop` command. Users are required to provide an array which stores all the List names they are interested in. The [storageLevel](http://spark.apache.org/docs/latest/streaming-programming-guide.html#data-serialization) is `MEMORY_AND_DISK_SER_2` by default, you can change it on your demand.
-`createRedisStream` will create a `(listName, value)` stream, but if you don't care about which list feeds the value, you can use `createRedisStreamWithoutListname` to get the only `value` stream.
+
+Spark-Redis supports streaming data from the Stream and List data structures:
+
+- [Redis Stream](#redis-stream)
+- [Redis List](#redis-list)
+
+
+## Redis Stream
+
+To stream data from a [Redis Stream](https://redis.io/topics/streams-intro), use the `createRedisXStream` method:
+
+```scala
+import com.redislabs.provider.redis._
+import com.redislabs.provider.redis.streaming.{ConsumerConfig, StreamItem}
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.streaming.dstream.InputDStream
+import org.apache.spark.streaming.{Seconds, StreamingContext}
+
+val spark = SparkSession.builder.appName("Redis Stream Example")
+  .master("local[*]")
+  .config("spark.redis.host", "localhost")
+  .config("spark.redis.port", "6379")
+  .getOrCreate()
+
+val ssc = new StreamingContext(spark.sparkContext, Seconds(1))
+
+val stream = ssc.createRedisXStream(Seq(ConsumerConfig("my-stream", "my-consumer-group", "my-consumer-1")))
+stream.print()
+
+ssc.start()
+ssc.awaitTermination()
+
+```
+
+It will automatically create the consumer group if it doesn't exist and start listening for messages in the stream.
+
+### Stream Offset
+
+By default it pulls messages starting from the latest message. If you need to start from the earliest message or any specific position in the stream, specify the `offset` parameter:
+
+```scala
+ConsumerConfig("my-stream", "my-consumer-group", "my-consumer-1", offset = Earliest) // start from '0-0'
+ConsumerConfig("my-stream", "my-consumer-group", "my-consumer-1", IdOffset(42, 0)) // start from '42-0'
+```
+
+Please note that spark-redis will attempt to create a consumer group with the specified offset, but if the consumer group already exists,
+it will use the existing offset. This means that if you decide to re-process all the messages from the beginning,
+changing the offset to `Earliest` may not be enough. You may need to either manually delete the consumer
+group with `XGROUP DESTROY` or modify the offset with `XGROUP SETID`.
+
+### Receiver reliability
+
+The DStream is implemented with a [Reliable Receiver](https://spark.apache.org/docs/latest/streaming-custom-receivers.html#receiver-reliability) that acknowledges
+after the data has been stored in Spark. As with any other Receiver, to achieve strong fault-tolerance guarantees and ensure zero data loss, you have to enable [write-ahead logs](https://spark.apache.org/docs/latest/streaming-programming-guide.html#deploying-applications) and checkpointing.
+
+The received data is stored with `StorageLevel.MEMORY_AND_DISK_2` by default.
+The storage level can be configured with the `storageLevel` parameter, e.g.:
+```scala
+ssc.createRedisXStream(conf, storageLevel = StorageLevel.MEMORY_AND_DISK_SER_2)
+```
+
+### Level of Parallelism
+
+`createRedisXStream()` takes a sequence of consumer configs; each consumer is started in a separate thread. This allows you, for example, to
+create a stream from multiple Redis Stream keys:
+
+```scala
+ssc.createRedisXStream(Seq(
+  ConsumerConfig("my-stream-1", "my-consumer-group-1", "my-consumer-1"),
+  ConsumerConfig("my-stream-2", "my-consumer-group-2", "my-consumer-1")
+))
+```
+
+In this example we created an input DStream that corresponds to a single receiver running in a Spark executor. The receiver creates two threads pulling
+data from the streams in parallel. However, if data receiving becomes a bottleneck, you may want to start multiple receivers in different executors (worker machines).
+This can be achieved by creating multiple input DStreams and `union`ing them together. You can read more about it [here](https://spark.apache.org/docs/latest/streaming-programming-guide.html#level-of-parallelism-in-data-receiving).
+
+For example, the following will create two receivers pulling the data from `my-stream` and balancing the load:
+
+```scala
+val streams = Seq(
+  ssc.createRedisXStream(Seq(ConsumerConfig("my-stream", "my-consumer-group", "my-consumer-1"))),
+  ssc.createRedisXStream(Seq(ConsumerConfig("my-stream", "my-consumer-group", "my-consumer-2")))
+)
+
+val stream = ssc.union(streams)
+stream.print()
+```
+
+### Configuration
+
+If the cluster resources are not large enough to process data as fast as it is being received, the receiving rate can be limited:
+
+```scala
+ConsumerConfig("stream", "group", "c-1", rateLimitPerConsumer = Some(100)) // 100 items per second
+```
+
+It defines the number of received items per second per consumer.
+
+Other options you can configure are `batchSize` and `block`. They define the maximum number of pulled items and the time in milliseconds to wait in an `XREADGROUP` call.
+
+```scala
+ConsumerConfig("stream", "group", "c-1", batchSize = 50, block = 200)
+```
+
+
+## Redis List
+
+The stream can also be created from a Redis List; the data is fetched with the `blpop` command. Users are required to provide an array which stores all the List names they are interested in. The [storageLevel](http://spark.apache.org/docs/latest/streaming-programming-guide.html#data-serialization) is `MEMORY_AND_DISK_SER_2` by default; you can change it as needed.
+
+The method `createRedisStream` will create a `(listName, value)` stream, but if you don't care about which list feeds the value, you can use `createRedisStreamWithoutListname` to get a stream of only the values.
 
 Use the following to get a `(listName, value)` stream from `foo` and `bar` list
 
@@ -10,7 +119,7 @@ import org.apache.spark.storage.StorageLevel
 import com.redislabs.provider.redis._
 val ssc = new StreamingContext(sc, Seconds(1))
 val redisStream = ssc.createRedisStream(Array("foo", "bar"), storageLevel = StorageLevel.MEMORY_AND_DISK_2)
-redisStream.print
+redisStream.print()
 ssc.awaitTermination()
 ```
 
@@ -23,6 +132,6 @@ import org.apache.spark.storage.StorageLevel
 import com.redislabs.provider.redis._
 val ssc = new StreamingContext(sc, Seconds(1))
 val redisStream = ssc.createRedisStreamWithoutListname(Array("foo", "bar"), storageLevel = StorageLevel.MEMORY_AND_DISK_2)
-redisStream.print
+redisStream.print()
 ssc.awaitTermination()
 ```
````
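
The re-processing note above mentions `XGROUP DESTROY` and `XGROUP SETID` without showing them. As a rough illustration, here is a minimal sketch of resetting a group's offset through Jedis; the `xgroupSetID` call and the `StreamEntryID` type are assumed to be available in the Jedis 3 snapshot this PR depends on, and the stream, group, and host names are placeholders:

```scala
import redis.clients.jedis.{Jedis, StreamEntryID}

// Move the consumer group's last-delivered id back to '0-0' so that the
// next XREADGROUP call re-reads the stream from the beginning.
val jedis = new Jedis("localhost", 6379)
jedis.xgroupSetID("my-stream", "my-consumer-group", new StreamEntryID(0, 0))
jedis.close()
```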
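
Since the receiver-reliability section stops at pointing to the Spark docs, here is a minimal driver sketch that wires in the write-ahead log and checkpointing around `createRedisXStream`. The checkpoint path and stream names are placeholders; `spark.streaming.receiver.writeAheadLog.enable` and `ssc.checkpoint` are standard Spark Streaming settings:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import com.redislabs.provider.redis._
import com.redislabs.provider.redis.streaming.ConsumerConfig

val conf = new SparkConf()
  .setAppName("redis-xstream-reliable")
  .setMaster("local[*]")
  .set("spark.redis.host", "localhost")
  .set("spark.redis.port", "6379")
  // persist received blocks to a write-ahead log for zero data loss
  .set("spark.streaming.receiver.writeAheadLog.enable", "true")

val ssc = new StreamingContext(conf, Seconds(1))
// use a fault-tolerant location (e.g. HDFS) in production
ssc.checkpoint("/tmp/redis-xstream-checkpoint")

val stream = ssc.createRedisXStream(Seq(ConsumerConfig("my-stream", "my-consumer-group", "my-consumer-1")))
stream.print()

ssc.start()
ssc.awaitTermination()
```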

pom.xml

Lines changed: 16 additions & 2 deletions
```diff
@@ -49,7 +49,7 @@
     <java.version>1.8</java.version>
     <scala.major.version>2.11</scala.major.version>
     <scala.complete.version>${scala.major.version}.12</scala.complete.version>
-    <jedis.version>2.9.0</jedis.version>
+    <jedis.version>3.0.0-20181113.105826-9</jedis.version>
     <spark.version>2.3.1</spark.version>
     <plugins.scalatest.version>1.0</plugins.scalatest.version>
   </properties>
@@ -65,6 +65,20 @@
     </repository>
   </distributionManagement>
 
+  <!-- TODO: temporal to get jedis SNAPSHOT -->
+  <repositories>
+    <repository>
+      <id>oss.sonatype.org-snapshot</id>
+      <url>http://oss.sonatype.org/content/repositories/snapshots</url>
+      <releases>
+        <enabled>false</enabled>
+      </releases>
+      <snapshots>
+        <enabled>true</enabled>
+      </snapshots>
+    </repository>
+  </repositories>
+
   <build>
     <plugins>
       <plugin>
@@ -258,7 +272,7 @@
       <version>2.0</version>
     </dependency>
     <dependency>
-      <groupId>redis.clients</groupId>
+      <groupId>com.redislabs</groupId>
       <artifactId>jedis</artifactId>
       <version>${jedis.version}</version>
       <type>jar</type>
```
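
For anyone consuming this snapshot from an sbt build instead of Maven, a hypothetical equivalent of the repository and dependency changes above might look like the following; the `3.0.0-SNAPSHOT` coordinate is an assumption standing in for the timestamped snapshot pinned in the pom:

```scala
// build.sbt — sketch only, mirroring the pom.xml changes above
resolvers += "oss.sonatype.org-snapshot" at "https://oss.sonatype.org/content/repositories/snapshots"

// the groupId moved from redis.clients to com.redislabs for the snapshot build
libraryDependencies += "com.redislabs" % "jedis" % "3.0.0-SNAPSHOT"
```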

src/main/scala/com/redislabs/provider/redis/RedisConfig.scala

Lines changed: 2 additions & 1 deletion
```diff
@@ -3,8 +3,9 @@ package com.redislabs.provider.redis
 import java.net.URI
 
 import org.apache.spark.SparkConf
+import redis.clients.jedis.util.{JedisClusterCRC16, JedisURIHelper, SafeEncoder}
 import redis.clients.jedis.{Jedis, Protocol}
-import redis.clients.util.{JedisURIHelper, SafeEncoder, JedisClusterCRC16}
+
 import scala.collection.JavaConversions._
 
 
```
src/main/scala/com/redislabs/provider/redis/rdd/RedisRDD.scala

Lines changed: 3 additions & 3 deletions
```diff
@@ -7,8 +7,8 @@ import com.redislabs.provider.redis.util.PipelineUtils.mapWithPipeline
 import com.redislabs.provider.redis.{ReadWriteConfig, RedisConfig, RedisNode}
 import org.apache.spark._
 import org.apache.spark.rdd.RDD
-import redis.clients.jedis._
-import redis.clients.util.JedisClusterCRC16
+import redis.clients.jedis.{Jedis, ScanParams}
+import redis.clients.jedis.util.JedisClusterCRC16
 
 import scala.collection.JavaConversions._
 import scala.reflect.{ClassTag, classTag}
@@ -408,7 +408,7 @@ trait Keys {
     do {
       val scan = jedis.scan(cursor, params)
       keys.addAll(scan.getResult)
-      cursor = scan.getStringCursor
+      cursor = scan.getCursor
     } while (cursor != "0")
     keys
   }
```
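
The only behavioral touch point here is the Jedis 3 rename of the cursor accessor (`getStringCursor` → `getCursor`). For reference, a self-contained sketch of the same SCAN loop against a local Redis, assuming the Jedis 3 API used above:

```scala
import redis.clients.jedis.{Jedis, ScanParams}

val jedis = new Jedis("localhost", 6379)
val params = new ScanParams().`match`("*").count(100)
val keys = new java.util.HashSet[String]()
var cursor = ScanParams.SCAN_POINTER_START // "0"
do {
  // each SCAN call returns a page of keys plus the cursor for the next call
  val scan = jedis.scan(cursor, params)
  keys.addAll(scan.getResult)
  cursor = scan.getCursor // Jedis 3.x; this was getStringCursor in 2.x
} while (cursor != "0")
jedis.close()
```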

src/main/scala/com/redislabs/provider/redis/redisFunctions.scala

Lines changed: 10 additions & 1 deletion
```diff
@@ -1,12 +1,13 @@
 package com.redislabs.provider.redis
 
-import com.redislabs.provider.redis.streaming.RedisInputDStream
+import com.redislabs.provider.redis.streaming.{ConsumerConfig, RedisInputDStream, RedisStreamReceiver, StreamItem}
 import org.apache.spark.SparkContext
 import org.apache.spark.rdd.RDD
 import com.redislabs.provider.redis.rdd._
 import com.redislabs.provider.redis.util.PipelineUtils._
 import org.apache.spark.storage.StorageLevel
 import org.apache.spark.streaming.StreamingContext
+import org.apache.spark.streaming.dstream.InputDStream
 
 /**
   * RedisContext extends sparkContext's functionality with redis functions
@@ -456,6 +457,14 @@ class RedisStreamingContext(@transient val ssc: StreamingContext) extends Serial
                         (implicit redisConfig: RedisConfig = RedisConfig.fromSparkConf(ssc.sparkContext.getConf)) = {
     new RedisInputDStream(ssc, keys, storageLevel, redisConfig, classOf[String])
   }
+
+  def createRedisXStream(consumersConfig: Seq[ConsumerConfig],
+                         storageLevel: StorageLevel = StorageLevel.MEMORY_AND_DISK_2)
+                        (implicit redisConfig: RedisConfig = RedisConfig.fromSparkConf(ssc.sparkContext.getConf)): InputDStream[StreamItem] = {
+    val readWriteConfig = ReadWriteConfig.fromSparkConf(ssc.sparkContext.getConf)
+    val receiver = new RedisStreamReceiver(consumersConfig, redisConfig, readWriteConfig, storageLevel)
+    ssc.receiverStream(receiver)
+  }
 }
 
 trait RedisFunctions {
```
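
Because `redisConfig` is an implicit parameter defaulting to `RedisConfig.fromSparkConf`, a caller can point the stream at a different Redis by bringing an explicit config into scope. A minimal sketch, assuming `RedisEndpoint(host, port)` is this project's endpoint type accepted by the `RedisConfig` constructor:

```scala
// hypothetical override of the implicit default derived from the Spark conf
implicit val redisConfig: RedisConfig = new RedisConfig(RedisEndpoint("redis-host", 6379))

val items: InputDStream[StreamItem] =
  ssc.createRedisXStream(Seq(ConsumerConfig("events", "group-1", "consumer-1")))
```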

src/main/scala/com/redislabs/provider/redis/streaming/RedisInputDStream.scala

Lines changed: 3 additions & 0 deletions
```diff
@@ -12,6 +12,9 @@ import redis.clients.jedis._
 import scala.reflect.{ClassTag, classTag}
 import scala.util.control.NonFatal
 
+/**
+  * Receives messages from Redis List
+  */
 class RedisInputDStream[T: ClassTag](_ssc: StreamingContext,
                                      keys: Array[String],
                                      storageLevel: StorageLevel,
```
