In order to load the keys back, you also need to specify the `key.column` parameter when reading:

```scala
val df = spark.read
  .format("org.apache.spark.sql.redis")
  .option("table", "person")
  .option("key.column", "name")
  .load()
```

### Save Modes
Spark-redis supports all DataFrame [SaveMode](https://spark.apache.org/docs/latest/sql-programming-guide.html#save-modes)s: `Append`, `Overwrite`, `ErrorIfExists` and `Ignore`.
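As an illustration, a sketch of a write with an explicit save mode (reusing the `person` table name from the examples above; exact behavior depends on your Spark and spark-redis versions):

```scala
import org.apache.spark.sql.SaveMode

// Overwrite replaces any existing "person" table data in Redis
df.write
  .format("org.apache.spark.sql.redis")
  .option("table", "person")
  .mode(SaveMode.Overwrite)
  .save()
```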
It also enables projection query optimization when only a small subset of columns is queried. However, there is a limitation with the Hash model - it doesn't support nested DataFrame schemas. One option to overcome it is making your DataFrame schema flat. If that is not possible due to some constraints, you may consider using the Binary persistence model.
With the Binary persistence model the DataFrame row is serialized into a byte array and stored as a string in Redis (the default Java serialization is used). This implies that the storage model is private to the spark-redis library and data cannot be easily queried from non-Spark environments. Another drawback of the Binary model is a larger memory footprint.
To enable the Binary model, use `option("model", "binary")`, e.g.
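A sketch of such a write (the `person` table name is carried over from the earlier examples):

```scala
// Persist rows as serialized byte arrays instead of Redis hashes
df.write
  .format("org.apache.spark.sql.redis")
  .option("table", "person")
  .option("model", "binary")
  .save()
```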
There are two options for reading a DataFrame:

To read a previously saved DataFrame, specify the table name that was used for saving. Example:
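A minimal sketch, assuming the DataFrame was written to the `person` table as in the write examples:

```scala
val df = spark.read
  .format("org.apache.spark.sql.redis")
  .option("table", "person")
  .load()
```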
If the `key.column` option was used for writing, then it should also be used for reading the table back. See [Specifying Redis key](#specifying-redis-key) for details.
To read with Spark SQL:

```scala
spark.sql(
  s"""CREATE TEMPORARY VIEW person (name STRING, age INT)
     |  USING org.apache.spark.sql.redis OPTIONS (table 'person', key.column 'name')
     |""".stripMargin)
val loadedDf = spark.sql(s"SELECT * FROM person")
```
To read Redis Hashes you have to provide a keys pattern with the `.option("keys.pattern", keysPattern)` option. The DataFrame schema should be explicitly specified, or it can be inferred from a random row.

```bash
hset person:1 name John age 30
hset person:2 name Peter age 45
```

An example of providing an explicit schema and specifying `key.column`:

```scala
import org.apache.spark.sql.types._

val df = spark.read
  .format("org.apache.spark.sql.redis")
  .schema(
    StructType(Array(
      StructField("id", IntegerType),
      StructField("name", StringType),
      StructField("age", IntegerType))
    )
  )
  .option("keys.pattern", "person:*")
  .option("key.column", "id")
  .load()

df.show()
```

```bash
+---+-----+---+
| id| name|age|
+---+-----+---+
|  1| John| 30|
|  2|Peter| 45|
+---+-----+---+
```

Spark-Redis tries to extract the key based on the key pattern:
- if the pattern ends with `*` and it's the only wildcard, the trailing substring will be extracted
- otherwise there is no extraction - the key is kept as is, e.g.

```scala
val df = // code omitted...
  .option("keys.pattern", "p*:*")
  .option("key.column", "id")
  .load()
df.show()
```

```bash
+-----+---+------------+
| name|age|          id|
+-----+---+------------+
| John| 30| person:John|
|Peter| 45|person:Peter|
+-----+---+------------+
```

Another option is to let spark-redis automatically infer the schema based on a random row. In this case all columns will have the `String` type. Also, we don't specify the `key.column` option in this example, so the column `_id` will be created.
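A sketch of such a read, assuming the same `person:*` hashes as above (option names match the parameters table in this document):

```scala
val df = spark.read
  .format("org.apache.spark.sql.redis")
  .option("keys.pattern", "person:*")
  .option("infer.schema", true)
  .load()

// All fields are inferred as String, and an auto-generated _id column holds the key
df.printSchema()
```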
| Name | Description | Type | Default |
| ---- | ----------- | ---- | ------- |
| model | defines the Redis model used to persist DataFrame, see [Persistence model](#persistence-model) | `enum [binary, hash]` | `hash` |
| partitions.number | number of partitions (applies only when reading a DataFrame) | `Int` | `3` |
| key.column | when writing - specifies the unique column used as a Redis key, by default a key is auto-generated. <br/> When reading - specifies the column name to store the hash key | `String` | - |
| ttl | data time to live in `seconds`. Data doesn't expire if `ttl` is less than `1` | `Int` | `0` |
| infer.schema | infer schema from a random row, all columns will have the `String` type | `Boolean` | `false` |
| max.pipeline.size | maximum number of commands per pipeline (used to batch commands) | `Int` | `100` |
| scan.count | count option of the SCAN command (used to iterate over keys) | `Int` | `100` |
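
For instance, a hedged sketch combining a couple of these options on a write (the `person` table name is illustrative):

```scala
// Rows expire 60 seconds after being written; hash is the default model anyway
df.write
  .format("org.apache.spark.sql.redis")
  .option("table", "person")
  .option("model", "hash")
  .option("ttl", 60)
  .save()
```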
## Known limitations
- Nested DataFrame fields are not currently supported with the Hash model. Consider making the DataFrame schema flat or using the Binary persistence model.