Commit bef0591

mvogiatzis authored and tdas committed
[DOCS] Added important updateStateByKey details
Runs for *all* existing keys and returning "None" will remove the key-value pair.

Author: Michael Vogiatzis <[email protected]>

Closes #7229 from mvogiatzis/patch-1 and squashes the following commits:

e7a2946 [Michael Vogiatzis] Updated updateStateByKey text
00283ed [Michael Vogiatzis] Removed space
c2656f9 [Michael Vogiatzis] Moved description farther up
0a42551 [Michael Vogiatzis] Added important updateStateByKey details

(cherry picked from commit d538919)
Signed-off-by: Tathagata Das <[email protected]>
1 parent 2f2f9da commit bef0591

1 file changed: +2 -0 lines changed


docs/streaming-programming-guide.md

Lines changed: 2 additions & 0 deletions
@@ -854,6 +854,8 @@ it with new information. To use this, you will have to do two steps.
 1. Define the state update function - Specify with a function how to update the state using the
 previous state and the new values from an input stream.
 
+In every batch, Spark will apply the state update function for all existing keys, regardless of whether they have new data in a batch or not. If the update function returns `None` then the key-value pair will be eliminated.
+
 Let's illustrate this with an example. Say you want to maintain a running count of each word
 seen in a text data stream. Here, the running count is the state and it is an integer. We
 define the update function as:
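
Reviewer note: the behavior described by the added paragraph can be sketched in plain Python. This is only a simulation of the per-batch semantics, not Spark code and not the example from the guide. The names `apply_batch` and `update_count` are hypothetical, and the policy of expiring a key after one batch without new data is an assumption chosen to make the `None` case visible.

```python
# Pure-Python simulation of the semantics described in the added paragraph.
# NOT Spark code: apply_batch and update_count are hypothetical names.

def update_count(new_values, running_count):
    """Running count that drops a key after a batch with no new data (assumed policy)."""
    if not new_values and running_count is not None:
        return None  # returning None eliminates the key-value pair
    return sum(new_values) + (running_count or 0)


def apply_batch(state, batch, update_func):
    """Apply update_func to every existing key and to every key new in this batch."""
    grouped = {}
    for key, value in batch:
        grouped.setdefault(key, []).append(value)

    new_state = {}
    # Existing keys are visited even when the batch holds no data for them.
    for key in set(state) | set(grouped):
        result = update_func(grouped.get(key, []), state.get(key))
        if result is not None:
            new_state[key] = result
    return new_state


state = apply_batch({}, [("a", 1), ("b", 1), ("a", 1)], update_count)
print(sorted(state.items()))  # [('a', 2), ('b', 1)]

# "b" has no new data in this batch, yet the update function still runs for it.
state = apply_batch(state, [("a", 1)], update_count)
print(sorted(state.items()))  # [('a', 3)]
```

In the second batch the update function is still called for `"b"` with an empty list of new values. Because it returns `None`, the pair is removed from the state, which is the case the added documentation text calls out.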
