[SPARK-12244][SPARK-12245][STREAMING] Rename trackStateByKey to mapWithState and change tracking function signature #10224
Test build #47426 has finished for PR 10224 at commit
Could you search |
Yeah, I am doing that. I haven't updated the tests yet.
@@ -867,8 +865,8 @@ public void testTrackStateByAPI() {
JavaPairRDD<String, Boolean> initialRDD = null;
nit: the method name testTrackStateByAPI should be renamed to testMapWithStateAPI
* }
*
* val spec = StateSpec.function(trackingFunction).numPartitions(10)
* val spec = StateSpec.function(mappingFunction).numPartitions(10)
Same issue as above
Test build #47449 has finished for PR 10224 at commit
Test build #47447 has finished for PR 10224 at commit
* JavaTrackStateDStream<Integer, Integer, Integer, String> trackStateDStream =
* keyValueDStream.<Integer, String>trackStateByKey(
* StateSpec.function(trackStateFunc).numPartitions(10));
* JavaMapWithStateDStream<Integer, Integer, Integer, String> mapWithStateDStream =
nit: the key type is String now.
LGTM except some nits
Test build #2191 has finished for PR 10224 at commit
Test build #47460 has finished for PR 10224 at commit
Test build #47467 has finished for PR 10224 at commit
LGTM
merging to master and 1.6
…thState and change tracking function signature
SPARK-12244:
Based on feedback from early users and personal experience attempting to explain it, the name trackStateByKey had two problems.
First, "trackState" is a completely new term that gives no intuition about what the operation does.
Second, the resultant data stream of objects returned by the function is called the "emitted" data in the docs, for lack of a better term.
"mapWithState" makes sense because the API is like a mapping function (Key, Value) => T with State as an additional parameter. The resultant data stream is the "mapped data". So both problems are solved.
SPARK-12245:
From initial experience, not having the key in the function makes it hard to return mapped data, because the function does not see the whole record. Basically, the user is restricted to doing something like mapValues() instead of map(). So the key is added as a parameter.
Author: Tathagata Das <[email protected]>
Closes #10224 from tdas/rename.
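To make the shape of the new signature concrete, here is a minimal stdlib-only sketch of a mapping function of the form (Key, Value, State) => T. It is not Spark's implementation and does not use Spark classes: MapWithStateSketch, KeyState, MappingFunction and runningCounts are hypothetical names invented for illustration. It only shows why having the key available lets the function return a full mapped record instead of a value-only result.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative model only; these types are assumptions, not Spark's API.
final class MapWithStateSketch {

    /** Hypothetical per-key state handle (a stand-in, not Spark's State class). */
    static final class KeyState<S> {
        private S value;
        boolean exists() { return value != null; }
        S get() { return value; }
        void update(S newValue) { value = newValue; }
    }

    /** Mapping function with the key as an explicit parameter: (key, value, state) => T. */
    interface MappingFunction<K, V, S, T> {
        T apply(K key, V value, KeyState<S> state);
    }

    /** Applies fn to each record in order, keeping one state object per key. */
    static <K, V, S, T> List<T> mapWithState(
            List<Map.Entry<K, V>> records, MappingFunction<K, V, S, T> fn) {
        Map<K, KeyState<S>> states = new HashMap<>();
        List<T> mapped = new ArrayList<>();
        for (Map.Entry<K, V> record : records) {
            KeyState<S> state =
                states.computeIfAbsent(record.getKey(), k -> new KeyState<>());
            mapped.add(fn.apply(record.getKey(), record.getValue(), state));
        }
        return mapped;
    }

    /** Running count per word; the mapped record includes the key itself. */
    static List<String> runningCounts(List<Map.Entry<String, Integer>> records) {
        return MapWithStateSketch.<String, Integer, Integer, String>mapWithState(
            records,
            (word, count, state) -> {
                int total = (state.exists() ? state.get() : 0) + count;
                state.update(total);
                return word + "=" + total; // only possible because the key is passed in
            });
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> records =
            List.of(Map.entry("a", 1), Map.entry("b", 2), Map.entry("a", 3));
        System.out.println(runningCounts(records)); // prints [a=1, b=2, a=4]
    }
}
```

If the function received only (value, state), the returned record could not mention the word at all, which is the mapValues()-style restriction SPARK-12245 describes.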