Skip to content

[SPARK-12244][SPARK-12245][STREAMING] Rename trackStateByKey to mapWithState and change tracking function signature #10224

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 7 commits into from

Conversation

tdas
Copy link
Contributor

@tdas tdas commented Dec 9, 2015

SPARK-12244:

Based on feedback from early users and personal experience attempting to explain it, the name trackStateByKey had two problem.
"trackState" is a completely new term which really does not give any intuition on what the operation is
the resultant data stream of objects returned by the function is called in docs as the "emitted" data for the lack of a better.
"mapWithState" makes sense because the API is like a mapping function like (Key, Value) => T with State as an additional parameter. The resultant data stream is "mapped data". So both problems are solved.

SPARK-12245:

From initial experiences, not having the key in the function makes it hard to return mapped stuff, as the whole information of the records is not there. Basically the user is restricted to doing something like mapValue() instead of map(). So adding the key as a parameter.

@SparkQA
Copy link

SparkQA commented Dec 9, 2015

Test build #47426 has finished for PR 10224 at commit 68d1c64.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tdas tdas changed the title [WIP][SPARK-??][STREAMING] Renamed trackStateByKey to mapWithState [WIP][SPARK-??][STREAMING] Rename trackStateByKey to mapWithState Dec 9, 2015
@tdas tdas changed the title [WIP][SPARK-??][STREAMING] Rename trackStateByKey to mapWithState [SPARK-12244][STREAMING] Rename trackStateByKey to mapWithState Dec 9, 2015
@zsxwing
Copy link
Member

zsxwing commented Dec 9, 2015

Could you search trackState in the project and fix other places?

@tdas
Copy link
Contributor Author

tdas commented Dec 9, 2015

yeah. I am doing that. I havent updated the tests yet.

@@ -867,8 +865,8 @@ public void testTrackStateByAPI() {
JavaPairRDD<String, Boolean> initialRDD = null;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the method name testTrackStateByAPI should be renamed to testMapWithStateAPI

@tdas tdas changed the title [SPARK-12244][STREAMING] Rename trackStateByKey to mapWithState [SPARK-12244][SPARK-12245][STREAMING] Rename trackStateByKey to mapWithState and change tracking function signature Dec 9, 2015
* }
*
* val spec = StateSpec.function(trackingFunction).numPartitions(10)
* val spec = StateSpec.function(mappingFunction).numPartitions(10)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue as above

@SparkQA
Copy link

SparkQA commented Dec 9, 2015

Test build #47449 has finished for PR 10224 at commit 4e9f778.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Dec 9, 2015

Test build #47447 has finished for PR 10224 at commit 13b8cb4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

* JavaTrackStateDStream<Integer, Integer, Integer, String> trackStateDStream =
* keyValueDStream.<Integer, String>trackStateByKey(
* StateSpec.function(trackStateFunc).numPartitions(10));
* JavaMapWithStateDStream<Integer, Integer, Integer, String> mapWithStateDStream =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the key type is String now.

@zsxwing
Copy link
Member

zsxwing commented Dec 10, 2015

LGTM except some nits

@SparkQA
Copy link

SparkQA commented Dec 10, 2015

Test build #2191 has finished for PR 10224 at commit 495b982.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Dec 10, 2015

Test build #47460 has finished for PR 10224 at commit 6f5c694.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Dec 10, 2015

Test build #47467 has finished for PR 10224 at commit 6d29b7a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zsxwing
Copy link
Member

zsxwing commented Dec 10, 2015

LGTM

@zsxwing
Copy link
Member

zsxwing commented Dec 10, 2015

merging to master and 1.6

@asfgit asfgit closed this in bd2cd4f Dec 10, 2015
asfgit pushed a commit that referenced this pull request Dec 10, 2015
…thState and change tracking function signature

SPARK-12244:

Based on feedback from early users and personal experience attempting to explain it, the name trackStateByKey had two problem.
"trackState" is a completely new term which really does not give any intuition on what the operation is
the resultant data stream of objects returned by the function is called in docs as the "emitted" data for the lack of a better.
"mapWithState" makes sense because the API is like a mapping function like (Key, Value) => T with State as an additional parameter. The resultant data stream is "mapped data". So both problems are solved.

SPARK-12245:

From initial experiences, not having the key in the function makes it hard to return mapped stuff, as the whole information of the records is not there. Basically the user is restricted to doing something like mapValue() instead of map(). So adding the key as a parameter.

Author: Tathagata Das <[email protected]>

Closes #10224 from tdas/rename.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants