[SPARK-7237] [SPARK-7741] [Core] [Streaming] Clean more closures that need cleaning #6269

Closed
andrewor14 wants to merge 9 commits from andrewor14/clean-moar

andrewor14
Contributor

SPARK-7741 is the equivalent of SPARK-7237 in streaming. This is an alternative to #6268.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@SparkQA

SparkQA commented May 19, 2015

Test build #33099 has started for PR 6269 at commit 5431f61.

@SparkQA

SparkQA commented May 19, 2015

Test build #33099 has finished for PR 6269 at commit 5431f61.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Merged build finished. Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/33099/
Test PASSed.

@tdas
Contributor

tdas commented May 19, 2015

foreachRDD is missing.

@andrewor14
Contributor Author

Good catch

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@SparkQA

SparkQA commented May 20, 2015

Test build #33106 has started for PR 6269 at commit 328139b.

@SparkQA

SparkQA commented May 20, 2015

Test build #33106 has finished for PR 6269 at commit 328139b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Merged build finished. Test FAILed.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/33106/
Test FAILed.

@andrewor14
Contributor Author

Okay, I added tests to verify that closures are being cleaned in DStream operations. @tdas PTAL.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@SparkQA

SparkQA commented May 20, 2015

Test build #33117 has started for PR 6269 at commit d18c9f9.

@andrewor14
Contributor Author

Regarding foreachRDD, I'm actually pretty confused about one thing. Currently, when we clean the foreachRDD closure, we don't check serializability, for the following reason:

// because the DStream is reachable from the outer object here, and because 
// DStreams can't be serialized with closures, we can't proactively check 
// it for serializability and so we pass the optional false to SparkContext.clean
new ForEachDStream(this, context.sparkContext.clean(foreachFunc, false)).register()

This implies that the closure can never be serialized, which is true because we pull the whole ssc into the closure in ForEachDStream#generateJob:

val jobFunc = () => createRDDWithLocalProperties(time) {
  ssc.sparkContext.setCallSite(creationSite)
  foreachFunc(rdd, time)
}

So, my question is the following:
What's the point in cleaning this closure if it can never be made serializable?

(By the way, this is a separate issue that we can address later. For now I'm OK with just not checking serializability in foreachRDD, which is arbitrary but doesn't really hurt.)
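
To make the serializability point above concrete, here is a minimal sketch in plain Scala with no Spark dependency. It assumes Scala 2.12 or later, where function literals are serializable by default. `FakeContext`, `FakeDStream`, and `isSerializable` are hypothetical names invented for this illustration and are not part of Spark. The helper only approximates the kind of proactive check that passing `false` to `SparkContext.clean` skips.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object ClosureSerializationSketch {

  // Hypothetical stand-in for a context object (like ssc) that is not serializable.
  class FakeContext(val callSite: String)

  // Hypothetical stand-in for a DStream; it is not serializable either.
  class FakeDStream(ctx: FakeContext) {
    // Refers to ctx, which is a field of this object, so the closure captures
    // `this` and, through it, the non-serializable context.
    def leakyFunc: Int => String = x => s"${ctx.callSite}:$x"

    // Copies the needed value into a local first, so the closure captures
    // only a String and never touches `this`.
    def tightFunc: Int => String = {
      val site = ctx.callSite
      x => s"$site:$x"
    }
  }

  // Assumed approximation of a proactive check: attempt Java serialization
  // and report whether it succeeded.
  def isSerializable(f: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try { out.writeObject(f); true }
    catch { case _: NotSerializableException => false }
    finally out.close()
  }

  def main(args: Array[String]): Unit = {
    val stream = new FakeDStream(new FakeContext("demo"))
    println(s"leakyFunc serializable: ${isSerializable(stream.leakyFunc)}")
    println(s"tightFunc serializable: ${isSerializable(stream.tightFunc)}")
  }
}
```

The jobFunc closure quoted above resembles `leakyFunc`: it refers to `ssc` and `creationSite` through the enclosing object, so a check like this one would be expected to reject it however the user's `foreachFunc` is written.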

@SparkQA

SparkQA commented May 20, 2015

Test build #33117 has finished for PR 6269 at commit d18c9f9.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class SimpleFunctionRegistry(val conf: CatalystConf) extends FunctionRegistry

@AmplabJenkins

Merged build finished. Test FAILed.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/33117/
Test FAILed.

@andrewor14
Contributor Author

retest this please

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@SparkQA

SparkQA commented May 20, 2015

Test build #33127 has started for PR 6269 at commit 79a435b.

@SparkQA

SparkQA commented May 20, 2015

Test build #33127 has finished for PR 6269 at commit 79a435b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Merged build finished. Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/33127/
Test PASSed.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@SparkQA

SparkQA commented May 20, 2015

Test build #33160 has started for PR 6269 at commit c51c9ab.

@andrewor14 changed the title from "[SPARK-7237] [SPARK-7741] Clean more closures that need cleaning" to "[SPARK-7237] [SPARK-7741] [Core] [Streaming] Clean more closures that need cleaning" on May 20, 2015

@SparkQA

SparkQA commented May 20, 2015

Test build #33160 has finished for PR 6269 at commit c51c9ab.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class MultilabelMetrics(JavaModelWrapper):
    • class GroupedData protected[sql](

@AmplabJenkins

Merged build finished. Test FAILed.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/33160/
Test FAILed.

@andrewor14
Contributor Author

retest this please

@tdas
Contributor

tdas commented May 20, 2015

LGTM!

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@SparkQA

SparkQA commented May 20, 2015

Test build #33169 has started for PR 6269 at commit c51c9ab.

@SparkQA

SparkQA commented May 20, 2015

Test build #33169 has finished for PR 6269 at commit c51c9ab.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Merged build finished. Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/33169/
Test PASSed.

asfgit pushed a commit that referenced this pull request May 20, 2015
… need cleaning

SPARK-7741 is the equivalent of SPARK-7237 in streaming. This is an alternative to #6268.

Author: Andrew Or <[email protected]>

Closes #6269 from andrewor14/clean-moar and squashes the following commits:

c51c9ab [Andrew Or] Add periods (trivial)
6c686ac [Andrew Or] Merge branch 'master' of github.com:apache/spark into clean-moar
79a435b [Andrew Or] Fix tests
d18c9f9 [Andrew Or] Merge branch 'master' of github.com:apache/spark into clean-moar
65ef07b [Andrew Or] Fix tests?
4b487a3 [Andrew Or] Add tests for closures passed to DStream operations
328139b [Andrew Or] Do not forget foreachRDD
5431f61 [Andrew Or] Clean streaming closures
72b7b73 [Andrew Or] Clean core closures

(cherry picked from commit 9b84443)
Signed-off-by: Tathagata Das <[email protected]>

@asfgit closed this in 9b84443 on May 20, 2015
@andrewor14 deleted the clean-moar branch on May 20, 2015 23:27
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request May 28, 2015
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request Jun 12, 2015
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015