[SPARK-4075][SPARK-4434] Fix the URI validation logic for Application Jar name. #3326

Closed
wants to merge 8 commits into from

sarutak
Member

@sarutak sarutak commented Nov 17, 2014

This PR adds a regression test for SPARK-4434.

@SparkQA

SparkQA commented Nov 17, 2014

Test build #23503 has started for PR 3326 at commit 6d4f47e.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Nov 17, 2014

Test build #23503 has finished for PR 3326 at commit 6d4f47e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23503/
Test PASSed.

@@ -24,6 +24,7 @@ class ClientSuite extends FunSuite with Matchers {
test("correctly validates driver jar URL's") {
ClientArguments.isValidJarUrl("http://someHost:8080/foo.jar") should be (true)
ClientArguments.isValidJarUrl("file://some/path/to/a/jarFile.jar") should be (true)
Contributor

Not your change, but this should not be a valid jar URL, right? My understanding is that one or three slashes are valid, but not two.

@andrewor14
Contributor

Hey @sarutak also are you resubmitting your original change for SPARK-4434?

@sarutak
Member Author

sarutak commented Nov 18, 2014

OK, I'll resubmit the original change, including this test case and a fix to handle the double-slashed file scheme.

In deploy.ClientArguments.isValidJarUrl, the URL is checked as follows:

    def isValidJarUrl(s: String): Boolean = s.matches("(.+):(.+)jar")

So it accepts URLs such as 'hdfs:file.jar' (no authority).
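
To make the problem concrete, here is a minimal sketch that wraps the regex quoted above in a standalone object and runs a few inputs through it. The object name `OldJarUrlCheck` and the sample strings other than 'hdfs:file.jar' are illustrative assumptions, not part of the Spark code base.

```scala
// Hypothetical wrapper around the regex-based check quoted above.
object OldJarUrlCheck {
  // Passes for any string shaped like "<something>:<something>jar".
  def isValidJarUrl(s: String): Boolean = s.matches("(.+):(.+)jar")

  def main(args: Array[String]): Unit = {
    // Illustrative inputs: a URL with an authority, one without, and a bare file name.
    Seq("hdfs://host/path/app.jar", "hdfs:file.jar", "app.jar").foreach { s =>
      println(s"$s -> ${isValidJarUrl(s)}")
    }
  }
}
```

The regex only looks for a colon somewhere before a trailing "jar", so it cannot tell whether a scheme, authority, or path is actually present.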

Author: Kousuke Saruta <[email protected]>

Closes apache#2925 from sarutak/uri-syntax-check-improvement and squashes the following commits:

cf06173 [Kousuke Saruta] Improved URI syntax checking
@SparkQA

SparkQA commented Nov 18, 2014

Test build #23520 has started for PR 3326 at commit 9e09da2.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Nov 18, 2014

Test build #23520 has finished for PR 3326 at commit 9e09da2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23520/
Test PASSed.

@andrewor14
Contributor

Thanks, can you rename the title of the PR now that this is not just tests anymore?

@sarutak sarutak changed the title [SPARK-4434] Add a test case as a regression check for SPARK-4434 [SPARK-4075][SPARK-4434] Fix the URI validation logic for Application Jar name. Nov 18, 2014
@sarutak
Member Author

sarutak commented Nov 18, 2014

Thanks for pointing that out. I've updated the title.

@@ -73,7 +75,7 @@ private[spark] class ClientArguments(args: Array[String]) {

if (!ClientArguments.isValidJarUrl(_jarUrl)) {
println(s"Jar url '${_jarUrl}' is not in valid format.")
-println(s"Must be a jar file path in URL format (e.g. hdfs://XX.jar, file://XX.jar)")
+println(s"Must be a jar file path in URL format (e.g. hdfs://XX.jar, file:///XX.jar)")
Contributor

If you're changing the file URL, you might as well change the hdfs one.

@vanzin
Contributor

vanzin commented Nov 18, 2014

Just nits w.r.t. naming of things, otherwise looks ok.

@SparkQA

SparkQA commented Nov 18, 2014

Test build #23538 has started for PR 3326 at commit 4f30210.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Nov 18, 2014

Test build #23538 has finished for PR 3326 at commit 4f30210.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23538/
Test FAILed.

@SparkQA

SparkQA commented Nov 18, 2014

Test build #23539 has started for PR 3326 at commit c1c80ca.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Nov 18, 2014

Test build #23539 has finished for PR 3326 at commit c1c80ca.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23539/
Test PASSed.

@SparkQA

SparkQA commented Nov 18, 2014

Test build #23552 has started for PR 3326 at commit 82bc9cc.

  • This patch merges cleanly.

ClientArguments.isValidJarUrl("file://some/path/to/a/jarFile.jar") should be (true)

// file scheme with authority and path is valid.
ClientArguments.isValidJarUrl("file://somehost/path/to/a/jarFile.jar") should be (true)
Contributor

Wait, why is this valid?

@andrewor14
Contributor

Hey @sarutak @vanzin I thought the following are valid:

file:/path/to/my.jar
file:///path/to/my.jar
hdfs://path/to/my.jar

but the following are not:

file://path/to/my.jar
hdfs:/path/to/my.jar
hdfs:///path/to/my.jar

Is this not the case?

@sarutak
Member Author

sarutak commented Nov 18, 2014

I think the double-slashed file scheme is valid according to the URI specification.
In the example URI you mentioned above (file://path/to/my.jar), "path" is recognized as the authority.

Or should we reject the double-slashed file scheme (file://), any single-slashed scheme (hdfs:/), and any triple-slashed scheme other than file (hdfs:///), regardless of the URI specification?

@vanzin
Contributor

vanzin commented Nov 18, 2014

@andrewor14 all the URIs you mention as not valid are actually valid.

  • file scheme with authority is valid. It won't do anything useful in most cases, but on Windows, for example, it might work (if it's pointing at an SMB share)
  • hdfs scheme with no host will probably default to fs.defaultFS, if I remember the HDFS code correctly (provided the scheme matches; otherwise HDFS will throw an exception).
  • /// means "default host", which is generally "localhost" but in HDFS might be the above.

So those are all valid URIs, which is why no exception is thrown when you construct the URI object.
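
As a rough illustration of that point, the sketch below feeds the six URIs listed earlier in this thread to `java.net.URI` and prints the scheme, authority, and path of each. The object name `UriParts` and the helper `parts` are hypothetical names used only for this example.

```scala
import java.net.URI

// Hypothetical helper for inspecting how java.net.URI splits a string.
object UriParts {
  // Returns (scheme, authority, path); the authority is null when the URI has none.
  def parts(s: String): (String, String, String) = {
    val uri = new URI(s) // throws URISyntaxException only for malformed input
    (uri.getScheme, uri.getAuthority, uri.getPath)
  }

  def main(args: Array[String]): Unit = {
    Seq(
      "file:/path/to/my.jar",
      "file:///path/to/my.jar",
      "hdfs://path/to/my.jar",
      "file://path/to/my.jar",
      "hdfs:/path/to/my.jar",
      "hdfs:///path/to/my.jar"
    ).foreach { s => println(s"$s -> ${parts(s)}") }
  }
}
```

All six strings parse without an exception. What differs is whether the segment after "//" ends up as the authority or as part of the path.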

@andrewor14
Contributor

I see. By "valid" here we mean the URI provides some combination of the scheme, the path and the authority. However, when you do two-slashes on file:, it actually interprets it differently from what I would otherwise assume.

scala> new URI("file:/path/to/my.jar").getPath
res9: String = /path/to/my.jar

scala> new URI("file://path/to/my.jar").getPath
res10: String = /to/my.jar

I suppose then it is the user's responsibility to not make this mistake. We can't prevent this because "path" could be a valid authority or hostname here and we won't be able to tell the difference.
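
One way to express that trade-off in code is to validate only what can be checked reliably: that the string parses as a URI, has a scheme, and has a path ending in ".jar". The sketch below is an assumption about what such a check could look like, not necessarily the code merged in this PR, and the object name `JarUrlCheckSketch` is hypothetical.

```scala
import java.net.{URI, URISyntaxException}

// Hypothetical URI-based validator; a sketch, not the merged implementation.
object JarUrlCheckSketch {
  def isValidJarUrl(s: String): Boolean =
    try {
      val uri = new URI(s)
      // Require a scheme and a hierarchical path that ends in ".jar".
      // Opaque URIs such as "hdfs:file.jar" have a null path and are rejected.
      uri.getScheme != null && uri.getPath != null && uri.getPath.endsWith(".jar")
    } catch {
      // Strings that are not syntactically valid URIs are rejected.
      case _: URISyntaxException => false
    }
}
```

Under this sketch, both file:///some/path/to/a/jarFile.jar and file://somehost/path/to/a/jarFile.jar pass, because the check does not try to guess whether the authority is a real host or a misplaced path segment.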

@vanzin
Contributor

vanzin commented Nov 18, 2014

However, when you do two-slashes on file:, it actually interprets it differently from what I would otherwise assume

That's because you assume wrong. :-) "//" is the indication that the next component is the authority. You can omit the authority by not specifying "//". "file" is not special. It works just like any other URI (http, hdfs or otherwise).

@SparkQA

SparkQA commented Nov 18, 2014

Test build #23552 has finished for PR 3326 at commit 82bc9cc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23552/
Test PASSed.

@andrewor14
Contributor

Ok, I'm merging this into master and 1.2. Not going to merge this into 1.1 because the benefit here is not worth potentially causing another regression.

davies pushed a commit to davies/spark that referenced this pull request Nov 18, 2014
… Jar name.

This PR adds a regression test for SPARK-4434.

Author: Kousuke Saruta <[email protected]>

Closes apache#3326 from sarutak/add-triple-slash-testcase and squashes the following commits:

82bc9cc [Kousuke Saruta] Fixed wrong grammar in comment
9149027 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into add-triple-slash-testcase
c1c80ca [Kousuke Saruta] Fixed style
4f30210 [Kousuke Saruta] Modified comments
9e09da2 [Kousuke Saruta] Fixed URI validation for jar file
d4b99ef [Kousuke Saruta] [SPARK-4075] [Deploy] Jar url validation is not enough for Jar file
ac79906 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into add-triple-slash-testcase
6d4f47e [Kousuke Saruta] Added a test case as a regression check for SPARK-4434
sarutak added a commit that referenced this pull request Nov 19, 2014
… Jar name.

This PR adds a regression test for SPARK-4434.

Author: Kousuke Saruta <[email protected]>

Closes #3326 from sarutak/add-triple-slash-testcase and squashes the following commits:

82bc9cc [Kousuke Saruta] Fixed wrong grammar in comment
9149027 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into add-triple-slash-testcase
c1c80ca [Kousuke Saruta] Fixed style
4f30210 [Kousuke Saruta] Modified comments
9e09da2 [Kousuke Saruta] Fixed URI validation for jar file
d4b99ef [Kousuke Saruta] [SPARK-4075] [Deploy] Jar url validation is not enough for Jar file
ac79906 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into add-triple-slash-testcase
6d4f47e [Kousuke Saruta] Added a test case as a regression check for SPARK-4434

(cherry picked from commit bfebfd8)
Signed-off-by: Andrew Or <[email protected]>
@andrewor14
Contributor

Hey @sarutak, mind closing this? It's already merged.

@sarutak
Member Author

sarutak commented Nov 19, 2014

Thanks for the notification. I'll close this PR.

@sarutak sarutak closed this Nov 19, 2014
@sarutak sarutak deleted the add-triple-slash-testcase branch April 11, 2015 05:20