[SPARK-7663][MLlib] Add requirement for word2vec model #6228

Closed
wants to merge 2 commits into from

yinxusen
Contributor

JIRA issue [link](https://issues.apache.org/jira/browse/SPARK-7663).

We should check the size of the word2vec model, to guard against an unexpectedly empty model.

CC @srowen.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@SparkQA

SparkQA commented May 18, 2015

Test build #32978 has started for PR 6228 at commit 54ae63e.

@@ -410,6 +410,9 @@ class Word2Vec extends Serializable with Logging {
i += 1
}

require(word2VecMap.size > 0, "The word2vec map should not be empty. You may need to check " +
Member

This could be nonEmpty, but isn't this already determined by whether vocabSize == 0? I don't know this code well.

Contributor Author

This should be caught by a vocabSize check, but the code doesn't actually do one. You are right; I think the check should be added at this line https://github.com/yinxusen/spark/blob/SPARK-7663/mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala#L160 to prevent the subsequent computations from running.
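
To illustrate the fail-fast idea, here is a minimal sketch, not the actual Word2Vec code. `VocabWord`, `VocabCheck.buildVocab`, the `minCount` parameter, and the error message are hypothetical stand-ins. The only thing taken from the diff is the idea of checking the vocabulary right after it is built, so an empty vocabulary is rejected before any further computation.

```scala
// Hypothetical stand-in for a vocabulary entry: a word and its count.
final case class VocabWord(word: String, cn: Int)

object VocabCheck {
  // Hypothetical helper: counts words, drops those seen fewer than
  // `minCount` times, and sorts by descending count.
  def buildVocab(words: Seq[String], minCount: Int): Array[VocabWord] = {
    val vocab = words
      .groupBy(identity)
      .map { case (w, occurrences) => VocabWord(w, occurrences.size) }
      .filter(_.cn >= minCount)
      .toArray
      .sortWith((a, b) => a.cn > b.cn)

    // Fail immediately if filtering removed every word, instead of
    // letting later stages run on an empty vocabulary.
    require(vocab.nonEmpty,
      "vocabulary is empty; minCount may be too high for this input")
    vocab
  }
}
```

With this shape, `VocabCheck.buildVocab(Seq("a", "a", "b"), 2)` keeps only the word `a`, while a `minCount` larger than every word count throws before anything else runs.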

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@SparkQA

SparkQA commented May 18, 2015

Test build #32980 has started for PR 6228 at commit 21770c5.

@SparkQA

SparkQA commented May 18, 2015

Test build #32978 has finished for PR 6228 at commit 54ae63e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class SerializableAWSCredentials(accessKeyId: String, secretKey: String)

@AmplabJenkins

Merged build finished. Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32978/

@SparkQA

SparkQA commented May 18, 2015

Test build #32980 has finished for PR 6228 at commit 21770c5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Merged build finished. Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32980/

@@ -158,6 +158,9 @@ class Word2Vec extends Serializable with Logging {
.sortWith((a, b) => a.cn > b.cn)

vocabSize = vocab.length
require(vocabSize > 0, "The vocabulary size should be larger than 0. You may need to check " +
Member

Should this be a RuntimeException (rather than an IllegalArgumentException)?

Contributor Author

That's reasonable. Changed.

Member

Hm, I'd use require here. IllegalArgumentException is a RuntimeException, and I've always understood it to be slightly bad style to throw a bare RuntimeException, as you can't catch it without catching every RuntimeException.

Member

@srowen Do you have a better subclass of RuntimeException to suggest? I feel like there was no illegal argument given here. But looking at a list of subclasses [https://docs.oracle.com/javase/7/docs/api/java/lang/RuntimeException.html], I can't find one I like better.

Member

Yeah, this is a really minor point of taste. I use IllegalStateException when there's no real argument to speak of, which is marginally more specific and descriptive. But isn't this error caused by a bad input RDD? IllegalArgumentException seems like just the thing.
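
For reference, a small sketch of the behaviour being discussed. `RequireDemo`, `checkVocab`, `describe`, and the message text are hypothetical. The facts relied on are that Scala's `require` throws `IllegalArgumentException` with the message prefixed by `requirement failed: `, and that `IllegalArgumentException` is a subclass of `RuntimeException`, so callers can catch the specific type without catching every `RuntimeException`.

```scala
object RequireDemo {
  // Hypothetical check with the same shape as the one in the diff.
  def checkVocab(vocabSize: Int): Int = {
    require(vocabSize > 0, "vocabulary must not be empty")
    vocabSize
  }

  // Catch the specific exception first; the broader RuntimeException
  // case would only see other, unrelated runtime failures.
  def describe(vocabSize: Int): String =
    try {
      checkVocab(vocabSize).toString
    } catch {
      case e: IllegalArgumentException =>
        s"IllegalArgumentException: ${e.getMessage}"
      case _: RuntimeException =>
        "some other RuntimeException"
    }

  // True: IllegalArgumentException extends RuntimeException.
  val isRuntimeException: Boolean =
    classOf[RuntimeException].isAssignableFrom(classOf[IllegalArgumentException])
}
```

Throwing a bare `RuntimeException` instead would leave callers with only the second, catch-all branch.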

Member

Sounds good. @yinxusen Sorry for asking you to revert your change, but can you please stick with "require" per @srowen's advice? Thank you!

Contributor Author

Never mind.

Contributor Author

@jkbradley I have reverted it.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@SparkQA

SparkQA commented May 19, 2015

Test build #33051 has started for PR 6228 at commit 6210125.

@SparkQA

SparkQA commented May 19, 2015

Test build #33051 has finished for PR 6228 at commit 6210125.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Merged build finished. Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/33051/

@SparkQA

SparkQA commented May 20, 2015

Test build #834 has started for PR 6228 at commit 6210125.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@SparkQA

SparkQA commented May 20, 2015

Test build #33119 has started for PR 6228 at commit 21770c5.

@SparkQA

SparkQA commented May 20, 2015

Test build #834 has finished for PR 6228 at commit 6210125.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 20, 2015

Test build #33119 has finished for PR 6228 at commit 21770c5.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Merged build finished. Test FAILed.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/33119/

@srowen
Member

srowen commented May 20, 2015

Jenkins, retest this please

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@SparkQA

SparkQA commented May 20, 2015

Test build #33131 has started for PR 6228 at commit 21770c5.

@SparkQA

SparkQA commented May 20, 2015

Test build #33131 has finished for PR 6228 at commit 21770c5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Merged build finished. Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/33131/

@asfgit closed this in b3abf0b May 20, 2015
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request May 28, 2015
JIRA issue [link](https://issues.apache.org/jira/browse/SPARK-7663).

We should check the size of the word2vec model, to guard against an unexpectedly empty model.

CC srowen.

Author: Xusen Yin <[email protected]>

Closes apache#6228 from yinxusen/SPARK-7663 and squashes the following commits:

21770c5 [Xusen Yin] check the vocab size
54ae63e [Xusen Yin] add requirement for word2vec model
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request Jun 12, 2015
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015

5 participants