-
Notifications
You must be signed in to change notification settings - Fork 28.7k
[SPARK-7663][MLlib] Add requirement for word2vec model #6228
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Merged build triggered. |
Merged build started. |
Test build #32978 has started for PR 6228 at commit |
@@ -410,6 +410,9 @@ class Word2Vec extends Serializable with Logging { | |||
i += 1 | |||
} | |||
|
|||
require(word2VecMap.size > 0, "The word2vec map should not be empty. You may need to check " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be nonEmpty
, but isn't this already determined by whether vocabSize == 0
? I don't know this code well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be checked by the vocabSize
, but actually it doesn't. You are right, I think the check should be added in this line https://github.com/yinxusen/spark/blob/SPARK-7663/mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala#L160 to prevent the following computations.
Merged build triggered. |
Merged build started. |
Test build #32980 has started for PR 6228 at commit |
Test build #32978 has finished for PR 6228 at commit
|
Merged build finished. Test PASSed. |
Test PASSed. |
Test build #32980 has finished for PR 6228 at commit
|
Merged build finished. Test PASSed. |
Test PASSed. |
@@ -158,6 +158,9 @@ class Word2Vec extends Serializable with Logging { | |||
.sortWith((a, b) => a.cn > b.cn) | |||
|
|||
vocabSize = vocab.length | |||
require(vocabSize > 0, "The vocabulary size should be large than 0. You may need to check " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be a RuntimeException (rather than an IllegalArgumentException)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's reasonable. Changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, I'd use require
here. IllegalArgumentException
is a RuntimeException
, and I've always understood it to be slightly bad style to throw RuntimeException
as you can't catch for it with catching for any RuntimeException
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@srowen Do you have a better subclass of RuntimeException to suggest? I feel like there was no illegal argument given here. But looking at a list of subclasses [https://docs.oracle.com/javase/7/docs/api/java/lang/RuntimeException.html], I can't find one I like better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this is real minor point of taste. I use IllegalStateException
when there's no real argument to speak of, which is marginally more specific and description. But isn't this error caused by a bad input RDD? IllegalArgumentException
seems like just the thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Never mind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jkbradley I have reverted it back.
Merged build triggered. |
Merged build started. |
Test build #33051 has started for PR 6228 at commit |
Test build #33051 has finished for PR 6228 at commit
|
Merged build finished. Test PASSed. |
Test PASSed. |
Test build #834 has started for PR 6228 at commit |
Merged build triggered. |
Merged build started. |
Test build #33119 has started for PR 6228 at commit |
Test build #834 has finished for PR 6228 at commit
|
Test build #33119 has finished for PR 6228 at commit
|
Merged build finished. Test FAILed. |
Test FAILed. |
Jenkins, retest this please |
Merged build triggered. |
Merged build started. |
Test build #33131 has started for PR 6228 at commit |
Test build #33131 has finished for PR 6228 at commit
|
Merged build finished. Test PASSed. |
Test PASSed. |
JIRA issue [link](https://issues.apache.org/jira/browse/SPARK-7663). We should check the model size of word2vec, to prevent the unexpected empty. CC srowen. Author: Xusen Yin <[email protected]> Closes apache#6228 from yinxusen/SPARK-7663 and squashes the following commits: 21770c5 [Xusen Yin] check the vocab size 54ae63e [Xusen Yin] add requirement for word2vec model
JIRA issue [link](https://issues.apache.org/jira/browse/SPARK-7663). We should check the model size of word2vec, to prevent the unexpected empty. CC srowen. Author: Xusen Yin <[email protected]> Closes apache#6228 from yinxusen/SPARK-7663 and squashes the following commits: 21770c5 [Xusen Yin] check the vocab size 54ae63e [Xusen Yin] add requirement for word2vec model
JIRA issue [link](https://issues.apache.org/jira/browse/SPARK-7663). We should check the model size of word2vec, to prevent the unexpected empty. CC srowen. Author: Xusen Yin <[email protected]> Closes apache#6228 from yinxusen/SPARK-7663 and squashes the following commits: 21770c5 [Xusen Yin] check the vocab size 54ae63e [Xusen Yin] add requirement for word2vec model
JIRA issue link.
We should check the model size of word2vec, to prevent the unexpected empty.
CC @srowen.