Add stop words removers #726
There's already a WIP for this feature: #716
This one looks further ahead, and I didn't update the list, so I'll scrap mine. One thing, though: can you also add TransformSchema, get/set Locale, get/set CaseSensitive and get/set StopWords? At a minimum, Get/SetStopWords will be required, I think; otherwise the English stop words are the only stop words that could be used.
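To make the requested parameters concrete, here is a minimal, Spark-independent C# sketch of the per-row filtering rule that Spark's StopWordsRemover applies: a token is dropped if it is in the stop-word list, compared exactly when case sensitivity is on, and otherwise compared after lower-casing both sides with the configured locale. StopWordsSketch and RemoveStopWords are hypothetical names, not part of Microsoft.Spark; the stop-word list is passed in explicitly (the point of Get/SetStopWords above), and a .NET CultureInfo stands in for the locale string Spark itself takes.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical helper, not part of Microsoft.Spark: it only mirrors the per-row
// filtering rule of Spark's StopWordsRemover so the parameters are easy to try out.
public static class StopWordsSketch
{
    public static string[] RemoveStopWords(
        IEnumerable<string> tokens,
        IEnumerable<string> stopWords,
        bool caseSensitive = false,
        CultureInfo culture = null)
    {
        if (caseSensitive)
        {
            // Exact (ordinal) match: "The" is kept when only "the" is a stop word.
            var exact = new HashSet<string>(stopWords, StringComparer.Ordinal);
            return tokens.Where(t => !exact.Contains(t)).ToArray();
        }

        // Case-insensitive: lower-case both the stop words and every token with the
        // same culture before comparing. Null tokens are compared as-is.
        TextInfo text = (culture ?? CultureInfo.InvariantCulture).TextInfo;
        string Lower(string s) => s == null ? null : text.ToLower(s);

        var lowered = new HashSet<string>(stopWords.Select(w => Lower(w)), StringComparer.Ordinal);
        return tokens.Where(t => !lowered.Contains(Lower(t))).ToArray();
    }
}
```

With the tokens { "The", "quick", "fox", "and", "the", "dog" } and the stop words { "the", "and" }, the default (case-insensitive) call keeps quick, fox and dog, while passing caseSensitive: true also keeps "The".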
@GoEddie Sounds good. Can you help drive this PR to completion?
@GoEddie Because of the error "Type Microsoft.Spark.Sql.Types.StructType not supported yet", I'm unable to complete TransformSchema; everything else is done. Can you review and complete this PR?
I thought of contributing to #26 for StructType support first; then I'll finish the TransformSchema method.
CountVectorizerModel also shows how to implement TransformSchema.
@GoEddie TransformSchema is done.
src/csharp/Microsoft.Spark.E2ETest/IpcTests/ML/Feature/StopWordsRemoverTests.cs
@Niharikadutta I implemented your suggestions.
Is it also possible to implement loadDefaultStopWords? https://spark.apache.org/docs/2.4.6/api/java/org/apache/spark/ml/feature/StopWordsRemover.html#loadDefaultStopWords-java.lang.String-
@GoEddie Changes implemented. Could you have a look?
I'd say the implementation looks good, maybe with some formatting changes, but I can't approve it as I am not a maintainer :)
Co-authored-by: Steve Suh <[email protected]>
…dsRemoverTests.cs Co-authored-by: Steve Suh <[email protected]>
Just a couple nits, but LGTM. Thanks @SARAVANA1501 !
…dsRemoverTests.cs Co-authored-by: Steve Suh <[email protected]>
…dsRemoverTests.cs Co-authored-by: Steve Suh <[email protected]>
@Niharikadutta @GoEddie Can you review?
It's looking pretty good; there are a couple of small things, but other than that it looks good to me.
We also have a 110-character line width now, so some of the comments can be reformatted slightly.
Some minor nits, but otherwise LGTM. Thanks @SARAVANA1501!
Co-authored-by: Niharika Dutta <[email protected]>
Co-authored-by: Niharika Dutta <[email protected]>
Co-authored-by: Niharika Dutta <[email protected]>
LGTM
LGTM as well, thanks @SARAVANA1501!
LGTM (few nits), thanks @SARAVANA1501!
Assert.Equal(expectedStopWords, stopWordsRemover.GetStopWords());
Assert.NotEmpty(StopWordsRemover.LoadDefaultStopWords("english"));

using (TemporaryDirectory tempDirectory = new TemporaryDirectory())
nit: var
ping?
I committed this change. I will merge this PR after CI passes.
Co-authored-by: Terry Kim <[email protected]>
…dsRemoverTests.cs
#381
Add support for stop words removers