Skip to content

Conversation

SARAVANA1501
Copy link
Contributor

#381
Add support for stop words removers

@suhsteve
Copy link
Member

suhsteve commented Oct 9, 2020

There's already a WIP for this feature #716

@GoEddie
Copy link
Contributor

GoEddie commented Oct 9, 2020

This one looks like it is further ahead and I didn't update the list so i'll scrap my one.

One thing though can you add TransformSchema, get/set Locale get/set CaseSensitive, get/set StopWords as well? (at a minimum Get/SetStopWords is going to be required I think otherwise the english stop words are the only stop words that could be used)

@suhsteve
Copy link
Member

suhsteve commented Oct 9, 2020

@GoEddie sounds good. Can you help drive this PR to completion ?

@SARAVANA1501
Copy link
Contributor Author

SARAVANA1501 commented Oct 11, 2020

@GoEddie "Type Microsoft.Spark.Sql.Types.StructType not supported yet" so unable to complete transform scheme, rest of the things are done, Can you review and complete this PR.

@SARAVANA1501
Copy link
Contributor Author

Thought of contributing to #26 for struct type then will finish the transform schema method

@suhsteve
Copy link
Member

Thought of contributing to #26 for struct type then will finish the transform schema method

StructType is already supported. If you are trying to expose TransformSchema then I would suggest taking a look at how we send StructType to the jvm by looking here.

@GoEddie
Copy link
Contributor

GoEddie commented Oct 11, 2020

CountVectorizerModel also shows how to implement TransformSchema

public StructType TransformSchema(StructType value) =>

@SARAVANA1501
Copy link
Contributor Author

@GoEddie TransformSchema is done.

@SARAVANA1501
Copy link
Contributor Author

@Niharikadutta Implemented your suggestions.

Copy link
Contributor

@GoEddie GoEddie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SARAVANA1501
Copy link
Contributor Author

@GoEddie Changes impleted, Could you have a look?

GoEddie
GoEddie previously approved these changes Oct 17, 2020
Copy link
Contributor

@GoEddie GoEddie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say the implementation looks good, maybe some formatting changes - I can't approve though as I am not a maintainer :)

suhsteve
suhsteve previously approved these changes Nov 3, 2020
Copy link
Member

@suhsteve suhsteve left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple nits, but LGTM. Thanks @SARAVANA1501 !

@suhsteve
Copy link
Member

@Niharikadutta @GoEddie can you review ?

Copy link
Contributor

@GoEddie GoEddie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is looking pretty good, a couple of small things but other than that looks good to me.

We also have 110 character width now so some of the comments can be reformatted slightly.

@Niharikadutta
Copy link
Collaborator

Some minor nits, but otherwise LGTM. Thanks @SARAVANA1501 !

@GoEddie
Copy link
Contributor

GoEddie commented Nov 18, 2020

LGTM

@Niharikadutta
Copy link
Collaborator

LGTM as well, thanks @SARAVANA1501 !

Copy link
Contributor

@imback82 imback82 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (few nits), thanks @SARAVANA1501!

Assert.Equal(expectedStopWords, stopWordsRemover.GetStopWords());
Assert.NotEmpty(StopWordsRemover.LoadDefaultStopWords("english"));

using (TemporaryDirectory tempDirectory = new TemporaryDirectory())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: var

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ping?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I committed this change. I will merge this PR after CI passes.

@imback82 imback82 merged commit 4894956 into dotnet:master Nov 23, 2020
@SARAVANA1501 SARAVANA1501 deleted the StopWordsRemover branch December 13, 2020 14:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants