Open
Labels
enhancement, good first issue, help wanted
Description
**These should all be implemented with #1031**
=========================================
This is to track implementation of the ML-Features: https://spark.apache.org/docs/latest/ml-features
Bucketizer has been implemented in #378, but more features still need to be implemented.
- Feature Extractors
- TF-IDF
- Word2Vec (Implement ML Features: Word2Vec #491)
- CountVectorizer (Implement ML/CountVectorizer and ML/CountVectorizerModel #608)
- FeatureHasher (FeatureHasher #652)
- Feature Transformers
- Tokenizer (base class for Feature as lots of methods are shared between the objects (more methods to be added in later pr's) #574)
- StopWordsRemover (Add stop words removers #726 thanks @SARAVANA1501 )
- n-gram (in-progress Add NGram #734)
- Binarizer (in-progress Add Binarizer #744)
- PCA (in-progress)
- PolynomialExpansion
- Discrete Cosine Transform (DCT)
- StringIndexer (in-progress)
- IndexToString
- OneHotEncoderEstimator
- VectorIndexer
- Normalizer
- StandardScaler
- MinMaxScaler
- MaxAbsScaler
- Bucketizer
- ElementwiseProduct
- SQLTransformer (Implement ML Features #381. SQLTransformer class and testcase #781 @ramanathanv)
- VectorAssembler
- VectorSizeHint
- QuantileDiscretizer
- Imputer
- Feature Selectors
- VectorSlicer
- RFormula
- ChiSqSelector
- Locality Sensitive Hashing
- LSH Operations
- Feature Transformation
- Approximate Similarity Join
- Approximate Nearest Neighbour Search
- LSH Algorithms
- Bucketed Random Projection for Euclidean Distance
- MinHash for Jaccard Distance
If anyone else is going to implement one of these, it is probably best to leave a comment here, and I'll keep the list up to date.
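For anyone picking up one of the simpler transformers in the list, it may help to have the expected behaviour pinned down before writing the wrapper. The sketch below is plain Python, not the Spark implementation and not this project's API: the function names `binarize` and `ngrams` are hypothetical, and the semantics are taken from the Spark ML feature docs linked above (Binarizer maps values strictly greater than the threshold to 1.0; NGram joins consecutive tokens with a space and produces nothing when there are fewer than n tokens).

```python
from typing import List, Sequence


def binarize(values: Sequence[float], threshold: float = 0.0) -> List[float]:
    """Illustrative Binarizer semantics: 1.0 if value > threshold, else 0.0."""
    return [1.0 if v > threshold else 0.0 for v in values]


def ngrams(tokens: Sequence[str], n: int = 2) -> List[str]:
    """Illustrative NGram semantics: space-joined windows of n consecutive tokens."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # When len(tokens) < n the range is empty, so no n-grams are produced.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


if __name__ == "__main__":
    print(binarize([0.1, 0.8, 0.2], threshold=0.5))
    print(ngrams(["Hi", "I", "heard", "about", "Spark"], n=2))
```

Small reference functions like these could serve as a source of expected values when writing tests for a new wrapper, although the real transformers operate on DataFrame columns rather than Python lists.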