DateMatcher in Spark NLP is not detecting dates as expected.
In the example below, the fraction 1/2 (from "1 1/2 in length") is identified as a date in PySpark. Please check this.
The code is attached below.
Example: text = ["right over-the-needle catheter system 18 gauge;1 1/2 in length"]
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import DateMatcher

class NLPPipeline:
    def __init__(self, pipeline_stage_type):
        self.pipeline_stage_type = pipeline_stage_type
        # Wrap the raw "text" column into Spark NLP documents
        self.documentAssembler = DocumentAssembler() \
            .setInputCol("text") \
            .setOutputCol("document")
        # Extract dates from each document
        self.date_parser = DateMatcher() \
            .setInputCols("document") \
            .setOutputCol("date") \
            .setAnchorDateYear(1900) \
            .setDateFormat("yyyy/MM/dd")
        self.pretrained_model_pipeline = self.pretrained_model_pipeline_runner()

    def pretrained_model_pipeline_runner(self):
        if self.pipeline_stage_type == 'DATE_ONLY':
            pipeline = Pipeline().setStages([
                self.documentAssembler,
                self.date_parser])
            return pipeline
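
To make the reported behavior easier to discuss, here is a minimal pure-Python sketch of how a permissive month/day rule could read the fraction in "1 1/2 in length" as a date. This is an assumption for illustration only: the pattern `LOOSE_MONTH_DAY`, the unit list, and the helper names are hypothetical and are not taken from the Spark NLP source, so the real cause inside DateMatcher may differ. The second helper shows one possible pre-filter that skips a match when a unit word follows it.

```python
import re

# Hypothetical pattern, NOT taken from Spark NLP: a permissive "M/d" style rule
# of the kind that could read a bare fraction as a month/day pair.
LOOSE_MONTH_DAY = re.compile(r"\b(\d{1,2})/(\d{1,2})\b")

# Hypothetical list of unit words that mark a fraction as a measurement.
UNIT_AFTER = re.compile(r"\s*(in|inch|inches|cm|mm|gauge)\b")


def loose_date_candidates(text):
    """Return every 'N/N' substring that the loose pattern would treat as a date."""
    return [m.group(0) for m in LOOSE_MONTH_DAY.finditer(text)]


def followed_by_unit(text, end):
    """Heuristic (assumption): a fraction followed by a unit word is a measurement."""
    return UNIT_AFTER.match(text[end:end + 12].lower()) is not None


def filtered_candidates(text):
    """Keep only candidates that are not immediately followed by a unit word."""
    return [
        m.group(0)
        for m in LOOSE_MONTH_DAY.finditer(text)
        if not followed_by_unit(text, m.end())
    ]


text = "right over-the-needle catheter system 18 gauge;1 1/2 in length"
print(loose_date_candidates(text))
print(filtered_candidates(text))
```

With the example sentence from this issue, the loose pattern picks up the fraction while the filtered version drops it. A filter like this would have to run on the text before or after DateMatcher, and it is only a workaround sketch, not a fix for the annotator itself.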