DateParser failing to detect dates #6459

@csr03sridharan

Description

DateMatcher in Spark NLP is not detecting dates as expected.

In the example below, the fraction 1/2 (part of the measurement "1 1/2 in") is identified as a date when run from PySpark. Please check this.

The code is attached below.

Example: text = ["right over-the-needle catheter system 18 gauge;1 1/2 in length"]

from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import DateMatcher

class NLPPipeline:
    def __init__(self, pipeline_stage_type):
        self.pipeline_stage_type = pipeline_stage_type
        # Convert the raw "text" column into Spark NLP documents
        self.documentAssembler = DocumentAssembler() \
            .setInputCol("text") \
            .setOutputCol("document")

        # Extract dates from each document
        self.date_parser = DateMatcher() \
            .setInputCols("document") \
            .setOutputCol("date") \
            .setAnchorDateYear(1900) \
            .setDateFormat("yyyy/MM/dd")
        self.pretrained_model_pipeline = self.pretrained_model_pipeline_runner()

    def pretrained_model_pipeline_runner(self):
        # Build the two-stage pipeline only for the DATE_ONLY stage type
        if self.pipeline_stage_type == 'DATE_ONLY':
            pipeline = Pipeline().setStages([
                self.documentAssembler,
                self.date_parser])
            return pipeline
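
Until the matcher behaviour is clarified, one possible workaround is to post-filter the matches in plain Python and drop those whose source text looks like a fraction in a measurement. The sketch below is only an illustration: `looks_like_fraction` is a hypothetical helper, not part of Spark NLP, the unit list is an assumption, and the offsets follow Python slice conventions (end exclusive), which may differ from the offsets stored in the annotations.

```python
import re

# Hypothetical helper, not a Spark NLP API.
# A two-part numeric token such as "1/2", with no year component.
FRACTION = re.compile(r"\d{1,2}/\d{1,2}")
# Assumed list of unit words that may follow a measurement.
UNIT_AFTER = re.compile(r"\s*(in|inch|inches|cm|mm)\b", re.IGNORECASE)
# A whole number directly before the token, as in "1 1/2".
WHOLE_BEFORE = re.compile(r"\d\s+$")


def looks_like_fraction(text, begin, end):
    """Return True if text[begin:end] is probably a fraction, not a date."""
    token = text[begin:end]
    if not FRACTION.fullmatch(token):
        return False
    followed_by_unit = UNIT_AFTER.match(text, end) is not None
    preceded_by_whole = WHOLE_BEFORE.search(text[:begin]) is not None
    return followed_by_unit or preceded_by_whole


sample = "right over-the-needle catheter system 18 gauge;1 1/2 in length"
begin = sample.index("1/2")
is_fraction = looks_like_fraction(sample, begin, begin + 3)
```

For the sample sentence from this issue, the helper flags "1/2" because it is preceded by a whole number and followed by a unit word, so the corresponding date match could be discarded. This is a heuristic and will miss measurements that use other units.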
