Commit 97b8969

josejuanmartinez, vkocaman authored and committed
Add model 2021-11-05-bert_sequence_classifier_question_statement_clinical_en
1 parent 581158b commit 97b8969

1 file changed: 153 additions, 0 deletions

---
layout: model
title: "Bert for Sequence Classification: Question vs Statement (Clinical)"
author: John Snow Labs
name: bert_sequence_classifier_question_statement_clinical
date: 2021-11-05
tags: [question, statement, clinical, en, licensed]
task: Text Classification
language: en
edition: Spark NLP for Healthcare 3.3.2
spark_version: 3.0
supported: true
article_header:
  type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

This model classifies individual sentences in clinical text as either questions or statements.

It is based on a model imported from Hugging Face (https://huggingface.co/shahrukhx01/question-vs-statement-classifier), which was trained on the generic-domain data referenced in Haystack issue #611 (https://github.com/deepset-ai/haystack/issues/611), and was then fine-tuned by John Snow Labs on in-house clinical annotations.

## Predicted Entities

`question`, `statement`

{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/bert_sequence_classifier_question_statement_clinical_en_3.3.2_3.0_1636106577489.zip){:.button.button-orange.button-orange-trans.arr.button-icon}

## How to use

<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
import pandas as pd
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetectorDLModel, Tokenizer, BertForSequenceClassification

# Assumes `spark` is an active Spark session started with a valid Spark NLP for Healthcare license.
documentAssembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

# Split each document into sentences; the classifier assigns one label per sentence.
sentenceDetector = SentenceDetectorDLModel.pretrained()\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

seq = BertForSequenceClassification.pretrained('bert_sequence_classifier_question_statement_clinical', 'en', 'clinical/models')\
    .setInputCols(["token", "sentence"])\
    .setOutputCol("label")\
    .setCaseSensitive(True)

pipeline = Pipeline(stages=[
    documentAssembler,
    sentenceDetector,
    tokenizer,
    seq])

test_sentences = ["""Hello I am going to be having a baby throughand have just received my medical results before I have my tubes tested. I had the tests on day 23 of my cycle. My progresterone level is 10. What does this mean? What does progesterone level of 10 indicate?
Your progesterone report is perfectly normal. We expect this result on day 23rd of the cycle.So there's nothing to worry as it's perfectly alright"""]

data = spark.createDataFrame(pd.DataFrame({'text': test_sentences}))
# fit() does not train anything here; it only turns the stages into a PipelineModel.
res = pipeline.fit(data).transform(data)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
// Assumes `spark` is an active SparkSession started with a valid Spark NLP for Healthcare license.
import spark.implicits._

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

// Split each document into sentences; the classifier assigns one label per sentence.
val sentenceDetector = SentenceDetectorDLModel.pretrained()
  .setInputCols(Array("document"))
  .setOutputCol("sentence")

val tokenizer = new Tokenizer()
  .setInputCols(Array("sentence"))
  .setOutputCol("token")

val seq = BertForSequenceClassification.pretrained("bert_sequence_classifier_question_statement_clinical", "en", "clinical/models")
  .setInputCols(Array("token", "sentence"))
  .setOutputCol("label")
  .setCaseSensitive(true)

val pipeline = new Pipeline().setStages(Array(
  documentAssembler,
  sentenceDetector,
  tokenizer,
  seq))

val testSentences = """Hello I am going to be having a baby throughand have just received my medical results before I have my tubes tested. I had the tests on day 23 of my cycle.
My progresterone level is 10. What does this mean? What does progesterone level of 10 indicate?
Your progesterone report is perfectly normal. We expect this result on day 23rd of the cycle.So there's nothing to worry as it's perfectly alright"""

val example = Seq(testSentences).toDF("text")
val result = pipeline.fit(example).transform(example)
```
</div>

## Results

```bash
+--------------------------------------------------------------------------------------------------------------------+---------+
|sentence                                                                                                            |label    |
+--------------------------------------------------------------------------------------------------------------------+---------+
|Hello I am going to be having a baby throughand have just received my medical results before I have my tubes tested.|statement|
|I had the tests on day 23 of my cycle.                                                                              |statement|
|My progresterone level is 10.                                                                                       |statement|
|What does this mean?                                                                                                |question |
|What does progesterone level of 10 indicate?                                                                        |question |
|Your progesterone report is perfectly normal. We expect this result on day 23rd of the cycle.                       |statement|
|So there's nothing to worry as it's perfectly alright                                                               |statement|
+--------------------------------------------------------------------------------------------------------------------+---------+
```
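As the table shows, each detected sentence receives exactly one label, so a common follow-up is to pull out only the patient's questions. A minimal sketch in plain Python, assuming the predictions have already been collected into `(sentence, label)` pairs (here copied verbatim from the table above); `sentences_with_label` is a hypothetical helper, not part of Spark NLP:

```python
from collections import Counter

# (sentence, label) pairs copied from the Results table above.
rows = [
    ("Hello I am going to be having a baby throughand have just received my medical results before I have my tubes tested.", "statement"),
    ("I had the tests on day 23 of my cycle.", "statement"),
    ("My progresterone level is 10.", "statement"),
    ("What does this mean?", "question"),
    ("What does progesterone level of 10 indicate?", "question"),
    ("Your progesterone report is perfectly normal. We expect this result on day 23rd of the cycle.", "statement"),
    ("So there's nothing to worry as it's perfectly alright", "statement"),
]


def sentences_with_label(rows, label):
    """Hypothetical helper: return the sentences predicted as `label`, in document order."""
    return [sentence for sentence, predicted in rows if predicted == label]


questions = sentences_with_label(rows, "question")
label_counts = Counter(predicted for _, predicted in rows)

print(questions)
print(label_counts)
```

For this example the sketch returns the two questions and counts 5 statements against 2 questions.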

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|bert_sequence_classifier_question_statement_clinical|
|Compatibility:|Spark NLP for Healthcare 3.3.2+|
|License:|Licensed|
|Edition:|Official|
|Input Labels:|[token, sentence]|
|Output Labels:|[label]|
|Language:|en|
|Case sensitive:|true|

## Data Source

For generic-domain training: https://github.com/deepset-ai/haystack/issues/611

For fine-tuning in the clinical domain: in-house John Snow Labs annotations based on clinical Q&A.

## Benchmarking

```bash
              precision    recall  f1-score   support

    question       0.97      0.94      0.96       243
   statement       0.98      0.99      0.99       729

    accuracy                           0.98       972
   macro avg       0.98      0.97      0.97       972
weighted avg       0.98      0.98      0.98       972
```
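The summary rows follow from the per-class rows: `macro avg` is the unweighted mean over the two classes, while `weighted avg` weights each class by its support (243 of the 972 evaluation sentences are questions and 729 are statements, i.e. 25% and 75%). A minimal plain-Python sketch that recomputes them, with the values copied from the report above and illustrative helper names; because the per-class inputs are already rounded to two decimals, the recomputed values only agree with the reported ones to within about ±0.01.

```python
# Per-class rows copied from the classification report above (already rounded to 2 decimals).
report = {
    "question": {"precision": 0.97, "recall": 0.94, "f1": 0.96, "support": 243},
    "statement": {"precision": 0.98, "recall": 0.99, "f1": 0.99, "support": 729},
}
# Summary rows as reported above.
reported_macro = {"precision": 0.98, "recall": 0.97, "f1": 0.97}
reported_weighted = {"precision": 0.98, "recall": 0.98, "f1": 0.98}
reported_accuracy = 0.98

total_support = sum(row["support"] for row in report.values())


def macro_avg(metric):
    """Unweighted mean of `metric` over the classes."""
    return sum(row[metric] for row in report.values()) / len(report)


def weighted_avg(metric):
    """Mean of `metric` over the classes, weighted by each class's support."""
    return sum(row[metric] * row["support"] for row in report.values()) / total_support


for metric in ("precision", "recall", "f1"):
    print(f"{metric:<9} macro={macro_avg(metric):.4f} (reported {reported_macro[metric]}) "
          f"weighted={weighted_avg(metric):.4f} (reported {reported_weighted[metric]})")

# Support-weighted recall equals accuracy by definition: sum(recall_i * support_i) / N = correct / N.
print(f"weighted recall={weighted_avg('recall'):.4f} (reported accuracy {reported_accuracy})")
```

The last line reflects why the `weighted avg` recall and the `accuracy` row report the same value.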
