| 1 | +--- |
| 2 | +layout: model |
| 3 | +title: Bert for Sequence Classification: Question vs Statement (Clinical) |
| 4 | +author: John Snow Labs |
| 5 | +name: bert_sequence_classifier_question_statement_clinical |
| 6 | +date: 2021-11-05 |
| 7 | +tags: [question, statement, clinical, en, licensed] |
| 8 | +task: Text Classification |
| 9 | +language: en |
| 10 | +edition: Spark NLP for Healthcare 3.3.2 |
| 11 | +spark_version: 3.0 |
| 12 | +supported: true |
| 13 | +article_header: |
| 14 | + type: cover |
| 15 | +use_language_switcher: "Python-Scala-Java" |
| 16 | +--- |
| 17 | + |
| 18 | +## Description |
| 19 | + |
| 20 | +A BERT-based sequence classifier trained to distinguish questions from statements in clinical-domain text. |
| 21 | + |
| 22 | +This model was imported from Hugging Face (https://huggingface.co/shahrukhx01/question-vs-statement-classifier), where it was trained on the generic-domain data referenced by Haystack (https://github.com/deepset-ai/haystack/issues/611), and was then fine-tuned by John Snow Labs on in-house clinical annotations. |
| 23 | + |
| 24 | +## Predicted Entities |
| 25 | + |
| 26 | +`question`, `statement` |
| 27 | + |
| 28 | +{:.btn-box} |
| 29 | +<button class="button button-orange" disabled>Live Demo</button> |
| 30 | +<button class="button button-orange" disabled>Open in Colab</button> |
| 31 | +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/clinical/models/bert_sequence_classifier_question_statement_clinical_en_3.3.2_3.0_1636106577489.zip){:.button.button-orange.button-orange-trans.arr.button-icon} |
| 32 | + |
| 33 | +## How to use |
| 34 | + |
| 35 | + |
| 36 | + |
| 37 | +<div class="tabs-box" markdown="1"> |
| 38 | +{% include programmingLanguageSelectScalaPythonNLU.html %} |
| 39 | +```python |
| 40 | +documentAssembler = DocumentAssembler()\ |
| 41 | + .setInputCol("text")\ |
| 42 | + .setOutputCol("document") |
| 43 | + |
| 44 | +sentenceDetector = SentenceDetectorDLModel.pretrained() \ |
| 45 | + .setInputCols(["document"]) \ |
| 46 | + .setOutputCol("sentence") |
| 47 | + |
| 48 | +tokenizer = Tokenizer()\ |
| 49 | + .setInputCols("sentence")\ |
| 50 | + .setOutputCol("token") |
| 51 | + |
| 52 | +seq = BertForSequenceClassification.pretrained('bert_sequence_classifier_question_statement_clinical', 'en', 'clinical/models')\ |
| 53 | + .setInputCols(["token", "sentence"])\ |
| 54 | + .setOutputCol("label")\ |
| 55 | + .setCaseSensitive(True) |
| 56 | + |
| 57 | +pipeline = Pipeline(stages = [ |
| 58 | + documentAssembler, |
| 59 | + sentenceDetector, |
| 60 | + tokenizer, |
| 61 | + seq]) |
| 62 | + |
| 63 | +test_sentences = ["""Hello I am going to be having a baby throughand have just received my medical results before I have my tubes tested. I had the tests on day 23 of my cycle. My progresterone level is 10. What does this mean? What does progesterone level of 10 indicate? |
| 64 | +Your progesterone report is perfectly normal. We expect this result on day 23rd of the cycle.So there's nothing to worry as it's perfectly alright"""] |
| 65 | +data = spark.createDataFrame([[text] for text in test_sentences]).toDF("text") |
| 66 | +res = pipeline.fit(data).transform(data) |
| 67 | +``` |
| 68 | +```scala |
| 69 | +val documentAssembler = new DocumentAssembler() |
| 70 | + .setInputCol("text") |
| 71 | + .setOutputCol("document") |
| 72 | + |
| 73 | +val sentenceDetector = SentenceDetectorDLModel.pretrained() |
| 74 | + .setInputCols(Array("document")) |
| 75 | + .setOutputCol("sentence") |
| 76 | + |
| 77 | +val tokenizer = new Tokenizer() |
| 78 | + .setInputCols(Array("sentence")) |
| 79 | + .setOutputCol("token") |
| 80 | + |
| 81 | +val seq = BertForSequenceClassification.pretrained("bert_sequence_classifier_question_statement_clinical", "en", "clinical/models") |
| 82 | + .setInputCols(Array("token", "sentence")) |
| 83 | + .setOutputCol("label") |
| 84 | + .setCaseSensitive(true) |
| 85 | + |
| 86 | +val pipeline = new Pipeline().setStages(Array( |
| 87 | + documentAssembler, |
| 88 | + sentenceDetector, |
| 89 | + tokenizer, |
| 90 | + seq)) |
| 91 | + |
| 92 | +val test_sentences = """Hello I am going to be having a baby throughand have just received my medical results before I have my tubes tested. I had the tests on day 23 of my cycle. |
| 93 | +My progresterone level is 10. What does this mean? What does progesterone level of 10 indicate? |
| 94 | +Your progesterone report is perfectly normal. We expect this result on day 23rd of the cycle.So there's nothing to worry as it's perfectly alright""" |
| 95 | + |
| 96 | +val example = Seq(test_sentences).toDF("text") // requires import spark.implicits._ |
| 97 | +val result = pipeline.fit(example).transform(example) |
| 98 | +``` |
| 99 | +</div> |
| 100 | + |
| 101 | +## Results |
| 102 | + |
| 103 | +```bash |
| 105 | ++--------------------------------------------------------------------------------------------------------------------+---------+ |
| 106 | +|sentence |label | |
| 107 | ++--------------------------------------------------------------------------------------------------------------------+---------+ |
| 108 | +|Hello I am going to be having a baby throughand have just received my medical results before I have my tubes tested.|statement| |
| 109 | +|I had the tests on day 23 of my cycle. |statement| |
| 110 | +|My progresterone level is 10. |statement| |
| 111 | +|What does this mean? |question | |
| 112 | +|What does progesterone level of 10 indicate? |question | |
| 113 | +|Your progesterone report is perfectly normal. We expect this result on day 23rd of the cycle. |statement| |
| 114 | +|So there's nothing to worry as it's perfectly alright |statement| |
| 115 | ++--------------------------------------------------------------------------------------------------------------------+---------+ |
| 116 | +``` |
| 118 | + |
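The results table pairs each detected sentence with its predicted label. As a rough sketch of how such pairs might be used downstream (for example, to keep only the questions), the snippet below works on plain Python lists that mirror the rows shown in the table. The helper names `pair_predictions` and `extract_questions` are hypothetical and not part of Spark NLP, and the step that pulls these lists out of the transformed DataFrame is assumed rather than shown.

```python
from typing import List, Tuple


def pair_predictions(sentences: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Zip sentences with their labels; assumes one label per sentence."""
    if len(sentences) != len(labels):
        raise ValueError("expected exactly one label per sentence")
    return list(zip(sentences, labels))


def extract_questions(pairs: List[Tuple[str, str]]) -> List[str]:
    """Keep only the sentences labelled as `question`."""
    return [sentence for sentence, label in pairs if label == "question"]


# Values copied from the results table (assumed to be already collected
# from the `sentence` and `label` output columns).
sentences = [
    "Hello I am going to be having a baby throughand have just received my medical results before I have my tubes tested.",
    "I had the tests on day 23 of my cycle.",
    "My progresterone level is 10.",
    "What does this mean?",
    "What does progesterone level of 10 indicate?",
    "Your progesterone report is perfectly normal. We expect this result on day 23rd of the cycle.",
    "So there's nothing to worry as it's perfectly alright",
]
labels = [
    "statement", "statement", "statement",
    "question", "question",
    "statement", "statement",
]

pairs = pair_predictions(sentences, labels)
questions = extract_questions(pairs)
for question in questions:
    print(question)
```

The length check is a defensive choice: if the classifier were configured to emit labels at a different granularity than the sentence detector, silently zipping the lists would misalign them.
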
| 119 | +{:.model-param} |
| 120 | +## Model Information |
| 121 | + |
| 122 | +{:.table-model} |
| 123 | +|---|---| |
| 124 | +|Model Name:|bert_sequence_classifier_question_statement_clinical| |
| 125 | +|Compatibility:|Spark NLP for Healthcare 3.3.2+| |
| 126 | +|License:|Licensed| |
| 127 | +|Edition:|Official| |
| 128 | +|Input Labels:|[token, sentence]| |
| 129 | +|Output Labels:|[label]| |
| 130 | +|Language:|en| |
| 131 | +|Case sensitive:|true| |
| 132 | + |
| 133 | +## Data Source |
| 134 | + |
| 135 | +For generic domain training: |
| 136 | +https://github.com/deepset-ai/haystack/issues/611 |
| 137 | + |
| 138 | +For fine-tuning in the clinical domain: in-house JSL annotations based on clinical Q&A. |
| 139 | + |
| 140 | +## Benchmarking |
| 141 | + |
| 142 | +```bash |
| 144 | +              precision    recall  f1-score   support |
| 145 | + |
| 146 | +    question       0.97      0.94      0.96       243 |
| 147 | +   statement       0.98      0.99      0.99       729 |
| 148 | + |
| 149 | +    accuracy                           0.98       972 |
| 150 | +   macro avg       0.98      0.97      0.97       972 |
| 151 | +weighted avg       0.98      0.98      0.98       972 |
| 152 | +``` |
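The `weighted avg` row in the benchmark can be approximately reproduced from the per-class rows by weighting each metric by its class support. The sketch below does that arithmetic with the figures from the table. Because the per-class values are already rounded to two decimals, the results only approximate the reported row, and the variable names are illustrative.

```python
# Per-class figures copied from the benchmark table (already rounded).
report = {
    "question": {"precision": 0.97, "recall": 0.94, "f1": 0.96, "support": 243},
    "statement": {"precision": 0.98, "recall": 0.99, "f1": 0.99, "support": 729},
}

total_support = sum(row["support"] for row in report.values())


def weighted_average(metric: str) -> float:
    """Average a metric over classes, weighting each class by its support."""
    weighted_sum = sum(row[metric] * row["support"] for row in report.values())
    return weighted_sum / total_support


weighted = {metric: weighted_average(metric) for metric in ("precision", "recall", "f1")}

print(f"total support: {total_support}")
for metric, value in weighted.items():
    print(f"weighted {metric}: {value:.4f}")
```

Questions make up 243 of the 972 test sentences (25%), so the weighted averages sit closer to the `statement` scores than the unweighted macro averages do.
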