Commit cd94669

Merge pull request #6495 from JohnSnowLabs/2021-11-21-distilbert_sequence_classifier_banking77_en_WtHZOklESUeoLm3ZZ6VN9JXl
2021-11-21-distilbert_sequence_classifier_banking77_en
2 parents 7926bfc + 24ba004 commit cd94669

Lines changed: 121 additions & 0 deletions
@@ -0,0 +1,121 @@
---
layout: model
title: DistilBERT Sequence Classification - Banking77 (distilbert_sequence_classifier_banking77)
author: John Snow Labs
name: distilbert_sequence_classifier_banking77
date: 2021-11-21
tags: [banking, distilbert, en, english, sequence_classification, open_source]
task: Text Classification
language: en
edition: Spark NLP 3.3.3
spark_version: 3.0
supported: true
article_header:
  type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

DistilBERT model fine-tuned on the BANKING77 dataset, which consists of online banking queries annotated with their corresponding intents.

BANKING77 provides a fine-grained set of intents in the banking domain: 13,083 customer service queries labeled with 77 intents. It targets fine-grained, single-domain intent detection.

## Predicted Entities

`activate_my_card`, `age_limit`, `apple_pay_or_google_pay`, `atm_support`, `automatic_top_up`, `balance_not_updated_after_bank_transfer`, `balance_not_updated_after_cheque_or_cash_deposit`, `beneficiary_not_allowed`, `cancel_transfer`, `card_about_to_expire`, `card_acceptance`, `card_arrival`, `card_delivery_estimate`, `card_linking`, `card_not_working`, `card_payment_fee_charged`, `card_payment_not_recognised`, `card_payment_wrong_exchange_rate`, `card_swallowed`, `cash_withdrawal_charge`, `cash_withdrawal_not_recognised`, `change_pin`, `compromised_card`, `contactless_not_working`, `country_support`, `declined_card_payment`, `declined_cash_withdrawal`, `declined_transfer`, `direct_debit_payment_not_recognised`, `disposable_card_limits`, `edit_personal_details`, `exchange_charge`, `exchange_rate`, `exchange_via_app`, `extra_charge_on_statement`, `failed_transfer`, `fiat_currency_support`, `get_disposable_virtual_card`, `get_physical_card`, `getting_spare_card`, `getting_virtual_card`, `lost_or_stolen_card`, `lost_or_stolen_phone`, `order_physical_card`, `passcode_forgotten`, `pending_card_payment`, `pending_cash_withdrawal`, `pending_top_up`, `pending_transfer`, `pin_blocked`, `receiving_money`, `Refund_not_showing_up`, `request_refund`, `reverted_card_payment?`, `supported_cards_and_currencies`, `terminate_account`, `top_up_by_bank_transfer_charge`, `top_up_by_card_charge`, `top_up_by_cash_or_cheque`, `top_up_failed`, `top_up_limits`, `top_up_reverted`, `topping_up_by_card`, `transaction_charged_twice`, `transfer_fee_charged`, `transfer_into_account`, `transfer_not_received_by_recipient`, `transfer_timing`, `unable_to_verify_identity`, `verify_my_identity`, `verify_source_of_funds`, `verify_top_up`, `virtual_card_not_working`, `visa_or_mastercard`, `why_verify_identity`, `wrong_amount_of_cash_received`, `wrong_exchange_rate_for_cash_withdrawal`

{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sequence_classifier_banking77_en_3.3.3_3.0_1637500452249.zip){:.button.button-orange.button-orange-trans.arr.button-icon}

## How to use



<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, DistilBertForSequenceClassification

# Assumes an active Spark NLP session is available as `spark`.
document_assembler = DocumentAssembler() \
    .setInputCol('text') \
    .setOutputCol('document')

tokenizer = Tokenizer() \
    .setInputCols(['document']) \
    .setOutputCol('token')

# Load the pretrained Banking77 intent classifier.
sequenceClassifier = DistilBertForSequenceClassification \
    .pretrained('distilbert_sequence_classifier_banking77', 'en') \
    .setInputCols(['token', 'document']) \
    .setOutputCol('class') \
    .setMaxSentenceLength(512)

pipeline = Pipeline(stages=[
    document_assembler,
    tokenizer,
    sequenceClassifier
])

example = spark.createDataFrame([['I am still waiting on my card?']]).toDF("text")
result = pipeline.fit(example).transform(example)
```
```scala
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
// Assumes an active SparkSession is available as `spark`; needed for `toDS`.
import spark.implicits._

val document_assembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols("document")
  .setOutputCol("token")

// Load the pretrained Banking77 intent classifier.
val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sequence_classifier_banking77", "en")
  .setInputCols("document", "token")
  .setOutputCol("class")
  .setMaxSentenceLength(512)

val pipeline = new Pipeline().setStages(Array(document_assembler, tokenizer, sequenceClassifier))

val example = Seq("I am still waiting on my card?").toDS.toDF("text")

val result = pipeline.fit(example).transform(example)
```
</div>
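
After `transform`, the prediction for each input is stored in the `class` column. The following is a minimal, Spark-free sketch of ranking intents by confidence once a row has been collected to the driver. It assumes that each `class` annotation carries a metadata map from label to score (as strings) plus a `sentence` index entry; that layout is an assumption to verify against your Spark NLP version. `top_intents` is a hypothetical helper, and the scores in `sample_metadata` are made-up illustrative values, not model output.

```python
from typing import Dict, List, Tuple


def top_intents(metadata: Dict[str, str], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (label, score) pairs from an annotation's metadata.

    Assumption: every key except 'sentence' is an intent label whose value
    is a confidence score encoded as a string.
    """
    scores = []
    for key, value in metadata.items():
        if key == "sentence":
            # Sentence index, not an intent score.
            continue
        try:
            scores.append((key, float(value)))
        except ValueError:
            # Skip any entry that is not a numeric score.
            continue
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# Illustrative values only; real scores come from the model.
sample_metadata = {
    "sentence": "0",
    "card_arrival": "0.91",
    "card_delivery_estimate": "0.06",
    "lost_or_stolen_card": "0.01",
}

print(top_intents(sample_metadata, k=2))
```

Keeping the top few intents rather than only the best one can help when routing ambiguous queries, since several of the 77 intents are closely related.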

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|distilbert_sequence_classifier_banking77|
|Compatibility:|Spark NLP 3.3.3+|
|License:|Open Source|
|Edition:|Official|
|Input Labels:|[token, document]|
|Output Labels:|[class]|
|Language:|en|
|Case sensitive:|false|
|Max sentence length:|512|

## Data Source

[https://huggingface.co/philschmid/DistilBERT-Banking77](https://huggingface.co/philschmid/DistilBERT-Banking77)

[https://huggingface.co/datasets/banking77](https://huggingface.co/datasets/banking77)

## Benchmarking

```bash
- Loss: 0.2988220155239105
- Accuracy: 0.9246753246753247
- Macro F1: 0.9246117406953515
- Micro F1: 0.9246753246753247
- Weighted F1: 0.9246117406953518
- Macro Precision: 0.9278163684429038
- Micro Precision: 0.9246753246753247
- Weighted Precision: 0.927816368442904
- Macro Recall: 0.9246753246753248
- Micro Recall: 0.9246753246753247
- Weighted Recall: 0.9246753246753247

```
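
In the figures above, Accuracy, Micro F1, Micro Precision, and Micro Recall are all 0.9246753246753247. That is expected for single-label classification, where every wrong prediction counts as one false positive and one false negative. The sketch below illustrates the relationship on a tiny hand-made example. The helper `classification_scores` and the toy labels are assumptions for illustration; this is not the evaluation script used to produce the reported numbers.

```python
from collections import Counter
from typing import Dict, List


def classification_scores(y_true: List[str], y_pred: List[str]) -> Dict[str, float]:
    """Compute accuracy, micro F1, and macro F1 for single-label predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label gains a false positive
            fn[truth] += 1  # true label gains a false negative

    def f1(tp_: int, fp_: int, fn_: int) -> float:
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    total_tp = sum(tp.values())
    return {
        "accuracy": total_tp / len(y_true),
        # Micro: pool the counts over all labels, then compute F1 once.
        "micro_f1": f1(total_tp, sum(fp.values()), sum(fn.values())),
        # Macro: compute F1 per label, then take the unweighted mean.
        "macro_f1": sum(f1(tp[l], fp[l], fn[l]) for l in labels) / len(labels),
    }


# Toy data, two examples per intent; not taken from the BANKING77 test set.
y_true = ["card_arrival", "card_arrival", "change_pin",
          "change_pin", "exchange_rate", "exchange_rate"]
y_pred = ["card_arrival", "change_pin", "change_pin",
          "change_pin", "exchange_rate", "card_arrival"]

scores = classification_scores(y_true, y_pred)
print(scores)
```

Macro F1 differs from the micro figure because it averages per-intent scores without pooling, so intents the model handles poorly pull it down regardless of how many examples they have.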
