dataset: gqnli added #2989
Thanks for the PR - we need a few updates on the metadata
license="cc-by-4.0",
annotations_creators="human-annotated",
dialect=[],
sample_creation="found",
The dataset seems to be translated, so `sample_creation` should not be "found". Can you also make sure that this is reflected in the description?
eval_splits=["test"],
eval_langs=["fra-Latn"],
main_score="max_accuracy",
date=("2025-08-05", "2025-08-05"),
This date appears to be incorrect - it should be the date range during which the texts were written.
Co-authored-by: Kenneth Enevoldsen <[email protected]>
publisher = "ELRA and ICCL",
url = "https://aclanthology.org/2024.lrec-main.1065",
pages = "12173--12186",
abstract = "This paper introduces DACCORD, an original dataset in French for automatic detection of contradictions between sentences. It also presents new, manually translated versions of two datasets, namely the well known dataset RTE3 and the recent dataset GQNLI, from English to French, for the task of natural language inference / recognising textual entailment, which is a sentence-pair classification task. These datasets help increase the admittedly limited number of datasets in French available for these tasks. DACCORD consists of 1034 pairs of sentences and is the first dataset exclusively dedicated to this task and covering among others the topic of the Russian invasion in Ukraine. RTE3-FR contains 800 examples for each of its validation and test subsets, while GQNLI-FR is composed of 300 pairs of sentences and focuses specifically on the use of generalised quantifiers. Our experiments on these datasets show that they are more challenging than the two already existing datasets for the mainstream NLI task in French (XNLI, FraCaS). For languages other than English, most deep learning models for NLI tasks currently have only XNLI available as a training set. Additional datasets, such as ours for French, could permit different training and evaluation strategies, producing more robust results and reducing the inevitable biases present in any single dataset.",
correct citation
@MedAmineYoussef can you also add and complete the dataset checklist (see the link in your own PR comment)?
Checklist done.
Almost there :)
You can't push these - simply copy and paste the checklist into the main PR message
from mteb.abstasks.TaskMetadata import TaskMetadata


class GqnliTask(AbsTaskPairClassification):
- class GqnliTask(AbsTaskPairClassification):
+ class GqnliPairClassification(AbsTaskPairClassification):
class GqnliTask(AbsTaskPairClassification):
    metadata = TaskMetadata(
        name="gqnli",
- name="gqnli",
+ name="GqnliPairClassification",
and rename file to match
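The review points raised in this thread can be summarized as a small check over the metadata fields. This is only a minimal sketch: it uses a plain dict rather than mteb's `TaskMetadata`, and the helper `review_issues` is hypothetical and not part of mteb. The date heuristic (flagging a range that starts and ends on the same day) is also an assumption about what made the submitted value look wrong.

```python
from datetime import date


def review_issues(class_name: str, metadata: dict) -> list[str]:
    """Hypothetical helper: list the review points the metadata still violates."""
    issues = []
    # The reviewer asked for the task name to match the class name.
    if metadata["name"] != class_name:
        issues.append("name should match the class name")
    # The dataset is a translation, so "found" does not describe how it was created.
    if metadata["sample_creation"] == "found":
        issues.append("sample_creation should reflect that the data is translated")
    # The date field is the range during which the texts were written;
    # a range that starts and ends on the same day is treated as suspicious here.
    start, end = (date.fromisoformat(d) for d in metadata["date"])
    if start >= end:
        issues.append("date should be the range during which the texts were written")
    return issues


# Values as submitted in the PR diff quoted above.
submitted = {
    "name": "gqnli",
    "sample_creation": "found",
    "date": ("2025-08-05", "2025-08-05"),
}

for issue in review_issues("GqnliPairClassification", submitted):
    print(issue)
```

With the submitted values, all three checks fire. The correct date range and the exact `sample_creation` value are not stated in this thread, so the sketch does not supply them.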
This pull request has been automatically marked as stale due to inactivity.
@MedAmineYoussef are you still working on this PR?
If you add a model or a dataset, please add the corresponding checklist: