MedAmineYoussef

If you add a model or a dataset, please add the corresponding checklist:

@Samoed changed the title from "gqnli added" to "dataset: gqnli added" on Aug 6, 2025
Contributor

@KennethEnevoldsen left a comment

Thanks for the PR - we need a few updates on the metadata

license="cc-by-4.0",
annotations_creators="human-annotated",
dialect=[],
sample_creation="found",
Contributor

The dataset seems to be translated - not found. Can you also make sure that this is reflected in the description?
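
A minimal sketch of the change requested here, written against a plain dictionary so that it runs without mteb installed. The helper `mark_as_translated` is hypothetical, and the value "human-translated" is an assumption based on the paper describing GQNLI-FR as manually translated; check it against the values that `TaskMetadata` actually accepts.

```python
def mark_as_translated(fields: dict) -> dict:
    """Return a copy of the metadata fields with the translation origin recorded."""
    updated = dict(fields)
    # Assumption: "human-translated" is the appropriate accepted value here.
    updated["sample_creation"] = "human-translated"
    return updated


# The fields shown in the reviewed snippet, as a plain dict (not the mteb API).
fields = {
    "license": "cc-by-4.0",
    "annotations_creators": "human-annotated",
    "dialect": [],
    "sample_creation": "found",
}

fixed = mark_as_translated(fields)
print(fixed["sample_creation"])
```

The description field would need a matching sentence stating that the dataset is a French translation of GQNLI.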

eval_splits=["test"],
eval_langs=["fra-Latn"],
main_score="max_accuracy",
date=("2025-08-05", "2025-08-05"),
Contributor

This date appears to be incorrect - it should be the date range during which the texts were written.
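
One way to catch this kind of value before review is a small sanity check on the date pair. The sketch below is hypothetical (the function `check_date_range` is not part of mteb) and only flags suspicious ranges; it cannot tell what the correct range for the texts is.

```python
from datetime import date


def check_date_range(date_range: tuple[str, str]) -> list[str]:
    """Return warnings for a (start, end) pair of ISO-formatted dates."""
    start, end = (date.fromisoformat(d) for d in date_range)
    warnings = []
    if start > end:
        warnings.append("start date is after end date")
    if start == end:
        # A single-day range often means the submission date was used as a placeholder.
        warnings.append("zero-length range; possibly a placeholder")
    return warnings


# The value from the reviewed snippet.
print(check_date_range(("2025-08-05", "2025-08-05")))
```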

publisher = "ELRA and ICCL",
url = "https://aclanthology.org/2024.lrec-main.1065",
pages = "12173--12186",
abstract = "This paper introduces DACCORD, an original dataset in French for automatic detection of contradictions between sentences. It also presents new, manually translated versions of two datasets, namely the well known dataset RTE3 and the recent dataset GQNLI, from English to French, for the task of natural language inference / recognising textual entailment, which is a sentence-pair classification task. These datasets help increase the admittedly limited number of datasets in French available for these tasks. DACCORD consists of 1034 pairs of sentences and is the first dataset exclusively dedicated to this task and covering among others the topic of the Russian invasion in Ukraine. RTE3-FR contains 800 examples for each of its validation and test subsets, while GQNLI-FR is composed of 300 pairs of sentences and focuses specifically on the use of generalised quantifiers. Our experiments on these datasets show that they are more challenging than the two already existing datasets for the mainstream NLI task in French (XNLI, FraCaS). For languages other than English, most deep learning models for NLI tasks currently have only XNLI available as a training set. Additional datasets, such as ours for French, could permit different training and evaluation strategies, producing more robust results and reducing the inevitable biases present in any single dataset.",
Author

correct citation

@KennethEnevoldsen
Contributor

@MedAmineYoussef can you also add and complete the dataset checklist (see the link in your own PR comment)?

@MedAmineYoussef
Author

Checklist done.

Contributor

@KennethEnevoldsen left a comment

Almost there :)

Contributor

You can't push these - simply copy and paste the checklist into the main PR message

from mteb.abstasks.TaskMetadata import TaskMetadata


class GqnliTask(AbsTaskPairClassification):
Contributor

Suggested change
- class GqnliTask(AbsTaskPairClassification):
+ class GqnliPairClassification(AbsTaskPairClassification):


class GqnliTask(AbsTaskPairClassification):
    metadata = TaskMetadata(
        name="gqnli",
Contributor

Suggested change
- name="gqnli",
+ name="GqnliPairClassification",

and rename file to match
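
The two suggestions above amount to keeping the class name, the metadata name, and the file name identical. A rough sketch of that check follows; `names_match` is a hypothetical helper, not an mteb function, and the file name `gqnli.py` is assumed purely for illustration.

```python
from pathlib import Path


def names_match(class_name: str, metadata_name: str, file_path: str) -> bool:
    """True when the class name, metadata name, and file stem are identical."""
    return class_name == metadata_name == Path(file_path).stem


# Naming as submitted (file name assumed for illustration).
print(names_match("GqnliTask", "gqnli", "gqnli.py"))

# Naming after applying both suggestions and renaming the file.
print(names_match(
    "GqnliPairClassification",
    "GqnliPairClassification",
    "GqnliPairClassification.py",
))
```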

Contributor

github-actions bot commented Sep 1, 2025

This pull request has been automatically marked as stale due to inactivity.

@github-actions bot added the stale label on Sep 1, 2025
@KennethEnevoldsen
Contributor

@MedAmineYoussef are you still working on this PR?

@github-actions bot removed the stale label on Sep 2, 2025
3 participants