feat(validate): add precision, recall, and F1 metrics #2568

Open. @ha405 wants to merge 1 commit into main.


@ha405 ha405 commented Aug 15, 2025

Overview

This PR adds optional support for calculating Precision, Recall, and F1-score in validate.py.
These metrics provide a more nuanced view of model performance than Top-k accuracy, especially for datasets with class imbalance.
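
To illustrate the class-imbalance point, here is a minimal, dependency-free sketch. It is not code from this PR: the helper `per_class_prf` and the toy labels are hypothetical, and undefined ratios are assumed to count as 0. A model that always predicts the majority class scores well on top-1 accuracy but poorly on macro-averaged F1.

```python
def per_class_prf(targets, preds, num_classes):
    """Return (precision, recall, f1) per class; undefined ratios are set to 0.0."""
    rows = []
    for c in range(num_classes):
        tp = sum(t == c and p == c for t, p in zip(targets, preds))
        fp = sum(t != c and p == c for t, p in zip(targets, preds))
        fn = sum(t == c and p != c for t, p in zip(targets, preds))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        rows.append((precision, recall, f1))
    return rows


# Toy data (hypothetical): 9 samples of class 0, 1 sample of class 1,
# and a degenerate model that always predicts class 0.
targets = [0] * 9 + [1]
preds = [0] * 10

accuracy = sum(t == p for t, p in zip(targets, preds)) / len(targets)
rows = per_class_prf(targets, preds, num_classes=2)
macro_f1 = sum(f1 for _, _, f1 in rows) / len(rows)

print(f"top-1 accuracy: {accuracy:.2f}")  # 0.90
print(f"macro F1:       {macro_f1:.2f}")  # 0.47
```

The minority class contributes an F1 of 0, which pulls the macro average down to about 0.47 even though accuracy is 0.90.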

Changes

  • New flag: --metrics-avg to select averaging method (weighted, macro, micro).
  • Metric computation: Uses scikit-learn for robust precision/recall/F1 calculations (soft dependency).
  • Integration: New metrics are included in console logs and results files.
  • Backward compatibility: Default script behavior remains unchanged.
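
The changes listed above could be wired together roughly as follows. This is a sketch under stated assumptions, not the code in this PR: only the `--metrics-avg` flag name and its three choices come from the description, while `build_parser`, `extra_metrics`, the default of `None`, and the `zero_division=0` setting are hypothetical choices made for illustration.

```python
import argparse

try:
    # Soft dependency: if scikit-learn is missing, the extra metrics are skipped.
    from sklearn.metrics import precision_recall_fscore_support
except ImportError:
    precision_recall_fscore_support = None


def build_parser():
    parser = argparse.ArgumentParser(description="extra-metrics sketch")
    # Flag name and choices are from the PR description; the default of None
    # (extra metrics disabled) is an assumption that keeps old behavior unchanged.
    parser.add_argument(
        "--metrics-avg",
        default=None,
        choices=["weighted", "macro", "micro"],
        help="averaging method for precision/recall/F1",
    )
    return parser


def extra_metrics(targets, preds, average):
    """Return precision/recall/F1, or an empty dict when disabled or unavailable."""
    if average is None or precision_recall_fscore_support is None:
        return {}
    precision, recall, f1, _ = precision_recall_fscore_support(
        targets, preds, average=average, zero_division=0
    )
    return {"precision": float(precision), "recall": float(recall), "f1": float(f1)}


if __name__ == "__main__":
    args = build_parser().parse_args(["--metrics-avg", "macro"])
    print(extra_metrics([0, 0, 1, 1], [0, 1, 1, 1], args.metrics_avg))
```

Returning an empty dict when the flag is unset or scikit-learn is not installed means the caller can merge the result into its existing results unconditionally.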

Closes #2506

@ha405 changed the title from "feat(validate): add extra metrics, JSON export, and latency/memory tr…" to "feat(validate): add precision, recall, and F1 metrics" on Aug 15, 2025
@rwightman
Collaborator

@ha405 looks good, I haven't tested yet, but will when I'm back from vacation in a week and a bit. Did you look at train script? Could be added there but needs extra work to cover distributed case...

@ha405
Author

ha405 commented Aug 22, 2025

hi, I looked into train.py and opened a new request #2574

Successfully merging this pull request may close these issues.

Will you add Precision, Recall, and F1 score to metrics