Pushing an automatically generated readme datacard with yourbench dataset #125

patrickfleith · 2025-06-11T18:04:43Z

Hi, this PR implements dataset card generation (#113) and uploads the card to the Hub.

Example result: https://huggingface.co/datasets/patrickfleith/yourbench_example

Let me know if I should change something 🤗

Notable Changes

- Template Creation: Added a markdown template file yourbench_card_template.md to standardize dataset cards
- Configuration Options: Implemented optional upload_card flag in YAML configurations (added to both advanced_example.yaml and simple_example.yaml)
- Integration: Modified handler.py and dataset_engine.py to automatically generate and upload dataset cards upon pipeline completion
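
As a rough illustration of how an optional flag like upload_card could be read, here is a minimal sketch. It is not the code from this PR: the `should_upload_card` helper, the `hf_configuration` parent key, and the default value are all assumptions made for the example.

```python
from typing import Any, Mapping


def should_upload_card(config: Mapping[str, Any], default: bool = True) -> bool:
    """Decide whether a dataset card should be uploaded.

    ASSUMPTIONS (not confirmed by the PR): the flag sits under a
    top-level `hf_configuration` mapping, and a missing flag falls
    back to `default`.
    """
    hf_cfg = config.get("hf_configuration") or {}
    value = hf_cfg.get("upload_card", default)
    # YAML loaders normally yield real booleans, but tolerate strings
    # such as "false" that can appear when values come from env vars.
    if isinstance(value, str):
        return value.strip().lower() in {"1", "true", "yes", "on"}
    return bool(value)


if __name__ == "__main__":
    print(should_upload_card({"hf_configuration": {"upload_card": False}}))
    print(should_upload_card({}))
```

Keeping the flag optional with an explicit default means existing configurations that never mention upload_card keep working unchanged.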

Implementation Details

Metadata Management:
- Added extract_readme_metadata and extract_dataset_info functions to preserve dataset metadata generated during pipeline steps
- This ensures all automatically generated dataset information is properly included in the README
Core Functions:
- _serialize_config_for_card: Serializes pipeline configuration to YAML for inclusion in the dataset card
- _get_pipeline_subset_info: Maps each active pipeline stage to predefined descriptions
- _generate_and_upload_dataset_card and upload_dataset_card: Handle card construction and uploading
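
The steps above can be sketched roughly as follows. This is a simplified, standard-library-only illustration and not the implementation in dataset_engine.py: the function names, the stage descriptions, the `run` key, and the inline template are all hypothetical. It shows two ideas from the list: mapping active pipeline stages to predefined descriptions, and carrying over the YAML front matter of an existing README so that metadata written by earlier steps is not lost when the card is regenerated.

```python
from __future__ import annotations

from string import Template

# Hypothetical descriptions; the real mapping lives in the PR's code.
STAGE_DESCRIPTIONS = {
    "ingestion": "Source documents converted to text.",
    "summarization": "Document-level summaries.",
    "chunking": "Documents split into chunks.",
}

# Stand-in for yourbench_card_template.md (contents assumed).
CARD_TEMPLATE = Template(
    "# $pretty_name\n\n"
    "This dataset was generated with YourBench.\n\n"
    "## Subsets\n\n"
    "$subsets\n"
)


def describe_active_stages(pipeline: dict) -> list[str]:
    """Return one bullet per stage whose (assumed) `run` flag is true."""
    lines = []
    for stage, settings in pipeline.items():
        if not (settings or {}).get("run", False):
            continue
        description = STAGE_DESCRIPTIONS.get(stage, "No description available.")
        lines.append(f"- **{stage}**: {description}")
    return lines


def split_front_matter(readme: str) -> tuple[str, str]:
    """Split a README into (front matter block incl. '---' fences, body)."""
    lines = readme.splitlines(keepends=True)
    if not lines or lines[0].strip() != "---":
        return "", readme
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "".join(lines[: i + 1]), "".join(lines[i + 1 :])
    return "", readme  # unterminated block: treat everything as body


def build_card(existing_readme: str, pretty_name: str, pipeline: dict) -> str:
    """Render a new card body while keeping the existing front matter."""
    front, _old_body = split_front_matter(existing_readme)
    if front and not front.endswith("\n"):
        front += "\n"
    subsets = "\n".join(describe_active_stages(pipeline)) or "- (no active stages)"
    return front + CARD_TEMPLATE.substitute(pretty_name=pretty_name, subsets=subsets)
```

For example, calling `build_card(old_readme, "Demo", {"ingestion": {"run": True}, "chunking": {"run": False}})` would keep the front matter of `old_readme`, replace its body, and list only the ingestion subset.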

Badge Design

- Created a YourBench badge based on the axolotl recommendation from the original issue
- Provided two versions:
  - SVG format for future modifications
  - PNG format (200x32) for README display
- Note: The badge references https://raw.githubusercontent.com/huggingface/yourbench/main/docs/assets/yourbench-badge-web.png and will render properly after merging

patrickfleith · 2025-06-24T17:11:08Z

Hi @sumukshashidhar @alozowski do you need some help with the PR? I appreciate it looks overly complex with respect to what we are trying to achieve, but it was much trickier than I initially thought.

sumukshashidhar · 2025-06-25T02:08:16Z

Hi @patrickfleith! Could you please resolve the merge conflict with the dataset engine? If you're unable to, I'd be super happy to do it!

sumukshashidhar · 2025-06-26T05:25:50Z

Hi @patrickfleith! I did a preliminary merge! I'll test the functionality later today!

sumukshashidhar · 2025-06-26T05:30:44Z

@patrickfleith I don't seem to have access to your repo, so just running make style and make quality should make this mergeable.

Thank you so much for your work!

patrickfleith · 2025-06-26T21:32:21Z

> @patrickfleith I don't seem to have access to your repo, so just running make style and make quality should make this mergeable.
>
> Thank you so much for your work!

Hey, thanks for the preliminary merge. I believe you also resolved the conflict during this merge, right? Because I don't see the conflict anymore. I just pushed a new commit with make style and make quality. I tested it with simple_example and it seems to work!

sumukshashidhar · 2025-06-27T15:00:31Z

Thank you so much! I'll merge this into main!

patrickfleith added 4 commits June 10, 2025 20:36

Adding the upload_card parameter in simple_example.yaml and advanced_example.yaml

6d89312

created a template markdown for dataset card

b1a00ff

created yourbench badge svg and png assets

0c63d03

Adding routines to dataset_engine.py to generate the card, upload it, but also extract the metadata card to preserve them.

cce35f5

sumukshashidhar requested a review from alozowski June 25, 2025 02:07

Merge branch 'main' into feature/push-read-me-datacard-with-yourbench-dataset

54197eb

Ran make style and make quality

58e81a5

sumukshashidhar self-requested a review June 27, 2025 15:00

sumukshashidhar approved these changes Jun 27, 2025

sumukshashidhar merged commit 78f7d2a into huggingface:main Jun 27, 2025
6 checks passed
