Push an automatically generated README dataset card with the YourBench dataset #125
Hi, this PR implements dataset card generation (#113) and uploads the generated card to the Hub.
Example result: https://huggingface.co/datasets/patrickfleith/yourbench_example
Let me know if I should change something 🤗
Notable Changes

- `yourbench_card_template.md`: a template to standardize dataset cards
- `upload_card` flag in the YAML configurations (added to both `advanced_example.yaml` and `simple_example.yaml`)
- `handler.py` and `dataset_engine.py`: updated to automatically generate and upload dataset cards upon pipeline completion

Implementation Details
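A minimal sketch of how these pieces could fit together at the end of a run: each active stage is mapped to a description, the result is rendered into a card template, and the card is handed off for upload only when `upload_card` is enabled. Everything here is a hypothetical illustration, not yourbench's actual code. The stage names and descriptions, the `pipeline.<stage>.run` and `hf_configuration.upload_card` config layout, the `True` default, the template text, and the helper names are all assumptions.

```python
from string import Template
from typing import Callable

# Hypothetical stage descriptions (assumption): the real text lives in yourbench.
STAGE_DESCRIPTIONS = {
    "ingestion": "Raw documents converted to text.",
    "summarization": "Document-level summaries.",
    "chunking": "Documents split into chunks.",
    "single_shot_question_generation": "Questions generated from single chunks.",
}

# Hypothetical stand-in for yourbench_card_template.md (assumption).
CARD_TEMPLATE = Template(
    "# $dataset_name\n\n"
    "This dataset was generated with YourBench.\n\n"
    "## Pipeline subsets\n\n"
    "$subsets\n"
)


def get_pipeline_subset_info(config: dict) -> str:
    """Illustrative analogue of _get_pipeline_subset_info: one bullet per enabled stage."""
    lines = []
    for stage, settings in config.get("pipeline", {}).items():
        # Assumed layout: pipeline.<stage>.run is a boolean toggle.
        if (settings or {}).get("run", False) and stage in STAGE_DESCRIPTIONS:
            lines.append(f"- **{stage}**: {STAGE_DESCRIPTIONS[stage]}")
    return "\n".join(lines)


def build_card(config: dict, dataset_name: str) -> str:
    """Fill the template with the dataset name and the active-stage summary."""
    return CARD_TEMPLATE.substitute(
        dataset_name=dataset_name,
        subsets=get_pipeline_subset_info(config),
    )


def maybe_upload_card(config: dict, dataset_name: str, upload_fn: Callable[[str], None]) -> bool:
    """Build and upload the card only if upload_card is enabled (location and default assumed)."""
    if not config.get("hf_configuration", {}).get("upload_card", True):
        return False
    upload_fn(build_card(config, dataset_name))
    return True
```

The uploader is injected so the sketch runs offline. In a real run it could wrap `huggingface_hub.DatasetCard(content).push_to_hub(repo_id)`, but how the PR's `upload_dataset_card` actually performs the upload is not shown here.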
- Metadata Management: `extract_readme_metadata` and `extract_dataset_info` functions preserve the dataset metadata generated during pipeline steps
- Core Functions:
  - `_serialize_config_for_card`: serializes the pipeline configuration to YAML for inclusion in the dataset card
  - `_get_pipeline_subset_info`: maps each active pipeline stage to a predefined description
  - `_generate_and_upload_dataset_card` and `upload_dataset_card`: handle card construction and uploading

Badge Design
The badge image is loaded from https://raw.githubusercontent.com/huggingface/yourbench/main/docs/assets/yourbench-badge-web.png on the `main` branch, so it will render properly only after this PR is merged.