Added post processing (for reasoning tokens) to pipeline #882
Merged
19 commits
44bd18f added post processing (clefourrier)
d32d683 default factory for dataclass (clefourrier)
90f6969 style (clefourrier)
33ae811 fix (clefourrier)
772e9f7 fix, tokens should be list of tuples (clefourrier)
dfa9b01 couple bug fixes with the linter + an actual bug when updating the da… (clefourrier)
2ff3626 doc update for reasoning_pair kwarg (clefourrier)
7083da4 manage user args (clefourrier)
4fa53fb add aime avg@64 like in qwen paper (clefourrier)
4b8c7c5 small fix max_len for vllm models (clefourrier)
9525a29 up metric (clefourrier)
18a5396 unrelated, updated doc which was completely outdated (clefourrier)
6349a79 more robust reasoning token management (clefourrier)
17008e8 small reorg of lighteval task class to allow mocking (clefourrier)
40d04c3 test suite (clefourrier)
8a64c6d Merge branch 'main' into think (clefourrier)
b948696 fix tests (clefourrier)
94c756d update doc (clefourrier)
3f56c4f Merge branch 'main' into think (clefourrier)
@@ -31,10 +31,10 @@
 app = typer.Typer()
 
 
-HELP_PANNEL_NAME_1 = "Common Parameters"
-HELP_PANNEL_NAME_2 = "Logging Parameters"
-HELP_PANNEL_NAME_3 = "Debug Parameters"
-HELP_PANNEL_NAME_4 = "Modeling Parameters"
+HELP_PANEL_NAME_1 = "Common Parameters"
+HELP_PANEL_NAME_2 = "Logging Parameters"
+HELP_PANEL_NAME_3 = "Debug Parameters"
+HELP_PANEL_NAME_4 = "Modeling Parameters"
 
 
 @app.command(rich_help_panel="Evaluation Backends")

Review comment (on the HELP_PANEL_NAME_1 line): nit, unrelated to the PR

@@ -45,46 +45,56 @@ def custom(
     tasks: Annotated[str, Argument(help="Comma-separated list of tasks to evaluate on.")],
     # === Common parameters ===
     dataset_loading_processes: Annotated[
-        int, Option(help="Number of processes to use for dataset loading.", rich_help_panel=HELP_PANNEL_NAME_1)
+        int, Option(help="Number of processes to use for dataset loading.", rich_help_panel=HELP_PANEL_NAME_1)
     ] = 1,
     custom_tasks: Annotated[
-        Optional[str], Option(help="Path to custom tasks directory.", rich_help_panel=HELP_PANNEL_NAME_1)
+        Optional[str], Option(help="Path to custom tasks directory.", rich_help_panel=HELP_PANEL_NAME_1)
     ] = None,
     num_fewshot_seeds: Annotated[
-        int, Option(help="Number of seeds to use for few-shot evaluation.", rich_help_panel=HELP_PANNEL_NAME_1)
+        int, Option(help="Number of seeds to use for few-shot evaluation.", rich_help_panel=HELP_PANEL_NAME_1)
     ] = 1,
+    remove_reasoning_tags: Annotated[
+        bool, Option(help="Remove reasoning tags from responses.", rich_help_panel=HELP_PANEL_NAME_1)
+    ] = True,
+    reasoning_tags: Annotated[
+        str | None,
+        Option(
+            help="List of reasoning tags (provided as pairs) to remove from responses. Default is [('<think>', '</think>')].",
+            rich_help_panel=HELP_PANEL_NAME_1,
+        ),
+    ] = None,
     # === saving ===
     output_dir: Annotated[
-        str, Option(help="Output directory for evaluation results.", rich_help_panel=HELP_PANNEL_NAME_2)
+        str, Option(help="Output directory for evaluation results.", rich_help_panel=HELP_PANEL_NAME_2)
     ] = "results",
     results_path_template: Annotated[
         str | None,
         Option(
             help="Template path for where to save the results, you have access to 3 variables, `output_dir`, `org` and `model`. for example a template can be `'{output_dir}/1234/{org}+{model}'`",
-            rich_help_panel=HELP_PANNEL_NAME_2,
+            rich_help_panel=HELP_PANEL_NAME_2,
         ),
     ] = None,
     push_to_hub: Annotated[
-        bool, Option(help="Push results to the huggingface hub.", rich_help_panel=HELP_PANNEL_NAME_2)
+        bool, Option(help="Push results to the huggingface hub.", rich_help_panel=HELP_PANEL_NAME_2)
     ] = False,
     push_to_tensorboard: Annotated[
-        bool, Option(help="Push results to tensorboard.", rich_help_panel=HELP_PANNEL_NAME_2)
+        bool, Option(help="Push results to tensorboard.", rich_help_panel=HELP_PANEL_NAME_2)
     ] = False,
     public_run: Annotated[
-        bool, Option(help="Push results and details to a public repo.", rich_help_panel=HELP_PANNEL_NAME_2)
+        bool, Option(help="Push results and details to a public repo.", rich_help_panel=HELP_PANEL_NAME_2)
     ] = False,
     results_org: Annotated[
-        Optional[str], Option(help="Organization to push results to.", rich_help_panel=HELP_PANNEL_NAME_2)
+        Optional[str], Option(help="Organization to push results to.", rich_help_panel=HELP_PANEL_NAME_2)
     ] = None,
     save_details: Annotated[
-        bool, Option(help="Save detailed, sample per sample, results.", rich_help_panel=HELP_PANNEL_NAME_2)
+        bool, Option(help="Save detailed, sample per sample, results.", rich_help_panel=HELP_PANEL_NAME_2)
     ] = False,
     # === debug ===
     max_samples: Annotated[
-        Optional[int], Option(help="Maximum number of samples to evaluate on.", rich_help_panel=HELP_PANNEL_NAME_3)
+        Optional[int], Option(help="Maximum number of samples to evaluate on.", rich_help_panel=HELP_PANEL_NAME_3)
     ] = None,
     job_id: Annotated[
-        int, Option(help="Optional job id for future refenrence.", rich_help_panel=HELP_PANNEL_NAME_3)
+        int, Option(help="Optional job id for future refenrence.", rich_help_panel=HELP_PANEL_NAME_3)
     ] = 0,
 ):
     """

@@ -113,6 +123,8 @@ def custom(
         custom_tasks_directory=custom_tasks,
         num_fewshot_seeds=num_fewshot_seeds,
         max_samples=max_samples,
+        remove_reasoning_tags=remove_reasoning_tags,
+        reasoning_tags=reasoning_tags,
     )
     pipeline = Pipeline(
         tasks=tasks,

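For readers who want a feel for what the `remove_reasoning_tags` / `reasoning_tags` options are meant to do, here is a rough sketch. It is not the code from this PR: the helper names `parse_reasoning_tags` and `strip_reasoning` are hypothetical, and it assumes the CLI string is a Python literal list of (open, close) pairs, as the help text's default `[('<think>', '</think>')]` suggests.

```python
import ast
import re
from typing import List, Optional, Tuple

# Default pair quoted in the option's help text.
DEFAULT_REASONING_TAGS: List[Tuple[str, str]] = [("<think>", "</think>")]


def parse_reasoning_tags(raw: Optional[str]) -> List[Tuple[str, str]]:
    """Hypothetical helper: turn the CLI string into a list of (open, close) pairs.

    Assumes the string is a Python literal such as "[('<think>', '</think>')]".
    """
    if raw is None:
        return list(DEFAULT_REASONING_TAGS)
    parsed = ast.literal_eval(raw)  # safe literal parsing, no code execution
    if not isinstance(parsed, (list, tuple)):
        raise ValueError("reasoning tags must be a list of (open, close) pairs")
    tags = [tuple(pair) for pair in parsed]
    for pair in tags:
        if len(pair) != 2 or not all(isinstance(t, str) for t in pair):
            raise ValueError(f"invalid reasoning tag pair: {pair!r}")
    return tags


def strip_reasoning(text: str, tags: List[Tuple[str, str]]) -> str:
    """Hypothetical helper: drop every open...close span, tags included."""
    for open_tag, close_tag in tags:
        # Non-greedy match so separate reasoning blocks are removed one by one;
        # DOTALL lets the reasoning span multiple lines.
        pattern = re.escape(open_tag) + r".*?" + re.escape(close_tag)
        text = re.sub(pattern, "", text, flags=re.DOTALL)
    return text.strip()


tags = parse_reasoning_tags(None)
answer = strip_reasoning("<think>2+2\nis 4</think> The answer is 4.", tags)
```

An unclosed opening tag is left untouched in this sketch; how the pipeline handles that case is not shown in this diff.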
Unrelated to PR, incorrect doc was updated