Update AXLearn performance script #1593

Closed
wants to merge 2 commits into from

Steboss
Contributor

@Steboss Steboss commented Jul 31, 2025

Add the metrics to the output log file

@Steboss Steboss requested review from aaronp24, olupton and gpupuck and removed request for aaronp24 July 31, 2025 16:38
f"\n=== Final metrics ===\n"
f"Tokens/s/device: {tokens_per_sec_gpu.mean()} +/- {tokens_per_sec_gpu.std()}\n"
f"Seqs/s/device: {seqs_per_sec_gpu.mean()} +/- {seqs_per_sec_gpu.std()}\n"
f"AvgTimestep: {times_arr.mean()} +/- {times_arr.std()}\n"
Contributor

A bit confused, what's Timestep?

Contributor Author

Hey @gpupuck,
The name "Timestep" is a bit misleading; it comes directly from AXLearn. It is the number of seconds taken to perform a single training step.
By the way, I was thinking of saving these metrics to a dedicated metric.log file rather than writing them directly to the output log. WDYT?
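Because Timestep is seconds per training step, the other two quantities in the summary can be derived from it. A minimal sketch, assuming every step processes a fixed global batch of `global_batch_size` sequences of `seq_len` tokens split evenly over `num_devices` devices; the helper `per_device_throughput`, its parameter names, and the sample timings are hypothetical and not taken from the AXLearn script:

```python
import numpy as np


def per_device_throughput(step_times_s, global_batch_size, seq_len, num_devices):
    """Derive per-device throughput arrays from per-step wall-clock times.

    Hypothetical helper: assumes each step processes `global_batch_size`
    sequences of `seq_len` tokens, split evenly across `num_devices`.
    """
    # Seconds per training step (what AXLearn calls "Timestep").
    times_arr = np.asarray(step_times_s, dtype=np.float64)
    # Sequences per second for the whole job, divided across devices.
    seqs_per_sec_gpu = global_batch_size / times_arr / num_devices
    # Every sequence contributes seq_len tokens.
    tokens_per_sec_gpu = seqs_per_sec_gpu * seq_len
    return times_arr, seqs_per_sec_gpu, tokens_per_sec_gpu


# Illustrative values only.
times_arr, seqs_per_sec_gpu, tokens_per_sec_gpu = per_device_throughput(
    [0.50, 0.52, 0.48], global_batch_size=256, seq_len=4096, num_devices=8
)
print(
    f"\n=== Final metrics ===\n"
    f"Tokens/s/device: {tokens_per_sec_gpu.mean()} +/- {tokens_per_sec_gpu.std()}\n"
    f"Seqs/s/device: {seqs_per_sec_gpu.mean()} +/- {seqs_per_sec_gpu.std()}\n"
    f"AvgTimestep: {times_arr.mean()} +/- {times_arr.std()}\n"
)
```

Note that averaging per-step throughput values is not the same as dividing by the mean step time; the sketch follows the per-step averaging that the f-string above implies.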

Contributor

I think printing the metrics to stdout is good enough. It's easier to process the log if all the metrics are there.

Contributor Author

That's true.
However, I'm now running some tests in the cloud, and I need to save these metrics to storage so I can check them after the model has run.

Collaborator

Just do both? Print human-readable to stdout, dump machine-readable (json?) values to a separate file.
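A minimal sketch of doing both, under stated assumptions: the helper `report_metrics`, the JSON key names, and the output path are hypothetical (the thread only mentions a metric.log file), and the inputs are the per-step arrays printed in the summary snippet above.

```python
import json
import os
import tempfile

import numpy as np


def report_metrics(times_arr, seqs_per_sec_gpu, tokens_per_sec_gpu, json_path, stream=None):
    """Print a human-readable summary and dump the same numbers as JSON.

    Hypothetical helper: `json_path` and the key names are illustrative,
    not part of the AXLearn performance script.
    """
    summary = {
        name: {"mean": float(np.mean(values)), "std": float(np.std(values))}
        for name, values in (
            ("tokens_per_sec_per_device", tokens_per_sec_gpu),
            ("seqs_per_sec_per_device", seqs_per_sec_gpu),
            ("step_time_s", times_arr),
        )
    }
    # Human-readable block for the job log; file=None means sys.stdout.
    print("\n=== Final metrics ===", file=stream)
    for name, stats in summary.items():
        print(f"{name}: {stats['mean']:.4f} +/- {stats['std']:.4f}", file=stream)
    # Machine-readable copy that can be uploaded to storage after the run.
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(summary, f, indent=2)
    return summary


# Usage with illustrative values, writing into a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "metrics.json")
    report_metrics([0.50, 0.52, 0.48], [64.0, 61.5, 66.7], [262144.0, 251904.0, 273203.2], path)
    with open(path, encoding="utf-8") as f:
        print(json.load(f)["step_time_s"])
```

Keeping both outputs behind one function means the stdout log and the JSON file are always computed from the same numbers.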

Contributor Author

You're right, Olli.
I ended up doing exactly that :) so there's no need to change the script.

@Steboss Steboss requested a review from gpupuck August 1, 2025 08:54
@Steboss Steboss closed this Aug 4, 2025
3 participants