Update AXLearn performance script #1593

Steboss · 2025-07-31T16:38:29Z

Add the metrics to the output log file

gpupuck · 2025-07-31T16:51:12Z

.github/container/fuji-train-perf.py

+            f"\n=== Final metrics ===\n"
+            f"Tokens/s/device: {tokens_per_sec_gpu.mean()} +/- {tokens_per_sec_gpu.std()}\n"
+            f"Seqs/s/device: {seqs_per_sec_gpu.mean()} +/- {seqs_per_sec_gpu.std()}\n"
+            f"AvgTimestep: {times_arr.mean()} +/- {times_arr.std()}\n"
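The diff only shows how the final metrics are formatted. Below is a minimal sketch of how the arrays it prints could be derived. It assumes `times_arr` holds per-step wall-clock durations in seconds and that the global batch size, sequence length and device count are known and fixed. The helper `throughput_metrics` and its parameters are hypothetical and are not taken from `fuji-train-perf.py`.

```python
import numpy as np


def throughput_metrics(step_times_s, global_batch_size, seq_len, num_devices):
    """Hypothetical helper: per-device throughput arrays from per-step times.

    step_times_s: wall-clock seconds per training step (the quantity reported
    as AvgTimestep), one entry per measured step.
    """
    times_arr = np.asarray(step_times_s, dtype=np.float64)
    # Work done by the whole job in one step (assumes a fixed batch shape).
    seqs_per_step = global_batch_size
    tokens_per_step = global_batch_size * seq_len
    # Throughput for each step, split evenly across devices.
    seqs_per_sec_gpu = seqs_per_step / times_arr / num_devices
    tokens_per_sec_gpu = tokens_per_step / times_arr / num_devices
    return times_arr, tokens_per_sec_gpu, seqs_per_sec_gpu


# Illustrative inputs only, not measured results.
times_arr, tokens_per_sec_gpu, seqs_per_sec_gpu = throughput_metrics(
    [0.50, 0.52, 0.48], global_batch_size=8, seq_len=2048, num_devices=8
)
print(
    f"\n=== Final metrics ===\n"
    f"Tokens/s/device: {tokens_per_sec_gpu.mean()} +/- {tokens_per_sec_gpu.std()}\n"
    f"Seqs/s/device: {seqs_per_sec_gpu.mean()} +/- {seqs_per_sec_gpu.std()}\n"
    f"AvgTimestep: {times_arr.mean()} +/- {times_arr.std()}\n"
)
```

Averaging per-step throughput weights every step equally; dividing total tokens by total elapsed time would give a slightly lower figure whenever step times vary.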


I'm a bit confused: what is Timestep?

Steboss · 2025-07-31

Hey @gpupuck,
The name Timestep is a bit misleading; it comes directly from AXLearn. It is the number of seconds a single training step takes.
BTW, I was thinking of saving these metrics to a dedicated metric.log file rather than writing them directly to the output log. WDYT?

gpupuck · 2025-07-31

I think printing the metrics to stdout is good enough. It's easier to process the log if all the metrics are in one place.

Steboss · 2025-08-04

That's true.
However, I am now running some tests in the cloud, and I need to save these metrics to storage so I can check them after the model has run.

olupton · 2025-08-04

Just do both? Print human-readable values to stdout and dump machine-readable (JSON?) values to a separate file.
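A minimal sketch of that "do both" approach, assuming the metrics have already been reduced to (mean, std) pairs. The helper `report_metrics` and the `metrics.json` file name are hypothetical (the thread itself mentions a `metric.log` file), and the values below are placeholders rather than measured results.

```python
import json
import os
import tempfile


def report_metrics(metrics, path):
    """Hypothetical helper: print metrics for humans and dump them as JSON."""
    # Human-readable copy on stdout, in the same style as the script's log.
    lines = ["", "=== Final metrics ==="]
    for name, (mean, std) in metrics.items():
        lines.append(f"{name}: {mean} +/- {std}")
    print("\n".join(lines))

    # Machine-readable copy, e.g. for upload to cloud storage after the run.
    # float() guards against NumPy scalars such as float32, which json rejects.
    payload = {
        name: {"mean": float(mean), "std": float(std)}
        for name, (mean, std) in metrics.items()
    }
    with open(path, "w", encoding="utf-8") as f:
        json.dump(payload, f, indent=2)
    return payload


# Placeholder values for illustration only; not measured results.
metrics = {
    "Tokens/s/device": (4096.0, 12.5),
    "Seqs/s/device": (2.0, 0.01),
    "AvgTimestep": (0.5, 0.02),
}
out_path = os.path.join(tempfile.gettempdir(), "metrics.json")
report_metrics(metrics, out_path)
```

Keeping stdout unchanged preserves the existing log-parsing workflow, while the JSON file can be copied to storage and read back later without parsing log text.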

Steboss · 2025-08-04

You're right, Olli.
That's what I ended up doing :) No need to change the script.

update the script to have the metrics within the output file

f60f8d9

Steboss requested review from aaronp24, olupton and gpupuck and removed request for aaronp24 July 31, 2025 16:38

gpupuck reviewed Jul 31, 2025


save metrics to a metrics file

1ae6686

Steboss requested a review from gpupuck August 1, 2025 08:54

Steboss closed this Aug 4, 2025
