
Conversation

trianxy commented Mar 13, 2025

This PR adds optional validation using HellaSwag accuracy (on 10042 validation examples).

It is controlled by setting the hyperparameter hellaswag=True.

Motivation
I am trying to make the model smarter by adjusting the architecture/training objective, and I wanted a benchmark like HellaSwag, which may correlate better with model capability than loss does. By the way: having this repo, which makes a model converge quickly, is super helpful!

Not sure if it's worth merging into master. Some thoughts:

  • Pros:
    • more data/confidence on whether a change improved the model
    • relatedly, it improves comparability with other models and training runs
  • Cons:
    • adds ~150 lines of code
    • adds ~40s wall time to the script when running on 8 x H100:
      • ~30s to download data + preprocess + warm-up
      • ~6s to run validations (~0.4s per validation pass over the 10042 examples: the code packs 400-800 HellaSwag tasks into one sequence, which is then evaluated with a single forward pass; each of the 8 GPUs needs 2 forward passes to cover all examples; see the sketch after this list)
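For reference, here is a minimal sketch of the packing idea, assuming a causal LM `model` that returns logits of shape `(1, T, vocab)` and a `tokenizer` with `.encode()`; both names are placeholders, not the PR's actual code. A real implementation would also need to stop candidates from attending across task boundaries (e.g. via document separators or a block-diagonal attention mask), which this sketch omits for brevity.

```python
# Minimal sketch of packing many HellaSwag candidates into one sequence
# and scoring them with a single forward pass (placeholder names, not the
# PR's actual code; cross-candidate attention masking is omitted).
import torch
import torch.nn.functional as F

@torch.no_grad()
def packed_hellaswag_accuracy(model, tokenizer, tasks, device="cuda"):
    # tasks: list of (context_str, [4 ending_strs], correct_label)
    tokens, segments = [], []  # segments: (start, end, n_ctx, task_idx, ending_idx)
    for t_idx, (ctx, endings, _label) in enumerate(tasks):
        ctx_tok = tokenizer.encode(ctx)
        for e_idx, ending in enumerate(endings):
            end_tok = tokenizer.encode(" " + ending)
            start = len(tokens)
            tokens.extend(ctx_tok + end_tok)
            segments.append((start, len(tokens), len(ctx_tok), t_idx, e_idx))

    x = torch.tensor(tokens, device=device).unsqueeze(0)
    logits = model(x)                                    # (1, T, vocab), assumed interface
    logprobs = F.log_softmax(logits[0].float(), dim=-1)

    scores = torch.full((len(tasks), 4), float("-inf"), device=device)
    for start, end, n_ctx, t_idx, e_idx in segments:
        # the token at position i is predicted by the logits at position i-1
        idx = torch.arange(start + n_ctx, end, device=device)
        tok_lp = logprobs[idx - 1].gather(1, x[0, idx, None]).squeeze(1)
        scores[t_idx, e_idx] = tok_lp.mean()             # mean log-likelihood of the ending

    preds = scores.argmax(dim=1)
    labels = torch.tensor([label for _, _, label in tasks], device=device)
    return (preds == labels).float().mean().item()
```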

Happy to adjust the PR if helpful.

trianxy (Author) commented May 5, 2025

Hey @KellerJordan, checking in to see if there's anything I can do to make this PR easier to review or more valuable.

Also, feel free to close it if you feel it's out of scope.

KellerJordan (Owner) commented:
Hi @trianxy, thanks for the PR. I think adding code to easily get the HellaSwag score is a good idea; however, I want to keep it out of train_gpt.py to avoid complexity and wallclock increase. For me to accept a PR adding HellaSwag, it would probably need to instead add a hellaswag.py file into data/ which computes the score by consuming a checkpoint generated by setting save_checkpoint=True in the Args in train_gpt.py.
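Roughly, such a script could take the following shape. This is a hedged sketch only: the checkpoint keys, the model constructor, the `load_hellaswag_data` helper, and the import from train_gpt.py are assumptions, not the repo's actual interface.

```python
# data/hellaswag.py -- rough sketch; checkpoint format, constructor and
# import names are assumptions, not the repo's actual API.
import argparse
import torch

from train_gpt import GPT  # hypothetical import of the model class


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--checkpoint", required=True,
                        help="checkpoint written by train_gpt.py with save_checkpoint=True")
    args = parser.parse_args()

    ckpt = torch.load(args.checkpoint, map_location="cuda")
    model = GPT()                          # hypothetical constructor / config
    model.load_state_dict(ckpt["model"])   # assumed checkpoint key
    model.cuda().eval()

    tokenizer, tasks = load_hellaswag_data()  # hypothetical helper: downloads + tokenizes the val set
    acc = packed_hellaswag_accuracy(model, tokenizer, tasks)  # e.g. the packed evaluation sketched earlier
    print(f"HellaSwag accuracy: {acc:.4f}")


if __name__ == "__main__":
    main()
```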

trianxy (Author) commented May 28, 2025

Sounds good @KellerJordan - I'll take a stab at adjusting this PR accordingly

trianxy (Author) commented Aug 12, 2025

Hey @gszfwsb, thanks for the approval. Before we merge, let me implement what @KellerJordan mentioned in #89 (comment) and also resolve the conflicts.

Haven't gotten around to it yet, but am looking to do it in September.

After that, I'd ask you to take one more look.

trianxy force-pushed the hellaswag branch 2 times, most recently from 9b8fb2c to 473800c on August 27, 2025 10:25
trianxy (Author) commented Aug 27, 2025

Hey @gszfwsb, I just updated this PR to incorporate @KellerJordan's suggestion to move the HellaSwag code into a separate file data/hellaswag.py.

To both of you: Feel free to let me know if there's anything else I can do on my side.

FYI

I didn't change how train_gpt.py works, but I had to move a chunk of code behind an if __name__ == "__main__": guard, because I wanted to import the model from that file without re-running the training loop.
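Schematically, the restructuring looks like this (a simplified sketch, not the actual contents of train_gpt.py):

```python
# train_gpt.py (schematic): definitions stay at module level, so
# `from train_gpt import GPT` has no side effects; training only runs
# when the file is executed directly.
import torch.nn as nn


class GPT(nn.Module):
    # placeholder model definition, importable from other files
    def __init__(self):
        super().__init__()
        self.lm_head = nn.Linear(768, 50304, bias=False)


def train():
    # the existing training loop, moved behind the guard (schematic)
    pass


if __name__ == "__main__":
    train()  # not executed on `import train_gpt`
```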

There would have been a few options to implement this PR without touching train_gpt.py, e.g.

  • Option 1: read the source code of train_gpt.py into a string, strip the parts that aren't needed, and then exec(...) it
  • Option 2: use the ast module to load from the source code only the objects that are necessary to define the model architecture

I decided against these, because they felt less maintainable and more likely to break in the future.
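For concreteness, a rough sketch of what Option 2 might have looked like (illustrative only; it was not implemented, and the function name is made up):

```python
# Rough sketch of the rejected Option 2: use `ast` to keep only imports,
# class and function definitions from train_gpt.py, then exec the reduced
# module to obtain the model classes without running the training loop.
import ast


def load_model_definitions(path="train_gpt.py"):
    with open(path) as f:
        tree = ast.parse(f.read())
    kept = [node for node in tree.body
            if isinstance(node, (ast.Import, ast.ImportFrom,
                                 ast.ClassDef, ast.FunctionDef))]
    reduced = ast.Module(body=kept, type_ignores=[])
    namespace = {}
    exec(compile(reduced, path, "exec"), namespace)
    return namespace  # e.g. namespace["GPT"] would be the model class

# Fragile: module-level constants (and anything the class bodies reference
# at definition time) would be silently dropped, which is part of why this
# felt more likely to break in the future.
```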

trianxy requested a review from gszfwsb on August 28, 2025 10:47