Description
Bug description
Hello, I was following the ch05/07_gpt_to_llama/converting-gpt-to-llama2.ipynb notebook and found that the output of Llama 2 7B with the loaded weights differs from the output of Hugging Face's version. I used greedy decoding to generate the response with both the notebook code and Hugging Face. To make sure I was using the correct decoding settings on the Hugging Face side, I ran the same comparison with the gpt2 model, and there the Hugging Face and notebook outputs matched.
Here is a comparison of the outputs using the same seed and greedy decoding.
Because the Hugging Face tokenizer adds a beginning-of-sequence (BOS) token in front of the sequence for llama2 models, I added it to the notebook input as well and generated the following:
Every effort has been made to ensure that the information contained in this website is accurate and up to date and correct at the time of publication
Here is Hugging Face output with a little bit more tokens:
Every effort has been made to ensure that the information contained in this website is accurate and up to date. However, the information is provided without any warranty, express or implied, as to the accuracy
The two outputs are identical through "accurate and up to date" and then diverge.
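To pin down where the two generations split, a small helper like the one below may be useful. It is only a sketch: `first_divergence` is a hypothetical helper name, not something from the notebook or from Hugging Face, and it compares whitespace-separated words as a stand-in for the actual token IDs, which would be the more precise thing to compare.

```python
def first_divergence(a, b):
    """Return the index of the first position where sequences a and b differ.

    Returns None if the sequences are identical. If one sequence is a strict
    prefix of the other, returns the length of the shorter sequence.
    """
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


# The two outputs quoted in this report, split on whitespace.
# Assumption: words are used here only for illustration; comparing the
# generated token IDs from both implementations would be more exact.
notebook_out = (
    "Every effort has been made to ensure that the information contained "
    "in this website is accurate and up to date and correct at the time "
    "of publication"
).split()
hf_out = (
    "Every effort has been made to ensure that the information contained "
    "in this website is accurate and up to date. However, the information "
    "is provided without any warranty, express or implied, as to the accuracy"
).split()

idx = first_divergence(notebook_out, hf_out)
print("first differing word index:", idx)
print("notebook:", notebook_out[idx], "| Hugging Face:", hf_out[idx])
```

Running the same comparison on token IDs, and then inspecting the logits of both models at that position, should show whether the gap between the top two candidates is small (a numerical precision effect) or large (a likely weight-loading or architecture mismatch).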
What operating system are you using?
Linux
Where do you run your code?
Local (laptop, desktop)
Environment
[OK] Your Python version is 3.12.9
2025-07-22 00:22:31.985392: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1753161752.316201 7968 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1753161752.409691 7968 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
W0000 00:00:1753161753.247778 7968 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1753161753.247806 7968 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1753161753.247810 7968 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1753161753.247814 7968 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
2025-07-22 00:22:33.321338: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
[OK] torch 2.7.0+cu126
[OK] jupyterlab 4.3.5
[OK] tiktoken 0.9.0
[OK] matplotlib 3.10.1
[OK] tensorflow 2.19.0
[OK] tqdm 4.67.1
[FAIL] numpy 2.2.6, please install a version matching <2.1,>=1.26
[OK] pandas 2.2.3
[OK] psutil 6.1.1