
Commit 22a71d5

improve comment
1 parent 3138dc1 commit 22a71d5

File tree

1 file changed (+3, -1 lines)


ch05/07_gpt_to_llama/converting-gpt-to-llama2.ipynb

Lines changed: 3 additions & 1 deletion
@@ -411,7 +411,9 @@
     "\n",
     "- Unlike traditional absolute positional embeddings, Llama uses rotary position embeddings (RoPE), which enable it to capture both absolute and relative positional information simultaneously\n",
     "- The reference paper for RoPE is [RoFormer: Enhanced Transformer with Rotary Position Embedding (2021)](https://arxiv.org/abs/2104.09864)\n",
-    "- This code uses the RoPE **split-halves** style, which matches the Hugging Face Transformers implementation ([modeling\_llama.py](https://github.com/huggingface/transformers/blob/e42587f596181396e1c4b63660abf0c736b10dae/src/transformers/models/llama/modeling_llama.py#L173-L188)).<br> The original RoPE paper and Meta’s official LLaMA-2 repo, however, use the **interleaved (even/odd)** version ([llama/model.py](https://github.com/meta-llama/llama/blob/6c7fe276574e78057f917549435a2554000a876d/llama/model.py#L64-L74)).<br> Both variants work fine — just be aware of this difference to avoid confusion."
+    "- RoPE can be implemented in two equivalent ways: the *split-halves* version and the *interleaved even/odd version*; they are mathematically the same as long as we pair dimensions consistently and use the same cos/sin ordering (see [this](https://github.com/rasbt/LLMs-from-scratch/issues/751) GitHub discussion for more information)\n",
+    "- This code uses the RoPE *split-halves* approach, similar to Hugging Face transformers ([modeling_llama.py](https://github.com/huggingface/transformers/blob/e42587f596181396e1c4b63660abf0c736b10dae/src/transformers/models/llama/modeling_llama.py#L173-L188))\n",
+    "- The original RoPE paper and Meta's official Llama 2 repository, however, use the *interleaved (even/odd)* version ([llama/model.py](https://github.com/meta-llama/llama/blob/6c7fe276574e78057f917549435a2554000a876d/llama/model.py#L64-L74)); but as mentioned earlier, they are equivalent"
     ]
     },
     {
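To make the equivalence described in the updated comment concrete, here is a minimal, self-contained PyTorch sketch (not code from the notebook; the function names `rope_split_halves` and `rope_interleaved` are made up for illustration). It shows that the two variants apply the same set of 2D rotations and differ only by a fixed permutation of the head dimensions.

```python
import torch

def rope_split_halves(x, pos, theta_base=10_000.0):
    # Pairs dimension j with dimension j + d/2 (Hugging Face "rotate_half" style).
    d = x.shape[-1]
    inv_freq = 1.0 / (theta_base ** (torch.arange(0, d, 2).float() / d))  # (d/2,)
    angles = pos * inv_freq
    cos = torch.cat([angles.cos(), angles.cos()])  # frequencies repeated over both halves
    sin = torch.cat([angles.sin(), angles.sin()])
    x1, x2 = x[: d // 2], x[d // 2:]
    return x * cos + torch.cat([-x2, x1]) * sin

def rope_interleaved(x, pos, theta_base=10_000.0):
    # Pairs dimension 2j with dimension 2j + 1 (original paper / Meta style).
    d = x.shape[-1]
    inv_freq = 1.0 / (theta_base ** (torch.arange(0, d, 2).float() / d))
    angles = pos * inv_freq
    x1, x2 = x.view(d // 2, 2).unbind(-1)  # even dims, odd dims
    out = torch.stack([x1 * angles.cos() - x2 * angles.sin(),
                       x1 * angles.sin() + x2 * angles.cos()], dim=-1)
    return out.reshape(d)

# The two layouts differ only by a fixed permutation of the head dimensions:
d, pos = 8, 3
x = torch.randn(d)
perm = torch.arange(d).view(2, d // 2).T.reshape(-1)  # [0, d/2, 1, d/2+1, ...]
print(torch.allclose(rope_interleaved(x[perm], pos), rope_split_halves(x, pos)[perm]))  # True
```

Because the query/key projection weights are learned, this permutation roughly corresponds to a reordering of rows in the per-head projection matrices, so both conventions behave the same in practice; the difference mainly matters when loading pretrained weights that were produced with the other convention.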

0 commit comments
