
Commit 2d8d622

HayatoHongo and rasbt authored
added brief explanations about 2 different ways of RoPE implementations (#802)
* added brief explanations about 2 different ways of RoPE implementations
* improve comment

---------

Co-authored-by: rasbt <[email protected]>
1 parent 9ea2c57 commit 2d8d622

File tree

1 file changed: +4 −1 lines changed


ch05/07_gpt_to_llama/converting-gpt-to-llama2.ipynb

Lines changed: 4 additions & 1 deletion
@@ -410,7 +410,10 @@
 "```\n",
 "\n",
 "- Unlike traditional absolute positional embeddings, Llama uses rotary position embeddings (RoPE), which enable it to capture both absolute and relative positional information simultaneously\n",
-"- The reference paper for RoPE is [RoFormer: Enhanced Transformer with Rotary Position Embedding (2021)](https://arxiv.org/abs/2104.09864)"
+"- The reference paper for RoPE is [RoFormer: Enhanced Transformer with Rotary Position Embedding (2021)](https://arxiv.org/abs/2104.09864)\n",
+"- RoPE can be implemented in two equivalent ways: the *split-halves* version and the *interleaved (even/odd)* version; they are mathematically the same as long as we pair dimensions consistently and use the same cos/sin ordering (see [this](https://github.com/rasbt/LLMs-from-scratch/issues/751) GitHub discussion for more information)\n",
+"- This code uses the RoPE *split-halves* approach, similar to Hugging Face transformers ([modeling_llama.py](https://github.com/huggingface/transformers/blob/e42587f596181396e1c4b63660abf0c736b10dae/src/transformers/models/llama/modeling_llama.py#L173-L188))\n",
+"- The original RoPE paper and Meta's official Llama 2 repository use the *interleaved (even/odd)* version ([llama/model.py](https://github.com/meta-llama/llama/blob/6c7fe276574e78057f917549435a2554000a876d/llama/model.py#L64-L74)); as mentioned earlier, the two are equivalent"
 ]
},
{
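
To make the equivalence claim in the diff above concrete, here is a minimal, self-contained PyTorch sketch; it is not part of the commit, and the names `rope_split_halves`, `rope_interleaved`, and `perm` are illustrative. It applies both RoPE variants to the same input and checks that they agree once the head dimensions are re-paired consistently:

```python
import torch

def rope_angles(head_dim, positions, base=10000.0):
    # theta_i = base^(-2i/d) for pair index i = 0 .. d/2 - 1
    inv_freq = 1.0 / base ** (torch.arange(0, head_dim, 2).float() / head_dim)
    return positions.float()[:, None] * inv_freq[None, :]  # (seq_len, d/2)

def rope_split_halves(x, angles):
    # Pairs dimension i with i + d/2 (Hugging Face "rotate_half" style)
    cos = torch.cat([angles.cos(), angles.cos()], dim=-1)
    sin = torch.cat([angles.sin(), angles.sin()], dim=-1)
    x1, x2 = x.chunk(2, dim=-1)
    rotated = torch.cat([-x2, x1], dim=-1)
    return x * cos + rotated * sin

def rope_interleaved(x, angles):
    # Pairs dimension 2i with 2i + 1 (original paper / Meta style)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    cos, sin = angles.cos(), angles.sin()
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x2 * cos + x1 * sin
    return out

torch.manual_seed(123)
seq_len, head_dim = 5, 8
x = torch.randn(seq_len, head_dim)
angles = rope_angles(head_dim, torch.arange(seq_len))

# Permutation mapping the interleaved layout to the split-halves layout:
# even-indexed dimensions first, then odd-indexed dimensions
perm = torch.cat([torch.arange(0, head_dim, 2), torch.arange(1, head_dim, 2)])

# The two variants match once the dimensions are paired consistently
assert torch.allclose(rope_split_halves(x[:, perm], angles),
                      rope_interleaved(x, angles)[:, perm], atol=1e-6)
print("Both RoPE variants match after re-pairing the dimensions.")
```

Because the re-pairing is just a fixed permutation of the head dimension, it can be absorbed into the learned query/key projection weights; this is why scripts that convert checkpoints between the two layouts typically permute those weights rather than change the RoPE code itself.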
