Add Rotary Positional Embeddings #1
Conversation
Adding rotary positional embeddings and associated tests. I also abuse this commit by adding the .gitignore file.

Note: this needs pytest for the tests. I'll add a dev-requirements file in a separate PR.

For testing:
cd ~/torch_tbd/tests
pytest
# Outer product of theta and position index
idx_theta = torch.einsum("i, j -> ij", seq_idx, self.theta).float()
Curious, does this work with jit?
Is this something we have to worry about still?
We won't need jit compatibility, only torch.compile, since we'll be doing Python inference.
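For reference, a minimal sketch of the cache construction going through torch.compile rather than torch.jit.script. The function name and the cos/sin cache layout below are assumptions for illustration, not the PR's exact code:

```python
import torch

def build_rope_cache(seq_idx: torch.Tensor, theta: torch.Tensor) -> torch.Tensor:
    # Same outer product as in the hunk above, followed by a cos/sin cache
    idx_theta = torch.einsum("i, j -> ij", seq_idx, theta).float()
    return torch.stack([torch.cos(idx_theta), torch.sin(idx_theta)], dim=-1)

compiled_build = torch.compile(build_rope_cache)

dim, seq_len = 64, 16
theta = 1.0 / (10_000 ** (torch.arange(0, dim, 2).float() / dim))
seq_idx = torch.arange(seq_len).float()
cache = compiled_build(seq_idx, theta)
print(cache.shape)  # torch.Size([16, 32, 2])
```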
LGTM overall, most comments are minor; my main question is around recomputation of the RoPE cache and how that works.
Thanks!
@@ -0,0 +1,85 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
Do we remove these copyright headers in OSS?
I actually took this from the OSS code in Multimodal
@@ -0,0 +1,85 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
Also, maybe make a subdirectory /components for these?
Didn't quite follow - can you elaborate?
def forward(self, x: torch.Tensor) -> torch.Tensor:
    """
    TODO: The implementation below can be made more efficient
Any pointers on how?
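One possible direction (just a sketch under my own naming, not the PR's code): fold the cos/sin bookkeeping into a single complex multiplication, Llama-style, with the cache stored as one complex tensor:

```python
import torch

def apply_rope_complex(x: torch.Tensor, freqs_cis: torch.Tensor) -> torch.Tensor:
    # x: [bsz, seq_len, num_heads, head_dim]; freqs_cis: complex [seq_len, head_dim // 2]
    x_complex = torch.view_as_complex(x.float().reshape(*x.shape[:-1], -1, 2))
    freqs_cis = freqs_cis.reshape(1, x.size(1), 1, -1)  # broadcast over batch and heads
    x_out = torch.view_as_real(x_complex * freqs_cis).flatten(-2)
    return x_out.type_as(x)

# Precompute the cache once as e^(i * idx_theta) instead of stacked cos/sin planes
dim, max_seq_len = 64, 4096
theta = 1.0 / (10_000 ** (torch.arange(0, dim, 2).float() / dim))
idx_theta = torch.einsum("i, j -> ij", torch.arange(max_seq_len).float(), theta)
freqs_cis = torch.polar(torch.ones_like(idx_theta), idx_theta)

x = torch.randn(2, 16, 8, dim)
out = apply_rope_complex(x, freqs_cis[:16])
print(out.shape)  # torch.Size([2, 16, 8, 64])
```

Whether this is actually faster depends on shapes and hardware, so it would need benchmarking against the current implementation.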
set_rng_seed(0)


class TestRotaryPositionEmbedding:
do you want to test the cache invalidation / recomputation?
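For what it's worth, a hypothetical shape of such a test. The shapes, attribute names, and the recomputation behaviour itself are all assumptions here, since the current module only slices a fixed-size cache:

```python
def test_cache_covers_input_seq_len(self, rope) -> None:
    # Hypothetical: assumes dim=64 and max_seq_len >= 16 in the fixtures above.
    x = torch.randn(2, 16, 8, 64)  # [bsz, seq_len, num_heads, head_dim]
    out = rope(x)
    assert out.shape == x.shape
    # If cache recomputation is ever added, a second call with
    # seq_len > max_seq_len could assert that rope.cache.size(0) grew.
```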
for inference.
"""
seq_len = x.size(1)
rope_cache = self.cache[:seq_len]
Where does the cache actually get invalidated and recomputed if we exceed the seq_len?
We compute this with the max_seq_len that the model supports, so in the current setting it wouldn't need to be invalidated. There are some corner cases for inference which I don't think I fully understand right now.
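To make the thread concrete, here is a minimal sketch of that behaviour, plus one way an explicit "invalidate and rebuild" could look if it were ever needed. The class and attribute names are mine, not the PR's, and the actual rotation is elided:

```python
import torch
from torch import nn

class RoPECacheSketch(nn.Module):
    def __init__(self, dim: int, max_seq_len: int = 4096, base: int = 10_000) -> None:
        super().__init__()
        self.dim, self.base = dim, base
        self._build_cache(max_seq_len)

    def _build_cache(self, max_seq_len: int) -> None:
        theta = 1.0 / (self.base ** (torch.arange(0, self.dim, 2).float() / self.dim))
        seq_idx = torch.arange(max_seq_len).float()
        # Outer product of position index and theta, as in the hunk above
        idx_theta = torch.einsum("i, j -> ij", seq_idx, theta)
        cache = torch.stack([torch.cos(idx_theta), torch.sin(idx_theta)], dim=-1)
        self.register_buffer("cache", cache, persistent=False)
        self.max_seq_len = max_seq_len

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [bsz, seq_len, num_heads, head_dim]
        seq_len = x.size(1)
        if seq_len > self.max_seq_len:
            # One possible "invalidation": rebuild the cache for the longer input.
            self._build_cache(seq_len)
        rope_cache = self.cache[:seq_len]
        # ... apply the rotation with rope_cache (elided in this sketch) ...
        return x
```

In the PR as written the cache is simply sliced, which matches the explanation above; the rebuild branch is only there to show where an invalidation hook could live.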
Attributes:
    dim (int): Embedding dimension for each head, computed as:
        embed_size // num_heads
What is num_heads here, and where would it be specified, the attention block? Shall we clarify this as num_attention_heads? And does this value take on a different meaning for GQA / MQA?
Oh yeah, that's a good point.
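To spell out the GQA/MQA part, a small illustration with made-up config values:

```python
# Hypothetical config, just to illustrate the naming question above.
embed_dim = 4096
num_attention_heads = 32   # query heads
num_kv_heads = 8           # GQA: fewer key/value heads; MQA would be 1

head_dim = embed_dim // num_attention_heads  # 128: the `dim` RoPE is built with

# Under GQA/MQA the per-head dim is unchanged, only the head count differs,
# so the same RoPE cache applies to both projections:
#   q: [bsz, seq_len, num_attention_heads, head_dim]
#   k: [bsz, seq_len, num_kv_heads,        head_dim]
```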
Attributes:
    dim (int): Embedding dimension for each head, computed as:
        embed_size // num_heads
    max_seq_len (int): Maximum expected sequence length for the
nit: add defaults in documentation?
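For example, something like this in the docstring; the `base` attribute and the default values shown are placeholders, not necessarily the PR's:

```python
"""
Attributes:
    dim (int): Embedding dimension for each head, computed as
        ``embed_size // num_attention_heads``.
    max_seq_len (int): Maximum expected sequence length for the model.
        Default: 4096
    base (int): Base for the geometric progression used to compute
        the rotation angles. Default: 10_000
"""
```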
return torch.randn(bsz, seq_len, num_heads, head_dim)

@pytest.fixture
def rope(self, input_params):
A state_dict compatibility test might be useful too. For example, take a state_dict in memory and verify that it has the expected keys, which will help us ensure correctness when we load in pretrained weights.
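A sketch of what that could look like inside TestRotaryPositionEmbedding; the expected key set is an assumption, and a non-persistent cache buffer would not show up in the state_dict at all:

```python
def test_state_dict_keys(self, rope) -> None:
    # Assumed key name; adjust to however the buffers end up being registered.
    expected_keys = {"cache"}
    assert set(rope.state_dict().keys()) == expected_keys

def test_state_dict_round_trip(self, rope) -> None:
    # Loading the module's own state_dict back should report nothing missing.
    missing, unexpected = rope.load_state_dict(rope.state_dict(), strict=True)
    assert not missing and not unexpected
```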
random.seed(seed)


def assert_expected(
nit: do you want to add some defaults for this for ease of use as we develop?
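For example (the tolerance values below are placeholders, not a recommendation):

```python
import torch

def assert_expected(
    actual: torch.Tensor,
    expected: torch.Tensor,
    rtol: float = 1e-5,
    atol: float = 1e-8,
) -> None:
    # Thin wrapper so call sites don't have to repeat tolerances everywhere.
    torch.testing.assert_close(actual, expected, rtol=rtol, atol=atol)
```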