Commit 2aa8e81

Note about RoPE usage (#839)
* Note about devcontainer root usage
* Add note about RoPE implementation
1 parent 42c1306 commit 2aa8e81

3 files changed: +128 -0 lines changed
pkg/llms_from_scratch/llama3.py

Lines changed: 52 additions & 0 deletions
@@ -205,6 +205,58 @@ def forward(self, x, mask, cos, sin):
         return context_vec


+# ==============================================================================
+# RoPE implementation summary
+#
+#
+# There are two common styles to implement RoPE, which are
+# mathematically equivalent;
+# they mainly differ in how the rotation matrix pairs dimensions.
+#
+# 1) Split-halves style (this repo, Hugging Face Transformers):
+#
+#   For hidden dim d = 8 (example):
+#
+#       [ x0   x1   x2   x3   x4   x5   x6   x7 ]
+#         │    │    │    │    │    │    │    │
+#         ▼    ▼    ▼    ▼    ▼    ▼    ▼    ▼
+#        cos  cos  cos  cos  sin  sin  sin  sin
+#
+#   Rotation matrix:
+#
+#       [ cosθ   -sinθ    0      0   ... ]
+#       [ sinθ    cosθ    0      0   ... ]
+#       [  0       0    cosθ   -sinθ ... ]
+#       [  0       0    sinθ    cosθ ... ]
+#        ...
+#
+#   Here, the embedding dims are split into two halves and then
+#   each one is rotated in blocks.
+#
+#
+# 2) Interleaved (even/odd) style (original paper, Llama repo):
+#
+#   For hidden dim d = 8 (example):
+#
+#       [ x0   x1   x2   x3   x4   x5   x6   x7 ]
+#         │    │    │    │    │    │    │    │
+#         ▼    ▼    ▼    ▼    ▼    ▼    ▼    ▼
+#        cos  sin  cos  sin  cos  sin  cos  sin
+#
+#   Rotation matrix:
+#       [ cosθ   -sinθ    0      0   ... ]
+#       [ sinθ    cosθ    0      0   ... ]
+#       [  0       0    cosθ   -sinθ ... ]
+#       [  0       0    sinθ    cosθ ... ]
+#        ...
+#
+#   Here, embedding dims are interleaved as even/odd cosine/sine pairs.
+#
+# Both layouts encode the same relative positions; the only difference is how
+# dimensions are paired.
+# ==============================================================================
+
+
 def compute_rope_params(head_dim, theta_base=10_000, context_length=4096, freq_config=None, dtype=torch.float32):
     assert head_dim % 2 == 0, "Embedding dimension must be even"
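
The equivalence claimed in the comment above can be checked numerically. The following is a minimal pure-Python sketch, not the repository's implementation: the helper names (`rope_angles`, `rope_split_halves`, `rope_interleaved`, `interleaved_to_split`) are hypothetical, and it assumes the standard angle schedule `pos * theta_base ** (-2 * i / head_dim)`. It suggests that the two layouts agree once the input dimensions are permuted consistently.

```python
import math

def rope_angles(head_dim, pos, theta_base=10_000):
    # One rotation angle per dimension pair (assumed standard RoPE schedule)
    return [pos * theta_base ** (-2 * i / head_dim) for i in range(head_dim // 2)]

def rope_split_halves(x, pos, theta_base=10_000):
    # Split-halves style: dimension i is paired with dimension i + d/2
    d = len(x)
    half = d // 2
    out = [0.0] * d
    for i, angle in enumerate(rope_angles(d, pos, theta_base)):
        x1, x2 = x[i], x[i + half]
        out[i] = x1 * math.cos(angle) - x2 * math.sin(angle)
        out[i + half] = x1 * math.sin(angle) + x2 * math.cos(angle)
    return out

def rope_interleaved(x, pos, theta_base=10_000):
    # Interleaved style: dimension 2i is paired with dimension 2i + 1
    d = len(x)
    out = [0.0] * d
    for i, angle in enumerate(rope_angles(d, pos, theta_base)):
        x1, x2 = x[2 * i], x[2 * i + 1]
        out[2 * i] = x1 * math.cos(angle) - x2 * math.sin(angle)
        out[2 * i + 1] = x1 * math.sin(angle) + x2 * math.cos(angle)
    return out

def interleaved_to_split(x):
    # Reorder [e0, o0, e1, o1, ...] into [e0, e1, ..., o0, o1, ...]
    return x[0::2] + x[1::2]

x = [0.1 * (i + 1) for i in range(8)]  # toy vector with d = 8
pos = 5

# Permute first and rotate with split-halves, or rotate interleaved and permute after
a = rope_split_halves(interleaved_to_split(x), pos)
b = interleaved_to_split(rope_interleaved(x, pos))

max_diff = max(abs(u - v) for u, v in zip(a, b))
print(max_diff)
```

In this sketch the two results differ only by the fixed permutation, which is why a checkpoint trained with one layout generally needs its query and key dimensions reordered before being used with the other.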

pkg/llms_from_scratch/qwen3.py

Lines changed: 52 additions & 0 deletions
@@ -316,6 +316,58 @@ def forward(self, x, mask, cos, sin):
         return self.out_proj(context)


+# ==============================================================================
+# RoPE implementation summary
+#
+#
+# There are two common styles to implement RoPE, which are
+# mathematically equivalent;
+# they mainly differ in how the rotation matrix pairs dimensions.
+#
+# 1) Split-halves style (this repo, Hugging Face Transformers):
+#
+#   For hidden dim d = 8 (example):
+#
+#       [ x0   x1   x2   x3   x4   x5   x6   x7 ]
+#         │    │    │    │    │    │    │    │
+#         ▼    ▼    ▼    ▼    ▼    ▼    ▼    ▼
+#        cos  cos  cos  cos  sin  sin  sin  sin
+#
+#   Rotation matrix:
+#
+#       [ cosθ   -sinθ    0      0   ... ]
+#       [ sinθ    cosθ    0      0   ... ]
+#       [  0       0    cosθ   -sinθ ... ]
+#       [  0       0    sinθ    cosθ ... ]
+#        ...
+#
+#   Here, the embedding dims are split into two halves and then
+#   each one is rotated in blocks.
+#
+#
+# 2) Interleaved (even/odd) style (original paper, Llama repo):
+#
+#   For hidden dim d = 8 (example):
+#
+#       [ x0   x1   x2   x3   x4   x5   x6   x7 ]
+#         │    │    │    │    │    │    │    │
+#         ▼    ▼    ▼    ▼    ▼    ▼    ▼    ▼
+#        cos  sin  cos  sin  cos  sin  cos  sin
+#
+#   Rotation matrix:
+#       [ cosθ   -sinθ    0      0   ... ]
+#       [ sinθ    cosθ    0      0   ... ]
+#       [  0       0    cosθ   -sinθ ... ]
+#       [  0       0    sinθ    cosθ ... ]
+#        ...
+#
+#   Here, embedding dims are interleaved as even/odd cosine/sine pairs.
+#
+# Both layouts encode the same relative positions; the only difference is how
+# dimensions are paired.
+# ==============================================================================
+
+
 def compute_rope_params(head_dim, theta_base=10_000, context_length=4096, dtype=torch.float32):
     assert head_dim % 2 == 0, "Embedding dimension must be even"
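
The closing remark in the comment, that the layouts encode relative positions, can be illustrated with a small sketch. This is an illustration under stated assumptions rather than the repository's code: `rope` and `dot` are hypothetical helpers, the vectors are arbitrary toy values, and the split-halves pairing with the standard angle schedule is assumed. The dot product of a rotated query and key should depend only on the offset between their positions.

```python
import math

def rope(x, pos, theta_base=10_000):
    # Split-halves RoPE for a single vector x at position pos (illustrative helper)
    d = len(x)
    half = d // 2
    out = [0.0] * d
    for i in range(half):
        angle = pos * theta_base ** (-2 * i / d)
        out[i] = x[i] * math.cos(angle) - x[i + half] * math.sin(angle)
        out[i + half] = x[i] * math.sin(angle) + x[i + half] * math.cos(angle)
    return out

def dot(u, v):
    # Plain dot product, standing in for one attention score
    return sum(a * b for a, b in zip(u, v))

# Arbitrary toy query and key vectors with d = 8
q = [0.3, -1.2, 0.5, 0.8, -0.4, 1.1, 0.2, -0.7]
k = [1.0, 0.1, -0.6, 0.9, 0.3, -0.2, 0.7, 0.4]

score_a = dot(rope(q, 7), rope(k, 3))      # positions 7 and 3, offset 4
score_b = dot(rope(q, 104), rope(k, 100))  # positions 104 and 100, offset 4
score_c = dot(rope(q, 7), rope(k, 4))      # positions 7 and 4, offset 3

print(abs(score_a - score_b))  # same offset: difference at floating-point noise level
print(abs(score_a - score_c))  # different offset: scores differ noticeably
```

Shifting both positions by the same amount leaves the score unchanged up to rounding, while changing the offset changes it. The same check should hold for the interleaved pairing, since only the choice of paired dimensions differs.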

setup/03_optional-docker-environment/README.md

Lines changed: 24 additions & 0 deletions
@@ -78,6 +78,29 @@ mv setup/03_optional-docker-environment/.devcontainer ./

 Since the `.devcontainer` folder is present in the main `LLMs-from-scratch` directory (folders starting with `.` may be invisible in your OS depending on your settings), VS Code should automatically detect it and ask whether you would like to open the project in a devcontainer. If it doesn't, simply press `Ctrl + Shift + P` to open the command palette and start typing `dev containers` to see a list of all DevContainer-specific options.

+
+&nbsp;
+> ⚠️ **Note about running as root**
+>
+> By default, the DevContainer runs as the *root user*. This is not generally recommended for security reasons, but for simplicity in this book's setup, the root configuration is used so that all required packages install cleanly inside the container.
+>
+> If you try to start Jupyter Lab manually inside the container, you may see this error:
+>
+> ```bash
+> Running as root is not recommended. Use --allow-root to bypass.
+> ```
+>
+> In this case, you can run:
+>
+> ```bash
+> uv run jupyter lab --allow-root
+> ```
+>
+> - When using VS Code with the Jupyter extension, you usually don't need to start Jupyter Lab manually. Opening notebooks through the extension should work out of the box.
+> - Advanced users who prefer stricter security can modify the `.devcontainer.json` to set up a non-root user, but this requires extra configuration and is not necessary for most use cases.
+
+
+
 8. Select **Reopen in Container**.

 Docker will now begin the process of building the Docker image specified in the `.devcontainer` configuration if it hasn't been built before, or pull the image if it's available from a registry.
@@ -86,6 +109,7 @@ The entire process is automated and might take a few minutes, depending on your

 Once completed, VS Code will automatically connect to the container and reopen the project within the newly created Docker development environment. You will be able to write, execute, and debug code as if it were running on your local machine, but with the added benefits of Docker's isolation and consistency.

+&nbsp;
 > **Warning:**
 > If you are encountering an error during the build process, this is likely because your machine does not support NVIDIA container toolkit because your machine doesn't have a compatible GPU. In this case, edit the `devcontainer.json` file to remove the `"runArgs": ["--runtime=nvidia", "--gpus=all"],` line and run the "Reopen Dev Container" procedure again.
