Commit e1388cb

Add Gemma3 KV cache variant
1 parent 80d4732 commit e1388cb

7 files changed (+1471 additions, -29 deletions)

.github/workflows/basic-tests-linux-uv.yml

Lines changed: 1 addition & 0 deletions
@@ -55,6 +55,7 @@ jobs:
 pytest --ruff ch05/07_gpt_to_llama/tests/test_llama32_nb.py
 pytest --ruff ch05/11_qwen3/tests/test_qwen3_nb.py
 pytest --ruff ch05/12_gemma3/tests/test_gemma3_nb.py
+pytest --ruff ch05/12_gemma3/tests/test_gemma3_kv_nb.py
 pytest --ruff ch06/01_main-chapter-code/tests.py
 
 - name: Validate Selected Jupyter Notebooks (uv)

.github/workflows/basic-tests-macos-uv.yml

Lines changed: 1 addition & 0 deletions
@@ -54,6 +54,7 @@ jobs:
 pytest --ruff ch05/07_gpt_to_llama/tests/test_llama32_nb.py
 pytest --ruff ch05/11_qwen3/tests/test_qwen3_nb.py
 pytest --ruff ch05/12_gemma3/tests/test_gemma3_nb.py
+pytest --ruff ch05/12_gemma3/tests/test_gemma3_kv_nb.py
 pytest --ruff ch06/01_main-chapter-code/tests.py
 
 - name: Validate Selected Jupyter Notebooks (uv)

ch05/12_gemma3/README.md

Lines changed: 22 additions & 2 deletions
@@ -1,6 +1,26 @@
 # Gemma 3 270M From Scratch
 
-This [standalone-gemma3.ipynb](standalone-gemma3.ipynb) Jupyter notebook in this folder contains a from-scratch implementation of Gemma 3 270M. It requires about 2 GB of RAM to run.
+This [standalone-gemma3.ipynb](standalone-gemma3.ipynb) Jupyter notebook in this folder contains a from-scratch implementation of Gemma 3 270M. It requires about 2 GB of RAM to run.
+
+The alternative [standalone-gemma3-plus-kvcache.ipynb](standalone-gemma3-plus-kvcache.ipynb) notebook adds a KV cache for better runtime performance (but adds more code complexity). To learn more about KV caching, see my [Understanding and Coding the KV Cache in LLMs from Scratch](https://magazine.sebastianraschka.com/p/coding-the-kv-cache-in-llms) article.
+
+| Model             | Mode              | Hardware        | Tokens/sec | GPU Memory (VRAM) |
+| ----------------- | ----------------- | --------------- | ---------- | ----------------- |
+| Gemma3Model 270M  | Regular           | Mac Mini M4 CPU | 8          | -                 |
+| Gemma3Model 270M  | Regular compiled  | Mac Mini M4 CPU | 9          | -                 |
+| Gemma3Model 270M  | KV cache          | Mac Mini M4 CPU | 130        | -                 |
+| Gemma3Model 270M  | KV cache compiled | Mac Mini M4 CPU | 224        | -                 |
+|                   |                   |                 |            |                   |
+| Gemma3Model 270M  | Regular           | Mac Mini M4 GPU | 16         | -                 |
+| Gemma3Model 270M  | Regular compiled  | Mac Mini M4 GPU | Error      | -                 |
+| Gemma3Model 270M  | KV cache          | Mac Mini M4 GPU | 23         | -                 |
+| Gemma3Model 270M  | KV cache compiled | Mac Mini M4 GPU | Error      | -                 |
+|                   |                   |                 |            |                   |
+| Gemma3Model 270M  | Regular           | Nvidia A100 GPU | 28         | 1.84 GB           |
+| Gemma3Model 270M  | Regular compiled  | Nvidia A100 GPU | 128        | 2.12 GB           |
+| Gemma3Model 270M  | KV cache          | Nvidia A100 GPU | 26         | 1.77 GB           |
+| Gemma3Model 270M  | KV cache compiled | Nvidia A100 GPU | 99         | 2.12 GB           |
+
 
 Below is a side-by-side comparison with Qwen3 0.6B as a reference model; if you are interested in the Qwen3 0.6B standalone notebook, you can find it [here](../11_qwen3).
 
@@ -10,7 +30,7 @@ Below is a side-by-side comparison with Qwen3 0.6B as a reference model; if you
 
 <br>
 
-To learn more about the architecture differences and read about comparisons with other architectures, see my [The Big LLM Architecture Comparison: From DeepSeek-V3 to Kimi K2: A Look At Modern LLM Architecture Design](https://magazine.sebastianraschka.com/p/the-big-llm-architecture-comparison)article.
+To learn more about the architecture differences and read about comparisons with other architectures, see my [The Big LLM Architecture Comparison: From DeepSeek-V3 to Kimi K2: A Look At Modern LLM Architecture Design](https://magazine.sebastianraschka.com/p/the-big-llm-architecture-comparison) article.
 
 
 
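The README change in this commit points to a KV cache variant "for better runtime performance"; the notebook itself is not part of this diff excerpt. As a rough, framework-free illustration of the idea (assumptions: a single attention head, toy random weights and random vectors standing in for token embeddings, plain Python lists instead of PyTorch tensors; all names such as `KVCache`, `step_with_cache`, and `run_demo` are hypothetical and not taken from the notebook), the sketch below stores each token's key and value vectors so that every generation step projects only the newest token instead of re-projecting the whole sequence, and checks that both paths produce the same attention output.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def attend(q, keys, values):
    """Scaled dot-product attention of one query over the given keys/values."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]


class KVCache:
    """Hypothetical minimal cache: one key and one value vector per past token."""

    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)


def step_without_cache(tokens, Wq, Wk, Wv):
    # Re-project keys/values for every position seen so far, on every step.
    # Only the newest position's output is needed to predict the next token.
    keys = [matvec(Wk, x) for x in tokens]
    values = [matvec(Wv, x) for x in tokens]
    # Second return value: number of tokens whose K/V were projected this step.
    return attend(matvec(Wq, tokens[-1]), keys, values), len(tokens)


def step_with_cache(new_token, cache, Wq, Wk, Wv):
    # Project only the newest token and reuse all previously cached keys/values.
    cache.append(matvec(Wk, new_token), matvec(Wv, new_token))
    return attend(matvec(Wq, new_token), cache.keys, cache.values), 1


def run_demo(seq_len=6, d=4, seed=0):
    rng = random.Random(seed)

    def rand_matrix():
        return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(d)]

    Wq, Wk, Wv = rand_matrix(), rand_matrix(), rand_matrix()
    tokens = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(seq_len)]

    cache = KVCache()
    projected_no_cache = projected_with_cache = 0
    max_diff = 0.0
    for t in range(seq_len):
        out_a, n_a = step_without_cache(tokens[: t + 1], Wq, Wk, Wv)
        out_b, n_b = step_with_cache(tokens[t], cache, Wq, Wk, Wv)
        projected_no_cache += n_a
        projected_with_cache += n_b
        max_diff = max(max_diff, max(abs(a - b) for a, b in zip(out_a, out_b)))
    return projected_no_cache, projected_with_cache, max_diff


print(run_demo())
```

For a 6-token sequence, the uncached path projects keys and values 1 + 2 + ... + 6 = 21 times, while the cached path projects them 6 times, with identical attention outputs. This work grows quadratically versus linearly with sequence length in the toy model; actual speedups also depend on hardware and compilation, as the benchmark table in the README diff shows.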