Gemma 3 270M From Scratch (#771)

rasbt · web-flow · commit a6b883c9f9d0 · 2025-08-17T08:23:05.000-05:00
* Gemma 3 270M From Scratch

* fix path

* update readme
diff --git a/.github/workflows/basic-tests-linux-uv.yml b/.github/workflows/basic-tests-linux-uv.yml
@@ -52,6 +52,7 @@ jobs:
           pytest --ruff ch04/03_kv-cache/tests.py
           pytest --ruff ch05/01_main-chapter-code/tests.py
           pytest --ruff ch05/07_gpt_to_llama/tests/tests.py
+          pytest --ruff ch05/12_gemma3/tests/test_gemma3.py
           pytest --ruff ch06/01_main-chapter-code/tests.py
 
       - name: Validate Selected Jupyter Notebooks (uv)
diff --git a/.github/workflows/basic-tests-macos-uv.yml b/.github/workflows/basic-tests-macos-uv.yml
@@ -51,6 +51,7 @@ jobs:
           pytest --ruff ch04/01_main-chapter-code/tests.py
           pytest --ruff ch05/01_main-chapter-code/tests.py
           pytest --ruff ch05/07_gpt_to_llama/tests/tests.py
+          pytest --ruff ch05/12_gemma3/tests/test_gemma3.py
           pytest --ruff ch06/01_main-chapter-code/tests.py
 
       - name: Validate Selected Jupyter Notebooks (uv)
diff --git a/.gitignore b/.gitignore
@@ -77,6 +77,8 @@ ch07/01_main-chapter-code/gpt2-medium355M-sft-standalone.pth
 ch07/01_main-chapter-code/Smalltestmodel-sft-standalone.pth
 ch07/01_main-chapter-code/gpt2/
 
+gemma-3-270m/
+gemma-3-270m-it/
 Qwen3-0.6B-Base/
 Qwen3-0.6B/
 tokenizer-base.json
diff --git a/README.md b/README.md
@@ -159,6 +159,7 @@ Several folders contain optional materials as a bonus for interested readers:
   - [Converting GPT to Llama](ch05/07_gpt_to_llama)
   - [Llama 3.2 From Scratch](ch05/07_gpt_to_llama/standalone-llama32.ipynb)
   - [Qwen3 Dense and Mixture-of-Experts (MoE) From Scratch](ch05/11_qwen3/)
+  - [Gemma 3 From Scratch](ch05/12_gemma3/)
   - [Memory-efficient Model Weight Loading](ch05/08_memory_efficient_weight_loading/memory-efficient-state-dict.ipynb)
   - [Extending the Tiktoken BPE Tokenizer with New Tokens](ch05/09_extending-tokenizers/extend-tiktoken.ipynb)
   - [PyTorch Performance Tips for Faster LLM Training](ch05/10_llm-training-speed)
diff --git a/ch05/11_qwen3/standalone-qwen3.ipynb b/ch05/11_qwen3/standalone-qwen3.ipynb
@@ -983,6 +983,12 @@
     "else:\n",
     "    tokenizer_file_path = f\"Qwen3-{CHOOSE_MODEL}-Base/tokenizer.json\"\n",
     "\n",
+    "hf_hub_download(\n",
+    "    repo_id=repo_id,\n",
+    "    filename=\"tokenizer.json\",\n",
+    "    local_dir=local_dir,\n",
+    ")\n",
+    "\n",
     "tokenizer = Qwen3Tokenizer(\n",
     "    tokenizer_file_path=tokenizer_file_path,\n",
     "    repo_id=repo_id,\n",
diff --git a/ch05/12_gemma3/README.md b/ch05/12_gemma3/README.md
@@ -0,0 +1,18 @@
+# Gemma 3 270M From Scratch
+
+The [standalone-gemma3.ipynb](standalone-gemma3.ipynb) Jupyter notebook in this folder contains a from-scratch implementation of Gemma 3 270M. It requires about 2 GB of RAM to run.
+
+Below is a side-by-side comparison with Qwen3 0.6B as a reference model; if you are interested in the Qwen3 0.6B standalone notebook, you can find it [here](../11_qwen3).
+
+<br>
+
+<img src="https://sebastianraschka.com/images/LLMs-from-scratch-images/bonus/gemma3/gemma3-vs-qwen3.webp">
+
+<br>
+
+To learn more about the architecture differences and read about comparisons with other architectures, see my [The Big LLM Architecture Comparison: From DeepSeek-V3 to Kimi K2: A Look At Modern LLM Architecture Design](https://magazine.sebastianraschka.com/p/the-big-llm-architecture-comparison) article.
+
+
+
+
+
diff --git a/ch05/12_gemma3/standalone-gemma3.ipynb b/ch05/12_gemma3/standalone-gemma3.ipynb
diff --git a/ch05/12_gemma3/tests/test_gemma3.py b/ch05/12_gemma3/tests/test_gemma3.py