Commit a6b883c

Gemma 3 270M From Scratch (#771)
* Gemma 3 270M From Scratch
* fix path
* update readme
1 parent e9c1c1d commit a6b883c

8 files changed, +1394 -0 lines changed

.github/workflows/basic-tests-linux-uv.yml

Lines changed: 1 addition & 0 deletions
@@ -52,6 +52,7 @@ jobs:
  pytest --ruff ch04/03_kv-cache/tests.py
  pytest --ruff ch05/01_main-chapter-code/tests.py
  pytest --ruff ch05/07_gpt_to_llama/tests/tests.py
+ pytest --ruff ch05/12_gemma3/tests/test_gemma3.py
  pytest --ruff ch06/01_main-chapter-code/tests.py

  - name: Validate Selected Jupyter Notebooks (uv)

.github/workflows/basic-tests-macos-uv.yml

Lines changed: 1 addition & 0 deletions
@@ -51,6 +51,7 @@ jobs:
  pytest --ruff ch04/01_main-chapter-code/tests.py
  pytest --ruff ch05/01_main-chapter-code/tests.py
  pytest --ruff ch05/07_gpt_to_llama/tests/tests.py
+ pytest --ruff ch05/12_gemma3/tests/test_gemma3.py
  pytest --ruff ch06/01_main-chapter-code/tests.py

  - name: Validate Selected Jupyter Notebooks (uv)

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -77,6 +77,8 @@ ch07/01_main-chapter-code/gpt2-medium355M-sft-standalone.pth
  ch07/01_main-chapter-code/Smalltestmodel-sft-standalone.pth
  ch07/01_main-chapter-code/gpt2/

+ gemma-3-270m/
+ gemma-3-270m-it/
  Qwen3-0.6B-Base/
  Qwen3-0.6B/
  tokenizer-base.json

README.md

Lines changed: 1 addition & 0 deletions
@@ -159,6 +159,7 @@ Several folders contain optional materials as a bonus for interested readers:
  - [Converting GPT to Llama](ch05/07_gpt_to_llama)
  - [Llama 3.2 From Scratch](ch05/07_gpt_to_llama/standalone-llama32.ipynb)
  - [Qwen3 Dense and Mixture-of-Experts (MoE) From Scratch](ch05/11_qwen3/)
+ - [Gemma 3 From Scratch](ch05/12_gemma3/)
  - [Memory-efficient Model Weight Loading](ch05/08_memory_efficient_weight_loading/memory-efficient-state-dict.ipynb)
  - [Extending the Tiktoken BPE Tokenizer with New Tokens](ch05/09_extending-tokenizers/extend-tiktoken.ipynb)
  - [PyTorch Performance Tips for Faster LLM Training](ch05/10_llm-training-speed)

ch05/11_qwen3/standalone-qwen3.ipynb

Lines changed: 6 additions & 0 deletions
@@ -983,6 +983,12 @@
  "else:\n",
  " tokenizer_file_path = f\"Qwen3-{CHOOSE_MODEL}-Base/tokenizer.json\"\n",
  "\n",
+ "hf_hub_download(\n",
+ " repo_id=repo_id,\n",
+ " filename=\"tokenizer.json\",\n",
+ " local_dir=local_dir,\n",
+ ")\n",
+ "\n",
  "tokenizer = Qwen3Tokenizer(\n",
  " tokenizer_file_path=tokenizer_file_path,\n",
  " repo_id=repo_id,\n",

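The lines added in this hunk are JSON-escaped strings from the notebook cell's `source` array, so they are harder to read than the code they encode. The sketch below decodes them with the standard library to show what the cell presumably contains once rendered. The four-space indentation inside the strings is an assumption (the page extraction collapsed the original whitespace), and `repo_id` and `local_dir` are variables the notebook defines elsewhere; they are not shown in this diff. `hf_hub_download` is the file-download helper from the `huggingface_hub` package.

```python
import json

# The six added lines from the hunk, as they are stored in the .ipynb JSON.
# Assumption: the inner indentation is four spaces; the scraped diff lost it.
added_json = r'''[
"hf_hub_download(\n",
"    repo_id=repo_id,\n",
"    filename=\"tokenizer.json\",\n",
"    local_dir=local_dir,\n",
")\n",
"\n"
]'''

# A notebook cell's source is a list of strings; joining them yields the code.
cell_lines = json.loads(added_json)
decoded = "".join(cell_lines)
print(decoded)
```

Read this way, the change downloads `tokenizer.json` into `local_dir` before `Qwen3Tokenizer` is constructed with `tokenizer_file_path`.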
ch05/12_gemma3/README.md

Lines changed: 18 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,18 @@
+ # Gemma 3 270M From Scratch
+
+ This [standalone-gemma3.ipynb](standalone-gemma3.ipynb) Jupyter notebook in this folder contains a from-scratch implementation of Gemma 3 270M. It requires about 2 GB of RAM to run.
+
+ Below is a side-by-side comparison with Qwen3 0.6B as a reference model; if you are interested in the Qwen3 0.6B standalone notebook, you can find it [here](../11_qwen3).
+
+ <br>
+
+ <img src="https://sebastianraschka.com/images/LLMs-from-scratch-images/bonus/gemma3/gemma3-vs-qwen3.webp">
+
+ <br>
+
+ To learn more about the architecture differences and read about comparisons with other architectures, see my [The Big LLM Architecture Comparison: From DeepSeek-V3 to Kimi K2: A Look At Modern LLM Architecture Design](https://magazine.sebastianraschka.com/p/the-big-llm-architecture-comparison)article.
+
+
+
+
+

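The new README states that the notebook needs about 2 GB of RAM. As a rough sanity check on that figure, the sketch below estimates the memory taken by the weights alone. It assumes the nominal 270M parameter count from the model name and two common storage precisions; the exact parameter count and the dtype the notebook actually uses are not given in this commit page, and the helper `weight_memory_gib` is hypothetical.

```python
def weight_memory_gib(num_params: int, bytes_per_param: int) -> float:
    """Memory needed to hold the weights only, in GiB (no activations or overhead)."""
    return num_params * bytes_per_param / 1024**3


# Assumption: nominal parameter count taken from the model name "270M".
NOMINAL_PARAMS = 270_000_000

fp32_gib = weight_memory_gib(NOMINAL_PARAMS, 4)  # 4 bytes per float32 value
bf16_gib = weight_memory_gib(NOMINAL_PARAMS, 2)  # 2 bytes per bfloat16 value

print(f"float32 weights:  {fp32_gib:.2f} GiB")
print(f"bfloat16 weights: {bf16_gib:.2f} GiB")
```

Under these assumptions the weights account for roughly half of the quoted figure or less; the remainder would plausibly go to activations, the tokenizer, temporary copies made while loading the checkpoint, and the Python runtime itself.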