DeepSeek‑R1‑Medical‑COT is a 4‑bit fine‑tuned causal language model
optimised for medical reasoning and clinical‑scenario interpretation.
It is based on unsloth/DeepSeek‑R1‑Distill‑Llama‑8B and fine‑tuned on the
FreedomIntelligence/medical-o1-reasoning-SFT dataset to provide structured,
step‑by‑step clinical reasoning and evidence‑based conclusions.
├── README.md
├── Fine_Tune_DeepSeek_R1_Medical_COT.ipynb # notebook with full training workflow
└── requirements.txt # Python dependencies
The Jupyter notebook contains all of the steps: model loading, prompt formatting, data preparation, LoRA configuration, training, validation, saving and pushing to the Hugging Face hub.
Create a virtual environment and install the required packages:
pip install unsloth trl peft accelerate bitsandbytes datasets
pip install transformers gradio huggingface-hubLoad and run the model as shown below:
from unsloth import FastLanguageModel
from transformers import AutoTokenizer
model_name = "DeepSeek-R1-Medical-COT"
# load model
model, tokenizer = FastLanguageModel.from_pretrained(
model_name, load_in_4bit=True
)
prompt = """
### Clinical Scenario:
A 54-year-old man complains of frequent urinary urgency, nocturia,
and a weak urinary stream. His prostate is moderately enlarged.
Predict likely cystometric findings.
"""
inputs = tokenizer([prompt], return_tensors="pt").to("cuda")
outputs = model.generate(
input_ids=inputs.input_ids,
attention_mask=inputs.attention_mask,
max_new_tokens=500
)
print(tokenizer.decode(outputs[0]))Follow the instructions in Fine_Tune_DeepSeek_R1_Medical_COT.ipynb:
- Load unsloth/DeepSeek-R1-Distill-Llama-8B with load_in_4bit=True.
- Define a CoT‑style prompt template for medical questions.
- Load the FreedomIntelligence/medical-o1-reasoning-SFT dataset and format examples.
- Apply LoRA to selected attention and feed‑forward modules.
- Configure TrainingArguments (batch 1 + accumulation, 4‑bit AdamW, FP16/BF16, 200 steps, etc.).
- Train with trl.SFTTrainer.
- Validate on a held‑out slice, save validation_results.csv and training_logs.csv.
- Save the fine‑tuned model locally and push to the Hugging Face hub.
- You can run the notebook interactively with Jupyter or convert it to a script with jupyter nbconvert.
The notebook creates a small validation split (train[500:550]), runs inference on a few examples, and stores outputs/lengths.
Set your token in an environment variable:
export HF_TOKEN="your_token_here"Then run (see notebook for full code):
from huggingface_hub import login, HfApi
import os
login(token=os.environ["HF_TOKEN"])
api = HfApi()
api.create_repo("DeepSeek-R1-Medical-COT", repo_type="model",
exist_ok=True)
model.push_to_hub("DeepSeek-R1-Medical-COT",
tokenizer=tokenizer,
save_method="merged_16bit")After this the model can be loaded with FastLanguageModel.from_pretrained("DeepSeek-R1-Medical-COT").
The notebook includes an example Gradio interface that accepts a medical scenario and returns step‑by‑step reasoning plus a final conclusion.
- Model type: Causal Language Model (LLM)
- Base model: unsloth/DeepSeek-R1-Distill-Llama-8B
- Fine‑tuned for: Medical instruction‑following and clinical reasoning
- Model hub: https://huggingface.co/MohamedASAK/DeepSeek-R1-Medical-COT
- Training dataset: https://huggingface.co/datasets/FreedomIntelligence/medical-o1-reasoning-SFT
- Dataset: FreedomIntelligence/medical-o1-reasoning-SFT
- Preprocessing: Prompts formatted with CoT style (grounded) for step-by-step reasoning
- Fine‑tuning method: LoRA applied to attention and feedforward modules
- Hyperparameters:
- Batch size: 1 (gradient accumulation 8)
- Max steps: 200
- Learning rate: 2 × 10⁻⁴
- Mixed precision: FP16 / BF16 depending on GPU support
- Optimizer: 8‑bit AdamW
- Evaluated on a subset of medical reasoning questions
- Metrics: correctness of reasoning, coherence, and answer accuracy
- Results indicate improved structured reasoning over the base model
- Limited to the quality and scope of the training dataset
- May not cover rare or highly specialized medical cases
- Should not replace clinical judgment; intended for educational and reasoning support
Recommendation: Always review model outputs with a qualified healthcare professional.