A multi-approach machine learning project for extracting text data (name, DOB, address, state) from US driver's license images using object detection and OCR techniques.
This repository explores multiple approaches for driver's license data extraction:
- YOLO-v8 + EasyOCR: Production-ready pipeline combining YOLOv8 for field detection with EasyOCR for text recognition
- YOLO-v5: Custom-trained YOLOv5 model for license field detection
- Donut (Document Understanding Transformer): End-to-end transformer-based document parsing without OCR
- pyTesseract: Traditional OCR approach using Tesseract engine
- YOLO + TensorFlow 1.0: Legacy implementation using Darkflow
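
The detect-then-recognize flow behind the YOLO + OCR approaches can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: `extract_fields`, `crop_rows`, the `detect` and `recognize` callables, the label names, and the `min_conf` default are all hypothetical, and the detector and OCR engine are stubbed so the example runs without model weights.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Box = Tuple[int, int, int, int]          # (x1, y1, x2, y2) in pixels
Detection = Tuple[str, float, Box]       # (label, confidence, box)


def crop_rows(image: Sequence[Sequence], box: Box) -> List[list]:
    """Crop a row-major image stored as nested sequences.

    With a NumPy array one would typically write image[y1:y2, x1:x2] instead.
    """
    x1, y1, x2, y2 = box
    return [list(row[x1:x2]) for row in image[y1:y2]]


def extract_fields(
    image,
    detect: Callable[[object], List[Detection]],
    recognize: Callable[[object], str],
    crop: Callable[[object, Box], object] = crop_rows,
    min_conf: float = 0.25,  # assumed threshold, not taken from the repository
) -> Dict[str, str]:
    """Keep the most confident box per label, then run OCR on each crop."""
    best: Dict[str, Tuple[float, Box]] = {}
    for label, conf, box in detect(image):
        if conf < min_conf:
            continue
        if label not in best or conf > best[label][0]:
            best[label] = (conf, box)
    return {
        label: recognize(crop(image, box)).strip()
        for label, (_, box) in best.items()
    }


# Stub detector and recognizer on a toy "image" made of characters.
toy_image = [list("JOHN DOE  "), list("01/02/1990")]


def stub_detect(_image) -> List[Detection]:
    return [
        ("name", 0.90, (0, 0, 8, 1)),
        ("dob", 0.80, (0, 1, 10, 2)),
        ("name", 0.40, (5, 0, 8, 1)),   # weaker duplicate, ignored
        ("state", 0.10, (0, 0, 2, 1)),  # below min_conf, dropped
    ]


def stub_recognize(region) -> str:
    return "".join("".join(row) for row in region)


print(extract_fields(toy_image, stub_detect, stub_recognize))
# {'name': 'JOHN DOE', 'dob': '01/02/1990'}
```

In a real setup, `detect` would presumably wrap a YOLOv8 model and `recognize` an EasyOCR reader; keeping them as injected callables makes the selection logic testable on its own.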
.
├── DATASET/ # Training datasets (US & India driver licenses)
├── Donut/
│ ├── CORD/ # Fine-tuning notebooks and data preparation scripts
│ └── deployment/ # Gradio web app for Donut model inference
├── YOLO-v5/
│ ├── LegacyTrain/ # Training artifacts
│ └── yolov5/ # YOLOv5 implementation
├── YOLO-v8/
│ ├── main.py # FastAPI REST API endpoint
│ ├── streamlit-app.py # Streamlit web interface
│ ├── parseq/ # Scene text recognition model
│ └── new_models/ # Trained model weights
├── pyTesseract_OCR/ # Tesseract-based extraction
└── YOLO___TF1.0/ # Legacy TensorFlow 1.x implementation
- Install dependencies for the YOLO-v8 pipeline:

      cd YOLO-v8
      pip install -r requirements.txt

- Run the Streamlit app:

      streamlit run streamlit-app.py

- Or start the FastAPI server:

      uvicorn main:app --reload

  API endpoint: POST /predict with an image file
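
A client call to the endpoint might look like the sketch below, which uses only the Python standard library. Several details are assumptions rather than facts from this repository: the form field name `file`, the default filename, and the response format are guesses, and the URL is simply uvicorn's default host and port. Check `main.py` for the actual parameter name.

```python
import urllib.request
import uuid
from typing import Tuple


def build_multipart(
    field: str, filename: str, data: bytes, content_type: str = "image/jpeg"
) -> Tuple[bytes, str]:
    """Encode one file as a multipart/form-data body; return (body, header value)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def build_predict_request(
    image_bytes: bytes,
    filename: str = "license.jpg",                 # hypothetical filename
    url: str = "http://127.0.0.1:8000/predict",    # uvicorn's default address
    field: str = "file",                           # assumed form field name
) -> urllib.request.Request:
    """Prepare (but do not send) a POST /predict request."""
    body, header = build_multipart(field, filename, image_bytes)
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": header}, method="POST"
    )


# Sending the request requires the FastAPI server to be running:
#
#   with open("license.jpg", "rb") as fh:
#       request = build_predict_request(fh.read())
#   with urllib.request.urlopen(request) as response:
#       print(response.read().decode("utf-8"))
```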
- Install dependencies for the Donut deployment:

      cd Donut/deployment
      pip install -r requirements.txt

- Run the Gradio app:

      python gradio-app.py
- Create conda environment:

      cd YOLO-v5
      conda env create -f my_conda_env_yolov5.yml
      conda activate YOLO-v5

- Training:

      python train.py --img 640 --batch 4 --epochs 100 --data train/US_DL.yaml --cfg train/custom_yolov5l.yaml --weights train_L/yolov5l.pt

- Inference:

      python detect.py --weights runs/train/{exp}/weights/best.pt --img 640 --conf 0.2 --source test/images
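
Since the YOLOv5 approach stops at field detection, a follow-up step usually has to turn detections into pixel boxes for cropping. The sketch below assumes `detect.py` was additionally run with `--save-txt` (and optionally `--save-conf`), which is not part of the command above; in that case each line of a label file has the form `class x_center y_center width height [conf]` with coordinates normalized to the image size. The names `FieldBox` and `parse_yolo_labels` are hypothetical.

```python
from typing import List, NamedTuple, Optional


class FieldBox(NamedTuple):
    class_id: int
    x1: int
    y1: int
    x2: int
    y2: int
    conf: Optional[float]  # present only when --save-conf was used


def parse_yolo_labels(text: str, img_w: int, img_h: int) -> List[FieldBox]:
    """Convert normalized YOLO label lines into pixel-space corner boxes."""
    boxes: List[FieldBox] = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) not in (5, 6):
            continue  # skip blank or malformed lines
        class_id = int(parts[0])
        xc, yc, w, h = (float(p) for p in parts[1:5])
        conf = float(parts[5]) if len(parts) == 6 else None
        # Center/size to corners, scaled to pixels and clamped to the image.
        x1 = max(round((xc - w / 2) * img_w), 0)
        y1 = max(round((yc - h / 2) * img_h), 0)
        x2 = min(round((xc + w / 2) * img_w), img_w)
        y2 = min(round((yc + h / 2) * img_h), img_h)
        boxes.append(FieldBox(class_id, x1, y1, x2, y2, conf))
    return boxes


sample = "0 0.5 0.5 0.5 0.25 0.9\n1 0.25 0.75 0.5 0.5\n"
for box in parse_yolo_labels(sample, img_w=640, img_h=480):
    print(box)
```

The resulting boxes can then be cropped and passed to whichever OCR engine is in use.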
- Deep Learning: PyTorch, Ultralytics YOLOv8/v5, Hugging Face Transformers
- OCR: EasyOCR, pyTesseract, PARSeq
- Web Frameworks: FastAPI, Streamlit, Gradio
- Computer Vision: OpenCV, PIL/Pillow
- Other: PyTorch Lightning, CUDA support for GPU acceleration
| Approach | Detection | Text Recognition | Use Case |
|---|---|---|---|
| YOLOv8 + EasyOCR | YOLOv8 | EasyOCR | Production API/UI |
| Donut | Transformer | End-to-end | Zero-OCR pipeline |
| YOLOv5 | YOLOv5 | - | Field detection only |
| pyTesseract | - | Tesseract | Basic OCR |
- Name
- Date of Birth (DOB)
- Address
- State
See individual component licenses in YOLO___TF1.0/licenses/.