This project implements an end-to-end machine learning pipeline for heart disease prediction on the UCI Heart Disease dataset. It covers data preprocessing, feature selection, model training, and evaluation, and includes a Streamlit web app for real-time predictions. The repository is organized as follows:
Heart Disease Prediction ML Pipeline/
│
├── ui/
│   └── src/
│       └── app.py                  # Streamlit web app
│
├── models/
│   ├── final_model.pkl             # Trained ML model (Random Forest)
│   └── scaler.pkl                  # Scaler used for preprocessing
│
├── deployment/
│   └── ngrok_setup.txt             # Instructions for exposing the app via ngrok
│
├── results/
│   └── evaluation_metrics.txt      # Model performance metrics
│
├── Notebook.ipynb                  # Main Jupyter notebook (ML pipeline)
├── requirements.txt                # Python dependencies
├── .gitignore                      # Files/folders for git to ignore
└── README.md                       # Project documentation
To run the app locally:

git clone <repo-url>
cd "Heart Disease Prediction ML Pipeline"
pip install -r requirements.txt
streamlit run ui/src/app.py

The app will open in your browser at http://localhost:8501.
- Location: ui/src/app.py
- Features:
  - Enter patient details to predict heart disease risk.
  - Uses the trained Random Forest model and scaler.
  - Displays prediction and probability.
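The prediction step the app performs can be sketched as below. This is a minimal, hypothetical sketch, not the code in ui/src/app.py. It assumes both .pkl files were written with Python's pickle module (if they were saved with joblib.dump, use joblib.load instead), that the scaler and model follow the scikit-learn transform/predict/predict_proba interface, and that the disease class is labeled 1. The helper names load_artifacts and predict_risk are illustrative, and the feature order must match whatever columns the notebook trained on.

```python
import pickle
from typing import Any, Sequence, Tuple


def load_artifacts(
    model_path: str = "models/final_model.pkl",
    scaler_path: str = "models/scaler.pkl",
) -> Tuple[Any, Any]:
    """Load the trained model and scaler (assumes both were saved with pickle).

    Only unpickle files you trust: pickle can execute arbitrary code on load.
    """
    with open(model_path, "rb") as f:
        model = pickle.load(f)
    with open(scaler_path, "rb") as f:
        scaler = pickle.load(f)
    return model, scaler


def predict_risk(model: Any, scaler: Any, features: Sequence[float]) -> Tuple[int, float]:
    """Return (predicted label, probability of heart disease) for one patient.

    `features` must list values in the same order as the columns the scaler
    and model were fitted on (hypothetical here; check Notebook.ipynb).
    """
    # Scale a single row exactly as during training; scikit-learn expects 2-D input.
    X = scaler.transform([list(features)])
    label = int(model.predict(X)[0])
    # predict_proba columns follow model.classes_; look up the positive class (assumed 1).
    positive_index = list(model.classes_).index(1)
    probability = float(model.predict_proba(X)[0][positive_index])
    return label, probability
```

In a Streamlit app, the feature values would typically come from input widgets such as st.number_input, and the returned label and probability could be displayed with st.write. Applying the saved scaler before predicting matters because the model was fitted on scaled inputs.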
- Notebook: Notebook.ipynb
- Pipeline Steps:
  - Data loading and preprocessing
  - Feature selection (RFE, Chi-Square)
  - Model training (Logistic Regression, Decision Tree, Random Forest, SVM)
  - Evaluation (Accuracy, Precision, Recall, F1, ROC AUC)
  - Hyperparameter tuning
  - Saving best model and scaler
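The Chi-Square selection step scores each feature by how far its per-class totals deviate from what the class frequencies alone would predict: for feature j, the score sums (observed - expected)^2 / expected over classes, where "observed" is the sum of feature j over samples of that class and "expected" is the class frequency times the total sum of feature j. The stdlib-only sketch below reproduces that formula (the one used by scikit-learn's sklearn.feature_selection.chi2) so the scores are easy to inspect. It is an illustration, not the notebook's code; chi2_scores and top_k_features are hypothetical names. Like scikit-learn's version, it requires non-negative feature values.

```python
from collections import Counter
from typing import List, Sequence


def chi2_scores(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[float]:
    """Chi-square score per feature, using the same formula as sklearn's chi2.

    observed[c][j] = sum of feature j over samples of class c
    expected[c][j] = P(class c) * sum of feature j over all samples
    """
    n_samples = len(X)
    class_prob = {c: n / n_samples for c, n in Counter(y).items()}
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        if any(v < 0 for v in column):
            raise ValueError(f"feature {j} has negative values; chi-square needs non-negative inputs")
        total = sum(column)
        score = 0.0
        for c, prob in class_prob.items():
            observed = sum(v for v, label in zip(column, y) if label == c)
            expected = prob * total
            if expected > 0:  # an all-zero column scores 0 here (sklearn yields nan)
                score += (observed - expected) ** 2 / expected
        scores.append(score)
    return scores


def top_k_features(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest-scoring features (what SelectKBest keeps)."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]
```

With scikit-learn, the equivalent is SelectKBest(score_func=chi2, k=...).fit(X, y); the value of k used in the notebook is not stated here. A feature that takes the same value in every class scores 0, which is why it would be dropped first.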
- Results: See results/evaluation_metrics.txt for detailed metrics.
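The metrics reported by the pipeline can be computed by hand from a model's predictions, which helps when reading the numbers in results/evaluation_metrics.txt. This dependency-free sketch assumes label 1 is the positive (disease) class. It derives accuracy, precision, recall, and F1 from the confusion-matrix counts, and computes ROC AUC as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counting as half. The function name classification_metrics is hypothetical; in practice sklearn.metrics provides accuracy_score, precision_score, recall_score, f1_score, and roc_auc_score.

```python
from typing import Dict, Sequence


def classification_metrics(
    y_true: Sequence[int], y_pred: Sequence[int], y_score: Sequence[float]
) -> Dict[str, float]:
    """Binary metrics with label 1 as the positive class.

    y_pred holds hard predictions (0/1); y_score holds positive-class probabilities.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of predicted positives that are correct
    recall = tp / (tp + fn) if tp + fn else 0.0     # share of actual positives that are found
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    # ROC AUC: fraction of (positive, negative) pairs ranked correctly; ties count 0.5.
    pos = [s for s, t in zip(y_score, y_true) if t == 1]
    neg = [s for s, t in zip(y_score, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    roc_auc = wins / (len(pos) * len(neg))

    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "f1": f1, "roc_auc": roc_auc}
```

The pairwise AUC loop is quadratic in the number of samples, which is fine for a dataset of a few hundred rows; unlike the other four metrics, it uses probabilities rather than hard labels, so it is unaffected by the choice of decision threshold.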
To make the Streamlit app reachable from outside your machine, follow the ngrok instructions in deployment/ngrok_setup.txt.
See requirements.txt for all dependencies.