SMS Spam Detection App 📱


A machine learning-powered web application that classifies SMS messages as spam or not using NLP techniques and the Multinomial Naive Bayes algorithm. This project includes full model training, evaluation, and a user-friendly Streamlit interface.


📂 Dataset


⚙️ Features

  • Data cleaning and preprocessing
  • Exploratory Data Analysis (EDA)
  • Text tokenization using NLTK
  • Vectorization using TF-IDF
  • Model comparison using multiple classifiers
  • Final model: Multinomial Naive Bayes
  • Evaluation metrics: Accuracy, Precision, Confusion Matrix
  • Streamlit web app for user interaction
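The core of the feature list above (TF-IDF vectorization feeding a Multinomial Naive Bayes classifier) can be sketched in a few lines of scikit-learn. This is a minimal illustration rather than the notebook's actual code: the toy messages and the variable names are made up for the example, and it relies on `TfidfVectorizer`'s default tokenizer instead of the NLTK tokenization the project uses.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy data for illustration only; the real project trains on spam.csv.
train_texts = [
    "win a free prize now",
    "free entry claim your prize",
    "urgent win cash now",
    "are we still meeting for lunch",
    "call me when you get home",
    "see you at the meeting tomorrow",
]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# TfidfVectorizer lowercases and tokenizes with its default regex here.
# An NLTK-based tokenizer could be supplied through its `tokenizer=` argument.
classifier = make_pipeline(TfidfVectorizer(), MultinomialNB())
classifier.fit(train_texts, train_labels)

# Classify two unseen messages.
predictions = classifier.predict(["win a free prize", "lunch meeting at home"])
print(list(predictions))
```

Wrapping both steps in one pipeline guarantees that new messages are vectorized with the same vocabulary and IDF weights that were learned during training.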

🚀 How to Run

  1. Clone the repo

    git clone https://github.com/Mozeel-V/spam-detection.git
    cd spam-detection
  2. Create a Conda environment (optional)

    conda create -n spamguard python
    conda activate spamguard
  3. Install dependencies

    pip install -r requirements.txt
  4. Run the app

    streamlit run app.py

A preview of the app is available here.


📁 Project Structure

📦 spam-detection/
├── app.py                  # Streamlit app
├── model.pkl               # Trained Naive Bayes model
├── vectorizer.pkl          # TF-IDF vectorizer
├── spam.csv                # Original dataset
├── spam_utf8.csv           # UTF-8 converted dataset
├── spam-detection.ipynb    # Training and EDA notebook
├── requirements.txt        # Python dependencies
├── LICENSE                 # MIT open-source license
└── README.md               # Contains basic info about the project
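The tree lists both `spam.csv` and a UTF-8 converted copy, `spam_utf8.csv`. A conversion like that could look roughly like the sketch below. It assumes the original file is Latin-1 encoded, which the README does not state, and `convert_to_utf8` is a hypothetical helper name, not a function from this repository.

```python
import tempfile
from pathlib import Path


def convert_to_utf8(src, dst, source_encoding="latin-1"):
    """Re-encode a text file as UTF-8 (source encoding is an assumption)."""
    # newline="" keeps the original line endings untouched, which matters for CSV.
    with open(src, "r", encoding=source_encoding, newline="") as infile:
        text = infile.read()
    with open(dst, "w", encoding="utf-8", newline="") as outfile:
        outfile.write(text)


# Demonstration on a throwaway file containing a Latin-1 pound sign (0xA3).
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "spam.csv"
    dst = Path(tmp) / "spam_utf8.csv"
    src.write_bytes(b"spam,Win \xa3100 now\n")
    convert_to_utf8(src, dst)
    converted_bytes = dst.read_bytes()

print(converted_bytes.decode("utf-8"))
```

Once the file is UTF-8, `pandas.read_csv` can read it with its default encoding.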

🧠 Model Insights

  • The dataset was vectorized using TF-IDF to capture term importance.
  • Multiple classifiers were tested (e.g. Logistic Regression, SVM).
  • Multinomial Naive Bayes gave the best results on precision and accuracy.
  • The model was saved as model.pkl and used directly in the app.
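Saving and reloading the two artifacts (`model.pkl` and `vectorizer.pkl`) might look like the following sketch. It assumes both files were written with Python's `pickle` module and that labels are encoded as 1 for spam and 0 for not spam; neither detail is confirmed by the README. The toy messages and the `classify` helper are hypothetical.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy data for illustration only; 1 = spam, 0 = not spam (assumed encoding).
texts = [
    "win a free prize now",
    "claim your free cash prize",
    "see you at lunch",
    "call me when you get home",
]
labels = [1, 1, 0, 0]

vectorizer = TfidfVectorizer()
model = MultinomialNB().fit(vectorizer.fit_transform(texts), labels)

with tempfile.TemporaryDirectory() as tmp:
    # Persist both objects: the model alone cannot turn raw text into features.
    for name, obj in (("vectorizer.pkl", vectorizer), ("model.pkl", model)):
        with open(Path(tmp) / name, "wb") as fh:
            pickle.dump(obj, fh)

    # Reload them the way an app could at startup.
    with open(Path(tmp) / "vectorizer.pkl", "rb") as fh:
        loaded_vectorizer = pickle.load(fh)
    with open(Path(tmp) / "model.pkl", "rb") as fh:
        loaded_model = pickle.load(fh)


def classify(message):
    """Vectorize one message with the fitted vectorizer and predict its label."""
    features = loaded_vectorizer.transform([message])
    return "spam" if loaded_model.predict(features)[0] == 1 else "not spam"


print(classify("free prize"))
print(classify("see you at lunch"))
```

The fitted vectorizer has to be saved alongside the model because it holds the vocabulary and IDF weights the model was trained on. Pickle files should only be loaded from sources you trust, since unpickling can execute arbitrary code.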

🛠 Tech Stack

  • Python, Pandas, Scikit-learn, NLTK
  • TF-IDF Vectorizer
  • Streamlit (for frontend)

📄 License

This project is licensed under the MIT License.


🤝 Contributions

Feel free to fork, raise issues, or submit PRs to improve this project!


📝 Author

Mozeel Vanwani | IIT Kharagpur CSE

Email: [[email protected]]
