SMS Spam Detection Web Application

This application leverages machine learning to detect spam messages.

This repository contains a web application for detecting spam SMS messages. The application uses machine learning models (Extra Trees and Bernoulli Naive Bayes) to classify messages as spam or not spam. The app also allows users to provide feedback on the classification results, which can be used to retrain the models periodically.

Dataset

Try on Streamlit

Try on Huggingface Space

Features

Prediction: Classify SMS messages as spam or not spam using Extra Trees or Bernoulli Naive Bayes models.
Feedback: Users can provide feedback on the predictions to improve model performance.
Continuous Training: The application supports periodic retraining of models using the feedback data.

Project Structure

/sms-spam-detection
│
├──/model
│   ├── BernoulliNB.pkl
│   └── Extra_Tree.pkl
│
├──/static
│   └──/images
│
├── app.py
├── streamlit_app.py
├── docker_app.py
├── Dockerfile
├── Dockerfile.fastapi
├── docker-compose.yml
├── requirements.txt

app.py: Defines the FastAPI application.
streamlit_app.py: Defines the Streamlit web app.
docker_app.py: Streamlit web app used when running under Docker.
Dockerfile: Dockerfile for building the Docker image.
docker-compose.yml: Docker Compose file for orchestrating the services.
requirements.txt: List of dependencies.
model/: Directory containing pre-trained machine learning models.
static/: Directory containing static files such as images used in the interface.

Installation

Clone the repository:

```
git clone https://github.com/Sibikrish3000/sms-spam-detection.git
cd sms-spam-detection
```

Install the required packages:
```
pip install -r requirements.txt
```

Download NLTK data:

```
python -m nltk.downloader punkt
python -m nltk.downloader stopwords
```

Run Locally

Start the FastAPI Server:

```
uvicorn app:app --host 0.0.0.0 --port 8000 --reload
```

Run the Streamlit Application:
```
streamlit run streamlit_app.py
```

Using Docker Compose

Build and start the containers:

```
docker network create AIservice
docker-compose up --build
```

Access the Streamlit web app at http://localhost:8501.

Development

Running in a Gitpod Cloud Environment

Click the button below to start a new development environment:

Usage

Enter SMS Message: Input the SMS message you want to classify.
Select Model: Choose between Extra Trees and Bernoulli Naive Bayes models.
Predict: Click the "Predict" button to see the classification result.
Feedback: Mark the message as spam or not spam to give feedback on the prediction, then submit.

Continuous Training (CT) in MLOps

Continuous Training (CT) ensures that the machine learning models stay up-to-date with new data and feedback. Here are some suggestions for implementing CT for this application:

Online Learning

Online learning is suitable for scenarios where data arrives continuously, and the model needs to update frequently.

Implementation: Update the models incrementally as new labeled feedback arrives, using stochastic gradient descent or mini-batch updates. In scikit-learn this is done with the partial_fit() method, which is available on some estimators (e.g., SGDClassifier, BernoulliNB) but not on ExtraTreesClassifier.
Benefits: The model updates with each new feedback, allowing it to adapt quickly to new patterns.
Challenges: May require more careful tuning and monitoring to ensure model stability.
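
As a rough illustration of the online approach, the sketch below updates an SGDClassifier with partial_fit() each time a small batch of feedback arrives. It is a minimal sketch, not the application's actual code: the label names "ham" and "spam", the helper names update and predict, and the use of HashingVectorizer (chosen because it is stateless and never needs refitting) are all assumptions.

```python
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

# HashingVectorizer keeps no vocabulary, so it can transform unseen text
# without being refitted. This is an assumed choice, not the app's vectorizer.
vectorizer = HashingVectorizer(n_features=2**16, alternate_sign=False)
model = SGDClassifier(random_state=0)

# Assumed label names. partial_fit needs the full class list on the first call.
CLASSES = ["ham", "spam"]


def update(messages, labels):
    """Apply one incremental update from a small batch of labeled feedback."""
    X = vectorizer.transform(messages)
    model.partial_fit(X, labels, classes=CLASSES)


def predict(message):
    """Classify a single message with the current model state."""
    return model.predict(vectorizer.transform([message]))[0]


# Feedback arriving over time (toy data).
update(["Win a free prize now", "Are we still meeting at noon?"], ["spam", "ham"])
update(["Claim your free cash prize", "See you at lunch"], ["spam", "ham"])
```

Because every call changes the model immediately, it is worth logging predictions before and after updates so that a run of bad feedback can be spotted.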

Offline Learning

Offline learning involves retraining the model periodically with the accumulated feedback data.

Implementation: Retrain the model at a fixed interval (e.g., daily or weekly) using the feedback data stored in the CSV file.
Benefits: Simpler to implement and manage, as retraining can be scheduled during off-peak times.
Challenges: The model is updated less often than with online learning, which may delay the incorporation of new patterns.
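
One practical detail of offline retraining is deciding what data to train on. The sketch below assumes the original training set and the feedback file both have 'message' and 'label' columns and merges them so that user corrections override older labels. The function name build_training_frame and the toy frames are illustrative only.

```python
import pandas as pd


def build_training_frame(base, feedback):
    """Merge the original dataset with user feedback.

    Both frames are assumed to have 'message' and 'label' columns. When the
    same message appears more than once, the last row wins, so feedback
    (concatenated after the base data) overrides the original label.
    """
    combined = pd.concat([base, feedback], ignore_index=True)
    combined = combined.dropna(subset=["message", "label"])
    combined = combined.drop_duplicates(subset="message", keep="last")
    return combined.reset_index(drop=True)


# Toy data standing in for the original dataset and feedback.csv.
base = pd.DataFrame(
    {"message": ["Free prize inside", "Lunch at noon?"], "label": ["ham", "ham"]}
)
feedback = pd.DataFrame({"message": ["Free prize inside"], "label": ["spam"]})
training = build_training_frame(base, feedback)
```

Training on the merged frame rather than on feedback alone keeps the model from forgetting the original data when only a few feedback rows exist.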

Partial Fit

Partial fit combines aspects of both online and offline learning.

Implementation: Use models that support the partial_fit() method. Collect feedback data over a period and then update the model in smaller batches.
Benefits: Provides a balance between frequent updates and stability.
Challenges: Requires careful management of the batch size and frequency of updates.
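
The batching idea can be sketched as a small buffer that collects feedback and calls partial_fit() only once enough rows have accumulated. The FeedbackBuffer class, the batch size, and the "ham"/"spam" label names are assumptions made for illustration; BernoulliNB and HashingVectorizer are real scikit-learn classes.

```python
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.naive_bayes import BernoulliNB


class FeedbackBuffer:
    """Collect feedback and apply partial_fit once a full batch is available."""

    def __init__(self, model, vectorizer, classes, batch_size=32):
        self.model = model
        self.vectorizer = vectorizer
        self.classes = classes
        self.batch_size = batch_size
        self.messages = []
        self.labels = []
        self.updates = 0  # number of partial_fit calls made so far

    def add(self, message, label):
        """Store one feedback row and update the model when the batch is full."""
        self.messages.append(message)
        self.labels.append(label)
        if len(self.messages) >= self.batch_size:
            self.flush()

    def flush(self):
        """Update the model with whatever is buffered, then empty the buffer."""
        if not self.messages:
            return
        X = self.vectorizer.transform(self.messages)
        self.model.partial_fit(X, self.labels, classes=self.classes)
        self.messages.clear()
        self.labels.clear()
        self.updates += 1


buffer = FeedbackBuffer(
    model=BernoulliNB(),
    # binary=True gives presence/absence features, which suit BernoulliNB.
    vectorizer=HashingVectorizer(
        n_features=2**16, binary=True, norm=None, alternate_sign=False
    ),
    classes=["ham", "spam"],
    batch_size=2,
)
buffer.add("Win a free prize now", "spam")  # buffered, no update yet
buffer.add("See you at lunch", "ham")       # batch is full, one update runs
```

A larger batch_size gives more stable updates at the cost of slower adaptation; calling flush() on a timer bounds how long feedback can wait.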

Example Workflow for Offline Learning with Periodic Retraining

Collect Feedback: Save feedback data into a CSV file.
Scheduled Retraining: Set up a cron job or similar scheduling tool to retrain the model every 10 days.
Model Update: Load the feedback data, preprocess it, and retrain the model.
Save Model: Save the retrained model to a file and replace the old model.
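
The first and last steps of this workflow can be sketched as two small helpers: one that appends a feedback row to the CSV file, and one that replaces the model file without leaving a half-written file behind if the process is interrupted. The function names append_feedback and replace_model and the 'message'/'label' column layout are assumptions, not the application's actual code.

```python
import csv
import os
import tempfile

import joblib


def append_feedback(path, message, label):
    """Append one feedback row, writing the header if the file is new or empty."""
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if write_header:
            writer.writerow(["message", "label"])
        writer.writerow([message, label])


def replace_model(model, path):
    """Write the model to a temporary file, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    os.close(fd)
    try:
        joblib.dump(model, tmp_path)
        # os.replace is atomic on POSIX when both paths share a filesystem,
        # so readers see either the old model or the new one, never a mix.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Writing to a temporary file in the same directory matters because a rename across filesystems is not atomic.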

Cron Job Example (Linux)

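```
# Open the crontab editor
crontab -e

# Add the following line to schedule retraining roughly every 10 days.
# Note: */10 in the day-of-month field runs at midnight on days 1, 11, 21, and 31
# of each month, so the gap across a month boundary is shorter than 10 days.
0 0 */10 * * /usr/bin/python3 /path/to/your/retrain_script.py
```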

Retraining Script Example

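```
import pandas as pd
import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import ExtraTreesClassifier

# Load feedback data and drop rows with a missing message or label
df = pd.read_csv('feedback.csv').dropna(subset=['message', 'label'])

# Preprocess the messages
# Include your preprocessing function here

# Vectorize the messages
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(df['message'].astype(str))
y = df['label']

# Retrain the model
model = ExtraTreesClassifier()
model.fit(X, y)

# Save the retrained model together with the vectorizer it was fitted with.
# A refitted TfidfVectorizer has a new vocabulary, so the two must stay in sync.
# 'vectorizer.pkl' is a placeholder name; use whatever file the app loads.
joblib.dump(model, 'Extra_Tree.pkl')
joblib.dump(vectorizer, 'vectorizer.pkl')
```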
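
The preprocessing step is left as a placeholder in the script above. One possible shape for it is sketched below, assuming simple lowercasing, tokenization, and stop-word removal. The preprocess function and the short STOP_WORDS set are stand-ins; the project's real preprocessing may differ, and NLTK's stopwords.words("english") could be used in place of the inline set once the NLTK data is downloaded.

```python
import re

# Small stand-in list for illustration; a full stop-word list would be larger.
STOP_WORDS = {"a", "an", "and", "are", "at", "is", "the", "to", "you", "your"}


def preprocess(text):
    """Lowercase a message, keep alphanumeric tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return " ".join(t for t in tokens if t not in STOP_WORDS)


cleaned = preprocess("You are a WINNER! Claim your prize now.")
# cleaned == "winner claim prize now"
```

In the retraining script this could be applied with df['message'].astype(str).apply(preprocess) before vectorizing. Whatever function is used, it should match the preprocessing applied at prediction time.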

License

This project is licensed under the MIT License. See the LICENSE file for details.
