This project performs sentiment analysis on IMDb movie reviews using machine learning models. It compares the performance of Logistic Regression and Naive Bayes classifiers.
- Loads the IMDb dataset of 50,000 movie reviews
- Cleans and preprocesses text data (lemmatization, stopword removal)
- Converts text to numerical features using TF-IDF
- Visualizes frequent words in positive and negative reviews
- Trains and evaluates Logistic Regression and Naive Bayes models
- Compares model accuracies and saves trained models
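
The feature extraction and model comparison steps above can be sketched as follows. This is a minimal illustration rather than the notebook's actual code: the toy reviews are invented, `compare_models` is a hypothetical helper, `MultinomialNB` is an assumed choice of Naive Bayes variant, and scikit-learn's built-in English stop-word list stands in for the nltk/spacy preprocessing.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy data invented for illustration only (not taken from the IMDb dataset).
# Labels: 1 = positive, 0 = negative.
train_texts = [
    "a wonderful and moving film with great acting",
    "brilliant story and superb performances",
    "I loved every minute of this charming movie",
    "an excellent and memorable experience",
    "a dull and boring film with terrible acting",
    "awful story and weak performances",
    "I hated every minute of this tedious movie",
    "a horrible and forgettable experience",
]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]
test_texts = ["great acting and a wonderful story", "boring and terrible movie"]
test_labels = [1, 0]


def compare_models(train_texts, train_labels, test_texts, test_labels):
    """Fit a TF-IDF pipeline for each classifier and return test accuracies."""
    models = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "naive_bayes": MultinomialNB(),  # assumed variant, not confirmed by the source
    }
    accuracies = {}
    for name, clf in models.items():
        # Each pipeline gets its own vectorizer so vocabularies are fit independently.
        pipeline = make_pipeline(TfidfVectorizer(stop_words="english"), clf)
        pipeline.fit(train_texts, train_labels)
        predictions = pipeline.predict(test_texts)
        accuracies[name] = accuracy_score(test_labels, predictions)
    return accuracies


accuracies = compare_models(train_texts, train_labels, test_texts, test_labels)
for name, acc in accuracies.items():
    print(f"{name}: accuracy = {acc:.2f}")
```

Wrapping the vectorizer and classifier in one pipeline keeps the TF-IDF vocabulary tied to the training data, so the test reviews are transformed with the same features.
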
- Python 3.x
- pandas
- numpy
- matplotlib
- seaborn
- scikit-learn
- nltk
- spacy
- imbalanced-learn
- joblib
- kagglehub
- Clone the repository and open `Sentiment Analysis on IMDb Movie Reviews.ipynb` in Jupyter or VS Code.
- Run all cells to execute the workflow.
- The notebook will output model metrics and save trained models as `.pkl` files.
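
Saving and reloading a trained model with joblib might look like the sketch below. The file name `logistic_regression.pkl`, the temporary directory, and the tiny training set are assumptions made for illustration; the notebook's real file names and paths may differ.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy data invented for illustration only. Labels: 1 = positive, 0 = negative.
texts = ["great wonderful film", "superb acting", "terrible boring film", "awful acting"]
labels = [1, 1, 0, 0]
samples = ["wonderful acting", "boring film"]

# Train a small pipeline so there is something to persist.
pipeline = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
pipeline.fit(texts, labels)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name; the source only states that models are saved as .pkl files.
    model_path = os.path.join(tmp_dir, "logistic_regression.pkl")
    joblib.dump(pipeline, model_path)
    file_was_written = os.path.exists(model_path)

    # Reload the model and confirm it behaves like the original.
    restored = joblib.load(model_path)
    same_predictions = list(restored.predict(samples)) == list(pipeline.predict(samples))

print("file written:", file_was_written)
print("predictions match after reload:", same_predictions)
```

Persisting the whole pipeline, rather than the classifier alone, means the fitted TF-IDF vocabulary travels with the model and raw text can be passed straight to `predict` after loading.
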
The dataset is loaded from Kaggle using `kagglehub`.
- Model performance metrics (accuracy, precision, recall, F1 score) are printed for both classifiers.
- Confusion matrices and classification reports are visualized.
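
As a rough illustration of how the metrics listed above can be computed with scikit-learn, the following sketch uses invented labels and predictions; `summarize` is a hypothetical helper, and the numbers say nothing about the notebook's actual results.

```python
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    precision_recall_fscore_support,
)

# Invented labels and predictions for illustration only. 1 = positive, 0 = negative.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]


def summarize(y_true, y_pred):
    """Return accuracy, precision, recall, and F1 for the positive class."""
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="binary", pos_label=1
    )
    return {
        "accuracy": float(accuracy_score(y_true, y_pred)),
        "precision": float(precision),
        "recall": float(recall),
        "f1": float(f1),
    }


metrics = summarize(y_true, y_pred)
# Rows are true classes, columns are predicted classes, in the order [0, 1].
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])

print(metrics)
print(cm)
print(classification_report(y_true, y_pred, target_names=["negative", "positive"]))
```

The confusion matrix array can be passed to a plotting function such as seaborn's `heatmap` to produce the visualization.
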