
Sentiment Analysis on IMDb Movie Reviews

This project performs sentiment analysis on IMDb movie reviews, using machine learning models to classify each review as positive or negative. It compares the performance of two classifiers: Logistic Regression and Naive Bayes.
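Naive Bayes is the simpler of the two classifiers to write out by hand. The sketch below is a minimal multinomial Naive Bayes with Laplace smoothing over word counts, in plain Python so it runs without extra packages. It illustrates the technique and is not the notebook's code; the notebook presumably uses a scikit-learn estimator such as `MultinomialNB` (an assumption), and the toy reviews and the names `train_multinomial_nb` and `predict_nb` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model on tokenized documents.

    docs: list of token lists; labels: list of class labels.
    alpha: additive (Laplace) smoothing constant.
    """
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # class -> token -> count
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)

    n_docs = len(labels)
    log_prior = {c: math.log(n / n_docs) for c, n in class_counts.items()}
    log_likelihood = {}
    for c in class_counts:
        # Smoothed P(word | class), stored as log-probabilities.
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        log_likelihood[c] = {
            w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab
        }
    return log_prior, log_likelihood


def predict_nb(model, tokens):
    """Return the class with the highest posterior log-probability."""
    log_prior, log_likelihood = model
    scores = {}
    for c in log_prior:
        # Tokens never seen in training are ignored.
        scores[c] = log_prior[c] + sum(
            log_likelihood[c][w] for w in tokens if w in log_likelihood[c]
        )
    return max(scores, key=scores.get)


# Hypothetical toy training data (already tokenized).
train_docs = [
    ["great", "movie", "loved", "it"],
    ["wonderful", "acting", "great", "story"],
    ["terrible", "plot", "boring"],
    ["awful", "movie", "boring", "acting"],
]
train_labels = ["positive", "positive", "negative", "negative"]

model = train_multinomial_nb(train_docs, train_labels)
print(predict_nb(model, ["great", "story"]))  # positive
print(predict_nb(model, ["boring", "plot"]))  # negative
```

Working in log space avoids floating-point underflow when a long review multiplies many small per-word probabilities.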

Features

  • Loads the IMDb dataset of 50,000 movie reviews
  • Cleans and preprocesses text data (lemmatization, stopword removal)
  • Converts text to numerical features using TF-IDF
  • Visualizes frequent words in positive and negative reviews
  • Trains and evaluates Logistic Regression and Naive Bayes models
  • Compares model accuracies and saves trained models
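The cleaning and TF-IDF steps can be sketched without third-party packages. This is an illustration under stated assumptions, not the notebook's implementation: the stopword set is a tiny hypothetical stand-in for the NLTK or spaCy lists, lemmatization is left out because it needs one of those libraries, and the weighting follows the defaults of scikit-learn's `TfidfVectorizer` (raw counts, smoothed IDF, L2 normalization), although its tokenizer differs. The function names are hypothetical.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (hypothetical); NLTK and spaCy ship full lists.
STOPWORDS = {"a", "an", "the", "and", "is", "it", "this", "was", "of", "to"}


def clean_text(text):
    """Lowercase, drop HTML tags and non-letters, tokenize, remove stopwords.

    Lemmatization is omitted here; it needs a library such as spaCy or NLTK.
    """
    text = re.sub(r"<[^>]+>", " ", text.lower())  # strip HTML tags
    tokens = re.findall(r"[a-z]+", text)           # keep alphabetic runs
    return [t for t in tokens if t not in STOPWORDS]


def tfidf(docs):
    """Compute L2-normalized TF-IDF vectors as {term: weight} dicts.

    Uses raw term counts and the smoothed IDF ln((1 + n) / (1 + df)) + 1.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    vectors = []
    for doc in docs:
        weights = {t: c * idf[t] for t, c in Counter(doc).items()}
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({t: w / norm for t, w in weights.items()})
    return vectors


reviews = [
    "This movie was great.<br /><br />Great acting!",
    "The plot was boring and the acting was awful.",
]
docs = [clean_text(r) for r in reviews]
print(docs[0])  # ['movie', 'great', 'great', 'acting']
vectors = tfidf(docs)
```

Because "acting" appears in both reviews, it gets the lowest IDF and therefore a smaller weight than words unique to one review.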

Requirements

  • Python 3.x
  • pandas
  • numpy
  • matplotlib
  • seaborn
  • scikit-learn
  • nltk
  • spacy
  • imbalanced-learn
  • joblib
  • kagglehub

Usage

  1. Clone the repository and open the notebook "Sentiment Analysis on IMDb Movie Reviews.ipynb" in Jupyter or VS Code.
  2. Run all cells to execute the workflow.
  3. The notebook will output model metrics and save trained models as .pkl files.
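The requirements list joblib, whose `joblib.dump(value, filename)` and `joblib.load(filename)` are the usual way to persist scikit-learn models. The sketch below shows the same save/load round trip with the standard-library `pickle` module; the helper names, the filename, and the dictionary standing in for a trained model are hypothetical.

```python
import os
import pickle
import tempfile


def save_model(model, path):
    """Serialize a trained model to a .pkl file."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path):
    """Load a model previously written by save_model.

    Only unpickle files you trust: pickle can execute arbitrary code on load.
    """
    with open(path, "rb") as f:
        return pickle.load(f)


# Stand-in "model": any picklable object works, including scikit-learn estimators.
model = {"classes": ["negative", "positive"], "coef": [0.5, -1.2]}
path = os.path.join(tempfile.mkdtemp(), "naive_bayes.pkl")  # hypothetical filename
save_model(model, path)
restored = load_model(path)
print(restored == model)  # True
```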

Dataset

The dataset is loaded from Kaggle using kagglehub.
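The exact download call is not reproduced in this README. A minimal sketch of the loading step follows, assuming the dataset is a CSV with a `review` text column and a `sentiment` column holding `positive`/`negative` labels (an assumption about the file layout, not confirmed here). `kagglehub.dataset_download` returns the local directory of a downloaded dataset; the handle is left as a parameter because the notebook's is not shown. The helper names are hypothetical, and the demo writes a two-row CSV so it runs offline.

```python
import csv
import os
import tempfile


def download_dataset(handle):
    """Download a Kaggle dataset and return its local directory.

    Requires the kagglehub package and Kaggle access. `handle` is an
    "owner/dataset-name" string; the notebook's actual handle is not shown.
    """
    import kagglehub  # imported lazily so the rest of the sketch runs without it

    return kagglehub.dataset_download(handle)


def load_reviews(csv_path, text_col="review", label_col="sentiment"):
    """Read reviews and map 'positive'/'negative' labels to 1/0.

    The column names and label strings are assumptions about the CSV layout.
    """
    label_map = {"positive": 1, "negative": 0}
    texts, labels = [], []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            texts.append(row[text_col])
            labels.append(label_map[row[label_col].strip().lower()])
    return texts, labels


# Self-contained demo on a tiny CSV written to a temp directory.
demo_path = os.path.join(tempfile.mkdtemp(), "reviews.csv")
with open(demo_path, "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["review", "sentiment"])
    writer.writerow(["A moving, well-acted film.", "positive"])
    writer.writerow(["Dull, with a predictable ending.", "negative"])

texts, labels = load_reviews(demo_path)
print(labels)  # [1, 0]
```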

Results

  • Model performance metrics (accuracy, precision, recall, F1 score) are printed for both classifiers.
  • Confusion matrices and classification reports are visualized.
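The reported metrics all derive from the four cells of the binary confusion matrix. The dependency-free sketch below computes them, treating label `1` as the positive class (an assumption). The functions `accuracy_score`, `precision_score`, `recall_score`, `f1_score` and `confusion_matrix` in scikit-learn's `sklearn.metrics` compute the same quantities and are the more likely choice in the notebook. The function name and sample labels are hypothetical.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, F1 and a 2x2 confusion matrix.

    The matrix is [[TN, FP], [FN, TP]]: rows are true labels and
    columns are predictions, ordered (negative, positive).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "confusion_matrix": [[tn, fp], [fn, tp]],
    }


y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
print(binary_metrics(y_true, y_pred))
```

For this sample, TP = 3, TN = 3, FP = 1 and FN = 1, so accuracy, precision, recall and F1 all equal 0.75.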