This project implements a sarcasm detection system using BERT (Bidirectional Encoder Representations from Transformers) for text classification.
Given a comment, the model classifies it as either sarcastic or non-sarcastic. BERT, a pretrained transformer model, is fine-tuned for this binary classification task.
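As a quick orientation, the sketch below shows how such a classifier can be set up with the Hugging Face transformers library; the example comment and the 0/1 label convention are illustrative, not taken from this repository.

```python
# Minimal sketch: BERT-base-uncased as a binary sequence classifier.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Classify one example comment (assumed convention: 0 = non-sarcastic, 1 = sarcastic).
inputs = tokenizer("Oh great, another Monday.", return_tensors="pt", truncation=True, padding=True)
with torch.no_grad():
    logits = model(**inputs).logits
prediction = logits.argmax(dim=-1).item()
print(prediction)
```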
- Text preprocessing and cleaning (see the sketch after this list)
- BERT-based model architecture
- Binary classification (sarcastic vs non-sarcastic)
- Comprehensive evaluation metrics
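The cleaning step can be as simple as normalizing case and stripping noise before tokenization. The helper below is a hypothetical example of such preprocessing, not the function used in this repository:

```python
# Hypothetical cleaning helper; the project's actual preprocessing may differ.
import re

def clean_comment(text: str) -> str:
    """Lowercase a raw comment, remove URLs, and collapse whitespace."""
    text = text.lower()
    text = re.sub(r"https?://\S+", "", text)   # strip URLs
    text = re.sub(r"\s+", " ", text).strip()   # collapse whitespace
    return text

print(clean_comment("Wow, GREAT idea...   https://example.com"))
```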
- Framework: PyTorch
- Model: BERT-base-uncased
- Libraries:
- pandas: Data manipulation
- transformers: BERT implementation
- scikit-learn: Evaluation metrics
- torch: Deep learning framework
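A minimal requirements.txt covering the libraries above might look like the following (package names only; the repository's actual file may pin specific versions):

```text
pandas
transformers
scikit-learn
torch
```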
- Create a virtual environment:
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install dependencies:
pip install -r requirements.txt
- Prepare your data as a CSV file with a 'comments' column (the raw text) and a 'contains_slash_s' column (the binary sarcasm label); see the loading sketch after this list
- Run the main script:
python dataset/sarcasm/main.py
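The sketch below illustrates the expected CSV layout; the file name data.csv and the 0/1 label encoding are assumptions for illustration:

```python
# Load the input CSV; column names are from this README.
import pandas as pd

df = pd.read_csv("data.csv")  # hypothetical path

# 'comments' holds the raw text, 'contains_slash_s' the sarcasm label
# (the name suggests the Reddit "/s" sarcasm marker).
texts = df["comments"].astype(str).tolist()
labels = df["contains_slash_s"].astype(int).tolist()

print(f"{len(texts)} comments, {sum(labels)} labeled sarcastic")
```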
- Base Model: BERT-base-uncased
- Classification Head:
- Input: 768-dimensional BERT embeddings
- Output: 2 classes (sarcastic/non-sarcastic)
- Training Parameters:
- Batch size: 16
- Learning rate: 1e-5
- Epochs: 3
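The following is a minimal training-loop sketch that plugs the hyperparameters above into standard PyTorch and transformers APIs; the toy data, the max_length of 128, and the variable names are assumptions, and the repository's actual training code may differ:

```python
# Training-loop sketch using the hyperparameters listed above.
import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import BertTokenizer, BertForSequenceClassification

train_texts = ["Oh great, another Monday.", "The weather is nice today."]  # replace with real data
train_labels = [1, 0]

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2).to(device)

enc = tokenizer(train_texts, truncation=True, padding=True, max_length=128, return_tensors="pt")
dataset = TensorDataset(enc["input_ids"], enc["attention_mask"], torch.tensor(train_labels))
loader = DataLoader(dataset, batch_size=16, shuffle=True)       # batch size: 16

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)      # learning rate: 1e-5

model.train()
for epoch in range(3):                                          # epochs: 3
    for input_ids, attention_mask, labels in loader:
        optimizer.zero_grad()
        outputs = model(input_ids=input_ids.to(device),
                        attention_mask=attention_mask.to(device),
                        labels=labels.to(device))
        outputs.loss.backward()   # cross-entropy loss over the 2 classes
        optimizer.step()
```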
The model is evaluated using:
- Accuracy
- Precision
- Recall
- F1-score
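These metrics can be computed with scikit-learn; the sketch below uses toy labels and predictions purely for illustration:

```python
# Compute the four metrics listed above on stand-in labels/predictions.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0]   # toy ground-truth labels
y_pred = [1, 0, 0, 1, 0]   # toy model predictions

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1-score :", f1_score(y_true, y_pred))
```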
- Try different BERT variants (see the sketch below)
- Experiment with different hyperparameters
- Add more preprocessing steps
- Implement cross-validation
- Add data augmentation
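For the first item, switching to another pretrained variant is largely a configuration change when using the Auto* classes from transformers; the model names below are examples, not models this project currently uses:

```python
# Swap in a different pretrained encoder with the Auto* classes.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

variant = "roberta-base"   # e.g. "distilbert-base-uncased", "bert-large-uncased", ...
tokenizer = AutoTokenizer.from_pretrained(variant)
model = AutoModelForSequenceClassification.from_pretrained(variant, num_labels=2)
```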
This project is licensed under the MIT License.