This project aims to develop a real-time sentiment analysis system that processes Twitter data and visualizes the results through interactive dashboards and geographic mapping.
The system automatically analyzes tweets about specific topics, classifies each user's emotional response as positive, negative, or neutral, and visualizes the results through interactive charts and world maps. The project focuses on public sentiment around major events such as the Ukraine-Russia conflict and around economic indicators.
- Extract large-scale data from Twitter using APIs
- Process and analyze sentiment using data mining techniques
- Create meaningful visualizations and real-time analytics
- Provide geographic sentiment mapping capabilities
- Python: Main programming language for data processing and sentiment analysis
- Twitter API: Data extraction via Tweepy library
- TextBlob: Natural language processing and sentiment analysis
- Elasticsearch: NoSQL database for storing and indexing tweet data
- Kibana: Real-time data visualization and dashboard creation
- AWS EC2: Cloud infrastructure for hosting the application
- tweepy: Twitter API integration
- textblob: Sentiment analysis
- elasticsearch-py: Database operations
- geopy: Geographic coordinate conversion
- pandas: Data manipulation and preprocessing
- re: Text preprocessing with regular expressions
- Data Collection: Real-time tweet streaming using Twitter API
- Data Preprocessing: Text cleaning and standardization
- Sentiment Analysis: Classification using TextBlob polarity scores
- Geographic Processing: Location extraction and coordinate conversion
- Data Storage: Indexing in Elasticsearch with proper mapping
- Visualization: Interactive dashboards and maps via Kibana
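As an illustration of the preprocessing step, here is a minimal sketch of tweet cleaning with Python's `re` module. The function name `clean_tweet` and the specific rules (dropping the retweet prefix, URLs, mentions, and the `#` symbol) are assumptions for illustration; the exact cleaning rules used in the project are not listed here.

```python
import re

# Patterns are compiled once because the function runs on every tweet.
RETWEET_RE = re.compile(r"^RT\s+@\w+:?\s*")
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
WHITESPACE_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Return tweet text with retweet prefix, URLs, and mentions removed."""
    text = RETWEET_RE.sub("", text)
    text = URL_RE.sub("", text)
    text = MENTION_RE.sub("", text)
    text = text.replace("#", "")  # keep the hashtag word, drop the symbol
    return WHITESPACE_RE.sub(" ", text).strip()
```

Cleaning before scoring matters because URLs and user handles carry no sentiment and would otherwise add noise to the text passed to the classifier.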
- Tweet volume by location
- Sentiment distribution (positive/negative/neutral)
- Geographic sentiment mapping
- Time-series analysis of sentiment trends
- Top locations by follower count
- Message intensity distribution
- World map visualization of sentiment data
- Country and region-based sentiment analysis
- Real-time geographic sentiment tracking
The system was tested by analyzing sentiment around the Ukraine-Russia conflict across different time periods:
- May 3-5, 2022: 84,114 tweets analyzed
- May 10-12, 2022: 32,000 tweets analyzed
- June 17, 2022: 8,951 tweets analyzed
- Limited Turkish engagement on Ukraine topics
- High negative sentiment in UK and Spain
- Significant tweet volume across European locations
- Notable activity concentration in specific African regions (Togo)
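import tweepy

# Note: tweepy.StreamListener is part of the tweepy 3.x API.
class TweetStreamListener(tweepy.StreamListener):
    def on_data(self, data):
        # Process tweet data
        # Perform sentiment analysis
        # Extract geographic coordinates
        # Store in Elasticsearch
        return True  # returning True keeps the stream connected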
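To make the four steps in the listener stub more concrete, the sketch below shows one way a raw `on_data` payload could be turned into a document ready for indexing. The helper name `build_document`, the output field names, and the injected `polarity_of` and `geocode` callables are all hypothetical. The input keys (`text`, `created_at`, `user.location`, `user.followers_count`, `user.screen_name`) follow the Twitter v1.1 streaming payload.

```python
import json
from typing import Callable, Optional, Tuple

Coordinates = Tuple[float, float]  # (latitude, longitude)


def build_document(
    raw: str,
    polarity_of: Callable[[str], float],
    geocode: Callable[[str], Optional[Coordinates]],
) -> Optional[dict]:
    """Turn one raw streaming payload into a dict ready for indexing.

    Returns None for payloads that are not tweets (for example, delete
    notices), which carry no "text" field.
    """
    tweet = json.loads(raw)
    if "text" not in tweet:
        return None

    user = tweet.get("user") or {}
    place = user.get("location")  # free text typed by the user; may be None
    coords = geocode(place) if place else None

    return {
        "message": tweet["text"],
        "created_at": tweet.get("created_at"),
        "author": user.get("screen_name"),
        "followers": user.get("followers_count", 0),
        "polarity": polarity_of(tweet["text"]),
        # Elasticsearch geo_point fields accept {"lat": ..., "lon": ...}.
        "location": {"lat": coords[0], "lon": coords[1]} if coords else None,
    }
```

Passing the scoring and geocoding functions in as arguments keeps this step easy to test without network access or API credentials.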
- Positive: Polarity score > 0
- Negative: Polarity score < 0
- Neutral: Polarity score = 0
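The three rules above can be sketched as a small function. The names `classify_polarity` and `tweet_sentiment` are hypothetical; the only library call used is TextBlob's `sentiment.polarity`, which returns a float between -1.0 and 1.0.

```python
def classify_polarity(polarity: float) -> str:
    """Map a polarity score in [-1.0, 1.0] to a sentiment label."""
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"


def tweet_sentiment(text: str) -> str:
    """Score text with TextBlob and return its sentiment label."""
    # Imported lazily so classify_polarity stays usable without TextBlob.
    from textblob import TextBlob

    return classify_polarity(TextBlob(text).sentiment.polarity)
```

Because the neutral class is an exact match on zero, even a very weak score such as 0.01 counts as positive under these rules.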
- Location text extraction from user profiles
- Coordinate conversion using Nominatim API
- Geo-point mapping for Kibana visualization
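A minimal sketch of these three steps follows, assuming geopy's `Nominatim` geocoder. The helper names (`make_geocoder`, `to_geo_point`), the `TWEET_MAPPINGS` fragment, and the `location` field name are hypothetical and not taken from the project code.

```python
from typing import Callable, Optional

GeoLookup = Callable[[str], Optional[dict]]


def make_geocoder(user_agent: str) -> GeoLookup:
    """Build a lookup function backed by geopy's Nominatim geocoder."""
    # Imported lazily so to_geo_point can be exercised without geopy.
    from geopy.geocoders import Nominatim

    geolocator = Nominatim(user_agent=user_agent)

    def lookup(place: str) -> Optional[dict]:
        location = geolocator.geocode(place)  # None when nothing matches
        if location is None:
            return None
        return {"lat": location.latitude, "lon": location.longitude}

    return lookup


def to_geo_point(place: Optional[str], lookup: GeoLookup) -> Optional[dict]:
    """Return a geo_point dict for a profile location, or None if unusable."""
    if not place or not place.strip():
        return None
    return lookup(place.strip())


# Hypothetical mapping fragment: the field has to be declared as geo_point
# in the index mapping for Kibana maps to treat it as a coordinate.
TWEET_MAPPINGS = {"properties": {"location": {"type": "geo_point"}}}
```

Profile locations are free text, so many lookups will return nothing; those tweets can still be stored but will not appear on the map. The public Nominatim service also limits request rates, so a cache or geopy's `RateLimiter` helper is likely needed for streaming volumes.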
- Python 3.7+
- Twitter Developer Account
- Elasticsearch Cloud instance
- AWS EC2 instance (optional)
Create a .env file with the following variables:
# Twitter API
api_key=your_twitter_api_key
api_key_secret=your_twitter_api_secret
access_token=your_access_token
access_token_secret=your_access_token_secret
# Elasticsearch
cloud_id=your_elasticsearch_cloud_id
user=your_elasticsearch_username
password=your_elasticsearch_password
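One way to read these values at startup is sketched below using only the standard library. The function names `load_env_file` and `export_to_environment` are hypothetical; the third-party python-dotenv package offers similar behavior, but it is not among the dependencies listed for this project.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict:
    """Parse KEY=value lines from a .env file into a dict.

    Blank lines, comment lines starting with '#', and lines without
    an '=' sign are skipped.
    """
    values = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def export_to_environment(values: dict) -> None:
    """Copy parsed values into os.environ without overriding existing ones."""
    for key, value in values.items():
        os.environ.setdefault(key, value)
```

Keeping credentials in a file that is excluded from version control means the API keys never appear in the source code itself.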
pip install tweepy textblob elasticsearch geopy pandas
This thesis, conducted by Engin Karataş at Yozgat Bozok University, focused on implementing sentiment analysis and visualization of Twitter data using modern data science technologies including Python, Elasticsearch, Kibana, and AWS cloud services.
- Successfully implemented real-time Twitter data collection using Twitter API and Tweepy library
- Developed automated data preprocessing pipeline using Python and regular expressions
- Achieved processing of up to 2 million tweets monthly (Twitter hobby account limit)
- Utilized TextBlob library for sentiment classification
- Classified tweets into three categories: positive, negative, and neutral
- Implemented polarity scoring system ranging from -1 (negative) to +1 (positive)
- Successfully integrated geolocation data using Geopy library
- Converted location names to latitude/longitude coordinates
- Enabled geographic visualization of sentiment patterns
- Implemented Elasticsearch as a NoSQL database for efficient data storage and retrieval
- Created real-time dashboards using Kibana for data visualization
- Developed multiple visualization types including bar charts, pie charts, and world maps
Data Analyzed: Multiple time periods (May 3-5, May 10-12, and June 17, 2022)
Total Records: Over 125,000 tweets analyzed
Key Findings (Ukraine-Russia conflict analysis):
- Limited Turkish participation in Ukraine war discussions on Twitter
- High negative sentiment concentration in UK and Spain
- Significant tweet volume across European regions, particularly in UK
- Notable activity in African countries, especially Togo
Key Findings (Turkish Lira exchange rate analysis):
- High concentration of tweets from Turkey (as expected for Turkish-language hashtag)
- Predominantly negative sentiment regarding exchange rate discussions
- Clear correlation between economic concerns and negative sentiment
- Developed end-to-end pipeline from data collection to visualization
- Implemented stream processing for live sentiment analysis
- Created automated data quality checks and preprocessing
- Successfully mapped sentiment data to world coordinates
- Enabled regional sentiment analysis capabilities
- Provided location-based filtering and analysis tools
- Utilized cloud infrastructure (AWS EC2) for processing
- Implemented distributed data storage with Elasticsearch
- Created scalable visualization platform with Kibana
- Demonstrated effectiveness for real-time public opinion tracking
- Provided tools for crisis communication monitoring
- Enabled geographic sentiment analysis for global events
- Integrated multiple technologies (Python, ELK Stack, AWS) effectively
- Created open-source solution for academic research
- Developed interactive dashboards for real-time sentiment monitoring
- Created geographic heat maps for sentiment distribution
- Implemented filtering capabilities for temporal and spatial analysis
- Volume: Up to 2 million tweets per month
- Languages: Multi-language support with translation capabilities
- Real-time Processing: Near real-time sentiment classification
- Storage: NoSQL database with full-text search capabilities
- Location-based record counts
- Total tweet statistics
- Average sentiment analysis values
- Positive/negative tweet ratios
- Country and location-based sentiment analysis
- Intensity-based location, author, and message tables
- Distribution graphs of intensity values
- Top locations by follower count
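In the dashboards these figures come from Kibana aggregations. As a rough Python equivalent for sanity-checking a dashboard, the sketch below computes the total, the class counts, the average polarity, and the positive/negative ratio from a list of polarity scores. The function name `summarize_polarity` and the output keys are hypothetical.

```python
from statistics import mean
from typing import Iterable


def summarize_polarity(scores: Iterable[float]) -> dict:
    """Summarize polarity scores into counts, an average, and a ratio."""
    scores = list(scores)
    positive = sum(1 for s in scores if s > 0)
    negative = sum(1 for s in scores if s < 0)
    neutral = len(scores) - positive - negative
    return {
        "total": len(scores),
        "positive": positive,
        "negative": negative,
        "neutral": neutral,
        # Both values are undefined for empty or one-sided input.
        "average_polarity": mean(scores) if scores else None,
        "positive_to_negative": positive / negative if negative else None,
    }
```

For example, the scores `[0.5, -0.5, 0.0, 1.0]` give two positive tweets, one negative, one neutral, an average polarity of 0.25, and a positive/negative ratio of 2.0.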
- Restricted to Twitter hobby account limits (2M tweets/month)
- Dependent on user-provided location data accuracy
- Limited to publicly available tweets
- Sentiment analysis accuracy dependent on TextBlob algorithm
- Geographic mapping limited to location data availability
- Real-time processing constrained by API rate limits
- Integration of machine learning models for improved sentiment accuracy
- Implementation of emotion detection beyond positive/negative classification
- Development of trend prediction capabilities
- Integration with other social media platforms
- Incorporation of news media sentiment analysis
- Cross-platform sentiment correlation studies
- Development of predictive sentiment models
- Implementation of anomaly detection for unusual sentiment patterns
- Creation of automated alert systems for significant sentiment changes
This research successfully demonstrates the feasibility and effectiveness of implementing a comprehensive Twitter sentiment analysis system using modern data science technologies. The project provides valuable insights into public opinion patterns during significant global events and establishes a reusable framework for future social media sentiment analysis research.
The integration of real-time data processing, sentiment analysis, geographic mapping, and interactive visualization creates a powerful tool for understanding public sentiment at scale. The findings from the Ukraine-Russia war and Turkish Lira exchange rate analyses demonstrate the system's capability to provide meaningful insights into public opinion dynamics across different geographic regions and topics.
The open-source nature of the implementation and detailed documentation provided in this thesis contribute to the academic community's resources for social media research and sentiment analysis methodologies.
- Environment variable protection for API keys
- Secure cloud deployment on AWS
- SSH protocol for secure server access
- API rate limiting compliance
- Set up your environment variables
- Configure Twitter API credentials
- Set up Elasticsearch instance
- Run the main Python script to start streaming tweets
- Access Kibana dashboard for real-time visualization
This project serves as a foundation for researchers and developers interested in social media sentiment analysis. Contributions and improvements are welcome.
This project was developed for academic purposes as part of a graduation thesis.
This project was developed as a graduation thesis at Yozgat Bozok University, Computer Engineering Department, supervised by Assoc. Prof. Dr. Mehmet Bakır.
Author: Engin Karataş (Student ID: 16008118040)
Year: 2022
*This system provides a foundation for researchers, data scientists, and organizations to conduct objective sentiment analysis on social media data with geographic context and real-time visualization capabilities.*