
# FYP-SDIP: Stream Data Ingestion and Processing

## Overview

Welcome to the FYP-SDIP repository, a Final Year Project focused on Stream Data Ingestion and Processing (SDIP). This project demonstrates the design and implementation of real-time data pipelines for the efficient ingestion, processing, and analysis of streaming data. Built with scalable technologies, it aims to optimize throughput and ensure data integrity for high-velocity data streams.

The project leverages Apache Kafka for stream ingestion and Apache Spark for processing, with additional tools for data pipeline orchestration and monitoring. It serves as a proof-of-concept for real-time data workflows, applicable to use cases such as IoT, financial transactions, or log analytics.

## Features

- **Real-Time Data Ingestion**: Seamlessly ingests high-velocity data streams using Apache Kafka.
- **Scalable Processing**: Processes streaming data with Apache Spark, enabling complex transformations and analytics.
- **Optimized Throughput**: Ensures low-latency, high-throughput data pipelines.
- **Data Integrity**: Implements robust mechanisms to maintain data consistency and reliability.
- **Extensible Architecture**: Modular design for easy integration with additional data sources or processing logic.

## Technologies Used

- **Apache Kafka**: Distributed streaming and message brokering.
- **Apache Spark**: Stream processing and data analytics.
- **Scala**: Primary programming language for the processing logic.
- **Python**: Scripting and auxiliary tools.

## Prerequisites

To run this project locally, ensure you have the following installed:

- Java JDK 8 or higher
- Scala 2.12.x
- Apache Kafka 2.8.x
- Apache Spark 3.2.x
- Python 3.8+
- Maven 3.6.x

## Installation

**1. Clone the repository:**

```bash
git clone https://github.com/dhayeah/FYP-SDIP.git
cd FYP-SDIP
```

**2. Set up Kafka and Zookeeper:** Start a local Zookeeper instance and a Kafka broker before running the pipeline (with a stock Kafka 2.8.x distribution: `bin/zookeeper-server-start.sh config/zookeeper.properties`, then `bin/kafka-server-start.sh config/server.properties`).

**3. Start the Kafka producer:** Run the producer script to simulate streaming data:

```bash
python scripts/producer.py
```

**4. Run the Spark application:** Submit the Spark job to process the streaming data:

```bash
spark-submit --class com.dhayeah.fyp.StreamProcessor target/fyp-sdip-1.0.jar
```
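The contents of `scripts/producer.py` are not reproduced in this README. As a minimal sketch of what a simulated-data producer for step 3 might look like, assuming the `kafka-python` client library and a hypothetical topic name `sdip-events` (neither is confirmed by the repository):

```python
import json
import random
import time


def make_event(rng):
    """Build one simulated sensor reading as a JSON-serializable dict."""
    return {
        "sensor_id": rng.randrange(10),
        "temperature": round(rng.uniform(15.0, 35.0), 2),
        "ts": time.time(),
    }


def encode_event(event):
    """Serialize an event to the UTF-8 JSON bytes that would go to Kafka."""
    return json.dumps(event).encode("utf-8")


def run(bootstrap="localhost:9092", topic="sdip-events"):
    """Send simulated events forever. Requires a running broker and
    `pip install kafka-python`; both the client and topic name are
    assumptions, not taken from the repository."""
    from kafka import KafkaProducer

    producer = KafkaProducer(bootstrap_servers=bootstrap)
    rng = random.Random()
    while True:
        producer.send(topic, encode_event(make_event(rng)))
        time.sleep(0.1)
```

Keeping event construction (`make_event`) separate from the network send makes the simulated payloads easy to unit-test without a broker.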
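The Scala source of `com.dhayeah.fyp.StreamProcessor` is likewise not shown here. Purely as an illustration of the kind of per-key aggregation such a job might perform on the ingested stream, here is a plain-Python sketch that decodes a batch of JSON event bytes and averages a `temperature` field per `sensor_id` (the field names are assumptions carried over from the producer sketch above, not taken from the repository):

```python
import json
from collections import defaultdict


def average_by_sensor(raw_messages):
    """Decode JSON event bytes and compute the mean temperature per
    sensor_id. Illustrative only: the real processing logic lives in the
    Scala class com.dhayeah.fyp.StreamProcessor, whose source is not
    shown in this README."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for raw in raw_messages:
        event = json.loads(raw)
        sums[event["sensor_id"]] += event["temperature"]
        counts[event["sensor_id"]] += 1
    return {sid: sums[sid] / counts[sid] for sid in sums}
```

In the actual Spark job this kind of aggregation would typically be expressed over a micro-batch or window rather than a plain list.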
