This project builds a real-time data engineering pipeline for stock market data using Apache Kafka, Python, and several AWS services. The goal is to demonstrate an end-to-end implementation that collects, processes, stores, and queries stock market data in real time, offering insight into how such pipelines are executed and scaled. The stack consists of:
- Python: Primary programming language for scripting and automation.
- Apache Kafka: Used for building real-time data streaming pipelines.
- Amazon Web Services (AWS):
  - S3 (Simple Storage Service): Object storage for the processed data.
  - EC2 (Elastic Compute Cloud): Hosts the Kafka broker and related processes.
  - Glue: Serverless ETL service for cataloging and processing data.
  - Glue Crawler: Crawls the data in S3, infers its schema, and registers tables in the Glue Catalog.
  - Glue Catalog: Central metadata store for data assets.
  - Athena: Serverless query service for analyzing data directly in S3 with standard SQL.
This project is structured to work with any stock market dataset. The dataset used here contains processed stock indices and is available on GitHub:
Processed Stock Indices Dataset
The pipeline covers four stages:
- Data Collection: Simulate real-time stock market data by replaying records from the dataset into Kafka (see the producer sketch after this list); the Boto3 SDK handles the AWS side of the pipeline downstream.
- Data Streaming: Implement Kafka producers and consumers to handle the flow of data in real time (a consumer sketch follows the list).
- Data Storage and Management: Use AWS S3 for storing the consumed data, with Glue Crawler and the Glue Catalog managing cataloging and schema evolution (a crawler sketch follows the list).
- Data Querying: Leverage AWS Athena for efficient SQL queries over the large-scale data stored in S3 (a query sketch follows the list).
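To make the collection stage concrete, here is a minimal producer sketch. It assumes the dataset has been exported to a local CSV file (indexProcessed.csv is a placeholder name), a topic called stock_market, and the kafka-python client; the broker address must be replaced with your EC2 instance's address.

```python
import json
import time

import pandas as pd
from kafka import KafkaProducer  # pip install kafka-python

# Placeholder broker address: replace with the EC2 instance's public DNS.
producer = KafkaProducer(
    bootstrap_servers=["<ec2-public-dns>:9092"],
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

df = pd.read_csv("indexProcessed.csv")  # placeholder export of the dataset

while True:
    # Sample a random row to mimic a live tick, then pause briefly
    # so the stream arrives at a realistic pace.
    record = df.sample(1).to_dict(orient="records")[0]
    producer.send("stock_market", value=record)
    time.sleep(1)
```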
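On the receiving side, a consumer can drain the topic and persist each record to S3 with Boto3. The bucket name and key layout below are assumptions; credentials are expected to come from the environment or an attached IAM role.

```python
import json

import boto3
from kafka import KafkaConsumer  # pip install kafka-python

consumer = KafkaConsumer(
    "stock_market",
    bootstrap_servers=["<ec2-public-dns>:9092"],
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

s3 = boto3.client("s3")

for count, message in enumerate(consumer):
    # Write one JSON object per key; a flat layout like this is easy
    # for Glue Crawler to catalog.
    s3.put_object(
        Bucket="stock-market-demo-bucket",  # hypothetical bucket name
        Key=f"stock_market/stock_{count}.json",
        Body=json.dumps(message.value),
    )
```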
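Once data lands in S3, a Glue Crawler keeps the catalog in sync. The crawler itself is created in the Glue console (pointed at the S3 prefix above); the sketch below simply triggers a run and checks its state. The crawler name is hypothetical.

```python
import boto3

glue = boto3.client("glue")

# Hypothetical crawler, assumed to target s3://stock-market-demo-bucket/stock_market/.
glue.start_crawler(Name="stock-market-crawler")

# The crawler cycles READY -> RUNNING -> STOPPING -> READY.
state = glue.get_crawler(Name="stock-market-crawler")["Crawler"]["State"]
print(state)
```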
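With a table registered, queries can be run from the Athena console or programmatically. Here is a sketch under assumed names (database stock_db, table stock_market, and an S3 output location for query results):

```python
import time

import boto3

athena = boto3.client("athena")

# Database, table, and output location below are assumptions.
execution = athena.start_query_execution(
    QueryString="SELECT * FROM stock_market LIMIT 10",
    QueryExecutionContext={"Database": "stock_db"},
    ResultConfiguration={
        "OutputLocation": "s3://stock-market-demo-bucket/athena-results/"
    },
)
query_id = execution["QueryExecutionId"]

# Poll until the query reaches a terminal state.
while True:
    status = athena.get_query_execution(QueryExecutionId=query_id)
    state = status["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    result = athena.get_query_results(QueryExecutionId=query_id)
    for row in result["ResultSet"]["Rows"]:
        print([col.get("VarCharValue") for col in row["Data"]])
```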
Putting the pieces together:
- Set up the AWS services and configure Kafka on an EC2 instance.
- Ensure all services are interconnected and that data flows correctly across the different AWS components.
- Deploy the Python scripts to run the data production and consumption processes.
- Monitor system performance regularly and adjust configurations as needed to handle varying data volumes and velocities.