This repository contains a real-time analytics pipeline that processes Ethereum blockchain transactions. The pipeline reads transactions from an Ethereum node, publishes them to a Kafka topic, processes and transforms them with Spark Streaming, stores the results in ClickHouse, and visualizes the data with Superset.
- ClickHouse
- Superset
- Spark Streaming
- Kafka
- Data Ingestion: A Python script reads Ethereum blockchain transactions and writes them to a Kafka topic named `eth_transactions`.
- Data Processing: A Spark Streaming job reads from the `eth_transactions` topic, converts the `gasPrice`, `gas`, and `value` columns from wei to ETH, and writes the transformed data to a new Kafka topic named `transformed`.
- Data Storage: A table is created in ClickHouse to read from the transaction queue, and a materialized view queries that queue and writes into a table named `transaction`.
- Data Visualization: Superset connects to the ClickHouse instance and reads the `transaction` table for visualization and analysis.
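The per-column wei-to-ETH conversion in the processing step can be sketched in plain Python; the real job applies the same arithmetic inside Spark. The helper names (`wei_to_eth`, `transform_transaction`) and the sample field values below are illustrative, not taken from the repository:

```python
from decimal import Decimal

WEI_PER_ETH = Decimal(10) ** 18  # 1 ETH = 10^18 wei

# Columns the Spark job converts from wei to ETH.
WEI_COLUMNS = ("gasPrice", "gas", "value")

def wei_to_eth(wei: int) -> Decimal:
    """Convert an integer wei amount to ETH."""
    return Decimal(wei) / WEI_PER_ETH

def transform_transaction(tx: dict) -> dict:
    """Return a copy of the transaction with wei columns converted to ETH."""
    out = dict(tx)
    for col in WEI_COLUMNS:
        if col in out:
            out[col] = float(wei_to_eth(int(out[col])))
    return out

# Illustrative transaction record:
sample = {
    "hash": "0xabc",
    "value": 1_500_000_000_000_000_000,  # 1.5 ETH in wei
    "gasPrice": 20_000_000_000,
    "gas": 21_000,
}
print(transform_transaction(sample))
```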
Clone the repository

```
git clone <repository_url>
cd <repository_directory>
```
Set up environment variables

- Create a `.env` file with the Ethereum node endpoint.
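A minimal way to read the endpoint from `.env` without extra dependencies is a small key-value parser like the one below. The variable name `ETH_NODE_ENDPOINT` is an assumption (match whatever `read_ethereum_data.py` expects); many projects use `python-dotenv` for this instead:

```python
from pathlib import Path

def load_env(path: str = ".env") -> dict:
    """Minimal .env parser: KEY=VALUE lines; blanks and '#' comments are ignored."""
    env = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

# Usage (the variable name is hypothetical):
# endpoint = load_env()["ETH_NODE_ENDPOINT"]
```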
Start the services

Check the references below for more details on setting up all the necessary services.

```
docker-compose up -d
```
Run the Ethereum data ingestion script

Before running the script, create a file named `last_processed_block.txt` in your working directory. This file tracks the last processed block so ingestion can resume in case of failure.

```
python read_ethereum_data.py
```
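The `last_processed_block.txt` checkpoint amounts to a pair of helpers like the following. This is a sketch of the resume-on-failure idea; the actual logic inside `read_ethereum_data.py` may differ:

```python
from pathlib import Path

CHECKPOINT_FILE = Path("last_processed_block.txt")

def read_last_block(default: int = 0) -> int:
    """Return the last processed block number, or `default` if the file is empty or missing."""
    try:
        text = CHECKPOINT_FILE.read_text().strip()
        return int(text) if text else default
    except FileNotFoundError:
        return default

def write_last_block(block_number: int) -> None:
    """Record the most recently processed block so a restart can resume from it."""
    CHECKPOINT_FILE.write_text(str(block_number))
```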
Run the Spark subscriber script

- Create a Kafka topic called `transform` to write the transformed transactions to.
- Create an empty folder for the Spark checkpoint.

```
kafka-topics --list --bootstrap-server localhost:9092
kafka-topics --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic transform
spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.1,org.apache.spark:spark-streaming-kafka-0-10_2.12:3.5.1 spark_subscriber.py
```
Access Superset

Open your browser and go to http://localhost:8088, then log in with username `admin` and password `admin`. Connect to ClickHouse and create dashboards and charts based on the `transaction` table.
References

- https://www.baeldung.com/ops/kafka-docker-setup
- https://medium.com/@tim_lovenic/clickhouse-kafka-c3ffd54b459d
- https://medium.com/towards-data-engineering/quick-setup-configure-superset-with-docker-a5cca3992b28
- https://github.com/ClickHouse/examples/blob/main/docker-compose-recipes/README.md