Lightweight ETL to store (primarily) challenge receipt data.
- Follow these instructions to run the latest version of `blockchain-node`.
  - NOTE: You may want to make changes to the node configuration in `config/sys.config` (an example excerpt follows this list), namely:
    - Change the `store_json` parameter from `false` to `true`.
    - Change `fetch_latest_from_snap_source` from `true` to `false`. You can then put a valid snapshot height in `blessed_snapshot_height` if you would like your node to load some historical data. However, bear in mind that the further back you go, the more difficult it is to find a peer with the snap, and the longer it will take to sync.
- In terms of hardware recommendations, I run the node, database, and this ETL on a c5ad.2xlarge with some room to spare. Your RAM requirements will largely depend on how you index the database (see Database Recommendations).
- Make sure that you have a valid postgres database to connect to (a minimal SQL sketch also follows this list).
- Clone this repository and `cd` into the main directory.
- Make a copy of `.env.template`, call it `.env`, and edit the environment variables with your settings.
- Install dependencies with `pip install -r requirements.txt`.
- Run the migrations to create the necessary tables: `python etl.py --migrate`
- Start the block follower: `python etl.py --start`
After backfilling all blocks stored on the node, the service will listen for new blocks and process them as they come in.
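For reference, here is a minimal sketch of the `config/sys.config` edits described above. The enclosing `blockchain` app key and the snapshot height shown are illustrative assumptions based on blockchain-node's config layout; the real file contains many other settings, so edit the existing entries rather than replacing the file.

```erlang
%% Excerpt only -- a sketch, not a complete sys.config. The enclosing
%% `blockchain` app key and the height value are illustrative assumptions.
[
  {blockchain, [
    {store_json, true},                     %% was false: emit JSON for the ETL to consume
    {fetch_latest_from_snap_source, false}, %% was true: don't auto-load the newest snapshot
    {blessed_snapshot_height, 1234567}      %% optional: a valid historical snapshot height
  ]}
].
```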
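And if you still need to create the postgres database from the setup list, a minimal sketch (run as a superuser; the database, user, and password below are placeholders to be matched against your `.env` values, not names the ETL expects):

```sql
-- Placeholder names: align these with the connection settings in your .env
CREATE DATABASE helium_etl;
CREATE USER etl_user WITH PASSWORD 'change-me';
GRANT ALL PRIVILEGES ON DATABASE helium_etl TO etl_user;
```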
- I recommend installing the `postgis` and `h3` extensions on your postgres instance for additional functionality, such as distance calculations and kRing-based queries.
- Because the ETL deletes rows older than the block depth set by the `BLOCK_INVENTORY_SIZE` environment variable, certain tables (namely `challenge_receipts_parsed`) can get bogged down with a large number of dead tuples. I strongly recommend implementing a regular `VACUUM` operation to alleviate this issue. You can schedule the jobs using a tool like `pg_cron`.
- Within `challenge_receipts_parsed`, put indexes on the `block`, `transmitter_address`, and `witness_address` columns for better performance on common queries. (A combined SQL sketch of these recommendations follows this list.)
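Here is a combined SQL sketch of the three recommendations above. The index names are placeholders I chose; the extensions (PostGIS, h3-pg, pg_cron) must already be installed on the server, and pg_cron additionally needs to be listed in `shared_preload_libraries`:

```sql
-- Optional extensions: PostGIS for distance math, h3-pg for kRing-style cell queries
CREATE EXTENSION IF NOT EXISTS postgis;
CREATE EXTENSION IF NOT EXISTS h3;

-- Indexes on the columns that common queries filter on (index names are placeholders)
CREATE INDEX IF NOT EXISTS crp_block_idx ON challenge_receipts_parsed (block);
CREATE INDEX IF NOT EXISTS crp_transmitter_idx ON challenge_receipts_parsed (transmitter_address);
CREATE INDEX IF NOT EXISTS crp_witness_idx ON challenge_receipts_parsed (witness_address);

-- Nightly VACUUM at 03:00 to clear dead tuples left behind by block pruning
CREATE EXTENSION IF NOT EXISTS pg_cron;
SELECT cron.schedule('0 3 * * *', 'VACUUM ANALYZE challenge_receipts_parsed');
```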
If you need deep historical records of the Helium ledger, use `blockchain-etl` or the public API. This tool is best suited for short-term analyses (~5-10 days) of recent chain events.
At this point, the service populates the following tables:
- `challenge_receipts_parsed`
- `payments_parsed`
- `gateway_inventory` (refreshed daily via a bulk download from DeWi ETL Data Dumps)
- `denylist`

See `migrations.py` for the SQLAlchemy schema definitions.
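As a quick sanity check once the follower has been running, you can query `challenge_receipts_parsed` directly. A sketch using only the columns named in the Database Recommendations above:

```sql
-- Top 10 transmitters by receipt count over the most recent 1,000 blocks on hand
SELECT transmitter_address, COUNT(*) AS receipts
FROM challenge_receipts_parsed
WHERE block > (SELECT MAX(block) - 1000 FROM challenge_receipts_parsed)
GROUP BY transmitter_address
ORDER BY receipts DESC
LIMIT 10;
```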
Build the image with:

```
make docker-build
```

Then start it with:

```
make docker-start
```

To learn more: `make help`
Navigate to your copy of the `helium-transaction-etl` repository.

```
cd /path/to/helium-transaction-etl
```

Stop the container and remove it.

```
make docker-clean
```

Update the repository.

```
git pull
```

Build the image.

```
make docker-build
```

Start the updated Docker container.

```
make docker-start
```

See the logs.

```
docker logs transaction-etl
```
- h3-countries: Postgres-based mapping of h3 indices to country codes.
- dewi-alliance/hplans: Helium frequency plan regions in GeoJSON format.
- dewi-alliance/helium-etl-lite: A general-purpose light ETL for the Helium blockchain.