Faust -- Python Stream Processing Library
This library adapts ideas from the Java-based Kafka Streams to Python for stream processing. Kafka is the underlying messaging system for Faust.
Note: Faust has not been updated for more than 2 years now, but there is a fork called faust-streaming that was updated as recently as 4 months ago.
However, I would start practicing with the original Faust library because of its popularity.
The goal is to overcome the limitation that stream processing libraries are not as popular in Python as its machine learning and data mining libraries. One interesting article on this topic is available at medium.com.
Before running the Faust app, Zookeeper and Kafka need to be running. The steps are as follows (if you are using a platform other than Mac, please follow the corresponding steps for your platform):
- First, download Kafka (which also includes Zookeeper). In my case, I downloaded it from apache-kafka-for-mac.
Where to download it to?
- I created a folder named kafka in the root directory (that is, the folder containing this README.md file). I also excluded this folder from version control (that is, added kafka/* to .gitignore).
- Downloaded the binaries of version 3.9.0 (filename: kafka_2.13-3.9.0.tgz) into the kafka folder and extracted them into the same folder.
- Note: Kafka version 4.0.0 is also available for download, but Zookeeper has been removed in that version, as Apache Kafka 4.0 only supports KRaft mode. To read more about it, please refer to the Kafka upgrade notes.
- Set the KAFKA_HOME path (the path of the extracted Kafka files)
export KAFKA_HOME=./kafka/kafka_2.13-3.9.0
- (If not already installed) install OpenJDK from the command line:
brew install openjdk
- For the system Java wrappers to find this JDK, symlink it by executing
sudo ln -sfn /opt/homebrew/opt/openjdk/libexec/openjdk.jdk /Library/Java/JavaVirtualMachines/openjdk.jdk
- If you need to have openjdk first in your PATH, run:
echo 'export PATH="/opt/homebrew/opt/openjdk/bin:$PATH"' >> ~/.zshrc
- For compilers to find openjdk, you may need to set:
export CPPFLAGS="-I/opt/homebrew/opt/openjdk/include"
- Run Zookeeper:
$KAFKA_HOME/bin/zookeeper-server-start.sh $KAFKA_HOME/config/zookeeper.properties
The sample output should look something like this:

- Run Kafka:
$KAFKA_HOME/bin/kafka-server-start.sh $KAFKA_HOME/config/server.properties
The sample output should look something like this:

- Install Python 3.11 (due to compatibility issues with Python 3.13)
brew install [email protected]
- Create a Python virtual environment (in your favorite place) and activate it.
python3.11 -m venv py3_11_venv
source py3_11_venv/bin/activate
- Install the packages and dependencies from requirements.txt from the command line using
pip install -r requirements.txt
- Start the worker (based on the code in hello_world.py).
faust -A hello_world worker -l info
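For reference, hello_world.py could look something like the following minimal sketch. This is an assumption based on the send commands used below (an agent named wish reading from a wishes topic), not necessarily the exact contents of the file in this repo; the broker URL assumes the local Kafka started above.

```python
# hello_world.py -- a minimal Faust app (sketch; the actual file may differ)
import faust

# The first argument is the app id (also used as the Kafka consumer group id);
# broker points at the locally running Kafka instance.
app = faust.App('hello_world', broker='kafka://localhost:9092')

# Messages on this topic are plain strings.
wishes_topic = app.topic('wishes', value_type=str)

@app.agent(wishes_topic)
async def wish(greetings):
    # Process each greeting as it arrives on the 'wishes' topic.
    async for greeting in greetings:
        print(f'Received greeting: {greeting}')
```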
- Now, you can send messages either directly to the wish agent (using @wish) or to the underlying wishes topic:
faust -A hello_world send @wish "Hello Faust"
faust -A hello_world send wishes "Hi Hallo, Faust"
- You can also see the messages sent above being processed in the worker's log output.

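Besides the CLI, messages can also be produced from code. As a sketch (assuming the app and wish agent from the hello_world.py sketch above; this is not necessarily part of the repo's code), a Faust timer can periodically send greetings to the agent:

```python
# A sketch: add this to hello_world.py to produce a greeting every 5 seconds.
# It assumes the `app` and `wish` agent defined in the sketch above.
@app.timer(interval=5.0)
async def produce_greeting():
    # Sending to the agent publishes to its underlying 'wishes' topic,
    # so the running worker processes it just like the CLI-sent messages.
    await wish.send(value='Hello from a timer')
```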
- In case you are wondering where these messages are stored, you can find them under the path configured by log.dirs in server.properties, which defaults to the /tmp/kafka-logs directory. Each topic has its own directory under this logs directory, containing that topic's log and index files.
Some key stream processing terminology:
- Stream: unbounded data that keeps arriving over time.
- Event: the details of some incident packed into a self-contained object.
- Producer: one that produces events, such as sensors or web logs.
- Consumer: one that consumes the events generated by the producer.
- Message broker: a system to which producers write their messages and which then makes those messages available to the corresponding consumers.
- Stream processing: operating on stream data, for example to store it in a database or to visualize it.
- Stream analytics: aggregating a sequence of events, for example counting the number of events in the last minute or some other window of time.
- Tumbling window: a non-overlapping window over the stream (see the sketch after this list).
- Hopping window: an overlapping window over the stream.
- Straggler events: events that arrive late, after the window they belong to has already been processed.
- Zookeeper: a centralized service provided by Apache to keep track of Kafka topics and Kafka cluster nodes; it allows simultaneous reads and writes from multiple clients. In short, it provides synchronization within the distributed system.
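To make the windowing terms concrete, here is a minimal sketch of how a tumbling-window count could be expressed in Faust. The topic, table, and app names here are made up for illustration and are not part of this repo's code:

```python
# window_demo.py -- a hypothetical example of windowed aggregation in Faust
from datetime import timedelta

import faust

app = faust.App('window_demo', broker='kafka://localhost:9092')

# A hypothetical topic of page-view events, keyed by page name.
page_views = app.topic('page_views', key_type=str, value_type=str)

# Counts aggregated over non-overlapping (tumbling) 60-second windows.
# Using .hopping(60.0, 10.0) instead would give overlapping windows
# that advance every 10 seconds.
view_counts = app.Table('view_counts', default=int).tumbling(
    60.0, expires=timedelta(hours=1)
)

@app.agent(page_views)
async def count_views(views):
    # Count events per key (page) within each 60-second window.
    async for page, view in views.items():
        view_counts[page] += 1
        # .current() reads the count for the window this event falls into.
        print(page, view_counts[page].current())
```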

