pantarhei reads arbitrary textual resources and emits them as tokenized chunks. It can thus serve as a provider of "streamified" textual data, e.g. for a Kafka, Flink or Spark cluster.
pantarhei requires JDK 1.8+, git and sbt (simple build tool) to compile the source code.
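To verify that the prerequisites are installed, you can query their versions; depending on your sbt launcher, either sbt --version or sbt sbtVersion reports the sbt version:
java -version
git --version
sbt --version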
The installation itself is as simple as this:
git clone https://github.com/sschuepbach/pantarhei
cd pantarhei
sbt assembly
You should then get a "fat" JAR file in the target/ folder.
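Note that depending on the project's assembly settings, the JAR may land in a Scala-version-specific subfolder of target/; the path pattern and artifact name below are assumptions:
ls target/scala-*/pantarhei-assembly-*.jar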
To run the application, issue the following command (a complete example follows the option list below):
java -jar target/<jar-file> <options>
The following options are available:
-i, --inputs <value> Input files / URLs / directories to process (required). Argument can be used several times.
-t, --token <value> Tokens to be extracted: One of 'sentence', 'line', 'word', 'char'. Defaults to 'sentence'.
-k, --kafka <value> Kafka sink. Value must match <host>:<port>:<groupId>:<topic>. Off by default
-s, --socket <value> Connects to a socket. Value must match <port>. Off by default
-o, --stdout <value> Send output to STDOUT. On by default
-e, --emitFreq <value> Emission frequency in milliseconds. Exact value or range (<ms>-<ms>). Defaults to 1000
-n, --tokenNumber <value> Number of tokens per emission. Exact value or range (<tpe>-<tpe>). Defaults to 1
--help Prints this help text.
- The socket sink is currently defunct.
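As a combined illustration (the JAR file name, input paths, Kafka broker address, group id and topic below are assumptions), the following invocation emits between two and five word tokens every 500 to 1500 milliseconds, both to STDOUT and to a Kafka topic:
java -jar target/pantarhei-assembly-0.1.jar \
  -i data/corpus.txt \
  -i https://example.com/text.txt \
  -t word \
  -n 2-5 \
  -e 500-1500 \
  -k localhost:9092:pantarhei-group:tokens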