Description

This is a command-line tool for enriching the data in any Elasticsearch index. It was originally designed to enrich BODS (Beneficial Ownership Data Standard) v0.4 data, but it is not tied to that format.

It needs one "source" index, containing the data to be enriched, and one or more other indices containing the additional data to be added to the source.

To keep this flexible, a configuration file maps the fields to be enriched, the enrichment data source for each field, and the fields used to match documents in the source index with documents in the enrichment index.

The first use case is adding identifiers to BODS data, using the LEI (Legal Entity Identifier) as the key to match documents. The additional identifiers are sourced from OpenCorporates data.

Relevant documentation:

Enrichment concepts

In Elasticsearch enrichment, match_field, enrich_field, and target_field are the parameters that define how data from an enrichment index is added to incoming documents. Note that Elasticsearch's own documentation calls the index holding the enrichment data the policy's "source index"; in this README, "source index" means the index being enriched.

match_field: the field in the enrichment index used to find matching documents. Elasticsearch compares its value with the corresponding field in the incoming document to decide whether there is a match.
enrich_field: the field in the enrichment index whose value is added to the incoming document.
target_field: the name of the field added to the incoming document. It holds the match field and enrich field values from the matching enrichment document, as specified in the enrich policy.
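
To make the three parameters concrete, here is a minimal in-memory sketch of the matching behaviour described above. It is plain Python, not the tool's implementation and not an Elasticsearch call. The function name, the sample documents, and the field names (`lei`, `oc_id`, `oc`) are hypothetical, and it assumes the incoming document and the enrichment document use the same field name for the match key.

```python
from copy import deepcopy


def enrich_documents(incoming, enrich_docs, match_field, enrich_field, target_field):
    """Return copies of `incoming` with matching enrich data attached."""
    # Index the enrichment data by its match_field value (first document wins).
    lookup = {}
    for doc in enrich_docs:
        key = doc.get(match_field)
        if key is not None and key not in lookup:
            lookup[key] = doc

    enriched = []
    for doc in incoming:
        result = deepcopy(doc)
        match = lookup.get(doc.get(match_field))
        if match is not None and enrich_field in match:
            # target_field holds both the matched key and the enrich value.
            result[target_field] = {
                match_field: match[match_field],
                enrich_field: match[enrich_field],
            }
        enriched.append(result)
    return enriched


# Hypothetical sample data: names, identifiers, and field names are illustrative.
entities = [
    {"name": "Example Ltd", "lei": "LEI-0001"},
    {"name": "Unmatched GmbH", "lei": "LEI-9999"},
]
opencorporates = [{"lei": "LEI-0001", "oc_id": "gb/00000001"}]

enriched = enrich_documents(entities, opencorporates, "lei", "oc_id", "oc")
```

Documents without a match are returned unchanged, and the input lists are not modified.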

Installation

This tool is written in Python, and can be installed using uv:

uv sync --frozen

Alternatively, you may use the bundled shell script:

./bin/install

Usage

The tool can be run using the following command:

uv run enricher

The tool looks for a configuration file named config.yaml in the current directory.

Commands

`config`

This command is a small utility to manage the configuration file.

# get: retrieve a value from the configuration file
uv run python -m enricher config get <key>
# Example:
uv run python -m enricher config get source.index

# set: set a value in the configuration file
uv run python -m enricher config set <key> <value>
# Example:
uv run python -m enricher config set source.index entity

# set (without a value): remove a key from the configuration file
uv run python -m enricher config set <key>
# Example:
uv run python -m enricher config set source.index
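
The keys above use dotted paths such as source.index. As a rough illustration of how dotted keys can address a nested configuration, the sketch below implements get, set, and remove on a plain dictionary. The helper names are hypothetical and this is not necessarily how the tool itself resolves keys; the host value is a placeholder.

```python
def get_key(config, dotted_key, default=None):
    """Walk the nested dict along the dotted path and return the value."""
    node = config
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node


def set_key(config, dotted_key, value):
    """Set a value, creating intermediate sections as needed."""
    *parents, leaf = dotted_key.split(".")
    node = config
    for part in parents:
        child = node.get(part)
        if not isinstance(child, dict):
            # Replace missing or non-dict intermediates with a new section.
            child = node[part] = {}
        node = child
    node[leaf] = value


def remove_key(config, dotted_key):
    """Remove a key; return True if it existed."""
    *parents, leaf = dotted_key.split(".")
    node = config
    for part in parents:
        node = node.get(part)
        if not isinstance(node, dict):
            return False
    if leaf in node:
        del node[leaf]
        return True
    return False


# Hypothetical configuration contents, as a YAML loader might return them.
config = {"source": {"index": "entity", "host": "http://localhost:9200"}}

current_index = get_key(config, "source.index")
set_key(config, "source.index", "entity_v2")
removed = remove_key(config, "source.host")
```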

`enrich`

This command is used to set up and start the enrichment process.

# setup: set up the enrichment processor(s) configured
uv run python -m enricher enrich setup

# start: start the enrichment process.
uv run python -m enricher enrich start
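
For orientation, the sketch below lists the Elasticsearch REST requests that an enrich workflow of this kind typically involves: create an enrich policy, execute it, register an ingest pipeline with an enrich processor, then re-process the source index through that pipeline. Whether the setup and start commands issue exactly these requests is an assumption; the function names, the index name opencorporates, and the field names are hypothetical. The code only builds request descriptions and does not contact a cluster.

```python
def build_policy_body(enrichment):
    """Body for PUT /_enrich/policy/<name> (a "match" enrich policy)."""
    return {
        "match": {
            "indices": enrichment["index"],
            "match_field": enrichment["match_field"],
            "enrich_fields": [enrichment["enrich_field"]],
        }
    }


def build_pipeline_body(enrichment):
    """Body for PUT /_ingest/pipeline/<name> with one enrich processor."""
    return {
        "description": f"Enrich documents using {enrichment['name']}",
        "processors": [
            {
                "enrich": {
                    "policy_name": enrichment["name"],
                    # Assumption: the incoming document uses the same field name.
                    "field": enrichment["match_field"],
                    "target_field": enrichment["target_field"],
                }
            }
        ],
    }


def plan_requests(source_index, enrichment):
    """Return (setup, start) lists of (method, path, body) tuples."""
    name = enrichment["name"]
    setup = [
        ("PUT", f"/_enrich/policy/{name}", build_policy_body(enrichment)),
        ("POST", f"/_enrich/policy/{name}/_execute", None),
        ("PUT", f"/_ingest/pipeline/{name}", build_pipeline_body(enrichment)),
    ]
    start = [("POST", f"/{source_index}/_update_by_query?pipeline={name}", None)]
    return setup, start


# Hypothetical enrichment entry; only add_oc_ids and entity appear in this README.
enrichment = {
    "name": "add_oc_ids",
    "index": "opencorporates",
    "match_field": "lei",
    "enrich_field": "oc_id",
    "target_field": "oc",
}
setup, start = plan_requests("entity", enrichment)
for method, path, _body in setup + start:
    print(method, path)
```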

Configuration

This utility is configured via a YAML file, named config.yaml by default. There are two main sections in the configuration file: source and enrichments. Multiple enrichment policies can be defined in the enrichments section, each with its own set of parameters.

The source section contains the configuration for the source index, which is the index that will be enriched. The enrichments section contains the configuration for the enrichment indices, which are the indices that will be used to enrich the source index.

The configuration file should contain the following keys:

source.index: the name of the Elasticsearch index containing the data to be enriched.
source.host: the full URI of the Elasticsearch instance.
enrichments:
- name: the name of the enrichment policy (e.g. add_oc_ids).
- index: the name of the Elasticsearch index containing the enrichment data.
- match_field: the field in the source index that will be used to match documents.
- enrich_field: the field in the enrichment index that will be added to the source index.
- target_field: the field in the target index that will be used to store the enriched data.
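
As a quick way to check a configuration against the keys listed above, the following sketch validates a configuration that has already been loaded into a Python dictionary (for example by a YAML parser). Treating every listed key as required is an assumption, the function name is hypothetical, and the host, index, and field values are placeholders.

```python
REQUIRED_SOURCE_KEYS = ("index", "host")
REQUIRED_ENRICHMENT_KEYS = (
    "name",
    "index",
    "match_field",
    "enrich_field",
    "target_field",
)


def validate_config(config):
    """Return a list of problems; an empty list means the config looks complete."""
    problems = []

    source = config.get("source")
    if not isinstance(source, dict):
        problems.append("missing section: source")
    else:
        for key in REQUIRED_SOURCE_KEYS:
            if not source.get(key):
                problems.append(f"missing key: source.{key}")

    enrichments = config.get("enrichments")
    if not isinstance(enrichments, list) or not enrichments:
        problems.append("missing section: enrichments")
    else:
        for i, entry in enumerate(enrichments):
            for key in REQUIRED_ENRICHMENT_KEYS:
                if not isinstance(entry, dict) or not entry.get(key):
                    problems.append(f"missing key: enrichments[{i}].{key}")

    return problems


# Hypothetical configuration mirroring the keys described above.
example_config = {
    "source": {"index": "entity", "host": "http://localhost:9200"},
    "enrichments": [
        {
            "name": "add_oc_ids",
            "index": "opencorporates",
            "match_field": "lei",
            "enrich_field": "oc_id",
            "target_field": "oc",
        }
    ],
}

problems = validate_config(example_config)
```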

Docker

This tool can be run using Docker and Docker Compose.

Building:

docker-compose build enricher

Running:

docker-compose run --rm enricher
docker-compose run --rm enricher <command>
# Examples
docker-compose run --rm enricher config get source.host
docker-compose run --rm enricher enrich start

Development

To run the tests, use the following command:

uv run pytest

To run the linter, use the following command:

uv run flake8

To run the formatter, use the following command:

uv run black .

License

This project is licensed under the GNU Affero General Public License v3.0.

openownership/enricher
