Update documentation to include tsbs_load and timestream
1 parent
0cf8054
commit 354f236
Showing
25 changed files
with
282 additions
and
6 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,76 @@ | ||
# TSBS Supplemental Guide: Timestream | ||
|
||
Amazon Timestream is a serverless time series database service. | ||
This supplemental guide explains how the data generated for TSBS is stored, | ||
additional flags available when using the data importer (`tsbs_load load timestream`), | ||
and additional flags available for the query runner (`tsbs_run_queries_timestream`). **This | ||
should be read *after* the main README.** | ||
|
||
## Data format | ||
|
||
Data generated by `tsbs_generate_data` for Timestream is serialized in a | ||
"pseudo-CSV" format, along with a custom header at the beginning. The | ||
header is several lines long: | ||
* one line composed of a comma-separated list of tag labels, with the literal string `tags` as the first value in the list | ||
* one or more lines composed of a comma-separated list of field labels, with the table name as the first value in the list | ||
* a blank line | ||
|
||
An example for the `cpu-only` use case: | ||
```text | ||
tags,hostname,region,datacenter,rack,os,arch,team,service,service_version,service_environment | ||
cpu,usage_user,usage_system,usage_idle,usage_nice,usage_iowait,usage_irq,usage_softirq,usage_steal,usage_guest,usage_guest_nice | ||
``` | ||
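The header layout described above can be checked with a few lines of Bash. This is only a sketch: `describe_header` is a hypothetical helper, not part of TSBS, and it assumes the header arrives on standard input and ends at the first blank line.

```shell
#!/bin/bash
# Hypothetical helper (not part of TSBS): summarise a pseudo-CSV header.
# Reads lines from stdin until the first blank line, which closes the header.
describe_header() {
  local line
  local -a cols
  while IFS= read -r line; do
    # A blank line marks the end of the header.
    [[ -z "$line" ]] && break
    IFS=',' read -r -a cols <<< "$line"
    if [[ "${cols[0]}" == "tags" ]]; then
      # First column is the literal string "tags"; the rest are tag labels.
      echo "tag labels: $(( ${#cols[@]} - 1 ))"
    else
      # First column is the table name; the rest are field labels.
      echo "table ${cols[0]}: $(( ${#cols[@]} - 1 )) field labels"
    fi
  done
}
```

Piping the first lines of a generated file into `describe_header` should print one line for the tag labels and one line per table.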
|
||
Following this, each reading is composed of two rows: | ||
1. a comma-separated list of tag values for the reading, with the literal string `tags` as the first value in the list | ||
1. a comma-separated list of field values for the reading, with the table the reading belongs to being the first value and the timestamp as the second value | ||
|
||
An example for the `cpu-only` use case: | ||
```text | ||
tags,host_0,eu-central-1,eu-central-1b,21,Ubuntu15.10,x86,SF,6,0,test | ||
cpu,1451606400000000000,58.1317132304976170,2.6224297271376256,24.9969495069947882,61.5854484633778867,22.9481393231639395,63.6499207106198313,6.4098777048301052,44.8799140503027445,80.5028770761136201,38.2431182911542820 | ||
``` | ||
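A minimal Bash sketch of how the two-row readings could be consumed follows. `summarize_readings` is a hypothetical helper, not part of TSBS, and it assumes the header and its trailing blank line have already been removed from the input.

```shell
#!/bin/bash
# Hypothetical helper (not part of TSBS): summarise each two-row reading.
# Assumes stdin contains only readings, i.e. the header was already stripped.
summarize_readings() {
  local tags_row fields_row
  local -a tags fields
  # Each reading is a tags row followed by a fields row.
  while IFS= read -r tags_row && IFS= read -r fields_row; do
    IFS=',' read -r -a tags <<< "$tags_row"
    IFS=',' read -r -a fields <<< "$fields_row"
    # tags[0] must be the literal "tags".
    if [[ "${tags[0]}" != "tags" ]]; then
      echo "unexpected row: ${tags_row}" >&2
      return 1
    fi
    # fields[0] is the table name, fields[1] is the timestamp.
    echo "table=${fields[0]} timestamp=${fields[1]} tag_values=$(( ${#tags[@]} - 1 )) field_values=$(( ${#fields[@]} - 2 ))"
  done
}
```

The function prints one summary line per reading and stops with a non-zero status if a tags row does not start with `tags`.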
|
||
--- | ||
|
||
## `tsbs_load load timestream` Additional Flags | ||
|
||
#### loader.db-specific.aws-region (type: `string`, default: `us-east-1`) | ||
|
||
AWS region where the database is located. | ||
|
||
#### loader.db-specific.use-common-attributes (type: `boolean`, default: `true`) | ||
|
||
If true, the Timestream client makes write requests with common attributes. | ||
If false, each value is written as a separate Record, and records are sent 100 per request. | ||
|
||
#### loader.db-specific.hash-property (type: `string`, default `hostname`) | ||
|
||
Dimension to use when hashing points to different workers. | ||
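To illustrate the idea of hashing a dimension value to a worker, here is a small Bash sketch. It is an assumption-laden illustration only: `worker_for_key` is a hypothetical helper, and it uses the POSIX `cksum` CRC rather than whatever hash function the loader actually uses.

```shell
#!/bin/bash
# Hypothetical illustration (not the loader's real hash function):
# map a dimension value, e.g. a hostname, to a worker index in [0, num_workers).
worker_for_key() {
  local key="$1" num_workers="$2"
  local crc
  # cksum prints "<crc> <byte count>"; keep only the CRC.
  crc=$(printf '%s' "$key" | cksum | cut -d ' ' -f 1)
  echo $(( crc % num_workers ))
}
```

Because the index depends only on the key, every point with the same `hostname` is routed to the same worker.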
|
||
#### loader.db-specific.use-current-time (type: `boolean`, default: `false`) | ||
|
||
Use the current local timestamp when creating the records to load. | ||
Useful when you don't want to worry about the retention period versus the simulated period. | ||
|
||
#### loader.db-specific.mag-store-retention-in-days (type: `int`, default: `180`) | ||
|
||
The duration for which data must be stored in the magnetic store. | ||
|
||
#### loader.db-specific.mem-store-retention-in-hours (type: `int`, default: `12`) | ||
|
||
The duration for which data must be stored in the memory store. | ||
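Put together, the properties above would sit under `loader.db-specific` in the YAML config. The sketch below prints such a section with the documented defaults. `print_timestream_db_specific` is a hypothetical helper, and the nesting is inferred from the flag names, so the file produced by `tsbs_load config` may order or format the keys differently.

```shell
#!/bin/bash
# Hypothetical helper: print a `loader.db-specific` section for Timestream
# using the default values documented above. The key nesting is an assumption
# derived from the flag names.
print_timestream_db_specific() {
  cat <<'EOF'
loader:
  db-specific:
    aws-region: us-east-1
    use-common-attributes: true
    hash-property: hostname
    use-current-time: false
    mag-store-retention-in-days: 180
    mem-store-retention-in-hours: 12
EOF
}
```

The output can be merged by hand into an existing config file when only the Timestream-specific values need changing.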
|
||
--- | ||
## `tsbs_generate_queries` required `--db-name` flag | ||
|
||
Timestream requires the database name to be part of the WHERE clause | ||
of every query, so the `--db-name` flag is required. | ||
|
||
--- | ||
## `tsbs_run_queries_timestream` Additional Flags | ||
|
||
#### `--aws-region` (type: `string`, default: `us-east-1`) | ||
|
||
AWS region where the database is located |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,109 @@ | ||
# Supplemental Guide for `tsbs_load` | ||
|
||
The `tsbs_load` executable can benchmark data ingestion | ||
for all the implemented databases. | ||
|
||
## Generating a config file | ||
|
||
`tsbs_load` uses YAML files to specify the configuration for | ||
running the load benchmark. | ||
|
||
The config file is divided into two top-level sections: | ||
```yaml | ||
data-source: | ||
... | ||
loader: | ||
... | ||
``` | ||
* `data-source` contains the configuration for where to | ||
read the data from (`type: SIMULATOR` or `type: FILE`) | ||
  * For `SIMULATOR`, the configuration specifies the time range to be simulated, | ||
the use case, scale, and other properties of the data | ||
  * For `FILE`, the configuration only specifies the location of the file | ||
pre-generated with `tsbs_generate_data` | ||
* `loader` contains the configuration for loading the data. Two sub-sections are | ||
important here: `db-specific` and `runner` | ||
  * The `db-specific` configuration varies depending on the target database: | ||
for TimescaleDB it contains information about user, password, and SSL mode, while | ||
for InfluxDB it contains information about backoff interval, replication factor, etc. | ||
  * The `runner` configuration specifies the number of concurrent workers to use, | ||
batch size, hashing, and so on | ||
|
||
To generate an example configuration file for a specific database, run: | ||
```shell script | ||
$ tsbs_load config --target=<db-name> --data-source=[FILE|SIMULATOR] | ||
``` | ||
setting `<db-name>` to one of the implemented databases and `--data-source` to | ||
`FILE` or `SIMULATOR`. | ||
|
||
⚠️ **The generated config file will be populated with the default values for each property.** | ||
|
||
The generated config file is saved in `./config.yaml` | ||
|
||
## On the fly simulation and load with `data-source: SIMULATOR` | ||
|
||
When you run `tsbs_generate_data` a simulator is created for | ||
the selected use case and the simulated data points are serialized | ||
to a file. `tsbs_load` utilizes the same simulators but the | ||
simulated points are directly piped to the worker clients that send batches | ||
of data to the databases. | ||
|
||
Note that the properties you configure in the YAML file | ||
correspond to the flags you specify when running `tsbs_generate_data`. | ||
|
||
You can run `tsbs_load` with | ||
```shell script | ||
$ tsbs_load load <db_name> --config=./path-to-config.yaml | ||
``` | ||
where `<db_name>` is one of the implemented databases; run | ||
```shell script | ||
$ tsbs_load load --help | ||
``` | ||
for a list of the available databases. | ||
|
||
## Information about a property and overriding | ||
|
||
The YAML file generated by `tsbs_load config` does not describe | ||
what each property represents. You can find | ||
more details about each property by running: | ||
|
||
```shell script | ||
$ tsbs_load load --help | ||
``` | ||
This will list all the available flags configurable for all databases. These flags | ||
include the flags for `data-source` and `loader.runner`. The `--loader.runner.db-name` flag | ||
corresponds to the property: | ||
```yaml | ||
loader: | ||
runner: | ||
db-name: some-db | ||
``` | ||
in the YAML config file. The type, description, and default | ||
value appear next to the flag name, for example: | ||
|
||
```string, Name of database (default "benchmark")``` | ||
|
||
### Information about database specific flags | ||
|
||
Some of the properties are only valid for specific databases. These | ||
properties go under the `loader.db-specific` section. To view information | ||
about them you can run: | ||
```shell script | ||
$ tsbs_load load <db_name> --help | ||
``` | ||
|
||
For example, for TimescaleDB you can see the following: | ||
```shell script | ||
$ tsbs_load load timescaledb --help | ||
... | ||
--loader.db-specific.chunk-time | ||
duration | ||
Duration that each chunk should represent, e.g., 12h (default 12h0m0s) | ||
... | ||
``` | ||
|
||
### Overriding values | ||
|
||
* Each property has a default value, used if not otherwise overridden | ||
* An entry in the config YAML file overrides the default value | ||
* A flag passed at runtime overrides an entry in the YAML file |
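
The precedence rules above can be expressed as a small Bash function. This is a sketch of the rule only, not how `tsbs_load` resolves values internally: `resolve_property` is a hypothetical helper, and it treats an empty string as "not set".

```shell
#!/bin/bash
# Hypothetical helper illustrating the override order:
# runtime flag > YAML entry > built-in default.
# An empty argument is treated as "not provided".
resolve_property() {
  local default_value="$1" yaml_value="$2" flag_value="$3"
  if [[ -n "$flag_value" ]]; then
    echo "$flag_value"
  elif [[ -n "$yaml_value" ]]; then
    echo "$yaml_value"
  else
    echo "$default_value"
  fi
}
```

For example, `resolve_property "benchmark" "some-db" ""` prints `some-db`, because the YAML entry overrides the default and no flag was passed.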
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,57 @@ | ||
#!/bin/bash | ||
|
||
# Exit immediately if a command exits with a non-zero status. | ||
set -e | ||
|
||
# Ensure runner is available | ||
EXE_FILE_NAME=${EXE_FILE_NAME:-$(which tsbs_run_queries_timestream)} | ||
if [[ -z "$EXE_FILE_NAME" ]]; then | ||
echo "tsbs_run_queries_timestream not available. It is not specified explicitly and not found in \$PATH" | ||
exit 1 | ||
fi | ||
|
||
# AWS region of database | ||
AWS_REGION=${AWS_REGION:-"us-east-1"} | ||
|
||
# Queries folder | ||
BULK_DATA_DIR=${BULK_DATA_DIR:-"/tmp/bulk_queries"} | ||
|
||
# Maximum number of queries to run (0 means no limit) | ||
MAX_QUERIES=${MAX_QUERIES:-"0"} | ||
|
||
# How many concurrent workers run queries - match the number of cores, or default to 4 | ||
NUM_WORKERS=${NUM_WORKERS:-$(grep -c ^processor /proc/cpuinfo 2> /dev/null || echo 4)} | ||
|
||
|
||
for FULL_DATA_FILE_NAME in ${BULK_DATA_DIR}/queries_timestream*; do | ||
# $FULL_DATA_FILE_NAME: /full/path/to/file_with.ext | ||
# $DATA_FILE_NAME: file_with.ext | ||
# $DIR: /full/path/to | ||
# $EXTENSION: ext | ||
# NO_EXT_DATA_FILE_NAME: file_with | ||
|
||
DATA_FILE_NAME=$(basename -- "${FULL_DATA_FILE_NAME}") | ||
DIR=$(dirname "${FULL_DATA_FILE_NAME}") | ||
EXTENSION="${DATA_FILE_NAME##*.}" | ||
NO_EXT_DATA_FILE_NAME="${DATA_FILE_NAME%.*}" | ||
|
||
# Several options on how to name results file | ||
#OUT_FULL_FILE_NAME="${DIR}/result_${DATA_FILE_NAME}" | ||
OUT_FULL_FILE_NAME="${DIR}/result_${NO_EXT_DATA_FILE_NAME}.out" | ||
#OUT_FULL_FILE_NAME="${DIR}/${NO_EXT_DATA_FILE_NAME}.out" | ||
|
||
if [ "${EXTENSION}" == "gz" ]; then | ||
GUNZIP="gunzip" | ||
else | ||
GUNZIP="cat" | ||
fi | ||
|
||
echo "Running ${DATA_FILE_NAME}" | ||
cat "$FULL_DATA_FILE_NAME" \ | ||
| $GUNZIP \ | ||
| $EXE_FILE_NAME \ | ||
--max-queries "$MAX_QUERIES" \ | ||
--workers "$NUM_WORKERS" \ | ||
--aws-region "$AWS_REGION" \ | ||
| tee "$OUT_FULL_FILE_NAME" | ||
done |