# OpenSearch: Generate and Upload Randomized Test Data
Because everybody loves test data.
## Ok, so what is this thing doing?
`opensearch_test.py` lets you generate and upload randomized test data and run queries, so you can see what performance is like and verify that your cluster can handle the load.
It makes it easy to configure what the test documents look like, which data types they include, and what the field names are.
## File Descriptions
- **requirements.txt**: Lists the Python dependencies required to run the test script.
- **modules** folder: Modularized Python functions.
- **opensearch_test.py**: The main Python script that performs the OpenSearch tests.
- **LICENSE**: The project license.
- **Dockerfile**: Contains the instructions to build the Docker image for the test environment.
## Before Using It
### Change Server Config
The recommended way to configure the tool is to create a `server.conf` file and set the values you need.
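For illustration, a minimal `server.conf` might look like the following. This is a sketch: the key names are taken from the options documented in this README, but the exact file syntax (quoting, booleans) depends on how `opensearch_test.py` parses the file, so adjust as needed.

```
opensearch_url = "http://localhost:9200"
username = "admin"
password = "admin"
validate_cert = False
action = "generate_data"
index_name = "test"
count = 10000
batch_size = 1000
random_seed = 42
```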
The available options are:

| Option | Description | Default |
| --- | --- | --- |
| data_file | Name of the documents file to use | None |
| dict_file | Name of the dictionary file to use | None |
| finish_time | Shape finish time in '%Y-%m-%d %H:%M:%S' format | None |
| force_init_index | Force deleting and re-initializing the OpenSearch index | False |
| format | Message format | (truncated for brevity) |
| http_upload_timeout | Timeout in seconds when uploading data | 10 |
| id_type | Type of 'id' to use for the docs, int or uuid4 | None |
| index_name | Name of the index to store your messages | test |
| index_type | Index type | test_type |
| number_of_replicas | Number of replicas for the OpenSearch index | 1 |
| number_of_shards | Number of shards for the OpenSearch index | 1 |
| opensearch_url | URL of your OpenSearch node | http://localhost:9200 |
| out_file | Write test data to out_file as well | False |
| password | Password for OpenSearch | None |
| random_seed | Random seed number for Faker | None |
| set_refresh | Set refresh rate to -1 before starting the upload | False |
| start_time | Shape start time in '%Y-%m-%d %H:%M:%S' format | None |
| username | Username for OpenSearch | None |
| validate_cert | SSL validate_cert for requests. Use false for self-signed certificates | True |

The main configuration values are as follows.

- `action`: choose one of [generate_data, query_all, custom_query, delete_index, all]
  - generate_data: uploads the data generated through `format` to the OpenSearch database.
  - query_all: requests all values of the specified index within the range given by `start_time` and `finish_time`.
  - custom_query: lets you specify the values for the request body through a JSON file; this option requires `json_path`. For more, [read the docs](https://opensearch.org/docs/latest/api-reference/search/).
  - delete_index: deletes all data at the specified index. (Please use with caution.)
  - all: runs the whole process as a test (generate_data -> query_all -> delete_index).
- `start_time` and `finish_time`: if the values are `None`, they default to 30 days before and 30 days after the current time.
- The values that need to be set according to the server's security settings are:
  - `validate_cert`
  - `client_cert`
  - `client_key`
  - `username`
  - `password`
- `format`: see the section [Generate Custom Document format](#generate-custom-document-format).
- `random_seed`: most of the values are generated using the `Faker` library; this is the random seed used for it.
- `count` and `batch_size`: `count` is the total number of docs to generate, while `batch_size` is the number of docs uploaded at once.
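The defaulting of `start_time` and `finish_time` described above can be sketched like this (an illustration of the documented behavior, not the script's actual code; `resolve_time_range` is a hypothetical helper name):

```python
from datetime import datetime, timedelta

TIME_FMT = "%Y-%m-%d %H:%M:%S"

def resolve_time_range(start_time=None, finish_time=None):
    """Parse the configured times, defaulting to now +/- 30 days."""
    now = datetime.now()
    start = datetime.strptime(start_time, TIME_FMT) if start_time else now - timedelta(days=30)
    finish = datetime.strptime(finish_time, TIME_FMT) if finish_time else now + timedelta(days=30)
    return start, finish
```

For example, `resolve_time_range(None, None)` yields a 60-day window centered on the current time.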
### Generate Custom Document format

Currently supported field types are:

- `bool` returns a random true or false
- `ts` a timestamp (in milliseconds), randomly picked between now +/- 30 days
- `text:<words>:<min>:<max>` a random number of words, separated by space, picked from a given list of `-` separated words; the words are optional, defaulting to `text1`, `text2` and `text3`; min and max are optional, defaulting to `1` and `1`
- `arr:[array_length_expression]:[single_element_format]` an array of entries with the format specified by `single_element_format`. `array_length_expression` can be either a single number or a pair of numbers separated by `-` (e.g. 3-7), defining the range from which a random length will be picked for each array (example: `int_array:arr:1-5:int:1:250`)
- `log_version` a random version string that looks like `v1.1.1`
- `sha` generates a random SHA (length 40)
- `file_name` generates a fake Python file name (`.py`)
- `uuid` generates a fake UUID
- `service` generates a fake service name

For more, read the [README.md](https://github.com/obazda20/opensearch-test-data?tab=readme-ov-file#what-about-the-document-format) file of this GitHub repo.
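To make the field specs above concrete, here is a toy generator for a few of them. It assumes a format is a comma-separated list of `name:type[:args]` specs, as in the upstream project; this is an illustration, not the script's actual implementation:

```python
import random
import time

def generate_doc(format_spec):
    """Generate one document from a comma-separated list of name:type[:args] fields."""
    doc = {}
    for field in format_spec.split(","):
        name, ftype, *args = field.split(":")
        if ftype == "bool":
            doc[name] = random.choice([True, False])
        elif ftype == "ts":
            # timestamp in milliseconds, randomly picked between now +/- 30 days
            now_ms = int(time.time() * 1000)
            month_ms = 30 * 24 * 3600 * 1000
            doc[name] = random.randint(now_ms - month_ms, now_ms + month_ms)
        elif ftype == "int":
            doc[name] = random.randint(int(args[0]), int(args[1]))
        elif ftype == "arr":
            # args[0] is a length or a "min-max" range; the rest is the element format
            length_expr, elem = args[0], ":".join(args[1:])
            if "-" in length_expr:
                lo, hi = (int(x) for x in length_expr.split("-"))
                n = random.randint(lo, hi)
            else:
                n = int(length_expr)
            doc[name] = [generate_doc(f"x:{elem}")["x"] for _ in range(n)]
        else:
            raise ValueError(f"unsupported field type: {ftype}")
    return doc
```

For example, `generate_doc("flag:bool,created:ts,codes:arr:1-5:int:1:250")` yields a doc with a random boolean, a millisecond timestamp, and an int array of length 1 to 5.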
## Testing Methods
Testing can be run either with `Docker` or by running the `Python` script directly.
### Using Docker
`Docker` must be installed, and you need to have an internet connection.
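A build-and-run sketch might look like this. The image tag `opensearch-test-data` is an assumption for the example (this README does not name one), and the script skips gracefully when Docker is missing:

```shell
# Assumed image tag; pick any name you like.
IMAGE="opensearch-test-data"

if command -v docker >/dev/null 2>&1; then
  # Build the image from the repo's Dockerfile, then run the tests.
  docker build -t "$IMAGE" .
  docker run --rm --network host "$IMAGE"
else
  echo "docker is not installed; install it before running the tests" >&2
fi
```

`--network host` lets the container reach an OpenSearch node on `localhost:9200`; drop it and point the config at your node's URL if the cluster runs elsewhere.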