Commit 3ff20ba
Author: Leo Park (committed)

Docs: Update README.md
1 parent 8bf1752 commit 3ff20ba

1 file changed: +164 -124 lines changed

README.md

Lines changed: 164 additions & 124 deletions
@@ -1,132 +1,105 @@
-# Quick fix for opensearch 2.6
-- fix index creation
-- remove doc type
-
-# Elasticsearch For Beginners: Generate and Upload Randomized Test Data
+# OpenSearch: Generate and Upload Randomized Test Data

 Because everybody loves test data.

 ## Ok, so what is this thing doing?

-`es_test_data.py` lets you generate and upload randomized test data to
-your ES cluster so you can start running queries, see what performance
-is like, and verify your cluster is able to handle the load.
+`opensearch_test.py` lets you generate and upload randomized test data and run queries, so you can see what performance is like and verify your cluster is able to handle the load.

-It allows for easy configuring of what the test documents look like, what
-kind of data types they include and what the field names are called.
+It allows for easy configuring of what the test documents look like, what kind of data types they include and what the field names are called.

-## Cool, how do I use this?
+## File Descriptions

-### Run Python script
+- **requirements.txt**: Lists the Python dependencies required to run the test script.
+- **modules** folder: Modularized Python functions.
+- **opensearch_test.py**: The main Python script that performs the OpenSearch tests.
+- **LICENSE**: The project license.
+- **Dockerfile**: Contains the instructions to build the Docker image for the test environment.

-Let's assume you have an Elasticsearch cluster running.
+## Before Using It

-Python and [Tornado](https://github.com/tornadoweb/tornado/) are used. Run
-`pip install tornado` to install Tornado if you don't have it already.
+### Change Server Config

-It's as simple as this:
+The recommended way to configure the script is to create a `server.conf` file and enter the values you need.

-```
-$ python es_test_data.py --es_url=http://localhost:9200
-[I 150604 15:43:19 es_test_data:42] Trying to create index http://localhost:9200/test_data
-[I 150604 15:43:19 es_test_data:47] Guess the index exists already
-[I 150604 15:43:19 es_test_data:184] Generating 10000 docs, upload batch size is 1000
-[I 150604 15:43:19 es_test_data:62] Upload: OK - upload took: 25ms, total docs uploaded: 1000
-[I 150604 15:43:20 es_test_data:62] Upload: OK - upload took: 25ms, total docs uploaded: 2000
-[I 150604 15:43:20 es_test_data:62] Upload: OK - upload took: 19ms, total docs uploaded: 3000
-[I 150604 15:43:20 es_test_data:62] Upload: OK - upload took: 18ms, total docs uploaded: 4000
-[I 150604 15:43:20 es_test_data:62] Upload: OK - upload took: 27ms, total docs uploaded: 5000
-[I 150604 15:43:20 es_test_data:62] Upload: OK - upload took: 19ms, total docs uploaded: 6000
-[I 150604 15:43:20 es_test_data:62] Upload: OK - upload took: 15ms, total docs uploaded: 7000
-[I 150604 15:43:20 es_test_data:62] Upload: OK - upload took: 24ms, total docs uploaded: 8000
-[I 150604 15:43:20 es_test_data:62] Upload: OK - upload took: 32ms, total docs uploaded: 9000
-[I 150604 15:43:20 es_test_data:62] Upload: OK - upload took: 31ms, total docs uploaded: 10000
-[I 150604 15:43:20 es_test_data:216] Done - total docs uploaded: 10000, took 1 seconds
-[I 150604 15:43:20 es_test_data:217] Bulk upload average: 23 ms
-[I 150604 15:43:20 es_test_data:218] Bulk upload median: 24 ms
-[I 150604 15:43:20 es_test_data:219] Bulk upload 95th percentile: 31 ms
-```
-
-Without any command line options, it will generate and upload 1000 documents
-of the format
+If you are not using a config file, you need to pass the required values as command-line arguments.
+
+However, when there are many values to set, it is much more convenient to create and use a `server.conf` file.

+Enter the desired options in the `server.conf` file.
+
+Example:
+
+Create the config file:
+
+```shell
+cd ${REPO}
+touch server.conf
+${EDITOR} server.conf
 ```
-{
-    "name":<<str>>,
-    "age":<<int>>,
-    "last_updated":<<ts>>
-}
+
+Edit the config file.
+
+### Basic config file example
+
+```conf
+# server.conf
+action = "all"
+opensearch_url = "https://uri.for.opensearch.db:port"
+client_cert = "./client_cert.pem"
+client_key = "./client_key.pem"
 ```
-to an Elasticsearch cluster at `http://localhost:9200` to an index called
-`test_data`.
-
-### Docker and Docker Compose
-
-Requires [Docker](https://docs.docker.com/get-docker/) for running the app and [Docker Compose](https://docs.docker.com/compose/install/) for running a single ElasticSearch domain with two nodes (es1 and es2).
-
-1. Set the maximum virtual memory of your machine to `262144` otherwise the ElasticSearch instances will crash, [see the docs](https://www.elastic.co/guide/en/elasticsearch/reference/current/vm-max-map-count.html)
-```bash
-$ sudo sysctl -w vm.max_map_count=262144
-```
-1. Clone this repository
-```bash
-$ git clone https://github.com/oliver006/elasticsearch-test-data.git
-$ cd elasticsearch-test-data
-```
-1. Run the ElasticSearch stack
-```bash
-$ docker-compose up --detached
-```
-1. Run the app and inject random data to the ES stack
-```bash
-$ docker run --rm -it --network host oliver006/es-test-data \
-    --es_url=http://localhost:9200 \
-    --batch_size=10000 \
-    --username=elastic \
-    --password="esbackup-password"
-```
-1. Cleanup
-```bash
-$ docker-compose down --volumes
-```
-
-## Not bad but what can I configure?
-
-`python es_test_data.py --help` gives you the full set of command line
-ptions, here are the most important ones:
-
-- `--es_url=http://localhost:9200` the base URL of your ES node, don't
-  include the index name
-- `--username=<username>` the username when basic auth is required
-- `--password=<password>` the password when basic auth is required
-- `--count=###` number of documents to generate and upload
-- `--index_name=test_data` the name of the index to upload the data to.
-  If it doesn't exist it'll be created with these options
-- `--num_of_shards=2` the number of shards for the index
-- `--num_of_replicas=0` the number of replicas for the index
-- `--batch_size=###` we use bulk upload to send the docs to ES, this option
-  controls how many we send at a time
-- `--force_init_index=False` if `True` it will delete and re-create the index
-- `--dict_file=filename.dic` if provided the `dict` data type will use words
-  from the dictionary file, format is one word per line. The entire file is
-  loaded at start-up so be careful with (very) large files.
-- `--data_file=filename.json|filename.csv` if provided all data in the filename will be inserted into es. The file content has to be an array of json objects (the documents). If the file ends in `.csv` then the data is automatically converted into json and inserted as documents.
-
-## What about the document format?
-
-Glad you're asking, let's get to the doc format.
-
-The doc format is configured via `--format=<<FORMAT>>` with the default being
-`name:str,age:int,last_updated:ts`.
-
-The general syntax looks like this:
-
-`<<field_name>>:<<field_type>>,<<field_name>>::<<field_type>>, ...`
-
-For every document, `es_test_data.py` will generate random values for each of
-the fields configured.
-
-Currently supported field types are:
+
+### What can be configured?
+
+| Setting | Description | Default Value |
+| ------- | ----------- | ------------- |
+| action | Specify the action to be performed. | all |
+| json_path | Query JSON file path | None |
+| batch_size | OpenSearch bulk index batch size | 1000 |
+| client_cert | Filepath of CA certificates in PEM format | None |
+| client_key | Filepath of client SSL key | None |
+| count | Number of docs to generate | 100000 |
+| data_file | Name of the documents file to use | None |
+| dict_file | Name of dictionary file to use | None |
+| finish_time | Shape Finish Time in '%Y-%m-%d %H:%M:%S' format | None |
+| force_init_index | Force deleting and re-initializing the OpenSearch index | False |
+| format | Message format | (truncated for brevity) |
+| http_upload_timeout | Timeout in seconds when uploading data | 10 |
+| id_type | Type of 'id' to use for the docs, int or uuid4 | None |
+| index_name | Name of the index to store your messages | test |
+| index_type | Index type | test_type |
+| number_of_replicas | Number of replicas for OpenSearch index | 1 |
+| number_of_shards | Number of shards for OpenSearch index | 1 |
+| opensearch_url | URL of your OpenSearch node | http://localhost:9200 |
+| out_file | Write test data to out_file as well | False |
+| password | Password for OpenSearch | None |
+| random_seed | Random seed number for Faker | None |
+| set_refresh | Set refresh rate to -1 before starting the upload | False |
+| start_time | Shape Start Time in '%Y-%m-%d %H:%M:%S' format | None |
+| username | Username for OpenSearch | None |
+| validate_cert | SSL validate_cert for requests. Use false for self-signed certificates | True |
+
+The main configuration values are as follows.
+
+- `action`: choose one of [generate_data, query_all, custom_query, delete_index, all]
+  - generate_data: uploads the data generated through `format` to the OpenSearch database.
+  - query_all: requests all values of the specified index within the range set by `start_time` and `finish_time`.
+  - custom_query: lets you specify the request body through a JSON file. This option requires `json_path`. For more, [read the docs](https://opensearch.org/docs/latest/api-reference/search/).
+  - delete_index: deletes all data at the specified index. (Please use with caution.)
+  - all: runs the whole test process (generate_data -> query_all -> delete_index).
+- `start_time` and `finish_time`: if the values are `None`, they default to 30 days before and 30 days after the current time.
+- The values that need to be set according to the server's security settings are:
+  - `validate_cert`
+  - `client_cert`
+  - `client_key`
+  - `username`
+  - `password`
+- `format`: see the section [Generate Custom Document format](#generate-custom-document-format).
+- `random_seed`: most of the values are generated using the `Faker` library; this is the random seed used for it.
+- `count` and `batch_size`: `count` is the total number of docs to be generated, while `batch_size` is the number of docs uploaded at once.
+
+### Generate Custom Document format

 - `bool` returns a random true or false
 - `ts` a timestamp (in milliseconds), randomly picked between now +/- 30 days
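As an editorial aside between the two hunks: the field types listed here can be illustrated with a minimal format-string-driven generator. This is a hypothetical sketch (`make_doc` and its handlers are illustrative stand-ins, not code from `opensearch_test.py`):

```python
import random
import string
import time

# Minimal sketch: map each field type named in the README to a
# random-value generator, then build a doc from a format string.

def rand_str(length=8):
    # random lowercase string
    return "".join(random.choices(string.ascii_lowercase, k=length))

def rand_ts():
    # timestamp in milliseconds, now +/- 30 days (as the README describes)
    now_ms = int(time.time() * 1000)
    month_ms = 30 * 24 * 3600 * 1000
    return random.randint(now_ms - month_ms, now_ms + month_ms)

HANDLERS = {
    "str": rand_str,
    "int": lambda: random.randint(0, 100),
    "bool": lambda: random.choice([True, False]),
    "ts": rand_ts,
}

def make_doc(fmt):
    """Build one random doc from a 'field:type,field:type' format string."""
    doc = {}
    for part in fmt.split(","):
        name, ftype = part.split(":", 1)
        doc[name] = HANDLERS[ftype]()
    return doc

doc = make_doc("name:str,age:int,last_updated:ts")
```

Each generated document then has one random value per configured field, which is the behavior the `format` setting describes.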
@@ -146,17 +119,84 @@ Currently supported field types are:
 given list of `-` seperated words, the words are optional defaulting to
 `text1` `text2` and `text3`, min and max are optional, defaulting to `1`
 and `1`
-- `arr:[array_length_expression]:[single_element_format]` an array of entries
-  with format specified by `single_element_format`. `array_length_expression`
-  can be either a single number, or pair of numbers separated by `-` (i.e. 3-7),
-  defining range of lengths from with random length will be picked for each array
-  (Example `int_array:arr:1-5:int:1:250`)
+- `arr:[array_length_expression]:[single_element_format]` an array of entries with the format specified by `single_element_format`. `array_length_expression` can be either a single number, or a pair of numbers separated by `-` (e.g. 3-7) defining a range of lengths from which a random length will be picked for each array (Example: `int_array:arr:1-5:int:1:250`)
+- `log_version` a random version `str` that looks like v1.1.1
+- `sha` a random SHA-like hex string (length 40)
+- `file_name` a fake Python file name (.py)
+- `uuid` a fake UUID
+- `service` a fake service name
+
+For more, read the [README.md](https://github.com/obazda20/opensearch-test-data?tab=readme-ov-file#what-about-the-document-format) of this GitHub repo.
+
+## Testing Methods
+
+Testing can be done either with `Docker` or by running the `Python` script directly.
+
+### Using Docker
+
+`Docker` must be installed, and you need to have an internet connection.

+Follow [Install Docker Engine](https://docs.docker.com/engine/install/).

-## Todo
+Do not forget to mount all required files with `-v` when running `docker run`.
+
+```shell
+cd ${REPO}
+docker build -t <docker_image_name>:<tag> .
+docker run -v ${PWD}/client_cert.pem:/app/client_cert.pem -v ${PWD}/client_key.pem:/app/client_key.pem -v /server.conf:/app/server.conf <docker_image_name>:<tag>
+```
+
+### Using Python
+
+[python3](https://www.python.org/downloads/) needs to be installed.
+
+Dependencies can be installed from `requirements.txt`.
+
+Example:
+
+```shell
+# install requirements
+pip install --no-cache-dir -r requirements.txt
+```
+
+After the dependencies are installed, check that the required file paths are set correctly, and then run the script as follows.
+
+Please check that the following files are properly configured before execution.
+
+- server.conf
+- client_cert.pem (optional)
+- client_key.pem (optional)
+- custom_query_json_file (optional)
+
+```shell
+# run script
+python opensearch_test.py
+```
+
+## Result Example
+
+```bash
+[I 241114 10:42:31 opensearch_test:47] ***Start Data Generate Test***
+[I 241114 10:42:32 requests:83] Trying to create index https://db.test.co.kr/test
+[I 241114 10:42:33 requests:89] Looks like the index exists already
+[I 241114 10:42:33 upload_data:83] Generating 20000 docs, upload batch size is 10000
+[I 241114 10:42:55 requests:109] Upload: OK - upload took: 4551ms, total docs uploaded: 10000
+[I 241114 10:43:18 requests:109] Upload: OK - upload took: 4598ms, total docs uploaded: 20000
+[I 241114 10:43:18 upload_data:123] Done - total docs uploaded: 20000, took 45 seconds
+[I 241114 10:43:18 opensearch_test:52] ***Start Query All Test***
+[I 241114 10:43:41 requests:226] Total hits: 1022000, Total pages: 103
+[I 241114 10:43:44 requests:231] Retrieved page 1 of 103
+[I 241114 10:43:48 requests:231] Retrieved page 2 of 103
+...
+[I 241114 10:50:08 requests:231] Retrieved page 103 of 103
+[I 241114 10:50:09 requests:213] Scroll context cleared successfully
+[I 241114 10:50:09 requests:235] Total Querying time taken: 13116.00ms
+[I 241114 10:50:09 opensearch_test:62] ***Start Period Breakdown Query test***
+[I 241114 10:50:10 requests:293] Total Querying time taken: 376ms
+[I 241114 10:50:10 opensearch_test:67] ***Start Delete Index***
+[I 241114 10:50:13 requests:61] Deleting index 'test' done b'{"acknowledged":true}'
+```

-- document the remaining cmd line options
-- more different format types
-- ...
+Through this test, you can verify whether OpenSearch is functioning properly.

-All suggestions, comments, ideas, pull requests are welcome!
+You can also check the performance.
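As a closing aside on the upload step reported in the log above: OpenSearch's `_bulk` endpoint expects a newline-delimited JSON body, with an action line before each document. A minimal sketch of building one batch (the `build_bulk_body` helper is hypothetical, not taken from `opensearch_test.py`):

```python
import json

def build_bulk_body(index_name, docs):
    """Build an OpenSearch _bulk payload (NDJSON): one action line
    followed by one document line per doc, newline-terminated."""
    lines = []
    for doc in docs:
        # action/metadata line naming the target index
        lines.append(json.dumps({"index": {"_index": index_name}}))
        # the document itself
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"

# One batch of two docs for the "test" index:
body = build_bulk_body("test", [{"name": "alice", "age": 30},
                                {"name": "bob", "age": 41}])
```

Such a body would be POSTed to `<opensearch_url>/_bulk` with `Content-Type: application/x-ndjson`; the `batch_size` setting controls how many docs go into each payload.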
