Commit 8d36064

Author: Leo Park
Commit message: Docs: Update README.md
1 parent 8bf1752 commit 8d36064

File tree: 1 file changed, +108 −40 lines


README.md

Lines changed: 108 additions & 40 deletions
@@ -1,53 +1,51 @@
-# Quick fix for opensearch 2.6
-- fix index creation
-- remove doc type
-
 # Elasticsearch For Beginners: Generate and Upload Randomized Test Data
 
 Because everybody loves test data.
 
 ## Ok, so what is this thing doing?
 
-`es_test_data.py` lets you generate and upload randomized test data to
-your ES cluster so you can start running queries, see what performance
+`search_test.py` lets you generate and upload randomized test data to
+your Elasticsearch or Opensearch cluster so you can start running queries, see what performance
 is like, and verify your cluster is able to handle the load.
 
 It allows for easy configuring of what the test documents look like, what
 kind of data types they include and what the field names are called.
 
 ## Cool, how do I use this?
 
+### File Descriptions
+
 ### Run Python script
 
-Let's assume you have an Elasticsearch cluster running.
+Let's assume you have an Elasticsearch or Opensearch cluster running.
 
-Python and [Tornado](https://github.com/tornadoweb/tornado/) are used. Run
-`pip install tornado` to install Tornado if you don't have it already.
+Python, [Tornado](https://github.com/tornadoweb/tornado/) and [Faker](https://github.com/joke2k/faker) are used. Run
+`pip install tornado` and `pip install Faker` to install them if you don't have them already.
 
 It's as simple as this:
 
 ```
-$ python es_test_data.py --es_url=http://localhost:9200
-[I 150604 15:43:19 es_test_data:42] Trying to create index http://localhost:9200/test_data
-[I 150604 15:43:19 es_test_data:47] Guess the index exists already
-[I 150604 15:43:19 es_test_data:184] Generating 10000 docs, upload batch size is 1000
-[I 150604 15:43:19 es_test_data:62] Upload: OK - upload took: 25ms, total docs uploaded: 1000
-[I 150604 15:43:20 es_test_data:62] Upload: OK - upload took: 25ms, total docs uploaded: 2000
-[I 150604 15:43:20 es_test_data:62] Upload: OK - upload took: 19ms, total docs uploaded: 3000
-[I 150604 15:43:20 es_test_data:62] Upload: OK - upload took: 18ms, total docs uploaded: 4000
-[I 150604 15:43:20 es_test_data:62] Upload: OK - upload took: 27ms, total docs uploaded: 5000
-[I 150604 15:43:20 es_test_data:62] Upload: OK - upload took: 19ms, total docs uploaded: 6000
-[I 150604 15:43:20 es_test_data:62] Upload: OK - upload took: 15ms, total docs uploaded: 7000
-[I 150604 15:43:20 es_test_data:62] Upload: OK - upload took: 24ms, total docs uploaded: 8000
-[I 150604 15:43:20 es_test_data:62] Upload: OK - upload took: 32ms, total docs uploaded: 9000
-[I 150604 15:43:20 es_test_data:62] Upload: OK - upload took: 31ms, total docs uploaded: 10000
-[I 150604 15:43:20 es_test_data:216] Done - total docs uploaded: 10000, took 1 seconds
-[I 150604 15:43:20 es_test_data:217] Bulk upload average: 23 ms
-[I 150604 15:43:20 es_test_data:218] Bulk upload median: 24 ms
-[I 150604 15:43:20 es_test_data:219] Bulk upload 95th percentile: 31 ms
+$ python search_test.py --search_db_url=http://localhost:9200 --index_name=test --count=20000
+[I 241114 10:42:31 opensearch_test:47] ***Start Data Generate Test***
+[I 241114 10:42:32 requests:83] Trying to create index http://localhost:9200/test_type
+[I 241114 10:42:33 requests:89] Looks like the index exists already
+[I 241114 10:42:33 upload_data:83] Generating 20000 docs, upload batch size is 10000
+[I 241114 10:42:55 requests:109] Upload: OK - upload took: 4551ms, total docs uploaded: 10000
+[I 241114 10:43:18 requests:109] Upload: OK - upload took: 4598ms, total docs uploaded: 20000
+[I 241114 10:43:18 upload_data:123] Done - total docs uploaded: 20000, took 45 seconds
+[I 241114 10:43:18 opensearch_test:52] ***Start Query All Test***
+[I 241114 10:43:41 requests:226] Total hits: 20000, Total pages: 2
+[I 241114 10:43:44 requests:231] Retrieved page 1 of 2
+...
+[I 241114 10:43:48 requests:231] Retrieved page 2 of 2
+[I 241114 10:50:09 requests:213] Scroll context cleared successfully
+[I 241114 10:50:09 requests:235] Total Querying time taken: 13116.00ms
+[I 241114 10:50:10 opensearch_test:67] ***Start Delete Index***
+[I 241114 10:50:13 requests:61] Deleting index 'test_type' done b'{"acknowledged":true}'
+
 ```
 
-Without any command line options, it will generate and upload 1000 documents
+Without any command line options, it will generate and upload 100000 documents
 of the format
 
 ```
@@ -58,7 +56,7 @@ of the format
 }
 ```
 to an Elasticsearch cluster at `http://localhost:9200` to an index called
-`test_data`.
+`test_type`.
 
 ### Docker and Docker Compose
 
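Editor's note: the bulk upload the new script performs above follows the standard Elasticsearch/OpenSearch `_bulk` NDJSON protocol. As a rough, hypothetical sketch (the function and field names below are illustrative, not the script's actual code), the payload for a batch can be built like this:

```python
import json
import random
import string


def build_bulk_body(index_name: str, count: int) -> str:
    """Build an NDJSON _bulk payload of randomized test docs.

    Illustrative only: the real script's document format is configurable.
    Each document needs an action line followed by a source line.
    """
    lines = []
    for _ in range(count):
        doc = {
            "name": "".join(random.choices(string.ascii_lowercase, k=8)),
            "age": random.randint(1, 100),
        }
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(doc))
    # The _bulk API requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


body = build_bulk_body("test_type", 3)
print(len(body.splitlines()))  # 6: one action line + one doc line per document
```

The resulting string would be POSTed to `<cluster_url>/_bulk` with the `application/x-ndjson` content type; the script's `batch_size` option controls how many documents go into each such payload.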
@@ -70,7 +68,7 @@ Requires [Docker](https://docs.docker.com/get-docker/) for running the app and [
 ```
 1. Clone this repository
 ```bash
-$ git clone https://github.com/oliver006/elasticsearch-test-data.git
+$ git clone <change_this_to_repository_url>
 $ cd elasticsearch-test-data
 ```
 1. Run the ElasticSearch stack
@@ -92,20 +90,87 @@ Requires [Docker](https://docs.docker.com/get-docker/) for running the app and [
 
 ## Not bad but what can I configure?
 
-`python es_test_data.py --help` gives you the full set of command line
-ptions, here are the most important ones:
+The recommended way to configure the script is to create a `server.conf` file and enter the values you need.
+
+If you are not using a config file, you need to pass the required values as `arguments`.
+
+However, when there are many values to set, it is much more convenient to create and use a `server.conf` file.
+
+Enter the desired options in the `server.conf` file.
+
+Example:
 
-- `--es_url=http://localhost:9200` the base URL of your ES node, don't
-include the index name
-- `--username=<username>` the username when basic auth is required
-- `--password=<password>` the password when basic auth is required
+Create the config file
+
+```shell
+cd ${REPOSITORY}/elasticsearch-test-data
+touch server.conf
+${EDITOR} server.conf
+```
+
+Edit the config file
+
+```conf
+# server.conf
+action = "all"
+opensearch_url = "https://uri.for.search.db:port"
+username = TEST_NAME
+password = TEST_PASSWORD
+```
+
+### What can be configured?
+
+| Setting | Description | Default Value |
+| ----------------------- | ---------------------------------------------------------------------- | ----------------------- |
+| action | Specify the action to be performed. | all |
+| json_path | Query JSON file path | None |
+| batch_size | OpenSearch bulk index batch size | 1000 |
+| client_cert | Filepath of CA certificates in PEM format | None |
+| client_key | Filepath of client SSL key | None |
+| count | Number of docs to generate | 100000 |
+| data_file | Name of the documents file to use | None |
+| dict_file | Name of dictionary file to use | None |
+| finish_time | Shape Finish Time in '%Y-%m-%d %H:%M:%S' format | None |
+| force_init_index | Force deleting and re-initializing the OpenSearch index | False |
+| format | Message format | (truncated for brevity) |
+| http_upload_timeout | Timeout in seconds when uploading data | 10 |
+| id_type | Type of 'id' to use for the docs, int or uuid4 | None |
+| index_name | Name of the index to store your messages | test |
+| index_type | Index type | test_type |
+| number_of_replicas | Number of replicas for OpenSearch index | 1 |
+| number_of_shards | Number of shards for OpenSearch index | 1 |
+| search_db_url | URL of your DB | http://localhost:9200 |
+| out_file | Write test data to out_file as well | False |
+| password | Password for OpenSearch | None |
+| random_seed | Random seed number for Faker | None |
+| set_refresh | Set refresh rate to -1 before starting the upload | False |
+| start_time | Shape Start Time in '%Y-%m-%d %H:%M:%S' format | None |
+| username | Username for OpenSearch | None |
+| validate_cert | SSL validate_cert for requests. Use false for self-signed certificates | True |
+
+
+`python search-test.py --help` also gives you the full set of command line
+options; here is more detail on the most important ones:
+
+- `action`: [generate_data, query_all, custom_query, delete_index, all] choose one
+  - generate_data: upload the data generated through `format` to the OpenSearch database.
+  - query_all: request all values of the specified index within the range set by `start_time` and `finish_time`.
+  - custom_query: specify the values for the request body through a JSON file. This option requires `json_path`. For more, [read the docs](https://opensearch.org/docs/latest/api-reference/search/)
+  - delete_index: all data at the specified index will be deleted. (Please use with caution.)
+  - all: runs the whole test process (generate_data -> query_all -> delete_index)
+- The values that need to be set according to the server's security settings are as follows:
+  - `validate_cert`
+  - `client_cert`
+  - `client_key`
+  - `username`
+  - `password`
+- `--search_db_url=http://localhost:9200` the base URL of your search DB node, don't include the index name
 - `--count=###` number of documents to generate and upload
 - `--index_name=test_data` the name of the index to upload the data to.
 If it doesn't exist it'll be created with these options
 - `--num_of_shards=2` the number of shards for the index
 - `--num_of_replicas=0` the number of replicas for the index
-- `--batch_size=###` we use bulk upload to send the docs to ES, this option
-controls how many we send at a time
+- `--batch_size=###` we use bulk upload to send the docs to DB, this option controls how many we send at a time
 - `--force_init_index=False` if `True` it will delete and re-create the index
 - `--dict_file=filename.dic` if provided the `dict` data type will use words
 from the dictionary file, format is one word per line. The entire file is
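Editor's note: the `server.conf` example above uses a simple `key = value` layout (with optional quotes). Assuming that format, and without claiming this is how the script actually loads it, a minimal dependency-free parser could look like:

```python
def parse_conf(text: str) -> dict:
    """Parse simple `key = value` lines, skipping blanks and # comments.

    Illustrative sketch only; quote characters around values are stripped.
    """
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        conf[key.strip()] = value.strip().strip('"')
    return conf


sample = """
# server.conf
action = "all"
opensearch_url = "https://uri.for.search.db:port"
username = TEST_NAME
password = TEST_PASSWORD
"""

conf = parse_conf(sample)
print(conf["action"], conf["username"])  # all TEST_NAME
```

Whether values come from the file or from CLI arguments, they resolve to the same option names listed in the table above.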
@@ -151,11 +216,14 @@ Currently supported field types are:
 can be either a single number, or pair of numbers separated by `-` (i.e. 3-7),
 defining range of lengths from with random length will be picked for each array
 (Example `int_array:arr:1-5:int:1:250`)
-
+- `log_version` a random version `str` that looks like v1.1.1
+- `sha` generate a random sha (length 40)
+- `file_name` generate a fake Python file name (.py)
+- `uuid` generate a fake uuid
+- `service` generate a fake service name
 
 ## Todo
 
-- document the remaining cmd line options
 - more different format types
 - ...
 
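Editor's note: the new field types added above are produced via Faker in the script; as a rough stdlib-only illustration of the kind of values they describe (these helpers are hypothetical, not the script's code):

```python
import hashlib
import random
import uuid


def log_version() -> str:
    """A random version string like v1.1.1."""
    return "v{}.{}.{}".format(*(random.randint(0, 9) for _ in range(3)))


def sha() -> str:
    """A random 40-character hex digest (SHA-1 length)."""
    return hashlib.sha1(str(random.random()).encode()).hexdigest()


def file_name() -> str:
    """A fake Python file name ending in .py."""
    return random.choice(["utils", "main", "config", "client"]) + ".py"


def fake_uuid() -> str:
    """A fake uuid string."""
    return str(uuid.uuid4())


print(len(sha()), file_name().endswith(".py"), log_version().startswith("v"))
```

Set `random_seed` (see the config table above) to make Faker-generated values reproducible across runs.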