
Commit e9dde87

Data import reconstruction and active monitoring (#418)
* Data import reconstruction and active monitoring
* Delete the original data import document
* TsFile Active Listening&Loading Function Added Configuration Item Added Configuration Item
1 parent bc07621 commit e9dde87

12 files changed: +1044 -1 lines changed

src/.vuepress/sidebar/V1.3.x/en.ts

Lines changed: 2 additions & 1 deletion
@@ -98,7 +98,8 @@ export const enSidebar = {
9898
{ text: 'Command Line Interface (CLI)', link: 'CLI' },
9999
{ text: 'Monitor Tool', link: 'Monitor-Tool_apache' },
100100
{ text: 'Benchmark Tool', link: 'Benchmark' },
101-
{ text: 'Maintenance Tool', link: 'Maintenance-Tool_apache' },
101+
{ text: 'Maintenance Tool', link: 'Maintenance-Tool_apache' },
102+
{ text: 'Data Import', link: 'Data-Import-Tool' },
102103
{ text: 'Data Export', link: 'Data-Export-Tool' },
103104
],
104105
},

src/.vuepress/sidebar/V1.3.x/zh.ts

Lines changed: 2 additions & 0 deletions
@@ -99,7 +99,9 @@ export const zhSidebar = {
9999
{ text: '监控工具', link: 'Monitor-Tool_apache' },
100100
{ text: '测试工具', link: 'Benchmark' },
101101
{ text: '运维工具', link: 'Maintenance-Tool_apache' },
102+
{ text: '数据导入', link: 'Data-Import-Tool' },
102103
{ text: '数据导出', link: 'Data-Export-Tool' },
104+
103105
],
104106
},
105107
{

src/.vuepress/sidebar_timecho/V1.3.x/en.ts

Lines changed: 1 addition & 0 deletions
@@ -108,6 +108,7 @@ export const enSidebar = {
108108
{ text: 'Monitor Tool', link: 'Monitor-Tool_timecho' },
109109
{ text: 'Benchmark Tool', link: 'Benchmark' },
110110
{ text: 'Maintenance Tool', link: 'Maintenance-Tool_timecho' },
111+
{ text: 'Data Import', link: 'Data-Import-Tool' },
111112
{ text: 'Data Export', link: 'Data-Export-Tool' },
112113
],
113114
},

src/.vuepress/sidebar_timecho/V1.3.x/zh.ts

Lines changed: 1 addition & 0 deletions
@@ -108,6 +108,7 @@ export const zhSidebar = {
108108
{ text: '监控工具', link: 'Monitor-Tool_timecho' },
109109
{ text: '测试工具', link: 'Benchmark' },
110110
{ text: '运维工具', link: 'Maintenance-Tool_timecho' },
111+
{ text: '数据导入', link: 'Data-Import-Tool' },
111112
{ text: '数据导出', link: 'Data-Export-Tool' },
112113
],
113114
},

src/UserGuide/Master/Reference/Common-Config-Manual.md

Lines changed: 47 additions & 0 deletions
@@ -2162,3 +2162,50 @@ Different configuration parameters take effect in the following three ways:
21622162
| Effective | hot-load |
21632163

21642164

2165+
#### TsFile Active Listening & Loading Function Configuration
2166+
2167+
* load\_active\_listening\_enable
2168+
2169+
|Name| load\_active\_listening\_enable |
2170+
|:---:|:---|
2171+
|Description| Whether to enable the DataNode's active listening and loading of tsfile functionality (default is enabled). |
2172+
|Type| Boolean |
2173+
|Default| true |
2174+
|Effective| hot-load |
2175+
2176+
* load\_active\_listening\_dirs
2177+
2178+
|Name| load\_active\_listening\_dirs |
2179+
|:---:|:---|
2180+
|Description| The directories to listen to (subdirectories are included automatically). Separate multiple directories with `,`. The default directory is ext/load/pending (supports hot loading). |
2181+
|Type| String |
2182+
|Default| ext/load/pending |
2183+
|Effective|hot-load|
2184+
2185+
* load\_active\_listening\_fail\_dir
2186+
2187+
|Name| load\_active\_listening\_fail\_dir |
2188+
|:---:|:---|
2189+
|Description| The directory that TsFile files are moved to when loading fails. Only one directory can be configured. |
2190+
|Type| String |
2191+
|Default| ext/load/failed |
2192+
|Effective|hot-load|
2193+
2194+
* load\_active\_listening\_max\_thread\_num
2195+
2196+
|Name| load\_active\_listening\_max\_thread\_num |
2197+
|:---:|:---|
2198+
|Description| The maximum number of threads that run TsFile loading tasks simultaneously. When the parameter is commented out, the default value is max(1, CPU core count / 2). When the user sets a value outside the range [1, CPU core count / 2], it is reset to the default value max(1, CPU core count / 2). |
2199+
|Type| Long |
2200+
|Default| max(1, CPU core count / 2) |
2201+
|Effective|Effective after restart|
2202+
2203+
2204+
* load\_active\_listening\_check\_interval\_seconds
2205+
2206+
|Name| load\_active\_listening\_check\_interval\_seconds |
2207+
|:---:|:---|
2208+
|Description| Active listening polling interval in seconds. Active listening for TsFile is implemented by polling the folders. This setting specifies the interval between two checks of load_active_listening_dirs: the next check runs load_active_listening_check_interval_seconds seconds after each check. When the user sets the polling interval to less than 1, it is reset to the default value of 5 seconds. |
2209+
|Type| Long |
2210+
|Default| 5|
2211+
|Effective|Effective after restart|
Lines changed: 217 additions & 0 deletions
@@ -0,0 +1,217 @@
1+
# Data Import
2+
3+
## 1. IoTDB Data Import
4+
5+
IoTDB currently supports importing data in CSV, SQL, and TsFile (IoTDB's underlying open time-series file format) formats into the database. The specific functionalities are as follows:
6+
7+
<table style="text-align: left;">
8+
<tr>
9+
<th>File Format</th>
10+
<th>IoTDB Tool</th>
11+
<th>Description</th>
12+
</tr>
13+
<tr>
14+
<td>CSV</td>
15+
<td>import-data.sh/bat</td>
16+
<td>Can be used for single or batch import of CSV files into IoTDB</td>
17+
</tr>
18+
<tr>
19+
<td>SQL</td>
20+
<td>import-data.sh/bat</td>
21+
<td>Can be used for single or batch import of SQL files into IoTDB</td>
22+
</tr>
23+
<tr>
24+
<td rowspan="2">TsFile</td>
25+
<td>load-tsfile.sh/bat</td>
26+
<td>Can be used for single or batch import of TsFile files into IoTDB</td>
27+
</tr>
28+
<tr>
29+
<td>TsFile Active Listening & Loading Feature</td>
30+
<td>According to user configuration, it listens for changes in TsFile files in the specified path and loads newly added TsFile files into IoTDB</td>
31+
</tr>
32+
</table>
33+
34+
## 2. import-data Scripts
35+
36+
- Supported formats: CSV, SQL
37+
38+
### 2.1 Command
39+
40+
```Bash
41+
# Unix/OS X
42+
>tools/import-data.sh -h <ip> -p <port> -u <username> -pw <password> -s <xxx.csv/sql> [-fd <./failedDirectory> -aligned <true/false> -batch <int> -tp <ms/ns/us> -typeInfer <boolean=text,float=double...> -lpf <int>]
43+
44+
# Windows
45+
>tools\import-data.bat -h <ip> -p <port> -u <username> -pw <password> -s <xxx.csv/sql> [-fd <./failedDirectory> -aligned <true/false> -batch <int> -tp <ms/ns/us> -typeInfer <boolean=text,float=double...> -lpf <int>]
46+
```
47+
48+
### 2.2 Parameter Introduction
49+
50+
51+
| **Parameter** | **Definition** | **Required** | **Default** |
52+
| --------- | ------------------------------------------------------------ | ------------ | ------------------------ |
53+
| -h | Hostname | No | 127.0.0.1 |
54+
| -p | Port | No | 6667 |
55+
| -u | Username | No | root |
56+
| -pw | Password | No | root |
57+
| -s        | Specify the data to import, either a file or a folder. If a folder is specified, all files in it with a `csv` or `sql` suffix are imported in batch (in V1.3.2, the parameter is `-f`) | Yes          |                          |
58+
| -fd       | Specify the directory for storing failed SQL files. If this parameter is not specified, failed files are saved in the source data directory. Note: unsupported, illegal, and failed SQL statements are written to a failed file in that directory (by default, the source file name with a `.failed` suffix) | No           |The source filename with `.failed` suffix |
59+
| -aligned | Specify whether to use the `aligned` interface, options are true or false. Note: This parameter is only effective when importing csv files. | No | false |
60+
| -batch    | Used to specify the number of data points per batch (minimum value is 1, maximum value is `Integer.MAX_VALUE`). If the program reports the error `org.apache.thrift.transport.TTransportException: Frame size larger than protect max size`, you can appropriately reduce this parameter. | No           | 100000                   |
61+
| -tp | Specify the time precision, options include `ms` (milliseconds), `ns` (nanoseconds), `us` (microseconds) | No | ms |
62+
| -lpf | Specify the number of data lines written per failed file (In V1.3.2, the parameter is `-linesPerFailedFile`) | No | 10000 |
63+
| -typeInfer | Specify type inference rules in the form `srcTsDataType1=dstTsDataType1,srcTsDataType2=dstTsDataType2,...`. `srcTsDataType` is one of `boolean`, `int`, `long`, `float`, `double`, `NaN`. `dstTsDataType` is one of `boolean`, `int`, `long`, `float`, `double`, `text`. When `srcTsDataType` is `boolean`, `dstTsDataType` can only be `boolean` or `text`. When `srcTsDataType` is `NaN`, `dstTsDataType` can only be `float`, `double`, or `text`. When `srcTsDataType` is a numerical type, the precision of `dstTsDataType` needs to be higher than `srcTsDataType`. For example: `-typeInfer boolean=text,float=double` | No           |                          |
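The `-typeInfer` constraints above can be checked before the script is invoked. The following is a minimal Python sketch of such a check, not the tool's own parser: the function name `parse_type_infer` is hypothetical, and the numeric precision rule is deliberately left out because the table does not spell out the ordering of the numeric types.

```python
from typing import Dict

SRC_TYPES = {"boolean", "int", "long", "float", "double", "NaN"}
DST_TYPES = {"boolean", "int", "long", "float", "double", "text"}

# Target types the parameter table allows for the two non-numeric sources.
RESTRICTED_TARGETS = {
    "boolean": {"boolean", "text"},
    "NaN": {"float", "double", "text"},
}


def parse_type_infer(rules: str) -> Dict[str, str]:
    """Parse 'src=dst,src=dst' and check it against the documented constraints.

    Hypothetical helper: the numeric precision rule is NOT checked here.
    """
    mapping: Dict[str, str] = {}
    for rule in rules.split(","):
        src, sep, dst = rule.strip().partition("=")
        if not sep:
            raise ValueError(f"expected src=dst, got {rule!r}")
        if src not in SRC_TYPES:
            raise ValueError(f"unknown source type {src!r}")
        if dst not in DST_TYPES:
            raise ValueError(f"unknown target type {dst!r}")
        allowed = RESTRICTED_TARGETS.get(src)
        if allowed is not None and dst not in allowed:
            raise ValueError(f"{src} can only be mapped to {sorted(allowed)}")
        mapping[src] = dst
    return mapping
```

For instance, `parse_type_infer("boolean=text,float=double")` accepts the example from the table, while `NaN=int` is rejected.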
64+
65+
66+
### 2.3 Running Example
67+
68+
69+
- Import the `dump0_0.sql` data in the current `data` directory to the local IoTDB database.
70+
71+
```Bash
72+
# Unix/OS X
73+
>tools/import-data.sh -s ./data/dump0_0.sql
74+
# Windows
75+
>tools/import-data.bat -s ./data/dump0_0.sql
76+
```
77+
78+
- Import all data in the current `data` directory in an aligned manner to the local IoTDB database.
79+
80+
```Bash
81+
# Unix/OS X
82+
>tools/import-data.sh -s ./data/ -fd ./failed/ -aligned true
83+
# Windows
84+
>tools/import-data.bat -s ./data/ -fd ./failed/ -aligned true
85+
```
86+
87+
- Import the `dump0_0.csv` data in the current `data` directory to the local IoTDB database.
88+
89+
```Bash
90+
# Unix/OS X
91+
>tools/import-data.sh -s ./data/dump0_0.csv -fd ./failed/
92+
# Windows
93+
>tools/import-data.bat -s ./data/dump0_0.csv -fd ./failed/
94+
```
95+
96+
- Import the `dump0_0.csv` data in the current `data` directory in an aligned manner into the IoTDB database on the host with IP `192.168.100.1`, with a batch size of 100000, recording failures in the current `failed` directory with a maximum of 1000 lines per file.
97+
98+
99+
```Bash
100+
# Unix/OS X
101+
>tools/import-data.sh -h 192.168.100.1 -p 6667 -u root -pw root -s ./data/dump0_0.csv -fd ./failed/ -aligned true -batch 100000 -tp ms -typeInfer boolean=text,float=double -lpf 1000
102+
# Windows
103+
>tools/import-data.bat -h 192.168.100.1 -p 6667 -u root -pw root -s ./data/dump0_0.csv -fd ./failed/ -aligned true -batch 100000 -tp ms -typeInfer boolean=text,float=double -lpf 1000
104+
```
105+
106+
107+
## 3. load-tsfile Script
108+
109+
- Supported formats: TsFile
110+
111+
### 3.1 Command
112+
113+
```Bash
114+
# Unix/OS X
115+
>tools/load-tsfile.sh -h <ip> -p <port> -u <username> -pw <password> -s <source> -os <on_success> [-sd <success_dir>] -of <on_fail> [-fd <fail_dir>] [-tn <thread_num>]
116+
117+
# Windows
118+
>tools\load-tsfile.bat -h <ip> -p <port> -u <username> -pw <password> -s <source> -os <on_success> [-sd <success_dir>] -of <on_fail> [-fd <fail_dir>] [-tn <thread_num>]
119+
```
120+
121+
### 3.2 Parameter Introduction
122+
123+
124+
| **Parameter** | **Description** | **Required** | **Default** |
125+
| -------- | ------------------------------------------------------------ | ----------------------------------- | ------------------- |
126+
| -h       | Hostname                                                     | No                                  | 127.0.0.1           |
127+
| -p       | Port                                                         | No                                  | 6667                |
128+
| -u       | Username                                                     | No                                  | root                |
129+
| -pw      | Password                                                     | No                                  | root                |
130+
| -s       | The local path of the TsFile file (or folder) to be loaded | Yes                                 |                     |
131+
| -os      | Action for files that load successfully. <br> none: Do not delete <br> mv: Move successful files to the target folder <br> cp: Hard link (copy) successful files to the target folder <br> delete: Delete | Yes                                 |                     |
132+
| -sd      | The target folder for mv or cp when `-os` (`--on_success`) is mv or cp. The target file name is the flattened source folder path concatenated with the original file name. | Required when `-os` is mv or cp | ${EXEC_DIR}/success |
133+
| -of      | Action for files that fail to load. <br> none: Skip <br> mv: Move failed files to the target folder <br> cp: Hard link (copy) failed files to the target folder <br> delete: Delete | Yes                                 |                     |
134+
| -fd      | The target folder for mv or cp when `-of` (`--on_fail`) is mv or cp. The target file name is the flattened source folder path concatenated with the original file name. | Required when `-of` is mv or cp | ${EXEC_DIR}/fail    |
135+
| -tn      | Maximum number of parallel threads | No                                 | 8                   |
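The dependency between the action flags and their directory flags can be made explicit when the command line is assembled programmatically. Below is a minimal Python sketch under stated assumptions: `build_load_tsfile_args` is a hypothetical helper, not part of IoTDB, and it treats a missing `-sd`/`-fd` as an error whenever the action is `mv` or `cp`, following the Required column rather than relying on the listed default directories.

```python
from typing import List, Optional

ACTIONS = ("none", "mv", "cp", "delete")


def build_load_tsfile_args(
    host: str,
    port: int,
    user: str,
    password: str,
    source: str,
    on_success: str,
    on_fail: str,
    success_dir: Optional[str] = None,
    fail_dir: Optional[str] = None,
    thread_num: Optional[int] = None,
    script: str = "tools/load-tsfile.sh",
) -> List[str]:
    """Return the argument list for one load-tsfile invocation (hypothetical helper)."""
    args = [script, "-h", host, "-p", str(port), "-u", user,
            "-pw", password, "-s", source]
    # Pair each action flag with the directory flag it may require.
    for flag, action, dir_flag, directory in (
        ("-os", on_success, "-sd", success_dir),
        ("-of", on_fail, "-fd", fail_dir),
    ):
        if action not in ACTIONS:
            raise ValueError(f"{flag} must be one of {ACTIONS}, got {action!r}")
        args += [flag, action]
        # A directory is only passed (and only required) for mv and cp.
        if action in ("mv", "cp"):
            if directory is None:
                raise ValueError(f"{dir_flag} is required when {flag} is {action}")
            args += [dir_flag, directory]
    if thread_num is not None:
        args += ["-tn", str(thread_num)]
    return args
```

The returned list can be passed to `subprocess.run` as is; building a list instead of a single string avoids shell quoting problems with paths that contain spaces.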
136+
137+
138+
139+
### 3.3 Running Examples
140+
141+
142+
```Bash
143+
# Unix/OS X
144+
> tools/load-tsfile.sh -h 127.0.0.1 -p 6667 -u root -pw root -s /path/sql -os delete -of delete -tn 8
145+
> tools/load-tsfile.sh -h 127.0.0.1 -p 6667 -u root -pw root -s /path/sql -os mv -of cp -sd /path/success/dir -fd /path/failure/dir -tn 8
146+
147+
# Windows
148+
> tools\load-tsfile.bat -h 127.0.0.1 -p 6667 -u root -pw root -s /path/sql -os mv -of cp -sd /path/success/dir -fd /path/failure/dir -tn 8
149+
> tools\load-tsfile.bat -h 127.0.0.1 -p 6667 -u root -pw root -s /path/sql -os delete -of delete -tn 8
150+
```
151+
152+
## 4. TsFile Active Listening & Loading Feature
153+
154+
The TsFile Active Listening & Loading Feature monitors the user-configured target path for TsFile changes and automatically synchronizes TsFile files from that path to the user-configured receiving path. IoTDB detects and loads these files automatically, with no manual load operation required. This simplifies the user's workflow and reduces the errors that manual steps can introduce.
155+
156+
![](https://alioss.timecho.com/docs/img/Data-import2.png)
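The behavior described in this section (poll the listening directories including subdirectories, delete a file after a successful load, move it to the failure directory otherwise) can be illustrated with a short Python sketch. This is an illustration of the documented behavior, not IoTDB's implementation: `poll_once` and the `load_fn` callback are hypothetical, and matching files by a `.tsfile` suffix is an assumption.

```python
import shutil
from pathlib import Path
from typing import Callable, Iterable, List, Union

PathLike = Union[str, Path]


def poll_once(
    listening_dirs: Iterable[PathLike],
    fail_dir: PathLike,
    load_fn: Callable[[Path], None],
) -> List[Path]:
    """Run one polling pass and return the files that loaded successfully.

    load_fn is a hypothetical callback that loads one file and raises on failure.
    """
    loaded: List[Path] = []
    fail_path = Path(fail_dir)
    for root in listening_dirs:
        # rglob also descends into subdirectories; sorted() takes a snapshot
        # of the listing before any file is moved or deleted.
        for tsfile in sorted(Path(root).rglob("*.tsfile")):
            try:
                load_fn(tsfile)
            except Exception:
                # Failed files are transferred to the single failure directory.
                fail_path.mkdir(parents=True, exist_ok=True)
                shutil.move(str(tsfile), str(fail_path / tsfile.name))
            else:
                # Successfully loaded files are deleted so they are not reloaded.
                tsfile.unlink()
                loaded.append(tsfile)
    return loaded
```

A caller would invoke `poll_once` in a loop and sleep for the configured check interval between passes.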
157+
158+
159+
### 4.1 Configuration Parameters
160+
161+
You can enable the TsFile Active Listening & Loading Feature by finding the following parameters in the configuration file template `iotdb-system.properties.template` and adding them to the IoTDB configuration file `iotdb-system.properties`. The complete configuration is as follows:
162+
163+
164+
| **Configuration Parameter** | **Description** | **Value Range** | **Required** | **Default Value** | **Loading Method** |
165+
| -------------------------------------------- | ------------------------------------------------------------ | -------------------------- | ------------ | ---------------------- | ---------------- |
166+
| load_active_listening_enable | Whether to enable the DataNode's active listening and loading of tsfile functionality (default is enabled). | Boolean: true,false | Optional | true | Hot Loading |
167+
| load_active_listening_dirs                   | The directories to listen to (subdirectories are included automatically). Separate multiple directories with `,`. The default directory is ext/load/pending (supports hot loading). | String: one or more file directories | Optional     | ext/load/pending       | Hot Loading      |
168+
| load_active_listening_fail_dir               | The directory that TsFile files are moved to when loading fails. Only one directory can be configured. | String: one file directory | Optional     | ext/load/failed        | Hot Loading      |
169+
| load_active_listening_max_thread_num         | The maximum number of threads that run TsFile loading tasks simultaneously. When the parameter is commented out, the default value is max(1, CPU core count / 2). When the user sets a value outside the range [1, CPU core count / 2], it is reset to the default value max(1, CPU core count / 2). | Long: [1, Long.MAX_VALUE]  | Optional     | max(1, CPU core count / 2) | Effective after restart |
170+
| load_active_listening_check_interval_seconds | Active listening polling interval in seconds. Active listening for TsFile is implemented by polling the folders. This setting specifies the interval between two checks of load_active_listening_dirs: the next check runs load_active_listening_check_interval_seconds seconds after each check. When the user sets the polling interval to less than 1, it is reset to the default value of 5 seconds. | Long: [1, Long.MAX_VALUE]  | Optional     | 5                      | Effective after restart |
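The fallback rules for the last two parameters can be written out as a small Python sketch. The function names are hypothetical, and "CPU core count / 2" is assumed to mean integer division; the sketch mirrors the descriptions in the table rather than IoTDB's source code.

```python
from typing import Optional

DEFAULT_CHECK_INTERVAL_SECONDS = 5


def effective_max_thread_num(configured: Optional[int], cpu_cores: int) -> int:
    """Resolve load_active_listening_max_thread_num (None means commented out)."""
    upper = cpu_cores // 2  # assumption: integer division
    default = max(1, upper)
    # Values outside [1, CPU core count / 2] fall back to the default.
    if configured is None or not (1 <= configured <= upper):
        return default
    return configured


def effective_check_interval(configured: Optional[int]) -> int:
    """Resolve load_active_listening_check_interval_seconds."""
    # Intervals below 1 second fall back to the default of 5 seconds.
    if configured is None or configured < 1:
        return DEFAULT_CHECK_INTERVAL_SECONDS
    return configured
```

On an 8-core machine, for example, a configured thread count of 16 resolves to 4, and on a single-core machine every setting resolves to 1 because the valid range is empty.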
171+
172+
173+
### 4.2 Precautions
174+
175+
1. If there is a mods file in the files to be loaded, the mods file should be moved to the listening directory first, and then the tsfile files should be moved, with the mods file and the corresponding tsfile file in the same directory. This prevents the loading of tsfile files without the corresponding mods files.
176+
177+
178+
```SQL
179+
FUNCTION moveFilesToListeningDirectory(sourceDirectory, listeningDirectory)
180+
// Move mods files
181+
modsFiles = searchFiles(sourceDirectory, "*mods*")
182+
IF modsFiles IS NOT EMPTY
183+
FOR EACH file IN modsFiles
184+
MOVE(file, listeningDirectory)
185+
END FOR
186+
END IF
187+
188+
// Move tsfile files
189+
tsfileFiles = searchFiles(sourceDirectory, "*tsfile*")
190+
IF tsfileFiles IS NOT EMPTY
191+
FOR EACH file IN tsfileFiles
192+
MOVE(file, listeningDirectory)
193+
END FOR
194+
END IF
195+
END FUNCTION
196+
197+
FUNCTION searchFiles(directory, pattern)
198+
matchedFiles = []
199+
FOR EACH file IN directory.files
200+
IF file.name MATCHES pattern
201+
APPEND file TO matchedFiles
202+
END IF
203+
END FOR
204+
RETURN matchedFiles
205+
END FUNCTION
206+
207+
FUNCTION MOVE(sourceFile, targetDirectory)
208+
// Implement the logic of moving files from sourceFile to targetDirectory
209+
END FUNCTION
210+
```
211+
212+
2. Do not set the Pipe receiver directory, the data storage directory, or similar directories as the listening directory.
213+
214+
3. `load_active_listening_fail_dir` must not be the same directory as any directory in `load_active_listening_dirs`, and neither may be nested inside the other.
215+
216+
4. Ensure that the `load_active_listening_dirs` directory has sufficient permissions. Files are deleted after a successful load; without delete permission, the same files will be loaded repeatedly.
217+
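Precautions 1 and 3 can be sketched in Python as follows. This is an illustrative sketch, not tooling shipped with IoTDB: `move_for_loading` and `dirs_conflict` are hypothetical names, and the `*.mods` and `*.tsfile` suffix patterns are assumptions about how the files are named.

```python
import shutil
from pathlib import Path
from typing import List, Union

PathLike = Union[str, Path]


def dirs_conflict(a: PathLike, b: PathLike) -> bool:
    """Return True if the two directories are identical or one contains the other."""
    pa, pb = Path(a).resolve(), Path(b).resolve()
    return pa == pb or pa in pb.parents or pb in pa.parents


def move_for_loading(source_dir: PathLike, listening_dir: PathLike) -> List[Path]:
    """Move mods files first, then TsFiles, into the listening directory."""
    source, target = Path(source_dir), Path(listening_dir)
    mods_files = sorted(source.glob("*.mods"))
    tsfiles = sorted(source.glob("*.tsfile"))
    moved: List[Path] = []
    # Moving mods files first means a TsFile never appears in the listening
    # directory before its corresponding mods file.
    for path in mods_files + tsfiles:
        destination = target / path.name
        shutil.move(str(path), str(destination))
        moved.append(destination)
    return moved
```

Before enabling the feature, a check such as `dirs_conflict("ext/load/pending", "ext/load/failed")` can be run for every listening directory against the failure directory; it should return `False` for each pair.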
