Commit a4bd20e

add function: approx_most_frequent
1 parent 39e4620 commit a4bd20e

4 files changed

+111
-26
lines changed

src/UserGuide/Master/Table/SQL-Manual/Basis-Function.md

Lines changed: 27 additions & 6 deletions
@@ -161,6 +161,7 @@ SELECT LEAST(temperature,humidity) FROM table2;
161161
| COUNT | Counts the number of data points. | All types | INT64 |
162162
| COUNT_IF | COUNT_IF(exp) counts the number of rows that satisfy a specified boolean expression. | `exp` must be a boolean expression,(e.g. `count_if(temperature>20)`) | INT64 |
163163
| APPROX_COUNT_DISTINCT | The APPROX_COUNT_DISTINCT(x[, maxStandardError]) function provides an approximation of COUNT(DISTINCT x), returning the estimated number of distinct input values. | `x`: The target column to be calculated, supports all data types.<br>`maxStandardError` (optional): Specifies the maximum standard error allowed for the function's result. Valid range is [0.0040625, 0.26]. Defaults to 0.023 if not specified. | INT64 |
164+
| APPROX_MOST_FREQUENT | The APPROX_MOST_FREQUENT(x, k, capacity) function is used to approximately calculate the top k most frequent elements in a dataset. It returns a JSON-formatted string where the keys are the element values and the values are their corresponding approximate frequencies. | `x` : The column to be calculated, supporting all existing data types in IoTDB;<br> `k`: The number of top-k most frequent values to return;<br>`capacity`: The number of buckets used for computation, which relates to memory usage—a larger value reduces error but consumes more memory, while a smaller value increases error but uses less memory. | STRING |
164165
| SUM | Calculates the sum. | INT32 INT64 FLOAT DOUBLE | DOUBLE |
165166
| AVG | Calculates the average. | INT32 INT64 FLOAT DOUBLE | DOUBLE |
166167
| MAX | Finds the maximum value. | All types | Same as input type |
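
The `capacity` trade-off described in the APPROX_MOST_FREQUENT row can be illustrated with a small counter-based sketch. This is only an illustration in the style of the Space-Saving algorithm; the commit does not say which algorithm IoTDB uses internally, and the function name `approx_most_frequent_sketch` and the sample readings are hypothetical.

```python
import json


def approx_most_frequent_sketch(values, k, capacity):
    """Illustrative top-k counter (Space-Saving style), NOT IoTDB's implementation.

    At most `capacity` counters are kept, so memory is bounded; when the
    counters are full, the smallest one is evicted and the newcomer inherits
    its count plus one, which can overestimate frequencies.
    """
    counters = {}
    for v in values:
        if v in counters:
            counters[v] += 1
        elif len(counters) < capacity:
            counters[v] = 1
        else:
            # Evict the least frequent tracked value; the new value takes over its count.
            victim = min(counters, key=counters.get)
            counters[v] = counters.pop(victim) + 1
    # Keep the k largest counters and serialize them as a compact JSON object.
    top = sorted(counters.items(), key=lambda kv: kv[1], reverse=True)[:k]
    return json.dumps({str(key): count for key, count in top}, separators=(",", ":"))


# Hypothetical readings, made up for illustration only.
readings = [85.0] * 6 + [90.0] * 5 + [88.0] * 2 + [86.0]
print(approx_most_frequent_sketch(readings, 2, 100))  # {"85.0":6,"90.0":5}
```

With `capacity` at least as large as the number of distinct values, the counts in this sketch are exact. Shrinking it (for example to 2 here) forces evictions, and the reported counts drift upward.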
@@ -251,8 +252,28 @@ Total line number = 1
251252
It costs 0.022s
252253
```
253254

255+
#### 2.3.5 Approx_most_frequent
254256

255-
#### 2.3.5 First
257+
Query the top 2 most frequent values in the `temperature` column of `table1`.
258+
259+
```sql
260+
IoTDB> select approx_most_frequent(temperature,2,100) as topk from table1;
261+
```
262+
263+
The execution result is as follows:
264+
265+
```sql
266+
+-------------------+
267+
| topk|
268+
+-------------------+
269+
|{"85.0":6,"90.0":5}|
270+
+-------------------+
271+
Total line number = 1
272+
It costs 0.064s
273+
```
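
Because the function returns its result as a JSON-formatted STRING, a client usually has to decode it before use. The following is a minimal sketch of that step, assuming the keys are numeric strings as in the example result; the helper name `parse_topk` is hypothetical and not part of any IoTDB client API.

```python
import json

# Result string copied from the example output above.
raw = '{"85.0":6,"90.0":5}'


def parse_topk(result):
    """Decode the JSON string into (value, approximate_count) pairs, most frequent first."""
    counts = json.loads(result)
    # Keys arrive as strings; converting to float assumes a numeric column.
    pairs = [(float(value), count) for value, count in counts.items()]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


print(parse_topk(raw))  # [(85.0, 6), (90.0, 5)]
```

For non-numeric columns the `float` conversion would need to be dropped or replaced.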
274+
275+
276+
#### 2.3.6 First
256277

257278
Finds the values with the smallest timestamp that are not NULL in the `temperature` and `humidity` columns.
258279

@@ -272,7 +293,7 @@ Total line number = 1
272293
It costs 0.170s
273294
```
274295

275-
#### 2.3.6 Last
296+
#### 2.3.7 Last
276297

277298
Finds the values with the largest timestamp that are not NULL in the `temperature` and `humidity` columns.
278299

@@ -292,7 +313,7 @@ Total line number = 1
292313
It costs 0.211s
293314
```
294315

295-
#### 2.3.7 First_by
316+
#### 2.3.8 First_by
296317

297318
Finds the `time` value of the row with the smallest timestamp that is not NULL in the `temperature` column, and the `humidity` value of the row with the smallest timestamp that is not NULL in the `temperature` column.
298319

@@ -312,7 +333,7 @@ Total line number = 1
312333
It costs 0.269s
313334
```
314335

315-
#### 2.3.8 Last_by
336+
#### 2.3.9 Last_by
316337

317338
Queries the `time` value of the row with the largest timestamp that is not NULL in the `temperature` column, and the `humidity` value of the row with the largest timestamp that is not NULL in the `temperature` column.
318339

@@ -332,7 +353,7 @@ Total line number = 1
332353
It costs 0.070s
333354
```
334355

335-
#### 2.3.9 Max_by
356+
#### 2.3.10 Max_by
336357

337358
Queries the `time` value of the row where the `temperature` column is at its maximum, and the `humidity` value of the row where the `temperature` column is at its maximum.
338359

@@ -352,7 +373,7 @@ Total line number = 1
352373
It costs 0.172s
353374
```
354375

355-
#### 2.3.10 Min_by
376+
#### 2.3.11 Min_by
356377

357378
Queries the `time` value of the row where the `temperature` column is at its minimum, and the `humidity` value of the row where the `temperature` column is at its minimum.
358379

src/UserGuide/latest-Table/SQL-Manual/Basis-Function.md

Lines changed: 27 additions & 6 deletions
@@ -161,6 +161,7 @@ SELECT LEAST(temperature,humidity) FROM table2;
161161
| COUNT | Counts the number of data points. | All types | INT64 |
162162
| COUNT_IF | COUNT_IF(exp) counts the number of rows that satisfy a specified boolean expression. | `exp` must be a boolean expression,(e.g. `count_if(temperature>20)`) | INT64 |
163163
| APPROX_COUNT_DISTINCT | The APPROX_COUNT_DISTINCT(x[, maxStandardError]) function provides an approximation of COUNT(DISTINCT x), returning the estimated number of distinct input values. | `x`: The target column to be calculated, supports all data types.<br>`maxStandardError` (optional): Specifies the maximum standard error allowed for the function's result. Valid range is [0.0040625, 0.26]. Defaults to 0.023 if not specified. | INT64 |
164+
| APPROX_MOST_FREQUENT | The APPROX_MOST_FREQUENT(x, k, capacity) function is used to approximately calculate the top k most frequent elements in a dataset. It returns a JSON-formatted string where the keys are the element values and the values are their corresponding approximate frequencies. | `x` : The column to be calculated, supporting all existing data types in IoTDB;<br> `k`: The number of top-k most frequent values to return;<br>`capacity`: The number of buckets used for computation, which relates to memory usage—a larger value reduces error but consumes more memory, while a smaller value increases error but uses less memory. | STRING |
164165
| SUM | Calculates the sum. | INT32 INT64 FLOAT DOUBLE | DOUBLE |
165166
| AVG | Calculates the average. | INT32 INT64 FLOAT DOUBLE | DOUBLE |
166167
| MAX | Finds the maximum value. | All types | Same as input type |
@@ -251,8 +252,28 @@ Total line number = 1
251252
It costs 0.022s
252253
```
253254

255+
#### 2.3.5 Approx_most_frequent
254256

255-
#### 2.3.5 First
257+
Query the top 2 most frequent values in the `temperature` column of `table1`.
258+
259+
```sql
260+
IoTDB> select approx_most_frequent(temperature,2,100) as topk from table1;
261+
```
262+
263+
The execution result is as follows:
264+
265+
```sql
266+
+-------------------+
267+
| topk|
268+
+-------------------+
269+
|{"85.0":6,"90.0":5}|
270+
+-------------------+
271+
Total line number = 1
272+
It costs 0.064s
273+
```
274+
275+
276+
#### 2.3.6 First
256277

257278
Finds the values with the smallest timestamp that are not NULL in the `temperature` and `humidity` columns.
258279

@@ -272,7 +293,7 @@ Total line number = 1
272293
It costs 0.170s
273294
```
274295

275-
#### 2.3.6 Last
296+
#### 2.3.7 Last
276297

277298
Finds the values with the largest timestamp that are not NULL in the `temperature` and `humidity` columns.
278299

@@ -292,7 +313,7 @@ Total line number = 1
292313
It costs 0.211s
293314
```
294315

295-
#### 2.3.7 First_by
316+
#### 2.3.8 First_by
296317

297318
Finds the `time` value of the row with the smallest timestamp that is not NULL in the `temperature` column, and the `humidity` value of the row with the smallest timestamp that is not NULL in the `temperature` column.
298319

@@ -312,7 +333,7 @@ Total line number = 1
312333
It costs 0.269s
313334
```
314335

315-
#### 2.3.8 Last_by
336+
#### 2.3.9 Last_by
316337

317338
Queries the `time` value of the row with the largest timestamp that is not NULL in the `temperature` column, and the `humidity` value of the row with the largest timestamp that is not NULL in the `temperature` column.
318339

@@ -332,7 +353,7 @@ Total line number = 1
332353
It costs 0.070s
333354
```
334355

335-
#### 2.3.9 Max_by
356+
#### 2.3.10 Max_by
336357

337358
Queries the `time` value of the row where the `temperature` column is at its maximum, and the `humidity` value of the row where the `temperature` column is at its maximum.
338359

@@ -352,7 +373,7 @@ Total line number = 1
352373
It costs 0.172s
353374
```
354375

355-
#### 2.3.10 Min_by
376+
#### 2.3.11 Min_by
356377

357378
Queries the `time` value of the row where the `temperature` column is at its minimum, and the `humidity` value of the row where the `temperature` column is at its minimum.
358379
