|
1 | 1 | ---
|
2 | 2 | title: Streaming Aggregation in OpenObserve
|
3 |
| -description: Learn how Streaming Aggregation works in OpenObserve Enterprise. |
| 3 | + |
| 4 | +description: Learn how streaming aggregation works in OpenObserve Enterprise. |
4 | 5 | ---
|
5 |
| -This page explains what Streaming Aggregation is and shows how to use it to improve query performance with aggregation cache in OpenObserve. |
| 6 | +This page explains what streaming aggregation is and shows how to use it to improve query performance with aggregation cache in OpenObserve. |
6 | 7 |
|
7 | 8 | > This is an enterprise feature.
|
8 | 9 |
|
@@ -46,31 +47,28 @@ This page explains what Streaming Aggregation is and shows how to use it to impr
|
46 | 47 | | `ZO_FEATURE_QUERY_STREAMING_AGGS` | Enables or disables streaming aggregation. When set to `true`, aggregation queries use the aggregation cache. | `true` |
|
47 | 48 | | `ZO_DATAFUSION_STREAMING_AGGS_CACHE_MAX_ENTRIES` | Defines the maximum number of cache entries stored for streaming aggregations. Controls how many partition results are retained. | `10000` |
|
48 | 49 | | `ZO_DISK_AGGREGATION_CACHE_MAX_SIZE` | Sets the maximum size for record batch cache on disk. By default, it is 10 percent of the local volume space, capped at 20 GB. | 10 percent of volume, up to 20 GB |
|
49 |
| - | `ZO_CACHE_DELAY_SECS` | Defines the number of seconds to wait before aggregation results become eligible to cache. | 300 secs | |
50 |
| - | `ZO_AGGREGATION_TOPK_ENABLED` | Enables the `approx_topk` function. | true | |
| 50 | + | `ZO_CACHE_DELAY_SECS` | Defines the number of seconds to wait before aggregation results become eligible to cache. | `300 secs` | |
| 51 | + | `ZO_AGGREGATION_TOPK_ENABLED` | Enables the `approx_topk` function. | `true` | |
51 | 52 |
|
52 | 53 |
|
53 | 54 | ---
|
54 | 55 |
|
55 | 56 | ## How does it work?
|
56 | 57 |
|
57 |
| - |
58 | 58 | **First run: partitioning and caching aggregate factors** <br>
|
59 | 59 | When an aggregation query runs for the first time, OpenObserve divides the requested time range into fixed-size partitions. Each partition is processed separately. Instead of storing the final aggregates, OpenObserve caches the factors required to compute the aggregate. For example, it caches sums and counts, which can later be combined to produce averages.
|
60 | 60 |
|
61 |
| - These results are cached on disk. This creates the initial **Aggregation cache** for the query stream. <br> |
62 |
| - |
| 61 | + These results are cached on disk. This creates the initial aggregation cache for the query stream. <br> |
| 62 | + |
63 | 63 | **Later runs: reuse of cached partitions**<br>
|
64 | 64 | When another query runs with the same stream, filters, and grouping, OpenObserve checks the cache. If the requested time range overlaps with existing partitions, it reuses the cached results and computes only the missing partitions. Results remain accurate because cached sums, counts, and other stored values can be combined with new results to compute the final aggregates.
|
65 | 65 |
|
66 | 66 | ---
|
67 | 67 |
|
68 |
| - |
69 | 68 | ## How does it handle late-arriving data?
|
70 |
| - |
71 | 69 | To handle late-arriving data, OpenObserve applies a delay window before marking aggregation results as eligible to cache.
|
72 | 70 | The system compares the query time with the end of the selected time range. If the end of the range falls within the delay window, the result is not cached. This ensures that results include all delayed records before being stored.
|
73 |
| - The delay window is configured through the environment variable `ZO_CACHE_DELAY_SECS`. The default value is 300 secs (5 minutes). You can adjust this value to match the ingestion delay in your environment. For example, if logs typically arrive with up to 10 minutes of delay, set the variable to 600 secs. |
| 71 | + The delay window is configured through the environment variable `ZO_CACHE_DELAY_SECS`. The default value is 300 secs. You can adjust this value to match the ingestion delay in your environment. For example, if logs typically arrive with up to 10 minutes of delay, set the variable to 600 secs. |
74 | 72 |
|
75 | 73 | ---
|
76 | 74 |
|
@@ -166,9 +164,10 @@ This page explains what Streaming Aggregation is and shows how to use it to impr
|
166 | 164 | GROUP BY x_axis_1
|
167 | 165 | ```
|
168 | 166 |
|
169 |
| - You can apply the aggregation query in any place where queries are executed, such as Logs or Dashboards. To measure load time, check cacheability, and verify cache usage, use your browser’s developer tools. Right-click the browser, select Inspect, open the Network tab, and filter by Fetch/XHR. |
| 167 | + You can apply the aggregation query in any place where queries are executed, such as Logs or Dashboards. To measure load time, check cacheability, and verify cache usage, use your browser’s developer tools. |
| 168 | + Right-click the browser, select **Inspect**, open the **Network** tab, and filter by **Fetch/XHR**. |
170 | 169 |
|
171 |
| - The following example is performed with Streaming Search enabled. Aggregation cache works the same when Streaming Search is disabled. |
| 170 | + The following example is performed with [Streaming Search](https://openobserve.ai/docs/user-guide/streams/summary-streams/) enabled. Aggregation cache works the same when Streaming Search is disabled. |
172 | 171 |
|
173 | 172 | **Step 1: Run the aggregation query** <br>
|
174 | 173 |
|
@@ -252,23 +251,19 @@ Streaming aggregation is enabled in all the following test runs:
|
252 | 251 |
|
253 | 252 | ## Limitations
|
254 | 253 |
|
255 |
| -- Very complex queries may not be eligible for cache reuse yet. Examples include joins, nested subqueries, heavy window functions, and large unions |
256 |
| -- The first run pays full computation cost to populate the cache |
| 254 | +- Very complex queries may not be eligible for cache reuse yet. Examples include joins, nested subqueries, heavy window functions, and large unions. |
| 255 | +- The first run pays full computation cost to populate the cache. |
257 | 256 | - Reuse depends on partition availability. Eviction due to capacity limits can reduce reuse.
|
258 | 257 |
|
259 | 258 | ---
|
260 | 259 |
|
261 | 260 | ## Troubleshooting
|
262 | 261 |
|
263 |
| -1. **Second run is not faster** |
264 |
| - |
265 |
| - - **Cause**: The query was not cacheable or the first run did not complete. |
266 |
| - - **Fix**: Align time windows and filters with the first run. Verify `streaming_aggs` and `streaming_id`. After a successful first run, confirm `result_cache_ratio` equals `100` on some partitions. |
| 262 | +- **Issue**: **Second run is not faster** |
| 263 | +- **Cause**: The query was not cacheable or the first run did not complete. |
| 264 | +- **Fix**: Align time windows and filters with the first run. Verify `streaming_aggs` and `streaming_id`. After a successful first run, confirm `result_cache_ratio` equals `100` on some partitions. |
267 | 265 |
|
268 |
| -2. **Different panels do not benefit** |
269 | 266 |
|
270 |
| - - **Cause**: Time windows or filters differ. |
271 |
| - - **Fix**: Use the same windows and the same filters across panels that analyze the same stream. |
272 | 267 |
|
273 | 268 |
|
274 | 269 |
|
|
0 commit comments