Merge pull request #160 from axoflow/4.17-prep

fekete-robert · web-flow · commit 37563dd831d9 · 2025-09-05T16:05:27.000+02:00
4.17 prep
diff --git a/config/_default/config.toml b/config/_default/config.toml
@@ -72,7 +72,7 @@ description = "Documentation for AxoSyslog, the scalable security data processor
   # The version number for the version of the docs represented in this doc set.
   # Used in the "version-banner" partial to display a version number for the 
   # current doc set.
-  version = "4.16.0"
+  version = "4.17.0"
   version_menu_canonicallinks = true
 
   # A link to latest version of the docs. Used in the "version-banner" partial to
@@ -172,9 +172,9 @@ description = "Documentation for AxoSyslog, the scalable security data processor
 [params.product]
 name = "AxoSyslog"
 abbrev = "AxoSyslog"
-version = "4.16"
-techversion = "4.16.0"
-configversion = "4.16"
+version = "4.17"
+techversion = "4.17.0"
+configversion = "4.17"
 syslog-ng = "syslog-ng"
 selinux = "SELinux"
 apparmor = "AppArmor"
diff --git a/content/chapter-destinations/clickhouse/_index.md b/content/chapter-destinations/clickhouse/_index.md
@@ -85,6 +85,28 @@ This destination has the following options:
 
 {{< include-headless "chunk/option-destination-hook.md" >}}
 
+## json-var()
+
+|          |              |
+| -------- | ------------ |
+| Type:    | string       |
+| Default: | empty string |
+
+Available in {{< product >}} 4.17 and later.
+
+*Description:* The `json-var()` option accepts either a JSON template or a variable containing a JSON string, and sends it to the ClickHouse server in Protobuf/JSON mixed mode ([`JSONEachRow` format](https://clickhouse.com/docs/interfaces/formats/JSONEachRow)). In this mode, type validation is performed by the ClickHouse server itself, so no Protobuf schema is required for communication. For example:
+
+```shell
+destination {
+  clickhouse (
+    ...
+    json-var(json("{\"ingest_time\":1755248921000000000, \"body\": \"test template\"}"))
+  );
+};
+```
+
+Using `json-var()` is mutually exclusive with the [`proto-var()`](#proto-var), [`server-side-schema()`](#server-side-schema), [`schema()`](#schema), and [`protobuf-schema()`](#protobuf-schema) options.
+
 {{< include-headless "chunk/option-destination-grpc-keep-alive.md" >}}
 
 {{% include-headless "chunk/option-destination-local-timezone.md" %}}
@@ -130,7 +152,7 @@ message CustomRecord {
 }
 ```
 
-Alternatively, you can set the schema with the [`schema()`](#schema) option, or use [proto-var()](#proto-var) to assign an already formatted object to the message.
+Alternatively, you can set the schema with the [`schema()`](#schema) option, use [`proto-var()`](#proto-var) to assign an already formatted object to the message, or use a JSON template with the [`json-var()`](#json-var) option.
 
 {{< include-headless "chunk/option-destination-proto-var.md" >}}
 
@@ -156,7 +178,7 @@ schema(
 )
 ```
 
-Alternatively, you can set the schema with the [`protobuf-schema()`](#protobuf-schema) option, or use [proto-var()](#proto-var) to assign an already formatted object to the message.
+Alternatively, you can set the schema with the [`protobuf-schema()`](#protobuf-schema) option, use [`proto-var()`](#proto-var) to assign an already formatted object to the message, or use a JSON template with the [`json-var()`](#json-var) option.
 
 You can find the available column types in the [official ClickHouse documentation](https://clickhouse.com/docs/en/sql-reference/data-types).
 
diff --git a/content/chapter-nonsequential-processing/_index.md b/content/chapter-nonsequential-processing/_index.md
@@ -10,7 +10,7 @@ By default, {{% param "product.abbrev" %}} processes log messages arriving from
 
 Sequential processing performs well if you have relatively many parallel connections, in which case it uses all the available CPU cores. However, if a small number of connections deliver a large number of messages, this behavior becomes a bottleneck.
 
-Starting with {{% param "product.abbrev" %}} version 4.3, {{% param "product.abbrev" %}} can split a stream of incoming messages into a set of partitions, which can be processed by multiple threads in parallel. Depending on how you partition the stream, you might lose the message ordering, but can scale the incoming load to all CPUs in the system, even if the entire load is coming from a single, chatty sender.
+Starting with {{% param "product.abbrev" %}} version 4.3, {{% param "product.abbrev" %}} can distribute a stream of incoming messages among a set of workers, so that multiple threads can process the stream in parallel. Depending on how you partition the stream, you might lose the message ordering, but can scale the incoming load to all CPUs in the system, even if the entire load is coming from a single, chatty sender.
 
 To enable this mode of execution, use the `parallelize()` element in your log path.
 
@@ -24,7 +24,7 @@ log {
       log-iw-size(10M) max-connections(10) log-fetch-limit(100000)
     );
   };
-  parallelize(partitions(4));
+  parallelize(workers(4));
 
   # from this part on, messages are processed in parallel even if
   # messages are originally coming from a single connection
@@ -34,7 +34,7 @@ log {
 };
 ```
 
-`parallelize()` uses round-robin to allocate messages to partitions by default, but you can retain ordering for a subset of messages with the `partition-key()` option. The `partition-key()` option specifies a template: messages that expand the template to the same value are mapped to the same partition. For example, you can partition messages based on their sender host:
+By default, `parallelize()` uses round-robin to allocate messages to workers (called partitions in versions 4.3 to 4.16), but you can retain ordering for a subset of messages with the `worker-partition-key()` option. The `worker-partition-key()` option specifies a template: messages that expand the template to the same value are mapped to the same worker. For example, you can partition messages based on their sender host:
 
 ```shell
 log {
@@ -44,7 +44,7 @@ log {
       log-iw-size(10M) max-connections(10) log-fetch-limit(100000)
     );
   };
-  parallelize(partitions(4) partition-key("$HOST"));
+  parallelize(workers(4) worker-partition-key("$HOST"));
 
   # from this part on, messages are processed in parallel if their
   # $HOST value differs. Messages with the same $HOST will be mapped
@@ -55,3 +55,5 @@ log {
   destination { ... };
 };
 ```
+
+Starting with {{< product >}} version 4.17, you can use the `batch-size()` option to specify how many consecutive messages a single `parallelize()` worker processes. Messages within a batch preserve their order on the destination side, and batching also improves the performance of `parallelize()`. We recommend a value of around 100 for `batch-size()`. Default value: `0` (batching is disabled).
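+
+The following is a minimal sketch, assuming that `batch-size()` is set inside `parallelize()` next to the `workers()` option. It assigns batches of 100 consecutive messages to each of the four workers:
+
+```shell
+log {
+  source { ... };
+  # Assumption: batch-size() is an option of parallelize(), like workers()
+  parallelize(workers(4) batch-size(100));
+
+  # Each worker receives batches of 100 consecutive messages,
+  # and the messages within a batch keep their relative order
+  destination { ... };
+};
+```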
diff --git a/content/filterx/_index.md b/content/filterx/_index.md
@@ -326,7 +326,40 @@ js = json({
 });
 ```
 
-To create a field only if the assigned value is non-null, see [Create dict element if non-null (`:??`)]({{< relref "/filterx/operator-reference.md#create-non-null" >}}).
+When working with dicts, note the following points:
+
+- To create a field only if the assigned value is non-null, see [Create dict element if non-null (`:??`)]({{< relref "/filterx/operator-reference.md#create-non-null" >}}).
+- To add a new key to a dict that already exists, use a simple assignment, for example:
+
+    ```shell
+    js = json({
+        "key1": "one",
+        "key2": "two"
+    });
+
+    js.key3 = "three";
+    ```
+
+    However, if several elements of the path don't exist, use the [`dpath`]({{< relref "/filterx/function-reference.md#dpath" >}}) FilterX function to create them, for example:
+
+    ```shell
+    dpath(js.key4.key41.key412) = "nested value";
+    ```
+
+    The value of the dictionary will be:
+
+    ```shell
+    js = json({
+        "key1": "one",
+        "key2": "two",
+        "key3": "three",
+        "key4": {
+            "key41": {
+                "key412": "nested value"
+            }
+        }
+    });
+    ```
 
 Within a FilterX block, you can access the fields of complex data types by using indexes and the dot notation, for example:
 
@@ -378,8 +411,9 @@ For details, see {{% xref "/filterx/operator-reference.md" %}}.
 FilterX has the following built-in functions.
 
 - [`cache_json_file`]({{< relref "/filterx/function-reference.md#cache-json-file" >}}): Loads an external JSON file to lookup contextual information.
-- [`endswith`]({{< relref "/filterx/filterx-string-search/_index.md" >}}): Checks if a string ends with the specified value.
 - [`dedup_metrics_labels`]({{< relref "/filterx/filterx-metrics/_index.md#metrics-labels" >}}): Deduplicate `metrics_labels` objects.
+- [`dpath`]({{< relref "/filterx/function-reference.md#dpath" >}}): Creates a nested path in a dictionary.
+- [`endswith`]({{< relref "/filterx/filterx-string-search/_index.md" >}}): Checks if a string ends with the specified value.
 - [`flatten`]({{< relref "/filterx/function-reference.md#flatten" >}}): Flattens the nested elements of an object.
 - [`format_cef`]({{< relref "/filterx/filterx-format-data/format-cef" >}}): Formats a dictionary into Common Event Format (CEF).
 - [`format_csv`]({{< relref "/filterx/filterx-format-data/format-csv.md" >}}): Formats a dictionary or a list into a comma-separated string.
diff --git a/content/filterx/filterx-parsing/key-value-parser/kv-parser-options/_index.md b/content/filterx/filterx-parsing/key-value-parser/kv-parser-options/_index.md
@@ -16,6 +16,24 @@ For example, to parse `key1=value1;key2=value2` pairs, use:
 ${MESSAGE} = parse_kv("key1=value1;key2=value2", pair_separator=";");
 ```
 
+## stray_words_append_to_value {#stray-words-append}
+
+Available in {{% param "product.abbrev" %}} 4.17 and later.
+
+If the `stray_words_append_to_value` flag is set, any stray words between the key-value pairs are appended to the preceding value. For example:
+
+```shell
+# input: a=b b=c d e f=g
+filterx {
+  ${MESSAGE} = parse_kv(${MESSAGE}, value_separator="=", pair_separator=" ", stray_words_append_to_value=true);
+};
+# The value of ${MESSAGE} will be: {"a":"b","b":"c d e","f":"g"}
+```
+
+If you want to collect the stray words into a separate key, see [`stray_words_key`](#stray-words-key).
+
+{{< include-headless "wnt/note-parse-kv-stray-values.md" >}}
+
 ## stray_words_key {#stray-words-key}
 
 Specifies the key where {{% param "product.abbrev" %}} stores any stray words that appear before or between the parsed key-value pairs. If multiple stray words appear in a message, then {{% param "product.abbrev" %}} stores them as a comma-separated list. Default value:`N/A`
@@ -35,6 +53,10 @@ ${PARSED_MESSAGE} = parse_kv(${MESSAGE}, stray_words_key="stray_words");
 
 The value of `${PARSED_MESSAGE}.stray_words` for this message will be: `["interzone-emtn_s1_vpn-enodeb_om", "inbound"]`
 
+If you want to append the stray words to the preceding values instead of storing them in a separate key, see [`stray_words_append_to_value`](#stray-words-append).
+
+{{< include-headless "wnt/note-parse-kv-stray-values.md" >}}
+
 ## value_separator
 
 Specifies the character that separates the keys from the values. Default value: `=`.
diff --git a/content/filterx/function-reference.md b/content/filterx/function-reference.md
@@ -71,6 +71,37 @@ Usually, you use the [strptime](#strptime) FilterX function to create datetime v
 
 Deduplicate `metrics_labels` objects. For details, see {{% xref "/filterx/filterx-metrics/_index.md#metrics-labels" %}}.
 
+## dpath
+
+Available in {{< product >}} 4.17 and later.
+
+Assigns a value to a dictionary and creates any elements of the path that don't exist. For example:
+
+```shell
+js = json({
+    "key1": "one",
+    "key2": "two",
+    "key3": "three"
+});
+
+dpath(js.key4.key41.key412) = "nested value";
+```
+
+The value of the dictionary will be:
+
+```shell
+js = json({
+    "key1": "one",
+    "key2": "two",
+    "key3": "three",
+    "key4": {
+        "key41": {
+            "key412": "nested value"
+        }
+    }
+});
+```
+
 ## endswith
 
 Available in {{< product >}} 4.9 and later.
diff --git a/content/filterx/operator-reference.md b/content/filterx/operator-reference.md
@@ -196,7 +196,7 @@ Is there a workaround for wildcards/globbing? /chapter-routing-filters/filters/r
 
 Available in {{< product >}} 4.15 and later.
 
-You can slice strings at the specified index using the `..` operator to get a section of the string. Indexing starts at 0, and must be non-negative. You can omit the index to refer to the beginning or the end of the string. For example:
+You can slice strings at the specified index using the `..` operator to get a section of the string. Indexing starts at 0. You can omit the index to refer to the beginning or the end of the string. For example:
 
 ```shell
 filterx {
@@ -213,6 +213,17 @@ filterx {
 };
 ```
 
+Starting with {{< product >}} version 4.17, you can use negative indexes to count characters from the end of the string, for example:
+
+```shell
+filterx {
+  str = "example";
+  str[..-2] == "examp";
+  str[-3..] == "ple";
+  str[2..-2] == "amp";
+};
+```
+
 ## Ternary conditional operator
 
 The [ternary conditional operator](https://en.wikipedia.org/wiki/Ternary_conditional_operator) evaluates an expression and returns the first argument if the expression is true, and the second argument if it's false.
diff --git a/content/headless/axosyslog-intro.md b/content/headless/axosyslog-intro.md
@@ -1,5 +1,6 @@
 ---
 ---
+<!-- This file is under the copyright of Axoflow, and licensed under Apache License 2.0, except for using the Axoflow and AxoSyslog trademarks. -->
 {{< include-headless "tagline.md" >}}
 {{< product >}} is a drop-in replacement for `syslog-ng`, created by the original creators of `syslog-ng`. (It started as a fork, branched after syslog-ng&trade; v4.7.1).
 
diff --git a/content/headless/wnt/note-parse-kv-stray-values.md b/content/headless/wnt/note-parse-kv-stray-values.md
@@ -0,0 +1,6 @@
+---
+---
+<!-- This file is under the copyright of Axoflow, and licensed under Apache License 2.0, except for using the Axoflow and AxoSyslog trademarks. -->
+{{% alert title="Note" color="info" %}}
+You cannot use `stray_words_append_to_value` and `stray_words_key` in the same parser.
+{{% /alert %}}
diff --git a/content/whats-new/_index.md b/content/whats-new/_index.md
@@ -6,6 +6,14 @@ weight: 10
 
 This page is a changelog that collects the major changes and additions to this documentation. (If you want to know the details about why we have separate documentation for AxoSyslog and how it relates to the `syslog-ng` documentation, read our [syslog-ng documentation and similarities with AxoSyslog Core](https://axoflow.com/blog/axosyslog-core-documentation-syslog-ng) blog post.)
 
+## Version 4.17 (2025-09-04)
+
+- The `parse_kv` FilterX function has a new option, [`stray_words_append_to_value`]({{< relref "/filterx/filterx-parsing/key-value-parser/kv-parser-options/_index.md#stray-words-append" >}}), to append stray words to the preceding value.
+- You can now use negative indexes when [slicing FilterX strings]({{< relref "/filterx/operator-reference.md#slicing" >}}).
+- The [`dpath`]({{< relref "/filterx/function-reference.md#dpath" >}}) FilterX function assigns a value to a dictionary and creates any elements of the path that don't exist.
+- When using `parallelize()` ({{% xref "/chapter-nonsequential-processing/_index.md" %}}), you can now set the `batch-size()` option to specify how many consecutive messages a single `parallelize()` worker processes.
+- For the `clickhouse()` destination, you can now use the [`json-var()` option]({{< relref "/chapter-destinations/clickhouse/_index.md#json-var" >}}) to send the message to the ClickHouse server in Protobuf/JSON mixed mode ([`JSONEachRow` format](https://clickhouse.com/docs/interfaces/formats/JSONEachRow)). In this mode, type validation is performed by the ClickHouse server itself, so no Protobuf schema is required for communication.
+
 ## Version 4.16 (2025-08-15)
 
 - New [`${PROTO_NAME` macro]({{< relref "/chapter-manipulating-messages/customizing-message-format/reference-macros/_index.md#proto-name" >}}).