58 changes: 58 additions & 0 deletions docs/reference/source-serialization.md
Expand Up @@ -16,6 +16,9 @@ Source serialization refers to the process of (de)serializing POCO types in cons
- [Registering custom `System.Text.Json` converters](#registering-custom-converters)
- [Creating a custom `Serializer`](#creating-custom-serializers)
- [Native AOT](#native-aot)
- [Vector data serialization](#vector-data-serialization)
- [Opt‑in on document properties](#optin-on-document-properties)
- [Configure encodings globally](#configure-encodings-globally)

## Modeling documents with types [modeling-documents-with-types]

Expand Down Expand Up @@ -451,3 +454,58 @@ static void ConfigureOptions(JsonSerializerOptions o)
o.TypeInfoResolver = UserTypeSerializerContext.Default;
}
```

## Vector data serialization [vector-data-serialization]

Efficient ingestion of high-dimensional vectors often benefits from compact encodings rather than verbose JSON arrays. The client provides opt‑in converters for vector properties in your source documents that serialize to either hexadecimal or `base64` strings, depending on the vector type and the Elasticsearch version you target.

- Float vectors can use `base64` starting from Elasticsearch 9.3.0.
- Byte/bit vectors can use hexadecimal strings starting from Elasticsearch 8.14.0 and `base64` starting from Elasticsearch 9.3.0.
- The legacy representation (JSON arrays) remains available for backwards compatibility.

Base64 is the preferred format for high‑throughput indexing because it minimizes payload size and reduces JSON parsing overhead.
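
To make the size difference concrete, the following is a minimal sketch that uses only .NET base class library calls, not the client's converters. The class name `VectorEncodingSketch` and its helpers are hypothetical. The float helper encodes the raw bytes in the machine's native byte order, which is an assumption made only to compare lengths and is not a statement about the exact wire format Elasticsearch expects.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text.Json;

// Hypothetical helper class, for illustration only; not part of the client.
public static class VectorEncodingSketch
{
    // Hexadecimal form of a byte vector: two characters per byte.
    public static string ToHex(ReadOnlySpan<byte> vector) => Convert.ToHexString(vector);

    // Base64 form of a byte vector: four characters per three bytes.
    public static string ToBase64(ReadOnlySpan<byte> vector) => Convert.ToBase64String(vector);

    // Base64 over the raw float bytes (native byte order); used here only to compare lengths.
    public static string FloatsToBase64(ReadOnlySpan<float> vector) =>
        Convert.ToBase64String(MemoryMarshal.AsBytes(vector));

    public static void Main()
    {
        byte[] signature = { 1, 2, 3 };
        Console.WriteLine("[" + string.Join(",", signature) + "]"); // [1,2,3]
        Console.WriteLine(ToHex(signature));                        // 010203
        Console.WriteLine(ToBase64(signature));                     // AQID

        // A 384-dimensional float vector with arbitrary sample values.
        var embedding = new float[384];
        for (var i = 0; i < embedding.Length; i++)
        {
            embedding[i] = 0.123456f + i * 0.001f;
        }

        // Character count of the JSON array form versus the base64 form.
        Console.WriteLine(JsonSerializer.Serialize(embedding).Length);
        Console.WriteLine(FloatsToBase64(embedding).Length); // 2048 (1536 bytes * 4 / 3)
    }
}
```

The base64 length depends only on the number of dimensions, whereas the JSON array length grows with the number of digits needed to print each value.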

### Opt‑in on document properties [optin-on-document-properties]

Vector encodings are opt‑in. Apply a `System.Text.Json` `JsonConverter` attribute on the vector property of your POCO. For best performance, model the properties as `ReadOnlyMemory<T>`.

```csharp
using System;
using System.Text.Json.Serialization;
using Elastic.Clients.Elasticsearch.Serialization;

public class ImageEmbedding
{
    [JsonConverter(typeof(FloatVectorDataConverter))] <1>
    public ReadOnlyMemory<float> Vector { get; set; }
}

public class ByteSignature
{
    [JsonConverter(typeof(ByteVectorDataConverter))] <2>
    public ReadOnlyMemory<byte> Signature { get; set; }
}
```

1. `FloatVectorDataConverter` opts the property in to compact encoding for float vectors (`base64` by default).
2. `ByteVectorDataConverter` opts the property in to compact encoding for byte vectors (`base64` by default).

Without these attributes, vectors are serialized using the default source serializer behavior.

### Configure encodings globally [configure-encodings-globally]

When the opt‑in attributes are present, the actual wire encoding is controlled globally through the client settings, with a separate setting for each vector type. Both settings default to `base64`:

- `FloatVectorDataEncoding`: controls float vector encoding (legacy arrays or `base64`).
- `ByteVectorDataEncoding`: controls byte/bit vector encoding (legacy arrays, hexadecimal, or `base64`).

These settings allow a single set of document types to work against mixed clusters. For example, a library using the 8.19.x client can talk to both 8.x and 9.x servers and dynamically opt out of `base64` on older servers without maintaining duplicate POCOs (with/without converter attributes).
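
As a configuration sketch, the fluent setters added in this change could be used as follows when the target server is older than 9.3.0 but at least 8.14.0. The endpoint URL is a placeholder, and the snippet assumes the `Elastic.Clients.Elasticsearch` package is referenced.

```csharp
using System;
using Elastic.Clients.Elasticsearch;
using Elastic.Clients.Elasticsearch.Serialization;

// Placeholder endpoint; replace with your cluster URL.
var settings = new ElasticsearchClientSettings(new Uri("https://localhost:9200"))
    // Float vectors: base64 needs 9.3.0+, so fall back to JSON arrays.
    .FloatVectorDataEncoding(FloatVectorDataEncoding.Legacy)
    // Byte/bit vectors: hexadecimal strings are available from 8.14.0.
    .ByteVectorDataEncoding(ByteVectorDataEncoding.Hex);

var client = new ElasticsearchClient(settings);
```

The document types keep their converter attributes; only the settings differ between clusters.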

::::{note}

Set the encoding based on your effective server version:

- Float vectors: use `base64` for 9.3.0+; otherwise use legacy arrays.
- Byte/bit vectors: prefer `base64` for 9.3.0+; use hexadecimal for 8.14.0–9.2.x; otherwise use legacy arrays.

::::
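
The rules in the note can be sketched as a small version check. Everything in this block is hypothetical: the two enums are local stand-ins for the client's `FloatVectorDataEncoding` and `ByteVectorDataEncoding` so that the sketch compiles on its own, and `VectorEncodingSelector` is not part of the client. How the server version is obtained is left to the application.

```csharp
using System;

// Hypothetical stand-ins for the client's encoding enums.
public enum FloatEncodingChoice { Legacy, Base64 }
public enum ByteEncodingChoice { Legacy, Hex, Base64 }

// Hypothetical helper that maps a server version to an encoding.
public static class VectorEncodingSelector
{
    // Pass three-component versions: System.Version treats 9.3 as lower than 9.3.0.
    private static readonly Version Base64Minimum = new Version(9, 3, 0);
    private static readonly Version HexMinimum = new Version(8, 14, 0);

    // Float vectors: base64 for 9.3.0+, otherwise legacy arrays.
    public static FloatEncodingChoice ForFloat(Version server) =>
        server >= Base64Minimum ? FloatEncodingChoice.Base64 : FloatEncodingChoice.Legacy;

    // Byte/bit vectors: base64 for 9.3.0+, hexadecimal for 8.14.0 to 9.2.x, otherwise legacy arrays.
    public static ByteEncodingChoice ForByte(Version server)
    {
        if (server >= Base64Minimum) return ByteEncodingChoice.Base64;
        if (server >= HexMinimum) return ByteEncodingChoice.Hex;
        return ByteEncodingChoice.Legacy;
    }
}
```

In an application, the result would be translated to the corresponding client enum value and passed to the settings before the client is created.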
Original file line number Diff line number Diff line change
Expand Up @@ -115,6 +115,8 @@ public abstract class ElasticsearchClientSettingsBase<TConnectionSettings> :
private readonly Serializer _sourceSerializer;
private BeforeRequestEvent? _onBeforeRequest;
private bool _experimentalEnableSerializeNullInferredValues;
private FloatVectorDataEncoding _floatVectorDataEncoding = Serialization.FloatVectorDataEncoding.Base64;
private ByteVectorDataEncoding _byteVectorDataEncoding = Serialization.ByteVectorDataEncoding.Base64;
private ExperimentalSettings _experimentalSettings = new();

private bool _defaultDisableAllInference;
Expand Down Expand Up @@ -165,6 +167,8 @@ protected ElasticsearchClientSettingsBase(
FluentDictionary<Type, string> IElasticsearchClientSettings.RouteProperties => _routeProperties;
Serializer IElasticsearchClientSettings.SourceSerializer => _sourceSerializer;
BeforeRequestEvent? IElasticsearchClientSettings.OnBeforeRequest => _onBeforeRequest;
FloatVectorDataEncoding IElasticsearchClientSettings.FloatVectorDataEncoding => _floatVectorDataEncoding;
ByteVectorDataEncoding IElasticsearchClientSettings.ByteVectorDataEncoding => _byteVectorDataEncoding;
ExperimentalSettings IElasticsearchClientSettings.Experimental => _experimentalSettings;

bool IElasticsearchClientSettings.ExperimentalEnableSerializeNullInferredValues => _experimentalEnableSerializeNullInferredValues;
Expand Down Expand Up @@ -198,6 +202,18 @@ public TConnectionSettings DefaultFieldNameInferrer(Func<string, string> fieldNa
public TConnectionSettings ExperimentalEnableSerializeNullInferredValues(bool enabled = true) =>
Assign(enabled, (a, v) => a._experimentalEnableSerializeNullInferredValues = v);

/// <inheritdoc cref="IElasticsearchClientSettings.FloatVectorDataEncoding"/>
/// <param name="encoding">The default vector data encoding to use.</param>
/// <returns>This settings instance for chaining.</returns>
public TConnectionSettings FloatVectorDataEncoding(FloatVectorDataEncoding encoding) =>
Assign(encoding, (a, v) => a._floatVectorDataEncoding = v);

/// <inheritdoc cref="IElasticsearchClientSettings.ByteVectorDataEncoding"/>
/// <param name="encoding">The default vector data encoding to use.</param>
/// <returns>This settings instance for chaining.</returns>
public TConnectionSettings ByteVectorDataEncoding(ByteVectorDataEncoding encoding) =>
Assign(encoding, (a, v) => a._byteVectorDataEncoding = v);

public TConnectionSettings Experimental(ExperimentalSettings settings) =>
Assign(settings, (a, v) => a._experimentalSettings = v);

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,10 @@
using System;
using System.Collections.Generic;
using System.Reflection;

using Elastic.Clients.Elasticsearch.Requests;
using Elastic.Clients.Elasticsearch.Serialization;

using Elastic.Transport;

namespace Elastic.Clients.Elasticsearch;
Expand Down Expand Up @@ -116,14 +119,37 @@ public interface IElasticsearchClientSettings : ITransportConfiguration
BeforeRequestEvent? OnBeforeRequest { get; }

/// <summary>
/// This is an advanced setting which controls serialization behaviour for inferred properties such as ID, routing and index name.
/// <para>When enabled, it may reduce allocations on serialization paths where the cost can be more significant, such as in bulk operations.</para>
/// <para>As a by-product it may cause null values to be included in the serialized data and impact payload size. This will only be a concern should some
/// types not have inference mappings defined for the required properties.</para>
/// </summary>
/// <remarks>This is marked as experimental and may be removed or renamed in the future once its impact is evaluated.</remarks>
bool ExperimentalEnableSerializeNullInferredValues { get; }

/// <summary>
/// Controls the vector data encoding to use for <see cref="ReadOnlyMemory{T}"/> properties
/// in documents during ingestion when the <see cref="FloatVectorDataConverter"/> is used.
/// </summary>
/// <remarks>
/// Setting this value to <see cref="FloatVectorDataEncoding.Legacy"/> provides backwards
/// compatibility when talking to Elasticsearch servers with a version older than 9.3.0
/// (required for <see cref="FloatVectorDataEncoding.Base64"/>).
/// </remarks>
FloatVectorDataEncoding FloatVectorDataEncoding { get; }

/// <summary>
/// Controls the vector data encoding to use for <see cref="ReadOnlyMemory{T}"/> properties
/// in documents during ingestion when the <see cref="ByteVectorDataConverter"/> is used.
/// </summary>
/// <remarks>
/// Setting this value to <see cref="ByteVectorDataEncoding.Legacy"/> provides backwards
/// compatibility when talking to Elasticsearch servers with a version older than 8.14.0
/// (required for <see cref="ByteVectorDataEncoding.Hex"/>) or older than 9.3.0 (required
/// for <see cref="ByteVectorDataEncoding.Base64"/>).
/// </remarks>
ByteVectorDataEncoding ByteVectorDataEncoding { get; }

/// <summary>
/// Experimental settings.
/// </summary>
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -250,6 +250,21 @@ public static void WriteUnionValue<T1, T2>(this Utf8JsonWriter writer, JsonSeria
);
}

public static void WriteSpanValue<T>(this Utf8JsonWriter writer, JsonSerializerOptions options, ReadOnlySpan<T> span,
    JsonWriteFunc<T>? writeElement)
{
    writeElement ??= static (w, o, v) => WriteValue(w, o, v);

    writer.WriteStartArray();

    foreach (var element in span)
    {
        writeElement(writer, options, element);
    }

    writer.WriteEndArray();
}

#endregion Delegate Based Write Methods

#region Specialized Write Methods