Commit 0230476

Add documentation
1 parent 884526b commit 0230476

2 files changed, 59 additions and 3 deletions

docs/reference/source-serialization.md

Lines changed: 58 additions & 0 deletions
@@ -16,6 +16,9 @@ Source serialization refers to the process of (de)serializing POCO types in cons
 - [Registering custom `System.Text.Json` converters](#registering-custom-converters)
 - [Creating a custom `Serializer`](#creating-custom-serializers)
 - [Native AOT](#native-aot)
+- [Vector data serialization](#vector-data-serialization)
+- [Opt‑in on document properties](#optin-on-document-properties)
+- [Configure encodings globally](#configure-encodings-globally)
 
 ## Modeling documents with types [modeling-documents-with-types]
 
@@ -451,3 +454,58 @@ static void ConfigureOptions(JsonSerializerOptions o)
     o.TypeInfoResolver = UserTypeSerializerContext.Default;
 }
 ```
+
+## Vector data serialization [vector-data-serialization]
+
+Efficient ingestion of high-dimensional vectors often benefits from compact encodings rather than verbose JSON arrays. The client provides opt‑in converters for vector properties in your source documents that serialize to either hexadecimal or `base64` strings, depending on the vector type and the Elasticsearch version you target.
+
+- Float vectors can use `base64` starting from Elasticsearch 9.3.0.
+- Byte/bit vectors can use hexadecimal strings starting from Elasticsearch 8.14.0 and `base64` starting from Elasticsearch 9.3.0.
+- The legacy representation (JSON arrays) remains available for backwards compatibility.
+
+Base64 is the preferred format for high‑throughput indexing because it minimizes payload size and reduces JSON parsing overhead.
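
Reviewer note: the payload-size claim in the added text can be checked with a rough sketch like the one below. It is not taken from the client. `VectorPayloadSizes` and its methods are hypothetical helpers built only on `System.Text.Json`, `Convert`, and `MemoryMarshal`, and they only compare string lengths. The sketch uses the platform's byte order for the raw floats, which is an assumption that does not affect the size comparison, and `Convert.ToHexString` needs .NET 5 or later.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text.Json;

// Hypothetical helper for illustration only; not part of the Elasticsearch client.
public static class VectorPayloadSizes
{
    // Length of a float vector written as a plain JSON array, e.g. [0.1,0.2].
    public static int JsonArrayLength(float[] vector) =>
        JsonSerializer.Serialize(vector).Length;

    // Length of a float vector written as a quoted base64 string over its raw
    // 4-byte floats. Platform byte order is used here (an assumption); the
    // byte order has no effect on the resulting length.
    public static int Base64Length(float[] vector)
    {
        ReadOnlySpan<byte> raw = MemoryMarshal.AsBytes(vector.AsSpan());
        return Convert.ToBase64String(raw).Length + 2; // +2 for the JSON quotes
    }

    // Length of a byte vector written as a quoted hexadecimal string
    // (two characters per byte).
    public static int HexLength(byte[] vector) =>
        Convert.ToHexString(vector).Length + 2; // +2 for the JSON quotes
}
```

A float costs 4 raw bytes, so base64 needs roughly 5.3 characters per element regardless of value, while a JSON array element grows with the number of significant digits plus a separator. Calling `JsonArrayLength` and `Base64Length` on a representative embedding shows the difference for your own data.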
+
+### Opt‑in on document properties [optin-on-document-properties]
+
+Vector encodings are opt‑in. Apply a `System.Text.Json` `JsonConverter` attribute on the vector property of your POCO. For best performance, model the properties as `ReadOnlyMemory<T>`.
+
+```csharp
+using System;
+using System.Text.Json.Serialization;
+using Elastic.Clients.Elasticsearch.Serialization;
+
+public class ImageEmbedding
+{
+    [JsonConverter(typeof(FloatVectorDataConverter))] <1>
+    public ReadOnlyMemory<float> Vector { get; set; }
+}
+
+public class ByteSignature
+{
+    [JsonConverter(typeof(ByteVectorDataConverter))] <2>
+    public ReadOnlyMemory<byte> Signature { get; set; }
+}
+```
+
+1. `FloatVectorDataConverter` enables `base64` encoding for float vectors.
+2. `ByteVectorDataConverter` enables `base64` encoding for byte vectors.
+
+Without these attributes, vectors are serialized using the default source serializer behavior.
+
+### Configure encodings globally [configure-encodings-globally]
+
+When the opt‑in attributes are present, you can control the actual wire encoding globally via `ElasticsearchClient` settings on a per‑type basis:
+
+- `FloatVectorDataEncoding`: controls float vector encoding (legacy arrays or `base64`).
+- `ByteVectorDataEncoding`: controls byte/bit vector encoding (legacy arrays, hexadecimal, or `base64`).
+
+These settings allow a single set of document types to work against mixed clusters. For example, a library using the 8.19.x client can talk to both 8.x and 9.x servers and dynamically opt out of `base64` on older servers without maintaining duplicate POCOs (with/without converter attributes).
+
+::::{note}
+
+Set the encoding based on your effective server version:
+
+- Float vectors: use `base64` for 9.3.0+; otherwise use legacy arrays.
+- Byte/bit vectors: prefer `base64` for 9.3.0+; use hexadecimal for 8.14.0–9.2.x; otherwise use legacy arrays.
+
+::::
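
Reviewer note: the version rules in the note above could be captured in a small selection helper, sketched below. The enums `FloatWireFormat` and `ByteWireFormat` and the class `VectorEncodingChooser` are hypothetical stand-ins written for this comment. They are not the client's `FloatVectorDataEncoding` and `ByteVectorDataEncoding` types, whose members are not shown in this diff, so mapping the result onto the real settings is left to the caller.

```csharp
using System;

// Hypothetical local enums for illustration; not the client's own types.
public enum FloatWireFormat { LegacyArray, Base64 }
public enum ByteWireFormat { LegacyArray, Hex, Base64 }

// Hypothetical helper that applies the version thresholds from the note.
public static class VectorEncodingChooser
{
    // Thresholds quoted from the documentation text in this commit.
    private static readonly Version Base64Minimum = new Version(9, 3, 0);
    private static readonly Version HexMinimum = new Version(8, 14, 0);

    // Float vectors: base64 from 9.3.0, otherwise legacy JSON arrays.
    public static FloatWireFormat ForFloatVectors(Version server) =>
        server >= Base64Minimum ? FloatWireFormat.Base64 : FloatWireFormat.LegacyArray;

    // Byte/bit vectors: base64 from 9.3.0, hexadecimal from 8.14.0,
    // otherwise legacy JSON arrays.
    public static ByteWireFormat ForByteVectors(Version server)
    {
        if (server >= Base64Minimum) return ByteWireFormat.Base64;
        if (server >= HexMinimum) return ByteWireFormat.Hex;
        return ByteWireFormat.LegacyArray;
    }
}
```

Pass three-component versions such as `new Version(8, 14, 0)`. A two-component `Version(8, 14)` compares as lower than `8.14.0` in .NET, which would select legacy arrays at the boundary.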

src/Elastic.Clients.Elasticsearch/_Shared/Next/VectorConverters.cs

Lines changed: 1 addition & 3 deletions
@@ -9,9 +9,7 @@
 using System.Text.Json;
 using System.Text.Json.Serialization;
 
-using Elastic.Clients.Elasticsearch.Serialization;
-
-namespace Elastic.Clients.Elasticsearch;
+namespace Elastic.Clients.Elasticsearch.Serialization;
 
 /// <summary>
 /// The encoding to use when serializing vector data using the <see cref="FloatVectorDataConverter"/> converter.
