README.md
Lines changed: 60 additions & 1 deletion
@@ -110,6 +110,17 @@ pre-commit install

 ### Create and Run Tests

+Set up the SSL file permissions:
+
+```bash
+chmod 0600 .ssl/*.key
+```
+
+Start the test databases using Docker Compose:
+```bash
+docker-compose up -d
+```
+
 Create tests within the `target_postgres/tests` subfolder and
 then run:

@@ -163,7 +174,7 @@ The below table shows how this tap will map between jsonschema datatypes and Pos
 | UNSUPPORTED | bit varying [ (n) ]|
 | boolean | boolean |
 | UNSUPPORTED | box |
-|UNSUPPORTED | bytea |
+|string with contentEncoding="base16" ([opt-in feature](#content-encoding-support))| bytea |
 | UNSUPPORTED | character [ (n) ]|
 | UNSUPPORTED | character varying [ (n) ]|
 | UNSUPPORTED | cidr |
@@ -204,6 +215,7 @@ The below table shows how this tap will map between jsonschema datatypes and Pos
 Note that while object types are mapped directly to jsonb, array types are mapped to a jsonb array.

 If a column has multiple jsonschema types, the following order is used to order Postgres types, from highest priority to lowest priority.
+- BYTEA
 - ARRAY(JSONB)
 - JSONB
 - TEXT
@@ -216,3 +228,50 @@ If a column has multiple jsonschema types, the following order is using to order
 - INTEGER
 - BOOLEAN
 - NOTYPE
+
+## Content Encoding Support
+
+JSON Schema supports the [`contentEncoding` keyword](https://json-schema.org/draft/2020-12/draft-bhutton-json-schema-validation-00#rfc.section.8.3), which can be used to specify the encoding of input string types.
+
+This target can detect content encoding clues in the schema to determine how to store the data in Postgres more efficiently.
+
+Content encoding interpretation is disabled by default. This is because the default config is meant to be as permissive as possible and does not make any assumptions about the data that could lead to data loss.
+
+However, if you know your data respects the advertised content encoding, you can enable this feature to get better performance and storage efficiency.
+
+To enable it, set the `interpret_content_encoding` option to `True`.
+
+### base16
+
+The string is encoded using the base16 encoding, as defined in [RFC 4648](https://datatracker.ietf.org/doc/html/rfc4648#section-8
+).
+
+Example schema:
+```json
+{
+  "type": "object",
+  "properties": {
+    "my_hex": {
+      "type": "string",
+      "contentEncoding": "base16"
+    }
+  }
+}
+```
+
+Data will be stored as a `bytea` in the database.
+
+Example data:
+```json
+# valid data
+{ "my_hex": "01AF" }
+{ "my_hex": "01af" }
+{ "my_hex": "1af" }
+{ "my_hex": "0x1234" }
+
+# invalid data
+{ "my_hex": " 0x1234 " }
+{ "my_hex": "House" }
+```
+
+For convenience, data prefixed with `0x` or containing an odd number of characters is supported although it's not part of the standard.
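The valid and invalid examples in the README text above can be read as a small decoding rule. The following is a minimal sketch of that rule, not the target's actual implementation: `decode_base16` is a hypothetical helper name, and left-padding an odd-length value with a zero digit is an assumption (the README only says odd lengths are supported).

```python
import binascii


def decode_base16(value: str) -> bytes:
    """Decode a base16 string into bytes.

    Hypothetical helper for illustration only; it is not taken from the
    target's source code.
    """
    # Tolerate an optional lowercase "0x" prefix, as in the README examples.
    digits = value[2:] if value.startswith("0x") else value
    if len(digits) % 2:
        # Assumption: an odd number of digits is left-padded with a zero.
        digits = "0" + digits
    try:
        # unhexlify rejects whitespace and any non-hexadecimal character.
        return binascii.unhexlify(digits)
    except ValueError as exc:  # binascii.Error is a subclass of ValueError
        raise ValueError(f"not valid base16 data: {value!r}") from exc
```

Under these assumptions, `"01AF"`, `"01af"` and `"1af"` all decode to the same two bytes, while `" 0x1234 "` and `"House"` raise `ValueError`, which matches the valid and invalid groups listed in the README.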