Auto Schema Generation #233
I think the issue revolves around the fact that the object I am passing into `forValue` has fields with arrays that have no root node. All of the items record types that are not getting named fall into that category and get skipped.

If my base JSON was formatted like the following, then I believe it would work. What would be nice is if there were some way to pass the parent's name to the type hook and ensure that it gets called in this specific case where the array objects have no root node. This is probably never an issue if the data begins its life as a C# or Java object that is serialized into JSON.
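As a hedged illustration (the field names here are invented), the shape in question is an array of bare objects, whose item record type has no natural name:

```json
{
  "measurements": [
    {"field1": 1.5, "field2": 2.5, "field3": 3}
  ]
}
```

Here the parent key `measurements` could supply a meaningful name for the otherwise anonymous item record (e.g. something like `measurements.record`).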
So I attempted using the example I posted in my last comment and that didn't work either. The resulting encoded file never includes any data.
Great find! There was indeed a bug where type options (which include the type hook) weren't properly passed when combining array types; f69c869 should fix this. Can you upgrade to
@mtth Okay, I updated to 5.4.9 and now the auto naming works across the schema, so based on my test that bug is fixed. It doesn't explain why the file never gets encoded, but I will follow up on that in issue 232. Thank you so much for the quick turnaround; it's very much appreciated.
I thought everything was fixed, but when I ran the resulting Avro file through C# it failed for a record type missing a name. I found another edge case and am trying to pin down exactly what it is.
@mtth Here are some snippets from my large file that didn't get names. I am not sure if this helps or not.

```json
{
  "type": "array",
  "items": ["string", {
    "type": "record",
    "fields": [
      {"name": "field1", "type": "float"},
      {"name": "field2", "type": "float"},
      {"name": "field3", "type": "int"}
    ]
  }]
}
```

```json
{
  "name": "someName",
  "type": {
    "type": "record",
    "fields": [
      {"name": "major", "type": "int"},
      {"name": "minor", "type": "int"},
      {"name": "patch", "type": "int"},
      {"name": "build", "type": "int"}
    ]
  }
}
```

```json
{
  "name": "anotherName",
  "type": {
    "type": "array",
    "items": {
      "type": "record",
      "fields": [
        {"name": "field1", "type": "float"},
        {"name": "nameTwo", "type": {
          "type": "record",
          "fields": [
            {"name": "major", "type": "int"}
          ]
        }}
      ]
    }
  }
}
```
Thanks! I just fixed another edge case where the options weren't wired through properly; can you upgrade to
@mtth That did the trick. I do have one suggestion, though I'm not sure if it's possible, as it relates to the type hook. It would be nice if you could enable something like the snippet below: if we knew the parent's name, we could build record names with more meaning. The jsonschema-avro project follows a similar convention for dealing with records that have no names. It would also be nice to be able to specify a name somehow when the root of the schema is a record. If you do decide to make that change, I would be happy to run it against my monster 6 MB file and review the resulting schema. Thanks a bunch for your help. Hopefully those edge cases help someone else out.

```javascript
function createNamingHook() {
  let index = 0;
  return function (schema, opts) {
    switch (schema.type) {
      case 'enum':
      case 'fixed':
      case 'record':
        if (!schema.name) {
          // `parentName` would be supplied by the library (this is
          // the feature being suggested); fall back to a counter.
          schema.name = parentName ? `${parentName}.record` : `Auto${index++}`;
        }
        break;
      default: // other types don't need names
    }
  };
}
```
Great that it works!

Unfortunately, this is tricky: children's schemas are generated before their parents' (so there is no "parent name" yet). An alternative could be to provide a path to the value which is currently being parsed, similar to what
@mtth It definitely sounds like it would work in my use case. I should be able to create a strategy to determine the name based on the path.
Great, I filed #234 to track adding it. If you are only using type-related logic, something like the following would then be possible:

```javascript
const {Type} = require('@avro/types');

const val = {
  one: [1, 2, 3],
  two: {
    three: 'four',
  },
  five: [
    {six: 6},
    {six: '6'},
  ],
};

function typeHook(schema, opts, scope) {
  // scope.path is the path to the type about to be generated.
}

const type = Type.forValue(val, {typeHook});
```
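Building on that, here is a hedged sketch of the kind of path-based naming strategy mentioned, assuming `scope.path` arrives as an array of property names and array indices (the API was still being designed at this point, so the details here are hypothetical):

```javascript
// Hypothetical path-based naming hook: derive a record name from
// the traversal path, e.g. ['five', '0'] -> 'FiveRecord'.
// Assumes `scope.path` is an array of keys and numeric indices.
function pathNamingHook(schema, opts, scope) {
  if (schema.type === 'record' && !schema.name) {
    // Drop numeric array indices, keep property names.
    const segments = (scope.path || []).filter((s) => isNaN(+s));
    schema.name = segments.length
      ? segments.map(capitalize).join('') + 'Record'
      : 'RootRecord';
  }
}

function capitalize(s) {
  return s.charAt(0).toUpperCase() + s.slice(1);
}
```

For the `five` array in the value above, the anonymous item record would come out as `FiveRecord` instead of an auto-generated counter name.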
@mtth Thanks, I will check out that example. A+
I am evaluating Avro ("avsc": "^5.4.7") for use with large objects that have mostly static fields, some of which are extremely complex and nested. The thought was to use the auto schema generation support and see where that would go.

In my first attempt I was able to encode and decode a large JSON payload without issue, until I tried opening it with another library and found that the sub record types were not getting named. This led me to issue 108, which describes a solution that made sense in my case, since the problem with the auto-generated schema was that sub record types were not being auto-named.

So I went down the route described in issue 108 and was able to encode a file but not decode it. Looking at the raw output I can see that the root record type name was set, but not those of the sub types.
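The naming-hook approach from issue 108 referred to above can be sketched as a pure function; the commented avsc usage lines are assumptions based on the versions mentioned, not verbatim from this thread:

```javascript
// Sketch of an issue-108-style auto-naming hook: give every
// anonymous named Avro type (record, enum, fixed) a generated name.
function createTypeHook() {
  let index = 0;
  return function (schema) {
    if (['record', 'enum', 'fixed'].includes(schema.type) && !schema.name) {
      schema.name = `Auto${index++}`;
    }
  };
}

// Assumed usage with avsc (^5.4), per the encode/decode round trip
// described above:
//   const avro = require('avsc');
//   const type = avro.Type.forValue(payload, {typeHook: createTypeHook()});
//   const buf = type.toBuffer(payload);
//   const decoded = type.fromBuffer(buf);
```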
I am attaching the sample code and a snippet of the generated schema before and after using the name hook.
Schema Before
Schema After (Look Towards the end)
The source JSON payload in this test is 8 MB, making it too large to post here. Any suggestions would be appreciated.
**Sample Code**