DataHub fails to ingest a json-schema when the schema declares a property as true #15421

@batuan

Describe the bug
When ingesting a JSON schema into a DataHub instance, ingestion fails with a TypeError if the schema declares a property as the boolean true.

To Reproduce
Steps to reproduce the behavior:

  1. Create a recipe file (json_schema.yaml):
source:
  type: json-schema
  config:
    path:  'debug.json'
    platform:  schemaregistry
    platform_instance: debug
    env: 'PROD'
    use_id_as_base_uri: true
    stateful_ingestion:
      enabled: false # recommended to have this turned on

sink:
    type: datahub-rest
    config:
      server: http://localhost:8080/
      token: .......
  2. Create the JSON schema file (debug.json):
{
    "$schema": "http://json-schema.org/draft-07/schema#",
    "description": "demo schema for datahub json schema parsing",
    "type": "object",
    "properties": {
      "id":{
        "type": "string"
      },
      "content": true
    },
    "examples": [
      {
        "id": "123456",
        "content": 5
      },
      {
        "id": "123456",
        "content": {
          "some_field": "some_value"
        }
      },
      {
        "id": "123456",
        "content": false
      }
    ]
}
  3. Run the ingestion from the command line: datahub ingest -c json_schema.yaml

Error

 Failed to process file *******ingest_sources\test1\debug.json
Traceback (most recent call last):
  File "*******.venv\lib\site-packages\datahub\ingestion\source\schema\json_schema.py", line 376, in get_workunits_internal
    yield from self._load_one_file(
  File "*******.venv\lib\site-packages\datahub\ingestion\source\schema\json_schema.py", line 274, in _load_one_file
    meta: models.SchemaMetadataClass = get_schema_metadata(
  File "*******.venv\lib\site-packages\datahub\ingestion\extractor\json_schema_util.py", line 672, in get_schema_metadata
    schema_fields = list(JsonSchemaTranslator.get_fields_from_schema(json_schema))
  File "*******.venv\lib\site-packages\datahub\ingestion\extractor\json_schema_util.py", line 636, in get_fields_from_schema
    yield from JsonSchemaTranslator.get_fields(
  File "*******.venv\lib\site-packages\datahub\ingestion\extractor\json_schema_util.py", line 600, in get_fields
    yield from generator.__get__(cls)(
  File "*******.venv\lib\site-packages\datahub\ingestion\extractor\json_schema_util.py", line 414, in _field_from_complex_type
    JsonSchemaTranslator._get_type_from_schema(field_schema),
  File "*******.venv\lib\site-packages\datahub\ingestion\extractor\json_schema_util.py", line 278, in _get_type_from_schema
    if Ellipsis in schema:
TypeError: argument of type 'bool' is not iterable
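
The last frame of the traceback can be reproduced without DataHub. The sketch below only assumes that the sub-schema for "content" reaches the membership check as the plain Python value that json.loads produces for true, which is the bool True rather than a dict.

```python
import json

# Trimmed version of the schema from the reproduction steps.
schema = json.loads('{"properties": {"id": {"type": "string"}, "content": true}}')

# JSON true is parsed into the Python bool True, not into a dict.
field_schema = schema["properties"]["content"]

try:
    # Same kind of membership test as the last line of the traceback.
    Ellipsis in field_schema
    raised = None
except TypeError as exc:
    raised = exc

print(type(field_schema).__name__, "->", type(raised).__name__)
```

A membership test works on a dict sub-schema such as {"type": "string"}, but a bool is not iterable, so the check raises before any type mapping happens.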

Expected Behavior:

  • In the JSON Schema specification, a property whose schema is the boolean true accepts any valid JSON value.

  • I expect the JSON schema to be loaded into DataHub without errors.
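
One possible way to get that behavior, as a sketch only: JSON Schema (draft-06 and later) defines the boolean schema true as equivalent to the empty schema {} and false as equivalent to {"not": {}}. The helper normalize_boolean_schema below is hypothetical and not an existing DataHub function, and where it would be called inside json_schema_util.py is an assumption.

```python
from typing import Any, Dict, Union


def normalize_boolean_schema(schema: Union[bool, Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical helper: convert a boolean JSON schema to its dict form."""
    if schema is True:
        return {}           # empty schema: accepts any valid JSON value
    if schema is False:
        return {"not": {}}  # accepts nothing
    return schema           # already a dict schema, leave untouched


# Properties from the schema in the reproduction steps.
properties = {"id": {"type": "string"}, "content": True}

normalized = {
    name: normalize_boolean_schema(sub_schema)
    for name, sub_schema in properties.items()
}
```

After normalization every sub-schema is a dict, so a membership test such as the one in the traceback no longer raises. How an untyped field should then be displayed in DataHub is a separate decision that this sketch does not cover.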

Thanks :))

Labels: bug
