Add LinkML schema parser utility and corresponding unit test #17

Open
khanahmedm wants to merge 1 commit into develop from feature/linkml-parser
@khanahmedm
Collaborator

This PR introduces the new linkml_parser.py module for loading and converting LinkML schemas into Spark-compatible column definitions. It also includes a dedicated unit test suite to validate schema loading, type mapping, and remote/local path handling. No other ingestion logic is modified in this PR.
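
To make the description concrete, here is a minimal sketch of the kind of type mapping such a parser performs. It is not the code in linkml_parser.py: the mapping table, the function names `columns_for_class` and `to_ddl`, and the choice of Spark type for each LinkML range are all assumptions made for illustration. The schema is taken as an already-loaded dict (for example, the result of parsing the LinkML YAML), and the output is a Spark DDL-style column string.

```python
# Hypothetical LinkML range -> Spark SQL type table (an assumption, not the PR's mapping).
LINKML_TO_SPARK = {
    "string": "STRING",
    "integer": "INT",
    "float": "FLOAT",
    "double": "DOUBLE",
    "boolean": "BOOLEAN",
    "date": "DATE",
    "datetime": "TIMESTAMP",
}


def columns_for_class(schema: dict, class_name: str) -> list[tuple[str, str, bool]]:
    """Return (column name, Spark type, nullable) for each attribute of a class."""
    cls = schema["classes"][class_name]
    default_range = schema.get("default_range", "string")
    columns = []
    for name, slot in (cls.get("attributes") or {}).items():
        slot = slot or {}  # an attribute may be declared with no body
        linkml_range = slot.get("range", default_range)
        # Unknown ranges (e.g. references to other classes) fall back to STRING here.
        spark_type = LINKML_TO_SPARK.get(linkml_range, "STRING")
        if slot.get("multivalued"):
            spark_type = f"ARRAY<{spark_type}>"
        nullable = not slot.get("required", False)
        columns.append((name, spark_type, nullable))
    return columns


def to_ddl(columns: list[tuple[str, str, bool]]) -> str:
    """Render the columns as a DDL-style schema string."""
    return ", ".join(
        f"{name} {spark_type}" + ("" if nullable else " NOT NULL")
        for name, spark_type, nullable in columns
    )


example_schema = {
    "classes": {
        "Sample": {
            "attributes": {
                "id": {"range": "string", "required": True},
                "depth": {"range": "float"},
                "tags": {"multivalued": True},
            }
        }
    }
}

ddl = to_ddl(columns_for_class(example_schema, "Sample"))
# ddl == "id STRING NOT NULL, depth FLOAT, tags ARRAY<STRING>"
```

A real converter would also need to resolve inherited slots, enums, and class-valued ranges, which this sketch deliberately ignores.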

The full solution is available in the feature/full-solution branch.

@ialarmedalien
Collaborator

I would advise against adding the LinkML parser to the repo now, for the following reasons:

  1. It isn’t as fully featured as the one I’m using for the cdm-schema (https://github.com/kbase/cdm-schema/blob/main/linkml_to_pyspark.py).
  2. What we really want is the output, i.e. the PySpark schema (e.g. https://github.com/kbase/cdm-schema/blob/main/src/cdm_schema/kbase_cdm_pyspark.py).
  3. The LinkML parser would be better as a community-maintained resource in the linkml repo, where others can update and maintain it in a central location alongside the other LinkML converters.

I think it would be more useful to have an importer for PySpark schemas / Python modules like the one linked in point 2.
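
As a rough illustration of what such an importer could look like, the sketch below loads a Python module by name and returns a dict of table schemas from it. The function name `load_schemas`, the default attribute name `schema`, and the assumption that the module exposes a dict keyed by table name are all hypothetical; the layout of the generated module linked in point 2 should be checked before relying on any of them.

```python
import importlib
from typing import Any


def load_schemas(module_name: str, attr: str = "schema") -> dict[str, Any]:
    """Import `module_name` and return its table-name -> schema mapping.

    Assumes (hypothetically) that the module exposes a dict under `attr`,
    with one entry per table, e.g. a PySpark StructType for each table.
    """
    module = importlib.import_module(module_name)
    try:
        schemas = getattr(module, attr)
    except AttributeError as err:
        raise ValueError(
            f"module {module_name!r} has no attribute {attr!r}"
        ) from err
    if not isinstance(schemas, dict):
        raise TypeError(
            f"{module_name}.{attr} should be a dict, got {type(schemas).__name__}"
        )
    # Return a shallow copy so callers cannot mutate the module-level dict.
    return dict(schemas)
```

With this approach the ingestion code depends only on the generated schema module being importable, not on any LinkML tooling at runtime.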


2 participants