UI4T architecture rehaul #999

@Ishankoradia

Description

The UI4T architecture currently has the following entities:

  • A model node, i.e. OrgDbtModel, which is a materialized table. Ideally every chain should start and end here.
  • An operation node, i.e. OrgDbtOperation, which is more like a configuration node that defines how the data should be transformed as it moves through it.
  • A DbtEdge that maps or joins two OrgDbtModel nodes.
  • The DAG (directed acyclic graph) is built from the above entities and sent to the client.

The architecture/codebase has become complex and might be difficult to scale for the following reasons:

  • We don't actually treat OrgDbtOperation as a node on the transform graph. There are no edges to or from an OrgDbtOperation node, which makes adding an operation, deleting an operation and rendering the DAG unnecessarily complicated.
  • transform_api.get_dbt_project_DAG is quite complex. It should be as simple as fetching all the edges and the nodes on them.
  • transform.get_operation has to read from a cryptic JSON config, dbt_operation.config, and figure out the edges to the operation from it.
  • If we ever want to save the state of the nodes (with coordinates), we won't be able to do it in this architecture.
  • Also, the schemas of the various operation configs (drop, union) are not typed, which makes it difficult to understand what's going on.

Proposed solution

  • There should be a concept of a CanvasNode or Node: a more generic node. The different types of node would inherit from it, i.e. OrgDbtModel extends Node and OrgDbtOperation extends Node.
  • A DbtEdge should connect two generic Node objects.
  • Generating the DAG then reduces to going through all the edges and finding the unique set of nodes they include.
  • Deleting a node from the canvas should be straightforward: delete the Node and handle the side effects based on its type, i.e. OrgDbtModel or OrgDbtOperation.
  • Currently OrgDbtOperation.config holds the information about the sources, i.e. config.input_models, in its cryptic JSON if the operation is the first in the chain. This can go away since the edges will now tell us this.
  • Make the schemas of the various operations typed, and see whether a more general schema could remove the need to create individual forms/functions for every new operation.
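
A rough sketch of how the generic node and edge idea might look, using plain Python dataclasses rather than the actual Django models. Everything here (Canvas, NodeType, the field names, the sample node names) is hypothetical and only meant to illustrate the proposal: edges connect generic nodes, the DAG is derived purely from the edges, an operation's inputs come from its incoming edges instead of config.input_models, and deleting a node also drops the edges that touch it.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Set, Tuple


class NodeType(Enum):
    MODEL = "model"          # stands in for OrgDbtModel
    OPERATION = "operation"  # stands in for OrgDbtOperation


@dataclass(frozen=True)
class Node:
    """Generic canvas node; models and operations are both nodes."""
    id: str
    node_type: NodeType
    name: str


@dataclass(frozen=True)
class Edge:
    """Directed edge between two generic nodes (assumed shape for DbtEdge)."""
    from_id: str
    to_id: str


@dataclass
class Canvas:
    nodes: Dict[str, Node] = field(default_factory=dict)
    edges: List[Edge] = field(default_factory=list)

    def add_node(self, node: Node) -> None:
        self.nodes[node.id] = node

    def add_edge(self, from_id: str, to_id: str) -> None:
        self.edges.append(Edge(from_id, to_id))

    def build_dag(self) -> Tuple[List[Node], List[Edge]]:
        """Walk all edges and collect the unique nodes they reference."""
        seen: Set[str] = set()
        dag_nodes: List[Node] = []
        for edge in self.edges:
            for node_id in (edge.from_id, edge.to_id):
                if node_id not in seen:
                    seen.add(node_id)
                    dag_nodes.append(self.nodes[node_id])
        return dag_nodes, list(self.edges)

    def input_ids(self, node_id: str) -> List[str]:
        """Sources of a node, read from edges rather than a JSON config."""
        return [e.from_id for e in self.edges if e.to_id == node_id]

    def delete_node(self, node_id: str) -> Node:
        """Remove a node and every edge touching it.

        The removed node is returned so the caller can run type-specific
        side effects (model vs. operation) based on node.node_type.
        """
        node = self.nodes.pop(node_id)
        self.edges = [
            e for e in self.edges if node_id not in (e.from_id, e.to_id)
        ]
        return node


# Example chain: model -> operation -> model
canvas = Canvas()
canvas.add_node(Node("m1", NodeType.MODEL, "raw_orders"))
canvas.add_node(Node("op1", NodeType.OPERATION, "drop_columns"))
canvas.add_node(Node("m2", NodeType.MODEL, "clean_orders"))
canvas.add_edge("m1", "op1")
canvas.add_edge("op1", "m2")
dag_nodes, dag_edges = canvas.build_dag()
```

In the real codebase this would presumably be a base model with OrgDbtModel and OrgDbtOperation linked to it, and canvas coordinates could live on the generic node. Note that a DAG derived only from edges leaves out nodes that have no edges at all, so isolated nodes would need separate handling if they should appear on the canvas.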