
Clonkk commented Aug 25, 2025

No description provided.

CLAassistant commented Aug 25, 2025

CLA assistant check
All committers have signed the CLA.


ACTION NEEDED

Substrait follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

Clonkk changed the title from "Adding helper function to pretty-print plan" to "feat: pretty-printer for plan and expr" Aug 25, 2025
Clonkk (Author) commented Aug 25, 2025

Let me know if you need further modifications. The goal is to have a good foundation we can build on incrementally to offer a nice debugging experience with Substrait.

I realise this is a big PR, sorry about that, but getting the initial recursive function right took a while 😬

Clonkk (Author) commented Aug 25, 2025

Substrait Plan Pretty Printer: Architecture Diagram

A Mermaid diagram to help understand the flow a bit better:

graph TB
    %% Main Entry Points
    subgraph "Public Interface"
        PP[pretty_print_plan]
        PE[pretty_print_expression]
        SP[stringify_plan]
        SE[stringify_expression]
    end

    %% Core Printer Class
    subgraph "PlanPrinter Class"
        PPClass[PlanPrinter]
        Init[__init__]
        ColorDetect[_detect_color_support]
        Color[_color]
        IndentArrow[_get_indent_with_arrow]
        ResolveField[_resolve_field_name]
    end

    %% Main Streaming Methods
    subgraph "Core Streaming Methods"
        StreamPlan[_stream_plan]
        StreamRel[_stream_relation]
        StreamExpr[_stream_expression]
    end

    %% Relation Handlers
    subgraph "Relation Handlers"
        ReadRel[_stream_read_rel]
        FilterRel[_stream_filter_rel]
        ProjectRel[_stream_project_rel]
        AggregateRel[_stream_aggregate_rel]
        SortRel[_stream_sort_rel]
        JoinRel[_stream_join_rel]
        CrossRel[_stream_cross_rel]
        FetchRel[_stream_fetch_rel]
    end

    %% Expression Handlers
    subgraph "Expression Handlers"
        Literal[_stream_literal]
        Selection[_stream_selection]
        ScalarFunc[_stream_scalar_function]
        Cast[_stream_cast]
        IfThen[_stream_if_then]
        WindowFunc[_stream_window_function]
    end

    %% Specialized Handlers
    subgraph "Specialized Handlers"
        FuncArg[_stream_function_argument]
        MapLiteral[_stream_map_literal]
        LiteralValue[_stream_literal_value]
        TypeToString[_type_to_string]
    end

    %% Helper Methods
    subgraph "Helper Methods"
        GetFuncArgString[_get_function_argument_string]
    end

    %% Data Flow
    PP --> PPClass
    PE --> PPClass
    SP --> PPClass
    SE --> PPClass

    PPClass --> Init
    Init --> ColorDetect
    Init --> Color
    Init --> IndentArrow
    Init --> ResolveField

    PPClass --> StreamPlan
    StreamPlan --> StreamRel
    StreamRel --> ReadRel
    StreamRel --> FilterRel
    StreamRel --> ProjectRel
    StreamRel --> AggregateRel
    StreamRel --> SortRel
    StreamRel --> JoinRel
    StreamRel --> CrossRel
    StreamRel --> FetchRel

    ReadRel --> StreamExpr
    FilterRel --> StreamExpr
    ProjectRel --> StreamExpr
    AggregateRel --> StreamExpr
    SortRel --> StreamExpr
    JoinRel --> StreamExpr
    CrossRel --> StreamExpr
    FetchRel --> StreamExpr

    StreamExpr --> Literal
    StreamExpr --> Selection
    StreamExpr --> ScalarFunc
    StreamExpr --> Cast
    StreamExpr --> IfThen
    StreamExpr --> WindowFunc

    ScalarFunc --> FuncArg
    ScalarFunc --> MapLiteral
    ScalarFunc --> LiteralValue

    FuncArg --> MapLiteral
    FuncArg --> LiteralValue
    FuncArg --> ScalarFunc

    MapLiteral --> LiteralValue
    LiteralValue --> MapLiteral

    %% Color and Styling
    classDef public fill:#e1f5fe,stroke:#01579b,stroke-width:2px
    classDef core fill:#f3e5f5,stroke:#4a148c,stroke-width:2px
    classDef relation fill:#e8f5e8,stroke:#1b5e20,stroke-width:2px
    classDef expression fill:#fff3e0,stroke:#e65100,stroke-width:2px
    classDef specialized fill:#fce4ec,stroke:#880e4f,stroke-width:2px
    classDef helper fill:#f1f8e9,stroke:#33691e,stroke-width:2px

    class PP,PE,SP,SE public
    class PPClass,Init,ColorDetect,Color,IndentArrow,ResolveField,StreamPlan,StreamRel,StreamExpr core
    class ReadRel,FilterRel,ProjectRel,AggregateRel,SortRel,JoinRel,CrossRel,FetchRel relation
    class Literal,Selection,ScalarFunc,Cast,IfThen,WindowFunc expression
    class FuncArg,MapLiteral,LiteralValue,TypeToString specialized
    class GetFuncArgString helper

Flow Description

1. Entry Points

  • pretty_print_plan() / pretty_print_expression(): Print directly to stdout
  • stringify_plan() / stringify_expression(): Return formatted strings
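The split between the two pairs of entry points stays cheap if both share one streaming writer, with the stringify_* variant simply pointing it at an in-memory buffer. A minimal sketch, assuming plain dicts stand in for Substrait protobuf messages; `_write_plan` is a hypothetical helper, not the PR's actual implementation:

```python
import io
import sys
from typing import Any, Dict, Optional, TextIO

# Hypothetical stand-in: plain dicts instead of Substrait protobuf messages.
Plan = Dict[str, Any]


def _write_plan(plan: Plan, out: TextIO) -> None:
    """Shared streaming core: both entry points write through this function."""
    relations = plan.get("relations", [])
    out.write(f"Plan ({len(relations)} relation(s))\n")
    for rel in relations:
        out.write(f"-> {rel['type']}\n")


def pretty_print_plan(plan: Plan, out: Optional[TextIO] = None) -> None:
    """Write straight to a stream; sys.stdout is resolved at call time."""
    _write_plan(plan, sys.stdout if out is None else out)


def stringify_plan(plan: Plan) -> str:
    """Capture the same output in an in-memory buffer and return it."""
    buf = io.StringIO()
    _write_plan(plan, buf)
    return buf.getvalue()
```

Resolving sys.stdout inside the call, rather than as a default argument value, keeps tools such as contextlib.redirect_stdout working.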

2. Core Initialization

  • PlanPrinter class handles configuration and color detection
  • Auto-detects terminal color support with fallback to colorless mode
  • Sets up indentation patterns and field resolution
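Color auto-detection with a colorless fallback might look like the following. It assumes two common conventions that the PR may or may not follow: a non-empty NO_COLOR environment variable or TERM=dumb disables color, and color is only emitted when the stream is a TTY. `detect_color_support` and `ColorPrinter` are hypothetical names loosely mirroring _detect_color_support and _color in the diagram:

```python
import os
import sys
from typing import Optional, TextIO

# ANSI SGR codes; this palette is illustrative, not the PR's actual scheme.
_ANSI = {"red": "31", "green": "32", "yellow": "33", "blue": "34", "bold": "1"}


def detect_color_support(stream: Optional[TextIO] = None) -> bool:
    """Heuristic: color only on a real TTY, honouring NO_COLOR and TERM=dumb."""
    stream = sys.stdout if stream is None else stream
    if os.environ.get("NO_COLOR"):
        return False
    if os.environ.get("TERM") == "dumb":
        return False
    isatty = getattr(stream, "isatty", None)
    return bool(isatty and isatty())


class ColorPrinter:
    def __init__(self, use_colors: Optional[bool] = None,
                 stream: Optional[TextIO] = None) -> None:
        # An explicit setting wins; otherwise fall back to auto-detection.
        self.use_colors = (detect_color_support(stream)
                           if use_colors is None else use_colors)

    def color(self, text: str, name: str) -> str:
        """Wrap text in an ANSI color code, or return it unchanged."""
        if not self.use_colors or name not in _ANSI:
            return text
        return f"\033[{_ANSI[name]}m{text}\033[0m"
```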

3. Plan Processing Flow

  • _stream_plan() → _stream_relation() → specific relation handlers
  • Each relation type has its own specialized handler
  • Relations can contain expressions that get processed recursively
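The plan → relation → handler routing can be sketched as a dispatch table keyed by relation type, where each handler recurses into its input one level deeper. This assumes dict stand-ins for relations; with the real protobuf classes the key would likely come from rel.WhichOneof("rel_type"). `RelationStreamer` and its handlers are hypothetical:

```python
from typing import Any, Callable, Dict, List

# Hypothetical dict stand-in for a Substrait Rel message.
Rel = Dict[str, Any]


class RelationStreamer:
    def __init__(self) -> None:
        self.lines: List[str] = []
        # One handler per relation type (separation of concerns).
        self._handlers: Dict[str, Callable[[Rel, int], None]] = {
            "read": self._stream_read,
            "filter": self._stream_filter,
            "project": self._stream_project,
        }

    def emit(self, depth: int, text: str) -> None:
        self.lines.append("  " * depth + "-> " + text)

    def stream_relation(self, rel: Rel, depth: int = 0) -> None:
        handler = self._handlers.get(rel["type"])
        if handler is None:
            # Unknown types degrade to a placeholder instead of raising.
            self.emit(depth, f"<unsupported relation: {rel['type']}>")
            return
        handler(rel, depth)

    def _stream_read(self, rel: Rel, depth: int) -> None:
        self.emit(depth, f"Read {rel['table']}")

    def _stream_filter(self, rel: Rel, depth: int) -> None:
        self.emit(depth, f"Filter {rel['condition']}")
        self.stream_relation(rel["input"], depth + 1)  # recurse into the child

    def _stream_project(self, rel: Rel, depth: int) -> None:
        self.emit(depth, f"Project {', '.join(rel['expressions'])}")
        self.stream_relation(rel["input"], depth + 1)
```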

4. Expression Processing Flow

  • _stream_expression() routes to appropriate expression handlers
  • Scalar functions get special treatment with argument expansion
  • Map literals and nested structures expand recursively
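Expression routing works the same way, with scalar functions expanding each argument one level deeper. A sketch under the same dict stand-in assumption (with protobufs the kind would likely come from expr.WhichOneof("rex_type")). In Substrait, a scalar function names itself through an integer anchor declared in the plan's extensions; the hypothetical `func_names` mapping models that lookup:

```python
from typing import Any, Dict, List

Expr = Dict[str, Any]  # hypothetical dict stand-in for a Substrait Expression


def stream_expression(expr: Expr, depth: int, out: List[str],
                      func_names: Dict[int, str]) -> None:
    pad = "  " * depth + "-> "
    kind = expr["kind"]
    if kind == "literal":
        out.append(f"{pad}Literal {expr['value']!r}")
    elif kind == "selection":
        out.append(f"{pad}Field #{expr['field']}")
    elif kind == "scalar_function":
        # Map the integer function anchor to a name when it is known.
        name = func_names.get(expr["ref"], f"function#{expr['ref']}")
        out.append(f"{pad}{name}(")
        for arg in expr["args"]:
            stream_expression(arg, depth + 1, out, func_names)  # expand args
        out.append("  " * depth + ")")
    elif kind == "cast":
        out.append(f"{pad}Cast to {expr['type']}")
        stream_expression(expr["input"], depth + 1, out, func_names)
    else:
        out.append(f"{pad}<unsupported expression: {kind}>")
```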

5. Recursive Expansion

  • Function arguments expand to show full structure
  • Map literals show key-value pairs with proper indentation
  • Nested scalar functions expand with additional depth levels
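Recursive expansion of nested literals follows directly: each map entry is printed one level deeper, and a nested map recurses again. This sketch assumes Python dicts and lists stand in for Substrait map and list literals (real map literals are ordered key/value pairs whose keys need not be hashable); `stream_literal_value` is a hypothetical name:

```python
from typing import Any, List


def stream_literal_value(value: Any, depth: int, out: List[str],
                         label: str = "") -> None:
    """Recursively expand a literal; dicts and lists stand in for map/list literals."""
    prefix = "  " * depth + "-> " + (f"{label}: " if label else "")
    if isinstance(value, dict):
        out.append(f"{prefix}Map[{len(value)}]")
        for key, val in value.items():
            # Each entry goes one level deeper; nested maps recurse again.
            stream_literal_value(val, depth + 1, out, label=repr(key))
    elif isinstance(value, list):
        out.append(f"{prefix}List[{len(value)}]")
        for item in value:
            stream_literal_value(item, depth + 1, out)
    else:
        out.append(f"{prefix}{value!r}")
```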

6. Output Formatting

  • Consistent -> arrow indentation system
  • Color coding for different types of information
  • Schema name resolution for field references
  • Proper spacing and alignment throughout
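The -> indentation and schema name resolution can be small helpers. Substrait field references are positional, so resolving a name means indexing into the input's schema names (for example, the names of a ReadRel's base schema). This sketch assumes a flat, top-level list of names is already known; a fuller resolver would also track how projections and joins reshape the output schema. Both function names are hypothetical:

```python
from typing import Sequence


def get_indent_with_arrow(depth: int, indent: str = "  ", arrow: str = "-> ") -> str:
    """Indent by depth, then mark the node with an arrow."""
    return indent * depth + arrow


def resolve_field_name(index: int, schema_names: Sequence[str]) -> str:
    """Map a positional field reference to a schema name when one is available."""
    if 0 <= index < len(schema_names):
        return f"{schema_names[index]} (#{index})"
    return f"#{index}"  # no schema or out of range: fall back to the ordinal
```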

Key Design Principles

  1. Separation of Concerns: Each element type has its own handler
  2. Recursive Processing: Nested structures expand naturally
  3. Consistent Formatting: Uniform indentation and color schemes
  4. Performance: Streaming output with minimal memory overhead
  5. Extensibility: Easy to add new element types
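The performance and extensibility principles can be illustrated together: generators yield lines lazily, so the full output never has to be assembled in memory, and a decorator-based registry turns a new element type into a single function. This is an alternative sketch with hypothetical names (`handles`, `stream`), not how the PR is structured:

```python
from typing import Any, Callable, Dict, Iterator

Node = Dict[str, Any]  # hypothetical dict stand-in for a plan node
Handler = Callable[[Node, int], Iterator[str]]

_HANDLERS: Dict[str, Handler] = {}


def handles(kind: str) -> Callable[[Handler], Handler]:
    """Register a handler for one node type."""
    def register(fn: Handler) -> Handler:
        _HANDLERS[kind] = fn
        return fn
    return register


def stream(node: Node, depth: int = 0) -> Iterator[str]:
    """Yield output lines lazily instead of building one large string."""
    handler = _HANDLERS.get(node["kind"])
    if handler is None:
        yield "  " * depth + f"-> <unsupported: {node['kind']}>"
        return
    yield from handler(node, depth)


@handles("fetch")
def _fetch(node: Node, depth: int) -> Iterator[str]:
    yield "  " * depth + f"-> Fetch offset={node['offset']} count={node['count']}"
    yield from stream(node["input"], depth + 1)


@handles("read")
def _read(node: Node, depth: int) -> Iterator[str]:
    yield "  " * depth + f"-> Read {node['table']}"
```

Iterating with `for line in stream(plan): print(line)` keeps only the current chain of generator frames alive, roughly proportional to plan depth rather than output size.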

tokoko (Contributor) commented Aug 25, 2025

This is definitely better than reading protos 😆 Minor nit: display.py probably shouldn't be under builders. utils.py or a top-level print function makes more sense.

On a more serious note, I know David has been working on a text representation of Substrait plans (https://github.com/EpsilonPrime/substrait-textplan), which is essentially the same thing and also has Python bindings. If the goal there is to establish a standard text format for Substrait, we should probably try to avoid reimplementing the format here.

Clonkk (Author) commented Aug 25, 2025

utils.py does sound nice 👍 I can make the switch (or maybe debugging.py, to avoid being too generic 🤔?)

I wasn't aware of substrait-textplan; it seems like a super nice library, and it is indeed quite similar to what I'm doing.

I'm just not a fan of pulling in an external dependency for what is essentially a debugging use case: either I carry a dependency I'm not using most of the time, or I have to add an external dependency to my system every time I need telemetry or logs to debug my Substrait plans.

I am very much in favor of "batteries included" libraries :)

"If the goal there is to establish a standard text format for substrait"

In my mind, the goal here is a nicer debugging experience when building systems around Substrait. Since Substrait is protobuf, I believe the "standard" text representation would be protobuf's text_format?

mbrobbel requested a review from EpsilonPrime August 26, 2025 08:18
Clonkk (Author) commented Aug 27, 2025

The CI error is unrelated:

Run npm install @commitlint/config-conventional
npm error code E429
npm error 429 Too Many Requests - GET https://registry.npmjs.org/@commitlint%2fconfig-conventional
npm error A complete log of this run can be found in: /home/runner/.npm/_logs/2025-08-26T08_29_10_197Z-debug-0.log
Error: Process completed with exit code 1.
