# Isthmus API Examples

The Isthmus library converts SQL to and from Substrait plans. There are two examples, one for each direction of conversion.

## How does this work in theory?

In both directions, the Calcite library is used to parse and generate SQL strings. Calcite has its own relational object model, so Isthmus contains classes that convert Substrait to and from Calcite's object model.

Converting from SQL to Substrait uses Calcite to parse the SQL into its object model, which is then converted to Substrait.

Converting from Substrait to SQL involves converting Substrait to Calcite's object model, then asking Calcite to generate SQL strings.

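Both paths described above pivot on Calcite's object model. The Java sketch below only illustrates that shape: `CalciteRel`, `SubstraitPlan` and the stage functions are hypothetical placeholders, not Isthmus or Calcite classes, and the real stages do actual parsing, conversion and SQL generation.

```java
import java.util.function.Function;

// Hypothetical placeholder for Calcite's relational object model.
record CalciteRel(String sqlText) {}

// Hypothetical placeholder for a Substrait plan.
record SubstraitPlan(CalciteRel rel) {}

final class ConversionPipeline {
    // SQL -> Substrait, step 1: Calcite parses SQL into its object model.
    static final Function<String, CalciteRel> PARSE_SQL = CalciteRel::new;

    // SQL -> Substrait, step 2: Isthmus converts Calcite's model to Substrait.
    static final Function<CalciteRel, SubstraitPlan> TO_SUBSTRAIT = SubstraitPlan::new;

    // Substrait -> SQL, step 1: Isthmus converts Substrait back to Calcite's model.
    static final Function<SubstraitPlan, CalciteRel> FROM_SUBSTRAIT = SubstraitPlan::rel;

    // Substrait -> SQL, step 2: Calcite generates SQL text. This placeholder just
    // returns the stored text; the real generator rebuilds SQL from the tree.
    static final Function<CalciteRel, String> GENERATE_SQL = CalciteRel::sqlText;

    // The two directions, each passing through Calcite's object model.
    static final Function<String, SubstraitPlan> FROM_SQL = PARSE_SQL.andThen(TO_SUBSTRAIT);
    static final Function<SubstraitPlan, String> TO_SQL = FROM_SUBSTRAIT.andThen(GENERATE_SQL);
}
```

Composing the stages with `andThen` keeps each direction as two independent steps. In Isthmus the SQL produced on the way back is regenerated from the object model, so it need not match the original SQL text character for character.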
## Running the examples

There are two example classes:

- [FromSql](./src/main/java/io/substrait/examples/FromSql.java), which creates a Substrait plan starting from SQL
- [ToSql](./src/main/java/io/substrait/examples/ToSql.java), which reads a Substrait plan and creates the SQL

### Requirements

To run these you will need:

- Java 17 or greater
- The sample data, provided as [two data files](./src/main/resources/)

## Creating a Substrait Plan from SQL

Run [`FromSql.java`](./src/main/java/io/substrait/examples/FromSql.java) from the root of this repository with the command below; `substrait.plan` is the name of the file to write.

```bash
./gradlew examples:isthmus-api:run --args "FromSql substrait.plan"
> Task :examples:isthmus-api:run
Plan{version=Version{major=0, minor=77, patch=0, producer=isthmus}, roots=[Root{input=Sort{input=Aggregate{input=Project{remap=Remap{indices=[15]}, input=Filter{input=Join{left=NamedScan{initialSchema=NamedStruct{struct=Struct{nullable=false, fields=[VarChar{nullable=true, length=15}, VarChar{nullable=true, length=40}, VarChar{nullable=true, length=40}, VarChar{nullable=true, length=15}, VarChar{nullable=true, length=15}, I32{nullable=true}, VarChar{nullable=true, length=15}]}, names=[vehicle_id, make, model, colour, fuel_type, cylinder_capacity, first_use_date]}, names=[vehicles]}, right=NamedScan{initialSchema=NamedStruct{struct=Struct{nullable=false, fields=[VarChar{nullable=true, length=15}, VarChar{nullable=true, length=15}, VarChar{nullable=true, length=20}, VarChar{nullable=true, length=20}, VarChar{nullable=true, length=20}, VarChar{nullable=true, length=15}, I32{nullable=true}, VarChar{nullable=true, length=15}]}, names=[test_id, vehicle_id, test_date, test_class, test_type, test_result, test_mileage, postcode_area]}, names=[tests]}, condition=ScalarFunctionInvocation{declaration=equal:any_any, arguments=[FieldReference{segments=[StructField{offset=0}], type=VarChar{nullable=true, length=15}}, FieldReference{segments=[StructField{offset=8}], type=VarChar{nullable=true, length=15}}], options=[], outputType=Bool{nullable=true}}, joinType=INNER}, condition=ScalarFunctionInvocation{declaration=equal:any_any, arguments=[FieldReference{segments=[StructField{offset=12}], type=VarChar{nullable=true, length=15}}, VarCharLiteral{nullable=false, value=P, length=15}], options=[], outputType=Bool{nullable=true}}}, expressions=[FieldReference{segments=[StructField{offset=3}], type=VarChar{nullable=true, length=15}}]}, groupings=[Grouping{expressions=[FieldReference{segments=[StructField{offset=0}], type=VarChar{nullable=true, length=15}}]}], measures=[Measure{function=AggregateFunctionInvocation{declaration=count:, arguments=[], options=[], aggregationPhase=INITIAL_TO_RESULT, sort=[], outputType=I64{nullable=false}, invocation=ALL}}]}, sortFields=[SortField{expr=FieldReference{segments=[StructField{offset=1}], type=Struct{nullable=false, fields=[VarChar{nullable=true, length=15}, I64{nullable=false}]}}, direction=ASC_NULLS_LAST}]}, names=[COLOUR, COLOURCOUNT]}], expectedTypeUrls=[]}
File written to substrait.plan
```

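The `--args` value carries both the example name and the plan file, which suggests that the Gradle application's main class dispatches on its first argument. How the repository actually wires this up is not shown here; `ExampleLauncher` below is a hypothetical sketch of one way to do it, with placeholder bodies instead of the real examples.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical launcher: routes `--args "FromSql substrait.plan"` to one example.
final class ExampleLauncher {
    // First argument -> example; the bodies are placeholders that only print.
    static final Map<String, Consumer<String[]>> EXAMPLES = Map.<String, Consumer<String[]>>of(
        "FromSql", rest -> System.out.println("FromSql would write " + rest[0]),
        "ToSql", rest -> System.out.println("ToSql would read " + rest[0]));

    // Runs the named example with the remaining arguments and returns its name.
    static String dispatch(String[] args) {
        if (args.length < 2) {
            throw new IllegalArgumentException("usage: <FromSql|ToSql> <plan file>");
        }
        Consumer<String[]> example = EXAMPLES.get(args[0]);
        if (example == null) {
            throw new IllegalArgumentException("unknown example: " + args[0]);
        }
        example.accept(Arrays.copyOfRange(args, 1, args.length));
        return args[0];
    }

    public static void main(String[] args) {
        dispatch(args);
    }
}
```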
The plan is written as a binary (Protocol Buffers) file rather than text, so to check that it was written:

```bash
ls -l examples/isthmus-api/substrait.plan
-rw-r--r-- 1 matthew matthew 808 Dec 1 12:05 examples/isthmus-api/substrait.plan
```

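`ls -l` only confirms that a non-empty file exists. Where `ls` is not available (for example on Windows), the same check can be done with `java.nio.file.Files`. `PlanFileCheck` is a hypothetical helper, not part of the examples, and its default path simply mirrors the `ls` command.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class PlanFileCheck {
    // Returns the plan file's size in bytes, failing if it is missing or empty.
    static long checkPlanFile(Path plan) throws IOException {
        if (!Files.isRegularFile(plan)) {
            throw new IOException("No plan file at " + plan.toAbsolutePath());
        }
        long size = Files.size(plan);
        if (size == 0) {
            throw new IOException("Plan file is empty: " + plan);
        }
        return size;
    }

    public static void main(String[] args) throws IOException {
        // Relative to the repository root, like the `ls` command.
        Path plan = Path.of(args.length > 0 ? args[0] : "examples/isthmus-api/substrait.plan");
        System.out.println(plan + ": " + checkPlanFile(plan) + " bytes");
    }
}
```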
See the comments in [`FromSql.java`](./src/main/java/io/substrait/examples/FromSql.java) for details of how the conversion is done.

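The printed plan refers to columns by position (`StructField{offset=...}`) rather than by name. An inner `Join` outputs the left input's fields followed by the right input's, a `Filter` passes its input's fields through unchanged, and a `Project` appends its expressions after its input fields, with `Remap` choosing which of those to emit. The hypothetical `PlanOffsets` class below recomputes the plan's offsets from the two table schemas printed in it.

```java
import java.util.ArrayList;
import java.util.List;

final class PlanOffsets {
    // Column names copied from the two NamedScan schemas in the printed plan.
    static final List<String> VEHICLES = List.of(
        "vehicle_id", "make", "model", "colour",
        "fuel_type", "cylinder_capacity", "first_use_date");
    static final List<String> TESTS = List.of(
        "test_id", "vehicle_id", "test_date", "test_class",
        "test_type", "test_result", "test_mileage", "postcode_area");

    // Fields of the inner join: left input (vehicles) first, then right input (tests).
    // The Filter above the join keeps exactly these fields.
    static List<String> joinOutput() {
        List<String> fields = new ArrayList<>();
        VEHICLES.forEach(c -> fields.add("vehicles." + c));
        TESTS.forEach(c -> fields.add("tests." + c));
        return fields;
    }

    // Zero-based position of a qualified column such as "tests.test_result".
    static int offsetOf(String qualifiedColumn) {
        int offset = joinOutput().indexOf(qualifiedColumn);
        if (offset < 0) {
            throw new IllegalArgumentException("Unknown column: " + qualifiedColumn);
        }
        return offset;
    }

    public static void main(String[] args) {
        System.out.println("join:   " + offsetOf("vehicles.vehicle_id") + " = " + offsetOf("tests.vehicle_id"));
        System.out.println("filter: " + offsetOf("tests.test_result") + " = 'P'");
        System.out.println("colour: " + offsetOf("vehicles.colour"));
        // The Project appends its one expression after the 15 input fields,
        // so that expression sits at index 15, the index the Remap keeps.
        System.out.println("remap:  " + joinOutput().size());
    }
}
```

This gives offsets 0 and 8 for the join condition, 12 for the filter on `test_result`, 3 for `colour` and 15 for the `Remap`, matching the plan. After the `Remap`, the `Aggregate` sees a single field (`colour`, offset 0) and outputs the group key followed by the count, which is why the `Sort` refers to offset 1 (`COLOURCOUNT`).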
## Creating SQL from a Substrait Plan

Run [`ToSql.java`](./src/main/java/io/substrait/examples/ToSql.java) from the root of this repository with the command below; `substrait.plan` is the name of the file to read, typically the one created by `FromSql`.

```bash
./gradlew examples:isthmus-api:run --args "ToSql substrait.plan"

> Task :examples:isthmus-api:run
Reading from substrait.plan
Plan{version=Version{major=0, minor=77, patch=0, producer=isthmus}, roots=[Root{input=Sort{input=Aggregate{input=Project{remap=Remap{indices=[15]}, input=Filter{input=Join{left=NamedScan{initialSchema=NamedStruct{struct=Struct{nullable=false, fields=[VarChar{nullable=true, length=15}, VarChar{nullable=true, length=40}, VarChar{nullable=true, length=40}, VarChar{nullable=true, length=15}, VarChar{nullable=true, length=15}, I32{nullable=true}, VarChar{nullable=true, length=15}]}, names=[vehicle_id, make, model, colour, fuel_type, cylinder_capacity, first_use_date]}, names=[vehicles]}, right=NamedScan{initialSchema=NamedStruct{struct=Struct{nullable=false, fields=[VarChar{nullable=true, length=15}, VarChar{nullable=true, length=15}, VarChar{nullable=true, length=20}, VarChar{nullable=true, length=20}, VarChar{nullable=true, length=20}, VarChar{nullable=true, length=15}, I32{nullable=true}, VarChar{nullable=true, length=15}]}, names=[test_id, vehicle_id, test_date, test_class, test_type, test_result, test_mileage, postcode_area]}, names=[tests]}, condition=ScalarFunctionInvocation{declaration=equal:any_any, arguments=[FieldReference{segments=[StructField{offset=0}], type=VarChar{nullable=true, length=15}}, FieldReference{segments=[StructField{offset=8}], type=VarChar{nullable=true, length=15}}], options=[], outputType=Bool{nullable=true}}, joinType=INNER}, condition=ScalarFunctionInvocation{declaration=equal:any_any, arguments=[FieldReference{segments=[StructField{offset=12}], type=VarChar{nullable=true, length=15}}, VarCharLiteral{nullable=false, value=P, length=15}], options=[], outputType=Bool{nullable=true}}}, expressions=[FieldReference{segments=[StructField{offset=3}], type=VarChar{nullable=true, length=15}}]}, groupings=[Grouping{expressions=[FieldReference{segments=[StructField{offset=0}], type=VarChar{nullable=true, length=15}}]}], measures=[Measure{function=AggregateFunctionInvocation{declaration=count:, arguments=[], options=[], aggregationPhase=INITIAL_TO_RESULT, sort=[], outputType=I64{nullable=false}, invocation=ALL}}]}, sortFields=[SortField{expr=FieldReference{segments=[StructField{offset=1}], type=I64{nullable=false}}, direction=ASC_NULLS_LAST}]}, names=[COLOUR, COLOURCOUNT]}], expectedTypeUrls=[]}

SELECT `t2`.`colour0` AS `COLOUR`, `t2`.`$f1` AS `COLOURCOUNT`
FROM (SELECT `vehicles`.`colour` AS `colour0`, COUNT(*) AS `$f1`
FROM `vehicles`
INNER JOIN `tests` ON `vehicles`.`vehicle_id` = `tests`.`vehicle_id`
WHERE `tests`.`test_result` = 'P'
GROUP BY `vehicles`.`colour`
ORDER BY COUNT(*) IS NULL, 2) AS `t2`
```

The SQL is generated in the selected dialect; the example uses MySQL, which is why identifiers are quoted with backticks.

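The generated `ORDER BY COUNT(*) IS NULL, 2` corresponds to the plan's `ASC_NULLS_LAST` sort direction; `2` is the ordinal of the inner query's second column, the count. MySQL sorts `NULL`s first in ascending order and has no `NULLS LAST` clause, so Calcite appears to emulate it by sorting on `IS NULL` first (false before true). The hypothetical `NullsLastDemo` class below shows in plain Java that this two-key ordering matches a direct nulls-last comparator. In this query the extra key is redundant, since the plan types the count as `I64{nullable=false}`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class NullsLastDemo {
    // Direct form of ASC NULLS LAST, as requested by the Substrait SortField.
    static final Comparator<Long> ASC_NULLS_LAST =
        Comparator.nullsLast(Comparator.<Long>naturalOrder());

    // Plain ascending order as MySQL applies it: NULLs sort first.
    static final Comparator<Long> MYSQL_ASC =
        Comparator.nullsFirst(Comparator.<Long>naturalOrder());

    // Emulation used in the generated SQL: sort on "value IS NULL" first
    // (false < true), then on the value with MySQL's ascending order.
    static final Comparator<Long> IS_NULL_THEN_VALUE =
        Comparator.comparing((Long v) -> v == null).thenComparing(MYSQL_ASC);

    // Returns a sorted copy so the input list is left untouched.
    static List<Long> sorted(List<Long> values, Comparator<Long> order) {
        List<Long> copy = new ArrayList<>(values);
        copy.sort(order);
        return copy;
    }

    public static void main(String[] args) {
        List<Long> counts = Arrays.asList(7L, null, 3L, 5L);
        System.out.println(sorted(counts, MYSQL_ASC));          // [null, 3, 5, 7]
        System.out.println(sorted(counts, IS_NULL_THEN_VALUE)); // [3, 5, 7, null]
    }
}
```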