Paul Rogers edited this page May 16, 2019 · 16 revisions

This tutorial shows how to use the Extended Vector Framework (EVF) to create a simple format plugin. The EVF has also been called the "row set framework" and the "new scan framework". Here we focus on using the framework. Other pages in this section provide background information for when you need features beyond those shown here.

The Log Plugin

The Drill log plugin is the focus of this tutorial. A simplified version of this plugin is explained in the Learning Apache Drill book. The version used here is [the one that ships with Drill|https://github.com/apache/drill/tree/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/log].

Current Design

In Drill 1.16 and earlier, the {{LogRecordReader}} writes to value vectors in the typical way: through the {{Mutator}} class associated with each vector. For example, for a nullable VarChar vector:

  private static class VarCharDefn extends ColumnDefn {

    private NullableVarCharVector.Mutator mutator;

    public VarCharDefn(String name, int index) {
      super(name, index);
    }

    @Override
    public void define(OutputMutator outputMutator) throws SchemaChangeException {
      MaterializedField field = MaterializedField.create(getName(),
          Types.optional(MinorType.VARCHAR));
      mutator = outputMutator.addField(field, NullableVarCharVector.class).getMutator();
    }

    @Override
    public void load(int rowIndex, String value) {
      byte[] bytes = value.getBytes();
      mutator.setSafe(rowIndex, bytes, 0, bytes.length);
    }
  }

Other readers are more clever: the "V2" text reader (Drill 1.16 and earlier) works with direct memory itself, handling its own buffer allocation, offset vector calculations and so on.

The log reader code uses a {{ColumnDefn}} class to convert from the String value provided by the regex parser to the Java type needed by the {{Mutator}}.
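To make the conversion step concrete, here is a minimal, self-contained sketch of the idea: a regex supplies each field as a String, and a per-column definition converts that String to the Java type the column stores. The class names ({{SketchColumn}}, {{SketchIntColumn}}, {{SketchVarCharColumn}}, {{LogLineSketch}}) and the sample pattern are hypothetical stand-ins invented for illustration. They are not Drill's {{ColumnDefn}} classes, and nothing here touches value vectors.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for a column definition; this is NOT Drill's ColumnDefn.
abstract class SketchColumn {
  final String name;
  final int groupIndex; // 1-based regex capture group that feeds this column

  SketchColumn(String name, int groupIndex) {
    this.name = name;
    this.groupIndex = groupIndex;
  }

  // Convert the text captured by the regex into the Java type the column stores.
  abstract Object convert(String text);
}

class SketchVarCharColumn extends SketchColumn {
  SketchVarCharColumn(String name, int groupIndex) { super(name, groupIndex); }

  @Override
  Object convert(String text) { return text; }
}

class SketchIntColumn extends SketchColumn {
  SketchIntColumn(String name, int groupIndex) { super(name, groupIndex); }

  @Override
  Object convert(String text) { return Integer.parseInt(text); }
}

class LogLineSketch {
  // Returns one converted value per column, or null when the line does not match.
  static List<Object> parse(Pattern pattern, List<SketchColumn> columns, String line) {
    Matcher m = pattern.matcher(line);
    if (!m.matches()) {
      return null;
    }
    List<Object> row = new ArrayList<>();
    for (SketchColumn col : columns) {
      String text = m.group(col.groupIndex);
      row.add(text == null ? null : col.convert(text));
    }
    return row;
  }

  public static void main(String[] args) {
    // Assumed sample pattern: a number, a word, then the rest of the line.
    Pattern pattern = Pattern.compile("(\\d+) (\\w+) (.*)");
    List<SketchColumn> columns = Arrays.asList(
        new SketchIntColumn("year", 1),
        new SketchVarCharColumn("level", 2),
        new SketchVarCharColumn("message", 3));
    System.out.println(parse(pattern, columns, "2019 ERROR disk full"));
  }
}
```

In the real plugin, the converted value is handed to a {{Mutator}} rather than collected in a list; the list simply keeps the sketch free of Drill dependencies.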

Revised Design

With the EVF, we'll replace the {{Mutator}} with a {{ColumnWriter}}. We'll first do the simplest possible conversion, then look at how to use advanced features, such as type conversions, schemas and table properties.
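As a rough mental model of the change, the sketch below contrasts with the {{Mutator}} style, where the caller passes a row index on every call. Here a writer takes a Java value directly and gets its position from a shared row-level writer. All names ({{SketchRowWriter}}, {{SketchStringWriter}}, {{WriterSketch}}) are hypothetical stand-ins written for this page, and the behavior shown (rows committed with {{save()}}, unset nullable columns filled with null) is an assumption of the sketch, not a statement of Drill's actual API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical row-level writer; a stand-in for illustration, NOT Drill's API.
class SketchRowWriter {
  private final List<SketchStringWriter> columns = new ArrayList<>();
  private int rowIndex = 0; // position of the row currently being written

  SketchStringWriter addColumn(String name) {
    SketchStringWriter writer = new SketchStringWriter(name, this);
    columns.add(writer);
    return writer;
  }

  int rowIndex() { return rowIndex; }

  // Finish the current row: advance, then pad columns left unset with null.
  void save() {
    rowIndex++;
    for (SketchStringWriter writer : columns) {
      writer.fillTo(rowIndex);
    }
  }
}

// Hypothetical nullable string column writer backed by a plain list.
class SketchStringWriter {
  final String name;
  private final SketchRowWriter row;
  private final List<String> values = new ArrayList<>();

  SketchStringWriter(String name, SketchRowWriter row) {
    this.name = name;
    this.row = row;
  }

  // No row index argument: the position comes from the shared row writer.
  void setString(String value) {
    int index = row.rowIndex();
    fillTo(index);
    if (values.size() > index) {
      values.set(index, value); // overwrite a value already set in this row
    } else {
      values.add(value);
    }
  }

  // Pad with nulls until the column holds at least `size` values.
  void fillTo(int size) {
    while (values.size() < size) {
      values.add(null);
    }
  }

  List<String> values() { return values; }
}

class WriterSketch {
  public static void main(String[] args) {
    SketchRowWriter row = new SketchRowWriter();
    SketchStringWriter level = row.addColumn("level");
    SketchStringWriter message = row.addColumn("message");

    level.setString("ERROR");
    message.setString("disk full");
    row.save();

    level.setString("INFO"); // message is left unset for this row
    row.save();

    System.out.println(level.values());   // [ERROR, INFO]
    System.out.println(message.values()); // [disk full, null]
  }
}
```

The point of the design is that reader code no longer tracks row positions or converts Strings to byte arrays itself; those concerns move behind the writer.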

To use the EVF, we must also change how the plugin is structured: we use the new version of the "easy" plugin implementation to define the reader on top of the new scan framework.

Revise the Plugin Definition

Convert the Record Reader to a Batch Reader

Convert to use Column Writers

Test

Next Steps
