ADR: Declarative workflow DSL for pipeline jobs #310

Status

Accepted

Context

Actionbase has data processing jobs such as the async processor and the HBase bulk loader. We needed a declarative way to define and run them; that is the purpose of this pipeline module.

The pipeline/ module provides a declarative format and a runner abstraction for actionbase batch jobs. This is an internal tool, not a general-purpose framework.

The format borrows only the core ideas from GitHub Actions: DAG / dependencies, ${{ }} expressions, versioned external references. Other GHA features, including the marketplace, are out of scope.

For runners, we ship EmbeddedRunner (tests) and LocalRunner (local execution). Jenkins, Airflow, and similar systems are out of scope.

Decision

Naming

  • Workflow — DAG of jobs (YAML)
  • Job — a single process
  • Runner — executor
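
To make the vocabulary concrete, here is a minimal Scala sketch of how these three names could map onto a data model. All type and field names below are illustrative assumptions, not the module's actual API:

// Hypothetical model mirroring the ADR vocabulary; names are illustrative.
final case class Workflow(
  name: String,
  env: Map[String, String],            // workflow-level env
  presets: Map[String, Any],           // reusable fragments for $extends
  jobs: Map[String, Job]               // id -> job; edges come from `needs`
)

final case class Job(
  kind: String,                        // "spark" | "bash"
  needs: List[String] = Nil,           // upstream job ids (DAG edges)
  when: Option[String] = None,         // guard expression; empty = always run
  fields: Map[String, Any] = Map.empty // kind-specific fields (artifact, run, ...)
)

trait Runner {
  def run(workflow: Workflow): Unit    // shipped: EmbeddedRunner (tests), LocalRunner
}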

Workflow YAML

name: spark-pi

# artifact version: latest / 0.x / 0.3.x / 0.3.1 (pinned)

env:
  samples: "1000000"

presets:
  spark-small:
    driver-memory: 1g
    executor-memory: 2g

jobs:
  pi:
    kind: spark
    artifact: "com.kakao.actionbase:pipeline:0.x"
    mainClass: SparkPiJob
    args:
      samples: "${{ env.samples }}"
    submit:
      $extends: ${{ presets.spark-small }}
      conf:
        spark.sql.shuffle.partitions: 8

  report:
    kind: bash
    needs: [pi]
    when: "${{ needs.pi.result == 'success' }}"
    run: 'echo "pi ≈ ${{ needs.pi.outputs.estimate }}"'
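
The needs edges in this document form the DAG the runner has to schedule. As one way to order execution, a Kahn-style topological sort over those edges might look like the following Scala sketch (function and variable names are assumptions; the module's actual scheduler is not shown in this ADR):

import scala.collection.mutable

// Topological order over jobs; `needs` maps every job id to its upstream ids
// (Nil for roots). Rejects cyclic workflows, matching the DAG requirement.
def topoOrder(needs: Map[String, List[String]]): List[String] = {
  val indegree = mutable.Map(needs.map { case (id, deps) => id -> deps.size }.toSeq: _*)
  val downstream = needs.toSeq
    .flatMap { case (id, deps) => deps.map(_ -> id) }
    .groupMap(_._1)(_._2)
  val ready = mutable.Queue(needs.collect { case (id, Nil) => id }.toSeq: _*)
  val order = mutable.ListBuffer.empty[String]
  while (ready.nonEmpty) {
    val id = ready.dequeue()
    order += id
    downstream.getOrElse(id, Nil).foreach { next =>
      indegree(next) -= 1
      if (indegree(next) == 0) ready += next
    }
  }
  require(order.size == needs.size, "workflow is not a DAG")
  order.toList
}

For the workflow above, topoOrder(Map("pi" -> Nil, "report" -> List("pi"))) yields List("pi", "report").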

Job kind

kind    Fields                              What runs
spark   artifact, mainClass, args, submit   spark-submit
bash    run                                 direct exec

artifact is a Gradle coordinate (group:name:version); the runner resolves it and fetches the JAR. submit is a nested map passed through to the spark-submit CLI as-is.
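
As a sketch of what "passed through as-is" could look like for the spark kind, the following Scala snippet flattens a submit map into spark-submit flags. The specific mapping (top-level keys to --<key>, conf entries to --conf k=v) is an assumption for illustration; the ADR does not pin down the exact flag rendering:

// Illustrative flattening of a `submit:` map into spark-submit arguments.
def sparkSubmitArgs(
    submit: Map[String, Any],
    mainClass: String,
    jar: String,
    args: Seq[String]
): Seq[String] = {
  val flags = submit.toSeq.flatMap {
    case ("conf", conf: Map[_, _]) =>
      conf.toSeq.flatMap { case (k, v) => Seq("--conf", s"$k=$v") }
    case (key, value) =>
      Seq(s"--$key", value.toString) // e.g. driver-memory -> --driver-memory 1g
  }
  Seq("--class", mainClass) ++ flags ++ (jar +: args)
}

With spark-small merged in, the pi job would render roughly as spark-submit --class SparkPiJob --driver-memory 1g --executor-memory 2g --conf spark.sql.shuffle.partitions=8 <resolved jar>, with the args map rendered per the job's convention.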

Expressions

${{ <expr> }} substitutes a value at any position. The result may be a scalar, a map, or a list.

Vocabulary                 Meaning
env.<key>                  workflow env
needs.<id>.result          system result (success / failure / skipped / cancelled)
needs.<id>.outputs.<key>   data emitted by an upstream job
presets.<name>             reference an entry in this document's presets: section
load('<path>')             load another YAML file

when: is a job's execution guard (boolean expression). Empty means always run.
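
A minimal substitution pass over ${{ }} placeholders could look like the Scala sketch below. The flat string-keyed context and the regex are assumptions about one possible implementation; the real evaluator must also return maps and lists and evaluate operators such as ==:

import scala.util.matching.Regex

// Naive ${{ <expr> }} substitution against a flat lookup context.
val ExprPattern: Regex = """\$\{\{\s*(.+?)\s*\}\}""".r

def substitute(text: String, ctx: String => Option[String]): String =
  ExprPattern.replaceAllIn(text, m => {
    val expr = m.group(1)
    val value = ctx(expr).getOrElse(sys.error(s"unresolved expression: $expr"))
    Regex.quoteReplacement(value) // keep literal $ in resolved values intact
  })

For example, substitute("${{ env.samples }}", Map("env.samples" -> "1000000").get) returns "1000000".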

$extends

submit:
  $extends: ${{ presets.spark-small }}                 # in-doc
  # or ${{ load('presets/spark-small.yaml') }}          # file
  conf:
    spark.sql.shuffle.partitions: 8

Deep merge: $extends provides defaults; matching key paths in the surrounding map override. Cycles are rejected. In the snippet above, the merged submit map carries the preset's driver-memory and executor-memory together with the surrounding conf block.

Keys with a $ prefix are reserved for processor directives (mirroring the JSON Schema $ref convention).
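
The deep merge itself is small enough to sketch. Assuming parsed YAML maps are represented as Map[String, Any], one possible Scala implementation (names illustrative; rejecting cyclic $extends chains would happen earlier, while resolving references):

// Deep merge: `defaults` comes from $extends, `overrides` is the surrounding
// map. Nested maps merge recursively; on matching scalar paths the override wins.
def deepMerge(defaults: Map[String, Any], overrides: Map[String, Any]): Map[String, Any] =
  (defaults.keySet ++ overrides.keySet).map { key =>
    val merged = (defaults.get(key), overrides.get(key)) match {
      case (Some(d: Map[_, _]), Some(o: Map[_, _])) =>
        deepMerge(d.asInstanceOf[Map[String, Any]], o.asInstanceOf[Map[String, Any]])
      case (_, Some(o)) => o // override wins
      case (Some(d), _) => d // default kept
      case _            => throw new IllegalStateException("key from neither map")
    }
    key -> merged
  }.toMap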

State Store

Jobs may pass data to downstream jobs via needs.<id>.outputs.<key>. The runner manages a simple key-value state store during workflow execution.
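
A sketch of that store, keyed by job id and output key (class and method names are invented for illustration):

import scala.collection.concurrent.TrieMap

// Minimal in-memory state store for one workflow execution.
// `needs.<id>.outputs.<key>` would resolve via get(id, key); job results
// for `needs.<id>.result` could live in a similar map.
final class StateStore {
  private val outputs = TrieMap.empty[(String, String), String]

  def put(jobId: String, key: String, value: String): Unit =
    outputs.update((jobId, key), value)

  def get(jobId: String, key: String): Option[String] =
    outputs.get((jobId, key))
}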
