Status
Accepted
Context
Actionbase has data processing jobs such as the async processor and the HBase bulk loader. We needed a declarative way to define and run them; that is the purpose of this pipeline module.
The `pipeline/` module provides a declarative format and a runner abstraction for actionbase batch jobs. This is an internal tool, not a general-purpose framework.
The format borrows only the core ideas from GitHub Actions: DAG / dependencies, `${{ }}` expressions, versioned external references. Other GHA features, including the marketplace, are out of scope.
For runners, we ship EmbeddedRunner (tests) and LocalRunner (local execution). Jenkins, Airflow, and similar systems are out of scope.
Decision
Naming
- Workflow — DAG of jobs (YAML)
- Job — a single process
- Runner — executor
Workflow YAML
```yaml
name: spark-pi
# artifact version: latest / 0.x / 0.3.x / 0.3.1 (pinned)

env:
  samples: "1000000"

presets:
  spark-small:
    driver-memory: 1g
    executor-memory: 2g

jobs:
  pi:
    kind: spark
    artifact: "com.kakao.actionbase:pipeline:0.x"
    mainClass: SparkPiJob
    args:
      samples: "${{ env.samples }}"
    submit:
      $extends: ${{ presets.spark-small }}
      conf:
        spark.sql.shuffle.partitions: 8

  report:
    kind: bash
    needs: [pi]
    when: "${{ needs.pi.result == 'success' }}"
    run: 'echo "pi ≈ ${{ needs.pi.outputs.estimate }}"'
```
Job kind
| kind | Fields | What runs |
| --- | --- | --- |
| `spark` | `artifact`, `mainClass`, `args`, `submit` | `spark-submit` |
| `bash` | `run` | direct exec |
`artifact` is a Gradle coordinate (`group:name:version`); the runner resolves it and fetches the JAR. `submit` is a nested map passed through to the `spark-submit` CLI as-is.
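As a rough sketch of that pass-through, a `submit` map would translate into `spark-submit` flags roughly as the comments below indicate; the exact key-to-flag mapping is an assumption, not something this ADR specifies.

```yaml
# Illustrative only: assumed translation of submit keys into spark-submit flags.
submit:
  driver-memory: 1g                    # --driver-memory 1g
  executor-memory: 2g                  # --executor-memory 2g
  conf:
    spark.sql.shuffle.partitions: 8    # --conf spark.sql.shuffle.partitions=8
```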
Expressions
`${{ <expr> }}` substitutes a value at any position. The result may be a scalar, a map, or a list.
| Vocabulary | Meaning |
| --- | --- |
| `env.<key>` | workflow env |
| `needs.<id>.result` | system result (`success` / `failure` / `skipped` / `cancelled`) |
| `needs.<id>.outputs.<key>` | data emitted by an upstream job |
| `presets.<name>` | reference an entry in this document's `presets:` section |
| `load('<path>')` | load another YAML file |
`when:` is a job's execution guard (a boolean expression). If omitted, the job always runs.
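For example, a guard built from the vocabulary above that fires only when an upstream job failed; the `notify` job below is illustrative, not part of the spec.

```yaml
# Illustrative only: a follow-up job that runs only if the pi job failed.
notify:
  kind: bash
  needs: [pi]
  when: "${{ needs.pi.result == 'failure' }}"
  run: 'echo "pi failed"'
```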
$extends
```yaml
submit:
  $extends: ${{ presets.spark-small }}          # in-doc
  # or ${{ load('presets/spark-small.yaml') }}  # file
  conf:
    spark.sql.shuffle.partitions: 8
```
Deep merge: `$extends` provides defaults; matching key paths in the surrounding map override them. Cycles are rejected.
Keys with a `$` prefix are reserved for processor directives (mirroring the JSON Schema `$ref` convention).
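With the `spark-small` preset from the workflow example above, the snippet resolves to roughly the effective map below; this is a sketch of the merge result, not actual runner output.

```yaml
# Effective submit map after the deep merge (illustrative).
submit:
  driver-memory: 1g                    # default from the preset
  executor-memory: 2g                  # default from the preset
  conf:
    spark.sql.shuffle.partitions: 8    # contributed by the surrounding map
```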
State Store
Jobs may pass data to downstream jobs via `needs.<id>.outputs.<key>`. The runner manages a simple key-value state store during workflow execution.
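As a rough sketch of what the store might hold after the `pi` job from the example finishes; the key layout and the sample value are assumptions for illustration only.

```yaml
# Hypothetical store snapshot (layout assumed, value invented for illustration):
pi:
  result: success            # read as ${{ needs.pi.result }}
  outputs:
    estimate: "3.1415"       # read as ${{ needs.pi.outputs.estimate }}
```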