Status
Accepted
Context
Actionbase has data processing jobs such as the async processor and the HBase bulk loader. We needed a declarative way to define and run them; that is the purpose of this pipeline module.
The `pipeline/` module provides a declarative format and a runner abstraction for actionbase batch jobs. This is an internal tool, not a general-purpose framework.
The format borrows only the core ideas from GitHub Actions: DAG / dependencies, `${{ }}` expressions, versioned external references. Other GHA features, including the marketplace, are out of scope.
For runners, we ship EmbeddedRunner (tests) and LocalRunner (local execution). Jenkins, Airflow, and similar systems are out of scope.
Decision
Naming
- Workflow — DAG of jobs (YAML)
- Job — a single process
- Runner — executor
Workflow YAML
```yaml
name: spark-pi
# artifact version: latest / 0.x / 0.3.x / 0.3.1 (pinned)

env:
  samples: "1000000"

presets:
  spark-small:
    driver-memory: 1g
    executor-memory: 2g

jobs:
  pi:
    kind: spark
    artifact: "com.kakao.actionbase:pipeline:0.x"
    mainClass: SparkPiJob
    args:
      samples: "${{ env.samples }}"
    submit:
      $extends: ${{ presets.spark-small }}
      conf:
        spark.sql.shuffle.partitions: 8

  report:
    kind: bash
    needs: [pi]
    when: "${{ needs.pi.result == 'success' }}"
    run: 'echo "pi ≈ ${{ needs.pi.outputs.estimate }}"'
```
Job kind
| kind | Fields | What runs |
| --- | --- | --- |
| `spark` | `artifact`, `mainClass`, `args`, `submit` | `spark-submit` |
| `bash` | `run` | direct exec |
`artifact` is a Gradle coordinate (`group:name:version`); the runner resolves it and fetches the JAR. `submit` is a nested map passed through to the `spark-submit` CLI as-is.
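As a rough sketch of that pass-through, a `submit` map would translate into `spark-submit` flags roughly as the comments below indicate; the exact key-to-flag mapping is an assumption, not something this ADR specifies.

```yaml
# Illustrative only: assumed translation of submit keys into spark-submit flags.
submit:
  driver-memory: 1g                    # --driver-memory 1g
  executor-memory: 2g                  # --executor-memory 2g
  conf:
    spark.sql.shuffle.partitions: 8    # --conf spark.sql.shuffle.partitions=8
```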
Expressions
`${{ <expr> }}` substitutes a value at any position. The result may be a scalar, a map, or a list.
| Vocabulary | Meaning |
| --- | --- |
| `env.<key>` | workflow env |
| `needs.<id>.result` | system result (`success` / `failure` / `skipped` / `cancelled`) |
| `needs.<id>.outputs.<key>` | data emitted by an upstream job |
| `presets.<name>` | reference an entry in this document's `presets:` section |
| `load('<path>')` | load another YAML file |
`when:` is a job's execution guard (a boolean expression). If omitted, the job always runs.
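For example, a guard built from the vocabulary above that fires only when an upstream job failed; the `notify` job below is illustrative, not part of the spec.

```yaml
# Illustrative only: a follow-up job that runs only if the pi job failed.
notify:
  kind: bash
  needs: [pi]
  when: "${{ needs.pi.result == 'failure' }}"
  run: 'echo "pi failed"'
```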
$extends
```yaml
submit:
  $extends: ${{ presets.spark-small }}          # in-doc
  # or ${{ load('presets/spark-small.yaml') }}  # file
  conf:
    spark.sql.shuffle.partitions: 8
```
Deep merge: `$extends` provides defaults; matching key paths in the surrounding map override them. Cycles are rejected.
Keys with a `$` prefix are reserved for processor directives (mirroring the JSON Schema `$ref` convention).
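With the `spark-small` preset from the workflow example above, the snippet resolves to roughly the effective map below; this is a sketch of the merge result, not actual runner output.

```yaml
# Effective submit map after the deep merge (illustrative).
submit:
  driver-memory: 1g                    # default from the preset
  executor-memory: 2g                  # default from the preset
  conf:
    spark.sql.shuffle.partitions: 8    # contributed by the surrounding map
```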
State Store
Jobs may pass data to downstream jobs via `needs.<id>.outputs.<key>`. The runner manages a simple key-value state store during workflow execution.
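As a rough sketch of what the store might hold after the `pi` job from the example finishes; the key layout and the sample value are assumptions for illustration only.

```yaml
# Hypothetical store snapshot (layout assumed, value invented for illustration):
pi:
  result: success            # read as ${{ needs.pi.result }}
  outputs:
    estimate: "3.1415"       # read as ${{ needs.pi.outputs.estimate }}
```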