From 85b9b6dd7a8524d249294f5fedcda9e32143af77 Mon Sep 17 00:00:00 2001 From: Eric Willigers Date: Mon, 22 Jun 2026 11:22:55 +1000 Subject: [PATCH] Detect non-Accretive concept exercises --- accretive-audit.md | 122 ++++++++ bin/check-accretive | 276 ++++++++++++++++++ .../ledger-lookout/.meta/exemplar.factor | 2 +- .../passphrase-patrol/.meta/exemplar.factor | 2 +- 4 files changed, 400 insertions(+), 2 deletions(-) create mode 100644 accretive-audit.md create mode 100755 bin/check-accretive diff --git a/accretive-audit.md b/accretive-audit.md new file mode 100644 index 0000000..f09efcf --- /dev/null +++ b/accretive-audit.md @@ -0,0 +1,122 @@ +# Accretiveness audit — Factor concept exercises + +## What "Accretive" means + +A concept exercise ships a stub in which every word is `"unimplemented" throw ;`. +The exercise is **Accretive** if a student who correctly implements the first +*K* tasks (obeying every instruction for those tasks) and leaves the remaining +tasks as stubs sees **all tests for tasks 1..K pass**. + +We want every concept exercise to be Accretive. + +## How an exercise fails to be Accretive + +The whole `-tests.factor` file is compiled as one unit before any test +runs (`[ require ] [ test ] bi` in `lib/exercism-tools/exercism-tools.factor`). +That gives two distinct failure modes. + +**Mode A — compile-time.** A tuple class, tuple slot accessor (`foo>>`/`>>foo`), +constant, symbol, or word referenced anywhere in the tests is missing from the +stub and is introduced by a task *later than task 1*. Until that task is done +the entire test file fails to compile, so **no** task passes. Ordinary words are +safe because the stub pre-declares them all (they only `throw` at runtime); the +trap is a definition a later task adds. When the missing definition is created in +**task 1** (e.g. `lasagna`'s constant, `role-playing-game`'s tuple), the exercise +is still Accretive — doing task 1 makes the file compile. + +**Mode B — runtime.** A test for task *K* calls a word the instructions assign +to a later task *M > K*. The file compiles, but that word is still a stub and +`throw`s, so task *K*'s test fails even though the student did tasks 1..K +correctly. This typically happens for opaque/stateful structures (a queue, set, +disjoint-set, mutable resource, global registry) whose task-*K* behaviour can +only be *observed* through a later task's word. + +## Methodology + +Verified empirically with the factor runtime, via `bin/check-accretive`: + +- **Mode A:** compile the test file against the shipped stub. If it does not + compile, the stub is missing a definition (Mode A — unless that definition is + created in task 1). +- **Mode B:** for each prefix *K*, rebuild the exemplar with every word the + instructions assign to a task *> K* re-stubbed (all tuples/constants/symbols + kept, so compilation always succeeds and runtime ordering is isolated), strip + `STOP-HERE`, run, and check that no test in tasks 1..K fails. + +Run it with `FACTOR=/path/to/factor bin/check-accretive [slug...]`. + +## Results: 11 of 47 are not Accretive + +| Verdict | Count | Exercises | +|---|---|---| +| Accretive | 36 | all others | +| Mode A | 5 | bering-bearings, boatswains-bilge, dragons-descendants, factory-failsafe, pirates-path | +| Mode B | 5 | garden-gathering, lighthouse-logbook, poetry-club, quayside-crew, tellers-triage | +| Structural | 1 | telegraphers-tape | + +## Mode A — a definition is introduced after task 1 + +- **factory-failsafe** — `ERROR: machine-error` is created in task 3, but the + tests reference `machine-error` / `machine-error?` (tasks 3–4). The stub only + has a comment, so tasks 1–2 cannot compile. + *Fix:* pre-ship the error class in the stub, exactly as `rpn-calculator` + already ships `zero-divisor-error`. +- **pirates-path** — `gold-count` (a `MEMO:` word, task 4) is left as a comment + in the stub instead of a stub body, so the tests reference an undefined + `gold-count` and nothing compiles. + *Fix:* pre-stub it, e.g. `MEMO: gold-count ( cove -- n ) "unimplemented" throw ;`. +- **dragons-descendants** — the subtuples `fire-dragon` / `ice-dragon` / + `volcano-dragon` and their constructors are defined across tasks 2–4 (the stub + is comment-only), so implementing only task 1 leaves them undefined. + *Fix:* bundle all tuple/constructor definitions into task 1. +- **bering-bearings** — tuples `polar` (task 2) and `relative` (task 3), plus the + direction symbols and the `>cartesian`/`flip` generics, are scattered across + tasks (comment-only stub). + *Fix:* bundle all tuple/symbol/generic definitions into task 1. +- **boatswains-bilge** — task 5's `valve` tuple, `` constructor, and + `is-open` slot are not declared; the tests reference ``/`is-open>>`, so + tasks 1–4 cannot compile. (Tasks 1–4 themselves use a `test-pump` fixture + defined inside the test file and are otherwise fine.) + *Fix:* pre-ship the valve skeleton in the stub — + `TUPLE: valve < disposable is-open ;`, a stubbed ``, and `M: valve + dispose* drop ;` — so the file compiles; task 5 fills in the bodies. + +## Mode B — an early task's test calls a later task's word + +Each of these is a test-design issue: the test for an early task observes its +result through a word the student has not been asked to write yet. Fix by +rewriting the early-task test to observe the early word directly, or by +reordering tasks. + +- **tellers-triage** — task 1 is `{ { } } [ new-queue serve-all ]`, which needs + `serve-all` (task 4); task 2's tests also use `serve-all`/`next-name`. +- **lighthouse-logbook** — a task-1 test `[ empty-log dup "x" sight ]` (checking + each log is fresh) needs `sight` (task 2). +- **garden-gathering** — a task-2 test uses `release` (task 3) to check that ids + keep increasing after a release. +- **poetry-club** — task 1's test uses `circle-of` (task 3); task 2's test uses + `same-circle?` (task 4). A disjoint-set is only observable through those. +- **quayside-crew** — a task-3 test `[ 5 over hoist-crate tonnage>> ... ]` + uses `hoist-crate` (task 4) to show cranes are independent. + +## Structural + +- **telegraphers-tape** — the tests use descriptive `print` labels, not `TASK:` + markers, and each test is end-to-end (construct + read + dispose together), so + no prefix of tasks passes on its own. + *Fix (larger):* regroup the tests under per-task `TASK:` markers, ordered so + each task's tests need only words from that task and earlier; or accept it as a + documented exemption. + +## Suggested track-wide rules (to keep new exercises Accretive) + +1. The stub must make the test file compile on its own — pre-declare every + tuple (with all slots), constant, symbol, error class, and word the tests + reference (words as `"unimplemented" throw ;`). This makes Mode A impossible + and is checkable with no per-exercise metadata. +2. Each task's tests must exercise only words from that task and earlier tasks + (no Mode B). +3. Every exemplar should import `kernel` (so a re-stubbed body's `throw` + resolves; also assumed by `bin/check-accretive`). + +`bin/check-accretive` enforces (1) and (2) and can run in CI. diff --git a/bin/check-accretive b/bin/check-accretive new file mode 100755 index 0000000..e400fed --- /dev/null +++ b/bin/check-accretive @@ -0,0 +1,276 @@ +#!/usr/bin/env python3 +"""Check that concept exercises are "Accretive". + +An exercise is Accretive if a student who correctly implements the first K +tasks (leaving the rest as `"unimplemented" throw ;` stubs) sees every test +for tasks 1..K pass. Two ways this fails: + + Mode A (compile-time): the test file references a tuple/slot/constant/symbol + /word that the starting stub does not declare, so until the task that adds + it is done the whole test file fails to compile and NO task passes. (Words + are normally safe because the stub pre-declares them all; the trap is a + definition introduced by a task later than task 1.) Detected here by + compiling the test file against the shipped stub. + + Mode B (runtime): a test for task K calls a word the instructions assign to a + later task M>K, so that test throws "unimplemented" even when tasks 1..K are + done correctly. Detected here by, for each prefix K, rebuilding the exemplar + with every later-task word re-stubbed and checking no task<=K test fails. + +Usage: + bin/check-accretive [slug...] # default: all concept exercises +Environment: + FACTOR path to the Factor executable (default: factor) + +This script only writes to a fresh mktemp dir; it never touches the checkout. +""" +import json, os, re, shutil, subprocess, sys, tempfile + +REPO = os.path.dirname(os.path.dirname(os.path.abspath(__file__))) +FACTOR = os.environ.get("FACTOR", "factor") +TOOLS = os.path.join(REPO, "lib/exercism-tools") +CONCEPT = os.path.join(REPO, "exercises/concept") + +# Intended task ownership: word -> task number it is implemented in, per the +# instructions. Only body-bearing words are listed (these are what we re-stub); +# tuples/constants/symbols are left intact so Mode-B runs always compile and we +# isolate runtime ordering. Keep this in sync with the exercises' instructions. +MAPS = { + "annalyns-infiltration": {"can-do-fast-attack":1,"can-spy":2,"can-signal-prisoner":3,"can-free-prisoner":4}, + "backyard-birdcount": {"today":1,"increment-day-count":2,"has-day-without-birds?":3,"total":4,"busy-days":5}, + "backyard-birdwatcher": {"today":1,"increment-todays-count":2,"has-day-without-birds?":3,"count-for-first-days":4,"busy-days":5,"pad-missing-days":6}, + "belgian-boxcars": {"couple":1,"peek-couplings":2,"split-at-junctions":3,"coalesce-cargo":4}, + "boardwalk-games": {"roll-die":1,"pick-prize":2,"shuffle-deck":3,"deal-hand":4,"play-seeded":5}, + "bosuns-briefing": {"greeting":1,"crew-line":2,"closing":3,"roster":4,"briefing":5}, + "boutique-bookkeeping": {"sort-by-price":1,"with-missing-price":2,"expensive-items":3,"cheapest-item":4,"total-price":5,"format-price-tag":6}, + "bunting-bonanza": {"alphabet-bunting":1,"counting-bunting":2,"stripe-bunting":3,"marker-bunting":4,"valley-bunting":5}, + "cargo-shuffle": {"swap-crates":1,"clear-spill":2,"peek-under":3,"tidy-deck":4}, + "cars-assemble": {"production-status":1,"success-rate":2,"production-rate-per-hour":3,"working-items-per-minute":4}, + "channel-chatter": {"hear-out":1,"count-messages":2,"echo-back":3,"broadcast":4,"capture":5}, + "character-study": {"compare-chars":1,"size-of-char":2,"change-size-of-char":3,"type-of-char":4}, + "coordinate-choreography": {"translate-2d":1,"scale-2d":2,"compose-transformations":3,"apply-transformation":4,"transform-points":5}, + "currency-conversion": {"exchange-money":1,"get-change":2,"value-of-bills":3,"number-of-bills":4,"leftover-of-bills":5,"exchangeable-value":6,"safe-change":7,"cap-spend":8}, + "factory-failsafe": {"check-humidity":1,"check-temperature":2,"monitor":4,"monitor-batch":5}, + "ferry-schedule": {"make-date":1,"weekday-name":2,"leap?":3,"month-length":4,"add-days":5}, + "garden-gathering": {"open-garden":1,"list-registrations":1,"register":2,"release":3,"get-registration":4,"find-by-name":5}, + "high-school-sweetheart": {"cleanupname":1,"firstletter":2,"initial":3,"couple":4}, + "joiners-journey": {"with-kerf":1,"kerf-and-finish":2,"cut-card":3,"per-piece":4,"compare-bolts":5}, + "lap-leaderboard": {"assign-bibs":1,"lane-labels":2,"tag-racers":3,"record-finishes":4,"lap-bells":5}, + "lasagna": {"preparation-time":2,"remaining-time":3,"total-working-time":4}, + "lasagna-luminary": {"cooking-status":1,"preparation-time":2,"quantities":3,"add-secret-ingredient":4,"scale-recipe":5}, + "ledger-lookout": {"valid-amount?":1,"dollar-amounts":2,"percentages":3,"flagged?":4}, + "librarians-ledger": {"protected-balance":1,"running-balance":2,"least-balance-so-far":3,"halve-until":4}, + "lighthouse-logbook": {"empty-log":1,"sight":2,"seen?":3,"forget-sighting":4,"unique-count":5,"reachable":6}, + "log-levels": {"message":1,"log-level":2,"reformat":3}, + "mixed-juices": {"time-to-mix-juice":1,"wedges-from-lime":2,"limes-to-cut":2,"order-times":3,"remaining-orders":4}, + "mixtape-maker": {"count-combinations":1,"count-permutations":2,"list-combinations":3,"list-permutations":4,"combinations-summing-to":5}, + "mosaic-making": {"tile-strip":1,"row-of-three":2,"combine-rows":3,"mirror-row":4,"tile-position":5,"has-colour?":6}, + "mosaic-mischief": {"fresh-mosaic":1,"place-tile":2,"chip-tile":3,"recolour-tile":4,"snapshot-mosaic":5,"stash-tile":6,"return-tile":7}, + "passphrase-patrol": {"valid-badge?":1,"badge-codes":2,"digit-count":3,"redact":4}, + "pirates-path": {"tide-queue":1,"coves-reachable":2,"hop-count":3,"gold-count":4,"treasure-route":5}, + "poetry-club": {"new-club":1,"collaborate":2,"circle-of":3,"same-circle?":4}, + "pursers-pantry": {"create-inventory":1,"add-items":2,"decrement-items":3,"remove-item":4,"list-inventory":5}, + "quayside-crew": {"weigh-crate":1,"weigh-all":2,"":3,"hoist-crate":4,"crane-tonnage":5,"load-cargo":6}, + "role-playing-game": {"introduce":2,"revive":3,"take-damage":4}, + "rpn-calculator": {"add-op":1,"multiply-op":2,"apply-op":3,"evaluate":4,"evaluate-named":5,"divide-op":6}, + "secrets": {"shift-back":1,"set-bits":2,"flip-bits":3,"clear-bits":4}, + "signalers-satchel": {"quote-value":1,"flag-body":2,"split-flag":3,"triangulate":4,"triangle-stats":5}, + "tellers-triage": {"new-queue":1,"join-queue":2,"next-name":3,"serve-all":4}, + "valentines-day": {"rate-restaurant":1,"rate-movie":2,"rate-walk":3,"rate-activity":4,"approval-counts":5}, + "vltava-weather-watch": {"read-readings":1,"latest-reading":2,"log-text":3,"record-reading":4,"rewrite-log":5}, +} + +# Exercises whose Mode-B prefix run we cannot synthesize automatically (MACRO: +# bodies cannot be re-stubbed with a throw, and exercises without TASK: markers +# have no prefixes). Mode A is still checked for them. +MODEB_SKIP = {"signal-stencils", "telegraphers-tape", "bering-bearings", + "boatswains-bilge", "dragons-descendants"} + +# The shipped stub does not make the test file compile, but the missing +# definition is created in task 1, so the exercise is still accretive by the +# implement-first-K-tasks rule (a student who does task 1 sees task 1 pass). +ACCRETIVE_VIA_TASK1 = {"lasagna", "role-playing-game"} + +DEF_STARTERS = {":","::","M:","M::","TYPED:","TYPED::","MEMO:","MEMO::"} + + +def tokenize(text): + i, n = 0, len(text) + while i < n: + c = text[i] + if c.isspace(): + i += 1; continue + if c == '"': + j = i + 1 + while j < n: + if text[j] == '\\': j += 2; continue + if text[j] == '"': j += 1; break + j += 1 + yield (text[i:j], i, j); i = j; continue + j = i + while j < n and not text[j].isspace(): + j += 1 + yield (text[i:j], i, j); i = j + + +def restub(text, words_to_stub): + toks = list(tokenize(text)) + edits, k = [], 0 + while k < len(toks): + t = toks[k][0] + if t in DEF_STARTERS: + if t in ("M:", "M::"): + if k + 2 >= len(toks): k += 1; continue + name, after = toks[k + 2][0], k + 3 + else: + if k + 1 >= len(toks): k += 1; continue + name, after = toks[k + 1][0], k + 2 + body_start = after + if after < len(toks) and toks[after][0] == "(": + m = after + while m < len(toks) and toks[m][0] != ")": + m += 1 + body_start = m + 1 + term = body_start + while term < len(toks) and toks[term][0] != ";": + term += 1 + if term < len(toks) and name in words_to_stub and body_start < len(toks): + edits.append((toks[body_start][1], toks[term][1])) + k = term + 1 + continue + k += 1 + for body_off, semi_off in sorted(edits, reverse=True): + text = text[:body_off] + ' "unimplemented" throw ' + text[semi_off:] + return text + + +def ensure_kernel(text): + def repl(mo): + body = mo.group(1) + return mo.group(0) if re.search(r'\bkernel\b', body) else "USING: " + body.strip() + " kernel ;" + new, n = re.subn(r'USING:\s+(.*?);', repl, text, count=1, flags=re.S) + return new if n else "USING: kernel ;\n" + text + + +def task_ranges(tests_text): + out = [] + for idx, ln in enumerate(tests_text.splitlines(), start=1): + m = re.match(r'\s*TASK:\s+(\d+)\b', ln) + if m: + out.append((idx, int(m.group(1)))) + return out + + +def line_to_task(line, ranges): + task = None + for start, t in ranges: + if line >= start: task = t + else: break + return task + + +def setup(slug, write_solutions=None): + """Make a temp run dir; optionally overwrite solution files. Returns (dir, cfg).""" + src = os.path.join(CONCEPT, slug) + cfg = json.load(open(os.path.join(src, ".meta/config.json"))) + work = tempfile.mkdtemp(prefix=f"acc-{slug}-") + shutil.copytree(os.path.join(src, slug), os.path.join(work, slug)) + shutil.copytree(TOOLS, os.path.join(work, "exercism-tools")) + if write_solutions: + for rel, content in write_solutions.items(): + with open(os.path.join(work, rel), "w") as f: + f.write(content) + # strip STOP-HERE in every test file + for rel in cfg["files"]["test"]: + p = os.path.join(work, rel) + txt = "\n".join(l for l in open(p).read().splitlines() if "STOP-HERE" not in l) + open(p, "w").write(txt) + return work, cfg + + +def run(work, slug): + p = subprocess.run([FACTOR, "-roots=.", "-run=exercism-tools", slug], + cwd=work, capture_output=True, text=True, timeout=300) + return p.stdout + p.stderr + + +def stub_compiles(slug): + work, _ = setup(slug) + try: + out = run(work, slug) + return "Unit Test:" in out + finally: + shutil.rmtree(work, ignore_errors=True) + + +def modeb_violations(slug): + """Return dict K -> list of early task numbers whose tests fail, or {} if clean.""" + src = os.path.join(CONCEPT, slug) + cfg = json.load(open(os.path.join(src, ".meta/config.json"))) + wmap = MAPS[slug] + N = max(wmap.values()) + sols, exes = cfg["files"]["solution"], cfg["files"].get("exemplar") or cfg["files"]["example"] + test_base = os.path.basename(cfg["files"]["test"][0]) + bad = {} + for K in range(1, N): + to_stub = {w for w, t in wmap.items() if t > K} + writes = {} + for sol, exe in zip(sols, exes): + writes[sol] = ensure_kernel(restub(open(os.path.join(src, exe)).read(), to_stub)) + work, _ = setup(slug, writes) + try: + ttext = "\n".join(l for l in open(os.path.join(work, cfg["files"]["test"][0])).read().splitlines()) + ranges = task_ranges(ttext) + out = run(work, slug) + if "Unit Test:" not in out: + bad[K] = ["COMPILE-FAIL"]; continue + fl = [int(m.group(1)) for m in re.finditer(re.escape(test_base) + r":\s*(\d+)", out)] + early = sorted({line_to_task(l, ranges) for l in fl if line_to_task(l, ranges) and line_to_task(l, ranges) <= K}) + if early: + bad[K] = early + finally: + shutil.rmtree(work, ignore_errors=True) + return bad + + +def main(): + slugs = sys.argv[1:] or sorted(os.listdir(CONCEPT)) + failures = 0 + for slug in slugs: + if not os.path.isdir(os.path.join(CONCEPT, slug)): + continue + compiles = stub_compiles(slug) + notes = [] + verdict = "ACCRETIVE" + # Structural: no TASK: markers means tests are not grouped by task and + # cannot be passed task-by-task. + ttext = open(os.path.join(CONCEPT, slug, json.load(open(os.path.join(CONCEPT, slug, ".meta/config.json")))["files"]["test"][0])).read() + if not task_ranges(ttext): + verdict = "STRUCTURAL" + notes.append("tests are not grouped by TASK: marker, so no prefix of tasks can pass on its own") + elif not compiles: + if slug in ACCRETIVE_VIA_TASK1: + notes.append("stub omits a definition that task 1 creates; accretive by the implement-first-K rule") + else: + verdict = "MODE-A" + notes.append("stub does not pre-declare everything the test file needs (a " + "tuple/slot/constant/symbol/word is introduced by a task later than task 1)") + if slug in MODEB_SKIP: + notes.append("Mode-B prefix run skipped (macros / no TASK: markers); checked by hand") + elif slug in MAPS: + bad = modeb_violations(slug) + if bad: + verdict = "MODE-A+B" if verdict == "MODE-A" else "MODE-B" + for K in sorted(bad): + notes.append(f"prefix K={K}: task(s) {bad[K]} fail (an earlier task's test calls a later task's word)") + if verdict != "ACCRETIVE": + failures += 1 + print(f"{slug:28s} {verdict}") + for nt in notes: + print(f" - {nt}") + print(f"\n{failures} non-accretive of {len([s for s in slugs if os.path.isdir(os.path.join(CONCEPT, s))])} checked") + sys.exit(1 if failures else 0) + + +if __name__ == "__main__": + main() diff --git a/exercises/concept/ledger-lookout/.meta/exemplar.factor b/exercises/concept/ledger-lookout/.meta/exemplar.factor index bf919e0..65dc235 100644 --- a/exercises/concept/ledger-lookout/.meta/exemplar.factor +++ b/exercises/concept/ledger-lookout/.meta/exemplar.factor @@ -1,4 +1,4 @@ -USING: regexp ; +USING: kernel regexp ; IN: ledger-lookout : valid-amount? ( line -- ? ) diff --git a/exercises/concept/passphrase-patrol/.meta/exemplar.factor b/exercises/concept/passphrase-patrol/.meta/exemplar.factor index 2ca5243..a603ab3 100644 --- a/exercises/concept/passphrase-patrol/.meta/exemplar.factor +++ b/exercises/concept/passphrase-patrol/.meta/exemplar.factor @@ -1,4 +1,4 @@ -USING: regexp ; +USING: kernel regexp ; IN: passphrase-patrol : valid-badge? ( badge -- ? )