Semantic quality score is a hardcoded 0.8 placeholder

## Context

SPEC-03 defines Dimension 5 (Semantic Quality, weight 0.20) as an LLM review of the diff that evaluates code quality, readability, and adherence to conventions.

## Current Behavior

`runner.py` hardcodes `dimensions["semantic"] = 0.8`. No LLM call is made.

## Expected Behavior

After a mutation, send the diff to an LLM (via the Reflector or a dedicated scorer) to get a semantic quality assessment. Parse the response into a 0.0-1.0 score.

## Impact

- 20% of the composite score is constant, providing no signal
- All experiments get the same semantic score regardless of quality
- Noted as intentional in IMPL-08 — placeholder until evolution layer LLM review is wired up

## References

- `atlas-specs/03-EVALUATION.md` — Dimension 5: Semantic Quality
- `atlas/evaluation/runner.py` — `dimensions["semantic"] = 0.8`
- `atlas/evolution/reflector.py` — could be extended to also produce a semantic score

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Semantic quality score is a hardcoded 0.8 placeholder #16

Context

Current Behavior

Expected Behavior

Impact

References

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Semantic quality score is a hardcoded 0.8 placeholder #16

Description

Context

Current Behavior

Expected Behavior

Impact

References

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions