SONAR Prompt Optimization

Experiments in learning interpretable text prompts via optimization in SONAR embedding space.

Approach

Two-stage optimization system:

Stage 1: Optimize an embedding vector z whose decoding by the SONAR decoder yields the prompt tokens
Stage 2: Use the generated prompt with z=0 (unconditioned) to solve tasks

Key technique: Straight-through gradient estimation via embedding geometry.
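The sketch below is a minimal illustration of a straight-through estimator. It assumes that "embedding geometry" means snapping a continuous vector to its nearest token embedding by Euclidean distance; the actual script may use cosine similarity or a different projection. The forward pass uses the snapped (discrete) embedding, and the backward pass copies the gradient onto the continuous vector unchanged. NumPy stands in for PyTorch autograd, so the gradient is written out by hand. The vocabulary, target, and function names are hypothetical.

```python
import numpy as np


def nearest_token(x: np.ndarray, vocab: np.ndarray) -> int:
    """Index of the vocabulary embedding closest to x (Euclidean distance; an assumption)."""
    return int(np.argmin(np.linalg.norm(vocab - x, axis=1)))


def ste_step(x: np.ndarray, vocab: np.ndarray, target: np.ndarray, lr: float = 0.1):
    """One gradient step on ||snap(x) - target||^2 with a straight-through estimator.

    Forward: replace x by its nearest vocabulary embedding (non-differentiable).
    Backward: treat d snap(x) / dx as the identity, so dL/dx = dL/d snap(x).
    """
    idx = nearest_token(x, vocab)
    snapped = vocab[idx]
    loss = float(np.sum((snapped - target) ** 2))
    grad = 2.0 * (snapped - target)  # gradient w.r.t. snapped, passed straight through to x
    return x - lr * grad, idx, loss


# Toy 2-D vocabulary of three hypothetical "tokens".
vocab = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
target = vocab[2]          # we want the snapped vector to be token 2
x = np.array([0.9, 0.2])   # starts nearest to token 1
for _ in range(20):
    x, idx, loss = ste_step(x, vocab, target)
print(idx, loss)
```

The continuous vector keeps moving even while its snapped token does not change, which is what lets it eventually cross into the target token's region. Once the snapped token matches the target, the gradient is zero and x stops moving.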

Key Finding

Using perplexity (PPL) regularization with weight 0.1 stabilizes optimization without dominating the task loss. The optimization reaches 67% accuracy on an antonym-completion task; certain examples (hot -> cold, happy -> sad) consistently fail.
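A sketch of how a perplexity penalty with weight 0.1 could be combined with the task loss. It assumes the penalty is the log-perplexity of the generated prompt (its mean per-token negative log-likelihood under some language model); the script may instead penalize raw perplexity or score the prompt with a different model. The function names and the example log-probabilities are hypothetical.

```python
import math
from typing import Sequence


def log_perplexity(token_logprobs: Sequence[float]) -> float:
    """Mean negative log-likelihood per token; perplexity is exp() of this value."""
    if not token_logprobs:
        raise ValueError("need at least one token")
    return -sum(token_logprobs) / len(token_logprobs)


def regularized_loss(
    task_loss: float, token_logprobs: Sequence[float], ppl_weight: float = 0.1
) -> float:
    """Task loss plus a weighted fluency penalty on the generated prompt (assumed form)."""
    return task_loss + ppl_weight * log_perplexity(token_logprobs)


# Hypothetical per-token log-probabilities of a 4-token prompt.
logprobs = [math.log(0.5)] * 4
print(math.exp(log_perplexity(logprobs)))  # perplexity of the prompt
print(regularized_loss(1.0, logprobs))
```

Penalizing the log rather than raw perplexity keeps the penalty on the same scale as a cross-entropy task loss, which is one way a small weight can add stability without overwhelming the task term.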

Usage

uv run python scripts/optimize_prompt.py

Structure

scripts/
  optimize_prompt.py    # Main optimization script
src/prompt_interp/      # Package stub
papers/                 # Reference papers (SONAR, EPO, ContextBench)

Requirements

Python 3.12+
CUDA GPU
SONAR (sonar-space)
PyTorch

Install dependencies:

uv sync

Ideas to try:

[done] Add a perplexity term.
[done] At each iteration, update z to be its re-encoding.
Try PCA with different learning rates.
Test whether jailbreaks transfer to normal TinyStories models.
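One possible reading of the "PCA with different learning rates" idea, sketched as an assumption rather than anything implemented in the repository: estimate principal directions from a sample of embeddings, express the gradient of z in that basis, scale each component by its own learning rate, and map the step back to embedding space. The sample embeddings, the per-component rates, and the function names are all hypothetical.

```python
import numpy as np


def pca_basis(samples: np.ndarray) -> np.ndarray:
    """Principal directions (orthonormal rows) of mean-centred samples, largest variance first.

    With at least as many samples as dimensions this is a full basis; otherwise
    steps built from it are confined to the sampled subspace.
    """
    centred = samples - samples.mean(axis=0)
    # Rows of vt are the right singular vectors, i.e. the principal directions.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return vt


def pca_scaled_step(z: np.ndarray, grad: np.ndarray, basis: np.ndarray, lrs: np.ndarray) -> np.ndarray:
    """Gradient step in which each principal component of the gradient has its own learning rate."""
    coeffs = basis @ grad            # gradient expressed in the PCA basis
    step = basis.T @ (lrs * coeffs)  # scale per component, then map back to embedding space
    return z - step


rng = np.random.default_rng(0)
# Toy 3-D "embeddings" stretched along the first axis (hypothetical data).
samples = rng.normal(size=(200, 3)) * np.array([5.0, 1.0, 0.2])
basis = pca_basis(samples)
lrs = np.array([0.01, 0.1, 1.0])  # e.g. smaller steps along high-variance directions
z = np.zeros(3)
grad = np.array([1.0, 1.0, 1.0])
z_new = pca_scaled_step(z, grad, basis, lrs)
print(z_new)
```

With equal learning rates for every component the update reduces to plain gradient descent, so the per-component rates are the only thing this variant changes.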
