
fix: harden RDNA device property patch against OOM and attr loss #53

Open
on22s wants to merge 19 commits into Finrandojin:dev from on22s:fix/rdna-device-props


on22s commented Jun 4, 2026


Summary

Three bugs in _patch_rdna_device_properties and the batch warmup paths:

  • Device key TypeError: int(device) raises TypeError when get_device_properties is called with a torch.device object (e.g. torch.device('cuda:0')). Fixed by using dev.index, falling back to current_device() for unindexed devices and to 0 on any error.

  • SimpleNamespace silently drops C-extension attrs: dir(props) does not enumerate every C-extension attribute on the real props object, so any property not surfaced by dir() is missing from the patched result. Replaced with _RDNADeviceProps, a thin proxy that delegates all attribute lookups to the real props object and overrides only multi_processor_count and warp_size. Future PyTorch device property fields are forwarded automatically, with no maintenance required.

  • Warmup loads second model into VRAM (OOM): _local_batch_clone and _local_batch_lora both called self._init_local_custom() to obtain the warmup model while the clone/LoRA model was already loaded. Keeping two full models resident simultaneously causes OOM on 12–16 GB cards. Fixed by warming up with the already-loaded model. For the LoRA path, the warmup block now sits inside the adapter loop, so the LoRA model is resident before warmup runs.
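
For reviewers, the proxy idea from the second bullet can be sketched roughly as below. This is an illustration only, not the code in this PR: the constructor signature, the `FakeProps` stand-in, and the override values 60 and 64 are assumptions. Only the class name `_RDNADeviceProps` and the two overridden field names come from the description.

```python
class _RDNADeviceProps:
    """Thin proxy: override two fields, forward every other lookup."""

    def __init__(self, props, multi_processor_count, warp_size):
        self._props = props
        self.multi_processor_count = multi_processor_count
        self.warp_size = warp_size

    def __getattr__(self, name):
        # __getattr__ runs only when normal lookup fails, so the two
        # overrides set in __init__ always win over the wrapped object.
        if name == "_props":
            # Guard against recursion if _props was never assigned.
            raise AttributeError(name)
        return getattr(self._props, name)


class FakeProps:
    """Hypothetical stand-in for the C-extension props object."""
    name = "example-gpu"
    total_memory = 16 * 1024**3
    multi_processor_count = 30
    warp_size = 32

    def __dir__(self):
        # Simulate an attribute (total_memory) that dir() does not list.
        return ["name", "multi_processor_count", "warp_size"]


# Example override values only; the real corrections live in the patch.
patched = _RDNADeviceProps(FakeProps(), multi_processor_count=60, warp_size=64)
print(patched.total_memory, patched.multi_processor_count, patched.warp_size)
```

A copy built by iterating over `dir(props)` would lose `total_memory` in this setup, while the proxy still resolves it through `getattr` on the wrapped object.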

Test plan

  • Verify get_device_properties called with torch.device('cuda:0') no longer raises TypeError on RDNA hardware
  • Verify patched props object forwards arbitrary attributes (e.g. props.total_memory, props.name) correctly to the underlying device
  • Verify batch clone generation completes on a 12–16 GB RDNA card without OOM (previously required a restart after warmup)
  • Verify batch LoRA generation completes with warmup running against the LoRA model, not a separately loaded CustomVoice model

🤖 Generated with Claude Code

Finrandojin and others added 19 commits May 28, 2026 12:19
System updates can change the ROCm kernel module version without warning,
breaking the previously hardcoded rocm6.3 torch wheel. Detect the installed
ROCm version from /opt/rocm/.info/version and map to the nearest PyTorch
index URL (7.x→rocm7.2, 6.3→rocm6.3, 6.2→rocm6.2.4, fallback rocm6.3).
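
The version mapping described above could look roughly like this. The function names and the parsing of the version file are assumptions made for illustration; the file path and the mapping table are taken from the commit message.

```python
from pathlib import Path

DEFAULT_TAG = "rocm6.3"  # fallback named in the commit message


def rocm_index_tag(version: str) -> str:
    """Map an installed ROCm version string to a PyTorch index tag."""
    parts = version.strip().split(".")
    major = parts[0]
    minor = parts[1] if len(parts) > 1 else ""
    if major == "7":
        return "rocm7.2"
    if major == "6" and minor == "3":
        return "rocm6.3"
    if major == "6" and minor == "2":
        return "rocm6.2.4"
    return DEFAULT_TAG


def detect_rocm_tag(info_file: str = "/opt/rocm/.info/version") -> str:
    """Read the version file; use the fallback if it cannot be read."""
    try:
        return rocm_index_tag(Path(info_file).read_text())
    except OSError:
        return DEFAULT_TAG
```

Keeping the mapping separate from the file read makes the table easy to test without a ROCm install.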

Ref pinokiocomputer/pinokio#1087

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Resolves Dependabot alert for DoS via unbounded multipart part headers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The previous implementation iterated directly over process.stdout on the
calling thread, which blocks cancellation checks and would require
select.select() if stderr were separated — an API that does not work on
Windows pipes.

A dedicated daemon thread now drains stdout into a queue.Queue. The drain
loop calls queue.get(timeout=0.05) so it can honour a per-task "cancel"
flag via process.terminate() between reads, with no platform-specific I/O
multiplexing needed.
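
A minimal sketch of the reader-thread pattern described above, assuming stderr is merged into stdout. The function name `stream_subprocess` and the `on_line` and `should_cancel` callbacks are hypothetical and do not reflect the actual signature of the helper in app/app.py.

```python
import queue
import subprocess
import sys
import threading


def stream_subprocess(cmd, on_line, should_cancel=lambda: False):
    """Run cmd, pass each output line to on_line, honour cancellation."""
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    lines = queue.Queue()
    done = object()  # sentinel marking end of stream

    def reader():
        # Blocking reads happen here, off the calling thread.
        for line in proc.stdout:
            lines.put(line)
        lines.put(done)

    threading.Thread(target=reader, daemon=True).start()

    while True:
        if should_cancel():
            proc.terminate()  # works on both POSIX and Windows
            break
        try:
            item = lines.get(timeout=0.05)
        except queue.Empty:
            continue  # nothing yet; loop back to the cancel check
        if item is done:
            break
        on_line(item.rstrip("\n"))
    return proc.wait()


if __name__ == "__main__":
    code = stream_subprocess([sys.executable, "-c", "print('hello')"], print)
    print("exit code:", code)
```

The short queue timeout is what bounds cancellation latency: the loop rechecks the flag at most 0.05 s after the last line arrived.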

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds a Preparer tab to the Alexandria web UI allowing users to generate
LoRA training datasets from audiobooks, either one at a time or in a batch.

Backend (app/app.py):
- PreparerConfig, BatchPreparerTask, BatchPreparerRequest Pydantic models
- check_disk_space(), _normalize_filename_tokens(), _fuzzy_score() helpers
- _stream_subprocess_to_logs(): cross-platform stdout capture via thread +
  queue.Queue — no select.select(), works on Windows pipes
- /api/preparer/suggest_source  — fuzzy-match uploaded EPUB/TXT to audio file
- /api/preparer/start           — upload + run preparer script (single file)
- /api/preparer/cancel          — send SIGTERM to running preparer
- /api/preparer/list            — list generated dataset ZIPs
- /api/preparer/download/{path} — download a dataset ZIP
- /api/preparer/batch/start     — queue multiple files, run sequentially
- /api/preparer/batch/cancel    — cancel in-progress batch
- 503 guard on both start endpoints when app/alexandria_preparer.py absent

Frontend (app/static/index.html):
- Preparer nav tab (Advanced section)
- Single-mode: file pickers for audio + source, "Match" auto-suggest button
- Batch mode: multi-file picker, auto-match queue table, per-task status badges
- Shared config: language, confidence, min SNR, keep-unaligned toggle
- Live log window with 1 s polling, Cancel button, status message

Tests (app/test_api.py):
- preparer and batch_preparer added to status_known_tasks check
- 8 new quick tests: suggest_source, status, cancel-when-idle, list, download-404,
  batch-schema, batch-cancel

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Drop EPUB/ebook source alignment step per upstream feedback (discussion
Finrandojin#40). Finrandojin wants a tool for processing audio the user owns, not
one that implies extracting from commercial audiobooks.

- Remove suggest_source endpoint and all EPUB fuzzy-match logic
- Remove source_filename and keep_unaligned from Pydantic models and API
- Remove source file picker and Match button from UI
- Rename nav label and tab header to "Voice Training Dataset Builder"
- Update descriptions to emphasise own recordings / CC audio use cases
- Remove test_preparer_suggest_source_no_match (endpoint deleted)

Tests: 52 passed, 0 failed

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Persona prompts were hardcoded in generate_personas.py, unlike script
generation and review prompts which are user-editable via the UI. Extract
persona prompts to persona_prompts.txt, add a loader module following the
existing review_prompts.py pattern, and wire into the config API and
frontend Prompt Settings section with three new textareas.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add tests for /api/system/stats, persona prompt config roundtrip,
persona cancel endpoint, M4B audiobook endpoint, and persona status
polling. Update default_prompts test to verify persona prompt fields.
72 tests, all passing in full mode.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove unused imports (tempfile, DEFAULT_SYSTEM_PROMPT/DEFAULT_USER_PROMPT),
unused constants (VOICES_PATH, BUILTIN_LORA_MANIFEST), redundant
_atomic_json_write wrappers in app.py and project.py, obsolete
parse_voices.py and its endpoint/test, a debug print in update_chunk,
and a bare except clause.

Fix all dynamically-generated onclick attributes where JSON.stringify
double quotes conflicted with the attribute's double-quote delimiters,
silently breaking buttons in saved scripts, voice designer, LoRA
datasets, and LoRA models. Also fix loadScript to force full chunk
redraw and show a success toast.
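
The quoting conflict is easiest to see in isolation. The actual fix lives in the frontend JavaScript and is not shown here; the Python sketch below only reproduces the same failure mode with `json.dumps` and shows one possible remedy (HTML-escaping the attribute value). The helper name `onclick_attr` and the use of `loadScript` as the handler are assumptions for illustration.

```python
import html
import json


def onclick_attr(func_name: str, arg) -> str:
    """Build an onclick attribute whose JSON argument cannot end it early."""
    js_call = f"{func_name}({json.dumps(arg)})"
    # Escape the double quotes emitted by the JSON encoder so they do not
    # terminate the double-quoted HTML attribute.
    return f'onclick="{html.escape(js_call, quote=True)}"'


# Naive version: the JSON quotes close the attribute after "loadScript(".
broken = 'onclick="loadScript(' + json.dumps("my book") + ')"'
fixed = onclick_attr("loadScript", "my book")
print(broken)
print(fixed)
```

The browser decodes `&quot;` back to `"` before evaluating the handler, so the script still receives a valid string literal.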

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Resolve conflicts: keep import signal from PR, keep updateSystemStats
from main, combine both.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ss, scope preparer output

Re-apply cleanup from 392e8a2 that PR Finrandojin#47 reintroduced (dead imports,
unused constants, _atomic_json_write wrapper, parse_voices endpoint,
broken onclick quoting, loadScript fixes).

Additional fixes to PR Finrandojin#47:
- Remove unused _fuzzy_score and _normalize_filename_tokens functions
- Replace os.kill(pid, SIGTERM) with process.terminate() for cross-platform
- Deduplicate run_process by delegating to _stream_subprocess_to_logs
- Remove duplicate check_disk_space definition
- Scope preparer output to dedicated preparer_output/ directory
- Rename nav tab from "Dataset Builder" to "Preparer" to avoid confusion
- Add missing Form import for FastAPI endpoint
- Add preparer_output/ to .gitignore

Ref Finrandojin#46, Ref Finrandojin#47

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Resolve conflicts: keep _stream_subprocess_to_logs from dev,
combine persona + preparer/batch_preparer task names in tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Flash and memory-efficient SDPA kernels on ROCm 7.2+ hang during
batched attention with left-padded sequences. Disable these backends
for batch clone/LoRA methods on AMD GPUs, falling back to the math
SDPA kernel. Single generation and NVIDIA users are unaffected.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The batch generation "hang" on ROCm 7.x was caused by the GPU's DPM
controller aggressively downclocking the shader engine between
autoregressive steps, not an SDPA kernel bug. Setting the COMPUTE
power profile (pp_power_profile_mode=5) enforces a min clock floor
and resolves the issue.

- Remove SDPA backend disable workaround (was masking the real cause)
- Remove GPU keepalive thread (unnecessary with COMPUTE profile)
- Fix batch warmup to use CustomVoice model (Base model lacks custom
  voice speakers, causing warmup to fail silently)
- Add RDNA2/3 CU count and warp size correction (ROCm reports half
  the CUs and warp_size=32 instead of 64 on consumer GPUs)
- Document COMPUTE power profile fix in README
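
For reference, applying the COMPUTE profile from Python might look like the sketch below. It assumes the amdgpu sysfs interface, where the device directory is typically something like /sys/class/drm/card0/device (the card index varies per system) and where the performance level has to be set to manual before a profile can be selected. The function name is hypothetical, writing these files normally requires root, and the README added in this commit remains the authoritative procedure.

```python
from pathlib import Path

COMPUTE_PROFILE = "5"  # value quoted in the commit message


def set_compute_profile(device_dir: str) -> None:
    """Select the COMPUTE power profile for one amdgpu device directory."""
    dev = Path(device_dir)
    # Assumption: the driver accepts a profile change only while the
    # performance level is set to manual.
    (dev / "power_dpm_force_performance_level").write_text("manual")
    (dev / "pp_power_profile_mode").write_text(COMPUTE_PROFILE)
```

Taking the directory as a parameter avoids hardcoding a card index and lets the function be exercised against a temporary directory.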

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Three bugs in _patch_rdna_device_properties and the batch warmup paths:

1. Device key normalization: int(device) raises TypeError when called
   with a torch.device object (e.g. torch.device('cuda:0')).  Use
   dev.index instead, falling back to current_device() for unindexed
   devices and 0 on any error.

2. SimpleNamespace proxy silently drops C-extension attributes that
   don't appear in dir(props).  Replace with _RDNADeviceProps, a thin
   proxy that delegates all attribute lookups to the real props object
   and only overrides multi_processor_count and warp_size.  Future
   PyTorch device property fields are forwarded automatically.

3. _local_batch_clone and _local_batch_lora both loaded the CustomVoice
   model as the warmup target while the clone/LoRA model was already in
   VRAM.  Having two full models resident simultaneously causes OOM on
   12–16 GB cards.  Use the already-loaded model for warmup instead.
   For the LoRA path, move the warmup block inside the adapter loop so
   the LoRA model is resident before warmup runs.
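
The fallback order in item 1 can be sketched without importing torch. The function name, the injected `current_device` callable, and the `FakeDevice` stand-in are assumptions; in the real patch the fallback would presumably be torch.cuda.current_device().

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeDevice:
    """Hypothetical stand-in for torch.device (type plus optional index)."""
    type: str
    index: Optional[int] = None


def normalize_device_index(device, current_device=lambda: 0) -> int:
    """Return an integer key for an int or a device-like object."""
    try:
        if isinstance(device, int):
            return device
        index = getattr(device, "index", None)
        if index is None:
            # Unindexed device, e.g. torch.device("cuda")
            return int(current_device())
        return int(index)
    except Exception:
        # Last resort described in the commit message: device 0.
        return 0


print(normalize_device_index(FakeDevice("cuda", 0)))
print(normalize_device_index(FakeDevice("cuda"), current_device=lambda: 1))
```

Plain strings such as "cuda:0" are not parsed in this sketch and end up in the 0 fallback; real code would convert them to a device object first.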

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
on22s changed the base branch from main to dev June 5, 2026 12:29