fix: harden RDNA device property patch against OOM and attr loss #53
Open
on22s wants to merge 19 commits into
System updates can change the ROCm kernel module version without warning, breaking the previously hardcoded rocm6.3 torch wheel. Detect the installed ROCm version from /opt/rocm/.info/version and map to the nearest PyTorch index URL (7.x→rocm7.2, 6.3→rocm6.3, 6.2→rocm6.2.4, fallback rocm6.3). Ref pinokiocomputer/pinokio#1087 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
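The version-to-index mapping described in this commit could be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the function names `rocm_index_tag` and `detect_rocm_index_tag` are hypothetical, and the version file is assumed to start with a `major.minor` string such as `6.3.1-48`.

```python
import re
from pathlib import Path

# Path and fallback taken from the commit message.
ROCM_VERSION_FILE = Path("/opt/rocm/.info/version")
FALLBACK_TAG = "rocm6.3"


def rocm_index_tag(version_text):
    """Map a ROCm version string to a PyTorch wheel index tag (hypothetical helper)."""
    match = re.match(r"\s*(\d+)\.(\d+)", version_text or "")
    if not match:
        return FALLBACK_TAG
    major, minor = int(match.group(1)), int(match.group(2))
    if major == 7:
        return "rocm7.2"      # 7.x -> rocm7.2
    if (major, minor) == (6, 3):
        return "rocm6.3"      # 6.3 -> rocm6.3
    if (major, minor) == (6, 2):
        return "rocm6.2.4"    # 6.2 -> rocm6.2.4
    return FALLBACK_TAG       # anything else -> rocm6.3


def detect_rocm_index_tag(path=ROCM_VERSION_FILE):
    """Read the version file; fall back if it is missing or unreadable."""
    try:
        return rocm_index_tag(path.read_text())
    except OSError:
        return FALLBACK_TAG
```

The returned tag would then be appended to the PyTorch wheel index base URL when installing torch. Keeping the parsing separate from the file read makes the mapping easy to check without a ROCm installation.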
Resolves Dependabot alert for DoS via unbounded multipart part headers. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The previous implementation iterated directly over process.stdout on the calling thread, which blocks cancellation checks and would require select.select() if stderr were separated — an API that does not work on Windows pipes. A dedicated daemon thread now drains stdout into a queue.Queue. The drain loop calls queue.get(timeout=0.05) so it can honour a per-task "cancel" flag via process.terminate() between reads, with no platform-specific I/O multiplexing needed. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
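A minimal sketch of the thread-plus-queue pattern this commit describes, under stated assumptions: the function name `stream_subprocess` and the `on_line` / `should_cancel` callbacks are hypothetical stand-ins for the project's `_stream_subprocess_to_logs` and its per-task cancel flag, and stderr is merged into stdout for simplicity.

```python
import queue
import subprocess
import sys
import threading


def stream_subprocess(cmd, on_line, should_cancel=lambda: False):
    """Run cmd, pass each output line to on_line, and return the exit code."""
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merged here; the commit keeps the design open
        text=True,
        bufsize=1,
    )
    lines = queue.Queue()
    eof = object()  # sentinel marking the end of the stream

    def reader():
        # Blocking reads happen only on this daemon thread.
        for line in proc.stdout:
            lines.put(line.rstrip("\n"))
        lines.put(eof)

    threading.Thread(target=reader, daemon=True).start()

    terminated = False
    while True:
        # Check the cancel flag between reads; terminate() works on all platforms.
        if not terminated and should_cancel():
            proc.terminate()
            terminated = True
        try:
            item = lines.get(timeout=0.05)
        except queue.Empty:
            continue
        if item is eof:
            break
        on_line(item)
    return proc.wait()


if __name__ == "__main__":
    code = stream_subprocess([sys.executable, "-c", "print('hello')"], print)
    print("exit code:", code)
```

Because the main loop only ever blocks for 50 ms at a time, the cancel check runs even when the child produces no output, and no `select.select()` call is needed.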
Adds a Preparer tab to the Alexandria web UI allowing users to generate
LoRA training datasets from audiobooks, either one at a time or in a batch.
Backend (app/app.py):
- PreparerConfig, BatchPreparerTask, BatchPreparerRequest Pydantic models
- check_disk_space(), _normalize_filename_tokens(), _fuzzy_score() helpers
- _stream_subprocess_to_logs(): cross-platform stdout capture via thread +
queue.Queue — no select.select(), works on Windows pipes
- /api/preparer/suggest_source — fuzzy-match uploaded EPUB/TXT to audio file
- /api/preparer/start — upload + run preparer script (single file)
- /api/preparer/cancel — send SIGTERM to running preparer
- /api/preparer/list — list generated dataset ZIPs
- /api/preparer/download/{path} — download a dataset ZIP
- /api/preparer/batch/start — queue multiple files, run sequentially
- /api/preparer/batch/cancel — cancel in-progress batch
- 503 guard on both start endpoints when app/alexandria_preparer.py absent
Frontend (app/static/index.html):
- Preparer nav tab (Advanced section)
- Single-mode: file pickers for audio + source, "Match" auto-suggest button
- Batch mode: multi-file picker, auto-match queue table, per-task status badges
- Shared config: language, confidence, min SNR, keep-unaligned toggle
- Live log window with 1 s polling, Cancel button, status message
Tests (app/test_api.py):
- preparer and batch_preparer added to status_known_tasks check
- 8 new quick tests: suggest_source, status, cancel-when-idle, list, download-404,
batch-schema, batch-cancel
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Drop EPUB/ebook source alignment step per upstream feedback (discussion Finrandojin#40). Finrandojin wants a tool for processing audio the user owns, not one that implies extracting from commercial audiobooks.
- Remove suggest_source endpoint and all EPUB fuzzy-match logic
- Remove source_filename and keep_unaligned from Pydantic models and API
- Remove source file picker and Match button from UI
- Rename nav label and tab header to "Voice Training Dataset Builder"
- Update descriptions to emphasise own recordings / CC audio use cases
- Remove test_preparer_suggest_source_no_match (endpoint deleted)
Tests: 52 passed, 0 failed
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Persona prompts were hardcoded in generate_personas.py, unlike script generation and review prompts which are user-editable via the UI. Extract persona prompts to persona_prompts.txt, add a loader module following the existing review_prompts.py pattern, and wire into the config API and frontend Prompt Settings section with three new textareas. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add tests for /api/system/stats, persona prompt config roundtrip, persona cancel endpoint, M4B audiobook endpoint, and persona status polling. Update default_prompts test to verify persona prompt fields. 72 tests, all passing in full mode. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove unused imports (tempfile, DEFAULT_SYSTEM_PROMPT/DEFAULT_USER_PROMPT), unused constants (VOICES_PATH, BUILTIN_LORA_MANIFEST), redundant _atomic_json_write wrappers in app.py and project.py, obsolete parse_voices.py and its endpoint/test, a debug print in update_chunk, and a bare except clause. Fix all dynamically-generated onclick attributes where JSON.stringify double quotes conflicted with the attribute's double-quote delimiters, silently breaking buttons in saved scripts, voice designer, LoRA datasets, and LoRA models. Also fix loadScript to force full chunk redraw and show a success toast. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Resolve conflicts: keep import signal from PR, keep updateSystemStats from main, combine both. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ss, scope preparer output
Re-apply cleanup from 392e8a2 that PR Finrandojin#47 reintroduced (dead imports, unused constants, _atomic_json_write wrapper, parse_voices endpoint, broken onclick quoting, loadScript fixes).
Additional fixes to PR Finrandojin#47:
- Remove unused _fuzzy_score and _normalize_filename_tokens functions
- Replace os.kill(pid, SIGTERM) with process.terminate() for cross-platform
- Deduplicate run_process by delegating to _stream_subprocess_to_logs
- Remove duplicate check_disk_space definition
- Scope preparer output to dedicated preparer_output/ directory
- Rename nav tab from "Dataset Builder" to "Preparer" to avoid confusion
- Add missing Form import for FastAPI endpoint
- Add preparer_output/ to .gitignore
Ref Finrandojin#46, Ref Finrandojin#47
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Resolve conflicts: keep _stream_subprocess_to_logs from dev, combine persona + preparer/batch_preparer task names in tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Flash and memory-efficient SDPA kernels on ROCm 7.2+ hang during batched attention with left-padded sequences. Disable these backends for batch clone/LoRA methods on AMD GPUs, falling back to the math SDPA kernel. Single generation and NVIDIA users are unaffected. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The batch generation "hang" on ROCm 7.x was caused by the GPU's DPM controller aggressively downclocking the shader engine between autoregressive steps, not an SDPA kernel bug. Setting the COMPUTE power profile (pp_power_profile_mode=5) enforces a min clock floor and resolves the issue.
- Remove SDPA backend disable workaround (was masking the real cause)
- Remove GPU keepalive thread (unnecessary with COMPUTE profile)
- Fix batch warmup to use CustomVoice model (Base model lacks custom voice speakers, causing warmup to fail silently)
- Add RDNA2/3 CU count and warp size correction (ROCm reports half the CUs and warp_size=32 instead of 64 on consumer GPUs)
- Document COMPUTE power profile fix in README
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Three bugs in _patch_rdna_device_properties and the batch warmup paths:
1. Device key normalization: int(device) raises TypeError when called
with a torch.device object (e.g. torch.device('cuda:0')). Use
dev.index instead, falling back to current_device() for unindexed
devices and 0 on any error.
2. SimpleNamespace proxy silently drops C-extension attributes that
don't appear in dir(props). Replace with _RDNADeviceProps, a thin
proxy that delegates all attribute lookups to the real props object
and only overrides multi_processor_count and warp_size. Future
PyTorch device property fields are forwarded automatically.
3. _local_batch_clone and _local_batch_lora both loaded the CustomVoice
model as the warmup target while the clone/LoRA model was already in
VRAM. Having two full models resident simultaneously causes OOM on
12–16 GB cards. Use the already-loaded model for warmup instead.
For the LoRA path, move the warmup block inside the adapter loop so
the LoRA model is resident before warmup runs.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
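The device key normalization from item 1 of this commit might look roughly like the sketch below. It is deliberately written without importing torch so it runs anywhere: `device_key` is a hypothetical name, the `current_device` parameter stands in for `torch.cuda.current_device`, and the string handling for values like `'cuda:1'` is an extra assumption not stated in the commit.

```python
from types import SimpleNamespace


def device_key(device, current_device=lambda: 0):
    """Return an integer device index for an int, string, or device-like object."""
    try:
        if device is None:
            return int(current_device())
        if isinstance(device, int):
            return device
        if isinstance(device, str):
            # Assumption: strings look like "cuda" or "cuda:1".
            _, _, index_text = device.partition(":")
            return int(index_text) if index_text else int(current_device())
        # Device-like objects expose .index, which is None when unindexed.
        index = getattr(device, "index", None)
        return int(current_device()) if index is None else int(index)
    except Exception:
        # Per the commit message: fall back to 0 on any error.
        return 0


if __name__ == "__main__":
    fake_device = SimpleNamespace(type="cuda", index=1)  # stand-in for torch.device
    print(device_key(fake_device))
```

The point of the pattern is that `int()` is never applied to the device object itself, only to its `index` attribute or to the fallback value.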
Summary
Three bugs in _patch_rdna_device_properties and the batch warmup paths:
1. Device key TypeError: int(device) raises TypeError when get_device_properties is called with a torch.device object (e.g. torch.device('cuda:0')). Fixed by using dev.index, with fallbacks for unindexed devices and errors.
2. SimpleNamespace silently drops C-extension attrs: dir(props) does not enumerate all C-extension attributes on the real props object, so any property not surfaced by dir() is missing from the patched result. Replaced with _RDNADeviceProps, a thin proxy that delegates all attribute lookups to the real props object and only overrides multi_processor_count and warp_size. Future PyTorch device property fields are forwarded automatically with no maintenance required.
3. Warmup loads second model into VRAM (OOM): _local_batch_clone and _local_batch_lora both called self._init_local_custom() as the warmup model target while the clone/LoRA model was already loaded. Two full models resident simultaneously causes OOM on 12–16 GB cards. Fixed by using the already-loaded model for warmup. For the LoRA path, the warmup block is moved inside the adapter loop so the LoRA model is resident before warmup runs.
Test plan
- get_device_properties called with torch.device('cuda:0') no longer raises TypeError on RDNA hardware
- …props.total_memory, props.name) correctly to the underlying device
🤖 Generated with Claude Code
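To illustrate why a delegating proxy forwards attributes that a dir()-based copy misses, here is a small self-contained sketch. It is not the PR's _RDNADeviceProps: `DevicePropsProxy` and `FakeProps` are hypothetical names, `FakeProps` only imitates an object whose dir() is incomplete, and the override values 60 and 64 are illustrative.

```python
from types import SimpleNamespace


class FakeProps:
    """Hypothetical stand-in for a props object whose dir() omits total_memory."""
    name = "Fake RDNA GPU"
    total_memory = 16 * 1024**3
    multi_processor_count = 30
    warp_size = 32

    def __dir__(self):
        return ["name", "multi_processor_count", "warp_size"]


class DevicePropsProxy:
    """Override two fields; delegate every other attribute to the real object."""

    def __init__(self, props, multi_processor_count, warp_size):
        self._props = props
        self.multi_processor_count = multi_processor_count
        self.warp_size = warp_size

    def __getattr__(self, name):
        # Called only when normal lookup fails, so the overrides above win.
        if name == "_props":
            raise AttributeError(name)  # avoid recursion if _props is unset
        return getattr(self._props, name)


real = FakeProps()

# A dir()-based snapshot copies only what dir() lists, so total_memory is lost.
snapshot = SimpleNamespace(**{key: getattr(real, key) for key in dir(real)})

# The proxy forwards anything it does not override.
proxy = DevicePropsProxy(real, multi_processor_count=60, warp_size=64)

if __name__ == "__main__":
    print(hasattr(snapshot, "total_memory"), proxy.total_memory)
```

Because delegation happens at lookup time, any attribute added to the underlying object later is reachable through the proxy without changing the proxy class.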