Embedding and rerank calls currently talk to rkllama directly via qmd-client. Refactor so the memory_search skill and any RAG path submit Tasks to the scheduler with:
- capability = Capability.EMBEDDING
- preferred_resources = [npu-rk3588 with max_wait_ms=200, cpu-inference]
- priority = Priority.INTERACTIVE_AGENT
This is the load-bearing example in docs/design/resource-scheduler.md §Worked example — chat-while-generating. When image gen holds the NPU, memory lookups should transparently route to CPU embedding in ~500ms rather than blocking behind 34s of SD.
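The routing described above can be sketched as follows. This is a minimal illustration only: `Task`, `Capability`, `Priority`, and the `ResourcePreference` shape are assumed names for whatever the scheduler actually exposes, and `build_embedding_task` is a hypothetical helper, not existing code.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Optional

# Hypothetical type shapes; the real scheduler's API may differ.
class Capability(Enum):
    EMBEDDING = auto()

class Priority(Enum):
    INTERACTIVE_AGENT = auto()

@dataclass
class ResourcePreference:
    resource: str
    max_wait_ms: Optional[int] = None  # None = no deadline on this resource

@dataclass
class Task:
    capability: Capability
    priority: Priority
    preferred_resources: list  # ordered: scheduler tries these in sequence
    payload: dict = field(default_factory=dict)

def build_embedding_task(texts):
    """What memory_search / RAG paths would submit to the scheduler
    instead of calling rkllama directly via qmd-client."""
    return Task(
        capability=Capability.EMBEDDING,
        priority=Priority.INTERACTIVE_AGENT,
        preferred_resources=[
            # Prefer the NPU, but only wait 200 ms for it to free up...
            ResourcePreference("npu-rk3588", max_wait_ms=200),
            # ...then fall back to CPU embedding so a memory lookup
            # finishes in ~500 ms instead of queueing behind SD.
            ResourcePreference("cpu-inference"),
        ],
        payload={"texts": texts},
    )

task = build_embedding_task(["what did we decide about backups?"])
print(task.preferred_resources[0].max_wait_ms)  # 200
```

The key design point is that the caller expresses an ordered preference with a bounded wait, so the fallback decision lives in the scheduler rather than in each skill.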