Wire memory retrieval through resource scheduler #29

@jaylfc

Embedding and rerank calls currently talk to rkllama directly via qmd-client. Refactor so the memory_search skill and any RAG path submit Tasks to the scheduler with:

  • capability = Capability.EMBEDDING
  • preferred_resources = [npu-rk3588 with max_wait_ms=200, cpu-inference]
  • priority = Priority.INTERACTIVE_AGENT
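
For illustration, the Task described by the bullets above might be built roughly like this. This is a minimal sketch, not the scheduler's actual API: `Capability.EMBEDDING`, `Priority.INTERACTIVE_AGENT`, `preferred_resources`, and `max_wait_ms` come from this issue, while the `Task` and `ResourcePreference` dataclass shapes, the `payload` field, the extra enum members, and `build_embedding_task` are assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Optional


class Capability(Enum):
    EMBEDDING = auto()
    RERANK = auto()  # assumed member; the issue only names EMBEDDING


class Priority(Enum):
    INTERACTIVE_AGENT = auto()
    BACKGROUND = auto()  # assumed member, included only for contrast


@dataclass(frozen=True)
class ResourcePreference:
    """One entry in preferred_resources (hypothetical shape)."""
    resource_id: str
    max_wait_ms: Optional[int] = None  # None: no wait budget stated


@dataclass
class Task:
    """Hypothetical Task shape; the real fields live in the scheduler."""
    capability: Capability
    preferred_resources: list[ResourcePreference]
    priority: Priority
    payload: dict = field(default_factory=dict)


def build_embedding_task(texts: list[str]) -> Task:
    """Build the embedding Task with the values listed in this issue."""
    return Task(
        capability=Capability.EMBEDDING,
        preferred_resources=[
            # Try the NPU first, but wait at most 200 ms for it.
            ResourcePreference("npu-rk3588", max_wait_ms=200),
            # Otherwise fall back to CPU embedding.
            ResourcePreference("cpu-inference"),
        ],
        priority=Priority.INTERACTIVE_AGENT,
        payload={"texts": texts},
    )
```

The order of `preferred_resources` carries the intent: NPU first with a bounded wait, CPU as the fallback.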

This is the load-bearing example in docs/design/resource-scheduler.md §Worked example — chat-while-generating. When image gen holds the NPU, memory lookups should transparently route to CPU embedding in ~500ms rather than blocking behind 34s of SD.
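
The fallback behaviour can be sketched as a simple selection over the preference list. This is an assumption about how the routing could work, not the scheduler's real logic: `pick_resource` and the `busy_for_ms` estimate (milliseconds until each resource is free) are hypothetical, and the real scheduler may queue and time out instead of estimating up front.

```python
from typing import Optional


def pick_resource(
    preferences: list[tuple[str, Optional[int]]],
    busy_for_ms: dict[str, int],
) -> Optional[str]:
    """Return the first preferred resource whose expected wait fits its budget.

    preferences: (resource_id, max_wait_ms) pairs in preference order;
                 max_wait_ms=None means any wait is accepted.
    busy_for_ms: estimated ms until each resource is free (missing = idle).
    """
    for resource_id, max_wait_ms in preferences:
        wait = busy_for_ms.get(resource_id, 0)
        if max_wait_ms is None or wait <= max_wait_ms:
            return resource_id
    return None  # nothing can serve the task within its wait budgets


prefs = [("npu-rk3588", 200), ("cpu-inference", None)]

# NPU idle: the lookup stays on the NPU.
print(pick_resource(prefs, {}))  # npu-rk3588

# Image gen holds the NPU for ~34 s: the lookup routes to CPU embedding.
print(pick_resource(prefs, {"npu-rk3588": 34_000}))  # cpu-inference
```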

Metadata

Assignees: No one assigned
Status: Todo
Milestone: No milestone