🧠 System Architecture & Data Flow (MVP)


High-Level Overview

Frontend  →  Backend API  →  Queue  →  Worker  →  Kernel Memory  →  Vector DB
                      ↑                                             ↓
                File Storage  ←────────────── Processed Chunks ←─────

1. User Upload Flow

The user uploads a ZIP file via the UI. The frontend sends:
POST /api/upload
Body: multipart/form-data

with:

  • The user ID (from authentication)
  • The ZIP file
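The multipart request above can be sketched as follows. This is an illustrative Python sketch only; the form field names `userId` and `file` are assumptions, not a confirmed API contract.

```python
import io
import uuid

def build_upload_request(user_id: str, zip_bytes: bytes, filename: str):
    """Build a multipart/form-data body carrying the user ID and the ZIP file."""
    boundary = "----zip-upload-" + uuid.uuid4().hex
    body = io.BytesIO()

    def add_part(headers: str, payload: bytes) -> None:
        body.write(f"--{boundary}\r\n{headers}\r\n\r\n".encode())
        body.write(payload + b"\r\n")

    # Text field: the authenticated user's ID.
    add_part('Content-Disposition: form-data; name="userId"', user_id.encode())
    # File field: the ZIP archive itself.
    add_part(
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/zip",
        zip_bytes,
    )
    body.write(f"--{boundary}--\r\n".encode())
    content_type = f"multipart/form-data; boundary={boundary}"
    return content_type, body.getvalue()
```

In practice the browser's `FormData` builds this body for you; the sketch just makes explicit what `POST /api/upload` receives.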

2. Backend File Intake

Endpoint: POST /api/upload
The backend does the following:

2.1. Validate ZIP
Check MIME type, size constraints, etc.

2.2. Store ZIP locally
For the MVP, the ZIP is written to: /storage/uploads/{userId}/{jobId}.zip

2.3. Create a Job Record
Stored in Postgres or Redis (MVP: Redis Hash):

job:{jobId} =
{
  status: "queued",
  userId: "...",
  filePath: "...",
  createdAt: ...
}

2.4. Push Work Item to Queue (Redis List)

Queue message:
{
  "jobId": "abc123",
  "userId": "t1",
  "zipPath": "/storage/uploads/t1/abc123.zip"
}
Redis:
LPUSH work:ingest "{...json...}"

2.5. Return jobId to frontend
Frontend receives: { "jobId": "abc123" }
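Steps 2.1–2.5 can be sketched end to end. This is an illustrative Python sketch, not the actual backend: Redis is replaced by in-memory stand-ins, and the 100 MB size ceiling is an assumed value (the real constraint is unspecified).

```python
import json
import os
import time
import uuid
import zipfile

# In-memory stand-ins for Redis (the MVP uses a real Redis Hash and List).
redis_hashes: dict = {}
work_queue: list = []  # LPUSH work:ingest appends here

MAX_ZIP_BYTES = 100 * 1024 * 1024  # assumed limit

def handle_upload(user_id: str, zip_path: str) -> str:
    """POST /api/upload: validate the ZIP, record the job, enqueue it, return the jobId."""
    # 2.1. Validate: magic-byte check plus a size ceiling.
    if not zipfile.is_zipfile(zip_path):
        raise ValueError("not a valid ZIP archive")
    if os.path.getsize(zip_path) > MAX_ZIP_BYTES:
        raise ValueError("ZIP exceeds size limit")

    job_id = uuid.uuid4().hex
    stored_path = f"/storage/uploads/{user_id}/{job_id}.zip"  # 2.2. storage location

    # 2.3. Job record (Redis Hash in the MVP).
    redis_hashes[f"job:{job_id}"] = {
        "status": "queued",
        "userId": user_id,
        "filePath": stored_path,
        "createdAt": time.time(),
    }

    # 2.4. Queue message (LPUSH work:ingest).
    work_queue.append(json.dumps({
        "jobId": job_id,
        "userId": user_id,
        "zipPath": stored_path,
    }))

    # 2.5. The jobId goes back to the frontend.
    return job_id
```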

3. Frontend Polling for Status

Frontend calls: GET /api/upload-status?jobId=abc123

Backend maps this to the Redis job state. Statuses include: "queued", "unzipping", "processing", "uploading_to_km", "completed", "failed"

Frontend uses this to show a loading bar.

Polling: every 2 seconds.
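The status endpoint is a thin read over the job hash. A minimal sketch (illustrative Python; the error shape for an unknown jobId is an assumption):

```python
VALID_STATUSES = {"queued", "unzipping", "processing",
                  "uploading_to_km", "completed", "failed"}

def get_upload_status(redis_hashes: dict, job_id: str) -> dict:
    """GET /api/upload-status?jobId=... maps straight onto the Redis job hash."""
    job = redis_hashes.get(f"job:{job_id}")
    if job is None:
        # Assumed behavior: report unknown jobs as failed rather than 404.
        return {"jobId": job_id, "status": "failed", "error": "unknown job"}
    return {"jobId": job_id, "status": job["status"]}
```

The frontend calls this every 2 seconds and stops once it sees "completed" or "failed".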

4. Worker Service (Background Processor)

A separate HostedService or console worker application runs continuously.
Loop: BRPOP work:ingest

For each work item:

4.1. Unzip the Archive
Extract the ZIP into: /storage/files/{userId}/{jobId}/
Update job status: status = "unzipping"

4.2. Prepare KM Uploads

Every extracted file becomes a KM upload. Example directory:

/storage/files/t1/abc123/
   file1.pdf
   file2.txt
   notes.docx

The worker builds:

{
  "jobId": "...",
  "userId": "...",
  "files": [
    "/storage/files/t1/abc123/file1.pdf",
    ...
  ]
}

Update status: status = "processing"

4.3. Upload Files to Kernel Memory
The worker calls POST http://km:9001/upload for each file. KM stores:

  • chunks
  • embeddings (in Postgres pgvector)
  • metadata

Update status: status = "uploading_to_km"
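Steps 4.1–4.3 for a single work item can be sketched as below. This is an illustrative Python sketch of the loop body (the actual worker is a .NET HostedService); the job store is passed in rather than read from Redis, and the KM upload call is left as a comment.

```python
import json
import zipfile
from pathlib import Path

def process_work_item(message: str, job_store: dict, storage_root: str) -> dict:
    """Handle one BRPOP'd work item: unzip, then build the KM upload payload."""
    item = json.loads(message)
    job_key = f"job:{item['jobId']}"
    dest = Path(storage_root) / item["userId"] / item["jobId"]

    # 4.1. Extract into /storage/files/{userId}/{jobId}/
    # (production code should also guard against zip-slip entry paths).
    job_store[job_key]["status"] = "unzipping"
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(item["zipPath"]) as z:
        z.extractall(dest)

    # 4.2. Every extracted file becomes a KM upload.
    job_store[job_key]["status"] = "processing"
    payload = {
        "jobId": item["jobId"],
        "userId": item["userId"],
        "files": sorted(str(p) for p in dest.rglob("*") if p.is_file()),
    }

    # 4.3. Here the worker would POST each file to http://km:9001/upload.
    job_store[job_key]["status"] = "uploading_to_km"
    return payload
```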

5. KM Processing / Indexing

KM performs:

  • Extract
  • Partition
  • Generate Embeddings
  • Save Records (chunks)
  • Store metadata per document

KM processes asynchronously.

Worker optionally polls: GET /indexes/{documentId}/status

When completed: status = "completed"

Store back into Redis job state.
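The optional completion polling can be sketched as below (illustrative Python; `fetch_status` stands in for GET /indexes/{documentId}/status, whose exact response shape is an assumption).

```python
import time

def wait_for_km(fetch_status, job_store: dict, job_id: str,
                poll_seconds: float = 2.0, max_attempts: int = 300) -> str:
    """Poll KM's index status until it reports done, then store the result back
    into the job state."""
    for _ in range(max_attempts):
        state = fetch_status()
        if state in ("completed", "failed"):
            job_store[f"job:{job_id}"]["status"] = state
            return state
        time.sleep(poll_seconds)
    # Give up after max_attempts rather than polling forever.
    job_store[f"job:{job_id}"]["status"] = "failed"
    return "timeout"
```

Injecting `fetch_status` keeps the loop testable without a running KM instance.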

6. Frontend – Upload Complete

Once: status = "completed"

Frontend transitions to the chat UI. Messages are now routed through: POST /api/ask

7. Ask Flow

Frontend sends:

POST /api/ask
{
  "userId": "t1",
  "question": "What is a zebra?"
}

Backend forwards to KM: POST http://km:9001/ask

KM performs:

  • embedding search
  • retrieval
  • context assembly
  • LLM inference (using your Ollama config)

KM returns the answer plus citations.

Backend returns response to frontend.
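The forwarding step, scoped to the tenant's index, can be sketched as a request builder. Illustrative Python only: the `index` query parameter follows the namespacing convention in section 8, but the body shape of KM's /ask endpoint is an assumption here.

```python
def build_km_ask_request(user_id: str, question: str) -> dict:
    """Translate POST /api/ask into the forwarded KM call, scoping the search
    to the caller's tenant index so retrieval never crosses users."""
    return {
        "url": "http://km:9001/ask",
        "params": {"index": f"{user_id}_default"},  # per-tenant namespace
        "body": {"question": question},
    }
```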

8. Multi-Tenancy Requirements

Every step includes a userId.

KM supports namespaces via:

?index=userId_default
?documentId=userId_<uuid>

Your worker prefixes:

index = $"{userId}_default";
documentId = $"{userId}_{Guid.NewGuid()}";

This prevents cross-user data mixing.