Merged
13 changes: 13 additions & 0 deletions .github/workflows/ci.yml
@@ -19,6 +19,10 @@
# TEST_SUPABASE_URL # same as SUPABASE_URL (test project)
# TEST_SUPABASE_SECRET_KEY # same as SUPABASE_SECRET_KEY (test project)
# GEMINI_API_KEY # Google AI Studio free-tier key
# R2_ENDPOINT_URL # https://<account-id>.r2.cloudflarestorage.com
# R2_ACCESS_KEY_ID # Cloudflare R2 API token Access Key ID
# R2_SECRET_ACCESS_KEY # Cloudflare R2 API token Secret Access Key
# R2_BUCKET_NAME # R2 bucket name (test bucket — not prod)

name: CI

@@ -175,6 +179,10 @@ jobs:
TEST_SUPABASE_URL: ${{ secrets.TEST_SUPABASE_URL }}
TEST_SUPABASE_SECRET_KEY: ${{ secrets.TEST_SUPABASE_SECRET_KEY }}
GEMINI_API_KEY: ${{ secrets.GEMINI_API_KEY }}
R2_ENDPOINT_URL: ${{ secrets.R2_ENDPOINT_URL }}
R2_ACCESS_KEY_ID: ${{ secrets.R2_ACCESS_KEY_ID }}
R2_SECRET_ACCESS_KEY: ${{ secrets.R2_SECRET_ACCESS_KEY }}
R2_BUCKET_NAME: ${{ secrets.R2_BUCKET_NAME }}
run: |
cat > backend/.env.test <<EOF
NODE_ENV=test
@@ -194,6 +202,11 @@ jobs:
OPENAI_API_KEY=
RESEND_API_KEY=

R2_ENDPOINT_URL=${R2_ENDPOINT_URL}
R2_ACCESS_KEY_ID=${R2_ACCESS_KEY_ID}
R2_SECRET_ACCESS_KEY=${R2_SECRET_ACCESS_KEY}
R2_BUCKET_NAME=${R2_BUCKET_NAME}

USER_API_KEYS_ENCRYPTION_SECRET=0000000000000000000000000000000000000000000000000000000000000000
DOWNLOAD_SIGNING_SECRET=0000000000000000000000000000000000000000000000000000000000000001

69 changes: 60 additions & 9 deletions TECHDEBT.md
@@ -11,15 +11,10 @@ permissive rule / workaround it references.

## High priority

### Re-enable skipped Playwright specs once selectors are fixed
**Files:** `e2e/chat.spec.ts`, `e2e/documents.spec.ts`, `e2e/projects.spec.ts`, `e2e/tabular.spec.ts`

All four product-flow specs are wrapped in `test.describe.skip()`. The
auth setup (`createAndLoginTestUser`) works — proven by all four auth
tests in `e2e/auth.spec.ts` passing. Each spec then fails inside the
test body on selectors / flows that don't match the current frontend
(e.g. `createProject` helper in `documents.spec.ts`, the "new project"
button locator in `projects.spec.ts`, the chat input flow, etc.).
### ~~Re-enable skipped Playwright specs once selectors are fixed~~
All four product-flow specs are now un-skipped — see Done. Full suite
(auth + projects + documents + chat + tabular) is 13/13 green locally
in ~1.4 minutes.

To re-enable:

@@ -148,3 +143,59 @@ matters.
## Done

<!-- Move items here as they're fixed. Include the commit SHA. -->

### Re-enable `e2e/tabular.spec.ts` — _pending commit SHA_
Added `data-testid` attributes to `ProjectReviewsTab` ("+ Create New"),
`AddNewTRModal` (Create submit), `AddColumnModal` (name input, prompt
textarea, submit), `TabularReviewView` (Add Columns toolbar button,
Run button), and `TabularCell` (outer wrapper with `data-cell-status`,
citation chip). Rewrote `tabular.spec.ts` to (1) create a project,
(2) upload sample.pdf, (3) open `/projects/{id}/tabular-reviews`,
(4) create a review via the empty-state — `AddNewTRModal` auto-selects
the ready docs when in project mode so no extra picker step is needed,
(5) add a column with a manual prompt (skipping auto-generate to save an
LLM call), (6) click Run, (7) wait for a `cell-citation` chip to render.
The backend's `tabular_model` defaults to `gemini-3-flash-preview`, which is
already on the free-tier allowlist, so no settings change is needed. 1/1 green
in ~18s; full e2e suite (13 tests) green in 1.4 min.

### Re-enable `e2e/chat.spec.ts` — _pending commit SHA_
Added `data-testid` attributes to `ChatInput` (textarea, send button) and
`AssistantMessage` (outer wrapper, citation marker button), plus
`new-chat-empty-state` on the assistant tab's "+ Create New" button.
Rewrote `chat.spec.ts` to (1) create a project, (2) upload sample.pdf
through the same modal flow as the documents spec, (3) navigate to
`/projects/{id}/assistant`, (4) click the empty-state button which
creates a chat and redirects to `/projects/{id}/assistant/chat/{chatId}`,
(5) submit a question via the textarea, (6) wait for `citation-marker`
to render inside the streamed assistant response. Default model
(`gemini-3-flash-preview`) is on the free-tier allowlist in
`backend/src/lib/llm/freeTierGuard.ts`, so no model toggle is needed.
Setup gotcha: the original `GEMINI_API_KEY` in `.env.test` had expired
("API_KEY_INVALID") — rotate at <https://aistudio.google.com/app/apikey>
when the test starts failing with a 400 from googleapis. 1/1 green
locally in ~18s.
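
Steps (3) and (4) hinge on two URL shapes. As a minimal sketch, assuming project and chat IDs are lowercase hex UUIDs as the spec's `waitForURL` patterns suggest, the path handling can be isolated into pure helpers. `assistantPathFor` and `isChatUrl` are hypothetical names used for illustration and are not part of the repository.

```typescript
// Hypothetical helpers; the names are illustrative, not from the repository.

// Derive the assistant tab path from the current project page URL,
// dropping a trailing slash so the join never produces "//assistant".
function assistantPathFor(projectPageUrl: string): string {
  const projectPath = new URL(projectPageUrl).pathname.replace(/\/$/, "");
  return `${projectPath}/assistant`;
}

// Same pattern the spec waits on after clicking the empty-state button.
const CHAT_URL_PATTERN = /\/assistant\/chat\/[a-f0-9-]+/;

function isChatUrl(url: string): boolean {
  return CHAT_URL_PATTERN.test(new URL(url).pathname);
}
```

Navigating by path rather than clicking the tab keeps the test independent of the toolbar tab's accessible name, at the cost of coupling it to the route layout.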

### Re-enable `e2e/documents.spec.ts` — _pending commit SHA_
Added `data-testid` attributes to `AddDocumentsModal` (file input, Confirm)
and to each document row + the project page's "Add Documents" toolbar
button. Rewrote `documents.spec.ts` around the current modal-based
upload flow: open the modal, set files on the hidden input, wait for
Confirm to re-enable, click Confirm, then assert the row in the
project's document table by `data-doc-filename`. Also added a
`row-action-download` testid in `RowActions`. Setup gotcha discovered:
the test environment requires Cloudflare R2 credentials in
`backend/.env.test` (`R2_ENDPOINT_URL`, `R2_ACCESS_KEY_ID`,
`R2_SECRET_ACCESS_KEY`, `R2_BUCKET_NAME`) — the original README only
documented Supabase + Gemini. 3/3 tests green locally.
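
The row assertion keys on two attributes at once. Below is a sketch of how such a selector could be built for arbitrary filenames, assuming rows carry `data-testid="document-row"` and `data-doc-filename` as in the spec. `documentRowSelector` is a hypothetical helper, and its escaping covers only backslashes and double quotes.

```typescript
// Hypothetical helper; not part of the repository.
function documentRowSelector(filename: string): string {
  // Escape backslashes first, then double quotes, so the value stays
  // inside the quoted CSS attribute string.
  const escaped = filename.replace(/\\/g, "\\\\").replace(/"/g, '\\"');
  return `[data-testid="document-row"][data-doc-filename="${escaped}"]`;
}
```

In a spec this string would be passed to `page.locator(...)`, in the same way the upload helper locates the `sample.pdf` row.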

### Re-enable `e2e/projects.spec.ts` — _pending commit SHA_
Added `data-testid` attributes to `ProjectsOverview`, `RowActions` (kebab
toggle, Rename, Delete menu items) and rewrote `projects.spec.ts` to use
them. Fixed two flow drifts: (1) rename is launched from the row's
kebab menu, not by clicking the row (which navigates into the project);
(2) `Create project` redirects into `/projects/{id}` first, so the test
navigates back to `/projects` before asserting the row. All 4 tests
green locally. Setup gotcha discovered along the way: the test
Supabase project must have `backend/schema.sql` applied — see
`e2e/README.md` step 2.
5 changes: 4 additions & 1 deletion backend/src/lib/llm/freeTierGuard.ts
@@ -63,7 +63,10 @@ export function assertFreeTierAllowed(input: FreeTierGuardInput): void {
);
}

const offenders = (input.documentFilenames ?? []).filter((f) => !allowlist.has(f));
const docFilenames = input.documentFilenames ?? [];
if (docFilenames.length === 0) return; // no documents — no data-privacy risk

const offenders = docFilenames.filter((f) => !allowlist.has(f));
if (offenders.length > 0) {
throw new Error(
`Refusing to send non-fixture document(s) [${offenders.join(", ")}] to free-tier ` +
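
The early return makes the no-documents case explicit: a request without filenames skips the allowlist check and anything after it in the function (the remainder is not shown in this hunk). The sketch below is a simplified stand-in for that one branch, not the repository's `assertFreeTierAllowed`. The `GuardInput` shape, the function name, and passing the allowlist as a parameter are assumptions made for illustration.

```typescript
// Simplified, hypothetical stand-in for the filename branch of the guard.
interface GuardInput {
  documentFilenames?: string[];
}

function assertOnlyFixtureDocuments(
  input: GuardInput,
  allowlist: ReadonlySet<string>,
): void {
  const docFilenames = input.documentFilenames ?? [];
  // No documents attached: nothing to leak, so there is nothing to check.
  if (docFilenames.length === 0) return;

  const offenders = docFilenames.filter((f) => !allowlist.has(f));
  if (offenders.length > 0) {
    throw new Error(
      `Refusing to send non-fixture document(s) [${offenders.join(", ")}]`,
    );
  }
}
```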
7 changes: 6 additions & 1 deletion backend/src/routes/documents.ts
@@ -961,10 +961,15 @@ async function handleDocumentUpload(
: updated;
return void res.status(201).json(responseDoc);
} catch (e) {
const msg =
e instanceof AggregateError
? `${e.message}: [${e.errors.map(String).join(", ")}]`
: String(e);
console.error("[upload] document processing failed:", e);
await db.from("documents").update({ status: "error" }).eq("id", doc.id);
return void res
.status(500)
.json({ detail: `Document processing failed: ${String(e)}` });
.json({ detail: `Document processing failed: ${msg}` });
}
}

7 changes: 6 additions & 1 deletion backend/src/routes/projects.ts
@@ -796,10 +796,15 @@ export async function handleDocumentUpload(
: updated;
return void res.status(201).json(responseDoc);
} catch (e) {
const msg =
e instanceof AggregateError
? `${e.message}: [${e.errors.map(String).join(", ")}]`
: String(e);
console.error("[upload] document processing failed:", e);
await db.from("documents").update({ status: "error" }).eq("id", doc.id);
return void res
.status(500)
.json({ detail: `Document processing failed: ${String(e)}` });
.json({ detail: `Document processing failed: ${msg}` });
}
}
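
`String(e)` on an `AggregateError` yields only the outer name and message, which hides the individual failures. That is what the `msg` expression in both upload handlers works around. A small sketch of the difference follows; `describeError` is a hypothetical name, and the example assumes a runtime and TypeScript `lib` setting that provide `AggregateError` (ES2021).

```typescript
// Hypothetical helper mirroring the `msg` expression in the handlers.
function describeError(e: unknown): string {
  return e instanceof AggregateError
    ? `${e.message}: [${e.errors.map(String).join(", ")}]`
    : String(e);
}

const aggregate = new AggregateError(
  [new Error("connect ECONNREFUSED"), new Error("timeout")],
  "All attempts failed",
);

// prints "AggregateError: All attempts failed"
console.log(String(aggregate));
// prints "All attempts failed: [Error: connect ECONNREFUSED, Error: timeout]"
console.log(describeError(aggregate));
```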

8 changes: 6 additions & 2 deletions backend/src/routes/tabular.ts
@@ -318,8 +318,12 @@ tabularRouter.post("/prompt", requireAuth, async (req, res) => {
} else {
res.status(502).json({ detail: "LLM returned an empty prompt" });
}
} catch {
res.status(502).json({ detail: "Failed to generate prompt from LLM" });
} catch (err) {
console.error("[tabular-review/prompt] LLM generation failed:", err);
const message = err instanceof Error ? err.message : String(err);
res.status(502).json({
detail: `Failed to generate prompt from LLM: ${message}`,
});
}
});
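
A caught value is typed `unknown` under strict TypeScript settings, so the handler narrows it before reading `.message`. Here is that narrowing in isolation; `errorMessage` is a hypothetical helper name, not something the route defines.

```typescript
// Hypothetical helper; the route inlines this expression instead.
function errorMessage(err: unknown): string {
  // Error instances expose a message; anything else (a thrown string,
  // number, or plain object) is stringified as-is.
  return err instanceof Error ? err.message : String(err);
}
```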

13 changes: 13 additions & 0 deletions e2e/README.md
@@ -39,6 +39,19 @@ Setup steps (one-time):

Playwright performs a placeholder check on startup: if any of `SUPABASE_URL`, `SUPABASE_SECRET_KEY`, `NEXT_PUBLIC_SUPABASE_URL`, `NEXT_PUBLIC_SUPABASE_ANON_KEY`, or `GEMINI_API_KEY` still contain the literal `CHANGEME`, it refuses to start with a clear error.

### Object storage (Cloudflare R2)

`documents.spec.ts`, `chat.spec.ts`, and `tabular.spec.ts` upload `sample.pdf` to the backend, which writes to R2 via the S3 API. You need a separate R2 bucket for testing — production credentials must not be used.

1. Cloudflare dashboard → **R2 Object Storage** → enable R2 (requires a payment method on file, but the free tier — 10 GB / 1M Class A ops / 10M Class B ops per month — easily covers e2e usage).
2. Create a bucket (e.g. `gordonoss-test`).
3. "Manage R2 API Tokens" → **Create User API Token** with **Object Read & Write** scoped to the new bucket. Copy the Access Key ID, Secret Access Key, and the account-level S3 endpoint URL (`https://<account-id>.r2.cloudflarestorage.com`).
4. Paste into `backend/.env.test`:
- `R2_ENDPOINT_URL`
- `R2_ACCESS_KEY_ID`
- `R2_SECRET_ACCESS_KEY`
- `R2_BUCKET_NAME`
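
The placeholder check described above lists only the Supabase and Gemini variables, so a missing R2 value would presumably surface later as a failed upload. As a sketch of a preflight that could catch it sooner, assuming the four variable names above: `missingR2Vars` is hypothetical and not part of the repository, and treating `CHANGEME` as unset mirrors the existing placeholder convention rather than any documented behavior for R2.

```typescript
// Hypothetical preflight; not part of the repository.
const R2_VARS = [
  "R2_ENDPOINT_URL",
  "R2_ACCESS_KEY_ID",
  "R2_SECRET_ACCESS_KEY",
  "R2_BUCKET_NAME",
] as const;

// Return the names of R2 variables that are unset, blank, or still a
// CHANGEME placeholder in the given environment map.
function missingR2Vars(env: Record<string, string | undefined>): string[] {
  return R2_VARS.filter((name) => {
    const value = env[name]?.trim() ?? "";
    return value === "" || value.includes("CHANGEME");
  });
}
```

Called with `process.env` after `backend/.env.test` has been loaded, a non-empty result would be a reason to stop before any spec uploads a file.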

## Sample fixture

The tests upload `e2e/fixtures/sample.pdf`. It is a small (~4 KB) four-page PDF containing original prose written for this repository. Regenerate it with:
99 changes: 57 additions & 42 deletions e2e/chat.spec.ts
@@ -1,55 +1,70 @@
import { resolve } from "node:path";
import { expect, test } from "@playwright/test";
import { expect, test, type Page } from "@playwright/test";
import { createAndLoginTestUser } from "./helpers/auth";

const SAMPLE_PDF = resolve(__dirname, "fixtures", "sample.pdf");

// Chat depends on a real LLM provider — Anthropic, OpenAI, or Gemini.
// Without keys the request fails before any tokens stream back.
// See e2e/README.md for how to wire up keys for this suite.
// TODO(TECHDEBT.md): test body fails on selectors / flows that have
// drifted from the current UI. Auth setup (createAndLoginTestUser)
// works. Re-enable per test once selectors are fixed against the
// current frontend. Download playwright-report from CI to see the
// exact failure point in each.
test.describe.skip("chat", () => {
test("ask a question about an uploaded PDF and get a streamed answer with a citation", async ({ page }) => {
test.setTimeout(180_000); // LLM round-trip can take a while end-to-end
// Chat depends on a real LLM provider. The frontend's default model is
// `gemini-3-flash-preview`, which is on the backend's free-tier list. The test
// env sets ALLOW_FREE_TIER_LLM=true and FREE_TIER_FIXTURE_ALLOWLIST=sample.pdf
// so the backend will route the call to Gemini's free tier — see
// `backend/src/lib/llm/freeTierGuard.ts`.

async function createProjectAndOpen(page: Page, name: string) {
await page.goto("/projects");
await page.getByTestId("new-project-button").click();
await page.getByPlaceholder("Project name").fill(name);
await page.getByRole("button", { name: /^create project$/i }).click();
await page.waitForURL(/\/projects\/[a-f0-9-]+/, { timeout: 15_000 });
}

async function uploadSamplePdf(page: Page) {
await page.getByTestId("add-documents-button").click();
await page.getByTestId("add-docs-file-input").setInputFiles(SAMPLE_PDF);
const confirm = page.getByTestId("add-docs-confirm");
await expect(confirm).toBeEnabled({ timeout: 60_000 });
await confirm.click();
await expect(
page.locator('[data-testid="document-row"][data-doc-filename="sample.pdf"]'),
).toBeVisible({ timeout: 30_000 });
}

test.describe("chat", () => {
test("ask a question about an uploaded PDF and receive a streamed answer with a citation", async ({
page,
}) => {
test.setTimeout(240_000); // LLM round-trip on free tier can be slow

await createAndLoginTestUser(page, "chat");
await createProjectAndOpen(page, `Chat Project ${Date.now()}`);
await uploadSamplePdf(page);

// The project URL is the current path; the assistant tab lives at
// /projects/{id}/assistant — switch to it explicitly rather than tab-click
// so the test doesn't depend on the toolbar tab's accessible name.
const projectPath = new URL(page.url()).pathname.replace(/\/$/, "");
await page.goto(`${projectPath}/assistant`);

// No chats yet — the empty-state "+ Create New" creates one and redirects
// to /projects/{id}/assistant/chat/{chatId}.
await page.getByTestId("new-chat-empty-state").click();
await page.waitForURL(/\/assistant\/chat\/[a-f0-9-]+/, { timeout: 15_000 });

// Create a project and upload the sample PDF
await page.goto("/projects");
const projectName = `Chat Project ${Date.now()}`;
await page.getByRole("button", { name: /(new project|create project|add project)/i }).first().click();
await page.getByPlaceholder("Project name").fill(projectName);
await page.getByRole("button", { name: /create project/i }).click();
await expect(page.getByText(projectName, { exact: false })).toBeVisible({ timeout: 15_000 });
await page.getByText(projectName, { exact: false }).first().click();
await page.waitForURL(/\/projects\/[a-f0-9-]+/, { timeout: 10_000 });

// Upload sample.pdf
await page.locator('input[type="file"]').first().setInputFiles(SAMPLE_PDF);
await expect(page.getByText(/sample\.pdf/i)).toBeVisible({ timeout: 30_000 });

// Open the assistant chat in this project
const projectUrl = new URL(page.url());
await page.goto(`${projectUrl.pathname.replace(/\/$/, "")}/assistant`);

// Ask a question about the document
const chatInput = page.getByPlaceholder(/ask a question/i);
await chatInput.click();
await chatInput.fill("What is this document about?");
const chatInput = page.getByTestId("chat-input");
await chatInput.fill("What is this document about? Cite the source.");
await chatInput.press("Enter");

// Wait for an assistant response to appear and finish streaming.
// We assert on a citation marker [1] arriving somewhere on the page
// — that is how AssistantMessage renders inline source references.
const citation = page.locator("text=/\\[1\\]/");
await expect(citation).toBeVisible({ timeout: 120_000 });
// Assistant message bubble appears as soon as streaming begins. Wait for
// a citation marker to render inside it — that's how we know the model
// grounded the answer against sample.pdf and finished at least one
// citation token.
const citation = page.getByTestId("citation-marker").first();
await expect(citation).toBeVisible({ timeout: 180_000 });

// Body text should also have meaningful content (not just the marker).
const responseText = await page.locator("body").innerText();
expect(responseText.length).toBeGreaterThan(200);
// Sanity: the assistant message exists and has substantive content.
const assistantMessage = page.getByTestId("assistant-message").first();
await expect(assistantMessage).toBeVisible();
const text = await assistantMessage.innerText();
expect(text.length).toBeGreaterThan(50);
});
});