A full-stack web application for auditing clinical AI model performance. Enter your model's metrics manually or upload a JSON file, and the app sends them to a locally running fine-tuned LLM (PhantomAjusshi/phi3-auditor-merged), which returns a structured health classification and a detailed explanation.
- Overview
- Architecture
- Project Structure
- Features
- Prerequisites
- Setup & Installation
- Running the App
- API Reference
- Environment Variables
- Tech Stack
- Related Repository
The app provides a clean UI where users can:
- Enter clinical ML model metrics (AUC, accuracy, ECE, drift, etc.) manually or via JSON upload
- Submit them to a FastAPI backend
- Receive a verdict label (e.g. "Calibration Failure", "Major Drift") and a ~300-word structured analysis from the Phi-3 auditor model
The model is served locally via llama.cpp using a quantized GGUF file (phi3-auditor-q4.gguf), keeping inference entirely on-device.
```
Browser (Next.js :3000)
   │
   │  POST /analyze { metrics JSON }
   ▼
FastAPI Backend (:8000)
   │
   │  POST /completion { prompt }
   ▼
llama.cpp server (:8080)
   │
   │  Loads
   ▼
models/phi3-auditor-q4.gguf (~2.2 GB)
```
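The backend → llama.cpp hop in the diagram can be sketched in Python as below. This is a minimal illustration, not the code in `backend/main.py`: the prompt template, `OUTPUT_SCHEMA`, and the function names are hypothetical, it uses the standard-library `urllib` instead of Requests, and it assumes llama-server's `/completion` endpoint accepts a `json_schema` field and returns the generated text in `content`. The `n_predict` and `temperature` values mirror the generation settings listed in the API reference.

```python
import json
import urllib.request

# Assumed address, taken from the architecture diagram (llama.cpp on :8080).
LLAMA_COMPLETION_URL = "http://127.0.0.1:8080/completion"

# Hypothetical schema matching the two fields of the /analyze response.
OUTPUT_SCHEMA = {
    "type": "object",
    "properties": {
        "label": {"type": "string"},
        "explanation": {"type": "string"},
    },
    "required": ["label", "explanation"],
}


def build_prompt(metrics: dict) -> str:
    """Hypothetical prompt format; the real template lives in the backend."""
    lines = [f"{name}: {value}" for name, value in metrics.items()]
    return "Audit this clinical model:\n" + "\n".join(lines) + "\nAnswer in JSON:\n"


def build_completion_payload(metrics: dict) -> dict:
    """Assemble the JSON body for llama-server's /completion endpoint."""
    return {
        "prompt": build_prompt(metrics),
        "n_predict": 1024,
        "temperature": 0.5,
        "json_schema": OUTPUT_SCHEMA,  # constrains sampling to the schema
    }


def call_llama(metrics: dict, timeout: float = 300.0) -> dict:
    """POST the payload and decode the model's JSON answer."""
    payload = json.dumps(build_completion_payload(metrics)).encode("utf-8")
    req = urllib.request.Request(
        LLAMA_COMPLETION_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    # The generated text arrives as a string in "content"; parse it as JSON.
    return json.loads(body["content"])
```

Calling `call_llama({"auc": 0.85, "ece": 0.045})` requires the llama.cpp server to be running on port 8080.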
```
Hospital-Model-Audit-Website/
│
├── app/                      # Next.js App Router
│   ├── page.tsx              # Main UI — metrics input + model output
│   ├── layout.tsx            # Root layout with Vercel Analytics
│   └── globals.css           # Global styles
│
├── backend/
│   ├── main.py               # FastAPI server — /analyze endpoint
│   ├── model.py              # HuggingFace model loader (alternative inference path)
│   ├── requirements.txt      # Python dependencies
│   ├── start.sh              # Launches llama.cpp server then FastAPI
│   └── llama_server.log      # Runtime log from llama.cpp (auto-generated)
│
├── components/
│   ├── theme-provider.tsx
│   └── ui/                   # shadcn/ui component library (40+ components)
│
├── hooks/                    # Custom React hooks
├── lib/utils.ts              # Tailwind utility helpers
├── public/                   # Static assets & icons
│
├── models/                   # ← Place phi3-auditor-q4.gguf here (not in repo)
├── llama.cpp/                # ← Clone & build separately (not in repo)
│
├── next.config.mjs
├── package.json
├── tsconfig.json
└── SETUP.md                  # New-machine setup guide
```
- Two input modes: manual field-by-field entry or JSON file upload
- Live backend status indicator: checks `/health` on load and shows a warning banner if the backend is unreachable
- Structured AI output: the model returns a label plus a 4-section explanation (Observations → Diagnosis → Impact → Recommendation)
- Mock mode: set `USE_MOCK=true` in the backend to get instant placeholder responses without loading the model (useful for frontend development)
- Dark mode support via `next-themes`
- 5-minute request timeout with user-friendly timeout and CORS error messages
- JSON schema enforcement: llama.cpp constrains generation to a JSON schema, so the model's output follows the expected JSON structure
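Because the explanation arrives as a single string, a client that wants the four sections separately has to split it. A minimal sketch, assuming the numbered `1. Observations: ...` layout shown in the `/analyze` response example; `split_sections` is a hypothetical helper, not part of the app.

```python
import re

SECTION_NAMES = ("Observations", "Diagnosis", "Impact", "Recommendation")

# Matches headers such as "1. Observations:" at the start of a line.
_HEADER = re.compile(
    r"^\s*[1-4]\.\s*(" + "|".join(SECTION_NAMES) + r"):\s*",
    re.MULTILINE,
)


def split_sections(explanation: str) -> dict[str, str]:
    """Map each section name to its text, in the order it appears."""
    matches = list(_HEADER.finditer(explanation))
    sections: dict[str, str] = {}
    for i, m in enumerate(matches):
        # A section runs until the next header, or to the end of the string.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(explanation)
        sections[m.group(1)] = explanation[m.end():end].strip()
    return sections
```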
| Tool | Version | Notes |
|---|---|---|
| Node.js | 18+ | For the Next.js frontend |
| Python | 3.10+ | For the FastAPI backend |
| C/C++ compiler + CMake | — | To build llama.cpp (gcc or clang; CMake drives the build) |
| Git | — | To clone llama.cpp |
| RAM | 8 GB+ | For running the 4-bit quantized model |
| GPU (optional) | — | Speeds up inference via the `-ngl 99` flag in `start.sh` (offloads model layers to the GPU; requires a GPU-enabled llama.cpp build) |
```bash
git clone <your-repo-url>
cd Hospital-Model-Audit-Website
```

The llama.cpp binary is excluded from the repo because it is a system-specific build. Compile it from source with CMake:

```bash
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release
cd ..
```

The `start.sh` script expects the binary at `llama.cpp/build/bin/llama-server`, which is where this CMake build places it. Current llama.cpp releases have replaced the old Makefile build with CMake, so plain `make` may not work. Adjust the path in `backend/start.sh` if your build places the binary elsewhere.
The quantized GGUF model (`phi3-auditor-q4.gguf`, ~2.2 GB) is not included in the repo. Create the `models/` directory and place it there:

```bash
mkdir -p models
# Transfer phi3-auditor-q4.gguf into models/
# The model was quantized from: https://huggingface.co/PhantomAjusshi/phi3-auditor-merged
```

Set up the Python backend:

```bash
cd backend
python3 -m venv venv
source venv/bin/activate   # Windows: venv\Scripts\activate
pip install -r requirements.txt
```

Install the frontend dependencies:

```bash
# From the project root
npm install --legacy-peer-deps
```

The `start.sh` script kills any existing processes on ports 8000/8080, starts the llama.cpp inference server, waits for it to load, then starts FastAPI:

```bash
chmod +x backend/start.sh
./backend/start.sh
```

Then open a separate terminal for the frontend:

```bash
npm run dev
```

Visit http://localhost:3000.
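`start.sh` waits for the llama.cpp server to load before launching FastAPI. One way to express that wait, sketched in Python under the assumption that llama-server's `GET /health` answers 200 once the model is loaded (and an error status such as 503 while loading), is to poll until ready or until a deadline passes. The function names and the 120-second default are hypothetical; the real script may use a fixed sleep or a different check.

```python
import time
import urllib.request
from typing import Callable

# Assumed health URL for the llama.cpp server started on :8080.
LLAMA_HEALTH_URL = "http://127.0.0.1:8080/health"


def http_ok(url: str) -> bool:
    """True if GET url answers 200; False on refused connections or error statuses."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except OSError:  # covers URLError, HTTPError (e.g. 503) and timeouts
        return False


def wait_until_ready(
    probe: Callable[[], bool],
    timeout_s: float = 120.0,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call probe() until it returns True or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

`wait_until_ready(lambda: http_ok(LLAMA_HEALTH_URL))` returns `True` as soon as the server responds, or `False` after the timeout. Injecting `probe` and `sleep` keeps the loop testable without a running server.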
To run the services manually instead, use three terminals.

Terminal 1 (llama.cpp server):

```bash
./llama.cpp/build/bin/llama-server \
  -m ./models/phi3-auditor-q4.gguf \
  --port 8080 \
  -c 2048 \
  -ngl 99 \
  --host 127.0.0.1
```

Terminal 2 (FastAPI backend):

```bash
cd backend
source venv/bin/activate
python3 main.py
```

Terminal 3 (Next.js frontend):

```bash
npm run dev
```

Mock mode is useful during frontend development when you don't want to load the full model:

```bash
cd backend
USE_MOCK=true python3 main.py
```

Base URL: http://localhost:8000
The `/health` endpoint returns backend and llama.cpp server status.

Response:

```
{ "status": "online" }
// or if the model server is down:
{ "status": "backend_online_but_model_server_down" }
```

`POST /analyze` accepts model performance metrics and returns an audit verdict.

Request body (all fields optional; send at least one):

```json
{
  "auc": 0.8523,
  "accuracy": 0.7865,
  "precision": 0.8234,
  "recall": 0.7432,
  "f1Score": 0.7823,
  "ece": 0.0452,
  "brier": 0.1234,
  "drift": 0.0523,
  "missingRate": 0.0145,
  "labelShift": 0.0832,
  "positiveRate": 0.4521,
  "dataIntegrity": 2
}
```

Response:

```json
{
  "label": "Needs Review",
  "explanation": "1. Observations: ...\n2. Diagnosis: ...\n3. Impact: ...\n4. Recommendation: ..."
}
```

Generation settings: `n_predict: 1024`, `temperature: 0.5`, with the JSON schema enforced via a llama.cpp grammar.
The frontend reads the backend URL from `NEXT_PUBLIC_API_URL`:

```bash
NEXT_PUBLIC_API_URL=http://localhost:8000
```

It defaults to `http://localhost:8000` if not set. Update it to your production API URL when deploying.
Backend environment variables:

| Variable | Default | Description |
|---|---|---|
| `USE_MOCK` | `False` | Return mock responses without loading the model |
| `HF_TOKEN` | — | Optional HuggingFace token (needed only if using the `model.py` HF inference path) |
Frontend:

| Package | Version | Role |
|---|---|---|
| Next.js | 16.0.7 | React framework (App Router) |
| React | 19.2.0 | UI library |
| TypeScript | 5.x | Type safety |
| Tailwind CSS | 4.x | Styling |
| shadcn/ui + Radix UI | — | Accessible component library |
| Recharts | 2.15.4 | Chart components |
| Vercel Analytics | 1.3.1 | Usage analytics |
| next-themes | 0.4.6 | Dark mode |
| Zod | 3.25.76 | Schema validation |
Backend and inference:

| Package | Role |
|---|---|
| FastAPI | REST API server |
| Uvicorn | ASGI server |
| Requests | HTTP calls to the llama.cpp `/completion` endpoint |
| llama.cpp | Local LLM inference engine (GGUF format) |
| `phi3-auditor-q4.gguf` | 4-bit quantized Phi-3 auditor model (~2.2 GB) |
The fine-tuned model powering this app was trained in a separate project:
Hospital-Audit-Trained-Model: LoRA fine-tuning of `microsoft/Phi-3-mini-4k-instruct` on 5,000 synthetic clinical audit reports using 8-bit quantization and PEFT.

Merged model on HuggingFace: PhantomAjusshi/phi3-auditor-merged