Merged
45 changes: 45 additions & 0 deletions backend/README.md
@@ -31,3 +31,48 @@ This space is configured to run as a Docker container on port 7860.
- `GET /health` is a lightweight liveness check. It returns API status and model load flags.
- `GET /ready` is a deployment readiness check. It returns `200` only when the API, classifier, NER service, duplicate index, and RAG service are ready; otherwise it returns `503` with a flat response body and per-check details. Set `REQUIRE_SUPABASE=true` to include Supabase configuration in the strict readiness gate.
- Docker images run `backend/healthcheck.py` against `/ready` every 30 seconds after a 120-second startup grace period. Override `HEALTHCHECK_URL` or `HEALTHCHECK_TIMEOUT_SECONDS` if your deployment uses a different internal port or gateway.
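
As an illustration of the readiness probe described above, here is a minimal sketch of what such a check could look like. It is not the contents of `backend/healthcheck.py`; the default URL (derived from the port 7860 mentioned earlier), the 5-second default timeout, and the function names are all assumptions made for the example.

```python
import os
import urllib.error
import urllib.request

# Assumed defaults for this sketch; the real script may use different values.
DEFAULT_URL = "http://localhost:7860/ready"
DEFAULT_TIMEOUT_SECONDS = 5.0


def resolve_settings(env):
    """Return (url, timeout) from a mapping of environment variables."""
    url = env.get("HEALTHCHECK_URL") or DEFAULT_URL
    raw_timeout = env.get("HEALTHCHECK_TIMEOUT_SECONDS")
    try:
        timeout = float(raw_timeout) if raw_timeout else DEFAULT_TIMEOUT_SECONDS
    except ValueError:
        # Fall back rather than crash on a malformed override.
        timeout = DEFAULT_TIMEOUT_SECONDS
    return url, timeout


def probe(url, timeout):
    """Return the HTTP status of a GET to url, or None if it is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code  # e.g. 503 while a service is still loading
    except (urllib.error.URLError, OSError):
        return None


def main():
    """Return 0 when /ready answers 200, otherwise 1."""
    url, timeout = resolve_settings(os.environ)
    return 0 if probe(url, timeout) == 200 else 1
```

A script built this way would end with `sys.exit(main())`, since Docker's `HEALTHCHECK` treats exit code 0 as healthy and 1 as unhealthy.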

### Rate limiting

The backend enforces per-IP rate limits using [slowapi](https://github.com/laurentS/slowapi), a rate-limiting library for Starlette and FastAPI adapted from Flask-Limiter.

**Default limit:** `POST /ai/analyze_ticket` — **10 requests per minute per client IP**.

All other endpoints (`/health`, `/ready`, `/ai/analyze`, etc.) are unrestricted.
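
To make "10 requests per minute per client IP" concrete, the following sketch shows one simple way such a limit can be counted: a fixed window per key. It is an illustration only and not slowapi's implementation; the class name, the in-memory dictionary, and the injectable `clock` parameter are assumptions chosen to keep the example small and testable.

```python
import time


class FixedWindowLimiter:
    """In-memory fixed-window counter keyed by client (illustration only)."""

    def __init__(self, max_requests=10, window_seconds=60, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock
        self._windows = {}  # key -> (window_start, request_count)

    def hit(self, key):
        """Record a request; return (allowed, retry_after_seconds)."""
        now = self.clock()
        start, count = self._windows.get(key, (now, 0))
        if now - start >= self.window_seconds:
            start, count = now, 0  # previous window expired; start a new one
        if count >= self.max_requests:
            # Over the limit: report how long until the window resets.
            return False, self.window_seconds - (now - start)
        self._windows[key] = (start, count + 1)
        return True, 0.0


limiter = FixedWindowLimiter(max_requests=10, window_seconds=60)
outcomes = [limiter.hit("203.0.113.7")[0] for _ in range(11)]
# The first ten calls are allowed; the eleventh is rejected.
```

Each client IP gets its own counter, so one noisy client does not consume another client's budget. A production deployment with several workers would need shared storage rather than a per-process dictionary.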

**429 error response**

When a client exceeds the limit, the server returns HTTP `429 Too Many Requests` with a `Retry-After` header and a JSON body:

```json
{"error": "Rate limit exceeded: 10 per 1 minute"}
```

Example with curl:

```bash
# Send 11 requests within a minute from the same IP; the 11th triggers 429
for i in $(seq 1 11); do
  curl -s -o /dev/null -w "%{http_code}\n" \
    -X POST "https://<your-host>/ai/analyze_ticket" \
    -H "Content-Type: application/json" \
    -d '{"text": "My printer is broken", "company": "acme"}'
done
# → the last line printed is 429
```
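
On the client side, a caller can honor the `Retry-After` header instead of retrying immediately. The sketch below uses only the standard library; the function names, the 60-second fallback, the 30-second request timeout, and the three-attempt cap are assumptions for illustration, not part of the backend.

```python
import json
import time
import urllib.error
import urllib.request
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_retry_after(value, now=None, default=60.0):
    """Convert a Retry-After value (seconds or HTTP date) into seconds to wait."""
    if value is None:
        return default
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default  # unparseable header: fall back to the default wait
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


def post_ticket(url, payload, max_attempts=3, sleep=time.sleep):
    """POST payload as JSON, waiting out 429 responses up to max_attempts."""
    body = json.dumps(payload).encode("utf-8")
    for attempt in range(1, max_attempts + 1):
        request = urllib.request.Request(
            url, data=body, headers={"Content-Type": "application/json"}, method="POST"
        )
        try:
            with urllib.request.urlopen(request, timeout=30) as response:
                return json.loads(response.read())
        except urllib.error.HTTPError as exc:
            # Re-raise anything that is not a rate limit, or the final 429.
            if exc.code != 429 or attempt == max_attempts:
                raise
            sleep(parse_retry_after(exc.headers.get("Retry-After")))
```

Calling `post_ticket("https://<your-host>/ai/analyze_ticket", {"text": "My printer is broken", "company": "acme"})` would then pause for the advertised interval before trying again.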

**Configuration**

The limit string follows slowapi / limits syntax (`N/second`, `N/minute`, `N/hour`, `N/day`). To let operators change the limit without editing source code for each adjustment, read a `RATELIMIT_DEFAULT` environment variable when constructing the limiter and pass its value to `default_limits`:

```bash
RATELIMIT_DEFAULT="20/minute"
```

```python
# backend/main.py (excerpt)
import os

from slowapi import Limiter
from slowapi.util import get_remote_address

limiter = Limiter(
    key_func=get_remote_address,
    default_limits=[os.getenv("RATELIMIT_DEFAULT", "10/minute")],
)
```

> **Note:** The current codebase hardcodes `"10/minute"` in the `@limiter.limit` decorator on `/ai/analyze_ticket`. The env-var pattern above is the recommended next step for operator-configurable limits.
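
If the env-var pattern is adopted, validating the value at startup avoids booting with a malformed limit. The helper below is a hypothetical sketch, not code from the repository: `load_rate_limit` and its regular expression are assumptions, and it deliberately accepts only the `N/second`, `N/minute`, `N/hour`, and `N/day` forms listed above, even though the underlying `limits` syntax is broader.

```python
import os
import re

DEFAULT_LIMIT = "10/minute"  # matches the limit currently hardcoded on the route
_LIMIT_PATTERN = re.compile(r"^(\d+)\s*/\s*(second|minute|hour|day)$")


def load_rate_limit(env=None, name="RATELIMIT_DEFAULT"):
    """Return a normalized limit string from env, or DEFAULT_LIMIT if invalid."""
    env = os.environ if env is None else env
    raw = (env.get(name) or "").strip().lower()
    match = _LIMIT_PATTERN.match(raw)
    if match and int(match.group(1)) > 0:
        return f"{int(match.group(1))}/{match.group(2)}"
    # Unset, malformed, or zero: keep the documented default.
    return DEFAULT_LIMIT
```

The returned string could then be passed either to `default_limits` or to the `@limiter.limit(...)` decorator on `/ai/analyze_ticket`, depending on whether the limit should apply limiter-wide or to that route only.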
2 changes: 1 addition & 1 deletion docs/architecture.md
@@ -32,7 +32,7 @@ HELPDESK.AI is a multi-tenant SaaS ticketing platform driven by an AI processing

## 3. Storage and Scaling Strategy
- **Client-Side:** Critical UI state cached in `localStorage` via Zustand persist. Direct `localStorage` access wrapped in `try-catch` blocks for adversarial resilience.
- **API Rate Limiting:** Expected at the API Gateway level (or via Hugging Face limits).
- **API Rate Limiting:** Enforced at the application layer via `slowapi`. `POST /ai/analyze_ticket` is limited to **10 requests per minute per client IP**; exceeding the limit returns HTTP 429 with a `Retry-After` header. See `backend/README.md` for configuration details.

## 4. Production Hardening (BMAD Phase 1 & 2)
As part of the BMAD End-Game: