diff --git a/backend/README.md b/backend/README.md
index cdc5b0c25..77665b65d 100644
--- a/backend/README.md
+++ b/backend/README.md
@@ -31,3 +31,48 @@ This space is configured to run as a Docker container on port 7860.
 - `GET /health` is a lightweight liveness check. It returns API status and model load flags.
 - `GET /ready` is a deployment readiness check. It returns `200` only when the API, classifier, NER service, duplicate index, and RAG service are ready; otherwise it returns `503` with a flat response body and per-check details. Set `REQUIRE_SUPABASE=true` to include Supabase configuration in the strict readiness gate.
 - Docker images run `backend/healthcheck.py` against `/ready` every 30 seconds after a 120-second startup grace period. Override `HEALTHCHECK_URL` or `HEALTHCHECK_TIMEOUT_SECONDS` if your deployment uses a different internal port or gateway.
+
+### Rate limiting
+
+The backend enforces per-IP rate limits using [slowapi](https://github.com/laurentS/slowapi) (a FastAPI port of Flask-Limiter).
+
+**Default limit:** `POST /ai/analyze_ticket` — **10 requests per minute per client IP**.
+
+All other endpoints (`/health`, `/ready`, `/ai/analyze`, etc.) are unrestricted.
+
+**429 error response**
+
+When a client exceeds the limit the server returns HTTP `429 Too Many Requests` with a `Retry-After` header and a JSON body:
+
+```json
+{"error": "Rate limit exceeded: 10 per 1 minute"}
+```
+
+Example with curl:
+
+```bash
+# 11th request within a minute from the same IP triggers 429
+curl -s -o /dev/null -w "%{http_code}" \
+  -X POST https://<your-host>/ai/analyze_ticket \
+  -H "Content-Type: application/json" \
+  -d '{"text": "My printer is broken", "company": "acme"}'
+# → 429
+```
+
+**Configuration**
+
+The limit string follows slowapi / limits syntax (`N/second`, `N/minute`, `N/hour`, `N/day`). To change it without modifying source code set the `RATELIMIT_DEFAULT` environment variable and initialise the limiter with `default_limits`:
+
+```bash
+RATELIMIT_DEFAULT="20/minute"
+```
+
+```python
+# backend/main.py (excerpt)
+limiter = Limiter(
+    key_func=get_remote_address,
+    default_limits=[os.getenv("RATELIMIT_DEFAULT", "10/minute")],
+)
+```
+
+> **Note:** The current codebase hardcodes `"10/minute"` in the `@limiter.limit` decorator on `/ai/analyze_ticket`. The env-var pattern above is the recommended next step for operator-configurable limits.
diff --git a/docs/architecture.md b/docs/architecture.md
index 404b87fa8..083d36ba6 100644
--- a/docs/architecture.md
+++ b/docs/architecture.md
@@ -32,7 +32,7 @@ HELPDESK.AI is a multi-tenant SaaS ticketing platform driven by an AI processing
 
 ## 3. Storage and Scaling Strategy
 - **Client-Side:** Critical UI state cached in `localStorage` via Zustand persist. Direct `localStorage` access wrapped in `try-catch` blocks for adversarial resilience.
-- **API Rate Limiting:** Expected at the API Gateway level (or via Hugging Face limits).
+- **API Rate Limiting:** Enforced at the application layer via `slowapi`. `POST /ai/analyze_ticket` is limited to **10 requests per minute per client IP**; exceeding the limit returns HTTP 429 with a `Retry-After` header. See `backend/README.md` for configuration details.
 
 ## 4. Production Hardening (BMAD Phase 1 & 2)
 As part of the BMAD End-Game: