2nd Place, Hackvidia at Arkavidia 10.0
CLARA is an AI-powered legal assistant purpose-built for Indonesian MSMEs (Micro, Small & Medium Enterprises). It helps business owners understand contracts and employment law, draft legal documents, and detect risky clauses, all without needing a lawyer on retainer.
Features · Architecture · Quick Start · API Docs · Contributing
- About the Project
- Features
- Architecture
- Tech Stack
- Project Structure
- Quick Start
- Environment Variables
- API Documentation
- Running Tests
- Deployment
- Contributing
- Team
- License
Indonesian MSMEs frequently sign contracts they don't fully understand, often without access to legal counsel. CLARA bridges this gap by combining:
- Retrieval-Augmented Generation (RAG) over a curated Indonesian legal knowledge base
- Self-Consistency Reasoning with Jensen-Shannon entropy confidence scoring
- Knowledge Graph (Neo4j) for symbolic legal reasoning via Cypher traversal
- AI-powered document drafting for MoU, LoI, and PKS document types
- OCR + guardrail pipeline that automatically flags illegal contract clauses
CLARA won 2nd Place at the Hackvidia competition at Arkavidia 10.0, a national-level IT competition hosted by HMIF ITB.
Upload any contract (PDF or image) and get an instant, structured legal risk report:
- Clause-by-clause explanation in plain language
- Severity-tagged violations: `CRITICAL`, `WARNING`, `INFO`
- Automatic detection of illegal patterns (forced seizure, excessive penalties, illegal wage cuts, etc.)
- Statutory citations (Indonesian Law, Government Regulations, Ministry Decrees)
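The severity tagging in the list above can be sketched as a small rule scanner. This is a minimal illustration, not CLARA's guardrail service: the `Rule` shape, the `scanClauses` function, and the three English-language patterns are hypothetical stand-ins for whatever statutory checks the real service applies.

```typescript
// Hypothetical rule set: patterns and messages are illustrative only,
// not CLARA's actual guardrail rules or statutory thresholds.
type Severity = "CRITICAL" | "WARNING" | "INFO";

interface Rule {
  id: string;
  severity: Severity;
  pattern: RegExp;
  message: string;
}

interface Violation {
  ruleId: string;
  severity: Severity;
  clauseIndex: number;
  excerpt: string;
  message: string;
}

const RULES: Rule[] = [
  {
    id: "forced-seizure",
    severity: "CRITICAL",
    pattern: /seize|confiscat/i,
    message: "Clause may allow seizure of assets without a court process.",
  },
  {
    id: "wage-cut",
    severity: "WARNING",
    pattern: /deduct(ion)?s? from (the )?(wage|salary)/i,
    message: "Clause allows wage deductions; check against statutory limits.",
  },
  {
    id: "penalty",
    severity: "INFO",
    pattern: /penalt(y|ies)/i,
    message: "Clause defines a penalty; confirm the amount is proportionate.",
  },
];

const SEVERITY_ORDER: Record<Severity, number> = { CRITICAL: 0, WARNING: 1, INFO: 2 };

function scanClauses(clauses: string[], rules: Rule[] = RULES): Violation[] {
  const found: Violation[] = [];
  clauses.forEach((clause, clauseIndex) => {
    for (const rule of rules) {
      // Patterns carry no "g" flag, so test() keeps no state between calls.
      if (rule.pattern.test(clause)) {
        found.push({
          ruleId: rule.id,
          severity: rule.severity,
          clauseIndex,
          excerpt: clause.slice(0, 80),
          message: rule.message,
        });
      }
    }
  });
  // Most severe first, then in document order.
  return found.sort(
    (a, b) =>
      SEVERITY_ORDER[a.severity] - SEVERITY_ORDER[b.severity] ||
      a.clauseIndex - b.clauseIndex,
  );
}
```

Keyword rules like these are only a first pass; a production guardrail would more plausibly combine them with numeric checks against statutory limits.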
Ask questions about Indonesian employment law and contract regulation:
- Hybrid Retrieval: Dense vector search + BM25 full-text search + symbolic Neo4j graph traversal, fused via Reciprocal Rank Fusion (RRF)
- Self-Consistency Loop: Generates multiple reasoning paths, measures divergence (Jensen-Shannon entropy), and maps to a `green` / `yellow` / `red` confidence level
- Answers always cite specific articles and laws (`Pasal N UU No. X Tahun YYYY`)
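As a rough sketch of the fusion step, weighted Reciprocal Rank Fusion scores each document as the sum over retrieval legs of `weight / (k + rank)`. The function name, the document IDs, and the constant `k = 60` (a common default for RRF) are assumptions; the weights mirror the example values of `HYBRID_DENSE_WEIGHT`, `HYBRID_BM25_WEIGHT`, and `HYBRID_SYMBOLIC_WEIGHT`.

```typescript
type Leg = "dense" | "bm25" | "symbolic";

interface FusedHit {
  id: string;
  score: number;
}

// Weighted RRF: each leg contributes weight / (k + rank) for every document it returns.
function reciprocalRankFusion(
  rankings: Record<Leg, string[]>,
  weights: Record<Leg, number>,
  k = 60, // assumed smoothing constant, not taken from the project
): FusedHit[] {
  const scores = new Map<string, number>();
  for (const leg of Object.keys(rankings) as Leg[]) {
    rankings[leg].forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + weights[leg] / (k + rank));
    });
  }
  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// Hypothetical document IDs; each array is one leg's result list, best first.
const fused = reciprocalRankFusion(
  { dense: ["a", "b", "c"], bm25: ["b", "a"], symbolic: ["b"] },
  { dense: 0.5, bm25: 0.3, symbolic: 0.2 },
);
console.log(fused.map((hit) => hit.id)); // [ 'b', 'a', 'c' ]
```

Document `b` wins because all three legs return it, even though the dense leg ranks `a` first. Because RRF uses ranks rather than raw scores, the legs do not need comparable score scales.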
Conversational smart drafter for legal documents:
- Supports MoU (Memorandum of Understanding), LoI (Letter of Intent), and PKS (Cooperation Agreement)
- Multi-turn dialogue: CLARA asks clarifying questions until all required fields are gathered
- Detects legally binding terms and warns before generating
- Outputs a structured Markdown document + downloadable PDF
- Post-generation guardrail scan on the produced draft
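The multi-turn behavior above amounts to slot filling: keep asking until every required field for the chosen document type has a value. The sketch below assumes a hypothetical field list per document type and a hypothetical `nextStep` function; the real drafter's schema and question wording may differ.

```typescript
type DocType = "MoU" | "LoI" | "PKS";

// Hypothetical required fields per document type.
const REQUIRED_FIELDS: Record<DocType, string[]> = {
  MoU: ["partyA", "partyB", "purpose", "duration"],
  LoI: ["sender", "recipient", "intent"],
  PKS: ["partyA", "partyB", "scope", "paymentTerms", "duration"],
};

interface DraftState {
  docType: DocType;
  fields: Record<string, string>;
}

type NextStep =
  | { kind: "ask"; field: string; question: string }
  | { kind: "ready" };

// Returns the next clarifying question, or "ready" once nothing is missing.
function nextStep(state: DraftState): NextStep {
  const missing = REQUIRED_FIELDS[state.docType].find(
    (field) => !(state.fields[field] ?? "").trim(),
  );
  return missing
    ? { kind: "ask", field: missing, question: `Please provide: ${missing}` }
    : { kind: "ready" };
}

const draft: DraftState = { docType: "LoI", fields: { sender: "PT Contoh" } };
console.log(nextStep(draft)); // asks for "recipient", the first missing field
```

Once `nextStep` returns `ready`, the collected fields can be handed to document generation and the post-generation guardrail scan.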
- Google OAuth 2.0 login
- JWT-based stateless session
- Per-user chat history persisted in Neo4j
- Aggregated view of all uploaded contract reviews and drafting projects
- File management with source tracing per conversation
┌─────────────────────────────────────────────────────────────────────┐
│                        React (Vite) Frontend                        │
│            Pages: Landing · Login · Chat · Files · Home             │
└──────────────────────────────┬──────────────────────────────────────┘
                               │ REST / JWT
┌──────────────────────────────▼──────────────────────────────────────┐
│                      Express.js Backend (Node)                      │
│                                                                     │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────────────┐     │
│  │  /auth   │  │/contract │  │  /query  │  │    /drafter      │     │
│  └──────────┘  └────┬─────┘  └────┬─────┘  └────────┬─────────┘     │
│                     │             │                 │               │
│  ┌──────────────────▼─────────────▼─────────────────▼────────────┐  │
│  │                         Service Layer                         │  │
│  │    OCR Service │ Guardrail │ Hybrid Retrieval │ Reasoning     │  │
│  └───────────────────────────────────────────────────────────────┘  │
│                                                                     │
│  ┌─────────────────┐  ┌─────────────┐  ┌──────────────────────┐     │
│  │  BullMQ Worker  │  │  Neo4j DB   │  │  Google Gemini API   │     │
│  │  (async OCR)    │  │  (KG + RAG) │  │  (LLM + Embeddings)  │     │
│  └────────┬────────┘  └─────────────┘  └──────────────────────┘     │
│           │ Redis Queue                                             │
└───────────┼─────────────────────────────────────────────────────────┘
            │
    ┌───────▼────────┐
    │  Google Cloud  │
    │  Vision (OCR)  │
    └────────────────┘
User Query
    │
    ▼
┌────────────────────────────────────────┐
│            Hybrid Retrieval            │
│  ┌──────────┐ ┌────────┐ ┌──────────┐  │
│  │  Dense   │ │  BM25  │ │ Symbolic │  │
│  │  (768d   │ │  (full │ │ (Neo4j   │  │
│  │  vector) │ │  text) │ │  Cypher) │  │
│  └──────────┘ └────────┘ └──────────┘  │
│         Reciprocal Rank Fusion         │
└───────────────────┬────────────────────┘
                    │
                    ▼
           ┌─────────────────┐
           │    Reasoning    │
           │  Service (N=3   │
           │     paths)      │
           │                 │
           │  JS Entropy →   │
           │   Confidence    │
           │  green/yellow/  │
           │       red       │
           └────────┬────────┘
                    │
                    ▼
              Final Answer
              + Citations
              + Confidence
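The confidence step in the flow above can be approximated as follows. This sketch treats each reasoning path's answer as a bag-of-words distribution, computes the generalized Jensen-Shannon divergence across the paths (entropy of the mixture minus the mean entropy, which is presumably what the "JS entropy" label refers to), normalizes it by `log2(N)`, and maps it to a color. The tokenization and the `0.2` / `0.5` thresholds are assumptions, not values from the project.

```typescript
type Distribution = Map<string, number>;
type Confidence = "green" | "yellow" | "red";

// Bag-of-words distribution over lowercase alphanumeric tokens.
function toDistribution(text: string): Distribution {
  const tokens = text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  const dist: Distribution = new Map();
  for (const token of tokens) {
    dist.set(token, (dist.get(token) ?? 0) + 1 / tokens.length);
  }
  return dist;
}

// Shannon entropy in bits.
function entropy(p: Distribution): number {
  let h = 0;
  p.forEach((value) => {
    if (value > 0) h -= value * Math.log2(value);
  });
  return h;
}

// Generalized JS divergence with equal weights: H(mixture) - mean(H(P_i)).
function jsDivergence(dists: Distribution[]): number {
  const n = dists.length;
  const mixture: Distribution = new Map();
  for (const dist of dists) {
    dist.forEach((value, key) => mixture.set(key, (mixture.get(key) ?? 0) + value / n));
  }
  const meanEntropy = dists.reduce((sum, dist) => sum + entropy(dist), 0) / n;
  return entropy(mixture) - meanEntropy;
}

function confidenceFor(answers: string[]): Confidence {
  if (answers.length < 2) {
    throw new Error("Need at least two reasoning paths to measure agreement");
  }
  // Divide by log2(N), the upper bound for N distributions, to land in [0, 1].
  const normalized = jsDivergence(answers.map(toDistribution)) / Math.log2(answers.length);
  if (normalized < 0.2) return "green"; // assumed threshold
  if (normalized < 0.5) return "yellow"; // assumed threshold
  return "red";
}
```

Identical answers give a divergence of zero and map to green, while answers that share no tokens reach the upper bound and map to red.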
| Layer | Technology |
|---|---|
| Frontend | React 19, Vite 7, Tailwind CSS 4, Framer Motion, React Router DOM 7 |
| Backend | Node.js 22, Express.js 5, TypeScript 5.7 |
| AI / LLM | Google Gemini 2.5 Flash, Google Gemini Embeddings (gemini-embedding-001, 768d) |
| OCR | Google Cloud Vision API |
| Database | Neo4j 5.18 Community (APOC + Graph Data Science plugins) |
| Queue | BullMQ + Redis 7 |
| Auth | Passport.js, Google OAuth 2.0, JSON Web Tokens |
| PDF Generation | pdf-lib |
| API Docs | Swagger UI (OpenAPI 3.0) |
| Containerization | Docker + Docker Compose |
| Deployment | Vercel (Frontend), Docker (Backend) |
CLARA_AI/
├── docker-compose.yml              # Orchestrates Neo4j, Redis, and Backend
├── backend/
│   ├── src/
│   │   ├── index.ts                # Express app entry point
│   │   ├── config/                 # env, Neo4j, Passport, Redis, Swagger
│   │   ├── middleware/             # JWT auth guard
│   │   ├── routes/                 # auth, chat, contract, document, drafter, query
│   │   ├── services/
│   │   │   ├── chat/               # Chat history persistence (Neo4j)
│   │   │   ├── dashboard/          # User project aggregation
│   │   │   ├── drafter/            # Multi-turn document drafting + PDF export
│   │   │   ├── embedding/          # Gemini embedding service
│   │   │   ├── guardrail/          # Statutory limit & clause violation checks
│   │   │   ├── ocr/                # Google Cloud Vision OCR
│   │   │   ├── reasoning/          # Self-consistency loop, JS-entropy, citations
│   │   │   ├── retrieval/          # Dense, BM25, Symbolic, Hybrid (RRF) retrieval
│   │   │   └── user/               # User creation & lookup
│   │   ├── workers/
│   │   │   └── analysisWorker.ts   # BullMQ worker for async OCR jobs
│   │   ├── queues/
│   │   │   └── analysisQueue.ts    # BullMQ queue definition
│   │   ├── scripts/                # DB init, PDF seeding, knowledge seeding
│   │   └── utils/                  # Response helpers
│   ├── base_knowledge/             # Curated Indonesian legal PDFs for RAG seeding
│   ├── Dockerfile
│   ├── package.json
│   └── tsconfig.json
└── frontend/
    ├── src/
    │   ├── pages/                  # Landing, Login, Home, ChatDetail, Files
    │   ├── components/             # ChatBubble, ChatPanel, SourcesPanel, StudioPanel…
    │   ├── hooks/                  # useAuth, useChat, useProjects, useSources…
    │   ├── Services/               # Axios service wrappers per domain
    │   └── lib/                    # Configured Axios instance
    ├── public/
    ├── vite.config.js
    └── package.json
- Node.js v20+
- Docker & Docker Compose
- A Google AI Studio API key (Gemini)
- A Google Cloud project with Cloud Vision API enabled and a service-account JSON key
- A Google OAuth 2.0 client (Web application)
git clone https://github.com/your-org/clara-ai.git
cd clara-ai
cp backend/.env.example backend/.env
Fill in all required values in backend/.env (see Environment Variables).
Place your Google Cloud Vision service account JSON at:
backend/clara-google-cloud-vision.json
docker compose up neo4j redis -d
Wait for both services to be healthy:
docker compose ps # both should show "(healthy)"
cd backend
npm install
npm run init-schema # creates Neo4j constraints and indexes
npm run seed:pdf # seeds base Indonesian legal knowledge into Neo4j
npm run dev # runs on http://localhost:3001
cd ../frontend
npm install
npm run dev # runs on http://localhost:5173
Open http://localhost:5173 in your browser.
To run all services including the backend in Docker:
docker compose up --build

| Service | URL |
|---|---|
| Frontend (dev) | http://localhost:5173 |
| Backend API | http://localhost:3001 |
| API Docs (Swagger) | http://localhost:3001/api/docs |
| Neo4j Browser | http://localhost:7474 |
Create backend/.env based on the table below:
| Variable | Description | Example |
|---|---|---|
| `PORT` | Backend port | 3001 |
| `NODE_ENV` | Environment | development |
| `NEO4J_URI` | Neo4j Bolt URI | bolt://localhost:7687 |
| `NEO4J_USER` | Neo4j username | neo4j |
| `NEO4J_PASSWORD` | Neo4j password | clara_password |
| `GOOGLE_AI_API_KEY` | Gemini API key | AIza... |
| `GEMINI_MODEL` | Gemini model name | gemini-2.5-flash |
| `EMBEDDING_MODEL` | Embedding model | gemini-embedding-001 |
| `EMBEDDING_DIMENSION` | Embedding vector size | 768 |
| `GOOGLE_APPLICATION_CREDENTIALS` | Path to GCV service account JSON | clara-google-cloud-vision.json |
| `JWT_SECRET` | Secret for signing JWTs | <long random string> |
| `OAUTH_GOOGLE_CLIENT_ID` | Google OAuth client ID | 123...apps.googleusercontent.com |
| `OAUTH_GOOGLE_CLIENT_SECRET` | Google OAuth client secret | GOCSPX-... |
| `REASONING_PATHS` | Number of self-consistency paths | 3 |
| `TEMPERATURE_LOW` | Temperature for conservative reasoning | 0.1 |
| `TEMPERATURE_HIGH` | Temperature for exploratory reasoning | 0.7 |
| `MAX_CONTEXT_TOKENS` | Max tokens in Gemini context | 8192 |
| `TOP_K_DENSE` | Top-K for dense retrieval | 5 |
| `TOP_K_BM25` | Top-K for BM25 retrieval | 5 |
| `TOP_K_SYMBOLIC` | Top-K for symbolic/graph retrieval | 5 |
| `HYBRID_DENSE_WEIGHT` | RRF weight for dense leg | 0.5 |
| `HYBRID_BM25_WEIGHT` | RRF weight for BM25 leg | 0.3 |
| `HYBRID_SYMBOLIC_WEIGHT` | RRF weight for symbolic leg | 0.2 |
| `MAX_FILE_SIZE_MB` | Maximum upload file size | 10 |
| `UPLOAD_DIR` | Local upload directory | ./uploads |
| `VITE_API_URL` | Frontend → Backend base URL | http://localhost:3001 |
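One way to consume the retrieval settings is a small typed loader that validates the numeric variables and normalizes the three RRF weights. This is a sketch only: the `loadRetrievalConfig` function and the `RetrievalConfig` shape are hypothetical, and the use of the example values as fallbacks is an assumption about how the backend treats missing variables.

```typescript
type Env = Record<string, string | undefined>;

interface RetrievalConfig {
  reasoningPaths: number;
  topK: { dense: number; bm25: number; symbolic: number };
  weights: { dense: number; bm25: number; symbolic: number };
}

// Reads a numeric variable, falling back when it is unset or empty.
function num(env: Env, key: string, fallback: number): number {
  const raw = env[key];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  if (!Number.isFinite(value)) {
    throw new Error(`${key} must be a number, got "${raw}"`);
  }
  return value;
}

function loadRetrievalConfig(env: Env): RetrievalConfig {
  const dense = num(env, "HYBRID_DENSE_WEIGHT", 0.5);
  const bm25 = num(env, "HYBRID_BM25_WEIGHT", 0.3);
  const symbolic = num(env, "HYBRID_SYMBOLIC_WEIGHT", 0.2);
  const total = dense + bm25 + symbolic;
  if (total <= 0) throw new Error("Hybrid weights must sum to a positive number");
  return {
    reasoningPaths: num(env, "REASONING_PATHS", 3),
    topK: {
      dense: num(env, "TOP_K_DENSE", 5),
      bm25: num(env, "TOP_K_BM25", 5),
      symbolic: num(env, "TOP_K_SYMBOLIC", 5),
    },
    // Normalized so the three legs always sum to 1.
    weights: { dense: dense / total, bm25: bm25 / total, symbolic: symbolic / total },
  };
}

// In the backend this would typically be called as loadRetrievalConfig(process.env).
```

Failing fast on a malformed value at startup is usually easier to debug than a `NaN` weight surfacing later inside the fusion step.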
After starting the backend, interactive Swagger docs are available at:
http://localhost:3001/api/docs
| Method | Endpoint | Auth | Description |
|---|---|---|---|
| `GET` | `/health` | None | Service health check |
| `GET` | `/api/v1/auth/google` | None | Initiate Google OAuth flow |
| `GET` | `/api/v1/auth/google/callback` | None | OAuth callback, returns JWT |
| `POST` | `/api/v1/document/analyze` | Optional | Upload contract PDF/image for async OCR analysis (returns 202) |
| `GET` | `/api/v1/document/:id/status` | Optional | Poll OCR job status |
| `POST` | `/api/v1/contract/review` | JWT | Review an already-analyzed contract; runs guardrail + reasoning |
| `POST` | `/api/v1/query` | JWT | Ask a legal question via hybrid RAG + self-consistency |
| `POST` | `/api/v1/drafter/chat` | JWT | Multi-turn document drafting conversation |
| `GET` | `/api/v1/chat/sessions` | JWT | List user's chat sessions |
| `GET` | `/api/v1/chat/sessions/:id` | JWT | Get message history for a session |
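Because `/api/v1/document/analyze` returns 202 and the result is fetched later from `/api/v1/document/:id/status`, a client needs a polling loop. The sketch below is transport-agnostic: `pollUntilDone` is a hypothetical helper, and the status values and response shape are assumptions, so check the Swagger docs for the actual schema.

```typescript
// Hypothetical status values; the real API may use different names.
type JobStatus = "queued" | "processing" | "completed" | "failed";

interface StatusResponse {
  status: JobStatus;
}

// Calls fetchStatus until the job reaches a terminal state or attempts run out.
async function pollUntilDone(
  fetchStatus: () => Promise<StatusResponse>,
  { intervalMs = 2000, maxAttempts = 30 } = {},
): Promise<StatusResponse> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const response = await fetchStatus();
    if (response.status === "completed" || response.status === "failed") {
      return response;
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Job still pending after ${maxAttempts} attempts`);
}

// Usage sketch (not executed here): wrap the status endpoint in fetchStatus.
//   const result = await pollUntilDone(() =>
//     fetch(`http://localhost:3001/api/v1/document/${id}/status`).then((r) => r.json()),
//   );
```

Passing the fetcher in as a function keeps the loop easy to test with a fake and leaves authentication headers to the caller.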
cd backend
npm test
The test suite uses Jest + ts-jest. Test files follow the `*.test.ts` convention.
Notable test files:
- `src/services/guardrail/guardrailService.test.ts`
- `src/services/retrieval/hybridRetrieval.test.ts`
The backend is fully containerized. For production, deploy via Docker Compose on any VPS or cloud VM:
docker compose up --build -d
Make sure NODE_ENV=production is set, and update NEO4J_URI / REDIS_URL to point to your managed services.
The frontend is configured for Vercel deployment (vercel.json is included). All routes are rewritten to index.html for SPA routing.
cd frontend
npm run build # outputs to dist/
vercel --prod # or connect your GitHub repo in the Vercel dashboard
Set VITE_API_URL in Vercel's Environment Variables to point to your backend URL.
We welcome contributions! Please follow these steps:
Fork the repository on GitHub, then clone your fork:
git clone https://github.com/<your-username>/clara-ai.git
cd clara-ai
git checkout -b feat/your-feature-name
# Backend
cd backend && npm run dev
# Frontend (separate terminal)
cd frontend && npm run dev

- Backend: TypeScript strict mode. Follow existing service patterns (service class → route handler separation).
- Frontend: React functional components with hooks. Keep service calls in `src/Services/`.
- All new backend endpoints must include a Swagger JSDoc comment block.
Follow Conventional Commits:
feat: add penalty clause detection to guardrail
fix: resolve RRF weight normalization bug
docs: update retrieval architecture section
refactor: extract confidence label mapping to util
- Target the `main` branch
- Include a short description of what and why
- Reference any related issues
Use GitHub Issues. Include:
- Steps to reproduce
- Expected vs actual behavior
- Relevant logs or screenshots
CLARA was built with ❤️ by a team of 4 engineers competing at Arkavidia 10.0 Hackvidia:
| Name | Role |
|---|---|
| Manta Yuana | Backend & Project Manager |
| Kadek Pindra | Frontend & Integration |
| Rama Dita | Fullstack & Data Engineering |
| Dewa Surya | Frontend & UI/UX Designer |
This project is licensed under the MIT License. See LICENSE for details.
Made for Indonesian MSMEs · Built at Arkavidia 10.0 Hackvidia · 2nd Place
