
Add missing infrastructure files and database schema #1

Open
fbmoulin wants to merge 8 commits into main from claude/project-analysis-planning-01MDFn863jteLVoZXZPj61Zf

Conversation

@fbmoulin
Owner

@fbmoulin fbmoulin commented Dec 4, 2025

  • Add Dockerfile.python for Python microservices container
  • Add nginx.conf with SSL, rate limiting, and reverse proxy config
  • Add Python Flask API application (app.py)
  • Add PostgreSQL database initialization script with:
    • Judicial schema with cases, documents, analyses tables
    • Audit schema for change tracking
    • Triggers for automatic timestamps and audit logging
    • Views for case summary and processing statistics
  • Add SSL certificates directory placeholder

Summary by CodeRabbit

  • New Features

    • Legal document automation API with six public endpoints (health, PDF extraction, FIRAC analysis, jurisprudence search, precedent analysis, document generation)
    • OpenAPI spec and user-facing API docs for integration and examples
  • Infrastructure

    • CI and multi-stage deployment pipelines (build, test, security scans, image publishing)
    • Database migrations and initial judicial/audit schema
    • Secure reverse-proxy with TLS, rate limits, health checks
    • Production-ready container image layout and safer compose exposure
    • Environment template with configuration placeholders
  • Docs & Tests

    • Expanded deployment/security docs, migration guide, comprehensive unit/integration tests
  • Chores

    • Repo ignore rules and Dependabot configuration added


- Add Dockerfile.python for Python microservices container
- Add nginx.conf with SSL, rate limiting, and reverse proxy config
- Add Python Flask API application (app.py)
- Add PostgreSQL database initialization script with:
  - Judicial schema with cases, documents, analyses tables
  - Audit schema for change tracking
  - Triggers for automatic timestamps and audit logging
  - Views for case summary and processing statistics
- Add SSL certificates directory placeholder
- Add requirements.txt with all Python dependencies
- Add pyproject.toml with modern Python project configuration
  - Package metadata and classifiers
  - Optional dependencies for dev and monitoring
  - Tool configurations for black, isort, mypy, pytest
- Add .env.template with documented environment variables
- Add .gitignore for Python, Docker, secrets, and IDE files
- Add conftest.py with shared fixtures for testing
- Add unit tests for PDFExtractor class
- Add unit tests for all API endpoints:
  - /health, /extract-pdf, /firac-analysis
  - /datajud-search, /distinguish-analysis, /generate-document
- Add integration tests for complete workflows
- Add tests for error handling and service degradation
- Add pytest.ini with markers and configuration
- Test structure: tests/unit/, tests/integration/
- Add CI pipeline (ci.yml) with:
  - Code quality checks (black, isort, flake8, mypy)
  - Unit tests with pytest
  - Integration tests with Redis and PostgreSQL services
  - Test coverage reporting with Codecov
  - Security scanning (bandit, safety)
  - Docker build verification
- Add deployment pipeline (deploy.yml) with:
  - Docker image build and push to GHCR
  - Staging deployment with smoke tests
  - Production deployment with health checks
  - GitHub Release creation for tags
  - Manual rollback capability
- Add Dependabot configuration for:
  - Python dependencies (weekly)
  - GitHub Actions (weekly)
  - Docker images (weekly)
- Add OpenAPI 3.1 specification (openapi.yaml) with:
  - All API endpoints documented
  - Request/response schemas
  - Error responses
  - Authentication details
  - Rate limiting info
- Add comprehensive API documentation (API_DOCUMENTATION.md):
  - Endpoint descriptions with examples
  - curl examples for each endpoint
  - Python and JavaScript SDK examples
  - Error handling guide
  - LGPD compliance information
- Add alembic.ini configuration
- Add migrations/env.py with environment setup
- Add migrations/models.py with SQLAlchemy ORM models:
  - Case, Document, FIRACAnalysis
  - JurisprudenceSearch, DistinguishAnalysis
  - GeneratedDocument, Notification, AuditLog
- Add initial migration (001) creating:
  - judicial and audit schemas
  - All tables with indexes and constraints
  - Enum types for case_status and document_type
  - Update timestamp triggers
- Add migrations/README.md with usage documentation
@coderabbitai

coderabbitai bot commented Dec 4, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Adds project foundation: environment template, CI/CD (Dependabot, GitHub Actions), Alembic migrations and SQLAlchemy models, a Flask API (PDF extraction, FIRAC, DataJud, distinguish, document generation) with Redis/OpenAI integrations, Docker/Nginx artifacts, OpenAPI and docs, DB init scripts, and tests.

Changes

Cohort / File(s) — Change Summary

  • Environment & Tooling (.env.template, pyproject.toml, requirements.txt, pytest.ini, alembic.ini): New environment template and project/tooling manifests: dependency pins, build system, Alembic settings, pytest configuration, and project metadata.
  • CI / CD / Repo (.github/dependabot.yml, .github/workflows/ci.yml, .github/workflows/deploy.yml, .gitignore): Added Dependabot config, CI pipeline (lint, tests, coverage, security, docker build), deploy workflow (build, staging, production, rollback), and a consolidated .gitignore.
  • Database & Migrations (migrations/*, migrations/versions/..._initial_schema.py, migrations/README.md, migrations/script.py.mako): Added Alembic env, migration template, comprehensive SQLAlchemy models/enums, and an initial migration creating schemas, enums, tables, indexes, and triggers; README for migrations.
  • DB Init Script (scripts/sql/init_database.sql): New full PostgreSQL initialization script: extensions, schemas, tables, indexes, triggers, audit functions, views, initial data, and grants.
  • API Application (scripts/python/app.py): New Flask microservice implementing a JWT auth/token endpoint, PDF extraction, FIRAC analysis, a DataJud client, distinguish analysis, document generation, Redis caching, OpenAI integration, input validation, middleware, and error handlers.
  • Infrastructure (infrastructure/docker/Dockerfile.python, infrastructure/nginx/nginx.conf, docker-compose.yml, security/ssl_certificates/README.txt): Multi-stage Python Dockerfile, Nginx reverse proxy with TLS/rate limiting, docker-compose changed to use expose, and an SSL certs README.
  • API Docs & Spec (docs/API_DOCUMENTATION.md, docs/api/openapi.yaml): Human-readable API docs and an OpenAPI 3.1.0 spec with six endpoints, request/response schemas, JWT bearer security, examples, and components.
  • Deployment & Security Docs (docs/DEPLOYMENT.md, docs/SECURITY.md, README.md): Added deployment guide, security guide, and README updates describing JWT auth, infrastructure diagrams, LGPD notes, and usage examples.
  • Tests (tests/*, tests/unit/*, tests/integration/*): Test package initializers, conftest.py with fixtures/mocks, unit tests for routes and the PDF extractor, and integration tests covering end-to-end flows and failure scenarios.
  • Misc / Templates (migrations/script.py.mako, security/ssl_certificates/README.txt, tests/__init__.py): Misc templates, small READMEs, and test package markers supporting migrations, SSL handling, and tests.

Sequence Diagram(s)

sequenceDiagram
    actor Client
    participant API as Flask API
    participant Redis
    participant OpenAI
    participant Datajud as DATAJUD API
    participant DB as PostgreSQL

    Client->>API: POST /extract-pdf (pdf/base64)
    API->>Redis: GET extracted_text(cache)
    alt cache hit
        Redis-->>API: cached text
    else
        API->>API: extract via PyPDF2
        API->>Redis: SET extracted_text(cache)
    end
    API-->>Client: extracted text

    Client->>API: POST /firac-analysis (text)
    API->>Redis: GET firac_analysis(cache)
    alt cache hit
        Redis-->>API: cached analysis
    else
        API->>OpenAI: FIRAC prompt (GPT-4)
        OpenAI-->>API: structured FIRAC response
        API->>DB: INSERT firac_analyses
        API->>Redis: SET firac_analysis(cache)
    end
    API-->>Client: FIRAC analysis

    Client->>API: POST /datajud-search (params)
    API->>Datajud: search request
    Datajud-->>API: results
    API->>DB: INSERT jurisprudence_searches
    API-->>Client: search results

    Client->>API: POST /distinguish-analysis (facts, precedent)
    API->>OpenAI: distinguish prompt
    OpenAI-->>API: applicability analysis
    API->>DB: INSERT distinguish_analyses
    API-->>Client: distinguish result

    Client->>API: POST /generate-document (type, case)
    API->>OpenAI: generation prompt
    OpenAI-->>API: document content
    API->>DB: INSERT generated_documents
    API-->>Client: generated document
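The /extract-pdf branch in the diagram is a classic cache-aside pattern: check the cache, fall back to extraction on a miss, then populate the cache. A minimal sketch, with a plain dict standing in for Redis and the function and key names being hypothetical, not taken from app.py:

```python
import hashlib

def extract_pdf_cached(pdf_bytes: bytes, cache: dict, extract) -> str:
    """Cache-aside lookup keyed on a hash of the PDF content."""
    key = "extracted_text:" + hashlib.sha256(pdf_bytes).hexdigest()
    cached = cache.get(key)          # Redis GET in the real service
    if cached is not None:
        return cached                # cache hit: skip extraction entirely
    text = extract(pdf_bytes)        # PyPDF2 extraction in the real service
    cache[key] = text                # Redis SET (with a TTL in production)
    return text
```

A second call with the same bytes returns the cached text without invoking the extractor again, which is the behavior the `alt cache hit` branch above encodes.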

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • Focus review on:
    • scripts/python/app.py: input validation, JWT auth/token flow, OpenAI prompt composition and error handling, Redis fallback, and security headers.
    • migrations/models.py and migrations/versions/..._initial_schema.py: FK constraints, cascade semantics, enum creation idempotency, indexes, and triggers.
    • scripts/sql/init_database.sql: audit functions/triggers, view definitions, grants, and initial data.
    • .github/workflows/deploy.yml: secret usage, image tagging, and rollback steps.

Poem

🐰 I nibble docs and spin a test,

Schemas sprout in tidy nests.
Pipelines hum and servers sing,
PDFs, FIRACs, prompts take wing.
A rabbit cheers — deploy, then rest.

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
  • Description Check — ✅ Passed — Check skipped: CodeRabbit's high-level summary is enabled.
  • Title check — ✅ Passed — The title accurately reflects the main addition: infrastructure files (Dockerfile, nginx.conf, SSL directory) and database schema (migrations, SQL scripts, Alembic configuration) are the primary content of this comprehensive PR.
  • Docstring Coverage — ✅ Passed — Docstring coverage is 90.29%, which is sufficient. The required threshold is 80.00%.

@gemini-code-assist

Summary of Changes

Hello @fbmoulin, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request establishes the core infrastructure and initial microservices for a judicial automation platform. It sets up the environment for Python-based services, configures Nginx for secure and efficient traffic management, and lays down a comprehensive PostgreSQL database schema with migration capabilities. The included Flask API provides essential functionalities for legal document processing and AI-driven analysis, supported by automated dependency updates and a foundational test suite.

Highlights

  • Core Infrastructure Setup: Introduced foundational infrastructure components including a Dockerfile for Python microservices, Nginx configuration for reverse proxy with SSL and rate limiting, and a comprehensive .env.template for environment variables.
  • Database Schema and Migrations: Added a robust PostgreSQL database schema with dedicated judicial and audit schemas, defining tables for cases, documents, various legal analyses (FIRAC, distinguish), generated documents, and audit logs. Alembic is configured for managing database migrations.
  • Python Flask API: Implemented a new Python Flask API (app.py) to handle core judicial automation functionalities, including PDF text extraction, FIRAC legal analysis, DATAJUD jurisprudence search, distinguish analysis, and legal document generation, leveraging OpenAI and other services.
  • Automated Dependency Management: Integrated Dependabot via .github/dependabot.yml to automate dependency updates for Python, GitHub Actions, and Docker, ensuring the project stays up-to-date with security patches and new features.
  • API Documentation and Testing: Provided detailed API documentation in Markdown and an OpenAPI (Swagger) specification for easy understanding and integration. Initial unit and integration tests, along with Pytest configuration, are included to ensure code quality and functionality.
Ignored Files
  • Ignored by pattern: .github/workflows/** (2)
    • .github/workflows/ci.yml
    • .github/workflows/deploy.yml


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a significant amount of infrastructure and application code, including Docker setup, Nginx configuration, a Python Flask API, and a complete database schema with migrations. The overall structure is very well-organized and follows many best practices, such as using multi-stage Docker builds, security headers in Nginx, and a comprehensive test suite structure. However, there are several critical security vulnerabilities related to hardcoded passwords and some significant bugs in the Python application due to outdated library usage. My review focuses on addressing these critical issues, improving security, fixing bugs, and enhancing maintainability by removing redundancies.

Comment thread scripts/python/app.py
Comment on lines +101 to +109
```python
response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "Voce e um especialista em analise juridica brasileira."},
        {"role": "user", "content": prompt}
    ],
    max_tokens=2000,
    temperature=0.3
)
```

critical

The openai.ChatCompletion.create syntax is for openai library version < 1.0.0. Your requirements.txt specifies openai==1.6.1, which uses a new client-based API. This code will raise an AttributeError. The suggested change updates the call to the new syntax. For better performance, you may want to initialize the OpenAI client once globally rather than in each function call. This same issue exists in DistinguishAnalyzer and DocumentGenerator.

Suggested change:

```python
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "Voce e um especialista em analise juridica brasileira."},
        {"role": "user", "content": prompt}
    ],
    max_tokens=2000,
    temperature=0.3
)
```
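On the reviewer's aside about initializing the client once rather than per call: a memoized factory is one common shape for that. The sketch below uses `functools.lru_cache` with a stand-in object in place of the real `OpenAI()` constructor, so the pattern is shown without assuming anything about client configuration:

```python
from functools import lru_cache

@lru_cache(maxsize=1)
def get_openai_client():
    # Stand-in for `OpenAI()`; in app.py this would construct the real
    # client on first use, and every subsequent caller reuses the same
    # instance instead of paying construction cost in each function.
    return object()
```

Callers invoke `get_openai_client()` wherever they previously reached for a module-level `openai` object; the `lru_cache(maxsize=1)` guarantees a single shared instance.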

Comment thread alembic.ini Outdated

```ini
# SQLAlchemy URL (can be overridden by env)
# Use env variable in production
sqlalchemy.url = postgresql://n8n_user:password@localhost:5432/n8n
```

critical

Hardcoding credentials, even for a local database with a simple password, is a significant security risk. The sqlalchemy.url is committed to version control with a password. It's much safer to rely entirely on environment variables, which your migrations/env.py script already supports. Please remove this line or comment it out to prevent accidental credential exposure.

```ini
# sqlalchemy.url = postgresql://n8n_user:password@localhost:5432/n8n
```

Comment thread scripts/python/app.py Outdated
Comment on lines +498 to +504
```python
if __name__ == '__main__':
    logger.info("Starting Judicial Automation API...")
    logger.info(f"OpenAI configured: {bool(openai.api_key)}")
    logger.info(f"DATAJUD configured: {bool(datajud_client.username)}")
    logger.info(f"Redis configured: {redis_client is not None}")

    app.run(host='0.0.0.0', port=5000, debug=os.getenv('FLASK_DEBUG', 'false').lower() == 'true')
```

critical

The application is started using app.run(), which runs the Flask development server. This server is not suitable for production as it's not designed to be efficient, stable, or secure. Your Dockerfile executes this script directly. You should use a production-grade WSGI server like Gunicorn or uWSGI to run the application, and update the CMD in your Dockerfile accordingly. This if __name__ == '__main__': block is useful for local development but shouldn't be the entrypoint for production.
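A minimal sketch of the kind of Dockerfile change being suggested, assuming Gunicorn is added to the image and the Flask application object is named `app` inside `app.py` (both are assumptions, not confirmed by the PR):

```dockerfile
# Swap the Flask development server for a production WSGI server.
RUN pip install --no-cache-dir gunicorn

# 4 workers is a placeholder; tune to CPU count and workload.
CMD ["gunicorn", "--bind", "0.0.0.0:5000", "--workers", "4", "app:app"]
```

The `if __name__ == '__main__':` block can then stay for local `python app.py` runs while the container entrypoint bypasses it entirely.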

Comment thread scripts/python/app.py
Comment on lines +311 to +319
```python
response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "Voce e um magistrado especialista em redacao de pecas judiciais."},
        {"role": "user", "content": prompts[document_type]}
    ],
    max_tokens=3000,
    temperature=0.3
)
```

critical

The openai.ChatCompletion.create syntax is outdated for openai>=1.0.0 and will cause a runtime error. This needs to be updated to use the new client-based API, similar to the other OpenAI calls in this file.

Suggested change:

```python
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "Voce e um magistrado especialista em redacao de pecas judiciais."},
        {"role": "user", "content": prompts[document_type]}
    ],
    max_tokens=3000,
    temperature=0.3
)
```

Comment thread migrations/env.py Outdated
```python
port = os.getenv("POSTGRES_PORT", "5432")
db = os.getenv("POSTGRES_DB", "n8n")
user = os.getenv("POSTGRES_USER", "n8n_user")
password = os.getenv("POSTGRES_PASSWORD", "password")
```

critical

The get_url function has a hardcoded default password 'password'. This is a critical security risk, as it might be used accidentally in a real environment if the POSTGRES_PASSWORD environment variable is not set. The application should fail explicitly if a password is not provided, preventing connections with weak or default credentials.

Suggested change:

```python
password = os.environ["POSTGRES_PASSWORD"]  # Fail loudly if not set
```

Comment thread pytest.ini
Comment on lines +1 to +43
```ini
[pytest]
# Sistema de Automacao Juridica - Pytest Configuration

# Test discovery
testpaths = tests
python_files = test_*.py *_test.py
python_classes = Test*
python_functions = test_*

# Minimum version
minversion = 7.0

# Default options
addopts =
    -ra
    -q
    --strict-markers
    --tb=short
    -v

# Markers
markers =
    unit: Unit tests (fast, no external dependencies)
    integration: Integration tests (may require mocked services)
    slow: Slow running tests
    requires_api: Tests requiring external API access

# Logging
log_cli = true
log_cli_level = INFO
log_cli_format = %(asctime)s [%(levelname)s] %(message)s
log_cli_date_format = %Y-%m-%d %H:%M:%S

# Warnings
filterwarnings =
    ignore::DeprecationWarning
    ignore::PendingDeprecationWarning

# Timeout (requires pytest-timeout)
# timeout = 30

# Coverage settings (when running with --cov)
# Configured in pyproject.toml
```

medium

This pytest.ini file duplicates several configurations that are also present in pyproject.toml (e.g., testpaths, python_files, filterwarnings). This can lead to confusion and make maintenance harder, as it's unclear which file takes precedence. It's recommended to consolidate all tool configurations into pyproject.toml to have a single source of truth.
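If the project consolidates, the duplicated keys would move under `[tool.pytest.ini_options]` in pyproject.toml, which takes precedence once pytest.ini is removed. A partial sketch covering only the duplicated settings (values copied from the pytest.ini quoted above):

```toml
[tool.pytest.ini_options]
minversion = "7.0"
testpaths = ["tests"]
python_files = ["test_*.py", "*_test.py"]
addopts = "-ra -q --strict-markers --tb=short -v"
markers = [
    "unit: Unit tests (fast, no external dependencies)",
    "integration: Integration tests (may require mocked services)",
]
filterwarnings = [
    "ignore::DeprecationWarning",
    "ignore::PendingDeprecationWarning",
]
```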

Comment thread pyproject.toml
```
| buck-out
| build
| dist
| migrations
```

medium

Excluding the entire migrations directory from Black formatting is not ideal. While the auto-generated migration files can be noisy, the env.py and models.py files within it would benefit from consistent formatting. Consider refining the exclusion pattern or running Black on those specific files manually.

Comment thread .env.template Outdated
Comment on lines +80 to +81
```
JWT_SECRET=generate_a_secure_32_character_key_here
ENCRYPTION_KEY=generate_another_32_character_key
```

medium

The placeholder values for JWT_SECRET and ENCRYPTION_KEY are not secure. While this is a template, providing insecure placeholders can lead to them being used in development or even staging environments. It's better to provide instructions on how to generate secure keys, or leave them blank.
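For example, the template could point readers at Python's `secrets` module for key generation (a sketch; the "32" comes from the placeholder text above, and the output is random on every run):

```python
import secrets

# Generate a strong, URL-safe value suitable for JWT_SECRET or
# ENCRYPTION_KEY; token_urlsafe(32) draws 32 random bytes, which
# encodes to a 43-character base64url string.
print(secrets.token_urlsafe(32))
```

The one-liner `python -c 'import secrets; print(secrets.token_urlsafe(32))'` works equally well as an instruction embedded in the .env.template comments.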

Comment thread docs/api/openapi.yaml
Comment on lines +504 to +505
```yaml
security:
  - bearerAuth: []
```

medium

Applying security at the top level makes all endpoints require authentication. This is likely not intended for the /health endpoint, which is typically public for monitoring purposes. It's better to define security requirements on a per-path or per-operation basis for more granular control. Consider removing this global security definition and applying it only to the endpoints that need protection.
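The per-operation form would look roughly like this (paths and operations abbreviated; a sketch of the shape, not the spec's actual contents):

```yaml
# Drop the document-level `security:` block and scope it per operation.
paths:
  /health:
    get:
      security: []          # empty array = explicitly public
  /extract-pdf:
    post:
      security:
        - bearerAuth: []    # JWT required for protected endpoints
```

An empty `security: []` on an operation overrides any global requirement, which is the standard OpenAPI idiom for carving out a public health check.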

Comment on lines +1 to +313
-- Sistema de Automacao Juridica - Database Initialization
-- PostgreSQL 15

-- Create extensions
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
CREATE EXTENSION IF NOT EXISTS "pg_trgm";

-- Create schemas
CREATE SCHEMA IF NOT EXISTS judicial;
CREATE SCHEMA IF NOT EXISTS audit;

-- Set search path
SET search_path TO judicial, public;

-- =====================================================
-- MAIN TABLES
-- =====================================================

-- Cases table
CREATE TABLE IF NOT EXISTS judicial.cases (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
case_number VARCHAR(50) NOT NULL UNIQUE,
tribunal VARCHAR(20) NOT NULL,
court VARCHAR(100),
class_code INTEGER,
class_name VARCHAR(255),
subject_codes INTEGER[],
status VARCHAR(50) DEFAULT 'pending',
priority INTEGER DEFAULT 0,
received_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
processed_at TIMESTAMP WITH TIME ZONE,
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

-- Create index for case search
CREATE INDEX IF NOT EXISTS idx_cases_number ON judicial.cases(case_number);
CREATE INDEX IF NOT EXISTS idx_cases_tribunal ON judicial.cases(tribunal);
CREATE INDEX IF NOT EXISTS idx_cases_status ON judicial.cases(status);
CREATE INDEX IF NOT EXISTS idx_cases_created ON judicial.cases(created_at);

-- Documents table
CREATE TABLE IF NOT EXISTS judicial.documents (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
case_id UUID REFERENCES judicial.cases(id) ON DELETE CASCADE,
document_type VARCHAR(50) NOT NULL,
file_name VARCHAR(255),
file_path VARCHAR(500),
file_size BIGINT,
mime_type VARCHAR(100),
extracted_text TEXT,
page_count INTEGER,
checksum VARCHAR(64),
uploaded_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
processed_at TIMESTAMP WITH TIME ZONE,
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

-- Create index for document search
CREATE INDEX IF NOT EXISTS idx_documents_case ON judicial.documents(case_id);
CREATE INDEX IF NOT EXISTS idx_documents_type ON judicial.documents(document_type);
CREATE INDEX IF NOT EXISTS idx_documents_text_gin ON judicial.documents USING gin(to_tsvector('portuguese', extracted_text));

-- FIRAC Analysis table
CREATE TABLE IF NOT EXISTS judicial.firac_analyses (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
case_id UUID REFERENCES judicial.cases(id) ON DELETE CASCADE,
document_id UUID REFERENCES judicial.documents(id) ON DELETE SET NULL,
facts TEXT,
issues TEXT,
rules TEXT,
analysis TEXT,
conclusion TEXT,
raw_response JSONB,
model_used VARCHAR(50),
tokens_used INTEGER,
confidence_score DECIMAL(3,2),
analyzed_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

CREATE INDEX IF NOT EXISTS idx_firac_case ON judicial.firac_analyses(case_id);

-- Jurisprudence Search Results table
CREATE TABLE IF NOT EXISTS judicial.jurisprudence_searches (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
case_id UUID REFERENCES judicial.cases(id) ON DELETE CASCADE,
search_query JSONB NOT NULL,
tribunal VARCHAR(20),
total_results INTEGER,
results JSONB,
search_duration_ms INTEGER,
searched_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

CREATE INDEX IF NOT EXISTS idx_jurisprudence_case ON judicial.jurisprudence_searches(case_id);
CREATE INDEX IF NOT EXISTS idx_jurisprudence_tribunal ON judicial.jurisprudence_searches(tribunal);

-- Distinguish Analysis table
CREATE TABLE IF NOT EXISTS judicial.distinguish_analyses (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
case_id UUID REFERENCES judicial.cases(id) ON DELETE CASCADE,
jurisprudence_search_id UUID REFERENCES judicial.jurisprudence_searches(id) ON DELETE SET NULL,
current_facts TEXT NOT NULL,
precedent_data JSONB,
is_applicable BOOLEAN,
similarities TEXT,
differences TEXT,
recommendation TEXT,
raw_response JSONB,
confidence_score DECIMAL(3,2),
analyzed_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

CREATE INDEX IF NOT EXISTS idx_distinguish_case ON judicial.distinguish_analyses(case_id);

-- Generated Documents table
CREATE TABLE IF NOT EXISTS judicial.generated_documents (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
case_id UUID REFERENCES judicial.cases(id) ON DELETE CASCADE,
firac_analysis_id UUID REFERENCES judicial.firac_analyses(id) ON DELETE SET NULL,
distinguish_analysis_id UUID REFERENCES judicial.distinguish_analyses(id) ON DELETE SET NULL,
document_type VARCHAR(50) NOT NULL,
title VARCHAR(255),
content TEXT NOT NULL,
google_docs_id VARCHAR(255),
google_docs_url VARCHAR(500),
version INTEGER DEFAULT 1,
status VARCHAR(50) DEFAULT 'draft',
generated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
published_at TIMESTAMP WITH TIME ZONE,
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

CREATE INDEX IF NOT EXISTS idx_generated_case ON judicial.generated_documents(case_id);
CREATE INDEX IF NOT EXISTS idx_generated_type ON judicial.generated_documents(document_type);
CREATE INDEX IF NOT EXISTS idx_generated_status ON judicial.generated_documents(status);

-- Notifications table
CREATE TABLE IF NOT EXISTS judicial.notifications (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
case_id UUID REFERENCES judicial.cases(id) ON DELETE CASCADE,
notification_type VARCHAR(50) NOT NULL,
recipient_email VARCHAR(255),
subject VARCHAR(255),
message TEXT,
status VARCHAR(50) DEFAULT 'pending',
sent_at TIMESTAMP WITH TIME ZONE,
error_message TEXT,
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

CREATE INDEX IF NOT EXISTS idx_notifications_case ON judicial.notifications(case_id);
CREATE INDEX IF NOT EXISTS idx_notifications_status ON judicial.notifications(status);

-- =====================================================
-- AUDIT TABLES
-- =====================================================

-- Audit log table
CREATE TABLE IF NOT EXISTS audit.logs (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
table_name VARCHAR(100) NOT NULL,
record_id UUID,
action VARCHAR(20) NOT NULL,
old_data JSONB,
new_data JSONB,
user_id VARCHAR(100),
ip_address INET,
user_agent TEXT,
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

CREATE INDEX IF NOT EXISTS idx_audit_table ON audit.logs(table_name);
CREATE INDEX IF NOT EXISTS idx_audit_record ON audit.logs(record_id);
CREATE INDEX IF NOT EXISTS idx_audit_action ON audit.logs(action);
CREATE INDEX IF NOT EXISTS idx_audit_created ON audit.logs(created_at);

-- =====================================================
-- FUNCTIONS AND TRIGGERS
-- =====================================================

-- Update timestamp function
CREATE OR REPLACE FUNCTION update_updated_at_column()
RETURNS TRIGGER AS $$
BEGIN
NEW.updated_at = NOW();
RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Apply update trigger to relevant tables
CREATE TRIGGER update_cases_updated_at
BEFORE UPDATE ON judicial.cases
FOR EACH ROW
EXECUTE FUNCTION update_updated_at_column();

CREATE TRIGGER update_generated_documents_updated_at
BEFORE UPDATE ON judicial.generated_documents
FOR EACH ROW
EXECUTE FUNCTION update_updated_at_column();

-- Audit logging function
CREATE OR REPLACE FUNCTION audit.log_changes()
RETURNS TRIGGER AS $$
BEGIN
IF TG_OP = 'INSERT' THEN
INSERT INTO audit.logs (table_name, record_id, action, new_data)
VALUES (TG_TABLE_SCHEMA || '.' || TG_TABLE_NAME, NEW.id, 'INSERT', to_jsonb(NEW));
RETURN NEW;
ELSIF TG_OP = 'UPDATE' THEN
INSERT INTO audit.logs (table_name, record_id, action, old_data, new_data)
VALUES (TG_TABLE_SCHEMA || '.' || TG_TABLE_NAME, NEW.id, 'UPDATE', to_jsonb(OLD), to_jsonb(NEW));
RETURN NEW;
ELSIF TG_OP = 'DELETE' THEN
INSERT INTO audit.logs (table_name, record_id, action, old_data)
VALUES (TG_TABLE_SCHEMA || '.' || TG_TABLE_NAME, OLD.id, 'DELETE', to_jsonb(OLD));
RETURN OLD;
END IF;
RETURN NULL;
END;
$$ LANGUAGE plpgsql;

-- Apply audit triggers
CREATE TRIGGER audit_cases
AFTER INSERT OR UPDATE OR DELETE ON judicial.cases
FOR EACH ROW EXECUTE FUNCTION audit.log_changes();

CREATE TRIGGER audit_documents
AFTER INSERT OR UPDATE OR DELETE ON judicial.documents
FOR EACH ROW EXECUTE FUNCTION audit.log_changes();

CREATE TRIGGER audit_generated_documents
AFTER INSERT OR UPDATE OR DELETE ON judicial.generated_documents
FOR EACH ROW EXECUTE FUNCTION audit.log_changes();

-- =====================================================
-- VIEWS
-- =====================================================

-- Case summary view
CREATE OR REPLACE VIEW judicial.case_summary AS
SELECT
c.id,
c.case_number,
c.tribunal,
c.court,
c.class_name,
c.status,
c.priority,
c.received_at,
c.processed_at,
COUNT(DISTINCT d.id) AS document_count,
COUNT(DISTINCT f.id) AS firac_analysis_count,
COUNT(DISTINCT g.id) AS generated_document_count,
MAX(g.generated_at) AS last_document_generated
FROM judicial.cases c
LEFT JOIN judicial.documents d ON d.case_id = c.id
LEFT JOIN judicial.firac_analyses f ON f.case_id = c.id
LEFT JOIN judicial.generated_documents g ON g.case_id = c.id
GROUP BY c.id;

-- Processing statistics view
CREATE OR REPLACE VIEW judicial.processing_stats AS
SELECT
DATE(created_at) AS date,
COUNT(*) AS total_cases,
COUNT(CASE WHEN status = 'completed' THEN 1 END) AS completed_cases,
COUNT(CASE WHEN status = 'pending' THEN 1 END) AS pending_cases,
COUNT(CASE WHEN status = 'error' THEN 1 END) AS error_cases,
AVG(EXTRACT(EPOCH FROM (processed_at - received_at))) AS avg_processing_seconds
FROM judicial.cases
WHERE created_at >= NOW() - INTERVAL '30 days'
GROUP BY DATE(created_at)
ORDER BY date DESC;

-- =====================================================
-- INITIAL DATA
-- =====================================================

-- Insert common document types
CREATE TABLE IF NOT EXISTS judicial.document_types (
code VARCHAR(50) PRIMARY KEY,
name VARCHAR(100) NOT NULL,
description TEXT
);

INSERT INTO judicial.document_types (code, name, description) VALUES
('sentenca', 'Sentenca', 'Decisao judicial que resolve o merito da causa'),
('despacho', 'Despacho', 'Ato judicial de mero expediente'),
('decisao', 'Decisao Interlocutoria', 'Decisao que resolve questao incidente'),
('acordao', 'Acordao', 'Decisao colegiada de tribunal'),
('peticao', 'Peticao', 'Documento da parte'),
('parecer', 'Parecer', 'Opiniao tecnica do Ministerio Publico')
ON CONFLICT (code) DO NOTHING;

-- Grant permissions
GRANT USAGE ON SCHEMA judicial TO n8n_user;
GRANT USAGE ON SCHEMA audit TO n8n_user;
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA judicial TO n8n_user;
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA audit TO n8n_user;
GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA judicial TO n8n_user;

-- Success message
DO $$
BEGIN
RAISE NOTICE 'Database initialization completed successfully!';
RAISE NOTICE 'Schemas created: judicial, audit';
RAISE NOTICE 'Tables created: cases, documents, firac_analyses, jurisprudence_searches, distinguish_analyses, generated_documents, notifications';
END $$;
medium

This SQL script manually defines the database schema, which is also managed by Alembic migrations. Maintaining two sources of truth for the schema is problematic and can lead to inconsistencies: for example, this script includes a GIN index and audit triggers that are missing from the Alembic migration. It's best to rely on a single system, preferably Alembic, to manage all schema changes. Please consolidate the schema definitions into the Alembic migrations, and either remove this SQL file or clearly mark it as development-only and generate it from the migrations.
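One way to keep Alembic as the single source of truth while still shipping a bootstrap SQL file is to generate it offline from the migrations themselves. A sketch, assuming `alembic.ini` is configured for this project:

```
# Offline mode: emit the SQL that the migrations would run, without touching a database
alembic upgrade head --sql > scripts/sql/init_database.sql
```

The generated file can then be regenerated whenever a migration is added, so it never drifts from the migration chain.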


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 16

Note

Due to the large number of review comments, Critical and Major severity comments were prioritized as inline comments.

🟡 Minor comments (11)
.env.template-68-68 (1)

68-68: Quote unquoted environment variable value.

Line 68 contains a value with spaces that should be quoted. The dotenv parser may not handle the unquoted value correctly.

Apply this diff:

-SMTP_FROM_NAME=Sistema Automacao Juridica
+SMTP_FROM_NAME="Sistema Automacao Juridica"
tests/unit/test_pdf_extractor.py-1-102 (1)

1-102: Run Black formatter to fix formatting issues.

The CI pipeline reports a Black formatting check failure. Run black tests/unit/test_pdf_extractor.py to automatically format the code.

#!/bin/bash
# Format the file with Black
black tests/unit/test_pdf_extractor.py
docs/API_DOCUMENTATION.md-19-19 (1)

19-19: Fix markdown formatting issues flagged by markdownlint.

Several minor markdown formatting issues were detected:

  • Line 19: Add language identifier to fenced code block
  • Lines 161, 301: Add blank lines around tables
  • Lines 388-389: Convert bare URLs to markdown links

Apply these fixes:

+

Authorization: Bearer <your_token>

 | size | integer | No | Max results (default: 50, max: 100) |
+
 **Response:**
 | 500 | Internal server error |
+
 ---
 For API support:
-- **Email:** suporte@judicial-automation.com.br
-- **GitHub Issues:** https://github.com/fbmoulin/2acivelagent/issues
+- **Email:** <suporte@judicial-automation.com.br>
+- **GitHub Issues:** <https://github.com/fbmoulin/2acivelagent/issues>

Also applies to: 161-161, 301-301, 388-389

alembic.ini-28-28 (1)

28-28: Update database name to match the project.

The database name n8n appears to reference the n8n workflow automation platform, which is inconsistent with this judicial automation system. Update to a more appropriate name like judicial_automation.

-sqlalchemy.url = postgresql://n8n_user:password@localhost:5432/n8n
+# sqlalchemy.url = postgresql://judicial_user:password@localhost:5432/judicial_automation
docs/api/openapi.yaml-158-158 (1)

158-158: Escape or rephrase the colon in the description text.

YAMLlint reports a syntax error on this line. The colon character within the Portuguese text (ex: falha OpenAI) may confuse YAML parsers that interpret it as a key-value separator.

Apply this diff to fix the issue by wrapping the description in quotes:

-          description: Erro na analise (ex: falha OpenAI)
+          description: "Erro na analise (ex: falha OpenAI)"

Alternatively, replace the colon with a dash or remove the parenthetical:

-          description: Erro na analise (ex: falha OpenAI)
+          description: Erro na analise (ex - falha OpenAI)
tests/unit/test_api_routes.py-134-143 (1)

134-143: Add assertions on the response.

The response variable is assigned but never used. The test should verify the response status code or data to ensure complete coverage.

         response = client.post('/firac-analysis',
                                data=json.dumps({'text': sample_legal_text}),
                                content_type='application/json')
 
         mock_openai.assert_called_once()
+        assert response.status_code == 200
tests/unit/test_api_routes.py-175-190 (1)

175-190: Add assertions on the response.

The response variable is assigned but never used. The test should verify the response to ensure the API call succeeded.

         response = client.post('/datajud-search',
                                data=json.dumps({
                                    'tribunal': 'tjsp',
                                    'texto_livre': 'execucao fiscal'
                                }),
                                content_type='application/json')
 
         mock_post.assert_called_once()
+        assert response.status_code == 200
migrations/models.py-244-244 (1)

244-244: Same privacy consideration as noted for SQL script.

recipient_email stores PII; consider documenting data retention and protection requirements per LGPD compliance.

scripts/python/app.py-1-2 (1)

1-2: Run Black formatter to fix CI pipeline failure.

The CI pipeline reports Black formatting check failed. Run black scripts/python/app.py to fix. Also, the shebang is present but file is not executable (Ruff EXE001).

tests/conftest.py-1-3 (1)

1-3: Run Black formatter to fix CI pipeline failure.

The CI pipeline reports that Black formatting check failed for this file. Run black tests/conftest.py to fix formatting issues.

scripts/sql/init_database.sql-142-157 (1)

142-157: Consider privacy implications for email storage.

The recipient_email column stores email addresses in plain text. Depending on your compliance requirements (LGPD - Brazil's data protection law), you may need to:

  1. Document this data collection in privacy policies
  2. Consider encryption at rest
  3. Implement data retention policies
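If full encryption at rest is not feasible immediately, one interim option is to pseudonymize addresses at the application layer before they reach logs or analytics tables. A minimal sketch (the function names and masking format are illustrative, not part of the codebase):

```python
import hashlib


def mask_email(email: str) -> str:
    """Mask an email for display: keep only the first character of the local part."""
    local, _, domain = email.partition("@")
    return f"{local[:1]}***@{domain}"


def hash_email(email: str) -> str:
    """Deterministic pseudonym for audit correlation without storing the raw address."""
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode()).hexdigest()


print(mask_email("maria.silva@example.com"))  # m***@example.com
```

The raw address would still live only in `judicial.notifications` (where it is needed for delivery), while derived tables and audit views carry the masked or hashed form.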
🧹 Nitpick comments (20)
.env.template (1)

1-109: Consider reordering environment variables for consistency.

The dotenv-linter tool flagged 17 instances where variables within each section are not alphabetically ordered (e.g., POSTGRES_DB before POSTGRES_HOST, REDIS_PASSWORD before REDIS_PORT). While this does not affect functionality, alphabetical ordering improves consistency and readability.

docs/api/openapi.yaml (2)

54-76: Consider excluding /health from authentication requirements.

The /health endpoint is typically used by load balancers, monitoring systems, and container orchestrators for health checks. Requiring JWT authentication may prevent these systems from functioning correctly.

If you want to exclude the health endpoint from authentication, add a security override:

 /health:
   get:
     tags:
       - Health
     summary: Verificacao de saude
     description: Retorna o status de saude da API e seus servicos dependentes.
     operationId: healthCheck
+    security: []
     responses:

The empty array security: [] overrides the global security requirement for this specific endpoint.


363-364: Consider increasing the minimum text length for FIRAC analysis.

The minLength: 50 constraint for legal text analysis may be too permissive. A meaningful FIRAC analysis typically requires a substantial case description with facts, issues, and legal context.

Consider increasing the minimum to a more practical value:

       text:
         type: string
-        minLength: 50
+        minLength: 200
         description: Texto juridico para analise (minimo 50 caracteres)

And update the description accordingly.

alembic.ini (1)

9-9: Verify the file_template format string.

The migration filename template uses multiple datetime components that will generate very long filenames. While this works, it may be unnecessarily verbose.

Consider simplifying to the more common format:

-file_template = %%(year)d%%(month).2d%%(day).2d_%%(hour).2d%%(minute).2d%%(second).2d_%%(rev)s_%%(slug)s
+file_template = %%(year)d%%(month).2d%%(day).2d_%%(rev)s_%%(slug)s

This removes hour/minute/second, which are rarely needed since the revision ID already provides uniqueness.

infrastructure/docker/Dockerfile.python (1)

64-65: Consider using Python for health checks instead of curl.

While the current health check works, using Python would eliminate the curl dependency and reduce the image size slightly.

Replace the curl-based health check with a Python-based one:

 HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
-    CMD curl -f http://localhost:5000/health || exit 1
+    CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:5000/health').read()" || exit 1

Then remove curl from the runtime dependencies:

 RUN apt-get update && apt-get install -y --no-install-recommends \
     libpq5 \
-    curl \
     && rm -rf /var/lib/apt/lists/* \
     && apt-get clean
migrations/README.md (1)

15-18: Consider using a more descriptive database name following naming conventions.

The example uses n8n as the database name. While n8n is a core component of this judicial automation system, database names typically reflect the database content or application domain rather than the orchestration tool name. Consider using judicial_automation, legal_system, or jus_automation for better clarity and maintainability.

tests/unit/test_api_routes.py (1)

78-86: Assertion is too permissive.

The test accepts either 'text' or 'error' in the response, which means it passes even if the extraction fails. Consider separate tests for success and error cases to ensure proper validation.

-    @pytest.mark.unit
-    def test_extract_pdf_returns_text(self, client, sample_pdf_base64):
-        """Test that extracted text is returned"""
-        response = client.post('/extract-pdf',
-                               data=json.dumps({'pdf_content': sample_pdf_base64}),
-                               content_type='application/json')
-
-        data = json.loads(response.data)
-        assert 'text' in data or 'error' in data
+    @pytest.mark.unit
+    def test_extract_pdf_returns_text(self, client, sample_pdf_base64):
+        """Test that extracted text is returned"""
+        response = client.post('/extract-pdf',
+                               data=json.dumps({'pdf_content': sample_pdf_base64}),
+                               content_type='application/json')
+
+        assert response.status_code == 200
+        data = json.loads(response.data)
+        assert 'text' in data
migrations/versions/20241204_000000_001_initial_schema.py (1)

103-125: Consider partial indexes for confidence scores.

If you frequently query for high-confidence analyses, consider adding partial indexes on confidence_score where the score exceeds a threshold. This is optional and can be deferred based on actual query patterns.

Example for future migration:

CREATE INDEX idx_firac_high_confidence 
  ON judicial.firac_analyses(confidence_score) 
  WHERE confidence_score > 0.80;
tests/integration/test_api_integration.py (2)

207-218: Remove unused mock_openai parameter.

The mock_openai parameter is declared but never used in this test. Remove it to clean up the test signature.

     @pytest.mark.integration
-    @patch('scripts.python.app.openai.ChatCompletion.create')
-    def test_multiple_pdf_extractions(self, mock_openai, client, sample_pdf_base64):
+    def test_multiple_pdf_extractions(self, client, sample_pdf_base64):
         """Test multiple PDF extractions"""

243-261: Remove unused sample_pdf_base64 parameter.

The sample_pdf_base64 parameter is declared but never used in this test. Remove it to clean up the test signature.

     @pytest.mark.integration
-    def test_unicode_text_handled(self, client, sample_pdf_base64):
+    def test_unicode_text_handled(self, client):
         """Test handling of Unicode text (Portuguese characters)"""
infrastructure/nginx/nginx.conf (1)

55-59: Consider adding Content-Security-Policy header.

The security headers are good, but consider adding a Content-Security-Policy (CSP) header for additional protection against XSS and data injection attacks. Note that X-XSS-Protection is deprecated in modern browsers but doesn't hurt to keep for legacy support.

     add_header X-Frame-Options "SAMEORIGIN" always;
     add_header X-Content-Type-Options "nosniff" always;
     add_header X-XSS-Protection "1; mode=block" always;
     add_header Referrer-Policy "strict-origin-when-cross-origin" always;
+    add_header Content-Security-Policy "default-src 'self'; script-src 'self' 'unsafe-inline' 'unsafe-eval'; style-src 'self' 'unsafe-inline';" always;
.github/workflows/ci.yml (1)

194-197: Consider increasing coverage threshold over time.

The 50% coverage threshold is reasonable for initial setup, but plan to gradually increase this to 70-80% as the codebase matures to ensure comprehensive test coverage.
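The threshold can be ratcheted up in configuration rather than in the workflow file, so local runs and CI enforce the same floor. A sketch assuming `pytest-cov` is installed (the 50 is the current value; raise it as coverage improves):

```ini
# pytest.ini
[pytest]
addopts = --cov=scripts --cov-fail-under=50
```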

tests/conftest.py (2)

11-15: Consider using pytest.ini or pyproject.toml for path configuration.

The sys.path.insert approach works but is fragile. Consider configuring pythonpath in pytest.ini or pyproject.toml instead for cleaner test discovery.

# pytest.ini
[pytest]
pythonpath = .

73-77: Move base64 import to module level.

The base64 import should be at the top of the file with other imports for consistency with PEP 8.

 import os
 import sys
 import json
+import base64
 import pytest

And then remove line 76.

scripts/sql/init_database.sql (2)

28-28: Consider adding a CHECK constraint for status values.

The status column uses VARCHAR(50) with a default, but unlike the ORM models (migrations/models.py) which define a CaseStatus enum, this SQL script lacks validation. Consider adding a CHECK constraint or creating an ENUM type for consistency:

-- Option 1: CHECK constraint
status VARCHAR(50) DEFAULT 'pending' CHECK (status IN ('pending', 'processing', 'completed', 'error', 'archived')),

-- Option 2: ENUM type (preferred for consistency with ORM)
CREATE TYPE judicial.case_status AS ENUM ('pending', 'processing', 'completed', 'error', 'archived');
-- Then use: status judicial.case_status DEFAULT 'pending',

206-225: Audit trigger does not capture user context.

The audit.log_changes() function logs row changes but doesn't populate user_id, ip_address, or user_agent. These fields will always be NULL. Consider using session variables or application-level auditing to capture user context:

-- Application can set these before operations:
-- SET LOCAL app.user_id = 'user123';
-- SET LOCAL app.ip_address = '192.168.1.1';

-- Then in the trigger function:
INSERT INTO audit.logs (table_name, record_id, action, new_data, user_id, ip_address)
VALUES (
    TG_TABLE_SCHEMA || '.' || TG_TABLE_NAME, 
    NEW.id, 
    'INSERT', 
    to_jsonb(NEW),
    current_setting('app.user_id', true),
    current_setting('app.ip_address', true)::inet
);
scripts/python/app.py (3)

71-76: Use logger.exception for automatic stack traces.

Replace logger.error with logger.exception in exception handlers. This applies to all similar patterns in this file (lines 72, 121, 207, 261, 331, 401, 422, 439, 460, 483).

         except Exception as e:
-            logger.error(f"PDF extraction error: {e}")
+            logger.exception("PDF extraction error")
             return {
                 "success": False,
                 "error": str(e)
             }

183-189: No validation of DATAJUD credentials before request.

If DATAJUD_USERNAME or DATAJUD_PASSWORD are not set, auth=(None, None) is passed to requests, which may cause unexpected behavior. Validate credentials before making the request:

+            if not self.username or not self.password:
+                return {
+                    "success": False,
+                    "error": "DATAJUD credentials not configured"
+                }
+
             response = requests.post(
                 endpoint,
                 json=search_query,
                 headers=headers,
                 auth=(self.username, self.password),
                 timeout=30
             )

488-495: Unused error parameter in error handlers.

Prefix with underscore to indicate intentionally unused:

 @app.errorhandler(404)
-def not_found(error):
+def not_found(_error):
     return jsonify({"error": "Endpoint not found"}), 404


 @app.errorhandler(500)
-def internal_error(error):
+def internal_error(_error):
     return jsonify({"error": "Internal server error"}), 500
migrations/models.py (1)

90-90: Inconsistent type usage for document_type.

Document.document_type uses String(50) while GeneratedDocument.document_type (line 211) uses SQLEnum(DocumentType). Consider using the enum consistently:

-    document_type = Column(String(50), nullable=False)
+    document_type = Column(
+        SQLEnum(DocumentType, name="document_type", schema="judicial"),
+        nullable=False
+    )
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2547c6d and ebe0325.

📒 Files selected for processing (28)
  • .env.template (1 hunks)
  • .github/dependabot.yml (1 hunks)
  • .github/workflows/ci.yml (1 hunks)
  • .github/workflows/deploy.yml (1 hunks)
  • .gitignore (1 hunks)
  • alembic.ini (1 hunks)
  • docs/API_DOCUMENTATION.md (1 hunks)
  • docs/api/openapi.yaml (1 hunks)
  • infrastructure/docker/Dockerfile.python (1 hunks)
  • infrastructure/nginx/nginx.conf (1 hunks)
  • migrations/README.md (1 hunks)
  • migrations/env.py (1 hunks)
  • migrations/models.py (1 hunks)
  • migrations/script.py.mako (1 hunks)
  • migrations/versions/20241204_000000_001_initial_schema.py (1 hunks)
  • pyproject.toml (1 hunks)
  • pytest.ini (1 hunks)
  • requirements.txt (1 hunks)
  • scripts/python/app.py (1 hunks)
  • scripts/sql/init_database.sql (1 hunks)
  • security/ssl_certificates/README.txt (1 hunks)
  • tests/__init__.py (1 hunks)
  • tests/conftest.py (1 hunks)
  • tests/integration/__init__.py (1 hunks)
  • tests/integration/test_api_integration.py (1 hunks)
  • tests/unit/__init__.py (1 hunks)
  • tests/unit/test_api_routes.py (1 hunks)
  • tests/unit/test_pdf_extractor.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
tests/conftest.py (1)
scripts/python/app.py (5)
  • PDFExtractor (47-76)
  • FIRACAnalyzer (79-125)
  • DatajudClient (128-211)
  • DistinguishAnalyzer (214-265)
  • DocumentGenerator (268-335)
scripts/python/app.py (1)
tests/conftest.py (5)
  • pdf_extractor (180-182)
  • firac_analyzer (186-188)
  • datajud_client (192-194)
  • distinguish_analyzer (198-200)
  • document_generator (204-206)
tests/unit/test_pdf_extractor.py (2)
tests/conftest.py (2)
  • pdf_extractor (180-182)
  • sample_pdf_content (35-70)
scripts/python/app.py (1)
  • extract_text_from_pdf (51-76)
🪛 actionlint (1.7.9)
.github/workflows/deploy.yml

152-152: the runner of "softprops/action-gh-release@v1" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)

🪛 Checkov (3.2.334)
.github/workflows/ci.yml

[medium] 163-164: Basic Auth Credentials

(CKV_SECRET_4)

🪛 dotenv-linter (4.0.0)
.env.template

[warning] 12-12: [UnorderedKey] The N8N_PASSWORD key should go before the N8N_PORT key

(UnorderedKey)


[warning] 20-20: [UnorderedKey] The POSTGRES_DB key should go before the POSTGRES_HOST key

(UnorderedKey)


[warning] 22-22: [UnorderedKey] The POSTGRES_PASSWORD key should go before the POSTGRES_PORT key

(UnorderedKey)


[warning] 29-29: [UnorderedKey] The REDIS_PASSWORD key should go before the REDIS_PORT key

(UnorderedKey)


[warning] 37-37: [UnorderedKey] The OPENAI_MAX_TOKENS key should go before the OPENAI_MODEL key

(UnorderedKey)


[warning] 43-43: [UnorderedKey] The DATAJUD_PASSWORD key should go before the DATAJUD_USERNAME key

(UnorderedKey)


[warning] 44-44: [UnorderedKey] The DATAJUD_BASE_URL key should go before the DATAJUD_PASSWORD key

(UnorderedKey)


[warning] 57-57: [UnorderedKey] The GOOGLE_LOCATION key should go before the GOOGLE_PROJECT_ID key

(UnorderedKey)


[warning] 58-58: [UnorderedKey] The GOOGLE_PROCESSOR_ID key should go before the GOOGLE_PROJECT_ID key

(UnorderedKey)


[warning] 66-66: [UnorderedKey] The SMTP_PASSWORD key should go before the SMTP_PORT key

(UnorderedKey)


[warning] 67-67: [UnorderedKey] The SMTP_FROM_EMAIL key should go before the SMTP_HOST key

(UnorderedKey)


[warning] 68-68: [UnorderedKey] The SMTP_FROM_NAME key should go before the SMTP_HOST key

(UnorderedKey)


[warning] 68-68: [ValueWithoutQuotes] This value needs to be surrounded in quotes

(ValueWithoutQuotes)


[warning] 74-74: [UnorderedKey] The GRAFANA_PASSWORD key should go before the GRAFANA_USER key

(UnorderedKey)


[warning] 75-75: [UnorderedKey] The GF_SERVER_ROOT_URL key should go before the GRAFANA_PASSWORD key

(UnorderedKey)


[warning] 81-81: [UnorderedKey] The ENCRYPTION_KEY key should go before the JWT_SECRET key

(UnorderedKey)


[warning] 87-87: [UnorderedKey] The FLASK_DEBUG key should go before the FLASK_ENV key

(UnorderedKey)

🪛 GitHub Actions: CI Pipeline
tests/integration/test_api_integration.py

[error] 1-1: Black formatting check failed. Run 'black' to format code in this file.

tests/conftest.py

[error] 1-1: Black formatting check failed. Run 'black' to format code in this file.

scripts/python/app.py

[error] 1-1: Black formatting check failed. Run 'black' to format code in this file.

tests/unit/test_api_routes.py

[error] 1-1: Black formatting check failed. Run 'black' to format code in this file.

tests/unit/test_pdf_extractor.py

[error] 1-1: Black formatting check failed. Run 'black' to format code in this file.

🪛 markdownlint-cli2 (0.18.1)
docs/API_DOCUMENTATION.md

19-19: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


161-161: Tables should be surrounded by blank lines

(MD058, blanks-around-tables)


301-301: Tables should be surrounded by blank lines

(MD058, blanks-around-tables)


388-388: Bare URL used

(MD034, no-bare-urls)


389-389: Bare URL used

(MD034, no-bare-urls)

🪛 OSV Scanner (2.3.0)
requirements.txt

[HIGH] 7-7: werkzeug 3.0.1: Werkzeug debugger vulnerable to remote execution when interacting with attacker controlled domain

(GHSA-2g68-c3qc-8985)


[HIGH] 7-7: werkzeug 3.0.1: Werkzeug safe_join not safe on Windows

(GHSA-f9vj-2wh5-fj8j)


[HIGH] 7-7: werkzeug 3.0.1: Werkzeug safe_join() allows Windows special device names

(GHSA-hgf8-39gv-g3f2)


[HIGH] 7-7: werkzeug 3.0.1: Werkzeug possible resource exhaustion when parsing file data in forms

(GHSA-q34m-jh98-gwm2)


[HIGH] 33-33: aiohttp 3.9.1: undefined

(PYSEC-2024-24)


[HIGH] 33-33: aiohttp 3.9.1: undefined

(PYSEC-2024-26)


[HIGH] 33-33: aiohttp 3.9.1: aiohttp is vulnerable to directory traversal

(GHSA-5h86-8mv2-jq9f)


[HIGH] 33-33: aiohttp 3.9.1: aiohttp vulnerable to Denial of Service when trying to parse malformed POST requests

(GHSA-5m98-qgg9-wh84)


[HIGH] 33-33: aiohttp 3.9.1: aiohttp Cross-site Scripting vulnerability on index pages for static file handling

(GHSA-7gpw-8wmc-pm8g)


[HIGH] 33-33: aiohttp 3.9.1: aiohttp allows request smuggling due to incorrect parsing of chunk extensions

(GHSA-8495-4g3g-x7pr)


[HIGH] 33-33: aiohttp 3.9.1: aiohttp's HTTP parser (the python one, not llhttp) still overly lenient about separators

(GHSA-8qpw-xqxj-h4r2)


[HIGH] 33-33: aiohttp 3.9.1: AIOHTTP is vulnerable to HTTP Request/Response Smuggling through incorrect parsing of chunked trailer sections

(GHSA-9548-qrrj-x5pj)


[HIGH] 42-42: cryptography 41.0.7: undefined

(PYSEC-2024-225)


[HIGH] 42-42: cryptography 41.0.7: Python Cryptography package vulnerable to Bleichenbacher timing oracle attack

(GHSA-3ww4-gg4f-jr7f)


[HIGH] 42-42: cryptography 41.0.7: cryptography NULL pointer dereference with pkcs12.serialize_key_and_certificates when called with a non-matching certificate and private key and an hmac_hash override

(GHSA-6vqw-3v5j-54x4)


[HIGH] 42-42: cryptography 41.0.7: Null pointer dereference in PKCS12 parsing

(GHSA-9v9h-cgj8-h64p)


[HIGH] 42-42: cryptography 41.0.7: pyca/cryptography has a vulnerable OpenSSL included in cryptography wheels

(GHSA-h4gh-qq45-vh27)

🪛 Ruff (0.14.7)
tests/integration/test_api_integration.py

209-209: Unused method argument: mock_openai

(ARG002)


244-244: Unused method argument: sample_pdf_base64

(ARG002)

scripts/python/app.py

1-1: Shebang is present but file is not executable

(EXE001)


42-42: Do not catch blind exception: Exception

(BLE001)


71-71: Do not catch blind exception: Exception

(BLE001)


72-72: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


120-120: Do not catch blind exception: Exception

(BLE001)


121-121: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


206-206: Do not catch blind exception: Exception

(BLE001)


207-207: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


260-260: Do not catch blind exception: Exception

(BLE001)


261-261: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


330-330: Do not catch blind exception: Exception

(BLE001)


331-331: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


356-356: Do not catch blind exception: Exception

(BLE001)


385-385: Do not catch blind exception: Exception

(BLE001)


395-395: Do not catch blind exception: Exception

(BLE001)


400-400: Do not catch blind exception: Exception

(BLE001)


401-401: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


421-421: Do not catch blind exception: Exception

(BLE001)


422-422: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


438-438: Do not catch blind exception: Exception

(BLE001)


439-439: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


459-459: Do not catch blind exception: Exception

(BLE001)


460-460: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


482-482: Do not catch blind exception: Exception

(BLE001)


483-483: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


489-489: Unused function argument: error

(ARG001)


494-494: Unused function argument: error

(ARG001)


504-504: Possible binding to all interfaces

(S104)

tests/unit/test_api_routes.py

139-139: Local variable response is assigned to but never used

Remove assignment to unused variable response

(F841)


183-183: Local variable response is assigned to but never used

Remove assignment to unused variable response

(F841)

🪛 YAMLlint (1.37.1)
docs/api/openapi.yaml

[error] 158-158: syntax error: mapping values are not allowed here

(syntax)

🔇 Additional comments (32)
tests/__init__.py (1)

1-1: LGTM!

Simple and appropriate package initializer.

tests/integration/__init__.py (1)

1-1: LGTM!

Simple and appropriate package initializer.

.github/dependabot.yml (1)

1-58: LGTM!

Well-structured Dependabot configuration with appropriate cadence, PR limits, and ecosystem coverage. The timezone-aware schedule and semantic commit prefixes are good practices.

.gitignore (1)

1-142: LGTM!

Comprehensive and well-organized .gitignore covering Python, Docker, secrets, IDEs, and system artifacts. Clear section headers and strategic comments (e.g., preserving docker-compose.yml) demonstrate thoughtful design.

pytest.ini (1)

1-43: LGTM!

Well-configured pytest setup with appropriate markers, discovery patterns, and logging. Strict marker enforcement and informative output configuration are good practices.

pyproject.toml (2)

1-162: LGTM: Project configuration is well-structured.

Comprehensive configuration with appropriate separation of build system, metadata, dependencies (core vs dev vs monitoring), tool configurations, and packaging setup. The use of version ranges (>=) here with exact pinning in requirements.txt is a solid pattern for balancing flexibility and reproducibility.


89-90: Entry point is correctly configured and callable.

Verification confirms that scripts/python/app.py exists, defines a Flask application object (app) at line 30, and the app.run() method is properly inherited from Flask and called at line 504. The entry point judicial-api = "scripts.python.app:app.run" is valid and functional.

docs/api/openapi.yaml (1)

1-505: Well-structured OpenAPI specification with comprehensive coverage.

The specification provides clear documentation for all endpoints with appropriate request/response schemas, error handling, and examples. The JWT authentication scheme is properly configured and the schema definitions are thorough.

tests/unit/__init__.py (1)

1-1: LGTM!

Standard Python package initializer.

docs/API_DOCUMENTATION.md (1)

1-389: Excellent comprehensive API documentation.

The documentation thoroughly covers all endpoints with clear examples, authentication details, error handling, rate limiting, and LGPD compliance information. The SDK examples in Python and JavaScript are particularly helpful.

migrations/README.md (1)

1-116: Well-structured migration documentation.

The README provides comprehensive coverage of Alembic workflow, including setup, common commands, schema overview, best practices, and troubleshooting. The examples are clear and practical.

alembic.ini (1)

31-38: Excellent use of post-write hooks for code formatting.

The Black formatting hook ensures migration files are automatically formatted after generation, maintaining code consistency.

infrastructure/docker/Dockerfile.python (1)

1-68: Excellent multi-stage Dockerfile with security best practices.

The Dockerfile follows best practices:

  • Multi-stage build minimizes final image size
  • Non-root user for security
  • Minimal base image (python:3.11-slim)
  • Proper layer caching with requirements first
  • Comprehensive health check configuration
  • Clean apt cache to reduce size

tests/unit/test_pdf_extractor.py (1)

1-102: Comprehensive test coverage for PDF extraction.

The test suite thoroughly covers:

  • Success cases with metadata validation
  • Error handling for invalid and empty content
  • Edge cases like corrupted PDFs and empty pages
  • Multi-page PDF processing with proper mocking

The use of pytest markers and fixtures demonstrates good test organization.

security/ssl_certificates/README.txt (1)

1-1: SSL certificates are properly excluded from version control.

The .gitignore file includes the *.pem pattern (line 12), ensuring certificate files are not tracked by git. No certificate files are currently tracked in the repository, confirming proper security configuration.

migrations/versions/20241204_000000_001_initial_schema.py (1)

21-291: LGTM! Well-structured database migration.

The migration is comprehensive and follows best practices:

  • Proper schema organization (judicial/audit separation)
  • Correct use of DO blocks for idempotent enum creation
  • Appropriate indexes on foreign keys and query columns
  • Cascade deletes where appropriate
  • Complete downgrade path

infrastructure/nginx/nginx.conf (2)

100-118: LGTM! Strong SSL/TLS configuration.

The SSL configuration follows modern best practices:

  • TLSv1.2 and TLSv1.3 only
  • Strong cipher suites with forward secrecy
  • HSTS with appropriate max-age
  • Session tickets disabled for enhanced privacy

62-64: Verify rate limits match expected traffic patterns.

The rate limits are set to 10 req/s for API and 5 req/s for webhooks. Ensure these values align with your expected legitimate traffic patterns to avoid false positives.

Consider monitoring actual traffic patterns in staging to validate these thresholds before production deployment.

migrations/env.py (2)

32-46: LGTM! Clean database URL construction.

The function properly handles both full DATABASE_URL/POSTGRES_URL and individual component environment variables with sensible defaults.
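As a reference point, a minimal sketch of such a fallback chain (the variable names and defaults here are illustrative and may not match migrations/env.py exactly):

```python
import os


def build_database_url(env=os.environ):
    """Return a database URL, preferring a full URL over individual components."""
    # Prefer an explicit full URL if one is set.
    url = env.get("DATABASE_URL") or env.get("POSTGRES_URL")
    if url:
        return url
    # Otherwise assemble one from components with sensible defaults.
    user = env.get("POSTGRES_USER", "postgres")
    password = env.get("POSTGRES_PASSWORD", "postgres")
    host = env.get("POSTGRES_HOST", "localhost")
    port = env.get("POSTGRES_PORT", "5432")
    db = env.get("POSTGRES_DB", "judicial")
    return f"postgresql://{user}:{password}@{host}:{port}/{db}"
```

Passing the environment mapping as an argument, as above, also keeps the helper trivially unit-testable.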


82-86: Good use of NullPool for migrations.

Using NullPool is the correct choice for migrations as it prevents connection pooling issues during schema changes. This follows Alembic best practices.

migrations/script.py.mako (1)

1-28: LGTM! Clean migration template.

The template follows Alembic best practices with proper type annotations and clear structure. The use of typed variables and the pass defaults for upgrade/downgrade functions are appropriate.

.github/workflows/ci.yml (1)

1-284: LGTM! Comprehensive CI pipeline.

The CI pipeline is well-structured with:

  • Proper job dependencies and parallelization
  • Service containers for integration tests
  • Multiple quality gates (lint, tests, coverage, security)
  • Appropriate caching strategies
  • Final success job to gate the pipeline

The credential warning from Checkov at lines 163-164 is a false positive—these are test database credentials used in CI service containers, which is standard practice.

tests/conftest.py (3)

153-166: LGTM on mock fixtures.

The mock_openai_response and mock_redis fixtures are well-structured and provide good isolation for unit tests.


209-215: LGTM on pytest markers configuration.

Custom markers are properly registered for test classification (unit, integration, slow, requires_api).
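For readers unfamiliar with strict-marker mode, registration of this kind is typically a pytest_configure hook along these lines (a sketch; the exact descriptions in tests/conftest.py may differ):

```python
def pytest_configure(config):
    """Register custom markers so pytest's --strict-markers does not reject them."""
    for marker, description in [
        ("unit", "fast, isolated unit tests"),
        ("integration", "tests that exercise multiple components together"),
        ("slow", "tests that take a long time to run"),
        ("requires_api", "tests that need live external API credentials"),
    ]:
        # Each call adds one line to the effective [markers] ini section.
        config.addinivalue_line("markers", f"{marker}: {description}")
```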


34-70: Verify the hardcoded PDF byte offsets are accurate.

The xref table contains hardcoded byte offsets (9, 58, 115, 214 for objects 1-4, and 306 for startxref). These must match the actual byte positions in the PDF stream or parsers like PyPDF2 will fail. Calculate the exact byte positions of each object marker (1 0 obj, 2 0 obj, etc.) to confirm the offsets are correct before relying on this fixture.
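A quick way to perform that check is to locate each `N 0 obj` marker in the raw bytes and compare against the xref entries; a sketch on a toy buffer (the real fixture's bytes, and therefore its offsets, will differ):

```python
import re


def object_offsets(pdf_bytes):
    """Map each object number to the byte offset of its 'N 0 obj' marker."""
    return {
        int(m.group(1)): m.start()
        for m in re.finditer(rb"(\d+) 0 obj", pdf_bytes)
    }


# Toy PDF-like body; the offsets computed here apply to this buffer only.
body = b"%PDF-1.4\n1 0 obj\n<<>>\nendobj\n2 0 obj\n<<>>\nendobj\n"
offsets = object_offsets(body)
```

Comparing the returned mapping against the fixture's hardcoded xref values (9, 58, 115, 214) would confirm or refute them without hand-counting bytes.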

scripts/sql/init_database.sql (3)

4-13: LGTM on schema setup.

Good separation of concerns with judicial and audit schemas. Extensions uuid-ossp and pg_trgm are appropriate for the use case.


59-62: LGTM on full-text search configuration.

The GIN index with Portuguese dictionary for extracted_text is appropriate for a Brazilian legal document system.


291-298: LGTM on seed data.

Using ON CONFLICT (code) DO NOTHING makes this idempotent for repeated runs.
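That idempotency is easy to demonstrate; SQLite 3.24+ happens to accept the same ON CONFLICT ... DO NOTHING clause, so a self-contained sketch (the table and rows are illustrative, not the actual seed data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE document_types (code TEXT PRIMARY KEY, label TEXT)")

seed = (
    "INSERT INTO document_types (code, label) VALUES (?, ?) "
    "ON CONFLICT(code) DO NOTHING"
)
rows = [("peticao", "Peticao Inicial"), ("sentenca", "Sentenca")]

# Running the seed twice leaves exactly one row per code.
for _ in range(2):
    conn.executemany(seed, rows)

count = conn.execute("SELECT COUNT(*) FROM document_types").fetchone()[0]
```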

scripts/python/app.py (1)

498-504: Binding to all interfaces is acceptable for containerized deployment.

The Ruff S104 warning about binding to 0.0.0.0 is a false positive for Docker deployments where this is necessary. The nginx reverse proxy provides the security boundary.
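If a team still wants the linter satisfied, one common pattern is to default to loopback and widen the bind address only on explicit opt-in (a sketch; the environment variable name is hypothetical):

```python
import os


def resolve_bind_host(env=os.environ):
    """Bind to all interfaces only when the deployment opts in (e.g. inside Docker)."""
    if env.get("BIND_ALL_INTERFACES", "").lower() in ("1", "true", "yes"):
        return "0.0.0.0"  # noqa: S104 - intentional, nginx is the security boundary
    return "127.0.0.1"
```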

migrations/models.py (3)

23-39: LGTM on enum definitions.

CaseStatus and DocumentType enums are well-defined and consistent with database reference data.


42-77: LGTM on Case model.

Well-structured with proper indexing, schema qualification, and cascade behaviors for child relationships.


256-276: LGTM on AuditLog model.

Properly isolated in the audit schema with appropriate PostgreSQL-specific types (INET, JSONB).

Comment on lines +150 to +156
      - name: Create GitHub Release
        if: startsWith(github.ref, 'refs/tags/v')
        uses: softprops/action-gh-release@v1
        with:
          generate_release_notes: true
          draft: false
          prerelease: ${{ contains(github.ref, '-rc') || contains(github.ref, '-beta') }}

⚠️ Potential issue | 🟠 Major

Update GitHub Action to a newer version.

The softprops/action-gh-release@v1 action uses an outdated runner. Update to v2 to ensure compatibility with current GitHub Actions infrastructure.

       - name: Create GitHub Release
         if: startsWith(github.ref, 'refs/tags/v')
-        uses: softprops/action-gh-release@v1
+        uses: softprops/action-gh-release@v2
         with:
           generate_release_notes: true
           draft: false
           prerelease: ${{ contains(github.ref, '-rc') || contains(github.ref, '-beta') }}
🧰 Tools
🪛 actionlint (1.7.9)

152-152: the runner of "softprops/action-gh-release@v1" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)

🤖 Prompt for AI Agents
.github/workflows/deploy.yml around lines 150 to 156: the workflow uses
softprops/action-gh-release@v1 which relies on an outdated runner; update the
action reference to softprops/action-gh-release@v2 (or a specific v2.x tag) in
the workflow, keep the same inputs (generate_release_notes, draft, prerelease)
and verify the action's v2 docs for any input/name changes before committing.

Comment on lines +167 to +176
  rollback:
    name: Rollback Deployment
    runs-on: ubuntu-latest
    if: github.event_name == 'workflow_dispatch' && failure()
    needs: [deploy-staging, deploy-production]
    steps:
      - name: Rollback to previous version
        run: |
          echo "Rolling back deployment..."
          # ssh ${{ secrets.PROD_HOST }} "cd /app && docker compose down && git checkout HEAD~1 && docker compose up -d"

⚠️ Potential issue | 🟠 Major

Fix rollback trigger logic.

The rollback job combines failure() with a workflow_dispatch check, but at the job level failure() only reflects whether a job listed in needs failed within the same run; it cannot refer to a previously failed deployment run. As a result, the condition github.event_name == 'workflow_dispatch' && failure() will not trigger for the intended manual-rollback scenario. Consider using an explicit workflow_dispatch input for rollback decisions.

   rollback:
     name: Rollback Deployment
     runs-on: ubuntu-latest
-    if: github.event_name == 'workflow_dispatch' && failure()
-    needs: [deploy-staging, deploy-production]
+    if: github.event_name == 'workflow_dispatch' && github.event.inputs.action == 'rollback'
+    
     steps:
       - name: Rollback to previous version
         run: |

Also add the rollback input to workflow_dispatch:

   workflow_dispatch:
     inputs:
       environment:
         description: 'Deployment environment'
         required: true
         default: 'staging'
         type: choice
         options:
           - staging
           - production
+      action:
+        description: 'Deployment action'
+        required: true
+        default: 'deploy'
+        type: choice
+        options:
+          - deploy
+          - rollback

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
.github/workflows/deploy.yml lines 167-176: the rollback job condition uses
failure() which cannot evaluate before the job runs, so it will never trigger;
change the workflow to accept a workflow_dispatch input (e.g., rollback: boolean
or string) and update the rollback job's if to check github.event_name ==
'workflow_dispatch' && github.event.inputs.rollback == 'true' (or appropriate
value), and update the workflow_dispatch section to declare that rollback input
so the job can be manually invoked with rollback=true; ensure any existing needs
remain correct for manual rollback.

Comment thread alembic.ini Outdated
Comment thread migrations/models.py
Comment on lines +7 to +18
from datetime import datetime
from typing import Optional, List
from sqlalchemy import (
    Column, String, Integer, Text, Boolean, DateTime, Float,
    ForeignKey, JSON, ARRAY, BigInteger, Index, UniqueConstraint,
    Numeric, Enum as SQLEnum
)
from sqlalchemy.dialects.postgresql import UUID, INET, JSONB
from sqlalchemy.orm import declarative_base, relationship
from sqlalchemy.sql import func
import uuid
import enum

🛠️ Refactor suggestion | 🟠 Major

Remove unused imports.

Optional, List, Float, and UniqueConstraint are imported but never used.

 from datetime import datetime
-from typing import Optional, List
 from sqlalchemy import (
-    Column, String, Integer, Text, Boolean, DateTime, Float,
-    ForeignKey, JSON, ARRAY, BigInteger, Index, UniqueConstraint,
+    Column, String, Integer, Text, Boolean, DateTime,
+    ForeignKey, JSON, ARRAY, BigInteger, Index,
     Numeric, Enum as SQLEnum
 )
🤖 Prompt for AI Agents
migrations/models.py lines 7-18: the file currently imports Optional, List,
Float and UniqueConstraint but those names are not used anywhere; remove these
four imports from the import lists (delete Optional and List from the typing
import, delete Float and UniqueConstraint from the sqlalchemy import list) and
run the linter/type-checker to confirm no other unused imports remain.

Comment thread pyproject.toml
Comment on lines +1 to +73
[build-system]
requires = ["setuptools>=68.0", "wheel"]
build-backend = "setuptools.build_meta"

[project]
name = "judicial-automation-system"
version = "1.0.0"
description = "Sistema de Automacao Juridica - Brazilian Judicial Process Automation Platform"
readme = "README.md"
license = {text = "MIT"}
requires-python = ">=3.10"
authors = [
    {name = "Judicial Automation Team"}
]
keywords = [
    "legal",
    "automation",
    "judiciary",
    "n8n",
    "flask",
    "ai",
    "brazilian-law"
]
classifiers = [
    "Development Status :: 4 - Beta",
    "Intended Audience :: Legal Industry",
    "License :: OSI Approved :: MIT License",
    "Operating System :: OS Independent",
    "Programming Language :: Python :: 3",
    "Programming Language :: Python :: 3.10",
    "Programming Language :: Python :: 3.11",
    "Programming Language :: Python :: 3.12",
    "Topic :: Office/Business",
    "Topic :: Scientific/Engineering :: Artificial Intelligence",
]

dependencies = [
    "Flask>=3.0.0",
    "Flask-CORS>=4.0.0",
    "Werkzeug>=3.0.1",
    "redis>=5.0.1",
    "psycopg2-binary>=2.9.9",
    "SQLAlchemy>=2.0.23",
    "openai>=1.6.1",
    "tiktoken>=0.5.2",
    "google-cloud-documentai>=2.22.0",
    "google-auth>=2.25.2",
    "google-api-python-client>=2.111.0",
    "PyPDF2>=3.0.1",
    "python-docx>=1.1.0",
    "pdfplumber>=0.10.3",
    "requests>=2.31.0",
    "pydantic>=2.5.3",
    "PyJWT>=2.8.0",
    "python-dotenv>=1.0.0",
    "prometheus-client>=0.19.0",
    "structlog>=23.2.0",
    "flasgger>=0.9.7.1",
    "tenacity>=8.2.3",
]

[project.optional-dependencies]
dev = [
    "pytest>=7.4.3",
    "pytest-cov>=4.1.0",
    "pytest-asyncio>=0.23.2",
    "pytest-mock>=3.12.0",
    "httpx>=0.26.0",
    "factory-boy>=3.3.0",
    "faker>=21.0.0",
    "black>=23.12.1",
    "isort>=5.13.2",
    "flake8>=6.1.0",

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# Check if aiohttp and cryptography are direct or transitive dependencies
cd /repo && git ls-files | head -20

Repository: fbmoulin/2acivelagent

Length of output: 120


🏁 Script executed:

# Search for aiohttp and cryptography in pyproject.toml and related files
rg -i "aiohttp|cryptography" --type toml

Repository: fbmoulin/2acivelagent

Length of output: 47


🏁 Script executed:

# Look for any requirements files or dependency specs
fd -e txt -e in | grep -i req

Repository: fbmoulin/2acivelagent

Length of output: 81


🏁 Script executed:

# Check imports in the codebase to see if aiohttp/cryptography are actually used
rg "import aiohttp|from aiohttp|import cryptography|from cryptography" --type py

Repository: fbmoulin/2acivelagent

Length of output: 47


🏁 Script executed:

cat requirements.txt

Repository: fbmoulin/2acivelagent

Length of output: 1219


🏁 Script executed:

# Check if aiohttp or cryptography appear anywhere in the repo
rg -i "aiohttp|cryptography" --type py --type txt --type toml --type md

Repository: fbmoulin/2acivelagent

Length of output: 134


🏁 Script executed:

# List all files in repo to understand structure
git ls-files | head -30

Repository: fbmoulin/2acivelagent

Length of output: 691


🌐 Web query:

Werkzeug 3.0.1 vulnerabilities CVE remote code execution safe_join

💡 Result:

Short answer — yes: Werkzeug 3.0.1 is affected by multiple published vulnerabilities.

  • CVE-2024-34069 — Remote Code Execution: debugger hostname/relative-path issue (fixed in werkzeug >= 3.0.3). If you run the debugger (not recommended in production) this is exploitable under specific conditions. [1]
  • CVE-2024-49766 — safe_join / UNC path bypass (Windows + Python < 3.11): safe_join could produce unsafe paths (fixed in werkzeug 3.0.6). Affects Windows deployments using older Python. [2]
  • CVE-2025-66221 — safe_join allows Windows device names (CON/AUX) causing hangs/DoS (fixed in werkzeug 3.1.4); note this is a separate Windows-specific issue affecting versions < 3.1.4. [3]

Recommended action: upgrade to a fixed release (at minimum >= 3.0.6 to address safe_join UNC and >= 3.0.3 for the debugger RCE; consider upgrading to the latest 3.x that includes the 3.1.4 fix if you run on Windows). References: [1] Snyk / CVE-2024-34069; [2] NVD / CVE-2024-49766; [3] Ubuntu / CVE-2025-66221.


🌐 Web query:

aiohttp 3.9.1 vulnerabilities CVE directory traversal request smuggling

💡 Result:

Short answer: yes — aiohttp 3.9.1 is affected by request‑smuggling and a static-file path‑traversal issue (and other related flaws). Upgrade is recommended.

Details and patches:

  • Request smuggling / CRLF/HTTP parsing issues: multiple advisories (e.g. CVE-2023-49082 and follow‑ups) note lenient HTTP parsing that can enable request‑smuggling; fixes were released in later 3.9.x releases (upgrade to ≥ 3.9.2) or newer maintenance branches. [1][2][3]
  • Path‑traversal in static routes: static() with follow_symlinks=True can allow directory traversal — fixed in ≥ 3.9.2. [2]
  • Other related parsing fixes were later addressed in other releases (e.g. 3.10.11, 3.12.14 for trailer/newline parsing). Review which runtime (pure‑Python vs with C extensions) you use — some issues only apply when extensions are disabled. [4][5]

Recommended action: upgrade aiohttp to a patched release (at minimum ≥ 3.9.2 for the issues above; consider a current supported release such as latest 3.12.x or 4.x) and test with your deployment (note pure‑Python builds/AIOHTTP_NO_EXTENSIONS can affect exposure). [2][4][5]

Sources:
[1] NVD / CVE-2023-49082 / advisories.
[2] Snyk package advisory for aiohttp 3.9.1 (request smuggling, path traversal, XSS notes).
[3] Debian / security tracker (notes and fixes).
[4] NVD / CVE-2024-52304 (newline/chunk extension parsing).
[5] NVD / CVE-2025-53643 (trailer parsing) .

If you want, I can (a) list the exact commit/advisory links and fixed versions, or (b) give a one‑line pip/requirements change you can apply.


🌐 Web query:

cryptography 41.0.7 vulnerabilities CVE PKCS7 NULL pointer dereference

💡 Result:

Short answer: This was a NULL-pointer dereference in cryptography's PKCS7 parsing (CVE-2023-49083). It affected cryptography >= 3.1 and was patched in 41.0.6 — so 41.0.7 is not vulnerable. [1][2][3]

References:

  • GitHub security advisory (NULL-dereference when loading PKCS7 certificates). [1]
  • NVD entry for CVE-2023-49083. [2]
  • OSS‑Security / release notes linking the patch (PR #9926). [3]

🌐 Web query:

cryptography 41.0.7 known vulnerabilities CVE security issues

💡 Result:

Short answer — yes. Known/security-relevant issues affecting cryptography 41.0.7:

  • CVE-2024-26130 — NULL-pointer dereference / crash (PKCS#12 mismatch) that can lead to denial-of-service; reported in vendor advisories and affects builds shipped as 41.0.7 in some distributions. [1][2]
  • CVE-2023-49083 — NULL-dereference when loading PKCS#7 (causes segfault/DoS); patched in 41.0.6 but listed in advisories for nearby versions. Verify your build. [3]
  • CVE-2020-25659 (Observable timing discrepancy) — incomplete fixes mean versions < 42.0.0 (including 41.0.7) are flagged by scanners as vulnerable; upgrade to ≥42.0.0 to remediate. [4]

References:

  1. Ubuntu USN / package advisory for python-cryptography (USN-6673-3). [2]
  2. IBM security bulletin referencing CVE-2024-26130. [1]
  3. CVE entry / vulnerability summary for CVE-2023-49083. [3]
  4. Snyk / package vulnerability listing noting observable-timing and other issues for <42.0.0. [4]

If you want, I can (a) fetch the exact CVSS scores and affected version ranges for each CVE, or (b) check whether your installed package/build includes the vulnerable code — tell me your platform and how cryptography was installed.


Critical: Resolve high-severity dependency vulnerabilities before release.

File: requirements.txt

Multiple HIGH severity vulnerabilities require immediate patching:

Werkzeug 3.0.1 (3 CVEs):

  • CVE-2024-34069: remote code execution via the debugger under specific conditions (fixed in 3.0.3)
  • CVE-2024-49766: safe_join UNC path bypass on Windows with Python < 3.11 (fixed in 3.0.6)
  • CVE-2025-66221: safe_join accepts Windows device names (CON/AUX), causing hangs/DoS (fixed in 3.1.4)

Recommendation: Update to Werkzeug 3.0.6 or later (3.1.4+ for Windows deployments)

aiohttp 3.9.1 (Multiple CVEs):

  • CVE-2023-49082 and follow-ups: HTTP request smuggling via lenient CRLF/chunk parsing
  • Path traversal vulnerability in static file serving with follow_symlinks=True
  • Additional parsing issues in trailer and newline handling

Recommendation: Update to aiohttp 3.9.2 minimum; prefer 3.10.11 or 3.12.14 for comprehensive fixes

cryptography 41.0.7 (3 CVEs):

  • CVE-2024-26130: NULL-pointer dereference in PKCS#12 causing DoS
  • CVE-2023-49083: NULL-dereference in PKCS#7 certificate deserialization
  • CVE-2020-25659: Observable timing discrepancy unfixed in versions <42.0.0

Recommendation: Update to cryptography 42.0.0 or later

These are blocking issues for production release.

Comment thread scripts/python/app.py Outdated
Comment on lines +300 to +305
-- Grant permissions
GRANT USAGE ON SCHEMA judicial TO n8n_user;
GRANT USAGE ON SCHEMA audit TO n8n_user;
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA judicial TO n8n_user;
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA audit TO n8n_user;
GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA judicial TO n8n_user;

⚠️ Potential issue | 🟠 Major

Script assumes n8n_user exists and grants excessive privileges.

Two issues:

  1. The script will fail if n8n_user doesn't exist. Add a conditional user creation.
  2. ALL PRIVILEGES violates least privilege principle. Grant only what's needed.
-- Create user if not exists (PostgreSQL 9.0+)
DO $$
BEGIN
    IF NOT EXISTS (SELECT FROM pg_catalog.pg_roles WHERE rolname = 'n8n_user') THEN
        CREATE ROLE n8n_user WITH LOGIN PASSWORD 'change_me_in_production';
    END IF;
END
$$;

-- Grant minimal required privileges
GRANT USAGE ON SCHEMA judicial TO n8n_user;
GRANT USAGE ON SCHEMA audit TO n8n_user;
GRANT SELECT, INSERT, UPDATE ON ALL TABLES IN SCHEMA judicial TO n8n_user;
GRANT SELECT, INSERT ON ALL TABLES IN SCHEMA audit TO n8n_user;
GRANT USAGE, SELECT ON ALL SEQUENCES IN SCHEMA judicial TO n8n_user;
🤖 Prompt for AI Agents
In scripts/sql/init_database.sql around lines 300 to 305, the script assumes
n8n_user exists and uses ALL PRIVILEGES which is too broad; modify the script to
first create the role only if it does not exist (use a DO block checking
pg_catalog.pg_roles and create role with LOGIN and a placeholder password to be
changed in production), then replace ALL PRIVILEGES grants with least-privilege
grants: keep GRANT USAGE ON SCHEMA for judicial and audit, grant only
SELECT/INSERT/UPDATE (or the specific subset your app requires) on tables in
judicial and the narrower set needed for audit, and grant USAGE and SELECT on
sequences in judicial instead of ALL PRIVILEGES on sequences.

Comment on lines +1 to +261
# Sistema de Automacao Juridica - Integration Tests

import pytest
import json
from unittest.mock import Mock, patch, MagicMock


class TestFullWorkflowIntegration:
"""Integration tests for complete workflow"""

@pytest.mark.integration
@patch('scripts.python.app.openai.ChatCompletion.create')
def test_pdf_to_firac_workflow(self, mock_openai, client, sample_pdf_base64, mock_openai_response):
"""Test workflow from PDF extraction to FIRAC analysis"""
# Step 1: Extract PDF
extract_response = client.post('/extract-pdf',
data=json.dumps({'pdf_content': sample_pdf_base64}),
content_type='application/json')

assert extract_response.status_code == 200
extract_data = json.loads(extract_response.data)

# Step 2: If extraction successful and has text, perform FIRAC analysis
if extract_data.get('success') and extract_data.get('text'):
text = extract_data['text']

# Ensure text is long enough for analysis
if len(text) < 50:
text = text + " " * 50 # Pad for test

mock_openai.return_value = mock_openai_response

firac_response = client.post('/firac-analysis',
data=json.dumps({'text': text}),
content_type='application/json')

# Analysis should succeed with mocked OpenAI
if mock_openai.called:
assert firac_response.status_code == 200

@pytest.mark.integration
@patch('scripts.python.app.openai.ChatCompletion.create')
@patch('requests.post')
def test_firac_to_jurisprudence_workflow(self, mock_requests, mock_openai, client,
sample_legal_text, mock_openai_response,
sample_datajud_response):
"""Test workflow from FIRAC analysis to jurisprudence search"""
mock_openai.return_value = mock_openai_response

# Step 1: FIRAC Analysis
firac_response = client.post('/firac-analysis',
data=json.dumps({'text': sample_legal_text}),
content_type='application/json')

assert firac_response.status_code == 200
firac_data = json.loads(firac_response.data)
assert firac_data['success'] is True

# Step 2: Search jurisprudence based on analysis
mock_api_response = Mock()
mock_api_response.status_code = 200
mock_api_response.json.return_value = sample_datajud_response
mock_requests.return_value = mock_api_response

search_response = client.post('/datajud-search',
data=json.dumps({
'tribunal': 'tjsp',
'texto_livre': 'cobranca contrato'
}),
content_type='application/json')

assert search_response.status_code == 200
search_data = json.loads(search_response.data)
assert search_data['success'] is True

@pytest.mark.integration
@patch('scripts.python.app.openai.ChatCompletion.create')
@patch('requests.post')
def test_complete_analysis_workflow(self, mock_requests, mock_openai, client,
sample_legal_text, sample_case_data,
mock_openai_response, sample_datajud_response):
"""Test complete workflow: FIRAC -> Search -> Distinguish -> Generate"""
# Mock OpenAI responses
mock_openai.return_value = mock_openai_response

# Mock DATAJUD response
mock_api_response = Mock()
mock_api_response.status_code = 200
mock_api_response.json.return_value = sample_datajud_response
mock_requests.return_value = mock_api_response

# Step 1: FIRAC Analysis
firac_response = client.post('/firac-analysis',
data=json.dumps({'text': sample_legal_text}),
content_type='application/json')
assert firac_response.status_code == 200

# Step 2: Jurisprudence Search
search_response = client.post('/datajud-search',
data=json.dumps({
'tribunal': 'tjsp',
'texto_livre': 'cobranca'
}),
content_type='application/json')
assert search_response.status_code == 200
search_data = json.loads(search_response.data)

# Step 3: Distinguish Analysis (if results found)
if search_data.get('total_results', 0) > 0:
precedent = sample_datajud_response['hits']['hits'][0]['_source']
distinguish_response = client.post('/distinguish-analysis',
data=json.dumps({
'current_facts': sample_legal_text,
'precedent_data': precedent
}),
content_type='application/json')
assert distinguish_response.status_code == 200

# Step 4: Generate Document
generate_response = client.post('/generate-document',
data=json.dumps({
'document_type': 'sentenca',
'case_data': sample_case_data
}),
content_type='application/json')
assert generate_response.status_code == 200
generate_data = json.loads(generate_response.data)
assert generate_data['success'] is True
assert 'generated_text' in generate_data


class TestServiceAvailability:
"""Tests for service availability and degradation"""

@pytest.mark.integration
def test_api_available_without_redis(self, client):
"""Test that API works even if Redis is unavailable"""
# Health check should still work
response = client.get('/health')
assert response.status_code == 200

data = json.loads(response.data)
assert data['status'] == 'healthy'
# Redis might be false but API should work
assert 'services' in data

@pytest.mark.integration
@patch('scripts.python.app.openai.ChatCompletion.create')
def test_api_handles_openai_failure(self, mock_openai, client, sample_legal_text):
"""Test graceful handling of OpenAI API failure"""
mock_openai.side_effect = Exception("OpenAI API Error")

response = client.post('/firac-analysis',
data=json.dumps({'text': sample_legal_text}),
content_type='application/json')

# Should return error gracefully, not crash
assert response.status_code in [200, 500]
data = json.loads(response.data)
assert 'error' in data or data.get('success') is False

@pytest.mark.integration
@patch('requests.post')
def test_api_handles_datajud_failure(self, mock_requests, client):
"""Test graceful handling of DATAJUD API failure"""
mock_requests.side_effect = Exception("Connection timeout")

response = client.post('/datajud-search',
data=json.dumps({'tribunal': 'tjsp'}),
content_type='application/json')

# Should return error gracefully
assert response.status_code in [200, 500]
data = json.loads(response.data)
assert 'error' in data or data.get('success') is False

@pytest.mark.integration
@patch('requests.post')
def test_api_handles_datajud_auth_failure(self, mock_requests, client):
"""Test handling of DATAJUD authentication failure"""
mock_response = Mock()
mock_response.status_code = 401
mock_response.text = "Unauthorized"
mock_requests.return_value = mock_response

response = client.post('/datajud-search',
data=json.dumps({'tribunal': 'tjsp'}),
content_type='application/json')

data = json.loads(response.data)
assert data['success'] is False


class TestConcurrentRequests:
"""Tests for handling concurrent requests"""

@pytest.mark.integration
def test_multiple_health_checks(self, client):
"""Test multiple concurrent health checks"""
responses = []
for _ in range(10):
response = client.get('/health')
responses.append(response)

assert all(r.status_code == 200 for r in responses)

@pytest.mark.integration
@patch('scripts.python.app.openai.ChatCompletion.create')
def test_multiple_pdf_extractions(self, mock_openai, client, sample_pdf_base64):
"""Test multiple PDF extractions"""
responses = []
for _ in range(5):
response = client.post('/extract-pdf',
data=json.dumps({'pdf_content': sample_pdf_base64}),
content_type='application/json')
responses.append(response)

assert all(r.status_code == 200 for r in responses)


class TestDataValidation:
"""Tests for data validation across endpoints"""

@pytest.mark.integration
def test_json_content_type_required(self, client):
"""Test that JSON content type is properly handled"""
# Send without content-type
response = client.post('/firac-analysis',
data='{"text": "some text"}')

# Should handle gracefully
assert response.status_code in [200, 400, 415]

@pytest.mark.integration
def test_malformed_json_handled(self, client):
"""Test handling of malformed JSON"""
response = client.post('/firac-analysis',
data='{"text": invalid json}',
content_type='application/json')

assert response.status_code == 400

@pytest.mark.integration
def test_unicode_text_handled(self, client, sample_pdf_base64):
"""Test handling of Unicode text (Portuguese characters)"""
unicode_text = """
Processo judicial com caracteres especiais:
ação, execução, sentença, decisão, réu, autor,
índice, cônsul, além, através, após, até
""" + " " * 100 # Ensure minimum length

with patch('scripts.python.app.openai.ChatCompletion.create') as mock_openai:
mock_response = Mock()
mock_response.choices = [Mock(message=Mock(content='{"result": "ok"}'))]
mock_openai.return_value = mock_response

response = client.post('/firac-analysis',
data=json.dumps({'text': unicode_text}),
content_type='application/json')

assert response.status_code == 200

⚠️ Potential issue | 🔴 Critical

Run Black formatter to fix formatting issues.

The pipeline indicates this file needs Black formatting.

#!/bin/bash
# Format the file with Black
black tests/integration/test_api_integration.py --check --diff
🧰 Tools
🪛 GitHub Actions: CI Pipeline

[error] 1-1: Black formatting check failed. Run 'black' to format code in this file.

🪛 Ruff (0.14.7)

209-209: Unused method argument: mock_openai

(ARG002)


244-244: Unused method argument: sample_pdf_base64

(ARG002)

🤖 Prompt for AI Agents
tests/integration/test_api_integration.py lines 1-261: the file is not
Black-formatted which causes CI to fail; run Black on this file and commit the
changes to satisfy formatting checks — e.g. run `black
tests/integration/test_api_integration.py` (or `black .` at repo root), verify
the diff, stage and commit the formatted file, and re-run the pipeline.

Comment on lines +1 to +342
# Sistema de Automacao Juridica - API Routes Unit Tests

import pytest
import json
from unittest.mock import Mock, patch, MagicMock


class TestHealthEndpoint:
"""Tests for /health endpoint"""

@pytest.mark.unit
def test_health_check_returns_200(self, client):
"""Test health endpoint returns 200 OK"""
response = client.get('/health')

assert response.status_code == 200

@pytest.mark.unit
def test_health_check_returns_json(self, client):
"""Test health endpoint returns JSON"""
response = client.get('/health')

assert response.content_type == 'application/json'

@pytest.mark.unit
def test_health_check_contains_status(self, client):
"""Test health response contains status field"""
response = client.get('/health')
data = json.loads(response.data)

assert 'status' in data
assert data['status'] == 'healthy'

@pytest.mark.unit
def test_health_check_contains_services(self, client):
"""Test health response contains services status"""
response = client.get('/health')
data = json.loads(response.data)

assert 'services' in data
assert 'redis' in data['services']
assert 'openai' in data['services']
assert 'datajud' in data['services']

@pytest.mark.unit
def test_health_check_contains_version(self, client):
"""Test health response contains version"""
response = client.get('/health')
data = json.loads(response.data)

assert 'version' in data
assert data['version'] == '1.0.0'


class TestExtractPDFEndpoint:
"""Tests for /extract-pdf endpoint"""

@pytest.mark.unit
def test_extract_pdf_requires_pdf_content(self, client):
"""Test that pdf_content is required"""
response = client.post('/extract-pdf',
data=json.dumps({}),
content_type='application/json')

assert response.status_code == 400
data = json.loads(response.data)
assert 'error' in data

@pytest.mark.unit
def test_extract_pdf_accepts_base64(self, client, sample_pdf_base64):
"""Test that base64 encoded PDF is accepted"""
response = client.post('/extract-pdf',
data=json.dumps({'pdf_content': sample_pdf_base64}),
content_type='application/json')

assert response.status_code == 200

@pytest.mark.unit
def test_extract_pdf_returns_text(self, client, sample_pdf_base64):
"""Test that extracted text is returned"""
response = client.post('/extract-pdf',
data=json.dumps({'pdf_content': sample_pdf_base64}),
content_type='application/json')

data = json.loads(response.data)
assert 'text' in data or 'error' in data

@pytest.mark.unit
def test_extract_pdf_invalid_base64(self, client):
"""Test handling of invalid base64 content"""
response = client.post('/extract-pdf',
data=json.dumps({'pdf_content': 'not-valid-base64!!!'}),
content_type='application/json')

assert response.status_code in [400, 500]

@pytest.mark.unit
def test_extract_pdf_empty_request(self, client):
"""Test handling of empty request body"""
response = client.post('/extract-pdf',
data='',
content_type='application/json')

assert response.status_code == 400


class TestFIRACAnalysisEndpoint:
"""Tests for /firac-analysis endpoint"""

@pytest.mark.unit
def test_firac_requires_text(self, client):
"""Test that text field is required"""
response = client.post('/firac-analysis',
data=json.dumps({}),
content_type='application/json')

assert response.status_code == 400
data = json.loads(response.data)
assert 'error' in data

@pytest.mark.unit
def test_firac_rejects_short_text(self, client):
"""Test that very short text is rejected"""
response = client.post('/firac-analysis',
data=json.dumps({'text': 'Too short'}),
content_type='application/json')

assert response.status_code == 400
data = json.loads(response.data)
assert 'error' in data
assert 'short' in data['error'].lower() or 'minimum' in data['error'].lower()

@pytest.mark.unit
@patch('scripts.python.app.openai.ChatCompletion.create')
def test_firac_calls_openai(self, mock_openai, client, sample_legal_text, mock_openai_response):
"""Test that OpenAI is called for analysis"""
mock_openai.return_value = mock_openai_response

response = client.post('/firac-analysis',
data=json.dumps({'text': sample_legal_text}),
content_type='application/json')

mock_openai.assert_called_once()

@pytest.mark.unit
@patch('scripts.python.app.openai.ChatCompletion.create')
def test_firac_returns_analysis(self, mock_openai, client, sample_legal_text, mock_openai_response):
"""Test that FIRAC analysis is returned"""
mock_openai.return_value = mock_openai_response

response = client.post('/firac-analysis',
data=json.dumps({'text': sample_legal_text}),
content_type='application/json')

assert response.status_code == 200
data = json.loads(response.data)
assert data['success'] is True
assert 'firac_analysis' in data


class TestDatajudSearchEndpoint:
"""Tests for /datajud-search endpoint"""

@pytest.mark.unit
def test_datajud_accepts_empty_params(self, client):
"""Test that empty params returns error or default search"""
response = client.post('/datajud-search',
data=json.dumps({}),
content_type='application/json')

# Should return 200 with error in body or perform default search
assert response.status_code in [200, 400]

@pytest.mark.unit
@patch('requests.post')
def test_datajud_calls_api(self, mock_post, client):
"""Test that DATAJUD API is called"""
mock_response = Mock()
mock_response.status_code = 200
mock_response.json.return_value = {"hits": {"total": {"value": 0}, "hits": []}}
mock_post.return_value = mock_response

response = client.post('/datajud-search',
data=json.dumps({
'tribunal': 'tjsp',
'texto_livre': 'execucao fiscal'
}),
content_type='application/json')

mock_post.assert_called_once()

@pytest.mark.unit
@patch('requests.post')
def test_datajud_returns_results(self, mock_post, client, sample_datajud_response):
"""Test that search results are returned"""
mock_response = Mock()
mock_response.status_code = 200
mock_response.json.return_value = sample_datajud_response
mock_post.return_value = mock_response

response = client.post('/datajud-search',
data=json.dumps({'tribunal': 'tjsp'}),
content_type='application/json')

assert response.status_code == 200
data = json.loads(response.data)
assert data['success'] is True
assert 'total_results' in data


class TestDistinguishAnalysisEndpoint:
"""Tests for /distinguish-analysis endpoint"""

@pytest.mark.unit
def test_distinguish_requires_current_facts(self, client, sample_precedent_data):
"""Test that current_facts is required"""
response = client.post('/distinguish-analysis',
data=json.dumps({'precedent_data': sample_precedent_data}),
content_type='application/json')

assert response.status_code == 400

@pytest.mark.unit
def test_distinguish_requires_precedent_data(self, client):
"""Test that precedent_data is required"""
response = client.post('/distinguish-analysis',
data=json.dumps({'current_facts': 'Some facts'}),
content_type='application/json')

assert response.status_code == 400

@pytest.mark.unit
@patch('scripts.python.app.openai.ChatCompletion.create')
def test_distinguish_returns_analysis(self, mock_openai, client, sample_legal_text,
sample_precedent_data, mock_openai_response):
"""Test that distinguish analysis is returned"""
mock_openai.return_value = mock_openai_response

response = client.post('/distinguish-analysis',
data=json.dumps({
'current_facts': sample_legal_text,
'precedent_data': sample_precedent_data
}),
content_type='application/json')

assert response.status_code == 200
data = json.loads(response.data)
assert data['success'] is True


class TestGenerateDocumentEndpoint:
"""Tests for /generate-document endpoint"""

@pytest.mark.unit
def test_generate_requires_document_type(self, client, sample_case_data):
"""Test that document_type is required"""
response = client.post('/generate-document',
data=json.dumps({'case_data': sample_case_data}),
content_type='application/json')

assert response.status_code == 400

@pytest.mark.unit
def test_generate_requires_case_data(self, client):
"""Test that case_data is required"""
response = client.post('/generate-document',
data=json.dumps({'document_type': 'sentenca'}),
content_type='application/json')

assert response.status_code == 400

@pytest.mark.unit
@patch('scripts.python.app.openai.ChatCompletion.create')
def test_generate_sentenca(self, mock_openai, client, sample_case_data):
"""Test generating sentenca document"""
mock_response = Mock()
mock_response.choices = [Mock(message=Mock(content="Generated sentenca content"))]
mock_openai.return_value = mock_response

response = client.post('/generate-document',
data=json.dumps({
'document_type': 'sentenca',
'case_data': sample_case_data
}),
content_type='application/json')

assert response.status_code == 200
data = json.loads(response.data)
assert data['success'] is True
assert data['document_type'] == 'sentenca'

@pytest.mark.unit
@patch('scripts.python.app.openai.ChatCompletion.create')
def test_generate_despacho(self, mock_openai, client, sample_case_data):
"""Test generating despacho document"""
mock_response = Mock()
mock_response.choices = [Mock(message=Mock(content="Generated despacho content"))]
mock_openai.return_value = mock_response

response = client.post('/generate-document',
data=json.dumps({
'document_type': 'despacho',
'case_data': sample_case_data
}),
content_type='application/json')

assert response.status_code == 200
data = json.loads(response.data)
assert data['document_type'] == 'despacho'

@pytest.mark.unit
def test_generate_invalid_type(self, client, sample_case_data):
"""Test that invalid document type returns error"""
response = client.post('/generate-document',
data=json.dumps({
'document_type': 'invalid_type',
'case_data': sample_case_data
}),
content_type='application/json')

# The endpoint should handle this - either 400 or 200 with error
data = json.loads(response.data)
assert data['success'] is False or response.status_code == 400


class TestErrorHandling:
"""Tests for error handling"""

@pytest.mark.unit
def test_404_returns_json(self, client):
"""Test that 404 errors return JSON"""
response = client.get('/nonexistent-endpoint')

assert response.status_code == 404
assert response.content_type == 'application/json'

@pytest.mark.unit
def test_method_not_allowed(self, client):
"""Test handling of wrong HTTP method"""
response = client.get('/extract-pdf')

assert response.status_code == 405

⚠️ Potential issue | 🔴 Critical

Run Black formatter to fix formatting issues.

The pipeline indicates this file needs Black formatting. Run black tests/unit/test_api_routes.py to resolve the CI failure.

#!/bin/bash
# Format the file with Black
black tests/unit/test_api_routes.py --check --diff
🧰 Tools
🪛 GitHub Actions: CI Pipeline

[error] 1-1: Black formatting check failed. Run 'black' to format code in this file.

🪛 Ruff (0.14.7)

139-139: Local variable response is assigned to but never used

Remove assignment to unused variable response

(F841)


183-183: Local variable response is assigned to but never used

Remove assignment to unused variable response

(F841)

🤖 Prompt for AI Agents
tests/unit/test_api_routes.py lines 1-342: the file is not Black-formatted which
fails CI; run the Black formatter on this file (e.g., black
tests/unit/test_api_routes.py), stage and commit the resulting changes so the
file matches the project formatting rules and the CI Black check passes.

Comment on lines +68 to +86
@patch('PyPDF2.PdfReader')
def test_extract_text_multi_page(self, mock_pdf_reader, pdf_extractor):
"""Test extraction from multi-page PDF"""
# Mock multiple pages
mock_page1 = Mock()
mock_page1.extract_text.return_value = "Page 1 content"
mock_page2 = Mock()
mock_page2.extract_text.return_value = "Page 2 content"

mock_reader = Mock()
mock_reader.pages = [mock_page1, mock_page2]
mock_pdf_reader.return_value = mock_reader

result = pdf_extractor.extract_text_from_pdf(b"dummy pdf content")

assert result['success'] is True
assert result['pages'] == 2
assert "Page 1 content" in result['text']
assert "Page 2 content" in result['text']

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Locate the PDFExtractor implementation
echo "=== Finding PDFExtractor implementation ==="
find . -type f -name "*.py" | xargs grep -l "class PDFExtractor"

# Check imports in app.py
echo ""
echo "=== Checking PyPDF2 imports in app.py ==="
rg -n "import.*PyPDF2|from.*PyPDF2" --type py

# Check PdfReader usage in PDFExtractor
echo ""
echo "=== Checking PdfReader usage ==="
rg -n "PdfReader" --type py -B 2 -A 2

Repository: fbmoulin/2acivelagent

Length of output: 1577


Update the patch path to target where PyPDF2 is used in the implementation.

The current @patch('PyPDF2.PdfReader') patches the global PyPDF2 module, but since scripts/python/app.py imports PyPDF2 at the module level and uses PyPDF2.PdfReader() directly, the patch should target the imported reference:

-@patch('PyPDF2.PdfReader')
+@patch('scripts.python.app.PyPDF2.PdfReader')

The mock must patch where the object is used (in the app.py namespace), not where it's defined (in the PyPDF2 module).

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
@patch('PyPDF2.PdfReader')
def test_extract_text_multi_page(self, mock_pdf_reader, pdf_extractor):
"""Test extraction from multi-page PDF"""
# Mock multiple pages
mock_page1 = Mock()
mock_page1.extract_text.return_value = "Page 1 content"
mock_page2 = Mock()
mock_page2.extract_text.return_value = "Page 2 content"
mock_reader = Mock()
mock_reader.pages = [mock_page1, mock_page2]
mock_pdf_reader.return_value = mock_reader
result = pdf_extractor.extract_text_from_pdf(b"dummy pdf content")
assert result['success'] is True
assert result['pages'] == 2
assert "Page 1 content" in result['text']
assert "Page 2 content" in result['text']
@patch('scripts.python.app.PyPDF2.PdfReader')
def test_extract_text_multi_page(self, mock_pdf_reader, pdf_extractor):
"""Test extraction from multi-page PDF"""
# Mock multiple pages
mock_page1 = Mock()
mock_page1.extract_text.return_value = "Page 1 content"
mock_page2 = Mock()
mock_page2.extract_text.return_value = "Page 2 content"
mock_reader = Mock()
mock_reader.pages = [mock_page1, mock_page2]
mock_pdf_reader.return_value = mock_reader
result = pdf_extractor.extract_text_from_pdf(b"dummy pdf content")
assert result['success'] is True
assert result['pages'] == 2
assert "Page 1 content" in result['text']
assert "Page 2 content" in result['text']
🤖 Prompt for AI Agents
In tests/unit/test_pdf_extractor.py around lines 68 to 86, the test currently
patches PyPDF2 at its module location ('PyPDF2.PdfReader') but the code under
test imports and uses PyPDF2 from scripts/python/app.py; update the patch to
target the symbol where it is used (e.g., change the decorator to patch
'scripts.python.app.PyPDF2.PdfReader' or the actual import path your app module
exposes), so the mock replaces the PdfReader reference in app.py, keeping the
rest of the test logic the same.
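The patch-where-used rule can be seen in a self-contained demo (throwaway module names, not the project's). When a module does `from lib import reader`, it holds its own reference, so patching the defining module has no effect on the caller:

```python
import sys
import types
from unittest.mock import patch

# Build a throwaway "library" module
lib = types.ModuleType("lib")
lib.reader = lambda: "real"
sys.modules["lib"] = lib

# Build a throwaway "app" module that imports the name directly
app = types.ModuleType("app")
sys.modules["app"] = app
exec("from lib import reader\ndef run():\n    return reader()", app.__dict__)

with patch("lib.reader", return_value="mocked"):
    print(app.run())   # real  -- app still holds its own reference

with patch("app.reader", return_value="mocked"):
    print(app.run())   # mocked -- patched where it is used
```

In the PR's case, app.py uses `PyPDF2.PdfReader` as a module attribute, so targeting `scripts.python.app.PyPDF2.PdfReader` makes the intent explicit and keeps the test correct if the import style ever changes to `from PyPDF2 import PdfReader`.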

Security fixes implemented:

1. alembic.ini: Remove hardcoded database credentials

2. migrations/env.py: Require database password from environment,
   no longer accepts default values for sensitive credentials

3. docker-compose.yml: Remove exposed ports for internal services
   - PostgreSQL (5432), Redis (6379), Python services (5000),
     and Prometheus (9090) now only accessible within Docker network
   - Services communicate through internal network, not host ports

4. scripts/python/app.py: Complete security rewrite
   - Add JWT authentication with token verification
   - Restrict CORS to configured allowed origins only
   - Add input validation with length limits
   - Add prompt injection detection and blocking
   - Implement secure error handling (no stack traces exposed)
   - Add request ID tracking for audit trail
   - Remove debug mode in production
   - Add configuration validation at startup

5. .github/workflows/ci.yml: Enforce security scans
   - Bandit security scan now fails pipeline on issues
   - Safety dependency check now fails on vulnerabilities
   - Security job added to required CI checks

These changes address CRITICAL and HIGH severity issues identified
in the security analysis.
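The token-verification flow described above can be sketched with the standard library alone; the real app.py presumably uses PyJWT, and the helper names and key here are illustrative:

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"change-me"  # stands in for Config.JWT_SECRET

def _sign(data: bytes) -> str:
    return base64.urlsafe_b64encode(
        hmac.new(SECRET, data, hashlib.sha256).digest()
    ).decode()

def create_token(user_id: str, ttl: int = 3600) -> str:
    payload = json.dumps({"user_id": user_id, "exp": time.time() + ttl})
    body = base64.urlsafe_b64encode(payload.encode()).decode()
    return f"{body}.{_sign(body.encode())}"

def verify_token(token: str):
    body, sig = token.rsplit(".", 1)
    if not hmac.compare_digest(sig, _sign(body.encode())):
        return None  # tampered or wrongly signed
    claims = json.loads(base64.urlsafe_b64decode(body))
    if claims["exp"] < time.time():
        return None  # expired
    return claims

token = create_token("judge-42")
print(verify_token(token)["user_id"])    # judge-42
print(verify_token(token[:-2] + "xx"))   # None (signature mismatch)
```

The same shape applies with PyJWT: decode inside a try/except, reject on signature or expiry, and attach the claims to the request context for the audit trail.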

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (4)
scripts/python/app.py (4)

128-130: Deprecated OpenAI API key assignment.

Setting openai.api_key is deprecated in openai>=1.0. The v1.x library reads from OPENAI_API_KEY environment variable automatically when creating a client. Remove this assignment and instantiate clients properly.

This is related to the OpenAI v1.x migration issue flagged in past reviews.


376-406: Critical: OpenAI API usage incompatible with openai v1.x.

openai.ChatCompletion.create() and openai.error.* exceptions were removed in openai v1.0+. Since openai>=1.6.1 is required, this code will fail with AttributeError at runtime.

As flagged in past reviews, migrate to the client-based API:

from openai import OpenAI

client = OpenAI()  # Uses OPENAI_API_KEY env var automatically

try:
    response = client.chat.completions.create(
        model=Config.OPENAI_MODEL,
        messages=[
            {"role": "system", "content": cls.SYSTEM_PROMPT},
            {"role": "user", "content": prompt}
        ],
        max_tokens=Config.OPENAI_MAX_TOKENS,
        temperature=0.3,
        timeout=Config.OPENAI_TIMEOUT
    )
    return {
        "success": True,
        "firac_analysis": response.choices[0].message.content,
        ...
    }
except openai.RateLimitError:
    ...
except openai.AuthenticationError:
    ...
except openai.APITimeoutError:
    ...

520-540: Same OpenAI v1.x incompatibility issue.

This method uses the deprecated openai.ChatCompletion.create() API, same as FIRACAnalyzer. Apply the same migration pattern using the client-based API.


572-593: Same OpenAI v1.x incompatibility issue.

This method also uses the deprecated openai.ChatCompletion.create() API. Apply the same client-based migration.

🧹 Nitpick comments (7)
.github/workflows/ci.yml (4)

12-14: Unused POETRY_VERSION environment variable.

POETRY_VERSION is defined but the workflow uses pip throughout. Either remove this unused variable or switch dependency management to Poetry if that's the intended approach.

 env:
   PYTHON_VERSION: '3.11'
-  POETRY_VERSION: '1.7.1'

40-43: Consider installing type stubs for mypy.

The lint job installs black isort flake8 mypy but not type stubs. If mypy is intended to provide meaningful checks, consider adding common stubs or using --install-types --non-interactive.


197-200: Coverage job re-runs all tests - consider caching coverage data.

This job runs the full test suite again after unit and integration tests already passed. For efficiency, consider collecting coverage artifacts from previous jobs and merging them, or accept this as the simplest approach for now.


294-305: Consider including docker-build in failure checks.

The ci-success job checks lint, test-unit, test-integration, and security but not coverage or docker-build. If Docker build failures should block merging, add it to the condition:

           if [[ "${{ needs.lint.result }}" == "failure" ]] || \
              [[ "${{ needs.test-unit.result }}" == "failure" ]] || \
              [[ "${{ needs.test-integration.result }}" == "failure" ]] || \
+             [[ "${{ needs.docker-build.result }}" == "failure" ]] || \
              [[ "${{ needs.security.result }}" == "failure" ]]; then
scripts/python/app.py (3)

238-249: Annotate class constant with ClassVar.

DANGEROUS_PATTERNS is a class-level constant. Annotating it with ClassVar improves type checking clarity.

+from typing import ClassVar
+
 class InputValidator:
     """Input validation and sanitization utilities."""

     # Patterns that might indicate prompt injection
-    DANGEROUS_PATTERNS = [
+    DANGEROUS_PATTERNS: ClassVar[list[str]] = [
         r'ignore\s+(previous|all|above)',
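As a standalone illustration of how such a pattern check can work (the list below is an abbreviated, hypothetical subset; the real DANGEROUS_PATTERNS in app.py is longer):

```python
import re

# Abbreviated, hypothetical subset of the patterns in app.py
DANGEROUS_PATTERNS = [
    r'ignore\s+(previous|all|above)',
    r'system\s*prompt',
    r'disregard\s+instructions',
]

def looks_like_prompt_injection(text: str) -> bool:
    # Lowercase once, then scan every pattern against the input
    lowered = text.lower()
    return any(re.search(pattern, lowered) for pattern in DANGEROUS_PATTERNS)

print(looks_like_prompt_injection("Please IGNORE all previous instructions"))  # True
print(looks_like_prompt_injection("Processo de execucao fiscal contra a Fazenda"))  # False
```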

342-347: Use logging.exception to include stack traces for debugging.

In error handlers, logging.error doesn't include the stack trace. Using logging.exception provides more context for debugging while still sanitizing the user-facing message.

         except PyPDF2.errors.PdfReadError as e:
-            logger.error(f"PDF read error: {type(e).__name__}")
+            logger.exception("PDF read error")
             return {"success": False, "error": "Invalid or corrupted PDF file"}
         except Exception as e:
-            logger.error(f"PDF extraction error: {type(e).__name__}: {e}")
+            logger.exception("PDF extraction error")
             return {"success": False, "error": "Failed to extract text from PDF"}

Apply similar changes to other error handlers (lines 399-406, 483, 539, 592).


183-184: datetime.utcnow() is deprecated.

datetime.utcnow() is deprecated since Python 3.12. Use datetime.now(timezone.utc) for forward compatibility:

+from datetime import datetime, timedelta, timezone

 def create_jwt_token(user_id: str, roles: List[str] = None) -> str:
     """Create JWT token for authentication."""
     payload = {
         'user_id': user_id,
         'roles': roles or ['user'],
-        'iat': datetime.utcnow(),
-        'exp': datetime.utcnow() + timedelta(hours=Config.JWT_EXPIRY_HOURS)
+        'iat': datetime.now(timezone.utc),
+        'exp': datetime.now(timezone.utc) + timedelta(hours=Config.JWT_EXPIRY_HOURS)
     }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ebe0325 and fd66572.

📒 Files selected for processing (5)
  • .github/workflows/ci.yml (1 hunks)
  • alembic.ini (1 hunks)
  • docker-compose.yml (4 hunks)
  • migrations/env.py (1 hunks)
  • scripts/python/app.py (1 hunks)
🧰 Additional context used
🪛 Checkov (3.2.334)
.github/workflows/ci.yml

[medium] 166-167: Basic Auth Credentials

(CKV_SECRET_4)

🪛 Ruff (0.14.7)
migrations/env.py

46-49: Avoid specifying long messages outside the exception class

(TRY003)

scripts/python/app.py

1-1: Shebang is present but file is not executable

(EXE001)


73-73: Possible hardcoded password assigned to: "JWT_SECRET"

(S105)


145-145: Do not catch blind exception: Exception

(BLE001)


178-178: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


193-193: Consider moving this statement to an else block

(TRY300)


239-249: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


252-252: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


343-343: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


345-345: Do not catch blind exception: Exception

(BLE001)


346-346: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


399-399: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


404-404: Do not catch blind exception: Exception

(BLE001)


405-405: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


483-483: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


538-538: Do not catch blind exception: Exception

(BLE001)


539-539: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


549-553: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


591-591: Do not catch blind exception: Exception

(BLE001)


592-592: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


702-702: Do not catch blind exception: Exception

(BLE001)


832-832: Unused function argument: error

(ARG001)


841-841: Unused function argument: error

(ARG001)


850-850: Unused function argument: error

(ARG001)


859-859: Unused function argument: error

(ARG001)


868-868: Unused function argument: error

(ARG001)


877-877: Unused function argument: error

(ARG001)

🔇 Additional comments (10)
alembic.ini (1)

26-30: Previous security concern successfully addressed.

The hardcoded database credentials flagged in earlier reviews have been removed. The configuration now correctly directs users to set DATABASE_URL or POSTGRES_URL via environment variables, with clear inline documentation.

migrations/env.py (3)

44-49: Previous security concern successfully addressed.

The function now properly requires the password via environment variable and raises a clear error if not set, eliminating the risk of using a hardcoded default.


59-81: LGTM!

The offline migration mode is properly configured with appropriate flags (literal_binds=True, compare_type=True, compare_server_default=True) for accurate schema comparison.


84-109: LGTM!

The online migration mode is correctly implemented with NullPool for migrations and include_schemas=True to support the multi-schema setup (judicial and audit schemas) described in the PR.

docker-compose.yml (1)

69-72: Good security hardening - internal services no longer exposed to host.

Changing from ports to expose for postgres, redis, python-services, and prometheus is the correct approach. These services are only needed within the Docker network and should be accessed through nginx reverse proxy or via docker exec during development.

Also applies to: 88-91, 108-110, 157-159
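In compose terms the change looks like this (service excerpt is illustrative, not the project's exact file):

```yaml
services:
  postgres:
    expose:
      - "5432"        # reachable only from other containers on the network
    # ports:
    #   - "5432:5432" # would publish to the host -- avoid for internal services
```

With `expose`, other services still reach `postgres:5432` by service name, while nothing is bound on the host; only nginx keeps published `ports`.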

.github/workflows/ci.yml (3)

91-94: Verify pytest marker logic includes intended tests.

The expression -m "unit or not integration" selects tests marked unit plus any test that lacks the integration marker, so unmarked tests are included in the unit job. If this is intentional for gradual marker adoption, consider adding a comment explaining the behavior.
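To make the selection behavior concrete, here is a small simulation of how that marker expression evaluates (illustrative only; pytest's real matcher handles full boolean expressions):

```python
# Simulate which tests -m "unit or not integration" selects.
tests = {
    "test_health_check": {"unit"},
    "test_full_pdf_flow": {"integration"},
    "test_unmarked_helper": set(),   # no marker at all
}

def selected(markers: set) -> bool:
    # mirrors the boolean marker expression literally
    return "unit" in markers or "integration" not in markers

picked = sorted(name for name, m in tests.items() if selected(m))
print(picked)  # ['test_health_check', 'test_unmarked_helper']
```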


126-128: Test credentials in CI are acceptable.

The Checkov warning about credentials (CKV_SECRET_4) is a false positive here. These are ephemeral test database credentials used only within the CI runner's isolated PostgreSQL service container.


220-256: Security scanning job is well-configured.

Running Bandit with -ll -ii for medium+ severity/confidence and Safety for dependency vulnerabilities provides good coverage. The job runs independently in parallel, which is efficient.
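A job excerpt for this configuration might look like the following (step names and paths are assumptions, not the workflow's exact contents):

```yaml
# Hypothetical excerpt: both steps fail the pipeline on findings
- name: Bandit security scan
  run: bandit -r scripts/ -ll -ii

- name: Safety dependency check
  run: safety check -r requirements.txt
```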

scripts/python/app.py (2)

720-728: Good fix: Deterministic cache key using hashlib.

The cache key now uses hashlib.sha256() instead of the non-deterministic hash() function. This ensures consistent cache hits across interpreter sessions.
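A minimal illustration of why this matters (function name and key format here are hypothetical, not the actual app.py implementation):

```python
import hashlib

def cache_key(endpoint: str, payload: str) -> str:
    # sha256 is stable across processes; the built-in hash() is randomized
    # per interpreter session (PYTHONHASHSEED), so hash()-based keys would
    # never match between restarts and the cache would silently miss.
    digest = hashlib.sha256(f"{endpoint}:{payload}".encode("utf-8")).hexdigest()
    return f"cache:{endpoint}:{digest[:16]}"

k1 = cache_key("firac-analysis", '{"text": "..."}')
k2 = cache_key("firac-analysis", '{"text": "..."}')
print(k1 == k2)  # True: identical input always yields the same key
```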


831-892: Error handlers follow Flask conventions.

The unused error parameter warnings (ARG001) are false positives. Flask requires this parameter signature for error handlers. The handlers intentionally don't expose internal error details to prevent information disclosure.

Comment thread migrations/env.py
Comment on lines +51 to +54
host = os.getenv("POSTGRES_HOST", "localhost")
port = os.getenv("POSTGRES_PORT", "5432")
db = os.getenv("POSTGRES_DB", "n8n")
user = os.getenv("POSTGRES_USER", "n8n_user")

⚠️ Potential issue | 🟡 Minor

Default database name and user appear misaligned with project purpose.

The defaults reference "n8n" (line 53: POSTGRES_DB defaults to "n8n", line 54: POSTGRES_USER defaults to "n8n_user"), which seems to be from a different project. This is a judicial automation system per the PR description, so these defaults may confuse developers during local setup.

Consider updating the defaults to reflect the actual project:

-    db = os.getenv("POSTGRES_DB", "n8n")
-    user = os.getenv("POSTGRES_USER", "n8n_user")
+    db = os.getenv("POSTGRES_DB", "judicial_automation")
+    user = os.getenv("POSTGRES_USER", "judicial_user")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
host = os.getenv("POSTGRES_HOST", "localhost")
port = os.getenv("POSTGRES_PORT", "5432")
db = os.getenv("POSTGRES_DB", "n8n")
user = os.getenv("POSTGRES_USER", "n8n_user")
host = os.getenv("POSTGRES_HOST", "localhost")
port = os.getenv("POSTGRES_PORT", "5432")
db = os.getenv("POSTGRES_DB", "judicial_automation")
user = os.getenv("POSTGRES_USER", "judicial_user")
🤖 Prompt for AI Agents
In migrations/env.py around lines 51 to 54, the POSTGRES_DB and POSTGRES_USER
default values ("n8n" and "n8n_user") are from another project; update these
defaults to project-appropriate values (for example a descriptive DB name like
"judicial_automation" and a matching user like "judicial_user" or blank them so
callers must set env vars). Modify the os.getenv default arguments accordingly
and ensure any downstream docs or .env.example are updated to match the new
defaults.

Comment thread scripts/python/app.py
Comment on lines +68 to +79
@classmethod
def validate(cls) -> Tuple[bool, List[str]]:
"""Validate required configuration."""
errors = []

if not cls.JWT_SECRET or cls.JWT_SECRET == 'dev-secret-key':
errors.append("JWT_SECRET must be set to a secure value")

if not cls.OPENAI_API_KEY:
errors.append("OPENAI_API_KEY must be set")

return len(errors) == 0, errors

⚠️ Potential issue | 🟠 Major

Configuration validation only runs in __main__ block.

Config.validate() is called only when running directly via python app.py. When deployed via Gunicorn (gunicorn app:app), this validation is bypassed. Consider validating at module load time or in a factory function.

+# Validate configuration at import time for production safety
+if os.getenv('FLASK_ENV') != 'development':
+    valid, errors = Config.validate()
+    if not valid:
+        raise RuntimeError(f"Configuration errors: {'; '.join(errors)}")

Add this after the Config class definition (around line 80).

🧰 Tools
🪛 Ruff (0.14.7)

73-73: Possible hardcoded password assigned to: "JWT_SECRET"

(S105)

🤖 Prompt for AI Agents
In scripts/python/app.py around lines 68 to 79 (add after the Config class at
~line 80): Config.validate() currently only runs in the __main__ block so
Gunicorn bypasses it; call Config.validate() at module load (immediately after
the class definition) and raise a RuntimeError or exit when validation fails so
the app fails fast under Gunicorn and other import-based runners; ensure the
raised error includes the list of validation messages for clear diagnostics.
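A minimal, self-contained sketch of the fail-fast pattern the prompt describes (the `Config` attributes are stand-ins modeled on the snippet above; `fail_fast` is a hypothetical name — in `app.py` the same check would simply run at module scope right after the class):

```python
import os
from typing import List, Tuple

class Config:
    """Minimal stand-in for the app's Config class."""
    JWT_SECRET = os.getenv("JWT_SECRET", "")
    OPENAI_API_KEY = os.getenv("OPENAI_API_KEY", "")

    @classmethod
    def validate(cls) -> Tuple[bool, List[str]]:
        errors = []
        if not cls.JWT_SECRET or cls.JWT_SECRET == "dev-secret-key":
            errors.append("JWT_SECRET must be set to a secure value")
        if not cls.OPENAI_API_KEY:
            errors.append("OPENAI_API_KEY must be set")
        return len(errors) == 0, errors

def fail_fast() -> None:
    """Call at module scope so Gunicorn workers refuse to boot on bad config.

    Skipped in development so local runs work without real secrets.
    """
    if os.getenv("FLASK_ENV") == "development":
        return
    valid, errors = Config.validate()
    if not valid:
        raise RuntimeError(f"Configuration errors: {'; '.join(errors)}")
```

Because the check runs at import time, `gunicorn app:app` fails immediately with the full list of problems instead of serving requests with an insecure configuration.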

Comment thread scripts/python/app.py
Comment on lines +899 to +916
if __name__ == '__main__':
# Validate configuration
valid, errors = Config.validate()
if not valid:
for error in errors:
logger.critical(f"Configuration error: {error}")
logger.critical("Application cannot start with invalid configuration")
exit(1)

logger.info("Starting Judicial Automation API (Security-Hardened)...")
logger.info(f"Allowed origins: {Config.ALLOWED_ORIGINS}")
logger.info(f"OpenAI configured: {bool(Config.OPENAI_API_KEY)}")
logger.info(f"DATAJUD configured: {bool(Config.DATAJUD_USERNAME)}")
logger.info(f"Redis configured: {redis_client is not None}")

# IMPORTANT: In production, use a proper WSGI server like Gunicorn
# gunicorn -w 4 -b 0.0.0.0:5000 app:app
app.run(host='127.0.0.1', port=5000)

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

fd -t f "Dockerfile.python" --exec cat {}

Repository: fbmoulin/2acivelagent

Length of output: 1790


🏁 Script executed:

fd -t f "requirements.txt" --exec grep -i gunicorn {} +

Repository: fbmoulin/2acivelagent

Length of output: 99


🏁 Script executed:

fd -t f "requirements.txt" --exec cat {}

Repository: fbmoulin/2acivelagent

Length of output: 1219


Use Gunicorn (or another production WSGI server) instead of Flask development server.

The code includes a comment recommending Gunicorn for production, but Dockerfile.python runs the application with CMD ["python", "app.py"], which executes Flask's development server. Gunicorn is not installed (missing from requirements.txt). The development server is unsuitable for production and exposes unnecessary risks. Add gunicorn to requirements.txt and update the Dockerfile's CMD to use Gunicorn, such as: CMD ["gunicorn", "-w", "4", "-b", "0.0.0.0:5000", "app:app"].

Documentation updates:

1. README.md
   - Updated security section with new security measures
   - Added infrastructure security diagram
   - Added JWT authentication examples for all API endpoints
   - Updated LGPD compliance section

2. .env.template
   - Added JWT configuration variables (JWT_ALGORITHM, JWT_EXPIRY_HOURS)
   - Added API_AUTH_ENABLED flag
   - Added ALLOWED_ORIGINS for CORS configuration
   - Added command to generate secure JWT secret

3. docs/DEPLOYMENT.md (NEW)
   - Complete step-by-step deployment tutorial
   - Server preparation and requirements
   - Docker and firewall configuration
   - SSL/TLS setup with Let's Encrypt
   - Database migrations guide
   - Service startup order
   - Post-deployment verification checklist
   - Monitoring setup with Grafana
   - Backup configuration
   - Security hardening steps
   - Troubleshooting guide

4. docs/SECURITY.md (NEW)
   - Security architecture overview
   - JWT authentication guide
   - Network security configuration
   - Data protection guidelines
   - Input validation and prompt injection protection
   - Security headers and rate limiting
   - Monitoring and auditing setup
   - Incident response procedures
   - LGPD compliance requirements
   - Security checklists (pre/post deployment, periodic)
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
.env.template (1)

35-35: Improve security placeholders and add clearer generation guidance.

The placeholders for sensitive keys should use clearer formats and include direct instructions rather than generic descriptions.

- OPENAI_API_KEY=sk-your_openai_api_key_here
+ OPENAI_API_KEY=sk-CHANGE_ME_MUST_BE_SET

For JWT and encryption keys, provide inline instructions to generate secure values:

- # REQUIRED: Generate with: python -c "import secrets; print(secrets.token_hex(32))"
- JWT_SECRET=generate_a_secure_64_character_hex_key_here
+ # REQUIRED: Generate with: python3 -c "import secrets; print(secrets.token_hex(32))"
+ # Example: JWT_SECRET=a1b2c3d4e5f6... (64 hex characters)
+ JWT_SECRET=CHANGE_ME_GENERATE_WITH_COMMAND_ABOVE

And similarly for ENCRYPTION_KEY:

- # Encryption key for sensitive data
- ENCRYPTION_KEY=generate_another_32_character_key
+ # Encryption key for sensitive data
+ # Generate with: python3 -c "import secrets; print(secrets.token_hex(16))"
+ ENCRYPTION_KEY=CHANGE_ME_GENERATE_WITH_COMMAND_ABOVE

This prevents accidental use of placeholder values and makes the setup process more explicit.

Also applies to: 81-81, 92-92
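The "prevents accidental use of placeholder values" goal above can also be enforced at startup. A sketch (the marker list and function name are illustrative assumptions, matched to the placeholder formats suggested in this template):

```python
from typing import List, Mapping

# Substrings that indicate a value was copied straight from .env.template
PLACEHOLDER_MARKERS = ("change_me", "your_", "_here", "generate_a")

def find_placeholder_values(required: List[str], env: Mapping[str, str]) -> List[str]:
    """Return the names of required variables that are unset or still placeholders."""
    bad = []
    for name in required:
        value = env.get(name, "")
        if not value or any(m in value.lower() for m in PLACEHOLDER_MARKERS):
            bad.append(name)
    return bad
```

Calling this against `os.environ` during the existing `Config.validate()` step would catch both missing keys and keys that were never changed from the template.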

🧹 Nitpick comments (6)
docs/SECURITY.md (2)

446-448: Convert bare URLs to proper markdown links for better documentation.

Lines 446 and 551 contain bare URLs that should be formatted as markdown links for consistency with documentation standards and to improve user experience.

- - Security Team Email: security@your-domain.com
- - Emergency Phone: +55 XX XXXX-XXXX
- - External Security Consultant: contact info
+ - Security Team Email: [security@your-domain.com](mailto:security@your-domain.com)
+ - Emergency Phone: [+55 XX XXXX-XXXX](tel:+55XXXXXXXXXX)
+ - External Security Consultant: [contact info](CONTACT_INFO_URL)

And similarly at line 551 for the GitHub Issues URL:

- 1. **Do NOT** open a public GitHub issue
- 2. Email security details to: security@your-domain.com
+ 1. **Do NOT** open a public GitHub issue
+ 2. Email security details to: [security@your-domain.com](mailto:security@your-domain.com)

Also applies to: 551-553


117-119: Minor grammar improvements in token best practices section.

Line 117 repeats "only" and line 119 has a compound adjective that needs a hyphen:

- 1. **Never log tokens** - Tokens should not appear in logs
- 2. **Use HTTPS only** - Tokens must only be transmitted over TLS
- 3. **Short expiry** - Default 24 hours, configurable via `JWT_EXPIRY_HOURS`
- 4. **Secure storage** - Store tokens securely on client side
+ 1. **Never log tokens** - Tokens should not appear in logs
+ 2. **Use HTTPS only** - Tokens must be transmitted over TLS only
+ 3. **Short expiry** - Default 24 hours, configurable via `JWT_EXPIRY_HOURS`
+ 4. **Secure storage** - Store tokens securely on client-side systems
.env.template (2)

68-68: Quote SMTP_FROM_NAME value to ensure shell safety.

Line 68 contains spaces and should be quoted for safety when sourced in shell scripts:

- SMTP_FROM_NAME=Sistema Automacao Juridica
+ SMTP_FROM_NAME="Sistema Automacao Juridica"

1-4: Add usage instructions at the top of the template.

Provide clear guidance on how to use this template to avoid accidental deployment with placeholder values:

+ # IMPORTANT: This is a template file. To use it:
+ # 1. Copy to .env: cp .env.template .env
+ # 2. Edit all CHANGE_ME_ values
+ # 3. Replace all generated keys (JWT_SECRET, ENCRYPTION_KEY, passwords)
+ # 4. NEVER commit .env to version control
+ # 5. DO NOT use this template directly - always create a new .env file first
+ #
docs/DEPLOYMENT.md (2)

503-515: Consider adding backup encryption for production security.

The backup script (lines 503-515) compresses backups with gzip but does not encrypt them. This is inconsistent with the SECURITY.md guide which recommends encrypted backups.

Add GPG encryption to sensitive backups:

 # Backup PostgreSQL
 echo "Backing up PostgreSQL..."
-docker compose exec -T postgres pg_dump -U n8n_user n8n | gzip > "$BACKUP_DIR/database.sql.gz"
+docker compose exec -T postgres pg_dump -U n8n_user n8n | gzip | gpg --symmetric --cipher-algo AES256 -o "$BACKUP_DIR/database.sql.gz.gpg"
 
 # Backup N8N workflows
 echo "Backing up N8N workflows..."
 docker compose exec -T n8n n8n export:workflow --all --output=/home/node/.n8n/backup.json 2>/dev/null || true
-docker cp judicial-n8n:/home/node/.n8n/backup.json "$BACKUP_DIR/workflows.json" 2>/dev/null || true
+docker cp judicial-n8n:/home/node/.n8n/backup.json - 2>/dev/null | gpg --symmetric --cipher-algo AES256 -o "$BACKUP_DIR/workflows.json.gpg" || true
 
 # Backup configuration
 echo "Backing up configuration..."
-tar -czf "$BACKUP_DIR/config.tar.gz" -C /opt/judicial-automation .env config/
+tar -czf - -C /opt/judicial-automation .env config/ | gpg --symmetric --cipher-algo AES256 -o "$BACKUP_DIR/config.tar.gz.gpg"

Add a note that the GPG passphrase must be securely stored.


262-265: Ensure cron job has proper path context for docker-compose.

The crontab entry at line 264 references docker-compose but cron jobs may not have the same PATH as interactive shells:

- # Add: 0 3 * * * certbot renew --quiet --post-hook "docker compose -f /opt/judicial-automation/docker-compose.yml restart nginx"
+ # Add: 0 3 * * * /usr/local/bin/docker-compose -f /opt/judicial-automation/docker-compose.yml down && /usr/local/bin/docker-compose -f /opt/judicial-automation/docker-compose.yml up -d nginx
+ # Or use full paths: 0 3 * * * certbot renew --quiet --post-hook "sh -c 'cd /opt/judicial-automation && docker compose restart nginx'"

Alternatively, create a wrapper script and reference it:

cat > /opt/judicial-automation/scripts/shell/certbot-renew.sh << 'EOF'
#!/bin/bash
cd /opt/judicial-automation
docker compose restart nginx
EOF
chmod +x /opt/judicial-automation/scripts/shell/certbot-renew.sh

# Then crontab:
# 0 3 * * * certbot renew --quiet --post-hook "/opt/judicial-automation/scripts/shell/certbot-renew.sh"
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fd66572 and bde35ee.

📒 Files selected for processing (4)
  • .env.template (1 hunks)
  • README.md (7 hunks)
  • docs/DEPLOYMENT.md (1 hunks)
  • docs/SECURITY.md (1 hunks)
🧰 Additional context used
🪛 dotenv-linter (4.0.0)
.env.template

[warning] 12-12: [UnorderedKey] The N8N_PASSWORD key should go before the N8N_PORT key

(UnorderedKey)


[warning] 20-20: [UnorderedKey] The POSTGRES_DB key should go before the POSTGRES_HOST key

(UnorderedKey)


[warning] 22-22: [UnorderedKey] The POSTGRES_PASSWORD key should go before the POSTGRES_PORT key

(UnorderedKey)


[warning] 29-29: [UnorderedKey] The REDIS_PASSWORD key should go before the REDIS_PORT key

(UnorderedKey)


[warning] 37-37: [UnorderedKey] The OPENAI_MAX_TOKENS key should go before the OPENAI_MODEL key

(UnorderedKey)


[warning] 43-43: [UnorderedKey] The DATAJUD_PASSWORD key should go before the DATAJUD_USERNAME key

(UnorderedKey)


[warning] 44-44: [UnorderedKey] The DATAJUD_BASE_URL key should go before the DATAJUD_PASSWORD key

(UnorderedKey)


[warning] 57-57: [UnorderedKey] The GOOGLE_LOCATION key should go before the GOOGLE_PROJECT_ID key

(UnorderedKey)


[warning] 58-58: [UnorderedKey] The GOOGLE_PROCESSOR_ID key should go before the GOOGLE_PROJECT_ID key

(UnorderedKey)


[warning] 66-66: [UnorderedKey] The SMTP_PASSWORD key should go before the SMTP_PORT key

(UnorderedKey)


[warning] 67-67: [UnorderedKey] The SMTP_FROM_EMAIL key should go before the SMTP_HOST key

(UnorderedKey)


[warning] 68-68: [UnorderedKey] The SMTP_FROM_NAME key should go before the SMTP_HOST key

(UnorderedKey)


[warning] 68-68: [ValueWithoutQuotes] This value needs to be surrounded in quotes

(ValueWithoutQuotes)


[warning] 74-74: [UnorderedKey] The GRAFANA_PASSWORD key should go before the GRAFANA_USER key

(UnorderedKey)


[warning] 75-75: [UnorderedKey] The GF_SERVER_ROOT_URL key should go before the GRAFANA_PASSWORD key

(UnorderedKey)


[warning] 82-82: [UnorderedKey] The JWT_ALGORITHM key should go before the JWT_SECRET key

(UnorderedKey)


[warning] 83-83: [UnorderedKey] The JWT_EXPIRY_HOURS key should go before the JWT_SECRET key

(UnorderedKey)


[warning] 98-98: [UnorderedKey] The FLASK_DEBUG key should go before the FLASK_ENV key

(UnorderedKey)

🪛 Gitleaks (8.30.0)
docs/SECURITY.md

[high] 99-99: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)

🪛 LanguageTool
docs/SECURITY.md

[style] ~117-~117: This adverb was used twice in the sentence. Consider removing one of them or replacing them with a synonym.
Context: ...ogs 2. Use HTTPS only - Tokens must only be transmitted over TLS 3. **Short expi...

(ADVERB_REPETITION_PREMIUM)


[uncategorized] ~119-~119: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...re storage** - Store tokens securely on client side 5. Revocation - Implement token bla...

(EN_COMPOUND_ADJECTIVE_INTERNAL)

🪛 markdownlint-cli2 (0.18.1)
docs/SECURITY.md

26-26: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


446-446: Bare URL used

(MD034, no-bare-urls)


551-551: Bare URL used

(MD034, no-bare-urls)

🔇 Additional comments (4)
docs/SECURITY.md (1)

1-560: Comprehensive and well-structured security guide.

The security documentation provides excellent coverage of authentication, network isolation, data protection, input validation, monitoring, and LGPD compliance. The architecture diagrams and code examples are clear and actionable. The security checklist and incident response procedures add practical value for operational teams.

README.md (2)

157-243: Comprehensive and well-structured JWT authentication documentation.

The README updates properly document the new JWT-centric authentication approach with clear examples, a token endpoint, usage patterns, and infrastructure security considerations. The addition of the infrastructure security diagram and the updated LGPD compliance section enhance the overall documentation quality.


157-169: JWT authentication implementation verified and documented correctly.

The /auth/token endpoint exists and all documented API endpoints properly enforce JWT authentication:

  • /auth/token endpoint implemented at scripts/python/app.py:198 with correct response format
  • ✓ All API endpoints (/extract-pdf, /firac-analysis, /datajud-search, /distinguish-analysis, /generate-document) use @require_auth decorator
  • ✓ JWT helper functions implemented with proper token creation, validation, and expiry handling (default 24 hours)
  • ✓ Bearer token validation correctly extracts and verifies tokens

Note: The /health endpoint is publicly accessible without authentication and is not documented in the README. Consider documenting this explicitly if it's intentionally public for monitoring purposes.
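The token mechanics verified above (HS256 signing, expiry check, Bearer validation) can be sketched with only the standard library. This is illustrative of the JWT structure, not the app's actual implementation, which presumably uses a library such as PyJWT:

```python
import base64
import hashlib
import hmac
import json
import time

def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def make_jwt(payload: dict, secret: str, expiry_hours: int = 24) -> str:
    """Build an HS256 JWT by hand (default 24h expiry, as documented)."""
    header = {"alg": "HS256", "typ": "JWT"}
    claims = dict(payload, exp=int(time.time()) + expiry_hours * 3600)
    signing_input = f"{_b64url(json.dumps(header).encode())}.{_b64url(json.dumps(claims).encode())}"
    sig = hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(sig)}"

def verify_jwt(token: str, secret: str) -> dict:
    """Verify signature and expiry; raise ValueError on failure."""
    signing_input, _, sig_b64 = token.rpartition(".")
    expected = hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(_b64url(expected), sig_b64):
        raise ValueError("bad signature")
    claims_b64 = signing_input.split(".")[1]
    padded = claims_b64 + "=" * (-len(claims_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(padded))
    if claims["exp"] < time.time():
        raise ValueError("token expired")
    return claims
```

The constant-time `hmac.compare_digest` and the explicit expiry check mirror the properties the review confirmed in the app's JWT helpers.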

docs/DEPLOYMENT.md (1)

1-730: Comprehensive and production-ready deployment documentation.

The deployment guide provides excellent step-by-step coverage from server preparation through post-deployment verification. Service startup order is properly sequenced, health checks are included, SSL/TLS setup covers both Let's Encrypt and self-signed scenarios, and troubleshooting section is practical. The guide properly emphasizes security hardening, firewall configuration, and backup strategies.
