
Add missing infrastructure files and database schema #1

Open
fbmoulin wants to merge 8 commits into main from claude/project-analysis-planning-01MDFn863jteLVoZXZPj61Zf

Conversation

@fbmoulin
Owner

@fbmoulin fbmoulin commented Dec 4, 2025

  • Add Dockerfile.python for Python microservices container
  • Add nginx.conf with SSL, rate limiting, and reverse proxy config
  • Add Python Flask API application (app.py)
  • Add PostgreSQL database initialization script with:
    • Judicial schema with cases, documents, analyses tables
    • Audit schema for change tracking
    • Triggers for automatic timestamps and audit logging
    • Views for case summary and processing statistics
  • Add SSL certificates directory placeholder

Summary by CodeRabbit

  • New Features

    • Legal document automation API with six public endpoints (health, PDF extraction, FIRAC analysis, jurisprudence search, precedent analysis, document generation)
    • OpenAPI spec and user-facing API docs for integration and examples
  • Infrastructure

    • CI and multi-stage deployment pipelines (build, test, security scans, image publishing)
    • Database migrations and initial judicial/audit schema
    • Secure reverse-proxy with TLS, rate limits, health checks
    • Production-ready container image layout and safer compose exposure
    • Environment template with configuration placeholders
  • Docs & Tests

    • Expanded deployment/security docs, migration guide, comprehensive unit/integration tests
  • Chores

    • Repo ignore rules and Dependabot configuration added


- Add Dockerfile.python for Python microservices container
- Add nginx.conf with SSL, rate limiting, and reverse proxy config
- Add Python Flask API application (app.py)
- Add PostgreSQL database initialization script with:
  - Judicial schema with cases, documents, analyses tables
  - Audit schema for change tracking
  - Triggers for automatic timestamps and audit logging
  - Views for case summary and processing statistics
- Add SSL certificates directory placeholder
- Add requirements.txt with all Python dependencies
- Add pyproject.toml with modern Python project configuration
  - Package metadata and classifiers
  - Optional dependencies for dev and monitoring
  - Tool configurations for black, isort, mypy, pytest
- Add .env.template with documented environment variables
- Add .gitignore for Python, Docker, secrets, and IDE files
- Add conftest.py with shared fixtures for testing
- Add unit tests for PDFExtractor class
- Add unit tests for all API endpoints:
  - /health, /extract-pdf, /firac-analysis
  - /datajud-search, /distinguish-analysis, /generate-document
- Add integration tests for complete workflows
- Add tests for error handling and service degradation
- Add pytest.ini with markers and configuration
- Test structure: tests/unit/, tests/integration/
- Add CI pipeline (ci.yml) with:
  - Code quality checks (black, isort, flake8, mypy)
  - Unit tests with pytest
  - Integration tests with Redis and PostgreSQL services
  - Test coverage reporting with Codecov
  - Security scanning (bandit, safety)
  - Docker build verification
- Add deployment pipeline (deploy.yml) with:
  - Docker image build and push to GHCR
  - Staging deployment with smoke tests
  - Production deployment with health checks
  - GitHub Release creation for tags
  - Manual rollback capability
- Add Dependabot configuration for:
  - Python dependencies (weekly)
  - GitHub Actions (weekly)
  - Docker images (weekly)
- Add OpenAPI 3.1 specification (openapi.yaml) with:
  - All API endpoints documented
  - Request/response schemas
  - Error responses
  - Authentication details
  - Rate limiting info
- Add comprehensive API documentation (API_DOCUMENTATION.md):
  - Endpoint descriptions with examples
  - curl examples for each endpoint
  - Python and JavaScript SDK examples
  - Error handling guide
  - LGPD compliance information
- Add alembic.ini configuration
- Add migrations/env.py with environment setup
- Add migrations/models.py with SQLAlchemy ORM models:
  - Case, Document, FIRACAnalysis
  - JurisprudenceSearch, DistinguishAnalysis
  - GeneratedDocument, Notification, AuditLog
- Add initial migration (001) creating:
  - judicial and audit schemas
  - All tables with indexes and constraints
  - Enum types for case_status and document_type
  - Update timestamp triggers
- Add migrations/README.md with usage documentation
@coderabbitai

coderabbitai bot commented Dec 4, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Adds project foundation: environment template, CI/CD (Dependabot, GitHub Actions), Alembic migrations and SQLAlchemy models, a Flask API (PDF extraction, FIRAC, DataJud, distinguish, document generation) with Redis/OpenAI integrations, Docker/Nginx artifacts, OpenAPI and docs, DB init scripts, and tests.

Changes

Cohort / File(s) — Change Summary

  • Environment & Tooling (.env.template, pyproject.toml, requirements.txt, pytest.ini, alembic.ini): New environment template and project/tooling manifests: dependency pins, build system, Alembic settings, pytest configuration, and project metadata.
  • CI / CD / Repo (.github/dependabot.yml, .github/workflows/ci.yml, .github/workflows/deploy.yml, .gitignore): Added Dependabot config, CI pipeline (lint, tests, coverage, security, docker build), deploy workflow (build, staging, production, rollback), and a consolidated .gitignore.
  • Database & Migrations (migrations/*, migrations/versions/..._initial_schema.py, migrations/README.md, migrations/script.py.mako): Added Alembic env, migration template, comprehensive SQLAlchemy models/enums, and an initial migration creating schemas, enums, tables, indexes, and triggers; README for migrations.
  • DB Init Script (scripts/sql/init_database.sql): New full PostgreSQL initialization script: extensions, schemas, tables, indexes, triggers, audit functions, views, initial data, and grants.
  • API Application (scripts/python/app.py): New Flask microservice implementing a JWT auth/token endpoint, PDF extraction, FIRAC analysis, a DataJud client, distinguish analysis, document generation, Redis caching, OpenAI integration, input validation, middleware, and error handlers.
  • Infrastructure (infrastructure/docker/Dockerfile.python, infrastructure/nginx/nginx.conf, docker-compose.yml, security/ssl_certificates/README.txt): Multi-stage Python Dockerfile, Nginx reverse proxy with TLS/rate limiting, docker-compose changed to use expose, and an SSL certs README.
  • API Docs & Spec (docs/API_DOCUMENTATION.md, docs/api/openapi.yaml): Human-readable API docs and an OpenAPI 3.1.0 spec with six endpoints, request/response schemas, JWT bearer security, examples, and components.
  • Deployment & Security Docs (docs/DEPLOYMENT.md, docs/SECURITY.md, README.md): Added deployment guide, security guide, and README updates describing JWT auth, infrastructure diagrams, LGPD notes, and usage examples.
  • Tests (tests/*, tests/unit/*, tests/integration/*): Test package initializers, conftest.py with fixtures/mocks, unit tests for routes and the PDF extractor, and integration tests covering end-to-end flows and failure scenarios.
  • Misc / Templates (migrations/script.py.mako, security/ssl_certificates/README.txt, tests/__init__.py): Misc templates, small READMEs, and test package markers supporting migrations, SSL handling, and tests.

Sequence Diagram(s)

sequenceDiagram
    actor Client
    participant API as Flask API
    participant Redis
    participant OpenAI
    participant Datajud as DATAJUD API
    participant DB as PostgreSQL

    Client->>API: POST /extract-pdf (pdf/base64)
    API->>Redis: GET extracted_text(cache)
    alt cache hit
        Redis-->>API: cached text
    else
        API->>API: extract via PyPDF2
        API->>Redis: SET extracted_text(cache)
    end
    API-->>Client: extracted text

    Client->>API: POST /firac-analysis (text)
    API->>Redis: GET firac_analysis(cache)
    alt cache hit
        Redis-->>API: cached analysis
    else
        API->>OpenAI: FIRAC prompt (GPT-4)
        OpenAI-->>API: structured FIRAC response
        API->>DB: INSERT firac_analyses
        API->>Redis: SET firac_analysis(cache)
    end
    API-->>Client: FIRAC analysis

    Client->>API: POST /datajud-search (params)
    API->>Datajud: search request
    Datajud-->>API: results
    API->>DB: INSERT jurisprudence_searches
    API-->>Client: search results

    Client->>API: POST /distinguish-analysis (facts, precedent)
    API->>OpenAI: distinguish prompt
    OpenAI-->>API: applicability analysis
    API->>DB: INSERT distinguish_analyses
    API-->>Client: distinguish result

    Client->>API: POST /generate-document (type, case)
    API->>OpenAI: generation prompt
    OpenAI-->>API: document content
    API->>DB: INSERT generated_documents
    API-->>Client: generated document
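The /extract-pdf branch in the diagram is a classic cache-aside pattern: check the cache, fall back to extraction on a miss, then populate the cache. A minimal sketch, with a plain dict standing in for Redis and the function and key names being hypothetical, not taken from app.py:

```python
import hashlib

def extract_pdf_cached(pdf_bytes: bytes, cache: dict, extract) -> str:
    """Cache-aside lookup keyed on a hash of the PDF content."""
    key = "extracted_text:" + hashlib.sha256(pdf_bytes).hexdigest()
    cached = cache.get(key)          # Redis GET in the real service
    if cached is not None:
        return cached                # cache hit: skip extraction entirely
    text = extract(pdf_bytes)        # PyPDF2 extraction in the real service
    cache[key] = text                # Redis SET (with a TTL in production)
    return text
```

A second call with the same bytes returns the cached text without invoking the extractor again, which is the behavior the `alt cache hit` branch above encodes.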

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • Focus review on:
    • scripts/python/app.py: input validation, JWT auth/token flow, OpenAI prompt composition and error handling, Redis fallback, and security headers.
    • migrations/models.py and migrations/versions/..._initial_schema.py: FK constraints, cascade semantics, enum creation idempotency, indexes, and triggers.
    • scripts/sql/init_database.sql: audit functions/triggers, view definitions, grants, and initial data.
    • .github/workflows/deploy.yml: secret usage, image tagging, and rollback steps.

Poem

🐰 I nibble docs and spin a test,

Schemas sprout in tidy nests.
Pipelines hum and servers sing,
PDFs, FIRACs, prompts take wing.
A rabbit cheers — deploy, then rest.

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
  • Description Check — ✅ Passed — Check skipped: CodeRabbit's high-level summary is enabled.
  • Title check — ✅ Passed — The title accurately reflects the main addition: infrastructure files (Dockerfile, nginx.conf, SSL directory) and database schema (migrations, SQL scripts, Alembic configuration) are the primary content of this comprehensive PR.
  • Docstring Coverage — ✅ Passed — Docstring coverage is 90.29%, which is sufficient. The required threshold is 80.00%.

@gemini-code-assist

Summary of Changes

Hello @fbmoulin, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request establishes the core infrastructure and initial microservices for a judicial automation platform. It sets up the environment for Python-based services, configures Nginx for secure and efficient traffic management, and lays down a comprehensive PostgreSQL database schema with migration capabilities. The included Flask API provides essential functionalities for legal document processing and AI-driven analysis, supported by automated dependency updates and a foundational test suite.

Highlights

  • Core Infrastructure Setup: Introduced foundational infrastructure components including a Dockerfile for Python microservices, Nginx configuration for reverse proxy with SSL and rate limiting, and a comprehensive .env.template for environment variables.
  • Database Schema and Migrations: Added a robust PostgreSQL database schema with dedicated judicial and audit schemas, defining tables for cases, documents, various legal analyses (FIRAC, distinguish), generated documents, and audit logs. Alembic is configured for managing database migrations.
  • Python Flask API: Implemented a new Python Flask API (app.py) to handle core judicial automation functionalities, including PDF text extraction, FIRAC legal analysis, DATAJUD jurisprudence search, distinguish analysis, and legal document generation, leveraging OpenAI and other services.
  • Automated Dependency Management: Integrated Dependabot via .github/dependabot.yml to automate dependency updates for Python, GitHub Actions, and Docker, ensuring the project stays up-to-date with security patches and new features.
  • API Documentation and Testing: Provided detailed API documentation in Markdown and an OpenAPI (Swagger) specification for easy understanding and integration. Initial unit and integration tests, along with Pytest configuration, are included to ensure code quality and functionality.
Ignored Files
  • Ignored by pattern: .github/workflows/** (2)
    • .github/workflows/ci.yml
    • .github/workflows/deploy.yml


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a significant amount of infrastructure and application code, including Docker setup, Nginx configuration, a Python Flask API, and a complete database schema with migrations. The overall structure is very well-organized and follows many best practices, such as using multi-stage Docker builds, security headers in Nginx, and a comprehensive test suite structure. However, there are several critical security vulnerabilities related to hardcoded passwords and some significant bugs in the Python application due to outdated library usage. My review focuses on addressing these critical issues, improving security, fixing bugs, and enhancing maintainability by removing redundancies.

Comment thread scripts/python/app.py
Comment on lines +101 to +109
```python
response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "Voce e um especialista em analise juridica brasileira."},
        {"role": "user", "content": prompt}
    ],
    max_tokens=2000,
    temperature=0.3
)
```

critical

The openai.ChatCompletion.create syntax is for openai library version < 1.0.0. Your requirements.txt specifies openai==1.6.1, which uses a new client-based API. This code will raise an AttributeError. The suggested change updates the call to the new syntax. For better performance, you may want to initialize the OpenAI client once globally rather than in each function call. This same issue exists in DistinguishAnalyzer and DocumentGenerator.

Suggested change:

```python
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "Voce e um especialista em analise juridica brasileira."},
        {"role": "user", "content": prompt}
    ],
    max_tokens=2000,
    temperature=0.3
)
```
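On the reviewer's aside about initializing the client once rather than per call: a memoized factory is one common shape for that. The sketch below uses `functools.lru_cache` with a stand-in object in place of the real `OpenAI()` constructor, so the pattern is shown without assuming anything about client configuration:

```python
from functools import lru_cache

@lru_cache(maxsize=1)
def get_openai_client():
    # Stand-in for `OpenAI()`; in app.py this would construct the real
    # client on first use, and every subsequent caller reuses the same
    # instance instead of paying construction cost in each function.
    return object()
```

Callers invoke `get_openai_client()` wherever they previously reached for a module-level `openai` object; the `lru_cache(maxsize=1)` guarantees a single shared instance.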

Comment thread alembic.ini Outdated

```ini
# SQLAlchemy URL (can be overridden by env)
# Use env variable in production
sqlalchemy.url = postgresql://n8n_user:password@localhost:5432/n8n
```

critical

Hardcoding credentials, even for a local database with a simple password, is a significant security risk. The sqlalchemy.url is committed to version control with a password. It's much safer to rely entirely on environment variables, which your migrations/env.py script already supports. Please remove this line or comment it out to prevent accidental credential exposure.

```ini
# sqlalchemy.url = postgresql://n8n_user:password@localhost:5432/n8n
```

Comment thread scripts/python/app.py Outdated
Comment on lines +498 to +504
```python
if __name__ == '__main__':
    logger.info("Starting Judicial Automation API...")
    logger.info(f"OpenAI configured: {bool(openai.api_key)}")
    logger.info(f"DATAJUD configured: {bool(datajud_client.username)}")
    logger.info(f"Redis configured: {redis_client is not None}")

    app.run(host='0.0.0.0', port=5000, debug=os.getenv('FLASK_DEBUG', 'false').lower() == 'true')
```

critical

The application is started using app.run(), which runs the Flask development server. This server is not suitable for production as it's not designed to be efficient, stable, or secure. Your Dockerfile executes this script directly. You should use a production-grade WSGI server like Gunicorn or uWSGI to run the application, and update the CMD in your Dockerfile accordingly. This if __name__ == '__main__': block is useful for local development but shouldn't be the entrypoint for production.
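A minimal sketch of the kind of Dockerfile change being suggested, assuming Gunicorn is added to the image and the Flask application object is named `app` inside `app.py` (both are assumptions, not confirmed by the PR):

```dockerfile
# Swap the Flask development server for a production WSGI server.
RUN pip install --no-cache-dir gunicorn

# 4 workers is a placeholder; tune to CPU count and workload.
CMD ["gunicorn", "--bind", "0.0.0.0:5000", "--workers", "4", "app:app"]
```

The `if __name__ == '__main__':` block can then stay for local `python app.py` runs while the container entrypoint bypasses it entirely.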

Comment thread scripts/python/app.py
Comment on lines +311 to +319
```python
response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "Voce e um magistrado especialista em redacao de pecas judiciais."},
        {"role": "user", "content": prompts[document_type]}
    ],
    max_tokens=3000,
    temperature=0.3
)
```

critical

The openai.ChatCompletion.create syntax is outdated for openai>=1.0.0 and will cause a runtime error. This needs to be updated to use the new client-based API, similar to the other OpenAI calls in this file.

Suggested change:

```python
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "Voce e um magistrado especialista em redacao de pecas judiciais."},
        {"role": "user", "content": prompts[document_type]}
    ],
    max_tokens=3000,
    temperature=0.3
)
```

Comment thread migrations/env.py Outdated
```python
port = os.getenv("POSTGRES_PORT", "5432")
db = os.getenv("POSTGRES_DB", "n8n")
user = os.getenv("POSTGRES_USER", "n8n_user")
password = os.getenv("POSTGRES_PASSWORD", "password")
```

critical

The get_url function has a hardcoded default password 'password'. This is a critical security risk, as it might be used accidentally in a real environment if the POSTGRES_PASSWORD environment variable is not set. The application should fail explicitly if a password is not provided, preventing connections with weak or default credentials.

Suggested change:

```python
password = os.environ["POSTGRES_PASSWORD"]  # Fail loudly if not set
```

Comment thread pytest.ini
Comment on lines +1 to +43
```ini
[pytest]
# Sistema de Automacao Juridica - Pytest Configuration

# Test discovery
testpaths = tests
python_files = test_*.py *_test.py
python_classes = Test*
python_functions = test_*

# Minimum version
minversion = 7.0

# Default options
addopts =
    -ra
    -q
    --strict-markers
    --tb=short
    -v

# Markers
markers =
    unit: Unit tests (fast, no external dependencies)
    integration: Integration tests (may require mocked services)
    slow: Slow running tests
    requires_api: Tests requiring external API access

# Logging
log_cli = true
log_cli_level = INFO
log_cli_format = %(asctime)s [%(levelname)s] %(message)s
log_cli_date_format = %Y-%m-%d %H:%M:%S

# Warnings
filterwarnings =
    ignore::DeprecationWarning
    ignore::PendingDeprecationWarning

# Timeout (requires pytest-timeout)
# timeout = 30

# Coverage settings (when running with --cov)
# Configured in pyproject.toml
```

medium

This pytest.ini file duplicates several configurations that are also present in pyproject.toml (e.g., testpaths, python_files, filterwarnings). This can lead to confusion and make maintenance harder, as it's unclear which file takes precedence. It's recommended to consolidate all tool configurations into pyproject.toml to have a single source of truth.
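If the project consolidates, the duplicated keys would move under `[tool.pytest.ini_options]` in pyproject.toml, which takes precedence once pytest.ini is removed. A partial sketch covering only the duplicated settings (values copied from the pytest.ini quoted above):

```toml
[tool.pytest.ini_options]
minversion = "7.0"
testpaths = ["tests"]
python_files = ["test_*.py", "*_test.py"]
addopts = "-ra -q --strict-markers --tb=short -v"
markers = [
    "unit: Unit tests (fast, no external dependencies)",
    "integration: Integration tests (may require mocked services)",
]
filterwarnings = [
    "ignore::DeprecationWarning",
    "ignore::PendingDeprecationWarning",
]
```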

Comment thread pyproject.toml
```
| buck-out
| build
| dist
| migrations
```

medium

Excluding the entire migrations directory from Black formatting is not ideal. While the auto-generated migration files can be noisy, the env.py and models.py files within it would benefit from consistent formatting. Consider refining the exclusion pattern or running Black on those specific files manually.

Comment thread .env.template Outdated
Comment on lines +80 to +81
```
JWT_SECRET=generate_a_secure_32_character_key_here
ENCRYPTION_KEY=generate_another_32_character_key
```

medium

The placeholder values for JWT_SECRET and ENCRYPTION_KEY are not secure. While this is a template, providing insecure placeholders can lead to them being used in development or even staging environments. It's better to provide instructions on how to generate secure keys, or leave them blank.
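For example, the template could point readers at Python's `secrets` module for key generation (a sketch; the "32" comes from the placeholder text above, and the output is random on every run):

```python
import secrets

# Generate a strong, URL-safe value suitable for JWT_SECRET or
# ENCRYPTION_KEY; token_urlsafe(32) draws 32 random bytes, which
# encodes to a 43-character base64url string.
print(secrets.token_urlsafe(32))
```

The one-liner `python -c 'import secrets; print(secrets.token_urlsafe(32))'` works equally well as an instruction embedded in the .env.template comments.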

Comment thread docs/api/openapi.yaml
Comment on lines +504 to +505
```yaml
security:
  - bearerAuth: []
```

medium

Applying security at the top level makes all endpoints require authentication. This is likely not intended for the /health endpoint, which is typically public for monitoring purposes. It's better to define security requirements on a per-path or per-operation basis for more granular control. Consider removing this global security definition and applying it only to the endpoints that need protection.
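The per-operation form would look roughly like this (paths and operations abbreviated; a sketch of the shape, not the spec's actual contents):

```yaml
# Drop the document-level `security:` block and scope it per operation.
paths:
  /health:
    get:
      security: []          # empty array = explicitly public
  /extract-pdf:
    post:
      security:
        - bearerAuth: []    # JWT required for protected endpoints
```

An empty `security: []` on an operation overrides any global requirement, which is the standard OpenAPI idiom for carving out a public health check.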

Comment on lines +1 to +313
-- Sistema de Automacao Juridica - Database Initialization
-- PostgreSQL 15

-- Create extensions
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
CREATE EXTENSION IF NOT EXISTS "pg_trgm";

-- Create schemas
CREATE SCHEMA IF NOT EXISTS judicial;
CREATE SCHEMA IF NOT EXISTS audit;

-- Set search path
SET search_path TO judicial, public;

-- =====================================================
-- MAIN TABLES
-- =====================================================

-- Cases table
CREATE TABLE IF NOT EXISTS judicial.cases (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
case_number VARCHAR(50) NOT NULL UNIQUE,
tribunal VARCHAR(20) NOT NULL,
court VARCHAR(100),
class_code INTEGER,
class_name VARCHAR(255),
subject_codes INTEGER[],
status VARCHAR(50) DEFAULT 'pending',
priority INTEGER DEFAULT 0,
received_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
processed_at TIMESTAMP WITH TIME ZONE,
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

-- Create index for case search
CREATE INDEX IF NOT EXISTS idx_cases_number ON judicial.cases(case_number);
CREATE INDEX IF NOT EXISTS idx_cases_tribunal ON judicial.cases(tribunal);
CREATE INDEX IF NOT EXISTS idx_cases_status ON judicial.cases(status);
CREATE INDEX IF NOT EXISTS idx_cases_created ON judicial.cases(created_at);

-- Documents table
CREATE TABLE IF NOT EXISTS judicial.documents (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
case_id UUID REFERENCES judicial.cases(id) ON DELETE CASCADE,
document_type VARCHAR(50) NOT NULL,
file_name VARCHAR(255),
file_path VARCHAR(500),
file_size BIGINT,
mime_type VARCHAR(100),
extracted_text TEXT,
page_count INTEGER,
checksum VARCHAR(64),
uploaded_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
processed_at TIMESTAMP WITH TIME ZONE,
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

-- Create index for document search
CREATE INDEX IF NOT EXISTS idx_documents_case ON judicial.documents(case_id);
CREATE INDEX IF NOT EXISTS idx_documents_type ON judicial.documents(document_type);
CREATE INDEX IF NOT EXISTS idx_documents_text_gin ON judicial.documents USING gin(to_tsvector('portuguese', extracted_text));

-- FIRAC Analysis table
CREATE TABLE IF NOT EXISTS judicial.firac_analyses (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
case_id UUID REFERENCES judicial.cases(id) ON DELETE CASCADE,
document_id UUID REFERENCES judicial.documents(id) ON DELETE SET NULL,
facts TEXT,
issues TEXT,
rules TEXT,
analysis TEXT,
conclusion TEXT,
raw_response JSONB,
model_used VARCHAR(50),
tokens_used INTEGER,
confidence_score DECIMAL(3,2),
analyzed_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

CREATE INDEX IF NOT EXISTS idx_firac_case ON judicial.firac_analyses(case_id);

-- Jurisprudence Search Results table
CREATE TABLE IF NOT EXISTS judicial.jurisprudence_searches (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
case_id UUID REFERENCES judicial.cases(id) ON DELETE CASCADE,
search_query JSONB NOT NULL,
tribunal VARCHAR(20),
total_results INTEGER,
results JSONB,
search_duration_ms INTEGER,
searched_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

CREATE INDEX IF NOT EXISTS idx_jurisprudence_case ON judicial.jurisprudence_searches(case_id);
CREATE INDEX IF NOT EXISTS idx_jurisprudence_tribunal ON judicial.jurisprudence_searches(tribunal);

-- Distinguish Analysis table
CREATE TABLE IF NOT EXISTS judicial.distinguish_analyses (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
case_id UUID REFERENCES judicial.cases(id) ON DELETE CASCADE,
jurisprudence_search_id UUID REFERENCES judicial.jurisprudence_searches(id) ON DELETE SET NULL,
current_facts TEXT NOT NULL,
precedent_data JSONB,
is_applicable BOOLEAN,
similarities TEXT,
differences TEXT,
recommendation TEXT,
raw_response JSONB,
confidence_score DECIMAL(3,2),
analyzed_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

CREATE INDEX IF NOT EXISTS idx_distinguish_case ON judicial.distinguish_analyses(case_id);

-- Generated Documents table
CREATE TABLE IF NOT EXISTS judicial.generated_documents (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
case_id UUID REFERENCES judicial.cases(id) ON DELETE CASCADE,
firac_analysis_id UUID REFERENCES judicial.firac_analyses(id) ON DELETE SET NULL,
distinguish_analysis_id UUID REFERENCES judicial.distinguish_analyses(id) ON DELETE SET NULL,
document_type VARCHAR(50) NOT NULL,
title VARCHAR(255),
content TEXT NOT NULL,
google_docs_id VARCHAR(255),
google_docs_url VARCHAR(500),
version INTEGER DEFAULT 1,
status VARCHAR(50) DEFAULT 'draft',
generated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
published_at TIMESTAMP WITH TIME ZONE,
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

CREATE INDEX IF NOT EXISTS idx_generated_case ON judicial.generated_documents(case_id);
CREATE INDEX IF NOT EXISTS idx_generated_type ON judicial.generated_documents(document_type);
CREATE INDEX IF NOT EXISTS idx_generated_status ON judicial.generated_documents(status);

-- Notifications table
CREATE TABLE IF NOT EXISTS judicial.notifications (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
case_id UUID REFERENCES judicial.cases(id) ON DELETE CASCADE,
notification_type VARCHAR(50) NOT NULL,
recipient_email VARCHAR(255),
subject VARCHAR(255),
message TEXT,
status VARCHAR(50) DEFAULT 'pending',
sent_at TIMESTAMP WITH TIME ZONE,
error_message TEXT,
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

CREATE INDEX IF NOT EXISTS idx_notifications_case ON judicial.notifications(case_id);
CREATE INDEX IF NOT EXISTS idx_notifications_status ON judicial.notifications(status);

-- =====================================================
-- AUDIT TABLES
-- =====================================================

-- Audit log table
CREATE TABLE IF NOT EXISTS audit.logs (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
table_name VARCHAR(100) NOT NULL,
record_id UUID,
action VARCHAR(20) NOT NULL,
old_data JSONB,
new_data JSONB,
user_id VARCHAR(100),
ip_address INET,
user_agent TEXT,
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

CREATE INDEX IF NOT EXISTS idx_audit_table ON audit.logs(table_name);
CREATE INDEX IF NOT EXISTS idx_audit_record ON audit.logs(record_id);
CREATE INDEX IF NOT EXISTS idx_audit_action ON audit.logs(action);
CREATE INDEX IF NOT EXISTS idx_audit_created ON audit.logs(created_at);

-- =====================================================
-- FUNCTIONS AND TRIGGERS
-- =====================================================

-- Update timestamp function
CREATE OR REPLACE FUNCTION update_updated_at_column()
RETURNS TRIGGER AS $$
BEGIN
NEW.updated_at = NOW();
RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Apply update trigger to relevant tables
CREATE TRIGGER update_cases_updated_at
BEFORE UPDATE ON judicial.cases
FOR EACH ROW
EXECUTE FUNCTION update_updated_at_column();

CREATE TRIGGER update_generated_documents_updated_at
BEFORE UPDATE ON judicial.generated_documents
FOR EACH ROW
EXECUTE FUNCTION update_updated_at_column();

-- Audit logging function
CREATE OR REPLACE FUNCTION audit.log_changes()
RETURNS TRIGGER AS $$
BEGIN
IF TG_OP = 'INSERT' THEN
INSERT INTO audit.logs (table_name, record_id, action, new_data)
VALUES (TG_TABLE_SCHEMA || '.' || TG_TABLE_NAME, NEW.id, 'INSERT', to_jsonb(NEW));
RETURN NEW;
ELSIF TG_OP = 'UPDATE' THEN
INSERT INTO audit.logs (table_name, record_id, action, old_data, new_data)
VALUES (TG_TABLE_SCHEMA || '.' || TG_TABLE_NAME, NEW.id, 'UPDATE', to_jsonb(OLD), to_jsonb(NEW));
RETURN NEW;
ELSIF TG_OP = 'DELETE' THEN
INSERT INTO audit.logs (table_name, record_id, action, old_data)
VALUES (TG_TABLE_SCHEMA || '.' || TG_TABLE_NAME, OLD.id, 'DELETE', to_jsonb(OLD));
RETURN OLD;
END IF;
RETURN NULL;
END;
$$ LANGUAGE plpgsql;

-- Apply audit triggers
CREATE TRIGGER audit_cases
AFTER INSERT OR UPDATE OR DELETE ON judicial.cases
FOR EACH ROW EXECUTE FUNCTION audit.log_changes();

CREATE TRIGGER audit_documents
AFTER INSERT OR UPDATE OR DELETE ON judicial.documents
FOR EACH ROW EXECUTE FUNCTION audit.log_changes();

CREATE TRIGGER audit_generated_documents
AFTER INSERT OR UPDATE OR DELETE ON judicial.generated_documents
FOR EACH ROW EXECUTE FUNCTION audit.log_changes();

-- =====================================================
-- VIEWS
-- =====================================================

-- Case summary view
CREATE OR REPLACE VIEW judicial.case_summary AS
SELECT
c.id,
c.case_number,
c.tribunal,
c.court,
c.class_name,
c.status,
c.priority,
c.received_at,
c.processed_at,
COUNT(DISTINCT d.id) AS document_count,
COUNT(DISTINCT f.id) AS firac_analysis_count,
COUNT(DISTINCT g.id) AS generated_document_count,
MAX(g.generated_at) AS last_document_generated
FROM judicial.cases c
LEFT JOIN judicial.documents d ON d.case_id = c.id
LEFT JOIN judicial.firac_analyses f ON f.case_id = c.id
LEFT JOIN judicial.generated_documents g ON g.case_id = c.id
GROUP BY c.id;

-- Processing statistics view
CREATE OR REPLACE VIEW judicial.processing_stats AS
SELECT
DATE(created_at) AS date,
COUNT(*) AS total_cases,
COUNT(CASE WHEN status = 'completed' THEN 1 END) AS completed_cases,
COUNT(CASE WHEN status = 'pending' THEN 1 END) AS pending_cases,
COUNT(CASE WHEN status = 'error' THEN 1 END) AS error_cases,
AVG(EXTRACT(EPOCH FROM (processed_at - received_at))) AS avg_processing_seconds
FROM judicial.cases
WHERE created_at >= NOW() - INTERVAL '30 days'
GROUP BY DATE(created_at)
ORDER BY date DESC;

-- =====================================================
-- INITIAL DATA
-- =====================================================

-- Insert common document types
CREATE TABLE IF NOT EXISTS judicial.document_types (
code VARCHAR(50) PRIMARY KEY,
name VARCHAR(100) NOT NULL,
description TEXT
);

INSERT INTO judicial.document_types (code, name, description) VALUES
('sentenca', 'Sentenca', 'Decisao judicial que resolve o merito da causa'),
('despacho', 'Despacho', 'Ato judicial de mero expediente'),
('decisao', 'Decisao Interlocutoria', 'Decisao que resolve questao incidente'),
('acordao', 'Acordao', 'Decisao colegiada de tribunal'),
('peticao', 'Peticao', 'Documento da parte'),
('parecer', 'Parecer', 'Opiniao tecnica do Ministerio Publico')
ON CONFLICT (code) DO NOTHING;

-- Grant permissions
GRANT USAGE ON SCHEMA judicial TO n8n_user;
GRANT USAGE ON SCHEMA audit TO n8n_user;
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA judicial TO n8n_user;
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA audit TO n8n_user;
GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA judicial TO n8n_user;

-- Success message
DO $$
BEGIN
RAISE NOTICE 'Database initialization completed successfully!';
RAISE NOTICE 'Schemas created: judicial, audit';
RAISE NOTICE 'Tables created: cases, documents, firac_analyses, jurisprudence_searches, distinguish_analyses, generated_documents, notifications';
END $$;
medium

This SQL script manually defines the database schema, which is also managed by Alembic migrations. Maintaining two sources of truth for the schema is problematic and can lead to inconsistencies: for example, this script includes a GIN index and audit triggers that are missing from the Alembic migration. It's best to rely on a single system, preferably Alembic, to manage all schema changes. Please consolidate the schema definitions into the Alembic migrations, and either remove this SQL file or clearly mark it as development-only and generate it from the migrations.
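One way to keep Alembic as the single source of truth while still shipping a bootstrap SQL file is to generate it offline from the migrations themselves. A sketch, assuming `alembic.ini` is configured for this project:

```
# Offline mode: emit the SQL that the migrations would run, without touching a database
alembic upgrade head --sql > scripts/sql/init_database.sql
```

The generated file can then be regenerated whenever a migration is added, so it never drifts from the migration chain.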


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 16

Note

Due to the large number of review comments, Critical and Major severity comments were prioritized as inline comments.

🟡 Minor comments (11)
.env.template-68-68 (1)

68-68: Quote unquoted environment variable value.

Line 68 contains a value with spaces that should be quoted. The dotenv parser may not handle the unquoted value correctly.

Apply this diff:

-SMTP_FROM_NAME=Sistema Automacao Juridica
+SMTP_FROM_NAME="Sistema Automacao Juridica"
tests/unit/test_pdf_extractor.py-1-102 (1)

1-102: Run Black formatter to fix formatting issues.

The CI pipeline reports a Black formatting check failure. Run black tests/unit/test_pdf_extractor.py to automatically format the code.

#!/bin/bash
# Format the file with Black
black tests/unit/test_pdf_extractor.py
docs/API_DOCUMENTATION.md-19-19 (1)

19-19: Fix markdown formatting issues flagged by markdownlint.

Several minor markdown formatting issues were detected:

  • Line 19: Add language identifier to fenced code block
  • Lines 161, 301: Add blank lines around tables
  • Lines 388-389: Convert bare URLs to markdown links

Apply these fixes:

+

Authorization: Bearer <your_token>

 | size | integer | No | Max results (default: 50, max: 100) |
+
 **Response:**
 | 500 | Internal server error |
+
 ---
 For API support:
-- **Email:** suporte@judicial-automation.com.br
-- **GitHub Issues:** https://github.com/fbmoulin/2acivelagent/issues
+- **Email:** <suporte@judicial-automation.com.br>
+- **GitHub Issues:** <https://github.com/fbmoulin/2acivelagent/issues>

Also applies to: 161-161, 301-301, 388-389

alembic.ini-28-28 (1)

28-28: Update database name to match the project.

The database name n8n appears to reference the n8n workflow automation platform, which is inconsistent with this judicial automation system. Update to a more appropriate name like judicial_automation.

-sqlalchemy.url = postgresql://n8n_user:password@localhost:5432/n8n
+# sqlalchemy.url = postgresql://judicial_user:password@localhost:5432/judicial_automation
docs/api/openapi.yaml-158-158 (1)

158-158: Escape or rephrase the colon in the description text.

YAMLlint reports a syntax error on this line. The colon character within the Portuguese text (ex: falha OpenAI) may confuse YAML parsers that interpret it as a key-value separator.

Apply this diff to fix the issue by wrapping the description in quotes:

-          description: Erro na analise (ex: falha OpenAI)
+          description: "Erro na analise (ex: falha OpenAI)"

Alternatively, replace the colon with a dash or remove the parenthetical:

-          description: Erro na analise (ex: falha OpenAI)
+          description: Erro na analise (ex - falha OpenAI)
tests/unit/test_api_routes.py-134-143 (1)

134-143: Add assertions on the response.

The response variable is assigned but never used. The test should verify the response status code or data to ensure complete coverage.

         response = client.post('/firac-analysis',
                                data=json.dumps({'text': sample_legal_text}),
                                content_type='application/json')
 
         mock_openai.assert_called_once()
+        assert response.status_code == 200
tests/unit/test_api_routes.py-175-190 (1)

175-190: Add assertions on the response.

The response variable is assigned but never used. The test should verify the response to ensure the API call succeeded.

         response = client.post('/datajud-search',
                                data=json.dumps({
                                    'tribunal': 'tjsp',
                                    'texto_livre': 'execucao fiscal'
                                }),
                                content_type='application/json')
 
         mock_post.assert_called_once()
+        assert response.status_code == 200
migrations/models.py-244-244 (1)

244-244: Same privacy consideration as noted for SQL script.

recipient_email stores PII; consider documenting data retention and protection requirements per LGPD compliance.

scripts/python/app.py-1-2 (1)

1-2: Run Black formatter to fix CI pipeline failure.

The CI pipeline reports Black formatting check failed. Run black scripts/python/app.py to fix. Also, the shebang is present but file is not executable (Ruff EXE001).

tests/conftest.py-1-3 (1)

1-3: Run Black formatter to fix CI pipeline failure.

The CI pipeline reports that Black formatting check failed for this file. Run black tests/conftest.py to fix formatting issues.

scripts/sql/init_database.sql-142-157 (1)

142-157: Consider privacy implications for email storage.

The recipient_email column stores email addresses in plain text. Depending on your compliance requirements (LGPD - Brazil's data protection law), you may need to:

  1. Document this data collection in privacy policies
  2. Consider encryption at rest
  3. Implement data retention policies
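If full encryption at rest is not feasible immediately, one interim option is to pseudonymize addresses at the application layer before they reach logs or analytics tables. A minimal sketch (the function names and masking format are illustrative, not part of the codebase):

```python
import hashlib


def mask_email(email: str) -> str:
    """Mask an email for display: keep only the first character of the local part."""
    local, _, domain = email.partition("@")
    return f"{local[:1]}***@{domain}"


def hash_email(email: str) -> str:
    """Deterministic pseudonym for audit correlation without storing the raw address."""
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode()).hexdigest()


print(mask_email("maria.silva@example.com"))  # m***@example.com
```

The raw address would still live only in `judicial.notifications` (where it is needed for delivery), while derived tables and audit views carry the masked or hashed form.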
🧹 Nitpick comments (20)
.env.template (1)

1-109: Consider reordering environment variables for consistency.

The dotenv-linter tool flagged 17 instances where variables within each section are not alphabetically ordered (e.g., POSTGRES_DB before POSTGRES_HOST, REDIS_PASSWORD before REDIS_PORT). While this does not affect functionality, alphabetical ordering improves consistency and readability.

docs/api/openapi.yaml (2)

54-76: Consider excluding /health from authentication requirements.

The /health endpoint is typically used by load balancers, monitoring systems, and container orchestrators for health checks. Requiring JWT authentication may prevent these systems from functioning correctly.

If you want to exclude the health endpoint from authentication, add a security override:

 /health:
   get:
     tags:
       - Health
     summary: Verificacao de saude
     description: Retorna o status de saude da API e seus servicos dependentes.
     operationId: healthCheck
+    security: []
     responses:

The empty array security: [] overrides the global security requirement for this specific endpoint.


363-364: Consider increasing the minimum text length for FIRAC analysis.

The minLength: 50 constraint for legal text analysis may be too permissive. A meaningful FIRAC analysis typically requires a substantial case description with facts, issues, and legal context.

Consider increasing the minimum to a more practical value:

       text:
         type: string
-        minLength: 50
+        minLength: 200
         description: Texto juridico para analise (minimo 50 caracteres)

And update the description accordingly.

alembic.ini (1)

9-9: Verify the file_template format string.

The migration filename template uses multiple datetime components that will generate very long filenames. While this works, it may be unnecessarily verbose.

Consider simplifying to the more common format:

-file_template = %%(year)d%%(month).2d%%(day).2d_%%(hour).2d%%(minute).2d%%(second).2d_%%(rev)s_%%(slug)s
+file_template = %%(year)d%%(month).2d%%(day).2d_%%(rev)s_%%(slug)s

This removes hour/minute/second, which are rarely needed since the revision ID already provides uniqueness.

infrastructure/docker/Dockerfile.python (1)

64-65: Consider using Python for health checks instead of curl.

While the current health check works, using Python would eliminate the curl dependency and reduce the image size slightly.

Replace the curl-based health check with a Python-based one:

 HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
-    CMD curl -f http://localhost:5000/health || exit 1
+    CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:5000/health').read()" || exit 1

Then remove curl from the runtime dependencies:

 RUN apt-get update && apt-get install -y --no-install-recommends \
     libpq5 \
-    curl \
     && rm -rf /var/lib/apt/lists/* \
     && apt-get clean
migrations/README.md (1)

15-18: Consider using a more descriptive database name following naming conventions.

The example uses n8n as the database name. While n8n is a core component of this judicial automation system, database names typically reflect the database content or application domain rather than the orchestration tool name. Consider using judicial_automation, legal_system, or jus_automation for better clarity and maintainability.

tests/unit/test_api_routes.py (1)

78-86: Assertion is too permissive.

The test accepts either 'text' or 'error' in the response, which means it passes even if the extraction fails. Consider separate tests for success and error cases to ensure proper validation.

-    @pytest.mark.unit
-    def test_extract_pdf_returns_text(self, client, sample_pdf_base64):
-        """Test that extracted text is returned"""
-        response = client.post('/extract-pdf',
-                               data=json.dumps({'pdf_content': sample_pdf_base64}),
-                               content_type='application/json')
-
-        data = json.loads(response.data)
-        assert 'text' in data or 'error' in data
+    @pytest.mark.unit
+    def test_extract_pdf_returns_text(self, client, sample_pdf_base64):
+        """Test that extracted text is returned"""
+        response = client.post('/extract-pdf',
+                               data=json.dumps({'pdf_content': sample_pdf_base64}),
+                               content_type='application/json')
+
+        assert response.status_code == 200
+        data = json.loads(response.data)
+        assert 'text' in data
migrations/versions/20241204_000000_001_initial_schema.py (1)

103-125: Consider partial indexes for confidence scores.

If you frequently query for high-confidence analyses, consider adding partial indexes on confidence_score where the score exceeds a threshold. This is optional and can be deferred based on actual query patterns.

Example for future migration:

CREATE INDEX idx_firac_high_confidence 
  ON judicial.firac_analyses(confidence_score) 
  WHERE confidence_score > 0.80;
tests/integration/test_api_integration.py (2)

207-218: Remove unused mock_openai parameter.

The mock_openai parameter is declared but never used in this test. Remove it to clean up the test signature.

     @pytest.mark.integration
-    @patch('scripts.python.app.openai.ChatCompletion.create')
-    def test_multiple_pdf_extractions(self, mock_openai, client, sample_pdf_base64):
+    def test_multiple_pdf_extractions(self, client, sample_pdf_base64):
         """Test multiple PDF extractions"""

243-261: Remove unused sample_pdf_base64 parameter.

The sample_pdf_base64 parameter is declared but never used in this test. Remove it to clean up the test signature.

     @pytest.mark.integration
-    def test_unicode_text_handled(self, client, sample_pdf_base64):
+    def test_unicode_text_handled(self, client):
         """Test handling of Unicode text (Portuguese characters)"""
infrastructure/nginx/nginx.conf (1)

55-59: Consider adding Content-Security-Policy header.

The security headers are good, but consider adding a Content-Security-Policy (CSP) header for additional protection against XSS and data injection attacks. Note that X-XSS-Protection is deprecated in modern browsers but doesn't hurt to keep for legacy support.

     add_header X-Frame-Options "SAMEORIGIN" always;
     add_header X-Content-Type-Options "nosniff" always;
     add_header X-XSS-Protection "1; mode=block" always;
     add_header Referrer-Policy "strict-origin-when-cross-origin" always;
+    add_header Content-Security-Policy "default-src 'self'; script-src 'self' 'unsafe-inline' 'unsafe-eval'; style-src 'self' 'unsafe-inline';" always;
.github/workflows/ci.yml (1)

194-197: Consider increasing coverage threshold over time.

The 50% coverage threshold is reasonable for initial setup, but plan to gradually increase this to 70-80% as the codebase matures to ensure comprehensive test coverage.
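The threshold can be ratcheted up in configuration rather than in the workflow file, so local runs and CI enforce the same floor. A sketch assuming `pytest-cov` is installed (the 50 is the current value; raise it as coverage improves):

```ini
# pytest.ini
[pytest]
addopts = --cov=scripts --cov-fail-under=50
```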

tests/conftest.py (2)

11-15: Consider using pytest.ini or pyproject.toml for path configuration.

The sys.path.insert approach works but is fragile. Consider configuring pythonpath in pytest.ini or pyproject.toml instead for cleaner test discovery.

# pytest.ini
[pytest]
pythonpath = .

73-77: Move base64 import to module level.

The base64 import should be at the top of the file with other imports for consistency with PEP 8.

 import os
 import sys
 import json
+import base64
 import pytest

And then remove line 76.

scripts/sql/init_database.sql (2)

28-28: Consider adding a CHECK constraint for status values.

The status column uses VARCHAR(50) with a default, but unlike the ORM models (migrations/models.py) which define a CaseStatus enum, this SQL script lacks validation. Consider adding a CHECK constraint or creating an ENUM type for consistency:

-- Option 1: CHECK constraint
status VARCHAR(50) DEFAULT 'pending' CHECK (status IN ('pending', 'processing', 'completed', 'error', 'archived')),

-- Option 2: ENUM type (preferred for consistency with ORM)
CREATE TYPE judicial.case_status AS ENUM ('pending', 'processing', 'completed', 'error', 'archived');
-- Then use: status judicial.case_status DEFAULT 'pending',

206-225: Audit trigger does not capture user context.

The audit.log_changes() function logs row changes but doesn't populate user_id, ip_address, or user_agent. These fields will always be NULL. Consider using session variables or application-level auditing to capture user context:

-- Application can set these before operations:
-- SET LOCAL app.user_id = 'user123';
-- SET LOCAL app.ip_address = '192.168.1.1';

-- Then in the trigger function:
INSERT INTO audit.logs (table_name, record_id, action, new_data, user_id, ip_address)
VALUES (
    TG_TABLE_SCHEMA || '.' || TG_TABLE_NAME, 
    NEW.id, 
    'INSERT', 
    to_jsonb(NEW),
    current_setting('app.user_id', true),
    current_setting('app.ip_address', true)::inet
);
scripts/python/app.py (3)

71-76: Use logger.exception for automatic stack traces.

Replace logger.error with logger.exception in exception handlers. This applies to all similar patterns in this file (lines 72, 121, 207, 261, 331, 401, 422, 439, 460, 483).

         except Exception as e:
-            logger.error(f"PDF extraction error: {e}")
+            logger.exception("PDF extraction error")
             return {
                 "success": False,
                 "error": str(e)
             }

183-189: No validation of DATAJUD credentials before request.

If DATAJUD_USERNAME or DATAJUD_PASSWORD are not set, auth=(None, None) is passed to requests, which may cause unexpected behavior. Validate credentials before making the request:

+            if not self.username or not self.password:
+                return {
+                    "success": False,
+                    "error": "DATAJUD credentials not configured"
+                }
+
             response = requests.post(
                 endpoint,
                 json=search_query,
                 headers=headers,
                 auth=(self.username, self.password),
                 timeout=30
             )

488-495: Unused error parameter in error handlers.

Prefix with underscore to indicate intentionally unused:

 @app.errorhandler(404)
-def not_found(error):
+def not_found(_error):
     return jsonify({"error": "Endpoint not found"}), 404


 @app.errorhandler(500)
-def internal_error(error):
+def internal_error(_error):
     return jsonify({"error": "Internal server error"}), 500
migrations/models.py (1)

90-90: Inconsistent type usage for document_type.

Document.document_type uses String(50) while GeneratedDocument.document_type (line 211) uses SQLEnum(DocumentType). Consider using the enum consistently:

-    document_type = Column(String(50), nullable=False)
+    document_type = Column(
+        SQLEnum(DocumentType, name="document_type", schema="judicial"),
+        nullable=False
+    )
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2547c6d and ebe0325.

📒 Files selected for processing (28)
  • .env.template (1 hunks)
  • .github/dependabot.yml (1 hunks)
  • .github/workflows/ci.yml (1 hunks)
  • .github/workflows/deploy.yml (1 hunks)
  • .gitignore (1 hunks)
  • alembic.ini (1 hunks)
  • docs/API_DOCUMENTATION.md (1 hunks)
  • docs/api/openapi.yaml (1 hunks)
  • infrastructure/docker/Dockerfile.python (1 hunks)
  • infrastructure/nginx/nginx.conf (1 hunks)
  • migrations/README.md (1 hunks)
  • migrations/env.py (1 hunks)
  • migrations/models.py (1 hunks)
  • migrations/script.py.mako (1 hunks)
  • migrations/versions/20241204_000000_001_initial_schema.py (1 hunks)
  • pyproject.toml (1 hunks)
  • pytest.ini (1 hunks)
  • requirements.txt (1 hunks)
  • scripts/python/app.py (1 hunks)
  • scripts/sql/init_database.sql (1 hunks)
  • security/ssl_certificates/README.txt (1 hunks)
  • tests/__init__.py (1 hunks)
  • tests/conftest.py (1 hunks)
  • tests/integration/__init__.py (1 hunks)
  • tests/integration/test_api_integration.py (1 hunks)
  • tests/unit/__init__.py (1 hunks)
  • tests/unit/test_api_routes.py (1 hunks)
  • tests/unit/test_pdf_extractor.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
tests/conftest.py (1)
scripts/python/app.py (5)
  • PDFExtractor (47-76)
  • FIRACAnalyzer (79-125)
  • DatajudClient (128-211)
  • DistinguishAnalyzer (214-265)
  • DocumentGenerator (268-335)
scripts/python/app.py (1)
tests/conftest.py (5)
  • pdf_extractor (180-182)
  • firac_analyzer (186-188)
  • datajud_client (192-194)
  • distinguish_analyzer (198-200)
  • document_generator (204-206)
tests/unit/test_pdf_extractor.py (2)
tests/conftest.py (2)
  • pdf_extractor (180-182)
  • sample_pdf_content (35-70)
scripts/python/app.py (1)
  • extract_text_from_pdf (51-76)
🪛 actionlint (1.7.9)
.github/workflows/deploy.yml

152-152: the runner of "softprops/action-gh-release@v1" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)

🪛 Checkov (3.2.334)
.github/workflows/ci.yml

[medium] 163-164: Basic Auth Credentials

(CKV_SECRET_4)

🪛 dotenv-linter (4.0.0)
.env.template

[warning] 12-12: [UnorderedKey] The N8N_PASSWORD key should go before the N8N_PORT key

(UnorderedKey)


[warning] 20-20: [UnorderedKey] The POSTGRES_DB key should go before the POSTGRES_HOST key

(UnorderedKey)


[warning] 22-22: [UnorderedKey] The POSTGRES_PASSWORD key should go before the POSTGRES_PORT key

(UnorderedKey)


[warning] 29-29: [UnorderedKey] The REDIS_PASSWORD key should go before the REDIS_PORT key

(UnorderedKey)


[warning] 37-37: [UnorderedKey] The OPENAI_MAX_TOKENS key should go before the OPENAI_MODEL key

(UnorderedKey)


[warning] 43-43: [UnorderedKey] The DATAJUD_PASSWORD key should go before the DATAJUD_USERNAME key

(UnorderedKey)


[warning] 44-44: [UnorderedKey] The DATAJUD_BASE_URL key should go before the DATAJUD_PASSWORD key

(UnorderedKey)


[warning] 57-57: [UnorderedKey] The GOOGLE_LOCATION key should go before the GOOGLE_PROJECT_ID key

(UnorderedKey)


[warning] 58-58: [UnorderedKey] The GOOGLE_PROCESSOR_ID key should go before the GOOGLE_PROJECT_ID key

(UnorderedKey)


[warning] 66-66: [UnorderedKey] The SMTP_PASSWORD key should go before the SMTP_PORT key

(UnorderedKey)


[warning] 67-67: [UnorderedKey] The SMTP_FROM_EMAIL key should go before the SMTP_HOST key

(UnorderedKey)


[warning] 68-68: [UnorderedKey] The SMTP_FROM_NAME key should go before the SMTP_HOST key

(UnorderedKey)


[warning] 68-68: [ValueWithoutQuotes] This value needs to be surrounded in quotes

(ValueWithoutQuotes)


[warning] 74-74: [UnorderedKey] The GRAFANA_PASSWORD key should go before the GRAFANA_USER key

(UnorderedKey)


[warning] 75-75: [UnorderedKey] The GF_SERVER_ROOT_URL key should go before the GRAFANA_PASSWORD key

(UnorderedKey)


[warning] 81-81: [UnorderedKey] The ENCRYPTION_KEY key should go before the JWT_SECRET key

(UnorderedKey)


[warning] 87-87: [UnorderedKey] The FLASK_DEBUG key should go before the FLASK_ENV key

(UnorderedKey)

🪛 GitHub Actions: CI Pipeline
tests/integration/test_api_integration.py

[error] 1-1: Black formatting check failed. Run 'black' to format code in this file.

tests/conftest.py

[error] 1-1: Black formatting check failed. Run 'black' to format code in this file.

scripts/python/app.py

[error] 1-1: Black formatting check failed. Run 'black' to format code in this file.

tests/unit/test_api_routes.py

[error] 1-1: Black formatting check failed. Run 'black' to format code in this file.

tests/unit/test_pdf_extractor.py

[error] 1-1: Black formatting check failed. Run 'black' to format code in this file.

🪛 markdownlint-cli2 (0.18.1)
docs/API_DOCUMENTATION.md

19-19: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


161-161: Tables should be surrounded by blank lines

(MD058, blanks-around-tables)


301-301: Tables should be surrounded by blank lines

(MD058, blanks-around-tables)


388-388: Bare URL used

(MD034, no-bare-urls)


389-389: Bare URL used

(MD034, no-bare-urls)

🪛 OSV Scanner (2.3.0)
requirements.txt

[HIGH] 7-7: werkzeug 3.0.1: Werkzeug debugger vulnerable to remote execution when interacting with attacker controlled domain

(GHSA-2g68-c3qc-8985)


[HIGH] 7-7: werkzeug 3.0.1: Werkzeug safe_join not safe on Windows

(GHSA-f9vj-2wh5-fj8j)


[HIGH] 7-7: werkzeug 3.0.1: Werkzeug safe_join() allows Windows special device names

(GHSA-hgf8-39gv-g3f2)


[HIGH] 7-7: werkzeug 3.0.1: Werkzeug possible resource exhaustion when parsing file data in forms

(GHSA-q34m-jh98-gwm2)


[HIGH] 33-33: aiohttp 3.9.1: undefined

(PYSEC-2024-24)


[HIGH] 33-33: aiohttp 3.9.1: undefined

(PYSEC-2024-26)


[HIGH] 33-33: aiohttp 3.9.1: aiohttp is vulnerable to directory traversal

(GHSA-5h86-8mv2-jq9f)


[HIGH] 33-33: aiohttp 3.9.1: aiohttp vulnerable to Denial of Service when trying to parse malformed POST requests

(GHSA-5m98-qgg9-wh84)


[HIGH] 33-33: aiohttp 3.9.1: aiohttp Cross-site Scripting vulnerability on index pages for static file handling

(GHSA-7gpw-8wmc-pm8g)


[HIGH] 33-33: aiohttp 3.9.1: aiohttp allows request smuggling due to incorrect parsing of chunk extensions

(GHSA-8495-4g3g-x7pr)


[HIGH] 33-33: aiohttp 3.9.1: aiohttp's HTTP parser (the python one, not llhttp) still overly lenient about separators

(GHSA-8qpw-xqxj-h4r2)


[HIGH] 33-33: aiohttp 3.9.1: AIOHTTP is vulnerable to HTTP Request/Response Smuggling through incorrect parsing of chunked trailer sections

(GHSA-9548-qrrj-x5pj)


[HIGH] 42-42: cryptography 41.0.7: undefined

(PYSEC-2024-225)


[HIGH] 42-42: cryptography 41.0.7: Python Cryptography package vulnerable to Bleichenbacher timing oracle attack

(GHSA-3ww4-gg4f-jr7f)


[HIGH] 42-42: cryptography 41.0.7: cryptography NULL pointer dereference with pkcs12.serialize_key_and_certificates when called with a non-matching certificate and private key and an hmac_hash override

(GHSA-6vqw-3v5j-54x4)


[HIGH] 42-42: cryptography 41.0.7: Null pointer dereference in PKCS12 parsing

(GHSA-9v9h-cgj8-h64p)


[HIGH] 42-42: cryptography 41.0.7: pyca/cryptography has a vulnerable OpenSSL included in cryptography wheels

(GHSA-h4gh-qq45-vh27)

🪛 Ruff (0.14.7)
tests/integration/test_api_integration.py

209-209: Unused method argument: mock_openai

(ARG002)


244-244: Unused method argument: sample_pdf_base64

(ARG002)

scripts/python/app.py

1-1: Shebang is present but file is not executable

(EXE001)


42-42: Do not catch blind exception: Exception

(BLE001)


71-71: Do not catch blind exception: Exception

(BLE001)


72-72: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


120-120: Do not catch blind exception: Exception

(BLE001)


121-121: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


206-206: Do not catch blind exception: Exception

(BLE001)


207-207: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


260-260: Do not catch blind exception: Exception

(BLE001)


261-261: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


330-330: Do not catch blind exception: Exception

(BLE001)


331-331: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


356-356: Do not catch blind exception: Exception

(BLE001)


385-385: Do not catch blind exception: Exception

(BLE001)


395-395: Do not catch blind exception: Exception

(BLE001)


400-400: Do not catch blind exception: Exception

(BLE001)


401-401: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


421-421: Do not catch blind exception: Exception

(BLE001)


422-422: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


438-438: Do not catch blind exception: Exception

(BLE001)


439-439: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


459-459: Do not catch blind exception: Exception

(BLE001)


460-460: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


482-482: Do not catch blind exception: Exception

(BLE001)


483-483: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


489-489: Unused function argument: error

(ARG001)


494-494: Unused function argument: error

(ARG001)


504-504: Possible binding to all interfaces

(S104)

tests/unit/test_api_routes.py

139-139: Local variable response is assigned to but never used

Remove assignment to unused variable response

(F841)


183-183: Local variable response is assigned to but never used

Remove assignment to unused variable response

(F841)

🪛 YAMLlint (1.37.1)
docs/api/openapi.yaml

[error] 158-158: syntax error: mapping values are not allowed here

(syntax)

🔇 Additional comments (32)
tests/__init__.py (1)

1-1: LGTM!

Simple and appropriate package initializer.

tests/integration/__init__.py (1)

1-1: LGTM!

Simple and appropriate package initializer.

.github/dependabot.yml (1)

1-58: LGTM!

Well-structured Dependabot configuration with appropriate cadence, PR limits, and ecosystem coverage. The timezone-aware schedule and semantic commit prefixes are good practices.

.gitignore (1)

1-142: LGTM!

Comprehensive and well-organized .gitignore covering Python, Docker, secrets, IDEs, and system artifacts. Clear section headers and strategic comments (e.g., preserving docker-compose.yml) demonstrate thoughtful design.

pytest.ini (1)

1-43: LGTM!

Well-configured pytest setup with appropriate markers, discovery patterns, and logging. Strict marker enforcement and informative output configuration are good practices.

pyproject.toml (2)

1-162: LGTM: Project configuration is well-structured.

Comprehensive configuration with appropriate separation of build system, metadata, dependencies (core vs dev vs monitoring), tool configurations, and packaging setup. The use of version ranges (>=) here with exact pinning in requirements.txt is a solid pattern for balancing flexibility and reproducibility.


89-90: Entry point is correctly configured and callable.

Verification confirms that scripts/python/app.py exists, defines a Flask application object (app) at line 30, and the app.run() method is properly inherited from Flask and called at line 504. The entry point judicial-api = "scripts.python.app:app.run" is valid and functional.

docs/api/openapi.yaml (1)

1-505: Well-structured OpenAPI specification with comprehensive coverage.

The specification provides clear documentation for all endpoints with appropriate request/response schemas, error handling, and examples. The JWT authentication scheme is properly configured and the schema definitions are thorough.

tests/unit/__init__.py (1)

1-1: LGTM!

Standard Python package initializer.

docs/API_DOCUMENTATION.md (1)

1-389: Excellent comprehensive API documentation.

The documentation thoroughly covers all endpoints with clear examples, authentication details, error handling, rate limiting, and LGPD compliance information. The SDK examples in Python and JavaScript are particularly helpful.

migrations/README.md (1)

1-116: Well-structured migration documentation.

The README provides comprehensive coverage of Alembic workflow, including setup, common commands, schema overview, best practices, and troubleshooting. The examples are clear and practical.

alembic.ini (1)

31-38: Excellent use of post-write hooks for code formatting.

The Black formatting hook ensures migration files are automatically formatted after generation, maintaining code consistency.

infrastructure/docker/Dockerfile.python (1)

1-68: Excellent multi-stage Dockerfile with security best practices.

The Dockerfile follows best practices:

  • Multi-stage build minimizes final image size
  • Non-root user for security
  • Minimal base image (python:3.11-slim)
  • Proper layer caching with requirements first
  • Comprehensive health check configuration
  • Clean apt cache to reduce size

tests/unit/test_pdf_extractor.py (1)

1-102: Comprehensive test coverage for PDF extraction.

The test suite thoroughly covers:

  • Success cases with metadata validation
  • Error handling for invalid and empty content
  • Edge cases like corrupted PDFs and empty pages
  • Multi-page PDF processing with proper mocking

The use of pytest markers and fixtures demonstrates good test organization.

security/ssl_certificates/README.txt (1)

1-1: SSL certificates are properly excluded from version control.

The .gitignore file includes the *.pem pattern (line 12), ensuring certificate files are not tracked by git. No certificate files are currently tracked in the repository, confirming proper security configuration.

migrations/versions/20241204_000000_001_initial_schema.py (1)

21-291: LGTM! Well-structured database migration.

The migration is comprehensive and follows best practices:

  • Proper schema organization (judicial/audit separation)
  • Correct use of DO blocks for idempotent enum creation
  • Appropriate indexes on foreign keys and query columns
  • Cascade deletes where appropriate
  • Complete downgrade path

infrastructure/nginx/nginx.conf (2)

100-118: LGTM! Strong SSL/TLS configuration.

The SSL configuration follows modern best practices:

  • TLSv1.2 and TLSv1.3 only
  • Strong cipher suites with forward secrecy
  • HSTS with appropriate max-age
  • Session tickets disabled for enhanced privacy

62-64: Verify rate limits match expected traffic patterns.

The rate limits are set to 10 req/s for API and 5 req/s for webhooks. Ensure these values align with your expected legitimate traffic patterns to avoid false positives.

Consider monitoring actual traffic patterns in staging to validate these thresholds before production deployment.

migrations/env.py (2)

32-46: LGTM! Clean database URL construction.

The function properly handles both full DATABASE_URL/POSTGRES_URL and individual component environment variables with sensible defaults.
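As a reference point, a minimal sketch of such a fallback chain (the variable names and defaults here are illustrative and may not match migrations/env.py exactly):

```python
import os


def build_database_url(env=os.environ):
    """Return a database URL, preferring a full URL over individual components."""
    # Prefer an explicit full URL if one is set.
    url = env.get("DATABASE_URL") or env.get("POSTGRES_URL")
    if url:
        return url
    # Otherwise assemble one from components with sensible defaults.
    user = env.get("POSTGRES_USER", "postgres")
    password = env.get("POSTGRES_PASSWORD", "postgres")
    host = env.get("POSTGRES_HOST", "localhost")
    port = env.get("POSTGRES_PORT", "5432")
    db = env.get("POSTGRES_DB", "judicial")
    return f"postgresql://{user}:{password}@{host}:{port}/{db}"
```

Passing the environment mapping as an argument, as above, also keeps the helper trivially unit-testable.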


82-86: Good use of NullPool for migrations.

Using NullPool is the correct choice for migrations as it prevents connection pooling issues during schema changes. This follows Alembic best practices.

migrations/script.py.mako (1)

1-28: LGTM! Clean migration template.

The template follows Alembic best practices with proper type annotations and clear structure. The use of typed variables and the pass defaults for upgrade/downgrade functions are appropriate.

.github/workflows/ci.yml (1)

1-284: LGTM! Comprehensive CI pipeline.

The CI pipeline is well-structured with:

  • Proper job dependencies and parallelization
  • Service containers for integration tests
  • Multiple quality gates (lint, tests, coverage, security)
  • Appropriate caching strategies
  • Final success job to gate the pipeline

The credential warning from Checkov at lines 163-164 is a false positive—these are test database credentials used in CI service containers, which is standard practice.

tests/conftest.py (3)

153-166: LGTM on mock fixtures.

The mock_openai_response and mock_redis fixtures are well-structured and provide good isolation for unit tests.


209-215: LGTM on pytest markers configuration.

Custom markers are properly registered for test classification (unit, integration, slow, requires_api).
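For readers unfamiliar with strict-marker mode, registration of this kind is typically a pytest_configure hook along these lines (a sketch; the exact descriptions in tests/conftest.py may differ):

```python
def pytest_configure(config):
    """Register custom markers so pytest's --strict-markers does not reject them."""
    for marker, description in [
        ("unit", "fast, isolated unit tests"),
        ("integration", "tests that exercise multiple components together"),
        ("slow", "tests that take a long time to run"),
        ("requires_api", "tests that need live external API credentials"),
    ]:
        # Each call adds one line to the effective [markers] ini section.
        config.addinivalue_line("markers", f"{marker}: {description}")
```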


34-70: Verify the hardcoded PDF byte offsets are accurate.

The xref table contains hardcoded byte offsets (9, 58, 115, 214 for objects 1-4, and 306 for startxref). These must match the actual byte positions in the PDF stream or parsers like PyPDF2 will fail. Calculate the exact byte positions of each object marker (1 0 obj, 2 0 obj, etc.) to confirm the offsets are correct before relying on this fixture.
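A quick way to perform that check is to locate each `N 0 obj` marker in the raw bytes and compare against the xref entries; a sketch on a toy buffer (the real fixture's bytes, and therefore its offsets, will differ):

```python
import re


def object_offsets(pdf_bytes):
    """Map each object number to the byte offset of its 'N 0 obj' marker."""
    return {
        int(m.group(1)): m.start()
        for m in re.finditer(rb"(\d+) 0 obj", pdf_bytes)
    }


# Toy PDF-like body; the offsets computed here apply to this buffer only.
body = b"%PDF-1.4\n1 0 obj\n<<>>\nendobj\n2 0 obj\n<<>>\nendobj\n"
offsets = object_offsets(body)
```

Comparing the returned mapping against the fixture's hardcoded xref values (9, 58, 115, 214) would confirm or refute them without hand-counting bytes.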

scripts/sql/init_database.sql (3)

4-13: LGTM on schema setup.

Good separation of concerns with judicial and audit schemas. Extensions uuid-ossp and pg_trgm are appropriate for the use case.


59-62: LGTM on full-text search configuration.

The GIN index with Portuguese dictionary for extracted_text is appropriate for a Brazilian legal document system.


291-298: LGTM on seed data.

Using ON CONFLICT (code) DO NOTHING makes this idempotent for repeated runs.
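That idempotency is easy to demonstrate; SQLite 3.24+ happens to accept the same ON CONFLICT ... DO NOTHING clause, so a self-contained sketch (the table and rows are illustrative, not the actual seed data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE document_types (code TEXT PRIMARY KEY, label TEXT)")

seed = (
    "INSERT INTO document_types (code, label) VALUES (?, ?) "
    "ON CONFLICT(code) DO NOTHING"
)
rows = [("peticao", "Peticao Inicial"), ("sentenca", "Sentenca")]

# Running the seed twice leaves exactly one row per code.
for _ in range(2):
    conn.executemany(seed, rows)

count = conn.execute("SELECT COUNT(*) FROM document_types").fetchone()[0]
```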

scripts/python/app.py (1)

498-504: Binding to all interfaces is acceptable for containerized deployment.

The Ruff S104 warning about binding to 0.0.0.0 is a false positive for Docker deployments where this is necessary. The nginx reverse proxy provides the security boundary.
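If a team still wants the linter satisfied, one common pattern is to default to loopback and widen the bind address only on explicit opt-in (a sketch; the environment variable name is hypothetical):

```python
import os


def resolve_bind_host(env=os.environ):
    """Bind to all interfaces only when the deployment opts in (e.g. inside Docker)."""
    if env.get("BIND_ALL_INTERFACES", "").lower() in ("1", "true", "yes"):
        return "0.0.0.0"  # noqa: S104 - intentional, nginx is the security boundary
    return "127.0.0.1"
```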

migrations/models.py (3)

23-39: LGTM on enum definitions.

CaseStatus and DocumentType enums are well-defined and consistent with database reference data.


42-77: LGTM on Case model.

Well-structured with proper indexing, schema qualification, and cascade behaviors for child relationships.


256-276: LGTM on AuditLog model.

Properly isolated in the audit schema with appropriate PostgreSQL-specific types (INET, JSONB).

Comment on lines +150 to +156
      - name: Create GitHub Release
        if: startsWith(github.ref, 'refs/tags/v')
        uses: softprops/action-gh-release@v1
        with:
          generate_release_notes: true
          draft: false
          prerelease: ${{ contains(github.ref, '-rc') || contains(github.ref, '-beta') }}

⚠️ Potential issue | 🟠 Major

Update GitHub Action to a newer version.

The softprops/action-gh-release@v1 action uses an outdated runner. Update to v2 to ensure compatibility with current GitHub Actions infrastructure.

       - name: Create GitHub Release
         if: startsWith(github.ref, 'refs/tags/v')
-        uses: softprops/action-gh-release@v1
+        uses: softprops/action-gh-release@v2
         with:
           generate_release_notes: true
           draft: false
           prerelease: ${{ contains(github.ref, '-rc') || contains(github.ref, '-beta') }}
🧰 Tools
🪛 actionlint (1.7.9)

152-152: the runner of "softprops/action-gh-release@v1" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)

🤖 Prompt for AI Agents
.github/workflows/deploy.yml around lines 150 to 156: the workflow uses
softprops/action-gh-release@v1 which relies on an outdated runner; update the
action reference to softprops/action-gh-release@v2 (or a specific v2.x tag) in
the workflow, keep the same inputs (generate_release_notes, draft, prerelease)
and verify the action's v2 docs for any input/name changes before committing.

Comment on lines +167 to +176
  rollback:
    name: Rollback Deployment
    runs-on: ubuntu-latest
    if: github.event_name == 'workflow_dispatch' && failure()
    needs: [deploy-staging, deploy-production]
    steps:
      - name: Rollback to previous version
        run: |
          echo "Rolling back deployment..."
          # ssh ${{ secrets.PROD_HOST }} "cd /app && docker compose down && git checkout HEAD~1 && docker compose up -d"

⚠️ Potential issue | 🟠 Major

Fix rollback trigger logic.

The rollback job combines failure() with a workflow_dispatch check, but at the job level failure() only reflects whether a job listed in needs failed within the same run; it cannot refer to a previously failed deployment run. As a result, the condition github.event_name == 'workflow_dispatch' && failure() will not trigger for the intended manual-rollback scenario. Consider using an explicit workflow_dispatch input for rollback decisions.

   rollback:
     name: Rollback Deployment
     runs-on: ubuntu-latest
-    if: github.event_name == 'workflow_dispatch' && failure()
-    needs: [deploy-staging, deploy-production]
+    if: github.event_name == 'workflow_dispatch' && github.event.inputs.action == 'rollback'
+    
     steps:
       - name: Rollback to previous version
         run: |

Also add the rollback input to workflow_dispatch:

   workflow_dispatch:
     inputs:
       environment:
         description: 'Deployment environment'
         required: true
         default: 'staging'
         type: choice
         options:
           - staging
           - production
+      action:
+        description: 'Deployment action'
+        required: true
+        default: 'deploy'
+        type: choice
+        options:
+          - deploy
+          - rollback

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
.github/workflows/deploy.yml lines 167-176: the rollback job condition uses
failure() which cannot evaluate before the job runs, so it will never trigger;
change the workflow to accept a workflow_dispatch input (e.g., rollback: boolean
or string) and update the rollback job's if to check github.event_name ==
'workflow_dispatch' && github.event.inputs.rollback == 'true' (or appropriate
value), and update the workflow_dispatch section to declare that rollback input
so the job can be manually invoked with rollback=true; ensure any existing needs
remain correct for manual rollback.

Comment thread alembic.ini Outdated
Comment thread migrations/models.py
Comment on lines +7 to +18
from datetime import datetime
from typing import Optional, List
from sqlalchemy import (
    Column, String, Integer, Text, Boolean, DateTime, Float,
    ForeignKey, JSON, ARRAY, BigInteger, Index, UniqueConstraint,
    Numeric, Enum as SQLEnum
)
from sqlalchemy.dialects.postgresql import UUID, INET, JSONB
from sqlalchemy.orm import declarative_base, relationship
from sqlalchemy.sql import func
import uuid
import enum

🛠️ Refactor suggestion | 🟠 Major

Remove unused imports.

Optional, List, Float, and UniqueConstraint are imported but never used.

 from datetime import datetime
-from typing import Optional, List
 from sqlalchemy import (
-    Column, String, Integer, Text, Boolean, DateTime, Float,
-    ForeignKey, JSON, ARRAY, BigInteger, Index, UniqueConstraint,
+    Column, String, Integer, Text, Boolean, DateTime,
+    ForeignKey, JSON, ARRAY, BigInteger, Index,
     Numeric, Enum as SQLEnum
 )
🤖 Prompt for AI Agents
migrations/models.py lines 7-18: the file currently imports Optional, List,
Float and UniqueConstraint but those names are not used anywhere; remove these
four imports from the import lists (delete Optional and List from the typing
import, delete Float and UniqueConstraint from the sqlalchemy import list) and
run the linter/type-checker to confirm no other unused imports remain.

Comment thread pyproject.toml
Comment on lines +1 to +73
[build-system]
requires = ["setuptools>=68.0", "wheel"]
build-backend = "setuptools.build_meta"

[project]
name = "judicial-automation-system"
version = "1.0.0"
description = "Sistema de Automacao Juridica - Brazilian Judicial Process Automation Platform"
readme = "README.md"
license = {text = "MIT"}
requires-python = ">=3.10"
authors = [
    {name = "Judicial Automation Team"}
]
keywords = [
    "legal",
    "automation",
    "judiciary",
    "n8n",
    "flask",
    "ai",
    "brazilian-law"
]
classifiers = [
    "Development Status :: 4 - Beta",
    "Intended Audience :: Legal Industry",
    "License :: OSI Approved :: MIT License",
    "Operating System :: OS Independent",
    "Programming Language :: Python :: 3",
    "Programming Language :: Python :: 3.10",
    "Programming Language :: Python :: 3.11",
    "Programming Language :: Python :: 3.12",
    "Topic :: Office/Business",
    "Topic :: Scientific/Engineering :: Artificial Intelligence",
]

dependencies = [
    "Flask>=3.0.0",
    "Flask-CORS>=4.0.0",
    "Werkzeug>=3.0.1",
    "redis>=5.0.1",
    "psycopg2-binary>=2.9.9",
    "SQLAlchemy>=2.0.23",
    "openai>=1.6.1",
    "tiktoken>=0.5.2",
    "google-cloud-documentai>=2.22.0",
    "google-auth>=2.25.2",
    "google-api-python-client>=2.111.0",
    "PyPDF2>=3.0.1",
    "python-docx>=1.1.0",
    "pdfplumber>=0.10.3",
    "requests>=2.31.0",
    "pydantic>=2.5.3",
    "PyJWT>=2.8.0",
    "python-dotenv>=1.0.0",
    "prometheus-client>=0.19.0",
    "structlog>=23.2.0",
    "flasgger>=0.9.7.1",
    "tenacity>=8.2.3",
]

[project.optional-dependencies]
dev = [
    "pytest>=7.4.3",
    "pytest-cov>=4.1.0",
    "pytest-asyncio>=0.23.2",
    "pytest-mock>=3.12.0",
    "httpx>=0.26.0",
    "factory-boy>=3.3.0",
    "faker>=21.0.0",
    "black>=23.12.1",
    "isort>=5.13.2",
    "flake8>=6.1.0",

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# Check if aiohttp and cryptography are direct or transitive dependencies
cd /repo && git ls-files | head -20

Repository: fbmoulin/2acivelagent

Length of output: 120


🏁 Script executed:

# Search for aiohttp and cryptography in pyproject.toml and related files
rg -i "aiohttp|cryptography" --type toml

Repository: fbmoulin/2acivelagent

Length of output: 47


🏁 Script executed:

# Look for any requirements files or dependency specs
fd -e txt -e in | grep -i req

Repository: fbmoulin/2acivelagent

Length of output: 81


🏁 Script executed:

# Check imports in the codebase to see if aiohttp/cryptography are actually used
rg "import aiohttp|from aiohttp|import cryptography|from cryptography" --type py

Repository: fbmoulin/2acivelagent

Length of output: 47


🏁 Script executed:

cat requirements.txt

Repository: fbmoulin/2acivelagent

Length of output: 1219


🏁 Script executed:

# Check if aiohttp or cryptography appear anywhere in the repo
rg -i "aiohttp|cryptography" --type py --type txt --type toml --type md

Repository: fbmoulin/2acivelagent

Length of output: 134


🏁 Script executed:

# List all files in repo to understand structure
git ls-files | head -30

Repository: fbmoulin/2acivelagent

Length of output: 691


🌐 Web query:

Werkzeug 3.0.1 vulnerabilities CVE remote code execution safe_join

💡 Result:

Short answer — yes: Werkzeug 3.0.1 is affected by multiple published vulnerabilities.

  • CVE-2024-34069 — Remote Code Execution: debugger hostname/relative-path issue (fixed in werkzeug >= 3.0.3). If you run the debugger (not recommended in production) this is exploitable under specific conditions. [1]
  • CVE-2024-49766 — safe_join / UNC path bypass (Windows + Python < 3.11): safe_join could produce unsafe paths (fixed in werkzeug 3.0.6). Affects Windows deployments using older Python. [2]
  • CVE-2025-66221 — safe_join allows Windows device names (CON/AUX) causing hangs/DoS (fixed in werkzeug 3.1.4); note this is a separate Windows-specific issue affecting versions < 3.1.4. [3]

Recommended action: upgrade to a fixed release (at minimum >= 3.0.6 to address safe_join UNC and >= 3.0.3 for the debugger RCE; consider upgrading to the latest 3.x that includes the 3.1.4 fix if you run on Windows). References: [1] Snyk / CVE-2024-34069; [2] NVD / CVE-2024-49766; [3] Ubuntu / CVE-2025-66221.


🌐 Web query:

aiohttp 3.9.1 vulnerabilities CVE directory traversal request smuggling

💡 Result:

Short answer: yes — aiohttp 3.9.1 is affected by request‑smuggling and a static-file path‑traversal issue (and other related flaws). Upgrade is recommended.

Details and patches:

  • Request smuggling / CRLF/HTTP parsing issues: multiple advisories (e.g. CVE-2023-49082 and follow‑ups) note lenient HTTP parsing that can enable request‑smuggling; fixes were released in later 3.9.x releases (upgrade to ≥ 3.9.2) or newer maintenance branches. [1][2][3]
  • Path‑traversal in static routes: static() with follow_symlinks=True can allow directory traversal — fixed in ≥ 3.9.2. [2]
  • Other related parsing fixes were later addressed in other releases (e.g. 3.10.11, 3.12.14 for trailer/newline parsing). Review which runtime (pure‑Python vs with C extensions) you use — some issues only apply when extensions are disabled. [4][5]

Recommended action: upgrade aiohttp to a patched release (at minimum ≥ 3.9.2 for the issues above; consider a current supported release such as latest 3.12.x or 4.x) and test with your deployment (note pure‑Python builds/AIOHTTP_NO_EXTENSIONS can affect exposure). [2][4][5]

Sources:
[1] NVD / CVE-2023-49082 / advisories.
[2] Snyk package advisory for aiohttp 3.9.1 (request smuggling, path traversal, XSS notes).
[3] Debian / security tracker (notes and fixes).
[4] NVD / CVE-2024-52304 (newline/chunk extension parsing).
[5] NVD / CVE-2025-53643 (trailer parsing) .

If you want, I can (a) list the exact commit/advisory links and fixed versions, or (b) give a one‑line pip/requirements change you can apply.


🌐 Web query:

cryptography 41.0.7 vulnerabilities CVE PKCS7 NULL pointer dereference

💡 Result:

Short answer: This was a NULL-pointer dereference in cryptography's PKCS7 parsing (CVE-2023-49083). It affected cryptography >= 3.1 and was patched in 41.0.6 — so 41.0.7 is not vulnerable. [1][2][3]

References:

  • GitHub security advisory (NULL-dereference when loading PKCS7 certificates). [1]
  • NVD entry for CVE-2023-49083. [2]
  • OSS‑Security / release notes linking the patch (PR #9926). [3]

🌐 Web query:

cryptography 41.0.7 known vulnerabilities CVE security issues

💡 Result:

Short answer — yes. Known/security-relevant issues affecting cryptography 41.0.7:

  • CVE-2024-26130 — NULL-pointer dereference / crash (PKCS#12 mismatch) that can lead to denial-of-service; reported in vendor advisories and affects builds shipped as 41.0.7 in some distributions. [1][2]
  • CVE-2023-49083 — NULL-dereference when loading PKCS#7 (causes segfault/DoS); patched in 41.0.6 but listed in advisories for nearby versions. Verify your build. [3]
  • CVE-2020-25659 (Observable timing discrepancy) — incomplete fixes mean versions < 42.0.0 (including 41.0.7) are flagged by scanners as vulnerable; upgrade to ≥42.0.0 to remediate. [4]

References:

  1. Ubuntu USN / package advisory for python-cryptography (USN-6673-3). [2]
  2. IBM security bulletin referencing CVE-2024-26130. [1]
  3. CVE entry / vulnerability summary for CVE-2023-49083. [3]
  4. Snyk / package vulnerability listing noting observable-timing and other issues for <42.0.0. [4]

If you want, I can (a) fetch the exact CVSS scores and affected version ranges for each CVE, or (b) check whether your installed package/build includes the vulnerable code — tell me your platform and how cryptography was installed.


Critical: Resolve high-severity dependency vulnerabilities before release.

File: requirements.txt

Multiple HIGH severity vulnerabilities require immediate patching:

Werkzeug 3.0.1 (3 CVEs):

  • CVE-2024-34069: remote code execution via the debugger under specific conditions (fixed in 3.0.3)
  • CVE-2024-49766: safe_join UNC path bypass on Windows with Python < 3.11 (fixed in 3.0.6)
  • CVE-2025-66221: safe_join accepts Windows device names (CON/AUX), causing hangs/DoS (fixed in 3.1.4)

Recommendation: Update to Werkzeug 3.0.6 or later (3.1.4+ for Windows deployments)

aiohttp 3.9.1 (Multiple CVEs):

  • CVE-2023-49082 and follow-ups: HTTP request smuggling via lenient CRLF/chunk parsing
  • Path traversal vulnerability in static file serving with follow_symlinks=True
  • Additional parsing issues in trailer and newline handling

Recommendation: Update to aiohttp 3.9.2 minimum; prefer 3.10.11 or 3.12.14 for comprehensive fixes

cryptography 41.0.7 (3 CVEs):

  • CVE-2024-26130: NULL-pointer dereference in PKCS#12 causing DoS
  • CVE-2023-49083: NULL-dereference in PKCS#7 certificate deserialization
  • CVE-2020-25659: Observable timing discrepancy unfixed in versions <42.0.0

Recommendation: Update to cryptography 42.0.0 or later

These are blocking issues for production release.

Comment thread scripts/python/app.py Outdated
Comment on lines +300 to +305
-- Grant permissions
GRANT USAGE ON SCHEMA judicial TO n8n_user;
GRANT USAGE ON SCHEMA audit TO n8n_user;
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA judicial TO n8n_user;
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA audit TO n8n_user;
GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA judicial TO n8n_user;

⚠️ Potential issue | 🟠 Major

Script assumes n8n_user exists and grants excessive privileges.

Two issues:

  1. The script will fail if n8n_user doesn't exist. Add a conditional user creation.
  2. ALL PRIVILEGES violates least privilege principle. Grant only what's needed.
-- Create user if not exists (PostgreSQL 9.0+)
DO $$
BEGIN
    IF NOT EXISTS (SELECT FROM pg_catalog.pg_roles WHERE rolname = 'n8n_user') THEN
        CREATE ROLE n8n_user WITH LOGIN PASSWORD 'change_me_in_production';
    END IF;
END
$$;

-- Grant minimal required privileges
GRANT USAGE ON SCHEMA judicial TO n8n_user;
GRANT USAGE ON SCHEMA audit TO n8n_user;
GRANT SELECT, INSERT, UPDATE ON ALL TABLES IN SCHEMA judicial TO n8n_user;
GRANT SELECT, INSERT ON ALL TABLES IN SCHEMA audit TO n8n_user;
GRANT USAGE, SELECT ON ALL SEQUENCES IN SCHEMA judicial TO n8n_user;
🤖 Prompt for AI Agents
In scripts/sql/init_database.sql around lines 300 to 305, the script assumes
n8n_user exists and uses ALL PRIVILEGES which is too broad; modify the script to
first create the role only if it does not exist (use a DO block checking
pg_catalog.pg_roles and create role with LOGIN and a placeholder password to be
changed in production), then replace ALL PRIVILEGES grants with least-privilege
grants: keep GRANT USAGE ON SCHEMA for judicial and audit, grant only
SELECT/INSERT/UPDATE (or the specific subset your app requires) on tables in
judicial and the narrower set needed for audit, and grant USAGE and SELECT on
sequences in judicial instead of ALL PRIVILEGES on sequences.

Comment on lines +1 to +261
# Sistema de Automacao Juridica - Integration Tests

import pytest
import json
from unittest.mock import Mock, patch, MagicMock


class TestFullWorkflowIntegration:
"""Integration tests for complete workflow"""

@pytest.mark.integration
@patch('scripts.python.app.openai.ChatCompletion.create')
def test_pdf_to_firac_workflow(self, mock_openai, client, sample_pdf_base64, mock_openai_response):
"""Test workflow from PDF extraction to FIRAC analysis"""
# Step 1: Extract PDF
extract_response = client.post('/extract-pdf',
data=json.dumps({'pdf_content': sample_pdf_base64}),
content_type='application/json')

assert extract_response.status_code == 200
extract_data = json.loads(extract_response.data)

# Step 2: If extraction successful and has text, perform FIRAC analysis
if extract_data.get('success') and extract_data.get('text'):
text = extract_data['text']

# Ensure text is long enough for analysis
if len(text) < 50:
text = text + " " * 50 # Pad for test

mock_openai.return_value = mock_openai_response

firac_response = client.post('/firac-analysis',
data=json.dumps({'text': text}),
content_type='application/json')

# Analysis should succeed with mocked OpenAI
if mock_openai.called:
assert firac_response.status_code == 200

@pytest.mark.integration
@patch('scripts.python.app.openai.ChatCompletion.create')
@patch('requests.post')
def test_firac_to_jurisprudence_workflow(self, mock_requests, mock_openai, client,
sample_legal_text, mock_openai_response,
sample_datajud_response):
"""Test workflow from FIRAC analysis to jurisprudence search"""
mock_openai.return_value = mock_openai_response

# Step 1: FIRAC Analysis
firac_response = client.post('/firac-analysis',
data=json.dumps({'text': sample_legal_text}),
content_type='application/json')

assert firac_response.status_code == 200
firac_data = json.loads(firac_response.data)
assert firac_data['success'] is True

# Step 2: Search jurisprudence based on analysis
mock_api_response = Mock()
mock_api_response.status_code = 200
mock_api_response.json.return_value = sample_datajud_response
mock_requests.return_value = mock_api_response

search_response = client.post('/datajud-search',
data=json.dumps({
'tribunal': 'tjsp',
'texto_livre': 'cobranca contrato'
}),
content_type='application/json')

assert search_response.status_code == 200
search_data = json.loads(search_response.data)
assert search_data['success'] is True

@pytest.mark.integration
@patch('scripts.python.app.openai.ChatCompletion.create')
@patch('requests.post')
def test_complete_analysis_workflow(self, mock_requests, mock_openai, client,
sample_legal_text, sample_case_data,
mock_openai_response, sample_datajud_response):
"""Test complete workflow: FIRAC -> Search -> Distinguish -> Generate"""
# Mock OpenAI responses
mock_openai.return_value = mock_openai_response

# Mock DATAJUD response
mock_api_response = Mock()
mock_api_response.status_code = 200
mock_api_response.json.return_value = sample_datajud_response
mock_requests.return_value = mock_api_response

# Step 1: FIRAC Analysis
firac_response = client.post('/firac-analysis',
data=json.dumps({'text': sample_legal_text}),
content_type='application/json')
assert firac_response.status_code == 200

# Step 2: Jurisprudence Search
search_response = client.post('/datajud-search',
data=json.dumps({
'tribunal': 'tjsp',
'texto_livre': 'cobranca'
}),
content_type='application/json')
assert search_response.status_code == 200
search_data = json.loads(search_response.data)

# Step 3: Distinguish Analysis (if results found)
if search_data.get('total_results', 0) > 0:
precedent = sample_datajud_response['hits']['hits'][0]['_source']
distinguish_response = client.post('/distinguish-analysis',
data=json.dumps({
'current_facts': sample_legal_text,
'precedent_data': precedent
}),
content_type='application/json')
assert distinguish_response.status_code == 200

# Step 4: Generate Document
generate_response = client.post('/generate-document',
data=json.dumps({
'document_type': 'sentenca',
'case_data': sample_case_data
}),
content_type='application/json')
assert generate_response.status_code == 200
generate_data = json.loads(generate_response.data)
assert generate_data['success'] is True
assert 'generated_text' in generate_data


class TestServiceAvailability:
"""Tests for service availability and degradation"""

@pytest.mark.integration
def test_api_available_without_redis(self, client):
"""Test that API works even if Redis is unavailable"""
# Health check should still work
response = client.get('/health')
assert response.status_code == 200

data = json.loads(response.data)
assert data['status'] == 'healthy'
# Redis might be false but API should work
assert 'services' in data

@pytest.mark.integration
@patch('scripts.python.app.openai.ChatCompletion.create')
def test_api_handles_openai_failure(self, mock_openai, client, sample_legal_text):
"""Test graceful handling of OpenAI API failure"""
mock_openai.side_effect = Exception("OpenAI API Error")

response = client.post('/firac-analysis',
data=json.dumps({'text': sample_legal_text}),
content_type='application/json')

# Should return error gracefully, not crash
assert response.status_code in [200, 500]
data = json.loads(response.data)
assert 'error' in data or data.get('success') is False

@pytest.mark.integration
@patch('requests.post')
def test_api_handles_datajud_failure(self, mock_requests, client):
"""Test graceful handling of DATAJUD API failure"""
mock_requests.side_effect = Exception("Connection timeout")

response = client.post('/datajud-search',
data=json.dumps({'tribunal': 'tjsp'}),
content_type='application/json')

# Should return error gracefully
assert response.status_code in [200, 500]
data = json.loads(response.data)
assert 'error' in data or data.get('success') is False

@pytest.mark.integration
@patch('requests.post')
def test_api_handles_datajud_auth_failure(self, mock_requests, client):
"""Test handling of DATAJUD authentication failure"""
mock_response = Mock()
mock_response.status_code = 401
mock_response.text = "Unauthorized"
mock_requests.return_value = mock_response

response = client.post('/datajud-search',
data=json.dumps({'tribunal': 'tjsp'}),
content_type='application/json')

data = json.loads(response.data)
assert data['success'] is False


class TestConcurrentRequests:
"""Tests for handling concurrent requests"""

@pytest.mark.integration
def test_multiple_health_checks(self, client):
"""Test multiple concurrent health checks"""
responses = []
for _ in range(10):
response = client.get('/health')
responses.append(response)

assert all(r.status_code == 200 for r in responses)

@pytest.mark.integration
@patch('scripts.python.app.openai.ChatCompletion.create')
def test_multiple_pdf_extractions(self, mock_openai, client, sample_pdf_base64):
"""Test multiple PDF extractions"""
responses = []
for _ in range(5):
response = client.post('/extract-pdf',
data=json.dumps({'pdf_content': sample_pdf_base64}),
content_type='application/json')
responses.append(response)

assert all(r.status_code == 200 for r in responses)


class TestDataValidation:
"""Tests for data validation across endpoints"""

@pytest.mark.integration
def test_json_content_type_required(self, client):
"""Test that JSON content type is properly handled"""
# Send without content-type
response = client.post('/firac-analysis',
data='{"text": "some text"}')

# Should handle gracefully
assert response.status_code in [200, 400, 415]

@pytest.mark.integration
def test_malformed_json_handled(self, client):
"""Test handling of malformed JSON"""
response = client.post('/firac-analysis',
data='{"text": invalid json}',
content_type='application/json')

assert response.status_code == 400

@pytest.mark.integration
def test_unicode_text_handled(self, client, sample_pdf_base64):
"""Test handling of Unicode text (Portuguese characters)"""
unicode_text = """
Processo judicial com caracteres especiais:
ação, execução, sentença, decisão, réu, autor,
índice, cônsul, além, através, após, até
""" + " " * 100 # Ensure minimum length

with patch('scripts.python.app.openai.ChatCompletion.create') as mock_openai:
mock_response = Mock()
mock_response.choices = [Mock(message=Mock(content='{"result": "ok"}'))]
mock_openai.return_value = mock_response

response = client.post('/firac-analysis',
data=json.dumps({'text': unicode_text}),
content_type='application/json')

assert response.status_code == 200

⚠️ Potential issue | 🔴 Critical

Run Black formatter to fix formatting issues.

The pipeline indicates this file needs Black formatting.

#!/bin/bash
# Format the file with Black
black tests/integration/test_api_integration.py --check --diff
🧰 Tools
🪛 GitHub Actions: CI Pipeline

[error] 1-1: Black formatting check failed. Run 'black' to format code in this file.

🪛 Ruff (0.14.7)

209-209: Unused method argument: mock_openai

(ARG002)


244-244: Unused method argument: sample_pdf_base64

(ARG002)

🤖 Prompt for AI Agents
tests/integration/test_api_integration.py lines 1-261: the file is not
Black-formatted which causes CI to fail; run Black on this file and commit the
changes to satisfy formatting checks — e.g. run `black
tests/integration/test_api_integration.py` (or `black .` at repo root), verify
the diff, stage and commit the formatted file, and re-run the pipeline.

Comment on lines +1 to +342
# Sistema de Automacao Juridica - API Routes Unit Tests

import pytest
import json
from unittest.mock import Mock, patch, MagicMock


class TestHealthEndpoint:
"""Tests for /health endpoint"""

@pytest.mark.unit
def test_health_check_returns_200(self, client):
"""Test health endpoint returns 200 OK"""
response = client.get('/health')

assert response.status_code == 200

@pytest.mark.unit
def test_health_check_returns_json(self, client):
"""Test health endpoint returns JSON"""
response = client.get('/health')

assert response.content_type == 'application/json'

@pytest.mark.unit
def test_health_check_contains_status(self, client):
"""Test health response contains status field"""
response = client.get('/health')
data = json.loads(response.data)

assert 'status' in data
assert data['status'] == 'healthy'

@pytest.mark.unit
def test_health_check_contains_services(self, client):
"""Test health response contains services status"""
response = client.get('/health')
data = json.loads(response.data)

assert 'services' in data
assert 'redis' in data['services']
assert 'openai' in data['services']
assert 'datajud' in data['services']

@pytest.mark.unit
def test_health_check_contains_version(self, client):
"""Test health response contains version"""
response = client.get('/health')
data = json.loads(response.data)

assert 'version' in data
assert data['version'] == '1.0.0'


class TestExtractPDFEndpoint:
"""Tests for /extract-pdf endpoint"""

@pytest.mark.unit
def test_extract_pdf_requires_pdf_content(self, client):
"""Test that pdf_content is required"""
response = client.post('/extract-pdf',
data=json.dumps({}),
content_type='application/json')

assert response.status_code == 400
data = json.loads(response.data)
assert 'error' in data

@pytest.mark.unit
def test_extract_pdf_accepts_base64(self, client, sample_pdf_base64):
"""Test that base64 encoded PDF is accepted"""
response = client.post('/extract-pdf',
data=json.dumps({'pdf_content': sample_pdf_base64}),
content_type='application/json')

assert response.status_code == 200

@pytest.mark.unit
def test_extract_pdf_returns_text(self, client, sample_pdf_base64):
"""Test that extracted text is returned"""
response = client.post('/extract-pdf',
data=json.dumps({'pdf_content': sample_pdf_base64}),
content_type='application/json')

data = json.loads(response.data)
assert 'text' in data or 'error' in data

@pytest.mark.unit
def test_extract_pdf_invalid_base64(self, client):
"""Test handling of invalid base64 content"""
response = client.post('/extract-pdf',
data=json.dumps({'pdf_content': 'not-valid-base64!!!'}),
content_type='application/json')

assert response.status_code in [400, 500]

@pytest.mark.unit
def test_extract_pdf_empty_request(self, client):
"""Test handling of empty request body"""
response = client.post('/extract-pdf',
data='',
content_type='application/json')

assert response.status_code == 400


class TestFIRACAnalysisEndpoint:
"""Tests for /firac-analysis endpoint"""

@pytest.mark.unit
def test_firac_requires_text(self, client):
"""Test that text field is required"""
response = client.post('/firac-analysis',
data=json.dumps({}),
content_type='application/json')

assert response.status_code == 400
data = json.loads(response.data)
assert 'error' in data

@pytest.mark.unit
def test_firac_rejects_short_text(self, client):
"""Test that very short text is rejected"""
response = client.post('/firac-analysis',
data=json.dumps({'text': 'Too short'}),
content_type='application/json')

assert response.status_code == 400
data = json.loads(response.data)
assert 'error' in data
assert 'short' in data['error'].lower() or 'minimum' in data['error'].lower()

@pytest.mark.unit
@patch('scripts.python.app.openai.ChatCompletion.create')
def test_firac_calls_openai(self, mock_openai, client, sample_legal_text, mock_openai_response):
"""Test that OpenAI is called for analysis"""
mock_openai.return_value = mock_openai_response

response = client.post('/firac-analysis',
data=json.dumps({'text': sample_legal_text}),
content_type='application/json')

mock_openai.assert_called_once()

@pytest.mark.unit
@patch('scripts.python.app.openai.ChatCompletion.create')
def test_firac_returns_analysis(self, mock_openai, client, sample_legal_text, mock_openai_response):
"""Test that FIRAC analysis is returned"""
mock_openai.return_value = mock_openai_response

response = client.post('/firac-analysis',
data=json.dumps({'text': sample_legal_text}),
content_type='application/json')

assert response.status_code == 200
data = json.loads(response.data)
assert data['success'] is True
assert 'firac_analysis' in data


class TestDatajudSearchEndpoint:
"""Tests for /datajud-search endpoint"""

@pytest.mark.unit
def test_datajud_accepts_empty_params(self, client):
"""Test that empty params returns error or default search"""
response = client.post('/datajud-search',
data=json.dumps({}),
content_type='application/json')

# Should return 200 with error in body or perform default search
assert response.status_code in [200, 400]

@pytest.mark.unit
@patch('requests.post')
def test_datajud_calls_api(self, mock_post, client):
"""Test that DATAJUD API is called"""
mock_response = Mock()
mock_response.status_code = 200
mock_response.json.return_value = {"hits": {"total": {"value": 0}, "hits": []}}
mock_post.return_value = mock_response

response = client.post('/datajud-search',
data=json.dumps({
'tribunal': 'tjsp',
'texto_livre': 'execucao fiscal'
}),
content_type='application/json')

mock_post.assert_called_once()

@pytest.mark.unit
@patch('requests.post')
def test_datajud_returns_results(self, mock_post, client, sample_datajud_response):
"""Test that search results are returned"""
mock_response = Mock()
mock_response.status_code = 200
mock_response.json.return_value = sample_datajud_response
mock_post.return_value = mock_response

response = client.post('/datajud-search',
data=json.dumps({'tribunal': 'tjsp'}),
content_type='application/json')

assert response.status_code == 200
data = json.loads(response.data)
assert data['success'] is True
assert 'total_results' in data


class TestDistinguishAnalysisEndpoint:
"""Tests for /distinguish-analysis endpoint"""

@pytest.mark.unit
def test_distinguish_requires_current_facts(self, client, sample_precedent_data):
"""Test that current_facts is required"""
response = client.post('/distinguish-analysis',
data=json.dumps({'precedent_data': sample_precedent_data}),
content_type='application/json')

assert response.status_code == 400

@pytest.mark.unit
def test_distinguish_requires_precedent_data(self, client):
"""Test that precedent_data is required"""
response = client.post('/distinguish-analysis',
data=json.dumps({'current_facts': 'Some facts'}),
content_type='application/json')

assert response.status_code == 400

@pytest.mark.unit
@patch('scripts.python.app.openai.ChatCompletion.create')
def test_distinguish_returns_analysis(self, mock_openai, client, sample_legal_text,
sample_precedent_data, mock_openai_response):
"""Test that distinguish analysis is returned"""
mock_openai.return_value = mock_openai_response

response = client.post('/distinguish-analysis',
data=json.dumps({
'current_facts': sample_legal_text,
'precedent_data': sample_precedent_data
}),
content_type='application/json')

assert response.status_code == 200
data = json.loads(response.data)
assert data['success'] is True


class TestGenerateDocumentEndpoint:
"""Tests for /generate-document endpoint"""

@pytest.mark.unit
def test_generate_requires_document_type(self, client, sample_case_data):
"""Test that document_type is required"""
response = client.post('/generate-document',
data=json.dumps({'case_data': sample_case_data}),
content_type='application/json')

assert response.status_code == 400

@pytest.mark.unit
def test_generate_requires_case_data(self, client):
"""Test that case_data is required"""
response = client.post('/generate-document',
data=json.dumps({'document_type': 'sentenca'}),
content_type='application/json')

assert response.status_code == 400

@pytest.mark.unit
@patch('scripts.python.app.openai.ChatCompletion.create')
def test_generate_sentenca(self, mock_openai, client, sample_case_data):
"""Test generating sentenca document"""
mock_response = Mock()
mock_response.choices = [Mock(message=Mock(content="Generated sentenca content"))]
mock_openai.return_value = mock_response

response = client.post('/generate-document',
data=json.dumps({
'document_type': 'sentenca',
'case_data': sample_case_data
}),
content_type='application/json')

assert response.status_code == 200
data = json.loads(response.data)
assert data['success'] is True
assert data['document_type'] == 'sentenca'

@pytest.mark.unit
@patch('scripts.python.app.openai.ChatCompletion.create')
def test_generate_despacho(self, mock_openai, client, sample_case_data):
"""Test generating despacho document"""
mock_response = Mock()
mock_response.choices = [Mock(message=Mock(content="Generated despacho content"))]
mock_openai.return_value = mock_response

response = client.post('/generate-document',
data=json.dumps({
'document_type': 'despacho',
'case_data': sample_case_data
}),
content_type='application/json')

assert response.status_code == 200
data = json.loads(response.data)
assert data['document_type'] == 'despacho'

@pytest.mark.unit
def test_generate_invalid_type(self, client, sample_case_data):
"""Test that invalid document type returns error"""
response = client.post('/generate-document',
data=json.dumps({
'document_type': 'invalid_type',
'case_data': sample_case_data
}),
content_type='application/json')

# The endpoint should handle this - either 400 or 200 with error
data = json.loads(response.data)
assert data['success'] is False or response.status_code == 400


class TestErrorHandling:
"""Tests for error handling"""

@pytest.mark.unit
def test_404_returns_json(self, client):
"""Test that 404 errors return JSON"""
response = client.get('/nonexistent-endpoint')

assert response.status_code == 404
assert response.content_type == 'application/json'

@pytest.mark.unit
def test_method_not_allowed(self, client):
"""Test handling of wrong HTTP method"""
response = client.get('/extract-pdf')

assert response.status_code == 405

⚠️ Potential issue | 🔴 Critical

Run Black formatter to fix formatting issues.

The pipeline indicates this file needs Black formatting. Run black tests/unit/test_api_routes.py to resolve the CI failure.

#!/bin/bash
# Format the file with Black
black tests/unit/test_api_routes.py --check --diff
🧰 Tools
🪛 GitHub Actions: CI Pipeline

[error] 1-1: Black formatting check failed. Run 'black' to format code in this file.

🪛 Ruff (0.14.7)

139-139: Local variable response is assigned to but never used

Remove assignment to unused variable response

(F841)


183-183: Local variable response is assigned to but never used

Remove assignment to unused variable response

(F841)

🤖 Prompt for AI Agents
tests/unit/test_api_routes.py lines 1-342: the file is not Black-formatted which
fails CI; run the Black formatter on this file (e.g., black
tests/unit/test_api_routes.py), stage and commit the resulting changes so the
file matches the project formatting rules and the CI Black check passes.

Comment on lines +68 to +86
@patch('PyPDF2.PdfReader')
def test_extract_text_multi_page(self, mock_pdf_reader, pdf_extractor):
"""Test extraction from multi-page PDF"""
# Mock multiple pages
mock_page1 = Mock()
mock_page1.extract_text.return_value = "Page 1 content"
mock_page2 = Mock()
mock_page2.extract_text.return_value = "Page 2 content"

mock_reader = Mock()
mock_reader.pages = [mock_page1, mock_page2]
mock_pdf_reader.return_value = mock_reader

result = pdf_extractor.extract_text_from_pdf(b"dummy pdf content")

assert result['success'] is True
assert result['pages'] == 2
assert "Page 1 content" in result['text']
assert "Page 2 content" in result['text']

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Locate the PDFExtractor implementation
echo "=== Finding PDFExtractor implementation ==="
find . -type f -name "*.py" | xargs grep -l "class PDFExtractor"

# Check imports in app.py
echo ""
echo "=== Checking PyPDF2 imports in app.py ==="
rg -n "import.*PyPDF2|from.*PyPDF2" --type py

# Check PdfReader usage in PDFExtractor
echo ""
echo "=== Checking PdfReader usage ==="
rg -n "PdfReader" --type py -B 2 -A 2

Repository: fbmoulin/2acivelagent

Length of output: 1577


Update the patch path to target where PyPDF2 is used in the implementation.

The current @patch('PyPDF2.PdfReader') patches the global PyPDF2 module, but since scripts/python/app.py imports PyPDF2 at the module level and uses PyPDF2.PdfReader() directly, the patch should target the imported reference:

-@patch('PyPDF2.PdfReader')
+@patch('scripts.python.app.PyPDF2.PdfReader')

The mock must patch where the object is used (in the app.py namespace), not where it's defined (in the PyPDF2 module).

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
@patch('PyPDF2.PdfReader')
def test_extract_text_multi_page(self, mock_pdf_reader, pdf_extractor):
"""Test extraction from multi-page PDF"""
# Mock multiple pages
mock_page1 = Mock()
mock_page1.extract_text.return_value = "Page 1 content"
mock_page2 = Mock()
mock_page2.extract_text.return_value = "Page 2 content"
mock_reader = Mock()
mock_reader.pages = [mock_page1, mock_page2]
mock_pdf_reader.return_value = mock_reader
result = pdf_extractor.extract_text_from_pdf(b"dummy pdf content")
assert result['success'] is True
assert result['pages'] == 2
assert "Page 1 content" in result['text']
assert "Page 2 content" in result['text']
@patch('scripts.python.app.PyPDF2.PdfReader')
def test_extract_text_multi_page(self, mock_pdf_reader, pdf_extractor):
"""Test extraction from multi-page PDF"""
# Mock multiple pages
mock_page1 = Mock()
mock_page1.extract_text.return_value = "Page 1 content"
mock_page2 = Mock()
mock_page2.extract_text.return_value = "Page 2 content"
mock_reader = Mock()
mock_reader.pages = [mock_page1, mock_page2]
mock_pdf_reader.return_value = mock_reader
result = pdf_extractor.extract_text_from_pdf(b"dummy pdf content")
assert result['success'] is True
assert result['pages'] == 2
assert "Page 1 content" in result['text']
assert "Page 2 content" in result['text']
🤖 Prompt for AI Agents
In tests/unit/test_pdf_extractor.py around lines 68 to 86, the test currently
patches PyPDF2 at its module location ('PyPDF2.PdfReader') but the code under
test imports and uses PyPDF2 from scripts/python/app.py; update the patch to
target the symbol where it is used (e.g., change the decorator to patch
'scripts.python.app.PyPDF2.PdfReader' or the actual import path your app module
exposes), so the mock replaces the PdfReader reference in app.py, keeping the
rest of the test logic the same.
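The patch-where-used rule can be seen in a self-contained demo (throwaway module names, not the project's). When a module does `from lib import reader`, it holds its own reference, so patching the defining module has no effect on the caller:

```python
import sys
import types
from unittest.mock import patch

# Build a throwaway "library" module
lib = types.ModuleType("lib")
lib.reader = lambda: "real"
sys.modules["lib"] = lib

# Build a throwaway "app" module that imports the name directly
app = types.ModuleType("app")
sys.modules["app"] = app
exec("from lib import reader\ndef run():\n    return reader()", app.__dict__)

with patch("lib.reader", return_value="mocked"):
    print(app.run())   # real  -- app still holds its own reference

with patch("app.reader", return_value="mocked"):
    print(app.run())   # mocked -- patched where it is used
```

In the PR's case, app.py uses `PyPDF2.PdfReader` as a module attribute, so targeting `scripts.python.app.PyPDF2.PdfReader` makes the intent explicit and keeps the test correct if the import style ever changes to `from PyPDF2 import PdfReader`.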

Security fixes implemented:

1. alembic.ini: Remove hardcoded database credentials

2. migrations/env.py: Require database password from environment,
   no longer accepts default values for sensitive credentials

3. docker-compose.yml: Remove exposed ports for internal services
   - PostgreSQL (5432), Redis (6379), Python services (5000),
     and Prometheus (9090) now only accessible within Docker network
   - Services communicate through internal network, not host ports

4. scripts/python/app.py: Complete security rewrite
   - Add JWT authentication with token verification
   - Restrict CORS to configured allowed origins only
   - Add input validation with length limits
   - Add prompt injection detection and blocking
   - Implement secure error handling (no stack traces exposed)
   - Add request ID tracking for audit trail
   - Remove debug mode in production
   - Add configuration validation at startup

5. .github/workflows/ci.yml: Enforce security scans
   - Bandit security scan now fails pipeline on issues
   - Safety dependency check now fails on vulnerabilities
   - Security job added to required CI checks

These changes address CRITICAL and HIGH severity issues identified
in the security analysis.
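The token-verification flow described above can be sketched with the standard library alone; the real app.py presumably uses PyJWT, and the helper names and key here are illustrative:

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"change-me"  # stands in for Config.JWT_SECRET

def _sign(data: bytes) -> str:
    return base64.urlsafe_b64encode(
        hmac.new(SECRET, data, hashlib.sha256).digest()
    ).decode()

def create_token(user_id: str, ttl: int = 3600) -> str:
    payload = json.dumps({"user_id": user_id, "exp": time.time() + ttl})
    body = base64.urlsafe_b64encode(payload.encode()).decode()
    return f"{body}.{_sign(body.encode())}"

def verify_token(token: str):
    body, sig = token.rsplit(".", 1)
    if not hmac.compare_digest(sig, _sign(body.encode())):
        return None  # tampered or wrongly signed
    claims = json.loads(base64.urlsafe_b64decode(body))
    if claims["exp"] < time.time():
        return None  # expired
    return claims

token = create_token("judge-42")
print(verify_token(token)["user_id"])    # judge-42
print(verify_token(token[:-2] + "xx"))   # None (signature mismatch)
```

The same shape applies with PyJWT: decode inside a try/except, reject on signature or expiry, and attach the claims to the request context for the audit trail.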

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (4)
scripts/python/app.py (4)

128-130: Deprecated OpenAI API key assignment.

Setting openai.api_key is deprecated in openai>=1.0. The v1.x library reads from OPENAI_API_KEY environment variable automatically when creating a client. Remove this assignment and instantiate clients properly.

This is related to the OpenAI v1.x migration issue flagged in past reviews.


376-406: Critical: OpenAI API usage incompatible with openai v1.x.

openai.ChatCompletion.create() and openai.error.* exceptions were removed in openai v1.0+. Since openai>=1.6.1 is required, this code will fail with AttributeError at runtime.

As flagged in past reviews, migrate to the client-based API:

from openai import OpenAI

client = OpenAI()  # Uses OPENAI_API_KEY env var automatically

try:
    response = client.chat.completions.create(
        model=Config.OPENAI_MODEL,
        messages=[
            {"role": "system", "content": cls.SYSTEM_PROMPT},
            {"role": "user", "content": prompt}
        ],
        max_tokens=Config.OPENAI_MAX_TOKENS,
        temperature=0.3,
        timeout=Config.OPENAI_TIMEOUT
    )
    return {
        "success": True,
        "firac_analysis": response.choices[0].message.content,
        ...
    }
except openai.RateLimitError:
    ...
except openai.AuthenticationError:
    ...
except openai.APITimeoutError:
    ...

520-540: Same OpenAI v1.x incompatibility issue.

This method uses the deprecated openai.ChatCompletion.create() API, same as FIRACAnalyzer. Apply the same migration pattern using the client-based API.


572-593: Same OpenAI v1.x incompatibility issue.

This method also uses the deprecated openai.ChatCompletion.create() API. Apply the same client-based migration.

🧹 Nitpick comments (7)
.github/workflows/ci.yml (4)

12-14: Unused POETRY_VERSION environment variable.

POETRY_VERSION is defined but the workflow uses pip throughout. Either remove this unused variable or switch dependency management to Poetry if that's the intended approach.

 env:
   PYTHON_VERSION: '3.11'
-  POETRY_VERSION: '1.7.1'

40-43: Consider installing type stubs for mypy.

The lint job installs black isort flake8 mypy but not type stubs. If mypy is intended to provide meaningful checks, consider adding common stubs or using --install-types --non-interactive.


197-200: Coverage job re-runs all tests - consider caching coverage data.

This job runs the full test suite again after unit and integration tests already passed. For efficiency, consider collecting coverage artifacts from previous jobs and merging them, or accept this as the simplest approach for now.


294-305: Consider including docker-build in failure checks.

The ci-success job checks lint, test-unit, test-integration, and security but not coverage or docker-build. If Docker build failures should block merging, add it to the condition:

           if [[ "${{ needs.lint.result }}" == "failure" ]] || \
              [[ "${{ needs.test-unit.result }}" == "failure" ]] || \
              [[ "${{ needs.test-integration.result }}" == "failure" ]] || \
+             [[ "${{ needs.docker-build.result }}" == "failure" ]] || \
              [[ "${{ needs.security.result }}" == "failure" ]]; then
scripts/python/app.py (3)

238-249: Annotate class constant with ClassVar.

DANGEROUS_PATTERNS is a class-level constant. Annotating it with ClassVar improves type checking clarity.

+from typing import ClassVar
+
 class InputValidator:
     """Input validation and sanitization utilities."""

     # Patterns that might indicate prompt injection
-    DANGEROUS_PATTERNS = [
+    DANGEROUS_PATTERNS: ClassVar[list[str]] = [
         r'ignore\s+(previous|all|above)',
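As a standalone illustration of how such a pattern check can work (the list below is an abbreviated, hypothetical subset; the real DANGEROUS_PATTERNS in app.py is longer):

```python
import re

# Abbreviated, hypothetical subset of the patterns in app.py
DANGEROUS_PATTERNS = [
    r'ignore\s+(previous|all|above)',
    r'system\s*prompt',
    r'disregard\s+instructions',
]

def looks_like_prompt_injection(text: str) -> bool:
    # Lowercase once, then scan every pattern against the input
    lowered = text.lower()
    return any(re.search(pattern, lowered) for pattern in DANGEROUS_PATTERNS)

print(looks_like_prompt_injection("Please IGNORE all previous instructions"))  # True
print(looks_like_prompt_injection("Processo de execucao fiscal contra a Fazenda"))  # False
```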

342-347: Use logging.exception to include stack traces for debugging.

In error handlers, logging.error doesn't include the stack trace. Using logging.exception provides more context for debugging while still sanitizing the user-facing message.

         except PyPDF2.errors.PdfReadError as e:
-            logger.error(f"PDF read error: {type(e).__name__}")
+            logger.exception("PDF read error")
             return {"success": False, "error": "Invalid or corrupted PDF file"}
         except Exception as e:
-            logger.error(f"PDF extraction error: {type(e).__name__}: {e}")
+            logger.exception("PDF extraction error")
             return {"success": False, "error": "Failed to extract text from PDF"}

Apply similar changes to other error handlers (lines 399-406, 483, 539, 592).


183-184: datetime.utcnow() is deprecated.

datetime.utcnow() is deprecated since Python 3.12. Use datetime.now(timezone.utc) for forward compatibility:

+from datetime import datetime, timedelta, timezone

 def create_jwt_token(user_id: str, roles: List[str] = None) -> str:
     """Create JWT token for authentication."""
     payload = {
         'user_id': user_id,
         'roles': roles or ['user'],
-        'iat': datetime.utcnow(),
-        'exp': datetime.utcnow() + timedelta(hours=Config.JWT_EXPIRY_HOURS)
+        'iat': datetime.now(timezone.utc),
+        'exp': datetime.now(timezone.utc) + timedelta(hours=Config.JWT_EXPIRY_HOURS)
     }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ebe0325 and fd66572.

📒 Files selected for processing (5)
  • .github/workflows/ci.yml (1 hunks)
  • alembic.ini (1 hunks)
  • docker-compose.yml (4 hunks)
  • migrations/env.py (1 hunks)
  • scripts/python/app.py (1 hunks)
🧰 Additional context used
🪛 Checkov (3.2.334)
.github/workflows/ci.yml

[medium] 166-167: Basic Auth Credentials

(CKV_SECRET_4)

🪛 Ruff (0.14.7)
migrations/env.py

46-49: Avoid specifying long messages outside the exception class

(TRY003)

scripts/python/app.py

1-1: Shebang is present but file is not executable

(EXE001)


73-73: Possible hardcoded password assigned to: "JWT_SECRET"

(S105)


145-145: Do not catch blind exception: Exception

(BLE001)


178-178: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


193-193: Consider moving this statement to an else block

(TRY300)


239-249: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


252-252: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


343-343: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


345-345: Do not catch blind exception: Exception

(BLE001)


346-346: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


399-399: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


404-404: Do not catch blind exception: Exception

(BLE001)


405-405: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


483-483: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


538-538: Do not catch blind exception: Exception

(BLE001)


539-539: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


549-553: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


591-591: Do not catch blind exception: Exception

(BLE001)


592-592: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


702-702: Do not catch blind exception: Exception

(BLE001)


832-832: Unused function argument: error

(ARG001)


841-841: Unused function argument: error

(ARG001)


850-850: Unused function argument: error

(ARG001)


859-859: Unused function argument: error

(ARG001)


868-868: Unused function argument: error

(ARG001)


877-877: Unused function argument: error

(ARG001)

🔇 Additional comments (10)
alembic.ini (1)

26-30: Previous security concern successfully addressed.

The hardcoded database credentials flagged in earlier reviews have been removed. The configuration now correctly directs users to set DATABASE_URL or POSTGRES_URL via environment variables, with clear inline documentation.

migrations/env.py (3)

44-49: Previous security concern successfully addressed.

The function now properly requires the password via environment variable and raises a clear error if not set, eliminating the risk of using a hardcoded default.


59-81: LGTM!

The offline migration mode is properly configured with appropriate flags (literal_binds=True, compare_type=True, compare_server_default=True) for accurate schema comparison.


84-109: LGTM!

The online migration mode is correctly implemented with NullPool for migrations and include_schemas=True to support the multi-schema setup (judicial and audit schemas) described in the PR.

docker-compose.yml (1)

69-72: Good security hardening - internal services no longer exposed to host.

Changing from ports to expose for postgres, redis, python-services, and prometheus is the correct approach. These services are only needed within the Docker network and should be accessed through nginx reverse proxy or via docker exec during development.

Also applies to: 88-91, 108-110, 157-159
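In compose terms the change looks like this (service excerpt is illustrative, not the project's exact file):

```yaml
services:
  postgres:
    expose:
      - "5432"        # reachable only from other containers on the network
    # ports:
    #   - "5432:5432" # would publish to the host -- avoid for internal services
```

With `expose`, other services still reach `postgres:5432` by service name, while nothing is bound on the host; only nginx keeps published `ports`.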

.github/workflows/ci.yml (3)

91-94: Verify pytest marker logic includes intended tests.

The expression -m "unit or not integration" selects tests marked unit plus any test that lacks the integration marker, so unmarked tests are included in the unit job. If this is intentional for gradual marker adoption, consider adding a comment explaining the behavior.
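To make the selection behavior concrete, here is a small simulation of how that marker expression evaluates (illustrative only; pytest's real matcher handles full boolean expressions):

```python
# Simulate which tests -m "unit or not integration" selects.
tests = {
    "test_health_check": {"unit"},
    "test_full_pdf_flow": {"integration"},
    "test_unmarked_helper": set(),   # no marker at all
}

def selected(markers: set) -> bool:
    # mirrors the boolean marker expression literally
    return "unit" in markers or "integration" not in markers

picked = sorted(name for name, m in tests.items() if selected(m))
print(picked)  # ['test_health_check', 'test_unmarked_helper']
```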


126-128: Test credentials in CI are acceptable.

The Checkov warning about credentials (CKV_SECRET_4) is a false positive here. These are ephemeral test database credentials used only within the CI runner's isolated PostgreSQL service container.


220-256: Security scanning job is well-configured.

Running Bandit with -ll -ii for medium+ severity/confidence and Safety for dependency vulnerabilities provides good coverage. The job runs independently in parallel, which is efficient.
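A job excerpt for this configuration might look like the following (step names and paths are assumptions, not the workflow's exact contents):

```yaml
# Hypothetical excerpt: both steps fail the pipeline on findings
- name: Bandit security scan
  run: bandit -r scripts/ -ll -ii

- name: Safety dependency check
  run: safety check -r requirements.txt
```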

scripts/python/app.py (2)

720-728: Good fix: Deterministic cache key using hashlib.

The cache key now uses hashlib.sha256() instead of the non-deterministic hash() function. This ensures consistent cache hits across interpreter sessions.
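A minimal illustration of why this matters (function name and key format here are hypothetical, not the actual app.py implementation):

```python
import hashlib

def cache_key(endpoint: str, payload: str) -> str:
    # sha256 is stable across processes; the built-in hash() is randomized
    # per interpreter session (PYTHONHASHSEED), so hash()-based keys would
    # never match between restarts and the cache would silently miss.
    digest = hashlib.sha256(f"{endpoint}:{payload}".encode("utf-8")).hexdigest()
    return f"cache:{endpoint}:{digest[:16]}"

k1 = cache_key("firac-analysis", '{"text": "..."}')
k2 = cache_key("firac-analysis", '{"text": "..."}')
print(k1 == k2)  # True: identical input always yields the same key
```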


831-892: Error handlers follow Flask conventions.

The unused error parameter warnings (ARG001) are false positives. Flask requires this parameter signature for error handlers. The handlers intentionally don't expose internal error details to prevent information disclosure.

Comment thread migrations/env.py
Comment on lines +51 to +54
host = os.getenv("POSTGRES_HOST", "localhost")
port = os.getenv("POSTGRES_PORT", "5432")
db = os.getenv("POSTGRES_DB", "n8n")
user = os.getenv("POSTGRES_USER", "n8n_user")

⚠️ Potential issue | 🟡 Minor

Default database name and user appear misaligned with project purpose.

The defaults reference "n8n" (line 53: POSTGRES_DB defaults to "n8n", line 54: POSTGRES_USER defaults to "n8n_user"), which seems to be from a different project. This is a judicial automation system per the PR description, so these defaults may confuse developers during local setup.

Consider updating the defaults to reflect the actual project:

-    db = os.getenv("POSTGRES_DB", "n8n")
-    user = os.getenv("POSTGRES_USER", "n8n_user")
+    db = os.getenv("POSTGRES_DB", "judicial_automation")
+    user = os.getenv("POSTGRES_USER", "judicial_user")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
host = os.getenv("POSTGRES_HOST", "localhost")
port = os.getenv("POSTGRES_PORT", "5432")
db = os.getenv("POSTGRES_DB", "n8n")
user = os.getenv("POSTGRES_USER", "n8n_user")
host = os.getenv("POSTGRES_HOST", "localhost")
port = os.getenv("POSTGRES_PORT", "5432")
db = os.getenv("POSTGRES_DB", "judicial_automation")
user = os.getenv("POSTGRES_USER", "judicial_user")
🤖 Prompt for AI Agents
In migrations/env.py around lines 51 to 54, the POSTGRES_DB and POSTGRES_USER
default values ("n8n" and "n8n_user") are from another project; update these
defaults to project-appropriate values (for example a descriptive DB name like
"judicial_automation" and a matching user like "judicial_user" or blank them so
callers must set env vars). Modify the os.getenv default arguments accordingly
and ensure any downstream docs or .env.example are updated to match the new
defaults.

Comment thread scripts/python/app.py
Comment on lines +68 to +79
@classmethod
def validate(cls) -> Tuple[bool, List[str]]:
"""Validate required configuration."""
errors = []

if not cls.JWT_SECRET or cls.JWT_SECRET == 'dev-secret-key':
errors.append("JWT_SECRET must be set to a secure value")

if not cls.OPENAI_API_KEY:
errors.append("OPENAI_API_KEY must be set")

return len(errors) == 0, errors

⚠️ Potential issue | 🟠 Major

Configuration validation only runs in __main__ block.

Config.validate() is called only when running directly via python app.py. When deployed via Gunicorn (gunicorn app:app), this validation is bypassed. Consider validating at module load time or in a factory function.

+# Validate configuration at import time for production safety
+if os.getenv('FLASK_ENV') != 'development':
+    valid, errors = Config.validate()
+    if not valid:
+        raise RuntimeError(f"Configuration errors: {'; '.join(errors)}")

Add this after the Config class definition (around line 80).

🧰 Tools
🪛 Ruff (0.14.7)

73-73: Possible hardcoded password assigned to: "JWT_SECRET"

(S105)

🤖 Prompt for AI Agents
In scripts/python/app.py around lines 68 to 79 (add after the Config class at
~line 80): Config.validate() currently only runs in the __main__ block so
Gunicorn bypasses it; call Config.validate() at module load (immediately after
the class definition) and raise a RuntimeError or exit when validation fails so
the app fails fast under Gunicorn and other import-based runners; ensure the
raised error includes the list of validation messages for clear diagnostics.
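A minimal, self-contained sketch of the fail-fast pattern the prompt describes (the `Config` attributes are stand-ins modeled on the snippet above; `fail_fast` is a hypothetical name — in `app.py` the same check would simply run at module scope right after the class):

```python
import os
from typing import List, Tuple

class Config:
    """Minimal stand-in for the app's Config class."""
    JWT_SECRET = os.getenv("JWT_SECRET", "")
    OPENAI_API_KEY = os.getenv("OPENAI_API_KEY", "")

    @classmethod
    def validate(cls) -> Tuple[bool, List[str]]:
        errors = []
        if not cls.JWT_SECRET or cls.JWT_SECRET == "dev-secret-key":
            errors.append("JWT_SECRET must be set to a secure value")
        if not cls.OPENAI_API_KEY:
            errors.append("OPENAI_API_KEY must be set")
        return len(errors) == 0, errors

def fail_fast() -> None:
    """Call at module scope so Gunicorn workers refuse to boot on bad config.

    Skipped in development so local runs work without real secrets.
    """
    if os.getenv("FLASK_ENV") == "development":
        return
    valid, errors = Config.validate()
    if not valid:
        raise RuntimeError(f"Configuration errors: {'; '.join(errors)}")
```

Because the check runs at import time, `gunicorn app:app` fails immediately with the full list of problems instead of serving requests with an insecure configuration.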

Comment thread scripts/python/app.py
Comment on lines +899 to +916
if __name__ == '__main__':
# Validate configuration
valid, errors = Config.validate()
if not valid:
for error in errors:
logger.critical(f"Configuration error: {error}")
logger.critical("Application cannot start with invalid configuration")
exit(1)

logger.info("Starting Judicial Automation API (Security-Hardened)...")
logger.info(f"Allowed origins: {Config.ALLOWED_ORIGINS}")
logger.info(f"OpenAI configured: {bool(Config.OPENAI_API_KEY)}")
logger.info(f"DATAJUD configured: {bool(Config.DATAJUD_USERNAME)}")
logger.info(f"Redis configured: {redis_client is not None}")

# IMPORTANT: In production, use a proper WSGI server like Gunicorn
# gunicorn -w 4 -b 0.0.0.0:5000 app:app
app.run(host='127.0.0.1', port=5000)

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

fd -t f "Dockerfile.python" --exec cat {}

Repository: fbmoulin/2acivelagent

Length of output: 1790


🏁 Script executed:

fd -t f "requirements.txt" --exec grep -i gunicorn {} +

Repository: fbmoulin/2acivelagent

Length of output: 99


🏁 Script executed:

fd -t f "requirements.txt" --exec cat {}

Repository: fbmoulin/2acivelagent

Length of output: 1219


Use Gunicorn (or another production WSGI server) instead of Flask development server.

The code includes a comment recommending Gunicorn for production, but Dockerfile.python runs the application with CMD ["python", "app.py"], which executes Flask's development server. Gunicorn is not installed (missing from requirements.txt). The development server is unsuitable for production and exposes unnecessary risks. Add gunicorn to requirements.txt and update the Dockerfile's CMD to use Gunicorn, such as: CMD ["gunicorn", "-w", "4", "-b", "0.0.0.0:5000", "app:app"].

Documentation updates:

1. README.md
   - Updated security section with new security measures
   - Added infrastructure security diagram
   - Added JWT authentication examples for all API endpoints
   - Updated LGPD compliance section

2. .env.template
   - Added JWT configuration variables (JWT_ALGORITHM, JWT_EXPIRY_HOURS)
   - Added API_AUTH_ENABLED flag
   - Added ALLOWED_ORIGINS for CORS configuration
   - Added command to generate secure JWT secret

3. docs/DEPLOYMENT.md (NEW)
   - Complete step-by-step deployment tutorial
   - Server preparation and requirements
   - Docker and firewall configuration
   - SSL/TLS setup with Let's Encrypt
   - Database migrations guide
   - Service startup order
   - Post-deployment verification checklist
   - Monitoring setup with Grafana
   - Backup configuration
   - Security hardening steps
   - Troubleshooting guide

4. docs/SECURITY.md (NEW)
   - Security architecture overview
   - JWT authentication guide
   - Network security configuration
   - Data protection guidelines
   - Input validation and prompt injection protection
   - Security headers and rate limiting
   - Monitoring and auditing setup
   - Incident response procedures
   - LGPD compliance requirements
   - Security checklists (pre/post deployment, periodic)
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
.env.template (1)

35-35: Improve security placeholders and add clearer generation guidance.

The placeholders for sensitive keys should use clearer formats and include direct instructions rather than generic descriptions.

- OPENAI_API_KEY=sk-your_openai_api_key_here
+ OPENAI_API_KEY=sk-CHANGE_ME_MUST_BE_SET

For JWT and encryption keys, provide inline instructions to generate secure values:

- # REQUIRED: Generate with: python -c "import secrets; print(secrets.token_hex(32))"
- JWT_SECRET=generate_a_secure_64_character_hex_key_here
+ # REQUIRED: Generate with: python3 -c "import secrets; print(secrets.token_hex(32))"
+ # Example: JWT_SECRET=a1b2c3d4e5f6... (64 hex characters)
+ JWT_SECRET=CHANGE_ME_GENERATE_WITH_COMMAND_ABOVE

And similarly for ENCRYPTION_KEY:

- # Encryption key for sensitive data
- ENCRYPTION_KEY=generate_another_32_character_key
+ # Encryption key for sensitive data
+ # Generate with: python3 -c "import secrets; print(secrets.token_hex(16))"
+ ENCRYPTION_KEY=CHANGE_ME_GENERATE_WITH_COMMAND_ABOVE

This prevents accidental use of placeholder values and makes the setup process more explicit.

Also applies to: 81-81, 92-92
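The "prevents accidental use of placeholder values" goal above can also be enforced at startup. A sketch (the marker list and function name are illustrative assumptions, matched to the placeholder formats suggested in this template):

```python
from typing import List, Mapping

# Substrings that indicate a value was copied straight from .env.template
PLACEHOLDER_MARKERS = ("change_me", "your_", "_here", "generate_a")

def find_placeholder_values(required: List[str], env: Mapping[str, str]) -> List[str]:
    """Return the names of required variables that are unset or still placeholders."""
    bad = []
    for name in required:
        value = env.get(name, "")
        if not value or any(m in value.lower() for m in PLACEHOLDER_MARKERS):
            bad.append(name)
    return bad
```

Calling this against `os.environ` during the existing `Config.validate()` step would catch both missing keys and keys that were never changed from the template.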

🧹 Nitpick comments (6)
docs/SECURITY.md (2)

446-448: Convert bare URLs to proper markdown links for better documentation.

Lines 446 and 551 contain bare URLs that should be formatted as markdown links for consistency with documentation standards and to improve user experience.

- - Security Team Email: security@your-domain.com
- - Emergency Phone: +55 XX XXXX-XXXX
- - External Security Consultant: contact info
+ - Security Team Email: [security@your-domain.com](mailto:security@your-domain.com)
+ - Emergency Phone: [+55 XX XXXX-XXXX](tel:+55XXXXXXXXXX)
+ - External Security Consultant: [contact info](CONTACT_INFO_URL)

And similarly at line 551 for the GitHub Issues URL:

- 1. **Do NOT** open a public GitHub issue
- 2. Email security details to: security@your-domain.com
+ 1. **Do NOT** open a public GitHub issue
+ 2. Email security details to: [security@your-domain.com](mailto:security@your-domain.com)

Also applies to: 551-553


117-119: Minor grammar improvements in token best practices section.

Line 117 repeats "only" and line 119 has a compound adjective that needs a hyphen:

- 1. **Never log tokens** - Tokens should not appear in logs
- 2. **Use HTTPS only** - Tokens must only be transmitted over TLS
- 3. **Short expiry** - Default 24 hours, configurable via `JWT_EXPIRY_HOURS`
- 4. **Secure storage** - Store tokens securely on client side
+ 1. **Never log tokens** - Tokens should not appear in logs
+ 2. **Use HTTPS only** - Tokens must be transmitted over TLS only
+ 3. **Short expiry** - Default 24 hours, configurable via `JWT_EXPIRY_HOURS`
+ 4. **Secure storage** - Store tokens securely on client-side systems
.env.template (2)

68-68: Quote SMTP_FROM_NAME value to ensure shell safety.

Line 68 contains spaces and should be quoted for safety when sourced in shell scripts:

- SMTP_FROM_NAME=Sistema Automacao Juridica
+ SMTP_FROM_NAME="Sistema Automacao Juridica"

1-4: Add usage instructions at the top of the template.

Provide clear guidance on how to use this template to avoid accidental deployment with placeholder values:

+ # IMPORTANT: This is a template file. To use it:
+ # 1. Copy to .env: cp .env.template .env
+ # 2. Edit all CHANGE_ME_ values
+ # 3. Replace all generated keys (JWT_SECRET, ENCRYPTION_KEY, passwords)
+ # 4. NEVER commit .env to version control
+ # 5. DO NOT use this template directly - always create a new .env file first
+ #
docs/DEPLOYMENT.md (2)

503-515: Consider adding backup encryption for production security.

The backup script (lines 503-515) compresses backups with gzip but does not encrypt them. This is inconsistent with the SECURITY.md guide which recommends encrypted backups.

Add GPG encryption to sensitive backups:

 # Backup PostgreSQL
 echo "Backing up PostgreSQL..."
-docker compose exec -T postgres pg_dump -U n8n_user n8n | gzip > "$BACKUP_DIR/database.sql.gz"
+docker compose exec -T postgres pg_dump -U n8n_user n8n | gzip | gpg --symmetric --cipher-algo AES256 -o "$BACKUP_DIR/database.sql.gz.gpg"
 
 # Backup N8N workflows
 echo "Backing up N8N workflows..."
 docker compose exec -T n8n n8n export:workflow --all --output=/home/node/.n8n/backup.json 2>/dev/null || true
-docker cp judicial-n8n:/home/node/.n8n/backup.json "$BACKUP_DIR/workflows.json" 2>/dev/null || true
+docker cp judicial-n8n:/home/node/.n8n/backup.json - 2>/dev/null | gpg --symmetric --cipher-algo AES256 -o "$BACKUP_DIR/workflows.json.gpg" || true
 
 # Backup configuration
 echo "Backing up configuration..."
-tar -czf "$BACKUP_DIR/config.tar.gz" -C /opt/judicial-automation .env config/
+tar -czf - -C /opt/judicial-automation .env config/ | gpg --symmetric --cipher-algo AES256 -o "$BACKUP_DIR/config.tar.gz.gpg"

Add a note that the GPG passphrase must be securely stored.


262-265: Ensure cron job has proper path context for docker-compose.

The crontab entry at line 264 references docker-compose but cron jobs may not have the same PATH as interactive shells:

- # Add: 0 3 * * * certbot renew --quiet --post-hook "docker compose -f /opt/judicial-automation/docker-compose.yml restart nginx"
+ # Add: 0 3 * * * /usr/local/bin/docker-compose -f /opt/judicial-automation/docker-compose.yml down && /usr/local/bin/docker-compose -f /opt/judicial-automation/docker-compose.yml up -d nginx
+ # Or use full paths: 0 3 * * * certbot renew --quiet --post-hook "sh -c 'cd /opt/judicial-automation && docker compose restart nginx'"

Alternatively, create a wrapper script and reference it:

cat > /opt/judicial-automation/scripts/shell/certbot-renew.sh << 'EOF'
#!/bin/bash
cd /opt/judicial-automation
docker compose restart nginx
EOF
chmod +x /opt/judicial-automation/scripts/shell/certbot-renew.sh

# Then crontab:
# 0 3 * * * certbot renew --quiet --post-hook "/opt/judicial-automation/scripts/shell/certbot-renew.sh"
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fd66572 and bde35ee.

📒 Files selected for processing (4)
  • .env.template (1 hunks)
  • README.md (7 hunks)
  • docs/DEPLOYMENT.md (1 hunks)
  • docs/SECURITY.md (1 hunks)
🧰 Additional context used
🪛 dotenv-linter (4.0.0)
.env.template

[warning] 12-12: [UnorderedKey] The N8N_PASSWORD key should go before the N8N_PORT key

(UnorderedKey)


[warning] 20-20: [UnorderedKey] The POSTGRES_DB key should go before the POSTGRES_HOST key

(UnorderedKey)


[warning] 22-22: [UnorderedKey] The POSTGRES_PASSWORD key should go before the POSTGRES_PORT key

(UnorderedKey)


[warning] 29-29: [UnorderedKey] The REDIS_PASSWORD key should go before the REDIS_PORT key

(UnorderedKey)


[warning] 37-37: [UnorderedKey] The OPENAI_MAX_TOKENS key should go before the OPENAI_MODEL key

(UnorderedKey)


[warning] 43-43: [UnorderedKey] The DATAJUD_PASSWORD key should go before the DATAJUD_USERNAME key

(UnorderedKey)


[warning] 44-44: [UnorderedKey] The DATAJUD_BASE_URL key should go before the DATAJUD_PASSWORD key

(UnorderedKey)


[warning] 57-57: [UnorderedKey] The GOOGLE_LOCATION key should go before the GOOGLE_PROJECT_ID key

(UnorderedKey)


[warning] 58-58: [UnorderedKey] The GOOGLE_PROCESSOR_ID key should go before the GOOGLE_PROJECT_ID key

(UnorderedKey)


[warning] 66-66: [UnorderedKey] The SMTP_PASSWORD key should go before the SMTP_PORT key

(UnorderedKey)


[warning] 67-67: [UnorderedKey] The SMTP_FROM_EMAIL key should go before the SMTP_HOST key

(UnorderedKey)


[warning] 68-68: [UnorderedKey] The SMTP_FROM_NAME key should go before the SMTP_HOST key

(UnorderedKey)


[warning] 68-68: [ValueWithoutQuotes] This value needs to be surrounded in quotes

(ValueWithoutQuotes)


[warning] 74-74: [UnorderedKey] The GRAFANA_PASSWORD key should go before the GRAFANA_USER key

(UnorderedKey)


[warning] 75-75: [UnorderedKey] The GF_SERVER_ROOT_URL key should go before the GRAFANA_PASSWORD key

(UnorderedKey)


[warning] 82-82: [UnorderedKey] The JWT_ALGORITHM key should go before the JWT_SECRET key

(UnorderedKey)


[warning] 83-83: [UnorderedKey] The JWT_EXPIRY_HOURS key should go before the JWT_SECRET key

(UnorderedKey)


[warning] 98-98: [UnorderedKey] The FLASK_DEBUG key should go before the FLASK_ENV key

(UnorderedKey)

🪛 Gitleaks (8.30.0)
docs/SECURITY.md

[high] 99-99: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)

🪛 LanguageTool
docs/SECURITY.md

[style] ~117-~117: This adverb was used twice in the sentence. Consider removing one of them or replacing them with a synonym.
Context: ...ogs 2. Use HTTPS only - Tokens must only be transmitted over TLS 3. **Short expi...

(ADVERB_REPETITION_PREMIUM)


[uncategorized] ~119-~119: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...re storage** - Store tokens securely on client side 5. Revocation - Implement token bla...

(EN_COMPOUND_ADJECTIVE_INTERNAL)

🪛 markdownlint-cli2 (0.18.1)
docs/SECURITY.md

26-26: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


446-446: Bare URL used

(MD034, no-bare-urls)


551-551: Bare URL used

(MD034, no-bare-urls)

🔇 Additional comments (4)
docs/SECURITY.md (1)

1-560: Comprehensive and well-structured security guide.

The security documentation provides excellent coverage of authentication, network isolation, data protection, input validation, monitoring, and LGPD compliance. The architecture diagrams and code examples are clear and actionable. The security checklist and incident response procedures add practical value for operational teams.

README.md (2)

157-243: Comprehensive and well-structured JWT authentication documentation.

The README updates properly document the new JWT-centric authentication approach with clear examples, a token endpoint, usage patterns, and infrastructure security considerations. The addition of the infrastructure security diagram and the updated LGPD compliance section enhance the overall documentation quality.


157-169: JWT authentication implementation verified and documented correctly.

The /auth/token endpoint exists and all documented API endpoints properly enforce JWT authentication:

  • /auth/token endpoint implemented at scripts/python/app.py:198 with correct response format
  • ✓ All API endpoints (/extract-pdf, /firac-analysis, /datajud-search, /distinguish-analysis, /generate-document) use @require_auth decorator
  • ✓ JWT helper functions implemented with proper token creation, validation, and expiry handling (default 24 hours)
  • ✓ Bearer token validation correctly extracts and verifies tokens

Note: The /health endpoint is publicly accessible without authentication and is not documented in the README. Consider documenting this explicitly if it's intentionally public for monitoring purposes.
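The token mechanics verified above (HS256 signing, expiry check, Bearer validation) can be sketched with only the standard library. This is illustrative of the JWT structure, not the app's actual implementation, which presumably uses a library such as PyJWT:

```python
import base64
import hashlib
import hmac
import json
import time

def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def make_jwt(payload: dict, secret: str, expiry_hours: int = 24) -> str:
    """Build an HS256 JWT by hand (default 24h expiry, as documented)."""
    header = {"alg": "HS256", "typ": "JWT"}
    claims = dict(payload, exp=int(time.time()) + expiry_hours * 3600)
    signing_input = f"{_b64url(json.dumps(header).encode())}.{_b64url(json.dumps(claims).encode())}"
    sig = hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(sig)}"

def verify_jwt(token: str, secret: str) -> dict:
    """Verify signature and expiry; raise ValueError on failure."""
    signing_input, _, sig_b64 = token.rpartition(".")
    expected = hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(_b64url(expected), sig_b64):
        raise ValueError("bad signature")
    claims_b64 = signing_input.split(".")[1]
    padded = claims_b64 + "=" * (-len(claims_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(padded))
    if claims["exp"] < time.time():
        raise ValueError("token expired")
    return claims
```

The constant-time `hmac.compare_digest` and the explicit expiry check mirror the properties the review confirmed in the app's JWT helpers.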

docs/DEPLOYMENT.md (1)

1-730: Comprehensive and production-ready deployment documentation.

The deployment guide provides excellent step-by-step coverage from server preparation through post-deployment verification. Service startup order is properly sequenced, health checks are included, SSL/TLS setup covers both Let's Encrypt and self-signed scenarios, and troubleshooting section is practical. The guide properly emphasizes security hardening, firewall configuration, and backup strategies.
