Add missing infrastructure files and database schema#1
Add missing infrastructure files and database schema#1
Conversation
- Add Dockerfile.python for Python microservices container - Add nginx.conf with SSL, rate limiting, and reverse proxy config - Add Python Flask API application (app.py) - Add PostgreSQL database initialization script with: - Judicial schema with cases, documents, analyses tables - Audit schema for change tracking - Triggers for automatic timestamps and audit logging - Views for case summary and processing statistics - Add SSL certificates directory placeholder
- Add requirements.txt with all Python dependencies - Add pyproject.toml with modern Python project configuration - Package metadata and classifiers - Optional dependencies for dev and monitoring - Tool configurations for black, isort, mypy, pytest - Add .env.template with documented environment variables - Add .gitignore for Python, Docker, secrets, and IDE files
- Add conftest.py with shared fixtures for testing - Add unit tests for PDFExtractor class - Add unit tests for all API endpoints: - /health, /extract-pdf, /firac-analysis - /datajud-search, /distinguish-analysis, /generate-document - Add integration tests for complete workflows - Add tests for error handling and service degradation - Add pytest.ini with markers and configuration - Test structure: tests/unit/, tests/integration/
- Add CI pipeline (ci.yml) with: - Code quality checks (black, isort, flake8, mypy) - Unit tests with pytest - Integration tests with Redis and PostgreSQL services - Test coverage reporting with Codecov - Security scanning (bandit, safety) - Docker build verification - Add deployment pipeline (deploy.yml) with: - Docker image build and push to GHCR - Staging deployment with smoke tests - Production deployment with health checks - GitHub Release creation for tags - Manual rollback capability - Add Dependabot configuration for: - Python dependencies (weekly) - GitHub Actions (weekly) - Docker images (weekly)
- Add OpenAPI 3.1 specification (openapi.yaml) with: - All API endpoints documented - Request/response schemas - Error responses - Authentication details - Rate limiting info - Add comprehensive API documentation (API_DOCUMENTATION.md): - Endpoint descriptions with examples - curl examples for each endpoint - Python and JavaScript SDK examples - Error handling guide - LGPD compliance information
- Add alembic.ini configuration - Add migrations/env.py with environment setup - Add migrations/models.py with SQLAlchemy ORM models: - Case, Document, FIRACAnalysis - JurisprudenceSearch, DistinguishAnalysis - GeneratedDocument, Notification, AuditLog - Add initial migration (001) creating: - judicial and audit schemas - All tables with indexes and constraints - Enum types for case_status and document_type - Update timestamp triggers - Add migrations/README.md with usage documentation
|
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. WalkthroughAdds project foundation: environment template, CI/CD (Dependabot, GitHub Actions), Alembic migrations and SQLAlchemy models, a Flask API (PDF extraction, FIRAC, DataJud, distinguish, document generation) with Redis/OpenAI integrations, Docker/Nginx artifacts, OpenAPI and docs, DB init scripts, and tests. Changes
Sequence Diagram(s)sequenceDiagram
actor Client
participant API as Flask API
participant Redis
participant OpenAI
participant Datajud as DATAJUD API
participant DB as PostgreSQL
Client->>API: POST /extract-pdf (pdf/base64)
API->>Redis: GET extracted_text(cache)
alt cache hit
Redis-->>API: cached text
else
API->>API: extract via PyPDF2
API->>Redis: SET extracted_text(cache)
end
API-->>Client: extracted text
Client->>API: POST /firac-analysis (text)
API->>Redis: GET firac_analysis(cache)
alt cache hit
Redis-->>API: cached analysis
else
API->>OpenAI: FIRAC prompt (GPT-4)
OpenAI-->>API: structured FIRAC response
API->>DB: INSERT firac_analyses
API->>Redis: SET firac_analysis(cache)
end
API-->>Client: FIRAC analysis
Client->>API: POST /datajud-search (params)
API->>Datajud: search request
Datajud-->>API: results
API->>DB: INSERT jurisprudence_searches
API-->>Client: search results
Client->>API: POST /distinguish-analysis (facts, precedent)
API->>OpenAI: distinguish prompt
OpenAI-->>API: applicability analysis
API->>DB: INSERT distinguish_analyses
API-->>Client: distinguish result
Client->>API: POST /generate-document (type, case)
API->>OpenAI: generation prompt
OpenAI-->>API: document content
API->>DB: INSERT generated_documents
API-->>Client: generated document
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes
Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Summary of ChangesHello @fbmoulin, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request establishes the core infrastructure and initial microservices for a judicial automation platform. It sets up the environment for Python-based services, configures Nginx for secure and efficient traffic management, and lays down a comprehensive PostgreSQL database schema with migration capabilities. The included Flask API provides essential functionalities for legal document processing and AI-driven analysis, supported by automated dependency updates and a foundational test suite. Highlights
Ignored Files
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a significant amount of infrastructure and application code, including Docker setup, Nginx configuration, a Python Flask API, and a complete database schema with migrations. The overall structure is very well-organized and follows many best practices, such as using multi-stage Docker builds, security headers in Nginx, and a comprehensive test suite structure. However, there are several critical security vulnerabilities related to hardcoded passwords and some significant bugs in the Python application due to outdated library usage. My review focuses on addressing these critical issues, improving security, fixing bugs, and enhancing maintainability by removing redundancies.
| response = openai.ChatCompletion.create( | ||
| model="gpt-4", | ||
| messages=[ | ||
| {"role": "system", "content": "Voce e um especialista em analise juridica brasileira."}, | ||
| {"role": "user", "content": prompt} | ||
| ], | ||
| max_tokens=2000, | ||
| temperature=0.3 | ||
| ) |
There was a problem hiding this comment.
The openai.ChatCompletion.create syntax is for openai library version < 1.0.0. Your requirements.txt specifies openai==1.6.1, which uses a new client-based API. This code will raise an AttributeError. The suggested change updates the call to the new syntax. For better performance, you may want to initialize the OpenAI client once globally rather than in each function call. This same issue exists in DistinguishAnalyzer and DocumentGenerator.
| response = openai.ChatCompletion.create( | |
| model="gpt-4", | |
| messages=[ | |
| {"role": "system", "content": "Voce e um especialista em analise juridica brasileira."}, | |
| {"role": "user", "content": prompt} | |
| ], | |
| max_tokens=2000, | |
| temperature=0.3 | |
| ) | |
| from openai import OpenAI | |
| client = OpenAI() | |
| response = client.chat.completions.create( | |
| model="gpt-4", | |
| messages=[ | |
| {"role": "system", "content": "Voce e um especialista em analise juridica brasileira."}, | |
| {"role": "user", "content": prompt} | |
| ], | |
| max_tokens=2000, | |
| temperature=0.3 | |
| ) |
|
|
||
| # SQLAlchemy URL (can be overridden by env) | ||
| # Use env variable in production | ||
| sqlalchemy.url = postgresql://n8n_user:password@localhost:5432/n8n |
There was a problem hiding this comment.
Hardcoding credentials, even for a local database with a simple password, is a significant security risk. The sqlalchemy.url is committed to version control with a password. It's much safer to rely entirely on environment variables, which your migrations/env.py script already supports. Please remove this line or comment it out to prevent accidental credential exposure.
# sqlalchemy.url = postgresql://n8n_user:password@localhost:5432/n8n
| if __name__ == '__main__': | ||
| logger.info("Starting Judicial Automation API...") | ||
| logger.info(f"OpenAI configured: {bool(openai.api_key)}") | ||
| logger.info(f"DATAJUD configured: {bool(datajud_client.username)}") | ||
| logger.info(f"Redis configured: {redis_client is not None}") | ||
|
|
||
| app.run(host='0.0.0.0', port=5000, debug=os.getenv('FLASK_DEBUG', 'false').lower() == 'true') |
There was a problem hiding this comment.
The application is started using app.run(), which runs the Flask development server. This server is not suitable for production as it's not designed to be efficient, stable, or secure. Your Dockerfile executes this script directly. You should use a production-grade WSGI server like Gunicorn or uWSGI to run the application, and update the CMD in your Dockerfile accordingly. This if __name__ == '__main__': block is useful for local development but shouldn't be the entrypoint for production.
| response = openai.ChatCompletion.create( | ||
| model="gpt-4", | ||
| messages=[ | ||
| {"role": "system", "content": "Voce e um magistrado especialista em redacao de pecas judiciais."}, | ||
| {"role": "user", "content": prompts[document_type]} | ||
| ], | ||
| max_tokens=3000, | ||
| temperature=0.3 | ||
| ) |
There was a problem hiding this comment.
The openai.ChatCompletion.create syntax is outdated for openai>=1.0.0 and will cause a runtime error. This needs to be updated to use the new client-based API, similar to the other OpenAI calls in this file.
| response = openai.ChatCompletion.create( | |
| model="gpt-4", | |
| messages=[ | |
| {"role": "system", "content": "Voce e um magistrado especialista em redacao de pecas judiciais."}, | |
| {"role": "user", "content": prompts[document_type]} | |
| ], | |
| max_tokens=3000, | |
| temperature=0.3 | |
| ) | |
| from openai import OpenAI | |
| client = OpenAI() | |
| response = client.chat.completions.create( | |
| model="gpt-4", | |
| messages=[ | |
| {"role": "system", "content": "Voce e um magistrado especialista em redacao de pecas judiciais."}, | |
| {"role": "user", "content": prompts[document_type]} | |
| ], | |
| max_tokens=3000, | |
| temperature=0.3 | |
| ) |
| port = os.getenv("POSTGRES_PORT", "5432") | ||
| db = os.getenv("POSTGRES_DB", "n8n") | ||
| user = os.getenv("POSTGRES_USER", "n8n_user") | ||
| password = os.getenv("POSTGRES_PASSWORD", "password") |
There was a problem hiding this comment.
The get_url function has a hardcoded default password 'password'. This is a critical security risk, as it might be used accidentally in a real environment if the POSTGRES_PASSWORD environment variable is not set. The application should fail explicitly if a password is not provided, preventing connections with weak or default credentials.
| password = os.getenv("POSTGRES_PASSWORD", "password") | |
| password = os.environ["POSTGRES_PASSWORD"] # Fail loudly if not set |
| [pytest] | ||
| # Sistema de Automacao Juridica - Pytest Configuration | ||
|
|
||
| # Test discovery | ||
| testpaths = tests | ||
| python_files = test_*.py *_test.py | ||
| python_classes = Test* | ||
| python_functions = test_* | ||
|
|
||
| # Minimum version | ||
| minversion = 7.0 | ||
|
|
||
| # Default options | ||
| addopts = | ||
| -ra | ||
| -q | ||
| --strict-markers | ||
| --tb=short | ||
| -v | ||
|
|
||
| # Markers | ||
| markers = | ||
| unit: Unit tests (fast, no external dependencies) | ||
| integration: Integration tests (may require mocked services) | ||
| slow: Slow running tests | ||
| requires_api: Tests requiring external API access | ||
|
|
||
| # Logging | ||
| log_cli = true | ||
| log_cli_level = INFO | ||
| log_cli_format = %(asctime)s [%(levelname)s] %(message)s | ||
| log_cli_date_format = %Y-%m-%d %H:%M:%S | ||
|
|
||
| # Warnings | ||
| filterwarnings = | ||
| ignore::DeprecationWarning | ||
| ignore::PendingDeprecationWarning | ||
|
|
||
| # Timeout (requires pytest-timeout) | ||
| # timeout = 30 | ||
|
|
||
| # Coverage settings (when running with --cov) | ||
| # Configured in pyproject.toml |
There was a problem hiding this comment.
This pytest.ini file duplicates several configurations that are also present in pyproject.toml (e.g., testpaths, python_files, filterwarnings). This can lead to confusion and make maintenance harder, as it's unclear which file takes precedence. It's recommended to consolidate all tool configurations into pyproject.toml to have a single source of truth.
| | buck-out | ||
| | build | ||
| | dist | ||
| | migrations |
There was a problem hiding this comment.
Excluding the entire migrations directory from Black formatting is not ideal. While the auto-generated migration files can be noisy, the env.py and models.py files within it would benefit from consistent formatting. Consider refining the exclusion pattern or running Black on those specific files manually.
| JWT_SECRET=generate_a_secure_32_character_key_here | ||
| ENCRYPTION_KEY=generate_another_32_character_key |
There was a problem hiding this comment.
| security: | ||
| - bearerAuth: [] |
There was a problem hiding this comment.
Applying security at the top level makes all endpoints require authentication. This is likely not intended for the /health endpoint, which is typically public for monitoring purposes. It's better to define security requirements on a per-path or per-operation basis for more granular control. Consider removing this global security definition and applying it only to the endpoints that need protection.
| -- Sistema de Automacao Juridica - Database Initialization | ||
| -- PostgreSQL 15 | ||
|
|
||
| -- Create extensions | ||
| CREATE EXTENSION IF NOT EXISTS "uuid-ossp"; | ||
| CREATE EXTENSION IF NOT EXISTS "pg_trgm"; | ||
|
|
||
| -- Create schemas | ||
| CREATE SCHEMA IF NOT EXISTS judicial; | ||
| CREATE SCHEMA IF NOT EXISTS audit; | ||
|
|
||
| -- Set search path | ||
| SET search_path TO judicial, public; | ||
|
|
||
| -- ===================================================== | ||
| -- MAIN TABLES | ||
| -- ===================================================== | ||
|
|
||
| -- Cases table | ||
| CREATE TABLE IF NOT EXISTS judicial.cases ( | ||
| id UUID PRIMARY KEY DEFAULT uuid_generate_v4(), | ||
| case_number VARCHAR(50) NOT NULL UNIQUE, | ||
| tribunal VARCHAR(20) NOT NULL, | ||
| court VARCHAR(100), | ||
| class_code INTEGER, | ||
| class_name VARCHAR(255), | ||
| subject_codes INTEGER[], | ||
| status VARCHAR(50) DEFAULT 'pending', | ||
| priority INTEGER DEFAULT 0, | ||
| received_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(), | ||
| processed_at TIMESTAMP WITH TIME ZONE, | ||
| created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(), | ||
| updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW() | ||
| ); | ||
|
|
||
| -- Create index for case search | ||
| CREATE INDEX IF NOT EXISTS idx_cases_number ON judicial.cases(case_number); | ||
| CREATE INDEX IF NOT EXISTS idx_cases_tribunal ON judicial.cases(tribunal); | ||
| CREATE INDEX IF NOT EXISTS idx_cases_status ON judicial.cases(status); | ||
| CREATE INDEX IF NOT EXISTS idx_cases_created ON judicial.cases(created_at); | ||
|
|
||
| -- Documents table | ||
| CREATE TABLE IF NOT EXISTS judicial.documents ( | ||
| id UUID PRIMARY KEY DEFAULT uuid_generate_v4(), | ||
| case_id UUID REFERENCES judicial.cases(id) ON DELETE CASCADE, | ||
| document_type VARCHAR(50) NOT NULL, | ||
| file_name VARCHAR(255), | ||
| file_path VARCHAR(500), | ||
| file_size BIGINT, | ||
| mime_type VARCHAR(100), | ||
| extracted_text TEXT, | ||
| page_count INTEGER, | ||
| checksum VARCHAR(64), | ||
| uploaded_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(), | ||
| processed_at TIMESTAMP WITH TIME ZONE, | ||
| created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW() | ||
| ); | ||
|
|
||
| -- Create index for document search | ||
| CREATE INDEX IF NOT EXISTS idx_documents_case ON judicial.documents(case_id); | ||
| CREATE INDEX IF NOT EXISTS idx_documents_type ON judicial.documents(document_type); | ||
| CREATE INDEX IF NOT EXISTS idx_documents_text_gin ON judicial.documents USING gin(to_tsvector('portuguese', extracted_text)); | ||
|
|
||
| -- FIRAC Analysis table | ||
| CREATE TABLE IF NOT EXISTS judicial.firac_analyses ( | ||
| id UUID PRIMARY KEY DEFAULT uuid_generate_v4(), | ||
| case_id UUID REFERENCES judicial.cases(id) ON DELETE CASCADE, | ||
| document_id UUID REFERENCES judicial.documents(id) ON DELETE SET NULL, | ||
| facts TEXT, | ||
| issues TEXT, | ||
| rules TEXT, | ||
| analysis TEXT, | ||
| conclusion TEXT, | ||
| raw_response JSONB, | ||
| model_used VARCHAR(50), | ||
| tokens_used INTEGER, | ||
| confidence_score DECIMAL(3,2), | ||
| analyzed_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(), | ||
| created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW() | ||
| ); | ||
|
|
||
| CREATE INDEX IF NOT EXISTS idx_firac_case ON judicial.firac_analyses(case_id); | ||
|
|
||
| -- Jurisprudence Search Results table | ||
| CREATE TABLE IF NOT EXISTS judicial.jurisprudence_searches ( | ||
| id UUID PRIMARY KEY DEFAULT uuid_generate_v4(), | ||
| case_id UUID REFERENCES judicial.cases(id) ON DELETE CASCADE, | ||
| search_query JSONB NOT NULL, | ||
| tribunal VARCHAR(20), | ||
| total_results INTEGER, | ||
| results JSONB, | ||
| search_duration_ms INTEGER, | ||
| searched_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(), | ||
| created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW() | ||
| ); | ||
|
|
||
| CREATE INDEX IF NOT EXISTS idx_jurisprudence_case ON judicial.jurisprudence_searches(case_id); | ||
| CREATE INDEX IF NOT EXISTS idx_jurisprudence_tribunal ON judicial.jurisprudence_searches(tribunal); | ||
|
|
||
| -- Distinguish Analysis table | ||
| CREATE TABLE IF NOT EXISTS judicial.distinguish_analyses ( | ||
| id UUID PRIMARY KEY DEFAULT uuid_generate_v4(), | ||
| case_id UUID REFERENCES judicial.cases(id) ON DELETE CASCADE, | ||
| jurisprudence_search_id UUID REFERENCES judicial.jurisprudence_searches(id) ON DELETE SET NULL, | ||
| current_facts TEXT NOT NULL, | ||
| precedent_data JSONB, | ||
| is_applicable BOOLEAN, | ||
| similarities TEXT, | ||
| differences TEXT, | ||
| recommendation TEXT, | ||
| raw_response JSONB, | ||
| confidence_score DECIMAL(3,2), | ||
| analyzed_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(), | ||
| created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW() | ||
| ); | ||
|
|
||
| CREATE INDEX IF NOT EXISTS idx_distinguish_case ON judicial.distinguish_analyses(case_id); | ||
|
|
||
| -- Generated Documents table | ||
| CREATE TABLE IF NOT EXISTS judicial.generated_documents ( | ||
| id UUID PRIMARY KEY DEFAULT uuid_generate_v4(), | ||
| case_id UUID REFERENCES judicial.cases(id) ON DELETE CASCADE, | ||
| firac_analysis_id UUID REFERENCES judicial.firac_analyses(id) ON DELETE SET NULL, | ||
| distinguish_analysis_id UUID REFERENCES judicial.distinguish_analyses(id) ON DELETE SET NULL, | ||
| document_type VARCHAR(50) NOT NULL, | ||
| title VARCHAR(255), | ||
| content TEXT NOT NULL, | ||
| google_docs_id VARCHAR(255), | ||
| google_docs_url VARCHAR(500), | ||
| version INTEGER DEFAULT 1, | ||
| status VARCHAR(50) DEFAULT 'draft', | ||
| generated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(), | ||
| published_at TIMESTAMP WITH TIME ZONE, | ||
| created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(), | ||
| updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW() | ||
| ); | ||
|
|
||
| CREATE INDEX IF NOT EXISTS idx_generated_case ON judicial.generated_documents(case_id); | ||
| CREATE INDEX IF NOT EXISTS idx_generated_type ON judicial.generated_documents(document_type); | ||
| CREATE INDEX IF NOT EXISTS idx_generated_status ON judicial.generated_documents(status); | ||
|
|
||
| -- Notifications table | ||
| CREATE TABLE IF NOT EXISTS judicial.notifications ( | ||
| id UUID PRIMARY KEY DEFAULT uuid_generate_v4(), | ||
| case_id UUID REFERENCES judicial.cases(id) ON DELETE CASCADE, | ||
| notification_type VARCHAR(50) NOT NULL, | ||
| recipient_email VARCHAR(255), | ||
| subject VARCHAR(255), | ||
| message TEXT, | ||
| status VARCHAR(50) DEFAULT 'pending', | ||
| sent_at TIMESTAMP WITH TIME ZONE, | ||
| error_message TEXT, | ||
| created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW() | ||
| ); | ||
|
|
||
| CREATE INDEX IF NOT EXISTS idx_notifications_case ON judicial.notifications(case_id); | ||
| CREATE INDEX IF NOT EXISTS idx_notifications_status ON judicial.notifications(status); | ||
|
|
||
| -- ===================================================== | ||
| -- AUDIT TABLES | ||
| -- ===================================================== | ||
|
|
||
| -- Audit log table | ||
| CREATE TABLE IF NOT EXISTS audit.logs ( | ||
| id UUID PRIMARY KEY DEFAULT uuid_generate_v4(), | ||
| table_name VARCHAR(100) NOT NULL, | ||
| record_id UUID, | ||
| action VARCHAR(20) NOT NULL, | ||
| old_data JSONB, | ||
| new_data JSONB, | ||
| user_id VARCHAR(100), | ||
| ip_address INET, | ||
| user_agent TEXT, | ||
| created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW() | ||
| ); | ||
|
|
||
| CREATE INDEX IF NOT EXISTS idx_audit_table ON audit.logs(table_name); | ||
| CREATE INDEX IF NOT EXISTS idx_audit_record ON audit.logs(record_id); | ||
| CREATE INDEX IF NOT EXISTS idx_audit_action ON audit.logs(action); | ||
| CREATE INDEX IF NOT EXISTS idx_audit_created ON audit.logs(created_at); | ||
|
|
||
| -- ===================================================== | ||
| -- FUNCTIONS AND TRIGGERS | ||
| -- ===================================================== | ||
|
|
||
| -- Update timestamp function | ||
| CREATE OR REPLACE FUNCTION update_updated_at_column() | ||
| RETURNS TRIGGER AS $$ | ||
| BEGIN | ||
| NEW.updated_at = NOW(); | ||
| RETURN NEW; | ||
| END; | ||
| $$ language 'plpgsql'; | ||
|
|
||
| -- Apply update trigger to relevant tables | ||
| CREATE TRIGGER update_cases_updated_at | ||
| BEFORE UPDATE ON judicial.cases | ||
| FOR EACH ROW | ||
| EXECUTE FUNCTION update_updated_at_column(); | ||
|
|
||
| CREATE TRIGGER update_generated_documents_updated_at | ||
| BEFORE UPDATE ON judicial.generated_documents | ||
| FOR EACH ROW | ||
| EXECUTE FUNCTION update_updated_at_column(); | ||
|
|
||
| -- Audit logging function | ||
| CREATE OR REPLACE FUNCTION audit.log_changes() | ||
| RETURNS TRIGGER AS $$ | ||
| BEGIN | ||
| IF TG_OP = 'INSERT' THEN | ||
| INSERT INTO audit.logs (table_name, record_id, action, new_data) | ||
| VALUES (TG_TABLE_SCHEMA || '.' || TG_TABLE_NAME, NEW.id, 'INSERT', to_jsonb(NEW)); | ||
| RETURN NEW; | ||
| ELSIF TG_OP = 'UPDATE' THEN | ||
| INSERT INTO audit.logs (table_name, record_id, action, old_data, new_data) | ||
| VALUES (TG_TABLE_SCHEMA || '.' || TG_TABLE_NAME, NEW.id, 'UPDATE', to_jsonb(OLD), to_jsonb(NEW)); | ||
| RETURN NEW; | ||
| ELSIF TG_OP = 'DELETE' THEN | ||
| INSERT INTO audit.logs (table_name, record_id, action, old_data) | ||
| VALUES (TG_TABLE_SCHEMA || '.' || TG_TABLE_NAME, OLD.id, 'DELETE', to_jsonb(OLD)); | ||
| RETURN OLD; | ||
| END IF; | ||
| RETURN NULL; | ||
| END; | ||
| $$ LANGUAGE plpgsql; | ||
|
|
||
| -- Apply audit triggers | ||
| CREATE TRIGGER audit_cases | ||
| AFTER INSERT OR UPDATE OR DELETE ON judicial.cases | ||
| FOR EACH ROW EXECUTE FUNCTION audit.log_changes(); | ||
|
|
||
| CREATE TRIGGER audit_documents | ||
| AFTER INSERT OR UPDATE OR DELETE ON judicial.documents | ||
| FOR EACH ROW EXECUTE FUNCTION audit.log_changes(); | ||
|
|
||
| CREATE TRIGGER audit_generated_documents | ||
| AFTER INSERT OR UPDATE OR DELETE ON judicial.generated_documents | ||
| FOR EACH ROW EXECUTE FUNCTION audit.log_changes(); | ||
|
|
||
| -- ===================================================== | ||
| -- VIEWS | ||
| -- ===================================================== | ||
|
|
||
| -- Case summary view | ||
| CREATE OR REPLACE VIEW judicial.case_summary AS | ||
| SELECT | ||
| c.id, | ||
| c.case_number, | ||
| c.tribunal, | ||
| c.court, | ||
| c.class_name, | ||
| c.status, | ||
| c.priority, | ||
| c.received_at, | ||
| c.processed_at, | ||
| COUNT(DISTINCT d.id) AS document_count, | ||
| COUNT(DISTINCT f.id) AS firac_analysis_count, | ||
| COUNT(DISTINCT g.id) AS generated_document_count, | ||
| MAX(g.generated_at) AS last_document_generated | ||
| FROM judicial.cases c | ||
| LEFT JOIN judicial.documents d ON d.case_id = c.id | ||
| LEFT JOIN judicial.firac_analyses f ON f.case_id = c.id | ||
| LEFT JOIN judicial.generated_documents g ON g.case_id = c.id | ||
| GROUP BY c.id; | ||
|
|
||
| -- Processing statistics view | ||
| CREATE OR REPLACE VIEW judicial.processing_stats AS | ||
| SELECT | ||
| DATE(created_at) AS date, | ||
| COUNT(*) AS total_cases, | ||
| COUNT(CASE WHEN status = 'completed' THEN 1 END) AS completed_cases, | ||
| COUNT(CASE WHEN status = 'pending' THEN 1 END) AS pending_cases, | ||
| COUNT(CASE WHEN status = 'error' THEN 1 END) AS error_cases, | ||
| AVG(EXTRACT(EPOCH FROM (processed_at - received_at))) AS avg_processing_seconds | ||
| FROM judicial.cases | ||
| WHERE created_at >= NOW() - INTERVAL '30 days' | ||
| GROUP BY DATE(created_at) | ||
| ORDER BY date DESC; | ||
|
|
||
| -- ===================================================== | ||
| -- INITIAL DATA | ||
| -- ===================================================== | ||
|
|
||
| -- Insert common document types | ||
| CREATE TABLE IF NOT EXISTS judicial.document_types ( | ||
| code VARCHAR(50) PRIMARY KEY, | ||
| name VARCHAR(100) NOT NULL, | ||
| description TEXT | ||
| ); | ||
|
|
||
| INSERT INTO judicial.document_types (code, name, description) VALUES | ||
| ('sentenca', 'Sentenca', 'Decisao judicial que resolve o merito da causa'), | ||
| ('despacho', 'Despacho', 'Ato judicial de mero expediente'), | ||
| ('decisao', 'Decisao Interlocutoria', 'Decisao que resolve questao incidente'), | ||
| ('acordao', 'Acordao', 'Decisao colegiada de tribunal'), | ||
| ('peticao', 'Peticao', 'Documento da parte'), | ||
| ('parecer', 'Parecer', 'Opiniao tecnica do Ministerio Publico') | ||
| ON CONFLICT (code) DO NOTHING; | ||
|
|
||
| -- Grant permissions | ||
| GRANT USAGE ON SCHEMA judicial TO n8n_user; | ||
| GRANT USAGE ON SCHEMA audit TO n8n_user; | ||
| GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA judicial TO n8n_user; | ||
| GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA audit TO n8n_user; | ||
| GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA judicial TO n8n_user; | ||
|
|
||
| -- Success message | ||
| DO $$ | ||
| BEGIN | ||
| RAISE NOTICE 'Database initialization completed successfully!'; | ||
| RAISE NOTICE 'Schemas created: judicial, audit'; | ||
| RAISE NOTICE 'Tables created: cases, documents, firac_analyses, jurisprudence_searches, distinguish_analyses, generated_documents, notifications'; | ||
| END $$; |
There was a problem hiding this comment.
This SQL script manually defines the database schema, which is also managed by Alembic migrations. Maintaining two sources of truth for the database schema is problematic and can lead to inconsistencies. For example, this script includes a GIN index and audit triggers that are missing from the Alembic migration. It's best to rely on a single system, preferably Alembic, to manage all schema changes. Please consolidate all schema definitions into your Alembic migrations and consider removing this SQL file or marking it clearly as for development setup only and ensuring it's generated from migrations.
There was a problem hiding this comment.
Actionable comments posted: 16
Note
Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.
🟡 Minor comments (11)
.env.template-68-68 (1)
68-68: Quote unquoted environment variable value.Line 68 contains a value with spaces that should be quoted. The dotenv parser may not handle the unquoted value correctly.
Apply this diff:
-SMTP_FROM_NAME=Sistema Automacao Juridica +SMTP_FROM_NAME="Sistema Automacao Juridica"tests/unit/test_pdf_extractor.py-1-102 (1)
1-102: Run Black formatter to fix formatting issues.The CI pipeline reports a Black formatting check failure. Run
black tests/unit/test_pdf_extractor.pyto automatically format the code.#!/bin/bash # Format the file with Black black tests/unit/test_pdf_extractor.pydocs/API_DOCUMENTATION.md-19-19 (1)
19-19: Fix markdown formatting issues flagged by markdownlint.Several minor markdown formatting issues were detected:
- Line 19: Add language identifier to fenced code block
- Lines 161, 301: Add blank lines around tables
- Lines 388-389: Convert bare URLs to markdown links
Apply these fixes:
+Authorization: Bearer <your_token>
| size | integer | No | Max results (default: 50, max: 100) | + **Response:**| 500 | Internal server error | + ---For API support: -- **Email:** suporte@judicial-automation.com.br -- **GitHub Issues:** https://github.com/fbmoulin/2acivelagent/issues +- **Email:** <suporte@judicial-automation.com.br> +- **GitHub Issues:** <https://github.com/fbmoulin/2acivelagent/issues>Also applies to: 161-161, 301-301, 388-389
alembic.ini-28-28 (1)
28-28: Update database name to match the project.The database name
n8nappears to reference the n8n workflow automation platform, which is inconsistent with this judicial automation system. Update to a more appropriate name likejudicial_automation.-sqlalchemy.url = postgresql://n8n_user:password@localhost:5432/n8n +# sqlalchemy.url = postgresql://judicial_user:password@localhost:5432/judicial_automationdocs/api/openapi.yaml-158-158 (1)
158-158: Escape or rephrase the colon in the description text.YAMLlint reports a syntax error on this line. The colon character within the Portuguese text
(ex: falha OpenAI)may confuse YAML parsers that interpret it as a key-value separator.Apply this diff to fix the issue by wrapping the description in quotes:
- description: Erro na analise (ex: falha OpenAI) + description: "Erro na analise (ex: falha OpenAI)"Alternatively, replace the colon with a dash or remove the parenthetical:
- description: Erro na analise (ex: falha OpenAI) + description: Erro na analise (ex - falha OpenAI)tests/unit/test_api_routes.py-134-143 (1)
134-143: Add assertions on the response.The
responsevariable is assigned but never used. The test should verify the response status code or data to ensure complete coverage.response = client.post('/firac-analysis', data=json.dumps({'text': sample_legal_text}), content_type='application/json') mock_openai.assert_called_once() + assert response.status_code == 200tests/unit/test_api_routes.py-175-190 (1)
175-190: Add assertions on the response.The
responsevariable is assigned but never used. The test should verify the response to ensure the API call succeeded.response = client.post('/datajud-search', data=json.dumps({ 'tribunal': 'tjsp', 'texto_livre': 'execucao fiscal' }), content_type='application/json') mock_post.assert_called_once() + assert response.status_code == 200migrations/models.py-244-244 (1)
244-244: Same privacy consideration as noted for SQL script.
recipient_emailstores PII; consider documenting data retention and protection requirements per LGPD compliance.scripts/python/app.py-1-2 (1)
1-2: Run Black formatter to fix CI pipeline failure.The CI pipeline reports Black formatting check failed. Run
black scripts/python/app.pyto fix. Also, the shebang is present but file is not executable (Ruff EXE001).tests/conftest.py-1-3 (1)
1-3: Run Black formatter to fix CI pipeline failure.The CI pipeline reports that Black formatting check failed for this file. Run
black tests/conftest.pyto fix formatting issues.scripts/sql/init_database.sql-142-157 (1)
142-157: Consider privacy implications for email storage.The
recipient_emailcolumn stores email addresses in plain text. Depending on your compliance requirements (LGPD - Brazil's data protection law), you may need to:
- Document this data collection in privacy policies
- Consider encryption at rest
- Implement data retention policies
🧹 Nitpick comments (20)
.env.template (1)
1-109: Consider reordering environment variables for consistency.The dotenv-linter tool flagged 17 instances where variables within each section are not alphabetically ordered (e.g., POSTGRES_DB before POSTGRES_HOST, REDIS_PASSWORD before REDIS_PORT). While this does not affect functionality, alphabetical ordering improves consistency and readability.
docs/api/openapi.yaml (2)
54-76: Consider excluding /health from authentication requirements.The
/healthendpoint is typically used by load balancers, monitoring systems, and container orchestrators for health checks. Requiring JWT authentication may prevent these systems from functioning correctly.If you want to exclude the health endpoint from authentication, add a security override:
/health: get: tags: - Health summary: Verificacao de saude description: Retorna o status de saude da API e seus servicos dependentes. operationId: healthCheck + security: [] responses:The empty array
security: []overrides the global security requirement for this specific endpoint.
363-364: Consider increasing the minimum text length for FIRAC analysis.The
minLength: 50constraint for legal text analysis may be too permissive. A meaningful FIRAC analysis typically requires a substantial case description with facts, issues, and legal context.Consider increasing the minimum to a more practical value:
text: type: string - minLength: 50 + minLength: 200 description: Texto juridico para analise (minimo 50 caracteres)And update the description accordingly.
alembic.ini (1)
9-9: Verify the file_template format string.The migration filename template uses multiple datetime components that will generate very long filenames. While this works, it may be unnecessarily verbose.
Consider simplifying to the more common format:
-file_template = %%(year)d%%(month).2d%%(day).2d_%%(hour).2d%%(minute).2d%%(second).2d_%%(rev)s_%%(slug)s +file_template = %%(year)d%%(month).2d%%(day).2d_%%(rev)s_%%(slug)sThis removes hour/minute/second, which are rarely needed since the revision ID already provides uniqueness.
infrastructure/docker/Dockerfile.python (1)
64-65: Consider using Python for health checks instead of curl.While the current health check works, using Python would eliminate the curl dependency and reduce the image size slightly.
Replace the curl-based health check with a Python-based one:
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \ - CMD curl -f http://localhost:5000/health || exit 1 + CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:5000/health').read()" || exit 1Then remove curl from the runtime dependencies:
RUN apt-get update && apt-get install -y --no-install-recommends \ libpq5 \ - curl \ && rm -rf /var/lib/apt/lists/* \ && apt-get cleanmigrations/README.md (1)
15-18: Consider using a more descriptive database name following naming conventions.The example uses
n8nas the database name. While n8n is a core component of this judicial automation system, database names typically reflect the database content or application domain rather than the orchestration tool name. Consider usingjudicial_automation,legal_system, orjus_automationfor better clarity and maintainability.tests/unit/test_api_routes.py (1)
78-86: Assertion is too permissive.The test accepts either
'text'or'error'in the response, which means it passes even if the extraction fails. Consider separate tests for success and error cases to ensure proper validation.- @pytest.mark.unit - def test_extract_pdf_returns_text(self, client, sample_pdf_base64): - """Test that extracted text is returned""" - response = client.post('/extract-pdf', - data=json.dumps({'pdf_content': sample_pdf_base64}), - content_type='application/json') - - data = json.loads(response.data) - assert 'text' in data or 'error' in data + @pytest.mark.unit + def test_extract_pdf_returns_text(self, client, sample_pdf_base64): + """Test that extracted text is returned""" + response = client.post('/extract-pdf', + data=json.dumps({'pdf_content': sample_pdf_base64}), + content_type='application/json') + + assert response.status_code == 200 + data = json.loads(response.data) + assert 'text' in datamigrations/versions/20241204_000000_001_initial_schema.py (1)
103-125: Consider partial indexes for confidence scores.If you frequently query for high-confidence analyses, consider adding partial indexes on
confidence_scorewhere the score exceeds a threshold. This is optional and can be deferred based on actual query patterns.Example for future migration:
CREATE INDEX idx_firac_high_confidence ON judicial.firac_analyses(confidence_score) WHERE confidence_score > 0.80;tests/integration/test_api_integration.py (2)
207-218: Remove unused mock_openai parameter.The
mock_openaiparameter is declared but never used in this test. Remove it to clean up the test signature.@pytest.mark.integration - @patch('scripts.python.app.openai.ChatCompletion.create') - def test_multiple_pdf_extractions(self, mock_openai, client, sample_pdf_base64): + def test_multiple_pdf_extractions(self, client, sample_pdf_base64): """Test multiple PDF extractions"""
243-261: Remove unused sample_pdf_base64 parameter.The
sample_pdf_base64parameter is declared but never used in this test. Remove it to clean up the test signature.@pytest.mark.integration - def test_unicode_text_handled(self, client, sample_pdf_base64): + def test_unicode_text_handled(self, client): """Test handling of Unicode text (Portuguese characters)"""infrastructure/nginx/nginx.conf (1)
55-59: Consider adding Content-Security-Policy header.The security headers are good, but consider adding a Content-Security-Policy (CSP) header for additional protection against XSS and data injection attacks. Note that
X-XSS-Protectionis deprecated in modern browsers but doesn't hurt to keep for legacy support.add_header X-Frame-Options "SAMEORIGIN" always; add_header X-Content-Type-Options "nosniff" always; add_header X-XSS-Protection "1; mode=block" always; add_header Referrer-Policy "strict-origin-when-cross-origin" always; + add_header Content-Security-Policy "default-src 'self'; script-src 'self' 'unsafe-inline' 'unsafe-eval'; style-src 'self' 'unsafe-inline';" always;.github/workflows/ci.yml (1)
194-197: Consider increasing coverage threshold over time.The 50% coverage threshold is reasonable for initial setup, but plan to gradually increase this to 70-80% as the codebase matures to ensure comprehensive test coverage.
tests/conftest.py (2)
11-15: Consider usingpytest.iniorpyproject.tomlfor path configuration.The
sys.path.insertapproach works but is fragile. Consider configuringpythonpathinpytest.iniorpyproject.tomlinstead for cleaner test discovery.# pytest.ini [pytest] pythonpath = .
73-77: Movebase64import to module level.The
base64import should be at the top of the file with other imports for consistency with PEP 8.import os import sys import json +import base64 import pytestAnd then remove line 76.
scripts/sql/init_database.sql (2)
28-28: Consider adding a CHECK constraint for status values.The
statuscolumn uses VARCHAR(50) with a default, but unlike the ORM models (migrations/models.py) which define aCaseStatusenum, this SQL script lacks validation. Consider adding a CHECK constraint or creating an ENUM type for consistency:-- Option 1: CHECK constraint status VARCHAR(50) DEFAULT 'pending' CHECK (status IN ('pending', 'processing', 'completed', 'error', 'archived')), -- Option 2: ENUM type (preferred for consistency with ORM) CREATE TYPE judicial.case_status AS ENUM ('pending', 'processing', 'completed', 'error', 'archived'); -- Then use: status judicial.case_status DEFAULT 'pending',
206-225: Audit trigger does not capture user context.The
audit.log_changes()function logs row changes but doesn't populateuser_id,ip_address, oruser_agent. These fields will always be NULL. Consider using session variables or application-level auditing to capture user context:-- Application can set these before operations: -- SET LOCAL app.user_id = 'user123'; -- SET LOCAL app.ip_address = '192.168.1.1'; -- Then in the trigger function: INSERT INTO audit.logs (table_name, record_id, action, new_data, user_id, ip_address) VALUES ( TG_TABLE_SCHEMA || '.' || TG_TABLE_NAME, NEW.id, 'INSERT', to_jsonb(NEW), current_setting('app.user_id', true), current_setting('app.ip_address', true)::inet );scripts/python/app.py (3)
71-76: Uselogger.exceptionfor automatic stack traces.Replace
logger.errorwithlogger.exceptionin exception handlers. This applies to all similar patterns in this file (lines 72, 121, 207, 261, 331, 401, 422, 439, 460, 483).except Exception as e: - logger.error(f"PDF extraction error: {e}") + logger.exception("PDF extraction error") return { "success": False, "error": str(e) }
183-189: No validation of DATAJUD credentials before request.If
DATAJUD_USERNAMEorDATAJUD_PASSWORDare not set,auth=(None, None)is passed to requests, which may cause unexpected behavior. Validate credentials before making the request:+ if not self.username or not self.password: + return { + "success": False, + "error": "DATAJUD credentials not configured" + } + response = requests.post( endpoint, json=search_query, headers=headers, auth=(self.username, self.password), timeout=30 )
488-495: Unusederrorparameter in error handlers.Prefix with underscore to indicate intentionally unused:
@app.errorhandler(404) -def not_found(error): +def not_found(_error): return jsonify({"error": "Endpoint not found"}), 404 @app.errorhandler(500) -def internal_error(error): +def internal_error(_error): return jsonify({"error": "Internal server error"}), 500migrations/models.py (1)
90-90: Inconsistent type usage fordocument_type.
Document.document_typeusesString(50)whileGeneratedDocument.document_type(line 211) usesSQLEnum(DocumentType). Consider using the enum consistently:- document_type = Column(String(50), nullable=False) + document_type = Column( + SQLEnum(DocumentType, name="document_type", schema="judicial"), + nullable=False + )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (28)
.env.template(1 hunks).github/dependabot.yml(1 hunks).github/workflows/ci.yml(1 hunks).github/workflows/deploy.yml(1 hunks).gitignore(1 hunks)alembic.ini(1 hunks)docs/API_DOCUMENTATION.md(1 hunks)docs/api/openapi.yaml(1 hunks)infrastructure/docker/Dockerfile.python(1 hunks)infrastructure/nginx/nginx.conf(1 hunks)migrations/README.md(1 hunks)migrations/env.py(1 hunks)migrations/models.py(1 hunks)migrations/script.py.mako(1 hunks)migrations/versions/20241204_000000_001_initial_schema.py(1 hunks)pyproject.toml(1 hunks)pytest.ini(1 hunks)requirements.txt(1 hunks)scripts/python/app.py(1 hunks)scripts/sql/init_database.sql(1 hunks)security/ssl_certificates/README.txt(1 hunks)tests/__init__.py(1 hunks)tests/conftest.py(1 hunks)tests/integration/__init__.py(1 hunks)tests/integration/test_api_integration.py(1 hunks)tests/unit/__init__.py(1 hunks)tests/unit/test_api_routes.py(1 hunks)tests/unit/test_pdf_extractor.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
tests/conftest.py (1)
scripts/python/app.py (5)
PDFExtractor(47-76)FIRACAnalyzer(79-125)DatajudClient(128-211)DistinguishAnalyzer(214-265)DocumentGenerator(268-335)
scripts/python/app.py (1)
tests/conftest.py (5)
pdf_extractor(180-182)firac_analyzer(186-188)datajud_client(192-194)distinguish_analyzer(198-200)document_generator(204-206)
tests/unit/test_pdf_extractor.py (2)
tests/conftest.py (2)
pdf_extractor(180-182)sample_pdf_content(35-70)scripts/python/app.py (1)
extract_text_from_pdf(51-76)
🪛 actionlint (1.7.9)
.github/workflows/deploy.yml
152-152: the runner of "softprops/action-gh-release@v1" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
🪛 Checkov (3.2.334)
.github/workflows/ci.yml
[medium] 163-164: Basic Auth Credentials
(CKV_SECRET_4)
🪛 dotenv-linter (4.0.0)
.env.template
[warning] 12-12: [UnorderedKey] The N8N_PASSWORD key should go before the N8N_PORT key
(UnorderedKey)
[warning] 20-20: [UnorderedKey] The POSTGRES_DB key should go before the POSTGRES_HOST key
(UnorderedKey)
[warning] 22-22: [UnorderedKey] The POSTGRES_PASSWORD key should go before the POSTGRES_PORT key
(UnorderedKey)
[warning] 29-29: [UnorderedKey] The REDIS_PASSWORD key should go before the REDIS_PORT key
(UnorderedKey)
[warning] 37-37: [UnorderedKey] The OPENAI_MAX_TOKENS key should go before the OPENAI_MODEL key
(UnorderedKey)
[warning] 43-43: [UnorderedKey] The DATAJUD_PASSWORD key should go before the DATAJUD_USERNAME key
(UnorderedKey)
[warning] 44-44: [UnorderedKey] The DATAJUD_BASE_URL key should go before the DATAJUD_PASSWORD key
(UnorderedKey)
[warning] 57-57: [UnorderedKey] The GOOGLE_LOCATION key should go before the GOOGLE_PROJECT_ID key
(UnorderedKey)
[warning] 58-58: [UnorderedKey] The GOOGLE_PROCESSOR_ID key should go before the GOOGLE_PROJECT_ID key
(UnorderedKey)
[warning] 66-66: [UnorderedKey] The SMTP_PASSWORD key should go before the SMTP_PORT key
(UnorderedKey)
[warning] 67-67: [UnorderedKey] The SMTP_FROM_EMAIL key should go before the SMTP_HOST key
(UnorderedKey)
[warning] 68-68: [UnorderedKey] The SMTP_FROM_NAME key should go before the SMTP_HOST key
(UnorderedKey)
[warning] 68-68: [ValueWithoutQuotes] This value needs to be surrounded in quotes
(ValueWithoutQuotes)
[warning] 74-74: [UnorderedKey] The GRAFANA_PASSWORD key should go before the GRAFANA_USER key
(UnorderedKey)
[warning] 75-75: [UnorderedKey] The GF_SERVER_ROOT_URL key should go before the GRAFANA_PASSWORD key
(UnorderedKey)
[warning] 81-81: [UnorderedKey] The ENCRYPTION_KEY key should go before the JWT_SECRET key
(UnorderedKey)
[warning] 87-87: [UnorderedKey] The FLASK_DEBUG key should go before the FLASK_ENV key
(UnorderedKey)
🪛 GitHub Actions: CI Pipeline
tests/integration/test_api_integration.py
[error] 1-1: Black formatting check failed. Run 'black' to format code in this file.
tests/conftest.py
[error] 1-1: Black formatting check failed. Run 'black' to format code in this file.
scripts/python/app.py
[error] 1-1: Black formatting check failed. Run 'black' to format code in this file.
tests/unit/test_api_routes.py
[error] 1-1: Black formatting check failed. Run 'black' to format code in this file.
tests/unit/test_pdf_extractor.py
[error] 1-1: Black formatting check failed. Run 'black' to format code in this file.
🪛 markdownlint-cli2 (0.18.1)
docs/API_DOCUMENTATION.md
19-19: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
161-161: Tables should be surrounded by blank lines
(MD058, blanks-around-tables)
301-301: Tables should be surrounded by blank lines
(MD058, blanks-around-tables)
388-388: Bare URL used
(MD034, no-bare-urls)
389-389: Bare URL used
(MD034, no-bare-urls)
🪛 OSV Scanner (2.3.0)
requirements.txt
[HIGH] 7-7: werkzeug 3.0.1: Werkzeug debugger vulnerable to remote execution when interacting with attacker controlled domain
[HIGH] 7-7: werkzeug 3.0.1: Werkzeug safe_join not safe on Windows
[HIGH] 7-7: werkzeug 3.0.1: Werkzeug safe_join() allows Windows special device names
[HIGH] 7-7: werkzeug 3.0.1: Werkzeug possible resource exhaustion when parsing file data in forms
[HIGH] 33-33: aiohttp 3.9.1: undefined
(PYSEC-2024-24)
[HIGH] 33-33: aiohttp 3.9.1: undefined
(PYSEC-2024-26)
[HIGH] 33-33: aiohttp 3.9.1: aiohttp is vulnerable to directory traversal
[HIGH] 33-33: aiohttp 3.9.1: aiohttp vulnerable to Denial of Service when trying to parse malformed POST requests
[HIGH] 33-33: aiohttp 3.9.1: aiohttp Cross-site Scripting vulnerability on index pages for static file handling
[HIGH] 33-33: aiohttp 3.9.1: aiohttp allows request smuggling due to incorrect parsing of chunk extensions
[HIGH] 33-33: aiohttp 3.9.1: aiohttp's HTTP parser (the python one, not llhttp) still overly lenient about separators
[HIGH] 33-33: aiohttp 3.9.1: AIOHTTP is vulnerable to HTTP Request/Response Smuggling through incorrect parsing of chunked trailer sections
[HIGH] 42-42: cryptography 41.0.7: undefined
(PYSEC-2024-225)
[HIGH] 42-42: cryptography 41.0.7: Python Cryptography package vulnerable to Bleichenbacher timing oracle attack
[HIGH] 42-42: cryptography 41.0.7: cryptography NULL pointer dereference with pkcs12.serialize_key_and_certificates when called with a non-matching certificate and private key and an hmac_hash override
[HIGH] 42-42: cryptography 41.0.7: Null pointer dereference in PKCS12 parsing
[HIGH] 42-42: cryptography 41.0.7: pyca/cryptography has a vulnerable OpenSSL included in cryptography wheels
🪛 Ruff (0.14.7)
tests/integration/test_api_integration.py
209-209: Unused method argument: mock_openai
(ARG002)
244-244: Unused method argument: sample_pdf_base64
(ARG002)
scripts/python/app.py
1-1: Shebang is present but file is not executable
(EXE001)
42-42: Do not catch blind exception: Exception
(BLE001)
71-71: Do not catch blind exception: Exception
(BLE001)
72-72: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
120-120: Do not catch blind exception: Exception
(BLE001)
121-121: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
206-206: Do not catch blind exception: Exception
(BLE001)
207-207: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
260-260: Do not catch blind exception: Exception
(BLE001)
261-261: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
330-330: Do not catch blind exception: Exception
(BLE001)
331-331: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
356-356: Do not catch blind exception: Exception
(BLE001)
385-385: Do not catch blind exception: Exception
(BLE001)
395-395: Do not catch blind exception: Exception
(BLE001)
400-400: Do not catch blind exception: Exception
(BLE001)
401-401: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
421-421: Do not catch blind exception: Exception
(BLE001)
422-422: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
438-438: Do not catch blind exception: Exception
(BLE001)
439-439: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
459-459: Do not catch blind exception: Exception
(BLE001)
460-460: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
482-482: Do not catch blind exception: Exception
(BLE001)
483-483: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
489-489: Unused function argument: error
(ARG001)
494-494: Unused function argument: error
(ARG001)
504-504: Possible binding to all interfaces
(S104)
tests/unit/test_api_routes.py
139-139: Local variable response is assigned to but never used
Remove assignment to unused variable response
(F841)
183-183: Local variable response is assigned to but never used
Remove assignment to unused variable response
(F841)
🪛 YAMLlint (1.37.1)
docs/api/openapi.yaml
[error] 158-158: syntax error: mapping values are not allowed here
(syntax)
🔇 Additional comments (32)
tests/__init__.py (1)
1-1: LGTM!Simple and appropriate package initializer.
tests/integration/__init__.py (1)
1-1: LGTM!Simple and appropriate package initializer.
.github/dependabot.yml (1)
1-58: LGTM!Well-structured Dependabot configuration with appropriate cadence, PR limits, and ecosystem coverage. The timezone-aware schedule and semantic commit prefixes are good practices.
.gitignore (1)
1-142: LGTM!Comprehensive and well-organized .gitignore covering Python, Docker, secrets, IDEs, and system artifacts. Clear section headers and strategic comments (e.g., preserving docker-compose.yml) demonstrate thoughtful design.
pytest.ini (1)
1-43: LGTM!Well-configured pytest setup with appropriate markers, discovery patterns, and logging. Strict marker enforcement and informative output configuration are good practices.
pyproject.toml (2)
1-162: LGTM: Project configuration is well-structured.Comprehensive configuration with appropriate separation of build system, metadata, dependencies (core vs dev vs monitoring), tool configurations, and packaging setup. The use of version ranges (>=) here with exact pinning in requirements.txt is a solid pattern for balancing flexibility and reproducibility.
89-90: Entry point is correctly configured and callable.Verification confirms that
scripts/python/app.pyexists, defines a Flask application object (app) at line 30, and theapp.run()method is properly inherited from Flask and called at line 504. The entry pointjudicial-api = "scripts.python.app:app.run"is valid and functional.docs/api/openapi.yaml (1)
1-505: Well-structured OpenAPI specification with comprehensive coverage.The specification provides clear documentation for all endpoints with appropriate request/response schemas, error handling, and examples. The JWT authentication scheme is properly configured and the schema definitions are thorough.
tests/unit/__init__.py (1)
1-1: LGTM!Standard Python package initializer.
docs/API_DOCUMENTATION.md (1)
1-389: Excellent comprehensive API documentation.The documentation thoroughly covers all endpoints with clear examples, authentication details, error handling, rate limiting, and LGPD compliance information. The SDK examples in Python and JavaScript are particularly helpful.
migrations/README.md (1)
1-116: Well-structured migration documentation.The README provides comprehensive coverage of Alembic workflow, including setup, common commands, schema overview, best practices, and troubleshooting. The examples are clear and practical.
alembic.ini (1)
31-38: Excellent use of post-write hooks for code formatting.The Black formatting hook ensures migration files are automatically formatted after generation, maintaining code consistency.
infrastructure/docker/Dockerfile.python (1)
1-68: Excellent multi-stage Dockerfile with security best practices.The Dockerfile follows best practices:
- Multi-stage build minimizes final image size
- Non-root user for security
- Minimal base image (python:3.11-slim)
- Proper layer caching with requirements first
- Comprehensive health check configuration
- Clean apt cache to reduce size
tests/unit/test_pdf_extractor.py (1)
1-102: Comprehensive test coverage for PDF extraction.The test suite thoroughly covers:
- Success cases with metadata validation
- Error handling for invalid and empty content
- Edge cases like corrupted PDFs and empty pages
- Multi-page PDF processing with proper mocking
The use of pytest markers and fixtures demonstrates good test organization.
security/ssl_certificates/README.txt (1)
1-1: SSL certificates are properly excluded from version control.The
.gitignorefile includes the*.pempattern (line 12), ensuring certificate files are not tracked by git. No certificate files are currently tracked in the repository, confirming proper security configuration.migrations/versions/20241204_000000_001_initial_schema.py (1)
21-291: LGTM! Well-structured database migration.The migration is comprehensive and follows best practices:
- Proper schema organization (judicial/audit separation)
- Correct use of DO blocks for idempotent enum creation
- Appropriate indexes on foreign keys and query columns
- Cascade deletes where appropriate
- Complete downgrade path
infrastructure/nginx/nginx.conf (2)
100-118: LGTM! Strong SSL/TLS configuration.The SSL configuration follows modern best practices:
- TLSv1.2 and TLSv1.3 only
- Strong cipher suites with forward secrecy
- HSTS with appropriate max-age
- Session tickets disabled for enhanced privacy
62-64: Verify rate limits match expected traffic patterns.The rate limits are set to 10 req/s for API and 5 req/s for webhooks. Ensure these values align with your expected legitimate traffic patterns to avoid false positives.
Consider monitoring actual traffic patterns in staging to validate these thresholds before production deployment.
migrations/env.py (2)
32-46: LGTM! Clean database URL construction.The function properly handles both full DATABASE_URL/POSTGRES_URL and individual component environment variables with sensible defaults.
82-86: Good use of NullPool for migrations.Using
NullPoolis the correct choice for migrations as it prevents connection pooling issues during schema changes. This follows Alembic best practices.migrations/script.py.mako (1)
1-28: LGTM! Clean migration template.The template follows Alembic best practices with proper type annotations and clear structure. The use of typed variables and the pass defaults for upgrade/downgrade functions are appropriate.
.github/workflows/ci.yml (1)
1-284: LGTM! Comprehensive CI pipeline.The CI pipeline is well-structured with:
- Proper job dependencies and parallelization
- Service containers for integration tests
- Multiple quality gates (lint, tests, coverage, security)
- Appropriate caching strategies
- Final success job to gate the pipeline
The credential warning from Checkov at lines 163-164 is a false positive—these are test database credentials used in CI service containers, which is standard practice.
tests/conftest.py (3)
153-166: LGTM on mock fixtures.The
mock_openai_responseandmock_redisfixtures are well-structured and provide good isolation for unit tests.
209-215: LGTM on pytest markers configuration.Custom markers are properly registered for test classification (
unit,integration,slow,requires_api).
34-70: Verify the hardcoded PDF byte offsets are accurate.The xref table contains hardcoded byte offsets (9, 58, 115, 214 for objects 1-4, and 306 for startxref). These must match the actual byte positions in the PDF stream or parsers like PyPDF2 will fail. Calculate the exact byte positions of each object marker (
1 0 obj,2 0 obj, etc.) to confirm the offsets are correct before relying on this fixture.scripts/sql/init_database.sql (3)
4-13: LGTM on schema setup.Good separation of concerns with
judicialandauditschemas. Extensionsuuid-osspandpg_trgmare appropriate for the use case.
59-62: LGTM on full-text search configuration.The GIN index with Portuguese dictionary for
extracted_textis appropriate for a Brazilian legal document system.
291-298: LGTM on seed data.Using
ON CONFLICT (code) DO NOTHINGmakes this idempotent for repeated runs.scripts/python/app.py (1)
498-504: Binding to all interfaces is acceptable for containerized deployment.The Ruff S104 warning about binding to
0.0.0.0is a false positive for Docker deployments where this is necessary. The nginx reverse proxy provides the security boundary.migrations/models.py (3)
23-39: LGTM on enum definitions.
CaseStatusandDocumentTypeenums are well-defined and consistent with database reference data.
42-77: LGTM on Case model.Well-structured with proper indexing, schema qualification, and cascade behaviors for child relationships.
256-276: LGTM on AuditLog model.Properly isolated in the
auditschema with appropriate PostgreSQL-specific types (INET,JSONB).
| - name: Create GitHub Release | ||
| if: startsWith(github.ref, 'refs/tags/v') | ||
| uses: softprops/action-gh-release@v1 | ||
| with: | ||
| generate_release_notes: true | ||
| draft: false | ||
| prerelease: ${{ contains(github.ref, '-rc') || contains(github.ref, '-beta') }} |
There was a problem hiding this comment.
Update GitHub Action to a newer version.
The softprops/action-gh-release@v1 action uses an outdated runner. Update to v2 to ensure compatibility with current GitHub Actions infrastructure.
- name: Create GitHub Release
if: startsWith(github.ref, 'refs/tags/v')
- uses: softprops/action-gh-release@v1
+ uses: softprops/action-gh-release@v2
with:
generate_release_notes: true
draft: false
prerelease: ${{ contains(github.ref, '-rc') || contains(github.ref, '-beta') }}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| - name: Create GitHub Release | |
| if: startsWith(github.ref, 'refs/tags/v') | |
| uses: softprops/action-gh-release@v1 | |
| with: | |
| generate_release_notes: true | |
| draft: false | |
| prerelease: ${{ contains(github.ref, '-rc') || contains(github.ref, '-beta') }} | |
| - name: Create GitHub Release | |
| if: startsWith(github.ref, 'refs/tags/v') | |
| uses: softprops/action-gh-release@v2 | |
| with: | |
| generate_release_notes: true | |
| draft: false | |
| prerelease: ${{ contains(github.ref, '-rc') || contains(github.ref, '-beta') }} |
🧰 Tools
🪛 actionlint (1.7.9)
152-152: the runner of "softprops/action-gh-release@v1" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
🤖 Prompt for AI Agents
.github/workflows/deploy.yml around lines 150 to 156: the workflow uses
softprops/action-gh-release@v1 which relies on an outdated runner; update the
action reference to softprops/action-gh-release@v2 (or a specific v2.x tag) in
the workflow, keep the same inputs (generate_release_notes, draft, prerelease)
and verify the action's v2 docs for any input/name changes before committing.
| rollback: | ||
| name: Rollback Deployment | ||
| runs-on: ubuntu-latest | ||
| if: github.event_name == 'workflow_dispatch' && failure() | ||
| needs: [deploy-staging, deploy-production] | ||
| steps: | ||
| - name: Rollback to previous version | ||
| run: | | ||
| echo "Rolling back deployment..." | ||
| # ssh ${{ secrets.PROD_HOST }} "cd /app && docker compose down && git checkout HEAD~1 && docker compose up -d" |
There was a problem hiding this comment.
Fix rollback trigger logic.
The rollback job uses failure() which checks if the current job failed, but the job hasn't run yet. The condition github.event_name == 'workflow_dispatch' && failure() will never trigger correctly. Consider using a manual workflow_dispatch input for rollback decisions.
rollback:
name: Rollback Deployment
runs-on: ubuntu-latest
- if: github.event_name == 'workflow_dispatch' && failure()
- needs: [deploy-staging, deploy-production]
+ if: github.event_name == 'workflow_dispatch' && github.event.inputs.action == 'rollback'
+
steps:
- name: Rollback to previous version
run: |Also add the rollback input to workflow_dispatch:
workflow_dispatch:
inputs:
environment:
description: 'Deployment environment'
required: true
default: 'staging'
type: choice
options:
- staging
- production
+ action:
+ description: 'Deployment action'
+ required: true
+ default: 'deploy'
+ type: choice
+ options:
+ - deploy
+ - rollbackCommittable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
.github/workflows/deploy.yml lines 167-176: the rollback job condition uses
failure() which cannot evaluate before the job runs, so it will never trigger;
change the workflow to accept a workflow_dispatch input (e.g., rollback: boolean
or string) and update the rollback job's if to check github.event_name ==
'workflow_dispatch' && github.event.inputs.rollback == 'true' (or appropriate
value), and update the workflow_dispatch section to declare that rollback input
so the job can be manually invoked with rollback=true; ensure any existing needs
remain correct for manual rollback.
| from datetime import datetime | ||
| from typing import Optional, List | ||
| from sqlalchemy import ( | ||
| Column, String, Integer, Text, Boolean, DateTime, Float, | ||
| ForeignKey, JSON, ARRAY, BigInteger, Index, UniqueConstraint, | ||
| Numeric, Enum as SQLEnum | ||
| ) | ||
| from sqlalchemy.dialects.postgresql import UUID, INET, JSONB | ||
| from sqlalchemy.orm import declarative_base, relationship | ||
| from sqlalchemy.sql import func | ||
| import uuid | ||
| import enum |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major
Remove unused imports.
Optional, List, Float, and UniqueConstraint are imported but never used.
from datetime import datetime
-from typing import Optional, List
from sqlalchemy import (
- Column, String, Integer, Text, Boolean, DateTime, Float,
- ForeignKey, JSON, ARRAY, BigInteger, Index, UniqueConstraint,
+ Column, String, Integer, Text, Boolean, DateTime,
+ ForeignKey, JSON, ARRAY, BigInteger, Index,
Numeric, Enum as SQLEnum
)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| from datetime import datetime | |
| from typing import Optional, List | |
| from sqlalchemy import ( | |
| Column, String, Integer, Text, Boolean, DateTime, Float, | |
| ForeignKey, JSON, ARRAY, BigInteger, Index, UniqueConstraint, | |
| Numeric, Enum as SQLEnum | |
| ) | |
| from sqlalchemy.dialects.postgresql import UUID, INET, JSONB | |
| from sqlalchemy.orm import declarative_base, relationship | |
| from sqlalchemy.sql import func | |
| import uuid | |
| import enum | |
| from datetime import datetime | |
| from sqlalchemy import ( | |
| Column, String, Integer, Text, Boolean, DateTime, | |
| ForeignKey, JSON, ARRAY, BigInteger, Index, | |
| Numeric, Enum as SQLEnum | |
| ) | |
| from sqlalchemy.dialects.postgresql import UUID, INET, JSONB | |
| from sqlalchemy.orm import declarative_base, relationship | |
| from sqlalchemy.sql import func | |
| import uuid | |
| import enum |
🤖 Prompt for AI Agents
migrations/models.py lines 7-18: the file currently imports Optional, List,
Float and UniqueConstraint but those names are not used anywhere; remove these
four imports from the import lists (delete Optional and List from the typing
import, delete Float and UniqueConstraint from the sqlalchemy import list) and
run the linter/type-checker to confirm no other unused imports remain.
| [build-system] | ||
| requires = ["setuptools>=68.0", "wheel"] | ||
| build-backend = "setuptools.build_meta" | ||
|
|
||
| [project] | ||
| name = "judicial-automation-system" | ||
| version = "1.0.0" | ||
| description = "Sistema de Automacao Juridica - Brazilian Judicial Process Automation Platform" | ||
| readme = "README.md" | ||
| license = {text = "MIT"} | ||
| requires-python = ">=3.10" | ||
| authors = [ | ||
| {name = "Judicial Automation Team"} | ||
| ] | ||
| keywords = [ | ||
| "legal", | ||
| "automation", | ||
| "judiciary", | ||
| "n8n", | ||
| "flask", | ||
| "ai", | ||
| "brazilian-law" | ||
| ] | ||
| classifiers = [ | ||
| "Development Status :: 4 - Beta", | ||
| "Intended Audience :: Legal Industry", | ||
| "License :: OSI Approved :: MIT License", | ||
| "Operating System :: OS Independent", | ||
| "Programming Language :: Python :: 3", | ||
| "Programming Language :: Python :: 3.10", | ||
| "Programming Language :: Python :: 3.11", | ||
| "Programming Language :: Python :: 3.12", | ||
| "Topic :: Office/Business", | ||
| "Topic :: Scientific/Engineering :: Artificial Intelligence", | ||
| ] | ||
|
|
||
| dependencies = [ | ||
| "Flask>=3.0.0", | ||
| "Flask-CORS>=4.0.0", | ||
| "Werkzeug>=3.0.1", | ||
| "redis>=5.0.1", | ||
| "psycopg2-binary>=2.9.9", | ||
| "SQLAlchemy>=2.0.23", | ||
| "openai>=1.6.1", | ||
| "tiktoken>=0.5.2", | ||
| "google-cloud-documentai>=2.22.0", | ||
| "google-auth>=2.25.2", | ||
| "google-api-python-client>=2.111.0", | ||
| "PyPDF2>=3.0.1", | ||
| "python-docx>=1.1.0", | ||
| "pdfplumber>=0.10.3", | ||
| "requests>=2.31.0", | ||
| "pydantic>=2.5.3", | ||
| "PyJWT>=2.8.0", | ||
| "python-dotenv>=1.0.0", | ||
| "prometheus-client>=0.19.0", | ||
| "structlog>=23.2.0", | ||
| "flasgger>=0.9.7.1", | ||
| "tenacity>=8.2.3", | ||
| ] | ||
|
|
||
| [project.optional-dependencies] | ||
| dev = [ | ||
| "pytest>=7.4.3", | ||
| "pytest-cov>=4.1.0", | ||
| "pytest-asyncio>=0.23.2", | ||
| "pytest-mock>=3.12.0", | ||
| "httpx>=0.26.0", | ||
| "factory-boy>=3.3.0", | ||
| "faker>=21.0.0", | ||
| "black>=23.12.1", | ||
| "isort>=5.13.2", | ||
| "flake8>=6.1.0", |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# Check if aiohttp and cryptography are direct or transitive dependencies
cd /repo && git ls-files | head -20Repository: fbmoulin/2acivelagent
Length of output: 120
🏁 Script executed:
# Search for aiohttp and cryptography in pyproject.toml and related files
rg -i "aiohttp|cryptography" --type tomlRepository: fbmoulin/2acivelagent
Length of output: 47
🏁 Script executed:
# Look for any requirements files or dependency specs
fd -e txt -e in | grep -i reqRepository: fbmoulin/2acivelagent
Length of output: 81
🏁 Script executed:
# Check imports in the codebase to see if aiohttp/cryptography are actually used
rg "import aiohttp|from aiohttp|import cryptography|from cryptography" --type pyRepository: fbmoulin/2acivelagent
Length of output: 47
🏁 Script executed:
cat requirements.txtRepository: fbmoulin/2acivelagent
Length of output: 1219
🏁 Script executed:
# Check if aiohttp or cryptography appear anywhere in the repo
rg -i "aiohttp|cryptography" --type py --type txt --type toml --type mdRepository: fbmoulin/2acivelagent
Length of output: 134
🏁 Script executed:
# List all files in repo to understand structure
git ls-files | head -30Repository: fbmoulin/2acivelagent
Length of output: 691
🌐 Web query:
Werkzeug 3.0.1 vulnerabilities CVE remote code execution safe_join
💡 Result:
Short answer — yes: Werkzeug 3.0.1 is affected by multiple published vulnerabilities.
- CVE-2024-34069 — Remote Code Execution: debugger hostname/relative-path issue (fixed in werkzeug >= 3.0.3). If you run the debugger (not recommended in production) this is exploitable under specific conditions. [1]
- CVE-2024-49766 — safe_join / UNC path bypass (Windows + Python < 3.11): safe_join could produce unsafe paths (fixed in werkzeug 3.0.6). Affects Windows deployments using older Python. [2]
- CVE-2025-66221 — safe_join allows Windows device names (CON/AUX) causing hangs/DoS (fixed in werkzeug 3.1.4); note this is a separate Windows-specific issue affecting versions < 3.1.4. [3]
Recommended action: upgrade to a fixed release (at minimum >= 3.0.6 to address safe_join UNC and >= 3.0.3 for the debugger RCE; consider upgrading to the latest 3.x that includes the 3.1.4 fix if you run on Windows). References: [1] Snyk / CVE-2024-34069; [2] NVD / CVE-2024-49766; [3] Ubuntu / CVE-2025-66221.
🌐 Web query:
aiohttp 3.9.1 vulnerabilities CVE directory traversal request smuggling
💡 Result:
Short answer: yes — aiohttp 3.9.1 is affected by request‑smuggling and a static-file path‑traversal issue (and other related flaws). Upgrade is recommended.
Details and patches:
- Request smuggling / CRLF/HTTP parsing issues: multiple advisories (e.g. CVE-2023-49082 and follow‑ups) note lenient HTTP parsing that can enable request‑smuggling; fixes were released in later 3.9.x releases (upgrade to ≥ 3.9.2) or newer maintenance branches. [1][2][3]
- Path‑traversal in static routes: static() with follow_symlinks=True can allow directory traversal — fixed in ≥ 3.9.2. [2]
- Other related parsing fixes were later addressed in other releases (e.g. 3.10.11, 3.12.14 for trailer/newline parsing). Review which runtime (pure‑Python vs with C extensions) you use — some issues only apply when extensions are disabled. [4][5]
Recommended action: upgrade aiohttp to a patched release (at minimum ≥ 3.9.2 for the issues above; consider a current supported release such as latest 3.12.x or 4.x) and test with your deployment (note pure‑Python builds/AIOHTTP_NO_EXTENSIONS can affect exposure). [2][4][5]
Sources:
[1] NVD / CVE-2023-49082 / advisories.
[2] Snyk package advisory for aiohttp 3.9.1 (request smuggling, path traversal, XSS notes).
[3] Debian / security tracker (notes and fixes).
[4] NVD / CVE-2024-52304 (newline/chunk extension parsing).
[5] NVD / CVE-2025-53643 (trailer parsing) .
If you want, I can (a) list the exact commit/advisory links and fixed versions, or (b) give a one‑line pip/requirements change you can apply.
🌐 Web query:
cryptography 41.0.7 vulnerabilities CVE PKCS7 NULL pointer dereference
💡 Result:
Short answer: This was a NULL-pointer dereference in cryptography's PKCS7 parsing (CVE-2023-49083). It affected cryptography >= 3.1 and was patched in 41.0.6 — so 41.0.7 is not vulnerable. [1][2][3]
References:
- GitHub security advisory (NULL-dereference when loading PKCS7 certificates). [1]
- NVD entry for CVE-2023-49083. [2]
- OSS‑Security / release notes linking the patch (PR #9926). [3]
🌐 Web query:
cryptography 41.0.7 known vulnerabilities CVE security issues
💡 Result:
Short answer — yes. Known/security-relevant issues affecting cryptography 41.0.7:
- CVE-2024-26130 — NULL-pointer dereference / crash (PKCS#12 mismatch) that can lead to denial-of-service; reported in vendor advisories and affects builds shipped as 41.0.7 in some distributions. [1][2]
- CVE-2023-49083 — NULL-dereference when loading PKCS#7 (causes segfault/DoS); patched in 41.0.6 but listed in advisories for nearby versions. Verify your build. [3]
- CVE-2020-25659 (Observable timing discrepancy) — incomplete fixes mean versions < 42.0.0 (including 41.0.7) are flagged by scanners as vulnerable; upgrade to ≥42.0.0 to remediate. [4]
References:
- Ubuntu USN / package advisory for python-cryptography (USN-6673-3). [2]
- IBM security bulletin referencing CVE-2024-26130. [1]
- CVE entry / vulnerability summary for CVE-2023-49083. [3]
- Snyk / package vulnerability listing noting observable-timing and other issues for <42.0.0. [4]
If you want, I can (a) fetch the exact CVSS scores and affected version ranges for each CVE, or (b) check whether your installed package/build includes the vulnerable code — tell me your platform and how cryptography was installed.
Critical: Resolve high-severity dependency vulnerabilities before release.
File: requirements.txt
Multiple HIGH severity vulnerabilities require immediate patching:
Werkzeug 3.0.1 (3 CVEs):
- CVE-2024-34069: Remote code execution in debugger
- CVE-2024-49766: Unsafe safe_join() on Windows allowing path traversal
- CVE-2025-66221: Device name handling causing DoS on Windows
Recommendation: Update to Werkzeug 3.0.6 or later
aiohttp 3.9.1 (Multiple CVEs):
- CVE-2023-49082 and follow-ups: HTTP request smuggling via lenient CRLF/chunk parsing
- Path traversal vulnerability in static file serving with follow_symlinks=True
- Additional parsing issues in trailer and newline handling
Recommendation: Update to aiohttp 3.9.2 minimum; prefer 3.10.11 or 3.12.14 for comprehensive fixes
cryptography 41.0.7 (3 CVEs):
- CVE-2024-26130: NULL-pointer dereference in PKCS#12 causing DoS
- CVE-2023-49083: NULL-dereference in PKCS#7 certificate deserialization
- CVE-2020-25659: Observable timing discrepancy unfixed in versions <42.0.0
Recommendation: Update to cryptography 42.0.0 or later
These are blocking issues for production release.
| -- Grant permissions | ||
| GRANT USAGE ON SCHEMA judicial TO n8n_user; | ||
| GRANT USAGE ON SCHEMA audit TO n8n_user; | ||
| GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA judicial TO n8n_user; | ||
| GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA audit TO n8n_user; | ||
| GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA judicial TO n8n_user; |
There was a problem hiding this comment.
Script assumes n8n_user exists and grants excessive privileges.
Two issues:
- The script will fail if
n8n_userdoesn't exist. Add a conditional user creation. ALL PRIVILEGESviolates least privilege principle. Grant only what's needed.
-- Create user if not exists (PostgreSQL 9.0+)
DO $$
BEGIN
IF NOT EXISTS (SELECT FROM pg_catalog.pg_roles WHERE rolname = 'n8n_user') THEN
CREATE ROLE n8n_user WITH LOGIN PASSWORD 'change_me_in_production';
END IF;
END
$$;
-- Grant minimal required privileges
GRANT USAGE ON SCHEMA judicial TO n8n_user;
GRANT USAGE ON SCHEMA audit TO n8n_user;
GRANT SELECT, INSERT, UPDATE ON ALL TABLES IN SCHEMA judicial TO n8n_user;
GRANT SELECT, INSERT ON ALL TABLES IN SCHEMA audit TO n8n_user;
GRANT USAGE, SELECT ON ALL SEQUENCES IN SCHEMA judicial TO n8n_user;🤖 Prompt for AI Agents
In scripts/sql/init_database.sql around lines 300 to 305, the script assumes
n8n_user exists and uses ALL PRIVILEGES which is too broad; modify the script to
first create the role only if it does not exist (use a DO block checking
pg_catalog.pg_roles and create role with LOGIN and a placeholder password to be
changed in production), then replace ALL PRIVILEGES grants with least-privilege
grants: keep GRANT USAGE ON SCHEMA for judicial and audit, grant only
SELECT/INSERT/UPDATE (or the specific subset your app requires) on tables in
judicial and the narrower set needed for audit, and grant USAGE and SELECT on
sequences in judicial instead of ALL PRIVILEGES on sequences.
| # Sistema de Automacao Juridica - Integration Tests | ||
|
|
||
| import pytest | ||
| import json | ||
| from unittest.mock import Mock, patch, MagicMock | ||
|
|
||
|
|
||
| class TestFullWorkflowIntegration: | ||
| """Integration tests for complete workflow""" | ||
|
|
||
| @pytest.mark.integration | ||
| @patch('scripts.python.app.openai.ChatCompletion.create') | ||
| def test_pdf_to_firac_workflow(self, mock_openai, client, sample_pdf_base64, mock_openai_response): | ||
| """Test workflow from PDF extraction to FIRAC analysis""" | ||
| # Step 1: Extract PDF | ||
| extract_response = client.post('/extract-pdf', | ||
| data=json.dumps({'pdf_content': sample_pdf_base64}), | ||
| content_type='application/json') | ||
|
|
||
| assert extract_response.status_code == 200 | ||
| extract_data = json.loads(extract_response.data) | ||
|
|
||
| # Step 2: If extraction successful and has text, perform FIRAC analysis | ||
| if extract_data.get('success') and extract_data.get('text'): | ||
| text = extract_data['text'] | ||
|
|
||
| # Ensure text is long enough for analysis | ||
| if len(text) < 50: | ||
| text = text + " " * 50 # Pad for test | ||
|
|
||
| mock_openai.return_value = mock_openai_response | ||
|
|
||
| firac_response = client.post('/firac-analysis', | ||
| data=json.dumps({'text': text}), | ||
| content_type='application/json') | ||
|
|
||
| # Analysis should succeed with mocked OpenAI | ||
| if mock_openai.called: | ||
| assert firac_response.status_code == 200 | ||
|
|
||
| @pytest.mark.integration | ||
| @patch('scripts.python.app.openai.ChatCompletion.create') | ||
| @patch('requests.post') | ||
| def test_firac_to_jurisprudence_workflow(self, mock_requests, mock_openai, client, | ||
| sample_legal_text, mock_openai_response, | ||
| sample_datajud_response): | ||
| """Test workflow from FIRAC analysis to jurisprudence search""" | ||
| mock_openai.return_value = mock_openai_response | ||
|
|
||
| # Step 1: FIRAC Analysis | ||
| firac_response = client.post('/firac-analysis', | ||
| data=json.dumps({'text': sample_legal_text}), | ||
| content_type='application/json') | ||
|
|
||
| assert firac_response.status_code == 200 | ||
| firac_data = json.loads(firac_response.data) | ||
| assert firac_data['success'] is True | ||
|
|
||
| # Step 2: Search jurisprudence based on analysis | ||
| mock_api_response = Mock() | ||
| mock_api_response.status_code = 200 | ||
| mock_api_response.json.return_value = sample_datajud_response | ||
| mock_requests.return_value = mock_api_response | ||
|
|
||
| search_response = client.post('/datajud-search', | ||
| data=json.dumps({ | ||
| 'tribunal': 'tjsp', | ||
| 'texto_livre': 'cobranca contrato' | ||
| }), | ||
| content_type='application/json') | ||
|
|
||
| assert search_response.status_code == 200 | ||
| search_data = json.loads(search_response.data) | ||
| assert search_data['success'] is True | ||
|
|
||
| @pytest.mark.integration | ||
| @patch('scripts.python.app.openai.ChatCompletion.create') | ||
| @patch('requests.post') | ||
| def test_complete_analysis_workflow(self, mock_requests, mock_openai, client, | ||
| sample_legal_text, sample_case_data, | ||
| mock_openai_response, sample_datajud_response): | ||
| """Test complete workflow: FIRAC -> Search -> Distinguish -> Generate""" | ||
| # Mock OpenAI responses | ||
| mock_openai.return_value = mock_openai_response | ||
|
|
||
| # Mock DATAJUD response | ||
| mock_api_response = Mock() | ||
| mock_api_response.status_code = 200 | ||
| mock_api_response.json.return_value = sample_datajud_response | ||
| mock_requests.return_value = mock_api_response | ||
|
|
||
| # Step 1: FIRAC Analysis | ||
| firac_response = client.post('/firac-analysis', | ||
| data=json.dumps({'text': sample_legal_text}), | ||
| content_type='application/json') | ||
| assert firac_response.status_code == 200 | ||
|
|
||
| # Step 2: Jurisprudence Search | ||
| search_response = client.post('/datajud-search', | ||
| data=json.dumps({ | ||
| 'tribunal': 'tjsp', | ||
| 'texto_livre': 'cobranca' | ||
| }), | ||
| content_type='application/json') | ||
| assert search_response.status_code == 200 | ||
| search_data = json.loads(search_response.data) | ||
|
|
||
| # Step 3: Distinguish Analysis (if results found) | ||
| if search_data.get('total_results', 0) > 0: | ||
| precedent = sample_datajud_response['hits']['hits'][0]['_source'] | ||
| distinguish_response = client.post('/distinguish-analysis', | ||
| data=json.dumps({ | ||
| 'current_facts': sample_legal_text, | ||
| 'precedent_data': precedent | ||
| }), | ||
| content_type='application/json') | ||
| assert distinguish_response.status_code == 200 | ||
|
|
||
| # Step 4: Generate Document | ||
| generate_response = client.post('/generate-document', | ||
| data=json.dumps({ | ||
| 'document_type': 'sentenca', | ||
| 'case_data': sample_case_data | ||
| }), | ||
| content_type='application/json') | ||
| assert generate_response.status_code == 200 | ||
| generate_data = json.loads(generate_response.data) | ||
| assert generate_data['success'] is True | ||
| assert 'generated_text' in generate_data | ||
|
|
||
|
|
||
| class TestServiceAvailability: | ||
| """Tests for service availability and degradation""" | ||
|
|
||
| @pytest.mark.integration | ||
| def test_api_available_without_redis(self, client): | ||
| """Test that API works even if Redis is unavailable""" | ||
| # Health check should still work | ||
| response = client.get('/health') | ||
| assert response.status_code == 200 | ||
|
|
||
| data = json.loads(response.data) | ||
| assert data['status'] == 'healthy' | ||
| # Redis might be false but API should work | ||
| assert 'services' in data | ||
|
|
||
| @pytest.mark.integration | ||
| @patch('scripts.python.app.openai.ChatCompletion.create') | ||
| def test_api_handles_openai_failure(self, mock_openai, client, sample_legal_text): | ||
| """Test graceful handling of OpenAI API failure""" | ||
| mock_openai.side_effect = Exception("OpenAI API Error") | ||
|
|
||
| response = client.post('/firac-analysis', | ||
| data=json.dumps({'text': sample_legal_text}), | ||
| content_type='application/json') | ||
|
|
||
| # Should return error gracefully, not crash | ||
| assert response.status_code in [200, 500] | ||
| data = json.loads(response.data) | ||
| assert 'error' in data or data.get('success') is False | ||
|
|
||
| @pytest.mark.integration | ||
| @patch('requests.post') | ||
| def test_api_handles_datajud_failure(self, mock_requests, client): | ||
| """Test graceful handling of DATAJUD API failure""" | ||
| mock_requests.side_effect = Exception("Connection timeout") | ||
|
|
||
| response = client.post('/datajud-search', | ||
| data=json.dumps({'tribunal': 'tjsp'}), | ||
| content_type='application/json') | ||
|
|
||
| # Should return error gracefully | ||
| assert response.status_code in [200, 500] | ||
| data = json.loads(response.data) | ||
| assert 'error' in data or data.get('success') is False | ||
|
|
||
| @pytest.mark.integration | ||
| @patch('requests.post') | ||
| def test_api_handles_datajud_auth_failure(self, mock_requests, client): | ||
| """Test handling of DATAJUD authentication failure""" | ||
| mock_response = Mock() | ||
| mock_response.status_code = 401 | ||
| mock_response.text = "Unauthorized" | ||
| mock_requests.return_value = mock_response | ||
|
|
||
| response = client.post('/datajud-search', | ||
| data=json.dumps({'tribunal': 'tjsp'}), | ||
| content_type='application/json') | ||
|
|
||
| data = json.loads(response.data) | ||
| assert data['success'] is False | ||
|
|
||
|
|
||
| class TestConcurrentRequests: | ||
| """Tests for handling concurrent requests""" | ||
|
|
||
| @pytest.mark.integration | ||
| def test_multiple_health_checks(self, client): | ||
| """Test multiple concurrent health checks""" | ||
| responses = [] | ||
| for _ in range(10): | ||
| response = client.get('/health') | ||
| responses.append(response) | ||
|
|
||
| assert all(r.status_code == 200 for r in responses) | ||
|
|
||
| @pytest.mark.integration | ||
| @patch('scripts.python.app.openai.ChatCompletion.create') | ||
| def test_multiple_pdf_extractions(self, mock_openai, client, sample_pdf_base64): | ||
| """Test multiple PDF extractions""" | ||
| responses = [] | ||
| for _ in range(5): | ||
| response = client.post('/extract-pdf', | ||
| data=json.dumps({'pdf_content': sample_pdf_base64}), | ||
| content_type='application/json') | ||
| responses.append(response) | ||
|
|
||
| assert all(r.status_code == 200 for r in responses) | ||
|
|
||
|
|
||
| class TestDataValidation: | ||
| """Tests for data validation across endpoints""" | ||
|
|
||
| @pytest.mark.integration | ||
| def test_json_content_type_required(self, client): | ||
| """Test that JSON content type is properly handled""" | ||
| # Send without content-type | ||
| response = client.post('/firac-analysis', | ||
| data='{"text": "some text"}') | ||
|
|
||
| # Should handle gracefully | ||
| assert response.status_code in [200, 400, 415] | ||
|
|
||
| @pytest.mark.integration | ||
| def test_malformed_json_handled(self, client): | ||
| """Test handling of malformed JSON""" | ||
| response = client.post('/firac-analysis', | ||
| data='{"text": invalid json}', | ||
| content_type='application/json') | ||
|
|
||
| assert response.status_code == 400 | ||
|
|
||
| @pytest.mark.integration | ||
| def test_unicode_text_handled(self, client, sample_pdf_base64): | ||
| """Test handling of Unicode text (Portuguese characters)""" | ||
| unicode_text = """ | ||
| Processo judicial com caracteres especiais: | ||
| ação, execução, sentença, decisão, réu, autôr, | ||
| índice, cônsul, além, através, após, até | ||
| """ + " " * 100 # Ensure minimum length | ||
|
|
||
| with patch('scripts.python.app.openai.ChatCompletion.create') as mock_openai: | ||
| mock_response = Mock() | ||
| mock_response.choices = [Mock(message=Mock(content='{"result": "ok"}'))] | ||
| mock_openai.return_value = mock_response | ||
|
|
||
| response = client.post('/firac-analysis', | ||
| data=json.dumps({'text': unicode_text}), | ||
| content_type='application/json') | ||
|
|
||
| assert response.status_code == 200 |
There was a problem hiding this comment.
Run Black formatter to fix formatting issues.
The pipeline indicates this file needs Black formatting.
#!/bin/bash
# Format the file with Black
black tests/integration/test_api_integration.py --check --diff🧰 Tools
🪛 GitHub Actions: CI Pipeline
[error] 1-1: Black formatting check failed. Run 'black' to format code in this file.
🪛 Ruff (0.14.7)
209-209: Unused method argument: mock_openai
(ARG002)
244-244: Unused method argument: sample_pdf_base64
(ARG002)
🤖 Prompt for AI Agents
tests/integration/test_api_integration.py lines 1-261: the file is not
Black-formatted which causes CI to fail; run Black on this file and commit the
changes to satisfy formatting checks — e.g. run `black
tests/integration/test_api_integration.py` (or `black .` at repo root), verify
the diff, stage and commit the formatted file, and re-run the pipeline.
| # Sistema de Automacao Juridica - API Routes Unit Tests | ||
|
|
||
| import pytest | ||
| import json | ||
| from unittest.mock import Mock, patch, MagicMock | ||
|
|
||
|
|
||
| class TestHealthEndpoint: | ||
| """Tests for /health endpoint""" | ||
|
|
||
| @pytest.mark.unit | ||
| def test_health_check_returns_200(self, client): | ||
| """Test health endpoint returns 200 OK""" | ||
| response = client.get('/health') | ||
|
|
||
| assert response.status_code == 200 | ||
|
|
||
| @pytest.mark.unit | ||
| def test_health_check_returns_json(self, client): | ||
| """Test health endpoint returns JSON""" | ||
| response = client.get('/health') | ||
|
|
||
| assert response.content_type == 'application/json' | ||
|
|
||
| @pytest.mark.unit | ||
| def test_health_check_contains_status(self, client): | ||
| """Test health response contains status field""" | ||
| response = client.get('/health') | ||
| data = json.loads(response.data) | ||
|
|
||
| assert 'status' in data | ||
| assert data['status'] == 'healthy' | ||
|
|
||
| @pytest.mark.unit | ||
| def test_health_check_contains_services(self, client): | ||
| """Test health response contains services status""" | ||
| response = client.get('/health') | ||
| data = json.loads(response.data) | ||
|
|
||
| assert 'services' in data | ||
| assert 'redis' in data['services'] | ||
| assert 'openai' in data['services'] | ||
| assert 'datajud' in data['services'] | ||
|
|
||
| @pytest.mark.unit | ||
| def test_health_check_contains_version(self, client): | ||
| """Test health response contains version""" | ||
| response = client.get('/health') | ||
| data = json.loads(response.data) | ||
|
|
||
| assert 'version' in data | ||
| assert data['version'] == '1.0.0' | ||
|
|
||
|
|
||
| class TestExtractPDFEndpoint: | ||
| """Tests for /extract-pdf endpoint""" | ||
|
|
||
| @pytest.mark.unit | ||
| def test_extract_pdf_requires_pdf_content(self, client): | ||
| """Test that pdf_content is required""" | ||
| response = client.post('/extract-pdf', | ||
| data=json.dumps({}), | ||
| content_type='application/json') | ||
|
|
||
| assert response.status_code == 400 | ||
| data = json.loads(response.data) | ||
| assert 'error' in data | ||
|
|
||
| @pytest.mark.unit | ||
| def test_extract_pdf_accepts_base64(self, client, sample_pdf_base64): | ||
| """Test that base64 encoded PDF is accepted""" | ||
| response = client.post('/extract-pdf', | ||
| data=json.dumps({'pdf_content': sample_pdf_base64}), | ||
| content_type='application/json') | ||
|
|
||
| assert response.status_code == 200 | ||
|
|
||
| @pytest.mark.unit | ||
| def test_extract_pdf_returns_text(self, client, sample_pdf_base64): | ||
| """Test that extracted text is returned""" | ||
| response = client.post('/extract-pdf', | ||
| data=json.dumps({'pdf_content': sample_pdf_base64}), | ||
| content_type='application/json') | ||
|
|
||
| data = json.loads(response.data) | ||
| assert 'text' in data or 'error' in data | ||
|
|
||
| @pytest.mark.unit | ||
| def test_extract_pdf_invalid_base64(self, client): | ||
| """Test handling of invalid base64 content""" | ||
| response = client.post('/extract-pdf', | ||
| data=json.dumps({'pdf_content': 'not-valid-base64!!!'}), | ||
| content_type='application/json') | ||
|
|
||
| assert response.status_code in [400, 500] | ||
|
|
||
| @pytest.mark.unit | ||
| def test_extract_pdf_empty_request(self, client): | ||
| """Test handling of empty request body""" | ||
| response = client.post('/extract-pdf', | ||
| data='', | ||
| content_type='application/json') | ||
|
|
||
| assert response.status_code == 400 | ||
|
|
||
|
|
||
| class TestFIRACAnalysisEndpoint: | ||
| """Tests for /firac-analysis endpoint""" | ||
|
|
||
| @pytest.mark.unit | ||
| def test_firac_requires_text(self, client): | ||
| """Test that text field is required""" | ||
| response = client.post('/firac-analysis', | ||
| data=json.dumps({}), | ||
| content_type='application/json') | ||
|
|
||
| assert response.status_code == 400 | ||
| data = json.loads(response.data) | ||
| assert 'error' in data | ||
|
|
||
| @pytest.mark.unit | ||
| def test_firac_rejects_short_text(self, client): | ||
| """Test that very short text is rejected""" | ||
| response = client.post('/firac-analysis', | ||
| data=json.dumps({'text': 'Too short'}), | ||
| content_type='application/json') | ||
|
|
||
| assert response.status_code == 400 | ||
| data = json.loads(response.data) | ||
| assert 'error' in data | ||
| assert 'short' in data['error'].lower() or 'minimum' in data['error'].lower() | ||
|
|
||
| @pytest.mark.unit | ||
| @patch('scripts.python.app.openai.ChatCompletion.create') | ||
| def test_firac_calls_openai(self, mock_openai, client, sample_legal_text, mock_openai_response): | ||
| """Test that OpenAI is called for analysis""" | ||
| mock_openai.return_value = mock_openai_response | ||
|
|
||
| response = client.post('/firac-analysis', | ||
| data=json.dumps({'text': sample_legal_text}), | ||
| content_type='application/json') | ||
|
|
||
| mock_openai.assert_called_once() | ||
|
|
||
| @pytest.mark.unit | ||
| @patch('scripts.python.app.openai.ChatCompletion.create') | ||
| def test_firac_returns_analysis(self, mock_openai, client, sample_legal_text, mock_openai_response): | ||
| """Test that FIRAC analysis is returned""" | ||
| mock_openai.return_value = mock_openai_response | ||
|
|
||
| response = client.post('/firac-analysis', | ||
| data=json.dumps({'text': sample_legal_text}), | ||
| content_type='application/json') | ||
|
|
||
| assert response.status_code == 200 | ||
| data = json.loads(response.data) | ||
| assert data['success'] is True | ||
| assert 'firac_analysis' in data | ||
|
|
||
|
|
||
| class TestDatajudSearchEndpoint: | ||
| """Tests for /datajud-search endpoint""" | ||
|
|
||
| @pytest.mark.unit | ||
| def test_datajud_accepts_empty_params(self, client): | ||
| """Test that empty params returns error or default search""" | ||
| response = client.post('/datajud-search', | ||
| data=json.dumps({}), | ||
| content_type='application/json') | ||
|
|
||
| # Should return 200 with error in body or perform default search | ||
| assert response.status_code in [200, 400] | ||
|
|
||
| @pytest.mark.unit | ||
| @patch('requests.post') | ||
| def test_datajud_calls_api(self, mock_post, client): | ||
| """Test that DATAJUD API is called""" | ||
| mock_response = Mock() | ||
| mock_response.status_code = 200 | ||
| mock_response.json.return_value = {"hits": {"total": {"value": 0}, "hits": []}} | ||
| mock_post.return_value = mock_response | ||
|
|
||
| response = client.post('/datajud-search', | ||
| data=json.dumps({ | ||
| 'tribunal': 'tjsp', | ||
| 'texto_livre': 'execucao fiscal' | ||
| }), | ||
| content_type='application/json') | ||
|
|
||
| mock_post.assert_called_once() | ||
|
|
||
| @pytest.mark.unit | ||
| @patch('requests.post') | ||
| def test_datajud_returns_results(self, mock_post, client, sample_datajud_response): | ||
| """Test that search results are returned""" | ||
| mock_response = Mock() | ||
| mock_response.status_code = 200 | ||
| mock_response.json.return_value = sample_datajud_response | ||
| mock_post.return_value = mock_response | ||
|
|
||
| response = client.post('/datajud-search', | ||
| data=json.dumps({'tribunal': 'tjsp'}), | ||
| content_type='application/json') | ||
|
|
||
| assert response.status_code == 200 | ||
| data = json.loads(response.data) | ||
| assert data['success'] is True | ||
| assert 'total_results' in data | ||
|
|
||
|
|
||
| class TestDistinguishAnalysisEndpoint: | ||
| """Tests for /distinguish-analysis endpoint""" | ||
|
|
||
| @pytest.mark.unit | ||
| def test_distinguish_requires_current_facts(self, client, sample_precedent_data): | ||
| """Test that current_facts is required""" | ||
| response = client.post('/distinguish-analysis', | ||
| data=json.dumps({'precedent_data': sample_precedent_data}), | ||
| content_type='application/json') | ||
|
|
||
| assert response.status_code == 400 | ||
|
|
||
| @pytest.mark.unit | ||
| def test_distinguish_requires_precedent_data(self, client): | ||
| """Test that precedent_data is required""" | ||
| response = client.post('/distinguish-analysis', | ||
| data=json.dumps({'current_facts': 'Some facts'}), | ||
| content_type='application/json') | ||
|
|
||
| assert response.status_code == 400 | ||
|
|
||
| @pytest.mark.unit | ||
| @patch('scripts.python.app.openai.ChatCompletion.create') | ||
| def test_distinguish_returns_analysis(self, mock_openai, client, sample_legal_text, | ||
| sample_precedent_data, mock_openai_response): | ||
| """Test that distinguish analysis is returned""" | ||
| mock_openai.return_value = mock_openai_response | ||
|
|
||
| response = client.post('/distinguish-analysis', | ||
| data=json.dumps({ | ||
| 'current_facts': sample_legal_text, | ||
| 'precedent_data': sample_precedent_data | ||
| }), | ||
| content_type='application/json') | ||
|
|
||
| assert response.status_code == 200 | ||
| data = json.loads(response.data) | ||
| assert data['success'] is True | ||
|
|
||
|
|
||
| class TestGenerateDocumentEndpoint: | ||
| """Tests for /generate-document endpoint""" | ||
|
|
||
| @pytest.mark.unit | ||
| def test_generate_requires_document_type(self, client, sample_case_data): | ||
| """Test that document_type is required""" | ||
| response = client.post('/generate-document', | ||
| data=json.dumps({'case_data': sample_case_data}), | ||
| content_type='application/json') | ||
|
|
||
| assert response.status_code == 400 | ||
|
|
||
| @pytest.mark.unit | ||
| def test_generate_requires_case_data(self, client): | ||
| """Test that case_data is required""" | ||
| response = client.post('/generate-document', | ||
| data=json.dumps({'document_type': 'sentenca'}), | ||
| content_type='application/json') | ||
|
|
||
| assert response.status_code == 400 | ||
|
|
||
| @pytest.mark.unit | ||
| @patch('scripts.python.app.openai.ChatCompletion.create') | ||
| def test_generate_sentenca(self, mock_openai, client, sample_case_data): | ||
| """Test generating sentenca document""" | ||
| mock_response = Mock() | ||
| mock_response.choices = [Mock(message=Mock(content="Generated sentenca content"))] | ||
| mock_openai.return_value = mock_response | ||
|
|
||
| response = client.post('/generate-document', | ||
| data=json.dumps({ | ||
| 'document_type': 'sentenca', | ||
| 'case_data': sample_case_data | ||
| }), | ||
| content_type='application/json') | ||
|
|
||
| assert response.status_code == 200 | ||
| data = json.loads(response.data) | ||
| assert data['success'] is True | ||
| assert data['document_type'] == 'sentenca' | ||
|
|
||
| @pytest.mark.unit | ||
| @patch('scripts.python.app.openai.ChatCompletion.create') | ||
| def test_generate_despacho(self, mock_openai, client, sample_case_data): | ||
| """Test generating despacho document""" | ||
| mock_response = Mock() | ||
| mock_response.choices = [Mock(message=Mock(content="Generated despacho content"))] | ||
| mock_openai.return_value = mock_response | ||
|
|
||
| response = client.post('/generate-document', | ||
| data=json.dumps({ | ||
| 'document_type': 'despacho', | ||
| 'case_data': sample_case_data | ||
| }), | ||
| content_type='application/json') | ||
|
|
||
| assert response.status_code == 200 | ||
| data = json.loads(response.data) | ||
| assert data['document_type'] == 'despacho' | ||
|
|
||
| @pytest.mark.unit | ||
| def test_generate_invalid_type(self, client, sample_case_data): | ||
| """Test that invalid document type returns error""" | ||
| response = client.post('/generate-document', | ||
| data=json.dumps({ | ||
| 'document_type': 'invalid_type', | ||
| 'case_data': sample_case_data | ||
| }), | ||
| content_type='application/json') | ||
|
|
||
| # The endpoint should handle this - either 400 or 200 with error | ||
| data = json.loads(response.data) | ||
| assert data['success'] is False or response.status_code == 400 | ||
|
|
||
|
|
||
| class TestErrorHandling: | ||
| """Tests for error handling""" | ||
|
|
||
| @pytest.mark.unit | ||
| def test_404_returns_json(self, client): | ||
| """Test that 404 errors return JSON""" | ||
| response = client.get('/nonexistent-endpoint') | ||
|
|
||
| assert response.status_code == 404 | ||
| assert response.content_type == 'application/json' | ||
|
|
||
| @pytest.mark.unit | ||
| def test_method_not_allowed(self, client): | ||
| """Test handling of wrong HTTP method""" | ||
| response = client.get('/extract-pdf') | ||
|
|
||
| assert response.status_code == 405 |
There was a problem hiding this comment.
Run Black formatter to fix formatting issues.
The pipeline indicates this file needs Black formatting. Run black tests/unit/test_api_routes.py to resolve the CI failure.
#!/bin/bash
# Format the file with Black
black tests/unit/test_api_routes.py --check --diff🧰 Tools
🪛 GitHub Actions: CI Pipeline
[error] 1-1: Black formatting check failed. Run 'black' to format code in this file.
🪛 Ruff (0.14.7)
139-139: Local variable response is assigned to but never used
Remove assignment to unused variable response
(F841)
183-183: Local variable response is assigned to but never used
Remove assignment to unused variable response
(F841)
🤖 Prompt for AI Agents
tests/unit/test_api_routes.py lines 1-342: the file is not Black-formatted which
fails CI; run the Black formatter on this file (e.g., black
tests/unit/test_api_routes.py), stage and commit the resulting changes so the
file matches the project formatting rules and the CI Black check passes.
| @patch('PyPDF2.PdfReader') | ||
| def test_extract_text_multi_page(self, mock_pdf_reader, pdf_extractor): | ||
| """Test extraction from multi-page PDF""" | ||
| # Mock multiple pages | ||
| mock_page1 = Mock() | ||
| mock_page1.extract_text.return_value = "Page 1 content" | ||
| mock_page2 = Mock() | ||
| mock_page2.extract_text.return_value = "Page 2 content" | ||
|
|
||
| mock_reader = Mock() | ||
| mock_reader.pages = [mock_page1, mock_page2] | ||
| mock_pdf_reader.return_value = mock_reader | ||
|
|
||
| result = pdf_extractor.extract_text_from_pdf(b"dummy pdf content") | ||
|
|
||
| assert result['success'] is True | ||
| assert result['pages'] == 2 | ||
| assert "Page 1 content" in result['text'] | ||
| assert "Page 2 content" in result['text'] |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Locate the PDFExtractor implementation
echo "=== Finding PDFExtractor implementation ==="
find . -type f -name "*.py" | xargs grep -l "class PDFExtractor"
# Check imports in app.py
echo ""
echo "=== Checking PyPDF2 imports in app.py ==="
rg -n "import.*PyPDF2|from.*PyPDF2" --type py
# Check PdfReader usage in PDFExtractor
echo ""
echo "=== Checking PdfReader usage ==="
rg -n "PdfReader" --type py -B 2 -A 2Repository: fbmoulin/2acivelagent
Length of output: 1577
Update the patch path to target where PyPDF2 is used in the implementation.
The current @patch('PyPDF2.PdfReader') patches the global PyPDF2 module, but since scripts/python/app.py imports PyPDF2 at the module level and uses PyPDF2.PdfReader() directly, the patch should target the imported reference:
-@patch('PyPDF2.PdfReader')
+@patch('scripts.python.app.PyPDF2.PdfReader')The mock must patch where the object is used (in the app.py namespace), not where it's defined (in the PyPDF2 module).
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| @patch('PyPDF2.PdfReader') | |
| def test_extract_text_multi_page(self, mock_pdf_reader, pdf_extractor): | |
| """Test extraction from multi-page PDF""" | |
| # Mock multiple pages | |
| mock_page1 = Mock() | |
| mock_page1.extract_text.return_value = "Page 1 content" | |
| mock_page2 = Mock() | |
| mock_page2.extract_text.return_value = "Page 2 content" | |
| mock_reader = Mock() | |
| mock_reader.pages = [mock_page1, mock_page2] | |
| mock_pdf_reader.return_value = mock_reader | |
| result = pdf_extractor.extract_text_from_pdf(b"dummy pdf content") | |
| assert result['success'] is True | |
| assert result['pages'] == 2 | |
| assert "Page 1 content" in result['text'] | |
| assert "Page 2 content" in result['text'] | |
| @patch('scripts.python.app.PyPDF2.PdfReader') | |
| def test_extract_text_multi_page(self, mock_pdf_reader, pdf_extractor): | |
| """Test extraction from multi-page PDF""" | |
| # Mock multiple pages | |
| mock_page1 = Mock() | |
| mock_page1.extract_text.return_value = "Page 1 content" | |
| mock_page2 = Mock() | |
| mock_page2.extract_text.return_value = "Page 2 content" | |
| mock_reader = Mock() | |
| mock_reader.pages = [mock_page1, mock_page2] | |
| mock_pdf_reader.return_value = mock_reader | |
| result = pdf_extractor.extract_text_from_pdf(b"dummy pdf content") | |
| assert result['success'] is True | |
| assert result['pages'] == 2 | |
| assert "Page 1 content" in result['text'] | |
| assert "Page 2 content" in result['text'] |
🤖 Prompt for AI Agents
In tests/unit/test_pdf_extractor.py around lines 68 to 86, the test currently
patches PyPDF2 at its module location ('PyPDF2.PdfReader') but the code under
test imports and uses PyPDF2 from scripts/python/app.py; update the patch to
target the symbol where it is used (e.g., change the decorator to patch
'scripts.python.app.PyPDF2.PdfReader' or the actual import path your app module
exposes), so the mock replaces the PdfReader reference in app.py, keeping the
rest of the test logic the same.
Security fixes implemented:
1. alembic.ini: Remove hardcoded database credentials
2. migrations/env.py: Require database password from environment,
no longer accepts default values for sensitive credentials
3. docker-compose.yml: Remove exposed ports for internal services
- PostgreSQL (5432), Redis (6379), Python services (5000),
and Prometheus (9090) now only accessible within Docker network
- Services communicate through internal network, not host ports
4. scripts/python/app.py: Complete security rewrite
- Add JWT authentication with token verification
- Restrict CORS to configured allowed origins only
- Add input validation with length limits
- Add prompt injection detection and blocking
- Implement secure error handling (no stack traces exposed)
- Add request ID tracking for audit trail
- Remove debug mode in production
- Add configuration validation at startup
5. .github/workflows/ci.yml: Enforce security scans
- Bandit security scan now fails pipeline on issues
- Safety dependency check now fails on vulnerabilities
- Security job added to required CI checks
These changes address CRITICAL and HIGH severity issues identified
in the security analysis.
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (4)
scripts/python/app.py (4)
128-130: Deprecated OpenAI API key assignment.Setting
openai.api_keyis deprecated inopenai>=1.0. The v1.x library reads fromOPENAI_API_KEYenvironment variable automatically when creating a client. Remove this assignment and instantiate clients properly.This is related to the OpenAI v1.x migration issue flagged in past reviews.
376-406: Critical: OpenAI API usage incompatible with openai v1.x.
openai.ChatCompletion.create()andopenai.error.*exceptions were removed in openai v1.0+. Sinceopenai>=1.6.1is required, this code will fail withAttributeErrorat runtime.As flagged in past reviews, migrate to the client-based API:
from openai import OpenAI client = OpenAI() # Uses OPENAI_API_KEY env var automatically try: response = client.chat.completions.create( model=Config.OPENAI_MODEL, messages=[ {"role": "system", "content": cls.SYSTEM_PROMPT}, {"role": "user", "content": prompt} ], max_tokens=Config.OPENAI_MAX_TOKENS, temperature=0.3, timeout=Config.OPENAI_TIMEOUT ) return { "success": True, "firac_analysis": response.choices[0].message.content, ... } except openai.RateLimitError: ... except openai.AuthenticationError: ... except openai.APITimeoutError: ...
520-540: Same OpenAI v1.x incompatibility issue.This method uses the deprecated
openai.ChatCompletion.create()API, same asFIRACAnalyzer. Apply the same migration pattern using the client-based API.
572-593: Same OpenAI v1.x incompatibility issue.This method also uses the deprecated
openai.ChatCompletion.create()API. Apply the same client-based migration.
🧹 Nitpick comments (7)
.github/workflows/ci.yml (4)
12-14: UnusedPOETRY_VERSIONenvironment variable.
POETRY_VERSIONis defined but the workflow usespipthroughout. Either remove this unused variable or switch dependency management to Poetry if that's the intended approach.env: PYTHON_VERSION: '3.11' - POETRY_VERSION: '1.7.1'
40-43: Consider installing type stubs for mypy.The lint job installs
black isort flake8 mypybut not type stubs. If mypy is intended to provide meaningful checks, consider adding common stubs or using--install-types --non-interactive.
197-200: Coverage job re-runs all tests - consider caching coverage data.This job runs the full test suite again after unit and integration tests already passed. For efficiency, consider collecting coverage artifacts from previous jobs and merging them, or accept this as the simplest approach for now.
294-305: Consider includingdocker-buildin failure checks.The
ci-successjob checkslint,test-unit,test-integration, andsecuritybut notcoverageordocker-build. If Docker build failures should block merging, add it to the condition:if [[ "${{ needs.lint.result }}" == "failure" ]] || \ [[ "${{ needs.test-unit.result }}" == "failure" ]] || \ [[ "${{ needs.test-integration.result }}" == "failure" ]] || \ + [[ "${{ needs.docker-build.result }}" == "failure" ]] || \ [[ "${{ needs.security.result }}" == "failure" ]]; thenscripts/python/app.py (3)
238-249: Annotate class constant withClassVar.
DANGEROUS_PATTERNSis a class-level constant. Annotating it withClassVarimproves type checking clarity.+from typing import ClassVar + class InputValidator: """Input validation and sanitization utilities.""" # Patterns that might indicate prompt injection - DANGEROUS_PATTERNS = [ + DANGEROUS_PATTERNS: ClassVar[list[str]] = [ r'ignore\s+(previous|all|above)',
342-347: Uselogging.exceptionto include stack traces for debugging.In error handlers,
logging.errordoesn't include the stack trace. Usinglogging.exceptionprovides more context for debugging while still sanitizing the user-facing message.except PyPDF2.errors.PdfReadError as e: - logger.error(f"PDF read error: {type(e).__name__}") + logger.exception("PDF read error") return {"success": False, "error": "Invalid or corrupted PDF file"} except Exception as e: - logger.error(f"PDF extraction error: {type(e).__name__}: {e}") + logger.exception("PDF extraction error") return {"success": False, "error": "Failed to extract text from PDF"}Apply similar changes to other error handlers (lines 399-406, 483, 539, 592).
183-184:datetime.utcnow()is deprecated.
datetime.utcnow()is deprecated since Python 3.12. Usedatetime.now(timezone.utc)for forward compatibility:+from datetime import datetime, timedelta, timezone def create_jwt_token(user_id: str, roles: List[str] = None) -> str: """Create JWT token for authentication.""" payload = { 'user_id': user_id, 'roles': roles or ['user'], - 'iat': datetime.utcnow(), - 'exp': datetime.utcnow() + timedelta(hours=Config.JWT_EXPIRY_HOURS) + 'iat': datetime.now(timezone.utc), + 'exp': datetime.now(timezone.utc) + timedelta(hours=Config.JWT_EXPIRY_HOURS) }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
.github/workflows/ci.yml(1 hunks)alembic.ini(1 hunks)docker-compose.yml(4 hunks)migrations/env.py(1 hunks)scripts/python/app.py(1 hunks)
🧰 Additional context used
🪛 Checkov (3.2.334)
.github/workflows/ci.yml
[medium] 166-167: Basic Auth Credentials
(CKV_SECRET_4)
🪛 Ruff (0.14.7)
migrations/env.py
46-49: Avoid specifying long messages outside the exception class
(TRY003)
scripts/python/app.py
1-1: Shebang is present but file is not executable
(EXE001)
73-73: Possible hardcoded password assigned to: "JWT_SECRET"
(S105)
145-145: Do not catch blind exception: Exception
(BLE001)
178-178: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
193-193: Consider moving this statement to an else block
(TRY300)
239-249: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
252-252: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
343-343: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
345-345: Do not catch blind exception: Exception
(BLE001)
346-346: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
399-399: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
404-404: Do not catch blind exception: Exception
(BLE001)
405-405: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
483-483: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
538-538: Do not catch blind exception: Exception
(BLE001)
539-539: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
549-553: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
591-591: Do not catch blind exception: Exception
(BLE001)
592-592: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
702-702: Do not catch blind exception: Exception
(BLE001)
832-832: Unused function argument: error
(ARG001)
841-841: Unused function argument: error
(ARG001)
850-850: Unused function argument: error
(ARG001)
859-859: Unused function argument: error
(ARG001)
868-868: Unused function argument: error
(ARG001)
877-877: Unused function argument: error
(ARG001)
🔇 Additional comments (10)
alembic.ini (1)
26-30: Previous security concern successfully addressed.The hardcoded database credentials flagged in earlier reviews have been removed. The configuration now correctly directs users to set
DATABASE_URLorPOSTGRES_URLvia environment variables, with clear inline documentation.migrations/env.py (3)
44-49: Previous security concern successfully addressed.The function now properly requires the password via environment variable and raises a clear error if not set, eliminating the risk of using a hardcoded default.
59-81: LGTM!The offline migration mode is properly configured with appropriate flags (
literal_binds=True,compare_type=True,compare_server_default=True) for accurate schema comparison.
84-109: LGTM!The online migration mode is correctly implemented with
NullPoolfor migrations andinclude_schemas=Trueto support the multi-schema setup (judicial and audit schemas) described in the PR.docker-compose.yml (1)
69-72: Good security hardening - internal services no longer exposed to host.Changing from
portstoexposefor postgres, redis, python-services, and prometheus is the correct approach. These services are only needed within the Docker network and should be accessed through nginx reverse proxy or viadocker execduring development.Also applies to: 88-91, 108-110, 157-159
.github/workflows/ci.yml (3)
91-94: Verify pytest marker logic includes intended tests.The marker
-m "unit or not integration"runs tests markedunitOR any test not markedintegration. This means unmarked tests run in unit tests. If this is intentional for gradual marker adoption, consider adding a comment explaining the behavior.
126-128: Test credentials in CI are acceptable.The Checkov warning about credentials (CKV_SECRET_4) is a false positive here. These are ephemeral test database credentials used only within the CI runner's isolated PostgreSQL service container.
220-256: Security scanning job is well-configured.Running Bandit with
-ll -iifor medium+ severity/confidence and Safety for dependency vulnerabilities provides good coverage. The job runs independently in parallel, which is efficient.scripts/python/app.py (2)
720-728: Good fix: Deterministic cache key using hashlib.The cache key now uses
hashlib.sha256()instead of the non-deterministichash()function. This ensures consistent cache hits across interpreter sessions.
831-892: Error handlers follow Flask conventions.The unused
errorparameter warnings (ARG001) are false positives. Flask requires this parameter signature for error handlers. The handlers intentionally don't expose internal error details to prevent information disclosure.
| host = os.getenv("POSTGRES_HOST", "localhost") | ||
| port = os.getenv("POSTGRES_PORT", "5432") | ||
| db = os.getenv("POSTGRES_DB", "n8n") | ||
| user = os.getenv("POSTGRES_USER", "n8n_user") |
There was a problem hiding this comment.
Default database name and user appear misaligned with project purpose.
The defaults reference "n8n" (line 53: POSTGRES_DB defaults to "n8n", line 54: POSTGRES_USER defaults to "n8n_user"), which seems to be from a different project. This is a judicial automation system per the PR description, so these defaults may confuse developers during local setup.
Consider updating the defaults to reflect the actual project:
- db = os.getenv("POSTGRES_DB", "n8n")
- user = os.getenv("POSTGRES_USER", "n8n_user")
+ db = os.getenv("POSTGRES_DB", "judicial_automation")
+ user = os.getenv("POSTGRES_USER", "judicial_user")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| host = os.getenv("POSTGRES_HOST", "localhost") | |
| port = os.getenv("POSTGRES_PORT", "5432") | |
| db = os.getenv("POSTGRES_DB", "n8n") | |
| user = os.getenv("POSTGRES_USER", "n8n_user") | |
| host = os.getenv("POSTGRES_HOST", "localhost") | |
| port = os.getenv("POSTGRES_PORT", "5432") | |
| db = os.getenv("POSTGRES_DB", "judicial_automation") | |
| user = os.getenv("POSTGRES_USER", "judicial_user") |
🤖 Prompt for AI Agents
In migrations/env.py around lines 51 to 54, the POSTGRES_DB and POSTGRES_USER
default values ("n8n" and "n8n_user") are from another project; update these
defaults to project-appropriate values (for example a descriptive DB name like
"judicial_automation" and a matching user like "judicial_user" or blank them so
callers must set env vars). Modify the os.getenv default arguments accordingly
and ensure any downstream docs or .env.example are updated to match the new
defaults.
| @classmethod | ||
| def validate(cls) -> Tuple[bool, List[str]]: | ||
| """Validate required configuration.""" | ||
| errors = [] | ||
|
|
||
| if not cls.JWT_SECRET or cls.JWT_SECRET == 'dev-secret-key': | ||
| errors.append("JWT_SECRET must be set to a secure value") | ||
|
|
||
| if not cls.OPENAI_API_KEY: | ||
| errors.append("OPENAI_API_KEY must be set") | ||
|
|
||
| return len(errors) == 0, errors |
There was a problem hiding this comment.
Configuration validation only runs in __main__ block.
Config.validate() is called only when running directly via python app.py. When deployed via Gunicorn (gunicorn app:app), this validation is bypassed. Consider validating at module load time or in a factory function.
+# Validate configuration at import time for production safety
+if os.getenv('FLASK_ENV') != 'development':
+ valid, errors = Config.validate()
+ if not valid:
+ raise RuntimeError(f"Configuration errors: {'; '.join(errors)}")Add this after the Config class definition (around line 80).
🧰 Tools
🪛 Ruff (0.14.7)
73-73: Possible hardcoded password assigned to: "JWT_SECRET"
(S105)
🤖 Prompt for AI Agents
In scripts/python/app.py around lines 68 to 79 (add after the Config class at
~line 80): Config.validate() currently only runs in the __main__ block so
Gunicorn bypasses it; call Config.validate() at module load (immediately after
the class definition) and raise a RuntimeError or exit when validation fails so
the app fails fast under Gunicorn and other import-based runners; ensure the
raised error includes the list of validation messages for clear diagnostics.
| if __name__ == '__main__': | ||
| # Validate configuration | ||
| valid, errors = Config.validate() | ||
| if not valid: | ||
| for error in errors: | ||
| logger.critical(f"Configuration error: {error}") | ||
| logger.critical("Application cannot start with invalid configuration") | ||
| exit(1) | ||
|
|
||
| logger.info("Starting Judicial Automation API (Security-Hardened)...") | ||
| logger.info(f"Allowed origins: {Config.ALLOWED_ORIGINS}") | ||
| logger.info(f"OpenAI configured: {bool(Config.OPENAI_API_KEY)}") | ||
| logger.info(f"DATAJUD configured: {bool(Config.DATAJUD_USERNAME)}") | ||
| logger.info(f"Redis configured: {redis_client is not None}") | ||
|
|
||
| # IMPORTANT: In production, use a proper WSGI server like Gunicorn | ||
| # gunicorn -w 4 -b 0.0.0.0:5000 app:app | ||
| app.run(host='127.0.0.1', port=5000) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
fd -t f "Dockerfile.python" --exec cat {}Repository: fbmoulin/2acivelagent
Length of output: 1790
🏁 Script executed:
fd -t f "requirements.txt" --exec grep -i gunicorn {} +Repository: fbmoulin/2acivelagent
Length of output: 99
🏁 Script executed:
fd -t f "requirements.txt" --exec cat {}Repository: fbmoulin/2acivelagent
Length of output: 1219
Use Gunicorn (or another production WSGI server) instead of Flask development server.
The code includes a comment recommending Gunicorn for production, but Dockerfile.python runs the application with CMD ["python", "app.py"], which executes Flask's development server. Gunicorn is not installed (missing from requirements.txt). The development server is unsuitable for production and exposes unnecessary risks. Add gunicorn to requirements.txt and update the Dockerfile's CMD to use Gunicorn, such as: CMD ["gunicorn", "-w", "4", "-b", "0.0.0.0:5000", "app:app"].
Documentation updates: 1. README.md - Updated security section with new security measures - Added infrastructure security diagram - Added JWT authentication examples for all API endpoints - Updated LGPD compliance section 2. .env.template - Added JWT configuration variables (JWT_ALGORITHM, JWT_EXPIRY_HOURS) - Added API_AUTH_ENABLED flag - Added ALLOWED_ORIGINS for CORS configuration - Added command to generate secure JWT secret 3. docs/DEPLOYMENT.md (NEW) - Complete step-by-step deployment tutorial - Server preparation and requirements - Docker and firewall configuration - SSL/TLS setup with Let's Encrypt - Database migrations guide - Service startup order - Post-deployment verification checklist - Monitoring setup with Grafana - Backup configuration - Security hardening steps - Troubleshooting guide 4. docs/SECURITY.md (NEW) - Security architecture overview - JWT authentication guide - Network security configuration - Data protection guidelines - Input validation and prompt injection protection - Security headers and rate limiting - Monitoring and auditing setup - Incident response procedures - LGPD compliance requirements - Security checklists (pre/post deployment, periodic)
There was a problem hiding this comment.
Actionable comments posted: 0
♻️ Duplicate comments (1)
.env.template (1)
35-35: Improve security placeholders and add clearer generation guidance.The placeholders for sensitive keys should use clearer formats and include direct instructions rather than generic descriptions.
- OPENAI_API_KEY=sk-your_openai_api_key_here + OPENAI_API_KEY=sk-CHANGE_ME_MUST_BE_SETFor JWT and encryption keys, provide inline instructions to generate secure values:
- # REQUIRED: Generate with: python -c "import secrets; print(secrets.token_hex(32))" - JWT_SECRET=generate_a_secure_64_character_hex_key_here + # REQUIRED: Generate with: python3 -c "import secrets; print(secrets.token_hex(32))" + # Example: JWT_SECRET=a1b2c3d4e5f6... (64 hex characters) + JWT_SECRET=CHANGE_ME_GENERATE_WITH_COMMAND_ABOVEAnd similarly for ENCRYPTION_KEY:
- # Encryption key for sensitive data - ENCRYPTION_KEY=generate_another_32_character_key + # Encryption key for sensitive data + # Generate with: python3 -c "import secrets; print(secrets.token_hex(16))" + ENCRYPTION_KEY=CHANGE_ME_GENERATE_WITH_COMMAND_ABOVEThis prevents accidental use of placeholder values and makes the setup process more explicit.
Also applies to: 81-81, 92-92
🧹 Nitpick comments (6)
docs/SECURITY.md (2)
446-448: Convert bare URLs to proper markdown links for better documentation.Lines 446 and 551 contain bare URLs that should be formatted as markdown links for consistency with documentation standards and to improve user experience.
- - Security Team Email: security@your-domain.com - - Emergency Phone: +55 XX XXXX-XXXX - - External Security Consultant: contact info + - Security Team Email: [security@your-domain.com](mailto:security@your-domain.com) + - Emergency Phone: [+55 XX XXXX-XXXX](tel:+55XXXXXXXXXX) + - External Security Consultant: [contact info](CONTACT_INFO_URL)And similarly at line 551 for the GitHub Issues URL:
- 1. **Do NOT** open a public GitHub issue - 2. Email security details to: security@your-domain.com + 1. **Do NOT** open a public GitHub issue + 2. Email security details to: [security@your-domain.com](mailto:security@your-domain.com)Also applies to: 551-553
117-119: Minor grammar improvements in token best practices section.Line 117 repeats "only" and line 119 has a compound adjective that needs a hyphen:
- 1. **Never log tokens** - Tokens should not appear in logs - 2. **Use HTTPS only** - Tokens must only be transmitted over TLS - 3. **Short expiry** - Default 24 hours, configurable via `JWT_EXPIRY_HOURS` - 4. **Secure storage** - Store tokens securely on client side + 1. **Never log tokens** - Tokens should not appear in logs + 2. **Use HTTPS only** - Tokens must be transmitted over TLS only + 3. **Short expiry** - Default 24 hours, configurable via `JWT_EXPIRY_HOURS` + 4. **Secure storage** - Store tokens securely on client-side systems.env.template (2)
68-68: Quote SMTP_FROM_NAME value to ensure shell safety.Line 68 contains spaces and should be quoted for safety when sourced in shell scripts:
- SMTP_FROM_NAME=Sistema Automacao Juridica + SMTP_FROM_NAME="Sistema Automacao Juridica"
1-4: Add usage instructions at the top of the template.Provide clear guidance on how to use this template to avoid accidental deployment with placeholder values:
+ # IMPORTANT: This is a template file. To use it: + # 1. Copy to .env: cp .env.template .env + # 2. Edit all CHANGE_ME_ values + # 3. Replace all generated keys (JWT_SECRET, ENCRYPTION_KEY, passwords) + # 4. NEVER commit .env to version control + # 5. DO NOT use this template directly - always create a new .env file first + #docs/DEPLOYMENT.md (2)
503-515: Consider adding backup encryption for production security.The backup script (lines 503-515) compresses backups with gzip but does not encrypt them. This is inconsistent with the SECURITY.md guide which recommends encrypted backups.
Add GPG encryption to sensitive backups:
# Backup PostgreSQL echo "Backing up PostgreSQL..." -docker compose exec -T postgres pg_dump -U n8n_user n8n | gzip > "$BACKUP_DIR/database.sql.gz" +docker compose exec -T postgres pg_dump -U n8n_user n8n | gzip | gpg --symmetric --cipher-algo AES256 -o "$BACKUP_DIR/database.sql.gz.gpg" # Backup N8N workflows echo "Backing up N8N workflows..." docker compose exec -T n8n n8n export:workflow --all --output=/home/node/.n8n/backup.json 2>/dev/null || true -docker cp judicial-n8n:/home/node/.n8n/backup.json "$BACKUP_DIR/workflows.json" 2>/dev/null || true +docker cp judicial-n8n:/home/node/.n8n/backup.json - 2>/dev/null | gpg --symmetric --cipher-algo AES256 -o "$BACKUP_DIR/workflows.json.gpg" || true # Backup configuration echo "Backing up configuration..." -tar -czf "$BACKUP_DIR/config.tar.gz" -C /opt/judicial-automation .env config/ +tar -czf - -C /opt/judicial-automation .env config/ | gpg --symmetric --cipher-algo AES256 -o "$BACKUP_DIR/config.tar.gz.gpg"Add a note that the GPG passphrase must be securely stored.
262-265: Ensure cron job has proper path context for docker-compose.The crontab entry at line 264 references
docker-composebut cron jobs may not have the same PATH as interactive shells:- # Add: 0 3 * * * certbot renew --quiet --post-hook "docker compose -f /opt/judicial-automation/docker-compose.yml restart nginx" + # Add: 0 3 * * * /usr/local/bin/docker-compose -f /opt/judicial-automation/docker-compose.yml down && /usr/local/bin/docker-compose -f /opt/judicial-automation/docker-compose.yml up -d nginx + # Or use full paths: 0 3 * * * certbot renew --quiet --post-hook "sh -c 'cd /opt/judicial-automation && docker compose restart nginx'"Alternatively, create a wrapper script and reference it:
cat > /opt/judicial-automation/scripts/shell/certbot-renew.sh << 'EOF' #!/bin/bash cd /opt/judicial-automation docker compose restart nginx EOF chmod +x /opt/judicial-automation/scripts/shell/certbot-renew.sh # Then crontab: # 0 3 * * * certbot renew --quiet --post-hook "/opt/judicial-automation/scripts/shell/certbot-renew.sh"
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
.env.template(1 hunks)README.md(7 hunks)docs/DEPLOYMENT.md(1 hunks)docs/SECURITY.md(1 hunks)
🧰 Additional context used
🪛 dotenv-linter (4.0.0)
.env.template
[warning] 12-12: [UnorderedKey] The N8N_PASSWORD key should go before the N8N_PORT key
(UnorderedKey)
[warning] 20-20: [UnorderedKey] The POSTGRES_DB key should go before the POSTGRES_HOST key
(UnorderedKey)
[warning] 22-22: [UnorderedKey] The POSTGRES_PASSWORD key should go before the POSTGRES_PORT key
(UnorderedKey)
[warning] 29-29: [UnorderedKey] The REDIS_PASSWORD key should go before the REDIS_PORT key
(UnorderedKey)
[warning] 37-37: [UnorderedKey] The OPENAI_MAX_TOKENS key should go before the OPENAI_MODEL key
(UnorderedKey)
[warning] 43-43: [UnorderedKey] The DATAJUD_PASSWORD key should go before the DATAJUD_USERNAME key
(UnorderedKey)
[warning] 44-44: [UnorderedKey] The DATAJUD_BASE_URL key should go before the DATAJUD_PASSWORD key
(UnorderedKey)
[warning] 57-57: [UnorderedKey] The GOOGLE_LOCATION key should go before the GOOGLE_PROJECT_ID key
(UnorderedKey)
[warning] 58-58: [UnorderedKey] The GOOGLE_PROCESSOR_ID key should go before the GOOGLE_PROJECT_ID key
(UnorderedKey)
[warning] 66-66: [UnorderedKey] The SMTP_PASSWORD key should go before the SMTP_PORT key
(UnorderedKey)
[warning] 67-67: [UnorderedKey] The SMTP_FROM_EMAIL key should go before the SMTP_HOST key
(UnorderedKey)
[warning] 68-68: [UnorderedKey] The SMTP_FROM_NAME key should go before the SMTP_HOST key
(UnorderedKey)
[warning] 68-68: [ValueWithoutQuotes] This value needs to be surrounded in quotes
(ValueWithoutQuotes)
[warning] 74-74: [UnorderedKey] The GRAFANA_PASSWORD key should go before the GRAFANA_USER key
(UnorderedKey)
[warning] 75-75: [UnorderedKey] The GF_SERVER_ROOT_URL key should go before the GRAFANA_PASSWORD key
(UnorderedKey)
[warning] 82-82: [UnorderedKey] The JWT_ALGORITHM key should go before the JWT_SECRET key
(UnorderedKey)
[warning] 83-83: [UnorderedKey] The JWT_EXPIRY_HOURS key should go before the JWT_SECRET key
(UnorderedKey)
[warning] 98-98: [UnorderedKey] The FLASK_DEBUG key should go before the FLASK_ENV key
(UnorderedKey)
🪛 Gitleaks (8.30.0)
docs/SECURITY.md
[high] 99-99: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.
(generic-api-key)
🪛 LanguageTool
docs/SECURITY.md
[style] ~117-~117: This adverb was used twice in the sentence. Consider removing one of them or replacing them with a synonym.
Context: ...ogs 2. Use HTTPS only - Tokens must only be transmitted over TLS 3. **Short expi...
(ADVERB_REPETITION_PREMIUM)
[uncategorized] ~119-~119: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...re storage** - Store tokens securely on client side 5. Revocation - Implement token bla...
(EN_COMPOUND_ADJECTIVE_INTERNAL)
🪛 markdownlint-cli2 (0.18.1)
docs/SECURITY.md
26-26: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
446-446: Bare URL used
(MD034, no-bare-urls)
551-551: Bare URL used
(MD034, no-bare-urls)
🔇 Additional comments (4)
docs/SECURITY.md (1)
1-560: Comprehensive and well-structured security guide.The security documentation provides excellent coverage of authentication, network isolation, data protection, input validation, monitoring, and LGPD compliance. The architecture diagrams and code examples are clear and actionable. The security checklist and incident response procedures add practical value for operational teams.
README.md (2)
157-243: Comprehensive and well-structured JWT authentication documentation.The README updates properly document the new JWT-centric authentication approach with clear examples, a token endpoint, usage patterns, and infrastructure security considerations. The addition of the infrastructure security diagram and the updated LGPD compliance section enhance the overall documentation quality.
157-169: JWT authentication implementation verified and documented correctly.The
/auth/tokenendpoint exists and all documented API endpoints properly enforce JWT authentication:
- ✓
/auth/tokenendpoint implemented atscripts/python/app.py:198with correct response format- ✓ All API endpoints (
/extract-pdf,/firac-analysis,/datajud-search,/distinguish-analysis,/generate-document) use@require_authdecorator- ✓ JWT helper functions implemented with proper token creation, validation, and expiry handling (default 24 hours)
- ✓ Bearer token validation correctly extracts and verifies tokens
Note: The
/healthendpoint is publicly accessible without authentication and is not documented in the README. Consider documenting this explicitly if it's intentionally public for monitoring purposes.docs/DEPLOYMENT.md (1)
1-730: Comprehensive and production-ready deployment documentation.The deployment guide provides excellent step-by-step coverage from server preparation through post-deployment verification. Service startup order is properly sequenced, health checks are included, SSL/TLS setup covers both Let's Encrypt and self-signed scenarios, and troubleshooting section is practical. The guide properly emphasizes security hardening, firewall configuration, and backup strategies.
Summary by CodeRabbit
New Features
Infrastructure
Docs & Tests
Chores
✏️ Tip: You can customize this high-level summary in your review settings.