Skip to content

Conversation

@AkhileshNegi
Copy link
Collaborator

@AkhileshNegi AkhileshNegi commented Dec 9, 2025

Summary

Target issue is #475

Checklist

Before submitting a pull request, please ensure that you mark these task.

  • Ran fastapi run --reload app/main.py or docker compose up in the repository root and test.
  • If you've fixed a bug or added code that is tested and has test cases.

Notes

Please add here if any other information is required for the reviewer.

Summary by CodeRabbit

  • New Features

    • API keys now include user association and lifecycle metadata (creation, update, deletion).
    • Assistants gain richer configuration (temperature, result limits, vector store IDs).
  • Improvements

    • Audit timestamps and soft-delete flags added across many models for better lifecycle tracking.
    • Documents track filenames, storage URLs and lineage.
    • Projects, organizations, collections, jobs, evaluations, fine-tuning, and conversations now include richer metadata and clearer relational handling.

✏️ Tip: You can customize this high-level summary in your review settings.

@AkhileshNegi AkhileshNegi self-assigned this Dec 9, 2025
@coderabbitai
Copy link

coderabbitai bot commented Dec 9, 2025

Walkthrough

Added column-level comments, timestamp auditing, and soft-delete flags across many SQLModel ORM models; introduced several new FK fields, refined typing to union-style annotations, and removed minor unused imports. No behavioral logic changes; changes are metadata, schema enrichment, and public-model type refinements.

Changes

Cohort / File(s) Change Summary
API Keys & Users
backend/app/models/api_key.py, backend/app/models/user.py
Added user_id FK to APIKey; documented columns (key_prefix, key_hash, is_deleted, inserted_at/updated_at/deleted_at) via sa_column_kwargs; APIKey and User models gained docstrings and explicit Field/column metadata; removed unused imports.
Assistants & Collections
backend/app/models/assistants.py, backend/app/models/collection.py, backend/app/models/collection_job.py
Assistant model expanded (id, assistant_id, instructions as Text, model, ARRAY vector_store_ids, temperature, max_num_results, is_deleted, FKs, timestamps, relationships); Collection gained llm_service_id/name, timestamps, relationships; CollectionJob added task_id, trace_id, error_message, timestamps and column comments.
Documents & Document Collections
backend/app/models/document.py, backend/app/models/document_collection.py
DocumentBase added fname; Document added object_store_url, is_deleted, source_document_id (self FK), inserted_at/updated_at/deleted_at and column comments; DocumentCollection annotated id and FK columns with comments and added docstring.
Batch Jobs & Evaluations
backend/app/models/batch_job.py, backend/app/models/evaluation.py
Added per-column sa_column_kwargs comments, explicit JSONB/Text Column usages, timestamp auditing, and FK metadata; EvaluationDataset/Run fields updated with comments and explicit nullable/JSON handling.
Fine-tuning & Model Evaluation
backend/app/models/fine_tuning.py, backend/app/models/model_evaluation.py
Fine_Tuning enriched with column comments, is_deleted, timestamps, FK fields (document_id, project_id, organization_id), union-type refinements and new public response model; ModelEvaluation gained FKs (fine_tuning_id, document_id), soft-delete, timestamps, Text/JSON columns, and expanded public model.
Jobs, Threads & Conversations
backend/app/models/job.py, backend/app/models/threads.py, backend/app/models/openai_conversation.py
Job model added created_at/updated_at and column comments; Threads and OpenAIConversation converted optionals to union types, added comments, timestamps, soft-delete, and expanded public conversation model and relationships.
Credentials & Doc Transformation Jobs
backend/app/models/credentials.py, backend/app/models/doc_transformation_job.py
Credentials: credential typing updated to dict[str, Any], added is_active and FK comments, typed relationships; DocTransformationJob: added status, task_id, trace_id, error_message, source/transformed document FKs, timestamps and comments.
Organization, Project & Config
backend/app/models/organization.py, backend/app/models/project.py, backend/app/models/config/*.py
Organization and Project gained sa_column_kwargs comments, explicit id Fields and inserted_at/updated_at auditing; Project added storage_path UUID. Config and ConfigVersion fields received column comments and JSONB comment enrichment.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Focus areas needing extra attention:
    • Foreign-key definitions, ondelete behavior, and relationship back_populates across fine_tuning, model_evaluation, document, collection, and assistants.
    • JSONB/Text column definitions, nullability, and explicit Column usages in evaluation.py, batch_job.py, and model_evaluation.py.
    • Soft-delete and timestamp additions (migration/compatibility considerations) across many models.
    • New/changed public response models (e.g., FineTuningJobPublic, OpenAIConversationPublic) for serialization and API compatibility.

Possibly related PRs

Suggested labels

enhancement, ready-for-review

Suggested reviewers

  • Prajna1999
  • nishika26
  • avirajsingh7

Poem

🐇 I hop through columns, carve a tiny note,
Timestamps bloom where quiet rows once wrote,
FKs twitch like vines that softly wind,
Soft-delete naps leave shadows behind,
— a rabbit cheers the schema's tidy coat.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 40.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
Title check ❓ Inconclusive The title 'Kaapi v1.0: Database Comments' is vague and does not clearly communicate the main change. It uses a generic version identifier without describing what is actually changing in the codebase. Revise the title to be more specific about the changes, such as 'Add database column comments to ORM models' or 'Document database schema with column-level comments'.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch enhancement/db-comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@AkhileshNegi AkhileshNegi linked an issue Dec 9, 2025 that may be closed by this pull request
@codecov
Copy link

codecov bot commented Dec 9, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

Copy link
Collaborator

@avirajsingh7 avirajsingh7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    project_id: int = Field(
        foreign_key="project.id", nullable=False, ondelete="CASCADE", sa_column_kwargs={"comment": "List of Vector Store IDs that exist in OpenAI."}
    )

We can use this way also

@AkhileshNegi AkhileshNegi marked this pull request as ready for review December 11, 2025 08:23
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/app/models/document.py (1)

72-85: Duplicate signed_url field definition.

The signed_url field is defined twice in DocumentPublic (lines 74-76 and 83-85). This is likely a copy-paste error. Remove the duplicate:

 class DocumentPublic(DocumentBase):
     id: UUID = Field(description="The unique identifier of the document")
     signed_url: str | None = Field(
         default=None, description="A signed URL for accessing the document"
     )
     inserted_at: datetime = Field(
         description="The timestamp when the document was inserted"
     )
     updated_at: datetime = Field(
         description="The timestamp when the document was last updated"
     )
-    signed_url: str | None = Field(
-        default=None, description="A signed URL for accessing the document"
-    )
🧹 Nitpick comments (13)
backend/app/models/collection_job.py (2)

55-59: Consider adding explicit default for nullable field.

The error_message field is typed as str | None and the column is nullable, but no default=None is specified. While SQLModel may handle this implicitly, adding an explicit default improves clarity and ensures consistent behavior when creating instances.

     error_message: str | None = Field(
+        default=None,
         sa_column=Column(
             Text, nullable=True, comment="Error message if the job failed"
         ),
     )

118-118: Minor formatting issue in comment.

Missing space after ## in the comment.

-##Response models
+# Response models
backend/app/models/user.py (1)

73-75: Consider adding nullable=False to hashed_password field.

The hashed_password field lacks an explicit nullable specification. While SQLModel defaults to nullable based on the type hint (str implies not null), explicitly setting nullable=False would be consistent with other fields in this PR and make the constraint clear.

     hashed_password: str = Field(
+        nullable=False,
         sa_column_kwargs={"comment": "Bcrypt hash of the user's password"},
     )
backend/app/alembic/versions/8d3c3e8ce7b8_add_db_comments.py (1)

10-10: Remove unused import.

sqlmodel.sql.sqltypes is imported but not used in this migration file.

 from alembic import op
 import sqlalchemy as sa
-import sqlmodel.sql.sqltypes
 from sqlalchemy.dialects import postgresql
backend/app/models/doc_transformation_job.py (2)

18-70: Consider adding soft-delete fields for consistency.

Unlike Fine_Tuning and other models in this PR, DocTransformationJob lacks is_deleted and deleted_at fields. If the codebase follows a soft-delete pattern consistently, consider adding these fields. If transformation jobs are intentionally hard-deleted (as transient records), this is fine as-is.


39-43: Minor: Duplicate description in description and sa_column_kwargs comment.

The trace_id field has the same text in both description (Pydantic) and sa_column_kwargs["comment"] (database). This is harmless but slightly redundant.

backend/app/models/openai_conversation.py (1)

180-183: Consider migrating to model_config for Pydantic v2 compatibility.

The Config inner class is Pydantic v1 style. For Pydantic v2 (used by modern SQLModel), prefer the model_config attribute:

 class OpenAIConversationPublic(OpenAIConversationBase):
     """Public model for OpenAIConversation without sensitive fields"""

     id: int
     inserted_at: datetime
     updated_at: datetime

-    class Config:
-        from_attributes = True
-        populate_by_name = True
-        use_enum_values = True
+    model_config = {
+        "from_attributes": True,
+        "populate_by_name": True,
+        "use_enum_values": True,
+    }
backend/app/models/assistants.py (2)

15-18: __table_args__ in base class may cause issues.

Defining __table_args__ with a UniqueConstraint in AssistantBase (a non-table model) is unusual. SQLModel/SQLAlchemy typically expects table arguments on the table model itself. The constraint references project_id and assistant_id, but this base class is also inherited by create/update schemas that don't have these as database columns.

Consider moving this to the Assistant table class only:

 class AssistantBase(SQLModel):
-    __table_args__ = (
-        UniqueConstraint("project_id", "assistant_id", name="uq_project_assistant_id"),
-    )
-
     assistant_id: str = Field(index=True)
     # ... rest of fields
 class Assistant(AssistantBase, table=True):
     """OpenAI assistant configuration and metadata."""

     __tablename__ = "openai_assistant"
+    __table_args__ = (
+        UniqueConstraint("project_id", "assistant_id", name="uq_project_assistant_id"),
+    )

     id: int = Field(

54-58: Minor: Two different comment patterns used for instructions.

The instructions field uses sa_column=Column(Text, nullable=False, comment=...) with an inline comment, while other fields use sa_column_kwargs={"comment": ...}. Both work, but consider using a consistent pattern across all fields for maintainability.

backend/app/models/collection.py (2)

11-12: Potential circular import risk with direct imports.

Unlike other models in this PR that use TYPE_CHECKING for relationship types, Organization and Project are imported directly. This works if these modules don't import Collection at module level, but for consistency and safety, consider using TYPE_CHECKING pattern:

-from .organization import Organization
-from .project import Project
+from typing import TYPE_CHECKING
+
+if TYPE_CHECKING:
+    from .organization import Organization
+    from .project import Project

Then update the relationship type hints to use string references: organization: "Organization".


48-60: Add nullable=False to timestamp fields for consistency.

Other models in this PR (e.g., Project, BatchJob) explicitly set nullable=False on inserted_at and updated_at. For consistency:

     inserted_at: datetime = Field(
         default_factory=now,
+        nullable=False,
         sa_column_kwargs={"comment": "Timestamp when the collection was created"},
     )
     updated_at: datetime = Field(
         default_factory=now,
+        nullable=False,
         sa_column_kwargs={"comment": "Timestamp when the collection was last updated"},
     )
backend/app/models/batch_job.py (1)

134-136: Relationships lack back_populates configuration.

Unlike other models in this PR (e.g., Project, Collection), these relationships don't specify back_populates. This means bidirectional navigation won't work from Organization/Project to BatchJob. If this is intentional (unidirectional relationship), consider adding a brief comment to clarify.

backend/app/models/model_evaluation.py (1)

25-30: fine_tuning_id is defined in both base and table class.

fine_tuning_id is defined in ModelEvaluationBase (lines 26-30) and redefined in ModelEvaluation (lines 94-99) with additional sa_column_kwargs. While SQLModel allows field overriding, this can be confusing. Consider either:

  1. Removing the definition from ModelEvaluationBase if it's only used by the table model
  2. Adding the comment kwargs in the base class if they should apply everywhere
 class ModelEvaluationBase(SQLModel):
     fine_tuning_id: int = Field(
         foreign_key="fine_tuning.id",
         nullable=False,
         ondelete="CASCADE",
+        sa_column_kwargs={"comment": "Reference to the fine-tuning job"},
     )

Then remove the duplicate definition from ModelEvaluation.

Also applies to: 94-99

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 30ef268 and 2176085.

📒 Files selected for processing (19)
  • backend/app/alembic/versions/8d3c3e8ce7b8_add_db_comments.py (1 hunks)
  • backend/app/models/api_key.py (2 hunks)
  • backend/app/models/assistants.py (3 hunks)
  • backend/app/models/batch_job.py (2 hunks)
  • backend/app/models/collection.py (4 hunks)
  • backend/app/models/collection_job.py (2 hunks)
  • backend/app/models/credentials.py (5 hunks)
  • backend/app/models/doc_transformation_job.py (2 hunks)
  • backend/app/models/document.py (2 hunks)
  • backend/app/models/document_collection.py (1 hunks)
  • backend/app/models/evaluation.py (5 hunks)
  • backend/app/models/fine_tuning.py (3 hunks)
  • backend/app/models/job.py (2 hunks)
  • backend/app/models/model_evaluation.py (2 hunks)
  • backend/app/models/openai_conversation.py (4 hunks)
  • backend/app/models/organization.py (2 hunks)
  • backend/app/models/project.py (2 hunks)
  • backend/app/models/threads.py (1 hunks)
  • backend/app/models/user.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use type hints in Python code (Python 3.11+ project)

Files:

  • backend/app/alembic/versions/8d3c3e8ce7b8_add_db_comments.py
  • backend/app/models/doc_transformation_job.py
  • backend/app/models/job.py
  • backend/app/models/document_collection.py
  • backend/app/models/openai_conversation.py
  • backend/app/models/project.py
  • backend/app/models/organization.py
  • backend/app/models/threads.py
  • backend/app/models/collection.py
  • backend/app/models/credentials.py
  • backend/app/models/fine_tuning.py
  • backend/app/models/document.py
  • backend/app/models/api_key.py
  • backend/app/models/batch_job.py
  • backend/app/models/assistants.py
  • backend/app/models/user.py
  • backend/app/models/collection_job.py
  • backend/app/models/evaluation.py
  • backend/app/models/model_evaluation.py
backend/app/models/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Define SQLModel entities (database tables and domain objects) in backend/app/models/

Files:

  • backend/app/models/doc_transformation_job.py
  • backend/app/models/job.py
  • backend/app/models/document_collection.py
  • backend/app/models/openai_conversation.py
  • backend/app/models/project.py
  • backend/app/models/organization.py
  • backend/app/models/threads.py
  • backend/app/models/collection.py
  • backend/app/models/credentials.py
  • backend/app/models/fine_tuning.py
  • backend/app/models/document.py
  • backend/app/models/api_key.py
  • backend/app/models/batch_job.py
  • backend/app/models/assistants.py
  • backend/app/models/user.py
  • backend/app/models/collection_job.py
  • backend/app/models/evaluation.py
  • backend/app/models/model_evaluation.py
🧬 Code graph analysis (15)
backend/app/models/doc_transformation_job.py (2)
backend/app/crud/rag/open_ai.py (1)
  • default (31-32)
backend/app/core/util.py (1)
  • now (13-14)
backend/app/models/job.py (1)
backend/app/core/util.py (1)
  • now (13-14)
backend/app/models/openai_conversation.py (3)
backend/app/core/util.py (1)
  • now (13-14)
backend/app/models/organization.py (1)
  • Organization (44-82)
backend/app/models/project.py (1)
  • Project (51-107)
backend/app/models/project.py (1)
backend/app/core/util.py (1)
  • now (13-14)
backend/app/models/organization.py (1)
backend/app/core/util.py (1)
  • now (13-14)
backend/app/models/threads.py (2)
backend/app/core/util.py (1)
  • now (13-14)
backend/app/crud/rag/open_ai.py (1)
  • default (31-32)
backend/app/models/collection.py (3)
backend/app/core/util.py (1)
  • now (13-14)
backend/app/models/organization.py (1)
  • Organization (44-82)
backend/app/models/project.py (1)
  • Project (51-107)
backend/app/models/credentials.py (3)
backend/app/core/util.py (1)
  • now (13-14)
backend/app/models/organization.py (1)
  • Organization (44-82)
backend/app/models/project.py (1)
  • Project (51-107)
backend/app/models/fine_tuning.py (3)
backend/app/core/util.py (1)
  • now (13-14)
backend/app/models/model_evaluation.py (1)
  • ModelEvaluation (42-137)
backend/app/models/project.py (1)
  • Project (51-107)
backend/app/models/document.py (1)
backend/app/core/util.py (1)
  • now (13-14)
backend/app/models/api_key.py (1)
backend/app/core/util.py (1)
  • now (13-14)
backend/app/models/batch_job.py (1)
backend/app/core/util.py (1)
  • now (13-14)
backend/app/models/assistants.py (3)
backend/app/core/util.py (1)
  • now (13-14)
backend/app/models/organization.py (1)
  • Organization (44-82)
backend/app/models/project.py (1)
  • Project (51-107)
backend/app/models/collection_job.py (2)
backend/app/core/util.py (1)
  • now (13-14)
backend/app/models/collection.py (2)
  • CollectionIDPublic (166-167)
  • CollectionPublic (170-179)
backend/app/models/model_evaluation.py (3)
backend/app/core/util.py (1)
  • now (13-14)
backend/app/models/fine_tuning.py (1)
  • Fine_Tuning (59-151)
backend/app/models/project.py (1)
  • Project (51-107)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (46)
backend/app/models/collection_job.py (4)

1-8: LGTM!

Imports are appropriate for the model definition. The Column and Text imports from sqlmodel are correctly used for the error_message field, and the collection types are used in the response models.


23-45: LGTM!

The model docstring and field definitions are well-structured. The sa_column_kwargs comments provide useful database-level documentation, and fields have appropriate type hints and constraints.


102-132: LGTM!

Request and response models are well-structured with appropriate type annotations. The use of str | None union syntax is consistent with Python 3.9+ style as per project guidelines.


82-87: No changes needed.

The updated_at field is intentionally managed at the application layer, not the database layer. Every update operation in the CRUD layer explicitly sets updated_at = now() before committing changes (see backend/app/crud/jobs.py, backend/app/crud/fine_tuning.py, backend/app/crud/model_evaluation.py, backend/app/crud/credentials.py, backend/app/crud/document/document.py, backend/app/crud/document/doc_transformation_job.py, and backend/app/crud/evaluations/core.py). This is a consistent, intentional pattern across the entire codebase. Adding onupdate=now to sa_column_kwargs would conflict with this design by creating duplicate timestamp updates at both the database and application layers.

Likely an incorrect or invalid review comment.

backend/app/models/threads.py (3)

8-13: LGTM! Modern type annotations used correctly.

The base class uses Python 3.10+ union syntax (str | None) for optional fields, which aligns with the project's Python 3.11+ requirement.


20-27: LGTM! Proper table model definition.

The table model correctly inherits from the base class and adds a primary key with appropriate column comment metadata.


28-47: Field redefinitions add database comments as intended.

The fields inherited from OpenAIThreadBase are redefined here to add column-level comments via sa_column_kwargs. While this creates some duplication between the base and table classes, it's a valid SQLModel pattern where:

  • The base class provides Pydantic validation/serialization
  • The table class adds database-specific metadata

Note: thread_id includes index=True, unique=True in both the base (line 9) and here (lines 29-30), which is redundant but harmless.

backend/app/models/job.py (1)

3-5: Imports, base model setup, and primary key metadata look good

Using UUID/uuid4 with Field(..., primary_key=True) and adding a clear SQL comment on id provides strong typing and better DB introspection. Explicit imports for Field/SQLModel and the concise Job docstring are also aligned with the project’s typing and documentation goals.

Also applies to: 23-23, 27-31

backend/app/models/fine_tuning.py (2)

59-111: LGTM on field definitions with database comments.

The field definitions with sa_column_kwargs comments are well-structured and provide clear documentation for the database schema. The soft-delete pattern with is_deleted and deleted_at fields is consistent with other models in the PR.


119-130: ondelete parameter is valid in SQLModel Field() and works as expected throughout the codebase.

The ondelete parameter passed directly to Field() is correctly processed and creates proper cascade delete constraints in the database. This pattern is used consistently across 40+ model files in the codebase and is functioning correctly. No changes are needed.

backend/app/models/user.py (1)

6-29: LGTM on UserBase field definitions.

The field definitions with descriptive comments and proper constraints (unique, index, max_length) are well-structured. The modern union type syntax (str | None) aligns with coding guidelines for Python 3.11+.

backend/app/alembic/versions/8d3c3e8ce7b8_add_db_comments.py (2)

82-148: LGTM on downgrade implementation.

The downgrade function correctly reverts all column comments to None while documenting the original values via existing_comment. This ensures clean rollback capability.


20-79: Correct the migration file reference and remove the unfounded verification request.

The review comment references migration file 8d3c3e8ce7b8_add_db_comments.py, which does not exist. The actual migration file is 707b8035b64c_add_db_comments.py, and it already covers all the mentioned models: Fine_Tuning, User, DocTransformationJob, and DocumentCollection. Each of these tables has its columns properly commented in the migration with their corresponding sa_column_kwargs definitions from the models.

backend/app/models/document_collection.py (1)

6-15: LGTM on model structure and docstring.

The junction table model is well-documented with a clear docstring and appropriate column comments.

backend/app/models/doc_transformation_job.py (2)

72-82: LGTM on property accessors.

The job_id, job_inserted_at, and job_updated_at properties provide clean aliases for the underlying fields.


50-60: Two foreign keys to the same table—consider sa_relationship_kwargs if you add relationships.

Both source_document_id and transformed_document_id reference document.id. This is valid without explicit relationships, but if you later add Relationship definitions, you'll need sa_relationship_kwargs with foreign_keys to disambiguate which FK each relationship uses (see the pattern in EvaluationRun model).

backend/app/models/organization.py (2)

17-29: LGTM! Well-documented base model.

The OrganizationBase class now includes a clear docstring and descriptive SQL column comments via sa_column_kwargs. The field definitions follow SQLModel best practices.


44-65: LGTM! Properly structured Organization model with comprehensive metadata.

The model correctly:

  • Uses default=None with primary_key=True for auto-increment ID (standard SQLModel pattern)
  • Applies default_factory=now for timestamps
  • Includes descriptive column comments for database documentation
backend/app/models/credentials.py (3)

9-12: LGTM! Proper TYPE_CHECKING guard for forward references.

Using TYPE_CHECKING to guard imports prevents circular import issues at runtime while maintaining type safety for static analysis.


14-37: LGTM! Well-structured base model with comprehensive field documentation.

The CredsBase model properly:

  • Adds the is_active flag with a descriptive comment
  • Documents foreign key relationships with column comments
  • Uses appropriate nullable=False and ondelete="CASCADE" constraints

83-116: LGTM! Credential model follows consistent patterns.

The model correctly implements:

  • Primary key with int | None type and default=None
  • Indexed provider field with proper constraints
  • Encrypted credential field with clear documentation
  • Timestamps with default_factory=now
  • Relationship typing using union syntax ("Organization | None")
backend/app/models/openai_conversation.py (2)

62-141: LGTM! Comprehensive conversation model with soft-delete support.

The model properly implements:

  • All fields from base with added sa_column_kwargs for documentation
  • Soft-delete pattern with is_deleted flag and deleted_at timestamp
  • Consistent foreign key definitions with CASCADE delete
  • Proper timestamp fields with default_factory=now

139-141: Relationships in openai_conversation.py match the project's standard pattern.

The relationships are correctly typed as non-optional ("Project" and "Organization"). While credentials.py does use optional relationships ("Organization | None" and "Project | None"), this is an outlier. The majority of models in backend/app/models/ (assistants.py, fine_tuning.py, model_evaluation.py, evaluation.py, collection.py, and others) use non-optional relationship types. The current implementation is consistent with the predominant pattern across the codebase.

backend/app/models/api_key.py (3)

9-32: LGTM! Well-documented base model with foreign key fields.

The APIKeyBase properly documents all foreign key relationships with appropriate constraints (nullable=False, ondelete="CASCADE") and descriptive column comments.


48-89: LGTM! APIKey model with UUID primary key and comprehensive metadata.

The model correctly implements:

  • UUID-based primary key with uuid4 factory
  • Indexed and unique key_prefix for efficient lookups
  • Secure key_hash field for bcrypt storage
  • Soft-delete pattern with is_deleted and deleted_at
  • Consistent timestamp fields

25-32: User model is properly defined and foreign key reference is valid.

The User model exists at backend/app/models/user.py:65 and is configured as a SQLModel table entity with table=True. The foreign key reference to user.id is correct and will function as expected.

backend/app/models/assistants.py (2)

10-13: LGTM! Proper TYPE_CHECKING guard for circular import prevention.

The conditional imports correctly guard Organization and Project to avoid runtime circular dependencies while preserving type hints.


37-115: LGTM! Comprehensive Assistant model with rich metadata.

The model properly implements:

  • Clear docstring and field-level documentation
  • PostgreSQL-specific ARRAY(String) for vector_store_ids
  • Text column type for instructions with inline comment
  • Soft-delete pattern with is_deleted and deleted_at
  • Consistent foreign key and timestamp patterns
backend/app/models/project.py (3)

20-35: LGTM!

The base model fields are well-documented with descriptive column comments. The use of sa_column_kwargs for database-level documentation is consistent with the broader schema enrichment effort.


58-68: LGTM!

The id and storage_path fields are properly configured. Using uuid4 as the default factory for storage_path ensures unique cloud storage paths.


91-107: LGTM!

The relationships are properly defined with back_populates and cascade_delete=True for dependent entities. The TYPE_CHECKING imports prevent circular import issues.

backend/app/models/collection.py (2)

18-32: LGTM!

The UUID primary key with uuid4 factory and LLM service fields are well-documented. The column comments clearly describe the purpose of each field.


89-106: LGTM!

The AssistantOptions model with the @model_validator properly enforces that both model and instructions must be provided together or neither.

backend/app/models/batch_job.py (3)

24-28: LGTM!

The primary key definition follows the correct SQLModel pattern with int | None type and default=None for auto-increment behavior.


46-56: LGTM!

Using sa_column with explicit Column(JSONB, ...) is the correct approach for PostgreSQL-specific types. The comment is appropriately placed within the Column definition.


100-104: LGTM!

Using sa_column with Column(Text, ...) is appropriate for potentially long error messages that may exceed VARCHAR limits.

backend/app/models/document.py (2)

10-25: LGTM!

The DocumentBase model is well-structured with proper field documentation. The project_id foreign key with ondelete="CASCADE" ensures referential integrity.


28-69: LGTM!

The Document model is well-designed with:

  • UUID primary key with auto-generation
  • Self-referential FK for document transformations
  • Soft-delete pattern with is_deleted flag and deleted_at timestamp
  • Consistent timestamp fields
backend/app/models/model_evaluation.py (2)

47-91: LGTM!

The field definitions are well-documented with appropriate column comments. The use of sa_column for Text and JSON column types is correct. The soft-delete pattern with is_deleted and deleted_at is consistent with other models.


118-133: LGTM!

The timestamp fields are properly configured with now() default factory and descriptive column comments. The soft-delete deleted_at field correctly allows null values.

backend/app/models/evaluation.py (6)

86-102: LGTM!

The field definitions correctly use SQLField with appropriate sa_column_kwargs for database comments. Type hints follow modern Python 3.9+ syntax.


105-130: LGTM!

Appropriate use of sa_column with explicit Column(JSONB) for the metadata field, and sa_column_kwargs for simpler string fields. Comments are descriptive and consistent.


133-160: LGTM!

Foreign key definitions and timestamp fields are correctly configured with SQLField, proper cascade behavior, and nullable=False constraints on timestamps.


179-205: LGTM!

Field definitions correctly use SQLField and appropriate column configurations for JSONB types.


208-274: LGTM!

Foreign key relationships, JSONB columns for scores, and Text column for error messages are all properly configured with appropriate nullability and cascade behaviors.


304-317: LGTM!

Relationships are correctly configured with appropriate back_populates and foreign_keys specifications for the dual batch job references.

Comment on lines 46 to 49
task_id: str = Field(
nullable=True,
sa_column_kwargs={"comment": "Celery task ID for async processing"},
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Type annotation inconsistent with nullability.

The task_id field is typed as str but has nullable=True, which means the database allows NULL values. The Python type annotation should reflect this to avoid type checker warnings and runtime confusion.

Apply this diff to fix the type annotation:

-    task_id: str = Field(
+    task_id: str | None = Field(
+        default=None,
         nullable=True,
         sa_column_kwargs={"comment": "Celery task ID for async processing"},
     )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
task_id: str = Field(
nullable=True,
sa_column_kwargs={"comment": "Celery task ID for async processing"},
)
task_id: str | None = Field(
default=None,
nullable=True,
sa_column_kwargs={"comment": "Celery task ID for async processing"},
)
🤖 Prompt for AI Agents
In backend/app/models/collection_job.py around lines 46 to 49, the task_id field
is annotated as str but marked nullable=True; update the type annotation to
reflect nullability (e.g., Optional[str] or str | None), and add or update the
necessary import from typing (Optional) if not already present so the annotation
and database nullability are consistent.

Comment on lines 291 to 302
inserted_at: datetime = Field(
default_factory=now,
description="The timestamp when the evaluation run was started",
sa_column_kwargs={"comment": "Timestamp when the evaluation run was started"},
)
updated_at: datetime = Field(
default_factory=now,
description="The timestamp when the evaluation run was last updated",
sa_column_kwargs={
"comment": "Timestamp when the evaluation run was last updated"
},
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Use SQLField instead of pydantic's Field for table model fields.

These timestamp fields incorrectly use pydantic's Field (imported on line 4) instead of SQLField (SQLModel's Field, imported on line 7). The sa_column_kwargs parameter is SQLModel-specific and will be silently ignored when using pydantic's Field, so the database comments won't be applied.

This is inconsistent with EvaluationDataset (lines 147-160) which correctly uses SQLField for its timestamps.

Apply this diff to fix:

     # Timestamps
-    inserted_at: datetime = Field(
+    inserted_at: datetime = SQLField(
         default_factory=now,
+        nullable=False,
         description="The timestamp when the evaluation run was started",
         sa_column_kwargs={"comment": "Timestamp when the evaluation run was started"},
     )
-    updated_at: datetime = Field(
+    updated_at: datetime = SQLField(
         default_factory=now,
+        nullable=False,
         description="The timestamp when the evaluation run was last updated",
         sa_column_kwargs={
             "comment": "Timestamp when the evaluation run was last updated"
         },
     )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
inserted_at: datetime = Field(
default_factory=now,
description="The timestamp when the evaluation run was started",
sa_column_kwargs={"comment": "Timestamp when the evaluation run was started"},
)
updated_at: datetime = Field(
default_factory=now,
description="The timestamp when the evaluation run was last updated",
sa_column_kwargs={
"comment": "Timestamp when the evaluation run was last updated"
},
)
inserted_at: datetime = SQLField(
default_factory=now,
nullable=False,
description="The timestamp when the evaluation run was started",
sa_column_kwargs={"comment": "Timestamp when the evaluation run was started"},
)
updated_at: datetime = SQLField(
default_factory=now,
nullable=False,
description="The timestamp when the evaluation run was last updated",
sa_column_kwargs={
"comment": "Timestamp when the evaluation run was last updated"
},
)
🤖 Prompt for AI Agents
In backend/app/models/evaluation.py around lines 291 to 302, the timestamp
fields use pydantic's Field instead of SQLModel's SQLField so the
sa_column_kwargs (DB comments) are ignored; replace Field with SQLField for both
inserted_at and updated_at, preserving default_factory=now, description and the
sa_column_kwargs entries so the column comments are applied (no other changes to
the field arguments or types).

Comment on lines 149 to 151
# Relationships
project: "Project" = Relationship(back_populates="fine_tuning")
model_evaluation: "ModelEvaluation" = Relationship(back_populates="fine_tuning")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# Find and examine the Fine_Tuning model definition
find . -type f -name "*.py" -path "*/models/*" | head -20

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 798


🏁 Script executed:

# Look for the Fine_Tuning and ModelEvaluation models
rg -n "class Fine_Tuning|class ModelEvaluation" --type py -A 20 | head -100

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 6220


🏁 Script executed:

# Check for fine_tuning_id foreign key in ModelEvaluation
rg -n "fine_tuning_id" --type py -B 2 -A 2

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 16032


🏁 Script executed:

# Examine the complete relationship definitions in fine_tuning.py
cat -n backend/app/models/fine_tuning.py | head -160

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 6219


Change model_evaluation type hint to list["ModelEvaluation"].

The relationship is one-to-many: one Fine_Tuning can have multiple ModelEvaluations (confirmed by the fine_tuning_id foreign key in ModelEvaluation with ondelete="CASCADE"). The type hint should reflect this cardinality for proper Python type safety, especially given the project targets Python 3.11+.

🤖 Prompt for AI Agents
In backend/app/models/fine_tuning.py around lines 149 to 151, the Relationship
type hint for model_evaluation is incorrect for a one-to-many relationship;
change model_evaluation: "ModelEvaluation" = Relationship(...) to
model_evaluation: list["ModelEvaluation"] = Relationship(...) so the annotation
reflects multiple ModelEvaluation instances, import typing if needed for forward
references (using list[...] is fine in Python 3.11), and run type checks to
ensure no other references expect a single instance.

Comment on lines 32 to 36
task_id: str | None = Field(
nullable=True, description="Celery task ID returned when job is queued."
nullable=True,
description="Celery task ID returned when job is queued.",
sa_column_kwargs={"comment": "Celery task ID returned when job is queued"},
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

cd backend && find . -type f -name "job.py" | head -20

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 129


🏁 Script executed:

cat -n backend/app/models/job.py

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 2772


🏁 Script executed:

rg "Job\(" --type py -A 3 -B 1 | head -100

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 6065


🏁 Script executed:

rg "Job\(" --type py -c

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 678


🏁 Script executed:

cat -n backend/app/crud/jobs.py

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 1543


🏁 Script executed:

rg "task_id\s*=" backend/app --type py -B 2 -A 1

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 24366


🏁 Script executed:

rg "JobCrud.*create" backend/app/tests --type py -B 2 -A 5

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 4126


🏁 Script executed:

rg "nullable.*True" backend/app/models --type py -B 2 -A 2 | head -60

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 3679


🏁 Script executed:

cat -n backend/app/models/doc_transformation_job.py | head -40

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 1438


Add default=None to task_id field

task_id is typed as str | None with nullable=True but lacks a default value, unlike other optional fields (trace_id, error_message) and unlike the equivalent task_id field in DocTransformationJob. This makes the field required at construction time despite its optional type annotation.

The code in backend/app/crud/jobs.py:16-19 creates Job instances without providing task_id, which will fail Pydantic validation. Align with the pattern used elsewhere in the codebase:

    task_id: str | None = Field(
+       default=None,
        nullable=True,
        description="Celery task ID returned when job is queued.",
        sa_column_kwargs={"comment": "Celery task ID returned when job is queued"},
    )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
task_id: str | None = Field(
nullable=True, description="Celery task ID returned when job is queued."
nullable=True,
description="Celery task ID returned when job is queued.",
sa_column_kwargs={"comment": "Celery task ID returned when job is queued"},
)
task_id: str | None = Field(
default=None,
nullable=True,
description="Celery task ID returned when job is queued.",
sa_column_kwargs={"comment": "Celery task ID returned when job is queued"},
)
🤖 Prompt for AI Agents
In backend/app/models/job.py around lines 32 to 36, the task_id field is
declared as nullable (str | None) but lacks a default causing Pydantic to
require it at construction; add default=None to the Field so the attribute is
optional at instantiation (matching trace_id/error_message and
DocTransformationJob) — update the Field(...) call to include default=None while
keeping nullable=True and existing description/sa_column_kwargs.

Comment on lines +80 to 89
inserted_at: datetime = Field(
default_factory=now,
nullable=False,
sa_column_kwargs={"comment": "Timestamp when the project was created"},
)
updated_at: datetime = Field(
default_factory=now,
nullable=False,
sa_column_kwargs={"comment": "Timestamp when the project was last updated"},
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

rg -n "updated_at" --type py -C 3 backend/app

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 50387


🏁 Script executed:

rg -n "updated_at = now()" backend/app/crud/project.py -B 5 -A 2

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 342


🏁 Script executed:

cat -n backend/app/crud/project.py

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 2934


🏁 Script executed:

rg -n "\.update.*[Pp]roject|[Pp]roject.*\.update" backend/app --type py -i | head -20

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 425


🏁 Script executed:

fd -t f "project" backend/app/api/routes --type py | xargs grep -l "project" -i

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 246


🏁 Script executed:

find backend/app/api/routes -name "*project*" -type f

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 108


🏁 Script executed:

cat -n backend/app/api/routes/project.py

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 4339


🏁 Script executed:

rg -A 10 "class ProjectUpdate" backend/app/models/project.py

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 413


The updated_at field is not being updated when projects are modified.

The PATCH endpoint for updating projects (lines 76-91 in backend/app/api/routes/project.py) does not set updated_at to the current time. Add project.updated_at = now() before persisting the changes, following the pattern used consistently throughout the codebase (e.g., config, job, document CRUD operations).

🤖 Prompt for AI Agents
In backend/app/api/routes/project.py around lines 76-91, the PATCH handler does
not update the project's updated_at timestamp; before persisting changes (before
commit/save), set project.updated_at = now() and ensure now is imported from the
same util used elsewhere in the codebase so the timestamp matches existing
patterns; then persist the project as before.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
backend/app/models/openai_conversation.py (1)

46-51: Move foreign key definitions to table model only.

The foreign key fields project_id and organization_id are defined in the base class and redeclared in the table model (lines 104-115). Examining the Project model pattern in this codebase, foreign keys should only be defined in the table model, not in the base class. Base classes should contain business logic fields, while database-specific concerns like foreign keys belong in the table model.

Remove these foreign key field definitions from OpenAIConversationBase and keep them only in the OpenAIConversation table model.

Based on coding guidelines and established patterns in this codebase.

backend/app/models/credentials.py (1)

116-131: Public model currently exposes decrypted credentials; verify this matches security expectations

to_public() decrypts self.credential and passes the full result into CredsPublic.credential, while the docstring for CredsPublic says “Public representation of credentials, excluding sensitive information.” Combined with the name CredsPublic, this strongly suggests the model may be used as a FastAPI response_model, which would expose provider API credentials to API consumers.

Unless decrypt_credentials is already redacting/masking secrets and CredsPublic is strictly internal, this is a potential secret leakage risk.

Consider one of:

  • Removing credential from CredsPublic entirely, using it only server-side, or
  • Replacing it with a redacted view (e.g., masked keys, metadata only), and adjusting to_public() accordingly, or
  • Renaming this model and tightening where it’s used if it’s not actually “public” in the API sense.

To help verify usage, you can grep for where CredsPublic is returned or used as a response model:

#!/bin/bash
rg -n "CredsPublic" -C3

Also applies to: 134-141

♻️ Duplicate comments (1)
backend/app/models/fine_tuning.py (1)

145-147: Fix model_evaluation relationship type and undefined-name warning

This side of the relationship is logically one-to-many (a fine-tuning job can have multiple evaluations via fine_tuning_id), and Ruff reports ModelEvaluation as undefined here.

Consider both:

  • Changing the annotation to a list to reflect cardinality.
  • Adding a TYPE_CHECKING-guarded import to satisfy static analysis without creating a runtime import cycle.

Example diff:

-from sqlmodel import Field, Relationship, SQLModel
+from typing import TYPE_CHECKING
+from sqlmodel import Field, Relationship, SQLModel
+
+if TYPE_CHECKING:
+    from app.models.model_evaluation import ModelEvaluation
@@
-    # Relationships
-    project: Project = Relationship(back_populates="fine_tuning")
-    model_evaluation: "ModelEvaluation" = Relationship(back_populates="fine_tuning")
+    # Relationships
+    project: Project = Relationship(back_populates="fine_tuning")
+    model_evaluation: list["ModelEvaluation"] = Relationship(
+        back_populates="fine_tuning"
+    )

This keeps runtime behavior the same, matches the one-to-many schema, and should clear the F821 warning.

🧹 Nitpick comments (6)
backend/app/models/openai_conversation.py (1)

137-138: Consider using string literals for relationship type hints.

The relationships use concrete types Project and Organization, while the reciprocal relationships in those models use string literals (e.g., list["OpenAIConversation"]). Using string literals is the more common pattern in SQLModel/SQLAlchemy to avoid potential circular import issues and maintain consistency.

Apply this diff for consistency with the established pattern:

-    project: Project = Relationship(back_populates="openai_conversations")
-    organization: Organization = Relationship(back_populates="openai_conversations")
+    project: "Project" = Relationship(back_populates="openai_conversations")
+    organization: "Organization" = Relationship(back_populates="openai_conversations")

Based on established patterns in this codebase.

backend/app/models/assistants.py (1)

35-116: Assistant DB model enrichment looks solid; consider Optional relationship typing and reducing duplication

The enriched Assistant table model (id, soft-delete flag, timestamps, FK metadata, and column comments) is consistent with the patterns used in Organization and Project (timestamps via now(), FK comments, and cascade semantics) and fits the “database comments” objective well.

Two follow‑ups to consider:

  1. Relationship typing as Optional
    For the single-object relationships, it’s more accurate (and consistent with Project.organization: Optional["Organization"]) to mark them as optional, since they can be unset on partially loaded instances:

  • project: Project = Relationship(back_populates="assistants")
  • organization: Organization = Relationship(back_populates="assistants")
  • project: Project | None = Relationship(back_populates="assistants")
  • organization: Organization | None = Relationship(back_populates="assistants")

This better reflects runtime behavior and improves type-checker friendliness. Based on the `project.py` snippet, this keeps relationship typing symmetric across models.

2. **Field duplication between `AssistantBase` and `Assistant` (optional cleanup)**  
Several scalar fields (`assistant_id`, `name`, `instructions`, `model`, `vector_store_ids`, `project_id`, `organization_id`) are defined both on `AssistantBase` (lines 17-31) and again on `Assistant` (lines 44-63, 84-95). This works because the subclass overrides the base definitions, but it increases the risk of the two drifting (e.g., defaults or validation differing later). Longer term, it may be cleaner to keep a single authoritative definition for these columns (either move them fully into `Assistant`, or split base/API models so you don’t need to override). Not urgent, but worth considering to reduce maintenance overhead.




Please check that these relationship type changes align with how `Assistant` instances are constructed/used across the codebase (especially in services and tests) before applying them.

</blockquote></details>
<details>
<summary>backend/app/models/credentials.py (2)</summary><blockquote>

`44-48`: **Align `CredsCreate.credential` type with its default**

`credential` is annotated as `dict[str, Any]` but has `default=None`, which is effectively treating it as optional at runtime while type hints say otherwise. To avoid confusion (and keep mypy/IDE support accurate), consider making the type explicitly optional.

For a minimal, backward-compatible change:

```diff
-    credential: dict[str, Any] = Field(
-        default=None,
-        description="Dictionary mapping provider names to their credentials",
-    )
+    credential: dict[str, Any] | None = Field(
+        default=None,
+        description="Dictionary mapping provider names to their credentials",
+    )

This also lines up with CredsPublic.credential being dict[str, Any] | None.


81-85: Credential ID/fields and timestamp comments are well-structured; watch the id optionality with CredsPublic

The enriched metadata on id, provider, credential, and the timestamp fields (inserted_at, updated_at) looks consistent with other models and should improve DB introspection and docs.

One subtle point: Credential.id is now int | None (which matches the default=None for a PK before flush), but CredsPublic.id is typed as plain int. If to_public() is ever called on an unpersisted Credential (where id is still None), Pydantic/SQLModel validation for CredsPublic may fail.

If there’s any chance to_public() is used pre-insert, consider one of:

-class CredsPublic(CredsBase):
+class CredsPublic(CredsBase):
@@
-    id: int
+    id: int | None

or guarding in to_public() to ensure self.id is not None before constructing CredsPublic.

Also applies to: 86-98, 100-110

backend/app/models/fine_tuning.py (1)

150-158: Align FineTuningUpdate.status with the enum type

The table column uses FineTuningStatus, but FineTuningUpdate.status is typed as str | None, which weakens type checking and can hide invalid states.

Suggest tightening the type:

-class FineTuningUpdate(SQLModel):
-    training_file_id: str | None = None
-    train_data_s3_object: str | None = None
-    test_data_s3_object: str | None = None
-    split_ratio: float | None = None
-    provider_job_id: str | None = None
-    fine_tuned_model: str | None = None
-    status: str | None = None
-    error_message: str | None = None
+class FineTuningUpdate(SQLModel):
+    training_file_id: str | None = None
+    train_data_s3_object: str | None = None
+    test_data_s3_object: str | None = None
+    split_ratio: float | None = None
+    provider_job_id: str | None = None
+    fine_tuned_model: str | None = None
+    status: FineTuningStatus | None = None
+    error_message: str | None = None

Pydantic will still accept string values and coerce them to the enum, while giving you better type safety.

backend/app/models/model_evaluation.py (1)

89-95: Avoid redefining fine_tuning_id in both base and table models (optional)

fine_tuning_id is declared in ModelEvaluationBase and again in ModelEvaluation. While SQLModel will usually tolerate this, it’s redundant and can confuse future readers about which definition “wins”.

If you only need the additional sa_column_kwargs, consider moving the full definition into the base and omitting the duplicate in the table class, or vice versa, so the field is defined in one place.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2176085 and c7c4214.

📒 Files selected for processing (5)
  • backend/app/models/assistants.py (3 hunks)
  • backend/app/models/credentials.py (5 hunks)
  • backend/app/models/fine_tuning.py (3 hunks)
  • backend/app/models/model_evaluation.py (2 hunks)
  • backend/app/models/openai_conversation.py (4 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use type hints in Python code (Python 3.11+ project)

Files:

  • backend/app/models/credentials.py
  • backend/app/models/openai_conversation.py
  • backend/app/models/assistants.py
  • backend/app/models/model_evaluation.py
  • backend/app/models/fine_tuning.py
backend/app/models/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Define SQLModel entities (database tables and domain objects) in backend/app/models/

Files:

  • backend/app/models/credentials.py
  • backend/app/models/openai_conversation.py
  • backend/app/models/assistants.py
  • backend/app/models/model_evaluation.py
  • backend/app/models/fine_tuning.py
🧬 Code graph analysis (4)
backend/app/models/credentials.py (3)
backend/app/core/util.py (1)
  • now (13-14)
backend/app/models/organization.py (1)
  • Organization (44-82)
backend/app/models/project.py (1)
  • Project (51-107)
backend/app/models/openai_conversation.py (3)
backend/app/core/util.py (1)
  • now (13-14)
backend/app/models/organization.py (1)
  • Organization (44-82)
backend/app/models/project.py (1)
  • Project (51-107)
backend/app/models/assistants.py (3)
backend/app/models/organization.py (1)
  • Organization (44-82)
backend/app/models/project.py (1)
  • Project (51-107)
backend/app/core/util.py (1)
  • now (13-14)
backend/app/models/model_evaluation.py (2)
backend/app/core/util.py (1)
  • now (13-14)
backend/app/models/project.py (1)
  • Project (51-107)
🪛 Ruff (0.14.8)
backend/app/models/model_evaluation.py

133-133: Undefined name Fine_Tuning

(F821)

backend/app/models/fine_tuning.py

147-147: Undefined name ModelEvaluation

(F821)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (15)
backend/app/models/openai_conversation.py (5)

3-3: LGTM: Imports properly added for relationships.

The imports of Organization and Project are correctly added to support the relationship type hints.

Also applies to: 8-9


30-30: LGTM: Type hints updated to Python 3.11+ syntax.

The use of str | None union syntax is appropriate for Python 3.11+ and is equivalent to Optional[str].

Also applies to: 34-34, 40-40


97-101: LGTM: Soft-delete flag and timestamp fields properly implemented.

The is_deleted flag and timestamp fields (inserted_at, updated_at, deleted_at) follow the established pattern from the Project and Organization models, using the now() utility function for defaults.

Also applies to: 118-134


141-167: LGTM: Create model properly updated with type hints.

The type hint changes to str | None and validator application are correct.


170-179: Verify whether exposing foreign keys in Public model is intentional.

OpenAIConversationPublic extends OpenAIConversationBase, which includes project_id and organization_id foreign keys. For a "Public" model, consider whether these internal identifiers should be exposed to API consumers or if they should be explicitly excluded.

If foreign keys should not be exposed, consider creating a dedicated base class without foreign keys for public models, or explicitly exclude them in the Public model definition.

backend/app/models/assistants.py (2)

8-9: Imports for Organization and Project look appropriate

Importing Organization and Project here is consistent with the relationship usage below and avoids circular imports because the reverse relationships in organization.py and project.py use string annotations only. No changes needed.

If you haven’t already, please run the usual app/tests startup to confirm there are no import-order surprises in the full environment.


21-23: vector_store_ids type updated to built‑in generics

Switching vector_store_ids in AssistantBase to list[str] matches Python 3.11+ style and keeps this type consistent with AssistantCreate and AssistantUpdate in the same module; behavior should remain unchanged. As per coding guidelines, this is aligned with the project’s typing expectations.

Consider running your type checker (e.g., mypy/pyright) over this module to ensure there are no downstream typing regressions.

backend/app/models/credentials.py (4)

13-35: CredsBase flags and foreign keys are consistent with the schema direction

The addition of is_active plus organization_id/project_id with ondelete="CASCADE" and descriptive sa_column_kwargs matches the patterns in Organization and Project models and should generate clear DB-level documentation without changing runtime behavior.


56-63: CredsUpdate shape for partial updates looks sound

Using a required credential: dict[str, Any] alongside an optional is_active: bool | None is a reasonable shape for a patch-style update, and the type hints are consistent with the intended behavior.


8-9: Relationship wiring between Credential, Organization, and Project is correct

The new organization: Organization | None and project: Project | None relationships, with back_populates="creds", line up with the corresponding definitions in Organization and Project (creds: list["Credential"] = Relationship(back_populates="organization"/"project", cascade_delete=True)), and the direct imports avoid string-based forward refs while not introducing a circular import.

Also applies to: 112-114


139-141: Type of CredsPublic.credential is consistent with decrypted payload

The change to credential: dict[str, Any] | None matches the decrypt_credentials return type expectation and aligns with the suggested optional typing for CredsCreate.credential. Once the security/usage concerns above are addressed, the type itself is appropriate.

backend/app/models/fine_tuning.py (2)

21-26: Base model training_file_id change looks consistent

Making training_file_id optional (str | None = Field(default=None)) in FineTuningJobBase matches how it’s used on the table model and in updates; no issues here.


58-143: Column comments, foreign keys, and audit fields are cohesive

The enriched Field definitions (comments, soft-delete flag, document/project/organization FKs, and timestamps) are consistent with the patterns used on Project and provide clear database-level metadata. No functional or typing issues spotted in this block.

backend/app/models/model_evaluation.py (2)

29-35: fine_tuning_ids de-duplication is simple and effective

Using dict.fromkeys in dedupe_ids preserves order while removing duplicates; the validator signature and return type are correct.


38-87: Core ModelEvaluation fields and metadata look consistent

The added comments, JSON column for score, Text-backed system_prompt, and soft-delete/timestamp fields are coherent with the rest of the schema design; no obvious correctness or typing issues in this block.

Comment on lines +131 to 133
# Relationships
project: Project = Relationship()
fine_tuning: "Fine_Tuning" = Relationship(back_populates="model_evaluation")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, let's find and read the model_evaluation.py file
find . -type f -name "model_evaluation.py" | head -20

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 199


🏁 Script executed:

cat -n ./backend/app/models/model_evaluation.py

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 5834


🏁 Script executed:

find . -type f -name "fine_tuning.py" -path "*/models/*"

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 110


🏁 Script executed:

cat -n ./backend/app/models/fine_tuning.py | head -150

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 5808


🏁 Script executed:

find . -type f \( -name "ruff.toml" -o -name "pyproject.toml" -o -name ".ruff.toml" \) | head -5

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 111


🏁 Script executed:

cat -n ./ruff.toml

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 1191


🏁 Script executed:

rg "TYPE_CHECKING" --type py -B 2 -A 2 | head -50

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 2558


Add TYPE_CHECKING-guarded import for Fine_Tuning to resolve F821 linting error

The Fine_Tuning reference in the relationship annotation (line 133) is undefined at module load time, triggering Ruff's F821 error. Add a TYPE_CHECKING-guarded import at the top of the file to make the type available to linters and type checkers while avoiding circular import issues:

-from datetime import datetime
-from enum import Enum
-from uuid import UUID
+from datetime import datetime
+from enum import Enum
+from typing import TYPE_CHECKING
+from uuid import UUID
 
 from pydantic import field_validator
 from sqlalchemy import Column, Text
 from sqlmodel import Field, Relationship, SQLModel
 
 from app.core.util import now
 from app.models.project import Project
+
+if TYPE_CHECKING:
+    from app.models.fine_tuning import Fine_Tuning

Keep the string annotation "Fine_Tuning" on line 133 as is; the guarded import ensures type checkers recognize the name without creating a runtime dependency cycle.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Relationships
project: Project = Relationship()
fine_tuning: "Fine_Tuning" = Relationship(back_populates="model_evaluation")
from datetime import datetime
from enum import Enum
from typing import TYPE_CHECKING
from uuid import UUID
from pydantic import field_validator
from sqlalchemy import Column, Text
from sqlmodel import Field, Relationship, SQLModel
from app.core.util import now
from app.models.project import Project
if TYPE_CHECKING:
from app.models.fine_tuning import Fine_Tuning
🧰 Tools
🪛 Ruff (0.14.8)

133-133: Undefined name Fine_Tuning

(F821)

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (2)
backend/app/models/evaluation.py (1)

291-304: Use SQLField instead of pydantic's Field for table model timestamp fields.

This issue was already flagged in a previous review but remains unaddressed. These timestamp fields incorrectly use pydantic's Field (line 4 import) instead of SQLField (line 7 import), causing sa_column_kwargs to be silently ignored—the database comments won't be applied.

This is inconsistent with EvaluationDataset timestamps (lines 147-160) which correctly use SQLField.

Apply this diff:

     # Timestamps
-    inserted_at: datetime = Field(
+    inserted_at: datetime = SQLField(
         default_factory=now,
         nullable=False,
         description="The timestamp when the evaluation run was started",
         sa_column_kwargs={"comment": "Timestamp when the evaluation run was started"},
     )
-    updated_at: datetime = Field(
+    updated_at: datetime = SQLField(
         default_factory=now,
         nullable=False,
         description="The timestamp when the evaluation run was last updated",
         sa_column_kwargs={
             "comment": "Timestamp when the evaluation run was last updated"
         },
     )
backend/app/models/collection_job.py (1)

46-50: Type annotation inconsistent with nullability.

The task_id field is typed as str but has default=None and nullable=True. The type annotation should be str | None to reflect nullability and avoid type checker warnings.

Apply this diff:

-    task_id: str = Field(
+    task_id: str | None = Field(
         default=None,
         nullable=True,
         sa_column_kwargs={"comment": "Celery task ID for async processing"},
     )
🧹 Nitpick comments (4)
backend/app/models/collection_job.py (1)

56-60: Consider adding explicit default=None for consistency.

The error_message field lacks an explicit default=None, unlike task_id and trace_id. While SQLModel may infer this from the str | None type, being explicit improves consistency across the model.

     error_message: str | None = Field(
+        default=None,
         sa_column=Column(
             Text, nullable=True, comment="Error message if the job failed"
         ),
     )
backend/app/models/openai_conversation.py (3)

12-20: Align validator helper and method type hints with actual None handling

validate_response_id_pattern and the validate_response_ids methods explicitly handle and return None, but their type hints currently suggest only str, which is misleading for static analysis and IDEs.

Consider tightening the annotations to reflect the actual behavior:

-def validate_response_id_pattern(v: str) -> str:
+def validate_response_id_pattern(v: str | None) -> str | None:
@@
-    def validate_response_ids(cls, v):
-        return validate_response_id_pattern(v)
+    def validate_response_ids(cls, v: str | None) -> str | None:
+        return validate_response_id_pattern(v)
@@
-    def validate_response_ids(cls, v):
-        return validate_response_id_pattern(v)
+    def validate_response_ids(cls, v: str | None) -> str | None:
+        return validate_response_id_pattern(v)

This keeps runtime behavior unchanged while making the intent clearer to type checkers.

Also applies to: 78-81, 130-150


85-98: Prefer optional relationship types for project and organization

The new id, soft-delete flag, and timestamp fields are consistent with other models. For the relationships, however, using non-optional types is a bit too strict: even though project_id/organization_id are non-nullable, the relationship attributes themselves can be None on partially loaded instances.

To better match SQLModel patterns (e.g., Project.organization: Optional["Organization"]) and avoid type-checker noise, consider:

-    project: Project = Relationship(back_populates="openai_conversations")
-    organization: Organization = Relationship(back_populates="openai_conversations")
+    project: Project | None = Relationship(back_populates="openai_conversations")
+    organization: Organization | None = Relationship(back_populates="openai_conversations")

This keeps the DB constraints as-is while making the Python-side typing more accurate.

Also applies to: 100-117, 120-121


2-5: Update Pydantic config from class Config to model_config for v2

In Pydantic v2, the inner class Config pattern is deprecated. Use model_config = ConfigDict(from_attributes=True, populate_by_name=True, use_enum_values=True) instead.

A v2-style update could look like:

-from pydantic import field_validator
+from pydantic import ConfigDict, field_validator
@@
 class OpenAIConversationPublic(OpenAIConversationBase):
     id: int
     inserted_at: datetime
     updated_at: datetime
 
-    class Config:
-        from_attributes = True
-        populate_by_name = True
-        use_enum_values = True
+    model_config = ConfigDict(
+        from_attributes=True,
+        populate_by_name=True,
+        use_enum_values=True,
+    )

Also applies to: 153-163

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c7c4214 and ee3b646.

📒 Files selected for processing (5)
  • backend/app/models/collection_job.py (2 hunks)
  • backend/app/models/document_collection.py (1 hunks)
  • backend/app/models/evaluation.py (5 hunks)
  • backend/app/models/openai_conversation.py (4 hunks)
  • backend/app/models/threads.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/app/models/document_collection.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use type hints in Python code (Python 3.11+ project)

Files:

  • backend/app/models/threads.py
  • backend/app/models/collection_job.py
  • backend/app/models/openai_conversation.py
  • backend/app/models/evaluation.py
backend/app/models/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Define SQLModel entities (database tables and domain objects) in backend/app/models/

Files:

  • backend/app/models/threads.py
  • backend/app/models/collection_job.py
  • backend/app/models/openai_conversation.py
  • backend/app/models/evaluation.py
🧬 Code graph analysis (4)
backend/app/models/threads.py (2)
backend/app/core/util.py (1)
  • now (13-14)
backend/app/crud/rag/open_ai.py (1)
  • default (31-32)
backend/app/models/collection_job.py (2)
backend/app/core/util.py (1)
  • now (13-14)
backend/app/models/collection.py (2)
  • CollectionIDPublic (166-167)
  • CollectionPublic (170-179)
backend/app/models/openai_conversation.py (3)
backend/app/core/util.py (1)
  • now (13-14)
backend/app/models/organization.py (1)
  • Organization (44-82)
backend/app/models/project.py (1)
  • Project (51-107)
backend/app/models/evaluation.py (1)
backend/app/core/util.py (1)
  • now (13-14)
🔇 Additional comments (16)
backend/app/models/evaluation.py (2)

86-160: LGTM: Proper use of SQLField with database comments.

The metadata additions for EvaluationDataset are correct and consistent:

  • Primary key, indexed fields, and foreign keys all use SQLField with appropriate sa_column_kwargs
  • JSONB column correctly uses sa_column=Column(JSONB, ...)
  • Timestamps correctly use SQLField with nullable=False and comments

179-288: LGTM: Proper use of SQLField with database comments.

The metadata additions for EvaluationRun fields are correct and consistent:

  • Primary key, indexed fields, and foreign keys all use SQLField with appropriate sa_column_kwargs
  • JSONB and Text columns correctly use sa_column=Column(...)
  • Foreign key constraints include proper ondelete cascades
backend/app/models/collection_job.py (8)

1-8: LGTM!

Imports are well-organized and all are utilized within the file.


28-32: LGTM!

The id field is properly configured with UUID generation, primary key designation, and helpful database-level comment metadata.


33-40: LGTM!

The status field is well-defined with appropriate defaults, enum constraint, and descriptive database comment.


41-45: LGTM!

The action_type field is correctly defined as a required enum field with appropriate metadata.


51-55: LGTM!

The trace_id field is correctly typed as str | None and properly configured for tracing correlation.


62-74: LGTM!

Foreign key definitions are well-structured with appropriate CASCADE behavior and proper nullability settings. The collection_id being nullable correctly supports jobs pending collection creation.


103-133: LGTM!

Request and response models are well-structured. The use of aliased properties (job_id, job_inserted_at, job_updated_at) provides clear API semantics, and the CollectionJobUpdate model properly supports partial updates with optional fields.


76-88: No action needed – updated_at is already properly managed on record updates.

The updated_at field correctly uses default_factory=now for initial creation, and CollectionJobCrud.update() explicitly sets job.updated_at = now() whenever records are modified (line 66). This pattern is consistent across all models in the codebase.

backend/app/models/threads.py (4)

3-5: LGTM: Clean import organization.

The imports are well-organized and the now utility function is appropriately imported for timestamp handling.


11-13: LGTM: Type hints modernized correctly.

Using str | None syntax aligns with Python 3.10+ standards and the project's Python 3.11+ requirement.


20-47: LGTM: Comprehensive field documentation.

The model docstring and field-level comments via sa_column_kwargs provide excellent database documentation. The field redefinitions from the base class to add table-specific configuration follow standard SQLModel patterns.


49-60: LGTM: Timestamp auditing correctly implemented.

The timestamp fields are properly configured:

  • inserted_at uses default_factory=now for creation timestamps
  • updated_at correctly combines default_factory=now (for INSERT) with "onupdate": now (for UPDATE)

The previous review concern about updated_at not auto-updating has been addressed. The onupdate parameter in sa_column_kwargs will be passed to SQLAlchemy's Column constructor, ensuring the timestamp refreshes on each UPDATE operation.

backend/app/models/openai_conversation.py (2)

8-9: Base model field definitions and FK metadata look good

The added metadata (comments, length constraints, and FK ondelete="CASCADE"/nullable=False) on response_id/threading fields, user_question/response, model, assistant_id, project_id, and organization_id is consistent and in line with the patterns used in Project and Organization. No functional issues stand out here.

Also applies to: 25-29, 33-40, 41-48, 52-55, 58-58, 63-63, 66-69, 72-75


124-145: Create model changes (optionality and constraints) look consistent

The relaxations to previous_response_id, response, and assistant_id as str | None with sensible defaults, along with the existing min_length/description constraints, line up with the base model and the shared validator. No issues from a validation or typing perspective here.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (2)
backend/app/models/model_evaluation.py (1)

1-11: Add TYPE_CHECKING-guarded import for Fine_Tuning (duplicate concern).

The past review requested adding a TYPE_CHECKING-guarded import to resolve the F821 linting error on line 143, but this was not addressed.

Apply this diff to add the TYPE_CHECKING import:

 from datetime import datetime
 from enum import Enum
+from typing import TYPE_CHECKING
 from uuid import UUID
 
 from pydantic import field_validator
 from sqlalchemy import Column, Text
 from sqlalchemy.dialects.postgresql import JSON
 from sqlmodel import Field, Relationship, SQLModel
 
 from app.core.util import now
 from app.models.project import Project
+
+if TYPE_CHECKING:
+    from app.models.fine_tuning import Fine_Tuning
backend/app/models/fine_tuning.py (1)

154-154: Change relationship type to list["ModelEvaluation"] (duplicate concern).

The past review identified that this is a one-to-many relationship (one Fine_Tuning can have multiple ModelEvaluations), but the type annotation remains singular. The foreign key in ModelEvaluation (line 100-105 in model_evaluation.py) confirms this cardinality.

Apply this diff to correct the type annotation:

-    model_evaluation: "ModelEvaluation" = Relationship(back_populates="fine_tuning")
+    model_evaluation: list["ModelEvaluation"] = Relationship(back_populates="fine_tuning")
🧹 Nitpick comments (1)
backend/app/models/user.py (1)

66-75: Align id type hint with SQLModel primary-key pattern

The added docstring and sa_column_kwargs comments on id and hashed_password are clear and helpful. For autoincrementing primary keys, SQLModel recommends using an optional type so the "not yet persisted" state can be represented cleanly and type checkers don't complain about default=None on a non-optional field.

Consider:

-    id: int = Field(
-        default=None,
+    id: int | None = Field(
+        default=None,
         primary_key=True,
         sa_column_kwargs={"comment": "Unique identifier for the user"},
     )

This keeps runtime behavior the same while making the model more accurate from a typing and SQLModel-idiomatic perspective. The database enforces NOT NULL, but the Python type allows None during object creation so the database can generate the value.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ee3b646 and ec359a1.

📒 Files selected for processing (3)
  • backend/app/models/fine_tuning.py (3 hunks)
  • backend/app/models/model_evaluation.py (2 hunks)
  • backend/app/models/user.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use type hints in Python code (Python 3.11+ project)

Files:

  • backend/app/models/model_evaluation.py
  • backend/app/models/user.py
  • backend/app/models/fine_tuning.py
backend/app/models/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Define SQLModel entities (database tables and domain objects) in backend/app/models/

Files:

  • backend/app/models/model_evaluation.py
  • backend/app/models/user.py
  • backend/app/models/fine_tuning.py
🧬 Code graph analysis (2)
backend/app/models/model_evaluation.py (2)
backend/app/core/util.py (1)
  • now (13-14)
backend/app/models/project.py (1)
  • Project (51-107)
backend/app/models/fine_tuning.py (2)
backend/app/core/util.py (1)
  • now (13-14)
backend/app/models/project.py (1)
  • Project (51-107)
🪛 Ruff (0.14.8)
backend/app/models/model_evaluation.py

143-143: Undefined name Fine_Tuning

(F821)

backend/app/models/fine_tuning.py

154-154: Undefined name ModelEvaluation

(F821)

🔇 Additional comments (13)
backend/app/models/user.py (2)

2-2: Import cleanup looks good

Using from sqlmodel import Field, SQLModel is appropriate here; the import set is minimal and matches actual usage in this module.


7-29: UserBase column metadata and docstring are well-structured

The added docstring plus Field metadata (unique, index, max_length, and sa_column_kwargs comments) for email, is_active, is_superuser, and full_name are consistent, descriptive, and keep both API typing and DB-layer concerns clear.

backend/app/models/model_evaluation.py (6)

43-46: LGTM!

The id field follows the established pattern with appropriate primary key configuration and descriptive comment.


47-91: LGTM!

The field definitions are well-documented with appropriate column comments, and the conversion of system_prompt to a SQLAlchemy Text column is appropriate for handling potentially large prompt text.


92-97: LGTM!

The soft delete flag follows the standard pattern with appropriate defaults and documentation.


99-122: LGTM!

The foreign key definitions are properly configured with appropriate cascade behaviors and descriptive comments. The addition of document_id provides clear traceability to evaluation documents.


146-166: LGTM!

The update and public schemas correctly use modern union syntax (PEP 604) and appropriately expose the new fields including document_id and timestamps for client consumption.


142-142: No action needed. The one-way relationship is intentional—the Project model does not define a back reference to ModelEvaluation. This design choice is consistent with the codebase, where Project maintains explicit relationships only to Credentials, Assistants, Collections, Fine_Tuning, and OpenAIConversation.

backend/app/models/fine_tuning.py (5)

21-53: LGTM!

The base model and creation schema use appropriate modern union syntax, and the validators properly enforce constraints on split_ratio and system_prompt.


58-114: LGTM!

The field definitions are comprehensive with appropriate column comments, proper nullability constraints, and the Text column for system_prompt is suitable for large prompt text. The soft delete flag follows the established pattern.


116-133: LGTM!

The foreign key definitions include appropriate cascade behaviors and descriptive comments for clear schema documentation.


141-145: Verify auto-update mechanism for updated_at.

Similar to the concern in model_evaluation.py, the updated_at field uses default_factory=now but lacks a visible auto-update mechanism (such as onupdate) to refresh the timestamp on modifications. Ensure this is handled either through SQLAlchemy configuration or application logic.


157-184: LGTM!

The update and public schemas correctly use modern union syntax and comprehensively expose the necessary fields for client consumption, including the new timestamp fields.

Comment on lines 1 to +10
from datetime import datetime
from enum import Enum
from uuid import UUID

from sqlalchemy import Column, Text
from pydantic import field_validator
from sqlmodel import SQLModel, Field, Relationship
from sqlalchemy import Column, Text
from sqlmodel import Field, Relationship, SQLModel

from app.core.util import now
from app.models.project import Project
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Add TYPE_CHECKING-guarded import for ModelEvaluation.

The ModelEvaluation reference in the relationship annotation (line 154) is undefined at module load time, triggering Ruff's F821 error. Add a TYPE_CHECKING-guarded import to resolve this.

Apply this diff:

 from datetime import datetime
 from enum import Enum
+from typing import TYPE_CHECKING
 from uuid import UUID
 
 from pydantic import field_validator
 from sqlalchemy import Column, Text
 from sqlmodel import Field, Relationship, SQLModel
 
 from app.core.util import now
 from app.models.project import Project
+
+if TYPE_CHECKING:
+    from app.models.model_evaluation import ModelEvaluation
🤖 Prompt for AI Agents
In backend/app/models/fine_tuning.py around lines 1 to 10, the ModelEvaluation
name used later in a Relationship annotation is undefined at import time causing
Ruff F821; add a TYPE_CHECKING-guarded import at the top: from typing import
TYPE_CHECKING and inside if TYPE_CHECKING: import ModelEvaluation from its
module (e.g. from app.models.model_evaluation import ModelEvaluation), so the
name is available for type-checkers and annotations but not executed at runtime.

Comment on lines +125 to +134
inserted_at: datetime = Field(
default_factory=now,
nullable=False,
sa_column_kwargs={"comment": "Timestamp when the evaluation was created"},
)
updated_at: datetime = Field(
default_factory=now,
nullable=False,
sa_column_kwargs={"comment": "Timestamp when the evaluation was last updated"},
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Add auto-update mechanism for updated_at field.

The updated_at field currently only sets its value on record creation via default_factory=now. When records are updated through the CRUD layer, this timestamp is not automatically refreshed. To maintain data integrity, either configure SQLAlchemy's onupdate parameter on the field definition or explicitly set updated_at = now() in the update logic (e.g., in the base CRUD update() method before persisting changes).

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/app/models/config/version.py (1)

14-36: Validator typing + ConfigBlob override mismatch (may not catch “empty” on create).

ConfigVersionBase.config_blob is typed as dict[str, Any], but ConfigVersionCreate overrides it to ConfigBlob. The base @field_validator("config_blob") likely also runs for ConfigBlob, where if not value may not behave as intended. Also, this validator lacks type hints (project guideline).

Suggested patch:

@@
-from typing import Any
+from typing import Any
@@
 class ConfigVersionBase(SQLModel):
@@
     @field_validator("config_blob")
-    def validate_blob_not_empty(cls, value):
-        if not value:
-            raise ValueError("config_blob cannot be empty")
-        return value
+    @classmethod
+    @field_validator("config_blob", mode="before")
+    def validate_blob_not_empty(cls, value: Any) -> Any:
+        if value is None:
+            raise ValueError("config_blob cannot be empty")
+        if isinstance(value, ConfigBlob):
+            if not value.model_dump(exclude_none=True):
+                raise ValueError("config_blob cannot be empty")
+            return value
+        if isinstance(value, dict):
+            if not value:
+                raise ValueError("config_blob cannot be empty")
+            return value
+        raise TypeError("config_blob must be a dict or ConfigBlob")
🧹 Nitpick comments (1)
backend/app/models/config/config.py (1)

94-98: Add type hints to the Pydantic validator (project guideline)
This is a Python 3.11+ codebase and the validator currently lacks annotations.

 @field_validator("config_blob")
-    def validate_blob_not_empty(cls, value):
+    def validate_blob_not_empty(cls, value: ConfigBlob) -> ConfigBlob:
         if not value:
             raise ValueError("config_blob cannot be empty")
         return value
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7214199 and 1f281d0.

📒 Files selected for processing (2)
  • backend/app/models/config/config.py (3 hunks)
  • backend/app/models/config/version.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use type hints in Python code (Python 3.11+ project)

Files:

  • backend/app/models/config/config.py
  • backend/app/models/config/version.py
backend/app/models/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Define SQLModel entities (database tables and domain objects) in backend/app/models/

Files:

  • backend/app/models/config/config.py
  • backend/app/models/config/version.py
🧬 Code graph analysis (2)
backend/app/models/config/config.py (1)
backend/app/core/util.py (1)
  • now (13-14)
backend/app/models/config/version.py (1)
backend/app/core/util.py (1)
  • now (13-14)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (3)
backend/app/models/config/version.py (2)

72-87: updated_at won’t update automatically—verify “auditing” behavior.

updated_at is default_factory=now, but there’s no onupdate/trigger/server-side default here. If the intent is real timestamp auditing, please confirm the codebase sets updated_at on updates (service layer, SQLAlchemy events, etc.) or consider wiring onupdate/server-side behavior.


15-22: Column-level comments additions look consistent.

The added Postgres comment metadata for config_blob, commit_message, id, config_id, and version is consistent and should help schema introspection/documentation.

Also applies to: 23-30, 53-57, 59-70

backend/app/models/config/config.py (1)

50-80: Keep timestamp/comment changes consistent; also verify updated_at semantics
The added comments are fine; two follow-ups:

  1. updated_at still only has default_factory=now—please ensure it’s updated on writes elsewhere (events/triggers/service layer), since the PR theme mentions auditing.
  2. Minor consistency: updated_at.sa_column_kwargs can match the single-line style used elsewhere.

Comment on lines +16 to 27
name: str = Field(
min_length=1,
max_length=128,
description="Config name",
sa_column_kwargs={"comment": "Configuration name"},
)
description: str | None = Field(
default=None, max_length=512, description="Optional description"
default=None,
max_length=512,
description="Description of the configuration",
sa_column_kwargs={"comment": "Description of the configuration"},
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

SQLModel Field sa_column_kwargs comment support SQLAlchemy

💡 Result:

Yes — SQLModel's Field accepts sa_column_kwargs and you can pass SQLAlchemy Column keyword args such as comment. SQLModel forwards sa_column_kwargs into the underlying sqlalchemy.Column, so using e.g.

class User(SQLModel, table=True):
id: int | None = Field(default=None, primary_key=True)
name: str = Field(sa_column_kwargs={"comment": "user full name"})

will create a column with that comment (subject to DB support). [1][2][3]

Sources:

  • SQLModel (project README / usage — sa_column_kwargs is passed to SQLAlchemy Column). [1]
  • SQLAlchemy docs — Column supports the comment parameter. [2]
  • Example/tutorial showing sa_column_kwargs={"comment": "..."} usage. [3]

🌐 Web query:

Alembic autogenerate column comments diffs PostgreSQL

💡 Result:

Short answer

  • Alembic does not reliably autogenerate column-comment diffs out of the box. Autogenerate’s built‑in detection list does not include column comment changes. [1]
  • You can (a) add comment changes manually with op.alter_column(..., comment="...") or (b) implement a custom autogenerate comparator that compares the DB column comment (via the Inspector) to the SQLAlchemy Column.comment and emits an alter_column operation. [1][2][3]

How to implement (concise example)

  1. Manual migration (simple):
    op.alter_column("my_table", "my_col", comment="new comment", existing_type=sa.String())

  2. Automatic via custom comparator (placed in alembic/env.py or a module imported there):
    from alembic.autogenerate import comparators
    from alembic.autogenerate import op as ag_op
    from alembic.operations import ops

    @comparators.dispatch_for("column")
    def _compare_column_comments(autogen_context, modify_table_ops, schema, tname, colname, conn_col, metadata_col):
    # conn_col and metadata_col are reflected dicts/Column objects; adjust None checks as needed
    db_comment = conn_col.get("comment") if conn_col is not None else None
    model_comment = getattr(metadata_col, "comment", None)
    if db_comment != model_comment:
    # append an alter_column operation so autogenerate will render op.alter_column(..., comment=...)
    modify_table_ops.append(
    ops.AlterColumnOp(tname, colname, {"comment": model_comment}, existing_server_default=None)
    )

Notes and pointers

  • Alembic provides op.alter_column(..., comment=...) and table comment ops; use those for migrations. [2]
  • Use alembic.autogenerate.comparators.dispatch_for to register custom comparison logic (examples in the docs). [3]
  • Always review autogenerated migrations before applying. [1]

Sources

  • Alembic autogenerate docs (what autogenerate detects / limitations). [1]
  • Alembic Operations reference (alter_column supports comment). [2]
  • Alembic autogenerate custom comparator pattern / dispatch_for example. [3]

Alembic won't auto-detect column comment changes—confirm migration strategy

SQLModel + SQLAlchemy do support sa_column_kwargs={"comment": "..."}, and column comments will be created in the database. However, Alembic's autogenerate does not automatically detect column comment diffs; it requires either manual migrations (op.alter_column(..., comment="...")) or a custom autogenerate comparator. Verify your migration workflow accounts for this—otherwise comment updates in code won't be reflected in migrations, causing schema drift.

🤖 Prompt for AI Agents
In backend/app/models/config/config.py around lines 16 to 27, column comments
set via sa_column_kwargs will not be detected by Alembic autogenerate; update
your migration workflow so comment changes are persisted: when you change a
Field.comment either (A) after running alembic revision --autogenerate, manually
edit the generated migration to call op.alter_column(<table>, <col>,
comment="...") for the changed columns, or (B) implement a custom autogenerate
comparator in alembic/env.py that inspects column.comment and emits alter_column
operations; pick one approach and apply it consistently so code comment updates
produce migrations rather than creating schema drift.

Comment on lines 88 to 92
commit_message: str | None = Field(
default=None,
max_length=512,
description="Optional message describing the changes in this version",
description="Message describing the changes in this version",
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

commit_message is still optional but description no longer says so
If it’s truly optional (str | None, default=None), consider restoring “Optional …” wording to avoid API-doc confusion; otherwise make it required.

-    commit_message: str | None = Field(
-        default=None,
-        max_length=512,
-        description="Message describing the changes in this version",
-    )
+    commit_message: str | None = Field(
+        default=None,
+        max_length=512,
+        description="Optional message describing the changes in this version",
+    )
🤖 Prompt for AI Agents
In backend/app/models/config/config.py around lines 88 to 92, the Field for
commit_message is typed as optional (str | None, default=None) but the
description omits that and may mislead docs; either update the description to
explicitly say "Optional commit message..." or change the type/default to make
it required (remove | None and default) depending on intended behavior. Modify
the Field so the type/default and the description are consistent: if optional,
prepend "Optional" to the description and keep default=None; if required, remove
None/default and adjust description accordingly.

object_store_url: str = Field(
sa_column_kwargs={"comment": "Cloud storage URL for the document"},
)
is_deleted: bool = Field(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may be it could be is_soft_deleted for better readability

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Kaapi v1.0: DB comments

4 participants