@@ -1,6 +1,12 @@
# OpenAI API Configuration
OPENAI_API_KEY=your_actual_openai_api_key

# Migration Defaults (can be overridden via CLI flags)
SQL2MG_MODE=automatic
SQL2MG_STRATEGY=deterministic
SQL2MG_META_POLICY=auto
SQL2MG_LOG_LEVEL=INFO

# MySQL Database Configuration
MYSQL_HOST=host.docker.internal
MYSQL_USER=root
90 changes: 90 additions & 0 deletions agents/sql2memgraph/PROMPT.md
@@ -0,0 +1,90 @@
# SQL → Memgraph Migration Agent Prompt

## TL;DR

You are working inside the `agents/sql2memgraph` package, a UV-managed Python project that turns relational schemas (MySQL/PostgreSQL) into Memgraph graph schemas and data. The primary entry point is `main.py`, which wires configuration, environment validation, database analyzers, and the HyGM graph-modeling subsystem. Changes usually touch:

- `core/` — Orchestrates the migration workflow (`migration_agent.py`) and HyGM graph modeling (`hygm/`).
- `database/` — Connectors and analyzers for the source RDBMSs.
- `query_generation/` — Cypher generation helpers.
- `utils/` — Environment setup, connection probes, and CLI helpers.

Always maintain the CLI experience (`main.py`) and respect the 79-character line-length lint rule.

## Tech Stack & Tooling

- Python 3.10+, managed with [uv](https://github.com/astral-sh/uv).
- Memgraph as the target graph database (Bolt connection).
- Optional LLM features powered by OpenAI (LangChain / LangGraph patterns inside `core/hygm`).
- Testing: `pytest` under `tests/` (integration heavy, uses mocks for DB analyzers).

## Core Concepts

- **HyGM (Hypothetical Graph Modeling)** lives in `core/hygm/hygm.py` and exposes modeling modes via `ModelingMode`:
- `AUTOMATIC` – one-shot graph generation.
- `INCREMENTAL` – table-by-table confirmation flow with an optional refinement
loop after processing all tables.
- **GraphModelingStrategy**: `DETERMINISTIC` (rule-based) and `LLM_POWERED` (needs an LLM+API key).
- **SQLToMemgraphAgent** (`core/migration_agent.py`) coordinates schema analysis, HyGM modeling, query generation, execution, and validation.
- **Database analyzers** in `database/` introspect MySQL/PostgreSQL schemas and emit a normalized metadata structure consumed by HyGM.
- **Query generation** in `query_generation/` converts the graph model + metadata into Cypher migrations, indexes, and constraints.
- **Database data interfaces** in `database/models.py` define the canonical `TableInfo`, `ColumnInfo`, `RelationshipInfo`, and `DatabaseStructure` data classes. These objects flow from analyzers into HyGM via the `to_hygm_format()` helpers, ensuring consistent schema metadata for every modeling mode.
- **Graph schema structures** in `core/hygm/models/graph_models.py` (e.g., `GraphModel`, `GraphNode`, `GraphRelationship`) capture the in-memory graph representation HyGM produces and later serializes to schema format.
- **LLM structured output models** in `core/hygm/models/llm_models.py` (`LLMGraphModel`, `LLMGraphNode`, `LLMGraphRelationship`) describe the contract for AI-generated schemas and include `to_graph_model()` utilities to convert LLM responses into the standard `GraphModel` objects.
- The `GraphModel` serialization format matches the canonical spec in `core/schema/spec.json`, so any changes to the schema data classes should be mirrored against that document.
- Source tracking helpers in `core/hygm/models/sources.py` annotate nodes, relationships, properties, indexes, and constraints with origin metadata. Preserve these when modifying `GraphModel` so downstream migrations retain the link back to the originating tables or user-applied changes.
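The analyzer-to-HyGM handoff described above can be sketched as follows. This is a minimal illustration, not the real implementation: the class and method names (`ColumnInfo`, `TableInfo`, `to_hygm_format()`) come from `database/models.py` as described above, but the field names and the returned metadata shape here are assumptions.

```python
from dataclasses import dataclass, field

# Illustrative sketch of the analyzer -> HyGM handoff. The real data
# classes live in database/models.py; fields and output shape here are
# assumptions for demonstration only.

@dataclass
class ColumnInfo:
    name: str
    data_type: str
    is_primary: bool = False

@dataclass
class TableInfo:
    name: str
    columns: list = field(default_factory=list)

    def to_hygm_format(self) -> dict:
        # Emit normalized metadata consumed by HyGM, independent of
        # whether the source was MySQL or PostgreSQL.
        return {
            "label": self.name,
            "properties": [
                {"name": c.name, "type": c.data_type, "key": c.is_primary}
                for c in self.columns
            ],
        }

table = TableInfo(
    "actor",
    [ColumnInfo("actor_id", "INT", True), ColumnInfo("first_name", "VARCHAR")],
)
print(table.to_hygm_format()["label"])  # → actor
```

The point of the normalization step is that every modeling mode (automatic or incremental, deterministic or LLM-powered) consumes the same metadata shape.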

## Entry Points & CLI Flow

- Run with `uv run main.py` (banner, env checks, connection probes, then the migration workflow).
- CLI prompts include:
- Graph modeling mode (automatic / incremental with interactive refinement).
- Modeling strategy (deterministic / AI-powered).
- Confirmation dialogs during automatic or incremental flows.
- Post-session prompts that let users launch the interactive refinement loop
after reviewing every table in an incremental run.
- Environment validation happens before migration; failures raise `MigrationEnvironmentError` or `DatabaseConnectionError` from `utils/`.
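The pre-migration validation step can be sketched like this. The exception name matches the one described above as raised from `utils/`, but the required-variable list and the validation logic are illustrative assumptions, not the project's actual code.

```python
import os

# Hedged sketch of the pre-migration environment check. Only the
# exception name comes from the project; the variable list and logic
# are assumptions for illustration.

class MigrationEnvironmentError(Exception):
    """Raised when required configuration is missing."""

REQUIRED_VARS = ("SOURCE_DB_HOST", "SOURCE_DB_NAME", "MEMGRAPH_HOST")

def validate_environment() -> None:
    missing = [var for var in REQUIRED_VARS if not os.getenv(var)]
    if missing:
        raise MigrationEnvironmentError(
            "Missing environment variables: " + ", ".join(missing)
        )
```

Failing fast here keeps connection errors from surfacing mid-migration.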

## Configuration & Environment

- `.env` (or env vars) must provide:
- `SOURCE_DB_*` (host, port, name, user, password, type [`mysql|postgresql`]).
- `MEMGRAPH_*` connection details.
- `OPENAI_API_KEY` for LLM features; omit or leave empty to disable LLM strategy.
- Memgraph must run with `--schema-info-enabled=true` for schema validation.

## Testing & Validation

- Install deps: `uv sync` (or `uv pip install -e .`).
- Run targeted tests: `uv run python -m pytest tests/test_integration.py -v`.
- Keep graph-modeling logic covered via integration tests; they rely on mocked analyzers.
- Observe linting: adhere to 79-character lines and existing logging conventions (`logging` module).
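The mocked-analyzer testing style mentioned above can be sketched with `unittest.mock`. The `analyze()` return shape below is illustrative, not the project's real metadata contract.

```python
from unittest.mock import MagicMock

# Sketch of the mocking style used by the integration tests: the
# database analyzer is replaced so graph-modeling logic can run
# without a live MySQL/PostgreSQL. The return shape is illustrative.

analyzer = MagicMock()
analyzer.analyze.return_value = {
    "tables": [{"name": "actor", "columns": ["actor_id", "first_name"]}]
}

structure = analyzer.analyze()
assert structure["tables"][0]["name"] == "actor"
analyzer.analyze.assert_called_once()
```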

## Development Tips

- Update `PROMPT.md` when project layout or workflows change.
- Prefer existing abstractions: use `SQLToMemgraphAgent` methods, HyGM strategies/helpers, and database adapters.
- For new modeling flows, ensure `ModelingMode` and CLI choices stay in sync.
- Preserve user-facing prompts/emojis—they guide the interactive experience.
- When adding LLM-dependent features, guard them when API keys or clients are missing.
- Document new commands or config expectations in this prompt and `README.md` if user-facing.
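The tip about guarding LLM-dependent features can be sketched as below. `choose_strategy` is a hypothetical helper, not an existing function in the codebase; the fallback behavior is an assumption.

```python
import os

# Sketch of guarding an LLM-dependent code path: fall back to the
# deterministic strategy when no API key is configured, instead of
# crashing. The helper name and behavior are illustrative.

def choose_strategy(requested: str) -> str:
    if requested == "llm" and not os.getenv("OPENAI_API_KEY"):
        return "deterministic"
    return requested

os.environ.pop("OPENAI_API_KEY", None)
print(choose_strategy("llm"))  # → deterministic
```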

## Useful Commands

```bash
# Sync dependencies
uv sync

# Run the CLI
uv run main.py

# Run tests
uv run python -m pytest tests -v
```

## When Generating Code

- Mention which files you change and why, referencing modules above.
- Explain how to rerun the CLI or relevant tests after modifications.
- Provide small follow-up suggestions if more validation is needed.
- Keep output concise but cover context so the next agent run has everything it needs.
52 changes: 39 additions & 13 deletions integrations/agents/README.md → agents/sql2memgraph/README.md
@@ -10,7 +10,9 @@ This package provides a sophisticated migration agent that:
- **Generates optimal graph models** - Uses AI to create node and relationship structures
- **Creates indexes and constraints** - Ensures performance and data integrity
- **Handles complex relationships** - Converts foreign keys to graph relationships
- **Interactive refinement** - Allows users to review and adjust the graph model
- **Incremental refinement** - Review each table, adjust the model
immediately, then enter the interactive refinement loop once all tables
are processed
- **Comprehensive validation** - Verifies migration results and data integrity

## Installation
@@ -35,38 +37,61 @@ The agent will guide you through:

1. Environment setup and database connections
2. Graph modeling strategy selection
3. Interactive or automatic migration mode
3. Automatic or incremental migration mode
4. Complete migration workflow with progress tracking

> **Incremental review:** The LLM now drafts the entire graph model in a single
> shot and then walks you through table-level changes detected since the last
> migration. You only need to approve (or tweak) the differences that matter.

You can also preconfigure the workflow using CLI flags or environment variables:

```bash
uv run main.py --mode incremental --strategy llm --meta-graph reset --log-level DEBUG
```

| Option | Environment | Description |
| -------------------------------- | -------------------- | ------------------------------------------------------------- |
| `--mode {automatic,incremental}` | `SQL2MG_MODE` | Selects automatic or incremental modeling flow. |
| `--strategy {deterministic,llm}` | `SQL2MG_STRATEGY` | Chooses deterministic or LLM-powered HyGM strategy. |
| `--meta-graph {auto,skip,reset}` | `SQL2MG_META_POLICY` | Controls how stored meta graph data is used (default `auto`). |
| `--log-level LEVEL` | `SQL2MG_LOG_LEVEL` | Sets logging verbosity (`DEBUG`, `INFO`, etc.). |
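The precedence implied by the table above (explicit CLI flag first, then the `SQL2MG_*` environment variable, then the documented default) can be sketched as follows. `resolve_option` is a hypothetical helper for illustration, not a function from the codebase.

```python
import os

# Illustrates flag -> env var -> default precedence for the options in
# the table above. The helper name is a hypothetical example.

def resolve_option(cli_value, env_var: str, default: str) -> str:
    if cli_value is not None:
        return cli_value  # explicit CLI flag always wins
    return os.getenv(env_var, default)

os.environ["SQL2MG_MODE"] = "incremental"
print(resolve_option(None, "SQL2MG_MODE", "automatic"))  # → incremental
```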

## Configuration

Set up your environment variables in `.env`:

```bash
# Source Database (MySQL/PostgreSQL)
SOURCE_DB_HOST=localhost
SOURCE_DB_PORT=3306
SOURCE_DB_NAME=your_database
SOURCE_DB_USER=username
SOURCE_DB_PASSWORD=password
SOURCE_DB_TYPE=mysql # or postgresql
# MySQL Database (primary source)
MYSQL_HOST=localhost
MYSQL_PORT=3306
MYSQL_DATABASE=sakila
MYSQL_USER=username
MYSQL_PASSWORD=password

# Memgraph Database
MEMGRAPH_HOST=localhost
MEMGRAPH_PORT=7687
MEMGRAPH_USER=
MEMGRAPH_URL=bolt://localhost:7687
MEMGRAPH_USERNAME=
MEMGRAPH_PASSWORD=
MEMGRAPH_DATABASE=memgraph

# OpenAI (for LLM-powered features)
OPENAI_API_KEY=your_openai_key

# Optional migration defaults (override CLI prompts)
SQL2MG_MODE=automatic
SQL2MG_STRATEGY=deterministic
SQL2MG_META_POLICY=auto
SQL2MG_LOG_LEVEL=INFO
```

Make sure Memgraph is started with `--schema-info-enabled=true`, since the agent uses the schema information returned by Memgraph's `SHOW SCHEMA INFO`.

## Architecture

```
core/hygm/
├── hygm.py # Main orchestrator class
├── hygm.py # Main orchestrator class
├── models/ # Data models and structures
│ ├── graph_models.py # Core graph representation
│ ├── llm_models.py # LLM-specific models
@@ -76,3 +101,4 @@ core/hygm/
├── base.py # Abstract interface
├── deterministic.py # Rule-based modeling
└── llm.py # AI-powered modeling
```
@@ -17,13 +17,14 @@

```python
from agents.core import SQLToMemgraphAgent, HyGM
from agents.core.hygm import ModelingMode
from agents.utils import setup_and_validate_environment

# Setup environment
mysql_config, memgraph_config = setup_and_validate_environment()

# Create migration agent
agent = SQLToMemgraphAgent(interactive_graph_modeling=False)
agent = SQLToMemgraphAgent(modeling_mode=ModelingMode.AUTOMATIC)

# Run migration
result = agent.migrate(mysql_config, memgraph_config)