@@ -1,6 +1,12 @@
# OpenAI API Configuration
OPENAI_API_KEY=your_actual_openai_api_key

# Migration Defaults (can be overridden via CLI flags)
SQL2MG_MODE=automatic
SQL2MG_STRATEGY=deterministic
SQL2MG_META_POLICY=auto
SQL2MG_LOG_LEVEL=INFO

# MySQL Database Configuration
MYSQL_HOST=host.docker.internal
MYSQL_USER=root
90 changes: 90 additions & 0 deletions agents/sql2memgraph/PROMPT.md
@@ -0,0 +1,90 @@
# SQL → Memgraph Migration Agent Prompt

## TL;DR

You are working inside the `agents/sql2memgraph` package, a UV-managed Python project that turns relational schemas (MySQL/PostgreSQL) into Memgraph graph schemas and data. The primary entry point is `main.py`, which wires configuration, environment validation, database analyzers, and the HyGM graph-modeling subsystem. Changes usually touch:

- `core/` — Orchestrates the migration workflow (`migration_agent.py`) and HyGM graph modeling (`hygm/`).
- `database/` — Connectors and analyzers for the source RDBMSs.
- `query_generation/` — Cypher generation helpers.
- `utils/` — Environment setup, connection probes, and CLI helpers.

Always maintain the CLI experience (`main.py`) and respect the 79-character line-length lint rule.

## Tech Stack & Tooling

- Python 3.10+, managed with [uv](https://github.com/astral-sh/uv).
- Memgraph as the target graph database (Bolt connection).
- Optional LLM features powered by OpenAI (LangChain / LangGraph patterns inside `core/hygm`).
- Testing: `pytest` under `tests/` (integration heavy, uses mocks for DB analyzers).

## Core Concepts

- **HyGM (Hypothetical Graph Modeling)** lives in `core/hygm/hygm.py` and exposes modeling modes via `ModelingMode`:
- `AUTOMATIC` – one-shot graph generation.
- `INCREMENTAL` – table-by-table confirmation flow with an optional refinement
loop after processing all tables.
- **GraphModelingStrategy**: `DETERMINISTIC` (rule-based) and `LLM_POWERED` (needs an LLM+API key).
- **SQLToMemgraphAgent** (`core/migration_agent.py`) coordinates schema analysis, HyGM modeling, query generation, execution, and validation.
- **Database analyzers** in `database/` introspect MySQL/PostgreSQL schemas and emit a normalized metadata structure consumed by HyGM.
- **Query generation** in `query_generation/` converts the graph model + metadata into Cypher migrations, indexes, and constraints.
- **Database data interfaces** in `database/models.py` define the canonical `TableInfo`, `ColumnInfo`, `RelationshipInfo`, and `DatabaseStructure` data classes. These objects flow from analyzers into HyGM via the `to_hygm_format()` helpers, ensuring consistent schema metadata for every modeling mode.
- **Graph schema structures** in `core/hygm/models/graph_models.py` (e.g., `GraphModel`, `GraphNode`, `GraphRelationship`) capture the in-memory graph representation HyGM produces and later serializes to schema format.
- **LLM structured output models** in `core/hygm/models/llm_models.py` (`LLMGraphModel`, `LLMGraphNode`, `LLMGraphRelationship`) describe the contract for AI-generated schemas and include `to_graph_model()` utilities to convert LLM responses into the standard `GraphModel` objects.
- The `GraphModel` serialization format matches the canonical spec in `core/schema/spec.json`, so any changes to the schema data classes should be mirrored against that document.
- Source tracking helpers in `core/hygm/models/sources.py` annotate nodes, relationships, properties, indexes, and constraints with origin metadata. Preserve these when modifying `GraphModel` so downstream migrations retain the link back to the originating tables or user-applied changes.
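The analyzer-to-HyGM handoff described above can be sketched as follows. This is a minimal illustration, not the real implementation: the class and method names (`ColumnInfo`, `TableInfo`, `to_hygm_format()`) come from `database/models.py` as described above, but the field names and the returned metadata shape here are assumptions.

```python
from dataclasses import dataclass, field

# Illustrative sketch of the analyzer -> HyGM handoff. The real data
# classes live in database/models.py; fields and output shape here are
# assumptions for demonstration only.

@dataclass
class ColumnInfo:
    name: str
    data_type: str
    is_primary: bool = False

@dataclass
class TableInfo:
    name: str
    columns: list = field(default_factory=list)

    def to_hygm_format(self) -> dict:
        # Emit normalized metadata consumed by HyGM, independent of
        # whether the source was MySQL or PostgreSQL.
        return {
            "label": self.name,
            "properties": [
                {"name": c.name, "type": c.data_type, "key": c.is_primary}
                for c in self.columns
            ],
        }

table = TableInfo(
    "actor",
    [ColumnInfo("actor_id", "INT", True), ColumnInfo("first_name", "VARCHAR")],
)
print(table.to_hygm_format()["label"])  # → actor
```

The point of the normalization step is that every modeling mode (automatic or incremental, deterministic or LLM-powered) consumes the same metadata shape.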

## Entry Points & CLI Flow

- Run with `uv run main.py` (banner, env checks, connection probes, then the migration workflow).
- CLI prompts include:
- Graph modeling mode (automatic / incremental with interactive refinement).
- Modeling strategy (deterministic / AI-powered).
- Confirmation dialogs during automatic or incremental flows.
- Post-session prompts that let users launch the interactive refinement loop
after reviewing every table in an incremental run.
- Environment validation happens before migration; failures raise `MigrationEnvironmentError` or `DatabaseConnectionError` from `utils/`.
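The pre-migration validation step can be sketched like this. The exception name matches the one described above as raised from `utils/`, but the required-variable list and the validation logic are illustrative assumptions, not the project's actual code.

```python
import os

# Hedged sketch of the pre-migration environment check. Only the
# exception name comes from the project; the variable list and logic
# are assumptions for illustration.

class MigrationEnvironmentError(Exception):
    """Raised when required configuration is missing."""

REQUIRED_VARS = ("SOURCE_DB_HOST", "SOURCE_DB_NAME", "MEMGRAPH_HOST")

def validate_environment() -> None:
    missing = [var for var in REQUIRED_VARS if not os.getenv(var)]
    if missing:
        raise MigrationEnvironmentError(
            "Missing environment variables: " + ", ".join(missing)
        )
```

Failing fast here keeps connection errors from surfacing mid-migration.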

## Configuration & Environment

- `.env` (or env vars) must provide:
- `SOURCE_DB_*` (host, port, name, user, password, type [`mysql|postgresql`]).
- `MEMGRAPH_*` connection details.
- `OPENAI_API_KEY` for LLM features; omit or leave empty to disable LLM strategy.
- Memgraph must run with `--schema-info-enabled=true` for schema validation.

## Testing & Validation

- Install deps: `uv sync` (or `uv pip install -e .`).
- Run targeted tests: `uv run python -m pytest tests/test_integration.py -v`.
- Keep graph-modeling logic covered via integration tests; they rely on mocked analyzers.
- Observe linting: adhere to 79-character lines and existing logging conventions (`logging` module).
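The mocked-analyzer testing style mentioned above can be sketched with `unittest.mock`. The `analyze()` return shape below is illustrative, not the project's real metadata contract.

```python
from unittest.mock import MagicMock

# Sketch of the mocking style used by the integration tests: the
# database analyzer is replaced so graph-modeling logic can run
# without a live MySQL/PostgreSQL. The return shape is illustrative.

analyzer = MagicMock()
analyzer.analyze.return_value = {
    "tables": [{"name": "actor", "columns": ["actor_id", "first_name"]}]
}

structure = analyzer.analyze()
assert structure["tables"][0]["name"] == "actor"
analyzer.analyze.assert_called_once()
```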

## Development Tips

- Update `PROMPT.md` when project layout or workflows change.
- Prefer existing abstractions: use `SQLToMemgraphAgent` methods, HyGM strategies/helpers, and database adapters.
- For new modeling flows, ensure `ModelingMode` and CLI choices stay in sync.
- Preserve user-facing prompts/emojis—they guide the interactive experience.
- When adding LLM-dependent features, guard them when API keys or clients are missing.
- Document new commands or config expectations in this prompt and `README.md` if user-facing.
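The tip about guarding LLM-dependent features can be sketched as below. `choose_strategy` is a hypothetical helper, not an existing function in the codebase; the fallback behavior is an assumption.

```python
import os

# Sketch of guarding an LLM-dependent code path: fall back to the
# deterministic strategy when no API key is configured, instead of
# crashing. The helper name and behavior are illustrative.

def choose_strategy(requested: str) -> str:
    if requested == "llm" and not os.getenv("OPENAI_API_KEY"):
        return "deterministic"
    return requested

os.environ.pop("OPENAI_API_KEY", None)
print(choose_strategy("llm"))  # → deterministic
```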

## Useful Commands

```bash
# Sync dependencies
uv sync

# Run the CLI
uv run main.py

# Run tests
uv run python -m pytest tests -v
```

## When Generating Code

- Mention which files you change and why, referencing modules above.
- Explain how to rerun the CLI or relevant tests after modifications.
- Provide small follow-up suggestions if more validation is needed.
- Keep output concise but cover context so the next agent run has everything it needs.
52 changes: 39 additions & 13 deletions integrations/agents/README.md → agents/sql2memgraph/README.md
@@ -10,7 +10,9 @@ This package provides a sophisticated migration agent that:
- **Generates optimal graph models** - Uses AI to create node and relationship structures
- **Creates indexes and constraints** - Ensures performance and data integrity
- **Handles complex relationships** - Converts foreign keys to graph relationships
- **Interactive refinement** - Allows users to review and adjust the graph model
- **Incremental refinement** - Review each table, adjust the model
immediately, then enter the interactive refinement loop once all tables
are processed
- **Comprehensive validation** - Verifies migration results and data integrity

## Installation
@@ -35,38 +37,61 @@ The agent will guide you through:

1. Environment setup and database connections
2. Graph modeling strategy selection
3. Interactive or automatic migration mode
3. Automatic or incremental migration mode
4. Complete migration workflow with progress tracking

> **Incremental review:** The LLM now drafts the entire graph model in a single
> shot and then walks you through table-level changes detected since the last
> migration. You only need to approve (or tweak) the differences that matter.

You can also preconfigure the workflow using CLI flags or environment variables:

```bash
uv run main.py --mode incremental --strategy llm --meta-graph reset --log-level DEBUG
```

| Option | Environment | Description |
| -------------------------------- | -------------------- | ------------------------------------------------------------- |
| `--mode {automatic,incremental}` | `SQL2MG_MODE` | Selects automatic or incremental modeling flow. |
| `--strategy {deterministic,llm}` | `SQL2MG_STRATEGY` | Chooses deterministic or LLM-powered HyGM strategy. |
| `--meta-graph {auto,skip,reset}` | `SQL2MG_META_POLICY` | Controls how stored meta graph data is used (default `auto`). |
| `--log-level LEVEL` | `SQL2MG_LOG_LEVEL` | Sets logging verbosity (`DEBUG`, `INFO`, etc.). |
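The precedence implied by the table above (explicit CLI flag first, then the `SQL2MG_*` environment variable, then the documented default) can be sketched as follows. `resolve_option` is a hypothetical helper for illustration, not a function from the codebase.

```python
import os

# Illustrates flag -> env var -> default precedence for the options in
# the table above. The helper name is a hypothetical example.

def resolve_option(cli_value, env_var: str, default: str) -> str:
    if cli_value is not None:
        return cli_value  # explicit CLI flag always wins
    return os.getenv(env_var, default)

os.environ["SQL2MG_MODE"] = "incremental"
print(resolve_option(None, "SQL2MG_MODE", "automatic"))  # → incremental
```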

## Configuration

Set up your environment variables in `.env`:

```bash
# Source Database (MySQL/PostgreSQL)
SOURCE_DB_HOST=localhost
SOURCE_DB_PORT=3306
SOURCE_DB_NAME=your_database
SOURCE_DB_USER=username
SOURCE_DB_PASSWORD=password
SOURCE_DB_TYPE=mysql # or postgresql
# MySQL Database (primary source)
MYSQL_HOST=localhost
MYSQL_PORT=3306
MYSQL_DATABASE=sakila
MYSQL_USER=username
MYSQL_PASSWORD=password

# Memgraph Database
MEMGRAPH_HOST=localhost
MEMGRAPH_PORT=7687
MEMGRAPH_USER=
MEMGRAPH_URL=bolt://localhost:7687
MEMGRAPH_USERNAME=
MEMGRAPH_PASSWORD=
MEMGRAPH_DATABASE=memgraph

# OpenAI (for LLM-powered features)
OPENAI_API_KEY=your_openai_key

# Optional migration defaults (override CLI prompts)
SQL2MG_MODE=automatic
SQL2MG_STRATEGY=deterministic
SQL2MG_META_POLICY=auto
SQL2MG_LOG_LEVEL=INFO
```

Make sure Memgraph is started with `--schema-info-enabled=true`, since the agent uses the schema information returned by Memgraph's `SHOW SCHEMA INFO`.

## Architecture

```
core/hygm/
├── hygm.py # Main orchestrator class
├── hygm.py # Main orchestrator class
├── models/ # Data models and structures
│ ├── graph_models.py # Core graph representation
│ ├── llm_models.py # LLM-specific models
@@ -76,3 +101,4 @@ core/hygm/
├── base.py # Abstract interface
├── deterministic.py # Rule-based modeling
└── llm.py # AI-powered modeling
```
@@ -17,13 +17,14 @@

```python
from agents.core import SQLToMemgraphAgent, HyGM
from agents.core.hygm import ModelingMode
from agents.utils import setup_and_validate_environment

# Setup environment
mysql_config, memgraph_config = setup_and_validate_environment()

# Create migration agent
agent = SQLToMemgraphAgent(interactive_graph_modeling=False)
agent = SQLToMemgraphAgent(modeling_mode=ModelingMode.AUTOMATIC)

# Run migration
result = agent.migrate(mysql_config, memgraph_config)