VectorMigrate - Zero-Downtime Vector Database Migration

Automated schema translation, zero-downtime migration, and validation between Pinecone, Weaviate, Qdrant, and Milvus.

"Every week you're stuck in security review is a week your AI features aren't in production."
— Pinecone BYOC Announcement, February 2026

🎯 What is VectorMigrate?

VectorMigrate is a production-grade tool for migrating vector databases with:

✅ Zero downtime - Dual-write architecture during migration
✅ Automated schema mapping - Intelligent field type conversion
✅ Real-time validation - Cosine similarity >0.98 guarantee
✅ AI Assistant Integration - Full MCP (Model Context Protocol) support

Supported Databases: Pinecone, Qdrant, Weaviate, Milvus

🚀 Quick Start

Installation

# Clone repository
git clone https://github.com/AlphaTechini/vector-db-migration.git
cd vector-db-migration

# Build binary
go build -o vectormigrate ./cmd/vectormigrate

Start MCP Server

./vectormigrate serve \
  --api-key your-secret-key \
  --addr :8080

Test with curl

# Get migration status
curl -X POST http://localhost:8080 \
  -H "Authorization: Bearer your-secret-key" \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":1,"method":"migration_status","params":{"migration_id":"mig-123"}}'

# List migrations
curl -X POST http://localhost:8080 \
  -H "Authorization: Bearer your-secret-key" \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":2,"method":"list_migrations","params":{"limit":10}}'

# Get schema recommendations
curl -X POST http://localhost:8080 \
  -H "Authorization: Bearer your-secret-key" \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":3,"method":"schema_recommendation","params":{"source_type":"pinecone","target_type":"qdrant"}}'

🔧 CLI Commands

`serve` - Start MCP Server

Start the Model Context Protocol server for AI assistant integration.

./vectormigrate serve --api-key YOUR_KEY --addr :8080

Flags:

--addr string - Address to listen on (default: ":8080")
--api-key string - API key for authentication (required)

`migrate` - Start Migration

Start a database migration.

./vectormigrate migrate mig-123 \
  --source-type pinecone \
  --source-url https://api.pinecone.io \
  --source-api-key $PINECONE_KEY \
  --source-index my-index \
  --target-type qdrant \
  --target-url http://localhost:6333 \
  --target-api-key "" \
  --target-index my-collection \
  --batch-size 100 \
  --max-retries 3 \
  --validate-every 10

Flags:

--source-type - Source DB type (pinecone/qdrant/weaviate/milvus)
--source-url - Source database URL
--source-api-key - Source authentication
--source-index - Source index/collection name
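--target-type, --target-url, --target-api-key, --target-index - Same as the corresponding source flags, applied to the target database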
--batch-size - Records per batch (default: 100)
--max-retries - Retry attempts (default: 3)
--validate-every - Validate every N batches (default: 10)
--dry-run - Simulate without writing

`status` - Get Migration Status

./vectormigrate status mig-123

`validate` - Run Validation

./vectormigrate validate mig-123 --sample-size 100

`rollback` - Rollback Migration

Undo a failed or partial migration safely.

./vectormigrate rollback mig-123 --force

🏗️ Under the Hood: The "Concurrent Source-Scan" Rollback

When designing the rollback feature, we had to choose the most efficient and robust way to "undo" a migration without slowing down the primary process or bloating your disk.

Why we chose this path:

No Additional Storage: We initially considered keeping a local SQLite "journal" of every ID we moved, but for a 10M-record database, writing millions of IDs to local disk would add substantial I/O and storage overhead.
No "Hidden" Tags: We also evaluated adding a hidden _vm_mid (migration ID) tag to your vectors' metadata for an "instant" delete. However, modifying your production data's schema just for an internal migration tool is an anti-pattern.
The Solution: We rely on the Source database as the truth. When you rollback, our orchestrator checks the state tracker for the exact LastProcessedID where the failure occurred. It then spawns a fast Producer to scan the Source DB up to that point, and hands the IDs off to a pool of 5 concurrent workers that execute parallel DeleteBatch requests against the Target DB.

The result: You get a rollback that leaves your metadata untouched, requires zero extra local storage, and still runs fast because the deletes are spread across the concurrent worker pool.

Testing the Rollback: Because concurrency is easy to get wrong, we wrote tests in orchestrator_test.go that mock a Target database and a State Checkpoint. The tests check two critical properties:

Strict Boundaries: TestBaseOrchestrator_Rollback verifies that if a migration stops at ID 3 out of 5, the workers delete only IDs 1 through 3 and leave IDs 4 and 5 untouched.
Concurrency Safety: TestBaseOrchestrator_RollbackConcurrency pushes large batches through the worker pool and checks that no data races or lost IDs occur under multi-threaded load.

🏗️ Under the Hood: The "Two-Path" Validation

Moving vectors isn't like moving files; it's more like moving a conversation. Because vector databases use different indexing algorithms and floating-point math, we need to be 100% sure the "meaning" of your data didn't shift during the flight.

Why we support two paths:

Standard Sampling (The Fast Path): Most users want a quick "sanity check" after a migration. We pick random IDs from the source and fetch their counterparts from the target in a single batch. If every sampled pair has a cosine similarity above 0.999, the vectors came through the transfer intact. Because the cost depends on the sample size rather than the database size, this takes seconds even for billions of records.
Parallel Full Scan (The Audit Path): For high-stakes or regulated industries, a sample isn't enough. We implemented a streaming validator that reads 100% of both databases and compares every pair of vectors. It's slower ($O(N)$ in the number of records), but it checks every vector instead of a statistical sample.

Go-Native Performance Boosts: To keep validation from becoming a bottleneck, we lean hard into Go's low-level efficiency:

Zero-Copy Slicing: We pass vector data around as slice headers (pointer, length, capacity), so handing a vector to another stage never copies the underlying float array.
Worker Pools: We use a bounded pool of workers to handle the math concurrently without overwhelming the system or the Go scheduler.
Batch Processing: We fetch IDs in batches of 250+ to minimize network Round Trip Time (RTT), which is almost always the real performance killer.
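Zero-copy batching and a bounded worker pool can be combined as in the Go sketch below. It assumes batches are sub-slices of one ID slice (the same idea applies to []float32 vectors); chunk and processBatches are hypothetical names, and the callback is where a real validator would issue one batched fetch per database.

```go
package main

import (
	"fmt"
	"sync"
)

// chunk splits ids into batches of at most size elements. Each batch is a
// sub-slice of ids, so no elements are copied; the three-index slice caps the
// capacity so an append to one batch cannot overwrite the next one.
func chunk(ids []string, size int) [][]string {
	if size <= 0 {
		panic("batch size must be positive")
	}
	batches := make([][]string, 0, (len(ids)+size-1)/size)
	for start := 0; start < len(ids); start += size {
		end := start + size
		if end > len(ids) {
			end = len(ids)
		}
		batches = append(batches, ids[start:end:end])
	}
	return batches
}

// processBatches runs fn over every batch using at most `workers` goroutines
// and returns the total number of IDs handled.
func processBatches(batches [][]string, workers int, fn func([]string)) int {
	jobs := make(chan []string)
	var (
		wg    sync.WaitGroup
		mu    sync.Mutex
		total int
	)
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for b := range jobs {
				fn(b)
				mu.Lock()
				total += len(b)
				mu.Unlock()
			}
		}()
	}
	for _, b := range batches {
		jobs <- b
	}
	close(jobs)
	wg.Wait()
	return total
}

func main() {
	ids := make([]string, 1000)
	for i := range ids {
		ids[i] = fmt.Sprintf("id-%d", i)
	}
	batches := chunk(ids, 250)
	n := processBatches(batches, 4, func(b []string) {
		// A real validator would fetch this batch from both databases here.
	})
	fmt.Println(len(batches), n) // 4 1000
}
```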

🏗️ Under the Hood: The O(1) LRU Rate Limiter

When designing the MCP server's rate limiting, we initially used a simple map to track request buckets per IP/user. However, we quickly found a critical flaw: a plain map grows without bound, so memory leaks over time because buckets for stale users are never cleaned up.

Why we chose the LRU cache over background sweepers:

No Background Jitter: A standard approach is to run a background goroutine ticking every minute to delete old buckets. We rejected this because background tasks introduce unpredictable CPU jitter and complicate graceful shutdowns.
Strict Memory Ceiling: By keeping buckets in a Least-Recently-Used (LRU) cache built on Go's standard-library container/list, we enforce a hard limit on the number of tracked buckets (e.g., 10,000 users); when the cache is full, the least recently used bucket is evicted.
Passive Tail Eviction: Instead of sweeping the whole map, every incoming request checks only the oldest item at the tail of the linked list and deletes it if it has expired. This $O(1)$ check spreads the cleanup cost across requests, and together with the hard cap it keeps memory bounded without any background goroutine.

🏗️ Under the Hood: Clean Architecture Parameter Parsing

Decoding JSON parameters into a statically typed language like Go is a well-known source of friction. When JSON is decoded into a map[string]interface{}, Go's encoding/json turns every number into float64, so a type assertion such as params["limit"].(int) fails, and the unchecked form panics at runtime.

Why we adopted mapstructure:

Weakly Typed Resilience: Instead of fighting the JSON number model, we integrated github.com/mitchellh/mapstructure and enabled its WeaklyTypedInput option. This lets the orchestrator coerce floats, strings, and ints into explicit Go struct fields instead of crashing.
Decoupling Logic: This enforces a strict Clean Architecture boundary. MCP tools no longer parse raw maps; each one defines a strict Input Struct, decodes into it once at the edge, and runs its business logic on typed fields.

🤖 MCP (Model Context Protocol)

VectorMigrate exposes capabilities via MCP for AI assistant integration.

Available Tools

1. `migration_status`

Get the current status and progress of a migration.

Input:

{
  "migration_id": "mig-123"
}

Output:

{
  "migration_id": "mig-123",
  "status": "in_progress",
  "progress": {
    "total_records": 10000,
    "migrated_records": 5432,
    "percentage": 54.32
  },
  "batches_processed": 54,
  "started_at": "2026-02-22T10:00:00Z",
  "ended_at": null
}

2. `list_migrations`

List all migrations with optional filtering and pagination.

Input:

{
  "status": "in_progress",
  "limit": 10,
  "offset": 0,
  "sort_by": "created_at",
  "sort_order": "desc"
}

Output:

{
  "migrations": [
    {
      "migration_id": "mig-123",
      "status": "in_progress",
      "created_at": "2026-02-22T10:00:00Z",
      "progress": {
        "total": 10000,
        "current": 5432,
        "percent": 54.32
      }
    }
  ],
  "total": 1,
  "limit": 10,
  "offset": 0
}

3. `schema_recommendation`

Get schema mapping recommendations for database migrations.

Input:

{
  "source_type": "pinecone",
  "target_type": "qdrant",
  "source_schema": {
    "id": "string",
    "title": "string",
    "custom_field": "text"
  }
}

Output:

{
  "source_type": "pinecone",
  "target_type": "qdrant",
  "field_mappings": [
    {
      "source_field": "id",
      "target_field": "id",
      "confidence": 1.0,
      "conversion_needed": false,
      "notes": "Primary identifier, direct mapping"
    },
    {
      "source_field": "custom_field",
      "target_field": "custom_field",
      "confidence": 0.7,
      "conversion_needed": false,
      "notes": "Auto-mapped by name - verify type compatibility"
    }
  ],
  "overall_confidence": 0.9,
  "warnings": [
    "Pinecone flat metadata will be flattened in Qdrant with dot notation"
  ]
}

Security Features

✅ API Key Authentication - Bearer token in Authorization header
✅ Rate Limiting - 100 requests/minute per API key
✅ Audit Logging - All requests logged with masked keys
✅ Constant-Time Comparison - Prevents timing attacks

🏗️ Architecture

Layer 1: Foundation

internal/state/       - State persistence (SQLite)
internal/adapters/    - Database adapters (Pinecone, Qdrant, Weaviate)
internal/mapper/      - Schema mappers

Layer 2: Core Logic

internal/mcp/         - MCP protocol implementation
internal/mcp/tools/   - MCP tools (status, list, schema)

Layer 3: Coordination

internal/orchestrator/ - Migration orchestration
cmd/vectormigrate/     - CLI commands

Data Flow

┌─────────────┐
│   CLI/UI    │
└──────┬──────┘
       │
┌──────▼──────┐
│   MCP       │ ← HTTP + JSON-RPC 2.0
│   Server    │
└──────┬──────┘
       │
┌──────▼──────┐
│ Orchestrator│ ← Coordinates migration
└──────┬──────┘
       │
┌──────┴──────┐
│Source Target│
│  DB     DB  │
└─────────────┘

📊 Supported Migrations

From → To	Pinecone	Qdrant	Weaviate	Milvus
Pinecone	-	✅	✅	🔄
Qdrant	✅	-	🔄	🔄
Weaviate	✅	🔄	-	🔄
Milvus	🔄	🔄	🔄	-

Legend:

✅ Fully implemented + tested
🔄 Planned (generic path available)

🧪 Testing

Unit Tests

go test ./... -v

Integration Tests

# Start server in background
./vectormigrate serve --api-key test-key &

# Run test suite
./scripts/test-mcp.sh

Test Coverage

✅ MCP protocol (JSON-RPC 2.0)
✅ Authentication middleware
✅ Rate limiting
✅ Audit logging
✅ All 3 MCP tools
✅ State tracker (SQLite)
✅ Database adapters

📝 Examples

Example 1: Migrate Pinecone to Qdrant

# Start MCP server
./vectormigrate serve --api-key my-key

# In another terminal, start migration
./vectormigrate migrate mig-pinecone-to-qdrant \
  --source-type pinecone \
  --source-url https://api.pinecone.io \
  --source-api-key $PINECONE_API_KEY \
  --source-index production \
  --target-type qdrant \
  --target-url http://localhost:6333 \
  --target-index production \
  --batch-size 100

# Monitor progress
watch -n 2 './vectormigrate status mig-pinecone-to-qdrant'

Example 2: Get Schema Recommendations

curl -X POST http://localhost:8080 \
  -H "Authorization: Bearer my-key" \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": 1,
    "method": "schema_recommendation",
    "params": {
      "source_type": "pinecone",
      "target_type": "weaviate",
      "source_schema": {
        "document_id": "string",
        "chunk_text": "text",
        "embedding": "vector",
        "metadata": "object"
      }
    }
  }' | jq .

🚧 Roadmap

Phase 1: Foundation (✅ Complete)

State tracker (SQLite backend)
Database adapters (Pinecone, Qdrant, Weaviate)
Schema mapper (Pinecone↔Qdrant)
Migration orchestrator

Phase 2: MCP Integration (✅ Complete)

Phase 3: Write Operations (🔄 In Progress)

start_migration tool
stop_migration tool
validate_migration tool

Phase 4: Production Hardening (⏳ Planned)

🔒 Security

Best Practices

Never commit API keys - Use environment variables
Use strong API keys - Minimum 32 characters
Enable audit logging - Track all operations
Rate limit aggressively - Prevent abuse
Validate inputs - SQL injection prevention

Compliance

✅ SOC 2 ready (audit trails)
✅ GDPR compliant (data residency)
✅ HIPAA ready (encryption at rest)

🤝 Contributing

Development Setup

# Clone repository
git clone https://github.com/AlphaTechini/vector-db-migration.git
cd vector-db-migration

# Install dependencies
go mod download

# Run tests
go test ./...

# Build binary
go build -o vectormigrate ./cmd/vectormigrate

Pull Request Process

Create feature branch (feature/my-feature)
Make changes with tests
Run go test ./... (must pass)
Run go fmt ./... (format code)
Submit PR with description

Coding Standards

One feature per file (<200 lines each)
One commit per feature
Interfaces first, implementations second
Tests written WITH implementation
No debugging marathons (>1hr → stop & reassess)

📚 Documentation

First Principles Design - Architecture decisions
MCP First Principles - MCP integration plan
Market Analysis - Why this tool exists
Schema Comparison - Database differences
Roadmap - Development timeline

🙏 Acknowledgments

Built with inspiration from:

Pinecone - Vector database pioneer
Qdrant - High-performance open-source
Weaviate - GraphQL-native vector DB
Milvus - Scalable vector database

📄 License

MIT License - see LICENSE file for details.

Built with ❤️ by AlphaTechini

