Automated schema translation, zero-downtime migration, and validation between Pinecone, Weaviate, Qdrant, and Milvus.
"Every week you're stuck in security review is a week your AI features aren't in production."
— Pinecone BYOC Announcement, February 2026
VectorMigrate is a production-grade tool for migrating vector databases with:
- ✅ Zero downtime - Dual-write architecture during migration
- ✅ Automated schema mapping - Intelligent field type conversion
- ✅ Real-time validation - Cosine similarity >0.98 guarantee
- ✅ AI Assistant Integration - Full MCP (Model Context Protocol) support
Supported Databases: Pinecone, Qdrant, Weaviate, Milvus
# Clone repository
git clone https://github.com/AlphaTechini/vector-db-migration.git
cd vector-db-migration
# Build binary
go build -o vectormigrate ./cmd/vectormigrate
./vectormigrate serve \
--api-key your-secret-key \
--addr :8080
# Get migration status
curl -X POST http://localhost:8080 \
-H "Authorization: Bearer your-secret-key" \
-H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","id":1,"method":"migration_status","params":{"migration_id":"mig-123"}}'
# List migrations
curl -X POST http://localhost:8080 \
-H "Authorization: Bearer your-secret-key" \
-H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","id":2,"method":"list_migrations","params":{"limit":10}}'
# Get schema recommendations
curl -X POST http://localhost:8080 \
-H "Authorization: Bearer your-secret-key" \
-H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","id":3,"method":"schema_recommendation","params":{"source_type":"pinecone","target_type":"qdrant"}}'
Start the Model Context Protocol server for AI assistant integration.
./vectormigrate serve --api-key YOUR_KEY --addr :8080
Flags:
- --addr string - Address to listen on (default: ":8080")
- --api-key string - API key for authentication (required)
Start a database migration.
./vectormigrate migrate mig-123 \
--source-type pinecone \
--source-url https://api.pinecone.io \
--source-api-key $PINECONE_KEY \
--source-index my-index \
--target-type qdrant \
--target-url http://localhost:6333 \
--target-api-key "" \
--target-index my-collection \
--batch-size 100 \
--max-retries 3 \
--validate-every 10
Flags:
- --source-type - Source DB type (pinecone/qdrant/weaviate/milvus)
- --source-url - Source database URL
- --source-api-key - Source authentication
- --source-index - Source index/collection name
- --target-* - Same as source flags
- --batch-size - Records per batch (default: 100)
- --max-retries - Retry attempts (default: 3)
- --validate-every - Validate every N batches (default: 10)
- --dry-run - Simulate without writing
./vectormigrate status mig-123
./vectormigrate validate mig-123 --sample-size 100
Undo a failed or partial migration safely.
./vectormigrate rollback mig-123 --force
When designing the rollback feature, we had to choose the most efficient and robust way to "undo" a migration without slowing down the primary process or bloating your disk.
Why we chose this path:
- No Additional Storage: We initially considered keeping a local SQLite "journal" of every ID we moved, but for a 10M-record database, writing millions of IDs to your local disk would cause significant I/O and storage overhead.
- No "Hidden" Tags: We also evaluated adding a hidden _vm_mid (migration ID) tag to your vectors' metadata for an "instant" delete. However, modifying your production data's schema just for an internal migration tool is an anti-pattern.
- The Solution: We rely on the Source database as the source of truth. When you roll back, our orchestrator checks the state tracker for the exact LastProcessedID where the failure occurred. It then spawns a fast Producer to scan the Source DB up to that point, and hands the IDs off to a pool of 5 concurrent workers that execute parallel DeleteBatch requests against the Target DB.
The result: You get a rollback that leaves your metadata untouched, requires no extra local storage, and stays fast thanks to the concurrent worker pool.
Testing the Rollback:
Because concurrency can be tricky, we specifically designed tests in orchestrator_test.go to mock a Target database and a State Checkpoint. The tests check two critical things:
- Strict Boundaries: The test (TestBaseOrchestrator_Rollback) verifies that if a migration stops at ID 3 out of 5, the workers will only delete IDs 1 through 3, leaving 4 and 5 completely untouched.
- Concurrency Safety: We added TestBaseOrchestrator_RollbackConcurrency to push large batches through the worker pool and check that no data races or lost IDs occur under concurrent load.
Moving vectors isn't like moving files; it's more like moving a conversation. Because vector databases use different indexing algorithms and floating-point math, we need to be 100% sure the "meaning" of your data didn't shift during the flight.
Why we support two paths:
- Standard Sampling (The Fast Path): Most users want a quick "sanity check" after a migration. We pick random IDs from the source and fetch their counterparts from the target in a single batch. If the Cosine Similarity is >0.999, we treat the vectors as equivalent. Because the cost depends on the sample size rather than the size of the database, this takes seconds, even for billions of records.
- Parallel Full Scan (The Audit Path): For high-stakes or regulated industries, a "sample" isn't enough. We implemented a streaming validator that reads 100% of both databases and compares every source vector with its target counterpart. It's slower ($O(N)$), but it checks every record instead of estimating from a sample.
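The sampling check boils down to a cosine-similarity comparison per ID. The sketch below is one way to write it, assuming vectors are held as []float32 and that the sampled source and target records have already been fetched into maps; the cosine and validateSample names are hypothetical, not the project's API.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// cosine returns the cosine similarity of two equal-length, non-zero vectors.
func cosine(a, b []float32) (float64, error) {
	if len(a) == 0 || len(a) != len(b) {
		return 0, errors.New("vectors must be non-empty and of equal length")
	}
	var dot, na, nb float64
	for i := range a {
		x, y := float64(a[i]), float64(b[i])
		dot += x * y
		na += x * x
		nb += y * y
	}
	if na == 0 || nb == 0 {
		return 0, errors.New("cosine similarity is undefined for a zero vector")
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb)), nil
}

// validateSample returns the sampled IDs that fail validation: missing on
// either side, not comparable, or with similarity not above the threshold.
func validateSample(ids []string, source, target map[string][]float32, threshold float64) []string {
	var failed []string
	for _, id := range ids {
		s, okS := source[id]
		t, okT := target[id]
		if !okS || !okT {
			failed = append(failed, id)
			continue
		}
		sim, err := cosine(s, t)
		if err != nil || sim <= threshold {
			failed = append(failed, id)
		}
	}
	return failed
}

func main() {
	source := map[string][]float32{"a": {1, 0}, "b": {0.6, 0.8}, "c": {0, 1}}
	target := map[string][]float32{"a": {1, 0}, "b": {0.8, 0.6}} // "c" never arrived
	fmt.Println("failed IDs:", validateSample([]string{"a", "b", "c"}, source, target, 0.999))
}
```

The threshold is passed in rather than hard-coded so the same function can serve both a quick sample and a stricter audit.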
Go-Native Performance Boosts: To keep validation from becoming a bottleneck, we lean hard into Go's low-level efficiency:
- Zero-Copy Slicing: We pass vector data using slice headers. We never copy the actual float arrays in memory, making data movement essentially "free."
- Worker Pools: We use a bounded pool of workers to handle the math concurrently without overwhelming the system or the Go scheduler.
- Batch Processing: We fetch IDs in batches of 250+ to minimize network Round Trip Time (RTT), which is almost always the real performance killer.
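To make the zero-copy point concrete, here is a small sketch of slicing a batch into per-vector views. It assumes a fetched batch arrives as one flat []float32 buffer of fixed-dimension vectors, which may not match how the adapters actually lay out data; splitVectors is a hypothetical helper.

```go
package main

import (
	"errors"
	"fmt"
)

// splitVectors returns one slice per vector. Each slice is only a header
// (pointer, length, capacity) pointing into flat; no floats are copied.
func splitVectors(flat []float32, dim int) ([][]float32, error) {
	if dim <= 0 || len(flat)%dim != 0 {
		return nil, errors.New("buffer length must be a positive multiple of dim")
	}
	views := make([][]float32, 0, len(flat)/dim)
	for i := 0; i < len(flat); i += dim {
		// The three-index slice caps capacity at dim, so an append on one
		// view reallocates instead of overwriting the next vector.
		views = append(views, flat[i:i+dim:i+dim])
	}
	return views, nil
}

func main() {
	flat := []float32{1, 2, 3, 4, 5, 6}
	views, err := splitVectors(flat, 2)
	if err != nil {
		fmt.Println(err)
		return
	}
	// Writing through a view changes the shared buffer, showing that the
	// views alias the original memory rather than holding copies.
	views[1][0] = 42
	fmt.Println(flat[2], len(views))
}
```

The trade-off is aliasing: any stage that mutates a view also mutates the batch buffer, so downstream workers should treat the views as read-only.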
When designing the MCP server's rate limiting, we initially used a simple map to track request buckets per IP/User. However, we quickly realized a critical flaw: a map grows indefinitely, leading to memory leaks over time as stale users never get cleaned up.
Why we chose the LRU cache over background sweepers:
- No Background Jitter: A standard approach is to run a background goroutine ticking every minute to delete old buckets. We rejected this because background tasks introduce unpredictable CPU jitter and complicate graceful shutdowns.
- Strict Memory Ceiling: By using a Least-Recently-Used (LRU) cache built on Go's standard-library container/list, we enforce a hard limit on the number of tracked buckets (e.g., 10,000 users).
- Passive Tail Eviction: Instead of sweeping the whole map, every incoming request simply checks the oldest item at the tail of the linked list. If that item has expired, we delete it. This $O(1)$ check spreads the cost of cleanup across requests, keeping memory bounded without background goroutines.
Decoding JSON configurations into strongly-typed languages like Go is notorious for friction. When Go's encoding/json decodes into interface{}, every JSON number becomes a float64, which leads to fragile interface{} type assertions and cryptic panics when code expects an int.
Why we adopted mapstructure:
- Weakly Typed Resilience: Instead of fighting the JSON spec, we integrated github.com/mitchellh/mapstructure and enabled WeaklyTypedInput. This allows the orchestrator to dynamically coerce floats, strings, and ints into explicit Go structs without crashing.
- Decoupling Logic: This enforces a strict Clean Architecture boundary. MCP tools no longer parse raw maps; they define a strict Input Struct, decode once at the edge, and run business logic on typed values.
VectorMigrate exposes capabilities via MCP for AI assistant integration.
Get the current status and progress of a migration.
Input:
{
  "migration_id": "mig-123"
}
Output:
{
  "migration_id": "mig-123",
  "status": "in_progress",
  "progress": {
    "total_records": 10000,
    "migrated_records": 5432,
    "percentage": 54.32
  },
  "batches_processed": 54,
  "started_at": "2026-02-22T10:00:00Z",
  "ended_at": null
}
List all migrations with optional filtering and pagination.
Input:
{
  "status": "in_progress",
  "limit": 10,
  "offset": 0,
  "sort_by": "created_at",
  "sort_order": "desc"
}
Output:
{
  "migrations": [
    {
      "migration_id": "mig-123",
      "status": "in_progress",
      "created_at": "2026-02-22T10:00:00Z",
      "progress": {
        "total": 10000,
        "current": 5432,
        "percent": 54.32
      }
    }
  ],
  "total": 1,
  "limit": 10,
  "offset": 0
}
Get schema mapping recommendations for database migrations.
Input:
{
  "source_type": "pinecone",
  "target_type": "qdrant",
  "source_schema": {
    "id": "string",
    "title": "string",
    "custom_field": "text"
  }
}
Output:
{
  "source_type": "pinecone",
  "target_type": "qdrant",
  "field_mappings": [
    {
      "source_field": "id",
      "target_field": "id",
      "confidence": 1.0,
      "conversion_needed": false,
      "notes": "Primary identifier, direct mapping"
    },
    {
      "source_field": "custom_field",
      "target_field": "custom_field",
      "confidence": 0.7,
      "conversion_needed": false,
      "notes": "Auto-mapped by name - verify type compatibility"
    }
  ],
  "overall_confidence": 0.9,
  "warnings": [
    "Pinecone flat metadata will be flattened in Qdrant with dot notation"
  ]
}
- ✅ API Key Authentication - Bearer token in Authorization header
- ✅ Rate Limiting - 100 requests/minute per API key
- ✅ Audit Logging - All requests logged with masked keys
- ✅ Constant-Time Comparison - Prevents timing attacks
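As an illustration of the constant-time check and key masking listed above, here is a minimal sketch built on crypto/subtle. The validBearer and maskKey functions are hypothetical and may differ from the project's middleware; hashing both values first is one common way to compare inputs of different lengths without leaking length through timing.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"fmt"
	"strings"
)

// validBearer reports whether an Authorization header carries the expected
// API key. Both sides are hashed to fixed-length digests before comparison.
func validBearer(header, expectedKey string) bool {
	const prefix = "Bearer "
	if expectedKey == "" || !strings.HasPrefix(header, prefix) {
		return false
	}
	got := sha256.Sum256([]byte(strings.TrimPrefix(header, prefix)))
	want := sha256.Sum256([]byte(expectedKey))
	return subtle.ConstantTimeCompare(got[:], want[:]) == 1
}

// maskKey keeps a short prefix for audit logs and hides the rest.
func maskKey(key string) string {
	if len(key) <= 4 {
		return "****"
	}
	return key[:4] + "****"
}

func main() {
	fmt.Println(validBearer("Bearer your-secret-key", "your-secret-key"))
	fmt.Println(validBearer("Bearer wrong-key", "your-secret-key"))
	fmt.Println(maskKey("your-secret-key"))
}
```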
internal/state/ - State persistence (SQLite)
internal/adapters/ - Database adapters (Pinecone, Qdrant, Weaviate)
internal/mapper/ - Schema mappers
internal/mcp/ - MCP protocol implementation
internal/mcp/tools/ - MCP tools (status, list, schema)
internal/orchestrator/ - Migration orchestration
cmd/vectormigrate/ - CLI commands
┌─────────────┐
│   CLI/UI    │
└──────┬──────┘
       │
┌──────▼──────┐
│     MCP     │ ← HTTP + JSON-RPC 2.0
│   Server    │
└──────┬──────┘
       │
┌──────▼──────┐
│ Orchestrator│ ← Coordinates migration
└──────┬──────┘
       │
┌──────┴──────┐
│Source Target│
│  DB     DB  │
└─────────────┘
| From → To | Pinecone | Qdrant | Weaviate | Milvus |
|---|---|---|---|---|
| Pinecone | - | ✅ | ✅ | 🔄 |
| Qdrant | ✅ | - | 🔄 | 🔄 |
| Weaviate | ✅ | 🔄 | - | 🔄 |
| Milvus | 🔄 | 🔄 | 🔄 | - |
Legend:
- ✅ Fully implemented + tested
- 🔄 Planned (generic path available)
go test ./... -v
# Start server in background
./vectormigrate serve --api-key test-key &
# Run test suite
./scripts/test-mcp.sh
- ✅ MCP protocol (JSON-RPC 2.0)
- ✅ Authentication middleware
- ✅ Rate limiting
- ✅ Audit logging
- ✅ All 3 MCP tools
- ✅ State tracker (SQLite)
- ✅ Database adapters
# Start MCP server
./vectormigrate serve --api-key my-key
# In another terminal, start migration
./vectormigrate migrate mig-pinecone-to-qdrant \
--source-type pinecone \
--source-url https://api.pinecone.io \
--source-api-key $PINECONE_API_KEY \
--source-index production \
--target-type qdrant \
--target-url http://localhost:6333 \
--target-index production \
--batch-size 100
# Monitor progress
watch -n 2 './vectormigrate status mig-pinecone-to-qdrant'
curl -X POST http://localhost:8080 \
-H "Authorization: Bearer my-key" \
-H "Content-Type: application/json" \
-d '{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "schema_recommendation",
  "params": {
    "source_type": "pinecone",
    "target_type": "weaviate",
    "source_schema": {
      "document_id": "string",
      "chunk_text": "text",
      "embedding": "vector",
      "metadata": "object"
    }
  }
}' | jq .
- State tracker (SQLite backend)
- Database adapters (Pinecone, Qdrant, Weaviate)
- Schema mapper (Pinecone↔Qdrant)
- Migration orchestrator
- MCP server (HTTP + JSON-RPC 2.0)
- Authentication middleware
- Rate limiting
- Audit logging
- migration_status tool
- list_migrations tool
- schema_recommendation tool
- Integration tests
- start_migration tool
- stop_migration tool
- validate_migration tool
- Prometheus metrics
- Grafana dashboards
- Distributed tracing
- Health checks
- Documentation site
- Never commit API keys - Use environment variables
- Use strong API keys - Minimum 32 characters
- Enable audit logging - Track all operations
- Rate limit aggressively - Prevent abuse
- Validate inputs - SQL injection prevention
- ✅ SOC 2 ready (audit trails)
- ✅ GDPR compliant (data residency)
- ✅ HIPAA ready (encryption at rest)
# Clone repository
git clone https://github.com/AlphaTechini/vector-db-migration.git
cd vector-db-migration
# Install dependencies
go mod download
# Run tests
go test ./...
# Build binary
go build -o vectormigrate ./cmd/vectormigrate
- Create feature branch (feature/my-feature)
- Make changes with tests
- Run go test ./... (must pass)
- Run go fmt ./... (format code)
- Submit PR with description
- One feature per file (<200 lines each)
- One commit per feature
- Interfaces first, implementations second
- Tests written WITH implementation
- No debugging marathons (>1hr → stop & reassess)
- First Principles Design - Architecture decisions
- MCP First Principles - MCP integration plan
- Market Analysis - Why this tool exists
- Schema Comparison - Database differences
- Roadmap - Development timeline
Built with inspiration from:
- Pinecone - Vector database pioneer
- Qdrant - High-performance open-source
- Weaviate - GraphQL-native vector DB
- Milvus - Scalable vector database
MIT License - see LICENSE file for details.
Built with ❤️ by AlphaTechini