Note: This tool was developed for Storj Labs' internal migration needs and is being open-sourced to potentially benefit others. It successfully migrated 8 large-scale databases in late 2024 and early 2025. However, it currently requires polishing, generalization, additional testing, and documentation before it would be suitable for general use. We're sharing it in its current state for those who might find it useful as a reference or starting point for their own migration projects. As our migration is complete, we don't anticipate significant ongoing development.
A specialized tool developed by Storj Labs for migrating databases from CockroachDB to Google Cloud Spanner with a focus on reliability and data integrity.
- Data streaming: Captures data changes from CockroachDB using changefeeds
- Reliable transfer: Processes changes through Google PubSub
- Parallel processing: Configurable worker pools for efficient batch processing
- Validation: Tools to verify migration correctness and compare data between sources
- Metrics: Collection and monitoring of migration progress
- Graceful shutdown: Support for safely stopping and resuming migrations
The migration process works through these main components:
- Changefeed: Manages CockroachDB changefeeds to capture data changes
- PubSub: Routes change events through Google PubSub
- Worker Pool: Processes events in batches
- Spanner Writer: Persists changes to the target Spanner database
- Validation: Compares source and target data for integrity
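The flow through these components can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tool's implementation: the event shape (`key`, `updated`, `after`), the `InMemoryTarget` class, and the `process` function are all hypothetical. A plain list stands in for PubSub and a dict stands in for Spanner. Because batches run in parallel, the sketch applies an event only when its timestamp is newer than what is already stored, which is one common way to tolerate out-of-order delivery.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from threading import Lock


def batched(events, batch_size):
    """Yield consecutive lists of at most batch_size events."""
    it = iter(events)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


class InMemoryTarget:
    """Hypothetical stand-in for the Spanner writer."""

    def __init__(self):
        self._versions = {}  # key -> (updated, row or None for a delete)
        self._lock = Lock()

    def write_batch(self, batch):
        # Last-writer-wins: keep an event only if it is newer than the stored one.
        with self._lock:
            for event in batch:
                current = self._versions.get(event["key"])
                if current is None or event["updated"] > current[0]:
                    self._versions[event["key"]] = (event["updated"], event["after"])
        return len(batch)

    def rows(self):
        # Deleted rows are kept internally as tombstones and hidden here.
        return {k: row for k, (_, row) in self._versions.items() if row is not None}


def process(events, target, batch_size=2500, workers=24):
    """Split events into batches and write them with a pool of workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(target.write_batch, batched(events, batch_size)))
```

The defaults for `batch_size` and `workers` mirror the documented defaults of `--batch-size` and `--workers`; everything else is illustrative.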
Prerequisites:
- Access to the source CockroachDB database
- Google Cloud project with Spanner instance
- Google Cloud service account with appropriate permissions
- Google Cloud PubSub setup
Create a YAML configuration file (default: migration.yaml) or set the environment variable STORJ_MIGRATION_CONFIG:
cockroach: postgresql://USER:PASSWORD@your-cockroach-host:26257/{{DB}}?sslmode=verify-full&sslrootcert=/path/to/ca.crt
credential: /path/to/service-account.json
spanner: projects/your-project/instances/your-instance/databases/{{DB}}
project: your-gcp-project
topic: prefix-{{TABLE}}
prefix: your-prefix
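The `{{DB}}` and `{{TABLE}}` placeholders in the configuration suggest that one file can serve several databases and tables. How the tool expands them is not documented here, so the following is only a sketch that assumes plain string substitution; the `expand` function and the example values are hypothetical.

```python
import re

# Matches placeholders of the form {{NAME}}.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def expand(template, values):
    """Replace each {{NAME}} in template with values[NAME].

    Raises KeyError if a placeholder has no value, so a typo in the
    configuration fails loudly instead of producing a wrong connection string.
    """
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError("no value for placeholder " + match.group(0))
        return values[name]

    return PLACEHOLDER.sub(substitute, template)


if __name__ == "__main__":
    spanner = "projects/your-project/instances/your-instance/databases/{{DB}}"
    print(expand(spanner, {"DB": "example_db"}))
    print(expand("prefix-{{TABLE}}", {"TABLE": "your_table"}))
```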
# Process changefeeds from PubSub to Spanner for a specific table
./spanner-migration process --table your_table \
--batch-size 2500 \
--workers 24
# Create a changefeed
./spanner-migration changefeed create --table your_table
# List running changefeeds
./spanner-migration changefeed list
# Cancel a changefeed
./spanner-migration changefeed cancel --id your_changefeed_id
# Compare data between CockroachDB and Spanner
./spanner-migration validation compare --table your_table
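To give a sense of what comparing data between the two databases involves, here is a minimal sketch of a key-by-key comparison. It is not the algorithm used by the validation command, which is not described here. It assumes both sides have already been read into dictionaries keyed by primary key; `compare_rows` and the sample rows are hypothetical.

```python
def compare_rows(source, target):
    """Compare two {primary_key: row} mappings.

    Returns the keys missing from the target, the keys present only in the
    target, and the keys whose rows differ between the two sides.
    """
    missing = sorted(k for k in source if k not in target)
    extra = sorted(k for k in target if k not in source)
    mismatched = sorted(
        k for k in source if k in target and source[k] != target[k]
    )
    return {"missing": missing, "extra": extra, "mismatched": mismatched}


if __name__ == "__main__":
    cockroach_rows = {1: ("a", 10), 2: ("b", 20), 3: ("c", 30)}
    spanner_rows = {1: ("a", 10), 2: ("b", 21), 4: ("d", 40)}
    print(compare_rows(cockroach_rows, spanner_rows))
```

For tables too large to hold in memory, the same idea is usually applied to sorted key ranges or to per-range checksums rather than to whole tables at once.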
- process: Process changefeeds from PubSub to Spanner
- persist: Persist a single changefeed file
- metrics: Monitor migration progress
- pubsub: Manage PubSub topics and subscriptions
- spanner: Manage Spanner database operations
- changefeed: Create and manage CockroachDB changefeeds
- validation/validate: Compare and validate data integrity
- table-list: List supported tables
- cockroach: Perform CockroachDB operations
The migration tool has several parameters that can be tuned:
- --batch-size: Number of records to batch together (default: 2500)
- --workers: Number of parallel workers (default: 24)
- --commit-delay: Delay between commits (default: 300ms)
- --commit-timeout: Maximum time to wait for a commit (default: 60s)
- --max-retries: Maximum number of retry attempts (default: 10)
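As an illustration of how a retry limit such as `--max-retries` typically bounds work on a failing commit, the sketch below retries a callable a fixed number of times. The tool's actual retry and backoff policy is not documented here: `commit_with_retry`, `TransientError`, and the fixed `retry_delay` pause are assumptions made for the example, and `retry_delay` is not meant to model `--commit-delay`.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def commit_with_retry(commit, max_retries=10, retry_delay=0.3, sleep=time.sleep):
    """Call commit() until it succeeds or max_retries retries are used up.

    The first call is not counted as a retry, so commit() runs at most
    max_retries + 1 times. The last TransientError is re-raised on failure.
    """
    retries = 0
    while True:
        try:
            return commit()
        except TransientError:
            if retries >= max_retries:
                raise
            retries += 1
            sleep(retry_delay)  # fixed pause; real systems often back off
```

The `sleep` parameter exists only so the pause can be replaced in tests.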
Licensed under the GNU Affero General Public License v3.0 (AGPL-3.0).
Copyright (C) 2024 Storj Labs, Inc.