An initial loader prototype for the web portal of the Alliance of Genome Resources.
- Docker
- Docker-compose
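To confirm both prerequisites are installed and on your `PATH` (the loader does not pin specific versions here, so any reasonably recent release should do):

```bash
docker --version
docker-compose --version
```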
- Build the local image with `make build`.
- Start the Neo4j database with `make startdb`. Allow ~10 seconds for Neo4j to initialize.
- To initialize an empty database after previously using the loader, be sure to run `make removedb` before running `make startdb`.
- Ensure that your local Docker installation has access to at least 5 GB (preferably 8 GB) of memory, or the `run_test` target will fail with the non-intuitive error "Cannot resolve address 'neo4j'". This can be set in the Docker preferences; see the check below.
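A quick way to check how much memory the Docker daemon has been granted (the output format varies slightly between Docker versions, but either command below should work on a standard install):

```bash
# Total memory available to the Docker daemon, in bytes.
docker info --format '{{.MemTotal}}'

# Or grep the human-readable summary; look for at least 5 GiB here.
docker info | grep -i 'total memory'
```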
- Initialize a full load with `make run`.
- Alternatively, `make run_test` will launch a much smaller test load; this is useful for development and testing.
- Once the loader has been run (either test load or full load), unit tests can be executed via `make unit_tests`.
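Putting the steps above together, a typical development cycle against the test data set looks roughly like this (the 10-second pause mirrors the Neo4j initialization note in the installation steps):

```bash
# Build the image, start Neo4j, give it time to come up,
# then run the small test load and the unit tests.
make build
make startdb
sleep 10
make run_test
make unit_tests
```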
- From your command line: `docker exec -ti neo4j bin/cypher-shell`
- A quick command to count the number of nodes in your db: `match (n) return count(n);`
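Queries can also be piped into `cypher-shell` non-interactively, which is handy for quick checks from scripts. A small sketch (the `Gene` label is an assumption about the loaded schema; use `db.labels()` to see what your load actually produced):

```bash
# Count all nodes without opening an interactive shell (-i only, no TTY, so stdin can be piped).
echo 'MATCH (n) RETURN count(n);' | docker exec -i neo4j bin/cypher-shell

# List the node labels present in the database, then count one of them.
echo 'CALL db.labels();' | docker exec -i neo4j bin/cypher-shell
echo 'MATCH (g:Gene) RETURN count(g);' | docker exec -i neo4j bin/cypher-shell  # Gene label assumed
```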
- Remove the database with `make removedb`.
- `make reload` will re-run the Installation and Running the Loader steps from above. `make reload_test` will re-run the same steps using a test subset of data. Note: `reload_test` will not re-download the file bolus.
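For reference, `make reload_test` behaves roughly like chaining the individual targets documented above (this is a sketch of the described behavior, not a transcript of the actual Makefile recipe):

```bash
# Tear down the old database, bring up a fresh Neo4j, and re-run the test load.
make removedb
make startdb
sleep 10
make run_test   # reload_test reuses previously downloaded files rather than re-fetching them
```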
- There are three loader configurations that come with the system (in `src/config`): `default.yml`, `develop.yml`, and `test.yml`. Each is set up for a particular environment and differs in the default number of threads used for downloading files and for loading the database. `test.yml` is used when running the load against the test data set. `default.yml` is the configuration used on all the shared systems and on production. `develop.yml` is used for the full data set on a development system. Each can be modified to add or remove data types (e.g. Allele, BGI, Expression) and subtypes (e.g. ZFIN, SGD, RGD) as needed for development purposes; see the comparison sketch after this list.
- When adding a new data load, be sure to add it to `validation.yml` as well, so the system knows the expected data types and subtypes.
- `local_submission_system.json` is a file consumed in addition to the submission system data (from the submission system API); it is used to customize non-submission-system files, such as ontology files.
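A quick way to see how the shipped configurations differ is simply to compare them; the file names below are the ones listed above, so this should work from the repository root:

```bash
# List the shipped configurations, then compare the test config against the
# full development config to see which data types/subtypes and thread counts differ.
ls src/config/
diff src/config/test.yml src/config/develop.yml
```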
- `DOWNLOAD_HOST` - the S3 bucket from which files are pulled.
- `ALLIANCE_RELEASE` - the release version that this code acts on.
- `FMS_API_URL` - the host from which this code pulls its available file paths (the submission system host). Note: the submission system host relies on the ferret file grabber; that pipeline is responsible for keeping ontology files and GAF files up to date. In addition, the submission system requires a snapshot to be taken in order to fetch 'latest' files.
- `TEST_SCHEMA_BRANCH` - if set, that branch of agr_schema will be used instead of master.
- If the site is built with docker-compose, these variables are automatically set to their 'dev' versions. For manual runs, see the example below.
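For manual runs outside docker-compose, the variables can be exported in the shell before invoking the make targets, assuming the Makefile forwards your environment to the loader container (check the Makefile if in doubt). The values below are placeholders, not real endpoints or release numbers:

```bash
# Placeholder values for illustration only; substitute the bucket, release,
# and FMS endpoint appropriate for your environment.
export DOWNLOAD_HOST="download.example.org"
export ALLIANCE_RELEASE="0.0.0"
export FMS_API_URL="https://fms.example.org"
export TEST_SCHEMA_BRANCH="my-schema-branch"   # optional; master is used when unset

make run_test
```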