BioSamples search is a Spring Boot application that uses Elasticsearch for full-text search, filtering, and faceting.
The project contains 2 modules: proto and server. The following list shows important directories in the project.
- proto - protobuf definitions and generated code
- server - server application exposing search endpoints
  - model - core biosamples model
  - filter - filtering related code
  - facet - faceting related code
- helm - cicd, k8s deployment
- k8s - other related deployment files (ES, PV, ..)
- docs - further documentation
Three main APIs are exposed by the application.
- Search samples (POST, GRPC)
- Search samples streaming (GRPC)
- Get facets for search (POST, GRPC)
BioSamples core services use GRPC to communicate with biosamples-search. The RESTful services are implemented mainly for testing and development purposes.
- Java 24
./gradlew build
# build without unit and integration tests
./gradlew build -x test -x check
# build only proto module
./gradlew :proto:build

curl --location 'http://localhost:8080/search' \
--header 'Content-Type: application/json' \
--data '{
"text": "soil",
"filters": [
{
"type": "attr",
"field": "env_medium",
"values": ["Soil"]
},
{
"type": "attr",
"field": "locus_tag_prefix",
"values": ["SM2"]
},
{
"type": "acc",
"accession": "SAMD00000364"
},
{
"type": "dt",
"field": "create",
"from": "2014-04-21T00:00:00Z",
"to": "2014-04-22T05:00:00Z"
}
],
"page": 0,
"size": 3,
"sort": [
{
"direction": "DESC",
"field": "create"
}
]
}'

curl --location 'http://localhost:8080/facet' \
--header 'Content-Type: application/json' \
--data '{
"text": "live",
"filters": [
]
}'

Currently, two faceting strategies are implemented. The default implementation, RegularFacetingStrategy, can be slow because of the large number of attributes in the BioSamples database.
The SamplingFacetingStrategy uses a sampling method to get facets from all shards faster, but it does not provide exact facet counts.
It is possible to limit the set of attributes to be faceted for even faster results. This is left as a future enhancement.
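
To make the trade-off concrete, here is a minimal sketch of what a sampled facet query restricted to a fixed set of attributes could look like at the Elasticsearch level. It is an illustration, not the code behind SamplingFacetingStrategy: this document does not say how the sampling is done, so the sketch assumes Elasticsearch's `sampler` aggregation, which limits its sub-aggregations to the top-scoring `shard_size` documents on each shard. The function name `sampled_facet_body`, the index name `samples`, the `organism` field, and the sample size are all hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: the index name, field names, and sample size used here are
# hypothetical and are not taken from the biosamples-search code.

# Build an Elasticsearch request body that facets on a fixed list of
# attributes, computing each facet on a per-shard sample of the hits.
# Usage: sampled_facet_body TEXT SHARD_SIZE FIELD [FIELD...]
# TEXT and FIELD are inserted verbatim, so they must not contain double quotes.
sampled_facet_body() {
  local text="$1" shard_size="$2"
  shift 2
  local aggs="" field
  for field in "$@"; do
    # One terms aggregation per requested attribute, comma-separated.
    [ -n "$aggs" ] && aggs="$aggs,"
    aggs="$aggs\"$field\":{\"terms\":{\"field\":\"$field\",\"size\":10}}"
  done
  printf '{"size":0,"query":{"query_string":{"query":"%s"}},' "$text"
  printf '"aggs":{"sample":{"sampler":{"shard_size":%s},"aggs":{%s}}}}\n' \
    "$shard_size" "$aggs"
}

body="$(sampled_facet_body live 200 env_medium organism)"
echo "$body"

# To try it against a local Elasticsearch (assumed at localhost:9200, with a
# hypothetical index named "samples"):
#   curl --location 'http://localhost:9200/samples/_search' \
#     --header 'Content-Type: application/json' --data "$body"
```

Because each shard counts values only within its own sample, the bucket counts describe the sample rather than the full result set, which is why a sampling strategy cannot report exact facet counts. Passing fewer fields to the function reduces the number of aggregations Elasticsearch has to compute.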