BioSamples Search

BioSamples Search is a Spring Boot application that uses Elasticsearch for full-text search, filtering, and faceting.

Project Structure

The project contains 2 modules: proto and server. The following list shows important directories in the project.

proto - protobuf definitions and generated code
server - server application exposing the search endpoints
- model - core BioSamples model
- filter - filtering-related code
- facet - faceting-related code
helm - CI/CD and Kubernetes deployment
k8s - other deployment files (Elasticsearch, persistent volumes, etc.)
docs - further documentation

API

Three main APIs are exposed by the application.

Search samples (POST, gRPC)
Search samples streaming (gRPC)
Get facets for search (POST, gRPC)

The BioSamples core services use gRPC to communicate with biosamples-search. The RESTful endpoints exist mainly for testing and development.

Build

Requirements

Java 24

./gradlew build
# build without unit and integration tests
./gradlew build -x test -x check
# build only proto module
./gradlew :proto:build
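
Because the build requires Java 24, it can help to check the installed JDK before invoking Gradle. The following is a minimal sketch, not part of the project: the function names `java_major` and `require_java` are hypothetical, and it assumes that JDKs newer than 24 are also acceptable, which this README does not state.

```shell
#!/bin/sh
# Extract the major version from `java -version` output read on stdin,
# e.g. 'openjdk version "24.0.1" 2025-04-15' yields 24.
java_major() {
  sed -n 's/.*version "\([0-9]*\).*/\1/p' | head -n 1
}

# Fail with a message if the installed Java is older than the given major version.
# Assumption: newer JDKs are fine; change -lt to -ne to require an exact match.
require_java() {
  major=$(java -version 2>&1 | java_major)
  if [ "${major:-0}" -lt "$1" ]; then
    echo "Java $1 or newer required, found: ${major:-none}" >&2
    return 1
  fi
}

# Usage:
#   require_java 24 && ./gradlew build
```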

Search samples

POST

curl --location 'http://localhost:8080/search' \
--header 'Content-Type: application/json' \
--data '{
    "text": "soil",
    "filters": [
      {
        "type": "attr",
        "field": "env_medium",
        "values": ["Soil"]
      },
      {
        "type": "attr",
        "field": "locus_tag_prefix",
        "values": ["SM2"]
      },
      {
        "type": "acc",
        "accession": "SAMD00000364"
      },
      {
        "type": "dt",
        "field": "create",
        "from": "2014-04-21T00:00:00Z",
        "to": "2014-04-22T05:00:00Z"
      }
    ],
    "page": 0,
    "size": 3,
    "sort": [
      {
        "direction": "DESC",
        "field": "create"
      }
    ]
}'
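
The `page` and `size` fields in the request above suggest page-based result retrieval. The sketch below wraps the call in two small shell functions so that successive pages can be requested. It rests on several assumptions: the helper names `search_body` and `search_page` are hypothetical, `page` is taken to be zero-based (the example uses `0`), and the endpoint is assumed to accept an empty `filters` list and a request without `sort`, none of which this README confirms.

```shell
#!/bin/sh
# Base URL from the example above; override BASE_URL for other environments.
BASE_URL="${BASE_URL:-http://localhost:8080}"

# Print a text-only search request body for one page.
# Assumption: the text needs no JSON escaping (no quotes or backslashes).
search_body() {
  printf '{"text": "%s", "filters": [], "page": %d, "size": %d}' "$1" "$2" "$3"
}

# Send the body to /search and print the raw response.
search_page() {
  search_body "$1" "$2" "$3" |
    curl --silent --location "$BASE_URL/search" \
      --header 'Content-Type: application/json' \
      --data @-
}

# Usage (requires a running server):
#   search_page soil 0 3   # first page of three results
#   search_page soil 1 3   # next page, if pages are zero-based as assumed
```

The response is printed as returned, since its format is not described here.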

Get facets for search

POST

curl --location 'http://localhost:8080/facet' \
--header 'Content-Type: application/json' \
--data '{
    "text": "live",
    "filters": [
    ]
}'
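
The facet request above uses an empty `filters` list. As a sketch, the functions below build a facet request that narrows the search with one attribute filter. This assumes the facet endpoint accepts filter objects in the same shape as the search endpoint, which this README implies but does not state. The helper names `attr_filter`, `facet_body` and `facets` are hypothetical.

```shell
#!/bin/sh
# Base URL from the example above; override BASE_URL for other environments.
BASE_URL="${BASE_URL:-http://localhost:8080}"

# Print one attribute filter object, in the shape used by the search example.
# Assumption: field and value need no JSON escaping.
attr_filter() {
  printf '{"type": "attr", "field": "%s", "values": ["%s"]}' "$1" "$2"
}

# Print a facet request body: $1 is the search text, $2 a comma-separated
# list of filter objects (may be empty).
facet_body() {
  printf '{"text": "%s", "filters": [%s]}' "$1" "$2"
}

# Send the body to /facet and print the raw response.
facets() {
  facet_body "$1" "$2" |
    curl --silent --location "$BASE_URL/facet" \
      --header 'Content-Type: application/json' \
      --data @-
}

# Usage (requires a running server):
#   facets live "$(attr_filter env_medium Soil)"
```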

Facets

Two faceting strategies are currently implemented. The default, RegularFacetingStrategy, can be slow because of the large number of attributes in the BioSamples database. SamplingFacetingStrategy uses sampling to collect facets from all shards faster, but its facet counts are approximate rather than exact. Limiting the set of attributes to be faceted would make results faster still; this is left as a future enhancement.
