ESGF_Search_API_Obsolete

ESGF Search API

Scope

This document describes the ESGF Search Services API (Application Programming Interface), that is, all the information a client application needs to send a search query to the service and to interpret the response sent back by the service.

Note that the document does NOT include requirements for a browser user interface, or a desktop tool, that acts as a client to the API (although the requirements of such clients do inform the definition of the API). Note also that the target of a search is always metadata (specifically, a text document that contains information about matching search results), not the resources themselves (datasets, files, etc.). In other words, an ESGF search is not meant to return a binary stream of data.

Actors

The ESGF search API is meant to be used by the following actors:

Humans using a web browser interface, interacting with the interface in real time
Humans using a rich desktop client, or a command line utility, interacting with the client in real time
Batch Jobs executing data discovery at regular intervals

Currently, the API is not meant to support massive harvesting of records from one ESGF site to another, which is accomplished through different protocols and services.

Example Queries

The following are representative examples of the kinds of queries that an ESGF search service must be able to resolve. Unless otherwise specified, these queries are expected to be performed by humans using either a web browser or a desktop client. For each query, we report the syntax (described later) that can be used to execute it.

Find all datasets about "climate change"
- ?query=climate+change
Find all files for experiment=decadal2000, variable=Specific Humidity, frequency=monthly, model=CanCM4
- ?experiment=decadal2000&variable=Specific+Humidity&time_frequency=mon&model=CanCM4
Find a specific dataset by id, return all possible metadata
- ?id=my.test.dataset&fields=*
Find all files belonging to a given dataset
- ?type=File&dataset_id=my.test.dataset
Find a specific file by id, or tracking id, or filename, etc.
- ?type=File&tracking_id=abc123
Find all datasets (or files) that have changed since a given date
- ?type=Dataset&from=2010-01-01T00:00:00Z
Find all datasets for which the id is matching a wildcard expression
- ?id=my.test.* (or ?query=id:my.test.* ?)
Find all datasets within a latitude/longitude bounding box
- ?bbox=-111.032,42.943,-119.856,43.039
Find all datasets within a certain time extent
- ?start=2007-02-12T04:30:02Z&end=2007-03-11T02:28:00Z
Find all datasets for a certain named geographic region
- ?location=Arctic
Find all available facets names that can be used to query for results
- ?facets=*
Find all possible values of a given facet that apply to search results (whether or not it is a controlled vocabulary)
- ?facets=experiment
Find all replicas of a dataset that exist in the system
- ?

Requests

A request to an ESGF-compliant search engine is expressed as an HTTP GET/POST request, which is composed of a base URL and additional query parameters. The general syntax of a request (in the GET case) is as follows:

http://<base search url>/?[keyword parameters as (name, value) pairs][facet parameters as (name,value) pairs]

or more explicitly:

http://<base search url>/?[query=...][offset=...][limit=...][type=...][format=...][facets=...][fields=...][lat,lon,radius,polygon,location=...][start,end=...][from,to=...][facet1=value1][facet2=value2][...]

Note that the <base search url> is totally opaque as to the search semantics, i.e. the full search specification must be encoded as part of the query parameters (this is un-RESTful but self-consistent and more scalable...). At the discretion of the site, the <base search url> may contain the version of the supported ESGF search API, for example: <base search url> = http://hostname/search/v1/

Note also that all parameters are optional, i.e. a simple request to <base search url> with no parameters at all should return a response document corresponding to the default values of all query parameters.

The value of all parameters must be URL-encoded, so that the complete search URL is well formed.
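As a rough illustration of the request syntax above, the following Python sketch assembles a GET URL from a base search URL and a set of (name, value) pairs, letting the standard library handle the URL-encoding. The base URL and the helper name build_search_url are assumptions made for the example, not part of the API.

```python
from urllib.parse import urlencode

# Hypothetical base search URL, following the versioned form mentioned above.
BASE_SEARCH_URL = "http://hostname/search/v1/"


def build_search_url(base_url, params):
    """Return base_url followed by the URL-encoded query parameters.

    An empty params dict yields the bare base URL, i.e. a request that
    relies on the default value of every parameter.
    """
    if not params:
        return base_url
    # urlencode escapes each name and value (a space becomes "+").
    return base_url + "?" + urlencode(params)


url = build_search_url(
    BASE_SEARCH_URL,
    {"query": "climate change", "type": "Dataset", "limit": 10},
)
print(url)
# prints: http://hostname/search/v1/?query=climate+change&type=Dataset&limit=10
```

Keyword and facet parameters can be mixed freely in the same dictionary, since both are sent as ordinary query parameters.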

Keyword parameters

Keyword parameters are query parameters that have reserved names, and are interpreted by the search engine for special purposes. The following keyword parameters are recognized at this time:

query (default: *): used to pass a free text constraint to the search engine, to match one or more fields

* The search engine is free to apply that constraint in any way that is appropriate to its back-end and holdings. For example, "query=cmip5" may be used by a search engine to match any of the metadata fields, or maybe just the title and description fields.

* Depending on what the search engine supports, the query value may include special characters such as "*", "?", "!" that are interpreted as query modifiers

* It is also highly recommended that the search engine parse the query value and reject requests that contain dangerous characters such as ">","<","$" etc.

type (default: all types): the type of the returned record. The value of type must be chosen from the ESGF controlled vocabulary - currently the only allowed types are: Dataset, File and Aggregation.
format (default: site specific): the format of the returned response document, encoded as the document mime type

* Each search engine is free to return documents in its default format of choice, if a format is not explicitly requested 
* If a search engine cannot support a requested format, a 501 HTTP response ("Not Implemented") should be returned 
* Examples: format=application/atom+xml, format=application/solr+xml, format=application/esgf+json

offset (default: 0): the starting index for the returned results
limit (default: site specific): the maximum number of returned results. The search engine is also free to override this value with a maximum number of records it is willing to serve for each request.
lat, lon, bbox, location, radius, polygon (default: none): these parameters are used to perform a geo-spatial search according to the Open Search Geo extension specification.
start, end (default: none): these parameters are used to perform a temporal query according to the Open Search Time extension specification.

* The date and time values must be encoded in the format "YYYY-MM-DDTHH:mm:ssZ". 

* The start, end parameters refer to the data temporal coverage, not the metadata last update time stamp.

from, to (default: none): used as lower and upper limits of the last update time stamp for each record, for example to return only the newest records.

* The values must be encoded in the format "YYYY-MM-DDTHH:mm:ssZ".

facets (default: site specific): comma separated list of facets to be returned in the response. For each requested facet, the engine should return all the possible values and counts (if available) for that facet across all the records matching the query (not just the records returned in the current response document).

* Example: "query=co2&facets=experiment&offset=0&limit=10" will instruct the search engine to include in the response document all the possible values and counts of the field "experiment" for all records matching "co2" (not just the first 10 records).

* Example: "query=co2&facets=experiment&offset=10&limit=20" will return different records than the previous query, but the same values and counts for the "experiment" facet.

* Each engine is free to implement its default behavior for the case when the facets parameter is not specified: return no facets, return all facets, or a selected set of facets.

fields (default: site specific): used to specify which metadata fields should be included for each returned result, if available.
distrib (default: true): specifies if the search should be done in a distributed manner or only locally (if set to false).
replica (default: unset): Specifies if the search should return replicas only (true) or originals only (false). Unset to return both.
latest (default: unset): Specifies if the search should return the latest version only (true) or older versions only (false). Unset to return both.
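The following Python sketch shows one way a search engine might check a few of the keyword parameters listed above before running a query: applying the default for query, rejecting the dangerous characters named above, checking type against the allowed values, validating the date format, and capping limit. The function name check_keywords and the cap SITE_MAX_LIMIT are hypothetical, and the set of rejected characters is only the three given as examples; a real site would likely choose its own.

```python
from datetime import datetime

DANGEROUS_CHARS = set("<>$")  # only the examples named above
ALLOWED_TYPES = {"Dataset", "File", "Aggregation"}
DATE_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # i.e. YYYY-MM-DDTHH:mm:ssZ
SITE_MAX_LIMIT = 100  # hypothetical site-specific maximum


def check_keywords(params):
    """Return a cleaned copy of params, or raise ValueError on a bad value."""
    cleaned = dict(params)

    # query defaults to "*" and must not contain rejected characters.
    query = cleaned.get("query", "*")
    if any(ch in DANGEROUS_CHARS for ch in query):
        raise ValueError("query contains a rejected character")
    cleaned["query"] = query

    if "type" in cleaned and cleaned["type"] not in ALLOWED_TYPES:
        raise ValueError("unknown record type")

    # start/end and from/to share the same date and time encoding.
    for name in ("start", "end", "from", "to"):
        if name in cleaned:
            datetime.strptime(cleaned[name], DATE_FORMAT)  # raises ValueError

    # The engine may override limit with the maximum it is willing to serve.
    cleaned["offset"] = int(cleaned.get("offset", 0))
    requested = int(cleaned.get("limit", SITE_MAX_LIMIT))
    cleaned["limit"] = min(requested, SITE_MAX_LIMIT)
    return cleaned
```

For instance, check_keywords({"limit": "500"}) would return a limit of 100 under the cap assumed here, while a from value such as "20100101T00:00:00Z" would be rejected because it lacks the date separators.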

In addition to the standard keyword parameters, each engine is free to implement and process additional keyword parameters.

For example, an engine might be able to process the instruction "highlight=true" to highlight matching text in the search result.
Site specific keywords must not collide with the controlled vocabulary of ESGF facets.

Facet parameters

Any parameter which is not a keyword parameter (i.e. it has a name that is not one of the special names listed above) is interpreted by the system as a facet parameter, and used to apply a facet constraint to the query. Multiple facet parameters can be specified as part of the same request to limit the results space to the intersection of records matching all constraints (in other words, facet parameters are combined with a logical AND).

Note that:

At this time, for interoperability reasons (as well as security), the facet names must be chosen from a controlled vocabulary.
The facet value is used as-is to match the returned results, i.e. no regular expression matching is applied.
As for keyword parameter values, facet values must be properly URL-encoded.

Examples:

&experiment=decadal2000 : will match all records that have a metadata field with name="experiment" and value="decadal2000"
&cf_standard_name=Air+Temperature : will match records that have a metadata field with name="cf_standard_name" and value="Air Temperature"
&experiment=decadal* : will not likely match any record, since no records will have a metadata field "experiment" which is exactly set to "decadal*"
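The facet rules above can be sketched in Python as follows. The sketch separates facet parameters from keyword parameters by name, then keeps only the records that match every facet value exactly. Representing records as plain dictionaries, and the helper names split_params and matches_facets, are assumptions made for illustration; the check against the controlled vocabulary of facet names is left out.

```python
# Reserved names taken from the keyword parameter list above.
KEYWORD_NAMES = {
    "query", "type", "format", "offset", "limit",
    "lat", "lon", "bbox", "location", "radius", "polygon",
    "start", "end", "from", "to",
    "facets", "fields", "distrib", "replica", "latest",
}


def split_params(params):
    """Split already-decoded request parameters into (keywords, facets)."""
    keywords = {k: v for k, v in params.items() if k in KEYWORD_NAMES}
    facets = {k: v for k, v in params.items() if k not in KEYWORD_NAMES}
    return keywords, facets


def matches_facets(record, facets):
    """True if the record satisfies every facet constraint (logical AND)."""
    for name, wanted in facets.items():
        values = record.get(name, [])
        if not isinstance(values, list):
            values = [values]  # treat a single-valued field as a 1-item list
        if wanted not in values:  # exact comparison, no wildcard expansion
            return False
    return True


# Hypothetical records, for illustration only.
records = [
    {"id": "a", "experiment": "decadal2000", "cf_standard_name": "Air Temperature"},
    {"id": "b", "experiment": "decadal2005", "cf_standard_name": "Air Temperature"},
]

_, facets = split_params({"limit": "10", "experiment": "decadal2000",
                          "cf_standard_name": "Air Temperature"})
hits = [r["id"] for r in records if matches_facets(r, facets)]
print(hits)  # prints: ['a']
```

With the constraint experiment=decadal*, the same filter returns no records, because the value is compared as a literal string.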

Responses

Response Document Content

The result of a search request is a response document that is encoded in the format specified by the request parameter format. Independently of the format, the response document always contains the following logical sections:

Header: contains all of the parameters used in the request, so that the same response document can be reproduced
Results: contains zero or more records that match the search criteria, each with associated metadata.
Facets: contains the values and counts of the requested facets. May be empty or absent if no facets are requested.
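To make the three logical sections concrete, here is a small format-independent Python sketch that assembles them as a dictionary. It also shows that facet counts are computed over all matching records while the results section holds only the requested page. The function name build_response, the dictionary layout, and the assumption that each facet field is single-valued are illustrative choices, not something the API prescribes.

```python
from collections import Counter


def build_response(params, matching_records, facet_names, offset=0, limit=10):
    """Assemble the header, results and facets sections of a response."""
    # Results: only the requested page of the matching records.
    page = matching_records[offset:offset + limit]

    # Facets: values and counts over ALL matching records, not just the page.
    facets = {}
    for name in facet_names:
        counts = Counter(r[name] for r in matching_records if name in r)
        facets[name] = dict(counts)

    # Header: echo the request parameters so the response can be reproduced.
    return {"header": dict(params), "results": page, "facets": facets}


# Hypothetical records matching some query, for illustration only.
matching = [{"id": f"rec{i}", "experiment": "decadal2000" if i % 2 else "historical"}
            for i in range(25)]

first = build_response({"query": "co2"}, matching, ["experiment"], offset=0, limit=10)
second = build_response({"query": "co2"}, matching, ["experiment"], offset=10, limit=10)
print(first["facets"] == second["facets"])    # prints: True
print(first["results"] == second["results"])  # prints: False
```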

Metadata Fields

Each result record contained in the response document is associated with a set of metadata fields. Each field has a name, and may be single-valued or multiple-valued. Some fields are meaningful for records of all types and have been assigned standardized names, while other fields are more type specific and may have any name. The rules for including metadata fields in the response documents are as follows:

The standard metadata fields must always be included for each record (if applicable and available)
Other fields may be included for each record, if explicitly requested via the fields parameter.

* Even if fields=... is specified, the common metadata fields must always be included

* fields=* may be used to include all available metadata fields

Each search engine is allowed to define its default behavior, provided it complies with the previous rules. For example, the default behavior of a search engine can be to return only the common metadata fields, or all of the available fields.
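The inclusion rules above could be implemented along the following lines. This Python sketch takes a full record and the value of the fields parameter and returns the fields to serialize. The standard field names are those of the table in this section; the helper name select_fields is hypothetical, as is the choice of "standard fields only" as the site default.

```python
STANDARD_FIELDS = (
    "id", "title", "description", "type", "timestamp", "url",
    "size", "dataset_id", "checksum", "checksum_type",
)


def select_fields(record, fields=None):
    """Return the subset of record to include in the response.

    fields is the raw parameter value: None (site default), "*",
    or a comma separated list of field names.
    """
    if fields == "*":
        return dict(record)  # every available field
    requested = set(fields.split(",")) if fields else set()
    # Standard fields are always kept, whatever was requested.
    wanted = set(STANDARD_FIELDS) | requested
    return {name: value for name, value in record.items() if name in wanted}


# Hypothetical record, for illustration only.
record = {"id": "my.test.dataset", "title": "Test", "type": "Dataset",
          "experiment": "decadal2000", "model": "CanCM4"}

print(sorted(select_fields(record, "model")))
# prints: ['id', 'model', 'title', 'type']
```

Fields that are absent from the record are simply not returned, which matches the "if applicable and available" qualifier.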

The following table lists the standard metadata fields, i.e. those fields that represent the minimum amount of metadata that is common to records of all types, and that must always be returned as part of each result record.

| Field Name | Description | Multi-Valued? | Mandatory? | Applicable Record Type |
|---|---|---|---|---|
| id | Globally unique record identifier | false | true | all |
| title | Human-readable short description of the record, usable in a summary display of results | false | true | all |
| description | A human-readable longer description of the record, suitable for being displayed under the record's title, possibly in a shortened form | true | false | all |
| type | The record's type, which should match the search requested type (if provided) | false | true | all |
| timestamp | The date and time when the record was last updated | false | true | all |
| url | A URL that can be used to access the record, must include a descriptive name and the content/mime type | true | | all |
| size | The file size, or total dataset size (sum of all files) | false | true | Dataset, File |
| dataset_id | The identifier of the containing dataset | false | true | File |
| checksum | The file checksum, if available | true | | File |
| checksum_type | The file checksum type, if available | true | | File |

Notes:

Each result record must contain one or more URLs that the record can be hyperlinked to. The URL metadata must include the type of the application serving that URL, and a short descriptive name of the application itself. For example, the record representing a single file could contain a URL field for each of the possible ways to download the file:
- url= http://myhostname/thredds/fileserver/file_aaa.nc , type=application/netcdf, name=HTTP Server
- url= http://myhostname/thredds/dodsC/file_aaa.nc , type=application/opendap, name=OpenDAP Server
- url= http://myhostname/gridftp/file_aaa.nc , type=application/gridftp, name=GridFTP Server
For simplicity, a search should return only one single version (the latest) of each matching record. If available, each record may contain an optional version field that is part of its descriptive metadata. The versioning schema may be completely arbitrary, being dependent on the record type, the publishing agent, etc., and should not be used for any other purposes than visual inspection of the record.

Response Formats

[incomplete]

Errors

The search engine responsible for processing a search request should encode any unusual circumstance in processing as an HTTP response with the appropriate HTTP status code. In particular, the following status codes can be returned:

HTTP 400 ("Bad Request")

* if any of the HTTP parameter names or values contain illegal characters
* if the facets= parameter contains an illegal facet

* if the fields= parameter contains an illegal field

HTTP 500 ("Internal Server Error")

* if a generic server error takes place

HTTP 501 ("Not Implemented")

* in response to a valid value of format= which is not yet supported by the search engine
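The mapping from error conditions to status codes could look roughly like the Python sketch below. The vocabularies (KNOWN_FACETS, KNOWN_FIELDS), the set of supported formats, and the function names are all hypothetical placeholders. Treating a format value that is not a recognized ESGF format as a 400 is an assumption of this sketch, since the text above only covers valid but unsupported formats.

```python
ILLEGAL_CHARS = set("<>$")  # example characters only; site-defined in practice
KNOWN_FACETS = {"experiment", "model", "variable"}       # hypothetical excerpt
KNOWN_FIELDS = {"id", "title", "description", "url"}     # hypothetical excerpt
SUPPORTED_FORMATS = {"application/solr+xml"}             # hypothetical
VALID_FORMATS = SUPPORTED_FORMATS | {"application/atom+xml",
                                     "application/esgf+json"}


def status_for(params):
    """Return the HTTP status code for decoded string parameters."""
    for name, value in params.items():
        if any(ch in ILLEGAL_CHARS for ch in name + value):
            return 400
    for key, known in (("facets", KNOWN_FACETS), ("fields", KNOWN_FIELDS)):
        listed = params.get(key)
        if listed and listed != "*":
            if any(item not in known for item in listed.split(",")):
                return 400
    fmt = params.get("format")
    if fmt is not None and fmt not in SUPPORTED_FORMATS:
        # Valid but unsupported: 501. Unrecognized: 400 (assumption).
        return 501 if fmt in VALID_FORMATS else 400
    return 200


def handle(params, run_search):
    """Validate, run the search, and map unexpected failures to 500."""
    status = status_for(params)
    if status != 200:
        return status, None
    try:
        return 200, run_search(params)
    except Exception:
        return 500, None  # generic server error
```

Here run_search stands for whatever callable the site uses to execute the query against its back-end.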
