Timeout setting for Opensearch and Elasticsearch #408

Merged
merged 5 commits into from
Jul 2, 2025

Conversation

z-mrozu
Contributor

@z-mrozu z-mrozu commented Jun 26, 2025

Description:

Added a timeout setting to the OpenSearch and Elasticsearch configs. It only takes effect if the user sets "ES_TIMEOUT".

PR Checklist:

  • Code is formatted and linted (run pre-commit run --all-files)
  • Tests pass (run make test)
  • Documentation has been updated to reflect changes, if applicable
  • Changes are added to the changelog

@z-mrozu z-mrozu marked this pull request as ready for review June 26, 2025 10:06
@@ -56,6 +56,10 @@ def _es_config() -> Dict[str, Any]:
    if (u := os.getenv("ES_USER")) and (p := os.getenv("ES_PASS")):
        config["http_auth"] = (u, p)

    # Include timeout setting if set
    if timeout := os.getenv("ES_TIMEOUT"):
        config["timeout"] = timeout
Collaborator

Contributor Author

Good catch. I didn't notice the parameter was deprecated in the Elasticsearch client; I've changed it.
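For illustration, a minimal sketch of how the Elasticsearch side could forward the value under the non-deprecated name, request_timeout. The helper name es_timeout_config and the float conversion are assumptions made for this example; they are not taken from the merged diff.

```python
import os
from typing import Any, Dict


def es_timeout_config() -> Dict[str, Any]:
    """Hypothetical helper: build only the timeout part of the client config."""
    config: Dict[str, Any] = {}

    # Include the timeout only if the user set ES_TIMEOUT.
    # Assumption: os.getenv returns a str, so it is converted to a number here.
    if timeout := os.getenv("ES_TIMEOUT"):
        config["request_timeout"] = float(timeout)

    return config
```

When ES_TIMEOUT is unset, the returned dict is empty, so the client falls back to its own default.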

@@ -53,6 +53,10 @@ def _es_config() -> Dict[str, Any]:

config["headers"] = headers

    # Include timeout setting if set
    if timeout := os.getenv("ES_TIMEOUT"):
        config["timeout"] = timeout
Collaborator

Is this right for Opensearch? I am not sure.

Contributor Author

It should be, from OpenSearch Client parameters:
"kwargs (Any) – any additional arguments will be passed on to the Transport class and, subsequently, to the Connection instances."
and then Connection has the timeout parameter:
"timeout (int) – default timeout in seconds (float, default: 10)"

Collaborator

Thanks for looking into this, I was going through the opensearch-py code a little and this makes sense for sure.
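The keyword-argument forwarding described in the quoted documentation can be sketched with a toy model. The classes below are simplified stand-ins written for this example, not the real opensearch-py classes; only the forwarding pattern and the default of 10 come from the quoted text.

```python
from typing import Any


class Connection:
    """Toy stand-in for a connection class with a default timeout of 10s."""

    def __init__(self, host: str = "localhost", timeout: float = 10, **kwargs: Any) -> None:
        self.host = host
        self.timeout = timeout


class Transport:
    """Toy stand-in: forwards extra keyword arguments to Connection."""

    def __init__(self, connection_class: type = Connection, **kwargs: Any) -> None:
        self.connection = connection_class(**kwargs)


class Client:
    """Toy stand-in: forwards extra keyword arguments to Transport."""

    def __init__(self, transport_class: type = Transport, **kwargs: Any) -> None:
        self.transport = transport_class(**kwargs)


default_client = Client()
custom_client = Client(timeout=30)

print(default_client.transport.connection.timeout)  # 10
print(custom_client.transport.connection.timeout)   # 30
```

In this model, a timeout passed to the client reaches the connection unchanged, and omitting it leaves the connection default in place.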

@jonhealy1
Collaborator

@z-mrozu
Contributor Author

z-mrozu commented Jun 27, 2025

@jonhealy1 just to double check: for OpenSearch, if the timeout parameter is not set it defaults to 10s because of the value set in the Connection class (for both the sync and async clients). For Elasticsearch, if the user doesn't set request_timeout, it defaults to either urllib3's default timeout for the sync client (because of elastic-transport's urllib3 transport implementation), which would be None, I think, or elastic-transport's aiohttp default timeout for the async client, which would be 10s. So I'm a bit unsure what to put in the Default column of the configuration reference. Something like "10s (OS/ES async), None (ES sync)" maybe?

@jonhealy1
Collaborator

@z-mrozu I think maybe we say something general, because we don't actually set a default. We could leave it blank or say something like "uses DB client default"?

@z-mrozu
Contributor Author

z-mrozu commented Jun 27, 2025

@jonhealy1 I added es_timeout to the config reference and went for "DB client default" in Default column. I also cleaned up the formatting of the table a bit, hope that's okay

@jonhealy1
Collaborator

This is from AI, so I know it's questionable information, but it is something I am wondering about: are timeout and request_timeout two different things?

Why timeout and request_timeout Differ:

Elasticsearch (elasticsearch-py): timeout sets the connection timeout (e.g., 10s for establishing a connection), while request_timeout sets the total request timeout (e.g., 30s for connection, sending, processing, response) for all API calls.

OpenSearch (opensearch-py): Only timeout is available at client initialization, controlling connection timeout. A tuple timeout=(connect_timeout, read_timeout) can approximate request_timeout by setting read_timeout for the response phase, but it’s HTTP-layer (less precise) and applies globally.

Difference: timeout is for connection establishment; request_timeout (Elasticsearch) or read_timeout (OpenSearch) covers the entire request lifecycle.

Simplest Solution
Use a single SEARCH_TIMEOUT environment variable (e.g., 30) to set request_timeout for Elasticsearch or read_timeout (with fixed connect_timeout=10) for OpenSearch in your FastAPI app.

I can try to verify this info when I have time. Are you an Elasticsearch or OpenSearch user?

@z-mrozu
Contributor Author

z-mrozu commented Jun 27, 2025

Mostly an OpenSearch user. I think request_timeout used to be named timeout in the ES client at some point (based on the deprecation warning), but I'm unsure whether the OS timeout and the ES timeout were implemented in the same way. At least in OpenSearch, the setting seems to control the connection timeout: if you can connect to OS within the set timeout, but you are making a sizeable search request that takes a while to download, maybe even longer than the set timeout, you're still in the clear and the search request doesn't throw a timeout error.

Also, I just checked and you can't set timeout as a tuple (connect_timeout, read_timeout), so at least that part is questionable.

@z-mrozu
Contributor Author

z-mrozu commented Jun 27, 2025

In earlier versions of ES (7.x), it seems that the timeout param worked the same way: it was a kwarg passed to Connection with a default value of 10. But again, I'm unsure how and why it was changed to request_timeout.

@jonhealy1 jonhealy1 self-requested a review June 27, 2025 15:15
@jonhealy1
Collaborator

The logic here, I guess, is that the Requests library will accept a tuple and there is nothing in the opensearch code that will prevent this? https://requests.readthedocs.io/en/latest/user/advanced/#timeouts

Not my words:

The timeout=(connect_timeout, read_timeout) tuple works in opensearch-py because the RequestsHttpConnection class (in opensearchpy/connection/http_requests.py) passes the timeout parameter directly to the requests library’s Session.send method, which supports a tuple for setting separate connection and read timeouts. The requests library interprets the tuple as (connect_timeout, read_timeout), applying the first value to establish the connection and the second to wait for the server’s response, ensuring both timeouts are enforced for all API calls despite the misleading Optional[Union[int, float]] type hint.

@z-mrozu
Contributor Author

z-mrozu commented Jun 30, 2025

I tested it. By default, the sync OpenSearch client also uses urllib3, and the timeout argument is passed in such a way that using a tuple errors out with "ValueError: Timeout value connect was (5, 5), but it must be an int, float or None." Similarly, the async client uses aiohttp and errors out with "TypeError: '>' not supported between instances of 'tuple' and 'int'". So in both cases the value is passed in a way that can't handle a tuple.

Also, I double-checked another thing while doing that: it seems that the sync OpenSearch client uses timeout only as the connection timeout, while the async client uses it as both the connection timeout and the read timeout (so the sync version doesn't throw a timeout error during long searches, but the async version does). Should I change "Connection timeout for Elasticsearch/OpenSearch." in the config reference to something like "Client timeout for Elasticsearch/OpenSearch" maybe?
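Since both clients were found to reject a tuple, one option is to check that the environment value is a single number before it reaches the client. The sketch below is a hypothetical guard, not part of this PR; the helper name read_timeout_env and its error messages are assumptions made for illustration.

```python
import math
import os
from typing import Optional


def read_timeout_env(name: str = "ES_TIMEOUT") -> Optional[float]:
    """Hypothetical helper: return the timeout in seconds, or None if unset."""
    raw = os.getenv(name)
    if not raw:
        # Unset or empty: leave the DB client's own default in place.
        return None
    try:
        value = float(raw)
    except ValueError:
        # Rejects tuple-like input such as "(5, 5)" or "5,5".
        raise ValueError(
            f"{name} must be a single number of seconds, got {raw!r}"
        ) from None
    if not math.isfinite(value) or value <= 0:
        raise ValueError(f"{name} must be a positive, finite number, got {raw!r}")
    return value
```

With a guard like this, a misconfigured value fails at startup with a clear message rather than inside urllib3 or aiohttp on the first request.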

@jonhealy1
Collaborator

Yes, let's make that change. Thanks for doing all this work. After the readme change this should be good to go.

Collaborator

@jonhealy1 jonhealy1 left a comment

Thanks @z-mrozu!

@jonhealy1 jonhealy1 merged commit a0b77cb into stac-utils:main Jul 2, 2025
15 checks passed
2 participants