Error Medic

Elasticsearch API Timeout: How to Diagnose and Fix Connection, Request, and Search Timeouts

Fix Elasticsearch API timeouts fast: tune request_timeout, adjust index.search.slowlog thresholds, scale shards, and configure circuit breakers to stop 504s.

Key Takeaways
  • The most common root causes are undersized thread pools, heap pressure triggering GC pauses, overly broad wildcard queries, and misconfigured client-side or load-balancer timeout values.
  • A 'ReadTimeoutError' or HTTP 504 from the REST API usually means the coordinating node accepted the request but could not assemble shard responses within the timeout window — raising the client timeout only buys breathing room; the real fix lives on the cluster side.
  • Quick wins: raise request_timeout on the client, add ?timeout=30s to the REST call, profile the slow query with the Profile API, and check hot_threads to find the CPU bottleneck before touching any shard or replica counts.
Fix Approaches Compared
| Method | When to Use | Time to Apply | Risk |
|---|---|---|---|
| Increase client request_timeout | Client throws ReadTimeoutError but cluster is healthy | < 5 min | Low — buys time, does not fix root cause |
| Add ?timeout=30s query parameter | One-off slow query or bulk indexing job | < 1 min | Low — scoped to single request |
| Reduce query scope (filter before query) | Wildcard/fuzzy queries on large indices | 30–60 min | Medium — requires query refactoring |
| Scale horizontally (add data nodes) | Shard queue depth consistently > 0 | Hours to days | Medium — requires cluster rebalance |
| Tune thread pool queue size | Bulk or search rejections visible in _cat/thread_pool | 5–15 min | Medium — wrong value causes OOM |
| Increase JVM heap (up to 31 GB) | GC pauses visible in logs, heap > 85% | 15–30 min with restart | High — requires rolling restart |
| Enable request caching | Repeated aggregation queries on non-volatile indices | 15 min | Low — may serve stale data |
| Adjust circuit breakers | requests.breaker.total.tripped counter rising | 10 min | High — risks masking memory pressure |

Understanding the Elasticsearch API Timeout Error

When your application or curl command hits an Elasticsearch API timeout you will see one of several error signatures depending on the layer that gave up first.

Client-side (Python elasticsearch-py):

elasticsearch.exceptions.ConnectionTimeout: ConnectionTimeout caused by - ReadTimeoutError(HTTPConnectionPool(host='localhost', port=9200): Read timed out. (read timeout=10))

REST / HTTP layer:

{"error":{"root_cause":[{"type":"search_phase_execution_exception","reason":"all shards failed"}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"logs-2024","node":"abc123","reason":{"type":"query_shard_exception","reason":"Request timed out"}}]}}

Coordinating node log (elasticsearch.log):

[WARN][o.e.a.s.SearchService] [node-1] timeout while executing [indices:data/read/search[phase/query]]

Bulk indexing timeout:

{"error":{"type":"timeout_exception","reason":"Timeout waiting for task [bulk[shard_id=3]]"}}

Each error surface points to a different layer of the stack. Understanding the taxonomy saves you from chasing the wrong fix.
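As a rough triage aid, the mapping from error signature to layer can be sketched in Python. The substring checks and layer labels below are illustrative heuristics based on the signatures shown above, not an official taxonomy:

```python
def classify_timeout(signature: str) -> str:
    """Map a raw error string to the layer that most likely gave up first.

    Heuristic sketch based on the error surfaces above -- not exhaustive.
    """
    s = signature.lower()
    if "readtimeouterror" in s or "connectiontimeout" in s:
        return "client"  # the client library gave up waiting for a response
    if "search_phase_execution_exception" in s or "all shards failed" in s:
        return "coordinating node"  # fan-out succeeded, shard responses did not
    if "timeout_exception" in s and "bulk" in s:
        return "data node (bulk/write path)"
    if "timeout while executing" in s:
        return "data node (search path)"
    return "unknown"

print(classify_timeout("ReadTimeoutError(HTTPConnectionPool(host='localhost', port=9200))"))  # client
```

Feeding each of the four signatures above through this function routes you to the right section of the diagnosis steps below.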


Architecture: Why Timeouts Happen

Every Elasticsearch search flows through two phases: in the query phase, the coordinating node fans out to primary or replica shards and each shard returns matching doc IDs and scores; in the fetch phase, the coordinator retrieves the full documents for the top hits, merges them, and returns results. A timeout can fire at either phase boundary. The default search.default_search_timeout is -1 (unlimited) at the cluster level, but client libraries and API gateways almost always impose their own limits — usually 10–30 seconds.

Timeout triggers, ranked by frequency in production:

  1. JVM GC pause — A full GC pause of 10–20 s on a data node causes all in-flight shard requests to exceed the timeout. Observable via /_nodes/stats (jvm.gc.collectors.old.collection_time_in_millis climbing).
  2. Thread pool saturation — The search thread pool defaults to int((allocatedProcessors * 3) / 2) + 1 threads with a queue of 1000. When the queue fills, new requests are rejected with a 429, but requests already queued may time out before a thread picks them up.
  3. Hot shards / skewed data distribution — A single shard holding 80 % of documents means one thread does most of the work while others sit idle.
  4. Expensive queries — Unbounded wildcard, script, or nested terms queries with millions of candidate documents.
  5. Network partition or slow disk I/O — Especially common on cloud instances with burst-capable EBS volumes that exhaust their I/O credits.
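The first trigger on this list can be checked programmatically. Below is a minimal sketch that flags heap pressure from a /_nodes/stats-shaped payload; the field names follow the stats API response shown later in this article, and the 85% threshold is a common rule of thumb rather than a hard limit:

```python
import json

def flag_heap_pressure(stats_json: str, threshold: int = 85) -> list[str]:
    """Return names of nodes whose heap usage exceeds the threshold percent."""
    nodes = json.loads(stats_json)["nodes"]
    return [
        n["name"]
        for n in nodes.values()
        if n["jvm"]["mem"]["heap_used_percent"] > threshold
    ]

# Sample payload shaped like a trimmed /_nodes/stats/jvm response
sample = json.dumps({
    "nodes": {
        "abc": {"name": "node-1", "jvm": {"mem": {"heap_used_percent": 91}}},
        "def": {"name": "node-2", "jvm": {"mem": {"heap_used_percent": 62}}},
    }
})
print(flag_heap_pressure(sample))  # ['node-1']
```

Any node this flags is a candidate for the GC-pause investigation in Step 1d.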

Step 1: Identify Which Layer Is Timing Out

Before changing any configuration, confirm whether the timeout originates at the client, the coordinating node, or a data node.

# 1a. Check cluster health — RED means shards are unassigned and queries will hang
curl -s 'http://localhost:9200/_cluster/health?pretty'

# 1b. Check hot threads — reveals what CPU is actually doing
curl -s 'http://localhost:9200/_nodes/hot_threads?threads=5&interval=500ms'

# 1c. Check thread pool queues and rejections
curl -s 'http://localhost:9200/_cat/thread_pool/search,bulk,write?v&h=node_name,name,active,queue,rejected,completed'

# 1d. Check JVM heap and GC pressure
curl -s 'http://localhost:9200/_nodes/stats/jvm?pretty' | \
  python3 -c "import sys,json; n=json.load(sys.stdin)['nodes']; \
  [print(v['name'], v['jvm']['mem']['heap_used_percent'],'% heap') for v in n.values()]"

# 1e. Check cumulative search timings per index (to capture individual slow queries, enable the slow log — see Step 2)
curl -s 'http://localhost:9200/_cat/indices?v&h=index,search.fetch_time,search.query_time,search.scroll_time'
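The _cat/thread_pool output from step 1c is whitespace-delimited text, so a few lines of Python turn it into something you can alert on. The column names below match the h= list used in the curl command; the sample rows are fabricated for illustration:

```python
def parse_thread_pool(cat_output: str) -> list[dict]:
    """Parse `_cat/thread_pool?v` text into dicts, one per node/pool row."""
    lines = cat_output.strip().splitlines()
    headers = lines[0].split()
    return [dict(zip(headers, line.split())) for line in lines[1:]]

# Sample output shaped like step 1c's response (values are illustrative)
sample = """node_name name   active queue rejected completed
node-1    search 13     847   52       991023
node-2    search 2      0     0        873450"""

rows = parse_thread_pool(sample)
rejecting = [r["node_name"] for r in rows if int(r["rejected"]) > 0]
print(rejecting)  # ['node-1']
```

A nonzero rejected count combined with a deep queue points straight at Fix C below.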

Step 2: Enable the Slow Log to Profile the Offending Query

The Elasticsearch slow log is the most actionable diagnostic tool for search timeouts. Enable it dynamically without restarting.

# Enable slowlog on a specific index (adjust index name and thresholds)
curl -X PUT 'http://localhost:9200/logs-2024/_settings' \
  -H 'Content-Type: application/json' \
  -d '{
    "index.search.slowlog.threshold.query.warn": "5s",
    "index.search.slowlog.threshold.query.info": "2s",
    "index.search.slowlog.threshold.fetch.warn": "1s",
    "index.search.slowlog.level": "info"
  }'

Once set, slow queries appear in the index search slow log (by default a file named <cluster_name>_index_search_slowlog.json in the logs directory). Look for the took field and the source field showing the full query JSON. A query taking > 5 s in the slow log but appearing fast in _profile output usually indicates I/O wait, not CPU.
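To rank slow-log entries by duration, parse each JSON line and sort on its duration. The took/source key names below follow this article's description; exact JSON keys vary across Elasticsearch versions, so treat this as a sketch:

```python
import json

def to_seconds(took: str) -> float:
    """Convert '6.2s' or '150ms' style durations to seconds."""
    if took.endswith("ms"):
        return float(took[:-2]) / 1000
    return float(took.rstrip("s"))

def slowest_queries(log_lines: list[str], top: int = 3) -> list[tuple[str, str]]:
    """Rank slow-log JSON lines by their 'took' field, longest first."""
    entries = [json.loads(line) for line in log_lines]
    entries.sort(key=lambda e: to_seconds(e["took"]), reverse=True)
    return [(e["took"], e["source"]) for e in entries[:top]]

# Fabricated slow-log lines for illustration
lines = [
    '{"took": "2.1s", "source": "{\\"query\\":{\\"match_all\\":{}}}"}',
    '{"took": "6.4s", "source": "{\\"query\\":{\\"wildcard\\":{\\"message\\":\\"*error*\\"}}}"}',
    '{"took": "900ms", "source": "{\\"query\\":{\\"term\\":{\\"status\\":500}}}"}',
]
print(slowest_queries(lines, top=1)[0][0])  # 6.4s
```

The top entry's source field is the query to feed into the Profile API in Step 3.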


Step 3: Use the Profile API to Find the Expensive Clause

Add "profile": true to any search request to get per-shard timing broken down by query clause.

curl -X GET 'http://localhost:9200/logs-2024/_search?pretty' \
  -H 'Content-Type: application/json' \
  -d '{
    "profile": true,
    "query": {
      "bool": {
        "must": [{"wildcard": {"message": "*error*"}}],
        "filter": [{"range": {"@timestamp": {"gte": "now-1h"}}}]
      }
    }
  }'

In the response, shards[].searches[].query[].time_in_nanos reveals which clause dominates. A wildcard on an un-analyzed keyword field at 2–3 billion nanoseconds (2–3 s) is a clear culprit — replace it with a full-text match query or a prefix-aware edge_ngram analyzer.
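Finding the dominant clause by eye gets tedious in large profiles, so it can be automated by walking the profile tree and keeping the most expensive leaf. This is a simplified sketch: the shard dict below mimics the nesting of a real profile response but contains made-up numbers:

```python
def dominant_leaf_clause(profile_shard: dict) -> tuple[str, int]:
    """Return (query type, time_in_nanos) of the most expensive leaf clause."""
    best = ("", 0)

    def walk(node):
        nonlocal best
        children = node.get("children", [])
        if not children and node["time_in_nanos"] > best[1]:
            best = (node["type"], node["time_in_nanos"])
        for child in children:
            walk(child)

    for search in profile_shard["searches"]:
        for query in search["query"]:
            walk(query)
    return best

# Trimmed, fabricated profile output mirroring the Step 3 query
shard = {"searches": [{"query": [{
    "type": "BooleanQuery", "time_in_nanos": 2_500_000_000,
    "children": [
        {"type": "WildcardQuery",   "time_in_nanos": 2_400_000_000},
        {"type": "PointRangeQuery", "time_in_nanos": 90_000_000},
    ],
}]}]}
print(dominant_leaf_clause(shard))  # ('WildcardQuery', 2400000000)
```

Leaves are compared rather than composite clauses because a parent BooleanQuery's timing includes its children and would always win.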


Step 4: Apply the Appropriate Fix

Fix A — Immediate relief (client timeout increase):

For Python:

from elasticsearch import Elasticsearch
es = Elasticsearch(
    ['http://localhost:9200'],
    request_timeout=60,   # seconds
    retry_on_timeout=True,
    max_retries=3
)

For curl / REST clients, append ?timeout=30s to the URL:

curl 'http://localhost:9200/logs-2024/_search?timeout=30s'

Note: this tells Elasticsearch to return partial results after 30 s rather than hanging. It does not make the query faster.

Fix B — Rewrite expensive queries (permanent fix):

Replace wildcard with match or use the multi_match query with best_fields type. For log analytics, move aggregation-heavy dashboards to async search (_async_search) so they do not block the user thread.
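As a concrete illustration of the rewrite, the wildcard query from Step 3 can become a match clause on an analyzed field. This assumes the message field is indexed with a full-text analyzer; whether the two queries return equivalent hits depends on your mapping:

```python
# Before: leading-wildcard pattern forces a scan over every term in the field
slow_query = {"query": {"bool": {
    "must":   [{"wildcard": {"message": "*error*"}}],
    "filter": [{"range": {"@timestamp": {"gte": "now-1h"}}}],
}}}

# After: full-text match resolves "error" directly against the inverted index
fast_query = {"query": {"bool": {
    "must":   [{"match": {"message": "error"}}],
    "filter": [{"range": {"@timestamp": {"gte": "now-1h"}}}],
}}}
print(list(fast_query["query"]["bool"]["must"][0]))  # ['match']
```

Note the range clause stays in filter context in both versions, so it is cacheable and skips scoring.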

Fix C — Thread pool tuning:

# Add to elasticsearch.yml on each node, then rolling-restart
thread_pool.search.size: 16           # default: int((vCPU * 3) / 2) + 1
thread_pool.search.queue_size: 2000   # default: 1000

Do not set size above vCPU * 2. Setting queue_size too high causes requests to succeed eventually but greatly increases P99 latency.
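These guardrails can be encoded in a small helper. The (vCPU * 3) / 2 + 1 formula is Elasticsearch's documented default for the search pool in recent versions, and the vCPU * 2 ceiling is the cap recommended above:

```python
def search_pool_bounds(vcpus: int) -> dict:
    """Sensible default and maximum search thread pool sizes for a node."""
    default = (vcpus * 3) // 2 + 1   # Elasticsearch's documented search-pool default
    return {"default": default, "max_recommended": vcpus * 2}

print(search_pool_bounds(8))  # {'default': 13, 'max_recommended': 16}
```

For the 16-thread example in the snippet above, this sizing fits an 8-vCPU node exactly at its recommended maximum.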

Fix D — Heap and GC tuning:

Set JVM heap to 50% of available RAM, capped just below 32 GB — in practice 30–31 GB — because above that threshold the JVM disables compressed OOPs, wasting memory. Edit jvm.options:

-Xms16g
-Xmx16g

For Elasticsearch 7.7 and later, add a custom file under jvm.options.d/ rather than editing the main jvm.options file; the ES_JAVA_OPTS environment variable is an alternative for containerized deployments.
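The heap sizing rule above reduces to a one-liner. The 31 GB ceiling here is a conservative stand-in for the compressed-oops cutoff, which varies slightly by JVM:

```python
def heap_size_gb(ram_gb: int, cap_gb: int = 31) -> int:
    """Half of available RAM, capped below the compressed-oops threshold."""
    return min(ram_gb // 2, cap_gb)

print(heap_size_gb(32))   # 16  -> matches the -Xms16g/-Xmx16g example above
print(heap_size_gb(128))  # 31  -> capped, despite 64 GB being half of RAM
```

The second case shows why very-high-memory nodes leave the remainder to the OS page cache, which Lucene relies on heavily.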


Step 5: Verify the Fix

After applying changes, confirm the cluster is stable and timeout rate has dropped:

# Watch rejection rate in real time (Ctrl-C to stop)
watch -n 5 "curl -s 'http://localhost:9200/_cat/thread_pool/search?v&h=node_name,active,queue,rejected'"

# Check circuit breaker trips
curl -s 'http://localhost:9200/_nodes/stats/breaker?pretty' | grep -E '(tripped|limit_size_in_bytes|estimated_size_in_bytes)'

# Run a benchmark query and measure wall-clock time
time curl -s 'http://localhost:9200/logs-2024/_search?size=0' \
  -H 'Content-Type: application/json' \
  -d '{"query":{"match_all":{}},"aggs":{"per_host":{"terms":{"field":"host.keyword","size":10}}}}' | \
  python3 -c "import sys,json; r=json.load(sys.stdin); print('took:', r['took'], 'ms')"

A healthy cluster should return most aggregations under 500 ms. If P95 is still above 5 s after thread pool and heap tuning, consider adding data nodes and triggering a shard rebalance via _cluster/reroute.
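To judge the P95 target objectively, collect took values from repeated benchmark runs and compute the percentile. The nearest-rank method is used here for simplicity, and the sample latencies are fabricated:

```python
import math

def percentile(samples_ms: list[int], p: float) -> int:
    """Nearest-rank percentile of a list of latencies in milliseconds."""
    ranked = sorted(samples_ms)
    rank = math.ceil(p / 100 * len(ranked))
    return ranked[max(rank - 1, 0)]

# took values (ms) from ten benchmark runs -- illustrative numbers
took = [120, 95, 180, 4100, 150, 130, 160, 110, 140, 175]
print(percentile(took, 95))  # 4100
```

A single 4.1 s outlier like this drags P95 far above the 500 ms target even when the median looks healthy, which is exactly the pattern that precedes intermittent timeouts.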

Appendix: Timeout Diagnostics Script

#!/usr/bin/env bash
# elasticsearch-timeout-diagnostics.sh
# Run against any Elasticsearch node to collect timeout-related metrics.
# Usage: ES_HOST=http://localhost:9200 bash elasticsearch-timeout-diagnostics.sh

ES="${ES_HOST:-http://localhost:9200}"
echo "=== Cluster Health ==="
curl -s "$ES/_cluster/health?pretty"

echo ""
echo "=== Thread Pool: search + write + bulk ==="
curl -s "$ES/_cat/thread_pool/search,write,bulk?v&h=node_name,name,type,active,queue,rejected,completed,largest"

echo ""
echo "=== JVM Heap Usage per Node ==="
curl -s "$ES/_nodes/stats/jvm" | \
  python3 -c "
import sys, json
data = json.load(sys.stdin)
for nid, node in data['nodes'].items():
    heap_pct = node['jvm']['mem']['heap_used_percent']
    gc_old   = node['jvm']['gc']['collectors']['old']['collection_time_in_millis']
    print(f"{node['name']:30s}  heap={heap_pct}%  old_gc_ms={gc_old}")
"

echo ""
echo "=== Circuit Breaker Status ==="
curl -s "$ES/_nodes/stats/breaker" | \
  python3 -c "
import sys, json
data = json.load(sys.stdin)
for nid, node in data['nodes'].items():
    print(f"Node: {node['name']}")
    for bname, bdata in node['breakers'].items():
        print(f"  {bname}: tripped={bdata['tripped']}  used={bdata['estimated_size']}  limit={bdata['limit_size']}")
"

echo ""
echo "=== Hot Threads (top 3, 500ms sample) ==="
curl -s "$ES/_nodes/hot_threads?threads=3&interval=500ms&type=cpu"

echo ""
echo "=== Pending Tasks ==="
curl -s "$ES/_cluster/pending_tasks?pretty"

echo ""
echo "=== Slowlog thresholds on all indices ==="
curl -s "$ES/_all/_settings?pretty&filter_path=**.slowlog"

echo ""
echo "=== Search Latency (took ms for match_all with size=0) ==="
curl -s -w "\nHTTP %{http_code}  wall=%{time_total}s" \
  "$ES/_search?size=0&pretty" \
  -H 'Content-Type: application/json' \
  -d '{"query":{"match_all":{}}}' | tail -5

Error Medic Editorial

Error Medic Editorial is a team of senior DevOps and SRE engineers with collective experience running Elasticsearch clusters from single-node development setups to 200-node multi-region deployments. The team specializes in distributed systems observability, JVM performance tuning, and cloud-native search infrastructure.
