Error Medic

Elasticsearch API Timeout: How to Diagnose and Fix Connection, Request, and Search Timeouts

Fix Elasticsearch API timeouts fast: tune request_timeout, adjust index.search.slowlog thresholds, scale shards, and configure circuit breakers to stop 504s.

Key Takeaways
  • The most common root causes are undersized thread pools, heap pressure triggering GC pauses, overly broad wildcard queries, and misconfigured client-side or load-balancer timeout values.
  • A 'ReadTimeoutError' or HTTP 504 from the REST API usually means the coordinating node accepted the request but could not assemble shard responses within the timeout window — raising the client timeout only buys breathing room; the real fix lives on the cluster side.
  • Quick wins: raise request_timeout on the client, add ?timeout=30s to the REST call, profile the slow query with the Profile API, and check hot_threads to find the CPU bottleneck before touching any shard or replica counts.
Fix Approaches Compared
| Method | When to Use | Time to Apply | Risk |
|---|---|---|---|
| Increase client request_timeout | Client throws ReadTimeoutError but cluster is healthy | < 5 min | Low — buys time, does not fix root cause |
| Add ?timeout=30s query parameter | One-off slow query or bulk indexing job | < 1 min | Low — scoped to single request |
| Reduce query scope (filter before query) | Wildcard/fuzzy queries on large indices | 30–60 min | Medium — requires query refactoring |
| Scale horizontally (add data nodes) | Shard queue depth consistently > 0 | Hours to days | Medium — requires cluster rebalance |
| Tune thread pool queue size | Bulk or search rejections visible in _cat/thread_pool | 5–15 min | Medium — wrong value causes OOM |
| Increase JVM heap (up to 31 GB) | GC pauses visible in logs, heap > 85% | 15–30 min with restart | High — requires rolling restart |
| Enable request caching | Repeated aggregation queries on non-volatile indices | 15 min | Low — may serve stale data |
| Adjust circuit breakers | requests.breaker.total.tripped counter rising | 10 min | High — risks masking memory pressure |

Understanding the Elasticsearch API Timeout Error

When your application or curl command hits an Elasticsearch API timeout you will see one of several error signatures depending on the layer that gave up first.

Client-side (Python elasticsearch-py):

elasticsearch.exceptions.ConnectionTimeout: ConnectionTimeout caused by - ReadTimeoutError(HTTPConnectionPool(host='localhost', port=9200): Read timed out. (read timeout=10))

REST / HTTP layer:

{"error":{"root_cause":[{"type":"search_phase_execution_exception","reason":"all shards failed"}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"logs-2024","node":"abc123","reason":{"type":"query_shard_exception","reason":"Request timed out"}}]}}

Coordinating node log (elasticsearch.log):

[WARN][o.e.a.s.SearchService] [node-1] timeout while executing [indices:data/read/search[phase/query]]

Bulk indexing timeout:

{"error":{"type":"timeout_exception","reason":"Timeout waiting for task [bulk[shard_id=3]]"}}

Each error surface points to a different layer of the stack. Understanding the taxonomy saves you from chasing the wrong fix.
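As a rough triage aid, the mapping from error signature to layer can be sketched in Python. The substring checks and layer labels below are illustrative heuristics based on the signatures shown above, not an official taxonomy:

```python
def classify_timeout(signature: str) -> str:
    """Map a raw error string to the layer that most likely gave up first.

    Heuristic sketch based on the error surfaces above -- not exhaustive.
    """
    s = signature.lower()
    if "readtimeouterror" in s or "connectiontimeout" in s:
        return "client"  # the client library gave up waiting for a response
    if "search_phase_execution_exception" in s or "all shards failed" in s:
        return "coordinating node"  # fan-out succeeded, shard responses did not
    if "timeout_exception" in s and "bulk" in s:
        return "data node (bulk/write path)"
    if "timeout while executing" in s:
        return "data node (search path)"
    return "unknown"

print(classify_timeout("ReadTimeoutError(HTTPConnectionPool(host='localhost', port=9200))"))  # client
```

Feeding each of the four signatures above through this function routes you to the right section of the diagnosis steps below.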


Architecture: Why Timeouts Happen

Every Elasticsearch search flows through two phases: in the query phase, the coordinating node fans out to primary or replica shards and each shard returns matching doc IDs and scores; in the fetch phase, the coordinator retrieves the full documents for the top hits, merges them, and returns results. A timeout can fire at either phase boundary. The default search.default_search_timeout is -1 (unlimited) at the cluster level, but client libraries and API gateways almost always impose their own limits — usually 10–30 seconds.

Timeout triggers, ranked by frequency in production:

  1. JVM GC pause — A full GC pause of 10–20 s on a data node causes all in-flight shard requests to exceed the timeout. Observable via /_nodes/stats (jvm.gc.collectors.old.collection_time_in_millis climbing).
  2. Thread pool saturation — The search thread pool defaults to int((allocatedProcessors * 3) / 2) + 1 threads with a queue of 1000. When the queue fills, new requests are rejected with a 429, but requests already queued may time out before a thread picks them up.
  3. Hot shards / skewed data distribution — A single shard holding 80 % of documents means one thread does most of the work while others sit idle.
  4. Expensive queries — Unbounded wildcard, script, or nested terms queries with millions of candidate documents.
  5. Network partition or slow disk I/O — Especially common on cloud instances with burst-capable EBS volumes that exhaust their I/O credits.
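The first trigger on this list can be checked programmatically. Below is a minimal sketch that flags heap pressure from a /_nodes/stats-shaped payload; the field names follow the stats API response shown later in this article, and the 85% threshold is a common rule of thumb rather than a hard limit:

```python
import json

def flag_heap_pressure(stats_json: str, threshold: int = 85) -> list[str]:
    """Return names of nodes whose heap usage exceeds the threshold percent."""
    nodes = json.loads(stats_json)["nodes"]
    return [
        n["name"]
        for n in nodes.values()
        if n["jvm"]["mem"]["heap_used_percent"] > threshold
    ]

# Sample payload shaped like a trimmed /_nodes/stats/jvm response
sample = json.dumps({
    "nodes": {
        "abc": {"name": "node-1", "jvm": {"mem": {"heap_used_percent": 91}}},
        "def": {"name": "node-2", "jvm": {"mem": {"heap_used_percent": 62}}},
    }
})
print(flag_heap_pressure(sample))  # ['node-1']
```

Any node this flags is a candidate for the GC-pause investigation in Step 1d.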

Step 1: Identify Which Layer Is Timing Out

Before changing any configuration, confirm whether the timeout originates at the client, the coordinating node, or a data node.

# 1a. Check cluster health — RED means shards are unassigned and queries will hang
curl -s 'http://localhost:9200/_cluster/health?pretty'

# 1b. Check hot threads — reveals what CPU is actually doing
curl -s 'http://localhost:9200/_nodes/hot_threads?threads=5&interval=500ms'

# 1c. Check thread pool queues and rejections
curl -s 'http://localhost:9200/_cat/thread_pool/search,bulk,write?v&h=node_name,name,active,queue,rejected,completed'

# 1d. Check JVM heap and GC pressure
curl -s 'http://localhost:9200/_nodes/stats/jvm?pretty' | \
  python3 -c "import sys,json; n=json.load(sys.stdin)['nodes']; \
  [print(v['name'], v['jvm']['mem']['heap_used_percent'],'% heap') for v in n.values()]"

# 1e. Check cumulative search timings per index (to capture individual slow queries, enable the slow log — see Step 2)
curl -s 'http://localhost:9200/_cat/indices?v&h=index,search.fetch_time,search.query_time,search.scroll_time'
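The _cat/thread_pool output from step 1c is whitespace-delimited text, so a few lines of Python turn it into something you can alert on. The column names below match the h= list used in the curl command; the sample rows are fabricated for illustration:

```python
def parse_thread_pool(cat_output: str) -> list[dict]:
    """Parse `_cat/thread_pool?v` text into dicts, one per node/pool row."""
    lines = cat_output.strip().splitlines()
    headers = lines[0].split()
    return [dict(zip(headers, line.split())) for line in lines[1:]]

# Sample output shaped like step 1c's response (values are illustrative)
sample = """node_name name   active queue rejected completed
node-1    search 13     847   52       991023
node-2    search 2      0     0        873450"""

rows = parse_thread_pool(sample)
rejecting = [r["node_name"] for r in rows if int(r["rejected"]) > 0]
print(rejecting)  # ['node-1']
```

A nonzero rejected count combined with a deep queue points straight at Fix C below.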

Step 2: Enable the Slow Log to Profile the Offending Query

The Elasticsearch slow log is the most actionable diagnostic tool for search timeouts. Enable it dynamically without restarting.

# Enable slowlog on a specific index (adjust index name and thresholds)
curl -X PUT 'http://localhost:9200/logs-2024/_settings' \
  -H 'Content-Type: application/json' \
  -d '{
    "index.search.slowlog.threshold.query.warn": "5s",
    "index.search.slowlog.threshold.query.info": "2s",
    "index.search.slowlog.threshold.fetch.warn": "1s",
    "index.search.slowlog.level": "info"
  }'

Once set, slow queries appear in the index search slow log (by default a file named <cluster_name>_index_search_slowlog.json in the logs directory). Look for the took field and the source field showing the full query JSON. A query taking > 5 s in the slow log but appearing fast in _profile output usually indicates I/O wait, not CPU.
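To rank slow-log entries by duration, parse each JSON line and sort on its duration. The took/source key names below follow this article's description; exact JSON keys vary across Elasticsearch versions, so treat this as a sketch:

```python
import json

def to_seconds(took: str) -> float:
    """Convert '6.2s' or '150ms' style durations to seconds."""
    if took.endswith("ms"):
        return float(took[:-2]) / 1000
    return float(took.rstrip("s"))

def slowest_queries(log_lines: list[str], top: int = 3) -> list[tuple[str, str]]:
    """Rank slow-log JSON lines by their 'took' field, longest first."""
    entries = [json.loads(line) for line in log_lines]
    entries.sort(key=lambda e: to_seconds(e["took"]), reverse=True)
    return [(e["took"], e["source"]) for e in entries[:top]]

# Fabricated slow-log lines for illustration
lines = [
    '{"took": "2.1s", "source": "{\\"query\\":{\\"match_all\\":{}}}"}',
    '{"took": "6.4s", "source": "{\\"query\\":{\\"wildcard\\":{\\"message\\":\\"*error*\\"}}}"}',
    '{"took": "900ms", "source": "{\\"query\\":{\\"term\\":{\\"status\\":500}}}"}',
]
print(slowest_queries(lines, top=1)[0][0])  # 6.4s
```

The top entry's source field is the query to feed into the Profile API in Step 3.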


Step 3: Use the Profile API to Find the Expensive Clause

Add "profile": true to any search request to get per-shard timing broken down by query clause.

curl -X GET 'http://localhost:9200/logs-2024/_search?pretty' \
  -H 'Content-Type: application/json' \
  -d '{
    "profile": true,
    "query": {
      "bool": {
        "must": [{"wildcard": {"message": "*error*"}}],
        "filter": [{"range": {"@timestamp": {"gte": "now-1h"}}}]
      }
    }
  }'

In the response, shards[].searches[].query[].time_in_nanos reveals which clause dominates. A wildcard on an un-analyzed keyword field at 2–3 billion nanoseconds (2–3 s) is a clear culprit — replace it with a full-text match query or a prefix-aware edge_ngram analyzer.
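Finding the dominant clause by eye gets tedious in large profiles, so it can be automated by walking the profile tree and keeping the most expensive leaf. This is a simplified sketch: the shard dict below mimics the nesting of a real profile response but contains made-up numbers:

```python
def dominant_leaf_clause(profile_shard: dict) -> tuple[str, int]:
    """Return (query type, time_in_nanos) of the most expensive leaf clause."""
    best = ("", 0)

    def walk(node):
        nonlocal best
        children = node.get("children", [])
        if not children and node["time_in_nanos"] > best[1]:
            best = (node["type"], node["time_in_nanos"])
        for child in children:
            walk(child)

    for search in profile_shard["searches"]:
        for query in search["query"]:
            walk(query)
    return best

# Trimmed, fabricated profile output mirroring the Step 3 query
shard = {"searches": [{"query": [{
    "type": "BooleanQuery", "time_in_nanos": 2_500_000_000,
    "children": [
        {"type": "WildcardQuery",   "time_in_nanos": 2_400_000_000},
        {"type": "PointRangeQuery", "time_in_nanos": 90_000_000},
    ],
}]}]}
print(dominant_leaf_clause(shard))  # ('WildcardQuery', 2400000000)
```

Leaves are compared rather than composite clauses because a parent BooleanQuery's timing includes its children and would always win.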


Step 4: Apply the Appropriate Fix

Fix A — Immediate relief (client timeout increase):

For Python:

from elasticsearch import Elasticsearch
es = Elasticsearch(
    ['http://localhost:9200'],
    request_timeout=60,   # seconds
    retry_on_timeout=True,
    max_retries=3
)

For curl / REST clients, append ?timeout=30s to the URL:

curl 'http://localhost:9200/logs-2024/_search?timeout=30s'

Note: this tells Elasticsearch to return partial results after 30 s rather than hanging. It does not make the query faster.

Fix B — Rewrite expensive queries (permanent fix):

Replace wildcard with match or use the multi_match query with best_fields type. For log analytics, move aggregation-heavy dashboards to async search (_async_search) so they do not block the user thread.
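As a concrete illustration of the rewrite, the wildcard query from Step 3 can become a match clause on an analyzed field. This assumes the message field is indexed with a full-text analyzer; whether the two queries return equivalent hits depends on your mapping:

```python
# Before: leading-wildcard pattern forces a scan over every term in the field
slow_query = {"query": {"bool": {
    "must":   [{"wildcard": {"message": "*error*"}}],
    "filter": [{"range": {"@timestamp": {"gte": "now-1h"}}}],
}}}

# After: full-text match resolves "error" directly against the inverted index
fast_query = {"query": {"bool": {
    "must":   [{"match": {"message": "error"}}],
    "filter": [{"range": {"@timestamp": {"gte": "now-1h"}}}],
}}}
print(list(fast_query["query"]["bool"]["must"][0]))  # ['match']
```

Note the range clause stays in filter context in both versions, so it is cacheable and skips scoring.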

Fix C — Thread pool tuning:

# Add to elasticsearch.yml on each node, then rolling-restart
thread_pool.search.size: 16           # default: int((vCPU * 3) / 2) + 1
thread_pool.search.queue_size: 2000   # default: 1000

Do not set size above vCPU * 2. Setting queue_size too high causes requests to succeed eventually but greatly increases P99 latency.
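These guardrails can be encoded in a small helper. The (vCPU * 3) / 2 + 1 formula is Elasticsearch's documented default for the search pool in recent versions, and the vCPU * 2 ceiling is the cap recommended above:

```python
def search_pool_bounds(vcpus: int) -> dict:
    """Sensible default and maximum search thread pool sizes for a node."""
    default = (vcpus * 3) // 2 + 1   # Elasticsearch's documented search-pool default
    return {"default": default, "max_recommended": vcpus * 2}

print(search_pool_bounds(8))  # {'default': 13, 'max_recommended': 16}
```

For the 16-thread example in the snippet above, this sizing fits an 8-vCPU node exactly at its recommended maximum.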

Fix D — Heap and GC tuning:

Set JVM heap to 50% of available RAM, capped just below 32 GB — in practice 30–31 GB — because above that threshold the JVM disables compressed OOPs, wasting memory. Edit jvm.options:

-Xms16g
-Xmx16g

For Elasticsearch 7.7 and later, add a custom file under jvm.options.d/ rather than editing the main jvm.options file; the ES_JAVA_OPTS environment variable is an alternative for containerized deployments.
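The heap sizing rule above reduces to a one-liner. The 31 GB ceiling here is a conservative stand-in for the compressed-oops cutoff, which varies slightly by JVM:

```python
def heap_size_gb(ram_gb: int, cap_gb: int = 31) -> int:
    """Half of available RAM, capped below the compressed-oops threshold."""
    return min(ram_gb // 2, cap_gb)

print(heap_size_gb(32))   # 16  -> matches the -Xms16g/-Xmx16g example above
print(heap_size_gb(128))  # 31  -> capped, despite 64 GB being half of RAM
```

The second case shows why very-high-memory nodes leave the remainder to the OS page cache, which Lucene relies on heavily.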


Step 5: Verify the Fix

After applying changes, confirm the cluster is stable and timeout rate has dropped:

# Watch rejection rate in real time (Ctrl-C to stop)
watch -n 5 "curl -s 'http://localhost:9200/_cat/thread_pool/search?v&h=node_name,active,queue,rejected'"

# Check circuit breaker trips
curl -s 'http://localhost:9200/_nodes/stats/breaker?pretty' | grep -E '(tripped|limit_size_in_bytes|estimated_size_in_bytes)'

# Run a benchmark query and measure wall-clock time
time curl -s 'http://localhost:9200/logs-2024/_search?size=0' \
  -H 'Content-Type: application/json' \
  -d '{"query":{"match_all":{}},"aggs":{"per_host":{"terms":{"field":"host.keyword","size":10}}}}' | \
  python3 -c "import sys,json; r=json.load(sys.stdin); print('took:', r['took'], 'ms')"

A healthy cluster should return most aggregations under 500 ms. If P95 is still above 5 s after thread pool and heap tuning, consider adding data nodes and triggering a shard rebalance via _cluster/reroute.
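To judge the P95 target objectively, collect took values from repeated benchmark runs and compute the percentile. The nearest-rank method is used here for simplicity, and the sample latencies are fabricated:

```python
import math

def percentile(samples_ms: list[int], p: float) -> int:
    """Nearest-rank percentile of a list of latencies in milliseconds."""
    ranked = sorted(samples_ms)
    rank = math.ceil(p / 100 * len(ranked))
    return ranked[max(rank - 1, 0)]

# took values (ms) from ten benchmark runs -- illustrative numbers
took = [120, 95, 180, 4100, 150, 130, 160, 110, 140, 175]
print(percentile(took, 95))  # 4100
```

A single 4.1 s outlier like this drags P95 far above the 500 ms target even when the median looks healthy, which is exactly the pattern that precedes intermittent timeouts.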

Appendix: Timeout Diagnostics Script

#!/usr/bin/env bash
# elasticsearch-timeout-diagnostics.sh
# Run against any Elasticsearch node to collect timeout-related metrics.
# Usage: ES_HOST=http://localhost:9200 bash elasticsearch-timeout-diagnostics.sh

ES="${ES_HOST:-http://localhost:9200}"
echo "=== Cluster Health ==="
curl -s "$ES/_cluster/health?pretty"

echo ""
echo "=== Thread Pool: search + write + bulk ==="
curl -s "$ES/_cat/thread_pool/search,write,bulk?v&h=node_name,name,type,active,queue,rejected,completed,largest"

echo ""
echo "=== JVM Heap Usage per Node ==="
curl -s "$ES/_nodes/stats/jvm" | \
  python3 -c "
import sys, json
data = json.load(sys.stdin)
for nid, node in data['nodes'].items():
    heap_pct = node['jvm']['mem']['heap_used_percent']
    gc_old   = node['jvm']['gc']['collectors']['old']['collection_time_in_millis']
    print(f"{node['name']:30s}  heap={heap_pct}%  old_gc_ms={gc_old}")
"

echo ""
echo "=== Circuit Breaker Status ==="
curl -s "$ES/_nodes/stats/breaker" | \
  python3 -c "
import sys, json
data = json.load(sys.stdin)
for nid, node in data['nodes'].items():
    print(f"Node: {node['name']}")
    for bname, bdata in node['breakers'].items():
        print(f"  {bname}: tripped={bdata['tripped']}  used={bdata['estimated_size']}  limit={bdata['limit_size']}")
"

echo ""
echo "=== Hot Threads (top 3, 500ms sample) ==="
curl -s "$ES/_nodes/hot_threads?threads=3&interval=500ms&type=cpu"

echo ""
echo "=== Pending Tasks ==="
curl -s "$ES/_cluster/pending_tasks?pretty"

echo ""
echo "=== Slowlog thresholds on all indices ==="
curl -s "$ES/_all/_settings?pretty&filter_path=**.slowlog"

echo ""
echo "=== Search Latency (took ms for match_all with size=0) ==="
curl -s -w "\nHTTP %{http_code}  wall=%{time_total}s" \
  "$ES/_search?size=0&pretty" \
  -H 'Content-Type: application/json' \
  -d '{"query":{"match_all":{}}}' | tail -5

Error Medic Editorial

Error Medic Editorial is a team of senior DevOps and SRE engineers with collective experience running Elasticsearch clusters from single-node development setups to 200-node multi-region deployments. The team specializes in distributed systems observability, JVM performance tuning, and cloud-native search infrastructure.
