Elasticsearch API Timeout: Diagnosing and Fixing Request Timeout Errors
Fix Elasticsearch API timeout errors fast. Covers socket timeouts, request_timeout settings, slow queries, and cluster health fixes with real commands.
- Most Elasticsearch API timeouts are caused by one of three root causes: undersized thread pools, slow or unoptimized queries hitting too many shards, or a cluster with unassigned shards causing request queuing
- The default request timeout depends on the client — commonly 10 seconds in the Python client and 30 seconds in the Java REST client; if your query or bulk operation exceeds it, you will see a `ConnectionTimeout` or `RequestError` with a `timeout` reason in the response
- Quick fix: start by running `GET _cluster/health` and `GET _cat/thread_pool?v` to triage whether the issue is cluster-wide, query-specific, or client-configuration-related before changing any settings
| Method | When to Use | Time to Apply | Risk |
|---|---|---|---|
| Increase client request_timeout | Slow but valid long-running queries (reindex, bulk) | < 5 min | Low — client-side only |
| Tune search.default_search_timeout | Queries consistently slow across the cluster | 5–15 min | Medium — affects all queries |
| Add more shards / reduce shard count | Too many or too few shards causing hot spots | 30–120 min | Medium — requires reindex |
| Scale data nodes horizontally | Thread pool queue saturation under sustained load | 1–4 hrs | Low — additive change |
| Add request circuit breaker | Protect cluster from memory-exhausting queries | 10 min | Low — adds guardrails |
| Optimize query (filters, avoid wildcards) | Specific slow query identified via slow log | 15–60 min | Low — query-level change |
| Force-assign unassigned shards | Red/yellow cluster blocking primary operations | 5–30 min | Medium — review reason first |
Understanding Elasticsearch API Timeout Errors
When a request to Elasticsearch exceeds the configured time limit, the client or server terminates the connection and surfaces one of several timeout-related errors. Understanding which layer is timing out is the critical first diagnostic step.
Common Error Messages
Depending on your client library and configuration, you will see one of these signatures:
Python elasticsearch-py client:
elasticsearch.exceptions.ConnectionTimeout: ConnectionTimeout caused by - ReadTimeoutError(HTTPConnectionPool(host='localhost', port=9200): Read timed out. (read timeout=30))
Java High-Level REST Client:
java.net.SocketTimeoutException: 30,000 milliseconds timeout on connection http-outgoing-0 [ACTIVE]
Kibana / curl:
{"error":{"root_cause":[{"type":"search_phase_execution_exception","reason":"all shards failed"}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"my-index","node":"abc123","reason":{"type":"query_shard_exception","reason":"Request timed out"}}]}}
Server-side timeout (search timeout):
{"timed_out":true,"_shards":{"total":5,"successful":3,"skipped":0,"failed":2}}
Note the difference: a client timeout throws an exception before any response arrives. A server-side timeout returns a 200 response with "timed_out": true and partial results.
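That distinction can be captured in a small helper. This is a sketch — the function name and return labels are ours — but the fields it reads (`timed_out`, `_shards.failed`) are the real response fields:

```python
# Sketch: identify which layer timed out, given the outcome of a search call.
# A client timeout means no response ever arrived; a server-side timeout
# returns HTTP 200 with "timed_out": true and partial shard results.
def classify_timeout(response=None, exception=None):
    if exception is not None:
        return "client-timeout"   # no response; the cluster may still be working
    if response is not None and response.get("timed_out"):
        failed = response.get("_shards", {}).get("failed", 0)
        return f"server-timeout ({failed} shards failed, results are partial)"
    return "ok"
```

A client timeout is a candidate for retry or a longer `request_timeout`; a server timeout means the query itself needs work.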
Step 1: Diagnose — Identify the Timeout Layer
Run these commands in order to narrow down the root cause.
1a. Check cluster health first
curl -s 'http://localhost:9200/_cluster/health?pretty'
A red or yellow status means shards are unassigned. Primary shard unavailability causes all writes and some reads to block until timeout. If status is red, fix this before anything else.
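The triage logic above can be sketched as a function over the parsed health response. Field names (`status`, `unassigned_shards`) match the real API; the advice strings are our own summary:

```python
# Sketch: turn a parsed _cluster/health response into a triage hint.
def triage_health(health: dict) -> str:
    status = health.get("status")
    unassigned = health.get("unassigned_shards", 0)
    if status == "red":
        return f"red: {unassigned} unassigned shards -- fix allocation before anything else"
    if status == "yellow":
        return f"yellow: {unassigned} replica shards unassigned -- reads OK, redundancy degraded"
    return "green: cluster healthy -- look at queries or client config instead"
```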
1b. Inspect thread pool saturation
curl -s 'http://localhost:9200/_cat/thread_pool/search,write,bulk?v&h=node_name,name,active,queue,rejected,completed'
If queue is consistently > 0 or rejected is climbing, your nodes are overwhelmed. Requests are queuing past the client timeout window.
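If you poll this endpoint from a script, the saturation check is a one-liner per row. A minimal sketch, assuming the exact column list requested above (`node_name,name,active,queue,rejected,completed`):

```python
# Sketch: flag saturated thread pools from _cat/thread_pool text output.
def saturated_pools(cat_output: str):
    flagged = []
    for line in cat_output.strip().splitlines()[1:]:  # skip the ?v header row
        node, pool, active, queue, rejected, _completed = line.split()
        if int(queue) > 0 or int(rejected) > 0:
            flagged.append((node, pool, int(queue), int(rejected)))
    return flagged
```

Any pool with a persistent nonzero queue (or a climbing `rejected` count) is the one to investigate first.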
1c. Find slow queries via the slow log
Enable the slow log temporarily:
curl -X PUT 'http://localhost:9200/my-index/_settings' -H 'Content-Type: application/json' -d '{
"index.search.slowlog.threshold.query.warn": "2s",
"index.search.slowlog.threshold.fetch.warn": "1s",
"index.search.slowlog.level": "warn"
}'
Then tail the Elasticsearch log:
tail -f /var/log/elasticsearch/my-cluster_index_search_slowlog.log
1d. Check pending tasks
curl -s 'http://localhost:9200/_cluster/pending_tasks?pretty'
A large backlog of pending cluster tasks (e.g., shard assignment, mapping updates) can delay query execution.
1e. Check JVM heap pressure
curl -s 'http://localhost:9200/_nodes/stats/jvm?pretty' | python3 -c "
import json, sys
nodes = json.load(sys.stdin)['nodes']
for nid, n in nodes.items():
    heap = n['jvm']['mem']
    used_pct = heap['heap_used_percent']
    print(f\"{n['name']}: heap {used_pct}%\")
"
Heap usage consistently above 75% triggers frequent GC pauses, which directly cause timeouts.
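The same 75% threshold can be applied programmatically to the parsed stats. The field paths here match the real `_nodes/stats/jvm` response; the threshold default is the guideline above:

```python
# Sketch: list nodes over a heap-usage threshold, worst first.
def hot_heap_nodes(stats: dict, threshold_pct: int = 75):
    hot = []
    for n in stats["nodes"].values():
        pct = n["jvm"]["mem"]["heap_used_percent"]
        if pct >= threshold_pct:
            hot.append((n["name"], pct))
    return sorted(hot, key=lambda t: -t[1])
```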
Step 2: Fix — Apply the Right Remedy
Fix A: Adjust the Client-Side Timeout
This is the correct fix when your operation (bulk indexing, reindex, aggregation over large datasets) is legitimately long-running and valid.
Python:
from elasticsearch import Elasticsearch
es = Elasticsearch(
['http://localhost:9200'],
request_timeout=120 # seconds
)
# Per-request override
result = es.search(index='my-index', body=query, request_timeout=60)
Node.js (@elastic/elasticsearch):
const { Client } = require('@elastic/elasticsearch')
const client = new Client({
node: 'http://localhost:9200',
requestTimeout: 120000 // milliseconds
})
curl:
curl --max-time 120 -X GET 'http://localhost:9200/my-index/_search' -H 'Content-Type: application/json' -d '{"query":{"match_all":{}}}'
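If occasional timeouts are expected even with a raised limit, pair the higher timeout with a retry. This sketch is deliberately generic — it is not tied to any client library; pass in whatever callable performs the request:

```python
import time

# Sketch: retry a callable with exponential backoff on transient timeouts.
def with_retries(fn, retries=3, base_delay=1.0, retry_on=(TimeoutError,)):
    for attempt in range(retries + 1):
        try:
            return fn()
        except retry_on:
            if attempt == retries:
                raise                                  # out of attempts
            time.sleep(base_delay * (2 ** attempt))    # 1s, 2s, 4s, ...
```

Backoff matters here: immediate retries against a saturated cluster only deepen the thread pool queues that caused the timeout in the first place.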
Fix B: Set a Cluster-Wide Search Timeout
This prevents runaway queries from tying up resources. It returns partial results rather than blocking forever.
curl -X PUT 'http://localhost:9200/_cluster/settings' -H 'Content-Type: application/json' -d '{
"persistent": {
"search.default_search_timeout": "30s"
}
}'
Or pass timeout per request at query time:
curl -X GET 'http://localhost:9200/my-index/_search?timeout=10s' -H 'Content-Type: application/json' -d '{"query":{"match_all":{}}}'
Fix C: Fix Unassigned Shards (Red Cluster)
# Find unassigned shards and their reason
curl -s 'http://localhost:9200/_cluster/allocation/explain?pretty'
# Retry failed shard allocation
curl -X POST 'http://localhost:9200/_cluster/reroute?retry_failed=true'
# If a node was permanently removed and you accept data loss on that shard:
curl -X POST 'http://localhost:9200/_cluster/reroute' -H 'Content-Type: application/json' -d '{
"commands": [{
"allocate_stale_primary": {
"index": "my-index",
"shard": 0,
"node": "node-1",
"accept_data_loss": true
}
}]
}'
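Which of those commands to run depends on the `reason` field inside `unassigned_info` in the allocation-explain output. The reason codes below are real Elasticsearch values; the suggested actions are our own summary of the fixes above:

```python
# Sketch: map an allocation-explain reason code to a suggested next step.
SUGGESTIONS = {
    "ALLOCATION_FAILED": "fix the underlying cause, then POST _cluster/reroute?retry_failed=true",
    "NODE_LEFT": "wait for the node to rejoin, or allocate_stale_primary if it is gone for good",
    "INDEX_CREATED": "usually transient -- check disk watermarks if it persists",
    "CLUSTER_RECOVERED": "usually transient after a restart -- wait, then re-check health",
}

def suggest_fix(explain: dict) -> str:
    reason = explain.get("unassigned_info", {}).get("reason", "UNKNOWN")
    return SUGGESTIONS.get(reason, f"unrecognized reason {reason}: read the full explain output")
```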
Fix D: Relieve Thread Pool Pressure
For sustained search thread pool saturation, tune the thread pool size (requires node restart):
# elasticsearch.yml
thread_pool:
search:
size: 13 # default: (vCPUs * 3) / 2 + 1
queue_size: 1000 # default: 1000
Alternatively, reduce concurrent client connections or implement request rate limiting upstream (NGINX, API gateway).
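Before raising `size`, sanity-check what the default already is for your hardware. The formula is the one from the comment above, `int((vCPUs * 3) / 2) + 1`:

```python
# Sketch: compute the default search thread pool size for a node.
def default_search_pool_size(vcpus: int) -> int:
    return (vcpus * 3) // 2 + 1
```

For example, an 8-vCPU node defaults to 13 search threads, which is where the example value in the YAML above comes from. Raising it beyond the default rarely helps on CPU-bound workloads.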
Fix E: Optimize the Query
Replace slow wildcard / fuzzy / leading-wildcard patterns with match, prefix on keyword fields, or use ngram tokenizers:
# Bad — leading wildcard scans all terms:
{"query": {"wildcard": {"title": "*cloud*"}}}
# Good — use match on analyzed field or prefix on keyword:
{"query": {"match": {"title": "cloud"}}}
Use filter context for non-scoring clauses (enables caching):
{
"query": {
"bool": {
"filter": [
{"term": {"status": "active"}},
{"range": {"created_at": {"gte": "now-7d"}}}
],
"must": [
{"match": {"description": "kubernetes"}}
]
}
}
}
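The must-to-filter move can even be mechanized for simple bool queries. A sketch, with the caveat that which clause types count as "non-scoring" is our assumption — review it per query:

```python
# Sketch: move non-scoring clause types from "must" to "filter" in a bool
# query, so they can use the filter cache and skip scoring.
NON_SCORING = {"term", "terms", "range", "exists"}  # our assumption

def prefer_filter_context(query: dict) -> dict:
    bool_q = query.get("query", {}).get("bool")
    if not bool_q:
        return query
    must, filters = [], list(bool_q.get("filter", []))
    for clause in bool_q.get("must", []):
        (filters if set(clause) & NON_SCORING else must).append(clause)
    bool_q["must"], bool_q["filter"] = must, filters
    return query
```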
Fix F: Reduce Shard Count for Over-Sharded Indices
Over-sharding (too many small shards) causes excessive coordination overhead. The recommended shard size is 10–50 GB.
# Check current shard sizes
curl -s 'http://localhost:9200/_cat/shards?v&h=index,shard,prirep,state,docs,store,node&s=store:desc'
# Shrink an over-sharded index (must be read-only, all shards on one node)
curl -X PUT 'http://localhost:9200/my-index/_settings' -H 'Content-Type: application/json' -d '{
"settings": {
"index.routing.allocation.require._name": "node-1",
"index.blocks.write": true
}
}'
curl -X POST 'http://localhost:9200/my-index/_shrink/my-index-shrunk' -H 'Content-Type: application/json' -d '{
"settings": {
"index.number_of_shards": 1,
"index.number_of_replicas": 1,
"index.routing.allocation.require._name": null
}
}'
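To pick the target shard count, the 10–50 GB guideline can be turned into a quick calculator. The 30 GB default target here is our choice within that range:

```python
import math

# Sketch: target shard count from total index size, given a per-shard
# size target inside the 10-50 GB guideline.
def recommended_shards(total_gb: float, target_shard_gb: float = 30.0) -> int:
    return max(1, math.ceil(total_gb / target_shard_gb))
```

So a 120 GB index lands at 4 shards, while a 5 GB index lands at 1 — a shrink candidate. Remember that `_shrink` additionally requires the source shard count to be a multiple of the target count.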
Step 3: Verify the Fix
After applying your fix, confirm the improvement:
# Confirm cluster is green
curl -s 'http://localhost:9200/_cluster/health?wait_for_status=green&timeout=30s&pretty'
# Check response time of your query
time curl -s -X GET 'http://localhost:9200/my-index/_search' -H 'Content-Type: application/json' -d '{"query":{"match_all":{}},"size":1}' | python3 -m json.tool | grep took
# Confirm timed_out is false
curl -s 'http://localhost:9200/my-index/_search?timeout=10s' -H 'Content-Type: application/json' -d '{"query":{"match_all":{}}}' | python3 -c "import json,sys; r=json.load(sys.stdin); print('timed_out:', r.get('timed_out'))"
Complete Triage Script
Save and run this script against your cluster to collect all of the diagnostics above in one pass:
#!/usr/bin/env bash
# elasticsearch-timeout-triage.sh
# Run against your cluster to collect timeout diagnostic data
ES_HOST="${ES_HOST:-http://localhost:9200}"
INDEX="${1:-*}" # pass index name as arg or defaults to all
echo "=== [1] Cluster Health ==="
curl -s "${ES_HOST}/_cluster/health?pretty"
echo -e "\n=== [2] Thread Pool Saturation ==="
curl -s "${ES_HOST}/_cat/thread_pool/search,write,bulk?v&h=node_name,name,active,queue,rejected,completed"
echo -e "\n=== [3] Pending Cluster Tasks ==="
curl -s "${ES_HOST}/_cluster/pending_tasks?pretty" | python3 -c "
import json,sys
d=json.load(sys.stdin)
print(f'Pending tasks: {len(d.get(\"tasks\",[]))}')
for t in d.get('tasks',[])[:5]:
    print(' -', t.get('source','?'), '|', t.get('time_in_queue','?'))
"
echo -e "\n=== [4] JVM Heap by Node ==="
curl -s "${ES_HOST}/_nodes/stats/jvm?pretty" | python3 -c "
import json,sys
nodes=json.load(sys.stdin)['nodes']
for nid,n in nodes.items():
    heap=n['jvm']['mem']
    used=heap['heap_used_in_bytes']
    total=heap['heap_max_in_bytes']
    pct=heap['heap_used_percent']
    gc_count=sum(v['collection_count'] for v in n['jvm']['gc']['collectors'].values())
    print(f\"{n['name']}: heap={pct}% ({used//1024//1024}MB/{total//1024//1024}MB) gc_count={gc_count}\")
"
echo -e "\n=== [5] Unassigned Shards ==="
UNASSIGNED=$(curl -s "${ES_HOST}/_cat/shards?h=index,shard,prirep,state" | grep -c UNASSIGNED || true)
echo "Unassigned shard count: ${UNASSIGNED}"
if [ "${UNASSIGNED}" -gt 0 ]; then
echo " --> Run: curl -s '${ES_HOST}/_cluster/allocation/explain?pretty' for details"
fi
echo -e "\n=== [6] Large / Hot Shards ==="
curl -s "${ES_HOST}/_cat/shards/${INDEX}?v&h=index,shard,prirep,state,docs,store,node&s=store:desc" | head -20
echo -e "\n=== [7] Sample Query Latency ==="
SAMPLE_MS=$(curl -s -X GET "${ES_HOST}/${INDEX}/_search" \
-H 'Content-Type: application/json' \
-d '{"query":{"match_all":{}},"size":1,"timeout":"5s"}' \
| python3 -c "import json,sys; r=json.load(sys.stdin); print(r.get('took','?'),'ms | timed_out:',r.get('timed_out'))")
echo " Took: ${SAMPLE_MS}"
echo -e "\n=== [8] Circuit Breaker Status ==="
curl -s "${ES_HOST}/_nodes/stats/breaker?pretty" | python3 -c "
import json,sys
nodes=json.load(sys.stdin)['nodes']
for nid,n in nodes.items():
    print(n['name'])
    for name,cb in n['breakers'].items():
        print(f\"  {name}: {cb['overhead']}x overhead, limit={cb['limit_size_in_bytes']//1024//1024}MB, tripped={cb['tripped']}\")
"
echo -e "\n=== Triage Complete ==="
Error Medic Editorial
Error Medic Editorial is a team of senior DevOps engineers and SREs with hands-on experience managing large-scale Elasticsearch deployments across cloud and on-premises environments. Our guides are based on real production incidents, official documentation, and community-verified solutions.
Sources
- https://www.elastic.co/guide/en/elasticsearch/reference/current/search-your-data.html
- https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-threadpool.html
- https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-health.html
- https://www.elastic.co/guide/en/elasticsearch/client/python-api/current/getting-started-python.html
- https://github.com/elastic/elasticsearch-py/issues/1005
- https://stackoverflow.com/questions/22924300/elasticsearch-timeout-request
- https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules-slowlog.html