Elasticsearch API Timeout: Diagnosing and Fixing ReadTimeoutError, SocketTimeoutException, and Request Timeout Errors
Fix Elasticsearch API timeout errors—ReadTimeoutError, SocketTimeoutException, 408/504 responses—with step-by-step diagnosis and config tuning.
- The most common root causes are undersized thread pools, JVM GC pauses causing stop-the-world events, and queries hitting unoptimized large indices without shard routing
- Network-level timeouts (TCP keepalive, load balancer idle timeout) frequently mask themselves as Elasticsearch client timeouts—always check both layers
- Quick fix: increase client request_timeout to 60s as a stopgap, but the permanent fix requires identifying whether the bottleneck is at the query, index, JVM, or network layer
| Method | When to Use | Time to Apply | Risk |
|---|---|---|---|
| Increase client request_timeout | Immediate relief while diagnosing root cause | < 5 min | Low — client-side only |
| Tune search.default_search_timeout | Queries consistently slow on large indices | 5–10 min, rolling restart not required | Low — cluster setting |
| Add index routing / filter context | Queries scan too many shards | 30–60 min (index rebuild may be needed) | Medium — schema change |
| Scale JVM heap and GC tuning | Full GC pauses > 10s visible in logs | 15–30 min, requires node restart | Medium — node restart |
| Add replica shards / hot-warm tiering | Sustained high read throughput | 1–4 hours | Low-Medium — cluster rebalance |
| Fix load balancer idle timeout | Connections silently dropped mid-request | 5–15 min (infra change) | Low |
Understanding Elasticsearch API Timeouts
An Elasticsearch API timeout surfaces differently depending on where in the request chain the clock runs out. Developers typically encounter one of these exact error strings:
ReadTimeoutError: HTTPSConnectionPool(host='es-host', port=9200): Read timed out. (read timeout=10)
SocketTimeoutException: 30,000 milliseconds timeout on connection http://es-host:9200
ConnectionTimeout: ConnectionTimeout caused by - ReadTimeoutError
[408 Request Timeout] or [504 Gateway Timeout] from a proxy
TransportException[failed to get node response]
The timeout can originate at three distinct layers: the client library (Python elasticsearch-py, Java High-Level REST Client, etc.), the network/proxy layer (ALB, NGINX, HAProxy), or inside Elasticsearch itself (search timeout, bulk timeout). Conflating these layers is the single most common debugging mistake.
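A quick way to keep the layers straight: a client-layer timeout raises an exception before any response arrives, while Elasticsearch's own search timeout does not fail the request at all; it returns HTTP 200 with partial results and "timed_out": true in the body. The sketch below encodes that distinction as plain logic (the function name and return strings are illustrative, not from any client library):

```python
# Sketch: classify which layer a timeout came from, given either a raised
# exception or a parsed _search response body. Illustrative names only.

def classify_timeout(exception=None, response=None):
    """Return which layer most likely timed out."""
    if exception is not None:
        name = type(exception).__name__
        if "ReadTimeout" in name or "ConnectionTimeout" in name:
            return "client layer: request_timeout expired before any response"
        return "unknown exception: " + name
    if response is not None:
        # An ES-side search timeout looks like a *successful* response:
        # HTTP 200, partial hits, and "timed_out": true.
        if response.get("timed_out"):
            return "elasticsearch layer: search timeout, partial results returned"
        return "no timeout: query completed"
    return "no data"

partial = {"timed_out": True, "took": 30000, "hits": {"total": {"value": 12}}}
print(classify_timeout(response=partial))
```

If you see "timed_out": true in otherwise healthy responses, the problem is query cost inside Elasticsearch, not the client or network.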
Step 1: Identify Which Layer Is Timing Out
Check cluster health and slow-log first:
# Cluster health — are you yellow/red?
curl -s 'http://localhost:9200/_cluster/health?pretty'
# Node stats — check thread pool rejection counters
curl -s 'http://localhost:9200/_nodes/stats/thread_pool?pretty' | \
python3 -c "import sys,json; d=json.load(sys.stdin); \
[print(n, tp, v) for n,nd in d['nodes'].items() \
for tp,tv in nd['thread_pool'].items() \
for k,v in tv.items() if k in ('rejected','queue') and v>0]"
# Pending tasks
curl -s 'http://localhost:9200/_cluster/pending_tasks?pretty'
# Hot threads — what is the JVM actually doing?
curl -s 'http://localhost:9200/_nodes/hot_threads'
Enable slow logs on the problematic index:
curl -X PUT 'http://localhost:9200/my-index/_settings' -H 'Content-Type: application/json' -d '{
"index.search.slowlog.threshold.query.warn": "5s",
"index.search.slowlog.threshold.query.info": "2s",
"index.search.slowlog.threshold.fetch.warn": "1s",
"index.indexing.slowlog.threshold.index.warn": "5s"
}'
Then tail /var/log/elasticsearch/*_index_search_slowlog.log on each data node.
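When the slow log gets noisy, a small parser helps rank offenders by duration. This sketch assumes the classic plaintext slow-log format with `took_millis[...]` and `source[...]` fields; newer versions may emit JSON lines instead, in which case `json.loads` per line is simpler:

```python
import re

# Sketch: extract duration and query source from a plaintext search slow-log
# line. Patterns assume the classic "took_millis[...] ... source[...]" format.
TOOK_RE = re.compile(r"took_millis\[(\d+)\]")
SOURCE_RE = re.compile(r"source\[(.*?)\](?:,|$)")

def parse_slowlog_line(line):
    took = TOOK_RE.search(line)
    source = SOURCE_RE.search(line)
    return {
        "took_millis": int(took.group(1)) if took else None,
        "source": source.group(1) if source else None,
    }

sample = ('[2024-01-15T10:00:00,123][WARN ][i.s.s.query] [node-1] [my-index][0] '
          'took[12.3s], took_millis[12345], total_hits[10000+ hits], '
          'source[{"query":{"match_all":{}}}]')
parsed = parse_slowlog_line(sample)
print(parsed["took_millis"], parsed["source"])
```

Sort parsed entries by `took_millis` descending; anything near your client timeout is a direct timeout suspect.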
Check if a load balancer is killing the connection:
AWS ALB default idle timeout is 60 seconds. If your queries consistently time out at exactly 60s from the client side but Elasticsearch is still running the query, the ALB is dropping the TCP connection. Verify:
# Time an actual request end-to-end
time curl -s -w "\n\nHTTP %{http_code} | Total: %{time_total}s\n" \
'http://localhost:9200/my-index/_search?pretty' \
-H 'Content-Type: application/json' \
-d '{"query":{"match_all":{}},"size":1}'
# Compare against direct node access (bypassing LB)
time curl -s 'http://ES-NODE-IP:9200/my-index/_search' \
-H 'Content-Type: application/json' \
-d '{"query":{"match_all":{}},"size":1}'
Step 2: Diagnose JVM GC Pauses
GC stop-the-world events pause the entire JVM, making every in-flight request appear to time out simultaneously. This is a classic pattern: you see a burst of timeouts across multiple unrelated queries at the exact same second.
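That burst pattern is easy to check for mechanically: bucket your application's timeout timestamps by second and flag seconds where several unrelated requests failed together. A minimal sketch (timestamps and the burst threshold are illustrative):

```python
from collections import Counter
from datetime import datetime

# Sketch: detect the GC-pause signature -- multiple unrelated requests
# timing out within the same second. min_burst=3 is an arbitrary threshold.

def gc_burst_suspects(timeout_timestamps, min_burst=3):
    """Return the seconds in which >= min_burst timeouts fired together."""
    by_second = Counter(ts.replace(microsecond=0) for ts in timeout_timestamps)
    return [second for second, count in by_second.items() if count >= min_burst]

events = [
    datetime(2024, 1, 15, 10, 0, 5, 120000),   # three different queries
    datetime(2024, 1, 15, 10, 0, 5, 480000),   # all timing out in the
    datetime(2024, 1, 15, 10, 0, 5, 900000),   # same second -> GC suspect
    datetime(2024, 1, 15, 10, 7, 41, 0),       # isolated -> likely a slow query
]
print(gc_burst_suspects(events))  # one burst second flagged
```

Cross-reference any flagged second against the gc.log timestamps: a matching full-GC pause confirms the diagnosis.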
# Look for GC log evidence
grep -E 'Pause Full|Pause Young' /var/log/elasticsearch/gc.log | tail -50
# Node JVM stats
curl -s 'http://localhost:9200/_nodes/stats/jvm?pretty' | \
python3 -c "
import sys, json
d = json.load(sys.stdin)
for nid, n in d['nodes'].items():
jvm = n['jvm']
print(n['name'])
print(' heap_used_percent:', jvm['mem']['heap_used_percent'])
print(' old_gc_count:', jvm['gc']['collectors']['old']['collection_count'])
print(' old_gc_time_ms:', jvm['gc']['collectors']['old']['collection_time_in_millis'])
"
If heap_used_percent is consistently above 75%, or the old-generation collection_time_in_millis is growing rapidly between samples, you have a heap pressure problem.
Fix for JVM heap pressure:
- Set heap to no more than 50% of RAM, max 31GB (stays below compressed OOP threshold):
# In /etc/elasticsearch/jvm.options
-Xms16g
-Xmx16g
- Switch to G1GC if on JDK 9+ (the default collector since ES 7.x):
-XX:+UseG1GC
-XX:G1HeapRegionSize=4m
-XX:InitiatingHeapOccupancyPercent=30
- Identify field data cache bloat:
curl -s 'http://localhost:9200/_nodes/stats/indices/fielddata?pretty' | grep -E 'memory_size|evictions'
Step 3: Diagnose Query-Level Timeouts
A query scanning too many shards or too many documents will exhaust the search thread pool, causing queuing and eventual timeout.
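You can estimate the pressure with back-of-envelope arithmetic: each query fans out roughly one search task per shard it touches, against a pool whose default size is `cores * 3 / 2 + 1` with a queue of 1000. Those defaults match recent Elasticsearch versions but should be verified against `_nodes/stats/thread_pool` for yours; the function below is a sketch under that assumption:

```python
# Back-of-envelope sketch: will this load saturate the search thread pool?
# pool size = cores * 3 / 2 + 1 and queue = 1000 are assumed defaults --
# verify against _nodes/stats/thread_pool for your version.

def search_pool_pressure(cores, shards_per_query, concurrent_queries):
    pool_size = cores * 3 // 2 + 1
    queue_capacity = 1000
    # Each query produces one search task per shard it must touch
    tasks_in_flight = shards_per_query * concurrent_queries
    queued = max(0, tasks_in_flight - pool_size)
    return {
        "pool_size": pool_size,
        "tasks_in_flight": tasks_in_flight,
        "queued": queued,
        "rejecting": queued > queue_capacity,
    }

# 50 shards x 30 concurrent searches = 1500 tasks on a 16-core node:
print(search_pool_pressure(cores=16, shards_per_query=50, concurrent_queries=30))
```

Here a 16-core node has a pool of 25 threads, so 1500 in-flight tasks overflow the 1000-slot queue and requests start rejecting, which the client sees as timeouts or 429s.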
# How many shards does this index have?
curl -s 'http://localhost:9200/_cat/shards/my-index?v&h=index,shard,prirep,state,docs,store,node'
# How expensive is the query? Use profile API
curl -X POST 'http://localhost:9200/my-index/_search?pretty' \
-H 'Content-Type: application/json' -d '{
"profile": true,
"query": { "YOUR_QUERY_HERE": {} }
}'
# Explain a specific document match
curl -X GET 'http://localhost:9200/my-index/_explain/DOC_ID?pretty' \
-H 'Content-Type: application/json' -d '{
"query": { "YOUR_QUERY_HERE": {} }
}'
Common query fixes:
- Replace wildcard: {"field": "*term*"} with an ngram analyzer at index time
- Move date range filters into filter context (cached) instead of query context (scored)
- Add a routing parameter to limit shard fan-out: POST /my-index/_search?routing=tenant_id
- Set a per-request search timeout so slow queries return partial results instead of hanging: { "timeout": "30s", "query": { ... } }
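The filter-context fix above can be sketched as two request bodies side by side; the field names ("status", "@timestamp") are illustrative, not from any particular mapping:

```python
import json

# Sketch: the same date constraint in scoring (query) context vs. cached,
# non-scoring filter context. Field names are illustrative.

scored = {  # range clause participates in scoring -> not cached
    "query": {"bool": {"must": [
        {"match": {"status": "error"}},
        {"range": {"@timestamp": {"gte": "now-1h"}}},
    ]}}
}

filtered = {  # range clause in filter context -> cacheable, no scoring cost
    "timeout": "30s",  # per-request safety net
    "query": {"bool": {
        "must": [{"match": {"status": "error"}}],
        "filter": [{"range": {"@timestamp": {"gte": "now-1h"}}}],
    }}
}

print(json.dumps(filtered, indent=2))
```

The two bodies return the same documents; the filtered form just skips scoring the range clause and lets Elasticsearch cache it across requests.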
Set a cluster-wide search timeout as a safety net (note that when it trips, the request still returns HTTP 200 with partial results and "timed_out": true rather than an error):
curl -X PUT 'http://localhost:9200/_cluster/settings' \
-H 'Content-Type: application/json' -d '{
"persistent": {
"search.default_search_timeout": "30s"
}
}'
Step 4: Fix Client-Side Timeout Configuration
Python (elasticsearch-py 8.x):
from elasticsearch import Elasticsearch
es = Elasticsearch(
["https://es-host:9200"],
request_timeout=60, # seconds before ReadTimeoutError
retry_on_timeout=True,
max_retries=3,
http_compress=True,
)
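The client's retry_on_timeout retries on another node without any delay between attempts. If your timeouts come from GC pauses, adding exponential backoff avoids hammering a cluster mid-pause; here is a pure-stdlib sketch not tied to elasticsearch-py, where the wrapped callable stands in for a search call:

```python
import random
import time

# Sketch: exponential backoff with full jitter around any callable that may
# raise a timeout. Wrap your es.search(...) call in a lambda to use it.

def with_backoff(fn, retries=3, base_delay=0.5, sleep=time.sleep):
    for attempt in range(retries + 1):
        try:
            return fn()
        except TimeoutError:
            if attempt == retries:
                raise
            # Full jitter: uniform in [0, base * 2^attempt], capped at 30s
            delay = min(30.0, base_delay * (2 ** attempt)) * random.random()
            sleep(delay)

# Usage with a flaky callable standing in for a search request:
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("read timed out")
    return {"timed_out": False, "hits": []}

print(with_backoff(flaky, sleep=lambda d: None))  # succeeds on third attempt
```

In production you would catch your client library's specific timeout exception class instead of the bare TimeoutError used here.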
Node.js (@elastic/elasticsearch):
const { Client } = require('@elastic/elasticsearch')
const client = new Client({
node: 'https://es-host:9200',
requestTimeout: 60000, // milliseconds
sniffOnStart: false, // avoid sniff timeouts in prod
})
Java (RestHighLevelClient / 8.x JavaClient):
RestClientBuilder builder = RestClient.builder(
new HttpHost("es-host", 9200, "https"))
.setRequestConfigCallback(config -> config
.setConnectTimeout(5000)
.setSocketTimeout(60000)) // ms
.setHttpClientConfigCallback(httpClient -> httpClient
.setKeepAliveStrategy((response, context) -> 60_000));
Step 5: Fix Network / Proxy Timeouts
AWS ALB: Set idle timeout to 120s+ in the console or:
aws elbv2 modify-load-balancer-attributes \
--load-balancer-arn arn:aws:elasticloadbalancing:... \
--attributes Key=idle_timeout.timeout_seconds,Value=120
NGINX upstream:
upstream elasticsearch {
server es-host:9200;
keepalive 32;
}
server {
location / {
proxy_pass http://elasticsearch;
proxy_read_timeout 120s;
proxy_send_timeout 120s;
proxy_connect_timeout 10s;
}
}
HAProxy:
timeout connect 10s
timeout client 120s
timeout server 120s
Step 6: Validate the Fix
# Watch thread pool rejection counters in real time
watch -n 5 "curl -s 'http://localhost:9200/_cat/thread_pool?v&h=name,active,queue,rejected&s=rejected:desc' | head -15"
# Confirm GC pressure reduced
curl -s 'http://localhost:9200/_nodes/stats/jvm' | \
python3 -c "import sys,json; d=json.load(sys.stdin); \
[print(n['name'], n['jvm']['mem']['heap_used_percent'], '%') \
for n in d['nodes'].values()]"
# Run a representative query with timing
for i in $(seq 1 10); do
curl -s -w "%{time_total}\n" -o /dev/null \
-X POST 'http://localhost:9200/my-index/_search' \
-H 'Content-Type: application/json' \
-d '{"query":{"match_all":{}},"size":10}'
done
If median response times drop below your SLA threshold and thread pool rejections return to zero, the fix is confirmed.
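That SLA check can be automated from the loop's output. The sketch below takes the ten time_total values from curl and reports median, worst case, and a pass/fail against an illustrative 1-second threshold:

```python
import statistics

# Sketch: turn the timing loop's output into a pass/fail check.
# sla_seconds=1.0 is an illustrative threshold; use your own SLA.

def sla_report(timings_s, sla_seconds=1.0):
    med = statistics.median(timings_s)
    worst = max(timings_s)
    return {
        "median_s": round(med, 3),
        "max_s": round(worst, 3),
        "within_sla": med < sla_seconds,
    }

# Illustrative timings: one GC-pause outlier, otherwise fast
sample_runs = [0.21, 0.19, 0.24, 0.22, 0.95, 0.20, 0.23, 0.18, 0.25, 0.22]
print(sla_report(sample_runs))
```

Judging on the median keeps one outlier from failing the check, but a high max_s alongside a healthy median is itself a signal: it is the intermittent-pause pattern from Step 2, not a uniformly slow query.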
One-Shot Diagnostic Script
The script below bundles the checks from Steps 1–3 into a single pass:
#!/usr/bin/env bash
# elasticsearch-timeout-diagnose.sh
# Run on any host with curl access to your ES cluster
ES="http://localhost:9200"
echo "=== 1. Cluster Health ==="
curl -s "$ES/_cluster/health?pretty"
echo ""
echo "=== 2. Thread Pool Rejections (non-zero only) ==="
curl -s "$ES/_cat/thread_pool?v&h=name,active,queue,rejected,completed&s=rejected:desc" | \
awk 'NR==1 || $4 > 0'
echo ""
echo "=== 3. JVM Heap + GC Per Node ==="
curl -s "$ES/_nodes/stats/jvm?pretty" | python3 -c "
import sys, json
d = json.load(sys.stdin)
for nid, n in d['nodes'].items():
jvm = n['jvm']
print('Node:', n['name'])
print(' heap_used_percent:', jvm['mem']['heap_used_percent'])
old = jvm['gc']['collectors'].get('old', {})
print(' old_gc_count:', old.get('collection_count', 'N/A'))
print(' old_gc_time_ms:', old.get('collection_time_in_millis', 'N/A'))
"
echo ""
echo "=== 4. Pending Cluster Tasks ==="
curl -s "$ES/_cluster/pending_tasks?pretty"
echo ""
echo "=== 5. Hot Threads (top CPU consumers) ==="
curl -s "$ES/_nodes/hot_threads?threads=3&interval=2s"
echo ""
echo "=== 6. Shard Distribution ==="
curl -s "$ES/_cat/shards?v&h=index,shard,prirep,state,docs,store,node&s=store:desc" | head -30
echo ""
echo "=== 7. FieldData Cache Memory ==="
curl -s "$ES/_nodes/stats/indices/fielddata?pretty" | python3 -c "
import sys, json
d = json.load(sys.stdin)
for nid, n in d['nodes'].items():
fd = n['indices']['fielddata']
print(n['name'], '|', 'fielddata_memory:', fd['memory_size'], '| evictions:', fd['evictions'])
"
echo ""
echo "=== 8. Timed Query Sample (5 requests) ==="
for i in $(seq 1 5); do
RESULT=$(curl -s -w "HTTP:%{http_code} TIME:%{time_total}s" \
-X POST "$ES/_search" \
-H 'Content-Type: application/json' \
-d '{"query":{"match_all":{}},"size":10,"timeout":"10s"}')
echo " Run $i: $(echo $RESULT | grep -o 'HTTP:[^ ]*') $(echo $RESULT | grep -o 'TIME:[^ ]*')"
done
echo ""
echo "=== Done. Review thread pool rejections, heap %, and GC time ==="
# Remediation quick reference:
# High rejections -> increase search threadpool OR fix slow queries
# heap_used > 75% -> reduce fielddata, lower doc fetch size, tune JVM heap
# GC time growing -> check for cartesian products in aggregations, large sorts
# Pending tasks > 0 -> cluster rebalancing; wait or investigate shard assignment
Error Medic Editorial
Error Medic Editorial is written by senior DevOps and SRE engineers with production experience running Elasticsearch clusters at scale across AWS, GCP, and on-premise environments. Articles are peer-reviewed for technical accuracy and tested against current Elasticsearch 7.x and 8.x releases.