Error Medic

Resolving Elasticsearch API Timeout Errors: 408 Request Timeout & Read timed out

Fix Elasticsearch API timeouts (408 Request Timeout, Read timed out) by optimizing heavy queries, tuning thread pools, and adjusting client-side limits.

Key Takeaways
  • Client-side connection limits or read timeouts are too low for the query complexity.
  • High JVM Garbage Collection (GC) pauses are freezing node responsiveness.
  • Search or write thread pools are exhausted, leading to queued and ultimately rejected requests (EsRejectedExecutionException).
  • Unoptimized queries (e.g., leading wildcards, deep pagination, massive aggregations) are consuming excessive CPU and memory.
  • Quick Fix: Temporarily increase client timeout parameters, but structurally resolve by optimizing slow queries and scaling node resources.
Fix Approaches Compared
| Method | When to Use | Time to Implement | Risk Level |
|---|---|---|---|
| Increase Client Timeout | Client drops the connection before ES finishes processing a valid, complex request | 5 mins | Low |
| Optimize Search Queries | Heavy queries causing high CPU/memory load across the cluster | 1-2 hours | Medium |
| Tune Node Thread Pools | High queued/rejected executions in node stats during traffic spikes | 15 mins | Medium |
| Increase JVM Heap / Scale Out | Frequent long GC pauses or continuous OutOfMemory (OOM) conditions | 1 hour | High |

Understanding the Error

Elasticsearch API timeout errors are a common and critical symptom of underlying performance bottlenecks or misconfigurations within your search infrastructure. When interacting with the Elasticsearch REST API, clients expect a response within a specific time window. If the Elasticsearch cluster fails to return the data before this window expires, the client severs the connection and throws a timeout exception.

These errors typically manifest in logs as:

  • elasticsearch.exceptions.ConnectionTimeout: ConnectionTimeout caused by - ReadTimeoutError(HTTPConnectionPool(host='localhost', port=9200): Read timed out. (read timeout=10))
  • 408 Request Timeout
  • java.net.SocketTimeoutException: Read timed out
  • EsRejectedExecutionException: rejected execution of org.elasticsearch.transport.TransportService

The root cause rarely lies in the network itself. Instead, it is usually a symptom of the Elasticsearch cluster being overwhelmed, JVM memory pressure, exhausted thread pools, or unoptimized data access patterns.

Step 1: Diagnose the Bottleneck

Before implementing a fix, you must identify whether the timeout is a client-side misconfiguration or a server-side resource exhaustion issue.

1. Check the Elasticsearch Slow Logs

Slow logs are your first line of defense. If queries are taking longer than the client's timeout threshold, they will appear here. Slow logs are disabled by default, so you must enable them dynamically on the indices experiencing issues. Look for queries that take longer than 5-10 seconds, and identify patterns: are they using leading wildcards (*term), massive aggregations, or deep pagination (from: 10000)?

2. Monitor JVM Garbage Collection (GC)

Elasticsearch runs on the JVM. If the heap is undersized for the data volume, the JVM will frequently trigger 'stop-the-world' garbage collection pauses. During a major GC pause, the node stops processing requests entirely. If a GC pause lasts 15 seconds, any client with a 10-second read timeout will fail. Check your Elasticsearch logs for lines containing [gc][young] or [gc][old] accompanied by long durations.
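
A quick way to correlate pauses with client timeouts is to scan the log for GC durations above your read timeout. This is a rough sketch assuming the typical `duration [15.8s]` fragment printed by the JVM GC monitor; the exact log layout varies between Elasticsearch versions, so treat the pattern as a starting point:

```python
import re

# Matches the "duration [15.8s]" fragment in GC monitor log lines such as:
#   "... [gc][old][2362][7] duration [15.8s], collections [1]/[16.1s] ..."
# NOTE: assumed format -- verify against your own Elasticsearch version.
GC_LINE = re.compile(r"\[gc\]\[(young|old)\].*?duration \[(\d+(?:\.\d+)?)(m?s)\]")

def long_gc_pauses(log_lines, threshold_seconds=10.0):
    """Return (generation, seconds) tuples for GC pauses at or over the threshold."""
    pauses = []
    for line in log_lines:
        m = GC_LINE.search(line)
        if not m:
            continue
        generation, value, unit = m.group(1), float(m.group(2)), m.group(3)
        seconds = value / 1000 if unit == "ms" else value
        if seconds >= threshold_seconds:
            pauses.append((generation, seconds))
    return pauses
```

Any pause this reports that is at or above your client's read timeout is a direct explanation for the corresponding timeout errors.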

3. Inspect Thread Pools and Rejections

Elasticsearch uses distinct thread pools for different operations (search, write, get). When a node receives a request, it hands it to the appropriate thread pool. If all threads are busy, the request goes into a queue; if the queue fills up, Elasticsearch rejects the request. You can check this by querying the _cat/thread_pool API. High numbers in the rejected column indicate that your cluster is under-provisioned for the current workload concurrency.
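
If you want to flag rejections programmatically, the plain-text _cat output can be scanned with a few lines of code. This sketch assumes the whitespace-separated column order you request via `h=id,name,active,queue,rejected`; adjust the indices if you request different columns:

```python
def rejected_pools(cat_output):
    """Parse the plain-text output of
    GET _cat/thread_pool/search,write?v&h=id,name,active,queue,rejected
    and return pools with a non-zero rejected count.

    Assumes whitespace-separated columns in the order requested via h=.
    """
    flagged = []
    for row in cat_output.strip().splitlines():
        cols = row.split()
        if len(cols) < 5 or cols[0] == "id":  # skip the ?v header row
            continue
        node_id, name, _active, queue, rejected = cols[:5]
        if int(rejected) > 0:
            flagged.append({"node": node_id, "pool": name,
                            "queue": int(queue), "rejected": int(rejected)})
    return flagged
```

Anything this returns during a traffic spike points at the thread-pool tuning or scale-out fixes below.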

Step 2: Implement the Fix

Depending on your diagnostic findings, apply one or more of the following solutions.

Fix A: Adjusting Client-Side Timeouts

If you are running intentionally long operations (e.g., bulk reindexing, snapshotting, or heavy analytical aggregations), the default client timeout (often 10 seconds) is simply too aggressive. You need to explicitly configure the client to wait longer.

Python Client (elasticsearch-py):

from elasticsearch import Elasticsearch

# Increase the read timeout to 60 seconds and retry on timeout.
# Note: elasticsearch-py 8.x renamed this parameter to `request_timeout`;
# `timeout=60` applies to 7.x clients.
es = Elasticsearch(
    ["http://localhost:9200"],
    timeout=60,
    max_retries=3,
    retry_on_timeout=True
)

Node.js Client (@elastic/elasticsearch):

const { Client } = require('@elastic/elasticsearch')
const client = new Client({
  node: 'http://localhost:9200',
  requestTimeout: 60000 // 60 seconds in milliseconds
})
Fix B: Query Optimization

Throwing hardware at bad queries is an expensive and temporary fix. Optimize your search requests to reduce cluster load.

  • Avoid Leading Wildcards: Queries like *searchterm force Elasticsearch to scan the entire inverted index. This is extremely slow. Use edge n-grams during indexing instead.
  • Eliminate Deep Pagination: Fetching page 1,000 with from and size forces Elasticsearch to retrieve and sort 10,000 documents just to discard 9,990 of them. Switch to the search_after API (ideally combined with a point-in-time) for deep pagination.
  • Use Filter Context: If you are filtering data (e.g., status: active) and don't care about scoring, put the clause in a filter block instead of must. Filter clauses are cached and bypass the scoring algorithm entirely.
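
To illustrate the last two points together, here is a minimal sketch of a request body that combines filter context with search_after pagination. The `status` filter and `created_at` sort field are hypothetical placeholders for your own schema:

```python
def build_page_query(size, sort_fields, search_after=None, status="active"):
    """Build a search body combining filter context (cached, unscored)
    with search_after pagination instead of from/size deep paging.

    `status` and the sort fields are illustrative placeholders.
    """
    body = {
        "size": size,
        "query": {
            "bool": {
                # filter context: cached, bypasses scoring
                "filter": [{"term": {"status": status}}]
            }
        },
        # sort must include a unique tiebreaker (e.g. _id) for search_after
        "sort": sort_fields,
    }
    if search_after is not None:
        body["search_after"] = search_after
    return body

# First page, then feed the last hit's sort values back in for the next page:
first = build_page_query(100, [{"created_at": "asc"}, {"_id": "asc"}])
next_page = build_page_query(
    100, [{"created_at": "asc"}, {"_id": "asc"}],
    search_after=["2024-01-01T00:00:00Z", "doc-42"],
)
```

Each page costs the same regardless of depth, unlike from/size, where cost grows linearly with the offset.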
Fix C: Addressing JVM and Hardware Limits

If your queries are optimized but timeouts persist due to GC pauses or high CPU, you must scale your infrastructure.

  • Increase Heap Size: Ensure your JVM heap (-Xms and -Xmx in jvm.options) is set to 50% of the available physical RAM, but never more than 32GB (to maintain zero-based compressed oops).
  • Scale Out: Add more data nodes to the cluster. Elasticsearch scales horizontally very well. Distributing primary shards across more nodes parallelizes the workload and reduces the per-node CPU and memory pressure.
  • Storage Speed: Ensure your Elasticsearch data directories are backed by high-IOPS NVMe SSDs. Spinning disks (HDDs) or low-tier network-attached storage will cause massive IO wait times, indirectly leading to API timeouts.
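
For example, on a host with 64 GB of RAM, the relevant heap lines in jvm.options would look like this (the sizes are illustrative; always set -Xms and -Xmx to the same value):

```
# jvm.options -- illustrative values for a 64 GB host:
# ~50% of physical RAM, kept under 32 GB to preserve compressed oops
-Xms31g
-Xmx31g
```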

Step 3: Preventative Monitoring

To prevent recurrences, establish robust alerting. Monitor the _nodes/stats endpoint. Alert if JVM heap usage consistently exceeds 85%. Alert on any increase in thread pool rejections. Ensure your APM or logging infrastructure tracks 99th percentile query latency, allowing you to catch degrading performance long before it turns into a hard timeout.
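
As a starting point, the heap alert can be as simple as evaluating the parsed `_nodes/stats/jvm` response against the 85% threshold. The `heap_used_percent` field path below follows the documented stats layout, but verify it against your Elasticsearch version:

```python
def heap_alerts(nodes_stats, threshold_percent=85):
    """Given a parsed GET _nodes/stats/jvm response, return
    (node name, heap_used_percent) for nodes at or over the threshold."""
    alerts = []
    for node in nodes_stats.get("nodes", {}).values():
        used = node["jvm"]["mem"]["heap_used_percent"]
        if used >= threshold_percent:
            alerts.append((node["name"], used))
    return alerts
```

Run this on a schedule against the stats endpoint and page when it returns anything.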

Diagnostic Command Reference
# 1. Check cluster health and active long-running tasks
curl -X GET "localhost:9200/_cluster/health?pretty"
curl -X GET "localhost:9200/_tasks?detailed=true&actions=*search*&pretty"

# 2. Check for thread pool rejections (Look at the 'rejected' column)
curl -X GET "localhost:9200/_cat/thread_pool/search,write?v&h=id,name,active,queue,rejected,completed"

# 3. View node memory, JVM heap stats, and GC pause times
curl -X GET "localhost:9200/_nodes/stats/jvm?pretty"

# 4. Enable Slow Logs dynamically for a specific index to catch timeout culprits
curl -X PUT "localhost:9200/my-index/_settings" -H 'Content-Type: application/json' -d'
{
  "index.search.slowlog.threshold.query.warn": "2s",
  "index.search.slowlog.threshold.fetch.warn": "2s",
  "index.search.slowlog.threshold.query.info": "1s",
  "index.search.slowlog.level": "info"
}'

Error Medic Editorial

Error Medic Editorial consists of senior DevOps engineers and Site Reliability Experts dedicated to untangling complex infrastructure bottlenecks. With decades of combined experience managing petabyte-scale Elasticsearch clusters, we provide actionable, production-ready solutions.
