Troubleshooting Elasticsearch OOM: Fixing OutOfMemoryError and Killed Processes
Fix Elasticsearch OutOfMemoryError and kernel OOM killed process crashes. Learn how to tune JVM heap, optimize queries, and configure circuit breakers.
- Improper JVM heap sizing is the primary cause; heap should be set to 50% of available RAM, but never exceed 31GB to ensure compressed Ordinary Object Pointers (OOPs) are used.
- The Linux OOM Killer terminating the process (out of memory killed process) indicates the OS lacks memory, often because heap + off-heap usage exceeds system RAM or swap is enabled.
- Unbounded aggregations, deeply nested queries, or sorting on unoptimized text fields can rapidly exhaust the JVM heap, triggering a java.lang.OutOfMemoryError.
- Quick Fix: Clear the fielddata cache, adjust your jvm.options (-Xms and -Xmx), and implement stricter circuit breaker limits to prevent rogue queries from crashing the node.
| Method | When to Use | Time | Risk |
|---|---|---|---|
| Increase JVM Heap Size | Node is chronically under-provisioned, heap is constantly > 85% | Fast | Low (Requires node restart) |
| Enable bootstrap.memory_lock | Elasticsearch memory is being swapped to disk, causing OS-level OOMs | Medium | Low (Requires OS config changes) |
| Tune Circuit Breakers | Preventing rogue, heavy queries from exhausting all available heap | Medium | Medium (May reject legitimate heavy queries) |
| Clear Fielddata Cache | Immediate relief needed for a node actively throwing OutOfMemoryError | Immediate | Low (Temporary latency increase for cached queries) |
Understanding the Error
When managing an Elasticsearch cluster, one of the most critical and catastrophic failures you can encounter is an Out of Memory (OOM) event. Because Elasticsearch is a memory-intensive application built on top of the Java Virtual Machine (JVM) and relies heavily on the operating system's filesystem cache (Lucene), memory management is paramount.
There are two distinct types of OOM scenarios that engineers often conflate:
- The JVM java.lang.OutOfMemoryError: This occurs when the Elasticsearch JVM process exhausts its allocated heap space. The JVM attempts garbage collection (GC), but if it cannot free enough memory to accommodate new object allocations, it throws an OutOfMemoryError. The process might stay alive but remain completely unresponsive, or it may crash entirely, depending on your ExitOnOutOfMemoryError JVM flag.
- The Linux OS OOM Killer (out of memory killed process): This is an operating-system-level intervention. When the host Linux kernel runs out of physical memory and swap space, it invokes the OOM Killer to sacrifice a process and save the system. Because Elasticsearch is usually the largest memory consumer on the node, it becomes the primary target. You will find "Out of memory: Killed process" in your dmesg output or /var/log/messages.
Determining which type of out-of-memory event you are facing is the crucial first step in troubleshooting.
Step 1: Diagnose the Exact Failure Domain
Before making any configuration changes, you must determine if the OS killed the process or if the JVM exhausted its heap.
Checking for OS-level OOM Kills
If your Elasticsearch node suddenly disappears from the cluster and the process is no longer running, check the Linux kernel ring buffer. Run dmesg -T | grep -i oom or inspect /var/log/syslog (or /var/log/messages depending on your distro). You will typically see a message like:
[Tue Feb 24 10:14:22 2026] Out of memory: Killed process 14322 (java) total-vm:64532100kB, anon-rss:32145600kB, file-rss:0kB
If you see this, the OS killed Elasticsearch. This means you have likely overallocated the JVM heap (leaving too little RAM for the OS and Lucene filesystem cache), or another process on the host is consuming RAM.
Checking for JVM OutOfMemoryError
If the process is still running but unresponsive, or if it crashed and left a heap dump, check the Elasticsearch application logs located in /var/log/elasticsearch/<cluster-name>.log. You are looking for stack traces containing:
java.lang.OutOfMemoryError: Java heap space
or
java.lang.OutOfMemoryError: GC overhead limit exceeded
This indicates that the queries, aggregations, or indexing operations required more memory than the -Xmx limit defined in jvm.options.
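A quick grep of the application log surfaces these stack traces. The sketch below is a minimal helper; the find_oom function name is illustrative, and the log path assumes the default package install layout (adjust for your cluster name):

```shell
#!/bin/bash
# find_oom: print any JVM OutOfMemoryError lines from a given log file.
# The helper name is illustrative; adjust the log path to match your
# cluster name under /var/log/elasticsearch/.
find_oom() {
  grep -n "java.lang.OutOfMemoryError" "$1" 2>/dev/null \
    || echo "no OutOfMemoryError entries found in $1"
}

find_oom /var/log/elasticsearch/elasticsearch.log
```

If the grep matches, the line numbers let you jump straight to the stack trace and see whether it was heap space or GC overhead that ran out.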
Step 2: Immediate Remediation and Stabilization
If the cluster is currently unstable, you need immediate mitigation tactics.
1. Clear the Caches:
If the node is accessible but struggling, clear the caches. The fielddata cache is a notorious memory hog, especially if you are mistakenly sorting or aggregating on text fields instead of keyword fields.
curl -X POST "localhost:9200/_cache/clear?fielddata=true&pretty"
2. Identify Expensive Tasks: Use the Task Management API to find and cancel long-running queries that might be hoarding memory.
curl -X GET "localhost:9200/_tasks?detailed=true&actions=*data/read/search*"
curl -X POST "localhost:9200/_tasks/<task_id>/_cancel"
Step 3: Root Cause Fixes and Configuration
Fix 1: Properly Sizing the JVM Heap
The golden rule of Elasticsearch memory allocation is: Allocate 50% of your total physical RAM to the JVM heap, but never exceed 31GB.
Why 50%? Elasticsearch relies heavily on Apache Lucene for underlying search segments. Lucene leverages the OS filesystem cache to keep data structures in memory. If you give the JVM 100% of the RAM, Lucene will have nothing left, leading to severe performance degradation and OS-level OOM kills.
Why a maximum of 31GB? The JVM uses a feature called Compressed Ordinary Object Pointers (Compressed OOPs). When the heap is below approximately 32GB (the exact cutoff varies by JVM build, but is typically just under 32GB), the JVM can use 32-bit object references instead of 64-bit pointers, which drastically reduces memory overhead. A 31GB heap with compressed OOPs is often more effective than a 40GB heap without them.
Edit /etc/elasticsearch/jvm.options (or jvm.options.d/heap.options):
-Xms30g
-Xmx30g
Always set -Xms (minimum) and -Xmx (maximum) to the exact same value to prevent heap resizing during runtime, which is an expensive operation.
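After restarting the node, you can confirm the JVM actually kept compressed OOPs via the nodes info API. This is a sketch assuming the node answers on localhost:9200; the check_compressed_oops wrapper name is illustrative, while the response field itself is part of the standard nodes info output:

```shell
#!/bin/bash
# check_compressed_oops: ask each node whether its JVM is running with
# compressed ordinary object pointers. The wrapper function name is
# illustrative; the API field is part of the nodes info response.
check_compressed_oops() {
  curl -s "$1/_nodes/jvm?filter_path=nodes.*.jvm.using_compressed_ordinary_object_pointers&pretty"
}

# Typical usage against a live node:
# check_compressed_oops "http://localhost:9200"
```

If the flag comes back false after a heap increase, you have crossed the threshold and should drop the heap back below it.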
Fix 2: Disabling Swap
Swapping is the death of performance for any Java application, especially Elasticsearch. If the OS swaps parts of the JVM heap to disk, garbage collection pauses will spike from milliseconds to minutes, causing nodes to drop out of the cluster. Furthermore, aggressive swapping can trigger kernel panics or the OOM Killer.
Ensure bootstrap.memory_lock: true is set in your elasticsearch.yml. This tells Elasticsearch to use mlockall on startup, pinning its memory to RAM and preventing the OS from swapping it out. You must also configure the OS to allow this by editing /etc/security/limits.conf:
elasticsearch soft memlock unlimited
elasticsearch hard memlock unlimited
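You can verify that the lock took effect after a restart. A sketch assuming localhost:9200; note that if Elasticsearch runs under systemd, limits.conf is not consulted, so you may instead need LimitMEMLOCK=infinity in a systemd unit override:

```shell
#!/bin/bash
# check_mlockall: report whether each node successfully locked its memory.
# If this still shows "mlockall": false under systemd, add
# LimitMEMLOCK=infinity via "systemctl edit elasticsearch" rather than
# relying on /etc/security/limits.conf.
check_mlockall() {
  curl -s "$1/_nodes?filter_path=**.mlockall&pretty"
}

# Typical usage against a live node:
# check_mlockall "http://localhost:9200"
```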
Fix 3: Tuning Circuit Breakers
Elasticsearch has built-in circuit breakers to prevent operations from triggering an OutOfMemoryError. These breakers estimate the memory a request will need before executing it. If a threshold would be exceeded, the request is aborted with a 429 Too Many Requests response carrying a circuit_breaking_exception, which is vastly preferable to an OOM crash.
You can dynamically update cluster settings to restrict memory usage. The parent circuit breaker (indices.breaker.total.limit) defaults to 95% of the heap when real-memory tracking (indices.breaker.total.use_real_memory) is enabled, which it is by default, and 70% otherwise. If you are experiencing OOMs, you might want to lower the fielddata or request breaker limits.
PUT /_cluster/settings
{
"persistent": {
"indices.breaker.fielddata.limit": "40%",
"indices.breaker.request.limit": "40%",
"indices.breaker.total.limit": "70%"
}
}
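To confirm the new limits are active, and to see whether breakers are actually tripping, you can query the cluster settings and node stats. A sketch assuming localhost:9200; the show_breakers helper name is illustrative, while both endpoints are standard APIs:

```shell
#!/bin/bash
# show_breakers: print the effective breaker limits plus the per-node
# trip counters. A rising "tripped" count means the breaker is doing its
# job of rejecting requests before they exhaust the heap.
show_breakers() {
  # Effective breaker limits, including unmodified defaults:
  curl -s "$1/_cluster/settings?include_defaults=true&flat_settings=true&pretty" | grep -i breaker
  # Per-node breaker stats (estimated sizes and trip counts):
  curl -s "$1/_nodes/stats/breaker?pretty"
}

# Typical usage against a live node:
# show_breakers "http://localhost:9200"
```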
Fix 4: Query and Mapping Optimization
Hardware and configuration tuning can only go so far. Ultimately, an Elasticsearch OOM is often an application-layer problem.
- Do not use fielddata on text fields: If you attempt an aggregation on an analyzed text field, Elasticsearch must load all terms into memory. This is the fastest way to crash a node. Aggregate on keyword fields instead.
- Limit bucket sizes: Deeply nested aggregations (e.g., aggregating by country, then state, then city, then user) generate exponential numbers of buckets. Use the size parameter in aggregations to limit the response.
- Paginate responsibly: Deep pagination with from and size is highly inefficient because Elasticsearch must load the entire result set up to from + size into memory. Use search_after or the Point in Time (PIT) API for deep scrolling.
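As an example of the search_after pattern, the request below fetches the next page by replaying the sort values of the previous page's last hit. This is a sketch: the index name logs-2026.02, the event.id tiebreaker field, and the sort values are all hypothetical.

```shell
#!/bin/bash
# Next-page request using search_after instead of from/size. The sort
# must be identical on every page, and "search_after" carries the sort
# values of the last hit returned by the previous page. The index name,
# field names, and values here are hypothetical.
NEXT_PAGE='{
  "size": 100,
  "sort": [ { "@timestamp": "asc" }, { "event.id": "asc" } ],
  "search_after": [ "2026-02-24T10:14:22Z", "evt-000123" ],
  "query": { "match_all": {} }
}'

# Send it (assumes a node listening on localhost:9200):
# curl -s -X GET "http://localhost:9200/logs-2026.02/_search?pretty" \
#   -H 'Content-Type: application/json' -d "$NEXT_PAGE"
```

Because each page only ever materializes size hits, memory use stays flat no matter how deep you paginate, unlike from + size, which grows with depth.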
Conclusion
Resolving an out of memory elasticsearch situation requires a holistic approach. First, protect the node from the OS OOM killer by properly balancing JVM heap and OS file cache allocations. Second, lock the memory to prevent swapping. Finally, utilize circuit breakers and query optimization to protect the JVM heap from runaway application requests. By strictly enforcing these SRE best practices, your Elasticsearch cluster will remain resilient and highly available.
Appendix: OOM Diagnostic Script
The following script combines the diagnostic checks from this article into a single pass you can run on an affected node.
#!/bin/bash
# Diagnostic script for Elasticsearch OOM troubleshooting
# 1. Check system logs for Linux OOM Killer events targeting java/elasticsearch
echo "--- Checking dmesg for OOM Killer events ---"
dmesg -T | grep -i oom | grep -i java
# 2. Check current Elasticsearch JVM heap utilization (requires curl and jq)
echo -e "\n--- Checking current JVM Heap Usage ---"
curl -s -X GET "http://localhost:9200/_nodes/stats/jvm?pretty" | grep -E "name|heap_used_percent|heap_used_in_bytes|heap_max_in_bytes"
# 3. Check circuit breaker stats to see if they are tripping frequently
echo -e "\n--- Checking Circuit Breaker Stats ---"
curl -s -X GET "http://localhost:9200/_nodes/stats/breaker?pretty" | grep -E "name|estimated_size_in_bytes|tripped"
# 4. Emergency: Clear fielddata cache to free heap space immediately
# Uncomment the line below to execute during an active memory crisis
# curl -X POST "http://localhost:9200/_cache/clear?fielddata=true"

Error Medic Editorial
Our SRE and DevOps editorial team consists of veteran infrastructure engineers specializing in distributed systems, database scaling, and high-availability architecture.
Sources
- https://www.elastic.co/guide/en/elasticsearch/reference/current/advanced-configuration.html#set-jvm-heap-size
- https://www.elastic.co/guide/en/elasticsearch/reference/current/circuit-breaker.html
- https://www.elastic.co/guide/en/elasticsearch/reference/current/setup-configuration-memory.html