Error Medic

Fixing 'Elasticsearch Cluster Health Red': Comprehensive Troubleshooting Guide

Resolve an Elasticsearch cluster health status red error. Step-by-step diagnostic commands, unassigned shard recovery, and data restoration strategies.

Key Takeaways
  • A 'red' cluster status means at least one primary shard (and its replicas) is missing or unassigned, leading to data unavailability.
  • The first step is always diagnosing *why* shards are unassigned using the Cluster Allocation Explain API.
  • Common root causes include node failures, disk space exhaustion (watermark breaches), cluster network partitions, or corrupted indices.
  • Fixes range from simple cluster rerouting and node restarts to restoring from a snapshot if data loss has occurred.
Fix Approaches Compared
Method                            | When to Use                                               | Time       | Risk
Cluster Allocation Explain API    | Initial diagnosis for any unassigned shard.               | < 5 mins   | Low
Clear Disk Space / Watermarks     | Nodes hit low disk watermarks blocking allocation.        | 10-30 mins | Low
Manual Shard Allocation (Reroute) | Cluster is stable but failed to auto-recover shards.      | 10 mins    | Medium
Restart Failed Nodes              | Hardware/JVM crash caused nodes to drop from the cluster. | 15-60 mins | Medium
Restore from Snapshot             | Primary shards are permanently lost or corrupted.         | Hours      | High (Data Loss)

Understanding the Error

When working with Elasticsearch, monitoring the cluster health status is a daily operation. The health status can be one of three colors: Green, Yellow, or Red.

  • Green: All primary and replica shards are active and assigned to nodes.
  • Yellow: All primary shards are active, but one or more replica shards are unassigned. The cluster is fully functional, but high availability is degraded.
  • Red: One or more primary shards are unassigned.

When your Elasticsearch cluster status is red, it is a critical incident: some portion of your data is currently unavailable for indexing and searching. Searches that touch a red index return partial results by default (allow_partial_search_results defaults to true), while indexing requests targeting the missing shards fail outright.
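The three colors follow directly from shard assignment. A minimal sketch of that mapping (the function and counts are illustrative, not Elasticsearch source):

```python
def cluster_status(unassigned_primaries: int, unassigned_replicas: int) -> str:
    """Map shard assignment onto the health color, per the rules above."""
    if unassigned_primaries > 0:
        return "red"      # some data is unavailable
    if unassigned_replicas > 0:
        return "yellow"   # fully functional, but high availability degraded
    return "green"        # every shard copy is assigned
```

With 4 unassigned primaries, as in the sample health output below, this yields "red".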

The Anatomy of a Red Cluster

Elasticsearch divides indices into shards to distribute data across nodes. Every document belongs to a single primary shard. If the node holding that primary shard crashes, and no replica exists (or the replicas also crashed), that primary shard becomes unassigned. Until a primary shard is promoted from a replica or recovered from disk, the cluster remains red.

You might see this status in your monitoring tools (like Kibana or Datadog) or when hitting the _cluster/health API:

{
  "cluster_name" : "prod-logs-cluster",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 5,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 142,
  "active_shards" : 284,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 4,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 98.6
}

In the output above, "status" : "red" and "unassigned_shards" : 4 are the immediate indicators of trouble.

Step 1: Diagnose the Unassigned Shards

Do not guess why the cluster is red. Elasticsearch will tell you exactly why it refuses to allocate a shard.

First, identify which indices are red:

curl -X GET "localhost:9200/_cat/indices?v&health=red"

Next, list the specific unassigned shards and their prior node locations:

curl -X GET "localhost:9200/_cat/shards?v&h=index,shard,prirep,state,node,unassigned.reason" | grep UNASSIGNED

This will output something like:

app-logs-2023.10.25 2 p UNASSIGNED NODE_LEFT

The unassigned.reason provides a crucial hint. Common reasons include:

  • NODE_LEFT: The node holding the shard disconnected.
  • CLUSTER_RECOVERED: Full cluster restart, waiting for nodes to join.
  • ALLOCATION_FAILED: Shard allocation failed (often due to disk space or corrupted data).
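The _cat/shards output is whitespace-delimited in the column order given by the h= parameter, which makes it easy to script against. A small sketch (the node column is empty for an unassigned shard, so only five fields appear):

```python
# Sample line from the _cat/shards command above.
line = "app-logs-2023.10.25 2 p UNASSIGNED NODE_LEFT"

# Columns follow h=index,shard,prirep,state,node,unassigned.reason;
# node is blank for unassigned shards and drops out of the split.
index_name, shard, prirep, state, reason = line.split()
```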

To get the definitive reason for the allocation failure, use the Cluster Allocation Explain API:

curl -X GET "localhost:9200/_cluster/allocation/explain?pretty"

(If you know the specific index and shard, you can pass them in the request body for targeted output).

The output will contain an allocate_explanation and a deciders array. Look for deciders that return "decision": "NO".
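To see how that filtering looks in practice, here is a sketch that pulls the blocking deciders out of an allocation-explain response. The response document is hypothetical and trimmed to the fields discussed above:

```python
import json

# Hypothetical, trimmed allocation-explain response.
explain = json.loads("""
{
  "index": "app-logs-2023.10.25",
  "shard": 2,
  "primary": true,
  "allocate_explanation": "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions": [
    {"node_name": "node-1",
     "deciders": [{"decider": "disk_threshold", "decision": "NO",
                   "explanation": "the node is above the low watermark"}]},
    {"node_name": "node-2",
     "deciders": [{"decider": "same_shard", "decision": "YES",
                   "explanation": "ok"}]}
  ]
}
""")

# Keep only the deciders that block allocation.
blocking = [
    (node["node_name"], d["decider"], d["explanation"])
    for node in explain["node_allocation_decisions"]
    for d in node.get("deciders", [])
    if d["decision"] == "NO"
]
```

Here blocking surfaces node-1's disk_threshold decider, pointing straight at Scenario A below.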

Step 2: Address Common Root Causes

Based on the diagnosis, proceed with the appropriate fix.

Scenario A: Disk Space Exhaustion (Disk Watermarks)

Elasticsearch protects nodes from running out of disk space by enforcing disk watermarks. If a node breaches the cluster.routing.allocation.disk.watermark.low (default 85%), it won't allocate new shards to that node. If it breaches the high watermark (90%), it will attempt to relocate shards away. If it hits the flood_stage (95%), indices are set to read-only.
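The three thresholds and their effects can be summarized in a small sketch (default percentages from above; the function is illustrative, not the actual decider logic):

```python
# Default disk watermarks described above.
LOW, HIGH, FLOOD = 85.0, 90.0, 95.0

def watermark_state(disk_used_pct: float) -> str:
    """Map a node's disk usage onto the allocation behavior it triggers."""
    if disk_used_pct >= FLOOD:
        return "flood_stage"   # affected indices forced read-only
    if disk_used_pct >= HIGH:
        return "high"          # shards relocated away from this node
    if disk_used_pct >= LOW:
        return "low"           # no new shards allocated to this node
    return "ok"
```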

If the Allocation Explain API shows a decider blocking allocation due to disk space:

  1. Free up space: Delete old indices, clear application logs on the host, or expand the underlying EBS volume/disk.
  2. Adjust watermarks temporarily (if absolutely necessary):
    PUT _cluster/settings
    {
      "transient": {
        "cluster.routing.allocation.disk.watermark.low": "90%",
        "cluster.routing.allocation.disk.watermark.high": "95%"
      }
    }
    
  3. Once space is freed, you may need to manually trigger allocation if the cluster gave up:
    POST /_cluster/reroute?retry_failed=true
    
Scenario B: Node Disconnection (NODE_LEFT)

If the primary shard was on a node that crashed, was terminated, or lost network connectivity:

  1. Check Node Status: curl -X GET "localhost:9200/_cat/nodes?v". Is the expected number of nodes present?
  2. Investigate the missing node: Check systemic logs (e.g., /var/log/syslog, dmesg, or the hypervisor console) and the Elasticsearch logs (/var/log/elasticsearch/*.log) on the disconnected node.
    • Did the JVM OOM (Out of Memory)?
    • Is there a network partition?
  3. Restart the Node: Often, simply starting the Elasticsearch service on the failed node will allow it to rejoin the cluster. Once it rejoins, Elasticsearch will recognize the local data and promote the shards, turning the cluster from red to yellow, and eventually green as replicas sync.
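The red-to-yellow-to-green progression in step 3 is worth encoding in monitoring. A sketch of interpreting successive _cluster/health responses (the dicts are hypothetical):

```python
def recovery_phase(health: dict) -> str:
    """Translate a _cluster/health status into the recovery stage above."""
    return {
        "red": "primaries still missing",
        "yellow": "primaries recovered; replicas still syncing",
        "green": "fully recovered",
    }[health["status"]]
```

Polling this after a node rejoin tells you whether the local shard data was recognized (red clears quickly) or a deeper problem remains.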
Scenario C: Shard Allocation Delay

By default, when a node leaves the cluster, Elasticsearch immediately promotes a surviving replica to primary where one exists, but waits 1 minute (index.unassigned.node_left.delayed_timeout) before reallocating the replica shards that lived on the departed node, to avoid unnecessary network I/O during brief node restarts.

If the node is gone permanently and replicas exist elsewhere, you can trigger reallocation immediately by lowering this setting:

PUT _all/_settings
{
  "settings": {
    "index.unassigned.node_left.delayed_timeout": "0"
  }
}

Note: If no replicas exist, changing the delay timeout won't help; you must recover the original node or restore from a snapshot.

Scenario D: Corrupted Shards or Permanent Data Loss

If the node containing the primary and only copy of a shard is permanently destroyed (e.g., disk failure), you have lost data.

The Allocation Explain API will show that no valid copies of the shard exist in the cluster.

Your options are:

  1. Restore from Snapshot: This is the safest and correct approach if you have backups configured (e.g., AWS S3 repository).
    POST /_snapshot/my_repository/snapshot_1/_restore
    {
      "indices": "app-logs-2023.10.25",
      "ignore_unavailable": true,
      "include_global_state": false
    }
    
  2. Allocate Empty Primary (Accept Data Loss): If the index is highly transient (e.g., metrics from 5 minutes ago) and you don't have snapshots, you can force the cluster to allocate an empty primary shard. This will permanently delete any data that was in that shard. The cluster will turn green, but the data is gone.
    POST /_cluster/reroute
    {
      "commands": [
        {
          "allocate_empty_primary": {
            "index": "app-logs-2023.10.25",
            "shard": 2,
            "node": "node-1",
            "accept_data_loss": true
          }
        }
      ]
    }
    

Step 3: Prevention

To prevent the Elasticsearch health status from changing to red in the future:

  • Always use replicas: Never run production indices with number_of_replicas: 0. A minimum of 1 ensures high availability if a single node fails.
  • Monitor disk space: Alert heavily on 80% disk utilization to proactively scale storage before hitting watermarks.
  • Ensure node stability: Tune JVM heap size (typically 50% of total RAM, maxing out at ~31GB) and monitor garbage collection times to prevent OOM crashes.
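The heap-sizing rule of thumb above can be written down explicitly: half of physical RAM, capped near 31 GB to stay under the JVM's compressed-oops threshold (a sketch; RAM figures are hypothetical):

```python
def recommended_heap_gb(total_ram_gb: float) -> float:
    """Rule of thumb: 50% of RAM, capped at ~31 GB (compressed oops)."""
    return min(total_ram_gb * 0.5, 31.0)
```

So a 16 GB node gets an 8 GB heap, while a 128 GB node still stops at 31 GB, leaving the remainder to the filesystem cache.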

Quick Reference: Diagnostic Commands

# 1. Check cluster health
curl -X GET "localhost:9200/_cluster/health?pretty"

# 2. Identify which indices are red
curl -X GET "localhost:9200/_cat/indices?v&health=red"

# 3. List all unassigned shards and their prior nodes
curl -X GET "localhost:9200/_cat/shards?v&h=index,shard,prirep,state,node,unassigned.reason" | grep UNASSIGNED

# 4. The most important command: Ask Elasticsearch WHY the shard is unassigned
curl -X GET "localhost:9200/_cluster/allocation/explain?pretty"

# 5. Retry failed allocations (useful if disk space was just cleared)
curl -X POST "localhost:9200/_cluster/reroute?retry_failed=true"

# 6. Check node disk space (to check for watermark breaches)
curl -X GET "localhost:9200/_cat/allocation?v"

Error Medic Editorial

Error Medic Editorial is composed of senior Site Reliability Engineers and database administrators specializing in distributed systems, search infrastructure, and large-scale incident response.
