Resolving 'Amazon RDS Storage Full' and 'EKS Node Not Ready' Errors
Fix Amazon RDS storage full states and resolve EKS node not ready errors. Step-by-step troubleshooting, AWS CLI commands, and root cause analysis.
- Amazon RDS enters a storage-full state when available space drops to zero, suspending all write operations.
- EKS nodes frequently enter a NotReady state due to resource exhaustion (DiskPressure/MemoryPressure) or kubelet daemon failures.
- Quick fix for RDS: Modify the instance via AWS CLI to increase allocated storage, or enable Storage Autoscaling.
- Quick fix for EKS: Inspect node conditions using kubectl describe node, clear unused images, or restart the kubelet service.
| Method | When to Use | Time | Risk |
|---|---|---|---|
| Increase RDS Allocated Storage | Instance status shows storage-full and the DB is unresponsive | 10-30 mins | Low |
| Enable RDS Storage Autoscaling | Proactively, to prevent future storage-full events | 5 mins | Low |
| Clear EKS Node Disk Space | A NotReady node is caused by DiskPressure | 5-10 mins | Medium |
| Restart EKS Kubelet | Node is unresponsive due to PLEG timeout or kubelet crash | 2 mins | Low |
Understanding the Interconnected Failures
In modern cloud architectures, infrastructure components are tightly coupled. A critical database failure, such as an Amazon RDS instance entering the storage-full state, can cause cascading application failures. Though seemingly unrelated, a massive spike in application log output from database connection errors can exhaust local disk space on Kubernetes worker nodes, pushing them into a NotReady state. This guide tackles both issues together, as they often appear side by side during major outages.
Part 1: Troubleshooting Amazon RDS Storage Full
When your database runs out of disk space, it enters the storage-full state. This is especially critical for RDS for PostgreSQL, because PostgreSQL requires free disk space to write Write-Ahead Logs (WAL). Without it, the database aggressively halts transactions to prevent corruption.
Step 1: Diagnose the RDS Issue
The first indicator is usually a monitoring alert on the FreeStorageSpace CloudWatch metric, and the instance status in the AWS Management Console will show storage-full.
In a PostgreSQL storage-full scenario, the database logs will typically show:
```
PANIC: could not write to file "pg_wal/xlogtemp.123": No space left on device
```
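Before modifying anything, it helps to confirm how quickly free space has been dropping. A minimal sketch using the CloudWatch CLI, assuming an instance named `my-production-db` (a placeholder identifier, not from your environment):

```shell
# Sketch: pull the last hour of FreeStorageSpace (in bytes) for the instance.
# A value near zero confirms the storage-full diagnosis; a steep downward
# trend suggests runaway WAL, logs, or temp files.
aws cloudwatch get-metric-statistics \
  --namespace AWS/RDS \
  --metric-name FreeStorageSpace \
  --dimensions Name=DBInstanceIdentifier,Value=my-production-db \
  --start-time "$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --period 300 \
  --statistics Minimum
```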
Step 2: Fix the RDS Storage Full Issue
To resolve the storage-full state, you must increase the storage capacity. While the instance is in this state, the only modification you can perform is increasing the storage size.
- Modify the Instance: Increase the allocated storage by at least 10% (or minimum 10GB) to allow the database to recover.
- Enable Autoscaling: To prevent future storage-full events, ensure RDS Storage Autoscaling is enabled with a maximum storage threshold that accommodates your growth.
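The "at least 10% (or minimum 10GB)" rule above can be sketched as a quick calculation; the 100 GiB starting allocation here is a placeholder value:

```shell
# Sketch: compute a minimum recovery size (grow by 10% or 10 GiB,
# whichever is larger, per the guidance above).
current_gib=100                              # placeholder current allocation
pct=$(( current_gib + current_gib / 10 ))    # current + 10%
abs=$(( current_gib + 10 ))                  # current + 10 GiB
new_gib=$(( pct > abs ? pct : abs ))
echo "Grow allocated storage to at least ${new_gib} GiB"
```

Note that RDS rejects storage modifications smaller than a 10% increase, so rounding up generously here avoids a second maintenance window.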
Part 2: Troubleshooting EKS Node Not Ready
Simultaneously, you might receive alerts that an EKS node is NotReady. When a node transitions to NotReady, the Kubernetes control plane stops scheduling new pods to it and eventually begins evicting existing pods.
Step 1: Diagnose the EKS Node
A NotReady alert on an EKS node usually points to the kubelet, which performs periodic health reporting. If it stops updating the node status, the control plane marks the node NotReady. Common culprits include:
- DiskPressure: The node's root filesystem or container runtime filesystem is out of space (often due to out-of-control container logs caused by the aforementioned RDS outage).
- MemoryPressure: The node is out of memory.
- Network/CNI Issues: The aws-node DaemonSet (VPC CNI) is crashing.
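To rule out the CNI culprit quickly, check the VPC CNI DaemonSet directly. A minimal sketch, assuming the standard EKS `kube-system` namespace and the `k8s-app=aws-node` label that the VPC CNI ships with:

```shell
# Sketch: verify the aws-node (VPC CNI) DaemonSet is healthy on all nodes.
kubectl get daemonset aws-node -n kube-system

# Inspect per-node CNI pods; CrashLoopBackOff or high restart counts
# here usually indicate a networking cause for NotReady.
kubectl get pods -n kube-system -l k8s-app=aws-node -o wide
```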
Step 2: Fix the EKS Node
- Describe the Node: Run `kubectl describe node <node-name>` and look at the `Conditions` section. If you see `DiskPressure=True`, you need to clear space.
- Check Kubelet Logs: SSH or use AWS Systems Manager Session Manager to access the underlying EC2 instance and check the kubelet logs: `journalctl -u kubelet -f`.
- Restart Services: Often, simply restarting the container runtime or the kubelet resolves transient lockups.
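When `DiskPressure=True`, the steps above reduce to a short on-node cleanup. A sketch to run over SSH/SSM, assuming a containerd-based node with `crictl` installed (as on recent EKS-optimized AMIs):

```shell
# Sketch: reclaim disk space on a NotReady worker node.
df -h /                              # confirm which filesystem is full
sudo crictl rmi --prune              # remove container images not in use
sudo journalctl --vacuum-size=200M   # trim the systemd journal
sudo systemctl restart containerd kubelet   # recover from transient lockups
```

Once the kubelet reports healthy disk again, the node should return to Ready within its next status update cycle without manual intervention.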
Quick Command Reference
```shell
# --- RDS Diagnostics & Remediation ---
# Check current RDS instance status and storage
aws rds describe-db-instances \
  --db-instance-identifier my-production-db \
  --query 'DBInstances[*].[DBInstanceStatus,AllocatedStorage,MaxAllocatedStorage]' \
  --output table

# Modify RDS instance to increase storage (e.g., to 500GB) and enable autoscaling
aws rds modify-db-instance \
  --db-instance-identifier my-production-db \
  --allocated-storage 500 \
  --max-allocated-storage 1000 \
  --apply-immediately

# --- EKS Diagnostics & Remediation ---
# Find nodes that are NotReady
kubectl get nodes | grep NotReady

# Get detailed conditions for a specific NotReady node
kubectl describe node ip-10-0-1-123.ec2.internal | grep -A 5 Conditions

# (Run on the actual EKS worker node via SSH/SSM to check kubelet)
sudo systemctl status kubelet
sudo journalctl -u kubelet -n 100 --no-pager

# Restart kubelet to attempt recovery
sudo systemctl restart kubelet
```

Error Medic Editorial
Error Medic Editorial consists of senior Site Reliability Engineers and Cloud Architects dedicated to providing actionable, code-first solutions for complex infrastructure outages.