
How to Fix AWS ALB 503 Service Temporarily Unavailable Errors

Fix AWS ALB 503 Service Unavailable errors by diagnosing target group health, security group rules, and backend capacity limits. A complete SRE troubleshooting guide.

Key Takeaways
  • Root Cause 1: Target group has zero registered targets available to process the request.
  • Root Cause 2: All registered backend targets are failing the ALB health checks and are marked unhealthy.
  • Root Cause 3: Backend service capacity is fully exhausted (e.g., connection limits reached on instances or containers).
  • Quick Fix: Validate target health in the EC2/Target Group console, verify Security Group ingress rules allow ALB traffic, and check backend application logs for startup failures.
Fix Approaches Compared
Method | When to Use | Time | Risk
Verify Target Health & Registration | Initial diagnosis for all 503 errors | 5 mins | Low
Audit Security Groups & NACLs | Post-deployment or infrastructure changes | 10 mins | Low
Scale Backend / Increase Capacity | High traffic spikes or connection exhaustion | 15 mins | Medium
Fix Application Startup/Crash Loops | Targets stuck in 'Initial' or immediately failing | 30+ mins | High

Understanding the AWS ALB 503 Error

When an AWS Application Load Balancer (ALB) returns an HTTP 503 Service Temporarily Unavailable error, it is explicitly telling the client that the load balancer cannot find a valid, healthy backend target to which it can forward the request. It is crucial to distinguish this from a 502 Bad Gateway (where the load balancer connected to the target, but the target returned an invalid or malformed response) and a 504 Gateway Timeout (where the load balancer connected, but the target took too long to respond).

The standard response generated directly by the ALB looks like this:

HTTP/1.1 503 Service Temporarily Unavailable
Content-Type: text/html
Content-Length: 119

<html>
<head><title>503 Service Temporarily Unavailable</title></head>
<body>
<center><h1>503 Service Temporarily Unavailable</h1></center>
</body>
</html>

As a DevOps engineer or SRE, seeing this error means your application layer is entirely disconnected from your ingress layer. Let us systematically break down the root causes and how to remediate them.

Step 1: Diagnose the Current Target State

The absolute first step in troubleshooting an ALB 503 error is checking the health of your Target Groups. The ALB relies on these groups to know where to send traffic.

  1. Open the Amazon EC2 Console and navigate to Target Groups under the Load Balancing section.
  2. Select the Target Group associated with your ALB listener rules.
  3. Click on the Targets tab.

You will see your targets in one of several states: Healthy, Unhealthy, Initial, Draining, or simply no targets registered at all.
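You can pull the same state information from the AWS CLI, which is faster to script and share in an incident channel. A minimal sketch, assuming you have the AWS CLI and jq installed; the target group ARN is a placeholder:

```shell
# Placeholder ARN; substitute your real target group.
TG_ARN="arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-tg/abc123"

# Print one line per registered target: ID, state, and failure reason.
# No output at all means Scenario A below: an empty target group.
aws elbv2 describe-target-health --target-group-arn "$TG_ARN" --output json \
  | jq -r '.TargetHealthDescriptions[]
           | "\(.Target.Id)\t\(.TargetHealth.State)\t\(.TargetHealth.Reason // "-")"'
```

The `Reason` field (for example `Target.FailedHealthChecks` or `Target.Timeout`) usually points directly at which of the later steps applies.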

Scenario A: Empty Target Group

If the target group has no registered targets, the ALB has nowhere to send traffic. This commonly happens in ECS/EKS environments when the service failed to start, or when the deployment pipeline deregistered old tasks before the new ones finished registering.

Scenario B: All Targets are 'Unhealthy'

If targets exist but are failing health checks, the ALB removes them from the active routing pool. Once the pool is empty, the ALB begins throwing 503s to clients.

Step 2: Fix Failing Health Checks

If your targets are unhealthy, you must figure out why the ALB cannot get a successful response from the health check endpoint.

1. Check the Health Check Configuration: Navigate to the Health checks tab in your Target Group. Verify the following:

  • Protocol & Port: Is the ALB checking the correct port? (e.g., HTTP on port 8080).
  • Path: Does the health check path (e.g., /health or /api/status) actually exist on your application? Does it return an HTTP 200 OK?
  • Timeout & Interval: If your application takes 10 seconds to respond to a health check, but the ALB timeout is 5 seconds, it will mark the target unhealthy. Ensure the interval is longer than the timeout.
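If any of these settings are wrong, they can be corrected from the CLI without touching the console. A sketch with hypothetical values (ARN, path, and thresholds are placeholders); note the local guard mirroring the ALB's own rule that the timeout must be less than the interval:

```shell
# Hypothetical values; adjust to your target group and application.
TG_ARN="arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-tg/abc123"
HC_PATH="/health"
INTERVAL=30   # seconds between health checks
TIMEOUT=10    # seconds to wait for a 200 OK

# The ALB rejects configurations where timeout >= interval; fail fast locally.
if [ "$TIMEOUT" -ge "$INTERVAL" ]; then
  echo "health check timeout must be less than the interval" >&2
  exit 1
fi

aws elbv2 modify-target-group \
  --target-group-arn "$TG_ARN" \
  --health-check-path "$HC_PATH" \
  --health-check-interval-seconds "$INTERVAL" \
  --health-check-timeout-seconds "$TIMEOUT" \
  --healthy-threshold-count 2 \
  --unhealthy-threshold-count 3
```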

2. Verify Security Groups: This is the #1 cause of sudden 503 errors after an infrastructure update. The Security Group attached to your backend instances (EC2, ECS ENIs, or EKS Nodes) must have an inbound rule allowing traffic from the ALB's Security Group.

  • Go to the EC2/Backend Security Group.
  • Check Inbound Rules.
  • Ensure there is a rule allowing the specific application port (e.g., TCP 80, 443, or 3000), and that the Source is the Security Group ID of the ALB (sg-xxxxxxxx).
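The check-then-fix above can be scripted. A sketch, assuming jq is available; the security group IDs and port are hypothetical:

```shell
ALB_SG="sg-0alb1234567890"          # hypothetical ALB security group ID
BACKEND_SG="sg-0backend1234567890"  # hypothetical backend security group ID
APP_PORT=8080

# Prints "true" if the backend SG already allows $APP_PORT from the ALB SG.
aws ec2 describe-security-groups --group-ids "$BACKEND_SG" --output json \
  | jq --arg sg "$ALB_SG" --argjson port "$APP_PORT" \
      '[.SecurityGroups[].IpPermissions[]
        | select(.FromPort <= $port and .ToPort >= $port)
        | .UserIdGroupPairs[]?
        | select(.GroupId == $sg)] | length > 0'

# If it printed "false", add the missing ingress rule:
aws ec2 authorize-security-group-ingress \
  --group-id "$BACKEND_SG" \
  --protocol tcp \
  --port "$APP_PORT" \
  --source-group "$ALB_SG"
```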

Step 3: Investigate Backend Application Crashes

If the network is open and the health check configuration is correct, the application itself might be crashing.

  • For EC2: SSH or use AWS Systems Manager Session Manager (SSM) to log into the instance. Run curl -v http://localhost:<port>/<health-check-path>. If the connection is refused, your application service (e.g., Nginx, Node.js, Tomcat) is down. Check /var/log/messages, syslog, or journalctl for crash logs.
  • For ECS/EKS: Check the container logs in CloudWatch. Often, a container will start, fail to connect to a database, crash, and restart. During this crash-loop, the target will never reach a Healthy state, resulting in continuous 503s at the ALB layer.
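For the EC2 case, the local check and log inspection can be combined into one snippet run inside the instance. A sketch, assuming systemd and a hypothetical unit name `my-app.service`; port and path are placeholders:

```shell
# Run from inside the instance (SSH or SSM session).
PORT=8080
HEALTH_PATH="/health"

# -f makes curl exit non-zero on HTTP 4xx/5xx; "connection refused" means
# nothing is listening on the port at all.
if curl -fsS "http://localhost:${PORT}${HEALTH_PATH}" -o /dev/null; then
  echo "local health check OK"
else
  echo "local health check FAILED; inspecting recent service logs"
  journalctl -u my-app.service --since "15 minutes ago" --no-pager | tail -n 50
fi
```

If the local check passes but the ALB still marks the target unhealthy, the problem is almost certainly network-level (security groups or NACLs), not the application.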

Step 4: Check for Resource Exhaustion and Capacity Issues

In high-traffic scenarios, your backend instances might be perfectly healthy but physically unable to accept new connections.

  • Connection Queue Limits: Web servers like Nginx, Apache, or Tomcat have maximum thread or connection limits. Once hit, they refuse new TCP connections. The ALB interprets this as a dropped connection and, if widespread, will return 503s.
  • Ephemeral Port Exhaustion: If your instances make heavy outbound connections, they may exhaust their ephemeral port range, causing new outbound connections to fail and the application to stall.
  • CPU/Memory Maxed Out: Use CloudWatch to check the CPUUtilization and MemoryUtilization of your backend targets. An overloaded instance will fail to respond to health checks in time. The fix here is to scale out (add more instances/tasks) or scale up (increase instance size).
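The CPU check and scale-out for an EC2-backed service can be sketched as follows. This assumes an Auto Scaling group (the name `my-asg` is a placeholder) and GNU coreutils `date`; on macOS the `-d '1 hour ago'` flag differs:

```shell
# Average CPU across the ASG's instances over the last hour, in 5-min buckets.
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=AutoScalingGroupName,Value=my-asg \
  --start-time "$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --period 300 \
  --statistics Average \
  --output table

# If CPU is sustained near its ceiling, scale out immediately:
aws autoscaling set-desired-capacity \
  --auto-scaling-group-name my-asg \
  --desired-capacity 4
```

For ECS, the equivalent lever is `aws ecs update-service --desired-count`; for EKS, scaling the deployment's replica count.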

Summary of Diagnostic Flow

  1. Identify if the 503 is consistent or intermittent.
  2. Check HTTPCode_ELB_503_Count in CloudWatch.
  3. Inspect Target Group Health status.
  4. Validate Security Group rules.
  5. Confirm application process uptime and local health check responses.

Diagnostic Script

# Diagnostic script to list unhealthy targets in a specific target group

TARGET_GROUP_ARN="arn:aws:elasticloadbalancing:region:account-id:targetgroup/my-tg/id"

# Get target health status
echo "Fetching target health status..."
aws elbv2 describe-target-health \
  --target-group-arn "$TARGET_GROUP_ARN" \
  --query 'TargetHealthDescriptions[?TargetHealth.State!=`healthy`].{ID:Target.Id,Port:Target.Port,State:TargetHealth.State,Reason:TargetHealth.Reason,Description:TargetHealth.Description}' \
  --output table

# Query CloudWatch for 503 errors over the last hour
echo "Fetching ALB 503 error metrics..."
aws cloudwatch get-metric-statistics \
  --namespace AWS/ApplicationELB \
  --metric-name HTTPCode_ELB_503_Count \
  --dimensions Name=LoadBalancer,Value=app/my-load-balancer/id \
  --start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
  --period 300 \
  --statistics Sum \
  --output table

Error Medic Editorial

Error Medic Editorial comprises senior DevOps and Cloud Reliability Engineers dedicated to demystifying complex cloud infrastructure errors. We share practical, production-tested solutions for AWS, Kubernetes, and modern cloud-native stacks.

