Error Medic

Troubleshooting OpenAI API Rate Limits (Error 429) and Common HTTP Errors

Fix OpenAI API 429 Too Many Requests and timeout errors. Learn how to implement exponential backoff, handle 401/403 auth issues, and survive 500/503 outages.

Key Takeaways
  • Error 429 (Too Many Requests) is the most common issue, caused by exceeding TPM (Tokens Per Minute) or RPM (Requests Per Minute) limits.
  • HTTP 401/403 indicate authentication failures, revoked keys, or insufficient quota/permissions on your billing account.
  • HTTP 500/502/503 and timeouts are server-side issues requiring robust retry mechanisms or awaiting OpenAI system recovery.
  • Implement exponential backoff with jitter in your application code to gracefully handle transient 429 and 5xx errors without overwhelming the API.
OpenAI API Errors Compared
| Error Code | Root Cause | Immediate Action | Long-term Fix |
| --- | --- | --- | --- |
| 429 Too Many Requests | Exceeded TPM/RPM limits or hit monthly billing quota | Wait for limit reset (usually minutes) or check billing | Implement exponential backoff, increase usage tier |
| 401 Unauthorized | Invalid API key or malformed Authorization header | Check API key validity in the OpenAI dashboard | Use secure environment variables, rotate keys regularly |
| 500 / 503 Internal Error | OpenAI infrastructure issue or overloaded servers | Check status.openai.com for ongoing incidents | Implement resilient retry logic for 5xx responses |
| API Timeout | Request took too long or network connection dropped | Retry the request | Optimize prompt size, use stream=True, increase client timeouts |

Understanding OpenAI API Errors

When building applications that depend on the OpenAI API, encountering HTTP errors is a standard part of the development lifecycle. Whether you are dealing with sudden traffic spikes leading to 429 Too Many Requests, or unexpected 503 Service Unavailable outages, handling these gracefully is critical for resilient production systems. This guide covers the diagnosis and mitigation of the most common OpenAI API error codes.

Error 429: Too Many Requests (Rate Limits)

The 429 status code is the most frequent hurdle developers face. The OpenAI API enforces rate limits based on your organization's usage tier across two primary metrics:

  1. Requests Per Minute (RPM)
  2. Tokens Per Minute (TPM)

Additionally, if you are on a free tier or have reached your predefined monthly billing cap (hard limit), a 429 might indicate you have hit your quota limit rather than a transient, per-minute rate limit.
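You do not have to wait for a 429 to learn how close you are to a limit: OpenAI reports your current rate-limit state in `x-ratelimit-*` response headers on every request. A minimal sketch of reading them (the header names follow OpenAI's rate-limit documentation; the helper name is ours):

```python
# Rate-limit headers OpenAI attaches to API responses.
RATE_LIMIT_HEADERS = (
    "x-ratelimit-limit-requests",
    "x-ratelimit-remaining-requests",
    "x-ratelimit-reset-requests",
    "x-ratelimit-limit-tokens",
    "x-ratelimit-remaining-tokens",
    "x-ratelimit-reset-tokens",
)

def parse_rate_limit_headers(headers):
    """Collect the rate-limit state from a response's headers mapping."""
    return {name: headers.get(name) for name in RATE_LIMIT_HEADERS}

# Usage sketch against the raw HTTP API (requires `requests` and a valid key):
#   resp = requests.post("https://api.openai.com/v1/chat/completions",
#                        headers={"Authorization": f"Bearer {key}"}, json=body)
#   print(parse_rate_limit_headers(resp.headers))
```

Logging these values lets you throttle proactively instead of reacting to 429s.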

Common Symptom:

```json
{
  "error": {
    "message": "Rate limit reached for default-gpt-3.5-turbo in organization org-xyz on requests per min. Limit: 3 RPM. Please try again in 20s.",
    "type": "requests",
    "param": null,
    "code": "rate_limit_exceeded"
  }
}
```
Step 1: Diagnose the 429

Examine the exact error message payload. Does it specify requests per min, tokens per min, or insufficient_quota?

  • If it is RPM/TPM, your application is sending requests too quickly. You need to pace your calls.
  • If the code is insufficient_quota, you need to check your billing dashboard, ensure a payment method is attached, and potentially increase your monthly spending limit.
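The two cases above can be told apart in code from the error payload's `code` field. A minimal sketch (the function name and return labels are ours; the `code` values match the payloads discussed above):

```python
def classify_429(payload):
    """Decide whether a 429 body means 'back off and retry' or 'fix billing'."""
    code = payload.get("error", {}).get("code")
    if code == "insufficient_quota":
        return "quota"        # retrying will not help; check the billing dashboard
    if code == "rate_limit_exceeded":
        return "rate_limit"   # transient; pace your calls or back off and retry
    return "unknown"
```

Branching on this up front prevents a retry loop from hammering the API when the real problem is an exhausted quota.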
Step 2: Fix with Exponential Backoff

The industry standard approach to handling transient 429 (and 5xx) errors is implementing exponential backoff with jitter. This strategy pauses your application for a short period when an error occurs, and increases the pause time exponentially with each subsequent failure. Adding random "jitter" prevents the "thundering herd" problem where many distributed clients retry simultaneously.
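The strategy fits in a few lines without any library. A minimal sketch with "full jitter" (the `TransientError` class is a hypothetical stand-in for your client's 429/5xx exceptions; function and parameter names are ours):

```python
import random
import time

class TransientError(Exception):
    """Hypothetical stand-in for 429/5xx exceptions raised by your API client."""

def with_backoff(call, max_retries=5, base=1.0, cap=60.0):
    """Run `call`, retrying transient failures with exponential backoff + full jitter."""
    for attempt in range(max_retries):
        try:
            return call()
        except TransientError:
            # Full jitter: sleep a random duration up to the exponential ceiling,
            # so many clients retrying at once do not synchronize.
            time.sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
    return call()  # final attempt; any exception now propagates to the caller
```

In production you would catch your SDK's real exception types here; a ready-made equivalent using the tenacity library appears at the end of this guide.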

Authentication Errors: 401 Unauthorized and 403 Forbidden

These errors indicate that the OpenAI API rejected your credentials or you lack permissions for the requested resource.

Error 401 Unauthorized: This typically means your Authorization: Bearer YOUR_API_KEY header is missing, malformed, or the key itself has been revoked or deleted.

  • Fix: Verify your API key in the OpenAI platform dashboard. Ensure your application is correctly loading the key from secure environment variables and is not inadvertently passing a hardcoded placeholder or an empty string.
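A small startup check catches the empty-string and placeholder cases before the first request ever fails with a 401. A sketch (assumes the key lives in `OPENAI_API_KEY`; the `sk-` prefix is a convention OpenAI secret keys currently follow and may change):

```python
import os
import sys

def load_api_key():
    """Fail fast on the two most common causes of a 401."""
    key = os.environ.get("OPENAI_API_KEY", "").strip()
    if not key:
        sys.exit("OPENAI_API_KEY is unset or empty -- requests would return 401.")
    if not key.startswith("sk-"):
        # OpenAI secret keys conventionally begin with "sk-"
        sys.exit("OPENAI_API_KEY does not look like an OpenAI secret key.")
    return key
```

Failing at startup with a clear message is far easier to debug than a 401 buried in request logs.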

Error 403 Forbidden: This occurs if your account does not have access to the specific model you are requesting (for example, attempting to access gpt-4 without meeting the minimum billing history requirements), or if you are accessing the API from an unsupported geographical region.

Server-Side Errors: 500, 502, 503, and Timeouts

Errors in the 5xx range indicate that the problem lies within OpenAI's infrastructure, not your request.

  • 500 Internal Server Error: An unexpected error or bug occurred on their servers while processing your request.
  • 502 Bad Gateway / 503 Service Unavailable: The server is overloaded, down for maintenance, or experiencing a broader network partition.
  • Timeouts: The TCP connection was dropped or the API took too long to generate a response. This is especially common with long-context generation tasks or when network conditions are poor.
Diagnosis and Mitigation

Always check status.openai.com when you encounter persistent 5xx errors. If there is an active incident, there is little to do on your side except wait for OpenAI to resolve it.

  1. Retries: Treat 5xx errors similarly to 429 rate limits. Implement robust retry loops with exponential backoff.
  2. Timeouts: Configure your HTTP client with sensible timeouts. Many standard libraries either apply no timeout at all (Python's requests waits indefinitely by default) or use one too short for long model responses; set an explicit read timeout of 60-120 seconds for complex queries.
  3. Streaming: To mitigate timeout issues and improve perceived latency for end users, pass stream=True in your API calls. This instructs the API to send the response back in chunks as they are generated via Server-Sent Events (SSE), keeping the connection active and responsive.
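Consuming a stream is a short loop over chunks whose deltas you concatenate. A sketch against openai-python v1.x (the `collect_stream` helper name is ours):

```python
def collect_stream(chunks):
    """Join the content deltas of a streamed chat completion into one string."""
    parts = []
    for chunk in chunks:
        delta = chunk.choices[0].delta.content
        if delta:  # the final chunk's delta content is None
            parts.append(delta)
    return "".join(parts)

# Usage sketch (openai-python v1.x; requires OPENAI_API_KEY to be set):
#   from openai import OpenAI
#   client = OpenAI(timeout=120.0)          # generous read timeout
#   stream = client.chat.completions.create(
#       model="gpt-3.5-turbo",
#       messages=[{"role": "user", "content": "Explain SSE."}],
#       stream=True,
#   )
#   print(collect_stream(stream))
```

Because chunks arrive continuously, the connection never sits idle long enough to trip most read timeouts.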

Complete Example: Automatic Retries in Python

The snippet below ties the earlier recommendations together, using the tenacity library to retry rate-limit, connection, and server errors automatically.

```python
import openai
from tenacity import (
    retry,
    stop_after_attempt,
    wait_random_exponential,
    retry_if_exception_type,
)

# Retry up to 6 times, waiting exponentially between 1 and 60 seconds with jitter
@retry(
    wait=wait_random_exponential(min=1, max=60),
    stop=stop_after_attempt(6),
    retry=retry_if_exception_type((
        openai.RateLimitError,
        openai.APIConnectionError,
        openai.InternalServerError,
    ))
)
def completion_with_backoff(**kwargs):
    return openai.chat.completions.create(**kwargs)

try:
    # This call will automatically retry on 429 and 5xx errors
    response = completion_with_backoff(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Explain exponential backoff."}]
    )
    print(response.choices[0].message.content)
except Exception as e:
    print(f"Request failed after maximum retries: {e}")
```

Error Medic Editorial

The Error Medic Editorial team consists of senior SREs, DevOps engineers, and platform architects dedicated to solving the most persistent infrastructure and API integration challenges.
