Error Medic

Troubleshooting OpenAI API Errors: Rate Limits (429), Timeouts, and Server Issues (500, 502, 503)

Comprehensive guide to diagnosing and fixing OpenAI API rate limits (429), timeouts, and server errors (5xx) with exponential backoff and connection pooling.

Key Takeaways
  • HTTP 429 (Too Many Requests) errors indicate you have exceeded your Requests Per Minute (RPM), Tokens Per Minute (TPM), or overall billing quota. Mitigate using exponential backoff.
  • HTTP 5xx errors (500, 502, 503) and timeouts are generally server-side overloads or network partitions. Handle these with robust retry logic and jitter.
  • Timeouts during generation can be resolved by increasing client-side timeout thresholds, reducing max_tokens, or implementing streaming responses.
  • HTTP 401/403 errors are authentication and authorization failures, typically requiring API key verification or billing updates.
OpenAI API Error Resolution Strategies
| Error Type | Root Cause | Resolution Strategy | Implementation Complexity |
| --- | --- | --- | --- |
| 429 Too Many Requests | Exceeded RPM/TPM limits or tier quota | Exponential backoff, token tracking, request queuing | Medium |
| Timeout / 504 | Long generation time, network partition | Enable streaming, increase client timeout, reduce max_tokens | Low |
| 500 / 502 / 503 | OpenAI server outage or overload | Retry with exponential backoff and jitter, monitor status page | Low |
| 401 / 403 | Invalid API key, lack of permissions, billing issue | Verify API key, check billing status, confirm org ID | Low |

Understanding the Error Landscape

When integrating with the OpenAI API, developers frequently encounter a variety of HTTP status codes that disrupt service. Because large language models (LLMs) are computationally expensive and shared across millions of users, OpenAI enforces strict concurrency and rate limits while occasionally suffering from capacity constraints. A robust integration must gracefully handle everything from an OpenAI API rate limit (429) to sudden network timeouts and backend server failures (500, 502, 503).

This guide explores the most common OpenAI API errors, dissecting their root causes and providing production-ready architectural patterns for resilience.

Diagnosing 429 Too Many Requests

The HTTP 429 error is by far the most common hurdle. It signifies that your application is sending requests faster than your current tier allows, or that you have exhausted your account's financial quota.

Types of 429 Errors
  1. Requests Per Minute (RPM) Limit: You are firing too many individual API calls within a 60-second window.
  2. Tokens Per Minute (TPM) Limit: Even if you make few requests, sending massive prompts or requesting maximum output tokens can quickly exhaust your TPM. OpenAI calculates TPM based on the maximum possible tokens a request could consume (prompt tokens + max_tokens).
  3. Insufficient Quota: Your monthly billing limit (hard limit) has been reached. This will persistently return a 429 until you adjust your budget or the month rolls over.
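Because TPM is charged against the worst case (prompt tokens + max_tokens), it helps to estimate a request's cost before sending it. A rough sketch, using the common ~4-characters-per-token heuristic (swap in a real tokenizer such as OpenAI's tiktoken library for exact counts):

```python
def estimate_request_cost(prompt: str, max_tokens: int) -> int:
    """Approximate TPM cost of a chat request.

    OpenAI counts prompt tokens plus the requested max_tokens against
    your TPM budget. Without a real tokenizer we approximate prompt
    tokens with the ~4-characters-per-token heuristic, so treat this
    as an estimate, not an exact figure.
    """
    approx_prompt_tokens = max(1, len(prompt) // 4)
    return approx_prompt_tokens + max_tokens


def fits_tpm_budget(used_this_minute: int, tpm_limit: int,
                    prompt: str, max_tokens: int) -> bool:
    """Check whether a request fits in the remaining TPM window."""
    return used_this_minute + estimate_request_cost(prompt, max_tokens) <= tpm_limit
```

Lowering max_tokens when you expect short answers is often the cheapest way to stay under TPM, since the full requested budget counts against the limit whether the model uses it or not.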
Diagnostic Steps for 429s

Inspect the HTTP response headers returned by the OpenAI API. They contain crucial telemetry:

  • x-ratelimit-limit-requests: The maximum requests allowed in your current tier.
  • x-ratelimit-remaining-requests: Requests left before hitting the limit.
  • x-ratelimit-reset-requests: Time until the RPM limit resets (e.g., "1s").
  • x-ratelimit-limit-tokens: The maximum tokens allowed in your current tier.
  • x-ratelimit-remaining-tokens: Tokens left before hitting the limit.
  • x-ratelimit-reset-tokens: Time until the TPM limit resets.

If the response body contains "type": "insufficient_quota", you have hit a billing limit, and no amount of retrying will fix it.
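The reset headers arrive as duration strings such as "1s" or "6m0s". A small parser (the unit grammar here is inferred from observed header values, so verify it against real responses) lets you sleep exactly as long as needed instead of guessing:

```python
import re

# Seconds per unit for the duration strings seen in reset headers
_UNIT_SECONDS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0}

def parse_reset(value: str) -> float:
    """Convert a reset duration like '1s', '6m0s', or '120ms' to seconds."""
    total = 0.0
    for amount, unit in re.findall(r"([\d.]+)(ms|s|m|h)", value):
        total += float(amount) * _UNIT_SECONDS[unit]
    return total

def seconds_until_token_reset(headers: dict) -> float:
    """Read the TPM reset from the header documented above.

    Returns 0.0 when the header is absent (e.g. a non-rate-limited path).
    """
    return parse_reset(headers.get("x-ratelimit-reset-tokens", "0s"))
```

Pairing this with the `x-ratelimit-remaining-tokens` value lets you pause proactively just before the limit rather than reactively after a 429.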

Handling Server Errors: 500, 502, and 503

When OpenAI's infrastructure struggles under load or experiences an outage, you will see 5xx errors.

  • OpenAI API 500 (Internal Server Error): A generic catch-all for an unhandled exception on OpenAI's backend.
  • OpenAI API 502 (Bad Gateway): Often occurs when Cloudflare or OpenAI's internal load balancers fail to route your request to an available model inference node.
  • OpenAI API 503 (Service Unavailable): The servers are explicitly overloaded and shedding load.

Unlike 429s, 5xx errors are entirely out of your control. The only remediation strategy is implementing a resilient retry mechanism. However, blindly retrying can exacerbate the problem. You must use exponential backoff with jitter (randomness) to prevent the "thundering herd" problem, where thousands of clients retry simultaneously when the service recovers.
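The delay calculation itself is small. A sketch of the "full jitter" variant, where each retry waits a random amount inside an exponentially growing window:

```python
import random

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Full-jitter exponential backoff delay, in seconds.

    The window doubles each attempt (base, 2*base, 4*base, ...) up to
    `cap`, and the actual delay is drawn uniformly from [0, window] so
    recovering clients spread out instead of stampeding together.
    """
    window = min(cap, base * (2 ** attempt))
    return random.uniform(0.0, window)
```

Sleeping for `backoff_delay(attempt)` between attempts produces the same shape as Tenacity's `wait_random_exponential`, used in the full implementation later in this guide.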

Conquering Timeouts

An OpenAI API timeout usually happens for one of two reasons:

  1. Network Partitions: The TCP connection between your server and OpenAI drops.
  2. Long Inference Times: Models like GPT-4 can take tens of seconds to generate long responses. If your HTTP client's read timeout is set too aggressively (e.g., default 10 seconds in some libraries), the client will sever the connection before OpenAI finishes generating the response.
Resolving Timeouts
  • Increase Client Timeout: Explicitly configure your HTTP client to allow 60, 120, or even 300 seconds for read operations when dealing with complex LLM tasks.
  • Implement Streaming (stream=True): This is the most robust solution. By enabling Server-Sent Events (SSE) streaming, OpenAI sends chunks of the response as they are generated. This keeps the HTTP connection active and completely bypasses standard idle read timeouts. It also massively improves perceived latency for the end-user.
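A sketch of consuming a streamed response; the helper is written against the chunk shape the v1 Python client returns (`choices[0].delta.content`), and the stub objects at the bottom exist only as a quick self-check without a live API call:

```python
from types import SimpleNamespace

def collect_stream(chunks) -> str:
    """Assemble the full completion text from streamed chunks.

    Each chunk carries an incremental delta; content is None on the
    initial role chunk and the final finish chunk, so those are skipped.
    """
    parts = []
    for chunk in chunks:
        delta = chunk.choices[0].delta.content
        if delta:
            parts.append(delta)
    return "".join(parts)

# Real usage (assumes an initialised `client`): chunks arrive as they
# are generated, so no single read idles long enough to time out.
#
#   stream = client.chat.completions.create(
#       model="gpt-4-turbo-preview",
#       messages=[{"role": "user", "content": "Hello"}],
#       stream=True,
#   )
#   print(collect_stream(stream))

# Quick self-check with stub chunks mimicking the client's shape:
_stub = [SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=c))])
         for c in (None, "Hel", "lo", None)]
assembled = collect_stream(_stub)
```

In interactive applications you would forward each delta to the user as it arrives rather than joining at the end, which is where the perceived-latency win comes from.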

Resolving 401 and 403 Authentication Errors

If you receive a 401 (Unauthorized) or a 403 (Forbidden) from the OpenAI API, the issue lies with your credentials or permissions.

  • 401 Unauthorized: Your API key is invalid, revoked, or missing from the Authorization: Bearer <token> header. Ensure you are not accidentally committing keys to version control, which causes automatic revocation by secret scanners.
  • 403 Forbidden: The key is valid, but it lacks permissions. This frequently happens if you are part of an organization but haven't specified the OpenAI-Organization header, or if you are attempting to access a model your tier does not yet have access to (e.g., trying to access GPT-4 without a funded account).
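A sketch of assembling the two headers involved (the header names match the points above; the key and org values shown in the checks are placeholders):

```python
def build_auth_headers(api_key: str, org_id: str = "") -> dict:
    """Build the authentication headers OpenAI expects.

    Authorization is always required; OpenAI-Organization is only
    needed when your key belongs to more than one organization.
    """
    if not api_key:
        raise ValueError("API key is missing - this is what produces a 401")
    headers = {"Authorization": f"Bearer {api_key}"}
    if org_id:
        headers["OpenAI-Organization"] = org_id
    return headers
```

If 403s persist with both headers set correctly, the remaining suspects are model access for your tier and billing status, neither of which a client-side change can fix.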

Building Resilient Architectures

To build a truly resilient system, you must move beyond simple try/catch blocks.

1. The Circuit Breaker Pattern

If OpenAI is returning 503s continuously, your application should stop sending requests entirely for a period. A circuit breaker pattern detects high failure rates and "trips," failing fast locally instead of wasting resources waiting for doomed network calls. After a cooldown, it "half-opens" to test the waters.
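A minimal in-process sketch of the pattern (the threshold and cooldown values are illustrative; production implementations also limit how many probes the half-open state admits):

```python
import time

class CircuitBreaker:
    """Trips after `threshold` consecutive failures, fails fast for
    `cooldown` seconds, then half-opens to let a probe request through."""

    def __init__(self, threshold: int = 5, cooldown: float = 30.0):
        self.threshold = threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow_request(self) -> bool:
        if self.opened_at is None:
            return True
        if time.monotonic() - self.opened_at >= self.cooldown:
            return True  # half-open: permit a probe
        return False  # open: fail fast locally

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = time.monotonic()
```

Callers check `allow_request()` before dialing out and report the outcome back, so during an outage the expensive doomed network calls are replaced by an instant local refusal.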

2. Advanced Rate Limiting Strategies

If you operate a high-throughput system, relying on OpenAI to tell you when you're rate-limited via 429s is inefficient. Implement a local Token Bucket or Leaky Bucket algorithm using Redis to track your RPM and TPM consumption locally. Queue low-priority requests during peak bursts to ensure high-priority interactive requests succeed.
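A single-process token bucket sketches the idea; a production version would keep the bucket state in Redis (for example via an atomic Lua script) so every worker draws from the same budget:

```python
import time

class TokenBucket:
    """In-memory token bucket for local RPM or TPM accounting."""

    def __init__(self, rate_per_minute: float, capacity: float):
        self.rate = rate_per_minute / 60.0  # tokens refilled per second
        self.capacity = capacity
        self.tokens = capacity
        self.updated = time.monotonic()

    def try_consume(self, amount: float = 1.0) -> bool:
        """Consume `amount` if available; refills lazily on each call."""
        now = time.monotonic()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= amount:
            self.tokens -= amount
            return True
        return False
```

One bucket sized to your RPM limit and another sized to your TPM limit (consuming the estimated token cost per request) lets you queue or shed work locally before OpenAI ever returns a 429.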

3. Graceful Degradation

If the API is unreachable, how does your app behave? Consider implementing graceful degradation: returning cached responses, falling back to a smaller, faster model (e.g., falling back to gpt-3.5-turbo if gpt-4 times out), or providing user-friendly error messages rather than raw stack traces.
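The fallback chain can be expressed as a small helper; in real use the callables would wrap a gpt-4 call, a gpt-3.5-turbo call, and a cache lookup, in that order:

```python
def with_fallbacks(primary, fallbacks, default="Service temporarily unavailable."):
    """Try `primary`, then each fallback in order.

    Any exception moves on to the next option; if everything fails,
    return a user-friendly default instead of a raw stack trace.
    """
    for attempt in (primary, *fallbacks):
        try:
            return attempt()
        except Exception:
            continue
    return default
```

In production you would log which tier actually served the request, since a rising fallback rate is itself an early-warning signal of upstream trouble.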

A Complete Retry Implementation in Python

```python
import os
import logging
import openai
from tenacity import (
    retry,
    stop_after_attempt,
    wait_random_exponential,
    retry_if_exception_type
)

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Initialize the client. The official v1.0+ client has built-in retries,
# but using Tenacity allows for custom logic, alerting, and broader exception handling.
client = openai.OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

# We retry on RateLimitError (429), APITimeoutError, and InternalServerError (5xx)
@retry(
    retry=retry_if_exception_type((
        openai.RateLimitError,
        openai.APITimeoutError,
        openai.InternalServerError
    )),
    wait=wait_random_exponential(multiplier=1, max=60),  # Exponential backoff with jitter
    stop=stop_after_attempt(5),
    before_sleep=lambda retry_state: logger.warning(
        f"Retrying API call: attempt {retry_state.attempt_number} "
        f"after error: {retry_state.outcome.exception()}"
    )
)
def generate_completion_with_retry(prompt, model="gpt-4-turbo-preview"):
    try:
        # Set a generous timeout for complete responses if not streaming
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            timeout=60.0,  # 60 second read timeout
        )
        return response.choices[0].message.content

    except openai.AuthenticationError as e:
        # 401 / 403 errors: Do not retry, log and fail fast
        logger.error(f"Authentication failed. Check API Key: {e}")
        raise
    except openai.BadRequestError as e:
        # 400 errors: Malformed request, do not retry
        logger.error(f"Bad request. Check parameters: {e}")
        raise

if __name__ == "__main__":
    try:
        result = generate_completion_with_retry("Explain Kubernetes architecture.")
        print("Success:", result[:100], "...")
    except Exception as e:
        print("Operation failed after exhausting retries:", e)
```

Error Medic Editorial

Error Medic Editorial is a team of senior DevOps and Site Reliability Engineers dedicated to providing deep, actionable troubleshooting guides for modern cloud infrastructure and APIs. We focus on real-world resilience patterns for production systems.
