Error Medic

Troubleshooting OpenAI API Rate Limits (Error 429) and Common HTTP Errors

Fix OpenAI API 429 Too Many Requests and timeout errors. Learn how to implement exponential backoff, handle 401/403 auth issues, and survive 500/503 outages.

Key Takeaways
  • Error 429 (Too Many Requests) is the most common issue, caused by exceeding TPM (Tokens Per Minute) or RPM (Requests Per Minute) limits.
  • HTTP 401/403 indicate authentication failures, revoked keys, or insufficient quota/permissions on your billing account.
  • HTTP 500/502/503 and timeouts are server-side issues requiring robust retry mechanisms or awaiting OpenAI system recovery.
  • Implement exponential backoff with jitter in your application code to gracefully handle transient 429 and 5xx errors without overwhelming the API.
OpenAI API Errors Compared
| Error Code | Root Cause | Immediate Action | Long-term Fix |
| --- | --- | --- | --- |
| 429 Too Many Requests | Exceeded TPM/RPM limits or hit monthly billing quota | Wait for limit reset (usually minutes) or check billing | Implement exponential backoff, increase usage tier |
| 401 Unauthorized | Invalid API key or malformed Authorization header | Check API key validity in the OpenAI dashboard | Use secure environment variables, rotate keys regularly |
| 500 / 503 Internal Error | OpenAI infrastructure issue or overloaded servers | Check status.openai.com for ongoing incidents | Implement resilient retry logic for 5xx responses |
| API Timeout | Request took too long or network connection dropped | Retry the request | Optimize prompt size, use stream=True, increase client timeouts |

Understanding OpenAI API Errors

When building applications that depend on the OpenAI API, encountering HTTP errors is a standard part of the development lifecycle. Whether you are dealing with sudden traffic spikes leading to 429 Too Many Requests, or unexpected 503 Service Unavailable outages, handling these gracefully is critical for resilient production systems. This guide covers the diagnosis and mitigation of the most common OpenAI API error codes.

Error 429: Too Many Requests (Rate Limits)

The 429 status code is the most frequent hurdle developers face. The OpenAI API enforces rate limits based on your organization's usage tier across two primary metrics:

  1. Requests Per Minute (RPM)
  2. Tokens Per Minute (TPM)

Additionally, if you are on a free tier or have reached your predefined monthly billing cap (hard limit), a 429 might indicate you have hit your quota limit rather than a transient, per-minute rate limit.
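You do not have to wait for a 429 to learn how close you are to a limit: OpenAI reports your current rate-limit state in `x-ratelimit-*` response headers on every request. A minimal sketch of reading them (the header names follow OpenAI's rate-limit documentation; the helper name is ours):

```python
# Rate-limit headers OpenAI attaches to API responses.
RATE_LIMIT_HEADERS = (
    "x-ratelimit-limit-requests",
    "x-ratelimit-remaining-requests",
    "x-ratelimit-reset-requests",
    "x-ratelimit-limit-tokens",
    "x-ratelimit-remaining-tokens",
    "x-ratelimit-reset-tokens",
)

def parse_rate_limit_headers(headers):
    """Collect the rate-limit state from a response's headers mapping."""
    return {name: headers.get(name) for name in RATE_LIMIT_HEADERS}

# Usage sketch against the raw HTTP API (requires `requests` and a valid key):
#   resp = requests.post("https://api.openai.com/v1/chat/completions",
#                        headers={"Authorization": f"Bearer {key}"}, json=body)
#   print(parse_rate_limit_headers(resp.headers))
```

Logging these values lets you throttle proactively instead of reacting to 429s.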

Common Symptom:

```json
{
  "error": {
    "message": "Rate limit reached for default-gpt-3.5-turbo in organization org-xyz on requests per min. Limit: 3 RPM. Please try again in 20s.",
    "type": "requests",
    "param": null,
    "code": "rate_limit_exceeded"
  }
}
```
Step 1: Diagnose the 429

Examine the exact error message payload. Does it specify requests per min, tokens per min, or insufficient_quota?

  • If it is RPM/TPM, your application is sending requests too quickly. You need to pace your calls.
  • If the code is insufficient_quota, you need to check your billing dashboard, ensure a payment method is attached, and potentially increase your monthly spending limit.
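The two cases above can be told apart in code from the error payload's `code` field. A minimal sketch (the function name and return labels are ours; the `code` values match the payloads discussed above):

```python
def classify_429(payload):
    """Decide whether a 429 body means 'back off and retry' or 'fix billing'."""
    code = payload.get("error", {}).get("code")
    if code == "insufficient_quota":
        return "quota"        # retrying will not help; check the billing dashboard
    if code == "rate_limit_exceeded":
        return "rate_limit"   # transient; pace your calls or back off and retry
    return "unknown"
```

Branching on this up front prevents a retry loop from hammering the API when the real problem is an exhausted quota.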
Step 2: Fix with Exponential Backoff

The industry standard approach to handling transient 429 (and 5xx) errors is implementing exponential backoff with jitter. This strategy pauses your application for a short period when an error occurs, and increases the pause time exponentially with each subsequent failure. Adding random "jitter" prevents the "thundering herd" problem where many distributed clients retry simultaneously.
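The strategy fits in a few lines without any library. A minimal sketch with "full jitter" (the `TransientError` class is a hypothetical stand-in for your client's 429/5xx exceptions; function and parameter names are ours):

```python
import random
import time

class TransientError(Exception):
    """Hypothetical stand-in for 429/5xx exceptions raised by your API client."""

def with_backoff(call, max_retries=5, base=1.0, cap=60.0):
    """Run `call`, retrying transient failures with exponential backoff + full jitter."""
    for attempt in range(max_retries):
        try:
            return call()
        except TransientError:
            # Full jitter: sleep a random duration up to the exponential ceiling,
            # so many clients retrying at once do not synchronize.
            time.sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
    return call()  # final attempt; any exception now propagates to the caller
```

In production you would catch your SDK's real exception types here; a ready-made equivalent using the tenacity library appears at the end of this guide.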

Authentication Errors: 401 Unauthorized and 403 Forbidden

These errors indicate that the OpenAI API rejected your credentials or you lack permissions for the requested resource.

Error 401 Unauthorized: This typically means your Authorization: Bearer YOUR_API_KEY header is missing, malformed, or the key itself has been revoked or deleted.

  • Fix: Verify your API key in the OpenAI platform dashboard. Ensure your application is correctly loading the key from secure environment variables and is not inadvertently passing a hardcoded placeholder or an empty string.
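A small startup check catches the empty-string and placeholder cases before the first request ever fails with a 401. A sketch (assumes the key lives in `OPENAI_API_KEY`; the `sk-` prefix is a convention OpenAI secret keys currently follow and may change):

```python
import os
import sys

def load_api_key():
    """Fail fast on the two most common causes of a 401."""
    key = os.environ.get("OPENAI_API_KEY", "").strip()
    if not key:
        sys.exit("OPENAI_API_KEY is unset or empty -- requests would return 401.")
    if not key.startswith("sk-"):
        # OpenAI secret keys conventionally begin with "sk-"
        sys.exit("OPENAI_API_KEY does not look like an OpenAI secret key.")
    return key
```

Failing at startup with a clear message is far easier to debug than a 401 buried in request logs.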

Error 403 Forbidden: This occurs if your account does not have access to the specific model you are requesting (for example, attempting to access gpt-4 without meeting the minimum billing history requirements), or if you are accessing the API from an unsupported geographical region.

Server-Side Errors: 500, 502, 503, and Timeouts

Errors in the 5xx range indicate that the problem lies within OpenAI's infrastructure, not your request.

  • 500 Internal Server Error: An unexpected error or bug occurred on their servers while processing your request.
  • 502 Bad Gateway / 503 Service Unavailable: The server is overloaded, down for maintenance, or experiencing a broader network partition.
  • Timeouts: The TCP connection was dropped or the API took too long to generate a response. This is especially common with long-context generation tasks or when network conditions are poor.
Diagnosis and Mitigation

Always check status.openai.com when you encounter persistent 5xx errors. If there is an active incident, there is little to do on your side except wait for OpenAI to resolve it.

  1. Retries: Treat 5xx errors similarly to 429 rate limits. Implement robust retry loops with exponential backoff.
  2. Timeouts: Configure your HTTP client with sensible timeouts. Many standard libraries either apply no timeout at all (Python's requests waits indefinitely by default) or use one too short for long model responses; set an explicit read timeout of 60-120 seconds for complex queries.
  3. Streaming: To mitigate timeout issues and improve perceived latency for end users, pass stream=True in your API calls. This instructs the API to send the response back in chunks as they are generated via Server-Sent Events (SSE), keeping the connection active and responsive.
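Consuming a stream is a short loop over chunks whose deltas you concatenate. A sketch against openai-python v1.x (the `collect_stream` helper name is ours):

```python
def collect_stream(chunks):
    """Join the content deltas of a streamed chat completion into one string."""
    parts = []
    for chunk in chunks:
        delta = chunk.choices[0].delta.content
        if delta:  # the final chunk's delta content is None
            parts.append(delta)
    return "".join(parts)

# Usage sketch (openai-python v1.x; requires OPENAI_API_KEY to be set):
#   from openai import OpenAI
#   client = OpenAI(timeout=120.0)          # generous read timeout
#   stream = client.chat.completions.create(
#       model="gpt-3.5-turbo",
#       messages=[{"role": "user", "content": "Explain SSE."}],
#       stream=True,
#   )
#   print(collect_stream(stream))
```

Because chunks arrive continuously, the connection never sits idle long enough to trip most read timeouts.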

Complete Example: Automatic Retries in Python

The snippet below ties the earlier recommendations together, using the tenacity library to retry rate-limit, connection, and server errors automatically.

```python
import openai
from tenacity import (
    retry,
    stop_after_attempt,
    wait_random_exponential,
    retry_if_exception_type,
)

# Retry up to 6 times, waiting exponentially between 1 and 60 seconds with jitter
@retry(
    wait=wait_random_exponential(min=1, max=60),
    stop=stop_after_attempt(6),
    retry=retry_if_exception_type((
        openai.RateLimitError,
        openai.APIConnectionError,
        openai.InternalServerError,
    ))
)
def completion_with_backoff(**kwargs):
    return openai.chat.completions.create(**kwargs)

try:
    # This call will automatically retry on 429 and 5xx errors
    response = completion_with_backoff(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Explain exponential backoff."}]
    )
    print(response.choices[0].message.content)
except Exception as e:
    print(f"Request failed after maximum retries: {e}")
```

Error Medic Editorial

The Error Medic Editorial team consists of senior SREs, DevOps engineers, and platform architects dedicated to solving the most persistent infrastructure and API integration challenges.
