Troubleshooting OpenAI API Errors: 429 Rate Limits, 401s, and 5xx Timeouts
Comprehensive guide to resolving OpenAI API 429 Rate Limit Exceeded, 401 Unauthorized, and 5xx server errors with actionable retry logic and diagnostic scripts.
- HTTP 429 (Rate Limit Exceeded) is the most common error, triggered by hitting token per minute (TPM) or request per minute (RPM) limits.
- Implement exponential backoff with jitter to gracefully handle 429 and transient 5xx server errors without overloading the API.
- HTTP 401/403 errors usually indicate an invalid API key, a missing organization ID, or a depleted pre-paid billing quota.
| Method | When to Use | Time | Risk |
|---|---|---|---|
| Exponential Backoff | 429 Rate Limits & 5xx Server Errors | Minutes | Low |
| API Key Rotation | 401 Unauthorized / Compromised Keys | Immediate | Medium |
| Quota Increase | 403 Insufficient Quota / Sustained Growth | Hours/Days | Low |
| Client Timeout Increase | Timeout / 502 Bad Gateway | Immediate | Low |
Understanding the Error
When building applications on top of the OpenAI API, encountering HTTP errors is inevitable. Due to the high computational cost of generative AI models, OpenAI enforces strict rate limits and occasionally experiences system-wide latency. Understanding the exact error codes—specifically 429, 401, 403, and the 5xx family—is critical for building resilient AI applications.
The 429 Rate Limit Exceeded Error
The most frequent stumbling block for developers is the 429 Too Many Requests error. The standard error payload looks like this:
```json
{
  "error": {
    "message": "Rate limit reached for default-gpt-3.5-turbo in organization org-xxx on requests per min (RPM): Limit 3, Used 3, Requested 1. Please try again in 20s.",
    "type": "requests",
    "param": null,
    "code": "rate_limit_exceeded"
  }
}
```
This occurs because OpenAI accounts are categorized into Usage Tiers (Free, Tier 1 through Tier 5). Each tier has specific limits for Requests Per Minute (RPM), Tokens Per Minute (TPM), and Tokens Per Day (TPD). If you burst traffic or process large batches of text, you will quickly hit the TPM limit.
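Because TPM limits count every token you send, it helps to estimate a request's token footprint before dispatching a batch. A minimal sketch using the common ~4-characters-per-token heuristic for English text (the `tiktoken` library gives exact counts; `estimate_tokens`, `fits_in_budget`, and the 10,000 TPM budget are illustrative assumptions, not OpenAI APIs):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def fits_in_budget(prompt: str, max_completion_tokens: int, tpm_budget: int = 10_000) -> bool:
    """Check whether prompt plus expected completion stays under a per-minute token budget."""
    return estimate_tokens(prompt) + max_completion_tokens <= tpm_budget
```

Your actual TPM budget depends on your usage tier, so read it from the dashboard or the rate-limit headers rather than hard-coding it.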
Step 1: Diagnose
- Check Response Headers: OpenAI includes `x-ratelimit-limit-requests`, `x-ratelimit-remaining-requests`, and `x-ratelimit-reset-requests` headers. Inspect these to see exactly which limit you hit (tokens or requests).
- Review Usage Tier: Visit your OpenAI dashboard (`platform.openai.com/account/limits`) to verify your current tier. Moving from Free to Tier 1 requires setting up a paid account and purchasing at least $5 of credit.
- Identify Auth Errors (401/403): If you receive a `401 Unauthorized` or `403 Forbidden` error, the API is rejecting your credentials. This is often an "Incorrect API key provided" or "You exceeded your current quota" message.
- Monitor Server Errors (5xx): A `500 Internal Server Error` or `502/503` indicates a problem on OpenAI's infrastructure. Check `status.openai.com`.
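The diagnostic headers above can be inspected programmatically. A sketch that reports which limit is exhausted from a headers dictionary (`diagnose_rate_limit` is a hypothetical helper and the sample values are made up; with the official Python SDK you can reach raw response headers via `client.chat.completions.with_raw_response`):

```python
def diagnose_rate_limit(headers: dict) -> str:
    """Report which limit (requests or tokens) is closest to exhaustion."""
    remaining_requests = int(headers.get("x-ratelimit-remaining-requests", -1))
    remaining_tokens = int(headers.get("x-ratelimit-remaining-tokens", -1))
    if remaining_requests == 0:
        return "RPM limit hit -- wait for the x-ratelimit-reset-requests interval"
    if remaining_tokens == 0:
        return "TPM limit hit -- wait for the x-ratelimit-reset-tokens interval"
    return f"OK: {remaining_requests} requests, {remaining_tokens} tokens remaining"

sample = {
    "x-ratelimit-limit-requests": "3",
    "x-ratelimit-remaining-requests": "0",
    "x-ratelimit-remaining-tokens": "8500",
}
print(diagnose_rate_limit(sample))
```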
Step 2: Fix
1. Implement Exponential Backoff (429 & 5xx)

Never retry immediately. Use an exponential backoff strategy with jitter. This means waiting a short time (e.g., 1 second), then doubling the wait time for subsequent failures (2s, 4s, 8s), plus a random millisecond delay (jitter) to prevent the thundering herd problem.
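If you prefer not to pull in a retry library, the same strategy can be written by hand. A minimal sketch (`with_backoff` and the `base_delay`/`max_retries` defaults are illustrative assumptions):

```python
import random
import time

def with_backoff(func, max_retries: int = 5, base_delay: float = 1.0):
    """Call func(); on failure, sleep base_delay * 2**attempt plus random jitter, then retry."""
    for attempt in range(max_retries):
        try:
            return func()
        except Exception:
            if attempt == max_retries - 1:
                raise  # Out of retries: surface the last error to the caller
            # Doubling delay (1s, 2s, 4s, ...) plus jitter to avoid a thundering herd
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.25)
            time.sleep(delay)
```

In production you would catch only retryable exceptions such as `RateLimitError` and `APIError` rather than bare `Exception`, so that genuine bugs fail fast.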
2. Resolve Authentication and Quotas (401 & 403)
- Verify your `.env` file is loaded correctly (e.g., using `python-dotenv`).
- Log in to the billing dashboard and ensure your credit balance is greater than $0.
- Regenerate the API key if you suspect it has been compromised or deleted.
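The first check above can be scripted as a pre-flight step. A sketch (`check_api_key` is a hypothetical helper; the `sk-` prefix check reflects the common format of OpenAI keys):

```python
import os

def check_api_key() -> list[str]:
    """Return a list of problems found with the configured API key, empty if it looks OK."""
    problems = []
    key = os.environ.get("OPENAI_API_KEY", "")
    if not key:
        problems.append("OPENAI_API_KEY is not set -- is your .env file loaded?")
    elif not key.startswith("sk-"):
        problems.append("OPENAI_API_KEY does not look like an OpenAI key (expected 'sk-' prefix)")
    return problems
```

Note that a well-formed key can still return 401 if it was deleted or rotated, so this only rules out environment-loading mistakes.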
3. Mitigate Timeouts
If you are using the official Python or Node.js SDKs, the default timeout might be too aggressive for large generations. Override the default timeout parameter. For long completions, enable stream=True. This keeps the connection alive by sending chunks of data as they are generated, preventing idle network timeouts from load balancers or proxies.
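With `stream=True`, the response becomes an iterator of chunks whose text arrives in `choices[0].delta.content` (`None` on chunks that carry no text). A sketch of the assembly loop, using stand-in chunk objects since no live API call is made here:

```python
from types import SimpleNamespace

def collect_stream(stream) -> str:
    """Join the incremental delta content from a streaming chat completion."""
    parts = []
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta is not None:  # some chunks (e.g., the final one) carry no text
            parts.append(delta)
    return "".join(parts)

# Stand-in chunks mimicking the SDK's response shape:
def fake_chunk(text):
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])

print(collect_stream([fake_chunk("Hel"), fake_chunk("lo"), fake_chunk(None)]))  # → Hello
```

With the real SDK you would pass `client.chat.completions.create(..., stream=True)` directly to `collect_stream`, or print each delta as it arrives.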
Complete Example: Resilient Retry Logic
The script below ties these fixes together: a longer client timeout, the SDK's built-in retries disabled, and exponential backoff with jitter handled by the `tenacity` library.
```python
import os
import logging

from openai import OpenAI, RateLimitError, APIError, APITimeoutError
from tenacity import retry, stop_after_attempt, wait_random_exponential, retry_if_exception_type

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

client = OpenAI(
    api_key=os.environ.get("OPENAI_API_KEY"),
    timeout=60.0,   # Increase timeout for large models
    max_retries=0,  # Disable the SDK's default retries to use Tenacity instead
)

# Configure resilient retry logic using Tenacity
@retry(
    wait=wait_random_exponential(min=1, max=60),
    stop=stop_after_attempt(5),
    retry=retry_if_exception_type((RateLimitError, APIError, APITimeoutError)),
    before_sleep=lambda retry_state: logger.warning(
        f"Retrying due to error: {retry_state.outcome.exception()}"
    ),
)
def generate_text_with_backoff(prompt: str, model: str = "gpt-3.5-turbo") -> str:
    try:
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content
    except RateLimitError as e:
        logger.error(f"Rate limit exceeded (429): {e}")
        raise
    except APITimeoutError as e:
        logger.error(f"Request timed out: {e}")
        raise
    except APIError as e:
        # Handles 500, 502, and 503 server-side errors
        logger.error(f"OpenAI Server Error: {e}")
        raise

# Example usage:
if __name__ == "__main__":
    try:
        result = generate_text_with_backoff("Explain exponential backoff in one sentence.")
        print(result)
    except Exception as e:
        print(f"Final failure after retries: {e}")
```

Error Medic Editorial
Error Medic Editorial is a collective of senior Site Reliability Engineers and DevOps practitioners dedicated to providing actionable, code-first solutions for modern infrastructure and API integration challenges.