Error Medic

Troubleshooting OpenAI API Error 429: 'Rate limit reached', 401s, and 5xx Timeouts

Resolve OpenAI API 429 rate limits, 401/403 auth errors, and 500/503 timeouts. Learn to implement exponential backoff, check quotas, and handle connection drops.

Key Takeaways
  • Error 429 (Rate Limit Reached) indicates you have exceeded your Requests Per Minute (RPM), Tokens Per Minute (TPM), or your account has insufficient pre-paid quota.
  • Authentication errors (401 Unauthorized, 403 Forbidden) typically stem from improperly injected environment variables, revoked keys, or missing Organization IDs.
  • Server-side errors (500, 502, 503) and Timeouts occur during OpenAI service degradation or when processing excessively large context windows.
  • The most robust fix for both 429 and 5xx errors is implementing exponential backoff with jitter in your application's retry logic.
Fix Approaches Compared
Method | When to Use | Time | Risk
Exponential Backoff | Handling 429s and transient 5xx errors automatically | Medium | Low
Upgrading Account Tier | Consistently hitting TPM/RPM limits under normal load | Fast | Low
OpenAI Batch API | Non-time-sensitive bulk processing tasks (e.g., embeddings) | Slow | Low
Increasing Timeout | Handling occasional 502/504 gateway timeouts on long prompts | Fast | Medium

Understanding OpenAI API Errors

When building production-grade applications on top of the OpenAI API, engineering teams inevitably encounter a variety of HTTP errors. Because the API processes complex, compute-heavy requests, it is subject to strict rate limits and occasional network instability. Understanding how to diagnose and automatically recover from 401, 403, 429, and 5xx errors is critical for maintaining high availability in your GenAI applications.

This guide breaks down the core categories of OpenAI API errors, how to diagnose them via the CLI and application logs, and the exact architectural patterns required to fix them.

Category 1: Rate Limits and Quotas (HTTP 429)

The HTTP 429 Too Many Requests status code is the most frequent error developers encounter. However, OpenAI overloads the 429 status code to mean two completely different things. You must inspect the JSON response body to determine the root cause.

Symptom A: "Rate limit reached for requests"
{
  "error": {
    "message": "Rate limit reached for requests. Limit: 500 / min. Current: 500 / min.",
    "type": "requests",
    "param": null,
    "code": "rate_limit_exceeded"
  }
}

Root Cause: You have exceeded the Requests Per Minute (RPM) or Tokens Per Minute (TPM) limit for your specific account tier (e.g., Tier 1, Tier 2). Every model has its own distinct RPM and TPM limits.
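Because RPM and TPM are enforced independently, your effective throughput is whichever limit you hit first. A quick back-of-envelope sketch (the limit values and the `effective_rpm` helper are illustrative, not your account's actual numbers):

```python
# Effective requests/minute is capped by BOTH limits: you run out of
# either request slots or token budget, whichever comes first.
def effective_rpm(rpm_limit: int, tpm_limit: int, avg_tokens_per_request: int) -> int:
    token_capped = tpm_limit // avg_tokens_per_request
    return min(rpm_limit, token_capped)

# Example: hypothetical limits of 500 RPM and 30,000 TPM with ~1,000-token
# requests leave you token-bound at 30 requests/minute, not 500.
print(effective_rpm(500, 30_000, 1_000))  # 30
```

This is why teams sometimes see 429s long before their request counter approaches the advertised RPM ceiling: large prompts exhaust the token budget first.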

Symptom B: "You exceeded your current quota"
{
  "error": {
    "message": "You exceeded your current quota, please check your plan and billing details.",
    "type": "insufficient_quota",
    "param": null,
    "code": "insufficient_quota"
  }
}

Root Cause: This is not a throughput issue; it is a billing issue. Your pre-paid credit balance has hit zero, or you have reached the hard monthly spend limit defined in your organization's billing settings.
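Since the two symptoms demand opposite responses (retry vs. fix billing), it is worth branching on the parsed response body. A minimal sketch; the `classify_429` helper is hypothetical:

```python
import json

# Hypothetical helper: branch on the "code" field of a parsed 429 body
# to separate throughput throttling from billing exhaustion.
def classify_429(body: dict) -> str:
    code = body.get("error", {}).get("code", "")
    if code == "insufficient_quota":
        return "quota"      # billing problem: retrying will never help
    if code == "rate_limit_exceeded":
        return "throttle"   # throughput problem: back off and retry
    return "unknown"

if __name__ == "__main__":
    raw = '{"error": {"message": "You exceeded your current quota...", "code": "insufficient_quota"}}'
    print(classify_429(json.loads(raw)))  # quota
```

Routing `quota` errors to an alert rather than a retry loop prevents your workers from hammering an account that has no credit left.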

Category 2: Authentication Failures (HTTP 401 & 403)

HTTP 401 Unauthorized and 403 Forbidden errors mean the OpenAI edge gateway rejected your request before it even reached the inference servers.

Symptom: "Incorrect API key provided"
{
  "error": {
    "message": "Incorrect API key provided: sk-proj-***************************************.",
    "type": "invalid_request_error",
    "param": null,
    "code": "invalid_api_key"
  }
}

Root Cause:

  1. Environment Variables: The OPENAI_API_KEY environment variable is not being loaded correctly in your production environment (e.g., missing in Docker, Kubernetes Secrets, or Vercel config).
  2. Revocation: The key was accidentally leaked to GitHub and automatically revoked by OpenAI's security scanners.
  3. Project Scoping: You are using a Project-scoped API key but attempting to access resources (like specific Fine-Tuned models or Assistants) that belong to a different project or require a legacy Organization-scoped key.

Category 3: Server Errors and Timeouts (HTTP 500, 502, 503)

These errors indicate that the problem lies on OpenAI's infrastructure, not your client.

  • 500 Internal Server Error: A general failure on the OpenAI worker processing your request.
  • 502 Bad Gateway / 503 Service Unavailable: The inference cluster is overloaded and dropping connections, or OpenAI is experiencing an active incident.
  • Timeouts (openai.APITimeoutError): The client connection remained open longer than the configured timeout limit without receiving the first byte of the response. This frequently happens with gpt-4 or gpt-4-turbo when using massive context windows (e.g., 100k+ tokens) without streaming enabled.

Step 1: Diagnose the Error

Before changing code, you must isolate whether the issue is network-related, auth-related, or payload-related.

Use this raw curl command to bypass your application logic and test the endpoint directly. Replace $OPENAI_API_KEY with your actual key. This command requests HTTP headers (-i), which contain critical rate limit diagnostic data.

curl -i https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Diagnostic test."}]
  }'

Analyze the Response Headers: Look for the x-ratelimit-* headers in the output. These tell you exactly how close you are to your limits:

  • x-ratelimit-limit-requests: Your RPM limit.
  • x-ratelimit-remaining-requests: How many requests you have left this minute.
  • x-ratelimit-reset-requests: Time until your RPM resets (e.g., 1s or 60ms).

If the curl succeeds but your app fails, the issue is within your application's network configuration (e.g., proxy blocking, bad env var injection).
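The same header inspection can be automated in code. A sketch assuming you already have the response headers as a plain dict (e.g. from a raw HTTP client); the `headroom` helper is hypothetical:

```python
# Hypothetical helper: summarize remaining request headroom from the
# x-ratelimit-* headers OpenAI returns on every successful response.
def headroom(headers: dict) -> dict:
    return {
        "limit": int(headers["x-ratelimit-limit-requests"]),
        "remaining": int(headers["x-ratelimit-remaining-requests"]),
        "reset_in": headers["x-ratelimit-reset-requests"],  # e.g. "1s" or "60ms"
    }

# Example with captured header values:
h = headroom({
    "x-ratelimit-limit-requests": "500",
    "x-ratelimit-remaining-requests": "12",
    "x-ratelimit-reset-requests": "6s",
})
print(h)  # {'limit': 500, 'remaining': 12, 'reset_in': '6s'}
```

Logging this summary on every response gives you an early-warning signal well before the first 429 arrives.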


Step 2: Implement the Fix

Fix 1: Exponential Backoff for 429 and 5xx Errors

You cannot prevent 429s or 5xx errors entirely. You must build your application to expect them. The industry standard is to implement exponential backoff with jitter.

This means when a request fails, you wait a short time (e.g., 1 second) and try again. If it fails again, you wait longer (e.g., 2 seconds, then 4, then 8). "Jitter" adds a random amount of milliseconds to the wait time so that if thousands of your background jobs fail simultaneously, they don't all retry at the exact same microsecond and cause another 429.
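The wait schedule described above can be sketched as a pure function (the base delay, cap, and jitter range here are illustrative):

```python
import random

# Delay before the n-th retry: the base doubles each attempt, is capped
# at max_delay, and gains up to one second of random jitter so that
# parallel workers that failed together do not retry together.
def backoff_delay(attempt: int, base: float = 1.0, max_delay: float = 60.0) -> float:
    exponential = min(base * (2 ** attempt), max_delay)
    return exponential + random.uniform(0, 1)

# Retries 0..3 wait roughly 1, 2, 4, and 8 seconds (plus jitter).
```

In practice you rarely hand-roll this; it is shown here only to make the schedule concrete.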

In Python, the tenacity library is the standard way to handle this (see the Code Block section for the exact implementation). The official OpenAI SDKs for Python and Node.js also have built-in retries, but configuring a custom tenacity wrapper gives you finer control over logging and fallback mechanisms.

Fix 2: Check Quota and Pre-Fund Your Account

If the error is insufficient_quota, code will not save you. OpenAI shifted from a post-paid model to a pre-paid model for most lower-tier accounts.

  1. Log into the OpenAI Platform dashboard.
  2. Navigate to Settings > Billing.
  3. Add a payment method and add at least $5 to your credit balance.
  4. Critical: Wait 5 to 10 minutes. Quota updates are not strictly real-time and have a propagation delay across OpenAI's API gateways.

Fix 3: Handle Timeouts by Enabling Streaming

If you are experiencing frequent timeouts (the client drops the connection before receiving a 200 OK), the issue is usually time-to-first-token (TTFT). Generating a 4,000-token response from GPT-4 can take 30-60 seconds. Many load balancers (like AWS ALB or Nginx) will aggressively cut off connections after 30 seconds of silence.

To fix this, set stream=True in your API request.

{
  "model": "gpt-4-turbo",
  "messages": [{"role": "user", "content": "Write a massive essay."}],
  "stream": true
}

By streaming, OpenAI immediately returns a chunked HTTP response and sends tokens one by one. This keeps the HTTP connection active and prevents your infrastructure from timing out the request.
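In the Python SDK the same request is made with `stream=True`, and the reply arrives as chunked deltas. The pure `assemble` helper below is hypothetical, shown to make the concatenation step concrete:

```python
# Concatenate streamed content deltas into the final reply. Each delta is
# the (possibly None) text fragment carried by one streamed chunk.
def assemble(deltas) -> str:
    return "".join(d for d in deltas if d)

# With the real SDK, each chunk exposes chunk.choices[0].delta.content:
#
#   stream = client.chat.completions.create(
#       model="gpt-4-turbo",
#       messages=[{"role": "user", "content": "Write a massive essay."}],
#       stream=True,
#   )
#   text = assemble(chunk.choices[0].delta.content for chunk in stream)

print(assemble(["Hel", None, "lo", ", world"]))  # Hello, world
```

Note that deltas can be `None` (e.g. the final chunk), so the filter before joining matters.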

Fix 4: Validate Authentication and Scopes

If you are receiving 401/403 errors in production but not locally:

  1. Verify your CI/CD pipeline is actually injecting the secret. Run a script that prints the length of the API key (never print the key itself) to ensure it's populated.
  2. Ensure you aren't passing a trailing newline character (\n) in your environment variable.
  3. If using an Organization or Project scoped key, ensure you are passing the correct OpenAI-Organization and OpenAI-Project headers in your client instantiation.
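The checks above can be scripted without ever logging the secret itself. A sketch; `key_diagnostics` is a hypothetical helper:

```python
import os

# Report on the API key's presence and shape without printing its value.
def key_diagnostics(value):
    if value is None:
        return {"present": False}
    return {
        "present": True,
        "length": len(value),
        "has_trailing_whitespace": value != value.strip(),  # catches stray \n
        "looks_like_project_key": value.startswith("sk-proj-"),
    }

if __name__ == "__main__":
    print(key_diagnostics(os.environ.get("OPENAI_API_KEY")))
```

Running this inside the production container (not just locally) is what actually confirms the secret survived your CI/CD injection.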

Code Block: Exponential Backoff with Tenacity (Python)

import openai
from tenacity import retry, stop_after_attempt, wait_random_exponential, retry_if_exception_type

# Initialize the client. Relies on OPENAI_API_KEY environment variable.
client = openai.OpenAI()

# Configure exponential backoff using tenacity
# This will retry on Rate Limits (429) and Server Errors (5xx)
# It waits between 1 and 60 seconds, stopping after 6 attempts.
@retry(
    wait=wait_random_exponential(min=1, max=60),
    stop=stop_after_attempt(6),
    retry=retry_if_exception_type((
        openai.RateLimitError,
        openai.APIConnectionError,
        openai.InternalServerError,
        openai.APITimeoutError
    ))
)
def generate_completion_with_backoff(**kwargs):
    try:
        response = client.chat.completions.create(**kwargs)
        return response
    except Exception as e:
        print(f"API Error encountered. Retrying... Error details: {e}")
        raise # Re-raise to trigger tenacity retry logic

# Usage example
if __name__ == "__main__":
    try:
        print("Sending request to OpenAI...")
        result = generate_completion_with_backoff(
            model="gpt-4-turbo",
            messages=[{"role": "user", "content": "Explain exponential backoff."}],
            timeout=30.0 # Client-side timeout to fail fast on hung connections
        )
        print("Success:")
        print(result.choices[0].message.content)
    except Exception as final_error:
        print(f"Request permanently failed after exhausting all retries: {final_error}")

Error Medic Editorial

The Error Medic Editorial team consists of senior Site Reliability Engineers and DevOps practitioners dedicated to resolving complex production outages and API integrations.
