Error Medic

Troubleshooting OpenAI API Errors: Rate Limits (429), Authentication (401/403), and Timeouts

Master OpenAI API troubleshooting. Resolve 429 Rate Limit, 401 Unauthorized, 50x server errors, and timeouts with our comprehensive SRE guide and backoff code.

Key Takeaways
  • Implement Exponential Backoff with Jitter to gracefully handle 429 Too Many Requests errors without overwhelming the API.
  • Verify API keys, organization IDs, and active billing status to resolve 401 Unauthorized and 403 Forbidden errors.
  • Inspect HTTP response headers (x-ratelimit-*) to dynamically adjust request pacing based on real-time token and request limits.
  • Distinguish between client-side timeouts and server-side 502/503 errors to apply the correct retry or circuit-breaker logic.
OpenAI API Error Handling Strategies
HTTP Status | Root Cause | Action Required | Retry Strategy
401 Unauthorized | Invalid or missing API key | Verify credentials and environment variables | Do Not Retry
403 Forbidden | Blocked region or flagged account | Check dashboard for billing or policy issues | Do Not Retry
429 Too Many Requests | RPM/TPM limit exceeded or empty quota | Read x-ratelimit headers, backoff, check billing | Exponential Backoff
500 / 502 / 503 | OpenAI internal server or gateway issue | Check status.openai.com, monitor error rates | Backoff + Circuit Breaker
Timeout (No Status) | Network latency or extreme model load | Increase client timeout, use streaming | Immediate Retry (Once) -> Backoff

Understanding OpenAI API Errors

When building production-grade applications on top of Large Language Models (LLMs), encountering API errors is not a matter of if, but when. The OpenAI API, while highly scalable, imposes strict rate limits and occasionally suffers from high-latency spikes or gateway errors due to the immense compute required for inference. As a DevOps or Site Reliability Engineer (SRE), your goal is to build resilient systems that gracefully handle these failures without degrading the user experience.

This guide explores the most common OpenAI API errors—specifically focusing on Rate Limits (429), Authentication issues (401/403), Server Errors (500, 502, 503), and Timeouts—and provides actionable, code-driven solutions to mitigate them.


Deep Dive into Specific Error Codes

1. HTTP 429: Rate Limit Reached / Quota Exceeded

The Symptom: Your application suddenly stops processing requests, and the API returns an HTTP 429 status code. The error message often looks like this:

{
  "error": {
    "message": "Rate limit reached for default-gpt-4 in organization org-xxxxx on tokens per min (TPM). Limit: 10000. Please try again in 6ms.",
    "type": "requests",
    "param": null,
    "code": "rate_limit_exceeded"
  }
}

The Root Cause: OpenAI enforces rate limits based on your account's Tier. These limits are calculated across multiple dimensions:

  • RPM (Requests Per Minute): The sheer volume of API calls.
  • RPD (Requests Per Day): A daily cap to prevent runaway usage.
  • TPM (Tokens Per Minute): The total number of tokens (prompt + expected completion) processed.
  • Quota Exceeded: You have hit your monthly hard billing limit. (Error code: insufficient_quota).
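
To see how these dimensions interact, here is a minimal client-side pacing sketch. The limit values below are illustrative only, not your account's actual tier limits:

```python
def pacing_delay(tokens_this_request: int, rpm_limit: int, tpm_limit: int) -> float:
    """Minimum seconds a single worker should wait between requests to
    stay under both the RPM and TPM budgets."""
    per_request_floor = 60.0 / rpm_limit                       # RPM constraint
    per_token_floor = 60.0 * tokens_this_request / tpm_limit   # TPM constraint
    return max(per_request_floor, per_token_floor)

# A 2,000-token call against a 10,000 TPM / 500 RPM budget is TPM-bound:
print(pacing_delay(2000, rpm_limit=500, tpm_limit=10_000))  # 12.0 seconds
```

For large prompts, the TPM constraint usually binds long before RPM does, which is why header inspection (covered below) matters.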

2. HTTP 401 & 403: Authentication and Authorization

The Symptom: Requests fail immediately with a 401 or 403 status code.

  • 401 Unauthorized: "Incorrect API key provided: sk-xxxx..."
  • 403 Forbidden: You might see messages related to accessing a model you don't have permission for, or accessing the API from an unsupported country.

The Root Cause:

  • 401: The API key is missing, malformed, revoked, or belongs to a different organization. Often caused by misconfigured .env files or CI/CD secrets.
  • 403: The account has been flagged for policy violations, you are requesting a specialized model (like fine-tuned models) without the correct Org ID header, or you lack the necessary RBAC permissions in the new OpenAI Projects structure.

3. HTTP 500, 502, 503: Server-Side Anomalies

The Symptom: The API returns a 5xx HTTP status code.

  • 500 Internal Server Error: "The server had an error while processing your request. Sorry about that!"
  • 502 Bad Gateway / 503 Service Unavailable: "That model is currently overloaded with other requests."

The Root Cause: These are entirely on OpenAI's side. 502s and 503s typically occur during massive usage spikes when the routing layer or GPU inference workers cannot accept new connections. 500s indicate an unhandled exception within OpenAI's internal microservices.

4. Client and Server Timeouts

The Symptom: Your HTTP client throws a TimeoutException, ReadTimeout, or the connection drops completely without returning a status code.

The Root Cause: Generative AI takes time. Generating a 4000-token response can take 30-60 seconds or more. If your HTTP client (e.g., requests in Python, or axios in Node.js) has a default timeout of 10 or 30 seconds, it will terminate the connection before OpenAI finishes generating the response.


Step 1: Diagnose

Before implementing a fix, you must instrument your application to log the exact failure mode. OpenAI provides crucial diagnostic data in its HTTP response headers.

Run a verbose cURL command to inspect the headers:

curl https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Hello!"}]
  }' -v

Look specifically for the x-ratelimit-* headers in the output:

  • x-ratelimit-limit-requests: Maximum requests per minute.
  • x-ratelimit-remaining-requests: Requests left in the current window.
  • x-ratelimit-reset-requests: Time until the request limit resets.
  • x-ratelimit-limit-tokens: Maximum tokens per minute.
  • x-ratelimit-remaining-tokens: Tokens left in the current window.
  • x-ratelimit-reset-tokens: Time until the token limit resets.

If you see x-ratelimit-remaining-tokens: 0, you know you are hitting a TPM limit, not an RPM limit.
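
The reset headers arrive as human-readable durations (e.g. `6ms`, `1m30s`), so a small parser is handy. The sketch below is one way to read them; the header fetch via the v1 Python SDK's `with_raw_response` accessor is shown in comments because it needs a live key:

```python
import re

def parse_reset(value: str) -> float:
    """Convert an x-ratelimit-reset-* value like '6ms', '1s', or '1m30s'
    into a number of seconds."""
    total = 0.0
    for amount, unit in re.findall(r"([\d.]+)(ms|s|m|h)", value):
        total += float(amount) * {"ms": 0.001, "s": 1, "m": 60, "h": 3600}[unit]
    return total

print(parse_reset("6ms"))    # ~0.006
print(parse_reset("1m30s"))  # 90.0

# Fetching the headers with the official SDK (requires a valid key):
# from openai import OpenAI
# raw = OpenAI().chat.completions.with_raw_response.create(
#     model="gpt-3.5-turbo",
#     messages=[{"role": "user", "content": "Hello!"}],
# )
# print(raw.headers.get("x-ratelimit-remaining-tokens"))
# completion = raw.parse()  # the usual ChatCompletion object
```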

Diagnosing 401/403

Ensure your application is passing the correct headers. If you use multiple organizations, you must specify the OpenAI-Organization header, or the request might default to an org that lacks billing details.
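
A minimal sketch of pinning the organization explicitly (`org-example123` is a placeholder ID): for raw HTTP calls it is a header, and with the official Python SDK it is a constructor argument.

```python
# Headers for a raw HTTPS call, pinning the organization explicitly:
headers = {
    "Authorization": "Bearer sk-your-key-here",  # placeholder key
    "OpenAI-Organization": "org-example123",
    "Content-Type": "application/json",
}

# With the official SDK, the same pin is a constructor argument:
# from openai import OpenAI
# client = OpenAI(organization="org-example123")

print(headers["OpenAI-Organization"])  # org-example123
```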


Step 2: Fix

1. Handling 429s: Exponential Backoff with Jitter

The most critical resilience pattern for the OpenAI API is Exponential Backoff with Jitter. If you simply retry immediately upon receiving a 429, you will exacerbate the rate limit and potentially get your IP temporarily banned.

Instead, you should wait a base amount of time, doubling the wait time for each subsequent failure, and adding a random "jitter" (a few milliseconds to seconds) to prevent the "thundering herd" problem where multiple concurrent threads retry at the exact same millisecond.

Note: Do not retry if the 429 error indicates insufficient_quota. This means your billing account is empty, and retries will never succeed until a human adds a credit card.
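
The pattern can be sketched in a few lines of plain Python (this is the "full jitter" variant; `call` stands in for any function that raises on a retriable error):

```python
import random
import time

def backoff_delays(base: float = 1.0, cap: float = 60.0, attempts: int = 5):
    """Yield sleep durations: base * 2^n, capped, with full jitter."""
    for attempt in range(attempts):
        exp = min(cap, base * (2 ** attempt))
        yield random.uniform(0, exp)  # jitter spreads concurrent retries out

def with_backoff(call):
    """Retry `call` on any exception, sleeping between attempts."""
    last_exc = None
    for delay in backoff_delays():
        try:
            return call()
        except Exception as exc:  # in practice: catch RateLimitError only
            last_exc = exc
            time.sleep(delay)
    raise last_exc
```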

2. Handling Timeouts: Connection vs. Read Timeouts

Configure your HTTP clients to have distinct connection and read timeouts. Connecting to OpenAI should take less than 3 seconds. Reading the response can take minutes.

If using the official Python SDK, you can configure this explicitly:

from openai import OpenAI
import httpx

client = OpenAI(
    timeout=httpx.Timeout(60.0, read=120.0, connect=5.0)
)

Furthermore, consider using Streaming (stream=True). Streaming returns chunks of the response as they are generated. This prevents read timeouts because data is constantly flowing over the TCP connection, keeping it alive, and provides a much better UX for end-users who don't have to wait 30 seconds to see the first word.
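
As a sketch, the accumulator below works on any iterable of text deltas; the SDK wiring (which needs a live key) is shown in comments:

```python
def drain_stream(stream) -> str:
    """Accumulate streamed deltas into the full message while printing
    each piece as it arrives."""
    parts = []
    for delta in stream:
        if delta:                      # the final chunk's delta is often None
            print(delta, end="", flush=True)
            parts.append(delta)
    print()
    return "".join(parts)

# With the official SDK, deltas come from the chunks of a stream=True call:
# from openai import OpenAI
# client = OpenAI()
# stream = client.chat.completions.create(
#     model="gpt-3.5-turbo",
#     messages=[{"role": "user", "content": "Hello!"}],
#     stream=True,
# )
# full_text = drain_stream(chunk.choices[0].delta.content for chunk in stream)
```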

3. Handling 5xx Errors: Circuit Breakers

For 500, 502, and 503 errors, standard backoff applies, but you should also implement a Circuit Breaker pattern. If OpenAI returns a 503 five times in a row, the circuit breaker "opens" and immediately fails subsequent requests for a set period (e.g., 5 minutes) without even hitting the OpenAI API. This saves your application from hanging and allows you to fail over to a backup model (e.g., Anthropic Claude, Azure OpenAI, or a local OSS model) or gracefully inform the user that the AI provider is down.
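
A minimal, single-threaded circuit breaker sketch (the threshold and cooldown values mirror the example above; production code would add locking and metrics):

```python
import time

class CircuitBreaker:
    """Open after `threshold` consecutive failures; reject calls for
    `cooldown` seconds, then allow a single trial request (half-open)."""

    def __init__(self, threshold: int = 5, cooldown: float = 300.0):
        self.threshold = threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        if time.monotonic() - self.opened_at >= self.cooldown:
            self.opened_at = None      # half-open: let one trial through
            self.failures = 0
            return True
        return False                   # fail fast without hitting the API

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = time.monotonic()

breaker = CircuitBreaker(threshold=5, cooldown=300)
for _ in range(5):
    breaker.record_failure()           # five 503s in a row
print(breaker.allow())  # False -> skip OpenAI, use the fallback provider
```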

4. Resolving 401/403 Errors

These are structural failures.

  1. Check .env loading: Ensure your environment variables are actually being injected into the container (check Docker/Kubernetes secrets).
  2. Verify Project Keys: OpenAI recently introduced "Project API Keys". Ensure the key you generated has RBAC permissions to access the specific model you are calling.
  3. Fund the Account: OpenAI requires pre-funded accounts for many API tiers. Go to the Billing dashboard and ensure you have a positive credit balance.
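
A fail-fast startup check catches most misconfigured keys before the first request is made (the `sk-` prefix test is only a heuristic; project-scoped keys like `sk-proj-...` also match it):

```python
import os
import sys

def validate_openai_env() -> str:
    """Exit immediately at startup if the key is missing or malformed,
    rather than failing with a 401 on the first API call."""
    key = os.environ.get("OPENAI_API_KEY", "")
    if not key:
        sys.exit("OPENAI_API_KEY is not set -- check your .env / secrets mount.")
    if not key.startswith("sk-"):
        sys.exit("OPENAI_API_KEY does not look like an OpenAI key ('sk-' prefix expected).")
    return key
```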

Complete Example: Exponential Backoff with Tenacity

import os
import openai
from openai import OpenAI
from tenacity import (
    retry,
    stop_after_attempt,
    wait_random_exponential,
    retry_if_exception_type
)

client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

# Retry only on specific exceptions (Rate limits and connection issues)
# We use exponential backoff from 1 to 60 seconds with added jitter
@retry(
    wait=wait_random_exponential(multiplier=1, max=60),
    stop=stop_after_attempt(5),
    retry=retry_if_exception_type((openai.RateLimitError, openai.APIConnectionError, openai.InternalServerError))
)
def chat_completion_with_backoff(**kwargs):
    try:
        response = client.chat.completions.create(**kwargs)
        return response
    except openai.RateLimitError as e:
        # A hard quota failure (insufficient_quota) will never succeed on
        # retry, so surface it as an exception type tenacity does not retry
        if "insufficient_quota" in str(e):
            print("CRITICAL: Billing quota exhausted. Manual intervention required.")
            raise RuntimeError("insufficient_quota: add credits before retrying") from e
        print(f"Rate limit hit. Retrying... Error: {e}")
        raise e
    except openai.AuthenticationError as e:
        # Never retry 401/403 errors
        print(f"FATAL: Authentication failed. Check API key. Error: {e}")
        raise e

# Usage example
if __name__ == "__main__":
    try:
        res = chat_completion_with_backoff(
            model="gpt-4",
            messages=[{"role": "user", "content": "Explain exponential backoff."}]
        )
        print(res.choices[0].message.content)
    except Exception as final_error:
        print(f"Failed after max retries or fatal error: {final_error}")

Error Medic Editorial

Expert DevOps and SRE team specializing in API integration, system reliability, and scalable infrastructure. We help teams build resilient AI applications.
