Troubleshooting OpenAI API Rate Limits (429) and Connection Errors
Resolve OpenAI API 429 rate limits, 401/403 auth issues, and 5xx server errors with backoff strategies, quota management, and timeout configurations.
- HTTP 429 (Too Many Requests) is the most common error, caused by hitting Requests Per Minute (RPM) or Tokens Per Minute (TPM) limits, or by exhausting your prepaid billing quota.
- HTTP 401 and 403 errors are authentication and authorization failures, usually stemming from invalid API keys, revoked access, or unsupported regions.
- HTTP 500, 502, and 503 are server-side errors on OpenAI's infrastructure, requiring retry logic or waiting for service restoration.
- Implement exponential backoff with jitter in your application code to gracefully handle transient network timeouts and rate limiting.
| Method | When to Use | Time | Risk |
|---|---|---|---|
| Implement Exponential Backoff | For transient 429s, 5xx errors, and timeouts | 15-30 mins | Low |
| Add Prepaid Credits | When receiving 429 'insufficient_quota' errors | 5 mins | Low |
| Upgrade Usage Tier | When consistently hitting RPM/TPM limits on valid usage | Varies | Medium (Cost) |
| Rotate API Keys | For persistent 401 Unauthorized errors | 5 mins | Low |
| Adjust Request Timeouts | When facing frequent ReadTimeout errors on long completions | 5 mins | Low |
Understanding OpenAI API Errors
When building applications on top of the OpenAI API, you will inevitably encounter HTTP errors. Because the API serves millions of developers globally and processes compute-intensive requests, rate limiting and transient connection drops are standard operational realities.
Understanding the precise HTTP status code and the accompanying error message in the JSON payload is the first critical step to resolving the issue. Below, we break down the most common error codes: 401, 403, 429, 500, 502, 503, and network timeouts.
Authentication and Authorization: 401 and 403
HTTP 401: Unauthorized
This error indicates that the OpenAI server cannot verify your identity. The exact error message usually looks like:
{ "error": { "message": "Incorrect API key provided: sk-.... You can find your API key at https://platform.openai.com/account/api-keys.", "type": "invalid_request_error", "param": null, "code": "invalid_api_key" } }
Root Causes:
- You are using an invalid or malformed API key.
- The API key was deleted or revoked.
- You have hardcoded the key and accidentally truncated it.
- There is a typo in your `Authorization: Bearer <TOKEN>` header.
Resolution: Verify your API key in the OpenAI platform dashboard. If you suspect the key is compromised or invalid, generate a new one and update your environment variables. Ensure no extraneous whitespace is appended to the environment variable.
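As a quick sanity check before constructing the client, you can validate the key you loaded from the environment. This sketch uses only the standard library; `OPENAI_API_KEY` is the variable the official SDK reads by default, and the format check is illustrative, not an official validation rule:

```python
import os

# Read the key from the environment; the official SDK picks up OPENAI_API_KEY
# automatically, but checking it explicitly surfaces problems early.
api_key = os.environ.get("OPENAI_API_KEY", "").strip()  # strip stray whitespace/newlines

def looks_valid(key: str) -> bool:
    # A light format check: OpenAI keys start with "sk-" and contain no spaces.
    return key.startswith("sk-") and " " not in key

if not looks_valid(api_key):
    print("OPENAI_API_KEY is missing or malformed; regenerate it in the dashboard.")
```

The `.strip()` call guards against the trailing-newline problem mentioned above, which is a common cause of 401s when keys are pasted into `.env` files.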
HTTP 403: Forbidden
A 403 error means your authentication is valid, but you are not allowed to access the requested resource.
Root Causes:
- You are accessing the API from an unsupported country or region.
- Your account has been flagged or suspended due to terms of service violations.
- You are trying to access a model (like GPT-4 in the past, or specific fine-tuned models) that your account does not have access to.
Resolution: Check the OpenAI supported countries list. If you are using a VPN, try disabling it. Review your account standing in the dashboard.
The Infamous HTTP 429: Too Many Requests
The 429 error is the most frequently encountered issue. However, 'Too Many Requests' is an overloaded term in the OpenAI ecosystem. It can mean three completely different things, and you must read the error message to know which one applies.
Scenario A: Rate Limits (RPM/TPM/RPD)
Rate limit reached for requests. Limit: 3 / min. Please try again in 20s.
OpenAI enforces limits on Requests Per Minute (RPM), Tokens Per Minute (TPM), and Requests Per Day (RPD). These limits vary drastically based on your usage tier (Tier 1 through Tier 5). Free tier users have severe restrictions (e.g., 3 RPM).
How to Fix:
- Implement Retries: Use an exponential backoff algorithm. When a 429 is hit, wait a short period (e.g., 2 seconds), and retry. If it fails again, wait 4 seconds, then 8, up to a maximum threshold.
- Batching: If you are sending many short prompts, batch them into fewer requests to conserve RPM.
- Max Tokens: Lower the `max_tokens` parameter if you are hitting TPM limits, as OpenAI counts the requested max tokens against your limit, not just the generated tokens.
- Upgrade Tier: Spend more money. Moving from Tier 1 to Tier 2 by depositing $50 drastically increases limits.
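The retry advice above can be sketched as a small standalone helper. This is a minimal illustration, not the official SDK's built-in retry mechanism; `call` stands in for any zero-argument function that raises on a transient failure, and the delay values are examples:

```python
import random
import time

def with_backoff(call, max_retries=5, base_delay=2.0, max_delay=60.0):
    """Retry `call` with exponential backoff plus jitter.

    `call` is any zero-argument function that raises an exception on a
    transient failure (e.g. a 429 from the OpenAI client).
    """
    for attempt in range(max_retries):
        try:
            return call()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error to the caller
            # Double the wait each attempt (2s, 4s, 8s, ...) capped at max_delay,
            # with random jitter so many clients don't retry in lockstep.
            delay = min(base_delay * (2 ** attempt), max_delay)
            time.sleep(delay * random.uniform(0.5, 1.5))
```

In production you would catch only the retryable exception types (e.g. `openai.RateLimitError`) rather than bare `Exception`, so that auth and validation errors fail fast instead of burning retries.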
Scenario B: Insufficient Quota
You exceeded your current quota, please check your plan and billing details.
This is a 429 error, but it has nothing to do with how fast you are sending requests. It means your prepaid balance is $0.00.
How to Fix: Go to the OpenAI Billing dashboard and add credits. Note that API access is prepaid; having a ChatGPT Plus subscription does not give you API credits.
Scenario C: Engine Overload
The engine is currently overloaded, please try again later.
This is a temporary issue where OpenAI's specific compute cluster for the requested model is at capacity.
How to Fix: Implement exponential backoff. Wait and retry.
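Because all three scenarios arrive as the same HTTP 429 status, a handler has to inspect the error body to decide whether retrying can help at all. A sketch of that dispatch, matching against the `code` field and message strings quoted in the scenarios above (the returned strategy names are made up for illustration):

```python
def classify_429(code, message):
    """Map a 429 error body to a handling strategy.

    `code` and `message` are the fields from the error JSON payload; the
    strings matched here are the ones quoted in Scenarios A-C above.
    """
    if code == "insufficient_quota" or "exceeded your current quota" in message:
        return "add_credits"          # Scenario B: retrying will never help
    if "overloaded" in message:
        return "retry_with_backoff"   # Scenario C: transient capacity issue
    return "retry_with_backoff"       # Scenario A: rate limit; back off and retry
```

The key insight is that `insufficient_quota` is the one 429 where retry logic is actively harmful: no amount of waiting restores a $0.00 balance.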
Server Errors: 500, 502, 503, and Timeouts
HTTP 500 (Internal Server Error) & 503 (Service Unavailable)
These indicate a systemic issue on OpenAI's side.
Resolution: You cannot fix these. You must gracefully catch them in your code and alert your team if they persist. Always check status.openai.com for active incidents.
HTTP 502 (Bad Gateway) & Timeouts
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='api.openai.com', port=443): Read timed out.
A timeout occurs when the client gives up waiting for the server. GPT-4 and complex o1 model requests can take 30-60+ seconds to generate a response, easily exceeding default HTTP client timeouts (which are often 10-30 seconds).
How to Fix Timeouts:
- Increase Client Timeout: Explicitly set the timeout in your HTTP client (e.g., `timeout=60` in Python's `requests` or the official SDK).
- Use Streaming: Set `stream=True` in your API request. Instead of waiting for the entire response to generate before receiving the payload, you receive chunks of text as they are computed. This keeps the HTTP connection active and prevents intermediate proxies (like Nginx or AWS API Gateway) from dropping the connection due to idle timeouts.
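A minimal sketch combining both fixes with the official Python SDK. The model name and 120-second timeout are illustrative, and `collect_stream` is a hypothetical helper added here to show how the incremental chunks are assembled:

```python
def collect_stream(stream):
    """Concatenate the text deltas from a chat-completions stream.

    Works with any iterable of chunk objects shaped like the SDK's streaming
    chunks: each has choices[0].delta.content, which may be None.
    """
    parts = []
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            parts.append(delta)
    return "".join(parts)

if __name__ == "__main__":
    import openai

    # A generous timeout plus streaming keeps long generations from tripping
    # client-side or proxy idle timeouts.
    client = openai.OpenAI(timeout=120.0)  # reads OPENAI_API_KEY from the environment
    stream = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Explain HTTP streaming."}],
        stream=True,  # receive chunks as they are generated
    )
    print(collect_stream(stream))
```

Because the first chunk typically arrives within a second or two, streaming also gives you an early signal that the request succeeded, instead of a long silence followed by a timeout.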
Complete Code Example: Handling Errors with Retries
```python
import openai
from tenacity import retry, wait_random_exponential, stop_after_attempt, retry_if_exception_type

# Configure the official OpenAI Python client
client = openai.OpenAI(api_key="sk-YOUR_API_KEY", timeout=60.0)

# Define the retry logic using Tenacity.
# This will retry up to 6 times, waiting exponentially longer between attempts (up to 60s).
# It ONLY retries on RateLimitError and APIConnectionError.
@retry(
    wait=wait_random_exponential(min=1, max=60),
    stop=stop_after_attempt(6),
    retry=(retry_if_exception_type(openai.RateLimitError) | retry_if_exception_type(openai.APIConnectionError)),
)
def create_chat_completion_with_backoff(**kwargs):
    try:
        return client.chat.completions.create(**kwargs)
    except openai.AuthenticationError as e:
        print(f"Fatal Auth Error (401): {e}")
        raise  # Do not retry on auth errors
    except openai.BadRequestError as e:
        print(f"Fatal Bad Request (400): {e}")
        raise  # Do not retry on malformed payloads

# Example usage
if __name__ == "__main__":
    try:
        response = create_chat_completion_with_backoff(
            model="gpt-4o",
            messages=[{"role": "user", "content": "Explain exponential backoff."}],
            max_tokens=150,
        )
        print(response.choices[0].message.content)
    except Exception as final_error:
        print(f"Operation failed after retries: {final_error}")
```

Error Medic Editorial
Error Medic Editorial is a team of veteran Site Reliability Engineers and DevOps practitioners dedicated to demystifying complex cloud, API, and infrastructure failures.