Troubleshooting OpenAI API Rate Limits (429) and Connection Errors
Resolve OpenAI API 429 rate limits, 401/403 auth issues, and 5xx server errors with backoff strategies, quota management, and timeout configurations.
- HTTP 429 (Too Many Requests) is the most common error, caused by hitting Requests Per Minute (RPM) or Tokens Per Minute (TPM) limits, or by exhausting your prepaid billing quota.
- HTTP 401 and 403 errors are authentication and authorization failures, usually stemming from invalid API keys, revoked access, or unsupported regions.
- HTTP 500, 502, and 503 are server-side errors on OpenAI's infrastructure, requiring retry logic or waiting for service restoration.
- Implement exponential backoff with jitter in your application code to gracefully handle transient network timeouts and rate limiting.
| Method | When to Use | Time | Risk |
|---|---|---|---|
| Implement Exponential Backoff | For transient 429s, 5xx errors, and timeouts | 15-30 mins | Low |
| Add Prepaid Credits | When receiving 429 'insufficient_quota' errors | 5 mins | Low |
| Upgrade Usage Tier | When consistently hitting RPM/TPM limits on valid usage | Varies | Medium (Cost) |
| Rotate API Keys | For persistent 401 Unauthorized errors | 5 mins | Low |
| Adjust Request Timeouts | When facing frequent ReadTimeout errors on long completions | 5 mins | Low |
Understanding OpenAI API Errors
When building applications on top of the OpenAI API, you will inevitably encounter HTTP errors. Because the API serves millions of developers globally and processes compute-intensive requests, rate limiting and transient connection drops are standard operational realities.
Understanding the precise HTTP status code and the accompanying error message in the JSON payload is the first critical step to resolving the issue. Below, we break down the most common error codes: 401, 403, 429, 500, 502, 503, and network timeouts.
Authentication and Authorization: 401 and 403
HTTP 401: Unauthorized
This error indicates that the OpenAI server cannot verify your identity. The exact error message usually looks like:
{ "error": { "message": "Incorrect API key provided: sk-.... You can find your API key at https://platform.openai.com/account/api-keys.", "type": "invalid_request_error", "param": null, "code": "invalid_api_key" } }
Root Causes:
- You are using an invalid or malformed API key.
- The API key was deleted or revoked.
- You have hardcoded the key and accidentally truncated it.
- There is a typo in your `Authorization: Bearer <TOKEN>` header.
Resolution: Verify your API key in the OpenAI platform dashboard. If you suspect the key is compromised or invalid, generate a new one and update your environment variables. Ensure no extraneous whitespace is appended to the environment variable.
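As a quick sanity check before constructing the client, you can validate the key you loaded from the environment. This sketch uses only the standard library; `OPENAI_API_KEY` is the variable the official SDK reads by default, and the format check is illustrative, not an official validation rule:

```python
import os

# Read the key from the environment; the official SDK picks up OPENAI_API_KEY
# automatically, but checking it explicitly surfaces problems early.
api_key = os.environ.get("OPENAI_API_KEY", "").strip()  # strip stray whitespace/newlines

def looks_valid(key: str) -> bool:
    # A light format check: OpenAI keys start with "sk-" and contain no spaces.
    return key.startswith("sk-") and " " not in key

if not looks_valid(api_key):
    print("OPENAI_API_KEY is missing or malformed; regenerate it in the dashboard.")
```

The `.strip()` call guards against the trailing-newline problem mentioned above, which is a common cause of 401s when keys are pasted into `.env` files.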
HTTP 403: Forbidden
A 403 error means your authentication is valid, but you are not allowed to access the requested resource.
Root Causes:
- You are accessing the API from an unsupported country or region.
- Your account has been flagged or suspended due to terms of service violations.
- You are trying to access a model (like GPT-4 in the past, or specific fine-tuned models) that your account does not have access to.
Resolution: Check the OpenAI supported countries list. If you are using a VPN, try disabling it. Review your account standing in the dashboard.
The Infamous HTTP 429: Too Many Requests
The 429 error is the most frequently encountered issue. However, 'Too Many Requests' is an overloaded term in the OpenAI ecosystem. It can mean three completely different things, and you must read the error message to know which one applies.
Scenario A: Rate Limits (RPM/TPM/RPD)
Rate limit reached for requests. Limit: 3 / min. Please try again in 20s.
OpenAI enforces limits on Requests Per Minute (RPM), Tokens Per Minute (TPM), and Requests Per Day (RPD). These limits vary drastically based on your usage tier (Tier 1 through Tier 5). Free tier users have severe restrictions (e.g., 3 RPM).
How to Fix:
- Implement Retries: Use an exponential backoff algorithm. When a 429 is hit, wait a short period (e.g., 2 seconds), and retry. If it fails again, wait 4 seconds, then 8, up to a maximum threshold.
- Batching: If you are sending many short prompts, batch them into fewer requests to conserve RPM.
- Max Tokens: Lower the `max_tokens` parameter if you are hitting TPM limits, as OpenAI counts the requested max tokens against your limit, not just the generated tokens.
- Upgrade Tier: Spend more money. Moving from Tier 1 to Tier 2 by depositing $50 drastically increases limits.
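The retry advice above can be sketched as a small standalone helper. This is a minimal illustration, not the official SDK's built-in retry mechanism; `call` stands in for any zero-argument function that raises on a transient failure, and the delay values are examples:

```python
import random
import time

def with_backoff(call, max_retries=5, base_delay=2.0, max_delay=60.0):
    """Retry `call` with exponential backoff plus jitter.

    `call` is any zero-argument function that raises an exception on a
    transient failure (e.g. a 429 from the OpenAI client).
    """
    for attempt in range(max_retries):
        try:
            return call()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error to the caller
            # Double the wait each attempt (2s, 4s, 8s, ...) capped at max_delay,
            # with random jitter so many clients don't retry in lockstep.
            delay = min(base_delay * (2 ** attempt), max_delay)
            time.sleep(delay * random.uniform(0.5, 1.5))
```

In production you would catch only the retryable exception types (e.g. `openai.RateLimitError`) rather than bare `Exception`, so that auth and validation errors fail fast instead of burning retries.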
Scenario B: Insufficient Quota
You exceeded your current quota, please check your plan and billing details.
This is a 429 error, but it has nothing to do with how fast you are sending requests. It means your prepaid balance is $0.00.
How to Fix: Go to the OpenAI Billing dashboard and add credits. Note that API access is prepaid; having a ChatGPT Plus subscription does not give you API credits.
Scenario C: Engine Overload
The engine is currently overloaded, please try again later.
This is a temporary issue where OpenAI's specific compute cluster for the requested model is at capacity.
How to Fix: Implement exponential backoff. Wait and retry.
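Because all three scenarios arrive as the same HTTP 429 status, a handler has to inspect the error body to decide whether retrying can help at all. A sketch of that dispatch, matching against the `code` field and message strings quoted in the scenarios above (the returned strategy names are made up for illustration):

```python
def classify_429(code, message):
    """Map a 429 error body to a handling strategy.

    `code` and `message` are the fields from the error JSON payload; the
    strings matched here are the ones quoted in Scenarios A-C above.
    """
    if code == "insufficient_quota" or "exceeded your current quota" in message:
        return "add_credits"          # Scenario B: retrying will never help
    if "overloaded" in message:
        return "retry_with_backoff"   # Scenario C: transient capacity issue
    return "retry_with_backoff"       # Scenario A: rate limit; back off and retry
```

The key insight is that `insufficient_quota` is the one 429 where retry logic is actively harmful: no amount of waiting restores a $0.00 balance.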
Server Errors: 500, 502, 503, and Timeouts
HTTP 500 (Internal Server Error) & 503 (Service Unavailable)
These indicate a systemic issue on OpenAI's side.
Resolution: You cannot fix these. You must gracefully catch them in your code and alert your team if they persist. Always check status.openai.com for active incidents.
HTTP 502 (Bad Gateway) & Timeouts
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='api.openai.com', port=443): Read timed out.
A timeout occurs when the client gives up waiting for the server. GPT-4 and complex o1 model requests can take 30-60+ seconds to generate a response, easily exceeding default HTTP client timeouts (which are often 10-30 seconds).
How to Fix Timeouts:
- Increase Client Timeout: Explicitly set the timeout in your HTTP client (e.g., `timeout=60` in Python's `requests` or the official SDK).
- Use Streaming: Set `stream=True` in your API request. Instead of waiting for the entire response to generate before receiving the payload, you receive chunks of text as they are computed. This keeps the HTTP connection active and prevents intermediate proxies (like Nginx or AWS API Gateway) from dropping the connection due to idle timeouts.
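A minimal sketch combining both fixes with the official Python SDK. The model name and 120-second timeout are illustrative, and `collect_stream` is a hypothetical helper added here to show how the incremental chunks are assembled:

```python
def collect_stream(stream):
    """Concatenate the text deltas from a chat-completions stream.

    Works with any iterable of chunk objects shaped like the SDK's streaming
    chunks: each has choices[0].delta.content, which may be None.
    """
    parts = []
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            parts.append(delta)
    return "".join(parts)

if __name__ == "__main__":
    import openai

    # A generous timeout plus streaming keeps long generations from tripping
    # client-side or proxy idle timeouts.
    client = openai.OpenAI(timeout=120.0)  # reads OPENAI_API_KEY from the environment
    stream = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Explain HTTP streaming."}],
        stream=True,  # receive chunks as they are generated
    )
    print(collect_stream(stream))
```

Because the first chunk typically arrives within a second or two, streaming also gives you an early signal that the request succeeded, instead of a long silence followed by a timeout.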
Complete Code Example: Handling Errors with Retries
```python
import openai
from tenacity import retry, wait_random_exponential, stop_after_attempt, retry_if_exception_type

# Configure the official OpenAI Python client
client = openai.OpenAI(api_key="sk-YOUR_API_KEY", timeout=60.0)

# Define the retry logic using Tenacity.
# This will retry up to 6 times, waiting exponentially longer between attempts (up to 60s).
# It ONLY retries on RateLimitError and APIConnectionError.
@retry(
    wait=wait_random_exponential(min=1, max=60),
    stop=stop_after_attempt(6),
    retry=(retry_if_exception_type(openai.RateLimitError) | retry_if_exception_type(openai.APIConnectionError)),
)
def create_chat_completion_with_backoff(**kwargs):
    try:
        return client.chat.completions.create(**kwargs)
    except openai.AuthenticationError as e:
        print(f"Fatal Auth Error (401): {e}")
        raise  # Do not retry on auth errors
    except openai.BadRequestError as e:
        print(f"Fatal Bad Request (400): {e}")
        raise  # Do not retry on malformed payloads

# Example usage
if __name__ == "__main__":
    try:
        response = create_chat_completion_with_backoff(
            model="gpt-4o",
            messages=[{"role": "user", "content": "Explain exponential backoff."}],
            max_tokens=150,
        )
        print(response.choices[0].message.content)
    except Exception as final_error:
        print(f"Operation failed after retries: {final_error}")
```

Error Medic Editorial
Error Medic Editorial is a team of veteran Site Reliability Engineers and DevOps practitioners dedicated to demystifying complex cloud, API, and infrastructure failures.