Error Medic

Network Error Troubleshooting: Load Balancers, Proxies & VPN Fixes

Network errors are some of the hardest to debug because the problem can exist at any point between the client and the server. A user reporting "the site is down" might be experiencing a DNS resolution failure, a TLS handshake error, a load balancer health check issue, a firewall rule blocking traffic, or an actual backend outage — and you need to figure out which one before you can fix it.

Modern infrastructure stacks multiple network layers: CDN (Cloudflare), load balancer (HAProxy, AWS ALB, nginx), reverse proxy (Traefik, Envoy), firewall (iptables, pfSense), and VPN tunnels (WireGuard, OpenVPN). Each layer adds functionality — and another place where things can break. Misconfigurations at any layer produce symptoms at every layer above it.

This section covers 31 troubleshooting articles across 11 networking categories, including load balancers (HAProxy, AWS ALB, nginx), CDN and edge networks (Cloudflare), service proxies (Traefik, Envoy), firewalls (iptables, pfSense), and VPN solutions (WireGuard, OpenVPN). Each guide targets specific error messages and network symptoms with systematic diagnostic steps.

Effective network debugging requires methodical layer-by-layer analysis. These guides teach you to isolate the problem layer first, then fix the specific misconfiguration. Every guide includes the exact commands and configuration changes needed to resolve the issue.
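The layer-first approach can be sketched in a few lines of Python. This is a minimal illustration, not a tool from these guides: the function name `triage`, its return strings, and the 3-second default timeout are hypothetical choices. It resolves the name, opens a TCP connection, then attempts a TLS handshake, and stops at the first layer that fails, so a DNS failure is never misread as a TLS problem.

```python
import socket
import ssl


def triage(host: str, port: int = 443, timeout: float = 3.0, use_tls: bool = True) -> str:
    """Check DNS, then TCP, then TLS, and describe the first layer that fails."""
    # Layer 1, DNS: getaddrinfo raises socket.gaierror when the name does not resolve.
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return f"dns: {exc}"
    address = infos[0][4][:2]  # (ip, port) of the first resolved address

    # Layer 2, TCP: "refused" usually means nothing is listening (or a REJECT rule);
    # a timeout usually means something on the path silently drops the packets.
    try:
        sock = socket.create_connection(address, timeout=timeout)
    except socket.timeout:
        return f"tcp: timed out connecting to {address[0]}:{port}"
    except ConnectionRefusedError:
        return f"tcp: connection refused by {address[0]}:{port}"
    except OSError as exc:  # e.g. no route to host
        return f"tcp: {exc}"

    with sock:
        if not use_tls:
            return "ok"
        # Layer 3, TLS: expiry, chain, and SNI problems surface here. server_hostname
        # sends SNI and enables hostname checking against the certificate.
        context = ssl.create_default_context()
        try:
            with context.wrap_socket(sock, server_hostname=host):
                return "ok"
        except OSError as exc:  # ssl.SSLError and its subclasses are OSErrors
            return f"tls: {exc}"
```

For example, `triage("example.com")` returns `"ok"` only if all three layers succeed; otherwise the prefix (`dns:`, `tcp:`, `tls:`) names the layer to investigate first.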


Common Patterns & Cross-Cutting Themes

TLS/SSL Certificate Errors

Certificate errors are among the most common classes of network issues in production. "SSL certificate problem: certificate has expired," "unable to verify the first certificate," and "SSL handshake failure" usually point to a problem with the certificate itself or with the chain the server presents.

The most frequent cause is simply an expired certificate. Even with automated renewal (Let's Encrypt, cert-manager), certificates expire when renewal fails silently — usually because of DNS changes, port 80 being blocked, or the renewal service crashing. Monitor certificate expiry dates and alert 30 days before expiration.
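An expiry check suitable for a cron job or monitoring probe can be sketched with Python's standard `ssl` module. This is a minimal sketch: the function names and the `ALERT_DAYS` constant are hypothetical, and `fetch_not_after` assumes the certificate still verifies against the system trust store.

```python
import socket
import ssl
import time
from typing import Optional

ALERT_DAYS = 30  # alert threshold suggested above


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days left before a certificate's notAfter time, e.g. 'Jan  5 09:34:43 2018 GMT'."""
    expires_at = ssl.cert_time_to_seconds(not_after)  # seconds since the Unix epoch (UTC)
    if now is None:
        now = time.time()
    return (expires_at - now) / 86400


def should_alert(not_after: str, now: Optional[float] = None) -> bool:
    return days_until_expiry(not_after, now) < ALERT_DAYS


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Read notAfter from the certificate the server presents.

    The default context verifies the certificate, so an already-expired (or
    otherwise untrusted) certificate raises ssl.SSLCertVerificationError here,
    which is itself a clear signal.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

Usage: `should_alert(fetch_not_after("example.com"))` is `True` when fewer than 30 days remain.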

Chain issues are subtler: your server has a valid certificate but does not send the intermediate certificate, so browsers often still work (they may already have the intermediate cached or fetch it themselves) while API clients such as curl fail. Verify the full chain with openssl s_client -connect host:443 -servername host -showcerts and make sure the intermediate certificates are included in your server configuration. SNI (Server Name Indication) mismatches cause errors when multiple domains share an IP address: the server must be configured to present the correct certificate for the hostname the client requests, and the client must send that hostname (the -servername flag does this for openssl s_client).
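A quick way to spot a missing intermediate is to count the certificates in a PEM file: either your server's configured certificate bundle or the saved output of `openssl s_client -showcerts`, which prints every certificate the server actually sent. The sketch below is a heuristic, not a chain validator: it assumes a publicly trusted leaf normally needs at least one intermediate, and it does not check ordering or whether the intermediates match the leaf. The function names are hypothetical.

```python
import re

# One PEM certificate block, including its BEGIN/END markers.
PEM_CERT = re.compile(
    r"-----BEGIN CERTIFICATE-----.+?-----END CERTIFICATE-----", re.DOTALL
)


def count_pem_certs(pem_text: str) -> int:
    """Count certificate blocks in PEM text (a bundle file or s_client output)."""
    return len(PEM_CERT.findall(pem_text))


def chain_looks_incomplete(pem_text: str) -> bool:
    # Only the leaf present is the classic "works in browsers, fails in
    # curl/API clients" case described above.
    return count_pem_certs(pem_text) < 2
```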

Load Balancer Health Check Failures

When a load balancer marks backends as unhealthy, traffic stops flowing even though the servers are running fine. Health check failures are often configuration mismatches rather than actual backend problems.

Common causes: the health check path returns a status the load balancer does not accept (often only 200; check that your /health endpoint exists, does not redirect, and does not require authentication), the health check timeout is shorter than the backend's response time (increase the timeout or make the endpoint cheaper), the health check uses HTTP but the backend only serves HTTPS (or vice versa), or security groups and firewall rules block the health check's source IP.
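To see each of these failure modes the way a load balancer would, a probe can request the health path once, with a timeout, without following redirects, and accept only specific status codes. This is a minimal sketch: `probe_health` and its `accepted` parameter are hypothetical, and real load balancers differ in which codes they accept by default.

```python
import http.client
import socket
from typing import Tuple
from urllib.parse import urlsplit


def probe_health(url: str, timeout: float = 2.0, accepted=(200,)) -> Tuple[bool, str]:
    """Healthy only if a response with an accepted status arrives within `timeout`.

    Redirects are not followed, so a 302 to a login page counts as unhealthy.
    HTTPSConnection verifies certificates; some load balancers do not.
    """
    parts = urlsplit(url)
    conn_class = (http.client.HTTPSConnection if parts.scheme == "https"
                  else http.client.HTTPConnection)
    conn = conn_class(parts.hostname, parts.port, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        conn.request("GET", path)
        status = conn.getresponse().status
    except socket.timeout:
        return False, f"no response within {timeout}s"
    except (OSError, http.client.HTTPException) as exc:  # refused, TLS failure, bad reply
        return False, f"request failed: {exc!r}"
    finally:
        conn.close()
    if status not in accepted:
        return False, f"status {status} not in {accepted}"
    return True, f"status {status}"
```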

For HAProxy, check the stats page to see which backends are down and the result of the last health check. For AWS ALB, check the target group's health status in the console and review its health check settings. For nginx, note that the active health_check directive is only available in NGINX Plus; open-source nginx marks upstream servers as failed passively (via max_fails and fail_timeout), so check the error log for upstream errors. Always test your health check endpoint independently with curl before trusting the load balancer's report.

Firewall & Packet Filtering Issues

Firewall misconfigurations manifest as mysterious connectivity failures: the service is running, the server is up, but traffic can't get through. The key diagnostic: can you connect from the server itself (curl localhost:port) but not from outside? If so, suspect a firewall, or a service bound only to 127.0.0.1 rather than to an external interface.

For iptables, list all rules with iptables -L -n -v --line-numbers. Remember that iptables evaluates rules top-down and the first matching rule with a terminating target (ACCEPT, DROP, REJECT) decides the packet's fate, so a DROP rule above your ACCEPT rule will block traffic; if no rule in a built-in chain matches, the chain's default policy applies. Check all chains: INPUT for incoming traffic, OUTPUT for outgoing traffic, and FORWARD for routed traffic. Don't forget ip6tables for IPv6.
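The first-match behavior is easy to get wrong when reading a long rule list, so here is a deliberately simplified model of it. It matches only on protocol and destination port (real iptables also matches addresses, interfaces, connection state, and can jump to user-defined chains); `Rule` and `evaluate` are hypothetical names, and rule numbers mirror the `--line-numbers` output.

```python
from dataclasses import dataclass
from typing import List, Optional

TERMINATING = {"ACCEPT", "DROP", "REJECT"}


@dataclass
class Rule:
    target: str                  # ACCEPT, DROP, REJECT, or LOG (non-terminating)
    proto: Optional[str] = None  # None matches any protocol
    dport: Optional[int] = None  # None matches any destination port


def evaluate(chain: List[Rule], policy: str, proto: str, dport: int) -> str:
    """Walk the chain top-down; the first matching terminating rule decides."""
    for number, rule in enumerate(chain, start=1):
        if rule.proto not in (None, proto) or rule.dport not in (None, dport):
            continue  # rule does not match this packet
        if rule.target in TERMINATING:
            return f"{rule.target} (rule {number})"
        # LOG and similar targets record the packet and let evaluation continue.
    return f"{policy} (chain policy)"
```

With `[Rule("DROP", "tcp", 8080), Rule("ACCEPT", "tcp", 8080)]`, TCP traffic to port 8080 is dropped by rule 1 and the ACCEPT rule is never reached; moving the ACCEPT above the DROP (or deleting the DROP) is the fix.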

For pfSense, check the firewall log (Status > System Logs > Firewall) to see which rule is blocking traffic. Floating rules are evaluated before interface rules and are a common source of unexpected blocks. Cloud environments add another firewall layer — AWS security groups, Azure NSGs, and GCP firewall rules all apply in addition to the server's local firewall.

VPN Tunnel & Connectivity Problems

VPN errors typically fall into two categories: tunnel establishment failures and traffic routing issues. WireGuard and OpenVPN have different failure modes, but the diagnostic approach is similar.

For tunnel establishment: check that both peers can reach each other's endpoints (is traffic on the VPN port, usually UDP, allowed through every firewall in between?), verify that each peer is configured with the other peer's correct public key (WireGuard) or with valid certificates and keys (OpenVPN), and ensure the server is actually listening (ss -ulnp | grep <port>). For OpenVPN, check the server and client logs for TLS handshake errors, certificate validation failures, and cipher mismatches.
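On Linux, the same "is anything bound to this UDP port?" question can be answered by reading `/proc/net/udp` (or `/proc/net/udp6`), where the local address column holds the port in hexadecimal. This is a minimal sketch: `udp_listening_ports` is a hypothetical name, and unlike `ss -p` it does not show the owning process.

```python
from typing import Set


def udp_listening_ports(proc_net_udp: str) -> Set[int]:
    """Extract bound local UDP ports from the text of /proc/net/udp or /proc/net/udp6."""
    ports = set()
    for line in proc_net_udp.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 2:
            continue
        local_address = fields[1]  # e.g. "00000000:CA6C" is 0.0.0.0:51820
        ports.add(int(local_address.rsplit(":", 1)[1], 16))  # port is hex
    return ports
```

Usage: `51820 in udp_listening_ports(open("/proc/net/udp").read())`, where 51820 is only the port commonly used for WireGuard; substitute your configured ListenPort (or OpenVPN's port, 1194 by default).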

For traffic routing issues after the tunnel is up: verify that the AllowedIPs (WireGuard) or push routes (OpenVPN) include the networks you're trying to reach, check that IP forwarding is enabled on the VPN server (sysctl net.ipv4.ip_forward), and ensure NAT/masquerade rules are in place if the VPN server needs to route traffic to other networks. Split tunneling vs. full tunneling configuration determines which traffic goes through the VPN.
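Whether a destination is covered by a WireGuard peer's AllowedIPs can be checked with Python's `ipaddress` module. This sketch models a single peer's AllowedIPs line only; it does not cover OpenVPN push routes, and it does not check IP forwarding or NAT on the server. The function name `tunnel_route` is hypothetical.

```python
import ipaddress
from typing import Optional


def tunnel_route(allowed_ips: str, destination: str) -> Optional[str]:
    """Most specific AllowedIPs prefix covering `destination`, or None if it bypasses the tunnel.

    `allowed_ips` is the value of a peer's AllowedIPs line,
    e.g. "10.8.0.0/24, 192.168.10.0/24".
    """
    dest = ipaddress.ip_address(destination)
    networks = [ipaddress.ip_network(p.strip(), strict=False)
                for p in allowed_ips.split(",") if p.strip()]
    covering = [n for n in networks if n.version == dest.version and dest in n]
    if not covering:
        return None
    return str(max(covering, key=lambda n: n.prefixlen))
```

A `None` result for a resource you expect to reach means the prefix is missing from AllowedIPs (split tunnel); `0.0.0.0/0` covers every IPv4 destination, which is the full-tunnel setup.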

Quick Troubleshooting Guide

| Symptom | Likely Cause | First Step |
|---|---|---|
| SSL certificate expired or invalid | Certificate expired or missing intermediate | Check expiry with openssl; renew cert; verify full chain is configured |
| 502 Bad Gateway | Load balancer can't reach backend or backend crashed | Check backend health; verify upstream config; check backend logs |
| 503 Service Unavailable | All backends marked unhealthy by load balancer | Check health check config; verify backends are running; check security groups |
| Connection timed out | Firewall blocking traffic or wrong port | Check iptables/security groups; verify service is listening on expected port |
| Connection refused | Service not running or binding to wrong address | Check service status; verify listen address (0.0.0.0 vs 127.0.0.1); check port |
| DNS resolution failure | DNS record missing, expired, or wrong nameservers | Test with dig/nslookup; check DNS records; verify nameserver configuration |
| VPN tunnel won't establish | Firewall blocking UDP port or key mismatch | Verify UDP port is open; check keys on both peers; review VPN logs |
| VPN connected but can't reach resources | Missing routes or IP forwarding disabled | Check AllowedIPs/push routes; enable ip_forward; verify NAT/masquerade rules |
| Intermittent 504 Gateway Timeout | Backend too slow or proxy timeout too short | Increase proxy timeout; optimize backend response time; check for resource contention |
| Cloudflare 521/522/523 errors | Origin server unreachable or blocking Cloudflare IPs | Allow Cloudflare IP ranges; check origin server is running; verify SSL mode |
