Network Error Troubleshooting: Load Balancers, Proxies & VPN Fixes
Network errors are among the hardest to debug because the fault can sit anywhere between the client and the server. A user reporting "the site is down" might be experiencing a DNS resolution failure, a TLS handshake error, a load balancer health check issue, a firewall rule blocking traffic, or an actual backend outage, and you need to identify which one before you can fix it.
Modern infrastructure stacks multiple network layers: CDN (Cloudflare), load balancer (HAProxy, AWS ALB, nginx), reverse proxy (Traefik, Envoy), firewall (iptables, pfSense), and VPN tunnels (WireGuard, OpenVPN). Each layer adds functionality and another place where things can break. A misconfiguration at one layer often surfaces as symptoms at the layers above it, so the visible error rarely points straight at the cause.
This section covers 31 troubleshooting articles across 11 networking categories, including load balancers (HAProxy, AWS ALB, nginx), CDN and edge networks (Cloudflare), service proxies (Traefik, Envoy), firewalls (iptables, pfSense), and VPN solutions (WireGuard, OpenVPN). Each guide targets specific error messages and network symptoms with systematic diagnostic steps.
Effective network debugging requires methodical layer-by-layer analysis. These guides teach you to isolate the problem layer first, then fix the specific misconfiguration. Every guide includes the exact commands and configuration changes needed to resolve the issue.
Common Patterns & Cross-Cutting Themes
TLS/SSL Certificate Errors
Certificate errors are among the most common classes of network issues in production. "SSL certificate problem: certificate has expired," "unable to verify the first certificate," and "SSL handshake failure" usually point to an expired certificate or a broken certificate chain.
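The most frequent cause is simply an expired certificate. Even with automated renewal (Let's Encrypt, cert-manager), certificates expire when renewal fails silently, usually because of DNS changes, port 80 being blocked, or the renewal service crashing. Monitor certificate expiry dates and alert well ahead of expiration: 30 days suits manually managed certificates, while for 90-day Let's Encrypt certificates, which are typically renewed about 30 days before expiry, a shorter threshold such as 14 days avoids alerting on every normal renewal cycle.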
Chain issues are subtler: your server has a valid certificate but is missing the intermediate certificate, so browsers work (many cache intermediates or fetch them automatically) while API clients fail. Verify your full chain with openssl s_client -connect host:443 -servername host -showcerts and ensure the intermediate certificates are included in your configuration. SNI (Server Name Indication) mismatches cause errors when multiple domains share an IP, so ensure your server is configured to present the correct certificate based on the requested hostname.
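As a rough illustration of the expiry and chain checks described above, the following Python sketch connects with SNI, lets the standard library verify the chain, and reports the days remaining. The helper names days_until_expiry and probe_certificate are hypothetical, introduced only for this example. Python's default TLS context generally does not download missing intermediates, so it tends to fail the way a strict API client would.

```python
import socket
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days between `now` and a certificate's notAfter field.

    `not_after` uses the format returned by getpeercert(),
    for example "Jan  5 09:34:43 2018 GMT".
    """
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).total_seconds() / 86400


def probe_certificate(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Hypothetical helper: verify the chain for `host` and summarize the result."""
    context = ssl.create_default_context()  # verifies chain and hostname
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # server_hostname sends SNI, so a shared IP returns the right cert.
            with context.wrap_socket(sock, server_hostname=host) as tls:
                cert = tls.getpeercert()
    except ssl.SSLCertVerificationError as exc:
        # Covers expired certificates, missing intermediates, hostname mismatches.
        return f"verification failed: {exc.verify_message}"
    except OSError as exc:
        return f"connection failed: {exc}"
    remaining = days_until_expiry(cert["notAfter"], datetime.now(timezone.utc))
    return f"chain verified, {remaining:.0f} days until expiry"
```

Calling probe_certificate with your own hostname from a machine outside your network gives a quick second opinion when browsers and API clients disagree.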
Load Balancer Health Check Failures
When a load balancer marks backends as unhealthy, traffic stops flowing even though the servers are running fine. Health check failures are often configuration mismatches rather than actual backend problems.
Common causes: the health check path returns a non-200 status (check that your /health endpoint exists and doesn't require authentication), the health check timeout is shorter than the backend's response time (increase it), the health check uses HTTP but the backend only serves HTTPS (or vice versa), and security groups or firewall rules block the health check source IP.
For HAProxy, check the stats page to see which backends are down and the last health check result. For AWS ALB, check the target group health in the console and review the health check settings. For nginx, note that the active health_check directive is available only in NGINX Plus; open-source nginx relies on passive checks (max_fails and fail_timeout on each upstream server), so look for upstream failures in the error log. Always test your health check endpoint independently with curl before trusting the load balancer's report.
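To test a health endpoint independently of the load balancer, a minimal Python sketch along these lines can stand in for curl in a script. The function name check_health and the 2-second default timeout are assumptions for illustration; match the timeout and expected status to whatever your load balancer is actually configured to use.

```python
import time
import urllib.error
import urllib.request


def check_health(url: str, timeout: float = 2.0) -> tuple[bool, str]:
    """Hypothetical helper: request `url` the way a simple HTTP health check would.

    Note: urlopen follows redirects, while many load balancers do not, so a
    redirect to a login page can look healthier here than it does to the balancer.
    """
    started = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as exc:
        # 4xx/5xx responses, e.g. 401 when the endpoint requires authentication.
        return False, f"unhealthy: HTTP {exc.code}"
    except OSError as exc:
        # Timeouts, refused connections, DNS failures, TLS errors.
        return False, f"unreachable: {exc}"
    elapsed = time.monotonic() - started
    if status != 200:
        return False, f"unhealthy: HTTP {status}"
    return True, f"healthy: HTTP {status} in {elapsed:.3f}s"
```

Run it from a host in the same network segment as the load balancer where possible, since security groups may allow your workstation but block the health check source.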
Firewall & Packet Filtering Issues
Firewall misconfigurations manifest as mysterious connectivity failures: the service is running, the server is up, but traffic can't get through. The key diagnostic: can you connect from the server itself (curl localhost:port) but not from outside? If so, first confirm the service is listening on an external address rather than only 127.0.0.1; if it is, a firewall is the likely culprit.
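The inside-versus-outside comparison can be scripted. The sketch below is a minimal TCP probe, with probe_tcp as a hypothetical helper name, that distinguishes the two outcomes that matter most here: a refused connection usually means the host answered but nothing is listening on that address and port (or a rule actively rejects the traffic), while a timeout usually means packets are being silently dropped along the way.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Hypothetical helper: classify a TCP connection attempt.

    Returns "open", "refused", "timeout", or "error: <details>".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host replied with a reset: service down, wrong bind address,
        # or a firewall rule that rejects rather than drops.
        return "refused"
    except socket.timeout:
        # No reply at all: typical of a DROP rule or a cloud security group.
        return "timeout"
    except OSError as exc:
        # DNS failures, unreachable networks, and similar errors.
        return f"error: {exc}"
```

Running the same probe once on the server against localhost and once from an external host makes the comparison explicit: "open" locally and "timeout" remotely points toward packet filtering.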
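For iptables, list all rules with iptables -L -n -v --line-numbers. That command shows only the filter table, so add -t nat (or use iptables-save) when NAT rules are involved. Remember that iptables evaluates rules top-down and stops at the first rule with a terminating target such as ACCEPT, DROP, or REJECT, so a DROP rule before your ACCEPT rule will block traffic. Check all chains: INPUT for incoming traffic, OUTPUT for outgoing, and FORWARD for routed traffic. Don't forget ip6tables for IPv6.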
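The first-match behavior is easy to misread in a long rule listing, so here is a deliberately simplified Python model of it. This is a toy, not how iptables is implemented: the Rule class and evaluate function are hypothetical, and matching is reduced to protocol and destination port only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    target: str                     # "ACCEPT" or "DROP"
    protocol: Optional[str] = None  # None matches any protocol
    dport: Optional[int] = None     # None matches any destination port


def evaluate(chain, policy, protocol, dport):
    """Return (verdict, rule_number) for the first matching rule.

    Falls back to the chain policy, with rule_number None, if nothing matches.
    """
    for number, rule in enumerate(chain, start=1):
        if rule.protocol not in (None, protocol):
            continue
        if rule.dport not in (None, dport):
            continue
        return rule.target, number
    return policy, None


# A broad DROP placed before a specific ACCEPT shadows it.
chain = [Rule("DROP", protocol="tcp"), Rule("ACCEPT", protocol="tcp", dport=443)]
print(evaluate(chain, "ACCEPT", "tcp", 443))  # ('DROP', 1)
```

Swapping the two rules makes the same packet match the ACCEPT at position 1, which is why the line numbers in the listing matter when inserting new rules.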
For pfSense, check the firewall log (Status > System Logs > Firewall) to see which rule is blocking traffic. Floating rules are evaluated before interface rules and are a common source of unexpected blocks. Cloud environments add another firewall layer — AWS security groups, Azure NSGs, and GCP firewall rules all apply in addition to the server's local firewall.
VPN Tunnel & Connectivity Problems
VPN errors typically fall into two categories: tunnel establishment failures and traffic routing issues. WireGuard and OpenVPN have different failure modes, but the diagnostic approach is similar.
For tunnel establishment: check that both peers can reach each other's endpoints (is UDP traffic on the VPN port allowed through firewalls?), verify that keys are correctly configured on both sides, and ensure the server is actually listening (ss -ulnp | grep <port>). For WireGuard, which logs very little by default, run wg show and check whether each peer reports a recent handshake. For OpenVPN, check the server and client logs for TLS handshake errors, certificate validation failures, and cipher suite mismatches.
For traffic routing issues after the tunnel is up: verify that the AllowedIPs (WireGuard) or push routes (OpenVPN) include the networks you're trying to reach, check that IP forwarding is enabled on the VPN server (sysctl net.ipv4.ip_forward), and ensure NAT/masquerade rules are in place if the VPN server needs to route traffic to other networks. Split tunneling vs. full tunneling configuration determines which traffic goes through the VPN.
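A quick way to sanity-check the AllowedIPs side of this is to test whether a destination address falls inside any configured range. The Python sketch below uses the standard ipaddress module; matching_allowed_ip is a hypothetical helper, and the addresses in the usage line are made-up examples. It returns the most specific matching range, mirroring the longest-prefix selection WireGuard applies when choosing a peer.

```python
import ipaddress
from typing import Iterable, Optional


def matching_allowed_ip(destination: str, allowed_ips: Iterable[str]) -> Optional[str]:
    """Hypothetical helper: most specific AllowedIPs entry covering `destination`.

    Returns None when no entry covers it, meaning the traffic would not be
    sent through the tunnel.
    """
    address = ipaddress.ip_address(destination)
    # strict=False tolerates entries written with host bits set, e.g. 10.0.5.1/24.
    networks = [ipaddress.ip_network(entry, strict=False) for entry in allowed_ips]
    matches = [
        network
        for network in networks
        if network.version == address.version and address in network
    ]
    if not matches:
        return None
    return str(max(matches, key=lambda network: network.prefixlen))


print(matching_allowed_ip("10.0.5.7", ["10.0.0.0/8", "10.0.5.0/24"]))  # 10.0.5.0/24
```

A result of None for an address you expect to reach is a strong hint that the route is missing; a match on 0.0.0.0/0 indicates a full-tunnel configuration.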
Quick Troubleshooting Guide
| Symptom | Likely Cause | First Step |
|---|---|---|
| SSL certificate expired or invalid | Certificate expired or missing intermediate | Check expiry with openssl; renew cert; verify full chain is configured |
| 502 Bad Gateway | Load balancer can't reach backend or backend crashed | Check backend health; verify upstream config; check backend logs |
| 503 Service Unavailable | All backends marked unhealthy by load balancer | Check health check config; verify backends are running; check security groups |
| Connection timed out | Firewall blocking traffic or wrong port | Check iptables/security groups; verify service is listening on expected port |
| Connection refused | Service not running or binding to wrong address | Check service status; verify listen address (0.0.0.0 vs 127.0.0.1); check port |
| DNS resolution failure | DNS record missing, domain registration expired, or wrong nameservers | Test with dig/nslookup; check DNS records; verify nameserver configuration |
| VPN tunnel won't establish | Firewall blocking UDP port or key mismatch | Verify UDP port is open; check keys on both peers; review VPN logs |
| VPN connected but can't reach resources | Missing routes or IP forwarding disabled | Check AllowedIPs/push routes; enable ip_forward; verify NAT/masquerade rules |
| Intermittent 504 Gateway Timeout | Backend too slow or proxy timeout too short | Increase proxy timeout; optimize backend response time; check for resource contention |
| Cloudflare 521/522/523 errors | Origin server unreachable or blocking Cloudflare IPs | Allow Cloudflare IP ranges through the origin firewall; check origin server is running; verify SSL mode |