Error Medic

Fixing "Connection Refused" and Timeout Errors in Grafana: A Comprehensive Troubleshooting Guide

Resolve Grafana 'connection refused' and timeout errors. Learn to fix OOM crashes, permission denied issues, and reverse proxy misconfigurations step-by-step.

Key Takeaways
  • Verify the Grafana service is actively running and bound to the correct network interface (0.0.0.0 instead of 127.0.0.1).
  • Inspect system firewalls (ufw, iptables) and cloud security groups to ensure port 3000 (or your proxy port) is open to prevent timeouts.
  • Check for Out of Memory (OOM) events using dmesg and adjust system or container resource limits accordingly.
  • Resolve 'permission denied' errors by correcting user and group ownership (grafana:grafana) on /var/lib/grafana and /etc/grafana.
Diagnostic Approaches Compared
Method | When to Use | Time | Risk
Service Status Check | Initial triage for 'Connection Refused' | 1 min | Low
Log Analysis (journalctl) | Investigating OOM or Permission Denied | 5 mins | Low
Network Binding Verification | Service is up but inaccessible externally | 2 mins | Low
Fixing Directory Ownership | Logs explicitly show SQLite or plugin permission denied | 5 mins | Medium

Understanding the Error

When working with observability stacks, Grafana serves as the single pane of glass for monitoring infrastructure and applications. Consequently, when Grafana becomes inaccessible, your visibility into the entire system is severely degraded. The most common symptoms of a degraded Grafana instance are Connection Refused and Connection Timed Out errors, often accompanied by underlying issues such as Out of Memory (OOM) crashes or Permission Denied log entries.

From a DevOps and SRE perspective, resolving these errors requires a systematic approach. The symptom—a browser failing to load the Grafana UI or an API call failing—is just the tip of the iceberg. You must trace the request path from the client, through the network and firewalls, into the reverse proxy, and finally down to the Grafana process and its interaction with the host operating system.

Symptom 1: Connection Refused

The ERR_CONNECTION_REFUSED error (or curl: (7) Failed to connect to localhost port 3000: Connection refused) is highly specific. It means the client successfully routed packets to the destination IP, but the host operating system actively rejected the connection. This almost always means one of two things:

  1. The Grafana process is not running. (It crashed, was stopped, or failed to start).
  2. The Grafana process is running, but listening on a different IP or port. For instance, it might be bound strictly to 127.0.0.1 (localhost), meaning it will refuse connections arriving on the external public or private LAN IP.

Symptom 2: Connection Timed Out

A timeout error (ERR_CONNECTION_TIMED_OUT) behaves differently. The client sends a SYN packet to initiate the TCP handshake but never receives a response; the packets vanish into a black hole. This points to network-level blockages:

  1. Host-level firewalls (iptables, UFW, firewalld) are silently dropping packets to port 3000.
  2. Cloud provider Security Groups (AWS) or Firewall Rules (GCP/Azure) are not configured to allow ingress traffic on the target port.
  3. Routing issues are preventing the return packets from reaching the client.
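
You can tell the two symptoms apart from the command line before touching any configuration. The helper below is a sketch that relies on bash's /dev/tcp pseudo-device and the coreutils timeout command (both assumptions about your environment): an active refusal fails instantly with a RST, while a firewalled port hangs until the timeout expires.

```shell
#!/usr/bin/env bash
# Classify TCP connectivity to host:port as open, refused, or timeout.
probe() {
    local host=$1 port=$2
    timeout 5 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
    case $? in
        0)   echo "open" ;;      # TCP handshake completed
        124) echo "timeout" ;;   # no response: packets likely dropped by a firewall
        *)   echo "refused" ;;   # RST received: nothing listening on that port
    esac
}

probe 127.0.0.1 3000   # check Grafana's default port locally
```

Exit code 124 from timeout is what distinguishes a silent drop (Symptom 2) from an active refusal (Symptom 1).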

Symptom 3: Out of Memory (OOM)

Grafana, written in Go, is generally lightweight, but it can consume massive amounts of memory under specific conditions:

  • Rendering large, complex dashboards over long time ranges.
  • Evaluating thousands of heavy alerting rules concurrently.
  • Runaway queries to underlying data sources (like Prometheus or Elasticsearch) that return massive payloads.

When the system exhausts its physical memory and swap space, the Linux kernel invokes the OOM Killer, which selects memory-heavy processes and terminates them to reclaim memory and keep the system running. You will see Grafana stop abruptly, leading to a sudden Connection Refused error.
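
To see whether Grafana's resident memory is creeping up before the OOM Killer strikes, you can sample its RSS with procps ps (a minimal sketch; the process name assumes the standard package install):

```shell
# Report PID, resident set size (RSS, in KB), and command line for grafana-server.
# Falls back to a message if the process is not running.
ps -o pid,rss,args -C grafana-server 2>/dev/null \
    || echo "grafana-server is not running"
```

Pair it with `watch -n 5` to observe growth while a heavy dashboard renders.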

Symptom 4: Permission Denied

Permission Denied errors typically occur during startup or when attempting to provision dashboards, install plugins, or write to the SQLite database. This frequently happens after migrating Grafana to a new server, restoring from a backup, or accidentally starting the process as the root user and then attempting to run it as the grafana user.
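
A quick way to confirm an ownership mismatch before changing anything is to compare the on-disk owner with the service account. The paths below are the package defaults; adjust them if your install differs.

```shell
# Show owner, group, and mode for the standard Grafana paths, if present.
for p in /var/lib/grafana /var/lib/grafana/grafana.db /etc/grafana; do
    if [ -e "$p" ]; then
        stat -c '%U:%G %a %n' "$p"
    fi
done

# The service account the files should belong to
id grafana 2>/dev/null || echo "no 'grafana' user on this host"
```

If `stat` reports root:root (or a stray uid from a backup restore) where `id` says the service runs as grafana, you have found the culprit.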


Step-by-Step Troubleshooting and Resolution

Step 1: Verify the Process State and Logs

The first step in triage is confirming whether the Grafana service is actually running.

Execute the following systemd command:

systemctl status grafana-server

If the status is inactive (dead) or failed, Grafana has crashed. To find out why, inspect the systemd journal logs. Do not just look at the last few lines; page through the logs to find the exact moment of failure.

journalctl -u grafana-server -n 100 --no-pager

Handling "Permission Denied" in Logs

If the logs output an error resembling: level=error msg="Stopped background service" logger=server reason="failed to open database: unable to open database file: permission denied"

This indicates the grafana user cannot access /var/lib/grafana/grafana.db. Fix this by recursively setting ownership of the entire Grafana data directory to the grafana user and group:

sudo chown -R grafana:grafana /var/lib/grafana
sudo chown -R grafana:grafana /etc/grafana
sudo chown -R grafana:grafana /var/log/grafana
sudo chmod 755 /var/lib/grafana

Restart the service and check the status again.

Step 2: Diagnosing Out of Memory (OOM) Events

If Grafana stopped unexpectedly without a clear error in journalctl, it might have been killed by the Linux kernel. Check the kernel ring buffer for OOM events:

dmesg -T | grep -i oom
# OR
grep -i "killed process" /var/log/messages /var/log/syslog

If you see Out of memory: Killed process <PID> (grafana-server), you have an OOM issue.

Resolution for OOM:

  1. Increase System Memory / Swap: The simplest fix is adding RAM to the VM or enabling swap space to handle brief memory spikes.
  2. Limit Query Sizes: Prevent users from querying excessive data points. In grafana.ini, the [dataproxy] section lets you tighten query timeouts so runaway requests are cut off early.
  3. Container Limits: If running in Docker or Kubernetes, ensure your memory limits are not set unrealistically low.
    • Kubernetes Example: Adjust the resources.limits.memory in your deployment YAML from 256Mi to 1Gi or higher depending on your load.
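
For the Kubernetes case mentioned above, the adjustment is a resources stanza on the Grafana container. The deployment and image names here are illustrative, not prescriptive:

```yaml
# Fragment of a Grafana Deployment spec; only the resources block matters here.
spec:
  template:
    spec:
      containers:
        - name: grafana
          image: grafana/grafana:latest
          resources:
            requests:
              memory: "512Mi"   # scheduler guarantee
              cpu: "250m"
            limits:
              memory: "1Gi"     # raised from 256Mi to avoid OOMKilled restarts
              cpu: "500m"
```

Keep the request below the limit so the pod schedules easily but still has headroom for dashboard rendering spikes.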

Step 3: Network Binding and Port Issues

If systemctl status grafana-server reports the service is active (running), but you still get Connection Refused from a remote machine, Grafana might be bound to localhost.

Check the listening ports on the server:

sudo netstat -tulpn | grep grafana
# OR
sudo ss -tulpn | grep 3000

If the output shows 127.0.0.1:3000, Grafana will only accept local connections. To fix this, you must modify the grafana.ini configuration file (usually located at /etc/grafana/grafana.ini).

Find the [server] section and modify the http_addr property:

[server]
# The IP address to bind to, empty will bind to all interfaces
http_addr = 0.0.0.0

# The http port to use
http_port = 3000

Save the file and restart the service (sudo systemctl restart grafana-server). The netstat output should now show 0.0.0.0:3000 or :::3000.

Step 4: Investigating Timeouts and Firewalls

If you are experiencing timeouts, the packets are being dropped. You must verify firewall rules layer by layer.

1. Host Firewall (UFW/Firewalld):

If you are using Ubuntu with UFW:

sudo ufw status
# If 3000 is not listed, allow it:
sudo ufw allow 3000/tcp

If you are using CentOS/RHEL with Firewalld:

sudo firewall-cmd --list-all
# If 3000 is missing:
sudo firewall-cmd --add-port=3000/tcp --permanent
sudo firewall-cmd --reload

2. Cloud Provider Firewalls:

If the host firewall is open but timeouts persist, check your cloud provider console:

  • AWS: Navigate to EC2 -> Security Groups. Select the Security Group attached to your Grafana instance. Ensure there is an Inbound Rule allowing Custom TCP Port 3000 from your IP (or 0.0.0.0/0 if publicly accessible, though a reverse proxy is recommended).
  • GCP: Navigate to VPC Network -> Firewall. Ensure an ingress rule targets your instance (via network tags) and allows tcp:3000.
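
If you have the AWS CLI configured, you can audit the relevant Security Group without clicking through the console. The group id below is a placeholder, and the JMESPath query keeps only ingress rules whose port range covers TCP 3000; the block is guarded so it degrades gracefully where the CLI is absent.

```shell
# Placeholder: replace with the Security Group attached to your instance.
SG_ID="sg-0123456789abcdef0"

if command -v aws >/dev/null 2>&1; then
    # List only the ingress rules whose port range includes 3000
    aws ec2 describe-security-groups \
        --group-ids "$SG_ID" \
        --query "SecurityGroups[0].IpPermissions[?FromPort<=\`3000\` && ToPort>=\`3000\`]" \
        --output json
else
    echo "aws CLI not installed"
fi
```

An empty `[]` result means no rule covers port 3000, which matches the timeout symptom exactly.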

Step 5: Reverse Proxy Misconfigurations (Nginx/Apache)

In production environments, Grafana is rarely exposed directly on port 3000. It is usually placed behind a reverse proxy like Nginx or HAProxy. If Grafana works via curl http://localhost:3000 on the server but fails externally through the domain name, the proxy is the culprit.

Common Nginx reverse proxy issues that cause 502 Bad Gateway (which often masks a connection refused on the backend) or Timeouts:

  1. Wrong Upstream Address: Ensure your proxy_pass points to the correct local address.
  2. SELinux blocking Nginx: On RHEL/CentOS systems, SELinux prevents Nginx from making outbound network connections to upstream servers by default. This causes a Permission Denied in Nginx logs and a 502 for the user.

To check Nginx logs:

tail -f /var/log/nginx/error.log

If you see Permission denied while connecting to upstream and are running SELinux, run the following command to allow the web server to connect to the network:

sudo setsebool -P httpd_can_network_connect 1
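
Before retrying through the proxy, you can confirm the boolean took effect, and whether SELinux is enforcing at all. The check is guarded so it also runs on hosts without SELinux tooling.

```shell
# Show the SELinux mode and the relevant boolean, if the tools are available.
if command -v getenforce >/dev/null 2>&1; then
    getenforce
    getsebool httpd_can_network_connect 2>/dev/null \
        || echo "boolean unavailable (SELinux disabled?)"
else
    echo "SELinux tools not installed on this host"
fi
```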

A standard robust Nginx configuration for Grafana looks like this:

server {
    listen 80;
    server_name grafana.yourdomain.com;

    location / {
        proxy_set_header Host $http_host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_pass http://127.0.0.1:3000;
    }
}
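
After saving the config, validate and reload nginx before testing end to end. The domain is the hypothetical one from the config above, and the block is guarded so the commands are skipped where nginx is not installed.

```shell
# Syntax-check the nginx configuration, then reload it without downtime.
if command -v nginx >/dev/null 2>&1; then
    sudo nginx -t && sudo systemctl reload nginx \
        || echo "reload failed (needs root?)"
else
    echo "nginx not installed on this host"
fi

# Expect an HTTP 200, or a 302 redirect to /login, from the proxied Grafana.
curl -sI --max-time 10 http://grafana.yourdomain.com/ | head -n 1
```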

By systematically working through process state, logs, network bindings, firewalls, and proxy configurations, you can reliably resolve "Connection Refused" and "Timeout" errors in Grafana, restoring critical visibility to your infrastructure.

Quick Reference Commands

# 1. Check Grafana service status
sudo systemctl status grafana-server

# 2. Inspect logs for 'permission denied' or database locking errors
sudo journalctl -u grafana-server -n 50 --no-pager

# 3. Check for Out of Memory (OOM) killer invocations
sudo dmesg -T | grep -i oom

# 4. Fix directory ownership and permissions (Common fix for startup failures)
sudo chown -R grafana:grafana /var/lib/grafana
sudo chown -R grafana:grafana /etc/grafana
sudo chown -R grafana:grafana /var/log/grafana

# 5. Verify Grafana is listening on the correct interface (0.0.0.0 vs 127.0.0.1)
sudo netstat -tulpn | grep 3000

# 6. Open UFW firewall for port 3000 to fix timeouts
sudo ufw allow 3000/tcp

# 7. Restart the service after applying fixes
sudo systemctl restart grafana-server

Error Medic Editorial

Error Medic Editorial comprises senior SREs, DevOps engineers, and systems administrators dedicated to producing in-depth, actionable troubleshooting guides for modern infrastructure and observability stacks.
