Troubleshooting InfluxDB Connection Refused and Out of Memory Errors
Resolve 'InfluxDB connection refused', OOM crashes, and slow query timeouts. Learn to diagnose bind-address issues, optimize TSI indexes, and fix retention policies.
- Connection refused is often a symptom of the InfluxDB process crashing due to Out of Memory (OOM) kills by the Linux kernel.
- Network misconfigurations, such as binding to 127.0.0.1 instead of 0.0.0.0 or firewall blocks, are common causes for external connection failures.
- High series cardinality and unbounded queries lead to InfluxDB slow queries, timeouts, and eventual memory exhaustion.
- Quick Fix: Check dmesg for OOM kills, verify bind-address in influxdb.conf, and enable TSI (Time Series Index) to reduce RAM usage.
| Method | When to Use | Time | Risk |
|---|---|---|---|
| Change bind-address | Service runs but external clients get Connection Refused | 5 mins | Low |
| Increase ulimit (File Descriptors) | Logs show 'too many open files' before crashing | 10 mins | Low |
| Enable TSI (Time Series Index) | High series cardinality causing OOM kills | 1-2 hours | Medium (Requires downtime and index rebuild) |
| Implement Downsampling & RPs | Database size growing unbounded, causing slow queries | Days | High (Data deletion involved) |
Understanding the Error: InfluxDB Connection Refused
When you encounter curl: (7) Failed to connect to localhost port 8086: Connection refused or dial tcp 127.0.0.1:8086: connect: connection refused, it usually means the InfluxDB service is either not running, crashing in a loop, or bound to the wrong network interface.
However, in production environments, a "Connection Refused" error is rarely just a simple networking mistake. More often it is the downstream symptom of a more severe underlying issue: an InfluxDB Out of Memory (OOM) kill by the Linux kernel, or the database becoming completely unresponsive under slow queries and timeouts.
When InfluxDB runs out of memory, the Linux OOM killer terminates the influxd process. Subsequently, any client attempting to write or query data receives a "Connection Refused" error because the daemon is dead. Similarly, if the database is locked up processing a massive, unoptimized query, the HTTP API may fail to respond within the reverse proxy's timeout window, or the TCP backlog might fill up, leading to rejected connections.
Common Root Causes
- Service Stopped or Crash Looping (OOM Kills): The most common reason for unexpected connection refusals. The influxd process consumes memory proportional to series cardinality and query complexity.
- Bind Address Configuration: The bind-address or http-bind-address in influxdb.conf is set to 127.0.0.1, but clients are trying to connect via a public or Docker network IP.
- Firewall and Security Groups: Port 8086 (HTTP API) or 8088 (RPC) is blocked by iptables, ufw, or cloud provider security groups.
- File Descriptor Exhaustion: InfluxDB opens many file handles for TSM and TSI files. If ulimit -n is too low, it stops accepting new connections.
Step 1: Diagnose the Connection Refused Error
First, determine if the process is actually running or if it was recently killed.
Check the Service Status:
Look for Active: failed or Active: inactive. If it recently restarted, check the uptime. If the service is not running, clients cannot connect.
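For example, on a systemd-based host with the stock influxdb unit name:

```shell
# Check whether the daemon is running and see how long it has been up.
sudo systemctl status influxdb --no-pager

# Review the last hour of service logs for signs of a crash loop.
sudo journalctl -u influxdb --since "-1h" --no-pager | tail -n 50
```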
Check for OOM Kills in the Kernel Log:
If you see Out of memory: Killed process <PID> (influxd), your connection refused error is definitively an OOM issue. InfluxDB requires substantial RAM when handling high cardinality or complex GROUP BY time intervals on large datasets.
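One way to scan the kernel log is sketched below; find_influx_oom is a small helper of my own naming, not an InfluxDB tool:

```shell
# Filter kernel-log lines for OOM kills of the influxd process.
find_influx_oom() {
  grep -iE 'out of memory|oom-killer|killed process' | grep -i 'influxd'
}

# On a live host, pipe the kernel ring buffer through it:
#   dmesg -T | find_influx_oom
```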
Verify Network Binding:
If the service is running perfectly fine but external clients cannot connect, check the listening ports. You should see it listening on :::8086 or 0.0.0.0:8086. If it only shows 127.0.0.1:8086, external traffic will be refused.
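To inspect the listening sockets (ss is the modern replacement for netstat; either works):

```shell
# List TCP listeners owned by influxd, with bound address and port.
sudo ss -tlnp | grep influxd
# or, on older systems:
sudo netstat -tulpn | grep influxd
```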
Step 2: Fixing "Connection Refused" due to Network/Config
If the issue is purely networking or configuration:
1. Update the Bind Address:
Edit your /etc/influxdb/influxdb.conf (or the respective Docker environment variables) and set bind-address = ":8086" under the [http] section. Restart the service.
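A minimal sketch of the relevant influxdb.conf section:

```toml
[http]
  enabled = true
  # Listen on all interfaces, port 8086. To restrict to a single
  # interface, use an explicit IP instead, e.g. "10.0.0.5:8086".
  bind-address = ":8086"
```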
2. Adjust File Descriptors:
If you see too many open files in the logs leading to dropped connections, edit /lib/systemd/system/influxdb.service and add LimitNOFILE=65536. Then run sudo systemctl daemon-reload && sudo systemctl restart influxdb.
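As an alternative to editing the packaged unit file in place, a systemd drop-in keeps the override across package upgrades; a sketch:

```shell
# Create a drop-in override raising the file-descriptor limit.
sudo mkdir -p /etc/systemd/system/influxdb.service.d
printf '[Service]\nLimitNOFILE=65536\n' | \
  sudo tee /etc/systemd/system/influxdb.service.d/limits.conf

sudo systemctl daemon-reload && sudo systemctl restart influxdb

# Confirm the limit took effect for the running process:
grep 'open files' /proc/$(pgrep -x influxd)/limits
```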
Step 3: Resolving OOM Kills and Memory Issues
If the "Connection Refused" is caused by OOM kills, you must address the memory consumption. InfluxDB memory usage is driven by series cardinality and query payload.
1. Switch to TSI (Time Series Index):
By default, older versions of InfluxDB (1.x) keep the entire series index in memory (in-memory index). If you have millions of unique series (e.g., generating unique tags per request like UUIDs), you will run out of RAM.
Enable TSI to move the index to disk. In influxdb.conf, under [data], set index-version = "tsi1".
Note: If you have existing data, you must rebuild the index using the influx_inspect buildtsi tool before restarting.
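A sketch of the conversion on a stock 1.x install (paths assume the default Debian/RPM layout):

```shell
# Stop the daemon before rebuilding the index.
sudo systemctl stop influxdb

# Convert existing in-memory index shards to TSI files on disk.
sudo -u influxdb influx_inspect buildtsi \
  -datadir /var/lib/influxdb/data \
  -waldir /var/lib/influxdb/wal

# Then set index-version = "tsi1" under [data] in influxdb.conf
# and start the service again.
sudo systemctl start influxdb
```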
2. Implement Proper Retention Policies (RPs): Keeping raw, high-resolution data forever guarantees eventual OOMs and slow queries. Create a Retention Policy to drop high-res data after 30 days.
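For example, via the influx CLI (database and policy names are illustrative):

```shell
# Keep raw data for 30 days, then drop it automatically.
influx -execute 'CREATE RETENTION POLICY "raw_30d" ON "mydb" DURATION 30d REPLICATION 1 DEFAULT'
```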
3. Use Continuous Queries (CQs) or Tasks: Downsample your data. Instead of querying 6 months of 1-second resolution data, query 1-hour resolution data using CQs.
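A sketch of a 1.x continuous query (all names are illustrative; the "long_term" retention policy must already exist):

```shell
# Downsample per-second CPU data into hourly means stored under a
# longer-lived retention policy, preserving all tags via GROUP BY *.
influx -execute 'CREATE CONTINUOUS QUERY "cq_cpu_1h" ON "mydb" BEGIN SELECT mean("usage") INTO "mydb"."long_term"."cpu_hourly" FROM "cpu" GROUP BY time(1h), * END'
```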
Step 4: Mitigating InfluxDB Slow Queries and Timeouts
Sometimes, connection refused or 502 Bad Gateway errors happen because the HTTP request times out while InfluxDB is churning through a massive query.
1. Query Timeout Configuration:
Prevent rogue queries from locking up the database. In influxdb.conf under [coordinator], configure query-timeout = "60s" and log-queries-after = "10s". Setting log-queries-after allows you to identify which queries are causing the bottlenecks in your logs.
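A sketch of the corresponding influxdb.conf fragment:

```toml
[coordinator]
  # Abort any query running longer than 60 seconds.
  query-timeout = "60s"
  # Log (but do not kill) queries slower than 10 seconds.
  log-queries-after = "10s"
  # 0 means unlimited; set a positive number to cap concurrency.
  max-concurrent-queries = 0
```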
2. Analyze Slow Queries:
Run SHOW QUERIES in the InfluxDB CLI. If you see queries executing for hundreds of seconds, you may need to kill them using KILL QUERY <qid>.
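For example:

```shell
# List in-flight queries with their qid and elapsed time.
influx -execute 'SHOW QUERIES'

# Kill a runaway query by its qid (42 here is a placeholder).
influx -execute 'KILL QUERY 42'
```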
3. Optimize Your Queries:
- Avoid leading wildcards in regex tag matching (=~ /.*value/).
- Limit the time range: Always include a WHERE time > now() - 1h clause. Querying without a time bound scans the entire database.
- Reduce GROUP BY cardinality: Grouping by a tag that has 100,000 unique values will create 100,000 buckets in memory, instantly causing an OOM or massive timeout.
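Putting those rules together, a bounded, low-cardinality query might look like this (measurement and tag names are illustrative):

```shell
# Explicit time range, exact tag match, coarse grouping interval.
influx -database mydb -execute \
  "SELECT mean(\"duration\") FROM \"requests\" WHERE time > now() - 1h AND \"path\" = '/login' GROUP BY time(1m), \"host\""
```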
Final Verification
After applying fixes (especially memory limits and TSI):
- Monitor memory usage with top or an external monitoring tool like Telegraf + Grafana.
- Watch the InfluxDB logs: tail -f /var/log/influxdb/influxd.log for any level=error or level=warn.
- Test external connections from a different host: curl -I http://<influxdb-ip>:8086/ping. If it returns a 204 No Content HTTP status, the connection is successfully established and InfluxDB is healthy.
Quick Command Reference

```shell
# Check service status
sudo systemctl status influxdb

# Check for OOM Kills in the kernel log
dmesg -T | grep -i oom-killer

# Check active listening ports for InfluxDB
sudo netstat -tulpn | grep influxd

# Edit influxdb.conf to fix bind-address or enable TSI
sudo nano /etc/influxdb/influxdb.conf

# Restart the service after making config changes
sudo systemctl restart influxdb

# Verify connection is working
curl -I http://localhost:8086/ping
```

Error Medic Editorial
A collective of senior SREs, DevOps engineers, and database administrators dedicated to untangling complex infrastructure issues and providing clear, actionable troubleshooting guides.