Error Medic

Fixing 'Connection Refused' and Timeout Errors in InfluxDB: A Complete Guide

Diagnose and resolve InfluxDB connection refused, out of memory (OOM), and slow query timeouts. Learn to tune influxdb.conf and optimize cardinality.

Key Takeaways
  • Connection refused usually indicates the InfluxDB process has crashed (often due to OOM) or is bound to the wrong interface.
  • High series cardinality is the leading cause of Out of Memory (OOM) kills by the Linux kernel.
  • Slow queries and timeouts are frequently caused by unbounded time ranges or lack of appropriate continuous queries/downsampling.
  • Tuning the [data] and [coordinator] sections in influxdb.conf is critical for stabilizing high-throughput environments.
InfluxDB Troubleshooting Approaches Compared
| Symptom | Root Cause Analysis | Fix Method | Downtime Risk |
|---|---|---|---|
| Connection Refused (Port 8086) | Check systemctl status and dmesg for OOM-killer | Increase RAM, adjust max-series-per-database | High (Requires Restart) |
| Client Timeouts | Check query log for long-running queries | Optimize query time range, add LIMIT, use TSI1 | Low |
| High CPU / Slow Queries | Run SHOW QUERIES, check for table scans | Kill bad queries, implement Continuous Queries | Medium |

Understanding InfluxDB Connection and Performance Errors

When working with time-series data at scale, encountering a connection refused error on port 8086 is a rite of passage. While it might initially look like a network or firewall issue, with InfluxDB a refused connection is usually a symptom of a deeper resource exhaustion problem: typically the process crashing after an Out of Memory (OOM) event triggered by unbounded series cardinality or an unoptimized query.

Diagnosing 'Connection Refused'

When your application suddenly throws dial tcp 127.0.0.1:8086: connect: connection refused, the first step is to verify the process state.

Often, the InfluxDB service has silently died. Check the service status first:

sudo systemctl status influxdb

If the service is inactive or in a failed state, the next crucial step is to check the kernel ring buffer for the notorious OOM killer:

sudo dmesg -T | grep -i 'killed process.*influxd'

If you see output here, your database didn't just crash; it was assassinated by the Linux kernel to protect system stability.
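The two checks above can be made repeatable with a small bash sketch. The function names oom_verdict and listen_verdict are hypothetical. Both read text on stdin (kernel log lines and ss -ltn output, respectively), so they can be tested without a live server. The listener check also covers the other common cause of refused connections, InfluxDB being bound to the wrong interface. It assumes the default port 8086.

```bash
#!/usr/bin/env bash
# Hypothetical triage helpers. Both read text on stdin so they can be tested offline.

# oom_verdict: prints "oom-killed" if the kernel log lines on stdin show that
# influxd was killed by the OOM killer, otherwise "no-oom-evidence".
oom_verdict() {
  if grep -qiE 'killed process [0-9]+ \(influxd\)'; then
    echo "oom-killed"
  else
    echo "no-oom-evidence"
  fi
}

# listen_verdict: reads `ss -ltn` output (column 4 is Local Address:Port) and
# reports the last listener found on port 8086, flagging loopback-only binds.
listen_verdict() {
  awk '$4 ~ /:8086$/ { addr = $4 }
       END {
         if (addr == "") print "not-listening"
         else if (addr ~ /^(127\.0\.0\.1|\[::1\]):/) print "loopback-only " addr
         else print "listening " addr
       }'
}
```

Run them as sudo dmesg -T | oom_verdict and ss -ltn | listen_verdict. A loopback-only result means remote clients are refused even though the process is healthy. In InfluxDB 1.x the listen address is set by bind-address in the [http] section.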

The Cardinality Problem: Why InfluxDB Runs Out of Memory

InfluxDB indexes tags to make querying fast. The combination of measurement, tag set, and field key creates a 'series'. If you use highly variable data as tags (like UUIDs, IP addresses, or random strings), your series cardinality explodes. In InfluxDB 1.x the default index (index-version = "inmem") keeps every series key in RAM, so as cardinality grows it will consume all available memory, leading directly to OOM kills and subsequent connection refused errors. The disk-based Time Series Index (tsi1) is the alternative.
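The following bash sketch shows why a tag like a request ID is so costly. It counts distinct series in a batch of line protocol, using the definition above (measurement, tag set, and field key). count_series is a hypothetical helper with a simplified parser. It assumes there are no escaped spaces or commas and no string field values containing commas. It also assumes tags are always written in the same order; InfluxDB itself sorts tags, so it would treat reordered tags as the same series.

```bash
#!/usr/bin/env bash
# count_series: read InfluxDB line protocol on stdin and print the number of
# distinct (measurement + tag set, field key) combinations.
# Simplified parser: assumes no escaped spaces/commas and consistent tag order.
count_series() {
  awk 'NF >= 2 && $1 !~ /^#/ {
         n = split($2, fields, ",")     # $2 is the field set, e.g. latency=12,code=200i
         for (i = 1; i <= n; i++) {
           split(fields[i], kv, "=")
           print $1 "|" kv[1]           # series key plus field key
         }
       }' | sort -u | wc -l | tr -d ' '
}
```

For example, three points tagged only with host=web1 or host=web2 produce 2 series. The same three points with an added request_id tag produce 3, because every point becomes its own series. In production the series count then grows with request volume rather than with the number of hosts.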

How to check cardinality:

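You can use the built-in influx CLI to check the exact series count for a database. If the process keeps crashing, you first need to keep it up long enough to answer, for example by temporarily giving the host more memory. Alternatively, you can estimate cardinality offline from the TSM files with influx_inspect report.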

SHOW SERIES EXACT CARDINALITY ON "your_database"
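Once you know the total is high, the next question is which tag key is responsible. Here is a minimal bash sketch that assumes the InfluxDB 1.x influx CLI with its -database and -execute flags, and InfluxQL's SHOW TAG VALUES EXACT CARDINALITY statement. tag_value_cardinality is a hypothetical helper. INFLUX_BIN exists only so the command can be swapped out, for example with echo for a dry run.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: INFLUX_BIN defaults to the 1.x influx CLI but can be
# overridden (e.g. INFLUX_BIN=echo) to print the commands instead of running them.
INFLUX_BIN=${INFLUX_BIN:-influx}

# tag_value_cardinality DB MEASUREMENT TAG_KEY...
# For each tag key, ask InfluxDB for the exact number of distinct values.
tag_value_cardinality() {
  local db=$1 measurement=$2
  shift 2
  local key
  for key in "$@"; do
    echo "== $measurement.$key"
    "$INFLUX_BIN" -database "$db" -execute \
      "SHOW TAG VALUES EXACT CARDINALITY FROM \"$measurement\" WITH KEY = \"$key\""
  done
}
```

Usage: tag_value_cardinality your_database your_measurement host request_id. A tag key whose value count tracks the series count (UUIDs, IP addresses, request IDs) is usually a candidate to store as a field instead.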

Resolving Timeouts and Slow Queries

If the connection isn't refused but clients are experiencing net/http: request canceled (Client.Timeout exceeded while awaiting headers), the issue is query performance.

Queries become slow when they scan too many shards or return too many data points. Once such a query runs longer than the client's timeout (or the server's query-timeout, if set), the request fails.

  1. Identify the culprit: Use SHOW QUERIES to find long-running queries.
  2. Kill the query: Use KILL QUERY <id> to stop it and free up resources.
  3. Optimize: Ensure all queries have a tight WHERE time > ... AND time < ... clause. If you are querying months of data, you must implement downsampling via Continuous Queries (CQs) in InfluxDB 1.x or Tasks in InfluxDB 2.x to aggregate data into lower-resolution measurements or buckets.
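Step 3 can be sketched for InfluxDB 1.x as follows. downsample_cq is a hypothetical helper that only prints an InfluxQL CREATE CONTINUOUS QUERY statement, so you can review it before running it. The database, measurement, and field names are placeholders. GROUP BY time(...), * keeps every tag on the downsampled points.

```bash
#!/usr/bin/env bash
# downsample_cq DB SOURCE_MEASUREMENT TARGET_MEASUREMENT FIELD INTERVAL
# Prints (does not execute) a 1.x continuous query that writes mean(FIELD) per
# INTERVAL into TARGET_MEASUREMENT, preserving all tags.
downsample_cq() {
  local db=$1 src=$2 dst=$3 field=$4 interval=$5
  printf 'CREATE CONTINUOUS QUERY "cq_%s" ON "%s" BEGIN SELECT mean("%s") AS "%s" INTO "%s" FROM "%s" GROUP BY time(%s), * END\n' \
    "$dst" "$db" "$field" "$field" "$dst" "$src" "$interval"
}
```

Usage: influx -execute "$(downsample_cq your_database http http_1h latency 1h)". Long-range dashboards can then read http_1h with a bounded clause such as WHERE time > now() - 30d. A CQ only processes new intervals as they arrive, so historical data has to be backfilled with a one-off SELECT ... INTO query.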

Configuration Fixes (influxdb.conf)

To prevent the database from taking itself down, implement safety limits in /etc/influxdb/influxdb.conf:

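  • max-series-per-database ([data] section): Caps the number of series per database to prevent OOM. Writes that would create series beyond the cap are rejected. (Default is 1,000,000; 0 disables the limit.)
  • max-values-per-tag ([data] section): Caps the number of distinct values per tag key to prevent unbounded tag growth. (Default is 100,000; 0 disables the limit.)
  • query-timeout ([coordinator] section): Kills runaway queries before they consume all CPU, e.g. "60s". (Default is "0s", meaning no timeout.)
  • index-version = "tsi1" ([data] section): Switches from the in-memory index to the Time Series Index (TSI) if you legitimately need high cardinality. TSI spills the index to disk, trading IOPS for RAM. Existing shards keep their old index until converted with influx_inspect buildtsi while the service is stopped.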
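The settings above live in different sections of influxdb.conf, which makes them easy to misplace. This bash sketch reads them back from the expected sections and flags risky defaults. get_setting and audit_influx_conf are hypothetical helpers, and the parser is a simplification. It assumes one key = value per line and skips commented-out lines. An absent key means InfluxDB falls back to its built-in default.

```bash
#!/usr/bin/env bash
# get_setting FILE SECTION KEY: print KEY's value from [SECTION] of a TOML-style
# influxdb.conf, without quotes or trailing comments. Prints nothing if absent.
get_setting() {
  awk -v sec="[$2]" -v key="$3" '
    /^[[:space:]]*\[/ { hdr = $0; gsub(/[[:space:]]/, "", hdr); insec = (hdr == sec); next }
    insec && ($0 ~ ("^[[:space:]]*" key "[[:space:]]*=")) {
      val = $0
      sub(/^[^=]*=[[:space:]]*/, "", val)   # drop the key and equals sign
      sub(/[[:space:]]*#.*$/, "", val)      # drop a trailing comment
      gsub(/"/, "", val)                    # drop quotes
      sub(/[[:space:]]+$/, "", val)         # drop trailing whitespace
      print val
      exit
    }' "$1"
}

# audit_influx_conf [FILE]: warn about the index, timeout, and series-limit settings.
audit_influx_conf() {
  local conf=${1:-/etc/influxdb/influxdb.conf} idx qt ms
  idx=$(get_setting "$conf" data index-version)
  qt=$(get_setting "$conf" coordinator query-timeout)
  ms=$(get_setting "$conf" data max-series-per-database)
  [[ "$idx" == "tsi1" ]] || echo "WARN: [data] index-version is '${idx:-inmem (default)}', not tsi1"
  [[ -n "$qt" && "$qt" != "0s" ]] || echo "WARN: [coordinator] query-timeout is '${qt:-0s (default)}', so runaway queries are never killed"
  [[ "$ms" != "0" ]] || echo "WARN: [data] max-series-per-database is 0 (unlimited)"
  echo "INFO: [data] max-series-per-database = ${ms:-1000000 (default)}"
}
```

Usage: audit_influx_conf /etc/influxdb/influxdb.conf. A clean config prints only the INFO line.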

After making changes, always restart the service: sudo systemctl restart influxdb.
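A restart is exactly when connection refused errors reappear, so it helps to wait until the HTTP API answers before pointing clients back at it. This minimal sketch assumes curl is installed and relies on the /ping endpoint, which returns HTTP 204 when InfluxDB is up. wait_for_influx is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# wait_for_influx [URL] [TIMEOUT_SECONDS]: poll the /ping endpoint once per second
# until it returns HTTP 204. Returns 0 on success, 1 if the deadline passes.
wait_for_influx() {
  local url=${1:-http://localhost:8086/ping} timeout=${2:-30} code=""
  local deadline=$((SECONDS + timeout))
  while (( SECONDS < deadline )); do
    # curl prints 000 as the status code when the connection is refused
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 2 "$url" 2>/dev/null)
    if [[ "$code" == "204" ]]; then
      echo "InfluxDB is answering on $url"
      return 0
    fi
    sleep 1
  done
  echo "InfluxDB did not answer $url within ${timeout}s (last HTTP code: ${code:-none})" >&2
  return 1
}
```

Usage: sudo systemctl restart influxdb && wait_for_influx http://localhost:8086/ping 60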

Quick Diagnostic Commands

```bash
# 1. Check if the process was killed by the kernel due to OOM
sudo dmesg -T | grep -i 'killed process.*influxd'

# 2. Check current active queries causing load
influx -execute 'SHOW QUERIES'

# 3. Kill a stuck query (replace 42 with the actual query ID)
# influx -execute 'KILL QUERY 42'

# 4. Check the size of your WAL and Data directories
sudo du -sh /var/lib/influxdb/wal
sudo du -sh /var/lib/influxdb/data

# 5. Review critical configuration limits (example grep)
grep -E '(max-series|query-timeout|index-version)' /etc/influxdb/influxdb.conf
```

Error Medic Editorial

Our SRE team specializes in high-availability database infrastructure, time-series data architectures, and distributed system performance tuning.
