Datadog Not Working: Troubleshooting Agent Status, APM Drops, and Connectivity Errors
Resolve "Datadog not working" issues. Learn to diagnose stopped agents, invalid API keys, port 8126 APM conflicts, and blocked outbound network traffic.
- Verify the Datadog Agent is actually running and API keys are valid by executing 'sudo datadog-agent status'.
- Ensure outbound network traffic to Datadog endpoints (port 443) is not blocked by firewalls, VPC endpoints, or security groups.
- For missing APM traces, check for port 8126 conflicts and verify the trace-agent service is active and receiving data.
- Examine '/var/log/datadog/agent.log' for specific errors such as 'Connection refused' or 'API Key is invalid'.
- Use the 'datadog-agent flare' command to securely bundle logs and configuration files for Datadog Support if immediate fixes fail.
| Method | When to Use | Time | Risk |
|---|---|---|---|
| Agent Restart | Initial triage for frozen metrics or unresponsive agent | 1 min | Low |
| Verify API Keys | Agent status explicitly reports 'Invalid API Key' | 2 mins | Low |
| Network / Firewall Audit | Agent logs show 'dial tcp: i/o timeout' or 'no such host' | 15 mins | Medium |
| Port Conflict Resolution | APM is missing and port 8126 is bound by another service | 10 mins | Medium |
| Agent Reinstallation | Corrupted binaries, botched upgrades, or missing config files | 10 mins | Medium |
Understanding the Error
When engineers report that "Datadog is not working," the issue typically manifests in one of three distinct ways: missing infrastructure metrics, dropped APM traces, or silent log forwarders. Because the Datadog Agent is a sophisticated daemon running multiple concurrent subprocesses (core agent, trace-agent, process-agent, security-agent), a failure in any single component can create dangerous blind spots in your observability pipeline.
The most critical first step is identifying which part of Datadog is failing. Is the entire host offline in the Datadog UI, or are you just missing specific custom metrics? Are your logs flowing, but your distributed traces breaking?
Common Error Messages
Before making changes, check /var/log/datadog/agent.log (Linux) or C:\ProgramData\Datadog\logs\agent.log (Windows) for these exact error strings:
- Error: API Key is invalid
- dial tcp: lookup intake.logs.datadoghq.com: no such host
- Agent is not running
- Failed to send traces: payload too large
- connection refused: port 8126
Step 1: Diagnose with the Status Command
The Datadog Agent comes with a built-in diagnostic tool. Running the status command provides a comprehensive overview of the agent's health, collector processes, and forwarder status.
Run the following command on your host:
sudo datadog-agent status
Pay close attention to the Forwarder section. If you see multiple retries or dropped payloads, you are likely dealing with a network issue or an invalid API key.
API Key Validation:
If the status output shows API Key is invalid, verify your datadog.yaml file. Ensure that api_key matches the key provided in your Datadog Organization Settings. Remember that Datadog has multiple sites (e.g., US1, US3, EU). If your account is in EU, but your agent is defaulting to US1, your API key will register as invalid. Ensure site: datadoghq.eu (or your specific site) is correctly set in datadog.yaml.
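As a minimal sketch, the relevant keys in /etc/datadog-agent/datadog.yaml (the default Linux path) look like this; the key value is a placeholder, and the site must match your account:

```yaml
## /etc/datadog-agent/datadog.yaml
api_key: "<YOUR_DATADOG_API_KEY>"   # placeholder; copy from Organization Settings > API Keys
site: "datadoghq.eu"                # must match your account's site, e.g. datadoghq.com, us3.datadoghq.com, datadoghq.eu
```

After editing, restart the Agent (for example, sudo systemctl restart datadog-agent on systemd hosts) so the new values take effect.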
Step 2: Fix Network and Connectivity Issues
The Datadog Agent pushes data outwards via HTTPS on port 443. It does not require any inbound ports to be opened. If metrics aren't showing up, a firewall, proxy, or security group is likely blocking outbound traffic.
To isolate DNS and network routing issues, try connecting to Datadog's infrastructure endpoints directly from the affected host:
curl -v https://app.datadoghq.com
curl -v https://intake.logs.datadoghq.com
If the connection times out (dial tcp: i/o timeout), verify your AWS Security Groups, Azure NSGs, or local iptables rules. If you are operating in an air-gapped environment or a strict enterprise network, you must configure the Datadog Agent to route its traffic through your corporate proxy. Edit the datadog.yaml file to include your proxy settings under the proxy block.
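A minimal proxy block in datadog.yaml looks like the following; the proxy hostname and port here are placeholders for your corporate proxy:

```yaml
## datadog.yaml -- route Agent traffic through a corporate proxy
proxy:
  https: "http://proxy.internal.example.com:3128"   # placeholder proxy address
  http: "http://proxy.internal.example.com:3128"
  no_proxy:
    - 169.254.169.254   # keep cloud metadata lookups direct
```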
Step 3: Troubleshoot Missing APM Traces
If infrastructure metrics are visible but APM traces are missing, the issue is isolated to the trace-agent.
Applications send traces to the local Datadog trace-agent over port 8126 (TCP), while custom metrics go to DogStatsD over port 8125 (UDP). If another process on the host is already listening on port 8126, the trace-agent will fail to bind to it, and your traces will be silently dropped.
Check for port conflicts:
sudo netstat -tulpn | grep 8126
If the port is available but traces are still failing, verify that your application's Datadog tracing library (e.g., dd-trace-js, dd-trace-py) is configured with the correct environment variables. Specifically, ensure DD_AGENT_HOST is set correctly. In containerized environments like Kubernetes or ECS, DD_AGENT_HOST must often point to the host's IP address or the dedicated DaemonSet service rather than localhost.
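One common pattern when the Agent runs as a DaemonSet with a host port is to resolve the node's IP via the Kubernetes downward API. A sketch of the relevant excerpt from an application pod spec (assuming this deployment model):

```yaml
## Application pod spec excerpt: point the tracer at the node-local Agent.
env:
  - name: DD_AGENT_HOST
    valueFrom:
      fieldRef:
        fieldPath: status.hostIP   # the node's IP, where the DaemonSet Agent listens
  - name: DD_TRACE_AGENT_PORT
    value: "8126"
```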
Step 4: Resolve Log Collection Failures
If logs are not appearing in Datadog, confirm that log collection is explicitly enabled. By default, log collection is disabled in the Datadog Agent to prevent unexpected billing spikes.
In your datadog.yaml, verify:
logs_enabled: true
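Note that logs_enabled only turns on the log collector; the Agent also needs a per-integration configuration telling it which files to tail. A hypothetical example for a custom application (the path, service, and source values below are placeholders):

```yaml
## /etc/datadog-agent/conf.d/myapp.d/conf.yaml (hypothetical custom integration)
logs:
  - type: file
    path: /var/log/myapp/app.log   # placeholder log path
    service: myapp                 # placeholder service name
    source: custom                 # tells Datadog which parsing pipeline to apply
```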
Next, verify the Datadog Agent user (dd-agent) has the necessary read permissions for your log files. If your application writes logs to /var/log/myapp/app.log, but those files are owned by root with 600 permissions, the Datadog Agent will silently fail to read them. Grant read access to the dd-agent user or add dd-agent to the appropriate user group.
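The permission fix itself is ordinary Unix file-mode work. Here is a sketch using a temporary file as a stand-in for the real log path; on a real host you would also chgrp the file to a group containing dd-agent before loosening the mode:

```shell
#!/bin/sh
# Stand-in for /var/log/myapp/app.log: a root-style 600 file the Agent cannot read.
LOG="$(mktemp)"
echo "2024-01-01 INFO app started" > "$LOG"
chmod 600 "$LOG"            # owner-only: the Agent would silently fail to read this
chmod 640 "$LOG"            # grant group read; pair with 'chgrp dd-agent' on a real host
ls -l "$LOG" | cut -c1-10   # shows -rw-r-----
rm -f "$LOG"
```

Alternatively, ACLs (setfacl -m u:dd-agent:r) grant read access without changing the file's group.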
Step 5: The Nuclear Option (Flare)
If you have verified API keys, network connectivity, port availability, and file permissions, and the agent still isn't working, it's time to gather the diagnostic bundle known as a "flare".
Running sudo datadog-agent flare will compress all agent logs, configurations, and internal state into a single archive and upload it directly to Datadog Support. It redacts sensitive passwords and provides the support engineers with exactly what they need to debug complex state corruption or edge-case bugs.
Quick Reference Commands
# 1. Check the complete status of the Datadog Agent
sudo datadog-agent status
# 2. Check for trace-agent port conflicts (APM issues)
sudo netstat -tulpn | grep 8126
# 3. Test outbound network connectivity to Datadog API
curl -v https://app.datadoghq.com
# 4. Tail the agent logs for real-time error messages
sudo tail -f /var/log/datadog/agent.log
# 5. Generate a support flare (if all else fails)
sudo datadog-agent flare
Error Medic Editorial
Senior SRE and DevOps engineering team specializing in deep-dive troubleshooting for enterprise observability platforms, cloud infrastructure, and distributed systems.