Troubleshooting 'Datadog Not Working': A Complete Diagnostic Guide
Fix a Datadog Agent that is not running or not reporting metrics. A step-by-step guide to diagnosing connection issues, API key errors, and integration failures.
- Verify the Datadog Agent is running and the service is not in a failed state.
- Check that the API key is valid and correctly configured in datadog.yaml. The Agent authenticates with the API key, not the Application key.
- Ensure network connectivity to Datadog endpoints on port 443 (outbound).
- Inspect agent logs for authentication errors or integration-specific failures.
| Method | When to Use | Time | Risk |
|---|---|---|---|
| Restart Agent | Agent is hung or unresponsive | 1 min | Low |
| Network Test | Agent cannot reach Datadog intake | 5 mins | Low |
| Update Keys | Authentication errors in agent logs | 2 mins | Low |
| Reinstall Agent | Corrupted installation or persistent crashes | 10 mins | Medium |
Understanding the Error
When developers or operations teams report that "Datadog is not working," the symptoms can vary widely. It might mean that host metrics are missing, APM traces are not appearing, or a specific integration is failing to collect data. The Datadog Agent is the core component responsible for collecting and forwarding this data. Therefore, troubleshooting almost always begins at the agent level.
Common error messages you might encounter in the Datadog agent logs include:
- Error connecting to Datadog intake: dial tcp: lookup intake.logs.datadoghq.com: no such host (Network issue)
- API Key invalid or HTTP 403 Forbidden (Authentication issue)
- Agent failed to start: datadog.yaml is missing or invalid (Configuration issue)
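As a rough illustration of how these messages map to the three categories, the sketch below classifies a single log line. The function name `classify_agent_error` is hypothetical, and the patterns are taken only from the example messages above, so real agent logs may need additional patterns.

```bash
# Classify one agent log line as network, authentication, or configuration.
# Hypothetical helper: the patterns come from the example messages in this
# guide and are not an exhaustive list of what the agent can log.
classify_agent_error() {
  line=$1
  case "$line" in
    *"no such host"*|*"dial tcp"*)          echo "network" ;;
    *"API Key invalid"*|*"403 Forbidden"*)  echo "authentication" ;;
    *"datadog.yaml"*)                       echo "configuration" ;;
    *)                                      echo "unknown" ;;
  esac
}
```

For example, `classify_agent_error "HTTP 403 Forbidden"` prints `authentication`, which points to Step 3 rather than Step 2.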
Step 1: Diagnose the Agent Status
The first step is always to check the status of the Datadog Agent on the affected host. The agent provides a built-in status command that runs a comprehensive health check.
Run sudo datadog-agent status (it also appears in the command list at the end of this guide). Look for the following sections in the output:
- Agent (v7.x.x): Ensure the version is up-to-date and the process is running.
- Collector: This section lists all configured integrations. Look for integrations marked with [ERROR].
- Forwarder: Check for dropped payloads or connection errors. If the forwarder is dropping data, it usually indicates a network or API key problem.
- Endpoints: Verify that the agent is trying to connect to the correct Datadog site (e.g., datadoghq.com vs datadoghq.eu).
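On a host with many integrations the status output is long, so it can help to filter it. The sketch below reads status output on standard input and shows or counts the lines carrying the [ERROR] marker mentioned above. The function names are hypothetical, and the exact layout of the status output varies between agent versions, so treat this as a quick triage aid only.

```bash
# Print only the lines of status output that contain the [ERROR] marker.
# Reads from standard input; -F makes grep treat the brackets literally.
status_errors() {
  grep -F '[ERROR]' || true
}

# Count the lines containing [ERROR]; prints 0 when there are none.
count_status_errors() {
  grep -c -F '[ERROR]' || true
}
```

A typical invocation would be `sudo datadog-agent status | status_errors`. A count of zero does not prove the agent is healthy; the Forwarder and Endpoints sections still need a manual look.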
Step 2: Verify Network Connectivity
The Datadog Agent requires outbound internet access to send data to Datadog's servers. By default, it communicates over HTTPS (port 443).
If the agent status shows forwarder errors, test connectivity from the host to the Datadog intake endpoints. You can use tools like curl or telnet.
Ensure that your firewalls, security groups, or proxies are not blocking outbound traffic to Datadog's IP ranges.
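A connectivity test along these lines can be scripted. The sketch below assumes the API host follows the pattern api.<site>, which matches the https://api.datadoghq.com example used in this guide; the function names are hypothetical. It only checks that an HTTPS connection can be made and says nothing about whether the API key is accepted.

```bash
# Build the API URL for a Datadog site, defaulting to the US site.
# Assumption: the API host is "api." followed by the site value.
api_url_for_site() {
  site=${1:-datadoghq.com}
  echo "https://api.${site}"
}

# Return 0 if an HTTPS request to the site's API host completes.
# curl exits 0 on any HTTP response (even 4xx), so success here means
# DNS resolution, the TCP connection on port 443, and TLS all worked.
check_datadog_reachable() {
  url=$(api_url_for_site "${1:-}")
  curl -s -o /dev/null --max-time 10 "$url"
}
```

For instance, `check_datadog_reachable datadoghq.eu && echo reachable || echo blocked` gives a quick yes or no answer. If the host sits behind a proxy, curl honors the usual proxy environment variables, which may differ from the agent's own proxy settings.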
Step 3: Check Configuration and Authentication
If the network is fine but data is still not appearing, the issue is likely with authentication or configuration.
- Check datadog.yaml: The main configuration file is usually located at /etc/datadog-agent/datadog.yaml (Linux) or C:\ProgramData\Datadog\datadog.yaml (Windows). Ensure it exists and is readable by the dd-agent user.
- Verify API Key: The most critical setting is the api_key. Ensure it is correct and active in your Datadog account. Do not confuse the API key with the Application Key.
- Verify the Site: Ensure the site parameter matches your Datadog region (e.g., datadoghq.com, datadoghq.eu, us3.datadoghq.com, us5.datadoghq.com, ap1.datadoghq.com). Using the wrong site will result in authentication failures.
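The checks in this step can be partly automated. The sketch below reads api_key and site from a datadog.yaml-style file with a simple line-based match (it is not a YAML parser and ignores inline comments), then tests whether the key has the usual shape. The function names are hypothetical, and the 32-hexadecimal-character format is an assumption about how API keys commonly look; a key with the right shape can still be revoked or belong to a different site.

```bash
# Read a top-level scalar such as api_key or site from a datadog.yaml-style
# file. Simple line match only: surrounding quotes and trailing spaces are
# stripped, but nested keys and inline comments are not handled.
yaml_value() {
  key=$1
  file=$2
  sed -n "s/^${key}:[[:space:]]*//p" "$file" | head -n 1 \
    | tr -d "\"'" | sed 's/[[:space:]]*$//'
}

# Succeed when api_key looks like 32 hexadecimal characters.
# Assumption: this is the common API key format; it does not prove the
# key is active in the Datadog account.
api_key_looks_valid() {
  key=$(yaml_value api_key "$1")
  printf '%s' "$key" | grep -Eq '^[0-9a-fA-F]{32}$'
}
```

On a Linux host, `sudo sh -c` or a root shell is usually needed to read the file, for example `yaml_value site /etc/datadog-agent/datadog.yaml`. An empty result for site means the agent falls back to its default site, datadoghq.com.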
Step 4: Inspect Agent Logs
If the status command doesn't reveal the root cause, dive into the agent logs. The main log file is typically /var/log/datadog/agent.log. Look for ERROR or WARN level messages.
Common log locations:
- Agent: /var/log/datadog/agent.log
- Trace Agent: /var/log/datadog/trace-agent.log (for APM issues)
- Process Agent: /var/log/datadog/process-agent.log (for Live Process issues)
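To scan any of these files for the ERROR and WARN messages mentioned above, a small helper like the following may be convenient. The function name `recent_problems` is hypothetical; the match is a plain text search for the two level names, so a line that merely mentions one of those words will also be shown.

```bash
# Print the lines mentioning ERROR or WARN among the last N lines of a log
# file (N defaults to 50). Prints nothing when there are no matches.
recent_problems() {
  file=$1
  n=${2:-50}
  tail -n "$n" "$file" | grep -E 'ERROR|WARN' || true
}
```

For example, `sudo sh -c` or a root shell can run `recent_problems /var/log/datadog/trace-agent.log 200` when APM traces are missing.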
Step 5: Address Integration-Specific Issues
If core metrics are reporting but a specific integration (e.g., PostgreSQL, Nginx, Kubernetes) is not working:
- Check the integration configuration file in /etc/datadog-agent/conf.d/<integration_name>.d/conf.yaml.
- Ensure the Datadog agent has the necessary permissions to access the service it is monitoring (e.g., database user permissions, file read permissions for logs).
- Run the integration's check on its own with the agent command-line tool (sudo -u dd-agent datadog-agent check <integration_name>) to see detailed errors.
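The three checks above can be combined into one helper. The sketch below assumes a Linux host with the default configuration directory; the function names are hypothetical. It uses the agent's `check` subcommand, which runs a single integration once and prints what it collected along with any errors.

```bash
# Build the default Linux path of an integration's configuration file.
integration_conf_path() {
  echo "/etc/datadog-agent/conf.d/$1.d/conf.yaml"
}

# Verify the dd-agent user can read the config, then run the check once.
run_integration_check() {
  name=$1
  conf=$(integration_conf_path "$name")
  # Test readability as dd-agent, since that is the user the agent runs as.
  if ! sudo -u dd-agent test -r "$conf"; then
    echo "dd-agent cannot read $conf" >&2
    return 1
  fi
  sudo -u dd-agent datadog-agent check "$name"
}
```

For example, `run_integration_check postgres` either reports a permissions problem on the configuration file or prints the check's own output, which usually names the failing connection or query.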
```bash
# 1. Check Datadog Agent service status (systemd)
sudo systemctl status datadog-agent

# 2. Run the comprehensive Datadog Agent status command
sudo datadog-agent status

# 3. Test network connectivity to the Datadog US site
curl -v https://api.datadoghq.com

# 4. View recent agent logs for errors
sudo tail -n 50 /var/log/datadog/agent.log | grep -i error

# 5. Restart the Datadog Agent after configuration changes
sudo systemctl restart datadog-agent
```

DevOps Troubleshooting Team
A collective of senior SREs and DevOps engineers dedicated to solving complex infrastructure and monitoring challenges.