Datadog Agent Not Reporting: Troubleshooting 'Agent is not sending metrics' and Connection Errors
Fix Datadog agent not reporting metrics. Learn to troubleshoot API key errors, site misconfigurations, NTP time drift, and network connectivity issues.
- Verify the API key and 'site' parameter in datadog.yaml to prevent 403 Forbidden errors.
- Check outbound network connectivity on port 443 to the specific Datadog intake servers for your region.
- Ensure NTP time synchronization is accurate; time skew causes Datadog to drop metric payloads.
- Inspect /var/log/datadog/agent.log for specific forwarder or intake connection errors.
| Method | When to Use | Time | Risk |
|---|---|---|---|
| Agent Status Command | Initial triage to see component health and forwarder queues | 1 min | Low |
| Flare Creation | When escalating to Datadog support or doing deep offline analysis | 3 mins | Low |
| Network Curl Test | Agent logs show 'Connection refused' or 'Timeout' | 2 mins | Low |
| NTP Sync Verification | Metrics are missing but logs show successful HTTP 200 posts | 5 mins | Medium |
Understanding the Error
When a Datadog Agent is not reporting, it means the host has lost communication with the Datadog backend intake servers. This manifests in the Datadog UI as gaps in metric graphs, hosts appearing as '???' or completely disappearing from the infrastructure list, and missing APM traces or logs. Because the Datadog Agent is a complex, multi-process daemon (core agent, trace-agent, process-agent), a failure in reporting can be systemic or localized to a specific telemetry type.
Typical error messages you might encounter in the logs include:
- ERROR | (pkg/forwarder/worker.go) | Error while processing transaction: error: HTTP 403 Forbidden
- x509: certificate signed by unknown authority
- Error: Post "https://5-0-0-app.agent.datadoghq.com/api/v1/validate": dial tcp: lookup 5-0-0-app.agent.datadoghq.com: no such host
- context deadline exceeded (Client.Timeout exceeded while awaiting headers)
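Each of the messages above tends to point at a different root cause, so it can help to bucket log lines before digging in. The following is a minimal sketch, not part of the agent: the function names and category labels (credentials_or_site, tls_trust, dns, network) are hypothetical and chosen only for this illustration.

```shell
# Map a single agent.log line to a likely root cause.
# The category names are hypothetical labels for this sketch, not agent output.
classify_dd_error() {
  case "$1" in
    *"403 Forbidden"*)                       echo "credentials_or_site" ;;
    *"x509: certificate signed by unknown"*) echo "tls_trust" ;;
    *"no such host"*)                        echo "dns" ;;
    *"context deadline exceeded"*|*"Connection refused"*|*"Timeout"*)
                                             echo "network" ;;
    *)                                       echo "unknown" ;;
  esac
}

# Read log lines on stdin and print a count per category.
summarize_dd_errors() {
  while IFS= read -r line; do
    classify_dd_error "$line"
  done | sort | uniq -c
}
```

A possible invocation, assuming the default Linux log path: sudo grep ERROR /var/log/datadog/agent.log | summarize_dd_errors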
Root Causes
- Network & Firewall Restrictions: The agent relies on outbound HTTPS (port 443) to send data. If a firewall, security group, or egress proxy blocks this traffic, the agent will queue metrics until memory limits are reached, then drop them.
- Misconfigured Credentials or Region: Datadog operates across multiple isolated regions (US1, US3, US5, EU1, AP1), and an API key is only valid in the region where the account lives. If the site parameter in datadog.yaml is left at its default of datadoghq.com (US1) while your account is actually in EU1 (datadoghq.eu), the intake will reject the payloads with a 403 Forbidden error.
- Time Synchronization (NTP) Drift: Datadog's intake servers validate the timestamps of incoming metrics. If your host's clock drifts significantly (typically > 10 minutes) from UTC, the payload will be successfully transmitted but silently dropped by the backend.
- Resource Starvation (OOMKilled): If the agent exceeds its memory limit, which is common in heavily containerized Kubernetes environments where resource limits are not sized properly, the OS out-of-memory killer will terminate the process.
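To check the resource-starvation cause on a Linux host, the kernel log can be searched for OOM-killer entries. This is a rough sketch: find_oom_kills is a hypothetical helper, and the default process name "agent" is an assumption about how the core agent shows up in kernel messages on your host.

```shell
# Read kernel log text on stdin and print OOM-killer lines that mention
# a given process name. The default name "agent" is an assumption; pass a
# different name as the first argument if your host reports it differently.
find_oom_kills() {
  proc="${1:-agent}"
  # First grep keeps OOM-related lines, second narrows to the process name.
  # "|| true" keeps the exit status at 0 when nothing matches.
  grep -iE "out of memory|oom-kill|oom_kill" | grep -F "$proc" || true
}
```

For example: sudo dmesg | find_oom_kills. No output suggests the kernel has not recorded an OOM kill for that process in the current log buffer.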
Step 1: Diagnose the Agent Status
The most critical first step is running the agent status command. This provides a comprehensive overview of the agent's health, configuration, and recent errors.
On Linux, run:
sudo datadog-agent status
Scroll down to the Forwarder section. This section tells you if the agent is successfully sending data to Datadog. Look for:
- Transactions: A high number of dropped or retried transactions indicates network issues.
- API Key validation: Should say valid. If it says invalid, check your datadog.yaml.
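The status output is long, so isolating the Forwarder section can save scrolling. The sketch below assumes that top-level section titles start at column 0 and are framed by lines of "=" characters, with sub-sections indented; that layout may differ between agent versions, and forwarder_section is a hypothetical helper name.

```shell
# Print only the top-level "Forwarder" section of `datadog-agent status` output.
# Assumes top-level titles start at column 0 and are framed by "=" lines,
# while sub-sections are indented. Verify against your agent version.
forwarder_section() {
  awk '
    /^Forwarder[ \t]*$/ { on = 1; print; next }  # section title found
    /^[^ \t=]/          { on = 0 }               # next top-level title ends it
    on && !/^=/         { print }                # skip the "=" frame lines
  '
}
```

Typical use would be: sudo datadog-agent status | forwarder_section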
Next, check the logs for real-time errors:
sudo tail -f /var/log/datadog/agent.log
Step 2: Validate Network Connectivity
If the forwarder is failing, simulate the agent's network traffic to isolate DNS or firewall issues. The agent connects to several endpoints (e.g., <VERSION>-app.agent.datadoghq.com). You can test general connectivity to the Datadog API endpoints using curl.
For US1 (default):
curl -v https://api.datadoghq.com
For EU1:
curl -v https://api.datadoghq.eu
If the curl command hangs or returns a connection timeout, the host cannot reach Datadog over outbound port 443, usually because of a firewall, security group, or missing proxy configuration. If you use a proxy, ensure the agent is configured to use it by setting the proxy block in datadog.yaml.
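Because a plain curl -v can hang for a long time behind a silent firewall, a bounded probe is often more practical. The following sketch derives the API URL from a site value and applies explicit timeouts; dd_api_url and probe_dd_api are hypothetical helper names, and the 5 and 10 second timeouts are arbitrary choices.

```shell
# Build the API base URL for a Datadog site value (e.g. datadoghq.eu).
# Falls back to the US1 default when no site is given.
dd_api_url() {
  site="${1:-datadoghq.com}"
  printf 'https://api.%s\n' "$site"
}

# Probe the URL with bounded timeouts and report the HTTP status code.
# curl prints 000 for http_code when no HTTP response was received.
probe_dd_api() {
  url="$(dd_api_url "$1")"
  code="$(curl -s -o /dev/null --connect-timeout 5 --max-time 10 \
          -w '%{http_code}' "$url")"
  if [ "$code" = "000" ]; then
    echo "unreachable: $url"
    return 1
  fi
  echo "reachable: $url (HTTP $code)"
}
```

For an EU1 account this would be run as probe_dd_api datadoghq.eu. Any HTTP status code, even a 4xx, shows that DNS, TCP, and TLS all worked; only the "unreachable" result points at the network path.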
Step 3: Check Time Synchronization (NTP)
If the API key is valid, the network connects perfectly, and logs show HTTP 200 OK responses, but metrics still aren't appearing, check your host's clock.
Run date -u and compare it to a reliable time source. To verify your NTP sync status:
chronyc tracking      # hosts running chrony
timedatectl status    # any systemd host
If the system clock is inaccurate, restart your NTP service (chronyd or systemd-timesyncd) and force a synchronization.
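The comparison itself is simple arithmetic on Unix timestamps. As a small sketch, the helpers below compute the absolute skew between the local clock and a reference timestamp and flag it against a threshold. The function names are hypothetical, and the 600 second default only mirrors the "typically > 10 minutes" figure mentioned earlier, not a documented limit.

```shell
# Absolute difference, in seconds, between two Unix epoch timestamps.
clock_skew_seconds() {
  delta=$(( $1 - $2 ))
  if [ "$delta" -lt 0 ]; then
    delta=$(( 0 - delta ))
  fi
  echo "$delta"
}

# Succeed (exit 0) when the skew between the two timestamps exceeds a limit.
# Third argument is the limit in seconds; 600 is an assumed default.
skew_exceeds() {
  limit="${3:-600}"
  [ "$(clock_skew_seconds "$1" "$2")" -gt "$limit" ]
}
```

With a reference epoch obtained from a source you trust and stored in a variable such as ref, the check could read: skew_exceeds "$(date -u +%s)" "$ref" && echo "clock drift too large"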
Step 4: Fix Configuration and Restart
Most configuration issues stem from an incorrect datadog.yaml. Open /etc/datadog-agent/datadog.yaml (Linux) and verify:
- api_key: <YOUR_API_KEY>
- site: <YOUR_DATADOG_SITE> (e.g., datadoghq.com, us3.datadoghq.com, datadoghq.eu)
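A quick sanity check of these two values can catch pasted whitespace, truncated keys, or a forgotten site setting. The sketch below is deliberately simple: it only reads unindented key: value lines, and it assumes the usual 32-character hexadecimal API key format. yaml_top_level and check_dd_config are hypothetical helper names.

```shell
# Print the value of a top-level scalar key from a YAML file.
# Simple on purpose: handles "key: value" at column 0 only, and strips
# trailing comments, surrounding spaces, and double quotes.
yaml_top_level() {
  awk -v key="$1" '
    index($0, key ":") == 1 {
      val = substr($0, length(key) + 2)
      sub(/[ \t]+#.*$/, "", val)              # drop trailing comment
      gsub(/^[ \t"]+|[ \t"]+$/, "", val)      # trim spaces and double quotes
      print val
      exit
    }
  ' "$2"
}

# Check that api_key looks like 32 hex characters and report the site value.
check_dd_config() {
  file="${1:-/etc/datadog-agent/datadog.yaml}"
  key="$(yaml_top_level api_key "$file")"
  site="$(yaml_top_level site "$file")"
  status=0
  if ! printf '%s' "$key" | grep -Eq '^[0-9a-fA-F]{32}$'; then
    echo "api_key: missing or not 32 hex characters"
    status=1
  fi
  echo "site: ${site:-datadoghq.com (default, not set explicitly)}"
  return "$status"
}
```

Running sudo sh -c with these functions loaded, or simply check_dd_config as root, prints the effective site so it can be compared against the region of your account. A well-formed key can still be rejected by the intake, so this check complements the API Key validation line in the status output rather than replacing it.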
If you modify datadog.yaml, you must restart the agent for the changes to take effect:
sudo systemctl restart datadog-agent
Wait two minutes, then run sudo datadog-agent status again to verify the forwarder is successfully transmitting payloads.
# --- Datadog Agent Troubleshooting Script ---
# 1. Check the overall status of the agent
sudo datadog-agent status
# 2. Check the logs for ERROR or WARN messages (specifically looking for API/Forwarder issues)
sudo grep -E "ERROR|WARN" /var/log/datadog/agent.log | tail -n 20
# 3. Test outbound network connectivity to the default US site (Change URL based on your region)
curl -v https://api.datadoghq.com
# 4. Check system time synchronization (NTP)
timedatectl status
# 5. Restart the Datadog Agent after applying any fixes in datadog.yaml
sudo systemctl restart datadog-agent
Error Medic Editorial
Error Medic Editorial comprises senior DevOps, SRE, and platform engineering experts dedicated to providing actionable, reliable troubleshooting guides for modern cloud infrastructure.