Error Medic

Fixing 'Prometheus Not Sending Alerts to Alertmanager' and Slack Notification Routing Failures

Resolve missing Prometheus alerts, troubleshoot Alertmanager configuration, and fix kube-prometheus-stack routing issues for Slack and OpsGenie notifications.

Key Takeaways
  • Network connectivity and DNS resolution failures between Prometheus and Alertmanager are the primary cause of 'dial tcp: lookup alertmanager: no such host' errors.
  • Invalid routing configurations in alertmanager.yml often result in alerts falling back to a default route or being dropped silently.
  • In kube-prometheus-stack (Prometheus Operator), AlertmanagerConfig CRDs are frequently ignored due to mismatched namespace selectors or label requirements.
  • Grafana 9+ unified alerting can cause confusion; ensure you know whether you are configuring Grafana-managed alerts or Prometheus-managed alerts.
  • Always validate your routing and rules using 'promtool check rules' and 'amtool config routes' before deploying changes.
Diagnostic Approaches for Alerting Failures
| Diagnostic Method | When to Use | Time Required | Risk Level |
| --- | --- | --- | --- |
| Prometheus Targets UI (/targets) | Checking whether Prometheus can discover and scrape the Alertmanager endpoints. | 2 mins | Low |
| Tail Alertmanager Pod Logs | Alerts show as FIRING in Prometheus, but Slack/OpsGenie notifications are missing. | 5 mins | Low |
| Validate AlertmanagerConfig CRD | Using kube-prometheus-stack and custom-namespace alerting configurations are being ignored. | 10 mins | Medium |
| cURL Fake Alert Payload | Testing receiver integration (Slack, Discord, OpsGenie) independently of Prometheus rules. | 5 mins | Low |

Understanding the Prometheus Alerting Pipeline

When setting up alerting and monitoring with the Prometheus stack, the architecture is intentionally decoupled. Prometheus is responsible for evaluating PromQL expressions against time-series data. When an expression evaluates to true for a sustained period (the for duration), Prometheus fires an alert. However, Prometheus does not send notifications. Instead, it pushes these firing alerts to Alertmanager. Alertmanager then handles deduplication, grouping, silencing, and routing the alerts to external receivers like Slack, OpsGenie, Discord, or PagerDuty.
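As a concrete sketch of the evaluation stage, a minimal rule file might look like the following (the alert name, expression, and threshold duration are illustrative, not part of any stock configuration):

```yaml
# rules.yml -- evaluated by Prometheus itself, never by Alertmanager
groups:
  - name: example
    rules:
      - alert: InstanceDown          # illustrative alert name
        expr: up == 0                # PromQL expression Prometheus evaluates
        for: 5m                      # must stay true for 5m before FIRING
        labels:
          severity: critical         # later used by Alertmanager routing
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
```

Only once an alert from a file like this reaches the FIRING state does Prometheus push it to Alertmanager.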

When an engineer reports "Prometheus not sending alerts to Alertmanager" or "grafana alert prometheus alertmanager integration failing," the breakdown can occur at three distinct boundaries:

  1. Prometheus Evaluation: The rule is incorrect, or the threshold is never breached.
  2. Prometheus to Alertmanager Transport: Prometheus cannot reach Alertmanager over the network, or the authentication is rejected.
  3. Alertmanager to Receiver Routing: Alertmanager receives the payload but fails to route it due to bad configuration or rejected API calls from Slack/OpsGenie.

Symptom 1: Alerts Firing in Prometheus, but Alertmanager is Empty

If you check the Prometheus UI (/alerts) and see alerts in the FIRING state, but the Alertmanager UI (/#/alerts) is completely empty, the issue lies in the transport layer.

The Error Logs

Check your Prometheus server logs. You will likely see one of these errors:

level=error ts=2024-05-12T10:00:00.000Z caller=notifier.go:527 component=notifier alertmanager=http://alertmanager:9093/api/v2/alerts count=1 msg="Error sending alert" err="Post \"http://alertmanager:9093/api/v2/alerts\": dial tcp: lookup alertmanager on 10.96.0.10:53: no such host"

level=error ... err="context deadline exceeded"

The Root Cause & Fix

This indicates Prometheus cannot resolve the Alertmanager hostname or cannot establish a TCP connection.

If using native Prometheus: Check your prometheus.yml under the alerting block. Ensure the target is correct.

alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - 'alertmanager:9093'

If using kube-prometheus-stack (Prometheus Operator): The Prometheus CRD dynamically discovers Alertmanager instances. Verify that your Prometheus resource has the correct alerting configuration, often mapping to the Alertmanager Service in the cluster.

Run this to check if Prometheus Operator successfully linked them: kubectl get prometheus -n monitoring -o yaml | grep -A 10 alerting

Ensure that Network Policies in Kubernetes aren't blocking port 9093 traffic from the Prometheus pods to the Alertmanager pods.
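If your cluster runs a deny-by-default policy, you will need an explicit allow rule along these lines. This is a sketch: the pod labels shown are the common kube-prometheus-stack defaults, but verify yours with kubectl get pods --show-labels.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-prometheus-to-alertmanager
  namespace: monitoring
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: alertmanager   # assumed label; confirm in your cluster
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app.kubernetes.io/name: prometheus   # assumed label; confirm in your cluster
      ports:
        - protocol: TCP
          port: 9093
```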

Symptom 2: Alertmanager Receives Alerts, but Notifications Fail (Slack/OpsGenie)

You can see the alerts in the Alertmanager UI, but your Slack channels or OpsGenie queues are silent. This is a routing or receiver configuration issue.

The Error Logs

Tail the logs of your Alertmanager instance: kubectl logs -l app.kubernetes.io/name=alertmanager -n monitoring -c alertmanager

Look for: level=error ts=... caller=dispatch.go:354 component=dispatcher msg="Notify for alerts failed" num_alerts=1 err="opsgenie/opsgenie[0]: notify retry canceled due to unrecoverable error after 1 attempts: bad response status 401 Unauthorized"

Or for Slack: err="slack/slack[0]: notify retry canceled ... bad response status 400 Bad Request"

The Root Cause & Fix

A 401 Unauthorized for OpsGenie means your API key is invalid or lacks the 'Create and Update' alert rights. A 400 Bad Request for Slack usually means your message template is generating invalid JSON for the Slack webhook, or the Slack webhook URL is truncated/incorrect.

Let's look at a correct Prometheus Alertmanager Slack Config Example:

route:
  group_by: ['alertname', 'cluster', 'service']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  receiver: 'slack-default'

receivers:
- name: 'slack-default'
  slack_configs:
  - api_url: 'https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX'
    channel: '#alerts-prod'
    send_resolved: true
    title: '[{{ .Status | toUpper }}] {{ .GroupLabels.alertname }}'
    text: |-
      {{ range .Alerts -}}
      *Alert:* {{ .Annotations.summary }}
      *Description:* {{ .Annotations.description }}
      *Severity:* {{ .Labels.severity }}
      {{ end }}

Pro-tip for testing: Never wait for a real outage to test this. You can manually push an alert to Alertmanager to verify the Slack routing. (See the Code Block section for the exact curl command).

Symptom 3: Kube-Prometheus-Stack AlertmanagerConfig CRD is Ignored

If you install Prometheus and Alertmanager via the kube-prometheus-stack Helm chart, the modern best practice is to configure routes via AlertmanagerConfig Custom Resource Definitions (CRDs), rather than manually editing the global alertmanager.yaml secret.

However, a massive point of friction is that users create an AlertmanagerConfig, and it simply doesn't show up in the Alertmanager UI configuration tab.

The Root Cause & Fix

By default, the Prometheus Operator's Alertmanager deployment is heavily scoped. It will ONLY look for AlertmanagerConfig resources that match a specific label selector, and often, only in specific namespaces.

Check your Alertmanager CRD spec: kubectl get alertmanager -n monitoring -o yaml

Look for alertmanagerConfigSelector and alertmanagerConfigNamespaceSelector. If your global helm values.yaml has this:

alertmanager:
  alertmanagerSpec:
    alertmanagerConfigSelector:
      matchLabels:
        alertmanagerConfig: standard

Then your AlertmanagerConfig YAML must include that label:

apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: team-frontend-slack
  namespace: frontend
  labels:
    alertmanagerConfig: standard  # CRITICAL: Must match the selector!
spec:
  route:
    receiver: 'slack-frontend'
    groupBy: ['alertname']
  receivers:
  - name: 'slack-frontend'
    slackConfigs:
    - apiURL:
        key: url
        name: slack-webhook-secret
      channel: '#frontend-alerts'

If the namespaces do not match or the labels are missing, the Prometheus Operator silently ignores the object. No errors will be printed in the Alertmanager logs because Alertmanager isn't even aware the CRD exists; the Operator failed to translate it.
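If you would rather have the Operator pick up every AlertmanagerConfig regardless of labels or namespaces, you can open up both selectors in your Helm values. Treat this as a sketch and weigh the blast radius: an empty selector matches everything, so any team in any namespace can then inject routes.

```yaml
# values.yaml for kube-prometheus-stack
alertmanager:
  alertmanagerSpec:
    alertmanagerConfigSelector: {}            # empty selector: match AlertmanagerConfigs with any labels
    alertmanagerConfigNamespaceSelector: {}   # empty selector: match all namespaces
```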

Integrating Grafana Alerts vs. Prometheus Alerts

A frequent source of confusion is the search term grafana alert prometheus alertmanager. With Grafana 9+, Grafana introduced Unified Alerting.

You now have a choice:

  1. Prometheus-managed alerts: Rules are written in YAML, evaluated by the Prometheus binary, and routed to the Alertmanager binary. Grafana simply displays the state of these external alerts.
  2. Grafana-managed alerts: You write PromQL queries inside the Grafana UI. Grafana evaluates them and uses Grafana's internal Alertmanager to send notifications.

If you want a true GitOps workflow (Infrastructure as Code), you should configure Prometheus rules and Alertmanager receivers via YAML/CRDs and connect your external Alertmanager to Grafana as a data source. This allows you to "grafana show prometheus alerts" under the Alerting tab, giving your developers a visual interface into the state of the cluster's alerts without managing the configuration via the UI.
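Connecting the external Alertmanager to Grafana can itself be kept in code via Grafana's data source provisioning. A sketch, where the uid and URL are placeholders for your environment:

```yaml
# /etc/grafana/provisioning/datasources/alertmanager.yaml
apiVersion: 1
datasources:
  - name: Alertmanager
    type: alertmanager
    uid: alertmanager-prod            # illustrative uid
    access: proxy
    url: http://alertmanager:9093     # adjust to your Alertmanager Service
    jsonData:
      implementation: prometheus      # tells Grafana which Alertmanager flavor it is talking to
```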

How to Test Alerts in Prometheus End-to-End

To confidently say "I know how to setup alertmanager prometheus", you must know how to artificially trigger the entire pipeline.

  1. Validate Rules: Use promtool check rules /path/to/your/rules.yml. This ensures your PromQL syntax is valid before it ever reaches the server.
  2. Validate Routing: Use amtool config routes test --config.file=alertmanager.yml --verify.receivers=slack-default severity=critical. This command simulates an alert with the label severity=critical and outputs exactly which receiver it will hit.
  3. Inject Fake Alert: Bypass Prometheus entirely and POST directly to the Alertmanager API. This isolates whether the problem is "Prometheus not firing" or "Alertmanager not routing". If the cURL payload results in a Slack message, your Alertmanager is perfectly configured, and you must look at your Prometheus rule thresholds.

Monitoring and alerting are the lifeblood of SRE. By systematically isolating the failure domains—Evaluation, Transport, and Routing—you can quickly resolve configuration issues and ensure critical pages are never missed.

Code Block: Testing the Alerting Pipeline
# 1. Validate your Alertmanager routing configuration offline
amtool config routes test --config.file=alertmanager.yml --verify.receivers=slack-default alertname=HighCPU severity=critical

# 2. Inject a fake alert directly into Alertmanager to test Slack/OpsGenie delivery
# Replace localhost:9093 with your Alertmanager endpoint or use port-forwarding in k8s
curl -XPOST -H "Content-Type: application/json" http://localhost:9093/api/v2/alerts -d '[
  {
    "labels": {
      "alertname": "TestAlertManual",
      "severity": "critical",
      "instance": "production-database-01"
    },
    "annotations": {
      "summary": "This is a manual test alert triggered via cURL",
      "description": "Validating that Alertmanager routes to Slack correctly."
    },
    "generatorURL": "http://prometheus.local/test"
  }
]'

# 3. Check Prometheus Operator Alertmanager logs for delivery errors
kubectl logs -l app.kubernetes.io/name=alertmanager -n monitoring -c alertmanager | grep "level=error"

Error Medic Editorial

The Error Medic Editorial team consists of senior DevOps engineers and Site Reliability Experts dedicated to demystifying cloud-native troubleshooting, Kubernetes infrastructure, and observability stacks.
