Why does Ansible say 'Connection refused' when I can successfully ping the host?

Ping relies on ICMP, while Ansible connects over SSH, which runs on TCP (typically port 22). A successful ping only confirms that the server is powered on and reachable via routing. 'Connection refused' means the server is actively rejecting traffic to TCP port 22, usually because the SSH daemon is stopped or a local firewall is rejecting connections to the port.

How do I fix 'Permission denied (publickey)' in Ansible?

This error means the target server does not recognize your SSH public key. Ensure you are connecting as the correct remote user by setting `ansible_user` in your inventory. Additionally, verify that the path to your private key is correct using `ansible_ssh_private_key_file`, and confirm that the corresponding public key is appended to the target user's `~/.ssh/authorized_keys` file.

What causes Ansible timeouts when deploying to a large fleet of servers?

When running playbooks against hundreds of nodes, the Ansible control node can become resource-starved, or network bandwidth can saturate, causing SSH handshakes to exceed the default 10-second timeout. To fix this, increase the `timeout` setting in `ansible.cfg`, lower the number of `forks`, and ensure SSH multiplexing (`ControlMaster` and `ControlPersist`) is enabled.

Why does Ansible fail with permission denied only during specific tasks?

If Ansible connects successfully but fails on specific tasks (like installing packages or restarting services), those tasks most likely need privilege escalation. Add `become: yes` to those tasks or at the playbook level, and pass the `--ask-become-pass` flag if the remote user requires a password to execute sudo commands.
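One way to tell whether the remote user needs a sudo password is to attempt a non-interactive sudo call. The following is a minimal sketch, not part of Ansible: the function name `sudo_mode` is hypothetical, and it relies only on the standard `sudo -n` (non-interactive) option.

```shell
# Report whether sudo works without a password.
# "$@" is an optional command prefix, for example: ssh ubuntu@10.0.5.21
# With no arguments the check runs on the local machine.
sudo_mode() {
    if "$@" sudo -n true 2>/dev/null; then
        echo "passwordless"
    else
        # sudo asked for a password, is not permitted, or the prefix command failed
        echo "password-or-no-sudo"
    fi
}
```

Calling `sudo_mode ssh ubuntu@10.0.5.21` (user and address are placeholders from this guide's examples) runs the check on the target. A result of `password-or-no-sudo` suggests running the playbook with `--ask-become-pass`, although a failed SSH connection produces the same result.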

How can I debug an Ansible playbook that just hangs indefinitely?

Run the playbook with the `-vvvv` flag. This level of verbosity enables connection debugging and shows where the hang occurs: waiting for an SSH key exchange, gathering facts while a slow DNS lookup runs on the target, or waiting for a specific module to return output.

Resolving "Ansible Failed": Connection Refused, Permission Denied, and Timeouts


Last updated: February 24, 2026

Last verified: February 24, 2026


Key Takeaways

Connection Refused usually means the SSH service is down, the port is blocked by a firewall, or the IP is incorrect.
Permission Denied indicates incorrect SSH keys, wrong user context, or missing sudo/become privileges on the target node.
Timeout implies network latency, routing issues, or strict firewalls dropping packets silently.
Use the `-vvv` or `-vvvv` flags to expose the raw OpenSSH command Ansible uses, allowing for precise network layer debugging.

Diagnostic Approaches Compared
| Error Type | Common Cause | Quick Diagnostic | Resolution Strategy |
| --- | --- | --- | --- |
| Connection Refused | SSH daemon down / port 22 closed | `nc -zv <target_ip> 22` | Start `sshd` or update AWS SG / local firewall |
| Permission Denied | Wrong SSH key or user | `ssh -v <user>@<ip>` | Set `ansible_user` and verify `authorized_keys` |
| Timeout | Network block / dropping packets | `ping <target_ip>` or `traceroute` | Increase `timeout` in `ansible.cfg` |
| Privilege Escalation Failed | Missing become password / sudo rights | `sudo -l` on target via normal SSH | Add `--ask-become-pass` or configure `visudo` |
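
The comparison can be read as a lookup from the error string to an error type. A minimal sketch in shell, assuming the message substrings shown in this guide; the function name `classify_ansible_error` is hypothetical and not part of Ansible:

```shell
# Map an Ansible failure message to one of the error types compared above.
# The function name and category labels are illustrative.
classify_ansible_error() {
    case "$1" in
        *"Connection refused"*)    echo "connection-refused" ;;
        *"Permission denied"*)     echo "permission-denied" ;;
        *"timed out"*)             echo "timeout" ;;
        *"Missing sudo password"*) echo "privilege-escalation" ;;
        *)                         echo "unknown" ;;
    esac
}

classify_ansible_error "ssh: connect to host 10.0.5.21 port 22: Connection refused"
# prints: connection-refused
```

Matching on substrings keeps the sketch independent of the surrounding JSON, at the cost of missing messages that are worded differently.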

Understanding the Error

When operating at scale, encountering an "Ansible failed" message is routine for DevOps engineers and SREs. Because Ansible is agentless and relies on standard SSH connections (for Linux/Unix) and WinRM (for Windows), most of its failure states trace back to network accessibility, authentication, or authorization.

Rather than a single bug, "Ansible failed" is an umbrella outcome. The specific string appended to the error (such as Connection refused, Permission denied, or Timeout) tells you where to look. This guide breaks down the three most common SSH-related failure states, shows what each error looks like in your terminal, and gives step-by-step remediation strategies.

1. Diagnosing "Ansible Connection Refused"

The Error Signature:

```
fatal: [10.0.5.21]: UNREACHABLE! => {
  "changed": false,
  "msg": "Failed to connect to the host via ssh: ssh: connect to host 10.0.5.21 port 22: Connection refused",
  "unreachable": true
}
```

Root Causes: A Connection refused error is a TCP-level rejection. It means your Ansible control node successfully routed a packet to the target IP address, but the target machine explicitly responded with an RST (Reset) packet. This typically happens for three reasons:

The SSH Daemon is down: The sshd service is stopped, crashed, or disabled on the target machine.
Firewall Rejection: A local firewall (like iptables, ufw, or firewalld) is actively rejecting connections to TCP port 22.
Wrong IP/Port: You are targeting the wrong IP address entirely, or the SSH daemon is listening on a non-standard port (e.g., 2222), but Ansible is defaulting to port 22.

Step-by-Step Fix:

Verify the Port: Ensure Ansible is targeting the correct port. If the host uses port 2222, define it in your inventory (`ansible_port=2222`).
Check Service Status: If you have console access (via AWS Systems Manager, VMware vCenter, or physical access), log in and verify the SSH service with `sudo systemctl status sshd`. If it is inactive, start it with `sudo systemctl start sshd`. On Debian and Ubuntu the unit is named `ssh`.
Test TCP Connectivity: From your Ansible control node, bypass Ansible and test the raw TCP port with `nc -zv 10.0.5.21 22`. If this fails, investigate your network security groups, ACLs, and host-level firewalls.
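
Where `nc` is not installed, the same TCP test can be sketched with bash's built-in `/dev/tcp` redirection. This is an illustration under stated assumptions rather than a definitive tool: it assumes GNU coreutils `timeout` and `bash` are available on the control node, and the function name `probe_port` is hypothetical.

```shell
# Classify a TCP port as open, refused, or timeout.
# Assumes GNU coreutils `timeout` and bash (for /dev/tcp) are installed.
probe_port() {
    host=$1
    port=${2:-22}
    wait_seconds=${3:-5}
    status=0
    timeout "$wait_seconds" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null || status=$?
    case $status in
        0)   echo "open" ;;     # TCP handshake completed
        124) echo "timeout" ;;  # no reply before the deadline, packets likely dropped
        *)   echo "refused" ;;  # failed immediately: RST, unreachable, or name lookup error
    esac
}
```

For example, `probe_port 10.0.5.21 22` separates an active rejection from a silent drop, which is the same distinction this guide draws between Connection refused and Timeout.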

2. Resolving "Ansible Permission Denied"

The Error Signature:

```
fatal: [webserver-01]: UNREACHABLE! => {
  "changed": false,
  "msg": "Failed to connect to the host via ssh: user@webserver-01: Permission denied (publickey,password).",
  "unreachable": true
}
```

Alternatively, you might see a success on the connection, but a failure during task execution:

```
fatal: [webserver-01]: FAILED! => {
  "msg": "Missing sudo password"
}
```

Root Causes: Permission denied implies that TCP port 22 is open and the SSH daemon is responding, but the authentication phase failed.

Key Mismatch: Your control node's public SSH key is missing from the target's ~/.ssh/authorized_keys file.
Incorrect User Context: Ansible defaults to using the username of the person executing the playbook. If you are logged in as jdoe on the control node, but need to connect to the target as ubuntu, authentication will fail.
Privilege Escalation Failure: The connection succeeded, but the task requires root privileges. You either forgot become: yes, or you need to supply a sudo password.

Step-by-Step Fix:

Define the Remote User: Explicitly declare the user in your inventory or playbook using `ansible_user: ubuntu`.
Validate SSH Keys: Ensure your SSH key is loaded into your agent (`ssh-add -l`). If using a specific key for a specific host, define it in your inventory: `ansible_ssh_private_key_file=/path/to/key.pem`
Fix Privilege Escalation: If the error occurs during a task that modifies system state (like installing a package), ensure your playbook has:
```
become: yes
become_method: sudo
```
If the target user requires a password for sudo, run your playbook with the --ask-become-pass (or -K) flag.
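
For the key-validation step, one way to confirm that a public key is present in an `authorized_keys` file is to compare the key material directly. A minimal sketch; the function name `key_is_authorized` and the file paths in the usage note are hypothetical:

```shell
# Succeeds when the key material in a public key file appears in an
# authorized_keys file. Only the base64 blob (second field of the .pub file)
# is compared, so differing comments or leading options do not matter.
key_is_authorized() {
    pubkey_file=$1
    authorized_file=$2
    blob=$(awk 'NR == 1 { print $2 }' "$pubkey_file")
    [ -n "$blob" ] && grep -qF -- "$blob" "$authorized_file"
}
```

Run on the target, or against a copy of the target's file, for example `key_is_authorized ~/.ssh/id_ed25519.pub ~/.ssh/authorized_keys && echo "key present"`. The sketch assumes a single-line public key file in the usual OpenSSH format.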

3. Troubleshooting "Ansible Timeout"

The Error Signature:

```
fatal: [db-node-01]: UNREACHABLE! => {
  "changed": false,
  "msg": "Data could not be sent to remote host \"db-node-01\". Make sure this host can be reached over ssh: ssh: connect to host db-node-01 port 22: Operation timed out",
  "unreachable": true
}
```

Root Causes: Unlike a Connection refused (which is an active rejection), a Timeout means packets are being sent into a black hole. No response is ever received.

Strict Firewalls: A firewall is configured to DROP packets instead of REJECT them.
Dead Host: The target machine is powered off, isolated from the network, or the IP address is simply unassigned.
High Latency/Overload: The network link is extremely slow, or the control node is overloaded by running too many forks simultaneously, causing the SSH handshake to exceed the default 10-second timeout.

Step-by-Step Fix:

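Ping the Host: Test basic ICMP routing with `ping -c 4 db-node-01`. If it times out, you most likely have a routing or firewall issue rather than an Ansible issue. Some networks drop ICMP while still allowing SSH, so confirm with a TCP test against port 22 before drawing conclusions.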
Increase Ansible Timeout Settings: If the network is just slow or crossing a high-latency VPN, modify your ansible.cfg:
```
[defaults]
timeout = 30

[ssh_connection]
retries = 3
```
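Optimize SSH Connections: For large fleets, ensure SSH multiplexing is enabled in ansible.cfg so established connections are reused, which reduces handshake overhead and timeout risk. Ansible's default `ssh_args` already include these options, so this matters mainly when `ssh_args` has been overridden:
```
# belongs in the [ssh_connection] section
[ssh_connection]
ssh_args = -o ControlMaster=auto -o ControlPersist=60s
```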
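
Before editing ansible.cfg, the same settings can be tried for a single run through the `--forks` and `--timeout` command-line options. The sketch below only prints the command so it can be reviewed first; the function name `fleet_ping_cmd` is hypothetical, and `inventory.ini` is a placeholder inventory path.

```shell
# Print an ad-hoc connectivity check with explicit forks and timeout values.
# Nothing is executed against the fleet; the command is only echoed.
fleet_ping_cmd() {
    forks=${1:-10}
    timeout_s=${2:-30}
    printf 'ansible all -i inventory.ini -m ping --forks %s --timeout %s\n' "$forks" "$timeout_s"
}

fleet_ping_cmd 10 30
# prints: ansible all -i inventory.ini -m ping --forks 10 --timeout 30
```

If the printed command succeeds with fewer forks and a longer timeout, that points to control-node load or latency rather than a blocked port.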

The Ultimate Diagnostic Tool: `-vvvv`

When all else fails, run your playbook with the `-vvvv` flag. This connection-debugging mode prints the exact OpenSSH command that Ansible executes under the hood.

Copy that command and run it directly in your terminal to take Ansible out of the equation. If the raw SSH command fails, you have a system or network administration problem. If the raw SSH command succeeds but Ansible fails, you have a configuration issue in your inventory or ansible.cfg.
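
To pull the SSH command out of a long verbose log, a small filter can help. This is a sketch that assumes the verbose output marks each command with the prefix `SSH: EXEC`; check your own log if the format differs. The function name `extract_ssh_commands` and the log file name are hypothetical.

```shell
# Print the SSH commands recorded in a saved verbose log.
# Assumes lines of the form: <host> SSH: EXEC ssh <options> <host> '<command>'
extract_ssh_commands() {
    sed -n 's/^.*SSH: EXEC //p' "$1"
}
```

A typical use is `ansible-playbook site.yml -i inventory.ini -vvvv > run.log 2>&1` followed by `extract_ssh_commands run.log`, then running one of the printed commands by hand.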

Frequently Asked Questions

```bash
# 1. Test basic SSH connectivity bypassing Ansible
ssh -vvv -i /path/to/private_key.pem ubuntu@10.0.5.21

# 2. Test if TCP port 22 is open and accepting connections
nc -zv 10.0.5.21 22

# 3. Run playbook with maximum connection debugging and prompt for sudo password
ansible-playbook site.yml -i inventory.ini -vvvv --ask-become-pass

# 4. Recommended ansible.cfg optimizations to prevent timeouts
# Add these lines to your local ansible.cfg file:
# NOTE: host_key_checking = False disables SSH host key verification;
# use it only where that risk is acceptable.
# [defaults]
# timeout = 30
# host_key_checking = False
#
# [ssh_connection]
# ssh_args = -o ControlMaster=auto -o ControlPersist=60s -o ServerAliveInterval=15
# retries = 3
```

Error Medic Editorial

Written by senior Site Reliability Engineers and DevOps practitioners. We specialize in demystifying infrastructure-as-code, CI/CD pipelines, and large-scale system administration to help you keep your production environments stable and efficient.
