Resolving "Ansible Failed": Connection Refused, Permission Denied, and Timeouts
Fix "ansible failed" errors like connection refused, permission denied, and timeouts. Learn root causes, SSH config fixes, privilege escalation, and network twe
- Connection Refused usually means the SSH service is down, the port is blocked by a firewall, or the IP is incorrect.
- Permission Denied indicates incorrect SSH keys, wrong user context, or missing sudo/become privileges on the target node.
- Timeout implies network latency, routing issues, or strict firewalls dropping packets silently.
- Use the `-vvv` or `-vvvv` flags to expose the raw OpenSSH command Ansible uses, allowing for precise network layer debugging.
| Error Type | Common Cause | Quick Diagnostic | Resolution Strategy |
|---|---|---|---|
| Connection Refused | SSH daemon down / port 22 closed | `nc -zv <target_ip> 22` | Start `sshd` or update AWS SG / local firewall |
| Permission Denied | Wrong SSH key or user | `ssh -v <user>@<ip>` | Set `ansible_user` and verify `authorized_keys` |
| Timeout | Network block / dropping packets | `ping <target_ip>` or `traceroute` | Increase `timeout` in `ansible.cfg` |
| Privilege Escalation Failed | Missing become password / sudo rights | `sudo -l` on target via normal SSH | Add `--ask-become-pass` or configure `visudo` |
Understanding the Error
When operating at scale, encountering an ansible failed message is a routine event for DevOps engineers and SREs. Because Ansible is an agentless automation tool that relies heavily on standard SSH connections (for Linux/Unix) and WinRM (for Windows), the vast majority of its failure states are directly tied to network accessibility, authentication, or authorization issues.
Rather than a single bug, "Ansible failed" is an umbrella outcome. The specific string appended to the error, such as Connection refused, Permission denied, or Timeout, tells you where to look. This guide breaks down the three most common SSH-related failure states, shows what each error looks like in your terminal, and provides step-by-step remediation strategies.
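As a rough illustration of reading that appended string, the sketch below sorts an error message into one of the categories covered in this guide. The function name `classify_ansible_error` and the category labels are hypothetical, and the substrings it matches are only the ones quoted in this article; real messages may be worded differently.

```shell
#!/bin/sh
# classify_ansible_error MESSAGE: print a rough category for an Ansible failure message.
# Hypothetical helper for illustration; labels and matched substrings are assumptions.
classify_ansible_error() {
    case "$1" in
        *"Connection refused"*)     echo "connection-refused" ;;
        *"Permission denied"*)      echo "permission-denied" ;;
        *"Missing sudo password"*)  echo "privilege-escalation" ;;
        *"timed out"*|*"Timeout"*)  echo "timeout" ;;
        *)                          echo "unknown" ;;
    esac
}
```

For example, passing the `msg` field from a failed run, such as `classify_ansible_error "ssh: connect to host 10.0.5.21 port 22: Connection refused"`, points you at the matching section below.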
1. Diagnosing "Ansible Connection Refused"
The Error Signature:
fatal: [10.0.5.21]: UNREACHABLE! => {
"changed": false,
"msg": "Failed to connect to the host via ssh: ssh: connect to host 10.0.5.21 port 22: Connection refused",
"unreachable": true
}
Root Causes:
A Connection refused error is a TCP-level rejection. It means your Ansible control node successfully routed a packet to the target IP address, but the target machine explicitly responded with an RST (Reset) packet. This typically happens for three reasons:
- The SSH Daemon is down: The `sshd` service is stopped, crashed, or disabled on the target machine.
- Firewall Rejection: A local firewall (like `iptables`, `ufw`, or `firewalld`) is actively rejecting connections to TCP port 22.
- Wrong IP/Port: You are targeting the wrong IP address entirely, or the SSH daemon is listening on a non-standard port (e.g., 2222), but Ansible is defaulting to port 22.
Step-by-Step Fix:
- Verify the Port: Ensure Ansible is targeting the correct port. If the host uses port 2222, define it in your inventory (`ansible_port=2222`).
- Check Service Status: If you have console access (via AWS Systems Manager, VMware vCenter, or physical access), log in and verify the SSH service with `sudo systemctl status sshd` (on Debian and Ubuntu the unit is named `ssh`). If it is inactive, start it with `sudo systemctl start sshd`.
- Test TCP Connectivity: From your Ansible control node, bypass Ansible and test the raw TCP port with `nc -zv 10.0.5.21 22`. If this fails, you must investigate your network security groups, ACLs, and host-level firewalls.
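The port check and the TCP test above can be combined into one small probe. This is a minimal sketch, not a definitive tool: `find_ssh_port` is a hypothetical helper name, and it assumes an `nc` build that supports `-z` (connect without sending data) and `-w` (timeout in seconds).

```shell
#!/bin/sh
# find_ssh_port HOST PORT...: print the first listed port that accepts a TCP connection.
# Hypothetical helper; assumes nc supports -z and -w, which most common builds do.
find_ssh_port() {
    host=$1
    shift
    for port in "$@"; do
        # Exit status 0 from nc means the TCP handshake completed on this port.
        if nc -z -w 3 "$host" "$port" >/dev/null 2>&1; then
            echo "$port"
            return 0
        fi
    done
    # None of the candidate ports accepted a connection.
    return 1
}

# Example (not run here): find_ssh_port 10.0.5.21 22 2222
```

If the probe prints a port other than 22, set that value as `ansible_port` for the host in your inventory. If it prints nothing, the problem is in the service or the network path rather than in Ansible.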
2. Resolving "Ansible Permission Denied"
The Error Signature:
fatal: [webserver-01]: UNREACHABLE! => {
"changed": false,
"msg": "Failed to connect to the host via ssh: user@webserver-01: Permission denied (publickey,password).",
"unreachable": true
}
Alternatively, you might see a success on the connection, but a failure during task execution:
fatal: [webserver-01]: FAILED! => {
"msg": "Missing sudo password"
}
Root Causes:
Permission denied implies that TCP port 22 is open and the SSH daemon is responding, but the authentication phase failed.
- Key Mismatch: Your control node's public SSH key is missing from the target's `~/.ssh/authorized_keys` file.
- Incorrect User Context: Ansible defaults to using the username of the person executing the playbook. If you are logged in as `jdoe` on the control node, but need to connect to the target as `ubuntu`, authentication will fail.
- Privilege Escalation Failure: The connection succeeded, but the task requires `root` privileges. You either forgot `become: yes`, or you need to supply a sudo password.
Step-by-Step Fix:
- Define the Remote User: Explicitly declare the user in your inventory or playbook using `ansible_user: ubuntu`.
- Validate SSH Keys: Ensure your SSH key is loaded into your agent (`ssh-add -l`). If using a specific key for a specific host, define it in your inventory: `ansible_ssh_private_key_file=/path/to/key.pem`.
- Fix Privilege Escalation: If the error occurs during a task that modifies system state (like installing a package), ensure your playbook has `become: yes` and `become_method: sudo`. If the target user requires a password for sudo, run your playbook with the `--ask-become-pass` (or `-K`) flag.
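To find out ahead of time whether `--ask-become-pass` will be needed, you can test sudo over a plain SSH session. The sketch below is one possible approach under stated assumptions: `check_sudo_mode` is a hypothetical helper, the user and host are placeholders, and it relies on `sudo -n` failing rather than prompting when a password is required.

```shell
#!/bin/sh
# check_sudo_mode USER HOST: report whether sudo on the target works without a password.
# Hypothetical helper. BatchMode=yes stops ssh from prompting, and `sudo -n`
# exits non-zero instead of asking for a password.
check_sudo_mode() {
    if ssh -o BatchMode=yes "$1@$2" 'sudo -n true' >/dev/null 2>&1; then
        echo "passwordless"
    else
        # Also reached when the SSH login itself fails, so rule that out first.
        echo "password-or-denied"
    fi
}

# Example (not run here): check_sudo_mode ubuntu webserver-01
```

A result of `password-or-denied` on a host you can otherwise log in to suggests running the playbook with `-K`, or reviewing the user's sudo rights with `sudo -l` on the target.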
3. Troubleshooting "Ansible Timeout"
The Error Signature:
fatal: [db-node-01]: UNREACHABLE! => {
"changed": false,
"msg": "Data could not be sent to remote host \"db-node-01\". Make sure this host can be reached over ssh: ssh: connect to host db-node-01 port 22: Operation timed out",
"unreachable": true
}
Root Causes:
Unlike a Connection refused (which is an active rejection), a Timeout means packets are being sent into a black hole. No response is ever received.
- Strict Firewalls: A firewall is configured to `DROP` packets instead of `REJECT` them.
- Dead Host: The target machine is powered off, isolated from the network, or the IP address is simply unassigned.
- High Latency/Overload: The network link is extremely slow, or the control node is overloaded by running too many forks simultaneously, causing the SSH handshake to exceed the default 10-second timeout.
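The difference between an active rejection and a silent drop can be observed directly from the control node. The following is a sketch only: `probe_tcp` is a hypothetical helper, and it parses the messages `nc -v` prints, whose wording differs between netcat builds, so treat the matched strings as assumptions.

```shell
#!/bin/sh
# probe_tcp HOST PORT: classify a TCP connection attempt as open, refused, or timeout.
# Hypothetical helper; the matched message text is an assumption about nc's output.
probe_tcp() {
    if out=$(nc -zv -w 5 "$1" "$2" 2>&1); then
        echo "open"
        return 0
    fi
    case "$out" in
        *[Rr]efused*)            echo "refused" ;;  # target replied with RST
        *"timed out"*|*TIMEOUT*) echo "timeout" ;;  # no reply: dropped packets or dead host
        *)                       echo "unknown" ;;
    esac
    return 1
}

# Example (not run here): probe_tcp db-node-01 22
```

A `refused` result sends you back to the service and port checks in section 1, while `timeout` points at firewalls, routing, or a host that is down.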
Step-by-Step Fix:
- Ping the Host: Test basic ICMP routing with `ping -c 4 db-node-01`. If it times out, you most likely have a routing or firewall issue, not an Ansible issue. Some networks block ICMP while allowing SSH, so confirm with a TCP test against port 22.
- Increase Ansible Timeout Settings: If the network is just slow or crossing a high-latency VPN, modify your `ansible.cfg`: set `timeout = 30` under `[defaults]` and `retries = 3` under `[ssh_connection]`.
- Optimize SSH Connections: For large fleets, ensure SSH multiplexing is enabled in `ansible.cfg` to reuse established TCP connections, reducing overhead and timeout risk. Under `[ssh_connection]`, set `ssh_args = -o ControlMaster=auto -o ControlPersist=60s`.
The Ultimate Diagnostic Tool: -vvvv
When all else fails, run your playbook with the `-vvvv` flag. This "connection debugging" mode prints the exact, raw OpenSSH command that Ansible is executing under the hood.
You can copy and paste this command into your terminal, which takes Ansible out of the equation entirely. If the raw SSH command fails, you have a system or network administration problem. If the raw SSH command succeeds but Ansible fails, you have a configuration issue within your inventory or `ansible.cfg`.
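Verbose output is long, so it can help to filter out just the SSH invocations. The sketch below assumes those commands appear on lines of the form `<host> SSH: EXEC ssh ...`; that marker is an assumption about the output format and may vary between Ansible versions. `extract_ssh_commands` is a hypothetical helper name.

```shell
#!/bin/sh
# extract_ssh_commands: read verbose ansible-playbook output on stdin and print
# only the ssh command lines. Hypothetical helper; the "SSH: EXEC" marker is an
# assumption about the log format.
extract_ssh_commands() {
    # Keep everything from "ssh " onward on lines carrying the marker; drop all other lines.
    sed -n 's/^.*SSH: EXEC \(ssh .*\)$/\1/p'
}

# Example (not run here):
#   ansible-playbook site.yml -i inventory.ini -vvvv 2>&1 | extract_ssh_commands
```

Each printed line is a candidate for pasting into your terminal. Check the quoting of the trailing remote command before running it, since the log may not show it exactly as the shell needs it.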
# 1. Test basic SSH connectivity bypassing Ansible
ssh -vvv -i /path/to/private_key.pem ubuntu@10.0.5.21
# 2. Test if TCP port 22 is open and accepting connections
nc -zv 10.0.5.21 22
# 3. Run playbook with maximum connection debugging and prompt for sudo password
ansible-playbook site.yml -i inventory.ini -vvvv --ask-become-pass
# 4. Recommended ansible.cfg optimizations to prevent timeouts
# Add these lines to your local ansible.cfg file:
# [defaults]
# timeout = 30
# host_key_checking = False
#
# [ssh_connection]
# ssh_args = -o ControlMaster=auto -o ControlPersist=60s -o ServerAliveInterval=15
# retries = 3
Error Medic Editorial
Written by senior Site Reliability Engineers and DevOps practitioners. We specialize in demystifying infrastructure-as-code, CI/CD pipelines, and large-scale system administration to help you keep your production environments stable and efficient.