Error Medic

Resolving "Ansible Failed": Connection Refused, Permission Denied, and Timeouts


Key Takeaways
  • Connection Refused usually means the SSH service is down, the port is blocked by a firewall, or the IP is incorrect.
  • Permission Denied indicates incorrect SSH keys, wrong user context, or missing sudo/become privileges on the target node.
  • Timeout implies network latency, routing issues, or strict firewalls dropping packets silently.
  • Use the `-vvv` or `-vvvv` flags to expose the raw OpenSSH command Ansible uses, allowing for precise network layer debugging.
Diagnostic Approaches Compared
| Error Type | Common Cause | Quick Diagnostic | Resolution Strategy |
| --- | --- | --- | --- |
| Connection Refused | SSH daemon down / port 22 closed | `nc -zv <target_ip> 22` | Start `sshd` or update AWS SG / local firewall |
| Permission Denied | Wrong SSH key or user | `ssh -v <user>@<ip>` | Set `ansible_user` and verify `authorized_keys` |
| Timeout | Network block / dropping packets | `ping <target_ip>` or `traceroute` | Increase `timeout` in `ansible.cfg` |
| Privilege Escalation Failed | Missing become password / sudo rights | `sudo -l` on target via normal SSH | Add `--ask-become-pass` or configure `visudo` |

Understanding the Error

When operating at scale, an "Ansible failed" message is a routine event for DevOps engineers and SREs. Because Ansible is agentless and relies on standard SSH connections (for Linux/Unix) and WinRM (for Windows), most of its failure states trace back to network accessibility, authentication, or authorization.

Rather than a single bug, "Ansible failed" is an umbrella outcome. The specific string appended to the error—such as Connection refused, Permission denied, or Timeout—provides the exact breadcrumb trail needed to resolve the issue. In this comprehensive guide, we will break down the three most common SSH-related failure states, detail exactly what the error looks like in your terminal, and provide step-by-step remediation strategies.
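
Because the appended string is what tells the failure modes apart, first-pass triage can be scripted. The following is a minimal sketch and not part of Ansible: `classify_ansible_error` and its category labels are hypothetical names chosen for illustration, and the matched substrings are taken from the error signatures shown in this guide.

```shell
#!/bin/sh
# classify_ansible_error MESSAGE
# Map the "msg" text of an Ansible failure to a coarse category.
# The category labels are hypothetical, invented for this sketch.
classify_ansible_error() {
    case "$1" in
        *"Connection refused"*)       echo "connection_refused" ;;
        *"Permission denied"*)        echo "permission_denied" ;;
        *"Missing sudo password"*)    echo "privilege_escalation" ;;
        *"timed out"*|*"Timeout"*)    echo "timeout" ;;
        *)                            echo "unknown" ;;
    esac
}
```

A call such as `classify_ansible_error "$msg"` could then route each failed host to the matching section of this guide.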


1. Diagnosing "Ansible Connection Refused"

The Error Signature:

fatal: [10.0.5.21]: UNREACHABLE! => {
  "changed": false,
  "msg": "Failed to connect to the host via ssh: ssh: connect to host 10.0.5.21 port 22: Connection refused",
  "unreachable": true
}

Root Causes: A Connection refused error is a TCP-level rejection. It means your Ansible control node successfully routed a packet to the target IP address, but the target machine explicitly responded with an RST (Reset) packet. This typically happens for three reasons:

  1. The SSH Daemon is down: The sshd service is stopped, crashed, or disabled on the target machine.
  2. Firewall Rejection: A local firewall (like iptables, ufw, or firewalld) is actively rejecting connections to TCP port 22.
  3. Wrong IP/Port: You are targeting the wrong IP address entirely, or the SSH daemon is listening on a non-standard port (e.g., 2222), but Ansible is defaulting to port 22.

Step-by-Step Fix:

  1. Verify the Port: Ensure Ansible is targeting the correct port. If the host uses port 2222, define it in your inventory (ansible_port=2222).
  2. Check Service Status: If you have console access (via AWS Systems Manager, VMware vCenter, or physical access), log in and verify the SSH service:
    sudo systemctl status sshd
    
    If it is inactive, start it with sudo systemctl start sshd. On Debian and Ubuntu the unit is typically named ssh rather than sshd.
  3. Test TCP Connectivity: From your Ansible control node, bypass Ansible and test the raw TCP port:
    nc -zv 10.0.5.21 22
    
    If this fails, investigate your network security groups, ACLs, and host-level firewalls.
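
Where `nc` is not installed, a similar check can be sketched with bash's `/dev/tcp` pseudo-device. This is an illustrative sketch under stated assumptions: it needs `bash` and GNU coreutils `timeout` on the control node, and `probe_tcp` and `describe_probe_status` are hypothetical helper names, not Ansible or OpenSSH features.

```shell
#!/bin/sh
# describe_probe_status STATUS
# Translate the exit status of the probe below into a short label.
describe_probe_status() {
    case "$1" in
        0)   echo "open" ;;              # TCP handshake completed
        124) echo "timeout" ;;           # coreutils timeout expired: no reply at all
        *)   echo "refused_or_error" ;;  # fast failure: RST, DNS error, no route
    esac
}

# probe_tcp HOST PORT [SECONDS]
# Attempt a TCP connection, giving up after SECONDS (default 5).
probe_tcp() {
    host=$1 port=$2 limit=${3:-5}
    # Inside bash -c, $0 is the host and $1 is the port.
    timeout "$limit" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
    describe_probe_status "$?"
}
```

Running `probe_tcp 10.0.5.21 22` prints one label. A fast `refused_or_error` points toward the causes in this section, while `timeout` points toward silently dropped packets.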

2. Resolving "Ansible Permission Denied"

The Error Signature:

fatal: [webserver-01]: UNREACHABLE! => {
  "changed": false,
  "msg": "Failed to connect to the host via ssh: user@webserver-01: Permission denied (publickey,password).",
  "unreachable": true
}

Alternatively, you might see a success on the connection, but a failure during task execution:

fatal: [webserver-01]: FAILED! => {
  "msg": "Missing sudo password"
}

Root Causes: Permission denied implies that TCP port 22 is open and the SSH daemon is responding, but the authentication phase failed.

  1. Key Mismatch: Your control node's public SSH key is missing from the target's ~/.ssh/authorized_keys file.
  2. Incorrect User Context: Ansible defaults to using the username of the person executing the playbook. If you are logged in as jdoe on the control node, but need to connect to the target as ubuntu, authentication will fail.
  3. Privilege Escalation Failure: The connection succeeded, but the task requires root privileges. You either forgot become: yes, or you need to supply a sudo password.
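
The parenthesized list in the OpenSSH message names the authentication methods the server is still willing to accept, which helps narrow down the cause. A small sketch for pulling that list out of a message follows; `offered_auth_methods` and `server_offers_method` are hypothetical helper names used only for illustration.

```shell
#!/bin/sh
# offered_auth_methods MESSAGE
# Print the comma-separated method list from "Permission denied (...)".
# Prints nothing if the message does not contain that pattern.
offered_auth_methods() {
    printf '%s\n' "$1" | sed -n 's/.*Permission denied (\([^)]*\)).*/\1/p'
}

# server_offers_method MESSAGE METHOD
# Succeed (exit 0) if METHOD appears in the offered list.
server_offers_method() {
    case ",$(offered_auth_methods "$1")," in
        *",$2,"*) return 0 ;;
        *)        return 1 ;;
    esac
}
```

If `password` is absent from the list, the server accepts keys only, so a key or user mismatch becomes the first thing to check.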

Step-by-Step Fix:

  1. Define the Remote User: Explicitly declare the user in your inventory or playbook using ansible_user: ubuntu.
  2. Validate SSH Keys: Ensure your SSH key is loaded into your agent (ssh-add -l). If using a specific key for a specific host, define it in your inventory:
    ansible_ssh_private_key_file=/path/to/key.pem
  3. Fix Privilege Escalation: If the error occurs during a task that modifies system state (like installing a package), ensure your playbook has:
    become: yes
    become_method: sudo
    
    If the target user requires a password for sudo, run your playbook with the --ask-become-pass (or -K) flag.
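
To see the user and key settings from the steps above in one place, the sketch below writes a one-host INI inventory. The function name `write_example_inventory` and the `[web]` group name are hypothetical; the host, user, and key path are the placeholder values used in this guide, not real infrastructure.

```shell
#!/bin/sh
# write_example_inventory PATH
# Write a minimal INI inventory that pins the remote user and private key
# for a single host. All values are placeholders from this guide.
write_example_inventory() {
    cat > "$1" <<'EOF'
[web]
webserver-01 ansible_user=ubuntu ansible_ssh_private_key_file=/path/to/key.pem
EOF
}
```

After `write_example_inventory ./inventory.ini`, running `ansible web -i ./inventory.ini -m ping` exercises the SSH login alone, and `ansible web -i ./inventory.ini -b -K -m command -a whoami` should print `root` when privilege escalation works.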

3. Troubleshooting "Ansible Timeout"

The Error Signature:

fatal: [db-node-01]: UNREACHABLE! => {
  "changed": false,
  "msg": "Data could not be sent to remote host \"db-node-01\". Make sure this host can be reached over ssh: ssh: connect to host db-node-01 port 22: Operation timed out",
  "unreachable": true
}

Root Causes: Unlike a Connection refused (which is an active rejection), a Timeout means packets are being sent into a black hole. No response is ever received.

  1. Strict Firewalls: A firewall is configured to DROP packets instead of REJECT them.
  2. Dead Host: The target machine is powered off, isolated from the network, or the IP address is simply unassigned.
  3. High Latency/Overload: The network link is extremely slow, or the control node is overloaded by running too many forks simultaneously, causing the SSH handshake to exceed the default 10-second timeout.

Step-by-Step Fix:

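  1. Ping the Host: Test basic ICMP reachability:
    ping -c 4 db-node-01
    
    If it times out, the problem is most likely routing or a firewall rather than Ansible. Some networks block ICMP while still allowing SSH, so confirm with a TCP check against port 22 before drawing conclusions.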
  2. Increase Ansible Timeout Settings: If the network is just slow or crossing a high-latency VPN, modify your ansible.cfg:
    [defaults]
    timeout = 30
    
    [ssh_connection]
    retries = 3
    
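  3. Optimize SSH Connections: For large fleets, ensure SSH multiplexing is enabled in ansible.cfg so established connections are reused, which reduces handshake overhead and timeout risk. Ansible's default ssh_args already include these options, so this matters mainly when a custom ssh_args has dropped them:
    [ssh_connection]
    ssh_args = -o ControlMaster=auto -o ControlPersist=60s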
    

The Ultimate Diagnostic Tool: -vvvv

When all else fails, run your playbook with the -vvvv flag. This "connection debugging" mode will print the exact, raw OpenSSH command that Ansible is executing under the hood.

You can literally copy and paste this command into your terminal. This strips Ansible out of the equation entirely. If the raw SSH command fails, you have a system or network administration problem. If the raw SSH command succeeds, but Ansible fails, you have a configuration issue within your inventory or ansible.cfg.
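
Finding that command in a long verbose log can be tedious. As a rough sketch, the filter below keeps only the SSH invocations, assuming the log marks them with the prefix `SSH: EXEC` as recent Ansible releases do. The name `extract_ssh_commands` is hypothetical.

```shell
#!/bin/sh
# extract_ssh_commands
# Read Ansible -vvvv output on stdin and print each logged SSH invocation,
# dropping everything up to and including the "SSH: EXEC " marker.
extract_ssh_commands() {
    sed -n 's/^.*SSH: EXEC //p'
}
```

A typical use would be `ansible-playbook site.yml -i inventory.ini -vvvv 2>&1 | extract_ssh_commands`. The remote command at the end of each line may need re-quoting before it is pasted into a terminal.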

# 1. Test basic SSH connectivity bypassing Ansible
ssh -vvv -i /path/to/private_key.pem ubuntu@10.0.5.21

# 2. Test if TCP port 22 is open and accepting connections
nc -zv 10.0.5.21 22

# 3. Run playbook with maximum connection debugging and prompt for sudo password
ansible-playbook site.yml -i inventory.ini -vvvv --ask-become-pass

# 4. Suggested ansible.cfg settings to reduce timeouts
# Add these lines to your local ansible.cfg file.
# Note: host_key_checking = False skips host key verification and does not
# affect timeouts; use it only for disposable or lab hosts.
# [defaults]
# timeout = 30
# host_key_checking = False
#
# [ssh_connection]
# ssh_args = -o ControlMaster=auto -o ControlPersist=60s -o ServerAliveInterval=15
# retries = 3

Error Medic Editorial

Written by senior Site Reliability Engineers and DevOps practitioners. We specialize in demystifying infrastructure-as-code, CI/CD pipelines, and large-scale system administration to help you keep your production environments stable and efficient.
