Error Medic

Ansible Failed: Fix Connection Refused, Permission Denied & Timeout Errors

Fix Ansible failures including connection refused, permission denied, and timeout errors. Step-by-step diagnosis with real commands and verified solutions.

Key Takeaways
  • Connection refused usually means SSH is not running on port 22 of the target host, or a firewall is blocking access — verify with `nc -zv <host> 22` before touching Ansible config.
  • Permission denied is almost always a missing or mismatched SSH key, a wrong `ansible_user`, or sudo not configured correctly on the remote host.
  • Timeout errors stem from slow DNS resolution, high network latency, or `ansible_ssh_timeout` / `DEFAULT_TIMEOUT` set too low for your environment.
  • Quick fix summary: run `ansible <host> -m ping -vvvv` to get full SSH-level output, isolate which layer is failing (network, auth, privilege escalation), then apply the targeted fix from the comparison table below.
Fix Approaches Compared
| Method | When to Use | Time to Apply | Risk |
|---|---|---|---|
| Add/fix SSH public key in authorized_keys | Permission denied with key auth | 2 min | Low (non-destructive) |
| Set ansible_password + sshpass for password auth | No key-based auth available | 5 min | Medium (credential exposure risk in logs) |
| Configure become + sudo NOPASSWD in /etc/sudoers | Permission denied during privilege escalation | 5 min | Medium (grants passwordless sudo) |
| Increase ansible_ssh_timeout and DEFAULT_TIMEOUT | Sporadic timeout on slow or remote hosts | 1 min | Low (benign config change) |
| Switch ansible_connection to paramiko | OpenSSH incompatibility with old hosts | 2 min | Low (slight performance overhead) |
| Use ProxyJump / ansible_ssh_common_args for bastion | Connection refused through a jump host | 10 min | Low (adds network hop complexity) |
| Fix /etc/hosts or DNS for target hostnames | Timeout caused by failed name resolution | 5 min | Low (may affect other services) |
| Disable host key checking (testing only) | Known_hosts mismatch blocking first run | 1 min | High (disables MITM protection) |

Understanding Ansible Failures

Ansible executes tasks by opening an SSH connection to each managed host, copying a small Python module, running it, and returning structured output. When any step in that chain breaks, Ansible surfaces a FAILED or UNREACHABLE task with a short error message. The three most common failure families — connection refused, permission denied, and timeout — each point to a different layer of the stack.

Understanding which layer is failing is the single most important diagnostic step. Resist the urge to tweak ansible.cfg blindly; instead, isolate the problem with the verbosity flag and standard Unix tools first.
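
As a rough illustration of that triage, the sketch below maps an error message to the layer it most likely points to. The function name `classify_ansible_error` and the layer labels are hypothetical, and the patterns are taken only from the sample messages in this guide, so treat it as a starting point rather than a complete classifier.

```shell
# classify_ansible_error MSG
# Prints the layer an Ansible error message most likely points to.
# Patterns come from the sample messages in this guide (illustrative only).
classify_ansible_error() {
    case "$1" in
        *"Connection refused"*)               echo "network" ;;
        *"Permission denied"*)                echo "auth" ;;
        *"Missing sudo password"*)            echo "escalation" ;;
        # Checked before the generic timeout pattern on purpose.
        *"waiting for privilege escalation"*) echo "escalation" ;;
        *"timed out"*|*"Timeout"*)            echo "timeout" ;;
        *)                                    echo "unknown" ;;
    esac
}
```

For example, `classify_ansible_error "Timeout (12s) waiting for privilege escalation prompt"` prints `escalation`, which points at sudo rather than at the network.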


Step 1: Enable Verbose Output and Reproduce the Failure

Run your playbook or ad-hoc command with -vvvv to expose the raw SSH command Ansible is constructing:

ansible all -m ping -vvvv -i inventory.ini

This reveals the exact SSH arguments, the control socket path, and the remote Python interpreter path. Copy the raw SSH command from the output and run it manually in your terminal. If the manual SSH succeeds but Ansible fails, the problem is in your Ansible configuration. If the manual SSH also fails, the problem is at the OS/network level and Ansible is not the right place to fix it.
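
Picking the SSH command out of several screens of debug output is easier with a small filter. The sketch below assumes the verbose output marks each command with the text "SSH: EXEC", which is how recent ansible-core releases print it; if your version formats the line differently, adjust the pattern. The function name `extract_ssh_exec` is hypothetical.

```shell
# extract_ssh_exec: read `ansible ... -vvvv` output on stdin and print the
# ssh command lines that follow the "SSH: EXEC" marker, de-duplicated.
# Assumption: the marker text matches what your Ansible version prints.
extract_ssh_exec() {
    sed -n 's/^.*SSH: EXEC //p' | sort -u
}

# Usage (not run here):
#   ansible web01 -m ping -vvvv -i inventory.ini 2>&1 | extract_ssh_exec
```

Each printed line can be pasted into a terminal to reproduce the connection outside Ansible.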


Step 2: Diagnose Connection Refused

What you see:

fatal: [web01]: UNREACHABLE! => {
    "changed": false,
    "msg": "Failed to connect to the host via ssh: ssh: connect to host 192.168.1.10 port 22: Connection refused",
    "unreachable": true
}

Root causes and checks:

  1. SSH daemon not running: Connect to the host via console or out-of-band access and run systemctl status sshd (the unit is named ssh on Debian and Ubuntu). If it is stopped, start it with systemctl start sshd.

  2. Wrong port — Some hardened hosts move SSH to a non-standard port. Check sshd_config on the target: grep ^Port /etc/ssh/sshd_config. Set ansible_port in your inventory or group_vars to match.

  3. Firewall blocking port 22 — From your Ansible control node, run nc -zv <target_ip> 22. A Connection refused from nc confirms the port is closed or blocked. On the target host, check iptables -L -n or firewall-cmd --list-all. Open the port with:

    firewall-cmd --permanent --add-service=ssh && firewall-cmd --reload
    
  4. Wrong host or IP in inventory — Verify the host is resolvable: getent hosts web01. If DNS fails, add a static entry to /etc/hosts on the control node or use the IP directly in inventory.

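  5. Jump host / bastion required: If the target is in a private subnet, you need a ProxyJump. Set it as a host or group variable in your inventory or group_vars (the ansible.cfg equivalent is ssh_common_args under [ssh_connection]):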

    ansible_ssh_common_args='-o ProxyJump=bastion.example.com'
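
The port probe from check 3 tells you more than pass or fail. A refusal means the host answered and the port is closed or actively rejected, while a timeout usually means packets are being dropped on the way. The sketch below turns probe output into one of those verdicts. The function name `interpret_probe` is hypothetical, and the matched phrases are assumptions about common nc and ncat wording, which varies between implementations.

```shell
# interpret_probe TEXT
# Maps the text printed by a port probe, for example the output of
#   nc -zv -w 5 <target_ip> 22 2>&1
# to a verdict: closed, filtered, open, or unknown.
# The phrases below are assumptions; extend them for your nc variant.
interpret_probe() {
    case "$1" in
        *refused*|*Refused*)                      echo "closed" ;;
        *"timed out"*|*"Timed out"*|*TIMEOUT*)    echo "filtered" ;;
        *succeeded*|*open*|*Connected*)           echo "open" ;;
        *)                                        echo "unknown" ;;
    esac
}
```

A `closed` verdict points to sshd or the listening port, `filtered` points to a firewall or routing problem, and `open` means the network path is fine and the next stop is authentication.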
    

Step 3: Diagnose Permission Denied

What you see:

fatal: [db01]: UNREACHABLE! => {
    "changed": false,
    "msg": "Failed to connect to the host via ssh: deploy@db01: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).",
    "unreachable": true
}

or during privilege escalation:

fatal: [db01]: FAILED! => {
    "msg": "Missing sudo password"
}

Authentication-layer fixes:

  1. Verify the correct user — Check ansible_user in your inventory or remote_user in ansible.cfg. The user must exist on the remote host. Confirm with: ssh -i ~/.ssh/id_rsa deploy@db01.

  2. Check the SSH key: Unless told otherwise, Ansible lets OpenSSH choose the key, so it tries your ssh-agent and the default identity files such as ~/.ssh/id_rsa or ~/.ssh/id_ed25519. If your key is elsewhere, set ansible_ssh_private_key_file in inventory. Ensure the public key appears in ~/.ssh/authorized_keys on the target, with permissions 600 on the file and 700 on ~/.ssh.

  3. Repair authorized_keys — If you can access the host another way, add the key:

    ssh-copy-id -i ~/.ssh/id_rsa.pub deploy@db01
    

    Or manually append the public key and fix permissions:

    chmod 700 ~/.ssh && chmod 600 ~/.ssh/authorized_keys
    
  4. SELinux context on authorized_keys — On RHEL/CentOS systems with SELinux enforcing, a wrong file context blocks SSH key auth even though permissions look correct:

    restorecon -Rv ~/.ssh
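
The permission checks above are easy to script. This sketch compares a directory and its authorized_keys file with the 700 and 600 modes recommended in this guide. The function name `check_ssh_perms` is hypothetical, and it assumes GNU coreutils `stat -c %a` as found on Linux; BSD and macOS `stat` use different flags.

```shell
# check_ssh_perms [DIR]
# Warns when DIR (default: ~/.ssh) or DIR/authorized_keys does not have
# the 700 / 600 modes recommended above. Returns 0 when both match,
# 1 on a mismatch, 2 when DIR cannot be inspected.
# Assumption: GNU stat (`stat -c %a`).
check_ssh_perms() {
    ssh_dir=${1:-"$HOME/.ssh"}
    rc=0
    dir_mode=$(stat -c %a "$ssh_dir") || return 2
    if [ "$dir_mode" != "700" ]; then
        echo "WARN: $ssh_dir is $dir_mode, expected 700"
        rc=1
    fi
    if [ -f "$ssh_dir/authorized_keys" ]; then
        file_mode=$(stat -c %a "$ssh_dir/authorized_keys")
        if [ "$file_mode" != "600" ]; then
            echo "WARN: $ssh_dir/authorized_keys is $file_mode, expected 600"
            rc=1
        fi
    else
        echo "WARN: $ssh_dir/authorized_keys not found"
        rc=1
    fi
    return "$rc"
}
```

Run it on the target as the login user, for example `check_ssh_perms` with no arguments, and fix anything it reports with the chmod commands above.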
    

Privilege escalation (sudo) fixes:

  1. Add become configuration: In ansible.cfg (or set become: true on the play or task instead):

    [privilege_escalation]
    become=True
    become_method=sudo
    become_user=root
    
  2. Configure passwordless sudo — Edit /etc/sudoers on the target (always use visudo):

    deploy ALL=(ALL) NOPASSWD: ALL
    

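    Restricting the rule to specific commands generally does not work with Ansible, because become runs each module from a temporary file whose name changes on every run. If NOPASSWD: ALL is too broad, limit which accounts get the rule or supply the password at runtime instead.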

  3. Provide the sudo password at runtime — If passwordless sudo is not acceptable:

    ansible-playbook site.yml --ask-become-pass
    

    Or store it as ansible_become_password in a file encrypted with Ansible Vault.
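
Before changing sudoers, it can help to see what the current rules already say. The sketch below reads sudoers text and reports whether a user has a NOPASSWD rule and whether requiretty is enabled. The function name `audit_sudoers` and its output format are hypothetical. It is a rough text check only: group rules such as %wheel, aliases, and include directives are not evaluated, and the user name is used as a plain regular expression.

```shell
# audit_sudoers USER
# Reads sudoers text on stdin and prints:
#   nopasswd=yes|no    USER has a rule containing NOPASSWD:
#   requiretty=yes|no  a Defaults line enables requiretty
# Rough check: groups, aliases and includes are ignored.
audit_sudoers() {
    audit_user=$1
    # Drop comment lines so commented-out rules are not counted.
    audit_rules=$(grep -v '^[[:space:]]*#' || true)
    if printf '%s\n' "$audit_rules" |
        grep -Eq "^[[:space:]]*${audit_user}[[:space:]].*NOPASSWD:"; then
        echo "nopasswd=yes"
    else
        echo "nopasswd=no"
    fi
    # "!requiretty" disables the option, so the character before it matters.
    if printf '%s\n' "$audit_rules" |
        grep -Eq '^[[:space:]]*Defaults.*[^!]requiretty'; then
        echo "requiretty=yes"
    else
        echo "requiretty=no"
    fi
}

# Usage on the target (not run here):
#   sudo cat /etc/sudoers | audit_sudoers deploy
```

A result of `nopasswd=no` explains a "Missing sudo password" failure when no become password is supplied.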


Step 4: Diagnose Timeout Errors

What you see:

fatal: [app01]: UNREACHABLE! => {
    "changed": false,
    "msg": "Failed to connect to the host via ssh: ssh: connect to host app01 port 22: Operation timed out",
    "unreachable": true
}

or mid-task:

fatal: [app01]: FAILED! => {
    "msg": "Timeout (12s) waiting for privilege escalation prompt"
}

Timeout-layer fixes:

  1. Increase the SSH connection timeout — In ansible.cfg:

    [defaults]
    timeout = 60
    
    [ssh_connection]
    ssh_args = -o ConnectTimeout=60
    

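    Or per-host in inventory: ansible_ssh_timeout=60. Setting ssh_args replaces Ansible's default value (-C -o ControlMaster=auto -o ControlPersist=60s), so add those options back if you still want connection reuse.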

  2. Slow DNS resolution — If getent hosts <target> takes several seconds, DNS is the bottleneck. Add the host to /etc/hosts on the control node, or set UseDNS no in sshd_config on target hosts to skip reverse-DNS lookup during SSH handshake.

  3. SSH multiplexing — Enable ControlPersist to reuse connections across tasks, dramatically reducing per-task overhead:

    [ssh_connection]
    ssh_args = -o ControlMaster=auto -o ControlPersist=60s
    pipelining = True
    
  4. GSSAPI delays on non-Kerberos hosts — SSH tries Kerberos auth before publickey on many distros. Disable it:

    [ssh_connection]
    ssh_args = -o GSSAPIAuthentication=no
    

    This alone can cut connection time by 5–30 seconds per host.

  5. Privilege escalation timeout — If sudo prompts are timing out, ensure requiretty is not set in /etc/sudoers (it breaks Ansible's non-interactive sudo):

    Defaults !requiretty
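
Fixes 1, 3, and 4 each set ssh_args, but a config file can carry only one ssh_args value, so the options need to be merged into a single line. The sketch below builds that line from a list of `-o` options. The helper name `build_ssh_args` is hypothetical, and it only handles `-o` style options.

```shell
# build_ssh_args OPTION...
# Joins ssh -o options into one ssh_args line for the [ssh_connection]
# section of ansible.cfg.
build_ssh_args() {
    line="ssh_args ="
    for opt in "$@"; do
        line="$line -o $opt"
    done
    printf '%s\n' "$line"
}

# Combine the multiplexing, timeout, and GSSAPI options from this step.
build_ssh_args ControlMaster=auto ControlPersist=60s \
    ConnectTimeout=60 GSSAPIAuthentication=no
# prints: ssh_args = -o ControlMaster=auto -o ControlPersist=60s -o ConnectTimeout=60 -o GSSAPIAuthentication=no
```

Paste the printed line under a single [ssh_connection] header, next to pipelining = True if you use it.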
    

Step 5: Validate the Fix

After applying any change, validate with a targeted ping:

ansible <host_or_group> -m ping -i inventory.ini

Expected success output:

web01 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}

Then run your playbook with --check (dry-run) before live execution:

ansible-playbook site.yml --check --diff -i inventory.ini
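
The two validation commands can be chained so the dry run only starts when every host answered the ping. The sketch below counts failed hosts in ad-hoc output. The function name `count_ping_failures` is hypothetical, and it assumes the default ad-hoc output format shown above (`host | STATUS => {`); a custom stdout callback would need a different pattern.

```shell
# count_ping_failures: read ad-hoc `ansible ... -m ping` output on stdin
# and print how many hosts reported UNREACHABLE! or FAILED!.
# Assumption: default ad-hoc output format.
count_ping_failures() {
    # grep -c exits non-zero when the count is 0; keep the status clean.
    grep -Ec '\| (UNREACHABLE|FAILED)! =>' || true
}

# Usage (not run here):
#   failures=$(ansible all -m ping -i inventory.ini 2>&1 | count_ping_failures)
#   [ "$failures" -eq 0 ] && ansible-playbook site.yml --check --diff -i inventory.ini
```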

For CI/CD pipelines, set ANSIBLE_HOST_KEY_CHECKING=False only in ephemeral environments where hosts are freshly provisioned on a network you trust and the remaining MITM risk is acceptable. Never disable host key checking in production.

#!/usr/bin/env bash
# Ansible Failure Diagnostic Script
# Usage: bash ansible-diag.sh <target_host> [inventory_file]

HOST=${1:?"Usage: $0 <target_host> [inventory]"}
INVENTORY=${2:-"inventory.ini"}

echo "=== [1] Resolve hostname ==="
getent hosts "$HOST" || echo "WARN: hostname not resolvable -- check DNS or /etc/hosts"

echo ""
echo "=== [2] Test TCP port 22 (connection refused check) ==="
nc -zv -w 5 "$HOST" 22 2>&1 || echo "FAIL: port 22 unreachable -- check sshd and firewall"

echo ""
echo "=== [3] Ansible ping with full verbosity ==="
ansible "$HOST" -m ping -i "$INVENTORY" -vvvv 2>&1 | tee /tmp/ansible-ping-debug.log
echo "Full output saved to /tmp/ansible-ping-debug.log"

echo ""
echo "=== [4] Extract raw SSH command from debug log ==="
# Match the "SSH: EXEC" marker so error text that merely mentions ssh is skipped
grep -o 'SSH: EXEC .*' /tmp/ansible-ping-debug.log | head -5

echo ""
echo "=== [5] Check current ansible.cfg in effect ==="
ansible --version | grep 'config file'

echo ""
echo "=== [6] Show effective variables for host ==="
ansible "$HOST" -m debug -a 'var=ansible_ssh_private_key_file' -i "$INVENTORY"
ansible "$HOST" -m debug -a 'var=ansible_user' -i "$INVENTORY"
ansible "$HOST" -m debug -a 'var=ansible_port' -i "$INVENTORY"

echo ""
echo "=== [7] Test privilege escalation ==="
ansible "$HOST" -m command -a 'whoami' -i "$INVENTORY" --become

echo ""
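echo "=== [8] Check GSSAPI auth delay ==="
# Time a baseline run first so there is something to compare against
echo "--- baseline (default ssh options) ---"
time ssh -o BatchMode=yes "$HOST" echo ok 2>&1 || true
echo "--- with GSSAPIAuthentication=no ---"
time ssh -o GSSAPIAuthentication=no -o BatchMode=yes "$HOST" echo ok 2>&1 || true
echo "If the second run is significantly faster, add GSSAPIAuthentication=no to ssh_args in ansible.cfg"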

echo ""
echo "=== Diagnostic complete ==="

Error Medic Editorial

Error Medic Editorial is a team of senior DevOps and SRE engineers with combined experience managing infrastructure at scale across cloud and on-premises environments. We write practical, command-first troubleshooting guides tested against real systems.
