Ops Notes

Ansible vs SaltStack Migration Guide: What Nobody Tells You About the Switch

· InfraOps Router · Cloud & DevOps

Why We Jumped Ship from SaltStack

Let me be straight with you. Our team managed a 500-node production cluster with SaltStack for over three years. Initially, it felt like magic—fast, flexible, Jinja templates everywhere. But the cracks started showing fast.

The maintenance overhead was killing us.

Every new hire took a full week to grok how state.sls and pillar interacted. And that salt-master single point of failure? It bit us hard last year. Master went down, config pushes stopped across the entire cluster. We spent 8 hours manually fixing things.

So Q1 this year, we pulled the trigger: migrate from SaltStack to Ansible.

This isn’t another fluffy “digital transformation” piece. I’m dumping every gotcha, benchmark, and decision we made during the migration. If you’re considering this move, read this first.

Architecture: Agent vs Agentless — Pick Your Poison

This is the fundamental difference between these tools, and the first decision you’ll face during migration.

| Feature | SaltStack | Ansible |
|---|---|---|
| Architecture | Master-Minion (agent) | Agentless (push over SSH) |
| Protocol | ZeroMQ (custom) | SSH (standard) |
| Execution Speed | Fast (high concurrency, low latency) | Moderate (SSH handshake overhead) |
| Deployment Complexity | High (Master + Minion setup) | Low (Python + SSH only) |
| Security Auditing | Requires extra config | Native (SSH logging) |
| Single Point of Failure | Master is a SPOF | None (multiple control nodes) |

My take:

If you have a mature ops team, Salt’s agent architecture can squeeze out better raw performance. But if you’re like me—tired of babysitting a Master cluster—Ansible’s agentless design is a lifesaver.

After that Master outage, I kept thinking: why should I worry about my config management tool’s own high availability? Ansible doesn’t have that problem. Your control node is just a CI job. It fails? Rerun it.

Syntax Migration: SLS to YAML — Looks Similar, Hurts Different

This is the most underestimated part of the migration. Everyone assumes “it’s all YAML, just translate it directly.” Then reality hits.

SaltStack SLS:

install_nginx:
  pkg.installed:
    - name: nginx
  service.running:
    - name: nginx
    - enable: True
    - require:
      - pkg: install_nginx

Ansible Playbook:

- name: Install and configure nginx
  hosts: web_servers
  become: true  # package and service changes need root
  tasks:
    - name: Install nginx
      ansible.builtin.package:
        name: nginx
        state: present

    - name: Enable and start nginx
      ansible.builtin.service:
        name: nginx
        state: started
        enabled: true

Looks similar, right? Here’s where it gets painful:

1. Jinja placement is completely different

Salt lets you write Jinja directly inside state files:

{% if grains['os_family'] == 'Debian' %}
install_nginx:
  pkg.installed:
    - name: nginx
{% endif %}

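Ansible parses a playbook as YAML first, so you can't wrap tasks in {% if %} blocks. Jinja is limited to expressions inside values and to template files. Want conditional logic in your Playbook? Put a when clause on the task, for example when: ansible_facts['os_family'] == 'Debian'. It’s cleaner but requires rewriting everything.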

2. Dependency management is fundamentally different

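Salt uses require and watch with ID-based declarative dependencies. Ansible runs tasks top to bottom in the order you write them, so most require chains simply become task order. The equivalent of watch is notify plus handlers: a task that reports a change notifies a handler, and the handler runs once at the end of the play. It's more of an event-driven pattern.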

Honestly, Ansible’s approach is more intuitive. Salt’s declarative chains become incomprehensible once you have more than a few dependencies.

Performance Benchmarks: Real Numbers

We ran a simple test, executing uptime on batches of 10, 50, and 100 servers:

| Scenario | SaltStack | Ansible | Ansible (with Mitogen) |
|---|---|---|---|
| 10 nodes concurrent | 1.2s | 3.8s | 1.5s |
| 50 nodes concurrent | 2.1s | 8.4s | 2.8s |
| 100 nodes concurrent | 3.5s | 15.2s | 4.1s |

Bottom line: Vanilla Ansible is noticeably slower than Salt. But throw Mitogen in the mix, and the gap shrinks to something most teams can live with.

For our 500-node setup, daily config pushes went from 20 seconds to 40 seconds. Nobody complained. It’s not a real-time trading system—who cares about 20 extra seconds?
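
To put numbers on "the gap shrinks", here is a small sketch that turns the benchmark timings into slowdown factors relative to Salt. The figures are copied from the table above; the helper name slowdown is just for illustration.

```python
# Timings in seconds, copied from the benchmark table above.
timings = {
    10: {"salt": 1.2, "ansible": 3.8, "mitogen": 1.5},
    50: {"salt": 2.1, "ansible": 8.4, "mitogen": 2.8},
    100: {"salt": 3.5, "ansible": 15.2, "mitogen": 4.1},
}


def slowdown(nodes, variant):
    """How many times slower `variant` is than Salt at this node count."""
    row = timings[nodes]
    return round(row[variant] / row["salt"], 2)


for nodes in timings:
    print(nodes, slowdown(nodes, "ansible"), slowdown(nodes, "mitogen"))
```

Vanilla Ansible goes from roughly 3.2x to 4.3x slower than Salt as the batch grows, while the Mitogen runs stay between about 1.2x and 1.3x.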

Migration Steps: The Gotchas We Hit

Step 1: Inventory Replaces Minion Management

Salt uses minion_id to identify machines. Ansible uses Inventory.

Gotcha: Salt’s grains might contain tons of custom data. These need to become host_vars or group_vars in Ansible.

My advice: export Salt’s grains.items as JSON with the salt CLI, then use a small Python script to split the result into Ansible’s host_vars directory structure.

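# Export Salt grains as one JSON document (--static waits for all minions
# and groups their returns instead of printing one JSON object per minion)
salt '*' grains.items --out=json --static > /tmp/all_grains.json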

# Split into host_vars
python3 split_grains.py /tmp/all_grains.json /etc/ansible/host_vars/
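
We never published split_grains.py, so treat the following as a minimal sketch of what such a script could look like rather than the real thing. The grain names in KEEP and the salt_ variable prefix are assumptions for illustration. It writes one JSON file per minion, which Ansible loads from host_vars just like a .yml file.

```python
#!/usr/bin/env python3
"""split_grains.py: hypothetical sketch, one host_vars file per minion."""
import json
import sys
from pathlib import Path

# Assumption: only these custom grains are worth carrying over. Most built-in
# grains (os_family, kernel, ...) are rediscovered by Ansible's fact gathering.
KEEP = {"roles", "datacenter", "environment"}


def parse_grains(text):
    """Parse Salt JSON output into {minion_id: grains}.

    Handles both a single JSON object and a stream of one JSON object
    per minion, which is what salt prints without --static.
    """
    decoder = json.JSONDecoder()
    merged, pos = {}, 0
    while pos < len(text):
        # raw_decode does not skip whitespace between objects, so do it here.
        if text[pos].isspace():
            pos += 1
            continue
        obj, pos = decoder.raw_decode(text, pos)
        merged.update(obj)
    return merged


def split_grains(text, out_dir, keep=KEEP):
    """Write <out_dir>/<minion_id>.json holding the selected grains."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for minion_id, grains in parse_grains(text).items():
        if not isinstance(grains, dict):
            continue  # minion returned an error string instead of grains
        # Prefix the keys so they cannot collide with Ansible variable names.
        selected = {f"salt_{k}": v for k, v in grains.items() if k in keep}
        target = out / f"{minion_id}.json"
        target.write_text(json.dumps(selected, indent=2, sort_keys=True) + "\n")
        written.append(target)
    return written


if __name__ == "__main__" and len(sys.argv) == 3:
    split_grains(Path(sys.argv[1]).read_text(), sys.argv[2])
```

Filtering through an explicit allow-list is deliberate: dumping every grain into host_vars mostly duplicates what Ansible's own fact gathering already provides.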

Step 2: State to Playbook Conversion

Big gotcha here: Salt’s state.highstate compiles every SLS file that top.sls assigns to a minion and orders the result by requisites. Ansible Playbooks run only what you list, in the order you list it.

We initially ran ansible-playbook site.yml with everything. Dependency order went completely sideways.

Solution: Split Playbooks by functional module (e.g., webserver.yml, database.yml), then assemble with import_playbook. You keep modularity and control execution order.

Step 3: Pillar to Ansible Vault Migration

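Salt’s Pillar data sits in plaintext on the Master by default, unless you have set up the GPG renderer. Ansible uses Vault for encrypted sensitive data.

Gotcha: If your Pillar has lots of passwords, every one of them has to be re-encrypted into Vault, and anything that was GPG-encrypted has to be decrypted first.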

We automated it:

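#!/bin/bash
# Write an encrypted copy of each Pillar file, one vault ID per file.
# OUT_DIR is an assumed destination; point it at your own repo layout.
OUT_DIR=./vault
mkdir -p "$OUT_DIR"
for file in /srv/pillar/*.sls; do
  name=$(basename "$file" .sls)
  ansible-vault encrypt --vault-id "$name@prompt" --output "$OUT_DIR/$name.yml" "$file"
done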

Team Adaptation: The Real Cost Is People

Technical migration is the easy part. Getting your team on board is where it gets hard.

One of our senior ops guys had been writing Salt states for three years. He fought the migration hard—kept pointing at Ansible’s slower performance.

How I convinced him:

  1. Ran a POC: managed 10 non-critical machines with Ansible for a month
  2. Compared recovery times: Salt Master down = 30 minutes to fix. Ansible control node down = just rerun the job
  3. Showed onboarding data: new hires went from 5 days to 1 day of ramp-up

His final verdict: “Ansible’s slower, but the maintenance savings are worth it.”

FAQ: What People Actually Ask Me

Q: Salt vs Ansible for large-scale clusters?

A: Depends on your definition of “large.” Under 1000 nodes, Ansible + Mitogen is fine. Over 5000, Salt’s agent model has advantages. But honestly, at that scale, you should be worrying about toolchain stability, not raw speed.

Q: How do you migrate without downtime?

A: Run both tools in parallel. Let Ansible take over 10% of machines first. Ramp up gradually. Our full migration took two months with zero business interruptions.
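
One way to pick which machines go first is a stable hash bucket per hostname, so raising the percentage only ever adds hosts to the Ansible side and never moves one back. This is a sketch of that idea, not a description of our exact tooling; the hostnames and function names are made up for illustration.

```python
import hashlib


def in_ansible_wave(hostname, percent):
    """True if this host belongs to the first `percent` of the rollout."""
    digest = hashlib.sha256(hostname.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # stable bucket 0..99
    return bucket < percent


def split_fleet(hostnames, percent):
    """Return (ansible_hosts, salt_hosts) for the current rollout stage."""
    ansible = [h for h in hostnames if in_ansible_wave(h, percent)]
    salt = [h for h in hostnames if not in_ansible_wave(h, percent)]
    return ansible, salt


# Hypothetical fleet of 500 hostnames, split for a 10% first wave.
fleet = [f"node{i:03d}.example.internal" for i in range(500)]
ansible_hosts, salt_hosts = split_fleet(fleet, 10)
print(len(ansible_hosts), len(salt_hosts))
```

Because the split comes from a hash, the first wave will be close to 10% of the fleet rather than exactly 10%. Whatever method you use, make sure no host is managed by both tools at the same time.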

Q: Can I reuse Jinja templates from Salt in Ansible?

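A: Not directly. The Jinja syntax itself is the same, but Salt templates lean on grains, pillar, and salt[...] calls that don't exist in Ansible. You can reuse the logic, but you’ll need to swap those references for Ansible facts and variables. Centralize all templates instead of scattering them across state files.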

Q: How do you fix Ansible’s SSH performance?

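A: Three approaches: 1) Mitogen plugin (free, open-source); 2) SSH connection reuse via ControlPersist (Ansible's default SSH arguments already turn it on, so check that nobody disabled it, and raise the timeout if it expires between tasks); 3) Use AWX or Ansible Tower for job scheduling. We use all three.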

Final Verdict: When to Migrate, When to Stay

Migrate if:

  • Your team is under 20 people with no dedicated Salt ops
  • You need frequent audits and compliance checks (Ansible’s SSH logging is native)
  • You want CI/CD to run config management (Ansible integrates better with GitLab CI/Jenkins)

Don’t migrate if:

  • You have a deeply customized Salt ecosystem (custom modules, extensive grains)
  • Your system has strict latency requirements (high-frequency trading)
  • Your team is deeply experienced with Salt and your setup is stable

My final advice:

If you’re still on SaltStack and feeling the maintenance pain, start planning your migration now. The longer you wait, the more it hurts.

But don’t go all-in at once. Let Ansible manage 10% of non-critical workloads for three months first. That’s exactly what we did. Looking back? Best decision we made all year.


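PS: Mitogen's maintenance has been patchy, and its support for new Ansible releases tends to lag. Check that it supports your Ansible version before depending on it. For large-scale deployments, consider ansible-pull mode or AWX instead.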