Why We Jumped Ship from SaltStack
Let me be straight with you. Our team managed a 500-node production cluster with SaltStack for over three years. Initially, it felt like magic—fast, flexible, Jinja templates everywhere. But the cracks started showing fast.
The maintenance overhead was killing us.
Every new hire took a full week to grok how state.sls and pillar interacted. And that salt-master single point of failure? It bit us hard last year. Master went down, config pushes stopped across the entire cluster. We spent 8 hours manually fixing things.
So Q1 this year, we pulled the trigger: migrate from SaltStack to Ansible.
This isn’t another fluffy “digital transformation” piece. I’m dumping every gotcha, benchmark, and decision we made during the migration. If you’re considering this move, read this first.
Architecture: Agent vs Agentless — Pick Your Poison
This is the fundamental difference between these tools, and the first decision you’ll face during migration.
| Feature | SaltStack | Ansible |
|---|---|---|
| Architecture | Master-Minion (Agent) | Agentless (SSH push by default) |
| Protocol | ZeroMQ (custom) | SSH (standard) |
| Execution Speed | Fast (high concurrency, low latency) | Moderate (SSH handshake overhead) |
| Deployment Complexity | High (Master + Minion setup) | Low (Python + SSH only) |
| Security Auditing | Requires extra config | Native (SSH logging) |
| Single Point of Failure | Master is a SPOF | None (multiple control nodes) |
My take:
If you have a mature ops team, Salt’s agent architecture can squeeze out better raw performance. But if you’re like me—tired of babysitting a Master cluster—Ansible’s agentless design is a lifesaver.
After that Master outage, I kept thinking: why should I worry about my config management tool’s own high availability? Ansible doesn’t have that problem. Your control node is just a CI job. It fails? Rerun it.
Syntax Migration: SLS to YAML — Looks Similar, Hurts Different
This is the most underestimated part of the migration. Everyone assumes “it’s all YAML, just translate it directly.” Then reality hits.
SaltStack SLS:
install_nginx:
  pkg.installed:
    - name: nginx
  service.running:
    - name: nginx
    - enable: True
    - require:
      - pkg: install_nginx
Ansible Playbook:
- name: Install and configure nginx
  hosts: web_servers
  tasks:
    - name: Install nginx
      ansible.builtin.package:
        name: nginx
        state: present
    - name: Enable and start nginx
      ansible.builtin.service:
        name: nginx
        state: started
        enabled: yes
Looks similar, right? Here’s where it gets painful:
1. Jinja placement is completely different
Salt lets you write Jinja directly inside state files:
{% if grains['os_family'] == 'Debian' %}
install_nginx:
  pkg.installed:
    - name: nginx
{% endif %}
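Ansible parses a Playbook as YAML before any templating happens, so you can’t wrap tasks in {% if %} blocks. Jinja is only allowed inside variable values and template files. Want conditional logic in your Playbook? Put a when condition on the task. It’s cleaner but requires rewriting everything.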
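For reference, here is roughly how the Debian-only state above might look as an Ansible task. Treat it as a minimal sketch rather than a drop-in replacement: it assumes fact gathering is left on (the default) so that ansible_facts is populated, and it reuses the web_servers group from the earlier example.

```yaml
- name: Install nginx on Debian-family hosts only
  hosts: web_servers
  tasks:
    - name: Install nginx
      ansible.builtin.package:
        name: nginx
        state: present
      # when takes a bare Jinja expression, so no {{ }} or {% %} around it
      when: ansible_facts['os_family'] == 'Debian'
```

The condition is evaluated per host, so hosts from other OS families simply report the task as skipped.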
2. Dependency management is fundamentally different
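Salt uses require and watch with ID-based declarative dependencies. Ansible simply runs tasks in the order you write them, so most require chains turn into plain task ordering, while watch maps to notify and handlers, which is more of an event-driven pattern.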
Honestly, Ansible’s approach is more intuitive. Salt’s declarative chains become incomprehensible once you have more than a few dependencies.
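To make the watch-to-handler mapping concrete, here is a small sketch of what a "restart nginx when its config changes" rule could look like on the Ansible side. The template name nginx.conf.j2 is a placeholder I made up for illustration.

```yaml
- name: Configure nginx
  hosts: web_servers
  tasks:
    - name: Deploy nginx config
      ansible.builtin.template:
        src: nginx.conf.j2          # hypothetical template name
        dest: /etc/nginx/nginx.conf
      notify: Restart nginx         # fires only if this task reports "changed"
  handlers:
    - name: Restart nginx
      ansible.builtin.service:
        name: nginx
        state: restarted
```

A handler runs once at the end of the play no matter how many tasks notified it, which covers most of what a Salt watch requisite on a service did.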
Performance Benchmarks: Real Numbers
We ran a simple test executing uptime on 10, 50, and 100 servers:
| Scenario | SaltStack | Ansible | Ansible (with Mitogen) |
|---|---|---|---|
| 10 nodes concurrent | 1.2s | 3.8s | 1.5s |
| 50 nodes concurrent | 2.1s | 8.4s | 2.8s |
| 100 nodes concurrent | 3.5s | 15.2s | 4.1s |
Bottom line: Vanilla Ansible is noticeably slower than Salt, roughly 3x to 4.3x in this test. But throw Mitogen in the mix, and the gap shrinks to about 1.2x to 1.3x, which most teams can live with.
For our 500-node setup, daily config pushes went from 20 seconds to 40 seconds. Nobody complained. It’s not a real-time trading system—who cares about 20 extra seconds?
Migration Steps: The Gotchas We Hit
Step 1: Inventory Replaces Minion Management
Salt uses minion_id to identify machines. Ansible uses Inventory.
Gotcha: Salt’s grains might contain tons of custom data. These need to become host_vars or group_vars in Ansible.
My advice: export Salt’s grains.items as JSON, then use a small Python script to split the output into Ansible’s host_vars directory structure.
# Export Salt grains (--static waits for all minions and emits one valid JSON document)
salt '*' grains.items --out=json --static > /tmp/all_grains.json
# Split into host_vars
python3 split_grains.py /tmp/all_grains.json /etc/ansible/host_vars/
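If you would rather stay inside Ansible for this step, the split can also be sketched as a small playbook. This is not split_grains.py itself, just an illustration of the same idea. It assumes the export is a single JSON object keyed by minion ID, that minion IDs match your inventory hostnames, and it nests everything under a salt_grains key (an illustrative name) so the imported data can't collide with Ansible's own variables.

```yaml
- name: Split exported Salt grains into host_vars files
  hosts: localhost
  connection: local
  gather_facts: false
  vars:
    grains_file: /tmp/all_grains.json
    host_vars_dir: /etc/ansible/host_vars
  tasks:
    - name: Load the export (assumes one JSON object keyed by minion ID)
      ansible.builtin.set_fact:
        all_grains: "{{ lookup('ansible.builtin.file', grains_file) | from_json }}"

    - name: Write one vars file per minion
      ansible.builtin.copy:
        dest: "{{ host_vars_dir }}/{{ item.key }}.yml"
        # salt_grains is an illustrative namespace, not an Ansible convention
        content: "{{ {'salt_grains': item.value} | to_nice_yaml }}"
        mode: "0644"
      loop: "{{ all_grains | dict2items }}"
      loop_control:
        label: "{{ item.key }}"
```

Most stock grains (OS, kernel, CPU) duplicate what Ansible gathers as facts anyway, so it is worth pruning the output down to your custom grains before committing it.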
Step 2: State to Playbook Conversion
Big gotcha here: Salt’s state.highstate applies every SLS file targeted in top.sls and works out ordering from requisites. Ansible runs exactly the plays and tasks you list, in the order you list them.
We initially ran ansible-playbook site.yml with everything. Dependency order went completely sideways.
Solution: Split Playbooks by functional module (e.g., webserver.yml, database.yml), then assemble with import_playbook. You keep modularity and control execution order.
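A top-level playbook built this way might look like the sketch below. The file names come from the example above; the order shown (database before web) is only an assumption for illustration, so use whatever order your services actually need.

```yaml
# site.yml: top-level entry point; plays run in the order they are imported
- name: Databases first
  ansible.builtin.import_playbook: database.yml

- name: Then web servers
  ansible.builtin.import_playbook: webserver.yml
```

Each module can still be run on its own (for example, ansible-playbook webserver.yml) when you only need to touch one tier.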
Step 3: Pillar to Ansible Vault Migration
Salt’s Pillar stores data in plaintext on the Master by default (unless you use the GPG renderer). Ansible uses Vault to encrypt sensitive data at rest.
Gotcha: If your Pillar has lots of passwords, each one has to be re-encrypted into Vault, and any GPG-encrypted values must be decrypted first.
We automated it:
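#!/bin/bash
set -euo pipefail
# Encrypt each Pillar file into its own Vault file; vault/ is an example output directory
mkdir -p vault
for file in /srv/pillar/*.sls; do
  name=$(basename "$file" .sls)
  # --output writes the encrypted copy and leaves the source file untouched
  ansible-vault encrypt --vault-id "$name@prompt" --output "vault/$name.yml" "$file"
done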
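Once secrets live in Vault, a common pattern is to keep the encrypted values in one file and point at them from a plaintext one, so you can still grep for where a variable is defined. The sketch below shows the idea; the file paths and variable names are hypothetical, and it assumes your Pillar data is plain YAML without Jinja.

```yaml
# group_vars/all/vault.yml (shown before encryption; encrypt it with ansible-vault)
vault_db_password: "example-only-not-a-real-secret"

# group_vars/all/vars.yml (stays plaintext so it is easy to grep and review)
db_password: "{{ vault_db_password }}"
```

Playbooks and templates then reference db_password and never need to know which values came out of Vault.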
Team Adaptation: The Real Cost Is People
Technical migration is the easy part. Getting your team on board is where it gets hard.
One of our senior ops guys had been writing Salt states for three years. He fought the migration hard—kept pointing at Ansible’s slower performance.
How I convinced him:
- Ran a POC: managed 10 non-critical machines with Ansible for a month
- Compared recovery times: Salt Master down = 30 minutes to fix. Ansible control node down = just rerun the job
- Showed onboarding data: new hires went from 5 days to 1 day of ramp-up
His final verdict: “Ansible’s slower, but the maintenance savings are worth it.”
FAQ: What People Actually Ask Me
Q: Salt vs Ansible for large-scale clusters?
A: Depends on your definition of “large.” Under 1000 nodes, Ansible + Mitogen is fine. Over 5000, Salt’s agent model has advantages. But honestly, at that scale, you should be worrying about toolchain stability, not raw speed.
Q: How do you migrate without downtime?
A: Run both tools in parallel. Let Ansible take over 10% of machines first. Ramp up gradually. Our full migration took two months with zero business interruptions.
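One way to express that kind of gradual rollout is a dedicated inventory group for hosts that have already been handed over. The sketch below is an assumption about how you might structure it, not a record of our actual inventory; the hostnames and the migrated group name are placeholders.

```yaml
# inventory.yml: hostnames and the "migrated" group name are placeholders
all:
  children:
    web_servers:
      hosts:
        web01.example.com:
        web02.example.com:
        web03.example.com:
    migrated:            # hosts already handed over from Salt to Ansible
      hosts:
        web01.example.com:
```

A play can then target the intersection with hosts: "web_servers:&migrated", or you can pass --limit migrated on the command line, and ramping up is just a matter of adding hosts to the group.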
Q: Can I reuse Jinja templates from Salt in Ansible?
A: Not directly. Ansible’s Jinja usage differs from Salt’s. You can reuse the logic, but you’ll need to rewrite the syntax. Centralize all templates instead of scattering them across state files.
Q: How do you fix Ansible’s SSH performance?
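A: Three approaches: 1) Mitogen plugin (free, open-source); 2) SSH ControlPersist for connection reuse (Ansible enables it by default with a 60-second timeout, so in practice this means raising the timeout); 3) Use AWX or Ansible Tower (now called automation controller) for job scheduling. We use all three.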
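For the ControlPersist part, a minimal sketch of the tuning is shown below. Ansible's default SSH arguments already include ControlMaster=auto and ControlPersist=60s, so this only lengthens how long connections are kept open. The file path and the 300-second value are illustrative assumptions, not measured recommendations.

```yaml
# group_vars/all/ssh.yml: connection tuning for the OpenSSH connection plugin
# 300s is an arbitrary example value, not a recommendation
ansible_ssh_args: "-C -o ControlMaster=auto -o ControlPersist=300s"
```

Setting ansible_ssh_args replaces the default arguments entirely, which is why ControlMaster=auto is repeated here.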
Final Verdict: When to Migrate, When to Stay
Migrate if:
- Your team is under 20 people with no dedicated Salt ops
- You need frequent audits and compliance checks (Ansible’s SSH logging is native)
- You want CI/CD to run config management (Ansible integrates better with GitLab CI/Jenkins)
Don’t migrate if:
- You have a deeply customized Salt ecosystem (custom modules, extensive grains)
- Your system has strict latency requirements (high-frequency trading)
- Your team is deeply experienced with Salt and your setup is stable
My final advice:
If you’re still on SaltStack and feeling the maintenance pain, start planning your migration now. The longer you wait, the more it hurts.
But don’t go all-in at once. Let Ansible manage 10% of non-critical workloads for three months first. That’s exactly what we did. Looking back? Best decision we made all year.
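PS: Mitogen is a third-party plugin, and its support for new Ansible releases tends to lag behind upstream, so check compatibility with your Ansible version before depending on it. For large-scale deployments, consider ansible-pull mode or AWX as well.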