Ansible: pure (only in its) pragmatism

2025-04-15

I’ve been using Ansible for the better part of a decade, so I think I can offer some commentary on its merits and shortcomings. More importantly, I’ll try to analyze the approach it takes, its standing in the community, and what both imply for its utility.

What it is, what it does

It’s a tool that lets you perform various administration tasks remotely (and locally) and put systems into a desired state, with an emphasis on idempotence.

The main course is the ability to get things to happen on remote machines via SSH or WinRM (Windows’ analogue) in an idempotent way. You declare the desired state, for example: a folder should exist at some path, with such-and-such permissions and ownership. If it doesn’t - Ansible creates it. If it does, but the permissions are wrong, it changes the permissions. If all the required conditions are already met, nothing happens.
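To make the check-then-change idea concrete, here’s a rough Python sketch of the same logic in miniature. It is not how Ansible implements its file handling; the function name `ensure_directory` and the returned "changed" flag are my own illustration, and ownership is left out to keep it short.

```python
import os
import stat


def ensure_directory(path: str, mode: int = 0o755) -> bool:
    """Bring `path` to the desired state; return True only if something changed."""
    changed = False

    # Condition 1: the directory must exist.
    if not os.path.isdir(path):
        os.makedirs(path)
        changed = True

    # Condition 2: the permission bits must match the requested mode.
    current_mode = stat.S_IMODE(os.stat(path).st_mode)
    if current_mode != mode:
        os.chmod(path, mode)
        changed = True

    # If both conditions already held, nothing was touched.
    return changed
```

Calling it twice in a row with the same arguments should report a change the first time and none the second, which is the whole point of idempotence.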

The same is available for all the ordinary administration tasks you’d expect, like managing users, groups, systemd units, firewall configuration, packages in the package manager, etc. It also gives you generic file editing capabilities (altering lines or blocks) and integrates Jinja2 templating for files and vars.

You can also imperatively run an executable or a piece of shell code, but that is discouraged, and if it’s unavoidable, you’re encouraged to wrap it in a way that makes it somewhat idempotent (check if something is in the desired state, only alter it if it isn’t).
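The usual wrapping trick is a guard: skip the command if the thing it produces is already there. Here’s a minimal Python sketch of that pattern, assuming the command leaves behind a file you can check for. The helper name `run_unless_exists` is hypothetical; Ansible’s command module offers a comparable `creates` argument.

```python
import os
import subprocess
from typing import Sequence


def run_unless_exists(argv: Sequence[str], creates: str) -> bool:
    """Run `argv` only if the path `creates` does not exist yet.

    Returns True if the command ran, False if it was skipped.
    """
    if os.path.exists(creates):
        # The result is already in place, so running again would be redundant.
        return False

    # check=True raises CalledProcessError on a non-zero exit code.
    subprocess.run(list(argv), check=True)
    return True
```

This is only "somewhat idempotent", as the text says: it trusts that the marker path really reflects the desired state, and it won’t notice if the result was later modified.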

Ansible also adds a layer of universal control logic, with looping, conditions, function-like code reuse (roles), try-catch equivalents, inclusion, et cetera.

This is all available through a YAML-based DSL akin to Kubernetes or Docker Compose, but with Jinja2 templating available everywhere.

Ansible has by far the biggest and most prolific community of all the tools in the category, and by far the largest base of plugins and reusable code (roles) available through its package manager - repository combo known as Ansible Galaxy.

Apart from things you can have it do remotely on another machine, there’s a large number of official and community-based integrations with APIs, so you can easily obtain credentials from HashiCorp Vault, register runners in GitLab, import and export Grafana dashboards, etc.

Design choices

The first obvious thing is that Ansible relies heavily on file hierarchy, file naming, and files in general.

As such, you’re expected to organize your playbooks (scripts you run directly) in a certain way, your inventory (hosts and groups) in another way, and roles (callable, reusable code, a cross between functions and modules), in a third way. It’s not hard to grasp but isn’t exactly intuitive either. You’ll be consulting the documentation a lot.

Ansible doesn’t have its own language, but a style or DSL of YAML that’s relatively consistent. I see this being sneered at, but it has its upsides, too: familiarity, a decent solution for multi-line strings, availability of anchors (though those are not frequently used because Ansible has other mechanisms), and support for parsing in all major languages if you want to generate, check or modify the code programmatically.

Ansible was always marketed as having simplicity™ as one of its goals. You might not like its brand of simplicity:

only global variables; worse, dynamically derived ones get mangled to be shared per host during a run, but are thankfully accessible in the global namespace
no variable namespacing, prefixes being the preferred way
no arguments, all args are global variables, you’re just supposed to prefix them with the role name; all you C programmers put your hands up in the air
due to the simple model of code reuse, there are include and import with different semantics
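Why prefixes matter is easiest to see in a toy model. The Python sketch below is deliberately simplified and is not Ansible’s actual precedence logic; the role names and variables are invented for illustration. It only shows what happens when everything lands in one flat namespace.

```python
def merge_vars(*sources: dict) -> dict:
    """Merge variable sources into one flat namespace; later sources win."""
    merged: dict = {}
    for source in sources:
        merged.update(source)
    return merged


# Two hypothetical roles that both chose the obvious name "port".
nginx_vars = {"port": 80}
exporter_vars = {"port": 9100}

# One value silently overwrites the other.
clashing = merge_vars(nginx_vars, exporter_vars)

# Prefixing with the role name keeps both values alive.
prefixed = merge_vars({"nginx_port": 80}, {"exporter_port": 9100})
```

Here `clashing` ends up with a single `port` key, while `prefixed` keeps both settings. That’s the C-style convention the list above is poking fun at.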

Other obvious downsides include:

role argument documentation and validation require duplication, especially for defaults (you have to specify it both in the meta and in defaults), and are relatively recent
role-task distinction not that sensible, roles could have been structured more efficiently
the insistence on specifying both the inventory and the hosts in all cases

The role of roles has de facto changed over time, from “a set of tasks and helpers that enable a server to fulfill some role” to “a more complex piece of reusable code that you may call from a different place”.

Ansible is not a purely declarative tool by any means. Even though individual tasks are meant to be declarative to facilitate idempotence, their order matters, not to mention that you can run commands imperatively. In that way, it’s really not unlike running some shell scripts via SSH, just with some lipstick.

Vs Shell scripts

It might cross your mind that you could do everything Ansible does with SSH and shell scripts, or PowerShell and WinRM. You likely could, but it’d be much harder and take several times longer, not to mention that a much larger portion of it would be untested. Ansible has become mature and reliable, and you can more or less rely on it doing what it promises. Its idempotence focus really does help, too. You can give points to people with Ansible on their résumé when hiring. This is not the case for your brand of hand-rolled shell scripts.

The main value is that you get a consistent interface to all the regular sysadmin stuff on both Windows and Linux, networking equipment, and all sorts of REST and other APIs. You don’t have to check-then-do, you don’t need to mix sh and PowerShell, and you don’t need to mess about with manual curl calls. It’s done for you and it’s a huge time saver. You interact with it in the same way and handle it all at once.

You might think that it’d be nice if there were a similarly large collection of shell code, like libraries, that does the same things and you just call it from shell, but there isn’t. Yet Ansible exists, is widely used and has an incomparably large online presence, wealth of resources and third-party modules. I dare say it’s a strictly superior solution to just shell scripts, if you’re stuck with pet servers and infrastructure.

Vs Infrastructure As Code counterparts

There’s seemingly stiff competition from Puppet, Saltstack, Chef, but they differ enough that it’s not really the case.

Ansible is the fully-featured agentless solution. Salt has salt-ssh, Puppet has Bolt, but neither is as fully featured as its agent-based counterpart.

Not needing an agent installed on machines simplifies operation and makes drive-by configuration simple. Say, you want to ensure something is installed and configured on a set of machines, it’s quite easy to do. Being agentless, Ansible does not ensure it stays that way, of course. This is what defines its whole approach to configuration management.

It is not as pure an IaC tool as its competitors. You don’t define the complete machine configuration in code and have agents enforce it; you have some code ensuring something is in a certain state when you run it, no more and no less. This is less powerful, but it also requires far less commitment, which is what makes it so appealing. It takes very little effort in comparison to, say, Puppet, to get some initial results.

Speed is not Ansible’s strength. It uses mostly SSH for transport, and even with TCP parameter tweaks, pipelining or Mitogen optimizations it’s never fast for more complex jobs in high-ping scenarios (servers far away). You will be tempted to run it from a server closer to the targets at some point.

Ansible has by far the most active community, which gives you the best odds of finding an answer or a full solution for something online. You can find plugins for pretty much anything. This is very important and not to be discounted. LLMs also speak it proficiently.

When it does / doesn’t make sense

It makes sense when you need to ensure something is installed and configured on some pet machines. It’s a superior solution to doing everything with shell and SSH.

Although I’ve seen it used like this, it doesn’t really make a difference when all you do is provision a new VM (Ansible being the first thing you run after creating it). If you’re planning to run something only once, and in a single place, then it’s not much better than shell scripts.

It doesn’t make sense when you want to really configure all your infra in code in a repo. In that case, Puppet is a better solution. However, I maintain that this is a legacy approach and the OpenTofu + Packer approach always makes more sense, even when you’re installing on bare metal. It may not cover all your needs (especially for Windows), and in that case you may use a bit of Puppet or Salt.

A replacement?

From what I’ve seen, there is none. I can hypothesize what improvements could be made in a potential replacement; it’s mostly a matter of addressing the points from above:

non-global vars, real arguments, namespacing - all obvious improvements
simplification of how roles are defined and documented, move more of the semantics into the files and out of the paths, less duplication
better code reuse paradigm
overall simplification of role / task / playbook distinctions

Of course, any replacement does away with one of Ansible’s major strengths, which is widespread use and a huge amount of resources online. You can’t write that collective experience into existence.

Another consideration: Ansible is one of those tools that would never have taken off if it weren’t FOSS. Any replacement for it would have to match that.

Examples

For this blog post, I’ve ironed out a couple of playbooks that install my dotfiles and some essential utilities onto Linux and Windows machines. For Windows machines the playbook also tweaks the GUI a bit, like adjusting taskbar settings, reverting questionable Windows 11 decisions, showing hidden files and such.

I’m too lazy to rewrite it for this blog post, but here’s what a real-life example of Ansible use would look like: configuring a Windows GitLab runner for a legacy Windows project:

installing the required tools for compilation and testing
[unregistering and] registering a new GitLab runner with community.general.gitlab_runner (registration is required to get a runner token)
installing the GitLab runner onto the machine, configuring it (includes templating the token into the config)
ensuring the Windows service for it exists, is started and set for autorestart

Other simple examples include installing Node and Windows Prometheus exporters, installing and configuring Postgres and Mongo exporters (including creating DB users for them) and such. Also patching pet database servers or legacy servers if you have those.

A nice example someone else wrote would be Kubernetes node patching, though again I’d say that this sort of pet-sitting is inferior to just replacing the node with Terraform.

Conclusion

I’d say that Ansible is, in spite of its numerous shortcomings, a tool worth knowing and currently irreplaceable for its purpose: doing simple common tasks on multiple machines and interacting with APIs along the way.

Its lack of purity shouldn’t discourage you from using it, because it’s better than the alternative (shell scripts). There are few roles where its competitors beat it that are not served better by a more modern approach. I guess it’s important to note that it’s a helper tool, and unlikely to be the focus of your business, so laxer criteria apply compared to something that’s your main business focus.

I do not suggest you use it in a more important role (deploying and managing the whole system) in 2025 and onwards. The modern OpenTofu (Terraform) + Packer [+ Kubernetes] combo has become the standard for a reason and is worth considering. It’s only hard if you’re not familiar with it; otherwise it’s all quite logical. Not having a human in the loop for redeploying unhealthy infra is a huge QoL improvement.
