I’ve used Ansible and Chef professionally to manage SoAs. The following is my opinion: professional, sure, but definitely opinion. (EDIT after posting: it’s also a wall of text, sorry for that.)
Ansible has a lot going for it: the language is simpler, there is less imposed structure, and there is a very good separation between things that are local and things that are remote. It’s also very lightweight: you only need something installed on your local machine, and everything else is done over ssh.
It also has some issues, three big ones. First, there is no standard tool for managing external, versioned, private role dependencies with transitive dependency resolution. That is to say, there does not exist (or does not seem to exist) an equivalent of Berkshelf for Chef. There is librarian-ansible, but it doesn’t do (or I cannot figure out how to make it do) transitive dependency resolution (i.e., downloading the dependencies of a role, and their dependencies, and so on). Further, ansible-galaxy does seem to download dependencies, but it cannot be used for anything other than open-source roles, and it doesn’t have (near as I can tell) any sort of manifest file (like Berkshelf’s Berksfile or librarian-ansible’s Ansiblefile).
Another major issue is that script variables don’t work in a particularly useful way. Though Ansible supports nested hierarchies of variables, which are useful for organizing variables in a simple and DRY way, the defaults aren’t properly merged with override variables. Unlike Chef’s notion of different ‘levels’ of variables, which allow your deployer-facing scripts to specify only what they need, Ansible’s defaults really are better termed documentation-of-defaults, since any overrides from the playbook simply replace, rather than merge with, the defaults.
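To make the replace-versus-merge distinction concrete, here is a minimal Python sketch. It is not Ansible or Chef code, and the variable names and values are made up for illustration. It compares a plain top-level update, which mirrors the replacing behavior described above, with a recursive merge that keeps any defaults the override does not mention.

```python
from copy import deepcopy

def replace_override(defaults, overrides):
    """Top-level replace: an overriding key discards the whole default subtree."""
    result = deepcopy(defaults)
    result.update(deepcopy(overrides))
    return result

def deep_merge(defaults, overrides):
    """Recursive merge: nested dicts are combined key by key."""
    result = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = deepcopy(value)
    return result

# Hypothetical role defaults and a playbook override (illustrative values only).
defaults = {"nginx": {"port": 80, "workers": 4, "user": "www-data"}}
overrides = {"nginx": {"port": 8080}}

replaced = replace_override(defaults, overrides)
merged = deep_merge(defaults, overrides)
print(replaced)  # {'nginx': {'port': 8080}}
print(merged)    # {'nginx': {'port': 8080, 'workers': 4, 'user': 'www-data'}}
```

With the replace behavior, the override has to restate every key it wants to keep, which is why the defaults end up working more like documentation than like defaults.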
However, by far the biggest problem with Ansible is that its versioning practices are the worst I’ve seen anywhere. They don’t seem to follow semver, so I’ll upgrade my Ansible from 1.6.5 to 1.6.6 and find that someone removed the default from some parameter of some module, and now I have to go chase down a bunch of problems because the public API changed in a patch update. What’s worse, new features often seem to be sprinkled willy-nilly among patch and minor version updates, and breaking changes happen freely. If you’re going to use Ansible, pick a version and stick with it, and update on a scheduled basis rather than assuming that patch versions are safe.
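As a rough illustration of the "pick a version and stick with it" advice, the Python sketch below contrasts what semver would normally let you assume with a strict exact-match pin. The pinned value and the helper names are hypothetical, and the parser assumes plain dotted-integer versions.

```python
def parse_version(text):
    """Turn '1.6.5' into (1, 6, 5); assumes plain dotted integers."""
    return tuple(int(part) for part in text.strip().split("."))

def semver_compatible(pinned, installed):
    """What semver would promise: same major version, and not older than the pin."""
    p, i = parse_version(pinned), parse_version(installed)
    return i[0] == p[0] and i >= p

def exact_match(pinned, installed):
    """The stricter policy: only the pinned version itself passes."""
    return parse_version(pinned) == parse_version(installed)

PINNED = "1.6.5"  # hypothetical pin, e.g. recorded alongside the playbooks

for installed in ("1.6.5", "1.6.6", "1.7.0"):
    print(installed, semver_compatible(PINNED, installed), exact_match(PINNED, installed))
```

The point of the exact match is that a patch bump is treated like any other upgrade: something you schedule and test, not something you assume is safe.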
This latter point contrasts with Chef, which in my experience is much more stable in its API exposure. Chef also doesn’t have the variable resolution problem or the lack of a standard package manager (at least in the more recent releases, where Berkshelf is baked in). The language is also richer (being just Ruby, you can build up some good abstractions easily if you need to), and in particular LWRPs make it easy to present a very simple API for often complex parts of your infrastructure. Ansible loses there too: building custom modules is possible, but it’s not nearly as simple as with Chef and LWRPs, and distributing them is also a bit more complicated.
The major downside to Chef is the need for the remote to have Chef (and thus Ruby and so on) installed. For Ruby shops this usually isn’t an imposition, and the chef-dk package they now distribute is much easier to set up than the old omnibus packages, even on distros they don’t explicitly support. (I maintain some Vagrant boxes which run Arch and chef-dk; whereas the old omnibus-based approach routinely broke, the chef-dk approach is not only simpler to install but also much more reliable.) It’s also a ‘heavier’ tool in terms of the structure it imposes and the sheer amount of stuff to deal with when creating a new recipe. The border between provisioner-action and provisioned-action is also not well established: everything basically happens on the remote, and I don’t know of any equivalent to Ansible’s “Local action” concept, which is very useful for automating, for instance, registration with load balancers, message queues, and so on, as well as simply being able to spin up resources or lock resources (to solve the inevitable “which machine migrates the database” problem).
Ultimately I think that, given the choice, I’d still go for Chef. The problems with Chef are primarily in features it outright lacks (simplicity and local-action, in particular), rather than in features it outright gets wrong, which is where Ansible’s problems lie (variable management and the lack of LWRPs or easy-to-build custom modules). Chef also seems a lot more self-contained than Ansible; in particular, Ansible is, at its core, a collection of utilities which operate on JSON passed through STDIN and STDOUT. That’s nice and unixy, but it does end up meaning that different modules vary subtly in behavior, and it also makes for less cohesion between modules. For instance, the file, copy, and template modules all do related but different things; in reality, they should probably just be one (or maybe two) modules: copy should be merged with template, and file should be renamed to filesystem or something like that. Each takes arguments for specifying a source, and some for specifying a destination, but the naming is inconsistent between them: file uses path, copy uses src, etc. Most of the modules now alias each other’s argument names, but this actually makes things worse, since in some places file will use src and in others it will use path, and so on. I’m a Ruby guy, but when it comes to this I’m with the Zen of Python (PEP 20): one way to do it is better here, at least when it comes to naming.
That’s not to say Ansible can’t be used effectively, rather that it still feels like a young project: not all the pieces are in place, and those that are sit askew, waiting to be nudged into position by the other puzzle pieces yet to be placed. There is a lot of active development on various tools, and the overall complexity story will (I believe) someday be less than Chef’s; I didn’t get into Chef Solo vs. Chef Server, but Ansible is already far less complex to manage than that.
For small projects, Ansible is a fine choice if you’re aware of the caveats. If you’re managing something bigger, or if you want to provide a better maintenance story for a large cadre of scripts, then I think Chef is probably the way to go: it’s much more mature, and ultimately it will serve you better as the number of recipes that need maintaining grows. For my part, running with Ansible now on a project that is in the beginning stages of migrating to an SoA, I’m left with concerns about how well things will scale when we get past the ‘more than five services’ mark. Right now the strategy we’ve gone with is to limit the architecture of those services to something generic enough that it can be managed by a single script, but as we look to incorporate other stacks to suit our requirements, I suspect that such limitations might loosen, and the result may be a lot of work.
Ask me in a year, I’ll let you know which system sucked harder. :)
We mostly use Chef but are slowly moving to Ansible for the following reasons:
Chef is very complicated. @jfredett talks about the variable layering being a feature of Chef, but we have found it mostly a complication. A variable can get its value from 15 possible locations, depending on where you are in the execution of your Chef run.
There is no reasonable way to test Chef. Ansible isn’t necessarily better here, but given that Ansible is significantly simpler, it’s less of a problem. There have been multiple situations where we just had to run in production and cross our fingers.
The Chef language is far too powerful: it has all the power of Ruby. Ansible really just executes scripts defined in a YAML file, which is very simple. Limitations are good. The Chef semantics are also rather terrible.
Chef contains no concept of orchestrating machines. We use Ansible to push new code out to a few new machines, monitor them, then move on if things look good after some time. One can do smart things like take a machine out of the load balancer while it’s being upgraded. Chef only has the concept of a single machine, so you cannot do these ‘staging things’.
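The staged rollout described here can be sketched in a few lines of Python. This is only an outline of the control flow, not how Ansible implements it: every callable passed in is a hypothetical hook, and the usage example runs against in-memory stand-ins rather than real hosts or a real load balancer.

```python
def rolling_deploy(hosts, batch_size, remove_from_lb, upgrade, healthy, add_to_lb):
    """Upgrade hosts in batches; stop at the first batch that looks unhealthy.

    The four callables are hypothetical hooks supplied by the caller.
    Returns the hosts that were upgraded and put back in rotation.
    """
    done = []
    for start in range(0, len(hosts), batch_size):
        batch = hosts[start:start + batch_size]
        for host in batch:
            remove_from_lb(host)  # drain traffic before touching the machine
            upgrade(host)
        if not all(healthy(host) for host in batch):
            # Leave the bad batch out of rotation and do not touch the rest.
            raise RuntimeError(f"batch {batch} failed health checks; stopping")
        for host in batch:
            add_to_lb(host)
        done.extend(batch)
    return done

# Usage with in-memory stand-ins (no real load balancer or hosts involved).
in_rotation = {"web1", "web2", "web3", "web4"}
versions = {host: "v1" for host in in_rotation}

result = rolling_deploy(
    hosts=sorted(in_rotation),
    batch_size=2,
    remove_from_lb=in_rotation.discard,
    upgrade=lambda host: versions.__setitem__(host, "v2"),
    healthy=lambda host: versions[host] == "v2",
    add_to_lb=in_rotation.add,
)
print(result)
```

The key property is that the loop spans machines: a failure on one batch decides what happens to the next, which is exactly what a single-machine model cannot express.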
If I were starting from scratch I might not use Ansible, but I would definitely not use Chef. In our experience, using Chef has generally meant spending 1.5 to 2x longer on a task, just to work through issues in Chef. We’ve also had scaling and availability issues with the Chef servers. Chef is one of the most complicated components most companies will deploy, which I think is fundamentally wrong.
FWIW, your complaints about Chef basically mirror my issues with Puppet: it’s impossible to test, difficult to work with, the manifest language is terrible, and there’s no way to build a workflow other than the One True Blessed Puppet Workflow.
I haven’t used Ansible, so I can’t directly comment on it, but my experience with Puppet definitely makes me lean towards simpler, more framework-like configuration management software like Ansible or Salt.
I’ll grant that Chef is quite complicated, but I think Ansible is too far in the direction of simplicity-for-its-own-sake. Variable management is a big deal with any nontrivially large infrastructure, so (IMO) it’s better to have something that’s more complicated and feature-rich than something less complicated that lacks necessary features. Obviously, YMMV. It’s definitely something that you have to weigh as a team, and it’s totally reasonable to pass on Chef because of the complication factor. The same argument applies to the config language. Ansible is nice in one sense, since the language is so simple, but in another, that simplicity rules out easy implementation of some of the nice features of Chef, like LWRPs.
As for your final point, that is, honestly, the one major thing that Chef and Puppet both don’t have and desperately need. I have no idea why the Chef folks haven’t incorporated something similar. It seems reasonable to just add a ‘local’ block for recipe execution, but they haven’t. What this has meant practically for me in the past is that rather than having the provisioning tool do orchestration, I’ve either had CloudFormation templates do the work, or I’ve opted for the packer.io approach that Mitchell Hashimoto (of Vagrant, Packer.io, and Serf) recommended in a talk last year (I don’t have the link handy, but the title was something like “Vagrant, Packer, and Serf for Devops”). The idea is basically that you use Packer as a tool to ‘compile’ your application into a VM image; that image is uploaded to your cloud (or built directly there, if Packer supports it) and then coordinated using Serf or some equivalent tool, so that the new VM becomes part of the collective and is organized appropriately. The benefit of this is that you can manage relatively large infrastructures with something like CloudFormation just bringing up the machine image, then use another tool to trigger the infrastructure change. But it’s still a lot of work, and there isn’t a very good way to simplify that process beyond the ‘just use the bare APIs’ approach.
All told, I think the most important thing to note is that devops tooling – whether ansible, puppet, chef, saltstack or whatever – is really a team-based decision. If your team is okay paying the price for the simplicity that Ansible offers, and you dislike the weird puppet language and complication of Chef, then by all means, Ansible away. There is no silver bullet in devops – I suspect, like most things, there will never be one. The choice is ‘what is right for my team’ – not ‘what is right’.
I’m with you on testing, but it’s definitely a problem everywhere. I have a few scripts that automate some basic testing, but it’s all homegrown. I’m displeased with some of the ‘devops-spec’ frameworks I’ve tried; in a lot of ways they usually end up just being a separate, equivalent implementation (if you go to any level of detail). My approach recently has been to have a shell script which spot checks the barest behavior I expect (i.e., test to see if this machine is accessible over this port, make sure this service is running on it, etc.; a healthy combination of ping, curl, and elbow grease), then just rely on a good QA cycle to ferret out the bugs. I’m not sure it can really get much better than that.
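A spot check of that kind could look roughly like the Python sketch below. It is a stand-in for the shell script mentioned above, not a copy of it: the function names are made up, and the demo probes a throwaway local listener so that it runs without any real hosts. In practice the check list would name your own machines and ports.

```python
import socket

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def failed_checks(checks):
    """checks: iterable of (label, host, port). Returns the labels that failed."""
    return [label for label, host, port in checks if not port_open(host, port)]

# Demo against a throwaway local listener (assumption: loopback is available).
listener = socket.socket()
listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
listener.listen(1)
open_port = listener.getsockname()[1]

failures = failed_checks([("service is listening", "127.0.0.1", open_port)])
listener.close()
print(failures)
```

This only proves that something accepts connections on the port; it says nothing about whether the service behaves correctly, which is where the QA cycle comes in.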
This is entirely subjective, but I played with Chef for a while. It seemed powerful, but it was just a lot more than I needed, and a lot more complicated. I didn’t care for the DSL, either.
I kept looking and eventually found Ansible. It’s a lot simpler, I feel that the documentation is better, and even our non-technical employees can follow along with the playbooks when they have to.
Great discussion on the different config management tools. My summary is:
Chef is great for folks who know Ruby.
Puppet is easier for sysadmins with no programming experience.
Ansible is the easiest of all, as it is YAML, and its popularity is increasing.
Salt seems to have the least uptake.
However, how you use these tools is more important than which one you use.
‘test-kitchen’ is the key, and it supports Chef, Puppet, Ansible, and Salt (disclosure: I wrote the plugins for Puppet and Ansible).
test-kitchen allows you to orchestrate your dev environments, run your scripts via push, and run verification via serverspec tests. This allows much faster development of scripts.
A couple of other things I do:
Split config into library and application components. Library components are open source and released to the Puppet Forge, the Chef community site, or Ansible Galaxy. Berkshelf, librarian-puppet, and librarian-ansible are used to bring them in at test time (test-kitchen does this automatically), so you don’t store them in your repository. (You should lock the version of library components.)
Application components call the library components and are named with a company prefix. Application components DO NOT call other application components. Components that are shared should be made into library components.
This makes dependencies much easier as you only need to look at dependencies when you change versions of library components.
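The layering rule above (application components may call library components, but never each other) is easy to check mechanically. Here is a small Python sketch of such a check; the component names are invented, "acme_" stands in for the company prefix, and the dependency map is assumed to have been extracted from your manifests by some other means.

```python
def layering_violations(dependencies, app_prefix):
    """Return (component, dependency) pairs where an application component
    depends on another application component.

    dependencies: dict mapping a component name to the names it calls.
    app_prefix: the company prefix that marks application components.
    """
    return [
        (component, dep)
        for component, deps in sorted(dependencies.items())
        if component.startswith(app_prefix)
        for dep in deps
        if dep.startswith(app_prefix)
    ]

# Hypothetical component names; "acme_" stands in for the company prefix.
deps = {
    "acme_webapp": ["nginx", "postgresql"],
    "acme_reports": ["acme_webapp", "cron"],  # breaks the rule
    "nginx": [],
}
print(layering_violations(deps, "acme_"))  # [('acme_reports', 'acme_webapp')]
```

A violation like the one flagged here is the signal that the shared part should be pulled out into a library component.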
In summary: “It ain’t that important which tool you choose; rather, it is important how you use it.”