1. 46
  1. 8

    If I understand it correctly, I like the idea. As far as I understand, your focus is to provide a lightweight system written in Python, and for most use cases the deploy steps will just be written as standard shell commands using your shell function? I have always wondered whether a simple, idempotent deploy based on bash scripts might be better for simple use cases than Ansible (background: I have written many Ansible roles for my hobby deployments, and it has cost me a lot of time and debugging).

    Can you maybe elaborate a bit on:

    • Why a Python wrapper is needed, i.e. what advantages does it give you?
    • Why you needed to implement some functions like file or package as Python helpers instead of shell. At first sight it seems to break the logic of using shell, but I guess there is some reason behind it. (I am a bit skeptical about why package is abstracted: Ansible does this too, but packages often have different names on Debian-based distributions and e.g. Arch Linux anyway.)

    I might prefer it even a bit simpler/more reduced. When adopting new technology, I have come to love it when it supports a scheme that lets me continue working even if the technology dies. In this case this could work if, e.g., you allow the user to write standard shell files (my-nginx-deploy.sh) and load these into pyinfra. If pyinfra ceases to exist some day, the user could still execute the shell scripts manually.

    1. 3

      The simplest approach would be:

      ssh my.server.com <setup.sh

      My question would be: Against this baseline, what problems does pyinfra solve?

      For example, Ansible can work on multiple servers in parallel. It provides idempotent commands which you have to ensure yourself in shell. It skips commands already executed in previous runs.
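
      To make that baseline concrete, here is a rough Python sketch of running the same `ssh host <setup.sh` against several hosts in parallel. The `ssh_runner` and `run_on_hosts` helpers are made up for illustration, and the runner is swappable so the fan-out can be tried without a real server. Idempotency and skipping finished steps are still entirely up to setup.sh.

      ```python
      import subprocess
      from concurrent.futures import ThreadPoolExecutor


      def ssh_runner(host, script_path):
          """Feed a local script to the remote shell, like `ssh host <setup.sh`."""
          with open(script_path, "rb") as script:
              proc = subprocess.run(["ssh", host], stdin=script)
          return proc.returncode


      def run_on_hosts(hosts, script_path, runner=ssh_runner, max_workers=10):
          """Run the script on every host in parallel; return {host: exit_code}."""
          hosts = list(hosts)
          with ThreadPoolExecutor(max_workers=max_workers) as pool:
              codes = pool.map(lambda host: runner(host, script_path), hosts)
              return dict(zip(hosts, codes))
      ```

      Calling run_on_hosts(["my.server.com"], "setup.sh") would return the exit code per host, which is about all the feedback this approach gives you.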

      1. 5

        So pyinfra is built out of my personal frustrations with Ansible! pyinfra operations are idempotent similar to Ansible and it will skip commands just the same. The key differences:

        • The deploy is executed in two phases, making it possible to do a “dry run” that identifies where pyinfra can skip commands
        • Instant debugging - I grew sick of debugging anything with Ansible; pyinfra just prints the output and the shell command itself
        • Performance - not important but a small obsession of mine! pyinfra is significantly faster than Ansible over large numbers of hosts
        • Properly agentless - pyinfra does not require the target server to have anything other than a shell (vs Python + requirements for Ansible)
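
        The two-phase idea in the first bullet can be sketched roughly like this. It is a toy model, not pyinfra's actual implementation: the `facts` dict stands in for state gathered from the remote host, and `ensure_directory`, `ensure_line`, `plan` and `execute` are hypothetical names.

        ```python
        import shlex
        import subprocess


        def ensure_directory(facts, path):
            """Return the commands needed so that `path` exists (none if it does)."""
            if path in facts["directories"]:
                return []
            return [f"mkdir -p {shlex.quote(path)}"]


        def ensure_line(facts, path, line):
            """Return the commands needed so that the file at `path` contains `line`."""
            if line in facts["file_lines"].get(path, []):
                return []
            return [f"echo {shlex.quote(line)} >> {shlex.quote(path)}"]


        def plan(facts, operations):
            """Phase 1: compare desired operations with facts, collect commands."""
            commands = []
            for operation, args in operations:
                commands.extend(operation(facts, *args))
            return commands


        def execute(commands, dry_run=True, run=subprocess.run):
            """Phase 2: show each command, and run it unless this is a dry run."""
            for command in commands:
                print("would run:" if dry_run else "running:", command)
                if not dry_run:
                    run(command, shell=True, check=True)
        ```

        Anything phase 1 finds already in place produces no command at all, which is where the skipping comes from.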
      2. 3

        So the idea is to roughly match some of Ansible’s abstractions, i.e. apt.packages(...) instead of executing the shell directly. The advantage of this approach is that pyinfra will only execute the underlying apt command if the package is not installed (by default). In this way pyinfra is very similar to Ansible.

        This is the same for most other pyinfra modules, e.g. files.file, which check the current remote state before executing commands to update anything as needed. I have tried to remain true to each underlying tool, so there’s a dedicated pacman module, and so on.
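
        As a rough illustration of that check-then-act idea for packages (a sketch only, not pyinfra's code): read the dpkg status list, then build an `apt-get install` command just for what is missing. The `dpkg-query` format fields are real; `parse_installed` and `install_command` are made-up helper names.

        ```python
        import shlex

        # Remote command whose output has one line per known package, e.g.
        # "install ok installed nginx".
        LIST_COMMAND = "dpkg-query -W -f='${Status} ${Package}\\n'"


        def parse_installed(dpkg_output):
            """Return the set of package names dpkg reports as fully installed."""
            installed = set()
            for line in dpkg_output.splitlines():
                parts = line.split()
                if len(parts) == 4 and parts[:3] == ["install", "ok", "installed"]:
                    installed.add(parts[3])
            return installed


        def install_command(wanted, installed):
            """Return an apt-get command for the missing packages, or None."""
            missing = sorted(set(wanted) - installed)
            if not missing:
                return None  # everything is present, so the step is skipped
            return "apt-get install -y " + " ".join(shlex.quote(p) for p in missing)
        ```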

        Totally with you on the debugging thing! One of the reasons I started pyinfra was frustration debugging things with Ansible! pyinfra should always give the output and underlying commands that failed, making failures easily reproducible in a shell.

        I think that answers your questions? Let me know if not/you’ve more :)

        1. 1

          There is at least one config management thing that uses bash: cdist. After looking at the examples/docs, the syntax doesn’t seem simple. I’ve never used it, so it could make sense after you spend enough time on the long learning curve for it.

          1. 8

            Each time I read about cdist I remember this:

            “Shell script is used by UNIX system engineers for decades. So when cdist is introduced, your staff does not need to learn a new DSL or programming language.”

            Which implies that “staff” knows how to write correct bash scripts… which, to be very honest, isn’t an easy task.

            1. 3

              I’ve been writing bash for 10 years and I think I’m halfway there. Hard to say for certain…

              1. 2

                I ran into a weird failure, recently, where bash redirection of files to stdin (cmd < path) would result in no stdin input to the receiving command, but only when run under go generate, and only on one colleague’s machine.

                I rewrote the script in golang. That’s the kind of BS I expect from JavaScript; I thought bash was better than that.

          2. 4

            I gave this a go and I really like it. It is pretty straightforward, and I prefer having this type of configuration in Python and not YAML.

            As far as I can see it doesn’t read the ~/.ssh/config file?

            1. 3

              It should read standard SSH config (https://github.com/Fizzadar/pyinfra/tree/master/pyinfra/api/connectors/sshuserclient)! If it’s not working please submit an issue!

            2. 3

              Looks rather interesting, I’m going to have to try this one day. Absolutely despise the usual suspects (Ansible, Docker, Puppet, etc).

              1. 2

                As someone who was recently asked to do a deep dive on Ansible, I’m curious, how would you describe the downsides of Ansible?

                1. 2

                  Sorry for the very late response to this, but here are my few complaints about Ansible:

                  • It’s really hard to figure out how to organize your project when you are first starting
                  • The dynamic inventories tend to fail in very non-obvious fashions (for example, if you forget to make the script executable or if it’s missing one of its dependencies you’ll get some bizarre error because Ansible will try to include the content of the script as if it was the actual inventory)
                  • Managing dependencies between tasks is hard if you plan to use tags to restrict the tasks that need to be run. Happy to give an example if you are curious
                  • Managing secrets is kind of a pain in the ass; haven’t tried Ansible Vault, but the documentation made it look harder than using SOPS

                  All in all, Ansible isn’t bad, and it definitely has served my company well, but the level of complexity introduced is high and you’ll end up writing a bunch of wrapper scripts if you don’t want to remember the 10000 command line flags you need to run any moderately complex scenario.

                  1. 1

                    Very interesting. I’m someone who observes Ansible a bit from afar – I grew up in the fab + chef + libcloud era, but my DevOps team took over and switched to Ansible + Terraform, which are admittedly more solid tools for cloud automation. To me, the only real downside of Ansible I could tell from studying it (aside from all the YAML-ese) is that its “push” model starts to slow down for big clusters and cloud footprints. But then I discovered mitogen for Ansible and it seems like that’s actually becoming a solved problem, without the downsides of the pull model. In which case, it feels to me like Ansible will stand the test of time due to ecosystem/network effects, but I could be wrong!

              2. 3

                That looks lovely! I’m a big hater of YAML-based configuration and reinventing the wheel, when things like that can be done in a proper programming language. I don’t have to manage many servers, but I like the idea of having a reproducible state of the system. Like NixOS, but NixOS/Nix seems like a lot of work, whereas I’m 99% happy with simply using apt, pip and a few ad-hoc commands.

                I was about to clean up my current setup scripts (done with a bunch of scripts + Ansible), so I think I’ll give this tool a try. Do you think it makes sense for my use case (personal desktop/laptop)? Are you using it for that purpose?

                1. 4

                  I do think it makes sense - I do exactly the same :) So far I’ve used pyinfra both for ad-hoc/local box setup and in production, managing medium-size (hundreds of nodes) Elasticsearch clusters, amongst other things. An example is my (very WIP) MacBootstrap deploy: https://github.com/Fizzadar/MacBootstrap.

                2. 2

                  This looks like exactly what I need for homelab automation.

                  1. 2

                    What i find funny about these configuration management systems is how they tout agentlessness as a feature. It seems by its very nature (push vs pull) a system based on this model can never be useful beyond a toy application: homelab, personal projects, etc. In which case, i have to agree with other posters, there would need to be a compelling reason to not just use idempotent shell scripts. I’ve felt the same way about Ansible. The whole point of these things is the abstraction (“over abstraction?”) for the sake of not having to write anything to keep something idempotent.

                    Since i know Chef and am familiar with its model (and problems), Chef or Puppet seems to make a lot more sense to me. Some equivalent in Python (or even Go, which might be a bit ambitious) would be much more interesting to see (but also quite the undertaking).

                    1. 3

                      I agree with this in some ways! I have used both Ansible and pyinfra extensively in medium scale production environments without issue (hundreds of targets). As the number of targets grows this becomes more of an issue (particularly for Ansible due to its threading model).

                      I see good arguments on both sides of the push vs. pull thing - for anything with thousands of targets or more I wouldn’t even entertain push solutions due to the inevitable bottleneck. But for anything smaller I’d be content with either method - I believe both can provide the same kind of consistency guarantees as long as they are applied appropriately.

                      A pull based system in Python would indeed be a very interesting project…!

                      1. 3

                        i think anyone contemplating such an undertaking would do well to read Mark Burgess’ Promise theory (http://markburgess.org/promises.html), there’s a wiki on it too (https://en.wikipedia.org/wiki/Promise_theory). This is the kind of physics that went into cfengine.

                        1. 1

                          I have used cfengine for single servers (in other words overengineered for learning purposes). I tried to understand the point of Promise Theory but I don’t get it.

                          The advantage of cfengine for me is that it’s a small tool with nearly no dependencies. The Python interpreter alone is big in comparison.

                      2. 3

                        “It seems by its very nature (push vs pull) a system based on this model can never be useful beyond a toy application.”

                        Well that’s just completely false. I don’t even understand why you could think that.

                        “Since i know Chef and am familiar with its model (and problems), Chef or Puppet seems to make a lot more sense to me.”

                        Oh, right, ok. Perhaps it’s just that?

                        1. 2

                          to be sure, i didn’t mean to minimize your work here, i already think Ansible is overly complicated yet still somehow limiting in some ways (!?), so i would venture to say this seems to have a better core design

                          anything could be done in Ansible, but really, should it?

                          1. 2

                            “just use idempotent shell scripts”

                            Writing idempotent shell scripts is not easy. Just helping with this alone is valuable.

                            Unfortunately, most config management tools do not solve the problem either. My canonical example: Install two packages which conflict with each other, so when you install the second one, it uninstalls the first (and vice versa). Puppet, Ansible, etc all fail. Success would be to show an error in your configuration.

                            1. 1

                              “Puppet, Ansible, etc all fail. Success would be to show an error in your configuration.”

                              depending on how you test your Chef code, while i wouldn’t see Chef throwing an error in such a scenario, you could certainly have a test harness throw an error if there were changed resources (in this case your package install) on the 2nd convergence (in fact i’ve done this in a pipeline)… at that point then, Chef provides a couple mechanisms to deal with that situation…. but to your point, none are automatic
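
                              a toy sketch of that kind of harness, in Python rather than Chef, assuming a made-up `converge` function that reports what it changed. the conflict table and all names here are hypothetical; the point is only that the second run must report zero changes.

                              ```python
                              # Hypothetical pair of packages that evict each other on install.
                              CONFLICTS = {"exim4": "postfix", "postfix": "exim4"}


                              def converge(installed, desired):
                                  """Toy converge run: install what is missing, return what changed."""
                                  changed = []
                                  for package in desired:
                                      if package not in installed:
                                          # Installing a package removes its rival, if any.
                                          installed.discard(CONFLICTS.get(package))
                                          installed.add(package)
                                          changed.append(package)
                                  return changed


                              def assert_idempotent(converge_once):
                                  """Converge twice; fail if the second run still changes anything."""
                                  converge_once()
                                  changed = converge_once()
                                  if changed:
                                      raise AssertionError(f"second run changed: {changed}")
                              ```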

                              1. 1

                                I have never administered a large number of servers, but apparently it is not a big deal in practice. In Debian/Ubuntu there are practically no conflicting packages. The mechanism is only used for certain rare transitions where obsolete packages should be deleted. Debian fixes a lot of conflicts by providing management on top with update-alternatives.

                                Well, maybe it is a big deal but everybody assumes it cannot be improved. Package managers like apt/rpm are by their nature somewhat in competition with configuration managers.

                                For my homeserver, I restricted myself to a single installation command. Then apt complains about the conflict. Configuration managers don’t do this for ordering or modularity purposes, I think. What my own system does not capture is a configuration-internal conflict where I install and remove a package at the same time. My config is small enough though.
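
                                Roughly what I mean, as a Python sketch. `single_apt_command` is a made-up helper, not part of any tool. It catches the install-and-remove contradiction before anything runs, and otherwise builds one apt-get call so that apt itself can complain about real package conflicts.

                                ```python
                                def single_apt_command(install, remove=()):
                                    """Build one apt-get call, refusing contradictory configs."""
                                    both = sorted(set(install) & set(remove))
                                    if both:
                                        raise ValueError(f"installed and removed at once: {both}")
                                    command = ["apt-get", "install", "-y", *sorted(set(install))]
                                    # apt-get removes a package when its name has a trailing "-".
                                    command += [f"{name}-" for name in sorted(set(remove))]
                                    return command
                                ```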

                          2. 2

                            I use a slightly similar tool for toy projects: https://pressly.github.io/sup/

                            1. 1

                              “It reads Supfile, a YAML configuration file,”

                              then it’s not the same!

                              1. 1

                                Oh okay. What’s the difference?

                                1. 2

                                  the config isn’t yaml, it’s python

                            2. 2

                              I wonder how well the lightweight and simplicity claims will hold if/when this grows to the feature count and thoroughness of Ansible.

                              1. 3

                                I don’t think it ever will, certainly not under my watch! Been using it for years now and the module coverage has barely changed. Because it’s Just Python almost anything else can be achieved without changing pyinfra itself :)