1. 3

    I love SQLite. But are there any off-the-shelf solutions for having multiple web servers access a single shared SQLite database? “running that same app all on a single machine using SQLite” is a pretty big constraint.

    1. 4

      If you need a server, what’s wrong with using a DBMS server? SQLite is nice as a local file database, but beyond that, any off-the-shelf solution will most likely add extra complexity and lose most of SQLite’s benefits, especially given alternatives like PostgreSQL.

      1. 4

        Because now you have two servers, and the DBMS needs accounts and privileges set up, and you have to configure the app server to tell it where the DBMS is, etc.

        Obviously a DBMS is more scalable, but for something that doesn’t need to scale to multiple nodes, why add all the complexity? Why drive an SUV to the corner store when you can just hop on your bike?

        1. 7

          In the environment of the person asking, if you’ve got multiple web servers you already need to deal with accounts, privileges, address discovery, and all the relevant bits.

          1. 1

            You’re right, I wasn’t reading back up to @nelson’s question so I missed the “multiple web servers” part. Running SQLite as a shared db server does seem an odd choice … I agree with @isra17.

        2. 1

          Nothing at all is wrong with using a DBMS server! But this fine article is all about how you can use SQLite instead. I’ve been wondering if someone’s built a really simple DBMS wrapper around SQLite. The other answers suggesting rqlite or reading from replicas are the kind of thing I had in mind.

          1. 4

            From my understanding, the article makes the point that you can run a single web server and therefore keep a SQLite database on that server to keep everything simple and fast. If for some reason you need to scale to multiple nodes, then the article’s point no longer applies. When using a project like rqlite, you are using SQLite about as much as you are using files when you use Postgres. rqlite is a whole different system with a whole different performance and guarantees profile. Calling it simply a SQLite server wrapper is an understatement. It seems like rqlite is more in the etcd and Consul category than the general DBMS one.

            1. 4

              rqlite author here.

              Agreed, rqlite could be considered similar to etcd or Consul, but with relational modeling available. But I think it’s perfectly fair to call it a DBMS too, just a particularly lightweight and easy-to-run one.

          1. 1

            Along those lines there’s also https://dqlite.io/

          2. 3

            “running that same app all on a single machine using SQLite” is a pretty big constraint.

            I’m not sure it is – single machines are BIG these days. You can fit a lot on a single node before you even need to think about scaling. However, if you feel that way, don’t use SQLite. There are database servers out there.

            1. 2

              Isn’t this just built into SQLite?

              https://www.sqlite.org/faq.html#q5

              1. 5

                Well, not really. WAL mode is really just based on file locking. So if you use the SQLite library for your programming language, it will make sure that during a write operation changes are written to the WAL file instead of the original database file, which is locked during that time. This works for a certain amount of write concurrency, but as soon as you have a complex application issuing lots of writes, you will soon start to notice your SQLite layer responding with messages like “database is busy, locked”.

                That’s when you have to handle it in your application backend: either deal with retries, or adjust the timeout a SQLite write operation will “wait” for the lock to become free (by default it gives up almost immediately unless a busy timeout is set). You can imagine this won’t work well with multiple servers accessing the same SQLite database and doing concurrent writes. Really, if you want that, just use some DBMS.
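                To illustrate, a minimal sketch of the application-side handling in Python’s sqlite3 (the database name, timeout and retry policy here are made up for illustration): set a busy timeout so writers wait for the lock, and still retry when “database is locked” bubbles up anyway.

                import sqlite3
                import time

                # wait up to 5 seconds for the lock instead of failing immediately
                conn = sqlite3.connect("app.db", timeout=5.0)
                conn.execute("PRAGMA journal_mode=WAL")

                def write_with_retry(sql, params, attempts=3):
                    for _ in range(attempts):
                        try:
                            with conn:                    # commits, or rolls back on error
                                conn.execute(sql, params)
                            return
                        except sqlite3.OperationalError as e:
                            if "locked" not in str(e) and "busy" not in str(e):
                                raise
                            time.sleep(0.1)               # brief backoff before retrying
                    raise RuntimeError("database still locked after %d attempts" % attempts)
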

                Also, long-running SQL write operations can really fuck up the response time for every other application thread on the same database; their SQL queries will receive the busy error and have to deal with that.

                And I have seen pretty bad application designs NOT dealing with that, just “forgetting” about the query and not even issuing an error to the user.

                SQLite has WAL, yes, but it gives you very limited write concurrency.

                1. 2

                  That FAQ entry is about having multiple readers opening the same database file. Which works great! But it’s read only. More importantly, it only works if you have access to the database file, which in practice means only a single machine. (The doc even explicitly warns against trying this with NFS.)

                2. 1

                  I did a single DuckDuckGo search and found https://www.symmetricds.org/; I can’t vouch for it though. FWIW, my search terms were “sqlite mirroring”.

                  1. 1

                    I think Litestream supports having multiple readers following one leader. If you’re read-heavy, it can work to just consolidate writing onto one big box.

                    1. 5

                      Litestream doesn’t support live read replication but it’s being worked on right now and should be available in the new year.

                  1. 14

                    Eek.

                    Normally you would expect someone to backport patches and fixes; but web browser codebases are massive and ugly, so I suspect that’s a really hard job for volunteers. They would possibly have to invent their own fixes too, as upstream might have replaced whole systems within the codebase when fixing the bugs.

                    Options I can see:

                    • Suddenly summon a vast amount of manpower to backport these patches for the Debian build
                    • Convince the bulldozer browser vendors that they should do this work (hah!)
                    • Remove the browser packages. Then you’re left with a distro that people will complain about (trading security for social issues). This probably also breaks the “stable” idea.
                    • Add a giant warning popup before the browser launches saying that it’s completely insecure and giving the users an option to abort launching it. It’s probably very wise to add a paragraph about why you are doing this (cultures of stable versus rolling browser releases, cost of man-hours backporting packages) and another paragraph describing actual practical options to work around this problem (e.g. moving to Debian testing?).
                    • Shoehorn in an isolated, updated system in a box (e.g. AppImage, etc.). It “can” work, but it can also cause a thousand other technical issues (new bugs, especially in regard to video drivers & Mesa, let alone potential security ones) and it’s probably not as easy as people think. Remember that browsers are essentially a complete operating system of their own, with things like hardware-accelerated video decoding that needs to cross the divide to your drivers.

                    Any other options?

                    1. 14

                      When I used Debian I just used the Google Chrome deb repo. I used Debian testing, which is what Google tracks internally, so Chrome is guaranteed to work. That is, if Chrome were broken on Debian testing, it would be broken for Google developers. And the Google developer workflow heavily relies on web-based tooling. That’s as close to a “blessed” configuration as you can get for web browsers on Linux, as far as I know.

                      1. 12

                        But then you’re introducing an untrusted binary package into the system (untrusted in that it was built by a third party, not from source on Debian-owned servers, etc.).

                        1. 24

                          Yeah, but most people don’t care about that and just want their computers to work. Even as a relatively security-conscious SRE, that includes me.

                          On the list of “people likely to distribute malware-infected binaries,” Google is pretty far down. Unless Chrome falls under your personal definition of malware I suppose.

                          1. 16

                            Yeah I consider Chrome to be malware, but that’s beside the point.

                            1. 8

                              Very much so. It’s amazing how much the goalposts of “malware” have shifted.

                              Chrome is spyware. Having a EULA or opt-in was never a reason for spyware not to be listed by AV tools in the past (at best this might make them get flagged as “PUPs” instead of “spyware”). If Chrome came out of a small company in the 2000s then it would get flagged.

                              No one dares mark Chrome as malware. You cannot upset such a large company or such a large install base without expecting users to think you are the one at fault. We are not malware, we are an industry leader, you must be mistaken, sir :)

                              It seems that you can, indirectly, buy your way out of being considered malware simply by being a big player.

                              1. 4

                                …from a small company in the 2000’s then it would get flagged.

                                I get your point, but c’mon… Stuff got flagged back then because it interrupted what the user was trying to do. If you don’t launch Chrome, you don’t see it, and it doesn’t attempt to interact with you. That’s what most users care about, that’s what most users consider to be malware, and, as far as I recall, that’s (largely) what got apps flagged as malware in the 2000s.

                                1. 2

                                  Chrome is like Internet Explorer with all those nasty toolbars installed, except the toolbars are hidden by default ¯\_(ツ)_/¯.

                            2. 2

                              That’s a silly distinction. If you use Chrome, then you’re already executing tons of arbitrary code from Google. In practice, whether you get Chrome from Debian or Google, you still have no choice but trust Google.

                            3. 1

                              Same here. Even as a long-term Debian user (20+ years), this is just the only way for me, for both the private and the work workstation.

                            4. 12

                              Remove the browser packages.

                              I’d go with that. Well, leave NetSurf in so there’s at least a modicum of web browsing functionality OOTB. Motivated users can download Firefox themselves and the world won’t end. That’s what they have to do on Windows and macOS. But trying to keep up with the merry-go-round is like trying to boil the ocean. Then volunteer effort can be spent on areas where the investment will pay off.

                              1. 1

                                In previous Debian releases they had a section in the release notes about how the version of webkit they shipped was known to be behind on security patches and that it was only included so that you could use it to view trusted sources like your own HTML files or whatever. They were very specific about the fact that only Firefox and Chromium were safe to use with untrusted content.

                                But I only found out about it by a friend telling me about it in chat. I have my doubts that this could be communicated effectively.

                              2. 9

                                Normally you would expect someone to backport patches and fixes; but web browser codebases are massive and ugly, so I suspect that’s a really hard job for volunteers. They would possibly have to invent their own fixes too, as upstream might have replaced whole systems within the codebase when fixing the bugs.

                                The article allows us an interesting glimpse into just how hard this is, and it’s not just because of the web browsers:

                                Debian’s official web browser is Mozilla Firefox (the ESR version). The last update of Firefox ESR in Debian stable has been version 78.15.0. This version also has quite a few unpatched security issues and the 78.x ESR branch is not maintained by Mozilla anymore. They need to update to the 91.x ESR branch, which apparently causes big problems in the current stable Debian platform. In an issue, people complain about freezing browser sessions with the 91.x release, which blocks the new Firefox ESR release from being pushed to “stable-security”. Somebody in the issue claims the reason: “Firefox-ESR 91.3 doesn’t use OpenGL GLX anymore. Instead it uses EGL by default. EGL requires at least mesa version 21.x. Debian stable (bullseye) ships with mesa version 20.3.5.”

                                “So just update mesa” doesn’t sound like the kind of thing you could do over just a couple of days, seeing how many packages depend on it. Assuming that even fixes the Firefox end of things, I’m not sure I want to think about how many things could break with that update, not before I’ve had my second coffee of the day in any case. Just testing the usual “I updated mesa and now it crashes/looks funny” suspects – Gnome, Plasma, a bunch of games – takes weeks. It’s something you can do in testing but it takes a while.

                                Large commercial vendors are hitting release management problems like these, too; this is actually part of the reason why you see so many Linux gadgets unironically using tech stacks from three years ago. It’s worse for Debian because they’re trying to build a general-purpose system out of parts that are increasingly made for special-purpose systems that you can either freeze forever (embedded devices) or overwork DevOps teams into PTSD and oblivion in order to keep them running (cloud apps).

                                1. 7
                                  • Realize that their current model of a slow and “stable” Debian will no longer work in 2021 (and beyond), and change it

                                  Not saying Debian should drop stable releases and become a rolling release, but perhaps there’s some slightly more rapid cadence they could adopt for releases? Like, is the issue highlighted in the article also a problem with openSUSE and Red Hat?

                                  1. 4

                                    “Stable” means different things to different distros.

                                    To Debian, “Stable” means that bugs will be patched, but features and usage will not change. This does not fit with Mozilla and Google’s monthly release cadence; all changes need to be checked over by skilled devs.

                                    SuSE just builds whatever Mozilla hands them, as far as I can tell.

                                    1. 2

                                      For Firefox (and some other packages, IIRC) Debian has already given up on that. They would package the latest Firefox ESR even if it introduced new features (and it would, of course). The issue is that even that is an insurmountable amount of work. The latest ESR needs much newer LLVM and Rust toolchain versions than the last one. And Debian also wants to build all packages for a given release with other packages in that release, so that means updating all of that stack too.

                                      1. 2

                                        This is why I don’t really see the point in LTS Linux distros. By a couple of years into their lifetime, the only thing the stability gets you is needing to install most things that you actually want from a separate repo. If ‘stable’ means ‘does not get security fixes’ then it’s worse than useless. A company like Red Hat might have the resources to do security backports for a large set of packages but even they don’t have that ability for everything in their package repos.

                                        It works a bit better in the BSD world, where there’s a strict distinction between the base system and third-party packages, so the base system can provide ABI stability over a multi-year period within a single release but other things can be upgraded. The down side of this is that the stability applies only to the base system. This is great if you’re building an appliance but for anything else you’re likely to have dependencies from the packages that are outside of the base system.

                                        1. 1

                                          The Debian stable approach works really well for servers. It works moderately well for desktops, with the very notable exception of web browsers – which are, without a doubt, the most used, most exposed, most insanely complicated subsystem on any desktop, so much so that Google’s ChromeOS is a tiny set of Linux vital organs supporting Chrome.

                                          Even so, Debian is working on this and within a few weeks, I think, there will be new packages for stable and oldstable and even LTS.

                                          1. 1

                                            I used to think that the “stability” was fine for servers, but in practice it meant that every couple of years I was totally screwed when I had to urgently fix a small thing and it couldn’t be done without a major upgrade of the whole OS that upset everything. It also encourages having “snowflake” servers, which is problematic on its own.

                                            I feel like the total amount of problems and hassle is the same whether you use a rolling release or snapshots, but the snapshot approach forces you to deal with all of them at once. You can’t just never upgrade, and software is going to evolve whether you like it or not, so the only choice you have is whether you deal with upgrade problems one by one or all at once.

                                      2. 2

                                        The Debian release cadence is about 2 years, and has been for 16 years. How much faster would work? What’s Firefox ESR’s cadence? The best I could find from Mozilla was “on average 42 weeks” but I’m not sure that’s quite the right thing. ESR 78 only came out in September this year and is already unsupported. The latest ESR has very different toolchain requirements to build. It’s a confusing picture.

                                      3. 1

                                        Update mesa, then update Firefox? Fighting upstream like that is a losing battle.

                                        1. 1

                                          Agreed, but updating Mesa is easier said than done.

                                      1. 1

                                        Added support for single file restore / instant recovery in virtnbdbackup

                                        1. 1

                                          I’ve heard sysprep is super janky w/ modern Windows. Has it been deprecated and replaced with something else yet?

                                          1. 4

                                            Not that I know of, and yes, it’s really, really painful to create a good working sysprepped image. Lately I was looking into building Windows 11 Vagrant images for deployment on libvirt, which was another kind of fun:

                                            • Windows 11 refuses to install without UEFI/Secure Boot
                                            • Windows 11 refuses to install without a working TPM module

                                            After working around all of that by making Packer pass a TPM emulation device (swtpm) to QEMU and making it use the TianoCore UEFI firmware, I had, after hours, an automated install going, which then failed during the sysprep phase because a OneDrive AppX package could not be uninstalled, and the error messages that followed gave no exact reason. I went on and removed the mentioned package manually, and then sysprep finally worked.

                                            All in all it took me about a day to get a working image, and I won’t touch that image ever again (until it breaks, for some reason).

                                          1. 1

                                              Why Go if you have bpftrace and bash?

                                            just a POC: https://github.com/abbbi/bpf-hotclone

                                            1. 1

                                              Neat idea. I could imagine this as a service that tracks changed regions, and based on high watermarks, pushes the changes to some remote storage: block level incremental backups on live disks…

                                              1. 2

                                                  Seeing the SHOGO screenshot hits me straight in the feels. It’s been a long time since I had the demo running as a kid.

                                                1. 1

                                                  “According to intelligence, eh? Then I’ve got nothing to worry about!” - Sanjuro

                                                1. 2

                                                    Using these to run Python 3.9 in various CI environments with all sorts of different Linux and Windows distributions that don’t ship recent Python versions. They have never failed me!

                                                  1. 2

                                                      I’ve now turned the example repository into an action so others can easily use it (from the marketplace), for example:

                                                    https://github.com/abbbi/tuneme/blob/master/.github/workflows/tuned.yml#L12

                                                    1. 3

                                                      That’s a nice little trick. Note that you can also avoid installing dependencies with caching.

                                                      1. 1

                                                          Yes, I’ve added a link to the documentation. It makes sense for small things like npm etc., but caching the complete /var/cache/apt/archives directory (to save download time) is probably not useful. One wants to test against the latest package versions, and when pulling in large amounts of packages you might easily exceed the maximum cache size of 5 GB…

                                                          It also makes sense to set dpkg’s unsafe-io option to save some time; I’ve added another note about this.

                                                      1. 3

                                                        Very good writeup!

                                                        1. 1

                                                            Attempting to migrate a Bugzilla installation with ~27k tickets to GitLab, using a heavily modified multithreaded version of bugzilla2gitlab…

                                                          1. 2

                                                              Nice. This can be done using dmsetup too, which allows for creating erroneous devices, even though they can’t be changed at runtime; see:

                                                            https://abbbi.github.io/dd/

                                                              What I would search for is simulating a faulty tape device. I haven’t found an easy way to do so, other than modifying mhvtl or other things like “tgt”…

                                                            If you need a FS that only eats your data, see: https://github.com/abbbi/nullfsvfs

                                                            1. 3

                                                              Shame. The fact that vagrant is scripted with a sensible language without arbitrary limitations makes it much easier to use.

                                                              1. 2

                                                                  Indeed. I run a complete CI environment with Vagrant, spinning up virtual machines for basically 90% of the available generic/* Vagrant images, and scripting this with a central configuration file or specific provisioning scripts for specific virtual machines was really easy. I fear that with Vagrant shifting to Go, I will have quite some trouble porting all this stuff, especially with all the cycles required to get it right.

                                                                1. 2

                                                                    I get the impression from the Vagrant 3.0 heading that, with a plugin, the existing Ruby-based Vagrantfiles will still work:

                                                                  Once installed, Ruby-based Vagrantfiles and plugins will work normally.

                                                                  1. 1

                                                                      Yes, but I fear that might be limited to Vagrant-specific settings in the Vagrantfile, and probably not any Ruby code within it.

                                                                1. 2

                                                                        For me there is only one solution to this: I won’t buy any of that known-to-be-short-lived, often completely useless stuff. I really don’t need a smartwatch to show my daily movement stats. I still don’t understand how people even get hyped about it.

                                                                  1. 3

                                                                          There are Python libraries/DSLs (plumbum, mario, and sh) listed here; does zxpy differ significantly from them?

                                                                    https://github.com/oilshell/oil/wiki/Internal-DSLs-for-Shell

                                                                    And this page is editable – feel free to edit it (with the node.js zx as well).

                                                                    1. 3

                                                                      This is a timely post. I’ve just gotten fed up with how bad it feels to convert a shell script to Python. This time, I wanted argument parsing.

                                                                      So, I’m finally going through and looking at ‘sh’, marcel, xonsh, plumbum, and now zx to find something better in between bash and vanilla Python.

                                                                      1. 3

                                                                        Definitely write up the results! I think many would find it useful.

                                                                        It feels like there’s a bit too much re-inventing the wheel here, even just within Python. But I could be wrong.

                                                                        FWIW I have written similar shell-ish Python tools going back 15 years now – one was called “dice” and used JSON over pipes. I even got some positive comments from Guido van Rossum about it. But I think that approach is fundamentally limited – hence the long-winded Oil project :)

                                                                        It’s probably useful for some problems, but I would still point to the lack of convergence as a curious thing. To me it feels like each one is a little wrong for some job, so someone writes a new one.

                                                                              I generally liked my Python-based tools, but then when I went back to use them, it was often easier to just do it in shell. (Deployment was an issue for sure.) I learned enough shell by writing them that I lacked the motivation to actually use them for “production” problems :)

                                                                        Similarly there was a predecessor to Eggex in Python called Annex. But when I went back to use it, it was easier to just suck it up and use Python/Perl regex syntax. Again ironically I learned every nook and cranny of Python and POSIX regex syntax by writing it. I think Eggex makes more sense because it’s embedded in a shell language and it’s not a Python library.


                                                                        Also clicking through the wiki, I just noticed this from the author of “pysh”:

                                                                              I no longer believe this approach to shell scripting to be a good solution. pysh’s approach is to modify the syntax of python resulting in an uglier, and confusing, language. Maybe someday I’ll stumble upon the “right way” to implement a shell language, but for now bash is just fine.

                                                                        (But I don’t agree bash is fine :) )

                                                                        1. 2

                                                                          Here are my thoughts from my hacking last night:

                                                                          Marcel and Xonsh are doing way more than I want. I mostly just want a library that makes shelling out easier. For example, Marcel is going to return Python data types for things when I really just want shell scripting, but easier from Python. I’d still like to give them a closer look, but last night I primarily compared plumbum and sh. I’m (sorry OP) not really interested in zxpy because of its interpolation syntax.

                                                                          Re: plumbum and sh, sh is the clear winner. I want behavior like bash’s set -x and sh provides that with info-level logging. I want my program to run “like a shell script” and print all stderr and stdout to the terminal, which was easier to accomplish in sh than in plumbum (though it does add a bit of boilerplate). Finally, the way sh does subcommands is great, in that it really makes the shell you’re running “look like” Python code. Here’s an example of how good sh can look and where it falls short from a comment on a ticket I made last night.
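                                                                              Roughly the kind of thing I mean, as a rough sketch (not the ticket example; sh resolves ls and git dynamically here, and the exact logging setup may need tweaking):

                                                                              import logging, sys
                                                                              import sh

                                                                              # something like bash's `set -x`: sh logs the commands it runs
                                                                              # through the standard logging module
                                                                              logging.basicConfig(level=logging.INFO)

                                                                              # behave "like a shell script": stream stdout/stderr to the terminal
                                                                              sh.ls("-la", _out=sys.stdout, _err=sys.stderr)

                                                                              # subcommands via attribute access: this runs `git status --short`
                                                                              print(sh.git.status("--short"))
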

                                                                          1. 1

                                                                            Thanks for the feedback, I re-organized the page and added a link to your comments!

                                                                            https://github.com/oilshell/oil/wiki/Internal-DSLs-for-Shell

                                                                            I also appreciate any feedback on https://www.oilshell.org/ itself; it’s basically shell with Python-like data types, and without quoting problems, so I imagine it may be appropriate for what you’re doing.

                                                                            1. 1

                                                                              Hey, been thinking about your request-for-comment on Oil and didn’t want to ignore you. Here’s where I come out personally:

                                                                              1. Zsh is a local optimum for posix/bash-compatible shells. I have mine well-configured, and with a few plugins like autosuggestions and syntax highlighting (recently installed fzf-tab which is awesome too) it seems to me that it offers anything Fish does but is still posix-compatible (there’s a thread today about Fish on HN so I’m thinking about it). Side note: I remain completely baffled why people use plugin managers for shells. My .zshrc is like 100 lines (which is mostly setopts and bindkeys) and I’ve never seen the need for it.

                                                                              2. shell is awful for scripting. For anything more than running some commands with some if statements I’d use Python, which is itself a local-optimum for dynamic languages. So, personally, any avenue to “better scripting” will be through improving Python’s ability to be used for shell scripting. Not that it’s “bad” now, it could just be nicer to transition from shell to Python without so much impedance mismatch.

                                                                              3. So, given the above, I don’t see where Oil fits for me personally. If I ever switch from Zsh for my interactive shell it’ll be to something more radically different like Nu shell (which seems very promising). And like I said for scripting, there’s no reason to leave Python, where I can whip out some Pandas, Requests, etc.

                                                                              Btw, I’m unhappy with Plumbum and ‘sh’ for a few reasons so I started my own autoshell library over the weekend. Currently working on piping using pipe operators. I’m gonna try out the new async-based subprocess for my library to be able to real-time tee output to the console, and so on.

                                                                              1. 2

                                                                                Thanks for the feedback! Your points make sense and are not too surprising to me.

                                                                                1. Oil isn’t a better interactive shell than zsh at the moment; however there is a new “headless mode” coming up which I’m excited about. That will enable some more inventive UIs.

                                                                                2. Yes shell is awful on the surface, but it has a great core! And the point of Oil is to fix it while retaining the good parts :) I guess people aren’t convinced it is possible to rehabilitate, or are not convinced that there are good parts.

                                                                                3. I use Python in all my shell scripts! I address this here:

                                                                                http://www.oilshell.org/blog/2021/01/why-a-new-shell.html#shouldnt-scripts-over-100-lines-be-rewritten-in-python-or-ruby

                                                                                However I’ve noticed that the way this is worded isn’t particularly convincing, so I plan to update it: https://github.com/oilshell/oil/issues/944

                                                                                The tl;dr meme is that it’s “better” to write 200 lines of shell that calls 300 lines of Python, than to write 1000 or 2000 lines of Python. But I understand that a lot of people haven’t felt that “compression”. It’s one of those things that you have to experience yourself.

                                                                                It’s hard to explain but some things just naturally go in Python and some things naturally go in shell, and they work together as part of the same system.

                                                                                I would make an analogy to writing many manual loops over dicts and lists in Python, and then discovering SQL or Data Frames. You will just save so much repetitive code. (Not that SQL doesn’t have a ton of downsides too.)

                                                                                Python is a great language, and my primary one for ~18 years, and Oil is written in it, but it isn’t optimal for many tasks. For instance, one of the main reasons I use shell is to parallelize Python (and R) trivially!

                                                                                If you have any other feedback or questions let me know.

                                                                                1. 2

                                                                                  To your point, yesterday I was repeatedly waiting for a command to complete that had an xargs in it, and I went “wait a minute”, added -P0 and it completed much quicker :)

                                                                                  I’d be interested in more about your thoughts about the right way to combine shell and “real programming languages” in a way that makes best use of both. In general I’m very sympathetic to that point of view because that’s always how I design systems, expose a bunch of command line programs and tie them together. That’s similar to how git is designed as well.

                                                                                  Ultimately though, shell is programming too, so it seems like it’s just an API/ergonomics issue in programming languages that needs to be improved if shell is significantly better for certain tasks.

                                                                                  Edit: here’s an example of how I always combine command line programs together: I have a little cb (“clipboard”) program and I’ll often do things like cb | sort -u | cb, or cb | xargs ... | cb to go back and forth from data in my editor.

                                                                                  1. 2

                                                                                    Yup xargs is one of my favorite commands! In fact I once made a presentation about it which I never turned into a blog post :) http://www.oilshell.org/share/05-24-pres.html

                                                                                    Yes if you know how to write and design a CLI in Python, then you’re already mostly there! To me the difference between a CLI tool and a Python function is that the CLI tool is mostly stable. That is, you add things and never take them away, because that would break callers.

                                                                                    And this discipline makes you more careful about your code and how it interacts with the world.

                                                                                        There are a few books that cover it (and unfortunately I think it does take a book + a bunch of experience, I’m still learning):

                                                                                    Roughly speaking, I’d say my Python programs use stdin, stdout, and stderr better than they used to, and they have better flags, better errors, and better logging/instrumentation. I find it a pretty useful style for structuring code and especially testing it.
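                                                                                        A minimal sketch of that shape (everything here is made up for illustration; it’s just argparse + logging around a stdin-to-stdout filter):

                                                                                        import argparse, logging, sys

                                                                                        def main(argv=None):
                                                                                            parser = argparse.ArgumentParser(description="example filter-style tool")
                                                                                            parser.add_argument("--verbose", action="store_true", help="log at DEBUG level")
                                                                                            args = parser.parse_args(argv)

                                                                                            # diagnostics go to stderr so stdout stays clean for data
                                                                                            logging.basicConfig(stream=sys.stderr,
                                                                                                                level=logging.DEBUG if args.verbose else logging.WARNING)

                                                                                            for line in sys.stdin:               # data in on stdin...
                                                                                                sys.stdout.write(line.upper())   # ...results out on stdout

                                                                                            return 0                             # the exit status is part of the interface

                                                                                        if __name__ == "__main__":
                                                                                            sys.exit(main())
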

                                                                                    1. 2

                                                                                      Btw I released that library I said I’d do above: https://github.com/kbd/aush

                                                                                      I’ve been using Python for so many years but never put up a library on PyPI before. Poetry made it easy.

                                                                                      It’s not done yet but it “works” enough to share. Currently learning asyncio things so I can implement streaming output.

                                                                                      1. 1

                                                                                        Nice README, it’s very clear. I can see it being useful for some tasks but I still like shell for pipelines, redirects, and a few other things :)

                                                                                      2. 1

                                                                                        Yes if you know how to write and design a CLI in Python, then you’re already mostly there!

                                                                                        Yeah I have argparse basically memorized :) The way I look at it, command line programs are basically “functions” available from any other language, with the caveat that they can only take strings as arguments and return a return code and a string.

                                                                            2. 1

                                                                                For Python, it feels more like a curious lack of batteries rather than an abundance of wheels. As much as people like to promote Python as a bash replacement, spawning a subprocess in vanilla Python is much less ergonomic than in bash. And, to make this ergonomic, you need some way to make ls $dir syntax work without injections, and Python doesn’t have nice facilities to do that. This is not even specific to Python: spawning a process in most languages is either a chore (looking at you, Rust) or depends on the shell (Ruby, Perl). That’s why I kinda gave up on “normal” languages and just write my scripts in Julia: they get this detail right, despite this not really being their domain. It was also pleasant to re-learn that JavaScript’s string interpolation works the right way, allowing for library-defined interpolation semantics.

                                                                              1. 1

                                                                                I always attribute the weird/limited APIs for spawning processes in most languages to a (perhaps misguided) attempt at portability. (And I was using Python before the subprocess module existed; it was REALLY impoverished back then.)

                                                                                I think you can get close in Python with f strings now – how about something like:

                                                                                os.system(f'mplayer {filename:shell_escape}')
                                                                                

                                                                                I think you just have to write/register the shell_escape “formatter” (but I haven’t tested this).
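                                                                                  Sketching it out (untested): a plain str rejects an unknown spec like that, so you’d probably need a small wrapper type plus shlex.quote, something like:

                                                                                  import os, shlex

                                                                                  class Esc(str):
                                                                                      def __format__(self, spec):
                                                                                          # ":shell_escape" isn't a built-in format spec, so handle it here
                                                                                          if spec == "shell_escape":
                                                                                              return shlex.quote(self)
                                                                                          return super().__format__(spec)

                                                                                  filename = "movie; rm -rf ~.mkv"
                                                                                  os.system(f"mplayer {Esc(filename):shell_escape}")

                                                                                  # or skip the wrapper and quote at the call site
                                                                                  os.system(f"mplayer {shlex.quote(filename)}")
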


                                                                                Back to the higher level point: The way processes work between Unix and Windows is completely different (compare with the file system which is more similar). To make a portable interface on top of them limits the functionality greatly. The errors you can get back are different, and pipelines are a whole other can of worms.

                                                                                Windows is really about thread-based concurrency (and async); process-based concurrency is an afterthought. Processes are slow and heavy on Windows. The original concurrency model of Unix was processes; there were no threads.


                                                                                To be fair, it’s also painful to spawn processes in C on Unix – fork, exec, and making sure you don’t have descriptor leaks! (CLOEXEC and all that). And shell gets it wrong because it doesn’t have first class support for arrays, and arrays are literally in the C interface (char **argv).

                                                                                And pipelines: pipe(), dup(), close(), and fcntl(). Unix is very flexible but also makes you do a lot of work yourself. The C standard library barely helps.

                                                                                Also to me it is funny that bash made the env var solution arguably unsafe (from the previous thread about zx). The example I listed happens to be safe though.

                                                                                https://lobste.rs/s/9yu5sl/after_discussion_here_i_created_lib_for#c_paq9ch

                                                                                You have to avoid using the environment variable in array subscripts:

                                                                                a[$DIR]=1  # unsafe, hidden EVAL in ksh/bash
                                                                                echo ${a[$DIR]}  # unsafe, hidden EVAL in ksh/bash
                                                                                

                                                                                which is probably too subtle a rule to recommend to people. So quoting is the more standard and more explainable solution.

                                                                                  Although I think this would be a lot cleaner and require fewer mechanisms from each calling language.

                                                                                The fault is really with ksh and not bash, since bash copied the double expansion / hidden “eval” from ksh. If there were no hidden EVALs, like Python/JavaScript/every other language, this would be by far the better solution.

                                                                                Although I think there is a subtlety – I wonder if typical shell quoting actually prevents expansion in the a[$DIR] case. It might not. I will think about that…

                                                                                1. 1

                                                                                    Yeah, actually quoting does NOT protect you from the hidden eval problem [1], so the environment var solution I gave is as good as quoting.

                                                                                  An equivalent solution without env vars (and leakage) is simply to invoke sh -c and pass an argument:

                                                                                  >>> untrusted='/bin'
                                                                                  >>> 
                                                                                  >>> subprocess.call(['sh', '-c', 'find "$1" | wc -l', 'dummy0', untrusted])
                                                                                  170
                                                                                  

                                                                                  Why this is useful:

                                                                                  • It does NOT use Python string interpolation.
                                                                                  • It doesn’t require shell quoting. Automatic (Julia-like) or otherwise.
                                                                                  • It’s as safe as quoting. Quoting is still subject to the hidden eval caveat, so the 2 solutions are on equal footing.

                                                                                  Downsides:

                                                                                  • Your language has to let you spawn an argv array without the shell. Most languages let you do that now, but maybe languages like awk don’t.
                                                                                  • The dummy0 thing is probably confusing to some people.
                                                                                  • You might forget the quoting around $1 like I initially did :)

                                                                                  I should write a blog post about this, but it’s probably at the back of the queue


                                                                                  [1] For some reason this is hard for people to understand, but here are the refs:

                                                                                  https://github.com/oilshell/blog-code/tree/master/crazy-old-bug

                                                                                  http://www.oilshell.org/blog/2019/01/18.html#a-story-about-a-30-year-old-security-problem

                                                                                  1. 1

                                                                                    Automatic (Julia-like) or otherwise.

                                                                                    Not sure if this is just a choice of word, but, just in case, Julia doesn’t do any quoting. It doesn’t need to, because it never concatenates arguments into a single string. run(`ls $dir`) does subprocess.call([“ls”, dir]).

                                                                                    Your language has to let you spawn an argv array without the shell

                                                                                      I’d go as far as saying that new languages & runtimes shouldn’t have the ability to spawn with the shell at all, even via an opt-in mechanism like shell=True in Python. If someone knows what they are doing, they can [“sh”, “-c”].

                                                                                    1. 1

                                                                                        Ah OK, I thought it was auto-quoting. So then does Julia reimplement pipelines for find $dir | wc -l? I think I might have seen that with their use of libuv. My first thought is that that’s a bad idea – there are other constructs in shell you might want besides pipelines, and you don’t want to implement an entire shell inside the language runtime.

                                                                                      I wouldn’t object to an explicit sh -c everywhere – it might clear up a lot of confusion.

                                                                                      1. 1

                                                                                        Yeah, Julia does implement pipelines itself.

                                                                                  2. 1

                                                                                    I think you can get close in Python with f strings now

                                                                                    I rather strongly feel that this is nowhere close. The right solution should be the default and more ergonomic (and preferably the only) option. Otherwise folks will do the wrong thing without knowing it.

                                                                                      Like, zx demonstrates how hard it is to make people write safe code. JavaScript’s backtick syntax is specifically designed to make it possible to avoid injection vulnerabilities: tag`ls ${d}` will call tag(["ls ", ""], d). And yet the library happily concatenates that back into a single string, because Node has the unsafe (but ergonomic) child_process.exec API.

                                                                              2. 1

                                                                                  ‘sh’ is nice if you only need it on Linux. More recent Python versions have subprocess.run, which makes it really easy to execute commands, capture stdout/stderr and define timeouts (and subprocess.Popen is still there if you need to run something in the background). I replaced all my ‘sh’ occurrences with subprocess.run.
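                                                                                  For example, a rough sketch (the rsync invocation is just a placeholder):

                                                                                  import subprocess

                                                                                  result = subprocess.run(
                                                                                      ["rsync", "-a", "src/", "dest/"],  # argv list: no shell, no quoting issues
                                                                                      capture_output=True,               # collect stdout and stderr
                                                                                      text=True,                         # decode bytes to str
                                                                                      timeout=60,                        # raise TimeoutExpired after 60 seconds
                                                                                      check=True,                        # raise CalledProcessError on failure
                                                                                  )
                                                                                  print(result.stdout)
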

                                                                              3. 3

                                                                                The main motivation for me was to make it dead simple to jump between shell scripts and Python, and have 0 learning curve. And I think no other library achieves that as well.

                                                                                1. 1

                                                                                  My company’s product is available for virtually all major Linux distributions as well as Windows. It is not a cloud-native application and consists of a server and a client component. So for testing the base functionality we basically have to spin up a complete virtual machine; containers are not sufficient because we also need access to direct LUNs and iSCSI stuff (VTL tape).

                                                                                  At the moment I’m running 6 systems based on CentOS/libvirt/KVM, and I’m using a dead simple setup with Vagrant which, once the build for one distribution has been done, spins up the required virtual machines to install the latest build version and executes the test suite.

                                                                                  The virtual machines are pre-configured using regular shell scripts (via Vagrant provisioning) and the test suite is implemented in Python’s PyUnit, at the moment roughly 8000 lines of Python code executing around ~180 test cases for each setup. For some virtual machines we create virtual tape libraries on the fly and attach them to the machine, using quadstorvtl in the backend.

                                                                                  The boxes are spun up using Jenkins and some scripting, with the virtual machine configuration (Vagrant config) being part of a Git repository. I’m using Sphinx with the docstrings plugin to give the developers a neat link to the test documentation and source in case a test run fails. The log files of the test run are saved to a log repository for each build.

                                                                                  As a side project I provide a web-based frontend for the developers, using Python/Flask, which uses the same Vagrant configurations as the CI virtual machines, so they can easily spin up a virtual machine with the latest build version installed to reproduce any failing tests.

                                                                                  I really have to give big kudos to the roboxes project and Petr Ruzicka for providing such excellent pre-configured Vagrant boxes for all major distributions and especially Windows systems!!!

                                                                                  https://github.com/ruzickap/packer-templates https://github.com/lavabit/robox

                                                                                  1. 2

                                                                                    Mostly working on a new backup tool for kvm/libvirt which allows online full and incremental backups (https://github.com/abbbi/virtnbdbackup)

                                                                                    1. 4

                                                                                      Other documents on the web seem to have more detailed documentation about the technology used for the search index, etc., even though these are quite old. At least this is what I’ve found:

                                                                                      https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.157.7362&rep=rep1&type=pdf

                                                                                      Back then they used a system based on a “distributed in-memory index” that is queried by requesters using UDP broadcasts to locate the requested files stored on the disks.

                                                                                      I wonder how that system has evolved in the last decade.

                                                                                      1. 5

                                                                                        Until recently, index scans were performed very infrequently because each index scan caused the permanent loss of up to 10 hard disks. The specific cause of the disk failures seems to have been related to insufficient data center cooling capacity. Actively accessing the disks raised the machine room temperature by at least 5 degrees Fahrenheit.

                                                                                        That is the greatest thing I’ve read today.