1. 8

      Don’t give me credit. My eyes were opened by Ryan Gordon over 10 years ago when he was lamenting how terrible dotfiles and dotdirs are in your ~/ on modern *nix. I wish I knew where he discussed this, but it could be lost to old IRC logs. I think it was about how terrible it was to write a new config-file parser for every videogame he ported to Mac OS X and Linux, plus the performance issues around it, the chance of config file corruption if a game crashes at just the wrong time, etc.

      https://www.patreon.com/icculus

      edit: Yeah, definitely discussed this back in 2009, just can’t find the correct archive of the conversation

      00:42 <+Mercury> floam: I’d kill for a simple, flexible, small, trivial to use, config file library that was everywhere. Note, it has to actually exist in distributions, a BSD license might be nice, and it has to be absolutely trivial to use with a very small handful of function calls. :)

      00:43 <+icculus> floam: I have a list of shit I’d like to change that basically makes Linux a GPL’d version of Mac OS X by the time it’s done, though. :)

      00:43 <+Mercury> (This may already exist, I’ve seen too many crappy implementations to keep looking on a regular basis.)

      00:43 <+icculus> Mercury: sqlite.

      1. 5

        How do you diff and merge it?

        1. 9

          In theory, diffing should be easier than with bytestreams because your structure is distinct from your data. Diffing tools are better for text right now because we’ve invested a lot more time into diffing text.

          1. 7

            Not a totally complete solution but https://www.sqlite.org/sqldiff.html

            1. 5

              Ah, thanks for all the answers! I’ve realized what confused me about the post and all the comments. In my world, configuration means a chunk of read-only data that’s used to distinguish one environment from another and the application only ever reads it at start and never touches it again. IMO, the best way to keep such info is as human readable plain text files in your repository.

              However, what’s meant in this context is configuration managed by the application itself, like when you go to the options menu in a game and tweak some settings. I’m personally inclined to think of that as regular old data and not configuration. Then, by all means, any piece of information you modify throughout its lifecycle should be kept in a proper database, embedded or not, and SQLite is a great choice. Some exceptions could be application-created immutable data that you don’t care to version, or data that will only ever have a few versions that you’re willing to manage with great care.

              1. 3

                Schema updates are all you need, and you can store the schema version in a table, which lets you upgrade/downgrade config versions.
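
                A minimal sketch of that idea in Python’s sqlite3 (the table names and the migration itself are made up for illustration):

```python
import sqlite3

# Hypothetical sketch: track the config schema version in its own table
# so the application can migrate settings up (or down) on startup.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE schema_version (version INTEGER NOT NULL);
    INSERT INTO schema_version VALUES (1);
    CREATE TABLE config (key TEXT PRIMARY KEY, value TEXT);
""")

# Each migration bumps the version; run them in order until current.
MIGRATIONS = {
    2: "ALTER TABLE config ADD COLUMN updated_at TEXT",
}

(version,) = conn.execute("SELECT version FROM schema_version").fetchone()
for target in sorted(v for v in MIGRATIONS if v > version):
    with conn:  # each migration commits (or rolls back) atomically
        conn.execute(MIGRATIONS[target])
        conn.execute("UPDATE schema_version SET version = ?", (target,))
```

                Downgrades would just be a second dict of reverse migrations walked in the other direction.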

                1. 2

                  You might store the canonical representation as a plain text SQL dump rather than the binary database itself.
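
                  In Python that round trip is essentially built in via sqlite3’s iterdump; a small sketch (table contents made up):

```python
import sqlite3

# Sketch: keep the diff-friendly text dump under version control and
# rebuild the binary database from it when needed.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE config (key TEXT PRIMARY KEY, value TEXT)")
db.execute("INSERT INTO config VALUES ('theme', 'dark')")
db.commit()

# Plain text SQL, suitable for git and ordinary text diff tools.
dump = "\n".join(db.iterdump())

# Restoring is just executing the dump against a fresh database.
restored = sqlite3.connect(":memory:")
restored.executescript(dump)
```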

                2. 4

                  If you want to persist config from an app then it’s a good choice, but both reading and editing it manually seem like a complete pain to me. Imagine doing something like nginx.conf from SQLite!

                  1. 1

                    Another benefit of this is that SQL databases are programmable by a wide variety of standard tools. You don’t need to read or edit it manually. Imagine having a useful config editor toolkit!

                    1. 1

                      Personally I don’t think that’s easier for end-users, although it can be for application programmers. Even as someone who is quite familiar with SQLite I would consider it a pain to configure like this, never mind people who aren’t.

                      1. 2

                        It seems like you are positing that one would need to use a SQLite shell and to write SQL in order to edit configurations. What I am suggesting is while that would indeed be possible at the low end, it also affords myriad higher levels of tooling, such as structural or relational editors, and the ability to share that technology across other systems that use configurations.

                        A tool like this https://www.digitalocean.com/community/tools/nginx could easily be implemented on a SQLite backend. It would be much simpler to do so than to build it for the current nginx config language which bears almost nothing in common with any other config format.

                        1. 1

                          Yeah, I understood what you mean (although I probably didn’t detail that enough), but it still seems like a complete pain to me. Personally, I even dislike using stuff like git config because just reading or editing a plain text file is so much easier. (luckily, git config is backed by a simple INI file so that’s not a problem) I can quickly see what’s configured, and it’s easy to edit with a standard tool I already know (my text editor of choice) without learning some sort of new tool.

                          I don’t think nginx.conf could be “easily” implemented, or at least, not in a way that’s transparent: without some specialized CLI/tooling to deal with it, it’ll be very hard to make heads or tails of just the database, as the data is kinda complex. I’m also not convinced that tools like the DigitalOcean one would be that much easier. It’s easy to grok and verify the correctness of a simple config file format (just read it), whereas that would be much harder with some SQLite format.

                          For me, personally, I would have to really want to use a product/project in order to deal with something like a specialized “structural or relational editors” to set it up.

                          To be honest, I think “use SQLite” is a typical “engineering solution”, which makes some amount of sense from an engineering perspective, but ignores UX. Config files are all about UX.

                          1. 2

                            I’m sorry, no, config files are horrible UX. For instance, they are never “plain text” but rather they exhibit some informally-specified data model often coupled with another informally-specified logical model. There is no config file format that makes sense without an understanding of (1) the syntax and (2) the semantics. You cannot “just read it” and determine correctness without first forming a full mental model. Why should this be so? Tools can be better at it. After all, you have become quite comfortable working with text due to generations of tools development specifically geared for manipulating text. Would you consider a text file to be superior UX if the state-of-the-art editor was still ed(1)?

                            1. 1

                              For most files, understanding the syntax isn’t hard: it’s just a key-value mapping with some comments, possibly nested in some context. The semantics problem still exists with SQLite; it’s not like a relational database will give you automatic semantics.

                            2. 2

                              You could easily have a text editor plugin or fuse filesystem that presented sqlite dbs as something more like a config file. Could even support comments and stuff.

                              That keeps the advantages of sqlite.

                              Still not really convinced it’s much benefit over just standardising on something like toml.
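
                              As a sketch of that plugin/FUSE idea, one could round-trip a key/value table through INI-ish text; everything here (table name, text format) is hypothetical:

```python
import sqlite3

# Hypothetical round trip: render a SQLite key/value config table as
# INI-style text for hand editing, then write edits back. A real editor
# plugin or FUSE layer would do this transparently.
def db_to_text(conn):
    rows = conn.execute("SELECT key, value FROM config ORDER BY key")
    return "\n".join(f"{k} = {v}" for k, v in rows)

def text_to_db(conn, text):
    with conn:
        for line in text.splitlines():
            # Ignore blank lines and comment lines in the text form.
            if not line.strip() or line.lstrip().startswith("#"):
                continue
            key, _, value = line.partition("=")
            conn.execute(
                "INSERT OR REPLACE INTO config VALUES (?, ?)",
                (key.strip(), value.strip()),
            )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE config (key TEXT PRIMARY KEY, value TEXT)")
text_to_db(conn, "workers = 4\n# tuning\nlog_level = info")
```

                              Preserving comments across the round trip is the genuinely hard part; they would need a column of their own.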

                  1. 1

                    From the title I expected an empty pile-on to the various other spinlock articles this week but this is a very practical and detailed writeup

                    1. 1

                      You could probably beat both with a bloom filter based approach.

                      1. 2

                        I guess that could work: probe for an item; if its bits are not set, add the item to a list, otherwise skip it. This would, however, give inexact answers. There is a certain probability of collisions, which may produce false positives (where the Bloom filter indicates that an item is present when it’s not). So your final list of unique items may have items missing.

                        If you want an approximate answer to the number of unique items (as opposed to getting the actual items), HyperLogLog would be very small and efficient.

                        1. 2

                          Generally you use a bloom filter in addition to some other structure, as an optimisation. So here you’d do something like

                          1. Check bloom filter
                            a. it's there: actually check the underlying set
                            b. it's not: add it to the underlying set, then add it to the bloom filter
                          

                          Of course this only saves you if checking the bloom filter is much cheaper than checking the underlying set. This might be the case if you distribute the bloom filter to edge nodes but your set is in a database that needs to be queried (this is roughly how Chrome’s Safe Browsing works: your browser has a local copy of the filter and you only query Google if there’s a hit). And of course now you have race conditions between them. It depends a lot on the cardinality of your data too, because if you’re mostly seeing the same few records all of the time then the bloom filter won’t really save you anything.

                          For just finding the unique items when you’ve already got an array that fits in memory, it’s pretty unlikely that a bloom filter beats more naive approaches.
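
                          To make the steps above concrete, a toy sketch in Python (hash choices and sizes are made up for illustration, not tuned):

```python
import hashlib

# Toy bloom filter: k hash positions derived from blake2b with
# different one-byte salts. Sizes here are illustrative only.
class Bloom:
    def __init__(self, bits=1 << 16, k=3):
        self.bits, self.k, self.array = bits, k, bytearray(bits // 8)

    def _positions(self, item):
        for salt in range(self.k):
            h = hashlib.blake2b(item.encode(), salt=bytes([salt])).digest()
            yield int.from_bytes(h[:8], "big") % self.bits

    def add(self, item):
        for p in self._positions(item):
            self.array[p // 8] |= 1 << (p % 8)

    def maybe_contains(self, item):
        return all(self.array[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))

def unique(items):
    bloom, seen, out = Bloom(), set(), []
    for item in items:
        # Bloom miss => definitely new, so the (notionally expensive)
        # set lookup is short-circuited away; a hit falls through to it.
        if not bloom.maybe_contains(item) or item not in seen:
            seen.add(item)
            bloom.add(item)
            out.append(item)
    return out

print(unique(["a", "b", "a", "c", "b"]))  # ['a', 'b', 'c']
```

                          Here the backing set is in memory, so this illustrates the control flow rather than any speedup.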

                          1. 1

                            because if you’re mostly seeing the same few records all of the time then the bloom filter won’t really save you anything.

                            Yes, but this is the case where the hash is already good, so I’d expect bloom+hash to work well for low-uniqueness and high-uniqueness.

                      1. 21

                        We were running into a bug in the field where certain, very simple operations were taking 1,000,000x longer than they should. It was only happening at certain sites and we just could not replicate it locally.

                        Turns out that, under certain workloads and after a certain amount of data set growth, the query planner in the embedded database engine we’re using would start picking exactly the wrong index for a certain commonly-run query, resulting in what amounted to a full table scan on a billion-row table dozens of times a second…

                        The fix was a simple “FORCE INDEX” hint in the query.

                        Another interesting one was in statistics reporting for traffic capture. The stats would report that we were processing hundreds of gigabits per second of traffic on a 10Gbps box. Turns out that the packet validation code was correctly handling corrupt IP headers but not until after the size was recorded for the purposes of stats gathering, so a malformed IP header could cause a calculation to go negative and overflow. The packets would later get dropped correctly, but it was a bit of a puzzler for a day or so because we were convinced that it was the stat calculation that was wrong (i.e. we trusted the packet data that the stat calculation was using).

                        (What’s embarrassing is that I wrote both the stats calculation and the capture engine with the header validation, so I really should’ve found the bug sooner…)

                        Now for a stupid one:

                        Our traffic capture engine in our test rig was reporting 30% malformed packets during tests. I was pulling my hair out trying to find where things were getting parsed/validated incorrectly. Turns out someone had accidentally replaced the “clean” test data set with one of the data sets that had, you guessed it, 30% bad packets to test the validation code…

                        1. 15

                          under certain workloads and after a certain amount of data set growth, the query planner in the embedded database engine we’re using would start picking exactly the wrong index for a certain commonly-run query

                          I had a fun one like this a while ago. We use base 36 IDs for a lot of stuff internally (so counting looks like 1,2,…,9,a,b,c,…,z,10,11, etc). Sometimes postgres would do full table scans for this very simple query using psycopg2 (a common python postgres driver)

                          transaction.execute("SELECT * FROM users WHERE id=%s", [id36_to_int(the_id)])
                          

                          Here id is the primary key. You shouldn’t ever have to do a full table scan to do = queries on it. So what was happening?

                          After a lot of debugging we saw that instead of the_id being a string like “d54q6”, it was sometimes a human-readable string like “lorddimwit”. It turned out that sometimes our callers thought they should be passing us a user name instead of a user ID. We only spotted this because of seemingly unrelated errors about not being able to convert strings containing - and _.

                          id36_to_int(something_reasonable) returns a Python int that psycopg2 maps to the column’s bigint (int64) type, which Postgres can look up in the index. But id36_to_int(something_huge) can overflow int64, in which case psycopg2 very helpfully sends it to Postgres as a numeric instead. Postgres can compare bigints to numerics, but it can’t use the index to do so. So it would scan the whole table, individually casting every id from bigint to numeric to do the comparison.

                          We added an if too_big(the_id): raise ValueError("hell no") and the problem went away.
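
                          A sketch of that guard (all names hypothetical; Postgres bigint is a signed 64-bit integer, so anything outside that range is rejected before it ever reaches the driver):

```python
# Hypothetical version of the guard described above.
INT64_MIN, INT64_MAX = -(1 << 63), (1 << 63) - 1

def id36_to_int(s):
    return int(s, 36)   # base 36: digits 0-9 then a-z

def parse_user_id(raw):
    value = id36_to_int(raw)
    # A value that can't fit the bigint primary-key column can't be a
    # real id, so fail fast instead of triggering a full table scan.
    if not INT64_MIN <= value <= INT64_MAX:
        raise ValueError(f"{raw!r} is not a plausible user id")
    return value

print(parse_user_id("d54q6"))       # small int, safe to bind as bigint
# parse_user_id("lorddimwit" * 3)   # would raise ValueError
```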

                          1. 4

                            Hah! We had something similar happen too: the character encoding for some column was set specifically and differently from the rest of the DB (why I don’t know; it was like that when I got here).

                            So doing a join or a query involving that column would cause the DB to do a full table scan, convert that column to the other encoding, do the equality check, and then go to the next row…for millions of rows.

                            1. 2

                              I do not understand this behaviour: if the coercion is built-in and the column is int64, Postgres should cast the lookup key to int64 and return an empty result set if the conversion overflows. Seems like a bug in their code too.

                          1. 6

                            “Passphrases” as they define it and bcrypt/scrypt are fully orthogonal. You should both have high-entropy passwords and be using them as input to a KDF. They guard against totally different things.

                            Passwords guard against somebody getting into an account you own. They protect you. KDFs guard against your database being leaked and everybody’s passwords being leaked. They protect your users.

                            This is dangerous advice. Please don’t take it.

                            1. 13

                              This software may not be used by […]

                              Doesn’t that make it not open source software anymore? The OSI website says that open source licenses don’t discriminate against people, groups, or fields of endeavor. I also think it goes against the GNU definition of free software.

                              1. 11

                                There’s also considerable controversy over whether the OSI gets to define anything like that, or whether the OSD is a usable definition.

                                1. 7

                                  Personally this feels like botanists telling you that a strawberry isn’t a berry but a watermelon is. Berry has a working definition that humans have been using for longer than we’ve had botanists and it doesn’t have to correspond 1:1 with some unrelated biological concept.

                                  I know the analogy isn’t perfect because the history of the terms “open source” and “free” and “libre” are more tied up in each other, but I’m supremely uncomfortable with someone claiming authority to decide what somebody else means when they use a word

                                  1. 4

                                    You’re not wrong, the discussion whether OSI can lay claim on the name is also not new.

                                    1. 3

                                      “a strawberry isn’t a berry but a watermelon is”

                                      I’ll be damned… Well, educated. I’m still calling them strawberries and watermelons, though. Call it inertia.

                                      1. 2

                                        Bananas too!

                                  2. 10

                                    A lot of people have made that point–and for what it’s worth, the license page itself uses Google Analytics and hence is not libre software either.

                                    This is kind of one of the root philosophical rifts in the community of folks who want open source software right now: hardcore FSF types who would argue that any discrimination is an infringement of freedoms, and those who believe that it is permissible to allow discrimination and limitation of rights so long as it protects underprivileged groups.

                                    One of those positions is a lot more straightforward to both define and enforce, the other one is a lot better in terms of trying not to reinforce power hierarchies.

                                    1. 3

                                      I’m not sure even the first position is straightforward. My personal preference for the ISC (MIT/BSD-style) license is because I choose to give as much freedom as possible directly to those who follow me. GPL infringes on those people’s freedom for the greater benefit of my followers’ followers (and on and on).

                                      I expect most of those “hardcore FSF types” consider GPL-style more “freedom-loving” framed this way, but the same facts lead me to the opposite conclusion.

                                  1. 2

                                    The Annotated Turing is a great tool for reading this paper and the requisite concepts with very few prerequisites. Highly recommend.

                                    1. 12

                                      No mention of Cunningham’s Law?

                                      The best way to get the right answer on the Internet is not to ask a question, it’s to post the wrong answer.

                                      1. 2

                                        It may work and the idea is funny, but it’s anti-social behaviour. Then the whole internet just fills up with wrong answers.

                                        1. 6

                                          But done in earnest, it makes for a good question. “I’m trying to do a thing; here’s what I’ve tried so far (yes, I know it’s wrong).” It lines up nicely with point 1 in the article.

                                          Plus, wrong is a spectrum; just look at Stack Overflow. There’s right answers, wrong answers, right answers for the wrong questions, answers that are technically correct but completely insane, answers that were right 3 years ago…

                                          1. 1

                                            I just got it to work with my Kubernetes article.

                                        1. 12

                                          This isn’t a bad post, but beyond giving a very rudimentary overview of BFS there doesn’t seem to be much in the article demonstrating Go’s standard library. You could do this same exercise in virtually any language that has list and map types in its standard library and the code would look the same.

                                          Serious question: do Go users actually re-use the standard list type everywhere instead of having proper types for stacks and queues?
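
                                      To illustrate the point above that the exercise works in virtually any language with list and map types, here is the same shape of BFS sketched in Python using only the standard library (graph and names made up):

```python
from collections import deque

# BFS over an adjacency-map graph: all it needs from the standard
# library is a map type and a queue.
def bfs(graph, start):
    visited, order = {start}, []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbour in graph.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(neighbour)
    return order

graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(bfs(graph, "a"))  # ['a', 'b', 'c', 'd']
```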

                                          1. 3

                                            Serious question: do Go users actually re-use the standard list type everywhere instead of having proper types for stacks and queues?

                                            If the list type handles both and you can just pick which methods to call in your use case, why would that be an issue? Stacks and queues are very similar internally… You can sometimes see a very simple implementation of them just using slice access really.

                                            1. 3

                                              If the list type handles both and you can just pick which methods to call in your use case, why would that be an issue?

                                              No fundamental issue. I often rely on type signatures to help understand what I’m dealing with, and it is also sometimes helpful to get a compile error (ie. if I try to use as a stack a list that was intended to represent a queue). I guess folks using Go probably rely on comments explicitly indicating whether the list is a stack or queue (and helpfully named variables).

                                              1. 2

                                                From what I’ve seen (using go since 2012) people don’t usually pass around a slice expecting you to push or pop.

                                                If you are using a slice as a queue you make one of two choices: option 1 is write a wrapper type. Option 2 is reuse some wrapper type built on the empty interface. Soon we’ll get a third option with go2 generics.

                                                1. 2

                                                  Did they decide to add generics in Go 2? That would’ve been pretty big news.

                                                  1. 2

                                                    Here’s one proposal from the core go team: https://go.googlesource.com/proposal/+/master/design/go2draft-contracts.md

                                                    I don’t know which release it will land in, but go getting generics is a question of when not if.

                                                    1. 1

                                                      I feel like if they had decided that they will implement generics eventually, they would’ve told us. Seems possible that they could explore the design space and decide there is no possible design they are satisfied with.

                                            2. 2

                                              Yes, you are right, I’m not diving deep into BFS. I actually said in the post there was plenty of pretty awesome material online. I wanted to demonstrate the power of the standard library. You must surely admit having a simple BFS implementation in ~15 lines of code is pretty cool :-)

                                              do Go users actually re-use the standard list type everywhere instead of having proper types for stacks and queues

                                              I don’t personally do this all the time, but I often tend to start with it, just to get things going, and then gradually improve on it, sometimes eventually completely replacing some of the standard library data structures.

                                              1. 0

                                                Why not? A stack is just a vertical list.

                                                1. 2

                                                  No, a stack is not a vertical list. A stack is a LIFO. If you’re arbitrarily indexing into a stack, it’s a list.

                                                  1. -1

                                                    That’s picking nits, and not really useful ones. The stack, as in the C/hardware stack, is pushed to on subroutine call and indexed into throughout the subroutine. It’s hard to argue that it’s not a stack.

                                                    1. 3

                                                      It’s not a nitpick. “A stack is just a vertical list” is nonsense, else we’d just call it a list.

                                                      1. 1

                                                        Have you ever tried physically rotating a data structure? Of course it’s nonsense! :)

                                                        Anyway, it depends on how restrictively you define a stack. Is it just nil, push and pop? What about rot, tuck and roll? I’d say nil, cons, car and cdr form both a list and a stack minimally and adequately. Both data structures have a head and a tail, and arbitrary indexing is O(n), so it’s just a matter of perspective.

                                                        Also you can arbitrarily index into a mutable stack non-destructively if you keep another stack around to form a zip list.

                                                        1. 1

                                                      A stack doesn’t really have a tail; if you’re pushing and popping the “tail”, it’s a list or a deque. rot is not an operation that can be performed on a pure stack: it requires some kind of external storage (in Forth, >r swap r> swap), and swap too requires some external storage. Same for tuck. roll is not a real stack operation; it’s treating a stack like a list and indexing arbitrarily. Names for these data structures exist for a reason. A stack is not a vertical list because the point of a stack is that you push and pop the top item. If you wanted to randomly index into some data structure that might be appended to or truncated, the first data structure to come to mind would not be a stack, because that isn’t the point of a stack.

                                                          1. 1

                                                            These new external storage requirements aside (even pop requires some external storage), I’m not arguing what things are for, I’m saying what they are. Absolutely you wouldn’t arbitrarily index into a stack, but if you take an index-able data structure like a list (not that cdddr is any different to pop pop pop) and treat it like a stack, you’ve got yourself a stack. You wouldn’t reimplement a list type just to remove a method so you can rename it to “stack.”

                                              1. 3

                                            Are you sure you’re a Scotsman?

                                                1. 5

                                                  To be fair the article goes nicely in depth on the details why certain systems are not actually an independent set of services but rather lots of nodes heavily interconnected.

                                                  1. 7

                                                    I’d say 99% of microservice-architecture adoption is actually this — a single system where function calls are swapped out for asynchronous and error-prone network requests.

                                                    I’m glad the author’s conclusion was to not go this route, but I don’t think the warning is dire enough.

                                                    1. 3

                                                      I’m glad the author’s conclusion was to not go this route, but I don’t think the warning is dire enough.

                                                  Seconded.

                                                  I can remember only one application in which I had things that could truly be called “microservices”. One was a service that did geographic lookups for zip codes; the other was an endpoint that did nothing more than check whether a certain location-tracking device, with a certain authentication hash, could upload a new (timestamp, location) pair to the database. Both were extremely simple, required only one or two hits on the database, and needed nothing beyond some input validation.

                                                      I opted for plain-old-simple and reliable PHP for those services, while the more complex logic is handled in a large and mostly monolithic java-application.

                                                      1. 2

                                                        Yeah, that’s true.

                                                        But I also hope this won’t scare away teams who could really benefit from splitting their architecture up and making parts swappable and independent. Makes certain things much easier to handle, in my opinion.

                                                  1. 1

                                                    Isn’t this the Rust that everyone has been waiting for? Isn’t this our 1 dot Oh Yeah! It is on!!

                                                    Please correct me, but I believe this is the case.

                                                    1. 2

                                                      Sort of:

                                                      With this stabilization, we hope to give important crates, libraries, and the ecosystem time to prepare for async / .await, which we’ll tell you more about in the future.

                                                      1. 3

                                                    The stabilization report for the MVP of async/.await indicated a goal of 1.38, which comes out in ~3 months. Really looking forward to playing with that.

                                                        Lots of future improvements on the horizon also. Personally I am really excited for async/await when combined with Generic Associated Types (type-members in traits can be generic, not just concrete as today), since that is a key requirement to be able to have async trait methods. Work on GAT is ongoing but not at all finished.

                                                        1. 2

                                                          That means, BTW, that the feature needs to be mostly done in 1.5 months, as that’s when the beta is being cut.

                                                          1. 3

                                                        Yes. Here is the issue I mentioned above: https://github.com/rust-lang/rust/issues/62149

                                                            From the issue:

                                                            This leaves documentation and testing as the major blockers on stabilizing this feature.

                                                            and

                                                            As of today we have 6 weeks until the beta is cut, so let’s say we have 4 weeks (until August 1) to get these things done to be confident we won’t slip 1.38.

                                                    1. 3

                                                      Trying to implement (for example) NTPsec on Python would be a disaster, undone by high runtime overhead and latency variations due to GC.

                                                      Go uses a GC as well, but one which is tuned more for latency afaik. Can the Python GC be tuned for latency? You can at least disable it (temporarily).

                                                      1. 4

                                                        Can the Python GC be tuned for latency?

                                                        As far as I’m aware, the GC in CPython must stop-the-world and doesn’t have many tuning knobs. Maybe if you turn threshold0 way down, turn threshold1 up a little, and leave threshold2 more or less alone, the 50th %ile GC latency would get lower, at the cost of throughput (more, shorter young-generation collections)? The worst-case GC latency wouldn’t get any better though.

                                                        You can at least disable it (temporarily).

                                                        You can switch CPython’s GC off and leave it off (and have only refcounting) if you’re prepared to live without reference cycles.

                                                        I would expect the majority of libraries on PyPI have never been tested with the GC turned off. I wouldn’t be very surprised if people ignored pull requests that made their libraries more complicated for the sole purpose of enabling them to be used without the GC.
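
                                                        For what it’s worth, the knobs mentioned above are exposed directly by the gc module; a small sketch (the threshold values are illustrative, not recommendations):

```python
import gc

# CPython's cycle collector exposes exactly three generation thresholds.
print(gc.get_threshold())        # defaults, typically (700, 10, 10)

# Turning threshold0 down gives more frequent, shorter gen-0 passes.
gc.set_threshold(100, 15, 10)

# Or switch the cycle collector off entirely and rely on refcounting;
# objects caught in reference cycles will then never be reclaimed.
gc.disable()
assert not gc.isenabled()
gc.enable()
```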

                                                        1. 7

                                                          Yeah. I suspect he never tried? Even if Python is too slow, would Java or OCaml have been a perfectly adequate language for that kind of project? It’s hard to believe the answer is no if Go is an option.

                                                          As someone who got started with ML in 2005, C seemed obviously obsolete even then.

                                                          1. 2

                                                            I’d definitely guess he never tried. Python is refcounted and in the common case the latency is (high but) mostly deterministic