Threads for vacherin

  1. 1

    This is basically meant to protect against human errors.

    As such, its applicability depends on whether this type of screw-up is likely for a particular team or code base. In that sense it’s a not-too-distant cousin of the if (NULL != foo) inversion evangelized by some back in the 90s. It hedged against one particular type of typo at the expense of code readability. Some people needed it, most didn’t.

    Same here - you tend to mix up the order of function arguments? Sure, use enums. But don’t forget to wrap your int args in classes too, so as not to mix them up either. Better yet though, try to pay some attention to what you are typing.

    1. 1

      Programs are written primarily for humans to understand, and only secondarily for computers to execute. Protecting against human errors is the whole ball game.

      Between

      shouldBuy(true, true, false, true, false)
      

      and

      shouldBuy(needPool, needBasement, dontNeedAttic, needGarage, dontNeedGarden)
      

      there is an option which is — I hope — unambiguously superior.
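      In languages without keyword arguments you can get the same call-site readability with named flags. A minimal C sketch (shouldBuy, the flag names, and the buying rule are all made up for illustration):

      ```c
      #include <assert.h>
      #include <stdbool.h>

      /* Hypothetical flags replacing a run of bare bool parameters. */
      enum house_flags {
          NEED_POOL     = 1 << 0,
          NEED_BASEMENT = 1 << 1,
          NEED_ATTIC    = 1 << 2,
          NEED_GARAGE   = 1 << 3,
          NEED_GARDEN   = 1 << 4,
      };

      /* Toy rule: buy if there's a garage plus either a pool or a basement. */
      static bool should_buy(unsigned flags)
      {
          return (flags & NEED_GARAGE) && (flags & (NEED_POOL | NEED_BASEMENT));
      }

      int main(void)
      {
          /* The call site now names each requirement instead of true/false soup. */
          assert(should_buy(NEED_POOL | NEED_GARAGE));
          assert(!should_buy(NEED_ATTIC | NEED_GARDEN));
          return 0;
      }
      ```

      The flags variant also fixes the order problem for free: `NEED_GARAGE | NEED_POOL` means the same thing in any order.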

    1. 11

      There’s another package called cdb.

      1. 1

        Came here to say this. Hopefully, if enough people upvote this, the OP will reconsider the name.

        1. 4

          I don’t mind changing the name at all! I didn’t expect this project to take off and get noticed by many. On Reddit and here, the name issue has been brought up already, so I am definitely considering renaming it.

          1. 2

            It’s ack’ed in the README:

            Name

            cdb is an acronym for Cask Database, and it is not related to D. J. Bernstein’s cdb. I don’t expect this to get popular or used in production environments, and naming things is hard 🥺.

            https://github.com/avinassh/cdb#name

            1. 6

              I will never understand the obsession with three-character names for things that aren’t user logins or executable files. “Cask DB” is a wonderful name for the project.

              1. 5

                hey, thank you for the feedback. I wasn’t expecting this to be noticed by many, so I just kept it as cdb. It also refers to an inside joke between me and my partner.

                I am considering renaming it.

                1. 2

                  I’d encourage you to rename it. cdb is a well-recognized name for djb’s project, and since yours is fairly fresh, it’d be good not to create an ambiguity, especially if you are aware of it. If nothing else, it will prevent any discussion of your work from veering immediately to this subject at the expense of the rest. Just like it did here.

                  1. 5

                    If nothing else, it will prevent any discussion of your work from veering immediately to this subject at the expense of the rest. Just like it did here.

                    This is a great point. I have renamed it to py-caskdb and I have also credited your message.

        1. 4

          This is neat! But it takes as input a UTF-16 character (wchar_t), when in my experience nearly all Unicode text is encoded as UTF-8*. So that would require an additional step to decode 1-4 UTF-8 bytes to a Unicode codepoint. (Not sure what one does with codepoints above FFFF, where stuff like emoji lives — maybe there are no alphabets up there?)

          It’d be cool to have an algorithm that directly read and wrote UTF-8. I imagine there’s one out there already somewhere.

          * Obj-C’s NSString uses UTF-16 in its API, but I know it internally stores the string in several encodings including ASCII to save space; not sure if it uses UTF-8. (Swift went to an internal UTF-8 representation.) I hear Windows APIs use UTF-16? Mostly UTF-16 seems like a relic from the period before the Unicode architects realized that 65536 characters wasn’t enough for everything.
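          The “additional step” is small, for what it’s worth. A minimal UTF-8-to-codepoint decoder sketch (no validation of overlong forms, surrogates, or truncated input, all of which real code would need):

          ```c
          #include <assert.h>
          #include <stddef.h>
          #include <stdint.h>

          /* Decode one UTF-8 sequence (1-4 bytes) into a Unicode codepoint.
           * Writes the number of bytes consumed to *len. Assumes valid input. */
          static uint32_t utf8_decode(const unsigned char *s, size_t *len)
          {
              if (s[0] < 0x80) {                   /* 1 byte: ASCII */
                  *len = 1;
                  return s[0];
              }
              if ((s[0] & 0xE0) == 0xC0) {         /* 2 bytes: U+0080..U+07FF */
                  *len = 2;
                  return ((uint32_t)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
              }
              if ((s[0] & 0xF0) == 0xE0) {         /* 3 bytes: U+0800..U+FFFF */
                  *len = 3;
                  return ((uint32_t)(s[0] & 0x0F) << 12)
                       | ((uint32_t)(s[1] & 0x3F) << 6)
                       |  (s[2] & 0x3F);
              }
              *len = 4;                            /* 4 bytes: U+10000..U+10FFFF */
              return ((uint32_t)(s[0] & 0x07) << 18)
                   | ((uint32_t)(s[1] & 0x3F) << 12)
                   | ((uint32_t)(s[2] & 0x3F) << 6)
                   |  (s[3] & 0x3F);
          }

          int main(void)
          {
              size_t n;
              /* "é" is C3 A9 -> U+00E9; "😀" is F0 9F 98 80 -> U+1F600 (above FFFF) */
              assert(utf8_decode((const unsigned char *)"\xC3\xA9", &n) == 0x00E9 && n == 2);
              assert(utf8_decode((const unsigned char *)"\xF0\x9F\x98\x80", &n) == 0x1F600 && n == 4);
              return 0;
          }
          ```

          Codepoints above FFFF come out of the 4-byte branch, so feeding them into a 16-bit wchar_t table would indeed need extra handling (or surrogate pairs, as on Windows).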

          1. 1

            Not sure what one does with codepoints above FFFF, where stuff like emoji lives — maybe there are no alphabets up there?

            There are, so this is not a “Unicode-complete” solution, but probably good enough for many use-cases.

            1. 1

              Additionally, there are a bunch of late addition and corrections for the CJK block. 叱 53F1 vs 𠮟 20B9F being a notable example.

            2. 1

              UTF-16 is a good choice given where the code comes from: Wine, which needs to deal with the Windows API, which is mostly UTF-16.

              1. 1

                This was for a Windows project. The platform API is entirely in UTF-16. A UTF-8 version would probably require multi-level lookup tables, but these should be compressible along the same lines.

              1. 20

                Warning: this is not supposed to be taken very seriously. It’s not a joke, but I won’t bet 2 cents that I’m right about any of it.

                Pretty much all widely used languages today have a thing. Having a thing is not, by far, the only determinant factor in whether a language succeeds, and you can even question whether wide adoption is such a good measure of success. But the fact is, pretty much all languages we know and use professionally have a thing, or indeed, multiple things:

                • Python has simplicity, and later, Django, and later even, data science
                • Ruby has Rails and developer happiness (whatever that means)
                • Go had simplicity (same name, but a different thing than Python’s) and concurrency (and Google, but I don’t think that it counts as a thing)
                • PHP had web, and, arguably, Apache and cheap hosts
                • JavaScript has the browser
                • Typescript has the browser, but with types
                • Java had the JVM (write once, run everywhere), and then enterprise
                • C# had Java, but Microsoft, and then Java, but better
                • Rust has memory safety even in the presence of threads

                Even older languages like SQL, Fortran, Cobol, they all had a thing. I can’t see what Hare’s thing might be. And to be fair, it’s not a problem exclusive to, or especially represented by, Hare. 9/10 times, when there’s a post anywhere about a new language, it has no thing. None. It’s not even that it is not particularly well suited for its thing; it can’t even articulate what its thing is.

                “Well, Hare’s thing is systems programming.” That’s like saying that McDonald’s thing is hamburgers. A thing is more than a niche. It’s … well, it’s a thing.

                It might well be the case that you can only see a thing in retrospect (I feel like that might be the case with Python, for instance), but still, it feels like it’s missing, and not only here.

                1. 3

                  It might well be the case that you can only see a thing in retrospect

                  Considering how many false starts Java had, there was an obvious and error-ridden search process to locate the thing—first delivering portability, mainly for the benefit of Sun installations nobody actually had, then delivering web applets, which ran intolerably poorly on the machines people needed them to run on, and then as a mobile device framework that was, again, a very poor match for the limited hardware of the era, before finding a niche in enterprise web platforms. Ironically, I think Sun did an excellent job of identifying platforms in need of a thing, seemingly without realizing that their thing was a piss-poor contender for being the thing in that niche. If it weren’t for Sun relentlessly searching for something for Java to do, I don’t think it would have gotten anywhere simply on its merits.

                  feels like it’s missing

                  I agree, but I also think it’s a day old, and Ruby was around for years before Rails. Although I would say that Ruby’s creator did so out of a desire for certain affordances that were kind of unavailable from other systems of the time—a Smalltalk placed solidly in the Perl-Unix universe rather than isolated in a Smalltalk image. What we seem to have here is a very small itch (Zig with a simpler compiler?) being scratched very intensely.

                  1. 2

                    Ruby and Python were in the back of my mind the whole time I was writing the thing about things (hehe), and you have a point about Java, that thing flailed around A LOT before settling down. Very small itch is a good summary.

                    Time will tell, but I ain’t betting on it.

                    1. 1

                      I’m with you. But we’ll see, I guess.

                  2. 3

                    Pretty much all widely used languages today have a thing. […] Even older languages like SQL, Fortran, Cobol, they all had a thing

                    An obvious language you do not mention is C. What’s C’s thing in that framework? And why couldn’t Hare’s thing be “C, but better”, like C# is to Java? (Or arguably C++ is to C, or Zig is to C)

                    1. 12

                      C’s thing was Unix.

                      1. 4

                        Incorrect… C’s thing was being a portable, less terrible, macroassembler-ish tool.

                      2. 3

                        Well, I did say a thing is not the only determinant for widespread adoption. I don’t think C had a thing when it became widely used. Maybe portability? It was the wild wild west days, though.

                        Hare could very well eat C’s lunch and become big. But being possible is far from being likely.

                        1. 2

                          C’s thing is that it’s a human-friendly assembly.

                          strcpy is rep movsb, va_list is a stack parser, etc.

                          1. 5

                            But it’s not. At least not once you turn on optimizations. This is a belief people have that makes C seem friendlier and lower level, but there have been any number of articles posted here about the complex transformations between C and assembly.

                            (Heck, even assembly isn’t really describing what the CPU actually does, not when there’s pipelining and multiprocessing going on.)

                            1. 2

                              But it is. Sure, you can tell the compiler to optimize, in which case all bets are obviously off, but it doesn’t negate the fact that C is the only mainstream high-level language that gets you as close to the machine language as it gets.

                              That’s not a belief, it’s a fact.

                              1. 4

                                you can tell the compiler to optimize, in which case all bets are obviously off

                                …and since all deployed code is optimized, I’m not sure what your point is.

                                Any modern C compiler is basically humoring you, taking your code as a rough guideline of what to do, but reordering and inlining and unrolling and constant-folding, etc.

                                And then the CPU chip gets involved, and even the machine instructions aren’t the truth of what’s really going on in the hardware. Especially in x86, where the instruction set is just an interpreted language that gets heavily transformed into micro-ops you never see.

                                If you really want to feel like your C code tells the machine exactly what to do, consider getting one of those cute retro boards based on a Z-80 or 8086 and run some 1980s C compiler on it.

                                1. -1

                                  No need to lecture and patronize if you don’t get the point.

                                  C was built around machine code, with literally every language construct derived from a subset of the latter and nothing else. It still remains true to that spirit. If you see a piece of C code, you can still make a reasonable guess as to what it roughly translates to. Even if it’s unrolled, inlined or even trimmed. In comparison with other languages, where “a += b” or “x = y” may translate into pages of binary.

                                  Do you understand the point?

                                  1. 2

                                    C Is Not a Low-level Language

                                    The post you’re replying to isn’t patronizing you, it’s telling the truth.

                                    1. 2

                                      You are missing the point just the same.

                                      It’s not that C generates the exact assembly you’d expect, it’s that there’s a cap on what it can generate from a given piece of code you are currently looking at. “x = y” is a memcpy at worst, and dereferencing a pointer does more or less just that. Not the case with C++, let alone Go, D, etc.

                                      1. 1

                                        I suggest reading an intro-to-compilers textbook. Compilers do basic optimizations like liveness analysis, dead-store elimination, etc. Just because you write down “x = y” doesn’t mean the compiler will respect it and keep the load/store in your binary.

                                        1. -1

                                          I suggest trying to make a rudimentary effort to understand what others are saying before dispensing advice that implies they are dolts.

                                    2. 2

                                      If you see a piece of C code, you can still make a reasonable guess to what it roughly translates to.

                                      As someone who works on a C compiler for their day job and deals with customer support around this sort of thing, I can assure you this is not true.

                                      1. 2

                                        See my reply to epilys.

                                        Can you share an example of resulting code not being even roughly what one was expecting?

                                        1. 4

                                          Some general observations. I don’t have specific examples handy and I’m not going to spend the time to conjure them up for what is already a thread that is too deep.

                                          • At -O0 there are many loads and stores generated that are not expected. This is because the compiler is playing it safe and accessing everything from the stack. Customers generally don’t expect that and some complain that the code is “stupid”.
                                          • At -O1 and above, lots of code gets moved around, especially when inlining and hoisting code out of loops. Non-obvious loop invariants and loops that have no effect on memory (because the user forgot a volatile) regularly result in bug reports saying the compiler is broken. In nearly every case, the user expects all the code they wrote to be there in the order they wrote it, with all the function calls in the same order. This is rarely the case.
                                          • Interrupt code will be removed sometimes because it is not called anywhere. The user often forgets to tag a function as an interrupt and just assumes everything they write will be in the binary.
                                          • Our customers program microcontrollers. They sometimes need timing requirements on functions, but make the assumption that the compiler will generate the code they expect to get the exact timing requirements they need. This is a bad assumption. They may think that a series of reads and writes from memory will result in a nearly 1-1 correspondence of load/stores, but the compiler may realize that because things are aligned properly, it can be done in a single load-multiple/store-multiple instruction pair.
                                          • People often expect one statement to map to a contiguous region of instructions. When optimization is turned on, that’s not true in many cases. The start and end of something as “simple” as x = y can be very far apart.

                                          This is just from recent memory. There is really no end to it. I won’t get into the “this instruction sequence takes too many cycles” reports, as those don’t seem to match your desired criteria.
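                                          The forgotten-volatile case is the one that bites most often. A tiny sketch (the register address and the “ready” bit are made up):

                                          ```c
                                          #include <stdint.h>
                                          #include <stdio.h>

                                          /* Hypothetical memory-mapped status register; the address is invented. */
                                          #define STATUS_REG (*(volatile uint32_t *)0x40000000u)

                                          /* Without volatile, an optimizing compiler may load *reg once, hoist the
                                           * load out of the loop, and spin forever if the bit was clear on entry. */
                                          static void wait_ready_wrong(uint32_t *reg)
                                          {
                                              while ((*reg & 1u) == 0) { }
                                          }

                                          /* With volatile, every iteration performs a fresh load from the register. */
                                          static void wait_ready(void)
                                          {
                                              while ((STATUS_REG & 1u) == 0) { }
                                          }

                                          int main(void)
                                          {
                                              (void)wait_ready;       /* referenced so the sketch compiles warning-free */
                                              uint32_t fake_reg = 1u; /* already "ready", so the call returns */
                                              wait_ready_wrong(&fake_reg);
                                              puts("done");
                                              return 0;
                                          }
                                          ```

                                          Compiled at -O1 or above, the first variant is exactly the kind of “the compiler deleted my loop body” report described above; the source looks identical to the correct version except for one qualifier.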

                                          1. 1

                                            Thanks. These are still more or less in the ballpark of what’s expected with optimizations on.

                                            I’ve run into at least a couple of these, but I can remember only one case when it was critical and required switching optimizations off to get what we needed from the compiler (it had to do with handling very small packets in the NIC driver). Did kill a day on that though.

                        1. 1

                          Not sure who the target audience of this is.

                          1. 5

                            At some point I wanted to get closer to Machine Learning/Computer Vision, and I found out that I had to compute eigenvalues and eigenvectors for PCA, and then I wanted to find out how they are computed, and then I found out I might need to compute the QR decomposition of a matrix, and soon I had fallen into a linear algebra rabbit hole.

                            So I documented my findings and wrote this article. Somehow just calling methods from a Python library was not satisfying for me; I wanted to understand the underlying machinery.

                            The target audience would be people who are just starting with this, or undergrad students who are taking numerical analysis / linear algebra classes and have to implement LU/QR decomposition or algorithms for Row Echelon.

                          1. 1

                            Trying to understand why NtQueryDirectoryFile fails with 0xC000000D for no apparent reason on one specific machine that I don’t have access to. That is, nothing unusual.

                            1. 1

                              … aaaand it was a bug in the Google Drive client.