1. 13

    I originally also suppressed this output on non-terminal devices, but then prog | less will still hang without a message, which is not great. I would encourage suppressing this output with a -q or -quiet flag.

    STDIN might be a terminal while STDOUT or STDERR are not – you have different FDs, it is not a single STDIO device.

    For example in C, you can test particular FDs this way:

    #include <stdio.h>
    #include <unistd.h>

    void check(int fd) {
    	if (isatty(fd))  printf("FD %d is a terminal\n", fd);
    	else             printf("FD %d is a file or a pipe\n", fd);
    }

    int main(void) {
    	check(0);
    	check(1);
    	check(2);
    	return 0;
    }


    $ make is-a-tty
    cc     is-a-tty.c   -o is-a-tty
    $ ./is-a-tty 
    FD 0 is a terminal
    FD 1 is a terminal
    FD 2 is a terminal
    $ ./is-a-tty | cat
    FD 0 is a terminal
    FD 1 is a file or a pipe
    FD 2 is a terminal
    $ echo | ./is-a-tty
    FD 0 is a file or a pipe
    FD 1 is a terminal
    FD 2 is a terminal
    $ echo | ./is-a-tty | cat
    FD 0 is a file or a pipe
    FD 1 is a file or a pipe
    FD 2 is a terminal
    $ ./is-a-tty 2>/dev/null 
    FD 0 is a terminal
    FD 1 is a terminal
    FD 2 is a file or a pipe

    I would not recommend messing the STDOUT/STDERR with superfluous messages if there is no error.

    Indeed, there is Rule of Silence:

    When a program has nothing surprising to say, it should say nothing.

    Waiting for input from a file or pipe is expected, non-surprising behavior. Only when waiting for input from the terminal could it make sense to print a prompt or otherwise guide the user.
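
    A sketch of that guideline in C: prompt on stderr only when stdin is a terminal, and stay silent for files and pipes (a minimal example of my own, not from the article):

    ```c
    #include <stdio.h>
    #include <unistd.h>

    /* Print a prompt only when input will come from a terminal; files
       and pipes get no superfluous output, per the Rule of Silence. */
    static void maybe_prompt(int fd) {
        if (isatty(fd))
            fprintf(stderr, "Reading from the terminal; Ctrl-D ends input.\n");
    }

    int main(void) {
        maybe_prompt(STDIN_FILENO);
        int c;
        while ((c = getchar()) != EOF)  /* copy stdin to stdout */
            putchar(c);
        return 0;
    }
    ```

    Run interactively, it hints on stderr; run in a pipeline, it says nothing.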

    1. 2

      I forgot you can run isatty() on stdin, too. Previously it did check this for stdout, but I removed that earlier (isTerm is the result of isatty(stdout)).

      I’ll update the program and article; thanks.

      1. 3

        isatty on stdin is good to test if your users made a mistake, and then isatty() again on stderr to make sure your users are reading your message!

      2. 1

        Strictly speaking, this is POSIX not C. isatty has been broken in the past on Windows with some distributions of GCC, I am unsure what the status is these days.

      1. 2

        I would probably go with the conversion matrices specified in the sRGB standard (IEC 61966-2-1:1999, also listed on the Wikipedia page for sRGB), to hopefully be consistent with other implementations.

        Edit: there is a PDF on w3.org that also lists the values from the standard, and which adds some information about normalization. I don’t know if this accounts for the difference between the values in the standard and those derived in the article here.

        1. 2

          I went through this a couple of months ago while designing the CIE polar color conversion in piet. What I found was that many sources had slightly different values for these matrices (in particular I didn’t find Wikipedia reliable), so I ended up going with the ones in the w3.org spec you linked. One way to validate these is to check that white ends up as white, to 6 decimal places, say.

          It’s also the case that an error in the 4th decimal place won’t be particularly visible, but it is evidence of taking care to get things right.

          ETA: here’s a link to the colab notebook I used to calculate this, including the verification of white point.
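
          The white-point check is easy to automate: with RGB = (1,1,1), each row of the linear-sRGB→XYZ matrix must sum to the D65 white point. (The matrix values below are the commonly listed w3.org/IEC-derived ones, included here as an assumption, not taken from the linked notebook.)

          ```c
          #include <math.h>
          #include <stdio.h>

          /* Linear sRGB -> XYZ matrix as listed in the w3.org sRGB
             reference; the exact values are an assumption here. */
          static const double M[3][3] = {
              {0.4124564, 0.3575761, 0.1804375},
              {0.2126729, 0.7151522, 0.0721750},
              {0.0193339, 0.1191920, 0.9503041},
          };

          /* White is RGB (1,1,1), so each row sum must equal the D65
             white point (0.95047, 1.00000, 1.08883) within rounding. */
          static int white_maps_to_d65(void) {
              const double d65[3] = {0.95047, 1.00000, 1.08883};
              for (int i = 0; i < 3; i++)
                  if (fabs(M[i][0] + M[i][1] + M[i][2] - d65[i]) > 1e-5)
                      return 0;
              return 1;
          }

          int main(void) {
              printf("white maps to D65: %s\n",
                     white_maps_to_d65() ? "yes" : "no");
              return 0;
          }
          ```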

        1. 12

          Does your makefile support Windows at all?

          This seems like a non-issue – who builds on Windows outside of an environment like cygwin? EDIT: who builds non-windows-first applications on windows using windows-specific build systems, rather than unix emulation layers? Supporting users of visual studio is a project in and of itself, & while there are lots of Windows users around, there are very few who have a compiler installed or are liable to want to build anything from source. It makes more sense to do builds for windows via a cross-compiler & ask folks who want to build on windows to use cygwin – both of which are substantially less effort than getting VS to build an already-working project.

          1. 15

            I believe that’s your experience, but you and I have radically different experiences as Windows users.

            First, to be very blunt: Cygwin is truly awful. It needs to die. Cygwin is not a port of *nix tools to Windows; that’s MSYS. Cygwin is a really weird hacky port of a *nix to Windows. It’s basically WSL 0.0. It does not play well with native Windows tooling. It honestly doesn’t play well with Windows in general. And it’s comically slow, even compared to vaguely similar solutions such as WSL1. If I see a project that claims Windows support, and see Cygwin involved, I don’t even bother. And while I don’t know if a majority of devs feel similarly, a substantial enough group of Windows devs agree that I know my opinion’s not rare, either.

            You’re right that Visual Studio is the go-to IDE on Windows, in the same way that Xcode is on Mac. But just as Mac users don’t necessarily bust out Xcode for everything, Windows devs don’t necessarily bust out Visual Studio. Using nmake from the command line is old as dirt and still common (it’s how we build Factor on Windows, for instance), and I’ve seen mingw-based Windows projects that happily use cross-platform gnumake Makefiles. CMake is also common, and has the benefit that you can generate Visual Studio projects/solutions when you want, and drive everything easily from the command line when you want. These and similar tools designed to be used without Visual Studio are common enough that Microsoft continues to release the command-line-only Windows SDK for the most recent Windows 10, and they do that because plenty of devs really do only want that, not all of Visual Studio.

            For reasons you point out elsewhere, there’s a lot that goes into supporting Windows beyond the Makefile, to the point that concerns about cross-platform make may be moot, but “Windows devs will use Cygwin” seems reductionist.

            1. 5

              I don’t think windows devs use cygwin. I think that non-windows devs use cygwin (or mingw or one of the ten other unix-toolchain-for-windows environments) so that they don’t need to go through the hoops to become windows devs.

              In other words, I’m not really sure who the audience is for OP’s argument re: building on windows.

              If you’re building on windows & you are a windows dev, why care about make at all? If you’re building on windows & you are not a windows dev, why care about the first group at all? In my (dated & limited) experience these two ecosystems hardly interact & the tooling to make such interaction easier is mostly done by unix-first developers who want to check windows builds off the list with a minimum of effort.

              1. 3

                I think you need to take into consideration that there are also libraries. Sure, if you have an application developed on non-Windows, the easiest way to port it to Windows is building it in MSYS, with MinGW, or possibly Clang. But if you develop a library that you want Windows developers to be able to use in their projects, you have to support building it with their tools, which often means MSVC.

            2. 8

              who builds on Windows outside of an environment like cygwin?

              I don’t understand this question. There are lots of software applications for Windows, each one has to be built, and Cygwin is rarely involved. And CMake exists precisely to support Visual Studio and gcc/clang at the same time; that is one of the purposes of the tool.

              1. 2

                In software applications that are only for windows, supporting unix make isn’t generally even on the table. Why would you, when a lot more than the build system would need to change to make a C or C++ program aimed at windows run on anything else?

                It only really makes sense to consider make for code on unix-like systems. It’s very easy to cross-compile code intended for unix-like systems to windows without actually buying a copy of windows, and it’s very easy for windows users to compile these things on windows using mechanisms to simulate a unix-like system, such as cygwin.

                There are a lot of alternative build systems around, including things like cmake and autotools that ultimately produce makefiles on unix systems. If your project actually needs these tools, there are probably design issues that need to be resolved (like overuse of unreliable third party dependencies). These build systems do a lot of very complicated things that developers ought not to depend upon build systems for, like generating files that play nice with visual studio.

                1. 2

                  In software applications that are only for windows, supporting unix make isn’t generally even on the table.

                  Every team I’ve been on which used C++ has used CMake or FASTBuild, so supporting Unix builds at some point isn’t off the table, and it makes builds a lot easier to duplicate and simplifies CI/CD. Every project I’ve seen with build configuration in a checked-in Visual Studio solution makes troubleshooting build issues a complete nightmare since diffs in the configs can be hard to read. CMake’s not great, but it’s one of the more commonly supported tools.

                  If your project actually needs these tools, there are probably design issues that need to be resolved (like overuse of unreliable third party dependencies).

                  I’m not sure how this logically follows.

                  These build systems do a lot of very complicated things that developers ought not to depend upon build systems for, like generating files that play nice with visual studio.

                  Using CMake (or something else which generates solution files for Visual Studio) gives developers options for how they want to work. If they want to develop on Linux with vim (or emacs), that’s fine. If they want to use CLion (Windows, Mac or Linux), that’s also fine. There really isn’t that much extra to do to support Visual Studio solution generation. Visual Studio has a fine debugger and despite many rough edges is a pretty decent tool.

              2. 2

                This seems like a non-issue – who builds on Windows outside of an environment like cygwin?

                Most Windows developers and cross-platform frameworks, as far as I can tell.

                1. 4

                  I should rephrase:

                  Who builds cross-platform applications not originally developed on windows outside of an environment like cygwin?

                  Windows developers don’t, as a rule, care about the portability concerns that windows-first development creates, & happily use tools that make windows development easier even when it makes portability harder. And cross-platform frameworks tend to do at least some work targeting these developers.

                  But, although no doubt one could, I don’t think (say) GIMP and Audacity builds are done through VS. For something intended to be built with autotools+make, it’s a lot easier to cross-compile with winecc or build on windows with cygwin than to bring up an IDE with its own distinct build system – you can even integrate it with your normal build automation.

                  1. 2

                    I work on software that is compiled on Windows, Mac, and Linux, and is generally developed by people on Windows. We do not use Cygwin, which, as gecko points out above, is truly awful. If I need to use Linux, I use WSL or a VirtualBox VM. And yes, I and my team absolutely care about portability, despite the fact that we develop primarily on Windows.

              1. 2

                Unrelated to the content, but that first example for getcpu() looks wrong. It is passing uninitialized pointers to getcpu() and then casting those (still uninitialized) pointers to int pointers (which are not int) in the calls to printf().
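
                For contrast, a corrected sketch (assuming the glibc ≥ 2.29 getcpu() wrapper declared in <sched.h>): pass the addresses of real variables, and print them with the matching format specifier:

                ```c
                #define _GNU_SOURCE
                #include <sched.h>
                #include <stdio.h>

                int main(void) {
                    unsigned int cpu = 0, node = 0;  /* real storage, not dangling pointers */
                    if (getcpu(&cpu, &node) == 0)
                        printf("cpu=%u node=%u\n", cpu, node);  /* %u: unsigned int */
                    return 0;
                }
                ```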

                1. 2

                  The updates in the article contain most of what I would comment, except perhaps that the fixed-width integer types are optional (present only if the implementation provides them).

                  1. 4

                    Like many things in C, it is fairly easy to write an implementation of a dynamic array, but not necessarily trivial to handle the corner cases.

                    For instance, from a quick glance it appears none of the three examples linked here check if doubling the capacity will overflow. Two of them use an int to store the capacity, and overflowing that would be UB, and in all cases it could result in the allocated memory suddenly shrinking, introducing out of bounds memory accesses.
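
                    One way to handle that corner case is to check before doubling; here is a hypothetical grow() helper of my own, not taken from any of the linked examples:

                    ```c
                    #include <stdint.h>
                    #include <stdlib.h>

                    /* Double the capacity of a buffer of elem_size-byte elements,
                       refusing to grow if the new byte count would overflow size_t. */
                    static void *grow(void *data, size_t *cap, size_t elem_size) {
                        if (*cap > SIZE_MAX / 2 / elem_size)
                            return NULL;                     /* doubling would overflow */
                        size_t newcap = *cap ? *cap * 2 : 8; /* start small, then double */
                        void *p = realloc(data, newcap * elem_size);
                        if (p != NULL)
                            *cap = newcap;
                        return p;
                    }
                    ```

                    The caller treats NULL as an allocation failure, so the overflow path needs no separate error handling.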

                    1. 1

                      Now, I changed that from “int” to “unsigned”.

                      1. 1

                        Honestly I posted mine because I am stuck on some memory bugs in my implementation right now and I was hoping someone would point something out. The issue I’m currently having has to do with invalid checksum for freed object:

                        slowjs(1679,0x106c0e5c0) malloc: Incorrect checksum for freed object 0x7faf7a403168: probably modified after being freed.

                        While the error occurs in the vector library, I get the feeling it is a bug in the code using the vector, not in the vector itself.

                        That said, I totally forgot about using unsigned ints and adding a maximum-size check. Will fix.

                      1. 8

                        For projects that are exactly C (not C++), I found check quite nice. It’s in C and has no C++ dependencies.

                        Here’s an example: https://github.com/vyos/ipaddrcheck/blob/current/tests/check_ipaddrcheck.c

                        1. 7

                          Another one that’s focused on C that I’ve enjoyed for several years now is greatest.


                          I tend to use it over others because it exists in a single header file, so it’s easy to add to existing projects without fighting my build system for too long.

                          1. 5

                            Love greatest, it runs everywhere so you have no problems getting it to build on your CI.

                            Also used µnit sometimes, the reproducible random number generator can be great when a test fails.

                            1. 2

                              That looks great, with a similar API to gtest. Thanks for the tip!

                          1. 2

                            Dynamic programming is such an interesting concept, it took a good while (and lots of examples coded) to get my head around it.

                            The interesting thing is that I have not used it outside of preparation for interviews. I am not sure if it is still the case, but Google and Facebook would often ask problems that needed dynamic programming approaches to solve (e.g. the classic backpack problem or one of its variations).

                            In the same way, I have noticed that the only engineers who can knock out dynamic programming solutions on demand are either the ones who did competitive programming back in the day (where this is a basic technique) or the ones who have gone through thorough interview preparation.

                            Anyone used dynamic programming in the real world or know of libraries that have this kind of algos implemented?
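
                            For reference, the backpack (0/1 knapsack) problem mentioned above has a compact DP solution; a minimal sketch:

                            ```c
                            #include <stdio.h>

                            /* 0/1 knapsack: best[c] holds the maximum value achievable
                               with capacity c. Iterating capacity downward ensures each
                               item is counted at most once. */
                            static int knapsack(const int *w, const int *v, int n, int cap) {
                                int best[128] = {0};  /* this sketch assumes cap < 128 */
                                for (int i = 0; i < n; i++)
                                    for (int c = cap; c >= w[i]; c--)
                                        if (best[c - w[i]] + v[i] > best[c])
                                            best[c] = best[c - w[i]] + v[i];
                                return best[cap];
                            }

                            int main(void) {
                                int w[] = {2, 3, 4, 5}, v[] = {3, 4, 5, 6};
                                printf("%d\n", knapsack(w, v, 4, 5)); /* items 0+1 fit: 7 */
                                return 0;
                            }
                            ```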

                            1. 7

                              I’ve used it in the real world. Had a problem a decade or so ago that required computing the difference between tree-structured data, and a dynamic programming method similar to substring edit-distance was ideal. No libraries implemented it for me, and it was a pretty straightforward implementation via a dynamic programming approach. The use case was in the context of static program analysis where the trees I was working with were ASTs. I wrote part of it up a couple years ago for fun in F# (http://syntacticsalt.com/blog/2015-12-24-comparing-trees-functionally.html).
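
                              The substring edit-distance DP referred to here is the classic Levenshtein recurrence; a minimal sketch:

                              ```c
                              #include <stdio.h>
                              #include <string.h>

                              #define N 64  /* this sketch assumes inputs shorter than N */

                              /* Levenshtein edit distance: d[i][j] is the cost of turning
                                 the first i chars of a into the first j chars of b. */
                              static int edit_distance(const char *a, const char *b) {
                                  size_t la = strlen(a), lb = strlen(b);
                                  static int d[N][N];
                                  if (la >= N || lb >= N) return -1;
                                  for (size_t i = 0; i <= la; i++) d[i][0] = (int)i;
                                  for (size_t j = 0; j <= lb; j++) d[0][j] = (int)j;
                                  for (size_t i = 1; i <= la; i++)
                                      for (size_t j = 1; j <= lb; j++) {
                                          int sub = d[i-1][j-1] + (a[i-1] != b[j-1]);
                                          int del = d[i-1][j] + 1, ins = d[i][j-1] + 1;
                                          int m = sub < del ? sub : del;
                                          d[i][j] = m < ins ? m : ins;
                                      }
                                  return d[la][lb];
                              }

                              int main(void) {
                                  printf("%d\n", edit_distance("kitten", "sitting")); /* 3 */
                                  return 0;
                              }
                              ```

                              The tree version replaces characters with subtrees, but the "combine solutions of smaller prefixes" structure is the same.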

                              1. 1

                                That’s a cool use case and the post was a good read. Thanks for sharing!

                              2. 5

                                Anyone used dynamic programming in the real world or know of libraries that have this kind of algos implemented?

                                Dynamic programming algorithms are at the core of many, many bioinformatics and computational biology solutions (DNA sequencing, the human genome, and etc). One of the earliest was Needleman-Wunsch (circa 1970). Local alignments, global alignments, different cost structures all took off from the same basic trick.

                                1. 4

                                  I’ve gotten to use dynamic programming at my $DAYJOB, since it was the only reasonable way of solving a problem.

                                  In school, in competitive programming, and in interview situations, the problems are typically quite well-defined, and the data structure of choice is typically an n-dimensional matrix of integers. For an actual messy real-world problem, issues such as data structure choice, complicated solution types, choice of forward/backward/recursive organization of computation, memory limiting for extreme cases, and making operations parallel crop up. The core took half a day, but making it “industrial strength” took many months with many people involved.

                                  1. 3

                                    I think it’s likely that all the common uses of such techniques come as part of standard libraries in most languages, so when you need to use them, you have prepackaged versions. C/C++ code is the most likely place where you end up rolling your own, but since the STL lets you run these algorithms on your own objects, even there it is rarely needed.

                                    1. 2

                                      One recent use I remember was finding a (bit-) optimal parse for a LZ style compression algorithm. Basically it finds the shortest path from each position to the end (source for anyone interested).

                                      1. 2

                                        In university, I had a graph computation problem that had a recursive solution, but you could derive another graph from that that was easier to solve, also recursive. The paper only had pseudo-code and the proof of the time complexity just assumed you could use another algorithm in the middle that had certain complexity, calculated using the recursive method.

                                        In practice, though, using the recursive way of doing things absolutely sucked and switching it to the dynamic approach changed it from calculating results for subgraphs to, instead, combining results from already calculated subgraphs.

                                        The funny thing was how much more straightforward it seemed for the problem at hand than it had for all the motivating examples before.

                                        I wish I remembered anything about the problem but it’s all gone with time (this was years ago). It wasn’t particularly complex but it was very satisfying to ‘flip’ it to bottom up and see the effect.

                                      1. 5

                                        Seems like a reasonably fair comparison without any too contrived examples.

                                         Globbing is usually avoided in CMake because it is a build system generator (the problem is that the generated build files have no way to know when a source file was added or removed).

                                        In the dependency support section it might be better to use imported targets from “modern CMake” (and if the package is required, generation will fail if it is not found). Something like:

                                        add_executable(test main.c)
                                        find_package(OpenSSL REQUIRED)
                                        target_link_libraries(test PRIVATE OpenSSL::SSL)
                                        find_package(ZLIB REQUIRED)
                                        target_link_libraries(test PRIVATE ZLIB::ZLIB)

                                        Incidentally, I believe if you use the cmake_paths generator with Conan the above should work there as well, which has the benefit (depending on your use case) of making the CMake script independent of the package manager.

                                        1. 11

                                          While the examples convey the idea, there are a number of issues with using them for teaching C. For instance:

                                           • The macros are missing parentheses around the macro parameters. SetBit(A, i + 1) will not expand to what you want
                                           • There is no range checking
                                           • It assumes that int is 32-bit. The standard only specifies a minimum range for int, which is 16 bits. You likely want to use one of the fixed-width integer types from C99 like uint32_t.
                                           • Signed integers do not necessarily allow you to use all possible bit patterns. Signed int might not be two’s complement, and there could be values that are trap representations (I know this is entering the realm of the unlikely, but if you want portable C you have to consider this).
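
                                           To illustrate the first and third points, a version of the macros with the parameters parenthesized and a fixed-width type (ClearBit/TestBit are assumed companions to the article’s SetBit):

                                           ```c
                                           #include <stdint.h>

                                           /* Fully parenthesized so SetBit(A, i + 1) expands correctly;
                                              uint32_t avoids assuming that int is 32 bits wide. */
                                           #define SetBit(A, i)   ((A)[(i) / 32] |=  (UINT32_C(1) << ((i) % 32)))
                                           #define ClearBit(A, i) ((A)[(i) / 32] &= ~(UINT32_C(1) << ((i) % 32)))
                                           #define TestBit(A, i)  (((A)[(i) / 32] >> ((i) % 32)) & UINT32_C(1))
                                           ```
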
                                          1. 7

                                            You could also use CHAR_BIT * sizeof(unsigned int) to get the number of bits in an unsigned integer (which is better suited for this type of job). Both are constants, so there should be no runtime overhead for calculating that result. I would also be inclined to make the functions static inline (a C99 feature) and define them in a header. That avoids any issues with macros while still producing decent code. Something like:

                                            #include <limits.h>
                                            #include <stdlib.h>
                                             static inline void setbit(unsigned int *a, size_t size, size_t idx)
                                             {
                                               size_t       off = idx / (CHAR_BIT * sizeof(unsigned int));
                                               unsigned int bit = idx % (CHAR_BIT * sizeof(unsigned int));
                                               unsigned int val = 1u << bit;
                                               if (off >= size) abort();
                                               a[off] |= val;
                                             }
                                            1. 3

                                              Out of a really morbid curiosity, are there any modern common platforms where CHAR_BIT isn’t just 8?

                                              1. 5

                                                It’s very hard to say, as I’ve found very scant information about non-byte-oriented, non-2’s-complement C compilers. The Unisys 1100/2200 use 9 bit characters and they are still in use (for various values of “use”). I’ve also heard that C compilers for DSPs tend to have to define char as larger than 8-bits, but I’ve never worked with a DSP so I can’t say for sure.

                                                1. 2

                                                   Any modern platform where it isn’t 8 is not POSIX-compliant.

                                                  1. 2

                                                    The standard (indirectly) requires CHAR_BIT to be at least 8. I’ve never worked on a platform where it wasn’t 8, but have heard about embedded devices where it was 16 or 32 – a quick web search found mention of such as well [1].

                                                    I think one important thing the UB debacles over the recent years has shown is that if you rely on behaviour not guaranteed by the standard, your code may work right now with the current compiler on the current hardware, but may break randomly in the future.

                                                    [1] https://stackoverflow.com/questions/32091992/is-char-bit-ever-8/38360262
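
                                                     When code does rely on 8-bit bytes, C11 lets you turn that assumption into a compile-time failure rather than silent misbehavior:

                                                     ```c
                                                     #include <limits.h>

                                                     /* Fail the build on exotic platforms instead of
                                                        misbehaving at run time. */
                                                     _Static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");

                                                     int main(void) { return 0; }
                                                     ```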

                                                2. 1

                                                  Signed int might not be two’s complement,

                                                  Are there modern common architectures where this is not the case?

                                                  1. 3

                                                    I’m not aware of any non-2’s-complement machines made since the 70s, but as I’ve learned over the years, there are always exceptions. So one may be lurking out there [1]. I’m also not aware of any 2’s complement machines that have a trap representation, but I do know of several 2’s complement byte-addressable machines that can trap on overflow—the VAX [2] and MIPS [3].

                                                    [1] IEEE-754 floating point math is sign-magnitude, and that’s in wide use these days.

                                                    [2] There’s a flag in the condition codes register that enables/disables such trapping.

                                                    [3] The ADD instruction will trap on overflow, but all compilers I’ve used on MIPS default to ADDU, which does not trap.

                                                1. 3

                                                  I think part of the reason is the emphasis many people put on portability of C code. For instance, until recently there was one major compiler vendor who did not support large parts of C99 including designated initializers. Also, some people may be forced to use arcane versions of GCC or proprietary compilers for embedded platforms, which might not support C99.

                                                  A problem with designated initializers specifically is that they are not part of C++ (at least until C++20), so they are not ideal for example code which might be picked up by both C and C++ programmers.

                                                  An interesting aspect is that compilers are aware of old idiomatic ways to do things and tend to optimize them so you end up with the same code when optimization is turned on.

                                                  Don’t get me wrong, I am all for moving forward, and there are a lot of things in C99 I wouldn’t like to be without (// comments and declaring variables anywhere to name two), but sometimes I still write ANSI C for maximum portability.

                                                  1. 41

                                                    Well I’m thoroughly impressed. Not just at the reverse engineering, but also your ability to expose one of my biggest gripes with tech’s hiring practices.

                                                    “Great job showcasing all the necessary skills and techniques that we use on our day-to-day job. Awesome work. Can you reverse this linked list?”

                                                    1. 25

                                                      Exactly. I interviewed at Google back in 2011 or 2012. Maybe it’s different now, but there were no questions about fit, personality, drive, or what you wanted to do with your career. There were no questions about my past work or anything like that. It was an all-day on-site series of gotcha questions.

                                                      (My favorite was “you’re trying to unmount an external drive on Linux and you can’t, why?” I ran through everything I could think of and after an hour got to “the umount binary is on the disk you’re unmounting.”)

                                                      (That or, I was asked how you would measure latency between two systems. I said “ping.” The interviewer just stared at me, so I kept talking. “Lots of pings averaged over time and throw out or explain any outliers.” Kept staring. This sort of thing for an hour.)

                                                      The whole experience really turned me off of Google, honestly. I was expected to just sit and have questions barked at me for hours because by God I should want nothing more than to work for Google.

                                                      1. 8

                                                        I interviewed at Google last year and had a great time with the on-site (Zurich). There were no trick questions, I think they stopped doing those years ago.

                                                        The typical question was something reasonably simple algorithmically, that then turned out to contain a lot of details you could delve into as time allowed.

                                                        I rather liked that the focus was on showing your enjoyment in solving problems.

                                                        1. 6

                                                          I had a similar experience in 2013/2014. Screener full of questions with lots of gotchas and edge cases (syscalls, what data structures don’t guarantee log n, what component is not in a passwd entry, …), even a sudden “quick, what’s 2^24?”. Remote technical interview was an hour with a shared google doc writing C (have fun indenting while keeping your sanity) with a phone to my ear to the interviewer. Offered an on-site interview but evolving home situation meant I couldn’t proceed. I’m very, very thankful for that now.

                                                          1. 3

                                                            the umount binary is on the disk you’re unmounting.

                                                            Weird. I don’t see why this could be a limitation. Disks can be unmounted with the umount(2) syscall, and a process using this syscall doesn’t have to be tied to its executable file (just like any other process), right?

                                                            (I know this is a bit off-topic)

                                                            1. 5

                                                              The executable would be opened and resident, holding a reference to the mount.

                                                              1. 3

                                                                Oh I see. From the Linux man pages:

                                                                       MNT_FORCE (since Linux 2.1.116)
                                                                              [...] If, after aborting requests, some processes still have active
                                                                              references to the filesystem, the unmount will still fail.

                                                                 It’s probably a Linux-specific limitation. I just tried it on OpenBSD and MNT_FORCE unmounts the filesystem anyway (processes started from the unmounted filesystem keep running). It fails without MNT_FORCE though.

                                                                1. 2

                                                                  I’m surprised OpenBSD is okay with it. I wonder if it has to do with OpenBSD’s umount being statically linked. Linux holds an open reference to the executing file for demand-loading/page-sharing purposes (I’m pretty sure most recent versions of Solaris and other Unices do too), and will block attempts to write to an executing file with ETXTBSY to prevent you from modifying a running file. That open reference to the executable prevents the umount from happening.

                                                                  Interestingly, I just ran an experiment and Linux doesn’t prevent you from writing to open shared objects that aren’t being directly executed (that is, dynamically linked libraries). So, I can start a process foo linked to libfoo and while I can’t write to foo, I can write to libfoo and cause a SIGBUS/SIGILL/whatever when foo loads that code.

                                                                  That’s interesting.

                                                                  EDIT: I guess it makes sense because you can dynamically load modules via dlopen() or whatever, and you would want to pick up changes in that case. It’s still interesting though.

                                                                  1. 1

                                                                    Linux holds an open reference to the executing file for demand-loading/page-sharing purposes

                                                                    Oooh, you mean Linux reads executable sections from the disk on demand? I didn’t know this! Locking the file makes sense in this case. I think it also makes sense that OpenBSD doesn’t do it given OpenBSD’s emphasis on simplicity and security.

                                                                    1. 2

                                                                      It’s not that Linux reads executables on demand, it’s that executable pages can be read back from disk if those pages end up being paged out due to memory pressure.

                                                                      (IIRC, I’m at game night and waiting for my turn to come up.)

                                                              2. 1

                                                                I’d think once the binary is executing it would be in memory… Maybe the filesystem shows it as open, and thus the disk is busy?

                                                                Note: I should reload the page before replying

                                                              3. 2

                                                                My interview experience was pretty bad back in 2016, too. No more trick questions, but I wasn’t a “culture fit” lol

                                                              4. 1

                                                                Why do people always use reversing a linked list as an example? Google usually asked about graphs… If anybody can’t reverse a linked list then I doubt they’ve done much coding, right?
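For what it’s worth, the exercise really is only a few lines; here is a purely illustrative sketch in Python (not tied to anything in the thread):

```python
class Node:
    """Minimal singly linked list node."""
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def reverse(head):
    """Iteratively reverse the list in place; return the new head."""
    prev = None
    while head is not None:
        # Re-point the current node backwards, then advance.
        head.next, prev, head = prev, head, head.next
    return prev

# Build 1 -> 2 -> 3, reverse it, and read the values back.
rev = reverse(Node(1, Node(2, Node(3))))
values = []
while rev is not None:
    values.append(rev.value)
    rev = rev.next
print(values)  # [3, 2, 1]
```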

                                                              1. 5

                                                                Probably a DOS 16-bit executable packer I wrote back around 1997 (aPACK), and the compression library I wrote for it (aPLib). I still sometimes get an email from somebody telling me how they are using them, which is awesome.

                                                                If you are using some obscure little free library or tool, consider taking the time to send the author a quick email. Sometimes it can really brighten your day to get one of those.

                                                                1. 2

                                                                  Isn’t this basically Lomuto using the first element as the pivot instead of the last, and moving the pivot element along using an extra swap instead of swapping it to its final place at the end?
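To make the comparison concrete, here is a sketch of the variant described (first element as pivot, carried along with an extra swap per move). This is an illustrative reconstruction in Python, not the article’s actual code:

```python
def partition_first_pivot(a, lo, hi):
    """Lomuto-style partition, but the pivot starts at a[lo] and is
    moved along with an extra swap instead of being swapped into its
    final place once at the end."""
    p = lo  # current index of the pivot element
    for i in range(lo + 1, hi + 1):
        if a[i] < a[p]:
            # Move a[i] to just past the pivot, then shift the pivot up.
            a[p + 1], a[i] = a[i], a[p + 1]
            a[p], a[p + 1] = a[p + 1], a[p]
            p += 1
    return p  # pivot is now at its sorted position

def quicksort(a, lo=0, hi=None):
    """Quicksort driver using the partition above."""
    if hi is None:
        hi = len(a) - 1
    if lo < hi:
        p = partition_first_pivot(a, lo, hi)
        quicksort(a, lo, p - 1)
        quicksort(a, p + 1, hi)

data = [3, 1, 4, 1, 5, 9, 2, 6]
quicksort(data)
print(data)  # [1, 1, 2, 3, 4, 5, 6, 9]
```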

                                                                  1. 2

                                                                    IANAL, but I wonder about the patent section:

                                                                    Each contributor licenses you to do everything with this software that would otherwise infringe any patent claims they can license or become able to license.

                                                                    What if a piece of software performing process B is released under this license, and process ABCD is later patented: does this give permission to perform ABCD as long as the B piece is the one licensed under this license? What is the scope of “do everything”?

                                                                    Also, if I release a piece of software under this, and someone later receives a patent that covers some part of it, am I supposed to ensure access to that patent for all who received the software under this license?

                                                                    1. 6

                                                                      I think the “in practice” from the original title is relevant.

                                                                      1. 2

                                                                        Probably should also include Python 3. The behavior happens because Python silently switches to bignum arithmetic rather than overflowing when the operands exceed sys.maxsize (in Python 3 all integers are arbitrary precision; Python 2 promoted int to long).
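A quick demonstration of the no-overflow behavior (any CPython 3):

```python
import sys

# Python ints never overflow: past sys.maxsize they silently keep growing.
n = sys.maxsize
big = n + 1
print(big > n)               # True: no wraparound to a negative value
print(isinstance(big, int))  # True: still the one and only int type
```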

                                                                      1. 7

                                                                        git commit --amend is pretty handy to edit the last commit

                                                                        1. 1

                                                                          I’ve got a similar alias to fix mistakes in the last commit using the same commit message:

                                                                          fixup = commit --amend -C HEAD
                                                                          1. 2

                                                                            In Mercurial, this is just hg amend or hg am (all Mercurial commands can be abbreviated to their shortest unique prefix). If you do want to edit the commit message, it’s hg amend --edit.

                                                                        1. 2

                                                                          Is it assuming registrants will not fail?

                                                                          Otherwise the hardcoded number of registrants could be a problem – say one of them fails and never starts, then the algorithm will not finish (this could be a feature rather than an issue of course, depending on use).

                                                                          Also, if registrants can fail, then for example imagine one registrant has just re-added its id as the final one and then crashes before it continues. Half the other registrants see the full size and continue doing the rest of the work, while the crashed registrant restarts and resets the data structure, and the other half then loop on a half-full data structure.
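The race can be sketched with a toy simulation. All the names here (`register`, `registry`, `N`) are made up for illustration; this is not the actual data structure from the article:

```python
# Toy model: N registrants append their id to a shared structure and
# treat "len(registry) == N" as the signal to continue.
N = 4
registry = []

def register(rid):
    """Append an id and report whether the barrier looks complete."""
    registry.append(rid)
    return len(registry) == N

# All N registrants register; the last one crashes right after appending.
signals = [register(r) for r in range(N)]
full_seen = signals[-1]  # anyone checking now sees a full registry

# The crashed registrant restarts and resets the shared structure, so
# anyone still waiting now loops on a half-empty registry indefinitely.
registry.clear()
register(3)
print(full_seen, len(registry))  # True 1
```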

                                                                          1. 2

                                                                            I use this in cloud-init scripts, so restarts aren’t an issue: cloud-init scripts aren’t re-executed on restart, so what happens in practice is that cluster formation simply fails.

                                                                          1. 2

                                                                            It is nice to see a data compression example that is both easy to follow (with the blog post in hand) and not trivial.

                                                                            Also, it is great that he links to the blog posts of ryg and cbloom which contain lots of practical knowledge.

                                                                            The code is set up to test using enwik8. It is easy to end up optimizing for a specific type of data, which might adversely affect the results on other types, so I would suggest testing with some non-text data sets as well if he isn’t already (perhaps the Silesia corpus, which contains various types of data).

                                                                            1. 3

                                                                              Compressing enwik8 took 22 seconds; a quick hack to try and compress silesia.tar, which is roughly twice as big, took almost 2 minutes to compress and segfaults on decompression. A good indication that it needs some testing on other types of data.

                                                                            1. 2

                                                                              I think the amount of discussion about the hows and whys and portability of small things like resolving dependencies is a good argument for using something like Meson, at least for non-trivial projects:

                                                                              project('foobar', 'c')
                                                                              cc = meson.get_compiler('c')
                                                                              m_dep = cc.find_library('m', required : false)
                                                                              executable('foobar', 'main.c', 'foo.c', 'bar.c', dependencies : m_dep)