1. 3

Not really toy implementations: ninja is make if make were good, and samurai is a smaller reimplementation of ninja.

    1. 17

This is just “Java is a miserable nightmare to program in”, and then it descends into parroting the party line on Unix philosophy with nonsensical statements to back it up. “printf” vs. “System.out.println” is not a great reason.

      1. 25

        Yup. I’ve worked in large Java code bases, large Python code bases, and large C/C++ code bases. The reasons given in this article are imagined nonsense.

        • Remember syntax, subroutines, library features: absolutely no evidence to support this. In all my years I have yet to see even a correlation between the use of IDEs and the inability to remember something about the language. Even if the IDE is helping you out, you should still be reading the code you write. (And if you don’t, then an IDE is not the problem.) This claim is a poor attempt at an insult.
• Get to know your way around a Project: If anything, IDEs make this simpler and better. I worked in GCC/binutils without tags or a language server for a while, and let me just say that without them, finding the declaration or definition with grep is much less efficient.
• Avoid long IDE startup time / Achieve better system performance: Most people I know who use IDEs shut them down or switch projects about once a week, if that. This is just whining.
        • Less Code, Better Readability: Tell this to the GCC and Binutils developers, who almost certainly didn’t use an IDE yet still managed to produce reams of nearly unreadable code. Yet another nonsense claim from the “Unix machismo” way of thinking.

        The other points made in the article are just complaints about Java and have nothing to do with an IDE.

        1. 6

Avoid long IDE startup time / Achieve better system performance: Most people I know who use IDEs shut them down or switch projects about once a week, if that. This is just whining.

It’s also not even true. In particular, tmux and most terminals top out at a few MB/s of throughput and stop responding to input when maxed out, so if you accidentally cat a huge file you might as well take a break. Vim seems to be O(n^5) in line length and drops to seconds per frame if you open a few MB of minified json, and neovim (i.e. vim but a DIY IDE) is noticeably slower at basically everything even before you start adding plugins. Never mind that the thing actually slowing my PC down is the 5 web browsers we have to run now anyway.

          1. 2

            Vim seems to be O(n^5) in line length and drops to seconds per frame if you open a few MB of minified json

Obscenely long lines are not a very realistic use pattern. Minified JSON and JS are a rare exception. Vim uses a paging system that deals very well with large files as long as they have reasonable line sizes (this includes compressed/encrypted binary data, which will usually have a newline every 100-200 bytes). I just opened a 3 GB binary archive in vim; it performed well and was responsive, and it used only about 10 MB of memory.

            1. 3

              A modern $100 SSD can read a few MB in 1 millisecond, $100 of RAM can hold that file in memory 5 million times, and a $150 CPU can memcpy that file in ~100 nanoseconds.

              If they did the absolute dumbest possible implementation, on a bad computer, it would be 4-5 orders of magnitude faster than it is now.

            2. 2

              Oh yes, I didn’t even mention this seemingly ignored fact. I can’t speak for Vim and friends, but Emacs chokes horribly on large files (much less so with M-x find-file-literally) and if there are really long lines, which are not as uncommon as you might think, then good luck to you.

        1. 7

          Don’t really see why MS should care about terminal bandwidth.

The tl;dw of this is that Muratori implements a reference terminal that uses a tile renderer to render glyphs really fast, whereas Windows Terminal is quite slow.

          There’s maybe some moving of the goalposts going on here. In the comment Muratori made on Github, he alleged that his design could be done “in a weekend” and his design included correct ClearType handling. Microsoft folk said that ClearType couldn’t be done with a glyph atlas approach, and, indeed, Muratori’s readme seems to say that ClearType is not handled properly yet (and would require information from DirectWrite that it does not give).

          1. 8

            The point of the video is that trivial back of the envelope calculations show that almost everything is 100-1000x (IMO Casey is being generous with the upper bound here) slower than optimal, and that the default code that can be easily understood and probably even written by someone with less than one year of programming experience is 100x faster than code that cost millions of dollars and many dev-years to produce.

            But in the specific case of terminal rendering, drawing text is one of the first things computers did. A $30 Android phone is more powerful than all the computers in the world combined when we first did it, and now rendering text at 60fps is almost universally considered to be too hard.

            On subpixel rendering: Casey talks about it in the github issue a bit IIRC. You have a fixed number of foreground/background colour combinations and 99% of the time it’s 16 colours on the default background, so you can key the glyph cache on fg/bg and bake cleartyped glyphs into it and it works out for ~free.

            1. 6

              Text is incredibly hard if you want to do it right. Obviously, the difficulties of vector fonts, complex scripts, RTL, now colour emoji, etc. But even back when we just had Latin characters, we had separate computers that did nothing but render text and handle input - physical terminals. The glass teletypes employed a lot of hacks to be useful; and of course, they were bottlenecked by their serial lines and couldn’t do bitmaps.

Rendering text is hard, always has been, and now there’s more to do - fewer shortcuts.

              1. 4

It is hard, especially in the general case, but when you have monospaced text with one font size and outsource text shaping/glyph rendering to DirectWrite, most of the hard problems go away.

                Alternatively, Slug does everything for you and runs at 1000+ fps and is available for an amount of money that rounds to zero for a company like Microsoft.

                1. 1

                  Slug is very cool, thanks for linking to it!

Greater than 1000 fps might be overstating it unless you have experience with Slug. In the original paper, Section 5, they report 1.1 ms to fill a 2-megapixel area with just 50 lines of Arial, and from the algorithm description, more, smaller text may take longer, but probably still fast enough for hundreds of fps.

The clever things that Slug does don’t matter if you’re not doing geometric transformations on the glyphs anyway, but they would be nice for VR terminals! (Slug also doesn’t do font fallback, and doing that right may or may not be tricky.)

          1. 9

            The most immediate takeaway for me: Rust compiles around 350 lines a second, Go around 44,000.

            (The only compilers faster in TFA are FreePascal, which is about as fast as Go, and TCC, which is crazy fast, something like 3x faster than Go’s compiler.)

            Now, code is run many orders of magnitude more times than it is compiled, so compilation time isn’t as important as runtime performance, but it does give an idea as to how pleasant developing code in a given language will be if you’re like me and have a very tight edit-compile-test cycle.

            1. 5

              Now, code is run many orders of magnitude more times than it is compiled, so compilation time isn’t as important as runtime performance, but it does give an idea as to how pleasant developing code in a given language will be if you’re like me and have a very tight edit-compile-test cycle.

              If you think in terms of wall clock time, slow compilation is a more serious problem. When developing, instant response gives an entirely different workflow compared to even just a few seconds’ wait. If it’s dozens of seconds, you seriously risk losing attention and focus every time you rebuild.

              1. 4

It’s worth noting that FPC’s optimizer is quite good, whereas tcc does only the bare minimum of optimizations, which makes the fact that FPC is “only” one third the speed of tcc really impressive (IMVHO).[1] That also makes Go’s speed really interesting to me: I don’t know how you’d meaningfully compare the optimization skill of one compiler that has manual memory management only (or, I guess, reference counting, if we want to count TComponent descendents) v. one with a GC, but it’s interesting nonetheless.

                [1]: To be clear: this isn’t a dig at tcc; its entire point is to compile as fast as possible, to the point that you can use C as a scripting language, and it thus makes a deliberate decision not to have complex optimizations

                1. 4

                  Given that “more important” means that there’s an “exchange rate” between the two (because obviously no one will use a language with 10% faster runtime but 100x slower development time), it’s fun to speculate about what exchange rates different people have. For instance, as a Common Lisp programmer, I’m taking a 2x performance hit in exchange for a 10x-100x faster edit-compile-test cycle over Rust - that’s my “exchange rate” (or, my exchange rate is at least that much).

                  Meanwhile, Rust programmers’ exchange rates are probably much higher - they’d want something like 2x:10000x in order to switch.

…of course, the above only applies given “all else considered equal” - which it never is. Programming language design is hard.

                  1. 3

                    rustc is also multithreaded. Not sure about go but C/C++ compilers aren’t, so the gap is even bigger.

                    so compilation time isn’t as important as runtime performance

                    Tbh I don’t think this is true, for two reasons:

                    1. compilers can’t optimise code nearly as well as a person can, and when optimising by hand it’s nice to be able to run a lot of experiments
2. most code is not performance sensitive to the point where it can be 10000x slower than optimal and nobody will notice. Or they will notice and it doesn’t matter because the product is good anyway. Or they will notice and it doesn’t matter because your company has 9+ digits of investor cash. Etc.
                    1. 5

                      rustc is also multithreaded. Not sure about go but C/C++ compilers aren’t, so the gap is even bigger.

That’s not super true, I believe. The rustc front-end is not parallel. There were “parallel compiler” efforts a couple of years ago, but they are stalled. What is parallel is LLVM-side code generation: rustc can split the LLVM IR for a crate into several chunks (codegen units) and let LLVM compile them in parallel. On a higher level, C++ builds tend to exhibit better parallelism than Rust builds: because of header files, C++ compilation is embarrassingly parallel, while Rust compilation is shaped as a DAG (although the recent pipelined builds helped with shortening the critical path significantly). This particular benchmark sets -j 1.

                      1. 2

                        Yeah, but compiling C/C++ is totally parallelizable per source file. Any nontrivial C/C++ build tool runs N parallel “cc” processes, where N is more or less the number of CPU cores. In practice, Xcode or CMake manage to peg my CPU at 100% right up until link time, which is unfortunately single-threaded.

                        1. 1

                          I THINK I told rustc to only use a single thread. The commands are there for someone to double-check.

                        2. 1

In their defence, Rust is driven by correctness without a garbage collector, and Golang was built from the ground up with compilation speed as the important driver, because Google has fucktons of code they build every few hours.

                        1. 5

This is overcomplicating a simple problem. Using memcpy + bswap intrinsics needs zero thought and matches what the hardware actually ends up doing (unaligned load/store + bswap).
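
A minimal sketch of that approach, assuming GCC/Clang’s __builtin_bswap32 (MSVC spells it _byteswap_ulong) and the usual nonstandard __BYTE_ORDER__ macro:

#include <stdint.h>
#include <string.h>

/* Read a big-endian u32 from a possibly unaligned buffer. */
static uint32_t load32_be(const uint8_t *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);           /* compiles down to one unaligned load */
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    v = __builtin_bswap32(v);          /* compiles down to bswap / a swapped load */
#endif
    return v;
}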

                          1. 6

                            Except those intrinsics aren’t standard C. Maybe the C compiler for those IBM mainframes she alludes to doesn’t have them? It does kind of boggle my mind that C and C++ still don’t come with support for endian conversions, which are one of the rock-bottom requirements for portable code.

                            This blog post covers most of the dark corners of C so if you’ve understood what you’ve read so far, you’re already practically a master at the language, which is otherwise remarkably simple and beautiful.

                            Masterful use of sarcasm there! I would insert “integer arithmetic” after “…of C…”, since there are plenty of other dark corners involving floating-point, parameter passing, struct alignment, etc.

                            1. 6

Pretty much every modern ISA has either swapped load/store instructions (e.g. PowerPC, I can speak from experience) or an instruction to do the swap (x86). The problem is that the C abstract machine doesn’t expose it, so you effectively have to rely on compiler built-ins or inline assembly.

                              I wish C let you specify endianness as a modifier on a type, kinda like how Erlang binary pattern matches work.

                              1. 2

                                When I write and use the following:

                                static uint32_t load32_le(const uint8_t s[4])
                                {
                                    return (uint32_t)s[0]
                                        | ((uint32_t)s[1] <<  8)
                                        | ((uint32_t)s[2] << 16)
                                        | ((uint32_t)s[3] << 24);
                                }
                                

GCC and Clang can recognise what I’m trying to do, and they replace that ugly piece of code with a single unaligned load. Same pattern for the store. I’ve also heard that they take advantage of bswap for the big-endian versions. MSVC is likely lagging behind. Even better, I found in practice that in this case, functions are faster than macros. I think the compiler is better able to simplify an isolated function, then inline it, than it would be if that pattern were in the middle of a bigger function.

Yes, it would be nice for C to let you specify loads, stores, and endianness more directly. In practice though, there are ways around this limitation.
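
For completeness, a store counterpart in the same style (a quick sketch; GCC and Clang similarly collapse it to a single unaligned store):

static void store32_le(uint8_t s[4], uint32_t x)
{
    s[0] = (uint8_t) x;
    s[1] = (uint8_t)(x >>  8);
    s[2] = (uint8_t)(x >> 16);
    s[3] = (uint8_t)(x >> 24);
}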

                                1. 2

                                  That’s assuming your compiler is optimizing and can recognize such constructs. I think it would be better if the C abstract machine exposed such a thing so the semantics are obvious.

                                2. 2

                                  I wish C let you specify endianness as a modifier on a type

                                  The IAR compiler has the __big_endian keyword for just this purpose.

                                  1. 1

                                    I implemented bigendian<T> as a C++ template a few years ago. It’s just a wrapper around a numeric value that byte-swaps it when the value is read or stored. Very convenient.

                                  2. 1

                                    Alas, htole and friends are also non-standard:

  These functions are nonstandard. Similar functions are present on the BSDs, where the required header file is <sys/endian.h>
  instead of <endian.h>. Unfortunately, NetBSD, FreeBSD, and glibc haven't followed the original OpenBSD naming convention for
  these functions, whereby the nn component always appears at the end of the function name (thus, for example, in NetBSD,
  FreeBSD, and glibc, the equivalent of OpenBSD's "betoh32" is "be32toh").
                                    
                                  3. 5

                                    Well, apart from being nonstandard C, bswap intrinsics need to be applied conditionally based on your machine’s endianness, which adds a rarely tested codepath. I think it’s easier to avoid relying on endianness entirely by slicing up the bytes manually.

                                  1. 3

                                    Warning, this site appears to be hacked – the article showed up, but a few seconds later redirected to a blank page at a different site with a weird hostname that asked permission to pop up notifications… (Safari 14.1 on macOS 11.3 beta, with the Ghostery blocker plugin installed.)

                                    1. 2

                                      You might want to send them a message. It seems to be Wordpress, so not that weird if it was compromised. I haven’t had a redirect but I’m behind seven proxies. (Joking, just AdBlock and pihole). Here is the archived version: https://archive.md/NSHex

                                      1. 1

                                        I can’t help but wonder if this could have been avoided if they just blogged about a memory safe language instead.

                                        1. 1

                                          I didn’t have that problem for this site on iOS. But I have occasionally had that problem on random sites, totally unreproducible. I suspect it’s malware embedded in ads, but I can’t prove it since I don’t browse the internet with a debugger open.

                                          This site doesn’t seem to have ads, so who knows? Definitely weird. I wasn’t able to reproduce on my Mac either, with Ghostery enabled or disabled.

                                        1. 3

                                          I take it Steam doesn’t require any modifications that would violate the GPL?

                                          1. 5

The Steamworks/Steam DRM wrapper is optional. You can use Steamworks through a second process or an LGPL component (for example), though.

                                            1. 4

This is a grey area, and if you want to link your own GPL app with the Steam SDK you should definitely hire a lawyer first.

                                              1. 10

                                                I hate that kind of response. Most people don’t have that kind of money. “The only way to know whether what you want to do is legal or not is to blow all your money and time on a lawyer” is such a terrible system.

                                                1. 10

                                                  If you select a license that is multiple pages of dense legalese, then you are explicitly opting in to needing to talk to a lawyer to know what you can do with it.

                                                  1. 8

                                                    This seems to be implying you need a lawyer because of GPL, but of course you need a lawyer equally much because of Steam’s terms.

                                                    1. 1

                                                      So what’s the solution then? Write proprietary software and give up on open source?

                                                      I like the MIT license for many things, but there are some projects where I don’t want other people or corporations to just take the project and sell a modified proprietary version. I currently license those projects under the GPL. Should I just make them proprietary instead?

                                                      1. 2

                                                        I like the MIT license for many things, but there are some projects where I don’t want other people or corporations to just take the project and sell a modified proprietary version. I currently license those projects under the GPL. Should I just make them proprietary instead?

                                                        It’s your choice. I don’t really see a difference between a company taking my software, making a load of changes, and making money by selling it without making their changes available to their customers (not allowed by the GPL), and a company taking my software, making a load of changes, and making money by deploying it at scale in-house (allowed by the GPL). In both cases, someone is making a load of money from my software and is not contributing changes back. I consider this the cost of doing business for open source: I care more about how much it benefits me than about preventing it from benefitting someone else more than it benefits me.

If this is a concern for you, then you are already embarking on trying to enforce some legal protection that prevents people from doing something with your software. When you want to use the law to prevent someone from doing something, you need to understand all of the nuances of the law in any of the applicable jurisdictions. You generally do this by either studying law or paying a lawyer. If you created the software, then it’s entirely your right to choose how you license it. The only thing that I object to is people choosing to use a complex legal document to restrict what other people can do with their work and then complaining that they need to be or hire a lawyer to understand how it interacts with other people’s complex legal documents.

                                                        1. 1

Is this (Steam compatibility) a real problem you have? If we can convince someone to use their case to have a good lawyer create a reusable opinion, we could solve the problem for many projects at once. If you are in a position to have a case worth analysis, PM me and let’s talk strategy.

                                                      2. 4

                                                        Unless we can get a system of unambiguous law, I fear we may never do better…

                                                  1. 12

                                                    I don’t disagree that curl would be better off in rust, but curl is really a shining example of 90s super bloaty/legitimately terrible C code. Any rewrite would eliminate half its vulns (probably more), including a rewrite in C.

                                                    Bloat:

                                                    > wc -l src/**.c src/**.h lib/**.c lib/**.h
                                                     159299 total
                                                    

More code = more bugs, and 160k loc to do not much more than connect(); use_tls_library(); write(); read(); close(); is insane. You can implement a 90% use-case HTTP client in < 100 lines without golfing. TLS/Location header/keep-alive/websockets make that 99%+ and are also fairly straightforward.
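
To make the “< 100 lines” claim concrete, here is roughly what the plain-HTTP core looks like with nothing but POSIX sockets (a sketch: no TLS, no redirects, minimal error handling, and the host name and buffer sizes are just illustrative):

#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
    const char *host = "example.com";                 /* illustrative target */
    struct addrinfo hints = { .ai_socktype = SOCK_STREAM }, *res;
    if (getaddrinfo(host, "80", &hints, &res) != 0)
        return 1;

    int fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
    if (fd < 0 || connect(fd, res->ai_addr, res->ai_addrlen) != 0)
        return 1;
    freeaddrinfo(res);

    /* send the request */
    char req[256];
    int n = snprintf(req, sizeof req,
                     "GET / HTTP/1.1\r\nHost: %s\r\nConnection: close\r\n\r\n", host);
    if (write(fd, req, (size_t)n) != n)
        return 1;

    /* dump headers + body, unparsed */
    char buf[4096];
    ssize_t got;
    while ((got = read(fd, buf, sizeof buf)) > 0)
        fwrite(buf, 1, (size_t)got, stdout);

    close(fd);
    return 0;
}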

                                                    Then let’s pick a file at random and take a look: connect.c

                                                    • entire file is full of nested ifdefs which themselves are nested inside regular flow control
                                                    • a bunch of ad-hoc string parsing
                                                    • entire file is littered with platform specific details (maybe this gets a pass because it’s sockets related, but it’s still a lot worse than it could be)

                                                    Finally let’s take a look at the API: curl_easy_getinfo

                                                    For whatever reason they folded like 20 functions into a single varargs function, so nothing is typechecked, including pointers that the function writes to. So you use int instead of long to store the HTTP response code by accident and curl trashes 4 bytes of memory, and by definition the compiler can’t catch it.
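
Concretely, the footgun being described looks like this (CURLINFO_RESPONSE_CODE wants a long *, and nothing in the call forces that):

#include <curl/curl.h>

void check_status(CURL *curl)
{
    int code;   /* should be long */
    /* The varargs signature erases the pointer type, so on an LP64 platform
       curl writes a long (8 bytes) into this 4-byte int. */
    curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, &code);
}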

                                                    I put curl in the same box as openssl a long time ago. Extremely widely used networking infrastructure, mostly written by one guy in their free time 20 years ago, and kneecapped by its APIs. Kinda surprised it didn’t get any attention during the heartbleed frenzy.

                                                    1. 26

                                                      160k loc to do not much more than connect(); use_tls_library(); write(); read(); close(); is insane.

This is really unfair; running curl --help will show you what else curl can do other than just making an HTTPS request. Regardless of whether that makes sense, you can use curl to send and receive emails!!! In an older project, I remember we tried many approaches in order to send and receive emails reliably, talking to a variety of email servers with their particular bugs and quirks, and shelling out to curl turned out to be a very robust method…

                                                      1. 4

                                                        Indeed it can, so when you build it as a library you have to use configure flags like

--enable-static --disable-shared --disable-ftp --disable-file --disable-ldap --disable-ldaps --disable-rtsp --disable-proxy --disable-dict --disable-telnet --disable-tftp --disable-pop3 --disable-imap --disable-smb --disable-smtp --disable-gopher --disable-manual --disable-libcurl-option --enable-pthreads --disable-sspi --disable-crypto-auth --disable-ntlm-wb --disable-tls-srp --disable-unix-sockets --disable-cookies --without-pic --without-zlib --without-brotli --without-default-ssl-backend --without-winssl --without-darwinssl --without-ssl --without-gnutls --without-polarssl --without-cyassl --without-wolfssl --without-mesalink --without-nss --without-axtls --without-ca-bundle --without-ca-path --without-ca-fallback --without-libpsl --without-libmetalink --without-librtmp --without-winidn --without-libidn2 --without-nghttp2

                                                        1. 2

                                                          I interpreted the parent comment’s point about 160k LOC as targetting the fact that most uses of curl are hitting that narrow code path - and therefore most of that 160k LOC is around lesser-used features.

                                                          Because curl is semi-ubiquitous, or at least has a stable enough interface and is easy to download and install without major dependency issues, it ends up being used all over the place and relied upon in ways that will never be fully understood.

                                                          It’s a great tool, and has made my life so much easier over the years for testing and exploring, but perhaps it’s time for a cut-down tool that does only the bare minimum required for the most common curl use case(s), giving us a way to get the functionality with less risk.

                                                          edit: Of course there had to be many such tools already in existence! Here’s one that’s nice and small: https://gitlab.com/davidjpeacock/kurly

                                                      1. 5

                                                        TL;DR: Google deprecated tracking cookies because Google Analytics/AMP/directly uploading browser history from Chrome/etc are more effective. Chrome and another browser primarily funded by Google are introducing countermeasures to make it harder to compete with Google.

                                                        1. 2

I’m wondering if any parser / lexer generators protect you from doing this to yourself by always memcpy()ing out the matched part for a given production into a little null-terminated buffer?

                                                          1. 3

Indeed! It’d probably be cheaper overall to carry around the matches as little struct span { char const *offset; size_t length; }s instead.

                                                            1. 4

                                                              This is also how approximately every other language (C++, D, Java, C#) saves strings.

                                                              1. 3

A difference is that such a struct is not for storing strings, but merely a reference into one. This is what Go and Rust call a string slice, and C++17 calls a string_view. No allocation, no memcpy.

                                                                1. 3

                                                                  Yes, sorry, that’s what I meant. D doesn’t differentiate between them (slices are the only user-visible type), so I’m not used to thinking of them as separate things.

                                                                  1. 2

                                                                    Box<str> in Rust is exactly { char *data; size_t length; }. Ownership is orthogonal to representation. Parsers like serde can return either one, depending on whether you want to keep the input string in memory, or parse from an unbuffered stream.

                                                              2. 3

                                                                Off the top of my head, jsmn does, and OpenBSD’s patterns stuff (which is ripped from Lua) does if you grab the code.

In general it’s easy to do and obviously good, but C has no nice way to print non-terminated strings (printf %.*s warns if you implement your strings correctly lol) and C++ people can’t handle pointers so everything is std::string. C++17 has span/string_view, but anyone who cared about this wrote their own implementations long ago, so I’m not sure they really changed anything.
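
The %.*s dance in question: the precision argument must be an int, so a correctly size_t-typed length needs a cast to keep -Wformat quiet.

#include <stdio.h>
#include <stddef.h>

struct span { const char *data; size_t length; };

static void print_span(struct span s)
{
    printf("%.*s\n", (int)s.length, s.data);   /* precision is an int by definition */
}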

                                                              1. 11

                                                                cgdb built from master was the only usable linux debugger I found when I looked a few years ago, providing an experience similar to that of 22 year old Visual Studio but with more crashes. Other gdb wrappers either don’t work or are unusable. lldb is unusable too.

                                                                It’s unfortunate, but really not any different to the rest of linux userspace…

                                                                1. 1

                                                                  I keep wishing RemedyBG would come to linux.

                                                                1. 3

                                                                  This is all small fry. If you want actual decent builds you have to:

                                                                  1. Ban the STL
                                                                  2. Ban any third-party C++ library that isn’t just C in disguise
                                                                  3. Profile your builds

                                                                  1 and 2 are critical, banning the STL is a 10x speedup, banning C++ libs can easily be 10x or more again. Once you’ve done that you can use 3 to hunt down small wins like in the article.

                                                                  Some other small things I like:

• Ban headers from including other headers, except for types headers that only contain data structures needed for public function declarations. You’ll have to bend this rule a bit for your core types.h, adding allocator interfaces and small templates like min/max etc.
                                                                  • Forward declare C standard library stuff. Some C standard headers get blown up when you compile as C++, most notably math.h, so having a header with extern "C" float sinf( float ); etc is a decent win. As a bonus you can declare void sinf( double ); to catch any accidental double usage.
                                                                  • Quarantine windows.h, it’s huge and eats some useful names (DrawText/near/etc). Wrap it in a header that does lean and mean/nominmax/undef near far and only include it in files that actually use the winapi.
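
A quarantine wrapper along those lines might look like this (the header name and the exact set of #undefs are illustrative; windows.h claims plenty more identifiers than these):

/* win32_lite.h: the only place allowed to include windows.h directly. */
#pragma once

#define WIN32_LEAN_AND_MEAN   /* drop the rarely used subsystems */
#define NOMINMAX              /* keep the min/max macros out of your code */
#include <windows.h>

/* windows.h defines these as macros and breaks perfectly good names */
#undef near
#undef far
#undef DrawText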
                                                                  1. 2

                                                                    So what do you use in place of the STL for things like strings or vectors?

                                                                    1. 1

                                                                      Roll your own, strings/dynamic arrays/fixed hashtables are < 500 LOC total.

                                                                      For more difficult data structures/algos I do actually fall back to using the STL, but that’s rare.

                                                                  1. 33

                                                                    While AVIF has decent compression due to using AV1, its container format (HEIF) is unfortunately a bloated designed-by-committee mess. We looked at implementing it in libavformat in FFmpeg during GSoC 2019, and the conclusion in short was “no”. I might write a blog post on its many failings if there’s interest, but in short: it is not just a still image format. It also supports albums, compositing and a whole slew of other junk. This makes implementing a decoder for it in a project like FFmpeg a monumental task.

                                                                    In my opinion it would be vastly superior to just define a FourCC for AV1 and stick it in a BMP file. BMP parsers are common, the format already supports compression. There’s no need to come up with anything new. A similar argument can be made for audio formats, which can just be stuck inside WAV files with an appropriate TwoCC.

                                                                    1. 14

I’d love a blog post that explains it in great detail (the format, the existing FFmpeg software architecture, the assumptions, the goals, the conflicts). I’d also like to hear about the non-technical side of this - I think there’s lots of value to be derived from talking about projects that didn’t succeed.

                                                                      1. 12

                                                                        Out of curiosity, is any container format NOT a mess? I’ve heard people complain about ogg, MP4 and a few others, but nobody seems to dish out much praise anywhere. Container formats in general seem to be a somewhat obscure topic, nobody seems to say much about tradeoffs in them or what makes a good vs. bad design.

                                                                        1. 4

Well, BMP and WAV are fairly simple and find wide use. They have their quirks though, like uncompressed BMPs being stored upside-down and lines having to be an even number of bytes. WAV only supports constant bitrate. AVI worked well enough before B-frames started being used. Ogg is an absolute joke. ISOBMFF (MOV) has tons of derivatives including MP4, 3GP and HEIF. It suffers from requiring a complete header to decode. Fragmented MP4 fixes that, but of course that’s only MP4. MXF is widely used in the broadcast world, and is a huge mess both design-wise and for being split over oodles of SMPTE specs. It also happens to be the format I maintain in libavformat.

                                                                          1. 3

                                                                            is any container format NOT a mess

                                                                            This is a very good observation.

Container formats that implement a ‘database-in-a-file’ with a bunch of tables, ‘foreign-key conventions’, and so on are really, really difficult to use (and I cannot even imagine what it is like for implementers, or folks who write conversion utilities).

                                                                            I do not know what a proper solution/architecture approach for these are, though. It seems that this model is needed.

                                                                            PEM ( https://serverfault.com/questions/9708/what-is-a-pem-file-and-how-does-it-differ-from-other-openssl-generated-key-file )

                                                                            PDF

                                                                            HDF5

                                                                            come to mind.

                                                                            1. 3

                                                                              The ISO container (MPEG-4 part 14, QuickTime, &c.) is at least a sensible model for a time-synced multi-stream container. It has a lot of cruft in it, though.

                                                                              1. 3

                                                                                Bink

                                                                                From what I’ve heard a significant portion of its value is that you don’t have to deal with any open formats/libraries, all of which are garbage.

                                                                              2. 6

You sure make it sound like an overengineered piece of shit, and if it is, then your blog post (please write it!) would help expose it and limit the damage it can do.

                                                                                1. 6

                                                                                  HEIF, and thus AVIF, is an unfortunate pile of ISO specs. Each spec in itself isn’t unreasonable, but the sum of them adds up to ridiculous bloat. You need 300 bytes of MPEG metadata to say 1 bit of information that the AVIF has an alpha channel.

                                                                                  However, it’s most likely that nobody will implement the full feature set, and we’ll end up with a de-facto AVIF minimal profile that’s just for still images.

                                                                                  AVIF-sequence is another bizarre development. It’s a video format turned into image format turned back into worse, less efficient, more complex video format. And Chrome insists on requiring AVIF-sequence over a real AV1 video in <img>.

                                                                                  1. 3

                                                                                    However, it’s most likely that nobody will implement the full feature set, and we’ll end up with a de-facto AVIF minimal profile that’s just for still images.

This is the issue though: we can’t claim to have implemented AVIF, because someone is going to come along with a composite AVIF some day and go “guise ffmpeg is broken it can’t decode this”.

                                                                                    I looked at AVIF-sequence just now, it just sounds like AV1 in MP4 with “avis” in the ftyp atom. Nothing too strange about that.

                                                                                  2. 2

There’s no need to come up with anything new

Are there other patent-free, high-compression formats that support, as an example, PPTX-> conversion (to an individual file)?

For my needs, being able to stick a slide show into one file (and then being able to reference ‘a page’ within the file, on a client) solves some technical complexities.

                                                                                    I might write a blog post on its many failings if there’s interest

Oh, and I also join the folks who would love to see you write a blog post on this. An implementer’s analysis of AVIF, its pain points, shortcomings, etc., would be very interesting in shaping community understanding of this.

                                                                                    1. 1

                                                                                      I think the complexity is being used though. For example, iPhones take burst images compounded into a single HEIF, IIRC.

                                                                                      1. 1

Is “burst images” the correct term for this? From what I can see, iPhones just take an actual video recording. Burst images make me think of cameras that actually move the shutter, but I’m not sure it makes any difference on a phone camera where there are no moving parts.

                                                                                    1. 12

                                                                                      This is all good stuff, I can also recommend decoupling handles and storage entirely. Instead of putting the underlying array index in the bottom n bits, you add a hashtable from handle to index.

The big advantage is that you can tightly pack your data. When you delete an item you swap it with the last element in the array and update the hashtable. Now if you want to perform an operation on every item you just blast over the array. It’s also simpler to expose iteration to the rest of the codebase like this; non-trivial iterators are a pain in C++, and it’s a one-liner to hand out a span.
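
A minimal sketch of the pattern, using a sparse-set style index array standing in for the hashtable to keep it short (a real version would use a proper hash map so handles can be arbitrary values such as filename hashes; the capacity and payload type are made up):

#include <stddef.h>
#include <stdint.h>

#define MAX_HANDLES 4096

typedef struct { float x, y, z; } Item;   /* illustrative payload */

typedef struct {
    uint32_t sparse[MAX_HANDLES];   /* handle -> dense index */
    uint32_t handles[MAX_HANDLES];  /* dense index -> handle (reverse lookup) */
    Item     dense[MAX_HANDLES];    /* tightly packed, iterate with a plain loop */
    uint32_t count;
} Pool;

Item *pool_get(Pool *p, uint32_t h) {
    uint32_t i = p->sparse[h];
    return (i < p->count && p->handles[i] == h) ? &p->dense[i] : NULL;
}

void pool_add(Pool *p, uint32_t h, Item it) {
    p->sparse[h]         = p->count;
    p->handles[p->count] = h;
    p->dense[p->count]   = it;
    p->count++;
}

void pool_remove(Pool *p, uint32_t h) {
    if (!pool_get(p, h))
        return;
    uint32_t i    = p->sparse[h];
    uint32_t last = --p->count;
    /* swap the last element into the hole so the array stays tightly packed */
    p->dense[i]              = p->dense[last];
    p->handles[i]            = p->handles[last];
    p->sparse[p->handles[i]] = i;
}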

                                                                                      It also lets you generate handles from outside the system. For example, you can hash asset filenames and use those as handles, then you can pass the handles between the client and server for picking entity models etc and it just works, even better if you add a constexpr hash function :)

                                                                                      As an aside: people do this in rust too, where it has the additional benefit of hiding your allocations from the borrow checker and amusingly leaves you with the same safety guarantees as rewriting in C++

                                                                                      1. 5

                                                                                        Instead of putting the underlying array index in the bottom n bits, you add a hashtable from handle to index.

                                                                                        The main disadvantage to this is you then need a hashtable lookup for everything, and so you now have two levels of indirection to go through for each access instead of one. The tight packing is not necessarily needed if your data is still mostly-dense, it can be almost as fast to iterate through a hole-y array and skip over the unused entries. It’s a nice tradeoff sometimes, but it is a tradeoff.

This pattern IS very useful in Rust; it’s used commonly in gamedev/entity component systems. It’s also nice for stuff like interning strings, though. In general I view it as a weird cousin to reference counting; an RC’d pointer is always valid to dereference and you do the safety accounting when you create/destroy pointers, while a handle is possibly invalid and needs no special actions to copy, but you check whether it’s safe when you access it.

                                                                                        1. 4

                                                                                          If you’re brave enough you can replace the hash table with a billion element array for a direct lookup from handles to object pointers. The trick is that, thanks to virtual memory, you don’t really consume much physical memory.

                                                                                          Better have a 64-bit build and small pages though :)
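
On Linux that looks something like the following (MAP_NORESERVE so the kernel only backs the pages you actually touch; the element count is illustrative):

#include <stdio.h>
#include <sys/mman.h>

#define SLOTS 1000000000ull   /* a billion pointer-sized slots, ~8 GB of address space */

int main(void)
{
    void **table = mmap(NULL, SLOTS * sizeof(void *),
                        PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (table == MAP_FAILED) { perror("mmap"); return 1; }

    table[123456789] = table;   /* touching one slot faults in just that one page */
    printf("%p\n", table[123456789]);
    return 0;
}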

                                                                                          1. 2

                                                                                            That link is worth its own submission!

                                                                                          2. 2

                                                                                            I did not fully understand the handles.

                                                                                            is that just a hashmap of

                                                                                              <unsigned int,T*>
                                                                                            

where ‘the outside world’ uses the ‘key’ of the hash (the unsigned int) in my example? Or is it something else?

                                                                                            1. 3

                                                                                              You got it!

                                                                                              1. 2

                                                                                                Not quite.

It’s more that a handle is a uint32_t composed of (at least) two bitfields.

If you have an array to store N items, then you need B = ceil(log2(N)) bits to store an index into that array.

So that means you have 32 - B bits to play around with and do (other) useful stuff.

When you want to access that element, you have to mask off the high bits and then you have a plain old array index.

                                                                                                Which you can sanity check against the size of the array.

                                                                                                So what nifty stuff can you do with those free bits?

                                                                                                Well a standard problem is use after free.

You allocate a resource, you free the resource, you’re still holding a pointer/handle, you forget you have freed the resource, you use it again….. Horrible things happen.

                                                                                                So you could use the free bits to create a unique id modulo 2^(32-B) that you store in the array and stamp onto the handle.

                                                                                                When anybody asks for anything to happen with that handle… you check they match and scream if they don’t.

                                                                                                What you do with those extra bits is up to your imagination…
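
A sketch of that generation-check idea (bit widths and names are arbitrary):

#include <stdint.h>

#define INDEX_BITS 16u
#define INDEX_MASK ((1u << INDEX_BITS) - 1u)

typedef uint32_t Handle;   /* [ 16-bit generation | 16-bit index ] */

static inline Handle   handle_make(uint32_t index, uint32_t gen) { return (gen << INDEX_BITS) | index; }
static inline uint32_t handle_index(Handle h) { return h & INDEX_MASK; }
static inline uint32_t handle_gen(Handle h)   { return h >> INDEX_BITS; }

/* One generation counter per slot, bumped every time the slot is freed.
   A stale handle's generation no longer matches, so use-after-free screams
   instead of silently poking someone else's data. */
static uint32_t generations[1u << INDEX_BITS];

static inline int handle_is_live(Handle h)
{
    return generations[handle_index(h)] == handle_gen(h);
}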

                                                                                            1. 4

This sort of thing is why I strongly suggest that developers who are doing their work professionally get a proper desktop with a proper processor and storage.

                                                                                              My (biased) opinion is that so many folks, especially in web dev, do all of their work on expensive lemons from Apple or on laptops instead of getting a beefy workstation that can serve their needs better. My Ryzen box was a godsend for dealing with this sort of thing a couple of years ago at launch.

                                                                                              (This also applies to getting real keyboards, mice, and monitors.)

                                                                                              1. 1

I agree. And I don’t even compile anything, I just need to transpile a bunch of TypeScript to ES5 or ES6. The difference is huge when I switch from my beefy work-issued laptop to my humble 5-year-old desktop. (The laptop does run Windows, so there’s that factor.)

                                                                                                There’s only one thing that bothers me usually. If I get a beast, I usually don’t have a lot of money left over for a beast laptop. So then I’m underpowered when I need to work on a laptop alone, like during travel. On the other hand, I don’t travel that much, and if I had to build something so intensive, I would probably set up some cloud/remote solution.

I don’t buy new machines often, so I’d like to buy the best. But you can’t have it all, can you?

                                                                                                1. 1

                                                                                                  My (also biased) opinion is that LLVM becoming the standard has doomed us to decades of dog slow compilers, so anyone who doesn’t currently have compile time issues will have them soon.

                                                                                                1. 4

This material was presented nicely, and gave a good overview of the main phases of a physics system, but didn’t really cover the gnarly bits of physics engine implementation. I’ve gone down the rabbit hole of trying to implement my own physics engine before, and if you want generic convex hull/convex hull collision detection (which most modern games use – not just sphere/sphere detection), the code becomes a whole lot more complicated. The standard algorithm in use is GJK, which has some elegant ideas embedded in it but, for a practical 3D use case, involves a lot of painful cases to implement. Once you’ve implemented it, I’ve heard of serious numerical issues that lead to subtle collision detection bugs that are hard to track down. I wish there were some more robust alternatives that are easier to implement, but I haven’t found anything…

                                                                                                  That was enough to scare me away – I’m at the point where I’ll just use an implementation of a physics engine with moderate popularity in the hopes that it’s battle-tested enough to have some confidence in.

                                                                                                  1. 2

                                                                                                    GJK and SAT upset me, because looking at the literature everybody’s consensus is “oh this is a solved problem, just go see the talk by person” and then when you actually try to build it it’s just a colossal pain in the ass.

                                                                                                    And god forbid you’re trying to do any of this in, say, a functional language with immutable data. That’s just a whole exercise in fun. >_<

                                                                                                    1. 1

                                                                                                      I’m at the point where I’ll just use an implementation of a physics engine with moderate popularity in the hopes that it’s battle-tested enough

                                                                                                      If I may ask, what options out there did you consider?

                                                                                                      1. 2

Not OP, but if you’re doing 3D, PhysX is by far the best, because of the docs and PVD.

Bullet is trash; I don’t know if the actual library itself is bad, but everything around it is. The only docs you get are a 10-page outdated PDF. If you try to google anything you just get the same unanswered questions on their forums. There’s a bullet3, which seems to be an AMD tech demo to get people to care about OpenCL that ended up being shipped with zero documentation. The debugging stuff is a pain to set up and kills perf.

                                                                                                        Conversely, the PhysX docs are some of the best I’ve ever seen. With a few lines of code you get a standalone, rewindable(!) physics debugger. One of the guys who wrote it has a nice tips series on their blog.

                                                                                                        1. 2

                                                                                                          For 3D physics, there’s always Bullet which is old, but pretty standard. I’m going to give nphysics a shot for a new project of mine. And if you don’t need anything besides box detection, qu3e looks like a nice lightweight alternative.

                                                                                                      1. 3

If the string is smaller than your chunk size, you can still do chunk-at-a-time processing by doing aligned loads and masking off the excess data. Loading out-of-bounds memory only crashes if you actually cross a page boundary and hit an unmapped page. Page sizes and load sizes are powers of two, so aligned loads never cross page boundaries.

                                                                                                        Naturally, there is no way to express this in system programming languages since the UB police arrived, but compilers seem to leave you alone if you use intrinsics.
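
                                                                                                        For illustration, a rough C sketch of the idea using SSE2 intrinsics; the helper name is made up, and it assumes the whole string fits inside a single 16-byte aligned block:

                                                                                                            #include <emmintrin.h>   /* SSE2 intrinsics */
                                                                                                            #include <stdint.h>
                                                                                                            #include <stddef.h>
                                                                                                            
                                                                                                            /* Load the aligned 16-byte block containing s, then zero out the bytes
                                                                                                             * that are not part of the string. The load may read past the end of
                                                                                                             * the string, but it can never cross a page boundary, so it won't fault. */
                                                                                                            static __m128i load_masked16(const char *s, size_t len)
                                                                                                            {
                                                                                                                uintptr_t addr   = (uintptr_t)s;
                                                                                                                size_t    offset = addr & 15;                       /* position of s within its block */
                                                                                                                const __m128i *block = (const __m128i *)(addr - offset);
                                                                                                            
                                                                                                                __m128i chunk = _mm_load_si128(block);              /* aligned, stays within one page */
                                                                                                            
                                                                                                                /* Keep only the byte lanes in [offset, offset + len), zero the rest. */
                                                                                                                __m128i idx  = _mm_setr_epi8(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15);
                                                                                                                __m128i lo   = _mm_cmplt_epi8(idx, _mm_set1_epi8((char)offset));
                                                                                                                __m128i hi   = _mm_cmplt_epi8(idx, _mm_set1_epi8((char)(offset + len)));
                                                                                                                __m128i keep = _mm_andnot_si128(lo, hi);            /* idx >= offset && idx < offset+len */
                                                                                                            
                                                                                                                return _mm_and_si128(chunk, keep);
                                                                                                            }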

                                                                                                        1. 2

                                                                                                          You could keep valgrind happy by rounding your allocations up and memset()ing the last few bytes to 0.
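
                                                                                                          A minimal sketch of that (the helper name is made up); it pads the copy out to a whole number of 16-byte chunks and zeroes the tail so chunked reads never touch uninitialised bytes:

                                                                                                              #include <stdlib.h>
                                                                                                              #include <string.h>
                                                                                                              
                                                                                                              /* Copy src into a buffer rounded up to a multiple of 16 bytes,
                                                                                                               * zeroing the excess, so valgrind has nothing to complain about. */
                                                                                                              static char *copy_padded16(const char *src, size_t len)
                                                                                                              {
                                                                                                                  size_t padded = (len + 16) & ~(size_t)15;   /* round up, keep at least one zero byte */
                                                                                                                  char  *buf = malloc(padded);
                                                                                                                  if (!buf)
                                                                                                                      return NULL;
                                                                                                                  memcpy(buf, src, len);
                                                                                                                  memset(buf + len, 0, padded - len);         /* zero the tail */
                                                                                                                  return buf;
                                                                                                              }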

                                                                                                          1. 2

                                                                                                            Note that this depends on your architecture and is not true on CHERI (or any other environment that gives byte-granularity memory safety). We had to disable the transforms in LLVM that assumed this was possible.

                                                                                                            It’s also undefined behaviour in C to read past the end of an object, so you need to be quite careful that your compiler doesn’t do unexpected things.

                                                                                                          1. 1

                                                                                                            Least-effort conditional breakpoints; not sure if this is really useful outside of games, though.

                                                                                                            Add bool break1, break2, break3, break4; somewhere, with extern bool break1; etc. in some base header, and set them to true when F1–F4 are pressed each frame. Then drop if( break1 ) __debugbreak(); into some code that’s not working, set up the conditions so you hit the bug, then hit F1.

                                                                                                            Also, if you have to use Linux, install cgdb.
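
                                                                                                            A rough sketch of the setup in C; __debugbreak() is the MSVC intrinsic mentioned above, and key_is_down()/KEY_F* are made-up stand-ins for whatever input polling the game already has:

                                                                                                                #include <stdbool.h>
                                                                                                                #include <intrin.h>                          /* MSVC: __debugbreak() */
                                                                                                                
                                                                                                                /* Assumed input API, purely illustrative. */
                                                                                                                enum { KEY_F1, KEY_F2, KEY_F3, KEY_F4 };
                                                                                                                extern bool key_is_down(int key);            /* true while the key is held */
                                                                                                                
                                                                                                                bool break1, break2, break3, break4;         /* `extern bool break1;` etc. in a base header */
                                                                                                                
                                                                                                                void poll_debug_keys(void)                   /* call this once per frame */
                                                                                                                {
                                                                                                                    break1 = key_is_down(KEY_F1);
                                                                                                                    break2 = key_is_down(KEY_F2);
                                                                                                                    break3 = key_is_down(KEY_F3);
                                                                                                                    break4 = key_is_down(KEY_F4);
                                                                                                                }
                                                                                                                
                                                                                                                void some_code_thats_not_working(void)
                                                                                                                {
                                                                                                                    if (break1) __debugbreak();              /* set up the bug, then hit F1 */
                                                                                                                    /* ... */
                                                                                                                }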

                                                                                                            1. 18

                                                                                                              Email from a self-hosting perspective absolutely is, though. Absolute clusterfsck to try and configure.

                                                                                                              1. 10

                                                                                                                Configuration is one thing. Actually getting email delivered is another. I feel like your little server is instantly on Google’s and Microsoft’s blacklists, with all your messages marked as spam. It’s horrendous!

                                                                                                                1. 12

                                                                                                                  Email is now a cartel. Old thread about it.

                                                                                                                  tl;dr if you really want to die on that hill, start by choosing your VPS provider carefully…

                                                                                                                  1. 2

                                                                                                                    Damn, my VPS of choice is DigitalOcean and I have to tell people to maybe check their spam folder for my email. Annoying.

                                                                                                                    1. 1

                                                                                                                      I relay all my email from my VPS through my (personal) FastMail account, which is easy and works well enough. Thus far the volume is still well within my account limits, but if I go over them I’ll probably just use SendMail or whatnot.

                                                                                                                      You can probably do the same with gmail or other providers.

                                                                                                                  2. 2

                                                                                                                    I spent a few hours setting up DKIM and SPF, after which my emails were delivered to gmail addresses (haven’t checked ms, but I’ve heard they’re more lenient) without a hitch. Yes, it’s irksome to have to spend even that amount of time, but it’s not that much work.

                                                                                                                    1. 2

                                                                                                                      Microsoft often marks its own official communications as spam (usually correctly :)) in my Office 365 account… With the cloud and hosts reusing IP addresses all the time, the old spam-fighting methods simply do not work anymore (many were bad ideas even back then).

                                                                                                                      1. 1

                                                                                                                        DMARC can be painful to set up.

                                                                                                                        1. 0

                                                                                                                          It’s trivial, what do you mean?

                                                                                                                      2. 6

                                                                                                                        I don’t think this is related to the article’s content.

                                                                                                                        1. 5

                                                                                                                          I’m not sure I agree. Services like Mail in a Box and Mailcow make getting started a little simpler. Overall it is complicated, but email is a complicated system. Being complicated doesn’t mean it’s broken though.

                                                                                                                          1. 3

                                                                                                                            Which part is the most painful?

                                                                                                                            1. 3

                                                                                                                              I understand that email hosting used to be appalling and most of it still is, but OpenSMTPD is actually really nice to use. I’ve chosen to write email apps over webapps for a couple of things, e.g. a self-hosted Instagram where I email photos from my phone to myself.

                                                                                                                              Just need OpenIMAP and OpenSpam and Open Everything Else and we are all good.

                                                                                                                              1. 1

                                                                                                                                Could you go into some more details about your OpenSMTPD based workflow? I’ve been thinking of building apps over email, but would love to hear about others’ usage.

                                                                                                                              2. -2

                                                                                                                                It’s really not that hard.

                                                                                                                              1. 2

                                                                                                                                I never understood the advantages of ninja with respect to make. They seem to boil down to things like the build files not using tab characters with semantic value, the -j option being on by default, or the syntax being simpler and slightly nicer. But apart from that, what are the essential improvements that would justify switching from make to ninja? If ninja is only slightly better than GNU make, I tend to prefer GNU make, which I already know, which is ubiquitous, and which avoids adding a new build dependency.

                                                                                                                                1. 14

                                                                                                                                  The article discusses how it’s really a low-level execution engine for build systems like CMake, Meson, and the Chrome build system (formerly gyp, now GN).

                                                                                                                                  So it’s much simpler than Make, faster than Make, and overlapping with the “bottom half” of Make. This sentence is a good summary of the problems with Make:

                                                                                                                                  Ninja’s closest relative is Make, which attempts to encompass all of this programmer-facing functionality (with globbing, variable expansions, substringing, functions, etc.) that resulted in a programming language that was too weak to express all the needed features (witness autotools) but still strong enough to let people write slow Makefiles. This is vaguely Greenspun’s tenth rule, which I strongly attempted to avoid in Ninja.

                                                                                                                                  FWIW as he also mentions in the article, Ninja is for big build problems, not necessarily small ones. The Android platform build system used to be written in 250K lines of GNU Make, using the “GNU Make Standard Library” (a third-party library), which as far as I remember used a Lisp-like encoding of Peano numbers for arithmetic …

                                                                                                                                  1. 6
                                                                                                                                    # ###########################################################################
                                                                                                                                    # ARITHMETIC LIBRARY
                                                                                                                                    # ###########################################################################
                                                                                                                                    
                                                                                                                                    # Integers a represented by lists with the equivalent number of x's.
                                                                                                                                    # For example the number 4 is x x x x. 
                                                                                                                                    
                                                                                                                                    # ----------------------------------------------------------------------------
                                                                                                                                    # Function:  int_decode
                                                                                                                                    # Arguments: 1: A number of x's representation
                                                                                                                                    # Returns:   Returns the integer for human consumption that is represented
                                                                                                                                    #            by the string of x's
                                                                                                                                    # ----------------------------------------------------------------------------
                                                                                                                                    int_decode = $(__gmsl_tr1)$(if $1,$(if $(call seq,$(word 1,$1),x),$(words $1),$1),0)
                                                                                                                                    
                                                                                                                                    # ----------------------------------------------------------------------------
                                                                                                                                    # Function:  int_encode
                                                                                                                                    # Arguments: 1: A number in human-readable integer form
                                                                                                                                    # Returns:   Returns the integer encoded as a string of x's
                                                                                                                                    # ----------------------------------------------------------------------------
                                                                                                                                    __int_encode = $(if $1,$(if $(call seq,$(words $(wordlist 1,$1,$2)),$1),$(wordlist 1,$1,$2),$(call __int_encode,$1,$(if $2,$2 $2,x))))
                                                                                                                                    __strip_leading_zero = $(if $1,$(if $(call seq,$(patsubst 0%,%,$1),$1),$1,$(call __strip_leading_zero,$(patsubst 0%,%,$1))),0)
                                                                                                                                    int_encode = $(__gmsl_tr1)$(call __int_encode,$(call __strip_leading_zero,$1))
                                                                                                                                    
                                                                                                                                    1. 3

                                                                                                                                      Source, please? I can’t wait to see what other awful things it does.

                                                                                                                                      1. 1

                                                                                                                                        Yup exactly, although the representation looks flat, it uses recursion to turn 4 into x x x x! The __int_encode function is recursive.

                                                                                                                                        It’s what you would do in Lisp if you didn’t have integers. You would make integers out of cons cells, and traverse them recursively.

                                                                                                                                        So it’s more like literally Greenspun’s tenth rule, rather than “vaguely” !!!
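
                                                                                                                                        Purely as an illustration of that cons-cell idea (this is not GMSL’s actual code), a toy C version that builds numbers out of one cell per “x” and counts them recursively:

                                                                                                                                            #include <stdio.h>
                                                                                                                                            #include <stdlib.h>
                                                                                                                                            
                                                                                                                                            /* A unary number is just a list of cells, one per "x". */
                                                                                                                                            typedef struct Cell { struct Cell *next; } Cell;
                                                                                                                                            
                                                                                                                                            static Cell *succ(Cell *n)                /* n + 1: prepend one more "x" */
                                                                                                                                            {
                                                                                                                                                Cell *c = malloc(sizeof *c);
                                                                                                                                                if (!c) abort();
                                                                                                                                                c->next = n;
                                                                                                                                                return c;
                                                                                                                                            }
                                                                                                                                            
                                                                                                                                            static int decode(const Cell *n)          /* count the cells recursively */
                                                                                                                                            {
                                                                                                                                                return n ? 1 + decode(n->next) : 0;
                                                                                                                                            }
                                                                                                                                            
                                                                                                                                            int main(void)
                                                                                                                                            {
                                                                                                                                                Cell *four = succ(succ(succ(succ(NULL))));   /* "x x x x" */
                                                                                                                                                printf("%d\n", decode(four));                /* prints 4 */
                                                                                                                                                return 0;
                                                                                                                                            }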

                                                                                                                                      2. 1

                                                                                                                                        Yes, so I guess its main advantage is that it is really scalable. This is not a problem that I have ever experienced: my largest project had two hundred files that compiled in a few seconds, and the time spent by make itself was negligible. On the other hand, for such a small project you get to enjoy the ad-hoc GNU make features, like the implicit .c -> .o compilation, the CFLAGS and LDFLAGS variables, and so on. You can often write a makefile in three or four lines that compiles your project; I guess with ninja you would have to be much more verbose and explicit.

                                                                                                                                        1. 5

                                                                                                                                          He mentions that the readme explicitly discourages people with small projects from using it.

                                                                                                                                          I suspect it’s more that ninja could help you avoid having to add the whole disaster that is autotools to a make-based build rather than replacing make itself.

                                                                                                                                          1. 2

                                                                                                                                            I suspect it’s more that ninja could help you avoid having to add the whole disaster that is autotools to a make-based build rather than replacing make itself.

                                                                                                                                            Sure; autotools is a complete disaster and a really sad thing (and the same can be said about cmake). For small projects with few, non-configurable dependencies, it is actually feasible to write a makefile that will seamlessly compile the same code on Linux and macOS. And, if you don’t care about Windows users being able to compile it themselves, you can even cross-compile a Windows binary from a compiler on Linux.

                                                                                                                                          2. 2

                                                                                                                                            You don’t (or better, shouldn’t!) write Ninja build descriptions by hand. The whole idea is that something like CMake generates what Ninja actually parses. I’ve written maybe 3 ninja backends by now.

                                                                                                                                        2. 4

                                                                                                                                          we use ninja in pytype, where we need to create a dependency tree of a project, and then process the files leaves-upwards with each node depending on the output of processing its children as inputs. this was originally done within pytype by traversing the tree a node at a time; when we wanted to parallelise it we decided to instead generate a ninja file and have it invoke a process on each file, figuring out what could be done in parallel.

                                                                                                                                          we could doubtless have done the same thing in make with a bit of time and trouble, but ninja’s design decisions of separating the action graph out cleanly and of having the build files be easy to machine generate made the process painless.

                                                                                                                                          1. 4

                                                                                                                                            It’s faster. I am (often) on Windows, where the difference can feel substantial. The Meson site has some performance comparisons and mentions: “On desktop machines Ninja based build systems are 10-20% faster than Make based ones”.

                                                                                                                                            1. 2

                                                                                                                                              I use it in all my recent projects because it can parse cl.exe /showIncludes output.

                                                                                                                                              But generally, like andyc already said, it’s just a really good implementation of make.