Threads for iv

  1. 2

    This is pretty accurate. I think Lightsail might be better suited for hobby projects. CloudWatch is always expensive though.

    1. 1

      I understand why people write these articles, but the argumentation here is just “I don’t like this”. Can’t we talk about design with better arguments? Maybe even data? (Not that I have any at hand.)

      1. 8

        but the argumentation here is just “I don’t like this”

        That’s just the title. The article goes a bit more in depth on why the author considers the new design to be worse, though you’re right about the data, the article only presents anecdotes:

        I feel like the designers of this new theme have never sat down with anyone who’s not a “techie” to explain to them how to use a computer. While a lot of people now instinctively hunt for labels that have hover effects, for a lot of people who are just not represented in online computer communities because they’re just using the computer as a tool this is completely weird. I have had to explain to people tons of times that the random word in the UI somewhere in an application is actually a button they can press to invoke an action.

        1. 2

          That’s fair, I missed that paragraph. I agree that it’s not data though.

          1. 6

            “Data” doesn’t make arguments automatically better. Quantitative analysis isn’t appropriate for everything, and even when it may be useful, you still need a qualitative analysis to even know what data to look at and how to interpret it.

            1. 1

              This is the kind of reply that’s easy to agree with. 🙂

      1. 1

        meta: does a website really have to look like this in 2022? Luckily Firefox has “reader view”.

        1. 5

          It works well on mobile, at least.

          1. 2

            Yeah, it seems like just adding body { max-width: 36em } would be enough for the site to be readable on desktop as well.

          2. 4

            Heh. As soon as content from this author shows up on HN, there’s a gush of people fawning over the “clean” and “oldschool” look of their pages.

            Personally I’d be fine with it if the margins were a bit bigger.

            1. 6

              I like “brutalist” web design too, but readability should be the focus of published text IMO

              1. 2

                As a counterweight, do you like or dislike this recent submission?

                http://rachelbythebay.com/w/2022/02/09/nice/

                1. 6

                  Rachel’s site is unfashionable but fine. Dan’s site is an aggressive nose thumb to the reader. There’s no comparison.

                  1. 4

                    not beautiful but a lot more readable.

              2. 3

                I think the author made his website like that to work for people with really bad connections. I think he also mentioned somewhere that wiping CSS off his site led to more engagement, but I could be wrong.

                1. 6

                  Obligatory http://bettermotherfuckingwebsite.com/ I guess. 7 lines of CSS isn’t going to make a noticeable difference on load times even on dialup.

                  But yeah, how on earth do people survive without Reader View?

                2. 1

                  +1, a column of text 70-90 characters wide is the most readable option personally, and it requires the least CSS. One could do it with <br> if one really wanted to.

                  1. 3

                    Don’t hard wrap! It will make it look horrible on narrow screens. max-width will do that in a sec.

                1. 1

                  It would be interesting to try to encode the secret-quality of data in the type system of the implementation language (or DSL).

                  1. 4

                    “this wouldn’t have happened with ZFS” is a strange conclusion to come to after a user error. Also: I’d recommend a mundane backup strategy. Having to package something smells of novelty. Although I’ve not heard of the system they mention, it might be fine.

                    1. 6

                      ZFS would have told you why the drive wasn’t in the array anymore, with a counter showing how many checksums failed (the last column in zpool status, which should be 0). The author would thus have known there was something wrong with the SSD, and would have thought twice before mindlessly adding it back to the array.

                      I’m not entirely sure what would happen if you add the SSD back to the array anyway; at the very least you must give it a clean bill of health with zpool clear. I would also expect that ZFS urges or maybe even forces you to do a resilver of the affected device, which would show the corruption again. The main problem with mdadm in this case was that when re-adding the device, it found it was already part of the array before and decided to trust it blindly, not remembering that it was thrown out earlier, or why.

                      1. 3

                        ZFS should resilver when you add the drive back to the array and verify/update the data on the failed drive.

                      2. 5

                        The readme in the repo for that project says in bold text that it is experimental, which is exactly what I would avoid if I was looking for some reliable backup system… but to each their own.

                        1. 5

                          How was this user error? This RAID array silently corrupted itself. Possibly because of the RAM?

                          the filesystem in the treefort environment being backed by the local SSD storage for speed reasons, began to silently corrupt itself.

                          ZFS checksums the content of each block, so it would have been able to tell you that what you wrote is not what is there anymore. It could also choose the block from the disk that was NOT corrupted by matching the checksum. It would have also stopped changing things the moment it hit inconsistencies.

                          1. 2

                            The drive failed out of the array and they added it back in.

                            1. 4

                              Yeah, but why did the array think it was fine when it had previously failed out?

                              1. 2

                                I don’t know, it’s a reasonable question but doesn’t change that fundamentally it was a user mistake. ZFS may have fewer sharp edges but it’s perfectly possible to do the wrong thing with ZFS too.

                        1. 4

                          I think the post is a bit dismissive of ECS. I ended up going with ECS for my tiny company, because running two EKS clusters (staging and prod), at $0.10/hour each, added appreciably to our AWS bill. But then, we may also be in the minority in thinking that AWS NAT gateways are too expensive.

                          1. 5

                            I think ECS is a great choice for a small company. Sorry if the article gave you a different impression.

                            1. 4

                              No, I agree. 90% of projects would be fine with Heroku. 9% need ECS (and other AWS services). 1% need something even more complicated, like EKS or bare metal.

                              1. 1

                                Is Heroku similar to AppEngine?

                              2. 3

                                However if I were a very small company new to AWS I wouldn’t touch this with a ten foot pole. It’s far too complicated for the benefits and any sort of concept of “portability” is nonsense bullshit

                                I think that’s pretty clear.

                              1. 4

                                Go is an interesting contrast here. Its standard library has HTTP, for example.

                                1. 3

                                  I think HTTP in particular is a special case when it comes to Go, since the language was written for Google to use for their own services, which would almost all use HTTP in some form, and having a well supported, well vetted default implementation just makes sense; you could almost look at HTTP as being up there with stdin/out/err for Go.

                                  1. 4

                                    And there is at least one problem they can’t fix in there (default timeouts)!

                                    1. 1

                                      Oh dear…

                                    2. 1

                                      Also, HTTP is kinda ancient and well understood compared to other stuff. It should be possible to not break API compatibility and still make changes under the hood.

                                    1. 3

                                      The problem with that comment is you might actually want a guarantee that name is not None, and therefore not have to unwrap name on each use.

                                      1. 3

                                        You certainly might!

                                        The problem, though, is that this undermines the blog’s central thesis: “look how much harder doing this thing is in unsafe Rust than it is to do the same thing in C”. We’re making this hard in unsafe Rust only by using a different type in Rust than was used in C (a guaranteed not-null & rather than a simple *), and by the related insistence on guaranteeing that the value is always present, so you can’t use Option, when the C version does not guarantee this and makes no effort to enforce it.

                                        Which is to say, of course it’s harder to guarantee more in unsafe Rust than it is to guarantee less in C.

                                        1. 2

                                          Also, what’s really making this hard is that the “this” — partially initializing structs — is a very unidiomatic thing overall. You shouldn’t need to do this, ever.

                                          The typical use of unsafe is FFI and fancy pointer-based data structures, none of which require thinking about something as scary as the layout of repr(Rust) structs.

                                    1. 3

                                      This is essentially a parser combinator library, right? I’m just wondering why they omitted the term.

                                      1. 3

                                        It might be because they use the result builder pattern/feature of Swift.

                                        1. 3

                                          Yeah. They’ve called their parser stuff parser combinators before. On their homepage there are five videos called “Parser Combinators: Part N” and “Parser Combinators Recap: Part N”, but I guess they wanted to highlight that it’s using result builders now.

                                      1. 16

                                        Part of this is ‘computers are fast now’. I distinctly remember two such moments around 10-15 years ago:

                                        The first was when I was working on a Smalltalk compiler. I had an interpreter that I mostly used for debugging, which was an incredibly naïve AST walker and was a couple of orders of magnitude slower than the compiler. When I was doing some app development work in Smalltalk, I accidentally updated the LLVM .so to an incompatible version and didn’t notice that this meant that the compiler wasn’t running for two weeks - the slow and crappy interpreter was sufficiently fast that it had no impact on the perceived performance of a GUI application.

                                        The second one was when I was writing my second book and decided to do the ePub generation myself (the company that the publisher outsourced it to for my first book did a really bad job). I wrote in semantic markup that I then implemented in LaTeX macros for the PDF version (eBook and camera-ready print versions). I wrote a parser for the same markup and all of the cross-referencing and so on and XHTML emission logic in idiomatic Objective-C (every text range was a heap-allocated object with a heap-allocated dictionary of properties and I built a DOM-like structure and then manipulated it) with a goal of optimising it later. It took over a minute for pdflatex to compile the book. It took under 250ms for my code to run on the same machine and most of that was the process creation / dynamic linking time. Oh, and that was at -O0.

                                        The other part is the user friendliness of the programming languages. I’m somewhat mixed on this. I don’t know anything about the COCO2’s dialect of BASIC; the BASIC that I used most at that time was BBC BASIC. This included full support for structured programming, a decent set of graphics primitives (vector drawing and also a teletext mode for rich text applications), an integrated assembler that was enough to write a JIT compiler, and so on. Writing a simple program was much easier than in almost any modern environment and I hit limitations of the hardware long before I hit limitations of the language. I don’t think I can say that about any computer / language that I’ve used since outside of the embedded space.

                                        1. 6

                                          Hm interesting examples. Though I would say Objective C is screamingly fast compared to common languages like Python, JavaScript (even JITted), and Ruby, even if it’s idiomatic to do a lot of heap allocations.

                                          Though another “computers are fast” moment I had is that sourcehut is super fast even though it’s written entirely in Python:

                                          https://forgeperf.org/

                                          It’s basically written like Google from 2005 (which was crazy fast, unlike now). Even though Google from 2005 was written in C++, it doesn’t matter, because the slow parts all happen in the browser (fetching resources, page reflows, JS garbage collection, etc.)

                                          https://news.ycombinator.com/item?id=29706150

                                          Computers are fast, but “software is a gas; it expands to fill its container… “


                                          Another example I had was running Windows XP in a VM on a 2016 Macbook Air. It flies and runs in 128 MB or 256 MB of RAM! And it has a ton of functionality.


                                          This also reminds me of bash vs. Oil, because the bash codebase was started in 1987! And they are written in completely different styles.

                                          Oil’s Parser is 160x to 200x Faster Than It Was 2 Years Ago

                                          Some thoughts:

                                          • Python is too slow for sure. It seems obvious now, but I wasn’t entirely sure when I started, since I could tell the bash codebase was very suboptimal (and later I discovered zsh is even slower). The parser in Python is something like 30-50x slower than bash’s parser in C (although it uses a Clang-like “lossless syntax tree”, for better error messages, which bash doesn’t have. It also parses more in a single pass.)
                                          • However adding static types, and then naively translating the Python code to C++ actually produces something competitive with bash’s C implementation! (It was slightly faster when I wrote that blog post, is slightly slower now, and I expect it to be faster in the long run, since it’s hilariously unoptimized)

                                          I think it is somewhat surprising that if you take some Python code like:

                                          for ch in s:
                                            if ch == '\n':
                                              pass
                                          

                                          And translating it naively to C++ (from memory, not exactly accurate)

                                          for (it = StrIter(s); !it.Done(); ) {
                                            ch = it.Next();              // each character becomes a heap-allocated string
                                            if (str_equals(ch, str1)) {  // implemented with memcmp()
                                              ;
                                            }
                                          }
                                          

                                          It ends up roughly as fast. That creates a heap-allocated string for every character, like Python does.

                                          Although I am purposely choosing the worst case here. I consciously avoided that “allocation per character” pattern in any place I thought would matter, but it does appear at least a couple of times in the code. (It will be removed, but only in the order that it actually shows up in profiles!)

                                          I guess the point is that there are many more allocations. Although I wrote the Cheney garbage collector in part because allocation is just bumping a pointer.

                                          The garbage collector isn’t hooked up yet, and I suspect it will be slow on >100 MB heaps, but I think the average case for a shell heap size is more like 10 MB.
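
                                          To make the “allocation is just bumping a pointer” point concrete, here is a minimal sketch of a bump allocator over a fixed arena; the names are made up and this is not Oil’s actual GC heap:

                                          #include <cstddef>
                                          #include <cstdlib>

                                          struct BumpHeap {
                                            char* base;
                                            std::size_t used = 0;
                                            std::size_t capacity;

                                            explicit BumpHeap(std::size_t cap)
                                                : base(static_cast<char*>(std::malloc(cap))), capacity(cap) {}

                                            // Allocation just advances an offset into the arena; a Cheney collector
                                            // reclaims space later by copying live objects into a fresh semispace.
                                            void* Allocate(std::size_t n) {
                                              if (used + n > capacity) return nullptr;  // a real GC would collect here
                                              void* p = base + used;
                                              used += n;
                                              return p;
                                            }
                                          };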


                                          I think the way I would summarize this is:

                                          • Some old C code is quite fast and optimized. Surprisingly, Windows XP is an example of this, even though we used to make fun of Microsoft for making bloated code.
                                            • bash’s code is probably 10x worse than optimal, because Oil can match it with a higher level language with less control. (e.g. all strings are values, not buffers)
                                          • Python can be very fast for sourcehut because web apps are mostly glue and I/O. It’s not fast for Oil’s parser because that problem is more memory intensive, and parsing creates lots of tiny objects (the lossless syntax tree).
                                          1. 5

                                            Though I would say Objective C is screamingly fast compared to common languages like Python, JavaScript (even JITted), and Ruby, even if it’s idiomatic to do a lot of heap allocations.

                                            Yes and no. Objective-C is really two languages, C (or C++ for Objective-C++) and Smalltalk. The C/C++ implementation is as good as gcc or clang’s C/C++ implementation. The Smalltalk part is much worse than a vaguely modern Smalltalk (none of the nice things that a JIT does, such as inline caching or run-time-type-directed specialisation). The code that I wrote was almost entirely in the Smalltalk-like subset. If I’d done it in JavaScript, most of the things that were dynamic message sends in Objective-C would have been direct dispatch, possibly even inlined, in the JIT’d JavaScript code.

                                            I used NSNumber objects for line numbers, for example, not a C integer type. OpenStep’s string objects have some fast paths to amortise the cost of dynamic dispatch by accessing a range of characters at once. I didn’t use any of these, and so each character lookup did return a unichar (so a primitive type, unlike your Python / C++ example) but involved multiple message sends to different objects, probably adding up to hundreds of retired instructions.

                                            All of these were things I planned on optimising after I did some profiling and found the slow bits. I never needed to.

                                            Actually, that’s not quite true. The first time I ran it, I think it used a couple of hundred GiBs of RAM. I found one loop that was generating a lot of short-lived objects on each iteration and stuck an autorelease pool in there, which reduced the peak RSS by over 90%.

                                            bash’s code is probably 10x worse than optimal, because Oil can match it with a higher level language with less control. (e.g. all strings are values, not buffers)

                                            I suspect that part of this is due to older code optimising for memory usage rather than speed. If bash (or TeX) used a slow algorithm, things take longer. If they used a more memory-intensive algorithm, then the process exhausts memory and is killed. I think bash was originally written for systems with around 4 MiB of RAM, which would have run multiple bash instances and where bash was mostly expected to run in the background while other things ran, so probably had to fit in 64 KiB of RAM, probably 32 KiB. I don’t know how much RAM Oil uses (I don’t see a FreeBSD package for it?), but I doubt that this was a constraint that you cared about. Burning 1 MiB of RAM for a 10x speedup in a shell is an obvious thing to do now but would have made you very unpopular 30 years ago.

                                            1. 2

                                              Yeah the memory management in all shells is definitely oriented around their line-at-a-time nature, just like the C compilers. I definitely think it’s a good tradeoff to use more RAM and give precise Clang-like error messages with column numbers, which Oil does.

                                              Although one of my conjectures is that you can do a lot with optimization at the metalanguage level. If you look at the bash source code, it’s not the kind of code that can be optimized well. It’s very repetitive and there are lots of correctness issues as well (e.g. as pointed out in the AOSA book chapter which I link on my blog).

                                              So Oil’s interpreter is very straightforward and unoptimized, but the metalanguage of statically typed Python + ASDL allows some flexibility, like:

                                              • interning strings at GC time, or even allocation time (which would make string equality less expensive)
                                              • using 4 byte integers instead of 8 byte pointers (see the sketch after this list). This would make a big difference because the data structures are pointer rich. However it tends to “break” debugging so I’m not sure how I feel about it.
                                                • Zig does this manually but loses type safety / debuggability because all your Foo* and Bar* just become int.
                                              • Optimizing a single hash table data structure rather than the dozens and dozens of linked list traversals that all shells use
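
                                              To illustrate the second bullet, here is a rough sketch with made-up types (not Oil’s real data structures): nodes refer to each other through 4-byte indices into an arena rather than 8-byte pointers, at the cost of an extra lookup and harder debugging:

                                              #include <cstdint>
                                              #include <vector>

                                              struct Node {
                                                uint32_t left;   // index into Arena::nodes, not a Node*
                                                uint32_t right;
                                                int32_t value;
                                              };

                                              struct Arena {
                                                std::vector<Node> nodes;

                                                uint32_t Add(const Node& n) {
                                                  nodes.push_back(n);
                                                  return static_cast<uint32_t>(nodes.size() - 1);
                                                }
                                                Node& Get(uint32_t id) { return nodes[id]; }
                                              };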

                                              All of these things are further off than I thought they would be … but I still think it is a good idea to use the “executable spec” strategy, since codebases like bash tend to last 30 years or so, and are in pretty bad shape now. At a recent conference the maintainer emphasized that the possibility of breakage is one reason that it moves relatively slowly and new features are rejected.

                                              One conjecture I have about software is:

                                              • Every widely used codebase that’s > 100K lines is 10x too slow in some important part, and it’s no longer feasible to optimize
                                              • Every widely used codebase that’s > 1M lines is 100x too slow in some important part, …

                                              (Although ironically even though bash’s man page says “it’s too big and too slow”, it’s actually small and fast compared to modern software!)

                                              I think this could explain your pdflatex observations, although I know nothing about that codebase. Basically I am never surprised that when I write something “from scratch” that it is fast (even in plain Python!), simply because it’s 2K or 5K lines of code tuned to the problem, and existing software has grown all sorts of bells and whistles and deoptimizations!

                                              Like just being within 10x of the hardware is damn good for most problems, and you can even do that in Python! (though the shell parser/interpreter was a notable exception to this! This problem is a little more demanding than I thought)

                                              1. 4

                                                Every widely used codebase that’s > 100K lines is 10x too slow in some important part, and it’s no longer feasible to optimize

                                                That’s an interesting idea. I don’t think it’s universally true, but it does highlight the fact that designing to enable large-scale refactoring is probably the most important goal for long-term performance. Unfortunately I don’t think anyone actually knows how to do this. To give a concrete example, LLVM has the notion of function passes. These are transforms that run over a single function at a time. They are useful as an abstraction because they don’t invalidate the analysis results of any other function. At a high level, you might assume that you could then run function passes on all functions in a translation unit at a time. Unfortunately, there are some core elements of the design that make this impossible. The simplest one is that all values, including globals, have a use-def chain and adding (or removing) a use of a global in a function is permitted in a function pass and this would require synchronisation. If you were designing a new IR from scratch then you’d probably try to treat a function or a basic block as an atomic unit and require explicit synchronisation or communication to operate over more than one. LLVM has endured a lot of very invasive refactorings (at the moment, pointers are in the process of losing the pointee type as part of their type, which is a huge change) but the changes required to make it possible to parallelise this aspect of the compiler are too hard. Instead, it’s worked around with things like ThinLTO.
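
                                                A toy illustration of that synchronisation problem, in generic C++ rather than LLVM’s actual API: if per-function passes ran in parallel while globals kept a single shared use list, every added or removed use would have to take a lock:

                                                #include <mutex>
                                                #include <string>
                                                #include <vector>

                                                struct Use { std::string function; };  // which function references the global

                                                struct GlobalValue {
                                                  std::vector<Use> uses;  // one list shared by every function in the module
                                                  std::mutex mu;          // needed as soon as function passes run concurrently

                                                  void AddUse(const std::string& fn) {
                                                    std::lock_guard<std::mutex> lock(mu);  // serialises otherwise independent passes
                                                    uses.push_back({fn});
                                                  }
                                                };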

                                                I think this could explain your pdflatex observations, although I know nothing about that codebase. Basically I am never surprised that when I write something “from scratch” that it is fast (even in plain Python!), simply because it’s 2K or 5K lines of code tuned to the problem, and existing software has grown all sorts of bells and whistles and deoptimizations!

                                                There are two problems with [La]TeX. The first is that it’s really a programming language with some primitives that do layout. A TeX document is a program that is interpreted one character at a time with an interpreter that looks a lot like a Turing machine consuming its tape. Things like LaTeX and TikZ look like more modern programming or markup languages but they’re implemented entirely on top of this Turing-machine layer and so you can’t change that without breaking the entire ecosystem (and a full TeXLive install is a few GiBs of programs written in this language, so you really don’t want to do that).

                                                The second is that TeX has amazing backwards compatibility guarantees for the output. You can take a TeX document from 1978 and typeset it with the latest version of TeX and get exactly the same visual output. A lot of the packages that exist have made implicit assumptions based on this guarantee and so even an opt-in change to the layout would break things in unexpected ways.

                                                Somewhat related to the first point, TeX has a single-pass output concept baked in. Once output has been shipped to the device, it’s gone. SILE can do some impressive things because it treats the output as mutable until the program finishes executing. For example, in TeX, if you want to do a cross-reference to a page that hasn’t been typeset yet then you need to run TeX twice. The first time will emit the page numbers of all of the labels, the second time will insert them into the cross references. This is somewhat problematic because the first pass will put ‘page ?’ in the output and the second might put ‘page 100’ in the output, causing reflow and pushing the reference to a different place. In some cases this may then cause it to be updated to page 99, which would then cause reflow again. This is made worse by some of the packages that do things like ‘on the next page’ or ‘above’ or ‘on page 42 in section 3’ depending on the context and so can cause a lot of reflowing. In SILE, the code that updates these references can see the change to the layout and if it doesn’t reach a fixed point after a certain number of iterations then it can fall back to using a fixed-width representation of the cross-reference or adding a small amount of padding somewhere to prevent reflows.

                                                1. 1

                                                  … designing to enable large-scale refactoring is probably the most important goal for long-term performance.

                                                  Yes! In the long run, architecture dominates performance. That is one thesis / goal behind Oil’s unusual implementation strategy – i.e. writing it in high level DSLs which translate to C++.

                                                  I’ve been able to refactor ~36K lines of code aggressively over 5 years, and keep working productively in it. I think that would have been impossible with 200K-300K lines of code. In my experience, that’s about the size where code takes on a will of its own :-)

                                                  (Bash is > 140K lines, and Oil implements much of it, and adds a rich language on top, so I think the project could have been 200K-300K lines of C++, if it didn’t fall over before then)

                                                  Another important thesis is that software architecture dominates language design. If you look at what features get added to say Python or Ruby, it’s often what is easy to implement. The Zen of Python even says this, which I quoted here: http://www.oilshell.org/blog/2021/11/recent-progress.html#how-osh-is-implemented-process-tools-and-techniques

                                                  When you add up that effect over 20-30 years, it’s profound!


                                                  The LLVM issues you mention remind me of the talks I watched on MLIR – Lattner listed a bunch of regrets with LLVM that he wants to fix with a new IR. Also I remember him saying a big flaw with Clang is that there is no C++ IR. That is, unlike Swift and the machine learning compiler he worked on at Google, LLVM itself is the Clang IR.

                                                  Also I do recall watching a video about pass reordering, although I don’t remember the details.


                                                  Yes to me it is amazing that TeX has survived for so long, AND that it still has those crazy limitations from hardware that no longer exists! Successful software lasts such a long time.

                                                  TeX and Oil have that in common – they have an unusual “metalanguage”! As I’m sure you know, in TeX it’s WEB and Pascal-H. I linked an informative comment below about that.

                                                  In Oil it’s statically typed Python, ASDL for algebraic types, and regular languages. It used to be called “OPy”, but I might call this collection of DSLs “Pea2” or something.

                                                  So now it seems very natural to mention that I’m trying to fund and hire a compiler engineer to speed up the Oil project:

                                                  https://github.com/oilshell/oil/wiki/Compiler-Engineer-Job (very rough draft)

                                                  (Your original comment about the dynamic parts of Objective C and their speed is very related!)

                                                  What I would like a compiler engineer to do is to rewrite a Python front end in Python, which is just 4K lines of code, but might end up at 8K.

                                                  And then enhance a 3K-line C++ runtime for garbage-collected List<T> and Dict<K, V>. And debug it! I spent most of my time in the debugger.

                                                  This task is already half done, passing 1131 out of ~1900 spec tests.

                                                  https://www.oilshell.org/release/0.9.6/pub/metrics.wwz/line-counts/for-translation.html

                                                  It seems like you have a lot of relevant expertise and probably know many people who could do this! It’s very much engineering, not research, although it seems to fall outside of what most open source contributors are up for.

                                                  I’m going to publicize this on my blog, but I’m letting people know ahead of time. I know there are many good compiler engineers who don’t read my blog, or who don’t read Hacker News, or who have never written open source (i.e. prefer being paid).

                                                  (To fund this, I applied for a 50K euro grant which I’ll be notified of by February, and I’m setting up GitHub sponsors. Progress will also be on the blog.)


                                                  Someone replied to me with nice info about TeX metalanguages: https://news.ycombinator.com/item?id=16526151

                                                  Today, major TeX distributions have their own Pascal(WEB)-to-C converters, written specifically for the TeX (and METAFONT) program. For example, TeX Live uses web2c[5], MiKTeX uses its own “C4P”[6], and even the more obscure distributions like KerTeX[7] have their own WEB/Pascal-to-C translators. One interesting project is web2w[8,9], which translates the TeX program from WEB (the Pascal-based literate programming system) to CWEB (the C-based literate programming system).

                                                  The only exception I’m aware of (that does not translate WEB or Pascal to C) is the TeX-GPC distribution [10,11,12], which makes only the changes needed to get the TeX program running with a modern Pascal compiler (GPC, GNU Pascal).

                                            2. 4

                                              Windows XP is an example of this, even though we used to make fun of Microsoft for making bloated code.

                                              It doesn’t surprise me that we’d feel this way now. From memory (I didn’t like XP enough to have played with it in virtualization at any time since Windows software moved on from supporting it) Windows XP was slow for a few reasons:

                                              1. It included slow features that its predecessor didn’t. Like web rendering on the desktop, indexing for search, additional visual effects in critical paths in the GUI, etc.
                                              2. It needed a lot more RAM than NT4 or 2000 did. Many orgs had sized their PCs for NT 4 and tried to go straight to XP on the same hardware, and MS had been super conservative about minimum RAM requirements. So systems that met the minimums were miserable.
                                              3. (related to 2) It had quite a bit more managed code in the desktop environment, which just chewed RAM.

                                              If you tried to install it on a 16MB or 32MB system that seemed just fine with NT SP6 or 2k, you had a bad time. Now, as you point out, we just toss 256MB at it without thinking. Some of the systems in the field when it was released, that MS told us could run XP, could not take 256MB of RAM.

                                              1. 2

                                                I think you’re mis-remembering the memory requirements of 1990s WinNT a little bit. :-)

                                                I deployed NT 3.1 in production. It just about ran in 16MB, and not well. 32MB was realistic.

                                                NT 4 was OK in 32MB, decent in 64MB, and the last box I gave to someone had 80MB of RAM and it ran really quite well in that.

                                                I deployed an officeful of Win2K boxes in 2000 on Athlons with 128MB of RAM, and 6 months later I had to upgrade them all to 256MB to make it usable. (I was canny; I bought 256MB for half of them, and used the leftover RAM to upgrade the others, to minimise how annoyed my client was at needing to upgrade still-new PCs.)

                                                XP in 128MB was painful, but it was just about doable in 192MB (the unofficial maxed-out capacity of my Acer-made Thinkpad i1200 series 1163G) and acceptable in 256MB.

                                                For an experiment, I ran Windows 2000 (no SPs or anything) on a Thinkpad 701C – the famous Butterfly folding-keyboard machine – in 40MB of RAM. On a 486. It was just marginally usable if you were extremely patient: it booted, slowly, it logged in, very slowly, and it could connect to the Internet, extremely slowly.

                                                1. 2

                                                  I will just believe you… I’m not going to test it :)

                                                  I remember that I had rooms full of PCs that were OK with either NT4 or 2K, and were pretty much unusable on XP despite vendor promises. The fact that I’ve forgotten the exact amounts of RAM where those lines fell is a blessing. I’m astonished but happy that I’ve finally forgotten… it was such a deeply ingrained thing for so long.

                                                  1. 2

                                                    :-D

                                                    That sounds perfectly fair! ;-)

                                                    The thing about RAM usage that surprised me in the early noughties was how much XP grew in its lifetime. When it was new, yeah, 256MB and it ran fairly well. Towards the end of its useful lifetime, you basically had to max out a machine to make it run decently – meaning, as it was effectively 32-bit only, 3 and a half (or so) gigs of RAM.

                                                    One of the things that finally killed XP was that XP64 was a whole different OS (a cut-down version of Windows Server 2003, IIRC) and needed new drivers and so on. So if you wanted good performance, you needed more RAM, and if you needed more than three-and-a-bit gigs of RAM, you had to go to a newer version of Windows to get a proper 64-bit OS.

                                                    For some brave souls that meant Vista (which, like Windows ME, was actually fairly OK after it received a bunch of updates). But for most, it meant Windows 7.

                                                    And that in turn is why XP was such a long-lived OS, as indeed was Win7.

                                                    Parenthetical P.S.: whereas, for comparison, a decent machine for Win7 in 2009 – say a Core i5 with 8GB of RAM – is still a perfectly usable Windows 10 21H2 machine now in 2022. Indeed I bought a couple of Thinkpads of that sort of vintage just a couple of months ago.

                                                2. 1

                                                  Yeah I think all of that is true (although I don’t remember any managed code). So I guess my point is that the amount of software bloat is just way worse now, so software with small amounts of bloat like XP seems ultra fast.

                                                  Related thread from a month ago about flatpak on Ubuntu:

                                                  https://lobste.rs/s/ljsx5r/flatpak_is_not_future#c_upxzcl

                                                  One commenter pointed out SSDs, which I agree is a big part of it, but I think we’ve had multiple hardware changes that are orders-of-magnitude increases since then (CPU, memory, network). And ALL of it has been chewed up by software. :-(

                                                  And I don’t think this is an unfair comparison, because Windows XP had networking and a web browser, unlike say comparing to Apple II. It is actually fairly on par with what a Linux desktop provides.

                                              2. 4

                                                I cut my teeth on AppleSoft BASIC in the 1980s. The only affordance for “structured programming” was GOSUB and the closest thing there was to an integrated assembler was a readily accessible system monitor where you could manually edit memory. The graphics primitives were extremely limited. (You could enable graphics modes, change colors, toggle pixels, and draw lines IIRC. You might have been able to fill regions, too, but I can’t swear to that.) For rich text, you could change foreground and background color. Various beeps were all you could do for sound, unless you wanted to POKE the hardware directly. If you did that you could do white noise and waveforms too. I don’t have enough time on the CoCo to say so with certainty, but I believe it was closer to the Apple experience than what you describe.

                                                The thing that I miss about it most, and that I think has been lost to some degree, is that the system booted instantly to a prompt that expected you to program it. You had to do something else to do anything other than program the computer. That said, manually managing line numbers was no picnic. And I’m quite attached to things like visual editing and syntax highlighting these days. And while online help/autocomplete is easier than thumbing through my stack of paper documentation was, I might have learned more, more quickly, from that paper.

                                                1. 2

                                                  Before Applesoft BASIC there was Integer BASIC, which came with the Mini-Assembler. It was very crappy though, and not a compelling alternative to graph paper and a copy of the instruction set. I remember a book on game programming on the Apple II that spent almost half the book writing an assembler in Applesoft BASIC, just to get to the good part!

                                                  1. 1

                                                    I remember Integer BASIC only because there were a few systems around our school where you needed to type “FP” to get to Applesoft before your programs would work. I don’t remember the Mini-Assembler at all.

                                                  2. 1

                                                    Color BASIC on the CoCo was from Microsoft, and it wasn’t too different from some of the other BASICs of the time, but did require a little adaptation for text-only stuff. Extended Color BASIC (extra cost option early on) added some graphics commands in various graphics modes. With either version of Color BASIC, the only structured programming was via GOSUB. Variable names were limited to one or two letters for floats and for strings.

                                                    Unfortunately, the CoCo didn’t ship with an assembler / debugger built-in; you had to separately buy the EDTASM cartridge (or the later floppy disk version).

                                                  3. 4

                                                    My “computers are fast” moment: I was trying to get better image compression. I discovered that an existing algorithm randomly generated better or worse results depending on hyperparameters, so I just tried a bunch of them in a loop to find the best:

                                                    for (int i = 0; i < 100; i++) {
                                                       result = try_with_parameter(i);    // score for hyperparameter i
                                                       if (result > best) best = result;  // keep the best score seen so far
                                                    }
                                                    

                                                    And it worked great, still under a second. Then I found the original 1982 paper about this algorithm, where they said their university mainframe took 2 minutes per try, on way smaller data. Now I know why they hardcoded the parameters instead of finding the best one.

                                                    1. 12

                                                      A lot of the new and exciting work in compilers for the last 20 years has been implementing algorithms that were published in the ‘80s but ignored because they were infeasible on available hardware. When LLVM started doing LTO, folks complained that you couldn’t link a release build of Firefox on a 32-bit machine anymore. Now a typical dev machine for something as big as Firefox has at least 32 GiB of RAM and no one cares (though they do care that fat LTO is single threaded). The entire C separation of preprocessor, compiler, assembler, and linker exists because each one of those could fit independently in RAM on a PDP-11 (and the separate link step originates because Mary Allen Wilkes didn’t have enough core memory on an IBM 704 to fit a Fortran program and all of the library routines that it might use). Being able to fit an entire program in memory in an intermediate representation over which you could do whole-program analysis was unimaginable.

                                                      TeX has a fantastic dynamic programming algorithm for finding optimal line breaking points in a paragraph. In the paper that presents the algorithm, it explains that it would also be ideal to use the same algorithm for laying out paragraphs on the page but doing this for a large document would require over a megabyte of memory and so is infeasible. SILE does the thing that the TeX authors wished they could do, using the algorithm exactly as they described in the paper.
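
                                                      For the curious, the core of that dynamic program fits in a few lines. This is a toy version with fixed-width words and squared-slack badness, with illustrative names; it is not the real Knuth-Plass algorithm with glue, penalties, and stretch/shrink:

                                                      #include <limits>
                                                      #include <vector>

                                                      // Returns the 0-based index of the first word on each line, last line first.
                                                      std::vector<int> BreakLines(const std::vector<int>& word_widths, int line_width) {
                                                        const long long INF = std::numeric_limits<long long>::max();
                                                        int n = static_cast<int>(word_widths.size());
                                                        std::vector<long long> best(n + 1, INF);  // best[j] = minimal badness for the first j words
                                                        std::vector<int> prev(n + 1, 0);          // prev[j] = words typeset before the last line
                                                        best[0] = 0;
                                                        for (int j = 1; j <= n; j++) {
                                                          int width = 0;
                                                          for (int i = j; i >= 1; i--) {  // try putting words i..j on the final line
                                                            width += word_widths[i - 1] + (i < j ? 1 : 0);  // one unit of space between words
                                                            if (width > line_width) break;
                                                            long long slack = line_width - width;
                                                            if (best[i - 1] != INF && best[i - 1] + slack * slack < best[j]) {
                                                              best[j] = best[i - 1] + slack * slack;
                                                              prev[j] = i - 1;
                                                            }
                                                          }
                                                        }
                                                        std::vector<int> starts;
                                                        for (int j = n; j > 0; j = prev[j]) starts.push_back(prev[j]);
                                                        return starts;
                                                      }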

                                                    2. 2

                                                      RIGHT!? A few years ago, I stumbled upon an old backup CD on which, around 2002 or so, I dumped a bunch of stuff from my older, aging Pentium II’s hard drive. This included a bunch of POV-Ray files that, I think, are from 1999 or so, one of which I distinctly recall taking about two hours to render at a nice resolution (800x600? I don’t think I’d have dared try 1024x768 on that). It was so slow that you could almost see every individual pixel coming up into existence. In a fit of nostalgia I downloaded a more recent version of POV-Ray and after some minor fiddling to get it working with modern POV-Ray versions, I tried to render it at 1024x768. It took a few seconds.

                                                      I was somewhat into 3D modelling at the time but I didn’t have the computer to match. Complicated scenes required some… creative fiddling. I’d do various parts of the scene in Moray 2 (anyone remember that?) in several separate files, so I could render them separately while working on them. That way it didn’t take forever to do a render. I don’t recall why (bugs in Moray? poor import/copy-paste support when working with multiple files?) but I’d then export all of these to POV-Ray, paste them together by hand, and then do a final render.

                                                      I don’t know what to think about language friendliness either, and especially programming environment friendliness. I’m too young for 1987 so I can’t speak for BASIC and the first lines of code I ever wrote were in Borland Pascal, I think. But newer environments weren’t all that bad either. My first real computer-related job had me doing things in Flash, which was at version… 4 or 5, I think, back then? Twenty years later, using three languages (HTML, CSS and JS), I think you can do everything you could do in Flash (a recent-ish development, though – CSS seems to have improved tremendously in the last 5-7 years), but with orders of magnitude more effort, and in a development environment that’s significantly worse in just about every way there is. Thank God that dreadful Flash plugin is dead, but still…

                                                      For a long time I thought this was mostly a “by programmers, for programmers” thing – the march of progress inevitably gave rise to more complex tools, which not everyone could use, so we were generally better off, but non-programmers were not. For example, lots of people at my university lamented the obsolescence of Turbo C – they were electrical engineers who mostly cared about programming insofar as it allowed them to crunch numbers quickly and draw pretty graphics. Modern environments could do a lot more things, but you also paid the price of writing a lot more boilerplate in order to draw a few circles.

                                                      But after I’ve been at it for a while I’m not at all convinced things are quite that simple. For example, lots of popular application development platforms today don’t have a decent GUI builder, or any GUI builder at all, for that matter, and writing GUI applications feels like an odd mixture of “Holy crap the future is amazing!” and “How does this PDP-11 fit in such a small box!?”. Overall I do suppose we’re better off in most ways but there’s been plenty of churn that can’t be described as “progress” no matter how much you play with that word’s slippery definition.

                                                      Edit: on the other hand, there’s a fun thread over at the Retrocomputing SO about how NES games were developed. This is a development kit. Debugging involved quite some creativity. Now you can pause an emulator and poke through memory at will. Holy crap is the future awesome!

                                                      1. 1

                                                        I’ve been thinking about UI builders, and from my experience, I think they’ve fallen out of favor largely because the result is harder to maintain than a “UI as code” approach.

                                                        1. 4

                                                          They haven’t really kept up with the general shift in development and business culture, that’s true. The “UI description in one file, logic in another file, with boilerplate to bind them” paradigm didn’t make things particularly easy to maintain, but it was also far more tolerable at a time when shifting things around in the UI was considered a pretty bad idea rather than an opportunity for an update (and beefing up some KPIs and so on).

                                                          A great deal of usefulness at other development stages has been lost though. At one point we used to be able to literally sit in the same room as the UX folks (most of whom had formal, serious HCI education but that’s a whole other can of worms…) and hash out the user interfaces based on the first draft of a design. I don’t mean new wireframes or basic prototypes, I mean the actual UI. The feedback loop for many UI change proposals was on the order of minutes, and teaching people who weren’t coders how to try them out themselves essentially involved teaching them what flexible layouts are and how to drag’n’drop controls, as opposed to a myriad CSS and JS hacks. For a variety of reasons (including technology) interfaces are cheaper to design and implement today, but the whole process is a lot slower in my experience.

                                                    1. 5

                                                      The really interesting bit here is the fact that close can be surprisingly slow, so if you really need to care about it (like in this case), you should probably call it on a worker thread.
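
                                                        A minimal sketch of that idea in generic C++ (not the article’s code, and the function name is made up): hand the descriptor to another thread so a slow close(2) stays off the latency-sensitive path. A real implementation would probably push fds onto a queue drained by one long-lived worker instead of spawning a thread per descriptor:

                                                        #include <unistd.h>

                                                        #include <thread>

                                                        // close(2) can block for a surprisingly long time, so run it on a
                                                        // separate thread instead of the thread that cares about latency.
                                                        void CloseInBackground(int fd) {
                                                          std::thread([fd] { close(fd); }).detach();
                                                        }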

                                                      1. 1

                                                        That is probably fine in this use case, but has some annoying interactions with UNIX semantics for file descriptors. UNIX requires that any new file descriptor will have the number of the lowest unallocated file descriptor. This means that any file-descriptor-table manipulation requires some serialisation (one of the best features of io_uring is providing a separate namespace for descriptors that is local to the ring and so can be protected by a lock that’s never contended). I think Linux uses RCU here, other systems typically use locks. Some userspace code actually depends on this and so localises all fd creation / destruction to a single thread. If the program isn’t already multithreaded then adding a thread that calls close can break it.

                                                      1. 2

                                                        I can recommend using managed DB services, e.g. AWS RDS, to offload the DB server management. You get backups and other features without having to implement all that yourself.

                                                        1. 1

                                                            Are there any estimates of how heavy use of generics would impact compile time?

                                                          1. 1

                                                            Yes, in the release notes.

                                                            1. 5

                                                              Thank you. (exact paragraph)

                                                          1. 3

                                                            What a coincidence, I was looking into learning Agda yesterday. Considering this short guide is far from complete and its last chapter is marked deprecated, are there any other good resources for learning Agda? As far as the ideas themselves of dependent types go, I’ve read “The Little Typer”, which gave me a good understanding of the fundamentals, but I’d like to read some more about things specific to Agda and the ways it is unique in the proof assistant landscape.

                                                            1. 3

                                                                  I do wish there was a detailed comparison review / table / something of tools in the dependent types / proof assistant space… As someone trying to learn about this area, I currently find it quite hard to work out how to choose between Coq, Idris, Agda, Lean, etc. At the moment, it feels like the people who know the area deeply enough to write such a thing are also heavily involved in one specific camp…

                                                              If anyone knows of such a resource, please do share it!

                                                              1. 5

                                                                I don’t know of such a resource, but as a very quick overview:

                                                                    • Coq really is a theorem prover based on tactics, with support for proof automation. Its syntax is ML-like. In Coq, one can write automated procedures to get rid of similar (or not-so-similar) cases for you. A great example is the Lia tactic, which automatically solves many proof goals with equalities and inequalities in them using some kind of algorithm. You never have to see the final proof object. It’s quite convenient. Another thing Coq has is extraction. It has built-in functionality for turning Coq code into, say, Haskell or OCaml. It’s not ideal, but you can verify a Coq program and then use it in some other project. The main downside of the language is, to me, its size. There are multiple sub-languages in Coq, with similar but different syntaxes. It’s a lot to learn. I also think the documentation could use some work.
                                                                    • Where Coq lets you prove things via “tactics”, Agda lets you do so using standard proof objects and functions. I’ve seen it used for formalized mathematics and computer science (“Programming language foundations in Agda”, “Univalent foundations in Agda”), and also in multiple recent PL papers (a not-so-recent example is Hazel, a programming language whose semantics were formalized in Agda I believe). However, it’s also a much nicer language for actual dependently typed programming. It’s got a Haskell-like syntax and also some pretty nice support for custom notation and operators.
                                                                    • Idris is also a Haskell-like language and it’s really geared towards dependently typed programming rather than proofs. It has some helper functions in its standard library for basic things (replacing one value with another equal value in a type, for example) but it’s far from convenient to write. In fact, Idris2 is (last I checked) not even sound. In it, Type : Type, which I believe can lead to a version of Russell’s paradox. Thus, you can technically prove anything. Idris’s new tagline is “language for type-driven development”. The approach that Edwin, the language’s creator, wants you to take is that of writing down types with a hole, and then gradually filling in the hole with code using the types to guide you.

                                                                I know you asked for a detailed review, but I thought maybe this would help.
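
                                                                 To make the tactics-versus-proof-objects distinction a bit more concrete, here’s a rough sketch in Lean 4 syntax (not Coq or Agda; Lean supports both styles, which makes it convenient for a side-by-side). The statement and names are purely illustrative:

                                                                 ```lean
                                                                 -- Illustrative only: the same trivial fact proved two ways in Lean 4.

                                                                 -- Tactic style (the Coq way of working): the proof is found by a
                                                                 -- command (`simp` here); the underlying proof term is generated for
                                                                 -- you and you never have to look at it.
                                                                 theorem zero_add_tactic (n : Nat) : 0 + n = n := by
                                                                   simp

                                                                 -- Term style (the Agda way of working): the proof is an ordinary
                                                                 -- recursive function on n, written out explicitly.
                                                                 theorem zero_add_term : ∀ n : Nat, 0 + n = n
                                                                   | 0     => rfl
                                                                   | n + 1 => congrArg Nat.succ (zero_add_term n)
                                                                 ```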

                                                                1. 1

                                                                  Thank you! Does anyone know enough about Lean to add a quick overview of it?

                                                              2. 3

                                                                Are there any other good resources for learning Agda

                                                                I’m going through PLFA, and there is an ongoing Discord study group here. For a concise 40-page introduction to Agda, see Dependently Typed Programming in Agda.

                                                                You might find some of the links I collect here on Agda interesting: https://www.srid.ca/agda

                                                                1. 2

                                                                  Programming Language Foundations in Agda looks useful. Haven’t had the chance to dig into it yet though.

                                                                  1. 1

                                                                     If you’re interested in using Agda as a vehicle for learning type theory, there’s also Introduction to Univalent Foundations of Mathematics with Agda, which isn’t exactly beginner material, though. Agda itself also supports cubical types, which kind of incepts the whole thing.

                                                                    In general, most of the material in Agda I’ve seen is oriented towards type theory and/or programming language research. It’d be interesting to see other mathematics as well.

                                                                    1. 1

                                                                      Thanks for the pointer! How feasible would it be in your opinion to understand this with only a few definitions’ worth of category theory knowledge?

                                                                  1. 4

                                                                    I have a utilities library for another language and I’ve found that the grab-bag, misc nature of it has made people reluctant to use it. Any advice?

                                                                    1. 6

                                                                      My advice is that people will use your library if it’s less work to find it, understand it, integrate it, and consume it, when compared to writing their own utility functions. For a utilities library, this is highly unlikely to be the case.

                                                                      The remarkable thing about left-pad to me is that these conditions somehow became true, which is a huge credit to npm.

                                                                      1. 2

                                                                        Any advice?

                                                                        Well, seeing as I released this package literally today, I might not be the best person to give advice – maybe no one will use _ either!

                                                                        That said, here are a few things that I’d look for in a utilities package and that I’m aiming to deliver with _:

                                                                        • How much do I trust the quality of the code? For _, I’m emphasizing the shortness of the code and hoping that reading it will get people to trust the quality (this depends both on them being willing to read it and on them thinking it’s well-written). But this could also be based on the reputation of the author/professional backing, test coverage, etc.
                                                                        • Does the library seem likely to be well maintained? Part of the point of a utility library is to simplify common tasks. But if the library becomes abandoned, it would have the opposite effect. For _, I’m trying to address this by putting thought into the library’s future and being transparent about my thoughts. (See, e.g., the 5,000+ word OP)
                                                                        • Will the library maintain backwards compatibility or break when I need it most? As with the previous bullet, a utility library that breaks my code is having the opposite of the effect I want. This part is still a WIP for _, but I’m trying to come up with the strongest promise of backward compatibility that I can reasonably keep.
                                                                        • Does the library have functions in a “happy medium” of complexity? If they’re too simple, I’d just implement them myself instead of using the library; if they’re too complex, I’d be willing to take on a dedicated dependency for that feature rather than leave it to a utility library. This is fairly subjective; I’ve tried to strike the correct balance with _, but I’ll have to see how many users agree.
                                                                        1. 4

                                                                          maybe no one will use _ either

                                                                          first impression: this is a terrible name, and not just because it’s already in use by gettext

                                                                          1. 1

                                                                            Can you say more about why _ strikes you as a bad name? Does the spelled-out version I also used (“lowbar”) strike you the same way?

                                                                            Names are important things, and I’m curious about your views.

                                                                            1. 5

                                                                              “I used an underscore character to name my library. Many people will think it’s named after this other similar library that also uses an underscore for its name, but it’s actually not.” <- This makes no sense at all. I’ve never heard an underscore called a lowbar, and citing the HTML spec comes across as “see I’m technically correct” pedantry.

                                                                              1. 2

                                                                                That’s an entirely fair criticism of my post – so much so that I upvoted.

                                                                                (The bit about the HTML spec was intended as … well, not quite a joke, but making light of myself for needing to pick an obscure term for the character – I also hadn’t heard a ‘_’ called a “lowbar” before. Clearly that not-quite-a-joke didn’t land).

                                                                                 What I was trying to say is that _ fits so well with Raku’s existing use of $_, @_, and %_ that I decided to go with the name anyway – even though I view the name collision with underscore.js and lodash.js as unfortunate and 100% accept that most people will think that _ is named in homage to lodash.

                                                                                1. 2

                                                                                  Yeah, I’m speaking without any specific knowledge of Raku. I just think that if the library does catch on, people will pronounce it as “underscore” whether you want them to or not. =)

                                                                                  The thing that the underscore character reminds me of is when you’re writing a function that accepts more arguments than it needs, and you’re using an underscore to communicate that it’s unused: https://softwareengineering.stackexchange.com/questions/139582/which-style-to-use-for-unused-return-parameters-in-a-python-function-call (the link mentions Python and I’ve never used Python, but it’s common in nearly every language I do use)
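
                                                                                   For instance, a tiny (made-up) Python sketch of that convention:

                                                                                   ```python
                                                                                   # Made-up example: a callback that must accept two arguments but only
                                                                                   # uses the first; the leading underscore marks the second as unused.
                                                                                   def on_click(event, _context):
                                                                                       print(event)

                                                                                   # A bare underscore is also common when unpacking values you don't need.
                                                                                   name, _ = ("alice", 42)  # only the name matters here
                                                                                   ```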

                                                                              2. 1

                                                                                Can you say more about why _ strikes you as a bad name? Does the spelled-out version I also used (“lowbar”) strike you the same way?

                                                                                I dislike this name because it’s difficult to search for and lowbar is a relatively obscure term. If the intent is that every Raku program includes the library, then you could call it Prelude. That’s what other languages such as Haskell call a library of functions that are implicitly included in every program.

                                                                                1. 2

                                                                                  On the other hand, lodash uses _ and is pretty well known in the JS land.

                                                                                  1. 1

                                                                                    there’s also underscore.js https://underscorejs.org/

                                                                          2. 1

                                                                             I sometimes use such libraries if I have no other choice, but it would probably be much better to split them into smaller, more focused libraries?

                                                                            1. 2

                                                                              But then we’re back to the left-pad dilemma!

                                                                              1. 1

                                                                                Not if you don’t rely on a mutable third party store for your production builds :)

                                                                                1. 3

                                                                                   But you’ve just pushed the workload somewhere else… now you have to separately monitor for, and manually pull in, any bug fixes and security patches that get made in the dependency. Vendoring (whether in the traditional sense, or merely by snapshotting some specific state of a remote source) is a valid approach, but no panacea.

                                                                          1. 2

                                                                            I wonder how small you could make the instruction set and move extra stuff to the bootstrap/self-extension. That’s a lot of ops!

                                                                            1. 3

                                                                              I wrote some Agda formalizations here: https://github.com/wolverian/dawn/blob/main/Dawn.agda

                                                                              1. 4

                                                                                This was a very good and in-depth post. I learned that perf had a ton of counters. Thanks!

                                                                                1. 4

                                                                                  At first I thought this would be using a known database of CSE material, stored in hashed form or such, that could simply be indexed against and thought it was a great idea. To me, that is the ideal non-intrusive method of identifying users storing illegal material of this nature (and, best case scenario, finding a collision within some hashing method). Finding out that it’s an “AI” that scans all your images and sends them to an Apple-approved review board was really disheartening. It feels like this is just a convoluted method for Apple to get governments and grant agencies to fund their AI research in some way.

                                                                                  1. 8

                                                                                    Did you read the technical summary from Apple? Sounds like a modern take on PhotoDNA using a neural network to me.

                                                                                    NeuralHash is a perceptual hashing function that maps images to numbers. Perceptual hashing bases this number on features of the image instead of the precise values of pixels in the image.

                                                                                    The main purpose of the hash is to ensure that identical and visually similar images result in the same hash, and images that are different from one another result in different hashes. For example, an image that has been slightly cropped or resized should be considered identical to its original and have the same hash.

                                                                                    1. 3

                                                                                      identical and visually similar images result in the same hash, and images that are different from one another result in different hashes

                                                                                      Mostly ignorant questions: Do we not know from adversarial classifier research that, given “similar” and “different” are meant to be meaningful in a human sense, this is impossible? Are they going to publish their false positive rate? Has this been tested against adversarial images or just with a normal dataset?

                                                                                      1. 4

                                                                                        I’m not 100% sure what you’re asking, but NeuralHash isn’t a classifier in the usual sense, i.e. it can’t say “this is a picture of a dog” or whatever. It has specifically been trained to detect if two images are literally the same image.

                                                                                        Consider trying to detect known bad images with SHA-256. Someone wanting to circumvent that detection could save as png instead of jpg, or crop 1px off the side, or brighten by 1%. Any small change like that would defeat fingerprinting based on SHA-256 or other conventional hashing algorithms. Clearly this problem calls for another solution.

                                                                                        NeuralHash—and its predecessor PhotoDNA, a procedural algorithm—doesn’t make any inferences about the conceptual contents of the image. Instead it generates a hash that can match two images together even if they’ve been edited. The images don’t have to be similar or different in a strictly human sense. For example, NeuralHash would not produce a match for two different images of paint splatter. The edges of the splatters would be at different distances and angles relative to each other. Humans may consider two such images as visually similar, or perhaps even indistinguishable without close inspection, but NeuralHash doesn’t measure that sort of visual similarity.
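
                                                                                          To make the contrast concrete, here’s a rough sketch in Python (assuming Pillow is installed) of a classic “difference hash” (dHash). It is far simpler than, and not representative of, NeuralHash or PhotoDNA, but it shows the general idea: a small edit barely moves the perceptual hash, while a cryptographic hash of the bytes changes completely.

                                                                                          ```python
                                                                                          # Illustrative only: a classic "difference hash" (dHash), NOT Apple's
                                                                                          # NeuralHash. Requires Pillow (`pip install Pillow`).
                                                                                          import hashlib

                                                                                          from PIL import Image, ImageDraw


                                                                                          def dhash(image, hash_size=8):
                                                                                              """Compare horizontally adjacent pixels of a grayscale thumbnail
                                                                                              and pack the 8x8 comparisons into a 64-bit integer."""
                                                                                              small = image.convert("L").resize((hash_size + 1, hash_size))
                                                                                              pixels = list(small.getdata())
                                                                                              bits = 0
                                                                                              for row in range(hash_size):
                                                                                                  for col in range(hash_size):
                                                                                                      left = pixels[row * (hash_size + 1) + col]
                                                                                                      right = pixels[row * (hash_size + 1) + col + 1]
                                                                                                      bits = (bits << 1) | (left > right)
                                                                                              return bits


                                                                                          def hamming(a, b):
                                                                                              return bin(a ^ b).count("1")


                                                                                          # A test image and a slightly brightened copy of it.
                                                                                          original = Image.new("RGB", (256, 256), "white")
                                                                                          ImageDraw.Draw(original).ellipse((40, 40, 200, 220), fill="black")
                                                                                          edited = original.point(lambda value: min(255, value + 10))

                                                                                          # The perceptual hashes stay (almost) identical...
                                                                                          print(hamming(dhash(original), dhash(edited)))  # 0 or very small

                                                                                          # ...while the cryptographic hashes of the raw pixels are unrelated.
                                                                                          print(hashlib.sha256(original.tobytes()).hexdigest()[:16])
                                                                                          print(hashlib.sha256(edited.tobytes()).hexdigest()[:16])
                                                                                          ```

                                                                                          dHash packs one bit per adjacent-pixel comparison of a thumbnail, so “similar” here means “the same image up to small edits”, not “looks alike to a human” – which is exactly the distinction the paint-splatter example above is getting at.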

                                                                                        1. 4

                                                                                          If it’s not literally a SHA of the file, then it’s some kind of imprecise comparison, and calling it “neural” sure makes it sound like a classifier. Presumably the result of the comparison algorithm is supposed to be “a human would consider B to be an edited version of A”. Otherwise it would again be trivial to fool, just by changing one pixel.

                                                                                          So if it’s an imprecise algorithm supposedly mimicking a human decision, the natural question is, what happens if an adversary tries to induce a false positive? Impossible with a file hash, but I’m suspecting significantly less impossible with this algorithm. At any rate, I’m just asking, has anyone tried?

                                                                                          1. 2

                                                                                            Using the name “Neural” likely just means the hashing algorithm has been optimised to run on the Apple Neural Engine accelerator, for the sake of speed and power efficiency.

                                                                                            What happens if an adversary tries to induce a false positive? About the only consequence I can think of is that the person at Apple who performs the human review step has a slightly nicer day because they got to look at an image which isn’t CSAM.

                                                                                            1. 2

                                                                                              According to the article:

                                                                                              Matthew Green, a top cryptography researcher at Johns Hopkins University, warned that the system could be used to frame innocent people by sending them seemingly innocuous images designed to trigger matches for child pornography. That could fool Apple’s algorithm and alert law enforcement. “Researchers have been able to do this pretty easily,” he said of the ability to trick such systems.

                                                                                              So it would seem so.

                                                                                               I’m not entirely sure this needs to be an issue as such. If a match triggers an immediate automatic shutdown of your account, then yes, it’s a problem. But as I read this, it merely “flags” your account for closer inspection, and you may not even know it happened. And as I understand it, false matches are mostly limited to intentionally crafted images rather than accidental false matches (like e.g. Tumblr’s mess of a “nudity detector”).

                                                                                               The bigger issue is the sending of actual child pornography: you don’t control the messages you receive over email, iMessage, WhatsApp, etc., and a malicious “here are the pictures you asked for 😉” message could potentially land you in some trouble. It essentially opens you up to a new type of “digital swatting” attack, and one that could be hard to defend against in the case of persistent and motivated attackers.

                                                                                              1. 2

                                                                                                Photos over Messages are not automatically added to iCloud Photos. WhatsApp has an option to do that, I think.

                                                                                                1. 1

                                                                                                  One of the concerns that occured to me is that if adversarial testing hasn’t been a focus, it might turn out to be pretty easy to generate a false positive, and then you’ve not only got a 4chan-ready DoS attack on iCloud but a “mass swatting” opportunity. Imagine an adversarial meme image goes popular, and once you receive ten of them your account gets locked and you’re reported to the authorities.

                                                                                                  1. 1

                                                                                                    But some meme is harmless? Who cares if some harmless meme is reported to the authorities.

                                                                                        2. 4

                                                                                          You might be confusing the Messages feature and the iCloud feature.

                                                                                        1. 3

                                                                                          I foresee lots of horny teenagers causing lots of very awkward problems.

                                                                                          1. 17

                                                                                            How so? Do teenagers frequently exchange known images of child pornography that the relevant authorities have already reviewed for inclusion in this database?

                                                                                            1. 3

                                                                                              To be fair, the Messages feature doesn’t match against a list of hashes but actually does do ML to detect suspicious photos. That said, it’s only for under-13s, who are not teenagers.

                                                                                              1. 5

                                                                                                 Additionally, that particular feature does not communicate with Apple at all, only with the parents or guardians on the account. So it might mean an awkward conversation with a parent, but Apple never gets involved.