1. 12

    You don’t have to use the golden ratio; multiplying by any constant with ones in the top and bottom bits and about half those in between will mix a lot of input bits into the top output bits. One gotcha is that it only mixes less-significant bits towards more-significant ones, so the 2nd bit from the top is never affected by the top bit, 3rd bit from the top isn’t affected by top two, etc. You can do other steps to add the missing dependencies if it matters, like a rotate and another multiply for instance. (The post touches on a lot of this.)
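
    In case it helps to see it, the core trick is tiny. Here is a rough Go sketch (the constant is the usual 2^64/φ; any odd constant with well-spread bits works), taking the top bits as the table index since that is where the multiply has mixed the most input bits:

    // fibHash returns an index into a table of size 1<<bits.
    func fibHash(x uint64, bits uint) uint64 {
        const k = 0x9E3779B97F4A7C15 // ~2^64 / golden ratio; odd, ones spread top to bottom
        return (x * k) >> (64 - bits)
    }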

    FNV hashing, mentioned in the article, is an old multiplicative hash used in DNS, and the rolling Rabin-Karp hash is multiplicative. Today Yann Collet’s xxHash and LZ4 use multiplication in hashing. There have got to be a bajillion other uses of multiplication for non-cryptographic hashing that I can’t name, since it’s such a cheap way to mix bits.

    It is, as the author says, kind of interesting that something like a multiplicative hash isn’t the default cheap function everyone’s taught. Integer division to calculate a modulus is maybe the most expensive arithmetic operation we commonly do when the modulus isn’t a power of two.

    1. 1

      Nice! About the leftward bit propagation: can you do multiplication modulo a compile time constant fast? If you compute (((x * constant1) % constant2) % (1<<32)) where constant1 is the aforementioned constant with lots of ones, and constant2 is a prime number quite close to 1<<32 then that would get information from the upper bits to propagate into the lower bits too, right? Assuming you’re okay with having just slightly fewer than 1<<32 hash outputs.

      (Replace 1<<32 with 1<<64 above if appropriate of course.)
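
      Concretely I mean something like this rough sketch, where 4294967291 is the largest prime below 1<<32; the question is whether that % can be made cheap:

      // hash32 is illustrative only: multiply by a bit-spreading constant,
      // then reduce modulo a prime just under 2^32.
      func hash32(x uint64) uint32 {
          const c1 = 0x9E3779B97F4A7C15
          const c2 = 4294967291 // 2^32 - 5, prime
          return uint32((x * c1) % c2)
      }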

      1. 1

        You still have to do the divide for the modulus at runtime and you’ll wait 26 cycles for a 32-bit divide on Intel Skylake. You’ll only wait 3 cycles for a 32-bit multiply, and you can start one every cycle. That’s if I’m reading the tables right. Non-cryptographic hashes often do multiply-rotate-multiply to get bits influencing each other faster than a multiply and a modulus would. xxHash arranges them so your CPU can be working on more than one at once.
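
        The multiply-rotate-multiply shape is roughly this (Go sketch; the constants are just placeholders with well-spread bits, not from any particular hash):

        // mix runs multiply, rotate, multiply so high input bits end up
        // influencing low output bits as well as the other way around.
        func mix(x uint64) uint64 {
            x *= 0x9E3779B97F4A7C15
            x = x<<29 | x>>35 // rotate left by 29
            x *= 0xC2B2AE3D27D4EB4F
            return x
        }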

        (But worrying about all bits influencing each other is just one possible tradeoff, and, e.g. the cheap functions in hashtable-based LZ compressors or Rabin-Karp string search don’t really bother.)

        1. 1

          you’ll wait 26 cycles for a 32-bit divide on Intel Skylake

          And looking at that table, 35-88 cycles for a 64-bit divide. Wow. That’s so many cycles; I didn’t realize. But I should have: on a 2.4 GHz processor, 26 cycles is 10.83 ns per op, which is roughly consistent with the author’s measurement of ~9 ns per op.

          1. 1

            That’s not what I asked. I asked a specific question.

            can you do multiplication modulo a compile time constant fast?

            similarly to how you can do division by a constant fast by implementing it as multiplication by the divisor’s multiplicative inverse in the group of integers modulo 2^(word size). clang and gcc already perform this optimisation out of the box for division by a constant. What I was asking is whether there’s a similar trick for modulo by a constant. You obviously can do (divide by divisor, multiply by divisor, subtract from original number), but I’m wondering if there’s something quicker with a shorter dependency chain.
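
            For reference, the long way round I mean is this trivial sketch (the n/7 already compiles to a multiply and shifts, so there’s no hardware divide, but you still pay for the multiply-and-subtract tail at the end of the dependency chain):

            // mod7 computes n % 7 as n - (n/7)*7.
            func mod7(n uint64) uint64 {
                q := n / 7 // the compiler turns this into multiply + shift
                return n - q*7
            }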

            1. 1

              OK, I get it. Although I knew about the inverse trick for avoiding DIVs for constant divisions, I didn’t know or think of extending that to modulus even in the more obvious way. Mea culpa for replying without getting it.

              I don’t know the concrete answer about the best way to do n*c1%(2^32-5) or such. It does at least intuitively seem like it should be possible to get some win from using the high bits of the multiply result, as the divide-by-multiplying tricks do.

        2. 1

          So does that mean that when the author says Dinkumware’s FNV1-based strategy is too expensive, it’s only more expensive because FNV1 works byte-by-byte while Fibonacci hashing (multiplying by 2^64 / Φ) works on 8 bytes at a time?

          Does that mean you could beat all these implementations by finding a multiplier that produces an even distribution when used as a hash function working on 8-byte words at a time? That is, he says the Fibonacci hash doesn’t produce a great distribution, whereas multipliers like the FNV1 prime are chosen to produce good, even distributions. So if you found an even-distribution-producing number for an 8-byte-word multiplicative hash, would that then work just as well as whatever-hash-then-fibonacci-hash? But be faster, because it’s 1 step, not 2?
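
          To make the byte-at-a-time vs. word-at-a-time difference concrete, here’s a rough sketch of both on an 8-byte key (standard 64-bit FNV-1 constants on one side, a single multiply by 2^64/Φ on the other):

          // fnv1 touches the key a byte at a time: 8 multiplies and 8 xors.
          func fnv1(key [8]byte) uint64 {
              h := uint64(14695981039346656037) // FNV-1 64-bit offset basis
              for _, b := range key {
                  h *= 1099511628211 // FNV 64-bit prime
                  h ^= uint64(b)
              }
              return h
          }

          // wordHash covers the same 8 bytes with one multiply.
          func wordHash(key uint64) uint64 {
              return key * 0x9E3779B97F4A7C15
          }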

          1. 1

            I think you’re right about FNV and byte- vs. word-wise multiplies.

            Re: 32 vs. 64, it does look like Intel’s latest big cores can crunch through 64-bit multiplies pretty quickly. Things like Murmur and xxHash don’t use them; I don’t know if that’s because perf on current chips is for some reason not as good as it looks to me or if it’s mainly for the sake of older or smaller platforms. The folks that work on this kind of thing surely know.

            Re: getting a good distribution, the limitations on the output quality you’ll get from a single multiply aren’t ones you can address through choice of constant. If you want better performance on the traditional statistical tests, rotates and multiplies like xxHash or MurmurHash are one approach. (Or go straight to SipHash, which prevents hash flooding.) Correct choice depends on what you’re trying to do.

            1. 2

              That makes me wonder what hash algorithm ska::unordered_map uses that was faster than FNV1 in Dinkumware, but doesn’t have the desirable property of evenly mixing high bits without multiplying the output by 2^64 / φ. Skimming his code, it looks like it’s std::hash.

              On my macOS system, running Apple LLVM version 9.1.0 (clang-902.0.39.2), std::hash for primitive integers is the identity function (i.e. no hash), and for strings it’s murmur2 on 32-bit systems and cityhash64 on 64-bit systems.

              // We use murmur2 when size_t is 32 bits, and cityhash64 when size_t
              // is 64 bits.  This is because cityhash64 uses 64bit x 64bit
              // multiplication, which can be very slow on 32-bit systems.
              

              Looking at CityHash, it also multiplies by large primes (with the first and last bits set of course).

              Assuming then that multiplying by his constant does nothing for string keys—plausible since his benchmarks are only for integer keys—does that mean his benchmark just proves that dinkumware using FNV1 for integer keys is better than no hash, and that multiplying an 8 byte word by a constant is faster than multiplying each integer byte by a constant?

          2. 1

            A fair point that came up over on HN is that people mean really different things by “hash” even in non-cryptographic contexts; I mostly just meant “that thing you use to pick hashtable buckets.”

            In a trivial sense a fixed-size multiply clearly isn’t a drop-in for hashes that take arbitrary-length inputs, though you can use multiplies as a key part of variable-length hashing like xxHash etc. And if you’re judging your hash by checking that outputs look random-ish in a large statistical test suite, not just how well it works in your hashtable, a multiply also won’t pass muster. A genre of popular non-cryptographic hashes are like popular non-cryptographic PRNGs in that way–traditionally judged by running a bunch of statistical tests.

            That said, these “how random-looking is your not-cryptographically-random function” games annoy me a bit in both cases. Crypto-primitive-based functions (SipHash for hashing, cipher-based PRNGs) are pretty cheap now and are immune not just to common statistical tests, but any practically relevant method for creating pathological input or detecting nonrandomness; if they weren’t, the underlying functions would be broken as crypto primitives. They’re a smart choice more often than you might think given that hashtable-flooding attacks are a thing.

            If you don’t need insurance against all bad inputs, and you’re tuning hard enough that SipHash is intolerable, I’d argue it’s reasonable to look at cheap simple functions that empirically work for your use case. Failing statistical tests doesn’t make your choice wrong if the cheaper hashing saves you more time than any maldistribution in your hashtable costs. You don’t see LZ packers using MurmurHash, for example.

          1. 5

            In case it’s still tagged crypto, note that these aren’t generators for cryptography; they’re for producing noise with good statistical properties faster than cryptographic generators would. That’s useful for, for example, Monte Carlo simulation.

            It is interesting to note that some cryptographic primitives are fast enough to be in the comparison table, though. ChaCha20/8 and AES-128 counter mode are both under one cycle/byte on new x64 hardware and really well studied. If you find nonrandomness in them of any sort that would break your Monte Carlo simulation, you can probably get a paper published about it at least.

            1. 4

              chromium-browser is scrutinized closely enough that this would be noticed on ubuntu, right?

              1. 5

                  The sandbox engine downloading and running ESET actually appears to be in Chromium: https://cs.chromium.org/chromium/src/chrome/browser/safe_browsing/chrome_cleaner/ so developers are free to review it and remove any reference to it. If my memory serves me well, Chrome Cleaner is not special and should appear in chrome://components/ alongside other optional closed-source components, although I don’t have a Windows machine to validate that right now. It should be (or at least used to be) disabled for builds other than Google Chrome.

                1. 2

                  Thanks. It doesn’t appear in chrome://components for me, at any rate.

                  1. 1

                    If I look at it on windows I can see the entry: Software Reporter Tool - Version: 27.147.200

                    1. 1

                      Excellent, a positive control.

                2. 2

                  isra17’s reply implies there’s no scanner in Chromium, only Chrome. [I wrote this referring to his separate comment–now he has another reply here.] It probably wouldn’t make sense to have this on Linux anyway, just because there isn’t the same size of malware ecosystem there.

                  (And I think the reporting/story would be different if the scanner were open source–we’d have an analysis based on the source code, people working on patched Chromium to remove it, and so on.)

                  1. 1

                    I’m curious about MacOS. I don’t run Chrome usually, but I have to in some cases, e.g. to use Google Meets for work.

                    1. 2

                      I don’t have an authoritative answer, but https://www.blog.google/products/chrome/cleaner-safer-web-chrome-cleanup/ only talks about Windows.

                      1. 2

                        I don’t see it in chrome://components on my Mac, if that is indeed where it is supposed to appear.

                  1. 3

                    We’re a small shop (~15 folks, ~10 eng), but old (think early 2000s, using mod_perl at the time). Not really a startup but we match the description otherwise so:

                    It’s a Python/Django app, https://actionk.it, which some lefty groups online use to collect donations, run their in-person event campaigns and mailing lists and petition sites, etc. We build AMIs using Ansible/Packer; they pull our latest code from git on startup and pip install deps from an internal pip repo. We have internal servers for tests, collecting errors, monitoring, etc.

                    We have no staff focused on ops/tools. Many folks pitch in some, but we’d like to have a bit more capacity for that kind of internal-facing work. (Related: hiring! Jobs at wawd dot com. We work for neat organizations and we’re all remote!)

                    We’ve got home-rolled scripts to manage restarting our frontend cluster by having the ASG start new webs and tear the old down. We’ve scripted hotfixes and semi-automated releases–semi-automated meaning someone like me still starts each major step of the release and watches that nothing fishy seems to be happening. We do still touch the AWS console sometimes.

                    Curious what prompts the question; sounds like market research for potential product or something. FWIW, many of the things that would change our day-to-day with AWS don’t necessarily qualify as Solving Hard Problems at our scale (or 5x our scale); a lot of it is just little pain points and time-sucks it would be great to smooth out.

                    1. 6

                      FYI, I get a “Your connection is not private” when going to https://actionk.it. Error is NET::ERR_CERT_COMMON_NAME_INVALID, I got this on Chrome 66 and 65.

                      1. 2

                        Same here on Safari.

                        1. 1

                          Sorry, https://actionkit.com has a more boring domain but works :) . Should have checked before I posted, and we should get the marketing site a cert covering both domains.

                        2. 1

                          Firefox here as well.

                          1. 1

                            Sorry, I should have posted https://actionkit.com, reason noted by the other comments here.

                          2. 1

                            https://actionk.it

                            This happens because the served certificate is for https://actionkit.com/

                            1. 1

                              D’oh, thanks. Go to https://actionkit.com instead – I just blindly changed the http://actionk.it URL to https://, but our cert only covers the boring .com domain not the vanity .it. We ought to get a cert that covers both. (Our production sites for clients have an automated Let’s Encrypt setup without this problem, for the record :) )

                          1. 9

                            The main topic discussed here is now known as the Han Unification, for those curious to catch up with what’s happened since this was written.

                            1. 2

                              A consequence that hadn’t soaked in for me is that you need to know the language of a piece of text to reliably display it correctly. I’m guessing Twitter and Facebook and such are assigning languages to comments, etc. based on content, the poster’s language preferences/location/Accept-Language, and who knows what else.

                              (And then if a user switches into not-their-native-language for a post, or mixes languages by quoting text in another language or whatever, there’s a whole other level of difficulty.)

                            1. 7

                              Compared to, say, the ARM whitepaper, Intel’s still reads to me as remarkably defensive, especially the section on “Intel Security Features and Technologies.” Like, we know Intel has no-execute pages, as other vendors do, and we know they aren’t enough to solve this problem. And reducing the space where attackers can find speculative gadgets isn’t solving the root problem.

                              Paper does raise the interesting question of exactly how expensive the bounds-check bypass mitigation will be for JS interpreters, etc. To foil the original attack, you don’t have to stop every out-of-bounds load, you just have to keep the results from getting leaked back from speculation-land to the “real world.” So you only need a fence between a potentially out-of-bounds load and a potentially leaky operation (like loading an array index that depends on the loaded value). You might even be able to reorder some instructions to amortize one fence across several loads from arrays. And I’m sure every JIT team has looked at whether they can improve their bounds check elimination. There’s no lack of clever folks working on JITs now, so I’m sure they’ll do everything that can be done.
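
                               For what it’s worth, the masking flavor of mitigation looks something like this Go-ish sketch (not what any particular JIT emits, just the shape of it: clamp the index so even a speculated load stays inside the array; assumes i >= 0 and a 64-bit int):

                               func lookup(table []byte, i int) byte {
                                   if i >= len(table) {
                                       return 0
                                   }
                                   mask := (i - len(table)) >> 63 // all ones when i < len(table)
                                   return table[i&mask]           // a speculated out-of-bounds i still lands in table
                               }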

                              The other, scarier thing is bounds checks aren’t the only security-relevant checks that can get speculated past, just an easy one to exploit. What next–speculative type confusion? And what if other side-channels besides the cache get exploited? Lot of work ahead.

                              1. 5

                                FWIW: a possible hint at Intel’s future directions (or maybe just a temp mitigation?) are in the IBRS patchset to Linux at https://lkml.org/lkml/2018/1/4/615: one mode helps keep userspace from messing with kernel indirect call speculation and another helps erase any history the kernel left in the BTB. I bet both of these are blunt hammers on current CPUs (‘cause a microcode update can only do so much–turn a feature off or overwrite the BTB or whatever), but they’re defining an interface they want to make work more cheaply on future CPUs. It also seems to be enabled in Windows under the name “speculation control” (https://twitter.com/aionescu/status/948753795105697793)

                                 ARM says in their whitepaper that most ARM implementations have some way to turn off branch prediction or invalidate branch predictor state in kernel/exception handler code, which sounds about in line with what Intel’s talking about. The whitepaper also talks about barriers to stop out-of-bounds reads. The language is a bit vague, but I think they’re saying an existing conditional move/select works on current chips, and a new barrier instruction, CSDB, gives future chips just the minimum you need to avoid the cache side channel attack.

                                1. 25

                                     Zeynep Tufekci said, “[T]oo many worry about what AI—as if some independent entity—will do to us. Too few people worry what power will do with AI.” (The thread starts with a result that, in some circumstances, face recognition can effectively identify protesters despite their efforts to cover distinguishing features with scarves, etc.)

                                     And, without really needing tech that resembles AI, what they can do can look like superhuman abilities if you look at it right. A big corporation doesn’t have boundless intelligence, but it can hire a lot of smart lawyers, lobbyists, and PR and ad people, folks who are among the best in their fields, to try and shift policy and public opinion in ways that favor the sale of lots of guns, paying a lot for pharmaceuticals, the use of fossil fuels, or whatever. They seem especially successful at shifting policy in the U.S. recently, with the help of recent court decisions that free some people up to spend money to influence elections (and the decisions further back that established corps as legal people :/).

                                     With recent/existing tech, companies have shown they can do new things. It’s cheap to test lots of possibilities to see what gets users to do what you want, to model someone’s individual behavior to keep them engaged. (Here’s Zeynep again in a talk on those themes that I haven’t watched.) The tech giants managed to shift a very large chunk of ad spending from news outlets and other publishers by being great at holding user attention (Facebook) or being smarter about matching people to ads than anyone else (Google), or shift other spending by gatekeeping what apps you can put on a device and taking a cut of what users spend (Apple with iOS), or reshape market after market with digitally-enabled logistics, capital, and smart strategy (Amazon). You can certainly look forward N years to when they have even more data and tools and can do more. But you don’t really even have to project out to see a pretty remarkable show of force.

                                  This is not, mostly, about the details of my politics, nor is it to suggest silly things like that we should roll back the clock on tech; we obviously can’t. But, like, if you want to think about entities with incredible power that continue to amass more and how to respond to them, you don’t have to imagine; we have them right here and now!

                                  1. 3

                                    Too few people worry what power will do with AI.

                                    More specifically, the increasingly police statey government.

                                  1. 42

                                    Reminds me of a quote:

                                    Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.

                                    • Brian W. Kernighan
                                    1. 13

                                      :) Came here to post that.

                                      The blog is good but I’m not convinced by his argument. It seems too worried about what other people think. I agree that we have to be considerate in how we code but forgoing, say, closures because people aren’t familiar with them or because we’re concerned about how we look will just hold back the industry. Higher level constructs that allow us to simplify and clarify our expression are a win in most cases. People can get used to them. It’s learning.

                                      1. 8

                                         I think he may not disagree with you as much as it sounds like. I don’t think that sentence says “don’t use closures,” just that they’re not for impressing colleagues. (It was: “You might impress your peers with your fancy use of closures… but this no longer works so well on people who have known for decades what closures are.”)

                                        Like, at work we need closures routinely for callbacks–event handlers, functions passed to map/filter/sort, etc. But they’re just the concise/idiomatic/etc. way to get the job done; no one would look at the code and say “wow, clever use of a closure.” If someone does, it might even signal we should refactor it!
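
                                         For example, something like this is about as exciting as typical closure use gets (toy sketch; the comparison function passed to sort.Slice closes over its surroundings):

                                         package main

                                         import (
                                             "fmt"
                                             "sort"
                                         )

                                         func main() {
                                             words := []string{"pear", "fig", "banana"}
                                             shortestFirst := true
                                             sort.Slice(words, func(i, j int) bool {
                                                 if shortestFirst { // captured from the enclosing scope
                                                     return len(words[i]) < len(words[j])
                                                 }
                                                 return words[i] < words[j]
                                             })
                                             fmt.Println(words) // [fig pear banana]
                                         }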

                                        1. 5

                                          It seems too worried about what other people think.

                                          I agree with your considerations on learning.
                                          Still, just like we all agree that good code must be readable, we should agree that it should be simple too. If nothing else, for security reasons.

                                        2. 2

                                          On the other hand, sometimes concepts at the limit of my understanding (like advanced formal verification techniques) allow me to write better code than if I had stayed well within my mental comfort zone.

                                        1. 3

                                          Here are some things:

                                          These are, again, perf but not profiling, but I really want to write (and have started on) a super beginner-focused post about memory use. I wish someone would write a balanced intro to concurrency–from StackOverflow, a couple common mistakes are dividing work into very tiny tasks (with all the overhead that causes) when NumCPU workers and large chunks would do better, and using channels when you really do just want a lock or whatever kind of shared datastructure.
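
                                           The “NumCPU workers, large chunks” version of that is roughly this sketch (the doubling loop is just a stand-in for real work):

                                           package work

                                           import (
                                               "runtime"
                                               "sync"
                                           )

                                           // processAll splits items into one large chunk per CPU instead of spawning
                                           // a goroutine (or doing a channel send) per tiny item.
                                           func processAll(items []int) {
                                               workers := runtime.NumCPU()
                                               chunk := (len(items) + workers - 1) / workers
                                               var wg sync.WaitGroup
                                               for start := 0; start < len(items); start += chunk {
                                                   end := start + chunk
                                                   if end > len(items) {
                                                       end = len(items)
                                                   }
                                                   wg.Add(1)
                                                   go func(part []int) {
                                                       defer wg.Done()
                                                       for i := range part {
                                                           part[i] *= 2 // stand-in for real work
                                                       }
                                                   }(items[start:end])
                                               }
                                               wg.Wait()
                                           }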

                                          1. 11

                                            When you mentioned channel costs, I wondered if there was communication via unbuffered channels, which can lead to traffic jams since the sender can’t proceed ‘til each recipient is ready. Looking at the old chat_handler.go that doesn’t seem to be the case, though. The three goroutines per connection thing isn’t without precedent either; I think at least the prototypes of HTTP/2 support for the stdlib were written that way.

                                            It looks like maybe the socketReaderLoop could be tied in with ChatHandler.Loop(): where socketReaderLoop communicates with Loop on a channel, just inline the code that Loop currently runs in response, then call socketReaderLoop at the end of Loop instead of starting it asynchronously. You lose the 32-message buffer, but the end-user-facing behavior ought to be tolerable. (If a user fills your TCP buffers, seems like their problem and they can resend their packets.) However, saving one goroutine per connection probably isn’t a make-or-break change.

                                            Since you talk about memory/thrashing at the end, one of the more promising possibilities would be(/have been) to do a memprofile to see where those allocs come from. A related thing is Go is bad about respecting a strict memory limit and its defaults lean towards using RAM to save GC CPU: the steady state with GOGC=100 is around 50% of the peak heap size being live data. So you could start thrashing with 512MB RAM once you pass 256MB live data. (And really you want to keep your heap goal well under 512MB to leave room for kernel stuff, other processes, the off-heap mmapped data from BoltDB, and heap fragmentation.) If you’re thrashing, GC’ing more often might be a net win, e.g. GOGC=50 to be twice as eager as the default. Finally, and not unrelated, Go’s collector isn’t generational, so most other collectors should outdo it on throughput tests.
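
                                             (If you want to try the GOGC=50 experiment from inside the program, debug.SetGCPercent is the programmatic equivalent; a quick sketch:)

                                             package main

                                             import "runtime/debug"

                                             func main() {
                                                 // Same effect as GOGC=50: collect once the heap grows 50% past the
                                                 // live set, spending more GC CPU in exchange for a lower peak heap.
                                                 prev := debug.SetGCPercent(50)
                                                 _ = prev // previous setting; 100 unless GOGC was set
                                                 // ... rest of the program ...
                                             }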

                                            Maybe I’m showing myself not to be a true perf fanatic, but 1.5K connections on a Pi also doesn’t sound awful to me, even if you can do better. :) It’s a Pi!

                                            1. 2

                                               Thank you for such a detailed analysis and for looking into the code before you commented :) positive and constructive feedback really helps. I have received a great amount of feedback and will definitely try your tips. BoltDB is definitely coming up every time, and I think it contributes to memory usage as well. Some other suggestions include using a fixed number of workers and channels, watching for backlog building up, and fixing how I do serialization. I will definitely update my benchmark code and test it with the new fixes; and if I feel the code is clean enough I would definitely love to move back.

                                              1. 3

                                                Though publicity like this is fickle, you might get a second hit after trying a few things and then explicitly being like “hey, here’s my load test, here are the improvements I’ve done already; can you guys help me go further?” If you don’t get the orange-website firehose, you at least might hear something if you post to golang-nuts after the Thanksgiving holiday ends or such.

                                                Looking around more, I think groupInfo.GetUsers is allocating a string for each name each time it’s called, and then when you use the string to get the object out there’s a conversion back to []byte (if escape analysis doesn’t catch it), so that’s a couple allocs per user per message. Just being O(users*messages) suggests it could be a hotspot. You could ‘downgrade’ from the Ctrie to a RWLocked map (joins/leaves may wait often, but reads should be fastish), sync.Map, or (shouldn’t be needed but if you were pushing scalability) sharded RWLocked map. But before you put in time trying stuff like that, memprofile is the principled/right way to approach alloc stuff (and profile for CPU stuff)–figure out what’s actually dragging you down.
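
                                                 The “downgraded” structure I mean is just this kind of thing (sketch; the User type and method names here are made up, not from your code):

                                                 package chat

                                                 import "sync"

                                                 type User struct{ Name string }

                                                 // users keeps a plain map behind an RWMutex: the per-message read path
                                                 // takes only the read lock, while joins/leaves take the write lock.
                                                 type users struct {
                                                     mu sync.RWMutex
                                                     m  map[string]*User
                                                 }

                                                 func (u *users) Get(name string) *User {
                                                     u.mu.RLock()
                                                     defer u.mu.RUnlock()
                                                     return u.m[name]
                                                 }

                                                 func (u *users) Add(usr *User) {
                                                     u.mu.Lock()
                                                     defer u.mu.Unlock()
                                                     if u.m == nil {
                                                         u.m = make(map[string]*User)
                                                     }
                                                     u.m[usr.Name] = usr
                                                 }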

                                                True that there are likely lighter ways to do the message log than Bolt. Just files of newline-separated JSON messages may get you far, though don’t know what other functionality you support.

                                                FWIW, I also agree with the commenter on HN saying that Node/TypeScript is a sane approach. (I’m curious about someday using TS in work’s frontend stuff.) Not telling you what to use; trying to get Go to do things is just a hobby of mine, haha. :)

                                            1. 7

                                              I’ve been using a Samsung ARM Chromebook (1st generation) as my daily driver for the past 4 years. It’s a lowend, underpowered machine with nothing to write home about, but it can support a full Arch Linux ARM installation, run a web browser just fine, and have an adequate number of terminals. I love it. The battery life hasn’t changed at all since I bought it, it’s still consistently getting >7 hours. I have other friends with ARM laptops from other manufacturers, the battery life story is one I hear consistently.

                                              1. [Comment removed by author]

                                                1. 6

                                                  dz, I wrote up a blog post about this: http://blog.jamesluck.com/installing-real-linux-on-arm-chromebook I completely replaced ChromeOS with Archlinux ARM on the internal SSD. The gist of the process is that you make a live USB, boot to that, and then follow the same procedure for making the bootable USB onto the SSD. You just have to untar the root filesystem, edit /etc/fstab, and correct the networking config!

                                                  1. 2

                                                    Thanks so much for writing and sharing!

                                                  2. 1

                                                    If it’s anything like my Samsung ARM Chromebook, you can boot a different os off external storage (i.e. an SD card), or possibly replace Chrome OS on the internal solid-state storage.

                                                    1. 1

                                                      You can replace ChromeOS. Here’s the Arch wiki on the 1st gen Samsung ARM Chromebook and the Samsung Chromebook Plus.

                                                  1. 6

                                                    Not sure how responsive this is to the specific q, but I have an ARM Samsung Chromebook Plus. Initially I used developer mode and crouton, which provides an Ubuntu userspace in a chroot hitting Chrome OS’s Linux kernel, but more recently I use a setup like Kenn White describes, where you’re not in dev mode and you use Termux, an Android app that provides a terminal and apt-based distro inside the app’s sandbox. (Termux has a surprising number of apt packages covered and you can build and run executables in there, etc. You can also just use it on any Android phone or tablet.) In both cases I just use terminal stuff, though Crouton does support running an X session. You can apparently boot Linux on the Chromebook Plus full-stop; I don’t know about compatibility, etc. Amazon’s offering it for $350 now.

                                                     I’ve long been more indifferent to CPU speed than a lot of folks (used netbooks, etc.), but for what it’s worth I find the Chromebook Plus handles my needs fine. The chip is the RK3399, with two out-of-order and four in-order ARMv8 cores and an ARM Mali GPU. I even futz with Go on it, and did previously on an even slower ARM Chromebook (Acer Chromebook 13). The most noticeable downsides seem to be memory-related: it has 4GB RAM but…Chrome. If I have too much open the whole thing stalls a bit while it tries (I think) compressing RAM and then stopping processes. Also, there’s a low level of general glitchiness: sometimes I’ll open it up and it’ll boot fresh when it should be resuming from sleep. I had to do some tweaking to do what I wanted inside Chrome OS’s bounds, notably to connect to work’s VPN (had to make a special JSON file and the OS doesn’t readily give useful diagnostics if you get anything wrong), but it’s just a matter of round tuits.

                                                    From the ARM vs. Intel angle, Samsung actually made a Chromebook Pro with the low-power but large-core Core m3. Reviewers seemed to agree the Pro was faster, and I think I saw it was like 2x on JavaScript benchmarks, but Engadget wasn’t fond of its battery life and The Verge complained its Android app support was limited.

                                                    1. 3

                                                      Since we’re talking about the Plus, a few notes on my experience. I have an HP chromebook with an m3 as well, for comparison. None of the distinguishing features of the Plus were positives. Chrome touch support is just meh, and the keyboard was annoyingly gimped by very small backspace, etc. Touchpad was noticeably worse. Android support was basically useless for me.

                                                      More on topic, the ARM cpu was capable enough, but the system wasn’t practically lighter or longer lasting than the m3 system. It was cheaper, perhaps 50% so, but that’s still only a difference of a few hundred dollars. (I mean, that’s not nothing, but if this is a machine you’ll be using for hours per day over several years… Price isn’t everything.)

                                                      (I would not pick the Samsung Pro, either. Not a great form factor imo.)

                                                      More generally, if you don’t have ideological reasons to pick arm, the intel m3 is a pretty remarkable cpu.

                                                    1. 5

                                                      The technical description doesn’t fully convey some of the, uh, magic that’s been done with it. One of the coolest things is Arcadia, which gives you access to game-engine fun with a Clojure REPL. The MAGIC stuff is apparently key to cutting allocations so your games aren’t all janked up by collection pauses.

                                                      The author, Ramsey Nasser, also posts about a bunch of fun stuff on Twitter–other recent stuff he’s messed with includes spline drawing tool and (related, I think) Arabic typography

                                                      1. 6

                                                        Posted this in the orange place, figure that it’s worth saying here:

                                                        FWIW, I think this mostly shows that the GOGC defaults will surprise you when your program keeps a small amount of live data but allocates quickly and is CPU bound. By default, it’s trying to keep allocated heap size under 2x live data. If you have a tiny crypto bench program that keeps very little live data around, it won’t take many allocations to trigger a GC. So the same code benchmarked here would trigger fewer GCs in the context of a program that had more data live. For example, it would have gone faster if the author had allocated a 100MB []byte in a global variable.
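
                                                         (The “100MB global” trick is literally just a ballast like this: with GOGC=100 the heap goal is roughly twice the live data, so a benchmark that keeps almost nothing live goes from collecting every few MB of allocation to going 100MB+ between GCs.)

                                                         // Deliberately wasted live data to raise the GC goal; a sketch, not a recommendation.
                                                         var ballast = make([]byte, 100<<20)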

                                                         If your program is far from exhausting the RAM but is fully utilizing the CPU, you might want it to save CPU by using more RAM, equivalent to increasing GOGC here. The rub is that it’s hard for the runtime to ever be sure what the humans want without your input: maybe this specific Go program is a little utility that you’d really like to use no more RAM than it must, to avoid interference with more important procs. Or maybe it’s supposed to be the only large process on the machine, and should use a large chunk of all available RAM. Or it’s in between, if there are five or six servers (or instances of a service) sharing a box. You can imagine heuristic controls that override GOGC in corner cases (e.g., assume it’s always OK to use 1% of system memory), or even a separate knob for max RAM use you could use in place of GOGC. But the Go folks tend to want to keep things simple, so right now you sometimes have to play with GOGC values to get the behavior you want.

                                                        1. 6

                                                          As best I can reconstruct, the thing the Chrome team was dealing with was that Chrome had to delay starting to scroll until they ran touch event handlers, because the handler could always preventDefault() to disable scrolling, but 1) 80% of touch event handlers weren’t actually calling preventDefault(), 2) this added ~150ms latency before scroll started in a sample case, and doubled 95th-percentile latency in their field tests, and 3) even before adding their passive option, there was a CSS alternative for preventing scroll-on-touch without JS required at all. (They also argue that browsers already have timeouts for these handlers, so they weren’t entirely reliable in the first place.) So the post author’s app probably used touchstart to prevent scrolling and then Chrome broke that.

                                                          If the Chrome folks wanted to do this and were willing to go to a lot more effort to mitigate breakage, maybe they could have effectively tried to patch up the damage after a passive touchstart handler called preventDefault: jump back to the old scroll position, log a complaint to the console, and treat this handler (or all handlers) as active for subsequent events. It would be very complicated (both for Chrome hackers to code up and for Web devs to understand–consider that arbitrary DOM changes can happen between the scroll starting and the preventDefault!), lead to a janky scroll-and-snap-back sequence for users, and still break for more complex cases, but would likely avoid breaking some previously-usable pages entirely.

                                                          I get the value of working pretty hard to maintain backwards compatibility, and it’s possible the Chrome folks moved a little fast here; they did actually take the time to gather data (which they use sometimes: a GitHub comment provides some general background on when they’ve backed out interventions), but the ‘scrolling intervention’ page reports only 80% of listeners were passive, and 20% breakage seems high.

                                                          For what it’s worth, I’m not absolutist about targeted changes that might break some standards-compliant pages; as a dev I see compat impact from changes that are outside the scope of standards (sunsetting Flash, autofill changes, tracking protection, things popular extensions do) and bug fixes; it seems like there’s just always a cost-benefit balance and you need to see how things are used out on the Web, versus being able to reason from first principles what changes are OK or not. And the APIs being as old and quirky as they are, there are places where you really could subtly tweak some API and break almost nobody and improve things for almost everyone; this seems like the common category where we’d like to allow something to be async but that can break some uses of it.

                                                          It’s tricky to figure out if a change is worth it, and again, they might have gotten it wrong here, but I wouldn’t want “you can never break anything” to be the lesson from this.

                                                          1. 6

                                                            A bias in SHA256 doesn’t look like the best explanation for what they saw.

                                                            It sounds like they’re talking about a bias observed in live Bitcoin blocks not anything they’ve tested and seen in raw SHA256 themselves, because 2^80 hashes is more like what the entire Bitcoin network has done than anything they can do themselves.

                                                            I’m sure someone has said this better elsewhere, but it sounds like that just means miners aren’t using entirely random nonces in their search–the same thing as in the difficulty-less-than-1,000 blocks, but in a subtler way. Maybe some miner’s code/chips just don’t scan all nonces. Maybe in some miner’s cluster, they use a node ID in some bits of the nonce to partition nonce space, and some sections of nonce space are effectively left aside for future capacity they might add. Maybe the miner tries every possible nonce eventually, but there’s an ordering thing where they’ll tend to try some parts of nonce space last, the ones corresponding to the nonces you less often see in blocks. I don’t know a ton about the block format, but if you can break out which blocks were generated by specific software or mining companies, you might be able to spot whose blocks have the strongest bias.

                                                            If there were a SHA256 bias that strong (a couple bits set 55% of the time instead of expected 50!) and that simple (basically linear, and shows up even without a sophisticated search for linear characteristics), and visible through 2xSHA256’s 128 rounds not just SHA256’s 64, then SHA256 itself should have profound biases you’d expect to show up when cryptographers search for linear characteristics. And to test this directly, you wouldn’t need 2^80 effort – find a few blocks at difficulty >1000 searching through nonces randomly, and you should be able to see the 55%-versus-50% difference with much less effort than the Bitcoin miners have expended. Dunno whether that would be near try-this-at-home levels, but it would be a lot less than 2^80.
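
                                                             The inner loop of the try-it-at-home version would look roughly like this (sketch only: random 80-byte “headers” double-SHA256’d and bit frequencies tallied; a real check of their claim would also condition on hashes meeting a difficulty target, which this doesn’t):

                                                             package main

                                                             import (
                                                                 "crypto/rand"
                                                                 "crypto/sha256"
                                                                 "fmt"
                                                             )

                                                             func main() {
                                                                 const trials = 1000000
                                                                 var setCount [256]int
                                                                 buf := make([]byte, 80) // Bitcoin block headers are 80 bytes
                                                                 for t := 0; t < trials; t++ {
                                                                     if _, err := rand.Read(buf); err != nil {
                                                                         panic(err)
                                                                     }
                                                                     first := sha256.Sum256(buf)
                                                                     h := sha256.Sum256(first[:]) // double SHA-256, as Bitcoin does
                                                                     for bit := 0; bit < 256; bit++ {
                                                                         if h[bit/8]&(1<<uint(bit%8)) != 0 {
                                                                             setCount[bit]++
                                                                         }
                                                                     }
                                                                 }
                                                                 // Unbiased bits should sit near trials/2; a 55% bit would stick out.
                                                                 fmt.Println(setCount[0], "of", trials)
                                                             }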

                                                            I can see how if it all looks like a black box to you a problem in SHA256 would be an enticing theory, but the least surprising explanation by far is that it’s an artifact of miners searching nonrandomly, not of the hash being nonrandom in such a strong way.

                                                            1. 1

                                                              Thanks a lot for the explanations.

                                                               I don’t have any feeling for the inner workings of cryptographic hash functions, only for reductions. I assumed Bitcoin abuses hashing enough in the way it applies it that a mild bias here wouldn’t contradict any well-studied properties of SHA256; I was wrong.

                                                              I was obviously wrong to put a summary only slightly less enthusiastic than the title instead of much less enthusiastic.

                                                              I didn’t believe that they found anything that would be an actual problem with SHA256 as a hash, but could we please have something non-catastrophic but noticeable (5% bias in Bitcoin mining would be nice, but not this time) to make people stop using random primitives as if they were perfect random oracles? I guess not this time. Sigh.

                                                            1. 3

                                                              Seems plausible Google didn’t go over Buganizer as carefully as they do many public-facing products because it seemed like an internal product, and the external audience that gets access (folks who successfully reported a security bug) seemed semi-trusted. I bet there are a lot of security issues in code that’s meant for semi-trusted audiences like this: a company’s vendors, say, or a small number of special high-value clients, or employees (when employees have access to a tool but shouldn’t all have ‘root’ on it). Place to look for issues in our own stuff potentially.

                                                              1. 1

                                                                Agreed—this is a prime example of security through obscurity. Tons of small shops do this sort of thing every day, employing only what security is necessary to keep out the passerby who isn’t too tech-savvy.

                                                                One would hope a place like Google wouldn’t lean on obscurity, but here they are doing it.

                                                                1. 2

                                                                  Agreed—this is a prime example of security through obscurity. Tons of small shops do this sort of thing every day, employing only what security is necessary to keep out the passerby who isn’t too tech-savvy.

                                                                  Oh, huh, you’re right and it’s different from what I originally thought. I had thought they had proper access control so that only bug reporters could access the tool, but weren’t securing products for that audience as well as they would secure apps that just anyone can trivially browse to. But, in fact, it looks like it’s public (at least, I can just browse to issuetracker.google.com), so it really was just a more obscure product being less thoroughly scrubbed for issues.

                                                              1. 0

                                                                Another great aspect of concurrency in Go is the race detector. This makes it easy [emphasis mine] to figure out if there are any race conditions within your asynchronous code.

                                                                 Am I reading correctly? How exactly does this magical decision procedure establish the presence or absence of data races in your concurrent code?

                                                                1. 4

                                                                  Yeah–it is easy, but isn’t comprehensive. If racy accesses don’t actually occur in whatever you run under the detector nothing is reported, and racy accesses that have some totally unrelated sync operation happen between them (like there happened to be some lock taken but it doesn’t guard the racily-accessed data) aren’t found.
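
                                                                   A tiny example of that first limitation: go run -race reports the sketch below only because both writes actually happen during the run; hide the goroutine behind a rarely-true flag and the same bug passes silently.

                                                                   package main

                                                                   import "fmt"

                                                                   var n int

                                                                   func main() {
                                                                       done := make(chan struct{})
                                                                       go func() { n++; close(done) }() // unsynchronized write
                                                                       n++                              // races with the write above
                                                                       <-done
                                                                       fmt.Println(n)
                                                                   }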

                                                                  A lot of people seem to say that an emphasis on CSP-style concurrency combined with the race detector catching some stuff gets them far. For what it’s worth, one dissenting opinion on that comes from Dropbox, who say clever engineers looking for races is the best they’ve got: https://about.sourcegraph.com/go/go-reliability-and-durability-at-dropbox-tammy-butow/

                                                                  1. 1

                                                                    Yeah–it is easy, but isn’t comprehensive.

                                                                    Then at best it is easy to find (some) data races, not to establish their presence or absence. The former might be good enough for your purposes, but the latter is how I interpret “figure out if there are any race conditions”. But, then, I’m not a native English speaker.

                                                                    If racy accesses don’t actually occur in whatever you run under the detector nothing is reported,

                                                                    Which is no different from languages in which concurrency is supposedly harder than in Go.

                                                                    and racy accesses that have some totally unrelated sync operation happen between them (like there happened to be some lock taken but it doesn’t guard the racily-accessed data) aren’t found.

                                                                    In a heavily concurrent program, this could be pretty much any racy access.

                                                                    A lot of people seem to say that an emphasis on CSP-style concurrency combined with the race detector catching some stuff gets them far.

                                                                    Yeah, I totally understand the feeling of empowerment when you learn something that seemed out of reach until not too long ago (in this case, writing concurrent programs), but IMO it’s not really justified unless you can reliably do it right. “Not too wrong” isn’t good enough.

                                                                    1. 4

                                                                      I think the OP is just being imprecise with their wording. They were technically wrong as soon as they started talking about race conditions instead of data races in the context of the race detector. (And many, many, many people get this wrong. It happens.)

                                                                      1. 1

                                                                        I think I’d get it wrong too?

                                                                         I guess my understanding is that a race condition is a general term (something that has two actors, where the sequencing of their operations matters), and that a data race (where the actors both happen to be modifying some data) is a specific instance of a race condition.

                                                                        Is that about right, or am I missing some nuance in the terms?

                                                                        1. 6

                                                                          They are actually completely orthogonal concepts. :-) You can have a race condition without a data race.

                                                                          John Regehr explains it far better than I could, with examples: https://blog.regehr.org/archives/490

                                                                          Also, you’re in good company. The blog post introducing the race detector even uses the term “race condition”: https://blog.golang.org/race-detector The official documentation of the tool, however, does not: https://golang.org/doc/articles/race_detector.html

                                                                          (And btw I completely agree with your “perfect is the enemy of the good” remarks.)

                                                                          (Please take a look at John’s blog post. It even shows an example of a data race that isn’t a race condition!)

                                                                          1. 3

                                                                            Thanks for that.

                                                                            Edit: I briefly posted saying I still thought data races were a subset of race conditions - edited after digesting blog.

                                                                            The blog post makes the distinction that (their definition of) race conditions is about correctness, and not all data races violate correctness, so not all data races are race conditions.

                                                                            That’s a subtle distinction and I’m not entirely sure I agree, but I understand better - so thanks :-)

                                                                            1. 6

                                                                               Dmitry Vyukov, who works on the Go race detector, is a big advocate for presuming data races are incorrect rather than trying to sort out if they’re safe, because things that look benign in the code can bite you. Sometimes the problems only arise with the help of compiler optimizations that assume there are no racy accesses and therefore, say, compute on a copy of some value in a register rather than the original on the heap for some duration. He writes some about this at https://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong

                                                                              The io.Discard example partway down https://blog.golang.org/race-detector is good too: some memory that everyone thought of as a write-only ‘black hole’ turned out not to be in one specific situation, and a race that some experienced coders thought was safe caused a bug.

                                                                              Nothing there is inconsistent with what Regehr says, I don’t think: Vyukov isn’t saying a benign data race is an invalid concept, just saying it’s dangerous to try to write code with them. I mention it because examples like these made me much less inclined to try to figure out when data races were or weren’t benign.
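
                                                                               (The classic benign-looking case is a stats counter like the racy version in this sketch; the sync/atomic one is the cheap fix. Names are made up for illustration.)

                                                                               package metrics

                                                                               import "sync/atomic"

                                                                               var hits int64

                                                                               func recordRacy() { hits++ } // data race: the compiler may cache or split the update

                                                                               func record() { atomic.AddInt64(&hits, 1) } // safe and nearly as cheap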

                                                                              1. 1

                                                                                Hmm, interesting :) I’m going to read up and correct this.

                                                                              2. 1

                                                                               I don’t find that example of a data race that isn’t a race condition terribly convincing. The racy program doesn’t have a well-defined meaning in terms of the language it’s written in. Its behavior depends on the whims of the language implementor, i.e., they have ways to break your program without neglecting their duty to conform to the language specification.

                                                                                1. 1

                                                                                  I’m not convinced by your rebuttal. That doesn’t imply it is a race condition. It just implies that it is UB. UB could cause a race condition, but not necessarily so.

                                                                                  1. 1

                                                                                   The language specification is a contract between language implementors and users. To prove a language implementation incorrect, it suffices to show one conforming program whose behavior under the implementation doesn’t agree with the language specification. Conversely, to prove a program incorrect, it suffices to exhibit one conforming language implementation in which the program deviates from its intended behavior.

                                                                                    In other words: Who cares that, by fortunate accident, there are no race conditions under one specific language implementation?

                                                                                    1. 1

                                                                                      In other words: Who cares that, by fortunate accident, there are no race conditions under one specific language implementation?

                                                                                      Everyone that uses it?

                                                                                      This is my last reply in this thread. I don’t find your objection worth discussing. Moreover, you didn’t actually refute my previous comment. You just lectured me about what UB means.

                                                                                      1. 2

                                                                                        The essence of my objection is that “there are no race conditions under this specific implementation” isn’t a very interesting property when implementation details are outside of your control. But maybe that’s not true. Maybe people genuinely don’t mind when changes to a language implementation suddenly break their programs. What do I know.

                                                                          2. 3

                                                                            I think it sounded like I was disagreeing when I was trying to agree: the Go race detector is basically LLVM’s ThreadSanitizer (Dmitry Vyukov did work on both), and as such can help but can’t prove the absence of races. Agree it’s nothing like Rust’s static checking or the isolated heaps some languages use.

                                                                            1. 2

                                                                              “Not too wrong” isn’t good enough.

                                                                           That’s a pretty good opposite to “the perfect is the enemy of the good”.

                                                                              In most software domains, the impact of a race is related to how often you hit it. e.g. a race which only shows up under your typical load once a year is a low impact problem - you have other, more relevant bugs.

                                                                              (I agree that in some scenarios (medical, aviation, etc) formal proofs of correctness and other high assurance techniques may be required to establish that even low-likelihood bugs (of all kinds, not just races) can’t occur.)

                                                                           The (imperfect) race detector lets you easily find the most frequent races (and all of the ones you can provoke in your auto tests).

                                                                              No, it’s not perfect, but it is a very useful tool.

                                                                              1. 1

                                                                                This comment made me think; I think another thing that lets people get by is that a lot of practical concurrency is the simpler stuff: a worker pool splitting up a large chunk of work, or a thread-per-request application server where sharing in-memory data isn’t part of the design and most of the interesting sync bits occur off in database code somewhere.

                                                                                Not that you’re guaranteed to get those right, or that advanced tools can’t help. (Globals in a simple net/http server can be dangerous!) Just, most of us aren’t writing databases or boundary-pushing high-perf apps where you have to be extra clever to minimize locking; that’s why what you might expect to become a complete, constant tire fire often works in practice.

                                                                        1. 1

                                                                           The error handling section was a little confusing to me: are there any examples of what’s meant by proper error handling?

                                                                          I think when you have multiple return values, you get many of the advantages of exceptions, but I don’t really know what the author of this blog is arguing for as compared to python besides exceptions.

                                                                          1. 1

                                                                            Well you’d at least expect this to be baked into the language: https://github.com/pkg/errors
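
                                                                             Something like this is what I’d want without reaching for a third-party package (sketch using pkg/errors; the function and path are made up): errors.Wrap annotates the cause and records a stack, and %+v prints both.

                                                                             package main

                                                                             import (
                                                                                 "fmt"
                                                                                 "io/ioutil"

                                                                                 "github.com/pkg/errors"
                                                                             )

                                                                             func loadConfig(path string) ([]byte, error) {
                                                                                 b, err := ioutil.ReadFile(path)
                                                                                 if err != nil {
                                                                                     return nil, errors.Wrap(err, "loading config") // annotate and keep the cause
                                                                                 }
                                                                                 return b, nil
                                                                             }

                                                                             func main() {
                                                                                 if _, err := loadConfig("/no/such/file"); err != nil {
                                                                                     fmt.Printf("%+v\n", err) // message chain plus stack trace
                                                                                 }
                                                                             }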

                                                                            1. 3

                                                                              But by not being baked in, you have alternatives: https://pocketgophers.com/error-handling-packages/ , all of which conform to the error interface

                                                                              1. 1

                                                                                Cool site!

                                                                            2. 1

                                                                              I think he means the kind of stuff he now gets from third-party tools: showing where errors occurred, and making it hard to accidentally discard errors.

                                                                              (Also: big plug for https://github.com/gordonklaus/ineffassign. It finds assignments in your code that make no difference to how it runs, a generalization of the ‘no unused variables’ rule that turns up some bugs and, when it doesn’t, often points to somewhere you could make your code tighter. Better yet, you can use something like https://github.com/alecthomas/gometalinter with a subset of checks enabled.)