1. 2

    Babytimes, seeing some folks, and cleaning.

    1. 6

      Nice article. Also interesting for reasoning about DNS-over-HTTPS.

      As far as I can say from my experience in Kenya, it should also be noted that Africans have a very different way of perceiving time. And security. And… everything! :-D

      How would I address this issue?

      I think that I would basically create a reverse proxy serving over HTTP those sites that could benefit more from caching (eg Wikipedia). Probably with a custom domain such as wikipedia.cached.local so that people could not be fooled to take the proxied for the original. Rewriting URIs for hypertexts shouldn’t be an issue, but it could be harder to Ajax pages. Probably I would also create a control page so that a page could be prefetched or updated. With a custom protocol and a server in Europe, one could also prefetch several contents at once and send them back together, maximizing bandwidth usage.

      Obviously it wouldn’t be safe, but it would be visibly unsafe, and limited to those website that can get advantage of such caches without creating serious threats.

      As for service workers, I do not think they would improve the user experience at all, since they are local to the browser and the browser has a cache anyway. The problem is to share such cache between different machines.

      1. 2

        Local reverse proxy is a clever idea, and a proxy that you explicitly set up clients trust a la corporate middleboxes (see Lanny’s comment) seems like it can work in some environments too. Sympathetic to the problem of existing solutions no longer working, sort of surprised the original blog post wasn’t more about how to improve things now.

        1. 5

          The point of the machinery I described was to make user explicitly choose between security and access time.

          You can make everything smoother (and easier to implement) with a local CA or by installing proper fake certificates in the clients and transparent proxy, but then people cannot easily opt-out.
          Worse: they might be trusting the wrong people without any benefit, as for sensible pages that cannot be cached (shopping carts, online banking and similar…)

          That why using the reverse proxy should be opt-in, not default and trivial to opt out: there’s no need for a proxy if you want to edit a wikipedia page!

          Sympathetic to the problem of existing solutions no longer working, sort of surprised the original blog post wasn’t more about how to improve things now.

          Eric Meyer is a legend of HTML, CSS and Web accessibility. A legend, beyond any doubt.
          Before HTML5 I used to read his website daily. He teached me a lot.

          But he is a client-side guy.
          I think his reference to service workers is an attempt to improve things now.

          1. 2

            Nit: the past tense of teach is taught, not teached.

            1. 1

              I’m sorry… I can’t edit it anymore, but thanks!

      1. 8

        Neat to see the “Lobster” codebase get used this way. And that explains a strange and useless issues I recently got. I had thought they were some new, half-baked commercial product or service that wanted to contribute to Lobsters as a marketing effort (this has happened a couple times and so far been a complete waste of time). This is the only contact I’ve had with the study authors (I have no idea if they contacted jcs), though I see they’re also local to Chicago.

        Looks like this analysis is at least a few months old; when I added rubocop it caught all the opportunities to use find_by.

        1. 3

          :/ about the useless issues opened.

          There was a long history of automated analysis at Google and a takeaway was, more or less, “you don’t really have a useful analyzer ’til it mostly finds fixes coders think are useful.”

          I wonder what folks would come up with given incentives like that; currently, there’s probably more pressure to maximize your claimed number of bugs to get published.

        1. 1

          This analogy is a stretch, but ORMs and DBs face a problem a little like what dynamic language runtimes face: with more understanding of how the code really runs, you could make it run faster (x is almost always a float in this JS function, or we almost always/never retrieve obj.related_thing after this ORM query retrieves some objs), but that info isn’t readily available when the code is first run.

          JITs deal with this by recording specifics of what happens at each callsite, then making an optimized path for what usually happens. You could imagine ORMs tagging query results with the callsites they came from, and figuring out things like “this bulk query should probably retrieve this related object” or “looks like this query is usually just an existence check” from how the result was actually used.

          An inherent challenge with this kind of thing is that you need to make sure that tracking isn’t so expensive it eats up any advantage it brings. We take for granted that JVMs, V8, etc. do magic with our code, but that’s with big teams of experts working on them over years. Perhaps a more achievable thing is more like profile-guided optimization, where you do test runs in a slow stat-tracking mode and some changes get suggested to the code.

          That’s sort of a big dream and there is much lower hanging fruit; pushcx notes rubocop was able to find a chunk of things with static analysis, and stuff like “this query scans a table” or just “profiling shows this line is empirically one of our slowest” probably does a lot for big hotspots out there with a lot less trickiness.

          1. 12

            You don’t have to use the golden ratio; multiplying by any constant with ones in the top and bottom bits and about half those in between will mix a lot of input bits into the top output bits. One gotcha is that it only mixes less-significant bits towards more-significant ones, so the 2nd bit from the top is never affected by the top bit, 3rd bit from the top isn’t affected by top two, etc. You can do other steps to add the missing dependencies if it matters, like a rotate and another multiply for instance. (The post touches on a lot of this.)

            FNV hashing, mentioned in the article, is an old multiplicative hash used in DNS, and the rolling Rabin-Karp hash is multiplicative. Today Yann Collet’s xxHash and LZ4 use multiplication in hashing. There have got to be a bajillion other uses of multiplication for non-cryptographic hashing that I can’t name, since it’s such a cheap way to mix bits.

            It is, as author says, kind of interesting that something like a multiplicative hash isn’t the default cheap function everyone’s taught. Integer division to calculate a modulus is maybe the most expensive arithmetic operation we commonly do when the modulus isn’t a power of two.

            1. 1

              Nice! About the leftward bit propagation: can you do multiplication modulo a compile time constant fast? If you compute (((x * constant1) % constant2) % (1<<32)) where constant1 is the aforementioned constant with lots of ones, and constant2 is a prime number quite close to 1<<32 then that would get information from the upper bits to propagate into the lower bits too, right? Assuming you’re okay with having just slightly fewer than 1<<32 hash outputs.

              (Replace 1<<32 with 1<<64 above if appropriate of course.)

              1. 1

                You still have to do the divide for the modulus at runtime and you’ll wait 26 cycles for a 32-bit divide on Intel Skylake. You’ll only wait 3 cycles for a 32-bit multiply, and you can start one every cycle. That’s if I’m reading the tables right. Non-cryptographic hashes often do multiply-rotate-multiply to get bits influencing each other faster than a multiply and a modulus would. xxHash arranges them so your CPU can be working on more than one at once.

                (But worrying about all bits influencing each other is just one possible tradeoff, and, e.g. the cheap functions in hashtable-based LZ compressors or Rabin-Karp string search don’t really bother.)

                1. 1

                  you’ll wait 26 cycles for a 32-bit divide on Intel Skylake

                  And looking at that table, 35-88 cycles to divide by a 64 bit divide. Wow. That’s so many cycles, I didn’t realize. But I should have: on a 2.4 GHz processor 26 cycles is 10.83 ns per op, which is roughly consistent with the author’s measurement of ~9 ns per op.

                  1. 1

                    That’s not what I asked. I asked a specific question.

                    can you do multiplication modulo a compile time constant fast?

                    similarly to how you can do division by a constant fast by implementing it as multiplication by the divisor’s multiplicative inverse in the group of integers modulo 2^(word size). clang and gcc perform this optimisation out the box already for division by a constnat. What I was asking is if there’s a similar trick for modulo by a constant. You obviously can do (divide by divisor, multiply by divisor, subtract from original number), but I’m wondering if there’s something quicker with a shorter dependency chain.

                    1. 1

                      OK, I get it. Although I knew about the inverse trick for avoiding DIVs for constant divisions, I didn’t know or think of extending that to modulus even in the more obvious way. Mea culpa for replying without getting it.

                      I don’t know the concrete answer about the best way to do n*c1%(2^32-5) or such. At least does intuitively seem like it should be possible to get some win from using the high bits of the multiply result as the divide-by-multiplying tricks do.

                2. 1

                  So does that mean that when the author says Dinkumware’s FNV1-based strategy is too expensive, it’s only more expensive because FNV1 is byte-by-byte and fibonacci hashing multiplying by 2^64 / Φ works on 8 bytes at a time?

                  Does that mean you could beat all these implementations by finding a multiplier that produces an even distribution when used as a hash function working on 8 byte words at a time? That is, he says the fibonacci hash doesn’t produce a great distribution, whereas multipliers like the FNV1 prime are chosen to produce good even distributions. So if you found an even-distribution-producing number for an 8 byte word multiplicative hash, would that then work just as well whatever-hash-then-fibonacci-hash? But be faster because it’s 1 step not 2?

                  1. 1

                    I think you’re right about FNV and byte- vs. word-wise multiplies.

                    Re: 32 vs. 64, it does look like Intel’s latest big cores can crunch through 64-bit multiplies pretty quickly. Things like Murmur and xxHash don’t use them; I don’t know if that’s because perf on current chips is for some reason not as good as it looks to me or if it’s mainly for the sake of older or smaller platforms. The folks that work on this kind of thing surely know.

                    Re: getting a good distribution, the limitations on the output quality you’ll get from a single multiply aren’t ones you can address through choice of constant. If you want better performance on the traditional statistical tests, rotates and multiplies like xxHash or MurmurHash are one approach. (Or go straight to SipHash, which prevents hash flooding.) Correct choice depends on what you’re trying to do.

                    1. 2

                      That makes me wonder what hash algorithm ska::unordered_map uses that was faster than FNV1 in dinkumware, but doesn’t have the desirable property of evenly mixing high bits without multiplying the output by 2^64 / φ. Skimming his code it looks like std::hash.

                      On my MacOS system, running Apple LLVM version 9.1.0 (clang-902.0.39.2), std::hash for primitive integers is the identity function (i.e. no hash), and for strings murmur2 on 32 bit systems and cityhash64 on 64 bit systems.

                      // We use murmur2 when size_t is 32 bits, and cityhash64 when size_t
                      // is 64 bits.  This is because cityhash64 uses 64bit x 64bit
                      // multiplication, which can be very slow on 32-bit systems.
                      

                      Looking at CityHash, it also multiplies by large primes (with the first and last bits set of course).

                      Assuming then that multiplying by his constant does nothing for string keys—plausible since his benchmarks are only for integer keys—does that mean his benchmark just proves that dinkumware using FNV1 for integer keys is better than no hash, and that multiplying an 8 byte word by a constant is faster than multiplying each integer byte by a constant?

                  2. 1

                    A fair point that came up over on HN is that people mean really different things by “hash” even in non-cryptographic contexts; I mostly just meant “that thing you use to pick hashtable buckets.”

                    In a trivial sense a fixed-size multiply clearly isn’t a drop-in for hashes that take arbitrary-length inputs, though you can use multiplies as a key part of variable-length hashing like xxHash etc. And if you’re judging your hash by checking that outputs look random-ish in a large statistical test suite, not just how well it works in your hashtable, a multiply also won’t pass muster. A genre of popular non-cryptographic hashes are like popular non-cryptographic PRNGs in that way–traditionally judged by running a bunch of statistical tests.

                    That said, these “how random-looking is your not-cryptographically-random function” games annoy me a bit in both cases. Crypto-primitive-based functions (SipHash for hashing, cipher-based PRNGs) are pretty cheap now and are immune not just to common statistical tests, but any practically relevant method for creating pathological input or detecting nonrandomness; if they weren’t, the underlying functions would be broken as crypto primitives. They’re a smart choice more often than you might think given that hashtable-flooding attacks are a thing.

                    If you don’t need insurance against all bad inputs, and you’re tuning hard enough that SipHash is intolerable, I’d argue it’s reasonable to look at cheap simple functions that empirically work for your use case. Failing statistical tests doesn’t make your choice wrong if the cheaper hashing saves you more time than any maldistribution in your hashtable costs. You don’t see LZ packers using MurmurHash, for example.

                  1. 5

                    In case it’s still tagged crypto, note that these aren’t generators for cryptography; they’re for producing noise with good statistical properties faster than cryptographic generators would. That’s useful for, for example, Monte Carlo simulation.

                    It is interesting to note that some cryptographic primitives are fast enough to be in the comparison table, though. ChaCha20/8 and AES-128 counter mode are both under one cycle/byte on new x64 hardware and really well studied. If you find nonrandomness in them of any sort that would break your Monte Carlo simulation, you can probably get a paper published about it at least.

                    1. 4

                      chromium-browser is scrutinized closely enough that this would be noticed on ubuntu, right?

                      1. 5

                        The sandbox engine downloading and running ESET actually appears to be in Chromium: https://cs.chromium.org/chromium/src/chrome/browser/safe_browsing/chrome_cleaner/ so developpers are free to review it and remove any reference to it. If my memory serve me well, Chrome Cleaner is not special and should appear in chrome://components/ along other optional close source components, although I don’t have a windows machine to validate right now. It should (Or at least used to) be disabled for other build than Google Chrome.

                        1. 2

                          Thanks. It doesn’t appear in chrome://components for me, at any rate.

                          1. 1

                            If I look at it on windows I can see the entry: Software Reporter Tool - Version: 27.147.200

                            1. 1

                              Excellent, a positive control.

                        2. 2

                          isra17’s reply implies there’s no scanner in Chromium, only Chrome. [I wrote this referring to his separate comment–now he has another reply here.] It probably wouldn’t make sense to have this on Linux anyway, just because there isn’t the same size of malware ecosystem there.

                          (And I think the reporting/story would be different if the scanner were open source–we’d have an analysis based on the source code, people working on patched Chromium to remove it, and so on.)

                          1. 1

                            I’m curious about MacOS. I don’t run Chrome usually, but I have to in some cases, e.g. to use Google Meets for work.

                            1. 2

                              I don’t have an authoritative answer, but https://www.blog.google/products/chrome/cleaner-safer-web-chrome-cleanup/ only talks about Windows.

                              1. 2

                                I don’t see it in chrome://components on my Mac, if that is indeed where it is supposed to appear.

                          1. 3

                            We’re a small shop (~15 folks, ~10 eng), but old (think early 2000s, using mod_perl at the time). Not really a startup but we match the description otherwise so:

                            It’s a Python/Django app, https://actionk.it, which some lefty groups online use to collect donations, run their in-person event campaigns and mailing lists and petition sites, etc. We build AMIs using Ansible/Packer; they pull our latest code from git on startup and pip install deps from an internal pip repo. We have internal servers for tests, collecting errors, monitoring, etc.

                            We have no staff focused on ops/tools. Many folks pitch in some, but we’d like to have a bit more capacity for that kind of internal-facing work. (Related: hiring! Jobs at wawd dot com. We work for neat organizations and we’re all remote!)

                            We’ve got home-rolled scripts to manage restarting our frontend cluster by having the ASG start new webs and tear the old down. We’ve scripted hotfixes and semi-automated releases–semi-automated meaning someone like me still starts each major step of the release and watches that nothing fishy seems to be happening. We do still touch the AWS console sometimes.

                            Curious what prompts the question; sounds like market research for potential product or something. FWIW, many of the things that would change our day-to-day with AWS don’t necessarily qualify as Solving Hard Problems at our scale (or 5x our scale); a lot of it is just little pain points and time-sucks it would be great to smooth out.

                            1. 6

                              FYI, I get a “Your connection is not private” when going to https://actionk.it. Error is NET::ERR_CERT_COMMON_NAME_INVALID, I got this on Chrome 66 and 65.

                              1. 2

                                Same here on Safari.

                                1. 1

                                  Sorry, https://actionkit.com has a more boring domain but works :) . Should have checked before I posted, and we should get the marketing site a cert covering both domains.

                                2. 1

                                  Firefox here as well.

                                  1. 1

                                    Sorry, I should have posted https://actionkit.com, reason noted by the other comments here.

                                  2. 1

                                    https://actionk.it

                                    This happens because the served certificate it for https://actionkit.com/

                                    1. 1

                                      D’oh, thanks. Go to https://actionkit.com instead – I just blindly changed the http://actionk.it URL to https://, but our cert only covers the boring .com domain not the vanity .it. We ought to get a cert that covers both. (Our production sites for clients have an automated Let’s Encrypt setup without this problem, for the record :) )

                                  1. 9

                                    The main topic discussed here is now known as the Han Unification, for those curious to catch up with what’s happened since this was written.

                                    1. 2

                                      A consequence that hadn’t soaked in for me is that you need to know the language of a piece of text to reliably display it correctly. I’m guessing Twitter and Facebook and such are assigning languages to comments, etc. based on content, the poster’s language preferences/location/Accept-Language, and who knows what else.

                                      (And then if a user switches into not-their-native-language for a post, or mixes languages by quoting text in another language or whatever, there’s a whole other level of difficulty.)

                                    1. 7

                                      Compared to, say, the ARM whitepaper, Intel’s still reads to me as remarkably defensive, especially the section on “Intel Security Features and Technologies.” Like, we know Intel has no-execute pages, as other vendors do, and we know they aren’t enough to solve this problem. And reducing the space where attackers can find speculative gadgets isn’t solving the root problem.

                                      Paper does raise the interesting question of exactly how expensive the bounds-check bypass mitigation will be for JS interpreters, etc. To foil the original attack, you don’t have to stop every out-of-bounds load, you just have to keep the results from getting leaked back from speculation-land to the “real world.” So you only need a fence between a potentially out-of-bounds load and a potentially leaky operation (like loading an array index that depends on the loaded value). You might even be able to reorder some instructions to amortize one fence across several loads from arrays. And I’m sure every JIT team has looked at whether they can improve their bounds check elimination. There’s no lack of clever folks working on JITs now, so I’m sure they’ll do everything that can be done.

                                      The other, scarier thing is bounds checks aren’t the only security-relevant checks that can get speculated past, just an easy one to exploit. What next–speculative type confusion? And what if other side-channels besides the cache get exploited? Lot of work ahead.

                                      1. 5

                                        FWIW: a possible hint at Intel’s future directions (or maybe just a temp mitigation?) are in the IBRS patchset to Linux at https://lkml.org/lkml/2018/1/4/615: one mode helps keep userspace from messing with kernel indirect call speculation and another helps erase any history the kernel left in the BTB. I bet both of these are blunt hammers on current CPUs (‘cause a microcode update can only do so much–turn a feature off or overwrite the BTB or whatever), but they’re defining an interface they want to make work more cheaply on future CPUs. It also seems to be enabled in Windows under the name “speculation control” (https://twitter.com/aionescu/status/948753795105697793)

                                        ARM says in their whitepaper that most ARM implementations have some way to turn off branch prediction or invalidate branch predictor state in kernel/exception handler code, which sounds about in line with what Intel’s talking about. The whitepaper also talks about barriers to stop out of bounds reads. The language is a bit vague but I think they’re saying an existing conditional move/select works on current chips and a new instruction, CLDB will barrier for future chips that provides just the minimum you need to avoid the cache side channel attack.

                                        1. 25

                                          Zenyep Tufekci said, “[T]oo many worry about what AI—as if some independent entity—will do to us. Too few people worry what power will do with AI.” (The thread starts with a result that, in some circumstances, face recognition can effectively identify protesters despite their efforts to cover distinguishing features with scarves, etc.)

                                          And, without really having to have tech resembling AI, what they can do can look like superhuman abilities looked at right. A big corporation doesn’t have boundless intelligence but it can hire a lot of smart lawyers, lobbyists, and PR and ad people, folks who are among the best in their fields, to try and shift policy and public opinion in ways that favor the sale of lots of guns, paying a lot for pharmaceuticals, use of fossil fuels, or whatever. They seem especially successful shifting policy in the U.S. recently, with the help of recent court decisions that free some people up to spend money to influence elections (and the decisions further back that established corps as legal people :/).

                                          With recent/existing tech, companies have shown they can do new things. It’s cheap to test lots of possibilities to see what gets users to do what you want, to model someone’s individual behavior to keep them engaged. (Here’s Zenyep again in a talk on those themes that I haven’t watched.) The tech giants managed to shift a very large chunk of ad spending from news outlets and other publishers by being great at holding user attention (Facebook) or being smarter about matching people to ads than anyone else (Google), or shift other spending by gatekeeping what apps you can put on a device and taking a cut of what users spend (Apple with iOS), or reshape market after market with digitally-enabled logistics, capital, and smart strategy (Amazon). You can certainly look forward N years to when they have even more data and tools and can do more. But you don’t really even have to project out to see a pretty remarkable show of force.

                                          This is not, mostly, about the details of my politics, nor is it to suggest silly things like that we should roll back the clock on tech; we obviously can’t. But, like, if you want to think about entities with incredible power that continue to amass more and how to respond to them, you don’t have to imagine; we have them right here and now!

                                          1. 3

                                            Too few people worry what power will do with AI.

                                            More specifically, the increasingly police statey government.

                                          1. 42

                                            Reminds me of a quote:

                                            Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.

                                            • Brian W. Kernighan
                                            1. 13

                                              :) Came here to post that.

                                              The blog is good but I’m not convinced by his argument. It seems too worried about what other people think. I agree that we have to be considerate in how we code but forgoing, say, closures because people aren’t familiar with them or because we’re concerned about how we look will just hold back the industry. Higher level constructs that allow us to simplify and clarify our expression are a win in most cases. People can get used to them. It’s learning.

                                              1. 8

                                                I think he may not disagree with you as much as it sounds like. I don’t think that sentence says “don’t use closures,” just that they’re not for impressing colleagues. (It was: “You might impress your peers with your fancy use of closures… but this no longer works so well on people who have known for a decades what closure are.”)

                                                Like, at work we need closures routinely for callbacks–event handlers, functions passed to map/filter/sort, etc. But they’re just the concise/idiomatic/etc. way to get the job done; no one would look at the code and say “wow, clever use of a closure.” If someone does, it might even signal we should refactor it!

                                                1. 5

                                                  It seems too worried about what other people think.

                                                  I agree with your considerations on learning.
                                                  Still, just like we all agree that good code must be readable, we should agree that it should be simple too. If nothing else, for security reasons.

                                                2. 2

                                                  On the other hand, sometimes concepts at the limit of my understanding (like advanced formal verification techniques) allow me to write better code than if I had stayed well within my mental comfort zone.

                                                1. 3

                                                  Here are some things:

                                                  These are, again, perf but not profiling, but I really want to write (and have started on) a super beginner-focused post about memory use. I wish someone would write a balanced intro to concurrency–from StackOverflow, a couple common mistakes are dividing work into very tiny tasks (with all the overhead that causes) when NumCPU workers and large chunks would do better, and using channels when you really do just want a lock or whatever kind of shared datastructure.

                                                  1. 11

                                                    When you mentioned channel costs, I wondered if there was communication via unbuffered channels, which can lead to traffic jams since the sender can’t proceed ‘til each recipient is ready. Looking at the old chat_handler.go that doesn’t seem to be the case, though. The three goroutines per connection thing isn’t without precedent either; I think at least the prototypes of HTTP/2 support for the stdlib were written that way.

                                                    It looks like maybe the socketReaderLoop could be tied in with ChatHandler.Loop(): where socketReaderLoop communicates with Loop on a channel, just inline the code that Loop currently runs in response, then call socketReaderLoop at the end of Loop instead of starting it asynchronously. You lose the 32-message buffer, but the end-user-facing behavior ought to be tolerable. (If a user fills your TCP buffers, seems like their problem and they can resend their packets.) However, saving one goroutine per connection probably isn’t a make-or-break change.

                                                    Since you talk about memory/thrashing at the end, one of the more promising possibilities would be(/have been) to do a memprofile to see where those allocs come from. A related thing is Go is bad about respecting a strict memory limit and its defaults lean towards using RAM to save GC CPU: the steady state with GOGC=100 is around 50% of the peak heap size being live data. So you could start thrashing with 512MB RAM once you pass 256MB live data. (And really you want to keep your heap goal well under 512MB to leave room for kernel stuff, other processes, the off-heap mmapped data from BoltDB, and heap fragmentation.) If you’re thrashing, GC’ing more often might be a net win, e.g. GOGC=50 to be twice as eager as the default. Finally, and not unrelated, Go’s collector isn’t generational, so most other collectors should outdo it on throughput tests.

                                                    Maybe I’m showing myself not to be a true perf fanatic, but 1.5K connections on a Pi also doesn’t sound awful to me, even if you can do better. :) It’s a Pi!

                                                    1. 2

                                                      Thank you for such a detailed analysis and looking into code before you commented :) positive and constructive feedback really helps. I have received a great amount of feedback and would definitely try with your tips. BoltDB is definitely coming up every-time and I think it contributes to memory usage as well. Some other suggestions include use fixed n workers and n channels, backlog building up, and me not doing serialization correctly. I will definitely update my benchmark code and test it with new fixes; and if I feel like code is clean enough would definitely love to move back.

                                                      1. 3

                                                        Though publicity like this is fickle, you might get a second hit after trying a few things and then explicitly being like “hey, here’s my load test, here are the improvements I’ve done already; can you guys help me go further?” If you don’t get the orange-website firehose, you at least might hear something if you post to golang-nuts after the Thanksgiving holiday ends or such.

                                                        Looking around more, I think groupInfo.GetUsers is allocating a string for each name each time it’s called, and then when you use the string to get the object out there’s a conversion back to []byte (if escape analysis doesn’t catch it), so that’s a couple allocs per user per message. Just being O(users*messages) suggests it could be a hotspot. You could ‘downgrade’ from the Ctrie to a RWLocked map (joins/leaves may wait often, but reads should be fastish), sync.Map, or (shouldn’t be needed but if you were pushing scalability) sharded RWLocked map. But before you put in time trying stuff like that, memprofile is the principled/right way to approach alloc stuff (and profile for CPU stuff)–figure out what’s actually dragging you down.

                                                        True that there are likely lighter ways to do the message log than Bolt. Just files of newline-separated JSON messages may get you far, though don’t know what other functionality you support.

                                                        FWIW, I also agree with the commenter on HN saying that Node/TypeScript is a sane approach. (I’m curious about someday using TS in work’s frontend stuff.) Not telling you what to use; trying to get Go to do things is just a hobby of mine, haha. :)

                                                    1. 7

                                                      I’ve been using a Samsung ARM Chromebook (1st generation) as my daily driver for the past 4 years. It’s a lowend, underpowered machine with nothing to write home about, but it can support a full Arch Linux ARM installation, run a web browser just fine, and have an adequate number of terminals. I love it. The battery life hasn’t changed at all since I bought it, it’s still consistently getting >7 hours. I have other friends with ARM laptops from other manufacturers, the battery life story is one I hear consistently.

                                                      1. [Comment removed by author]

                                                        1. 6

                                                          dz, I wrote up a blog post about this: http://blog.jamesluck.com/installing-real-linux-on-arm-chromebook I completely replaced ChromeOS with Archlinux ARM on the internal SSD. The gist of the process is that you make a live USB, boot to that, and then follow the same procedure for making the bootable USB onto the SSD. You just have to untar the root filesystem, edit /etc/fstab, and correct the networking config!

                                                          1. 1

                                                            If it’s anything like my Samsung ARM Chromebook, you can boot a different os off external storage (i.e. an SD card), or possibly replace Chrome OS on the internal solid-state storage.

                                                            1. 1

                                                              You can replace ChromeOS. Here’s the Arch wiki on the 1st gen Samsung ARM Chromebook and the Samsung Chromebook Plus.

                                                          1. 6

                                                            Not sure how responsive this is to the specific q, but I have an ARM Samsung Chromebook Plus. Initially I used developer mode and crouton, which provides an Ubuntu userspace in a chroot hitting Chrome OS’s Linux kernel, but more recently I use a setup like Kenn White describes, where you’re not in dev mode and you use Termux, an Android app that provides a terminal and apt-based distro inside the app’s sandbox. (Termux has a surprising number of apt packages covered and you can build and run executables in there, etc. You can also just use it on any Android phone or tablet.) In both cases I just use terminal stuff, though Crouton does support running an X session. You can apparently boot Linux on the Chromebook Plus full-stop; I don’t know about compatibility, etc. Amazon’s offering it for $350 now.

                                                            I’ve long been more indifferent to CPU speed than a lot of folks (used netbooks, etc.), but for what it’s worth I find the Chromebook Plus handles my needs fine. The chip is the RK3399, with two out-of-order and four in-order ARMv8 cores and an ARM Mali GPU. I even futz with Go on it, and did previously on an even slower ARM Chromebook (Acer Chromebook 13). The most noticeable downsides seemed more seem to be memory-related: it has 4GB RAM but…Chrome. If I have too much open the whole thing stalls a bit while it tries (I think) compressing RAM and then stopping processes. Also, there’s a low level of general glitchiness: sometimes I’ll open it up and it’ll boot fresh when it should be resuming from sleep. I had to do some tweaking to do what I wanted inside Chrome OS’s bounds, notably to connect to work’s VPN (had to make a special JSON file and the OS doesn’t readily give useful diagnostics if you get anything wrong), but it’s just a matter of round tuits.

                                                            From the ARM vs. Intel angle, Samsung actually made a Chromebook Pro with the low-power but large-core Core m3. Reviewers seemed to agree the Pro was faster, and I think I saw it was like 2x on JavaScript benchmarks, but Engadget wasn’t fond of its battery life and The Verge complained its Android app support was limited.

                                                            1. 3

                                                              Since we’re talking about the Plus, a few notes on my experience. I have an HP chromebook with an m3 as well, for comparison. None of the distinguishing features of the Plus were positives. Chrome touch support is just meh, and the keyboard was annoyingly gimped by very small backspace, etc. Touchpad was noticeably worse. Android support was basically useless for me.

                                                              More on topic, the ARM cpu was capable enough, but the system wasn’t practically lighter or longer lasting than the m3 system. It was cheaper, perhaps 50% so, but that’s still only a difference of a few hundred dollars. (I mean, that’s not nothing, but if this is a machine you’ll be using for hours per day over several years… Price isn’t everything.)

                                                              (I would not pick the Samsung Pro, either. Not a great form factor imo.)

                                                              More generally, if you don’t have ideological reasons to pick arm, the intel m3 is a pretty remarkable cpu.

                                                            1. 5

                                                              The technical description doesn’t fully convey some of the, uh, magic that’s been done with it. One of the coolest things is Arcadia, which gives you access to game-engine fun with a Clojure REPL. The MAGIC stuff is apparently key to cutting allocations so your games aren’t all janked up by collection pauses.

                                                              The author, Ramsey Nasser, also posts about a bunch of fun stuff on Twitter–other recent stuff he’s messed with includes spline drawing tool and (related, I think) Arabic typography

                                                              1. 6

                                                                Posted this in the orange place, figure that it’s worth saying here:

                                                                FWIW, I think this mostly shows that the GOGC defaults will surprise you when your program keeps a small amount of live data but allocates quickly and is CPU bound. By default, it’s trying to keep allocated heap size under 2x live data. If you have a tiny crypto bench program that keeps very little live data around, it won’t take many allocations to trigger a GC. So the same code benchmarked here would trigger fewer GCs in the context of a program that had more data live. For example, it would have gone faster if the author had allocated a 100MB []byte in a global variable.

                                                                If your program is far from exhausting the RAM but is fully utilizing the CPU, you might want it to save CPU by using more RAM, equivalent to increasing GOGC here. The rub is that it’s hard for the runtime ever be sure what the humans want without your input: maybe this specific Go program is a little utility that you’d really like to use no more RAM than it must, to avoid interference with more important procs. Or maybe it’s supposed to be the only large process on the machine, and should use a large chunk of all available RAM. Or it’s in between, if there are five or six servers (or instances of a service) sharing a box. You can imagine heuristic controls that override GOGC in corner cases (e.g., assume it’s always OK to use 1% of system memory), or even a separate knob for max RAM use you could use in place of GOGC. But the Go folks tend to want to keep things simple, so right now you sometimes have to play with GOGC values to get the behavior you want.

                                                                1. 6

                                                                  As best I can reconstruct, the thing the Chrome team was dealing with was that Chrome had to delay starting to scroll until they ran touch event handlers, because the handler could always preventDefault() to disable scrolling, but 1) 80% of touch event handlers weren’t actually calling preventDefault(), 2) this added ~150ms latency before scroll started in a sample case, and doubled 95th-percentile latency in their field tests, and 3) even before adding their passive option, there was a CSS alternative for preventing scroll-on-touch without JS required at all. (They also argue that browsers already have timeouts for these handlers, so they weren’t entirely reliable in the first place.) So the post author’s app probably used touchstart to prevent scrolling and then Chrome broke that.

                                                                  If the Chrome folks wanted to do this and were willing to go to a lot more effort to mitigate breakage, maybe they could have effectively tried to patch up the damage after a passive touchstart handler called preventDefault: jump back to the old scroll position, log a complaint to the console, and treat this handler (or all handlers) as active for subsequent events. It would be very complicated (both for Chrome hackers to code up and for Web devs to understand–consider that arbitrary DOM changes can happen between the scroll starting and the preventDefault!), lead to a janky scroll-and-snap-back sequence for users, and still break for more complex cases, but would likely avoid breaking some previously-usable pages entirely.

                                                                  I get the value of working pretty hard to maintain backwards compatibility, and it’s possible the Chrome folks moved a little fast here; they did actually take the time to gather data (which they use sometimes: a GitHub comment provides some general background on when they’ve backed out interventions), but the ‘scrolling intervention’ page reports only 80% of listeners were passive, and 20% breakage seems high.

                                                                  For what it’s worth, I’m not absolutist about targeted changes that might break some standards-compliant pages; as a dev I see compat impact from changes that are outside the scope of standards (sunsetting Flash, autofill changes, tracking protection, things popular extensions do) and bug fixes; it seems like there’s just always a cost-benefit balance and you need to see how things are used out on the Web, versus being able to reason from first principles what changes are OK or not. And the APIs being as old and quirky as they are, there are places where you really could subtly tweak some API and break almost nobody and improve things for almost everyone; this seems like the common category where we’d like to allow something to be async but that can break some uses of it.

                                                                  It’s tricky to figure out if a change is worth it, and again, they might have gotten it wrong here, but I wouldn’t want “you can never break anything” to be the lesson from this.