1. 25

  2. 14

    Thank goodness for Reader View.

    My complaint isn’t the performance but the layout and the floating TOC, which gets in the way of reading.

    1. 4

      Killing TOC, thanks

    2. 9

      This website is terrible to use on a mobile device. I wish more sites were plain text…

      1. 18

        To be fair, this website is also terrible on desktop.

        Redacted since it’s needlessly harsh

        1. 9

          I’m a co-founder of SmoothTerminal. Sorry about the performance issues - we just launched a UX/UI refresh and none of us noticed them until yesterday. Are you on Safari? That’s the only place we’ve been able to reproduce the slow scrolling, and there’s some particularly weird behaviour at times (the Safari web inspector claims an element is where it should be, but it’s painted somewhere else entirely). We’re working on it, and I’m embarrassed that it’s bad. Mea culpa.

          1. 24

            I shouldn’t have been this harsh. Sometimes I forget about the human on the other side of the screen. I’m the one embarrassed, sorry about that.

            The issue is at its worst in Chrome on OS X. Firefox on Linux on my desktop is less bad, but scrolling still feels a bit tampered with.

            1. 20

              This is one of the reasons I like lobste.rs: we all occasionally make poorly thought-out or harsh comments, but this is one of the places on the internet where people apologize because they think about the person on the other side. Thanks for making the world decent!

            2. 5

              Honestly it’s pretty rough to use on desktop as well. The floating ToC blocks the text in its default, expanded position, and on wide monitors the layout is much wider than seems reasonable. The “reader” view in Firefox cleans it up nicely – it just gives raw text with a reasonable column width, which is all anyone wants anyway.

              1. 6

                Killing TOC, thanks

                1. 5

                  Thanks for being responsive!

                  1. 1

                    No problem. We saw some performance gains from that and it was definitely broken in Safari, but the real performance gains came from the deploy we just performed - we stopped using background-attachment: fixed, which was causing extreme redraw churn. Both really needed to be done, though.

              2. 4

                FWIW, it is pretty difficult to read with JavaScript disabled (using uMatrix). Perhaps you’re applying styles in JavaScript?

                1. 2

                  This is what I was thinking too. I just gave up and opted for Reader View in Firefox.

                  1. 1

                    We are, and it’s in support of our themes. We should make sure to ship the default theme by default, but my guess is it’s flipped in JS and has no good default fallback. Thanks.

                    1. 1

                      We were accidentally shipping all styles over JS. We’re prepping a PR that properly sends the stylesheet now. That’s embarrassing.

                      1. 1

                        Stuff happens! Still displays poorly in Firefox with uMatrix, but maybe you’re still working on the PR.

                        It probably looks fine in e.g. elinks or eww.

                    2. 1

                      Mobile Firefox here: I could see the first paragraph but only the blue background after that, until I switched to Reader View.

                      1. 1

                        What mobile OS? Is that still happening? I just tried it in mobile Firefox on Android and it worked fine, but we also just deployed a bunch of tweaks based on the righteous, justified shellacking we got for performance yesterday.

                        1. 1

                          It works much better now, thanks!

                2. 4

                  If you are doing request/response services that rarely mutate the request buffer, consider using other serialization methods so you can get zero-copy performance.

                  1. 2

                    What do you mean zero-copy performance? Zero copies of what?

                    1. 14

                      When protobufs are unpacked, they’re copied from the serialized binary stream into structs in the native language. FlatBuffers are an alternative, also by Google, that don’t need to be unpacked at all. The data transmitted over the network is directly usable.

                      Interestingly, there are also some zero-copy, zero-allocation JSON decoders, like RapidJSON. Its methods return const references to strings directly in the source buffer rather than copying them out. But of course it still needs to parse and return numbers and booleans, rather than using them directly from the message like FlatBuffers.

                      The biggest problem with copying is copying large chunks of binary data out of the message. Suppose you wanted to implement a file storage API using gRPC. Right now to handle a read call the server would have to copy the data into memory, copy it into the serialization buffer, and send it. It would be much better to avoid that extraneous copy for serialization.

                      Our internal protobuf implementation handles this with something called cords—essentially large copy-on-write shared buffers—but cords aren’t open source yet. You can see references to ctype=CORD in the protobuf open source code and docs, and there’s a little bit of discussion here on the public mailing list.
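
                      To make the RapidJSON point concrete, here’s a minimal sketch of in-situ parsing (the field names and payload are invented for illustration): the decoded strings are pointers back into the original buffer rather than copies of it.

                      ```cpp
                      #include <cstdio>
                      #include "rapidjson/document.h"

                      int main() {
                          // Mutable buffer: ParseInsitu() unescapes strings in place, and the
                          // parsed string values point directly into this buffer (no copies).
                          char buf[] = R"({"name":"grpc-demo","size":2048})";

                          rapidjson::Document d;
                          d.ParseInsitu(buf);               // zero-copy parse of the source buffer
                          if (d.HasParseError()) return 1;

                          const char* name = d["name"].GetString();  // points into buf
                          int size = d["size"].GetInt();             // numbers still have to be decoded

                          std::printf("name=%s size=%d\n", name, size);
                      }
                      ```

                      The catch is lifetime: buf has to outlive the document, since every parsed string aliases it.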

                      1. 2

                        +1 to this. In a real-world test case with ~webscale~ traffic, the heap fragmentation caused by deserialising ~10,000 protobufs per minute was enough to inexorably exhaust available memory within minutes, even with jemalloc and tuning to minimise fragmentation, and after doubling the memory available a few times to check that it wouldn’t cap out. I kept bumping into cord references online and wishing they were part of the open-source implementation.

                        Swapped out protobuf for a zero-copy solution (involving RapidJSON! :D) — which meant swapping out gRPC — and memory use became a flat line. We’ve become somewhat avoidant of gRPC since this and some other poor experiences.

                        1. 4

                          That’s weird, 10k protobufs per minute isn’t very many. As you might imagine, we do a lot more at Google and don’t have that problem.

                          Since you mention cords, were these protobufs with large strings?

                          What did you tune in jemalloc? Was this in a containerized environment? Did you limit the max number of arenas?

                          1. 3

                            > Since you mention cords, were these protobufs with large strings?

                            Yes – the documents were about as simple as it gets: two strings, one huge, one tiny. The response to most requests was a repeated string, but we found that returning an empty array didn’t affect the heap fragmentation – just parsing the requests was enough.

                            > What did you tune in jemalloc?

                            Honestly, I tried a bit of everything, but first on the list was lg_extent_max_active_fit, as well as adjusting the decay options to force memory to be returned to the OS sooner (and so stave off the OOM killer). It performed much better than the default system malloc, but increasing the traffic was enough to bring back the steady increase in apparent memory use.

                            (At any point in time, turning off traffic to the service would cause the memory use increase to stop, and then after some minutes, depending on decay options, memory would return to baseline. I mention this explicitly just to make sure that we’re 100% sure there was no leak here – repeated tests, valgrind, jemalloc leak reporting, etc. all confirmed this.)
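
                            For anyone hitting the same wall, that kind of tuning goes through jemalloc’s option string. A rough sketch (the values are illustrative only, and exact option names depend on the jemalloc version):

                            ```cpp
                            // Sketch: ask jemalloc to purge unused pages back to the OS sooner.
                            // Either set it at runtime, without rebuilding:
                            //   MALLOC_CONF="dirty_decay_ms:1000,muzzy_decay_ms:0,background_thread:true" ./service
                            // or compile the options into the binary via jemalloc's malloc_conf global:
                            extern "C" const char* malloc_conf =
                                "dirty_decay_ms:1000,"     // purge dirty pages after ~1s of disuse
                                "muzzy_decay_ms:0,"        // hand muzzy pages back to the OS immediately
                                "background_thread:true";  // do the purging from a background thread
                            ```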

                            > Was this in a containerized environment?

                            Yes, Kubernetes. This does complicate things, of course.

                            > Did you limit the max number of arenas?

                            No, I didn’t – the stats didn’t give me any off feelings about how threads were being multiplexed with arenas. (Things looked sensible given what was going on.) I did try running jemalloc’s background thread, but as you might expect, that didn’t do much.

                            1. 2

                              Ah. I ask about arenas because of this problem. In that example it happened with glibc, but the same could happen with jemalloc.

                              I ask about containers because max arena count is often heuristically determined from core count, and containers expose the core count of the host system. You can easily run e.g. a 4 core container on a 40 core server and container-unaware code will incorrectly believe it has 40 cores to work with. I believe jemalloc defaults to 4 arenas per core, so 160 arenas in that example. That could massively multiply your memory footprint, just as it did in the linked post.

                              If you didn’t notice a surprisingly large amount of arenas in the stats, that probably wasn’t the issue.
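
                              For completeness, a small sketch of how to check that, assuming the binary is linked against jemalloc with its default unprefixed API:

                              ```cpp
                              #include <cstdio>
                              #include <jemalloc/jemalloc.h>

                              int main() {
                                  unsigned narenas = 0;
                                  size_t sz = sizeof(narenas);
                                  // In a container this can come out surprisingly large if jemalloc sized
                                  // its arena pool from the host's core count rather than the CPU quota.
                                  if (mallctl("arenas.narenas", &narenas, &sz, nullptr, 0) == 0)
                                      std::printf("jemalloc arenas: %u\n", narenas);
                              }
                              ```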

                              At Google all binaries are linked with tcmalloc. I don’t know whether that matters, but it’s another possible difference.

                              If parsing empty protobufs was enough to cause memory fragmentation, I doubt cords would have made a difference either. But I agree, I wish they were open source. I’m sure they’re coming at some point, they just have to be detangled from Google internal code. That’s the whole point of Abseil, extracting core libraries into an open source release, so Google can open source other things more easily.

                              1. 1

                                Aaaah, ouch, yes, that makes sense; that could easily have bitten me, and I just got lucky that our machines had only 4 cores. I do wonder about tcmalloc.

                                > If parsing empty protobufs was enough to cause memory fragmentation, I doubt cords would have made a difference either.

                                I may have been a little unclear – we were never parsing empty protobufs, always valid (full) requests, but we changed it so we returned empty/zero results to the RPCs, in case constructing the response protobufs was responsible for the memory use. So it’s possible cords would have helped some, but I have my doubts too.

                                Abseil looks neat! I’m glad such a thing exists.

                              2. 2

                                Apache Arrow uses gRPC with a message effectively similar to yours: some metadata and a giant binary blob. It is possible to use zero-copy:

                                https://github.com/apache/arrow/blob/master/cpp/src/arrow/flight/serialization-internal.h

                                1. 1

                                  Whew! That is interesting. Thank you for the link, I’ll digest this. Unfortunately the project my experience was with is dead in the water, so this will have to be for the future.

                        2. 1

                          The contents (or portions thereof) of the input buffer.

                          As an example, if what you’re parsing out of the buffer is a collection of strings (typical of an HTTP request, for instance), zero-copy parsing would return pointers into the buffer, rather than copying sections of the buffer out into separately-allocated memory.

                          It’s not always beneficial (for instance, if you keep parts of the request around for a long time, they will force you to keep the entire buffer allocated; or if you need to transform the contents of the buffer, separate allocations will be required anyway), but in the right circumstances, it can speed parsing significantly.

                          Neither JSON (due to string escaping and stringified numbers) nor protobuf (varints, mostly) is terribly well-suited to zero-copy parsing, but some other serialization formats (Cap’n Proto being the one I’m most aware of) are specifically designed to enable it.
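
                          As a toy illustration of the idea (not any particular library’s API), a zero-copy parser for something like the HTTP-request example above can hand back views that alias the input buffer instead of freshly allocated strings:

                          ```cpp
                          #include <iostream>
                          #include <string_view>
                          #include <vector>

                          // Split a request buffer into lines without copying: each string_view
                          // aliases `buf`, so `buf` must outlive every view we hand out.
                          std::vector<std::string_view> split_lines(std::string_view buf) {
                              std::vector<std::string_view> lines;
                              size_t pos = 0;
                              while (pos < buf.size()) {
                                  size_t end = buf.find("\r\n", pos);
                                  if (end == std::string_view::npos) end = buf.size();
                                  lines.push_back(buf.substr(pos, end - pos));
                                  pos = end + 2;
                              }
                              return lines;
                          }

                          int main() {
                              std::string_view request = "GET /index.html HTTP/1.1\r\nHost: example.com";
                              for (auto line : split_lines(request))
                                  std::cout << line << '\n';
                          }
                          ```

                          The views are only valid for as long as the buffer is, which is exactly the lifetime trade-off noted above.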

                          1. 1

                            AFAIK, the binary format of protobuf allows you to do zero-copy parsing of string/binary?

                            1. 1

                              Yes, definitely; string and bytes fields’ data is unchanged and whole in the serialised form.
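
                              To illustrate why, here’s a hand-rolled sketch of the wire format (not the protobuf library’s API; the field number and payload are invented): a string or bytes field is serialised as a tag, a varint length, and then the raw bytes, so a reader can return a view straight into the buffer.

                              ```cpp
                              #include <cstdint>
                              #include <iostream>
                              #include <optional>
                              #include <string_view>

                              // Read a base-128 varint at `pos`, advancing `pos` past it.
                              static std::optional<uint64_t> read_varint(std::string_view buf, size_t& pos) {
                                  uint64_t v = 0;
                                  for (int shift = 0; pos < buf.size() && shift < 64; shift += 7) {
                                      uint8_t b = static_cast<uint8_t>(buf[pos++]);
                                      v |= uint64_t(b & 0x7f) << shift;
                                      if ((b & 0x80) == 0) return v;
                                  }
                                  return std::nullopt;  // truncated or overlong varint
                              }

                              // Return field `wanted` (assumed length-delimited, wire type 2) as a
                              // view into `buf`; the string/bytes payload itself is never copied.
                              std::optional<std::string_view> find_bytes_field(std::string_view buf, uint32_t wanted) {
                                  size_t pos = 0;
                                  while (pos < buf.size()) {
                                      auto key = read_varint(buf, pos);
                                      if (!key || (*key & 0x7) != 2) return std::nullopt;  // sketch handles wire type 2 only
                                      auto len = read_varint(buf, pos);
                                      if (!len || pos + *len > buf.size()) return std::nullopt;
                                      if (uint32_t(*key >> 3) == wanted) return buf.substr(pos, *len);
                                      pos += *len;
                                  }
                                  return std::nullopt;
                              }

                              int main() {
                                  // Field 1, length 5, then the bytes "hello", exactly as they appear in the message.
                                  std::string_view msg = "\x0a\x05hello";
                                  if (auto v = find_bytes_field(msg, 1))
                                      std::cout << *v << '\n';  // prints "hello" without copying it out
                              }
                              ```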

                      2. 4

                        On Firefox, scrolling this page makes my CPU spike and my laptop’s fans spin. You probably don’t need 2MB of JS to display some text.

                        1. 2

                          I’m playing with capnproto rpc now, and finding it slightly confusing. I imagine it’s easier for people who already understand gRPC among many clients and servers.