Threads for rsdbdr203

  1. 18

    I found that when programming Rust I tend to use the type system to my advantage, rather than sprinkle runtime asserts everywhere.

    Looking at some examples from your blog post:

    For example, if I’m writing a warehouse stock system, parts of my program might assume properties such as “an item’s barcode is never empty”.

    In that case you can represent your barcode as Option<Barcode> where it can be empty, and use Barcode where it cannot. Then you assert as needed by expecting the barcode. You cannot downgrade that check to a debug_assert!, though, and rightly so: if the check were compiled out in release mode, you would have to read the value unchecked (e.g. with unwrap_unchecked), and doing that on a None is undefined behavior.
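    As a concrete sketch of that idea (the Barcode type and its methods here are made up for illustration):

```rust
// Hypothetical newtype: the only way to build a Barcode is through
// `new`, so any code holding a Barcode knows it is non-empty.
#[derive(Debug, PartialEq)]
pub struct Barcode(String);

impl Barcode {
    /// Returns `None` for an empty input, so a `Barcode` value
    /// always holds a non-empty string.
    pub fn new(s: &str) -> Option<Self> {
        if s.is_empty() { None } else { Some(Self(s.to_string())) }
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}

fn main() {
    assert!(Barcode::new("").is_none());
    assert_eq!(Barcode::new("0123456789").unwrap().as_str(), "0123456789");
}
```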

    For example, I might have written a function which runs much faster if I assume the input integer is greater than 1.

    For this case even the stdlib has built-in types (which are similar but not exactly fitting this particular example) - std::num::NonZero{U,I}{8,16,32,64,size} - whose new functions return an Option<Self>. Likewise you could create your own type like GreaterThanOne<T> which provides the exact functionality you described.
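    A minimal sketch of such a newtype, mirroring the NonZero constructors (GreaterThanOne and its methods are hypothetical, not stdlib items):

```rust
// Hypothetical newtype: constructing it through `new` proves the
// invariant once, so downstream code never re-checks `n > 1`.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct GreaterThanOne(u64);

impl GreaterThanOne {
    /// Returns `None` when the invariant does not hold, mirroring
    /// the shape of `std::num::NonZeroU64::new`.
    pub fn new(n: u64) -> Option<Self> {
        if n > 1 { Some(Self(n)) } else { None }
    }

    pub fn get(self) -> u64 {
        self.0
    }
}

fn main() {
    assert_eq!(GreaterThanOne::new(5).map(GreaterThanOne::get), Some(5));
    assert!(GreaterThanOne::new(1).is_none());
    assert!(GreaterThanOne::new(0).is_none());
}
```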

    There are also some things that can’t really be checked with a runtime assertion - right now I’m writing a parser for a certain old game scripting language. I have a bunch of functions like Keyword::matches, which take in the keyword substring. A mistake I sometimes make, however, is passing the parser’s entire input instead of just the keyword substring, so Keyword::matches always returns false. This is not something you can reasonably check with a runtime assertion, but you can check it at compile time by defining a newtype like struct InputFragment<'a>(&'a str); with a Deref<Target = str> impl, which can only be produced by slicing the full input. The type checker will then rightly complain whenever you forget to slice the input before passing it somewhere.
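    To make that concrete, here is one possible shape for that newtype (FullInput and the method names are my own guesses for illustration, not actual code from my parser):

```rust
use std::ops::Deref;

// An InputFragment can only be obtained by slicing the full input,
// so passing the whole input where a fragment is expected becomes
// a type error instead of a silent `false` at runtime.
pub struct FullInput(String);

pub struct InputFragment<'a>(&'a str);

impl FullInput {
    pub fn slice(&self, range: std::ops::Range<usize>) -> InputFragment<'_> {
        InputFragment(&self.0[range])
    }
}

impl Deref for InputFragment<'_> {
    type Target = str;
    fn deref(&self) -> &str {
        self.0
    }
}

// A `Keyword::matches`-style function now demands the fragment type.
fn matches_keyword(keyword: &str, fragment: &InputFragment<'_>) -> bool {
    &**fragment == keyword
}

fn main() {
    let input = FullInput("let x = 1".to_string());
    assert!(matches_keyword("let", &input.slice(0..3)));
    // matches_keyword("let", &input); // does not compile: wrong type
}
```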

    And just know I’m not saying runtime assertions, and especially the presence of debug_assert! are completely useless 😄 I have used them before myself. It’s just that I tend to use them much less in the presence of Rust’s strong typing and explicit error handling culture.

    1. 11

      Yes, this is the way. Make illegal states unrepresentable. By leveraging Rust’s powerful sum-types, you can eliminate almost all input checking and assertions and the complexity that goes along with it. It is mind-blowing how many corner cases and error conditions simply cannot exist anymore and how much safer and denser the code is when this is done right.

      1. 1

        Sorry, not following your example:

        A mistake I sometimes make, however, is passing the parser’s entire input instead of just the keyword substring, so Keyword::matches always returns false. This is not something you can reasonably check with a runtime assertion, but you can check it at compile time by defining a newtype like struct InputFragment<'a>(&'a str); with a Deref<Target = str> impl, which can only be produced by slicing the full input. The type checker will then rightly complain whenever you forget to slice the input before passing it somewhere.

        Is the full input a String so you use InputFragment to ensure you’re matching a slice?

        Just a thought, why not have a debug_assert that looks for a space? Wouldn’t this let you know that it’s just the keyword instead of the full input?

        I have a Vec in my code that contains u32 IDs which should always be in sorted order, and that I need to be in sorted order to perform a merge. I could simply call sort on both, but that’s a wasteful runtime cost. Instead I have a debug_assert that simply ensures each value is larger than the previous (using the amazing iterators available). This is compiled away in release builds, but ensures the lists are sorted in all my tests.
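        A sketch of how that check can look with iterators (names invented for illustration; the real merge code is assumed to differ):

```rust
/// True when each value is strictly larger than the previous one.
fn is_sorted(ids: &[u32]) -> bool {
    ids.windows(2).all(|w| w[0] < w[1])
}

/// Merge two sorted ID lists; the debug_asserts are compiled out
/// in release builds, so the check is free in production.
fn merge_ids(a: &[u32], b: &[u32]) -> Vec<u32> {
    debug_assert!(is_sorted(a));
    debug_assert!(is_sorted(b));
    let mut out = Vec::with_capacity(a.len() + b.len());
    let (mut i, mut j) = (0, 0);
    while i < a.len() && j < b.len() {
        if a[i] <= b[j] {
            out.push(a[i]);
            i += 1;
        } else {
            out.push(b[j]);
            j += 1;
        }
    }
    out.extend_from_slice(&a[i..]);
    out.extend_from_slice(&b[j..]);
    out
}

fn main() {
    assert_eq!(merge_ids(&[1, 3, 5], &[2, 4]), vec![1, 2, 3, 4, 5]);
    assert!(is_sorted(&[1, 2, 3]));
    assert!(!is_sorted(&[3, 1, 2]));
}
```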

        1. 2

          Is the full input a String so you use InputFragment to ensure you’re matching a slice?

          Yeah, that’s the idea.

          Just a thought, why not have a debug_assert that looks for a space? Wouldn’t this let you know that it’s just the keyword instead of the full input?

          That’s a good idea, and definitely shorter than creating a newtype. However I still prefer my approach, because it catches those mistakes at compile time, so I need to write fewer tests to cover the bad cases.

          Regarding your example, the way I would do it is something like:

          pub struct SortedIds(Vec<u32>);
          
          impl SortedIds {
              pub fn new(maybe_sorted: Vec<u32>) -> Result<Self, Vec<u32>> {
                  if is_sorted(&maybe_sorted) { Ok(Self(maybe_sorted)) }
                  else { Err(maybe_sorted) }
              }
          
              /// Safety: the caller must guarantee the IDs are sorted.
              pub unsafe fn new_unchecked(maybe_sorted: Vec<u32>) -> Self {
                  debug_assert!(is_sorted(&maybe_sorted));
                  Self(maybe_sorted)
              }
          
              pub fn sort(unsorted: Vec<u32>) -> Self { /* ... */ }
          }
          

          So still using a debug_assert in the new_unchecked function to signal loudly that an invariant was broken, but also using a strong type so that we only need to verify the invariant at the edges of the API.

          Hope that clears it up!

      1. 2

        I’ve gotta be honest… I sorta stopped reading when I got to the part about the library being 1MB. Why would it be important to reduce the size? Maybe on some embedded device with a super-small amount of memory? Otherwise, when your watch has GBs of memory, why optimize for library size?

        That aside… cargo bloat seems like a pretty cool tool!

        1. 4

          A megabyte here, a megabyte there, soon you have Electron-level bloat, which everyone loves to hate. I don’t know about Federico, but as a library developer, I’m keenly aware that, at least judging by message board discussions, my potential users are harsh critics of bloat. And in my particular case, I don’t want anyone to decide that they’re not going to make their GUI accessible (e.g. to screen reader users) because my library was too bloated.

          1. 2

            Your app isn’t alone in your phone. When it grows in size, it will cause other apps to be flushed out of memory, and when backgrounded, it will be flushed out sooner because it occupies memory.

          1. 17

            Paper without accessible code. This is computer science’s reproducibility blind spot. I’ve always found papers that don’t have code or contain pseudo code for an algorithm to be next to useless.

            1. 2

              Agreed, especially when the CS discipline makes it so easy (GitHub, etc.) to share code. Other hard sciences don’t have this advantage (sharing a chemical or physics process), and yet we fail to take advantage of it in CS. I wonder if it’s because academics practice LPU (least publishable unit) and don’t want others to scoop them on some improvement worthy of publishing.

            1. 1

              I realize this came out ~10 days ago, but has anyone played with it?

              Unless I’m missing the point, this seems more oriented towards server code than if you wrote both the client and the server. What I mean by that is that I think it’d be hard to insert crash/latency/etc calls into the control-flow of the client, without having thought about this ahead of time. Maybe something with cfg features?

              1. 15

                It’s my first weekend “free” from a job, and I’ve saved enough money to start my own company.

                Therefore, I’ll be working on my own product, a CLI-based CI service that allows folks to run builds locally and control their cloud workflows directly from the CI.

                I’ll be writing it in Rust, which will be pretty fun.

                My parents are also in town, so I’ll spend some time with them.

                1. 2

                  Best of luck to you!

                  1. 1

                    Good luck! Do you have a site up yet?

                    1. 1

                      That sounds like a lot of fun! Are you going to use NixOS or Docker with it for build isolation/reproducibility? I’d be interested in hearing how this goes, good luck from me too :)

                    1. 5

                      At the risk of showing my ignorance… How do you measure if it’s I/O or processing? If it is the processing step, how do you figure out which part? I’ve used callgrind before, but it seems to “bottom out” showing a function accounts for most of the time, but the next level down simply has many calls to a function that’s seemingly fast, so I’m left not knowing what to optimize. Just curious what folks use for profiling their code, with a focus on optimizing it.

                      1. 12

                        There’s no silver bullet. Performance optimization is a skill set that gets built up over time, and it’s mostly measurement and intuition. The measurement comes from profilers and benchmarks (which you should get in the habit of writing), and the intuition tells you what is likely to be slow and, most importantly, which slow parts actually matter.

                        There’s a famous (but often subquoted without the context) Knuth quote here:

                        “Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.”

                        “Structured Programming with goto Statements”

                        1. 9

                          It’s worth noting that intuition is very dangerous here because the hardware changes over time. I first encountered this with some code optimised by someone whose intuition was rooted in the ‘80s. He put a lot of state that needed to be passed through multiple levels of function call before being used into globals, to avoid stack pushes and pops. I rewrote it in the naive form with function arguments and it got 20% faster. The compiler was able to keep the values in registers and then had useful alias information, so generated better code.

                          These days, caches and rename-register pressure have a huge impact on performance. I don’t think I saw rename-register pressure affect anything 10 years ago. I’ve had several conversations with systems programmers over the last few years where they’d been worried about performance effects based on a mental model of how computers work that’s 20 years out of date.

                          1. 1

                            That’s completely fair! I tried to account for that with the habitual measurement part, but that only addresses part of the problem! You won’t necessarily understand why the measurements are changing over time. On the other hand, I bet creativity, trial and error, and measurement can substitute for full understanding of the architecture with good results, a lot of the time.

                            Also of note, I realized that my comment forgot the very important “understanding of how to fix the issues”, which is some combination of creativity and understanding of “systems”, data structures, algorithms, etc.

                          2. 1

                            I have heard the quote about premature optimization countless times. I didn’t know the author was Knuth.

                            1. 7

                              I wish people always gave the full version of that quote with attribution and not the “premature optimization is the root of all evil” part taken out of context indeed!

                          3. 4

                            Brendan Gregg has a lot of good guides to performance optimization: https://brendangregg.com/

                            In particular, his USE method is a great starting point: it gives you a framework to use when investigating any kind of computer performance issue. For Linux in particular, he has a whole page of Linux Performance tools and techniques.

                            I also enjoy Bruce Dawson’s blog Random Ascii. He has more of a Windows focus rather than Linux (he’s the author of UIforETW) but a lot of the general methodology is cross-platform.

                          1. 1

                            Investigating clustering algorithms for log files. Affinity Propagation[1] is probably where I’ll start.

                            [1] https://en.m.wikipedia.org/wiki/Affinity_propagation

                            1. 1

                              A bit off-topic: Does anyone use Cap’n Proto’s Rust version in production? I am wondering whether I should give it a go or fall back to well-established gRPC. After a preliminary check-up I am a bit scared to use it long term.

                              1. 2

                                I did a similar analysis, and stuck with gRPC. The support for services w/Tonic is really solid. If you’re just looking for structure serialization, then it’s probably not a bad (and possibly faster) solution.

                                1. 1

                                  Cloudflare uses it extensively in internal services.

                                  1. 1

                                    I was under impression that they use only C++ implementation.

                                    1. 4

                                      The Rust library is definitely used in at least one internal service I can think of

                                1. 0

                                  Do we allow paywalled articles now?

                                  1. 2

                                    I use a regex redirect extension to rewrite Medium URLs to scribe.rip, the alternative viewer-respecting frontend to Medium.

                                    Here you go: https://tomaszs2.scribe.rip/how-rust-1-64-became-10-20-faster-on-windows-3a8bb5e81d70

                                    That said, if you already know what PGO is, there’s no point in reading the article.

                                    1. 1

                                      My apologies, I didn’t realize there was a paywall… loads just fine for me.

                                      1. 1

                                        Medium and its A/B tested monetization is fickle and part of the reason nobody should use it for publishing content.

                                    1. 6

                                      I’m trying something new: inspired by this article I’ve decided to split my time working on my startup and open source projects 50/50 between development and marketing - doing a week of each in turn.

                                      So this week is my first ever attempt at a “marketing week”. I’m going to be mainly trying to book demos with potential users and customers (and do activities to support that).

                                      1. 1

                                        What’s your startup?

                                        1. 2

                                          I’m building the SaaS version of my open source project https://datasette.io - the preview is up at https://datasette.cloud

                                      1. 1

                                        Rewrote a Rust program in Go and got a 5x performance boost. Because I was so bad at Rust.

                                        1. 1

                                          What’s the program?

                                          1. 2

                                            Not sure about the root cause; it might have been misuse of async code and inefficient Vec structures. The original Rust code was mainly written by an employee who has since left, and I had to take over the code. Anyway, given the scale of our (not so big) data volume, using the most familiar language truly helped this time.

                                        1. 1

                                          In my opinion, they should have rather adopted Ada (an industry standard) instead of Rust in the kernel. Rust is too sandy as a foundation.

                                          1. 2

                                            Too sandy? What do you mean by that?

                                            1. 1

                                              Probably Matt 7:24-27.

                                          1. 1
                                            • Hopefully debugging what’s going on my Kobo e-reader.
                                            • Playing around with sveltekit and rust to see if I can cram build artifacts into a single binary (and Rust driving the server experience)
                                            1. 2

                                              I’ve done this. My setup is to use Vite to build/pack the site, then I have a Python script that copies all the files into the Rust source tree and updates a single Rust file with a bunch of include_str! macros. Single binary with the web server (I use tide) and all the artifacts. It’s pretty neat :-)

                                            1. 26

                                              I got laid off today :) My goal is to keep a straight head this weekend.

                                              1. 6

                                                I hope you are able to do that. Be well.

                                                1. 4

                                                  If you were laid off, just try to keep in mind that it was a biz problem, not a you problem! I know I focused on what I did wrong when I was let go from a struggling startup, 2 months before my 1-year cliff. Turns out they just needed those RSUs for other investors because we weren’t making money.

                                                  I took a week or two to do stuff around the house, and begin looking. Ended up being the best thing, because I reconnected with a lot of old colleagues, and one was hiring for almost double my previous salary.

                                                  So hang in there!

                                                  1. 3

                                                    Sorry for you. Hope you find another job quickly!

                                                    1. 5

                                                      I should be all right. There’s a technical writer job I want to try for, but also something in code cad available. In the end TypeScript / React and Rust devs are pretty in season so I think there’s not much to worry about! (Plus I’m friendly on a personal level :))

                                                      Thanks everyone, much appreciated <3

                                                  1. 16

                                                    Could you go into more detail about why this round of embedded key-value stores are different than, say, Berkeley DB, which has been in the Python stdlib for decades? I’ve always been very confused about this hype, figuring it as just being the Google-backed version of an existing idea.

                                                    1. 9

                                                      A lot of the ones mentioned in the article were developed for use in servers. They are much more scalable than Berkeley DB and, with LSM trees, can handle much higher write loads.

                                                      For embedded use, LMDB and its spin-off libMDBX are a lot faster than BDB, but have nearly compatible APIs. (The guy who built LMDB has a paper on how his project, OpenLDAP, had so many performance problems with BDB that he wrote LMDB as a replacement.)

                                                      1. 5

                                                        The guy who built LMDB has a paper on how his project, OpenLDAP, had so many performance problems with BDB that he wrote LMDB as a replacement.

                                                        That’s very interesting. I usually think of an LDAP backing store as the canonical example of something that’s almost entirely unconcerned with write performance. (Because directories are read so frequently and updated so rarely.)

                                                        edit: Assuming this is the paper in question, it seems to me that read optimization was the focus of MDB development, not write loads. But it sounds like some of the design decisions they made helped quite a bit with write performance as well.

                                                        1. 1

                                                          Yes, my comment about higher write loads was specific to LSM trees, not LMDB. Sorry for the confusion.

                                                        2. 3

                                                          As I remember it the main reason for the move away from BDB was not performance, it was the license change.

                                                        3. 6

                                                          My college database course had us write a SQL engine on top of BerkeleyDB, and that was 8 years ago. I was surprised to learn just now that it was initially released in 1994. Page 6 of this paper from this year shows BerkeleyDB winning in most of the workloads they tested. (The paper is “Performance Comparison of Operations in the File System and in Embedded Key-Value Databases” by Hines et al.)

                                                          1. 4

                                                            Interesting paper, but they’re pretty vague about the setup: they ran the tests in a VM provided by the university, so that adds a lot of unknowns (like, what kind of mass storage? And was anyone else using the VM?), and didn’t specify what filesystem they used. I suspect they’d also get different results on Apple platforms and Windows.

                                                            1. 1

                                                              Agreed. I wish they posted their code so we could try it on other systems.

                                                          2. 2

                                                            It would be great to see more examples of situations where each is better. The article mentions:

                                                            It can allow you to swap B-Trees (traditional choice) for LSM Trees (new hotness) as the underlying storage method (useful for optimizing write-heavy workloads).

                                                          I don’t think LSM trees are strictly better than B-trees, even if your only requirement is write-heavy workloads. You also need to want the index characteristics an LSM tree provides (sequential IO), as well as be okay with the compaction characteristics. Cassandra, for example, uses this structure. I distinctly remember compactions being something that was tricky to optimize well (although it’s been a looong time since I’ve used it).

                                                            The original paper goes into more detail, but it’s been a long time since I’ve read it. Google might have made them more popular, but they weren’t invented at Google. Like anything in CS, it’s just a different data structure. There are tradeoffs, it depends on the use case, and ymmv.

                                                            1. 4

                                                              Sure I’m making generalizations because precision requires benchmarking and I’m not even talking about any specific code in this post. But if you just Google “lsm tree vs btree benchmarks” you’ll find a bunch and they mostly agree with the generalization I made.

                                                              For example:

                                                              Here’s a Mongo benchmark.

                                                              Their takeaway: “If you have a workload that requires a high write throughput LSM is the best choice.”

                                                              Here’s a TiKV benchmark.

                                                              Their takeaway: “The main purpose for TiKV to use LSM-tree instead of B-tree as its underlying storage engine is because using cache technology to promote read performance is much easier than promote write performance.”

                                                            2. 2

                                                              Yeah this is a good question. I’d like to know myself.

                                                              1. 8

                                                              When you talk about key-value stores, there are basically 2 data structures that are implemented for storing the data: B-trees and LSM-trees. B-trees have been around for a long time (wikipedia tells me since 1970), and so the original embedded DBs all used a B-tree as their underlying data structure. There are differences in all other aspects (manual page cache vs mmap, variations of B-trees, etc), but they’re all implementations of a B-tree. B-trees are more efficient at finding/reading data than inserting or updating data.

                                                              LSM-trees on the other hand were invented in 1996 (again, so says wikipedia). They were designed to handle writes/updates more efficiently than B-trees. The observation was that the “heavy” work of sorting can be done in-memory, then a merge-sort style operation can be performed, which incurs a sequential read from and write to disk, which is typically very fast. This spawned a number of implementations (LevelDB, RocksDB, etc) which also varied in a number of aspects, most notably in the strategy for when to merge data that has been persisted to disk. There are 2 main camps here: merging when a certain number of files-per-level have been written, and merging when a certain size-per-level has been reached. These strategies vary in performance characteristics (write amplification, space amplification, etc) and can be chosen based upon the workload.
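                                                              To make the write path concrete, here is a toy sketch of the memtable-plus-runs idea (illustration only: everything stays in memory, and real engines add WALs, bloom filters, compaction, etc.):

```rust
use std::collections::BTreeMap;

// Toy LSM-style write path: writes go to an in-memory sorted buffer
// (the "memtable"); when it fills up, it is flushed as an immutable
// sorted run, the way a real engine writes an SSTable sequentially.
struct ToyLsm {
    memtable: BTreeMap<String, String>,
    runs: Vec<Vec<(String, String)>>, // newest run last
    memtable_limit: usize,
}

impl ToyLsm {
    fn new(memtable_limit: usize) -> Self {
        Self { memtable: BTreeMap::new(), runs: Vec::new(), memtable_limit }
    }

    fn put(&mut self, key: &str, value: &str) {
        self.memtable.insert(key.to_string(), value.to_string());
        if self.memtable.len() >= self.memtable_limit {
            // The BTreeMap iterates in key order, so the flushed run
            // comes out already sorted -- a sequential write on disk.
            let run = std::mem::take(&mut self.memtable).into_iter().collect();
            self.runs.push(run);
        }
    }

    fn get(&self, key: &str) -> Option<&str> {
        if let Some(v) = self.memtable.get(key) {
            return Some(v.as_str());
        }
        // Search runs newest-first so fresher values shadow older ones.
        for run in self.runs.iter().rev() {
            if let Ok(i) = run.binary_search_by(|(k, _)| k.as_str().cmp(key)) {
                return Some(run[i].1.as_str());
            }
        }
        None
    }
}

fn main() {
    let mut db = ToyLsm::new(2);
    db.put("a", "1");
    db.put("b", "2"); // triggers a flush to a sorted run
    db.put("a", "3"); // newer value shadows the flushed one
    assert_eq!(db.get("a"), Some("3"));
    assert_eq!(db.get("b"), Some("2"));
    assert!(db.get("c").is_none());
}
```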

                                                                1. 1

                                                                  I’m aware of the data structures. :) But I’m not as aware about every major implementation and their ebb and flow in popularity/use over time.

                                                            1. 3

                                                              Does ingest require a predefined schema? If not, how do you handle converting the schemaless data into Parquet?

                                                              1. 1

                                                                I’m working on something similar (log-store.com) and built a database to get around the issue of most databases and file formats requiring a schema… that and I like databases :-)

                                                                1. 1

                                                                  Heh I’m also working on a schemaless log ingest and search tool written in Rust, which is why I’m asking.

                                                                2. 1

                                                                  There is a MAP type and also a JSON/BSON type that would help. Maybe that’s what they use?

                                                                1. 6

                                                                  Related article that explains some of the concepts, etc in HTTP/3 - https://www.smashingmagazine.com/2021/08/http3-core-concepts-part1/

                                                                  1. 1

                                                                    My key takeaway from this is that TCP+TLS is still faster for high throughput (without flaky connections), and the HTTP/3 optimizations are only relevant if you need to help people with very unstable connections - and the actual results can vary a lot. For my basic nginx reverse proxy setup it’s kinda irrelevant, and I’m hesitant to open UDP ports for it. If Debian ships nginx with HTTP/3 I’ll probably enable it; until then it seems to perform not that great in nginx and apache.

                                                                    1. 2

                                                                      It’s a bit more subtle than that, though if you’re not interested in being on the bleeding edge of the space, I too would wait until nginx or apache enable HTTP/3.

                                                                      A couple points, in no particular order:

                                                                      • A lot of the overhead for small web pages or AJAX requests, especially from TCP+TLS, is the 3 round trips necessary to establish the TLS stream. Assume a conservative TCP packet size of ~1280 bytes (a conservative MTU of 1320 bytes, resulting in a TCP MSS of 1280 bytes); an HTTP request and response pair for a small blog post can easily fit in 2-3 packets (1 packet for the request and 1-2 packets for the response), and an AJAX request/response is usually 2 packets. This means the entire HTTP interaction over TCP for the AJAX request would cost 1.5 RTT (for TCP establishment) + 2 RTT = 3.5 RTT. TCP+TLS has 3 RTT (for TCP+TLS establishment) + 2 RTT = 5 RTT. This is ~42% overhead just for TLS establishment. If page weight is high though (or requests are being pipelined), the relative overhead of connection establishment decreases. TCP Fast Open and TLS False Start can get this down to 1 RTT connection establishment. TLS 1.3 has support for 0-RTT connection establishment, but this is tricky. Default QUIC connection establishment is 1.5 RTT, just like regular TCP, and there are 0-RTT modes available for QUIC.

                                                                      • “Flaky” connections can be more common than you think. The internet is mostly designed around maximizing throughput, and near after-work or after-school hours you’re going to see congestion on lots of routers as everyone starts using bandwidth-intensive multimedia services. Moreover if you’re ever on cafe/airport wifi, building free wifi, or just been far from an AP, then you’ll be hit with flakiness and dropped packets. QUIC could increase “reliability” in these situations dramatically.

                                                                      • Multimedia is especially impacted by HoL blocking. Dropping a packet or two when streaming a video is fine for stream quality but can cause the stream to stutter and stop while your connection waits for a blocked packet to ACK. Moreover if an ACK isn’t received, packets will be resent, adding delays and congesting the network further leading to a negative spiral. This is one common answer to “Why is Netflix slow after work?” and can improve experiences broadly.

                                                                      • QUIC supports using a connection ID to maintain a persistent connection even when IP endpoints change. This means that if you walk from one part of a building to another with a different WiFi SSID, you come back from elsewhere and plug into your desk’s Ethernet, or a NAT mapping changes silently for you that your existing connections will stay established instead of dropping and all reconnecting.
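                                                                      The round-trip arithmetic from the first bullet, written out (numbers taken from the bullet itself):

```rust
fn main() {
    // 1.5 RTT TCP handshake + 2 RTT of request/response traffic,
    // vs. 3 RTT TCP+TLS establishment + the same 2 RTT of traffic.
    let plain_tcp: f64 = 1.5 + 2.0;
    let tcp_tls: f64 = 3.0 + 2.0;
    let overhead = (tcp_tls - plain_tcp) / plain_tcp * 100.0;
    // 1.5 / 3.5 = ~42.9% extra round trips just for TLS establishment.
    assert!((overhead - 42.857).abs() < 0.01);
    println!("~{overhead:.0}% more round trips with TCP+TLS");
}
```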

                                                                      There’s other stuff too, but the above points are some examples of the fat that can be trimmed on the net by moving to HTTP/3. Though personally I’m more excited by being able to use QUIC for non-HTTP traffic, and even using QUIC through p2p-webtransport so we can send/receive non-HTTP traffic directly from the browser. Happy to talk more about this stuff as I’m super excited for QUIC.

                                                                      1. 1

                                                                        I’ve actually read all 3 articles. Still it seems like a lot of overhead for diminishing returns for now. I think the biggest change is that we can replace parts and iterate on the protocol much faster now. (By choosing the only other possibly non-blocked protocol, UDP.) I fear for the DDoS resistance when looking at some of the overhead all the new compression, first-packet optimization and ID re-use adds on top (while actually storing multiple IDs for changing them on interface / ISP change, so more stuff to store in memory).

                                                                        1. 1

                                                                          I think the biggest change is that we can replace parts and iterate on the protocol much faster now.

                                                                          By having HTTP go over QUIC, QUIC essentially gets to play chicken with ossified middleboxes: “support this or web traffic won’t work.” But because QUIC is so general-purpose, we can also push other traffic over it. It’s exciting to think that we can send arbitrary traffic over what looks like regular web traffic (though folks do that today over TLS sockets on port 443).

                                                                          I fear for the DDoS resistance when looking at some of the overhead all the new compression, first-packet optimization and ID re-use adds on top (while actually storing multiple IDs for changing them on interface / ISP change, so more stuff to store in memory)

                                                                          I’m hopeful that connection IDs offer a new way to throttle/block for DDoS also but yeah it’s something to keep in mind as HTTP/3 rolls out.

                                                                  1. 4

                                                                    Enjoying the nice weather, and probably continuing Type Checker Rewrite N of N+1.

                                                                    1. 1

                                                                      Cool! Type checker for what language?

                                                                      1. 1

                                                                        My “what if Rust was small” language, Garnet. Type checking in general has been an exercise in frustration and “here’s all the things textbooks don’t teach you”, and I haven’t even gotten to borrow checking yet. I thiiiiink I’m getting a handle on it though.

                                                                        1. 2

                                                                          If you ever feel like writing it, it would be cool to read a “here’s all the things textbooks don’t teach you” regarding type checking.

                                                                          1. 1

                                                                            Thanks, but I am really not qualified to write it. XD

                                                                            1. 1

                                                                              That’s why you are the one whose “…the things textbooks don’t teach you” I would like to read. Someone who is more qualified would probably forget to explain basic knowledge that I lack.