Threads for emschwartz

    1. 2

      That’s good to know that it doesn’t save that much compile time.

      I just refactored a project of mine and found that splitting things up into smaller crates made quite a large difference (~50% less time for incremental changes). I thought of trying this kind of strategy but sounds like it’s not really worth it.

      1. 2

        I’d imagine the bottleneck here is the popcnt. For long strings, there are published methods for fast popcounts using SIMD instructions that beat the native popcnt instruction. Could be worth looking into, especially since the “OR” could be implemented in wide registers as well.

        https://lemire.me/en/publication/arxiv161107612/
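        The wide-register idea can be sketched in plain Rust: operate on 64-bit words and let the compiler lower `count_ones` to `popcnt` (an illustrative sketch, not the code under discussion; the Lemire paper's SIMD methods improve on exactly this scalar baseline):

        ```rust
        // Sketch: OR two bitsets held as 64-bit words and popcount each word.
        // With a suitable target-cpu the compiler can auto-vectorize this loop.
        fn or_popcount(x: &[u64], y: &[u64]) -> u32 {
            x.iter().zip(y).map(|(a, b)| (a | b).count_ones()).sum()
        }
        ```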

        1. 3

          Apparently the Rust compiler already applies some of these optimizations. Take a look at the discussion on Reddit: https://www.reddit.com/r/rust/s/8g0mGdAMaf

        2. 1

          Cool to see iterator methods perform so well.

          Question about the semantics: what happens if the length of y is not a multiple of 8? I might expect to see something equivalent to:

          x.bits().zip(y.bits()).sum()
          

          Except if x is 16 bytes and y is 15 bytes, I think the current version will just take the hamming distance of the first 8 bytes.
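          A length-checked version of what the thread is discussing might look roughly like this (hypothetical sketch, not the actual code from the article):

          ```rust
          // Hypothetical sketch: hamming distance over byte slices, asserting
          // equal lengths up front so mismatched inputs fail loudly instead of
          // silently truncating to the shorter slice.
          fn hamming(x: &[u8], y: &[u8]) -> u32 {
              assert_eq!(x.len(), y.len(), "slices must be the same length");
              x.iter().zip(y).map(|(a, b)| (a ^ b).count_ones()).sum()
          }
          ```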

          1. 4

            That first assert ensures that the two slices are the same length.

          2. 3

            I spent a week hunting a phantom memory leak just like this. Don’t trust Grafana dashboards to tell you the full story about your app’s memory usage!

            Posting here to hopefully save others the headache.

            1. 6

              I don’t know if it’s just me, but I find TypeScript written with Effect somewhat hard to grok.

              It sounds like an interesting project but when I look at the Without Effect and With Effect code samples on https://effect.website, my immediate reaction is that I’ll take the Without Effect version.

              From this post, it seems somewhat nice that the final “effect” of a chain of actions is hoisted up to the top and you can see what it returns if it succeeds or fails.

              const success: Effect.Effect<number, never, never> = Effect.succeed(42)
              
              const fail: Effect.Effect<never, Error, never> = Effect.fail(
                new Error("Something went wrong")
              )
              

              But those chains of pipes… That doesn’t seem to be the most readable to me.
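              The "hoisted" success/error types resemble Rust's `Result`, where the signature alone tells you what a chain yields on success or failure (a loose analogy, not an Effect feature):

              ```rust
              use std::num::ParseIntError;

              // Loose analogy to Effect's typed channels: the signature hoists
              // the success type (i32) and the error type (ParseIntError).
              fn parse_and_double(s: &str) -> Result<i32, ParseIntError> {
                  s.trim().parse::<i32>().map(|n| n * 2)
              }
              ```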

              1. 8

                You might be interested in effection, which tries to accomplish similar goals but looks like vanilla JS: https://frontside.com/effection/

                There’s also https://starfx.bower.sh which builds off of it and is used in the FE.

                1. 2

                  Very interesting!

                  My ears definitely perk up at the mention of structured concurrency (I wrote a blog post about it in Rust).

                  From looking through the website a bit, I was missing some of the “why” for this approach though. This blog post explains it more, but I’m still not sure how big of an issue this is in TS/JS land: https://frontside.com/blog/2023-12-11-await-event-horizon/

                  Nevertheless, it does seem nice that it seems to be using slightly more built-in constructs: https://frontside.com/effection/docs/async-rosetta-stone
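                  The core guarantee of structured concurrency can be shown in std Rust, where `thread::scope` ensures child tasks cannot outlive their parent (a minimal sketch of the concept, unrelated to effection's implementation):

                  ```rust
                  use std::thread;

                  // Minimal structured-concurrency sketch: both children are
                  // guaranteed to finish before `scope` returns, so no task can
                  // dangle past this function and borrows of `data` stay valid.
                  fn sum_halves(data: &[i64]) -> i64 {
                      let (a, b) = data.split_at(data.len() / 2);
                      thread::scope(|s| {
                          let left = s.spawn(|| a.iter().sum::<i64>());
                          let right = s.spawn(|| b.iter().sum::<i64>());
                          left.join().unwrap() + right.join().unwrap()
                      })
                  }
                  ```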

                  1. 4

                    but I’m still not sure how big of an issue this is in TS/JS land

                    Great insight, I tried to answer that question here: https://bower.sh/why-structured-concurrency

                    For me, the problem reduces down to async flow control, and structured concurrency is a fantastic way to manage it. This is why structured concurrency is such a great tool for TS: node and the browser are single-threaded and require strong concurrency tooling. It’s a great fit.

                2. 3

                  That’s something that would need to be fixed on the language level. ts-effect can’t do much here.

                  1. 3

                    I agree. The pitch of “the missing standard library for TypeScript” is extremely off-putting. I can imagine balls of spaghetti code because Effect is neither a standard nor a language. That positioning is going to mean that everybody invents best practices and it’ll be used either too much or too little.

                    I’d love to see an effect system in TS but it needs to be language level for it to be practical.

                    1. 1

                      Yeah, I could barely get something written. No chance I’d get anybody else to read or write this stuff.

                      1. 5

                        The critical part here is:

                        Every few seconds, do a range-delete on saved texts older than 48hr

                        So, sure, you can consume the firehose for cheap… you just can’t store it!
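                        The retention trick reduces to a sliding time window; a toy in-memory version (assumed names, nothing from the actual service, which does this as a SQL range-delete) might look like:

                        ```rust
                        use std::collections::VecDeque;
                        use std::time::{Duration, Instant};

                        // Toy sketch of the periodic range-delete: entries are kept
                        // in insertion (time) order, so pruning everything older than
                        // the cutoff is just popping from the front.
                        fn prune(buf: &mut VecDeque<(Instant, String)>, max_age: Duration, now: Instant) {
                            while buf.front().map_or(false, |(t, _)| now.duration_since(*t) > max_age) {
                                buf.pop_front();
                            }
                        }
                        ```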

                        1. 10

                          oh hey that’s me :)

                          no, you probably can’t store it for $2.50/month. bit clickbaity but i wanted to find the lower limit for doing something interesting. today i just had to bump the storage to 3GB, so say 1.5GB/day to store all text content at the current network volume. you can store a lot more and do more interesting things before it gets really expensive.

                          i’m trying out aggregating back-links (from likes, for now) in sqlite right now. much higher volume than posts. sqlite is super not made for it, but.. it’s surprisingly been able to keep up for a number of hours with 1cpu and 2G ram. there are things that are expensive in atproto but a lot can be cheap.

                      2. 2

                        If you want to learn how to write Rust proc macros, I would highly recommend doing this tutorial in full: https://github.com/dtolnay/proc-macro-workshop.

                        It was created by dtolnay, the author of syn and quote, and gives you failing test cases that you then fix step-by-step to learn what you need to know to write proc macros.

                        1. 1

                          Not sure how the link works with an extra slash and 2 spaces in it. Must be some weird backend matching.
                          Here’s the canonical URL: https://tweedegolf.nl/en/blog/65/async-rust-vs-rtos-showdown

                          1. 1

                            This model reaches Arena Hard of 85.0, AlpacaEval 2 LC of 57.6 and GPT-4-Turbo MT-Bench of 8.98, which are known to be predictive of LMSys Chatbot Arena Elo

                            As of 1 Oct 2024, this model is #1 on all three automatic alignment benchmarks (verified tab for AlpacaEval 2 LC), edging out strong frontier models such as GPT-4o and Claude 3.5 Sonnet.

                              1. 22

                                As does https://blessed.rs/crates which is a hand curated list of libraries for common tasks.

                                1. 4

                                  Blessed looks great! I’ve used Rust for a long while but hadn’t come across that before

                                  1. 3

                                    Man oh man, that would have saved a lot of time.

                                  2. 8

                                    Sweet sweet abandonware. Truly stable

                                    1. 7

                                      Yep. You also have near-stdx, awesome-rust, and probably a bunch more that I couldn’t find, in various states of abandonment.

                                      Turns out that curating a stdlib is a lot of work. Who knew!

                                    2. 2

                                      stdx exists.

                                      For more context:

                                      1. This is responding to:

                                        Rust needs an extended standard library.

                                        All other major programming languages and ecosystems have internalized this fact and provide solid and secure foundations to their developers. Go is paving the way, with deno and zig following closely.

                                        The extended standard library, let’s call it stdx, should be independent of the language so it can evolve at its own pace and make breaking changes when required.

                                      2. The real stdx was a project by Brian Anderson (brson), who was the (de facto?) leader of the Rust Project for some years after Graydon Hoare left.

                                      1. 3

                                        Anyone building an app with Tauri? What has your experience been?

                                        1. 9

                                          I briefly interacted with the Tauri folks because of a shared dependency on an unmaintained Rust crate. They seem to be nice and no-nonsense. I also know someone who builds a TypeScript-based game on Tauri, and it seems to be a robust solution.

                                          1. 7

                                            I just spent an hour and a half or so going through the latest 2.0 version of the tutorial, and was able to get their hello-world app running both on my native Linux system (Arch Linux, specifically) and in an Android emulator on the same machine; I was also able to install a plugin and make some other relatively simple changes to the hello-world code. This was all pretty straightforward, and I didn’t run into any major issues. I haven’t tried running it on a real Android phone yet, though.

                                            1. 5

                                              I built Foreground (an app for automatically tracking time in apps and websites) for Windows specifically, using Tauri. It was a good fit for being able to use Rust to interface with system level stuff, and HTML to make a nice interface (and also play to my strengths as someone who is primarily a web developer). I like the experience, and Tauri has a lot of batteries included. It’s lean, too, my app uses very little RAM even including the Edge webview process. I’d use it again for desktop apps for sure.

                                              However, I built it with v1 and just tried to go through the upgrade process to v2 on one of the RCs a few weeks ago, and while the automatic migration helped in some places, so many APIs have changed that it was a huge undertaking which I ultimately aborted as my app crashes on launch… It’s definitely just an artifact of the upgrade process, but it meant I abandoned that idea for now. I’m curious to see how others go with moving to v2—really so much has changed, it’s a lot of work.

                                              1. 3

                                                A company I just left had two applications in Tauri. One of them was upgraded to the RC of Tauri 2 as an experiment, and we were super pleased with it. Both v1 and v2 have been amazing for us. The communication between the front end and the Rust side has been trouble-free, but it can be a little cumbersome depending on how you design the application and which parts are handled where. Because of the plugins, there generally isn’t much you cannot do from the front end alone outside of performance-sensitive sections, so depending on team makeup and skillsets you may move more or less logic into the backend and use the communication layer more or less heavily.

                                                Our biggest hiccup with Tauri 2 was the permissions and capabilities layer, which at the time was underdocumented and could be complex. However, once we figured that part out, it’s extremely granular and incredibly flexible, so we can see why it’d be designed the way it is.

                                                Overall, I’d give a huge recommendation to Tauri for anyone comfortable in Rust who is interested in shipping an Electron-like application.

                                                1. 1

                                                  We tried to build a serious application on Tauri but it did not work out. That’s all I’ll say.

                                                  1. 4

                                                    Interesting. Totally understand not wanting to say more — but I am definitely curious about what problems you ran into!

                                                2. 16

                                                  This is a well written article, even for someone like me who doesn’t use Rust.

                                                  These types of realizations are why I think more people need to learn about Concurrent ML, delimited continuations (i.e. prompt/reset/shift), and implement some of their own concurrency in Lisp. I know I know I know, another person saying implement it in lisp! But once you grasp the fundamental abstractions, you can ask yourself things like “how much of the program’s state does this async operation capture” well before you’re hit with the footgun.

                                                  1. 4

                                                    Aww, thanks! I’m glad you think so.

                                                    Are there any specific things you’d recommend reading about those topics?

                                                    1. 6

                                                      It’s a talk, but I highly recommend Delimited Continuations, Demystified by Alexis King for learning about delimited continuations. She explains this stuff really well.

                                                      1. 3

                                                        I find Andy Wingo’s talk about his Guile fiber implementation has lots of ideas I seem to care about: https://www.youtube.com/watch?v=7IcI6sl5oBc

                                                    2. 3

                                                      Small note: the backstory says it was written in 2024 (so this project must be from the future 🙃) https://nuejs.org/backstory/

                                                      1. 5

                                                        I think Nushell seems cool, and I’ve had the thought of switching to it a couple of times.

                                                        The main blocker for me was that I just don’t do that much in my terminal aside from moving around and running individual commands with their arguments. All the data processing capabilities sound cool, but they’re just not part of what I find myself needing. On the other hand, sometimes I copy a command from somewhere or run a curl | sh install script (I know, the horror!), and the fact that that doesn’t work with nu was a knock against it.

                                                        Now, I may be outside of the target audience for Nushell as a backend engineer that doesn’t do a lot of data/local scripting stuff. But I have a bit of a hard time imagining who the ideal target user is…

                                                        1. 3

                                                          I have a bit of a hard time imagining who the ideal target user is…

                                                          Sysadmins, people who write scripts, shell enthusiasts.

                                                        2. 1

                                                          Could someone explain in simple language what the idea behind generational indicies is?

                                                          1. 4

                                                            It’s a way of detecting dangling references to slab-like allocations at runtime.

                                                            It exists purely at runtime: it requires each allocation to have a generation index, and it requires every reference to carry that index.

                                                            When a reference is dereferenced, the indices must be checked at runtime; if the allocation does not have the generation of the reference, the caller must handle the situation at runtime.

                                                            Unlike RC, this does not give any static guarantees about the validity of the allocation. It might be slightly faster when references are copied a lot but their allocations are accessed rarely.

                                                            “Generational references” is basically the Vale author’s term for a software version of ARM memory tagging.

                                                            1. 1

                                                              From the readme of one I wrote for myself a while back:

                                                              this is a random-access data structure that stores an unordered bag of items and gives you a handle for each specific item you insert. Looking things up by handle is O(1) – the backing storage is just a Vec – and items can be removed as well, which is also O(1). Handles are small (two usize’s) and easy to copy, similar to a slice. The trick is that there is a generation number stored with each item, so that a “dangling” handle that refers to an item that has been removed is invalid. Unlike array indices, if you remove an item from the array and a new one gets put in its place, the old stale handle does not refer to the new one and trying to use it will fail at runtime.

                                                              This is useful for various things: managing lots of uniform things with shared ownership (such as video game resources), interning strings, that sort of thing. Essentially by using handles to items you have runtime-checked memory safety: you can get an item from the map using the handle, since it’s basically just an array index.

                                                              In practice this is not unrelated to Rc; it’s just that Rc does the accounting of memory on cloning the Rc, and this does it on “dereferencing” by looking up an object’s handle. Reference counting, tracing garbage collection, and this sort of map are all different methods of achieving the same thing (checking memory safety at runtime) with different tradeoffs. With a reference count you don’t have to explicitly free items and can’t have stale handles, but loops require special handling and you pay the accounting cost on cloning a handle. With this you have to free items explicitly, but stale handles can be safely detected and the accounting happens when you look the item up. You can also use it as a slab allocator type thing, where you pre-allocate a large amount of storage and free it all at once.
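                                                              A minimal generational map along those lines (illustrative names, not the actual crate described above) might look like:

                                                              ```rust
                                                              // Sketch of a generational slab: handles carry (index, generation);
                                                              // a stale handle fails lookup instead of aliasing the new occupant.
                                                              struct Slot<T> { gen: u64, val: Option<T> }

                                                              #[derive(Clone, Copy, PartialEq, Debug)]
                                                              struct Handle { idx: usize, gen: u64 }

                                                              struct GenVec<T> { slots: Vec<Slot<T>> }

                                                              impl<T> GenVec<T> {
                                                                  fn new() -> Self { GenVec { slots: Vec::new() } }

                                                                  fn insert(&mut self, val: T) -> Handle {
                                                                      // Reuse a free slot if one exists, bumping its generation
                                                                      // so any old handles to that index become stale.
                                                                      for (idx, slot) in self.slots.iter_mut().enumerate() {
                                                                          if slot.val.is_none() {
                                                                              slot.gen += 1;
                                                                              slot.val = Some(val);
                                                                              return Handle { idx, gen: slot.gen };
                                                                          }
                                                                      }
                                                                      self.slots.push(Slot { gen: 0, val: Some(val) });
                                                                      Handle { idx: self.slots.len() - 1, gen: 0 }
                                                                  }

                                                                  fn get(&self, h: Handle) -> Option<&T> {
                                                                      // Generation mismatch means the handle is stale.
                                                                      self.slots.get(h.idx)
                                                                          .filter(|s| s.gen == h.gen)
                                                                          .and_then(|s| s.val.as_ref())
                                                                  }

                                                                  fn remove(&mut self, h: Handle) -> Option<T> {
                                                                      let slot = self.slots.get_mut(h.idx)?;
                                                                      if slot.gen != h.gen { return None; }
                                                                      slot.val.take()
                                                                  }
                                                              }
                                                              ```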

                                                            2. 6

                                                              Irrespective of whether they’re hard to learn, I’d agree that they’re a useful tool, and at least this subset is worth learning.

                                                              I’d also highly recommend https://regexr.com/. I always use it when I’m trying to build up any kind of more complicated pattern, because it makes it easy to visualize how the parts of the pattern match text and it lets you put in as many example strings as you’d like to test against. Their built-in cheat sheet is also very handy.

                                                              1. 1

                                                                I’d pretty much given up on learning regular expressions until I found some interactive, visual tools. In the early 2000s, it was RegexBuddy. The one I’ve been using for the last decade or so is regexplained.co.uk. Seeing a regular expression as a railroad diagram as you build it really helps not only to understand a specific regular expression but to learn them in general.

                                                              2. 7

                                                                Prometheus queries are hard because its design prioritizes scale (via very efficient data collection) over ease of querying. In normal usage of Prometheus you can’t escape using sampled data, and it’s hard to escape using monotonic counters, both of which place a minimum complexity on any query.

                                                                “Older” systems like InfluxDB, TimescaleDB, or even “ancient” systems like Graphite make querying far easier and can be a lot more flexible about the type of data being stored. You have the option to use the same data the monitored systems generated, and you aren’t forced into monotonic counters because the query engine can operate on non-sampled data. So the data is structured in a way that’s far easier to query in a lot of cases, and (moving entirely into the world of opinions) these systems have query languages that are a lot easier for engineers to use and understand.

                                                                I like Prometheus a lot, but every time I’ve worked for a company using it, they’ve failed to understand the trade-offs Prometheus and its design make, leading to a lot of frustration with the resulting monitoring stack.

                                                                1. 3

                                                                  Mm that’s a great point. Hadn’t thought about the difficulty of grokking queries using monotonic counters coming from that design tradeoff.

                                                                2. 5

                                                                  What has helped me when learning PromQL has been to think about it as a query language for filtering instead of something similar to SQL. You are always filtering down to get the subset of metrics you are interested in and then aggregating those into something to graph with some function (sum, avg, etc.).

                                                                  I agree that to fully understand a query you’ll need to grasp more details that are not immediately “bubbled up” to the user: scrape interval, type of the metric (counter, gauge, histogram), query look back period, etc.

                                                                  1. 4

                                                                    Agreed. This blog post, and specifically his illustrations of how grouping labels works, helped me grok the point that you’re always filtering down first: https://iximiuz.com/en/posts/prometheus-vector-matching/

                                                                  2. 12

                                                                    I’m always uneasy when reading articles like “SQL is overengineered”, “git is complicated”, “(some other core technology) is hard”.

                                                                    Especially with Prometheus queries: I know I’m repeating myself, but I think that PromQL, like SQL, Git, IP, PKCS, … is part of the software engineer’s toolbox. There should be no corner cutting, IMHO. The technology should be learned and mastered by anybody who wants to qualify as a software “craftperson.” I’m more and more saddened at the lowering of the standard of my profession… But I might just have become an old man… Maybe you shouldn’t listen to me.

                                                                    1. 21

                                                                      I’m fine with learning difficult technologies, but PromQL just seems poorly designed. Every time I touch it and try to do something well within the purview of what a time series database ought to be able to do, there doesn’t seem to be a good way to express it in PromQL. I’ll ask the PromQL gurus in my organization and they’ll mull it over for a few hours, trying different things, and ultimately conclude that hacky workarounds are the best option. Unfortunately it’s been a couple of years since I dealt with it and I don’t remember the details, but PromQL always struck me as uniquely bad, even worse than git.

                                                                      Similarly, the idea that software craftsmen need to settle for abysmal tools—even if they’re best in class today—makes me sad. What’s the point of software craftsmanship if not making things better?

                                                                      1. 7

                                                                        Every time I touch [Prometheus] and try to do something well within the purview of what a time series database ought to be able to do…

                                                                        One big conceptual thing about Prometheus is that it isn’t really a time series database. It’s a tool for ingesting and querying real-time telemetry data from a fleet of services. It uses a (bespoke and very narrowly scoped) time series database under the hood, yes — edit: and PromQL has many similarities with TSDB query languages — but these are implementation details.

                                                                        If you think of Prometheus as a general-purpose TSDB then you’re always going to end up pretty frustrated.

                                                                        1. 2

                                                                          Could you expand on that more? I’m curious which features/aspects of a general TSDB you think Prometheus lacks. (This is curiosity coming from someone with no experience with other TSDBs.)

                                                                          1. 5

                                                                            It’s not that Prometheus lacks any particular TSDB feature, because Prometheus isn’t a (general-purpose) TSDB. Prometheus is a system for ingesting and querying real-time operational telemetry from a fleet of [production] services. That’s a much narrower use case, at a higher level of abstraction than a TSDB. PromQL reflects that design intent.

                                                                          2. 2

                                                                            I mean, I’m using it for telemetry data specifically. My bit about “ordinary time series queries” was mostly intended to mean I’m not doing weird high-cardinality shit or anything Prom shouldn’t reasonably be able to handle. I’m not doing general purpose TS stuff.

                                                                            1. 1

                                                                              Gotcha. I’d be curious to hear a few examples of what you mean, just to better understand where you’re coming from. Personally, I’m also (sometimes) frustrated by my inability to express a concept in PromQL. In particular, I feel like joining different time series on common labels should be easier than it is. But it’s not (yet) gotten to the point that I consider PromQL to be poorly designed.

                                                                              1. 1

                                                                                Yeah, unfortunately it’s been a while and I’ve forgotten all of the good examples. :/ Poorly designed feels harsh, but suffice it to say I don’t feel like it’s clicked and it seems like it’s a lot more complicated than it should be.

                                                                          3. 3

                                                                            I’ve wondered about this as well – how much of the difficulty has to do with a) working with time series b) PromQL syntax c) not knowing what metrics would actually be helpful for answering a given situation d) statistics are hard if you’re not familiar or e) a combination of the above.

                                                                            I’m curious if folks that have used something like TimescaleDB, which I believe uses a more SQL-flavored query syntax, have had a very different experience.

                                                                            1. 3

                                                                              In my experience, it’s been a combination of everything you’ve listed, with the addition of (at least my) teams not always being good about instrumenting our applications beyond the typical RED metrics.

                                                                              I can’t speak for TimescaleDB, but my team uses AWS Timestream for some of our data and it’s pretty similar as far as I can tell. Timestream’s more SQL-like syntax makes it both easier and harder to write queries, I’ve found. On the one hand, it’s great because I grok SQL and can write queries pretty quickly, but on the other hand I can start expecting it to behave like a relational database if I’m not careful. I’d almost rather just use PromQL or something like it to create that mental separation of behaviors.

                                                                          4. 10

                                                                            I’m more and more saddened at the lowering of the standard of my profession

                                                                            I see the reverse. Being willing to accept poor-quality UIs is a sign of low standards in a profession. Few other skilled trades or professions [1] contain people who use poorly designed tools and regard using them as a matter of pride. Sometimes you have to put up with a poorly designed tool because there isn’t a better alternative, but that doesn’t mean that you should accept it: you should point out its flaws and encourage improvement. Even very simple tools have improved a lot over the last few decades. If I compare a modern hammer to the one my father had when I was a child, for example, mine is better in several obvious ways:

                                                                            • The grip is contoured to fit my hand better.
                                                                            • The head and handle are now a single piece of metal so you don’t risk the head flying off when the wood ages (as happened with one of his).
                                                                            • The nail remover is a better shape and actually works.

                                                                            If carpenters had had your attitude then this wouldn’t have happened: a mediocre hammer is a technology that should be learned and mastered by anybody who wants to qualify as a “craftperson”. My hammer is better than my father’s hammer in all of these ways and it was cheap because people overwhelmingly bought the better one in preference.

                                                                            Some things are intrinsically hard. Understanding the underlying model behind a distributed revision control system is non-trivial. If you want to use such a tool effectively, you must acquire this understanding. This is an unavoidable part of any solution in the problem space (though you can avoid it if you just want to use the tool in the simplest way).

                                                                            Other things are needlessly hard. The fact that implementation details of git leak into the UI and the UI is inconsistent between commands are both problems that are unique to git and not to the problem space.

                                                                            As an industry, we have a long history of putting up with absolutely awful tools. That’s not the attitude of a skilled craft.

                                                                            [1] Medicine is the only one that springs to mind and that’s largely due to regulators putting totally wrong incentives in place.

                                                                            1. 2

                                                                              I agree with you, although I think it’s worth calling out that git has at least tried to address the glaring problems with its UI. PromQL has remained stubbornly terrible since I first encountered it, and I don’t think it’s just a question of design constraints. All the Prometheus-related things are missing what I consider to be fairly basic quality-of-life improvements (like allowing you to name a subexpression instead of repeating it 3 times).
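
                                                                              To make the complaint concrete, here’s a hypothetical sketch (metric name and thresholds invented): an error-ratio alert has to spell out the same sum(rate(...)) subexpression every time it’s used, because PromQL has no way to bind an expression to a name within a query:

                                                                              ```promql
                                                                              # The total-traffic subexpression must be written out verbatim each time;
                                                                              # there is no "let total = ..." equivalent inside a PromQL query.
                                                                                sum(rate(http_requests_total{status=~"5.."}[5m]))
                                                                              /
                                                                                sum(rate(http_requests_total[5m]))
                                                                              > 0.01
                                                                              and
                                                                                sum(rate(http_requests_total[5m])) > 1
                                                                              ```

                                                                              The usual workaround is a recording rule, which moves the repetition into a separate rules file rather than eliminating it from the query language.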

                                                                              Maybe PromQL also has limitations derived from its limited scope, but frankly I think that argument is… questionable. (It doesn’t help that the author of this article hasn’t really identified the problems very effectively, IMO.) The times I’ve resorted to terrible hacks in Prometheus I don’t think I was abusing it at all. Prometheus is actively, heavily, some might say oppressively marketed at tech people to do their operational monitoring stuff. But on the surface it’s incapable of anything beyond the utterly trivial, and in the hands of an expert it’s capable of doing a handful of things that are merely fairly simple, usually with O(lots) performance because you’re running a range subquery for every point in your original query.
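
                                                                              A sketch of the subquery cost being described (metric name invented): asking for the worst five-minute rate over the last day forces Prometheus to re-evaluate the inner rate() at every one-minute subquery step, for every point the outer query produces:

                                                                              ```promql
                                                                              # For each output point, the [1d:1m] subquery evaluates rate(...[5m])
                                                                              # at one-minute steps across a full day of samples.
                                                                              max_over_time(rate(http_requests_total[5m])[1d:1m])
                                                                              ```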

                                                                              As an aside, I think the relentless complaining about git’s model being hard to understand is not helping in either dimension. Saying “DVCS is too hard, let’s reinvent svn” doesn’t stop DVCS being useful, but it makes people scared to learn it, and it probably makes other people think that trying to improve git is pointless, too.

                                                                            2. 1

                                                                              This is a very interesting point. I hear you in the general case (and I’ll also say that actually working more with PromQL has given me a lot of respect for it).

                                                                              I think it’s easier to make that argument for tools that people use on a daily or at least very regular basis. Depending on the stage of company you’re at, to what extent your job involves routinely investigating incidents, etc, PromQL may be something you reach for more or less frequently. It’s quite a different paradigm than a lot of other programming tools, so it makes sense to me that engineers who are new to it or don’t use it frequently would have a hard time. Also, speaking as someone who learned it pretty recently, the materials for learning it and trying to get to a deeper level of understanding of what you can and might want to do with it are…sparse.

                                                                              1. 5

                                                                                I think you nailed it - in many cases you don’t touch Prometheus until you’re investigating some issue, and that’s often when it’s urgent; doing so using an unfamiliar query language is a recipe for pain. Of course, you could set aside some time to learn it, but if a lot of time passes before you need it again, those skills will have faded.

                                                                                git is hard to learn compared to some of its competitors but has become ubiquitous enough, and once you start using it daily you will learn it properly in no time. Learning additional stuff about it becomes easier too once you have a good foundation and it will stick around better, as well. For SQL I’d argue the same - at uni I almost flunked my SQL course due to simply not grokking it, but I’ve worked with it so much that I’m currently one of the company’s SQL experts.