I really enjoyed the previous entries in this series, and I’m hoping to someday get around to trying (at least) FastCDC for delta-compression.
I found it more difficult to understand what RapidCDC and QuickCDC were doing in this post. Maybe I’m not reading closely enough, but I can see talk of storing previously-seen chunk sizes, and presumably they’re used to predict future chunk sizes, though I can’t quite imagine exactly how that works. The previous entries had nice diagrams and source-code snippets that really helped.
That said, it sounds like RapidCDC and QuickCDC don’t offer much benefit if you have to read the entire file anyway to verify that chunks do not contain inner modifications. Maybe I should just stick to FastCDC and ignore the others.
I found it more difficult to understand what RapidCDC and QuickCDC were doing in this post
Honestly I found the same thing when reading the papers. They seem to be built on all this previous knowledge of how chunking works (unlike the DDelta and FastCDC papers), and so (at least in my reading of them) they feel more set out as just “here are some optimizations to try and skip over a bunch of bytes”.
I can see talk of storing previously-seen chunk sizes, and presumably they’re used to predict future chunk sizes, though I can’t quite imagine exactly how that works
Yea that’s right. If, when we identify a chunk boundary, we have access to a list of the sizes for chunks that follow the current chunk, then we can use those sizes as hints for where the next chunk boundary is. The papers seem to be quite good at spelling out how this works, but not so good at considering how to determine if that chunk is a duplicate or unique. They provide a bunch of options, but none that really seem to work in all cases without also reading every byte.
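Roughly, the skipping part might look something like this (a sketch of my own with made-up names, not code from either paper; note that it only re-tests the boundary condition at the predicted cut point, which is one of the weaker acceptance checks and says nothing about whether the bytes that were skipped have changed):

```rust
use std::collections::HashMap;

const WINDOW: usize = 48;        // bytes of context the rolling hash looks at
const MASK: u64 = (1 << 13) - 1; // ~8 KiB average chunk size

/// Deterministic byte -> random u64 table (stand-in for a real Gear table).
fn gear_table() -> [u64; 256] {
    let mut table = [0u64; 256];
    let mut x: u64 = 0x9E37_79B9_7F4A_7C15;
    for entry in table.iter_mut() {
        // splitmix64 step, just to fill the table reproducibly
        x = x.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = x;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        *entry = z ^ (z >> 31);
    }
    table
}

/// Would the chunking condition declare a boundary right after `pos` bytes?
/// Here we recompute a Gear-style hash over only the trailing WINDOW bytes.
fn boundary_holds(data: &[u8], pos: usize, table: &[u64; 256]) -> bool {
    if pos < WINDOW || pos > data.len() {
        return false;
    }
    let mut h: u64 = 0;
    for &b in &data[pos - WINDOW..pos] {
        h = (h << 1).wrapping_add(table[b as usize]);
    }
    h & MASK == 0
}

/// For each chunk hash, the sizes of chunks previously observed to follow it.
type SizeHints = HashMap<u64, Vec<usize>>;

/// Try each remembered "next chunk" size as a jump target from `offset`.
/// If the boundary condition holds at the predicted cut point, accept the
/// jump and skip the bytes in between entirely; otherwise the caller falls
/// back to the ordinary byte-by-byte CDC scan.
fn try_size_hints(
    data: &[u8],
    offset: usize,
    prev_chunk_hash: u64,
    hints: &SizeHints,
    table: &[u64; 256],
) -> Option<usize> {
    for &size in hints.get(&prev_chunk_hash)? {
        let predicted = offset + size;
        if boundary_holds(data, predicted, table) {
            return Some(predicted);
        }
    }
    None
}
```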
That said, it sounds like RapidCDC and QuickCDC don’t offer much benefit if you have to read the entire file anyway to verify that chunks do not contain inner modifications.
This was the realization I came to. And I’m worried I’m just missing something because this seems like a pretty big flaw in these two CDC algorithms. Then again, maybe there are some chunking use cases that don’t care so much about chunks being treated faithfully as duplicates 100% of the time, and 99.98% of the time is acceptable.
Maybe I should just stick to FastCDC
This would be my recommendation (and I nearly wrote that in the post). The nice thing about the optimizations in these papers is that they are all pretty orthogonal. Personally, I would start out with FastCDC and then investigate optimizing with techniques from these (and other) papers later down the line if it becomes necessary.
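For reference, the FastCDC baseline itself is pretty small. A very condensed sketch of the cut-point search (my own illustration with simplified constants; real FastCDC spreads the mask bits out rather than using contiguous ones, and tunes the sizes) looks like:

```rust
const MIN_SIZE: usize = 2 * 1024;
const AVG_SIZE: usize = 8 * 1024;
const MAX_SIZE: usize = 64 * 1024;
const MASK_S: u64 = (1 << 15) - 1; // stricter mask used before the target size
const MASK_L: u64 = (1 << 11) - 1; // looser mask used after the target size

/// Find the next cut point in `data`, given a 256-entry byte -> random u64
/// table (the same kind of table as in the Gear sketch above).
fn cut_point(data: &[u8], table: &[u64; 256]) -> usize {
    let len = data.len().min(MAX_SIZE);
    if len <= MIN_SIZE {
        return len; // not enough data left to bother searching
    }
    let mut h: u64 = 0;
    // Skip the minimum chunk size entirely, then scan byte by byte.
    for i in MIN_SIZE..len {
        h = (h << 1).wrapping_add(table[data[i] as usize]);
        // Before the target size, demand more zero bits (discourages tiny
        // chunks); after it, relax the condition so a boundary appears soon.
        let mask = if i < AVG_SIZE { MASK_S } else { MASK_L };
        if h & mask == 0 {
            return i + 1;
        }
    }
    len // no boundary found: force a cut at MAX_SIZE (or end of input)
}
```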
Hi! I recently did a benchmark of various deduplicating solutions, i.e. programs that maintain some kind of “repo” for storing and deduplicating data, such as borg, casync, etc. Most of them are based on some kind of CDC.
Results are here: https://github.com/borgbackup/borg/issues/7674#issuecomment-1656787394 (see whole thread for context)
The results are very unfortunate. It turns out the FOSS deduplicating solutions miss many optimizations; all of them are slower than they should be. So I had to develop my own fast solution called azwyon (it is also included in the benchmark above). But azwyon uses fixed chunking as opposed to CDC, because that is what my problem needs.
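For comparison, fixed chunking is conceptually trivial next to CDC. A minimal store/extract sketch (just the general idea, not azwyon's actual code: it's sequential, and the standard library's DefaultHasher stands in for a real content hash like blake2/blake3) looks like:

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::Hasher;

const CHUNK_SIZE: usize = 4 * 1024 * 1024; // fixed 4 MiB chunks

/// Split `data` into fixed-size chunks, store each unique chunk once keyed
/// by its hash, and return the sequence of chunk ids (the "recipe") needed
/// to reconstruct the input.
fn store(data: &[u8], repo: &mut HashMap<u64, Vec<u8>>) -> Vec<u64> {
    data.chunks(CHUNK_SIZE)
        .map(|chunk| {
            let mut hasher = DefaultHasher::new();
            hasher.write(chunk);
            let id = hasher.finish();
            repo.entry(id).or_insert_with(|| chunk.to_vec());
            id
        })
        .collect()
}

/// Reassemble the original data from a recipe. Whether to re-verify chunk
/// hashes here is exactly the kind of speed/safety knob discussed below.
fn extract(recipe: &[u64], repo: &HashMap<u64, Vec<u8>>) -> Vec<u8> {
    recipe
        .iter()
        .flat_map(|id| repo[id].iter().copied())
        .collect()
}
```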
Okay, so here is a list of problems with the existing CDC-based solutions, such as borg and casync. I will use “extract” and “store” as the names of the basic operations of such solutions.
Borg. Not parallel. Uses slow SHA-256 as its hash. It also supports blake2, but that is not the default and is undiscoverable, i.e. the documentation doesn’t advertise that you can speed up borg by switching to blake2. blake3 is even faster, but it is not supported. Also, extracting always verifies hashes; this cannot be disabled.
UPD: as of 2023-08-10 the borg2 docs don’t say that blake2 mode is faster than no encryption. The borg1 docs do say this.
UPD: SHA-256 is slower than blake2 on my machine. Some machines have special instructions for SHA-256, so SHA-256 is faster there.
Casync. Not parallel. For unknown reasons, storing is something like 5x slower than in borg.
Rdedup. For unknown reasons, extracting is slower than in desync. Also, as you can see in the benchmark, storing with FastCDC at a 64K block size turns out to be very slow: about 10x slower than with a 4096K block size. It may be a bug in rdedup or a problem with the FastCDC algorithm itself.
Zpaq. Despite being a very old, trusted solution, it turns out to be very slow in its default mode compared to alternatives with a marginally better compression ratio.
So, it seems the current FOSS situation with deduppers is very bad. All of them could easily be improved in terms of speed, or a new fast solution could be created that outperforms the existing ones.
In fact, my azwyon is 200 lines of Rust code (not counting my thread pool library). It is fast, simple, and parallel. With minimal edits it could gain CDC support, which would make it outperform all the other FOSS deduppers.
I added some UPDs
OT: this “You have built X” meme format makes me very happy. Kudos to those who recognize good ideas in the wild :)
Most of them seem to be better-explained, usually more positive, variants of Greenspun’s 10th rule.
I need more
There is also Dear Sir, You Have Built a Compiler
Python is an interesting language to me. I’m far from experienced, but it’s really what made writing code click for me. I loved using it for my projects and felt super productive in it for a few years. Then I began using it professionally, on a team that cut corners, and grew disappointed by the tooling around it; that experience exposed the worst of the language for me: the lack of an enforced type system, conflicts with the system Python version, and dependency management. There were too many optional practices, and I just wanted a more structured default that provides a more consistent experience across projects that use Python.
I’ve switched to Rust as my primary language and am a lot happier now. Python remains my choice for leetcode interviews, but I never reach for it for a project unless I really need a specific library. Curious how others view it.
My experience sounds very similar. Python was a language that really helped programming “click” in my brain, but I was quickly disenchanted when working with it professionally, for the same three reasons. These days Rust is also my primary language, along with Java/Kotlin when needed.
As I mentioned in the discussion of the previous blog post:
Part of why I like tags is that I don’t have to work out or think about hierarchies.
Of course you can. And the nice thing about both HNT and Scoped Tagging is that if you don’t want to think about hierarchies/scopes then you don’t have to. There’s nothing forcing you to have tags with a depth > 1 in HNT, nor to use scopes in Scoped Tagging.
Personally, I want a little more structure than what Simple Tagging provides. I want to avoid having a large cloud of tags across all my bookmarks and what I’ve come up with for Scoped Tagging is less about hierarchies and more about having a way to modularize your tags.
Again though, if that doesn’t work for you then it’s not forced which is a nice property to have :)
I can’t recommend the videos from Sebastian Lague and Freya Holmér enough!!
+100 for Sebastian Lague!
Shameless plug… but want to throw in snowhashing as well 😄
I really enjoyed the previous entries in this series, and I’m hoping to someday get around to trying (at least) FastCDC for delta-compression.
I found it more difficult to understand what RapidCDC and QuickCDC were doing in this post. Maybe I’m not reading closely enough, but I can see talk of storing previously-seen chunk sizes, and presumably they’re used to predict future chunk sizes, though I can’t quite imagine exactly how that works. The previous entries had nice diagrams and source-code snippets that really helped.
That said, it sounds like RapidCDC and QuickCDC don’t offer much benefit if you have to read the entire file anyway to verify that chunks do not contain inner modifications. Maybe I should just stick to FastCDC and ignore the others.
Honestly I found the same thing when reading the papers. They seem to be built on all this previous knowledge of how chunking works (unlike the DDelta and FastCDC papers), and so (at least in my reading of them) they feel more set out as just “here are some optimizations to try and skip over a bunch of bytes”.
Yea that’s right. If, when we identify a chunk boundary, we have access to a list of the sizes for chunks that follow the current chunk, then we can use those sizes as hints for where the next chunk boundary is. The papers seem to be quite good at spelling out how this works, but not so good at considering how to determine if that chunk is a duplicate or unique. They provide a bunch of options, but none that really seem to work in all cases without also reading every byte.
This was the realization I came to. And I’m worried I’m just missing something because this seems like a pretty big flaw in these two CDC algorithms. Then again, maybe there are some chunking use cases that don’t care so much about chunks being treated faithfully as duplicates 100% of the time, and 99.98% of the time is acceptable.
This would be my recommendation (and I nearly wrote that in the post). The nice thing about the optimizations in these papers is that they are all pretty orthogonal. Personally, I would start out with FastCDC and then investigate optimizing with techniques from these (and other) papers later down the line if it becomes necessary.
Hi! I recently did benchmark for various deduplicating solutions. I. e. for programs, which maintain some kind of “repo” for storing and deduplicating data, such as borg, casync, etc. Most of them are based on some kind of CDC.
Results are here: https://github.com/borgbackup/borg/issues/7674#issuecomment-1656787394 (see whole thread for context)
Results are very unfortunate. Turns out FOSS deduplicating solutions miss many optimizations. All they are slower than they should be. So I had to develop my own fast solution called azwyon (it is also included in benchmark above). But azwyon uses fixed chunking as opposed to CDC, because this is what needed for my problem.
Okay, so here are list of problems with existing CDC-based solutions, such as borg and casync. I will use “extract” and “store” as names of basic operations of such solutions.
So, it seems that current FOSS situation with deduppers is very bad. All them can be easily improved in terms of speed. Or a new fast solution can be created, which will outperform existing solutions
In fact, my azwyon contains 200 lines of Rust code (not counting my thread pool library). It fast, simple and parallel. With minimal edits it can gain CDC support. This will make it outperform all other FOSS deduppers
I added some UPDs
Great write up! Really enjoyed the journal entries too.
I’ve had BBEdit installed forever but haven’t used it much. This post also makes me want to try out Nova (just installed it) though I’ve built up a lot of muscle memory with Vim after so many years. Looks like Nova has support for Vim keybindings, though there aren’t a lot of options for customization.
I have the fortunate situation of knowing enough Vim to be useful and never having made it my primary editor. Modal editing always seems like a good thing… for which I am unwilling to throw away everything else I like about a good text editor. 😂
Having built up such a reliance on Vim to be productive, I do agree that I have thrown away a few good tools because they don’t have modal editing. Something I quite like (and half-tried with IntelliJ, which I am forced to use sometimes) is building in Emacs-style editing, which is essentially just loads of custom hotkeys bound under Ctrl or Alt (or both). Keen to give that a shot again with Nova.
One of the things I continue to love about macOS in general and well-behaved Mac apps specifically is that CoreText has native support for a ton of the Emacs text movement and manipulation key bindings. One reason both Chrome and especially Firefox do not “stick” well for me (there are many, but one) is that they mess that up! You can pry my ability to hit ⌃E to go to the end of a line from my cold dead hands!
😂 Very glad that things like ⌃E and ⌃A work well almost everywhere! (and especially in Google Docs)
Unfortunately Nova’s vim mode is extremely poor. I mostly ran into missing features. It seems to be a “there’s normal and insert mode and you can use hjkl” mode.
Packing up to go to Nepal for a month and do some climbing in the Khumbu Valley. And trying not to start any new personal projects just before I leave 🙃
I keep a “gripes” file where I list all the things that bug me about the software I use.
At the top is a “hall of fame” section listing programs with only one or zero complaints.
Great idea to keep a “gripes” file. I constantly find myself getting annoyed with the software I use and then forgetting about it, only to find myself getting annoyed again, and only tracking issues with software when I plan to do something about it (in my projects list). I’ve just set up a list called “Software Grinds”… for things that really grind my gears. https://youtu.be/GospVDNp6EM?t=13
It still sounds like “gear hashing” is just a re-branding of buzhash, except that in buzhash, you also pass in min/max block sizes, to trim the long and short tail from the block size distribution. What am I missing?
I was going to say the same thing. The article even links to a Wikipedia page mentioning Buzhash, and block-size tricks like you mention date as far back as the beginnings of CDC in Manber93/spamsum/LBFS. I do completely agree that Buzhash is simpler and probably usually faster, and that it is overlooked in this space, especially in teaching contexts. That said, Bob Uzgalis (“Buz”) actually died over 10 years ago now, and I’m not sure he did much more than popularize an S-box for this kind of hashing; DES in 1977 had S-boxes… Anyway, for the curious, there is a Nim impl of some of these ideas over at https://github.com/c-blake/ndup
I wouldn’t say a rebrand, but Buzhash is certainly closely related and quite similar to Gear Hashing. Buzhash is a rolling hash based on Cyclic Polynomials. Both Gear and Buzhash use a table mapping byte values to random integers, incurring an array lookup, but one key difference is that Buzhash updates with barrel-shifts (rotates) and XORs, while Gear updates with plain shifts and adds, using an AND against a mask for the boundary test.
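To make the contrast concrete, the two update steps look roughly like this (a from-memory sketch; real implementations differ in window handling, table contents, and the exact boundary test):

```rust
/// One Gear step. There is no explicit sliding window: each shift pushes an
/// old byte's contribution toward the high bits until it falls off the end.
/// The chunk-boundary test is then `hash & mask == 0`.
fn gear_step(hash: u64, incoming: u8, table: &[u64; 256]) -> u64 {
    (hash << 1).wrapping_add(table[incoming as usize])
}

/// One Buzhash (cyclic polynomial) step over a fixed window of `win` bytes:
/// barrel-shift the hash, XOR in the incoming byte's table value, and XOR
/// out the departing byte's value rotated by the window length.
fn buzhash_step(hash: u64, incoming: u8, outgoing: u8, win: u32, table: &[u64; 256]) -> u64 {
    hash.rotate_left(1)
        ^ table[outgoing as usize].rotate_left(win % 64)
        ^ table[incoming as usize]
}
```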
Nice write up :) It echoes a lot of the experience and sentiments I have with Go. Particularly that I can see why it exists and has gained popularity, and I can appreciate it myself (heck, even enjoying it from time to time). But fundamentally the values embodied by the language don’t align with my values on software engineering as much as other languages do: Rust and Haskell align more closely, and I’d even say Kotlin aligns with them more than Go does.
Especially liked the comparison of worse-is-better to the MIT approach. I found that a nice summary.
I sort of agree. I wish the world was better calibrated for the tradeoffs Rust makes, and some of it is. If I were in some market where the costs of any single error were high and/or I can’t quickly push out a patch, I think Rust is the economical way to go. Maybe it will be the way to go when we’ve largely automated everything and the competition is no longer about being the first mover and more about having the faster and more correct product?
But for now, for a huge swath of the software engineering space, the most important thing is rapid productivity, and Go is best in class here (quite a lot better even than Python, despite my 15 years of experience with it).
And its quality is good enough—indeed, people are still shipping important software backed by fully dynamic languages! Go’s static typing story may only be 95% “complete”, but a whole lot of engineering is done entirely without static types! And not just the likes of Twitter and Reddit, but things like healthcare and finance. Go is a marked improvement here but it still gets dramatically more criticism than dynamic languages for lacking more advanced static analysis features.
Most importantly, as much as my younger self would hate to admit it, type systems aren’t that important. Having some basic type checking yields a ton of productivity, but beyond that the returns on quality diminish and the productivity returns go negative. Same deal with performance—if you’re 2 or 3 orders of magnitude faster than Python, you’re probably good enough, and to get much faster you’re often having to trade off a ton of productivity, which again is a bad tradeoff for the overwhelming majority of applications. More important are things like learning curve, tooling, ecosystem, deployment story, strong culture/standards/opinions etc. Go strikes these balances really well.
For better or worse, Go is the best fit for a giant chunk of software development at this moment in time.
I think the “cost of a single error” is why companies in security are jumping on Rust. The issue is that no one cares about the cost of a single error if it can be put off to runtime, since no matter how much devops synergy takes place, the time and money spent diagnosing a runtime error often don’t fall on the programmer who wrote it, let alone the person who made the language choice. By allowing nil (or nil, nil) to be a part of the language, the Go story is one of late nights and anxiety at runtime, instead of chewed pencils and squeezed rubber ducks at compile time. The labor market seems to prefer runtime pain, because you can hire a wider range of people to write and track down preventable bugs than you can hire to write in a language that sidesteps them completely. This is perhaps too sarcastic a take, but oh well. I’ve been paid to write both, and I don’t sweat about the Rust I’ve written from a maintenance or a “will it crash” perspective. Go, on the other hand, has so much room for unplanned activities.
Eh, I’ve written plenty of bugs in Rust, which has been a frustrating and disappointing experience considering all of the energy I would put into pacifying the borrow checker. After having used it, I would say I get so wrapped up in thinking about borrow-checker-friendly architectures and planning refactoring (“If I change this thing from owned to borrowed, is this going to cascade through my program such that I spend untold hours on this refactor?”) that I actually lose track of details that the type system can’t help me with (“did I make sure to escape this string correctly?”, “did I pass the right PathBuf to this function?”, etc). And of course it takes me much longer to write that code.
I don’t want to make too big of a deal about that point–I think Rust’s quality is still a bit better than Go’s, but the difference seems exaggerated to me and overall has been disappointing considering how much time and energy goes into pacifying the borrow checker. I also haven’t heard much discussion about the quality impact of taking the developer’s focus off the domain and onto the borrow checker. That said, Rust’s productivity hit is a huge deal for most software development, and you can diagnose and patch a lot of bugs in the amount of time you save by writing Go (including bugs that Rust’s type checker can’t prevent at all).
Again, I like Rust, and I’m glad it exists, but it’s (generally) not well suited for the sort of software that Go is used to write. It’s a great replacement for C and C++ though (including the security space).
I agree, the problem space for Rust is ideal for implementing known solutions. After a year of coding only in Rust, ‘appeasing the borrow checker’ became about as difficult as ‘appeasing the syntax checker’… 97% of the time. The other 3% of the time you can find yourself painted into a corner in a way that isn’t very satisfying to fix. Still, the basic rules on lifetimes and lifetime elision do creep into working knowledge after a while; I just haven’t seen a document that distills what that knowledge entails into basic patterns or rules.
I don’t want to make too big of a deal about that point–I think Rust’s quality is still a bit better than Go’s, but the difference seems exaggerated to me and overall has been disappointing considering how much time and energy goes into pacifying the borrow checker.
Some folks have brought this up in the past but I haven’t seen anyone try and explore this line of inquiry. I personally find myself spending a lot of time thinking about the propagation of borrows and owns as you mentioned, and I find it a drag on thinking about the problem domain, but I only hack on Rust for fun so I can’t say whether this is a problem in production situations or not.
I’ve kind of had the inverse, going from fulltime rust to python/go I’ve found myself just saying “well, if that error happens so be it, we’ll crash at runtime”. The cognitive overhead of thinking about catching errors and exceptions (or disentangling them) is large for me returning to these none/nil langs. I do recall the borrow checker absolutely ruining me early on tho. These days I tend to think a lot of the code I write in Python would survive the borrow checker. You can write ____ in any language. :D
I sort of agree. I wish the world was better calibrated for the tradeoffs Rust makes, and some of it is.
And that’s cool :) but I think this still comes down to values: your personal values as a constructor of the software, the values embodied by the product, and the values of the business surrounding the product.
Personally, my values (and interests) align more to building systems where correctness is more important than speed to delivery, where competition of products is assessed by users on more than who has the most features, and where I would rather take the steep learning curve over a shallower one if it means better design/elegance/consistency.
But these are my values. And I fully recognize that they may not align with the values of others. Particularly with many of the “high-tech”, “fast-moving” companies and “startups” where, as you said and I do agree, being a first mover is a bigger competitive advantage than having a correct (and I’ll add more generally, a higher quality) product.
I agree with all of this, and indeed it’s what I mean by “I wish the world was better calibrated for the tradeoffs Rust makes”. Specifically, the economic context is such that we’re largely replacing stuff that humans are doing manually with software, so we’re already talking about gains which dwarf any difference between Rust and the most error prone programming languages. The first mover advantages are enormous, so iteration velocity is king. This is what we mean when we say “the world values iteration speed over performance and correctness”. I wish this weren’t the case–I wish performance and correctness were the most important criteria, but until someone lends me a magic lamp, I must deal with reality.
Getting out of the city and going hiking. Might visit an art gallery nearby too.
Certainly better than in many other languages but things like the bracket function (the “default” version of which is broken due to async exceptions lol oops) are rather “meh” compared to RAII-style ownership. Because nothing forces you to avoid resource leaks… well, now Linear Haskell can do that, but being a newly retrofitted extension it’s not gonna be instantly pervasive.
TBH, Haskell is in kind of an awkward spot these days:
if you want to play with type-level magic (and proofs) Idris is a much better fit, all the type-level programming that was done with awkward hacks in Haskell becomes much easier in a fully dependent system;
meanwhile if you want to get practical things done quickly and safely, Rust is absolutely kicking ass on all fronts and it’s really hard to pick anything else!
I love Rust, but I don’t think it’s a clear winner over Haskell personally.
In Rust, the affine types and lack of garbage collection are really great when I’m working on low-level code. As someone who has written a lot of C and a lot of Haskell, Rust undeniably hits a lot of my requirements. For a lot of day-to-day work though, I still find that I’m much more likely to pick up Haskell. Little things like higher kinded types and GADTs end up being a big force multiplier for me being able to build the sorts of APIs that work best for me. I also really value laziness and the syntactic niceties like having universal currying when I’m working in Haskell.
None of that is anything negative about Rust. I really admire what the Rust community has done. If anything, I think rustaceans are in a great position to leverage all of the things they’ve learned from Rust so that they can more quickly and easily dip a toe into Haskell and see if it might be useful to them sometimes. In the end I don’t think we have to view each other as competitors, so much as two languages that sit in somewhat different spots of the ecosystem that can learn and benefit one another.
I think rustaceans are in a great position to leverage all of the things they’ve learned from Rust so that they can more quickly and easily dip a toe into Haskell and see if it might be useful to them sometimes.
This is exactly where I am in my PL journey (outside of work). I’ve been writing Rust for 5 years and it’s a great language for all the reasons you mentioned and more. I also found Rust a nice intro to writing real-world code that is more FP than OO (i.e. structs/records and traits/type classes instead of classes and interfaces) while still having static types (I love lisps, but I tend to work better with types). Now I’m getting into Haskell, and so far the process has been fairly smooth and very enlightening. The type system is far more expressive, and I can see myself being highly productive in Haskell (far more than in Rust), not having to worry about memory management and some of the more restrictive aspects of the Rust type system. If the learning process continues, I wouldn’t be surprised if Haskell becomes my “go-to” for most problems, but Rust is still there for when I care more about performance and resource usage.
It will be interesting to see how attitudes towards resource collection shift with the advent of linear types in Haskell.
Yes, my opinion is that Rust has successfully stolen the best ideas from Haskell and made them more palatable to a mass audience.
Finishing off reading Modern C by Jens Gustedt. After spending much time exploring and working with various languages I have had an urge to get back to writing C which I haven’t done in anger since college days.
Start migrating from Arch to NixOS.
A write up of that process would be interesting!
Hope all goes smooth and well. I went through the same migration around 6 months ago. Haven’t looked back!
Nice! I’m no C testing library expert but seems similar to libcheck with the use of macros to define the tests (maybe they’re all like that) but with far less boilerplate :)
From the ‘About’ page …
This is a fork of gddo, the software which previously powered the now-defunct godoc.org, operated by Drew DeVault. The source code is available on sr.ht.
Anyone here have context around the situation that led to the fork? I must have entirely missed this (but that’s not surprising as I only tinker with Golang).
Godoc.org used to be an independent, open source site that Google bought just to cancel and replace with proprietary bloat that takes >10x longer to render, and won’t display documentation except for licenses deemed acceptable by Google; in particular it rejects public domain and WTFPL. Some people dislike that and decided to do something about it.
I never cared about “silly” licenses like the WTFPL, but now I am tempted to license all my code under WTFPL. Google won’t allow the use of WTFPL code internally, which for me is absolutely great.
You posted the same lies on HN. Please stop.
Longer reply there: https://news.ycombinator.com/item?id=29314711.
But the part about licenses is correct.
Something like this would be better then, if one wanted to be inflammatory but more correct:
Godoc.org used to be an independent, open source site that Google agreed to take stewardship over, just to cancel and replace with proprietary bloat that takes >10x longer to render, and won’t display documentation except for licenses deemed acceptable by Google; in particular it rejects public domain and WTFPL. Some people dislike that and decided to do something about it.
EDIT: I see from the rest of the Hacker News discussion that the replacement isn’t proprietary. I can’t comment on the bloat/performance aspect; the new site seems fast to me. That really only leaves the licensing stuff.
If people haven’t granted the right to display the documentation then sites probably shouldn’t display it, right?
Still not really true. First off, a public domain dedication is not a licence. Second, CC0-1.0 is accepted.
WTFPL is problematic in general because the public domain doesn’t exist in all jurisdictions, which is why CC-Zero exists. If your goal is to build free software that other people can reuse, I wouldn’t recommend it.
If your goal is to spite Google, use something like AGPL instead.
WTFPL isn’t a public domain dedication. It’s an informal statement.
Which is exactly the reason cited for the creation of WTFPL. It’s basically CC-less-than-zero (and it predates CC0 by several years).
… and I’ll bet it doesn’t work well on non-Chrome (maybe non-Firefox?) browsers either - and thus not on Plan 9. Whereas godocs.io seems to work nicely on Netsurf.
Edit: To my surprise, it (somewhat) does, albeit with horrendously broken layout.
Can confirm (from my experience) on Firefox that pkg.go.dev has some… paper cuts, whereas on Chrome I haven’t run into any issues. godocs.io on the other hand has worked flawlessly on both and is my go to.
Going to give learning Clojure a shot. I’ve been diving into Emacs some more so kind of excited to learn a more widely used lisp.
Good luck! I’ve used Clojure as my main language at work for nearly 2 years now, and still liking it. I also use Emacs—I would recommend configuring flycheck-clj-kondo, and lsp-mode (clojure-lsp) for jumping around. FWIW my rather simple Clojure-related Emacs config is here.
This is fantastic! Thanks @stig
Coming from which language, out of curiosity?
I write Java for the day-job, and on the side I’ve been mostly doing Rust and Go for a handful of years.
Nice writeup :)
More generally, I find that ctags is surprisingly underpowered even as a robust, semantics-free searcher. I believe it should be easy to build a significantly more powerful tool (comment with details).
+1, in fact @matklad it was your comment that got me thinking about this a bit more. I’ve been messing around with Tree Sitter and was doing some initial comparisons with ctags, which is how I came across this interesting behaviour. Hopefully more to come on this subject :)
This is exciting!