I was trying to find a FOSS alternative to VS Code’s Live Share feature but couldn’t. Looking online, it seems Live Share is very flaky with VSCodium, which is to be expected. Another product, CodeTogether, has the interesting feature of supporting multiple types of IDEs in the same session, but it’s proprietary and (reasonably) paid. Also possibly bundled with management-oriented snitchware?
It really seems like a very interesting & challenging FOSS project for someone to take on. You could define an open standard API for editors to interact with a local live-share service, similar to the LSP, and various editor communities (neovim, emacs, etc.) could implement that client if they wanted. Then the project could have a free lightweight coordination service, similar to Tailscale, that hands off to P2P connections (it could probably even use Tailscale code for NAT/firewall traversal). The option to self-host the coordination service would of course be available, probably using one-time codes shared out-of-band. I don’t have the bandwidth for this myself but it would be incredibly fun to work on!
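To make that concrete, a message in such a protocol might look something like this. This is a throwaway sketch in Rust assuming the serde and serde_json crates; every name here is invented, and a real protocol would also need CRDT/OT semantics, versioning, and auth:

use serde::{Deserialize, Serialize};

// Hypothetical wire format for an LSP-like live-share protocol: the editor
// client sends operations to the local service, which relays them to peers.
#[derive(Serialize, Deserialize)]
#[serde(tag = "method", content = "params")]
enum ShareMessage {
    // A collaborator moved their cursor.
    CursorMoved { peer: String, file: String, line: u32, col: u32 },
    // A text edit as a range replacement, in the spirit of LSP's TextEdit.
    Edit { file: String, start: (u32, u32), end: (u32, u32), text: String },
}

fn main() {
    let msg = ShareMessage::CursorMoved {
        peer: "alice".into(),
        file: "src/main.rs".into(),
        line: 42,
        col: 7,
    };
    println!("{}", serde_json::to_string(&msg).unwrap());
}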
I would also love to have a FOSS project to use for this, especially if it could integrate with all sorts of editors. I’d happily help with some development but, like you, don’t have the bandwidth to start that project up and run it.
Actually, seeing the sibling comment on this post, the solution might already exist; I just wasn’t thinking in terms of the UNIX philosophy of doing one thing well! You could use Tailscale to let your coworker connect directly to your computer, then use wemux and a terminal editor. The LSP-like API for a shared editing session would still be a fun project, though.
Another example of (proprietary) prior art is Floobits, which launched a decade ago but now seems to be defunct: https://floobits.com/
Nice! What do you get for π from your real cake?
Matt Parker does this kind of thing every year; this year he found π from the tyre marks of a skidding car.
I haven’t counted the sprinkles on the cake yet haha. But based on the simulation, the value of pi would be somewhere over 2 and under 4, which is not a very good range!
That’s a really interesting video, thanks for sharing it!
I don’t recommend ever putting lifetime annotations on &mut self. In this case it’s sufficient to only name Token’s lifetime:
impl<'source> Scanner<'source> {
    pub fn next_token(&mut self) -> Token<'source>
Lifetimes on &mut self are very, very rarely legitimately useful, and can easily lead to an even worse gotcha: when you mix them with traits or make them dependent on other lifetimes with a bigger scope, they can end up meaning the call exclusively borrows the object for its entire existence (you can call one method once, and nothing else with it ever).
Good point, and thank you for the example of where it would end up causing another confusing error!
In this case I put on an explicit lifetime just to make all of them explicit, but you’re right that it’s not legitimately useful here. It would probably be better placed on the method, if I want to keep it for explicitness.
(For what it’s worth, I also didn’t have that parameter in the code this blog post was inspired by.)
…they can end up meaning the call exclusively borrows the object for its entire existence (you can call one method once, and nothing else with it ever).
What would be an example of this, out of curiosity?
struct Bad<'a>(&'a str);

impl<'a> Bad<'a> {
    fn gameover(&'a mut self) {}
}

fn main() {
    let mut b = Bad("don't put references in structs");
    b.gameover();
    b.gameover(); // error: cannot borrow `b` as mutable more than once at a time
}
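For contrast (my addition, not part of the original example): drop the 'a from &mut self and each call takes its own short-lived borrow, so this compiles:

struct Fine<'a>(&'a str);

impl<'a> Fine<'a> {
    fn ok(&mut self) {} // elided lifetime: the borrow ends when the call returns
}

fn main() {
    let mut f = Fine("this compiles");
    f.ok();
    f.ok(); // fine: separate, short-lived mutable borrows
}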
I shot myself in the foot with the same gun while also writing a parser a few weeks ago, but there is in fact a lint for this (elided_lifetimes_in_paths), which one should arguably add to any new code base. https://doc.rust-lang.org/rustc/lints/listing/allowed-by-default.html#elided-lifetimes-in-paths
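For anyone curious, enabling it is a one-line attribute at the crate root. A tiny example (mine) of what it catches:

#![warn(elided_lifetimes_in_paths)]

struct Token<'a>(&'a str);

// Writing `-> Token` instead of `-> Token<'_>` would now trigger the lint,
// making the hidden borrow visible at a glance.
fn first_word(s: &str) -> Token<'_> {
    Token(s.split_whitespace().next().unwrap_or(""))
}

fn main() {
    println!("{}", first_word("hello world").0);
}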
The Google Slicer paper (PDF) is a good read. I believe that many applications benefit greatly from an above-database stateful layer, especially at scale, where hot rows and hot entity groups become a real concern, or when you find yourself doing things like polling a database for completion status.
Stateful services aren’t right for every use, but when used well they greatly simplify your architecture and/or unlock really compelling use-cases.
This is a nice piece of work, and clarifies something that people forget about stateless services: the service can have state that requires warmup, just not authoritative state. If you’re using Hack or the JVM, your stateless service already has warmup from the JIT. Having a local read cache is a similar case. If you lose the host, a new host will have worse performance for users for some time until its cache is warm.
I’d be curious to see a comparison of this approach for them vs trying VoltDB.
The memory space or filesystem of the process can be used as a brief, single-transaction cache. For example, downloading a large file, operating on it, and storing the results of the operation in the database. The twelve-factor app never assumes that anything cached in memory or on disk will be available on a future request or job[.]
When a participant starts responding to a message, they open a WebSocket connection to the server, which then holds their exercises in the connection handler. These get written out in the background to BigTable so that if the connection dies and the client reconnects to a different instance, that new instance can read their previous writes to fill up the initial local cache and maintain consistency.
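In case the shape of that is unclear, here’s a compressed sketch (all names simplified and invented; the real thing is asynchronous and batches its writes in the background):

use std::collections::HashMap;
use std::sync::Mutex;

// Stand-in for the durable store (Bigtable, in our case).
trait DurableStore {
    fn write(&self, user: &str, exercises: &HashMap<String, bool>);
    fn read(&self, user: &str) -> HashMap<String, bool>;
}

struct ConnectionHandler {
    user: String,
    cache: HashMap<String, bool>, // per-connection state, dies with the socket
}

impl ConnectionHandler {
    // On (re)connect: warm the local cache by reading our own previous writes.
    fn on_connect(user: &str, store: &impl DurableStore) -> Self {
        ConnectionHandler { user: user.to_string(), cache: store.read(user) }
    }

    // On each message: update memory immediately, persist to durable storage.
    fn record(&mut self, store: &impl DurableStore, exercise: &str, done: bool) {
        self.cache.insert(exercise.to_string(), done);
        store.write(&self.user, &self.cache);
    }
}

// Toy in-memory store, just to make the sketch runnable.
struct FakeStore(Mutex<HashMap<String, HashMap<String, bool>>>);

impl DurableStore for FakeStore {
    fn write(&self, user: &str, exercises: &HashMap<String, bool>) {
        self.0.lock().unwrap().insert(user.to_string(), exercises.clone());
    }
    fn read(&self, user: &str) -> HashMap<String, bool> {
        self.0.lock().unwrap().get(user).cloned().unwrap_or_default()
    }
}

fn main() {
    let store = FakeStore(Mutex::new(HashMap::new()));
    let mut conn = ConnectionHandler::on_connect("alice", &store);
    conn.record(&store, "exercise-1", true);
    drop(conn); // connection dies...

    // ...client reconnects to a "different instance": the cache comes back warm.
    let conn2 = ConnectionHandler::on_connect("alice", &store);
    assert_eq!(conn2.cache.get("exercise-1"), Some(&true));
}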
Sounds like they are still following the rules by not relying on the state hehe
I’m not surprised they had to. You can certainly run stateful things in Kubernetes, but the ease with which you can roll out new versions of containers means restarts are common. And even when running multiple replicas, restarts still kill open connections (terminationGracePeriodSeconds can help but still has limits).
Well, you’re right, we’re kind of in the middle: we rely on per-connection state, but we don’t rely on it existing for a long time after the connection. We wanted to go there, too, but sticky routing was unfortunately not feasible for us.
In the spirit of breaking rules and going pretty far with just one machine, I wonder if a single machine that locally ran PostgreSQL and used an in-memory cache directly in the monolith would be even better. Sure, take periodic off-site backups, but a single bare-metal box from a provider like OVH can have pretty good uptime.
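Concretely, the cache half of that can be embarrassingly simple. A minimal sketch (all names mine):

use std::collections::HashMap;
use std::sync::RwLock;

// Read-through cache living in the monolith's own memory.
struct UserCache {
    map: RwLock<HashMap<i64, String>>,
}

impl UserCache {
    fn get(&self, id: i64, load_from_db: impl Fn(i64) -> String) -> String {
        if let Some(name) = self.map.read().unwrap().get(&id) {
            return name.clone(); // hit: no query at all
        }
        let name = load_from_db(id); // miss: one query to the local PostgreSQL
        self.map.write().unwrap().insert(id, name.clone());
        name
    }
}

fn main() {
    let cache = UserCache { map: RwLock::new(HashMap::new()) };
    let fetch = |id: i64| format!("user-{id}"); // stand-in for a real query
    assert_eq!(cache.get(1, &fetch), cache.get(1, &fetch));
}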
A single machine can definitely take you very far. The biggest instance (pun intended) I know of doing this is Lichess, which runs on one rather beefy machine, but I am sure there are others that are bigger or equally/more well known.
Unfortunately, that particular bet wasn’t one I could make for us ;)
Very cool writeup, Ntietz. I think as more and more applications diverge from the old-school request -> DB-work -> render-output webapp model, we’ll find ourselves “breaking the rules” more often.
This type of architecture makes me happy – Erlang/Elixir programs can very often really capitalize on this pattern (see, for example, caching user-local data in a Phoenix Channel for the duration of a socket’s existence).
Elixir and the BEAM definitely make this easy to do and can be used to great effect. I’m really excited to see what comes about with Phoenix LiveView (and the similar projects in other languages) leveraging connection state and lots of backend processing.
Programmer time is more expensive than CPU cycles. Whining about it isn’t going to change anything, and spending more of the expensive thing to buy the cheap thing is silly.
The article makes a good counterpoint:
People migrate to faster programs because faster programs allow users to do more. Look at examples from the past: the original Python-based bittorrent client was quickly overtaken by the much faster uTorrent; Subversion lost its status as the premier VCS to Git in large part because every operation was so much faster in Git; the improved grep utility, ack, is written in Perl and waning in popularity to the faster silversurfer and ripgrep; the Electron-based editor Atom has been all but replaced by VSCode, also Electron-based, but which is faster; Chrome became the king of browsers largely because it was much faster than Firefox and Internet Explorer. The fastest option eventually wins. Would your project survive if a competitor came along and was ten times faster?
That fragment is not great, in my opinion. The svn-to-git change is about the whole architecture, not implementation speed; a lot of the speedup in that case comes from not going to the server for information. Early git was mainly shell and Perl too, so it doesn’t quite mesh with the Python example before it. Calling out Python for BitTorrent is not a great example either: it’s an IO-heavy app rather than a processing-heavy one.
VS Code has way more improvements over Atom, and more available man-hours. If it were about performance, Sublime or some other native graphical editor would have taken over from them.
I get the idea and I see what the author is aiming for, but those examples don’t support the post.
I was an enthusiastic user of BitTorrent when it was released. uTorrent was absolutely snappier and lighter than other clients, specifically the official Python GUI. It blew the competition out of the water because it was superior in its pragmatism. Perhaps Python vs. C is an oversimplification; the point would still hold even with two programs written in the same language.
The same applies to git. It feels snappy and reliable. Subversion and CVS, besides being slow and clunky, would gift you a corrupted repo every other Friday afternoon. Git pulverised this nonsense brutally quickly.
The point is about higher quality software built with better focus, making reasonable use of resources, resulting in superior experience for the user. Not so much about a language being better than others.
BitTorrent might seem IO-heavy these days, ironically because it has been optimised to death, but you are revising history if you think it’s not CPU- and memory-intensive; doing it in Python would be crushingly slow.
The point at the end is a good one though, you must agree:
Would your project survive if a competitor came along and was ten times faster?
I was talking about the actual process, not the specific implementation. You can make BitTorrent CPU-bound in any language with an inefficient implementation, but the problem itself is IO-bound, so any runtime should be able to get there (modulo runtime overhead).
This paragraph popped out at me as historically biased and lacking in citations or evidence. With a bit more context, the examples ring hollow.
I understand the author’s feelings, but they failed to substantiate their argument here.
This is true, but most programming is done for other employees, either of your company or another if you’re in commercial business software. These employees can’t shop around or (in most cases) switch, and your application only needs to be significantly better than whatever they’re doing now, in the eyes of the person writing the cheques.
I don’t like it, but I can’t see it changing much until all our tools and processes get shaken up.
But we shouldn’t ignore the users’ time. If the web app they use all day long takes 2-3 seconds to load every page, that piles up quickly.
While this is obviously a nuanced issue, personally I think this is the key insight in any of it. The whole “optimise for developer happiness/productivity, RAM is cheap, buy more RAM” line totally ignores it, let alone the “rockstar developer” spiel. Serving users’ purposes is what software is for. A very large number of developers lose track of this because of an understandable focus on their own frustrations; tools that make them more productive are obviously valuable, as is their having a less shitty time. But building a development ideology around that doesn’t make the users’ time go away. It just makes software worse for users.
Occasionally I ask end-users in stores, doctor’s offices, etc what they think of the software they’re using, and 99% of the time they say “it’s too slow and crashes too much.”
Yes, and they’re right to do so. But spending more programming time using our current toolset is unlikely to change that, as the pressures that selected for features and delivery time over artefact quality haven’t gone anywhere. We need to fix our tools.
In an early draft, I cut out a paragraph about what I am starting to call “trickle-down devenomics”; this idea that if we optimize for the developers, users will have better software. Just like trickle-down economics, it’s just snake oil.
Alternatively, you could make it not political.
Developers use tools and see beauty differently from normal people. Musicians see music differently, architects see buildings differently, and interior designers see rooms differently. That’s OK, but it means you need software people to talk to non-software people to figure out what they actually need.
Removed because I forgot to reload, and in the meantime multiple others had already made the same argument I did.
I don’t buy this argument. In some (many?) cases, sure. But once you’re operating at any reasonable scale you’re spending a lot of money on compute resources. At that stage even a modest performance increase can save a lot of money. But if you closed the door on those improvements at the beginning by not thinking about performance at all, then you’re kinda out of luck.
Not to mention the environmental cost of excessive computing resources.
It’s not fair to characterize the author as “whining about” performance issues. They made a reasonable and nuanced argument.
Yes. This is true so long as you are the only option. Once there is a faster option, the faster option wins.
Why?
Not for victories in CPU time. The only thing more scarce and expensive than programmer time is… user time. Minimize user time and pin CPU usage at 100%, and nobody will care until it causes user discomfort or loss of user time elsewhere.
Companies with slow intranets cause employees to become annoyed, and cause people to leave at some rate greater than zero.
A server costs a few thousand dollars on the high end. A small program costs a few tens of thousands to build, maintain, and operate. Over its life, using that program can cost hundreds of thousands in management, engineering, sales, marketing, HR, quality, training, and compliance salaries.
Finishing up the first half of Practical TLA+ and trying to work a couple of examples on my own, mostly. On the tech side. Lots of personal things keeping me busy, too.
Disclaimer: It’s been around 10 years since I was involved in creating a social network.
If I remember correctly, we had a few discussions about it, because social networks are kind of a poster example for graphs, but one point was sharding. There were known-to-work solutions if you were using relational DBs, but we didn’t know of any proven thing for the graph databases around at the time. Also, we weren’t a startup, so no “revolutionizing the world by inventing the best new graph DB”; in the end a lot of it was trying not to spend the innovation budget on something like this. Boring is better, since we were supposed to hand the thing off to the company we were building it for, so operationally it needed to be easy to run. Of course “easy” is relative, but MySQL (or Postgres?) was a known quantity. Oh, and not to forget: in the end you can model the graph relations quite easily with an RDBMS, so why bother?
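For illustration, the edges-as-rows version. This is a toy sketch using SQLite via the rusqlite crate, purely so it’s self-contained (we were on MySQL/Postgres):

use rusqlite::{Connection, Result};

fn main() -> Result<()> {
    let conn = Connection::open_in_memory()?;
    // The whole "graph" is one edge table.
    conn.execute_batch(
        "CREATE TABLE follows (follower INTEGER, followee INTEGER,
                               PRIMARY KEY (follower, followee));
         INSERT INTO follows VALUES (1, 2), (2, 3), (2, 4);",
    )?;
    // Friends-of-friends is just a self-join on the edge table.
    let mut stmt = conn.prepare(
        "SELECT DISTINCT f2.followee
         FROM follows f1 JOIN follows f2 ON f1.followee = f2.follower
         WHERE f1.follower = ?1",
    )?;
    let fof: Vec<i64> = stmt
        .query_map([1], |row| row.get(0))?
        .collect::<Result<_>>()?;
    println!("{fof:?}"); // [3, 4]
    Ok(())
}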
Adding on to this, to the best of my knowledge there are still no great ways to partition graphs across nodes, and it’s been shown to be a hard problem (NP-hard or NP-complete, depending on some factors). Intuitively this should make sense: social graphs (for example) have very small diameters, and you’re likely to have a lot of edges crossing between different compute nodes.
That said, I don’t think that distributing a graph DB is a big deal. You really don’t need it: (almost) any graph you would work with will fit in memory (it might be a big machine but it’ll still fit!) and replication is an easier problem.
Disclosure: I worked for TigerGraph. Not sure disclosure is even necessary—I left in 2015. But I have a financial interest in graph databases.
social networks are kind of a poster example for graphs, but one point was sharding.
Yup. Even with a relational database, it’s hard. To exaggerate only slightly, this is one of the major reasons LiveJournal (the first real social network) failed. Performance was killing them, and they had to throttle growth via invite codes for a few years while they worked out how to do clustering with MySQL and cache as much as possible in RAM. This was c. 2001, before any of this was common, and some of the tools BradFitz invented, like memcached, are still in use today. The end result was they couldn’t grow fast enough, and they didn’t have the resources to evolve the UX or feature set. By the time Facebook caught on, they were doomed (sob!)
It’s a bit unfair to blame everything on performance, though, because I remember LiveJournal in its heyday, and there were a few reasons I never signed up. I hated the design: it didn’t look like a social network, it looked like a collection of blogs to me (without support for bringing your own domain). Also, 90% of the LJ pages I landed on from random links were just fanfic. I never noticed anything slow, but I can’t exactly tell you if this was more 2001 or more 2007.
Yeah, as I said, the constant firefighting to keep the servers from overloading meant they couldn’t evolve the UI and feature set. Another big reason was that, after Six Apart (the Movable Type company) bought them, they made the fatal mistake of building an all-new service, which looked very nice but flopped, instead of improving LJ.
Sure, and maybe I misunderstood the point, but I wanted to talk about “social networks” as they are commonly understood by the general populace, like Facebook, not any community of people online. Might be narrow, might be wrong, but the frills were not the point; the point was that “era” of consolidation toward single closed mass networks, not anything open or small.
Man, how often I wished, when doing database query generation: “please $DB, just let me hand you queries in your own internal format instead of making me write SQL”.
So I agree with the author’s criticism, but, as mentioned at the end of the article… what to do with all the knowledge we have now?
It seems that many previous alternatives were not successful.
So what can “we” do, to improve the state of the art?
In my opinion: demonstrating and specifying a practical, well-designed¹ language that various databases could implement on their own as an alternative to SQL.
¹ Not going into that here.
A Datalog variant. I had a lot of fun playing with differential-datalog, and there’s Logica, which compiles Datalog to SQL for a variety of SQL dialects.
Relational data maps pretty well to most business domains. NoSQL and ORMs throw out the baby with the bathwater for different reasons (turfing the entire model with NoSQL, trying to force two different views of modelling the domain to kiss with ORMs). Anything that makes a join hard isn’t a good idea when an RDBMS is involved.
I think what might be interesting is instead of contorting the RDBMS model to work with OO languages like ORMs do, do the reverse: a relational programming language. I don’t know what that could look like though.
Relational data maps pretty well to most business domains. NoSQL and ORMs throw out the baby with the bathwater for different reasons (turfing the entire model with NoSQL, trying to force two different views of modelling the domain to kiss with ORMs). Anything that makes a join hard isn’t a good idea when an RDBMS is involved.
Agreed with the conclusion and I have nothing good to say about most NoSQL systems other than that rescuing companies from them is a lucrative career, but I think this criticism of ORMs is over-broad.
A good ORM will take the scut-work out of database queries in a clean, standardized-across-codebases way without at all getting in your way when accessing deep database features, doing whatever joins you want, etc. I’d hold up modern Rails ActiveRecord (without getting into the weeds on Arel) as a good ORM which automates the pointless work while staying out of your way when you want to do something more complicated.
A bad ORM will definitely try to “hide” the database from you in ways that just make everything way too complicated the second you want to do something as simple as specify a specific type of join. Django’s shockingly awful “QuerySet” ORM definitely falls in this camp, as I’ve recently had the misfortune of trying to make it do fairly simple things.
I’m very surprised to see ActiveRecord used as an example of something which stays out of your way. The amount of time I have spent fighting to get it to generate the SQL I wanted is why I never use it unless I’m being paid a lot to do so.
Really? It’s extremely easy to drop to raw SQL, and to intermix that with generated statements – and I’ve done a lot of really custom heavy lifting with it over the years. Admittedly this may not be well documented and I may just be taking advantage of a lot of deep knowledge of the framework, here.
The contrast is pretty stark to me compared to something like Django, whose devs steadfastly refuse to allow you to specify joins. While Django offers a raw SQL escape hatch, it has a different intermediate result type for raw SQL queries (RawQuerySet vs QuerySet) with different methods, meaning details of how you formed a query (raw vs ORM API) leak into all consuming layers, and you can’t switch one for the other at the data layer without breaking everything upstream. (Hilariously, the accepted community “solution” to this seems to be to write your raw query, then wrap an ORM API call around it that generates a “select * from (raw query)”??)
ActiveRecord has none of these issues in my experience: joins can be manually specified, raw clauses inserted, and raw SQL is transparent and intermixable with ORM statements with no impedance mismatch. Even aggregation/deaggregation approaches like unions, unnest(), etc. that break the table-to-class and column-to-property assumptions can still be made to work cleanly. It’s really night and day.
Not the commenter you’re asking, but they’re both tools that reduce the initial amount of learning at the cost of abandoning features that make complexity and maintainability easier to handle.
I’m not sure that that’s true, though. ORMs make a lot of domain logic easier to maintain—it’s not about reducing initial learning, it’s about shifting where you deal with complexity (is it complexity in your domain or in scaling or ???). Similar with NoSQL—it’s not a monolithic thing at all and most of those NoSQL databases require similar upfront learnings (document DBs, graph DBs, etc. all require significant upfront learning to utilize well). Again, it’s a trade off of what supports your use case well.
I’m just not sure what the GP meant by “junior developer ideas” (it feels disparaging of these, and those who use them, but I won’t jump to conclusions). They also are by no stretch “worse than using SQL”. They are sometimes worse and sometimes better. Tradeoffs.
I agree with you on the tradeoffs. I’m not sure I agree on the domain logic thing. In my experience, ORMs make things easier until they don’t, in part because you’ve baked your database schema into your code. Sometimes directly generating queries allows changes to happen in the schema without the program needing to change its data model immediately.
So, where does that take us? Well, we want to do engineering to solve problems. I think that means, practically speaking, we need to focus on the specification and verification steps
Respectfully, I disagree. After thrashing around in this area for many years, I’m convinced that the code doesn’t matter, although coding is the thing most of us love doing.
Tests are the things that are most important, whether implied or explicit. Most tests, of course, are sub rosa; they seem too trivial for anybody to ever write down. (Until they fail, of course)
What we need to work on are truly modular and composable tests, something that scales out. My formal methods should revolve around those. I understand that this can be construed as saying the same thing, but there are several subtle differences between the two concepts.
I think, at least directionally, we agree: the focus is on showing that your code does what you want it to do, and the code itself doesn’t really matter.
Where I found myself disagreeing was with the initial premise … “The job of a software engineer is not to produce code, but to solve problems”.
While that is the job for some, a large proportion have the job of modeling a system in code, and the two are different. A problem that culminates in an algorithm is certainly a candidate for a specification that can be formalized. Is that the case for a complex problem domain that is a model of a system, real or imagined?
That new innovative but complex international payroll system that needs to be built faces an entirely different set of issues. How are functional requirements and domain constraints determined and modeled in code? How are non-functional requirements determined and met within project/product and organizational constraints?
Perhaps at question is the simplistic viewpoint that all software development is just the transformation of data. I disagree with such a contention. While inherently accurate, it’s similar to saying that the materials used to construct furniture are just atoms.
How to model complex systems in code in any formal and predictable way remains out of reach, which is perhaps why applying the terms “formal methods” and “engineering” to software development seems inapplicable to many.
Even cutting-edge domain problems still use a lot of basic infrastructural code to run. Does the international payroll system use a rules engine? Is it running batch jobs as background tasks, or triggering anything reactively? How are you distinguishing employee benefits from benefit classes? All of those are places where formal methods help.
More fundamentally, it’s possible for stated requirements to not cover a situation, or give you contradictory results on how to handle a weird edge case. Shouldn’t it be possible to find those out before you’ve written all the code?
Sure there are parts where formal methods might be applied. But I contend they simply are not “the” answer to what is fundamentally a different problem. As is the case all too often in our profession, I see the original author as generalizing all software development towards a particular “silver bullet”.
Yeah, I definitely don’t think formal methods are “the” answer. I think there are more people who could use them that don’t than people who don’t need them but think they do, but it’s ludicrous to collapse all of software engineering down to a specific technique.
I didn’t mean to (and I don’t think I did) generalize that all development has to lean toward formal methods, nor is it a silver bullet. But I think software engineering as a field needs it, and there are a lot of bits of critical code that need it, and as a trend I think we’re pushing toward making it easier to use and using it in more places.
I do agree. Formal methods can certainly be of benefit.
My suggestion is to better define where, for what types of software and under what circumstances.
That is what would really benefit the developer community in my opinion.
As an aside and by way of apology: I was updating my answer as you replied. It’s my weird (?) way of writing something and then changing it for the first few minutes after first submitting it. HN handles this well by giving you 10 minutes in which to change a reply before it is published. Not sure if Lobsters does the same.
Have you looked at Ada/SPARK? Since SPARK is an Ada subset and both can exist in the same project, you write SPARK when you need verification and Ada in areas where you don’t need it, or don’t need it yet. There are even proven libraries available in the Ada/SPARK package manager, Alire.
It’s on my list of things to look into! TLA+ is earlier on my list but Ada and SPARK are near the top, too.
Mostly hanging out with family, but I’m also planning on writing at least one blog post (checked that one off the list), hacking at my side project, and maaaaaybe starting to learn TLA+.
I wonder if this model could be turned on its head to score each region of code by its expected bugginess.
“danger (or congrats): no one in the history of time has ever written anything like this before”
Although, I suppose the output might be less than useful: “I have a vague feeling that this might be wrong but I can’t explain why”.
That could be incredibly useful as a code review tool! Kind of gives you a heatmap of which spots to focus most attention on as a code reviewer. I want it yesterday.
Hm; OTOH, if a bug is common enough to have a major presence in the input corpus, I see how it could result in a false positive “green” mark for a faulty fragment of code… super interesting questions, for sure :) maybe it should only be used for “red” coloring, the rest being left as “unrated”.
Huh, is that a spin-off from Oxide’s experience building server cases/racks/etc? That would be like, business level yak shaving :)
I’ve always used wemux for this (it is a pretty thin wrapper around changing socket permissions and tmux new-session -t); it makes all terminal work multi-player. Changing editors seems like solving the problem at a more difficult layer. That said, if I can talk people into it, kakoune has a multi-session mode.
I read this expecting to say “use GNU screen”, but a fair point is letting the different participants scroll the file independently, and a shared terminal wouldn’t make that work.
Tmux can have multiple independent sessions per window set (note I said new-session -t, not attach) so you can actually do this. I don’t think screen can.
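Concretely, it looks like this (from memory, so double-check the flags):

# person A starts the shared session
tmux new-session -s pair

# person B creates a second session in the same window group:
# same windows, but an independent current window and scroll position
tmux new-session -t pair -s pair-b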
How do you handle NAT/firewall traversal? Or do you just have the repo live on a cloud VM and both ssh into it? I suppose VPN software like Tailscale also works actually.
Through a VPN with a guest account on my dev box (which is not my desktop). There needs to be some kind of connection.
That’s a great option to have available, totally.
Changing editors was for me lower friction than using any sort of tmux tooling because, though I use tmux and vim, none of my collaborators do, so it would be forcing them to switch editors. I think the switch from vim to VS Code is easier than the other way around, so this made for a smoother pairing experience (for me).