The Instagram crew deserve credit for this, and it’s a good example of how understanding things under the hood has real-life benefits.
The sad thing is that http://www.rubyenterpriseedition.com did something similar for Ruby years ago, but they got stuck at Ruby 1.8 and neither Ruby nor Python have followed up on the ideas. I would be happy to be proven wrong.
EDIT: Proven wrong! See about Ruby 2 in comment.
The whole “run a final gc” is amazing. The OS provides your best gc. Use it. Don’t close files. Don’t free memory (and walk all over possibly-paged-out memory figuring out what to free), just die already. I will try to dig up an article about this, from the 90s. Possibly it was djb who wrote it.
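For anyone who wants to see "just die already" in Python terms, here is a minimal sketch. It assumes the goal is only to skip interpreter teardown; the child script and the `run_child` helper are made up for illustration. `os._exit` ends the process without running `atexit` handlers or the interpreter's final cleanup, so the OS reclaims everything at once.

```python
import subprocess
import sys

# Hypothetical child program: registers a cleanup hook, then exits abruptly.
CHILD = """
import atexit, os
atexit.register(lambda: print("cleanup ran"))
print("working", flush=True)
os._exit(0)  # skips atexit handlers and interpreter finalization
"""

def run_child():
    """Run the child in a fresh interpreter and return what it printed."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()

if __name__ == "__main__":
    # The cleanup hook never runs, so only the first line is printed.
    print(run_child())
```

The flip side is that anything the cleanup would have flushed (buffered files, for instance) has to be flushed explicitly before exiting.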
Maybe the “fork after bootstrap, then die-as-gc after a while” pattern has a future in a less bespoke way.
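A rough sketch of that pattern, assuming a POSIX system with `os.fork`. `bootstrap`, `run_worker` and `spawn_workers` are hypothetical names standing in for the real application startup and request loop; this is not Instagram's code.

```python
import gc
import os

def bootstrap():
    # Hypothetical stand-in for importing the app and warming caches.
    return {"config": list(range(1000))}

def run_worker(shared, worker_id):
    # Reads state that was built before the fork.
    return sum(shared["config"]) + worker_id

def spawn_workers(n):
    """Fork n workers after bootstrap; return their raw wait statuses."""
    shared = bootstrap()
    was_enabled = gc.isenabled()
    gc.disable()  # children inherit this: no cycle collector walking shared pages
    pids = []
    for worker_id in range(n):
        pid = os.fork()
        if pid == 0:
            # Child: do the work, then die without any teardown.
            code = 0
            try:
                run_worker(shared, worker_id)
            except BaseException:
                code = 1
            finally:
                os._exit(code)
        pids.append(pid)
    statuses = [os.waitpid(pid, 0)[1] for pid in pids]
    if was_enabled:
        gc.enable()
    return statuses

if __name__ == "__main__":
    print(spawn_workers(3))
```

Note that refcount updates still dirty pages when a worker merely reads a shared object, so this only removes the collector's share of the copying. Python 3.7 later added `gc.freeze()`, which is meant to be called in the parent just before forking.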
An interesting further inquiry along these lines would be to create a Python image to start from, instead of running the bootstrap every time. Just mmap that initial thing in there and share memory between all the processes on one machine (or even between virtual machines).
Ruby 2.0 implemented out-of-line bitmap marking, which is much more CoW friendly. Phusion didn’t port their REE patches, because they were no longer needed.
BTW, macOS codified the “just die already” philosophy with an application flag that says “when the system is shutting down, just kill me”. (See Apple’s doc on Sudden Termination.)
Nice and nice!
So, Python has its work cut out for it, then.
This is in fact part of Erlang’s (and the actor model’s) secret sauce.
Actors have their own independently GCed heap. Because of their relatively small size and simplicity, the GC is simple and fast and can be hidden at actor context switches. And, when an actor dies, GC of that actor’s memory is as simple as reclaiming its entire heap. The result is that Erlang programs tend to have very low latency and operate essentially pauselessly even under high load.
Ah, yes, great example!
Don’t free memory (and walk all over possibly-paged-out memory figuring out what to free), just die already. I will try to dig up an article about this, from the 90s. Possibly it was djb who wrote it.
Not sure if this is the one you’re thinking of - So what’s wrong with 1975 programming? by PHK.
[Comment removed by author]
I don’t see what point you are disagreeing with. The article is about using cache-oblivious techniques rather than micro-managing memory when you don’t know the case you’re optimizing for, you might not be the only tenant, and you might not even know the parameters of the entire system.
It is still the case that the system probably knows better than you if you or some other process could be exiled to tape, and if it does a bad job at that, the system is what should be fixed.
I love that article and recommend it to anyone, but it’s not what I was thinking of.
I think the ipfs people read it too, because ipfs daemon on my machine maps 500 GB (!) VM while keeping only 500 MB resident. No idea what they are doing with it, I don’t even have that much space on disk. :-)
I looked around a bit for any posts by djb, pape or fefe on resource management, cleanup, etc, but still didn’t find what I’m thinking of.
[Comment from banned user removed]
I voted up. But it’s also interesting to consider whether Instagram would have been as successful if they had chosen another language. Did Python grant them the ability to hire the exact right people at the time? Did it give them that extra productivity boost? But, now that they are successful, and can presumably grow the team out considerably, why haven’t they invested in other langs? Maybe they have? Maybe it’s truly not worth it? I mean, they are using Django and Postgres at their scale. That’s actually pretty damn impressive when you think about it.
they are using Django and Postgres at their scale. That’s actually pretty damn impressive when you think about it.
I feel like this is kind of like saying Facebook is using PHP and MySQL: technically true, but they’ve spent huge amounts of resources to heavily customise them to the point of probably being unrecognisable.
Well, it looks like it’s not just Postgres anymore (shows how up to date I am!), but as of 2015 a slew of other things. I actually don’t think I’d be surprised if they used pretty stock Django (and maybe avoided or replaced some parts of it) for serving web requests. I’d expect there to be custom, hyper-optimized stuff for things like search, recommendations and other stuff like that, though. But, again, what do I know?
Does anyone know if they tried PyPy? Is cache performance or total memory size their real constraint?
PyPy needs to use the same memory structures as CPython. I would expect that to mean that it also needs to touch memory during collection, but I didn’t quite understand from the article why CPython did that. Because it added the objects to Python lists and had to increase their refcount? Sounds like something both Pythons should be able to avoid.
CPython’s garbage collector is reference-count based, which means that when you free a container (like a list), you then have to go through and decrement the reference count of everything that was contained (and if a count hits zero, free that object and decrement the reference counts of everything it held onto). This is particularly egregious in Python because the language is built on hashmaps: every object, every class, every module has (or sometimes is) a hashmap, so there’s a lot of ref-counting that needs to be done.
This is also why CPython has a GIL: when any part of the program can reach into any other part at any time and increment or decrement its reference count, then either you need a lock around absolutely everything, or you just say “to hell with it” and make one big lock for the whole thing.
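To make the bookkeeping visible, here is a small CPython-only sketch using `sys.getrefcount`. The `refcount_deltas` helper is made up for illustration, and it reports differences rather than absolute counts because the absolute numbers vary between interpreter versions.

```python
import sys

def refcount_deltas():
    """Show how putting an object in a list changes its reference count."""
    item = object()
    base = sys.getrefcount(item)

    container = [item, item]          # two new references held by the list
    in_list = sys.getrefcount(item) - base

    del container                     # freeing the list decrements each slot
    after = sys.getrefcount(item) - base
    return in_list, after

if __name__ == "__main__":
    print(refcount_deltas())
```

Each of those increments and decrements is a write into the object's header, which is why merely sharing or dropping references can dirty copy-on-write pages after a fork.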
PyPy does not have a GIL, and it does not use reference-count-based garbage collection, so on the face of it, it seems like it would be a lot more CoW-friendly. The docs also say that PyPy does not do a gc.collect() at shutdown, which is another change mentioned in the article.
PyPy does not have a GIL
According to their FAQ, they still have a GIL. They were looking into replacing the GIL with STM a while ago, but I don’t recall that ever landing.
Ah, my mistake. I knew they did not use reference-count-based garbage collection, and I knew that was the big sticking point for the CPython GIL, so I leapt to a conclusion.
The GIL is necessary for compatibility with CPython extensions. Other implementations have removed their GIL because they live in other contexts, but PyPy wants to retain that compatibility and be a CPython drop-in replacement.
I’m surprised they were able to remove the reference-counting, but I don’t know enough about Python extensions to know how. Maybe they indicate ownership of objects in some other way? Or does PyPy not use refcounting itself, but supports it alongside gc, to support extensions? Or do some extensions just not work with PyPy even though most do?
Maybe I missed something but I was hoping/expecting to see some info on reducing the need for GC by eliminating cycles. Python can theoretically run without GC and only rely on refcounts, although in practice many libraries create cycles.
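A quick sketch of why cycles are the sticking point, assuming CPython's refcounting semantics. `Node` and the two helper functions are invented for the demonstration: with the cycle collector off, an acyclic object is freed the moment its last reference goes away, while a two-object cycle lingers until `gc.collect()` runs.

```python
import gc
import weakref

class Node:
    def __init__(self):
        self.other = None

def acyclic_freed_immediately():
    node = Node()
    probe = weakref.ref(node)   # weak reference does not keep node alive
    del node
    return probe() is None

def cycle_needs_collector():
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        a, b = Node(), Node()
        a.other, b.other = b, a          # a <-> b reference cycle
        probe = weakref.ref(a)
        del a, b
        alive_after_del = probe() is not None
        gc.collect()                     # explicit collection still works
        alive_after_collect = probe() is not None
        return alive_after_del, alive_after_collect
    finally:
        if was_enabled:
            gc.enable()

if __name__ == "__main__":
    print(acyclic_freed_immediately(), cycle_needs_collector())
```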
Umm, yeah, was it early Perl5 which didn’t collect reference cycles? I think I remember writing code to break them manually as part of an object destructor. It was a little annoying to have to do it, but not hard and improved performance considerably.
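The same manual cycle-breaking can be sketched in Python, again assuming CPython semantics. `Parent`, `Child` and the `close` method are hypothetical; the point is only that once the back-pointers are cleared, plain refcounting frees everything with the collector switched off.

```python
import gc
import weakref

class Child:
    def __init__(self, parent):
        self.parent = parent             # back-pointer creates a cycle
        parent.children.append(self)

class Parent:
    def __init__(self):
        self.children = []

    def close(self):
        # Break the cycles by hand before dropping the last reference.
        for child in self.children:
            child.parent = None
        self.children.clear()

def freed_after_manual_break():
    was_enabled = gc.isenabled()
    gc.disable()                         # rely on reference counts alone
    try:
        parent = Parent()
        child = Child(parent)
        probe = weakref.ref(parent)
        parent.close()
        del parent, child
        return probe() is None
    finally:
        if was_enabled:
            gc.enable()

if __name__ == "__main__":
    print(freed_after_manual_break())
```

Making the back-pointer a `weakref.ref` in the first place is another way to get the same effect without an explicit `close` call.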
There is a lot of ongoing research in reference counting, which has produced reference counting implementations that are comparable to tracing GCs. They tend to do things like delay decrements or do them all at once, or other tricks like that. See this paper.
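As a toy illustration of the "delay decrements" idea (not the algorithm from any particular paper, and nothing like how CPython works internally), the hypothetical `ToyHeap` below records decrements in a buffer and applies them in one batch, so freeing a container costs nothing until the next flush.

```python
class ToyHeap:
    """Toy model of reference counting with buffered decrements."""

    def __init__(self):
        self.counts = {}     # object name -> reference count
        self.children = {}   # object name -> names it references
        self.pending = []    # decrements recorded but not yet applied
        self.freed = []      # names in the order they were reclaimed

    def alloc(self, name, children=()):
        self.counts[name] = 0
        self.children[name] = list(children)
        for child in children:
            self.counts[child] += 1

    def incref(self, name):
        self.counts[name] += 1

    def decref(self, name):
        # Cheap on the hot path: just remember the decrement.
        self.pending.append(name)

    def flush(self):
        # Apply all buffered decrements, cascading into freed objects.
        while self.pending:
            name = self.pending.pop()
            self.counts[name] -= 1
            if self.counts[name] == 0:
                self.freed.append(name)
                self.pending.extend(self.children.pop(name))
                del self.counts[name]

if __name__ == "__main__":
    heap = ToyHeap()
    heap.alloc("leaf")
    heap.alloc("list", ["leaf"])
    heap.incref("list")      # a root reference
    heap.decref("list")      # nothing is freed yet
    heap.flush()             # now the list and its leaf are reclaimed
    print(heap.freed)
```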
If CPython holds on to ref counting, it might be interesting to see if some of that work would make a difference.