I’d expect that part of the kernel slowdown is simply that, as the process RSS grows, the kernel must do more and more work to update page table permissions and virtual memory areas (VMAs) as it marks everything copy-on-write. The effect is probably stronger if more VMAs are added rather than existing ones just being grown.
That makes sense to me; I just didn’t think that would account for that much time! I would be curious to see whether using mmap versus moving the program break affects things differently.
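For anyone who wants to poke at this, here is a rough sketch of how the RSS effect could be measured. It is an assumption-laden illustration, not the benchmark from the post: `time_fork` is a made-up helper, it only covers the mmap case (Python gives no direct control over the program break), and the numbers include the interpreter's own fork bookkeeping on top of the kernel's work.

```python
import mmap
import os
import time

PAGE = mmap.PAGESIZE

def time_fork(extra_bytes, rounds=5):
    """Best-of-`rounds` time (seconds) for os.fork() to return in the parent.

    Hypothetical helper: maps `extra_bytes` of anonymous private memory first
    and touches every page so the mapping actually counts toward RSS.
    """
    region = None
    if extra_bytes:
        # MAP_PRIVATE so the pages are subject to copy-on-write at fork time.
        region = mmap.mmap(-1, extra_bytes,
                           flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
        for off in range(0, extra_bytes, PAGE):
            region[off] = 1
    best = float("inf")
    for _ in range(rounds):
        start = time.perf_counter()
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child exits immediately, skipping cleanup handlers
        elapsed = time.perf_counter() - start
        os.waitpid(pid, 0)  # reap the child so no zombies pile up
        best = min(best, elapsed)
    if region is not None:
        region.close()
    return best

if __name__ == "__main__":
    for mib in (0, 16, 64):
        print(f"{mib:3d} MiB extra: {time_fork(mib << 20) * 1e6:.0f} us")
```

If the theory above holds, the reported time should climb with the amount of touched memory; splitting the same total across many separate mappings would be one way to test the VMA-count part.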
I do know that Linux has consistently forked dynamically linked programs measurably slower than statically linked ones. It’s possible that dynamic linking causes more work at fork() time (perhaps both in the kernel and in libc), but I think the big difference is the number of memory pages and VMAs in statically versus dynamically linked processes (since dynamically linked ones have VMAs for all their mmap’d shared libraries).
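A quick way to see the VMA difference on Linux is to count the lines in /proc/<pid>/maps, since each line is one mapping. The sketch below assumes a Linux /proc filesystem; both function names are made up for illustration, and the ".so" check is only a heuristic for spotting shared-library mappings.

```python
def count_vmas(pid="self"):
    """Count mappings (one per line) in /proc/<pid>/maps. Linux only."""
    with open(f"/proc/{pid}/maps") as maps:
        return sum(1 for _ in maps)

def count_library_vmas(pid="self"):
    """Count mappings whose backing path contains '.so' (heuristic)."""
    count = 0
    with open(f"/proc/{pid}/maps") as maps:
        for line in maps:
            # Fields: address, perms, offset, dev, inode, optional pathname.
            fields = line.split(None, 5)
            if len(fields) == 6 and ".so" in fields[5]:
                count += 1
    return count

if __name__ == "__main__":
    print("total VMAs:", count_vmas())
    print("shared-library VMAs:", count_library_vmas())
```

Pointing both functions at the PID of a statically linked and a dynamically linked build of the same program would show how many extra mappings the shared libraries contribute.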
Oh, that’s really interesting! That would be another interesting dimension to cover. Now I can’t wait to finish this series of posts to work on the fork() one =)
Another wrench to throw into the mix: the libc being used will cause different behavior as well.
You may want to compare glibc versus musl versus uClibc, etc.
You may also want to compare fork() with different locales present, as that can add more fun UTF-8-type parsing.
Both spectacular ideas! I’ll add them to my notes!