1. 8

I’m not that good with assembly. Could someone explain to me what is happening on this line:


I know that it prepares the location of the stack for clone. I think that it adds STACK_SIZE to get the end of the allocated stack, but why is the - 8 needed?


  2. 3

    I don’t know Linux threading at this level, but I’ve seen similar things done in assembly so a new thread/process will ret down a different code path.

    Poking around, the blog post linked from the readme says:

    So we need to add STACK_SIZE to rax to get the high end. This is done with the lea instruction: load effective address. Despite the brackets, it doesn’t actually read memory at that address, but instead stores the address in the destination register (rsi). I’ve moved it back by 8 bytes because I’m going to place the thread function pointer at the “top” of the new stack in the next instruction. You’ll see why in a moment.
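For anyone who reads C more easily than assembly, here is a rough sketch of the same address arithmetic. It is not the blog post's code: `seed_stack`, `example_thread`, and the 64 KiB value of `STACK_SIZE` are made up for illustration, and it assumes 8-byte function pointers as on x86-64.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed value for illustration; the real constant is not shown in this thread. */
#define STACK_SIZE (64 * 1024)

typedef void (*thread_fn)(void);

/* The "- 8" only makes sense if a function pointer fills one 8-byte slot. */
_Static_assert(sizeof(thread_fn) == 8, "sketch assumes 8-byte function pointers");

/* Hypothetical thread function, used only so there is something to store. */
static void example_thread(void) {}

/* Mirrors the lea plus the following store: take the low end of the
   allocation (what rax holds), add STACK_SIZE to reach the high end,
   step back 8 bytes, and write the thread function into that last slot.
   The returned pointer is what would end up in rsi. */
static void *seed_stack(void *low, thread_fn fn)
{
    assert(low != NULL);
    unsigned char *high = (unsigned char *)low + STACK_SIZE; /* one past the end */
    thread_fn *slot = (thread_fn *)(high - 8);               /* last 8-byte slot */
    *slot = fn;                                              /* acts as the return address */
    return slot;
}
```

Without the `- 8`, the pointer would sit one past the end of the allocation, so there would be no room inside the stack to hold the function pointer.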


    A new thread will be created and the syscall will return in each of the two threads at the same instruction, exactly like fork(). All registers will be identical between the threads, except for rax, which will be 0 in the new thread, and rsp which has the same value as rsi in the new thread (the pointer to the new stack).

    Now here’s the really cool part, and the reason branching isn’t needed. There’s no reason to check rax to determine if we are the original thread (in which case we return to the caller) or if we’re the new thread (in which case we jump to the thread function). Remember how we seeded the new stack with the thread function? When the new thread returns (ret), it will jump to the thread function with a completely empty stack. The original thread, using the original stack, will return to the caller.

    Hey, I guessed right, yay.

    1. 1

      Hmm – sadly doesn’t appear to account for thread-local storage (“the other TLS”), which I was hoping to see.