1. 59

  2. 9

    She should be able to use dtrace to investigate deeper.

    1. 13

      Where would you start with using dtrace to investigate this? I already know the responsible system call – is the idea that I could use dtrace somehow to trace what resource is under contention inside the kernel?

      I’ve basically never used dtrace except running dtruss occasionally to look at system calls so it’s pretty unclear to me where to start.

      1. 12

        I’m not a dtrace master, or even a low-level amateur, but this might be a starting point:

        sudo dtrace -n ':::/pid == $target/{@[stack()] = count();} tick-5s {exit(0);}' -p PID_OF_YOUR_STUCK_PROGRAM

        That will record the kernel stack every time any probe fires for the process and count each distinct stack. After 5 seconds it exits and prints the kernel stack traces in ascending order of count. You can also use -c to start the program under dtrace instead of attaching to an already-running one with -p.

        That’ll probably give you way too much stuff, though. To get stack traces only when it makes a syscall, use:

        sudo dtrace -n 'syscall:::/pid == $target/{@[stack()] = count();} tick-5s {exit(0);}' -p PID_OF_YOUR_STUCK_PROGRAM
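        For readability, the same idea can be written out as a standalone D script. This is a sketch rather than a tested recipe, with two changes from the one-liner: it narrows `syscall:::` to `syscall:::entry`, so each call is counted once instead of on both entry and return, and it adds `probefunc` to the aggregation key so each stack is labeled with the syscall that produced it. The file name `syscall-stacks.d` is just an example.

        ```d
        /* syscall-stacks.d: count kernel stacks per syscall for one process.
         * Run with: sudo dtrace -s syscall-stacks.d -p PID_OF_YOUR_STUCK_PROGRAM
         */

        syscall:::entry
        /pid == $target/
        {
            /* key the count by syscall name and kernel stack */
            @[probefunc, stack()] = count();
        }

        tick-5s
        {
            /* stop after 5 seconds; the aggregation is printed on exit */
            exit(0);
        }
        ```

        Note that a process already blocked inside a syscall when tracing starts won’t hit syscall:::entry again, so this mostly shows what it does between blocking calls.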

        Maybe what you could do is use dtruss to figure out what it’s stuck in (probably a syscall?) and then use dtrace to see what’s going on there. For example, to see which syscalls sleep 10 makes, I ran (on FreeBSD):

        sudo dtrace -n 'syscall:::/pid == $target/{}' -c "sleep 10"

        I got a bunch of output, and it clearly sat for 10 seconds on:

          2  80973                  nanosleep:entry

        Then, to see exactly what goes on in the kernel for this process between nanosleep:entry and nanosleep:return, I did:

        sudo dtrace -n 'BEGIN {trc = 0} syscall::nanosleep:entry /pid == $target/ {trc = 1} syscall::nanosleep:return /pid == $target/{trc = 0;} ::::/pid == $target && trc == 1/{@[stack()] = count();}' -c "sleep 10"

        It’s kind of hard to read, but if you pull the program out of the quotes you can see it uses a variable called trc: for this pid, entering nanosleep sets trc to 1 and returning from nanosleep sets it back to 0. Then, for any probe that fires for this pid while trc is 1, it records the kernel stack. That produced a lot of output.
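        Pulled out of the quotes and laid out as a script file, the one-liner looks roughly like the sketch below. One deliberate change: it uses a thread-local variable (`self->trc`) instead of the global `trc`, so that in a multithreaded program one thread leaving nanosleep doesn’t switch recording off for another; for a single-threaded `sleep 10` it should behave the same. The catch-all probe is written `:::` as in the first one-liner, and the file name is arbitrary. On macOS, sleep may not go through the nanosleep syscall at all, so check the dtruss output for the right probe name first.

        ```d
        /* nanosleep-stacks.d: kernel stacks seen while the target is inside nanosleep.
         * Run with: sudo dtrace -s nanosleep-stacks.d -c "sleep 10"
         */

        syscall::nanosleep:entry
        /pid == $target/
        {
            self->trc = 1;   /* start recording for this thread */
        }

        syscall::nanosleep:return
        /pid == $target/
        {
            self->trc = 0;   /* stop recording; assigning 0 also frees the variable */
        }

        :::
        /pid == $target && self->trc == 1/
        {
            /* every probe that fires for this thread while it is inside nanosleep */
            @[stack()] = count();
        }
        ```

        Clauses run in program order for a given probe, so the entry probe itself is recorded (trc is set first) and the return probe is not (trc is cleared first), matching the one-liner.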

        Hopefully that is helpful. I’m nowhere near a dtrace wizard, so I’m sure there are much cleverer things one can do, but that might be a start for digging.

        You can see probes available to you with dtrace -l.

    2. 4

      This bug happened to me too on 8 December (I found my tweet about it), apparently on 10.13.2, but I don’t know what action caused ps to stop responding until I rebooted.

      10.13’s quality is very poor :(

      1. [Comment from banned user removed]

        1. 3

          The last two comments I’ve seen from this user seem like the inverse of the friendlysock experiment. If this isn’t intentional, I’d highly recommend reading the blog post and reconsidering your posting style.

          1. 2

            I would like to know: why are you people downvoting stefantalpalaru for that comment?

            I am not a native speaker, nor am I in the US, and that remark was insightful for me. Am I missing something other than the comment being slightly snarky?

            1. 32

              I’m sort of used to people making fun of my writing style (people complain about my use of exclamation marks on the internet every month or so; complaining about question marks is a new one :) ), but in general I find technical comments on my posts much more interesting.

              I’m honestly a bit disappointed by this comment – I tend to think of lobste.rs as a place where people try to have more substantive technical discussions about posts, as opposed to Hacker News, where comment threads frequently get derailed by conversations about irrelevant things and I end up not learning anything by reading the comments. To me, the point of tech discussion sites like this is to discuss the technology! (For example: how could a kernel bug like this happen? Have you run into other similar bugs on Mac/Linux? How did you debug them? Can you use dtrace to discover more about what’s going on inside the kernel?)

              There are so many interesting questions to talk about, and I think it’s kind of a shame to waste time making nitpicky comments about the use of a question mark in the title :)

              1. 11

                As a linguist who’s read enough language written without punctuation (Latin and Greek), I’d like to thank you for your use of punctuation, and to encourage it.

                Latin, fun fact, has two words to introduce questions, one that introduces questions where you expect an affirmative answer (“nonne”), and one that introduces questions where you expect a negative answer (“num”), and the interrobang was only invented millennia later. It’s always useful to have a metachannel conveying subtext, and punctuation is compact.

                “I think I found a Mac kernel bug.” sounds definitive, and immediately puts a team of kernel hackers on the defensive. “I think I found a Mac kernel bug?” sounds rather surprised at oneself, and emphasizes the incredulity that you’d posted on Twitter, that it was 4 days from kernel hacking to finding a bug, that you’d expected that people would have found it, and generally is the spirit of humility and exploration that has made your writings so interesting to read!

                Thank you for exploring syscalls :)

                1. 2

                  So, however insignificant, this issue has, believe it or not, been (low-key) bugging me since this (sub)thread happened. I’m purely concerned with the linguistic question taken at face value, since I vaguely concur with the annoyance at the question mark (in the sense that I would feel odd writing in that style myself, though I don’t care to tell anyone else what they should prefer). The reason it’s been bugging me is that it’s obvious that “just drop the question mark” can’t work, precisely because it significantly alters the quality of what is being expressed – as you stated. So how would I say that?

                  And I think I just realised the answer: the way to correctly express that sentiment in a more formal register is simply “Have I really found a Mac kernel bug?” D’uh, I guess.

                  1. 1

                    Absolutely. And there’s “I think I might have found a Mac kernel bug” in slightly more formal colloquial registers, “Discovery of potential Mac kernel bug” for a title of some Technical Letter to a journal 50 years ago. More formal titles have fewer questions.

                    And we’ve been repurposing punctuation to convey pitch of a sentence when spoken, useful to convey one’s meaning when writing. Sometimes it’s a question mark to convey High Rising Terminal, sometimes it’s comma splices and lack of terminal period to convey a fading train of thought, it’s a fun writing constraint, you should try it

                2. 8

                  Thanks for taking the time to reply. I was asking because I felt I might be missing some language slang/common use that was pointed out here.

                  Regarding your blog posts: I love reading them. Your technical content is sound, delivered in a fun way, and dives into things I rarely look at myself; I’m following all your Ruby profiler posts. Keep up what you are doing, the silent majority appreciates it ;)

                3. 11

                  The high rising terminal, often associated with “valleyspeak”, is stereotypically linked to shallow, unintelligent women, especially in American pop culture.

                  If anyone else on the site had asked about this, I’d wager we would see far less contentious voting. But hell, let’s call a spade a spade: I’ve seen enough of OP’s previous comments to have a pretty good guess at what he was doing when he made that comment, and I wager the downvoters did too.

                  1. 7

                    As a meta-discourse thing, I don’t really like this kind of comment even from people whose good faith I’m confident of. It’s really easy for a forum to fall into a pattern where 90% of the discussion is about pretty superficial aspects of the posts, especially in a dismissive way. I wouldn’t say that kind of thing is always off-topic, but I guess I try to think: is this observation novel and non-obvious enough that someone reading the comment learns something? Usually when I’ve been tempted to post a comment complaining about superficial aspects of a post (and there are definitely things I dislike and am tempted to comment on!) it’s hard for me to argue with a straight face that the answer is “yes”.