Recently, I needed to fetch the parent PID of arbitrary (Linux) processes from userspace (in C). For some reason, no standards define this function (pid_t ppid(pid_t)
).
The primary and only supported version to accomplish this is by reading the first process’ status file exported by the kernel in /proc/<pid>/stat
(which is only available in Linux). This file contains the parent’s PID as the fourth parameter, space separated, preceded by, among others, the executable name (in parentheses, for some reason).
Documentation (as in, man proc
) tells us to parse this file using the scanf
family, even providing the proper escape codes - which are subtly wrong.
When including a space character in the executable name, the %s
escape will not read all of the executable name, breaking all subsequent reads. This problem affects some tools using this recommended method of parsing. Using the parentheses as separators does not work for the same reason, as does ‘read until we see numbers again’.
The only reasonable way to do this with the current layout of the stats file would be to read all of the file and scan it from the end, since there are only numeric parameters after the executable name / current state fields, or search for the last closing parenthesis in the file.
The proper fix (aside from introducing the above function) however should probably be to either sanitize the executable name before exposing it to /proc/<pid>/stat
, or move it to be the last parameter in the file.
This problem could potentially be used to feed process-controlled data to all tools relying on reading /proc/<pid>/stat
using the recommended method (which includes several monitoring tools).
This was also a problem for sudo.
This has always been a thorn in my side. A lot of people talk about POSIX but how many people actually code to POSIX? There’s so much you can’t do in POSIX. You can’t even get the size of a disk POSIXly from C. You have to either shell out or rely on system-specific utilities. I found the standard to be so narrow to be essentially useless and instead had to rely on
#ifdef LINUX
,#ifdef OPENBSD
, and so on.POSIX is a very, very low bar to clear if you want to write an OS that conforms to it.
I wrote a Rust library for reading process information from
/proc
a few years back, and ended up with a list of notes on strange problems I had with the documentation and implementation (process.rs#L159-L206).That’s not the only place such tricks are possible in /proc. You can create “abstract namespace” UNIX sockets (those whose .sun_path begins with ‘\0’), embedding newlines into the name. The result is the ability to insert arbitrary new lines into /proc/net/stat/unix (IIRC). Sorry, noticed this a decade ago, can’t remember which file precisely
Perhaps the “human-friendly”
/proc/$PID/status
might in fact be the more robustly-parseable option – it escapes linefeeds (the relevant separator) inargv[0]
(and includes ppid, for what it’s worth).True, but it also incurs a parsing overhead in having to match strings and requiring to read a lot more data. I didn’t test this, but I could see the same problem coming up with executable names with newlines in them.
Except that, as I said, newlines (a.k.a. linefeeds) are escaped: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/proc/array.c?id=c643401218be0f4ab3522e0c0a63016596d6e9ca#n98
You’re right, thanks. Parsing
status
would indeed sidestep that problem.On my PC at least, the process name has parens around it, so you can parse it with the Lua pattern
%b()
. Fishing out the parent PID would be like%S+ %b() %S+ (%S+)
.I don’t think the problem here is that /proc/pid/stat is broken, rather the string parsing tools in libc are not good enough and shouldn’t be used.
The OpenBSD guys already ripped the patterns code out of Lua for their web server (patterns.c/.h), so it’s simple to add it to your own code base.
Assuming the rules for process names is the same as the rules for file names, a process name can contain unbalanced parens. What if a process is named for example
hello) world
? The string in /proc//stat would be(hello) world)
, and %b() would just find(hello)
. This doesn’t seem like something which could be fixed by smarter parsing, other than by reading from the end.Oh good point,
%((.*)%)
works, assuming they don’t add parens to any other fields.As noted in the original post, even using the parentheses does not guard against this. The only thing I can think of to safely use the current format is scanning the whole file into a buffer and searching for the last closing parenthesis, then taking everything from the first opening parenthesis to the last as executable name.
This is also not specific to C or libc, this format is bad to parse (and the mistake easy to make) with any language.
When you think about it, whichever character is going to be used as delimiter it’s probably also going to be allowed in the filename (anything other than \n or \0). So I’m not sure what the solution to this would be.
Then put the filename alone on the first line, or put the file name last so that you can read in the numeric parameters then the rest of the line is the filename. Right?
Seems like the simplest solution. And it’s not like files in /proc don’t use multilines already. I guess it’s made that way for legacy support. However, as mort mentioned you can also have \n in a filename.
Files can actually contain newlines. Try for example
touch "$(printf "hello\nworld")"
.It would probably work to use a slash as a separator though, unless the executable name might be a path.
EDIT: added quotes around
$(printf "hello\nworld")
This creates two files where I’m testing. But it seems like I’m able to do it without the printf.
Sorry, I should’ve written
touch "$(printf "hello\nworld")"
.If you just run
ls
, it will show you'hello'$'\n''world'
, but if you redirect pipe ls (for example toless
), it will show up on two separate lines.Or use a “safer” format like netstring, tnetstrings, or maybe even Bencode.
Or expose structs over sysctls and ioctls instead of making these damn virtual filesystems…
I’ve been testing yesterday getting those information through
netlinks
. There’s a kernel configuration calledCONFIG_TASKSTATS
(check if it’s enabled in your kernel config first /boo/config* or /proc/config.gz).The documentation can be found here: https://www.kernel.org/doc/Documentation/accounting/taskstats.txt There are a bunch of C headers to be able to access the features and there are even Go and Python libraries. I’ve been testing with the Python one, gnlpy (https://www.pydoc.io/pypi/gnlpy-0.1.2/autoapi/taskstats/index.html).
However I’ve been running into trouble with the permission part, the
capabilities(7)
. This is something that I haven’t found that much documentation on. This rfc http://www.faqs.org/rfcs/rfc3549.html and this manpage https://linux.die.net/man/7/netlink say that users need the capability cap_net_admin but it’s not doing it for me.I’ve been trying
sudo setcap 'cap_net_admin=p cap_net_admin+i cap_net_admin+e' t.py
but it still doesn’t execute as a normal user. But it works perfectly as root.Maybe someone here has more info on the topic.
EDIT: As 1amzave, copying the python interpreter and assigning the capabilities on it works fine.
I’m gonna hazard a guess that your
setcap
not being effective might have something to do with it being interpreted (via a shebang line) rather than a directly-executed binary. Maybe create a copy of your python interpreter (presumably you don’t want to blindly grantCAP_NET_ADMIN
to all python code),setcap
that, and change your shebang line to use it instead.You’re right, that was the issue. Copying the python interpreter in a home directory and setting the capabilities on it did the trick. The python script itself doesn’t need capabilities.
Overall, I think netlink is great but unlike procfs it’s not that easily accessible.
ASCII actually has field delimiter characters, which are very rarely used – it’s a shame, because it would make parsing trivial in cases like that.
Not really, because file names can contain those field delimiter characters.
It would’ve been nice if there were stricter rules about what charcaters are allowed in a file name. When would you ever want a newline, or field delimiter, or carriage return, or BEL, in a file name?
When I want to play a small prank by making ls in a given directory make the terminal beep. :)
Field delimiters are also valid file names…
This sounds serious! Is the content of the pseudo-file associating incorrect PIDs or parent PIDs to processes?
Let’s continue…
So it’s a documentation issue…
I have literally never encountered an executable with a space in the name, although it’s perfectly legal from a file name perspective. (I’ve been a Linux user since 1998).
So… let’s do this instead?
Sounds reasonable to me.
Thus breaking all existing implementations that rely on the documentation in
man proc
. But I guess it can be done in some backwardly compatible way?I can’t really parse this. Do you mean “affect” instead of “used”?
In conclusion: I can’t see any evidence of the functionality of this
proc
pseudo-file being “broken”. You have encountered an edge case (an executable name with a whitespace character in it). You’ve even suggested a workaround (scan from the end). If you had formulated this post as “here’s a workaround for this edge case” I believe you would have made a stronger case.Well,
tmux
does this, for example. But my primary concern is not has it ever happened to me but, if it happens, what will my code do?. As this is a silent failure (as in, the recommended method fails in a non-obvious way without indicating failure), no action is taken by most implementations to guard against this. That, in my mind, counts as broken, and the least thing to do is to fix the documentation. Or expose single parameters in files instead of a huge conglomeration with parsing issues. Or… see above.I do, but only after I got sceptical while reading the documentation, ran some tests and had my hunch confirmed. Then I checked to see others making that very mistake.
No, I don’t think so - except for introducing single-value files (and leaving
/proc/<pid>/stats
be as it is).Admittedly, English is not my first language, I do however think that sentence parses just fine. The discussed problem (which is present in several implementations based on the documentation), can potentially be used to inject data (controlled by the process, instead of the kernel) into third-party software.
That depends on your view of broken - if erroneous documentation affecting close to all software relying on it with a silent failure does not sound broken to you, I guess it is not.
I actually did not encounter it per se, I just noticed the possibility for it. But it is an undocumented edge case.
I believe that is good form.
Maybe, but as we can see by the examples of recent vulnerabilities, you’ll need a catchy name and a logo to really get attention, so in my book I’m OK.
Thanks for taking the time to answer the questions I have raised.
Much clearer, thanks.
On the use of “broken”I’m maybe extra sensitive to this as I work in supporting a commercial software application. For both legal and SLA[1] we require our customers to be precise in their communication about the issues they face.
[1] Service level agreement
Followup: can you give a specific example of how
tmux
does this? I checked the running instances of that application on my machine and only found the single wordtmux
in the output of stat files of the PIDs returned bypgrep
.On my Debian 9 machine, when starting a
tmux
host session, the corresponding/proc/<pid>/stat
file contains:“Thus breaking all existing implementations that rely on the documentation in man proc. But I guess it can be done in some backwardly compatible way?”
I will never get the 100ms it took to read this sentence back….
I dunno, maybe just duplicate the information at the end of the current format, in the author’s preferred format, and delimited by some character not otherwise part of the spec.
It’s not trivial, though.
That was my point.
this was clearly overlooked when the api was designed, nobody is parsing that file from the end and nobody is supposed to
What was overlooked? That executables can have whitespace in their names?
I can agree that this section of the manpage can be wrong (http://man7.org/linux/man-pages/man5/proc.5.html, search for
stat
):From the manpage of
scanf
:So it’s clear no provision was made for executables having whitespace in them.
This issue can be simply avoided by not allowing whitespace in executable names, and by reporting such occurrences as a bug.
Ahhh, the Systemd approach to input validation!
Seriously, if the system allows running executables with whitespace in their names, and your program is meant to work with such a system, then it needs to work with executables with whitespace in their names.
I agree somewhat with the OP - the interface is badly thought out. But it’s a general problem: trying to pass structured data between kernel and userspace in plain-text format is, IMO, a bad idea. (I’d rather a binary format. You have the length of the string encoded in 4 bytes, then the string itself. Simple, easy to deal with. No weird corner cases).
I agree it’s a bug.
However, there’s a strong convention that executables do not have whitespace in them, at least in Linux/Unix.[1]
If you don’t adhere to this convention, and you stumble across a consequence to this, does this mean that a format that’s been around as long as the
proc
system is literally broken? That’s where I reacted.As far as I know, nothing crashes when you start an executable with whitespace in it. The
proc
filesystem isn’t corrupted.One part of it is slightly harder to parse using C.
That’s my take, I’m happy to be enlightened further.
I also agree that exposing these kind of structures as plain text is arguably … optimistic, and prone to edge cases. (By the way, isn’t one of the criticisms of systemd that it has an internal binary format?).
[1] note I’m just going from personal observation here, it’s possible there’s a subset of Linux applications that are perfectly fine with whitespace in the executable name.
I agree with most of what you just said, but I myself didn’t take “broken” to mean anything beyond “has a problem due to lack of forethought”. Maybe I’m just getting used to people exaggerating complaints (heck I’m surely guilty of it myself from time to time).
It’s true that we basically never see executables with a space (or various other characters) in their names, but it can be pretty frustrating when tools stop working or don’t work properly when something slightly unusual happens. I could easily see a new-to-linux person creating just such an executable because they “didn’t know better” and suffering as a result because other programs on their system don’t correctly handle it. In the worst case, this sort of problem (though not necessarily this exact problem) can lead to security issues.
Yes, it’s possible to correctly handle /proc/xxx/stat in the presence of executables with spaces in the name, but it’s almost certain that some programs are going to come into existence which don’t do so correctly. The format actually lends itself to this mistake - and that’s what’s “broken” about it. That’s my take, anyway.
Thanks for this thoughtful response. I believe you and I are in agreement.
Looking at this from a slightly more usual perspective, how does the Linux system handle executables with (non-whitespace) Unicode characters?
Well, I’m no expert on unicode, but I believe for the most part Linux (the kernel) treats filenames as strings of bytes, not strings of characters. The difference is subtle - unless you happen to be writing text in a language that uses characters not found in the ASCII range. However, UTF-8 encoding will (I think) never cause any bytes in the ASCII range (0-127) to appear as part of a multi-byte encoded character, so you can’t get spurious spaces or newlines or other control characters even if you treat UTF-8 encoded text as ASCII. For that reason, it poses less of a problem for things like /proc/xxx/stat and the like.
Of course filenames being byte sequences comes with its own set of problems, including that it’s hard to know encoding should be used to display filenames (I believe many command line tools use the locale’s default encoding, and that’s nearly always UTF-8 these days) and that a filename potentially contains an invalid encoding. Then of course there’s the fact that unicode has multiple ways of encoding the exact same text and so in theory you could get two “identical” filenames in one directory (different byte sequences, same character sequence, or at least same visible representation). Unicode seems like a big mess to me, but I guess the problem it’s trying to solve is not an easy one.
(minor edit: UTF-8 doesn’t allow 0-127 as part of a multi-byte encoded character. Of course they can appear as regular characters, equivalent to the ASCII).
Yes; the
proc
system’s format has been broken (well, misleadingly-documented) the whole time.As you note, using pure text to represent this is a problem. I don’t recommend an internal, poorly-documented binary format either: canonical S-expressions have a textual representation but can still contain binary data:
An example
stat
might be:Or, if you really cared about concision:
As an example the Python Prometheus client library uses this file, and allows for this.