Intel Processor Trace can do this for arbitrary software, with no compiler tricks required, and magic-trace provides a nice frontend for it.
Yes, DTrace has the pid provider for tracing the entry and return of anything that looks like a C function in the ELF binary, as well as arbitrary instructions within those functions. It also has the profile provider, which you can use for stack sampling; that has lower overhead than tracing with pid, since it fires at a fixed rate rather than on every probed call.
The older truss(1) tool also does system call tracing, and even function boundary tracing a bit like what the article describes. I believe the latter works by adjusting the way the runtime link editor does dynamic linking for libraries.
Right, the advantage of Intel PT is that there is no need to overwrite instructions with breakpoints. Compared with the approach in the post above, uprobes and DTrace also need to execute the instructions they overwrote, since in general there won’t be NOPs at the probe site.
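To make the point about overwritten instructions concrete, here is a toy Python sketch of breakpoint-style probing. It is only an illustration under stated assumptions: the `ToyMachine` class, its `(opcode, operand)` instruction tuples, and the `brk` opcode are all hypothetical and do not correspond to real machine code or to how uprobes or DTrace are actually implemented. It shows that once a probe displaces a real instruction, the tracer has to keep the original around and run it after the probe fires, so that the program's result is unchanged.

```python
# Toy model only: instructions are (opcode, operand) tuples, not real machine code.
BRK = ("brk", None)  # hypothetical breakpoint opcode standing in for a trap instruction


class ToyMachine:
    def __init__(self, program):
        self.program = list(program)
        self.saved = {}   # probe address -> displaced original instruction
        self.acc = 0      # single accumulator register
        self.hits = []    # addresses at which a probe fired

    def insert_probe(self, addr):
        """Overwrite the instruction at addr with a breakpoint, keeping the original."""
        self.saved[addr] = self.program[addr]
        self.program[addr] = BRK

    def _execute(self, instr):
        op, arg = instr
        if op == "add":
            self.acc += arg
        elif op == "mul":
            self.acc *= arg
        elif op == "nop":
            pass
        else:
            raise ValueError(f"unknown opcode {op!r}")

    def run(self):
        for addr, instr in enumerate(self.program):
            if instr == BRK:
                self.hits.append(addr)     # the "trap": the tracer records the event
                instr = self.saved[addr]   # then the displaced instruction still has to run
            self._execute(instr)
        return self.acc


program = [("add", 2), ("mul", 5), ("add", 1)]

plain = ToyMachine(program)
probed = ToyMachine(program)
probed.insert_probe(1)  # displaces the "mul" instruction

plain_result = plain.run()
probed_result = probed.run()
print(plain_result, probed_result, probed.hits)
```

If the probe site had held a `nop` instead of the `mul`, there would be nothing meaningful to replay after the trap, which is the situation the compiler-inserted padding in the post creates. Intel PT sidesteps the whole question because it records control flow without patching the code at all.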