Putting the destination second is consistent with phrasal forms in English. mov %ebx, %eax is analogous to “move EBX to EAX”, and add %ebx, %eax to “add EBX to EAX”. In general, these correspond to the very idiomatic English form “<verb> <source> to <destination>”, whereas Intel syntax demands extremely awkward productions like “move to EAX from EBX”.
Who hurt you? By that I mean, what did you have to work with that made you want to write this piece? I agree with everything you wrote, but intel syntax has won AFAICT. Nothing I come into contact with makes me read enough AT&T syntax to motivate as much writing as you just did. So, if you’re able, point a finger at the responsible party/project. And may your campaign to eradicate AT&T syntax see much success.
Most projects in the unix world.
At one point I was looking into submitting a patch for one of freebsd’s optimized strings functions, but they were all at&t. I eventually ended up rewriting the function from scratch (though I guess I could have also used a disassembler…)
The other one was tcc. (The code generator doesn’t use any mnemonics, but the runtime does.) That’s a harder sell, though, since it has to bootstrap and the compiler itself doesn’t support the intel syntax (yet?).
Really? The mental mapping from Intel-to-AT&T assembler is relatively straightforward. Personally, I’ve never run into an issue interpreting either, but I’d consider myself a casual user of assembler (only hobbyist OSdev), rather than a professional (optimized production code).
Purely to satisfy my own curiosity, would you mind linking the string function in question (or a related one)?
I will admit I am exaggerating a bit.
Reading is not that bad, but writing is harder; I wouldn’t trust myself to do it.
The function in question was memset, and it needs a rewrite anyway to use simd instructions; that is here (still wip). (Though the old one will still stick around for kernel use, as simd in kernel is expensive and only worth it for very expensive operations.)
This is a pretty good explanation why AT&T notation even exists:
https://stackoverflow.com/questions/42244028/what-was-the-original-reason-for-the-design-of-att-assembly-syntax
https://stackoverflow.com/questions/4193827/questions-about-att-x86-syntax-design/4194571#4194571
So it seems it wasn’t entirely designed from the ground up, but rather evolved into what we use today. That could explain why some aspects of it don’t make perfect sense.
Ha, great, now I have something to back up my otherwise completely irrational dislike of AT&T syntax.
I don’t really like the “mov” nomenclature in the first place. mov doesn’t move anything. The value is still there at the source, after it’s been “moved” to the destination. It’s much nicer on Z80, where move commands are called load. ld a,n = load accumulator with n.
Mov is everything. Mov is Turing-complete!
Indeed. mov is all you need (wasn’t this possible with some other instructions as well?). Then again, it’s completely superfluous.
Indeed. Xor, sub, add, xadd, adc, sbb. There are also some that use only a few instructions: push+pop, and+or, rcr+rcl+sal.
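For a taste of how mov alone can compute something, here is a minimal Python sketch of the equality-test trick that, as far as I recall, Stephen Dolan's "mov is Turing-complete" paper builds on: two stores and one load. The dictionary standing in for memory and the name mov_equal are assumptions made purely for illustration.

```python
def mov_equal(x: int, y: int) -> int:
    """Return 1 if x == y, else 0, using only two stores and one load."""
    mem = {}        # stand-in for addressable memory (illustrative only)
    mem[x] = 0      # mov [x], 0
    mem[y] = 1      # mov [y], 1   (overwrites the 0 only when y == x)
    return mem[x]   # mov result, [x]


print(mov_equal(5, 5), mov_equal(5, 6))
```

No comparison or branch instruction is involved; the address aliasing does the work.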
I like the AT&T syntax as well, and I’m a little sad every time a new project decides on an Intel-inspired syntax instead. I am amazed at how frothy, and frankly how voluminous, is the article.
The problem is that ‘load’ also refers to an assignment that reads from memory, where ‘store’ is an assignment that writes to memory. So it’s somewhat inconsistent. (And I think the load-store terminology wrt memory is more useful.)
Other ideas: set and assign (and ← or <-, if infix is kosher).
I really like the writing. Hope to see more from you!
I’ve always (i.e. the couple of times I’ve played with x86 asm) preferred the Intel syntax and wondered why GNU Assembler defaults to this weird backwards syntax. It’s good to have the opinions of someone more experienced to lean on.
The great hacker tradition of voicing strong opinions via long, thorough rants has been waning in the last decade(s?). This is a pity, because learning from strong opinions seems a prerequisite of well thought out moderate opinions.
Agreed! I hope OP will consider adding an RSS/Atom/JSON feed for their blog.
I tried making intel syntax the only accepted inline assembly syntax of zig but LLVM’s ability to parse it just isn’t up to par. It’s unfortunately not practical to use it.
I still want to get to that point, but it’s going to require zig implementing intel syntax itself, then translating it to AT&T syntax for LLVM to consume. Wild, right? But I agree, it will be worth it so that inline assembly can be in intel syntax.
I’m not a big fan of inline assembly in general, particularly the llvm/gcc vision of it. But llvm is doing really cool stuff there. (See also.)
Would it be harder to fix LLVM directly?
Yes, because we need assembly parsing anyway, for the non-LLVM backends of the compiler. So we have to lower from that, to whatever LLVM expects. So naturally we would just lower to the syntax that already works in LLVM.
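To make the "lower Intel to AT&T" idea concrete, here is a toy Python sketch. It is not how zig or LLVM does it; it only covers register and decimal/0x-hex immediate operands, and it ignores memory operands, size suffixes, and mnemonic differences. The function names are made up for illustration, and every bare identifier is assumed to be a register.

```python
import re

REGISTER = re.compile(r"^[a-z][a-z0-9]*$")        # assumption: any bare name is a register
IMMEDIATE = re.compile(r"^-?(0x[0-9a-f]+|\d+)$")  # decimal or 0x-prefixed hex only


def att_operand(op: str) -> str:
    """Add the AT&T sigil: $ for immediates, % for registers."""
    op = op.strip()
    if IMMEDIATE.match(op):
        return "$" + op
    if REGISTER.match(op):
        return "%" + op
    raise ValueError(f"unsupported operand: {op!r}")


def intel_to_att(line: str) -> str:
    """Translate 'mnemonic dst, src' into 'mnemonic src, dst'."""
    mnemonic, _, rest = line.strip().lower().partition(" ")
    operands = [att_operand(o) for o in rest.split(",")] if rest.strip() else []
    # AT&T lists the source first, so reverse the Intel operand order.
    return (mnemonic + " " + ", ".join(reversed(operands))).strip()


print(intel_to_att("mov eax, ebx"))
print(intel_to_att("add eax, 4"))
```

A real lowering pass would need a proper parser; anything this sketch does not understand raises ValueError rather than guessing.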
This is very x86-centric. AT&T syntax on every other architecture gets the operand order the right way around (which makes the x86 AT&T assembly syntax even more confusing).
I get the feeling that preference mostly comes down to where an assembly programmer cut their teeth: if you learned on a Unix, then it’s going to be AT&T syntax; otherwise, Intel.
Obligatory: except when you or your team already know it or you have legacy code using it to maintain.
What is SIB? …I think I don’t get a joke. Does SIB just mean x86?
SIB is short for (s)cale (i)ndex (b)ase, the format of some memory operands on x86. E.g. [rbx + 4*rcx]; rbx is the base (address of some array), rcx is the index (of some element of that array), and 4 is the scale (size of an individual element of the array).
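The arithmetic behind a scale-index-base operand can be sketched in a few lines of Python. The function name and the sample register values are hypothetical; the optional displacement and the 64-bit wraparound are added here for completeness and are not part of the explanation above.

```python
def effective_address(base: int, index: int, scale: int, disp: int = 0) -> int:
    """Compute base + scale*index + disp, as for an operand like [rbx + 4*rcx]."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("x86 only encodes scales of 1, 2, 4, or 8")
    return (base + scale * index + disp) & 0xFFFFFFFFFFFFFFFF  # wrap to 64 bits


# Hypothetical values: an array of 4-byte elements starting at 0x1000.
rbx = 0x1000  # base: address of the array
rcx = 3       # index: which element
print(hex(effective_address(rbx, rcx, 4)))
```

The same operand is spelled [rbx + 4*rcx] in Intel syntax and (%rbx,%rcx,4) in AT&T syntax.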