No, you don’t need C aliasing to obtain vector optimization for this sort of code. You can do it with standards-conforming code via memcpy(): https://godbolt.org/g/55pxUS
Wow, it’s actually completely optimizing out the memcpy()? While awesome, that’s the kind of optimization I hate to depend on. One little seemingly inconsequential nudge and the optimizer might not be able to prove that’s safe, and suddenly there’s an additional O(n) copy silently going on.
Actually it’s not optimizing it out, it’s simply allocating the auto array into SIMD registers. You always must copy data into SIMD registers first before performing SIMD operations. The memcpy() code resembles a SIMD implementation more than the aliasing version.
You can - and thanks for the illustration - but the memcpy is antethical to the C design paradigm in my always humble opinion. And my point was not that you needed aliasing to get the vector optimization, but that aliasing does not interfere with the vector optimization.
I’m sorry but the justifications for your opinion no longer hold. memcpy() is the only unambiguous and well-defined way to do this. It also works across all architectures and input pointer values without having to worry about crashes due to misaligned accesses, while your code doesn’t. Both gcc and clang are now able to optimize away memcpy() and auto vars. An opinion here is simply not relevant, invoking undefined behavior when it increases risk for no benefit is irrational.
Au contraire. As I showed, C standard does not need to graft on a clumsy and painful anti-alias mechanism and programmers don’t need to go though stupid contortions with allocation of buffers that disappear under optimization , because the compiler does not need it. My code does’t have alignment problems. The justification for pointer alias rules is false. The end.
There are plenty of structs that only contain shorts and char, and in those cases employing aliasing as a rule would have alignment problems while the well-defined version wouldn’t. It’s not the end, you’re just in denial.
In those cases, you need to use an alignment modifier or sizeof. No magic needed. There is a reason that both gcc and clang have been forced to support -fnostrict_alias and now both support may_alias. The memcpy trick is a stupid hack that can easily go wrong - e.g one is not guaranteed that the compiler will optimize away the buffer, and a large buffer could overflow stack. You’re solving a non-problem by introducing complexity and opacity.
In what world is memcpy() magic and alignment modifiers aren’t? memcpy() is an old standard library function, alignment modifiers are compiler-specific syntax extensions.
memcpy() isn’t a hack, it’s always well-defined while aliasing can never be well-defined in all cases. Promoting aliasing as a rule is like promoting using the equality operator between floats – it can never work in all cases, though it may be possible to define meaningful behavior in specific cases. Promoting aliasing as a rule is promoting the false idea that C is a thin layer above contemporary architectures, it isn’t. Struct memory is not necessarily the same as array memory, not every machine that C supports can deference an int32 inside of an int64, not every machine can deference an int32 at any offset. Do you want C to die with x86_64 or do you want C to live?
Optimizations don’t need to be guaranteed when the code isn’t even correct in the first place. First make sure your code is correct, then worry about optimizing. You talk about alignment modifiers but they are rarely used, and usually they are used after a bug has already occurred. Code should be correct first, and memcpy() is the rule we should be promoting since it is always correct. Optimizers can meticulously add aliasing for specific cases once a bottleneck has been demonstrated. You’re solving a non-problem by indulging in premature optimization.
The memcpy hack is a hack because the programmer is supposed to write a copy of A to B and then back to A and rely on the optimizer to skip the copy and delete the buffer. So unoptimized the code may fault on stack overflows for data structures that exist only to make the compiler writers happier. And with a novel architecture, if the programmer wants to take advantage of a new capability - say 512 bit simd instructions , she can wait until the compiler has added it to its toolset and be happy with how it is used.
As for this not working in all cases: Big deal. C is not supposed to hide those things. In fact, the compiler has no idea if the memory is device memory with restrictions on how it can be addressed or memory with a copy on write semantics or …. You want C to be Pascal or Java and then announce that making C look like Pascal or Java can only be solved at the expense of making C unusable for low level programming. Which programming communities are asking for such insulation? None. C works fine on many architectures. C programmers know the difference between portable and non-portable constructs. C compilers can take advantage of SIMD instructions without requiring C programmers to give up low level memory access - one of the key advantages of programming in C. Basically, people who don’t like C are trying to turn C into something else and are offended that few are grateful.
You aren’t writing a copy of a buffer back and forth. In your example, you are reducing an encoding of a buffer into a checksum. You are only copying one way, and that is for the sake of normalization. All SIMD code works that way, you always must copy into SIMD registers first before doing SIMD operations. In your example, the aliasing code doesn’t resemble SIMD code both syntactically and semantically as much the memcpy() code does and in fact requires a smarter compiler to transform.
The chance of overflowing the stack is remote, since stacks now automatically grow and structs tend to be < 512 bytes, but if that is a legitimate concern you can do what you already do to avoid that situation, either use a static buffer (jeopardizing reentrancy) or use malloc().
By liberally using aliasing, you are assuming a specific implementation or underlying architecture. My point is that in general you cannot assume arbitrary internal addresses of a struct can always be dereferenced as int32s, so in general that should not be practiced. In specific cases you can alias, but those are the exceptions not the rule.
All copies on some architectures reduce to: load into register, store from register. So what? That is why we have a high level language which can translate *x = *y efficiently. The pointer alias code directly shows programmer intent. The memcpy code does not. The “sake of normalization” is just another way of saying “in order to cooperate with the fiction that the inconsistency in the standard produces”.
In many contexts, stacks do NOT automatically grow.Again, C is not Java. OS code, drivers, embedded code, even many applications for large systems - all need control over stack size. Triggering stack growth may even turn out to be a security failure for encryption which is almost universally written in C because in C you can assure time invariance (or you could until the language lawyers decided to improve it). Your proposal that programmers not only use a buffer, but use a malloced buffer, in order to allow the optimizer (they hope) not to use it, is ridiculous and is a direct violation of the C model.
“3. C code can be non-portable. Although it strove to give programmers the opportunity to write truly portable programs, the Committee did not want to force programmers into writing portably, to preclude the use of C as a “high-level assembler;” the ability to write machine-specific code is one of the strengths of C. It is this principle which largely motivates drawing the distinction between strictly conforming program and conforming program.” ( http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2021.htm)
Give me an example of an architecture where a properly aligned structure where sizeof(struct x)%sizeof(int32) == 0 cannot be accessed by int32s ? Maybe the itanium, but I doubt it. Again: every major OS turns off strict alias in the compilers and they seem to work. Furthermore, the standard itself permits aliasing via char* (as another hack). In practice, more architectures have trouble addressing individual bytes than addressing int32s.
I’d really like to see more alias analysis optimization in C code (and more optimization from static analysis) but this poorly designed, badly thought through approach we have currently is not going to get us there. To solve any software engineering problem, you have to first understand the use cases instead of imposing some synthetic design.
I’m willing to agree with you that the aliasing version more clearly shows intent in this specific case but then I ask, what do you do when the code aliases a struct that isn’t properly aligned? There are a lot of solutions but in the spirit of C, I think the right answer is that it is undefined.
So I think what you want is the standard to define one specific instance of previously undefined behavior. I think in this specific case, it’s fair to ask for locally aliasing an int32-aligned struct pointer to an int32 pointer to be explicitly defined by the standards committee. What I think you’re ignoring, however, is all the work the standards committee has already done to weigh the implications of defining behavior like that. At the very least, it’s not unlikely that there will be machines in the future where implementing the behavior you want will be non-trivial. Couple that with the burden of a more complex standard. So maybe the right answer to maximize global utility is to leave it undefined and to let optimization-focused coders use implementation-defined behavior when it matters but, as I’m arguing, use memcpy() by default. I tend to defer to the standards committees because I have read many of their feature proposals and accompanying rationales and they are usually pretty thorough and rarely miss things that I don’t miss.
Everybody arguing here loves C. You shouldn’t assume the standards committee is dumb or that anyone here wants C to be something it’s not. As much as you may think otherwise, I think C is good as it is and I don’t want it to be like other languages. I want C to be a maximally portable implementation language. We are all arguing in good faith and want the best for C, we just have different ideas about how that should happen.
what do you do when the code aliases a struct that isn’t properly aligned? There are a lot of solutions but in the spirit of C, I think the right answer is that it is undefined.
Implementation dependent.
Couple that with the burden of a more complex standard.
The current standard on when an lvalue works is complex and murky. Wg14 discussion on how it applies shows that it’s not even clear to them. The exception for char pointers was hurriedly added when they realized they had made memcpy impossible to implement. It seems as if malloc can’t be implemented in conforming c ( there is no method of changing storage type to reallocate it)
C would benefit from more clarity on many issues. I am very sympathetic to making pointer validity more transparent and well defined. I just think the current approach has failed and the c89 error has not been fixed but made worse. Also restrict has been fumbled away.
The chance of overflowing the stack is remote, since stacks now automatically grow and structs tend to be < 512 bytes, but if that is a legitimate concern you can
You don’t need a large buffer. You can memcpy the integers used for the calculation out one at a time, rather than memcpy’ing the entire struct at once.
Your designation of using memcpy as a “stupid hack” is pretty biased. The code you posted can go wrong, legitimately, because of course it invokes undefined behaviour, and is more of a hack than using memcpy is. You’ve made it clear that you think the aliasing rules should be changed (or shouldn’t exist) but this “evidence” you’ve given has clearly been debunked.
Funny use of “debunked”. You are using circular logic. My point was that this aliasing method is clearly amenable to optimization and vectorization - as seen. Therefore the argument for strict alias in the standard seems even weaker than it might. Your point seems to be that the standard makes aliasing undefined so aliasing is bad. Ok. I like your hack around the hack. The question is: why should C programmers have to jump through hoops to avoid triggering dangerous “optimizations”? The answer: because it’s in the standard, is not an answer.
Funny use of “debunked”. You are using circular logic. My point was that this aliasing method is clearly amenable to optimization and vectorization - as seen
You have shown a case where, if the strict aliasing rule did not exist, some code could [edit] still [/edit] be optimised and vectorised. That I agree with, though nobody claimed that the existence of the strict aliasing rule was necessary for all optimisation and vectorisation, so it’s not clear what you do think this proves. Your title says that the optimisation is BECAUSE of aliasing, which is demonstrably false. Hence, debunked. Why is that “funny”? And how is your logic any less circular then mine?
The question is: why should C programmers have to jump through hoops to avoid triggering dangerous “optimizations”?
Characterising optimisations as “dangerous” already implies that the code was correct before the optimisation was applied and that the optimisation can somehow make it incorrect. The logic you are using relies on the code (such as what you’ve posted) being correct - which it isn’t, according to the rules of the language (which, yes, are written in a standard). But why is using memcpy “jumping through hoops” whereas casting a pointer to a different type of pointer and then de-referencing it not? The answer is, as far as I can see, because you like doing the latter but you don’t like doing the former.
No, you don’t need C aliasing to obtain vector optimization for this sort of code. You can do it with standards-conforming code via memcpy(): https://godbolt.org/g/55pxUS
Wow, it’s actually completely optimizing out the memcpy()? While awesome, that’s the kind of optimization I hate to depend on. One little seemingly inconsequential nudge and the optimizer might not be able to prove that’s safe, and suddenly there’s an additional O(n) copy silently going on.
memset/memcpy get optimized out a lot, hence libraries making things like this: https://monocypher.org/manual/wipe
Actually it’s not optimizing it out, it’s simply allocating the auto array into SIMD registers. You always must copy data into SIMD registers first before performing SIMD operations. The memcpy() code resembles a SIMD implementation more than the aliasing version.
You can - and thanks for the illustration - but the memcpy is antethical to the C design paradigm in my always humble opinion. And my point was not that you needed aliasing to get the vector optimization, but that aliasing does not interfere with the vector optimization.
I’m sorry but the justifications for your opinion no longer hold. memcpy() is the only unambiguous and well-defined way to do this. It also works across all architectures and input pointer values without having to worry about crashes due to misaligned accesses, while your code doesn’t. Both gcc and clang are now able to optimize away memcpy() and auto vars. An opinion here is simply not relevant, invoking undefined behavior when it increases risk for no benefit is irrational.
Au contraire. As I showed, C standard does not need to graft on a clumsy and painful anti-alias mechanism and programmers don’t need to go though stupid contortions with allocation of buffers that disappear under optimization , because the compiler does not need it. My code does’t have alignment problems. The justification for pointer alias rules is false. The end.
There are plenty of structs that only contain shorts and char, and in those cases employing aliasing as a rule would have alignment problems while the well-defined version wouldn’t. It’s not the end, you’re just in denial.
In those cases, you need to use an alignment modifier or sizeof. No magic needed. There is a reason that both gcc and clang have been forced to support -fnostrict_alias and now both support may_alias. The memcpy trick is a stupid hack that can easily go wrong - e.g one is not guaranteed that the compiler will optimize away the buffer, and a large buffer could overflow stack. You’re solving a non-problem by introducing complexity and opacity.
In what world is memcpy() magic and alignment modifiers aren’t? memcpy() is an old standard library function, alignment modifiers are compiler-specific syntax extensions.
memcpy() isn’t a hack, it’s always well-defined while aliasing can never be well-defined in all cases. Promoting aliasing as a rule is like promoting using the equality operator between floats – it can never work in all cases, though it may be possible to define meaningful behavior in specific cases. Promoting aliasing as a rule is promoting the false idea that C is a thin layer above contemporary architectures, it isn’t. Struct memory is not necessarily the same as array memory, not every machine that C supports can deference an int32 inside of an int64, not every machine can deference an int32 at any offset. Do you want C to die with x86_64 or do you want C to live?
Optimizations don’t need to be guaranteed when the code isn’t even correct in the first place. First make sure your code is correct, then worry about optimizing. You talk about alignment modifiers but they are rarely used, and usually they are used after a bug has already occurred. Code should be correct first, and memcpy() is the rule we should be promoting since it is always correct. Optimizers can meticulously add aliasing for specific cases once a bottleneck has been demonstrated. You’re solving a non-problem by indulging in premature optimization.
Heh I bet you’d get quite varied answers to this one here
The memcpy hack is a hack because the programmer is supposed to write a copy of A to B and then back to A and rely on the optimizer to skip the copy and delete the buffer. So unoptimized the code may fault on stack overflows for data structures that exist only to make the compiler writers happier. And with a novel architecture, if the programmer wants to take advantage of a new capability - say 512 bit simd instructions , she can wait until the compiler has added it to its toolset and be happy with how it is used.
As for this not working in all cases: Big deal. C is not supposed to hide those things. In fact, the compiler has no idea if the memory is device memory with restrictions on how it can be addressed or memory with a copy on write semantics or …. You want C to be Pascal or Java and then announce that making C look like Pascal or Java can only be solved at the expense of making C unusable for low level programming. Which programming communities are asking for such insulation? None. C works fine on many architectures. C programmers know the difference between portable and non-portable constructs. C compilers can take advantage of SIMD instructions without requiring C programmers to give up low level memory access - one of the key advantages of programming in C. Basically, people who don’t like C are trying to turn C into something else and are offended that few are grateful.
You aren’t writing a copy of a buffer back and forth. In your example, you are reducing an encoding of a buffer into a checksum. You are only copying one way, and that is for the sake of normalization. All SIMD code works that way, you always must copy into SIMD registers first before doing SIMD operations. In your example, the aliasing code doesn’t resemble SIMD code both syntactically and semantically as much the memcpy() code does and in fact requires a smarter compiler to transform.
The chance of overflowing the stack is remote, since stacks now automatically grow and structs tend to be < 512 bytes, but if that is a legitimate concern you can do what you already do to avoid that situation, either use a static buffer (jeopardizing reentrancy) or use malloc().
By liberally using aliasing, you are assuming a specific implementation or underlying architecture. My point is that in general you cannot assume arbitrary internal addresses of a struct can always be dereferenced as int32s, so in general that should not be practiced. In specific cases you can alias, but those are the exceptions not the rule.
All copies on some architectures reduce to: load into register, store from register. So what? That is why we have a high level language which can translate *x = *y efficiently. The pointer alias code directly shows programmer intent. The memcpy code does not. The “sake of normalization” is just another way of saying “in order to cooperate with the fiction that the inconsistency in the standard produces”.
In many contexts, stacks do NOT automatically grow.Again, C is not Java. OS code, drivers, embedded code, even many applications for large systems - all need control over stack size. Triggering stack growth may even turn out to be a security failure for encryption which is almost universally written in C because in C you can assure time invariance (or you could until the language lawyers decided to improve it). Your proposal that programmers not only use a buffer, but use a malloced buffer, in order to allow the optimizer (they hope) not to use it, is ridiculous and is a direct violation of the C model.
Give me an example of an architecture where a properly aligned structure where sizeof(struct x)%sizeof(int32) == 0 cannot be accessed by int32s ? Maybe the itanium, but I doubt it. Again: every major OS turns off strict alias in the compilers and they seem to work. Furthermore, the standard itself permits aliasing via char* (as another hack). In practice, more architectures have trouble addressing individual bytes than addressing int32s.
I’d really like to see more alias analysis optimization in C code (and more optimization from static analysis) but this poorly designed, badly thought through approach we have currently is not going to get us there. To solve any software engineering problem, you have to first understand the use cases instead of imposing some synthetic design.
Anyways off the airport. Later. vy
I’m willing to agree with you that the aliasing version more clearly shows intent in this specific case but then I ask, what do you do when the code aliases a struct that isn’t properly aligned? There are a lot of solutions but in the spirit of C, I think the right answer is that it is undefined.
So I think what you want is the standard to define one specific instance of previously undefined behavior. I think in this specific case, it’s fair to ask for locally aliasing an int32-aligned struct pointer to an int32 pointer to be explicitly defined by the standards committee. What I think you’re ignoring, however, is all the work the standards committee has already done to weigh the implications of defining behavior like that. At the very least, it’s not unlikely that there will be machines in the future where implementing the behavior you want will be non-trivial. Couple that with the burden of a more complex standard. So maybe the right answer to maximize global utility is to leave it undefined and to let optimization-focused coders use implementation-defined behavior when it matters but, as I’m arguing, use memcpy() by default. I tend to defer to the standards committees because I have read many of their feature proposals and accompanying rationales and they are usually pretty thorough and rarely miss things that I don’t miss.
Everybody arguing here loves C. You shouldn’t assume the standards committee is dumb or that anyone here wants C to be something it’s not. As much as you may think otherwise, I think C is good as it is and I don’t want it to be like other languages. I want C to be a maximally portable implementation language. We are all arguing in good faith and want the best for C, we just have different ideas about how that should happen.
Implementation dependent.
The current standard on when an lvalue works is complex and murky. Wg14 discussion on how it applies shows that it’s not even clear to them. The exception for char pointers was hurriedly added when they realized they had made memcpy impossible to implement. It seems as if malloc can’t be implemented in conforming c ( there is no method of changing storage type to reallocate it)
C would benefit from more clarity on many issues. I am very sympathetic to making pointer validity more transparent and well defined. I just think the current approach has failed and the c89 error has not been fixed but made worse. Also restrict has been fumbled away.
… just copy the ints out one at a time :) https://godbolt.org/g/g8s1vQ
The compiler largely sees this as a (legal) version of the OP’s code, so there’s basically zero chance it won’t be optimised in exactly the same way.
You don’t need a large buffer. You can memcpy the integers used for the calculation out one at a time, rather than memcpy’ing the entire struct at once.
Your designation of using memcpy as a “stupid hack” is pretty biased. The code you posted can go wrong, legitimately, because of course it invokes undefined behaviour, and is more of a hack than using memcpy is. You’ve made it clear that you think the aliasing rules should be changed (or shouldn’t exist) but this “evidence” you’ve given has clearly been debunked.
Funny use of “debunked”. You are using circular logic. My point was that this aliasing method is clearly amenable to optimization and vectorization - as seen. Therefore the argument for strict alias in the standard seems even weaker than it might. Your point seems to be that the standard makes aliasing undefined so aliasing is bad. Ok. I like your hack around the hack. The question is: why should C programmers have to jump through hoops to avoid triggering dangerous “optimizations”? The answer: because it’s in the standard, is not an answer.
You have shown a case where, if the strict aliasing rule did not exist, some code could [edit] still [/edit] be optimised and vectorised. That I agree with, though nobody claimed that the existence of the strict aliasing rule was necessary for all optimisation and vectorisation, so it’s not clear what you do think this proves. Your title says that the optimisation is BECAUSE of aliasing, which is demonstrably false. Hence, debunked. Why is that “funny”? And how is your logic any less circular then mine?
Characterising optimisations as “dangerous” already implies that the code was correct before the optimisation was applied and that the optimisation can somehow make it incorrect. The logic you are using relies on the code (such as what you’ve posted) being correct - which it isn’t, according to the rules of the language (which, yes, are written in a standard). But why is using memcpy “jumping through hoops” whereas casting a pointer to a different type of pointer and then de-referencing it not? The answer is, as far as I can see, because you like doing the latter but you don’t like doing the former.
The internet has no end.
See an explanation at http://www.yodaiken.com/2018/06/17/vector-optimization-with-c-aliasing/