Agree, frameworks are terrible. Dependencies should be minimal and orthogonal. I can connect the legos myself, no need for a lego frame.
No, you don’t need C aliasing to obtain vector optimization for this sort of code. You can do it with standards-conforming code via memcpy(): https://godbolt.org/g/55pxUS
Wow, it’s actually completely optimizing out the memcpy()? While awesome, that’s the kind of optimization I hate to depend on. One little seemingly inconsequential nudge and the optimizer might not be able to prove that’s safe, and suddenly there’s an additional O(n) copy silently going on.
memset/memcpy get optimized out a lot, hence libraries making things like this: https://monocypher.org/manual/wipe
Actually it’s not optimizing it out, it’s simply allocating the auto array into SIMD registers. You always must copy data into SIMD registers first before performing SIMD operations. The memcpy() code resembles a SIMD implementation more than the aliasing version.
You can - and thanks for the illustration - but the memcpy is antithetical to the C design paradigm, in my always humble opinion. And my point was not that you needed aliasing to get the vector optimization, but that aliasing does not interfere with the vector optimization.
I’m sorry but the justifications for your opinion no longer hold. memcpy() is the only unambiguous and well-defined way to do this. It also works across all architectures and input pointer values without having to worry about crashes due to misaligned accesses, while your code doesn’t. Both gcc and clang are now able to optimize away memcpy() and auto vars. An opinion here is simply not relevant, invoking undefined behavior when it increases risk for no benefit is irrational.
Au contraire. As I showed, the C standard does not need to graft on a clumsy and painful anti-aliasing mechanism, and programmers don’t need to go through stupid contortions with allocation of buffers that disappear under optimization, because the compiler does not need it. My code doesn’t have alignment problems. The justification for pointer alias rules is false. The end.
There are plenty of structs that only contain shorts and char, and in those cases employing aliasing as a rule would have alignment problems while the well-defined version wouldn’t. It’s not the end, you’re just in denial.
In those cases, you need to use an alignment modifier or sizeof. No magic needed. There is a reason that both gcc and clang have been forced to support -fno-strict-aliasing, and both now support __attribute__((may_alias)). The memcpy trick is a stupid hack that can easily go wrong - e.g. one is not guaranteed that the compiler will optimize away the buffer, and a large buffer could overflow the stack. You’re solving a non-problem by introducing complexity and opacity.
In what world is memcpy() magic and alignment modifiers aren’t? memcpy() is an old standard library function, alignment modifiers are compiler-specific syntax extensions.
memcpy() isn’t a hack, it’s always well-defined, while aliasing can never be well-defined in all cases. Promoting aliasing as a rule is like promoting using the equality operator between floats – it can never work in all cases, though it may be possible to define meaningful behavior in specific cases. Promoting aliasing as a rule is promoting the false idea that C is a thin layer above contemporary architectures; it isn’t. Struct memory is not necessarily the same as array memory, not every machine that C supports can dereference an int32 inside of an int64, and not every machine can dereference an int32 at any offset. Do you want C to die with x86_64 or do you want C to live?
Optimizations don’t need to be guaranteed when the code isn’t even correct in the first place. First make sure your code is correct, then worry about optimizing. You talk about alignment modifiers but they are rarely used, and usually they are used after a bug has already occurred. Code should be correct first, and memcpy() is the rule we should be promoting since it is always correct. Optimizers can meticulously add aliasing for specific cases once a bottleneck has been demonstrated. You’re solving a non-problem by indulging in premature optimization.
Do you want C to die with x86_64 or do you want C to live?
Heh I bet you’d get quite varied answers to this one here
The memcpy hack is a hack because the programmer is supposed to write a copy of A to B and then back to A and rely on the optimizer to skip the copy and delete the buffer. So unoptimized, the code may fault on stack overflow for data structures that exist only to make compiler writers happier. And with a novel architecture, if the programmer wants to take advantage of a new capability - say 512-bit SIMD instructions - she can wait until the compiler has added it to its toolset and be happy with how it is used.
As for this not working in all cases: Big deal. C is not supposed to hide those things. In fact, the compiler has no idea if the memory is device memory with restrictions on how it can be addressed or memory with a copy on write semantics or …. You want C to be Pascal or Java and then announce that making C look like Pascal or Java can only be solved at the expense of making C unusable for low level programming. Which programming communities are asking for such insulation? None. C works fine on many architectures. C programmers know the difference between portable and non-portable constructs. C compilers can take advantage of SIMD instructions without requiring C programmers to give up low level memory access - one of the key advantages of programming in C. Basically, people who don’t like C are trying to turn C into something else and are offended that few are grateful.
You aren’t writing a copy of a buffer back and forth. In your example, you are reducing an encoding of a buffer into a checksum. You are only copying one way, and that is for the sake of normalization. All SIMD code works that way: you always must copy into SIMD registers before doing SIMD operations. In your example, the aliasing code doesn’t resemble SIMD code, syntactically or semantically, as much as the memcpy() code does, and in fact requires a smarter compiler to transform.
The chance of overflowing the stack is remote, since stacks now automatically grow and structs tend to be < 512 bytes, but if that is a legitimate concern you can do what you already do to avoid that situation, either use a static buffer (jeopardizing reentrancy) or use malloc().
By liberally using aliasing, you are assuming a specific implementation or underlying architecture. My point is that in general you cannot assume arbitrary internal addresses of a struct can always be dereferenced as int32s, so in general that should not be practiced. In specific cases you can alias, but those are the exceptions not the rule.
All copies on some architectures reduce to: load into register, store from register. So what? That is why we have a high level language which can translate *x = *y efficiently. The pointer alias code directly shows programmer intent. The memcpy code does not. The “sake of normalization” is just another way of saying “in order to cooperate with the fiction that the inconsistency in the standard produces”.
In many contexts, stacks do NOT automatically grow. Again, C is not Java. OS code, drivers, embedded code, even many applications for large systems - all need control over stack size. Triggering stack growth may even turn out to be a security failure for encryption code, which is almost universally written in C because in C you can assure time invariance (or you could, until the language lawyers decided to improve it). Your proposal that programmers not only use a buffer, but use a malloc’ed buffer, in order to allow the optimizer (they hope) not to use it, is ridiculous and is a direct violation of the C model.
“3. C code can be non-portable. Although it strove to give programmers the opportunity to write truly portable programs, the Committee did not want to force programmers into writing portably, to preclude the use of C as a “high-level assembler;” the ability to write machine-specific code is one of the strengths of C. It is this principle which largely motivates drawing the distinction between strictly conforming program and conforming program.” ( http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2021.htm)
Give me an example of an architecture where a properly aligned structure with sizeof(struct x) % sizeof(int32) == 0 cannot be accessed by int32s? Maybe the Itanium, but I doubt it. Again: every major OS turns off strict aliasing in the compilers, and they seem to work. Furthermore, the standard itself permits aliasing via char* (as another hack). In practice, more architectures have trouble addressing individual bytes than addressing int32s.
I’d really like to see more alias analysis optimization in C code (and more optimization from static analysis) but this poorly designed, badly thought through approach we have currently is not going to get us there. To solve any software engineering problem, you have to first understand the use cases instead of imposing some synthetic design.
Anyways, off to the airport. Later. vy
I’m willing to agree with you that the aliasing version more clearly shows intent in this specific case but then I ask, what do you do when the code aliases a struct that isn’t properly aligned? There are a lot of solutions but in the spirit of C, I think the right answer is that it is undefined.
So I think what you want is the standard to define one specific instance of previously undefined behavior. I think in this specific case, it’s fair to ask for locally aliasing an int32-aligned struct pointer to an int32 pointer to be explicitly defined by the standards committee. What I think you’re ignoring, however, is all the work the standards committee has already done to weigh the implications of defining behavior like that. At the very least, it’s not unlikely that there will be machines in the future where implementing the behavior you want will be non-trivial. Couple that with the burden of a more complex standard. So maybe the right answer to maximize global utility is to leave it undefined and to let optimization-focused coders use implementation-defined behavior when it matters but, as I’m arguing, use memcpy() by default. I tend to defer to the standards committees because I have read many of their feature proposals and accompanying rationales and they are usually pretty thorough and rarely miss things that I don’t miss.
Everybody arguing here loves C. You shouldn’t assume the standards committee is dumb or that anyone here wants C to be something it’s not. As much as you may think otherwise, I think C is good as it is and I don’t want it to be like other languages. I want C to be a maximally portable implementation language. We are all arguing in good faith and want the best for C, we just have different ideas about how that should happen.
what do you do when the code aliases a struct that isn’t properly aligned? There are a lot of solutions but in the spirit of C, I think the right answer is that it is undefined.
Implementation dependent.
Couple that with the burden of a more complex standard.
The current standard on when an lvalue works is complex and murky. WG14 discussion on how it applies shows that it’s not even clear to them. The exception for char pointers was hurriedly added when they realized they had made memcpy impossible to implement. It seems as if malloc can’t be implemented in conforming C (there is no method of changing storage type to reallocate it).
C would benefit from more clarity on many issues. I am very sympathetic to making pointer validity more transparent and well defined. I just think the current approach has failed and the C89 error has not been fixed but made worse. Also, restrict has been fumbled away.
The chance of overflowing the stack is remote, since stacks now automatically grow and structs tend to be < 512 bytes, but if that is a legitimate concern you can
… just copy the ints out one at a time :) https://godbolt.org/g/g8s1vQ
The compiler largely sees this as a (legal) version of the OP’s code, so there’s basically zero chance it won’t be optimised in exactly the same way.
You don’t need a large buffer. You can memcpy the integers used for the calculation out one at a time, rather than memcpy’ing the entire struct at once.
Your designation of using memcpy as a “stupid hack” is pretty biased. The code you posted can go wrong, legitimately, because of course it invokes undefined behaviour, and is more of a hack than using memcpy is. You’ve made it clear that you think the aliasing rules should be changed (or shouldn’t exist) but this “evidence” you’ve given has clearly been debunked.
Funny use of “debunked”. You are using circular logic. My point was that this aliasing method is clearly amenable to optimization and vectorization - as seen. Therefore the argument for strict alias in the standard seems even weaker than it might. Your point seems to be that the standard makes aliasing undefined so aliasing is bad. Ok. I like your hack around the hack. The question is: why should C programmers have to jump through hoops to avoid triggering dangerous “optimizations”? The answer: because it’s in the standard, is not an answer.
Funny use of “debunked”. You are using circular logic. My point was that this aliasing method is clearly amenable to optimization and vectorization - as seen
You have shown a case where, if the strict aliasing rule did not exist, some code could [edit] still [/edit] be optimised and vectorised. That I agree with, though nobody claimed that the existence of the strict aliasing rule was necessary for all optimisation and vectorisation, so it’s not clear what you do think this proves. Your title says that the optimisation is BECAUSE of aliasing, which is demonstrably false. Hence, debunked. Why is that “funny”? And how is your logic any less circular than mine?
The question is: why should C programmers have to jump through hoops to avoid triggering dangerous “optimizations”?
Characterising optimisations as “dangerous” already implies that the code was correct before the optimisation was applied and that the optimisation can somehow make it incorrect. The logic you are using relies on the code (such as what you’ve posted) being correct - which it isn’t, according to the rules of the language (which, yes, are written in a standard). But why is using memcpy “jumping through hoops” whereas casting a pointer to a different type of pointer and then de-referencing it not? The answer is, as far as I can see, because you like doing the latter but you don’t like doing the former.
sigh
So, this has already sparked a discussion about taste, freedom of speech, the whole thing.
The joke in question is bad, very bad. It’s plain unfitting, and it isn’t even remotely funny. It’s US-centric. RMS, the person making and subsequently claiming it, has a history of making sexual and other inappropriate commentary (e.g. arguing eugenics). His quoted comment about childbirth is another example of RMS speaking about things he probably doesn’t have a very qualified opinion on. Most (all?) of the people mentioned in the article discussing the issue will never be affected by this in the real world. Seriously, I expect one of those people to stand up and say “You know what? We aren’t even the right group to discuss that in!”.
And this is the issue on which he pulls his authority card? Seriously? For a bad joke that was already shit in the 90s? That - even ignoring the punchline being terrible - just plain isn’t funny? Which boundary does that cross? Probably his ego’s.
Seriously, this is a tech manual. This is the place where you can finally have your “let’s just talk tech here”. And there, this discussion comes up?
The thing I find weird is the clear generational gap in Internet users that mean that people end up talking past each other.
For older people who grew up thinking that Sendmail m4 macros were somehow intuitive, and that C was the new hotness, this is not a joke about abortion. It’s about censorship. That’s the hill RMS thinks he’s dying on. Removing the joke is at the risk of putting words in his mouth, censoring the manual.
Of course, the younger people who live in a world where Javascript isn’t ridiculous to use on a server, where everything-as-a-service is the norm, demand takedowns of things outside of their Overton window. To them, it’s a matter of not having a frankly disgusting joke about the very real problems of abortion in the US in a technical manual that has nothing to do with those problems. They don’t understand the culture in which GNU was founded; they believe that it is RMS’ job to change to fit their culture.
This is what happens when an unstoppable force meets an immovable object. I’m just not sure who plays which part here. There is a reasonable answer, and the good news for the kids is that this has happened before several times: fork glibc. Fork it to remove RMS’ influence from the project and fork it to remove the offending text (for people that want it removed).
Even as a commentary about censorship, it’s pretty freaking oblique. It should be removed on the technical grounds that it’s inefficient GNU crap.
Stallman is pretty freaking oblique at the best of times when it comes to his sense of humour. Saying that GNU is full of inefficient crap is like saying that water is wet, or that the Linux kernel is a bug-ridden dumpster fire.
If every GNU inefficiency was removed, it’d be BSD.
It should be removed on the technical grounds that it’s inefficient GNU crap.
Nobody forces you to use GNU crap.
But GNU is, and has always been, openly political.
You are free to use software that is apparently neutral, if you don’t like it.
And you have plenty of choice on the market: Microsoft, Apple, Google… all are pretty ready to serve your needs (and collect your data for whatever purpose, and lobby for DRM, and so on…)
But “as a commentary about censorship”, that joke is perfectly fine.
Nobody forces you to use GNU crap.
The fact that you are saying this to tedu (an OpenBSD developer) is kind of funny.
I’m fine with GNU being a political project. Indeed, I actively advocate for projects to make their mind up.
But “as a commentary about censorship”, that joke is perfectly fine.
A lot of the project itself does not seem to agree, especially in the context of having it in the documentation. Except RMS, who pulls rank over a joke that he himself made. Which makes the GNU project his personal opinion/joke vehicle.
Except RMS, who pulls rank over a joke that he himself made. Which makes the GNU project his personal opinion/joke vehicle.
I don’t see the point you’re making here? The GNU project was always an expression of political views that were, originally, personal to RMS. If the project ran by majority consensus it would have given up on the whole free software thing a long time ago.
Using your “Rust Community Team” hat here is crass, and only reinforces some people’s beliefs (myself included) about these types of thought police organizations.
I sure hope the non-“Rust Community Team” people show less virtue signalling. It puts your project under a terrible spotlight.
FWIW, I find the use of the hat inappropriate here as well.
That being said, as discussed below, I think it depends on what you think the hat means, exactly. It seems Florian uses the hat differently than many here might expect.
Be that as it may, when the people who have written the code (glibc was originally written by someone else (not RMS), and Ulrich Drepper is now responsible for something like 70% of the code) and make it all work ask you to back off, it’s a stupid hill to die on. Yeah, you might win the battle, but you’ll lose the war.
Last time something like this happened, everyone switched to using eglibc and it wasn’t until the RMS-mandated steering committee was dissolved that people switched back to glibc. If RMS decides to be a jerk about things, watch everyone fork it again or sink their resources into musl.
There’s being right, and there’s being so egotistical that you burn down the house because you didn’t get your way.
He has veto power for precisely these cases where “everyone else” disagrees, so I don’t think it’s a stupid hill to die on. In any case, I agree with you, RMS will lose this war, this is just the beginning.
Vetoing the removal of a little-used architecture with heavy maintenance burden because they want to support those few users is a good hill to die on. Vetoing the removal of a joke that everyone else wants to remove from the manual and doesn’t in any way affect the operation of the library is a stupid hill to die on.
That’s in your opinion. If you care the culture of your project not taking itself so seriously, I think it’s a good hill to die on.
As a participant in Rust Community and a proponent of eugenics, your use of Rust Community Team hat makes me uncomfortable. Was it necessary? Are you really speaking for Rust Community Team here? I hope my eugenics advocacy won’t affect my Rust participation.
As for the joke, the joke is clearly about censorship and not about abortion. I think attempt to censor the joke makes it more relevant.
As for the joke, the joke is clearly about censorship and not about abortion.
Jokes, by their nature, are not clear and subject to cultural background and education. In my opinion, it’s a bit condescending to claim that it has universal understanding and appeal.
I think attempt to censor the joke makes it more relevant.
The origin of the patch seems to be the person just didn’t think it relayed any meaningful information to a user of the function. I don’t think that falls into common usage of “censorship”.
I don’t think that falls into common usage of “censorship”.
Yes, and I have yet to see a documentation patch forced on a project by a state.
On FOSS social issues, I generally put the hat on here. As my work for the Rust project is social, judging which of these issues I should put the hat on would only lead to problems. I’m fine with people knowing my affiliation and I think it’s more honest for people to know it. I don’t speak for the team, but I am a member of the team.
On Eugenics: it’s, in my view, an only thinly veiled form of Ableism, and as such opposed to the goal of being inclusive, especially of people with disabilities. Many forms fundamentally attack the right to live of people with disabilities, for example by arguing for their abortion.
Just to be clear on which comment by RMS I’m referring to (on people with Trisomy 21):
If you’d like to love and care for a pet that doesn’t have normal human mental capacity, don’t create a handicapped human being to be your pet. Get a dog or a parrot…
If you want to support that comment, go ahead.
I support the idea behind the comment. Given medical acceptance of prenatal screening of trisomy 21, this is one of less extreme among RMS’s positions.
I agree the expression of the idea in the comment you quoted leaves a lot to be desired.
Prenatal screening for trisomy 21 is generally accepted as a way to increase survival chances for the fetus.
Trisomy 21 increases the risk of heart issues at birth; these can be handled in the proper structure, but would lead to certain death if not addressed promptly.
Some people use it for eugenics (usually with amniocentesis, which kills 1 healthy child out of 200, if I remember correctly).
Now, IMO what RMS means is horrible, disgusting and plain dangerous.
But it’s not related to freedom. And he has the right to think (and say) it.
Prenatal screening for trisomy 21 is generally accepted as a way to increase survival chances for the fetus.
Do you have a citation for your “generally accepted” claim? There appears to be at least some evidence to the contrary:
About 92% of pregnancies in Europe with a diagnosis of Down syndrome are terminated.[14] In the United States, termination rates are around 67%, but this rate varied from 61% to 93% among different populations.[13] Rates are lower among women who are younger and have decreased over time.[13] When nonpregnant people are asked if they would have a termination if their fetus tested positive, 23–33% said yes, when high-risk pregnant women were asked, 46–86% said yes, and when women who screened positive are asked, 89–97% say yes.[75]
This is entirely offtopic here, but I don’t want to flee the question.
My source is my doctor, that incidentally is also my wife.
When the prenatal screening of our second daughter established a 1/350 probability of Down syndrome, she explained to me about amniocentesis, about the risks for the fetus, and about the implications and the medical reasoning behind it. It’s a complex topic and I’m not competent enough to expose it here deeply, but the relevant point was that, while several doctors object to abortion as a murder in contrast with their oath and ethics, prenatal screening is designed to increase the survival of the fetus, so every doctor is fine with it.
On FOSS social issues, I generally put the hat on here. As my work for the Rust project is social, judging which of these issues I should put the hat on would only lead to problems. I’m fine with people knowing my affiliation and I think it’s more honest for people to know it. I don’t speak for the team, but I am a member of the team.
While I do not agree with you on the “joke on documentation” issue, I really support this approach.
Hacking is an ethical and political action.
I hope my eugenics advocacy won’t affect my Rust participation.
If that’s what you think that means, and you advocate for any intelligence-based eugenics, you might want to reconsider your position on eugenics.
This obviously would only affect you if you attempted to add eugenics commentary to the Rust project itself in some way. Same as if you attempted to add any other irrelevant polarizing commentary.
I don’t talk eugenics on Rust space. Not because eugenics is wrong (it isn’t), but because it’s off-topic.
No, it isn’t. By definition.
You might not agree with GNU or with rms here, or you might prefer that glibc would not be a GNU project, but it is.
Fine. But the consensus of the primary maintainers is that it’s off-topic. Therefore it’s off-topic for whatever fork of glibc everyone ends up using. Because if we get another eglibc situation, everyone will use the fork maintained by the maintainers, and no one will use the fork “maintained” by rms.
It’s de facto off-topic for those who accept reality.
Anyone who “accepts reality” in that sense wouldn’t be contributing to GNU in the first place. The project has always been about RMS telling the rest of the world they’re wrong.
See eglibc. A non-GNU fork already happened, and was reintegrated when the issue was dropped.
I don’t see how you can say that those kind of people wouldn’t be contributing to GNU, when they clearly are and that’s what this is all about. If those kind of people wouldn’t be contributing to GNU, then why is there any debate?
There is debate precisely because the people contributing don’t subscribe to your notion that the primary maintainer consensus is all that matters. glibc contributors do care about GNU and RMS, otherwise the eglibc-style fork would already have happened and the project would now be being maintained outside the GNU umbrella.
This is why we can’t have good software. This program could literally have been an empty file, a nothing at all, a name capturing the essence perfectly.
I’m not sure I could disagree more strongly. An empty file only has the true behavior because of a bunch of incredibly non-obvious specific Unix behaviors. It would be equally reasonable for execution of this file to fail (like false) since there’s no hashbang or distinguishable executable format to decide how to handle it. At a somewhat higher level of non-obviousness, it’s really weird that true need be a command at all (and indeed, in almost all shells, it’s not—true is a builtin nearly everywhere).
true being implementable in Unix as an empty file isn’t elegant—it’s coincidental and implicit.
I mean, it’s POSIX specified behavior that any file that is executed that isn’t a loadable binary is passed to /bin/sh (“#!” as the first two bytes results in “implementation-defined” behavior), and it’s POSIX specified behavior that, absent anything else, a shell script exits true.
It’s no more coincidental and implicit than “read(100)” advances the file pointer 100 bytes, or any other piece of standard behavior. Sure, it’s Unix(-like)-specific, but, well, it’s on a Unix(-like) operating system. :)
It’s precisely specified, yes, but it’s totally coincidental that the specification says what it does. A perfectly-reasonable and nearly-equivalent specification in an alternate universe where Thomson and Ritchie sneezed five seconds earlier while deciding how executables should be handled would have precisely the opposite behavior.
On the other hand, if read(100) did anything other than read 100 bytes, that would be extremely surprising and would not have come about from an errant sneeze.
Black Mirror Episode: The year is 2100 and the world is ravaged by global warming. The extra energy aggregated over decades because non executables went through /bin/sh caused the environment to enter the tipping point where the feedback loops turned on. A time machine is invented, where one brave soul goes back in time with a feather, finds Thomson and makes him sneeze, saving humanity from the brink of extinction. But then finds himself going back to 2100 with the world still ravaged. Learns that it was fruitless because of npm and left-pad.
it’s totally coincidental that the specification says what it does.
This is true of literally all software specifications, in my experience.
Surely we can agree that it is far more coincidental that an empty executable returns success immediately than that e.g. read(100) reads 100 bytes?
Why isn’t 100 an octal (or a hex or binary) constant? Why is it bytes instead of machine words? Why is read bound to a file descriptor instead of having a record size from an ioctl, and then reading in 100 records?
Just some examples. :)
Obviously, minor variations are possible. However, in no reasonable (or even moderately unreasonable) world, would read(100) write 100 bytes.
The current (POSIX) specification is the product of historical evolution caused in part by /bin/true itself. You see, in V7 Unix, the kernel did not execute an empty file (or shell scripts); it executed only real binaries. It was up to the shell to run shell scripts, including empty ones. Through a series of generalizations (starting in 4BSD with the introduction of csh), this led to the creation of #! and kernel support for it, and then POSIX requiring that the empty file trick be broadly supported.
This historical evolution could have gone another way, but the current status is not the way it is because people rolled out of bed one day and made a decision; it is because a series of choices turned out to be useful enough to be widely supported, eventually in POSIX, and some choices to the contrary wound up being discarded.
(There was a time when kernel support for #! was a dividing line between BSD and System V Unix. The lack of it in the latter meant that, for example, you could not make a shell script be someone’s login shell; it had to be a real executable.)
The opposite isn’t reasonable though. That would mean every shell script would have to explicitly exit 0 or it will fail.
Every. Shell. Script.
And aside from annoying everyone, that wouldn’t even change anything. It would just make the implementation of true be exit 0, instead of the implementation of false be exit 1.
And read(100) does do something besides read 100 bytes. It reads up to 100 bytes, and isn’t guaranteed to read the full 100 bytes. You must check the return value and use only the amount of bytes read.
It’s not obvious to me that an empty file should count as a valid shell script. It makes code generation marginally easier, I suppose. But I also find something intuitive to the idea that a program should be one or more statements/expressions (or functions if you need main), not zero or more.
So if you run an empty file with sh, you would prefer it exits failure. And when you run an empty file with python, ruby, perl, et al., also failures?
Why should a program have one or more statements/expressions? A function need not have one or more statements/expressions. Isn’t top-level code in a script just a de facto main function?
It’s intuitive to me that a script, as a sequence of statements to run sequentially, could have zero length. A program with an entry point needs to have at least a main function, which can be empty. But a script is a program where the entry point is the top of the file. It “has a main function” if the file exists.
I think whatever the answer is, it makes equal sense for Perl, Python, Ruby, shell, any language that doesn’t require main().
In my opinion, your last argument begs the question. If an empty program is considered valid, then existing is equivalent to having an empty main. If not, then it isn’t.
In any case, I don’t mean to claim that it’s obvious or I’m certain that an empty program should be an error, just that it seems like a live option.
Exactly. It sounds like the arbitrary hackery common in UNIX development. Just imagine writing a semi-formal spec that defines a program as “zero characters” and passing it on to peer review. They’d say it was an empty file, not a program.
I guess true shouldn’t be considered a program. It is definitely tied to the shell it runs in; you wouldn’t call execv("true", {"/bin/true", NULL}) to exit a program correctly, for example. true has no use outside of the shell, so it makes sense to have it use the shell’s features. That is why it now tends to be a builtin. But being a builtin is not specified by POSIX. Executing the file, on the other hand, is, and the spec says the default exit code is 0, or “true”. By executing an empty file, you’re asking the shell to do nothing and then return true. So I guess it is perfectly fine for true to just be an empty file. Now I do agree that such simple behavior has (like often with Unix) way too many ways to be executed, and people are gonna fight about it for quite some time!
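The empty-file behavior is easy to check. A hedged sketch in C, using a throwaway mkstemp file as a stand-in for /bin/true (run_empty_file is an illustrative name, not a real API):

```c
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <unistd.h>

/* Create an empty executable file and run it through the shell.
 * execve() rejects an empty file (ENOEXEC); the shell falls back to
 * interpreting it as a script, reads zero commands, and exits 0,
 * i.e. "true". Returns the exit status, or -1 on setup failure. */
int run_empty_file(void)
{
    char path[] = "/tmp/emptytrueXXXXXX";   /* throwaway stand-in */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    close(fd);
    chmod(path, 0755);            /* mark it executable */
    int status = system(path);    /* sh hits ENOEXEC, falls back */
    unlink(path);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Note that the fallback lives in the shell, not the kernel: a raw execve() of the empty file simply fails with ENOEXEC.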
What about these?
alias true='(exit)'
alias true='/bin/sh /dev/null'
alias true='sh -c "exit $(expr `false;echo $? - $?`)"'
The one true true !
It depends upon the system. There is IEFBR14, a program IBM produced to help make files in JCL which is similar to /bin/true. So there could be uses for such a program.
It also has the distinction of being a program one instruction long that still had a bug in it: the original version simply branched to the return address in register 14 without first zeroing register 15, so the return code was whatever garbage happened to be in that register.
“That is why now it tends to be a builtin.”
Makes sense. If tied to the shell and unusual, I’d probably put something like this into the interpreter of the shell as an extra condition or for error handling. Part of parsing would identify an empty program. Then, either drop or log it. This is how such things are almost always handled.
That would mean every shell script would have to explicitly exit 0 or it will fail.
I don’t see how that follows.
Once the file is actually passed to the shell, it is free to interpret it as it wishes. No reasonable shell language would force users to specify successful exit. But what the shell does is not in question here; it’s what the OS does with an empty or otherwise unrunnable executable, for which I am contending there is no obvious behavior. (In fact, I think the behavior of running it unconditionally with the shell is counterintuitive.)
And read(100) does do something besides read 100 bytes.
You’re being pedantic. Obviously, under some circumstances it will set error codes, as well. It very clearly reads some amount of data, subject to the limitations and exceptions of the system; zero knowledge of Unix is required to intuit that behavior.
I don’t see how that follows.
You claim the exact opposite behavior would have been equally reasonable. That is, the opposite of an empty shell script exiting true. The precise opposite would be an empty shell script—i.e. a script without an explicit exit—exiting false. This would affect all shell scripts.
Unless you meant the opposite of executing a file not loadable as an executable binary by passing it to /bin/sh, in which case I really would like to know what the “precise opposite” of passing a file to /bin/sh would be.
You’re being pedantic. Obviously, under some circumstances it will set error codes, as well. It very clearly reads some amount of data, subject to the limitations and exceptions of the system; zero knowledge of Unix is required to intuit that behavior.
No. Many people assume read will fill the buffer size they provide unless they are reading the trailing bytes of the file. However, read is allowed to return any number of bytes within the buffer size at any time.
It also has multiple result codes that are not errors. Many people assume when read returns -1 that means error. Did you omit that detail for brevity, or was it not obvious to you?
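One concrete case of a -1 that isn’t a failure: on a nonblocking descriptor, read() returns -1 with errno set to EAGAIN (or EWOULDBLOCK) to mean “no data yet, try again later.” A small sketch; read_would_block_on_empty_pipe is an illustrative name of my own:

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if reading an empty nonblocking pipe yields -1 with
 * errno EAGAIN/EWOULDBLOCK -- a "try again" result, not a failure.
 * Returns -1 if the pipe itself could not be created. */
int read_would_block_on_empty_pipe(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    fcntl(fds[0], F_SETFL, O_NONBLOCK);   /* nonblocking read end */

    char buf[8];
    ssize_t n = read(fds[0], buf, sizeof buf);   /* nothing written yet */
    int would_block = (n == -1 &&
                       (errno == EAGAIN || errno == EWOULDBLOCK));

    close(fds[0]);
    close(fds[1]);
    return would_block;
}
```

Treating every -1 as fatal would turn this routine “come back later” signal into a spurious error.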
If a file is marked executable, I think it’s quite intuitive that the system attempt to execute. If it’s not a native executable, the next obvious alternative would be to interpret it, using the default system interpreter.
Saying the behavior is totally (or even partially) coincidental is a bit strong. You’re ignoring the fundamental design constraints around shell language and giving the original designers more credit than they deserve.
Consider this experiment: you pick 100 random people (who have no previous experience with computer languages) and ask them to design a shell language for POSIX. How would all of these languages compare?
If the design constraints I’m talking about didn’t exist, then it would indeed be random and one would expect only ~50% of the experimental shell languages to have a zero exit status for an empty program.
I strongly doubt that is what you would see. I think you would see the vast majority of those languages specifying that an empty program have zero exit status. In that case, it can’t be random, and there must be something intentional or fundamental driving that decision.
I don’t care about how the shell handles an empty file. (Returning successful in that case is basically reasonable, but not in my opinion altogether obvious.) I’m stating that the operating system handling empty executables by passing them to the shell is essentially arbitrary.
The reason for the existence of human intelligence isn’t obvious either but that doesn’t make it random. A hostile environment naturally provides a strong incentive for an organism to evolve intelligence.
As far as the operating system executing non-binaries with “/bin/sh” being arbitrary, fair enough. Though I would argue that once the concepts of the shebang line and an interpreter exist, it’s not far off to imagine the concept of a “default interpreter.” Do you think the concept of a default is arbitrary?
It’s precisely specified, yes, but it’s totally coincidental that the specification says what it does.
laughs That’s really taking an axe to the sum basis of knowledge, isn’t it?
Yes, an empty file signifying true violates the principle of least astonishment. However, if there were a way to have metadata comments about the file describing what it does, how it works, and what version it is, without having any of that in the file itself, we’d have the best of both worlds.
true being implementable in Unix as an empty file isn’t elegant—it’s coincidental and implicit.
But isn’t this in some sense exactly living up to the “unix philosophy”?
To me, the issue is whether it is prone to error. If it is not, it is culture building because it is part of the lore.
Sad, but true, is Joe’s Law: “Frameworks grow in complexity until nobody can use them.”
Similar to Zawinski’s Law:
“Every program attempts to expand until it can read mail. Those programs which cannot so expand are replaced by ones which can.” Coined by Jamie Zawinski (who called it the “Law of Software Envelopment”) to express his belief that all truly useful programs experience pressure to evolve into toolkits and application platforms (the mailer thing, he says, is just a side effect of that). It is commonly cited, though with widely varying degrees of accuracy.