1. 9

    No, you don’t need C aliasing to obtain vector optimization for this sort of code. You can do it with standards-conforming code via memcpy(): https://godbolt.org/g/55pxUS
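
    For readers who don't follow the link, a minimal sketch of the memcpy() pattern might look like the following. It is not the code behind the Godbolt link: `struct record`, `RECORD_WORDS` and `record_checksum` are hypothetical names, and the 64-byte layout is an assumption chosen so the struct divides evenly into 32-bit words.

    ```c
    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical record type (an assumption for illustration): 64 raw bytes. */
    struct record {
        uint8_t bytes[64];
    };

    #define RECORD_WORDS (sizeof(struct record) / sizeof(uint32_t))

    _Static_assert(sizeof(struct record) % sizeof(uint32_t) == 0,
                   "record must divide evenly into 32-bit words");

    /* Sum the record as 32-bit words without casting the struct pointer. */
    uint32_t record_checksum(const struct record *r)
    {
        uint32_t words[RECORD_WORDS];
        uint32_t sum = 0;

        assert(r != NULL);
        /* memcpy() is well-defined regardless of how r is aligned. */
        memcpy(words, r, sizeof words);
        for (size_t i = 0; i < RECORD_WORDS; i++)
            sum += words[i];
        return sum;
    }
    ```

    Whether the copy is actually elided depends on the compiler and optimization level, so it is worth checking the generated assembly for the targets you care about.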

    1. 2

      Wow, it’s actually completely optimizing out the memcpy()? While awesome, that’s the kind of optimization I hate to depend on. One little seemingly inconsequential nudge and the optimizer might not be able to prove that’s safe, and suddenly there’s an additional O(n) copy silently going on.

      1. 2

        memset/memcpy get optimized out a lot, hence libraries making things like this: https://monocypher.org/manual/wipe
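
        The flip side shows up when zeroing secrets: a memset() on a buffer that is never read again may be removed as a dead store. One common workaround, sketched below, writes through a volatile-qualified pointer. `wipe_sketch` is a hypothetical name and this is not taken from Monocypher's source; where available, explicit_bzero() or C11 Annex K memset_s() are alternatives.

        ```c
        #include <assert.h>
        #include <stddef.h>

        /* Zero `size` bytes at `secret` using volatile stores, which compilers
         * in practice do not remove as dead stores. */
        void wipe_sketch(void *secret, size_t size)
        {
            volatile unsigned char *p = secret;

            assert(secret != NULL || size == 0);
            for (size_t i = 0; i < size; i++)
                p[i] = 0;
        }
        ```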

        1. 1

          Actually it’s not optimizing it out, it’s simply allocating the auto array into SIMD registers. You always must copy data into SIMD registers first before performing SIMD operations. The memcpy() code resembles a SIMD implementation more than the aliasing version.

        2. 1

          You can - and thanks for the illustration - but the memcpy is antithetical to the C design paradigm in my always humble opinion. And my point was not that you needed aliasing to get the vector optimization, but that aliasing does not interfere with the vector optimization.

          1. 8

            I’m sorry but the justifications for your opinion no longer hold. memcpy() is the only unambiguous and well-defined way to do this. It also works across all architectures and input pointer values without having to worry about crashes due to misaligned accesses, while your code doesn’t. Both gcc and clang are now able to optimize away memcpy() and auto vars. An opinion here is simply not relevant; invoking undefined behavior when it increases risk for no benefit is irrational.

            1. -2

              Au contraire. As I showed, the C standard does not need to graft on a clumsy and painful anti-alias mechanism, and programmers don’t need to go through stupid contortions with allocation of buffers that disappear under optimization, because the compiler does not need it. My code doesn’t have alignment problems. The justification for pointer alias rules is false. The end.

              1. 10

                There are plenty of structs that only contain shorts and char, and in those cases employing aliasing as a rule would have alignment problems while the well-defined version wouldn’t. It’s not the end, you’re just in denial.

                1. -2

                  In those cases, you need to use an alignment modifier or sizeof. No magic needed. There is a reason that both gcc and clang have been forced to support -fno-strict-aliasing and now both support may_alias. The memcpy trick is a stupid hack that can easily go wrong - e.g. one is not guaranteed that the compiler will optimize away the buffer, and a large buffer could overflow the stack. You’re solving a non-problem by introducing complexity and opacity.

                  1. 10

                    In what world is memcpy() magic and alignment modifiers aren’t? memcpy() is an old standard library function, alignment modifiers are compiler-specific syntax extensions.

                    memcpy() isn’t a hack, it’s always well-defined while aliasing can never be well-defined in all cases. Promoting aliasing as a rule is like promoting using the equality operator between floats – it can never work in all cases, though it may be possible to define meaningful behavior in specific cases. Promoting aliasing as a rule is promoting the false idea that C is a thin layer above contemporary architectures, it isn’t. Struct memory is not necessarily the same as array memory, not every machine that C supports can dereference an int32 inside of an int64, not every machine can dereference an int32 at any offset. Do you want C to die with x86_64 or do you want C to live?

                    Optimizations don’t need to be guaranteed when the code isn’t even correct in the first place. First make sure your code is correct, then worry about optimizing. You talk about alignment modifiers but they are rarely used, and usually they are used after a bug has already occurred. Code should be correct first, and memcpy() is the rule we should be promoting since it is always correct. Optimizers can meticulously add aliasing for specific cases once a bottleneck has been demonstrated. You’re solving a non-problem by indulging in premature optimization.

                    1. 2

                      Do you want C to die with x86_64 or do you want C to live?

                      Heh I bet you’d get quite varied answers to this one here

                      1. -1

                        The memcpy hack is a hack because the programmer is supposed to write a copy of A to B and then back to A and rely on the optimizer to skip the copy and delete the buffer. So unoptimized the code may fault on stack overflows for data structures that exist only to make the compiler writers happier. And with a novel architecture, if the programmer wants to take advantage of a new capability - say 512-bit SIMD instructions - she can wait until the compiler has added it to its toolset and be happy with how it is used.

                        As for this not working in all cases: Big deal. C is not supposed to hide those things. In fact, the compiler has no idea if the memory is device memory with restrictions on how it can be addressed or memory with a copy on write semantics or …. You want C to be Pascal or Java and then announce that making C look like Pascal or Java can only be solved at the expense of making C unusable for low level programming. Which programming communities are asking for such insulation? None. C works fine on many architectures. C programmers know the difference between portable and non-portable constructs. C compilers can take advantage of SIMD instructions without requiring C programmers to give up low level memory access - one of the key advantages of programming in C. Basically, people who don’t like C are trying to turn C into something else and are offended that few are grateful.

                        1. 4

                          You aren’t writing a copy of a buffer back and forth. In your example, you are reducing an encoding of a buffer into a checksum. You are only copying one way, and that is for the sake of normalization. All SIMD code works that way, you always must copy into SIMD registers first before doing SIMD operations. In your example, the aliasing code doesn’t resemble SIMD code, either syntactically or semantically, as much as the memcpy() code does, and in fact requires a smarter compiler to transform.

                          The chance of overflowing the stack is remote, since stacks now automatically grow and structs tend to be < 512 bytes, but if that is a legitimate concern you can do what you already do to avoid that situation, either use a static buffer (jeopardizing reentrancy) or use malloc().

                          By liberally using aliasing, you are assuming a specific implementation or underlying architecture. My point is that in general you cannot assume arbitrary internal addresses of a struct can always be dereferenced as int32s, so in general that should not be practiced. In specific cases you can alias, but those are the exceptions not the rule.

                          1. 1

                            The chance of overflowing the stack is remote, since stacks now automatically grow and structs tend to be < 512 bytes, but if that is a legitimate concern you can

                            … just copy the ints out one at a time :) https://godbolt.org/g/g8s1vQ

                            The compiler largely sees this as a (legal) version of the OP’s code, so there’s basically zero chance it won’t be optimised in exactly the same way.
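
                            A minimal sketch of that word-at-a-time idea, assuming a plain additive checksum (`checksum_words` is a hypothetical name, and this is not the code behind the link above):

                            ```c
                            #include <assert.h>
                            #include <stddef.h>
                            #include <stdint.h>
                            #include <string.h>

                            /* Sum `len` bytes at `p` as 32-bit words; trailing bytes (len % 4) are ignored. */
                            uint32_t checksum_words(const void *p, size_t len)
                            {
                                const unsigned char *bytes = p;
                                uint32_t sum = 0;

                                assert(p != NULL || len == 0);
                                for (size_t off = 0; off + sizeof(uint32_t) <= len; off += sizeof(uint32_t)) {
                                    uint32_t w;
                                    /* Copy one word at a time: no large buffer, no alignment assumption. */
                                    memcpy(&w, bytes + off, sizeof w);
                                    sum += w;
                                }
                                return sum;
                            }
                            ```

                            Because only a single `uint32_t` temporary is involved, there is no struct-sized stack buffer even in an unoptimized build.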

                            1. 0

                              All copies on some architectures reduce to: load into register, store from register. So what? That is why we have a high level language which can translate *x = *y efficiently. The pointer alias code directly shows programmer intent. The memcpy code does not. The “sake of normalization” is just another way of saying “in order to cooperate with the fiction that the inconsistency in the standard produces”.

                              In many contexts, stacks do NOT automatically grow. Again, C is not Java. OS code, drivers, embedded code, even many applications for large systems - all need control over stack size. Triggering stack growth may even turn out to be a security failure for encryption which is almost universally written in C because in C you can assure time invariance (or you could until the language lawyers decided to improve it). Your proposal that programmers not only use a buffer, but use a malloced buffer, in order to allow the optimizer (they hope) not to use it, is ridiculous and is a direct violation of the C model.

                              “3. C code can be non-portable. Although it strove to give programmers the opportunity to write truly portable programs, the Committee did not want to force programmers into writing portably, to preclude the use of C as a “high-level assembler;” the ability to write machine-specific code is one of the strengths of C. It is this principle which largely motivates drawing the distinction between strictly conforming program and conforming program.” ( http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2021.htm)

                              Give me an example of an architecture where a properly aligned structure where sizeof(struct x)%sizeof(int32) == 0 cannot be accessed by int32s? Maybe the Itanium, but I doubt it. Again: every major OS turns off strict aliasing in the compilers and they seem to work. Furthermore, the standard itself permits aliasing via char* (as another hack). In practice, more architectures have trouble addressing individual bytes than addressing int32s.

                              I’d really like to see more alias analysis optimization in C code (and more optimization from static analysis) but this poorly designed, badly thought through approach we have currently is not going to get us there. To solve any software engineering problem, you have to first understand the use cases instead of imposing some synthetic design.

                              Anyways, off to the airport. Later. vy

                              1. 2

                                I’m willing to agree with you that the aliasing version more clearly shows intent in this specific case but then I ask, what do you do when the code aliases a struct that isn’t properly aligned? There are a lot of solutions but in the spirit of C, I think the right answer is that it is undefined.

                                So I think what you want is the standard to define one specific instance of previously undefined behavior. I think in this specific case, it’s fair to ask for locally aliasing an int32-aligned struct pointer to an int32 pointer to be explicitly defined by the standards committee. What I think you’re ignoring, however, is all the work the standards committee has already done to weigh the implications of defining behavior like that. At the very least, it’s not unlikely that there will be machines in the future where implementing the behavior you want will be non-trivial. Couple that with the burden of a more complex standard. So maybe the right answer to maximize global utility is to leave it undefined and to let optimization-focused coders use implementation-defined behavior when it matters but, as I’m arguing, use memcpy() by default. I tend to defer to the standards committees because I have read many of their feature proposals and accompanying rationales and they are usually pretty thorough and rarely miss things that I don’t miss.

                                Everybody arguing here loves C. You shouldn’t assume the standards committee is dumb or that anyone here wants C to be something it’s not. As much as you may think otherwise, I think C is good as it is and I don’t want it to be like other languages. I want C to be a maximally portable implementation language. We are all arguing in good faith and want the best for C, we just have different ideas about how that should happen.

                                1. 1

                                  what do you do when the code aliases a struct that isn’t properly aligned? There are a lot of solutions but in the spirit of C, I think the right answer is that it is undefined.

                                  Implementation dependent.

                                  Couple that with the burden of a more complex standard.

                                  The current standard on when an lvalue works is complex and murky. WG14 discussion on how it applies shows that it’s not even clear to them. The exception for char pointers was hurriedly added when they realized they had made memcpy impossible to implement. It seems as if malloc can’t be implemented in conforming C (there is no method of changing storage type to reallocate it).

                                  C would benefit from more clarity on many issues. I am very sympathetic to making pointer validity more transparent and well defined. I just think the current approach has failed and the C89 error has not been fixed but made worse. Also restrict has been fumbled away.

                        2. 2

                          You don’t need a large buffer. You can memcpy the integers used for the calculation out one at a time, rather than memcpy’ing the entire struct at once.

                          Your designation of using memcpy as a “stupid hack” is pretty biased. The code you posted can go wrong, legitimately, because of course it invokes undefined behaviour, and is more of a hack than using memcpy is. You’ve made it clear that you think the aliasing rules should be changed (or shouldn’t exist) but this “evidence” you’ve given has clearly been debunked.

                          1. 0

                            Funny use of “debunked”. You are using circular logic. My point was that this aliasing method is clearly amenable to optimization and vectorization - as seen. Therefore the argument for strict alias in the standard seems even weaker than it might. Your point seems to be that the standard makes aliasing undefined so aliasing is bad. Ok. I like your hack around the hack. The question is: why should C programmers have to jump through hoops to avoid triggering dangerous “optimizations”? The answer: because it’s in the standard, is not an answer.

                            1. 3

                              Funny use of “debunked”. You are using circular logic. My point was that this aliasing method is clearly amenable to optimization and vectorization - as seen

                              You have shown a case where, if the strict aliasing rule did not exist, some code could [edit] still [/edit] be optimised and vectorised. That I agree with, though nobody claimed that the existence of the strict aliasing rule was necessary for all optimisation and vectorisation, so it’s not clear what you do think this proves. Your title says that the optimisation is BECAUSE of aliasing, which is demonstrably false. Hence, debunked. Why is that “funny”? And how is your logic any less circular than mine?

                              The question is: why should C programmers have to jump through hoops to avoid triggering dangerous “optimizations”?

                              Characterising optimisations as “dangerous” already implies that the code was correct before the optimisation was applied and that the optimisation can somehow make it incorrect. The logic you are using relies on the code (such as what you’ve posted) being correct - which it isn’t, according to the rules of the language (which, yes, are written in a standard). But why is using memcpy “jumping through hoops” whereas casting a pointer to a different type of pointer and then de-referencing it not? The answer is, as far as I can see, because you like doing the latter but you don’t like doing the former.

                      2. 1

                        The end.

                        The internet has no end.

                1. 18

                  I no longer believe that daemons should fork into the background. Most Unix systems now have better service control and it makes the code easier to deal with if it doesn’t call fork(). This makes it easier to test (no longer do you have to provide an option not to fork() or an option to fork()) and less code is always better.

                  1. 6

                    Not forking also allows logging to be an external concern and the process should just write to stdout and stderr as normal.

                    1. 1

                      This is not so much about the forking per se, but rather the other behaviour that generally goes with it: closing any file descriptors that might be connected to a controlling terminal.

                    2. 4

                      OpenBSD’s rc system seems to expect that processes fork. I don’t see an obvious workaround for processes that don’t fork.

                      1. 3

                        It’s not that hard to write a program to do the daemonization (call umask(), setsid(), chdir(), set up any redirection of stdin, stdout and stderr, then exec() the non-forking daemon).
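
                        A rough sketch of such a wrapper, under a few assumptions of mine: `detach_and_exec` is a hypothetical name, the fork() is added because setsid() fails when the caller is already a process group leader, and stdio is simply pointed at /dev/null rather than at a log file.

                        ```c
                        #include <assert.h>
                        #include <errno.h>
                        #include <fcntl.h>
                        #include <sys/stat.h>
                        #include <sys/types.h>
                        #include <unistd.h>

                        /*
                         * Detach from the controlling terminal, then exec argv[0].
                         * The original (parent) process exits with status 0 once the child exists.
                         * Returns -1 with errno set on failure; if that happens after the fork,
                         * the caller is running in the child and should _exit().
                         */
                        int detach_and_exec(char *const argv[])
                        {
                            if (argv == NULL || argv[0] == NULL) {
                                errno = EINVAL;
                                return -1;
                            }

                            pid_t pid = fork();
                            if (pid < 0)
                                return -1;
                            if (pid > 0)
                                _exit(0);               /* parent hands control back to the rc script */

                            if (setsid() < 0)           /* new session, no controlling terminal */
                                return -1;
                            assert(getsid(0) == getpid());

                            umask(022);
                            if (chdir("/") < 0)
                                return -1;

                            /* Redirect stdin, stdout and stderr (assumption: to /dev/null). */
                            int fd = open("/dev/null", O_RDWR);
                            if (fd < 0)
                                return -1;
                            if (dup2(fd, STDIN_FILENO) < 0 || dup2(fd, STDOUT_FILENO) < 0 ||
                                dup2(fd, STDERR_FILENO) < 0)
                                return -1;
                            if (fd > STDERR_FILENO)
                                close(fd);

                            execvp(argv[0], argv);      /* replaces this process with the daemon */
                            return -1;                  /* only reached if the exec failed */
                        }
                        ```

                        A `main()` for the wrapper would pass `argv + 1` to this function and call `_exit(1)` if it returns.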

                        1. 2

                          It’s even simpler when you have daemon(3): http://man7.org/linux/man-pages/man3/daemon.3.html

                          1. 1

                            Which you do on OpenBSD, actually.

                            Note that daemon(3) is a non-standard extension so it should be avoided for portable code. The implementation is simple enough, though.

                        2. 2

                          I’m not sure this is accurate, at least on -current. There are several Go “daemons” that as far as I understand don’t support fork(2). These can still be managed by OpenBSD’s rc system:

                          # cd /etc/rc.d
                          # cat grafana
                          #!/bin/ksh
                          #
                          # $OpenBSD: grafana.rc,v 1.2 2018/01/11 19:27:10 rpe Exp $
                          
                          daemon="/usr/local/bin/grafana-server"
                          daemon_user="_grafana"
                          daemon_flags="-homepath /usr/local/share/grafana -config /etc/grafana/config.ini"
                          
                          . /etc/rc.d/rc.subr
                          
                          rc_bg=YES
                          rc_reload=NO
                          
                          rc_cmd $1
                          

                          I’m not sure if there’s more to it that I don’t understand, I don’t write many daemons!

                          1. 1

                            Well, it turns out, I can’t read! The key to this is rc_bg, see https://man.openbsd.org/rc.subr#ENVIRONMENT

                        3. 1

                          For those that don’t know, daemontools is a nice service system that explicitly wants programs to not try to daemonize themselves. For services I build and run I try to use that.

                        1. 7

                          Do not prematurely introduce dependencies.

                          Worrying about performance is vastly overvalued. Good performance is a by-product of pursuing other goals, mainly simplicity.

                          This is largely true, based on my experience. Good advice to follow.

                          Do not use Markdown. [The essay referred to in the slides.]

                          I think the author is a bit harsh on Markdown, especially if you read the linked essay. It was designed for writing for the web, and was meant to be converted to HTML only. The author criticizes it for lacking things it was never meant to have.

                          But I do agree that it’s terrible for technical documentation and it’s a travesty that it’s become the norm in that respect. I recently switched to using Markdown for some internal documentation at work mostly because Bitbucket will render it and it (hopefully) encourages others on the team to write docs. When I’m writing with it, though, it’s a mess. No description lists, linking between things is ugly, and you have to manually make things like a table of contents. I’d switch to mdoc but then I’m worried no one else will bother to write anything. Maybe that will happen anyway with the cruddiness of Markdown, so perhaps it won’t matter.

                          (Bonus: in the linked essay, I learned that GNU is apparently going to kill off info. No arguments here!)

                          1. 4

                            it’s a travesty that it’s become the norm in that respect

                            I believe we mostly have GitHub becoming the norm to thank for that. Its native formatting for rich presentation is Markdown via the README.md. With GitHub being massively adopted and Markdown being the path of least resistance, it’s hardly surprising how this came to be. So that hampers mdoc adoption.

                            Because everybody writes their documentation in README.md now (if at all), they also expect Markdown-to-man-page converters. Those emit man(7) more or less by necessity. People unfamiliar with mandoc won’t care, but those who are familiar with it may be annoyed by the semantic information that is lost. Not only because mandoc produces worse HTML output because of it, but also because mandoc’s semantic search won’t work for those man pages. However, the group that is unfamiliar with mandoc intersects more or less entirely with the group of people who write Markdown exclusively. This further hampers mdoc adoption.

                            We have a man page and documentation problem. And there doesn’t seem to be a way to help it.

                            1. 13

                              You’ve long been able to render AsciiDoc, org, rST, among other lightweight markup languages, to HTML with GitHub. For example:

                              I dunno why Markdown “won”, but maybe it had something to do with:

                              • (at one time) an informal spec, easy to implement (though, probably, with bugs) in a couple days
                              • the use by Reddit for its comment system (compare w/ bbcode)
                              • good marketing - gruber was already a known figure when he unleashed markdown, was able to publicize it with his widely-read personal blog
                            2. 2

                              (Bonus: in the linked essay, I learned that GNU is apparently going to kill off info. No arguments here!)

                              Looks like that was from 2014 and I can’t tell if the proposed replacement is genuine or a very bad joke. Either way, it doesn’t feel like much has changed in 4 years unfortunately.

                              1. 1

                                you have to manually make things like a table of contents

                                This is a specific solution (VSCode specific), but I discovered markdown-toc recently and I like it. There are other tools that can be used to add a TOC to a Markdown file based on headings.

                                I actually find Markdown convenient for Readmes and user land documentation. Because I can get by with very little markup and the markup reads fine as text it encourages me to write.

                                I can totally see how it would be a pain for technical documentation for code bases, but I suspect you would use .rst and an assist from an automatic documentation generator for that.

                                1. 1

                                  I am regretting the Markdown choice already and I’ll probably switch to something else that can export Markdown so it can be read directly while browsing the repo.

                                  So yeah, whether that’s .rst or not remains to be seen, but the pattern is the same.

                              1. 2

                                I love how “boring” syspatch is. freebsd-update is certainly more complex, but does a pretty solid job itself.

                                I am, however, a bit concerned by FreeBSD’s “packaging base” movement, both in terms of complexity (lots of tiny packages), and how long it seems to be taking.

                                1. 2

                                  lots of tiny packages

                                  illumos is a wonderful example of just how much pain this is.

                                1. 2

                                  Slide 15 mentions that deraadt@ signs the patch file. This made me wonder: What happens if he unexpectedly falls critically ill or dies? For at least one OpenBSD release cycle, there might just be no way to sign anything.

                                  1. 4

                                    It bears noting that the OpenBSD project’s man page viewer is actually a CGI program written in C.

                                    1. 4

                                      It’s fast, too. I wonder if it would be faster if they did something really hardcore like write it in a language that’s less tame. Like this.

                                    1. 34

                                      It’s a hipster-free

                                      This may just be the most hipster thing I’ve seen since COBOL on Cogs

                                      1. 6

                                        COBOL on Wheelchair also exists.

                                        1. 5

                                          do not forget bash on balls: https://github.com/jneen/balls

                                        1. 18

                                          Obviously, re-reading the article is the preferred way to proofread.

                                          My advice is to re-read your work many, many times. And be merciless to your words. Always ask why each element (word, sentence, paragraph, section) is there and don’t be afraid to remove it.

                                          [taking a break and reading it later] allows the mind to reset so the writer can more easily see typos or spelling mistakes.

                                          Putting it aside and reading it again later is critical, in my experience. Everyone I know who writes says this. Also, typos and spelling mistakes are the least of your problems. It’s structure and clarity that you’re looking for. Spellcheckers will mostly take care of the typo problem.

                                          Even a blog post needs tags, categories, and images.

                                          This I don’t buy.

                                          As a frequent reader of technical articles, I almost always fire up reader view in Firefox to (hopefully) remove all that cruft. Images can be very useful, but the practice of adding “meme-ish” images between paragraphs is mostly useless. Don’t go in assuming you need an image: add the image if it helps explain the point. Lead-in images (the ones at the top of an article) might be okay, but are still cruft, in my opinion. Layouts where a title or side bar is floating fixed at some location, especially when it contains a site logo or the author’s image, is just distracting.

                                          Tags and categories may be the least useful part of any blog post. I’ve never paid attention to them, but maybe I’m doing something wrong. Completely ignoring them has never seemed to be a detriment, though.

                                          1. 3

                                            My advice is to re-read your work many, many times. And be merciless to your words. Always ask why each element (word, sentence, paragraph, section) is there and don’t be afraid to remove it.

                                            I agree. A text isn’t finished when there is nothing left to add – it is finished when there is nothing left to remove. It is much easier to go through a text where you can take every word at face value than it is to filter the critical points out of fluff.

                                            1. 2

                                              I appreciate your point on images. I don’t use reader views. How does it handle diagrams or images within the article itself?

                                              Tagging and categories are so folks can find the article more easily. This is search engine fodder, essentially, but it’s also a key organizational tool on most web sites.

                                              1. 5

                                                Load up your own article in Firefox. Look at it in original form. Then, go to View menu, hit Enter Reader Mode, and look at it again. It nicely illustrates what GeoffWozniak is talking about.

                                                1. 5

                                                  I appreciate your point on images. I don’t use reader views. How does it handle diagrams or images within the article itself?

                                                  It doesn’t. In a few cases, it removes useful images and diagrams. In the vast majority of cases, it removes header images, bad memes, and cruft.

                                                  IMO most people use images poorly. Not displaying them is a sensible default.

                                              1. 7

                                                > discord

                                                God please, no!

                                                1. 2

                                                  Better than slack.

                                                  1. 7

                                                    Of course!

                                                    But why do you need to stick to proprietary solutions and make them unreachable on platforms you care about in this community? Wouldn’t it be better to just use IRC like civilized people do?

                                                    1. 7

                                                      Trying to convince people who want Slack or Discord to use IRC will get you nowhere.

                                                      IRC is awesome and some of us have been using it since dirt but it ITSELF lacks features some modern users really want - built in search / logging / voice chat / built in image / sound rendering, etc etc etc.

                                                      You can say “Bah that’s all crap” - and I’ll agree with you, but that doesn’t stop people from wanting.

                                                      Personally, I wish more open source folk would explore solutions like https://zulipchat.com/

                                                      1. 2

                                                        Direct link to the code for everybody’s convenience: https://github.com/zulip/zulip

                                                        1. 1

                                                          I know Zulip but haven’t tried it personally yet…

                                                          And, more importantly - does it have an IRC gateway? :)

                                                          1. 1

                                                            Sort of: https://github.com/zulip/python-zulip-api/issues/106

                                                            I still like zulip quite a lot; I think its concept of topics really does improve discussions.

                                                      2. 1

                                                        They have an IRC channel too, and a bot that communicates between IRC & Discord

                                                        1. 1

                                                          These bridging bots (between Slack/Discord/Matrix/Telegram/Hipchat and IRC) are quite incomplete solutions, as they can’t do “puppeting”: the bot presents all IM users as a single IRC user, and it’s awkward to interact with them that way.

                                                          I hope Matrix could solve this in the future.

                                                          1. 1

                                                            I’ve been using Matrix for about 18 months, and it does puppeting perfectly when bridging to IRC, from either side.

                                                            The Slack bridging with Matrix looks to behave in a similar way; you’re almost unable to distinguish native users and bridged users.

                                                  1. 7

                                                    Why not just directly write man(7), which is all this tool produces? Or use the existing perlpod, pandoc, docbook, lowdown, rst2man, or any other tool doing exactly the same thing from diverse formats?

                                                    Because I’m sure the world needs more opaque, un-indexable manpages.

                                                    (Edit: to clarify, use mdoc(7).)

                                                    1. 5

                                                      Author here. Did you even read the blog post? I answered all of these questions.

                                                      perlpod is built on a mountain of perl, and pandoc on a mountain of haskell. lowdown is a Markdown implementation, and Markdown and roff are mutually exclusive. RST and roff are mutually exclusive. I spoke about docbook directly in my article (via asciidoc, which is a docbook frontend). I also directly addressed mdoc.

                                                      Man pages are already being indexed. If you search the web for “man [anything]” you’ll find numerous websites which scrape packages and convert the roff into HTML.

                                                      1. 1

                                                        Thanks for your hack. It’s a good candidate for a port in my little os.

                                                        A couple of questions:

                                                        • have you considered to avoid the bold markers around man page refs as you already have the parentheses to identify the reference?
                                                        • also section titles have conventional names: what about omitting the starting sharp to mark them as titles?
                                                        • what about definition lists? (I know they are an HTML thing, but they can be useful to describe options for example)
                                                        • I know tables are the most difficult format to express in a readable source form, but what alternatives did you considered and why you discarded them?

                                                        And btw… Thanks again!

                                                        1. 2

                                                          Glad you like it!

                                                          have you considered to avoid the bold markers around man page refs as you already have the parentheses to identify the reference?

                                                          This is an interesting thought. https://todo.sr.ht/~sircmpwn/scdoc/12

                                                          also section titles have conventional names: what about omitting the starting sharp to mark them as titles?

                                                          I’m not fond of this idea. Given that lots of man pages will need to have section titles which fall outside of the conventional names, and that I want all headers to look the same, this isn’t the best design imo.

                                                          what about definition lists? (I know they are an HTML thing, but they can be useful to describe options for example)

                                                          man pages do “definition lists” with borderless tables, which are possible to write with scdoc like this

                                                          |[ *topic*
                                                          :[ definition
                                                          |  *topic*
                                                          :  definition
                                                          # etc
                                                          

                                                          I know tables are the most difficult format to express in a readable source form, but what alternatives did you considered and why you discarded them?

                                                          The main approach I’ve seen elsewhere is trying to use something resembling ascii art to make tables look like tables in the source document. I’ve never been fond of this because you then have to do annoying edits when updating the table to keep all of the artsy shit intact, which in addition to being just plain annoying can also bloat your diffs, lead to more frequent merge conflicts, etc.

                                                          An alternative some formats have used is to make aligning your columns optional, but still using an artsy-fartsy kind of style. I figure that if you’re going to make aligning the columns optional you no longer have any reason to require a verbose format like that. So I invented something more concise.

                                                          Also, the troff preprocessor used for tables supports column alignment specifiers and various border styles, which I wanted to expose to the user in a concise way. Other plaintext table formats often have this feature, but never concisely.

                                                          1. 1

                                                            man pages do “definition lists” with borderless tables

                                                            Do you think you could render something like this with scdoc in a source-readable way http://man7.org/linux/man-pages/man8/parted.8.html (see section OPTIONS and COMMAND)?

                                                            The main approach I’ve seen elsewhere is trying to use something resembling ascii art to make tables look like tables in the source document.

                                                            Actually it was what I was thinking about. You raise a good point, but my counterargument is that manual pages are (hopefully) read more often than they are written. But I admit that my goal is people using cat to read manual pages by default, so I can see how, in a more conventional system using troff, people most often read a rendered page, thus the annoyance is pointless. OTOH, it should be relatively easy to write a tool that takes an scdoc document as input and outputs another scdoc document where tables are automatically aligned, removing the annoyance of aligning the cells while writing.

                                                            Having said that, I find your table syntax nice.
                                                            I wonder if one could nest tables (I mean put a table in a cell). Also, you organize the table by rows, but given the format, some table might benefit from being organized by column.

                                                            1. 2

                                                              Do you think you could render something like this with scdoc in a source-readable way http://man7.org/linux/man-pages/man8/parted.8.html (see section OPTIONS and COMMAND)?

                                                              You don’t actually even need tables for this. scdoc preserves your indent. https://sr.ht/I0g7.txt

                                                              I wonder if one could nest tables (I mean put a table in a cell). Also, you organize the table by rows, but given the format, some table might benefit from being organized by column.

                                                              I think nested tables is a WONTFIX. Also not sold on column-oriented tables. IMO man pages should be careful to keep their tables fairly narrow to stay within 80 characters.

                                                              1. 1

                                                                Wow, that’s really readable!

                                                                Fine for nested tables. Just to be sure I explained what I meant by column-oriented (that just like nested tables might or might not be a good idea): suppose you want to create something like

                                                                English    Italian    Swahili
                                                                Hello!     Ciao!      Habari?
                                                                Tour       Viaggio    Safari
                                                                Lion       Leone      Simba
                                                                

                                                                You might prefer a syntax like

                                                                |[ English
                                                                :[ Hello!
                                                                :[ Tour
                                                                :[ Lion
                                                                |[ Italian
                                                                :[ Ciao!
                                                                :[ Viaggio
                                                                :[ Leone
                                                                |[ Swahili
                                                                :[ Habari?
                                                                :[ Safari
                                                                :[ Simba
                                                                

                                                                Or even, for such a simple table (that I don’t know if actually exists in a man page, so…), you could put each column (or row) in the same line:

                                                                |[ English :[ Hello! :[ Tour :[ Lion
                                                                |[ Italian :[ Ciao! :[ Viaggio :[ Leone
                                                                |[ Swahili :[ Habari? :[ Safari :[ Simba
                                                                

                                                                (that a tool could easily turn into:

                                                                |[ English :[ Hello!  :[ Tour    :[ Lion
                                                                |[ Italian :[ Ciao!   :[ Viaggio :[ Leone
                                                                |[ Swahili :[ Habari? :[ Safari  :[ Simba
                                                                

                                                                )
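
For what it's worth, the alignment step above can be sketched in a few lines of Python. This is only an illustration under assumptions: it handles the one-row-per-line syntax proposed in this comment (which is not actual scdoc syntax, as far as I know), and the function name `align_rows` is made up for the example.

```python
def align_rows(lines):
    """Pad cells so the ':[' separators line up across rows.

    Assumes the hypothetical single-line table syntax from this comment,
    where every cell after the first is introduced by ':['.
    """
    # Split each row into cells and drop the surrounding whitespace.
    rows = [[cell.strip() for cell in line.split(":[")] for line in lines]
    ncols = max(len(row) for row in rows)
    # Width of each column is the width of its longest cell.
    widths = [
        max(len(row[i]) for row in rows if i < len(row))
        for i in range(ncols)
    ]
    aligned = []
    for row in rows:
        padded = [cell.ljust(widths[i]) for i, cell in enumerate(row)]
        # Trailing padding on the last cell is not needed.
        aligned.append(" :[ ".join(padded).rstrip())
    return aligned


if __name__ == "__main__":
    table = [
        "|[ English :[ Hello! :[ Tour :[ Lion",
        "|[ Italian :[ Ciao! :[ Viaggio :[ Leone",
        "|[ Swahili :[ Habari? :[ Safari :[ Simba",
    ]
    print("\n".join(align_rows(table)))
```

Running it on the three unaligned rows should give the aligned version shown above. A real tool would also have to cope with multi-line rows and alignment specifiers, which this sketch ignores.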

                                                                Ok… now I’ve really annoyed you enough for a single night… good work!

                                                      2. 5

                                                        Because you cannot have progress without research.

                                                        Now troff is not readable in source form.
                                                        This is better in this regard. You are right about indexing, but the project has a very short log. I guess we can talk about it with the author, and see what he thinks about that.

                                                        Maybe he likes the idea and adds it. Or he doesn’t, and will not add it.
                                                        You will always be able to fork it and fine-tune it to your needs.

                                                        I’m grateful to hackers who challenge the status quo.

                                                        1. 4

                                                          While mdoc(7) is great (thanks for that!) , I think your questions are answered on the page. I think lowdown is probably the closest to what u/SirCmpwn was aiming for (no dependencies, man output), maybe they hadn’t seen it?

                                                          Man formatting is inscrutable to the un-trained eye (most people), and we need to acknowledge the popularity of markdown is related to its ease of reading/writing.

                                                          1. 4

                                                            I think your questions are answered on the page. I think lowdown is probably the closest to what u/SirCmpwn was aiming for (no dependencies, man output), maybe they hadn’t seen it?

                                                            groff (as installed on every Linux distribution that uses groff for man pages, which is basically all of them, and macOS) has had native support for mdoc for at least a decade. If you install an mdoc man page and then man $thepage, you get exactly what you expect.

                                                        1. 11

                                                          And the markup is quite presentation oriented; much of it is visual rather than structural and thus difficult to translate well to the web

                                                          mdoc (man page linked to is from 4.4BSD, around 1994) has been a thing for a couple of decades now. Many notable *NIXes have moved to mdoc – OpenBSD, FreeBSD, NetBSD, illumos; notably missing are macOS and most Linux distributions. If you take a look at it, you’ll see it can encode a lot of semantic information.

                                                          For one reason or another, the GNU project seems to vehemently resist the idea of using mdoc, however; the original man macros indeed do have this problem. As do all * to man page converters because they can’t know the semantic information required by mdoc, after all, as well as people being simply unaware of mdoc. mdoc works just fine with groff on Linux.

                                                          And of course, because people new to writing man pages will copy what they see in the ecosystem they’re in (which is GNU on Linux or macOS in many cases), the man macros will continue getting copied. Or they don’t bother to learn anything at all and rely on converters and manual formatting. If they even bother to write a man page at all, these days.

                                                          1. 3

                                                            Just a small note: the script could be written more concisely (and maybe in a more understandable way) by doing:

                                                            /baz
                                                            t
                                                            s/baz/elephants
                                                            wq
                                                            

                                                            Writing .t. is like running vi ./file instead of vi file in a shell. And ed allows you to write and quit in the same command, just like :wq does in vi.

                                                            While I would personally have chosen emacs to do this task (using dired + keyboard macros would be quite straightforward), I do agree that ed(1) is a quite helpful and underestimated tool, especially when you embed it into a shell script with a here-doc. And despite appearances, it really isn’t that complicated, especially when you have a good man page (e.g. OpenBSD’s) or have GNU Info + the ed manual installed, in case one needs to do something more esoteric.

                                                            1. 5

                                                              And despite appearances, it really isn’t that complicated

                                                              UNIX V7 actually shipped interactive tutorials to learn ed(1) as part of learn. It’s unfortunate that there’s no convenient way to actually make use of those. You’d actually have to set up a PDP-11 emulator with V7 (though prebuilt images exist) and work with that, an environment where backspace doesn’t really work out of the box.

                                                              1. 1

                                                                I’m a pretty mediocre emacs user, and poking around at the manual, I wasn’t quite sure how to use dired to apply a macro to multiple files. I guess if you had a dired buffer with just the files you wanted, you could write the macro to open the file, do the operation, return to the dired buffer, then go on. Is that the idea?

                                                                1. 1

                                                                  While I’m no expert, that is what I was thinking of. And despite first appearances, I don’t think there’s anything too wrong with it either. I guess if you really wanted to be “safe” you could write a script that processes all buffers on a stack by applying a function or a macro within them, but I don’t see the practical advantage. Whenever I did “start a macro in dired, open a file, edit it, close, move to next line (manually or via C-s)”, I didn’t have any problems with the method.

                                                              1. 9

                                                                And if you don’t like the GPL that much, there’s:

                                                                • A portable version of NetBSD’s libedit.
                                                                • antirez (of Redis fame)’s linenoise if you just want to drop a single C source code file in.
                                                                • linenoise-ng, which actually has UTF-8 support, too, but requires a C++ compiler and consists of more than one source code file; it does expose a C interface, though.
                                                                1. 2

                                                                  I don’t like replacements because they always fail to implement something or other about GNU readline. For example, Ctrl-o is one of my favourite keybindings. You use Ctrl-r to search back in history, and then successively hit C-o to replay history from the search point. This feature is obscure and almost never duplicated in replacements.

                                                                1. 4

                                                                  historical pffff :-)

                                                                  A great game. There’s a neat android port and “browserhack” to play on the web. For the uninitiated, I recommend reading a few spoilers (nethackwiki), but try not to go too deep!

                                                                  1. 3

                                                                    Games like this are more fun to me hand in hand with heaps of spoilers. Asymmetric information games are just frustrating to me. I want the death to be my fault, rather than just the element of surprise.

                                                                    1. 2

                                                                      I’m split on this. On the one hand, the really interesting part of strategy games tends to be at the efficiency frontier, that is, the difference between good and excellent, and much less so going from clueless to decent.

                                                                      On the other hand, it is cool that the game can teach you about itself through your failures (where answering a question is just trying it in the game and seeing what happens), and that optimizing this learning process itself is part of the strategy [1].

                                                                      In reality, I tend to learn a bunch from watching others play then enjoy the fine tuning process.

                                                                      [1] roguelike games tend to force this kind of meta learning anyway, in systems like potion and scroll discovery being randomized each run, so you need to develop a way to learn, not just learn it once. Genre tends to be abused, but if there were going to be a useful definition of roguelike, it would be built around this meta learning.

                                                                      1. 1

                                                                        That’s why DCSS is so good :) No ridiculous secret information, just you and your tactics.

                                                                    1. 1

                                                                      I note (again) that I’m looking for somebody to maintain this… (It needs a lot of love!)

                                                                      1. 1

                                                                        Does that include the History of UNIX Manpages, too? Some links are dead, at least.

                                                                        And what would actually need changing? I thought the general man page format and mdoc change once in a blue moon.

                                                                      1. 5

                                                                        This is a nice effort, but one wonders why the author doesn’t want to use vmstat(8).

                                                                        Side note: The author doesn’t seem to be too familiar with OpenBSD and its conventions. The man page was written in man(7), which is deprecated in favor of mdoc(7) on OpenBSD.

                                                                        1. 4

                                                                          Thanks very much for pointing out mdoc!

                                                                          Compared to vmstat(8), my simple toy has the following differences:
                                                                          (1) It also displays swap space;
                                                                          (2) It only counts active pages as “used” memory; all others are counted as “free” memory. IMHO, for the end user who doesn’t care about the guts of the operating system, maybe this method is more plausible?

                                                                          All in all, I just write a small tool for fun, and thanks very much again for giving pertinent advice!

                                                                          1. 2

                                                                            Agreed. Sometimes you don’t really care about everything vmstat offers. free is dirty neat :)

                                                                            • TIL about mdoc
                                                                            1. 1

                                                                              P.S. After some testing, I just changed how free memory is calculated: free pages now count as “free” memory, and everything else is considered “used” memory.
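
To make the difference between the two accounting methods concrete, here is a small Python sketch. It is not the tool's actual code: the `PageCounts` type, the function names, and the page counts are all made up for illustration, and the real counters would come from the kernel.

```python
from dataclasses import dataclass


@dataclass
class PageCounts:
    """Hypothetical snapshot of page counters (values are invented)."""
    total: int   # all physical pages
    active: int  # pages currently in active use
    free: int    # pages on the free list


def used_free_active_only(pages, page_size):
    """First method: only active pages are 'used', the rest is 'free'."""
    used = pages.active * page_size
    return used, pages.total * page_size - used


def used_free_free_only(pages, page_size):
    """Revised method: only free pages are 'free', the rest is 'used'."""
    free = pages.free * page_size
    return pages.total * page_size - free, free


if __name__ == "__main__":
    snapshot = PageCounts(total=1000, active=300, free=200)
    print(used_free_active_only(snapshot, 4096))
    print(used_free_free_only(snapshot, 4096))
```

With these invented numbers the first method reports far more "free" memory than the second, because inactive and cached pages land on opposite sides of the split.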

                                                                            2. 3

                                                                              Thanks for educating me about the distinction: https://github.com/blinkkin/blinkkin.github.com/wiki/man-vs-mdoc

                                                                              1. 4

                                                                                I’d suggest Practical UNIX Manuals as introductory reading for mdoc, too: https://manpages.bsd.lv/mdoc.html

                                                                            1. 5

                                                                              CGI is a dying technology.

                                                                              It probably is dying, but I feel I ought to point out that OpenBSD’s man.cgi uses CGI and OpenBSD has added slowcgi(8) to base.

                                                                              1. 5

                                                                                If you’re interested in seriously learning troff, I’d like to double down on the recommendation in the article: Take an afternoon or two to read at least the troff parts of UNIX Text Processing, available for the low cost of free these days.

                                                                                1. 2

                                                                                  The book is a good read, the plan9 troff tutorial is also helpful.

                                                                                1. 1

                                                                                  Was it really necessary to make the political dig “orangefuckface@whitehouse.gov” in this post? It makes the whole talk seem a tad more unprofessional than it needed to be.

                                                                                  1. 10

                                                                                    Eh, at this point insulting Führer Cheeto has widely become socially acceptable, even in professional settings. In San Francisco it’s blatantly a free pass to say something outrageously inappropriate.

                                                                                    1. 5

                                                                                      Which means they’re basically acting more like him given how he does Twitter and meetings. At least they’re trolling a fellow troll this time. That’s progress I guess. ;)

                                                                                      EDIT to add: Yet, if anyone does it the other way, many of them will cry that it’s offensive with conference ejection or job termination being mandatory. I say they need to knock it off or take what they dish out.

                                                                                    2. 4

                                                                                      I agree. It isn’t exactly taking the high road.

                                                                                      1. 0

                                                                                        She tells you up front that it’s basically a rant. If it was me writing it I would have used pussygrabber@whitehouse.gov, because I’m a guy and could probably get away with it.

                                                                                      1. 16

                                                                                        I fucking hate reCaptcha, partly because the problems seem to be getting harder over time. Sometimes I literally can’t spot the cars in all the tiles.

                                                                                        1. 19

                                                                                          It’s also very effective at keeping Tor out. ReCAPTCHA will, more often than not, refuse to even serve a CAPTCHA (or serve an unsolvable one) to Tor users. Then remember that a lot of websites are behind CloudFlare and CloudFlare uses ReCAPTCHA to check users.

                                                                                          Oops.

                                                                                          1. 2

                                                                                            For the Cloudflare issue you can install Cloudflare’s Privacy Pass extension that maintains anonymity, but still greatly reduces or removes the amount of reCaptchas Cloudflare shows you if you’re coming from an IP with bad reputation, such as a lot of the Tor exit nodes.

                                                                                            (Disclaimer: I work at Cloudflare but in an unrelated department)

                                                                                            1. 2

                                                                                              Luckily, CloudFlare makes it easy for site owners to whitelist Tor so Tor users don’t get checked.

                                                                                              1. 9

                                                                                                Realistically, how many site owners do that, though?

                                                                                            2. 16

                                                                                              I don’t hate it because it’s hard. I hate it because I think Google lost its moral compass. So, the last thing that I want to do is to be a free annotator for their ML efforts. Unfortunately, I have to be a free annotator anyway, because some non-Google sites use reCaptcha.

                                                                                              1. 7

                                                                                                Indeed, also annoying is you have to guess at what the stupid thing is trying to indicate as “cars”. Is it a full image of the car or not? Does the “car” span multiple tiles? Is it obscured in one tile and not in another? Which of those “count” if so? Should I include all the tiles if say the front bumper is in one tile or not? (my experiments have indicated not).

                                                                                                Or the store fronts, some don’t have any signage, they could be store fronts, or not, literally unknowable by a human or an AI with that limited of information.

                                                                                                I’m sick of being used as a training set for AI data, this is even more annoying than trying to guess if the text in question was using Fraktur and the ligature in question is what google thinks is an f, or an s. I love getting told I’m wrong by a majority of people not being able to read Fraktur and distinguish an f from an s from say an italic i or l. Now I get to be told I can’t distinguish a “car” by an image training algorithm.

                                                                                                1. 4

                                                                                                  At some point, only machines will be able to spot the cars.