1.  
    cat file.txt \
      | sort           `# I *can* put a comment here` \
      | cut -f 1 \
        `# And I can put one here too` \
      | grep foo
    

    :^)

    1. 5

      I would usually just write:

        cat file.txt | # "useless cat" recapitulated
        sort         | # I *can* put a comment here
        cut -f 1     | # And I can put one here too
        grep foo       # and even here
      
      1.  

        That’s OK for pipelines, but it falls down for the equivalent of

        ... find /bin               # traverse this directory
            -type f -a -executable  # filter executable files
            -a -printf '%s %P\n'    # print size and path
          | sort -n                 # sort numerically
          ;
        

        The Tour has an example like this, probably should have put it in the blog post too …
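Plain bash has no multiline `...` form. One workaround is to collect the arguments in an array, where comments are allowed between elements. This is a sketch that assumes bash (for arrays) and GNU find (`-executable` and `-printf` are GNU extensions). The name `list_executables` is made up for illustration.

```shell
# list_executables DIR: print "SIZE RELPATH" for each executable regular file under DIR, smallest first.
# Hypothetical helper; needs bash (arrays) and GNU find (-executable, -printf).
list_executables() {
  local find_args=(
    "$1"                      # traverse this directory
    -type f -a -executable    # filter executable files
    -a -printf '%s %P\n'      # print size and path relative to DIR
  )
  find "${find_args[@]}" | sort -n   # sort numerically by size
}
```

Usage: `list_executables /bin`. Unlike the backtick trick, this does not rely on empty elision. The parser discards the comments before the array is built, so no empty words are ever produced.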

        1.  

          how would you do this without the useless cat?

          1.  
            1. You can do: sort < file.txt | # comment ….
            2. UUoC sometimes makes sense and does not deserve as much hate as it gets in internet discussions.
            3. I prefer | plus indentation at the beginning of the next line, so this style of comments is not very useful to me (if I have to, I will probably use the backtick trick instead).

            1.  

              Also, just sort file.txt | ...

              1.  

                Or if you want to keep the order consistent with the data flow:

                < file.txt \
                sort | \
                ...
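Redirection-first and trailing-`|` comments can be combined, so the pipeline reads in data-flow order with no `cat`. This is a sketch: `first_fields_with_foo` is a hypothetical name, and the file is passed as `$1` instead of being hard-coded as `file.txt`.

```shell
# first_fields_with_foo FILE: hypothetical wrapper for the sort | cut | grep pipeline above.
first_fields_with_foo() {
  < "$1" sort |   # read FILE via redirection instead of a "useless cat"
    cut -f 1 |    # keep the first tab-separated field
    grep foo      # keep only lines containing "foo"
}
```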
                
        2.  

          I’ve seen this trick! But I don’t like all the noise with \ and backticks.

          I probably should have mentioned it … Right now it actually doesn’t work in OSH, because # goes to the end of the line – we don’t look for a backtick. But that is not fundamental and could be changed. I haven’t seen it enough in real code to make it a priority. I think I saw it in some Sandstorm shell scripts, and that’s about it.

          The 1.2M-line shell corpus (https://www.oilshell.org/release/0.9.2/test/wild.wwz/) contains either 0 or 1 instances of it, I forget …

          1.  

            > But I don’t like all the noise with \ and backticks.

            i agree it’s a total hack. it wasn’t meant as a serious proposal (although i have used it in the past for a lack of alternatives), i was just being a smart-ass.

            oil’s approach looks way better :)

            1.  

              Oh also another reason not to use that trick is that it actually relies on “empty elision”. Shell omits empty strings from argv, which is related to word splitting. Consider:

              $ argv X $(echo) `echo` Y
              

              This results in an argv of ['X', 'Y'] in shell, but ['X', '', '', 'Y'] in Oil. Compare with:

              $ argv X "$(echo)" "`echo`" Y
              

              I deem this too confusing, which is why Oil Doesn’t Require Quoting Everywhere!
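The elision is easy to see in bash with a small stand-in for `argv`. The definition below is an assumption, not Oil's own helper. It brackets each argument so that empty ones become visible.

```shell
# argv: print each argument wrapped in brackets so empty arguments are visible.
# Stand-in for the argv helper in the transcript above; this definition is an assumption.
argv() { printf '[%s]' "$@"; echo; }

argv X $(echo) `echo` Y        # unquoted empty substitutions are elided: [X][Y]
argv X "$(echo)" "`echo`" Y    # quoted empty substitutions survive: [X][][][Y]
argv X `# a comment` Y         # the backtick-comment trick is the same elision: [X][Y]
```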

          1. 1

            River, don’t worry about my question, because I really like your other submissions; but isn’t this submission a bit off-topic? I just don’t see how it’s related to computing beyond taking place on a computer. All your other submissions are really good, though.

            1. 2

              Yeah this was probably not good to submit although the comments ended up very interesting.

            1. 7

              After posting this, I also found https://easylist.to/, which has the EasyList Cookie List; that may solve the problem without another extension.

              1. 2

                No need for another extension with this list https://www.i-dont-care-about-cookies.eu/abp/, linked in OP.

                1. 1

                  ABP had malware in it in the past; I don’t recommend it. uBlock Origin has been clean so far.

                  1. 2

                    This filter list is compatible with uBlock Origin and other similar extensions.

              1. 26

                There are a lot of extensions that automatically select the ‘reject all’ or walk the list and decline them all. Why push people towards one that makes them agree? The cookie pop-ups are part of wilful misinterpretation of the GDPR: you don’t need consent for cookies, you need consent for tracking and data sharing. If your site doesn’t track users or share data with third parties, you don’t need a pop up. See GitHub for an example of a complex web-app that manages this. Generally, a well-designed site shouldn’t need to keep PII about users unless they register an account, at which point you can ask permission for everything that you need to store and explain why you are storing it.

                Note also that the GDPR is very specific about requiring informed consent. It is not at all clear to me that most of these pop-ups actually meet this requirement. If a user of your site cannot explain exactly what PII handling they have agreed to then you are not in compliance.

                1. 4

                  Can’t answer this for other people, but I want tracking cookies.

                  When people try to articulate the harm, it seems to boil down to an intangible “creepy” feeling or a circular “Corporations tracking you is bad because it means corporations are tracking you” argument that begs the question.

                  Tracking improves the quality of ad targeting; that’s the whole point of the exercise. Narrowly-targeted ads are more profitable, and more ad revenue means fewer sites have to support themselves with paywalls. Fewer paywalls mean more sites available to low-income users, especially ones in developing countries where even what seem like cheap microtransactions from a developed-world perspective would be prohibitively expensive.

                  To me, the whole “I don’t care if it means I have to pay, just stop tracking me” argument is dripping with privilege. I think the ad-supported, free-for-all-comers web is possibly second only to universal literacy as the most egalitarian development in the history of information dissemination. Yes, Wikipedia exists and is wonderful and I donate to it annually, but anyone who has run a small online service that asks for donations knows that relying on the charity of random strangers to cover your costs is often not a reliable way to keep the bills paid. Ads are a more predictable revenue stream.

                  Tracking cookies cost me nothing and benefit others. I always click “Agree” and I do it on purpose.

                  1. 3

                    ‘an intangible “creepy” feeling’ is a nice way of describing how it feels to find out that someone committed a serious crime using your identity. There are real serious consequences of unnecessary tracking, and it costs billions and destroys lives.

                    Also, I don’t want ads at all, and I have no interest in targeted ads. If I want to buy things, I know how to use a search bar, and if I don’t know I need something, do I really need it? If I am on a website where I frequently shop, I might even enable tracking cookies, but I don’t want to blanket-enable them on all sites.

                    1. 4

                      How does it “costs billions and destroys lives”?

                      1. 2

                        https://www.ftc.gov/system/files/documents/reports/consumer-sentinel-network-data-book-2020/csn_annual_data_book_2020.pdf see page 8. This is in the US alone and does not take the other 7.7b people in the world into account. I will admit it is not clear what percentage of fraud and identity theft are due to leaked or hacked data from tracking cookies so this data is hardly accurate for the current discussion, but I think it covers the question of ‘how’. If you want more detail just google the individual categories in the report under fraud and identity theft.

                        Also see this and this

                        But I covered criminal prosecution in the same sentence you just quoted from my reply above so clearly you meant ‘other than being put in prison’. Also, people sometimes die in prison, and they almost always lose their jobs.

                        1. 4

                          The first identity theft story doesn’t really detail what exactly happened surrounding the ID theft, and the second one is about a childhood acquaintance stealing the man’s ID. It doesn’t say how exactly either, and neither does that FTC report as far as I can see: it just lists ID theft as a problem. Well, okay, but colour me skeptical that this is caused by run-of-the-mill adtech/engagement tracking, which is what we’re talking about here. Not that I think it’s not problematic, but it’s a different thing, and I don’t see how they’re strongly connected.

                          The NSA will do what the NSA will do; if we had no Google then they would just do the same. I also don’t think it’s as problematic as often claimed, as agencies such as the NSA also do necessary work. It really depends on the details of who/why/what was done exactly (but the article doesn’t mention that, and it’s probably not public anyway; I’d argue lack of oversight and trust is the biggest issue here, rather than the actions themselves, but this is veering very off-topic).

                          In short, I feel there’s a sore lack of nuance here and confusion between things that are (mostly) unconnected.

                          1. 2

                            Nevertheless all this personal data is being collected, and sometimes it gets out of the data silos. To pretend that it never causes any harm just because some stranger on the internet failed to come up with a completely airtight example case in 5 minutes of web searching is either dishonest or naive. If you really want to know, you can do the research yourself and find real cases. If you would rather just feel comfortable with your choice to allow all tracking cookies that is also totally fine. You asked how, I believe my answer was sufficient and roughly correct. If you feel the need to prove me wrong that is also fine, and I will consider any evidence you present.

                            1. 2

                              The type of “personal data” required for identity theft is stuff like social security numbers, passport numbers, and that kind of stuff. That’s quite a different sort of “personal data” than your internet history/behaviour.

                              > To pretend that it never causes any harm just because some stranger on the internet failed to come up with a completely airtight example case in 5 minutes of web searching is either dishonest or naive. If you really want to know, you can do the research yourself and find real cases.

                              C’mon man, if you’re making claims as large as “it costs billions and destroys lives”, then you should be prepared to back them up. I’m not an expert, but I have spent over ten years paying close attention to these kinds of things, and I don’t see how these claims bear out; I’m always willing to learn something new, though, which is why I asked the question. Coming back with “do your own research” and “prove me wrong then!” is rather unimpressive.

                              > If you would rather just feel comfortable with your choice to allow all tracking cookies that is also totally fine.

                              I don’t, and I never said anything which implied it.

                              > If you feel the need to prove me wrong that is also fine, and I will consider any evidence you present.

                              I feel the need to understand reality to the best of my ability.

                              1. 1

                                > I feel the need to understand reality to the best of my ability.

                                Sorry I was a bit rude in my wording. There is no call for that. I just felt like I was being asked to do a lot of online research for a discussion I have no real stake in.

                                GDPR Article 4 Paragraph 1 and GDPR Article 9 Paragraph 1 specify what kind of information they need to ask permission to collect. It is all pretty serious stuff. There is no mention of ‘shopping preferences’. Social security numbers and passport numbers are included, as well as health data and things that are often the cause of discrimination, like sexuality/religion/political affiliation. Also included is any data that can be used to uniquely identify you as an individual (without which aggregate data is much harder to abuse), such as your IP address or your real name.

                                A lot of sites just ask permission to cover their asses and don’t need to. This I agree is annoying. But if a site is giving you a list of cookies to say yes or no to they probably know what they are doing and are collecting the above information about you. If you are a white heterosexual English speaking male then a lot of that information probably seems tame enough too, but for a lot of people having that information collected online is very dangerous in quite real and tangible ways.

                        2. 3

                          I am absolutely willing to have my view on this changed. Can you point me to some examples of serious identity theft crimes being committed using tracking cookies?

                          1. 2

                            See my reply to the other guy above. The FTC data does not specify where the hackers stole the identity information, so it is impossible for me to say what percentage are legitimately caused by tracking cookies. The law that mandates these banners refers to information that can be used to identify individuals. Even if it has never ever happened in history that hacked or leaked cookie data has been used for fraud or identity theft, it is a real danger. I would love to supply concrete examples, but I have a full-time job and a life, and if your claim is “Sure, all this personal data is out there on the web, and yes, sometimes it gets out of the data silos, but I don’t believe anyone ever used it for a crime”, then I feel like it’s not worth my time spending hours digging out case studies and court records to prove you wrong. Having said that, if you do some searching to satisfy your own curiosity and find anything definitive, I would love to hear about it.

                          2. 2

                            > someone committed a serious crime using your identity

                            because of cookies? that doesn’t follow

                          3. 1

                            Well, this is weird. I think it’s easy to read that and forget that the industry you’re waxing lyrical about is worth hundreds of billions; it’s not an egalitarian development, it’s an empire. Those small online services that don’t want to rely on asking for donations aren’t billion-dollar companies; they get a deal entirely on someone else’s terms and are almost certainly taken advantage of for the privilege.

                            It also has its own agenda. The ability to mechanically assess “ad-friendliness” already restricts ad-supported content producers to what corporations are happy to see their name next to. I don’t want to get too speculative on the site, but there’s such a thing as an ad-friendly viewer too, and I expect that concept to become increasingly relevant.

                            So, tracking cookies. They support an industry I think is a social ill, so I’d be opposed to them on that alone. But I also think it’s extremely… optimistic… to think being spied on will only ever be good for you. Advertisers already leave content providers in the cold when it’s financially indicated—what happens when your tracking profile tells them you’re not worth advertising to?

                            I claim the cost to the individual is unknowable. The benefit to society is Cambridge Analytica.

                          4. 2

                            The cookie law is much older than GDPR. In the EU you do need consent for cookies. It is a dumb law.

                            1. 11

                              > In the EU you do need consent for cookies. It is a dumb law.

                              This is not true. In the EU you need consent for tracking, whether or not you do that with cookies. It has to be informed consent, which means that the user must understand what they are agreeing to. As such, a lot of the cookie consent UIs are not GDPR compliant. Max Schrems’ company is filing complaints about non-compliant cookie banners.

                              If you only use functional cookies, you don’t need to ask for consent.

                              1. 3

                                https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:31995L0046 concerns consent of user data processing.

                                https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:32002L0058 from 2002 builds on the 1995 directive, bringing in “cookies” explicitly. Among other things it states “The methods for giving information, offering a right to refuse or requesting consent should be made as user-friendly as possible.”

                                In 2009 https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:32009L0136 updated the 2002 directive, closing a few loopholes.

                                The Do-Not-Track header should have been enough of a signal to cut down on cookie banners (and a few websites are sensible enough to interpret it as a universal rejection of unnecessary data storage), but apparently that was too easy on users? It went as quickly as it came, after Microsoft defused it by enabling it by default and parts of the adtech industry argued that the header no longer signified an informed decision and could therefore be ignored.

                                If banners are annoying it’s because they’re a deliberate dark pattern, see https://twitter.com/pixelscript/status/1436664488913215490 for a particularly egregious example: A direct breach of the 2002 directive that is typically brought up as “the cookie law” given how it mandates “as user-friendly as possible.”

                                1. 2

                                  I don’t understand what you’re trying to say. Most cookie banners on EU sites are not at all what I’d call a dark pattern. They’re just trying to follow the law. It is a stupid law which only trained people to click agree on all website warnings, making GDPR less effective. Without the cookie law, dark patterns against GDPR would be less effective.

                                  1. 3

                                    The dark pattern pgeorgi refers to is that on many cookie banners, the “Refuse all” button requires more clicks and/or more careful looking than the “Accept all” button. People who have trained themselves to click “Accept” mostly chose “Accept” because it is easier — one click on a bright button, and done. If “Refuse all” were equally easy to choose, more people would train themselves to always click “Refuse”.

                                    Let’s pretend for a moment the cookie law no longer exists. A website wants to set a tracking cookie. A tracking cookie, by definition, constitutes personally identifiable information (PII) – as long as the cookie is present, you can show an ad to specifically that user. The GDPR recognizes 6 different conditions under which processing PII is lawful.

                                    The only legal ground to set a tracking cookie for advertising purposes is (a) If the data subject has given consent to the processing of his or her personal data. I won’t go over every GDPR ground, but suffice it to say that tracking-for-advertising-purposes is not covered by

                                    • (b) To fulfil contractual obligations with a data subject;
                                    • nor is it covered by (f) For the legitimate interests of a data controller or a third party, unless these interests are overridden by interests of the data subject.

                                    So even if there were no cookie law, GDPR ensures that if you want to set a tracking cookie, you have to ask the user.

                                    Conversely, if you want to show ads without setting tracking cookies, you don’t need to get consent for anything.

                                    1. 2

                                      I feel the mistake with the whole “cookie law” thing is that it focuses too much on the technology rather than what people/companies are actually doing. That is, there are many innocent non-tracking reasons to store information in a browser that’s not “strictly necessary”, and there are many ways to track people without storing information in the browser.

                                    2. 1

                                      I’m not saying that dark patterns are employed on the banners. The banners themselves are dark patterns.

                                      1. 1

                                        The banners often come from freely available compliance packages… It’s not dark, it’s just lazy and badly thought out, like the law itself.

                                        1. 1

                                          What about the law do you think is badly thought out?

                                          1. 1

                                            The cookie part of the ePrivacy Directive is too technological. You don’t need consent, but you do have to inform the user of cookie storage (or localStorage, etc.) no matter what you use it for. It’s unnecessary information, and it doesn’t protect the user. These are the cookie banners that only let you choose “I understand”, because they only store strictly necessary cookies (or any kind of cookie before GDPR in 2016).

                                            GDPR is the right way to do it. The cookie part of EPR should have been scrapped with GDPR. That would make banners that do ask for PII storage consent stand out more. You can’t make your GDPR banner look like an EPR information banner if EPR banners aren’t a thing.

                                2. 2

                                  Usually when I see the cookie consent popup, I haven’t shared any personal information yet. There is what the site has from my browser and network connection, but I trust my browser, uBlock Origin and DDG privacy tools to block various things, and I use a VPN to somewhere random when I don’t want a site to know everything it can about my network location.

                                  If I really do want to share personal info with a site, I’ll go and be very careful what I provide and what I agree to, but also realistic in that I know there are no guarantees.

                                  1. 8

                                    If you’re using a VPN and uBlock Origin, then your anonymity set probably doesn’t contain more than a handful of people. Combined with browser fingerprinting, it probably contains just you.

                                    1. 2

                                      Should I be concerned about that? I’m really not sure I have properly thought through any threats from the unique identification that comes from that. Do you have any pointers to how to figure out what that might lead to?

                                      1. 9

                                        The point of things like the GDPR and so on is to prevent people assembling large databases of correlated knowledge that violate individual privacy. For example, if someone tracks which news articles you read, they have a good first approximation of your voting preferences. If they correlate it with your address, they can tell if you’re in a constituency where their candidate may have a chance. If you are, they know the issues that are important to you and so can target adverts towards you (including targeted postal adverts if they’re able to get your address, which they can if they share data with any company that’s shipped anything physical to you) that may influence the election.

                                        Personally, I consider automated propaganda engines backed by sophisticated psychological models to be an existential threat to a free society that can be addressed only by some quite aggressive regulation. Any unique identifier that allows you to be associated with the kind of profile that these things construct is a problem.

                                      2. 2

                                        Do you have a recommendation?

                                    2. 2

                                      The problem with rejecting all the tracking is that without it most ad networks will serve you the worst/cheapest untargeted adverts which have a high chance of being a vector for malware.

                                      So if you reject the tracking, you pretty much have to also run an ad blocker to protect yourself. Of course, if you are running an ad blocker, then the cookies aren’t going to make much difference either way.

                                      1. 1

                                        I don’t believe it makes any difference whether you agree or disagree? The goal is just to make the box go away.

                                        1. 2

                                          Yes. If I agree and they track me, they are legally covered. If I disagree and they track me, then the regulator can impose a fine of up to 4% of their annual worldwide turnover. As a second-order effect: if aggregate statistics say 95% of people click ‘agree’, then they have no incentive to reduce their tracking, whereas if aggregate statistics say ‘10% leave the page without clicking either, 50% click disagree’, then they have a strong case that tracking will lose them business, and this will impact their financial planning.

                                      1. 19

                                        Some rough thoughts: I guess I am a bit curmudgeonly about the term “crypto” getting taken over by “cryptocurrency”; after several decades of use, it’s a bit galling to cede the term. Though at some point, yeah, it’s a practical reality that’s not worth confusing people over. We’ve seen cryptocurrency promotion get tagged both with crypto and merkle-trees, so I’m not really sure expanding to cryptocurrency would change anything; I think all technical communities are going to continue to suffer from promotion for at least a few more years.

                                        Worth discussing, though, maybe someone has more pros or cons? Maybe it’s worth just trying for a year?

                                        1. 6

                                          I think the realistic view is that it has become ambiguous and would be regardless of what the new referent that “takes over” is. What are the cons in expanding it to “cryptography” exactly? I don’t see any, only pros.

                                          1. 5

                                            I’ve renamed the crypto tag to cryptography. There’s strong support for it, it’s cheap, and implicitly ceding a little ground to scammers is a pretty negligible downside.

                                            1. 2

                                              I would rather not have any bitcoin related stuff on the site if possible.

                                              1. 2

                                                Add a cryptocurrency tag; then, when it’s used, show a warning like with banned domains and don’t submit the URL. Sort of like a fishbait tag. You can add one for business news as well. Preventive moderation of sorts.

                                                1. 1

                                                  Tbh, I’m pretty appreciative of the similar stance for the ml tag.

                                                  On another note, I do kind of wish there was a more general programming-languages or pl-design tag, as a lot of things that get tagged plt aren’t really PL theory, more about PL design etc?

                                                  1. 1

                                                    What do you think about this plan:

                                                    1. Add suffix for cryptography
                                                    2. Add cryptocurrency
                                                    3. Hotness mod cryptocurrency of -0.5

                                                    1. 1

                                                      Some aspects of cryptocurrencies are technical, on-topic, and insightful (presumably? I don’t read them), and I don’t think it would be fair to apply a hotness downgrade to them.

                                                      The problem is cryptocurrency promotion/spam, which isn’t on-topic at all. Lobsters also gets a fair amount of spam on other topics, like cloud wazamabobs, and we don’t downgrade the entire topic for it.

                                                      1. 2

                                                        If it’s described as a cryptocurrency, then it’s likely already moved into the scam category. There are interesting uses for verifiable append-only ledgers. There are interesting use cases for distributed consensus algorithms. These have nothing to do with cryptocurrencies other than the underlying technology.

                                                        1. 1

                                                          I do happen to agree that at a first approximation all cryptocurrencies are scams, but even a more charitable reading would lead them to fall under the umbrella of “business news” and thus off-topic anyway.

                                                        2. 1

                                                          I thought we use merkle-trees for the technical aspects?

                                                          A downgrade is just a downgrade; e.g., culture tags are downgraded. If the link is good, votes will still raise it to the top.

                                                      2. 1

                                                        I am in favor of expanding the existing tag to cryptography and keeping the implicit ban on cryptocurrency promotion by only keeping the relevant merkle-trees tag.

                                                      1. 8

                                                        I had no idea they were different! I always thought SFTP was just a fancy name for scp. Turns out SFTP is an SSH protocol standard.

                                                        1. 10

                                                          Yes they are pretty different, I wrote about it here https://rain-1.github.io/use-sftp-not-scp.html

                                                          1. 3

                                                            I see you are also against rsync. Is there an alternative that uses a similar protocol for incremental updates but has a better implementation?

                                                            1. 2

                                                              Maybe rclone

                                                            2. 3

                                                              Thanks, looking at its interface is all I need to know I don’t ever want to use the sftp tool. That interface is horrible.
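Much of that pain comes from sftp's interactive prompt. OpenSSH's `sftp -b` reads commands from a batch file instead, and `-` means standard input. This is a sketch of generating such a batch from a list of files. `make_sftp_batch`, `/upload`, and `user@host` are placeholders, and it assumes paths contain no double quotes or newlines.

```shell
# make_sftp_batch REMOTE_DIR FILE...: emit sftp batch commands that upload FILEs into REMOTE_DIR.
# Hypothetical helper; assumes paths contain no double quotes or newlines.
make_sftp_batch() {
  local remote_dir=$1
  shift
  printf 'cd "%s"\n' "$remote_dir"
  local f
  for f in "$@"; do
    printf 'put "%s"\n' "$f"   # quotes keep paths with spaces intact
  done
}

# Usage (user@host and /upload are placeholders). Batch mode should be paired
# with non-interactive authentication, e.g. an SSH key:
#   make_sftp_batch /upload report.csv "my file.txt" | sftp -b - user@host
```

In batch mode, sftp aborts if a command such as `cd` or `put` fails, unless that command is prefixed with `-`. As a result, a wrong remote directory stops the batch instead of uploading files somewhere else.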

                                                            3. 3

                                                              I thought scp was just a command line tool to transfer files over sftp. Looks like it is that now. What did it use before if not sftp?

                                                              1. 6

                                                                scp used SCP

                                                              2. 2

                                                                An additional learning that blew my mind is that SFTP is actually very much used in big corporations!

                                                                It is used widely in Finance and Healthcare afaik. There is a wish to move away from file-based protocols, but it will take some time!

                                                                1. 3

                                                                  An additional learning that blew my mind is that SFTP is actually very much used in big corporations!

                                                                  I recently bought a Brother printer / scanner. The scanner has an option to upload results via sftp, with a web-based GUI for providing both the private key for it to use and the server’s public key. It was very easy to set up to scan things to my NAS, where I wrote a tiny script that uses fswatch to watch for new files and then tesseract to OCR them.

                                                                  I was very happy to see that it supported SFTP. The last printer / scanner combo thingy I bought could talk FTP or SMB, but a weird version of SMB that didn’t seem to want to talk to Samba.

                                                                  1. 2

                                                                    The product made by the company I work for handles a lot of data being transferred in flat files. Many customers have “security checklists” that identified FTP as an insecure protocol and recommended SFTP instead.

                                                                    I used to mock file based data transfer but compared to stuff like getting data via JSON APIs they have a lot of life in them still…

                                                                    1. 2

                                                                      You mention JSON APIs; but you can have JSON APIs over SFTP, so I guess you meant REST APIs instead.

                                                                      As far as I understand, the main issue with file-based data transfer over SFTP is that there’s no built-in way to signal that an upload is complete.

                                                                      E.g., if client 1 uploads a file to the server for processing, how does the server know the upload is complete?

                                                                      This is often worked around by changing the name of the file (using the SFTP rename command), uploading a hash alongside it, making the file name the hash, etc. All of this is pretty clumsy compared to how HTTP handles it.
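A minimal sketch of the rename workaround, using a local directory to stand in for the server's upload area. It assumes both sides agree on a hypothetical `.part` suffix for in-flight files; over SFTP the final step would be the protocol's rename request rather than `os.replace`.

```python
import os
from pathlib import Path
from typing import List

PARTIAL_SUFFIX = ".part"  # hypothetical convention both sides agree on


def upload(dest_dir: Path, name: str, data: bytes) -> Path:
    """Write under a temporary name, then rename once every byte is on disk."""
    partial = dest_dir / (name + PARTIAL_SUFFIX)
    final = dest_dir / name
    with open(partial, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # make the data durable before publishing it
    os.replace(partial, final)  # atomic rename within one filesystem on POSIX
    return final


def ready_files(dest_dir: Path) -> List[str]:
    """Consumer side: only pick up files that no longer carry the partial suffix."""
    return sorted(
        p.name
        for p in dest_dir.iterdir()
        if p.is_file() and not p.name.endswith(PARTIAL_SUFFIX)
    )
```

The consumer never sees a half-written `report.csv`, because that name only appears once all the bytes are there.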

                                                                      1. 2

                                                                        Correct, I meant REST APIs (often returning JSON, but can return XML too).

                                                                        There are a lot of issues with file-based transfer, including completeness (which can be mitigated by including a defined footer/end-of-file marker), file names, unannounced format changes and so on.
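One way the footer idea could look, sketched with a made-up trailer line `TRAILER|<record count>|<SHA-256 of the body>`; real feeds define their own layout. The hash here only detects truncation or corruption, it is not an authentication mechanism.

```python
import hashlib
from typing import List

TRAILER_PREFIX = b"TRAILER|"  # hypothetical footer layout


def build_file(records: List[bytes]) -> bytes:
    """Newline-terminated records followed by a trailer with count and body hash."""
    body = b"".join(r + b"\n" for r in records)  # records must not contain b"\n"
    digest = hashlib.sha256(body).hexdigest().encode()
    return body + TRAILER_PREFIX + str(len(records)).encode() + b"|" + digest + b"\n"


def is_complete(data: bytes) -> bool:
    """True only if the last line is a trailer whose count and hash match the body."""
    if not data.endswith(b"\n"):
        return False
    head, sep, last = data[:-1].rpartition(b"\n")
    body = head + sep  # re-attach the newline that ended the final record
    if not last.startswith(TRAILER_PREFIX):
        return False
    fields = last[len(TRAILER_PREFIX):].split(b"|")
    if len(fields) != 2 or not fields[0].isdigit():
        return False
    count, digest = int(fields[0]), fields[1]
    return (body.count(b"\n") == count
            and hashlib.sha256(body).hexdigest().encode() == digest)
```

A file cut off anywhere, including exactly at a record boundary, fails the check because the trailer is missing or no longer matches.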

                                                                        But you can shuffle a lot of data in a short time by zipping files, the transfers can be batched, and the endpoint generally doesn’t need a ton of authentication infra to ensure that unauthorized access is prevented etc. Push vs. Pull.

                                                                        In the long run, returning data over an API endpoint is The Future, but SFTP is basically a small upgrade to FTP that enables transport security without a ton of other changes.

                                                                        1. 1

                                                                          It’s a bit unclear here if you’re talking about SFTP or FTPS…

                                                                          1. 2

                                                                            SFTP.

                                                                            I don’t mean it’s a drop-in replacement, but as a part of a system where you have 2 systems communicating using files, updating the transport mechanism from FTP to SFTP is a small step compared to converting the entire chain to an API-based solution.

                                                                      2. 1

                                                                        What bothers me about SFTP compared to FTPS (as a replacement for FTP) is that you need to allow SSH traffic from your client to your server. It also means providing a real account for the client on the machine, while FTPS is just as secure, can make use of virtual accounts, and uses a different port than SSH by default.

                                                                        1. 2

                                                                          There’s nothing about the SFTP protocol that doesn’t allow for virtual users or other port numbers.

                                                                          1. 1

                                                                            Sure, the protocol allows it, but as far as I know, OpenSSH doesn’t support virtual users. So you’d need to install another server (say vsftpd), and at that point, why would you run sftp over ftps?

                                                                      3. 1

                                                                        Yes, I work in the data space and sftp connectors usually come up right after cloud stores. A lot of companies use it; it is even supported by Hadoop. It seems to have replaced ftp/nfs in a lot of corporations.

                                                                      4. 2

                                                                        I think scp was basically rcp over ssh rather than rsh/rlogin.

                                                                        1. 17

                                                                          Emacs in Guile.

                                                                          I know there was the work in ~2014, but being able to run things that were multi-threaded, have a much simpler language, and support the guile ecosystem where emacs libraries would just be guile libraries makes my heart happy.

                                                                          Doesn’t make a lot of sense to actually do, and I understand the limitations.. but whoooo boy have I been lusting for this for years and years.

                                                                          1. 4

                                                                            Why Guile instead of Common Lisp? Elisp is far closer to Lisp than to Scheme; Lisp has multiple compatible implementations while Guile (and every other useful Scheme) is practically incompatible with any other useful Scheme, because the RnRS series, even R6RS, tend to underspecify.

                                                                            GNU’s focus on Scheme rather than Common Lisp for the last 20 years has badly held it back. Scheme is great for teaching and implementation: it is simple and clean and pure. But it is too small, and consequently different implementations have to come up with their own incompatible ways of doing things.

                                                                            While the Common Lisp standard is not as large as I would like in 2021, it was considered huge when it came out. Code from one implementation is compatible with that of others. There are well-defined places to isolate implementation-specific code, and compatibility layers for, e.g., sockets and threading, exist.

                                                                            1. 2

                                                                              I’d also love to see a Common Lisp Emacs. One day, I’m hoping that CLOS-OS will be usable, and using it without Emacs is kind of unthinkable.

                                                                              1. 1

                                                                                Why Guile instead of Common Lisp?

                                                                                Because it’s my favorite flavor of scheme, and I enjoy programming in scheme much more than common lisp. I do not like lisp-2’s, but I do like defmacro (gasp) so who knows.

                                                                                Emacs being rewritten in common lisp would also be awesome.

                                                                              2. 1

                                                                              I’m going to go more heretical: emacs with the core in Rust and TypeScript as the extension language. More tooling support, more libraries, more effort into optimizing the VM.

                                                                              Lisps are alright, but honestly I just don’t enjoy them. There’s that maxim about how you have to be twice as clever to debug code as you do to write it, so any code that’s as clever as you can make it is undebuggable.

                                                                                1. 1

                                                                                  this exists doesn’t it?

                                                                                  1. 1

                                                                                    The last implementation work was ~2015 according to git.

                                                                                    If you know of this existing, let me know!

                                                                                  2. 1

                                                                                    Multiple times I’ve considered, and then abandoned, taking microemacs and embedding a Scheme in it. The reality is that it’s basically a complete rewrite, which just doesn’t seem worth it…. And you lose all compatibility with the code I use today.

                                                                                    Guile-emacs, though, having multiple language support, seems to have a fighting chance at a successful future, if only it was “staffed” sustainably.

                                                                                    1. 1

                                                                                      Multiple times I’ve considered, and then abandoned, taking microemacs and embedding a Scheme in it.

                                                                                      Isn’t that basically edwin?

                                                                                      1. 1

                                                                                        I mean, by the reductivist’s view, yes. Edwin serves a single purpose of editing scheme code with an integrated REPL. You could, of course, use it beyond that, but that’s not really the goal of it, so practically no one does (I am sure there are some edwin purists out there).

                                                                                        My interest in this as a project is more as a lean, start from scratch standpoint. I wonder what concepts I’d bring from emacs over. I wonder if I’d get used to using something like tig, instead of magit. I wonder if the lack of syntax highlighting would actually be a problem… The reason I’ve never made a dent in this is because I don’t view reflection of how I use things like this as deeply important. They’re tools…

                                                                                  1. 7

                                                                                    A GUI toolkit that is as easy to use as HTML but as memory-efficient as native.

                                                                                    1. 5

                                                                                      For some reason I’m reminded of XUL. And I think I just heard a Firefox developer cry out in terror.

                                                                                      1. 2

                                                                                        the big question there would be what features do you consider essential from html?

                                                                                        1. 2

                                                                                          This is an idea I’ve been toying with for some time. Basically an HTML rendering engine meant for GUIs, like Sciter. But instead of Javascript, you’d control everything using the host language (probably Rust). If you’d want JS you’d have to somehow bind that to Rust.

                                                                                          I think this could really work out. However: I’ve dealt with XML and HTML parsers for years in the past, and I’m not sure I’m ready yet to dive into the mess that is HTML again.

                                                                                        1. 15

                                                                                          Rewrite the whole Unix tool space to emit and accept ND-JSON instead of idiosyncratic formats.
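To make the idea concrete: ND-JSON (newline-delimited JSON) is one compact JSON object per line, so line-oriented tools keep working on it. A minimal Python sketch of emitting and consuming it; the `name`/`size` fields in the check below are hypothetical.

```python
import json
from typing import Iterable, Iterator


def to_ndjson(records: Iterable[dict]) -> str:
    """Serialize each record as one compact JSON object per line."""
    return "".join(json.dumps(r, separators=(",", ":")) + "\n" for r in records)


def from_ndjson(lines: Iterable[str]) -> Iterator[dict]:
    """Parse ND-JSON line by line, skipping blank lines."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)
```

Because `json.dumps` escapes embedded newlines as `\n`, a record can never spill onto a second line, which is what lets a consumer stream `sys.stdin` through `from_ndjson` one record at a time.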

                                                                                          1. 5

                                                                                            I’m a HUGE fan of libucl. It supports several constructs, including JSON.

                                                                                            1. 3

                                                                                              I was thinking YAML in that it’s still human readable at the console but also obviously processable, and JSON is a subset.

                                                                                              1. 7

                                                                                                YAML is a terrible format that should literally never be used. :-) There’s always a better choice than YAML.

                                                                                                1. 3

                                                                                                  Strict yaml, without the stupid gotchas

                                                                                                  1. 2

                                                                                                    I don’t think yaml is the best choice for this. Most of the time users will want to see tables rather than a nested format like yaml. I guess it is a bit nicer to debug than JSON, but ideally the user would never see it. If it was going to hit your terminal, it would be rendered for human viewing.

                                                                                                    Yaml is also super complex and has a lot of extensions that are sparsely supported. JSON is a much better format for interop.

                                                                                                    On the other hand the possibility of passing graphs between programs is both intriguing and terrifying.

                                                                                                    1. 3

                                                                                                      I was recently trying to figure out how, from inside a CLI tool I’m building, to determine whether a program was outputting to a screen for a user to view, or to a pipe for another program to consume… Turns out it’s not as straightforward as I thought. I do believe the modern Rust version of cat, bat, can do this. Because my thought is… why not both?
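In Python the usual check is `isatty()` on the output stream (the C equivalent is `isatty(STDOUT_FILENO)`). A minimal sketch of the “why not both” behavior: an aligned table when a human is watching, ND-JSON when the output is piped or redirected. The `emit` helper and its table layout are hypothetical.

```python
import json
import sys
from typing import List, Optional, TextIO


def emit(rows: List[dict], out: Optional[TextIO] = None) -> None:
    """Aligned table for a human at a terminal, ND-JSON for a pipe or file."""
    out = out if out is not None else sys.stdout
    if not rows:
        return
    if not out.isatty():
        # Machine consumer: one JSON object per line
        for row in rows:
            out.write(json.dumps(row) + "\n")
        return
    # Human at a terminal: pad each column to its widest cell
    cols = list(rows[0])
    widths = {c: max(len(c), *(len(str(r.get(c, ""))) for r in rows)) for c in cols}
    lines = [cols] + [[str(r.get(c, "")) for c in cols] for r in rows]
    for cells in lines:
        out.write("  ".join(v.ljust(widths[c]) for v, c in zip(cells, cols)).rstrip() + "\n")
```

Running the tool as `mytool | jq .` would then get ND-JSON, while running it bare in a terminal gets the table.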

                                                                                                  2. 1

                                                                                                    this is a good idea and wouldn’t even be that much work

                                                                                                  1. 4

                                                                                                    This is why I like lobsters, up until recently (with medium and things being posted more often) most sites posted here didn’t have these problems.

                                                                                                    It’s very annoying how badly my google search results are polluted with sites like these now.

                                                                                                    1. 51

                                                                                                      I’m prepping for a work meeting in an hour I can’t miss, but I’ve taken the afternoon off to rush finishing the fix for this. @355e3b has free time before then and has jumped in to help. Sorry to be vague about the in-progress security issue, but I trust y’all understand how that goes. I’ll leave a response to this comment in a few hours with full info.

                                                                                                      In the meantime I’ve made the rate limiting much stricter. I’m sorry for the inconvenience and will revert as part of the fix.

                                                                                                      1. 40

                                                                                                        So, the thing I was talking around is that the vulnerability was potentially much worse, and we were trying to be thorough about it. Hunter (@355e3b) and I dropped as much as we could to finish that work today.

                                                                                                        To start, there was some kibitzing in chat, so to be explicit: we do think this was potentially a viable attack. An attacker could register a VPS at Digital Ocean to minimize network jitter, the dominant factor in this attack. (Other things that might introduce jitter can multiply the cost of the attack but don’t prevent it.) We already see regular probing from other DO VPSs.

                                                                                                        A few weeks before soatok’s report, some of that probing prompted me to implement rate limiting to deal with unrelated abuse, and that rate limiting also prevented effective exploitation of this potential vulnerability. It was easy to extend to password resets. An attacker who can only test four tokens per minute isn’t going to get very far.
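For illustration only (this is not the Lobsters implementation, which the post doesn’t show): a sliding-window limiter that allows 4 attempts per 60 seconds per key, the rate mentioned above. The injectable `clock` exists so the window can be exercised without waiting.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each key (e.g. client IP)."""

    def __init__(self, limit: int = 4, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        self.hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self.clock()
        q = self.hits[key]
        # Forget accepted requests that have aged out of the window
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.limit:
            return False  # rejected attempts are not recorded
        q.append(now)
        return True
```

At that rate a single client gets at most 4 × 60 × 24 = 5,760 guesses in a 24-hour reset window. Not recording rejected attempts means a client that keeps hammering is let back in once its oldest accepted attempt ages out; counting rejections too would be a stricter choice.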

                                                                                                        Since this was reported, I’ve been concerned that this kind of network timing attack could be used against other tokens. The most important of these is called session_token and is used to authenticate logged-in users on every request. Because it is checked on every endpoint and not just in the password reset flow, it would not be subject to the rate limit of 4 req/min and could lead to account takeover. We assumed this was a viable attack and Hunter wrote an in-depth mitigation for it.

                                                                                                        This afternoon I worked on finishing the mitigation and testing the attack, which required a bigger block of time than I’d previously been able to devote to it. I was unable to get the timing attack to work against session_token (usually pretty straightforward on localhost!) and pulled my hair out for a bit. Eventually I recognized that the attacker can’t execute the attack against session_token because Rails stores the session as an encrypted, serialized value in the cookie. The session cookie isn’t the value of the session_token; it’s an encrypted blob that can’t be edited by an attacker without being invalidated. It’s a little embarrassing that I conflated the two, but I don’t much mind making an error in the direction of being over-cautious.
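A simplified sketch of why an edited cookie gets rejected before any token comparison happens. Rails encrypts the cookie (a later reply notes AES-256-GCM); this stand-in only signs it with HMAC-SHA256 to show the tamper-detection part, does no encryption, and uses a hypothetical `payload.tag` format rather than Rails’ real one.

```python
import base64
import hashlib
import hmac
import json
import secrets
from typing import Optional

SECRET_KEY = secrets.token_bytes(32)  # hypothetical server-side secret


def _tag(payload: bytes, key: bytes) -> bytes:
    return base64.urlsafe_b64encode(hmac.new(key, payload, hashlib.sha256).digest())


def dump_session(session: dict, key: bytes = SECRET_KEY) -> str:
    """Serialize the session and append an HMAC tag as payload.tag."""
    payload = base64.urlsafe_b64encode(json.dumps(session).encode())
    return (payload + b"." + _tag(payload, key)).decode()


def load_session(cookie: str, key: bytes = SECRET_KEY) -> Optional[dict]:
    """Return the session, or None if the cookie is malformed or was edited."""
    parts = cookie.encode().split(b".")
    if len(parts) != 2:
        return None
    payload, tag = parts
    if not hmac.compare_digest(tag, _tag(payload, key)):
        return None  # any change to the payload lands here
    return json.loads(base64.urlsafe_b64decode(payload))
```

An attacker who swaps in a guessed token value can’t produce a matching tag without the key, so the server never compares an attacker-chosen token against the real one, and there is no timing signal to measure.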

                                                                                                        In short, I was uncommunicative because I’d been proceeding the last couple weeks on a much more paranoid footing than was needed. My thanks and apologies to Hunter for his wasted effort; I’m sorry I didn’t have the time to write a proof-of-concept to test this first.

                                                                                                        The other thing I’ve been doing the last few weeks has to do with the “sister sites” that have adopted the Lobsters codebase. Lobsters is a site that’s open source for transparency, not to help start and run other sites. It’s really rewarding to see the code used this way, but nobody has the spare attention to run that potential project. Mostly this means we don’t take feature requests, but in this case it also means we don’t have a way to contact the site admins. There is no mailing list or forum, even for security issues. I’ve been in the process of finding admin emails and collecting translators (many sister sites aren’t in English) so we could notify them before the vulnerability was published. If you recently got a DM from me asking you to play translator, you can ignore it, I’m proceeding from here in English for urgency’s sake.

                                                                                                        Now that I’m confident there isn’t a worse vulnerability I’ve reverted the temporary stricter rate limiting. I’m sorry to anyone I inconvenienced, I know if you read the site by opening several stories as tabs you probably hit this.

                                                                                                        I think all the exciting bits are over, but am happy to be corrected if any volunteer spots something I missed, or any other issue. My email is on my profile.

                                                                                                        My thanks to soatok for alerting us to the potential vulnerability.

                                                                                                        1. 8

                                                                                                          Thanks as always for the transparency and also the significant effort you continue to put into the site.

                                                                                                          1. 5

                                                                                                            To add to this, the encryption is AES-256-GCM and the underlying implementation is OpenSSL so there’s no timing attack on the encryption either.

                                                                                                            1. 5

                                                                                                              As someone who runs an (intentionally-single-threaded, but multithreaded in prefetching upstream stories) fetch loop before skimming all the stuff in a text editor, a separate thanks for the transparent and friendly throttling implementation. To say the least, not all emergency-throttling implementations are so clear with the exact changes in the fetcher one needs to do. Thanks.

                                                                                                              1. 2

                                                                                                                I know if you read the site by opening several stories as tabs you probably hit this.

                                                                                                                I did indeed hit this! I thought it was strange, and just figured it must have been my fault and I must have had a VPN going that was using a popular IP. Didn’t think about it much but glad you explained it.

                                                                                                                1. 1

                                                                                                                  I was unable to get the timing attack to work against session_token (usually pretty straightforward on localhost!) and pulled my hair out for a bit

                                                                                                                  Did you implement the timing attack described in the article and have success with it?

                                                                                                                  1. 3

                                                                                                                    No. I took it as plausible but easily addressed with rate limiting. I only tried to repro the more serious attack I feared against session_token.

                                                                                                              1. 3

                                                                                                                A little late to the party, but can’t you just sleep(rand() % 100) to introduce a random delay of up to, say, 100ms in any security-sensitive context? It doesn’t eliminate the ability to use timing attacks given the uniform distribution, but it dramatically increases the number of samples you would need to actually get meaningful results. If we’re talking password resets, you could delay up to 500ms without the user even noticing (especially when redirected from their mail client), and I’d imagine that would require more samples than you could afford given the 24 hour password reset window.

                                                                                                                Or just reduce the password reset window to 10 minutes or something.

                                                                                                                1. 2

                                                                                                                  There are 3 variables in a timing attack: delta (the thing you’re trying to measure), noise, and query rate (how often you can hit the site to take a sample). In theory, noise follows some kind of distribution with a mean around 0, so you can always get your measurement by performing enough queries and averaging to smooth out the noise. (In practice things are not this simple.)

                                                                                                                  In this situation (lobste.rs), delta is in the range of nanoseconds and noise is probably in the range of 10s of milliseconds, which means you will need to perform an absolutely enormous number of queries to smooth out the noise and accurately measure the timing side channel. You’re proposing to increase the noise, which will (gigantically) increase the number of queries needed to measure delta. Divide the number of queries needed by the rate at which you can hit the site to find out how long the attack takes. The attack needs to be done in under 24 hours.
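A back-of-the-envelope version of that arithmetic, assuming independent noise with standard deviation σ and an attacker comparing the mean response times of two candidate guesses with n samples each. The difference of the two means has standard error σ·√(2/n), so resolving a gap δ at z standard errors needs n ≥ 2(zσ/δ)². The 10 ns and 20 ms figures below are illustrative assumptions, not measurements of lobste.rs.

```python
import math


def samples_per_candidate(delta_s: float, noise_sd_s: float, z: float = 3.0) -> int:
    """n such that a mean difference of delta sits z standard errors above the noise."""
    return math.ceil(2 * (z * noise_sd_s / delta_s) ** 2)


def attack_seconds(delta_s: float, noise_sd_s: float, requests_per_s: float,
                   z: float = 3.0) -> float:
    """Wall-clock time for one two-candidate comparison (2n requests) at a given rate."""
    return 2 * samples_per_candidate(delta_s, noise_sd_s, z) / requests_per_s


def uniform_delay_sd(max_delay_s: float) -> float:
    """Standard deviation of a uniform random delay on [0, max_delay_s)."""
    return max_delay_s / math.sqrt(12)


def combined_sd(*sds: float) -> float:
    """Independent noise sources add in variance, not in standard deviation."""
    return math.sqrt(sum(s * s for s in sds))
```

With δ = 10 ns, σ = 20 ms and z = 3, `samples_per_candidate(1e-8, 0.02)` is about 7.2 × 10^13 requests per candidate, far beyond 24 hours at 4 req/min. Under the same assumptions, a uniform [0, 100 ms) delay (σ ≈ 28.9 ms) on top of 20 ms of noise raises the combined σ to about 35.1 ms, i.e. roughly 3.1× more samples; how much the added jitter buys depends on the baseline noise.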

                                                                                                                  So this idea of adding randomness could be a valid solution, but it needs to be quantified to know whether it will work. We also do not know whether the attack is possible at all, so it’s very hard to say whether adding 100ms of noise even matters. It’s much less fragile to fix timing side channels in general by using constant time comparisons/algorithms.
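Python’s standard library provides `hmac.compare_digest` for exactly this. A minimal sketch for a password-reset token, assuming the server stores only a SHA-256 digest of the token (a common extra layer, not necessarily what Lobsters does): even if the comparison leaked timing, it would leak bits of the digest, which don’t help reconstruct the token.

```python
import hashlib
import hmac
import secrets
from typing import Tuple


def new_reset_token() -> Tuple[str, str]:
    """Return (token to email to the user, SHA-256 hex digest to store)."""
    token = secrets.token_urlsafe(32)
    return token, hashlib.sha256(token.encode()).hexdigest()


def token_matches(presented: str, stored_digest: str) -> bool:
    """Hash the presented token, then compare digests without an early exit."""
    presented_digest = hashlib.sha256(presented.encode()).hexdigest()
    # Unlike ==, compare_digest does not stop at the first differing character
    return hmac.compare_digest(presented_digest, stored_digest)
```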

                                                                                                                  1. 2

                                                                                                                    this means you will need to perform an absolutely enormous number of queries to smooth out the noise and perform an accurate measurement of the timing side channel

                                                                                                                    Yes; that’s the point I was trying to get across - thanks.

                                                                                                                    It’s much less fragile to fix timing side channels in general by using constant time comparisons/algorithms.

                                                                                                                    I don’t know. I think it’s actually the more fragile but theoretically optimal approach to resisting timing attacks while my suggestion is the “huge bandaid” overkill approach that makes you want to roll your eyes and burn the resulting code because it’s clearly suboptimal but probably gives better real world results.

                                                                                                                    By real world results I mean that so-called constant time operations are only constant time on paper and are a research paper or two away from being cracked. They only consider the algorithm in question and a few “known unknowns” but in the real world we have to deal with things like overzealous compilers, speculative execution, cache locality, side channels up and down both the software and hardware stacks, etc any of which might very plausibly one day be shown (if not already privately known) to have a partiality to produce - on some micro level - a linearly converging preference for one codepath over another.

                                                                                                                    What I mean to say is, in any library I write I’m going to definitely take the “smart” approach and convince myself that I’ve done everything right and I’m using all the right algorithms for all the steps and have arrived at an optimally impartial validation function/entrypoint to serve as my oracle without sacrificing one nanosecond more than I had to in order to be resistant to timing attacks, but if tomorrow someone were stupid enough to hold a gun to my head and put me in charge of creating an (online or offline) frontend to some country’s nuclear controls, theoretically optimum code is going out the window: I’m using constant time algorithms but you bet I’m adding a random delay to AuthenticateIcbmCredentials().
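A sketch of that belt-and-braces approach: a constant-time check followed by a random delay on both the success and failure paths. `check_with_jitter` is a hypothetical name, `secrets.randbelow` stands in for `rand() % 100` to avoid modulo bias, and `sleep` is injectable so the delay can be observed in tests. As the earlier reply points out, the jitter raises the number of samples an attacker needs rather than removing the leak.

```python
import hmac
import secrets
import time
from typing import Callable


def check_with_jitter(presented: bytes, expected: bytes, max_delay_ms: int = 100,
                      sleep: Callable[[float], None] = time.sleep) -> bool:
    """Constant-time comparison, then a uniform random delay in [0, max_delay_ms) ms."""
    ok = hmac.compare_digest(presented, expected)
    # Delay on every path so success and failure are padded the same way
    sleep(secrets.randbelow(max_delay_ms) / 1000)
    return ok
```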

                                                                                                                1. 2

                                                                                                                  poc?

                                                                                                                  edit: title was edited to state that the attack was hypothetical.

                                                                                                                  1. 7

                                                                                                                    From the article:

                                                                                                                    Note: I’m not going to implement this attack publicly, because I don’t want to arm script kiddies with another exploit tool that they don’t deserve and will only use to cause harm to the Internet. (Or, more likely, DoS servers while failing to actually leak anything.)

                                                                                                                    1. 3

                                                                                                                      so it was never checked if this vulnerability can be exploited in practice?

                                                                                                                      1. 27

                                                                                                                        It’s pretty hard to do. The amount of traffic that you’d need to send to lobste.rs to do the attack is well into the ‘not polite’ level, even if you’re trying to attack your own account. If you try to recreate the environment locally and attack it then you’re probably going to end up with a slightly easier environment to attack than the real deployment.

                                                                                                                        That said, reading the disclosure, there’s nothing in there that hasn’t been exploited in the wild in other contexts so I have no trouble believing @soatok’s assessment.

                                                                                                                        1. 1

                                                                                                                          If you try to recreate the environment locally and attack it then you’re probably going to end up with a slightly easier environment to attack than the real deployment.

                                                                                                                          yes, you would generally do this to validate the finding.

                                                                                                                          1. 2

                                                                                                                            And this is the point where I think you’re arguing in bad faith: the whole article debates how realistic this is, and we’ve had plenty of timing-based attacks in recent years. Why would you need a PoC to prove something that is, so to speak, “trivial to prove” at this point?

                                                                                                                            1. 3

                                                                                                                              No, I’m not arguing in bad faith. What did I write that makes you think I am?

                                                                                                                              There are variables involved in whether a timing attack can be mounted. You use a proof of concept to prove that it is actually exploitable.

                                                                                                                        2. 20

                                                                                                                          I read it less as “zomg lobsters is insecure!!!” and more as “using the vulnerability as a springboard to discuss the mechanics of side channel attacks and split tokens.”

                                                                                                                          1. 15

                                                                                                                            40% of the article discusses the practicality of timing attack exploitation in this context.

                                                                                                                      1. 5

                                                                                                                        I’m really skeptical, for two reasons that were already mentioned in the article.

                                                                                                                        First, the article links to a 2009 study about using timing attacks to obtain secrets over the Internet. But, most of that study is trying to determine a secret with tens of microseconds of processing time. Here, since the string fits on one cache line, we’re measuring CPU performance (not memory performance) and need to observe a difference in tens of nanoseconds of processing time.

                                                                                                                        Pinging lobste.rs shows round-trip times that vary by several milliseconds from one ping to the next. At a minimum, this implies that many attempts per comparison are needed just to establish what that particular comparison costs.

                                                                                                                        64 bytes from 67.205.128.5: icmp_seq=1 ttl=53 time=85.6 ms
                                                                                                                        64 bytes from 67.205.128.5: icmp_seq=2 ttl=53 time=84.8 ms
                                                                                                                        64 bytes from 67.205.128.5: icmp_seq=3 ttl=53 time=91.3 ms
                                                                                                                        64 bytes from 67.205.128.5: icmp_seq=4 ttl=53 time=82.4 ms
                                                                                                                        64 bytes from 67.205.128.5: icmp_seq=5 ttl=53 time=85.3 ms
                                                                                                                        64 bytes from 67.205.128.5: icmp_seq=6 ttl=53 time=83.4 ms
                                                                                                                        

                                                                                                                        Second, the article correctly points out that the implementation of memcmp matters a lot. I’m really shocked (disappointed?) at the glibc implementation. Microsoft’s uses the native machine word size (so 64 bits) but also unrolls the loop to compare multiple words per iteration. This means the search space per compare goes up, and the timing difference at a compare boundary goes way down. I really hope the compiler is smart enough to unroll some of glibc’s loop here. The most visible performance difference shouldn’t come from the result of the compare; it should come from the memory alignment of variables and where the cache line breaks land.
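To make the point about the result of the compare concrete, here is a hedged Python sketch contrasting a comparison that stops at the first differing byte (the shape of a simple byte-at-a-time memcmp loop, whose running time reveals how long the matching prefix is) with the common constant-time pattern of OR-accumulating the XOR of every byte pair. Python makes no real timing guarantees, so this shows only the structure; in Python code you would use `hmac.compare_digest`. All function names here are hypothetical.

```python
def early_exit_equal(a: bytes, b: bytes) -> bool:
    """Returns at the first differing byte: elapsed time leaks the matching prefix length."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False
    return True


def accumulate_equal(a: bytes, b: bytes) -> bool:
    """Touches every byte pair regardless of where the inputs differ."""
    if len(a) != len(b):
        return False  # lengths are usually public anyway (e.g. fixed-size tokens)
    diff = 0
    for x, y in zip(a, b):
        diff |= x ^ y  # stays 0 only if every byte pair is equal
    return diff == 0


def bytes_examined(a: bytes, b: bytes) -> int:
    """How many byte pairs the early-exit loop inspects before it stops."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i + 1
    return min(len(a), len(b))
```

An attacker who can measure `bytes_examined` indirectly, through timing, can recover a secret one byte at a time; wider word-sized or SIMD compares shrink that signal because a whole word is checked per step.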

                                                                                                                        I think what I got from this article is that if I wanted to attack a cloud-hosted service, the first step would be to create a pile of VMs with the same provider in the same region and test ping times between each VM and the victim until I got as close to the target as possible. The closer I can get, the more attempts I can make in any given time and the greater the potential to detect variance. For that reason, code running on a cloud provider really needs to assume it’s sharing a very fast LAN with hostile actors.

                                                                                                                        1. 3

                                                                                                                          The linked memcmp is, I expect, an architecture-agnostic fallback. The primary memcmp implementations are written in assembly and do use unrolled loops and large word sizes. Here’s one of the amd64 versions, for instance.


                                                                                                                          Pinging lobste.rs shows that I get milliseconds of difference each time. At a minimum, this implies that there need to be many attempts per comparison to just establish what the cost of that particular comparison is

                                                                                                                          I think what I got from this article is if I wanted to attack a cloud hosted service, the first step is to create a pile of VMs with the same provider in the same region

                                                                                                                          Here are ping times from one server to another that’s physically proximal:

                                                                                                                          round-trip min/avg/max/std-dev = 1.568/4.048/6.924/0.481 ms
                                                                                                                          

                                                                                                                          Hell, here are ping times between a VM and its host:

                                                                                                                          round-trip min/avg/max/stddev = 0.117/0.525/2.257/0.188 ms
                                                                                                                          

                                                                                                                          I’d be the last person to say you can’t carry out this attack, and the protections are definitely worth applying, but it seems highly impractical.

                                                                                                                          1. 2

                                                                                                                            Thanks for the memcmp link, that makes much more sense. Further poking around shows the generic amd64 version is here, and it uses SSE2 (which is always present on amd64) to implement 128-bit compares. So I think that really kills the feasibility of this attack.

                                                                                                                            Thinking more about it, I mentioned in the earlier post that memory/variable layout would cause more jitter than the result of the compare, but that layout affects all the logic involved in processing the request, not just the compare itself. I therefore think there’s basically guaranteed to be more per-request jitter than this compare could reveal. Given a large enough number of comparisons it might still be detectable, but that puts us in the territory of trying each possible 128-bit value 10k times in a 24-hour period.

                                                                                                                            1. 2

                                                                                                                              A 128-bit window absolutely mitigates this kind of attack, given sufficient entropy in each byte being compared. (If the strings were encoded as a series of 0 and 1 characters, this would reduce to 16 bits of actual entropy, which is a sample size of 65536.)
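The arithmetic in that parenthetical, as a quick Python check: a 128-bit window covers 16 bytes, so the number of values it can hold is (distinct symbols per byte) raised to the 16th power. The hex-encoded and raw-byte cases are added here only for comparison.

```python
import math

WINDOW_BYTES = 128 // 8  # a 128-bit SSE2 compare covers 16 bytes at once


def candidates(symbols_per_byte: int) -> int:
    """Distinct values one 128-bit window can take."""
    return symbols_per_byte ** WINDOW_BYTES


def entropy_bits(symbols_per_byte: int) -> float:
    """Entropy of the window, assuming each byte is uniformly random over its symbols."""
    return WINDOW_BYTES * math.log2(symbols_per_byte)


print(candidates(2), entropy_bits(2))  # only '0' and '1' characters: 65536 16.0
print(entropy_bits(16))                # hex characters: 64.0
print(entropy_bits(256))               # uniformly random raw bytes: 128.0
```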

                                                                                                                          2. 1

                                                                                                                            Yes, for a timing attack you would generally need to query a large number of times and average the results to smooth out the noise and detect a difference.

                                                                                                                            Sometimes this isn’t possible, if the noise dominates too hard with respect to the number of queries you can make. nanoseconds (10^-9) vs milliseconds (10^-3) sounds like one of these situations.
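A back-of-envelope Python sketch of why that ratio is so punishing. It assumes independent Gaussian jitter with standard deviation σ per request and asks how many samples per candidate are needed before a true difference δ between two mean timings spans z standard errors, which gives n = 2(zσ/δ)². The σ values are the std-dev figures from the ping output quoted earlier in the thread (0.481 ms and 0.188 ms); δ = 10 ns and z = 2 are illustrative assumptions. Real attacks use better statistics than plain means (for example, filtering on low percentiles), so treat this as an order-of-magnitude feel, not a bound.

```python
import math


def samples_needed(sigma_s: float, delta_s: float, z: float = 2.0) -> int:
    """Samples per candidate so that a mean difference of delta_s spans z standard errors.

    Assumes independent Gaussian jitter with standard deviation sigma_s per request.
    The standard error of a difference of two means over n samples each is
    sigma_s * sqrt(2 / n); solving delta_s = z * sigma_s * sqrt(2 / n) for n gives
    n = 2 * (z * sigma_s / delta_s) ** 2.
    """
    return math.ceil(2 * (z * sigma_s / delta_s) ** 2)


DELTA = 10e-9  # assumed per-step timing difference: 10 ns
for label, sigma in [("nearby server", 0.481e-3), ("VM to host", 0.188e-3)]:
    print(f"{label}: about {samples_needed(sigma, DELTA):.1e} samples per candidate")
```

Even with the VM-to-host figure, the estimate lands in the billions of requests per candidate value, which is the quantitative version of “the noise dominates”.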

                                                                                                                          1. 3

                                                                                                                            Wonderful post!

                                                                                                                            I’ve heard of some places issuing edicts that config files must use properly terminated formats like JSON instead of something like YAML, so that truncated documents can’t be valid. Would that also be a useful consideration for canonicalization?

                                                                                                                            1. 3

                                                                                                                              YAML in general worries me (especially the Norway Problem), and its susceptibility to truncation is noteworthy, but this problem is strictly about how you feed data into your MAC (or equivalent) function rather than a general problem with data truncation.

                                                                                                                              I’m sure there are other, cleverer attacks possible than the simple one I highlighted.
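A minimal Python sketch of the kind of problem meant here, using the standard `hmac` module: if two fields are simply concatenated before MACing, bytes can be shifted from one field to the other without changing the tag. The key and field values are made up for illustration.

```python
import hashlib
import hmac

KEY = b"example key (illustration only)"  # hypothetical key


def naive_tag(associated: bytes, payload: bytes) -> bytes:
    # The boundary between the two fields is not part of the MAC input,
    # so moving bytes from one field to the other leaves the input unchanged.
    return hmac.new(KEY, associated + payload, hashlib.sha256).digest()


original = naive_tag(b"header", b"body")
shifted = naive_tag(b"head", b"erbody")
print(original == shifted)  # True: different field values, identical tag
```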

                                                                                                                              1. 2

                                                                                                                                OK, yeah, I’ve properly woken up now: the issue of scooting data from the encrypted part into the additional data doesn’t get magically fixed just because that data is bounded. Thanks for humoring me :)

                                                                                                                              2. 1

                                                                                                                                A simple way to handle the truncation issue you raised is to ensure that the data being hashed ends with a 0x00 byte (which cannot occur anywhere within the string, so it does not need to be escaped). Then the format itself (JSON, YAML, etc.) does not matter.
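A Python sketch of the terminator approach extended to several fields, assuming every field is a text string that must never contain a NUL character (the helper rejects one rather than trusting this). Because each field is followed by 0x00, shifting bytes across a field boundary changes the MAC input. The helper name `terminated_tag` and the key are hypothetical.

```python
import hashlib
import hmac

KEY = b"example key (illustration only)"  # hypothetical key


def terminated_tag(*fields: str) -> bytes:
    """MAC over UTF-8 fields, each followed by a 0x00 terminator."""
    encoded = bytearray()
    for field in fields:
        raw = field.encode("utf-8")
        # The scheme is only unambiguous if the terminator never occurs inside a field
        if b"\x00" in raw:
            raise ValueError("field contains a NUL byte")
        encoded += raw + b"\x00"
    return hmac.new(KEY, bytes(encoded), hashlib.sha256).digest()


print(terminated_tag("abc", "def") == terminated_tag("ab", "cdef"))  # False
```

Note that the NUL check itself scans the input, which is one reason length prefixes are often preferred in cryptographic encodings.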

                                                                                                                              1. 2

                                                                                                                                  Another example I like is hashing trees: https://en.wikipedia.org/wiki/Merkle_tree

                                                                                                                                  Look down to where they describe “One simple fix”, which prepends a 0x00 or 0x01 byte to signal leaf vs. branch.
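A minimal Python sketch of that fix, following the Wikipedia description: leaf hashes are computed over 0x00 || data and interior nodes over 0x01 || left || right, so an interior node’s preimage can never be passed off as a leaf. SHA-256 and carrying an odd node up unchanged are choices made for this sketch; Certificate Transparency’s tree (RFC 6962) splits unbalanced trees differently.

```python
import hashlib


def leaf_hash(data: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + data).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    if not leaves:
        raise ValueError("need at least one leaf")
    level = [leaf_hash(x) for x in leaves]
    while len(level) > 1:
        nxt = [node_hash(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:  # odd node out is carried up unchanged (a choice of this sketch)
            nxt.append(level[-1])
        level = nxt
    return level[0]


a, b, c, d = b"a", b"b", b"c", b"d"
four = merkle_root([a, b, c, d])
# Attacker presents the two interior-node preimages as if they were leaves
forged = merkle_root([leaf_hash(a) + leaf_hash(b), leaf_hash(c) + leaf_hash(d)])
print(four == forged)  # False thanks to the 0x00/0x01 prefixes
```

Without the prefixes, both calls would hash exactly the same byte strings at every level and produce the same root.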

                                                                                                                                  At its root, the key thing is that your encoding function needs to be injective. Then encode(A) = encode(B) can only ever happen if A = B.

                                                                                                                                  This can be done by escaping (e.g. inserting backslashes), but that is a poor fit for a crypto context because it involves processing the input data, which could leak through a timing side channel. This is why the encoding is done by prefixing lengths instead.
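A Python sketch of length-prefixed encoding, assuming a field count followed by an 8-byte big-endian length before each field (an illustrative layout; real formats differ in the details). Encoding never inspects the field contents, only their lengths, and the decoder shows injectivity directly: every valid byte string parses back into exactly one list of fields.

```python
import struct


def encode(fields: list[bytes]) -> bytes:
    out = [struct.pack(">Q", len(fields))]  # number of fields
    for f in fields:
        out.append(struct.pack(">Q", len(f)))  # length of this field
        out.append(f)
    return b"".join(out)


def decode(blob: bytes) -> list[bytes]:
    (count,) = struct.unpack_from(">Q", blob, 0)
    pos, fields = 8, []
    for _ in range(count):
        (n,) = struct.unpack_from(">Q", blob, pos)
        pos += 8
        if pos + n > len(blob):
            raise ValueError("truncated input")
        fields.append(blob[pos:pos + n])
        pos += n
    if pos != len(blob):
        raise ValueError("trailing bytes")
    return fields


print(encode([b"abc", b"def"]) == encode([b"ab", b"cdef"]))  # False
print(decode(encode([b"ab", b"cdef"])))  # [b'ab', b'cdef']
```

Because `decode(encode(x)) == x` for every list `x`, two different field lists can never encode to the same bytes, which is exactly the property the MAC input needs.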

                                                                                                                                1. 2

                                                                                                                                  First idea that popped into my head is that this is only applicable if the thing in question is constructible. Non-existence is not constructible by its nature: I can prove to you that no decision procedure for the halting problem exists, but I cannot show this constructively.

                                                                                                                                  1. 1

                                                                                                                                      I would really like to learn more about this. Aren’t Coq proofs constructive by nature?

                                                                                                                                    Here is a proof that the halting problem for turing machines is undecidable: https://github.com/uds-psl/coq-library-undecidability/blob/30d773c57f79e1c5868fd369cd87c9e194902dee/theories/TM/TM_undec.v#L8-L12

                                                                                                                                    There are other proofs in that repo that show other problems as being undecidable.

                                                                                                                                    1. 2

                                                                                                                                      Huh, TIL. Indeed, it seems that there exist proofs of the undecidability of the halting problem that are constructive.

                                                                                                                                      ¬ (A ∧ ¬ A) is apparently provable in intuitionistic logic, which suffices for diagonalization arguments:

                                                                                                                                      lemma "¬ (A ∧ ¬ A)"
                                                                                                                                      proof (rule notI)
                                                                                                                                        assume 1: "A ∧ ¬ A"
                                                                                                                                        from 1 have "A" by (rule conjE)
                                                                                                                                        from 1 have "¬ A" by (rule conjE)
                                                                                                                                        from `¬ A` `A` show False by (rule notE)
                                                                                                                                      qed
                                                                                                                                      

                                                                                                                                      The general point still stands though, as there are other examples, such as non-constructive existence proofs.


                                                                                                                                    In general, Coq proofs are not necessarily constructive, since one can add the law of the excluded middle as an axiom in Coq (as is done in e.g. Coq.Classical). I can’t say anything about the proofs you linked, though, as my Coq-fu is very limited.

                                                                                                                                      1. 1

                                                                                                                                      I think that, given the way the definitions expand in that repo, undecidable (HaltTM 1) unfolds to decidable (HaltTM 1) -> decidable (HaltTM 1), which is trivially true. That is, it’s taken as an axiom that the halting problem for Turing machines is undecidable. (I think? I might be misreading.)

                                                                                                                                      2. 2

                                                                                                                                      While Coq proofs are constructive, proving “~ exists t, t determines halting” does not construct any Turing machines. What it constructs is something that takes a hypothetical existence proof and builds a proof of False from it.

                                                                                                                                      That is, in constructive math, ~P is a function P -> False. This function can never actually be invoked, because you can never build a P object.
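The same idea in Lean 4, as a small illustration (Lean rather than Coq, but both rest on the same intuitionistic core): `¬ P` is defined as `P → False`, and a proof of `¬ (A ∧ ¬ A)` is literally a function that takes a hypothetical proof of `A ∧ ¬ A` and applies its second component to its first.

```lean
-- In Lean 4, `Not P` is defined as `P → False`, so this holds by unfolding.
example (P : Prop) : (¬ P) = (P → False) := rfl

-- A constructive proof: a function from a (never constructible) proof of A ∧ ¬A to False.
theorem no_contradiction (A : Prop) : ¬ (A ∧ ¬ A) :=
  fun ⟨a, na⟩ => na a
```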

                                                                                                                                      3. 1

                                                                                                                                        As this post was tagged practices (not math) and the article addresses the difficulty of convincing people, I suspect you’ve misunderstood the author’s use of the phrase “constructive proof.”

                                                                                                                                        Mathematical (constructive proof) vs rhetorical (constructive criticism).

                                                                                                                                        1. 1

                                                                                                                                          I don’t feel like the author talks about constructive criticism at all, how did you come to that conclusion?

                                                                                                                                          1. 1

                                                                                                                                            As this post was tagged practices (not math) and the article addresses the difficulty of convincing people

                                                                                                                                            ☝🏾

                                                                                                                                            1. 2

                                                                                                                                              Just for future reference, “constructive proof” is a term from math and I meant my usage of the term to be an analogy to math. Math also involves convincing people. But maybe I should have tagged it computer science or math, sorry.

                                                                                                                                      1. 2

                                                                                                                                      I’d like to write a blog post I’ve been meaning to write, but I haven’t gotten around to it in the last few weeks. I’m partly unsure about hosting.

                                                                                                                                        1. 2

                                                                                                                                          You could put it anywhere that allows your own domain - if you don’t like the service, just move the content later. But at least the post is out there already.

                                                                                                                                          1. 1

                                                                                                                                            I use bearblog.dev, it’s pretty simple and fast.

                                                                                                                                          1. 5

                                                                                                                                            Judging by the comments here I’m not interested in reading the article.

                                                                                                                                          But why use ls | grep foo at all instead of *foo* as the argument for rm?

                                                                                                                                            1. 6

                                                                                                                                              I was also distracted by using the output of ls in scripting, which is a golden rule no-no.

                                                                                                                                              1. 1

                                                                                                                                                Is this not what ls -D is for?

                                                                                                                                              2. 5

                                                                                                                                              Despite “The UNIX Way” saying that we have all these little composable command line tools that we can interop using the universal interchange language of plaintext, it is also said that we should never parse the output of ls. The reasons for this are unclear to me; patches that would have supported this have been rejected.

                                                                                                                                                Definitely the glob is the right way to do this, and if things get more complex the find command.

                                                                                                                                                1. 5

                                                                                                                                                  “Never parse the output of ls” is a bit strong, but I can see the rationale for such a rule.

                                                                                                                                                  Basically the shell already knows how to list files with *.

                                                                                                                                                for name in *; do  # no external processes started here, just glob()
                                                                                                                                                   echo "$name"    # quoted, so names with spaces or glob characters print intact
                                                                                                                                                done
                                                                                                                                                  

                                                                                                                                                  That covers 90% of the use cases where you might want to parse the output of ls.

                                                                                                                                                  One case where you would is suggested by this article:

                                                                                                                                                  # Use a regex to filter Python or C++ tests, which is harder in the shell (at least a POSIX shell)
                                                                                                                                                  ls | egrep '.*_test.(py|cc)' | xargs -d $'\n' echo
                                                                                                                                                  

                                                                                                                                                BTW I’d say ls is a non-recursive special case of find, and ls lacks -printf for formatting and -print0 for parseable output. It may be better to use find . -maxdepth 1 in some cases, but I’m comfortable with the above.

                                                                                                                                                2. 3

                                                                                                                                                  why use ls | grep foo at all instead of *foo* as the argument for rm

                                                                                                                                                  Almost always, I use the shell iteratively, working stepwise to my goal. Pipelines like that are the outcome of that process.

                                                                                                                                                  1. 2

                                                                                                                                                    I gave an example below – if you want to filter by a regex and not a constant string.

                                                                                                                                                    # Use a regex to filter Python or C++ tests, which is harder in the shell (at least a POSIX shell)
                                                                                                                                                    ls | egrep '.*_test.(py|cc)' | xargs -d $'\n' echo
                                                                                                                                                    

                                                                                                                                                    You can do this with extended globs too in bash, but that syntax is pretty obscure. You can also use regexes without egrep via [[. There are millions of ways to do everything in shell :)

                                                                                                                                                      I’d say that globs and find cover 99% of use cases, but I can see ls | egrep being useful on occasion.

                                                                                                                                                    1. 1

                                                                                                                                                        If normal globs aren’t enough, I’d use extended globs or find. But yeah, compared to plain ls, find needs extra options to skip hidden files and to avoid recursing into subdirectories. If this is needed often, I’d write a function and put it in .bashrc.
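One possible shape for such a .bashrc function, as a sketch: lsre is a made-up name, and it assumes GNU find for -maxdepth, -regextype and -printf. find's -regex has to match the whole path (./name here), so the caller's pattern is implicitly anchored at both ends.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list non-hidden regular files in the current directory
# (no recursion) whose names fully match a POSIX extended regex.
# Assumes GNU find: -maxdepth, -regextype and -printf are extensions.
lsre() {
  # -regex matches the entire path, e.g. ./a_test.py, so prefix the
  # caller's pattern with \./ ; the match is anchored at both ends.
  find . -maxdepth 1 -regextype posix-extended -type f ! -name '.*' \
    -regex "\./$1" -printf '%f\n'
}
```

Usage would be something like lsre '.*_test\.(py|cc)'.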

                                                                                                                                                      That said, I’d use *_test.{py,cc} for your given example and your regex should be .*_test\.(py|cc)$ or _test\.(py|cc)$
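To make the difference concrete, a small sketch that runs both patterns over a fixed list of made-up file names with grep -E (the modern spelling of egrep), plus a reminder that *_test.{py,cc} is brace expansion, which happens before globbing.

```bash
#!/usr/bin/env bash
# Sketch with made-up names; loose and strict are hypothetical helpers.
names=$'a_test.py\nb_test.cc\nc_test.pyc\nd_testXpy\ne.txt'

# Original pattern: the unescaped . matches any character and nothing is
# anchored, so c_test.pyc and d_testXpy slip through as well.
loose()  { grep -E '.*_test.(py|cc)' <<<"$names"; }

# Suggested pattern: literal dot, anchored at the end of the name.
strict() { grep -E '_test\.(py|cc)$' <<<"$names"; }

# Brace expansion runs before globbing: *_test.{py,cc} becomes the two
# globs *_test.py *_test.cc. Shown here on plain words:
echo x_test.{py,cc}   # prints: x_test.py x_test.cc
```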

                                                                                                                                                        I have parsed ls output occasionally too, e.g. -X to sort by extension, -q piped to wc -l to count files, -t to sort by time, etc.
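A sketch of the ls -q | wc -l idiom, assuming GNU ls; count_files is a hypothetical name. When stdout is not a terminal, ls prints one name per line, and -q shows non-graphic characters such as a newline as ?, so each entry stays on a single line. It counts directories too and skips hidden entries.

```bash
#!/usr/bin/env bash
# Count non-hidden entries (files and directories) in the current directory.
# -q prints each non-graphic character in a name as ?, so a name containing
# a newline still takes exactly one line of output.
count_files() {
  ls -q | wc -l
}
```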

                                                                                                                                                      And I missed the case of too many arguments for rm *foo* (for which I’d use find again) regarding the comment I made. I should’ve just read the article enough to know why ls | grep was being used.
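A sketch of the find fallback for the too-many-arguments case. rm *foo* hands every match to a single rm process, which fails with "Argument list too long" once the expansion exceeds the system limit (getconf ARG_MAX gives a rough idea), while find's -exec ... {} + splits the names across as many rm runs as needed. rm_foo is a hypothetical name, and -maxdepth assumes GNU or BSD find.

```bash
#!/usr/bin/env bash
# Hypothetical replacement for `rm *foo*` that cannot hit the ARG_MAX limit:
# -exec rm -- {} + batches the matching names into several rm invocations.
rm_foo() {
  # -maxdepth 1 and ! -name '.*' mimic the glob: no recursion, no hidden files.
  find . -maxdepth 1 -type f ! -name '.*' -name '*foo*' -exec rm -- {} +
}
```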

                                                                                                                                                    2. 1

                                                                                                                                                      That’s clearly just a placeholder pipeline. No one actually wants *foo* anyhow.