1. 6

    The mentioned developer responds here and his reasoning and actions behind forking the library make sense to me.

    https://github.com/babel/babel/pull/13783#issuecomment-927107377

    1. 14

      Except he didn’t fork the project: he copied the code, removed attribution and removed the history. The history and attribution were added after he was outed on Twitter. If he had forked it all of the above would have been preserved.

      1. 1

        FWIW, under most licences you don’t need to preserve history to fork. Most do require attribution though, it’s true.

        1.  

          Not the MIT license that has been used here, but some licenses require a note about changes made, to varying degrees of verbosity. While that can be managed differently, keeping version control history is the easiest way these days to keep everything in compliance. GNU projects have ChangeLog files for that purpose but then, GNU predates most version control tooling by a few years or decades.

          1.  

            Interesting, thanks

            1.  

              Not the MIT license that has been used here

              Please see my other comment as to why yours is incorrect.

              1.  

                My phrasing sucked, sorry (and now it’s too late to edit).

                I was referring to “need to preserve history to fork” - MIT doesn’t require change notifications unlike GPL, EUPL and others, MIT “only” requires attribution. Seems simple enough but apparently still too hard for folks sigh

                1.  

                  Ah, gotcha. Now I understand what you were saying “no” to.

        2. 4
          1. 1

            Oops, fixed. Thanks!

          2.  

            They explained things further in a Twitter thread, providing a timeline of sorts.

            In particular, the following tweet digs into the lineage of this particular library, suggesting that the author of the ‘original’ codebase had also copied liberally from another.

            Furthermore, it seems the accuser has resorted to DMCA takedowns in the past? I cannot verify this. Definitely would suggest some hypocrisy in the “spirit of open source” department.

          1. 3

            The fact that it works at all is amazing. However, 6502 is a really tough target for compiled languages. Even something as basic as having a standard function calling convention is expensive.

            1. 2

              Likewise, I’m very impressed it works. Aside from you correctly pointing out how weak stack operations are on the 6502, however, it doesn’t generate even vaguely idiomatic 6502 assembly. That clear-screen extract was horrible.

              1. 2

                The 6502 is best used treating zero page as a lot of registers with the same kind of calling convention as modern RISC (and x86_64) use: some number of registers that are used for passing arguments and return values and for temporary calculations inside a function (and so that leaf functions don’t have to save anything), plus a certain number of registers that are preserved over function calls and you have to save and restore them if you want to use them. The rest of zero page can be used for globals, the same as .sdata referenced from a Global Pointer register on machines such as RISC-V or Itanium.

                If you do that then the only stack accesses needed are push and pop of a set of registers. If you generate the code appropriately then you only have to know to save N registers on function entry and restore the same N and then return on function exit. You can use a small set of special subroutines for that, saving code size. RISC-V does exactly the same thing with the -msave-restore option to gcc or clang.

                Of course for larger programs you’ll want to implement your own stack (using two zero page locations as the stack pointer) for the saved registers. 256 bytes should be enough for just the function return addresses.
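The save/restore scheme described above can be modelled in a few lines of Python. This is only a sketch of the idea, not 6502 code: the zero-page locations used for the stack pointer (`SP_LO`, `SP_HI`), the start of the callee-saved block (`CALLEE_SAVED_BASE`) and the initial stack top are all assumptions chosen for illustration.

```python
# Model of a software stack for saved zero-page "registers".
# All addresses below are illustrative assumptions, not taken from any real system.
MEM = bytearray(65536)
SP_LO, SP_HI = 0xFE, 0xFF        # hypothetical zero-page pair holding the stack pointer
CALLEE_SAVED_BASE = 0x20         # hypothetical start of the callee-saved block

def get_sp():
    return MEM[SP_LO] | (MEM[SP_HI] << 8)

def set_sp(value):
    MEM[SP_LO] = value & 0xFF
    MEM[SP_HI] = (value >> 8) & 0xFF

def save_regs(n):
    """Function entry: copy the first n callee-saved bytes to the software stack."""
    sp = get_sp() - n            # the stack grows downwards in this model
    MEM[sp:sp + n] = MEM[CALLEE_SAVED_BASE:CALLEE_SAVED_BASE + n]
    set_sp(sp)

def restore_regs(n):
    """Function exit: copy the same n bytes back and release the stack space."""
    sp = get_sp()
    MEM[CALLEE_SAVED_BASE:CALLEE_SAVED_BASE + n] = MEM[sp:sp + n]
    set_sp(sp + n)

set_sp(0x8000)                          # arbitrary top of stack for the demo
MEM[0x20:0x24] = b"\x11\x22\x33\x44"    # caller's values in four callee-saved bytes
save_regs(4)
MEM[0x20:0x24] = b"\x00\x00\x00\x00"    # the function body clobbers them
restore_regs(4)
```

On real hardware `save_regs` and `restore_regs` would be the small shared subroutines, with N passed in a register.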

                1. 1

                  But I wonder how much of the zero page you can use without stepping on the locations reserved for ROM routines, particularly on the Apple II. It’s been almost three decades since I’ve done any serious programming on the Apple II, but didn’t its ROM reserve some zero-page locations for allowing redirection of ROM I/O routines? If I were programming for that platform today, I’d still want to use those routines, so that, for example, the Textalker screen reader (used in conjunction with the Echo II card) would work. My guess is that similar considerations would apply on the C64.

                  1. 1

                    The monitor doesn’t use a lot. AppleSoft uses a lot more, but that’s ok because it initialises what it needs on entry.

                    https://pbs.twimg.com/media/E_xJ5oWUYAAUo3a?format=jpg&name=4096x4096

                    Seems a shame now to have defaced the manual, but in my defence I did it 40 years ago.

                  2. 1

                    Now I’ve looked into the implementation I see they’re doing something like this, but using only 4 zero page bytes as caller-saved registers. This is nowhere near enough!

                    Even 32 bit ARM uses 4 registers, which should probably translate to 8 bytes on 6502 (four pointers or 16 bit integers).

                    x86_64, which has the same number of registers as arm32, uses six argument registers. RISC-V uses 8 argument registers, plus another 7 “temporary” registers which a called function is free to overwrite. PowerPC uses 8 argument registers.

                    6502 effectively has 128 16-bit registers (the size of pointers or int). There is no reason why you shouldn’t be at least as generous with argument and temporary registers as the RISC ISAs that have 32 registers.

                    I’d suggest maybe 16 bytes for caller-save (arguments), 16 bytes for temporaries, 32 bytes for callee-save. That leaves 192 bytes for globals (2 bytes of which will be the software stack pointer).
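As a quick check of that budget, here is a small Python sketch that lays the regions out back to back. Starting the first region at address 0 is an assumption made purely for the arithmetic; on a real machine the ROM's own zero-page locations would have to be worked around.

```python
ZERO_PAGE_SIZE = 256

# Region sizes as suggested above; the ordering and base address 0 are assumptions.
layout = [
    ("caller-save (arguments)", 16),
    ("temporaries", 16),
    ("callee-save", 32),
]

regions = {}
base = 0
for name, size in layout:
    regions[name] = (base, base + size - 1)   # inclusive address range
    base += size

# Whatever is left over is available for globals, including the
# two bytes used for the software stack pointer.
globals_size = ZERO_PAGE_SIZE - base
regions["globals"] = (base, ZERO_PAGE_SIZE - 1)
```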

                    1. 1

                      Where are you going to save them? In the 256 BYTE stack the 6502 has? Even if the stack wasn’t limited, you still only have at most 65,536 bytes of memory to work with.

                      1. 1

                        Would be cool to see if this stuff were built to expect bank switching hardware.

                        1. 1

                          I quote myself:

                          Of course for larger programs you’ll want to implement your own stack (using two zero page locations as the stack pointer) for the saved registers. 256 bytes should be enough for just the function return addresses.

                          64k of total memory is of course a fundamental limitation of the 6502, so is irrelevant to what details of code generation and calling convention you use. Other than that you want as compact code as possible, of course.

                    2. 2

                      GEOS has a pretty interesting calling convention for some of its functions (e.g. used at https://github.com/mist64/geowrite/blob/main/geoWrite-1.s#L82): Given that there’s normally no concurrency, and little recursive code, arguments can be stored directly in code:

                      jsr function
                      .byte arg1
                      .byte arg2
                      

                      function then picks apart the return address to get at the arguments, then moves it forward before returning to skip over the data. A recursive function (where the same call site might be re-entered before leaving, with different arguments) would have to build a trampoline on a stack or something like that:

                      lda #argcnt
                      jsr trampoline
                      .word function
                      .byte arg1
                      ...
                      .byte argcnt
                      

                      where trampoline creates jsr function, a copy of the arguments + rts on the stack, messes with the return address to skip the arguments block, then jumps to that newly created contraption. But I’d rather just avoid recursive functions :-)
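To make the non-recursive case concrete, here is a rough Python model of what the callee does with the return address. It is not GEOS code; `call_with_inline_args` and the memory layout in the usage example are made up for illustration. The one hardware fact it relies on is that a 6502 JSR pushes the address of its own last byte and RTS resumes one byte past the address it pulls.

```python
def call_with_inline_args(memory, jsr_address, arg_count):
    """Return (args, resume_address) for a JSR at jsr_address followed by inline bytes."""
    # JSR is 3 bytes long and pushes the address of its own last byte.
    stack = [jsr_address + 2]

    # Callee: pull the return address and read the bytes that follow it.
    ret = stack.pop()
    args = [memory[ret + 1 + i] for i in range(arg_count)]

    # Move the return address past the data block before returning.
    stack.append(ret + arg_count)

    # RTS pulls the address and resumes at the byte after it.
    resume = stack.pop() + 1
    return args, resume

# Hypothetical call site at $10: jsr function / .byte 24 / .byte 30
memory = bytearray(32)
memory[0x10] = 0x20                    # JSR opcode
memory[0x13], memory[0x14] = 24, 30    # inline arguments
args, resume = call_with_inline_args(memory, 0x10, 2)
```

Execution continues at `$15`, the first byte after the two arguments.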

                      1. 1

                        Needing self-modifying code to deal with function calls reminds me of the PDP-8, which didn’t even have a stack - you had to modify code to put your return address in.

                        1. 1

                          Are those the actual arguments and self-modifying code is used to get non-constant data there? Or are the various .byte values the address to find the argument, in Zero Page?

                          That’s pretty compact at the call site, but a lot of work in the called function to access the arguments. It would be ok for big functions that are expensive anyway, but on 6502 you probably (for code compactness) want to call a function even for something like adding two 32 bit (or 16 bit) integers.

                          e.g. to add a number at address 30-31 into a variable at address 24-25 you’d have at the caller …

                              jsr add16
                              .byte 24
                              .byte 30
                          

                          … and at the called function …

                          add16:
                              pla
                              sta ARGP
                              tax
                              pla
                              sta ARGP+1
                              tay
                              clc
                              txa
                              adc #2
                              pha
                              tya
                              adc #0
                              pha
                              ldy #1          ; JSR pushed the address of its own last byte, so the arguments start at offset 1
                              lda (ARGP),y
                              tax
                              iny
                              lda (ARGP),y
                              tay
                          
                          add16_q:
                              clc
                              lda $0000,y
                              adc $00,x
                              sta $00,x
                              lda $0001,y
                              adc $01,x
                              sta $01,x
                              rts
                          

                          So the stuff between add16 and add16_q is 26 bytes of code and 52 clock cycles. The stuff in add16_q is 16 bytes of code and 32 clock cycles (26 for the arithmetic plus 6 for the rts). The call to add16 is 5 bytes of code and 6 clock cycles.

                          It’s possible to replace everything between add16 and add16_q with a jsr to a subroutine called, perhaps, getArgsXY. That will save a lot of code (because it will be used in many such subroutines) but add even more clock cycles – 12 for the JSR/RTS plus more code to pop/save/load/push the 2nd return address on the stack (26 cycles?).

                          But there’s another way! And this is something I’ve used myself in the past.

                          Keep add16_q and change the calling code to…

                              ldx #24
                              ldy #30
                              jsr add16_q
                          

                          That’s 7 bytes of code instead of 5 (bad), and 10 clock cycles instead of 6 – but you get to entirely skip the 52 clock cycles of code at add16 (maybe 90 cycles if you call a getArgsXY subroutine instead).

                          You may quite often be able to omit the load immediate of X or Y because one or the other might be the same as the previous call, reducing the calling sequence to 5 bytes.
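The trade-off can be tallied with a few lines of Python. The inputs are just the byte and cycle figures quoted above; the add16_q body is common to both variants and is left out, and the sketch ignores both the getArgsXY variant and the case where a load of X or Y can be omitted.

```python
# Per-call-site size, call cost, and one-time argument-fetching prologue,
# using the figures quoted in the comment above.
INLINE = {"site_bytes": 5, "call_cycles": 6, "prologue_bytes": 26, "prologue_cycles": 52}
REGISTER = {"site_bytes": 7, "call_cycles": 10, "prologue_bytes": 0, "prologue_cycles": 0}

def overhead_cycles(conv):
    """Cycles spent per call before the shared add16_q body starts."""
    return conv["call_cycles"] + conv["prologue_cycles"]

def total_bytes(conv, call_sites):
    """Code size of the prologue plus all call sites (shared body excluded)."""
    return conv["prologue_bytes"] + call_sites * conv["site_bytes"]

# Smallest number of call sites at which the inline-argument form
# is no larger than the register-passing form.
break_even = next(
    n for n in range(1, 1000)
    if total_bytes(INLINE, n) <= total_bytes(REGISTER, n)
)
```

With these numbers the inline form spends 58 overhead cycles per call against 10, and only catches up on code size at 13 call sites for this one function.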

                          If there’s some way to make add16 more efficient I’d be interested to know, but I’m not seeing it.

                          Maybe you could get rid of all the PLA/PHA and use TSX;STX usp;LDX #1;STX usp+1 to duplicate the stack pointer in a 16-bit pointer in Zero Page, grab the return address using LDA instead of PLA, and increment the return address directly on the stack. It’s probably not much better, if at all.

                          1. 1

                            These calling conventions are provided for some functions only, and mostly the expensive ones. From the way it’s implemented for BitmapUp, without looking too closely at the macros, it seems they store the return address at a known address and index through that.

                            GEOS has pretty complex functions and normally uses virtual registers in the zero page, so I guess this is more an optimization for constant calls: no need to have endless lists of lda #value; sta $02; ... in your code - as GEOS then copies it into the virtual registers and just calls the regular function, the only advantage of the format is compactness.

                      1. 9

                        This is one of those occurrences where a technical solution is sought for a non-technical problem. I think Mozilla should rather complain to the EU Commission, especially given that Microsoft already had its fair share from the Commission on browser choice. Otherwise Microsoft will just change the mechanisms needed and Mozilla will have to reverse engineer it again.

                        1. 18

                          Wouldn’t be surprised if Mozilla also did this. Having this workaround in place (and then disarmed by Microsoft) helps build the case.

                          1. 8

                            It reminds me of Epic’s case with Apple. Mozilla may be doing this to force Microsoft’s hand into a scenario they can more easily challenge legally.

                          1. 2

                            It would be nice to have a standard API or html element or something to make selection uniform across all sites. We could integrate it into the browser settings.

                            1. 3

                              A request header perhaps…

                              The problem is, the sites with the dodgy banners want you to accept their tracking and cookies. It is not in their interest to make opt-out easier.

                              1. 4

                                Let’s call it Do-Not-Track, but write it as DNT to make it shorter.

                            1. 26

                              There are a lot of extensions that automatically select the ‘reject all’ or walk the list and decline them all. Why push people towards one that makes them agree? The cookie pop-ups are part of wilful misinterpretation of the GDPR: you don’t need consent for cookies, you need consent for tracking and data sharing. If your site doesn’t track users or share data with third parties, you don’t need a pop up. See GitHub for an example of a complex web-app that manages this. Generally, a well-designed site shouldn’t need to keep PII about users unless they register an account, at which point you can ask permission for everything that you need to store and explain why you are storing it.

                              Note also that the GDPR is very specific about requiring informed consent. It is not at all clear to me that most of these pop-ups actually meet this requirement. If a user of your site cannot explain exactly what PII handling they have agreed to then you are not in compliance.

                              1. 4

                                Can’t answer this for other people, but I want tracking cookies.

                                When people try to articulate the harm, it seems to boil down to an intangible “creepy” feeling or a circular “Corporations tracking you is bad because it means corporations are tracking you” argument that begs the question.

                                Tracking improves the quality of ad targeting; that’s the whole point of the exercise. Narrowly-targeted ads are more profitable, and more ad revenue means fewer sites have to support themselves with paywalls. Fewer paywalls mean more sites available to low-income users, especially ones in developing countries where even what seem like cheap microtransactions from a developed-world perspective would be prohibitively expensive.

                                To me, the whole “I don’t care if it means I have to pay, just stop tracking me” argument is dripping with privilege. I think the ad-supported, free-for-all-comers web is possibly second only to universal literacy as the most egalitarian development in the history of information dissemination. Yes, Wikipedia exists and is wonderful and I donate to it annually, but anyone who has run a small online service that asks for donations knows that relying on the charity of random strangers to cover your costs is often not a reliable way to keep the bills paid. Ads are a more predictable revenue stream.

                                Tracking cookies cost me nothing and benefit others. I always click “Agree” and I do it on purpose.

                                1. 3

                                  ‘an intangible “creepy” feeling’ is a nice way of describing how it feels to find out that someone committed a serious crime using your identity. There are real serious consequences of unnecessary tracking, and it costs billions and destroys lives.

                                  Also I don’t want ads at all, and I have no interest in targeted ads. If I want to buy things I know how to use a search bar, and if I don’t know I need something, do I really need it? If I am on a website where I frequently shop I might even enable tracking cookies but I don’t want to blanket-enable them on all sites.

                                  1. 4

                                    How does it “costs billions and destroys lives”?

                                    1. 2

                                      https://www.ftc.gov/system/files/documents/reports/consumer-sentinel-network-data-book-2020/csn_annual_data_book_2020.pdf see page 8. This is in the US alone and does not take the other 7.7b people in the world into account. I will admit it is not clear what percentage of fraud and identity theft are due to leaked or hacked data from tracking cookies so this data is hardly accurate for the current discussion, but I think it covers the question of ‘how’. If you want more detail just google the individual categories in the report under fraud and identity theft.

                                      Also see this and this

                                      But I covered criminal prosecution in the same sentence you just quoted from my reply above so clearly you meant ‘other than being put in prison’. Also, people sometimes die in prison, and they almost always lose their jobs.

                                      1. 4

                                        The first identity theft story doesn’t really detail what exactly happened surrounding the ID theft, and the second one is about a childhood acquaintance stealing the man’s ID. It doesn’t say how exactly either, and neither does that FTC report as far as I can see: it just lists ID theft as a problem. Well, okay, but colour me skeptical that this is caused by run-of-the-mill adtech/engagement tracking, which is what we’re talking about here. Not that I think it’s not problematic, but it’s a different thing and I don’t see how they’re strongly connected.

                                        The NSA will do what the NSA will do; if we had no Google then they would just do the same. I also don’t think it’s as problematic as often claimed as agencies such as the NSA also do necessary work. It really depends on the details on who/why/what was done exactly (but the article doesn’t mention that, and it’s probably not public anyway; I’d argue lack of oversight and trust is the biggest issue here, rather than the actions themselves, but this is veering very off-topic).

                                        In short, I feel there’s a sore lack of nuance here and confusion between things that are (mostly) unconnected.

                                        1. 2

                                          Nevertheless all this personal data is being collected, and sometimes it gets out of the data silos. To pretend that it never causes any harm just because some stranger on the internet failed to come up with a completely airtight example case in 5 minutes of web searching is either dishonest or naive. If you really want to know, you can do the research yourself and find real cases. If you would rather just feel comfortable with your choice to allow all tracking cookies that is also totally fine. You asked how, I believe my answer was sufficient and roughly correct. If you feel the need to prove me wrong that is also fine, and I will consider any evidence you present.

                                          1. 2

                                            The type of “personal data” required for identity theft is stuff like social security numbers, passport numbers, and that kind of stuff. That’s quite a different sort of “personal data” than your internet history/behaviour.

                                            To pretend that it never causes any harm just because some stranger on the internet failed to come up with a completely airtight example case in 5 minutes of web searching is either dishonest or naive. If you really want to know, you can do the research yourself and find real cases.

                                            C’mon man, if you’re making such large claims such as “it costs billions and destroys lives” then you should be prepared to back them up. I’m not an expert but spent over ten years paying close attention to these kinds of things, and I don’t see how these claims bear out, but I’m always willing to learn something new which is why I asked the question. Coming back with “do your own research” and “prove me wrong then!” is rather unimpressive.

                                            If you would rather just feel comfortable with your choice to allow all tracking cookies that is also totally fine.

                                            I don’t, and I never said anything which implied it.

                                            If you feel the need to prove me wrong that is also fine, and I will consider any evidence you present.

                                            I feel the need to understand reality to the best of my ability.

                                            1. 1

                                              I feel the need to understand reality to the best of my ability.

                                              Sorry I was a bit rude in my wording. There is no call for that. I just felt like I was being asked to do a lot of online research for a discussion I have no real stake in.

                                              GDPR Article 4 Paragraph 1 and GDPR Article 9 Paragraph 1 specify what kind of information they need to ask permission to collect. It is all pretty serious stuff. There is no mention of ‘shopping preferences’. Social security numbers and passport numbers are included, as well as health data, things that are often the cause of discrimination like sexuality/religion/political affiliation. Also included is any data that can be used to uniquely identify you as an individual (without which aggregate data is much harder to abuse) which includes your IP, your real name.

                                              A lot of sites just ask permission to cover their asses and don’t need to. This I agree is annoying. But if a site is giving you a list of cookies to say yes or no to they probably know what they are doing and are collecting the above information about you. If you are a white heterosexual English speaking male then a lot of that information probably seems tame enough too, but for a lot of people having that information collected online is very dangerous in quite real and tangible ways.

                                      2. 3

                                        I am absolutely willing to have my view on this changed. Can you point me to some examples of serious identity theft crimes being committed using tracking cookies?

                                        1. 2

                                          See my reply to the other guy above. The FTC data does not specify where the hackers stole the identity information so it is impossible for me to say what percentage are legitimately caused by tracking cookies. The law that mandates these banners refers to information that can be used to identify individuals. Even if it has never ever happened in history that hacked or leaked cookie data has been used for fraud or identity theft, it is a real danger. I would love to supply concrete examples but I have a full time job and a life and if your claim is “Sure all this personal data is out there on the web, and yes sometimes it gets out of the data silos, but I don’t believe anyone ever used it for a crime” then I feel like it’s not worth my time spending hours digging out case studies and court records to prove you wrong. Having said that if you do some searching to satisfy your own curiosity and find anything definitive I would love to hear about it.

                                        2. 2

                                          someone committed a serious crime using your identity

                                          because of cookies? that doesn’t follow

                                        3. 1

                                          Well this is weird. I think it’s easy to read that and forget that the industry you’re waxing lyrical about is worth hundreds of billions; it’s not an egalitarian development, it’s an empire. Those small online services that don’t want to rely on asking for donations aren’t billion-dollar companies, get a deal entirely on someone else’s terms, and are almost certainly taken advantage of for the privilege.

                                          It also has its own agenda. The ability to mechanically assess “ad-friendliness” already restricts ad-supported content producers to what corporations are happy to see their name next to. I don’t want to get too speculative on the site, but there’s such a thing as an ad-friendly viewer too, and I expect that concept to become increasingly relevant.

                                          So, tracking cookies. They support an industry I think is a social ill, so I’d be opposed to them on that alone. But I also think it’s extremely… optimistic… to think being spied on will only ever be good for you. Advertisers already leave content providers in the cold when it’s financially indicated—what happens when your tracking profile tells them you’re not worth advertising to?

                                          I claim the cost to the individual is unknowable. The benefit to society is Cambridge Analytica.

                                        4. 2

                                          The cookie law is much older than GDPR. In the EU you do need consent for cookies. It is a dumb law.

                                          1. 11

                                            In the EU you do need consent for cookies. It is a dumb law.

                                            This is not true. In the EU you need consent for tracking, whether or not you do that with cookies. It has to be informed consent, which means that the user must understand what they are agreeing to. As such, a lot of the cookie consent UIs are not GDPR compliant. Max Schrems’ company is filing complaints about non-compliant cookie banners.

                                            If you only use functional cookies, you don’t need to ask for consent.

                                            1. 3

                                              https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:31995L0046 concerns consent of user data processing.

                                              https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:32002L0058 from 2002 builds on the 1995 directive, bringing in “cookies” explicitly. Among other things it states “The methods for giving information, offering a right to refuse or requesting consent should be made as user-friendly as possible.”

                                              In 2009 https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:32009L0136 updated the 2002 directive, closing a few loopholes.

                                              The Do-Not-Track header should have been enough signal to cut down on cookie banners (and a few websites are sensible enough to interpret it as universal rejection for unnecessary data storage), but apparently that was too easy on users? It went as quickly as it came after Microsoft defused it by enabling it by default and parts of adtech arguing that the header doesn’t signify an informed decision anymore and therefore can be ignored.
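For illustration, treating the header as a universal rejection takes very little code on the server side. The sketch below is an assumption about how a site might do it, not any particular site's implementation; `dnt_forbids_tracking` is a made-up helper name, and the only real-world detail it relies on is that browsers send the preference as the request header `DNT: 1`.

```python
def dnt_forbids_tracking(headers):
    """Return True when the request carries `DNT: 1`.

    `headers` is a plain mapping of header names to values; header
    names are compared case-insensitively, as HTTP requires.
    """
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    return normalized.get("dnt") == "1"

# A missing header, or any other value, expresses no preference here,
# so the site would still have to obtain consent in the usual way.
skip_tracking = dnt_forbids_tracking({"DNT": "1"})
no_preference = dnt_forbids_tracking({"Accept": "text/html"})
```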

                                              If banners are annoying it’s because they’re a deliberate dark pattern, see https://twitter.com/pixelscript/status/1436664488913215490 for a particularly egregious example: A direct breach of the 2002 directive that is typically brought up as “the cookie law” given how it mandates “as user-friendly as possible.”

                                              1. 2

                                                I don’t understand what you’re trying to say. Most cookie banners on EU sites are not at all what I’d call a dark pattern. They’re just trying to follow the law. It is a stupid law which only trained people to click agree on all website warnings, making GDPR less effective. Without the cookie law, dark patterns against GDPR would be less effective.

                                                1. 3

                                                  The dark pattern pgeorgi refers to is that on many cookie banners, the “Refuse all” button requires more clicks and/or more careful looking than the “Accept all” button. People who have trained themselves to click “Accept” mostly chose “Accept” because it is easier — one click on a bright button, and done. If “Refuse all” were equally easy to choose, more people would train themselves to always click “Refuse”.

                                                  Let’s pretend for a moment the cookie law no longer exists. A website wants to set a tracking cookie. A tracking cookie, by definition, constitutes personally identifiable information (PII) – as long as the cookie is present, you can show an ad to specifically that user. The GDPR recognizes 6 different conditions under which processing PII is lawful.

                                                  The only legal ground to set a tracking cookie for advertising purposes is (a) If the data subject has given consent to the processing of his or her personal data. I won’t go over every GDPR ground, but suffice it to say that tracking-for-advertising-purposes is not covered by

                                                  • (b) To fulfil contractual obligations with a data subject;
                                                  • nor is it covered by (f) For the legitimate interests of a data controller or a third party, unless these interests are overridden by interests of the data subject.

                                                  So even if there were no cookie law, GDPR ensures that if you want to set a tracking cookie, you have to ask the user.

                                                  Conversely, if you want to show ads without setting tracking cookies, you don’t need to get consent for anything.

                                                  1. 2

                                                    I feel the mistake with the whole “cookie law” thing is that it focuses too much on the technology rather than what people/companies are actually doing. That is, there are many innocent non-tracking reasons to store information in a browser that’s not “strictly necessary”, and there are many ways to track people without storing information in the browser.

                                                  2. 1

                                                    I’m not saying that dark patterns are employed on the banners. The banners themselves are dark patterns.

                                                    1. 1

                                                      The banners often come from freely available compliance packages… It’s not dark, it’s just lazy and badly thought out, like the law itself.

                                                      1. 1

                                                        What about the law do you think is badly thought out?

                                                        1. 1

                                                          The cookie part of the ePrivacy Directive is too technological. You don’t need consent, but you do have to inform the user of cookie storage (or localstorage etc) no matter what you use it for. It’s unnecessary information, and it doesn’t protect the user. These are the cookie banners that only let you choose “I understand”, cause they only store strictly necessary cookies (or any kind of cookie before GDPR in 2016).

                                                          GDPR is the right way to do it. The cookie part of EPR should have been scrapped with GDPR. That would make banners that do ask for PII storage consent stand out more. You can’t make your GDPR banner look like an EPR information banner if EPR banners aren’t a thing.

                                              2. 2

                                                Usually when I see the cookie consent popup I haven’t shared any personal information yet. There is what the site has from my browser and network connection, but I trust my browser, uBlock origin and DDG privacy tools to block various things and I use a VPN to somewhere random when I don’t want a site to know everything it can about my network location.

                                                If I really do want to share personal info with a site, I’ll go and be very careful what I provide and what I agree to, but also realistic in that I know there are no guarantees.

                                                1. 8

                                                  If you’re using a VPN and uBlock origin, then your anonymity set probably doesn’t contain more than a handful of people. Combined with browser fingerprinting, it probably contains just you.

                                                  1. 2

                                                    Should I be concerned about that? I’m really not sure I have properly thought through any threats from the unique identification that comes from that. Do you have any pointers to how to figure out what that might lead to?

                                                    1. 9

                                                      The point of things like the GDPR and so on is to prevent people assembling large databases of correlated knowledge that violate individual privacy. For example, if someone tracks which news articles you read, they have a good first approximation of your voting preferences. If they correlate it with your address, they can tell if you’re in a constituency where their candidate may have a chance. If you are, they know the issues that are important to you and so can target adverts towards you (including targeted postal adverts if they’re able to get your address, which they can if they share data with any company that’s shipped anything physical to you) that may influence the election.

                                                      Personally, I consider automated propaganda engines backed by sophisticated psychological models to be an existential threat to a free society that can be addressed only by some quite aggressive regulation. Any unique identifier that allows you to be associated with the kind of profile that these things construct is a problem.

                                                    2. 2

                                                      Do you have a recommendation?

                                                  2. 2

                                                    The problem with rejecting all the tracking is that without it most ad networks will serve you the worst/cheapest untargeted adverts which have a high chance of being a vector for malware.

                                                    So if you reject the tracking you pretty much have to also run an ad-blocker to protect yourself. Of course if you are running an ad blocker then the cookies aren’t going to make much difference either way.

                                                    1. 1

                                                      I don’t believe it makes any difference whether you agree or disagree? The goal is just to make the box go away.

                                                      1. 2

                                                        Yes. If I agree and they track me, they are legally covered. If I disagree and they track me then the regulator can impose a fine of up to 4% of their global annual turnover. As a second-order effect: if aggregate statistics say 95% of people click ‘agree’ then they have no incentive to reduce their tracking, whereas if aggregate statistics say ‘10% leave the page without clicking either, 50% click disagree’ then they have a strong case that tracking will lose them business and this will impact their financial planning.

                                                    1. 20

                                                      I’ve been using JS since the late 90s and I haven’t even seen alert/confirm/prompt outside of toy tutorials or horrendous code since roughly 2005. I don’t find the arguments in this blog very convincing.

                                                      It’s also incredibly clickbait/alarmist, which is an immediate eye roll.

                                                      1. 37

                                                        I use confirm for actions that cannot be undone and are potentially dangerous. Why wouldn’t I? The alternative is to write a lot of code to throw up a modal div that does the same thing. Might as well do it natively.

                                                        1. 32

                                                          Wait until you meet enterprise software!

                                                          1. 28

                                                            In the relevant bug tracker discussion someone says this broke ERP software with hundreds of thousands of users.

                                                            1. 18

                                                              Half the argument of the blog post though is that “toy tutorials” are important and valuable in a way that isn’t captured by how often the feature is used in production. And most of the rest is about how actually, it’s valuable that code from 2005 still works. I think you are missing the forest for the trees.

                                                              1. 8

                                                                The article considerably overstates what the Chrome team is actually intending to ship: it’s disabling cross-origin alert inside iframes, not alert entirely. Most of the article seems to be an extremely uncharitable reading of Dominic hoping that “one day”, “in the far future”, “maybe” (literally these are direct quotes!) they can remove blocking APIs like alert — not that they have any plans to do so now or any time soon.

                                                                I don’t think the GP is missing the forest for the trees; I think the author is making a mountain out of a molehill.

                                                                1. 7

                                                                  Few things:

                                                                  • “Some day” tends to come a lot sooner than we’d expect.
                                                                  • This is the sort of thing people use to justify further encroachment down the line (“Well we already disable it for iframe stuff…”).
                                                                  • This directly reduces the utility of using iframes–and some folks still use those on occasion, and it is exceedingly tacky to unilaterally decide to break their workflows.
                                                                2. 1

                                                                  You can still do your little toy tutorials with alert/confirm/prompt, just don’t do them in an iframe?

                                                                  1. 2

                                                                    If you’re making a codepad-like site, you kind of have to put all the user-submitted JS in an iframe so it’s not on your own domain.

                                                                    1. 1

                                                                      If you’re making a codepad-like site, you can also inject a polyfill for alert() etc in the user-controlled iframe to keep things working. Until you’re done locking down the codepad for arbitrary user scripts to run without problems, this is probably one of the smaller tasks.

                                                                      1. 2

                                                                        Can you make the polyfill block?

                                                                3. 11

                                                                  I use alert() and confirm(). It’s easy, simple, works, and doesn’t even look so bad since Firefox 89. I don’t think my code is “horrendous”; it’s just the obvious solution without throwing a bunch of JS at it.

                                                                  I agree this blog post isn’t especially great though.

                                                                  1. 1

                                                                    Do you use it in a cross-origin iframe?

                                                                    1. 3

                                                                      No, but your comment made no mention of that:

                                                                      I haven’t even seen alert/confirm/prompt outside of toy tutorials or horrendous code since roughly 2005.

                                                                      1. 1

                                                                        Sorry, I read your reply in the context of the blog post (i.e. your code is going to break).

                                                                        My line about horrendous code is hyperbolic, but the fact is that alert/confirm/prompt don’t offer customizability to make for a consistent, well-made UX. Maybe it’s not a problem for certain audiences (usually things like internal tools for devs end up having them), but most customer-facing solutions require more to their experience.

                                                                        I’m not saying they should remove them right now, but a day in the future where they go away (presumably deprecated due to a better option) is not something we should be dreading. Who knows if that day will even come.

                                                                  2. 8

                                                                    At $JOB we have used prompt for some simple scenarios where we need user input in the middle of some sync-y code; it solved the problem with no fuss.

                                                                    We integrate with Salesforce through an iframe. This change caused us to have to like redo a whole tiny thing to get stuff working again (using a much heavier modal thing instead of, well, a call to prompt). It wasn’t the end of the world, but it was annoying and a real unforced error.

                                                                    We would love a scenario where browsers offered more rich input in a clean way (modals have been in basically every native GUI since the beginning of time!). I’m sure people would be way less frustrated if Chrome offered easy alternatives that don’t rely (for example) on z-index-overlays (that can break for a billion reasons) or stuff like that.

                                                                    Sometimes you just want input from somebody in a prompt-y way

                                                                    1. 5

                                                                      You haven’t seen a lot of business to business software then.

                                                                      1. 1

                                                                        That is still no reason to remove a perfectly functional feature that has worked reliably for decades and requires orders of magnitude fewer resources than the alternative. Both human and computational resources.

                                                                        I use it all the time on simple UIs I write for my own usage or for restricted groups of users.

                                                                        The amount of resources that could be saved if we favoured well known, tried and true technology rather than the new aesthetically shiny thing, is astonishing.

                                                                        1. 2

                                                                          It’s not about “shiny things” but about user experience. Linux has suffered for decades due to the approach you’re talking about.

                                                                          1. 2

                                                                            No, Linux has suffered precisely because it does not offer a native GUI, or UI at all, forcing everyone to reinvent basic functionality like on the web.

                                                                      1. 4

                                                                        Wouldn’t you get a warning? If the compiler was able to prove a zero-divide, it ought to spit out a warning at the same time.

                                                                        1. 15

                                                                          Not usually, for two reasons.

                                                                          First, the compiler is not a monolith. Warnings are generated in the front end. This knowledge is typically available only after you’ve done a load of optimisations. At that point, you probably don’t have sufficient source information left to be able to provide a useful warning. It may be detectable division by zero only after a load of inlining, constant propagation, arithmetic reassociation, and so on. The division may be in one function, the place it ends up in another, and the information required to prove that the divisor is zero in yet another.

                                                                          Second, the optimisation that generates this may generate the invalid instruction in multiple steps. A division by zero is likely to be replaced by a trap in a single step, but by that point the optimiser doesn’t know whether the division was present in the source (it may have been introduced by another transform) or whether it will end up in the output (it may be dead code that will eventually be eliminated). The last of these is most relevant because compilers quite often make use of this kind of thing to find dead code. In theory at least, a valid C program may not contain undefined behaviour. If something would cause undefined behaviour, then it should not be possible and so can be eliminated. It’s therefore fairly common to see things that might be undefined behaviour in the middle of an optimisation pipeline, but most of them don’t end up in the final output.
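
                                                                          To make the dead-code point concrete, here is a minimal sketch of a check that a compiler is allowed to throw away. The function name ratio_or_minus_one is made up for illustration. Because the division on the first line is undefined when b is zero, an optimiser may assume b != 0 from then on and treat the later test as unreachable; whether a particular compiler actually does so depends on its version and flags.

                                                                          ```c
                                                                          /* Hypothetical example: the zero check comes too late to help. */
                                                                          int ratio_or_minus_one(int a, int b)
                                                                          {
                                                                            int q = a / b;  /* undefined behaviour if b == 0 */
                                                                            if (b == 0)     /* so the optimiser may assume this is never true... */
                                                                            {
                                                                              return -1;    /* ...and delete this branch as unreachable */
                                                                            }
                                                                            return q;
                                                                          }
                                                                          ```

                                                                          From the optimiser’s side, removing that branch is just ordinary dead-code elimination, which is why it is not an obvious place to emit a diagnostic.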

                                                                          1. 5

                                                                            I do get warnings of this nature in Xcode — that rely on a lot of code-flow analysis — but they come from the Clang static analyzer, not the compiler itself. Xcode makes it easy to run both in parallel during a build, so I kind of forget I’m not just running “plain” Clang.
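
                                                                            For the curious, this is the sort of function where I’d expect a path-sensitive checker to speak up. It is a made-up sketch (ten_over is not from any real project), and the exact diagnostic text, or whether one appears at all, depends on the analyzer version. Running the file through clang --analyze should explore the path where use_two is zero, on which d is still zero at the division.

                                                                            ```c
                                                                            /* Hypothetical example: d stays 0 on the path where use_two is false. */
                                                                            int ten_over(int use_two)
                                                                            {
                                                                              int d = 0;
                                                                              if (use_two)
                                                                              {
                                                                                d = 2;
                                                                              }
                                                                              return 10 / d;  /* division by zero when use_two == 0 */
                                                                            }
                                                                            ```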

                                                                            1. 2

                                                                              Excellent answer, but at that point it’s a user interface problem. The compiler totally knows the divide-by-zero happens, but doesn’t quite have the information to explain exactly how it got to that point. Still, it seems totally possible for a compiler to say “Hey, pro tip, this particular bit of the output code will Do Something Impossible on some inputs, you might want to check that out”, even if it still generates the same actual code.

                                                                              1. 8

                                                                                That addresses the first problem but not the second. Consider this (massively simplified) example:

                                                                                int x(int b, int c)
                                                                                {
                                                                                  return b / c;
                                                                                }
                                                                                
                                                                                int y(int b)
                                                                                {
                                                                                  if (b)
                                                                                  {
                                                                                    return 0;
                                                                                  }
                                                                                  return b;
                                                                                }
                                                                                
                                                                                int z(int a)
                                                                                {
                                                                                  int b = 0;
                                                                                  if (y(a))
                                                                                  {
                                                                                    return x(a, b);
                                                                                  }
                                                                                  return 1;
                                                                                }
                                                                                

                                                                                First thing you do is inline x, so you end up with:

                                                                                int y(int b)
                                                                                {
                                                                                  if (b)
                                                                                  {
                                                                                    return 0;
                                                                                  }
                                                                                  return b;
                                                                                }
                                                                                
                                                                                int z(int a)
                                                                                {
                                                                                  int b = 0;
                                                                                  if (y(a))
                                                                                  {
                                                                                    return a / b;
                                                                                  }
                                                                                  return 1;
                                                                                }
                                                                                

                                                                                Now you do some constant propagation:

                                                                                int y(int b)
                                                                                {
                                                                                  if (b)
                                                                                  {
                                                                                    return 0;
                                                                                  }
                                                                                  return b;
                                                                                }
                                                                                
                                                                                int z(int a)
                                                                                {
                                                                                  if (y(a))
                                                                                  {
                                                                                    return a / 0;
                                                                                  }
                                                                                  return 1;
                                                                                }
                                                                                

                                                                                Now you have a division by zero. Do you raise a warning? Let’s see what happens if you don’t. First, you transform it into a trap, because it’s definitely UB:

                                                                                int y(int b)
                                                                                {
                                                                                  if (b)
                                                                                  {
                                                                                    return 0;
                                                                                  }
                                                                                  return b;
                                                                                }
                                                                                
                                                                                int z(int a)
                                                                                {
                                                                                  if (y(a))
                                                                                  {
                                                                                    __trap();
                                                                                  }
                                                                                  return 1;
                                                                                }
                                                                                

                                                                                Now you inline y:

                                                                                int z(int a)
                                                                                {
                                                                                  if (a ? 0 : a)
                                                                                  {
                                                                                    __trap();
                                                                                  }
                                                                                  return 1;
                                                                                }
                                                                                

                                                                                Now you run constant propagation again:

                                                                                int z(int a)
                                                                                {
                                                                                  if (0)
                                                                                  {
                                                                                    __trap();
                                                                                  }
                                                                                  return 1;
                                                                                }
                                                                                

                                                                                Now you simplify the CFG, and you get:

                                                                                int z(int)
                                                                                {
                                                                                  return 1;
                                                                                }
                                                                                

                                                                                At the end of your optimisation, multiple steps after you thought you’d found a division by zero, you discover that it’s not there. In theory, you could keep around the information about the reason the trap was introduced in the first place and then propagate that up to the user if a trap survives optimisation, but that has two problems:

                                                                                • Just because it’s in the code, doesn’t mean that it’s actually reachable, it just means that the compiler can’t prove it’s unreachable. You’ll get a bunch of false positives. Imagine the above example where every function is from a separate compilation unit. Now you’d get the warning in a normal build but not with LTO. Not a great user experience.
                                                                                • The amount of extra information that you’d need to carry through the optimisation pipeline is huge. Conservatively, this would double the size of LLVM IR. Would you be happy if clang memory usage doubled?

                                                                                So, yes, it’s ‘just a UI problem’ but in the same way that my shell not recognising natural language instructions is ‘just a UI problem’.

                                                                                If you think this is a contrived example, I suggest that you compile some non-trivial C++ code and use clang’s -mllvm -print-after-all flags to see the IR after each optimisation step. You’ll see things like this in intermediate steps from C++ template specialisations all of the time. They’re a bit less frequent with modern C++ where constexpr if statements can trim some of the paths early, but they’re still pretty common. Turning them into false-positive warnings would be a terrible UI.

                                                                                1. 2

                                                                                  A compiler option to make __trap() an error during code generation (that is, after all the wrong candidates are optimized out) instead of an undefined opcode would be useful: Even if it’s in a code path that is never supposed to be executed, if you’re ending up with it in actual code and the optimizer couldn’t get rid of it, the code probably benefits from some massaging.

                                                                                  (And if you want to keep __trap() functional for manual use, have the optimizer use an internal symbol, __generated_trap or whatever, which either leads to an error or is rewritten to __trap in a final pass to then become undefined opcode)
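
                                                                                  As a rough illustration of the dedicated-symbol idea, here is a hand-written stand-in in plain C. Nothing in it is an existing compiler feature: generated_trap, GENERATED_TRAP, trap_count, last_trap_file, last_trap_line and checked_div are all names invented for this sketch, and a real implementation would live in the optimiser rather than in a macro. The point is only that a call to a named function can be intercepted (counted, logged, broken on in a debugger), which an anonymous undefined opcode cannot.

                                                                                  ```c
                                                                                  #include <stddef.h>

                                                                                  /* Hypothetical bookkeeping so a surviving trap site can be identified. */
                                                                                  int trap_count = 0;
                                                                                  const char *last_trap_file = NULL;
                                                                                  int last_trap_line = 0;

                                                                                  /* Hypothetical hook: stands in for the symbol the compiler would emit. */
                                                                                  void generated_trap(const char *file, int line)
                                                                                  {
                                                                                    trap_count++;
                                                                                    last_trap_file = file;
                                                                                    last_trap_line = line;
                                                                                  }

                                                                                  #define GENERATED_TRAP() generated_trap(__FILE__, __LINE__)

                                                                                  /* A division whose "impossible" path calls the hook instead of trapping. */
                                                                                  int checked_div(int a, int b)
                                                                                  {
                                                                                    if (b == 0)
                                                                                    {
                                                                                      GENERATED_TRAP();
                                                                                      return 0;  /* arbitrary fallback so the sketch stays well defined */
                                                                                    }
                                                                                    return a / b;
                                                                                  }
                                                                                  ```

                                                                                  In a build that leaves generated_trap undefined, any call the optimiser fails to remove would show up as an unresolved symbol at link time instead of a silent hang at run time.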

                                                                                  1. 1

                                                                                    It’s pretty trivial to do this without a compiler flag: just grep your output for ud2, or whatever trap is lowered to on a given target. As with a compiler flag (that can be implemented without a complete redesign of the compiler), it will tell you where the traps are but not why.

                                                                                    1. 2

                                                                                      “just grep your output for ud2, or whatever trap is lowered to on a given target” isn’t “trivial” when “whatever trap is lowered to” can change in any compiler update without notice (because such details aren’t documented for mere mortals), while compiler updates are the most critical time where one might want to know about such issues (because the optimizer is trying all-new tricks), when people might want to support a larger number of architectures (and then have to check for all those possible traps), or when they might even have deliberate, manual uses of ud2-or-whatever-a-trap-compiles-to in some places.

                                                                                      It’s what I tried before I ended up with https://review.coreboot.org/c/coreboot/+/14364, and the time spent on that patch was well worth my while (even though I had to dive into gcc, yuck) because “just grep for ud2” was a mess.

                                                                                      Having a dedicated symbol for that purpose that can be intercepted would be a huge help to track down issues (even when there’s no description of the crack the optimizer smoked before creating the trap) but I guess exit will do for now.

                                                                            2. 5

                                                                              For that purpose, I have a compiler patch that changes such situations from __builtin_trap (which is compiled to the undefined opcode eventually) to exit. In the situation I have to deal with this (coreboot), we don’t have exit, so it becomes a link time error. These are rather simple to pinpoint with objdump -dS.

                                                                              That’s the best I have been able/willing to build so far (without digging into the compiler internals too much) but it helped me a couple of times as in firmware space, those “undefined” opcodes are a non-descript hang, not a segfault or SIGILL that you can intercept with a debugger to see where and why they happen.

                                                                              1. 2

                                                                                This is an interesting hack. I hadn’t planned on building my own toolchain from scratch but now I’m curious to know what other surprises are left for me in my project. Thanks!

                                                                              2. 3

                                                                                It will often be impossible to tell at compile time whether that code path will ever be executed – this requires solving the “Halting Problem”.

                                                                                1. 2

                                                                                  You would think so, right? Ignoring the div/0 problem, it certainly seems that if the compiler decides to emit illegal instructions that a warning (or even error) might be warranted.

                                                                                  But in answer to your question, I was unable to find any command line option that would trigger a warning.

                                                                                  1. 3

                                                                                    Try running the Clang static analyzer. I’d forgotten that Xcode runs it when I build, and it’s what produces that type of warning.

                                                                                    1. 2

                                                                                      That’s a solid suggestion and I’ll definitely check it out. What is kind of missing from my original post is that this example came from a 3rd party library for a sensor. The compiler generated code for that did not have any illegal instructions.

                                                                                      The illegal instruction came after I came along looking at a crash and thought, hmm, computed divisor, wonder if it’s zero and I added the “GOTCHA” logic. And then after I just happened to notice that the processor status register bits were not set as one might expect.

                                                                                      Long way of saying that I’m not sure that using the analyzer would have prevented this journey of discovery.

                                                                                1. 7

                                                                                  If you’re using integer IDs to identify data or objects, don’t start your IDs at 1. Start them at a huge number (e.g., 2^33) so that no object ID will ever appear in any other role in your application (like a count, a natural index, a byte size, a timestamp, etc).

                                                                                  Passing around random integers and logic like “well it’s somewhere in the order of eight and a half billion, so it must be a user” sounds like a really fucking shitty way to write most programs - both in terms of making assumptions, and in terms of developer productivity.

                                                                                  Ok, very memory sensitive, massively concurrent systems will see a noticeable operational benefit to passing around an integer, rather than an Object, but I’d wager that 99.9% of people will never work on such a project. Even if you don’t want to go full on Model Objects, at least use wrapped integers for your IDs (e.g. class UserID { public int $id }). Then your methods (or your global functions if that’s your kink) can at least typehint to require a UserID, so a hypothetical function get_friends(UserID $id): array will throw immediately if you pass in say a PhotoID, or a GroupID, or any other random integer.
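A minimal sketch of the wrapped-integer idea, written in Python rather than PHP. The `FRIENDS` table is a made-up stand-in for a database, and because Python does not enforce annotations at run time the check is spelled out with `isinstance` (PHP's typehint would do this for you):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserID:
    value: int


@dataclass(frozen=True)
class PhotoID:
    value: int


# Hypothetical friend lists keyed by raw integer, standing in for a database.
FRIENDS = {1: [2, 3]}


def get_friends(user_id: UserID) -> list:
    # Reject anything that is not a UserID, including other ID wrappers
    # and bare integers.
    if not isinstance(user_id, UserID):
        raise TypeError(f"expected UserID, got {type(user_id).__name__}")
    return [UserID(n) for n in FRIENDS.get(user_id.value, [])]


print(get_friends(UserID(1)))
try:
    get_friends(PhotoID(1))
except TypeError as err:
    print(err)
```

The same integer 1 is accepted as a user and rejected as a photo, which is the whole point of the wrapper.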

                                                                                  1. 8

                                                                                    well it’s somewhere in the order of eight and a half billion, so it must be a user

                                                                                    That’s not what they’re saying: What the article states is “if you put your user IDs outside the accidentally reachable number space, accidentally trying to parse a small number won’t hand out some user data”. At no point is that arrangement supposed to mean if x >= large_num { x_is_user = true; }

                                                                                    1. 4

                                                                                      Like I said: relying on an arbitrary integer being “outside accidentally reachable space” sounds like a fucking terrible idea, rather than just, you know, using the type system available to you, to say “hey we need a fucking User ID, not just any random integer”.

                                                                                      You (the proverbial you not you specifically) may as well also propose using ranges of integers starting at each billion, for different object types, so you can do away with foreign keys in your RDBMS.

                                                                                      Let me put this another way: if your codebase is written in such a way that you’re relying on user ID’s being some magically unique number, not appearing in any other form, to provide any semblance of security or privacy, you’ve already failed.

                                                                                      1. 3

                                                                                        It’s very poorly explained, but the idea is that you have a bunch of things indexed by some kind of numerical ID. If your language doesn’t give you nice unit types, it’s very easy to confuse integer-representing-a-Foo-ID and integer-representing-a-Bar-ID (and loop induction variable that was supposed to be an index into an array of Foo IDs, and many other things). If you start everything at 0, then an accidental type confusion in your program will probably still find a valid thing. If you start both at different, moderately large, random indexes, then type confusion will probably trigger some kind of thing-not-found error. This is much easier to find in testing: the observable failure is close to the bug.

                                                                                        It’s not about segregating the types and guaranteeing that different numerical ranges refer to different types, it’s about finding the errors where you make the confusion.
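A toy sketch of that failure mode, assuming a made-up in-memory `Store` class. The two large starting offsets are arbitrary numbers picked for illustration, not anything from the article:

```python
class Store:
    """Toy in-memory table that hands out sequential IDs from `first_id`."""

    def __init__(self, first_id: int):
        self._next_id = first_id
        self._rows = {}

    def insert(self, row):
        new_id = self._next_id
        self._next_id += 1
        self._rows[new_id] = row
        return new_id

    def get(self, row_id):
        # A missing ID raises KeyError: the "thing-not-found" error.
        return self._rows[row_id]


# Both tables start at 0: confusing a Foo ID for a Bar ID still finds a row.
foos, bars = Store(0), Store(0)
foo_id = foos.insert("a foo")
bars.insert("a bar")
print(bars.get(foo_id))  # silently returns the wrong kind of thing

# Different, moderately large offsets: the same mix-up fails at once.
foos, bars = Store(7_340_112), Store(912_455_031)
foo_id = foos.insert("a foo")
bars.insert("a bar")
try:
    bars.get(foo_id)
except KeyError:
    print("lookup failed, bug is visible")
```

Nothing here guarantees the ranges stay apart forever; it just makes the common mix-up show up in testing.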

                                                                                        If you were doing this in a language like C++, you’d have a separate type for each of these IDs and mark the casts to and from ints as explicit, so you’d have to explicitly write something like: ProductID id(sessionid.as_integer()) and that would be likely to be picked up in code review. PHP doesn’t really help you here.

                                                                                        1. 3

                                                                                          PHP doesn’t really help you here.

                                                                                          Like I said, if your scale is such that passing around actual model instances isn’t feasible (so already a pretty slim minority of the software world), a wrapper class with a single integer property is going to use minimal memory, and still lets you use types to only accept/return an object that is “known” to be a User ID, or Product ID or whatever.

                                                                                          I’m not sure why you would think this won’t work in PHP, or pretty much any language that has even the most basic concept of classes.

                                                                                          1. 3

                                                                                            So you use this nice user type in your code but at some point your code is supposed to present a list of users (e.g. their facebook friends) to the user on the other side of the HTTP connection. For that, you read out the user data (name, picture, …) and present them, but you also need some identifier to put into the URL that is opened when they click on the friend’s link. Now what?

                                                                                            Of course that data gets sanitized on input (and could be wrapped into a User ID object at that point) again, but still: there’s this number floating around in some shape or form. Doesn’t hurt to keep it outside the “normal” number space to avoid running into funny issues down the road (because, you know, coders make mistakes).

                                                                                            I’ve seen similar advice to start such numbers at 2^53 if there’s a chance that double-by-default languages such as Javascript (or JSON parsers that try to be compliant) mess with them, just so developers see resulting issues immediately rather than at some distant point in time when nobody remembers what’s going on.
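To make the 2^53 point concrete, here is a small Python sketch. It imitates a double-only JSON parser by passing `parse_int=float` to the standard `json.loads`; the `"id"` field name is just an example:

```python
import json

SAFE_LIMIT = 2**53  # every integer up to this value fits exactly in a double


def parse_like_a_double_only_parser(text: str) -> dict:
    """Parse JSON but force every integer literal through a 64-bit float."""
    return json.loads(text, parse_int=float)


small = parse_like_a_double_only_parser('{"id": 12345}')
big = parse_like_a_double_only_parser('{"id": %d}' % (SAFE_LIMIT + 1))

print(int(small["id"]) == 12345)         # small IDs survive the round trip
print(int(big["id"]) == SAFE_LIMIT + 1)  # an odd ID just above 2**53 does not
```

Even IDs just above 2^53 still happen to survive, so the breakage is visible early but not on every single value.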

                                                                                            It’s simply a very cheap defensive programming technique when you deal with something that carries an ID somewhere.

                                                                                            1. 5

                                                                                              Now what? Now you have some actual security. You lookup a user in a table, you check if the active session user has permissions to do whatever the link is supposed to do - neither of those is different logic if the underlying integer is 8 or 8 billion.

                                                                                              Doesn’t hurt to keep it outside the normal search space to avoid running into funny issues down the road (because, you know, coders make mistakes).

                                                                                              What the fuck is “normal search space”?

                                                                                              No user input should be trusted. Your argument is proof of why this ridiculous theory is bad security theatre - there’s nothing to stop a client sending a request with 8 rather than 2^33 in the parameter that identifies the user. If your application is written well, that shouldn’t matter: regular security/privacy should prevent them from seeing/doing things they shouldn’t. At worst they should get an error message.

                                                                                              Your suggestion implies that they might be able to do/see something they shouldn’t be able to, because e.g. the number they send happens to be the ID of a non-user object.

                                                                                              If that is the case, you’re basically arguing in favour of security by obscurity. If that is not the case, then you’re arguing in favour of security theatre.

                                                                                              So which is it?

                                                                                              1. 4

                                                                                                Your suggestion implies that they might be able to do/see something they shouldn’t be able to, because e.g. the number they send happens to be the ID of a non-user object.

                                                                                                I’m arguing that some coder might factor out some user ID handling code into a piece of work that translates it into a plain integer type and write their functions around that. And then have some coder (maybe the same, but just as clueless) cast values when using those functions instead of fixing the mess, and again by mistake, they happen to cast an enum type (which typically casts into the 0..n range for small n). And this is all in a strongly-typed environment: The Phabricator folks use PHP and therefore I’d assume that they write their blog posts for a PHP-using audience, and PHP’s type system provides fewer guarantees (although they’re cleaning up their act, slowly).

                                                                                                I’d rather have it explode on them then, than give reasonably-looking-at-a-glance data because the CEO thought it’s cool to have UID 1.

                                                                                                As I wrote, coders make mistakes.

                                                                                                Counter question: what irritates you so much about simply starting a counter at a large value that you exploded like that (see the expletives in the first post)? It’s a no-cost guard rail that is ideally never needed, but as it costs nothing, and might protect against stupid mistakes (even though it’s pretty weak), why bother?

                                                                                                1. 2

                                                                                                  I would rather have it explode at them as soon as possible rather than when you reach 2^n users, by which time a fix might be much more difficult both to do and to trace.

                                                                                                  1. 2

                                                                                                    “It’s a no-cost guard rail that is ideally never needed, but as it costs nothing, and might protect against stupid mistakes (even though it’s pretty weak), why bother?”

                                                                                                    I think the problem is that, as stephenr says in his reply, it’s akin to security by obscurity. It’s convincing yourself - or your future self, or whoever looks at this system later - that all’s fine because these numbers are big and therefore we’ve solved the problem. By doing something that ‘might protect’ rather than something that will protect, the problem gets worse, as now we’re lulled into a false sense of security.

                                                                                                    Yes, a system shouldn’t explode because someone wanted UID 1, but the system shouldn’t act like it’s fine for 4 years and then explode because we’ve been comparing UIDs and creation timestamps like that joke signpost that seems to be all over the world (population + height above sea level … total = …) and we’ve only just hit the timestamp and the UID where that mattered.

                                                                                                    Typing is important and what the article advocates is an easy ‘solution’ that’s dangerous and is something software development should have moved past by now. UUIDs/GUIDs are now commonplace and absolutely appropriate for, well, unique identifiers. Namespacing prefixes work reasonably well (UID-123) but are prone to humans making up rules (‘I’ve only ever seen UIDs with 3 digits so therefore if someone tells me they have UID 1 that means I should write UID-001 and an ID of UID-1000 is invalid’).

                                                                                                    1. 2

                                                                                                      as it costs nothing, and might protect against stupid mistakes, why bother

                                                                                                      Lots of things cost nothing and someone claims “might” do something. I’d rather just do something that does protect against the issue, and ignore the security theatre.

                                                                                                      Edited, @vakradrz makes a good point.

                                                                                                      1. 3

                                                                                                        I agree with your argument (and have made my own reply to the parent comment) but please try to keep it civil, as while it’s an important topic and I’m sure we’ve seen disasters due to such designs, there’s still good intention here and education is more difficult with this kind of tone.

                                                                                                        1. 2

                                                                                                          You make a good point.

                                                                                        2. 4

                                                                                          For me, the most compelling reason to assign IDs as described in the article is so that you can grep your log files for those IDs and probably not get false positives. Even just starting your IDs at 256 means that you won’t get collisions with the components of IPv4 addresses. Starting at 10,000 means you won’t get collisions with the components of IPv6 addresses, starting at 32,769 means you won’t collide with PIDs (depending on how your system is configured), and so on. You can go whole-hog with this and use UUIDs for everything, and then you’re even less likely to have collisions, but that has its own drawbacks.
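A quick sketch of the false-positive problem, in Python instead of grep. The log lines are invented for illustration, and the digit-boundary pattern only roughly imitates `grep -w`:

```python
import re


def lines_mentioning(log_lines, entity_id):
    """Return lines that contain entity_id as a whole number."""
    # Refuse matches that are part of a longer run of digits.
    pattern = re.compile(rf"(?<!\d){entity_id}(?!\d)")
    return [line for line in log_lines if pattern.search(line)]


# Made-up log lines: an IP octet, a PID and two entity IDs.
LOG = [
    "10.0.0.123 GET /profile id=123",
    "10.0.0.123 GET /profile id=8589934715",
    "worker pid=123 started",
]

# A small ID also matches the IP octet and the PID.
print(lines_mentioning(LOG, 123))
# An ID offset by 2**33 matches only the line that is really about it.
print(lines_mentioning(LOG, 2**33 + 123))
```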

                                                                                          That being said, I think it’s important to view this as a minor developer affordance and not as a substitute for a type system. If you find yourself doing this because you’re getting loop counters confused with entity IDs… I don’t think making the entity IDs larger is the right solution.

                                                                                          1. 2

                                                                                            … is it really that hard to have your logs reflect the type as well as the ID? How do you grep for anything that isn’t using an artificially large PK? I’d have thought user:123 was easier to get a valid result against than just 9993939191939393.

                                                                                            1. 2

                                                                                              Sure, of course your logs should be clear about the meaning of each piece of information. But if you’re grepping through logs multiple times per day, every day, then not having to type user: each time starts to give you a nontrivial time savings—and, more significantly, it feels like there’s less friction in the process. I’m assuming that the numbers are going to be copied and pasted anyway, which means there isn’t a time difference between grepping for a shorter number or a longer one. And your approach depends on a higher level of consistency in writing log messages than I think is common at most places—if someone leaves out the user: in one particular message, you’re back at grep -w.

                                                                                              1. 3

                                                                                                This logic doesn’t make any sense to me, because in any non-trivial application, users are just one type of thing that you’d want to be able to identify.

                                                                                                I don’t buy the idea that users are some unique thing you’d want to search for, but products, sales, payments, groups, etc etc - whatever the actual business items of the application are - are not equally as important. This is why none of the arguments presented make any sense to me: offsetting one type of object by 2^33 doesn’t solve the same supposed issue for all the other object types you have, so unless your application is so trivial you only have two object types: users and… something else, the offset mechanism is not a useful solution to any of the problems presented, IMO.

                                                                                                1. 2

                                                                                                  I think the idea is that you would offset users by 2^33 (for example), products by 2^34, sales by 2^35, and so on. Of course there are obvious problems with this scheme: if you have more than 2^34 products, the IDs for those are going to start overlapping with the IDs for sales. If you have more than around 30 entities in your system, you won’t be able to offset all of them like this and stay within 64 bits.
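The arithmetic behind those two limits, as a short Python sketch. It assumes power-of-two offsets starting at 2^33 and signed 64-bit IDs; the function names are made up:

```python
FIRST_EXPONENT = 33        # users start at 2**33, as in the example above
MAX_SIGNED_64 = 2**63 - 1  # assumes IDs are stored as signed 64-bit integers


def offset_for(type_index: int) -> int:
    """Starting ID for the n-th entity type (0 = users, 1 = products, ...)."""
    return 2 ** (FIRST_EXPONENT + type_index)


def room_before_next(type_index: int) -> int:
    """How many IDs a type can use before reaching the next type's offset."""
    return offset_for(type_index + 1) - offset_for(type_index)


def max_types() -> int:
    """How many such offsets fit into a signed 64-bit integer."""
    count = 0
    while offset_for(count) <= MAX_SIGNED_64:
        count += 1
    return count


print(room_before_next(1) == 2**34)  # products collide with sales after 2**34 rows
print(max_types())                   # number of entity types the scheme can hold
```

With unsigned 64-bit IDs there is room for one more type, which is still nowhere near enough for a large schema.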

                                                                                                  That’s why I think it’s vital to treat this scheme as just a developer affordance and not any kind of data integrity or type safety feature. Like I said, I think this is only really helpful for grepping logs… I think I may disagree with some of the other commenters on that point.

                                                                                                  1. 1

                                                                                                    Your comment actually made me go back and re-read the article. I think you’re right that they are talking about all objects, not just users specifically, but their reasoning seems to be essentially, what you alluded to before:

                                                                                                    If you find yourself doing this because you’re getting loop counters confused with entity IDs… I don’t think making the entity IDs larger is the right solution.

                                                                                                    In the example given, getting a list of users returns an associative array using integer IDs as the key, and boolean true as the value. They then proceed to call array_slice without setting the preserve_keys flag to true, and get back a 0-indexed array of boolean true.

                                                                                                    I wouldn’t be surprised if this is a real world example from Facebook, given some of the absolutely garbage examples shown in the leaked dumps of the codebase from several years ago - but to use this ridiculous pattern as a reason for starting your object IDs at 34 billion, is beyond stupid.

                                                                                                    1. 2

                                                                                                      Agreed. I ended up re-reading the article too, and apparently the “larger IDs make searching logs easier” point was something I made up; the article didn’t say that. None of the reasons they give for making the IDs larger seem like sound engineering to me.

                                                                                          2. 3

                                                                                            This isn’t meant to be a primary way to distinguish valid IDs from random numbers, but as a defense in depth in case you screw up your code.

                                                                                            Most likely you’re going to need to work with SQL, JSON, URLs, and other places where you’ll have to put an untyped number. Newtype in the language doesn’t help in cases like this:

                                                                                            get_friends(new UserID($_GET['photo_id']))
                                                                                            
                                                                                            1. 2

                                                                                              So what happens when someone changes your URL from ?uid=9809890809809890 to ?uid=9?

                                                                                              SQL of all places is a ridiculous example. Are you searching every table looking for a PK match?

                                                                                              1. 3

                                                                                                I think you’re still reframing this as if it was meant to be a security measure or some kind of bullet-proof protection. It’s not. It’s a “lint” that may help catch a programmer’s error. It is not intended to catch nor detect any outside interference.

                                                                                                I’ve chosen SQL, because SQL doesn’t accept PHP types as arguments (unless you implement a very fancy type-safe ORM, I guess?). There could be mistakes like misaligning ? placeholders and their values, or selecting columns in a wrong order. Even when you only use named placeholders and fetch rows as assoc arrays, if you join multiple tables with an id column, you might accidentally pick the wrong one. Bugs can happen. The trick is about making such bugs fail louder, sooner.
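A sketch of the misaligned-placeholder case using Python's built-in `sqlite3`. The `tags` table, the `find_tag` helper and the 2^33 / 2^34 offsets are all made up for the example, and the bug in `find_tag` is deliberate:

```python
import sqlite3

USER_BASE = 2**33   # arbitrary offset for user IDs
PHOTO_BASE = 2**34  # arbitrary offset for photo IDs


def make_db(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tags (user_id INTEGER, photo_id INTEGER, note TEXT)")
    conn.executemany("INSERT INTO tags VALUES (?, ?, ?)", rows)
    return conn


def find_tag(conn, user_id, photo_id):
    # Deliberate bug: the two values are bound in the wrong order.
    return conn.execute(
        "SELECT note FROM tags WHERE user_id = ? AND photo_id = ?",
        (photo_id, user_id),
    ).fetchall()


# Small overlapping IDs: the buggy query quietly returns somebody else's row.
small = make_db([(1, 2, "a"), (2, 1, "b")])
print(find_tag(small, 1, 2))

# Offset IDs: the same bug returns nothing, which is much harder to overlook.
big = make_db([
    (USER_BASE + 1, PHOTO_BASE + 2, "a"),
    (USER_BASE + 2, PHOTO_BASE + 1, "b"),
])
print(find_tag(big, USER_BASE + 1, PHOTO_BASE + 2))
```

This is still only a lint-style aid: the bug exists in both runs, the offsets just change how it shows up.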

                                                                                                1. 2

                                                                                                  The trick is about making such bugs fail louder, sooner.

                                                                                                  If your developers don’t notice that their piece of code is returning the wrong user, I honestly don’t think they’ll notice that it’s not returning any user, because they’re clearly not testing what they write, even in the most basic of “I tried this once on my local machine” sense.

                                                                                            2. 1

                                                                                              Yeah this is weird. If I am doing ‘get friends’ I want a list of friends, not integers.

                                                                                            1. 7

                                                                                              $200 per month in recurring costs at the end. Not bad if you’re running a business, but otherwise pretty steep for home use.

                                                                                              But I guess for home use you might as well keep the computer running 24/7… in your home.

                                                                                              1. 6

                                                                                                But I guess for home use you might as well keep the computer running 24/7… in your home.

                                                                                                Pretty much what I do, in conjunction with wireguard (proxying from a $5 VPS).

                                                                                                1. 4

                                                                                                  You may want to proxy from a free oracle VPS instead (10TB transfer/month) https://www.oracle.com/cloud/free/#always-free

                                                                                                  1. 2

                                                                                                    What are they getting out of it?

                                                                                                    1. 1

                                                                                                      Your contact details, to sell to marketers.

                                                                                                    2. 1

                                                                                                      Oh wow thank you for linking this, seems like a great offer. Might make me actually stop paying for hosting completely.

                                                                                                      1. 4

                                                                                                        You become the product there though

                                                                                                        1. 1

                                                                                                          I recently signed up for one, to use as a secondary VM. For an “always free” plan, that’s sure an excellent offer, assuming they don’t change their mind abruptly one day.

                                                                                                          1. 3

                                                                                                            they wouldn’t do that, it says “always” right on the box

                                                                                                    3. 2

                                                                                                      One (potential) downside of hosting out of your home is running into your ISP’s AUP (Acceptable Use Policy). Sometimes these outright forbid hosting any servers. But even for those that don’t forbid such acts, they usually have clauses that forbid serving material that is not illegal but is simply indecent, racist, or defamatory. It could prove challenging to remain in compliance if you host a site with user-generated content where even a small number of users are inclined towards posting such things.

                                                                                                      1. 4

                                                                                                        Add a jump host somewhere on the net (eg. some $5/month OVH system) that routes all connections through a VPN (or Tor to an onion service, which would make it harder for the provider to tell your ISP about you) back to the host at home: CloudFlare on a shoestring budget (there are many such providers and by tweaking DNS entries you can hop relatively quickly)

                                                                                                        1. 2

                                                                                                          This doesn’t help you comply with an AUP, just circumvent it.

                                                                                                          1. 3

                                                                                                            That’s often good enough. Using encryption makes it difficult for other agents to see any details about your internet traffic, and that includes your ISP enforcing its AUP on you (which they would be very prone to enforcing selectively, that is, only if they had some reason to think you specifically were a political problem for them).

                                                                                                        2. 2

                                                                                                          Which is when you end up heading towards running your own ISP, or other such nonsense.

                                                                                                        1. 8

                                                                                                          [ Disclaimer: I have no knowledge of AMD roadmaps ]

                                                                                                          I don’t find it at all surprising that AMD is developing an Arm[1] chip. They’ve been an Arm licensee for ages and already use Arm cores in some places (e.g. in the platform security processor). I’d be quite surprised if they hadn’t had a group working on fitting an Arm front end to their cores for a while. That said, there’s a big difference between ‘working on X’ and ‘shipping X as a product’. There’s a big gap between ‘AMD developing an Arm core internally so that they have leverage with Intel when they renew cross-licensing deals’ and ‘AMD plans on shipping an Arm laptop part’. I’d love to know which of these it actually is. Apple kept their x86 implementation of OS X around for around a decade before the Intel switch, to use in negotiations with IBM and Motorola / FreeScale. It took a change in the competitive landscape outside of their control before they shipped it as a product.

                                                                                                          [1] Minor aside: Arm redid their branding a few years back and their style guide now recommends that you write it as Arm not ARM. It originally stood for Acorn RISC Machines, then Advanced RISC Machines, but it was just ARM Holdings for a while and they’ve dropped the term ‘RISC’ from everything as well. They now refer to the Arm architectures as ‘load-store architectures’, not as RISC. The instruction sets for both AArch32 and AArch64 are pretty massive, but they are orthogonal and everything is added because a compiler / OS actually can make use of it, unlike traditional CISC cores. It makes me chortle a bit when I read articles that talk about ARM RISC cores.

                                                                                                          1. 2

                                                                                                            That said, there’s a big difference between ‘working on X’ and ‘shipping X as a product’.

                                                                                                            They did have an Arm SoC as a product, the Opteron A1100, that could be purchased.

                                                                                                            1. 2

                                                                                                              Could it? I thought it never went beyond pre-purchase/demo.

                                                                                                              1. 2

                                                                                                                It could albeit briefly. There was a generally available board on 96boards.org

                                                                                                                http://armdevices.net/2015/11/16/amd-huskyboard-96boards-enterprise-edition-explained-by-jon-masters-of-red-hat/

                                                                                                                1. 2

                                                                                                                  A more popular product was the SoftIron Overdrive 1000/3000

                                                                                                            2. 1

                                                                                                              Going off on a tangent…

                                                                                                              The name load-store architecture makes sense to me for the arm instruction set, but the name has always made me wonder about its counterparts. Someone invented that name for one class within a classification, presumably because the classification made sense as a way to separate CPU architectures into top-level classes. What is that classification and what are the other classes?

                                                                                                              1. 1

                                                                                                                The other class is CISC where you have instructions that tell the CPU to load a value, modify it and store it back (eg. x86: addl $3, 4)

                                                                                                                It never got a “fancy” name, and I guess the load-store naming only appeared because “reduced instruction set” was hard to say with a straight face when talking about an architecture with 1000+ instructions (e.g. ARM). “Reduced” can mean both “functionally reduced instructions” (e.g. load-store architecture) and “reduced number of instructions” (which RISC originally was as well, but only incidentally).
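
                                                                                                                To make the distinction concrete, here is a minimal Python sketch of a toy machine (not any real ISA). The function names, the register name r0 and the dict-based memory are made up for illustration. The first function mimics an instruction like the x86 example above, which modifies memory directly; the second does the same work load-store style, where only loads and stores touch memory.

```python
# Toy model: memory is a dict of address -> value, registers a dict of name -> value.
# All names here are hypothetical and only illustrate the two styles.


def add_imm_to_memory(memory, address, value):
    """Register-memory style: one instruction reads, modifies and writes memory."""
    memory[address] = memory[address] + value
    return 1  # number of instructions executed


def add_imm_load_store(memory, registers, address, value):
    """Load-store style: arithmetic only ever operates on registers."""
    registers["r0"] = memory[address]          # load
    registers["r0"] = registers["r0"] + value  # add (register only)
    memory[address] = registers["r0"]          # store
    return 3  # number of instructions executed


mem_a = {4: 10}
mem_b = {4: 10}
regs = {}

count_a = add_imm_to_memory(mem_a, 4, 3)
count_b = add_imm_load_store(mem_b, regs, 4, 3)

print(mem_a[4], count_a)  # same result in memory, one instruction
print(mem_b[4], count_b)  # same result in memory, three instructions
```

                                                                                                                Both variants leave the same value in memory; the difference is only in which instructions are allowed to touch memory.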

                                                                                                            1. 11

                                                                                                              This may be a dumb question, but does adding serialization/deserialization greatly increase the latency of the RAM? Won’t there always be a benefit in keeping RAM directly attached?

                                                                                                              1. 8

                                                                                                                10-15ns penalty on Power10 because of the externally attached DRAM controller. It’s just a footnote.

                                                                                                                1. 9

                                                                                                                  10-15ns hardly constitutes a footnote for main memory latency.

                                                                                                                  1. 4

                                                                                                                    On the machine that I write this on, latency to DRAM is 170ns. It’s a high-end multi-socket capable CPU from 2018.

                                                                                                                    It matters far less there than on most customer workloads.

                                                                                                                    1. 6

                                                                                                                      Hmmm. Could you share your machine specs and measurements in more detail? On my less high end machine from 2017 main memory latency is ~60ns. And from personal experience I’d be shocked if any recent CPU had >100ns main memory latency.

                                                                                                                      1. 3

                                                                                                                        Client processors have far lower memory latency than server ones.

                                                                                                                        It’s nearly impossible to find a server CPU with below 100ns of memory latency. But at least, there’s plenty of bandwidth.

                                                                                                                        A random example from a machine (not my daily, which isn’t using AMD CPUs): https://media.discordapp.net/attachments/682674504878522386/807586332883812352/unknown.png

                                                                                                                  2. 4

                                                                                                                    Wow! That’s the same as if the memory was on the other side of the room.

                                                                                                                  3. 7

                                                                                                                    What does directly attached mean? Suppose main memory is connected to the CPU cores via a cache, another cache and a third cache, is that directly attached? Suppose main memory is distant enough, latent enough, that a blocking read takes up as much time as executing 100 instructions, is that directly attached?

                                                                                                                    It’s a question of physics, really: How quickly can you send signals 5cm there and 5cm back? Or 10cm, or 15cm. Modern RAM requires sending many signals along slightly different paths and having them arrive at the same time, and “same time” means on the time scale that light takes to travel a few millimeters. Very tight time constraints.

                                                                                                                    (Almost two decades ago we shifted from parallel interfaces to serial ones for hard drives, AIUI largely to get rid of that synchronisation problem although I’m sure the narrower SATA cables were more convenient in an everyday sense too.)
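
                                                                                                                    To put rough numbers on the physics, here is a back-of-the-envelope sketch in Python. The velocity factor of 0.5 is an assumption (signals in traces and cables travel at some fraction of the speed of light, and the exact value depends on the medium), and the helper names are made up for illustration.

```python
# Back-of-the-envelope: distance covered by a signal in a given time, and
# round-trip time over a given distance.
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # in vacuum

# Assumption: illustrative fraction of c for signals on a board or cable.
VELOCITY_FACTOR = 0.5


def distance_m(time_ns, velocity_factor=1.0):
    """Distance in metres covered in time_ns nanoseconds."""
    return SPEED_OF_LIGHT_M_PER_S * velocity_factor * time_ns * 1e-9


def round_trip_ns(one_way_cm, velocity_factor=1.0):
    """Time in nanoseconds to travel one_way_cm centimetres there and back."""
    metres = 2 * one_way_cm / 100
    return metres / (SPEED_OF_LIGHT_M_PER_S * velocity_factor) * 1e9


for ns in (10, 15):
    print(f"{ns} ns: {distance_m(ns):.2f} m in vacuum, "
          f"{distance_m(ns, VELOCITY_FACTOR):.2f} m at {VELOCITY_FACTOR}c")

for cm in (5, 10, 15):
    print(f"{cm} cm each way: "
          f"{round_trip_ns(cm, VELOCITY_FACTOR):.2f} ns round trip at {VELOCITY_FACTOR}c")
```

                                                                                                                    In vacuum, 10 ns corresponds to roughly 3 m of travel, which is where the "other side of the room" comparison comes from; a round trip of a few centimetres costs well under a nanosecond of pure propagation time.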

                                                                                                                    1. 4

                                                                                                                      Even DIMMs have page and row selection mechanisms that introduce variable latency depending on what you want to access. Add 3 levels of caching between that and the CPU and it’s rather likely that with some slightly larger caches somewhere and much higher bandwidth (as is the promise of independent lanes of highly tuned serial connections) you more than compensate for any latency cost incurred by serialization.

                                                                                                                      Also, memory accesses are per cache line (usually 64 bytes) these days, so there’s already some kind of serialization going on when you try to push 512 bits (+ control + ECC) over 288 pins.
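
                                                                                                                      As a rough illustration of that existing serialization, the sketch below counts how many transfers ("beats") it takes to move one cache line. It assumes a 64-bit data bus, as on a DDR4 channel without ECC; most of the 288 DIMM pins are not data lines. The function name is made up for this example.

```python
import math

CACHE_LINE_BYTES = 64  # typical cache-line size
DATA_BUS_BITS = 64     # assumption: data lines of one DDR4 channel, ECC not counted


def beats_per_cache_line(line_bytes, bus_bits):
    """Number of bus transfers needed to move one cache line."""
    return math.ceil(line_bytes * 8 / bus_bits)


beats = beats_per_cache_line(CACHE_LINE_BYTES, DATA_BUS_BITS)
print(f"{CACHE_LINE_BYTES * 8} bits over {DATA_BUS_BITS} data lines: {beats} beats")
```

                                                                                                                      Under these assumptions a single cache-line fill already takes 8 back-to-back transfers, so even a "parallel" DIMM interface moves each line as a short serial burst.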

                                                                                                                    1. 9

                                                                                                                      I would love to see a side-by-side with Catala Lang, which was posted here awhile back, though only the Git repo: https://lobste.rs/s/b74svy/catalalang_catala

                                                                                                                      1. 28

                                                                                                                        Hi! Author of both Mlang and Catala here :) So Catala is basically an evolution/reboot of the M language, but this time done right using all the PL best practices.

                                                                                                                        1. 7

                                                                                                                          Wait, are you for real? That is absolutely fascinating! My wife is a lawyer (which makes me not a lawyer) and I am very interested in these types of intersections. Namely where a highly regimented and regulated domain gives rise to some type of formalism once exposed to CS through some “interdisciplinary process”.

                                                                                                                          I have studied DSL design peripherally but would really like to pick your brain about some things. I did once, long ago, design a policy language. Are you open to additional discussions and collaboration?

                                                                                                                          1. 11

                                                                                                                            Ha ha ha yes this area is fascinating. I have the impression that there’s a lot of people in legaltech that are all trying to make a DSL to express parts of the law but have no clue about how to properly make a DSL.

                                                                                                                            I am open to discussions and collaboration, moreover both Mlang and Catala are open-source and accept contributions. Hit me up using the email in the Mlang paper for instance :)

                                                                                                                            1. 2

                                                                                                                              As a lawyer designing my own DSL ;) I would love to know how the use of Mlang has affected legislation. For example, how do you deal with the law being changed? Does your parliament create updates as “diffs” or as already “merged” texts? Do you use lawxml? So many questions!

                                                                                                                              1. 4

                                                                                                                                The French laws are usually written in terms of “diff”. Also I had made a prototype that warned which articles of law your program was relying on were about to expire https://twitter.com/DMerigoux/status/1252914283836473345?s=19. I don’t use any form of XML, I just copy paste the law text to start writing a Catala program. XML would not improve the way Catala programs are written since the XML structure does not follow the logical structure of the law but rather its formatting structure, which we don’t care about when translating it to executable code.

                                                                                                                          2. 4

                                                                                                                            Hi Denis - nothing constructive to say except that I am a British CS student and my friends and I are big fans of your work! In fact I think a friend of mine will be basing his undergraduate thesis on your ideas :-)

                                                                                                                            1. 4

                                                                                                                              Thanks Jack! Well, if your friend does end up basing his undergrad thesis on Catala or something else, please drop me an email. I’ll be happy to give feedback or suggest interesting things to look at.

                                                                                                                            2. 2

                                                                                                                              I want to just praise you for the time and effort you put into this space. I’ve recently got into “hobbyist” law myself, specifically Canadian law (http://len.falken.ink/law/101.txt), and instantly had the same thoughts: where are the formal proofs? :) Sure there are tax calculators, and some will creators, but are they rigorous? Can they tell us other properties of a situation?

                                                                                                                              I’m 100% going to play with Catala. This is technology worth spending time on because law governs our every day lives.

                                                                                                                              1. 2

                                                                                                                                but this time done right using all the PL best practices.

                                                                                                                                Does this mean that DGFiP is migrating to something one of the implementers considers not done right?

                                                                                                                                1. 4

                                                                                                                                  I suppose it’s easier to migrate step by step: Improve the tooling, so that everything can be in the open without security concerns and so the system can evolve more easily from its apache cgi-bin roots. That’s what MLang seems to offer.

                                                                                                                                  Once that’s in place, there can be further steps to improve the language (e.g. by introducing Catala) because the foundations are state of the art again. And even if that doesn’t happen, the system is still better off than before because it’s a single system instead of a single system + 25 years of wrappers that extend it ad-hoc.

                                                                                                                                  1. 3

                                                                                                                                    I could not have said it better!

                                                                                                                                  2. 2

                                                                                                                                    Migrating to Mlang improves the compiler but the M language stays the same. For instance, in DGFiP’s M, there are no user-defined functions. And the undefined value in M is a constant reminder of the “billion dollar mistake”. So yes we can definitely improve the M language from its 1990 design :)

                                                                                                                                  3. 1

                                                                                                                                    I’m just curious: who is driving all this? Is this simply something you one day decided to go and implement, or were you approached by someone to do this seemingly huge project? How do you get it financed, did you have backing from the start?

                                                                                                                                    Fascinating stuff!

                                                                                                                                    1. 10

                                                                                                                                      I started looking into this after watching this talk: https://youtu.be/EshxZVMURt4. I always wondered whether it was possible for me to play with formal methods outside the traditional application domains like security or safety-critical embedded systems. Then I fell into a rabbit hole :) I started with a Python prototype of French law encoded into SMT, then moved to try and use the DGFiP code and ended up coding Mlang, then created Catala as a next logical step. I created these in my spare time during my PhD and was helped by some friends who contributed to the open source repos. I’m only starting now to have institutional backing! During a French PhD, your funding is secured for the whole duration from the start so I didn’t have to worry about it and could focus on other things. I would say stable and long-term unconditional funding enabled me to create all this. In my opinion research should promote that instead of the myriad of tiny little funding sources, each of them requiring a lot of paperwork to fill out. But in that regard I go against the zeitgeist.

                                                                                                                                1. 5

                                                                                                                                  Betteridge’s law strikes again.

                                                                                                                                  One of the key features of a blockchain, which the author tries to handwave away, is that every link in the chain is verifiable, and unalterable. The author tries to claim that because each commit carries a reference to its parent, it’s a “chain of blocks”, but it’s not so much a chain as just an order. You can edit the history of a git repo easily, reparent, delete, squash, and perform many other operations that entirely modify the entire chain. It was kinda made that way.

                                                                                                                                  1. 12

                                                                                                                                    The technical properties of git’s and common block chain data structures are relatively similar.

                                                                                                                                    You can also fork a bitcoin block chain and pretend that your fork is the canonical one. The special bit about block chains is that there’s some mechanism for building agreement about the HEAD pointer. Among other things, there’s no designated mover of that pointer (as in a maintainer in a git-using project), but an algorithm that decides which among competing proposals to take.

                                                                                                                                    1. 16

                                                                                                                                      They are technically similar because both a blockchain and a git repo are examples of a merkle tree. As you point out though the real difference is in the consensus mechanism. Git’s consensus mechanism is purely social and mostly manual. Bitcoin’s consensus mechanism is proof of work and mostly automated.

                                                                                                                                      1. 2

                                                                                                                                        Please stop referring to “Proof of _” as a consensus mechanism. It is an anti-sybil mechanism; the consensus mechanism is called “longest chain” or “Nakamoto consensus”. You can use a different anti-sybil mechanism with the same consensus mechanism (but you may lose some of the properties of bitcoin).

                                                                                                                                        The point is that there are various different combinations available of these two components and conflating them detracts from people’s ability to understand what is going on.

                                                                                                                                        1. 2

                                                                                                                                          You are right. I was mixing definitions there. Thanks for pointing it out. The main point still stands though. The primary distinction between a blockchain and git is the consensus mechanism and not the underlying merkle tree datastructure that they both share.

                                                                                                                                        2. 1

                                                                                                                                          Mandatory blockchain != bitcoin. Key industrial efforts listed in https://wiki.hyperledger.org/ are mostly not proof-of-work in any way (the proper term for this is permissioned blockchain, which is where industrial applications are going).

                                                                                                                                          1. 2

                                                                                                                                            You are correct. I don’t disagree at all. I used bitcoin as an example because it’s well known. There are lots of different blockchains with different types of consensus mechanisms.

                                                                                                                                      2. 2

                                                                                                                                        You can make a new history but it will always be distinct from the original one.

                                                                                                                                        I think what you’re really after is the fact that there is no one to witness that things like the author and the date of a commit are genuine – that is, it’s not just that I can edit the history, I can forge a history.
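
                                                                                                                                        A minimal Python sketch of both points, using a toy commit format rather than Git’s actual object format (the field layout, function names and author names are all made up for illustration): because each ID hashes the parent’s ID, rewriting one commit changes the IDs of all its descendants, yet nothing checks that the author field is genuine.

```python
import hashlib


def commit_id(parent_id, author, message):
    """Hash a toy commit; the parent's ID is part of the hashed data."""
    payload = f"parent {parent_id}\nauthor {author}\n\n{message}".encode()
    return hashlib.sha1(payload).hexdigest()


def build_history(commits):
    """Return the IDs of a linear history given (author, message) pairs."""
    ids = []
    parent = ""
    for author, message in commits:
        parent = commit_id(parent, author, message)
        ids.append(parent)
    return ids


original = [("alice", "initial"), ("bob", "add feature"), ("alice", "fix bug")]
# Same history, but the second commit now claims a different author.
rewritten = [("alice", "initial"), ("mallory", "add feature"), ("alice", "fix bug")]

orig_ids = build_history(original)
new_ids = build_history(rewritten)

print(orig_ids[0] == new_ids[0])  # untouched first commit keeps its ID
print(orig_ids[1] == new_ids[1])  # the edited commit gets a different ID
print(orig_ids[2] == new_ids[2])  # and so does its descendant
```

                                                                                                                                        The rewritten chain is internally just as valid as the original one; telling them apart requires someone outside the data structure to vouch for which head is the real one.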

                                                                                                                                        1. 1

                                                                                                                                          Technically you haven’t really made the others disappear. They are all still there just not easily viewed without using reflog. All you are really doing is creating a new branch point and moving the branch pointer to the head of that new branch when you do those operations. But to the average user it appears that you have edited history.

                                                                                                                                          1. 1

                                                                                                                                            what was all that hullabaloo about git moving away from SHA-1 due to vulnerabilities? why were they using a cryptographic hash function in the first place?

                                                                                                                                            what you said makes sense, but it seems to suggest this SHA-1 thing was a bit of bikeshedding or theater

                                                                                                                                            1. 2

                                                                                                                                              Git uses a cryptographic hash function because it wants to be able to assume that collisions never occur, and the cost of doing so isn’t too large. A collision was demonstrated in SHA-1 in 2017.

                                                                                                                                              1. 3

                                                                                                                                                SHA-1 still prevents accidental collisions. Was Git really designed to be robust against bad actors?

                                                                                                                                                1. 2

                                                                                                                                                  ¯_(ツ)_/¯

                                                                                                                                                  1. 1

                                                                                                                                                    The problem is that it was never properly defined what properties people expect from Git.

                                                                                                                                                    You can find pieces of the official Git documentation and public claims by Linus Torvalds that are seemingly in contradiction to each other. And the whole pgp signing part does not seem to be very well thought through.

                                                                                                                                                2. 2

                                                                                                                                                  Because you can sign git commits and hash collisions ruin that.

                                                                                                                                                  1. 1

                                                                                                                                                    ah that makes some sense

                                                                                                                                              1. 12

                                                                                                                                                I give Google absolutely zero benefit of the doubt here. Everyone should assume as a matter of course that the Google Play store cannot be relied upon to host software that offends the political sensibilities of either Google themselves, or a sufficiently motivated group of people who are willing to abuse the abuse-report mechanism to have Google censor it on their behalf.

                                                                                                                                                What I would like to know is why it is the case that the F-droid version of the app is “out of date”. Is there some reason it’s more difficult to ensure that F-droid has the latest released version, compared to the Google Play store? I would personally like to see the Element team treat free software distribution channels as first-class, and treat the Google Play store as only a secondary channel, used to make it as easy as possible for as many people as possible to obtain the app.

                                                                                                                                                1. 6

                                                                                                                                                  What I would like to know is why it is the case that the F-droid version of the app is “out of date”.

                                                                                                                                                  I believe apps in the official F-droid repository are updated/packaged based on “pulls” from the F-droid team, instead of triggered by “pushes” from individual app developers. The F-droid team is aware this sometimes results in some lag, and is working on improving the cycle time.

                                                                                                                                                  Element could also host their own F-droid repo to speed up the process, but AFAIK that comes with a couple of challenges with signatures stemming from non-reproducible builds. I could swear I saw element_hq mention they were considering it somewhere, but now I can’t find a source. Here’s the github issue for it if you’d like to add a thumb: https://github.com/vector-im/element-android/issues/1857

                                                                                                                                                  1. 2

                                                                                                                                                    The announcement post linked above contains:

                                                                                                                                                    Update: reminder that in the interim you can download a (slightly outdated) version of Element Android from F-Droid at https://f-droid.org/en/packages/im.vector.app. We’re also looking into running our own F-Droid repository going forwards so the most recent build is always available there.

                                                                                                                                                    1. 2

                                                                                                                                                      Yes, I have one app on F-droid, and they build everything from source to ensure “what you see is what you get”. Thus they have to actively poll every source repo and rebuild in a queue. Some repos don’t even have automation enabled (for example, checking of Git release tags), so each version has to be submitted manually when no such method for release detection is available or too many builds fail.

                                                                                                                                                    2. 3

                                                                                                                                                      Never ascribe to malice that which can be better explained by blind algorithms.

                                                                                                                                                      I think this app was targeted by a coordinated reporting campaign and Google’s automated system removed it.

                                                                                                                                                      On Monday a human will see the social media shitstorm and reinstate it.

                                                                                                                                                      1. 15

                                                                                                                                                        Frankly if you’re going to have robots take out applications with at least a hundred thousand active and long term users on a weekend you should probably have someone on duty to undo the damage.

                                                                                                                                                        1. 17

                                                                                                                                                          You can bet there are people working 24/7 at Google to serve the needs of advertisers.

                                                                                                                                                          Users? Not so much. App store moderation is a cash sink. Let the bots handle it.

                                                                                                                                                          1. 3

                                                                                                                                                            It appears that after this shitstorm they brought someone in to help stop the PR bleeding:

                                                                                                                                                            Update: we just got a call from a Google VP who explained the suspension was triggered by a report of extremely abusive content accessible on the http://matrix.org server. Our trust & safety team had already acted on it, and the app should be reinstated shortly.

                                                                                                                                                            I wonder if they’ll fix the process. And by “wonder if” I mean “think it is unlikely that”.

                                                                                                                                                            1. 2

                                                                                                                                                              If you can think of a way to solve this (absolutely no false positives or false negatives during review) at the scale of any of the more popular app stores, please apply as SVP for that product area at any of the app store wielding companies ASAP.

                                                                                                                                                              (“Don’t do an app store” won’t fly anymore after users became used to it)

                                                                                                                                                              1. 3

                                                                                                                                                                I can. And they can. They’re smarter than me. The reason I think it’s unlikely is because there are obvious ways to improve, and the only reason they wouldn’t is because these shit storms don’t hurt them. It’s not a priority.

                                                                                                                                                                Some observations:

                                                                                                                                                                • Firefox displays a ton of extremely objectionable content. So does Vivaldi. So does Opera. They never auto-ban these applications.

                                                                                                                                                                • Google has employees who know how to look at an application like this on the weekend and fix it. As evidenced by the action I linked.

                                                                                                                                                                If they wanted to avoid this, there are two obvious paths:

                                                                                                                                                                1. They could let developers say “my app is like Firefox, displaying content that I don’t control from arbitrary places on the internet” and let someone at the same level as the person who put this app back in the store verify that assertion before placing the app in that bucket so that it’s no longer subject to automated takedowns.

                                                                                                                                                                2. They could, whenever an application with a number of users above (insert threshold here) gets flagged for takedown, require someone at the same level as the person who put this app back in the store to verify that the takedown is appropriate.

                                                                                                                                                                If these steps are obvious to me, they’re obvious to anyone google has paid to think about it. They’d have already taken one of these steps last time this happened, if fixing this were a priority.
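As a rough sketch of how the two paths above could combine into one routing rule: everything here is hypothetical, including the names `App`, `route_takedown_flag` and `HUMAN_REVIEW_USER_THRESHOLD` and the threshold value itself. It is one possible reading of the proposal, not a description of how any app store actually works.

```python
from dataclasses import dataclass

# Hypothetical cutoff; the proposal deliberately leaves the number open.
HUMAN_REVIEW_USER_THRESHOLD = 100_000


@dataclass
class App:
    name: str
    active_users: int
    # Path 1: a human has verified the developer's claim that the app
    # displays third-party content it does not control (browser-like).
    verified_third_party_content: bool = False


def route_takedown_flag(app: App) -> str:
    """Decide who handles a takedown flag raised against an app."""
    if app.verified_third_party_content:
        # Path 1: verified browser-like apps are exempt from automated takedowns.
        return "human_review"
    if app.active_users >= HUMAN_REVIEW_USER_THRESHOLD:
        # Path 2: apps above the threshold need a human to confirm the takedown.
        return "human_review"
    return "automated_takedown"


if __name__ == "__main__":
    print(route_takedown_flag(App("big-chat-client", 150_000)))
    print(route_takedown_flag(App("tiny-app", 50)))
```

Either condition alone sends the flag to a person, so small apps still go through the automated path unless someone has vetted them.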

                                                                                                                                                                1. 2

                                                                                                                                                                  In this specific case, the content seems to have come from a server controlled by the same entity that controls the app (from one of the updates: “[Google person] explained the situation, which related to some extremely abusive content which was accessible on the default matrix.org homeserver”), so “my app can display stuff from anywhere” wasn’t sufficient this time.

                                                                                                                                                                  That said, apparently the matrix.org operators have an abuse team and the Play Store folks now have the contact information on file.

                                                                                                                                                                  I also fully expect metrics about users to be part of the assessment, but those could be relatively easily gamed to extend the lifetime of malware in the store. Rough sketch: create a harmless do-nothing app, install it on a few thousand dummy devices somewhere (this step can be done as preparation long before actual use, possibly by some app-shells-as-a-service outlet), change owner, add “useful” purpose, market it to death to get your victims to install it (victim-user-base-as-a-service company), update with malware on Friday noon US Pacific time, have problem identified on Friday evening. But as the user metrics say that the app is “important”, it gets deferred to the Monday team meeting - or needs VP intervention, which will only work so often before the VP will request to be taken out of the loop by whatever means necessary ;-)

                                                                                                                                                                  I think there has been a push for stricter content control after Jan 6 and they are still filling in some blanks in the process - which may also explain the VP involvement, which probably means that there has been lots of escalation behind the scenes (VPs don’t usually get on individual cases by themselves). This likely ruined the weekend of a sizable group of people in that team, not just some on-caller who was scheduled for being around and who probably ticked all the right boxes in their (re-)review of the app, especially since hosting and app are operated by the same group. I think this will serve as motivation to change the process so this particular scenario doesn’t happen again.

                                                                                                                                                                  (Disclosure: I work at Google, but have no insights into how Play Store operates, just a few educated guesses about megacorp behavior)

                                                                                                                                                                  1. 4

                                                                                                                                                                    In this specific case, the content seems to have come from a server controlled by the same entity that controls the app (from one of the updates: “[Google person] explained the situation, which related to some extremely abusive content which was accessible on the default matrix.org homeserver”), so “my app can display stuff from anywhere” wasn’t sufficient this time.

                                                                                                                                                                    That still feels to me like banning Firefox because some extremely objectionable content made it onto forums.mozillazine.org. Which is to say, I don’t think anybody would do that on an automated basis if they’d accurately characterized the application.

                                                                                                                                                                    I think there has been a push for stricter content control after Jan 6 and they are still filling in some blanks in the process - which may also explain the VP involvement, which probably means that there has been lots of escalation behind the scenes (VPs don’t usually get on individual cases by themselves). This likely ruined the weekend of a sizable group of people in that team

                                                                                                                                                                    That’s a very interesting point. I wasn’t thinking about the matrix ban in that context.

                                                                                                                                                                    I think the malware concerns you brought up are also interesting, but there should be some other data points at play there that can help which are missing when it comes to content that’s parsed by humans.

                                                                                                                                                                2. 1

                                                                                                                                                                  If you can think of a way to solve this (absolutely no false positives or false negatives during review) at the scale of any of the more popular app stores, please apply as SVP for that product area at any of the app store wielding companies ASAP.

                                                                                                                                                                  There’s a substantial difference between an inscrutable AI and human reviewers; this is at least the second random ban on HN this month. Apple’s review process is not without issues but they use human reviewers. Google could use manual review and alter the app store submission costs accordingly. They certainly are capable of this but it’s a core value to use AI instead of providing customer support.

                                                                                                                                                      1. 6

                                                                                                                                                        This is what early free and open people celebrated as “meritocracy”, before it became clear that the particular kind of lazy, blinkered discrimination actually practiced online, as distinct from the mythical fair holistic contest of achievement, potential, and brilliance hackers found so flattering, had terrible social consequences.

                                                                                                                                                        This was in fact the entire point of the word “meritocracy” from the beginning. It was coined as satire.

                                                                                                                                                        1. 2

                                                                                                                                                          This was in fact the entire point of the word “meritocracy” from the beginning. It was coined as satire.

                                                                                                                                                          And yet, the meaning and connotation of a term can change: See “blacklist”, a term already used in the 17th century that only recently became associated with racial tensions, so folks request to retire it now.

                                                                                                                                                          While meritocracy is terribly hard to implement and easy to abuse as a gatekeeping device, I wouldn’t be too hung up on its “original” meaning to reject the notion altogether.

                                                                                                                                                          1. 1

                                                                                                                                                            The point of examining meritocracy is to realize that any choice of partition or gradient, regardless of its particular operational benefits, is going to lead to an unjust and harmful society which oppresses the bulk of its population. How far would you change the meaning, and where would you try to go with it? By any definition of “better” and “best”, meritocracy writes its own critique.

                                                                                                                                                            1. 1

                                                                                                                                                              That’s an actual argument. “It was coined as satire”, less so in my opinion (but commonly used as if it was).

                                                                                                                                                          2. 1

                                                                                                                                                            Yeah, it’s one of those tropes that’s better known in truncated form, which happens to give exactly the wrong impression. Like “rotten apple”, “information wants to be free”, and “Utopia”.

                                                                                                                                                            I’ve basically just given up. Unless the misapprehension is really the point I want to make.

                                                                                                                                                          1. 6

                                                                                                                                                            Although the author only sketches it, there is a left-hand path here; we can return to Free Software and force corporate consumers of the commons to comply with licenses which disgust them. As I have documented, some licenses are FSF-approved but not OSI-approved, and these licenses often are also not useful for the enumerated corporations.

                                                                                                                                                            1. 4

                                                                                                                                                              Thank you for the link to your answer on stackexchange. That was a very enlightening read!

                                                                                                                                                              1. 3

                                                                                                                                                                we can return to Free Software and force corporate consumers of the commons to comply with licenses which disgust them

                                                                                                                                                                You literally can’t. The power dynamics don’t now, and won’t ever, allow it.

                                                                                                                                                                1. 2

                                                                                                                                                                  You didn’t read my second link; I provide evidence for my claim, including listing some companies which have made public statements or commitments to avoiding certain licenses even when those licenses cover essential software.

                                                                                                                                                                  1. 2

                                                                                                                                                                    …yes, which refutes your claim that you “can force corporate consumers . . . to comply with licenses which disgust them.” They’re opting out, not opting in; that’s not a win condition.

                                                                                                                                                                    1. 2

                                                                                                                                                                      It is very much a winning condition for the Free Software community. We don’t need for corporations to use our code, after all; it is not our problem if they must invent everything for themselves. We would hope, indeed, that if the community is large and robust enough, then corporations would be forced to be polite community members without special privileges, or else be completely outpaced by the combined momentum of folks using code which they’re not allowed to touch. Or, as I put it in a previous thread:

                                                                                                                                                                      At scale, if the public commons is larger than any one corporation’s pool of coders, then this could prevent corporations from entering into public spaces which are broadly populated by people.

                                                                                                                                                                      1. 1

                                                                                                                                                                        [Someone not using our software] is very much a winning condition for the Free Software community.

                                                                                                                                                                        Huh. That’s an… interesting, and rather tautological, way to define your terms. But thanks for the clarification.

                                                                                                                                                                2. 1

                                                                                                                                                                  WTFPL and Unlicense are disliked by entities that come with a legal department because the licenses don’t tick the right boxes for being dependable tools (e.g. they lack a warranty disclaimer). I consider these anti-endorsements free legal advice because if the lawyers of a megacorp can’t make those licenses work for them (and it’s not just due to their business model or way of operating, like with Affero clauses) it’s likely that neither can I.

                                                                                                                                                                  1. 3

                                                                                                                                                                    I don’t think it’s true that you have the same legal needs and worries as large corporations. And this distinction is precisely what enables us to imagine that we have not completely run out of possibilities for using licensing alone to make progress with Free Software in society.

                                                                                                                                                                    1. 2

                                                                                                                                                                      I have different legal needs, so Affero clauses are fine for me personally. A license that is so blatantly US-centric as to potentially not be enforceable where I live is a legal worry for me just as it is for Google (does not live in Germany, but operates here). So, no Unlicense for me.

                                                                                                                                                                      Disclosure: I work at Google, and my assessment of some of these decisions could be colored by off-the-cuff remarks made by open source team staff in random internal forums that may not be reflected in such detail in the public Google documentation.

                                                                                                                                                                      However these arguments are also made by other parties, including the FSF, and they make sense to me: For example the argument that putting things in the public domain is not a thing in many countries (such as Germany), so there needs to be a clear alternative (Unlicense isn’t clear to me in that regard) and where the PD mechanism works there’s a risk that some default warranty pops up because you might not be able to disclaim copyright and at the same time disclaim warranty. That seems a bit contrived, but I’m not enough of a lawyer (as in, not at all) to rule that out completely, and it has been brought up.

                                                                                                                                                                      So choosing Unlicense just because it’s off-limits for Google seems like a poor choice: Go AGPL or maybe EUPL, which have a similar anti-corporate effect while being written with some legal care (although EUPL has a bunch of uncertainties that make me reconsider every time I re-read it).

                                                                                                                                                                      They also have the advantage of being fundamentally unfriendly towards “corporations that extract value from other people’s work without giving back” (which seems to be the main thrust of this entire exercise), while the rules that prohibit Unlicense, CC0 and other quasi-public domain arrangements at this time are entirely incidental and might be dropped if the legal risk assessment regarding public domain work in those corporations ever changes (for example if “PD with warranty disclaimer” gets support throughout the legal system in the US).

                                                                                                                                                                    2. 1

                                                                                                                                                                      The Unlicense has a warranty disclaimer.

                                                                                                                                                                      1. 1

                                                                                                                                                                        Well, the disclaimer thing was an example :-)

                                                                                                                                                                        As for the Unlicense, it’s not quite clear to me what is supposed to happen under this license in a jurisdiction with copyright law that doesn’t permit dedicating works into the public domain. I guess the bare-bones license in the second paragraph is supposed to take over, but who knows?

                                                                                                                                                                        In this case, I can also lean on the free legal advice by the FSF who proposes using the CC0 instead for ticking all the right boxes (while still considered not acceptable by Google’s lawyers, apparently).

                                                                                                                                                                        0BSD seems to be the most accepted “I really don’t care what happens” license variant of all and it carefully avoids any explicit public domain dedication, making it work the same inside and outside the US.

                                                                                                                                                                        1. 2

                                                                                                                                                                          I was just pointing out what appeared to be an inaccuracy in your comment.

                                                                                                                                                                          The questions surrounding the Unlicense are why I dual license my projects under the MIT and the Unlicense.

                                                                                                                                                                  1. 13

                                                                                                                                                                    I’ve really enjoyed reading this blog over the last few weeks. He has a great perspective and explains the legal side well. Seems like there is an “Open Source Industrial Complex” where lots of money is made selling products and having conferences about “open source”.

                                                                                                                                                                    1. 5

                                                                                                                                                                      You’ll hear people who work in the field joke about a “compliance-industrial complex”. I think that started back in the early 2000s, after big companies started permitting use of open source en masse. Salespeople for nascent compliance solutions firms would fly around giving C-level officers heartaches about having to GPL all their software. My personal experience of those products, both for ongoing use and for one-off due diligence, is that they’re way too expensive, painful to integrate, just don’t work that well, and only make cost-benefit sense if you ingest a lot of FUD. Folks who disagree with me strongly on other issues, like new copyleft licenses, agree with me here.

                                                                                                                                                                      That said, I don’t mean to portray what’s going on in the open source branding war as any kind of conspiracy. There are lots of private conversations, private mailing lists, and marketing team meetings that don’t happen in the open. But the major symptoms of the changing of the corporate guard are all right out there to be seen online. That’s why I walked through the list of OSI sponsors, and linked to the posts from AWS and Elastic. It’s an open firefight, not any kind of cloak-and-dagger war.

                                                                                                                                                                      1. 7

                                                                                                                                                                        Agreed. I’m getting increasingly tired by some communities’ (especially Rust’s) aggressive push of corporate-worship-licenses like BSD, MIT (and against even weak copy-left licenses like MPL).

                                                                                                                                                                        1. 17

                                                                                                                                                                          I’m saying this with all the respect in the world, but this comment is so far detached from my perception of license popularity that I wanna know which niche of the tech industry this broad hatred of Rust comes from. To me it seems like one would have to hack exclusively on C/C++/Vala projects hosted on GNU Savannah, Sourcehut or a self-hosted GitLab instance to reach the conclusion that Rust is at the forefront of an anti-copyleft campaign. That to me would make the most sense because then Rust overlaps with the space you’re occupying in the community much more than, say, JavaScript or Python, where (in my perception) the vast majority of OSS packages already do not have a copyleft license.

                                                                                                                                                                          1. 3

                                                                                                                                                                            Try shipping any remotely popular library on crates.io and people heckle you no end until they get to use your work under the license they prefer.

                                                                                                                                                                            Lessons learned: I’ll never ship/relicense stuff under BSD/MIT/Apache ever again.

                                                                                                                                                                            1. 2

                                                                                                                                                                              this broad hatred of Rust comes from

                                                                                                                                                                              Counter culture to the Rust Evangelism Strike Force: Rust evangelists were terribly obnoxious for a while, seems like things calmed down a bit, but the smell is still there.

                                                                                                                                                                              1. 1

                                                                                                                                                                                I think it’s beneath this site to make reactionary nonsense claims on purpose.

                                                                                                                                                                                1. 2

                                                                                                                                                                                  How is criticizing (a subset of) a group for their method of communication “reactionary”?

                                                                                                                                                                                  1. 1

                                                                                                                                                                                    I’m saying soc’s claim about Rust pushing for liberal licensing is nonsense and probably reactionary to the Rust Evangelism Strike Force if @pgeorgi’s explanation is true. My point is that “counter culture” is not an excuse to make bad arguments or wrong claims.

                                                                                                                                                                                    1. 2

                                                                                                                                                                                      OK, that makes a bit more sense.

                                                                                                                                                                                  2. 2

                                                                                                                                                                                    reactionary nonsense claims

                                                                                                                                                                                    like talking about some “broad hatred of Rust” when projects left and right are adopting it? But the R.E.S.F. is really the first thing that comes to my mind when thinking of rust, and the type of advocacy that led to this nickname sparked some notable reactions…

                                                                                                                                                                                    (Not that I mind rust, I prefer to ignore it because it’s just not my cup of tea)

                                                                                                                                                                              2. 7

                                                                                                                                                                                I won’t belabor the point, but I’d suggest considering that some of those project/license decisions (e.g. OpenBSD and ISC) may be about maximizing the freedom (and minimizing the burden) shared directly to other individual developers at a human-to-human level. You may disagree with the ultimate outcome of those decisions in the real world, but it would be a wild misreading of the people behind my example as “corporate worshipping”.

                                                                                                                                                                                As I have said before: “It’s important to remember that GNU is Not Unix, but OpenBSD userland is much more so. There isn’t much reason to protect future forks if you expect that future software should start from first principles instead of extending software until it becomes a monolith that must be protected from its own developers.”

                                                                                                                                                                                Not all software need be released under the same license. Choosing the right license for the right project need not require inconsistency in your beliefs about software freedoms.

                                                                                                                                                                                1. 6

                                                                                                                                                                                  The specific choice of MIT/Apache dual-licensing is so unprincipled and weird that it could only be the result of bending over backwards to accommodate a committee’s list of licensing requirements (it needs to be compatible with the GPL versions 2 and 3, it needs a patent waiver, it needs to fit existing corporate-approved license lists, etc). This is the result of Rust being a success-at-all-costs language in exactly the way that Haskell isn’t. Things like corporate adoption and Windows support are some of those costs.

                                                                                                                                                                                  1. 3

                                                                                                                                                                                    I can’t speak directly to that example, as I don’t write Rust code and am not part of the Rust community, but it would not surprise me if there were different and conflicting agendas driving licensing decisions made by any committee.

                                                                                                                                                                                    I do write code in both Python and Go (languages sharing similar BSD-style licensing permissiveness), and my difficult relationship to the organization behind Go (who is also steward of its future) is not related in any way to how that language has been licensed to me. Those are a separate set of concerns and challenges outside the nature of the language’s license.

                                                                                                                                                                            1. 4

                                                                                                                                                                              This is, to put it as politely as I can, incoherent. I have no idea what point the author is trying to make. I know it has something to do with the various differences in Open Source licenses, and people’s attitude towards them, but I am really struggling to figure out the point of this. I get that the author is very upset about something, but I have no idea what, or why.

                                                                                                                                                                              1. 2

                                                                                                                                                                                The point seems to be:

                                                                                                                                                                                • SAAS companies like AWS work fine with common open source licenses (sometimes with the exception of Affero-style licenses)
                                                                                                                                                                                • Open Source licenses might attach requirements to certain uses of code (e.g. offering modified source code with binaries in case of the GPL, or offering modified source code with service access in case of the AGPL)
                                                                                                                                                                                • There’s a crop of new software licenses that try to make SAAS-reuse of programs less attractive, to protect business models
                                                                                                                                                                                • Those licenses are generally not well received
                                                                                                                                                                                • That poor reception must obviously be a conspiracy by the SAAS folks that managed to subdue the open source community. They want to ensure that they can profit from other people’s work for free. (this reading may be a bit less charitable than possible)