1. 17

This answer shows the “beauty” of mathematical thinking and, at the same time, the “ugliness” of mathematical communication and abstraction.

The author simply explains the concept of transcendental numbers not by hiding behind a definition but by walking the reader through the intuition behind it. I believe that his way of explaining this concept is how most people end up understanding transcendental numbers.

At the same time, mathematicians are very prone to hiding this learning process and presenting their understanding from “first principles” or in an axiomatic way. This leads new learners to rely on these axioms to “understand” a concept instead of the intuition behind them. In this example, other responders use the definition of a polynomial equation to define transcendental numbers. There is nothing wrong with this approach when building up the “mathematics cathedral of knowledge”, but it requires new learners to follow along through multiple years of study to understand a concept that could also be introduced in a more intuitive way.

    1. 3

      Maybe I am missing the point of the post, but I don’t agree with some of the points you raise.

      Human brains are shit at certain tasks, things like finding the strongest correlation with some variables in an n-million times n-million valued matrix. Or heck, even finding the most productive categories to quantify a spreadsheet with a few dozen categorical columns and a few thousand rows.

That might be true of individual brains, but you can put together a lot of people and get to the same result. See how mathematical computations were carried out by human computers before the introduction of electronic computers.

Similarly, I don’t agree with the overall framing of your critique. You can say something like “Human bodies are shit at certain tasks, things like digging a 100ft long 10ft deep trench. Or heck, even walking more than a few miles a day or even running a few hundred feet”. We have excavators and cars for a reason. Similarly, we have computers to do “boring” computation work for us.

      This is not to say that the scientific establishment is doomed or anything, it’s just slow at using new technologies, especially those that shift the onus of what a researcher ought to be doing.

Well, isn’t there the saying that suggests scientific progress happens one funeral at a time? It feels to me that the problem is that researchers tend to ask questions that they can reasonably answer. The questions that they ask are informed by their methods and theoretical preparation. So, if a researcher doesn’t have a good understanding of machine learning models and what they could do, they will probably not think of a research line that will leverage these new methods to move the field forward. Instead, they will probably just apply them to an old problem (which will probably be solved by an old boring method just as well).

I wonder if we are at the intersection of “old” and “new”. While we are at the intersection, we are still applying the “new” models to the “old” problems. Our problem space hasn’t yet opened up to the possibilities that the “new” methods have created (maybe because we haven’t figured them out yet). That makes me wonder whether we need to push ourselves to imagine and tackle the new, unexplored problem spaces that machine learning (or any new technology) has opened for us, instead of trying to solve old problems with the new methods.

      1. 2

That might be true of individual brains, but you can put together a lot of people and get to the same result. See how mathematical computations were carried out by human computers before the introduction of electronic computers.

        A 2-layer neural network can learn the tricks of 200 years of research into signal processing from a bunch of domain-specific training data: https://www.youtube.com/watch?v=HtFZ9uwlscE&feature=youtu.be
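
To make “learns from training data” concrete, here is a toy sketch of a 2-layer network (my own illustration, not the setup from the video): it fits a noisy sine from raw samples alone, with the backpropagation written out by hand.

```python
# Toy illustration: a tiny 2-layer network fits a noisy sine from raw
# samples, with no hand-designed filter in sight.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 2 * np.pi, 200).reshape(-1, 1)
y = np.sin(x) + 0.1 * rng.standard_normal(x.shape)  # noisy "signal"

H = 32  # hidden width
W1 = rng.standard_normal((1, H)) * 0.5; b1 = np.zeros(H)
W2 = rng.standard_normal((H, 1)) * 0.5; b2 = np.zeros(1)

lr = 0.05
for _ in range(5000):
    h = np.tanh(x @ W1 + b1)  # layer 1
    pred = h @ W2 + b2        # layer 2
    err = pred - y
    # plain full-batch gradient descent, gradients written out by hand
    gW2 = h.T @ err / len(x); gb2 = err.mean(0)
    gh = (err @ W2.T) * (1 - h**2)
    gW1 = x.T @ gh / len(x); gb1 = gh.mean(0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

print(np.mean(err**2))  # small MSE: the net recovered the signal
```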

That being said, I don’t think I have a solid argument against this, but my intuition, based on how many applications NNs have found outdoing equations in hard sciences like physics and chemistry, is that we aren’t super good at this, even if it’s 100k of us working on it together.

Similarly, I don’t agree with the overall framing of your critique. You can say something like “Human bodies are shit at certain tasks, things like digging a 100ft long 10ft deep trench. Or heck, even walking more than a few miles a day or even running a few hundred feet”. We have excavators and cars for a reason. Similarly, we have computers to do “boring” computation work for us.

        In hindsight, yeah, I think the framing is a bit off.

      1. 23

I love the attitude: you probably shouldn’t take it apart, but it’s your hardware, so here are instructions on how to do it correctly.

        1. 18

          That’s the sort of attitude that IMO has made Steam basically the least-evil software distribution store. Not a high bar, unfortunately, but it’s something.

          1. 9

            They’re doing some pretty good work with Proton too, they claim the whole Steam library will be playable on the Steam Deck when it releases. Maybe the day I can switch my gaming PC to Linux isn’t too far off.

            1. 7

              Yep, and it coincidentally started happening just around the time that Microsoft was saying that the MS Store would become the only way to install programs on Windows 10. Somehow, Microsoft eventually decided that was a bad move once Valve started putting serious work into Linux compat and helping game developers port their games.

              Though it also means that about half my own game library works pretty darn well on Linux, so, can’t complain too much.

              1. 4

                Obviously it’s in their own self-interest to do it, and they’ve been pushing it so that they don’t have to pay Microsoft to preinstall Windows on their consoles, but it’s still a good thing overall.

                1. 2

Yeah, I believe this is part of the reason Valve has embraced Linux for so long. Basically a bit of insurance against the dominance of Windows. I imagine they were well aware of their extreme dependence on MS playing nice (or whatever). I feel like I’ve read more about this very subject; will see if I can dig up any links or anything…

                2. 6

Apparently they are getting anti-cheat software to work on Linux too (EAC, for example), which I thought I would never see happen in my lifetime.

                  1. 2

I’m really curious about this. The closest any of these kernel-mode anticheats has come to Linux before is EAC, where they had an extremely basic version briefly for the game Rust, and were also working on a version that worked in Wine. Those were cancelled the moment Epic Games bought them though, so I’m unsure if they’ve managed to build limited support for the drivers into Proton, or whether they’ve made a deal with Epic to get that Wine version going again.

                3. 5

                  GoG is [almost] DRM-free, so I try to buy most games there. I wonder how to balance all of the evils against one another to choose the “least-evil”.

                4. 8

All things considered, it’s quite amazing that it just takes 8 screws to open the unit and 3 more to replace the thumbstick. Replacing the internal SSD takes 4 more screws, but they strongly discourage people from changing it because they claim that the one that comes installed is selected for (1) power consumption and (2) minimal interference with the wifi module (but they also upcharge for more storage, so maybe they just don’t want people to buy the cheap version and swap the SSD on the side).

It seems that they put some thought into making the Steam Deck as serviceable as possible given its form factor.

                1. 2

                  The author pronounces it [aɡe̞], like the Italian “aghe”.

Does the author mean aghi? Aghe is not an Italian word.

Now I am confused. Is the pronunciation ah-gee (as you would say aghi in Italian) or ah-geh (as you would pronounce aghe in Italian, if that were a word)?

                  1. 3

                    It seems unlikely to me that the majority of people who encounter this library at its point of use will think to investigate how it’s pronounced. I expect most will assume it’s the English word. Naming things is hard; there are many pitfalls.

                    1. 3

I’m also confused. It links to Google Translate, which translates it to “needles”, but I’ve never heard the word pluralized like that. I’m guessing it comes from FiloSottile’s dialect.

                      1. 1

I think I just got it. The link to Google Translate is there so you can play the pronunciation, not to translate it to English. I guess that’s helpful for everyone who is not Italian, lol.

                      2. 2

                        The latter. I’m sure he used to describe it as pronounced the Japanese way, but perhaps even fewer people understand that :-)

                        1. 1

I also thought it was pronounced like in chicken karaage. However, I now suspect I pronounce that wrong as well, since I say “ah-hey” rather than “ah-geh”.

                          1. 3

                            Heh yeah, for the record you’re pronouncing it wrong - a mora consisting of a g followed by any vowel is always a hard G sound in Japanese.

                            So it’s kah-rah-ah-geh (more or less, in a standardish American accent, although with no aspiration because those Hs are just there to steer you towards the right vowel sound, and with the vowel sounds held for a somewhat shorter period of time than you might default to)

                      1. 8

You didn’t mention it in your post, but in general CO2 sensors should be calibrated or you will be getting readings that are off. https://www.yoctopuce.com/EN/article/how-to-calibrate-a-co2-sensor

                        1. 1

                          yoctopuce seems to have some fine products, but holy hell are they pricey - and the calibration kits (coming from senseair) aren’t even available without contacting sales, so I guess as a private person you won’t normally get them

                          1. 1

It looks like you have 2 options for calibration. You can either move the sensor outside and run the calibration procedure, which assumes that the outside air CO2 concentration is 400 ppm and resets the sensor accordingly. Or you can buy one of those calibration kits. The kits usually involve using nitrogen gas to eliminate all CO2 from the sensor, leading to it calibrating to a “true zero” value instead of a 400 ppm value.

The outside method works in most cases, but it might be a little bit off if the CO2 concentration is higher in your area for whatever reason.
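
As a rough sketch of what those two procedures look like in code (the sensor object and its method names here are hypothetical stand-ins; real sensors such as the SCD30 expose similar commands under different names, so check your datasheet):

```python
OUTDOOR_BASELINE_PPM = 400  # assumed ambient outdoor CO2 concentration

def calibrate_outdoors(sensor):
    # Option 1: put the unit outside, let the reading settle, then tell
    # the sensor to treat the current reading as 400 ppm.
    sensor.wait_until_stable(minutes=20)  # hypothetical helper
    sensor.set_forced_recalibration(reference_ppm=OUTDOOR_BASELINE_PPM)

def calibrate_true_zero(sensor):
    # Option 2: flush the sensing chamber with pure nitrogen so no CO2
    # is present, then zero the sensor against that "true zero".
    input("Flush the chamber with N2, then press Enter...")
    sensor.set_zero_point()  # hypothetical command
```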

                        1. 14

In metric units, 170 °F is about 77 °C ((170 - 32) × 5/9 ≈ 76.7), which is literally sauna temperature. Wow.

                          1. 1

A sauna is much more dangerous. The air in a 150 °C oven is pretty hot, but 100 °C steam coming out of a pot is going to hurt way more. Temperature isn’t the only relevant factor when it comes to cooking…

                            1. 4

                              You make it sound like a sauna is dangerous. It’s not, because you get out when you get too hot.

In Finland there were fewer than 2 sauna deaths per 100,000 inhabitants per year in the 1990s. That was a time when, on average, Finns spent 9 minutes in a sauna twice a week. That works out to one death per 780,000 hours spent in the sauna. And half of those deaths are because people binge drink and then go to the sauna.
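
For anyone who wants to check that figure, the arithmetic works out:

```python
# Back-of-the-envelope check of the "one death per 780,000 sauna-hours" figure.
deaths_per_100k_per_year = 2
hours_per_person_per_year = (9 / 60) * 2 * 52            # 9 min, twice a week = 15.6 h
total_sauna_hours = 100_000 * hours_per_person_per_year  # 1,560,000 hours
print(total_sauna_hours / deaths_per_100k_per_year)      # 780,000.0 hours per death
```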

I don’t have statistics for deaths per hour a child spends in a hot car, but the number of hours cannot be very high, considering reasonable people don’t leave children in a hot car at all; yet there are dozens of deaths every year.

                              1. 2

                                In these comparisons I think the deciding factor is the ability/inability to leave the hostile environment…

                              2. 1

Saunas are typically dry air (although you can sometimes pour water onto hot stones). There are 100 °C saunas which you can sit in for several minutes because heat transfer is so low. But a 100 °C steam room would instantly burn you (and so doesn’t exist).

                                1. 1

                                  Yeah humidity plays an important role as well. Sadly the post doesn’t show the record high with humidity info, but maybe @JeremyMorgan can enlighten us :)

                                  1. 2

                                    It looks like the min [1] humidity was 7.8% from one of the pictures.

[1]: Relative humidity and temperature are inversely related (for a fixed amount of water vapor): as the temperature rises, relative humidity decreases, because more water vapor can be “stored” in the air. Similarly, when the temperature drops, relative humidity increases, leading to dew in the early morning or fog.
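
To put rough numbers on this, here is a small sketch using the Magnus approximation for saturation vapor pressure (the 30 °C / 40% RH starting point is an assumed example, not data from the post):

```python
import math

def saturation_vapor_pressure_hpa(t_celsius):
    # Magnus approximation: warmer air can hold far more water vapor.
    return 6.112 * math.exp(17.62 * t_celsius / (243.12 + t_celsius))

def rh_after_heating(rh_start, t_start, t_end):
    # Hold the absolute amount of water vapor fixed and recompute
    # the relative humidity at the new temperature.
    vapor_pressure = rh_start / 100 * saturation_vapor_pressure_hpa(t_start)
    return 100 * vapor_pressure / saturation_vapor_pressure_hpa(t_end)

# Outside air at 30 C and 40% RH, heated to ~75 C inside the car:
print(rh_after_heating(40, 30, 75))  # ~4% RH, same ballpark as the measured lows
```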

                                    1. 1

                                      @bfielder

Humidity levels (% relative humidity):

Outside

• Min: 13.6
• Max: 89.3

In the car

• Min: 6.8
• Max: 55.3

                                      Seems like some wild fluctuations. It was an unusual weather event for sure.

                                1. 3

                                  This is a nice introduction to the concepts and a good set of examples that explains what is happening.

                                  You can use matrix algebra if you want to generalize it to (1) multiple points at the same time, (2) more dimensions, and/or (3) more general transformations.

                                  The idea is that you can represent the “rescaling” as a matrix and then use matrix multiplication to calculate the transformed point.

For example, to map any value in the interval (a, b) to (0, 1), you can use the following matrix:

T = [1, 0; -a/(b-a), 1/(b-a)] and use the point A = [1; x] as input, so that T·A = [1; (x-a)/(b-a)].

You can then start to “string” together multiple transformations by multiplying the transformation matrices together (for example, the transformation that you discuss is the combination of a translation by -a and a dilation by a factor of 1/(b-a)), or transform multiple points at once by adding columns to the input matrix.

                                  You can also extend this to multiple dimensions (e.g., 2D) by adding rows and columns to the matrix and an additional row to the point. The basic concepts will still work out as in the 1D case.
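
Here is a quick NumPy check of the 1D case (with arbitrary a = 2 and b = 10), composing the translation and dilation and transforming several points at once:

```python
import numpy as np

a, b = 2.0, 10.0

# Affine map (a, b) -> (0, 1) in homogeneous coordinates, acting on [1; x].
T = np.array([[1.0,          0.0        ],
              [-a / (b - a), 1 / (b - a)]])

# "Stringing" transformations: translate by -a, then dilate by 1/(b - a).
translate = np.array([[1.0, 0.0], [-a, 1.0]])
dilate = np.array([[1.0, 0.0], [0.0, 1 / (b - a)]])
assert np.allclose(dilate @ translate, T)

# Multiple points at once: one column per point.
xs = np.array([2.0, 6.0, 10.0])
points = np.vstack([np.ones_like(xs), xs])  # shape (2, 3)
print(T @ points)  # second row: [0.  0.5  1.]
```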

                                  Note: I am not 100% sure of my math. Someone check it. :)

                                  1. 2

                                    You can use matrix algebra if you want to generalize it to (1) multiple points at the same time, (2) more dimensions, and/or (3) more general transformations.

                                    Can you do all three of these at the same time with matrices or do you need to generalise to something with more numbers in it?

                                  1. 15

There are a lot of accusations in this and the subsequently linked posts against this ominous person called Andrew Lee. With all the democratic impetus behind these resignation statements, please audiatur et altera pars (let the other side be heard as well). Where’s the statement from Lee on the topic? What does he think about this? Does he not want to comment (which would amount to accepting the accusations as valid), or is it simply not linked, which I would find a dubious attitude from people who insist on democratic values? Because if you accuse anyone, you should give him the opportunity to explain himself.

Don’t get me wrong. What I read is concerning. But freenode basically is/was the last bastion of IRC. The brand is well-known. The proposed alternative libera.chat will fight an uphill battle against non-IRC services. Dissolving/attacking the freenode brand is thus doing IRC as a whole a disservice and should only be done after very careful consideration, not as a spontaneous act of protest.

                                    1. 13

Where’s the statement from Lee on the topic?

                                      You can dig through IRC logs referenced in the resignation letter linked by pushcx above and see what he has to say to the admins directly, if you assume the logs haven’t been tampered with. My personal assessment is he makes lots and lots of soothing reassuring non-confrontational noises to angry people, and then when the people who actually operate the network ask for actual information he gives them none. When they offer suggestions for how to resolve the situation he ignores them. When they press for explanations of previous actions (such as him asking for particular people to be given admin access) he deflects and tries to make it seem like the decision came from a group of people, not just himself.

                                      So yeah. Smooth, shiny, nicely-lacquered 100% bullshit.

                                      1. 16

I’ve now skimmed through some of the IRC logs. It’s been a long time since I’ve read such heated discussions, full of swear words, insults, accusations, dirty language, and so on. This goes for both sides. It’s like a transcript of children in kindergarten trying to insult each other, and it’s hard to believe that these people are supposed to be adults. This is unworthy of a project that so many FOSS communities rely on. Everyone should go be ashamed in the corner and come back in a few days when they have calmed down.

                                        I’m not going to further comment on the topic. This is not how educated persons settle a dispute.

                                        1. 6

                                          Amen.

                                      2. 11

                                        Lee has issued a statement under his own name now: https://freenode.net/news/freenode-is-foss

                                        1. 10

                                          As a rebuttal to the URL alone, freenode isn’t foss, it’s a for profit company. So before you even click through you are being bullshitted.

                                          1. 14

                                            freenode isn’t foss, it’s a for profit company.

                                            You can be for profit and foss. So this is a non-sequitur.

                                          2. 3

                                            Ah, thanks.

                                          3. 7

                                            Self-replying: Lee has been e-mail-interviewed by The Register, giving some information on how he sees the topic: https://www.theregister.com/2021/05/19/freenode_staff_resigns/

                                            1. 7

                                              “freenode” is both a brand and an irc network run by a team of people. if the team did not want to work with andrew lee, but wanted to continue running the network, their only option was to walk away and establish a new network, and try to make that the “real” freenode in all but the name.

                                              this is not the first brand-vs-actual-substance split in the open source world; i don’t see that they had any other choice after lee tried to assert control over freenode-the-network due to ownership of freenode-the-brand.

                                              1. 6

                                                who insist on democratic values?

                                                Democracy isn’t about hearing both sides. It’s about majority winning.

Actually getting angry over one-sided claims and forming an angry mob is very democratic and has been its tradition since the time of the ancient Greeks.

                                                1. 5

                                                  If and when a representative of Lee’s company (apparently https://imperialfamily.com/) posts something, a member can submit it to the site.

                                                  As far as I know Lee or his company have made no statement whatsoever.

                                                  1. 2

Could this just be the death knell of IRC? A network split is not good, as people will be confused between Freenode and Libera Chat.

Most young people who look for a place to chat probably look at Discord first. For example, the Python Discord server has 220,000 registered users and 50,000 online right now. I don’t believe that the python channel on Freenode has ever gotten close to that.

                                                    1. 16

                                                      Having multiple networks is healthy.

                                                      1. 11

                                                        I strongly believe that IRC is on a long slow decline rather than going to die out due to any one big event. Mostly because there are so many other IRC servers. It’s an ecosystem not a corporation.

                                                        1. 7

                                                          IRC has survived, and will yet survive, a lot of drama.

                                                          1. 3

Well, people were already confused between OFTC and Freenode. The more the merrier.

                                                        1. 38

                                                          I’ve only been following the situation, but, while this is a reasonable act of contrition, I still want to see what UMN reports officially, because there’s clearly something up with their ethics policies if this slipped under the radar.

                                                          “We made a mistake by not finding a way to consult with the community and obtain permission before running this study; we did that because we knew we could not ask the maintainers of Linux for permission”

                                                          That quote is exactly the root of it. You knew better, and you knew this wouldn’t fly. So you did it anyway. This seems pretty ‘reckless’, as per the legal term.

                                                          1. 19

Of course they could ask permission. They could have written a letter to Greg K-H:

“Here are our credentials, here is what we propose to do, here is how to identify commits that should not be merged; we will keep track of them here and remind you at the end of each merge window; we will review progress with you on this date and this date; the project will end on this date.

                                                            Assuming you are generally agreeable, what have we overlooked and what changes should we make in this protocol?”

                                                            Doing that would make the professor and his grad students white-hats.

                                                            Not doing that makes them grey-hats at best, and in practice black hats because they didn’t actually clean up at the end.

                                                            People who have betrayed basic ethics: the professor, his grad students, the IRB that accepted that this wouldn’t affect humans, and the IEEE people who reviewed and accepted the first paper.

                                                            1. 9

I’m not going to speak too much out of place, but academia does have an honesty and integrity problem, in addition to an arrogance one. There is much debate these days within the philosophy of science about how our metric-fuelled obsession can be corrected.

                                                              1. 7

                                                                we knew we could not ask the maintainers of Linux for permission

Their story continues to change. In the paper, they clearly state that they thought their research didn’t involve human subjects and that they therefore didn’t have to obtain consent from their participants. Then the story was that they asked for IRB review and the work was exempt from IRB guidelines, and now they are apologizing for not obtaining consent and admitting that they knew asking for consent would have put their research at risk? Informed consent is one of the cornerstones of modern (as in, since the 60s) ethical research guidelines.

                                                                There are ways around informed consent (and deceptive research practices, see here for an example) but those require strict IRB supervision and the research team needs to show that the benefits to participants and/or the community outweigh the risks. Plus, participants need to be informed about the research project after their participation and they still need to consent to participate in it.

Before anyone says they didn’t know: all federal funding agencies (including NSF, which is funding this work) require all PIs and research personnel (e.g., research assistants) to complete training on IRB guidelines every 3 years, and all students are required to complete a 12-hour course on human subject research (at least at my institution; I imagine that UMN has similar guidelines).

                                                                1. 5

But they are only students. It’s the people who supervised them who are to blame. I hope for a positive outcome for all parties.

                                                                  1. 24

PhD students are functioning adults, akin to employees, not K-12 students (juveniles). Their supervisor and university deserve a share of the blame, but so do they. If they had been employees, their company and manager would deserve a share of the blame, and so would they.

                                                                    1. 16

                                                                      FWIW, the author of the email is the professor running the study and supervising the students who pushed patches out.

                                                                      I too am waiting to see if UMN does anything, either at the department level or higher.

                                                                      1. 3

                                                                        They applied for (and received) grants. They’re scholars, not students.

                                                                    1. 50

                                                                      The paper has this to say (page 9):

                                                                      Regarding potential human research concerns. This experiment studies issues with the patching process instead of individual behaviors, and we do not collect any personal information. We send the emails to the Linux community and seek their feedback. The experiment is not to blame any maintainers but to reveal issues in the process. The IRB of University of Minnesota reviewed the procedures of the experiment and determined that this is not human research. We obtained a formal IRB-exempt letter.

                                                                      [..]

                                                                      Honoring maintainer efforts. The OSS communities are understaffed, and maintainers are mainly volunteers. We respect OSS volunteers and honor their efforts. Unfortunately, this experiment will take certain time of maintainers in reviewing the patches. To minimize the efforts, (1) we make the minor patches as simple as possible (all of the three patches are less than 5 lines of code changes); (2) we find three real minor issues (i.e., missing an error message, a memory leak, and a refcount bug), and our patches will ultimately contribute to fixing them.

I’m not familiar with the generally accepted standards on these kinds of things, but this sounds rather iffy to me. I’m very far removed from academia, but I’ve participated in a few studies over the years, which were always just questionnaires or interviews, and even for those I had to sign a consent waiver. “It’s not human research because we don’t collect personal information” seems a bit strange.

                                                                      Especially since the wording “we will have to report this, AGAIN, to your university” implies that this isn’t the first time this has happened, and that the kernel folks have explicitly objected to being subject to this research before this patch.

                                                                      And trying to pass off these patches as being done in good faith with words like “slander” is an even worse look.

                                                                      1. 78

                                                                        They are experimenting on humans, involving these people in their research without notice or consent. As someone who is familiar with the generally accepted standards on these kinds of things, it’s pretty clear-cut abuse.

                                                                        1. 18

                                                                          I would agree. Consent is absolutely essential but just one of many ethical concerns when doing research. I’ve seen simple usability studies be rejected due to lesser issues.

It’s pretty clear this is abuse: the kernel team and maintainers feel strongly enough to ban the whole institution.

                                                                          1. 10

                                                                            Yeah, agreed. My guess is they misrepresented the research to the IRB.

                                                                            1. 3

                                                                              They are experimenting on humans

                                                                              This project claims to be targeted at the open-source review process, and seems to be as close to human experimentation as pentesting (which, when you do social engineering, also involves interacting with humans, often without their notice or consent) - which I’ve never heard anyone claim is “human experimentation”.

                                                                              1. 19

                                                                                A normal penetration testing gig is not academic research though. You need to separate between the two, and also hold one of them to a higher standard.

                                                                                1. 0

                                                                                  A normal penetration testing gig is not academic research though. You need to separate between the two, and also hold one of them to a higher standard.

                                                                                  This statement is so vague as to be almost meaningless. In what relevant ways is a professional penetration testing contract (or, more relevantly, the associated process) different from this particular research project? Which of the two should be held to a higher standard? Why? What does “held to a higher standard” even mean?

                                                                                  Moreover, that claim doesn’t actually have anything to do with the comment I was replying to, which was claiming that this project was “experimenting on humans”. It doesn’t matter whether or not something is “research” or “industry” for the purposes of whether or not it’s “human experimentation” - either it is, or it isn’t.

                                                                                  1. 18

Resident pentester and ex-academia sysadmin checking in. I totally agree with @Foxboron and their statement is not vague nor meaningless. Generally in a penetration test I am following basic NIST 800-115 guidance for scoping and target selection and then supplement it with contractual expectations for my clients. I can absolutely tell you that the methodologies that are used by academia should be held to a higher standard in pretty much every regard I could possibly come up with. A penetration test does not create a custom methodology attempting to deal with outputting scientific and repeatable data.

Let’s put it in real terms: I am hired to do a security assessment on a very fixed, highly focused set of targets explicitly defined in a contract by my client, on an extremely fixed timeline (often very short… like 2 weeks maximum and a 5-day average). Guess what happens if social engineering is not in my contract? I don’t do it.

                                                                                    1. 1

                                                                                      Resident pentester and ex-academia sysadmin checking in.

                                                                                      Note: this is worded like an appeal to authority, although you probably don’t mean it that way, so I’m not going to act like you are.

                                                                                      I totally agree with @Foxboron and their statement is not vague nor meaningless.

                                                                                      Those are two completely separate things, and neither is implied by the other.

                                                                                      their statement is not vague nor meaningless.

                                                                                      Not true - their statement contained none of the information you just provided, nor any other sort of concrete or actionable information - the statement “hold to a higher standard” is both vague and meaningless by itself…and it was by itself in that comment (or, obviously, there were other words - none of them relevant) - there was no other information.

                                                                                      the methodologies that are used by academia should be held to a higher standard

Now you’re mixing definitions of “higher standard” - GP and I were talking about human experimentation and ethics, while you seem to be discussing rigorousness and reproducibility of experiments (although it’s not clear, because “A penetration test does not create a custom methodology attempting to deal with outputting scientific and repeatable data” is slightly ambiguous).

                                                                                      None of the above is relevant to the question of “was this a human experiment” and the closely-related one “is penetration testing a human experiment”. Evidence suggests “no” given that the term does not appear in that document, nor have I heard of any pentest being reviewed by an ethics review board, nor have I heard any mention of “human experimenting” in the security community (including when gray-hat and black-hat hackers and associated social engineering e.g. Kevin Mitnick are mentioned), nor are other similar, closer-to-human experimentation (e.g. A/B testing, which is far closer to actually experimenting on people) processes considered to be such - up until this specific case.

                                                                                    2. 5

                                                                                      if you’re an employee in an industry, you’re either informed of penetration testing activity, or you’ve at the very least tacitly agreed to it along with many other things that exist in employee handbooks as a condition of your employment.

                                                                                      if a company did this to their employees without any warning, they’d be shitty too, but the possibility that this kind of underhanded behavior in research could taint the results and render the whole exercise unscientific is nonzero.

                                                                                      either way, the goals are different. research seeks to further the verifiability and credibility of information. industry seeks to maximize profit. their priorities are fundamentally different.

                                                                                      1. 1

                                                                                        you’ve at the very least tacitly agreed to it along with many other things that exist in employee handbooks as a condition of your employment

                                                                                        By this logic, you’ve also agreed to everything else in a massive, hundred-page long EULA that you click “I agree” on, as well as consent to be tracked by continuing to use a site that says that in a banner at the bottom, as well as consent to Google/companies using your data for whatever they want and/or selling it to whoever will buy.

                                                                                        …and that’s ignoring whether or not companies that have pentesting done on them actually explicitly include that specific warning in your contract - “implicit” is not good enough, as then anyone can claim that, as a Linux kernel patch reviewer, you’re “implicitly agreeing that you may be exposed to the risk of social engineering for the purpose of getting bad code into the kernel”.

                                                                                        the possibility that this kind of underhanded behavior in research could taint the results and render the whole exercise unscientific

                                                                                        Like others, you’re mixing up the issue of whether the experiment was properly-designed with the issue of whether it was human experimentation. I’m not making any attempt to argue the former (because I know very little about how to do good science aside from “double-blind experiments yes, p-hacking no”), so I don’t know why you’re arguing against it in a reply to me.

                                                                                        either way, the goals are different. research seeks to further the verifiability and credibility of information. industry seeks to maximize profit. their priorities are fundamentally different.

                                                                                        I completely agree that the goals are different - but again, that’s irrelevant for determining whether or not something is “human experimentation”. Doesn’t matter what the motive is, experimenting on humans is experimenting on humans.

                                                                                  2. 18

                                                                                    This project claims to be targeted at the open-source review process, and seems to be as close to human experimentation as pentesting (which, when you do social engineering, also involves interacting with humans, often without their notice or consent) - which I’ve never heard anyone claim is “human experimentation”.

I had a former colleague who once bragged about getting someone fired at his previous job during a pentesting exercise. He basically walked over to a frustrated employee at a bar, offered him a ton of money and a job in return for plugging a USB key into the network. He then reported it to senior management and the employee was fired. While that is an effective demonstration of a vulnerability in their organization, what he did was unethical under many moral frameworks.

                                                                                    1. 2

                                                                                      First, the researchers didn’t engage in any behavior remotely like this.

                                                                                      Second, while indeed an example of pentesting, most pentesting is not like this.

                                                                                      Third, the fact that it was “unethical under many moral frameworks” is irrelevant to what I’m arguing, which is that the study was not “human experimentation”. You can steal money from someone, which is also “unethical under many moral frameworks”, and yet still not be doing “human experimentation”.

                                                                                    2. 3

                                                                                      If there is a pentest contract, then there is consent, because consent is one of the pillars of contract law.

                                                                                      1. 1

                                                                                        That’s not an argument that pentesting is human experimentation in the first place.

                                                                                  3. 42

                                                                                    The statement from the UMinn IRB is in line with what I heard from the IRB at the University of Chicago after they experimented on me, who said:

                                                                                    I asked about their use of any interactions, or use of information about any individuals, and they indicated that they have not and do not use any of the data from such reporting exchanges other than tallying (just reports in aggregate of total right vs. number wrong for any answers received through the public reporting–they said that much of the time there is no response as it is a public reporting system with no expectation of response) as they are not interested in studying responses, they just want to see if their tool works and then also provide feedback that they hope is helpful to developers. We also discussed that they have some future studies planned to specifically study individuals themselves, rather than the factual workings of a tool, that have or will have formal review.

Because they claim they’re studying the tool, it’s OK to secretly experiment on random strangers without disclosure. Somehow I doubt they test new drugs by secretly dosing people and observing their reactions, but UChicago’s IRB was 100% OK with doing so to programmers. I don’t think these IRBs literally consider programmers sub-human, but it would be very inconvenient to accept that experimenting on strangers is inappropriate, so they only want to do so in places they’ve been forced to by historical abuse. I’d guess this will continue for years until some random person is very seriously harmed by being experimented on (loss of job/schooling, pushing someone unstable into self-harm, targeting someone famous outside of programming) and then over the next decade IRBs will start taking it seriously.

                                                                                    One other approach that occurs to me is that the experimenters and IRBs claim they’re not experimenting on their subjects. That’s obviously bullshit because the point of the experiment is to see how the people respond to the treatment, but if we accept the lie it leaves an open question: what is the role played by the unwitting subject? Our responses are tallied, quoted, and otherwise incorporated into the results in the papers. I’m not especially familiar with academic publishing norms, but perhaps this makes us unacknowledged co-authors. So maybe another route to stopping experimentation like this would be things like claiming copyright over the papers, asking journals for the papers to be retracted until we’re credited, or asking the universities to open academic misconduct investigations over the theft of our work. I really don’t have the spare attention for this, but if other subjects wanted to start the ball rolling I’d be happy to sign on.

                                                                                    1. 23

I can kind of see where they’re coming from. If I want to research whether car mechanics can reliably detect some fault, then sending a prepared car to 50 garages is probably okay, or at least a lot less iffy. This kind of (informal) research is actually done fairly commonly by consumer advocacy groups and the like. The difference is that the car mechanics will get paid for their work, whereas the Linux devs and you didn’t.

                                                                                      I’m gonna guess the IRBs probably aren’t too familiar with the dynamics here, although the researchers definitely were and should have known better.

                                                                                      1. 18

                                                                                        Here it’s more like keying someone’s car to see how quick it takes them to get an insurance claim.

                                                                                        1. 4

                                                                                          Am I misreading? I thought the MR was a patch designed to fix a potential problem, and the issue was

                                                                                          1. pushcx thought it wasn’t a good fix (making it a waste of time)
                                                                                          2. they didn’t disclose that it was an auto-generated PR.

                                                                                          Those are legitimate complaints, c.f. https://blog.regehr.org/archives/2037, but from the analogies employed (drugs, dehumanization, car-keying), I have to double-check that I haven’t missed an aspect of the interaction that makes it worse than it seemed to me.

                                                                                          1. 2

                                                                                            We were talking about Linux devs/maintainers too, I commented on that part.

                                                                                            1. 1

                                                                                              Gotcha. I missed that “here” was meant to refer to the Linux case, not the Lobsters case from the thread.

                                                                                        2. 1

                                                                                          Though there they are paying the mechanic.

                                                                                        3. 18

IRB is a regulatory board that is there to make sure that researchers follow the [Common Rule](https://www.hhs.gov/ohrp/regulations-and-policy/regulations/common-rule/index.html).

In general, any work that receives federal funding needs to comply with the federal guidelines for human subject research. All work involving human subjects (usually defined as research activities that involve interaction with humans) needs to be reviewed and approved by the institution’s IRB. These approvals fall within a continuum, from a full IRB review (which involves the researcher going to a committee and explaining their work, and usually includes continued annual reviews) to a declaration of the work being exempt from IRB supervision (usually this happens when the work meets one of the 7 exemptions listed in the federal guidelines). The whole process is a little bit more involved; see for example [all the charts](https://www.hhs.gov/ohrp/regulations-and-policy/decision-charts/index.html) to figure this out.

These rules do not cover research that doesn’t involve humans, such as research on technology tools. I think that there is currently a grey area where a researcher can claim that they are studying a tool and not the people interacting with the tool. It’s a lame excuse that circumvents the spirit of the regulations and is probably unethical from a research standpoint. Data aggregation or data anonymization is usually a requirement for an exempt status, not for a non-human research status.

The response that you received from the IRB is not surprising: they probably shouldn’t have approved the study as non-human research, and now they are just protecting the institution from further harm rather than protecting you as a human subject in the research (which, by the way, is not their goal at this point).

                                                                                          One thing that sticks out to me about your experience is that you weren’t asked to give consent to participate in the research. That usually requires a full IRB review as informed consent is a requirement for (most) human subject research. Exempt research still needs informed consent unless it’s secondary data analysis of existing data (which your specific example doesn’t seem to be).

One way to quickly fix it is to contact the grant officer who oversees the federal program that is funding the research. A nice email can go a long way toward forcing change and hitting the researchers/university where they care the most. State that you were coerced to participate in the research study by simply doing your work (i.e., reviewing a patch submitted to a project that you lead), that you were not given the opportunity to provide prospective consent and received no compensation for your participation, and that the research team/university is refusing to remove your data even after you contacted them, because they claim that the research doesn’t involve human subjects.

                                                                                          1. 7

                                                                                            Thanks for explaining more of the context and norms, I appreciate the introduction. Do you know how to find the grant officer or funding program?

                                                                                            1. 7

                                                                                              It depends on how “stalky” you want to be.

                                                                                              If NSF was the funder, they have a public search here: https://nsf.gov/awardsearch/

                                                                                              Most PIs also add a line about grants received to their CVs. You should be able to match the grant title to the research project.

                                                                                              If they have published a paper from that work, it should probably include an award number.

                                                                                              Once you have the award number, you can search the funder website for it and you should find a page with the funding information that includes the program officer/manager contact information.

                                                                                              1. 3

                                                                                                If they published a paper about it they likely included the grant ID number in the acknowledgements.

                                                                                                1. 1

                                                                                                  You might have more luck reaching out to the sponsored programs office at their university, as opposed to first trying to contact an NSF program officer.

                                                                                              2. 4

How about something like a Computer Science External Review Board? Open source projects could sign up, and include a disclaimer that their project and community ban all research that hasn’t been approved. The approval process could be as simple as a GitHub issue the researcher has to open, and anyone in the community could review it.

                                                                                                It wouldn’t stop the really bad actors, but any IRB would have to explain why they allowed an experiment on subjects that explicitly refused consent.

[Edit] I felt sufficiently motivated, so I made a quick repo for the project. Suggestions welcome.

                                                                                                1. 7

I’m in favor of building our own review boards. It seems like an important step in our profession taking its responsibility seriously.

                                                                                                  The single most important thing I’d say is, be sure to get the scope of the review right. I’ve looked into this before and one of the more important limitations on IRBs is that they aren’t allowed to consider the societal consequences of the research succeeding. They’re only allowed to consider harm to experimental subjects. My best guess is that it’s like that because that’s where activists in the 20th-century peace movement ran out of steam, but it’s a wild guess.

                                                                                                  1. 4

                                                                                                    At least in security, there are a lot of different Hacker Codes of Ethics floating around, which pen testers are generally expected to adhere to… I don’t think any of them cover this specific scenario though.

                                                                                                    1. 2

                                                                                                      Any so-called “hacker code of ethics” in use by any for-profit entity places protection of that entity first and foremost, before any other ethical consideration (including human rights), and would likely not apply in a research scenario.

                                                                                                2. 23

                                                                                                  They are bending the rules for non-human research. One of the exceptions for non-human research is research on organizations, which my IRB defines as “Information gathering about organizations, including information about operations, budgets, etc. from organizational spokespersons or data sources. Does not include identifiable private information about individual members, employees, or staff of the organization.” Within this exception, you can talk with people about how the organization merges patches, but not about how they personally do that (for example). All the questions need to be about the organization, not about the individual as part of the organization.

                                                                                                  On the other hand, research involving human subjects is defined as any research activity that involves an “individual who is or becomes a participant in research, either:

                                                                                                  • As a recipient of a test article (drug, biologic, or device); or
                                                                                                  • As a control.”

                                                                                                  So, this is how I interpret what they did.

                                                                                                  The researchers submitted an IRB application saying that they just downloaded the kernel maintainer mailing lists and analyzed the review process. This doesn’t meet the requirements for IRB supervision because it’s either (1) secondary data analysis using publicly available data or (2) research on organizational practices of the OSS community after all identifiable information is removed.

                                                                                                  Once they started emailing the list with bogus patches (as the maintainers allege), the research involved human subjects, as these people received a test article (in the form of an email) and the researchers interacted with them during the review process. The maintainers processing the patch did not do so to provide information about their organization’s processes; they did so in their own personal capacity (in other words, the researchers didn’t ask them how the OSS community processes a patch; they asked them to process a patch themselves). The participants should have given consent to participate in the research, and the risks of participating should have been disclosed, especially given that missing a security bug and agreeing to merge it could be detrimental to someone’s reputation and future employability (that is, this would qualify as more than minimal risk for participants, requiring a full IRB review of the research design and process) with minimal benefits to them personally or to the organization as a whole (as it seems from the maintainers’ reaction to a new patch submission).

                                                                                                  One way to design this experiment ethically would have been to email the maintainers and invite them to participate in a “lab-based” patch review process where the research team would present them with “good” and “bad” patches and ask them whether they would have accepted them or not. This is after they were informed about the study and exercised their right to informed consent. I really don’t see how emailing random stuff out and seeing how people interact with it (with their full name attached to it and in full view of their peers and employers) can qualify as research with less than minimal risk that doesn’t involve human subjects.

                                                                                                  The other thing that rubs me the wrong way is that they sought (and supposedly received) retroactive IRB approval for this work. That wouldn’t fly with my IRB, as my IRB person would definitely rip me a new one for seeking retroactive IRB approval for work that is already done, data that was already collected, and a paper that is already written and submitted to a conference.

                                                                                                  1. 6

                                                                                                    You make excellent points.

                                                                                                    1. IRB review has to happen before the study is started. For the NIH, the grant application has to include the IRB approval, even before a single experiment is funded, let alone actually done.
                                                                                                    2. I can see the value of doing a test “in the field” so as to get the natural state of the system. In a lab setting where the participants know they are being tested, various things will happen to skew results. The volunteer reviewers might be systematically different from the actual population of reviewers, the volunteers may be much more alert during the experiment and so on.

                                                                                                    The issue with this study is that there was no serious thought given to what the ethical ramifications of it are.

                                                                                                    If the owner of the pen-tested system has not asked to be pen-tested, then this is basically a criminal act. Otherwise all bank robbers could use the “I was just testing the security system” defense.

                                                                                                    1. 8

                                                                                                      The same requirement for prior IRB approval applies to NSF grants (which the authors seem to have received). By what they write in the paper and my interpretation of the circumstances, they self-certified as conducting non-human research at the time of submitting the grant and only asked their IRB for confirmation after they wrote the paper.

                                                                                                      Totally agree with the importance of “field experiment” work and that, sometimes, it is not possible to get prospective consent to participate in the research activities. However, the guidelines are clear on which research activities are exempt from prior consent. The only one that I think is applicable to this case is exemption 3(ii):

                                                                                                      (ii) For the purpose of this provision, benign behavioral interventions are brief in duration, harmless, painless, not physically invasive, not likely to have a significant adverse lasting impact on the subjects, and the investigator has no reason to think the subjects will find the interventions offensive or embarrassing. Provided all such criteria are met, examples of such benign behavioral interventions would include having the subjects play an online game, having them solve puzzles under various noise conditions, or having them decide how to allocate a nominal amount of received cash between themselves and someone else.

                                                                                                      These usually cover “simple” psychology experiments involving mini games or economics games involving money.

                                                                                                      In the case of this kernel patching experiment, it is clear that it doesn’t meet this requirement, as participants have found the intervention offensive or embarrassing, to the point that they are banning the researchers’ institution from pushing patches to the kernel. Also, I am not sure reviewing a patch is a “benign game”, as this is most likely the reviewers’ job. Plus, the patch review could have an adverse lasting impact on the subjects if they are asked to stop reviewing patches because they didn’t catch the security risk (e.g., being deemed incompetent).

                                                                                                      Moreover, there is this follow up stipulation:

                                                                                                      (iii) If the research involves deceiving the subjects regarding the nature or purposes of the research, this exemption is not applicable unless the subject authorizes the deception through a prospective agreement to participate in research in circumstances in which the subject is informed that he or she will be unaware of or misled regarding the nature or purposes of the research.

                                                                                                      As their patch submission process was deceptive in nature, as they outline in the paper, exemption 3(ii) cannot apply to this work unless they notified maintainers that they would be participating in a deceptive research study about kernel patching.

                                                                                                      That leaves the authors to either pursue full IRB review for their work (as a full IRB review can approve a deceptive research project if it deems it appropriate and the risk/benefit balance is in favor of the participants) or to self-certify as non-human-subjects research and fix any problems later. They decided to go with the latter.

                                                                                                  2. 35

                                                                                                    We believe that an effective and immediate action would be to update the code of conduct of OSS, such as adding a term like “by submitting the patch, I agree to not intend to introduce bugs.”

                                                                                                    I copied this from the paper. This is not research; anyone who writes a sentence like this with a straight face is a complete moron and is just mucking about. I hope all of this will be reported to their university.

                                                                                                    1. 18

                                                                                                      It’s not human research because we don’t collect personal information

                                                                                                      I yelled bullshit so loud at this sentence that it woke up the neighbors’ dog.

                                                                                                      1. 2

                                                                                                        Yeah, that came from the “clarifications”, which are garbage top to bottom. They should have apologized, accepted the consequences, and left it at that. Here’s another thing they came up with in that PDF:

                                                                                                        Suggestions to improving the patching process

                                                                                                        In the paper, we provide our suggestions to improve the patching process.

                                                                                                        • OSS projects would be suggested to update the code of conduct, something like “By submitting the patch, I agree to not intend to introduce bugs”

                                                                                                        i.e. people should say they won’t do exactly what we did.

                                                                                                        They acted in bad faith, skirted the IRB through incompetence (let’s assume incompetence and not malice), and then acted surprised.

                                                                                                      2. 14

                                                                                                        Apparently they didn’t ask the IRB about the ethics of the research until the paper was already written: https://www-users.cs.umn.edu/~kjlu/papers/clarifications-hc.pdf

                                                                                                        Throughout the study, we honestly did not think this is human research, so we did not apply for an IRB approval in the beginning. We apologize for the raised concerns. This is an important lesson we learned—Do not trust ourselves on determining human research; always refer to IRB whenever a study might be involving any human subjects in any form. We would like to thank the people who suggested us to talk to IRB after seeing the paper abstract.

                                                                                                        1. 14

                                                                                                          I don’t approve of researchers YOLOing IRB protocols, but I also want this research done. I’m sure many people here are cynical/realistic enough that the results of this study aren’t surprising. “Of course you can get malicious code in the kernel. What sweet summer child thought otherwise?” But the industry as a whole proceeds largely as if that’s not the case (or you could say that most actors have no ability to do anything about the problem). Heighten the contradictions!

                                                                                                          There are some scary things in that thread. It sounds as if some of the malicious patches reached stable, which suggests that the authors mostly failed by not being conservative enough in what they sent. Or for instance:

                                                                                                          Right, my guess is that many maintainers failed in the trap when they saw respectful address @umn.edu together with commit message saying about “new static analyzer tool”.

                                                                                                          1. 17

                                                                                                            I agree, while this is totally unethical, it’s very important to know how good the review processes are. If one curious grad student at one university is trying it, you know every government intelligence department is trying it.

                                                                                                            1. 8

                                                                                                              I entirely agree that we need research on this topic. There’s better ways of doing it though. If there aren’t better ways of doing it, then it’s the researcher’s job to invent them.

                                                                                                            2. 7

                                                                                                              It sounds as if some of the malicious patches reached stable

                                                                                                              Some patches from this university reached stable, but it’s not clear to me that those patches also introduced (intentional) vulnerabilities; the paper explicitly mentions the steps they took to ensure those patches didn’t reach stable (I omitted that part, but it’s just before the part I cited).

                                                                                                              All umn.edu patches are being reverted, but at this point it’s mostly a matter of “we don’t trust these patches and they will need additional review” rather than “they introduced security vulnerabilities”. A number of patches already have replies from maintainers indicating they’re genuine and should not be reverted.

                                                                                                              1. 5

                                                                                                                Yes, whether actual security holes reached stable or not is not completely clear to me (or apparently to maintainers!). I got that impression from the thread, but it’s a little hard to say.

                                                                                                                Since the supposed mechanism for keeping them from reaching stable is conscious effort on the part of the researchers to mitigate them, I think the point may still stand.

                                                                                                                1. 1

                                                                                                                  It’s also hard to figure out what the case is, since there is no clear answer as to what the commits were, and where they are.

                                                                                                              2. 4

                                                                                                                The Linux review process is so slow that it’s really common for downstream folks to grab under-review patches and run with them. It’s therefore incredibly irresponsible to put patches that you know introduce security vulnerabilities into this form. Saying ‘oh, well, we were going to tell people before they were deployed’ is not an excuse and I’d expect it to be a pretty clear-cut violation of the Computer Misuse Act here and equivalent local laws elsewhere. That’s ignoring the fact that they were running experiments on people without their consent.

                                                                                                                I’m pretty appalled that Oakland accepted the paper for publication. I’ve seen papers rejected from there before because they didn’t have appropriate ethics review oversight.

                                                                                                            1. 7

                                                                                                              This is an interesting article. I have a few “big picture” issues with it that I discuss below.

                                                                                                              The argument in the introduction seems somewhat all over the place. The points that the author raises are interesting, but I am not sure if I agree with the conclusion. For example, the author claims that

                                                                                                              Furthermore, the complexity of the statistical apparatus being used leads to the general public being unable to understand scientific research.

                                                                                                              as a critique of “classical statistical” methods. While I agree with the author that scientific and statistical literacy is a problem, I am at a loss as to how statistical learning models, which are even less interpretable than classical models, are a solution to this problem.

                                                                                                              I am also not sure how these three conclusions, which the author highlights for the post, follow from the article:

                                                                                                              1. Simplicity; It can be explained to an average 14-year-old (without obfuscating its shortcomings)
                                                                                                              2. Inferential power; It allows for more insight to be extracted out of data
                                                                                                              3. Safety against data manipulation; It disallows techniques that permit clever uses of “classical statistics” to generate fallacious conclusions

                                                                                                              As far as 1 goes, a 14-year-old in a regular algebra class will be able to carry out a simple linear regression with one variable. That’s because the method is rather simple and it is easy enough to understand the difference between the line of best fit and the observed points. I am not at all sure the same is true for even the simplest statistical learning models.
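
                                                                                                              To make that concrete, here is a minimal sketch of the one-variable case (the data points are made up), using nothing beyond the textbook least-squares formulas:

                                                                                                                  import numpy as np

                                                                                                                  # One-variable least squares, computed from the textbook formulas
                                                                                                                  # a student could follow by hand. The data points are made up.
                                                                                                                  x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
                                                                                                                  y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

                                                                                                                  slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
                                                                                                                  intercept = y.mean() - slope * x.mean()
                                                                                                                  print(f"line of best fit: y = {slope:.2f}x + {intercept:.2f}")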

                                                                                                              Maybe I am not understanding what “inferential power” means here, but my understanding of machine learning is that those models excel at predictions, as in out-of-sample predictions, rather than at inference. I am also fuzzy about the kind of insights that the author is going to discuss.

                                                                                                              Finally, I am not sure how data manipulation (as in data cleaning, for example) is connected to data analysis and how statistical learning “fixes” it. All the ideas that are discussed further in the post seem to be on the data analysis side, so I don’t see the connection between this last insight and the rest of the post.

                                                                                                              As a final comment, most research is done to answer specific research questions. Usually a researcher asks a question and picks the best methods to answer it. The author seems to advocate moving from inferential questions (e.g., what is the difference between a group that received X and a group that didn’t?) towards predictive questions (e.g., what will happen if this group receives X?). I wonder if a better motivation for the post needs to be an epistemological one first rather than a methodological/technical one.

                                                                                                              A few more fine-grained questions:

                                                                                                              • In point iii), isn’t the author just describing jackknifing/leave-one-out? The same idea can be applied to classical methods to calculate non-parametric standard errors/p-values. What am I missing here? (See the sketch after this list.)
                                                                                                              • In point vi), there seems to be a strong assumption that all confounding variables are observed and that the deconfounding technique that the author describes can be applied to the data (by the way, if I understood the procedure that is described, it should be the same idea behind classical multiple regression). The most difficult thing about deconfounding is trying to account for the effects of unobserved variables (e.g., omitted variable bias). I am not sure if machine learning methods are better or worse than classical methods at addressing these issues.
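
                                                                                                              For the jackknife point, a minimal sketch of what I mean, on synthetic data of my own construction: recompute the statistic with each observation left out, then turn the spread of those estimates into a standard error.

                                                                                                                  import numpy as np

                                                                                                                  # Jackknife (leave-one-out) standard error for a regression slope.
                                                                                                                  # The data are synthetic and only meant to illustrate the recipe.
                                                                                                                  rng = np.random.default_rng(0)
                                                                                                                  x = rng.normal(size=100)
                                                                                                                  y = 2.0 * x + rng.normal(size=100)

                                                                                                                  def slope(x, y):
                                                                                                                      return np.polyfit(x, y, 1)[0]  # polyfit returns [slope, intercept]

                                                                                                                  n = len(x)
                                                                                                                  loo = np.array([slope(np.delete(x, i), np.delete(y, i)) for i in range(n)])
                                                                                                                  se = np.sqrt((n - 1) / n * np.sum((loo - loo.mean()) ** 2))
                                                                                                                  print(f"slope = {slope(x, y):.3f} +/- {se:.3f} (jackknife SE)")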
                                                                                                              1. 3

                                                                                                                which are even less interpretable than classical models, are a solution to this problem.

                                                                                                                I agree that ML models are less interpretable. But what I think lends itself to explainability is performing CV and reporting accuracy as a function of that (easy to get, relatively speaking) and making the error function explicit, i.e., informing people of what you’re trying to optimize and comparing your model to a “base” assumption or previous models on that metric.

                                                                                                                You don’t need to understand how the model operates to get the above, but in statistics, the underlying model (assumptions about its error distribution) is actually relevant to interpreting e.g. the p-values.
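
                                                                                                                Roughly what I have in mind, as a sketch (the data, models, and metric are placeholders, not from the article): make the error function explicit, then report how a baseline and a more complex model do on held-out folds.

                                                                                                                    from sklearn.datasets import make_regression
                                                                                                                    from sklearn.ensemble import GradientBoostingRegressor
                                                                                                                    from sklearn.linear_model import LinearRegression
                                                                                                                    from sklearn.model_selection import cross_val_score

                                                                                                                    # Compare a "base" model and a more complex one on the same
                                                                                                                    # explicit error metric, estimated by 5-fold cross-validation.
                                                                                                                    X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

                                                                                                                    models = [("linear baseline", LinearRegression()),
                                                                                                                              ("gradient boosting", GradientBoostingRegressor(random_state=0))]
                                                                                                                    for name, model in models:
                                                                                                                        scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_absolute_error")
                                                                                                                        print(f"{name}: MAE = {-scores.mean():.2f} (5-fold CV)")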

                                                                                                                All the ideas that are discussed further in the post seem to be on the data analysis side, so I don’t see the connection between this last insight and the rest of the post.

                                                                                                                I focus on how data analysis can be done under this paradigm because a lot of the strongest arguments for using outdated statistical models seem to focus on their power of “explaining” the data while making predictions at the same time. So I’m trying to show why that also works just as well, if not better, with more complex models.

                                                                                                                I agree it’s in part a different point.

                                                                                                                I wonder if a better motivation for the post needs to be an epistemological one first rather than a methodological/technical one.

                                                                                                                In part the post came as part of a long series of posts about why a prediction-based epistemology is the only way to actually make sense of the world, and why e.g. an “equation-first” epistemology is just a subset of that which happens to work decently in some domains (e.g. medieval physics).

                                                                                                                But I guess the article specifically was aimed mostly at people who already agree that a predictive epistemology makes more sense for pursuing knowledge about the world, to actually lay out how it could be done better than it is now.

                                                                                                                In part my fault for not tackling this assumption, but at that point I’d have had a literal book instead of a way-too-long article.

                                                                                                                In point iii), isn’t the author just describing jackknifing/leave-one-out? The same idea can be applied to classical methods to calculate non-parametric standard errors/p-values. What am I missing here?

                                                                                                                You aren’t missing anything. Classical methods would work just fine under this paradigm and I assume for many problems a linear regression would still end up being a sufficiently complex model to capture all relevant connections.

                                                                                                                In point vi), there seems to be a strong assumption that all confounding variables are observed and that the deconfounding technique that the author describes can be applied to the data (by the way, if I understood the procedure that is described, it should be the same idea behind classical multiple regression). The most difficult thing about deconfounding is trying to account for the effects of unobserved variables (e.g., omitted variable bias). I am not sure if machine learning methods are better or worse than classical methods at addressing these issues.

                                                                                                                I’d argue that accounting for the effect of unobserved variables is essentially impossible without making ridiculous assumptions about the external world (see psychology and researcher degrees of freedom). I agree that ML models don’t add anything new here.
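
                                                                                                                A toy simulation of that point (entirely made up, not from the article): an unobserved confounder u drives both x and y, so a model that only sees x recovers the wrong effect no matter how flexible it is.

                                                                                                                    import numpy as np

                                                                                                                    # Omitted-variable bias in miniature. The true effect of x on y
                                                                                                                    # is 1.0, but u confounds the relationship and is not observed.
                                                                                                                    rng = np.random.default_rng(1)
                                                                                                                    u = rng.normal(size=5000)                      # unobserved confounder
                                                                                                                    x = u + rng.normal(size=5000)                  # "treatment", partly driven by u
                                                                                                                    y = 1.0 * x + 2.0 * u + rng.normal(size=5000)  # outcome

                                                                                                                    naive = np.polyfit(x, y, 1)[0]                 # y ~ x, with u omitted
                                                                                                                    X = np.column_stack([x, u, np.ones_like(x)])   # only possible if u were observed
                                                                                                                    adjusted = np.linalg.lstsq(X, y, rcond=None)[0][0]
                                                                                                                    print(f"naive slope:    {naive:.2f}  (true effect is 1.0)")
                                                                                                                    print(f"adjusted slope: {adjusted:.2f}")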

                                                                                                              1. 4

                                                                                                                Racism is a real problem that is socially and culturally constructed (see the race-as-social-construct argument). Isn’t it naive to think that an algorithm will help “solve” or reduce the impacts of racism on society? The machine itself might not be programmed to be racist, but the people reading its output are bound to interpret it through their own biases, thus leading to racist ideas being spread and reinforced. Another subtle thing to keep in mind is that most people see ML/stats models as “objective”, thus leading people to absolve themselves of the guilt of being racist and instead just say “but that’s what the machine told me” (see the controversies around The Bell Curve book).

                                                                                                                In short, race is a social issue that is to be solved through social means and we won’t be able to code our way out of it.

                                                                                                                1. 14

                                                                                                                  It’s also just the fact that people who do any kind of image processing algorithm work will inevitably end up using their own face to test these sorts of things. I certainly have cameras pointed at me, not random strangers, while working on video-streaming-related image processing code, and I would use myself as a test subject if I wrote face recognition algorithms, or automated soap dispensers, or if I just used somebody else’s image processing library.

                                                                                                                  If most people working on these kinds of technologies are white people, even in a society with no humans with racial bias and no pre-existing systemic racism, the default outcome is that the products are racially biased. We have to be conscious about this stuff and actively work against bias (be it racial, gender or otherwise) in the technology we make.

                                                                                                                  It’s a hard problem. It’s so easy for me to work on image processing algorithms with my own face as a test subject, but I’ll likely end up with an algorithm which works really well on white guys but which might not work as well for other demographics. I could do blackface, but… no. I could do user testing with a diverse group of users, but that’s expensive, time-consuming, and difficult; users’ faces from user testing certainly won’t be as integrated into the development process as my own face is.

                                                                                                                  Don’t get me wrong, society has a racism problem. But even if it didn’t, we would still be making racially biased technology. There are solutions here, but they’re difficult and require active and conscious work to implement. Because the default state of technology is to be biased in favor of people who are similar to its creators.

                                                                                                                  1. 1

                                                                                                                    We absolutely can’t code our way out of it, but I think most people who caution about the encoding of biases into predictive models approach it more from the belief that we ought not to code ourselves even further into it. Although objects in this technical domain merely reflect the biases of their creators, they still represent an intensification of those biases both because of their perceived impartiality and because they’re increasingly empowered to automate high-stakes decisions that at least would have had the chance to be flagged by a conscientious human operator in a manual workflow.

                                                                                                                  1. 1

                                                                                                                    Isn’t this a case of Adobe having to enforce the copyright in order to keep the copyright on something? Maybe they need to enforce this on an ancient version in order to keep it on updates. Like, they need their “root” copyright, otherwise the whole sandcastle would come down.

                                                                                                                    27 years ago might be prehistory for computer programs, but it was only 1994. I still can’t upload or share a song or a movie from 1994, so why would software be different?

                                                                                                                    1. 17

                                                                                                                      You’re thinking of trademark law. There is no requirement to enforce copyright.

                                                                                                                      1. 2

                                                                                                                        I wanted to add an edit after I posted the comment. Thank you for clarifying it.

                                                                                                                      2. 5

                                                                                                                        I’m pretty sure the application is still covered by copyright, which is essentially forever for corporate code.

                                                                                                                        The article mentions that Adobe is subcontracting out the DMCA enforcement to a separate company, and they probably have a list saying something like “All Adobe executables online”. They don’t care if it’s even usable nowadays.

                                                                                                                        1. 2

                                                                                                                          The article mentions that Adobe is subcontracting out the DMCA enforcement to a separate company, and they probably have a list saying something like “All Adobe executables online”. They don’t care if it’s even usable nowadays.

                                                                                                                          I wonder if this is going to be a point of embarrassment for the subcontractor, that they went after something so ancient that it is irrelevant, or a point of pride they use to highlight their diligence?

                                                                                                                      1. 1

                                                                                                                        I really liked the article. It gives some good intuition on why you don’t need that many digits.

                                                                                                                        Wouldn’t another argument be that the significant digits in a calculation should match (or at least be close to) the least precise measurement in the model? If that’s the case, how many significant digits does our measurement of the distance to Voyager 1 have? I doubt that it is close to 17 digits, so there is really no need for more than 16 decimal places of pi. By the way, the blog post gives the distance to 3 significant digits, but I suspect that they have better measurements for it.
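
                                                                                                                        As a rough back-of-the-envelope check (the figures below are my assumptions, not from the post), compare the error introduced by truncating pi with the uncertainty in the distance itself:

                                                                                                                            from math import pi

                                                                                                                            # Assumed figures: Voyager 1 at roughly 2.30e13 m, quoted to three
                                                                                                                            # significant digits, so the distance is uncertain by about 5e10 m.
                                                                                                                            distance_m = 2.30e13
                                                                                                                            distance_uncertainty_m = 5e10

                                                                                                                            pi_10_digits = 3.141592653  # pi truncated to 10 significant digits

                                                                                                                            error_from_pi = 2 * distance_m * abs(pi - pi_10_digits)
                                                                                                                            error_from_distance = 2 * pi * distance_uncertainty_m
                                                                                                                            print(f"error from truncating pi:   {error_from_pi:.3e} m")
                                                                                                                            print(f"error from the measurement: {error_from_distance:.3e} m")

                                                                                                                        Even with pi cut to 10 significant digits, the truncation error is orders of magnitude smaller than the uncertainty from a distance known to only 3 significant digits.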

                                                                                                                        1. 2

                                                                                                                          The best I’ve seen is Mountain Duck, but unfortunately, while it is FLOSS, there is no Linux port yet.

                                                                                                                          1. 2

                                                                                                                            Looks rather similar to rclone.

                                                                                                                            1. 1

                                                                                                                              Hmm, maybe. rclone looks more like sshfs? It might be better than sshfs, but what Mountain Duck can do is sync the local (partial) copy of a directory to the remote in the background / resume after connection loss etc., like Dropbox would.

                                                                                                                              1. 1

                                                                                                                                I used Mountain Duck up to the version 4 beta and then switched to rclone. The two are pretty comparable, minus the UI for rclone. They both do partial/on-demand syncing of files and mount most services.

                                                                                                                                I had fewer sync errors with rclone than I did when I was using MD. I mainly sync to Google Drive, Box, and Dropbox. I lost some important work with MD when it failed silently on a bad connection. I haven’t experienced the same problems with rclone, but I have also just been at home most of the time since I set it up.

                                                                                                                                1. 1

                                                                                                                                  They both do partial/on-demand syncing of files and mount most services.

                                                                                                                                  Interesting… I can’t find anything about this in the docs. What does your setup look like?

                                                                                                                                  1. 1

                                                                                                                                    This is what I use to mount a Google Drive remote. Take a look at the rclone mount page, specifically at the VFS file caching section.

                                                                                                                                    # Create a throwaway directory for the local VFS cache.
                                                                                                                                    $(eval tmp_dir := $(shell mktemp -d -t rclone-XXXXXXXXXX))
                                                                                                                                    # Mount the remote in the background; --vfs-cache-mode full keeps a
                                                                                                                                    # local on-demand copy of opened files, so reads and writes survive
                                                                                                                                    # a flaky connection.
                                                                                                                                    rclone cmount --daemon \
                                                                                                                                             --allow-other \
                                                                                                                                             --dir-cache-time 48h \
                                                                                                                                             --vfs-read-chunk-size 32M \
                                                                                                                                             --vfs-read-chunk-size-limit 2G \
                                                                                                                                             --buffer-size 512M  \
                                                                                                                                             --vfs-cache-mode full \
                                                                                                                                             --daemon-timeout 5m \
                                                                                                                                             --cache-dir $(tmp_dir) \
                                                                                                                                             mount_name mount_folder
                                                                                                                                    
                                                                                                                          1. 12

                                                                                                                            Yes, maybe? Most of the examples come from the bioinformatics domain. If I look at remote sensing or GIS, where C++ and Python reign, or at epidemiology/public health/clinical trials, where C++/R/SAS reign, I am not so sure I will see anything in Rust coming for a long time. It also depends on research affinity: when I was looking at the uses of OCaml in scientific research, I found a researcher working on climate models. That only tells me that it is possible to do it, not that it is a new trend or the norm. It is nice to see Rust percolating into scientific environments, but that is the only thing the article shows.

                                                                                                                            1. 10

                                                                                                                              I don’t think the article purports to show ‘all scientists everywhere are turning to Rust’. The headline isn’t even misleading. Yes, scientists (not all) are turning to Rust, and the why is explained inside.

                                                                                                                              1. 6

                                                                                                                                In science all the coding is done by graduate students and postdocs. It’s a great opportunity to try out the new shiny. No one is going to challenge you technically.

                                                                                                                                1. 5

                                                                                                                                  Then the grad student graduates, and the new one is stuck trying to learn new_shiny_language that no one in the lab knows, while under pressure to keep the research output going, while trying to find a bug that is holding up an R&R, and while working on their own research.

                                                                                                                                  There is a reason why some scientific code is still written in Fortran. It is too costly to do a rewrite in some more modern language, there are no incentives for a student to spend time rewriting old code because that is not publishable, and the people who wrote the code have long since left the research lab, so no one really knows what is going on in it.

                                                                                                                                  1. 2

                                                                                                                                    Our experiences differ, not on the graduate/PhD/postdoc part, but I have been constrained to use specific technologies in certain research projects. During your PhD or postdoc you may have the freedom to choose your tools, but it is not certain. Anecdata and personal bias, of course.

                                                                                                                                    I am happy to see people using Rust in research, and fresh projects like WhiteBoxTools in GIS/Remote Sensing.

                                                                                                                                1. 20

                                                                                                                                  IMHO, the only reason why Zoom is interested in end-to-end encryption is to be HIPAA-compliant so that hospitals can use the platform for remote visits. I don’t see a reason why Zoom would be worried about whether someone at their company or from other agencies could listen in on my calls. I actually believe that they have incentives not to encrypt my calls, in case they are served a warrant and need to provide a recording of the calls.

                                                                                                                                  Incidentally, my university adopted Zoom as its platform of choice for the medical school and hospital at the end of September, which coincides more or less with the release of their end-to-end encryption.

                                                                                                                                  In short, I see the white paper as evidence of compliance with some government regulation rather than a real effort to secure user communication.

                                                                                                                                  1. 6

                                                                                                                                    Yep. If US companies offer encryption, I wouldn’t expect any kind of security guarantee out of it – they are either already NSLed or will be.

                                                                                                                                    1. 6

                                                                                                                                      I don’t think an NSL is relevant here.

                                                                                                                                      Most of the time, the companies are not compelled to undermine our civil liberties; they’re complicit. But even then, an NSL only allows surreptitious requests for subscriber data. The scope is fairly limited. If the FBI/CIA/NSA trifecta want more than that out of a company that has principles, they need to explore other options (subpoena, etc.) because the NSL doesn’t encompass those requests.

                                                                                                                                      But really, most companies are run by would-be crony capitalists that don’t give a damn about your privacy, and therefore cannot be trusted.

                                                                                                                                  1. 7

                                                                                                                                    Maybe I am not understanding the solution, but how will a reputation score help when I get served a subpoena for the chat logs of a room that I host on my server? As a server operator, I will either have to comply with the subpoena and provide a copy of the cleartext logs or face the consequences of not providing them.

                                                                                                                                    I can see how a reputation system could help administrators be more selective of the content/rooms/users they host, but it doesn’t really help in addressing the request made in the international statement that inspired the post.

                                                                                                                                    In other words, as soon as someone starts operating a public-facing Matrix server, they become responsible for the content that others put up on it. The question then is whether Matrix should provide a backdoor setting for administrators to access the logs, in case they do not want to take the risk of going up against the government if they get asked for the logs (see what happened to the Lavabit guy, for example).

                                                                                                                                    1. 12

                                                                                                                                      So, anything which lets a server admin cough up clear text logs is obviously a backdoor. Any general purpose backdoor is obviously flawed. Therefore we have to offer an alternative: in this instance it’s something that can help investigation and prevention instead: letting folks self identify bad stuff. The authorities’ role then becomes one of infiltrating and preventing abuse at the time rather than busting privacy in retrospect. Frankly, the legal processes have no choice but evolve to reflect the reality of the technology rather than trying to turn back the tide and stuff E2EE back in its bottle.

                                                                                                                                      1. 3

                                                                                                                                        I hope you’re right that the legal processes will evolve, but I have to admit it looks like this latest political push for surveillance transcends national and partisan boundaries… I’m upset at the prospect of living in the world it would create, if successful.

                                                                                                                                        Thanks for offering a way forward. Good luck.

                                                                                                                                    1. 6

                                                                                                                                      chatroom which is limited to 50 messages in total, so use the chat wisely

                                                                                                                                      That’s a nice touch :-)

                                                                                                                                      The victim being random when there’s >1 crewmate in the room feels a bit weird, but does allow for quicker kills I expect.

                                                                                                                                      I’m wondering if there’s a text-based way to implement mechanics like sabotage and vision radius… and I’m imagining an Among Us MUD.

                                                                                                                                      1. 2

                                                                                                                                        I’m wondering if there’s a text-based way to implement mechanics like sabotage and vision radius… and I’m imagining an Among Us MUD.

                                                                                                                                        Something like TomeNET. But I guess we can impose rules ourselves and play Mafia/Werewolf on TomeNET too.

                                                                                                                                        1. 1

                                                                                                                                          Is this a parallel to the timed discussion in Among Us? The readme doesn’t say anything about having a time frame during which the discussion happens.

                                                                                                                                          I agree that might be a better implementation for a text-based game, as typing is not the same as talking on Discord. Does anyone have an idea of how many messages are sent in the original game when people play in chat mode?

                                                                                                                                          1. 1

                                                                                                                                            From observation, typically four sessions of discussion with about 3 messages per surviving player on average, so a full ten-player game can run to roughly a hundred messages, well over that 50-message cap. Ten players max; the game is over when just two, four, or six players remain (depending on the number of impostors).

                                                                                                                                        1. 6

                                                                                                                                          Thank you for the write-up!

                                                                                                                                          What is your take on using cgit over something more complex like gitea? Last time I looked into it, one of the things that made cgit interesting was the option to serve a static website instead of hosting a gitea instance with the overhead that comes with it. Do you see any other benefits of cgit that maybe I haven’t considered?

                                                                                                                                          1. 6

                                                                                                                                            Can’t speak for the OP, but one argument for cgit is that you can point it at any bare git repo you like, whereas gitea is much more “managed” - there’s a database with all the repos it knows about and metadata and stuff. My usual workflow for “private-ish” git repos is approximately ssh my-shell-host "cd /home/git && git init --bare repo.git", so cgit fits quite nicely into that

                                                                                                                                            If anyone’s interested in the equivalent steps to get a barebones cgit running on nixos with nginx/fastcgi, I have a (hacky but reasonably well-commented) nixos module at https://gist.github.com/telent/b3cddb5b69d1206cb130bbff56b4d5e0 which you are welcome to.

                                                                                                                                            1. 2

                                                                                                                                              There are also less “managed” options in the static world, if I understand your intended meaning correctly. For example, stagit produces a static sequence of html pages, although it’s somewhat more geared towards smaller projects.

                                                                                                                                            2. 4

                                                                                                                                              What is your take on using cgit over something more complex like gitea?

                                                                                                                                              Well, as you imply, simplicity is one argument.

                                                                                                                                              And cgit isn’t static, that would be stagit. cgit can generate miscellaneous diffs on the fly, which would be horrible to implement statically.

                                                                                                                                              1. 3

                                                                                                                                                What is your take on using cgit over something more complex like gitea?

                                                                                                                                                cgit is lesser in all ways: hardware requirements, the amount of management, complexity; and that is what I like about it. For a personal instance, such as mine, I have no need for users, organizations, activity feeds, web notifications, and the variety of other features provided by gitea.