1.

    I’m confused by this post.

    A github fork is a convenient UI over git’s branches. You create your own personal copy of the repository, and the github platform sets things up so that, through their nice UI, the maintainers can merge your branch/changes even if you don’t have write access to the project. They also have a nice UI for commentary, reviews, etc.

    Nothing says you have to use this. You can do bare git stuff, ask for write permissions, push your changes etc. etc.

    This post is complaining that github adds value (for most users) by giving them an easy-to-use UI over git.

    1.

      The article is mostly about how github has overloaded the term ‘fork’, giving it the same meaning as a traditional personal repo copy. Forking a project, as the article mentions, is taking it and giving it new life as something else, without necessarily contributing changes back to the original repo (the article gives the example of the ffmpeg -> libav fork).

      Github is taking well-established terminology and giving it new meaning in their proprietary platform, which causes confusion.

      1.

        Which is possibly an overload of “forking processes”, which in turn is possibly an overload of the term “fork” meaning a thing with multiple prongs. “Cloning” a repository relates to “a clone” of something, and so on. Language is malleable.

        Github forks allow you to neatly merge back to what you forked from. Again, you don’t need to use this. I think this is just a bad argument.

        1.

          The article is mostly about how github has overloaded the term ‘fork’, giving it the same meaning as a traditional personal repo copy.

          It’s somewhat difficult to take this complaint seriously considering the article then proceeds to do the exact same thing by using the term “branch” to mean “clone of a git repo” instead of the already existing meaning of “branch within a git repo”.

          Either pedantry over terminology is a big deal or it isn’t (my take is that it really isn’t), but the author needs to pick a lane.

          1.

            Your application of branch is too specific and tied to the idea of a canonical upstream repository. Cloning is the process of creating a local branch which mirrors a remote branch. The end result is still a branch.

            1.

              Your application of branch is too specific and tied to the idea of a canonical upstream repository.

              Not at all, it’s merely tied to the semantics of git, the DVCS under discussion here.

              Cloning is the process of creating a local branch which mirrors a remote branch.

              Cloning creates a local copy of a remote repository and (typically, unless you do some things to restrict it) all branches within it. Creating a local branch which mirrors a remote branch is not a clone operation; it’s a git fetch plus a git branch --track.

              It’s incorrect to describe a fetch as a clone. It’s incorrect to describe a repository as a branch. These are all distinct concepts in git, with different semantics, regardless of whether or not your development model involves a canonical upstream or not.

              1.

                I’m definitely simplifying things a lot, and I appreciate your clarifications. But my point is that you end up with your own “master” branch, which is discrete from the other branch. The repo is just a container for that branch, no matter where you have it pushed. The key distinction between your repo and the other repo is the branch you’re doing your work on. Thus, I think branch is a valid term to use here.

        2.

          I think it’s easier to understand the implicit criticism by imagining what Github would have looked like if it weren’t based around forking a project to contribute to it. Currently, the data used to create a pull request (in addition to the reference you want to contribute changes to) is the name of your personal fork and of the branch you want to send.

          Now if forks were not pervasive, how would people be able to contribute to projects on github? Well, either they would have to indicate the URL of a full git repository somewhere (and a branch name), or they would submit a series of patches to github (by sending an email, uploading git format-patch files, etc.).

          Those two models are not much harder to implement than the current forking model, but they would immediately make github a partially federated platform, making it easy to contribute to a github-hosted project without yourself being on github or hosting your personal working copy on github.

          You can see how that is not much less convenient for users, and how it makes it much easier for users to migrate to another platform.

          Personally I’m not at all surprised that github is not interested in supporting these less-centralized workflows, but I am a bit disappointed that Gitlab has not yet invested much effort in supporting them. Unlike github, there are many instances of Gitlab (the free version) around, and merge requests across instances would be directly useful to me today. In fact, there are some instances that I avoid using specifically because they restrict account creation (or creation of new projects for untrusted users), which kills collaboration in a fork-centric model. (See gitlab issues #4013, #260, patch by email #40830).

          1.

            Now if forks were not pervasive, how would people be able to contribute to projects on github? Well, either they would have to indicate the URL of a full git repository somewhere (and a branch name), or they would submit a series of patches to github (by sending an email, uploading git format-patch files, etc.).

            But “Github forks” being pervasive does absolutely nothing to prevent you from continuing to do just this?

            Unless the complaint here is “if I email a patch to a maintainer, they complain that it’s a PITA to deal with and ask me to just submit a PR”, in which case…

            You can see how that is not much less convenient for users

            I’ve worked in both models, I strongly disagree.

            Federation is great. Emailing patches around sucks a ton.

            1.

              “matt” matt@lobste.rs writes:

              Emailing patches around sucks a ton.

              Having a fast tagging system like notmuch helps a lot, and doesn’t take much time to set up.

              1.

                Emailing patches around sucks a ton.

                I agree. However, after thinking about it a bit, @gasche’s comment made me wonder about a world where github supplied a “git-format-patch-pr” – similar to emailing patches, but which instead uploaded patches to github from a local branch into some ephemeral PR-specific location. All this instead of having to create a fork and open a PR.

                1.

                  The Phabricator workflow is pretty similar to what you’re describing: you can use arc to upload patches from a local branch. Or just paste diff output into a web form without even committing anything :D

                  1.

                    I like this workflow so much, but I couldn’t convince my co-workers of its qualities, because they (understandably) get uneasy about the fact that patches don’t have a clear “target” – and they can mean different things depending on what they are applied to. (If anyone has a suggestion for a solution for this, please contact me.)

                    (So instead they keep using Bitbucket and adopt all kinds of terrible practises for the only reason that doing the right thing is made highly inconvenient by bitbucket.)

                  2.

                    Emailing patches around sucks a ton.

                    I agree. However, after thinking about it a bit, @gasche’s comment made me wonder about a world where github supplied a “git-format-patch-pr” – similar to emailing patches, but which instead uploaded patches to github from a local branch into some ephemeral PR-specific location.

                    I wonder if it would be considered rude to simply attach a patch to the relevant github issue? Or link to a patch stored on a paste service like ix.io? You can add to github issues over email, so with this method you could sort of submit patches over email, bypassing the whole PR interface.

                    1.  

                      Worth noting that, while it doesn’t stop you from needing to make a fork, you can retrieve a .diff or .patch from any comparison: https://github.com/microsoft/VFSForGit/compare/features/linuxprototype...github:linux-gvfs-provider.patch
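As a rough illustration of the URL pattern in that example, here is a small Python helper that builds such a compare URL. The function name compare_url is hypothetical, and it assumes the owner, repository, and branch names are already URL-safe (no escaping is done); the head side may be written as owner:branch to point at a branch in another fork, as in the example URL.

```python
def compare_url(owner, repo, base, head, suffix="patch"):
    """Build a GitHub compare URL that serves the comparison as .patch or .diff.

    Assumption: every component is already URL-safe (nothing is escaped here).
    `head` may be "branch" or "other-owner:branch" for a branch in another fork.
    """
    if suffix not in ("patch", "diff"):
        raise ValueError("suffix must be 'patch' or 'diff'")
    return f"https://github.com/{owner}/{repo}/compare/{base}...{head}.{suffix}"


url = compare_url("microsoft", "VFSForGit",
                  "features/linuxprototype", "github:linux-gvfs-provider")
```

The resulting file can be fetched with any HTTP client (for example urllib.request.urlopen from the standard library) and applied locally, with git am for the .patch form or git apply for the .diff form.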

                2.

                  A github fork is a convenient UI over git’s branches

                  Not in my experience. It usually consists of:

                  • Patch my clone
                  • Try to push
                  • Get permission denied, since I forgot that I don’t “own” the remote
                  • Go to the GitHub page
                  • “Fork” the repo
                  • Edit .git/config, or try to remember what the git remote commands are, to change origin to my “fork” (I usually keep the previous remote as upstream, since my “fork” will inevitably fall behind and need regular pulling and pushing)
                  • Try to push again
                  • Go back to GitHub and use the “pull request” UI
                  • Add a comment to the pull request that links it back to the “issue” that caused me to make the change
                  • Try to keep track of both threads of conversation, until the patch gets merged
                  • Have a small existential crisis as I debate whether to delete this “fork”, to cut down the hundreds of repos I have to wade through; or whether I should be a good Netizen and preserve the URLs.

                  Phew!

                  1.

                    Afaik you don’t have to change the upstream; you can just say git push <alternate upstream>. I might be wrong about that, though; I might be confusing it with git pull. Also, afaik deleting the ‘fork’ is standard practice; you aren’t really contributing to linkrot that way, because you’re not erasing any content, and no one is linking to such a fork.

                    To be clear, I still think the github pull request ui is awful, but ^^^ is my understanding of how it works.

                    1.  

                      Indeed, git push takes either a URL or a remote name as its first positional argument. Sometimes if I need to make a quick push to a remote I don’t use commonly, I’ll just detach HEAD, make commit(s), then git push git@wherever:whatever.git HEAD:remote-branch.

                1.

                  I’ve never understood this – why would I want to configure my editor based on your project? I want my editor to use my config, not yours.

                  1.

                    I want my editor to fit a project’s style to reduce friction collaborating with upstream / teammates.

                    1.

                      So I don’t have to tell every new employee/contributor to use the correct line ending for the project, or to add a newline at the end, or to use the correct indentation size. EditorConfig takes care of those kind of common setting that you should really always be using for the project you’re working on.

                      Some people have proposed adding stuff like spell checking, but those are the kind of personal settings that should not be in EditorConfig (like auto indent, automatically adding a closing paren, etc.). I don’t see why there would ever be a reason not to use a project’s standard indentation size and such, though.
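To make the mechanism concrete: an EditorConfig-aware editor looks up the sections whose glob matches the file being opened and applies their properties, with later sections overriding earlier ones. Below is a simplified Python sketch under stated assumptions: the .editorconfig content is a made-up example, and matching uses fnmatch on the file's basename, which only approximates the EditorConfig glob rules (no brace expansion, no patterns containing a slash, no walking up parent directories to the root file).

```python
import configparser
import fnmatch
import posixpath

# A made-up .editorconfig, for illustration only.
EDITORCONFIG = """\
root = true

[*]
end_of_line = lf
insert_final_newline = true

[*.py]
indent_style = space
indent_size = 4

[Makefile]
indent_style = tab
"""


def load_sections(text):
    """Return (glob, properties) pairs in file order."""
    parser = configparser.ConfigParser(interpolation=None)
    # Top-level keys such as "root = true" sit before any [section] header,
    # which configparser rejects, so wrap them in a placeholder section.
    parser.read_string("[__preamble__]\n" + text)
    return [(name, dict(parser[name]))
            for name in parser.sections() if name != "__preamble__"]


def settings_for(path, sections):
    """Merge the properties of every section whose glob matches the file name.

    Simplification: fnmatch on the basename only approximates EditorConfig globs.
    """
    name = posixpath.basename(path)
    merged = {}
    for pattern, props in sections:
        if fnmatch.fnmatchcase(name, pattern):
            merged.update(props)  # later sections override earlier ones
    return merged


SECTIONS = load_sections(EDITORCONFIG)
```

With this config, src/app.py gets LF line endings, a final newline, and 4-space indentation, while a Makefile gets tabs, without each contributor touching their own editor settings.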

                      1.

                        Why should I use a project’s indentation in MY EDITOR? That’s absurd. If EditorConfig is for such things, it’s misnamed. It should be named FormatterConfig instead. I edit the code using my config, and format the code using the project’s config just before the commit.

                        1.

                          I don’t think this is how most people work, and it’s quite non-trivial to do correctly: you can’t just do a search/replace when the indent size and max line length are different. You’ll either need to spend some time re-formatting the code, or your code will look strange.

                          If whatever you’re doing works for you: great, keep on doing it. But simply setting the indentation size to the project’s is what works for the vast majority of people, and that is no more or less “absurd” than any other personal preference.

                          Besides, you can do whatever you want with an EditorConfig file; you don’t need to do anything with it, or you can just read the file and echo the settings if that’s what you prefer.
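The point about search/replace can be made concrete with a small Python sketch (the helper names and the 64-column limit are hypothetical, chosen only so the toy input overflows). Rescaling just the leading indentation is easy, but a line that fit within the length limit at 2-space indentation can overflow at 4-space indentation, so it still has to be re-wrapped by hand or by a formatter.

```python
def reindent(source, old, new):
    """Rescale leading-space indentation from old-width levels to new-width levels.

    Only leading spaces are touched; text inside the line (string literals,
    aligned comments) is left alone, unlike a blind search/replace.
    """
    out = []
    for line in source.splitlines():
        body = line.lstrip(" ")
        levels, extra = divmod(len(line) - len(body), old)
        out.append(" " * (levels * new + extra) + body)
    return "\n".join(out)


def overlong_lines(source, max_len):
    """1-based numbers of the lines longer than max_len characters."""
    return [n for n, line in enumerate(source.splitlines(), start=1)
            if len(line) > max_len]


MAX_LEN = 64  # hypothetical project limit, chosen so the toy example overflows

before = (
    "def f(x):\n"
    "  if x:\n"
    "    return compute_something_long(argument_one, argument_two)"
)
after = reindent(before, old=2, new=4)

print(overlong_lines(before, MAX_LEN))  # []
print(overlong_lines(after, MAX_LEN))   # [3]
```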

                          1.

                            In a sane language, for example Rust, you just type “cargo fmt” before the commit. I spend zero time manually re-formatting the code, and my code looks great.

                            1.

                              How is this even relevant here? I’m guessing EditorConfig came out before rustfmt itself. For example, I have to use tabs for a project I contribute to, and EditorConfig takes care of that, so I don’t have to remember to switch my editor settings each time I edit a file in that one project (I don’t use tabs otherwise). I could use clang-format later, but EditorConfig is low-hanging fruit.

                              1.

                                Thanks for this comment, I think it summarizes the use case for this tool perfectly.

                              2.

                                Guess I’m not sane then ¯\_(ツ)_/¯

                                1.

                                  Is having a cross-platform autoformatter that works well necessary and sufficient for a language to be sane, or merely necessary? Are there any languages which you currently classify as insane which would become sane if only they had a cross-platform autoformatter that works well?

                                  1.

                                    We even have CI warn if running cargo fmt would change the layout.

                                    I guess we could use a git hook, but we are running tests and clippy in CI anyway.

                          1.

                            The way the current machine learning revolution was explained to me was simply that “We stopped trying to manually extract features, and instead made the models work on raw data. It turns out with these statistical methods we can find a lot of useful features on dimensions we would never even think of encoding into our manual feature extraction.”

                            This article basically follows from that description. Very interesting! It really highlights how arbitrary human classification processes can be – further emphasized by the amount of re-labeling that happens constantly in biology.

                            1.

                              Yep! I think that’s a really concise and useful summary. A downside is you lose access to explicit/interpretable feature representations, but you gain potentially much better performance while removing lots of manual labor.

                              Tangentially related is this idea.

                            1.

                              What has happened in the .net ecosystem, and the industry at large, is a gradual shift from component-based Visual Basic-style gui applications to web applications. In later versions of .net Framework, the main selling points are no longer the gui builder tools, but the asp.net framework. With .net Core, this is taken an additional step further: there are no longer any gui builder tools; it’s all about the web.

                              I’m guessing it’s also because implementing a cross-platform GUI layer is non-trivial to do well, and it takes ongoing effort to keep up as new operating systems and their associated APIs are released.

                              1.

                                I think the same thing could be said about web browser-based GUIs. If we spent half the effort we spend uglifying, babelifying, and shimming web GUIs for cross-browser compatibility on making a native cross-platform GUI toolkit instead, we’d get a long, long way.

                              1.

                                I recently had an opportunity to do a limited, small-ish implementation in PHP at my current job. I took it, having heard a bit about how PHP has improved over the years. I did everything I could to adopt current best practices, and spent hours upon hours digging out the new ways to do things.

                                It’s better than 10 years ago, sure. But that’s not a very high bar. In the end, I was disappointed by the lack of progress and while it was a valuable experience, I would not do anything new in PHP now.

                                1.

                                  I have never really invested in learning org mode. I can’t really imagine planning my life on a laptop/desktop running emacs. What happens if I remember something that I need to add to my TODO list when I’m out for a walk or at a store? Does org mode actually solve this problem in some way that I’m not familiar with or does it just fail to work for people who aren’t permanently attached to emacs?

                                  1.

                                    I use and really like Orgzly on Android: http://www.orgzly.com/

                                    (I think there are some iOS apps also.)

                                    There are multiple ways you could use Orgzly. In my case I sync all my .org files[*] to my phone for browsing/searching but on mobile I add new entries to an “inbox.org” file and then go through that file every few days in emacs. This works well for the “oh I just remembered/realised this thing” case, and organizing/editing in emacs is easier than on a phone.

                                    I don’t use Dropbox so instead Orgzly uses a directory on the phone and SyncThing syncs that directory with my laptop. This has been pretty reliable for me (although I think it helps that I only add things in inbox.org on my phone and then empty this file on my laptop, minimum potential for sync conflicts.)

                                    [*] I have a to-do list but I also keep a lot of “personal wiki” type notes and ideas in Org.

                                    1.

                                      I don’t use Dropbox so instead Orgzly uses a directory on the phone and SyncThing syncs that directory with my laptop. This has been pretty reliable for me (although I think it helps that I only add things in inbox.org on my phone and then empty this file on my laptop, minimum potential for sync conflicts.)

                                      Orgzly and SyncThing is precisely my setup too, and it has worked unbelievably well. I have also figured out why, and it’s not because of the one-directionality of inbox.org – it’s because at any given point in time, I am either in a location where I cannot add things on my computer, or in a location where my phone can sync its additions over wifi. So the only case where I get conflicts is when I accidentally have the wifi turned off on my phone and it accumulates notes without syncing them.

                                      1.

                                        That’s good to know. I sync a lot of data with syncthing so it’s not enabled unless my phone is on wifi and also charging, so not as clear cut for me. But good to know it would work smoothly if I changed that.

                                    2.

                                      The article is posted on the blog of a mobile app (“beorg”) that syncs orgmode files via Dropbox, iCloud, etc., and helps with the mobile editing affair.

                                    1.

                                      If I’m not mistaken, .NET Core performance lagged behind Windows when it first came out. How is .NET looking on non-Windows platforms these days? Are people using it outside of Microsoft land? (Azure, MS SQL Server, Windows, etc.)

                                      1.

                                        AWS Lambda has .NET support. I found a blog post that compares .NET Core 1 and 2, Java, Go, Python, and Node.js. The big takeaways are:

                                        • .NET Core 2 significantly out-performs Core 1
                                        • .NET Core 2 was up to 3 times faster than Go
                                        • Go was similar to Java
                                        1.

                                          .Net Core is much faster than .Net Framework, on or off Windows, in my experience. People are using it outside of Microsoft land (AWS, PGSQL, Linux, etc.). In what numbers, I couldn’t tell you, but anecdotally I know people who do, so at least SOME are.

                                          1.

                                            I work at a .NET shop which is slowly moving away from Microsoft land. We started rolling out production customers on Linux servers a few months ago, and so far we have seen pretty insane improvements in performance. Average response times are halved, and the 99th and 100th percentiles look even better. (Obviously measured on the same physical machine with the same version of .NET Core etc.)

                                          1.

                                            Animated spinners are possible in CSS as well. No need to chuck a gif in there!

                                            1.

                                              Yeah, but the title says “5 Tasks You Didn’t Know Could be Done…” and animated spinners in CSS are something I believe to be common. The 5 in the article are, at least for me, some gems.

                                              1.

                                                Good point. Probably even worth it, since it would be like 1 KB of CSS instead of a 50 KB gif.

                                              1.

                                                I have long had trouble understanding how the blockchain gets the very many magical properties ascribed to it. Every time I’ve asked, I have gotten a lecture in hashes and proof of work. I know the fundamentals of blockchains, what I don’t understand is how the fundamentals lead to these amazing emergent properties.

                                                This article sorta kinda makes me think I might not be missing anything at all – the people talking about it may have been full of shit.

                                                1.

                                                  I have been wrong before but to me it seems like a mass psychological phenomenon. That many people and that much money cannot be wrong! So they add more people and money.

                                                  In the best case, some companies use the label “block chain” to market some established cryptographic techniques that are not block chain at all…

                                                  1.

                                                    yep. You might think “blockchain” meant something like “append-only ledger with a consensus mechanism”, but it turns out in practice to literally just mean “whatever I’m trying to sell you today”.

                                                    I was talking about this a few months ago with a well-meaning non-techie, who suggested that Uber - the money-burning minicab firm with an app - was an example of a “decentralised system.” More than that - it was a model for how blockchain could succeed.

                                                    I think they’d never thought about the concept of distributed systems of any sort ever before in their lives.

                                                    “It’s like blockchain, because anyone can sign up to be an Uber driver!”
                                                    “Uh … anyone can sign up to be a minicab driver.”

                                                    or the very concept of “open source” only being possible with “blockchain”.

                                                    1.

                                                      The weird thing is that I know intelligent, technical people that advocate for this. If asked for specifics, some variant of “we still have to figure out the specifics” is used.

                                                      Well, chances are that you never will…

                                                    2.

                                                      The hype cycle became self-fulfilling. I got a look at the internal roadmap for one of the pieces of legacy software at the big enterprise I work at: crusty, old, barely touched 90s technology that’s critical for parts management and ordering.

                                                      2020 plans? Traceability of parts on the blockchain.

                                                      1.

                                                        Traceability, correct me if I’m wrong, was one of the actual things a distributed append-only ledger was good at. The way I see it, it’s a good decision with regards to what tech to use, at least until someone puts the wrong data in.
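For readers wondering what an “append-only ledger” buys in the traceability case, here is a minimal Python sketch of a hash-chained log (the class, function, and field names are hypothetical). Each entry's hash covers the previous entry's hash, so editing an old record without recomputing the later hashes is detectable. It only illustrates tamper evidence: there is no consensus mechanism, a rewrite that recomputes every hash is caught only if someone else holds a copy of the latest hash, and nothing here says whether the data was correct when it went in.

```python
import hashlib
import json

GENESIS = "0" * 64  # stand-in "previous hash" for the first entry


def link_hash(prev_hash, record):
    """SHA-256 over the record plus the previous entry's hash."""
    payload = json.dumps({"prev": prev_hash, "record": record}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class HashChainLog:
    """Append-only log where each entry commits to everything before it."""

    def __init__(self):
        self.entries = []  # list of (record, hash) pairs

    def append(self, record):
        prev = self.entries[-1][1] if self.entries else GENESIS
        digest = link_hash(prev, record)
        self.entries.append((record, digest))
        return digest

    def verify(self):
        """Recompute every link; an edited record no longer matches its stored hash."""
        prev = GENESIS
        for record, digest in self.entries:
            if link_hash(prev, record) != digest:
                return False
            prev = digest
        return True


log = HashChainLog()
log.append({"part": "A-100", "event": "received"})
log.append({"part": "A-100", "event": "installed"})
print(log.verify())  # True

# Quietly rewrite history without recomputing the hashes.
log.entries[0] = ({"part": "A-100", "event": "scrapped"}, log.entries[0][1])
print(log.verify())  # False
```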

                                                    3.

                                                      As well as that non-tech explanation talk, I have the longer, more techy version I gave to an infosec group. (My mission to get across to them: “please don’t get into blockchains”)

                                                      1.

                                                        You’re sending the wrong message. Instead, tell them to come up with something useful, pitch a blockchain version, build the useful thing first with the money, open source it, and then build the “better” version with blockchain. We’ll steadily get more useful stuff out of blockchain investments.

                                                      2.

                                                        I’d like to refer to this article about Bitcoin from 2011, before all the mass hysteria began: https://paulbohm.com/articles/bitcoins-value-is-decentralization/

                                                        To elaborate: Bitcoin isn’t just a currency but an elegant universal solution to the Byzantine Generals’ Problem, one of the core problems of reaching consensus in Distributed Systems. Until recently it was thought to not be practically solvable at all, much less on a global scale. Irrespective of its currency aspects, many experts believe Bitcoin is brilliant in that it technically made possible what was previously thought impossible.

                                                        /edit quote

                                                        1.

                                                          Herd behavior. It’s usually irrational except for the scheming people fueling and benefiting from it.

                                                          Blockchain looks like herd behavior. Similarly, most of it has a tiny group of people that will get rich if enough buy in. That’s similar to how companies like Goldman create, pop, and profit from bubbles in the market.

                                                          1.

                                                            irrational exuberance meets unjustifi-ed/able faith in technology.

                                                            if you bought into a blockchain, you want to hype it up because that’s how you get paid. If you didn’t buy into it, well, you got bored of trying to reason with people a long time ago.

                                                            It’s probably the most interesting social phenomenon of recent years.

                                                            1.

                                                              It sounds basically like a pyramid scheme when you put it like that…

                                                            2.

                                                              The flip side of this is that some companies are actually trying to look for the blockchain-based “killer app”, if such an app ever exists. I did develop a few blockchain-based proofs of concept, which didn’t go anywhere, but there wasn’t any attempt to trick anyone. It’s just about experimenting with a new technology and seeing what can be done with it.

                                                            1.

                                                              What you would do is, instead of having that data being diverted to third-party servers that you have no control over, you would either set up your own server or pay for a service by a trusted third party to store that data yourself.

                                                              Peak silicon valley capitalism: dying because the doctors couldn’t access very important info about you because the server with that info was turned off because you didn’t pay for the hosting.

                                                              1.

                                                                I think peak Silicon Valley capitalism would be a free medical record host that profits off the data.

                                                                1.

                                                                  Or one that you pay but which sells the data anyway (23andMe).

                                                                2.

                                                                  Heh. Do me a favor, and do a quick search of software for your average doctor’s office or hospital, and let me know which one is the best.

                                                                  1.

                                                                    I’m currently on my third stint in health care.

                                                                    The stuff in a typical doctor’s office is not great, but I’d still take it over the average blockchain solution-in-search-of-a-problem any day of the week. The fundamental properties of a blockchain are the opposite of what you want for medical data. Blockchains have everything public and immutable by default and design. Medical data is private by law and must support corrections and errata. In fact, properly handling medical data often requires that you implement a time machine and be able to change history, then replay the new timeline forward.

                                                                    Here’s an example: suppose there’s some ongoing treatment that requires documentation before claims on it can be paid, and the documentation doesn’t come in until after the first 4 claims. The first 4 claims would have been rejected, and now you have to rewind time, then replay those 4 claims and pay them.

                                                                    Or say there’s a plan with a deductible: the first $500 of costs in the year are the patient’s responsibility, then the plan pays all claims after that. But a claim for something that happened early in the year doesn’t come in until later, after you think the deductible has been met. On many plans – including some of the US government-backed ones – you now have to start over, rewind time to the start of the year, and replay all the claims in chronological order, processing things according to what the deductible situation would have been if the claims had arrived in that order, and pull refunds from doctors you weren’t supposed to pay, order refunds to the patient from doctors who should have been paid by you, and reconcile the whole thing until the correct entities have paid the correct bills.

                                                                    An append-only structure is fundamentally terrible at this unless you build a whole bunch of specialized stuff on top of it to treat later entries as addending, modifying or replacing earlier ones. And since at that point you’ve gone and built a mutable history structure on top of your immutable blockchain, why didn’t you just build the mutable history software in the first place and skip the blockchain? You’re not using it for any of the unique things it does.
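The "specialized stuff on top" usually amounts to folding the whole log into a current view. A minimal sketch, with entry kinds (`create`, `addend`, `modify`, `replace`) invented for illustration. Note that the view changes while the original wrong entry stays in the log forever, which is exactly the problem with scrubbing described below.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    seq: int      # position in the append-only log
    kind: str     # "create", "addend", "modify" or "replace" (hypothetical kinds)
    target: int   # seq of the record this entry applies to
    data: dict


def current_view(log: list[Entry]) -> dict[int, dict]:
    """Fold the whole log, oldest first, into each record's current state."""
    records: dict[int, dict] = {}
    for e in sorted(log, key=lambda e: e.seq):
        if e.kind == "create":
            records[e.target] = {**e.data, "addenda": []}
        elif e.kind == "addend":
            records[e.target]["addenda"].append(e.data)
        elif e.kind == "modify":
            records[e.target].update(e.data)
        elif e.kind == "replace":
            records[e.target] = {**e.data, "addenda": []}
        else:
            raise ValueError(f"unknown entry kind: {e.kind!r}")
    return records


log = [
    Entry(1, "create", 1, {"diagnosis": "X"}),
    Entry(2, "addend", 1, {"note": "follow-up scheduled"}),
    Entry(3, "modify", 1, {"diagnosis": "Y"}),
]
print(current_view(log))
# {1: {'diagnosis': 'Y', 'addenda': [{'note': 'follow-up scheduled'}]}}
print(log[0].data)  # the original entry is still in the log
# {'diagnosis': 'X'}
```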

                                                                    And that’s just the technical/bureaucratic part of the problem. The social side of the problem is even worse. For example: sometimes it is incredibly important that a patient be able to scrub data out of their medical history, because that data is wrong and will influence or even prejudice doctors who see the patient in the future. Doctors who just ignore obvious symptoms and write down in the notes “it’s all in their head, refer to a psychiatrist” are depressingly common, and every future doctor will see those notes. When it turns out that doctor was wrong and there was a real problem, you do not want to have to fight with the next doctor who says “well, it’s here in your file that this was found to be psychosomatic”. You have to get that fixed, and it’s already hard enough to do without people introducing uncorrectable-by-design medical records (and no, merely putting a big “that doctor was wrong” addendum in the medical blockchain is not a real solution to this).

                                                                    Compared to how much worse it could get with blockchain, the crappy hairballs of only-run-on-Windows-XP (or worse) software in a typical doctor’s office are downright pleasant.

                                                                    1. 5

                                                                      this is the sort of thing I heard from Americans who work in health care when they reviewed the article ahead of time, yeah.

                                                                      The big problem they flagged was data silos - lots of patient data trapped in systems that don’t talk to each other, and the ridiculous difficulty and expense of extracting your health record from your doctor (though passing your stuff to another doctor is apparently fine). You can see the blockchain pitch in there - “control your own data!” … not that it can offer a solution in practice.

                                                                      1. 8

                                                                        though passing your stuff to another doctor is apparently fine

                                                                        It absolutely is not, at least technically, unless both doctors happen to use the same EMR, in which case it’s merely painful; or, if you’re extremely lucky, the same instance of the same EMR (for instance, half the health care in eastern Massachusetts uses Mass General’s EMR), in which case the experience is basically reasonable. Otherwise, you end up with some of the most absurd bullshit imaginable, that makes mailing paper charts seem reasonable in comparison; the best I’ve heard is a mailed CD containing proprietary viewing software in order to send imaging.

                                                                        Interestingly, while “patients should own their own data” is a nice pitch, it’s actually somewhat problematic in practice. Health care providers may need to share information about a patient that the patient would object to or should be kept unaware of (for instance, if a patient has been violent towards providers in the past, that information absolutely must be conveyed to any future providers that see them); and, like all professionals, health care providers use a lot of jargon in order to communicate clearly and precisely, which tends to make the chart incomprehensible to laypeople.

                                                                        1. 3

                                                                          In the US, HIPAA provides a right to your medical records, similar (but not identical) to what a European would be familiar with from the GDPR. The gist of it is that you can make a request to any medical provider who’s treated you, and they have 30 days from the time of the request to provide you with a copy of your records. There are some exceptions (the most common exception is therapists’ notes), but not many.

                                                                          I would guess that a lot of people probably don’t know they have this right, and probably a lot of medical providers aren’t forthcoming about making sure patients really understand their rights (they have to provide a notice of their privacy-related policies in writing, but a written notice in legalese is not the same as genuine understanding). A bigger problem is just that most people aren’t really able to look at medical records in their “standard” form and understand what they’re seeing.

                                                                          And like the other commenter points out, interoperability between medical providers is not great. HIPAA allows medical providers to share information for treatment purposes, though, and the rules produce results that sometimes seem odd to modern tech people (for example, in the US the medical industry relies heavily on fax for sharing documents, because it’s often both the technically and legally simplest way to do so).

                                                                        2. 3

                                                                          Maybe I’m missing something, but the examples you give are related to health insurance, not medical records per se – those are two different concerns that are related, but the latter can exist without the former. Medical records are immutable if they store facts, even wrong diagnoses – after all, how do you figure out that some diagnosis is wrong? By someone else claiming otherwise and providing supporting evidence. Further, medical records are not a single blob of information; they are more like tiny databases, for which we can have various ACLs for various pieces of information – IBM did quite a lot of work in that direction, IIRC. Nevertheless, blockchain is not the right tool, at least not for this domain.
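A rough sketch of what "tiny databases with ACLs" could mean at the field level. The roles, field names, and policy are hypothetical (this is not modeled on IBM's work or any real EMR); the safety flag mirrors the violent-patient example mentioned upthread. Fields not listed in the policy are denied by default.

```python
from typing import Any

# Hypothetical field-level policy: which roles may read each part of the record.
ACL: dict[str, set[str]] = {
    "diagnoses": {"patient", "provider"},
    "medications": {"patient", "provider", "pharmacist"},
    "safety_flag": {"provider"},  # e.g. a history of violence toward staff
}


def visible_fields(record: dict[str, Any], role: str) -> dict[str, Any]:
    """Return only the fields `role` may read; fields missing from the ACL are denied."""
    return {name: value for name, value in record.items() if role in ACL.get(name, set())}


record = {
    "diagnoses": ["hypertension"],
    "medications": ["lisinopril"],
    "safety_flag": "history of violence toward staff",
}
print(visible_fields(record, "pharmacist"))  # {'medications': ['lisinopril']}
print(sorted(visible_fields(record, "patient")))  # ['diagnoses', 'medications']
```

A real system would also need per-field write permissions and an access log; this only shows the read-side filtering.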

                                                                          1. 5

                                                                            Claims are medical records just like everything else.

                                                                            1. 1

                                                                              But that depends on the definition of what a medical record is, no? In socialist countries with universal healthcare, there is no such thing as a claim that should be reimbursed or a plan with a deductible. However, what is universal across the board is the state of body and mind, that is, all diagnoses and prescribed medications.

                                                                              1. 6

                                                                                From this comment by @ubernostrum further up the chain:

                                                                                The social side of the problem is even worse. For example: sometimes it is incredibly important that a patient be able to scrub data out of their medical history, because that data is wrong and will influence or even prejudice doctors who see the patient in the future.

                                                                                This applies even without the baroque details of the US health insurance system. And even in countries with universal coverage, you still need to look out for fraud, fraudulent prescription of drugs, etc. The money comes from somewhere and it shouldn’t be wasted.

                                                                                1. 3

                                                                                  Here in Finland “universal” claims for things like medical pensions (whatever it’s called, disability retirement) are routinely denied. It’s tough, because people do try to abuse the shit out of it, but sometimes proper claims get denied. The processes for countering these claims are long and costly.

                                                                                  We also have systems within the same public health-care district that don’t talk to each other. The private franchises have handled that better, by asking for permission to share data, because it gives a better customer experience.

                                                                                  This is fortunately changing, but the data is now within a single point of failure, also duplicated in part for every relevant franchise.

                                                                                  Getting your data into the unified system incurs a cost. I don’t know if you can opt out of it, but you probably don’t want to: the cost is not high (I think insurance covers it, transfer-of-wealth style), and it’s more convenient to check the records online than papers in a binder somewhere.

                                                                                  1. 3

                                                                                    That is, for me, the key point. I have had a close relative get the wrong treatment for years because a doctor hastily put in an incorrect diagnosis and everyone after that just assumed it was correct.

                                                                                    Why did it take so long to have it edited out of her records? Because one symptom of that diagnosis is denying it. Once that diagnosis is in your records, whatever you say, the next doctor will just put in a note saying, “patient does not think she is suffering from X”.

                                                                                    So as far as I’m concerned, mutability of medical records is absolutely crucial. (Of course with a detailed log of operations visible only on court order or something.)
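One way to get "mutable, but with a detailed log of operations" is to keep the current record editable and write every change to a separate, restricted log. A minimal sketch; the `court_order` flag is a hypothetical stand-in for whatever legal gate would apply, and in Python the leading underscore is only a convention, so real enforcement would have to live in the storage and access-control layer.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class MedicalRecord:
    """Mutable current record plus a separate, access-restricted change log."""
    diagnoses: set[str] = field(default_factory=set)
    _audit: list[tuple[datetime, str, str]] = field(default_factory=list)

    def retract(self, diagnosis: str, reason: str) -> None:
        # Future doctors no longer see the wrong diagnosis...
        self.diagnoses.discard(diagnosis)
        # ...but the fact that it was removed, and why, is kept.
        self._audit.append((datetime.now(timezone.utc), diagnosis, reason))

    def audit_log(self, court_order: bool) -> list[tuple[datetime, str, str]]:
        # Hypothetical stand-in for "visible only on court order".
        if not court_order:
            raise PermissionError("the audit log requires a court order")
        return list(self._audit)


record = MedicalRecord({"X"})
record.retract("X", "misdiagnosis; symptoms had a physical cause")
print(record.diagnoses)  # set()
print(len(record.audit_log(court_order=True)))  # 1
```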

                                                                            2. 1

                                                                              Blockchains are indeed append-only logs, albeit ones constructed in an interesting way.

                                                                              And yet within a blockchain-based system state changes are made over time (Bitcoin balances change, CryptoKitties get new owners) by parsing the data contained within those logs.

                                                                              In a medical system this means that records are indeed mutable/scrubbable. Want to fix a record? Post an update to the system’s blockchain. The record is the result of parsing the logs, so this updates the record. If you want a scrubbable log that’s also doable, although it does affect trust in the system in ways that take more thinking through than just “but GDPR!!!”.
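The trust problem with a scrubbable log comes from how these logs are chained. Here is a deliberately simplified hash chain (no blocks, consensus, or proof-of-work; not Bitcoin's actual format): each entry's hash covers the previous hash, so rewriting an early entry invalidates every later link, and anyone holding the old hashes can see that history was changed.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, payload: dict) -> str:
    # Each hash covers the previous hash, so every entry commits to all earlier ones.
    data = prev_hash + json.dumps(payload, sort_keys=True)
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def build_chain(payloads: list[dict]) -> list[dict]:
    chain, prev = [], GENESIS
    for p in payloads:
        h = entry_hash(prev, p)
        chain.append({"payload": p, "prev": prev, "hash": h})
        prev = h
    return chain


def verify(chain: list[dict]) -> bool:
    prev = GENESIS
    for e in chain:
        if e["prev"] != prev or entry_hash(prev, e["payload"]) != e["hash"]:
            return False
        prev = e["hash"]
    return True


chain = build_chain([{"diagnosis": "X"}, {"correction": "X was wrong, diagnosis is Y"}])
print(verify(chain))  # True

# "Scrub" the wrong diagnosis by rewriting the first entry's payload in place.
scrubbed = [dict(chain[0], payload={"redacted": True}), chain[1]]
print(verify(scrubbed))  # False
```

Scrubbing therefore means recomputing the chain from the edited entry onward and getting everyone who relies on it to accept the new hashes, which is the "takes more thinking through" part.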

                                                                              All that said, like the OP I’m very wary of “control your data” pitches of all kinds. Don’t get me started on data-ownership UBI. ;-)

                                                                        1. 2

                                                                          a related website is guesstimate - https://www.getguesstimate.com/models/13218 - where you can combine estimates, by way of sampling and simple functions. it’s pretty neat.
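For anyone curious what "combine estimates by way of sampling" looks like, here is a small Monte Carlo sketch. It assumes each range is a 90% interval of a normal distribution; Guesstimate's own distribution choices may differ, and the cost model is made up.

```python
import random
import statistics

# z-score bounding the central 90% of a standard normal (about 1.645).
Z90 = statistics.NormalDist().inv_cdf(0.95)


def sample_range(low: float, high: float, rng: random.Random) -> float:
    """Draw from a normal distribution whose central 90% interval is [low, high]."""
    mean = (low + high) / 2
    sd = (high - low) / (2 * Z90)
    return rng.gauss(mean, sd)


def simulate(n: int = 10_000, seed: int = 0) -> list[float]:
    rng = random.Random(seed)
    # Hypothetical model: monthly cost = users * requests per user * dollars per request.
    return [
        sample_range(1_000, 3_000, rng)
        * sample_range(20, 40, rng)
        * sample_range(0.001, 0.002, rng)
        for _ in range(n)
    ]


cuts = statistics.quantiles(simulate(), n=20)  # 19 cut points, 5% apart
p5, p50, p95 = cuts[0], cuts[9], cuts[18]
print(f"5%: {p5:.0f}  median: {p50:.0f}  95%: {p95:.0f}")
```

Each run draws every input independently and pushes it through the formula; the spread of the outputs is the combined uncertainty, with no closed-form algebra needed.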

                                                                          1. 2

                                                                            Huh, this is a really cool website. Not because it is particularly technically challenging, but because it is so simple. I won’t claim to know anything about the future, but I have a strong mental image of a world where the computational statistical advances we’ve seen in ML have triggered a complete statistical revolution. This tool is straight out of that world. The only thing I haven’t seen is some sort of Bayesian relationship. (Come to think of it, it might be there; I just haven’t read the fine manual.)

                                                                          1. 2

                                                                            I love the concept of finding captivating ways to convert maths to graphics. I would never have guessed this would turn out so well! I’m also glad radix sort was included.

                                                                            1. 11

                                                                              What about this is intrinsic to vi as an editor? Most of these seem to be about vi’s mode/input mechanism and the fact that it has good unix interop.

                                                                              Sometimes it seems to me that if you set aside everything that can be emulated by other editors (Evil, the JetBrains Vim plugin, …), all vi is left with is:

                                                                              • a small footprint,
                                                                              • the fact that most *nix machines have it installed,
                                                                              • mysticism about being the l33t editor

                                                                              The whole system was, as Bill Joy himself confirms, made for “[…] a world that is now extinct”:

                                                                              It [writing vi] was really hard to do because you’ve got to remember that I was trying to make it usable over a 300 baud modem. That’s also the reason you have all these funny commands. It just barely worked to use a screen editor over a modem. It was just barely fast enough. A 1200 baud modem was an upgrade. 1200 baud now is pretty slow.

                                                                              9600 baud is faster than you can read. 1200 baud is way slower. So the editor was optimized so that you could edit and feel productive when it was painting slower than you could think. Now that computers are so much faster than you can think, nobody understands this anymore.
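Some back-of-the-envelope arithmetic shows how tight those links were. This assumes an 80×24 terminal and the common 8N1 serial framing (10 bits on the wire per character), and treats baud as bits per second.

```python
SCREEN_CHARS = 80 * 24  # assumed terminal size
BITS_PER_CHAR = 10      # assumed 8N1 framing: start bit + 8 data bits + stop bit


def repaint_seconds(baud: int) -> float:
    """Seconds to redraw a full screen, treating baud as bits per second."""
    chars_per_second = baud / BITS_PER_CHAR
    return SCREEN_CHARS / chars_per_second


for baud in (300, 1200, 9600):
    print(f"{baud:>5} baud: {repaint_seconds(baud):5.1f} s per full screen")
# 300 baud: 64.0 s, 1200 baud: 16.0 s, 9600 baud: 2.0 s
```

At 300 baud, redrawing the whole screen after every command would take over a minute, which is the constraint Joy is describing.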

                                                                              But then again, I have an irrational aversion. The only reason I use it is because my muscle memory types “vi” when I want to edit something in a shell ^^

                                                                              1. 4

                                                                                The reason that it’s still popular today is that the things that made vi a good editor over a 300 baud link still make it a good editor for the fast, efficient editing of text today, even though the bandwidth limitations no longer apply. Vi lets you manipulate text (and code, in particular) much quicker than a “standard” editor that forces you to navigate with the arrow keys, home, end, etc. The price you pay for that efficiency, of course, is one heck of a learning curve.

                                                                                1. 3

                                                                                  Two points:

                                                                                  1. That’s the keybindings, not the editor. As I pointed out, others have the same features now.
                                                                                  2. I still think the most important reason it’s so popular is that it’s usually the most commonly suggested “real” text editor. I see it at my university: most people seem to be using vi(m) because their tutors use it, and I’m guessing their reason is more or less the same. This probably goes back some 30-40 years, to when, for example, vi was built into a UNIX system while you had to pay for Emacs. I’m not saying it’s the only reason, just that it’s not insignificant.
                                                                                2. 3

                                                                                  Vim plugins are so important to my workflow that using Vim modes in other editors is always going to be a less productive experience. I wish the Vim plugin loader could be packaged into other editors.

                                                                                  The most important ones are a fuzzy search tool https://github.com/junegunn/fzf

                                                                                  and a way to jump between words on a page real quick https://github.com/easymotion/vim-easymotion

                                                                                  1. 4

                                                                                    I wish the Vim plugin loader could be packaged into other editors.

                                                                                    What is the “package loader”?

                                                                                    And if you’ve ever considered Emacs, each of your examples has two counterparts there: ivy and helm for fuzzy search (and a lot more), and ace-jump and avy for jumping to words (or characters, lines, subwords, …).

                                                                                    Also, don’t forget: Vim isn’t vi.

                                                                                    1. 2

                                                                                      I mean the subsystem of Vim that supports plugins. Editors with “vim-support” just support the key mappings, not the plugin ecosystem. So they’re much weaker than the real Vim.

                                                                                      Cool, didn’t know about the counterparts for Emacs.

                                                                                      1. 2

                                                                                        Not necessarily. Emacs’ Evil mode has quite a lot of integration, see melpa, and I’m quite sure that in the case of IntelliJ, there are plenty of plugins and already built in features.

                                                                                        1. 1

                                                                                          I mean to say that you can’t just use a Vim plugin wholesale in other editors with Vim modes. These are pretty helpful plugins though. I didn’t know about them, thanks.

                                                                                        2. 1

                                                                                          …wait, really? I have heard a lot of things about emacs vs vim, but never have I heard that emacs should be much weaker than vim in terms of plug-in ecosystems!

                                                                                          I’d be surprised if you had a non-trivial plug-in for vim that also didn’t exist for Evil mode – and, conversely, I’m sure I can find plugins for Evil mode which don’t have a counterpart for vim.

                                                                                          1. 1

                                                                                            Try spacemacs. It probably has cognates for almost all your favorite plugins already installed, or automatically installed when you access a file of the relevant type.

                                                                                      2. 1

                                                                                        I largely agree. Most of my ‘vim’ use these days occurs outside of vim itself - in Visual Studio, VS Code, Firefox etc. For the most part I’ve found that the plugins I “needed” in vim are either covered by VS Code, or there’s a comparable VS Code plugin.

                                                                                      1. 2

                                                                                        I think there’s a point here, but felt the broken wristband example was a straw man. The patient received the wrong medication because they were given the wrong medication because the nurse didn’t know their name because the nurse didn’t read the wristband because they weren’t wearing a wristband. Solution: give patients wristbands.

                                                                                        1. 3

                                                                                          Did you mean that as a summary of the article’s first 5-why analysis of the wristband incident, or as a proposed better analysis? Either way, it’s less deep than the original – it only reaches point 2 of the article’s analysis, and so never figures out the wristband wasn’t given because the wristband printer was broken. Here’s the article’s 5-why analysis:

                                                                                          • Incident: Wrong patient medication error
                                                                                          • Why? Wristband not checked
                                                                                          • Why? Wristband missing
                                                                                          • Why? Wristband printer on the unit was broken
                                                                                          • Why? Label jam
                                                                                          • Why? Poor product design

                                                                                          Extra ironically, even that 5-why analysis is not meant to be good – it is one example of an insufficient analysis of this causal and contributing factors tree diagram.
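The structural difference is that a 5-why analysis follows one path through what is really a tree of causes. A toy sketch: the main chain is the article's as quoted above, while the two extra factors come from elsewhere in this discussion and their placement in the tree is a guess (the article's real tree has far more nodes).

```python
from dataclasses import dataclass, field


@dataclass
class Cause:
    label: str
    children: list["Cause"] = field(default_factory=list)


def all_causes(node: Cause) -> list[str]:
    """Every cause and contributing factor below the incident, depth-first."""
    out = []
    for child in node.children:
        out.append(child.label)
        out.extend(all_causes(child))
    return out


def five_whys(node: Cause, depth: int = 5) -> list[str]:
    """Follow only the first child at each level: a single chain of 'why?'."""
    chain = []
    while node.children and len(chain) < depth:
        node = node.children[0]
        chain.append(node.label)
    return chain


incident = Cause("Wrong patient medication error", [
    Cause("Wristband not checked", [
        Cause("Wristband missing", [
            Cause("Wristband printer on the unit was broken", [
                Cause("Label jam", [Cause("Poor product design")]),
            ]),
            Cause("Nurse did not ask for help with finding working printer"),  # placement assumed
        ]),
    ]),
    Cause("Unusually high workload"),  # placement assumed
])

chain = five_whys(incident)
missed = [c for c in all_causes(incident) if c not in chain]
print(len(chain), len(all_causes(incident)))  # 5 7
print(missed)
```

Even in this tiny fragment, the single chain skips two factors that a tree-based analysis would put on the table as targets for action.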

                                                                                          1. 2

                                                                                            Any time the conclusion is “someone else made an error” you’re looking at a shitty five-why analysis. Essentially any cause that’s outside of the control of the organisation is not very useful to look at, which includes “poor product design” and “unusually high workload”, among others. These things are going to happen and the system should not rely on them not happening.

                                                                                            The path that goes down “Nurse did not ask for help with finding working printer” looks to me like a much more productive path.

                                                                                            1. 2

                                                                                              Sure, and how would you know your five-why analysis went down the most useful path without checking all the other paths? And even if you go down the ‘most’ useful path, why should your RCA method let you ignore the other also-useful paths? That’s the article’s entire point: even the best five-why analysis will give incomplete results.

                                                                                              How much might be omitted when using ‘5 whys’? The tree diagram for our example uncovers more than 75 whys (causes and contributing factors), each of which is a potential target for action to reduce the risk of a recurrence. The ‘5 whys’ approach would identify only one (or possibly two) root cause as target for action. At best, this represents <3% of the opportunities for improvement identified using the tree diagram.

                                                                                              “The 5 whys approach is not simple but simplistic.”

                                                                                              Simplicity is a complicated virtue when it comes to the frameworks, tools and techniques of QI. […] as this paper has shown, the ‘5 whys’ approach has clearly overshot the mark: it is not simple, but simplistic. It is, as Leveson describes, “… perhaps the most simplistic [accident analysis technique] and … leads to the least amount of learning from events”.[13]

                                                                                              (The Leveson of that final quote, by the way, is Nancy Leveson, the researcher on systems safety who also did the Therac-25 investigation. Thanks to Hillel Wayne for teaching me that in his blog post “STAMPing on event-stream”, which he and I both link to elsewhere in these comments.)

                                                                                        1. 7

                                                                                          I couldn’t not post something when I read the title. I’m currently waiting for a test which takes about 15 minutes. Thing is, I have to compare the output to values defined in a CSV, and the biggest chunk of the time is waiting for the input to be loaded. Usually, there are a lot of errors immediately, but I only see this after 15 minutes. Then, I have to open the Excel file which computes the values used for checking (takes ~10 minutes), and find the corresponding computation in both the program and the Excel file. To fix the error, I have to change either the code or the Excel file. When it’s in the Excel file (it usually is), I have to run it (takes about 20 minutes), update the CSV in some specific location, and run a program to put the new values in the database. This, again, takes about 20 minutes.

                                                                                          So all in all, I spend about an hour for every error I find. This can be (and often is) something as simple as a typo in the Excel file. I have complained about this process before, but I don’t have the time/authority to change this (there are about 15-20 people working on this project, so I can’t just change the workflow if it’s not a task that is assigned to me). I don’t know why others think this is acceptable. Maybe because I’m usually the one ending up doing this tedious task, because I’m the ‘technical guy’ in my team.

                                                                                          This all used to really drag me down to the point of taking my work home and having a bad mood because of it. Now I’m a bit apathetic. If they don’t fix this, they’ll just pay me to do dumber work and be less productive. Their loss.

                                                                                          1. 16

                                                                                            And people wondered why I favored unit tests over integration tests in Working Effectively with Legacy Code.

                                                                                            1. 2

                                                                                              My favorite was a discussion about a review of your book the other day where someone complained that unit tests cause tests to be too fine grained, and that logic of your solution creeps into your tests. Seems like that’s a good litmus test for complexity getting out of hand.

                                                                                              My work code base mingles what are clearly integration tests with unit tests; as a result, no one runs the full suite before pushing new code, which means master has a 50/50 chance of being broken at any given time. It’s terrible.

                                                                                            2. 3

                                                                                              Consider yourself lucky – the test suite on a project I worked on 10 years ago took over 6 hours to run (the developers didn’t consider the speed of the tests when writing them – since the tests are “not in the fast path”).

                                                                                              One thing we did do well is make sure that each test in the suite tested one and only one thing. If the suite failed, it was possible to re-run just one of the failing tests (which would take 10-20 seconds), rather than re-running the entire suite. In your case, it sounds like the developers of the tests might benefit from AAA (Arrange, Act, Assert) – which applies mostly to unit tests, but can also be used in integration tests.
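For reference, a minimal Arrange/Act/Assert test with Python's `unittest`. The function under test is hypothetical; the point is that each test checks one behaviour, which is what makes re-running a single failing test cheap.

```python
import unittest


def apply_discount(price: float, percent: float) -> float:
    """Hypothetical function under test."""
    return round(price * (1 - percent / 100), 2)


class ApplyDiscountTest(unittest.TestCase):
    def test_ten_percent_off(self):
        # Arrange: set up the inputs.
        price, percent = 50.0, 10.0
        # Act: exercise exactly one behaviour.
        result = apply_discount(price, percent)
        # Assert: check a single outcome.
        self.assertEqual(result, 45.0)


if __name__ == "__main__":
    unittest.main()
```

With tests kept this narrow, one failure can be re-run on its own, e.g. `python -m unittest test_discounts.ApplyDiscountTest.test_ten_percent_off` (module name assumed).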

                                                                                              1. 2

                                                                                                I’m talking about an individual test, which takes about 15 minutes (we have some which take up to half an hour). If you run all tests, it takes like 4 hours, I guess. So it often happens that you run tests before pushing to develop, and while the tests are running, someone pushes to develop, so you have to merge and run the tests again (and hope that no one pushes this time). Or, you just run some important tests and push. Then, if you break develop, you’ll know after 4 hours (and of course, people will have pulled from and pushed to develop).

                                                                                                1. 3

                                                                                                  That’s solved by having a good integration flow, like bors-ng or zuul, in place. If your patch breaks the test suite, your patch should not land.
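The idea behind tools like bors-ng and Zuul (sometimes called the "not rocket science rule") is that nobody pushes to main directly: a bot tests each patch on top of the current main and only advances main if the suite passes. A single-threaded toy model, not the API of either tool; the real ones add optimizations such as testing several queued patches at once.

```python
from typing import Callable

Tree = tuple[str, ...]  # hypothetical model: main is just the ordered patches applied so far


def merge_queue(main: Tree, queue: list[str],
                suite_passes: Callable[[Tree], bool]) -> tuple[Tree, list[str]]:
    """Land queued patches one at a time, each tested on top of the current main."""
    rejected = []
    for patch in queue:
        candidate = main + (patch,)  # exactly what main would become
        if suite_passes(candidate):
            main = candidate  # only trees that passed the suite ever become main
        else:
            rejected.append(patch)  # main is untouched; the author fixes and re-queues
    return main, rejected


def suite_passes(tree: Tree) -> bool:
    # Toy suite: patches "a" and "b" each pass alone but break when combined.
    return not {"a", "b"} <= set(tree)


print(merge_queue((), ["a", "b", "c"], suite_passes))  # (('a', 'c'), ['b'])
```

Because main only moves through the queue, the "someone pushed while my tests were running" race from the previous comment cannot happen.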

                                                                                              2. 3

                                                                                                I feel like there has to be a way in which you can take advantage of everyone’s apathy and disinterest in that task, artificially increase the cost of it over time, like pretend the spreadsheet takes an hour to load, and then try to do something about it in the newly created dead time? It’ll be slow work, but incremental improvements do lead places.

                                                                                                1. 3

                                                                                                  Start getting paid by the hour, detach yourself from the process, and see your joy and happiness rise dramatically.

                                                                                                  You’ll be absolutely delighted to know that tests have slowed down, as it allows you an extra cup of coffee/tea and another round of play with your doggo (or catto) :P

                                                                                                  1. 1

                                                                                                    Usually, there are a lot of errors immediately, but I only see this after 15 minutes.

                                                                                                    Surely there’s a way to make your test framework fail fast? If it’s a case of loading everything into memory (which I doubt), then again surely there’s some streaming library for your language.
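If the suite runs on pytest or unittest, `pytest -x` (`--exitfirst`) and `python -m unittest -f` (`--failfast`) already stop at the first failure. For a home-grown harness, a minimal fail-fast loop might look like the sketch below. The `(name, function)` test format is an assumption made only for illustration:

```python
from typing import Callable, Iterable, List, Optional, Tuple

Test = Tuple[str, Callable[[], None]]  # hypothetical format: (name, zero-arg function)


def run_fail_fast(tests: Iterable[Test]) -> Tuple[List[str], Optional[Tuple[str, Exception]]]:
    """Run tests in order and stop at the first one that raises.

    Returns the names that passed, plus (name, error) for the first failure,
    or None if everything passed. Tests after the failure never run.
    """
    passed: List[str] = []
    for name, fn in tests:
        try:
            fn()
        except Exception as exc:  # any error counts as a failure
            return passed, (name, exc)
        passed.append(name)
    return passed, None


def ok() -> None:
    pass


def broken() -> None:
    raise AssertionError("expected 10, got 11")


def slow() -> None:
    raise RuntimeError("should never run: the runner stops before this")


passed, failure = run_fail_fast([("ok", ok), ("broken", broken), ("slow", slow)])
print(passed)      # ['ok']
print(failure[0])  # broken
```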

                                                                                                    1. 1

                                                                                                       Yup, it’s not even very hard. It’s just not a priority, so it never gets fixed. Also, most team leads don’t run and fix these tests themselves, so they don’t really feel the pain of slow tests.

                                                                                                      I estimate that many steps in the process do about 10 to 1000 times the strictly necessary work:

                                                                                                       • If you want to compare test results, you have to pull the output from a big Excel file of about 60 MB (even though you just need one sheet).
                                                                                                      • If you load new input, you have to reload all the output (100s of MBs)
                                                                                                      • If you do a test, you first load all the values into memory
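The first and last bullets suggest a cheaper approach to the comparison: stream expected and actual results one row at a time, look only at the column you need, and stop at the first difference. This is a minimal sketch. It assumes the results can be exported as CSV with a hypothetical `value` column. Reading .xlsx directly needs a third-party library, and openpyxl's `load_workbook(..., read_only=True)` is one option for loading a single sheet lazily:

```python
import csv
import io
from itertools import zip_longest
from typing import Optional, TextIO, Tuple

Mismatch = Tuple[int, Optional[str], Optional[str]]  # (row number, expected, actual)


def first_mismatch(expected: TextIO, actual: TextIO, column: str) -> Optional[Mismatch]:
    """Compare one column of two CSV streams and return the first difference.

    Only one row from each stream is held in memory at a time, and reading
    stops as soon as a difference is found. Rows are numbered from 1,
    not counting the header. A missing row on either side counts as None.
    """
    rows = zip_longest(csv.DictReader(expected), csv.DictReader(actual))
    for number, (exp_row, act_row) in enumerate(rows, start=1):
        exp_val = exp_row[column] if exp_row is not None else None
        act_val = act_row[column] if act_row is not None else None
        if exp_val != act_val:
            return number, exp_val, act_val
    return None


expected = io.StringIO("id,value\n1,10\n2,20\n3,30\n")
actual = io.StringIO("id,value\n1,10\n2,21\n3,30\n")
print(first_mismatch(expected, actual, "value"))  # (2, '20', '21')
```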
                                                                                                  1. 10

                                                                                                     It’s rather amazing how, over almost half a century, UNIX’s design flaws not only persist, but are celebrated. Notice that the user here is being chastised for not using this new “ShellCheck”.

                                                                                                    The fatal error here is that // is not a comment in shell scripts. It’s a path to the root directory, equivalent to /.
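Python's `shlex` module tokenizes a command line roughly the way POSIX sh does, though it performs no variable or glob expansion. That makes the point easy to see. With `comments=True`, only an unquoted `#` starts a comment, while `//` is just another word. The example command lines below are made up:

```python
import shlex
from typing import List


def sh_words(line: str) -> List[str]:
    """Split a command line into words roughly as sh would, dropping # comments."""
    return shlex.split(line, comments=True)


# '//' is passed to rm as an argument; on typical systems (e.g. Linux)
# it resolves to the root directory.
print(sh_words("rm -rf /tmp/build // clean up old output"))
# ['rm', '-rf', '/tmp/build', '//', 'clean', 'up', 'old', 'output']

# With '#', everything after it is discarded before rm ever runs.
print(sh_words("rm -rf /tmp/build # clean up old output"))
# ['rm', '-rf', '/tmp/build']
```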

                                                                                                    UNIX has many different, inconsistent, and rather nonsensical syntaxes, but it’s the user’s fault for confusing some of them.

                                                                                                    (Why can’t or won’t rm simply refuse to delete /* too? Because it never sees /*: the shell expands it first, so rm sees /bin /boot /dev /data …. While rm could obviously refuse to remove first level directories as well, this starts getting in the way of legitimate usage – a big sin in the Unix philosophy)
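The expansion step can be illustrated with Python's `glob` module standing in for the shell. The sketch below builds the argument list rm would receive for a `*` pattern inside a scratch directory. The directory names are made up. The pattern itself never reaches rm:

```python
import glob
import os
import tempfile
from typing import List


def expand_like_sh(pattern: str) -> List[str]:
    """Expand a wildcard roughly as sh does before running a command.

    sh sorts the matches and, if nothing matches, passes the pattern through
    unchanged; both behaviours are mimicked here. Like sh, glob skips
    names that start with a dot.
    """
    matches = sorted(glob.glob(pattern))
    return matches if matches else [pattern]


with tempfile.TemporaryDirectory() as root:
    for name in ("bin", "boot", "data"):
        os.mkdir(os.path.join(root, name))
    argv = ["rm", "-rf", *expand_like_sh(os.path.join(root, "*"))]
    # rm only ever sees concrete paths, never the "*" that produced them
    print([os.path.basename(arg) for arg in argv[2:]])  # ['bin', 'boot', 'data']
```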

                                                                                                     Why, this very same issue is mentioned in “The UNIX-HATERS Handbook”, published in 1994. Right: commands can’t see the real command line that invoked them, which bypasses already meagre checks. Then, rm can of course delete system files without any objection, because it’s not as if UNIX has a coherent idea of what constitutes what anyway. Finally, rm removes files with little practical chance of ever recovering them, and certainly none that is reliable and “official”. Any single one of these qualities would’ve prevented this, but that then gets in the way of doing “clever things”, such as deleting a database you actually want to keep.

                                                                                                     “UNIX was not designed to stop you from doing stupid things, because that would also stop you from doing clever things.” –Doug Gwyn

                                                                                                     I can really see how checks against accidental deletion, protection of special files, a forgiving file management program, or a real notion of file versioning would prevent me from doing clever things, sure, yes.

                                                                                                    Shell scripts are really convenient, but also have a large number of potential pitfalls. Many issues that would be simple, fail-fast syntax errors in other languages would instead cause a script to misbehave in confusing, annoying, or catastrophic ways.

                                                                                                     All joking aside, how could anyone find this acceptable? In general, I use Common Lisp for “scripting”, because I know Common Lisp can survive spaces and newlines in file names, along with having real error handling.
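The point about spaces and newlines is easy to demonstrate. The sketch below uses Python rather than Common Lisp, swapped in only to keep one language across examples. Splitting a listing on whitespace, as an unquoted `$(ls)` does in sh, breaks one awkward name into several, while a real list of strings keeps it intact. File names containing a newline are allowed on Linux and macOS but not on Windows, so the demo assumes a POSIX system:

```python
import os
import tempfile
from typing import List


def words_like_unquoted_sh(listing: str) -> List[str]:
    # Roughly what sh does with an unquoted $(ls): split on any whitespace.
    return listing.split()


def names_as_list(directory: str) -> List[str]:
    # os.listdir returns each name as a single string, whatever it contains.
    return sorted(os.listdir(directory))


with tempfile.TemporaryDirectory() as d:
    awkward = "final report\nv2.txt"  # one file: a space and a newline in its name
    open(os.path.join(d, awkward), "w").close()
    names = names_as_list(d)
    print(len(names))                                     # 1
    print(len(words_like_unquoted_sh("\n".join(names))))  # 3
```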

                                                                                                    The next time you’re writing a script, your editor will show warnings and suggestions automatically. Whether or not you want to fix the more pedantic style issues, it may be worth looking at any unexpected errors and warnings. It might just save your database.

                                                                                                    Well, at least we can all agree that it’s never, ever UNIX’s fault under any circumstances. If only he’d used SpellCheck beforehand, yes. It’s his fault, entirely.

                                                                                                     Joking aside, again, this is clearly unacceptable. In any case, it’s amusing to see the same things from “The UNIX-HATERS Handbook” in a modern context. This will probably never be fixed. Here’s a link to the book:

                                                                                                    http://web.mit.edu/~simsong/www/ugh.pdf

                                                                                                    1. 8

                                                                                                      I don’t think anything here is the fault of UNIX design - it’s the fault of someone not learning the basics of the language they’re programming in. Lisp isn’t a magic bullet, it won’t fix the programmer.

                                                                                                      1. 3

                                                                                                        Syntax errors are not problems in Common Lisp?

                                                                                                        1. -1

                                                                                                          Correct. They simply prevent the program from running, rather than having the disastrous results they do in shell scripts.

                                                                                                          1. 9

                                                                                                            I really like the simplicity of lisp syntax and shell syntax is a mess, but simply leaving out a ’ or getting mixed up about nesting can totally change the meaning of a lisp program without causing a syntax error.

                                                                                                      1. 2

                                                                                                         Hoping this doesn’t spark a huge debate, but any concerns about using non-ECC RAM for your ZFS setup? Is it less of a concern considering it’s just a mirrored-drive configuration?

                                                                                                        I did check and read your earlier post https://vermaden.wordpress.com/2018/06/07/silent-fanless-freebsd-desktop-server/ where you discuss this.

                                                                                                        1. 3

                                                                                                          I read extensively about the topic when I implemented a similar ZFS mirror for home use, and my conclusion then was basically that almost everyone recommending against ZFS on non-ECC memory used rhetoric that sounded a lot like FUD. The few people that were able to explain the issue in technical detail seemed to be of the opinion that “yes, non-ECC memory is bad. But it’s bad regardless of what file system you use, and even worse with non-ZFS filesystems.”

                                                                                                          1. 2

                                                                                                            any concerns about using non-ECC ram for your ZFS setup

                                                                                                            This is a perpetuated myth that somehow refuses to die. From one of the authors of ZFS:

                                                                                                            There’s nothing special about ZFS that requires/encourages the use of ECC RAM more so than any other filesystem. If you use UFS, EXT, NTFS, btrfs, etc without ECC RAM, you are just as much at risk as if you used ZFS without ECC RAM. Actually, ZFS can mitigate this risk to some degree if you enable the unsupported ZFS_DEBUG_MODIFY flag (zfs_flags=0x10). This will checksum the data while at rest in memory, and verify it before writing to disk, thus reducing the window of vulnerability from a memory error.

                                                                                                            Source: https://arstechnica.com/civis/viewtopic.php?f=2&t=1235679&p=26303271#p26303271
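The idea behind that debug flag can be sketched in a few lines: checksum a buffer while it sits in memory, then verify it just before the write. This is a toy illustration, not ZFS code. The class name is hypothetical, and CRC-32 stands in for ZFS's own checksums. A bit flip that happens before the first checksum would still go unnoticed, which is why the quote speaks of a smaller window of vulnerability rather than none:

```python
import zlib
from typing import List


class MemoryCorruption(Exception):
    """Raised when a buffer no longer matches the checksum taken earlier."""


class GuardedBuffer:
    """Hypothetical buffer that is checksummed when filled and verified on write."""

    def __init__(self, data: bytes) -> None:
        self.data = bytearray(data)
        self.checksum = zlib.crc32(self.data)  # taken while the data is known-good

    def write_to(self, disk: List[bytes]) -> None:
        # Verify right before the (simulated) disk write.
        if zlib.crc32(self.data) != self.checksum:
            raise MemoryCorruption("buffer changed in RAM since it was checksummed")
        disk.append(bytes(self.data))


disk: List[bytes] = []
GuardedBuffer(b"clean block").write_to(disk)

buf = GuardedBuffer(b"unlucky block")
buf.data[0] ^= 0x01  # simulate a single-bit memory error
try:
    buf.write_to(disk)
except MemoryCorruption as err:
    print("refused:", err)
print(len(disk))  # 1
```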

                                                                                                            1. 2

                                                                                                               ECC RAM helps ZFS the same way it helps any other filesystem. I am not sure why the urban legend arose that you can get away with UFS/EXT4/XFS/… on non-ECC RAM, while using ZFS on non-ECC RAM will mean trouble.

                                                                                                               It’s the same for all filesystems: unreliable RAM means trouble. If you want to be sure and can afford a motherboard/platform/RAM that supports ECC RAM, then do it without any doubt, but after using ZFS for years without ECC RAM (because of costs) I have not experienced any problems. Also, when you have a ZFS mirror or RAIDZ, this kind of problem will be detected.
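Here is roughly how a mirror turns detection into repair. Every block has a checksum stored apart from the data (ZFS keeps it in the parent block pointer), so a read can tell which copy is bad and rewrite it from a good one. This is a simplified sketch, not ZFS's implementation. The function names are made up, and SHA-256 is used for brevity:

```python
import hashlib
from typing import List


def checksum(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()


def read_mirrored(copies: List[bytearray], expected: str) -> bytes:
    """Return a copy matching the stored checksum and repair the ones that don't."""
    good = next((bytes(c) for c in copies if checksum(bytes(c)) == expected), None)
    if good is None:
        raise OSError("every copy fails its checksum; data cannot be recovered")
    for copy in copies:
        if checksum(bytes(copy)) != expected:
            copy[:] = good  # "self-heal": overwrite the damaged side of the mirror
    return good


stored = checksum(b"important data")
mirror = [bytearray(b"importent data"), bytearray(b"important data")]  # side 0 rotted
print(read_mirrored(mirror, stored))  # b'important data'
print(mirror[0])                      # bytearray(b'important data')
```

The sketch can only flag copies that disagree with the stored checksum.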

                                                                                                               A lot of people write that the price difference between ECC and non-ECC memory is so small that there is no point in building non-ECC setups … but that’s not the whole truth.

                                                                                                               To support ECC memory on Intel you will generally need a Xeon CPU or a ‘server oriented’ Atom CPU. Most of the Celeron/Pentium/Core lineup does not support ECC RAM, and the few models that do still require a server chipset. Besides a CPU that costs 3-5 times more, you will end up with a motherboard that costs 2-3 times more, while the extra cost of the ECC RAM sticks themselves will be minimal.

                                                                                                               It’s better in the AMD world, where most AMD CPUs support ECC memory, but you still need a motherboard that supports ECC, so there are still extra costs beyond the ECC RAM sticks themselves.

                                                                                                               Even in my blog post (https://vermaden.wordpress.com/2018/06/07/silent-fanless-freebsd-desktop-server/), the motherboard with ECC RAM support costs about 6 times as much! I was not able to find a cheaper Mini ITX motherboard with ECC RAM support that also has a very low TDP (less than 15W).

                                                                                                               $49  non-ECC  ASRock J3355B-ITX 
                                                                                                              $290  ECC      ASRock C2550D4I
                                                                                                              
                                                                                                              1. 2

                                                                                                                 Not only is the RAM more expensive, but the QVL (qualified vendor list) for all the motherboards I looked at listed only a single ECC SKU on their support charts (alongside ~100+ non-ECC SKUs).

                                                                                                                Good luck finding the right one in stock nearby - and finding an alternative is hard, as ECC has several variants and “ECC support” on a motherboard means it supports at least one of them (but they don’t tell you which).

                                                                                                            1. 1

                                                                                                               This is a good article! I’ve been thinking of writing something similar, but with a list of all the open source applications that are much better than their proprietary alternatives. Those cases should truly be celebrated! Other examples include darktable, PostgreSQL, and perhaps QGIS too.

                                                                                                              1. 9

                                                                                                                I don’t have comments on my blog. I don’t need other people’s opinion there. That place is for my opinion.

                                                                                                                 This also saves me from fighting spam, from helping anybody track my visitors, and from having to replace a third-party comment system and advise others to do the same.

                                                                                                                1. 8

                                                                                                                  I’ve been of the opinion that in the cases where comments actually add something meaningful and their author cares enough, they can just email me and I’ll publish their comment manually, as part of the article text.

                                                                                                                  That seems like a more useful deal for both me and my readers.

                                                                                                                  1. 3

                                                                                                                    I agree.

                                                                                                                     I’m not against opinions that differ from mine; I simply don’t want to open up my place on the internet as a public forum. If someone wishes to give feedback, they can find a way to do so. That small hurdle is enough to keep most spammers and unconstructive discussions away from me.

                                                                                                                    1. 2

                                                                                                                      I mean, I agree, but I added this explanation to my blog 5 years ago and have yet to get a single comment emailed in. Maybe my blog just isn’t interesting enough.

                                                                                                                      I did get an email from someone offering a “correction” to my book-reading list page where he noticed duplicate entries and didn’t realize that people read books more than once on purpose.

                                                                                                                    2. 2

                                                                                                                      The value-add of Disqus is that they take care of spam (it’s a centralized service a bit like Gmail).

                                                                                                                      If I had to have comments enabled (for business reasons) I’d be happy to use a service like Disqus.

                                                                                                                      1. 2

                                                                                                                         For a business, Disqus can be a reasonable choice. Most businesses track their users heavily anyway.

                                                                                                                    1. 1

                                                                                                                       This seems like important research. One thing that strikes me as really odd is:

                                                                                                                      The choice of the subject systems was driven by the will to consider systems having different size (ranging from 0.4 to 868 KLOCs), belonging to different application domains (modeling tools, parsers, IDEs, IR-engines, etc.), developed by different open source communities (Apache, Eclipse, etc.), and having different lifetime (from 1 to 19 years).

                                                                                                                      Does the high variance of the group make inter-project comparisons difficult? This wide array of variables makes it almost certain they’ll find a false positive somewhere, no?
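The worry can be made concrete with the family-wise error rate. Assume m independent tests, each at significance level α; the paper's actual number of tests and its correction method are not given here. The chance of at least one false positive is then 1 − (1 − α)^m, which grows quickly with m. A Bonferroni correction, which tests each hypothesis at α/m, keeps it at or below α:

```python
def familywise_error_rate(alpha: float, m: int) -> float:
    """P(at least one false positive) across m independent tests at level alpha."""
    return 1 - (1 - alpha) ** m


def bonferroni_alpha(alpha: float, m: int) -> float:
    """Per-test level that keeps the family-wise error rate at or below alpha."""
    return alpha / m


for m in (1, 20, 100):
    raw = familywise_error_rate(0.05, m)
    corrected = familywise_error_rate(bonferroni_alpha(0.05, m), m)
    print(f"m={m:3d}  uncorrected={raw:.3f}  bonferroni={corrected:.3f}")
```

With α = 0.05, 20 uncorrected tests give about a 64% chance of at least one false positive, and 100 tests give about 99%.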