1. 26
  2. 13

    I have a theory that the popularity of vision and language is due to data. There is an endless amount of unlabeled data, and labeled data can be crowdsourced cheaply.

    People like Horvath are hailed as genius-level polymaths in molecular biology for calling 4 scikit-learn functions on a tiny dataset.

    Looking at https://en.wikipedia.org/wiki/Epigenetic_clock, I read:

    Horvath spent over 4 years collecting publicly available Illumina DNA methylation data… (snip) The age estimator was developed using 8,000 samples from 82 Illumina DNA methylation array datasets.

    It is true both that 8,000 samples is a tiny dataset and that it took 4 years to collect. The majority of machine learning effort is data collection, not data modeling. Data collection is easier with vision and language, even though data modeling is higher impact elsewhere.
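
    Once the data exists, the modeling really is a handful of scikit-learn calls. Here is a minimal sketch of that kind of penalized-regression age clock, where the random arrays stand in for real methylation data and the elastic-net choice is illustrative rather than Horvath's published recipe:

    ```python
    # Minimal sketch of an "age clock" fit. X stands in for a (samples x CpG sites)
    # matrix of methylation beta values, y for donor ages; both are made up here.
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import ElasticNetCV
    from sklearn.metrics import median_absolute_error

    rng = np.random.default_rng(0)
    X = rng.random((800, 2000))         # the real clock used ~8,000 samples from 82 datasets
    y = rng.uniform(0, 100, size=800)   # chronological age in years

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    model = ElasticNetCV(cv=5).fit(X_train, y_train)   # penalty strength chosen by CV
    print(median_absolute_error(y_test, model.predict(X_test)))
    ```

    The fit is the easy part; the 4 years went into assembling and normalizing the 82 datasets behind a real X and y.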

    1. 3

      > I have a theory that the popularity of vision and language is due to data.

      Well, to add to this: CV and NLP are old fields. They’ve been collecting and cleaning datasets for decades now. Applying newer NN-based ML techniques was as easy as taking an academic dataset and training the nets. Other fields don’t have nearly the same history of pedagogy, so it’s probably going to take a lot longer to collect data for them (usefully).

      1. 3

        Bio is also special… Consider a genome. Most of your genome is the same as everyone else’s, so even though it’s a lot of data, the degrees of freedom are considerably lower. Also, even if we were to sequence every American and European, that’s only 750M data points (most of which self-correlate)… Wikipedia alone has 3.9 billion words.

        1. 2

          This would be true for just DNA and genetics. If you include epigenetic information and start going into gene expression in different cell types/tissues, there’s probably a lot more variation, but I don’t think we’ve quantified it yet.

        2. 2

          I agree with this; for hard problems you can’t just use mturk to label commonly available data.

          1. 1

            I would attribute the popularity of language and vision more to problems that can be modeled well on GPUs with neural nets, as well as having massive amounts of labelled data. ImageNet existed well before the computer vision breakthrough by Hinton’s team. It was applying GPUs for processing, plus neural networks, that did the trick.

          2. 4

            I think that human-level intelligence is also partially a way to enable more people to “access” APIs, especially people with less affinity for technology.

            Just as an example, consider ordering a pizza. On every website where you can order a pizza, I have seen an option to add custom text regarding the order. You will simply have people whose needs are not met by the order form, or who do not find the option they really want. In this case it can make sense to have a computer attempt to decipher the customer’s wish.

            This could also be applied to ordering by phone, which is still very common in Germany. That would be another avenue where voice recognition could make sense and improve efficiency. But ultimately, those problems are intricately connected to UX, not necessarily to fancy ML models.

            In the end, these types of ML models reach more people, and it’s important that those people do not feel left behind, since that can cause rejection of and opposition to modern technology. That is a very social problem, and I am not convinced that there is a technological solution to it.

            1. 3

              I think I agree with this; the “left behind” angle might be one I don’t consider enough when thinking about this. Ultimately, the vast majority of people can’t use Google Maps; heck, even many professionals like cab drivers, who would benefit immensely from it, can’t.

              I guess I’m not sure how much of this can be fixed by “education” once technology becomes too useful to be ignored, but it’s the Google Maps example above that made me hedge on this; after all, it’s a prime example of a life-transforming technology that many people just ignore.

              Maybe in some cases it’s just more fun to live life without technology? I’ve heard this argument made by literate people who preferred using paper maps and compass navigation just because it was exciting. But under that scenario I’m not sure simpler UX would help, or that any help is needed per se.

              1. 8

                I have a personal theory that many people firmly believe, in a deep, maybe even unconscious way, that they are entitled to stop learning new things at some point in their lives. It’s something I’ve been noticing more and more with the pandemic cutting off so many “non-tech” ways of doing things.

                1. 3

                  I have noticed this too. It really pains me. I personally favour the hypothesis that our education systems are to blame. All humans start off with curiosity as a primary motivator, perhaps the primary motivator after survival stuff. It seems there is a strong correlation between attending a few years of school and this no longer being a strong motivator for many people. As far as causation goes, there are reasons to favour this hypothesis: the degree to which curiosity is actively suppressed in most education systems, and also the degree to which education is made to be associated with bland, boring, and generally unpleasant hard work rather than discovery.

                  Unfortunately there is not a lot of good data regarding well-educated individuals who have not been subjected to a classical Western-style education. There are alternative systems like Montessori, but there is no clear way to quantify cognitive ability; we use Western school performance for that in most instances. The Finnish school system is an interesting case study, though.

                  This may all seem off topic, but to tie it in to the main theme: it is actually trivial to create a human-level intelligence. Many people do it by accident when they are drunk. This makes creating them artificially doubly redundant. I think that, in parallel with improving inhuman AI as the article suggests, we should be looking into improving the human-level intelligence we have easy access to, which is mainly a software problem. Most of the software that is widely available for the human brain is decades out of date and designed for applications that are no longer needed.

                  1. 1

                    To some extent though this might be a good conservative instinct.

                    Don’t get me wrong, I like being in some “privileged” 0.01% of the world’s population that knows the technology well enough to make 6 figures writing stuff on a laptop, I’m sure most people here feel the same way.

                    But if I had to choose between being in the top 20% or in the bottom 20%, I’d choose the bottom.

                    Technology very quickly becomes a tool that most people end up using to fulfil destructive patterns more than anything.

                    That’s not to say it has no uses (it does), but it’s way too easy to fall into pitfalls if you spend too much time on computers and don’t understand how good certain programs are at manipulating your preferences.

                    1. 1

                      I’ve noticed it in myself. After 30 years in the computer industry, I’m just sick and tired of having to relearn the hot new development methodology. Again. For the umpteenth time. Can’t things just settle down for a few years? It seems as if the singularity has hit and no one can keep up.

                    2. 3

                      I suppose to some people it is more fun. I still write my day-to-day plans on paper because I prefer it to electronic bookkeeping. The battery of a sheet of paper cannot run out, although my phone hardly does either these days. But there still is something that makes me prefer to write them by hand; it also helps me remember what I actually wrote down. To some extent it is also control over the medium: I know that I am responsible if I lose it or it somehow is destroyed. I guess the aspect of control and being in charge can also translate to navigation or to other aspects of life in general (although I have to agree with you, navigating via online maps is much better).

                      Potentially, education could help with furthering the usage of technology. But it could also be that most people are just stubborn and want to continue with their way of life indefinitely. Depending on how big that share of people is, it is important to make new non-human intelligence backwards compatible, so to speak. Then, once most people stop relying on the old technology (die out?), it can be deprecated.

                  2. 3

                    Maybe I am missing the point of the post, but I don’t agree with some of the points you raise.

                    > Human brains are shit at certain tasks, things like finding the strongest correlation with some variables in an n-million times n-million valued matrix. Or heck, even finding the most productive categories to quantify a spreadsheet with a few dozen categorical columns and a few thousand rows.

                    That might be true of individual brains, but you can put together a lot of people and get to the same result. See how mathematical computations were carried out by human computers before the introduction of electronic computers.

                    Similarly, I don’t agree with the overall framing of your critique. You can say something like “Human bodies are shit at certain tasks, things like digging a 100ft-long, 10ft-deep trench. Or heck, even walking more than a few miles a day or even running a few hundred feet”. We have excavators and cars for a reason. Similarly, we have computers to do “boring” computation work for us.
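
                    For what it’s worth, the correlation example is exactly the kind of work we already delegate. A minimal sketch with a made-up table (pandas here purely for illustration):

                    ```python
                    # Find the most strongly correlated pair of columns in a table.
                    # The DataFrame is random, just to stand in for a real spreadsheet.
                    import numpy as np
                    import pandas as pd

                    rng = np.random.default_rng(0)
                    df = pd.DataFrame(rng.random((1000, 50)), columns=[f"col{i}" for i in range(50)])

                    corr = df.corr().abs()
                    corr = corr.mask(np.eye(len(corr), dtype=bool))  # drop self-correlations
                    col_a, col_b = corr.stack().idxmax()             # pair with the largest |correlation|
                    print(col_a, col_b, corr.loc[col_a, col_b])
                    ```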

                    > This is not to say that the scientific establishment is doomed or anything, it’s just slow at using new technologies, especially those that shift the onus of what a researcher ought to be doing.

                    Well, isn’t there a saying that scientific progress happens one funeral at a time? It feels to me that the problem is that researchers tend to ask questions that they can reasonably answer. The questions they ask are informed by their methods and theoretical preparation. So, if a researcher doesn’t have a good understanding of machine learning models and what they could do, they will probably not think of a research line that will leverage these new methods to move the field forward. Instead, they will probably just apply them as a toy to an old problem (which will probably be solved by an old boring method just as well).

                    I wonder if we are at the intersection of “old” and “new”. While we are at the intersection, we are still applying the “new” models to the “old” problems. Our problem space hasn’t opened up yet to the possibilities that the “new” methods opened (maybe because we haven’t figured them out yet). That makes me wonder whether we need to push ourselves to imagine and tackle new, unexplored problem spaces that machine learning (or any new technology) has opened for us instead of trying to solve old problems with the new methods.

                    1. 2

                      > That might be true of individual brains, but you can put together a lot of people and get to the same result. See how mathematical computations were carried out by human computers before the introduction of electronic computers.

                      A 2-layer neural network can learn the tricks of 200 years of research into signal processing from a bunch of domain-specific training data: https://www.youtube.com/watch?v=HtFZ9uwlscE&feature=youtu.be
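
                      A toy version of that claim, assuming a made-up low-pass-filter task and sklearn’s MLPRegressor rather than whatever setup is in the video:

                      ```python
                      # A small 2-layer net learning a low-pass filter purely from
                      # input/output examples, instead of being handed the filter formula.
                      import numpy as np
                      from sklearn.neural_network import MLPRegressor

                      rng = np.random.default_rng(0)
                      X = rng.standard_normal((5000, 32))   # noisy input windows
                      kernel = np.ones(5) / 5               # the "textbook" moving-average filter
                      Y = np.array([np.convolve(x, kernel, mode="same") for x in X])

                      net = MLPRegressor(hidden_layer_sizes=(64,), max_iter=1000, random_state=0)
                      net.fit(X, Y)
                      print(np.mean((net.predict(X[:200]) - Y[:200]) ** 2))  # training error, should be small
                      ```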

                      That being said, I don’t think I have an argument against this, but my intuition, based on how many applications NNs have found outdoing equations in hard sciences like physics and chemistry, is that we aren’t super good at this, even if it’s 100k of us working on it together.

                      > Similarly, I don’t agree with the overall framing of your critique. You can say something like “Human bodies are shit at certain tasks, things like digging a 100ft-long, 10ft-deep trench. Or heck, even walking more than a few miles a day or even running a few hundred feet”. We have excavators and cars for a reason. Similarly, we have computers to do “boring” computation work for us.

                      In hindsight, yeah, I think the framing is a bit off.

                    2. 3

                      Doesn’t the callout of the biologist directly contradict the premise of the post? “Calling 4 scikit-learn functions on a tiny dataset” sounds like “boring machine learning” to me, yet he’s being criticized in this post which ostensibly promotes boring machine learning.

                      Is the problem that that application of ML is too boring for the author? That boring ML must simultaneously be cutting-edge in order to be worthy of praise?

                      1. 5

                        The article calls out the people who hailed him as a genius-level polymath, not Horvath himself. Having said that, I think being able to navigate academia, have an expert understanding of biology, and also have a decent grasp of machine learning is pretty close to being a genius-level polymath :-p

                      2. 2

                        Boring things are good because we can rely on them. It enables automation proper. It’s the old AM/FM problem once again: Actual Machines vs Fucking Magic.