People like me have been saying this for quite some time. You could use traditional non-linear optimization techniques here to do even better than what the author’s simple random search does, for example gradient descent.

My old boss at uni used to point out that neural networks are just another form of interpolation, but far harder to reason about. People get wowed by metaphors like “neural networks” and “genetic algorithms” and waste lots of time on methods that are often outperformed by polynomial regression.

Most ML techniques boil down to gradient descent at some point, even neural networks.

YouTuber 3Blue1Brown has an excellent video on that: https://www.youtube.com/watch?v=IHZwWFHWa-w

Yep, any decent NN training algorithm will seek a minimum. And GAs are just very terrible optimization algorithms.

I’d say that only a few ML algorithms ultimately pan out as something like gradient descent. Scalable gradient descent is a new thing thanks to the advent of differentiable programming. Previously, you’d have to hand-write the gradients, which often meant investing in alternative methods of optimization instead. Cheap, fast, scalable gradients are often “good enough” to curtail some of that other effort.

An additional issue is that often the gradients just aren’t available, even with autodiff. In that case, you have to do something more creative and end up with other kinds of iterative algorithms.

It’s all optimization one way or another under the hood, but gradients are a special case that just happens to have gotten a big boost in scalability lately.
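To make the "hand-write the gradients" point concrete, here is a minimal sketch in plain Python, assuming a straight-line model and a mean-squared-error loss (neither comes from the article). The function name `fit_line`, the learning rate and the toy data are all made up for illustration. The two `grad_` lines are the part autodiff would otherwise derive for you.

```python
def fit_line(xs, ys, lr=0.05, steps=2000):
    """Fit y ~ a*x + b by gradient descent on mean squared error."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Hand-derived partial derivatives of mean((a*x + b - y)^2).
        grad_a = sum(2 * (a * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (a * x + b - y) for x, y in zip(xs, ys)) / n
        # Step against the gradient.
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b


# Toy data lying exactly on y = 2x + 1 (an assumption for the example).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
a, b = fit_line(xs, ys)
```

Change the model and both gradient lines have to be re-derived by hand, which is roughly the cost being described.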

A large part of ML engineering is about evaluating model fit. Given that linear models and generalized linear models can be constructed in a few lines of code using most popular statistical frameworks [1], I see no reason for ML engineers not to reach for a few lines of a GLM, evaluate the fit, and, if it is fine, move on. In practice, for more complicated situations, decision trees and random forests are also quite popular. DL methods also take quite a bit of compute and engineering time to train, so in reality most folks I know reach for DL methods only after exhausting other options.

[1]: https://www.statsmodels.org/stable/examples/index.html#generalized-linear-models is one I tend to reach for when I’m not in the mood for a Bayesian model.

Didn’t know about generalized linear models, thanks for the tip

For a two-parameter model being optimized over a pretty nonlinear space like a hand-drawn track, I think random search is a great choice. It’s probably close to optimal and trivial to implement, whereas gradient descent would require at least a few more steps.

Hill climbing with random restart would likely outperform it. But not a bad method for this problem, no.
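For anyone who wants to try both, here is a rough sketch in plain Python. The `loss` function is a made-up bumpy two-parameter objective standing in for "how badly the car drives"; it is not the article's simulation, and the bounds, step size and evaluation budgets are arbitrary assumptions. It makes no claim about which method wins on the real track.

```python
import math
import random


def loss(a, b):
    # Hypothetical objective: a quadratic bowl with ripples that create local minima.
    return (a - 1.0) ** 2 + (b + 0.5) ** 2 + 0.3 * math.sin(5 * a) * math.cos(5 * b)


def random_search(rng, samples, lo=-3.0, hi=3.0):
    """Sample points uniformly and keep the best one seen."""
    best = None
    for _ in range(samples):
        cand = (rng.uniform(lo, hi), rng.uniform(lo, hi))
        if best is None or loss(*cand) < loss(*best):
            best = cand
    return best


def hill_climb(rng, start, steps, step_size=0.1):
    """Perturb the current point and accept only strict improvements."""
    cur = start
    for _ in range(steps):
        cand = (cur[0] + rng.gauss(0, step_size), cur[1] + rng.gauss(0, step_size))
        if loss(*cand) < loss(*cur):
            cur = cand
    return cur


def hill_climb_restarts(rng, restarts, steps, lo=-3.0, hi=3.0):
    """Run hill climbing from several random starts and keep the best end point."""
    best = None
    for _ in range(restarts):
        start = (rng.uniform(lo, hi), rng.uniform(lo, hi))
        end = hill_climb(rng, start, steps)
        if best is None or loss(*end) < loss(*best):
            best = end
    return best


rng = random.Random(0)  # fixed seed so runs are repeatable
rs_best = random_search(rng, samples=1000)
hc_best = hill_climb_restarts(rng, restarts=10, steps=100)
print("random search:", rs_best, loss(*rs_best))
print("hill climbing:", hc_best, loss(*hc_best))
```

Both runs get roughly the same number of loss evaluations (1000 candidates each, plus the ten restart points), so the comparison is at least budget-fair.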

I suppose people typically use neural networks for their huge model capacity, rather than for the efficiency of the optimization method (i.e. backpropagation). While neural networks are just another form of interpolation, they allow us to express much more detailed structures than (low-order) polynomials.

There is some evidence that this overparameterisation in neural network models actually gives you something that looks like fancier optimisation methods[1], as well as acting as a form of regularisation[2].

The linked works are really interesting. Here is a previous article with a similar view: https://lobste.rs/s/qzbfzc/why_deep_learning_works_even_though_it

“neural networks […] allow us to express much more detailed structures than (low-order) polynomials”

Not really. A neural network and a polynomial regression using the same number of parameters should perform roughly as well. There is some “wiggle room” for NNs to be better or PR to be better depending on the problem domain. Signal compression has notably used sinusoidal regression since forever.

“A neural network and a polynomial regression using the same number of parameters should perform roughly as well.”

That’s interesting. I have rarely seen polynomial models with more than 5 parameters in the wild, but neural networks easily contain millions of parameters. Do you have any reading material and/or war stories about such high-order polynomial regressions to share?

This post and the associated paper made the rounds a while ago. For a linear model of a system with 1,000 variables, you’re looking at 1,002,001 parameters. Most of these can likely be zero while still providing a decent fit. NNs can’t really do that sort of stuff.

This article explores a situation where the program has perfect information about its environment. Neural Networks are mostly useful when you need help deriving useful high-level features from otherwise unstructured data. In practice, most of the challenge in deploying “AI” is conveying an accurate digital representation of the state of the world to your program.

My favorite blog post on this topic is Hacker Noon’s AI Hierarchy of Needs, which points out the value of basic data cleaning and analysis. It has a great graphic I’ve used in many presentations.

I’d argue the author’s solution is technically also machine learning, just a way simpler method.

Part of me wants to write an article with machine learning methods in the spirit of Evolution of a Haskell Programmer.

Please do, I’d love to read it and learn. :-)

I wonder if machine learning is popular not because it leads to optimal solutions, but because it appears to be a solution path that doesn’t seem to require a lot of knowledge about the problem domain, or even much in the way of mathematics.

The method used in the original video is slow, clumsy and suboptimal, but very likely to be applicable to a completely different problem. It’s presented as a general problem solving skill, rather than a way to solve a specific problem.

I agree with the author here, but, playing devil’s advocate, it is sort of handy to have a tool you can whack a lot of problems with, then move on to other problems. I’m not one of these, but I’m guessing a machine learning practitioner could whack this with a few ML hammers pretty quickly, get something working, then move on to other problems. ML seems to be doing an okay job at “general hammering”.

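I guess another way of saying that is that I’m not sure this is true:

“These polynomials are obviously much faster than a neural network, but they’re also easy to understand and debug.”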

If you’re on a team of ML people who are very comfortable hitting everything with pytorch, maybe the pytorch solution is easier?

I’m not even talking about the practical applications of the pytorch hammer (and to be clear: I also prefer the author’s approach), I’m talking about its reputation. Every day we’re presented with new, exciting applications for machine learning. One day it’s generating faces, the next day it’s generating text. It’s not strange that people see it as a generalist skill that’s worthwhile to acquire.

As opposed to plain old boring mathematics.

The argument is basically “Business Logic is superior to Machine Learning”, which makes perfect sense. However, I think machine learning is still helpful in cases where you have insufficient prior knowledge of the business logic, and thus need to estimate or approximate it.

When does that happen?

For example, consider the problem of determining whether an array of pixels represents a picture of a dog or of a cat. It would be difficult to write out the “business logic” for that.

When you have a large dataset and you don’t know what is driving the data to behave a certain way. Coming up with a detailed, knowledge-based model takes a long time. “Throwing” an ML-style model at a problem can be a lot cheaper. Sure if you’re making line-of-business CRUD you’ll rarely run into this, but there are domains where ML can help short circuit a lot of modelling.

The more I hear about machine learning, the more it sounds like bullshit. I used to be in the IoT field (our startup failed) and the ML industry feels similar in the sense that if you yell out buzzwords, old white men with money will fund your idea.

Neural networking and the like captured my imagination when I played the first Megaman Battle Network game. That series has since predicted a lot of online life so I used to be quite sure we were right around the corner to having actual A.I.

What we actually have are glorified if/then/else statements but hey, progress!

I think you’re being overly pessimistic because of startup buzz and some grouchy articles. While Deep Learning models do get a lot of press, the benefits have percolated to all parts of the scientific computing field. From greater attention to hill climbing algorithms that help optimize knapsack solvers, to high quality, fast samplers that enable Bayesian modelling, there’s been a lot of progress in scientific computing. I remember a decade ago, K-means, being a non-parametric algorithm, was one of the easiest to use, so laypeople loved using it. Other than that and some basic Markov chains, most folks just made bespoke models. Even making a linear regression model often required hand-coding the Maximum Likelihood derivation. Nowadays, you can import a library, let it do the heavy lifting, and get GPU acceleration for free. A layperson can easily run a Generalized Linear Model with only a faint understanding of ML estimators, link functions, or the bias-variance tradeoff, and still get meaningful results.

I highly recommend diving in. There’s a lot to discover.

That actually sounds super neat, wow.

You’re right, my opinion has definitely been shaped by startup buzz!

“Bullshit” is not the problem with machine learning: clearly there are real results, from image style transfer, funny text generation and cat detection to voice assistants and autopilot and so on. Throwing tons of data at statistical algorithms on big computers can do some cool things.

The main problem in that field is the “everything looks like a nail” problem: too many people want to apply the machine learning hammer in creepy (anything trying to classify people, especially for awful purposes like suspecting “criminals” and denying jobs/loans/etc.) and useless (store product recommendations that boil down to “buy the same thing again”) ways. Not enough people realize that some things just shouldn’t be predicted, and we sometimes should embrace chaos, unpredictability, and human judgment. :P

Agreed on the nail analogy. Same as companies thinking a blockchain will improve their workflow.

We don’t need machine learning, no, nor do we need blockchains. Apparently we need effect-based type systems and optics traversal libraries for 20 byte REST JSON payloads.

Yes, we can and we do solve optimization problems (in this case, optimize difference between distances to sides) without the use of back-propagation. However, watching genetic algorithms and neural networks work is just really fascinating. I use them sparingly at work and mostly work on them in my spare time for entertainment purposes.

Last year I was working on some product ideas, and I worked quite a bit on a better recipe site (many of them suck so much; I didn’t really finish it because it’s hard to monetize, but I do want to get back to it at some point).

At any rate, one of the features I wanted was the ability to extract ingredients from a written recipe. Turns out there’s already some existing ML stuff for that so I spent some time trying to get it to work.

Long story short, I never really got great results with it, never mind that it was pretty slow. I ended up writing about 250 lines of Go code which worked much better. Essentially it just looks for a fixed list of known units (“gram”, “g”, “package”, “can”, etc.) and can get the actual ingredients from there. Nothing fancy or whizzbang about it, but it works quite well and better than my (admittedly novice) ML attempts.
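For the curious, the unit-list idea can be sketched in a few lines. This is a guess at the general shape written in Python, not the original Go code (which isn't shown here). The plural forms in `UNITS`, the regex, and the `parse_ingredient` name are all assumptions for illustration; only “gram”, “g”, “package” and “can” come from the description above.

```python
import re

# Hypothetical unit list; extend it with whatever units your recipes use.
UNITS = {"gram", "grams", "g", "package", "packages", "can", "cans"}

# A leading number, then a word that might be a unit, then the rest of the line.
LINE = re.compile(r"^\s*(\d+(?:[.,]\d+)?)\s*([A-Za-z]+)\s+(.*\S)\s*$")


def parse_ingredient(line):
    """Return (amount, unit, ingredient), or None if no known unit is found."""
    m = LINE.match(line)
    if not m:
        return None
    amount, unit, rest = m.groups()
    if unit.lower() not in UNITS:
        return None
    # Drop a leading "of" so "1 can of tomatoes" yields "tomatoes".
    rest = re.sub(r"^of\s+", "", rest, flags=re.IGNORECASE)
    # Accept both "1.5" and "1,5" as decimal amounts.
    return float(amount.replace(",", ".")), unit.lower(), rest


for text in ["200 g flour", "1 can of tomatoes", "a pinch of salt"]:
    print(text, "->", parse_ingredient(text))
```

Lines without a number and a known unit simply return `None`, so a real version would need a fallback for things like “2 eggs” or “a pinch of salt”.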

The real value of ML is when the input is so complex that it would be exceedingly difficult to program for all possible inputs. A simple AI driving like this is easy, but real-world driving has so many situations that it’ll be very daunting to program for that.