1. 26

  2. 10

    Good article, here were the key points that stood out to me:

    The words we use to describe the reality forge our understanding of it.

    Absolutely agree, though I would say “the words we use forge the story we convince ourselves is reality”.

    Pulling back the poetic imagery and talking about ML approaches as simply statistical approaches is highly clarifying. And spelling it out as “neural networks approximate a function between two sets” makes it a lot easier to see how limited certain projects are, like the discrimination example you provided. The technique doesn’t approximate how “risky” an individual is; it approximates how we perceive the riskiness of individuals. Which, surprise, is highly biased, especially with regard to race.

    Your points about responsibility are especially pertinent. Without substantial evidence (as you outline), the tools are just moving bias from one place to the other, and the people who choose to act on that evidence are responsible for that choice.

    1. 5

      the tools are just moving bias from one place to the other

      The funny part is that, by moving the prejudice into the machine, we make it observable.

      When somebody talks about the risks of AI, they divert attention from the true danger: the people using it.

      1. 3

        Yes, an AI should never be responsible; only the people who trained it and the people who use it should be. After all, if it could reason and think on its own, it really shouldn’t be owned.

    2. 3

      This is relevant to a side project of mine. I wanted to train a neural net Q-learner (with no pre-training or stuff-I-know-about-the-game feature engineering) to play Ambition. To add slightly (actually, only slightly) to the difficulty, I’ve been rolling my own libraries for statistics and linear algebra, because I want it to run in “just plain C”, out of the box, so people can ideally download a small set of files and not only play against the AI, but improve it. Stats and linear algebra aren’t hard to write, but they’re a bitch to debug: e.g. a backprop algorithm written in C that seg-faults because you wrote malloc(n_weights * sizeof(double)) instead of malloc((n_weights + 1) * sizeof(double)), forgetting the bias. It’s also disgustingly easy to write algorithms that compile, run, and fully look like they work but, e.g., fail to update biases in hidden layers.

      So far, no luck. That’s after checking the obvious: was the backprop correctly computing the gradient? Yes. Was the learning rate reasonable? Yes. I had about 600 dimensions and tried to get it to learn a heuristic, and it came out with some overfitting, whilst spitting zeros (i.e., not learning anything, and minimizing error by guessing the mean) on validation data. It’s not surprising to me that I ran into this difficulty. The universal do-my-homework machine doesn’t exist, and the more one learns about neural nets, the more one realizes just how much data they need to train (and how much work it takes).

      A more hybrid approach– mixing automated function-learning in small spaces (< 20 dimensions) with prior knowledge about the game, in order to build a painstaking but reliable MCTS player– would have worked much faster for my particular problem; but I wanted it to teach me about the game. So far, no luck. But I’m far from out of ideas.

      Neural nets are good for images because, especially if we use convolutional topologies, we can specify translation and rotation invariance. Those are also high-dimensional spaces where a few features are meaningful, but hard to specify in “this implies that” terms. Pixel (383, 110) means absolutely nothing, except in relation to pixel (383, 109). Hence, convolution. Neural nets can also be good with games because the program can generate new data by playing against itself. It can still do a bad job of exploring the strategy space (reinforcement learning is far from a solved problem), but the classical danger of overfitting to a small training set isn’t there; your training set is effectively infinite (although, if it explores the strategy space poorly, biased).

      Are NNs the best function approximators in general? Sadly, I think they’re often not. That’s something I’m exploring: sparse neural nets and evolutionary algorithms. With fully-connected layered networks, not only is there a risk of overfitting (which is a fancy word for “finding a too-complex function that fits the training data beautifully, but performs badly on new data”), but it’s computationally expensive to update tens of thousands of weights each step and evaluate a function where you know that much of the work cancels out other work.

      Real talk: If you have “only” 1,000,000 training points in a noisy space, and 10,000 input dimensions– that’s 75 GB of data– even ordinary linear regression is likely to overfit. (You can regularize, but now you have to tune the hyperparameters.) You can’t reasonably fit more than a few hundred features– which may be quadratic interaction terms or non-linear transfer functions, but won’t all be transforms of linear combinations of transforms of linear combinations of transforms of …– and neural nets get you nowhere in picking the right ones. Neural nets try to regularize with early stopping (quitting “optimizing” when performance falters on a validation set) but that often means that you’re at risk of quitting when it has barely learned anything (more than a linear model would) if only because it’s starting to learn “wrong” things.

      For an example where neural nets do a great job of fitting training data, but don’t make sense outside of it, use the XOR set, {(0, 0) -> 0, (0, 1) -> 1, (1, 0) -> 1, (1, 1) -> 0}. What’s the “best” interpolation of the never-seen value of (0.5, 0.5)? I would say that it’s 0.5; the simplest interpolation function is a quadratic with a saddle-point there, in the middle. You want the simplest, “tightest”, function for interpolation. But train a neural net, and you’ll quickly get one that gets those four points perfectly; but as for (0.5, 0.5)? All bets are off. Traditional neural network applications end up needing to use multiple tiers of segregated data– training, topology validation, method validation, a true test set; then, finally, independent tests or a production environment which are the truest of true test sets– but this has its own issues. If nothing else, it takes forever unless you’re at a Facebook-level quantity of resources (computation and data).

      To show that your neural network is not “trained to discriminate” you simply have to declare the function you tried to approximate and

      • provide (and thus safely store, for years) the whole data sets used to calibrate the network, including the one used for cross-validation;
      • provide (and thus safely store, for years) the initial values you chose for each topology you tried, along with the topologies themselves;
      • disclose the full source code, with documentation;
      • hire an independent team of experts to verify the whole application.

      The issue here is that, if a neural network’s parameters and topology are known, it’s relatively easy to concoct an adversarial example that it gets wrong. Imagine explaining the brilliance of your self-driving car in court, and then watching as the opposing counsel presents a “Speed Limit 70” sign that it mistakes for a deer about to run across the road.

      I think that there’s a metacognitive issue here with the hype around “machine learning”. A few people in the world (I’m not one of them) deeply understand multilayer, fully-connected networks– what they can do, what they can’t do– in a way that requires years of experience, and all the tricks necessary to make them work on noisy data in the real world. Many more people– especially in the anti-intellectual environment of the corporate world, where taking time to understand things is the sin of “learning at work”– see NNs as an all-purpose do-your-homework algorithm (as opposed to all that “boring” stuff in the Hastie book that every data scientist calls “The Bible” and, like the Bible, few who claim to adore it have actually read) that’s more mystical than boring logistic regression coupled with domain-specific feature engineering. Neural nets are great at some things but, for many problems, they require either a massive amount of handholding or an extreme supply of computing resources to train effectively.

      1. 1

        The issue here is that, if a neural network’s parameters and topology are known, it’s relatively easy to concoct an adversarial example that it gets wrong. Imagine explaining the brilliance of your self-driving car in court, and then watching as the opposing counsel presents a “Speed Limit 70” sign that it mistakes for a deer about to run across the road.

        Good point: an adversarial example easily falsifies a declaration about the target function of the neural network.
        This has obvious applications in digital forensics, but even worse, it could be used as a weapon!

        Suppose a self-driving car is fooled by a constructed toy that can easily be removed after the crash.

        However, I don’t foresee a future where insurers require that an artificial intelligence drive your car.
        Not because of technological issues, but because automobile manufacturers would, by default, be directly liable for deaths that occur in their cars.

        And I guess they don’t want this to happen.