
  2. 3

    My intuition for this stuff (also upsampling, style transfer, and many other image-to-image applications) is that the AI is basically a domain-specific decompression program. With any decompression program, we feed in a small amount of data (e.g. a run-length-encoded bitmap image) and get out more data (e.g. a normal bitmap image), but crucially the amount of information doesn’t increase: everything we see in the output was either already present in the input (perhaps in some encoded form), or is an artefact (e.g. the “blockiness” seen in JPEGs). The analogy to decompression isn’t so apparent when we’re turning x-by-y pixels into x-by-y pixels, since the amount of data stays the same, but the idea that we’re “decoding” the input, and that we can’t gain any information (since there’s nowhere for it to come from), is what I’m getting at.
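The run-length decoding mentioned above makes the “no new information” point concrete. A minimal decoder sketch (the `(count, value)` pair format is just an assumption for illustration) shows that every output pixel is copied straight from the input; the decoder adds redundancy, never information:

```python
def rle_decode(pairs):
    """Expand (count, value) run-length pairs into a flat pixel list.

    Every element of the output is copied from the input: decoding
    produces more data, but no new information.
    """
    out = []
    for count, value in pairs:
        out.extend([value] * count)
    return out

# Three runs expand to eight pixels, but nothing new appears:
print(rle_decode([(3, 0), (2, 255), (3, 0)]))
# [0, 0, 0, 255, 255, 0, 0, 0]
```

A learned image-to-image model plays the same role, except its “decoding rules” are the statistics of its training set rather than an explicit format.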

    What worries me with these learned systems is that they’re so specific to the domain that their artefacts are indistinguishable from real content. This is especially true for generative adversarial networks, where the generated data is “100% artefact”, and trained specifically to be indistinguishable from real inputs.

    The failure modes of these systems won’t be the things we’re used to from generic image processing, like “this bush looks blurry” or “the street sign has incorrect colours”. Instead we’ll get very plausible-looking images which turn out to have quite important problems, like “the image includes a human figure, but it should actually be a trash can” or “street signs are missing at several intersections”. This is very important to keep in mind when thinking of applications for this technology: when the information is sparse or ambiguous, the system will just make something up, and that will be indistinguishable from a real input. One obvious application of this “night vision” in particular is on attack drones, but that may be a very bad idea if the system “hallucinates” targets.

    An example of this which comes to mind is the “character substitution” problem on some scanners/faxes/copiers. The idea is to perform OCR on scanned documents so they can be compressed more easily (e.g. storing “123” instead of all the scanned pixel values of those digits). However, when there’s ambiguity, like a “7” which looks like a “1”, the OCR will pick whichever it thinks is correct (say “1”) and store it just like any other character, losing the information that it could have been something else. When the document gets printed out, a perfect, crisp “1” will appear, indistinguishable from all of the correctly recognised characters.
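The substitution failure can be illustrated with a toy example (the probabilities and names here are invented, not from any real OCR system): the recogniser sees a probability distribution over candidates, but storing only the winner throws the ambiguity away.

```python
def store_character(distribution):
    """Pick the most likely candidate character and store it.

    distribution: dict mapping candidate characters to probabilities.
    Only the argmax survives; the runner-up and both probabilities
    are discarded, so the stored "1" looks as certain as any other.
    """
    return max(distribution, key=distribution.get)

ambiguous_glyph = {"1": 0.55, "7": 0.45}  # nearly a coin flip
print(store_character(ambiguous_glyph))
# "1" -- printed back as a perfect, crisp digit, ambiguity gone
```

The fix isn’t a better argmax; it’s refusing to discard the distribution (or at least its confidence) in the first place.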

    1. 3

      As a general rule of thumb, I try to always store the confidence level or accuracy estimate of anything processed by machine. For example, working in this domain of computer vision, I might process images to denoise them and find contours, then use an SVM to classify what’s in the image and only store tags that reach, say, 0.9 confidence (out of 1.0). The important step is to store metadata, including the list of tags, with their confidence scores and anything else that pertains to accuracy, such as the exact kernel or model that was used. This doesn’t solve the problem, but it provides insight into what happened, how to recreate it, and what other output is now suspect.
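That rule of thumb can be sketched as a small record-building step; the tag names, scores, and model identifier below are invented for illustration, and the predictions would come from whatever classifier is actually in use:

```python
CONFIDENCE_THRESHOLD = 0.9  # only tags at or above this are kept

def tag_record(predictions, model_id, threshold=CONFIDENCE_THRESHOLD):
    """Keep only high-confidence tags, but store each score and the
    exact model identifier alongside them, so results can be audited,
    recreated, or invalidated wholesale if the model proves faulty.

    predictions: list of (tag, confidence) pairs from some classifier.
    """
    return {
        "model": model_id,
        "tags": [
            {"tag": tag, "confidence": conf}
            for tag, conf in predictions
            if conf >= threshold
        ],
    }

record = tag_record([("trash_can", 0.93), ("person", 0.41)],
                    model_id="svm-rbf-v2")
print(record["tags"])
# [{'tag': 'trash_can', 'confidence': 0.93}]
```

Keeping the score and model next to every tag is what makes the later question “which stored outputs are now suspect?” answerable at all.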