Unfortunately, papers like this contain a critical flaw. They assume that AI systems are driven by a certain class of control mechanism. This class doesn’t have a formal name, but we could loosely call it the Goal Stack With Utility class of mechanisms. Its central feature is that the system tries to maximize a reward signal, and in so doing acts intelligently.
The problem is that such mechanisms only work on toy problems. They do not scale, and arguably they never will scale up to the point where they can drive a real AI (an AGI). Part of the reason they don’t scale is that they break down in precisely the kinds of ways described in papers like this one … except that they do this all the time, in such a way that (a) they continually exhibit stupidity, and (b) during learning, they add corrupted (false) new knowledge to their system, which makes the stupidity increase over time.
So, what this type of paper does is to assume that the AI never suffers these failures during its formative period, but instead blossoms into a superintelligent and therefore dangerous AI. Then, the authors implicitly assume that the AI suddenly reverts to type when it is an adult … and they point out all the ways in which the AI could be dangerous. The critical flaw, as you can see, is that these dangers they point to would already have prevented the AI from becoming intelligent in the first place.
I published all of this in a 2014 AAAI Symposium paper called The Maverick Nanny With A Dopamine Drip.
Can you link to your paper?
Apologies for forgetting the link. It can be found here: https://www.aaai.org/ocs/index.php/SSS/SSS14/paper/viewFile/7752/7743
This is a neat and fairly thorough exploration of the various ways that an AI can fail to be safe, covering issues like undesired side effects during plan implementation, unexpected shortcuts in maximizing rewards, and so forth. It also does all of this without delving into crazy Singularity nonsense, instead preferring the test case of a humble if enterprising office cleaning robot.
OpenAI’s blog post has a nice high-level summary of the research areas discussed in the paper.