Unfortunately, papers like this contain a critical flaw. They assume that AI systems are driven by a certain class of control mechanism. This class doesn’t have a formal name, but we could loosely call it the Goal Stack With Utility class of mechanisms. Its central feature is that the system tries to maximize a reward signal, and in doing so acts intelligently.
The problem is that such mechanisms only work on toy problems. They do not scale, and arguably they never will scale up to the point where they can drive a real AI (an AGI). Part of the reason they don’t scale is that they break down in precisely the ways described in papers like this one, except that they do so all the time, in such a way that (a) they continually exhibit stupidity, and (b) during learning, they add corrupted (false) new knowledge to their system, so that the stupidity increases over time.
So what this type of paper does is assume that the AI never suffers these failures during its formative period, and instead blossoms into a superintelligent and therefore dangerous AI. Then the authors implicitly assume that the AI suddenly reverts to type once it is an adult, and they point out all the ways in which it could be dangerous. The critical flaw, as you can see, is that the dangers they point to would already have prevented the AI from becoming intelligent in the first place.
I published all of this in a 2014 AAAI Symposium paper called The Maverick Nanny With A Dopamine Drip.
Can you link to your paper?
Apologies for forgetting the link. It can be found here: https://www.aaai.org/ocs/index.php/SSS/SSS14/paper/viewFile/7752/7743