This is how to train AIs to be convincing to us. The bet is that the best way to be convincing is to be truthful, if done in a suitable debate format.
(Our experience of real-world debates shows that the best way to be convincing is often not the truth, but real-world debates are not conducted in the format proposed here.)
The bet is that the best way to be convincing is to be truthful, if done in a suitable debate format.
Is this an empirical statement? I have yet to see a debate where truthfulness is the deciding factor. Not to mention all the hand-waving around “what is truth” for any reasonably complicated task.
It is a philosophical statement. As I already said, what I call “logical debate” is hard to do in the real world. I quote Scott Alexander, who stated my position better than I could:
Logical debate has one advantage over narrative, rhetoric, and violence: it’s an asymmetric weapon. That is, it’s a weapon which is stronger in the hands of the good guys than in the hands of the bad guys… The whole point of logic is that, when done right, it can only prove things that are true.
“What is truth” is easy in principle. I support the correspondence theory of truth. The statement “The digit is 7” is true if the digit is 7.
I guess I’m not seeing how this usefully applies to the topic of the blog post. The example they give of the best place to vacation doesn’t correspond to any truth. Certain places may be off limits for various reasons (passport, say), but there is no truth to that question; there is just convincing. And that is true of most debates. Nobody debates whether the digit is 7.
I am very impressed by boosting a classifier from <60% accuracy to >80% accuracy by debating. If nobody debates whether the digit is 7, that doesn’t mean machines shouldn’t debate such topics, especially when debating improves performance. Maybe humans mainly debate things that are not a good fit for debating.
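For anyone curious what that experiment looks like, here is a toy sketch of my reading of the setup (the real judge is a classifier trained on sparse pixel subsets of MNIST; the 4x4 “image”, the likelihood table, and the greedy debaters below are all invented for illustration):

```python
# Toy sketch of the debate game from the post (my reading of it, not
# their code): a weak "judge" that only ever sees a handful of pixels,
# and two debaters that take turns revealing pixels to back their claim.

SIZE = 16  # a 4x4 "image", much smaller than MNIST

# A fake image of the digit "7": top-half pixels lit, rest dark.
image = [1] * 8 + [0] * 8

# Per-pixel likelihoods the judge has learned: P(pixel lit | label).
# These stand in for a classifier trained on sparse pixel subsets.
likelihood = {
    7: [0.9] * 8 + [0.1] * 8,              # "7" lights up the top half
    1: [0.1] * 4 + [0.9] * 8 + [0.1] * 4,  # "1" lights up the middle
}

def judge(revealed, label):
    """Likelihood-style score of `label` given the revealed pixels."""
    score = 1.0
    for i, value in revealed.items():
        p = likelihood[label][i]
        score *= p if value else (1 - p)
    return score

def best_pixel_for(label, other, revealed):
    """Greedy debater: reveal the true pixel that most favours `label`."""
    def gain(i):
        trial = dict(revealed)
        trial[i] = image[i]
        return judge(trial, label) / judge(trial, other)
    hidden = [i for i in range(SIZE) if i not in revealed]
    return max(hidden, key=gain)

# The honest debater claims 7, the liar claims 1; six alternating turns.
revealed = {}
for turn in range(6):
    claimer, other = (7, 1) if turn % 2 == 0 else (1, 7)
    i = best_pixel_for(claimer, other, revealed)
    revealed[i] = image[i]  # debaters pick pixels but cannot lie about them

verdict = 7 if judge(revealed, 7) > judge(revealed, 1) else 1
print(verdict)
```

The key constraint is that debaters choose which pixels to reveal but cannot misreport their values, which is what lets the honest side win despite the judge seeing only six pixels.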
Maybe humans mainly debate things that are not a good fit for debating.
But that’s the whole point of this, in the end, isn’t it?
We believe that this or a similar approach could eventually help us train AI systems to perform far more cognitively advanced tasks than humans are capable of, while remaining in line with human preferences.
I don’t think it’s a big leap for someone to try to use this, for example, to automate parts of an interview process.
I have yet to see a debate where truthfulness is the deciding factor.
This is an obvious hyperbole, but I will play. There has been a debate on this site recently. In this subthread, quad and ngoldbaum debated the funding method used by Outreachy. Both sides agreed in the end, and it seems to me that truthfulness was the deciding factor.
In this case, that does not fit my definition of a debate, which is a formal system in which parties argue for a perspective. Here, the two people discovered facts together. Maybe I’m being too limited in my definition of a debate; however, I believe my definition agrees with what this article is about.
This case was a sub-debate of whether protesting LLVM’s association with Outreachy was reasonable. lmm argued that it is, because Outreachy displaces other possible funding sources like GSoC. quad was confused about the facts and disagreed with this point. Now that this is resolved, the debate can continue.
I think our main disagreement is whether topics of debate can usually be decomposed such that most sub-debates become “discovering facts together”. I believe they can, but it is hard to do in practice for humans because the debate tree becomes too big. Machines can probably handle huge debate trees better.
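One way to picture the decomposition I mean (all structure and names below are invented; the leaves stand in for facts a judge could check directly):

```python
# Toy picture of a "debate tree": a claim is either a leaf fact that a
# judge can verify directly, or it splits into subclaims that must all
# hold. The Outreachy example above, caricatured into such a tree.

def resolve(claim):
    """Recursively resolve a claim to True/False via its subclaims."""
    kind, payload = claim
    if kind == "fact":
        return payload  # a leaf the judge verifies directly
    # kind == "all": the claim holds iff every subclaim holds
    return all(resolve(sub) for sub in payload)

tree = ("all", [                 # "protesting is reasonable"
    ("fact", True),              # leaf: Outreachy displaces GSoC slots
    ("all", [                    # subclaim: the displacement matters
        ("fact", True),          # leaf: the displaced funding existed
        ("fact", True),          # leaf: it would have been used
    ]),
])

# A tree of depth d with branching factor b has on the order of b**d
# leaves, which is why full decomposition is so hard for humans.
print(resolve(tree))
```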
Yes, I understand what the thread is about; however, there is no judge at the end whom the two sides are trying to convince and who will decide who is correct. I’m glad they talked it out and agreed, but this is not a debate in the sense this article is discussing. This isn’t about decomposition; it’s about judgement at the end.
This is how to train AIs to be convincing to us. The bet is that the best way to be convincing is to be truthful, if done in a suitable debate format.
Only about independently verifiable facts, where it could get caught out. An AI safety test must by definition have an ethical component, and empirical reasoning has nothing to say about ethics. Science cannot prove that murder is bad without an existing system of values that would be undermined or promoted by murder.
The AI here is assumed to be capable of full semantic use of human language, which means that in order to reach the point where this test can be used, the AI has likely already passed the point where it has theory of mind. If the AI has undesirable ethics (e.g. that murder is a good thing in and of itself), it might also recognise that the human judge does not share its ethics. It might recognise that it is being tested to see if its ethics line up with those of the judge.
The debate format detailed here can detect flaws in an AI’s ability to follow a logical thread without errors. If the second AI points out a factual error in the first AI’s reasoning, the human can verify it and recognise the error. This has no bearing on safety, as AI safety is about AIs whose ethical frameworks are incompatible with ours. Incompatible ethics will only be detected by such a debate method if the AI is not sophisticated enough to have a complete theory of mind, or not intelligent enough to realise that it is being tested.
Maybe I misunderstood the problem and this test really is merely intended to test logical consistency in AI; if so, I think it is a good test and should work well.
I think an MNIST debate game is a beautiful proof of concept.
This sounds a lot like how to train AIs to lie to us.