Yes! I wanted more work in this area. That and solvers are the only way to make it practical. I figure it's going to be a mix of experts laying supporting theories (eg on arrays, floats) with automation connecting them. After it starts working, we can ask that folks not get funded unless they use the solvers and test generators to speed up the work.
Indeed! I wish the authors had included analysis on how much this tool could boost overall productivity. It’s hard to judge how useful this would be as a guided proof assistant.
72 of the 79 generated proofs were <= 10 proof steps, which is impressive given that each proof consists of ~100 commands! However, the commands within the 10-step proofs accounted for only ~20% of the ~12,000 total proof commands. Why not break the longer (and ostensibly more sophisticated) proofs into shorter ones (as CoqGym did)?
That being said, CoqGym and PB9K1 reaching 50-200% of CoqHammer's solve rate is impressive: CoqHammer's backends have had years of commercial development (preceded by decades of research). The authors are probably busy improving their algorithms and planning more sophisticated analysis for later. Hopefully we'll see more qualitative discussion in the future.
Well, your comment tells me there are at least straightforward avenues for improvement, like breaking things up. They should try to merge those ideas first just to get better results. That gets more funding and bides time while they come up with the real leap a year or two later. ;) Also, I'd rather they apply it to more proof efforts to generalize it. The nature of compilers might make them easier to do this on than, say, OS kernels or data structures mutating graphs.
One problem that may kick in is that I hear tools like Coq change in ways where old proofs don't work on newer versions. If true, that might make building a training set of exemplar proofs more difficult. If there's no standardization, they might need converters that transform older proofs into the new representation. I don't know if that would be just a transpiler or would require understanding the logic to the point that it's improbable/impossible.
Implementation: https://github.com/UCSD-PL/proverbot9001