This is not an original puzzle, sadly. It’s a version of a recreational mathematical game which, at least in the English-speaking world, appears to be called “Deleting Sheep”. An example of that game is described here.

I can’t point you at a specific website that has this specific game but I remember a lot of variations from when I was a child. I definitely remember this one from a very stupid test back in third grade.

But is this actually an original puzzle? ChatGPT thinks so!

Read: When you ask ChatGPT “And to the best of you knowledge this type of puzzle does not currently exist?”, it generates text to the effect of ‘Not as far as I am aware’. Which only tells you something about the prompt + the corpus ChatGPT was trained on. Any perceived correspondence with reality cannot be trusted until investigated.

So 1) it’s not a new gametype, as found by @x64k, but 2) it is still impressive that you could generate this with ChatGPT!

Now the interesting part: is this a good implementation of the puzzle? The puzzle generation code assigns each cell a random value from 1-9¹, then randomly marks each cell “good” or “bad” and calculates the sums of the “good” cells. Nothing checks that those sums have only one solution, so puzzles can have multiple solutions, i.e. be “cooked”. It’s unlikely to generate a cooked puzzle, but it’s still possible. This is a very, very bad thing! Cooked puzzles are a huge faux pas. The implementation still happens to accept an alternate solution, but it’ll give you “wrong” information for hints and error-checking.

(¹ Or from -19 to 19 if the board is 10x10 or larger, though the comment incorrectly says 8x8 or larger.)
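To make the description above concrete, here is a minimal Python sketch of that kind of generator. It is not the code ChatGPT produced: the function name `generate_puzzle`, the 50/50 keep probability, and the exact value ranges (including whether 0 can appear on large boards) are assumptions for illustration.

```python
import random

def generate_puzzle(n, rng=None):
    """Random cells, a random keep/delete mask, and the sums that mask implies."""
    rng = rng or random.Random()
    # Assumption taken from the footnote: big boards draw from -19..19.
    lo, hi = (-19, 19) if n >= 10 else (1, 9)
    board = [[rng.randint(lo, hi) for _ in range(n)] for _ in range(n)]
    # Assumption: each cell is kept ("good") with probability 1/2.
    keep = [[rng.random() < 0.5 for _ in range(n)] for _ in range(n)]
    row_sums = [sum(board[i][j] for j in range(n) if keep[i][j]) for i in range(n)]
    col_sums = [sum(board[i][j] for i in range(n) if keep[i][j]) for j in range(n)]
    return board, keep, row_sums, col_sums

board, keep, row_sums, col_sums = generate_puzzle(3, random.Random(42))
```

The thing to notice is what is missing: the mask is only ever used to compute the sums, and nothing asks whether a different mask would produce the same sums.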

The other limitation: random generation doesn’t lead to “interesting” puzzles. This is harder to explain well. A good logic puzzle has a “vocabulary” of repeated patterns and strategies you can use to chip away at the problem. It looks hard but there’s a place to start working, and that opens up your options. Really good puzzles also have a global structure, so independent-looking regions are actually linked, and seeing those relationships is an important part of doing well. Like when you realize that a 3 in the upper left propagates to a 6 in the lower right, and then that would propagate somewhere else and give you a conflict, so you can rule out the original move.

I usually start seeing some heuristics after two or three puzzles in the same genre. I’ve done a bunch of these sumpletes and there’s barely anything to go on. A big part of the problem is that often there will be 4+ subsets that satisfy a sum, which means you get barely any information from pinning down any numbers. I just generated a 7x7 and got this row:

8 4 9 4 3 8 3 | 20

There are eight different satisfying solutions! This makes solving a random sumplete a complete slog, where you can’t do much more than guess and backtrack. And even guesses don’t tell you much, since even completing a whole row gives you no more information about the rest of the board.
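The count for that row is easy to check by brute force. This is a quick sketch, not part of the game's code, and `satisfying_subsets` is a name made up here. It counts by position, so the two 8s (and the two 4s and two 3s) count as different cells.

```python
from itertools import combinations

def satisfying_subsets(row, target):
    """Return every set of positions in `row` whose values sum to `target`."""
    positions = range(len(row))
    return [
        combo
        for size in range(len(row) + 1)
        for combo in combinations(positions, size)
        if sum(row[i] for i in combo) == target
    ]

row = [8, 4, 9, 4, 3, 8, 3]
solutions = satisfying_subsets(row, 20)
print(len(solutions))  # 8
for combo in solutions:
    print([row[i] for i in combo])
```

Six of the eight use the 9 (as 9+8+3 or 9+4+4+3), and the other two are 8+8+4.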

I think there could be interesting sumplete games, but this is also a reminder of why most good logic puzzles are still handcrafted.

Now I wanna see if I can get minizinc to generate cooked puzzles.

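Okay, here’s a very crappy minizinc script:

include "globals.mzn";
int: dim = 3;
% The grid of cell values.
array[1..dim, 1..dim] of var 1..9: board;
% Two candidate solutions: x[i, j] / y[i, j] is true when that cell is kept.
array[1..dim, 1..dim] of var bool: x;
array[1..dim, 1..dim] of var bool: y;
% Target sums for each row and each column.
array[1..dim] of var int: hsums;
array[1..dim] of var int: vsums;
% Both x and y must hit every row sum...
constraint forall (i in 1..dim) (hsums[i] = sum([board[i, j] * x[i, j] | j in 1..dim]));
constraint forall (i in 1..dim) (hsums[i] = sum([board[i, j] * y[i, j] | j in 1..dim]));
% ...and every column sum.
constraint forall (j in 1..dim) (vsums[j] = sum([board[i, j] * x[i, j] | i in 1..dim]));
constraint forall (j in 1..dim) (vsums[j] = sum([board[i, j] * y[i, j] | i in 1..dim]));
% The two solutions must differ, which is what makes the puzzle cooked.
constraint x != y;
% Require exactly 8 distinct values among the cells.
constraint nvalue(8, array1d(1..(dim*dim), board));
solve satisfy;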

And here’s a puzzle it spat back:

2 8 6 | 8
7 4 3 | 7
5 4 9 | 9
- - -
7 8 9
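For anyone who wants to confirm that this board really is cooked, here is a brute-force check in Python. It is only a sketch for tiny boards (it tries all 2^(n*n) keep/delete masks), and `count_solutions` is a name invented for this example.

```python
from itertools import product

def count_solutions(board, row_sums, col_sums):
    """Count keep/delete masks that reproduce every row and column sum."""
    n = len(board)
    count = 0
    for bits in product((0, 1), repeat=n * n):
        # Reshape the flat bit tuple into an n x n mask (1 = keep the cell).
        keep = [bits[i * n:(i + 1) * n] for i in range(n)]
        rows_ok = all(
            sum(board[i][j] * keep[i][j] for j in range(n)) == row_sums[i]
            for i in range(n)
        )
        cols_ok = all(
            sum(board[i][j] * keep[i][j] for i in range(n)) == col_sums[j]
            for j in range(n)
        )
        if rows_ok and cols_ok:
            count += 1
    return count

board = [[2, 8, 6], [7, 4, 3], [5, 4, 9]]
print(count_solutions(board, [8, 7, 9], [7, 8, 9]))  # 2
```

The two solutions are "keep only the 8, 7 and 9" and "delete exactly the 8, 7 and 9", so every cell differs between them.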

I’ve forgotten both how much fun and how frustrating minizinc can be.

Oh, wow! It didn’t even occur to me to look at the code ChatGPT generated!

I think the code it generated further illustrates the current limitations of GPT-based models. The easiest way to come up with a satisfying puzzle of this kind is to mangle a magic square – it’s one of the many games based on them. You can do that according to a nice pattern, thus ensuring you don’t wind up with a frustrating puzzle that you can only complete by guessing and backtracking. And there are a bunch of well-known analytical methods for generating magic squares of (almost) arbitrary size, most of which allow you to “bake in” some patterns in the initial magic square itself – most of them are pattern-based in the first place.

I suspect that, properly “guided”, transformer-based models could kind of figure out how to generate a puzzle like this from a magic square, but are highly unlikely to come up with this faster and more satisfying implementation when asked to write code that generates a puzzle like this.
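As one example of the pattern-based constructions mentioned above, here is a sketch of the classic Siamese method, which builds a magic square for any odd size. It covers only the magic-square step; how to then "mangle" the square into a puzzle is not spelled out in the comment and is not attempted here. The function name is made up for this example.

```python
def siamese_magic_square(n):
    """Build an n x n magic square for odd n using the Siamese method."""
    if n < 1 or n % 2 == 0:
        raise ValueError("the Siamese method only works for odd n")
    square = [[0] * n for _ in range(n)]
    row, col = 0, n // 2  # start in the middle of the top row
    for value in range(1, n * n + 1):
        square[row][col] = value
        # Default move: up one and right one, wrapping around the edges.
        next_row, next_col = (row - 1) % n, (col + 1) % n
        if square[next_row][next_col]:
            # That cell is taken, so drop down one row instead.
            next_row, next_col = (row + 1) % n, col
        row, col = next_row, next_col
    return square

for line in siamese_magic_square(3):
    print(line)
# [8, 1, 6]
# [3, 5, 7]
# [4, 9, 2]
```

Every row, column and diagonal sums to n(n²+1)/2, which is 15 for the 3x3 case.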

This game (via HN) is pretty much the same, from 2019.

Also not that much fun I think, and I like games like this usually.

Russian cosmism, but the only person we bring back is Martin Gardner as a forgetful AI.

Before you ask ChatGPT anything, assume it’s a drunken plagiarist with pretty good spelling.

This is amazing. My main frustration with ChatGPT is that there appears to be no way to get it to read enough of my code to, e.g., edit it.

That said, this seems to be a good way to quickly prototype something - it certainly seems like this would be a good basis for a no-code service to develop interactive mock-ups and prototypes, but again being able to show it something existing would be more powerful.

Feature proposals:

When a row or column is “done”, clicking on the line number will automatically mark the remaining unmarked cells as “required”. (Or just do this automatically.)

When the remaining numbers don’t sum high enough to reach a row/column, mark the number in red.

Show remaining count, not total (optional?)

New puzzle button

Undo button, undo key (U)
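The first two proposals boil down to one check per row or column. Here is a rough Python sketch of that check, not tied to the actual game code: the state names (`"keep"`, `"delete"`, `None` for unmarked) and both function names are hypothetical, "done" is read as "the cells not yet deleted sum exactly to the target", and it assumes all cell values are positive (the 1-9 boards).

```python
def line_status(values, states, target):
    """Classify a row/column given per-cell states: 'keep', 'delete' or None."""
    # Largest sum still reachable: everything that has not been deleted.
    reachable = sum(v for v, s in zip(values, states) if s != "delete")
    if reachable < target:
        return "unreachable"  # too much deleted: show the target in red
    if reachable == target:
        return "done"         # every unmarked cell has to be kept
    return "open"

def auto_complete(states):
    """Mark all remaining unmarked cells as required once a line is done."""
    return ["keep" if s is None else s for s in states]

values = [2, 8, 6]
states = [None, "delete", None]
print(line_status(values, states, 8))  # done
print(auto_complete(states))           # ['keep', 'delete', 'keep']
```

On the larger boards with negative numbers the "unreachable" test would need a different bound, since deleting a negative cell raises the sum.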

Actually, could you put it on GitHub?