This is fantastic, and I wish it had existed when I was first learning Japanese.
It also seems like something which was really fun to program!
I wish it were standard practice to build tools for all sorts of language complexities. Not everyone learns this way, but when I’m given this kind of tool I will play with it for hours, and learn a ton in the process. Take rendaku in Japanese, for instance. I know I still get it wrong half of the time, but the first time I really “got it” was reading this semi-guide, semi-blog post about it.
This is very cool. Now I wonder which other natural languages we could do this for. Malay might be a good candidate.
My guess would be that Korean is another likely candidate, since at least simple sentences can be translated word-for-word from Japanese to Korean (the way you might’ve pretended to write Spanish or French when you were a kid). The Ryukyuan languages might also be good candidates, since I know they’re pretty close to Japanese.
But that might honestly be it, if those even do work. I can think of a lot of other languages with very regular grammars that might be candidates (e.g. Turkish), but for every one I can think of, at least some feature (in the case of Turkish, vowel harmony) messes it up.
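To make the vowel harmony point concrete, here is a minimal sketch of the Turkish plural suffix, which is -ler after a front vowel and -lar after a back vowel. The function name `pluralize` is just made up for illustration, and the sketch assumes lowercase input and ignores loanword exceptions (e.g. saat → saatler), so treat it as a toy rather than a real morphological analyzer.

```python
# Toy illustration of Turkish two-way vowel harmony for the plural suffix.
# Assumptions: lowercase input, native-style words, no loanword exceptions.
FRONT_VOWELS = set("eiöü")
BACK_VOWELS = set("aıou")


def pluralize(noun: str) -> str:
    """Append -ler or -lar depending on the last vowel in the noun."""
    for ch in reversed(noun):
        if ch in FRONT_VOWELS:
            return noun + "ler"
        if ch in BACK_VOWELS:
            return noun + "lar"
    raise ValueError(f"no vowel found in {noun!r}")


if __name__ == "__main__":
    for word in ["ev", "kitap", "göz", "okul"]:
        print(word, "->", pluralize(word))
```

Even this one suffix needs to look back at the stem, so a purely word-for-word, fixed-token grammar does not fit as neatly as it might for the Japanese subset.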
Korean also has vowel harmony.
Chomsky originally tried to do this for all languages, but he started with English. Formal and generative grammars were invented for natural languages, but they kind of fall short of their goal. There’s a reason why computer scientists like formal grammars a lot more than linguists do.
Starting with English is either a sign of prudently testing to see if the hardest problem can be solved first or naively thinking that English is straightforward. Given that this was Chomsky, I’d imagine it was the former, although I’ve never seen that explicitly stated anywhere.
English is about as complicated as any other language. Chomsky started with English because it’s the language he knows best.
As evidenced by the post, you can’t really do this for Japanese: it only works for a very limited subset. That subset is so limited that you could probably do something similar for English.