I worked for a company that was developing this technology with an eye towards online classrooms, to prevent a user from giving their credentials to a brainier friend on exam day. We also relied heavily on dwell time and gap time in our statistical analysis.
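For anyone unfamiliar with the terms: dwell time is how long a key is held down, and gap (or "flight") time is the interval between releasing one key and pressing the next. A minimal sketch of extracting both from hypothetical keystroke events (the event format here is made up for illustration):

```python
# Hypothetical keystroke events: (key, key_down_ms, key_up_ms).
events = [
    ("t", 0, 95),
    ("h", 140, 230),
    ("e", 260, 340),
]

# Dwell time: how long each key is held down.
dwells = [up - down for _, down, up in events]

# Gap (flight) time: release of one key to press of the next.
gaps = [events[i + 1][1] - events[i][2] for i in range(len(events) - 1)]

print(dwells)  # [95, 90, 80]
print(gaps)    # [45, 30]
```

A per-user profile is then typically a distribution over these values (often broken down per key pair), which is what made the new-keyboard and drunk-typing cases so disruptive.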
We ran into two problems that were very difficult to get around. One was that when a user bought a new keyboard, or a new laptop, or visited their parents for the weekend, or got drunk, their typing profile was completely unrecognizable. From our perspective, there was no difference between sometimes doing your homework from your girlfriend’s house, and sometimes having your girlfriend do your homework.
The second was the base rate fallacy. We had similar accuracy to that described in the article: after a surprisingly brief training period, we could confirm or reject users some 95% of the time. However, since we monitored thousands of students, and the overwhelming majority of them were honest, we had to choose between issuing dozens of false accusations for every actual cheater, or letting all but the sloppiest cheaters get away with it.
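To make the base rate problem concrete, here is a back-of-the-envelope Bayes calculation. The 1% cheater prevalence and the symmetric 95% accuracy are illustrative assumptions, not our actual figures:

```python
# Illustrative numbers only: assume 1% of students cheat and the
# classifier is 95% accurate in both directions.
students = 10_000
prevalence = 0.01           # P(cheater)
sensitivity = 0.95          # P(flagged | cheater)
false_positive_rate = 0.05  # P(flagged | honest)

cheaters = students * prevalence
honest = students - cheaters

true_flags = cheaters * sensitivity           # real cheaters caught
false_flags = honest * false_positive_rate    # honest students accused

precision = true_flags / (true_flags + false_flags)
print(f"{false_flags:.0f} false accusations vs {true_flags:.0f} real catches")
print(f"P(actual cheater | flagged) = {precision:.2f}")
```

With these numbers, only about 16% of flagged students are actually cheating: roughly five false accusations for every real catch, even with "95% accuracy."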
I am not sure how I would get around such difficulties, especially if the goal is to get a fully automated yes/no judgment.
Additionally, you can stop yourself from making purchases while drunk!
The other issue I could see is that small changes, like a bandaid on a finger, would throw the profile off completely.