1. 14

So far I have about 200 responses from Reddit, and some clear patterns are already emerging. I look forward to sharing the results once the data is cleaned up and there's more of it.

If this is not appropriate for lobste.rs, I apologize.

I’m doing this because I’m curious how effective a machine-learning-driven password brute-force attack would be. If there is some inherent pattern to human randomness in keyboard input, I think that could be a big problem.


  2. 4

    Humans are not very random, but I think any keyboard patterns are a result of the keyboard layout and the typing method. People who type in a similar way should have a similar way of mashing keys. This might be fun for you: http://www.loper-os.org/bad-at-entropy/manmach.html

    1. 3

      I know my own failings well enough to not trust my brain as a source of randomness; I use computer assistance to generate passwords.

      For example, in about half these fields as I was taking this, I inadvertently typed something highly regular - my first attempt at “4 random letters” was “azzz”. I then went back and did those ones over. I think this happens because the task of “plop your fingers down without paying attention to where they’re going” requires disengaging higher reasoning, and … well, the brain is a distributed system. The part of me that was bored with the task just wanted to press any keys at all to get it over with.

      I also found it particularly difficult to type the exact required number of characters without consciously picking each character. The survey seemed well-designed to draw out whether that interference results in worse randomness.

      From what I understand of the mind and the brain, I expect it’s about the same, but it’s certainly possible this is an easier task for singlets, which I imagine most lobste.rs users are. I’ll be interested in whether the data supports that.

      1. [Comment removed by author]

        1. 2

          Each class of your “passwords” has a different-sized alphabet. I would say the sample size needs to be some moderate multiple of the alphabet size, say 20x. Since the survey is combined, you should use your largest alphabet, which seems to be 26 letters + 10 digits + 32 special characters (?) = 68 symbols, giving 68 * 20 = 1360 samples.
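          The arithmetic above is just the 20x rule of thumb applied to the combined alphabet; a trivial sketch (the 32-special-character count is a guess, as noted in the comment):

```python
# Sample-size estimate: roughly 20 samples per symbol of the largest alphabet.
# The 32-special-character count is an assumption, not a confirmed figure.
letters, digits, specials = 26, 10, 32
alphabet_size = letters + digits + specials  # 68 symbols
samples_needed = 20 * alphabet_size
print(samples_needed)  # 1360
```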

          The use case I had in mind for your data is a simple histogram of the distribution of each letter per position. So you have, as the basic data, a heat map of character vs. position. Truly random input is even, while non-randomness shared across people will show up as splotchiness, which can be quantified with mutual-information (MI) computations.
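          As a sketch of that analysis (function names are my own; `mutual_information` returns 0 bits for a perfectly even heat map and grows as the map gets splotchier):

```python
import numpy as np

def position_heatmap(strings, alphabet):
    """Count matrix: rows = characters in `alphabet`, columns = positions."""
    length = max(len(s) for s in strings)
    idx = {c: i for i, c in enumerate(alphabet)}
    counts = np.zeros((len(alphabet), length))
    for s in strings:
        for pos, ch in enumerate(s):
            if ch in idx:
                counts[idx[ch], pos] += 1
    return counts

def mutual_information(counts):
    """MI between character and position, in bits.

    A perfectly uniform heat map gives 0; any character/position
    dependence ("splotchiness") gives a positive value.
    """
    p = counts / counts.sum()
    px = p.sum(axis=1, keepdims=True)  # marginal over characters
    py = p.sum(axis=0, keepdims=True)  # marginal over positions
    nz = p > 0                         # skip zero cells (0 * log 0 = 0)
    return float((p[nz] * np.log2(p[nz] / (px @ py)[nz])).sum())
```

For example, responses `["ab", "ba"]` over the alphabet `"ab"` produce an even heat map (MI = 0), while `["ab", "ab"]` concentrate each character in one position (MI = 1 bit).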

          It’s cool how even a simple experiment like this can yield rich data!

          1. 1

            I would also look at 2-grams and 3-grams - I’d use a histogram for those as well. The keyboard layout definitely has some effect on this task; hopefully it’s measurable.
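            A minimal n-gram histogram along those lines, using `collections.Counter` (the sample strings below are made-up key-mash responses, not real survey data):

```python
from collections import Counter

def ngram_counts(strings, n):
    """Histogram of overlapping n-grams pooled across all responses."""
    counts = Counter()
    for s in strings:
        for i in range(len(s) - n + 1):
            counts[s[i:i + n]] += 1
    return counts

samples = ["asdf", "sdfg", "asdg"]  # hypothetical home-row mashes
print(ngram_counts(samples, 2).most_common(3))
# [('sd', 3), ('as', 2), ('df', 2)]
```

If adjacent-key 2-grams like these dominate the real data, that would be a direct signature of keyboard layout in the responses.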