1. 4

  2. 4

    I found another similarity measure useful that is fast to compute. It is based on the letter pairs present in both the key and the search string. You can find it explained here: strsim.

    This tool implements another algorithm that is relatively robust against swapped characters and small additions and deletions. The algorithm builds for each string a set of all adjacent characters (a set of pairs) and compares these:

    Let x and y be strings, xs and ys the corresponding sets of adjacent pairs from these strings, and ss their intersection. The similarity s of x and y is computed as

        s = (2*|ss|)/(|xs|+|ys|)

    where |xs| denotes the cardinality of the set xs. Example:

        x = hello
        y = hallo (German for hello)
        xs = {he,el,ll,lo}
        ys = {ha,al,ll,lo}
        ss = {ll,lo}
        s = (2*2)/(4+4) = 4/8 = 0.5
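
For anyone who wants to try the idea quickly, here is a minimal Python sketch of the pair-set similarity described above. It is not the strsim implementation; the function names `adjacent_pairs` and `similarity` are made up for illustration, and the handling of strings shorter than two characters (which have no pairs) is my own assumption.

```python
def adjacent_pairs(s: str) -> set:
    """Return the set of all two-character substrings of s."""
    return {s[i:i + 2] for i in range(len(s) - 1)}


def similarity(x: str, y: str) -> float:
    """Compute s = 2*|ss| / (|xs| + |ys|) over sets of adjacent pairs."""
    xs, ys = adjacent_pairs(x), adjacent_pairs(y)
    if not xs and not ys:
        # Assumption: with no pairs on either side the formula is undefined,
        # so fall back to plain equality.
        return 1.0 if x == y else 0.0
    ss = xs & ys  # pairs present in both strings
    return 2 * len(ss) / (len(xs) + len(ys))


print(similarity("hello", "hallo"))
```

For "hello" and "hallo" both pair sets have four elements and share two of them (ll and lo), so the call evaluates 2*2/(4+4). Because sets are used, a pair that occurs several times in a string counts only once.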
    1. 1

      Thank you very much @lindig . I’ll have a look at it!