Class Levenshtein

    • Constructor Detail

      • Levenshtein

        public Levenshtein()
    • Method Detail

      • distance

        public final double distance​(String s1,
                                     String s2,
                                     int limit)
        The Levenshtein distance, or edit distance, between two words is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other. http://en.wikipedia.org/wiki/Levenshtein_distance It is always at least the difference of the sizes of the two strings. It is at most the length of the longer string. It is zero if and only if the strings are equal. If the strings are the same size, the Hamming distance is an upper bound on the Levenshtein distance. The Levenshtein distance verifies the triangle inequality (the distance between two strings is no greater than the sum Levenshtein distances from a third string). Implementation uses dynamic programming (Wagner–Fischer algorithm), with only 2 rows of data. The space requirement is thus O(m) and the algorithm runs in O(mn).
        Parameters:
        s1 - The first string to compare.
        s2 - The second string to compare.
        limit - The maximum result to compute before stopping. This means that the calculation can terminate early if you only care about strings with a certain similarity. Set this to Integer.MAX_VALUE if you want to run the calculation to completion in every case.
        Returns:
        The computed Levenshtein distance.
        Throws:
        NullPointerException - if s1 or s2 is null.