Saturday, 3 November 2012

Levenshtein Edit Distance Algorithm in Java

The Levenshtein distance (more popularly called the edit distance) of two strings is defined as the number of edits required to make the strings equal. The edits can be,
  1. add a character
  2. delete a character
  3. change a character
Descriptions of this algorithm can be found on several sources online. The one on Wikipedia is quite helpful.

Here is a Java implementation of the algorithm, that you can directly use in your code. It makes use of dynamic programming, so you can be sure that it is efficient both in time and space complexities:
   public static int distance(String s1, String s2){
        int edits[][]=new int[s1.length()+1][s2.length()+1];
        for(int i=0;i<=s1.length();i++)
            edits[i][0]=i;
        for(int j=1;j<=s2.length();j++)
            edits[0][j]=j;
        for(int i=1;i<=s1.length();i++){
            for(int j=1;j<=s2.length();j++){
                int u=(s1.charAt(i-1)==s2.charAt(j-1)?0:1);
                edits[i][j]=Math.min(
                                edits[i-1][j]+1,
                                Math.min(
                                   edits[i][j-1]+1,
                                   edits[i-1][j-1]+u
                                )
                            );
            }
        }
        return edits[s1.length()][s2.length()];
   }
This takes two strings as arguments, and returns an integer which is the edit distance. For example if you pass the arguments "jdepths" and "pdepths", you will get 1 as the result.

No comments:

Post a Comment