Class SorensenDice
- java.lang.Object
  - info.debatty.java.stringsimilarity.ShingleBased
    - info.debatty.java.stringsimilarity.SorensenDice
- All Implemented Interfaces:
NormalizedStringDistance, NormalizedStringSimilarity, StringDistance, StringSimilarity, Serializable
@Immutable public class SorensenDice extends ShingleBased implements NormalizedStringDistance, NormalizedStringSimilarity
Similar to the Jaccard index, but here the similarity is computed as 2 * |V1 inter V2| / (|V1| + |V2|). Distance is computed as 1 - similarity.
- Author:
- Thibault Debatty
- See Also:
- Serialized Form
Constructor Summary
Constructors (Constructor: Description)
- SorensenDice(): Sorensen-Dice coefficient, aka Sørensen index, Dice's coefficient or Czekanowski's binary (non-quantitative) index.
- SorensenDice(int k): Sorensen-Dice coefficient, aka Sørensen index, Dice's coefficient or Czekanowski's binary (non-quantitative) index.
-
Method Summary
Methods (Modifier and Type, Method: Description)
- double distance(String s1, String s2): Returns 1 - similarity.
- double similarity(String s1, String s2): Similarity is computed as 2 * |A inter B| / (|A| + |B|).
-
Methods inherited from class info.debatty.java.stringsimilarity.ShingleBased
getK, getProfile
Constructor Detail
-
SorensenDice
public SorensenDice(int k)
Sorensen-Dice coefficient, aka Sørensen index, Dice's coefficient or Czekanowski's binary (non-quantitative) index. The strings are first converted to boolean sets of k-shingles (sequences of k characters), then the similarity is computed as 2 * |A inter B| / (|A| + |B|). Attention: Sorensen-Dice distance (and similarity) does not satisfy the triangle inequality.
- Parameters:
k - the shingle length, i.e. the number of characters in each shingle.
-
SorensenDice
public SorensenDice()
Sorensen-Dice coefficient, aka Sørensen index, Dice's coefficient or Czekanowski's binary (non-quantitative) index. The strings are first converted to boolean sets of k-shingles (sequences of k characters), then the similarity is computed as 2 * |A inter B| / (|A| + |B|). Attention: Sorensen-Dice distance (and similarity) does not satisfy the triangle inequality. The default k is 3.
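The two steps described above (build sets of k-shingles, then apply the formula) can be sketched in plain Java. This is an illustration only, not the library's source: the class name DiceSketch and its helper shingles are hypothetical, no input normalization is performed, and the handling of strings shorter than k is an assumption that may differ from what SorensenDice actually does.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch only; DiceSketch is a hypothetical name, not part of the library.
final class DiceSketch {

    // Collect the distinct substrings of length k (the k-shingles) of s.
    static Set<String> shingles(String s, int k) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        Set<String> result = new HashSet<>();
        for (int i = 0; i + k <= s.length(); i++) {
            result.add(s.substring(i, i + k));
        }
        return result;
    }

    // 2 * |A inter B| / (|A| + |B|), where A and B are the shingle sets.
    static double similarity(String s1, String s2, int k) {
        if (s1.equals(s2)) {
            return 1.0; // identical strings; also avoids 0/0 for very short input
        }
        Set<String> a = shingles(s1, k);
        Set<String> b = shingles(s2, k);
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0; // assumption: distinct strings without shingles share nothing
        }
        Set<String> inter = new HashSet<>(a);
        inter.retainAll(b); // keep only the shingles present in both sets
        return 2.0 * inter.size() / (a.size() + b.size());
    }

    // Distance is defined as 1 - similarity.
    static double distance(String s1, String s2, int k) {
        return 1.0 - similarity(s1, s2, k);
    }

    public static void main(String[] args) {
        // Shingle sets {ABC, BCD, CDE} and {ABC, BCD, CDX} share two elements.
        System.out.println(similarity("ABCDE", "ABCDX", 3));
        System.out.println(distance("ABCDE", "ABCDX", 3));
    }
}
```

With k = 3, the two sample strings share two of their three shingles each, so the similarity is 2 * 2 / (3 + 3) = 2/3 and the distance is 1/3.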
Method Detail
-
similarity
public final double similarity(String s1, String s2)
Similarity is computed as 2 * |A inter B| / (|A| + |B|).
- Specified by:
- similarity in interface StringSimilarity
- Parameters:
- s1 - The first string to compare.
- s2 - The second string to compare.
- Returns:
- The computed Sorensen-Dice similarity.
- Throws:
- NullPointerException - if s1 or s2 is null.
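As a worked instance of the formula, take the strings "night" and "nacht" with k = 2. Both the strings and the choice of k (which is not the default of 3) are assumptions made here purely for illustration.

```latex
\begin{align*}
A &= \{\text{ni}, \text{ig}, \text{gh}, \text{ht}\}, \qquad
B = \{\text{na}, \text{ac}, \text{ch}, \text{ht}\}, \qquad
A \cap B = \{\text{ht}\} \\
\text{similarity} &= \frac{2\,|A \cap B|}{|A| + |B|}
  = \frac{2 \cdot 1}{4 + 4} = 0.25
\end{align*}
```

The corresponding distance would be 1 - 0.25 = 0.75.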
-
distance
public final double distance(String s1, String s2)
Returns 1 - similarity.
- Specified by:
- distance in interface StringDistance
- Parameters:
- s1 - The first string to compare.
- s2 - The second string to compare.
- Returns:
- 1.0 - the computed similarity
- Throws:
- NullPointerException - if s1 or s2 is null.
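The constructor notes warn that this distance does not satisfy the triangle inequality. A small counterexample makes that concrete. The sketch below is not library code: TriangleCheck is a hypothetical class that uses single characters as shingles (k = 1) and reimplements the formula directly, so that it runs without the library on the classpath.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch only; TriangleCheck is a hypothetical name, not part of the library.
final class TriangleCheck {

    // With k = 1, the shingle set of a string is simply its set of characters.
    static Set<Character> chars(String s) {
        Set<Character> result = new HashSet<>();
        for (char c : s.toCharArray()) {
            result.add(c);
        }
        return result;
    }

    // Sorensen-Dice distance: 1 - 2 * |A inter B| / (|A| + |B|).
    static double distance(String s1, String s2) {
        if (s1.equals(s2)) {
            return 0.0; // identical strings; also covers two empty strings
        }
        Set<Character> a = chars(s1);
        Set<Character> b = chars(s2);
        Set<Character> inter = new HashSet<>(a);
        inter.retainAll(b);
        return 1.0 - 2.0 * inter.size() / (a.size() + b.size());
    }

    // True when going directly from x to y is "longer" than the detour through z.
    static boolean violatesTriangle(String x, String y, String z) {
        return distance(x, y) > distance(x, z) + distance(z, y);
    }

    public static void main(String[] args) {
        // d("a","b") = 1, while d("a","ab") = d("ab","b") = 1 - 2/3 = 1/3.
        System.out.println(violatesTriangle("a", "b", "ab"));
    }
}
```

Here the direct distance between "a" and "b" is 1, but the detour through "ab" costs only 1/3 + 1/3 = 2/3, so the inequality d(x, y) <= d(x, z) + d(z, y) fails. In practice this means the distance should not be assumed to be a metric, for example by index structures that rely on metric properties.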