14 December 2018

Document Similarity Measures

String Matching
  • Edit Distance
    • Levenshtein
    • Smith-Waterman
    • Affine gap
  • Alignment
    • Jaro-Winkler
    • Soft-TFIDF
    • Monge-Elkan
  • Phonetic
    • Soundex
    • Translation
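A minimal Python sketch of three of the string measures listed above: Levenshtein edit distance, Jaro-Winkler similarity, and American Soundex. The Jaro-Winkler version uses the common prefix scale p = 0.1 with a prefix of at most four characters. Some implementations apply the prefix bonus only when the Jaro score exceeds a threshold (often 0.7), and this sketch always applies it. Function names are illustrative and do not come from any particular library.

```python
# Levenshtein distance: minimum number of single-character insertions,
# deletions and substitutions turning a into b (two-row dynamic programming).
def levenshtein(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


# Jaro similarity: counts characters that match within a sliding window,
# then penalises matched characters that appear in a different order.
def jaro(s1: str, s2: str) -> float:
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    m1, m2 = [False] * len1, [False] * len2
    matches = 0
    for i, c in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not m2[j] and s2[j] == c:
                m1[i] = m2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Walk both match sequences in order; mismatched pairs count as half transpositions.
    k = half_transpositions = 0
    for i in range(len1):
        if m1[i]:
            while not m2[k]:
                k += 1
            if s1[i] != s2[k]:
                half_transpositions += 1
            k += 1
    t = half_transpositions / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


# Jaro-Winkler: boosts the Jaro score for a shared prefix of up to max_prefix chars.
def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    sim = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return sim + prefix * p * (1 - sim)


# American Soundex: first letter plus three digits encoding consonant groups.
_SOUNDEX = {ch: d for letters, d in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                                     ("L", "4"), ("MN", "5"), ("R", "6")) for ch in letters}

def soundex(name: str) -> str:
    letters = [ch for ch in name.upper() if ch.isalpha()]
    if not letters:
        return ""
    result, last = letters[0], _SOUNDEX.get(letters[0], "")
    for ch in letters[1:]:
        code = _SOUNDEX.get(ch, "")
        if code and code != last:
            result += code
        if ch not in "HW":  # H and W do not separate equal codes; vowels do
            last = code
    return (result + "000")[:4]
```

For example, `levenshtein("kitten", "sitting")` is 3, and `soundex("Robert")` and `soundex("Rupert")` both give `R163`, so the two names match phonetically despite an edit distance of 2.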
Distance Matching
  • Euclidean
  • Manhattan
  • Minkowski
  • Text Analytics
    • Jaccard
    • TFIDF
    • Cosine Similarity
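A minimal Python sketch of the distance and text-analytics measures above. Manhattan and Euclidean are the Minkowski distance with p = 1 and p = 2. The TF-IDF weighting here is the plain raw-count times log(N / df) form with lowercase whitespace tokenisation. Both choices are simplifying assumptions, and the function names are hypothetical.

```python
import math
from collections import Counter


# Minkowski distance of order p; p=1 is Manhattan, p=2 is Euclidean.
def minkowski(x, y, p: float = 2.0) -> float:
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

def euclidean(x, y) -> float:
    return minkowski(x, y, 2)

def manhattan(x, y) -> float:
    return minkowski(x, y, 1)


# Jaccard similarity of two token collections, treated as sets.
def jaccard(a, b) -> float:
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0  # convention: two empty sets are identical
    return len(sa & sb) / len(sa | sb)


# Plain TF-IDF: raw term count times log(N / document frequency).
# Whitespace tokenisation and the unsmoothed idf are simplifying assumptions.
def tfidf_vectors(docs):
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))  # count each term once per document
    idf = {term: math.log(len(docs) / count) for term, count in df.items()}
    return [{term: tf * idf[term] for term, tf in Counter(tokens).items()}
            for tokens in tokenized]


# Cosine similarity of two sparse vectors stored as {term: weight} dicts.
def cosine(u, v) -> float:
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)
```

With this unsmoothed idf, a term that occurs in every document gets weight 0. Library implementations often add smoothing so that such terms keep a small weight.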
Relational Matching
  • Set Based
    • Dice
    • Tanimoto (Jaccard)
    • Common Neighbors
    • Adamic-Adar (weighted common neighbors)
  • Aggregates
    • Average values
    • Max/Min values
    • Medians
    • Frequency (Mode)
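A minimal Python sketch of the set-based relational measures, assuming an undirected graph stored as a dict that maps each node to the set of its neighbours (no self-loops). Adamic-Adar weights each shared neighbour z by 1 / log(degree(z)). For the aggregates, one possible reading is to summarise each entity's related values and then compare the summaries. The helper `aggregate_profile` and the example graph are hypothetical.

```python
import math
import statistics


# Dice coefficient: 2|A ∩ B| / (|A| + |B|).
def dice(a, b) -> float:
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))

# For sets, the Tanimoto coefficient coincides with Jaccard: |A ∩ B| / |A ∪ B|.
def tanimoto(a, b) -> float:
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Number of neighbours that nodes x and y share.
def common_neighbors(graph, x, y) -> int:
    return len(graph[x] & graph[y])

# Adamic-Adar: shared neighbours with low degree count more. For x != y every
# shared neighbour is adjacent to both, so its degree is >= 2 and log(degree) > 0.
def adamic_adar(graph, x, y) -> float:
    return sum(1 / math.log(len(graph[z])) for z in graph[x] & graph[y])

# Hypothetical helper: summary statistics of an entity's related numeric values.
def aggregate_profile(values):
    return {
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),  # first mode if several (Python 3.8+)
    }


# Hypothetical undirected example graph.
graph = {
    "alice": {"bob", "carol", "dave"},
    "bob": {"alice", "carol"},
    "carol": {"alice", "bob", "erin"},
    "dave": {"alice", "erin"},
    "erin": {"carol", "dave"},
}
```

In this graph, `alice` and `erin` share `carol` (degree 3) and `dave` (degree 2). Their Adamic-Adar score is therefore 1/log 3 + 1/log 2, and `dave` contributes more because it has fewer connections.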
Other Matching
  • Numeric distance
  • Boolean equality
  • Fuzzy matching
  • Domain specific
  • Gazettes
    • Lexical matching
    • Named Entities (NER)
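A minimal Python sketch of some of the other matchers: a numeric similarity that decays linearly with the absolute difference, boolean equality, fuzzy matching via the standard library's `difflib.SequenceMatcher.ratio()`, and a greedy longest-match gazetteer lookup. The linear decay, the `scale` tolerance, the `max_len` phrase limit and all function names are assumptions for illustration. Named-entity recognition is not shown because it needs a trained model.

```python
from difflib import SequenceMatcher


# 1.0 when equal, falling linearly to 0.0 once |a - b| reaches scale
# (scale is an assumed, domain-chosen tolerance).
def numeric_similarity(a: float, b: float, scale: float) -> float:
    if scale <= 0:
        raise ValueError("scale must be positive")
    return max(0.0, 1.0 - abs(a - b) / scale)

def boolean_equality(a, b) -> float:
    return 1.0 if a == b else 0.0

# Case-insensitive fuzzy ratio in [0, 1] from difflib (2 * matches / total length).
def fuzzy_ratio(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Greedy longest-match lookup of phrases (up to max_len tokens) in a gazetteer
# that maps lowercase phrases to a type label.
def gazetteer_lookup(text: str, gazetteer: dict, max_len: int = 3):
    tokens = text.split()
    found = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase.lower() in gazetteer:
                found.append((phrase, gazetteer[phrase.lower()]))
                i += n
                break
        else:  # no phrase starting at token i matched
            i += 1
    return found


# Hypothetical gazetteer entries.
places = {"new york": "CITY", "york": "CITY", "paris": "CITY"}
```

Trying the longest phrase first means `"flights from New York to Paris"` yields `("New York", "CITY")` rather than the shorter entry `"york"`.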