Validating fuzzy logic values
With the Levenshtein distance, you might find that the distance between "Microsoft" and "Nzcrosoft" is only 2 (two substituted characters), but computing that result takes a lot more time.
A good approach is to seek corroboration from other data, such as address information, postal codes, telephone numbers, geo-coordinates, etc.
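As a rough sketch of what such corroboration could look like: a fuzzy name match is only accepted when at least one other field agrees exactly. The record layout, the field names (`postal_code`, `phone`), the sample values and the 0.75 threshold are all assumptions made up for illustration, and `difflib.SequenceMatcher` from the Python standard library is used here merely as a convenient stand-in for whatever similarity measure you actually use.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Return a similarity ratio between 0.0 and 1.0 (case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def corroborated_match(record_a: dict, record_b: dict,
                       name_threshold: float = 0.75) -> bool:
    """Accept a fuzzy name match only if another field agrees exactly.

    The field names below are hypothetical; use whatever corroborating
    data your records actually contain.
    """
    if name_similarity(record_a["name"], record_b["name"]) < name_threshold:
        return False
    # Only compare fields that are present (and non-empty) in both records.
    shared = [f for f in ("postal_code", "phone")
              if record_a.get(f) and record_b.get(f)]
    return any(record_a[f] == record_b[f] for f in shared)


# Made-up sample records
known = {"name": "Microsoft", "postal_code": "12345"}
typo_same_place = {"name": "Nzcrosoft", "postal_code": "12345"}
typo_elsewhere = {"name": "Nzcrosoft", "postal_code": "99999"}

print(corroborated_match(known, typo_same_place))
print(corroborated_match(known, typo_elsewhere))
```

With nothing to corroborate (no shared fields), this sketch rejects the match; whether that is the right default depends on how costly a false positive is for you.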
For example, someone writes: "I have found the following threads that seem similar to this question, but the poster has not approved an answer and I'm not sure if their use case is applicable: 'How to find best fuzzy match for a string in a large string database' and 'Matching inexact company names in Java'."
For more advanced needs, I think you need to look at the Levenshtein distance (also called "edit distance") of two strings and work with a threshold.
This is the more complex (and therefore slower) solution, but it allows for greater flexibility.
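A minimal sketch of the edit-distance-with-threshold idea, assuming plain Python and the classic dynamic-programming formulation. The helper name `is_fuzzy_match` and the default threshold of 2 are illustrative choices, not something prescribed by the algorithm; a sensible threshold depends on your data and on string length.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions and substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def is_fuzzy_match(a: str, b: str, max_distance: int = 2) -> bool:
    """Hypothetical helper: treat two strings as a match when their
    case-insensitive edit distance is within the threshold."""
    return levenshtein(a.lower(), b.lower()) <= max_distance


print(levenshtein("Microsoft", "Nzcrosoft"))  # 2
print(is_fuzzy_match("Microsoft", "Nzcrosoft"))
```

The running time grows with the product of the two string lengths, which is why comparing one string against every entry of a large database this way gets slow.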
Double-Metaphone includes a much larger encoding rule set than its predecessor, handles a subset of non-Latin characters, and returns a primary and a secondary encoding to account for different pronunciations of a single word in English.
At the bottom of the double metaphone page, there are implementations of it for all kinds of programming languages. Python & MySQL implementation: https://github.com/AtomBoy/double-metaphone
Firstly, I would like to add that you should be very careful when using any form of phonetic/fuzzy matching algorithm, as this kind of logic is exactly that: fuzzy or, to put it more simply, potentially inaccurate.