One of the well known approximate string matching algorithms
is Lipschitz Embeddings. Here it goes:
Input: A text file T containing N strings (O1, O2,…….ON)
and a query object q.
Output: Nearest string to q with corresponding edit distance.
Steps:
- Compute F(q),the array of edit distances of the query object q to k strings of R, that is, F(q) = (d(q,A1 ),d(q,A2 ),……,d(q,Ak ) ) where, d(q,Aj ) is the edit distance between q and Aj .Here,1≤j≤k.
- Compute F(Oi ),the array of edit distances of each string Oi in T to k reference strings in R.
that is, F(Oi) =( (d(Oi
,A1),(d(Oi , A2 ),…….d(Oi , Ak
) ).Here,1≤ i ≤ N.
- Calculate ∂(F(Oj ),F(q)),the distance between each string Oj of T to query string q in embedding space where,
where 1≤ j ≤ N, 1≤ i ≤ k,
p=2.
5. Find the minimum value among ∂(F(Oi ),F(q)), if the minimum is ∂(F(Om ),F(q)),
6.
For i=1 to N,
then edit_distance= d(Om
,q) and nearest_string= Om.
else if (d(Oi ,q) < d(Om ,q))
then edit_distance= d(Oi
,q) and nearest_string = Oi
7.
Show the edit_distance and nearest_string.
See ya next time.