Thursday, March 26, 2015

Lipschitz Embeddings



One of the well known approximate string matching algorithms is Lipschitz Embeddings. Here it goes:

Input: A text file T containing N strings (O1, O2,…….ON) and a query object q.
Output: Nearest string to q with corresponding edit distance.

Steps:
  1. Construct a text file R of k strings (A1,A2,……..Ak) by choosing randomly from N string
  1. Compute F(q),the array of edit distances of the query object q to k strings of R, that is, F(q) = (d(q,A1 ),d(q,A2 ),……,d(q,Ak ) ) where, d(q,Aj ) is the edit distance between q and Aj .Here,1≤j≤k.

  1. Compute F(Oi ),the array of edit distances of each string Oi in T to k reference strings in R.
         that is, F(Oi) =( (d(Oi ,A1),(d(Oi , A2 ),…….d(Oi , Ak ) ).Here,1≤ i ≤ N.

  1. Calculate ∂(F(Oj ),F(q)),the distance between each string Oj of T to query string q in embedding  space where,
        
          where 1≤ j ≤ N, 1≤ i ≤ k, p=2.

    5. Find the minimum value among ∂(F(Oi ),F(q)), if the minimum is ∂(F(Om ),F(q)),
         then find d(Om ,q). (1≤i≤N).

   6. For i=1 to N,
      if(∂(F(Oi ),F(q))  > d(Om ,q) )
           then edit_distance= d(Om ,q) and nearest_string= Om.
      else if (d(Oi ,q) < d(Om ,q))
           then edit_distance= d(Oi ,q) and nearest_string = Oi

7.      Show the edit_distance and nearest_string.


See ya next time.



No comments:

Post a Comment