As
an example of thesis, a study comparison among data mining algorithms might be
a good one. Take for instance, approximate string matching algorithms. They can
be coded in a suitable language and applied on data sets and their run times can
be found to get the approximate matched string in response to the query string.
And an analysis of about which algorithm is optimal to use can be done.
Here is an analysis of
three approximate string matching algorithms:
Lipschitz Embedding
Algorithm uses a straight forward procedure to match an approximate string to
the query string. It creates reference strings, uses mathematical formula to
find out minimum edit distance and the corresponding nearest string to the
query string. Ball partitioning algorithm (which I haven’t covered but can be
found from suitable web resources) creates a VP tree and visits left and right
nodes of the tree to find out minimum edit distance and the corresponding
string which is nearest to the query string. Execution time increases with the
increase of visited nodes. Brute force algorithm for approximate string
matching uses the most inefficient way to find out the nearest string. It calculates
edit distances of all the text strings from the query string and then checks
all the distances to find out the minimum distance.As a result, logically as
well as based on their run times on data sets, among the three approximate
string matching algorithms, the fastest is Lipschitz Embeddings Algorithm, then
the Ball Partitioning algorithm and the slowest is Brute Force algorithm.
No comments:
Post a Comment