A bank loans officer needs analysis of her data in order to learn which
loan applicants are "safe" and which are "risky" for the
bank. A marketing manager needs data analysis to help guess
whether a customer with a given profile will buy a new computer. A medical
researcher wants to analyze breast cancer data in order to predict which one of
three specific treatments a patient should receive. In each of these examples,
the data analysis task is classification, where a model or classifier is
constructed to predict categorical labels, such as "safe" or
"risky" for the loan application data; "yes" or
"no" for the marketing data; or "treatment A," "treatment
B," or "treatment C" for the medical data. These categories can
be represented by discrete values, where the ordering among values has no
meaning. For example, the values 1, 2, and 3 may be used to represent
treatments A, B, and C, where there is no ordering implied among this group of
treatment regimes.
Suppose that the marketing manager would like to predict how much a given customer will spend during a sale. This data analysis task is an example of numeric prediction, where the model constructed predicts a continuous-valued function, or ordered value, as opposed to a categorical label. This model is a predictor.
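As a minimal sketch of numeric prediction, the code below fits a straight line by ordinary least squares and uses it to estimate how much a customer will spend. The single input attribute (annual income), the data values, and the function names `fit_line` and `predict_spend` are invented for illustration; they do not come from a real data set.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one input attribute."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_spend(model, income):
    """Apply the fitted line to a new customer."""
    slope, intercept = model
    return slope * income + intercept


# Hypothetical training data: annual income (in thousands) and the amount
# each customer spent during a past sale.
incomes = [20, 30, 40, 50, 60]
spend = [110, 160, 210, 260, 310]

model = fit_line(incomes, spend)
estimate = predict_spend(model, 45)  # a continuous value, not a class label
print(estimate)
```

The output is a number on a continuous scale, which is what separates a predictor from a classifier that returns one of a fixed set of labels.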
Classification: a two-step process
Model construction: describing a set of predetermined classes
· Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute.
· The set of tuples used for model construction is the training set.
· The model is represented as classification rules, decision trees, or mathematical formulae.
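The model construction step can be sketched with a very simple rule learner. The code below is a 1R-style learner (one attribute, one rule per value), chosen only because it is short; it is not the method of any particular library. The loan tuples, the attribute names, and the functions `learn_one_rule` and `classify` are assumptions made up for this example.

```python
from collections import Counter, defaultdict


def learn_one_rule(training_set, label_attr):
    """Build classification rules on the single attribute with the fewest training errors."""
    attributes = [a for a in training_set[0] if a != label_attr]
    best = None
    for attr in attributes:
        # Count class labels for each value of this attribute.
        counts = defaultdict(Counter)
        for row in training_set:
            counts[row[attr]][row[label_attr]] += 1
        # One rule per value: predict the most frequent class among matching tuples.
        rules = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(1 for row in training_set if rules[row[attr]] != row[label_attr])
        if best is None or errors < best[2]:
            best = (attr, rules, errors)
    attr, rules, _ = best
    # Fallback class for attribute values never seen in training.
    default = Counter(row[label_attr] for row in training_set).most_common(1)[0][0]
    return {"attribute": attr, "rules": rules, "default": default}


def classify(model, row):
    """Apply the learned rules to one tuple."""
    return model["rules"].get(row[model["attribute"]], model["default"])


# Hypothetical training set; "risk" is the class label attribute.
training_set = [
    {"income": "low", "age": "young", "risk": "risky"},
    {"income": "low", "age": "senior", "risk": "risky"},
    {"income": "medium", "age": "young", "risk": "safe"},
    {"income": "medium", "age": "senior", "risk": "safe"},
    {"income": "high", "age": "young", "risk": "safe"},
    {"income": "high", "age": "senior", "risk": "safe"},
]

model = learn_one_rule(training_set, "risk")
print(model["attribute"], model["rules"])
print(classify(model, {"income": "low", "age": "young"}))
```

Here the learned model is literally a set of classification rules of the form "if income is low then risky", one of the three representations listed above.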
Model prediction: for classifying future or unknown objects
- Estimate the accuracy of the model
· The known label of each test sample is compared with the classified result from the model.
· The accuracy rate is the percentage of test set samples that are correctly classified by the model.
· The test set must be independent of the training set; otherwise the accuracy estimate will be overly optimistic, because the model may have overfit the training data.
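The accuracy estimate described above can be sketched in a few lines. The classifier `classify_loan` is a hypothetical stand-in for a model built beforehand from a separate training set, and the test tuples are invented for illustration.

```python
def accuracy_rate(classify, test_set, label_attr):
    """Percentage of test tuples whose predicted label matches the known label."""
    correct = sum(1 for row in test_set if classify(row) == row[label_attr])
    return 100.0 * correct / len(test_set)


# Hypothetical stand-in for a model built earlier from a separate training set.
def classify_loan(row):
    return "risky" if row["income"] == "low" else "safe"


# Hypothetical test set: labeled tuples that were NOT used to build the model.
test_set = [
    {"income": "low", "risk": "risky"},
    {"income": "high", "risk": "safe"},
    {"income": "medium", "risk": "risky"},
    {"income": "medium", "risk": "safe"},
]

acc = accuracy_rate(classify_loan, test_set, "risk")
print(acc)
```

Three of the four test tuples are classified correctly, so the accuracy rate on this toy test set is 75 percent.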
Comparison of Data Mining Algorithms Based on Classification and Prediction
Classification and prediction, as explained above, can be carried out with data mining algorithms such as decision tree induction. The accuracy of the resulting model can then be compared with that of another classification algorithm, such as naive Bayesian classification, and an analysis of the two can be drawn up.
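One way such a comparison might look is sketched below. To keep it short, a one-level decision tree (a decision stump chosen by information gain) stands in for full decision tree induction, and the naive Bayesian classifier handles categorical attributes with add-one smoothing. The data, the attribute names, and all function names are assumptions invented for this example, and a result on a toy data set says nothing about how the two algorithms compare in general.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def train_stump(rows, label):
    """One-level decision tree: split on the attribute with the highest information gain."""
    base = entropy([r[label] for r in rows])
    best_attr, best_gain = None, -1.0
    for attr in (a for a in rows[0] if a != label):
        groups = defaultdict(list)
        for r in rows:
            groups[r[attr]].append(r[label])
        remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
        if base - remainder > best_gain:
            best_attr, best_gain = attr, base - remainder
    leaves = defaultdict(list)
    for r in rows:
        leaves[r[best_attr]].append(r[label])
    majority = Counter(r[label] for r in rows).most_common(1)[0][0]
    branch = {v: Counter(g).most_common(1)[0][0] for v, g in leaves.items()}
    return lambda r: branch.get(r[best_attr], majority)


def train_naive_bayes(rows, label):
    """Categorical naive Bayes with add-one (Laplace) smoothing."""
    attrs = [a for a in rows[0] if a != label]
    class_counts = Counter(r[label] for r in rows)
    value_counts = defaultdict(Counter)  # (class, attribute) -> value counts
    values = defaultdict(set)            # attribute -> distinct values seen
    for r in rows:
        for a in attrs:
            value_counts[(r[label], a)][r[a]] += 1
            values[a].add(r[a])

    def classify(r):
        def log_score(c):
            score = math.log(class_counts[c] / len(rows))
            for a in attrs:
                num = value_counts[(c, a)][r[a]] + 1
                score += math.log(num / (class_counts[c] + len(values[a])))
            return score
        return max(class_counts, key=log_score)

    return classify


def accuracy(classify, rows, label):
    return 100.0 * sum(classify(r) == r[label] for r in rows) / len(rows)


def row(student, credit, buys):
    return {"student": student, "credit": credit, "buys": buys}


# Hypothetical training set and independent test set.
train = (
    [row("yes", "fair", "yes")] * 3
    + [row("yes", "excellent", "yes")]
    + [row("yes", "excellent", "no")] * 2
    + [row("no", "fair", "no")] * 2
    + [row("no", "excellent", "no")] * 2
)
test = [
    row("yes", "fair", "yes"),
    row("yes", "excellent", "no"),
    row("no", "fair", "no"),
    row("no", "excellent", "no"),
    row("yes", "fair", "no"),
]

stump_acc = accuracy(train_stump(train, "buys"), test, "buys")
nb_acc = accuracy(train_naive_bayes(train, "buys"), test, "buys")
print(stump_acc, nb_acc)
```

Both models are built from the same training set and scored on the same independent test set, which is what makes the two accuracy rates comparable.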