Decision Tree for Classification in Supervised Learning
Decision tree builds classification or regression models in the form of a tree structure. It breaks down a dataset into smaller and smaller subsets while at the same time an associated decision tree is incrementally developed. The final result is a tree with decision nodes and leaf nodes. A decision node (e.g., Outlook) has two or more branches (e.g., Sunny, Overcast and Rainy). A leaf node (e.g., Play) represents a classification or decision. The topmost decision node in a tree, which corresponds to the best predictor, is called the root node. Decision trees can handle both categorical and numerical data.
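As an illustration, such a tree can be written down as nested dictionaries, with decision nodes mapping attribute values to branches and plain strings as leaf decisions. This is a minimal Python sketch built around the Outlook/Play example above; the Humidity and Windy subtrees are assumed for illustration and are not taken from the text:

# A decision tree as nested dicts: each decision node tests one attribute,
# each branch is an attribute value, and each leaf is a classification.
# The Humidity and Windy subtrees below are illustrative assumptions.
weather_tree = {
    "Outlook": {
        "Sunny": {"Humidity": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",  # leaf node: Play = Yes
        "Rainy": {"Windy": {True: "No", False: "Yes"}},
    }
}

def classify(tree, instance):
    """Walk from the root node down to a leaf, following the instance's values."""
    if not isinstance(tree, dict):
        return tree  # reached a leaf node: this is the decision
    attribute = next(iter(tree))  # the attribute tested at this decision node
    return classify(tree[attribute][instance[attribute]], instance)

print(classify(weather_tree, {"Outlook": "Sunny", "Humidity": "Normal"}))  # Yes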
Algorithm
The core algorithm for building decision trees, called ID3, was developed by J. R. Quinlan and employs a top-down, greedy search through the space of possible branches with no backtracking. ID3 uses Entropy and Information Gain to construct a decision tree. In the ZeroR model there is no predictor; in the OneR model we try to find the single best predictor; naive Bayesian includes all predictors using Bayes' rule and the independence assumptions between predictors; but a decision tree includes all predictors with the dependence assumptions between predictors.
Entropy
A decision tree is built top-down from a root node and involves partitioning the data into subsets that contain instances with similar values (homogeneous). The ID3 algorithm uses entropy to calculate the homogeneity of a sample. If the sample is completely homogeneous the entropy is zero, and if the sample is equally divided it has entropy of one.
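For example, a sample containing only "yes" instances has entropy -(1)log2(1) = 0, while a sample with half "yes" and half "no" has entropy -(0.5)log2(0.5) - (0.5)log2(0.5) = 1.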
To build a decision tree, we need to calculate two types of entropy using frequency tables as follows:
a) Entropy using the frequency table of one attribute:
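The original notes show this formula as an image; the standard form, where p_i is the proportion of instances belonging to class i among c classes, is:

E(S) = \sum_{i=1}^{c} -p_i \log_2 p_i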
b) Entropy using the frequency table of two attributes:
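This formula is also an image in the original; in its standard form, the entropy of each branch is weighted by how often that value occurs, where T is the target, X is the attribute being tested, and P(c) is the proportion of instances taking value c of X:

E(T, X) = \sum_{c \in X} P(c) \, E(c)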
Information Gain
Step 1: Calculate the entropy of the target.
Step 2: The dataset is then split on the different attributes, and the entropy of each branch is calculated and added proportionally to get the total entropy of the split. The information gain is the resulting decrease in entropy: Gain(T, X) = Entropy(T) - Entropy(T, X).
Step 3: Choose the attribute with the largest information gain as the decision node, divide the dataset by its branches, and repeat the same process on every branch.
Step 4a: A branch with entropy of 0 is a leaf node.
Step 4b: A branch with entropy more than 0 needs further splitting.
Step 5: The ID3 algorithm is run recursively on the non-leaf branches, until all data is classified.
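The recursion in Steps 3-5 can be sketched in a few lines of Python. This is a minimal illustration, assuming the data arrives as a list of dicts with categorical attributes; the names entropy, information_gain, and id3 are ours, not from any library:

import math
from collections import Counter

def entropy(rows, target):
    """Entropy of the target over a set of rows: E(S) = sum of -p * log2(p)."""
    counts = Counter(row[target] for row in rows)
    total = len(rows)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def information_gain(rows, attribute, target):
    """Gain(T, X) = Entropy(T) - Entropy(T, X): the decrease in entropy
    after splitting the rows on the given attribute."""
    total = len(rows)
    split_entropy = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row for row in rows if row[attribute] == value]
        split_entropy += (len(subset) / total) * entropy(subset, target)
    return entropy(rows, target) - split_entropy

def id3(rows, attributes, target):
    """Recursively build the tree: a branch with entropy 0 becomes a leaf,
    otherwise split on the attribute with the largest information gain."""
    classes = {row[target] for row in rows}
    if len(classes) == 1:      # entropy 0: this branch is a leaf node
        return classes.pop()
    if not attributes:         # no predictors left: fall back to majority class
        return Counter(row[target] for row in rows).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    remaining = [a for a in attributes if a != best]
    tree = {best: {}}
    for value in {row[best] for row in rows}:
        subset = [row for row in rows if row[best] == value]
        tree[best][value] = id3(subset, remaining, target)
    return tree

Run on the classic play-tennis table, Outlook has the largest information gain and becomes the root node, so the result is a nested structure like the weather_tree sketch above.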
A decision tree can easily be transformed into a set of rules by mapping from the root node to the leaf nodes one by one.
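A minimal sketch of that rule extraction, assuming the nested-dict representation used in the sketches above:

def extract_rules(tree, conditions=()):
    """Map each root-to-leaf path to an IF ... THEN rule."""
    if not isinstance(tree, dict):  # leaf node: emit one rule for this path
        antecedent = " AND ".join(f"{a} = {v}" for a, v in conditions)
        return [f"IF {antecedent} THEN {tree}"]
    attribute = next(iter(tree))
    rules = []
    for value, subtree in tree[attribute].items():
        rules.extend(extract_rules(subtree, conditions + ((attribute, value),)))
    return rules

for rule in extract_rules(weather_tree):
    print(rule)
# e.g. IF Outlook = Sunny AND Humidity = High THEN No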