Similarity measure exercise for classification trees based on the classification path
Abstract
Classification tree models are known for their simplicity and efficiency when dealing with domains that contain a large number of variables and cases. However, a small perturbation in the data can lead to a very different tree. We introduce a method for measuring the similarity between binary classification trees based on the similarity between their classification paths. The trees to be compared are represented as matrices whose entries lie in the interval [0,1]. The overlap similarity measure is used to compute the similarity between each pair of paths in the two trees, and the best-matching paths between the trees are used to calculate the overall similarity measure. An advantage of this method is that it can compare trees that have the same structure and leaf nodes but different internal nodes.
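
The path-matching idea summarized above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: each tree is encoded as a matrix with one row per classification path and one column per variable, the overlap measure is taken to be the overlap coefficient generalized to [0,1]-valued vectors, and the best matches are aggregated by a symmetric average. The function names `overlap_similarity` and `tree_similarity` are hypothetical.

```python
import numpy as np


def overlap_similarity(p, q):
    """Overlap coefficient generalized to vectors with entries in [0, 1].

    Assumed form: sum(min(p, q)) / min(sum(p), sum(q)).
    """
    denom = min(p.sum(), q.sum())
    if denom == 0:
        # Two all-zero paths are treated as identical; otherwise no overlap.
        return 1.0 if p.sum() == q.sum() else 0.0
    return float(np.minimum(p, q).sum() / denom)


def tree_similarity(A, B):
    """Similarity of two trees given as path matrices (rows are paths).

    Assumed aggregation: take the best match for every path of A and
    for every path of B, then average the two means.
    """
    sims = np.array([[overlap_similarity(p, q) for q in B] for p in A])
    best_a = sims.max(axis=1).mean()  # best partner in B for each path of A
    best_b = sims.max(axis=0).mean()  # best partner in A for each path of B
    return float((best_a + best_b) / 2)


# Hypothetical path matrices for two small trees over three variables.
tree_a = np.array([[1.0, 0.5, 0.0],
                   [0.0, 1.0, 1.0]])
tree_b = np.array([[1.0, 0.5, 0.0],
                   [0.0, 1.0, 0.5]])

print(tree_similarity(tree_a, tree_b))
```

Averaging per-path best matches is only one possible design; a one-to-one assignment between paths would be a stricter alternative, and the paper's own construction of the matrices and of the final score may differ from this sketch.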