Chi-squared-based vs. entropy-based mechanisms for building fuzzy discretizers, inducers and classifiers
Vasile Georgescu. University of Craiova
- Fuzzy Economic Review: Volume VII, Number 1. May 2002
- DOI: 10.25102/fer.2002.01.01
Abstract
This paper proposes an automatic knowledge acquisition system that includes mechanisms for generating fuzzy partitions, inducing fuzzy decision trees and inferring fuzzy classifications. Designing both the discretizer and the inducer requires a dissimilarity measure to choose appropriate partitions and the most significant predictors among the candidates. Although current approaches use entropy as this measure, our study focuses on adapting a χ² distance so that a probabilistic test can accommodate a fuzzy data description. Summarizing such data within fuzzy contingency tables provides formal support for applying the χ²-test for independence. The advantage of a χ²-based measure over an entropy-based one is that both the partitioning and the splitting mechanisms can be controlled probabilistically. However, applying the test procedure accurately in a fuzzy context requires restricting the admissible covering schemes so that membership degree vectors can be interpreted as probability distributions. Finally, the fuzzy inducer can be employed to build fuzzy classifiers, that is, to apply a fuzzy inference mechanism in order to classify new (unseen) cases. Experimental evidence from comparative tests confirms that a χ²-based inducer produces more accurate and reliable results than an entropy-based one.
Keywords: fuzzy decision tree inducers, splitting criteria, fuzzy discretizers and classifiers
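To make the idea of a χ²-test on a fuzzy contingency table concrete, the following is a minimal illustrative sketch, not the paper's actual procedure. It assumes that each case carries one membership vector over the fuzzy sets of a predictor and one over the classes, that each vector sums to 1 (one possible reading of the restriction on covering schemes mentioned in the abstract), and that a case contributes the product of its two membership degrees to each cell. The function names `fuzzy_contingency_table` and `chi_squared_statistic` and the product rule are assumptions made for illustration only.

```python
def fuzzy_contingency_table(attr_memberships, class_memberships):
    """Accumulate a fuzzy contingency table from per-case membership vectors.

    Assumption (not from the paper): case k adds mu_attr[i] * mu_class[j]
    to cell (i, j), and every membership vector sums to 1.
    """
    n_rows = len(attr_memberships[0])
    n_cols = len(class_memberships[0])
    table = [[0.0] * n_cols for _ in range(n_rows)]
    for mu_a, mu_c in zip(attr_memberships, class_memberships):
        # Reject vectors that cannot be read as probability distributions.
        if abs(sum(mu_a) - 1.0) > 1e-9 or abs(sum(mu_c) - 1.0) > 1e-9:
            raise ValueError("membership vectors must sum to 1")
        for i, a in enumerate(mu_a):
            for j, c in enumerate(mu_c):
                table[i][j] += a * c
    return table


def chi_squared_statistic(table):
    """Return (Pearson chi-squared statistic, degrees of freedom)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of predictor and class.
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical data: four cases, two fuzzy sets ("low", "high"), two classes.
attr = [[0.9, 0.1], [0.7, 0.3], [0.2, 0.8], [0.1, 0.9]]
cls = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
table = fuzzy_contingency_table(attr, cls)
stat, dof = chi_squared_statistic(table)
```

Because every membership vector sums to 1, the cell totals add up to the number of cases, so the table can be treated like an ordinary contingency table. A p-value could then be obtained from the χ² distribution with `dof` degrees of freedom, for example with `scipy.stats.chi2.sf(stat, dof)`, and compared against a chosen significance level to decide whether a candidate split or partition is retained.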