Using the m-estimate in rule induction
Abstract
Rule induction, a subarea of machine learning, is concerned with the problem of constructing rules from examples. In rule induction systems, various heuristic functions are used to estimate the quality of rules. Most of them use some form of probability estimate, relative frequency being the most common. This has resulted in the problem of small disjuncts, where specific rules produce high error rates due to unreliable probability estimates from small samples. To alleviate this problem, the Laplace estimate has been used in the rule induction system CN2. We have replaced the Laplace estimate by a general Bayesian probability estimate, the m-estimate, which does not rely on the Laplacian assumption of equally likely classes. The parameter m in the m-estimate allows for adapting to the learning domain. Depending on the level of noise in the examples and other properties of the domain, the appropriate level of generalization can be achieved by setting the m parameter to an appropriate value. We compare the performance of rules derived by using the Laplace and the m-estimate on several practical domains, in terms of classification accuracy and the theoretically underpinned measure of relative information score.
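The contrast between the two estimators discussed in the abstract can be sketched with their standard definitions: with s positive examples out of n covered by a rule, the Laplace estimate is (s + 1)/(n + k) for k classes, while the m-estimate is (s + m·p)/(n + m), where p is the prior probability of the class and m is the tunable parameter. This is a minimal illustrative sketch (function names are ours, not from the paper):

```python
def laplace_estimate(s, n, k):
    """Laplace estimate: (s + 1) / (n + k), assuming k equally likely classes."""
    return (s + 1) / (n + k)

def m_estimate(s, n, prior, m):
    """m-estimate: (s + m * prior) / (n + m); prior is the class's
    prior probability and m controls how strongly it pulls the estimate."""
    return (s + m * prior) / (n + m)

# A rule covering 3 positives out of 5 examples, two-class problem:
print(laplace_estimate(3, 5, 2))          # Laplace
print(m_estimate(3, 5, prior=0.5, m=2))   # equals Laplace when m=k, prior=1/k
print(m_estimate(3, 5, prior=0.9, m=10))  # prior dominates for small samples
```

Note that with m = k and a uniform prior p = 1/k, the m-estimate reduces exactly to the Laplace estimate; larger m pulls small-sample estimates more strongly toward the prior, which is how the parameter controls generalization.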
This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.