Article Title
Applying Ant Colony Optimization to Configuring Stacking Ensembles for Data Mining
Table of Contents
Introduction
Background
The ACO-Stacking Approach
Application to a Real-World Cost-Sensitive Data Mining Problem
Conclusion
Excerpt from the Article
The ACO-Stacking Approach
To construct ACO-Stacking, a set of candidate base-level classifiers and a set of candidate meta-classifiers are defined, together with the training sets, validation sets, and test set.
A stacking model is configured with base-level classifiers and a meta-classifier. The stacking ensemble is then trained on the training sets and evaluated on the validation sets. If an ant's new path package is better than its existing one, the new package replaces it; otherwise, the ant's existing path package remains unchanged. At the end, the configuration of the best ant becomes the final configuration of the method. Finally, this configuration is tested on the test set.
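The excerpt above describes a search loop: each ant proposes a configuration, the configuration is trained and validated, and a better path package replaces a worse one. The sketch below illustrates that loop in Python; the candidate classifier lists, the pheromone-update rule, the fixed meta-classifier, and all parameter values are assumptions for illustration only, since the excerpt does not specify the paper's exact formulation.

```python
# Minimal ACO-Stacking sketch. Candidate lists, pheromone-update rule,
# and parameters are illustrative assumptions, not the paper's method.
import random

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Candidate base-level classifiers (one pheromone value per candidate).
BASE_CANDIDATES = [DecisionTreeClassifier, GaussianNB,
                   KNeighborsClassifier, LogisticRegression]
META = LogisticRegression  # fixed meta-classifier for this sketch


def evaluate(selection, X_tr, y_tr, X_eval, y_eval):
    """Train the stacking configured by `selection` on the training set
    and return its accuracy on the evaluation set (simplified: base-level
    outputs on the training set itself feed the meta-classifier here)."""
    bases = [BASE_CANDIDATES[i]().fit(X_tr, y_tr) for i in selection]
    meta_tr = np.column_stack([b.predict(X_tr) for b in bases])
    meta = META(max_iter=1000).fit(meta_tr, y_tr)
    meta_eval = np.column_stack([b.predict(X_eval) for b in bases])
    return meta.score(meta_eval, y_eval)


X, y = load_breast_cancer(return_X_y=True)
X_tr, X_tmp, y_tr, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_te, y_val, y_te = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

pheromone = [1.0] * len(BASE_CANDIDATES)
best_sel, best_acc = None, 0.0
for iteration in range(5):          # colony iterations
    for ant in range(4):            # ants per iteration
        # Each ant picks two distinct base classifiers, biased by pheromone.
        sel = []
        while len(sel) < 2:
            i = random.choices(range(len(BASE_CANDIDATES)), weights=pheromone)[0]
            if i not in sel:
                sel.append(i)
        acc = evaluate(sel, X_tr, y_tr, X_val, y_val)
        # Keep the new path package only if it beats the best one so far.
        if acc > best_acc:
            best_sel, best_acc = sel, acc
    # Evaporate pheromone, then reinforce the best path found so far.
    pheromone = [0.9 * p for p in pheromone]
    for i in best_sel:
        pheromone[i] += best_acc

# The best ant's configuration is finally tested on the held-out test set.
print("validation accuracy:", best_acc)
print("test accuracy:", evaluate(best_sel, X_tr, y_tr, X_te, y_te))
```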
Keywords: ACO, Ensemble, Stacking, Metaheuristics, Data mining, Direct marketing
Applying Ant Colony Optimization to configuring stacking ensembles for data mining
YiJun Chen, Man-Leung Wong, Haibing Li
Department of Computing and Decision Sciences, Lingnan University, Tuen Mun, Hong Kong

Abstract
An ensemble is a collective decision-making system which applies a strategy to combine the predictions of learned classifiers to generate its prediction of new instances. Early research has proved that ensemble classifiers in most cases can be more accurate than any single component classifier, both empirically and theoretically. Though many ensemble approaches have been proposed, it is still not an easy task to find a suitable ensemble configuration for a specific dataset. In some early works, the ensemble is selected manually according to the experience of specialists. Metaheuristic methods can be alternative solutions for finding configurations, and Ant Colony Optimization (ACO) is one popular metaheuristic. In this work, we propose a new ensemble construction method which applies ACO to the stacking ensemble construction process to generate domain-specific configurations. A number of experiments are performed to compare the proposed approach with some well-known ensemble methods on 18 benchmark data mining datasets. The approach is also applied to learning ensembles for a real-world cost-sensitive data mining problem. The experiment results show that the new approach can generate better stacking ensembles.

1. Introduction
Over years of development, it has become more and more difficult to significantly improve the performance of a single classifier. Recently, there has been growing research interest in methods that combine different classifiers to achieve better performance. The combining method is referred to as an ensemble. In early research, ensembles were proved empirically and theoretically to perform more accurately than any single component classifier in most cases. If an ensemble is generated by a set of classifiers trained with the same learning algorithm, it is a homogeneous ensemble; if the classifiers are trained with different learning algorithms, it is a heterogeneous ensemble (Dietterich, 2000). For example, Bagging (Breiman, 1996) and Boosting (Schapire, 1990) are homogeneous ensembles, while stacking (Wolpert, 1992) is a heterogeneous ensemble.

To generate an ensemble that achieves the expected results, two things should be considered carefully. The first is to introduce enough diversity into the components of the ensemble. The second is to choose a suitable combining method that merges the diverse outputs into a single output (Polikar, 2006). Diversity is the foundation of an ensemble. However, beyond a certain threshold the marginal effect of additional diversity decreases: memory and computing costs increase significantly while performance does not improve steadily. In the early Bagging and Boosting methods, diversity is achieved by a resampling strategy. The classifiers included in Bagging are trained on data subsets randomly sampled from the original dataset, and a majority voting scheme is applied as the combining method to make a collective decision.
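As a concrete illustration of the Bagging scheme just described, the following minimal sketch trains decision trees on bootstrap samples and combines them by majority vote; the dataset, tree count, and parameters are illustrative assumptions, not taken from the paper.

```python
# Toy Bagging illustration: bootstrap-sampled training subsets combined
# by majority voting. Dataset and parameters are illustrative only.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
trees = []
for _ in range(11):  # an odd number of voters avoids ties
    idx = rng.integers(0, len(X_tr), size=len(X_tr))  # bootstrap sample
    trees.append(DecisionTreeClassifier().fit(X_tr[idx], y_tr[idx]))

# Majority vote: predict the class chosen by most component trees.
votes = np.stack([t.predict(X_te) for t in trees])
majority = (votes.mean(axis=0) > 0.5).astype(int)
print("bagged accuracy:", (majority == y_te).mean())
```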
Boosting uses a weighted resampling strategy. The weights of all instances are initialized equally; if an instance is misclassified, its weight is increased, so the misclassified instances are more likely to be selected into the next training subset. The diversity-generating process stops when the errors become too small. The combining scheme of Boosting is weighted majority voting.

Compared to Bagging and Boosting, stacking does not manipulate the training dataset directly. Instead, an ensemble of classifiers is generated on two levels. At the base level, multiple classifiers are trained with different learning algorithms; diversity is introduced because different learning algorithms make different errors on the same dataset. A meta-classifier is applied to generate the final prediction. The meta-classifier is trained with a learning algorithm on a meta-dataset that combines the outputs of the base-level classifiers with the real class labels. One problem of stacking is how to obtain an "appropriate" configuration of the base-level classifiers and the meta-classifier for each specific dataset.
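To make the two-level structure concrete, here is a minimal sketch of assembling a stacking meta-dataset and training a meta-classifier on it. The particular classifiers and the use of cross-validated predictions are assumptions for illustration, not the paper's prescribed setup.

```python
# Minimal stacking sketch: base-level outputs become the meta-classifier's
# input features. Classifier choices here are illustrative assumptions.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Base level: different learning algorithms supply the diversity.
bases = [DecisionTreeClassifier(random_state=0), GaussianNB()]

# Meta-dataset: cross-validated base-level predictions paired with the
# real class labels (avoids leaking training labels to the meta level).
meta_X_tr = np.column_stack(
    [cross_val_predict(b, X_tr, y_tr, cv=5) for b in bases])
meta = LogisticRegression().fit(meta_X_tr, y_tr)

# At prediction time, the base classifiers are refit on all training data.
for b in bases:
    b.fit(X_tr, y_tr)
meta_X_te = np.column_stack([b.predict(X_te) for b in bases])
print("stacking accuracy:", meta.score(meta_X_te, y_te))
```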