Article title

Application of the C-loss function for pattern classification






 

Table of contents

Introduction

Statistical theory of classification

The Correntropy-induced loss function

Training with the C-loss function

Experiments and results

Conclusion





Excerpt from the article

The Bayes optimal decision rule

Let p(x) = P(Y = 1 | X = x) denote the conditional probability of the positive class given X = x. The theoretically optimal classification rule, i.e. the one with the smallest generalization error, can then be written as sign[p(x) - 1/2]. This rule is called the Bayes optimal rule, and the risk associated with it is called the Bayes optimal risk, R* = R(f*).
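A compact restatement of the rule above in LaTeX notation (standard notation; the last equality, that no decision rule attains a smaller risk, simply restates "smallest generalization error" from the excerpt):

    f^{*}(x) = \operatorname{sign}\left[\, p(x) - \tfrac{1}{2} \,\right], \qquad p(x) = P(Y = 1 \mid X = x)

    R^{*} = R(f^{*}) = \inf_{f} R(f)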









 

Keywords:

Classification; Correntropy; Neural network; Loss function; Backprojection

The C-loss function for pattern classification
Abhishek Singh (Department of Electrical & Computer Engineering, University of Illinois at Urbana-Champaign, United States), Rosha Pokharel and Jose Principe (Department of Electrical & Computer Engineering, University of Florida, Gainesville, United States)
Article history: Received 3 October 2012; received in revised form 16 July 2013; accepted 24 July 2013; available online 3 August 2013.

Abstract. This paper presents a new loss function for neural network classification, inspired by the recently proposed similarity measure called Correntropy. We show that this function essentially behaves like the conventional square loss for samples that are well within the decision boundary and have small errors, and like the L0 or counting norm for samples that are outliers or are difficult to classify. Depending on the value of the kernel size parameter, the proposed loss function moves smoothly from convex to non-convex and becomes a close approximation to the misclassification loss (ideal 0-1 loss). We show that the discriminant function obtained by optimizing the proposed loss function in the neighborhood of the ideal 0-1 loss function to train a neural network is immune to overfitting, more robust to outliers, and has consistent and better generalization performance as compared to other commonly used loss functions, even after prolonged training. The results also show that it is a close competitor to the SVM. Since the proposed method is compatible with simple gradient-based online learning, it is a practical way of improving the performance of neural network classifiers. © 2013 Elsevier Ltd. All rights reserved.

1. Introduction

Classification aims at assigning class labels to data using an 'optimal' decision rule that is learnt using a set of pre-labeled training samples. This 'optimal' decision rule, or discriminant function f, is learnt by minimizing the empirical risk, which is a sample average of a loss function. The loss (a function of the prediction f(x) and the true label y) is essentially the price we pay for predicting the label to be f(x) instead of y. This procedure for learning the discriminant function is called Empirical Risk Minimization, and is a widely used principle for classification and statistical learning [1,2]. The most natural loss function for classification is the misclassification error rate (or the 0-1 loss)

    l_{0-1}(f(x), y) = ‖(−y f(x))_+‖_0,     (1)

where (·)_+ denotes the positive part and ‖·‖_0 denotes the L0 norm. This is essentially a count of the number of incorrect classifications made by the decision rule f. Therefore, the 0-1 loss function directly relates to the probability of misclassification. Optimization of the risk based on such a loss function, however, is computationally intractable due to its non-continuity and non-convexity [1,2]. Therefore, a surrogate loss function is used in many classification procedures. For example, well-known loss functions for training the weights of a neural network or a radial basis function (RBF) network are the squared loss, (y − f(x))^2 or (1 − y f(x))^2, and the logistic loss, log(1 + e^{−y f(x)}). The Support Vector Machine (SVM) [3,4] uses the hinge loss, [1 − y f(x)]_+.
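To make the losses listed above concrete, the short Python sketch below evaluates each one as a function of the margin y·f(x). The last function is not the paper's C-loss definition; it is an assumed Gaussian-kernel-shaped stand-in that only mimics the behaviour described in the abstract (square-loss-like for small errors, saturating like a count for outliers), with an arbitrary kernel size sigma.

    import numpy as np

    def zero_one_loss(margin):
        # 0-1 (misclassification) loss: counts margins that are negative
        return (margin < 0).astype(float)

    def square_loss(margin):
        # (1 - y*f(x))^2 form of the squared loss
        return (1.0 - margin) ** 2

    def logistic_loss(margin):
        # log(1 + exp(-y*f(x)))
        return np.log1p(np.exp(-margin))

    def hinge_loss(margin):
        # [1 - y*f(x)]_+  (SVM hinge loss)
        return np.maximum(0.0, 1.0 - margin)

    def correntropy_style_loss(margin, sigma=0.5):
        # Assumed illustrative form only: a Gaussian-kernel-induced loss that
        # behaves like the square loss for small errors and saturates (like a
        # count) for large errors, controlled by the kernel size sigma.
        e = 1.0 - margin
        return 1.0 - np.exp(-e ** 2 / (2.0 * sigma ** 2))

    margins = np.linspace(-2, 2, 5)
    for name, fn in [("0-1", zero_one_loss), ("square", square_loss),
                     ("logistic", logistic_loss), ("hinge", hinge_loss),
                     ("C-style", correntropy_style_loss)]:
        print(name, np.round(fn(margins), 3))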
Within the statistical learning community, convex surrogates of the 0-1 misclassification loss are highly preferred because of the virtues that convexity brings: unique optima, efficient optimization using convex optimization tools, and amenability to theoretical analysis of error bounds [5]. However, convex functions are still poor approximations to the 0-1 loss function. They tend to be unbounded and offer poor robustness to outliers [2]. Another important limitation is that the complexity of convex optimization algorithms grows very fast with more data [6]. Some non-convex loss functions have been proposed recently with the aim of addressing these issues [7,8]. There is a large class of problems where optimization cannot be done using convex programming techniques. For example, training of deep networks for large-scale AI problems primarily relies on online, gradient-based methods [9,10]. Such neural-network-based learning machines can benefit from non-convex loss functions, as they can potentially offer better scalability, robustness and generalization performance. Although non-convex optimization and loss functions do not offer many theoretical guarantees, the empirical evidence that they work better in engineering applications is becoming overwhelming [6]. A loss function for classification inspired by the statistical measure called Correntropy [11] was proposed in [12].
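Since the paragraph above points to online, gradient-based training of neural networks, here is a minimal sketch of such a training loop: a one-hidden-layer network updated sample by sample with plain SGD. The toy data, network size, learning rate and the use of the squared loss (y − f(x))^2 are all illustrative assumptions, not the paper's setup; swapping in a different (possibly non-convex) loss only changes the line computing dL_df.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy data (assumed, for illustration only): two noisy clusters, labels in {-1, +1}
    X = np.vstack([rng.normal(-1, 0.6, (50, 2)), rng.normal(+1, 0.6, (50, 2))])
    y = np.hstack([-np.ones(50), np.ones(50)])

    # One-hidden-layer network: f(x) = w2 . tanh(W1 x + b1) + b2
    W1, b1 = rng.normal(0, 0.5, (4, 2)), np.zeros(4)
    w2, b2 = rng.normal(0, 0.5, 4), 0.0
    lr = 0.05

    for epoch in range(20):
        for i in rng.permutation(len(X)):      # online (sample-by-sample) updates
            h = np.tanh(W1 @ X[i] + b1)
            f = w2 @ h + b2
            # Squared loss (y - f)^2; any of the surrogate losses sketched
            # earlier could be plugged in here by changing this derivative.
            dL_df = -2.0 * (y[i] - f)
            # Backpropagate the gradient to all weights
            grad_w2, grad_b2 = dL_df * h, dL_df
            dh = dL_df * w2 * (1.0 - h ** 2)
            grad_W1, grad_b1 = np.outer(dh, X[i]), dh
            W1 -= lr * grad_W1; b1 -= lr * grad_b1
            w2 -= lr * grad_w2; b2 -= lr * grad_b2

    preds = np.sign(np.tanh(X @ W1.T + b1) @ w2 + b2)
    print("training accuracy:", np.mean(preds == y))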