عنوان مقاله
کاربرد حداقل مربعات غیر منفی موازی کارآمد در معماریهای چند هسته ای
فهرست مطالب
مقدمه
روش مجموعه فعال
معماریهایCPU چند هسته ای
معماریهای GPU
کاربرد
ملاحظات پایانی
بخشی از مقاله
معماری های GPU
GPU ها اغلب به صورت مجموعه مولتی یا چند پردازنده هایی طراحی شده اند که هر یک حاوی مجموعه کوچکتری از پردازنده های اسکالر (SP) با معماریSIMD می باشند. چند رشته ای سخت افزار تحت معماریSIMT رشته های متعدد را برروی یکSP نگاشت می کند. یک SP آدرس دستور را جابجا و حالات رشته های متعدد را رجیستر یا ثبت می کند به گونه ای که آنها می توانند به صورت مستقل اجرا شوند.
کلمات کلیدی:
EFFICIENT PARALLEL NON-NEGATIVE LEAST SQUARES ON MULTI-CORE ARCHITECTURES YUANCHENG LUO AND RAMANI DURAISWAMI UNIVERSITY OF MARYLAND, COLLEGE PARK Abstract. We parallelize a version of the active-set iterative algorithm derived from the original works of Lawson and Hanson (1974) on multi-core architectures. This algorithm requires the solution of an unconstrained least squares problem in every step of the iteration for a matrix composed of the passive columns of the original system matrix. To achieve improved performance, we use parallelizable procedures to efficiently update and downdate the QR factorization of the matrix at each iteration, to account for inserted and removed columns. We use a reordering strategy of the columns in the decomposition to reduce computation and memory access costs. We consider graphics processing units (GPUs) as a new mode for efficient parallel computations and compare our implementations to that of multi-core CPUs. Both synthetic and non-synthetic data are used in the experiments. Key words. Non-negative least squares, active-set, QR updating, parallelism, multi-core, GPU, deconvolution AMS subject classifications. 15A06, 15A23, 65Y05, 65Y20 1. Introduction. A central problem in data-modelling is the optimization of underlying parameters specifying a linear model used to describe observed data. The underlying parameters of the model form a set n variables in a n × 1 vector x = {x1, · · · , xn} T . The observed data is composed of m observations in a m×1 vector b = {b1, · · · , bm} T . Suppose that the observed data are linear functions of the underlying parameters in the model, then the function’s values at data points may be expressed as a m × n matrix A where Ax = b describes a linear mapping from the parameters in x to the observations in b. In the general case where m ≥ n, the dense overdetermined system of linear equations may be solved via a least squares approach. The usual way to solve the least squares problem is with the QR decomposition of the matrix A where A = QR, with Q an orthogonal m×n matrix, and R an upper-triangular n×n matrix. Modern implementations for general matrices use successive applications of the Householder transform to form QR, though variants based on Givens rotation or Gram-Schmidt orthogonalization are also viable. Such algorithms carry an associated O(mn2 ) timecomplexity. The resulting matrix equation may be rearranged to Rx = QT b and solved via back-substitution for x. Sometimes, the underlying parameters are constrained to be non-negative in order to reflect real-world prior information. When the data is corrupted by noise, the estimated parameters may not satisfy these constraints, producing answers which are not usable. In these cases, it is necessary to explicitly enforce non-negativity, leading to the non-negative least squares (NNLS) problem considered in this paper. The seminal work of Lawson and Hanson [19] provide the first widely used method for solving this non-negative least squares problem. This algorithm, later referred to as the active-set method, partitions the set of parameters or variables into the active and passive-sets. The active-set contains the variables with values forcibly set to zero and which violate the constraints in the problem. The passive-set contains the variables that do not violate the constraint. By iteratively updating a feasibility vector with components from the passive-set, each iteration is reduced to an unconstrained linear least squares sub-problem that is solvable via QR.