Article title
Single channel speech separation in modulation frequency domain based on a novel pitch range estimation method
Table of contents
Introduction
Modulation frequency analysis
System description
Evaluation
Conclusion
Excerpt from the article
System description
The main goal of the present system is to generate a soft mask for single-channel speech separation in the modulation spectrogram domain. In the proposed system, determining the pitch range of the target and interfering speech is an essential step in generating the separation mask. Once the modulation spectrogram of the speech signal has been computed, the pitch ranges of the target and interfering speakers are determined, and the appropriate mask for speech separation is then calculated. The overall stages of the system are shown in Figure 4.
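The pipeline described above (modulation spectrogram, then pitch ranges, then a soft mask) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names (`stft_mag`, `modulation_spectrum`, `band_energy`, `soft_mask`), the frame and hop sizes, the energy-ratio mask and the synthetic test signals are all assumptions chosen for illustration. The pitch ranges are supplied by hand here rather than estimated with the onset and offset algorithm.

```python
import numpy as np

def stft_mag(x, frame_len=64, hop=8):
    """Magnitude STFT with a Hann window; shape (n_freq, n_frames)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)], axis=1)
    return np.abs(np.fft.rfft(frames, axis=0))

def modulation_spectrum(envelopes, env_rate):
    """FFT along time of each acoustic channel's envelope.

    Returns the modulation frequencies in Hz and the magnitude
    spectrum of shape (n_freq, n_mod).
    """
    centered = envelopes - envelopes.mean(axis=1, keepdims=True)  # drop DC
    mod = np.abs(np.fft.rfft(centered, axis=1))
    mod_freqs = np.fft.rfftfreq(envelopes.shape[1], d=1.0 / env_rate)
    return mod_freqs, mod

def band_energy(mod_freqs, mod, f_lo, f_hi):
    """Energy of each acoustic channel inside one modulation band."""
    band = (mod_freqs >= f_lo) & (mod_freqs <= f_hi)
    return (mod[:, band] ** 2).sum(axis=1)

def soft_mask(mod_freqs, mod, target_range, interferer_range, eps=1e-12):
    """Assumed mask: target share of the energy in the two pitch ranges."""
    e_t = band_energy(mod_freqs, mod, *target_range)
    e_i = band_energy(mod_freqs, mod, *interferer_range)
    return e_t / (e_t + e_i + eps)

# Synthetic mixture (assumption): a 100 Hz harmonic "target" below 1 kHz
# and a 150 Hz harmonic "interferer" between 2.1 and 3 kHz.
fs = 8000
t = np.arange(fs) / fs
target = sum(np.cos(2 * np.pi * 100 * k * t) for k in range(1, 11))
interferer = sum(np.cos(2 * np.pi * 150 * k * t) for k in range(14, 21))
mixture = target + interferer

hop = 8
env = stft_mag(mixture, frame_len=64, hop=hop)
mod_freqs, mod = modulation_spectrum(env, env_rate=fs / hop)

# Hand-picked pitch ranges (Hz) standing in for the estimated ones.
mask = soft_mask(mod_freqs, mod, (90, 110), (140, 160))
```

In this simplified form the mask holds one weight per acoustic frequency channel for the whole analysis block. Channels whose envelopes fluctuate at the target's pitch rate get weights near 1, and channels dominated by the interferer get weights near 0. A full system would compute such weights frame by frame and apply them to the mixture before resynthesis.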
Single channel speech separation in modulation frequency domain based on a novel pitch range estimation method
Azar Mahmoodzadeh1, Hamid Reza Abutalebi1*, Hamid Soltanian-Zadeh2,3 and Hamid Sheikhzadeh4
Abstract
Computational Auditory Scene Analysis (CASA) has been a focus of recent literature on speech separation from monaural mixtures. The performance of current CASA systems on voiced speech separation depends strictly on the robustness of the algorithm used for pitch frequency estimation. We propose a new system that estimates the pitch (frequency) range of a target utterance and separates the voiced portions of the target speech. The algorithm first estimates the pitch range of the target speech in each frame of data in the modulation frequency domain, and then uses the estimated pitch range to segregate the target speech. The pitch range estimation method is based on an onset and offset algorithm. Speech separation is performed by filtering the mixture signal with a mask extracted from the modulation spectrogram. A systematic evaluation shows that the proposed system extracts the majority of the target speech signal with minimal interference and outperforms previous systems in both pitch extraction and voiced speech separation.
Keywords: acoustic frequency, modulation frequency, onset and offset algorithm, pitch range estimation, speech separation
1. Introduction
Speech separation, as a solution to the cocktail party problem, is a well-known challenge with important applications. For instance, telecommunication systems and Automatic Speech Recognition systems lose performance in the presence of interfering sounds [1,2]. An effective system that segregates speech from interference in monaural (single-microphone) situations would be valuable for such problems. Many methods have been proposed for monaural speech enhancement; for example, see [3-7].
These methods usually assume certain statistical properties of the interference and tend to lack the capacity to deal with a variety of interferences. While monaural speech separation remains difficult for such methods, the human auditory system performs it proficiently. This perceptual process is known as Auditory Scene Analysis (ASA) [5]. Psychoacoustic research in ASA has inspired considerable work on developing Computational Auditory Scene Analysis (CASA) systems for speech separation (see [6,7] for a comprehensive review).