رکورد قبلیرکورد بعدی

" Audio-visual tracking of multiple moving speakers "


Document Type : Latin Dissertation
Record Number : 831899
Doc. No : TLets683821
Main Entry : Kilic, V.
Title & Author : Audio-visual tracking of multiple moving speakers\ Kilic, V.Wang, W. ; Kittler, J. ; Barnard, M.
College : University of Surrey
Date : 2016
student score : 2016
Degree : Thesis (Ph.D.)
Abstract : In this thesis, a novel approach is proposed for multi-speaker tracking by integrating audio and visual data in a particle filtering (PF) framework. This approach is further improved for adaptive estimation of two critical parameters of the PF, namely, the number of particles and noise variance, based on tracking error and the area occupied by the particles in the image. Here, it is assumed that the number of speakers is known and constant during the tracking. To relax this assumption, the random finite set (RFS) theory is used due to its ability in dealing with the problem of tracking a variable number of speakers. However, the computational complexity increases exponentially with the number of speakers, so probability hypothesis density (PHD) filter, which is first order approximation of the RFS, is applied with sequential Monte Carlo (SMC), namely particle filter, implementation since the computational complexity increases linearly with the number of speakers. The SMC-PHD filter in visual tracking uses three types of particles (i.e. surviving, spawned and born particles) to model the state of the speakers and to estimate the number of speakers. We propose to use audio data in the distribution of these particles to improve the visual SMC-PHD filter in terms of estimation accuracy and computational efficiency. The tracking accuracy of the proposed algorithm is further improved by using a modified mean-shift algorithm, and the extra computational complexity introduced by mean-shift is controlled with a sparse sampling technique. For quantitative evaluation, both audio and video sequences are required together with the calibration information of the cameras and microphone arrays (circular arrays). To this end, the AV16.3 dataset is used to demonstrate the performance of the proposed methods in a variety of scenarios such as occlusion and rapid movements of the speakers.
Added Entry : Wang, W. ; Kittler, J. ; Barnard, M.
Added Entry : University of Surrey
کپی لینک

پیشنهاد خرید
پیوستها
عنوان :
نام فایل :
نوع عام محتوا :
نوع ماده :
فرمت :
سایز :
عرض :
طول :
TLets683821_80289.pdf
TLets683821.pdf
پایان نامه لاتین
متن
application/pdf
18.85 MB
85
85
نظرسنجی
نظرسنجی منابع دیجیتال

1 - آیا از کیفیت منابع دیجیتال راضی هستید؟