Enhancing Speaker Recognition Robustness with Scalable Deep Learning Models and MFCC Features
Abstract
Speaker recognition is the process of distinguishing between speakers in recorded or streamed audio. Several factors make the task difficult, including variations in structure, overlapping sound events, and the presence of multiple noise sources in the recorded signal. Although many algorithms have been developed to extract identifying information, capturing speaker-specific attributes from an often intricate sound mixture remains difficult for machines. Earlier methods used discriminative models to decode voice data, but with increasing computational capability, generative models are gaining ground. While these models work for various types of speech that lack clear transitions or clarity, their scalability is questionable. To address this issue, this paper trains deep learning models such as the Feed Forward Neural Network (FFNN), Forward Cascade Back Propagation (FCBP), and Elman Propagation Neural Network (EPNN) on different databases in a way that addresses the scalability problems of the models.
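The three network families named in the abstract differ mainly in how information reaches the output layer. The sketch below is illustrative only and is not the authors' implementation. It assumes that FCBP refers to a cascade-forward network (the input also connects directly to the output layer) and that EPNN refers to an Elman recurrent network (the hidden layer receives its own previous state). The layer sizes, the 13-coefficient frame width, the four-speaker output, and all function names are hypothetical choices made for the example, and the weights are random rather than trained.

```python
import numpy as np

def init_params(n_in, n_hidden, n_out, seed=0):
    """Random, untrained weights shared by the three sketches (hypothetical sizes)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0, 0.1, (n_hidden, n_in)),       # input -> hidden
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0, 0.1, (n_out, n_hidden)),      # hidden -> output
        "b2": np.zeros(n_out),
        "Wskip": rng.normal(0, 0.1, (n_out, n_in)),       # input -> output (cascade only)
        "Wctx": rng.normal(0, 0.1, (n_hidden, n_hidden)), # previous hidden -> hidden (Elman only)
    }

def softmax(z):
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def ffnn_forward(p, x):
    """Feed-forward: one feature vector passes through a single hidden layer."""
    h = np.tanh(p["W1"] @ x + p["b1"])
    return softmax(p["W2"] @ h + p["b2"])

def cascade_forward(p, x):
    """Cascade-forward: as above, plus a direct input-to-output connection."""
    h = np.tanh(p["W1"] @ x + p["b1"])
    return softmax(p["W2"] @ h + p["Wskip"] @ x + p["b2"])

def elman_forward(p, frames):
    """Elman: the hidden state is carried from frame to frame as context."""
    h = np.zeros(p["b1"].shape)
    for x in frames:
        h = np.tanh(p["W1"] @ x + p["Wctx"] @ h + p["b1"])
    return softmax(p["W2"] @ h + p["b2"])  # speaker scores after the last frame

if __name__ == "__main__":
    p = init_params(n_in=13, n_hidden=8, n_out=4)
    # Stand-in for 20 frames of 13 MFCC coefficients; real features would come from audio.
    frames = np.random.default_rng(1).normal(size=(20, 13))
    print(ffnn_forward(p, frames[0]))
    print(cascade_forward(p, frames[0]))
    print(elman_forward(p, frames))
```

Each function returns a probability vector over the hypothetical speaker classes. The feed-forward and cascade-forward sketches score one frame at a time, while the Elman sketch consumes the whole frame sequence.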

This work is licensed under a Creative Commons Attribution 4.0 International License.