Speaker Change Detection System Using a Hamming Window and the Central Output of the Bi-RNN
Speaker diarization answers the question "Who spoke when?" for an audio recording that contains an unknown number of speakers and an unknown amount of speech. In this study, a deep learning architecture combining an RNN with fully connected layers (FCL) was used to extract speaker boundaries from the audio stream, and the ICSI Meeting Corpus was used for the experiments. To emphasize the information in the central region of each analysis section, the audio data was divided into 3-second segments and a Hamming window was applied to each segment. In addition, the output of the central cell of the RNN was used so that the speaker information on either side of the segment center could be compared effectively. Experiments were conducted with and without the Hamming window and with different methods of selecting the RNN cell output. The highest performance, 73.98%, was obtained by the architecture that applied the Hamming window and used the central output of the RNN.
Keywords: Speaker Change Detection, Bidirectional RNN, Hamming Window
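
The two preprocessing ideas summarized in the abstract, Hamming-windowed 3-second segments and selection of the central RNN output, can be illustrated with a minimal NumPy sketch. Several details here are assumptions and are not stated in the abstract: the 16 kHz sample rate, the use of non-overlapping segments, applying the window directly to raw samples rather than to feature frames, and the names `segment_and_window` and `central_output`, which are hypothetical. The forward and backward outputs are stand-in arrays, not the output of a trained network.

```python
import numpy as np

SAMPLE_RATE = 16_000   # assumed sample rate; not given in the abstract
SEGMENT_SECONDS = 3    # segment length stated in the abstract


def segment_and_window(audio, sample_rate=SAMPLE_RATE, seconds=SEGMENT_SECONDS):
    """Split audio into non-overlapping segments and apply a Hamming window.

    Non-overlapping segmentation is an assumption made for this sketch.
    Trailing samples that do not fill a whole segment are dropped.
    """
    seg_len = sample_rate * seconds
    n_segments = len(audio) // seg_len
    segments = audio[: n_segments * seg_len].reshape(n_segments, seg_len)
    # The Hamming window attenuates the segment edges (to about 0.08)
    # and leaves the central region close to full amplitude.
    window = np.hamming(seg_len)
    return segments * window


def central_output(forward_out, backward_out):
    """Pick the middle time step of a bidirectional RNN's outputs.

    Both inputs have shape (time_steps, hidden_size). Concatenating the two
    directions at the center is an assumption about how they are combined.
    """
    center = forward_out.shape[0] // 2
    return np.concatenate([forward_out[center], backward_out[center]])


# Example with synthetic data: 10 seconds of noise gives three full segments.
audio = np.random.default_rng(0).standard_normal(SAMPLE_RATE * 10)
windowed = segment_and_window(audio)

# Stand-in RNN outputs: 30 time steps, hidden size 4 in each direction.
fwd = np.zeros((30, 4))
bwd = np.ones((30, 4))
features = central_output(fwd, bwd)
```

At the central step, the forward direction has seen the first half of the segment and the backward direction has seen the second half, which is one way to read the abstract's remark that the central output lets the two pieces of speaker information be compared.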