A Phoneme-Based Deepfake Detection System
What happens if we can no longer trust our senses? There is a general concern that Artificial Intelligence (AI) and Machine Learning (ML) algorithms could soon create extraordinarily realistic fake videos that trick our eyes and ears. Thinking on a counter-response, we propose a new deepfake detection system based on phonemes, their transcribed text, the associated mouth movements, and video-extracted features. As a proof-of-concept, we create a Brazilian Portuguese deepfake detection system using three presidential candidates of the 2022 elections and one of the authors. We also present a novel dataset of authentic and fake videos from these four individuals mixing their identities, which we used to extract features and feed our classification methods. Our classification methods achieved satisfactory results when authenticating (or not) testing videos that contain at least one of the trained phonemes from the Brazilian Portuguese language. In conclusion, we support the hypothesis that deepfake detection is possible due to the lack of expression in the target's mouth, especially in non-English language fake videos. And developing a deepfake detection system with individual-guided classification models may help authenticate videos of celebrities or politicians in future and online events.
Keywords - Deepfake, Phoneme, Portuguese, Brazil.