Budapesti Műszaki és Gazdaságtudományi Egyetem - Villamosmérnöki és Informatikai Kar

Témák listája

Explainable Deep Learning Models for Text-to-Speech Conversational AI
Conversational AI uses machine learning to develop speech-based apps that allow humans to interact naturally with devices, machines, and computers using audio. Several deep learning models are connected to a pipeline to build a conversational AI application. This project aims to study and refine the TTS part in one Conversational AI toolkit (for example, NVIDIA NeMo or SpeechBrain).
Exploring Efficient Neural Architectures for Text-to-Speech Synthesis
Text-to-Speech (TTS) is a comprehensive technology that involves many disciplines such as acoustics, signal processing, and machine learning. This project focuses on developing a deep-learning model designed to provide a high-quality TTS system. The student's task is mapping from linguistic to acoustic features with various deep neural networks. Students must evaluate the updated system from different aspects, including intelligibility, naturalness, and preference for synthetic speech.