DEPARTMENT OF TELECOMMUNICATIONS AND MEDIA INFORMATICS
Budapest University of Technology and Economics - Faculty of Electrical Engineering and Informatics

List of topics

Computer Vision and Natural Language Processing in Machine Learning
Computer vision (CV) and natural language processing (NLP) are two major subfields of machine learning, and both are active research areas. They overlap in tasks such as generating text from an image (image2text) or an image from text (text2image). A main obstacle to training models for these tasks with supervised learning is the lack of labeled data; one way to overcome it is to follow an unsupervised learning approach. The task of the student(s) is to become familiar with these tasks and to reproduce available solutions, so that they can improve them later. Number of students: 1 - 3. Contact email: alshouha@edu.bme.hu
Computer Vision and Natural Language Processing in Machine Learning
Computer vision (CV) and natural language processing (NLP) are two major subfields of machine learning, and both are active research areas. They overlap in tasks such as generating text from an image (image2text) or an image from text (text2image). A main challenge facing these models (and ML-based models in general) is explaining the model's output, e.g. why a certain object appears in a certain image caption. The task of the student(s) is to become familiar with these tasks and to reproduce available XAI (explainable AI) algorithms, so that they can use them later. Number of students: 1 - 2.
Computer Vision and Natural Language Processing in Machine Learning
Computer vision (CV) and natural language processing (NLP) are two major subfields of machine learning, and both are active research areas. They overlap in tasks such as generating text from an image (image2text) or an image from text (text2image). A new subfield, Story Visualization, has emerged with the help of advances in GANs and diffusion models. The task of the student(s) is to explore the Story Visualization topic by investigating and using state-of-the-art models in the field. Number of students: 1 - 3. Contact email: alshouha@edu.bme.hu
Automatic speech recognition for low-resource languages
Speech recognition technology has been in use for a long time, but recognizing speech accurately remains a very difficult task. In this topic, we mainly use the Conformer-CTC model provided by an open-source toolkit (NeMo) and fine-tune it to achieve better results. If you are interested in automatic speech recognition and have a good foundation in Python, this topic is highly recommended.
Supervisor: Meng Yan