Audio-Visual Bimodal Fusion

    In human-computer interaction (HCI), the audio and visual modalities form the core of the interface: a computer equipped with such an interface can listen to and see the user, and can provide audio-visual feedback in return. Research on audio-visual integration is therefore essential for building such bimodal human-computer interfaces.

    Human speech is bimodal in nature: the audio speech signal is the acoustic waveform produced by the speaker, while the visual speech signal comprises the accompanying movements of the lips, tongue, and other facial muscles.

    In this project, we are developing a new audio-visual bimodal model that attempts to capture the inter-correlations, as well as the loose time synchronicity, between the audio and visual speech streams.
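
    To make the fusion setting concrete, the sketch below shows the common feature-level (early) fusion baseline that bimodal models of this kind are usually compared against: the visual stream is interpolated up to the audio frame rate and concatenated with the audio frame by frame, with an optional fixed lag as a crude stand-in for the loose audio-visual asynchrony. The frame rates, feature dimensions, and function names are illustrative assumptions, not the project's actual configuration.

        # Minimal feature-level (early) audio-visual fusion baseline.
        # All dimensions and rates below are illustrative assumptions.
        import numpy as np

        AUDIO_FPS = 100  # e.g., acoustic feature frames at a 10 ms hop
        VIDEO_FPS = 25   # typical camera frame rate

        def upsample_visual(visual, target_len):
            """Linearly interpolate visual features up to the audio frame
            rate so the two streams can be concatenated frame by frame."""
            src = np.linspace(0.0, 1.0, num=visual.shape[0])
            dst = np.linspace(0.0, 1.0, num=target_len)
            return np.stack(
                [np.interp(dst, src, visual[:, d]) for d in range(visual.shape[1])],
                axis=1,
            )

        def early_fusion(audio, visual, lag_frames=0):
            """Concatenate each audio frame with the (optionally lagged)
            visual frame; a nonzero lag crudely models the loose time
            synchronicity between the two modalities."""
            visual_up = upsample_visual(visual, audio.shape[0])
            visual_up = np.roll(visual_up, lag_frames, axis=0)
            return np.concatenate([audio, visual_up], axis=1)

        if __name__ == "__main__":
            rng = np.random.default_rng(0)
            audio = rng.standard_normal((AUDIO_FPS, 13))  # 1 s of 13-dim acoustic frames
            visual = rng.standard_normal((VIDEO_FPS, 6))  # 1 s of 6-dim lip features
            fused = early_fusion(audio, visual, lag_frames=2)
            print(fused.shape)  # (100, 19): 13 audio + 6 visual dims per frame

    A model that captures inter-correlations goes beyond this baseline by learning how the two streams co-vary rather than simply stacking them, but the frame-rate alignment step above is needed in either case.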

Contact Us:

Professor Helen Meng: hmmeng@se.cuhk.edu.hk
Dr. Zhiyong WU: zywu@se.cuhk.edu.hk