Human-Computer Communications Laboratory

Overview

The Human-Computer Communications Laboratory (HCCL) was established in 1999. Our vision is to leverage the powerful confluence of massive computing, communication and content to derive intelligence in a form that humans can effectively access, visualize and use. Our mission is to foster interdisciplinary research and education in human-centric information systems. Our study covers how interactive and intelligent human-computer interfaces to information should be designed and realized, so that users can accomplish their desired tasks in smart, effective and efficient ways.

Guided by our mission, HCCL supports research areas including but not limited to: speech recognition, spoken language understanding, speech generation and synthesis, conversational systems development, audio information processing, multimodal and multimedia interface development, multi-biometric authentication, intelligent agents, mobile computing and e-learning.

Research

HCCL's research areas include, but are not limited to: speech recognition, spoken language understanding, speech generation, conversational systems development, audio information processing, multimodal and multimedia interface development, intelligent agents, mobile computing, and computer-supported collaborative work.

Systems and Demos

Enunciate: An online computer-aided pronunciation training (CAPT) system (2017)
The growing number of second language (L2) learners creates a large demand for language learning resources. It is estimated that the number of English learners in China and India is over 500 million, which is greater than the combined population of English-speaking countries. This creates a serious shortage of professional English teachers. A computer-aided pronunciation training (CAPT) system is one of the best approaches to meeting this demand. The aim of CAPT is to provide an automatic system, built on automatic speech recognition and diagnosis technology, that helps learners practise a language around the clock.
Voice Conversion (2017)
Voice Conversion (VC) aims to modify a source speaker’s speech so that it resembles the target speaker’s speech. Real-time VC has wide applicability, e.g. customized feedback in computer-aided pronunciation training systems, personalized speaking aids for speech-impaired subjects, movie dubbing with various actors’ voices, etc.
CU CRYSTAL: Highly Natural Chinese Speech Synthesis with a Talking Head (2009)
We have developed Crystal, a text-to-audiovisual-speech synthesizer that can automatically generate a cartoon talking head based on textual input. This avatar can speak in Cantonese or Putonghua. We are working on improving the naturalness of the avatar, in terms of both its spoken expressions and its facial expressions and articulatory gestures. This exciting project has many applications, e.g. electronic books, reading aids for the visually impaired, language learning, etc.
CU VOCAL: A Corpus-based Approach for Highly Natural Cantonese Text-to-Speech Synthesis (2006)
Our original objective was to develop a domain-optimized speech generation engine across Chinese dialects (focusing on Cantonese and Putonghua) as a component technology in Spoken Dialog Systems. We have successfully designed and implemented a novel approach for this task, and applied it to domains such as foreign exchange, stocks and financial news. In the latest development of the CU Vocal project, we are now able to synthesize Cantonese speech from domain-independent text. The naturalness of the synthesized speech is our main concern.
HCCL’s CU FOREX system — Live Demonstration!
CU FOREX is a bilingual conversational hotline for foreign exchange enquiries. You can try talking your way to our real-time Reuters satellite feed!
HCCL’s Natural Language Interface
This is a natural language interface that demonstrates language understanding of air travel information queries.
HCCL’s Speech Generation System
Listen to our speech generation for Putonghua and Cantonese! It reads out the most up-to-date foreign exchange rates!
Trilingual Audio Search Engine
ASE is a multilingual audio search engine on the web that supports Cantonese, English and Putonghua speech retrieval. Users can use a textual query to retrieve audio and video documents.
Semantic Processing
The semantic processing system can perform bilingual natural language understanding (NLU) for user queries in the financial domain. It uses a unified approach to handle both English and Chinese natural language queries. The system can derive complete interpretations of queries and automatically retrieve real-time stock quotes from Reuters. It can also be integrated with speech recognition and synthesis technologies.

Publications

Full List of Publications

Resources

High-end PCs and Workstations
Audio Equipment
Quiet recording room
Special-purpose software tools

People

Prof. MENG, Mei Ling Helen (Lab Coordinator)
Prof. LAM, Wai
