Information Systems is about data-intensive computing for information processing and intelligence extraction to enable better decision-making and execution for complex systems in our changing society.
To leverage today’s rapidly advancing technology, new generations of algorithms and technologies are applied. Systems engineers are trained with solid computing and programming knowledge for analysing and mining data, building large-scale analytic models (both stochastic and deterministic), creating algorithms for solving problems, executing large-scale simulation models, and allowing users to easily visualize and manipulate the data.
- Audio Search Engines
- Bi-directional English-Chinese Machine Translation
- Computer-Aided Second Language Learning through Speech-based Human-Computer Interactions
- Graph Database
- Highly Natural Chinese Speech Synthesis with a Talking Head
- Information Mining and Discovery from Text Data
- Integration of Classification and Pattern Mining: A Discriminative and Frequent Pattern-based Approach
- Learning Text Categorization and Classification
- Multi-modal and Trilingual Spoken Dialog Systems
- Network Informal Language Processing
- Querying Large Evolving Graphs
- Social Media and e-Community Analysis
- Temporal Information Extraction and Processing
Audio Search Engines
Audio search engines enable us to search through the mass of audio information that is available on the internet, e.g. audio tracks of video, radio broadcasts, meeting recordings, etc. This project combines speech processing and information retrieval technologies to facilitate audio search and retrieval. It supports features such as automatic segmentation of hours of audio into individual stories, retrieval of Chinese spoken recordings based on textual input queries, and cross-language English-Chinese spoken document retrieval.
Bi-directional English-Chinese Machine Translation
We have developed one of the first bi-directional English-Chinese Machine Translation systems using semi-automatically generated grammars. The same system can automatically generate the Chinese translation of an input English query as well as the English translation of an input Chinese query. Grammars are derived semi-automatically using a data-driven technique.
Computer-Aided Second Language Learning through Speech-based Human-Computer Interactions
This is a new initiative that aims to develop speech and language technologies to support second language learning, especially for Chinese learners of English. We are developing an automatic speech recognizer that can detect and diagnose the learners’ pronunciation errors, in order to automatically generate corrective feedback that is helpful for the user. Text-to-speech synthesis technologies are also developed to provide spoken feedback. This project brings together the fields of engineering, linguistics and education. It opens up new opportunities in the area of e-learning and collaborative learning using next-generation web technologies. Please see www.se.cuhk.edu.hk/hccl/languagelearning
Graph Database
With the rapid growth of the Internet and Web technology, information is becoming ever more pervasive and important. Demand keeps increasing for database management systems that provide more effective mechanisms, as shown by current research on supporting Web, semi-structured and XML applications in database systems. This project aims to provide advanced techniques for handling graph query processing, indexing, and storage management effectively and efficiently on large graph datasets.
Highly Natural Chinese Speech Synthesis with a Talking Head
We have developed Crystal, a text-to-audiovisual-speech synthesizer that can automatically generate a cartoon talking head based on textual input. This avatar can speak in Cantonese or Putonghua. We are working on improving the naturalness of the avatar, in terms of both its spoken expressions and its facial expressions and articulatory gestures. This exciting project has many applications, e.g. electronic books, reading aids for the visually impaired, language learning, etc. Please see www.se.cuhk.edu.hk/crystal
Information Mining and Discovery from Text Data
A massive amount of information is stored as text. Such text may be unrestricted natural language and may come from different domains. Some texts are semi-structured, such as Web pages. This project aims to develop new models for discovering new, previously unknown information that is useful to people or for the further construction of intelligent systems. Techniques drawn from machine learning, natural language processing, and information retrieval are investigated.
Integration of Classification and Pattern Mining: A Discriminative and Frequent Pattern-based Approach
Many existing classification methods assume the input data is in a feature vector representation. However, in many tasks, the predefined feature space is not discriminative enough to distinguish different classes. More seriously, in many other applications, the input data has no predefined feature vector, such as transactions, sequences, graphs, and semi-structured data. For both scenarios, a primary challenge is how to construct a discriminative and compact feature set. Besides the widely investigated machine learning and statistical approaches, frequent pattern mining can be considered as another approach. The direction is interesting because frequent patterns are usually statistically significant and semantically meaningful. The objective of this project is to use discriminative frequent patterns to characterize complex structural data and thus enhance the classification power. I developed a framework of discriminative frequent pattern-based classification which could lead to a highly accurate, efficient and interpretable classifier on complex data.
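The idea of turning frequent patterns into classification features can be illustrated with a minimal sketch. It assumes transactions are sets of items with binary labels, enumerates itemsets by brute force (a real system would use a dedicated frequent pattern miner), and uses a simple difference in per-class support as a stand-in for a discriminative measure. The function names, toy data and thresholds are hypothetical and are not taken from the project.

```python
from collections import Counter
from itertools import combinations


def frequent_patterns(transactions, min_support, max_size=2):
    """Count every itemset of up to max_size items and keep the frequent ones."""
    counts = Counter()
    for items in transactions:
        for k in range(1, max_size + 1):
            for pattern in combinations(sorted(items), k):
                counts[pattern] += 1
    return {p: c for p, c in counts.items() if c >= min_support}


def discriminative_score(pattern, transactions, labels):
    """Absolute difference between the pattern's relative support in
    class 1 and in class 0 (an assumed, deliberately simple measure)."""
    pos = [t for t, y in zip(transactions, labels) if y == 1]
    neg = [t for t, y in zip(transactions, labels) if y == 0]

    def support(group):
        return sum(set(pattern) <= t for t in group) / len(group)

    return abs(support(pos) - support(neg))


def to_features(transaction, patterns):
    """Binary feature vector: 1 if the transaction contains the pattern."""
    return [int(set(p) <= transaction) for p in patterns]


# Hypothetical toy data: four transactions, two per class.
transactions = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"c"}]
labels = [1, 1, 0, 0]

frequent = frequent_patterns(transactions, min_support=2)
selected = sorted(
    p for p in frequent
    if discriminative_score(p, transactions, labels) >= 0.75
)
feature_rows = [to_features(t, selected) for t in transactions]
```

The resulting `feature_rows` can be passed to any standard classifier. Because each feature corresponds to a readable pattern, the learned model stays interpretable.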
Learning Text Categorization and Classification
One important and useful basic component in text mining is automatic text categorization. Text categorization has many applications, including intelligent document routing and knowledge management. It is more challenging than ordinary classification problems because of high dimensionality, text feature extraction, and skewed class distributions. We have been developing new algorithms for this problem. In addition to algorithmic progress, we intend to seek a more realistic model for capturing the inherent properties of text classification.
Multi-modal and Trilingual Spoken Dialog Systems
We are developing distributed spoken dialog systems that support the languages of Hong Kong (Cantonese, Mandarin and English) as well as human-computer interactions using portable PDAs and smart phones connected over a wireless network. Our systems accept multimodal input via speech, handwriting and pointing; and they deliver multimedia output involving text, audio and video. Users can use these systems for information access in the travel and financial domains. Our systems integrate a plethora of technologies involving speech recognition, natural language understanding, multi-modal dialog modelling and speech synthesis.
Network Informal Language Processing
Network Informal Language (NIL) refers to the language commonly used on the Internet for real-time information exchange, such as over ICQ, MSN, etc. NIL is very different from natural language. It is dynamic and anomalous in nature. We propose to use a machine learning approach to acquire new vocabulary and grammar rules from a proprietary NIL corpus. Understanding NIL would enable us to analyse the behaviour of Internet users. This in turn could be applied to commercial applications, such as customer relationship management.
Querying Large Evolving Graphs
The data available on the Web has increased significantly over the years and will continue to grow. We consider the Web an example of a large evolving graph, because its content and link structure change dynamically over time. Like the Web, many other large graphs in real applications change dynamically over time, for example social networks. In this project, we study a large evolving graph, which is a sequence of graphs (G1, G2, G3) such that Gj evolves from Gi if Gi appears before Gj. We focus on providing new effective and efficient mechanisms for users to understand the structural perspectives and how these structural perspectives evolve or change over time. The importance of this project rests on the fact that users want to understand changes in terms of structural perspectives in a global sense, rather than on the basis of individual documents and Web pages that contain certain keywords. Users also want to know how things that are not obviously related are in fact related, and how some have an impact on others.
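The notion of a graph sequence can be made concrete with a minimal sketch. It assumes each graph in the sequence is stored as a set of edges and reports which edges were added or removed between consecutive snapshots. The representation, the function name `edge_changes` and the toy data are illustrative assumptions, not the project's actual data model or query mechanism.

```python
from typing import FrozenSet, List, Tuple

Edge = Tuple[str, str]
Snapshot = FrozenSet[Edge]


def edge_changes(
    snapshots: List[Snapshot],
) -> List[Tuple[FrozenSet[Edge], FrozenSet[Edge]]]:
    """For each consecutive pair of snapshots, return (added, removed) edges."""
    changes = []
    for earlier, later in zip(snapshots, snapshots[1:]):
        added = later - earlier      # edges present only in the later graph
        removed = earlier - later    # edges present only in the earlier graph
        changes.append((added, removed))
    return changes


# Hypothetical sequence of three snapshots G1, G2, G3.
g1 = frozenset({("a", "b"), ("b", "c")})
g2 = frozenset({("a", "b"), ("b", "c"), ("c", "d")})
g3 = frozenset({("a", "b"), ("c", "d")})

changes = edge_changes([g1, g2, g3])
```

Storing full snapshots is the simplest choice for illustration. For large graphs it would likely be more economical to store one base graph plus the per-step differences computed above.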
Social Media and e-Community Analysis
Facebook, Twitter, LinkedIn, etc. are popular social media. Today, they are widely used for sharing opinions on different targets, e.g. services, products, politics, etc. Social media is becoming an indispensable means of communication in our daily life. Unlike traditional communication, social media provides a platform where people are connected together to form e-communities. Hence, social media brings significant advances to our understanding of social behaviors, and the study of social media is of great importance in sociology, biology, and computer science. The core element in social media is the notion of e-community, which serves the roles of an information generator and propagator, as well as a relationship manager. There is, therefore, a growing research interest in understanding e-communities, which is the target of our research team.
Temporal Information Extraction and Processing
Temporal information describes changes and the times at which they occur. It is regarded as equally important, if not more so, in applications such as extracting and tracking information over time or planning and evaluating activities. Conventional information systems may maintain and manipulate the occurrence time of events, but they may not be able to handle users’ queries concerning how one event relates to another in time. In this project, we investigate techniques in natural language processing for extracting temporal information from a document and, based on the extracted information, develop techniques in temporal logic inference.
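A query about how one event relates to another in time can be sketched as follows. The example assumes each extracted event has already been reduced to a closed interval with integer start and end times, and it uses a simplified set of interval relations loosely in the spirit of Allen's interval algebra. The `Event` class, the relation names and the sample events are hypothetical and do not describe the project's actual inference method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    name: str
    start: int  # e.g. hour of day; the time unit is an assumption
    end: int


def relation(a: Event, b: Event) -> str:
    """Return how event a relates to event b in time."""
    if a.end < b.start:
        return "before"
    if b.end < a.start:
        return "after"
    if a.start == b.start and a.end == b.end:
        return "equal"
    if b.start <= a.start and a.end <= b.end:
        return "during"
    if a.start <= b.start and b.end <= a.end:
        return "contains"
    return "overlaps"  # any remaining partial overlap


# Hypothetical events that an extraction step might produce.
meeting = Event("meeting", 9, 10)
lunch = Event("lunch", 12, 13)
conference = Event("conference", 8, 17)
call = Event("call", 9, 11)
review = Event("review", 10, 12)

relations = {
    ("meeting", "lunch"): relation(meeting, lunch),
    ("meeting", "conference"): relation(meeting, conference),
    ("conference", "lunch"): relation(conference, lunch),
    ("call", "review"): relation(call, review),
}
```

A conventional system that stores only occurrence times would hold the `start` and `end` values. The `relation` function is the extra step needed to answer queries about how events relate to one another.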