Information Systems

AICAMS Prototype Development

K.P. Lam

AICAMS (Artificial Intelligence Crime Analysis and Management System) is an ongoing collaborative project between the Hong Kong Police Force and the Chinese University of Hong Kong. The project aims to apply artificial intelligence, knowledge-based and map-based technologies to improve detection rates for reported crime and to reduce the incidence of crime. Work on a pilot system commenced in Tuen Mun in April 1997 and is proceeding to a functional prototype by February 1999.


A Bayesian Framework for an Intelligent Information Filtering System under Dynamic Environments

W. Lam

Due to the explosive growth of information sources such as the WWW and numerous electronic document collections, information filters are essential for users to sift out relevant items from incoming information. The objective of this project is to develop an intelligent information filtering system based on a Bayesian framework for dynamic environments. A number of issues are involved in developing a filtering system; in particular, the information, such as text-based documents, is raw and unstructured. The researcher proposes an intelligent information filtering system, based on a Bayesian framework, offering the following features:
1) intelligent user feedback handling;
2) supporting the dynamic nature of users;
3) supporting the dynamic nature of documents;
4) content-based filtering.
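
As a rough illustration of features 1 and 4, the following is a minimal sketch of a content-based naive Bayes filter that updates its word statistics each time a user marks a document as relevant or not. It is not the project's actual framework: the class name `FeedbackFilter`, the add-one smoothing, and the whitespace tokenization are all assumptions made for the example.

```python
import math
from collections import Counter


class FeedbackFilter:
    """Hypothetical naive Bayes filter, updated one feedback item at a time."""

    def __init__(self):
        # Word and document counts for relevant (True) and irrelevant (False) items.
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}

    def feedback(self, text, relevant):
        """Record one user judgement; the model changes immediately."""
        self.word_counts[relevant].update(text.lower().split())
        self.doc_counts[relevant] += 1

    def log_odds(self, text):
        """Log-odds that `text` is relevant, with add-one smoothing."""
        vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        v = len(vocab) or 1
        n_rel = sum(self.word_counts[True].values())
        n_irr = sum(self.word_counts[False].values())
        score = math.log((self.doc_counts[True] + 1) / (self.doc_counts[False] + 1))
        for w in text.lower().split():
            p_rel = (self.word_counts[True][w] + 1) / (n_rel + v)
            p_irr = (self.word_counts[False][w] + 1) / (n_irr + v)
            score += math.log(p_rel / p_irr)
        return score

    def is_relevant(self, text):
        return self.log_odds(text) > 0
```

Because every call to `feedback` changes the counts, the same document can be scored differently as a user's interests drift, which is one simple way to picture the "dynamic nature of users" mentioned above.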


An Inference Network for Knowledge Representation and Reasoning

K.P. Lam and W. Lam

The main goals of this project are to use artificial intelligence for knowledge representation and to model human commonsense reasoning. A prototype of the knowledge system shell will be engineered and implemented. This project takes a hybrid approach, combining the computational efficiency of numerical networks with more human-like approximate symbolic reasoning. The network is built on a four-truth-value paradigm with degrees of belief. It can accept inconsistent knowledge through a set of network update operators. The consistency reasoning algorithm ensures that all knowledge states are unique and consistent.


Audio Search Engines

H. Meng

Audio search engines enable us to search through the mass of audio information that is available on the internet, e.g. audio tracks of video, radio broadcasts, meeting recordings, etc. This project combines automatic speech recognition and information retrieval technologies to achieve audio searching. Our system can support monolingual searches in which both queries and spoken documents are in Cantonese, Mandarin or English. Our system can also support cross-language searches in which the language of the query differs from that of the spoken documents.


Automatic Text Categorization and Classification

W. Lam

One important and useful basic component in text mining is automatic text categorization. Text categorization has many applications, including intelligent document routing and knowledge management. It is more challenging than ordinary classification problems because of high dimensionality, the need for text feature extraction, and skewed class distributions. We have been developing new algorithms for this problem. In addition to algorithmic progress, we intend to seek a more realistic model for capturing the inherent properties of text classification.


Bi-directional English-Chinese Machine Translation

H. Meng

We have developed one of the first bi-directional English-Chinese Machine Translation systems using semi-automatically generated grammars. The same system can automatically generate the Chinese translation of an input English query as well as the English translation of an input Chinese query. Grammars are derived semi-automatically using a data-driven technique.


Category Tree Integration

C. C. Yang

Organizations and information providers widely use classification to organize, archive, and access large volumes of documents. Document classification organizes a large document collection into distinct groups of similar documents and identifies hidden themes for each group. Different information providers use different classifications for the information they provide, so individual users have to be familiar with the structures of all the providers' category trees before they can effectively search and aggregate information from multiple providers. At the same time, individual users have their own classifications for managing the information they have collected. In this project, we propose an automatic technique to integrate the categories from the source category trees into the personalized master category tree of each user. The technique can expand the personal category tree from a source category tree when necessary. Users will no longer need to browse through multiple category trees from different information providers, only their own. The personal master category trees will also improve as the aggregated information continues to expand.


Digital Library and Next Generation Internet

J. Yu, C.C. Yang and W. Lam

In order to make Hong Kong the Internet hub and electronic commerce center in this region, four key factors are needed: high-bandwidth Internet infrastructure, rich content, sufficient applications, and flexible interfaces. We will develop a multimedia digital library, the Multi-lingual Informedia, with Carnegie Mellon University (CMU), Academia Sinica of Taiwan, and South China University of Technology, to study how state-of-the-art Internet, multimedia, and content management technologies can support the creation of these four key factors.


Cross-lingual Information Retrieval

C. C. Yang

Cross-lingual information retrieval refers to the ability to process a query for information in one language, search a collection of objects (including text, images, audio files, etc.), and return the most relevant objects, translated into the user's language if necessary. The corpus-based approach has been shown to overcome the shortcomings of the dictionary-based approach by using statistical information about term usage in parallel or comparable corpora to construct an automatic thesaurus. In this project, we investigate automatic techniques to extract a cross-lingual concept space from a multilingual parallel corpus. In addition, by investigating the query log files of Internet search engines, we extract associations between queries in multiple languages.
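
To make the idea of using term statistics from a parallel corpus concrete, here is a minimal sketch that scores cross-lingual term pairs by how often they co-occur in aligned sentence pairs. The Dice coefficient, the function name `dice_associations`, and the romanized toy sentences are assumptions chosen for illustration; the source does not specify which association measure the project uses.

```python
from collections import Counter
from itertools import product


def dice_associations(aligned_pairs):
    """Score (source term, target term) pairs over aligned sentences.

    Dice(s, t) = 2 * co-occurrences(s, t) / (freq(s) + freq(t)),
    where frequencies count the aligned pairs in which a term appears.
    """
    src_freq, tgt_freq, joint = Counter(), Counter(), Counter()
    for src_sentence, tgt_sentence in aligned_pairs:
        src_terms = set(src_sentence.split())
        tgt_terms = set(tgt_sentence.split())
        src_freq.update(src_terms)
        tgt_freq.update(tgt_terms)
        joint.update(product(src_terms, tgt_terms))
    return {
        (s, t): 2 * n / (src_freq[s] + tgt_freq[t])
        for (s, t), n in joint.items()
    }


# Toy aligned corpus (English / romanized Chinese), purely illustrative.
pairs = [
    ("stock market", "gupiao shichang"),
    ("stock price", "gupiao jiage"),
]
assoc = dice_associations(pairs)
```

Term pairs that always appear together, such as `("stock", "gupiao")` in the toy corpus, receive the highest score, and those scores could serve as weighted links in an automatically built thesaurus.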


Event Evolution

C. C. Yang

In topic detection and tracking, news stories are monitored and automatic techniques are used to spot new events and track the progress of previously spotted events. However, the traditional document clustering techniques only organize the news stories into a flat hierarchical structure. Such an organization is not capable of presenting the evolution and complex relationships between news events. Users are not able to capture how the events begin, evolve and end without a graphical evolution network. An event evolution graph is effective in presenting the rich underlying structure of events and allows efficient and meaningful information browsing. As a result, users are not only able to retrieve news stories of the same topic but also able to capture their evolution. In this project, we represent the event evolution relationships among events in an event evolution graph, where event evolution includes event threading and event joining. We investigate the features and techniques to determine the event evolution relationship.


Extracting Topic Hierarchy from Web sites

C. C. Yang

Modeling a web site's content structure is useful for various web site processing tasks, including navigation and classification. Hierarchical models are commonly used to organize a web site's content. A web site's content structure can be represented by a topic hierarchy: a directed tree rooted at the web site's homepage, in which the vertices and edges correspond to web pages and hyperlinks. In this work, we investigate several algorithms to extract a web site's topic hierarchy by analyzing the semantic relationships between web pages based on link structure, web page content, and directory structure.
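
One simple baseline for turning a link graph into a tree rooted at the homepage is a breadth-first traversal, in which each page's parent is the first page found to link to it at the shortest click distance from the homepage. The sketch below shows only this baseline; it is not one of the algorithms investigated in the project, it ignores page content and directory structure, and the function name and sample URLs are hypothetical.

```python
from collections import deque


def bfs_topic_tree(links, home):
    """Return a {page: parent} map describing a tree rooted at `home`.

    `links` maps each page to the list of pages it links to.
    """
    parent = {home: None}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in parent:  # first (shortest-path) discovery wins
                parent[target] = page
                queue.append(target)
    return parent


# Hypothetical link structure of a small site.
links = {
    "/": ["/research", "/people"],
    "/research": ["/research/is", "/"],
    "/people": ["/research/is"],
}
tree = bfs_topic_tree(links, "/")
```

Here `/research/is` is attached under `/research` rather than `/people` because `/research` is visited first; a content-aware method would instead choose the parent by semantic relatedness.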


Highly Natural Cantonese Speech Synthesis with a Talking Head

H. Meng

We have developed CU VOCAL, a Cantonese text-to-speech synthesizer that can automatically generate highly natural Cantonese speech based on input Chinese text. CU VOCAL can also be integrated with a digital avatar with automatically generated facial animation that is synchronized with the synthetic speech.


Information Mining and Analysis for E-Commerce

J. Yu

The project aims to establish a new platform for information mining and analysis for E-commerce in the Financial Engineering Laboratory (FEL). The focus is to systematically discover previously unknown knowledge, such as trading rules, from the eight large financial databases that we can fully access in the FEL, using different combinations of up-to-date on-line analytical processing (OLAP) and data mining techniques.


Internet Privacy and Security Issues

C.H. Cheng

Malicious use of Internet resources may be a threat to individual privacy and security. In this research, we examine the possibility of profiling individuals using Internet tools commonly used by the community. Through this study, we hope to understand how the profiling on individuals is done, how accurate the information is, and what individuals may do to protect their privacy and security.


Knowledge Discovery

W. Lam, H. Meng and J. Yu

This project focuses on automated or semi-automated learning from data and texts, and the transformation of learned theories into knowledge representation formalisms. We expect to develop the theory and techniques for partial or full automation of the time-consuming process of expert knowledge elicitation through automatic knowledge discovery or learning from data. We aim not only at the accuracy and effectiveness of the learned information, but also at improving the level and depth of the knowledge discovered.


Multilingual Knowledge Management

C. C. Yang

Knowledge management applications involve generating, consuming, and maintaining tremendous amounts of information. Efficient and effective management of a continuously increasing volume of documents is essential so that users can obtain the knowledge they need to accomplish their tasks. Text categorization deals with the automatic learning of a text categorization model from a training set of preclassified documents on the basis of their contents, and with the assignment of uncategorized documents to appropriate categories. Most existing text categorization techniques deal with monolingual documents (i.e., all documents are written in one language) during both model learning and category assignment (or prediction). However, with the globalization of business environments and advances in Internet technology, an organization or individual often generates or acquires, and subsequently archives, documents in different languages, thus creating the need for cross-lingual text categorization. Motivated by this need, this research investigates cross-lingual text categorization techniques.


Multi-modal and Trilingual Spoken Dialog Systems

H. Meng

We are developing distributed spoken dialog systems that support the languages of Hong Kong (Cantonese, Mandarin and English) as well as human-computer interactions using portable PDAs and smart phones connected over a wireless network. Our systems accept multimodal input via speech, handwriting and pointing; and they deliver multimedia output involving text, audio and video. Users can use these systems for information access in the travel and financial domains. Our systems integrate a plethora of technologies involving speech recognition, natural language understanding, multi-modal dialog modeling and speech synthesis.


Network Informal Language Processing

K.F. Wong

Network Informal Language (NIL) refers to the language commonly used on the Internet for real-time information exchange, such as over ICQ, MSN, etc. NIL is very different from natural language. It is dynamic and anomalous in nature. We propose to use a machine learning approach to acquire new vocabulary and grammar rules from a proprietary NIL corpus. Understanding NIL would enable us to analyse the behaviour of Internet users. This in turn could be applied to commercial applications, such as customer relationship management.


On the data partitioning problem

C.H. Cheng

Clustering computing environments can provide the performance and reliability required by distributed and parallel database systems. In a typical distributed/parallel database system, a request mostly accesses a subset of the entire database. It is, therefore, natural to organize commonly accessed data together and to place them on the near-by, preferably the same, machine(s)/site(s). For this reason, in distributed database application design, data partitioning and data allocation are critical issues. In this research we focus on data partitioning and in particular, investigate the use of genetic algorithms.
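
The sketch below illustrates how a genetic algorithm might be applied to a toy version of this problem: assigning data items to sites so that few queries span more than one site while the sites stay balanced. The cost function, the operators (one-point crossover, per-gene mutation, keeping the best half), and all names and parameter values are assumptions made for the example, not the formulation used in this research.

```python
import random


def partition_cost(assign, queries, n_sites):
    """Queries touching more than one site, plus a load-imbalance penalty."""
    cross_site = sum(1 for q in queries if len({assign[i] for i in q}) > 1)
    sizes = [assign.count(s) for s in range(n_sites)]
    return cross_site + (max(sizes) - min(sizes))


def ga_partition(n_items, n_sites, queries, pop_size=20,
                 generations=40, mutation_rate=0.1, seed=0):
    """Return (best assignment, best cost per generation)."""
    rng = random.Random(seed)

    def cost(assign):
        return partition_cost(assign, queries, n_sites)

    # Each individual maps item index -> site index.
    pop = [[rng.randrange(n_sites) for _ in range(n_items)]
           for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=cost)
        history.append(cost(pop[0]))
        survivors = pop[: pop_size // 2]  # elitism: best half kept unchanged
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n_items)  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [rng.randrange(n_sites) if rng.random() < mutation_rate
                     else gene for gene in child]
            children.append(child)
        pop = survivors + children
    best = min(pop, key=cost)
    history.append(cost(best))
    return best, history


# Toy workload: each tuple lists the items one query accesses together.
queries = [(0, 1), (2, 3), (4, 5), (0, 2)]
best, history = ga_partition(n_items=6, n_sites=2, queries=queries)
```

Because the best half of each generation survives unchanged, the best cost recorded in `history` never increases from one generation to the next.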


Reinforcement Learning Architecture using FPGA

K. P. Lam

Current work extends an analogue VLSI implementation of the binary relation inference network architecture to its digital FPGA dynamic-programming equivalent. It is planned that the digital implementation, using the field-programmable interconnect system at the MIT Lab, will provide a basis for testing a complicated network for self-tuning neural control.


Self-tuning Neural Control Systems and their VLSI Implementation

K.P. Lam and C.S. Poon

Self-tuning adaptive control and neural networks are well established research disciplines in systems science and engineering. Recent years have seen the integrated use of major techniques developed in these two fields, for tackling practical control applications in non-linear, and largely unknown systems perturbed by unpredictable disturbances. In contrast to most neural adaptive control systems, this proposal aims at developing new self-tuning neural control systems with a biomedical and physiological emphasis. The study has two major engineering objectives: the first is to understand the intricate adaptive nature of human neural control systems as evidenced in respiration and circulation; the second is to derive biologically inspired intelligent control techniques for general engineering applications. Research in this direction is challenging due to the complexity of many unknowns of the human neural system, and also to the excessive computing time for simulating and emulating clusters of biological neurons to a sufficient granularity. An important part of our study is to develop a more practical real-time computing platform by designing specific analogue VLSI chips, to be used with a field-programmable interconnect system for implementing the self-tuning neural control system. It is envisaged that the platform would provide the necessary flexibility for evaluating different chip designs, for validating the emergent behavior of artificial neural networks with specific hypotheses, and for realizing a practical real-time emulator for self-tuning neural control applications.


Sentiment Analysis

C. C. Yang

The Web provides a public channel for consumers to express their opinions on consumer products. Sentiment analysis helps companies and individuals exploit these sources of information to gain market intelligence. In this project, we develop automatic techniques to extract the product features that consumers comment on and to determine the semantic orientation of these comments. With these techniques, we are able to provide summaries and benchmarking of consumer products.


Social Network Visualization

C. C. Yang

Analysis of social networks is essential for discovering knowledge about the structure of a community. Visualization of a network using a 2D graph can greatly facilitate the inspection of the global structure of the network. However, its usefulness becomes limited as the size and complexity of the network increase. In this project, we investigate the use of two interactive visualization techniques in the visualization of complex social networks: fisheye views and fractal views. Both techniques facilitate the exploration of complex networks by allowing a user to select one or more focus points and dynamically adjusting the graph layout to enhance the view of regions of interest. Combining the two techniques makes it possible to recognize patterns that were previously unreadable in the normal display because of the network's complexity. We are also investigating applications to the visualization and analysis of virtual communities, such as blogspace, forums, and chatrooms, to support Web searching and mining.
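
As an illustration of how a fisheye view adjusts a layout around a focus point, the sketch below applies the distortion function G(x) = (d + 1)x / (dx + 1), the form commonly associated with graphical fisheye views (Sarkar and Brown), independently to each coordinate. The source does not say which formulation the project uses, so the function, the distortion factor `d`, and the function names are assumptions.

```python
def fisheye_1d(pos, focus, lo, hi, d=3.0):
    """Move `pos` away from `focus` within [lo, hi]; d = 0 means no distortion."""
    if pos == focus:
        return pos
    boundary = hi if pos > focus else lo
    span = boundary - focus
    x = (pos - focus) / span           # normalized distance from focus, in [0, 1]
    g = (d + 1) * x / (d * x + 1)      # magnifies near the focus, compresses far away
    return focus + g * span


def fisheye_2d(point, focus, bounds, d=3.0):
    """Distort a 2D node position; bounds = (xmin, ymin, xmax, ymax)."""
    (x, y), (fx, fy) = point, focus
    xmin, ymin, xmax, ymax = bounds
    return (fisheye_1d(x, fx, xmin, xmax, d),
            fisheye_1d(y, fy, ymin, ymax, d))
```

Nodes near the focus are spread apart while nodes near the display boundary are squeezed together, and points on the boundary itself stay where they are, so the whole network remains visible.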


Temporal Information Extraction and Processing

K.F. Wong

Temporal information describes changes and the times at which they occur. It is regarded as an equally, if not more, important piece of information in applications such as extracting and tracking information over time or planning and evaluating activities. Conventional information systems may maintain and manipulate the occurrence times of events, but they may not be able to handle users' queries about how one event relates to another in time. In this project, we investigate techniques in natural language processing for extracting temporal information from a document and, based on the extracted information, techniques in temporal logic inference.


Text Mining and Event Detection

W. Lam

The volume of electronic textual data is growing rapidly. This project aims to develop novel techniques for discovering useful knowledge or patterns from textual documents such as on-line news and reports. It requires advanced techniques drawn from machine learning, intelligent information retrieval, and natural language processing. One research direction is to automatically discover events from multi-lingual news. Traditional query-driven retrieval is useful only when one knows precisely the nature of the documents. Automatic event detection helps a user to understand the document collection. It can also support intelligent processing of documents for different applications.


Web Services with Speech Technologies

H. Meng

Web services offer a highly interoperable platform for various applications to integrate and incorporate core Cantonese speech recognition and synthesis engines developed in our research team. We have developed the first Web service with our Cantonese text-to-speech engines that can automatically generate a spoken presentation of email and real-time information delivered via an alert service.


XML database

J. Yu

With the rapid growth of the Internet and Web technology, information is becoming ever more pervasive and important. Demand keeps increasing for database management systems to provide more effective mechanisms, as shown by current research activity on supporting Web, semistructured, and XML applications in database systems. This project aims to provide advanced techniques to effectively and efficiently handle XML query processing, indexing, and storage management for large XML datasets.


XML for Chinese News Information Processing

K.F. Wong and W. Lam

The Chinese NewsML Community (see http://cnewsml.org/) was set up by the Chinese University of Hong Kong, and funded by the Innovation and Technology Commission, HKSAR. Its charter is to establish and promote a local NewsML standard and the corresponding supporting tools in order to speed up the pace of e-business in Hong Kong. This, in turn, will enhance Hong Kong's commercial competitiveness in the world.

Email: dept@se.cuhk.edu.hk Tel: +852 2609-8313 Fax: +852 2603-5505
Address: Room 609, William M. W. Mong Engineering Building, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong

© COPYRIGHT 2005 SEEM, CUHK
