Current Projects

Link, Content Analysis and Information Visualization of Weblog Communities

A Weblog is a Web site where entries are made in diary style, maintained by its sole author (a blogger), and displayed in reverse chronological order. In this project, we develop techniques to analyze and visualize the Weblog social network. Link analysis uses the relationships between bloggers to construct the Weblog social network. Content analysis associates similar blog messages to unveil implicit relationships in their semantics, which further improves the Weblog social network analysis. Users can apply different interactive information visualization techniques to explore various aspects of the underlying social network at different levels of abstraction. Building on this analysis, we will develop further applications such as monitoring thematic and structural evolution, identifying active subgroups, and classifying information flow.
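
As a rough illustration of how link analysis and content analysis can be combined, the sketch below builds a weighted blogger network: each explicit link between two bloggers adds weight 1, and pairs of bloggers whose posts have a bag-of-words cosine similarity at or above a threshold receive an implicit edge weighted by that similarity. The function names, the unit link weight, and the 0.5 threshold are illustrative assumptions, not the project's actual method.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_network(links, posts, sim_threshold=0.5):
    """Return an undirected weighted network as {frozenset({u, v}): weight}.

    links: iterable of (source_blogger, target_blogger) hyperlinks.
    posts: {blogger: concatenated post text} used for content analysis.
    """
    edges = Counter()
    # Link analysis: each explicit link between two distinct bloggers adds 1.
    for u, v in links:
        if u != v:
            edges[frozenset((u, v))] += 1.0
    # Content analysis: similar posts imply a relationship even without links.
    vectors = {b: Counter(text.lower().split()) for b, text in posts.items()}
    bloggers = sorted(vectors)
    for i, u in enumerate(bloggers):
        for v in bloggers[i + 1:]:
            sim = cosine(vectors[u], vectors[v])
            if sim >= sim_threshold:
                edges[frozenset((u, v))] += sim
    return dict(edges)


links = [("alice", "bob"), ("bob", "alice"), ("bob", "carol")]
posts = {
    "alice": "election poll results",
    "bob": "football scores",
    "carol": "football match scores",
    "dave": "election poll analysis",
}
network = build_network(links, posts)
```

In this toy example, alice and dave never link to each other, yet they end up connected because both write about election polls.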


Cross-lingual Information Retrieval

Cross-lingual information retrieval refers to the ability to process a query for information in one language, search a collection of objects (text, images, audio files, etc.), and return the most relevant objects, translated into the user’s language if necessary. Corpus-based approaches have been shown to overcome shortcomings of dictionary-based approaches by using statistical information about term usage in parallel or comparable corpora to construct an automatic thesaurus. In this project, we investigate automatic techniques for extracting a cross-lingual concept space from a multilingual parallel corpus. In addition, by analyzing the query logs of Internet search engines, we extract associations between queries in multiple languages.
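
One simple way to turn a parallel corpus into a cross-lingual concept space is to count how often a source-language term and a target-language term appear in the same aligned sentence pair, then normalize the count with the Dice coefficient, 2·co(s, t) / (df(s) + df(t)). The sketch below assumes whitespace-tokenized, sentence-aligned input; the Dice weighting is a stand-in chosen for brevity, and the project's concept space may use a different association measure.

```python
from collections import Counter
from itertools import product


def concept_space(parallel_pairs):
    """Map each source term to target terms ranked by Dice association.

    parallel_pairs: list of (source_sentence, target_sentence) aligned pairs.
    """
    df_src, df_tgt, co = Counter(), Counter(), Counter()
    for src, tgt in parallel_pairs:
        s_terms, t_terms = set(src.lower().split()), set(tgt.lower().split())
        df_src.update(s_terms)   # number of pairs containing each source term
        df_tgt.update(t_terms)   # number of pairs containing each target term
        co.update(product(s_terms, t_terms))  # pairs containing both terms
    space = {}
    for (s, t), n in co.items():
        dice = 2 * n / (df_src[s] + df_tgt[t])
        space.setdefault(s, []).append((t, dice))
    for s in space:
        # Strongest association first; ties broken alphabetically.
        space[s].sort(key=lambda pair: (-pair[1], pair[0]))
    return space


pairs = [("the cat", "le chat"), ("the dog", "le chien"), ("a cat", "un chat")]
space = concept_space(pairs)
```

With these toy English-French pairs, the strongest association for "cat" is ("chat", 1.0), ahead of the function words "un" and "le".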

Multilingual Knowledge Management

Knowledge management applications involve generating, consuming, and maintaining tremendous amounts of information. Efficient and effective management of a continuously increasing volume of documents is essential so that users can obtain the knowledge they need to accomplish their tasks. Text categorization deals with automatically learning a text categorization model from a training set of preclassified documents on the basis of their contents, and with assigning uncategorized documents to appropriate categories. Most existing text categorization techniques handle only monolingual documents (i.e., all documents are written in one language) during both model learning and category assignment (or prediction). However, with the globalization of business environments and advances in Internet technology, an organization or individual often generates or acquires, and subsequently archives, documents in different languages, creating the need for cross-lingual text categorization. Motivated by this need, this research investigates cross-lingual text categorization techniques.
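
To make the learning and assignment steps concrete, the sketch below trains a centroid (Rocchio-style) classifier on documents in one language and classifies a document in another language after translating it word by word with a bilingual dictionary. The dictionary-translation step, the toy French-to-English dictionary, and all function names are hypothetical; they illustrate one possible cross-lingual setup rather than the technique this research develops.

```python
import math
from collections import Counter, defaultdict


def vectorize(text):
    """Bag-of-words term frequencies."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def train_centroids(labeled_docs):
    """Learn one centroid per category from (text, category) training pairs."""
    centroids = defaultdict(Counter)
    for text, category in labeled_docs:
        centroids[category].update(vectorize(text))
    return dict(centroids)


def translate(text, dictionary):
    """Hypothetical word-by-word translation; unknown words pass through."""
    return " ".join(dictionary.get(w, w) for w in text.lower().split())


def classify(text, centroids, dictionary=None):
    """Assign a document to the category whose centroid is most similar."""
    if dictionary:
        text = translate(text, dictionary)  # map into the training language
    vec = vectorize(text)
    return max(centroids, key=lambda c: cosine(vec, centroids[c]))


training = [
    ("stock market shares rise", "finance"),
    ("bank interest rates", "finance"),
    ("football team wins match", "sports"),
    ("tennis match final", "sports"),
]
centroids = train_centroids(training)
fr_en = {"bourse": "stock", "marché": "market", "équipe": "team"}
```

For instance, classify("la bourse et le marché", centroids, fr_en) returns "finance", because the translated terms "stock" and "market" overlap only with the finance centroid.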

Social Network Visualization

Analysis of social networks is essential for discovering knowledge about the structure of a community. Visualizing a network as a 2D graph can greatly facilitate inspection of the network's global structure. However, its usefulness becomes limited as the size and complexity of the network increase. In this project, we investigate two interactive techniques for visualizing complex social networks: fisheye views and fractal views. Both techniques facilitate the exploration of complex networks by allowing a user to select one or more focus points and by dynamically adjusting the graph layout to enhance the view of the regions of interest. Combining the two techniques can help an investigator recognize patterns that are unreadable in the normal display because of the network's complexity.
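
The fisheye half of this idea can be illustrated with the Cartesian graphical fisheye transformation of Sarkar and Brown. Along each axis, a node's normalized distance x from the focus (0 at the focus, 1 at the window edge) is mapped to G(x) = (d + 1)x / (dx + 1), where d ≥ 0 is the distortion factor, so nodes near the focus spread apart while nodes at the edge stay put. The sketch below assumes all node positions lie inside a width × height window and handles a single focus point; fractal views are not shown, and the function names are illustrative.

```python
def fisheye_1d(p, focus, lo, hi, d):
    """Sarkar-Brown style distortion of coordinate p along one axis.

    The window is [lo, hi]; d >= 0 is the distortion factor (d = 0: no change).
    """
    if p == focus:
        return float(p)
    edge = hi if p > focus else lo
    d_max = edge - focus                  # signed distance from focus to the edge
    x = (p - focus) / d_max               # normalized distance in (0, 1]
    g = (d + 1) * x / (d * x + 1)         # G(x) = (d + 1)x / (dx + 1)
    return focus + g * d_max


def fisheye_layout(positions, focus, width, height, d=3.0):
    """Apply the transformation independently to x and y for every node."""
    fx, fy = focus
    return {
        node: (fisheye_1d(x, fx, 0.0, width, d), fisheye_1d(y, fy, 0.0, height, d))
        for node, (x, y) in positions.items()
    }
```

With d = 3 and the focus at x = 50 in a 100-unit-wide window, a node at x = 60 moves to x = 75, while a node on the window edge at x = 100 stays at 100.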

Event Evolution

In topic detection and tracking, news stories are monitored and automatic techniques are used to spot new events and track the progress of previously spotted events. However, traditional document clustering techniques only organize news stories into a flat or hierarchical cluster structure. Such an organization cannot present the evolution of news events or the complex relationships between them. Without a graphical evolution network, users cannot see how events begin, evolve, and end. An event evolution graph effectively presents the rich underlying structure of events and allows efficient and meaningful information browsing. As a result, users can not only retrieve news stories on the same topic but also follow their evolution. In this project, we represent the evolution relationships among events in an event evolution graph, where event evolution includes event threading and event joining. We investigate features and techniques for determining the evolution relationships between events.
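
The sketch below shows one simple way such a graph could be assembled under assumed features: each event is a bag of words with a day stamp, and a directed edge runs from an earlier event to a later one when their cosine similarity, discounted by exp(-decay × days apart), reaches a threshold. In the resulting graph a chain of edges approximates an event thread, and an event with several incoming edges marks where threads join. The scoring formula, threshold, and decay rate are hypothetical choices, not the features this project investigates.

```python
import math
from collections import Counter


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def evolution_graph(events, threshold=0.3, decay=0.1):
    """Return directed edges (earlier_id, later_id, score) between events.

    events: list of (event_id, day, text). The score is content similarity
    discounted by how far apart the two events are in time.
    """
    items = sorted(
        ((eid, day, Counter(text.lower().split())) for eid, day, text in events),
        key=lambda e: e[1],
    )
    edges = []
    for i, (ei, di, vi) in enumerate(items):
        for ej, dj, vj in items[i + 1:]:
            if dj == di:
                continue  # simplification: only strictly later events evolve from ei
            score = cosine(vi, vj) * math.exp(-decay * (dj - di))
            if score >= threshold:
                edges.append((ei, ej, round(score, 3)))
    return edges


events = [
    ("e1", 0, "earthquake hits city"),
    ("e2", 1, "earthquake rescue teams arrive city"),
    ("e3", 2, "rescue teams find survivors"),
    ("e4", 2, "stock market falls"),
]
edges = evolution_graph(events)
```

For these toy events the function links e1 to e2 and e2 to e3, forming one thread, and leaves the unrelated e4 isolated.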

Extracting Topic Hierarchy from Web sites

Modeling a Web site’s content structure is useful for various Web site processing tasks, including navigation and classification. Hierarchical models are commonly used to organize a Web site’s content. A Web site’s content structure can be represented by a topic hierarchy: a directed tree rooted at the site’s homepage, in which the vertices and edges correspond to Web pages and hyperlinks. In this work, we investigate several algorithms that extract a Web site’s topic hierarchy by analyzing the semantic relationships between its Web pages based on link structure, page content, and directory structure.
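
As a baseline that uses only two of the three signals named above (link structure and directory structure), the sketch below extracts a tree by breadth-first search from the homepage, so each page hangs under a page one link closer to the root; when several such pages link to it, a parent in the same or an enclosing URL directory is preferred. Page-content analysis is omitted, and the input format and function names are assumptions made for illustration.

```python
from posixpath import dirname


def within(child_dir, parent_dir):
    """True if child_dir equals parent_dir or lies underneath it."""
    return child_dir == parent_dir or child_dir.startswith(parent_dir.rstrip("/") + "/")


def topic_tree(homepage, links):
    """Extract a tree {page: parent} rooted at the homepage.

    links: {page_path: [linked page paths within the same site]}.
    Breadth-first search places each page one level below a page that links
    to it; among equally shallow candidates, a parent in the same or an
    enclosing directory is preferred.
    """
    parent = {homepage: None}
    frontier = [homepage]
    while frontier:
        candidates = {}  # newly reached page -> pages at this level linking to it
        for page in frontier:
            for child in links.get(page, []):
                if child not in parent:
                    candidates.setdefault(child, []).append(page)
        for child, parents in candidates.items():
            same_dir = [p for p in parents if within(dirname(child), dirname(p))]
            parent[child] = (same_dir or parents)[0]
        frontier = list(candidates)
    return parent


links = {
    "/index.html": ["/news/index.html", "/products/index.html"],
    "/news/index.html": ["/products/tv.html", "/news/2005.html"],
    "/products/index.html": ["/products/tv.html", "/index.html"],
}
tree = topic_tree("/index.html", links)
```

Here /products/tv.html is reachable from both section pages at the same depth, and the directory heuristic places it under /products/index.html rather than /news/index.html.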

Category Tree Integration

Classification of large volumes of documents is widely used by organizations and information providers to organize, archive, and access documents. Document classification organizes a large document collection into distinct groups of similar documents and identifies hidden themes for each group. Different information providers classify the information they provide differently, so individual users have to be familiar with the structures of all of these category trees before they can effectively search and aggregate information from multiple providers. At the same time, individual users maintain their own classifications to manage the information they collect. In this project, we propose an automatic technique to integrate the categories from the source category trees into each user’s personalized master category tree. The technique can also expand the personal category tree with categories from the source category trees when necessary. Users then no longer need to browse multiple category trees from different information providers, only their own. The personal master category tree also improves as the aggregated information continues to grow.
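
A minimal sketch of the integration step, assuming each category is summarized by the term frequencies of its documents: every source category is matched to the most similar category in the user's master tree by cosine similarity. If the similarity reaches a threshold, its terms are merged into that category; otherwise it is added as a new category under the closest match (or under the root). The flat treatment of source categories, the 0.3 threshold, and the dictionary-based tree representation are illustrative assumptions, not the technique proposed in this project.

```python
import math
from collections import Counter


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def integrate(master, source, threshold=0.3, root="root"):
    """Merge source categories into a personal master category tree.

    master: {category: {"parent": name or None, "terms": Counter}} (modified in place).
    source: {source_category: Counter of terms from its documents}.
    Assumes source category names do not clash with master category names.
    Returns {source_category: master_category it was mapped to}.
    """
    mapping = {}
    for src_cat, src_terms in source.items():
        best, best_sim = None, 0.0
        for cat, node in master.items():
            sim = cosine(src_terms, node["terms"])
            if sim > best_sim:
                best, best_sim = cat, sim
        if best is not None and best_sim >= threshold:
            master[best]["terms"].update(src_terms)  # merge into an existing category
            mapping[src_cat] = best
        else:
            # No close match: expand the personal tree with a new category,
            # attached under the closest existing category (or the root).
            master[src_cat] = {"parent": best or root, "terms": Counter(src_terms)}
            mapping[src_cat] = src_cat
    return mapping


master = {
    "root": {"parent": None, "terms": Counter()},
    "Sports": {"parent": "root", "terms": Counter("football tennis match score".split())},
    "Finance": {"parent": "root", "terms": Counter("stock bank market interest".split())},
}
source = {
    "Soccer": Counter("football match goal league".split()),
    "Cooking": Counter("recipe oven bake".split()),
    "Investing": Counter("stock portfolio fund dividend".split()),
}
mapping = integrate(master, source)
```

In this example "Soccer" shares two of four terms with "Sports" (similarity 0.5) and is merged into it, while "Investing" shares only "stock" with "Finance" (similarity 0.25) and becomes a new subcategory of "Finance".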

Multimedia Retrieval using Hyperlink Analysis

Hyperlink analysis has been widely investigated to support the retrieval of Web documents in Internet search engines. It has been shown to significantly improve the relevance of search results, and such techniques have been adopted in many commercial search engines, e.g. Google. However, hyperlink analysis is mostly used to rank Web pages, not other multimedia objects such as images and video. In this project, we investigate several algorithms that use hyperlink analysis to support searching for multimedia objects on the Web.
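
One way hyperlink analysis can be carried over to multimedia objects is to let an image inherit the link-based authority of the pages that embed it. The sketch below computes PageRank, the link-analysis algorithm popularized by Google, by power iteration, then ranks query-matching images by the summed PageRank of their host pages. The embeds mapping, the precomputed set of query matches (standing in for text matching on captions or surrounding text), and the scoring rule are hypothetical; the project's algorithms may differ.

```python
def pagerank(links, damping=0.85, iterations=50):
    """PageRank by power iteration. links: {page: [pages it links to]}."""
    pages = set(links) | {q for targets in links.values() for q in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                share = damping * rank[p] / len(targets)
                for q in targets:
                    new[q] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for q in pages:
                    new[q] += damping * rank[p] / n
        rank = new
    return rank


def rank_images(links, embeds, query_matches):
    """Rank query-matching images by the PageRank of the pages embedding them.

    embeds: {page: [image URLs on that page]} (hypothetical input format).
    query_matches: images whose surrounding text matched the query.
    """
    pr = pagerank(links)
    scores = {}
    for page, images in embeds.items():
        for img in images:
            if img in query_matches:
                scores[img] = scores.get(img, 0.0) + pr.get(page, 0.0)
    return sorted(scores, key=lambda img: -scores[img])


links = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
embeds = {"C": ["cat1.jpg"], "D": ["cat2.jpg"], "B": ["dog.jpg"]}
ranking = rank_images(links, embeds, {"cat1.jpg", "cat2.jpg"})
```

Because pages A, B, and D all link to C, an image on C outranks a matching image on D, which has no incoming links.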

Sentiment Analysis

The Web provides a public channel for consumers to express their opinions on consumer products. Sentiment analysis helps companies and individuals exploit these sources of information to gain market intelligence. In this project, we develop automatic techniques to extract the product features commented on by consumers and to determine the semantic orientation of these comments. With these techniques, we are able to provide summaries and benchmarks of consumer products.
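
The two steps named above, locating feature mentions and determining their orientation, can be sketched with a small opinion lexicon. Each sentence or clause is scanned for known feature words; opinion words within a few positions of a feature contribute their polarity (flipped after a negation such as "not"), and the net sign is tallied per feature. The lexicon, the window size, and the assumption that features are single words supplied in advance are hypothetical simplifications; the project extracts features automatically and may determine semantic orientation differently.

```python
import re
from collections import defaultdict

# Hypothetical toy opinion lexicon (+1 positive, -1 negative).
LEXICON = {"great": 1, "good": 1, "sharp": 1, "excellent": 1,
           "poor": -1, "bad": -1, "blurry": -1, "short": -1}
NEGATIONS = {"not", "never", "no"}


def feature_sentiment(reviews, features, window=3):
    """Tally positive and negative mentions per product feature.

    Returns {feature: [positive_count, negative_count]}.
    """
    summary = defaultdict(lambda: [0, 0])
    for review in reviews:
        # Treat sentences and comma/semicolon-separated clauses as units.
        for clause in re.split(r"[.!?,;]", review.lower()):
            words = re.findall(r"[a-z]+", clause)
            for i, word in enumerate(words):
                if word not in features:
                    continue
                score = 0
                for j in range(max(0, i - window), min(len(words), i + window + 1)):
                    polarity = LEXICON.get(words[j], 0)
                    if polarity and j > 0 and words[j - 1] in NEGATIONS:
                        polarity = -polarity  # "not sharp" counts as negative
                    score += polarity
                if score > 0:
                    summary[word][0] += 1
                elif score < 0:
                    summary[word][1] += 1
    return dict(summary)


reviews = [
    "The battery is great. The screen is not sharp!",
    "Battery life is poor, but the lens is excellent.",
]
summary = feature_sentiment(reviews, {"battery", "screen", "lens"})
```

For these two reviews the summary records one positive and one negative mention of the battery, a negative mention of the screen, and a positive mention of the lens.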

Copyright © 2005 Christopher C. Yang. All Rights Reserved.

Last updated: July 2005