University of Technology, Sydney

A Fast Approximate AIB Algorithm for Distributional Word Clustering

Speaker: Dr. Lei Wang, Faculty of Informatics, University of Wollongong.

Seminar Chairman: Associate Professor Jian Zhang


Distributional word clustering merges words that have similar probability distributions in order to obtain reliable parameter estimates, compact classification models and, in some cases, better classification performance. Agglomerative Information Bottleneck (AIB) is a typical word clustering algorithm and has been applied both to traditional text classification and, more recently, to image recognition. Although theoretically elegant, AIB suffers from poor computational efficiency, especially when clustering a large number of words. Unlike existing solutions to this problem, we analyze the characteristics of its objective function, the loss of mutual information, and show that good candidate word pairs for merging can be identified easily using only the ratio of the word-class joint probabilities of each word. Based on this finding, we propose a fast approximate AIB algorithm and show that it significantly improves the computational efficiency of AIB while maintaining, or even slightly improving, its classification performance. Experiments on both text and image classification benchmark data sets show that our algorithm achieves a speedup of more than 100 times over the state-of-the-art method on large real data sets.
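To make the idea in the abstract concrete, the following is a minimal Python sketch, not the speaker's implementation. It computes the AIB merge cost as the loss of mutual information and compares an exhaustive search over all word pairs with a hypothetical two-class shortcut that sorts words by the ratio of their word-class joint probabilities and checks only neighbouring words. The function names, the neighbour-only rule and the toy numbers are all assumptions made for illustration.

```python
import math
from itertools import combinations


def mutual_information(joint):
    """I(W; C) in nats for a table joint[w][c] of joint probabilities summing to 1."""
    n_classes = len(joint[0])
    p_w = [sum(row) for row in joint]
    p_c = [sum(row[c] for row in joint) for c in range(n_classes)]
    total = 0.0
    for w, row in enumerate(joint):
        for c, p in enumerate(row):
            if p > 0:
                total += p * math.log(p / (p_w[w] * p_c[c]))
    return total


def merge_cost(joint, i, j):
    """Loss of mutual information when words i and j are merged into one cluster."""
    merged = [row for k, row in enumerate(joint) if k not in (i, j)]
    merged.append([a + b for a, b in zip(joint[i], joint[j])])
    return mutual_information(joint) - mutual_information(merged)


def exact_best_pair(joint):
    """Exhaustive AIB step: evaluate every pair of words and keep the cheapest merge."""
    pairs = combinations(range(len(joint)), 2)
    return min(pairs, key=lambda p: merge_cost(joint, *p))


def ratio_best_pair(joint):
    """Assumed shortcut for two classes: sort words by p(w, c1) / p(w, c2)
    and evaluate only pairs that are adjacent in that ordering.
    Requires p(w, c2) > 0 for every word."""
    order = sorted(range(len(joint)), key=lambda w: joint[w][0] / joint[w][1])
    neighbours = [tuple(sorted(p)) for p in zip(order, order[1:])]
    return min(neighbours, key=lambda p: merge_cost(joint, *p))


# Toy joint distribution over 4 words and 2 classes (made-up numbers).
toy_joint = [
    [0.20, 0.05],
    [0.05, 0.20],
    [0.16, 0.04],
    [0.10, 0.20],
]

if __name__ == "__main__":
    print("exhaustive search:", exact_best_pair(toy_joint))
    print("ratio shortcut:   ", ratio_best_pair(toy_joint))
```

In this toy table, words 0 and 2 have the same class ratio, so merging them loses essentially no mutual information, and both searches select that pair. The exhaustive search evaluates every pair of words, whereas the shortcut evaluates only the pairs left after sorting, which is where a speedup of the kind described above could plausibly come from.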


Dr. Lei Wang received his Ph.D. from Nanyang Technological University, Singapore, in 2004. He is now a Senior Lecturer in the Faculty of Informatics at the University of Wollongong. He was awarded the Australian Postdoctoral Fellowship by the Australian Research Council and the Early Career Researcher Award by the Australian Academy of Science. His research interests lie in machine learning, pattern recognition and computer vision. In machine learning and pattern recognition, he is interested in feature selection, model selection and kernel-based learning methods. In computer vision, he is interested in content-based image retrieval and generic image categorisation.

Overview of the AAI seminar series

The Advanced Analytics Seminar Series presents the latest theoretical advances and empirical experience in a broad range of interdisciplinary and business-oriented analytics fields. It covers topics related to data mining, machine learning, statistics, bioinformatics, behavior informatics, marketing analytics and multimedia analytics. It also provides a platform for showcasing commercial products in ubiquitous advanced analytics. Speakers are invited from both academia and industry. The series is held every Friday afternoon at the garden-like UTS Blackfriars Campus. All are warmly welcome to attend.

Jinyan Li, Seminar Coordinator, Associate Professor
Advanced Analytics Institute, School of Software, Faculty of Engineering and IT
University of Technology, Sydney
P.O. Box 123, Broadway, NSW 2007, Australia
Tel: 02 95149264 (office)

6 September 2013
13:30 - 14:30
Blackfriars CB25 Room GD.01
(5 minutes' walk from the UTS Tower Building, CB01)
All Welcome
Colin Wise
