University of Technology, Sydney


Professor Dacheng Tao

Core Member, Joint Research Centre in Intelligent Systems

BEng (USTC), MPhil (CUHK), PhD (London)

Email: Dacheng.Tao@uts.edu.au


Biography

Dacheng Tao is Professor of Computer Science with the Centre for Quantum Computation and Intelligent Systems (QCIS) and the Faculty of Engineering and Information Technology (FEIT) at the University of Technology Sydney (UTS). He holds or has held visiting professorships at many leading universities and research institutes, including Birkbeck, University of London; Shanghai Jiao Tong University; Huazhong University of Science & Technology; Wuhan University; Northwestern Polytechnical University; the Chinese Academy of Sciences; and Xidian University. Previously, he was a Nanyang Assistant Professor at Nanyang Technological University and an Assistant Professor at the Hong Kong Polytechnic University. He received his BEng degree from the University of Science and Technology of China (USTC), his MPhil degree from the Chinese University of Hong Kong (CUHK), and his PhD from the University of London.

He mainly applies statistics and mathematics to data analytics problems, and his research interests span computer vision, computational neuroscience, data science, geoinformatics, image processing, machine learning, medical informatics, multimedia, neural networks and video surveillance. His research results have been expounded in one monograph and 400+ publications in prestigious journals and at prominent conferences, such as IEEE T-PAMI, T-NNLS, T-IP, T-SP, T-MI, T-KDE, T-CYB, JMLR, IJCV, NIPS, ICML, CVPR, ICCV, ECCV, AISTATS, ICDM, SDM, ACM SIGKDD and ACM Multimedia, with several best paper awards, such as the best theory/algorithm paper runner-up award at IEEE ICDM'07, the best student paper award at IEEE ICDM'13, and the 2014 ICDM 10-Year Highest-Impact Paper Award.

He has made notable contributions to his universities through excellent research student supervision. His PhD students (including co-supervised PhD students) have won the Chancellor's Award for the most outstanding PhD thesis across the university (in 2012 and 2015), a UTS Chancellor's Postdoctoral Fellowship in 2012, the Extraordinary Potential Prize of the 2011 Chinese Government Award for Outstanding Self-Financed Students Abroad, a Microsoft Fellowship Award, a Baidu Fellowship, a place in the Beihang "Zhuoyue" Program, the PLA Best PhD Dissertation Award, the China Computer Federation (CCF) Outstanding Dissertation Award, the Award for the Excellent Doctoral Dissertation of Shanghai, the Award for the Excellent Doctoral Dissertation of Beijing, and the Excellent PhD Dissertation Award of the National University of Defense Technology.

He is or has been a guest editor of 10+ special issues and an editor of 10+ journals, including IEEE Trans. on Big Data (T-BD), IEEE Trans. on Neural Networks and Learning Systems (T-NNLS), IEEE Trans. on Image Processing (T-IP), IEEE Trans. on Cybernetics (T-CYB), IEEE Trans. on Systems, Man and Cybernetics: Part B (T-SMCB), IEEE Trans. on Circuits and Systems for Video Technology (T-CSVT), IEEE Trans. on Knowledge and Data Engineering (T-KDE), Pattern Recognition (Elsevier), Information Sciences (Elsevier), Signal Processing (Elsevier), and Computational Statistics & Data Analysis (Elsevier). He has edited five books on topics in optical pattern recognition and its applications. He has chaired conferences, special sessions, invited sessions, workshops, and panels more than 60 times. He has served nearly 200 major conferences, including CVPR, ICCV, ECCV, AAAI, IJCAI, NIPS, ICDM, AISTATS, ACM SIGKDD and ACM Multimedia, and nearly 100 prestigious international journals, including T-PAMI, IJCV, JMLR, AIJ, and MLJ.

He is a Fellow of the Institute of Electrical and Electronics Engineers (IEEE), a Fellow of the Optical Society of America (OSA), a Fellow of the International Association of Pattern Recognition (IAPR), a Fellow of the International Society for Optical Engineering (SPIE), an Elected Member of the International Statistical Institute (ISI), a Fellow of the British Computer Society (BCS), and a Fellow of the Institution of Engineering and Technology (IET/IEE). He is an elected member of the Global Young Academy (GYA). He chairs the IEEE SMC Technical Committee on Cognitive Computing and the IEEE SMC New South Wales Section.

Professional

Fellow, Institute of Electrical and Electronics Engineers (FIEEE)

Fellow, Optical Society of America (FOSA)

Fellow, International Association of Pattern Recognition (FIAPR)

Fellow, International Society for Optical Engineering (FSPIE)

Fellow, Institution of Engineering and Technology (FIET)

Fellow, British Computer Society (FBCS)

Elected Member, International Statistical Institute (ISI)

Elected Member, Global Young Academy (GYA)

Teaching areas

Image and Video Analysis; Computer Vision; Pattern Recognition; Machine Learning; and Discrete Mathematics

Research

Research interests

Statistics and mathematics for data analysis problems in machine learning, data mining & engineering, computer vision, image processing, multimedia, video surveillance and neuroscience

Research supervision: Yes

Projects

Publications

Journal articles

Chen, Z., You, X., Zhong, B., Li, J. & Tao, D. 2017, 'Dynamically Modulated Mask Sparse Tracking', IEEE Transactions on Cybernetics.

Visual tracking is a critical task in many computer vision applications such as surveillance and robotics. However, although the robustness to local corruptions has been improved, prevailing trackers are still sensitive to large scale corruptions, such as occlusions and illumination variations. In this paper, we propose a novel robust object tracking technique that depends on a subspace learning-based appearance model. Our contributions are twofold. First, mask templates produced by frame difference are introduced into our template dictionary. Since the mask templates contain abundant structure information of corruptions, the model can encode information about the corruptions on the object more efficiently. Meanwhile, the robustness of the tracker is further enhanced by adopting system dynamics, which consider the moving tendency of the object. Second, we provide the theoretical guarantee that, by adopting the modulated template dictionary system, our new sparse model can be solved by the accelerated proximal gradient algorithm as efficiently as in traditional sparse tracking methods. Extensive experimental evaluations demonstrate that our method significantly outperforms 21 other cutting-edge algorithms in both speed and tracking accuracy, especially when there are challenges such as pose variation, occlusion, and illumination changes.

Dong, Y., Du, B., Zhang, L., Zhang, L. & Tao, D. 2017, 'LAM3L: Locally adaptive maximum margin metric learning for visual data classification', Neurocomputing, vol. 235, pp. 1-9.

Visual data classification, which is aimed at determining a unique label for each class, is an increasingly important issue in the machine learning community. In recent years, increasing attention has been paid to the application of metric learning for classification, which has been proven to be a good way to obtain a promising performance. However, as a result of the limited training samples and data with complex distributions, the vast majority of these algorithms usually fail to perform well. This has motivated us to develop a novel locally adaptive maximum margin metric learning (LAM3L) algorithm in order to maximally separate similar and dissimilar classes, based on the changes between the distances before and after the maximum margin metric learning. The experimental results on two widely used UCI datasets and a real hyperspectral dataset demonstrate that the proposed method outperforms the state-of-the-art metric learning methods.

Du, B., Wang, Z., Zhang, L., Zhang, L. & Tao, D. 2017, 'Robust and Discriminative Labeling for Multi-Label Active Learning Based on Maximum Correntropy Criterion', IEEE Transactions on Image Processing, vol. 26, no. 4, pp. 1694-1707.

Multi-label learning draws great interest in many real world applications. It is highly costly for an oracle to assign many labels to a single instance. Meanwhile, it is also hard to build a good model without identifying discriminative labels. Can we reduce the labeling costs and improve the ability to train a good model for multi-label learning simultaneously? Active learning addresses the problem of scarce training samples by querying the most valuable samples to achieve better performance at little cost. In multi-label active learning, some research has been done on querying the relevant labels with fewer training samples or querying all labels without identifying the discriminative information. None of these approaches can effectively handle outlier labels when measuring uncertainty. Since the maximum correntropy criterion (MCC) provides a robust analysis for outliers in many machine learning and data mining algorithms, in this paper, we derive a robust multi-label active learning algorithm based on the MCC by merging uncertainty and representativeness, and propose an efficient alternating optimization method to solve it. With the MCC, our method can eliminate the influence of outlier labels that are not discriminative enough to measure the uncertainty. To further improve the ability of information measurement, we merge uncertainty and representativeness with the prediction labels of unknown data. This not only enhances the uncertainty but also improves the similarity measurement of multi-label data with label information. Experiments on benchmark multi-label data sets have shown superior performance over the state-of-the-art methods.

Du, B., Xiong, W., Wu, J., Zhang, L., Zhang, L. & Tao, D. 2017, 'Stacked Convolutional Denoising Auto-Encoders for Feature Representation', IEEE Transactions on Cybernetics, vol. 47, no. 4, pp. 1017-1027.

Deep networks have achieved excellent performance in learning representation from visual data. However, the supervised deep models like convolutional neural networks require large quantities of labeled data, which are very expensive to obtain. To solve this problem, this paper proposes an unsupervised deep network, called the stacked convolutional denoising auto-encoders, which can map images to hierarchical representations without any label information. The network, optimized by layer-wise training, is constructed by stacking layers of denoising auto-encoders in a convolutional way. In each layer, high dimensional feature maps are generated by convolving features of the lower layer with kernels learned by a denoising auto-encoder. The auto-encoder is trained on patches extracted from feature maps in the lower layer to learn robust feature detectors. To better train the large network, a layer-wise whitening technique is introduced into the model. Before each convolutional layer, a whitening layer is embedded to sphere the input data. By layers of mapping, raw images are transformed into high-level feature representations which boost the performance of the subsequent support vector machine classifier. The proposed algorithm is evaluated by extensive experiments and demonstrates superior classification performance to state-of-the-art unsupervised networks.
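As a rough illustration of the denoising auto-encoder building block that the paper stacks convolutionally, the toy sketch below trains a single fully connected denoising auto-encoder with tied weights. The architecture details here (sigmoid units, masking noise, tied weights, the learning rate) are common textbook choices assumed for illustration, not the authors' convolutional configuration.

```python
import numpy as np

def train_denoising_autoencoder(X, hidden, noise=0.3, lr=0.1, epochs=200, seed=0):
    # Minimal single-layer denoising auto-encoder with tied weights:
    # corrupt the input with masking noise, encode with a sigmoid layer,
    # decode with the transposed weights, and minimize squared error
    # against the *clean* input by plain gradient descent.
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(0.0, 0.1, (d, hidden))
    bh = np.zeros(hidden)          # hidden bias
    bo = np.zeros(d)               # output bias
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    for _ in range(epochs):
        Xn = X * (rng.random(X.shape) > noise)   # masking noise: drop inputs
        H = sig(Xn @ W + bh)                     # encode corrupted input
        Y = sig(H @ W.T + bo)                    # decode with tied weights
        dY = (Y - X) * Y * (1 - Y)               # backprop through decoder
        dH = (dY @ W) * H * (1 - H)              # backprop through encoder
        W -= lr * (Xn.T @ dH + dY.T @ H) / n     # tied weights get both terms
        bh -= lr * dH.mean(axis=0)
        bo -= lr * dY.mean(axis=0)
    return W, bh, bo

# Toy usage: 4 binary prototype patterns repeated 25 times each.
protos = (np.random.default_rng(2).random((4, 20)) > 0.5).astype(float)
X = np.repeat(protos, 25, axis=0)
W, bh, bo = train_denoising_autoencoder(X, hidden=10, epochs=300)
print(W.shape)  # (20, 10)
```

In the paper's setting the learned kernels would then be convolved over feature maps and the procedure repeated layer by layer; this sketch only shows the per-layer training principle.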

Du, B., Zhang, M., Zhang, L., Hu, R. & Tao, D. 2017, 'PLTD: Patch-Based Low-Rank Tensor Decomposition for Hyperspectral Images', IEEE Transactions on Multimedia, vol. 19, no. 1, pp. 67-79.

Recent years have witnessed growing interest in hyperspectral image (HSI) processing. In practice, however, HSIs always suffer from huge data sizes and a mass of redundant information, which hinder their application in many cases. HSI compression is a straightforward way of relieving these problems. However, most of the conventional image encoding algorithms mainly focus on the spatial dimensions, and they do not consider the redundancy in the spectral dimension. In this paper, we propose a novel HSI compression and reconstruction algorithm via patch-based low-rank tensor decomposition (PLTD). Instead of processing the HSI separately by spectral channel or by pixel, we represent each local patch of the HSI as a third-order tensor. Then, the similar tensor patches are grouped by clustering to form a fourth-order tensor per cluster. Since the grouped tensor is assumed to be redundant, each cluster can be approximately decomposed to a coefficient tensor and three dictionary matrices, which leads to a low-rank tensor representation of both the spatial and spectral modes. The reconstructed HSI can then be simply obtained by the product of the coefficient tensor and dictionary matrices per cluster. In this way, the proposed PLTD algorithm simultaneously removes the redundancy in both the spatial and spectral domains in a unified framework. The extensive experimental results on various public HSI datasets demonstrate that the proposed method outperforms the traditional image compression approaches and other tensor-based methods.

Fang, M., Yin, J., Hall, L.O. & Tao, D. 2017, 'Active Multitask Learning With Trace Norm Regularization Based on Excess Risk', IEEE Transactions on Cybernetics.

This paper addresses the problem of active learning on multiple tasks, where labeled data are expensive to obtain for each individual task but the learning problems share some commonalities across multiple related tasks. To leverage the benefits of jointly learning from multiple related tasks and making active queries, we propose a novel active multitask learning approach based on trace norm regularized least squares. The basic idea is to induce an optimal classifier which has the lowest risk and at the same time which is closest to the true hypothesis. Toward this aim, we devise a new active selection criterion that takes into account not only the risk but also the excess risk, which measures the distance to the true hypothesis. Based on this criterion, our proposed algorithm actively selects the instance to query for its label based on the combination of the two risks. Experiments on both synthetic and real-world datasets show that our proposed algorithm provides superior performance as compared to other state-of-the-art active learning methods.
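For readers unfamiliar with the trace-norm machinery the abstract refers to, the following is a generic proximal-gradient sketch of trace-norm (nuclear-norm) regularized least squares over multiple tasks, where the proximal step soft-thresholds singular values. The active-selection criterion itself is not reproduced, and the step size, iteration count, and data layout are illustrative assumptions rather than the authors' solver.

```python
import numpy as np

def svt(W, tau):
    # Proximal operator of tau * ||W||_*: soft-threshold the singular values.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def multitask_trace_norm(Xs, ys, lam=0.1, lr=0.001, iters=1000):
    # Proximal gradient descent for
    #   min_W  sum_t ||X_t w_t - y_t||^2 + lam * ||W||_*,
    # where column t of W is task t's weight vector. The trace norm
    # couples the tasks by encouraging a shared low-rank structure.
    d, T = Xs[0].shape[1], len(Xs)
    W = np.zeros((d, T))
    for _ in range(iters):
        G = np.zeros_like(W)
        for t in range(T):
            G[:, t] = 2 * Xs[t].T @ (Xs[t] @ W[:, t] - ys[t])
        W = svt(W - lr * G, lr * lam)   # gradient step, then prox step
    return W

# Toy usage: two related tasks generated from one shared weight vector.
rng = np.random.default_rng(0)
Xs = [rng.standard_normal((50, 5)) for _ in range(2)]
w_true = rng.standard_normal(5)
ys = [Xt @ w_true for Xt in Xs]
W = multitask_trace_norm(Xs, ys, lam=0.01, lr=0.003, iters=3000)
print(W.shape)  # (5, 2)
```

Because both tasks share one underlying weight vector, the recovered `W` is close to rank one, which is exactly the structure the trace norm rewards.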

Gong, C., Liu, T., Tang, Y., Yang, J., Yang, J. & Tao, D. 2017, 'A Regularization Approach for Instance-Based Superset Label Learning.', IEEE Transactions on Cybernetics.

Different from the traditional supervised learning in which each training example has only one explicit label, superset label learning (SLL) refers to the problem that a training example can be associated with a set of candidate labels, and only one of them is correct. Existing SLL methods are either regularization-based or instance-based, and the latter of which has achieved state-of-the-art performance. This is because the latest instance-based methods contain an explicit disambiguation operation that accurately picks up the groundtruth label of each training example from its ambiguous candidate labels. However, such disambiguation operation does not fully consider the mutually exclusive relationship among different candidate labels, so the disambiguated labels are usually generated in a nondiscriminative way, which is unfavorable for the instance-based methods to obtain satisfactory performance. To address this defect, we develop a novel regularization approach for instance-based superset label (RegISL) learning so that our instance-based method also inherits the good discriminative ability possessed by the regularization scheme. Specifically, we employ a graph to represent the training set, and require the examples that are adjacent on the graph to obtain similar labels. More importantly, a discrimination term is proposed to enlarge the gap of values between possible labels and unlikely labels for every training example. As a result, the intrinsic constraints among different candidate labels are deployed, and the disambiguated labels generated by RegISL are more discriminative and accurate than those output by existing instance-based algorithms. The experimental results on various tasks convincingly demonstrate the superiority of our RegISL to other typical SLL methods in terms of both training accuracy and test accuracy.

Gong, C., Tao, D., Liu, W., Liu, L. & Yang, J. 2017, 'Label Propagation via Teaching-to-Learn and Learning-to-Teach.', IEEE transactions on neural networks and learning systems.

How to propagate label information from labeled examples to unlabeled examples over a graph has been intensively studied for a long time. Existing graph-based propagation algorithms usually treat unlabeled examples equally, and transmit seed labels to the unlabeled examples that are connected to the labeled examples in a neighborhood graph. However, such a popular propagation scheme is very likely to yield inaccurate propagation, because it falls short of tackling ambiguous but critical data points (e.g., outliers). To this end, this paper treats the unlabeled examples in different levels of difficulties by assessing their reliability and discriminability, and explicitly optimizes the propagation quality by manipulating the propagation sequence to move from simple to difficult examples. In particular, we propose a novel iterative label propagation algorithm in which each propagation alternates between two paradigms, teaching-to-learn and learning-to-teach (TLLT). In the teaching-to-learn step, the learner conducts the propagation on the simplest unlabeled examples designated by the teacher. In the learning-to-teach step, the teacher incorporates the learner's feedback to adjust the choice of the subsequent simplest examples. The proposed TLLT strategy critically improves the accuracy of label propagation, making our algorithm substantially robust to the values of tuning parameters, such as the Gaussian kernel width used in graph construction. The merits of our algorithm are theoretically justified and empirically demonstrated through experiments performed on both synthetic and real-world data sets.
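The propagation scheme that TLLT refines can be illustrated with the classic graph label propagation iteration F ← αSF + (1−α)Y over a normalized Gaussian-affinity graph. This sketch shows only that standard baseline, not the teaching-to-learn/learning-to-teach sequencing; the kernel width and α value are illustrative choices.

```python
import numpy as np

def label_propagation(X, y, alpha=0.9, sigma=1.0, iters=200):
    # Classic graph label propagation:
    #   F <- alpha * S @ F + (1 - alpha) * Y,
    # where S is the symmetrically normalized Gaussian affinity matrix.
    # y uses -1 for unlabeled points and class ids 0..C-1 otherwise.
    n = len(X)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))        # Gaussian kernel affinities
    np.fill_diagonal(W, 0.0)
    Dinv = 1.0 / np.sqrt(W.sum(axis=1))
    S = Dinv[:, None] * W * Dinv[None, :]     # D^{-1/2} W D^{-1/2}
    C = int(y.max()) + 1
    Y = np.zeros((n, C))
    for i, lbl in enumerate(y):
        if lbl >= 0:
            Y[i, lbl] = 1.0                   # one-hot seed labels
    F = Y.copy()
    for _ in range(iters):
        F = alpha * S @ F + (1 - alpha) * Y   # spread labels over the graph
    return F.argmax(axis=1)

# Toy usage: two well-separated clusters, one labeled point in each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(6, 0.3, (20, 2))])
y = -np.ones(40, dtype=int)
y[0], y[20] = 0, 1
print(label_propagation(X, y)[:3])  # [0 0 0]
```

TLLT's contribution, per the abstract, is to control the *order* in which such propagation reaches unlabeled points, from simple to difficult, rather than treating them all equally as this baseline does.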

Gui, J., Liu, T., Sun, Z., Tao, D. & Tan, T. 2017, 'Supervised Discrete Hashing With Relaxation', IEEE Transactions on Neural Networks and Learning Systems.

Data-dependent hashing has recently attracted attention due to being able to support efficient retrieval and storage of high-dimensional data, such as documents, images, and videos. In this paper, we propose a novel learning-based hashing method called ''supervised discrete hashing with relaxation'' (SDHR) based on ''supervised discrete hashing'' (SDH). SDH uses ordinary least squares regression and traditional zero-one matrix encoding of class label information as the regression target (code words), thus fixing the regression target. In SDHR, the regression target is instead optimized. The optimized regression target matrix satisfies a large margin constraint for correct classification of each example. Compared with SDH, which uses the traditional zero-one matrix, SDHR utilizes the learned regression target matrix and, therefore, more accurately measures the classification error of the regression model and is more flexible. As expected, SDHR generally outperforms SDH. Experimental results on two large-scale image data sets (CIFAR-10 and MNIST) and a large-scale and challenging face data set (FRGC) demonstrate the effectiveness and efficiency of SDHR.

Hou, S., Chen, L., Tao, D., Zhou, S., Liu, W. & Zheng, Y. 2017, 'Multi-layer multi-view topic model for classifying advertising video', Pattern Recognition, vol. 68, pp. 66-81.

The recent proliferation of advertising (ad) videos has driven research in multiple applications, ranging from video analysis to video indexing and retrieval. Among them, classifying ad video is a key task because it allows automatic organization of videos according to categories or genres, and this further enables ad video indexing and retrieval. However, classifying ad video is challenging compared to other types of video classification because of its unconstrained content. While many studies focus on embedding ads relevant to videos, to our knowledge, few focus on ad video classification. In order to classify ad video, this paper proposes a novel ad video representation that aims to sufficiently capture the latent semantics of video content from multiple views in an unsupervised manner. In particular, we represent ad videos from four views, including bag-of-features (BOF), vector of locally aggregated descriptors (VLAD), Fisher vector (FV) and object bank (OB). We then devise a multi-layer multi-view topic model, mlmv_LDA, which models the topics of videos from different views. A topical representation for video, supporting category-related tasks, is finally achieved by the proposed method. Our empirical classification results on 10,111 real-world ad videos demonstrate that the proposed approach effectively differentiates ad videos.

Li, J., Deng, C., Xu, R.Y.D., Tao, D. & Zhao, B. 2017, 'Robust Object Tracking with Discrete Graph-Based Multiple Experts', IEEE Transactions on Image Processing, vol. 26, no. 6, pp. 2736-2750.

Variations of target appearances due to illumination changes, heavy occlusions, and target deformations are the major factors for tracking drift. In this paper, we show that the tracking drift can be effectively corrected by exploiting the relationship between the current tracker and its historical tracker snapshots. Here, a multi-expert framework is established by the current tracker and its historical trained tracker snapshots. The proposed scheme is formulated into a unified discrete graph optimization framework, whose nodes are modeled by the hypotheses of the multiple experts. Furthermore, an exact solution of the discrete graph exists giving the object state estimation at each time step. With the unary and binary compatibility graph scores defined properly, the proposed framework corrects the tracker drift via selecting the best expert hypothesis, which implicitly analyzes the recent performance of the multi-expert by only evaluating graph scores at the current frame. Three base trackers are integrated into the proposed framework to validate its effectiveness. We first integrate the online SVM on a budget algorithm into the framework with significant improvement. Then, the regression correlation filters with hand-crafted features and deep convolutional neural network features are introduced, respectively, to further boost the tracking performance. The proposed three trackers are extensively evaluated on three data sets: TB-50, TB-100, and VOT2015. The experimental results demonstrate the excellent performance of the proposed approaches against the state-of-the-art methods.

Li, J., Mei, X., Prokhorov, D. & Tao, D. 2017, 'Deep Neural Network for Structural Prediction and Lane Detection in Traffic Scene', IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 3, pp. 690-703.

Hierarchical neural networks have been shown to be effective in learning representative image features and recognizing object classes. However, most existing networks combine the low/middle level cues for classification without accounting for any spatial structures. For applications such as understanding a scene, how the visual cues are spatially distributed in an image becomes essential for successful analysis. This paper extends the framework of deep neural networks by accounting for the structural cues in the visual signals. In particular, two kinds of neural networks have been proposed. First, we develop a multitask deep convolutional network, which simultaneously detects the presence of the target and the geometric attributes (location and orientation) of the target with respect to the region of interest. Second, a recurrent neuron layer is adopted for structured visual detection. The recurrent neurons can deal with the spatial distribution of visible cues belonging to an object whose shape or structure is difficult to explicitly define. Both the networks are demonstrated by the practical task of detecting lane boundaries in traffic scenes. The multitask convolutional neural network provides auxiliary geometric information to help the subsequent modeling of the given lane structures. The recurrent neural network automatically detects lane boundaries, including those areas containing no marks, without any explicit prior knowledge or secondary modeling.

Li, J., Xu, C., Yang, W., Sun, C. & Tao, D. 2017, 'Discriminative multi-view interactive image re-ranking', IEEE Transactions on Image Processing, vol. 26, no. 7, pp. 3113-3127.

Given unreliable visual patterns and insufficient query information, content-based image retrieval is often suboptimal and requires image re-ranking using auxiliary information. In this paper, we propose discriminative multi-view interactive image re-ranking (DMINTIR), which integrates user relevance feedback capturing users' intentions and multiple features that sufficiently describe the images. In DMINTIR, heterogeneous property features are incorporated in the multi-view learning scheme to exploit their complementarities. In addition, a discriminatively learned weight vector is obtained to reassign updated scores and target images for re-ranking. Compared with other multi-view learning techniques, our scheme not only generates a compact representation in the latent space from the redundant multi-view features but also maximally preserves the discriminative information in feature encoding by the large-margin principle. Furthermore, the generalization error bound of the proposed algorithm is theoretically analyzed and shown to be improved by the interactions between the latent space and discriminant function learning. Experimental results on two benchmark data sets demonstrate that our approach boosts baseline retrieval quality and is competitive with the other state-of-the-art re-ranking strategies.

Li, Y., Tian, X., Liu, T. & Tao, D. 2017, 'On Better Exploring and Exploiting Task Relationships in Multitask Learning: Joint Model and Feature Learning', IEEE Transactions on Neural Networks and Learning Systems.

Multitask learning (MTL) aims to learn multiple tasks simultaneously through the interdependence between different tasks. How to measure the relatedness between tasks has always been a popular issue. There are mainly two ways to measure relatedness between tasks: common parameter sharing and common feature sharing across different tasks. However, these two types of relatedness are mainly learned independently, leading to a loss of information. In this paper, we propose a new strategy to measure the relatedness that jointly learns shared parameters and shared feature representations. The objective of our proposed method is to transform the features of different tasks into a common feature space in which the tasks are closely related and the shared parameters can be better optimized. We give a detailed introduction to our proposed MTL method. Additionally, an alternating algorithm is introduced to optimize the nonconvex objective. A theoretical bound is given to demonstrate that the relatedness between tasks can be better measured by our proposed MTL algorithm. We conduct various experiments to verify the superiority of the proposed joint model and feature MTL method.

Liu, T., Tao, D., Song, M. & Maybank, S. 2017, 'Algorithm-Dependent Generalization Bounds for Multi-Task Learning.', IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 2, pp. 227-241.

Often, tasks are collected for multi-task learning (MTL) because they share similar feature structures. Based on this observation, in this paper, we present novel algorithm-dependent generalization bounds for MTL by exploiting the notion of algorithmic stability. We focus on the performance of one particular task and the average performance over multiple tasks by analyzing the generalization ability of a common parameter that is shared in MTL. When focusing on one particular task, with the help of a mild assumption on the feature structures, we interpret the function of the other tasks as a regularizer that produces a specific inductive bias. The algorithm for learning the common parameter, as well as the predictor, is thereby uniformly stable with respect to the domain of the particular task and has a generalization bound with a fast convergence rate of order O(1/n), where n is the sample size of the particular task. When focusing on the average performance over multiple tasks, we prove that a similar inductive bias exists under certain conditions on the feature structures. Thus, the corresponding algorithm for learning the common parameter is also uniformly stable with respect to the domains of the multiple tasks, and its generalization bound is of the order O(1/T), where T is the number of tasks. These theoretical analyses naturally show that the similarity of feature structures in MTL will lead to specific regularizations for predicting, which enables the learning algorithms to generalize fast and correctly from a few examples.

Peng, H., Lan, C., Zheng, Y., Hutvagner, G., Tao, D. & Li, J. 2017, 'Cross disease analysis of co-functional microRNA pairs on a reconstructed network of disease-gene-microRNA tripartite.', BMC bioinformatics, vol. 18, no. 1, p. 193.

MicroRNAs always function cooperatively in their regulation of gene expression. Dysfunctions of these co-functional microRNAs can play significant roles in disease development. We are interested in those multi-disease associated co-functional microRNAs that regulate their common dysfunctional target genes cooperatively in the development of multiple diseases. The research is potentially useful for human disease studies at the transcriptional level and for the study of multi-purpose microRNA therapeutics. We designed a computational method to detect multi-disease associated co-functional microRNA pairs and conducted cross disease analysis on a reconstructed disease-gene-microRNA (DGR) tripartite network. The construction of the DGR tripartite network is by the integration of newly predicted disease-microRNA associations with those relationships of diseases, microRNAs and genes maintained by existing databases. The prediction method uses a set of reliable negative samples of disease-microRNA association and a pre-computed kernel matrix instead of kernel functions. From this reconstructed DGR tripartite network, multi-disease associated co-functional microRNA pairs are detected together with their common dysfunctional target genes and ranked by a novel scoring method. We also conducted proof-of-concept case studies on cancer-related co-functional microRNA pairs as well as on non-cancer disease-related microRNA pairs. With the prioritization of the co-functional microRNAs that relate to a series of diseases, we found that the co-function phenomenon is not unusual. We also confirmed that the regulation of the microRNAs for the development of cancers is more complex and has more unique properties than that of non-cancer diseases.

Qiao, M., Liu, L., Yu, J., Xu, C. & Tao, D. 2017, 'Diversified dictionaries for multi-instance learning', Pattern Recognition, vol. 64, pp. 407-416.

Multiple-instance learning (MIL) has been a popular topic in the study of pattern recognition for years due to its usefulness for such tasks as drug activity prediction and image/text classification. In a typical MIL setting, a bag contains a bag-level label and more than one instance/pattern. How to bridge instance-level representations to bag-level labels is a key step to achieve satisfactory classification accuracy results. In this paper, we present a supervised learning method, diversified dictionaries MIL, to address this problem. Our approach, on the one hand, exploits bag-level label information for training class-specific dictionaries. On the other hand, it introduces a diversity regularizer into the class-specific dictionaries to avoid ambiguity between them. To the best of our knowledge, this is the first time that the diversity prior is introduced to solve the MIL problems. Experiments conducted on several benchmark (drug activity and image/text annotation) datasets show that the proposed method compares favorably to state-of-the-art methods.

Shen, X., Tian, X., Liu, T., Xu, F. & Tao, D. 2017, 'Continuous Dropout', IEEE Transactions on Neural Networks and Learning Systems.

© IEEE. Dropout has been proven to be an effective algorithm for training robust deep networks because of its ability to prevent overfitting by avoiding the co-adaptation of feature detectors. Current explanations of dropout include bagging, naive Bayes, regularization, and sex in evolution. According to the activation patterns of neurons in the human brain, when faced with different situations, the firing rates of neurons are random and continuous, not binary as in current dropout. Inspired by this phenomenon, we extend traditional binary dropout to continuous dropout. On the one hand, continuous dropout is considerably closer to the activation characteristics of neurons in the human brain than traditional binary dropout. On the other hand, we demonstrate that continuous dropout has the property of avoiding the co-adaptation of feature detectors, which suggests that we can extract more independent feature detectors for model averaging in the test stage. We introduce the proposed continuous dropout to a feedforward neural network and comprehensively compare it with binary dropout, adaptive dropout, and DropConnect on MNIST, CIFAR-10, SVHN, NORB, and ILSVRC-12. Thorough experiments demonstrate that our method performs better in preventing the co-adaptation of feature detectors and improves test performance.
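As an illustration of the abstract's central idea, here is a minimal sketch contrasting binary and continuous masking. It assumes a Uniform(0, 2μ) mask with mean μ, which is one possible continuous choice, not necessarily the paper's exact formulation.

```python
import numpy as np

def continuous_dropout(x, rng, mu=0.5):
    """Continuous dropout sketch: scale activations by a random mask
    drawn from Uniform(0, 2*mu), whose mean is mu, instead of the
    Bernoulli 0/1 mask used by binary dropout."""
    mask = rng.uniform(0.0, 2.0 * mu, size=x.shape)  # E[mask] = mu
    return x * mask

def binary_dropout(x, rng, p=0.5):
    """Standard binary dropout for comparison: keep each unit with prob p."""
    mask = (rng.uniform(size=x.shape) < p).astype(x.dtype)
    return x * mask

rng = np.random.default_rng(0)
x = np.ones((4, 3))
y = continuous_dropout(x, rng)
# masked activations take continuous values, not just 0 or 1
```

The continuous mask preserves the mean activation while avoiding the hard on/off co-adaptation pattern of the binary mask.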

Wang, Y., Xu, C., You, S., Xu, C. & Tao, D. 2017, 'DCT regularized extreme visual recovery', IEEE Transactions on Image Processing, vol. 26, no. 7, pp. 3360-3371.

© 2017 IEEE. Here we study the extreme visual recovery problem, in which over 90% of pixel values in a given image are missing. Existing low rank-based algorithms are only effective for recovering data with at most 90% missing values. Thus, we exploit visual data's smoothness property to help solve this challenging extreme visual recovery problem. Based on the discrete cosine transform (DCT), we propose a novel DCT regularizer that involves all pixels and produces smooth estimations in any view. Our theoretical analysis shows that the total variation regularizer, which only achieves local smoothness, is a special case of the proposed DCT regularizer. We also develop a new visual recovery algorithm by minimizing the DCT regularizer and nuclear norm to achieve a more visually pleasing estimation. Experimental results on a benchmark image data set demonstrate that the proposed approach is superior to the state-of-the-art methods in terms of peak signal-to-noise ratio and structural similarity.
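To illustrate the intuition behind frequency-domain smoothness, here is a small sketch of a DCT-based penalty. The weighting scheme is illustrative only, not the paper's exact regularizer.

```python
import numpy as np
from scipy.fft import dctn

def dct_smoothness_penalty(img):
    """Illustrative DCT-based smoothness penalty: transform the image
    with a 2-D DCT and penalize energy at high frequencies. A smooth
    image concentrates its energy in low-frequency coefficients, so
    the penalty is small; a noisy image spreads energy across all
    frequencies and is penalized heavily."""
    coeffs = dctn(img, norm='ortho')
    h, w = img.shape
    # weight grows with frequency index; the DC term has weight 0
    weights = np.add.outer(np.arange(h), np.arange(w)) / (h + w - 2)
    return float(np.sum(weights * coeffs ** 2))

smooth = np.ones((8, 8))  # perfectly smooth image
noisy = np.random.default_rng(1).normal(size=(8, 8))
assert dct_smoothness_penalty(smooth) < dct_smoothness_penalty(noisy)
```

Because the DCT involves every pixel, such a penalty encourages global smoothness, in contrast to the purely local total variation regularizer the abstract mentions.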

Xiong, W., Zhang, L., Du, B. & Tao, D. 2017, 'Combining local and global: Rich and robust feature pooling for visual recognition', Pattern Recognition, vol. 62, pp. 225-235.

© 2016 Elsevier Ltd. The human visual system is expert at discovering patterns in both global and local feature space. Can we design a similar mechanism for unsupervised feature learning? In this paper, we propose a novel spatial pooling method within an unsupervised feature learning framework, named Rich and Robust Feature Pooling (R²FP), to better extract rich and robust representations from sparse feature maps learned from the raw data. Both local and global pooling strategies are considered to instantiate the method. The former selects the most representative features in each sub-region and summarizes the joint distribution of the selected features, while the latter extracts multiple resolutions of features and fuses them with a feature-balance kernel for rich representation. Extensive experiments on several image recognition tasks demonstrate the superiority of the proposed method.

Zeng, K., Yu, J., Wang, R., Li, C. & Tao, D. 2017, 'Coupled Deep Autoencoder for Single Image Super-Resolution', IEEE Transactions on Cybernetics, vol. 47, no. 1, pp. 27-37.

Sparse coding has been widely applied to learning-based single image super-resolution (SR) and has obtained promising performance by jointly learning effective representations for low-resolution (LR) and high-resolution (HR) image patch pairs. However, the resulting HR images often suffer from ringing, jaggy, and blurring artifacts due to the strong yet ad hoc assumptions that the LR image patch representation is equal to, is linear with, lies on a manifold similar to, or has the same support set as the corresponding HR image patch representation. Motivated by the success of deep learning, we develop a data-driven model coupled deep autoencoder (CDA) for single image SR. CDA is based on a new deep architecture and has high representational capability. CDA simultaneously learns the intrinsic representations of LR and HR image patches and a big-data-driven function that precisely maps these LR representations to their corresponding HR representations. Extensive experimentation demonstrates the superior effectiveness and efficiency of CDA for single image SR compared to other state-of-the-art methods on Set5 and Set14 datasets.

Chua, T.S., He, X., Liu, W., Piccardi, M., Wen, Y. & Tao, D. 2016, 'Big data meets multimedia analytics', Signal Processing, vol. 124, pp. 1-4.

Ding, C., Choi, J., Tao, D. & Davis, L.S. 2016, 'Multi-Directional Multi-Level Dual-Cross Patterns for Robust Face Recognition.', IEEE transactions on pattern analysis and machine intelligence, vol. 38, no. 3, pp. 518-531.

To perform unconstrained face recognition robust to variations in illumination, pose and expression, this paper presents a new scheme to extract "Multi-Directional Multi-Level Dual-Cross Patterns" (MDML-DCPs) from face images. Specifically, the MDML-DCPs scheme exploits the first derivative of Gaussian operator to reduce the impact of differences in illumination and then computes the DCP feature at both the holistic and component levels. DCP is a novel face image descriptor inspired by the unique textural structure of human faces. It is computationally efficient and only doubles the cost of computing local binary patterns, yet is extremely robust to pose and expression variations. MDML-DCPs comprehensively yet efficiently encodes the invariant characteristics of a face image from multiple levels into patterns that are highly discriminative of inter-personal differences but robust to intra-personal variations. Experimental results on the FERET, CAS-PEAL-R1, FRGC 2.0, and LFW databases indicate that DCP outperforms the state-of-the-art local descriptors (e.g., LBP, LTP, LPQ, POEM, tLBP, and LGXP) for both face identification and face verification tasks. More impressively, the best performance is achieved on the challenging LFW and FRGC 2.0 databases by deploying MDML-DCPs in a simple recognition scheme.

Du, B., Wang, S., Wang, N., Zhang, L., Tao, D. & Zhang, L. 2016, 'Hyperspectral signal unmixing based on constrained non-negative matrix factorization approach', Neurocomputing, vol. 204, pp. 153-161.

© 2016 Elsevier B.V. Hyperspectral unmixing is a hot topic in signal and image processing. Non-negative matrix factorization (NMF) can decompose a set of high-dimensional data matrices into two sets of non-negative low-dimensional matrices. However, the algorithm has many local solutions because of the non-convexity of the objective function. Some algorithms address this problem by adding auxiliary constraints, such as sparsity. Sparse NMF performs well, but its results are unstable and sensitive to noise. Using structural information can make the decomposition stable: previous work used clustering based on Euclidean distance to guide the decomposition and obtained good performance. However, Euclidean distance only measures the straight-line distance between two points, whereas ground objects usually follow certain statistical distributions, and it is difficult to measure the difference between statistical distributions comprehensively with Euclidean distance. Kullback-Leibler divergence (KL divergence) is a better metric. In this paper, we propose a new approach, KL divergence constrained NMF, which measures differences between statistical distributions using KL divergence instead of Euclidean distance, improving the accuracy of the structural information used in the algorithm. Experimental results on synthetic and real hyperspectral data show the superiority of the proposed algorithm with respect to other state-of-the-art algorithms.
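A minimal sketch of the KL-divergence comparison that motivates this approach follows; the `kl_divergence` helper and the example spectra are illustrative, not taken from the paper.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL divergence D(p || q) between two discrete distributions,
    used as a richer alternative to Euclidean distance when comparing
    spectral distributions (as the abstract argues)."""
    p = np.asarray(p, dtype=float) + eps  # eps guards log(0)
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()       # normalize to distributions
    return float(np.sum(p * np.log(p / q)))

# Illustrative spectra: b differs from a slightly, c reverses its shape.
a = np.array([0.7, 0.2, 0.1])
b = np.array([0.6, 0.3, 0.1])
c = np.array([0.1, 0.2, 0.7])
assert kl_divergence(a, b) < kl_divergence(a, c)
```

Unlike Euclidean distance, KL divergence is sensitive to the relative shape of the distributions, which is the property the proposed constraint exploits.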

Du, B., Zhang, Y., Zhang, L. & Tao, D. 2016, 'Beyond the Sparsity-Based Target Detector: A Hybrid Sparsity and Statistics-Based Detector for Hyperspectral Images', IEEE Transactions on Image Processing, vol. 25, no. 11, pp. 5345-5357.

© 2016 IEEE. Hyperspectral images provide great potential for target detection; however, they also introduce new challenges, meaning that hyperspectral target detection should be treated as a new problem and modeled differently. Many classical detectors have been proposed based on the linear mixing model and the sparsity model. However, the former type of model cannot deal well with spectral variability given limited endmembers, and the latter usually treats target detection as a simple classification problem and pays less attention to the low target probability. In this case, can we find an efficient way to utilize both the high-dimensional features behind hyperspectral images and the limited target information to extract small targets? This paper proposes a novel sparsity-based detector named the hybrid sparsity and statistics detector (HSSD) for target detection in hyperspectral imagery, which can effectively deal with the above two problems. The proposed algorithm designs a hypothesis-specific dictionary based on the prior hypotheses for the test pixel, which avoids an imbalanced number of training samples for a class-specific dictionary. Then, a purification process is employed for the background training samples in order to construct an effective competition between the two hypotheses. Next, a sparse representation-based binary hypothesis model merged with additive Gaussian noise is proposed to represent the image. Finally, a generalized likelihood ratio test is performed to obtain a more robust detection decision than reconstruction residual-based detection methods. Extensive experimental results with three hyperspectral data sets confirm that the proposed HSSD algorithm clearly outperforms the state-of-the-art target detectors.

Gui, J., Liu, T., Tao, D., Sun, Z. & Tan, T. 2016, 'Representative Vector Machines: A Unified Framework for Classical Classifiers', IEEE Transactions on Cybernetics, vol. 46, no. 8, pp. 1877-1888.

Classifier design is a fundamental problem in pattern recognition. A variety of pattern classification methods, such as the nearest neighbor (NN) classifier, support vector machine (SVM), and sparse representation-based classification (SRC), have been proposed in the literature. These typical and widely used classifiers were originally developed from different theoretical or application motivations, and they are conventionally treated as independent and specific solutions for pattern classification. This paper proposes a novel pattern classification framework, namely, representative vector machines (RVMs for short). The basic idea of RVMs is to assign the class label of a test example according to its nearest representative vector. The contributions of RVMs are twofold. On one hand, the proposed RVMs establish a unified framework of classical classifiers because NN, SVM, and SRC can be interpreted as special cases of RVMs with different definitions of representative vectors. Thus, the underlying relationship among a number of classical classifiers is revealed for a better understanding of pattern classification. On the other hand, novel and advanced classifiers are inspired in the framework of RVMs. For example, a robust pattern classification method called the discriminant vector machine (DVM) is motivated from RVMs. Given a test example, DVM first finds its k-NNs and then performs classification based on the robust M-estimator and manifold regularization. Extensive experimental evaluations on a variety of visual recognition tasks, such as face recognition (Yale and face recognition grand challenge databases), object categorization (Caltech-101 dataset), and action recognition (Action Similarity LAbeliNg), demonstrate the advantages of DVM over other classifiers.
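A toy sketch of the nearest-representative decision rule follows. It uses class means as the representative vectors, which reduces the rule to a nearest-centroid classifier; per the abstract, other choices of representative vector recover NN, SVM, or SRC.

```python
import numpy as np

def nearest_representative(x, reps):
    """Sketch of the RVM decision rule: assign the label of the class
    whose representative vector is nearest to the test example."""
    labels = list(reps)
    dists = [np.linalg.norm(x - reps[c]) for c in labels]
    return labels[int(np.argmin(dists))]

# hypothetical two-class example with class means as representatives
reps = {'a': np.array([0.0, 0.0]), 'b': np.array([5.0, 5.0])}
label = nearest_representative(np.array([1.0, 0.5]), reps)
# → 'a'
```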

Hong, R., Hu, Z., Wang, R., Wang, M. & Tao, D. 2016, 'Multi-View Object Retrieval via Multi-Scale Topic Models', IEEE Transactions on Image Processing, vol. 25, no. 12, pp. 5814-5827.

© 2016 IEEE. The increasing number of 3D objects in various applications has increased the requirement for effective and efficient 3D object retrieval methods, which has attracted extensive research efforts in recent years. Existing works mainly focus on how to extract features and conduct object matching. As applications multiply, 3D objects come from many different areas, and in such circumstances how to conduct object retrieval becomes more important. To address this issue, we propose a multi-view object retrieval method using multi-scale topic models in this paper. In our method, multiple views are first extracted from each object, and then dense visual features are extracted to represent each view. To represent a 3D object, multi-scale topic models are employed to extract the hidden relationships among these features with respect to varied numbers of topics in the topic model. In this way, each object can be represented by a set of bags of topics. To compare objects, we first conduct topic clustering on the basic topics from the two data sets, and then generate a common topic dictionary for the new representation. The two objects can then be aligned to the same common feature space for comparison. To evaluate the performance of the proposed method, experiments are conducted on two data sets. The 3D object retrieval experimental results and comparison with existing methods demonstrate the effectiveness of the proposed method.

Li, Q., Xie, B., You, J., Bian, W. & Tao, D. 2016, 'Correlated Logistic Model with Elastic Net Regularization for Multilabel Image Classification', IEEE Transactions on Image Processing, vol. 25, no. 8, pp. 3801-3813.

© 1992-2012 IEEE. In this paper, we present correlated logistic (CorrLog) model for multilabel image classification. CorrLog extends conventional logistic regression model into multilabel cases, via explicitly modeling the pairwise correlation between labels. In addition, we propose to learn the model parameters of CorrLog with elastic net regularization, which helps exploit the sparsity in feature selection and label correlations and thus further boost the performance of multilabel classification. CorrLog can be efficiently learned, though approximately, by regularized maximum pseudo likelihood estimation, and it enjoys a satisfying generalization bound that is independent of the number of labels. CorrLog performs competitively for multilabel image classification on benchmark data sets MULAN scene, MIT outdoor scene, PASCAL VOC 2007, and PASCAL VOC 2012, compared with the state-of-the-art multilabel classification algorithms.

Li, Z., Gong, D., Li, Q., Tao, D. & Li, X. 2016, 'Mutual component analysis for heterogeneous face recognition', ACM Transactions on Intelligent Systems and Technology, vol. 7, no. 3, pp. 1-23.

Heterogeneous face recognition, also known as cross-modality or intermodality face recognition, refers to matching two face images from different image modalities. Since face images from different modalities of the same person are associated with the same face object, there should be mutual components that reflect the intrinsic face characteristics that are invariant to the image modalities. Motivated by this rationale, we propose a novel approach called Mutual Component Analysis (MCA) to infer the mutual components for robust heterogeneous face recognition. In the MCA approach, a generative model is first proposed to model the process of generating face images in different modalities, and an Expectation Maximization (EM) algorithm is then designed to iteratively learn the model parameters. The learned generative model is able to infer the mutual components (which we call the hidden factor, where hidden means the factor is unreachable and invisible, and can only be inferred from observations) that are associated with the person's identity, thus enabling fast and effective matching for cross-modality face recognition. To enhance recognition performance, we propose an MCA-based multiclassifier framework using multiple local features. Experimental results show that our new approach significantly outperforms the state-of-the-art results on two typical application scenarios: sketch-to-photo and infrared-to-visible face recognition.

Luo, Y., Wen, Y., Tao, D., Gui, J. & Xu, C. 2016, 'Large margin multi-modal multi-task feature extraction for image classification', IEEE Transactions on Image Processing, vol. 25, no. 1, pp. 414-427.

© 2015 IEEE. The features used in many image analysis-based applications are frequently of very high dimension. Feature extraction offers several advantages in high-dimensional cases, and many recent studies have used multi-task feature extraction approaches, which often outperform single-task feature extraction approaches. However, most of these methods are limited in that they only consider data represented by a single type of feature, even though features usually represent images from multiple modalities. We, therefore, propose a novel large margin multi-modal multi-task feature extraction (LM3FE) framework for handling multi-modal features for image classification. In particular, LM3FE simultaneously learns the feature extraction matrix for each modality and the modality combination coefficients. In this way, LM3FE not only handles correlated and noisy features, but also utilizes the complementarity of different modalities to further help reduce feature redundancy in each modality. The large margin principle employed also helps to extract strongly predictive features, so that they are more suitable for prediction (e.g., classification). An alternating algorithm is developed for problem optimization, and each subproblem can be efficiently solved. Experiments on two challenging real-world image data sets demonstrate the effectiveness and superiority of the proposed method.

Shen, F., Zhou, X., Yang, Y., Song, J., Shen, H.T. & Tao, D. 2016, 'A Fast Optimization Method for General Binary Code Learning', IEEE Transactions on Image Processing, vol. 25, no. 12, pp. 5610-5621.

© 2016 IEEE. Hashing, or binary code learning, has been recognized to accomplish efficient near-neighbor search, and has thus attracted broad interest in recent retrieval, vision, and learning studies. One main challenge of learning to hash arises from the involvement of discrete variables in binary code optimization. While the widely used continuous relaxation may achieve high learning efficiency, the pursued codes are typically less effective due to accumulated quantization error. In this paper, we propose a novel binary code optimization method, dubbed discrete proximal linearized minimization (DPLM), which directly handles the discrete constraints during the learning process. Specifically, the discrete (thus nonsmooth, nonconvex) problem is reformulated as minimizing the sum of a smooth loss term and a nonsmooth indicator function. The resulting problem is then efficiently solved by an iterative procedure in which each iteration admits an analytical discrete solution, and which is thus shown to converge very fast. In addition, the proposed method supports a large family of empirical loss functions, instantiated in this paper by both supervised and unsupervised hashing losses, together with bit-uncorrelation and balance constraints. In particular, the proposed DPLM with a supervised ℓ2 loss encodes the whole NUS-WIDE database into 64-bit binary codes within 10 s on a standard desktop computer. The proposed approach is extensively evaluated on several large-scale data sets, and the generated binary codes are shown to achieve very promising results on both retrieval and classification tasks.
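The per-iteration analytical discrete solution can be sketched as follows. The quadratic loss and the `dplm_step` helper are illustrative stand-ins, not the paper's exact objective.

```python
import numpy as np

def dplm_step(B, X, W, rho=1.0):
    """One illustrative proximal-linearized step on binary codes:
    linearize a smooth loss L(B) = ||B - X W||^2 at the current codes,
    take a gradient step, and project onto {-1, +1}. The projection is
    just a sign operation, i.e. the analytical discrete solution that
    makes each iteration cheap."""
    grad = 2.0 * (B - X @ W)            # gradient of the smooth term
    Z = B - (1.0 / rho) * grad          # gradient (linearized) step
    return np.where(Z >= 0, 1.0, -1.0)  # proximal projection onto {-1, +1}

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))   # hypothetical features
W = rng.normal(size=(4, 3))   # hypothetical projection
B = np.sign(rng.normal(size=(6, 3)))  # initial random codes
B = dplm_step(B, X, W)        # codes stay exactly binary
```

The point of the sketch is that, unlike continuous relaxation, the codes remain in {-1, +1} throughout, so no quantization error accumulates.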

Xie, L., Tao, D. & Wei, H. 2016, 'Joint structured sparsity regularized multiview dimension reduction for video-based facial expression recognition', ACM Transactions on Intelligent Systems and Technology, vol. 8, no. 2.

© 2016 ACM. Video-based facial expression recognition (FER) has recently received increased attention as a result of its widespread applications. Using only one type of feature to describe facial expression in video sequences is often inadequate, because the available information is very complex. With the emergence of different features to represent different properties of facial expressions in videos, an appropriate combination of these features becomes an important, yet challenging, problem. Considering that the dimensionality of these features is usually high, we introduce multiview dimension reduction (MVDR) into video-based FER. In MVDR, it is critical to explore the relationships between and within different feature views. To achieve this goal, we propose a novel MVDR framework that enforces joint structured sparsity at both the inter- and intraview levels. In this way, correlations within and between the feature spaces of different views tend to be well exploited. In addition, a transformation matrix is learned for each view to discover the patterns contained in the original features, so that the different views are comparable in finding a common representation. The model can not only be applied in an unsupervised manner, but is also easily extended to a semisupervised setting by incorporating some domain knowledge. An alternating algorithm is developed for problem optimization, and each subproblem can be efficiently solved. Experiments on two challenging video-based FER datasets demonstrate the effectiveness of the proposed framework.

Xu, C., Liu, T., Tao, D. & Xu, C. 2016, 'Local Rademacher Complexity for Multi-label Learning', IEEE Transactions on Image Processing, vol. 25, no. 3, pp. 1495-1507.

We analyze the local Rademacher complexity of empirical risk minimization (ERM)-based multi-label learning algorithms, and in doing so propose a new algorithm for multi-label learning. Rather than using the trace norm to regularize the multi-label predictor, we instead minimize the tail sum of the singular values of the predictor in multi-label learning. Benefiting from the use of the local Rademacher complexity, our algorithm therefore has a sharper generalization error bound and a faster convergence rate. Compared to methods that minimize over all singular values, concentrating on the tail singular values results in better recovery of the low-rank structure of the multi-label predictor, which plays an important role in exploiting label correlations. We propose a new conditional singular value thresholding algorithm to solve the resulting objective function. Empirical studies on real-world datasets validate our theoretical results and demonstrate the effectiveness of the proposed algorithm.
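The idea of shrinking only the tail singular values can be sketched as follows. The `tail_sv_threshold` helper is illustrative, not the paper's exact conditional singular value thresholding algorithm.

```python
import numpy as np

def tail_sv_threshold(M, k, tau):
    """Sketch of minimizing the tail sum of singular values:
    soft-threshold only the singular values beyond the top k, leaving
    the dominant low-rank structure of the predictor intact."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_new = s.copy()
    s_new[k:] = np.maximum(s_new[k:] - tau, 0.0)  # shrink the tail only
    return U @ np.diag(s_new) @ Vt

rng = np.random.default_rng(0)
M = rng.normal(size=(5, 4))
M2 = tail_sv_threshold(M, k=2, tau=10.0)
# with a large tau the tail is zeroed, so M2 has rank at most 2
```

In contrast, trace-norm regularization shrinks all singular values, including the large ones that carry the label-correlation structure.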

Xu, Z., Hong, Z., Zhang, Y., Wu, J., Tsoi, A.C. & Tao, D. 2016, 'Multinomial Latent Logistic Regression for Image Understanding', IEEE Transactions on Image Processing, vol. 25, no. 2, pp. 973-987.

© 1992-2012 IEEE. In this paper, we present multinomial latent logistic regression (MLLR), a new learning paradigm that introduces latent variables to logistic regression. By inheriting the advantages of logistic regression, MLLR is efficiently optimized using the second-order derivatives and provides effective probabilistic analysis on output predictions. MLLR is particularly effective in weakly supervised settings where the latent variable has an exponential number of possible values. The effectiveness of MLLR is demonstrated on four different image understanding applications, including a new challenging architectural style classification task. Furthermore, we show that MLLR can be generalized to general structured output prediction, and in doing so, we provide a thorough investigation of the connections and differences between MLLR and existing related algorithms, including latent structural SVMs and hidden conditional random fields.

Yu, B., Fang, M. & Tao, D. 2016, 'Per-Round Knapsack-Constrained Linear Submodular Bandits.', Neural computation, vol. 28, no. 12, pp. 2757-2789.

Linear submodular bandits have been proven effective in solving the diversification and feature-based exploration problem in information retrieval systems. Since there is inevitably a budget constraint in many web-based applications, such as news article recommendation and online advertising, we study the problem of diversification under a budget constraint in a bandit setting. We first introduce a budget constraint into each exploration step of linear submodular bandits as a new problem, which we call per-round knapsack-constrained linear submodular bandits. We then define an [Formula: see text]-approximation unit-cost regret, considering that submodular function maximization is NP-hard. To solve this new problem, we propose two greedy algorithms based on a modified UCB rule. We prove different regret bounds and computational complexities for these two algorithms. Inspired by the lazy evaluation process in submodular function maximization, we also prove that a modified lazy evaluation process can be used to accelerate our algorithms without losing their theoretical guarantees. We conduct a number of experiments, and the experimental results confirm our theoretical analyses.
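A toy sketch of greedy selection under a per-round knapsack budget follows. This is plain gain-per-cost greedy; the paper's algorithms additionally use a modified UCB rule and lazy evaluation, neither of which is shown here.

```python
def greedy_knapsack(gains, costs, budget):
    """Illustrative per-round greedy rule under a knapsack budget:
    repeatedly pick the item with the best marginal gain-per-cost
    ratio that still fits in the remaining budget."""
    chosen, remaining = [], budget
    items = set(range(len(gains)))
    while items:
        feasible = [i for i in items if costs[i] <= remaining]
        if not feasible:
            break  # nothing left fits in the budget
        best = max(feasible, key=lambda i: gains[i] / costs[i])
        chosen.append(best)
        remaining -= costs[best]
        items.remove(best)
    return chosen

# item 0 has the largest gain, but items 1 and 2 give more gain per cost
picks = greedy_knapsack(gains=[5.0, 4.0, 3.0], costs=[4.0, 2.0, 2.0], budget=4.0)
# → [1, 2]
```

The example shows why cost-sensitivity matters: under the 4.0 budget the greedy rule prefers the two cheaper items (total gain 7.0) over the single expensive one (gain 5.0).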

Conferences

Fu, H., Wang, C., Tao, D. & Black, M.J. 2016, 'Occlusion boundary detection via deep exploration of context', Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, IEEE Conference on Computer Vision and Pattern Recognition, IEEE, Las Vegas, Nevada, pp. 241-250.

Occlusion boundaries contain rich perceptual information about the underlying scene structure. They also provide important cues in many visual perception tasks such as scene understanding, object recognition, and segmentation. In this paper, we improve occlusion boundary detection via enhanced exploration of contextual information (e.g., local structural boundary patterns, observations from surrounding regions, and temporal context), and in doing so develop a novel approach based on convolutional neural networks (CNNs) and conditional random fields (CRFs). Experimental results demonstrate that our detector significantly outperforms the state-of-the-art (e.g., improving the F-measure from 0.62 to 0.71 on the commonly used CMU benchmark). Last but not least, we empirically assess the roles of several important components of the proposed detector, so as to validate the rationale behind this approach.

Gong, C., Tao, D., Yang, J. & Liu, W. 2016, 'Teaching-to-learn and learning-to-teach for multi-label propagation', Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI-16), AAAI Conference on Artificial Intelligence, AAAI, Arizona, pp. 1610-1616.

Multi-label propagation aims to transmit the multi-label information from labeled examples to unlabeled examples based on a weighted graph. Existing methods ignore the specific propagation difficulty of different unlabeled examples and conduct the propagation in an imperfect sequence, leading to the error-prone classification of some difficult examples with uncertain labels. To address this problem, this paper associates each possible label with a “teacher”, and proposes a “Multi-Label Teaching-to-Learn and Learning-to-Teach” (ML-TLLT) algorithm, so that the entire propagation process is guided by the teachers and manipulated from simple examples to more difficult ones. In the teaching-to-learn step, the teachers select the simplest examples for the current propagation by investigating both the definitiveness of each possible label of the unlabeled examples, and the dependencies between labels revealed by the labeled examples. In the learning-to-teach step, the teachers reversely learn from the learner’s feedback to properly select the simplest examples for the next propagation. Thorough empirical studies show that due to the optimized propagation sequence designed by the teachers, ML-TLLT yields generally better performance than seven state-of-the-art methods on the typical multi-label benchmark datasets.

Gong, M., Zhang, K., Liu, T., Tao, D., Glymour, C. & Scholkopf, B. 2016, 'Domain adaptation with conditional transferable components', Proceedings of Machine Learning Research, International Conference on International Conference on Machine Learning (ICML), ACM, New York, USA, pp. 2839-2848.

© 2016 by the author(s). Domain adaptation arises in supervised learning when the training (source domain) and test (target domain) data have different distributions. Let X and Y denote the features and target, respectively. Previous work on domain adaptation mainly considers the covariate shift situation, where the distribution of the features P(X) changes across domains while the conditional distribution P(Y|X) stays the same. To reduce domain discrepancy, recent methods try to find invariant components T(X) that have similar P(T(X)) on different domains by explicitly minimizing a distribution discrepancy measure. However, it is not clear if P(Y|T(X)) in different domains is also similar when P(Y|X) changes. Furthermore, transferable components do not necessarily have to be invariant. If the change in some components is identifiable, we can make use of such components for prediction in the target domain. In this paper, we focus on the case where P(X|Y) and P(Y) both change in a causal system in which Y is the cause for X. Under appropriate assumptions, we aim to extract conditional transferable components whose conditional distribution P(T(X)|Y) is invariant after proper location-scale (LS) transformations, and to identify how P(Y) changes between domains simultaneously. We provide theoretical analysis and empirical evaluation on both synthetic and real-world data to show the effectiveness of our method.

Huang, S., Xu, Z., Tao, D. & Zhang, Y. 2016, 'Part-Stacked CNN for Fine-Grained Visual Categorization', 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE Conference on Computer Vision and Pattern Recognition, IEEE Xplore, Las Vegas, Nevada, pp. 1173-1182.

In the context of fine-grained visual categorization, the ability to interpret models as human-understandable visual manuals is sometimes as important as achieving high classification accuracy. In this paper, we propose a novel Part-Stacked CNN architecture that explicitly explains the fine-grained recognition process by modeling subtle differences between object parts. Based on manually labeled strong part annotations, the proposed architecture consists of a fully convolutional network to locate multiple object parts and a two-stream classification network that encodes object-level and part-level cues simultaneously. By adopting a set of sharing strategies between the computation of multiple object parts, the proposed architecture is very efficient, running at 20 frames/sec during inference. Experimental results on the CUB-200-2011 dataset reveal the effectiveness of the proposed architecture, from the perspectives of both classification accuracy and model interpretability.

Li, Q., Bian, W., Xu, Y., You, J. & Tao, D. 2016, 'Random Mixed Field Model for Mixed-Attribute Data Restoration', Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, AAAI Conference on Artificial Intelligence, AAAI, Phoenix, Arizona, USA, pp. 1244-1250.
View description>>

Noisy and incomplete data restoration is a critical preprocessing step in developing effective learning algorithms, which aims to reduce the effect of noise and missing values in data. By utilizing attribute correlations and/or instance similarities, various techniques have been developed for data denoising and imputation tasks. However, existing data restoration methods are either specifically designed for a particular task or incapable of dealing with mixed-attribute data. In this paper, we develop a new probabilistic model to provide a general and principled method for restoring mixed-attribute data. The main contributions of this study are twofold: a) a unified generative model, utilizing a generic random mixed field (RMF) prior, is designed to exploit mixed-attribute correlations; and b) a structured mean-field variational approach is proposed to solve the challenging inference problem of simultaneous denoising and imputation. We evaluate our method by classification experiments on both synthetic data and real benchmark datasets. Experiments demonstrate that our approach can effectively improve the classification accuracy on noisy and incomplete data compared with other data restoration methods.

Scheirer, W.J., Flynn, P.J., Ding, C.X., Guo, G., Struc, V., Al Jazaery, M., Grm, K., Dobrisek, S., Tao, D., Zhu, Y., Brogan, J., Banerjee, S., Bharati, A. & Webster, B. 2016, 'Report on the BTAS 2016 Video Person Recognition Evaluation', Proceedings of the IEEE International Conference on Biometrics Theory, Applications and Systems (BTAS), IEEE International Conference on Biometrics Theory, Applications and Systems (BTAS), IEEE, Buffalo, New York, pp. 1-8.
View description>>

This report presents results from the Video Person Recognition Evaluation held in conjunction with the 8th IEEE International Conference on Biometrics: Theory, Applications, and Systems (BTAS). Two experiments required algorithms to recognize people in videos from the Point-and-Shoot Face Recognition Challenge Problem (PaSC). The first consisted of videos from a tripod mounted high quality video camera. The second contained videos acquired from 5 different handheld video cameras. There were 1,401 videos in each experiment of 265 subjects. The subjects, the scenes, and the actions carried out by the people are the same in both experiments. An additional experiment required algorithms to recognize people in videos from the Video Database of Moving Faces and People (VDMFP). There were 958 videos in this experiment of 297 subjects. Four groups from around the world participated in the evaluation. The top verification rate for PaSC from this evaluation is 0.98 at a false accept rate of 0.01 - a remarkable advancement in performance from the competition held at FG 2015.

Wang, Z., Du, B., Zhang, L., Zhang, L., Fang, M. & Tao, D. 2016, 'Multi-label active learning based on maximum correntropy criterion: Towards robust and discriminative labeling', Computer Vision – ECCV 2016 (LNCS), European Conference on Computer Vision (ECCV), Springer, Amsterdam, The Netherlands, pp. 453-468.
View description>>

© Springer International Publishing AG 2016. Multi-label learning is a challenging problem in the computer vision field. In this paper, we propose a novel active learning approach to greatly reduce the annotation costs for multi-label classification. State-of-the-art active learning methods either annotate all the relevant samples without diagnosing discriminative information in the labels or annotate only limited discriminative samples manually, which has weak immunity to outlier labels. To overcome these problems, we propose a multi-label active learning method based on the Maximum Correntropy Criterion (MCC) that merges uncertainty and representativeness. We use the labels of labeled data and the predicted labels of unknown data to enhance the uncertainty and representativeness measurement via a merging strategy, and use the MCC to alleviate the influence of outlier labels for discriminative labeling. Experiments on several challenging benchmark multi-label datasets show the superior performance of our proposed method over the state-of-the-art methods.

Xie, L., Tao, D. & Wei, H. 2016, 'Multi-view exclusive unsupervised dimension reduction for video-based facial expression recognition', Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence (IJCAI-16), International Joint Conference on Artificial Intelligence, AAAI, New York, USA, pp. 2217-2223.
View description>>

Video-based facial expression recognition (FER) has recently received increased attention as a result of its widespread application. Many kinds of features have been proposed to represent different properties of facial expressions in videos. However, the dimensionality of these features is usually high. In addition, due to the complexity of the information available in video sequences, using only one type of feature is often inadequate. How to effectively reduce the dimensionality and combine multi-view features thus becomes a challenging problem. In this paper, motivated by the recent success in exclusive feature selection, we first introduce exclusive group LASSO (EG-LASSO) to unsupervised dimension reduction (UDR). This leads to the proposed exclusive UDR (EUDR) framework, which allows arbitrary sparse structures on the feature space. To properly combine multiple kinds of features, we further extend EUDR to multi-view EUDR (MEUDR), where the structured sparsity is enforced at both intra- and inter-view levels. In addition, combination weights are learned for all views to allow them to contribute differently to the final consensus representation. A reliable solution is then obtained. Experiments on two challenging video-based FER datasets demonstrate the effectiveness of the proposed method.

Xiong, W., Du, B., Zhang, L., Hu, R., Bian, W., Shen, J. & Tao, D. 2015, 'R2FP: Rich and robust feature pooling for mining visual data', Proceedings - IEEE International Conference on Data Mining, ICDM, IEEE International Conference on Data Mining, IEEE, Piscataway, USA, pp. 469-478.
View description>>

© 2015 IEEE. The human visual system proves smart in extracting both global and local features. Can we design a similar way for unsupervised feature learning? In this paper, we propose a novel pooling method within an unsupervised feature learning framework, named Rich and Robust Feature Pooling (R2FP), to better explore rich and robust representation from sparse feature maps of the input data. Both local and global pooling strategies are further considered to instantiate such a method and intensively studied. The former selects the most conductive features in the sub-region and summarizes the joint distribution of the selected features, while the latter is utilized to extract multiple resolutions of features and fuse the features with a feature balancing kernel for rich representation. Extensive experiments on several image recognition tasks demonstrate the superiority of the proposed techniques.

Xu, Z., Huang, S., Zhang, Y. & Tao, D. 2015, 'Augmenting strong supervision using web data for fine-grained categorization', Proceedings of the IEEE International Conference on Computer Vision, IEEE International Conference on Computer Vision (ICCV), IEEE, Santiago, Chile, pp. 2524-2532.
View description>>

© 2015 IEEE. We propose a new method for fine-grained object recognition that employs part-level annotations and deep convolutional neural networks (CNNs) in a unified framework. Although both schemes have been widely used to boost recognition performance, due to the difficulty in acquiring detailed part annotations, strongly supervised fine-grained datasets are usually too small to keep pace with the rapid evolution of CNN architectures. In this paper, we solve this problem by exploiting inexhaustible web data. The proposed method improves classification accuracy in two ways: more discriminative CNN feature representations are generated using a training set augmented by collecting a large number of part patches from weakly supervised web images, and more robust object classifiers are learned using a multi-instance learning algorithm jointly on the strong and weak datasets. Despite its simplicity, the proposed method delivers a remarkable performance improvement on the CUB200-2011 dataset compared to baseline part-based R-CNN methods, and achieves the highest accuracy on this dataset even in the absence of test image annotations.

Yu, B., Fang, M., Tao, D. & Yin, J. 2016, 'Submodular asymmetric feature selection in cascade object detection', Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI-16), AAAI Conference on Artificial Intelligence, AAAI, Phoenix, USA, pp. 1387-1393.
View description>>

© Copyright 2016, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved. A cascade classifier has proven effective in sliding-window-based real-time object detection. In a cascade classifier, node learning is the key process, which includes feature selection and classifier design. Previous algorithms fail to effectively tackle the asymmetry and intersection problems existing in cascade classification, thereby limiting the performance of object detection. In this paper, we improve the current feature selection algorithm by addressing both the asymmetry and intersection problems. We formulate asymmetric feature selection as a submodular function maximization problem. We then propose a new algorithm, SAFS, with a formal performance guarantee to solve this problem. We use face detection as a case study and perform experiments on two real-world face detection datasets. The experimental results demonstrate that our algorithm SAFS outperforms state-of-the-art feature selection algorithms in cascade object detection, such as FFS and LACBoost.

Zhang, Q., Zhang, L., Du, B., Zheng, W., Bian, W. & Tao, D. 2015, 'MMFE: Multitask multiview feature embedding', Proceedings - IEEE International Conference on Data Mining, ICDM, IEEE International Conference on Data Mining, pp. 1105-1110.
View description>>

© 2015 IEEE. In the data mining and pattern recognition areas, the learned objects are often represented by multiple features from various views. How can an efficient and effective feature embedding be learned for the subsequent learning tasks? In this paper, we address this issue by providing a novel multi-task multiview feature embedding (MMFE) framework. The MMFE algorithm is based on the idea of low-rank approximation, which suggests that the observed multiview feature matrix is approximately represented by a low-dimensional feature embedding multiplied by a projection matrix. To fully consider the particular role of each view in the multiview feature embedding, we simultaneously incorporate a multitask learning scheme and ensemble manifold regularization into the MMFE algorithm to seek the optimal projection. Since the objective function of MMFE is multi-variable and non-convex, we further provide an iterative optimization procedure to find a feasible solution. Two real-world experiments show that the proposed method outperforms single-task-based as well as state-of-the-art multiview feature embedding methods for the classification problem.

Zhao, N., Zhang, L., Du, B., Zhang, L., Tao, D. & You, J. 2016, 'Sparse tensor discriminative locality alignment for gait recognition', Proceedings of the International Joint Conference on Neural Networks, IEEE International Joint Conference on Neural Networks, IEEE, Vancouver, Canada, pp. 4489-4495.
View description>>

© 2016 IEEE. Gait recognition is a rising biometric technology that aims to distinguish people purely through the analysis of the way they walk. However, the dimensionality of gait data is very high, so dimensionality reduction is necessary. To date, in the area of computer vision and pattern recognition, various dimensionality reduction algorithms have been employed for gait data, including conventional vector-representation-based methods such as principal component analysis (PCA) and locality preserving projection (LPP), and recently proposed multi-linear subspace learning approaches such as multilinear principal component analysis (MPCA). In this paper, inspired by the advantages of tensor representation and manifold learning, we propose a novel sparse tensor discriminative locality alignment algorithm for human gait feature representation and dimensionality reduction, and subsequently apply the refined features to gait recognition with a lazy KNN classifier. The proposed method adopts sparse multi-way projection based on a high-order version of discriminative locality alignment, by which class separability is enhanced and potential model overfitting is simultaneously avoided. Extensive experiments on the University of South Florida (USF) HumanID Gait Database show that the proposed method achieves a better recognition rate than several existing classical dimensionality reduction algorithms.

Chapters

He, X., Luo, S., Tao, D., Xu, C., Yang, J. & Abul Hasan, M. 2015, 'Preface' in MultiMedia Modeling, Springer, Germany, pp. V-VI.

He, X., Xu, C., Tao, D., Luo, S., Yang, J. & Hasan, M.A. 2015, 'Preface' in MultiMedia Modeling (LNCS), Springer, Germany, pp. V-VI.

Journal articles

Gong, C., Liu, T., Tao, D., Fu, K., Tu, E. & Yang, J. 2015, 'Deformed Graph Laplacian for Semisupervised Learning', IEEE Transactions on Neural Networks and Learning Systems, vol. 26, no. 10, pp. 2261-2274.

Gong, C., Tao, D., Fu, K. & Yang, J. 2015, 'Fick's Law Assisted Propagation for Semisupervised Learning', IEEE Transactions on Neural Networks and Learning Systems, vol. 26, no. 9, pp. 2148-2162.

He, X., Luo, S., Tao, D., Xu, C. & Yang, J. 2015, 'The 21st International Conference on MultiMedia Modeling', IEEE Multimedia, vol. 22, no. 2, pp. 86-88.
View description>>

© 2015 IEEE. This report on The 21st International Conference on MultiMedia Modeling provides an overview of the best papers and keynote presentations. It also reviews the special sessions on Personal (Big) Data Modeling for Information Access and Retrieval; Social Geo-Media Analytics and Retrieval; and Image or Video Processing, Semantic Analysis, and Understanding.

Li, J., Lin, X., Rui, X., Rui, Y. & Tao, D. 2015, 'A Distributed Approach Toward Discriminative Distance Metric Learning', IEEE Transactions on Neural Networks and Learning Systems, vol. 26, no. 9, pp. 2111-2122.
View description>>

Distance metric learning (DML) is successful in discovering intrinsic relations in data. However, most algorithms are computationally demanding when the problem size becomes large. In this paper, we propose a discriminative metric learning algorithm, develop a distributed scheme learning metrics on moderate-sized subsets of data, and aggregate the results into a global solution. The technique leverages the power of parallel computation. The algorithm of the aggregated DML (ADML) scales well with the data size and can be controlled by the partition. We theoretically analyze and provide bounds for the error induced by the distributed treatment. We have conducted experimental evaluation of the ADML, both on specially designed tests and on practical image annotation tasks. Those tests have shown that the ADML achieves the state-of-the-art performance at only a fraction of the cost incurred by most existing methods.
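The distribute-then-aggregate idea described above can be sketched in a few lines (an illustrative stand-in, not the actual ADML learner: a regularized inverse covariance plays the role of the per-partition metric, and aggregation is a plain average):

```python
import numpy as np

def local_metric(x_part, reg=1e-3):
    """A simple local Mahalanobis metric: regularized inverse covariance."""
    cov = np.cov(x_part, rowvar=False)
    return np.linalg.inv(cov + reg * np.eye(cov.shape[0]))

def aggregated_metric(x, n_parts=4):
    """Learn a metric on each moderate-sized partition, then average them
    into a single global metric (the aggregation step)."""
    parts = np.array_split(x, n_parts)
    return sum(local_metric(p) for p in parts) / n_parts

rng = np.random.default_rng(1)
x = rng.normal(size=(400, 5))
m = aggregated_metric(x)

def mahalanobis(a, b):
    """Squared distance under the aggregated metric."""
    d = a - b
    return float(d @ m @ d)
```

Because each local metric is computed independently, the partition loop parallelizes trivially, which is the source of the scalability claimed in the abstract.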

Li, X., He, H., Wang, R. & Tao, D. 2015, 'Single Image Superresolution via Directional Group Sparsity and Directional Features', IEEE Transactions on Image Processing, vol. 24, no. 9, pp. 2874-2888.
View description>>

© 2015 IEEE. Single image superresolution (SR) aims to construct a high-resolution version from a single low-resolution (LR) image. The SR reconstruction is challenging because of the missing details in the given LR image. Thus, it is critical to explore and exploit effective prior knowledge for boosting the reconstruction performance. In this paper, we propose a novel SR method by exploiting both the directional group sparsity of the image gradients and the directional features in similarity weight estimation. The proposed SR approach is based on two observations: 1) most of the sharp edges are oriented in a limited number of directions and 2) an image pixel can be estimated by the weighted averaging of its neighbors. In consideration of these observations, we apply the curvelet transform to extract directional features which are then used for region selection and weight estimation. A combined total variation regularizer is presented which assumes that the gradients in natural images have a straightforward group sparsity structure. In addition, a directional nonlocal means regularization term takes pixel values and directional information into account to suppress unwanted artifacts. By assembling the designed regularization terms, we solve the SR problem of an energy function with minimal reconstruction error by applying a framework of templates for first-order conic solvers. The thorough quantitative and qualitative results in terms of peak signal-to-noise ratio, structural similarity, information fidelity criterion, and preference matrix demonstrate that the proposed approach achieves higher quality SR reconstruction than the state-of-the-art algorithms.

Liu, X., Song, M., Tao, D., Bu, J. & Chen, C. 2015, 'Random Geometric Prior Forest for Multiclass Object Segmentation', IEEE Transactions on Image Processing, vol. 24, no. 10, pp. 3060-3070.
View description>>

© 1992-2012 IEEE. Recent advances in object detection have led to the development of segmentation by detection approaches that integrate top-down geometric priors for multiclass object segmentation. A key yet under-addressed issue in utilizing top-down cues for the problem of multiclass object segmentation by detection is efficiently generating robust and accurate geometric priors. In this paper, we propose a random geometric prior forest scheme to obtain object-adaptive geometric priors efficiently and robustly. In the scheme, a testing object first searches for training neighbors with similar geometries using the random geometric prior forest, and then the geometry of the testing object is reconstructed by linearly combining the geometries of its neighbors. Our scheme enjoys several favorable properties when compared with conventional methods. First, it is robust and very fast because its inference does not suffer from bad initializations, poor local minimums or complex optimization. Second, the figure/ground geometries of training samples are utilized in a multitask manner. Third, our scheme is object-adaptive but does not require the labeling of parts or poselets, and thus, it is quite easy to implement. To demonstrate the effectiveness of the proposed scheme, we integrate the obtained top-down geometric priors with conventional bottom-up color cues in the frame of graph cut. The proposed random geometric prior forest achieves the best segmentation results of all of the methods tested on VOC2010/2012 and is 90 times faster than the current state-of-the-art method.

Lu, Y., Xie, F., Liu, T., Jiang, Z. & Tao, D. 2015, 'No Reference Quality Assessment for Multiply-Distorted Images Based on an Improved Bag-of-Words Model', IEEE Signal Processing Letters, vol. 22, no. 10, pp. 1811-1815.
View description>>

© 2015 IEEE. Multiple distortion assessment is a big challenge in image quality assessment (IQA). In this letter, a no reference IQA model for multiply-distorted images is proposed. The features, which are sensitive to each distortion type even in the presence of other distortions, are first selected from three kinds of NSS features. An improved Bag-of-Words (BoW) model is then applied to encode the selected features. Lastly, a simple yet effective linear combination is used to map the image features to the quality score. The combination weights are obtained through lasso regression. A series of experiments show that the feature selection strategy and the improved BoW model are effective in improving the accuracy of quality prediction for multiple distortion IQA. Compared with other algorithms, the proposed method delivers the best result for multiple distortion IQA.
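The final step described above, mapping image features to a quality score with a linear combination whose weights come from lasso regression, can be sketched as follows (a hedged stand-in: the feature matrix and scores are synthetic, and the lasso is solved with a simple ISTA loop rather than whatever solver the authors used):

```python
import numpy as np

def soft(v, t):
    """Soft-thresholding operator (proximal map of the L1 norm)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X, y, lam=1.0, n_iter=2000):
    """Minimize 0.5*||Xw - y||^2 + lam*||w||_1 by iterative shrinkage (ISTA)."""
    L = np.linalg.norm(X, 2) ** 2        # Lipschitz constant of the smooth part
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        w = soft(w - X.T @ (X @ w - y) / L, lam / L)
    return w

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 10))                       # stand-in encoded features
w_true = np.zeros(10)
w_true[:3] = [2.0, -1.5, 1.0]                        # sparse ground-truth weights
scores = X @ w_true + 0.01 * rng.normal(size=100)    # stand-in quality scores
w_hat = lasso_ista(X, scores)
pred = X @ w_hat                                     # predicted quality scores
```

The L1 penalty drives irrelevant combination weights to exactly zero, which is why lasso (rather than ordinary least squares) suits the feature selection strategy the abstract describes.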

Luo, Y., Liu, T., Tao, D. & Xu, C. 2015, 'Multiview matrix completion for multilabel image classification', IEEE Transactions on Image Processing, vol. 24, no. 8, pp. 2355-2368.
View description>>

© 2015 IEEE. There is growing interest in multilabel image classification due to its critical role in web-based image analytics-based applications, such as large-scale image retrieval and browsing. Matrix completion (MC) has recently been introduced as a method for transductive (semisupervised) multilabel classification, and has several distinct advantages, including robustness to missing data and background noise in both feature and label space. However, it is limited by only considering data represented by a single-view feature, which cannot precisely characterize images containing several semantic concepts. To utilize multiple features taken from different views, we have to concatenate the different features as a long vector. However, this concatenation is prone to over-fitting and often leads to very high time complexity in MC-based image classification. Therefore, we propose to weightedly combine the MC outputs of different views, and present the multiview MC (MVMC) framework for transductive multilabel image classification. To learn the view combination weights effectively, we apply a cross-validation strategy on the labeled set. In particular, MVMC splits the labeled set into two parts, and predicts the labels of one part using the known labels of the other part. The predicted labels are then used to learn the view combination coefficients. In the learning process, we adopt the average precision (AP) loss, which is particularly suitable for multilabel image classification, since the ranking-based criteria are critical for evaluating a multilabel classification system. A least squares loss formulation is also presented for the sake of efficiency, and the robustness of the algorithm based on the AP loss compared with the other losses is investigated. Experimental evaluation on two real-world data sets (PASCAL VOC' 07 and MIR Flickr) demonstrate the effectiveness of MVMC for transductive (semisupervised) multilabel image classification, and show that MVMC can ...
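The cross-validation step for learning view-combination weights can be illustrated with a toy sketch (synthetic per-view predictions and the least-squares loss variant stand in here; this is not the MVMC matrix-completion machinery itself):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
y = rng.integers(0, 2, n).astype(float)              # labels of the labeled set
# Stand-in per-view predictions: noisy copies of y with different reliability.
preds = np.stack([y + s * rng.normal(size=n) for s in (0.2, 0.5, 1.0)], axis=1)
half = n // 2
# Fit the view combination weights on one half of the labeled set
# (least-squares loss, for efficiency) ...
w, *_ = np.linalg.lstsq(preds[:half], y[:half], rcond=None)
# ... and apply the weighted combination on the held-out half.
combined = preds[half:] @ w
```

Unreliable views receive small weights, so the weighted combination tracks the labels better than the noisiest single view.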

Luo, Y., Tao, D., Ramamohanarao, K., Xu, C. & Wen, Y. 2015, 'Tensor Canonical Correlation Analysis for Multi-View Dimension Reduction', IEEE Transactions on Knowledge and Data Engineering, vol. 27, no. 11, pp. 3111-3124.
View description>>

© 2015 IEEE. Canonical correlation analysis (CCA) has proven an effective tool for two-view dimension reduction due to its profound theoretical foundation and success in practical applications. In respect of multi-view learning, however, it is limited by its capability of only handling data represented by two-view features, while in many real-world applications, the number of views is frequently many more. Although the ad hoc way of simultaneously exploring all possible pairs of features can numerically deal with multi-view data, it ignores the high order statistics (correlation information) which can only be discovered by simultaneously exploring all features. Therefore, in this work, we develop tensor CCA (TCCA) which straightforwardly yet naturally generalizes CCA to handle the data of an arbitrary number of views by analyzing the covariance tensor of the different views. TCCA aims to directly maximize the canonical correlation of multiple (more than two) views. Crucially, we prove that the main problem of multi-view canonical correlation maximization is equivalent to finding the best rank-1 approximation of the data covariance tensor, which can be solved efficiently using the well-known alternating least squares (ALS) algorithm. As a consequence, the high order correlation information contained in the different views is explored and thus a more reliable common subspace shared by all features can be obtained. In addition, a non-linear extension of TCCA is presented. Experiments on various challenge tasks, including large scale biometric structure prediction, internet advertisement classification, and web image annotation, demonstrate the effectiveness of the proposed method.
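The core computational step stated in this abstract, the best rank-1 approximation of a covariance tensor via alternating least squares (ALS), can be sketched for the three-view case as follows (the tensor here is a synthetic stand-in, not a data covariance tensor):

```python
import numpy as np

def rank1_als(T, n_iter=100, seed=0):
    """Best rank-1 approximation lam * (a outer b outer c) of a
    3-way tensor T via alternating least squares."""
    rng = np.random.default_rng(seed)
    a, b, c = (rng.normal(size=s) for s in T.shape)
    for _ in range(n_iter):
        # Update each factor in turn by contracting T with the other two.
        a = np.einsum('ijk,j,k->i', T, b, c); a /= np.linalg.norm(a)
        b = np.einsum('ijk,i,k->j', T, a, c); b /= np.linalg.norm(b)
        c = np.einsum('ijk,i,j->k', T, a, b); c /= np.linalg.norm(c)
    lam = np.einsum('ijk,i,j,k->', T, a, b, c)   # rank-1 "singular value"
    return lam, a, b, c

# An exactly rank-1 tensor should be recovered (up to sign flips).
u, v, w = np.array([1.0, 0.0]), np.array([0.6, 0.8]), np.array([0.0, 1.0])
T = 3.0 * np.einsum('i,j,k->ijk', u, v, w)
lam, a, b, c = rank1_als(T)
```

In TCCA the factors a, b, c would correspond to the canonical directions of the three views; the sketch only shows the ALS mechanics.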

Mei, X., Hong, Z., Prokhorov, D. & Tao, D. 2015, 'Robust Multitask Multiview Tracking in Videos', IEEE Transactions on Neural Networks and Learning Systems.
View description>>

Various sparse-representation-based methods have been proposed to solve tracking problems, and most of them employ least squares (LSs) criteria to learn the sparse representation. In many tracking scenarios, traditional LS-based methods may not perform well owing to the presence of heavy-tailed noise. In this paper, we present a tracking approach using an approximate least absolute deviation (LAD)-based multitask multiview sparse learning method to enjoy robustness of LAD and take advantage of multiple types of visual features, such as intensity, color, and texture. The proposed method is integrated in a particle filter framework, where learning the sparse representation for each view of the single particle is regarded as an individual task. The underlying relationship between tasks across different views and different particles is jointly exploited in a unified robust multitask formulation based on LAD. In addition, to capture the frequently emerging outlier tasks, we decompose the representation matrix to two collaborative components that enable a more robust and accurate approximation. We show that the proposed formulation can be effectively approximated by Nesterov's smoothing method and efficiently solved using the accelerated proximal gradient method. The presented tracker is implemented using four types of features and is tested on numerous synthetic sequences and real-world video sequences, including the CVPR2013 tracking benchmark and the ALOV++ dataset. Both the qualitative and quantitative results demonstrate the superior performance of the proposed approach compared with several state-of-the-art trackers.

Xu, C., Tao, D. & Xu, C. 2015, 'Multi-View Learning with Incomplete Views', IEEE Transactions on Image Processing, vol. 24, no. 12, pp. 5812-5825.
View description>>

© 2015 IEEE. One underlying assumption of the conventional multi-view learning algorithms is that all examples can be successfully observed on all the views. However, due to various failures or faults in collecting and pre-processing the data on different views, we are more likely to be faced with an incomplete-view setting, where an example could be missing its representation on one view (i.e., missing view) or could be only partially observed on that view (i.e., missing variables). The low-rank assumption is effective for recovering randomly missing variables of features, but it fails when the missing variables are concentrated and has no effect on missing views. This paper suggests that the key to handling the incomplete-view problem is to exploit the connections between multiple views, enabling the incomplete views to be restored with the help of the complete views. We propose an effective algorithm to accomplish multi-view learning with incomplete views by assuming that different views are generated from a shared subspace. To handle the large-scale problem and obtain fast convergence, we investigate a successive over-relaxation method to solve the objective function. Convergence of the optimization technique is theoretically analyzed. The experimental results on toy data and real-world data sets suggest that studying the incomplete-view problem in multi-view learning is significant and that the proposed algorithm can effectively handle the incomplete views in different applications.

Xu, C., Tao, D., Li, Y. & Xu, C. 2015, 'Large-margin multi-view Gaussian process', Multimedia Systems, vol. 21, no. 2, pp. 147-157.
View description>>

In image classification, the goal is to decide whether an image belongs to a certain category or not. Multiple features are usually employed to comprehensively describe the contents of images and improve classification accuracy. However, this brings in new problems: how to effectively combine multiple features together, and how to handle the high-dimensional features from multiple views given a small training set. In this paper, we integrate the large-margin idea into the Gaussian process to discover the latent subspace shared by multiple features. Therefore, our approach inherits all the advantages of the Gaussian process and the large-margin principle. A probabilistic explanation is provided by the Gaussian process to embed multiple features into the shared low-dimensional subspace, which derives a strong discriminative ability from the large-margin principle, and thus the subsequent classification task can be effectively accomplished. Finally, we demonstrate the advantages of the proposed algorithm on real-world image datasets for discovering a discriminative latent subspace and improving classification performance. © 2014 Springer-Verlag Berlin Heidelberg.

Zeng, X., Bian, W., Liu, W., Shen, J. & Tao, D. 2015, 'Dictionary Pair Learning on Grassmann Manifolds for Image Denoising', IEEE Transactions on Image Processing, vol. 24, no. 11, pp. 4556-4569.
View description>>

© 2015 IEEE. Image denoising is a fundamental problem in computer vision and image processing that holds considerable practical importance for real-world applications. The traditional patch-based and sparse coding-driven image denoising methods convert 2D image patches into 1D vectors for further processing. Thus, these methods inevitably break down the inherent 2D geometric structure of natural images. To overcome this limitation pertaining to the previous image denoising methods, we propose a 2D image denoising model, namely, the dictionary pair learning (DPL) model, and we design a corresponding algorithm called the DPL on the Grassmann-manifold (DPLG) algorithm. The DPLG algorithm first learns an initial dictionary pair (i.e., the left and right dictionaries) by employing a subspace partition technique on the Grassmann manifold, wherein the refined dictionary pair is obtained through a sub-dictionary pair merging. The DPLG obtains a sparse representation by encoding each image patch only with the selected sub-dictionary pair. The non-zero elements of the sparse representation are further smoothed by the graph Laplacian operator to remove the noise. Consequently, the DPLG algorithm not only preserves the inherent 2D geometric structure of natural images but also performs manifold smoothing in the 2D sparse coding space. We demonstrate that the DPLG algorithm also improves the structural SIMilarity values of the perceptual visual quality for denoised images using the experimental evaluations on the benchmark images and Berkeley segmentation data sets. Moreover, the DPLG also produces the competitive peak signal-to-noise ratio values from popular image denoising algorithms.

Conferences

Al-Dmour, H., Ali, N. & Al-Ani, A. 2015, 'An Efficient Hybrid Steganography Method Based on Edge Adaptive and Tree Based Parity Check', MultiMedia Modeling (LNCS), 21st International Conference on MultiMedia Modelling, MMM 2015, Springer, Sydney, Australia, pp. 1-12.
View description>>

A major requirement for any steganography method is to minimize the changes that are introduced to the cover image by the data embedding process. Since the Human Visual System (HVS) is less sensitive to changes in sharp regions compared to smooth regions, edge adaptive has been proposed to discover edge regions and enhance the quality of the stego image as well as improve the embedding capacity. However, edge adaptive does not apply any coding scheme, and hence its embedding efficiency may not be optimal. In this paper, we propose a method that enhances edge adaptive by incorporating the Tree-Based Parity Check (TBPC) algorithm, which is a well-established coding-based steganography method. This combination enables not only the identification of potential pixels for embedding, but it also enhances the embedding efficiency through an efficient coding mechanism. More specifically, the method identifies the embedding locations according to the difference value between every two adjacent pixels that form a block in the cover image, and the number of embedding bits in each block is determined based on the difference between its two pixels. The incorporation of TBPC minimizes the modifications of the cover image, as it changes no more than two bits out of seven pixel bits when embedding four secret bits. Experimental results show that the proposed scheme can achieve both large embedding payload and high embedding efficiency.

Beveridge, J.R., Zhang, H., Draper, B.A., Flynn, P.J., Feng, Z., Huber, P., Kittler, J., Huang, Z., Li, S., Li, Y., Kan, M., Wang, R., Shan, S., Chen, X., Li, H., Hua, G., Struc, V., Krizaj, J., Ding, C., Tao, D. & Phillips, P.J. 2015, 'Report on the FG 2015 Video Person Recognition Evaluation', Proceedings of the 2015 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition, FG 2015, IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG), IEEE, Ljubljana, Slovenia, pp. 1-8.
View description>>

© 2015 IEEE. This report presents results from the Video Person Recognition Evaluation held in conjunction with the 11th IEEE International Conference on Automatic Face and Gesture Recognition. Two experiments required algorithms to recognize people in videos from the Point-and-Shoot Face Recognition Challenge Problem (PaSC). The first consisted of videos from a tripod-mounted high quality video camera. The second contained videos acquired from 5 different handheld video cameras. There were 1401 videos of 265 subjects in each experiment. The subjects, the scenes, and the actions carried out by the people are the same in both experiments. Five groups from around the world participated in the evaluation. The video handheld experiment was included in the International Joint Conference on Biometrics (IJCB) 2014 Handheld Video Face and Person Recognition Competition. The top verification rate from this evaluation is double that of the top performer in the IJCB competition. Analysis shows that the factor most affecting algorithm performance is the combination of location and action: where the video was acquired and what the person was doing.

Gong, C., Tao, D., Liu, W., Maybank, S.J., Fang, M., Fu, K. & Yang, J. 2015, 'Saliency Propagation from Simple to Difficult', 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2531-2539.

Gong, M., Zhang, K., Schölkopf, B., Tao, D. & Geiger, P. 2015, 'Discovering temporal causal relations from subsampled data', Proceedings of The 32nd International Conference on Machine Learning, International Conference on Machine Learning, IMLS, Lille Grand Palais, pp. 1898-1906.
View description>>

© Copyright 2015 by International Machine Learning Society (IMLS). All rights reserved. Granger causal analysis has been an important tool for causal analysis of time series in various fields, including neuroscience and economics, and it has recently been extended to include instantaneous effects between time series to explain the contemporaneous dependence in the residuals. In this paper, we assume that the time series at the true causal frequency follow a vector autoregressive model. We show that when the data resolution becomes lower due to subsampling, neither the original Granger causal analysis nor the extended one is able to discover the underlying causal relations. We then aim to answer the following question: can we estimate the temporal causal relations at the right causal frequency from the subsampled data? Traditionally this suffers from identifiability problems: under the Gaussianity assumption on the data, the solutions are generally not unique. We prove, however, that if the noise terms are non-Gaussian, the underlying model for the high-frequency data is identifiable from subsampled data under mild conditions. We then propose an Expectation-Maximization (EM) approach and a variational inference approach to recover temporal causal relations from such subsampled data. Experimental results on both simulated and real data are reported to illustrate the performance of the proposed approaches.
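The failure mode that motivates this paper is easy to reproduce numerically. The toy simulation below (our construction, not the authors' EM or variational method) fits a first-order VAR by least squares: on the full series it recovers the true transition matrix A, but on the 2-subsampled series it instead recovers A @ A, so the true causal pattern at the original frequency is masked.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[0.9, 0.0],
              [0.5, 0.2]])           # true causal matrix at the base frequency
T = 40000
x = np.zeros((T, 2))
for t in range(1, T):
    x[t] = A @ x[t - 1] + 0.1 * rng.standard_normal(2)

def fit_var1(series):
    """Least-squares fit of x_t = B x_{t-1} + noise."""
    X, Y = series[:-1], series[1:]
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return B.T

B_full = fit_var1(x)        # close to A
B_sub = fit_var1(x[::2])    # close to A @ A, not A
```

Taking every second sample turns x_{2t} = A² x_{2t-2} + (A e_{2t-1} + e_{2t}), so ordinary VAR estimation is consistent for A², which is why recovering A requires the extra (non-Gaussian) identifiability machinery the paper develops.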

He, X., Luo, S., Tao, D., Xu, C., Yang, J. & Abul Hasan, M. 2015, 'MultiMedia Modeling: 21st International Conference, MMM 2015 Sydney, NSW, Australia, January 5-7, 2015 Proceedings, Part II', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics).

Hong, Z., Chen, Z., Wang, C., Mei, X., Prokhorov, D. & Tao, D. 2015, 'MUlti-Store Tracker (MUSTer): A cognitive psychology inspired approach to object tracking', Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, IEEE Conference on Computer Vision and Pattern Recognition, IEEE, Boston, Massachusetts, USA, pp. 749-758.
View description>>

© 2015 IEEE. Variations in the appearance of a tracked object, such as changes in geometry/photometry, camera viewpoint, illumination, or partial occlusion, pose a major challenge to object tracking. Here, we adopt cognitive psychology principles to design a flexible representation that can adapt to changes in object appearance during tracking. Inspired by the well-known Atkinson-Shiffrin Memory Model, we propose MUlti-Store Tracker (MUSTer), a dual-component approach consisting of short- and long-term memory stores to process target appearance memories. A powerful and efficient Integrated Correlation Filter (ICF) is employed in the short-term store for short-term tracking. The integrated long-term component, which is based on keypoint matching-tracking and RANSAC estimation, can interact with the long-term memory and provide additional information for output control. MUSTer was extensively evaluated on the CVPR2013 Online Object Tracking Benchmark (OOTB) and ALOV++ datasets. The experimental results demonstrated the superior performance of MUSTer in comparison with other state-of-the-art trackers.

Li, Y., Tian, X., Liu, T. & Tao, D. 2015, 'Multi-Task Model and Feature Joint Learning', Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI 2015), International Joint Conference on Artificial Intelligence, AAAI Press / International Joint Conferences on Artificial Intelligence, Buenos Aires, Argentina, pp. 3643-3649.
View description>>

Given several tasks, multi-task learning (MTL) learns them jointly by exploring the interdependence between them. The basic assumption in MTL is that the tasks are indeed related. Existing MTL methods model task relatedness in one of two ways: common parameter sharing or common feature sharing across tasks. In this paper, we propose a novel multi-task learning method that jointly learns shared parameters and a shared feature representation. Our objective is to learn a set of common features with which the tasks are related as closely as possible, so that the common parameters shared across tasks can be optimally learned. We present a detailed derivation of our multi-task learning method and propose an alternating algorithm to solve the non-convex optimization problem. We further present a theoretical bound which directly demonstrates that the proposed method can successfully model task relatedness via joint common-parameter and common-feature learning. Extensive experiments are conducted on several real-world multi-task learning datasets. All results demonstrate the effectiveness of our joint model and feature learning method.

Liu, H., Liu, T., Tao, D., Wu, J. & Yun, F. 2015, 'Spectral Ensemble Clustering', Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM International Conference on Knowledge Discovery and Data Mining, ACM, Hilton, Sydney, pp. 715-724.
View description>>

Ensemble clustering, also known as consensus clustering, is emerging as a promising solution for multi-source and/or heterogeneous data clustering. The co-association-matrix-based method, which recasts the ensemble clustering problem as a classical graph partition problem, is a landmark method in this area. Nevertheless, its relatively high time and space complexity precludes it from real-life large-scale data clustering. We therefore propose SEC, an efficient Spectral Ensemble Clustering method based on the co-association matrix. We show that SEC is theoretically equivalent to weighted K-means clustering, which vastly reduces the algorithmic complexity. We then derive the latent consensus function of SEC, which to the best of our knowledge is among the first results to bridge co-association-matrix-based methods to methods with explicit objective functions. The robustness and generalizability of SEC are then investigated to prove its superiority in theory. We finally extend SEC to meet the challenge arising from incomplete basic partitions, based on which a scheme for big data clustering can be formed. Experimental results on various real-world data sets demonstrate that SEC is an effective and efficient competitor to state-of-the-art ensemble clustering methods and is also suitable for big data clustering.
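The co-association construction that SEC builds on can be sketched in a few lines. This is the landmark graph-partition approach the abstract describes, not SEC itself (whose point is to avoid materializing the full matrix via weighted K-means); function names and the toy partitions are ours.

```python
import numpy as np

def co_association(partitions):
    """Fraction of base partitions in which each pair of objects
    lands in the same cluster (the co-association matrix)."""
    n = len(partitions[0])
    M = np.zeros((n, n))
    for labels in partitions:
        labels = np.asarray(labels)
        M += (labels[:, None] == labels[None, :]).astype(float)
    return M / len(partitions)

def spectral_cut(M):
    """Two-way consensus partition from the second-smallest
    eigenvector (Fiedler vector) of the graph Laplacian of M."""
    L = np.diag(M.sum(axis=1)) - M
    _, vecs = np.linalg.eigh(L)          # eigenvalues in ascending order
    return (vecs[:, 1] > 0).astype(int)

# Three noisy base partitions of six objects (two true groups of three).
parts = [[0, 0, 0, 1, 1, 1],
         [0, 0, 1, 1, 1, 1],
         [1, 1, 1, 0, 0, 0]]
labels = spectral_cut(co_association(parts))
```

Even though one base partition misplaces object 2 and another permutes the cluster ids, the consensus cut recovers the two underlying groups, because pairwise co-membership is invariant to label permutation.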

Tan, H., Zhang, X., Guan, N., Tao, D., Huang, X. & Luo, Z. 2015, 'Two-dimensional euler PCA for face recognition', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), pp. 548-559.
View description>>

© Springer International Publishing Switzerland 2015. Principal component analysis (PCA) projects data onto the directions of maximal variance. Since PCA is quite effective in dimension reduction, it has been widely used in computer vision. However, conventional PCA suffers from two deficiencies: 1) it incurs high computational costs when handling high-dimensional data, and 2) it cannot reveal the nonlinear relationships among different features of the data. To overcome these deficiencies, this paper proposes an efficient two-dimensional Euler PCA (2D-ePCA) algorithm. In particular, 2D-ePCA learns a projection matrix on the 2D pixel matrix of each image without reshaping it into a long 1D vector, and uncovers nonlinear relationships among features by mapping the data onto a complex representation. Since this 2D complex representation induces a much smaller kernel matrix and principal subspaces, 2D-ePCA incurs much lower computational overhead than Euler PCA on large-scale datasets. Experimental results on popular face datasets show that 2D-ePCA outperforms representative algorithms in terms of accuracy, computational overhead, and robustness.
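The two ingredients, an elementwise complex (Euler) mapping and PCA on the unreshaped 2D pixel matrices, can be sketched as follows. This is our minimal reading of the idea, not the paper's algorithm: the scatter matrix, the column-side projection, and the value alpha = 1.9 (a common choice in the Euler PCA literature) are assumptions.

```python
import numpy as np

def euler_map(X, alpha=1.9):
    """Map pixel intensities in [0, 1] onto the complex unit circle
    (the Euler representation), a robust nonlinear feature map."""
    return np.exp(1j * alpha * np.pi * X) / np.sqrt(2)

def two_d_epca(images, n_components, alpha=1.9):
    """Learn a column-side projection on 2D complex images without
    reshaping them into 1D vectors (a sketch of the 2D-ePCA idea)."""
    Z = [euler_map(X, alpha) for X in images]
    mean = sum(Z) / len(Z)
    # Image-domain scatter matrix built from whole pixel matrices.
    G = sum((Zk - mean).conj().T @ (Zk - mean) for Zk in Z) / len(Z)
    _, vecs = np.linalg.eigh(G)            # G is Hermitian PSD
    U = vecs[:, -n_components:]            # top eigenvectors
    return U, [Zk @ U for Zk in Z]         # projected images

rng = np.random.default_rng(0)
imgs = [rng.random((16, 16)) for _ in range(5)]
U, feats = two_d_epca(imgs, n_components=4)
```

Note the scatter matrix here is only 16 x 16 regardless of how many images there are, which is the computational advantage the abstract claims over vectorized Euler PCA.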

Wang, Z., Du, B., Zhang, L., Hu, W., Tao, D. & Zhang, L. 2015, 'Batch mode active learning for geographical image classification', Web Technologies and Applications (LNCS), 17th Asia-Pacific Web Conference, Springer, Guangzhou, China, pp. 744-755.
View description>>

© Springer International Publishing Switzerland 2015. In this paper, we propose an innovative batch-mode active learning method for hyperspectral image classification with support vector machines, combining discriminative and representative information. Past work on batch-mode active learning has mainly exploited different query functions based on two criteria: an uncertainty criterion and a diversity criterion. Generally, the two criteria are applied independently of each other, and they cannot ensure that the queried samples are independent and identically distributed. The proposed method focuses on the diversity criterion. In our new diversity criterion, we first derive a novel upper bound on the true risk in the active learning setting and measure the discriminative information, which is connected with the uncertainty, by minimizing this bound. Second, for the representative information, the maximum mean discrepancy (MMD), which captures the representative information of the data structure, is adopted to match the distributions of the labeled and query samples, ensuring that the queried samples have a distribution similar to that of the labeled samples while remaining diversified. Meanwhile, the number of newly queried samples is adaptive and depends on the distribution of the labeled samples. In the experiments, we employ two benchmark remote sensing images, Indian Pines and Washington DC. The experimental results demonstrate the effectiveness of our proposed method compared with state-of-the-art active learning methods.

Wu, S., Zhang, X., Guan, N., Tao, D., Huang, X. & Luo, Z. 2015, 'Non-negative low-rank and group-sparse matrix factorization', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), pp. 536-547.
View description>>

© Springer International Publishing Switzerland 2015. Non-negative matrix factorization (NMF) has been a popular data analysis tool and has been widely applied in computer vision. However, conventional NMF methods cannot adaptively learn grouping structure from a dataset. This paper proposes a non-negative low-rank and group-sparse matrix factorization (NLRGS) method to overcome this deficiency. In particular, NLRGS captures the relationships among examples by constraining the rank of the coefficients, and meanwhile identifies the grouping structure via group sparsity regularization. Through both constraints, NLRGS boosts NMF in both classification and clustering. However, NLRGS is difficult to optimize because of the hard low-rank constraint. To relax this constraint, we approximate it with the nuclear norm and develop an optimization algorithm for NLRGS within the framework of the augmented Lagrangian method (ALM). Experimental results on both face recognition and clustering on four popular face datasets quantitatively demonstrate the effectiveness of NLRGS.

Zhang, F., Li, J., Li, F., Xu, M., Xu, Y. & He, X. 2015, 'Community detection based on links and node features in social networks', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 21st International Conference on Multimedia Modelling, MMM 2015, Springer, Sydney, Australia, pp. 418-429.
View description>>

© Springer International Publishing Switzerland 2015. Community detection is a significant but challenging task in the field of social network analysis. Many effective methods have been proposed to solve this problem, but most of them are based on the topological structure or node attributes alone. In this paper, building on SPAEM [1], we propose a joint probabilistic model for community detection that combines node attributes and topological structure. In our model, we create a novel feature-based weighted network, in which each edge weight is given by the node feature similarity between the two nodes at the ends of the edge. We then fuse the original network and the created network with a weighting parameter and employ the expectation-maximization (EM) algorithm to identify communities. Experiments on a diverse set of data collected from Facebook and Twitter demonstrate that our algorithm achieves promising results compared with other algorithms.

Journal articles

Bian, W., Zhou, T., Martinez, A.M., Baciu, G. & Tao, D. 2014, 'Minimizing Nearest Neighbor Classification Error for Nonparametric Dimension Reduction', IEEE Transactions on Neural Networks and Learning Systems, vol. 25, no. 8, pp. 1588-1594.
View description>>

In this brief, we show that minimizing nearest neighbor classification error (MNNE) is a favorable criterion for supervised linear dimension reduction (SLDR). We prove that MNNE is better than maximizing mutual information in the sense of being a proxy of the Bayes optimal criterion. Based on kernel density estimation, we derive a nonparametric algorithm for MNNE. Experiments on benchmark data sets show the superiority of MNNE over existing nonparametric SLDR methods.

Liu, W., Li, Y., Lin, X., Tao, D. & Wang, Y. 2014, 'Hessian-regularized co-training for social activity recognition', PLoS ONE, vol. 9, no. 9.
View description>>

© 2014 Liu et al. Co-training is a major multi-view learning paradigm that alternately trains two classifiers on two distinct views and maximizes the mutual agreement on the two-view unlabeled data. Traditional co-training algorithms usually train a learner on each view separately and then force the learners to be consistent across views. Although many co-training variants have been developed, it is quite possible that a learner will receive erroneous labels for unlabeled data when the other learner has only mediocre accuracy. This usually happens in the first rounds of co-training, when there are only a few labeled examples, and as a result co-training algorithms often have unstable performance. In this paper, Hessian-regularized co-training is proposed to overcome these limitations. Specifically, each Hessian is obtained from a particular view of the examples, and Hessian regularization is then integrated into the training of each view's learner by penalizing the regression function along the potential manifold. The Hessian can properly exploit the local structure of the underlying data manifold, and Hessian regularization significantly boosts the generalizability of a classifier, especially when there are only a small number of labeled examples and a large number of unlabeled examples. To evaluate the proposed method, extensive experiments were conducted on the unstructured social activity attribute (USAA) dataset for social activity recognition. Our results demonstrate that the proposed method outperforms baseline methods, including the traditional co-training and LapCo algorithms.

Liu, W., Tao, D., Cheng, J. & Tang, Y. 2014, 'Multiview Hessian discriminative sparse coding for image annotation', Computer Vision and Image Understanding, vol. 118, pp. 50-60.
View description>>

Sparse coding represents a signal sparsely by using an overcomplete dictionary, and obtains promising performance in practical computer vision applications, especially for signal restoration tasks such as image denoising and image inpainting. In recent years, many discriminative sparse coding algorithms have been developed for classification problems, but they cannot naturally handle visual data represented by multiview features. In addition, existing sparse coding algorithms use graph Laplacian to model the local geometry of the data distribution. It has been identified that Laplacian regularization biases the solution towards a constant function which possibly leads to poor extrapolating power. In this paper, we present multiview Hessian discriminative sparse coding (mHDSC) which seamlessly integrates Hessian regularization with discriminative sparse coding for multiview learning problems. In particular, mHDSC exploits Hessian regularization to steer the solution which varies smoothly along geodesics in the manifold, and treats the label information as an additional view of feature for incorporating the discriminative power for image annotation. We conduct extensive experiments on PASCAL VOC07 dataset and demonstrate the effectiveness of mHDSC for image annotation.

Lou, Y., Liu, T. & Tao, D. 2014, 'Decomposition-Based Transfer Distance Metric Learning for Image Classification', IEEE Transactions On Image Processing, vol. 23, no. 9, pp. 3789-3801.

Ou, W., You, X., Tao, D., Zhang, P., Tang, Y. & Zhu, Z. 2014, 'Robust face recognition via occlusion dictionary learning', Pattern Recognition, vol. 47, no. 4, pp. 1559-1572.
View description>>

Sparse representation based classification (SRC) has recently been proposed for robust face recognition. To deal with occlusion, SRC introduces an identity matrix as an occlusion dictionary on the assumption that the occlusion has a sparse representation in this dictionary. However, the results show that SRC's use of this occlusion dictionary is not nearly as robust to large occlusion as it is to random pixel corruption. In addition, the identity matrix renders the expanded dictionary large, which results in expensive computation. In this paper, we present a novel method, namely structured sparse representation based classification (SSRC), for face recognition with occlusion. A novel structured dictionary learning method is proposed to learn an occlusion dictionary from the data instead of using an identity matrix. Specifically, a mutual incoherence regularization term is incorporated into the dictionary learning objective function, which encourages the occlusion dictionary to be as independent as possible of the training sample dictionary, so that the occlusion can be sparsely represented by a linear combination of atoms from the learned occlusion dictionary and effectively separated from the occluded face image. Classification can thus be efficiently carried out on the recovered non-occluded face images, and the size of the expanded dictionary is much smaller than that used in SRC. Extensive experiments demonstrate that the proposed method achieves better results than existing sparse representation based face recognition methods, especially in dealing with large contiguous occlusion and severe illumination variation, while its computational cost is much lower.

Qiao, M., Cheng, J., Bian, W. & Tao, D. 2014, 'Biview Learning for Human Posture Segmentation from 3D Points Cloud', PLoS One, vol. 9, no. 1, pp. e85811-e85811.
View description>>

Posture segmentation plays an essential role in human motion analysis. The state-of-the-art method extracts sufficiently high-dimensional features from 3D depth images for each 3D point and learns an efficient body part classifier. However, high-dimensional features are memory-consuming and difficult to handle on large-scale training datasets. In this paper, we propose an efficient two-stage dimension reduction scheme, termed biview learning, to encode two independent views, namely depth-difference features (DDF) and relative position features (RPF). Biview learning explores the complementary property of DDF and RPF, and uses two stages to learn a compact yet comprehensive low-dimensional feature space for posture segmentation. In the first stage, discriminative locality alignment (DLA) is applied to the high-dimensional DDF to learn a discriminative low-dimensional representation. In the second stage, canonical correlation analysis (CCA) is used to explore the complementary property of RPF and the dimensionality-reduced DDF. Finally, we train a support vector machine (SVM) over the output of CCA. We carefully validate the effectiveness of DLA and CCA utilized in the two-stage scheme on our 3D human point cloud dataset. Experimental results show that the proposed biview learning scheme significantly outperforms the state-of-the-art method for human posture segmentation.

Wang, N., Tao, D., Gao, X., Li, X. & Li, J. 2014, 'A Comprehensive Survey to Face Hallucination', International journal of computer vision, vol. 106, no. 1, pp. 9-30.
View description>>

This paper comprehensively surveys the development of face hallucination (FH), including both face super-resolution and face sketch-photo synthesis techniques. Indeed, these two techniques share the same objective of inferring a target face image (e.g. high-resolution face image, face sketch and face photo) from a corresponding source input (e.g. low-resolution face image, face photo and face sketch). Considering the critical role of image interpretation in modern intelligent systems for authentication, surveillance, law enforcement, security control, and entertainment, FH has attracted growing attention in recent years. Existing FH methods can be grouped into four categories: Bayesian inference approaches, subspace learning approaches, a combination of Bayesian inference and subspace learning approaches, and sparse representation-based approaches. In spite of achieving a certain level of development, FH is limited in its success by complex application conditions such as variant illuminations, poses, or views. This paper provides a holistic understanding and deep insight into FH, and presents a comparative analysis of representative methods and promising future directions.

Xu, C., Tao, D. & Xu, C. 2014, 'Large-Margin Multi-View Information bottleneck', IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 8, pp. 1559-1572.
View description>>

In this paper, we extend the theory of the information bottleneck (IB) to learning from examples represented by multi-view features. We formulate the problem as one of encoding a communication system with multiple senders, each of which represents one view of the data. Based on the precise components filtered out from multiple information sources through a 'bottleneck', a margin maximization approach is then used to strengthen the discrimination of the encoder by improving the code distance within the framework of coding theory. The resulting algorithm therefore inherits all the merits of the IB principle and coding theory. It has two distinct advantages over existing algorithms, namely, that our method finds a tradeoff between the accuracy and complexity of the multi-view model, and that the encoded multi-view data retains sufficient discrimination for classification. We also derive the robustness and generalization error bound of the proposed algorithm, and reveal the specific properties of multi-view learning. First, the complementarity of multi-view features guarantees the robustness of the algorithm. Second, the consensus of multi-view features reduces the empirical Rademacher complexity of the objective function, enhances the accuracy of the solution, and improves the generalization error bound of the algorithm. The resulting objective function is solved efficiently using the alternating direction method. Experimental results on annotation, classification and recognition tasks demonstrate that the proposed algorithm is promising for practical applications.

You, X., Li, Q., Tao, D., Ou, W. & Gong, M. 2014, 'Local Metric Learning for Exemplar-Based Object Detection', Circuits and Systems for Video Technology, IEEE Transactions on, vol. 24, pp. 1265-1276.

You, X., Wang, R. & Tao, D. 2014, 'Diverse Expected Gradient Active Learning for Relative Attributes', IEEE Transactions On Image Processing, vol. 23, no. 7, pp. 3203-3217.

Conferences

Fang, M., Yin, J. & Tao, D. 2014, 'Active learning for crowdsourcing using knowledge transfer', Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, AAAI Conference on Artificial Intelligence, Quebec City; Canada, pp. 1809-1815.

Guan, N., Lan, L., Tao, D., Luo, Z. & Yang, X. 2014, 'Transductive nonnegative matrix factorization for semi-supervised high-performance speech separation', ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, pp. 2534-2538.
View description>>

Owing to the non-negativity of the magnitude spectrogram of speech signals, nonnegative matrix factorization (NMF) has obtained promising performance for speech separation by independently learning a dictionary on the speech signals of each known speaker. However, traditional NMF fails to represent the mixture signals accurately because the dictionaries for the speakers are learned in the absence of mixture signals. In this paper, we propose a new transductive NMF algorithm (TNMF) that jointly learns a dictionary on both the speech signals of each speaker and the mixture signals to be separated. Since TNMF learns a more descriptive dictionary by encoding the mixture signals than NMF does, it significantly boosts separation performance. Experimental results on the popular TIMIT dataset show that the proposed TNMF-based methods outperform traditional NMF-based methods for separating monophonic mixtures of speech signals from known speakers.
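The building block that TNMF extends is plain NMF with multiplicative update rules. The sketch below shows that primitive on synthetic non-negative data, not the paper's transductive variant or its speech pipeline; the function name and hyperparameters are ours.

```python
import numpy as np

def nmf_mur(V, rank, iters=300, eps=1e-9, seed=0):
    """Multiplicative-update NMF: factor V ~= W @ H with W, H >= 0.
    The updates keep factors non-negative and monotonically reduce
    the Frobenius reconstruction error."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update encodings
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update dictionary
    return W, H

rng = np.random.default_rng(1)
V = rng.random((20, 4)) @ rng.random((4, 30))  # non-negative, rank 4
W, H = nmf_mur(V, rank=4)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

In the separation setting described above, one such dictionary W is learned per speaker (and, in TNMF, jointly on the mixture), and a mixture spectrogram is then encoded against the concatenated dictionaries.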

Hong, Z., Wang, C., Mei, X., Prokhorov, D. & Tao, D. 2014, 'Tracking Using Multilevel Quantizations', Computer Vision – ECCV 2014, European Conference on Computer Vision, Springer, Switzerland, pp. 155-171.
View description>>

Most object tracking methods only exploit a single quantization of an image space: pixels, superpixels, or bounding boxes, each of which has advantages and disadvantages. It is highly unlikely that a common optimal quantization level, suitable for tracking all objects in all environments, exists. We therefore propose a hierarchical appearance representation model for tracking, based on a graphical model that exploits shared information across multiple quantization levels. The tracker aims to find the most probable position of the target by jointly classifying the pixels and superpixels and obtaining the best configuration across all levels. The motion of the bounding box is taken into consideration, while Online Random Forests are used to provide pixel- and superpixel-level quantizations and are progressively updated on-the-fly. By appropriately considering the multilevel quantizations, our tracker exhibits not only excellent performance in handling non-rigid object deformation but also robustness to occlusions. A quantitative evaluation is conducted on two benchmark datasets: a non-rigid object tracking dataset (11 sequences) and the CVPR2013 tracking benchmark (50 sequences). Experimental results show that our tracker overcomes various tracking challenges and is superior to a number of other popular tracking methods.

Lan, L., Guan, N., Zhang, X., Tao, D. & Luo, Z. 2014, 'Soft-constrained nonnegative matrix factorization via normalization', Proceedings of the International Joint Conference on Neural Networks, IEEE International Joint Conference on Neural Networks, pp. 3025-3030.
View description>>

© 2014 IEEE. Semi-supervised clustering aims at boosting the clustering performance on unlabeled samples by using labels from a few labeled samples. Constrained NMF (CNMF) is one of the most significant semi-supervised clustering methods, and it factorizes the whole dataset by NMF and constrains those labeled samples from the same class to have identical encodings. In this paper, we propose a novel soft-constrained NMF (SCNMF) method by softening the hard constraint in CNMF. In particular, SCNMF factorizes the whole dataset into two lower-dimensional factor matrices by using the multiplicative update rule (MUR). To utilize the labels of labeled samples, SCNMF iteratively normalizes both factor matrices after updating them with MURs, making the encodings of labeled samples close to their label vectors. It is therefore reasonable to believe that the encodings of unlabeled samples are also close to their corresponding label vectors. This strategy significantly boosts the clustering performance even when the labeled samples are rather limited, e.g., when each class owns only a single labeled sample. Since the normalization procedure never increases the computational complexity of MUR, SCNMF is quite efficient and effective in practice. Experimental results on face image datasets illustrate both the efficiency and effectiveness of SCNMF compared with both NMF and CNMF.

Shao, M., Li, S., Liu, T., Tao, D., Huang, T.S. & Fu, Y. 2014, 'Learning relative features through adaptive pooling for image classification', Proceedings - IEEE International Conference on Multimedia and Expo, IEEE International Conference on Multimedia and Expo Workshop, IEEE, Chengdu, China.
View description>>

© 2014 IEEE. Bag-of-Feature (BoF) representations and spatial constraints have been popular in image classification research. One of the most successful methods uses sparse coding and spatial pooling to build discriminative features. However, minimizing the reconstruction error by sparse coding only considers the similarity between the input and the codebooks. In contrast, this paper describes a novel feature learning approach for image classification that considers the dissimilarity between inputs and prototype images, or what we call the reference basis (RB). First, we learn the feature representation by a max-margin criterion between the input and the RB. The learned hyperplane is stored as the relative feature. Second, we propose an adaptive pooling technique to assemble multiple relative features generated by different RBs under the SVM framework, where the classifier and the pooling weights are jointly learned. Experiments based on three challenging datasets: Caltech-101, Scene 15 and Willow-Actions, demonstrate the effectiveness and generality of our framework.

Xu, C., Tao, D., Xu, C. & Rui, Y. 2014, 'Large-margin weakly supervised dimensionality reduction', Proceedings of the 31st International Conference on Machine Learning (ICML-14), International Conference on Machine Learning, Beijing; China.

Xu, Z., Tao, D., Zhang, Y., Wu, J. & Tsoi, A.C. 2014, 'Architectural Style Classification Using Multinomial Latent Logistic Regression', 13th European Conference on Computer Vision, Proceedings, Part I, 13th European Conference on Computer Vision, Springer, Zurich, Switzerland, pp. 600-615.

Architectural style classification differs from standard classification tasks due to the rich inter-class relationships between different styles, such as re-interpretation, revival, and territoriality. In this paper, we adopt Deformable Part-based Models (DPM) to capture the morphological characteristics of basic architectural components and propose Multinomial Latent Logistic Regression (MLLR), which introduces probabilistic analysis and tackles the multi-class problem in latent variable models. Due to the lack of publicly available datasets, we release a new large-scale architectural style dataset containing twenty-five classes. Experiments on this dataset show that MLLR, in combination with standard global image features, obtains the best classification results. We also present interpretable probabilistic explanations for the results, such as the styles of individual buildings and a style relationship network, to illustrate inter-class relationships.

Zhang, X., Guan, N., Lan, L., Tao, D. & Luo, Z. 2014, 'Box-constrained projective nonnegative matrix factorization via augmented Lagrangian method', Proceedings of the International Joint Conference on Neural Networks, IEEE International Joint Conference on Neural Networks, pp. 1900-1906.

© 2014 IEEE. Projective non-negative matrix factorization (PNMF) projects a set of examples onto a subspace spanned by a non-negative basis whose transpose is regarded as the projection matrix. Since PNMF learns a natural parts-based representation, it has been successfully used in text mining and pattern recognition. However, it is non-trivial to analyze the convergence of the optimization algorithms for PNMF because its objective function is non-convex. In this paper, we propose a Box-constrained PNMF (BPNMF) method to overcome this deficiency of PNMF. In particular, BPNMF introduces an auxiliary variable, i.e., the coefficients of the examples, and incorporates two types of constraints: 1) each entry of the basis is non-negative and upper-bounded, i.e., box-constrained, and 2) the coefficients equal the projected points of the examples. The box constraint keeps the basis bounded, and the equality constraint preserves the equivalence to PNMF. Like PNMF, BPNMF is difficult to optimize because its objective function is non-convex. To solve BPNMF, we developed an efficient algorithm within the framework of the augmented Lagrangian multiplier (ALM) method and proved that the ALM-based algorithm converges to local minima. Experimental results on two face image datasets demonstrate the effectiveness of BPNMF compared with representative methods.

Journal articles

Cheng, J., Bian, W. & Tao, D. 2013, 'Locally regularized sliced inverse regression based 3D hand gesture recognition on a dance robot', Information Sciences, vol. 221, pp. 274-283.

Gesture recognition plays an important role in human machine interactions (HMIs) for multimedia entertainment. In this paper, we present a dimension-reduction-based approach for dynamic real-time hand gesture recognition. The hand gestures are recorded as acceleration signals by a handheld device with a built-in 3-axis accelerometer, and are represented by discrete cosine transform (DCT) coefficients. To recognize different hand gestures, we develop a new dimension reduction method, locally regularized sliced inverse regression (LR-SIR), to find an effective low-dimensional subspace in which different hand gestures are well separable; recognition can then be performed using simple and efficient classifiers, e.g., the nearest-mean rule, the k-nearest-neighbor rule, and the support vector machine. LR-SIR is built upon the well-known sliced inverse regression (SIR), but overcomes SIR's limitation of ignoring the local geometry of the data distribution. Moreover, LR-SIR can be effectively and efficiently solved by eigen-decomposition. Finally, we apply LR-SIR-based gesture recognition to control our recently developed dance robot for multimedia entertainment. Thorough empirical studies on digit-gesture recognition suggest the effectiveness of the new gesture recognition scheme for HMI.
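For illustration, representing a 1-D acceleration signal by its low-order DCT coefficients, as the abstract describes, can be sketched as a direct DCT-II evaluation. The function and parameter names here are assumptions, not the paper's code.

```python
import numpy as np

def dct2_coeffs(signal, n_coeffs=16):
    """Return the first n_coeffs DCT-II coefficients of a 1-D signal,
    a fixed-length low-frequency descriptor for a gesture recording."""
    N = len(signal)
    n = np.arange(N)                      # sample indices
    k = np.arange(n_coeffs)[:, None]      # kept frequency indices
    basis = np.cos(np.pi * (n + 0.5) * k / N)  # DCT-II cosine basis
    return basis @ signal
```

Signals of different lengths are thereby mapped to vectors of the same dimension, which is what makes a fixed subspace method such as LR-SIR applicable downstream.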

Du, B., Zhang, L., Tao, D. & Zhang, D. 2013, 'Unsupervised transfer learning for target detection from hyperspectral images', Neurocomputing, vol. 120, no. 1, pp. 72-82.

Target detection has been of great interest in hyperspectral image analysis. Feature extraction from target samples and counterpart backgrounds constitutes the key to the problem. Traditional target detection methods depend on comparatively fixed feature for

Li, J., Bian, W., Tao, D. & Zhang, C. 2013, 'Learning Colours From Textures By Sparse Manifold Embedding', Signal Processing, vol. 93, no. 6, pp. 1485-1495.

The capability of inferring colours from the texture (grayscale contents) of an image is useful in many application areas, when the imaging device/environment is limited. Traditional manual or limited automatic colour assignment involves intensive human

Liu, T., Sachdev, P., Lipnicki, D., Jiang, J., Geng, G., Zhu, W., Reppermund, S., Tao, D., Trollor, J., Brodaty, H. & Wen, W. 2013, 'Limited relationships between two-year changes in sulcal morphology and other common neuroimaging indices in the elderly', Neuroimage, vol. 83, no. 1, pp. 12-17.

Measuring the geometry or morphology of sulcal folds has recently become an important approach to investigating neuroanatomy. However, relationships between cortical sulci and other brain structures are poorly understood. The present study investigates h

Luo, Y., Tao, D., Geng, B., Xu, C. & Maybank, S. 2013, 'Manifold Regularized Multi-task Learning for Semi-supervised Multi-label Image Classification', IEEE Transactions On Image Processing, vol. 22, no. 2, pp. 523-532.

It is a significant challenge to classify images with multiple labels using only a small number of labeled samples. One option is to learn a binary classifier for each label and use manifold regularization to improve classification performance by exploring the underlying geometric structure of the data distribution. However, such an approach does not perform well in practice when images from multiple concepts are represented by high-dimensional visual features; manifold regularization alone is then insufficient to control the model complexity. In this paper, we propose a manifold regularized multitask learning (MRMTL) algorithm. MRMTL learns a discriminative subspace shared by multiple classification tasks by exploiting the common structure of these tasks. It effectively controls the model complexity because different tasks limit one another's search volume, and the manifold regularization ensures that the functions in the shared hypothesis space are smooth along the data manifold. We conduct extensive experiments on the PASCAL VOC'07 dataset (20 classes) and the MIR dataset (38 classes), comparing MRMTL with popular image classification algorithms. The results suggest that MRMTL is effective for image classification.
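A minimal sketch of the manifold-regularization penalty mentioned above, using a Gaussian-affinity graph Laplacian: the penalty is small when the classifier output varies smoothly over nearby points. This illustrates the general idea only; the function and parameter names are assumptions, not the paper's formulation.

```python
import numpy as np

def laplacian_penalty(X, f, sigma=1.0):
    """Compute f^T L f, where L is the unnormalized graph Laplacian of a
    Gaussian-affinity graph over the rows of X and f gives per-point
    classifier outputs. Smooth f along the data manifold => small penalty."""
    # Pairwise squared Euclidean distances between data points.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))   # Gaussian affinities
    np.fill_diagonal(W, 0.0)             # no self-edges
    L = np.diag(W.sum(1)) - W            # L = D - W
    return f @ L @ f
```

In a manifold-regularized objective this term is added, with a trade-off weight, to the usual empirical loss so that predictions on unlabeled neighbors are pulled toward those on labeled ones.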

Luo, Y., Tao, D., Xu, C., Xu, C., Liu, H. & Wen, Y. 2013, 'Multiview Vector-valued Manifold Regularization For Multilabel Image Classification', IEEE Transactions on Neural Networks and Learning Systems, vol. 24, no. 5, pp. 709-722.

In computer vision, image datasets used for classification are naturally associated with multiple labels and composed of multiple views, because each image may contain several objects (e.g., pedestrian, bicycle, and tree) and is properly characterized by multiple visual features (e.g., color, texture, and shape). Currently available tools ignore either the label relationship or the view complementarity. Motivated by the success of the vector-valued function that constructs matrix-valued kernels to explore the multilabel structure in the output space, we introduce multiview vector-valued manifold regularization (MV3MR) to integrate multiple features. MV3MR exploits the complementary properties of different features and discovers the intrinsic local geometry of the compact support shared by different features under the theme of manifold regularization. We conduct extensive experiments on two challenging but popular datasets, PASCAL VOC'07 and MIR Flickr, and validate the effectiveness of the proposed MV3MR for image classification.

Wang, N., Li, J., Tao, D., Li, X. & Gao, X. 2013, 'Heterogeneous Image Transformation', Pattern Recognition Letters, vol. 34, no. 1, pp. 77-84.

Heterogeneous image transformation (HIT) plays an important role in both law enforcement and digital entertainment. Some popular transformation methods, such as those based on locally linear embedding, usually generate images with low definition and blurred details, mainly due to two defects: (1) these approaches use a fixed number of nearest neighbors (NN) to model the transformation process, i.e., they are K-NN-based methods; and (2) with overlapping areas averaged, the transformed image is approximately equivalent to one passed through a low-pass filter, which removes the high-frequency detail information. These drawbacks reduce the visual quality and the recognition rate across heterogeneous images. To overcome these two disadvantages, a two-step framework is constructed based on sparse feature selection (SFS) and support vector regression (SVR). In the proposed model, SFS selects nearest neighbors adaptively based on sparse representation to implement an initial transformation, and the SVR model is subsequently applied to estimate the lost high-frequency detail information. Finally, by linearly superimposing these two parts, the ultimate transformed image is obtained. Extensive experiments on both a sketch-photo database and a near-infrared-visible image database illustrate the effectiveness of the proposed heterogeneous image transformation method.

Wang, X., Bian, W. & Tao, D. 2013, 'Grassmannian regularized structured multi-view embedding for image classification', IEEE Transactions On Image Processing, vol. 22, no. 7, pp. 2646-2660.

Images are usually represented by features from multiple views, e.g., color and texture. In image classification, the goal is to fuse all the multi-view features in a reasonable manner and achieve satisfactory classification performance. However, the fea

Conferences

Du, B., Wang, N., Zhang, L. & Tao, D. 2013, 'Hyperspectral medical images unmixing for cancer screening based on rotational independent component analysis', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), International Conference on Intelligent Science and Intelligent Data Engineering (IScIDE), Springer, Beijing, China, pp. 336-343.

Hyperspectral images have shown promising performance in many applications, especially in extracting information from remotely sensed images. One obvious advantage is their ability to reflect physical meaning from the point of view of spectra, since even two very similar materials present an obvious difference to a hyperspectral imaging system. Recent work has made great progress on hyperspectral fluorescence imaging techniques, which makes elaborate spectral observation of cancer areas possible. Cancer cells become distinguishable from normal ones when the living body is injected with fluorescence, which helps organs inside the living body emit light; the signals can then be obtained by a passive imaging sensor. This paper discusses the ability to screen cancers by means of hyperspectral bioluminescence images. A rotational independent component analysis method is proposed to solve the problem. Experiments evaluate the superior performance of the proposed ICA-based method over other blind source separation methods: 1) the ICA-based methods perform well in detecting cancer areas inside the living body; and 2) the proposed method localizes cancer areas more accurately than other state-of-the-art algorithms. © 2013 Springer-Verlag Berlin Heidelberg.

Gunther, M., Costa-Pazo, A., Ding, C., Boutellaa, E., Chiachia, G., Zhang, H., De Assis Angeloni, M., Struc, V., Khoury, E., Vazquez-Fernandez, E., Tao, D., Bengherabi, M., Cox, D., Kiranyaz, S., De Freitas Pereira, T., Zganec-Gros, J., Argones-Rua, E., Pinto, N., Gabbouj, M., Simoes, F., Dobrisek, S., Gonzalez-Jimenez, D., Rocha, A., Neto, M.U., Pavesic, N., Falcao, A., Violato, R. & Marcel, S. 2013, 'The 2013 face recognition evaluation in mobile environment', Proceedings - 2013 International Conference on Biometrics, ICB 2013, IAPR International Conference on Biometrics (ICB), IEEE, Madrid, Spain.

Automatic face recognition in unconstrained environments is a challenging task. To test current trends in face recognition algorithms, we organized an evaluation on face recognition in mobile environment. This paper presents the results of 8 different participants using two verification metrics. Most submitted algorithms rely on one or more of three types of features: local binary patterns, Gabor wavelet responses including Gabor phases, and color information. The best results are obtained from UNILJ-ALP, which fused several image representations and feature types, and UC-HU, which learns optimal features with a convolutional neural network. Additionally, we assess the usability of the algorithms in mobile devices with limited resources. © 2013 IEEE.

Hong, Z., Mei, X., Prokhorov, D. & Tao, D. 2013, 'Tracking via Robust Multi-task Multi-view Joint Sparse Representation', Proceedings of IEEE International Conference on Computer Vision, IEEE International Conference on Computer Vision, IEEE, Sydney, pp. 649-656.

Combining multiple observation views has proven beneficial for tracking. In this paper, we cast tracking as a novel multi-task multi-view sparse learning problem and exploit the cues from multiple views including various types of visual features, such as intensity, color, and edge, where each feature observation can be sparsely represented by a linear combination of atoms from an adaptive feature dictionary. The proposed method is integrated in a particle filter framework where every view in each particle is regarded as an individual task. We jointly consider the underlying relationship between tasks across different views and different particles, and tackle it in a unified robust multi-task formulation. In addition, to capture the frequently emerging outlier tasks, we decompose the representation matrix to two collaborative components which enable a more robust and accurate approximation. We show that the proposed formulation can be efficiently solved using the Accelerated Proximal Gradient method with a small number of closed-form updates. The presented tracker is implemented using four types of features and is tested on numerous benchmark video sequences. Both the qualitative and quantitative results demonstrate the superior performance of the proposed approach compared to several state-of-the-art trackers.

Wu, F., Tan, X., Yang, Y., Tao, D., Tang, S. & Zhuang, Y. 2013, 'Supervised Nonnegative Tensor Factorization with Maximum-Margin Constraint', Twenty-Seventh AAAI Conference on Artificial Intelligence, AAAI Conference on Artificial Intelligence, AAAI Press, Bellevue, Washington, USA, pp. 962-968.

Non-negative tensor factorization (NTF) has attracted great attention in the machine learning community. In this paper, we extend traditional non-negative tensor factorization into a supervised discriminative decomposition, referred to as Supervised Non-negative Tensor Factorization with Maximum-Margin Constraint (SNTFM2). SNTFM2 formulates the optimal discriminative factorization of non-negative tensorial data as a coupled least-squares optimization problem via a maximum-margin method. As a result, SNTFM2 not only faithfully approximates the tensorial data by additive combinations of the basis, but also obtains strong generalization power for discriminative analysis (in particular, for classification in this paper). The experimental results show the superiority of our proposed model over state-of-the-art techniques on both toy and real-world datasets.

Xu, C., Tao, D., Li, Y. & Xu, C. 2013, 'Large-margin multi-view Gaussian process for image classification', Proceedings of the Fifth International Conference on Internet Multimedia Computing and Service, International Conference on Internet Multimedia Computing and Service, ACM, Huangshan, China, pp. 7-12.

In image classification, the goal is to decide whether an image belongs to a certain category or not. Multiple features are usually employed to comprehend the contents of images more fully and improve classification accuracy. However, this also raises new problems: how to effectively combine multiple features, and how to handle the high-dimensional features from multiple views given a small training set. In this paper, we present a large-margin Gaussian process approach to discover the latent space shared by multiple features. Multiple features can thus complement each other in this low-dimensional latent space, which derives strong discriminative ability from the large-margin principle, and the subsequent classification task can then be effectively accomplished. The resulting objective function can be efficiently solved using gradient descent techniques. Finally, we demonstrate the advantages of the proposed algorithm on real-world image datasets for discovering a discriminative latent space and improving classification performance.

Zhang, K., Gao, X., Tao, D. & Li, X. 2013, 'Image super-resolution via non-local steering kernel regression regularization', IEEE International Conference on Image Processing, ICIP 2013, IEEE International Conference on Image Processing, IEEE, Melbourne, Australia, pp. 943-946.

In this paper, we employ the non-local steering kernel regression to construct an effective regularization term for the single image super-resolution problem. The proposed method seamlessly integrates the properties of local structural regularity and non-local self-similarity existing in natural images, and solves a least squares minimization problem for obtaining the desired high-resolution image. Extensive experimental results on both simulated and real low-resolution images demonstrate that the proposed method can restore compelling results with sharp edges and fine textures.

Zhou, T., Bian, W. & Tao, D. 2013, 'Divide-and-Conquer Anchoring for Near-Separable Nonnegative Matrix Factorization and Completion in High Dimensions', IEEE 13th International Conference on Data Mining, IEEE International Conference on Data Mining, IEEE, Dallas, TX, USA, pp. 917-926.

Nonnegative matrix factorization (NMF) becomes tractable in polynomial time with a unique solution under the separability assumption, which postulates that all the data points are contained in the conical hull of a few anchor data points. Recently developed linear programming and greedy pursuit methods can pick out the anchors from noisy data and result in a near-separable NMF, but their efficiency can be seriously weakened in high dimensions. In this paper, we show that the anchors can be precisely located from the low-dimensional geometry of the data points even when their high-dimensional features suffer from serious incompleteness. Our framework, entitled divide-and-conquer anchoring (DCA), divides the high-dimensional anchoring problem into a few cheaper sub-problems seeking anchors of data projections in low-dimensional random spaces, which can be solved in parallel by any near-separable NMF, and combines all the detected low-dimensional anchors via a fast hypothesis test to identify the original anchors. We further develop two non-iterative anchoring algorithms in 1D and 2D spaces for data in a convex hull and a conical hull, respectively. These two rapid algorithms in ultra-low dimensions suffice to generate a robust and efficient near-separable NMF for high-dimensional or incomplete data via DCA. Compared to existing methods, two vital advantages of DCA are its scalability to big data and its capability of handling incomplete, high-dimensional, noisy data. A rigorous analysis proves that DCA is able to find the correct anchors of a rank-k matrix by solving O(k log k) sub-problems. Finally, we show DCA outperforms state-of-the-art methods on various datasets and tasks.
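The core DCA observation, that an anchor must remain extreme in some random low-dimensional projection, can be illustrated with a toy vote over 1-D projections of points in a convex hull. This sketch is not the paper's algorithm; the function name and parameters are assumptions.

```python
import numpy as np

def candidate_anchors_1d(X, n_proj=50, seed=0):
    """Vote for anchor candidates: project the rows of X onto random 1-D
    directions and tally which points land at the two extremes. Interior
    points of the convex hull are (almost surely) never extreme."""
    rng = np.random.default_rng(seed)
    votes = np.zeros(X.shape[0], dtype=int)
    for _ in range(n_proj):
        p = X @ rng.normal(size=X.shape[1])  # random 1-D projection
        votes[p.argmin()] += 1               # extreme at the low end
        votes[p.argmax()] += 1               # extreme at the high end
    return votes
```

Points with zero votes can be discarded as non-anchors, while the true hull vertices accumulate votes across projections; DCA combines such low-dimensional detections with a hypothesis test to recover the original anchors.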

Journal articles

Bian, W., Tao, D. & Rui, Y. 2012, 'Cross-Domain Human Action Recognition', IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 42, no. 2, pp. 298-307.

Conventional human action recognition algorithms cannot work well when the amount of training videos is insufficient. We solve this problem by proposing a transfer topic model (TTM), which utilizes information extracted from videos in the auxiliary domai

Cheng, J., Xie, C., Bian, W. & Tao, D. 2012, 'Feature fusion for 3D hand gesture recognition by learning a shared hidden space', Pattern Recognition Letters, vol. 33, no. 4, pp. 476-484.

Hand gesture recognition has been intensively applied in various human-computer interaction (HCI) systems. Different hand gesture recognition methods were developed based on particular features, e.g., gesture trajectories and acceleration signals. However, it has been noticed that the limitations of either feature can lead to flaws in an HCI system. In this paper, to overcome these limitations while combining the merits of both features, we propose a novel feature fusion approach for 3D hand gesture recognition. In our approach, gesture trajectories are represented by their numbers of intersections with randomly generated line segments on their 2D principal planes, and acceleration signals are represented by the coefficients of the discrete cosine transform (DCT). Then, a hidden space shared by the two features is learned using penalized maximum likelihood estimation (MLE). An iterative algorithm, composed of two steps per iteration, is derived for this penalized MLE, in which the first step solves a standard least squares problem and the second step solves a Sylvester equation. We tested our hand gesture recognition approach on different hand gesture sets. The results confirm the effectiveness of the feature fusion method.

Fei, G., Tao, D., Li, X., Gao, X. & He, L. 2012, 'Local Structure Divergence Index for Image Quality Assessment', Lecture Notes in Computer Science, vol. 7667, pp. 337-344.

Image quality assessment (IQA) algorithms are important for image-processing systems, and structure information plays a significant role in the development of IQA metrics. In contrast to existing structure-driven IQA algorithms that measure structure information using the normalized image or gradient amplitudes, we present a new Local Structure Divergence (LSD) index based on the local structures contained in an image. In particular, we exploit steering kernels to describe local structures. Afterward, we estimate the quality of a given image by calculating the symmetric Kullback-Leibler divergence (SKLD) between kernels of the reference image and the distorted image. Experimental results on the LIVE database II show that LSD performs consistently with human perception with high confidence, and outperforms representative structure-driven IQA metrics across various distortions.
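The symmetric Kullback-Leibler divergence (SKLD) used above to compare kernels can be sketched for discrete distributions as follows. This is the generic SKLD, not the paper's exact kernel comparison; the function name and the smoothing constant are assumptions.

```python
import numpy as np

def skld(p, q, eps=1e-12):
    """Symmetric KL divergence KL(p||q) + KL(q||p) between two
    non-negative vectors, normalized to discrete distributions.
    eps guards against log(0)."""
    p = np.asarray(p, float); q = np.asarray(q, float)
    p = p / p.sum()
    q = q / q.sum()
    kl_pq = np.sum(p * np.log((p + eps) / (q + eps)))
    kl_qp = np.sum(q * np.log((q + eps) / (p + eps)))
    return kl_pq + kl_qp
```

Symmetrizing makes the comparison order-independent, so it does not matter which of the two kernels comes from the reference image and which from the distorted one.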

Gao, X., Wang, N., Tao, D. & Li, X. 2012, 'Face Sketch-photo Synthesis And Retrieval Using Sparse Representation', IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 8, pp. 1213-1226.

Sketch-photo synthesis plays an important role in sketch-based face photo retrieval and photo-based face sketch retrieval systems. In this paper, we propose an automatic sketch-photo synthesis and retrieval algorithm based on sparse representation. The p

Gao, Y., Wang, M., Tao, D., Ji, R. & Dai, Q. 2012, '3D Object Retrieval and Recognition With Hypergraph Analysis', IEEE Transactions On Image Processing, vol. 21, no. 9, pp. 4290-4303.

View-based 3-D object retrieval and recognition has become popular in practice, e.g., in computer aided design. It is difficult to precisely estimate the distance between two objects represented by multiple views. Thus, current view-based 3-D object retrieval and recognition methods may not perform well. In this paper, we propose a hypergraph analysis approach to address this problem by avoiding the estimation of the distance between objects. In particular, we construct multiple hypergraphs for a set of 3-D objects based on their 2-D views. In these hypergraphs, each vertex is an object, and each edge is a cluster of views. Therefore, an edge connects multiple vertices. We define the weight of each edge based on the similarities between any two views within the cluster. Retrieval and recognition are performed based on the hypergraphs. Therefore, our method can explore the higher order relationship among objects and does not use the distance between objects. We conduct experiments on the National Taiwan University 3-D model dataset and the ETH 3-D object collection. Experimental results demonstrate the effectiveness of the proposed method by comparing with the state-of-the-art methods.

Li, J., Tao, D. & Li, X. 2012, 'A probabilistic model for image representation via multiple patterns', Pattern Recognition, vol. 45, no. 11, pp. 4044-4053.

For image analysis, an important extension to principal component analysis (PCA) is to treat an image as multiple samples, which helps alleviate the small sample size problem. Various schemes for transforming an image into multiple samples have been proposed. Although shown to be effective in practice, these schemes are mainly based on heuristics and experience. In this paper, we propose a probabilistic PCA model in which we explicitly represent the transformation scheme and incorporate it as a stochastic component of the model, so that fitting the model automatically learns the transformation. Moreover, the learned model allows us to distinguish regions that can be well described by the PCA model from those that need further treatment. Experiments on synthetic images and face datasets demonstrate the properties and utility of the proposed model.

Li, Y., Geng, B., Yang, L., Xu, C. & Bian, W. 2012, 'Query Difficulty Estimation For Image Retrieval', Neurocomputing, vol. 95, pp. 48-53.

Query difficulty estimation predicts the performance of the search result of the given query. It is a powerful tool for multimedia retrieval and receives increasing attention. It can guide the pseudo relevance feedback to rerank the image search results

Yu, J.X., Bian, W., Song, M., Cheng, J.L. & Tao, D. 2012, 'Graph Based Transductive Learning For Cartoon Correspondence Construction', Neurocomputing, vol. 79, pp. 105-114.

Correspondence construction of characters in key frames is the prerequisite for cartoon animations' automatic inbetweening and coloring. Since each frame of an animation consists of multiple layers, characters are complicated in terms of shape and struct

Zhang, C., Bian, W., Tao, D. & Weisi, L. 2012, 'Discretized-Vapnik-Chervonenkis Dimension For Analyzing Complexity Of Real Function Classes', IEEE Transactions On Neural Networks And Learning Systems, vol. 23, no. 9, pp. 1461-1472.

In this paper, we introduce the discretized-Vapnik-Chervonenkis (VC) dimension for studying the complexity of a real function class, and then analyze properties of real function classes and neural networks. We first prove that a countable traversal set i

Zhang, K., Gao, X., Tao, D. & Li, X. 2012, 'Single image super-resolution with non-local means and steering kernel regression', IEEE Transactions On Image Processing, vol. 21, no. 11, pp. 4544-4556.

Image super-resolution (SR) reconstruction is essentially an ill-posed problem, so it is important to design an effective prior. For this purpose, we propose a novel image SR method by learning both non-local and local regularization priors from a given low-resolution image. The non-local prior takes advantage of the redundancy of similar patches in natural images, while the local prior assumes that a target pixel can be estimated by a weighted average of its neighbors. Based on the above considerations, we utilize the non-local means filter to learn a non-local prior and the steering kernel regression to learn a local prior. By assembling the two complementary regularization terms, we propose a maximum a posteriori probability framework for SR recovery. Thorough experimental results suggest that the proposed SR method can reconstruct higher quality results both quantitatively and perceptually.
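A rough sketch of the non-local means idea behind the non-local prior: each similar patch votes for the target pixel with a Gaussian weight on patch distance. This is a generic NLM estimate, not the paper's SR formulation; the function name, patch layout, and parameter h are assumptions.

```python
import numpy as np

def nlm_pixel(patches, target_patch, h=0.5):
    """Estimate a pixel as the weighted average of candidate patch centers,
    with weights exp(-||patch - target_patch||^2 / h^2).
    patches: array of shape (n, s, s); target_patch: (s, s), s odd."""
    # Squared patch distances to the target patch.
    d2 = ((patches - target_patch) ** 2).sum(axis=(1, 2))
    w = np.exp(-d2 / h ** 2)
    w /= w.sum()                                  # normalize the weights
    c = patches.shape[1] // 2                     # center pixel index
    return (w * patches[:, c, c]).sum()
```

The redundancy of similar patches across a natural image is what makes this average informative; patches that differ strongly from the target contribute exponentially little.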

Zhang, K., Mu, G., Yuan, Y., Gao, X. & Tao, D. 2012, 'Video Super-resolution With 3D Adaptive Normalized Convolution', Neurocomputing, vol. 94, pp. 140-151.

The classic multi-image-based super-resolution (SR) methods typically take global motion pattern to produce one or multiple high-resolution (HR) versions from a set of low-resolution (LR) images. However, due to the influence of aliasing and noise, it is

Zhang, Z., Cheng, J., Li, J., Bian, W. & Tao, D. 2012, 'Segment-Based Features for Time Series Classification', Computer Journal, vol. 55, no. 9, pp. 1088-1102.

In this paper, we propose an approach termed segment-based features (SBFs) to classify time series. The approach is inspired by the success of component- or part-based methods for object recognition in computer vision, in which a visual object is described by a number of characteristic parts and the relations among them. Applying this idea to time series classification, a time series is represented as a set of segments and the corresponding temporal relations. First, a number of interest segments are extracted by interest point detection with automatic scale selection. Then, a number of feature prototypes are collected by random sampling from the segment set, where each feature prototype may include a single segment or multiple ordered segments. Subsequently, each time series is transformed to a standard feature vector, i.e., an SBF, where each entry is calculated as the maximum response (maximum similarity) of the corresponding feature prototype to the segment set of the time series.
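The max-response construction of an SBF entry described above can be sketched as follows, using negative Euclidean distance as a stand-in similarity; the actual similarity measure and the names here are assumptions, not the paper's definitions.

```python
import numpy as np

def sbf_vector(segments, prototypes):
    """Map a time series, given as a list of equal-length segments, to a
    fixed-length feature vector: entry j is the maximum similarity of
    prototype j to any segment (here, negative Euclidean distance)."""
    feats = []
    for proto in prototypes:
        sims = [-np.linalg.norm(seg - proto) for seg in segments]
        feats.append(max(sims))  # best match over the segment set
    return np.array(feats)
```

Because each entry takes a maximum over the segment set, series of different lengths (and with differently located events) still map to vectors of one fixed dimension, ready for a standard classifier.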

Zheng, S., Huang, K., Tan, T. & Tao, D. 2012, 'A Cascade Fusion Scheme For Gait And Cumulative Foot Pressure Image Recognition', Pattern Recognition, vol. 45, no. 10, pp. 3603-3610.

Cumulative foot pressure images represent the 2D ground reaction force during one gait cycle. Biomedical and forensic studies show that humans can be distinguished by unique limb movement patterns and ground reaction force. Considering continuous gait po

Conferences

Du, B., Zhang, L., Tao, D., Wang, N. & Chen, T. 2012, 'A spectral dissimilarity constrained nonnegative matrix factorization based cancer screening algorithm from hyperspectral fluorescence images', ICCH 2012 Proceedings - International Conference on Computerized Healthcare, International Conference on Computerized Healthcare, IEEE, Hong Kong, China, pp. 112-119.

Bioluminescence from a living body can help screen for cancers without penetrating the body. Hyperspectral imaging is a novel way to obtain physically meaningful signatures with very fine spectral resolution; it is very useful for distinguishing different kinds of materials and has been widely used in remote sensing. Fluorescence imaging has proved effective in monitoring probable cancer cells. Recent work has made great progress on hyperspectral fluorescence imaging techniques, which makes elaborate spectral observation of cancer areas possible, so proposing proper hyperspectral image processing methods for hyperspectral medical images is of practical importance. Cancer cells become distinguishable from normal ones when the living body is injected with fluorescence, which helps organs inside the body emit light; the signals can then be captured by a passive imaging sensor. Spectral unmixing techniques from hyperspectral remote sensing have been introduced to detect probable cancer areas. However, since the cancer areas are small and neither the normal nor the cancer areas are pure pixels, predefined endmembers are not available; in this case, classic blind signal separation methods are applicable. Considering the spectral dissimilarity between cancer and normal cells, a novel spectral dissimilarity constrained NMF method is proposed in this paper for cancer screening from fluorescence hyperspectral images. Experiments evaluate the performance of various NMF-based methods and our proposed spectral dissimilarity constrained NMF: 1) the NMF methods perform well in detecting the cancer areas inside the living body; 2) the spectral dissimilarity constrained NMF delineates cancer areas more accurately; 3) the spectral dissimilarity constraint performs better across different SNRs and different purities of the mixing endmembers. © 2012 IEEE.
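For readers unfamiliar with the NMF machinery underlying this paper, the following minimal sketch factorizes a nonnegative pixel-by-band matrix with the standard Lee-Seung multiplicative updates. The spectral-dissimilarity penalty is only noted in a comment and omitted from the updates, and all variable names are illustrative.

```python
import numpy as np

def nmf(V, r, iters=500, seed=0):
    """Plain NMF by Lee-Seung multiplicative updates: V ≈ W @ H with all
    factors nonnegative. In the paper's setting, the rows of H would hold
    endmember spectra (e.g., cancer vs. normal tissue) and W the per-pixel
    abundances; the spectral-dissimilarity constraint would add a penalty
    pushing the rows of H apart, which is omitted here for brevity."""
    g = np.random.default_rng(seed)
    n, m = V.shape
    W = g.random((n, r)) + 0.1
    H = g.random((r, m)) + 0.1
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + 1e-12)  # update spectra
        W *= (V @ H.T) / (W @ H @ H.T + 1e-12)  # update abundances
    return W, H

# Toy "image": 20 pixels x 8 bands, exactly rank-2 and nonnegative.
rng = np.random.default_rng(1)
V = rng.random((20, 2)) @ rng.random((2, 8))
W, H = nmf(V, r=2)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)  # small relative error
```

The multiplicative form guarantees the factors stay nonnegative given nonnegative initialization, which is why it is a common baseline before adding problem-specific constraints.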

Liu, X., Song, M., Zhang, L., Wang, S., Bu, J., Chen, C. & Tao, D. 2012, 'Joint shot boundary detection and key frame extraction', Proceedings - International Conference on Pattern Recognition, pp. 2565-2568.

Representing a video by a set of key frames is useful for efficient video browsing and retrieval, but key frame extraction remains a challenge in the computer vision field. In this paper, we propose a joint framework that integrates shot boundary detection and key frame extraction, in which three probabilistic components are taken into account: the prior of the key frames, the conditional probability of shot boundaries, and the conditional probability of each video frame. Key frame extraction is thus treated as a maximum a posteriori (MAP) estimation problem, which can be solved by an alternating strategy. Experimental results show that the proposed method preserves the scene-level structure and extracts key frames that are representative and discriminative. © 2012 ICPR Org Committee.
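To make the shot-boundary/key-frame pipeline concrete, here is a toy stand-in: boundaries are declared where consecutive frame features jump past a threshold, and each shot's key frame is the frame closest to the shot mean. This is a crude surrogate for the paper's probabilistic MAP formulation; `thresh` and all names are assumptions, not from the paper.

```python
import numpy as np

def shots_and_keyframes(feats, thresh=1.0):
    """Detect shot boundaries where consecutive frame features jump by more
    than `thresh`, then pick as key frame the frame nearest its shot's mean
    (a simple surrogate for a representativeness criterion)."""
    diffs = np.linalg.norm(np.diff(feats, axis=0), axis=1)
    bounds = [0] + [i + 1 for i, d in enumerate(diffs) if d > thresh] + [len(feats)]
    keys = []
    for a, b in zip(bounds[:-1], bounds[1:]):
        shot = feats[a:b]
        keys.append(a + int(np.argmin(np.linalg.norm(shot - shot.mean(0), axis=1))))
    return bounds, keys

# Six 1-D "frames": two shots with an abrupt change between frames 2 and 3.
feats = np.array([[0.0], [0.1], [0.0], [5.0], [5.1], [5.0]])
bounds, keys = shots_and_keyframes(feats)  # bounds [0, 3, 6], keys [0, 3]
```

The joint framework in the paper would instead score boundary placements and key-frame choices together under one posterior; this sketch only shows why treating the two subproblems jointly is natural, since the key frames depend directly on where the boundaries fall.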

Zhang, K., Gao, X., Tao, D. & Li, X. 2012, 'Multi-scale dictionary for single image super-resolution', 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE Conference on Computer Vision and Pattern Recognition, IEEE, Providence, USA, pp. 1114-1121.

Reconstruction- and example-based super-resolution (SR) methods are promising for restoring a high-resolution (HR) image from low-resolution (LR) image(s). Under large magnification, reconstruction-based methods usually fail to hallucinate visual details while example-based methods sometimes introduce unexpected details. Given a generic LR image, to reconstruct a photo-realistic SR image and to suppress artifacts in the reconstructed SR image, we introduce a multi-scale dictionary to a novel SR method that simultaneously integrates local and non-local priors. The local prior suppresses artifacts by using steering kernel regression to predict the target pixel from a small local area. The non-local prior enriches visual details by taking a weighted average of a large neighborhood as an estimate of the target pixel. Essentially, these two priors are complementary to each other. Experimental results demonstrate that the proposed method can produce high quality SR recovery both quantitatively and perceptually.
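The non-local prior described above can be sketched in a few lines: a pixel is estimated as a weighted average over a large search window, with weights decaying in the patch distance. Gaussian weights are an assumption here (this is generic non-local averaging, not the paper's exact formulation), and all parameter names are illustrative.

```python
import numpy as np

def nonlocal_estimate(img, y, x, patch=3, search=7, h=10.0):
    """Estimate pixel (y, x) as a patch-similarity-weighted average over a
    (2*search+1)^2 window: similar patches contribute more. `h` controls
    how fast weights decay with patch distance."""
    r = patch // 2
    pad = np.pad(img.astype(float), r + search, mode="reflect")
    yc, xc = y + r + search, x + r + search
    ref = pad[yc - r:yc + r + 1, xc - r:xc + r + 1]  # patch around target
    num = den = 0.0
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            cy, cx = yc + dy, xc + dx
            cand = pad[cy - r:cy + r + 1, cx - r:cx + r + 1]
            w = np.exp(-np.sum((ref - cand) ** 2) / (h * h))
            num += w * pad[cy, cx]
            den += w
    return num / den

img = np.full((10, 10), 5.0)
est = nonlocal_estimate(img, 5, 5)  # on a constant image this returns 5.0
```

This illustrates why the two priors are complementary: the local (kernel regression) prior smooths from a small neighborhood, while this non-local average can recruit repeated structure from far away to enrich detail.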

Zhang, L., Song, M., Sun, L., Liu, X., Wang, Y., Tao, D., Bu, J. & Chen, C. 2012, 'Spatial graphlet matching kernel for recognizing aerial image categories', Proceedings - International Conference on Pattern Recognition, pp. 2813-2816.

This paper presents a method for recognizing aerial image categories based on matching graphlets (i.e., small connected subgraphs) extracted from aerial images. By constructing a Region Adjacency Graph (RAG) to encode the geometric property and the color distribution of each aerial image, we cast aerial image category recognition as RAG-to-RAG matching. Based on graph theory, RAG-to-RAG matching is conducted by matching all their respective graphlets. Towards an effective graphlet matching process, we develop a manifold embedding algorithm to transfer different-sized graphlets into equal-length feature vectors and further integrate these feature vectors into a kernel. This kernel is used to train an SVM [8] classifier for aerial image category recognition. Experimental results demonstrate that our method outperforms several state-of-the-art object/scene recognition models. © 2012 ICPR Org Committee.

Journal articles

Cheng, J.L., Qiao, M., Bian, W. & Tao, D. 2011, '3D Human Posture Segmentation By Spectral Clustering With Surface Normal Constraint', Signal Processing, vol. 91, no. 9, pp. 2204-2212.

In this paper, we propose a new algorithm for partitioning human posture represented by 3D point clouds sampled from the surface of the human body. The algorithm is formed as a constrained extension of the recently developed segmentation method, spectral clustering…

Wang, Y., Tao, D., Gao, X., Li, X. & Wang, B. 2011, 'Mammographic Mass Segmentation: Embedding Multiple Features In Vector-Valued Level Set In Ambiguous Regions', Pattern Recognition, vol. 44, no. 9, pp. 1903-1915.

Mammographic mass segmentation plays an important role in computer-aided diagnosis systems. It is very challenging because masses are always of low contrast with ambiguous margins, connected with the normal tissues, and of various scales and complex shapes…

Zhang, K., Gao, X., Li, X. & Tao, D. 2011, 'Partially Supervised Neighbor Embedding for Example-Based Image Super-Resolution', IEEE Journal of Selected Topics in Signal Processing, vol. 5, no. 2, pp. 230-239.

The neighbor embedding algorithm has been widely used in example-based super-resolution reconstruction from a single frame, under the assumption that the embedded neighbor patches are contained in a single manifold. However, it is not always true for compl…

Conferences

Cheng, J., Tao, D., Liu, J., Wong, D.W.K., Lee, B.H., Baskaran, M., Wong, T.Y. & Aung, T. 2011, 'Focal biologically inspired feature for glaucoma type classification', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), pp. 91-98.

Glaucoma is an optic nerve disease resulting in loss of vision. There are two common types of glaucoma: open angle glaucoma and angle closure glaucoma. Glaucoma type classification is important in glaucoma diagnosis. Ophthalmologists examine the iridocorneal angle between the iris and cornea to determine the glaucoma type. However, manual classification/grading of the iridocorneal angle images is subjective and time consuming. To save workload and facilitate large-scale clinical use, it is essential to determine the glaucoma type automatically. In this paper, we propose to use a focal biologically inspired feature for the classification. The iris surface is located to determine the focal region. The association between the focal biologically inspired feature and angle grades is built. The experimental results show that the proposed method can correctly classify 85.2% of images from open angle glaucoma and 84.3% of images from angle closure glaucoma. The accuracy could be improved to close to 90% with more images included in the training. The results show that the focal biologically inspired feature is effective for automatic glaucoma type classification. It can be used to reduce the workload of ophthalmologists and the diagnosis cost. © 2011 Springer-Verlag.

Li, Y., Luo, Y., Tao, D. & Xu, C. 2011, 'Query difficulty guided image retrieval system', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), International Conference on Multimedia Modelling, Springer, Taipei, Taiwan, pp. 479-482.

Query difficulty estimation is a useful tool for content-based image retrieval. It predicts the performance of the search result of a given query, and thus it can guide the pseudo relevance feedback to rerank the image search results, and can be used to re-write the given query by suggesting "easy" alternatives. This paper presents a query difficulty estimation guided image retrieval system. The system initially estimates the difficulty of a given query image by analyzing both the query image and the retrieved top ranked images. Different search strategies are correspondingly applied to improve the retrieval performance. © 2011 Springer-Verlag Berlin Heidelberg.

Xie, B., Bian, W., Tao, D. & Chordia, P. 2011, 'Music tagging with regularized logistic regression', Proceedings of the 12th International Society for Music Information Retrieval Conference, ISMIR 2011, International Society for Music Information Retrieval Conference, Miami, Florida, USA, pp. 711-716.

In this paper, we present a set of simple and efficient regularized logistic regression algorithms to predict tags of music. We first vector-quantize the delta MFCC features using k-means and construct a "bag-of-words" representation for each song. We then learn the parameters of these logistic regression algorithms from the "bag-of-words" vectors and ground truth labels in the training set. At test time, the prediction confidence of the linear classifiers can be used to rank the songs for music annotation and retrieval tasks. Thanks to the convexity of the objective functions, we adopt an efficient and scalable generalized gradient method to learn the parameters, with the global optimum guaranteed. We show that these efficient algorithms achieve state-of-the-art performance in annotation and retrieval tasks evaluated on CAL-500. © 2011 International Society for Music Information Retrieval.
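The pipeline in this abstract (vector quantization → bag-of-words → regularized logistic regression) can be sketched end to end on toy data. Plain gradient descent stands in for the paper's generalized gradient method, the 2-D "frame features" stand in for delta MFCCs, and every name and parameter below is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans(X, k, iters=20):
    """Tiny k-means: vector-quantize frame-level features into a codebook."""
    C = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        lab = np.argmin(((X[:, None] - C[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (lab == j).any():
                C[j] = X[lab == j].mean(0)
    return C

def bow(frames, C):
    """Normalized histogram ("bag-of-words") of codeword assignments."""
    lab = np.argmin(((frames[:, None] - C[None]) ** 2).sum(-1), axis=1)
    h = np.bincount(lab, minlength=len(C)).astype(float)
    return h / h.sum()

def train_logreg(X, y, lam=0.1, lr=0.5, iters=500):
    """L2-regularized logistic regression by plain gradient descent."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * (X.T @ (p - y) / len(y) + lam * w)
    return w

# Toy "songs": stacks of 2-D frame features; tag-1 songs cluster high.
songs = [rng.normal(t * 5, 0.5, size=(30, 2)) for t in (0, 1) for _ in range(10)]
tags = np.array([t for t in (0, 1) for _ in range(10)], dtype=float)
C = kmeans(np.vstack(songs), 4)
X = np.vstack([bow(s, C) for s in songs])
w = train_logreg(X, tags)
scores = 1.0 / (1.0 + np.exp(-X @ w))      # confidence, usable for ranking
acc = ((scores > 0.5) == (tags > 0.5)).mean()
```

As in the paper, the per-song confidence `scores` can rank songs for a tag (retrieval) or rank tags for a song (annotation); one binary classifier would be trained per tag.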

Zhang, L., Bian, W., Song, M., Tao, D. & Liu, X. 2011, 'Integrating Local Features into Discriminative Graphlets for Scene Classification', Lecture Notes in Computer Science. Neural Information Processing. 18th International Conference, ICONIP 2011, International Conference, ICONIP, Springer-Verlag Berlin / Heidelberg, Shanghai, China, pp. 657-666.

Scene classification plays an important role in multimedia information retrieval. Since local features are robust to image transformation, they have been used extensively for scene classification. However, it is difficult to encode the spatial relations of local features in the classification process. To solve this problem, Geometric Local Features Integration (GLFI) is proposed. By segmenting a scene image into a set of regions, a so-called Region Adjacency Graph (RAG) is constructed to model their spatial relations. To measure the similarity of two RAGs, we select a few discriminative templates and then use them to extract the corresponding discriminative graphlets (connected subgraphs of an RAG). These discriminative graphlets are further integrated by a boosting strategy for scene classification. Experiments on five datasets validate the effectiveness of our GLFI.

Zhang, L., Song, M., Bian, W., Tao, D., Liu, X., Bu, J. & Chen, C. 2011, 'Feature Relationships Hypergraph for Multimodal Recognition', Lecture Notes in Computer Science. Neural Information Processing. 18th International Conference, ICONIP 2011, International Conference on Neural Information Processing, Springer-Verlag Berlin / Heidelberg, Shanghai, China, pp. 589-598.

Utilizing multimodal features to describe multimedia data is a natural way to achieve accurate pattern recognition. However, how to deal with the complex relationships among the large number of multimodal features, and with the curse of dimensionality, remain two crucial challenges. To solve these two problems, a new multimodal feature integration method is proposed. First, a so-called Feature Relationships Hypergraph (FRH) is proposed to model the high-order correlations among the multimodal features. Then, based on the FRH, the multimodal features are clustered into a set of low-dimensional partitions, and two types of matrices, the inter-partition matrix and the intra-partition matrix, are computed to quantify the inter- and intra-partition relationships. Finally, a multi-class boosting strategy is developed to obtain a strong classifier by combining the weak classifiers learned from the intra-partition matrices. The experimental results on different datasets validate the effectiveness of our approach.

Journal articles

Yang, Y., Zhuang, Y., Tao, D., Xu, D., Yu, J. & Luo, J. 2010, 'Recognizing Cartoon Image Gestures for Retrieval and Interactive Cartoon Clip Synthesis', IEEE Transactions on Circuits and Systems for Video Technology, vol. 20, no. 12, pp. 1745-1756.

In this paper, we propose a new method to recognize gestures of cartoon images with two practical applications, i.e., content-based cartoon image retrieval and interactive cartoon clip synthesis. Upon analyzing the unique properties of four types of features including global color histogram, local color histogram (LCH), edge feature (EF), and motion direction feature (MDF), we propose to employ different features for different purposes and in various phases. We use EF to define a graph and then refine its local structure by LCH. Based on this graph, we adopt a transductive learning algorithm to construct local patches for each cartoon image. A spectral method is then proposed to optimize the local structure of each patch and then align these patches globally. MDF is fused with EF and LCH and a cartoon gesture space is constructed for cartoon image gesture recognition. We apply the proposed method to content-based cartoon image retrieval and interactive cartoon clip synthesis. The experiments demonstrate the effectiveness of our method.

Conferences

Bian, W., Li, J. & Tao, D. 2010, 'Feature Extraction For FMRI-based Human Brain Activity Recognition', Machine Learning In Medical Imaging, International Workshop on Machine Learning in Medical Imaging, Springer-Verlag Berlin, Beijing, China, pp. 148-156.

Mitchell et al. [9] demonstrated that support vector machines (SVM) are effective for classifying the cognitive state of a human subject based on fMRI images observed over a single time interval. However, the direct use of classifiers on active voxels veils…

Bian, W., Cheng, J.L. & Tao, D. 2009, 'Biased Isomap Projections For Interactive Reranking', ICME: 2009 IEEE International Conference On Multimedia And Expo, Vols 1-3, IEEE International Conference on Multimedia and Expo, IEEE, New York, NY, pp. 1632-1635.

Image search has recently gained more and more attention for various applications. To capture users' intentions and to bridge the gap between low-level visual features and high-level semantics, a number of interactive reranking (IR) or relevance feedback…

Yang, Y., Zhuang, Y., Xu, D., Pan, Y., Tao, D. & Maybank, S. 2009, 'Retrieval Based Interactive Cartoon Synthesis via Unsupervised Bi-Distance Metric Learning', 2009 ACM International Conference on Multimedia Compilation E-Proceedings (with co-located workshops & symposiums), ACM international conference on Multimedia, Association for Computing Machinery, Inc. (ACM), Beijing, China, pp. 311-320.

Cartoons play important roles in many areas, but it requires a lot of labor to produce new cartoon clips. In this paper, we propose a gesture recognition method for cartoon character images with two applications, namely content-based cartoon image retrieval and cartoon clip synthesis. We first define Edge Features (EF) and Motion Direction Features (MDF) for cartoon character images. The features are classified into two different groups, namely intra-features and inter-features. An Unsupervised Bi-Distance Metric Learning (UBDML) algorithm is proposed to recognize the gestures of cartoon character images. Different from the previous research efforts on distance metric learning, UBDML learns the optimal distance metric from the heterogeneous distance metrics derived from intra-features and inter-features. Content-based cartoon character image retrieval and cartoon clip synthesis can be carried out based on the distance metric learned by UBDML. Experiments show that the cartoon character image retrieval has a high precision and that the cartoon clip synthesis can be carried out efficiently.

Journal articles

Li, X., Maybank, S., Yan, S., Tao, D. & Xu, D. 2008, 'Gait components and their application to gender recognition', IEEE Transactions On Systems Man And Cybernetics Part C-Applications And Reviews, vol. 38, no. 2, pp. 145-155.

Human gait is a promising biometric resource. In this paper, the information about gait is obtained from the motions of the different parts of the silhouette. The human silhouette is segmented into seven components, namely head, arm, trunk, thigh, fron…

