University of Technology, Sydney


# Publications


## Journal articles

Anshar, M. & Williams, M.A. 2018, 'Evolving robot empathy towards humans with motor disabilities through artificial pain generation', AIMS Neuroscience, vol. 5, no. 1, pp. 56-73.

© 2018 the Author(s). In contact assistive robots, prolonged physical engagement between robots and humans with motor disabilities, due to shoulder injuries for instance, may at times cause humans to experience pain. In this situation, robots will require sophisticated capabilities, such as the ability to recognize human pain in advance and generate counter-responses as follow-up empathic actions. Hence, it is important for robots to acquire an appropriate pain concept that allows them to develop these capabilities. This paper conceptualizes empathy generation through the realization of synthetic pain classes integrated into a robot's self-awareness framework, with fault detection on the robot body serving as the primary source of pain activation. Projection of human shoulder motion into the robot arm motion acts as a fusion process, which is used as a medium to gather information for analysis and to generate corresponding synthetic pain and empathic responses. An experiment is designed to mirror a human peer's shoulder motion into an observer robot. The results demonstrate that the fusion takes place accurately whenever unified internal states are achieved, allowing accurate classification of synthetic pain categories and generation of empathy responses in a timely fashion. Future work will consider the development of a pain activation mechanism.

Deng, C., Liu, X., Li, C. & Tao, D. 2018, 'Active multi-kernel domain adaptation for hyperspectral image classification', Pattern Recognition.

© 2017 Elsevier Ltd. Recent years have witnessed rapid progress in hyperspectral image (HSI) classification. Most existing studies either rely heavily on expensive label information through supervised learning or can hardly exploit discriminative information borrowed from related domains. To address these issues, in this paper we present a novel framework for HSI classification based on domain adaptation (DA) with active learning (AL). The main idea of our method is to retrain the multi-kernel classifier by utilizing the available labeled samples from the source domain and adding a minimum number of the most informative samples via active queries in the target domain. The proposed method adaptively combines multiple kernels, forming a DA classifier that minimizes the bias between the source and target domains. Further equipped with a nested actively updating process, it sequentially expands the training set and gradually converges to a satisfying level of classification performance. We study this active adaptation framework with the Margin Sampling (MS) strategy in the HSI classification task. Our experimental results on two popular HSI datasets demonstrate its effectiveness.
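The Margin Sampling strategy mentioned above is simple to state: query the unlabeled sample whose top two class probabilities are closest, i.e. the most ambiguous one. A minimal sketch (the function name and inputs are illustrative, not from the paper):

```python
def margin_sampling(probs):
    """Margin Sampling (MS): return the index of the sample whose top two
    class probabilities are closest, i.e. the most ambiguous query."""
    def margin(p):
        top_two = sorted(p, reverse=True)[:2]
        return top_two[0] - top_two[1]
    return min(range(len(probs)), key=lambda i: margin(probs[i]))

# Example: the second sample (0.55 vs 0.45) is the most ambiguous.
print(margin_sampling([[0.9, 0.1], [0.55, 0.45], [0.7, 0.3]]))
```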

Ding, C. & Tao, D. 2018, 'Trunk-Branch Ensemble Convolutional Neural Networks for Video-Based Face Recognition', IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 4, pp. 1002-1014.

© 1979-2012 IEEE. Human faces in surveillance videos often suffer from severe image blur, dramatic pose variations, and occlusion. In this paper, we propose a comprehensive framework based on Convolutional Neural Networks (CNN) to overcome challenges in video-based face recognition (VFR). First, to learn blur-robust face representations, we artificially blur training data composed of clear still images to account for a shortfall in real-world video training data. Using training data composed of both still images and artificially blurred data, CNN is encouraged to learn blur-insensitive features automatically. Second, to enhance the robustness of CNN features to pose variations and occlusion, we propose a Trunk-Branch Ensemble CNN model (TBE-CNN), which extracts complementary information from holistic face images and patches cropped around facial components. TBE-CNN is an end-to-end model that extracts features efficiently by sharing the low- and middle-level convolutional layers between the trunk and branch networks. Third, to further promote the discriminative power of the representations learnt by TBE-CNN, we propose an improved triplet loss function. Systematic experiments justify the effectiveness of the proposed techniques. Most impressively, TBE-CNN achieves state-of-the-art performance on three popular video face databases: PaSC, COX Face, and YouTube Faces. With the proposed techniques, we also obtain the first place in the BTAS 2016 Video Person Recognition Evaluation.

Dong, F., Zhang, G., Lu, J. & Li, K. 2018, 'Fuzzy competence model drift detection for data-driven decision support systems', Knowledge-Based Systems, vol. 143, pp. 284-294.

© 2017 Elsevier B.V. This paper focuses on concept drift in business intelligence and data-driven decision support systems (DSSs). The assumption of a fixed distribution in the data renders conventional static DSSs inaccurate and unable to make correct decisions when concept drift occurs. However, it is important to know when, how, and where concept drift occurs so a DSS can adjust its decision-processing knowledge to adapt to an ever-changing environment at the appropriate time. This paper presents a data distribution-based concept drift detection method called fuzzy competence model drift detection (FCM-DD). By introducing fuzzy set theory and replacing crisp boundaries with fuzzy ones, we have improved the competence model to provide a better, more refined empirical distribution of the data stream. FCM-DD requires no prior knowledge of the underlying distribution and provides a statistical guarantee of the reliability of the detected drift, based on the theory of bootstrapping. A series of experiments show that our proposed FCM-DD method can detect drift more accurately, has good sensitivity, and is robust.

Dong, X., Dong, J., Zhou, H., Sun, J. & Tao, D. 2018, 'Automatic Chinese Postal Address Block Location Using Proximity Descriptors and Cooperative Profit Random Forests', IEEE Transactions on Industrial Electronics, vol. 65, no. 5, pp. 4401-4412.

© 1982-2012 IEEE. Locating the destination address block is key to the automated sorting of mail. Due to the characteristics of Chinese envelopes used in mainland China, we exploit proximity cues to describe the investigated regions on envelopes. We propose two proximity descriptors that encode spatial distributions of the connected components obtained from binary envelope images. To locate the destination address block, these descriptors are used together with cooperative profit random forests (CPRFs). Experimental results show that the proposed proximity descriptors are superior to two component descriptors, which exploit only the shape characteristics of the individual components, and that the CPRF classifier produces higher recall values than seven state-of-the-art classifiers. These promising results are due to the fact that the proposed descriptors encode the proximity characteristics of the binary envelope images, and the CPRF classifier uses an effective tree-node split approach.

Du, B., Tang, X., Wang, Z., Zhang, L. & Tao, D. 2018, 'Robust Graph-Based Semisupervised Learning for Noisy Labeled Data via Maximum Correntropy Criterion', IEEE Transactions on Cybernetics.

Semisupervised learning (SSL) methods have proven effective at solving the labeled-sample shortage problem by using a large number of unlabeled samples together with a small number of labeled samples. However, many traditional SSL methods may not be robust when the labels are heavily corrupted by noise. To address this issue, in this paper we propose a robust graph-based SSL method based on the maximum correntropy criterion to learn a robust model with strong generalization. In detail, the graph-based SSL framework is improved by imposing supervised information on the regularizer, which strengthens the constraint on labels, thus ensuring that the predicted labels of each cluster are close to the true labels. Furthermore, the maximum correntropy criterion is introduced into the graph-based SSL framework to suppress label noise. Extensive image classification experiments prove the generalization and robustness of the proposed SSL method.

Fan, M., Zhang, X., Du, L., Chen, L. & Tao, D. 2018, 'Semi-Supervised Learning Through Label Propagation on Geodesics', IEEE Transactions on Cybernetics, vol. 48, no. 5, pp. 1486-1499.

© 2013 IEEE. Graph-based semi-supervised learning (SSL) has attracted great attention over the past decade. However, several problems remain open, including: 1) how to construct an effective graph over data with complex distributions and 2) how to define and effectively use pair-wise similarity for robust label propagation. In this paper, we utilize a simple and effective graph construction method to build a graph over data lying on multiple data manifolds. The method guarantees the connectedness of pair-wise data points. The global pair-wise data similarity is then naturally characterized by a geodesic distance-based joint probability, where the geodesic distance is approximated by the graph distance. The new data similarity is much more effective than previous Euclidean distance-based similarities. To apply the data structure for robust label propagation, Kullback-Leibler divergence is utilized to measure the inconsistency between the input pair-wise similarity and the output similarity. To further account for intraclass and interclass variances, a novel regularization term on sample-wise margins is introduced into the objective function. This enables the proposed method to fully utilize the input data structure and the label information for classification. An efficient optimization method and a convergence analysis are provided for our problem. In addition, the out-of-sample extension is discussed and addressed. Comparisons with state-of-the-art SSL methods on image classification tasks demonstrate the effectiveness of the proposed method.
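The central construction here, approximating geodesic distance by shortest-path distance on a neighborhood graph and then converting distances into similarities, can be sketched as follows. A plain Gaussian kernel stands in for the paper's joint-probability construction, and all names are illustrative:

```python
import math

def graph_geodesics(edges, n):
    """All-pairs shortest paths (Floyd-Warshall) over an undirected weighted
    graph with n nodes; the graph distance serves as a stand-in for geodesic
    distance on the data manifold, as in the abstract above."""
    INF = float('inf')
    d = [[0.0 if i == j else INF for j in range(n)] for i in range(n)]
    for i, j, w in edges:
        d[i][j] = d[j][i] = min(d[i][j], w)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

def geodesic_similarity(d, sigma=1.0):
    """Turn graph distances into similarities via a Gaussian kernel
    (the paper's exact joint-probability formulation may differ)."""
    return [[math.exp(-(dij / sigma) ** 2) for dij in row] for row in d]
```

On a three-node chain 0-1-2 with unit edge weights, the geodesic distance between the endpoints is 2, and their similarity is correspondingly smaller than that of adjacent points.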

Fu, Q., Luo, Y., Wen, Y., Tao, D., Li, Y. & Duan, L. 2018, 'Towards Intelligent Product Retrieval for TV-to-Online (T2O) Application: A Transfer Metric Learning Approach', IEEE Transactions on Multimedia.

Consumers (especially young people) increasingly want to shop for the same or similar products shown in multimedia content such as online TV programs, which indicates an urgent demand for improving the TV-to-Online (T2O) experience. In this paper, a transfer learning approach, as well as a prototype system, for an effortless T2O experience is developed. A key component of the system is high-precision product search, which must achieve exact matching between a query item and database items. The matching performance relies primarily on distance estimation, but the data characteristics cannot be well modeled and exploited by a simple Euclidean (EU) distance. This motivates us to introduce distance metric learning (DML) to improve the distance estimation. However, traditional DML methods leverage side information (such as similar/dissimilar constraints or relevance/irrelevance judgements) in the target domain, and may fail when this side information is limited. Fortunately, this issue can be alleviated by utilizing transfer metric learning (TML) to exploit information from other related domains. In this paper, a novel Manifold Regularized Heterogeneous Multi-Task Metric Learning (MRHMTML) framework is proposed, in which each domain is treated equally. The proposed approach allows us to simultaneously exploit information from other domains and from unlabeled data. Furthermore, a ranking-based loss is adopted to make our model more appropriate for search. Experiments on two challenging real-world datasets demonstrate the effectiveness of the proposed method. This transfer metric learning approach is expected to impact the transformation of the emerging T2O trend in both TV and online video domains.

Gong, C., Liu, T., Tang, Y., Yang, J., Yang, J. & Tao, D. 2018, 'A Regularization Approach for Instance-Based Superset Label Learning.', IEEE Transactions on Cybernetics, vol. 48, no. 3, pp. 967-978.

Different from traditional supervised learning, in which each training example has only one explicit label, superset label learning (SLL) refers to the problem in which a training example can be associated with a set of candidate labels, only one of which is correct. Existing SLL methods are either regularization-based or instance-based, the latter of which has achieved state-of-the-art performance. This is because the latest instance-based methods contain an explicit disambiguation operation that accurately picks the ground-truth label of each training example out of its ambiguous candidate labels. However, this disambiguation operation does not fully consider the mutually exclusive relationship among different candidate labels, so the disambiguated labels are usually generated in a nondiscriminative way, which prevents instance-based methods from obtaining satisfactory performance. To address this defect, we develop a novel regularization approach for instance-based superset label learning (RegISL) so that our instance-based method also inherits the good discriminative ability possessed by the regularization scheme. Specifically, we employ a graph to represent the training set, and require examples that are adjacent on the graph to obtain similar labels. More importantly, a discrimination term is proposed to enlarge the gap of values between possible labels and unlikely labels for every training example. As a result, the intrinsic constraints among different candidate labels are deployed, and the disambiguated labels generated by RegISL are more discriminative and accurate than those output by existing instance-based algorithms. The experimental results on various tasks convincingly demonstrate the superiority of our RegISL to other typical SLL methods in terms of both training accuracy and test accuracy.

Gong, C., Tao, D., Chang, X. & Yang, J. 2018, 'Ensemble Teaching for Hybrid Label Propagation', IEEE Transactions on Cybernetics.

Label propagation aims to iteratively diffuse label information from labeled examples to unlabeled examples over a similarity graph. Current label propagation algorithms cannot consistently yield satisfactory performance for two reasons: one is the instability of a single propagation method in dealing with various practical data, and the other is an improper propagation sequence that ignores the labeling difficulties of different examples. To remedy the above defects, this paper proposes a novel propagation algorithm called hybrid diffusion under ensemble teaching (HyDEnT). Specifically, HyDEnT integrates multiple propagation methods as base ''learners'' to fully exploit their individual wisdom, which helps HyDEnT to be stable and obtain consistently encouraging results. More importantly, HyDEnT conducts propagation under the guidance of an ensemble of ''teachers''. That is to say, in every propagation round the simplest curriculum examples are wisely designated by a teaching algorithm, so that their labels can be reliably and accurately decided by the learners. To optimally choose these simplest examples, every teacher in the ensemble should comprehensively consider the examples' difficulties from its own viewpoint, as well as the common knowledge shared by all the teachers. This is accomplished by a designed optimization problem, which can be efficiently solved via the block coordinate descent method. Thanks to the efforts of the teachers, all the unlabeled examples are logically propagated from simple to difficult, leading to better propagation quality for HyDEnT than for existing methods. Experiments on six popular datasets reveal that HyDEnT achieves the highest classification accuracy when compared with six state-of-the-art propagation methodologies such as harmonic functions, Fick's law assisted propagation, linear neighborhood propagation, semisupervised ensemble learning, bipartite graph-based consensus maximization, and teaching-to-learn and learnin...

Gu, K., Tao, D., Qiao, J.F. & Lin, W. 2018, 'Learning a No-Reference Quality Assessment Model of Enhanced Images with Big Data', IEEE Transactions on Neural Networks and Learning Systems, vol. 29, no. 4, pp. 1301-1313.

© 2012 IEEE. In this paper, we investigate the problem of image quality assessment (IQA) and enhancement via machine learning. This issue has long attracted wide attention in the computational intelligence and image processing communities, since, for many practical applications, e.g., object detection and recognition, raw images usually need to be appropriately enhanced to raise the visual quality (e.g., visibility and contrast). In fact, proper enhancement can noticeably improve the quality of input images, even beyond that of the originally captured images, which are generally thought to be of the best quality. In this paper, we present two main contributions. The first is a new no-reference (NR) IQA model. Given an image, our quality measure first extracts 17 features through analysis of contrast, sharpness, brightness and more, and then yields a measure of visual quality using a regression module, which is learned with big-data training samples that are much bigger than the relevant image datasets. The results of experiments on nine datasets validate the superiority and efficiency of our blind metric compared with typical state-of-the-art full-reference, reduced-reference and NR IQA methods. The second contribution is a robust image enhancement framework based on quality optimization. For an input image, guided by the proposed NR-IQA measure, we conduct histogram modification to successively rectify image brightness and contrast to a proper level. Thorough tests demonstrate that our framework can effectively enhance natural images, low-contrast images, low-light images, and dehazed images. The source code will be released at https://sites.google.com/site/guke198701/publications.

Guan, N., Liu, T., Zhang, Y., Tao, D. & Davis, L.S. 2018, 'Truncated Cauchy Non-negative Matrix Factorization for Robust Subspace Learning', IEEE Transactions on Pattern Analysis and Machine Intelligence.

Non-negative matrix factorization (NMF) minimizes the Euclidean distance between the data matrix and its low-rank approximation, and it fails when applied to corrupted data because the loss function is sensitive to outliers. In this paper, we propose a Truncated Cauchy loss that handles outliers by truncating large errors, and we develop Truncated CauchyNMF to robustly learn the subspace on noisy datasets contaminated by outliers. We theoretically analyze the robustness of Truncated CauchyNMF in comparison with competing models and prove that Truncated CauchyNMF has a generalization bound which converges at a rate of order $O(\sqrt{\ln n / n})$, where $n$ is the sample size. We evaluate Truncated CauchyNMF by image clustering on both simulated and real datasets. The experimental results on datasets containing gross corruptions validate the effectiveness and robustness of Truncated CauchyNMF for learning robust subspaces.
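The idea of truncating the Cauchy loss can be illustrated with a one-dimensional sketch: small errors behave roughly like a squared loss, while errors beyond a threshold contribute only a constant, so outliers cannot dominate the objective. The parameterization below (names `gamma`, `eps`) is illustrative and may differ from the paper's:

```python
import math

def truncated_cauchy_loss(e, gamma=1.0, eps=3.0):
    """Cauchy loss ln(1 + (e/gamma)^2), truncated (capped) for |e| > eps.
    Illustrative sketch only; the paper's exact form may differ."""
    if abs(e) <= eps:
        return math.log(1.0 + (e / gamma) ** 2)
    # Errors larger than eps contribute a fixed penalty, bounding outliers.
    return math.log(1.0 + (eps / gamma) ** 2)
```

With the defaults, a zero error costs nothing, and any error beyond `eps` costs the same capped amount regardless of its magnitude.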

Gui, J., Liu, T., Sun, Z., Tao, D. & Tan, T. 2018, 'Fast Supervised Discrete Hashing', IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 2, pp. 490-496.

© 2017 IEEE. Learning-based hashing algorithms are 'hot topics' because they can greatly increase the scale at which existing methods operate. In this paper, we propose a new learning-based hashing method called 'fast supervised discrete hashing' (FSDH) based on 'supervised discrete hashing' (SDH). Regressing the training examples (or hash codes) to the corresponding class labels is widely used in ordinary least squares regression. Rather than adopting this method, FSDH uses a very simple yet effective regression of the class labels of training examples to the corresponding hash codes, which accelerates the algorithm. To the best of our knowledge, this strategy has not previously been used for hashing. Traditional SDH decomposes the optimization into three sub-problems, with the most critical sub-problem - discrete optimization for binary hash codes - solved using iterative discrete cyclic coordinate descent (DCC), which is time-consuming. By contrast, FSDH has a closed-form solution and requires only a single rather than iterative hash code-solving step, which is highly efficient. Furthermore, FSDH is usually faster than SDH at solving the projection matrix for least squares regression, making FSDH generally faster than SDH. For example, our results show that FSDH is about 12 times faster than SDH when the number of hashing bits is 128 on the CIFAR-10 database, and about 151 times faster than FastHash when the number of hashing bits is 64 on the MNIST database. Our experimental results show that FSDH is not only fast, but also outperforms other comparative methods.
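The closed-form regression step the abstract describes, fitting one-hot class labels to hash codes by regularized least squares, can be sketched as below. The function name, matrix shapes, and regularizer are illustrative assumptions; the full FSDH algorithm alternates this step with the other sub-problems:

```python
import numpy as np

def fsdh_projection(Y, B, reg=1e-6):
    """Closed-form least-squares regression of one-hot labels Y (n x c)
    onto binary hash codes B (n x L):
        W = (Y^T Y + reg * I)^{-1} Y^T B.
    A sketch of the single-step idea only, not the complete algorithm."""
    c = Y.shape[1]
    return np.linalg.solve(Y.T @ Y + reg * np.eye(c), Y.T @ B)
```

When every example in a class shares the same code, each row of `W` recovers that class's code, so `Y @ W` reproduces `B` almost exactly.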

Guo, F., Wang, W., Shen, J., Shao, L., Yang, J., Tao, D. & Tang, Y.Y. 2018, 'Video Saliency Detection Using Object Proposals', IEEE Transactions on Cybernetics.

In this paper, we introduce a novel approach to identifying salient object regions in videos via object proposals. The core idea is to solve the saliency detection problem by ranking and selecting salient proposals based on object-level saliency cues. Object proposals offer a more complete and high-level representation, which naturally caters to the needs of salient object detection. As well as introducing this novel solution for video salient object detection, we reorganize various discriminative saliency cues and traditional saliency assumptions on object proposals. With object candidates, a proposal ranking and voting scheme, based on various object-level saliency cues, is designed to screen out nonsalient parts, select salient object regions, and infer an initial saliency estimate. A saliency optimization process that considers temporal consistency and appearance differences between salient and nonsalient regions is then used to refine the initial saliency estimates. Our experiments on public datasets (SegTrackV2, the Freiburg-Berkeley Motion Segmentation Dataset, and Densely Annotated Video Segmentation) validate the effectiveness of the method, which produces significant improvements over state-of-the-art algorithms.

Guo, K., Liu, L., Xu, X., Xu, D. & Tao, D. 2018, 'GoDec+: Fast and Robust Low-Rank Matrix Decomposition Based on Maximum Correntropy', IEEE Transactions on Neural Networks and Learning Systems.

GoDec is an efficient low-rank matrix decomposition algorithm, but its performance is optimal only when the corruptions consist of sparse errors and Gaussian noise. This paper addresses the problem of a matrix composed of a low-rank component and unknown corruptions. We introduce a robust local similarity measure called correntropy to describe the corruptions and, in doing so, obtain a more robust and faster low-rank decomposition algorithm: GoDec+. Based on half-quadratic optimization and the greedy bilateral paradigm, we deliver a solution to the maximum correntropy criterion (MCC)-based low-rank decomposition problem. Experimental results show that GoDec+ is efficient and robust to different corruptions, including Gaussian noise, Laplacian noise, salt-and-pepper noise, and occlusion, on both synthetic and real vision data. We further apply GoDec+ to more general applications, including classification and subspace clustering. For classification, we construct an ensemble subspace from the low-rank GoDec+ matrix and introduce an MCC-based classifier. For subspace clustering, we utilize the low-rank matrix produced by GoDec+ for MCC-based self-expression and combine it with spectral clustering. Face recognition, motion segmentation, and face clustering experiments show that the proposed methods are effective and robust. In particular, we achieve state-of-the-art performance on the Hopkins 155 dataset and the first 10 subjects of extended Yale B for subspace clustering.
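Correntropy itself is easy to sketch: the average of a Gaussian kernel applied to element-wise errors. Unlike a squared loss, it saturates for large errors, which is why maximizing it down-weights outliers. The function name and kernel choice below are illustrative:

```python
import math

def correntropy(x, y, sigma=1.0):
    """Empirical correntropy between two sequences: the average Gaussian
    kernel of their element-wise differences. Large (outlier) errors drive
    individual kernel values toward 0 rather than growing without bound."""
    return sum(math.exp(-((a - b) ** 2) / (2 * sigma ** 2))
               for a, b in zip(x, y)) / len(x)
```

Identical sequences score 1.0; a single huge outlier among three entries only drops the score to about 2/3, whereas a squared loss would explode.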

Kang, G., Li, J. & Tao, D. 2018, 'Shakeout: A New Approach to Regularized Deep Neural Network Training', IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 5, pp. 1245-1258.

© 1979-2012 IEEE. Recent years have witnessed the success of deep neural networks in dealing with a variety of practical problems. Dropout has played an essential role in many successful deep neural networks by inducing regularization during model training. In this paper, we present a new regularized training approach: Shakeout. Instead of randomly discarding units as Dropout does at the training stage, Shakeout randomly chooses to enhance or reverse each unit's contribution to the next layer. This minor modification of Dropout has a notable statistical trait: the regularizer induced by Shakeout adaptively combines L0, L1 and L2 regularization terms. Our classification experiments with representative deep architectures on the image datasets MNIST, CIFAR-10 and ImageNet show that Shakeout deals with over-fitting effectively and outperforms Dropout. We empirically demonstrate that Shakeout leads to sparser weights under both unsupervised and supervised settings. Shakeout also leads to a grouping effect of the input units in a layer. Because the weights reflect the importance of connections, Shakeout is superior to Dropout, which is valuable for deep model compression. Moreover, we demonstrate that Shakeout can effectively reduce the instability of the training process of deep architectures.
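The "enhance or reverse" operation can be sketched for a single weight. The version below follows one common statement of the rule (parameter names `tau` and `c` are illustrative and may not match the paper's notation), chosen so that the perturbed weight is unbiased, i.e. its expectation equals the original weight:

```python
import random

def shakeout_weight(w, tau=0.5, c=0.1, rng=random):
    """Shakeout perturbation of one weight (illustrative sketch):
    with probability 1 - tau the unit is kept and 'enhanced';
    otherwise its contribution is 'reversed' to -c * sign(w).
    The two branches are balanced so that E[w_tilde] = w."""
    sgn = (w > 0) - (w < 0)
    if rng.random() < 1.0 - tau:
        return w / (1.0 - tau) + c * tau / (1.0 - tau) * sgn  # enhance
    return -c * sgn                                           # reverse
```

Setting `tau = 0` recovers the unperturbed weight, and setting `c = 0` reduces the rule to standard inverted Dropout, matching the abstract's description of Shakeout as a modification of Dropout.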

Kong, S., Lee, J.H. & Li, S. 2018, 'A new distributed algorithm for efficient generalized arc-consistency propagation', Autonomous Agents and Multi-Agent Systems, pp. 1-33.

© 2018 The Author(s) Generalized arc-consistency propagation is predominantly used in constraint solvers to efficiently prune the search space when solving constraint satisfaction problems. Although many practical applications can be modelled as distributed constraint satisfaction problems, no distributed arc-consistency algorithm so far has considered the privacy of individual agents. In this paper, we propose a new distributed arc-consistency algorithm, called (Formula presented.), which leaks less of the agents' private information than existing distributed arc-consistency algorithms. In particular, (Formula presented.) uses a novel termination determination mechanism, which allows the agents to share domains, constraints and communication addresses only with relevant agents. We further extend (Formula presented.) to (Formula presented.), which is the first distributed algorithm that enforces generalized arc-consistency on k-ary ((Formula presented.)) constraint satisfaction problems. Theoretical analyses show that our algorithms are efficient in both time and space. Experiments also demonstrate that (Formula presented.) outperforms the state-of-the-art distributed arc-consistency algorithm and that (Formula presented.)'s performance scales linearly in the number of agents.

Lassetter, J.H., Macintosh, C.I., Williams, M., Driessnack, M., Ray, G. & Wisco, J.J. 2018, 'Psychometric testing of the healthy eating and physical activity self-efficacy questionnaire and the healthy eating and physical activity behavior recall questionnaire for children', Journal for Specialists in Pediatric Nursing, vol. 23, no. 2.

© 2018 Wiley Periodicals, Inc. Purpose: The purpose of this study was to develop and assess the psychometric properties of two related questionnaires: the Healthy Eating and Physical Activity Self-Efficacy Questionnaire for Children (HEPASEQ-C) and the Healthy Eating and Physical Activity Behavior Recall Questionnaire for Children (HEPABRQ-C). Design and Methods: The HEPASEQ-C and HEPABRQ-C were administered to 517 participating children, 492 of whom completed them. Data were analyzed to evaluate the reliability and validity of the questionnaires. Results: Content validity was established through a 10-person expert panel. For the HEPASEQ-C, the item content validity index (CVI) ranged from 0.80 to 1.00. The CVI for the total questionnaire was 1.0. All HEPASEQ-C items loaded on a single factor. Cronbach's alpha was deemed acceptable (.749). For the HEPABRQ-C, the item CVI ranged from 0.88 to 1.00. The CVI for the total questionnaire was 1.0. The Pearson product-moment correlation between HEPASEQ-C and HEPABRQ-C scores was significant (r = .501, p < .001). Practice Implications: The HEPASEQ-C and HEPABRQ-C are easily administered and provide helpful insights into children's self-efficacy and behavior recall. They are easy to use and applicable in upper elementary school settings, in clinical settings for individual patients, and in health promotion settings.

Li, X., Liu, K., Dong, Y. & Tao, D. 2018, 'Patch Alignment Manifold Matting', IEEE Transactions on Neural Networks and Learning Systems.

Image matting is generally modeled as a space transform from the color space to the alpha space. By estimating the alpha factor of the model, the foreground of an image can be extracted. However, there is some dimensional information redundancy in the alpha space, which usually leads to misjudgments of some pixels near the boundary between the foreground and the background. In this paper, a manifold matting framework named Patch Alignment Manifold Matting is proposed for image matting. In particular, we first propose a part modeling of the color space in the local image patch. We then perform whole-alignment optimization to approximate the alpha results using the subspace reconstruction error. Furthermore, we utilize Nesterov's algorithm to solve the optimization problem. Finally, we apply several manifold learning methods within the framework, obtaining several image matting methods, such as ISOMAP matting and its derivative, cascade ISOMAP matting. The experimental results reveal that the manifold matting framework and its two examples are effective when compared with several representative matting methods.

Li, X., Lu, Q., Dong, Y. & Tao, D. 2018, 'SCE: A Manifold Regularized Set-Covering Method for Data Partitioning', IEEE Transactions on Neural Networks and Learning Systems, vol. 29, no. 5, pp. 1760-1773.

Cluster analysis plays a very important role in data analysis. In recent years, cluster ensembles, as a cluster analysis tool, have drawn much attention for their robustness, stability, and accuracy. Many efforts have been made to combine different initial clustering results into a single clustering solution with better performance. However, these methods neglect the structural information of the raw data when performing the cluster ensemble. In this paper, we propose a Structural Cluster Ensemble (SCE) algorithm for data partitioning, formulated as a set-covering problem. In particular, we construct a Laplacian regularized objective function to capture the structural information among clusters. Moreover, considering the importance of the discriminative information underlying the initial clustering results, we add a discriminative constraint to our objective function. Finally, we verify the performance of the SCE algorithm on both synthetic and real datasets. The experimental results show the effectiveness of the proposed SCE algorithm.

Li, X., Zhao, L., Ji, W., Wu, Y., Wu, F., Yang, M.H., Tao, D. & Reid, I. 2018, 'Multi-Task Structure-aware Context Modeling for Robust Keypoint-based Object Tracking', IEEE Transactions on Pattern Analysis and Machine Intelligence.

In the fields of computer vision and graphics, keypoint-based object tracking is a fundamental and challenging problem, which is typically formulated in a spatio-temporal context modeling framework. However, many existing keypoint trackers are incapable of effectively modeling and balancing the following three aspects in a simultaneous manner: temporal model coherence across frames, spatial model consistency within frames, and discriminative feature construction. To address this problem, we propose a robust keypoint tracker based on spatio-temporal multi-task structured output optimization driven by discriminative metric learning. Consequently, temporal model coherence is characterized by multi-task structured keypoint model learning over several adjacent frames; spatial model consistency is modeled by solving a geometric verification based structured learning problem; discriminative feature construction is enabled by metric learning to ensure the intra-class compactness and inter-class separability. To achieve the goal of effective object tracking, we jointly optimize the above three modules in a spatio-temporal multi-task learning scheme. Furthermore, we incorporate this joint learning scheme into both single-object and multi-object tracking scenarios, resulting in robust tracking results. Experiments over several challenging datasets have justified the effectiveness of our single-object and multi-object trackers against the state-of-the-art.

Li, Y., Tian, X., Liu, T. & Tao, D. 2018, 'On Better Exploring and Exploiting Task Relationships in Multitask Learning: Joint Model and Feature Learning', IEEE Transactions on Neural Networks and Learning Systems, vol. 29, no. 5, pp. 1975-1985.

Liu, A., Lu, J., Liu, F. & Zhang, G. 2018, 'Accumulating regional density dissimilarity for concept drift detection in data streams', Pattern Recognition, vol. 76, pp. 256-272.

© 2017 Elsevier Ltd. In a non-stationary environment, newly received data may have different knowledge patterns from the data used to train learning models. As time passes, a learning model's performance may become increasingly unreliable. This problem is known as concept drift and is a common issue in real-world domains. Concept drift detection has attracted increasing attention in recent years. However, very few existing methods pay attention to small regional drifts, and their accuracy may vary due to differing statistical significance tests. This paper presents a novel concept drift detection method, based on regional-density estimation, named nearest neighbor-based density variation identification (NN-DVI). It consists of three components. The first is a k-nearest neighbor-based space-partitioning schema (NNPS), which transforms unmeasurable discrete data instances into a set of shared subspaces for density estimation. The second is a distance function that accumulates the density discrepancies in these subspaces and quantifies the overall differences. The third component is a tailored statistical significance test by which the confidence interval of a concept drift can be accurately determined. The distance applied in NN-DVI is sensitive to regional drift and has been proven to follow a normal distribution. As a result, the NN-DVI's accuracy and false-alarm rate are statistically guaranteed. Additionally, several benchmarks have been used to evaluate the method, including both synthetic and real-world datasets. The overall results show that NN-DVI has better performance in terms of addressing problems related to concept drift detection.

Liu, F., Gong, C., Huang, X., Zhou, T., Yang, J. & Tao, D. 2018, 'Robust Visual Tracking Revisited: From Correlation Filter to Template Matching', IEEE Transactions on Image Processing, vol. 27, no. 6, pp. 2777-2790.

© 2018 IEEE. In this paper, we propose a novel matching based tracker by investigating the relationship between template matching and the recently popular correlation filter based trackers (CFTs). Compared to the correlation operation in CFTs, a sophisticated similarity metric termed mutual buddies similarity is proposed to exploit the relationship of multiple reciprocal nearest neighbors for target matching. By doing so, our tracker obtains powerful discriminative ability in distinguishing the target from the background, as demonstrated by both empirical and theoretical analyses. Besides, instead of utilizing a single template with the improper updating scheme in CFTs, we design a novel online template updating strategy named memory, which aims to select a certain amount of representative and reliable tracking results in history to construct the current stable and expressive template set. This scheme helps the proposed tracker to comprehensively understand the target appearance variations and recall some stable results. Both qualitative and quantitative evaluations on two benchmarks suggest that the proposed tracking method performs favorably against some recently developed CFTs and other competitive trackers.

Liu, M., Xu, C., Luo, Y., Xua, C., Wen, Y. & Tao, D. 2018, 'Cost-Sensitive Feature Selection by Optimizing F-measures', IEEE Transactions on Image Processing, vol. 27, no. 3, pp. 1323-1333.

Feature selection is beneficial for improving the performance of general machine learning tasks by extracting an informative subset from the high-dimensional features. Conventional feature selection methods usually ignore the class imbalance problem, so the selected features will be biased towards the majority class. Considering that F-measure is a more reasonable performance measure than accuracy for imbalanced data, this paper presents an effective feature selection algorithm that addresses the class imbalance issue by optimizing F-measures. Since F-measure optimization can be decomposed into a series of cost-sensitive classification problems, we investigate cost-sensitive feature selection (CSFS) by generating and assigning different costs to each class with rigorous theoretical guidance. After solving a series of cost-sensitive feature selection problems, the features corresponding to the best F-measure will be selected. In this way, the selected features will fully represent the properties of all classes. Experimental results on popular benchmarks and challenging real-world datasets demonstrate the significance of cost-sensitive feature selection for the imbalanced data setting and validate the effectiveness of the proposed method.

Lu, J., Xuan, J., Zhang, G. & Luo, X. 2018, 'Structural property-aware multilayer network embedding for latent factor analysis', Pattern Recognition, vol. 76, pp. 228-241.

© 2017 Elsevier Ltd. A multilayer network is a structure commonly used to describe and model the complex interactions between sets of entities/nodes. A three-layer example is the author-paper-word structure, in which authors are linked by a co-author relation, papers are linked by a citation relation, and words are linked by a semantic relation. Network embedding, which aims to project the nodes of a network into a relatively low-dimensional space for latent factor analysis, has recently emerged as an effective method for a variety of network-based tasks, such as collaborative filtering and link prediction. However, existing studies of network embedding both focus on the single-layer network and overlook the structural properties of the network, e.g., the degree distribution and communities, which are significant for node characterization, such as the preferences of users in a social network. In this paper, we propose four multilayer network embedding algorithms based on Nonnegative Matrix Factorization (NMF) with consideration given to four structural properties: whole network (NNMF), community (CNMF), degree distribution (DNMF), and max spanning tree (TNMF). Experiments on synthetic data show that the proposed algorithms are able to preserve the desired structural properties as designed. Experiments on real-world data show that multilayer network embedding improves the accuracy of document clustering and recommendation, and the four embedding algorithms corresponding to the four structural properties demonstrate the differences in performance on these two tasks. These results can be directly used in document clustering and recommendation systems.

Luo, Y., Wen, Y. & Tao, D. 2018, 'Heterogeneous Multitask Metric Learning Across Multiple Domains', IEEE Transactions on Neural Networks and Learning Systems.

Distance metric learning plays a crucial role in diverse machine learning algorithms and applications. When the labeled information in a target domain is limited, transfer metric learning (TML) helps to learn the metric by leveraging the sufficient information from other related domains. Multitask metric learning (MTML), which can be regarded as a special case of TML, performs transfer across all related domains. Current TML tools usually assume that the same feature representation is exploited for different domains. However, in real-world applications, data may be drawn from heterogeneous domains. Heterogeneous transfer learning approaches can be adopted to remedy this drawback by deriving a metric from the learned transformation across different domains. However, they are often limited in that only two domains can be handled. To appropriately handle multiple domains, we develop a novel heterogeneous MTML (HMTML) framework. In HMTML, the metrics of all different domains are learned together. The transformations derived from the metrics are utilized to induce a common subspace, and the high-order covariance among the predictive structures of these domains is maximized in this subspace. There do exist a few heterogeneous transfer learning approaches that deal with multiple domains, but the high-order statistics (correlation information), which can only be exploited by simultaneously examining all domains, is ignored in these approaches. Compared with them, the proposed HMTML can effectively explore such high-order information, thus obtaining more reliable feature transformations and metrics. Effectiveness of our method is validated by the extensive and intensive experiments on text categorization, scene classification, and social image annotation.

Luo, Y., Wen, Y., Liu, T. & Tao, D. 2018, 'Transferring Knowledge Fragments for Learning Distance Metric from A Heterogeneous Domain', IEEE Transactions on Pattern Analysis and Machine Intelligence.

The goal of transfer learning is to improve the performance of a target learning task by leveraging information (or transferring knowledge) from other related tasks. In this paper, we examine the problem of transfer distance metric learning (DML), which usually aims to mitigate the label information deficiency issue in the target DML. Most current transfer DML (TDML) methods are not applicable to the scenario where data are drawn from heterogeneous domains. Some existing heterogeneous transfer learning (HTL) approaches can learn a target distance metric, usually by transforming the samples of the source and target domains into a common subspace. However, these approaches lack flexibility in real-world applications, and the learned transformations are often restricted to be linear. This motivates us to develop a general flexible heterogeneous TDML (HTDML) framework. In particular, any (linear/nonlinear) DML algorithm can be employed to learn the source metric beforehand. Then the pre-learned source metric is represented as a set of knowledge fragments to help target metric learning. We show how the generalization error in the target domain can be reduced using the proposed transfer strategy, and develop novel algorithms to learn either a linear or a nonlinear target metric. Extensive experiments on various applications demonstrate the effectiveness of the proposed method.

Mei, T., Li, L., Tian, X., Tao, D. & Ngo, C.W. 2018, 'PageSense: Toward Stylewise Contextual Advertising via Visual Analysis of Web Pages', IEEE Transactions on Circuits and Systems for Video Technology, vol. 28, no. 1, pp. 254-266.

Ojha, S., Williams, M.A. & Johnston, B. 2018, 'The Essence of Ethical Reasoning in Robot-Emotion Processing', International Journal of Social Robotics, vol. 10, no. 2, pp. 211-223.

© 2017, Springer Science+Business Media B.V., part of Springer Nature. As social robots become more and more intelligent and autonomous in operation, it is extremely important to ensure that such robots act in a socially acceptable manner. More specifically, if such an autonomous robot is capable of generating and expressing emotions of its own, it should also have the ability to reason about whether it is ethical to exhibit a particular emotional state in response to a surrounding event. Most existing computational models of emotion for social robots have focused on achieving a certain level of believability of the emotions expressed. We argue that believability of a robot's emotions, although crucially necessary, is not a sufficient quality to elicit socially acceptable emotions. Thus, we stress the need for a higher level of cognition in the emotion processing mechanism, which empowers social robots with the ability to decide whether it is socially appropriate to express a particular emotion in a given context or better to inhibit such an experience. In this paper, we present a detailed mathematical explanation of the ethical reasoning mechanism in our computational model, EEGS, that helps a social robot to reach the most socially acceptable emotional state when more than one emotion is elicited by an event. Experimental results show that ethical reasoning in EEGS helps in the generation of believable as well as socially acceptable emotions.

Qiao, M., Yu, J., Bian, W., Li, Q. & Tao, D. 2018, 'Adapting Stochastic Block Models to Power-Law Degree Distributions', IEEE Transactions on Cybernetics.

Stochastic block models (SBMs) have been playing an important role in modeling clusters or community structures of network data, but they are incapable of handling several complex features ubiquitously exhibited in real-world networks, one of which is the power-law degree characteristic. To this end, we propose a new variant of SBM, termed power-law degree SBM (PLD-SBM), by introducing degree decay variables to explicitly encode the varying degree distribution over all nodes. With an exponential prior, it is proved that PLD-SBM approximately preserves the scale-free feature of real networks. In addition, the inference in the variational E-step shows that PLD-SBM in effect corrects, through the introduced degree decay factors, the bias inherent in SBM. Furthermore, experiments conducted on both synthetic networks and two real-world datasets, including the Adolescent Health data and the political blogs network, verify the effectiveness of the proposed model in terms of cluster prediction accuracy.

Shen, J., Liang, Z., Liu, J., Sun, H., Shao, L. & Tao, D. 2018, 'Multiobject Tracking by Submodular Optimization', IEEE Transactions on Cybernetics.

In this paper, we propose a new multiobject visual tracking algorithm based on submodular optimization. The proposed algorithm is composed of two main stages. In the first stage, a new tracklet-selection strategy is proposed to cope with the occlusion problem. We generate low-level tracklets using overlap criteria and min-cost flow, respectively, and then integrate them into a candidate tracklet set. In the second stage, we formulate the multiobject tracking problem as a submodular maximization problem subject to related constraints. The submodular function selects the correct tracklets from the candidate set to form the object trajectories. Then, we design a connecting process which connects the corresponding trajectories to overcome the occlusion problem. Experimental results demonstrate the effectiveness of our tracking algorithm. Our source code is available at https://github.com/shenjianbing/submodulartrack.

Shen, X., Liu, T., Tao, D., Fan, Y., Zhang, J., Li, S., Jiang, J., Zhu, W., Wang, Y., Wang, Y., Brodaty, H., Sachdev, P. & Wen, W. 2018, 'Variation in longitudinal trajectories of cortical sulci in normal elderly.', NeuroImage, vol. 166, pp. 1-9.

Sulcal morphology has been reported to change with age-related neurological diseases, but the trajectories of sulcal change in normal ageing in the elderly are still unclear. We conducted a study of sulcal morphological changes over seven years in 132 normal elderly participants aged 70-90 years at baseline, who remained cognitively normal for the next seven years. We examined the fold opening and sulcal depth of sixteen (eight on each hemisphere) prominent sulci based on T1-weighted MRI using automated methods with visual quality control. The trajectory of each individual sulcus with respect to age was examined separately by linear mixed models. Fold opening was best modelled by cubic fits in five sulci, by quadratic models in six sulci and by linear models in five sulci, indicating an accelerated widening of a number of sulci in older age. Sulcal depth showed a significant linear decline in three sulci and a quadratic trend in one sulcus. Turning points of non-linear trajectories towards accelerated widening of the fold were found to be around the ages of 75 to 80, indicating an accelerated atrophy of the brain cortex starting in the late 70s. Our findings of cortical sulcal changes in normal ageing could provide a reference for studies of neurocognitive disorders, including neurodegenerative diseases, in the elderly.

Shen, X., Tian, X., Liu, T., Xu, F. & Tao, D. 2018, 'Continuous Dropout', IEEE Transactions on Neural Networks and Learning Systems.

Dropout has been proven to be an effective algorithm for training robust deep networks because of its ability to prevent overfitting by avoiding the co-adaptation of feature detectors. Current explanations of dropout include bagging, naive Bayes, regularization, and sex in evolution. According to the activation patterns of neurons in the human brain, when faced with different situations, the firing rates of neurons are random and continuous, not binary as current dropout does. Inspired by this phenomenon, we extend the traditional binary dropout to continuous dropout. On the one hand, continuous dropout is considerably closer to the activation characteristics of neurons in the human brain than traditional binary dropout. On the other hand, we demonstrate that continuous dropout has the property of avoiding the co-adaptation of feature detectors, which suggests that we can extract more independent feature detectors for model averaging in the test stage. We introduce the proposed continuous dropout to a feedforward neural network and comprehensively compare it with binary dropout, adaptive dropout, and DropConnect on Modified National Institute of Standards and Technology, Canadian Institute for Advanced Research-10, Street View House Numbers, NORB, and ImageNet large scale visual recognition competition-12. Thorough experiments demonstrate that our method performs better in preventing the co-adaptation of feature detectors and improves test performance.

Wang, B., Yuan, X., Gao, X., Li, X. & Tao, D. 2018, 'A Hybrid Level Set With Semantic Shape Constraint for Object Segmentation', IEEE Transactions on Cybernetics.

This paper presents a hybrid level set method for object segmentation. The method deconstructs the segmentation task into two procedures, i.e., shape transformation and curve evolution, which are alternately optimized until convergence. In this framework, only one shape prior encoded by shape context is utilized to estimate a transformation allowing the curve to have the same semantic expression as the shape prior, and curve evolution is driven by an energy functional with topology-preserving and kernelized terms. In this way, the proposed method is featured by the following advantages: 1) the hybrid paradigm gives the level set framework the ability to incorporate other shape-related techniques for shape descriptors and distances; 2) shape context endows one single prior with semanticity, and hence leads to performance competitive with methods using multiple shape priors; and 3) additionally, combining topology-preserving and kernelization mechanisms contributes to a more reasonable segmentation of textured and noisy images. To the best of our knowledge, this is the first work to propose a hybrid level set framework and utilize shape context to guide curve evolution. Our method is evaluated on synthetic, healthcare, and natural images and, as a result, shows competitive or even better performance compared to its counterparts.

Wang, H., Wu, J., Zhu, X., Chen, Y. & Zhang, C. 2018, 'Time-Variant Graph Classification', IEEE Transactions on Systems, Man, and Cybernetics: Systems.

Graphs are commonly used to represent objects, such as images and text, for pattern classification. In a dynamic world, an object may continuously evolve over time, and so does the graph extracted from the underlying object. These changes in graph structure with respect to the temporal order present a new representation of the graph, in which an object corresponds to a set of time-variant graphs. In this paper, we formulate a novel time-variant graph classification task and propose a new graph feature, called a graph-shapelet pattern, for learning and classifying time-variant graphs. Graph-shapelet patterns are compact and discriminative graph transformation subsequences. A graph-shapelet pattern can be regarded as a graphical extension of a shapelet--a class of discriminative features designed for vector-based temporal data classification. To discover graph-shapelet patterns, we propose to convert a time-variant graph sequence into time-series data and use the discovered shapelets to find graph transformation subsequences as graph-shapelet patterns. By converting each graph-shapelet pattern into a unique tokenized graph transformation sequence, we can measure the similarity between two graph-shapelet patterns and therefore classify time-variant graphs. Experiments on both synthetic and real-world data demonstrate the superior performance of the proposed algorithms.

Wang, R. & Tao, D. 2018, 'Training Very Deep CNNs for General Non-Blind Deconvolution', IEEE Transactions on Image Processing, vol. 27, no. 6, pp. 2897-2910.

© 1992-2012 IEEE. Non-blind image deconvolution is an ill-posed problem. The presence of noise and band-limited blur kernels makes the solution of this problem non-unique. Existing deconvolution techniques produce a residual between the sharp image and the estimation that is highly correlated with the sharp image, the kernel, and the noise. In most cases, different restoration models must be constructed for different blur kernels and different levels of noise, resulting in low computational efficiency or highly redundant model parameters. Here we aim to develop a single model that handles different types of kernels and different levels of noise: general non-blind deconvolution. Specifically, we propose a very deep convolutional neural network that predicts the residual between a pre-deconvolved image and the sharp image rather than the sharp image. The residual learning strategy makes it easier to train a single model for different kernels and different levels of noise, encouraging high effectiveness and efficiency. Quantitative evaluations demonstrate the practical applicability of the proposed model for different blur kernels. The model also shows the state-of-the-art performance on synthesized blurry images.

Wang, R., Liu, T. & Tao, D. 2018, 'Multiclass Learning With Partially Corrupted Labels', IEEE Transactions on Neural Networks and Learning Systems.

Traditional classification systems rely heavily on sufficient training data with accurate labels. However, the quality of the collected data depends on the labelers, among whom inexperienced labelers may exist and produce unexpected labels that may degrade the performance of a learning system. In this paper, we investigate the multiclass classification problem where a certain number of training examples are randomly labeled. Specifically, we show that this issue can be formulated as a label noise problem. To perform multiclass classification, we employ the widely used importance reweighting strategy to enable learning on noisy data to more closely reflect the results on noise-free data. We illustrate the applicability of this strategy to any surrogate loss function and to different classification settings. The proportion of randomly labeled examples is proved to be upper bounded and can be estimated under a mild condition. The convergence analysis ensures the consistency of the learned classifier with the optimal classifier with respect to clean data. Two instantiations of the proposed strategy are also introduced. Experiments on synthetic and real data verify that our approach yields improvements over traditional classifiers as well as robust classifiers. Moreover, we empirically demonstrate that the proposed strategy is effective even on asymmetrically noisy data.

Wang, S. & Cao, L. 2018, 'Inferring Implicit Rules by Learning Explicit and Hidden Item Dependency', IEEE Transactions on Systems Man and Cybernetics: Systems.

Revealing complex relations between entities (e.g., items within or between transactions) is of great significance for business optimization, prediction, and decision making. Such relations include not only co-occurrence-based explicit relations but also non-co-occurrence-based implicit ones. Explicit relations have been substantially studied by rule mining-based approaches, including association rule mining and causal rule discovery. In contrast, implicit relations have received much less attention but could be more actionable. In this paper, we focus on the implicit relations between items which rarely or never co-occur while each of them co-occurs with other identical items (link items) with a high probability. We propose a framework that integrates both explicit and hidden item dependencies, and a corresponding efficient algorithm, IRRMiner, that captures such implicit relations through implicit rule inference. Experimental results show that IRRMiner not only infers implicit rules of various sizes consisting of both frequent and infrequent items effectively, but also runs at least four times faster than IARMiner, a typical indirect association rule mining algorithm which can only mine size-2 indirect association rules between frequent items. Applied to recommendation, IRRMiner shows that the identified implicit rules can increase recommendation reliability.

Wang, W., Zhang, G. & Lu, J. 2018, 'Hierarchy Visualization for Group Recommender Systems', IEEE Transactions on Systems Man and Cybernetics: Systems, pp. 1-12.

Most recommender systems (RSs), especially group RSs, focus on methods and accuracy but lack explanations, so users find them difficult to trust. We present a hierarchy visualization method for group recommender (HVGR) systems to provide visual presentation and intuitive explanation. We first use a hierarchy graph to organize all the entities as nodes (e.g., neighbor nodes and recommendation nodes) and illustrate the overall recommender process using edges. Second, a pie chart is attached to every entity node in which each slice represents a single member, which makes it easy to track the influence of each member on a specific entity. HVGR can be extended to adapt to different pseudouser modeling methods by resizing group member nodes and pseudouser nodes. It can also be easily extended to individual RSs through the use of a single-member group. An implementation has been developed and its feasibility tested using a real data set.

Wu, J., Pan, S., Zhu, X., Zhang, C. & Wu, X. 2018, 'Multi-instance learning with discriminative bag mapping', IEEE Transactions on Knowledge and Data Engineering, vol. 30, no. 6, pp. 1065-1080.

© 1989-2012 IEEE. Multi-instance learning (MIL) is a useful tool for tackling labeling ambiguity in learning because it allows a bag of instances to share one label. Bag mapping transforms a bag into a single instance in a new space via instance selection and has drawn significant attention recently. To date, most existing work is based on the original space, using all instances inside each bag for bag mapping, and the selected instances are not directly tied to an MIL objective. As a result, it is difficult to guarantee the distinguishing capacity of the selected instances in the new bag mapping space. In this paper, we propose a discriminative mapping approach for multi-instance learning (MILDM) that aims to identify the best instances to directly distinguish bags in the new mapping space. Accordingly, each instance bag can be mapped using the selected instances to a new feature space, and hence any generic learning algorithm, such as an instance-based learning algorithm, can be used to derive learning models for multi-instance classification. Experiments and comparisons on eight different types of real-world learning tasks (including 14 data sets) demonstrate that MILDM outperforms the state-of-the-art bag mapping multi-instance learning approaches. Results also confirm that MILDM achieves balanced performance between runtime efficiency and classification effectiveness.

Wu, J., Pan, S., Zhu, X., Zhang, C. & Yu, P.S. 2018, 'Multiple Structure-View Learning for Graph Classification', IEEE Transactions on Neural Networks and Learning Systems.

Many applications involve objects containing structure and rich content information, each describing different feature aspects of the object. Graph learning and classification is a common tool for handling such objects. To date, existing graph classification has been limited to the single-graph setting with each object being represented as one graph from a single structure-view. This inherently limits its use to the classification of complicated objects containing complex structures and uncertain labels. In this paper, we advance graph classification to handle multigraph learning for complicated objects from multiple structure views, where each object is represented as a bag containing several graphs and the label is only available for each graph bag but not individual graphs inside the bag. To learn such graph classification models, we propose a multistructure-view bag constrained learning (MSVBL) algorithm, which aims to explore substructure features across multiple structure views for learning. By enabling joint regularization across multiple structure views and enforcing labeling constraints at the bag and graph levels, MSVBL is able to discover the most effective substructure features across all structure views. Experiments and comparisons on real-world data sets validate and demonstrate the superior performance of MSVBL in representing complicated objects as multigraph for classification, e.g., MSVBL outperforms the state-of-the-art multiview graph classification and multiview multi-instance learning approaches.

Xie, Y., Tao, D., Zhang, W., Liu, Y., Zhang, L. & Qu, Y. 2018, 'On Unifying Multi-view Self-Representations for Clustering by Tensor Multi-rank Minimization', International Journal of Computer Vision, pp. 1-23.
View description>>

© 2018 Springer Science+Business Media, LLC, part of Springer Nature In this paper, we address the multi-view subspace clustering problem. Our method utilizes the circulant algebra for tensor, which is constructed by stacking the subspace representation matrices of different views and then rotating, to capture the low rank tensor subspace so that the refinement of the view-specific subspaces can be achieved, as well as the high order correlations underlying multi-view data can be explored. By introducing a recently proposed tensor factorization, namely tensor-Singular Value Decomposition (t-SVD) (Kilmer et al. in SIAM J Matrix Anal Appl 34(1):148–172, 2013), we can impose a new type of low-rank tensor constraint on the rotated tensor to ensure the consensus among multiple views. Different from traditional unfolding based tensor norm, this low-rank tensor constraint has optimality properties similar to that of matrix rank derived from SVD, so the complementary information can be explored and propagated among all the views more thoroughly and effectively. The established model, called t-SVD based Multi-view Subspace Clustering (t-SVD-MSC), falls into the applicable scope of augmented Lagrangian method, and its minimization problem can be efficiently solved with theoretical convergence guarantee and relatively low computational complexity. Extensive experimental testing on eight challenging image datasets shows that the proposed method has achieved highly competent objective performance compared to several state-of-the-art multi-view clustering methods.
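The t-SVD machinery described above can be illustrated with a minimal NumPy sketch of the tensor singular-value thresholding step: FFT along the third mode, soft-thresholding of each frontal slice's singular values, then the inverse FFT. This is only an illustration of the low-rank proximal operator (function and variable names are hypothetical), not the full t-SVD-MSC solver:

```python
import numpy as np

def tsvd_shrink(T, tau):
    """Singular-value thresholding under t-SVD: take the FFT along the
    third mode, soft-threshold the singular values of each frontal
    slice, and invert. This is the proximal operator of the tensor
    nuclear norm used to enforce low tensor multi-rank (a sketch)."""
    Tf = np.fft.fft(T, axis=2)                # frontal slices in the Fourier domain
    out = np.zeros_like(Tf)
    for k in range(T.shape[2]):
        U, s, Vh = np.linalg.svd(Tf[:, :, k], full_matrices=False)
        s = np.maximum(s - tau, 0.0)          # soft-threshold singular values
        out[:, :, k] = (U * s) @ Vh
    return np.real(np.fft.ifft(out, axis=2))
```

With `tau = 0` the operator is the identity; larger `tau` shrinks the tensor toward lower multi-rank, which is the role it plays inside the augmented Lagrangian iterations.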

Xu, Z., Huang, S., Zhang, Y. & Tao, D. 2018, 'Webly-Supervised Fine-Grained Visual Categorization via Deep Domain Adaptation', IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 5, pp. 1100-1113.
View description>>

© 1979-2012 IEEE. Learning visual representations from web data has recently attracted attention for object recognition. Previous studies have mainly focused on overcoming label noise and data bias and have shown promising results by learning directly from web data. However, we argue that it might be better to transfer knowledge from existing human labeling resources to improve performance at nearly no additional cost. In this paper, we propose a new semi-supervised method for learning via web data. Our method has the unique design of exploiting strong supervision, i.e., in addition to standard image-level labels, our method also utilizes detailed annotations including object bounding boxes and part landmarks. By transferring as much knowledge as possible from existing strongly supervised datasets to weakly supervised web images, our method can benefit from sophisticated object recognition algorithms and overcome several typical problems found in webly-supervised learning. We consider the problem of fine-grained visual categorization, in which existing training resources are scarce, as our main research objective. Comprehensive experimentation and extensive analysis demonstrate encouraging performance of the proposed approach, which, at the same time, delivers a new pipeline for fine-grained visual categorization that is likely to be highly effective for real-world applications.

Xuan, J., Lu, J., Zhang, G., Xu, R.Y.D. & Luo, X. 2018, 'Doubly Nonparametric Sparse Nonnegative Matrix Factorization Based on Dependent Indian Buffet Processes', IEEE Transactions on Neural Networks and Learning Systems, pp. 1835-1849.
View description>>

Sparse nonnegative matrix factorization (SNMF) aims to factorize a data matrix into two optimized nonnegative sparse factor matrices, which could benefit many tasks, such as document-word co-clustering. However, the traditional SNMF typically assumes the number of latent factors (i.e., dimensionality of the factor matrices) to be fixed. This assumption makes it inflexible in practice. In this paper, we propose a doubly sparse nonparametric NMF framework to mitigate this issue by using dependent Indian buffet processes (dIBP). We apply a correlation function for the generation of two stick weights associated with each column pair of factor matrices while still maintaining their respective marginal distribution specified by IBP. As a consequence, the generation of two factor matrices will be columnwise correlated. Under this framework, two classes of correlation function are proposed: 1) using bivariate Beta distribution and 2) using Copula function. Compared with the single IBP-based NMF, this framework jointly makes two factor matrices nonparametric and sparse, which could be applied to broader scenarios, such as co-clustering. The proposed framework is much more flexible than Gaussian process-based and hierarchical Beta process-based dIBPs in terms of allowing the two corresponding binary matrix columns to have greater variations in their nonzero entries. Our experiments on synthetic data show the merits of the proposed framework compared with the state-of-the-art models in respect of factorization efficiency, sparsity, and flexibility. Experiments on real-world data sets demonstrate its efficiency in document-word co-clustering tasks.

Yang, E., Deng, C., Li, C., Liu, W., Li, J. & Tao, D. 2018, 'Shared Predictive Cross-Modal Deep Quantization', IEEE Transactions on Neural Networks and Learning Systems.
View description>>

IEEE With explosive growth of data volume and ever-increasing diversity of data modalities, cross-modal similarity search, which conducts nearest neighbor search across different modalities, has been attracting increasing interest. This paper presents a deep compact code learning solution for efficient cross-modal similarity search. Many recent studies have proven that quantization-based approaches perform generally better than hashing-based approaches on single-modal similarity search. In this paper, we propose a deep quantization approach, which is among the early attempts of leveraging deep neural networks into quantization-based cross-modal similarity search. Our approach, dubbed shared predictive deep quantization (SPDQ), explicitly formulates a shared subspace across different modalities and two private subspaces for individual modalities, and representations in the shared subspace and the private subspaces are learned simultaneously by embedding them to a reproducing kernel Hilbert space, where the mean embedding of different modality distributions can be explicitly compared. In addition, in the shared subspace, a quantizer is learned to produce the semantics preserving compact codes with the help of label alignment. Thanks to this novel network architecture in cooperation with supervised quantization training, SPDQ can preserve intramodal and intermodal similarities as much as possible and greatly reduce quantization error. Experiments on two popular benchmarks corroborate that our approach outperforms state-of-the-art methods.

Yang, X., Wang, M. & Tao, D. 2018, 'Person Re-Identification with Metric Learning Using Privileged Information', IEEE Transactions on Image Processing, vol. 27, no. 2, pp. 791-805.
View description>>

© 2017 IEEE. Despite the promising progress made in recent years, person re-identification remains a challenging task due to complex variations in human appearances from different camera views. This paper presents a logistic discriminant metric learning method for this challenging problem. Different from most existing metric learning algorithms, it exploits both original data and auxiliary data during training, which is motivated by the new machine learning paradigm of learning using privileged information. Such privileged information is a kind of auxiliary knowledge, which is only available during training. Our goal is to learn an optimal distance function by constructing a locally adaptive decision rule with the help of privileged information. We jointly learn two distance metrics by minimizing the empirical loss penalizing the difference between the distance in the original space and that in the privileged space. In our setting, the distance in the privileged space functions as a local decision threshold, which guides the decision making in the original space like a teacher. The metric learned from the original space is used to compute the distance between a probe image and a gallery image during testing. In addition, we extend the proposed approach to a multi-view setting which is able to explore the complementation of multiple feature representations. In the multi-view setting, multiple metrics corresponding to different original features are jointly learned, guided by the same privileged information. Besides, an effective iterative optimization scheme is introduced to simultaneously optimize the metrics and the assigned metric weights. Experimental results on several widely-used data sets demonstrate that the proposed approach is superior to global decision threshold-based methods and outperforms most state-of-the-art results.
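The idea of a privileged-space distance acting as a local decision threshold can be sketched as a simple NumPy loss function. This is a heavily simplified illustration under assumed notation (Mahalanobis-style metrics `M` and `Mp`, a logistic penalty), not the paper's exact objective:

```python
import numpy as np

def lupi_metric_loss(X, Xp, M, Mp, pairs, labels):
    """Simplified LUPI-style metric-learning loss (a sketch): for each
    pair (i, j), the distance in the privileged space acts as a local
    threshold for the distance in the original space. labels[t] is +1
    for a matched pair and -1 for a mismatched one. M and Mp are PSD
    metric matrices for the original and privileged features."""
    loss = 0.0
    for (i, j), y in zip(pairs, labels):
        d = X[i] - X[j]
        dp = Xp[i] - Xp[j]
        d_orig = d @ M @ d                    # squared distance, original space
        d_priv = dp @ Mp @ dp                 # squared distance, privileged space
        # logistic penalty: matched pairs should satisfy d_orig < d_priv,
        # mismatched pairs the reverse
        loss += np.log1p(np.exp(y * (d_orig - d_priv)))
    return loss / len(pairs)
```

In the actual method both metrics are learned jointly by minimizing such a loss; here the function only evaluates it for given metrics.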

Yu, J., Hong, C., Rui, Y. & Tao, D. 2018, 'Multitask Autoencoder Model for Recovering Human Poses', IEEE Transactions on Industrial Electronics, vol. 65, no. 6, pp. 5060-5068.
View description>>

© 1982-2012 IEEE. Human pose recovery in videos is usually conducted by matching 2-D image features and retrieving relevant 3-D human poses. In the retrieving process, the mapping between images and poses is critical. Traditional methods treat this mapping as either local joint detection or global joint localization, which limits their recovery performance since the two tasks are actually unified. In this paper, we propose a novel pose recovery framework that simultaneously learns the tasks of joint localization and joint detection. To obtain this framework, multiple manifold learning is used and the shared parameter is calculated. With them, multiple manifold regularizers are integrated and generalized eigendecomposition is utilized to achieve parameter optimization. In this way, pose recovery is boosted by both global mapping and local refinement. Experimental results on two popular datasets demonstrate that the recovery error is reduced by 10%-20%, which proves the performance improvement of the proposed method.

Yu, Z., Yu, J., Xiang, C., Fan, J. & Tao, D. 2018, 'Beyond Bilinear: Generalized Multimodal Factorized High-Order Pooling for Visual Question Answering', IEEE Transactions on Neural Networks and Learning Systems.
View description>>

IEEE Visual question answering (VQA) is challenging, because it requires a simultaneous understanding of both visual content of images and textual content of questions. To support the VQA task, we need to find good solutions for the following three issues: 1) fine-grained feature representations for both the image and the question; 2) multimodal feature fusion that is able to capture the complex interactions between multimodal features; and 3) automatic answer prediction that is able to consider the complex correlations between multiple diverse answers for the same question. For fine-grained image and question representations, a "coattention" mechanism is developed using a deep neural network (DNN) architecture to jointly learn the attentions for both the image and the question, which can allow us to reduce the irrelevant features effectively and obtain more discriminative features for image and question representations. For multimodal feature fusion, a generalized multimodal factorized high-order pooling approach (MFH) is developed to achieve more effective fusion of multimodal features by exploiting their correlations sufficiently, which can further result in superior VQA performance as compared with the state-of-the-art approaches. For answer prediction, the Kullback-Leibler divergence is used as the loss function to achieve precise characterization of the complex correlations between multiple diverse answers with the same or similar meaning, which can allow us to achieve faster convergence rate and obtain slightly better accuracy on answer prediction. A DNN architecture is designed to integrate all these aforementioned modules into a unified model for achieving superior VQA performance. With an ensemble of our MFH models, we achieve the state-of-the-art performance on the large-scale VQA data sets and win runner-up place in the VQA Challenge 2017.

Zeng, X. & Lu, J. 2018, 'Decision Support Systems with Uncertainties in Big Data Environments', Knowledge-Based Systems, vol. 143, pp. 317-326.

Zhang, J., Yu, J. & Tao, D. 2018, 'Local Deep-Feature Alignment for Unsupervised Dimension Reduction', IEEE Transactions on Image Processing, vol. 27, no. 5, pp. 2420-2432.
View description>>

© 1992-2012 IEEE. This paper presents an unsupervised deep-learning framework named local deep-feature alignment (LDFA) for dimension reduction. We construct a neighbourhood for each data sample and learn a local stacked contractive auto-encoder (SCAE) from the neighbourhood to extract the local deep features. Next, we exploit an affine transformation to align the local deep features of each neighbourhood with the global features. Moreover, we derive an approach from LDFA to explicitly map a new data sample into the learned low-dimensional subspace. The advantage of the LDFA method is that it learns both local and global characteristics of the data sample set: the local SCAEs capture local characteristics contained in the data set, while the global alignment procedures encode the interdependencies between neighbourhoods into the final low-dimensional feature representations. Experimental results on data visualization, clustering, and classification show that the LDFA method is competitive with several well-known dimension reduction techniques, and that exploiting locality in deep learning is a research topic worth further exploring.

Zhang, Q., Liu, Y., Blum, R.S., Han, J. & Tao, D. 2018, 'Sparse representation based multi-sensor image fusion for multi-focus and multi-modality images: A review', Information Fusion, vol. 40, pp. 57-75.
View description>>

© 2017 Elsevier B.V. As a result of several successful applications in computer vision and image processing, sparse representation (SR) has attracted significant attention in multi-sensor image fusion. Unlike the traditional multiscale transforms (MSTs) that presume the basis functions, SR learns an over-complete dictionary from a set of training images for image fusion, and it achieves more stable and meaningful representations of the source images. By doing so, the SR-based fusion methods generally outperform the traditional MST image fusion methods in both subjective and objective tests. In addition, they are less susceptible to mis-registration among the source images, thus facilitating the practical applications. This survey paper proposes a systematic review of the SR-based multi-sensor image fusion literature, highlighting the pros and cons of each category of approaches. Specifically, we start by performing a theoretical investigation of the entire system from three key algorithmic aspects, (1) sparse representation models; (2) dictionary learning methods; and (3) activity levels and fusion rules. Subsequently, we show how the existing works address these scientific problems and design the appropriate fusion rules for each application such as multi-focus image fusion and multi-modality (e.g., infrared and visible) image fusion. At last, we carry out some experiments to evaluate the impact of these three algorithmic components on the fusion performance when dealing with different applications. This article is expected to serve as a tutorial and source of reference for researchers preparing to enter the field or who desire to employ the sparse representation theory in other fields.
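The "activity levels and fusion rules" component surveyed above can be illustrated with the widely used "max-L1" rule: given sparse codes of corresponding patches from two sources over a shared dictionary, keep the code with the larger l1 norm and reconstruct from it. A minimal NumPy sketch (names are hypothetical; sparse coding itself is assumed done upstream):

```python
import numpy as np

def fuse_patches(D, codes_a, codes_b):
    """'Max-L1' fusion rule common in SR-based image fusion (a sketch):
    for each patch, keep the sparse code with the larger l1 norm
    (higher activity level) and reconstruct from the shared dictionary.
    codes_a / codes_b: (n_atoms, n_patches) sparse codes of two sources."""
    a1 = np.abs(codes_a).sum(axis=0)              # activity level, source A
    b1 = np.abs(codes_b).sum(axis=0)              # activity level, source B
    fused = np.where(a1 >= b1, codes_a, codes_b)  # column-wise selection
    return D @ fused                              # reconstructed fused patches
```

Other rules surveyed in the paper (e.g. weighted averaging of codes) drop into the same slot by replacing the `np.where` selection.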

Zhang, Q., Wu, J., Zhang, Q., Zhang, P., Long, G. & Zhang, C. 2018, 'Dual influence embedded social recommendation', World Wide Web, vol. 21, no. 4, pp. 849-874.
View description>>

© 2017 Springer Science+Business Media, LLC Recommender systems are designed to solve the information overload problem and have been widely studied for many years. Conventional recommender systems tend to take ratings of users on products into account. With the development of Web 2.0, Rating Networks in many online communities (e.g. Netflix and Douban) allow users not only to co-comment or co-rate their interests (e.g. movies and books), but also to build explicit social networks. Recent recommendation models use various social data, such as observable links, but these explicit pieces of social information incorporating recommendations normally adopt similarity measures (e.g. cosine similarity) to evaluate the explicit relationships in the network - they do not consider the latent and implicit relationships in the network, such as social influence. A target user’s purchase behavior or interest, for instance, is not always determined by their directly connected relationships and may be significantly influenced by the high reputation of people they do not know in the network, or others who have expertise in specific domains (e.g. famous social communities). In this paper, based on the above observations, we first simulate the social influence diffusion in the network to find the global and local influence nodes and then embed this dual influence data into a traditional recommendation model to improve accuracy. Mathematically, we formulate the global and local influence data as new dual social influence regularization terms and embed them into a matrix factorization-based recommendation model. Experiments on real-world datasets demonstrate the effective performance of the proposed method.
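The influence-regularized matrix factorization described above can be sketched as an objective in a few lines of NumPy. This is a simplified illustration of the idea, with hypothetical names and a single influence term rather than the paper's dual global/local regularizers:

```python
import numpy as np

def influence_mf_loss(R, mask, U, V, infl, lam=0.1, mu=0.1):
    """Matrix-factorization objective with a social-influence
    regulariser (a sketch of the idea, not the paper's exact model):
    observed ratings are fit by U @ V.T, and each user's latent vector
    is pulled toward a weighted average of their influencers' vectors.
    infl[i] is a list of (user index, weight) pairs for user i."""
    pred_err = np.sum(mask * (R - U @ V.T) ** 2)  # rating reconstruction
    social = 0.0
    for i, nbrs in infl.items():
        if nbrs:
            avg = sum(w * U[k] for k, w in nbrs)
            social += np.sum((U[i] - avg) ** 2)   # influence smoothing
    return pred_err + lam * social + mu * (np.sum(U**2) + np.sum(V**2))
```

Minimizing this with any gradient method yields factors that both reconstruct observed ratings and respect the diffused influence structure.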

Zhao, Y., You, X., Yu, S., Xu, C., Yuan, W., Jing, X.Y., Zhang, T. & Tao, D. 2018, 'Multi-view manifold learning with locality alignment', Pattern Recognition, vol. 78, pp. 154-166.
View description>>

© 2018 Elsevier Ltd Manifold learning aims to discover the low dimensional space where the input high dimensional data are embedded by preserving the geometric structure. Unfortunately, almost all the existing manifold learning methods were proposed under the single view scenario, and they cannot be straightforwardly applied to multiple feature sets. Although concatenating multiple views into a single feature provides a plausible solution, it remains an open question how to better explore the independence and interdependence of different views while conducting manifold learning. In this paper, we propose a multi-view manifold learning with locality alignment (MVML-LA) framework to learn a common yet discriminative low-dimensional latent space that contains sufficient information about the original inputs. Both a supervised algorithm (S-MVML-LA) and an unsupervised algorithm (U-MVML-LA) are developed. Experiments on benchmark real-world datasets demonstrate the superiority of our proposed S-MVML-LA and U-MVML-LA over existing state-of-the-art methods.

Zhu, F., Gao, J., Xu, C., Yang, J. & Tao, D. 2018, 'On Selecting Effective Patterns for Fast Support Vector Regression Training', IEEE Transactions on Neural Networks and Learning Systems.
View description>>

IEEE It is time-consuming to train support vector regression (SVR) for large-scale problems even with efficient quadratic programming solvers. This issue is particularly serious when tuning the model's parameters. One way to address the issue is to reduce the problem's scale by selecting a subset of the training set. This paper presents a fast pattern selection method that scans the training data set to reduce a problem's scale. In particular, we find the k-nearest neighbors (kNNs) in a local region around each pattern's target value, and then decide whether to retain the pattern according to the distribution of its nearest neighbors: a pattern is retained when there is a high probability that it lies outside the ϵ-tube. Since the kNNs of a pattern are found in a very small region, it is fast to scan the whole training data set. The proposed method deals with the year prediction Million Song Data set, which contains 463,715 patterns, within 10 s on a personal computer with an Intel Core i5-4690 CPU at 3.50 GHz and 8 GB DRAM. An additional advantage of the proposed method is that it can predefine the size of the selected subset according to the training set. Comprehensive empirical evaluations demonstrate that the proposed method can significantly eliminate redundant patterns for SVR training with only a slight decrease in performance.
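The selection idea can be sketched in NumPy: a pattern whose target deviates noticeably from its neighbours' targets is likely to fall outside the ϵ-tube and is worth keeping. This is a simplified rule under assumed names, not the paper's exact criterion (which also bounds the search to a local region for speed):

```python
import numpy as np

def select_patterns(X, y, k=5, eps=0.1):
    """kNN-based pattern selection for SVR (a simplified sketch): a
    pattern whose target deviates from the mean target of its k nearest
    input-space neighbours by more than eps is likely to lie outside
    the eps-tube, so it is retained as a potential support vector."""
    keep = []
    for i in range(len(X)):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                        # exclude the pattern itself
        nbrs = np.argsort(d)[:k]             # indices of k nearest neighbours
        if abs(y[i] - y[nbrs].mean()) > eps:
            keep.append(i)
    return np.array(keep, dtype=int)
```

Training an off-the-shelf SVR on `X[keep], y[keep]` instead of the full set is then what reduces the quadratic-programming scale.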

## Conferences

Herse, S., Vitale, J., Ebrahimian, D., Tonkin, M., Ojha, S., Sidra, S., Johnston, B., Phillips, S., Gudi, S.L.K.C., Clark, J., Judge, W. & Williams, M.A. 2018, 'Bon Appetit! Robot Persuasion for Food Recommendation', ACM/IEEE International Conference on Human-Robot Interaction, pp. 125-126.
View description>>

© 2018 Authors. The integration of social robots within service industries requires social robots to be persuasive. We conducted a vignette experiment to investigate the persuasiveness of a human, a robot, and an information kiosk when offering consumers a restaurant recommendation. We found that embodiment type significantly affects the persuasiveness of the agent, but only when a specific recommendation sentence is used. These preliminary results suggest that human-like features of an agent may serve to boost persuasion in recommendation systems. However, the extent of the effect is determined by the nature of the given recommendation.

Tonkin, M., Vitale, J., Herse, S., Williams, M.A., Judge, W. & Wang, X. 2018, 'Design Methodology for the UX of HRI: A Field Study of a Commercial Social Robot at an Airport', ACM/IEEE International Conference on Human-Robot Interaction, pp. 407-415.
View description>>

© 2018 ACM. Research in robotics and human-robot interaction is becoming more and more mature. Additionally, more affordable social robots are being released commercially. Thus, industry is currently demanding ideas for viable commercial applications to situate social robots in public spaces and enhance customers' experience. However, the present literature in human-robot interaction does not provide a clear set of guidelines and a methodology to (i) identify commercial applications for robotic platforms that position users' needs at the centre of the discussion and (ii) ensure the creation of a positive user experience. With this paper we propose to fill this gap by providing a methodology for the design of robotic applications that includes these desired features, suitable for adoption by researchers, industry, business and government organisations. As we show in this paper, we successfully employed this methodology in an exploratory field study involving the trial implementation of a commercially available social humanoid robot at an airport.

Vitale, J., Tonkin, M., Herse, S., Ojha, S., Clark, J., Williams, M., Wang, X. & Judge, W. 2018, 'Be More Transparent and Users Will Like You: A Robot Privacy and User Experience Design Experiment', Proceedings of 2018 ACM/IEEE International Conference on Human- Robot Interaction, ACM/IEEE International Conference on Human-Robot Interaction 2018, ACM, Chicago, pp. 379-387.