University of Technology, Sydney

Professor Longbing Cao

Core Member, Joint Research Centre in Intelligent Systems Membership

Bachelor Degree in Industrial Electric Automation, M.Sc (CUMT), PhD (UTS), PhD (CAS)

Senior Member, Institute of Electrical and Electronics Engineers
Member, Association for Computing Machinery

Email: LongBing.Cao@uts.edu.au
Phone: +61 2 9514 4477

Biography

Longbing Cao was awarded a PhD in computing science at UTS and another PhD in Pattern Recognition and Intelligent Systems from the Chinese Academy of Sciences. He is a professor of information technology at the Faculty of Engineering and IT, UTS, the founding Director of the UTS Advanced Analytics Institute, and a core member of the Data Sciences and Knowledge Discovery Lab at the Centre for Quantum Computation and Intelligent Systems in the same faculty. He is also the Research Leader of the Data Mining Program at the Australian Capital Markets Cooperative Research Centre.

Before joining UTS, he spent several years in research at the Chinese Academy of Sciences, and managed and led industry and commercial projects in telecommunications, banking and publishing as a manager or chief technology officer.

He now leads research in data science, behavior informatics, data mining, machine learning, agent mining and complex intelligent systems, with a particular focus on the enterprise applications of deep data analytics and active customer, behavior and business management in the real world. He has published 3 monographs, 4 edited books, 16 proceedings, 11 book chapters, and around 200 journal and conference publications in these areas, in venues including IJCAI, KDD, ICDE, ICDM, AAMAS, WWW and IEEE Transactions.

Professional

Longbing is a Senior Member of the IEEE (Computer Society and SMC Society) and a member of ACM. He chairs the ACM SIGKDD Australia and New Zealand Chapter, the IEEE Task Force on Data Science and Advanced Analytics, and the IEEE Task Force on Behavioral, Economic and Socio-cultural Computing. He serves as associate editor and guest editor for journals such as ACM Computing Surveys, and as general co-chair of conferences such as KDD2015.

He has held chairing roles and served on program committees in both data mining and multiagent systems, for conferences such as the International Conference on Data Mining (ICDM), the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), and the International Conference on Autonomous Agents and Multiagent Systems (AAMAS). In particular, he initiated several international workshops and special interest groups on emerging technologies, such as Data Science, Domain Driven Data Mining (DDDM), Agent Mining (Agents and Data Mining Interaction, ADMI), Behavior Informatics (BI), Financial Data Mining (FDM), and Educational Data Mining (EDM).

Most recently, he founded the International Conference on Data Science and Advanced Analytics, co-sponsored by ACM and IEEE in 2014, which will be the first conference on data science fully sponsored by IEEE from 2015. He was also a key driver in establishing the International Conference on Behavioral, Economic and Socio-cultural Computing, which is also co-sponsored by IEEE and ACM and will be an IEEE conference from 2015.

Teaching areas

Longbing is currently research-only. He has experience teaching system development and user interface design, and lecturing on data mining, behavior analytics, business intelligence and advanced analytics.

Research

Research interests
Longbing's major research interests cover data science (data analytics and mining), machine learning, and artificial intelligence, in particular the following areas:

  • Data science and big data analytics: as one of the pioneering researchers, he initiates and leads research, education and development in data science. His interest in data mining and machine learning has mainly focused on complex data analytics, non-iidness learning for big data analytics, coupled object analysis, pattern relation learning, domain-driven data mining/actionable knowledge discovery, combined mining, and multi-structured data learning for complex data and environments, as well as infrastructure, solutions, systems, algorithms and services for enterprise data mining and business analytics applications.
  • Behavior and social informatics: he proposed and has been leading research on behavior informatics and behavior computing, focusing on complex behavior and social modeling and representation, behavior and social networking analysis, social media and sentiment analysis, group/community social behavior analysis, negative behavior analysis, behavior risk and impact modeling and analysis, high-risk, high-impact and high-utility behavior and social pattern analysis, behavior evolution, active behavior management, and domain-specific behavior and social analysis applications.
  • Agent mining: a proposed concept involving fundamental infrastructure, agent-based distributed multi-source data mining, agent behavior learning, agent-based cloud analytics, and applications such as financial trading agents.
  • Artificial intelligence and intelligent systems: including knowledge representation, software engineering and system design for open complex intelligent systems, metasynthetic computing and engineering, and learning systems.



In Data Science and Advanced Analytics, Longbing:

  • Was one of the very few who originally advocated the concept of “Data Science”, and founded and chairs the IEEE Task Force on Data Science and Advanced Analytics, the IEEE International Conference on Data Science and Advanced Analytics, and the ACM SIGKDD Australia and New Zealand Chapter;
  • Formed the “Data Science and Knowledge Discovery” lab at QCIS, and then the Advanced Analytics Institute (AAi) at UTS, dedicated to data science research, education and development (RED);
  • Established the world's first Master of Analytics (Research) and PhD Thesis: Analytics at UTS;
  • Founded and drives the annual Big Data Summit in Sydney and Canberra, which widely engages industry, government and academia; and
  • Provides independent and insightful consultancies to many tier-one industry and government organizations in Australia and globally.

Research supervision: Yes


Longbing is registered as a principal supervisor of PhD and Master by Research students in programs including Analytics Master by Research, PhD Thesis: Analytics, and PhD in Computing Sciences, in the following particular areas:

  • Data science (big data analytics, data/text mining) and machine learning
  • Behavior and social informatics and computing (modeling, analysis, learning, mining and management)
  • Business analytics and corporate analytics
  • Financial data analytics and economic computing
  • Enterprise data analytics including compliance, fraud, outlier and risk analytics and management
  • Complex intelligent systems

Projects

Selected Peer-Assessed Projects

Joint Sino-Australian Research Institute on Big Data Industry

ASIC Executive Analytics Workshop

Analytics for Client Engagement- W/O no 15.11-1-1-1

Modelling and Discovering Complex Interaction Relations Hidden in Group Behaviours in Businesses, Online and Social Communities

Collaboration with IIT to Build a Research Network for Big Data Analytics

DIPB Data Mining/Analytics PhD Scholarships

Data Mining of Learning Behaviors and Interactions for Improved Academic Sentiment and Performance Transfer - OLT ID 13-3058

Mining Complex Concurrency Relationship Patterns for Dynamic Customer/Asset Interaction Modeling through Novel Industrial Behavior Networks

Sponsorship to AAI and FEIT for Student Research

ATO (13.256): Self-finalising Lodgement Analytical Model

Collaboration with Shanghai Jiaotong University and other partners to build a Research Network for Big Data Analytics

Data Mining Customer Satisfaction

Debt Collection Optimisation

Detecting Significant Changes in Organisation-Customer Interactions Leading to Non-Compliance

Effective Profiling and Detection of At-Risk Taxpayers to Strengthen ATO Compliance

The identification of 'Blackspots' across Australia

ATO Debt Collection Optimisation

PAKDD 2013: The 17th Pacific Asia Conference on Knowledge Discovery and Data Mining

Pattern Discovery of Discriminating Behaviour Associated with Hidden Communities

Review NSW Government's Expenditure Data Cube (EDC) structure and product NSW Procurement, NSW Department of Finance and Services (RFX0562)

Data Mining Driven, Deep Understanding of Interventions for Improving Compliance

Online declaration modeling pilot project

Pattern Analysis and Risk Control of E-Commerce Transactions to Secure Online Payments

Discovering Activity Patterns Driven by High Impacts in Heterogeneous and Imbalanced Data

Health Insurance Fraud Control and Pattern Analysis - Capital Markets CRC Scholarship - Bo Liu

Health Insurance Fraud Control and Pattern Analysis - Capital Markets CRC Scholarship - Yanshan Xiao

Health Insurance Fraud Control and Pattern Analysis - Capital Markets CRC Scholarship - Zhigang Zheng

Health Insurance Fraud Control and Pattern Analysis - Capital Markets CRC Scholarship - Ziye Zuo

Pilot Data Mining: Mining Discriminative Patterns Showing Significant Behavioural Difference between Lapse & Active Customers

Yanhuang Science and Technology Park

Centrelink Fraud Investigation: Opportunities and Test

Cheque Fraud Detection Project

Domain-Driven Actionable Link Discovery

Risk Analysis and Control on E-Commerce Transactions

Efficient Techniques for Mining Exceptional Patterns

Mining Activity Transactions to Strengthen Debt Prevention

Smart Image Searching: Multi-Category Object Detection

Integration of multi-agent systems and data mining in financial markets

UTS Research Strength: Centre for Intelligent Information Systems

Self-Organizing Relational Link Discovery in Mixed Data Types for Compliance

Provision of Income Reporting Data Analysis Services

Publications

Journal articles

Fan, X., Xu, R.Y.D., Cao, L. & Song, Y. 2017, 'Learning Nonparametric Relational Models by Conjugately Incorporating Node Information in a Network', IEEE Transactions on Cybernetics, vol. 47, no. 3, pp. 589-599.

Relational model learning is useful for numerous practical applications. Many algorithms have been proposed in recent years to tackle this important yet challenging problem. Existing algorithms utilize only binary directional link data to recover hidden network structures. However, there exists far richer and more meaningful information in other parts of a network which one can (and should) exploit. The attributes associated with each node, for instance, contain crucial information to help practitioners understand the underlying relationships in a network. For this reason, in this paper, we propose two models and their solutions, namely the node-information involved mixed-membership model and the node-information involved latent-feature model, in an effort to systematically incorporate additional node information. To effectively achieve this aim, node information is used to generate individual sticks of a stick-breaking process. In this way, not only can we avoid the need to prespecify the number of communities beforehand, the algorithm also encourages that nodes exhibiting similar information have a higher chance of assigning the same community membership. Substantial efforts have been made toward achieving the appropriateness and efficiency of these models, including the use of conjugate priors. We evaluate our framework and its inference algorithms using real-world data sets, which show the generality and effectiveness of our models in capturing implicit network structures.

Ghosh, S., Li, J., Cao, L. & Ramamohanarao, K. 2017, 'Septic shock prediction for ICU patients via coupled HMM walking on sequential contrast patterns', Journal of Biomedical Informatics, vol. 66, pp. 19-31.

BACKGROUND AND OBJECTIVE: Critical care patient events like sepsis or septic shock in intensive care units (ICUs) are dangerous complications which can cause multiple organ failures and eventual death. Preventive prediction of such events will allow clinicians to stage effective interventions for averting these critical complications. METHODS: It is widely understood that physiological conditions of patients on variables such as blood pressure and heart rate are suggestive to gradual changes over a certain period of time, prior to the occurrence of a septic shock. This work investigates the performance of a novel machine learning approach for the early prediction of septic shock. The approach combines highly informative sequential patterns extracted from multiple physiological variables and captures the interactions among these patterns via coupled hidden Markov models (CHMM). In particular, the patterns are extracted from three non-invasive waveform measurements: the mean arterial pressure levels, the heart rates and respiratory rates of septic shock patients from a large clinical ICU dataset called MIMIC-II. EVALUATION AND RESULTS: For baseline estimations, SVM and HMM models on the continuous time series data for the given patients, using MAP (mean arterial pressure), HR (heart rate), and RR (respiratory rate) are employed. Single channel patterns based HMM (SCP-HMM) and multi-channel patterns based coupled HMM (MCP-HMM) are compared against baseline models using 5-fold cross validation accuracies over multiple rounds. Particularly, the results of MCP-HMM are statistically significant having a p-value of 0.0014, in comparison to baseline models. Our experiments demonstrate a strong competitive accuracy in the prediction of septic shock, especially when the interactions between the multiple variables are coupled by the learning model. CONCLUSIONS: It can be concluded that the novelty of the approach, stems from the integration of sequence-based physiological pa...

Jiang, Y., Tsai, P., Yeh, W.C. & Cao, L. 2017, 'A honey-bee-mating based algorithm for multilevel image segmentation using Bayesian theorem', Applied Soft Computing Journal, vol. 52, pp. 1181-1190.

© 2016 Elsevier B.V. The image thresholding techniques are considered as a must for objects segmentation, compression and target recognition, and they have been widely studied for the last few decades; for example, the multi-level thresholding methods, and as such (they) render more great challenges for image segmentation techniques that remain computationally more expensive, when their choices of threshold numbers were increased. Therefore, our aim was to propose an algorithm based on Bayesian theorem and the so-called honey-bee-mating algorithm (HBMA), called a Bayesian honey bee mating algorithm BHBMA. It can not only reduce the computational time and curse of dimensionality, but also can run more reliably and more stably. This enhanced capability was technically accomplished by embedding a new population initialization strategy based on the characteristics of multi-level thresholding technique in pixel-based intensity images arranged from lower grey levels to higher ones. Extensive experiments have shown that our proposed method outperformed other state-of-the-art algorithms empirically in terms of their effectiveness and efficiency, when applying to complex image processing scenario such as automatic target recognition.

Liu, B., Xiao, Y. & Cao, L. 2017, 'SVM-based multi-state-mapping approach for multi-class classification', Knowledge-Based Systems.

© 2017. Traditional SVM-based multi-class classification algorithms mainly adopt the strategy of mapping the data set with all classes into a single feature space via a kernel function, in which SVM is constructed for each decomposed binary classification problem. However, it is not always possible to find an appropriate kernel function to render all the classes distinguishable in a single feature space, since each class is always derived from different data distributions. Consequently, the performance is not always as good as expected. To improve the performance of multi-class classification, this paper proposes an improved approach, called multi-state-mapping (MSM) with SVM based on hierarchical architecture, which maps the data set with all classes into different feature spaces at the different states of the decomposition of a multi-class classification problem in terms of a binary tree architecture. We prove that the computational complexity of MSM at its worst lies between that of the one-against-all scheme and one-against-one scheme. Substantial experiments have been conducted on sixteen UCI data sets to show the performance of our method. The statistical results show that MSM outperforms state-of-the-art methods in terms of accuracy and standard deviation.

Meng, X., Cao, L., Zhang, X. & Shao, J. 2017, 'Top-k coupled keyword recommendation for relational keyword queries', Knowledge and Information Systems, vol. 50, no. 3, pp. 883-916.

© 2016 Springer-Verlag London. Providing top-k typical relevant keyword queries would benefit the users who cannot formulate appropriate queries to express their imprecise query intentions. By extracting the semantic relationships both between keywords and keyword queries, this paper proposes a new keyword query suggestion approach which can provide typical and semantically related queries to the given query. Firstly, a keyword coupling relationship measure, which considers both intra- and inter-couplings between each pair of keywords, is proposed. Then, the semantic similarity of different keyword queries can be measured by using a semantic matrix, in which the coupling relationships between keywords in queries are reserved. Based on the query semantic similarities, we next propose an approximation algorithm to find the most typical queries from query history by using the probability density estimation method. Lastly, a threshold-based top-k query selection method is proposed to expeditiously evaluate the top-k typical relevant queries. We demonstrate that our keyword coupling relationship and query semantic similarity measures can capture the coupling relationships between keywords and semantic similarities between keyword queries accurately. The efficiency of query typicality analysis and top-k query selection algorithm is also demonstrated.

Conferences

Kim, J., Shim, K., Cao, L., Lee, J.G., Lin, X. & Moon, Y.S. 2017, 'Advances in Knowledge Discovery and Data Mining: 21st Pacific-Asia Conference, PAKDD 2017 Jeju, South Korea, May 23–26, 2017 Proceedings, Part I', Lecture Notes in Computer Science, Springer Verlag (Germany).

Kim, J., Shim, K., Cao, L., Lee, J.G., Lin, X. & Moon, Y.S. 2017, 'Advances in Knowledge Discovery and Data Mining: 21st Pacific-Asia Conference, PAKDD 2017 Jeju, South Korea, May 23–26, 2017 Proceedings, Part II', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics).

Journal articles

Cao, L. 2016, 'Data science: Nature and pitfalls', IEEE Intelligent Systems, vol. 31, no. 5, pp. 66-75.

© 2001-2011 IEEE. Data science is creating exciting trends as well as significant controversy. A critical matter for the healthy development of data science in its early stages is to deeply understand the nature of data and data science and discuss the various pitfalls. These important issues motivate the discussions in this article.

Cao, L. 2016, 'Non-IID Recommender Systems: A Review and Framework of Recommendation Paradigm Shifting', Engineering, vol. 2, no. 2, pp. 212-224.

© 2016 The Authors. While recommendation plays an increasingly critical role in our living, study, work, and entertainment, the recommendations we receive are often for irrelevant, duplicate, or uninteresting products and services. A critical reason for such bad recommendations lies in the intrinsic assumption that recommended users and items are independent and identically distributed (IID) in existing theories and systems. Another phenomenon is that, while tremendous efforts have been made to model specific aspects of users or items, the overall user and item characteristics and their non-IIDness have been overlooked. In this paper, the non-IID nature and characteristics of recommendation are discussed, followed by the non-IID theoretical framework in order to build a deep and comprehensive understanding of the intrinsic nature of recommendation problems, from the perspective of both couplings and heterogeneity. This non-IID recommendation research triggers the paradigm shift from IID to non-IID recommendation research and can hopefully deliver informed, relevant, personalized, and actionable recommendations. It creates exciting new directions and fundamental solutions to address various complexities including cold-start, sparse data-based, cross-domain, group-based, and shilling attack-related issues.

Cao, L., Dong, X. & Zheng, Z. 2016, 'E-NSP: Efficient negative sequential pattern mining', Artificial Intelligence, vol. 235, pp. 156-182.

© 2016 The Authors. Published by Elsevier B.V. As an important tool for behavior informatics, negative sequential patterns (NSP) (such as missing medical treatments) are critical and sometimes much more informative than positive sequential patterns (PSP) (e.g. using a medical service) in many intelligent systems and applications such as intelligent transport systems, healthcare and risk management, as they often involve non-occurring but interesting behaviors. However, discovering NSP is much more difficult than identifying PSP due to the significant problem complexity caused by non-occurring elements, high computational cost and huge search space in calculating negative sequential candidates (NSC). So far, the problem has not been formalized well, and very few approaches have been proposed to mine for specific types of NSP, which rely on database re-scans after identifying PSP in order to calculate the NSC supports. This has been shown to be very inefficient or even impractical, since the NSC search space is usually huge. This paper proposes a very innovative and efficient theoretical framework: Set theory-based NSP mining (ST-NSP), and a corresponding algorithm, e-NSP, to efficiently identify NSP by involving only the identified PSP, without re-scanning the database. Accordingly, negative containment is first defined to determine whether a data sequence contains a negative sequence based on set theory. Second, an efficient approach is proposed to convert the negative containment problem to a positive containment problem. The NSC supports are then calculated based only on the corresponding PSP. This not only avoids the need for additional database scans, but also enables the use of existing PSP mining algorithms to mine for NSP. Finally, a simple but efficient strategy is proposed to generate NSC. Theoretical analyses show that e-NSP performs particularly well on datasets with a small number of elements in a sequence, a large number of itemsets and low minimum s...

Cui, P., Liu, H., Aggarwal, C., Wang, F., Cao, L., Yu, P.S., Beutel, A. & Faloutsos, C. 2016, 'Uncovering and Predicting Human Behaviors', IEEE Intelligent Systems, vol. 31, no. 2, pp. 77-88.

© 2001-2011 IEEE. This installment of Trends & Controversies provides an array of perspectives on the latest research in modeling user behavior. Peng Cui, Huan Liu, Charu Aggarwal, and Fei Wang introduce the field in 'Uncovering and Predicting Human Behaviors.' The essays included are 'Computational Modeling of Complex User Behaviors: Challenges and Opportunities,' by Peng Cui, Huan Liu, Charu Aggarwal, and Fei Wang; 'Non-IID Recommendation Theories and Systems,' by Longbing Cao and Philip S. Yu; 'User Behavior Modeling and Fraud Detection,' by Alex Beutel and Christos Faloutsos; and 'Transfer Learning for Behavior Prediction,' by Weike Pan and Qiang Yang.

Hu, L., Cao, L., Cao, J., Gu, Z., Xu, G. & Yang, D. 2016, 'Learning informative priors from heterogeneous domains to improve recommendation in cold-start user domains', ACM Transactions on Information Systems, vol. 35, no. 2.

© 2016 ACM. In the real-world environment, users have sufficient experience in their focused domains but lack experience in other domains. Recommender systems are very helpful for recommending potentially desirable items to users in unfamiliar domains, and cross-domain collaborative filtering is therefore an important emerging research topic. However, it is inevitable that the cold-start issue will be encountered in unfamiliar domains due to the lack of feedback data. The Bayesian approach shows that priors play an important role when there are insufficient data, which implies that recommendation performance can be significantly improved in cold-start domains if informative priors can be provided. Based on this idea, we propose a Weighted Irregular Tensor Factorization (WITF) model to leverage multi-domain feedback data across all users to learn the cross-domain priors w.r.t. both users and items. The features learned from WITF serve as the informative priors on the latent factors of users and items in terms of weighted matrix factorization models. Moreover, WITF is a unified framework for dealing with both explicit feedback and implicit feedback. To prove the effectiveness of our approach, we studied three typical real-world cases in which a collection of empirical evaluations were conducted on real-world datasets to compare the performance of our model and other state-of-the-art approaches. The results show the superiority of our model over comparison models.

Li, D., He, X., Cao, L. & Chen, H. 2016, 'Permutation anonymization', Journal of Intelligent Information Systems, vol. 47, no. 3, pp. 427-445.

In data publishing, anonymization techniques have been designed to provide privacy protection. Anatomy is an important technique for privacy preservation in data publication and attracts considerable attention in the literature. However, anatomy is fragile under background knowledge attack and the presence attack. In addition, anatomy can only be applied to limited applications. To overcome these drawbacks, we propose an improved version of anatomy: permutation anonymization, a new anonymization technique that is more effective than anatomy in privacy protection, and in the meanwhile is able to retain significantly more information in the microdata. We present the detail of the technique and build the underlying theory of the technique. Extensive experiments on real data are conducted, showing that our technique allows highly effective data analysis, while offering strong privacy guarantees.

Li, F., Xu, G. & Cao, L. 2016, 'Two-level matrix factorization for recommender systems', Neural Computing and Applications, vol. 27, no. 8, pp. 2267-2278.

Many existing recommendation methods such as matrix factorization (MF) mainly rely on user–item rating matrix, which sometimes is not informative enough, often suffering from the cold-start problem. To solve this challenge, complementary textual relations between items are incorporated into recommender systems (RS) in this paper. Specifically, we first apply a novel weighted textual matrix factorization (WTMF) approach to compute the semantic similarities between items, then integrate the inferred item semantic relations into MF and propose a two-level matrix factorization (TLMF) model for RS. Experimental results on two open data sets not only demonstrate the superiority of TLMF model over bench-mark methods, but also show the effectiveness of TLMF for solving the cold-start problem.

Shen, B., Cao, L., Yao, M. & Gao, Y. 2016, 'Mining preferred navigation patterns by consolidating both selection and time preferences', World Wide Web.

© 2015 Springer Science+Business Media New York. Preferred navigation patterns (PNP) are those contiguous sequential patterns whose elements are preferred by users to be selected as the next steps between several different selections and are preferred by users to spend much time on. Such navigation path and time preferred patterns are more actionable than any other finds only considering either path or time in various web applications, such as web user navigation, targeted online advertising and recommendation. However, due to the conceptual confusion and limitation on navigation preference in the existing work, the corresponding algorithms cannot discover actionable preferred navigation patterns. In this paper, we study the problem of preferred navigation pattern mining by involving both navigation path and time length. Firstly, we carefully define the concepts of time preference and selection preference for time-related path sequences, which can well reflect user interests from the relative path selection and time consumption respectively. Secondly, we propose an efficient PNP-forest algorithm for identifying PNPs, by first introducing PNP-forest data structure, and then presenting PNP-forest growth and maintenance mechanism, associated with optimization strategies. Then we introduce a more efficient mining algorithm called PrefixSpan_Forest, which integrates the advantages of PrefixSpan and PNP-forest. The performance of these two algorithms is also evaluated and the results show that the algorithms can discover PNPs effectively.

Zheng, Z., Wei, W., Liu, C., Cao, W., Cao, L. & Bhatia, M. 2016, 'An effective contrast sequential pattern mining approach to taxpayer behavior analysis', World Wide Web, vol. 19, no. 4, pp. 633-651.

Data mining for client behavior analysis has become increasingly important in business, however further analysis on transactions and sequential behaviors would be of even greater value, especially in the financial service industry, such as banking and insurance, government and so on. In a real-world business application of taxation debt collection, in order to understand the internal relationship between taxpayers’ sequential behaviors (payment, lodgment and actions) and compliance to their debt, we need to find the contrast sequential behavior patterns between compliant and non-compliant taxpayers. Contrast Patterns (CP) are defined as the itemsets showing the difference/discrimination between two classes/datasets (Dong and Li, 1999). However, the existing CP mining methods which can only mine itemset patterns, are not suitable for mining sequential patterns, such as time-ordered transactions in taxpayer sequential behaviors. Little work has been conducted on Contrast Sequential Pattern (CSP) mining so far. Therefore, to address this issue, we develop a CSP mining approach, eCSP, by using an effective CSP-tree structure, which improves the PrefixSpan tree (Pei et al., 2001) for mining contrast patterns. We propose some heuristics and interestingness filtering criteria, and integrate them into the CSP-tree seamlessly to reduce the search space and to find business-interesting patterns as well. The performance of the proposed approach is evaluated on three real-world datasets. In addition, we use a case study to show how to implement the approach to analyse taxpayer behaviour. The results show a very promising performance and convincing business value.

Conferences

Cao, L. & Zhu, H. 2016, 'Message from General Chairs', Proceedings - 2016 IEEE 2nd International Conference on Big Data Computing Service and Applications, BigDataService 2016, p. x.

Chen, Q., Hu, L., Xu, J., Liu, W. & Cao, L. 2015, 'Document Similarity Analysis via Involving Both Explicit and Implicit Semantic Couplings', 2015 International Conference on Data Science and Advanced Analytics, Paris.

Fan, X., Xu, R.Y.D. & Cao, L. 2016, 'Copula mixed-membership stochastic block model', IJCAI International Joint Conference on Artificial Intelligence, International Joint Conference on Artificial Intelligence (IJCAI), AAAI Press / International Joint Conferences on Artificial Intelligence, New York City, New York, United States, pp. 1462-1468.

The Mixed-Membership Stochastic Blockmodels (MMSB) is a popular framework for modelling social relationships by fully exploiting each individual node's participation (or membership) in a social network. Despite its powerful representations, MMSB assumes that the membership indicators of each pair of nodes (i.e., people) are distributed independently. However, such an assumption often does not hold in real-life social networks, in which certain known groups of people may correlate with each other in terms of factors such as their membership categories. To expand MMSB's ability to model such dependent relationships, a new framework - a Copula Mixed-Membership Stochastic Blockmodel - is introduced in this paper for modeling intra-group correlations, namely an individual Copula function jointly models the membership pairs of those nodes within the group of interest. This framework enables various Copula functions to be used on demand, while maintaining the membership indicator's marginal distribution needed for modelling membership indicators with other nodes outside of the group of interest. Sampling algorithms for both the finite and infinite number of groups are also detailed. Our experimental results show its superior performance in capturing group interactions when compared with the baseline models on both synthetic and real world datasets.

Gaussier, E. & Cao, L. 2016, 'Conference Report on 2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA'2015) [Conference Reports]', IEEE Computational Intelligence Magazine, pp. 13-14.

Kumar, K.D., Reddy, P.K., Reddy, P.B. & Cao, L. 2016, 'Improving the performance of collaborative filtering with category-specific neighborhood', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Asian Conference on Intelligent Information and Database Systems (ACIIDS 2016), pp. 201-210.

© Springer-Verlag Berlin Heidelberg 2016. A recommender system (RS) helps customers to select appropriate products from millions of products and has become a key component in e-commerce systems. Collaborative filtering (CF) based approaches are widely employed to build RSs. In CF, the recommendation to the target user is computed after forming the corresponding neighborhood of users. The neighborhood of a target user is extracted based on the similarity between the product rating vector of the target user and the product rating vectors of individual users. In CF, the methodology employed for neighborhood formation influences the performance. In this paper, we have made an effort to improve the performance of CF by proposing a different approach to compute recommendations by considering two kinds of neighborhood. One is the neighborhood obtained by considering the product ratings of the user as a single vector, and the other is based on the neighborhood of the corresponding virtual users. For the target user, the virtual users are formed by dividing the ratings based on the category of products. We have proposed a combined approach to compute better recommendations by considering both kinds of neighborhoods. Experimental results on the real-world MovieLens dataset show that the proposed approach improves the performance over CF.
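
The idea of virtual users described above can be illustrated with a minimal sketch. It assumes a rating of 0 means "unrated" and uses cosine similarity; the function names are hypothetical, and how the paper actually weights and combines the two kinds of neighborhood is not stated in the abstract.

```python
import math

def split_into_virtual_users(ratings, item_categories):
    # One virtual user per category: keep the ratings of items in that
    # category and zero out the rest (0 is assumed to mean "unrated").
    categories = sorted(set(item_categories))
    return {
        c: [r if cat == c else 0.0 for r, cat in zip(ratings, item_categories)]
        for c in categories
    }

def cosine(u, v):
    # Cosine similarity between two rating vectors; 0.0 if either is all zeros.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def category_similarities(target, other, item_categories):
    # Similarity between the matching virtual users of two real users.
    t = split_into_virtual_users(target, item_categories)
    o = split_into_virtual_users(other, item_categories)
    return {c: cosine(t[c], o[c]) for c in t}
```

A per-category neighborhood could then be formed by ranking other users on `category_similarities(...)[c]`, alongside the usual neighborhood ranked on `cosine` over the full rating vectors.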

Pang, G., Cao, L. & Chen, L. 2016, 'Identifying Outliers in Complex Categorical Data by Modeling the Feature Value Couplings', International Joint Conference on Artificial Intelligence (IJCAI).

Pang, G., Cao, L. & Chen, L. 2016, 'Outlier detection in complex categorical data by modelling the feature value couplings', Proceedings of the 25th International Joint Conference on Artificial Intelligence, AAAI, pp. 1902-1908.

Pang, G., Cao, L., Chen, L. & Liu, H. 2016, 'Unsupervised Feature Selection for Outlier Detection by Modelling Hierarchical Value-Feature Couplings', The IEEE International Conference on Data Mining (ICDM), Barcelona.

Shao, J., Meng, X. & Cao, L. 2016, 'Mining Actionable Combined High Utility Incremental and Associated Patterns', 2016 IEEE/CSAA International Conference on Aircraft Utility Systems (AUS), IEEE/CSAA International Conference on Aircraft Utility Systems, IEEE, Beijing, China, pp. 1164-1169.

High Utility Itemset (HUI) Mining, instead of Frequent Pattern Mining (FIM), has been an attractive theme in the data mining domain for over a decade, since it offers researchers an alternative way to identify actionable patterns. In addition, the need for decision-making actions and behavior-oriented strategies based on large amounts of informative data has made the significance of discovering actionable patterns widely recognized. The current HUIM research focus has been on improving the efficiency to make algorithms faster and more stable. However, the coupling relationships between items in given itemsets are ignored. For example, the utility of one itemset might be lower than the manager expected until one additional item joins it; and vice versa, the utility of an itemset might drop sharply when another item joins it. Moreover, existing HUI mining methods often produce many redundant itemsets sharing the same underlying item. Store managers would not make the expected profits based on such results, which makes the results not actionable at all. To this end, we introduce a new framework for mining actionable patterns, called Mining Utility Associated Patterns (MUAP), which aims to find high utility incremental and strongly associated items/itemsets with combined incorporating criteria. The outputs of this algorithm are convincing on real datasets as well as synthetic datasets.

Wang, S., Liu, W., Wu, J., Cao, L., Meng, Q. & Kennedy, P.J. 2016, 'Training deep neural networks on imbalanced data sets', Proceedings of the International Joint Conference on Neural Networks, International Joint Conference on Neural Networks (IJCNN), IEEE, Vancouver, Canada, pp. 4368-4374.

© 2016 IEEE. Deep learning has become increasingly popular in both academic and industrial areas in the past years. Various domains including pattern recognition, computer vision, and natural language processing have witnessed the great power of deep networks. However, current studies on deep learning mainly focus on data sets with balanced class labels, while its performance on imbalanced data is not well examined. Imbalanced data sets exist widely in the real world and they pose great challenges for classification tasks. In this paper, we focus on the problem of classification using deep networks on imbalanced data sets. Specifically, a novel loss function called mean false error, together with its improved version mean squared false error, is proposed for the training of deep networks on imbalanced data sets. The proposed method can effectively capture classification errors from both the majority class and the minority class equally. Experiments and comparisons demonstrate the superiority of the proposed approach compared with conventional methods in classifying imbalanced data sets on deep neural networks.
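
A minimal sketch of how such class-balanced losses can be computed is given below. It assumes binary labels (0 for the majority/negative class, 1 for the minority/positive class) and that the per-class error is a halved squared error averaged within each class; these details and all function names are illustrative assumptions, since the abstract does not give the formulas.

```python
def class_errors(y_true, y_pred):
    # Average the per-sample error separately within each class, so the
    # minority class is not swamped by the majority class.
    neg = [0.5 * (t - p) ** 2 for t, p in zip(y_true, y_pred) if t == 0]
    pos = [0.5 * (t - p) ** 2 for t, p in zip(y_true, y_pred) if t == 1]
    fpe = sum(neg) / len(neg) if neg else 0.0  # error on negative samples
    fne = sum(pos) / len(pos) if pos else 0.0  # error on positive samples
    return fpe, fne

def mean_false_error(y_true, y_pred):
    # Assumed form: sum of the two per-class mean errors.
    fpe, fne = class_errors(y_true, y_pred)
    return fpe + fne

def mean_squared_false_error(y_true, y_pred):
    # Assumed form: sum of the squared per-class mean errors.
    fpe, fne = class_errors(y_true, y_pred)
    return fpe ** 2 + fne ** 2
```

With three negatives predicted perfectly and one positive missed entirely, an error averaged over all four samples is diluted by the majority class, whereas the per-class averages keep the minority-class error at full weight.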

Books

Cao, L. 2015, Metasynthetic Computing and Engineering of Complex Systems, Springer.

Provides a comprehensive overview and introduction to the concepts, methodologies, analysis, design and applications of metasynthetic computing and engineering. The author:

• Presents an overview of complex systems, especially open complex giant systems such as the Internet, complex behavioural and social problems, and actionable knowledge discovery and delivery in the big data era.
• Discusses ubiquitous intelligence in complex systems, including human intelligence, domain intelligence, social intelligence, network intelligence, data intelligence and machine intelligence, and their synergy through metasynthetic engineering.
• Explains the concept and methodology of human-centred, human-machine-cooperated qualitative-to-quantitative metasynthesis for understanding and managing open complex giant systems, and its computing approach: metasynthetic computing.
• Introduces techniques and tools for analysing and designing problem-solving systems for open complex problems and systems.

Metasynthetic Computing and Engineering uses the systematology methodology in addressing system complexities in open complex giant systems, for which it may not only be effective to apply reductionism or holism. The book aims to encourage and inspire discussions, design, implementation and reflection of effective methodologies and tools for computing and engineering open complex systems and problems. Researchers, research students and practitioners in complex systems, artificial intelligence, data science, computer science, and even system science, cognitive science, behaviour science, and social science, will find this book invaluable.

Journal articles

Cao, L. 2015, 'Coupling learning of complex interactions', Information Processing and Management, vol. 51, no. 2, pp. 167-186.

© 2014 Elsevier Ltd. Complex applications such as big data analytics involve different forms of coupling relationships that reflect interactions between factors related to technical, business (domain-specific) and environmental (including socio-cultural and economic) aspects. There are diverse forms of couplings embedded in poor-structured and ill-structured data. Such couplings are ubiquitous, implicit and/or explicit, objective and/or subjective, heterogeneous and/or homogeneous, presenting complexities to existing learning systems in statistics, mathematics and computer sciences, such as typical dependency, association and correlation relationships. Modeling and learning such couplings thus is fundamental but challenging. This paper discusses the concept of coupling learning, focusing on the involvement of coupling relationships in learning systems. Coupling learning has great potential for building a deep understanding of the essence of business problems and handling challenges that have not been addressed well by existing learning theories and tools. This argument is verified by several case studies on coupling learning, including handling coupling in recommender systems, incorporating couplings into coupled clustering, coupling document clustering, coupled recommender algorithms and coupled behavior analysis for groups.

Cao, L., Yu, P.S. & Kumar, V. 2015, 'Nonoccurring Behavior Analytics: A New Area', IEEE Intelligent Systems, vol. 30, no. 6, pp. 4-11.

Cao, W. & Cao, L. 2015, 'Financial Crisis Forecasting via Coupled Market State Analysis', IEEE Intelligent Systems, vol. 30, no. 2, pp. 18-25.

© 2001-2011 IEEE. Financial crisis forecasting has been a long-standing challenge that often involves couplings between indicators of multiple markets. Such couplings include implicit relations that might not be effectively detected from raw market observations. However, most methods for crisis forecasting rely directly on market observations and might not detect the hidden interactions between markets. To this end, the authors explore coupled market state analysis (CMSA), assuming that the observations of markets are governed by a collection of intra- and intercoupled hidden market states. Accordingly, they built a forecaster based on these coupled market states instead of observations.

Deng, Z., Cao, L., Jiang, Y. & Wang, S. 2015, 'Minimax Probability TSK Fuzzy System Classifier: A More Transparent and Highly Interpretable Classification Model', IEEE Transactions on Fuzzy Systems, vol. 23, no. 4, pp. 813-826.

© 1993-2012 IEEE. When an intelligent model is used for medical diagnosis, it is desirable to have a high level of interpretability and transparent model reliability for users. Compared with most of the existing intelligence models, fuzzy systems have shown a distinctive advantage in their interpretabilities. However, how to determine the model reliability of a fuzzy system trained for a recognition task is still an unsolved problem at present. In this study, a minimax probability Takagi-Sugeno-Kang (TSK) fuzzy system classifier called MP-TSK-FSC is proposed to train a fuzzy system classifier and determine the model reliability simultaneously. For the proposed MP-TSK-FSC, a lower bound of correct classification can be presented to the users to characterize the reliability of the trained fuzzy classifier. Thus, the obtained classifier has the distinctive characteristics of both a high level of interpretability and transparent model reliability inherited from the fuzzy system and minimax probability learning strategy, respectively. Our experiments on synthetic datasets and several real-world datasets for medical diagnosis have confirmed the distinctive characteristics of the proposed method.

Fan, X. & Cao, L. 2015, 'A convergence theorem for graph shift-type algorithms', Pattern Recognition, vol. 48, no. 8, pp. 2751-2760.

© 2015 Elsevier Ltd. All rights reserved. The Robust Graph mode seeking by Graph Shift (RGGS) algorithm (Liu and Yan, 2010) represents a recent promising approach for discovering dense subgraphs in noisy data. However, there are no theoretical foundations for proving the convergence of the RGGS algorithm, leaving the question as to whether an algorithm works for solid reasons. In this paper, we propose a generic theoretical framework consisting of three key Graph Shift (GS) components: the simplex of a generated sequence set, the monotonic and continuous objective function and closed mapping. We prove that the GS-type algorithms built on such components can be transformed to fit Zangwill's theory, and the sequence set generated by the GS procedures always terminates at a local maximum, or at worst, contains a subsequence which converges to a local maximum of the similarity measure function. The framework is verified by theoretical analysis and experimental results of several typical GS-type algorithms.

Fariha, A., Ahmed, C.F., Leung, C.K., Samiullah, M., Pervin, S. & Cao, L. 2015, 'A new framework for mining frequent interaction patterns from meeting databases', Engineering Applications of Artificial Intelligence, vol. 45, pp. 103-118.

© 2015 Elsevier Ltd. All rights reserved. Meetings play an important role in workplace dynamics in modern life since their atomic components represent the interactions among human beings. Semantic knowledge can be acquired by discovering interaction patterns from these meetings. A recent method represents meeting interactions using a tree data structure and mines interaction patterns from it. However, such a tree-based method may not be able to capture all kinds of triggering relations among interactions or distinguish the same interaction from participants of different ranks. Hence, it is not suitable for finding all interaction patterns, such as those about correlated interactions. In this paper, we propose a new framework for mining interaction patterns from meetings using an alternative data structure, namely, weighted interaction flow directed acyclic graph (WIFDAG). Specifically, a WIFDAG captures both temporal and triggering relations among interactions in meetings. Additionally, to distinguish participants of different ranks, we assign weights to nodes in the WIFDAGs. Moreover, we also propose an algorithm called WDAGMeet for mining weighted frequent interaction patterns from meetings represented by the proposed framework. Extensive experimental results demonstrate the effectiveness of the proposed framework and the mining algorithm built on that framework for mining frequent interaction patterns from meetings.

Fournier-Viger, P., Wu, C.W., Tseng, V.S., Cao, L. & Nkambou, R. 2015, 'Mining Partially-Ordered Sequential Rules Common to Multiple Sequences', IEEE Transactions on Knowledge and Data Engineering, vol. 27, no. 8, pp. 2203-2216.

© 2015 IEEE. Sequential rule mining is an important data mining problem with multiple applications. An important limitation of algorithms for mining sequential rules common to multiple sequences is that rules are very specific and therefore many similar rules may represent the same situation. This can cause three major problems: (1) similar rules can be rated quite differently, (2) rules may not be found because they are individually considered uninteresting, and (3) rules that are too specific are less likely to be used for making predictions. To address these issues, we explore the idea of mining "partially-ordered sequential rules" (POSR), a more general form of sequential rules such that items in the antecedent and the consequent of each rule are unordered. To mine POSR, we propose the RuleGrowth algorithm, which is efficient and easily extendable. In particular, we present an extension (TRuleGrowth) that accepts a sliding-window constraint to find rules occurring within a maximum amount of time. A performance study with four real-life datasets shows that RuleGrowth and TRuleGrowth have excellent performance and scalability compared to baseline algorithms and that the number of rules discovered can be several orders of magnitude smaller when the sliding-window constraint is applied. Furthermore, we also report results from a real application showing that POSR can provide a much higher prediction accuracy than regular sequential rules for sequence prediction.

Jiang, Y., Tsai, P., Hao, Z. & Cao, L. 2015, 'Automatic multilevel thresholding for image segmentation using stratified sampling and Tabu Search', Soft Computing, vol. 19, no. 9, pp. 2605-2617.

Image segmentation techniques have been widely applied in many fields such as pattern recognition and feature extraction. For the primate visual attention model, the perceptual organization is an important process to automatically extract the desirable features. In this article, we propose a new method called an automatic multilevel thresholding algorithm using the stratified sampling and Tabu Search (AMTSSTS) by imitating the primate visual perceptual behaviors. In the AMTSSTS algorithm, a gray image is treated as a population with the gray values of pixels as the individuals. First, the image is evenly divided into several strata (blocks), and a sample is drawn from each stratum. Second, a Tabu Search-based optimization is applied to each sample to maximize the ratio between mean and variance for each sample. The threshold number and threshold values are preliminarily determined based on the optimized samples, and are further optimized by a deterministic method which includes a new local criterion function with property of local continuity of an image. Results of extensive simulations on Berkeley datasets indicate that AMTSSTS can obtain more effective, efficient and smooth segmentation, and can be applied to complex and real-time environments. © 2014 Springer-Verlag Berlin Heidelberg.
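
The first step described above, dividing the image into strata and drawing a sample from each, can be sketched as follows. This is only an illustration under assumptions: the image is a list of rows of gray values, the strata form a square grid, and sampling is uniform without replacement; the function name and parameters are hypothetical, and the Tabu Search optimization is not shown.

```python
import random

def stratified_pixel_samples(image, blocks_per_side, sample_size, seed=0):
    # Split the image into a blocks_per_side x blocks_per_side grid of strata
    # and draw up to sample_size gray values from each stratum.
    rng = random.Random(seed)
    height, width = len(image), len(image[0])
    row_edges = [i * height // blocks_per_side for i in range(blocks_per_side + 1)]
    col_edges = [j * width // blocks_per_side for j in range(blocks_per_side + 1)]
    samples = []
    for r0, r1 in zip(row_edges, row_edges[1:]):
        for c0, c1 in zip(col_edges, col_edges[1:]):
            pixels = [image[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            samples.append(rng.sample(pixels, min(sample_size, len(pixels))))
    return samples
```

Each returned sample stands in for one block, so a threshold search can run on a handful of gray values per stratum rather than on every pixel.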

Liu, W., Deng, Z.H., Cao, L., Xu, X., Liu, H. & Gong, X. 2015, 'Mining top K spread sources for a specific topic and a given node', IEEE Transactions on Cybernetics, vol. 45, no. 11, pp. 2472-2483.

© 2013 IEEE. In social networks, nodes (or users) interested in specific topics are often influenced by others. The influence is usually associated with a set of nodes rather than a single one. An interesting but challenging task for any given topic and node is to find the set of nodes that represents the source or trigger for the topic and thus identify those nodes that have the greatest influence on the given node as the topic spreads. We find that it is an NP-hard problem. This paper proposes an effective framework to deal with this problem. First, the topic propagation is represented as a Bayesian network. We then construct the propagation model by a variant of the voter model. The probability transition matrix (PTM) algorithm is presented to conduct the probability inference with the complexity O(θ³ log₂ θ), where θ is the number of nodes in the given graph. To evaluate the PTM algorithm, we conduct extensive experiments on real datasets. The experimental results show that the PTM algorithm is both effective and efficient.

Wang, C., Cao, L. & Chi, C.H. 2015, 'Formalization and Verification of Group Behavior Interactions', IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 45, no. 8, pp. 1109-1124.

© 2013 IEEE. Group behavior interactions, such as multirobot teamwork and group communications in social networks, are widely seen in natural, social, and artificial behavior-related applications. Behavior interactions in a group are often associated with varying coupling relationships, for instance, conjunction or disjunction. Such coupling relationships challenge existing behavior representation methods, because they involve multiple behaviors from different actors, constraints on the interactions, and behavior evolution. In addition, the quality of behavior interactions is not checked through verification techniques. In this paper, we propose an ontology-based behavior modeling and checking system (OntoB for short) to explicitly represent and verify complex behavior relationships, aggregations, and constraints. The OntoB system provides both a visual behavior model and an abstract behavior tuple to capture behavioral elements, as well as building blocks. It formalizes various intra-coupled interactions (behaviors conducted by the same actor) via transition systems (TSs), and inter-coupled behavior aggregations (behaviors conducted by different actors) from temporal, inferential, and party-based perspectives. OntoB converts a behavior-oriented application into a TS and temporal logic formulas for further verification and refinement. We demonstrate and evaluate the effectiveness of OntoB in modeling multirobot behaviors and their interactions in the Robocup soccer competition game. We show that the OntoB system can effectively model complex behavior interactions, and verify and refine the modeling of complex group behavior interactions in a sound manner.

Yang, W., Gao, Y., Shi, Y. & Cao, L. 2015, 'MRM-Lasso: A Sparse Multiview Feature Selection Method via Low-Rank Analysis', IEEE Transactions on Neural Networks and Learning Systems.

Learning about multiview data involves many applications, such as video understanding, image classification, and social media. However, when the data dimension increases dramatically, it is important but very challenging to remove redundant features in multiview feature selection. In this paper, we propose a novel feature selection algorithm, multiview rank minimization-based Lasso (MRM-Lasso), which jointly utilizes Lasso for sparse feature selection and rank minimization for learning relevant patterns across views. Instead of simply integrating multiple Lasso from view level, we focus on the performance of sample-level (sample significance) and introduce pattern-specific weights into MRM-Lasso. The weights are utilized to measure the contribution of each sample to the labels in the current view. In addition, the latent correlation across different views is successfully captured by learning a low-rank matrix consisting of pattern-specific weights. The alternating direction method of multipliers is applied to optimize the proposed MRM-Lasso. Experiments on four real-life data sets show that features selected by MRM-Lasso have better multiview classification performance than the baselines. Moreover, pattern-specific weights are demonstrated to be significant for learning about multiview data, compared with view-specific weights.

Yue, X.D., Cao, L.B., Miao, D.Q., Chen, Y.F. & Xu, B. 2015, 'Multi-view attribute reduction model for traffic bottleneck analysis', Knowledge-Based Systems, vol. 86, pp. 1-10.

Conferences

Cao, L., Zhang, C., Joachims, T., Webb, G., Margineantu, D., Williams, G., Parekh, R., Fayyad, U., Eliassi-Rad, T., Fürnkranz, J., Pei, J., Zhou, Z.H., Bekkerman, R. & Tang, J. 2015, 'Foreword', Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. iii-iv.

Cao, W., Demazeau, Y., Cao, L. & Zhu, W. 2015, 'Financial crisis and global market couplings', Proceedings of the 2015 IEEE International Conference on Data Science and Advanced Analytics, DSAA 2015.

© 2015 IEEE. The global financial crisis that occurred in 2007 and its severely damaging consequences for other global financial markets show the great importance of understanding the impact and contagion between different financial markets. A variety of methods have been proposed and implemented on market contagion. However, most of the existing literature simply tests the existence of market contagion in financial crises, and limited work investigates in depth the complex market couplings which are the essence of market contagion. This is indeed very difficult as it involves the selection of discriminative indicators, the different types of couplings (intra-market coupling, inter-market coupling), the hidden characteristic of couplings, and the evaluation of market couplings in understanding crisis. To address these issues, this paper proposes a CHMM-LR framework to investigate the relations between financial crisis and three pairwise market couplings from three typical global financial markets: Equity market, Commodity market and Interest market. We adopt a Coupled Hidden Markov Model (CHMM) to capture the complex hidden pairwise market couplings, and the financial crisis forecasting abilities based on different pairwise market couplings are used to measure the relations via Logistic Regression (LR). Experiments on real financial data from 1990 to 2010 show the advantages of market couplings in understanding crisis. In addition, the experimental results provide crucial interpretation for identifying the 2008 global financial crisis periods.

Fu, B., Xu, G., Cao, L., Wang, Z. & Wu, Z. 2015, 'Coupling multiple views of relations for recommendation', Advances in Knowledge Discovery and Data Mining - LNCS, Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer, Ho Chi Minh City, Vietnam, pp. 732-743.

© Springer International Publishing Switzerland 2015. Learning user/item relations is a key issue in recommender systems, and existing methods mostly measure the user/item relation from one particular aspect, e.g., historical ratings. However, the relations between users/items could be influenced by multifaceted factors, so any single type of measure could get only a partial view of them. Thus it is more advisable to integrate measures from different aspects to estimate the underlying user/item relation. Furthermore, the estimation of the underlying user/item relation should be optimal for the current task. To this end, we propose a novel model to couple multiple relations measured on different aspects, and determine the optimal user/item relations via learning the optimal way of integrating these relation measures. Specifically, the matrix factorization model is extended in this paper by considering the relations between latent factors of different users/items. Experiments are conducted and our method shows good performance and outperforms other baseline methods.

Jiang, X., Liu, W., Cao, L. & Long, G. 2015, 'Coupled Collaborative Filtering for Context-aware Recommendation', AAAI Publications, Twenty-Ninth AAAI Conference on Artificial Intelligence, Student Abstracts, AAAI 2015, AAAI, Austin, Texas, USA, pp. 4172-4173.

Context-aware features have been widely recognized as important factors in recommender systems. However, as a major technique in recommender systems, traditional Collaborative Filtering (CF) does not provide a straightforward way of integrating context-aware information into personal recommendation. We propose a Coupled Collaborative Filtering (CCF) model to measure the contextual information and use it to improve recommendations. In the proposed approach, coupled similarity computation is designed to be calculated by inter-item, intra-context and inter-context interactions among item, user and context-aware factors. Experiments based on different types of CF models demonstrate the effectiveness of our design.

Li, F., Xu, G. & Cao, L. 2015, 'Coupled Matrix Factorization within Non-IID Context', Proceedings, Part II, 19th Pacific-Asia Conference, PAKDD 2015, Springer, Ho Chi Minh City, Vietnam, pp. 707-719.

Recommender systems research has experienced different stages such as from user preference understanding to content analysis. Typical recommendation algorithms were built on the following bases: (1) assuming users and items are IID, namely independent and identically distributed, and (2) focusing on specific aspects such as user preferences or contents. In reality, complex recommendation tasks involve and request (1) personalized outcomes to tailor heterogeneous subjective preferences; and (2) explicit and implicit objective coupling relationships between users, items, and ratings to be considered as intrinsic forces driving preferences. This inevitably involves the non-IID complexity and the need of combining subjective preference with objective couplings hidden in recommendation applications. In this paper, we propose a novel generic coupled matrix factorization (CMF) model by incorporating non-IID coupling relations between users and items. Such couplings integrate the intra-coupled interactions within an attribute and inter-coupled interactions among different attributes. Experimental results on two open data sets demonstrate that the user/item couplings can be effectively applied in RS and CMF outperforms the benchmark methods.

Liu, C. & Cao, L. 2015, 'A coupled k-nearest neighbor algorithm for multi-label classification', Advances in Knowledge Discovery and Data Mining - LNCS, Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer, Ho Chi Minh City, Vietnam, pp. 176-187.

© Springer International Publishing Switzerland 2015. ML-kNN is a well-known algorithm for multi-label classification. Although effective in some cases, ML-kNN has some defects due to the fact that it is a binary relevance classifier which only considers one label at a time. In this paper, we present a new method for multi-label classification, which is based on lazy learning approaches to classify an unseen instance on the basis of its k nearest neighbors. By introducing the coupled similarity between class labels, the proposed method exploits the correlations between class labels, which overcomes the shortcoming of ML-kNN. Experiments on benchmark data sets show that our proposed Coupled Multi-Label k Nearest Neighbor algorithm (CML-kNN) outperforms some existing multi-label classification algorithms.

Shao, J., Yin, J., Liu, W. & Cao, L. 2015, 'Actionable Combined High Utility Itemset Mining', AAAI'15 Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, AAAI Conference on Artificial Intelligence Pages, pp. 4206-4207.

Shao, J., Yin, J., Liu, W. & Cao, L. 2015, 'Mining Actionable Combined Patterns of High Utility and Frequency', Proceedings of the IEEE International Conference on Data Science and Advanced Analytics, IEEE International Conference on Data Science and Advanced Analytics, IEEE, Paris, pp. 1-10.

In recent years, the importance of identifying actionable patterns has become increasingly recognized so that decision-support actions can be inspired by the resultant patterns. A typical shift is on identifying high utility rather than highly frequent patterns. Accordingly, High Utility Itemset (HUI) Mining methods have become quite popular as well as faster and more reliable than before. However, the current research focus has been on improving efficiency, while the coupling relationships between items are ignored. It is important to study the item and itemset couplings inbuilt in the data. For example, the utility of one itemset might be lower than a user-specified threshold until an additional itemset joins it; conversely, an item's utility might be high until another one joins in. In this way, even though some absolutely high utility itemsets can be discovered, it is often easy to see that many redundant itemsets sharing the same item are mined (e.g., if the utility of a diamond is high enough, all its supersets turn out to be HUIs). Such itemsets are not actionable, and sellers cannot make higher profit if marketing strategies are created on top of such findings. To this end, here we introduce a new framework for mining actionable high utility association rules, called Combined Utility-Association Rules (CUAR), which aims to find high utility and strong association of itemset combinations incorporating item/itemset relations. Experiments on both real and synthetic datasets show that the algorithm is efficient.
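The redundancy issue mentioned in this abstract can be illustrated with a minimal sketch. It assumes the usual high utility itemset definition (the utility of an itemset is the sum of quantity times unit profit over the transactions that contain every item of the itemset). It is not the CUAR algorithm from the paper; the function name `itemset_utility`, the toy transactions, and the profit table are hypothetical.

```python
def itemset_utility(transactions, unit_profit, itemset):
    """Sum quantity * unit profit of the itemset's items over all
    transactions that contain every item of the itemset."""
    items = set(itemset)
    total = 0
    for transaction in transactions:
        # A transaction maps item -> purchased quantity.
        if items.issubset(transaction):
            total += sum(transaction[i] * unit_profit[i] for i in items)
    return total


# Hypothetical toy data: one expensive item and two cheap ones.
transactions = [
    {"diamond": 1, "box": 1},
    {"diamond": 1, "ribbon": 2},
    {"box": 3},
]
unit_profit = {"diamond": 100, "box": 1, "ribbon": 2}

for candidate in ({"diamond"}, {"diamond", "box"}, {"diamond", "ribbon"}, {"box"}):
    print(sorted(candidate), itemset_utility(transactions, unit_profit, candidate))
```

With a utility threshold of 100 in this toy data, `{diamond}`, `{diamond, box}` and `{diamond, ribbon}` all qualify (utilities 200, 101 and 104), although the two supersets owe almost all of their utility to the diamond alone. This is the kind of redundant result the abstract describes as not actionable.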

Wei, C., Hu, L. & Cao, L. 2015, 'Deep Modeling Complex Couplings within Financial Markets', AAAI'15 Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, AAAI Conference on Artificial Intelligence, AAAI Press, pp. 2518-2524.

The global financial crisis of 2008, its contagion to other regions, and its long-lasting impact on different markets show that it is increasingly important to understand the complicated coupling relationships across financial markets. This is indeed very difficult as complex hidden coupling relationships exist between different financial markets in various countries, which are very hard to model. The couplings involve interactions between homogeneous markets from various countries (which we call intra-market coupling), interactions between heterogeneous markets (inter-market coupling) and interactions between current and past market behaviors (temporal coupling). Very limited work has been done towards modeling such complex couplings, whereas some existing methods predict market movement by simply aggregating indicators from various markets while ignoring the inbuilt couplings. As a result, these methods are highly sensitive to observations, and may often fail when financial indicators change slightly. In this paper, a coupled deep belief network is designed to accommodate the above three types of couplings across financial markets. With a deep-architecture model to capture the high-level coupled features, the proposed approach can infer market trends. Experimental results on data of stock and currency markets from three countries show that our approach outperforms other baselines, from both technical and business perspectives.

Yu, P.S., Cao, L., Zeng, Y., An, B., Symeonidis, A.L., Gorodetsky, V. & Coenen, F. 2015, 'Message from the workshop chairs - 2014 International Workshop on Agents and Data Mining Interaction (ADMI 2014)', Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science), 2014 International Workshop on Agents and Data Mining Interaction (ADMI 2014), Springer, Paris, France, pp. v-vi.

We are pleased to welcome you to the proceedings of the 2014 International Workshop on Agents and Data Mining Interaction (ADMI 2014), held jointly with AAMAS 2014. In recent years, agents and data mining interaction (ADMI, or agent mining) has emerged as a very promising research field. Following the success of previous ADMIs, ADMI 2014 provided a premier forum for sharing research and engineering results, as well as potential challenges and prospects encountered in the coupling between agents and data mining.

Yue, X., Cao, L., Chen, Y. & Xu, B. 2015, 'Multi-View Actionable Patterns for Managing Traffic Bottleneck', Artificial Intelligence for Transportation: Advice, Interactivity and Actor Modeling: Papers from the 2015 AAAI Workshop, AAAI Conference on Artificial Intelligence, Austin, Texas, USA, pp. 64-70.

Discovering congestion patterns from table-formed traffic reports is critical for traffic bottleneck analysis. However, patterns mined by existing algorithms often do not satisfy user requirements and are not actionable for traffic management. Traffic officers may not pursue the most frequent patterns but expect mining outcomes showing the dependence between congestion and various kinds of road properties for traffic planning. Such multi-view analysis requires integrating user preferences for data attributes into the pattern mining process. To tackle this problem, we propose a multi-view attributes reduction model for discovering the patterns of user interests, in which user views are interpreted as preferred attributes and formulated by attribute orders. Based on the pattern discovery model, a workflow is built for traffic bottleneck analysis, which consists of data preprocessing, preference representation and congestion pattern mining. Our approach is validated on the reports of road conditions from Shanghai, which shows that the resultant multi-view findings are effective for analyzing congestion causes and traffic management.

Zhou, X., Chen, L., Zhang, Y., Cao, L., Huang, G. & Wang, C. 2015, 'Online Video Recommendation in Sharing Community', SIGMOD '15 Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, ACM SIGMOD International Conference on Management of Data, ACM, New York, pp. 1645-1656.

The creation of sharing communities has resulted in an astonishing increase in digital videos and in their wide application in domains such as entertainment and online news broadcasting. The improvement of these applications relies on effective solutions for social user access to video data. This fact has driven the recent research interest in social recommendation in shared communities. Although some effort has been put into video recommendation in shared communities, the contextual information on social users has not been well exploited for effective recommendation. In this paper, we propose an approach based on the content and social information of videos for recommendation in sharing communities. Specifically, we first exploit a robust video cuboid signature together with the Earth Mover's Distance to capture the content relevance of videos. Then, we propose to identify the social relevance of clips using the set of users belonging to a video. We fuse the content relevance and social relevance to identify the relevant videos for recommendation. Following that, we propose a novel scheme called sub-community-based approximation together with a hash-based optimization for improving the efficiency of our solution. Finally, we propose an algorithm for efficiently maintaining the social updates in dynamic shared communities. Extensive experiments demonstrate the high effectiveness and efficiency of our proposed video recommendation approach.

Chapters

Müller, J.P., Yu, P.S., Cao, L., Zeng, Y., Symeonidis, A.L. & Gorodetsky, V. 2014, 'Preface' in Agents and Data Mining Interaction, pp. VI-VI.

Journal articles

Cao, L. 2014, 'Non-IIDness Learning in Behavioral and Social Data', The Computer Journal, vol. 57, no. 9, pp. 1358-1370.

Cao, L. & Joachims, T. 2014, 'Behavior Computing', IEEE Intelligent Systems, vol. 29, no. 4, pp. 62-66.

Cao, L. & Xu, G. 2014, 'Behavior Informatics: A New Perspective', IEEE Intelligent Systems, vol. 29, no. 4, pp. 62-80.

Deng, Z., Choi, K.S., Cao, L. & Wang, S. 2014, 'T2fela: Type-2 fuzzy extreme learning algorithm for fast training of interval type-2 TSK fuzzy logic system', IEEE Transactions on Neural Networks and Learning Systems, vol. 25, no. 4, pp. 664-676.

A challenge in modeling type-2 fuzzy logic systems is the development of efficient learning algorithms to cope with the ever increasing size of real-world data sets. In this paper, the extreme learning strategy is introduced to develop a fast training algorithm for interval type-2 Takagi-Sugeno-Kang fuzzy logic systems. The proposed algorithm, called type-2 fuzzy extreme learning algorithm (T2FELA), has two distinctive characteristics. First, the parameters of the antecedents are randomly generated and parameters of the consequents are obtained by a fast learning method according to the extreme learning mechanism. In addition, because the obtained parameters are optimal in the sense of minimizing the norm, the resulting fuzzy systems exhibit better generalization performance. The experimental results clearly demonstrate that the training speed of the proposed T2FELA algorithm is superior to that of the existing state-of-the-art algorithms. The proposed algorithm also shows competitive performance in generalization abilities. © 2013 IEEE.

Fan, X., Cao, L. & Xu, R.Y.D. 2014, 'Dynamic Infinite Mixed-Membership Stochastic Blockmodel', IEEE Transactions on Neural Networks and Learning Systems.

Directional and pairwise measurements are often used to model interactions in a social network setting. The mixed-membership stochastic blockmodel (MMSB) was a seminal work in this area, and its ability has been extended. However, models such as MMSB face particular challenges in modeling dynamic networks, for example, with the unknown number of communities. Accordingly, this paper proposes a dynamic infinite mixed-membership stochastic blockmodel, a generalized framework that extends the existing work to potentially infinite communities inside a network in dynamic settings (i.e., networks are observed over time). Additional model parameters are introduced to reflect the degree of persistence among one's memberships at consecutive time stamps. Under this framework, two specific models, namely mixture time variant and mixture time invariant models, are proposed to depict two different time correlation structures. Two effective posterior sampling strategies and their results are presented, respectively, using synthetic and real-world data.

Fu, B., Wang, Z., Xu, G. & Cao, L. 2014, 'Multi-label learning based on iterative label propagation over graph', Pattern Recognition Letters, vol. 42, no. 1, pp. 85-90.

Liu, B., Xiao, Y., Yu, P.S., Cao, L., Zhang, Y. & Hao, Z. 2014, 'Uncertain One-Class Learning and Concept Summarization Learning on Uncertain Data Streams', IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 2, pp. 468-484.

This paper presents a novel framework for uncertain one-class learning and concept summarization learning on uncertain data streams. Our proposed framework consists of two parts. First, we put forward uncertain one-class learning to cope with data of uncertainty. We first propose a local kernel-density-based method to generate a bound score for each instance, which refines the location of the corresponding instance, and then construct an uncertain one-class classifier (UOCC) by incorporating the generated bound score into a one-class SVM-based learning phase. Second, we propose a support vectors (SVs)-based clustering technique to summarize the concept of the user from the history chunks by representing the chunk data using support vectors of the uncertain one-class classifier developed on each chunk, and then extend the k-means clustering method to cluster history chunks into clusters so that we can summarize the concept from the history chunks. Our proposed framework explicitly addresses the problem of one-class learning and concept summarization learning on uncertain one-class data streams. Extensive experiments on uncertain data streams demonstrate that our proposed uncertain one-class learning method performs better than others, and our concept summarization method can summarize the evolving interests of the user from the history chunks.

Liu, B., Xiao, Y., Yu, P.S., Hao, Z. & Cao, L. 2014, 'An efficient orientation distance–based discriminative feature extraction method for multi-classification', Knowledge and Information Systems, vol. 39, no. 2, pp. 409-433.

Feature extraction is an important step before actual learning. Although many feature extraction methods have been proposed for clustering, classification and regression, very limited work has been done on multi-class classification problems. This paper proposes a novel feature extraction method, called orientation distance–based discriminative (ODD) feature extraction, particularly designed for multi-class classification problems. Our proposed method works in two steps. In the first step, we extend the Fisher Discriminant idea to determine an appropriate kernel function and map the input data with all classes into a feature space where the classes of the data are well separated. In the second step, we put forward two variants of ODD features, i.e., one-vs-all-based ODD and one-vs-one-based ODD features. We first construct hyper-plane (SVM) based on one-vs-all scheme or one-vs-one scheme in the feature space; we then extract one-vs-all-based or one-vs-one-based ODD features between a sample and each hyper-plane. These newly extracted ODD features are treated as the representative features and are thereafter used in the subsequent classification phase. Extensive experiments have been conducted to investigate the performance of one-vs-all-based and one-vs-one-based ODD features for multi-class classification. The statistical results show that the classification accuracy based on ODD features outperforms that of the state-of-the-art feature extraction methods.

Liu, B., Xiao, Y.S., Yu, P.S., Hao, Z.F. & Cao, L.B. 2014, 'An Efficient Approach for Outlier Detection with Imperfect Data Labels', IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 7, pp. 1602-1616.

The task of outlier detection is to identify data objects that are markedly different from or inconsistent with the normal set of data. Most existing solutions typically build a model using the normal data and identify outliers that do not fit the represented model very well. However, in addition to normal data, there also exist limited negative examples or outliers in many applications, and data may be corrupted such that the outlier detection data is imperfectly labeled. These conditions make outlier detection far more difficult than in the traditional setting. This paper presents a novel outlier detection approach to address data with imperfect labels and incorporate limited abnormal examples into learning. To deal with data with imperfect labels, we introduce likelihood values for each input example, which denote the degree of membership of the example toward the normal and abnormal classes respectively. Our proposed approach works in two steps. In the first step, we generate a pseudo training dataset by computing likelihood values of each example based on its local behavior. We present a kernel k-means clustering method and a kernel LOF-based method to compute the likelihood values. In the second step, we incorporate the generated likelihood values and limited abnormal examples into an SVDD-based learning framework to build a more accurate classifier for global outlier detection. By integrating local and global outlier detection, our proposed method explicitly handles data with imperfect labels and enhances the performance of outlier detection. Extensive experiments on real life datasets have demonstrated that our proposed approaches can achieve a better tradeoff between detection rate and false alarm rate as compared to state-of-the-art outlier detection approaches.

Liu, H.-D., Yang, M., Gao, Y. & Cao, L. 2014, 'Fast Local Histogram Specification', IEEE Transactions on Circuits and Systems for Video Technology, vol. 24, no. 11, pp. 1833-1843.

Local histogram specification (LHS) is a useful technique for image processing. However, LHS faces a critical computational challenge when it is applied to high-resolution high-precision images. The calculation of the values in the cumulative distribution function (CDF) and the mapped value for the central pixel in each sliding window is time consuming with the computational complexity O(s + L) of the state-of-the-art techniques, where s is the side length of the square window and L is the number of gray levels. In this paper, we propose a fast algorithm for LHS, called fast local histogram specification (FLHS). FLHS reduces the complexity of calculating the CDF value for the central pixel in each sliding window to O(s + √L), and the time complexity for the mapping procedure in each window to O(log L). This results in the overall time complexity of LHS reduced from O(s + L) to O(s + √L) in each sliding window. Theoretical analysis shows that the newly developed algorithm is efficient. Experimental results on the 8-bit and high-resolution high-precision (16-bit) images demonstrate the efficiency of our proposed algorithm.
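The per-window computation described in this abstract (a local CDF value for the central pixel, followed by a mapping into the target histogram) can be sketched as below. This is a naive baseline under stated assumptions, not the FLHS algorithm: it rebuilds the window histogram from scratch instead of updating it incrementally, and the function name `specify_center` and the toy inputs are hypothetical. The binary search in the last step mirrors the O(log L) mapping cost mentioned in the abstract.

```python
from bisect import bisect_left
from itertools import accumulate


def specify_center(window, target_hist):
    """Return the mapped gray level for the central pixel of a square window.

    window      -- s x s list of lists of integer gray levels in [0, L)
    target_hist -- list of L non-negative counts describing the desired
                   histogram (its total must be positive)
    """
    levels = len(target_hist)

    # Local histogram of the window, rebuilt from scratch: O(s^2 + L).
    hist = [0] * levels
    for row in window:
        for value in row:
            hist[value] += 1

    src_cum = list(accumulate(hist))         # unnormalised source CDF
    tgt_cum = list(accumulate(target_hist))  # unnormalised target CDF
    src_total, tgt_total = src_cum[-1], tgt_cum[-1]

    center = window[len(window) // 2][len(window) // 2]

    # Smallest level g with tgt_cum[g] / tgt_total >= src_cum[center] / src_total.
    # Cross-multiplying keeps the comparison in exact integer arithmetic.
    key = src_cum[center] * tgt_total
    scaled_target = [c * src_total for c in tgt_cum]
    return bisect_left(scaled_target, key)  # binary search: O(log L)


# Hypothetical 3 x 3 window with L = 4 gray levels and a uniform target.
window = [[0, 0, 1],
          [1, 2, 3],
          [3, 3, 3]]
print(specify_center(window, [1, 1, 1, 1]))
```

In the toy window the central pixel has gray level 2 and a local CDF value of 5/9, so a uniform target over four levels maps it to level 2, the first level whose target CDF (0.75) reaches 5/9.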

Tu, E., Cao, L., Yang, J. & Kasabov, N. 2014, 'A novel graph-based k-means for nonlinear manifold clustering and representative selection', Neurocomputing, vol. 143, pp. 109-122.

Wang, C., Cao, L. & Miao, B. 2014, 'Optimal feature selection for sparse linear discriminant analysis and its applications in gene expression data', Computational Statistics and Data Analysis, vol. 66, pp. 140-149.

This work studies the theoretical rules of feature selection in linear discriminant analysis (LDA), and a new feature selection method is proposed for sparse linear discriminant analysis. An l1 minimization method is used to select the important features from which the LDA will be constructed. The asymptotic results of this proposed two-stage LDA (TLDA) are studied, demonstrating that TLDA is an optimal classification rule whose convergence rate is the best compared to existing methods. The experiments on simulated and real datasets are consistent with the theoretical results and show that TLDA performs favorably in comparison with current methods. Overall, TLDA uses a lower minimum number of features or genes than other approaches to achieve a better result with a reduced misclassification rate.

Wang, C., Cao, L., Gaussier, E., Li, J., Ou, Y. & Luo, D. 2014, 'Coupled Behavior Representation, Modeling, Analysis, and Reasoning', IEEE Intelligent Systems, vol. 29, no. 4, pp. 66-69.

Wang, C., Dong, X., Zhou, F., Cao, L. & Chi, C.H. 2014, 'Coupled Attribute Similarity Learning on Categorical Data', IEEE Transactions on Neural Networks and Learning Systems.

Attribute independence has been taken as a major assumption in the limited research that has been conducted on similarity analysis for categorical data, especially unsupervised learning. However, in real-world data sources, attributes are more or less associated with each other in terms of certain coupling relationships. Accordingly, recent works on attribute dependency aggregation have introduced the co-occurrence of attribute values to explore attribute coupling, but they only present a local picture in analyzing categorical data similarity. This is inadequate for deep analysis, and the computational complexity grows exponentially when the data scale increases. This paper proposes an efficient data-driven similarity learning approach that generates a coupled attribute similarity measure for nominal objects with attribute couplings to capture a global picture of attribute similarity. It involves the frequency-based intra-coupled similarity within an attribute and the inter-coupled similarity upon value co-occurrences between attributes, as well as their integration on the object level. In particular, four measures are designed for the inter-coupled similarity to calculate the similarity between two categorical values by considering their relationships with other attributes in terms of power set, universal set, joint set, and intersection set. The theoretical analysis reveals the equivalent accuracy and superior efficiency of the measure based on the intersection set, particularly for large-scale data sets. Intensive experiments of data structure and clustering algorithms incorporating the coupled dissimilarity metric achieve a significant performance improvement on state-of-the-art measures and algorithms on 13 UCI data sets, which is confirmed by the statistical analysis. The experiment results show that the proposed coupled attribute similarity is generic, and can effectively and efficiently capture the intrinsic and global interactions within and between attr...
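A minimal sketch of an inter-coupled similarity over the intersection set is given below. The abstract does not state the formula, so the form used here is an assumption for illustration: two values of attribute j are compared by summing, over the values of another attribute k that co-occur with both, the smaller of the two conditional probabilities. The function names and the toy rows are hypothetical, and the intra-coupled part and the object-level integration are omitted.

```python
from collections import Counter, defaultdict


def conditional_distributions(rows, j, k):
    """Return {value of attribute j: {value of attribute k: P(k-value | j-value)}}."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[j]][row[k]] += 1
    return {
        x: {w: c / sum(co.values()) for w, c in co.items()}
        for x, co in counts.items()
    }


def inter_coupled_similarity(rows, j, k, x, y):
    """Assumed intersection-set form: sum of min(P(w|x), P(w|y)) over the
    values w of attribute k that co-occur with both x and y."""
    cond = conditional_distributions(rows, j, k)
    px, py = cond.get(x, {}), cond.get(y, {})
    shared = set(px) & set(py)  # only co-occurring values contribute
    return sum(min(px[w], py[w]) for w in shared)


# Hypothetical categorical data: column 0 is attribute j, column 1 is attribute k.
rows = [("A", "p"), ("A", "q"), ("B", "p"), ("B", "p"), ("C", "r")]
print(inter_coupled_similarity(rows, 0, 1, "A", "B"))
print(inter_coupled_similarity(rows, 0, 1, "A", "C"))
```

Restricting the sum to shared values means that values of attribute k seen with only one of x and y never need to be enumerated, which is consistent with the efficiency argument the abstract makes for the intersection-set measure.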

Wang, C., Tong, T., Cao, L. & Miao, B. 2014, 'Non-parametric shrinkage mean estimation for quadratic loss functions with unknown covariance matrices', Journal of Multivariate Analysis, vol. 125, pp. 222-232.

In this paper, a shrinkage estimator for the population mean is proposed under known quadratic loss functions with unknown covariance matrices. The new estimator is non-parametric in the sense that it does not assume a specific parametric distribution for the data and it does not require the prior information on the population covariance matrix. Analytical results on the improvement of the proposed shrinkage estimator are provided and some corresponding asymptotic properties are also derived. Finally, we demonstrate the practical improvement of the proposed method over existing methods through extensive simulation studies and real data analysis.

Xiao, Y., Liu, B., Hao, Z. & Cao, L. 2014, 'A K-Farthest-Neighbor-based approach for support vector data description', Applied Intelligence, vol. 41, no. 1, pp. 196-211.

Xiao, Y.S., Liu, B., Hao, Z.F. & Cao, L.B. 2014, 'A Similarity-Based Classification Framework for Multiple-Instance Learning', IEEE Transactions on Cybernetics, vol. 44, no. 4, pp. 500-515.

Multiple-instance learning (MIL) is a generalization of supervised learning that attempts to learn useful information from bags of instances. In MIL, the true labels of instances in positive bags are not available for training. This leads to a critical challenge, namely, handling the instances of which the labels are ambiguous (ambiguous instances). To deal with these ambiguous instances, we propose a novel MIL approach, called similarity-based multiple-instance learning (SMILE). Instead of eliminating a number of ambiguous instances in positive bags from training the classifier, as done in some previous MIL works, SMILE explicitly deals with the ambiguous instances by considering their similarity to the positive class and the negative class. Specifically, a subset of instances is selected from positive bags as the positive candidates and the remaining ambiguous instances are associated with two similarity weights, representing the similarity to the positive class and the negative class, respectively. The ambiguous instances, together with their similarity weights, are thereafter incorporated into the learning phase to build an extended SVM-based predictive classifier. A heuristic framework is employed to update the positive candidates and the similarity weights for refining the classification boundary. Experiments on real-world datasets show that SMILE demonstrates highly competitive classification accuracy and shows less sensitivity to labeling noise than the existing MIL methods.

Xu, Z., Zhang, Y. & Cao, L. 2014, 'Social Image Analysis From a Non-IID Perspective', IEEE Transactions on Multimedia, vol. 16, no. 7, pp. 1986-1998.

An image in social media, termed a social image, exhibits characteristics different from images widely discussed in image processing. They can be described by both content and social related attributes, called social image attributes, including visual contents, users, tags, and timestamps. There are strong coupling relationships between social image attributes, which make social images not independent and identically distributed (non-IID). By analyzing the relationships among these attributes, we can better understand the semantic activities conducted on such non-IID social images, hence enabling new applications including content organization, recommendation, and social activity understanding. In this article, we present a novel algorithm to analyze the coupling relationships between social images, which involves not only intra-coupled similarity within a social image attribute, but also inter-coupled similarity between attributes, in analyzing the non-IIDness of the similarity between social images. In particular, we propose a multi-entry version of the coupled similarity metric to deal with attributes (i.e., tags) which have a many-to-one relationship with respect to images. Experimental results on a Flickr group dataset show that the proposed algorithm captures coupling relationships and therefore achieves promising results in various applications, including image clustering and tagging.

Yang, W., Gao, Y., Cao, L., Yang, M. & Shi, Y. 2014, 'mPadal: a joint local-and-global multi-view feature selection method for activity recognition', Applied Intelligence, vol. 41, no. 3, pp. 776-790.

© 2014, Springer Science+Business Media New York. The selection of multi-view features plays an important role for classifying multi-view data, especially the data with high dimension. In this paper, a novel multi-view feature selection method via joint local pattern-discrimination and global label-relevance analysis (mPadal) is proposed. Different from the previous methods which globally select the multi-view features directly via view-level analysis, the proposed mPadal employs a new joint local-and-global way. In the local selection phase, the pattern-discriminative features will be first selected by considering the local neighbor structure of the most discriminative patterns. In the global selection phase, the features with the topmost label-relevance, which can well separate different classes in the current view, are selected. Finally, the two parts selected are combined to form the final features. Experimental results show that compared with several baseline methods in publicly available activity recognition dataset IXMAS, mPadal performs the best in terms of the highest accuracy, precision, recall and F1 score. Moreover, the features selected by mPadal are highly complementary among views for classification, which is able to improve the classification performance according to previous theoretical studies.

Yue, X.D., Miao, D.Q., Cao, L.B., Wu, Q. & Chen, Y.F. 2014, 'An efficient color quantization based on generic roughness measure', Pattern Recognition, vol. 47, no. 4, pp. 1777-1789.

Zhu, L., Cao, L., Yang, J. & Lei, J. 2014, 'Evolving soft subspace clustering', Applied Soft Computing, vol. 14, no. b, pp. 210-228.

Conferences

Deng, Z., Jiang, Y., Cao, L. & Wang, S. 2014, 'Knowledge-leverage based TSK fuzzy system with improved knowledge transfer', IEEE International Conference on Fuzzy Systems, pp. 178-185.

© 2014 IEEE. In this study, an improved knowledge-leverage based TSK fuzzy system modeling method is proposed in order to overcome the weaknesses of the knowledge-leverage based TSK fuzzy system (TSK-FS) modeling method. In particular, two improved knowledge-leverage strategies have been introduced for the parameter learning of the antecedents and consequents, respectively, of the TSK-FS constructed in the current scene by transfer learning from the reference scene. With the improved knowledge-leverage learning abilities, the proposed method shows a more adaptive modeling effect than traditional TSK fuzzy modeling methods and some related methods on synthetic and real-world datasets.

Hu, L., Cao, J., Xu, G., Cao, L., Gu, Z. & Cao, W. 2014, 'Deep modeling of group preferences for group-based recommendation', Proceedings of the National Conference on Artificial Intelligence, AI Access Foundation, pp. 1861-1867.

Nowadays, most recommender systems (RSs) mainly aim to suggest appropriate items for individuals. Due to the social nature of human beings, group activities have become an integral part of our daily life, thus motivating the study on group RS (GRS). However, most existing methods used by GRS make recommendations through aggregating individual ratings or individual predictive results rather than considering the collective features that govern user choices made within a group. As a result, such methods are heavily sensitive to data, hence they often fail to learn group preferences when the data are slightly inconsistent with predefined aggregation assumptions. To this end, we devise a novel GRS approach which accommodates both individual choices and group decisions in a joint model. More specifically, we propose a deep-architecture model built with collective deep belief networks and dual-wing restricted Boltzmann machines. With such a deep model, we can use high-level features, which are induced from lower-level features, to represent group preference so as to relieve the vulnerability of data. Finally, the experiments conducted on a real-world dataset prove the superiority of our deep model over other state-of-the-art methods.

Hu, L., Cao, W., Cao, J., Xu, G., Cao, L. & Gu, Z. 2014, 'Bayesian Heteroskedastic Choice Modeling on Non-identically Distributed Linkages', Proceedings of the 2014 IEEE International Conference on Data Mining, 2014 IEEE International Conference on Data Mining, IEEE, Shenzhen, China, pp. 851-856.

Choice modeling (CM) aims to describe and predict choices according to attributes of subjects and options. If we regard each choice as the formation of a link between subjects and options, CM can immediately be bridged to the link analysis and prediction (LAP) problem. However, such a mapping is often neither trivial nor straightforward. In LAP problems, the only available observations are links among objects, while their attributes are often inaccessible. Therefore, we extend CM into a latent feature space to avoid the need for explicit attributes. Moreover, LAP is usually based on a binary linkage assumption that models observed links as positive instances and unobserved links as negative instances. Instead, we use a weaker assumption that treats unobserved links as pseudo negative instances. Furthermore, most subjects or options may be quite heterogeneous due to the long-tail distribution, which conventional LAP approaches fail to capture. To address the above challenges, we propose a Bayesian heteroskedastic choice model to represent the non-identically distributed linkages in the LAP problems. Finally, the empirical evaluation on real-world datasets proves the superiority of our approach.

Li, F., Xu, G. & Cao, L. 2014, 'Coupled Item-Based Matrix Factorization', Proceedings, Part I of the Web Information Systems Engineering - WISE 2014 - 15th International Conference, Web Information Systems Engineering, Springer, Thessaloniki, Greece, pp. 1-14.

The essence of the cold start and sparsity challenges in Recommender Systems (RS) is that the extant techniques, such as Collaborative Filtering (CF) and Matrix Factorization (MF), mainly rely on the user-item rating matrix, which sometimes is not informative enough for predicting recommendations. To solve these challenges, the objective item attributes are incorporated as complementary information. However, most of the existing methods for inferring the relationships between items assume that the attributes are “independently and identically distributed (iid)”, which does not always hold in reality. In fact, the attributes are more or less coupled with each other by some implicit relationships. Therefore, in this paper we propose an attribute-based coupled similarity measure to capture the implicit relationships between items. We then integrate the implicit item coupling into MF to form the Coupled Item-based Matrix Factorization (CIMF) model. Experimental results on two open data sets demonstrate that CIMF outperforms the benchmark methods.

Li, M., Li, J., Ou, Y., Zhang, Y., Luo, D., Bahtia, M. & Cao, L. 2012, 'Coupled K-nearest centroid classification for non-iid data', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Transactions on Computational Collective Intelligence XV: International Conference on Practical Applications on Agents and Multi-Agent Systems, Springer Verlag, Salamanca, pp. 89-100.

Most traditional classification methods assume that objects, attributes and values are independent and identically distributed (iid). However, real-world data, such as multi-agent data and behavioral data, usually contains strong couplings among values, attributes and objects, which greatly challenges existing methods and tools. This work targets the coupling similarities from these three perspectives and designs a novel classification method that applies a weighted K-Nearest Centroid to obtain the coupled similarity for non-iid data. From the value and attribute perspectives, coupled similarity serves as a metric for nominal objects, which considers not only the intra-coupled similarity within an attribute but also the inter-coupled similarity between attributes. From the object perspective, we propose a more effective method that measures the centroid object by connecting all related objects. Extensive experiments on UCI and student data sets reveal that the proposed method achieves higher accuracy than classical methods, especially on imbalanced data.

Li, M., Li, J., Ou, Y., Zhang, Y., Luo, D., Bahtia, M. & Cao, L. 2014, 'Learning heterogeneous coupling relationships between non-IID terms', Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), pp. 79-91.

With the rapid proliferation of social media and online communities, a vast amount of text data has been generated. As discovering the value of this text data has grown in importance, a variety of text mining and processing algorithms have been created in recent years for tasks such as classification, clustering and similarity comparison. Most previous research uses a vector-space model for text representation and analysis. However, the vector-space model does not utilise information about the relationships between terms. Moreover, classic classification methods also ignore the relationships between text documents. In other words, traditional text mining techniques assume that terms, and likewise documents, are independent and identically distributed (iid). In this paper, we introduce a novel term representation that involves the coupled relations between terms. This coupled representation provides much richer information, which enables us to create a coupled similarity metric for measuring document similarity; a K-nearest centroid classifier based on this coupled document similarity is then applied to the classification task. Experiments verify that the proposed approach outperforms the classic vector-space based classifier, and show potential advantages and richness for exploring other text mining tasks. © 2014 Springer-Verlag.

Liu, C., Cao, L. & Yu, P.S. 2014, 'A hybrid coupled k-nearest neighbor algorithm on imbalance data', Proceedings of the International Joint Conference on Neural Networks, pp. 2011-2018.

© 2014 IEEE. State-of-the-art classification algorithms rarely consider the relationships between the attributes in a data set and assume that the attributes are independent of each other (IID). However, in real-world data, these attributes more or less interact via explicit or implicit relationships. Although classifiers for class-balanced data are relatively well developed, the classification of class-imbalanced data is not straightforward, especially for mixed-type data which has both categorical and numerical features. Limited research has been conducted on class-imbalanced data. Some algorithms mainly synthesize or remove instances to force the sizes of the classes to be comparable, which may change the inherent data structure or introduce noise into the source data. Distance- or similarity-based algorithms, in turn, ignore the relationships between features when computing similarity. This paper proposes a hybrid coupled k-nearest neighbor classification algorithm (HC-kNN) for mixed-type data, which discretizes numerical features so that the inter-coupling similarity can be applied to them as it is to categorical features, and then combines this coupled similarity with the original similarity or distance to overcome the shortcomings of previous algorithms. The experimental results demonstrate that the proposed algorithm achieves a higher average performance than the relevant algorithms (e.g. the variants of kNN, Decision Tree, SMOTE and NaiveBayes).

Liu, C., Cao, L. & Yu, P.S. 2014, 'Coupled fuzzy k-nearest neighbors classification of imbalanced non-IID categorical data', Proceedings of the International Joint Conference on Neural Networks, pp. 1122-1129.

© 2014 IEEE. Mining imbalanced data has recently received increasing attention due to its challenges and wide applications in the real world. Most of the existing work focuses on numerical data, either by manipulating the data structure, which essentially changes the data characteristics, or by developing new distance or similarity measures designed for data under the so-called IID assumption, namely that data is independent and identically distributed. This is not consistent with real-life data and business needs, which require fully respecting the data structure and coupling relationships embedded in data objects, features and feature values. In this paper, we propose a novel coupled fuzzy similarity-based classification approach that caters for the difference between classes through a fuzzy membership and for the couplings through coupled object similarity, and incorporate them into the most popular classifier, kNN, to form a coupled fuzzy kNN (i.e. CF-kNN). We test the approach on 14 categorical data sets against several kNN variants and classic classifiers including C4.5 and NaiveBayes. The experimental results show that CF-kNN outperforms the baselines, and that classifiers incorporating the proposed coupled fuzzy similarity perform better than their original versions.

Meng, X., Cao, L. & Shao, J. 2014, 'Semantic approximate keyword query based on keyword and query coupling relationship analysis', CIKM 2014 - Proceedings of the 2014 ACM International Conference on Information and Knowledge Management, pp. 529-538.

Due to imprecise query intention, Web database users often use a limited number of keywords that are not directly related to their precise query to search for information. Semantic approximate keyword query is challenging but helpful for specifying such query intent and providing more relevant answers. By extracting the semantic relationships both between keywords and between keyword queries, this paper proposes a new keyword query approach which generates semantic approximate answers by identifying a set of keyword queries from the query history whose semantics are related to the given keyword query. To capture the semantic relationships between keywords, a semantic coupling relationship analysis model is introduced to model both the intra- and inter-keyword couplings. Building on the coupling relationships between keywords, the semantic similarity of different keyword queries is then measured by a semantic matrix. The representative queries in the query history are identified, and an a priori order of the remaining queries corresponding to each representative query is then created in an off-line preprocessing step. These representative queries and associated orders are then used to expeditiously generate top-k ranked semantically related keyword queries. We demonstrate that our coupling relationship analysis model can accurately capture the semantic relationships both between keywords and between queries. The efficiency of the top-k keyword query selection algorithm is also demonstrated.

Wei, W., Yin, J., Li, J. & Cao, L. 2014, 'Modelling Asymmetry and Tail Dependence among Multiple Variables by Using Partial Regular Vine', Proceedings of the 2014 SIAM International Conference on Data Mining, 2014 SIAM International Conference on Data Mining, SIAM, Philadelphia, USA, pp. 776-784.

Modeling high-dimensional dependence is widely studied to explore deep relations among multiple variables, and is particularly useful for financial risk assessment. Very often, existing high-dimensional dependence models impose strong restrictions on the dependence structure. These restrictions prevent the detection of sophisticated structures such as asymmetry and upper and lower tail dependence between multiple variables. The paper proposes a partial regular vine copula model to relax these restrictions. The new model employs partial correlation to construct the regular vine structure, which is algebraically independent. This model is also able to capture the asymmetric characteristics among multiple variables by using two-parametric copulas with flexible lower and upper tail dependence. Our method is tested on a cross-country stock market data set to analyse the asymmetry and tail dependence. The high prediction performance is examined by the Value at Risk, which is a commonly adopted evaluation measure in financial markets. Read More: http://epubs.siam.org/doi/abs/10.1137/1.9781611973440.89
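
The evaluation measure named in the abstract above, Value at Risk, can be illustrated with a minimal historical-simulation sketch. This is not the paper's partial regular vine copula model; the function name `historical_var`, the nearest-rank quantile rule and the sample returns are all assumptions made for this example.

```python
import math


def historical_var(returns, confidence=0.95):
    """Historical-simulation Value at Risk (illustrative sketch only).

    Returns the loss level that the observed losses do not exceed in
    roughly `confidence` of the cases, using the nearest-rank quantile.
    """
    if not returns:
        raise ValueError("returns must be non-empty")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    # Convert returns to losses (a negative return is a positive loss).
    losses = sorted(-r for r in returns)
    # Nearest-rank empirical quantile: the smallest loss such that at
    # least `confidence` of the observations are less than or equal to it.
    rank = math.ceil(confidence * len(losses))
    return losses[rank - 1]


# Hypothetical daily returns, used only to exercise the function.
sample_returns = [-0.05, -0.02, 0.01, 0.03, -0.01,
                  0.02, 0.00, 0.04, -0.03, 0.01]
var_90 = historical_var(sample_returns, confidence=0.9)
var_50 = historical_var(sample_returns, confidence=0.5)
```

A higher confidence level can only move the estimate further into the loss tail, so `var_90` is never smaller than `var_50` for the same sample.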

Yu, P.S., Kitsuregawa, M., Motoda, H., Goethals, B., Guo, M., Cao, L., Karypis, G., King, I. & Wang, W. 2014, 'Welcome from DSAA 2014 chairs', DSAA 2014 - Proceedings of the 2014 IEEE International Conference on Data Science and Advanced Analytics.

Chapters

Cao, L., Motoda, H., Srivastava, J., Lim, E.P., King, I., Yu, P.S., Nejdl, W., Xu, G., Li, G. & Zhang, Y. 2013, 'Preface' in Behavior and Social Computing, Springer, Germany, pp. v-vi.

Li, J., Cao, L., Wang, C., Tan, K.C. & Liu, B. 2013, 'Preface' in Li, J., Cao, L., Wang, C., Tan, K.C., Liu, B., Pei, J. & Tseng, V.S. (eds), Trends and Applications in Knowledge Discovery and Data Mining: PAKDD 2013 International Workshops: DMApps, DANTH, QIMIE, BDM, CDA, CloudSD, Gold Coast, QLD, Australia, April 14-17, 2013, Revised Selected Papers, pp. V-V.

Motoda, H., Wu, Z., Cao, L., Zaiane, O., Yao, M. & Wang, W. 2013, 'Preface' in Motoda, H., Wu, Z., Cao, L., Zaiane, O., Yao, M. & Wang, W. (eds), Advanced Data Mining and Applications: 9th International Conference, ADMA 2013, Hangzhou, China, December 14-16, 2013, Proceedings, Part I, pp. VI-VI.

Motoda, H., Wu, Z., Cao, L., Zaiane, O., Yao, M. & Wang, W. 2013, 'Preface' in Motoda, H., Wu, Z., Cao, L., Zaiane, O., Yao, M. & Wang, W. (eds), Advanced Data Mining and Applications: 9th International Conference, ADMA 2013, Hangzhou, China, December 14-16, 2013, Proceedings, Part II, Springer, pp. vi-vi.

Yu, P.S., Singh, M.P., Cao, L., Zeng, Y., Symeonidis, A.L. & Gorodetsky, V. 2013, 'Preface' in Cao, L., Zeng, Y., Symeonidis, A.L., Gorodetsky, V.I., Yu, P.S. & Singh, M.P. (eds), Agents and Data Mining Interaction: 8th International Workshop, ADMI 2012, Valencia, Spain, June 4-5, 2012, Revised Selected Papers, pp. V-VI.

Journal articles

Cao, L. 2013, 'Combined mining: Analyzing object and pattern relations for discovering and constructing complex yet actionable patterns', Wiley Interdisciplinary Reviews-Data Mining And Knowledge Discovery, vol. 3, no. 2, pp. 140-155.

Combined mining is a technique for analyzing object relations and pattern relations, and for extracting and constructing actionable knowledge (patterns or exceptions). Although combined patterns can be built within a single method, such as combined seque

Cao, L., Yu, P., Motoda, H. & Williams, G. 2013, 'Special issue on behavior computing', Knowledge And Information Systems, vol. 37, no. 2, pp. 245-249.

Jiang, F., Dong, D., Cao, L. & Frater, M.R. 2013, 'Agent-Based Self-Adaptable Context-Aware Network Vulnerability Assessment', IEEE Transactions on Network and Service Management, vol. 10, no. 3, pp. 255-270.

Immunology-inspired computer security has attracted enormous attention because of its potential impact on the next generation of service-oriented network operation systems. In this paper, we propose a new agent-based threat awareness assessment strategy, inspired by the human immune system, that adapts dynamically against attacks. Specifically, this approach is based on the dynamic reconfiguration of file access rights for system calls or logs (e.g., file rewritability) with balanced adaptability and vulnerability. Based on an information-theoretic analysis of the coherent associations among adaptability, autonomy and vulnerability, a generic solution is suggested to break down their coherent links. The principle is to maximize the adaptability of context- and situation-aware systems while simultaneously reducing their vulnerability. Experimental results show the efficiency of the proposed biological behaviour-inspired vulnerability awareness system.

Liu, B., Xiao, Y., Cao, L., Hao, Z. & Deng, F. 2013, 'SVDD-based outlier detection on uncertain data', Knowledge And Information Systems, vol. 34, no. 3, pp. 597-618.

Outlier detection is an important problem that has been studied within diverse research areas and application domains. Most existing methods are based on the assumption that an example can be exactly categorized as belonging to either the normal class or the outlier class. However, in many real-life applications, data are uncertain in nature due to various errors or partial completeness. This data uncertainty makes the detection of outliers far more difficult than it is with clearly separable data. The key challenge of handling uncertain data in outlier detection is how to reduce the impact of uncertain data on the learned distinctive classifier. This paper proposes a new SVDD-based approach to detect outliers on uncertain data. The proposed approach operates in two steps. In the first step, a pseudo-training set is generated by assigning a confidence score to each input example, which indicates the likelihood that the example belongs to the normal class. In the second step, the generated confidence score is incorporated into the support vector data description training phase to construct a global distinctive classifier for outlier detection. In this phase, the contribution of the examples with the lowest confidence scores to the construction of the decision boundary is reduced. The experiments show that the proposed approach outperforms state-of-the-art outlier detection techniques.

Wang, C., Yang, J., Miao, B. & Cao, L. 2013, 'Identity tests for high dimensional data using RMT', Journal of Multivariate Analysis, vol. 118, pp. 128-137.

In this work, we redefined two important statistics, the CLRT test [Z. Bai, D. Jiang, J. Yao, S. Zheng, Corrections to LRT on large-dimensional covariance matrix by RMT, The Annals of Statistics 37 (6B) (2009) 3822-3840] and the LW test [O. Ledoit, M. Wolf, Some hypothesis tests for the covariance matrix when the dimension is large compared to the sample size, The Annals of Statistics (2002) 1081-1102] on identity tests for high dimensional data using random matrix theories. Compared with existing CLRT and LW tests, the new tests can accommodate data which has unknown means and non-Gaussian distributions. Simulations demonstrate that the new tests have good properties in terms of size and power. What is more, even for Gaussian data, our new tests perform favorably in comparison to existing tests. Finally, we find the CLRT is more sensitive to eigenvalues less than 1 while the LW test has more advantages in relation to detecting eigenvalues larger than 1.

Wei, W., Li, J., Cao, L., Ou, Y. & Chen, J. 2013, 'Effective Detection of Sophisticated Online Banking Fraud in Extremely Imbalanced Data', World Wide Web, vol. 16, no. 4, pp. 449-475.

Sophisticated online banking fraud reflects the integrative abuse of resources in social, cyber and physical worlds. Its detection is a typical use case of the broad-based Wisdom Web of Things (W2T) methodology. However, there is very limited information available to distinguish dynamic fraud from genuine customer behavior in such an extremely sparse and imbalanced data environment, which makes instant and effective detection increasingly important and challenging. In this paper, we propose an effective online banking fraud detection framework that synthesizes relevant resources and incorporates several advanced data mining techniques. By building a contrast vector for each transaction based on its customer's historical behavior sequence, we profile the differentiating rate of each current transaction against the customer's behavior preference. A novel algorithm, ContrastMiner, is introduced to efficiently mine contrast patterns and distinguish fraudulent from genuine behavior, followed by an effective pattern selection and risk scoring that combines predictions from different models. Results from experiments on large-scale real online banking data demonstrate that our system can achieve substantially higher accuracy and lower alert volume than the latest benchmarking fraud detection system incorporating domain knowledge and traditional fraud detection methods.

Yang, W., Gao, Y. & Cao, L. 2013, 'TRASMIL: A local anomaly detection framework based on trajectory segmentation and multi-instance learning', Computer Vision And Image Understanding, vol. 117, no. 10, pp. 1273-1286.

Local anomaly detection refers to detecting small anomalies or outliers that exist in some subsegments of events or behaviors. Such local anomalies are easily overlooked by most of the existing approaches since they are designed for detecting global or l

Yu, D., Nanda, P., Cao, L. & He, S. 2013, 'TCTM: an evaluation framework for architecture design on wireless sensor networks', International Journal of Sensor Networks, vol. 14, no. 3, pp. 168-177.

This paper presents an evaluation framework for architecture designs on wireless sensor networks (WSNs). We introduce a simple evaluation model, the triangular constraint tradeoffs model (TCTM), to grasp the essence of the architecture design considerations under the transient characteristics of wireless media and the stringent limitations on the energy and computing resources of WSNs. Based on this evaluation framework, we investigate the existing architectures proposed in the literature from three main competing constraint aspects, namely generality, cost, and performance. Two important concepts, performance efficiency and deployment efficiency, are identified and distinguished within overall architecture efficiency. With this powerful, abstract and simple model, we describe the motivations of the major body of WSN architectures proposed in the current literature. We also analyse the fundamental advantages and limitations of each class of architectures from the TCTM perspective. We foresee the influence of evolving technology on future architecture design. We believe our efforts will serve as a reference to orient researchers and system designers in this area.

Zhou, J., Cao, L. & Yang, N. 2013, 'On the convergence of some possibilistic clustering algorithms', Fuzzy Optimization and Decision Making, vol. 12, no. 4, pp. 415-432.

In this paper, an analysis of the convergence performance is conducted for a class of possibilistic clustering algorithms (PCAs) utilizing the Zangwill convergence theorem. It is shown that under certain conditions the iterative sequence generated by a P

Conferences

Cao, L. 2012, 'Agents and Data Mining Interaction - 8th International Workshop, ADMI 2012', Agents and Data Mining Interaction - 8th International Workshop, ADMI 2012, Springer, Valencia, Spain.

Revised Selected Papers. Lecture Notes in Computer Science 7607, Springer 2013

Cao, W., Cao, L. & Song, Y. 2013, 'Coupled market behavior based financial crisis detection', The 2013 International Joint Conference on Neural Networks, IJCNN 2013, Dallas, TX, USA, August 4-9, 2013, IEEE, Dallas, TX, USA, pp. 1-8.

Financial crisis detection is a long-standing and challenging issue with significant practical value and impact on the economy, society and globalization. The challenge lies in many aspects, in particular the nonlinear and dynamic characteristics associated with financial crises. Most existing methods rely on selecting individual indicators associated with one market indicator, and a linear assumption often underlies the prediction models. In practice, a linear assumption may be too strong to be applicable to real market dynamics. More importantly, instruments in different markets, such as the gold price and the petrol price, are often coupled. A financial crisis may significantly change the couplings between different market indicators. In addition, such couplings in cross-market interaction are likely nonlinear. In this paper, we present a new approach for financial crisis detection, called coupled market behavior analysis, which caters for the often nonlinear couplings between major indicators selected from different markets in order to detect different coupled market behaviors in crisis and non-crisis periods. A Coupled Hidden Markov Model (CHMM) is built to characterize the coupled market behaviors of equity, commodity and interest markets as case studies. The empirical results show the need to cater for nonlinear couplings between various markets, and that the proposed approach is much more effective in capturing the coupling and nonlinear relations associated with financial crises than other traditionally used approaches, such as Signal, Logistic and ANN models.

Cao, W., Wang, C. & Cao, L. 2012, 'Trading Strategy Based Portfolio Selection for Actionable Trading Agents', Agents and Data Mining Interaction - 8th International Workshop, ADMI 2012, International Workshop on Agents and Data Mining Interaction, Springer, Valencia, Spain, pp. 191-202.

Trading agents are very useful for supporting investors in making decisions in financial markets, but existing trading agent research focuses on simulation on artificial data, which limits its usefulness. For investors, a major issue is how trading agents can help them manage their assets according to their risk appetite and thus obtain a higher return. Portfolio optimization is an approach used by many researchers to resolve this issue, but the focus is mainly on developing more accurate mathematical estimation methods, which overlooks an important factor: trading strategy. Since the global financial crisis added uncertainty to financial markets, there is an increasing demand for trading agents to be more active in providing trading strategies that will better capture trading opportunities. In this paper, we propose a new approach, namely trading strategy based portfolio selection, by which trading agents combine assets and their corresponding trading strategies to construct new portfolios, after which the agents can help investors to obtain the optimal weights for their portfolios according to their risk appetite. We use historical data to test our approach. The results show that it can help investors make more profit according to their risk tolerance by selecting the best portfolio in real financial markets.

Cheng, X., Miao, D., Wang, C. & Cao, L. 2013, 'Coupled term-term relation analysis for document clustering', The 2013 International Joint Conference on Neural Networks, IJCNN 2013, Dallas, TX, USA, August 4-9, 2013, The 2013 International Joint Conference on Neural Networks (IJCNN), IEEE, Dallas, TX, USA, pp. 1-8.

Traditional document clustering approaches are usually based on the Bag of Words model, which is limited by its assumption of the independence among terms. Recent strategies have been proposed to capture the relation between terms based on statistical analysis, and they estimate the relation between terms purely by their co-occurrence across the documents. However, the implicit interactions with other link terms are overlooked, which leads to the discovery of incomplete information. This paper proposes a coupled term-term relation model for document representation, which considers both the intra-relation (i.e. co-occurrence of terms) and inter-relation (i.e. dependency of terms via link terms) between a pair of terms. The coupled relation for each pair of terms is further used to map a document onto a new feature space, which includes more semantic information. Substantial experiments verify that the document clustering incorporated with our proposed relation achieves a significant performance improvement compared to the state-of-the-art techniques.

Fariha, A., Ahmed, C.F., Leung, C.K., Abdullah, S.M. & Cao, L. 2013, 'Mining Frequent Patterns from Human Interactions in Meetings Using Directed Acyclic Graphs', Lecture Notes in Computer Science, Springer, Gold Coast, Australia, pp. 38-49.

In modern life, interactions between human beings frequently occur in meetings, where topics are discussed. Semantic knowledge of meetings can be revealed by discovering interaction patterns from these meetings. An existing method mines interaction patterns from meetings using tree structures. However, such a tree-based method may not capture all kinds of triggering relations between interactions, and it may not distinguish a participant of a certain rank from another participant of a different rank in a meeting. Hence, the tree-based method may not be able to find all interaction patterns such as those about correlated interaction. In this paper, we propose to mine interaction patterns from meetings using an alternative data structure, namely, a directed acyclic graph (DAG). Specifically, a DAG captures both temporal and triggering relations between interactions in meetings. Moreover, to distinguish one participant of a certain rank from another, we assign weights to nodes in the DAG. As such, a meeting can be modeled as a weighted DAG, from which weighted frequent interaction patterns can be discovered. Experimental results showed the effectiveness of our proposed DAG-based method for mining interaction patterns from meetings.

Fu, B., Xu, G., Wang, Z. & Cao, L. 2013, 'Leveraging Supervised Label Dependency Propagation for Multi-label Learning', 2013 IEEE 13th International Conference on Data Mining, International Conference on Data Mining, IEEE, Dallas, TX, USA, pp. 1061-1066.

Exploiting label dependency is a key challenge in multi-label learning, and current methods solve this problem mainly by training models on the combination of related labels and original features. However, label dependency cannot be exploited dynamically and mutually in this way. Therefore, we propose a novel paradigm of leveraging label dependency in an iterative way. Specifically, each label's prediction is updated and also propagated to other labels via a random walk with restart process. Meanwhile, the label propagation is implemented as a supervised learning procedure that optimizes a loss function, so that more appropriate label dependency can be learned. Extensive experiments are conducted, and the results demonstrate that our method can achieve considerable improvements in terms of several evaluation metrics.
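
The random walk with restart process mentioned in the abstract above can be sketched generically as follows. This shows only the standard fixed-point iteration and is not the paper's supervised propagation procedure; the function name, the restart probability `alpha` and the three-label transition matrix are hypothetical values chosen for illustration.

```python
def random_walk_with_restart(transition, restart, alpha=0.15,
                             tol=1e-12, max_iter=10000):
    """Generic random walk with restart (illustrative sketch only).

    transition[i][j] is the probability of stepping from node i to node j
    (each row sums to 1); restart is the restart distribution. Iterates
    p <- alpha * restart + (1 - alpha) * p * transition until the L1
    change between successive iterates drops below tol.
    """
    n = len(restart)
    p = list(restart)
    for _ in range(max_iter):
        nxt = [
            alpha * restart[j]
            + (1.0 - alpha) * sum(p[i] * transition[i][j] for i in range(n))
            for j in range(n)
        ]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Hypothetical label graph: three labels, each linked equally to the others.
label_transition = [
    [0.0, 0.5, 0.5],
    [0.5, 0.0, 0.5],
    [0.5, 0.5, 0.0],
]
# Restart always returns to label 0, so scores measure proximity to it.
scores = random_walk_with_restart(label_transition, [1.0, 0.0, 0.0])
```

With this symmetric graph, the restart label keeps the largest score and the other two labels receive equal scores, which reflects how restart biases the walk toward its starting point.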

Hu, L., Cao, J., Xu, G., Cao, L., Gu, Z. & Zhu, C. 2013, 'Personalized recommendation via cross-domain triadic factorization', Proceedings of the 22nd international conference on World Wide Web WWW'13, International World Wide Web Conference, ACM, Rio de Janeiro, Brazil, pp. 595-606.

Collaborative filtering (CF) is a major technique in recommender systems to help users find their potentially desired items. Since the data sparsity problem is quite commonly encountered in real-world scenarios, Cross-Domain Collaborative Filtering (CDCF) has become an emerging research topic in recent years. However, due to the lack of sufficiently dense explicit feedback, and even the absence of any feedback in domains where users are not involved, current CDCF approaches may not perform satisfactorily in user preference prediction. In this paper, we propose a generalized Cross Domain Triadic Factorization (CDTF) model over the triadic relation user-item-domain, which can better capture the interactions between domain-specific user factors and item factors. In particular, we devise two CDTF algorithms to leverage users' explicit and implicit feedback respectively, along with a genetic algorithm based weight parameter tuning algorithm to trade off influence among domains optimally. Finally, we conduct experiments to evaluate our models and compare them with other state-of-the-art models using two real-world datasets. The results show the superiority of our models over the other comparative models.

Hu, L., Cao, J., Xu, G., Wang, J., Gu, Z. & Cao, L. 2013, 'Cross-Domain Collaborative Filtering via Bilinear Multilevel Analysis', Proceedings of the 23rd International Joint Conference on Artificial Intelligence, International Joint Conference on Artificial Intelligence, IJCAI/AAAI, Beijing, China, pp. 2626-2632.

Cross-domain collaborative filtering (CDCF), which aims to leverage data from multiple domains to relieve the data sparsity issue, has become an emerging research topic in recent years. However, current CDCF methods, which mainly consider user and item factors but largely neglect the heterogeneity of domains, may lead to improper knowledge transfer. To address this problem, we propose a novel CDCF model, the Bilinear Multilevel Analysis (BLMA), which seamlessly introduces multilevel analysis theory to the most successful collaborative filtering method, matrix factorization (MF). Specifically, we employ BLMA to more efficiently address the determinants of ratings from a hierarchical view by jointly considering domain, community, and user effects so as to overcome the issues caused by traditional MF approaches. Moreover, a parallel Gibbs sampler is provided to learn these effects. Finally, experiments conducted on a real-world dataset demonstrate the superiority of the BLMA over other state-of-the-art methods.

Li, F., Xu, G., Cao, L., Fan, X. & Niu, Z. 2013, 'CGMF: Coupled Group-Based Matrix Factorization for Recommender System', Lecture Notes in Computer Science, 14th International Conference of Web Information Systems Engineering – WISE 2013, Springer, Nanjing, China, pp. 289-298.

With the advent of social influence, social recommender systems have become an active research topic for making recommendations based on the ratings of the users that have close social relations with the given user. The underlying assumption is that a user's taste is similar to that of his/her friends in social networking. In fact, users enjoy different groups of items with different preferences. A user may be trusted by his/her friends more on some specific groups than on all groups. Unfortunately, most of the extant social recommender systems are not able to differentiate users' social influence in different groups, resulting in unsatisfactory recommendation results. Moreover, most extant systems mainly rely on social relations, but overlook the influence of relations between items. In this paper, we propose an innovative coupled group-based matrix factorization model for recommender systems by leveraging the user and item groups learned by topic modeling and incorporating couplings between users and items and within users and items. Experiments conducted on publicly available data sets demonstrate the effectiveness of our approach.

Li, F., Xu, G., Cao, L., Fan, X. & Niu, Z. 2013, 'CGMF: Coupled Group-Based Matrix Factorization for Recommender System', Web Information Systems Engineering - WISE 2013, Part I, 14th International Conference on Web Information Systems Engineering (WISE), Springer-Verlag Berlin, Nanjing, People's Republic of China, pp. 189-198.

Li, J., Wang, C., Cao, L. & Yu, P. 2013, 'Efficient Selection of Globally Optimal Rules on Large Imbalanced Data Based on Rule Coverage Relationship Analysis', Proceedings of the 13th SIAM International Conference on Data Mining, SIAM International Conference on Data Mining, SIAM, Austin, Texas, USA, pp. 216-224.

Rule-based anomaly and fraud detection systems often suffer from massive false alerts against a huge number of enterprise transactions. A crucial and challenging problem is to effectively select a globally optimal rule set which can capture very rare anomalies dispersed in large-scale background transactions. The existing rule selection methods suffer significantly from complex rule interactions and overlapping in large imbalanced data, and often lead to a very high false positive rate. In this paper, we analyze the interactions and relationships between rules and their coverage on transactions, and propose a novel metric, Max Coverage Gain. Max Coverage Gain selects the optimal rule set by evaluating the contribution of each rule in terms of overall performance to cut out those locally significant but globally redundant rules, without any negative impact on the recall. An effective algorithm, MCGminer, is then designed with a series of built-in mechanisms and pruning strategies to handle complex rule interactions and reduce computational complexity towards identifying the globally optimal rule set. Substantial experiments on 13 UCI data sets and a real-time online banking transactional database demonstrate that MCGminer achieves significant improvement in accuracy, scalability, stability and efficiency on large imbalanced data compared to several state-of-the-art rule selection techniques.

Li, W., Cao, L., Zhao, D., Cui, X. & Yang, J. 2013, 'CRNN: Integrating classification rules into neural network', The 2013 International Joint Conference on Neural Networks, IJCNN 2013, Dallas, TX, USA, August 4-9, 2013, IEEE, Dallas, TX, USA, pp. 1-8.

Association classification is an important type of rule-based classification. A variety of approaches have been proposed to build a classifier based on classification rules. In the prediction stage, most of the existing association classifiers use the ensemble quality measurement of each rule in a subset of rules to predict the class label of the new data. This method still suffers from the following two problems. First, the classification rules are used individually, so the coupling relations between rules [1] are ignored in the prediction. However, in a real-world rule set, rules are often inter-related and a new data object may partially satisfy many rules. Second, the classification rule based prediction model lacks a general expression of the decision methodology. This paper proposes a classification method that integrates classification rules into a neural network (CRNN, for short), which presents a general form of the rule-based decision methodology by means of a rule-based network. In comparison with the extant rule-based classifiers, such as C4.5, CBA, CMAR and CPAR, our approach has two advantages. First, CRNN takes the coupling relations between rules from the training data into account in the prediction step. Second, CRNN automatically obtains higher performance on the structure and parameter learning than a traditional neural network. CRNN uses the linear computing algorithm in the neural network instead of the costly iterative learning algorithm. Two ways of generating the classification rule set are used in this paper for the CRNN evaluation, and CRNN achieves satisfactory performance.

Li, W., Zhao, D. & Cao, L. 2013, 'An Approach of Hierarchical Concept Clustering on Medical Short Text Corpus', 2013 6th International Conference on Biomedical Engineering and Informatics (BMEI 2013), 2013 6th International Conference on Biomedical Engineering and Informatics, IEEE, Hangzhou, China, pp. 509-518.

Hierarchical clustering and conceptual clustering are two important types of clustering analysis methods. A variety of approaches have been proposed in previous works. However, few methods are designed to run on a medical short text database and construct a hierarchical concept taxonomy. This paper proposes a new clustering method, Hierarchical Concept Clustering on Medical Short Text corpus (HCCST), which presents a new solution for actionable disease taxonomy construction from actual medical data. Our approach has three advantages. Firstly, HCCST adopts a new similarity method which covers all the problems in medical short text distance computing. Secondly, an adaptive clustering method is proposed for synonymous disease names without predefining the size of clusters. Thirdly, this paper uses a mutual information based method for recognizing potential hierarchical concept pairs, which improves on the subsumption method to create a hierarchical disease taxonomy. The evaluation is conducted on a Chinese medical disease name text data set and the result shows that HCCST achieves satisfactory performance.

Liu, B., Xiao, Y., Yu, P., Cao, L. & Hao, Z. 2013, 'Robust Textual Data Streams Mining Based on Continuous Transfer Learning', Proceedings of the 13th SIAM International Conference on Data Mining, SIAM International Conference on Data Mining, SIAM, Austin, Texas, USA, pp. 731-739.

In a textual data stream environment, concept drift can occur at any time. Existing approaches that partition streams into chunks can run into problems if the chunk boundary does not coincide with the change point, which is impossible to predict. Since concept drift can occur at any point of the streams, it will certainly occur within chunks, which is called random concept drift. The paper proposes an approach, called the chunk level-based concept drift method (CLCD), that can overcome this chunking problem by continuously monitoring chunk characteristics to revise the classifier based on transfer learning in a positive and unlabeled (PU) textual data stream environment. Our proposed approach works in three steps. In the first step, we propose core vocabulary-based criteria to justify and identify random concept drift. In the second step, we put forward an extension of LELC (PU learning by extracting likely positive and negative micro-clusters)[1], called soft-LELC, to extract representative examples from unlabeled data and assign a confidence score to each extracted example. The assigned confidence score represents the degree to which an example belongs to its corresponding class. In the third step, we set up a transfer learning-based SVM to build an accurate classifier for the chunks where concept drift is identified in the first step. Extensive experiments have shown that CLCD can capture random concept drift, and outperforms state-of-the-art methods in positive and unlabeled textual data stream environments.

Liu, B., Xiao, Y., Yu, P.S., Cao, L. & Hao, Z. 2013, 'Robust textual data streams mining based on continuous transfer learning', SIAM International Conference on Data Mining 2013, SDM 2013, pp. 731-739.

Song, Y., Cao, L., Yin, J. & Wang, C. 2013, 'Extracting discriminative features for identifying abnormal sequences in one-class mode', The 2013 International Joint Conference on Neural Networks, IJCNN 2013, Dallas, TX, USA, August 4-9, 2013, The 2013 International Joint Conference on Neural Networks, IEEE, Dallas, TX, USA, pp. 1-8.

This paper presents a novel framework for detecting abnormal sequences in a one-class setting (i.e., only normal data are available), which is applicable to various domains. Examples include intrusion detection, fault detection and speaker verification. Detecting abnormal sequences with only normal data presents several challenges for anomaly detection: the weak discrimination between normal and abnormal sequences, the unavailability of abnormal data, and other issues. Traditional model-based anomaly detection techniques can solve some of the above issues but with limited discriminative power (because they directly model the normal data). In order to enhance the discriminative power for anomaly detection, we turn to extracting discriminative features from the generative model based on the principle deduced from the corresponding theoretical analysis. A new anomaly detection framework is then developed on top of that. The proposed approach first projects all the sequential data into a model-based equal-length feature space (this is theoretically proven to have better discriminative power than the model itself), and then adopts a classifier learned from the transformed data to detect anomalies. Experimental evaluation on both synthetic and real-world data shows that our proposed approach outperforms several anomaly detection baseline algorithms for sequential data.

Song, Y., Zhang, J., Cao, L. & Sangeux, M. 2013, 'On Discovering the Correlated Relationship between Static and Dynamic Data in Clinical Gait Analysis', Lecture Notes in Computer Science, Springer, Prague, Czech Republic, pp. 563-578.

'Gait' is a person's manner of walking. Patients may have an abnormal gait due to a range of physical impairments or brain damage. Clinical gait analysis (CGA) is a technique for identifying the underlying impairments that affect a patient's gait pattern. CGA is critical for treatment planning. Essentially, CGA tries to use patients' physical examination results, known as static data, to interpret the dynamic characteristics in an abnormal gait, known as dynamic data. This process is carried out by gait analysis experts, mainly based on their experience, which may lead to subjective diagnoses. To facilitate the automation of this process and form a relatively objective diagnosis, this paper proposes a new probabilistic correlated static-dynamic model (CSDM) to discover correlated relationships between the dynamic characteristics of gait and their root cause in the static data space. We propose an EM-based algorithm to learn the parameters of the CSDM. One of the main advantages of the CSDM is its ability to provide intuitive knowledge. For example, the CSDM can describe what kinds of static data will lead to what kinds of hidden gait patterns in the form of a decision tree, which helps us to infer dynamic characteristics based on static data. Our initial experiments indicate that the CSDM is promising for discovering the correlated relationship between physical examination (static) and gait (dynamic) data.

Wang, C., She, Z. & Cao, L. 2013, 'Coupled Attribute Analysis on Numerical Data', Proceedings of the 23rd International Joint Conference on Artificial Intelligence, International Joint Conference on Artificial Intelligence, IJCAI/AAAI, Beijing, China, pp. 1736-1742.

The usual representation of quantitative data is to formalize it as an information table, which assumes the independence of attributes. In real-world data, attributes more or less interact and are coupled via explicit or implicit relationships. Limited research has been conducted on analyzing such attribute interactions, and the existing work only describes a local picture of attribute couplings in an implicit way. This paper proposes a framework of coupled attribute analysis to capture the global dependency of continuous attributes. Such global couplings integrate the intra-coupled interaction within an attribute (i.e. the correlations between attributes and their own powers) and the inter-coupled interaction among different attributes (i.e. the correlations between attributes and the powers of others) to form a coupled representation for numerical objects by a Taylor-like expansion. This work makes one step forward towards explicitly addressing the global interactions of continuous attributes, verified by applications in data structure analysis, data clustering, and data classification. Substantial experiments on 13 UCI data sets demonstrate that the coupled representation can effectively capture the global couplings of attributes and outperforms the traditional representation, supported by statistical analysis.

Wang, C., She, Z. & Cao, L. 2013, 'Coupled clustering ensemble: Incorporating coupling relationships both between base clusterings and objects', Proceedings of the 29th IEEE International Conference on Data Engineering, IEEE International Conference on Data Engineering, IEEE, Brisbane, Australia, pp. 374-385.

Clustering ensemble is a powerful approach for improving the accuracy and stability of individual (base) clustering algorithms. Most of the existing clustering ensemble methods obtain the final solutions by assuming that base clusterings perform independently of one another and that all objects are independent too. However, in real-world data sources, objects are more or less associated in terms of certain coupling relationships. Base clusterings trained on the source data are complementary to one another since each of them may only capture some specific aspects rather than the full picture of the data. In this paper, we discuss the problem of explicating the dependency between base clusterings and between objects in clustering ensembles, and propose a framework for coupled clustering ensembles (CCE). CCE not only considers but also integrates the coupling relationships between base clusterings and between objects. Specifically, we involve both the intra-coupling within one base clustering (i.e., cluster label frequency distribution) and the inter-coupling between different base clusterings (i.e., cluster label co-occurrence dependency). Furthermore, we engage both the intra-coupling between two objects in terms of the base clustering aggregation and the inter-coupling among other objects in terms of neighborhood relationship.

Wang, X., Chen, J., Cao, L. & Meng, X. 2013, 'The Foundation of Fuzzy Rule Interchange in the Semantic Web', Workshop Proceedings of 2013 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, IEEE, Atlanta, Georgia, USA, pp. 280-281.

RIF (Rule Interchange Format) is a W3C recommendation and an appropriate intermediary language for crisp (i.e., non-fuzzy) rule interchange in the Semantic Web, but it is incapable of representing and interchanging fuzzy rules. Therefore, combining RIF and fuzzy sets, we propose f-RIF (fuzzy RIF), investigate its abstract syntax, concrete syntax and UML profile, and define its semantics, which lays a solid foundation for fuzzy rule interchange among heterogeneous fuzzy rule languages.

Wei, W., Li, J., Cao, L., Sun, J., Liu, C. & Li, M. 2013, 'Optimal Allocation of High Dimensional Assets through Canonical Vines', Advances in Knowledge Discovery and Data Mining: 17th Pacific-Asia Conference, PAKDD 2013, Gold Coast, Australia, April 14-17, 2013, Proceedings, Part I, Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer, Gold Coast, Australia, pp. 366-377.

Canonical Vine, Mean Variance Criterion, Financial Return.

Yin, J., Zheng, Z., Cao, L., Song, Y. & Wei, W. 2013, 'Efficiently Mining Top-K High Utility Sequential Patterns', 2013 IEEE 13th International Conference on Data Mining, International Conference on Data Mining, IEEE, Dallas, TX, USA, pp. 1259-1264.

High utility sequential pattern mining is an emerging topic in the data mining community. Compared to classic frequent sequence mining, the utility framework provides more informative and actionable knowledge since the utility of a sequence indicates business value and impact. However, the introduction of "utility" makes the problem fundamentally different from the frequency-based pattern mining framework and brings about dramatic challenges. Although the existing high utility sequential pattern mining algorithms can discover all the patterns satisfying a given minimum utility, it is often difficult for users to set a proper minimum utility. A value that is too small may produce thousands of patterns, whereas one that is too big may lead to no findings. In this paper, we propose a novel framework called top-k high utility sequential pattern mining to tackle this critical problem. Accordingly, an efficient algorithm, Top-k high Utility Sequence (TUS for short) mining, is designed to identify top-k high utility sequential patterns without a minimum utility. In addition, three effective features are introduced to handle the efficiency problem, including two strategies for raising the threshold and one pruning strategy for filtering unpromising items. Our experiments are conducted on both synthetic and real datasets. The results show that TUS incorporating the efficiency-enhanced strategies demonstrates impressive performance without missing any high utility sequential patterns.

Yu, P.S., Cao, L., Ras, Z., Wong, L., Jiang, F. & Li, J. 2013, 'Preface to the 2013 international workshop on domain driven data mining', Proceedings - IEEE 13th International Conference on Data Mining Workshops, ICDMW 2013.

Yu, Y., Wang, C., Gao, Y., Cao, L. & Chen, Q. 2013, 'Erratum: A Coupled Clustering Approach for Items Recommendation'.

The name of the 5th author has been printed incorrectly in the paper. Instead of “Xixi Chen” it should be “Qianqian Chen”.

Yu, Y., Wang, C., Gao, Y., Cao, L. & Chen, X. 2013, 'A Coupled Clustering Approach for Items Recommendation', Lecture Notes in Computer Science, Springer, Gold Coast, Australia, pp. 365-376.

Recommender systems are very useful due to the huge volume of information available on the Web. They help users alleviate the information overload problem by recommending personalized information, products or services (called items). Collaborative filtering and content-based recommendation algorithms have been widely deployed in e-commerce web sites. However, they both suffer from the scalability problem. In addition, there are few suitable similarity measures for content-based recommendation methods to compute the similarity between items. In this paper, we propose a hybrid recommendation algorithm by combining the content-based and collaborative filtering techniques as well as incorporating the coupled similarity. Our method first partitions items into several item groups by using a coupled version of the k-modes clustering algorithm, where the similarity between items is measured by the Coupled Object Similarity, which considers coupling between items. The collaborative filtering technique is then used to produce the recommendations for active users. Experimental results show that our proposed hybrid recommendation algorithm effectively solves the scalability issue of recommender systems and provides comparable recommendation quality when most of the item features are lacking.

Books

Cao, L. & Yu, P.S. 2012, Behavior computing: Modeling, analysis, mining and decision.

© Springer-Verlag London 2012. 'Behavior' is an increasingly important concept in the scientific, societal, economic, cultural, political, military, living and virtual worlds. Behavior computing, or behavior informatics, consists of methodologies, techniques and practical tools for examining and interpreting behaviours in these various worlds. Behavior computing contributes to the in-depth understanding, discovery, applications and management of behavior intelligence. With contributions from leading researchers in this emerging field, Behavior Computing: Modeling, Analysis, Mining and Decision includes chapters on: representation and modeling behaviors; behavior ontology; behaviour analysis; behaviour pattern mining; clustering complex behaviors; classification of complex behaviors; behaviour impact analysis; social behaviour analysis; organizational behaviour analysis; and behaviour computing applications. Behavior Computing: Modeling, Analysis, Mining and Decision provides a dedicated source of reference for the theory and applications of behavior informatics and behavior computing. Researchers, research students and practitioners in behavior studies, including computer science, behavioral science, and social science communities, will find this state-of-the-art volume invaluable.

Chapters

Cao, L. & Yu, P.S. 2012, 'Preface' in Cao, L. & Yu, P.S. (eds), Behavior Computing: Modeling, Analysis, Mining and Decision, pp. v-vii.

Cao, L., Srivastava, J., Williams, G. & Motoda, H. 2012, 'International Workshop on Behavior Informatics (BI 2011): PC chairs' message' in New Frontiers in Applied Data Mining: PAKDD 2011 International Workshops, Shenzhen, China, May 24-27, 2011, Revised Selected Papers, pp. VII-VIII.

Wang, C. & Cao, L. 2012, 'Modeling and Analysis of Social Activity Process' in Behavior Computing: Modeling, Analysis, Mining and Decision, Springer Science & Business Media, pp. 21-35.

Behavior modeling has been increasingly recognized as a crucial means for disclosing interior driving forces and impact in social activity processes. Traditional behavior modeling in the behavioral and social sciences mainly relies on qualitative methods and is not aimed at deep and quantitative analysis of social activities. However, with the booming need to understand customer behaviors, social networks and so on, there is a shortage of formal, systematic and unified behavior modeling and analysis methodologies and techniques. This paper proposes a novel and unified general framework, called Social Activity Process Modeling and Analysis System (SAPMAS). Our approach is to model social behaviors and analyze social activity processes by using model checking. More specifically, we construct behavior models from sub-models of actor, action, environment and relationship, translate concrete properties into formal temporal logic formulae, and finally obtain analysis results with the model checker SPIN. An online shopping process is used to illustrate the whole framework.

Weiss, G., Yu, P.S., Cao, L., Bazzan, A., Symeonidis, A.L. & Gorodetsky, V. 2012, 'Message from the workshop chairs' in Agents and Data Mining Interaction: 7th International Workshop, ADMI 2011, Taipei, Taiwan, May 2-6, 2011, Revised Selected Papers, pp. V-VI.

Journal articles

Cao, L. 2012, 'Actionable Knowledge Discovery And Delivery', Interdisciplinary Reviews Data Mining and Knowledge Discovery, vol. 2, no. 2, pp. 149-163.

Actionable knowledge has been qualitatively and intensively studied in the social sciences. Its marriage with data mining is only a recent story. On the one hand, data mining has been booming for a while and has attracted an increasing variety of increas

Cao, L. 2012, 'Social Security and Social Welfare Data Mining: An Overview', IEEE Transactions On Systems Man And Cybernetics Part C-Applications And Reviews, vol. 42, no. 6, pp. 837-853.

The importance of social security and social welfare business has been increasingly recognized in more and more countries. It impinges on a large proportion of the population and affects government service policies and people's life quality. Typical welfare countries, such as Australia and Canada, have accumulated a huge amount of social security and social welfare data. Emerging business issues such as fraudulent outlays, and customer service and performance improvements challenge existing policies, as well as techniques and systems including data matching and business intelligence reporting systems. The need for a deep understanding of customers and customer-government interactions through advanced data analytics has been increasingly recognized by the community at large. So far, however, no substantial work on the mining of social security and social welfare data has been reported. For the first time in data mining and machine learning, and to the best of our knowledge, this paper draws a comprehensive overall picture and summarizes the corresponding techniques and illustrations to analyze social security/welfare data, namely, social security data mining (SSDM), based on a thorough review of a large number of related references from the past half century. In particular, we introduce an SSDM framework, including business and research issues, social security/welfare services and data, as well as challenges, goals, and tasks in mining social security/welfare data. A summary of SSDM case studies is also presented with substantial citations that direct readers to more specific techniques and practices about SSDM.

Cao, L., Ou, Y. & Yu, P. 2012, 'Coupled Behavior Analysis With Applications', IEEE Transactions On Knowledge And Data Engineering, vol. 24, no. 8, pp. 1378-1392.

Coupled behaviors refer to the activities of one to many actors who are associated with each other in terms of certain relationships. With increasing network and community-based events and applications, such as group-based crime and social network intera

Cao, L., Weiss, G. & Yu, P. 2012, 'A Brief Introduction To Agent Mining', Autonomous Agents And Multi-Agent Systems, vol. 25, no. 3, pp. 419-424.

Agent mining is an emerging interdisciplinary area that integrates multiagent systems, data mining and knowledge discovery, machine learning and other relevant areas. It brings new opportunities to tackling issues in relevant fields more efficiently by e

Li, Z., He, Y., Cao, L., Wong, L. & Li, J. 2012, 'Conservation of water molecules in protein binding interfaces', International Journal of Bioinformatics Research and Applications, vol. 8, no. 3/4, pp. 228-244.

The conservation of interfacial water molecules has only been studied in small data sets consisting of interfaces of a specific function. So far, no general conclusions have been drawn from large-scale analysis, due to the challenges of using structural alignment in large data sets. To avoid using structural alignment, we propose a solvated sequence method to analyse water conservation properties in protein interfaces. We first use water information to label the residues, and then align interfacial residues in a fashion similar to normal sequence alignment. Our results show that, for a water-contacting interfacial residue, substituting it into hydrophobic residues tends to desolvate the local area. Surprisingly, residues with short side chains also tend not to lose their contacting water, emphasising the role of water in shaping binding sites. Deeply buried water molecules are found more conserved in terms of their contacts with interfacial residues.

Melli, G., Wu, X., Beinat, P., Bonchi, F., Cao, L., Duan, R., Faloutsos, C., Ghani, R., Kitts, B., Goethals, B., McLachlan, G., Pei, J., Srivastava, A. & Zaiane, O. 2012, 'Top-10 Data Mining Case Studies', International Journal of Information Technology and Decision Making, vol. 11, no. 2, pp. 389-400.

We report on the panel discussion held at the ICDM'10 conference on the top 10 data mining case studies in order to provide a snapshot of where and how data mining techniques have made significant real-world impact. The tasks covered by 10 case studies r

Yeh, W., Cao, L. & Jin, J. 2012, 'A Cellular Automata Hybrid Quasi-random Monte Carlo Simulation for Estimating the One-to-all Reliability of Acyclic Multistate Information Networks', International Journal Of Innovative Computing Information And Control, vol. 8, no. 3(B), pp. 2001-2014.

Many real-world systems (such as cellular telephones and transportation) are acyclic multi-state information networks (AMIN). These networks are composed of multi-state nodes, with different states determined by a set of nodes that receive a signal directly from these multi-state nodes, without satisfying the conservation law. Evaluating the AMIN reliability arises at the design and exploitation stage of many types of technical systems. However, existing analytical methods fail to estimate AMIN reliability in a realistic time frame, even for smaller-sized AMINs. Hence, the main purpose of this article is to present a cellular automata hybrid quasi-Monte Carlo simulation (CA-HMC) by combining cellular automata (CA, to rapidly determine network states), pseudo-random sequences (PRS, to obtain the flexibility of the network) and quasi-random sequences (QRS, to improve the accuracy) to obtain a high-quality estimation of AMIN reliability in order to improve the calculation efficiency. We use one benchmark example from well-known algorithms in the literature to show the utility and performance of the proposed CA-HMC simulation when evaluating the one-to-all AMIN reliability.

Yue, X., Miao, D., Zhang, N., Cao, L. & Wu, Q. 2012, 'Multiscale Roughness Measure For Color Image Segmentation', Information Sciences, vol. 216, pp. 93-112.

Color image segmentation is an important technique in image processing systems. Highly precise segmentation with low computational complexity can be achieved through roughness measurement, which approximates the color histogram based on rough set theory...

Conferences

Cao, L. 2011, 'Agents and Data Mining Interaction', 7th International Workshop, ADMI, Springer, Taipei.

Cao, L. 2011, 'New Frontiers in Applied Data Mining', PAKDD 2011 International Workshops, Springer, Shenzhen, China.

Cao, L. 2012, 'Non-iidness: Coupled object and pattern analysis', Conferences in Research and Practice in Information Technology Series, p. 5.

© 2012, Australian Computer Society, Inc. Most existing data mining algorithms are based on the IID assumption, which treats objects independently from each other. In the real world, objects are either loosely or tightly coupled with each other. For instance, a moving vehicle on the street interacts with the cars before and after it, and the ones on its left and right hand sides if any. In social networks, people interact with each other at different levels for varied purposes. Such interactions, or coupling relationships, are ubiquitous, and spread at various levels: between objects, between attributes describing an object, and between attribute values within an attribute. It is crucial to cater for such relations in object analysis. On the other hand, the usual patterns identified by data mining are based on independent objects or items. For instance, often a large number of frequent patterns are mined by the existing algorithms, which are often treated as independent of each other. In fact, due to the object coupling relationships, patterns are associated with each other in structural and/or semantic aspects. Pattern relationship analysis is often ignored. In this talk, we will explore the needs, challenges, and opportunities of analyzing complex object relations and complex pattern relations. On top of a framework for noniid-based coupled object and pattern analysis, several corresponding techniques will be introduced: coupled object analysis to define and quantify the coupling relationships within and between objects and within and between attributes, and combined pattern mining to identify a group of patterns coupled by certain relationships. Coupled behavior analysis will be explored to analyse a group of actors' behaviors. We will show how such new frameworks outperform the classic iid-based data mining framework in terms of handling complex data, behavior, relation, environment and pattern in clustering, frequent pattern mining, and classification. Several real-...

Fan, X. & Cao, L. 2012, 'A Theoretical Framework of the Graph Shift Algorithm', Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence, ACM, Toronto, Ontario, Canada, pp. 2419-2420.

Since no theoretical foundations for proving the convergence of the Graph Shift Algorithm have been reported, we provide a generic framework consisting of three key GS components to fit Zangwill's convergence theorem. We show that the sequence set generated by the GS procedures always terminates at a local maximum, or at worst, contains a subsequence which converges to a local maximum of the similarity measure function. What is more, a theoretical framework is proposed to apply our proof to a more general case.

Fan, X., Cao, L., Cui, X., Zhu, L. & Ong, Y. 2012, 'Maximum Margin Clustering on Evolutionary Data', Proceedings of the 21st ACM international conference on Information and knowledge management, ACM, Maui, HI, USA, pp. 625-634.

Evolutionary data, such as topic-changing blogs and evolving trading behaviors in capital markets, is widely seen in business and social applications. The time factor and intrinsic change embedded in evolutionary data greatly challenge evolutionary clustering. To incorporate the time factor, existing methods mainly regard the evolutionary clustering problem as a linear combination of snapshot cost and temporal cost, and reflect the time factor through the temporal cost. Despite promising results, these methods still face accuracy and scalability challenges. This paper proposes a novel evolutionary clustering approach, evolutionary maximum margin clustering (e-MMC), to cluster large-scale evolutionary data from the maximum margin perspective. e-MMC incorporates two frameworks: Data Integration from the data changing perspective and Model Integration corresponding to model adjustment to tackle the time factor and change, with an adaptive label allocation mechanism. Three e-MMC clustering algorithms are proposed based on the two frameworks. Extensive experiments are performed on synthetic data, UCI data and real-world blog data, which confirm that e-MMC outperforms the state-of-the-art clustering algorithms in terms of accuracy, computational cost and scalability. It shows that e-MMC is particularly suitable for clustering large-scale evolving data.

Jiang, Y., Tsai, P.C., Hao, Z. & Cao, L. 2012, 'A novel auto-parameters selection process for image segmentation', IEEE Congress on Evolutionary Computation 2012, IEEE, Brisbane, Australia, pp. 1-7.

Segmentation is a process to obtain the desirable features in image processing. However, the existing techniques that use the multilevel thresholding method in image segmentation are computationally demanding due to the lack of an automatic parameter selection process. This paper proposes an automatic parameter selection technique called an automatic multilevel thresholding algorithm using stratified sampling and Tabu Search (AMTSSTS) to remedy the limitations. It automatically determines the appropriate threshold number and values by (1) dividing an image into even strata (blocks) to extract samples; (2) applying a Tabu Search-based optimization technique on these samples to maximize the ratios of their means and variances; (3) preliminarily determining the threshold number and values based on the optimized samples; and (4) further optimizing these samples using a novel local criterion function that combines with the property of local continuity of an image. Experiments on Berkeley datasets show that AMTSSTS is an efficient and effective technique which can provide smoother results than several methods developed in recent years.

Moemeng, C., Wang, C. & Cao, L. 2012, 'Obtaining an Optimal MAS Configuration for Agent-Enhanced Mining Using Constraint Optimization', Lecture Notes in Computer Science, Springer, Taipei, Taiwan, pp. 46-57.

We investigate an interaction mechanism between agents and data mining, and focus on agent-enhanced mining. Existing data mining tools use workflow to capture user requirements. The workflow enactment can be improved with a suitable underlying execution layer, which is a Multi-Agent System (MAS). From this perspective, we propose a strategy to obtain an optimal MAS configuration from a given workflow when resource access restrictions and communication cost constraints are concerned, which is essentially a constraint optimization problem. In this paper, we show how a workflow is modeled in a way that can be optimized, and how the optimized model is used to obtain an optimal MAS configuration. Finally, we demonstrate that our strategy can improve the load balancing and reduce the communication cost during the workflow enactment.

She, Z., Wang, C. & Cao, L. 2012, 'CCE: A Coupled Framework of Clustering Ensembles', Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence, AAAI Press, Toronto, Ontario, Canada, pp. 2455-2456.

Clustering ensemble mainly relies on the pairwise similarity to capture the consensus function. However, it usually considers each base clustering independently, and treats the similarity measure roughly with either 0 or 1. To address these two issues, we propose a coupled framework of clustering ensembles CCE, and exemplify it with the coupled version CCSPA for CSPA. Experiments demonstrate the superiority of CCSPA over baseline approaches in terms of the clustering accuracy.

Song, Y. & Cao, L. 2012, 'Graph-based coupled behavior analysis: A case study on detecting collaborative manipulations in stock markets', The 2012 International Joint Conference on Neural Networks (IJCNN), International Joint Conference on Neural Networks, IEEE, Brisbane, Australia, pp. 1-8.

Coupled behaviors, which refer to behaviors having some relationships between them, are seen in many real-world scenarios, especially in stock markets. Recently, the coupled hidden Markov model (CHMM)-based coupled behavior analysis has been proposed to consider the coupled relationships in a hidden state space. However, it requires aggregation of the behavioral data to cater for the CHMM modeling, which may overlook the couplings within the aggregated behaviors to some extent. In addition, the Markov assumption limits its capability to capture temporal couplings. Thus, this paper proposes a novel graph-based framework for detecting abnormal coupled behaviors. The proposed framework represents the coupled behaviors in a graph view without aggregating the behavioral data and is flexible enough to capture richer coupling information of the behaviors (not necessarily temporal relations). On top of that, the couplings are learned via relational learning methods and an efficient anomaly detection algorithm is proposed as well. Experimental results on a real-world data set in stock markets show that the proposed framework outperforms the CHMM-based one in both technical and business measures.

Song, Y., Cao, L., Wu, X., Wei, G., Ye, W. & Ding, W. 2012, 'Coupled behavior analysis for capturing coupling relationships in group-based market manipulations', Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, Beijing, pp. 976-984.

In stock markets, an emerging challenge for surveillance is that a group of hidden manipulators collaborate with each other to manipulate the price movement of securities. Recently, the coupled hidden Markov model (CHMM)-based coupled behavior analysis (CBA) has been proposed to consider the coupling relationships in the above group-based behaviors for manipulation detection. From the modeling perspective, however, this requires overall aggregation of the behavioral data to cater for the CHMM modeling, which does not differentiate the coupling relationships presented in different forms within the aggregated behaviors and degrades the capability for further anomaly detection. Thus, this paper suggests a general CBA framework for detecting group-based market manipulation by capturing more comprehensive couplings and proposes two variant implementations, which are hybrid coupling (HC)-based and hierarchical grouping (HG)-based respectively. The proposed framework consists of three stages. The first stage, qualitative analysis, generates possible qualitative coupling relationships between behaviors with or without domain knowledge. In the second stage, quantitative representation of coupled behaviors is learned via proper methods. For the third stage, anomaly detection algorithms are proposed to cater for different application scenarios. Experimental results on data from a major Asian stock market show that the proposed framework outperforms the CHMM-based analysis in terms of detecting abnormal collaborative market manipulations. Additionally, the two implementations are compared on their effectiveness in different application scenarios.

Wang, C., Wang, M., She, Z. & Cao, L. 2012, 'CD: A Coupled Discretization Algorithm', Lecture Notes in Computer Science, Springer, Kuala Lumpur, Malaysia, pp. 407-418.

Discretization plays an important role in data mining and machine learning. While numeric data is predominant in the real world, many algorithms in supervised learning are restricted to discrete variables. Thus, a variety of research has been conducted on discretization, which is a process of converting the continuous attribute values into limited intervals. Recent work derived from entropy-based discretization methods, which has produced impressive results, introduces information attribute dependency to reduce the uncertainty level of a decision table; but no attention is given to the increment of certainty degree from the aspect of positive domain ratio. This paper proposes a discretization algorithm based on both positive domain and its coupling with information entropy, which not only considers information attribute dependency but also concerns deterministic feature relationship. Substantial experiments on extensive UCI data sets provide evidence that our proposed coupled discretization algorithm generally outperforms seven other existing methods and the positive domain based algorithm proposed in this paper, in terms of simplicity, stability, consistency, and accuracy.
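
As background for the entropy-based methods the abstract builds on, the following minimal sketch picks a single cut point on one numeric attribute by minimising the weighted class entropy of the two resulting intervals. It shows only the classic entropy criterion, not the coupled discretization (CD) algorithm or its positive-domain component; the function names and toy data are assumptions.

```python
import math
from collections import Counter


def class_entropy(labels):
    """Shannon entropy (in bits) of the class labels in one interval."""
    total = len(labels)
    return sum((c / total) * math.log2(total / c) for c in Counter(labels).values())


def best_entropy_cut(values, labels):
    """Return (cut_point, weighted_entropy) for the best binary split of one attribute."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_cut, best_score = None, float("inf")
    for i in range(1, n):
        lower, upper = pairs[i - 1][0], pairs[i][0]
        if lower == upper:
            continue  # no boundary between identical values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) / n) * class_entropy(left) + (len(right) / n) * class_entropy(right)
        if score < best_score:
            best_cut, best_score = (lower + upper) / 2, score
    return best_cut, best_score


# Toy data (hypothetical): small values are "low", large values are "high".
cut, score = best_entropy_cut(
    [1.0, 2.0, 3.0, 8.0, 9.0, 10.0],
    ["low", "low", "low", "high", "high", "high"],
)
print(cut, score)
```

Applying the same search recursively inside each interval, with a stopping rule, gives a multi-interval discretization; the stopping rule is left out here.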

Wei, W., Fan, X., Li, J. & Cao, L. 2012, 'Model the Complex Dependence Structures of Financial Variables by Using Canonical Vine', The 21st ACM International Conference on Information and Knowledge Management, The 21st ACM International Conference on Information and Knowledge Management (CIKM2012), Springer, Maui, Hawaii, USA, pp. 1382-1391.

Financial variables such as asset returns in the massive market contain various hierarchical and horizontal relationships forming complicated dependence structures. Modeling and mining of these structures is challenging due to their own high structural complexities as well as the stylized facts of the market data. This paper introduces a new canonical vine dependence model to identify the asymmetric and non-linear dependence structures of asset returns without any prior independence assumptions. To simplify the model while maintaining its merit, a partial correlation based method is proposed to optimize the canonical vine. Compared with the original canonical vine, the new model can still maintain the most important dependence but many unimportant nodes are removed to simplify the canonical vine structure. Our model is applied to construct and analyze dependence structures of European stocks as case studies. Its performance is evaluated by measuring portfolio of Value at Risk, a widely used risk management measure. In comparison to a very recent canonical vine model and the 'full' model, our experimental results demonstrate that our model has a much better quality of Value at Risk, providing insightful knowledge for investors to control and reduce the aggregation risk of the portfolio.

Yin, J., Zheng, Z. & Cao, L. 2012, 'USpan: an efficient algorithm for mining high utility sequential patterns', Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, Beijing, pp. 660-668.

Sequential pattern mining plays an important role in many applications, such as bioinformatics and consumer behavior analysis. However, the classic frequency-based framework often leads to many patterns being identified, most of which are not informative enough for business decision-making. In frequent pattern mining, a recent effort has been to incorporate utility into the pattern selection framework, so that high utility (frequent or infrequent) patterns are mined which address typical business concerns such as dollar value associated with each pattern. In this paper, we incorporate utility into sequential pattern mining, and a generic framework for high utility sequence mining is defined. An efficient algorithm, USpan, is presented to mine for high utility sequential patterns. In USpan, we introduce the lexicographic quantitative sequence tree to extract the complete set of high utility sequences and design concatenation mechanisms for calculating the utility of a node and its children with two effective pruning strategies. Substantial experiments on both synthetic and real datasets show that USpan efficiently identifies high utility sequences from large scale data with very low minimum utility.
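
To make the notion of sequence utility concrete, the sketch below scores one candidate pattern against one quantitative sequence. It assumes a simplified setting: each pattern element is a single item, the sequence is a list of itemsets mapping items to purchased quantities, and the pattern's utility is the best total (quantity times unit profit) over all ordered matches. This is not the USpan algorithm, its lexicographic quantitative sequence tree, or its pruning strategies; all names and numbers are hypothetical.

```python
def pattern_utility(q_sequence, pattern, unit_profit):
    """Maximum utility of `pattern` over all ordered matches in `q_sequence`.

    Returns None when the pattern does not occur in the sequence.
    """
    # best[j] = highest utility of matching the first j pattern items so far
    best = [0] + [None] * len(pattern)
    for itemset in q_sequence:
        # Go right to left so one itemset serves at most one pattern position.
        for j in range(len(pattern), 0, -1):
            item = pattern[j - 1]
            if item in itemset and best[j - 1] is not None:
                candidate = best[j - 1] + itemset[item] * unit_profit[item]
                if best[j] is None or candidate > best[j]:
                    best[j] = candidate
    return best[len(pattern)]


# Hypothetical data: three purchases by one customer, with unit profits per item.
sequence = [{"a": 2}, {"a": 1, "b": 1}, {"b": 3}]
profit = {"a": 1, "b": 2}

utility_ab = pattern_utility(sequence, ["a", "b"], profit)
utility_ba = pattern_utility(sequence, ["b", "a"], profit)
print(utility_ab, utility_ba)
```

A pattern would count as high utility when its summed utility across all sequences in the database reaches a chosen minimum utility threshold.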

Zhu, L., Cao, L. & Yang, J. 2012, 'Multiobjective evolutionary algorithm-based soft subspace clustering', IEEE Congress on Evolutionary Computation 2012, IEEE Congress on Evolutionary Computation, IEEE, Brisbane, Australia, pp. 1-8.

In this paper, a multiobjective evolutionary algorithm-based soft subspace clustering, MOSSC, is proposed to simultaneously optimize the weighting within-cluster compactness and weighting between-cluster separation incorporated within two different clustering validity criteria. The main advantage of MOSSC lies in the fact that it effectively integrates the merits of soft subspace clustering and the good properties of the multiobjective optimization-based approach for fuzzy clustering. This makes it possible to avoid trapping in local minima and thus obtain more stable clustering results. Substantial experimental results on both synthetic and real data sets demonstrate that MOSSC is generally effective in subspace clustering and can achieve superior performance over existing state-of-the-art soft subspace clustering algorithms.

Journal articles

Cao, L., Zhang, H., Zhao, Y., Luo, D. & Zhang, C. 2011, 'Combined Mining: Discovering Informative Knowledge in Complex Data', IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 41, no. 3, pp. 699-712.

Enterprise data mining applications often involve complex data such as multiple large heterogeneous data sources, user preferences, and business impact. In such situations, a single method or one-step mining is often limited in discovering informative knowledge. It would also be very time and space consuming, if not impossible, to join relevant large data sources for mining patterns consisting of multiple aspects of information. It is crucial to develop effective approaches for mining patterns combining necessary information from multiple relevant business lines, catering for real business settings and decision-making actions rather than just providing a single line of patterns. The recent years have seen increasing efforts on mining more informative patterns, e.g., integrating frequent pattern mining with classifications to generate frequent pattern-based classifiers. Rather than presenting a specific algorithm, this paper builds on our existing works and proposes combined mining as a general approach to mining for informative patterns combining components from either multiple data sets or multiple features or by multiple methods on demand. We summarize general frameworks, paradigms, and basic processes for multifeature combined mining, multisource combined mining, and multimethod combined mining. Novel types of combined patterns, such as incremental cluster patterns, can result from such frameworks, which cannot be directly produced by the existing methods. A set of real-world case studies has been conducted to test the frameworks, with some of them briefed in this paper. They identify combined patterns for informing government debt prevention and improving government service objectives, which show the flexibility and instantiation capability of combined mining in discovering informative knowledge in complex data.

Yang, T., Kecman, V., Cao, L., Zhang, C. & Huang, J. 2011, 'Margin-Based Ensemble Classifier For Protein Fold Recognition', Expert Systems with Applications, vol. 38, no. 10, pp. 12348-12355.

Recognition of protein folding patterns is an important step in protein structure and function predictions. The traditional sequence similarity-based approach fails to yield convincing predictions when proteins have low sequence identities, while the taxonom

Conferences

Cao, L. 2011, 'Advances in Knowledge Discovery and Data Mining 15th Pacific-Asia Conference proceedings part 1', PAKDD 2011, Springer, China.

Cao, L. 2011, 'Advances in Knowledge Discovery and Data Mining 15th Pacific-Asia Conference proceedings part 2', PAKDD 2011, Springer, China.

Cao, L. 2011, 'Proceedings of the 2011 IEEE/WIC/ACM International Conference on Intelligent Agent Technology, IAT 2011', Proceedings of the 2011 IEEE/WIC/ACM International Conference on Intelligent Agent Technology, IAT 2011, IEEE Computer Society, Lyon, France.

Dong, X., Zheng, Z., Cao, L., Zhao, Y., Zhang, C., Li, J., Wei, W. & Ou, Y. 2011, 'e-NSP: efficient negative sequential pattern mining based on identified positive patterns without database rescanning', Proceedings of the 20th ACM International Conference on Information and Knowledge Management, ACM international conference on Information and knowledge management, ACM, Glasgow, Scotland, UK, pp. 825-830.

Mining Negative Sequential Patterns (NSP) is much more challenging than mining Positive Sequential Patterns (PSP) due to the high computational complexity and huge search space required in calculating Negative Sequential Candidates (NSC). Very few approaches are available for mining NSP, which mainly rely on re-scanning databases after identifying PSP. As a result, they are very inefficient. In this paper, we propose an efficient algorithm for mining NSP, called e-NSP, which mines for NSP by only involving the identified PSP, without re-scanning databases. First, negative containment is defined to determine whether or not a data sequence contains a negative sequence. Second, an efficient approach is proposed to convert the negative containment problem to a positive containment problem. The supports of NSC are then calculated based only on the corresponding PSP. Finally, a simple but efficient approach is proposed to generate NSC. With e-NSP, mining NSP does not require additional database scans, and the existing PSP mining algorithms can be integrated into e-NSP to mine for NSP efficiently. e-NSP is compared with two currently available NSP mining algorithms on 14 synthetic and real-life datasets. Intensive experiments show that e-NSP takes as little as 3% of the runtime of the baseline approaches and is applicable for efficient mining of NSP in large datasets.

Liu, B., Xiao, Y., Cao, L. & Yu, P. 2011, 'One-class-based uncertain data stream learning', Proceedings of the Eleventh SIAM International Conference on Data Mining, SIAM International Conference on Data Mining, SDM, Arizona, pp. 992-1003.

This paper presents a novel approach to one-class-based uncertain data stream learning. Our proposed approach works in three steps. Firstly, we put forward a local kernel-density-based method to generate a bound score for each instance, which refines the location of the corresponding instance. Secondly, we construct an uncertain one-class classifier by incorporating the generated bound score into a one-class SVM-based learning phase. Thirdly, we devise an ensemble classifier, integrated from uncertain one-class classifiers built on the current and historical chunks, to cope with the concept drift involved in the uncertain data stream environment. Our proposed method explicitly handles the uncertainty of the input data and enhances the ability of one-class learning in reducing the sensitivity to noise. Extensive experiments on uncertain data streams demonstrate that our proposed approach can achieve better performance and is highly robust to noise in comparison with state-of-the-art one-class learning methods.

Wang, C., Cao, L., Li, J., Wei, W., Ou, Y. & Wang, M. 2011, 'Coupled Nominal Similarity in Unsupervised Learning', Proceedings of the 20th ACM international conference on Information and knowledge management, ACM international conference on Information and knowledge management, ACM, Glasgow, UK, pp. 973-978.

The similarity between nominal objects is not straightforward, especially in unsupervised learning. This paper proposes coupled similarity metrics for nominal objects, which consider not only intra-coupled similarity within an attribute (i.e., value frequency distribution) but also inter-coupled similarity between attributes (i.e. feature dependency aggregation). Four metrics are designed to calculate the inter-coupled similarity between two categorical values by considering their relationships with other attributes. The theoretical analysis reveals their equivalent accuracy and superior efficiency based on intersection against others, in particular for large-scale data. Substantial experiments on extensive UCI data sets verify the theoretical conclusions. In addition, experiments of clustering based on the derived dissimilarity metrics show a significant performance improvement.
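
The idea of intra-coupled similarity from value frequency distribution can be sketched as follows. The scoring formula used here, f(x)·f(y) / (f(x) + f(y) + f(x)·f(y)) with f the occurrence count of a value within its attribute, is one frequency-based form assumed for illustration; it should not be read as a definitive statement of the paper's metrics, and the inter-coupled part between attributes is not covered. The function name and toy column are hypothetical.

```python
from collections import Counter


def intra_coupled_similarity(column, x, y):
    """Frequency-based similarity of two values of one nominal attribute.

    Assumed illustrative form: f(x) * f(y) / (f(x) + f(y) + f(x) * f(y)),
    where f counts how often a value occurs in `column`.
    """
    counts = Counter(column)
    fx, fy = counts[x], counts[y]
    if fx == 0 or fy == 0:
        return 0.0  # a value that never occurs carries no frequency evidence
    return (fx * fy) / (fx + fy + fx * fy)


# Toy attribute column (hypothetical): "red" occurs 3 times, "blue" twice.
colours = ["red", "red", "red", "blue", "blue", "green"]
sim_red_blue = intra_coupled_similarity(colours, "red", "blue")
print(round(sim_red_blue, 4))
```

Under this form, two values that both occur often are scored as more similar than two rare values, and the score always stays below 1.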

Xiao, Y., Liu, B., Yin, J., Cao, L., Zhang, C. & Hao, Z. 2011, 'Similarity-Based Approach for Positive and Unlabeled Learning', Proceedings of the 22nd International Joint Conference on Artificial Intelligence, International Joint Conference on Artificial Intelligence, AAAI Press, Barcelona, Catalonia, Spain, pp. 1577-1582.

Positive and unlabelled learning (PU learning) has been investigated to deal with the situation where only the positive examples and the unlabelled examples are available. Most of the previous works focus on identifying some negative examples from the unlabelled data, so that the supervised learning methods can be applied to build a classifier. However, the remaining unlabelled data, which cannot be explicitly identified as positive or negative (we call them ambiguous examples), are either excluded from the training phase or simply forced into one of the two classes. Consequently, the performance of these methods may be constrained. This paper proposes a novel approach, called similarity-based PU learning (SPUL) method, by associating the ambiguous examples with two similarity weights, which indicate the similarity of an ambiguous example towards the positive class and the negative class, respectively. The local similarity-based and global similarity-based mechanisms are proposed to generate the similarity weights. The ambiguous examples and their similarity-weights are thereafter incorporated into an SVM-based learning phase to build a more accurate classifier. Extensive experiments on real-world datasets have shown that SPUL outperforms state-of-the-art PU learning

Zhu, L., Cao, L. & Yang, J. 2011, 'Soft subspace clustering with competitive agglomeration', IEEE International Conference on Fuzzy Systems 2011, IEEE International Conference on Fuzzy Systems, IEEE, Taipei, pp. 691-698.

In this paper, two novel soft subspace clustering algorithms, namely fuzzy weighting subspace clustering with competitive agglomeration (FWSCA) and entropy weighting subspace clustering with competitive agglomeration (EWSCA), are proposed to overcome the problems of the unknown number of clusters and the initialization of prototypes for soft subspace clustering. The main advantage of FWSCA and EWSCA lies in the fact that they effectively integrate the merits of soft subspace clustering and the good properties of fuzzy clustering with competitive agglomeration. This makes it possible to obtain the appropriate number of clusters during the clustering process. Moreover, FWSCA and EWSCA algorithms can converge regardless of the initial number of clusters and initialization. Substantial experimental results on both synthetic and real data sets demonstrate the effectiveness of FWSCA and EWSCA in addressing the two problems.

Books

Cao, L., Yu, P., Zhang, C. & Zhao, Y. 2010, Domain Driven Data Mining, 1, Springer, New York, USA.

* Bridges the gap between business expectations and research output
* Includes techniques, methodologies and case studies in real-life enterprise DM
* Addresses new areas such as blog mining

In the present thriving global economy, a need has evolved for complex data analysis to enhance an organization's production systems, decision-making tactics, and performance. In turn, data mining has emerged as one of the most active areas in information technologies. Domain Driven Data Mining offers state-of-the-art research and development outcomes on methodologies, techniques, approaches and successful applications in domain driven, actionable knowledge discovery.

Chapters

Tsai, P., Tran, T.P. & Cao, L. 2010, 'A New Multimodal Biometric for Personal Identification' in Herout, A. (ed), Pattern Recognition Recent Advances, InTech, pp. 341-366.

Weiss, G., Yu, P.S., Cao, L., Bazzan, A., Mitkas, P.A. & Gorodetsky, V. 2010, 'Message from the Workshop Chairs' in Cao, L., Bazzan, A.L.C., Gorodetsky, V., Mitkas, P.A., Weiss, G. & Yu, P.S. (eds), Agents and Data Mining Interaction: 6th International Workshop on Agents and Data Mining Interaction, ADMI 2010, Toronto, ON, Canada, May 11, 2010, Revised Selected Papers, pp. v-vi.

Zhang, H., Zhao, Y., Cao, L., Zhang, C. & Bohlscheid, H. 2010, 'Rare class association rule mining with multiple imbalanced attributes' in Koh, Y.S. & Rountree, N. (eds), Rare Association Rule Mining and Knowledge Discovery: Technologies for Infrequent and Critical Event, IGI Global, Hershey, Pennsylvania, pp. 66-75.

In this chapter, the authors propose a novel framework for rare class association rule mining. In each class association rule, the right-hand side is a target class while the left-hand side may contain one or more attributes. This algorithm is focused on the multiple imbalanced attributes on the left-hand side. In the proposed framework, the rules with and without imbalanced attributes are processed in parallel. The rules without imbalanced attributes are mined through a standard algorithm while the rules with imbalanced attributes are mined based on newly defined measurements. Through simple transformation, these measurements can be in a uniform space so that only a few parameters need to be specified by the user. In the case study, the proposed algorithm is applied in the social security field. Although some attributes are severely imbalanced, rules with a minority of imbalanced attributes have been mined efficiently.

Journal articles

Cao, L. 2010, 'Domain-Driven Data Mining: Challenges and Prospects', IEEE Transactions On Knowledge And Data Engineering, vol. 22, no. 6, pp. 755-769.

Traditional data mining research mainly focuses on developing, demonstrating, and pushing the use of specific algorithms and models. The process of data mining stops at pattern identification. Consequently, a widely seen fact is that 1) many algorithms have been designed of which very few are repeatable and executable in the real world, 2) often many patterns are mined but a major proportion of them are either commonsense or of no particular interest to business, and 3) end users generally cannot easily understand and take them over for business use. In summary, we see that the findings are not actionable, and lack soft power in solving real-world complex problems. Thorough efforts are essential for promoting the actionability of knowledge discovery in real-world smart decision making. To this end, domain-driven data mining (D3M) has been proposed to tackle the above issues, and promote the paradigm shift from "data-centered knowledge discovery" to "domain-driven, actionable knowledge delivery." In D3M, ubiquitous intelligence is incorporated into the mining process and models, and a corresponding problem-solving system is formed as the space for knowledge discovery and delivery. Based on our related work, this paper presents an overview of driving forces, theoretical frameworks, architectures, techniques, case studies, and open issues of D3M. We understand D3M discloses many critical issues with no thorough and mature solutions available for now, which indicates the challenges and prospects for this new topic.

Cao, L. 2010, 'In-depth behavior understanding and use: The behavior informatics approach', Information Sciences, vol. 180, no. 17, pp. 3067-3085.

The in-depth analysis of human behavior has been increasingly recognized as a crucial means for disclosing interior driving forces, causes and impact on businesses in handling many challenging issues such as behavior modeling and analysis in virtual organizations, web community analysis, counter-terrorism and stopping crime. The modeling and analysis of behaviors in virtual organizations is an open area. Traditional behavior modeling mainly relies on qualitative methods from behavioral science and social science perspectives. On the other hand, so-called behavior analysis is actually based on human demographic and business usage data, such as churn prediction in the telecommunication industry, in which behavior-oriented elements are hidden in routinely collected transactional data. As a result, it is ineffective or even impossible to deeply scrutinize native behavior intention, lifecycle and impact on complex problems and business issues. In this paper, we propose the approach of behavior informatics (BI), in order to support explicit and quantitative behavior involvement through a conversion from source data to behavioral data, and further conduct genuine analysis of behavior patterns and impacts. BI consists of key components including behavior representation, behavioral data construction, behavior impact analysis, behavior pattern analysis, behavior simulation, and behavior presentation and behavior use. We discuss the concepts of behavior and an abstract behavioral model, as well as the research tasks, process and theoretical underpinnings of BI. Two real-world case studies are demonstrated to illustrate the use of BI in dealing with complex enterprise problems, namely analyzing exceptional market microstructure behavior for market surveillance and mining for high impact behavior patterns in social security data for governmental debt prevention.

Cao, L., Zhao, Y., Zhang, H., Luo, D., Zhang, C. & Park, E. 2010, 'Flexible Frameworks For Actionable Knowledge Discovery', IEEE Transactions On Knowledge And Data Engineering, vol. 22, no. 9, pp. 1299-1312.

Most data mining algorithms and tools stop at the mining and delivery of patterns satisfying expected technical interestingness. There are often many patterns mined but business people either are not interested in them or do not know what follow-up actio

Xiao, Y., Liu, B., Luo, D., Cao, L., Deng, F. & Hao, Z. 2010, 'Multi-agent system for customer relationship management with SVMs tool', International Journal of Intelligent Information and Database Systems, vol. 4, no. 2, pp. 121-136.

In this paper, we introduce multiple agents, knowledge discovery and data mining into customer relationship management (CRM) to set up the architecture of a multi-agent-based CRM system (MAB-CRM), and then use the SVMs-based approach to build up the decision support model which can classify the patterns obtained by the multiple agents into several decision levels, so that managers can pursue different decision-making activities according to the decision level of a pattern. Substantial experiments in the two-dimensional space show how the SVMs-based approach works. The practical problem from one Chinese company has been resolved by the SVMs-based approach. The results illustrate that this approach has an effective ability to learn the decision rules from the assessors' experience.

Yang, T., Kecman, V. & Cao, L. 2010, 'Classification by ALH-Fast Algorithm', Tsinghua Science and Technology, vol. 15, no. 3, pp. 275-280.

The adaptive local hyperplane (ALH) algorithm is a very recently proposed classifier, which has been shown to perform better than many other benchmarking classifiers including support vector machine (SVM), K-nearest neighbor (KNN), linear discriminant analysis (LDA), and K-local hyperplane distance nearest neighbor (HKNN) algorithms. Although the ALH algorithm is well formulated and despite the fact that it performs well in practice, its scalability over a very large data set is limited due to the online distance computations associated with all training instances. In this paper, a novel algorithm, called ALH-Fast and obtained by combining the classification tree algorithm and the ALH, is proposed to reduce the computational load of the ALH algorithm. The experiment results on two large data sets show that the ALH-Fast algorithm is both much faster and more accurate than the ALH algorithm.

Conferences

Cao, L., Ou, Y., Yu, P. & Wei, G. 2010, 'Detecting Abnormal Coupled Sequences and Sequence Changes in Group-based Manipulative Trading Behaviors', Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data, ACM SIGKDD International Conference on Knowledge Discovery and Data, ACM, Washington DC, DC, USA, pp. 85-93.

In capital market surveillance, an emerging trend is that a group of hidden manipulators collaborate with each other to manipulate three trading sequences: buy-orders, sell-orders and trades, through carefully arranging their prices, volumes and time, in order to mislead other investors, affect the instrument movement, and thus maximize personal benefits. If the focus is on only one of the above three sequences in attempting to analyze such hidden group based behavior, or if they are merged into one sequence as per an investor, the coupling relationships among them indicated through trading actions and their prices/volumes/times would be missing, and the resulting findings would have a high probability of mismatching the genuine fact in business. Therefore, typical sequence analysis approaches, which mainly identify patterns on a single sequence, cannot be used here. This paper addresses a novel topic, namely coupled behavior analysis in hidden groups. In particular, we propose a coupled Hidden Markov Models (HMM)-based approach to detect abnormal group-based trading behaviors. The resulting models cater for (1) multiple sequences from a group of people, (2) interactions among them, (3) sequence item properties, and (4) significant change among coupled sequences. We demonstrate our approach in detecting abnormal manipulative trading behaviors on orderbook-level stock data. The results are evaluated against alerts generated by the exchange's surveillance system from both technical and computational perspectives. They show that the proposed coupled and adaptive HMMs outperform a standard HMM that models only a single sequence, and an HMM that combines multiple single sequences without considering the coupling relationship. Further work on coupled behavior analysis, including coupled sequence/event analysis, hidden group analysis and behavior dynamics, is critical.

Feng, J., Wang, M., Wang, C. & Cao, L. 2010, 'Enhanced co-occurrence distances for categorical data in unsupervised learning', 2010 International Conference on Machine Learning and Cybernetics, ICMLC 2010, International Conference on Machine Learning and Cybernetics, IEEE, Qingdao, pp. 2071-2078.

Distance metrics for categorical data play an important role in unsupervised learning such as clustering. They also dramatically affect learning accuracy and computational complexities. Recently, two co-occurrence methods, Co-occurrence Distance based on

Liu, B., Xiao, Y., Cao, L. & Yu, P. 2010, 'Orientation Distance-based Discriminative Feature Extraction for Multi-Class Classification', Proceedings of the 19th ACM Conference on Information and Knowledge Management & Co-Located Workshops (CIKM 2010), ACM Conference on Information and Knowledge Manage, ACM, Toronto, Ontario, Canada, pp. 909-918.

Feature extraction is an effective step in data mining and machine learning. While many feature extraction methods have been proposed for clustering, classification and regression, very limited work has been done on multi-class classification problems. In fact, the accuracy of multi-class classification problems relies on well-extracted features, the modeling part aside. This paper proposes a new feature extraction method, namely extracting orientation distance-based discriminative (ODD) features, which is particularly designed for multi-class classification problems. The proposed method works in two steps. In the first step, we extend the Fisher Discriminant idea to determine a more appropriate kernel function and map the input data with all classes into a feature space. In the second step, the ODD features are extracted based on the one-vs-all scheme to generate discriminative features between a pattern and each hyperplane. These newly extracted features are treated as the representative features and are further used in the subsequent classification procedure. Substantial experiments on both UCI and real-world datasets have been conducted to investigate the performance of multi-class classification based on ODD features. The statistical results show that the classification accuracy based on ODD features outperforms that of the state-of-the-art feature extraction methods.

Liu, B., Xiao, Y., Cao, L. & Yu, P. 2010, 'Vote-Based LELC for Positive and Unlabeled Textual Data Streams', 2010 IEEE International Conference on Data Mining Workshops (ICDMW), IEEE International Conference on Data Mining, IEEE Computer Society Conference Publishing Services (CPS), Sydney, NSW, Australia, pp. 951-958.

In this paper, we extend LELC (PU Learning by Extracting Likely Positive and Negative Micro-Clusters) method to cope with positive and unlabeled data streams. Our developed approach, which is called vote-based LELC, works in three steps. In the first step, we extract representative documents from unlabeled data and assign a vote score to each document. The assigned vote score reflects the degree of belongingness of an example towards its corresponding class. In the second step, the extracted representative examples, together with their vote scores, are incorporated into a learning phase to build an SVM-based classifier. In the third step, we propose the usage of an ensemble classifier to cope with concept drift involved in the textual data stream environment. Our developed approach aims at improving the performance of LELC by rendering examples to contribute differently to the construction of the classifier according to their vote scores. Extensive experiments on textual data streams have demonstrated that vote-based LELC outperforms the original LELC method.

Liu, B., Yin, J., Xiao, Y., Cao, L. & Yu, P. 2010, 'Exploiting Local Data Uncertainty to Boost Global Outlier Detection', ICDM 2010, The 10th IEEE International Conference on Data Mining, IEEE International Conference on Data Mining, IEEE, Sydney, pp. 304-313.

This paper presents a novel hybrid approach to outlier detection by incorporating local data uncertainty into the construction of a global classifier. To deal with local data uncertainty, we introduce a confidence value to each data example in the training data, which measures the strength of the corresponding class label. Our proposed method works in two steps. Firstly, we generate a pseudo training dataset by computing a confidence value of each input example on its class label. We present two different mechanisms: kernel k-means clustering algorithm and kernel LOF-based algorithm, to compute the confidence values based on the local data behavior. Secondly, we construct a global classifier for outlier detection by generalizing the SVDD-based learning framework to incorporate both positive and negative examples as well as their associated confidence values. By integrating local and global outlier detection, our proposed method explicitly handles the uncertainty of the input data and enhances the ability of SVDD in reducing the sensitivity to noise. Extensive experiments on real life datasets demonstrate that our proposed method can achieve a better tradeoff between detection rate and false alarm rate as compared to four state-of-the-art outlier detection algorithms.

Moemeng, C., Zhu, X. & Cao, L. 2010, 'Integrating Workflow into Agent-Based Distributed Data Mining Systems', Springer, Germany, pp. 4-15.

Agent-based workflow has proven its potential in overcoming issues in traditional workflow-based systems, such as decentralization, organizational issues, etc. The existing data mining tools provide a workflow metaphor for data mining process visualization, auditing and monitoring; these are particularly useful for distributed environments. In agent-based distributed data mining (ADDM), agents are an integral part of the system and can be seamlessly incorporated with workflows. We describe a mechanism to use workflow in descriptive and executable styles to mediate between workflow generators and executors. This paper shows that agent-based workflows can improve ADDM interoperability and flexibility. To support the argument, a multi-agent architecture and an agent-based workflow model are demonstrated.

Moemeng, C., Zhu, X., Cao, L. & Chen, J. 2010, 'i-Analyst: An Agent-Based Distributed Data Mining Platform', Data Mining Workshops (ICDMW), 2010 IEEE International Conference on, IEEE International Conference on Data Mining, IEEE, Sydney, NSW, pp. 1404-1406.

User-friendliness and performance are important properties of data mining and analysis tools. In this demo, we introduce an agent-based distributed data mining platform that allows users to manage and share data-mining-related resources conveniently. Furthermore, the platform employs agents for workflow enactment, in which the performance is improved with agent abilities. We also present an example to illustrate how the platform works in a distributed environment. The performance is relatively competitive with a non-agent approach when data is highly distributed and large.

Xiao, Y., Liu, B. & Cao, L. 2010, 'K-Farthest-Neighbors-Based Concept Boundary Determination for Support Vector Data Description', Proceedings of the 19th ACM International Conference on Information and Knowledge Management & Co-Located Workshops, ACM International Conference on Information and Knowledge Management, ACM, Toronto, Ontario, Canada, pp. 1701-1704.

Support vector data description (SVDD) is very useful for one-class classification. However, it incurs high time complexity in handling large scale data. In this paper, we propose a novel and efficient method, named K-Farthest-Neighbors-based Concept Boundary Detection (KFN-CBD for short), to improve the SVDD learning efficiency on large datasets. This work is motivated by the observation that the SVDD classifier is determined by support vectors (SVs), and removing the non-support vectors (non-SVs) will not change the classifier but will reduce computational costs. Our approach consists of two steps. In the first step, we propose the K-farthest-neighbors method to identify the samples around the hyper-sphere surface, which are more likely to be SVs. At the same time, a new tree search strategy of M-tree is presented to speed up the K-farthest neighbor query. In the second step, the non-SVs are eliminated from the training set, and only the identified boundary samples are used to train the SVDD classifier. By removing the non-SVs, the training time of SVDD can be substantially reduced. Extensive experiments have shown that KFN-CBD achieves around 6 times speedup compared to the standard SVDD, and obtains classification quality comparable to that obtained when the entire dataset is used.
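The first step described above can be sketched in a few lines. This is a minimal illustration only, not the paper's implementation: the function names `kfn_scores` and `boundary_candidates` are hypothetical, the search is brute force rather than the M-tree strategy mentioned in the abstract, and the scoring rule (mean distance to the k farthest samples, with the largest scores treated as boundary candidates) is an assumption based on the geometric intuition that samples near the surface of the data lie far from the samples on the opposite side.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kfn_scores(points: Sequence[Point], k: int) -> List[float]:
    """Mean Euclidean distance from each point to its k farthest other points."""
    scores = []
    for i, p in enumerate(points):
        # Brute-force distances to every other point, largest first.
        dists = sorted(
            (math.dist(p, q) for j, q in enumerate(points) if j != i),
            reverse=True,
        )
        scores.append(sum(dists[:k]) / k)
    return scores


def boundary_candidates(points: Sequence[Point], k: int, keep: int) -> List[int]:
    """Indices of the `keep` points with the largest k-farthest-neighbor score.

    Assumption: a larger score marks a point as closer to the data boundary.
    """
    scores = kfn_scores(points, k)
    order = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:keep])


# Invented toy data: a centre, four points on the unit circle, two interior points.
toy_points = [(0.0, 0.0), (1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0),
              (0.1, 0.0), (0.0, 0.1)]
candidates = boundary_candidates(toy_points, k=1, keep=4)
```

On this toy data the four circle points are selected. A one-class model would then be trained on the selected samples only, which is where the reported speedup would come from.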

Xiao, Y., Liu, B., Cao, L., Yin, J. & Wu, X. 2010, 'SMILE: A Similarity-Based Approach for Multiple Instance Learning', 2010 IEEE 10th International Conference on Data Mining (ICDM), IEEE International Conference on Data Mining, IEEE Computer Society Conference Publishing Services (CPS), Sydney, NSW, Australia, pp. 589-598.

Multiple instance learning (MIL) is a generalization of supervised learning which attempts to learn useful information from bags of instances. In MIL, the true labels of the instances in positive bags are not always available for training. This leads to a critical challenge, namely, handling the ambiguity of instance labels in positive bags. To address this issue, this paper proposes a novel MIL method named SMILE (Similarity-based Multiple Instance LEarning). It introduces a similarity weight to each instance in positive bag, which represents the instance similarity towards the positive and negative classes. The instances in positive bags, together with their similarity weights, are thereafter incorporated into the learning phase to build an extended SVM-based predictive classifier. Experiments on three real-world datasets consisting of 12 subsets show that SMILE achieves markedly better classification accuracy than state-of-the-art MIL methods.

Yang, T., Cao, L. & Zhang, C. 2010, 'A Novel Prototype Reduction Method for the K-Nearest Neighbor Algorithm with K >= 1', Advances in Knowledge Discovery and Data Mining - Lecture Notes in Artificial Intelligence, Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer Berlin / Heidelberg, Hyderabad, India, pp. 89-100.

In this paper, a novel prototype reduction algorithm is proposed, which aims at reducing the storage requirement and enhancing the online speed while retaining the same level of accuracy for a K-nearest neighbor (KNN) classifier. To achieve this goal, our proposed algorithm learns the weighted similarity function for a KNN classifier by maximizing the leave-one-out cross-validation accuracy. Unlike the classical methods PW, LPD and WDNN, which can only work with K=1, our developed algorithm can work with K>=1. This flexibility allows our learning algorithm to have superior classification accuracy and noise robustness. The proposed approach is assessed through experiments with twenty real world benchmark data sets. In all these experiments, the proposed approach shows it can dramatically reduce the storage requirement and online time for KNN while having equal or better accuracy than KNN, and it also shows comparable results to several prototype reduction methods proposed in the literature.

Yang, T., Kecman, V., Cao, L. & Zhang, C. 2010, 'Combining Support Vector Machines and the t-statistic for Gene Selection in DNA Microarray Data Analysis', Advances in Knowledge Discovery and Data Mining - Lecture Notes in Artificial Intelligence, Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer Berlin / Heidelberg, Hyderabad, India, pp. 55-62.

This paper proposes a new gene selection (or feature selection) method for DNA microarray data analysis. In the method, the t-statistic and support vector machines are combined efficiently. The resulting gene selection method uses both the data intrinsic information and learning algorithm performance to measure the relevance of a gene in a DNA microarray. We explain why and how the proposed method works well. The experimental results on two benchmarking microarray data sets show that the proposed method is competitive with previous methods. The proposed method can also be used for other feature selection problems.

Yang, T., Kecman, V., Cao, L. & Zhang, C. 2010, 'Testing Adaptive Local Hyperplane for multi-class classification by double cross-validation', The 2010 International Joint Conference on Neural Networks (IJCNN), International Joint Conference on Neural Networks, IEEE, Barcelona, Spain, pp. 1-5.

Adaptive Local Hyperplane (ALH) is a recently proposed classifier for the multi-class classification problems and it has shown encouraging performance in many pattern recognition problems. However, ALH's performance over many general classification datasets has only been tested by using a single loop of cross-validation procedure, where the whole datasets are used for both hyper-parameter determination and accuracy estimation. This procedure is appropriate for classifier performance comparison, but the produced results are likely to be optimistic for classifier accuracy estimation on new datasets. In this paper, we test the performance of ALH as well as several other benchmark classifiers by using two loops of cross-validation (a.k.a. double resampling) procedure, where the inner loop is used for hyper-parameter determination and the outer loop is used for accuracy estimation. With such a testing scheme, the classification accuracy of a tested classifier can be evaluated in a more strict way. The experimental results indicate the superior performance of the ALH classifier with respect to the traditional classifiers including Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Linear Discriminant Analysis (LDA), Classification Tree (Tree) and K-local Hyperplane distance Nearest Neighbor (HKNN). These results imply that the ALH classifier might become a useful tool for the pattern recognition tasks.
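The two-loop procedure described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' experimental code: a plain K-nearest-neighbor classifier stands in for ALH, the fold assignment is a simple interleaving of indices, and all function names (`knn_predict`, `accuracy`, `folds`, `nested_cv`) and the toy data are hypothetical.

```python
from collections import Counter
from typing import Iterator, List, Sequence, Tuple

Point = Tuple[float, ...]


def knn_predict(train_x: Sequence[Point], train_y: Sequence[int],
                query: Point, k: int) -> int:
    """Majority vote among the k training points closest to `query`."""
    order = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)),
    )
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def accuracy(train_x, train_y, test_x, test_y, k: int) -> float:
    """Fraction of test points that the k-NN classifier labels correctly."""
    hits = sum(knn_predict(train_x, train_y, x, k) == y
               for x, y in zip(test_x, test_y))
    return hits / len(test_x)


def folds(n: int, n_folds: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; index i is tested in fold i % n_folds."""
    for f in range(n_folds):
        test = [i for i in range(n) if i % n_folds == f]
        train = [i for i in range(n) if i % n_folds != f]
        yield train, test


def nested_cv(xs, ys, k_grid, outer: int = 5, inner: int = 3) -> float:
    """Outer loop estimates accuracy; inner loop picks k on outer-training data only."""
    outer_scores = []
    for train_idx, test_idx in folds(len(xs), outer):
        tx = [xs[i] for i in train_idx]
        ty = [ys[i] for i in train_idx]

        def inner_score(k: int) -> float:
            # Mean inner-fold accuracy for one candidate hyper-parameter value.
            scores = [
                accuracy([tx[i] for i in tr], [ty[i] for i in tr],
                         [tx[i] for i in te], [ty[i] for i in te], k)
                for tr, te in folds(len(tx), inner)
            ]
            return sum(scores) / len(scores)

        best_k = max(k_grid, key=inner_score)
        outer_scores.append(
            accuracy(tx, ty, [xs[i] for i in test_idx],
                     [ys[i] for i in test_idx], best_k)
        )
    return sum(outer_scores) / len(outer_scores)


# Invented toy data: two well-separated one-dimensional clusters.
toy_x = [(i * 0.1,) for i in range(10)] + [(10 + i * 0.1,) for i in range(10)]
toy_y = [0] * 10 + [1] * 10
estimate = nested_cv(toy_x, toy_y, k_grid=[1, 3, 5])
```

The point of the design is that the outer test fold never influences the choice of `k`, so the outer accuracy is not biased by hyper-parameter selection.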

Yang, Y., Cao, L. & Liu, L. 2010, 'Time-Sensitive Feature Mining for Temporal Sequence Classification', Lecture Notes in Artificial Intelligence 6230 - PRICAI 2010: Trends in Artificial Intelligence, Pacific Rim International Conference on Artificial Intelligence, Springer-Verlag Berlin Heidelberg, Daegu, Korea, pp. 315-326.

Behavior analysis has received much attention in recent years, in areas such as customer-relationship management, social security surveillance and e-business. Discovering high impact-driven behavior patterns is important for detecting and preventing their occurrences and reducing resulting risks and losses to our society. In the data mining community, researchers pay little attention to time-stamps in temporal behavior sequences (without explicitly considering inherent temporal information) during classification. In this paper, we propose a novel Temporal Feature Extraction Method - TFEM. It extracts sequential pattern features where each transition is annotated with a typical transition time (its duration or interval). Therefore it substantially enriches temporal characteristics derived from temporal sequences, yielding improvements in performance, as demonstrated by a set of experiments performed on synthetic and real-world datasets. In addition, TFEM has the merit of simplicity in implementation, and its pattern-based architecture can generate human-readable results and supply clear interpretability to users. Meanwhile, it is adjustable and adaptive to users' different configurations, allowing a tradeoff between classification accuracy and time cost.
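The idea of annotating each transition with a typical transition time can be illustrated with a small sketch. It is not the TFEM algorithm itself: it assumes that only consecutive events form a transition and that the median interval serves as the "typical" time, neither of which is stated in the abstract, and the function name `transition_times` and the event data are hypothetical.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, List, Sequence, Tuple

Event = Tuple[str, float]  # (event label, timestamp)


def transition_times(sequences: Sequence[Sequence[Event]]) -> Dict[Tuple[str, str], float]:
    """Map each consecutive transition (a, b) to the median time gap observed for it."""
    gaps: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for seq in sequences:
        # Pair each event with its successor in the same sequence.
        for (a, t_a), (b, t_b) in zip(seq, seq[1:]):
            gaps[(a, b)].append(t_b - t_a)
    return {pair: median(values) for pair, values in gaps.items()}


# Invented timestamped behavior sequences.
toy_sequences = [
    [("login", 0), ("browse", 5), ("buy", 20)],
    [("login", 0), ("browse", 7), ("buy", 12)],
    [("login", 3), ("browse", 12)],
]
typical = transition_times(toy_sequences)
```

Each resulting pair, such as ("login", "browse") with its typical gap, could then serve as one time-annotated feature for a downstream classifier.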

Zhao, Y., Bohlscheid, H., Wu, S. & Cao, L. 2010, 'Less Effort, More Outcomes: Optimising Debt Recovery with Decision Trees', Data Mining Workshops (ICDMW), 2010 IEEE International Conference on, IEEE International Conference on Data Mining, IEEE Computer Society Conference Publishing Services (CPS), Sydney, NSW, Australia, pp. 655-660.

This paper presents a real-world application of data mining techniques to optimise debt recovery in social security. The traditional method of contacting a customer for the purpose of putting in place a debt recovery schedule has been an out-bound phone call, and by and large, customers are chosen at random. This obsolete and inefficient method of selecting customers for debt recovery purposes has existed for years. In order to improve this process, decision trees were built to model debt recovery and predict the response of customers if contacted by phone. Test results on historical data show that the built model is effective at ranking customers by their likelihood of entering into a successful debt recovery repayment schedule. If the top 20 per cent of customers in debt were contacted, instead of all of them, approximately 50 per cent of repayments would be received.
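The "top 20 per cent yields about 50 per cent" style of result is a cumulative gain measurement over a ranked list. A minimal sketch of that calculation follows, assuming each customer already has a model score; the function name `cumulative_gain` and the scores and outcomes are invented for illustration and are not the paper's data.

```python
from typing import Sequence


def cumulative_gain(scores: Sequence[float], outcomes: Sequence[int],
                    fraction: float) -> float:
    """Share of all positive outcomes found in the top `fraction` of customers by score."""
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    n_top = int(round(fraction * len(ranked)))
    captured = sum(outcome for _, outcome in ranked[:n_top])
    total = sum(outcomes)
    return captured / total if total else 0.0


# Invented example: ten customers, 1 marks a successful repayment arrangement.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
toy_outcomes = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
gain_top_20 = cumulative_gain(toy_scores, toy_outcomes, 0.2)
```

Evaluating the function over a range of fractions gives the gain curve that is commonly used to decide how many customers to contact.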

Zheng, Z., Zhao, Y., Zuo, Z. & Cao, L. 2010, 'An Efficient GA-Based Algorithm for Mining Negative Sequential Patterns', Advances in Knowledge Discovery and Data Mining - Lecture Notes in Artificial Intelligence, Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer Berlin / Heidelberg, Hyderabad, India, pp. 262-273.

Negative sequential pattern mining has attracted increasing attention in recent data mining research because it considers negative relationships between itemsets, which are ignored by positive sequential pattern mining. However, the search space for mining negative patterns is much bigger than that for positive ones. When the support threshold is low, in particular, there will be huge numbers of negative candidates. This paper proposes a Genetic Algorithm (GA) based algorithm to find negative sequential patterns with novel crossover and mutation operations, which are efficient at passing good genes on to next generations without generating candidates. An effective dynamic fitness function and a pruning method are also provided to improve performance. The results of extensive experiments show that the proposed method can find negative patterns efficiently and has remarkable performance compared with some other algorithms of negative pattern mining.

Books

Cao, L. 2009, Data mining and multi-agent integration.

Data Mining and Multi-agent Integration presents cutting-edge research, applications and solutions in data mining, and the practical use of innovative information technologies written by leading international researchers in the field. Topics examined include: Integration of multiagent applications and data mining; Mining temporal patterns to improve agents behavior; Information enrichment through recommendation sharing; Automatic web data extraction based on genetic algorithms and regular expressions; A multiagent learning paradigm for medical data mining diagnostic workbench; A multiagent data mining framework; Streaming data in complex uncertain environments; Large data clustering; A multiagent, multi-objective clustering algorithm; Interactive web environment for psychometric diagnostics; Anomalies detection on distributed firewalls using data mining techniques; Automated reasoning for distributed and multiple source of data; Video contents identification. Data Mining and Multi-agent Integration is intended for students, researchers, engineers and practitioners in the field, interested in the synergy between agents and data mining. This book is also relevant for readers in related areas such as machine learning, artificial intelligence, intelligent systems, knowledge engineering, human-computer interaction, intelligent information processing, decision support systems, knowledge management, organizational computing, social computing, complex systems, and soft computing. © Springer Science+Business Media, LLC 2009. All rights reserved.

Cao, L., Yu, P.S., Zhang, C. & Zhang, H. 2009, Data mining for business applications.

Data Mining for Business Applications presents state-of-the-art data mining research and development related to methodologies, techniques, approaches and successful applications. The contributions of this book mark a paradigm shift from "data-centered pattern mining" to "domain-driven actionable knowledge discovery (AKD)" for next-generation KDD research and applications. The contents identify how KDD techniques can better contribute to critical domain problems in practice, and strengthen business intelligence in complex enterprise applications. The volume also explores challenges and directions for future data mining research and development in the dialogue between academia and business. Part I centers on developing workable AKD methodologies, including: domain-driven data mining; post-processing rules for actions; domain-driven customer analytics; the role of human intelligence in AKD; maximal pattern-based cluster ontology mining. Part II focuses on novel KDD domains and the corresponding techniques, exploring the mining of emergent areas and domains such as: social security data; community security data; gene sequences; mental health information; traditional Chinese medicine data; cancer related data; blog data; sentiment information; web data procedures; moving object trajectories; land use mapping; higher education data; flight scheduling; algorithmic asset management. Researchers, practitioners and university students in the areas of data mining and knowledge discovery, knowledge engineering, human-computer interaction, artificial intelligence, intelligent information processing, decision support systems, knowledge management, and KDD project management are sure to find this a practical and effective means of enhancing their understanding of and using data mining in their own projects. © 2009 Springer Science+Business Media, LLC All rights reserved.

Chapters

Cao, L. 2009, 'Actionable Knowledge Discovery' in Mehdi Khosrow-Pour (ed), Encyclopedia of Information Science and Technology, IGI Global, Hershey, PA, USA, pp. 8-13.

Actionable knowledge discovery is selected as one of the greatest challenges (Ankerst, 2002; Fayyad, Shapiro, & Uthurusamy, 2003) of next-generation knowledge discovery in database (KDD) studies (Han & Kamber, 2006). In existing data mining, mined patterns are often not actionable for real user needs. To enhance knowledge actionability, domain-related social intelligence is essential (Cao et al., 2006b). The involvement of domain-related social intelligence in data mining leads to domain-driven data mining (Cao & Zhang, 2006a, 2007a), which complements traditional data-centered mining methodology. Domain-related social intelligence consists of intelligence of human, domain, environment, society and cyberspace, which complements data intelligence. The extension of KDD toward domain-driven data mining involves many challenging but promising research and development issues in KDD. Studies in regard to these issues may promote the paradigm shift of KDD from data-centered interesting pattern mining to domain-driven actionable knowledge discovery, and the deployment shift from simulated data set-based to real-life data and business environment-oriented as widely predicted.

Cao, L. 2009, 'Developing Actionable Trading Strategies' in Jain, L.C. & Nguyen, N.T. (eds), Knowledge Processing and Decision Making in Agent-Based Systems, Springer, Berlin, Germany, pp. 193-215.

Actionable trading strategies for trading agents determine the potential of the simulated models in real-life markets. The development of actionable strategies is a non-trivial task, which needs to consider real-life constraints and organizational factors in the market. In this paper, we first analyze such constraints on developing actionable trading strategies. Further, we propose an actionable trading strategy development framework. These points are deployed into developing a series of actionable trading strategies through optimizing, enhancing, discovering and integrating actionable trading strategies. We demonstrate working case studies on market data. These approaches and their performance are evaluated from both technical and business perspectives. Actionable trading strategies have the potential to support smart trading decisions for brokerage firms and financial companies.

Cao, L. 2009, 'Introduction to Agent Mining Interaction and Integration' in Longbing Cao (ed), Data Mining and Multi-agent Integration, Springer, New York, USA, pp. 3-36.

In recent years, more and more researchers have been involved in research on both agent technology and data mining. A clear disciplinary effort has been activated toward removing the boundary between them, that is, the interaction and integration between agent technology and data mining. We refer to this as agent mining, a new area. The marriage of agents and data mining is driven by challenges faced by both communities, and the need to develop more advanced intelligence, information processing and systems. This chapter presents an overall picture of agent mining from the perspective of positioning it as an emerging area. We summarize the main driving forces, complementary essence, disciplinary framework, applications, case studies, and trends and directions, as well as brief observations on agent-driven data mining, data mining-driven agents, and mutual issues in agent mining. Arguably, we draw the following conclusions: (1) agent mining emerges as a new area in the scientific family, (2) both agent technology and data mining can greatly benefit from agent mining, (3) it is very promising to result in additional advancement in intelligent information processing and systems. However, as a new open area, there are many issues waiting for research and development from theoretical, technological and practical perspectives.

Cao, L. 2009, 'Preface' in Cao, L. (ed), Data Mining and Multi-agent Integration, Springer, pp. v-vii.

Cao, L., Yu, P., Zhang, C. & Zhang, H. 2009, 'Introduction to Domain Driven Data Mining' in Cao, L., Yu, P.S., Zhang, C. & Zhang, H. (eds), Data Mining for Business Applications, Springer, New York, USA, pp. 3-10.

Data Mining for Business Applications presents state-of-the-art data mining research and development related to methodologies, techniques, approaches and successful applications. The contributions of this book mark a paradigm shift from "data-centered pattern mining" to "domain-driven actionable knowledge discovery (AKD)" for next-generation KDD research and applications. The contents identify how KDD techniques can better contribute to critical domain problems in practice, and strengthen business intelligence in complex enterprise applications. The volume also explores challenges and directions for future data mining research and development in the dialogue between academia and business.

Cao, L., Yu, P.S., Zhang, C. & Zhang, H. 2009, 'Preface' in Cao, L., Yu, P.S., Zhang, C. & Zhang, H. (eds), Data Mining for Business Applications, Springer, pp. v-vi.

McNicholas, P.D. & Zhao, Y. 2009, 'Association Rules: An Overview' in Zhao, Y., Zhang, C. & Cao, L. (eds), Post-Mining of Association Rules: Techniques for Effective Knowledge Extraction, IGI Global, USA, pp. 1-10.

Association rules present one of the most versatile techniques for the analysis of binary data, with applications in areas as diverse as retail, bioinformatics, and sociology. In this chapter, the origin of association rules is discussed along with the functions by which association rules are traditionally characterised. Following the formal definition of an association rule, these functions – support, confidence and lift – are defined and various methods of rule generation are presented, spanning 15 years of development. There is some discussion about negations and negative association rules and an analogy between association rules and 2×2 tables is outlined. Pruning methods are discussed, followed by an overview of measures of interestingness. Finally, the post-mining stage of the association rule paradigm is put in the context of the preceding stages of the mining process.

Wu, S., Zhao, Y., Zhang, H., Zhang, C., Cao, L. & Bohlscheid, H. 2009, 'Debt Detection in Social Security by Adaptive Sequence Classification' in Karagiannis, D. & Jin, Z. (eds), Knowledge Science, Engineering and Management, Springer, Germany, pp. 192-203.

Debt detection is important for improving payment accuracy in social security. Since debt detection from customer transaction data can be generally modelled as a fraud detection problem, a straightforward solution is to extract features from transaction sequences and build a sequence classifier for debts. For long-running debt detections, the patterns in the transaction sequences may exhibit variation from time to time, which makes it imperative to adapt classification to the pattern variation. In this paper, we present a novel adaptive sequence classification framework for debt detection in a social security application. The central technique is to catch up with the pattern variation by boosting discriminative patterns and depressing less discriminative ones according to the latest sequence data.

Zhao, Y., Cao, L., Zhang, H. & Zhang, C. 2009, 'Data Clustering' in Ferraggine, V.E., Doorn, J.H. & Rivero, L.C. (eds), Handbook of Research on Innovations in Database Technologies and Applications: Current and Future Tr, IGI Global, USA, pp. 562-572.

Clustering is one of the most important techniques in data mining. This chapter presents a survey of popular approaches for data clustering, including well-known clustering techniques, such as partitioning clustering, hierarchical clustering, density-based clustering and grid-based clustering, and recent advances in clustering, such as subspace clustering, text clustering and data stream clustering. The major challenges and future trends of data clustering will also be introduced in this chapter. The remainder of this chapter is organized as follows. The background of data clustering will be introduced in Section 2, including the definition of clustering, categories of clustering techniques, features of good clustering algorithms, and the validation of clustering. Section 3 will present main approaches for clustering, which range from the classic partitioning and hierarchical clustering to recent approaches of bi-clustering and semisupervised clustering. Challenges and future trends will be discussed in Section 4, followed by the conclusions in the last section.

Zhao, Y., Zhang, H., Cao, L., Bohlscheid, H., Ou, Y. & Zhang, C. 2009, 'Data Mining Applications in Social Security' in Cao, L., Yu, P.S., Zhang, C. & Zhang, H. (eds), Data Mining for Business Applications, Springer, New York, USA, pp. 81-96.

This chapter presents four applications of data mining in social security. The first is an application of decision tree and association rules to find the demographic patterns of customers. Sequence mining is used in the second application to find activity sequence patterns related to debt occurrence. In the third application, combined association rules are mined from heterogeneous data sources to discover patterns of slow payers and quick payers. In the last application, clustering and analysis of variance are employed to check the effectiveness of a new policy.

Journal articles

Cao, L. & He, X. 2009, 'Developing actionable trading agents', Knowledge And Information Systems, vol. 18, no. 2, pp. 183-192.

Trading agents are useful for developing and back-testing quality trading strategies to support smart trading actions in the market. However, most of the existing trading agent research oversimplifies trading strategies, and focuses on simulated ones. As a result, there exists a big gap between the deliverables and business needs when the developed strategies are deployed in real life. Therefore, the actionable capability of developed trading agents is often very limited. This paper for the first time introduces effective approaches for optimizing and integrating multiple classes of strategies through trading agent collaboration. An integration and optimization approach is proposed to identify the optimal trading strategy in each category, and further integrate optimal strategies across classes. Positions associated with these optimal strategies are recommended for trading agents to take actions in the market. Extensive experiments on a large quantity of real-life market data show that trading agents following the recommended strategies have great potential to obtain high benefits at low costs. This verifies that it is promising to develop trading agents toward workable and satisfying business needs.

Cao, L. & Yu, P. 2009, 'Behavior Informatics: An Informatics Perspective for Behavior Studies', IEEE Computational Intelligence Bulletin, vol. 10, no. 1, pp. 6-11.

Behavior is increasingly recognized as a key entity in business intelligence and problem-solving. Even though behavior analysis has been extensively investigated in social sciences and behavior sciences, in which qualitative and psychological methods have been the main means, nevertheless to conduct formal representation and deep quantitative analysis it is timely to investigate behavior from the informatics perspective. This article highlights the basic framework of behavior informatics, which aims to supply methodologies, approaches, means and tools for formal behavior modeling and representation, behavioral data construction, behavior impact modeling, behavior network analysis, behavior pattern analysis, behavior presentation, management and use. Behavior informatics can greatly complement existing studies in terms of providing more formal, quantitative and computable mechanisms and tools for deep understanding and use.

Cao, L., Dai, R.W. & Zhou, M. 2009, 'Metasynthesis: M-Space, M-Interaction, and M-Computing for Open Complex Giant Systems', IEEE Transactions On Systems Man And Cyberneti..., vol. 39, no. 5, pp. 1007-1021.

The studies of complex systems have been recognized as one of the greatest challenges for current and future science and technology. Open complex giant systems (OCGSs) are a family of specially complex systems with system complexities such as openness, human involvement, societal characteristic, and intelligence emergence. They greatly challenge multiple disciplines such as system sciences, system engineering, cognitive sciences, information systems, artificial intelligence, and computer sciences. As a result, traditional problem-solving methodologies can help deal with them but are far from a mature solution methodology. The theory of qualitative-to-quantitative metasynthesis has been proposed as a breakthrough and effective methodology for the understanding and problem solving of OCGSs. In this paper, we propose the concepts of M-Interaction, M-Space, and M-Computing which are three key components for studying OCGS and building problem-solving systems. M-Interaction forms the main problem-solving mechanism of qualitative-to-quantitative metasynthesis; M-Space is the OCGS problem-solving system embedded with M-Interactions, while M-Computing consists of engineering approaches to the analysis, design, and implementation of M-Space and M-Interaction. We discuss the theoretical framework, problem-solving process, social cognitive evolution, intelligence emergence, and pitfalls of certain types of cognitions in developing M-Space and M-Interaction from the perspectives of cognitive sciences and social cognitive interaction. These can help one understand complex systems and develop effective problem-solving methodologies.

Cao, L., Gorodetsky, V. & Mitkas, P. 2009, 'Agent Mining: The Synergy of Agents and Data Mining', IEEE Intelligent Systems, vol. 24, no. 3, pp. 64-72.

Autonomous agents and multiagent systems (or agents) and data mining and knowledge discovery (or data mining) are two of the most active areas in information technology. Ongoing research has revealed a number of intrinsic challenges and problems facing each area, which can't be addressed solely within the confines of the respective discipline. A profound insight of bringing these two communities together has unveiled a tremendous potential for new opportunities and wider applications through the synergy of agents and data mining. With increasing interest in this synergy, agent mining is emerging as a new research field studying the interaction and integration of agents and data mining. In this paper, we give an overall perspective of the driving forces, theoretical underpinnings, main research issues, and application domains of this field, while addressing the state-of-the-art of agent mining research and development. Our review is divided into three key research topics: agent-driven data mining, data mining-driven agents, and joint issues in the synergy of agents and data mining. This new and promising field exhibits a great potential for groundbreaking work from foundational, technological and practical perspectives.

Cao, L., Gorodetsky, V. & Mitkas, P.A. 2009, 'Agents and data mining', IEEE Intelligent Systems, vol. 24, no. 3, pp. 14-15.

Cao, L., Gorodetsky, V. & Mitkas, P.A. 2009, 'Guest Editors' Introduction: Agents and Data Mining', IEEE Intelligent Systems, vol. 24, no. 3, pp. 14-15.

On top of two active research streams, agents and data mining, a most recent and exciting trend is their interaction and integration. Agent mining has emerged as a very promising field due to its unique contributions to complementary and innovative methodologies, techniques, and applications for complex problem-solving. This editorial summarizes the structure of this special issue.

Ou, Y., Cao, L. & Zhang, C. 2009, 'Adaptive Anomaly Detection of Coupled Activity Sequences', The IEEE Intelligent Informatics Bulletin, vol. 10, no. 1, pp. 12-16.

Many real-life applications often involve multiple sequences, which are coupled with each other. It is unreasonable to either study the multiple coupled sequences separately or simply merge them into one sequence, because the information about their interacting relationships would be lost. Furthermore, such coupled sequences also have frequently significant changes which are likely to degrade the performance of trained model. Taking the detection of abnormal trading activity patterns in stock markets as an example, this paper proposes a Hidden Markov Model-based approach to address the above two issues. Our approach is suitable for sequence analysis on multiple coupled sequences and can adapt to the significant sequence changes automatically. Substantial experiments conducted on a real dataset show that our approach is effective.

Tran, T.P., Cao, L., Tran, D. & Nguyen, C.D. 2009, 'Novel Intrusion Detection using Probabilistic Neural Network and Adaptive Boosting', International Journal of Computer Science and Information Security, vol. 6, no. 1, pp. 83-91.

This article applies Machine Learning techniques to solve Intrusion Detection problems within computer networks. Due to the complex and dynamic nature of computer networks and hacking techniques, detecting malicious activities remains a challenging task for security experts; that is, currently available defense systems suffer from low detection capability and a high number of false alarms. To overcome such performance limitations, we propose a novel Machine Learning algorithm, namely Boosted Subspace Probabilistic Neural Network (BSPNN), which integrates an adaptive boosting technique and a semi parametric neural network to obtain a good tradeoff between accuracy and generality. As a result, learning bias and generalization variance can be significantly minimized. Substantial experiments on the KDD 99 intrusion benchmark indicate that our model outperforms other state of the art learning algorithms, with significantly improved detection accuracy, minimal false alarms and relatively small computational complexity.

Tsai, P.C., Cao, L., Hintz, T.B. & Jan, T. 2009, 'A bi-modal face recognition framework integrating facial expression with facial appearance', Pattern Recognition Letters, vol. 30, no. 12, pp. 1096-1109.

Among many biometric characteristics, the facial biometric is considered to be the least intrusive technology that can be deployed in the real-world visual surveillance environment. However, in facial biometric, little research attention has been paid to facial expression changes. In fact, facial expression changes have often been treated as noise that would degrade the recognition performance. This paper studies an innovative viewpoint: (1) whether facial expression changes, namely facial behavior, can be positively used for face recognition or not? (2) furthermore, can facial behavior be integrated with facial appearance for assisting the extra-personal separation to enhance face recognition performance? We propose a bi-modal face recognition framework which integrates facial expression with facial appearance. Substantial experiments on multiple facial appearance and facial expression data have been conducted. Our experimental results have validated that facial behavior can play a positive role in face recognition and can assist facial appearance in extra-personal separation in multiple modalities for personal identification improvement.

Zhang, H., Zhao, Y., Cao, L., Zhang, C. & Bohlscheid, H. 2009, 'Customer Activity Sequence Classification for Debt Prevention in Social Security', Journal Of Computer Science And Technology, vol. 24, no. 6, pp. 1000-1009.

From a data mining perspective, sequence classification is to build a classifier using frequent sequential patterns. However, mining for a complete set of sequential patterns on a large dataset can be extremely time-consuming and the large number of patterns discovered also makes the pattern selection and classifier building very time-consuming. The fact is that, in sequence classification, it is much more important to discover discriminative patterns than a complete pattern set. In this paper, we propose a novel hierarchical algorithm to build sequential classifiers using discriminative sequential patterns. Firstly, we mine for the sequential patterns which are the most strongly correlated to each target class. In this step, an aggressive strategy is employed to select a small set of sequential patterns. Secondly, pattern pruning and serial coverage test are done on the mined patterns. The patterns that pass the serial test are used to build the sub-classifier at the first level of the final classifier. And thirdly, the training samples that cannot be covered are fed back to the sequential pattern mining stage with updated parameters. This process continues until predefined interestingness measure thresholds are reached, or all samples are covered. The patterns generated in each loop form the sub-classifier at each level of the final classifier. Within this framework, the searching space can be reduced dramatically while a good classification performance is achieved. The proposed algorithm is tested in a real-world business application for debt prevention in social security area. The novel sequence classification algorithm shows the effectiveness and efficiency for predicting debt occurrences based on customer activity sequence data.

Conferences

Cao, L. 2009, 'Data Mining in Financial Markets', Advanced Data Mining and Applications, 5th International Conference, ADMA 2009, International Conference on Advanced Data Mining and Applications, Springer, Budapest, Hungary, pp. 4-4.

Cao, L., Luo, D. & Zhang, C. 2009, 'Ubiquitous Intelligence in Agent Mining', ADMI 2009, International Workshop on Agents and Data Mining Interaction, Springer, Budapest, Hungary, pp. 23-35.

Agent mining, namely the interaction and integration of multi-agent and data mining, has emerged as a very promising research area. While many mutual issues exist in both multi-agent and data mining areas, most of them can be described in terms of or related to ubiquitous intelligence. It is certainly very important to define, specify, represent, analyze and utilize ubiquitous intelligence in agents, data mining, and agent mining. This paper presents a novel but preliminary investigation of ubiquitous intelligence in these areas. We specify five types of ubiquitous intelligence: data intelligence, human intelligence, domain intelligence, network and web intelligence, organizational intelligence, and social intelligence. We define and illustrate them, and discuss techniques for involving them into agents, data mining, and agent mining for complex problem-solving. Further investigation on involving and synthesizing ubiquitous intelligence into agents, data mining, and agent mining will lead to a disciplinary upgrade from methodological, technical and practical perspectives.

Goh, T.T., Bose, S., Ng, W.K., Cao, L. & Lee, V.C.S. 2009, 'Editorial to the Proceedings of Mobile Technologies in Enterprise Computing Systems Workshop (MTECS 2009)', Proceedings - IEEE International Enterprise Distributed Object Computing Workshop, EDOC, pp. 138-139.

MTECS 2009 workshop Proceedings publish the state-of-the-art investigation from international researchers in pursuing and discovering new knowledge on how mobile technologies are applied in mobile enterprise computing systems. The proceedings start with a brief explanation of the motivation for the workshop and subsequently a short description of the peer-reviewed papers and a brief introduction to the workshop discussion session. We conclude with special acknowledgment to the participating authors and MTECS 2009 Program Committee members. ©2009 IEEE.

Tsai, P.C., Tran, T.P. & Cao, L. 2009, 'Expression-invariant Facial Identification', Proceedings 2009 IEEE International Conference on Systems, Man and Cybernetics, IEEE Conference on Systems, Man and Cybernetics, IEEE, San Antonio, Texas, USA, pp. 5151-5155.

Facial identification has been recognized as the most simple and non-intrusive technology that can be applied in many places. However, there are still many unsolved facial identification problems due to different intra-personal variations. In particular, when images in the databases appear with different facial expressions, most currently available facial recognition approaches encounter the expression-invariant problem, in which neutral faces are difficult to recognize. In this paper, a new approach is proposed to transform facial expressions to neutral-face like images, hence enabling image retrieval systems to robustly identify a person's face for which the learning and testing face images differ in facial expression.

Xiao, Y., Liu, B., Cao, L., Wu, X., Zhang, C., Hao, Z., Yang, F. & Cao, J. 2009, 'Multi-sphere Support Vector Data for Outliers Detection on Multi-distribution Data', Data Mining Workshops, 2009. ICDMW '09. IEEE International Conference on, IEEE International Conference on Data Mining, IEEE Computer Society Press, Miami, Florida, pp. 82-87.

SVDD has been proved a powerful tool for outlier detection. However, in detecting outliers on multi-distribution data, namely when there are distinctive distributions in the data, it is very challenging for SVDD to generate a hyper-sphere for distinguishing outliers from normal data. Even if such a hyper-sphere can be identified, its performance is usually not good enough. This paper proposes a multi-sphere SVDD approach, named MS-SVDD, for outlier detection on multi-distribution data. First, an adaptive sphere detection method is proposed to detect data distributions in the dataset. The data is partitioned in terms of the identified data distributions, and the corresponding SVDD classifiers are constructed separately. Substantial experiments on both artificial and real-world datasets have demonstrated that the proposed approach outperforms original SVDD.

Zhao, G., Xiong, Y., Cao, L., Luo, D., Su, X. & Zhu, Y. 2009, 'A Cost-Effective LSH Filter for Fast Pairwise Mining', ICDM 2009, The Ninth IEEE International Conference on Data Mining, IEEE International Conference on Data Mining, IEEE Computer Society, Miami, Florida, USA, pp. 1088-1093.

The pairwise mining problem is to discover pairwise objects having measures greater than the user-specified minimum threshold from a collection of objects. It is essential in a large variety of database and data-mining applications. Of late, there has been increasing interest in applying a Locality-Sensitive Hashing (LSH) scheme for pairwise mining. LSH-type methods have shown themselves to be simply implementable and capable of achieving significant performance gain in running time over most exact methods. However, the present LSH-type methods still suffer from some bottlenecks, such as the curse of threshold. In this paper, we propose a novel LSH-based method, namely Cost-effective LSH filter (Ce-LSH for short), for pairwise mining. Compared with previous LSH-type methods, it uses a lower fixed number of LSH functions and is thus more cost-effective. Substantial experiments show that our method gives significant improvement in running time over existing LSH-type methods and some recently reported methods based on upper-bound. Experimental results also indicate that it scales well even for a relatively low minimum threshold and for a fairly small miss ratio.

Zhao, Y., Zhang, H., Cao, L., Zhang, C. & Bohlscheid, H. 2009, 'Mining Both Positive and Negative Impact-Oriented Sequential Rules from Transactional Data', Advances in Knowledge Discovery and Data Mining, 13th Pacific-Asia Conference, PAKDD 2009, Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer, Bangkok, Thailand, pp. 656-663.

Traditional sequential pattern mining deals with positive correlation between sequential patterns only, without considering negative relationship between them. In this paper, we present a notion of impact-oriented negative sequential rules, in which the left side is a positive sequential pattern or its negation, and the right side is a predefined outcome or its negation. Impact-oriented negative sequential rules are formally defined to show the impact of sequential patterns on the outcome, and an efficient algorithm is designed to discover both positive and negative impact-oriented sequential rules. Experimental results on both synthetic data and real-life data show the efficiency and effectiveness of the proposed technique.

Zhao, Y., Zhang, H., Wu, S., Pei, J., Cao, L., Zhang, C. & Bohlscheid, H. 2009, 'Debt Detection in Social Security by Sequence Classification Using Both Positive and Negative Patterns', Machine Learning and Knowledge Discovery in Databases, European Conference, ECML PKDD 2009, European Conference on Machine Learning, Springer, Bled, Slovenia, pp. 648-663.

Debt detection is important for improving payment accuracy in social security. Since debt detection from customer transactional data can be generally modelled as a fraud detection problem, a straightforward solution is to extract features from transaction sequences and build a sequence classifier for debts. The existing sequence classification methods based on sequential patterns consider only positive patterns. However, according to our experience in a large social security application, negative patterns are very useful in accurate debt detection. In this paper, we present a successful case study of debt detection in a large social security application. The central technique is building sequence classification using both positive and negative sequential patterns.

Zheng, Z., Zhao, Y., Zuo, Z. & Cao, L. 2009, 'Negative-GSP: An Efficient Method for Mining Negative Sequential Patterns', Proceedings of the 8th Australasian Data Mining Conference (AusDM'09): Data Mining and Analytics - Conferences in Research and Practice in Information Technology Volume 101, Australian Data Mining Conference, Australian Computer Society, Melbourne, Australia, pp. 63-67.

Different from traditional positive sequential pattern mining, negative sequential pattern mining considers both positive and negative relationships between items. Negative sequential pattern mining doesn't necessarily follow the Apriori principle, and the searching space is much larger than positive pattern mining. Giving definitions and some constraints of negative sequential patterns, this paper proposes a new method for mining negative sequential patterns, called Negative-GSP. Negative-GSP can find negative sequential patterns effectively and efficiently by joining and pruning, and extensive experimental results show the efficiency of the method.

Books

Cao, L. & Ruwei, D. 2008, Open Complex Intelligent Systems: Fundamentals, Concepts, Analysis, Design and Implementation, 1, Posts & Telecom Press, Beijing, China.

One of nine books in computer science selected into the China Key-Book Publishing Plan in the 11th Five-Years (2006-2010).

Chapters

Cao, L. & Zhang, C. 2008, 'Domain Driven Data Mining' in Taniar, D. (ed), Data Mining and Knowledge Discovery Technologies, IGI Global, USA, pp. 196-223.

Quantitative intelligence based traditional data mining is facing grand challenges from real-world enterprise and cross-organization applications. For instance, the usual demonstration of specific algorithms cannot support business users to take actions to their advantage and needs. We think this is due to Quantitative Intelligence focused data-driven philosophy. It either views data mining as an autonomous data-driven, trial-and-error process, or only analyzes business issues in an isolated, case-by-case manner. Based on experience and lessons learnt from real-world data mining and complex systems, this article proposes a practical data mining methodology referred to as Domain-Driven Data Mining. On top of quantitative intelligence and hidden knowledge in data, domain-driven data mining aims to meta-synthesize quantitative intelligence and qualitative intelligence in mining complex applications in which human is in the loop. It targets actionable knowledge discovery in constrained environment for satisfying user preference. Domain-driven methodology consists of key components including understanding constrained environment, business-technical questionnaire, representing and involving domain knowledge, human-mining cooperation and interaction, constructing next-generation mining infrastructure, in-depth pattern mining and postprocessing, business interestingness and actionability enhancement, and loop-closed human-cooperated iterative refinement. Domain-driven data mining complements the data-driven methodology, the metasynthesis of qualitative intelligence and quantitative intelligence has potential to discover knowledge from complex systems, and enhance knowledge actionability for practical use by industry and business.

Journal articles

Cao, L. 2008, 'An integrated investment decision-support framework analysing and synthesising multidimensional market dynamics', International Journal of Intelligent Systems Technologies and Applications, vol. 4, no. 3/4, pp. 239-253.

In stock markets, the performance of traditional technology-based investment methods is limited because such methods only take into account single-dimensional market dynamics. The paper shows how the integration of multi-dimensional dynamics can improve performance. We propose a novel three-layer integrated framework composed of Analysis, Synthesis, and Investment Decision Support. At the first layer, multi-dimensional market dynamics are identified, in which we emphasise two key aspects that previous studies have neglected: unique trends of stocks, and a two-way reflexivity relationship of investors' decisions and market reactions. At the second layer, multi-dimensional dynamics are synthesized to reflect real and potential market situations. At the third layer, a prototype integrates the functions of first two layers for investment decision support. The framework covers multi-dimensional dynamics, and incorporates the concepts and advantages of traditional investment methods. The framework is promising, and our experimental results indicated that it outperformed market baselines and single-dimensional conventional methods.

Cao, L. 2008, 'Integrating Agent, Service and Organizational Computing', International Journal of Software Engineering and Knowledge Engineering, vol. 18, no. 5, pp. 573-596.

Engineering open complex systems is challenging because of system complexities such as openness, the involvement of organizational factors and service delivery. It cannot be handled well by the single use of existing computing techniques such as agent-based computing and service-oriented computing. Due to the intrinsic organizational characteristics and the request of service delivery, an integrative computing paradigm combining agent, service, organizational and social computing can engineer open complex systems more effectively. In this paper, we briefly introduce an integrative computing approach named OASOC for system analysis and design. It combines and complements the strengths of agent, service and organizational computing to handle the complexities of open complex systems. OASOC provides facilities for organization-oriented analysis and agent service-oriented design. It also supports transition between analysis and design. Compared with the existing approaches, our approach can (1) support service and organization that are either rarely or weakly covered by single computing methods, (2) provide effective mechanisms to integrate agent, service and organizational computing, and (3) complement the strengths of various methods. Experiences in engineering an online trading support system have further shown the workable capability of integrating agent, service and organizational computing for engineering open complex systems.

Cao, L. & Nguyen, N.T. 2008, 'Intelligence Metasynthesis and Knowledge Processing in Intelligent Systems', Journal of Universal Computer Science, vol. 14, no. 14, pp. 2256-2262.

Intelligence and knowledge play increasingly important roles in building complex intelligent systems, for instance, intrusion detection systems and operational analysis systems. Knowledge processing in complex intelligent systems faces new challenges from the increased number of applications and environments, such as the requirements of representing domain and human knowledge in intelligent systems, and discovering actionable knowledge on a large scale in distributed web applications. In this paper, we discuss the main challenges of, and promising approaches to, intelligence metasynthesis and knowledge processing in open complex intelligent systems (OCIS). We believe (1) ubiquitous intelligence, including data intelligence, domain intelligence, human intelligence, network intelligence and social intelligence, is necessary for OCIS, and needs to be meta-synthesized; and (2) knowledge processing should pay more attention to developing innovative and workable methodologies, techniques, tools and systems for representing, modelling, transforming, discovering and servicing the uncertain, large-scale, deep, distributed, domain-oriented, human-involved, and actionable knowledge highly expected in constructing open complex intelligent systems. To this end, the meta-synthesis of ubiquitous intelligence is an appropriate way of designing complex intelligent systems. To support intelligence meta-synthesis, m-interaction can serve as the working mechanism to form m-spaces as problem-solving systems. In building such m-spaces, advancement in knowledge processing is necessary.

Cao, L. & Nguyen, N.T. 2008, 'Knowledge processing in intelligent systems J.UCS special issue', Journal of Universal Computer Science, vol. 14, no. 14, p. 2255.

Cao, L. & Ou, Y. 2008, 'Market Microstructure Patterns Powering Trading and Surveillance Agents', Journal of Universal Computer Science, vol. 14, no. 14, pp. 2288-2308.

Market surveillance plays an important role in constructing market models. From a data analysis perspective, we view it as valuable for smart trading, in designing legal and profitable trading strategies, and for smart regulation, in maintaining market integrity, transparency and fairness. The existing trading pattern analysis only focuses on interday data, which discloses explicit and high-level market dynamics. Meanwhile, the existing market surveillance systems available from large exchanges are facing crucial challenges of diversified, dynamic, distributed and cyber-based misuse, mis-disclosure and misdealing of information, announcements and orders in one market or across multiple markets. Therefore, there is a crucial need to develop innovative and workable methods for smart trading and surveillance. To deal with such issues, we propose the innovative concept of microstructure pattern analysis and corresponding approaches in this paper. Microstructure pattern analysis studies trading behaviour patterns of traders in market microstructure data by utilizing market microstructure knowledge. The identified market microstructure patterns are then used for powering market trading and surveillance agents for automatically detecting/designing profitable and legal trading strategies or monitoring abnormal market dynamics and traders' behaviour. Such trading/surveillance agent-driven market trading/surveillance systems can greatly enhance the analytical, discovery and decision-support capability of market trading/surveillance compared with the current predefined rule/alert-based systems.

Cao, L., Zhang, C. & Zhou, M. 2008, 'Engineering open complex agent systems: A case study', IEEE Transactions On Systems Man And Cybernetics Part C-Applications And Reviews, vol. 38, no. 4, pp. 483-496.

Open complex agent systems (OCAS) are becoming increasingly important in constructing problem-solving systems for enterprise applications. They are challenging because they present very high system complexities involving human users and interactions with a ch

Cao, L., Zhao, Y. & Zhang, C. 2008, 'Mining impact-targeted activity patterns in imbalanced data', IEEE Transactions On Knowledge And Data Engineering, vol. 20, no. 8, pp. 1053-1066.

Impact-targeted activities are rare but they may have a significant impact on society. For example, isolated terrorist activities may lead to a disastrous event, threatening national security. Similar issues can also be seen in many other areas.

Cao, L., Zhao, Y., Zhang, C. & Zhang, H. 2008, 'Activity mining: From activities to actions', International Journal Of Information Technology & Decision Making, vol. 7, no. 2, pp. 259-273.

Activity data accumulated in real life, such as terrorist activities and governmental customer contacts, present special structural and semantic complexities. Activity data may lead to or be associated with significant business impacts, and result in important actions and decision making leading to business advantage. For instance, a series of terrorist activities may trigger a disaster to society, and large amounts of fraudulent activities in social security programs may result in huge government customer debt. Uncovering these activities or activity sequences can greatly evidence and/or enhance corresponding actions in business decisions. However, mining such data challenges the existing KDD research in aspects such as unbalanced data distribution and impact-targeted pattern mining. This paper investigates the characteristics and challenges of activity data, and the methodologies and tasks of activity mining based on case-study experience in the area of social security. Activity mining aims to discover high impact activity patterns in huge volumes of unbalanced activity transactions. Activity patterns identified can be used to prevent disastrous events or improve business decision making and processes. We illustrate the above issues and prospects in mining governmental customer contacts data to recover customer debt.

Chen, W., Cao, L. & Qin, Z. 2008, 'An integrated investment decision-support framework analysing and synthesising multidimensional market dynamics', International Journal of Intelligent Systems Technologies and Applications, vol. 4, no. 3-4, pp. 239-253.

In stock markets, the performance of traditional technology-based investment methods is limited because such methods only take into account single-dimensional market dynamics. The paper shows how the integration of multi-dimensional dynamics can improve performance. We propose a novel three-layer integrated framework composed of Analysis, Synthesis, and Investment Decision Support. At the first layer, multi-dimensional market dynamics are identified, in which we emphasize two key aspects that previous studies have neglected: unique trends of stocks, and a two-way reflexivity relationship of investors’ decisions and market reactions. At the second layer, multi-dimensional dynamics are synthesized to reflect real and potential market situations. At the third layer, a prototype integrates the functions of the first two layers for investment decision support. The framework covers multi-dimensional dynamics, and incorporates the concepts and advantages of traditional investment methods. The framework is promising, and our experimental results indicated that it outperformed market baselines and single-dimensional conventional methods. © 2008 Inderscience Enterprises Ltd.

Lin, L. & Cao, L. 2008, 'Mining in-depth patterns in stock market', International Journal of Intelligent Systems Technologies and Applications, vol. 4, no. 3/4, pp. 225-238.

Stock trading plays an important role in supporting profitable stock investment. In particular, more and more data mining-based technical trading rules have been developed and used in stock trading systems to assist investors with their smart trading decisions. However, many mined trading rules are of no interest to traders and brokers because they are discovered based on statistical significance without checking traders' interestingness concerns. To this end, this paper proposes in-depth data mining technologies to overcome the disadvantages of current data mining methods. We implement a decision-support system for in-depth trading pattern discovery with Robust Genetic Algorithms (RGA). The system integrates expert knowledge and incorporates domain constraints into trading rule development. We further utilise this technique to mine actionable stock-rule pairs targeting behaviour with high return at low risk. The proposed approaches are tested on real stock orderbook data with varying investment strategies.

Luo, D., Cao, L., Luo, C., Zhang, C. & Wang, W. 2008, 'Towards business interestingness in actionable knowledge discovery', Frontiers in Artificial Intelligence and Applications, vol. 177, no. 1, pp. 99-109.

From the evolution of developing a pattern interestingness perspective, data mining has experienced two phases, which are Phase 1: technical objective interestingness focused research, and Phase 2: technical objective and subjective interestingness focused studies. As a result of these efforts, patterns mined are of significant interest to technical concern. However, technically interesting patterns are not necessarily of interest to business. In fact, real-world experience shows that many mined patterns, which are interesting from the perspective of the data mining method used, are out of business expectations when they are delivered to the final user. This scenario actually involves a grand challenge in next-generation KDD (Knowledge Discovery in Databases) studies, defined as actionable knowledge discovery. To discover knowledge that can be used for taking actions to business advantages, this paper addresses a framework that extends the evolution process of knowledge evaluation to Phase 3 and Phase 4. In Phase 3, concerns with objective interestingness from a business perspective are added on top of Phase 2, while in Phase 4 both technical and business interestingness should be satisfied in terms of objective and subjective perspectives. The introduction of Phase 4 provides a comprehensive knowledge actionability framework for actionable knowledge discovery. We illustrate applications in governmental data mining showing that the considerations and adoption of the framework described in Phase 4 has potential to enhance both sides of interestingness and expectation. As a result, knowledge discovered has better chances to support action-taking in the business world. © 2008 The authors and IOS Press. All rights reserved.

Ni, J., Cao, L. & Zhang, C. 2008, 'Evolutionary optimization of trading strategies', Frontiers in Artificial Intelligence and Applications, vol. 177, no. 1, pp. 11-24.

It is a non-trivial task to effectively and efficiently optimize trading strategies, not to mention the optimization in real-world situations. This paper presents a general definition of this optimization problem, and discusses the application of evolutionary technologies (genetic algorithm in particular) to the optimization of trading strategies. Experimental results show that this approach is promising. © 2008 The authors and IOS Press. All rights reserved.

Conferences

Cao, L. 2008, 'Behavior Informatics and Analytics: Let Behavior Talk', Proceedings of the 2008 IEEE International Conference on Data Mining Workshops, IEEE International Conference on Data Mining, IEEE Computer Society, Pisa, Italy, pp. 87-96.

Behavior is increasingly recognized as a key component in business intelligence and problem-solving. Different from traditional behavior analysis, which mainly focuses on implicit behavior and explicit business appearance as a result of business usage and customer demographics, this paper proposes the field of Behavior Informatics and Analytics (BIA), to support explicit behavior involvement through a conversion from transactional data to behavioral data, and further genuine analysis of native behavior patterns and impacts. BIA consists of key components including behavior modeling and representation, behavioral data construction, behavior impact modeling, behavior pattern analysis, and behavior presentation. BIA can greatly complement the existing means for combined, more informative and social patterns and solutions for critical problem-solving in areas such as dealing with customer-officer interaction, counterterrorism and monitoring online communities.

Cao, L. 2008, 'Domain Driven Data Mining (D3M)', Proceedings of the 2008 IEEE International Conference on Data Mining Workshops, IEEE International Conference on Data Mining, IEEE Computer Society, Pisa, Italy, pp. 74-76.

In deploying data mining in real-world business, we have to cater for business scenarios, organizational factors, user preferences and business needs. However, the current data mining algorithms and tools often stop at the delivery of patterns satisfying expected technical interestingness. Business people are not informed about how and what to do to take over the technical deliverables. The gap between academia and business has seriously affected the widespread employment of advanced data mining techniques in greatly promoting enterprise operational quality and productivity. To narrow the gap, cater for real-world factors relevant to data mining, and make data mining workable in supporting decision-making actions in the real world, we propose the methodology of Domain Driven Data Mining (D3M for short). D3M aims to construct next-generation methodologies, techniques and tools for a possible paradigm shift from data-centered hidden pattern mining to domain-driven actionable knowledge delivery. In this talk, we address the concept map of D3M, theoretical underpinnings, several general and flexible frameworks, research issues, possible directions, application areas etc. related to D3M. Real-world case studies in financial data mining and social security mining are demonstrated to show the effectiveness and applicability of D3M in both research and development of real-world challenging problems.

Cao, L. 2008, 'Metasynthetic Computing for Solving Open Complex Problems', Proceedings of the 2008 32nd Annual IEEE International Computer Software and Applications Conference, International Computer Software and Applications Conference, IEEE Computer Society, Turku, Finland, pp. 896-901.

Complex systems, in particular open complex giant systems, have become one of the major challenges to many current disciplines such as system sciences, cognitive sciences, intelligence sciences, computer sciences, and information sciences. An appropriate methodology for dealing with them is the theory of qualitative-to-quantitative metasynthesis. From the perspective of engineering, we propose the concept of metasynthetic computing. This paper discusses the theoretical framework, problem-solving process and intelligence emergence of metasynthetic computing from both engineering and cognition perspectives. These efforts can help one understand complex systems and design effective problem-solving systems.

Cao, L., Luo, C. & Zhang, C. 2008, 'Developing actionable trading strategies for trading agents', Proceedings of the IEEE/WIC/ACM International Conference on Intelligent Agent Technology, IAT 2007, pp. 72-75.

Trading agents are very useful for developing and back-testing quality trading strategies for taking actions in the real world. However, the existing trading agent research mainly focuses on simulation using artificial data and market models. As a result, the actionable capability of developed trading strategies is often limited. In this paper, we analyze such constraints on developing actionable trading strategies for trading agents. These considerations are applied to developing a series of trading strategies for trading agents by optimizing and enhancing actionable trading strategies. We demonstrate working case studies on large-scale market data. These approaches and their performance are evaluated from both technical and business perspectives. © 2007 IEEE.

Cao, L., Luo, D., Xiao, Y. & Zheng, Z. 2008, 'Agent Collaboration for Multiple Trading Strategy Integration', Lecture Notes in Artificial Intelligence Vol 4953: Agent and Multi-Agent Systems: Technologies and Applications, International KES Symposium on Agents and Multiagent systems - Technologies and Applications, Springer Berlin, Incheon, Korea, pp. 361-370.

The collaboration of agents can undertake complicated tasks that cannot be handled well by a single agent. This is especially true when executing multiple goals at the same time. In this paper, we demonstrate the use of trading agent collaboration in integrating multiple trading strategies. Trading agents are used for developing quality trading strategies to support smart actions in the market. Evolutionary trading agents are armed with evolutionary computing capability to optimize strategy parameters. To develop even smarter trading strategies (which we call golden strategies), multiple Evolutionary and Collaborative trading agents negotiate with each other for m loops to search multiple local strategies with the best parameter combinations. They also integrate multiple classes of strategies for trading agents to achieve the best global objectives acceptable for trader needs. Tests of five classes of trading strategies on ten years of data from five markets have shown that agent collaboration for strategy integration can achieve much better trading performance than either individually optimized or randomly chosen strategies.

Liu, B., Cao, L., Yu, P. & Zhang, C. 2008, 'Multi-Space-Mapped SVMs for Multi-Class Classification', 2008 Eighth IEEE International Conference on Data Mining, IEEE International Conference on Data Mining, IEEE, Pisa, Italy, pp. 911-916.

In SVMs-based multiple classification, it is not always possible to find an appropriate kernel function to map all the classes from different distribution functions into a feature space where they are linearly separable from each other. This is even worse if the number of classes is very large. As a result, the classification accuracy is not as good as expected. In order to improve the performance of SVMs-based multi-classifiers, this paper proposes a method, named multi-space-mapped SVMs, to map the classes into different feature spaces and then classify them. The proposed method reduces the requirements for the kernel function. Substantial experiments have been conducted on One-against-All, One-against-One, FSVM, DDAG algorithms and our algorithm using six UCI data sets. The statistical results show that the proposed method has a higher probability of finding appropriate kernel functions than traditional methods and outperforms others.

Luo, C., Zhao, Y., Cao, L., Ou, Y. & Liu, L. 2008, 'Outlier Mining on Multiple Time Series Data in Stock Market', PRICAI 2008: Trends in Artificial Intelligence, Pacific Rim International Conference on Artificial Intelligence, Springer, Hanoi, Vietnam, pp. 1010-1015.

In the stock market, the key surveillance function is identifying market anomalies, such as insider trading and market manipulation, to provide a fair and efficient trading platform [2,6]. Insider trading refers to trades on privileged information unavailable to the public [8]. Market manipulation refers to a trade or action which aims to interfere with the demand or supply of a given stock to make the price increase or decrease in a particular way [3]. Recently, new intelligent technologies have been required to deal with the challenges of the rapid increase of stock data. Outlier mining technologies have been used to detect market manipulation and insider trading. The objective of outlier mining is to find the data objects which are grossly different from or inconsistent with the majority of data. However, in stock market data, outliers are highly intermixed with normal data [4] and it is difficult to judge whether an object is an outlier or not. Therefore, a more effective and more efficient approach is in demand. This paper presents a new technique for outlier detection on multiple time series data in the stock market. First, a principal curve algorithm is used to detect outliers from individual measurements of the stock market. Then, the generated outliers are measured with the probability of being real alerts. To improve the accuracy and precision, these outliers are combined by some rules associated with the domain knowledge. The experimental results on real stock market data show that the proposed model is feasible in practice and achieves a higher accuracy and precision than traditional methods.

Luo, C., Zhao, Y., Cao, L., Ou, Y. & Zhang, C. 2008, 'Exception Mining on Multiple Time Series in Stock Market', 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, Springer, Sydney, Australia, pp. 690-693.

This paper presents our research on exception mining on multiple time series data, which aims to assist stock market surveillance by identifying market anomalies. Traditional technologies for stock market surveillance have shown their limitations in handling large amounts of complicated stock market data. In our research, Outlier Mining on Multiple time series (OMM) is proposed to improve the effectiveness of exception detection for stock market surveillance. The idea of our research is presented, challenges in the research are analyzed, and potential research directions are summarized.

Moemeng, C., Cao, L. & Zhang, C. 2008, 'F-TRADE 3.0: An Agent-Based Integrated Framework for Data Mining Experiments', 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Workshops, IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, IEEE Computer Society, University of Technology, Sydney, Australia, pp. 612-615.

Data mining research focuses on algorithms that mine valuable patterns from a particular domain. Apart from the theoretical research, experiments take a vast amount of effort to build. In this paper, we propose an integrated framework that utilises a multi-agent system to help researchers rapidly develop experiments. Moreover, the proposed framework allows extension and integration for future research in mutual aspects of agents and data mining. The paper describes the details of the framework and also presents a sample implementation.

Ou, Y., Cao, L., Luo, C. & Liu, L. 2008, 'Mining Exceptional Activity Patterns in Microstructure Data', 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, IEEE Computer Society, University of Technology, Sydney, Australia, pp. 884-887.

Market surveillance plays an important role in maintaining market integrity, transparency and fairness. The existing trading pattern analysis only focuses on interday data, which discloses explicit and high-level market dynamics. Meanwhile, the existing market surveillance systems are facing challenges of misuse, mis-disclosure and misdealing of information, announcements and orders in one market or across multiple markets. Therefore, there is a crucial need to develop workable methods for smart surveillance. To deal with such issues, we propose an innovative methodology: microstructure activity pattern analysis. Based on this methodology, a case study in identifying exceptional microstructure activity patterns is carried out. The experiments on real-life stock data show that microstructure activity pattern analysis opens a new and effective means of understanding and analysing market dynamics. The resulting findings, such as exceptional microstructure activity patterns, can greatly enhance the learning, detection, adaptation and decision-making capability of market surveillance.

Ou, Y., Cao, L., Luo, C. & Zhang, C. 2008, 'Domain-Driven Local Exceptional Pattern Mining for Detecting Stock Price Manipulation', Lecture Notes in Computer Science Vol 5351: PRICAI 2008: Trends in Artificial Intelligence, Pacific Rim International Conference on Artificial Intelligence, Springer, Hanoi, Vietnam, pp. 849-858.

Recently, a new data mining methodology, Domain Driven Data Mining (D3M), has been developed. On top of data-centered pattern mining, D3M generally targets actionable knowledge discovery under domain-specific circumstances. It strongly appreciates the involvement of domain intelligence in the whole process of data mining, and consequently leads to deliverables that can satisfy business user needs and decision-making. Following the methodology of D3M, this paper investigates local exceptional patterns in real-life microstructure stock data for detecting stock price manipulations. Different from existing pattern analysis, which mainly uses interday data, we deal with tick-by-tick data. Our approach proposes new mechanisms for constructing microstructure order sequences by involving domain factors and business logic, and for measuring the interestingness of patterns from a business-concern perspective. Experiments on real-life exchange data demonstrate that the outcomes generated by following D3M can satisfy business expectations and support business users in taking actions for market surveillance.

Qiu, X., Jiang, S., Liu, H., Huang, Q. & Cao, L. 2008, 'Spatial-temporal attention analysis for home video', 2008 IEEE International Conference on Multimedia & Expo, IEEE International Conference on Multimedia and Expo, IEEE, Hannover Congress Centrum, Hannover, Germany, pp. 1517-1520.

In this paper, by considering the multiple spatial-temporal characteristic of visual perception system, we propose a novel home video attention analysis method. Firstly, each frame of the video is segmented into regions which are more informative than pixels and image blocks. Then the saliency of each region is analyzed by combining static, motion and location attentions. Finally a region based saliency map is generated for each frame, and an attention score curve is obtained for the video clip by combining attention scores of all regions in each frame. Both of them can be utilized in wide applications. This method takes advantage of the properties of human visual perception and can well present the attention information of home videos. Experimental results show the effectiveness of this approach.

Xiao, Y., Liu, B., Luo, D. & Cao, L. 2008, 'Multi-agent system for custom relationship management with SVMs tool', 2nd KES International Symposium on Agent and Multi-Agent Systems: Technologies and Applications, KES-AMSTA 2008, International KES Symposium on Agents and Multiagent systems - Technologies and Applications, Springer Berlin / Heidelb, Incheon, pp. 333-340.

Distributed data mining in CRM aims to learn available knowledge from customer relationships so as to guide strategic behavior. To address CRM with distributed data mining, this paper proposes an architecture of distributed data mining for CRM, and then utilizes a support vector machine tool to separate the customers into several classes and manage them. Finally, practical experiments on one Chinese company are conducted to show the good performance of the proposed approach. © 2008 Springer-Verlag Berlin Heidelberg.

Zhang, H., Zhao, Y., Cao, L. & Zhang, C. 2008, 'Combined Association Rule Mining', Lecture Notes in Artificial Intelligence Vol 5012: Advances in Knowledge Discovery and Data Mining, Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer, Osaka, Japan, pp. 1069-1074.

This paper proposes an algorithm to discover a novel kind of association rule, the combined association rule. Compared with conventional association rules, combined association rules allow users to perform actions directly. Combined association rules are always organized as rule sets, each of which is composed of a number of single combined association rules. These single rules consist of non-actionable attributes, actionable attributes, and a class attribute, with the rules in one set sharing the same non-actionable attributes. Thus, for a group of objects having the same non-actionable attributes, the actions corresponding to a preferred class can be performed directly. However, standard association rule mining algorithms encounter many difficulties when applied to combined association rule mining, and hence new algorithms have to be developed for combined association rule mining. In this paper, we focus on rule generation and interestingness measures in combined association rule mining. In rule generation, the frequent itemsets are discovered among itemset groups to improve efficiency. New interestingness measures are defined to discover more actionable knowledge. In the case study, the proposed algorithm is applied in the field of social security. The combined association rules provide much greater actionable knowledge to business owners and users.

Zhao, Y., Zhang, H., Cao, L., Zhang, C. & Bohlscheid, H. 2008, 'Combined Pattern Mining: from Learned Rules to Actionable Knowledge', AI 2008: Advances in Artificial Intelligence: Lecture Notes in Artificial Intelligence 5360, Australasian Joint Conference on Artificial Intelligence, Springer, Auckland, New Zealand, pp. 393-403.

Association mining often produces large collections of association rules that are difficult to understand and put into action. In this paper, we have designed a novel notion of combined patterns to extract useful and actionable knowledge from a large amount of learned rules. We also present definitions of combined patterns, design novel metrics to measure their interestingness and analyze the redundancy in combined patterns. Experimental results on real-life social security data demonstrate the effectiveness and potential of the proposed approach in extracting actionable knowledge from complex data.

Zhao, Y., Zhang, H., Cao, L., Zhang, C. & Bohlscheid, H. 2008, 'Efficient Mining of Event-Oriented Negative Sequential Rules', 2008 IEEE/WIC/ACM International Conference on Web Intelligence, IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, IEEE Computer Society, University of Technology, Sydney, Australia, pp. 336-342.

Traditional sequential pattern mining deals with positive sequential patterns only, that is, only frequent sequential patterns with the appearance of items are discovered. However, it is often interesting in many applications to find frequent sequential patterns with the non-occurrence of some items, which are referred to as negative sequential patterns. This paper analyzes three types of negative sequential rules and presents a new technique to find event-oriented negative sequential rules. Its effectiveness and efficiency are shown in our experiments.

Chapters

Cao, L., Zhang, C., Luo, D. & Dai, R. 2007, 'Intelligence Metasynthesis in Building Business Intelligence Systems' in Carbonell, J.G., Siekmann, J., Zhong, N., Liu, J., Yao, Y., Wu, J., Lu, S. & Li, K. (eds), Lecture Notes in Artificial Intelligence - Lecture Notes in Computer Science (Book Series), Springer, Germany, pp. 454-470.

In our previous work, we have analyzed the shortcomings of existing business intelligence (BI) theory and its actionable capability. One of the works we have presented is the ontology-based integration of business, data warehousing and data mining. This approach may make existing BI systems as user-friendly and business-friendly as expected. However, it is challenging to tackle these issues and construct actionable and business-friendly systems by simply improving the existing BI framework. Therefore, in this paper, we further propose a new framework for constructing next-generation BI systems. That is intelligence metasynthesis, namely that next-generation BI systems should to some extent synthesize four types of intelligence: data intelligence, domain intelligence, human intelligence and network/web intelligence. The theory guiding intelligence metasynthesis is metasynthetic engineering. To this end, an appropriate intelligence integration framework is substantially important. We first address the roles of each type of intelligence in developing next-generation BI systems. Further, implementation issues are addressed by discussing key components for synthesizing the intelligence. The proposed framework is based on our real-world experience and practice in designing and implementing BI systems. It also greatly benefits from multi-disciplinary knowledge dialog with fields such as complex intelligent systems and cognitive sciences. The proposed theoretical framework has the potential to deal with key challenges in the existing BI framework and systems.

Journal articles

Cao, L. 2007, 'Domain-driven Data Mining: A Framework', IEEE Intelligent Systems, vol. 22, no. 4, pp. 78-79.

Data mining increasingly faces complex challenges in the real-life world of business problems and needs. The gap between business expectations and R&D results in this area involves key aspects of the field, such as methodologies, targeted problems, pattern interestingness, and infrastructure support. Both researchers and practitioners are realizing the importance of domain knowledge to close this gap and develop actionable knowledge for real user needs.

Cao, L. & Zhang, C. 2007, 'The Evolution of KDD: Towards Domain-Driven Data Mining', International Journal of Pattern Recognition and Artificial Intelligence, vol. 21, no. 4, pp. 677-692.

Traditionally, data mining is an autonomous data-driven trial-and-error process. Its typical task is to let data tell a story disclosing hidden information, in which domain intelligence may not be necessary in targeting the demonstration of an algorithm. Often knowledge discovered is not generally interesting to business needs. Comparably, real-world applications rely on knowledge for taking effective actions. In retrospect of the evolution of KDD, this paper briefly introduces domain-driven data mining to complement traditional KDD. Domain intelligence is highlighted towards actionable knowledge discovery, which involves aspects such as domain knowledge, people, environment and evaluation. We illustrate it through mining activity patterns in social security data.

Cao, L., Luo, D. & Zhang, C. 2007, 'Knowledge actionability: satisfying technical and business interestingness', International Journal of Business Intelligence and Data Mining, vol. 2, no. 4, pp. 496-514.

Traditionally, knowledge actionability has been investigated mainly by developing and improving technical interestingness. Recently, initial work on technical subjective interestingness and business-oriented profit mining presents general potential, while it is a long-term mission to bridge the gap between technical significance and business expectation. In this paper, we propose a two-way significance framework for measuring knowledge actionability, which highlights both technical interestingness and domain-specific expectations. We further develop a fuzzy interestingness aggregation mechanism to generate a ranked final pattern set balancing technical and business interests. Real-life data mining applications show the proposed knowledge actionability framework can complement technical interestingness while satisfying real user needs.

Cao, L., Zhang, C., Yang, Q., Bell, D., Vlachos, M., Yu, P.S., Taneri, B., Keogh, E., Zhong, N., Ashrafi, M.Z., Taniar, D., Dubossarsky, E. & Graco, W. 2007, 'Domain-driven, actionable knowledge discovery', IEEE Intelligent Systems, vol. 22, no. 4, pp. 78-88.

Researchers are developing domain-driven data mining techniques that target actionable knowledge discovery (KDD) in complex domain problems. The domain-driven technique aims to utilize and mine many aspects of intelligence, such as in-depth data, domain expertise, real-time human involvement, process, environment, and social intelligence. It also metasynthesizes its intelligence sources for actionable knowledge discovery. The method works to expose next-generation methodologies for actionable knowledge discovery, identifying ways in which KDD can better contribute to critical domain problems in theory and practice. It uncovers domain-driven techniques to help KDD strengthen business intelligence in complex enterprise applications. It also reveals applications that effectively deploy domain-driven data mining methods to solve complex practical problems.

Zhang, H., Zhao, Y., Cao, L. & Zhang, C. 2007, 'Class Association Rule Mining with Multiple Imbalanced Attributes', Lecture Notes in Computer Science, vol. 4830, pp. 827-831.

In this paper, we propose a novel framework to deal with data imbalance in class association rule mining. In each class association rule, the right-hand side is a target class while the left-hand side may contain one or more attributes. This framework is focused on multiple imbalanced attributes on the left-hand side. In the proposed framework, the rules with and without imbalanced attributes are processed in parallel. The rules without imbalanced attributes are mined through a standard algorithm, while the rules with imbalanced attributes are mined based on newly defined measurements. Through a simple transformation, these measurements can be placed in a uniform space so that only a few parameters need to be specified by the user. In the case study, the proposed algorithm is applied to the social security field. Although some attributes are severely imbalanced, the rules with the minority of the imbalanced attributes have been mined efficiently.

Conferences

Cao, L. 2007, 'Multi-strategy Integration for Actionable Trading Agents', Workshop on Agents & Data Mining Interaction (ADMI 2007), International Workshop on Agents and Data Mining Interaction, IEEE Computer Soc, San Jose, USA, pp. 487-490.

Trading agents are very useful for developing and back-testing quality trading strategies to support smart trading actions in the market. However, the existing trading agent research mainly focuses on simple and simulated strategies. As a result, there exists a big gap between academia and business when the developed trading agents are deployed in real life. Therefore, the actionable capability of developed trading agents is often very limited. In this paper, we introduce approaches for optimizing and integrating multiple classes of strategies for trading agents. Five categories of trading strategies, including 36 types of trading strategies, are trained and tested. A strategy integration and optimization approach is proposed to identify a golden trading strategy in each category, and finally recommend positions associated with these golden strategies to trading agents. Tests in five international markets, each on ten years of data, have shown that the final strategies recommended to trading agents can lead to high benefits at low costs. Concurrent execution of positions recommended by all golden strategies can greatly enhance performance.

Cao, L. & Zhang, C. 2007, 'F-Trade: An Agent-Mining Symbiont for Financial Services', Agent & Data Mining Interaction, International Conference on Autonomous Agents and Multiagent Systems, IFAAMAS, Honolulu, Hawai'i, pp. 1363-1364.

The interaction and integration of agent technology and data mining presents prominent benefits for solving some of the challenging issues in the individual areas. For instance, data mining can enhance agent learning, while agents can benefit data mining with distributed pattern discovery. In this paper, we summarize the main functionalities and features of an agent service and data mining symbiont -- F-Trade. F-Trade is constructed in Java agent service following the theory of open complex agent systems. We demonstrate the roles of agents in building up F-Trade, as well as how agents can support data mining. On the other hand, data mining is used to strengthen agents. F-Trade provides flexible and efficient services of trading evidence back-testing, optimization and discovery, as well as plug and play of algorithms, data and system modules for financial trading and surveillance with online connectivity to huge quantities of global market data.

Cao, L., Luo, C. & Zhang, C. 2007, 'Agent-Mining Interaction: An Emerging Area', Autonomous Intelligent Systems: Multi-Agents and Data Mining, International Workshop Autonomous Intelligent Systems: Agents and Data Mining, Springer, St. Petersburg, Russia, pp. 60-73.

In the past twenty years, agents (we mean autonomous agents and multi-agent systems) and data mining (also knowledge discovery) have emerged separately as two of the most prominent, dynamic and exciting research areas. In recent years, an increasingly remarkable trend in both areas is agent-mining interaction and integration. This is driven not only by researchers' interests, but also by intrinsic challenges and requirements from both sides, as well as benefits and complementarity to both communities through agent-mining interaction. In this paper, we draw a high-level overview of agent-mining interaction from the perspective of an emerging area in the scientific family. To promote it as a newly emergent scientific field, we summarize key driving forces, originality, major research directions and respective topics, and the progression of research groups, publications and activities of agent-mining interaction. Both theoretical and application-oriented aspects are addressed. The above investigation shows that agent-mining interaction is attracting ever-increasing attention from both the agent and data mining communities. Some complicated challenges in either community may be effectively and efficiently tackled through agent-mining interaction. However, as a new open area, there are many issues waiting for research and development from theoretical, technological and practical perspectives. This work is sponsored by Australian Research Council Discovery Grant (DP0773412, LP0775041, DP0667060, DP0449535), and UTS internal grants.

Cao, L., Luo, C. & Zhang, C. 2007, 'Developing Actionable Trading Strategies for Trading Agents', Proceedings of the IEEE/WIC/ACM International Conference on Intelligent Agent Technology, IEEE/WIC/ACM International Conference on Intelligent Agent Technology, IEEE Computer Soc, San Jose, pp. 72-75.

Trading agents are very useful for developing and backtesting quality trading strategies for taking actions in the real world. However, the existing trading agent research mainly focuses on simulation using artificial data and market models. As a result, the actionable capability of developed trading strategies is often limited. In this paper, we analyze such constraints on developing actionable trading strategies for trading agents. These points are deployed in developing a series of trading strategies for trading agents through optimizing and enhancing actionable trading strategies. We demonstrate working case studies on large-scale market data. These approaches and their performance are evaluated from both technical and business perspectives.

Cao, L., Zhao, Y., Figueiredo, F., Ou, Y. & Luo, D. 2007, 'Mining High Impact Exceptional Behavior Patterns', Emerging Technologies in Knowledge Discovery and Data Mining: Revised Selected Papers of PAKDD 2007 International Workshops, Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer, Nanjing, China, pp. 56-63.

In the real world, exceptional behavior can be seen in many situations such as security-oriented fields. Such behavior is rare and dispersed, while some of it may be associated with significant impact on society. A typical example is the events of September 11. The key feature of such rare but significant behavior is its high potential to be linked with some significant impact. Identifying such behavior before it generates impact on the world is very important. In this paper, we develop several types of high impact exceptional behavior patterns. The patterns include frequent behavior patterns which are associated with either positive or negative impact, and frequent behavior patterns that lead to both positive and negative impact. Our experiments in mining debt-associated customer behavior in social-security areas show the above approaches are useful in identifying exceptional behavior to deeply understand customer behavior and streamline business processes.

Luo, D., Cao, L., Ni, J. & Liu, L. 2007, 'Building Agent Service Oriented Multi-Agent Systems', Agent and Multi-Agent Systems: Technologies and Applications, International KES Symposium on Agents and Multiagent systems - Technologies and Applications, Springer, Wroclaw, Poland, pp. 11-20.

An effective agent-based design approach is significant in engineering agent-based systems. Existing design approaches meet with challenges in designing Internet-based open agent systems. The emergence of service-oriented computing (SOC) brings in intrinsic mechanisms for complementing agent-based computing (ABC). In this paper, we investigate the dialogue between agent and service, and between ABC and SOC. As a consequence, we synthesize them and develop a design approach called agent service-oriented design (ASOD). ASOD consists of agent service-based architectural design and detailed design. ASOD expands the content and range of agent and ABC, and synthesizes the qualities of SOC such as interoperability and openness, and the performances of ABC like flexibility and autonomy. The above techniques have been deployed in developing an online trading and mining support infrastructure, F-Trade.

Ou, Y., Cao, L., Yu, T. & Zhang, C. 2007, 'Detecting Turning Points of Trading Price and Return Volatility for Market', Workshop on Agents & Data Mining Interaction (ADMI 2007), International Workshop on Agents and Data Mining Interaction, IEEE Computer Soc, San Jose, pp. 491-494.

The trading agent concept is very useful for trading strategy design and market mechanism design. In this paper, we introduce the use of trading agents for market surveillance. Market surveillance agents can be developed for market surveillance officers and management teams to present them with alerts and indicators of abnormal market movements. In particular, we investigate strategies for market surveillance agents to detect the impact of company announcements on market movements. This paper examines the performance of segmentation on the time series of trading price and return volatility, respectively. The purpose of segmentation is to detect the turning points of market movements caused by announcements, which are useful for identifying indicators of insider trading. The experimental results indicate that segmentation on the time series of return volatility outperforms that on the time series of trading price. It is easier to detect the turning points of return volatility than the turning points of trading price. The results will be used to code market surveillance agents so that they can monitor abnormal market movements before the disclosure of market-sensitive announcements. In this way, the market surveillance agents can assist market surveillance officers with indicators and alerts.

Ou, Y., Cao, L., Yu, T. & Zhang, C. 2007, 'Detecting turning points of trading price and return volatility for market surveillance agents', Proceedings - 2007 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Workshops, WI-IAT Workshops 2007, pp. 491-494.

Zhao, Y., Zhang, H., Figueiredo, F., Cao, L. & Zhang, C. 2007, 'Mining for combined association rules on multiple datasets', Proceedings of the 2007 international workshop on Domain driven data mining, International Workshop on Domain Driven Data Mining, ACM, San Jose, USA, pp. 18-23.

Many organisations have their digital information stored in a distributed systems structure, be it in different locations or in vertically and horizontally distributed repositories, which brings a high level of complexity to data mining. From a classical data mining view, where the algorithms expect a denormalised structure to operate on, heterogeneous data sources, such as static demographic and dynamic transactional data, have to be manipulated and integrated to the extent that commercial association rule algorithms can be applied. Bearing in mind the usefulness and understandability of the application from a business perspective, combined rules of multiple patterns derived from different repositories, containing historical and point-in-time data, were used to produce new techniques in association mining applied to debt recovery. Initially debt repayment patterns were discovered using transactional data and class labels defined by domain expertise, then demographic patterns were attached to each of the class labels. After combining the patterns, two types of rules were discovered leading to different results: 1) same demographic pattern with different repayment patterns, and 2) same repayment pattern with different demographic patterns. The rules produced are interesting, valuable, complete and understandable, which shows the applicability and effectiveness of the new method.

Journal articles

Cao, L. & Zhang, C. 2006, 'Domain-driven data mining: A practical methodology', International Journal of Data Warehousing and Mining, vol. 2, no. 4, pp. 49-65.

Extant data mining is based on data-driven methodologies. It either views data mining as an autonomous data-driven, trial-and-error process or only analyzes business issues in an isolated, case-by-case manner. As a result, very often the knowledge discovered generally is not interesting to real business needs. Therefore, this article proposes a practical data mining methodology referred to as domain-driven data mining, which targets actionable knowledge discovery in a constrained environment for satisfying user preference. The domain-driven data mining consists of a DDID-PD framework that considers key components such as constraint-based context, integrating domain knowledge, human-machine cooperation, in-depth mining, actionability enhancement, and iterative refinement process. We also illustrate some examples in mining actionable correlations in Australian Stock Exchange, which show that domain-driven data mining has potential to improve further the actionability of patterns for practical use by industry and business.

Cao, L., Zhang, C. & Liu, J. 2006, 'Ontology-Based Integration Of Business Intelligence', Web Intelligence and Agent Systems: An International Journal, vol. 4, no. 3, pp. 313-325.

The integration of Business Intelligence (BI) has been taken by business decision-makers as an effective means to enhance enterprise "soft power" and added value in the reconstruction and revolution of traditional industries. The existing solutions based on structural integration are to pack together data warehouse (DW), OLAP, data mining (DM) and reporting systems from different vendors. BI system users are finally delivered a reporting system in which reports, data models, dimensions and measures are predefined by system designers. As a result of a survey in the US, 85% of DW projects based on the above solutions failed to meet their intended objectives. In this paper, we summarize our investigation on the integration of BI on the basis of semantic integration and structural interaction. Ontology-based integration of BI is discussed for semantic interoperability in integrating DW, OLAP and DM. A hybrid ontological structure is introduced which includes conceptual view, analytical view and physical view. These views are matched with user interfaces, DW and enterprise information systems, respectively. Relevant ontological engineering techniques are developed for ontology namespace, semantic relationships, and ontological transformation, mapping and query in this ontological space. The approach is promising for business-oriented, adaptive and automatic integration of BI in the real world. Operational decision making experiments within a telecom company have demonstrated that a BI system utilizing the proposed approach is more flexible.

Conferences

Cao, L. 2006, 'Activity mining: Challenges and prospects', Advanced Data Mining And Applications, Proceedings, Lecture Notes in Artificial Intelligence, International Conference on Advanced Data Mining and Applications, Springer-Verlag Berlin, Xi'an, China, pp. 582-593.

Activity data accumulated in real life, e.g. in terrorist activities and fraudulent customer contacts, presents special structural and semantic complexities. However, it may lead to or be associated with significant business impacts. For instance, a seri

Cao, L. & Zhang, C. 2006, 'Domain-driven actionable knowledge discovery in the real world', Advances In Knowledge Discovery And Data Mining, Proceedings, Lecture Notes in Artificial Intelligence, Pacific-Asia Conference on Knowledge Discovery and Data Mining, Springer-Verlag Berlin, Singapore, pp. 821-830.

Actionable knowledge discovery is one of Grand Challenges in KDD. To this end, many methodologies have been developed. However, they either view data mining as an autonomous data-driven trial-and-error process, or only analyze the issues in an isolated a

Cao, L., Luo, C., Ni, J., Luo, D. & Zhang, C. 2006, 'Stock data mining through fuzzy genetic algorithm', Proceedings of the 9th Joint Conference on Information Sciences, JCIS 2006.

Stock data mining such as financial pairs mining is useful for trading supports and market surveillance. Financial pairs mining targets mining pair relationships between financial entities such as stocks and markets. This paper introduces a fuzzy genetic algorithm framework and strategies for discovering pair relationship in stock data such as in high dimensional trading data by considering user preference. The developed techniques have a potential to mine pairs between stocks, between stock-trading rules, and between markets. Experiments in real stock data show that the proposed approach is useful for mining pairs helpful for real trading decision-support and market surveillance.

Cao, L., Luo, D. & Zhang, C. 2006, 'Fuzzy genetic algorithms for pairs mining', PRICAI 2006: Trends In Artificial Intelligence, Proceedings, Lecture Notes in Artificial Intelligence, Pacific Rim International Conference on Artificial Intelligence, Springer-Verlag Berlin, Guilin, China, pp. 711-720.

Pairs mining targets to mine pairs relationship between entities such as between stocks and markets in financial data mining. It has emerged as a kind of promising data mining applications. Due to practical complexities in the real-world pairs mining suc

Cao, L., Ni, J. & Luo, D. 2006, 'Ontological engineering in data warehousing', Frontiers Of WWW Research And Development - Apweb 2006, Proceedings - Lecture Notes in Computer Science, Asia Pacific Web Conference, Springer-Verlag Berlin, Harbin, China, pp. 923-929.

In our previous work, we proposed the ontology-based integration of data warehousing to make existing data warehouse system more user-friendly, adaptive and automatic. This paper further outlines a high-level picture of the ontological engineering in dat

Ni, J., Cao, L. & Zhang, C. 2006, 'Agent services-oriented architectural design of a framework for artificial stock markets', Advances in Intelligent IT: Active Media Technology 2006, International Conference on Active Media Technology, IOS Press, Brisbane, Australia, pp. 396-399.

Zhang, C. & Cao, L. 2006, 'Domain-driven mining: Methodologies and applications', Advances in Intelligent IT: Active Media Technology 2006, International Conference on Active Media, IOS Press, Brisbane, Australia, pp. 13-16.

Zhao, Y., Cao, L., Morrow, Y.K., Ou, Y., Ni, J. & Zhang, C. 2006, 'Discovering debtor patterns of Centrelink customers', Data mining 2006; Proceedings of AusDM 2006, Australian Data Mining Conference, ACS Inc, Sydney, Australia, pp. 135-144.

Journal articles

Cao, L., Zhang, C. & Dai, R. 2005, 'Organization-Oriented Analysis of Open Complex Agent Systems', International Journal of Intelligent Control and Systems, vol. 10, no. 2, pp. 114-122.

Organization-oriented analysis acts as the key step and foundation in building organization-oriented methodology (OOM) to engineer multi-agent systems, especially open complex agent systems (OCAS). A number of existing approaches target OOM, while they are incompatible with each other, and none of them is available as a solid and practical tool for engineering OCAS. This paper summarizes our investigation in building a unified framework for abstracting and analyzing OCAS organizations. Our organization-oriented framework, referred to as ORGANISED, integrating and expanding existing approaches, explicitly captures the main attributes in an OCAS. Following this framework, individual model-building blocks are developed for all ORGANISED members; both visual and formal specifications are utilized to present an intuitive and precise analysis. The above techniques have been deployed in developing an agent service-based trading and mining support infrastructure.

Cao, L., Zhang, C. & Dai, R. 2005, 'The OSOAD Methodology for Open Complex Agent Systems', International Journal of Intelligent Control and Systems, vol. 10, no. 4, pp. 277-285.

Open complex agent systems (OCAS) are middle-size or large-scale open agent organizations. To engineer OCAS, agent-centric organization-oriented analysis, design and implementation, namely organization-oriented methodology (OOM), has emerged as a highly promising direction. A number of OOM-related approaches have been proposed, but there are some intrinsic issues hidden in them. For instance, some fundamental system attributes, such as system dynamics, are not covered by almost any of the existing approaches. In this paper, we summarize our investigation of existing approaches, and report a new OOM approach called OSOAD. The OSOAD approach consists of organizational abstraction (OA), organization-oriented analysis (OOA), agent service-oriented design (ASOD), and Java agent service-based implementation. OSOAD provides complete and deployable mechanisms for all software engineering phases. Specifically, we notice the transition supports from OA to OOA and ASOD. This approach has been built and deployed with the practical development of agent service-based financial trading and mining applications.

Luo, D., Liu, W., Luo, C., Cao, L. & Dai, R.W. 2005, 'Hybrid Analyses and System Architecture for Telecom Frauds', Jisuanji Kexue (Computer Science), vol. 32, no. 5, pp. 17-22.

Zhang, C., Zhang, Z. & Cao, L. 2005, 'Agents and data mining: Mutual enhancement by integration', Lecture Notes In Computer Science, vol. 3505, pp. 50-61.

This paper tells a story of synergism of two cutting edge technologies - agents and data mining. By integrating these two technologies, the power for each of them is enhanced. Integrating agents into data mining systems, or constructing data mining syste

Conferences

Cao, L., Schurmann, R. & Zhang, C. 2005, 'Domain-Driven In-Depth pattern Discovery: A Practical Methodology', Proceedings 4th Australasian Data Mining Conference AusDM05, Australian Data Mining Conference, The University of Technology, Sydney, Sydney, Australia, pp. 101-114.

Cao, L., Zhang, C. & Ni, J. 2005, 'Agent services-oriented architectural design of open complex agent systems', Proceedings of 2005 IEEE/WIC/ACM International Conference On Intelligent Agent Technology, IEEE/WIC/ACM International Conference on Intelligent Agent Technology, IEEE, Compiegne, France, pp. 120-123.

Architectural design is a critical phase in building agent-based systems. However, most existing agent-oriented software engineering approaches deliver weak or incomplete support for the architectural design of distributed and especially Internet-based agent systems. On the other hand, the emergence of service-oriented computing (SOC) brings in intrinsic mechanisms for complementing agent-based computing (ABC). In this paper, we investigate the dialogue between ABC and SOC and their integration in implementing architectural design. We synthesize them and develop the computational concept of agent service, and build a new design approach called agent service-oriented architectural design (ASOAD). ASOAD expands the contents and ranges of agent and ABC, and synthesizes the qualities of SOC such as interoperability and openness with the performances of ABC like flexibility and autonomy. It is suitable for designing distributed agent systems and agent service-based enterprise application integration.

Cao, L., Zhang, C., Luo, D., Chen, W. & Zamani, N. 2005, 'Integrative early requirements analysis for agent-based systems', Proceedings - HIS'04: 4th International Conference on Hybrid Intelligent Systems, pp. 118-123.

Early requirements analysis (ERA) is quite significant for building agent-based systems. Goal-oriented requirements analysis is promising for agent-oriented early requirements analysis. In general, either visual modeling or formal specification is used for the ERA. Used alone, neither can capture requirements precisely and completely. In this paper, we present an integrative modeling framework for agent-oriented early requirements analysis; this framework implements goal-oriented requirements analysis. The integrative modeling combines visual modeling and formal modeling. An extended i* framework is used for building visual models; formal specifications complement the visual modeling to define and refine requirements. Both visual and formal models are outlined through a practical agent-based system, F-TRADE. The integrative modeling seems to model early requirements comprehensively and concretely, and to benefit refinement and conflict management in building agent systems. © 2005 IEEE.

Cao, L., Zhang, C., Luo, D., Chen, W. & Zamari, N. 2004, 'Integrative Early Requirements Analysis for Agent-Based Software', Fourth International Conference on Hybrid Intelligent Systems HIS-2004, International Conference on Hybrid Intelligent Systems, IEEE Computer Society Press, Kitakyushu, Japan, pp. 1-6.

Early requirements analysis (ERA) is quite significant for building agent-based systems. Goal-oriented requirements analysis is promising for the agent-oriented early requirements analysis. In general, either visual modeling or formal specifications is u

Lin, L., Cao, L. & Zhang, C. 2005, 'Genetic algorithms for robust optimization in financial applications', Proceedings of the IASTED International Conference on Computational Intelligence, IASTED International Conference on Computational Intelligence, ACTA Press, Calgary, Canada, pp. 387-391.

In stock market or other financial market systems, the technical trading rules are used widely to generate buy and sell alert signals. In each rule, there are many parameters. The users often want to get the best signal series from the in-sample sets, (H

Lin, L., Cao, L. & Zhang, C. 2005, 'The Fish-Eye Visualization of foreign Currency Exchange Data Streams', Asia-Pacific Symposium on Information Visualisation 2005, Asia-Pacific Symposium on Information Visualisation, ACS, Sydney, Australia, pp. 91-96.

In a foreign currency exchange market, there are high-density data streams. Present approaches to visualizing this type of data cannot show both targeted local details and global trend information in a single figure. In this paper, on the basis of the features and attributes of foreign currency exchange trading streams, we discuss and compare multiple approaches for the visual display of high-density foreign currency exchange transactions, including interactive zooming, multiform sampling combined with attributes of large foreign currency exchange data, and fish-eye view embedded visualization. By comparison, fish-eye-based visualization is the best option: it can display regional records in detail without losing the global movement trend of the market in a limited display window. We used fish-eye technology for output visualization of foreign currency exchange trading strategies in our trading support system, which links to real-time foreign currency market closing data.

Lin, L., Cao, L. & Zhang, C. 2005, 'The Visualization of Large Database in Stock Markets', Proceedings of the IASTED International Conference on Databases and Applications, IASTED International Multi Conference, ACTA Press, Innsbruck, Austria, pp. 163-166.

Cao, L., Luo, C., Luo, D. & Liu, L. 2004, 'Ontology services-based information integration in mining telecom business intelligence', PRICAI 2004: Trends in Artificial Intelligence, Proceedings, Pacific Rim International Conference on Artificial Intelligence, Springer-Verlag Berlin, Auckland, New Zealand, pp. 85-94.

Cao, L., Luo, C., Luo, D. & Zhang, C. 2004, 'Hybrid Strategy of Analysis and Control of Telecommunications Frauds', Proceedings of 2nd International Conference on Information Technology and Applications, International Conference on Information Technology and Applications, IEEE, Harbin, China, pp. 11-15.

Cao, L., Luo, C., Luo, D. & Zhang, C. 2004, 'Hybrid strategy of analysis and control of telecommunications frauds', Proceedings of the Second International Conference on Information Technology and Applications (ICITA 2004), pp. 281-285.

The problem of telecommunications fraud has been getting more and more serious for many years, and is getting worse not only in western countries but also in some developing countries. Detection, analysis and prevention mechanisms are emerging both from telecommunications operators and academia. In this paper, we present a hybrid strategy for the analysis and control of telecommunications fraud from an engineering viewpoint. Our first task is to identify the complexity of telecommunications fraud; we discuss possible fraud scenarios and their evolution. Furthermore, in order to build an information system to deal with realistic telecommunications fraud, we summarize and propose a hybrid strategy, which includes a solution package, five models and four types of analyses, to construct a closed-loop system for the analysis and control of fraud. We further discuss a system framework for the analysis and control of telecommunications fraud.

Cao, L., Luo, C., Luo, D. & Zhang, C. 2004, 'Integration of Business Intelligence Based on Three-Level Ontology Services', IEEE/WIC/ACM International Conference on Web Intelligence (WI2004), IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, IEEE, Beijing, China, pp. 17-23.

Usually, business intelligence (BI) in a realistic telecom enterprise is integrated by packing together data warehouse (DW), OLAP, data mining and reporting products from different vendors. As a result, BI system users are handed a reporting system with reports, data models, dimensions and measures predefined by system designers. According to a survey, 85% of DW projects failed to meet their intended objectives. In this paper, we investigate how to integrate BI packages into an adaptive and flexible knowledge portal by constructing an internal link and communication channel from top-level business concepts to the underlying enterprise information systems (EIS). An approach of three-level ontology services is developed, which implements unified naming, directory and transport of ontology services, and ontology mapping and query parsing among the conceptual view, analytical view and physical view, from user interfaces through the DW to the EIS. Experiments on top of a real telecom EIS show that our solution for integrating BI supports operational decision making more powerfully, adaptively and in a more user-friendly way than simply combining the BI products presently available.

Cao, L., Luo, D., Luo, C. & Liu, L. 2004, 'Ontology Transformation in Multiple Domains', AI 2004: Advances in Artificial Intelligence, 17th Australian Joint Conference on Artificial Intelligence Cairns, Australia, December 2004 Proceedings, Australasian Joint Conference on Artificial Intelligence, Springer, Cairns, Australia, pp. 985-990.

We have proposed a new approach called ontology services-driven integration of business intelligence (BI) to designing an integrated BI platform. In such a BI platform, multiple ontological domains may get involved, such as domains for business, reporting, data warehouse, and multiple underlying enterprise information systems. In general, ontologies in the above multiple domains are heterogeneous. So, a key issue emerges in the process of building an integrated BI platform, that is, how to support ontology transformation and mapping between multiple ontological domains. In this paper, we present semantic aggregations of semantic relationships and ontologies in one or multiple domains, and the ontological transformation from one domain to another. Rules for the above semantic aggregation and transformation are described. This work is the foundation for supporting BI analyses crossing multiple domains.

Cao, L., Ni, J., Wang, J. & Zhang, C. 2004, 'Agent Services-Driven-plug-and-play in F-Trade', AI 2004: Advances in Artificial Intelligence, 17th Australian Joint Conference on Artificial Intelligence Cairns, Australia, December 2004 Proceedings, Australasian Joint Conference on Artificial Intelligence, Springer, Cairns, Australia, pp. 917-922.

We have built an agent service-based enterprise infrastructure: F-TRADE. With its online connectivity to huge real stock data in global markets, it can be used for online evaluation of trading strategies and data mining algorithms. The main functions in the F-TRADE include soft plug-and-play, and back-testing, optimization, integration and evaluation of algorithms. In this paper, we'll focus on introducing the intelligent plug-and-play, which is a key system function in the F-TRADE. The basic idea for the soft plug-and-play is to build agent services which can support the online plug-in of agents, algorithms and data sources. Agent UML-based modeling, role model and agent services for the plug-and-play are discussed. With this design, algorithm providers, data source providers, and system module developers of the F-TRADE can expand system functions and resources by online plugging them into the F-TRADE.

Cao, L., Wang, J., Lin, L. & Zhang, C. 2004, 'Agent Services-Based Infrastructure for Online Assessment of Trading Strategies', Proceedings IEEE/WIC/ACM International Conference on Intelligent Agent Technology (IAT 2004), IEEE/WIC/ACM International Conference on Intelligent Agent Technology, IEEE, Beijing, China, pp. 345-348.

Traders and researchers in stock markets often hold private trading strategies. Evaluating and optimizing these strategies is of great benefit to them before they take any risk in real trading. We build an agent services-driven infrastructure, F-TRADE. It supports online plug-in, iterative back-testing, and recommendation of trading strategies. We propose an agent services-driven approach for building this automated enterprise infrastructure. The description, directory and mediation of agent services are discussed. The system structure of the agent services-based F-TRADE is also discussed. F-TRADE has been an online test platform for research and application of multi-agent technology and data mining in stock markets.

Lin, L., Cao, L., Wang, J. & Zhang, C. 2004, 'The Applications of genetic algorithms in stock market data mining optimisation', Data Mining V, Data Mining, Text Mining and Their Business Application, Conference on Data Mining, Text Mining and Their Business Application, Wessex Institute of Technology Press, Malaga, Spain, pp. 273-280.

Journal articles

Cao, L. & Dai, R. 2003, 'Agent-Oriented Approach for Dealing with Open Giant Intelligent Systems', Moshi Shibie yu Rengong Zhineng - Journal of Pattern Recognition and Artificial Intelligence, vol. 15, no. 3, pp. 75-81.

Cao, L. & Dai, R. 2003, 'Human-Computer Cooperated Intelligent Information System Based on Multi-Agents', Zidonghua Xuebao - Acta Automatica Sinica, vol. 29, no. 1, pp. 86-94.

The Hall for Workshop of Metasynthetic Engineering (HWME) is an engineering technology proposed for coping with open complex giant systems. In this paper we describe the implementation of a human-computer-cooperated intelligent information system with HWME and multiagents. We propose a layered model, a system structure over the network, and a distributed computing model (an n-tier client/agent/server-nested Requester-Mediator-Provider) for building the system. Furthermore, we discuss the framework and working mechanisms of an agent-based system of HWME, which is designed for macroeconomic decision support based on intelligent information agents in Java. Our system implementation shows that an agent-oriented HWME system over the Internet may exhibit better performance in terms of handling open complex problems.

Cao, L. & Dai, R. 2003, 'On Metasynthesis and Decision Making', Jisuanji Yanjiu yu Fazhan - Journal of Computer Research and Development, vol. 40, no. 1, pp. 531-537.

Cao, L. & Dai, R.W. 2003, 'Agent-oriented Metasynthetic Engineering for Decision making', International Journal of Information Technology and Decision Making, vol. 2, no. 2, pp. 197-215.

Cao, L.B. & Dai, R.W. 2003, 'On metasynthesis and decision making', Jisuanji Yanjiu yu Fazhan/Computer Research and Development, vol. 40, no. 4, p. 531.

Dai, R. & Cao, L. 2003, 'Internet: An Open Complex Giant System', Science in China Series E: Technological Sciences, vol. 33, no. 4, pp. 289-296.

Conferences

Cao, L., Li, C., Zhang, C. & Dai, R.W. 2003, 'Open Giant Intelligent Information Systems and Its Multiagent-Oriented System Design', Proceedings of the International Conference on Software Engineering Research and Practice Volume II, International Conference on Software Engineering Research and Practice, CSREA Press, Las Vegas, Nevada, USA, pp. 816-822.

Cao, L., Luo, C., Li, C., Zhang, C. & Dai, R.W. 2003, 'Open Giant Intelligent Information Systems and Its Agent-Oriented Abstraction Mechanism', Proceedings of the Fifteenth International Conference on Software Engineering and Knowledge Engineering, International Conference on Software Engineering and Knowledge Engineering, Knowledge Systems Institute, San Francisco, California, USA, pp. 85-89.

Cao, L., Luo, D., Luo, C. & Zhang, C. 2003, 'Systematic Engineering in Designing Architecture of Telecommunications Business Intelligence System', Design and Application of Hybrid Intelligent Systems, HIS03, the Third International Conference on Hybrid Intelligent Systems, International Conference on Hybrid Intelligent Systems, IOS Press, Melbourne, Australia, pp. 1084-1093.

Li, C., Zhang, C. & Cao, L. 2003, 'Theoretical Evaluation of Ring-Based Architectural Model for Middle Agents in Agent-Based System', Foundations of Intelligent Systems. 14th Symposium, ISMIS 2003 Proceedings, International Symposium on Foundations of Intelligent Systems, Springer-Verlag Berlin Heidelberg, Maebashi City, Japan, pp. 603-607.

The ring-based architectural model is usually employed to promote the scalability and robustness of agent-based systems. However, there are no criteria for evaluating the performance of the ring-based architectural model. In this paper, we introduce an evaluation approach for comparing the performance of the ring-based architectural model with that of other models. In order to evaluate the ring-based architectural model, we propose an application-based information-gathering system with middle agents, which are organized in a ring-based architectural model and solve the matching problem between service provider agents and requester agents. We evaluate the ring-based architectural model in terms of performance predictability, adaptability, and availability. We demonstrate the potential of the ring-based architectural model through the results of the evaluation.

Journal articles

Cao, L. & Dai, R. 2002, 'Agent-oriented approach for dealing with open giant intelligent systems', Moshi Shibie yu Rengong Zhineng/Pattern Recognition and Artificial Intelligence, vol. 15, no. 3, p. 257.

Cao, L. & Dai, R. 2002, 'Software Architecture of the Hall for Workshop of Metasynthetic Engineering', Ruanjian Xuebao - Journal of Software, vol. 13, no. 8, pp. 1430-1435.

Cao, L., Nan, J. & Dai, R. 2002, 'Intelligent Mobile Agents for Distributed Information Integration', Xitong Fangzhen Xuebao - Journal of System Simulation, vol. 14, no. 11, pp. 1517-1520.

Cao, L.B. & Dai, R.W. 2001, 'Information system of metasynthetic wisdom-Internet', Moshi Shibie yu Rengong Zhineng/Pattern Recognition and Artificial Intelligence, vol. 14, no. 1, p. 1.

