Victoria University

Evolutionary Computation for Feature Manipulation in Classification on High-dimensional Data

dc.contributor.advisor Zhang, Mengjie
dc.contributor.advisor Xue, Bing
dc.contributor.author Tran, Binh Ngan
dc.date.accessioned 2018-07-10T01:30:12Z
dc.date.available 2018-07-10T01:30:12Z
dc.date.copyright 2018
dc.date.issued 2018
dc.identifier.uri http://hdl.handle.net/10063/7078
dc.description.abstract High-dimensional data is increasingly common in machine learning, especially in classification tasks. With thousands of features, these datasets challenge learning algorithms not only because of the curse of dimensionality but also because of the many irrelevant and redundant features they contain. Therefore, feature selection and feature construction (collectively, feature manipulation) are essential techniques for preprocessing these datasets. While feature selection aims to select relevant features, feature construction builds high-level features from the original ones to better represent the target concept. Both methods can reduce dimensionality and improve the performance of learning algorithms in terms of classification accuracy and computation time. Although feature manipulation has been studied for decades, it remains challenging on high-dimensional data because of the huge search space. Existing methods often stagnate in local optima and/or require long computation times. Evolutionary computation techniques are well known for their global search ability. Particle swarm optimisation (PSO) and genetic programming (GP) have shown promise in feature selection and feature construction, respectively. However, applying these techniques to high-dimensional data usually requires substantial memory and computation time. The overall goal of this thesis is to investigate new approaches to using PSO for feature selection and GP for feature construction on high-dimensional classification problems. This thesis focuses on incorporating a variety of strategies into the evolutionary process and developing new PSO and GP representations to improve the effectiveness and efficiency of PSO and GP for feature manipulation on high-dimensional data. This thesis proposes a new PSO-based feature selection approach for high-dimensional data that incorporates a new local search to balance the global and local search of PSO.
A hybrid wrapper-filter evaluation method, which can be sped up during the local search, is proposed to help PSO achieve better performance, scalability and robustness on high-dimensional data. The results show that the proposed method significantly outperforms the compared methods in 80% of the cases, increasing average accuracy by up to 16% while reducing the number of features by one to two orders of magnitude. This thesis develops the first PSO-based feature selection via discretisation method that performs both multivariate discretisation and feature selection in a single stage, achieving better solutions than applying the two techniques separately in two stages. Two new PSO representations are proposed to evolve cut-points for multiple features simultaneously. The results show that the proposed method selects less than 4.6% of the features in all cases and improves classification performance by 5% to 23% in most cases. This thesis proposes the first clustering-based feature construction method to improve the performance of single-tree GP on high-dimensional data. A new feature clustering method is proposed to automatically group similar features based on a given redundancy level. The results show that, compared with standard GP, the new method selects fewer than half of the features to construct a new high-level feature that achieves significantly better accuracy in most cases. Combining the single constructed feature with the selected features achieves the best performance among the different feature sets created from a single tree. This thesis develops the first class-dependent multiple feature construction method using multi-tree GP for high-dimensional data. A new GP representation and a new filter fitness function combining two filter measures are proposed to evaluate the whole set of constructed features more effectively and efficiently.
The results show that in 83% of the cases, with fewer than 10 constructed features, the class-dependent method increases average accuracy by up to 32% over using all of the original thousands of features, and by up to 10% over using the features constructed by the class-independent method. en_NZ
dc.language.iso en_NZ
dc.publisher Victoria University of Wellington en_NZ
dc.subject Evolutionary Computation en_NZ
dc.subject Feature selection en_NZ
dc.subject Feature construction en_NZ
dc.subject Classification en_NZ
dc.subject High-dimensional data en_NZ
dc.title Evolutionary Computation for Feature Manipulation in Classification on High-dimensional Data en_NZ
dc.type Text en_NZ
vuwschema.contributor.unit School of Engineering and Computer Science en_NZ
vuwschema.type.vuw Awarded Doctoral Thesis en_NZ
thesis.degree.discipline Computer Science en_NZ
thesis.degree.grantor Victoria University of Wellington en_NZ
thesis.degree.level Doctoral en_NZ
thesis.degree.name Doctor of Philosophy en_NZ
vuwschema.subject.anzsrcfor 080108 Neural, Evolutionary and Fuzzy Computation en_NZ
vuwschema.subject.anzsrcfor 080109 Pattern Recognition and Data Mining en_NZ
vuwschema.subject.anzsrcseo 970108 Expanding Knowledge in the Information and Computing Sciences en_NZ
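The abstract states that the new feature clustering method groups similar features according to a given redundancy level, but it does not give the algorithm. As an illustration only, the Python sketch below shows one common greedy scheme under stated assumptions: features are ranked by relevance to the class, and each feature joins the first cluster whose representative it correlates with at or above the redundancy level, otherwise it starts a new cluster. The use of absolute Pearson correlation as both the relevance and redundancy measure, and the names `pearson`, `cluster_features` and `redundancy_level`, are hypothetical choices for this sketch; it is not the thesis's implementation.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def cluster_features(
    columns: Sequence[Sequence[float]],
    labels: Sequence[float],
    redundancy_level: float,
) -> List[List[int]]:
    """Greedily group redundant features (hypothetical sketch, not the thesis's method).

    columns[j] holds the values of feature j across all instances.
    Returns clusters of feature indices; the first index in each cluster is its
    representative, i.e. the most class-relevant feature in that cluster.
    """
    # Relevance of each feature to the class label (assumed: |Pearson correlation|).
    relevance = [abs(pearson(col, labels)) for col in columns]
    # Visit features from most to least relevant so that each cluster's
    # representative is its most relevant member.
    order = sorted(range(len(columns)), key=lambda j: relevance[j], reverse=True)

    clusters: List[List[int]] = []
    for j in order:
        for cluster in clusters:
            # Redundant with this cluster's representative: join the cluster.
            if abs(pearson(columns[j], columns[cluster[0]])) >= redundancy_level:
                cluster.append(j)
                break
        else:
            # Not redundant with any existing representative: start a new cluster.
            clusters.append([j])
    return clusters


if __name__ == "__main__":
    features = [
        [1, 2, 3, 4],  # f0
        [2, 4, 6, 8],  # f1: a rescaled copy of f0, so fully redundant with it
        [1, 0, 1, 0],  # f2: much less correlated with f0 and f1
    ]
    y = [0, 0, 1, 1]
    print(cluster_features(features, y, redundancy_level=0.9))
```

With a redundancy level of 0.9, f0 and f1 fall into the same cluster and f2 forms its own; lowering the level to 0.3 merges all three. Keeping only each cluster's representative is one way such a grouping could shrink the set of features a single GP tree has to search over.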