Victoria University

Feature Manipulation with Genetic Programming

ResearchArchive/Manakin Repository

Show simple item record

dc.contributor.advisor Zhang, Mengjie
dc.contributor.advisor Andreae, Peter
dc.contributor.author Neshatian, Kourosh
dc.date.accessioned 2010-10-05T21:03:10Z en_NZ
dc.date.accessioned 2015-06-22T01:58:19Z
dc.date.available 2010-10-05T21:03:10Z en_NZ
dc.date.available 2015-06-22T01:58:19Z
dc.date.copyright 2010
dc.date.issued 2010
dc.description.abstract Feature manipulation refers to the process by which the input space of a machine learning task is altered in order to improve learning quality and performance. Three major aspects of feature manipulation are feature construction, feature ranking and feature selection. This thesis proposes a new filter-based methodology for feature manipulation in classification problems using genetic programming (GP). The goal is to modify the input representation of classification problems in order to improve classification performance and reduce the complexity of classification models. The thesis regards classification problems as a collection of variables, including conditional variables (input features) and decision variables (target class labels). GP is used to discover the relationships between these variables. The types of relationship, and the ways in which they are discovered, vary across the three aspects of feature manipulation.

In feature construction, the thesis proposes a GP-based method to construct high-level features in the form of functions of the original input features. The functions are evolved by GP using an entropy-based fitness function that maximises the purity of class intervals. Unlike existing algorithms, the proposed GP-based method constructs multiple features and can effectively perform transformational dimensionality reduction, using only a small number of GP-constructed features while preserving good classification performance.

In feature ranking, the thesis proposes two GP-based methods for ranking single features and subsets of features. In single-feature ranking, the proposed method measures the influence of individual features on classification performance by using GP to evolve a collection of weak classification models and then measuring the contribution of input features to the making of good models.
In ranking of subsets of features, a virtual structure for GP trees and a new binary relevance function are proposed to measure the relationship between a subset of features and the target class labels. It is observed that the proposed method can discover complex relationships, such as multi-modal class distributions and multivariate correlations, that cannot be detected by traditional methods.

In feature selection, the thesis provides a novel multi-objective GP-based approach to measuring the goodness of subsets of features. The subsets are evaluated based on their cardinality and their relationship to the target class labels. The selection is performed by choosing a subset of features from a GP-discovered Pareto front containing suboptimal solutions (subsets). The thesis also proposes a novel method for measuring the redundancy between input features, which is used to select a subset of relevant features that do not exhibit redundancy with respect to each other.

It is found that in all three aspects of feature manipulation, the proposed GP-based methodology is effective in discovering relationships between the features of a classification task. In the case of feature construction, the proposed GP-based methods evolve functions of conditional variables that can significantly improve classification performance and reduce the complexity of the learned classifiers. In the case of feature ranking, the proposed GP-based methods can find complex relationships between conditional variables and decision variables; the resulting ranking shows a strong linear correlation with actual classification performance. In the case of feature selection, the proposed GP-based method can find a set of sub-optimal subsets of features which provides a trade-off between the number of features and their relevance to the classification task. The proposed redundancy removal method can remove redundant features from a set of features.
Both proposed feature selection methods can find an optimal subset of features that yields significantly better classification performance with a much smaller number of features than conventional classification methods. en_NZ
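The abstract's feature-construction fitness (an entropy-based measure of the purity of class intervals along a constructed feature's output axis) can be illustrated with a minimal sketch. This is not the thesis's implementation: the interval scheme (a fixed number of equal-frequency bins) and all names are assumptions made here for illustration; a GP system would minimise this score (lower weighted entropy means purer intervals) when evolving candidate feature functions.

```python
# Hypothetical sketch: score a candidate constructed feature by the
# weighted class entropy of contiguous intervals along its output axis.
# Lower is better (purer intervals); a GP fitness would minimise this.
from collections import Counter
from math import log2

def interval_entropy(feature_values, labels, n_intervals=10):
    """Weighted entropy of class labels over equal-frequency intervals."""
    n = len(feature_values)
    # Sort instances by the constructed feature's output value.
    order = sorted(range(n), key=lambda i: feature_values[i])
    total = 0.0
    for b in range(n_intervals):
        # Equal-frequency slice of the sorted instances.
        idx = order[b * n // n_intervals:(b + 1) * n // n_intervals]
        if not idx:
            continue
        counts = Counter(labels[i] for i in idx)
        m = len(idx)
        # Shannon entropy of the class distribution inside this interval.
        h = -sum((c / m) * log2(c / m) for c in counts.values())
        total += (m / n) * h  # weight by interval size
    return total
```

For example, a constructed feature that cleanly separates the classes along its axis scores 0.0, while one that interleaves them scores higher, so evolution is pushed toward features whose intervals are class-pure.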
dc.language.iso en_NZ en_NZ
dc.publisher Victoria University of Wellington en_NZ
dc.subject Machine learning en_NZ
dc.subject Evolutionary algorithms en_NZ
dc.subject Feature selection en_NZ
dc.subject Feature construction en_NZ
dc.title Feature Manipulation with Genetic Programming en_NZ
dc.type Text en_NZ
vuwschema.contributor.unit School of Engineering and Computer Science en_NZ
vuwschema.subject.marsden 280207 Pattern Recognition en_NZ
vuwschema.type.vuw Awarded Doctoral Thesis en_NZ
thesis.degree.discipline Computer Science en_NZ
thesis.degree.grantor Victoria University of Wellington en_NZ
thesis.degree.level Doctoral en_NZ
thesis.degree.name Doctor of Philosophy en_NZ
vuwschema.subject.anzsrcfor 089999 Information and Computing Sciences not elsewhere classified en_NZ
