Victoria University

Genetic Programming for Biomarker Detection in Classification of Mass Spectrometry Data

dc.contributor.advisor Zhang, Mengjie
dc.contributor.advisor Peng, Lifeng
dc.contributor.author Ahmed, Soha
dc.date.accessioned 2016-01-05T02:33:39Z
dc.date.available 2016-01-05T02:33:39Z
dc.date.copyright 2015
dc.date.issued 2015
dc.description.abstract Mass spectrometry (MS) is currently the most commonly used technology in biochemical research for proteomic analysis. The primary goal of proteomic profiling using mass spectrometry is the classification of samples from different experimental states. Classifying MS samples requires identifying the proteins or peptides that are expressed differently between the classes (biomarker detection). However, the high dimensionality of the data and the small number of samples make classification of MS data extremely challenging. Another important aspect of biomarker detection is verification of the detected biomarkers, which acts as an intermediate step before the biomarkers are passed to the experimental validation stage.

Biomarker detection aims at altering the input space of the learning algorithm to improve the classification of proteomic or metabolomic data. This task is performed through feature manipulation, which consists of three aspects: feature ranking, feature selection, and feature construction. Genetic programming (GP) is an evolutionary computation algorithm with an intrinsic capability for all three aspects of feature manipulation, but its ability to perform feature manipulation in proteomic biomarker discovery has not been fully investigated. This thesis therefore proposes an embedded methodology for these three aspects of feature manipulation in high-dimensional MS data using GP. The thesis also presents a GP-based method for biomarker verification, and it investigates the use of GP for both single-objective and multi-objective feature selection and construction.

In feature ranking, the thesis proposes a GP-based method for ranking subsets of features by using GP as an ensemble approach. The proposed algorithm uses GP's capability to combine the advantages of different feature ranking metrics and evolve a new ranking scheme for the subset of features selected from the top-ranked features. The capability of GP as a classifier is also investigated by this method. The results show that GP can select a smaller number of features and provide a better ranking of the selected features, which can improve the classification performance of five classifiers.

In feature construction, this thesis proposes a novel multiple feature construction method, which uses a single GP tree to generate a new set of high-level features from the original set of selected features. The results show that the proposed new algorithm outperforms two feature selection algorithms.

In feature selection, the thesis introduces the first GP multi-objective method for biomarker detection, which simultaneously increases the classification accuracy and reduces the number of detected features. The proposed multi-objective method can obtain better subsets of features than the single-objective algorithm and two traditional multi-objective approaches for feature selection. This thesis also develops the first multi-objective multiple feature construction algorithm for MS data. The proposed method aims at both maximising the classification performance and minimising the cardinality of the constructed new high-level features. The results show that GP can discover the complex relationships between the features, significantly improve classification performance, and reduce the cardinality.

For biomarker verification, the thesis proposes the first GP biomarker verification method, which works by measuring peptide detectability. The method solves the imbalance problem in the data and shows improvement over the benchmark algorithms. The algorithm also outperforms a well-known peptide detection method. Finally, the thesis introduces a new GP method for alignment of MS data as a preprocessing stage, which will further help in improving the biomarker detection process. en_NZ
dc.language.iso en_NZ
dc.publisher Victoria University of Wellington en_NZ
dc.subject Biomarker detection en_NZ
dc.subject Mass spectrometry en_NZ
dc.subject Genetic programming en_NZ
dc.title Genetic Programming for Biomarker Detection in Classification of Mass Spectrometry Data en_NZ
dc.type Text en_NZ
vuwschema.contributor.unit School of Engineering and Computer Science en_NZ
vuwschema.type.vuw Awarded Doctoral Thesis en_NZ
thesis.degree.discipline Computer Science en_NZ
thesis.degree.grantor Victoria University of Wellington en_NZ
thesis.degree.level Doctoral en_NZ
thesis.degree.name Doctor of Philosophy en_NZ
vuwschema.subject.anzsrcfor 080109 Pattern Recognition and Data Mining en_NZ
vuwschema.subject.anzsrcfor 080301 Bioinformatics Software en_NZ
vuwschema.subject.anzsrcseo 970108 Expanding Knowledge in the Information and Computing Sciences en_NZ
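
The abstract describes multi-objective feature selection as simultaneously increasing classification accuracy and reducing the number of detected features. The trade-off between those two objectives can be illustrated with a Pareto dominance filter. This is only a minimal sketch and not the thesis's algorithm: the candidate values are made-up numbers, and the names `dominates` and `pareto_front` are hypothetical helpers introduced for illustration.

```python
from typing import List, Tuple

# Each candidate feature subset is summarised by two objectives, both minimised:
# (classification error rate, number of selected features).
# The values used below are invented for illustration only.
Candidate = Tuple[float, int]


def dominates(a: Candidate, b: Candidate) -> bool:
    """Return True if a is no worse than b on every objective and strictly better on one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


candidates = [(0.10, 12), (0.08, 20), (0.10, 15), (0.15, 5), (0.20, 6)]
front = pareto_front(candidates)
```

A single-objective method would have to collapse error and subset size into one score, whereas the filter above keeps every non-dominated trade-off, so the choice between a smaller subset and a lower error can be made afterwards.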
