International Journal of Computer & Organization Trends

Research Article | Open Access

Volume 6 | Issue 6 | Year 2016 | Article Id. IJCOT-V37P301 | DOI : https://doi.org/10.14445/22492593/IJCOT-V37P301

A Novel Two Step Genetic Program for High-Dimensional Data


Mandava Mamatha, Ms. J. Rama Devi

Citation:

Mandava Mamatha, Ms. J. Rama Devi, "A Novel Two Step Genetic Program for High-Dimensional Data," International Journal of Computer & Organization Trends (IJCOT), vol. 6, no. 6, pp. 1-4, 2016. Crossref, https://doi.org/10.14445/22492593/IJCOT-V37P301

Abstract

The increasing volume of data to be analysed imposes new challenges on data mining methodologies. Modern datasets suffer from the curse of dimensionality. PCA and 2SGP (Two Step Genetic Programming) are popular tools for linear dimensionality reduction and feature extraction, but these traditional data mining methods do not scale well to larger data sizes, lose accuracy and efficiency in terms of memory and time, and do not handle non-linear data. We therefore propose a novel 2SKPCA algorithm that combines KPCA (Kernel Principal Component Analysis) and GP (Genetic Programming) to handle high-dimensional non-linear data and reduce its dimensionality. KPCA is the non-linear extension of PCA and better exploits the complicated spatial structure of high-dimensional features. GP performs feature selection and derives the significant features from the original features. A minimal sketch of the kernel-PCA step is given below.
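As an illustration of the kernel-PCA step only, the sketch below contrasts linear PCA with KPCA on a synthetic non-linear dataset. It uses scikit-learn's PCA and KernelPCA classes with assumed parameter choices (kernel, gamma, sample sizes); it is not the authors' 2SKPCA implementation and omits the GP feature-construction step.

```python
# Illustrative sketch only: scikit-learn's KernelPCA stands in for the
# kernel-PCA step described in the abstract, not the proposed 2SKPCA method.
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic non-linear data: two concentric circles, not linearly separable.
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

# Linear PCA: a rotation of the input, so the classes stay inseparable.
X_pca = PCA(n_components=2).fit_transform(X)

# Kernel PCA with an RBF kernel maps the data into a feature space where the
# two circles become (near) linearly separable; gamma is a hypothetical choice.
X_kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10.0).fit_transform(X)

# Compare how well a linear classifier separates the two projections.
clf = LogisticRegression()
print("PCA  accuracy:", cross_val_score(clf, X_pca, y, cv=5).mean())
print("KPCA accuracy:", cross_val_score(clf, X_kpca, y, cv=5).mean())
```

On this kind of data the linear-PCA projection stays at roughly chance-level accuracy, while the KPCA projection is close to perfectly separable, which is the behaviour the abstract relies on when arguing for a kernel-based first step.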

Keywords

Feature extraction, high-dimensional data, kernel PCA, Genetic Programming, Classification.
