Clustering Categorical-Time Evolving Data from K-Means to Rough Set Theory Using Map-Reduce Technique

S.Sridevi; Dr. Jeevaa Katiravan

doi:https://doi.org/10.14445/22492593/IJCOT-V45P305

Research Article | Open Access

Volume 7 | Issue 4 | Year 2017 | Article Id. IJCOT-V45P305 | DOI : https://doi.org/10.14445/22492593/IJCOT-V45P305

Citation :

S.Sridevi, Dr. Jeevaa Katiravan, "Clustering Categorical-Time Evolving Data from K-Means to Rough Set Theory Using Map-Reduce Technique," International Journal of Computer & Organization Trends (IJCOT), vol. 7, no. 4, pp. 37-45, 2017. Crossref, https://doi.org/10.14445/22492593/IJCOT-V45P305

Abstract

In a cloud environment, resource utilization should scale up and down according to customer needs, and managing this scalability is a critical issue. Scalability can be achieved through dynamic resource allocation, but allocating resources on demand is efficient only when it is informed by load prediction. Improving the accuracy of load prediction is therefore essential to achieving optimal job scheduling and load balancing in cloud computing. When load prediction and server reliability are considered together, optimal resource allocation becomes possible. This paper discusses various load prediction methods.

Keywords

load prediction, prediction accuracy, dynamic resource allocation.
