Comparison and Evaluation of scaled data mining algorithms

M Afshar Alam; Sapna Jain; Ranjit Biswas

doi: https://doi.org/10.14445/22492593/IJCOT-V1I3P306

Research Article | Open Access

Volume 1 | Issue 3 | Year 2011 | Article Id. IJCOT-V1I3P306 | DOI: https://doi.org/10.14445/22492593/IJCOT-V1I3P306

Citation:

M Afshar Alam, Sapna Jain, Ranjit Biswas, "Comparison and Evaluation of scaled data mining algorithms," International Journal of Computer & Organization Trends (IJCOT), vol. 1, no. 3, pp. 28-34, 2011. Crossref, https://doi.org/10.14445/22492593/IJCOT-V1I3P306

Abstract

Association rule mining is among the most popular techniques in data mining. Mining association rules is a prototypical problem because data are generated and stored every day in corporate database systems. To manage the resulting knowledge, rules have to be pruned and grouped so that only a reasonable number of rules has to be inspected and analyzed. In this paper we compare the standard association rule algorithms with the proposed Scaled Association Rules algorithm and the AIREP algorithm. The algorithms are compared on factors such as type of dataset, support counting, rule generation, candidate generation, and computational complexity. The conclusions drawn are based on the efficiency, performance, accuracy, and scalability of the algorithms.

Keywords

Association rule, Data Mining, Multidimensional dataset, Pruning, Frequent itemset.
