International Journal of Computer & Organization Trends

Research Article | Open Access

Volume 1 | Issue 3 | Year 2011 | Article Id. IJCOT-V1I3P303 | DOI: https://doi.org/10.14445/22492593/IJCOT-V1I3P303

Unsupervised Approach for Document Clustering Using Modified Fuzzy C mean Algorithm


K.Sathiyakumari, V.Preamsudha, G.Manimekalai

Citation:

K.Sathiyakumari, V.Preamsudha, G.Manimekalai, "Unsupervised Approach for Document Clustering Using Modified Fuzzy C mean Algorithm," International Journal of Computer & Organization Trends (IJCOT), vol. 1, no. 3, pp. 11-15, 2011. Crossref, https://doi.org/10.14445/22492593/IJCOT-V1I3P303

Abstract

Clustering is one of the main areas of the data mining literature, and several clustering approaches have been proposed for grouping documents. Most existing techniques, however, suffer from limitations such as poor practical applicability, low accuracy, and long classification times. In recent times, the inclusion of fuzzy logic has led to better clustering results. One of the most widely used fuzzy-logic-based methods is Fuzzy C-Means (FCM) clustering. To further improve clustering performance, this work uses Modified Fuzzy C-Means (MFCM) clustering. Before clustering, the documents are ranked using the Term Frequency-Inverse Document Frequency (TF-IDF) technique. The experimental results show that the proposed technique yields better clustering results than the existing technique.
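As a rough illustration of the pipeline named in the abstract, the sketch below weights tokenised documents with TF-IDF and then clusters them with standard Fuzzy C-Means. It is not the paper's MFCM, whose modification is not described on this page. The function names, the smoothed IDF formula, the fuzzifier `m=2.0`, and the toy documents are all assumptions made for the example.

```python
import math
import random
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, L2-normalised TF-IDF vectors) for tokenised docs."""
    vocab = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vec = []
        for term in vocab:
            tf = counts[term] / len(doc)
            # Smoothed IDF (an assumed variant) keeps every weight positive.
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1.0
            vec.append(tf * idf)
        norm = math.sqrt(sum(w * w for w in vec)) or 1.0
        vectors.append([w / norm for w in vec])
    return vocab, vectors


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard FCM. Returns (centres, memberships) with memberships[i][j]."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in points:
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])
    centres = []
    for _ in range(n_iter):
        # Centre update: mean of the points weighted by membership ** m.
        centres = []
        for j in range(n_clusters):
            weights = [row[j] ** m for row in u]
            total = sum(weights)
            centres.append([
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)
            ])
        # Membership update from ratios of distances to the centres.
        new_u = []
        for p in points:
            dists = [max(math.dist(p, c), 1e-12) for c in centres]
            new_u.append([
                1.0 / sum((dj / dk) ** (2.0 / (m - 1.0)) for dk in dists)
                for dj in dists
            ])
        shift = max(abs(a - b) for ra, rb in zip(u, new_u) for a, b in zip(ra, rb))
        u = new_u
        if shift < tol:
            break
    return centres, u


# Toy corpus (hypothetical): two documents on clustering, two on markets.
docs = [
    "fuzzy clustering of text documents".split(),
    "document clustering with fuzzy logic".split(),
    "stock market prices fall".split(),
    "market prices and stock trading".split(),
]
vocab, vectors = tfidf_vectors(docs)
centres, memberships = fuzzy_c_means(vectors, n_clusters=2)
# Harden the fuzzy partition by taking the strongest membership per document.
hard_labels = [row.index(max(row)) for row in memberships]
```

Each row of `memberships` sums to 1 and gives a document's degree of belonging to every cluster, which is what distinguishes FCM from a hard partition. The result depends on the random initialisation, so the hardened labels can differ between seeds.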

Keywords

Data mining, MFCM algorithm, Purity, Entropy, TF-IDF.
