Research Article | Open Access
Volume 1 | Issue 3 | Year 2011 | Article Id. IJCOT-V1I3P303 | DOI: https://doi.org/10.14445/22492593/IJCOT-V1I3P303
Unsupervised Approach for Document Clustering Using Modified Fuzzy C mean Algorithm
K. Sathiyakumari, V. Preamsudha, G. Manimekalai
Citation:
K. Sathiyakumari, V. Preamsudha, G. Manimekalai, "Unsupervised Approach for Document Clustering Using Modified Fuzzy C mean Algorithm," International Journal of Computer & Organization Trends (IJCOT), vol. 1, no. 3, pp. 11-15, 2011. Crossref, https://doi.org/10.14445/22492593/IJCOT-V1I3P303
Abstract
Clustering is one of the main areas of data mining, and several approaches for clustering documents are available in the literature. Most existing techniques, however, suffer from limitations such as limited practical applicability, low accuracy, and long classification times. In recent work, incorporating fuzzy logic into clustering has produced better results, and Fuzzy C-Means (FCM) is one of the most widely used fuzzy clustering algorithms. To further improve clustering performance, this work uses Modified Fuzzy C-Means (MFCM) clustering. Before clustering, the documents are ranked using the Term Frequency-Inverse Document Frequency (TF-IDF) technique. The experimental results show that the proposed technique produces better clustering results than the existing technique.
Keywords
Data mining, MFCM algorithm, Purity, Entropy, TF-IDF.
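
As an illustration of the pipeline outlined in the abstract, the following is a minimal sketch of TF-IDF weighting followed by standard Fuzzy C-Means. It is not the authors' MFCM, whose modification is not described on this page. The function names (tfidf_vectors, fuzzy_c_means), the whitespace tokenisation, the fuzzifier m = 2, and the stopping tolerance are all assumptions made for the example.

```python
import math
import random
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, one TF-IDF vector per document).

    Assumes every document is non-empty; tokens are split on whitespace.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(t for toks in tokenized for t in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append(
            [(tf[t] / len(toks)) * math.log(n / df[t]) for t in vocab]
        )
    return vocab, vectors


def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Standard FCM (not the paper's MFCM); returns (centers, memberships)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    power = 2.0 / (m - 1.0)
    for _ in range(iters):
        # Centre update: mean of all points weighted by u_ik ** m.
        centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / total
                 for d in range(dim)]
            )
        # Membership update from relative distances to every centre.
        new_u = []
        for i in range(n):
            # Clamp distances to avoid division by zero on exact matches.
            dist = [max(math.dist(points[i], centers[k]), 1e-12)
                    for k in range(c)]
            new_u.append(
                [1.0 / sum((dist[k] / dist[j]) ** power for j in range(c))
                 for k in range(c)]
            )
        shift = max(abs(new_u[i][k] - u[i][k])
                    for i in range(n) for k in range(c))
        u = new_u
        if shift < tol:
            break
    return centers, u
```

Calling fuzzy_c_means(tfidf_vectors(docs)[1], c=2) returns one membership row per document. Each row sums to 1, so a document can belong partly to several clusters, which is what distinguishes fuzzy clustering from a hard assignment.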