Clusters Merging Method for Short Texts Clustering

Read full paper at:
http://www.scirp.org/journal/PaperInformation.aspx?PaperID=49176#.VJJomMnQrzE

Author(s): Y. Wang, L. Wu, H. Shao
Driven by the mobile Internet, new social media such as microblogs, WeChat, and question-answering systems are constantly emerging. They produce huge volumes of short texts, which pose new challenges for text clustering. To handle the large volume and dynamic growth of short texts, a two-stage clustering method is proposed. The method moves a sliding window over the stream of short texts. Within each window, hierarchical clustering is used; between windows, a cluster-merging method based on information gain is applied. Experiments indicate that the method is fast and achieves higher accuracy.
Cite this paper
Wang, Y., Wu, L. and Shao, H. (2014) Clusters Merging Method for Short Texts Clustering. Open Journal of Social Sciences, 2, 186-192. doi: 10.4236/jss.2014.29032.
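
The two-stage idea in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the abstract does not specify the similarity measure, the linkage rule, whether windows overlap, or the exact information-gain formula. The sketch assumes non-overlapping windows, Jaccard similarity over cluster vocabularies for the in-window hierarchical step, and an information gain defined as the entropy of the merged term counts minus the size-weighted entropy of the two parts. All function names and the thresholds `min_sim` and `max_gain` are hypothetical.

```python
import math
from collections import Counter


def make_cluster(text):
    # A cluster keeps its member texts and its aggregated term counts.
    return {"texts": [text], "terms": Counter(text.lower().split())}


def merge(a, b):
    return {"texts": a["texts"] + b["texts"], "terms": a["terms"] + b["terms"]}


def jaccard(a, b):
    # Assumed similarity for stage 1: overlap of the two vocabularies.
    va, vb = set(a["terms"]), set(b["terms"])
    return len(va & vb) / len(va | vb)


def entropy(counts):
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def information_gain(a, b):
    # Assumed criterion for stage 2: a low gain means that keeping the
    # two clusters apart tells us little, so they are merge candidates.
    na, nb = sum(a["terms"].values()), sum(b["terms"].values())
    parts = (na * entropy(a["terms"]) + nb * entropy(b["terms"])) / (na + nb)
    return entropy(a["terms"] + b["terms"]) - parts


def cluster_window(texts, min_sim):
    # Stage 1: agglomerative clustering inside one window.
    clusters = [make_cluster(t) for t in texts if t.split()]
    while len(clusters) > 1:
        sim, i, j = max(
            (jaccard(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if sim < min_sim:
            break
        clusters[i] = merge(clusters[i], clusters[j])
        del clusters[j]
    return clusters


def cluster_stream(stream, window_size=4, min_sim=0.3, max_gain=0.5):
    # Stage 2: fold each window's clusters into the running global clusters.
    merged = []
    for start in range(0, len(stream), window_size):
        window = stream[start:start + window_size]
        for cluster in cluster_window(window, min_sim):
            gains = [information_gain(m, cluster) for m in merged]
            if gains and min(gains) <= max_gain:
                k = gains.index(min(gains))
                merged[k] = merge(merged[k], cluster)
            else:
                merged.append(cluster)
    return merged


stream = [
    "cheap flights to paris",
    "paris flights cheap deals",
    "python list sort",
    "sort python list fast",
    "cheap paris flights today",
    "python sort list reverse",
]
for cluster in cluster_stream(stream):
    print(cluster["texts"])
```

Because each window is clustered on its own and only cluster summaries (term counts) are compared across windows, the cost per arriving window stays bounded by the window size and the number of existing clusters, which is one plausible reading of why the abstract describes the method as fast.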

