| | Document Type | : | Latin Dissertation | Language of Document | : | English | Record Number | : | 53193 | Doc. No | : | TL23147 | Call number | : | 1469108 | Main Entry | : | Omar Mukhtar | Title & Author | : | AlphaRank: A new smoothing algorithm based on combination of link analysis techniques and frequency based methodsOmar Mukhtar | College | : | State University of New York at Buffalo | Date | : | 2009 | Degree | : | M.S. | student score | : | 2009 | Page No | : | 81 | Abstract | : | Smoothing a probability distribution so that it generalizes well is a hard machine learning problem. It is particularly challenging when building a statistical language model with insufficient training data. We have developed a new smoothing algorithm (called AlphaRank) to overcome the data sparseness problem by viewing language as a large graph where each word is a vertex and the probability of using another word is determined by the edge weight connecting two words (vertices). Thus, instead of using frequency based rules as is done in prior work, we propose a graph based method to smooth a statistical language model. Our method combines features of context-dependent probability estimators such as n-grams and features from context-independent probability estimators such as the steady-state distribution of a discrete time-step Markov chain. We have tested on a large collection of Arabic newswire articles and compared with previous approaches using the perplexity measure and found our method to be superior. | Subject | : | Applied sciences; Language, literature and linguistics; Computational linguistics; Language modeling; Smoothing; Linguistics; Artificial intelligence; Computer science; 0984:Computer science; 0290:Linguistics; 0800:Artificial intelligence | Added Entry | : | V. Govindaraju | Added Entry | : | State University of New York at Buffalo |
| | | | | |
| | |  |
http://lib.clisel.com/site/catalogue/53193
| | |
| | |  |
|