Document Type:Latin Dissertation
Language of Document:English
Record Number:53102
Doc. No:TL23056
Call number:3405188
Main Entry:Christian Monson
Title & Author:ParaMor: From paradigm structure to natural language morphology induction / Christian Monson
College:Carnegie Mellon University
Date:2008
Degree:Ph.D.
student score:2008
Page No:230
Abstract:Most of the world's natural languages have complex morphology. But the expense of building morphological analyzers by hand has prevented the development of morphological analysis systems for the large majority of languages. Unsupervised induction techniques, which learn from unannotated text data, can facilitate the development of computational morphology systems for new languages. Such unsupervised morphological analysis systems have been shown to help natural language processing tasks including speech recognition (Creutz, 2006) and information retrieval (Kurimo and Turunen, 2008). This thesis describes ParaMor, an unsupervised induction algorithm for learning morphological paradigms from large collections of words in any natural language. Paradigms are sets of mutually substitutable morphological operations that organize the inflectional morphology of natural languages. ParaMor focuses on the most common morphological process, suffixation. ParaMor learns paradigms in a three-step algorithm. First, a recall-centric search scours a space of candidate partial paradigms for those which possibly model suffixes of true paradigms. Second, ParaMor merges selected candidates that appear to model portions of the same paradigm. And third, ParaMor discards those clusters which most likely do not model true paradigms. Based on the acquired paradigms, ParaMor then segments words into morphemes. ParaMor, by design, is particularly effective for inflectional morphology, while other systems, such as Morfessor (Creutz, 2006), better identify derivational morphology. This thesis leverages the complementary strengths of ParaMor and Morfessor by adjoining the analyses from the two systems. ParaMor and its combination with Morfessor participated in Morpho Challenge, a peer-operated competition for morphology analysis systems (Kurimo, Turunen, and Varjokallio, 2008).
The Morpho Challenge competitions held in 2007 and 2008 evaluated each system's morphological analyses in five languages: English, German, Finnish, Turkish, and Arabic. When ParaMor's morphological analyses are merged with those of Morfessor, the resulting morpheme recall in all five languages is higher than that of any system which competed in either year's Challenge; in Turkish, for example, ParaMor's recall of 52.1% is twice that of the next highest system. This strong recall leads to F1 scores for morpheme identification above those of all systems in all languages but English.
Subject:Applied sciences; Computational linguistics; Natural language processing; Unsupervised learning; Artificial intelligence; Computer science; 0984:Computer science; 0800:Artificial intelligence
Added Entry:J. L. Carbonell, Alon
Added Entry:Carnegie Mellon University