Document Type:Latin Dissertation
Language of Document:English
Record Number:54646
Doc. No:TL24600
Call number:3358375
Main Entry:Mahmudul Islam Sheikh
Title & Author:A methodology to improve the performance of extracting information from financial documents / Mahmudul Islam Sheikh
College:The University of Mississippi
Date:2009
Degree:Ph.D.
student score:2009
Page No:159
Abstract:Information Extraction (IE) technology retrieves the most relevant, context-sensitive, and specific pieces of information from unstructured documents and presents them in a structured format. The IE problem is difficult for several reasons. First, the items to be retrieved have no clear boundaries. Second, information retrieval techniques that rely on a bag of words and word statistics may not suffice to retrieve most of the relevant information because they miss context. Third, statistical techniques such as the Naive Bayes classifier or Average Mutual Information perform well on document retrieval tasks but are not directly applicable to IE tasks. This study proposes an IE methodology that aims to extract financial information about various NASDAQ-listed companies with high precision and recall. Performance is improved partly by using a rule-based symbolic-learning model, whose rules are learned by the simplest form of the Tabu search algorithm. The results show that applying the Tabu search algorithm with part-of-speech tags improves precision and recall over other methods and resources. The output of the learned model is further analyzed by a statistical method called "Max-Strength" to improve the precision of the items extracted by the symbolic-learning model. The strength of the methodology is evidenced by its performance on the "Seminar Announcement" corpus, which has been used by several well-known systems.
Subject:Social sciences; Applied sciences; Information extraction; Machine learning; Symbolic learning; Financial information; Rule-based learning; Rule generation; Management; Computer science; Studies; Financial instruments; 0984:Computer science; 0454:Management
Added Entry:S. J. Conlon
Added Entry:The University of Mississippi