Entropy of printed Bengali language texts

Center and Library for Islamic Studies in European Languages

"Entropy of printed Bengali language texts"
Subrata Pramanik

Document Type	:	Latin Dissertation
Language of Document	:	English
Record Number	:	53888
Doc. No	:	TL23842
Call number	:	MR48804
Main Entry	:	Subrata Pramanik
Title & Author	:	Entropy of printed Bengali language texts / Subrata Pramanik
College	:	University of Northern British Columbia (Canada)
Date	:	2008
Degree	:	M.Sc.
student score	:	2008
Page No	:	82
Abstract	:	One of the most important sources of information is written and spoken human language. The language that is spoken, written, or signed by humans for general-purpose communication is referred to as natural language. Determining the entropy of natural language text is a fundamentally important problem in natural language processing. The study and analysis of the entropy of a language can be a meaningful resource for researchers in linguistics and communication theory. For the purpose of this research, we have taken printed Bengali language text as our source of natural language. We collected a sufficient number of printed Bengali language text samples and divided them into two classes, Newspaper and Literature. We studied each class in order to determine a specific entropy for each category and analyzed their characteristics. As a separate study, we collected printed religious Bengali language texts, divided them into two classes, Islamic and Hindu, determined their entropy, and analyzed their characteristics. From our research, we have found the zero-order and first-order entropy of the Bengali language to be 5.52 and 4.55, respectively. The language uncertainty and redundancy are 0.8242 and 17.58%, respectively. These entropy and redundancy results will be useful to researchers seeking a better text compression method for the Bengali language.
Subject	:	Applied sciences; Computer science; 0984:Computer science
Added Entry	:	University of Northern British Columbia (Canada)
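
Note	:	The quantities reported in the abstract can be illustrated with a minimal Python sketch. It assumes the usual Shannon-style definitions, which this record does not confirm the thesis uses: zero-order entropy as log2 of the alphabet size, first-order entropy computed from single-character frequencies, uncertainty as the ratio of the two, and redundancy as one minus that ratio. The function names are hypothetical, and the sketch counts Unicode code points, which may differ from how the thesis segments Bengali characters.

```python
import math
from collections import Counter


def zero_order_entropy(alphabet_size: int) -> float:
    """Entropy when all symbols are assumed equally likely: log2(N)."""
    return math.log2(alphabet_size)


def first_order_entropy(text: str) -> float:
    """Entropy from single-symbol relative frequencies (Unicode code points)."""
    counts = Counter(text)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def uncertainty_and_redundancy(h0: float, h1: float) -> tuple[float, float]:
    """Assumed definitions: uncertainty = h1 / h0, redundancy = 1 - uncertainty."""
    uncertainty = h1 / h0
    return uncertainty, 1.0 - uncertainty


# Plugging in the rounded entropies quoted in the abstract.
u, r = uncertainty_and_redundancy(5.52, 4.55)
print(f"uncertainty = {u:.3f}, redundancy = {r:.1%}")
```

With the rounded inputs 5.52 and 4.55, this gives an uncertainty of roughly 0.824 and a redundancy of roughly 17.6%, in line with the abstract's 0.8242 and 17.58%. The small difference in the last digit is presumably because the thesis worked from unrounded entropies.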