Learning-based Data Augmentation for Multiclass Data

Center and Library for Islamic Studies in European Languages

Al Olaimat, Mohammad; Kim, Jinoh

Document Type	:	Latin Dissertation
Language of Document	:	English
Record Number	:	1052221
Doc. No	:	TL51338
Main Entry	:	Al Olaimat, Mohammad
Title & Author	:	Learning-based Data Augmentation for Multiclass Data / Al Olaimat, Mohammad; Kim, Jinoh
College	:	Texas A&M University - Commerce
Date	:	2019
Degree	:	M.S.
student score	:	2019
Note	:	73 p.
Abstract	:	A multiclass dataset is a dataset that contains three or more classes for classification, and many classification models have been developed to detect such classes. These models need a training dataset, which sometimes contains many classes. Two important challenges arise: (i) some training datasets are not well balanced because some classes are less represented than others, and (ii) some training datasets are so small that they cannot represent the entire distribution. Training a classification model on an unbalanced or small dataset reduces its ability to detect the minor classes in the testing dataset, because the classifier may treat these classes as noise, which in turn reduces classification performance. This research aims to make such unbalanced datasets representative by augmenting minor classes and enlarging small datasets to increase classification performance. In particular, it takes a deep learning approach to augmentation. Generative adversarial networks (GANs) are one such deep learning technique, and this research develops a methodology to extend the data effectively using GANs. The methodology includes the following steps: generating synthetic data for the minor class using GANs based on a locality augmentation strategy, augmenting the dataset with the synthetic data, training a classification model on the augmented dataset, and testing the classification model. In an evaluation on public network connection datasets, we observed that the proposed technique improves the identification of network anomalies by up to 9% in classifier accuracy and 10% in F1-score when the minor classes represent 3% of all classes in the training dataset.
Descriptor	:	Computer science
Added Entry	:	Kim, Jinoh
Added Entry	:	Texas A&M University - Commerce
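The augmentation steps named in the abstract (generate synthetic rows for the minor class, add them to the training set, then score the classifier with F1) can be sketched roughly as follows. This is an illustrative sketch only and is not the thesis implementation: the record does not describe the GAN architecture, so a simple SMOTE-style interpolation between neighbouring minority samples stands in for a trained GAN generator. The function names, the toy two-feature data, and the class labels ("normal", "dos", "probe") are all hypothetical.

```python
import random
from collections import Counter


def interpolate_generator(samples, n_new, rng):
    """Stand-in for a trained GAN generator (assumption, not the thesis method).

    Each synthetic row lies on the segment between a random minority sample
    and its nearest minority neighbour, a SMOTE-style locality heuristic.
    """
    if len(samples) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(samples))
        base = samples[i]
        # Nearest other minority sample by squared Euclidean distance.
        j = min(
            (k for k in range(len(samples)) if k != i),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(base, samples[k])),
        )
        t = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(tuple(a + t * (b - a) for a, b in zip(base, samples[j])))
    return synthetic


def augment_minor_classes(features, labels, target_count, rng):
    """Top up every class that has fewer than target_count rows."""
    aug_x, aug_y = list(features), list(labels)
    for cls, count in Counter(labels).items():
        if count < target_count:
            minority = [x for x, y in zip(features, labels) if y == cls]
            new_rows = interpolate_generator(minority, target_count - count, rng)
            aug_x.extend(new_rows)
            aug_y.extend([cls] * len(new_rows))
    return aug_x, aug_y


def f1_score(y_true, y_pred, positive):
    """F1 for one class: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy, made-up data: two well-represented classes and one minor class.
rng = random.Random(0)
normal = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(20)]
dos = [(rng.gauss(4, 1), rng.gauss(0, 1)) for _ in range(20)]
probe = [(5.0, 5.0), (5.5, 4.5), (6.0, 5.5)]  # minor class, 3 rows

features = normal + dos + probe
labels = ["normal"] * 20 + ["dos"] * 20 + ["probe"] * 3

aug_x, aug_y = augment_minor_classes(features, labels, target_count=20, rng=rng)
print(Counter(aug_y))  # every class now has 20 rows
```

In a fuller pipeline, a classifier would be trained on `aug_x` and `aug_y` and then evaluated on untouched test data, with `f1_score` computed per class so that gains on the minor class are visible rather than averaged away.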