"Learning-based Data Augmentation for Multiclass Data"
Al Olaimat, Mohammad
Kim, Jinoh
Document Type: Latin Dissertation
Language of Document: English
Record Number: 1052221
Doc. No.: TL51338
Main Entry: Al Olaimat, Mohammad
Title & Author: Learning-based Data Augmentation for Multiclass Data / Al Olaimat, Mohammad; Kim, Jinoh
College: Texas A&M University - Commerce
Date: 2019
Degree: M.S.
Student score: 2019
Note: 73 p.
Abstract:
A multiclass dataset is a dataset that contains three or more classes for classification. Many classification models have been developed to detect such classes, and these models need a training dataset, which sometimes contains many classes. Two important challenges for these models are that (i) some training datasets are imbalanced because some classes are much less represented than others, and (ii) some training datasets are too small to represent the entire distribution. Training a classifier on an imbalanced or small dataset reduces its ability to detect the minor classes in the testing dataset, because the classifier may treat these minor classes as noise, which in turn lowers classification performance. This research aims to make such imbalanced datasets representative by augmenting the minor classes, and to enlarge small datasets, in order to improve classification performance. In particular, this research uses deep learning for augmentation: it develops a methodology that uses generative adversarial networks (GANs), a deep learning technique, to extend the data effectively. The methodology consists of the following steps: generating synthetic data for the minor classes using GANs based on a locality augmentation strategy, augmenting the training dataset with the synthetic data, training a classification model on the augmented dataset, and testing the classification model. In an evaluation using public network connection datasets, we observed that the proposed technique improves the identification of network anomalies by up to 9% in classifier accuracy and up to 10% in F1-score when the minor classes account for 3% of the training dataset.
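The first two steps of the pipeline described in the abstract can be sketched in Python. The sketch generates synthetic samples for the minor classes and adds them to the training set. It also shows why accuracy alone hides poor minor-class detection. This is a minimal illustration under stated assumptions, not the thesis's method. The GAN generator is replaced by a hypothetical stand-in, `interpolate_generator`, which places each synthetic point between two real samples of the same class. That is a SMOTE-like interpolation: it is neither a GAN nor the thesis's locality augmentation strategy. The function names, the "balance every class up to the largest class" target, and the use of macro-averaged F1 are all assumptions made for illustration.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[float, ...]
# Hypothetical generator signature: (real samples of one class, count, RNG) -> synthetic samples.
SampleGenerator = Callable[[Sequence[Sample], int, random.Random], List[Sample]]


def interpolate_generator(real: Sequence[Sample], n: int, rng: random.Random) -> List[Sample]:
    """Stand-in for a trained GAN generator (NOT a GAN): each synthetic point lies on the
    segment between two randomly chosen real samples of the same class (possibly the same one)."""
    synthetic = []
    for _ in range(n):
        a, b = rng.choice(real), rng.choice(real)
        t = rng.random()
        synthetic.append(tuple(ai + t * (bi - ai) for ai, bi in zip(a, b)))
    return synthetic


def augment_minor_classes(X: Sequence[Sample], y: Sequence[str],
                          generator: SampleGenerator = interpolate_generator,
                          seed: int = 0) -> Tuple[List[Sample], List[str]]:
    """Append synthetic samples to every class smaller than the largest one until all
    classes have the same count (an assumed balancing target)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_aug, y_aug = list(X), list(y)
    for label, count in counts.items():
        if count < target:
            real = [x for x, lab in zip(X, y) if lab == label]
            synthetic = generator(real, target - count, rng)
            X_aug.extend(synthetic)
            y_aug.extend([label] * len(synthetic))
    return X_aug, y_aug


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 scores, so each class counts equally."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    rng = random.Random(1)
    # Toy data: 97 "normal" and 3 "attack" samples, so the minor class is 3% of the set.
    X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(97)]
    X += [(rng.gauss(3, 0.5), rng.gauss(3, 0.5)) for _ in range(3)]
    y = ["normal"] * 97 + ["attack"] * 3

    # A degenerate classifier that always predicts the majority class.
    y_pred = ["normal"] * len(y)
    print(sum(t == p for t, p in zip(y, y_pred)) / len(y))  # 0.97
    print(round(macro_f1(y, y_pred), 3))                    # 0.492

    X_aug, y_aug = augment_minor_classes(X, y)
    print(dict(sorted(Counter(y_aug).items())))  # {'attack': 97, 'normal': 97}
```

In the demo, the majority-only classifier reaches 97% accuracy but detects none of the attacks, and macro F1 exposes this failure. The training and testing steps are left out of the sketch. Any classifier could be fit on `X_aug, y_aug`, and a trained GAN generator could be passed in through the `generator` parameter in place of the interpolation stand-in.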
Descriptor: Computer science
Added Entry: Kim, Jinoh
Added Entry: Texas A&M University - Commerce