Adaptive classification of scarcely labeled and evolving data streams

Center and Library for Islamic Studies in European Languages

"Adaptive classification of scarcely labeled and evolving data streams"
Mohammad Mehedy Masud; L. Khan

Document Type	:	Latin Dissertation
Language of Document	:	English
Record Number	:	52713
Doc. No	:	TL22667
Call number	:	3391622
Main Entry	:	Mohammad Mehedy Masud
Title & Author	:	Adaptive classification of scarcely labeled and evolving data streams / Mohammad Mehedy Masud
College	:	The University of Texas at Dallas
Date	:	2009
Degree	:	Ph.D.
student score	:	2009
Page No	:	166
Abstract	:	In this dissertation we propose solutions to four major problems encountered in data stream classification, namely infinite length, concept-drift, concept-evolution, and limited labeled data. Traditional data stream classification techniques address only the infinite length and concept-drift problems. Data streams are continuous flows of data, such as network traffic, sensor data, and call center records. The goal of data stream classification is to build a model using past labeled data and use the model to predict the class labels of future instances. Data streams are inherently infinite in length. Concept-drift occurs in data streams when the underlying concept of the data changes over time, and concept-evolution occurs when new classes evolve. Data streams that flow at high speed also suffer from scarcity of labeled data, since it is impossible to manually label all the data points in the stream. We propose three different techniques to address these problems. First, we propose efficient solutions to the infinite length and concept-drift problems using an ensemble classification approach. It solves the infinite length problem by dividing the stream into equal-sized chunks such that each chunk can be stored and processed in main memory. It builds v classification models from r consecutive chunks using v-fold cross-validation type partitioning. An ensemble of such models is used to classify unlabeled data. Concept-drift is addressed by periodically updating the ensemble with newer models. Second, we provide a novel class detection technique for data streams that addresses the concept-evolution problem in addition to the infinite length and concept-drift problems. To the best of our knowledge, this is the first work that addresses the concept-evolution problem in a data stream classification framework.
Our proposed technique automatically detects the presence of a novel class in data streams by analyzing and quantifying the cohesion among the unlabeled test instances and their separation from the training data. Finally, the limited labeled data problem is addressed by building a stream classification model from scarcely labeled training data using a combined semi-supervised clustering and ensemble classification approach. Our techniques outperform state-of-the-art data stream classification techniques on a number of benchmark stream datasets.
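Illustrative note: the chunk-based ensemble idea summarized in the abstract can be sketched roughly as follows. This is a simplified sketch and not the dissertation's algorithm: it trains one model per chunk instead of v models from r consecutive chunks, it uses a nearest-centroid learner as a stand-in base classifier, and the names `CentroidModel`, `ChunkEnsemble`, and `max_models` are hypothetical. Each chunk is assumed to be non-empty and fully labeled.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]
Chunk = List[Tuple[Point, str]]  # one chunk of (features, label) pairs


class CentroidModel:
    """Stand-in base learner (an assumption): one centroid per class."""

    def __init__(self, chunk: Chunk) -> None:
        sums: Dict[str, List[float]] = {}
        counts: Counter = Counter()
        for x, y in chunk:
            if y not in sums:
                sums[y] = [0.0] * len(x)
            sums[y] = [s + v for s, v in zip(sums[y], x)]
            counts[y] += 1
        self.centroids = {y: [s / counts[y] for s in sums[y]] for y in sums}

    def predict(self, x: Point) -> str:
        # Return the label of the closest class centroid.
        def dist2(c: List[float]) -> float:
            return sum((a - b) ** 2 for a, b in zip(x, c))

        return min(self.centroids, key=lambda y: dist2(self.centroids[y]))


def error(model: CentroidModel, chunk: Chunk) -> float:
    """Fraction of instances in the chunk that the model misclassifies."""
    return sum(model.predict(x) != y for x, y in chunk) / len(chunk)


class ChunkEnsemble:
    """Keeps at most max_models models, refreshed on every labeled chunk."""

    def __init__(self, max_models: int = 3) -> None:
        self.max_models = max_models
        self.models: List[CentroidModel] = []

    def predict(self, x: Point) -> str:
        # Majority vote over the current ensemble members.
        votes = Counter(m.predict(x) for m in self.models)
        return votes.most_common(1)[0][0]

    def update(self, chunk: Chunk) -> None:
        # Train a model on the newest chunk, then keep the models with the
        # lowest error on that chunk. Models fitted to an outdated concept
        # score poorly and are dropped, which is how drift is handled here.
        candidates = self.models + [CentroidModel(chunk)]
        candidates.sort(key=lambda m: error(m, chunk))
        self.models = candidates[: self.max_models]
```

Because only the current chunk and a bounded number of models are held in memory, the stream can be arbitrarily long; ranking models on the newest chunk is one simple way to realize the "periodic update with newer models" that the abstract mentions.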
Subject	:	Applied sciences; Data mining; Concept-drift; Classification; Data streams; Concept evolution; Semisupervised clustering; Computer science; 0984:Computer science
Added Entry	:	L. Khan
Added Entry	:	The University of Texas at Dallas