"Data-Driven Feature Selection in Medical Diagnosis Using Linear Programming, Non-Parametric Methods, and Graph Analysis"
Abdulla, Mai
Khasawneh, Mohammad
Document Type: Latin Dissertation
Language of Document: English
Record Number: 1051625
Doc. No: TL50742
Main Entry: Abdulla, Mai
Title & Author: Data-Driven Feature Selection in Medical Diagnosis Using Linear Programming, Non-Parametric Methods, and Graph Analysis / Abdulla, Mai; Khasawneh, Mohammad
College: State University of New York at Binghamton
Date: 2019
Degree: Ph.D.
student score: 2019
Note: 197 p.
Abstract:
"Silent diseases" is an umbrella term for a spectrum of illnesses that produce no clinically obvious signs and are usually diagnosed at advanced stages, when the damage is permanent. Current diagnostic strategies for silent diseases depend on self-reported symptoms and behavior observed over an extended period of time; to date, there are no specific tests to diagnose them. Research has suggested that early diagnosis is important for restoring functionality and reducing disease-related complications. To address these challenges, many researchers are leveraging feature selection methods to improve medical diagnosis. Feature selection aims to select a small subset of informative features that contains most of the information related to a given task. The main goal of this research is to develop feature selection algorithms that identify biomarkers for a range of silent diseases. In the first part, two mathematical optimization models, CS-SVM and CS-MSMT, are proposed and applied to five medical diagnosis datasets. Unlike existing feature selection approaches, CS-SVM and CS-MSMT select the least costly and most informative features while providing a tighter relaxation bound based on aggressive bound tightening during the solve process. CS-SVM and CS-MSMT improved accuracy by up to 4% and reduced cost by up to 89%. In the second part, G-Forest, a feature selection algorithm for microarray gene expression profiling, is proposed. G-Forest is an ensemble cost-sensitive feature selection algorithm that develops a population of biases for a Random Forest induction algorithm. Unlike existing feature selection approaches, G-Forest considers the rare variance in genetic data, considers the complexity of the gene retrieval process, and is parallelizable. G-Forest improved accuracy by 10% and reduced gene complexity by 2.5%.
In the third part, a graph-based approach to feature selection is proposed to better predict abnormal functional networks in schizophrenia patients. Schizophrenia is one of the silent diseases and is characterized by severe brain disorders. The brain is represented as a complex network of interacting elements. The results are compared with those of the conventional seed-based approach and show 100% accuracy.
Descriptor: Engineering
Added Entry: Khasawneh, Mohammad
Added Entry: State University of New York at Binghamton