Towards improving e-mail content classification for spam control: Architecture, abstraction, and st...

Center and Library for Islamic Studies in European Languages

"Towards improving e-mail content classification for spam control: Architecture, abstraction, and strategies"
Muhammad Nadzir Marsono

Document Type	:	Latin Dissertation
Language of Document	:	English
Record Number	:	52682
Doc. No	:	TL22636
Call number	:	NR37428
Main Entry	:	Muhammad Nadzir Marsono
Title & Author	:	Towards improving e-mail content classification for spam control: Architecture, abstraction, and strategies / Muhammad Nadzir Marsono
College	:	University of Victoria (Canada)
Date	:	2007
Degree	:	Ph.D.
student score	:	2007
Page No	:	162
Abstract	:	This dissertation discusses techniques to improve the effectiveness and efficiency of spam control. Specifically, layer-3 e-mail content classification is proposed to allow e-mail pre-classification (for fast spam detection at receiving e-mail servers) and to allow distributed processing at network nodes for fast spam detection at spam control points, e.g., at e-mail servers. Fast spam detection also allows e-mail servicing to be prioritized at receiving e-mail servers, safeguarding non-spam e-mail delivery even under heavy spam traffic, and allows spam to be rejected during Simple Mail Transfer Protocol (SMTP) sessions for inbound and outbound spam control. The dissertation makes four contributions. In our first contribution, we propose a hardware architecture for a naïve Bayes content classification unit for high-throughput spam detection. We use the logarithmic number system to simplify the naïve Bayes computation. Because logarithmic number system computation is fast but lossy, we analyze the noise model of our hardware architecture. Through noise analysis, synthesis, and verification by numerical simulation, we show that the naïve Bayes classification unit, implemented on an FPGA, is capable of processing more than one hundred million features per second with very low computation noise, an order of magnitude faster than a general-purpose processor implementation. In our second contribution, we propose e-mail content pre-classification at the network layer (layer 3) instead of at the application layer (layer 7), as currently practiced, to allow e-mail packet pre-classification and distributed processing for effective spam detection beyond server implementations. By performing e-mail content classification at a lower abstraction level, e-mail packets can be pre-processed, without reassembly, at any network node between sender and receiver.
We demonstrate that naïve Bayes e-mail content classification can be adapted for layer-3 processing. We also show that fast e-mail class estimation can be performed at receiving e-mail servers. Through simulation using e-mail data sets, we show that layer-3 e-mail content classification is capable of detecting spam with accuracy and false positive values approximately equal to those at layer 7. In our third contribution, we propose a prioritized e-mail servicing scheme that uses a priority queuing approach to improve spam handling at receiving e-mail servers. In this scheme, non-spam e-mail is given higher priority than spam. Four servicing strategies for the proposed scheme are studied. We analyze the performance of this scheme under different e-mail traffic loads and service capacities. We show that the non-spam delay and loss probability can be reduced when the server is under-provisioned. In our fourth contribution, we propose a spam handling scheme that rejects spam during Simple Mail Transfer Protocol sessions. The proposed spam handling scheme allows inbound and outbound spam control. It is capable of reducing server load and hence non-spam queuing delay and loss probability. We analyze the performance of this scheme under different e-mail traffic loads and service capacities. We show that the non-spam delay and loss probability can be reduced when the server is under-provisioned. In this dissertation, we present four techniques to improve spam control based on e-mail content classification. We envision that our proposed approaches complement rather than replace current spam control systems. The four proposed approaches are capable of working with existing spam control systems and support proactive control of spam and other e-mail-based threats, such as phishing and e-mail worms, anywhere across the Internet.
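Illustrative note (not part of the catalogued abstract): the abstract states that the naïve Bayes computation is simplified by working in the logarithmic number system. The Python sketch below shows only the general software analogue of that idea, in which a product of per-token probabilities becomes a sum of logarithms. It is not the dissertation's hardware architecture; the function names (`train`, `log_odds`, `classify`), the Laplace smoothing constant `alpha`, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter


def train(docs, alpha=1.0):
    """Fit a multinomial naive Bayes model.

    docs: list of (tokens, label) pairs, label in {"spam", "ham"}.
    alpha: Laplace smoothing constant (an assumption of this sketch).
    Both classes must appear at least once, otherwise log(0) is raised.
    """
    counts = {"spam": Counter(), "ham": Counter()}
    n_docs = Counter()
    for tokens, label in docs:
        counts[label].update(tokens)
        n_docs[label] += 1

    vocab = set(counts["spam"]) | set(counts["ham"])
    total_docs = sum(n_docs.values())
    model = {"vocab": vocab, "log_prior": {}, "log_like": {}}
    for label in counts:
        model["log_prior"][label] = math.log(n_docs[label] / total_docs)
        denom = sum(counts[label].values()) + alpha * len(vocab)
        # Store log-probabilities once, so scoring needs only additions.
        model["log_like"][label] = {
            w: math.log((counts[label][w] + alpha) / denom) for w in vocab
        }
    return model


def log_odds(model, tokens):
    """Return log P(spam | tokens) - log P(ham | tokens), up to a shared constant."""
    score = model["log_prior"]["spam"] - model["log_prior"]["ham"]
    for w in tokens:
        if w in model["vocab"]:  # unseen tokens are skipped in this sketch
            score += model["log_like"]["spam"][w] - model["log_like"]["ham"][w]
    return score


def classify(model, tokens):
    """Label a token list as "spam" when the log-odds score is positive."""
    return "spam" if log_odds(model, tokens) > 0 else "ham"


# Toy training data, invented for this example.
toy_docs = [
    (["cheap", "pills", "buy"], "spam"),
    (["buy", "cheap", "now"], "spam"),
    (["meeting", "agenda", "notes"], "ham"),
    (["project", "meeting", "now"], "ham"),
]
toy_model = train(toy_docs)
```

Because the score is a sum over tokens, it can in principle be accumulated chunk by chunk. The abstract does not say whether the layer-3 scheme is built this way, so that reading is an observation about the sketch, not a claim about the dissertation.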
Subject	:	Applied sciences; Content classification; E-mail; Spam control; Electrical engineering; Computer science; 0984:Computer science; 0544:Electrical engineering
Added Entry	:	University of Victoria (Canada)