
" Investigating Large-Scale Internet Abuse Through Web Page Classification "


Document Type : Latin Dissertation
Language of Document : English
Record Number : 904936
Doc. No : TL8jp0z4m4
Main Entry : Der, Matthew Francis
Title & Author : Investigating Large-Scale Internet Abuse Through Web Page Classification / Der, Matthew Francis; Saul, Lawrence K; Voelker, Geoffrey M
College : UC San Diego
Date : 2015
student score : 2015
Abstract : The Internet is rife with abuse: examples include spam, phishing, malicious advertising, DNS abuse, search poisoning, click fraud, and so on. To detect, investigate, and defend against such abuse, security efforts frequently crawl large sets of Web sites that need to be classified into categories, e.g., the attacker behind the abuse or the type of abuse. Domain expertise is often required at first, but classifying thousands to even millions of Web pages manually is infeasible. In this dissertation, I develop machine learning tools to help security practitioners classify Web pages at scale. These automated, data-driven methods are made possible by the efforts of miscreants to operate at scale. Crafting every scam from scratch is too expensive, so miscreants use some degree of automation and replication to recreate their attacks. As a result, underlying similarities in both Web site content and structure can link related pages together. In the end, this automated classification of "big data" collected from the Web has significant impact, as it enables large-scale measurement and informs potential defensive interventions. This dissertation focuses on three applications. First, I present a system for monitoring Web sites that serve as online storefronts for spam-advertised goods. The system is highly accurate, even when training data is very limited. Second, I describe a system for identifying the black hat SEO campaigns that promote online stores selling counterfeit luxury goods. This system was used to nearly double the number of known campaigns to track, and increase the number of associated stores by 69%. Third, I discuss a system for categorizing the Web content hosted in new top-level domains. In total, this system was used to classify 4.1 million domains in 480 new TLDs. Overall, today's scale of well-organized cybercrime demands the use of scalable defensive analysis. This setting is where the data-driven techniques of machine learning prove especially useful. Furthermore, large-scale classification has become a frequent need in security, and our methods are more generally applicable to problems beyond just the ones documented in this dissertation.
Added Entry : Saul, Lawrence K; Voelker, Geoffrey M
Added Entry : UC San Diego
Attachments
Title : 8jp0z4m4_12763.pdf
File name : 8jp0z4m4.pdf
General content type : Latin Dissertation
Material type : Text
Format : application/pdf
Size : 19.73 MB
Width : 85
Length : 85