Department of Computer Science - News

08.09.2017 10:51

Doctoral Defense: Nedim Šrndić

on Wednesday, 4 October 2017, at 4 p.m. in Room A104, Sand 1, ground floor.

Machine Learning and Security of Non-Executable Files

Reviewer 1: Prof. Dr. Andreas Zell
Reviewer 2: Prof. Dr. Michael Menth


Computer malware is a well-known security threat which, despite the enormous time and effort invested in fighting it, is today more prevalent than ever. Recent years have brought a surge in one particular type: malware embedded in non-executable file formats, e.g., PDF, SWF and various office file formats.
The traditional approach to malware detection (signature matching, heuristics and behavioral profiling) has from its inception been a labor-intensive manual task, always lagging one step behind the attacker. An automated and scalable approach is needed to close the gap between automated malware adaptation and manual malware detection, and machine learning is emerging as a viable solution. Its branch called adversarial machine learning studies the security of machine learning algorithms and the special conditions that arise when machine learning is applied to security.
This talk presents a study of adversarial machine learning in the context of static detection of malware in non-executable file formats, evaluating the effectiveness, efficiency and security of machine learning applications in this setting. To this end, three data-driven detection methods are presented, developed using very large, high-quality datasets. PJScan detects malicious PDF files based on lexical properties of embedded JavaScript code and is the fastest method published to date. SL2013 extends coverage to all PDF files, regardless of whether they contain JavaScript, by analyzing the hierarchical structure of PDF logical building blocks, and demonstrates excellent performance in a novel long-term realistic experiment. Finally, Hidost generalizes the hierarchical-structure-based feature set to become the first machine-learning-based malware detector operating on multiple file formats. In a comprehensive experimental evaluation on PDF and SWF, it outperforms other academic methods and commercial antivirus systems in detection effectiveness.
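To make the idea of hierarchical-structure features more concrete, here is a minimal Python sketch. It is not the SL2013 or Hidost implementation. It assumes the document has already been parsed into a tree of nested dictionaries and lists (a hypothetical stand-in for the PDF object graph) and simply counts every root-to-node path of dictionary keys. The function names `structural_paths` and `path_features` and the example tree are illustrative only.

```python
from collections import Counter
from typing import Any, Iterator, Tuple

def structural_paths(node: Any, prefix: Tuple[str, ...] = ()) -> Iterator[Tuple[str, ...]]:
    """Yield the key path from the root to every dictionary entry in the tree."""
    if isinstance(node, dict):
        for key, child in node.items():
            path = prefix + (key,)
            yield path
            yield from structural_paths(child, path)
    elif isinstance(node, list):
        # Array elements inherit their parent's path; indices are not recorded.
        for child in node:
            yield from structural_paths(child, prefix)
    # Leaf values (strings, numbers) end the path and add nothing.

def path_features(doc: Any) -> Counter:
    """Count occurrences of each structural path, joined with '/'."""
    return Counter("/".join(path) for path in structural_paths(doc))

# Hypothetical, already-parsed toy document (not real PDF syntax).
doc = {
    "Root": {
        "Type": "Catalog",
        "OpenAction": {"S": "JavaScript", "JS": "app.alert(1)"},
        "Pages": {"Kids": [{"Type": "Page"}, {"Type": "Page"}]},
    }
}
features = path_features(doc)
print(features["Root/Pages/Kids/Type"])  # 2
```

Each distinct path string becomes one feature dimension, so documents of any size map into a shared feature space that a standard classifier can consume. Real PDF files also contain indirect references and cycles, which this toy version does not handle.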
Furthermore, the talk presents a framework for the security evaluation of machine learning classifiers, applied in a case study on an independent PDF malware detector. The results show that an adversary able to manipulate part of the classifier’s feature set can disguise malware so that it appears benign to the classifier, with a high success rate.
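The kind of evasion described above can be illustrated on a toy linear classifier. The sketch below is hypothetical and much simpler than the framework presented in the talk: it assumes a linear decision function with known weights, lets the attacker change only a given subset of features, and only ever increases feature values (an assumption often made because deleting content may break the malicious payload). All names, weights and values are illustrative.

```python
from typing import List, Sequence

def linear_score(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Decision value of a linear classifier; a value > 0 means 'malicious'."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def evade(
    w: Sequence[float],
    b: float,
    x: Sequence[float],
    modifiable: Sequence[int],
    max_value: float = 10.0,
) -> List[float]:
    """Greedy, addition-only evasion of a linear classifier.

    Only features listed in `modifiable` may change, and only upward
    (capped at `max_value`). Features with the most negative weights are
    raised first, one unit at a time, until the score drops to <= 0 or
    no helpful modifiable feature is left.
    """
    x_adv = [float(v) for v in x]
    # Attacker-controlled features that lower the score, strongest first.
    helpful = sorted((i for i in modifiable if w[i] < 0), key=lambda i: w[i])
    for i in helpful:
        while x_adv[i] < max_value and linear_score(w, b, x_adv) > 0:
            x_adv[i] = min(x_adv[i] + 1.0, max_value)
        if linear_score(w, b, x_adv) <= 0:
            break  # classified as benign: evasion succeeded
    return x_adv

# Hypothetical toy model: features 0-1 indicate malice, 2-3 look benign.
w = [2.0, 1.5, -0.5, -1.0]
b = -1.0
x_malicious = [1.0, 1.0, 0.0, 0.0]
x_adv = evade(w, b, x_malicious, modifiable=[2, 3])

print(linear_score(w, b, x_malicious))   # 2.5
print(x_adv, linear_score(w, b, x_adv))  # [1.0, 1.0, 0.0, 3.0] -0.5
```

Here the attacker only needs to raise feature 3 from 0 to 3 to push the score from 2.5 to -0.5, while the features carrying the malicious signal (0 and 1) stay untouched. This mirrors, in miniature, why control over even part of the feature set can be enough to evade detection.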