Adapting to Data Drift in Encrypted Traffic Classification Using Deep Learning

dc.contributor.author: Malekghaini, Navid
dc.date.accessioned: 2023-01-12T21:20:04Z
dc.date.available: 2023-09-30T04:50:05Z
dc.date.issued: 2023-01-12
dc.date.submitted: 2023-01-09
dc.description.abstract: Deep learning models have been shown to achieve high performance in encrypted traffic classification. However, when it comes to production use, multiple factors challenge the performance of these models. The emergence of new protocols, especially at the application layer, as well as updates to previous protocols affect the patterns in input data, making the model's previously learned patterns obsolete. Furthermore, proposed model architectures are usually tested on datasets collected in controlled settings, which makes the reported performances unreliable for production use. In this thesis, we start by studying how the performances of two high-performing state-of-the-art encrypted traffic classifiers change on multiple real-world datasets collected over the course of two years from a major ISP's network, Orange telecom. We investigate the changes in traffic data patterns, highlighting the extent to which these changes, known as data drift, impact the performance of the two models in service-level and application-level classification. We propose best practices to manually adapt model architectures and improve their accuracy in the face of data drift. We show that our best practices are generalizable to other encryption protocols and different levels of labeling granularity. However, designing efficient model architectures and manual architectural adaptations is time-consuming and requires domain expertise. Neural architecture search (NAS) algorithms have been shown to automatically discover efficient models in other domains, such as image recognition and natural language processing. However, the application of NAS is rather unexplored in Encrypted Traffic Classification. We propose AutoML4ETC, a tool to automatically design efficient and high-performing neural architectures for Encrypted Traffic Classification, given a target dataset and corresponding features.
We define three powerful search spaces tailored specifically for the prominent categories of features in state-of-the-art Encrypted Traffic Classification, i.e., packet raw bytes, flow time-series, and flow statistics. We show that a simple search strategy over AutoML4ETC’s search spaces can generate model architectures that outperform the state-of-the-art Encrypted Traffic Classification models on several benchmark datasets, including real-world datasets of TLS and QUIC traffic collected from a major ISP network. In addition to being more accurate, AutoML4ETC’s architectures are significantly more efficient and lighter in terms of the number of parameters. We further showcase the potential of AutoML4ETC by experimenting with state-of-the-art NAS techniques and model ensembles generated from different search spaces. We also use AutoML4ETC to analyze the state of adoption of the QUIC protocol. (en)
dc.identifier.uri: http://hdl.handle.net/10012/19058
dc.language.iso: en (en)
dc.pending: false
dc.publisher: University of Waterloo (en)
dc.subject: Deep Learning (en)
dc.subject: Data Drift (en)
dc.subject: Encrypted Traffic Classification (en)
dc.subject: HTTP/2 (en)
dc.subject: QUIC (en)
dc.subject: Neural Architecture Search (en)
dc.title: Adapting to Data Drift in Encrypted Traffic Classification Using Deep Learning (en)
dc.type: Master Thesis (en)
uws-etd.degree: Master of Mathematics (en)
uws-etd.degree.department: David R. Cheriton School of Computer Science (en)
uws-etd.degree.discipline: Computer Science (en)
uws-etd.degree.grantor: University of Waterloo (en)
uws-etd.embargo.terms: 8 months (en)
uws.contributor.advisor: Boutaba, Raouf
uws.contributor.affiliation1: Faculty of Mathematics (en)
uws.peerReviewStatus: Unreviewed (en)
uws.published.city: Waterloo (en)
uws.published.country: Canada (en)
uws.published.province: Ontario (en)
uws.scholarLevel: Graduate (en)
uws.typeOfResource: Text (en)

Files

Original bundle

Name: Malekghaini_Navid.pdf
Size: 2.08 MB
Format: Adobe Portable Document Format
