Toward Automated DNS Tampering Detection Using Machine Learning

Publication: The 2024 Workshop on Free and Open Communications on the Internet

Abstract:

DNS manipulation is one of the most prevalent and effective techniques for censoring Internet access and interfering with users’ online activities worldwide. Reliable detection of DNS tampering is crucial but challenging, due to evolving censorship tactics and the lack of complete ground truth data. In this paper, we demonstrate the power of machine learning (ML) in addressing these challenges by applying supervised and unsupervised models to recent global DNS measurement data collected by the Open Observatory of Network Interference (OONI). Our models achieve high accuracy in learning expert-defined heuristics for DNS tampering, and they uncover new manipulation instances missed by rule-based approaches.

Through an extensive analysis of different training data volumes and time windows ranging from one to 24 months, we provide key insights into how the quantity and diversity of training data, as well as evolving censorship behaviors, affect model performance over time. Notably, our ML detector can enhance traditional heuristics by identifying DNS fingerprints with high accuracy and confidence. These findings underscore the effectiveness of ML techniques for detecting global DNS manipulation at scale while adapting to emerging censorship tactics.

To foster future research, we will release our regularly updated models, enabling the development of robust, sustainable censorship detection systems capable of withstanding the dynamic landscape of Internet censorship worldwide. Our work paves the way for more proactive interventions that safeguard Internet freedom globally.