Hallway Track: Meet me in the Hallway! 02:10 PM to 03:00 PM (50 minutes)
Session chair: Marc Juarez
- Tik-Tok: The Utility of Packet Timing in Website Fingerprinting Attacks
Mohammad Saidur Rahman (Rochester Institute of Technology), Payap Sirinam (Navaminda Kasatriyadhiraj Royal Air Force Academy), Nate Mathews (Rochester Institute of Technology), Kantha Girish Gangadhara (Rochester Institute of Technology), and Matthew Wright (Rochester Institute of Technology)
A passive local eavesdropper can leverage Website Fingerprinting (WF) to deanonymize the web browsing activity of Tor users. The value of timing information to WF has often been discounted in recent works due to the volatility of low-level timing information. In this paper, we more carefully examine the extent to which packet timing can be used to facilitate WF attacks. In particular, we propose a new set of timing-related features based on burst-level characteristics, evaluate the effectiveness of raw timing and directional timing which is a combination of raw timing and direction in a deep-learning-based WF attack. Our closed-world evaluation shows that directional timing performs best in most of the setting achieving: (i) 98.40% in undefended Tor traffic; (ii) 93.50% on WTF-PAD traffic, several points higher than when only directional information is used; and (iii) 64.70% against onion sites, 12% higher than using only direction. To further investigate the value of timing information, we perform an information leakage analysis on the handcrafted features. Our results show that while timing features leak less information than directional features, the information contained in each feature is mutually exclusive to one another and thus may improve the robustness of a classifier.
- Website Fingerprinting with Website Oracles
Tobias Pulls (Karlstad University) and Rasmus Dahlberg (Karlstad University)
Website Fingerprinting (WF) attacks are a subset of traffic analysis attacks where a local passive attacker attempts to infer which websites a target victim is visiting over an encrypted tunnel, such as the anonymity network Tor. We introduce the security notion of a Website Oracle (WO) that gives a WF attacker the capability to determine whether a particular monitored website was among the websites visited by Tor clients at the time of a victim’s trace. Our simulations show that combining a WO with a WF attack—which we refer to as a WF+WO attack—significantly reduces false positives for about half of all website visits and for the vast majority of websites visited over Tor. The measured false positive rate is on the order one false positive per million classified website trace for websites around Alexa rank 10,000. Less popular monitored websites show orders of magnitude lower false positive rates.
We argue that WOs are inherent to the setting of anonymity networks and should be an assumed capability of attackers when assessing WF attacks and defenses. Sources of WOs are abundant and available to a wide range of realistic attackers, e.g., due to the use of DNS, OCSP, and real-time bidding for online advertisement on the Internet, as well as the abundance of middleboxes and access logs. Access to a WO indicates that the evaluation of WF defenses in the open world should focus on the highest possible recall an attacker can achieve. Our simulations show that augmenting the Deep Fingerprinting WF attack by Sirinam et al. with access to a WO significantly improves the attack against five state-of-the-art WF defenses, rendering some of them largely ineffective in this new WF+WO setting.
- Effective writing style transfer via combinatorial paraphrasing
Tommi Gröndahl (Aalto University) and N. Asokan (University of Waterloo)
Stylometry can be used to profile or deanonymize authors against their will based on writing style. Style transfer provides a defence. Current techniques typically use either encoder-decoder architectures or rule-based algorithms. Crucially, style transfer must reliably retain original semantic content to be actually deployable. We conduct a multifaceted evaluation of three state-of-the-art encoder-decoder style transfer techniques, and show that all fail at semantic retainment. To mitigate this problem we propose ParChoice: a technique based on the combinatorial application of multiple paraphrasing algorithms. ParChoice strongly outperforms the encoder-decoder baselines in semantic retainment. Additionally, compared to baselines that achieve non-negligible semantic retainment, ParChoice has superior style transfer performance. Furthermore, when compared to two state-of-the-art rule-based style transfer techniques, ParChoice has markedly better semantic retainment. Combining ParChoice with the best performing rule-based baseline (Mutant-X) also reaches the highest style transfer success on the Brennan-Greenstadt and Extended-Brennan-Greenstadt corpora, with much less impact on original meaning than when using the rule-based baseline techniques alone. Finally, we highlight a critical problem that afflicts all current style transfer techniques: the adversary can use the same technique for thwarting style transfer via adversarial training. We show that adding randomness to style transfer helps to mitigate the effectiveness of adversarial training.
- Multi-x: Identifying Multiple Authors from Source Code Files
Mohammed Abuhamad (University of Central Florida), Tamer Abuhmed (Sungkyunkwan University), Aziz Mohaisen (University of Central Florida), and DaeHun Nyang (Ewha Womans University)
Most authorship identification schemes assume that code samples are written by a single author. However, real software projects are typically the result of a team effort, making it essential to consider a fine-grained multi-author identification in a single code sample, which we address with Multi-χ. Multi-χ leverages a deep learning-based approach for multi-author identification in source code, is lightweight, uses a compact representation for efficiency, and does not require any code parsing, syntax tree extraction, nor feature selection. In Multi-χ, code samples are divided into small segments, which are then represented as a sequence of n-dimensional term representation. The sequence is fed into an RNN-based verification model to assist a segment integration process that integrates positively verified segments, i.e., integrates segments that have a high probability of being written by one author. Finally, the resulting segments from the integration process are represented using word2vec or TF-IDF and fed into the identification model. We evaluate Multi-χwith several Github projects (Caffe, Facebook’s Folly, Tensor-Flow, etc.) and show remarkable accuracy. For example, Multi-χ achieves an authorship example-based accuracy(A-EBA) of 86.41% and per-segment authorship identification of 93.18% for identifying 562 programmers. We examine the performance against multiple dimensions and design choices, and demonstrate its effectiveness.