Track: Differential privacy and secure multi-party computation
Session Chair: Alptekin Küpçü
Secure and Scalable Document Similarity on Distributed Databases: Differential Privacy to the Rescue
Phillipp Schoppmann (Humboldt-Universität zu Berlin), Lennart Vogelsang (Humboldt-Universität zu Berlin), Adrià Gascón (The Alan Turing Institute / University of Warwick), and Borja Balle (DeepMind)
Summary: Privacy-preserving collaborative data analysis enables richer models than any single party could learn from its own data. Secure Multi-Party Computation (MPC) offers a robust cryptographic approach to this problem, and indeed several protocols have been proposed for various data analysis and machine learning tasks. However, while the protocols proposed so far are often tailored to a concrete task, they do not exploit properties of the input data that are usually leveraged for scalability when computing in the clear. In this work, we show how to realize these scalability gains when performing secure computation on text data by proposing MPC protocols for feature extraction and similarity computation based on the well-known TF-IDF representation for text documents. Our protocols exploit the sparsity introduced by the feature representation and common statistical properties of the distribution of words in natural language texts. As part of our contribution we propose a novel two-party protocol for secure (batched) sparse inner product, and develop a secure mechanism for extracting IDF coefficients from a distributed database of text documents while guaranteeing Differential Privacy (DP). We show how our protocols can be composed to build a secure distributed k-nearest neighbors classifier. The resulting protocol remains secure even against colluding servers, and we experimentally show that it scales to real-world datasets.
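To make the data flow concrete, here is a minimal plaintext sketch of the two building blocks the summary names: sparse TF-IDF vectors and a sparse inner product whose cost scales with the smaller support. This is not the paper's MPC protocol; it only illustrates, in the clear, the computation the protocol emulates securely (all function names are illustrative, not from the paper).

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Compute sparse TF-IDF vectors (term -> weight dicts) for a corpus.

    Plaintext illustration only; in the paper the IDF coefficients are
    extracted from a distributed database under differential privacy.
    """
    n = len(docs)
    df = Counter()                       # document frequency per term
    for doc in docs:
        df.update(set(doc))
    idf = {t: math.log(n / df[t]) for t in df}
    return [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]

def sparse_inner_product(u, v):
    """Inner product of two sparse vectors.

    Cost is proportional to the smaller support -- the sparsity that the
    paper's batched secure inner-product protocol exploits.
    """
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(t, 0.0) for t, w in u.items())
```

Composing these per document pair (followed by a top-k selection) yields the plaintext analogue of the paper's secure k-nearest-neighbors classifier.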
Patrick Ah-Fat (Imperial College London) and Michael Huth (Imperial College London)
Summary: Computing a function of some private inputs while maintaining the confidentiality of those inputs is an important problem, to which Differential Privacy and Secure Multi-Party Computation can offer solutions under specific assumptions. Research in randomised algorithms aims to improve the privacy of such inputs by randomising the output of a computation while ensuring that large distortions of outputs occur with low probability. But use cases such as e-voting or auctions will not tolerate large distortions at all. Moreover, although Differential Privacy provides convenient solutions for statistical databases, it is not applicable to general functions. We therefore develop a framework for randomising the output of a privacy-preserving computation while guaranteeing that output distortions stay within a specified bound. We build randomisation algorithms and prove that the computed randomisations maximise the inputs' privacy. Experimental work demonstrates significant privacy gains when compared with existing approaches that guarantee distortion bounds.
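The hard distortion constraint can be illustrated with a minimal sketch: uniform noise supported only on an interval of radius `bound` around the true output, so the guarantee |noisy - true| <= bound holds with probability 1 (unlike Laplace noise, whose tails are unbounded). This is an assumed toy mechanism; the paper instead optimises the noise distribution to maximise input privacy under the same constraint.

```python
import random

def bounded_randomise(value, bound, rng=random):
    """Randomise an integer output with a hard distortion guarantee.

    Minimal sketch: uniform noise on [-bound, bound], so the result never
    deviates from `value` by more than `bound`. The paper's algorithms keep
    this constraint but choose the distribution to maximise privacy.
    """
    return value + rng.randint(-bound, bound)
```

For an auction or e-voting tally, this means a reported result can be perturbed for privacy yet is still guaranteed to be within a publicly known tolerance of the true outcome.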
Seung Geol Choi (United States Naval Academy), Dana Dachman-Soled (University of Maryland), Mukul Kulkarni (University of Massachusetts), and Arkady Yerukhimovich (George Washington University)
Summary: In this paper we present a general framework for securely computing approximations of aggregate statistics in a distributed setting (e.g., the number of unique IP addresses across multiple datasets). We combine techniques from Secure Multi-Party Computation (MPC), Differential Privacy (DP), and sketching algorithms to provide a framework which is scalable (can handle input data sizes in the millions), efficient (MPC runtime independent of input data size), and accurate (low multiplicative error in approximation) while preserving the privacy of the input data.
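A plaintext sketch of the pipeline the summary describes: each party compresses its local data into a small, mergeable sketch (here a linear-counting bitmap, an assumed choice for illustration, with an assumed size parameter), the sketches are merged, and Laplace noise is added to the final estimate for differential privacy. In the paper, the merge and noise addition would run inside MPC, which is why its runtime depends only on the sketch size, not the input size; here everything is in the clear to show the data flow.

```python
import hashlib
import math
import random

M = 1024  # sketch size in bits (illustrative parameter, not from the paper)

def sketch(items, m=M):
    """Linear-counting bitmap sketch of one party's local set.

    Each item sets one hash-indexed bit, so the sketch size is fixed
    regardless of how many items a party holds.
    """
    bits = [0] * m
    for x in items:
        h = int(hashlib.sha256(str(x).encode()).hexdigest(), 16) % m
        bits[h] = 1
    return bits

def merge(a, b):
    """Sketches merge by bitwise OR -- the property that lets the secure
    computation operate on sketches instead of raw inputs."""
    return [x | y for x, y in zip(a, b)]

def dp_distinct_estimate(bits, epsilon, rng=random):
    """Linear-counting estimate of the distinct count, plus Laplace(1/epsilon)
    noise (sensitivity taken as 1 here purely for illustration)."""
    m = len(bits)
    zeros = bits.count(0)
    estimate = m * math.log(m / max(zeros, 1))
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return estimate + noise
```

Because only the fixed-size merged sketch enters the final (noisy) estimation step, the expensive part of the computation is independent of the millions of raw records, matching the scalability claim in the summary.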