Track: Privacy-preserving Machine Learning
Session chair: Ananth Raghunathan
Payman Mohassel (Facebook), Mike Rosulek (Oregon State University),and Ni Trieu (Oregon State University)
Clustering is a common technique for data analysis, which aims to partition data into similar groups. When the data comes from different sources, it is highly desirable to maintain the privacy of each database. In this work, we study a popular clustering algorithm (K-means) and adapt it to the privacy-preserving context.
Specifically, to construct our privacy-preserving clustering algorithm, we first propose an efficient batched Euclidean squared distance computation protocol in the amortizing setting, when one needs to compute the distance from the same point to other points. Furthermore, we construct a customized garbled circuit for computing the minimum value among shared values. We believe these new constructions may be of independent interest.
We implement and evaluate our protocols to demonstrate their practicality and show that they are able to train datasets that are much larger and faster than in the previous work. The numerical results also show that the proposed protocol achieve almost the same accuracy compared to a K-means plain-text clustering algorithm.
Anders Dalskov (Aarhus University), Daniel Escudero (Aarhus University), and Marcel Keller (Data61, CSIRO)
We investigate two questions in this paper:
First, we ask to what extent ``MPC friendly'' models are already supported by major Machine Learning frameworks such as TensorFlow or PyTorch. Prior works provide protocols that only work on fixed-point integers and specialized activation functions, two aspects that are not supported by popular Machine Learning frameworks, and the need for these specialized model representations means that it is hard, and often impossible, to use e.g., TensorFlow to design, train and test models that later have to be evaluated securely.
Second, we ask to what extent the functionality for evaluating Neural Networks already exists in general-purpose MPC frameworks. These frameworks have received more scrutiny, are better documented and supported on more platforms. Furthermore, they are typically flexible in terms of the threat model they support. In contrast, most secure evaluation protocols in the literature are targeted to a specific threat model and their implementations are only a ``proof-of-concept'', making it very hard for their adoption in practice.
We answer both of the above questions in a positive way: We observe that the quantization techniques supported by both TensorFlow, PyTorch and MXNet can provide models in a representation that can be evaluated securely; and moreover, that this evaluation can be performed by a general purpose MPC framework.
We perform extensive benchmarks to understand the exact trade-offs
between different corruption models, network sizes and efficiency.
These experiments provide an interesting insight into cost between
active and passive security, as well as honest and dishonest majority.
Our work shows then that the separating line between existing ML
frameworks and existing MPC protocols may be narrower than implicitly
suggested by previous works.
The k-nearest neighbors (kNN) classifier predicts a class of a query, q, by taking the majority class of its k neighbors in an existing (already classified) database, S. In secure kNN, q and S are owned by two different parties and q is classified without sharing data. In this work we present a classifier based on kNN, that is more efficient to implement with homomorphic encryption (HE). The efficiency of our classifier comes from a relaxation we make to consider κ nearest neighbors for κ ≈ k with probability that increases as the statistical distance between Gaussian and the distribution of the distances from q to S decreases. We call our classifier k-ish Nearest Neighbors (k-ish NN). For the implementation we introduce double-blinded coin-toss where the bias and output of the toss are encrypted. We use it to approximate the average and variance of the distances from q to S in a scalable circuit whose depth is independent of |S|. We believe these to be of independent interest. We implemented our classifier in an open source library based on HElib and tested it on a breast tumor database. Our classifier has accuracy and running time comparable to current state of the art (non-HE) MPC solution that have better running time but worse communication complexity. It also has communication complexity similar to naive HE implementation that have worse running time.
Privacy-preserving machine learning (PPML) via Secure Multi-party Computation (MPC) has gained momentum in the recent past. Assuming a minimal network of pair-wise private channels, we propose an efficient four-party PPML framework over rings, FLASH, the first of its kind in the regime of PPML framework, that achieves the strongest security notion of Guaranteed Output Delivery (all parties obtain the output irrespective of adversary's behaviour). The state of the art ML frameworks such as ABY3 by Mohassel et.al (ACM CCS'18) and SecureNN by Wagh et.al (PETS'19) operate in the setting of 3 parties with one malicious corruption but achieve the weaker security guarantee of abort. We demonstrate PPML with real-time efficiency, using the following custom-made tools that overcome the limitations of the aforementioned state-of-the-art: (a) Dot product, which is independent of the vector size unlike the state-of-the-art ABY3, SecureNN and ASTRA by Chaudhari et.al (ACM CCSW'19), all of which have linear dependence on the vector size. (b) Truncation, which is constant round and free of circuits like Ripple Carry Adder (RCA), unlike ABY3 which uses these circuits and has round complexity of the order of depth of these circuits. We then exhibit the application of our FLASH framework in the secure server-aided prediction of vital algorithms-- Linear Regression, Logistic Regression, Deep Neural Networks, and Binarized Neural Networks. We substantiate our theoretical claims through improvement in benchmarks of the aforementioned algorithms when compared with the current best framework ABY3. All the protocols are implemented over a 64-bit ring in LAN and WAN. Our experiments demonstrate that, for MNIST dataset, the improvement (in terms of throughput) ranges from 11× to 1395× over LAN and WAN together.