Orca: FSS-based Secure Training and Inference with GPUs

2024 IEEE Symposium on Security and Privacy

Secure two-party computation (2PC) allows two parties to compute any function on their private inputs without revealing those inputs to each other. In the offline/online model for 2PC, correlated randomness, which is independent of all inputs to the computation, is generated in a preprocessing (offline) phase, and this randomness is then consumed in the online phase once the parties' inputs become available. Most 2PC works focus on optimizing the online time, as this overhead lies on the critical path. A recent paradigm for obtaining efficient 2PC protocols with low online cost is based on the cryptographic technique of function secret sharing (FSS). We build an end-to-end system, ORCA, that accelerates the computation of FSS-based 2PC protocols with GPUs. Next, we observe that the main performance bottleneck in such accelerated protocols is storage (due to the large amount of correlated randomness), and we design new FSS-based 2PC protocols for several key functionalities in ML that reduce storage by up to 5×. Compared to the prior state of the art in secure training accelerated with GPUs in the same computation model (PIRANHA, USENIX Security 2022), ORCA has 4% higher accuracy, 98× less communication, and is 26× faster on CIFAR-10. Moreover, maintaining training accuracy over fixed-point arithmetic requires stochastic truncations, and all prior works on secure fixed-point training (including PIRANHA) use insecure protocols for them. We provide the first secure protocol for stochastic truncations and build on it to give the first evaluation of training with end-to-end security. For secure ImageNet inference, ORCA achieves sub-second latency for VGG-16 and ResNet-50 and outperforms the state of the art by 8–103×.
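As background for the offline/online split, the sketch below walks through the classic Beaver-triple protocol for secure multiplication over the ring Z_{2^64}: the triple (a, b, c = a·b) is the input-independent correlated randomness produced offline, and the online phase consumes it with a single exchange of masked values. This is a minimal textbook illustration of preprocessing-based 2PC, not ORCA's FSS-based protocols; the ring choice and all names are assumptions made for the example.

    import secrets

    MOD = 2**64  # ring Z_{2^64}, a common choice in 2PC for ML

    def share(x):
        # Additive secret sharing: x = x0 + x1 (mod 2^64).
        x0 = secrets.randbelow(MOD)
        return x0, (x - x0) % MOD

    # Offline phase: a dealer samples a Beaver triple (a, b, c = a*b)
    # and hands one share of each value to each party.
    a, b = secrets.randbelow(MOD), secrets.randbelow(MOD)
    a_sh, b_sh, c_sh = share(a), share(b), share((a * b) % MOD)

    # Online phase: the parties hold shares of the actual inputs x and y.
    x, y = 1234, 5678
    x_sh, y_sh = share(x), share(y)

    # Each party locally masks its input shares with its triple shares...
    e_sh = [(x_sh[i] - a_sh[i]) % MOD for i in range(2)]
    f_sh = [(y_sh[i] - b_sh[i]) % MOD for i in range(2)]
    # ...and the parties exchange them to reconstruct e = x - a and
    # f = y - b, which leak nothing because a and b are uniform masks.
    e, f = sum(e_sh) % MOD, sum(f_sh) % MOD

    # Each party derives its share of z = x*y from the triple;
    # party 0 additionally adds the public term e*f.
    z_sh = [(c_sh[i] + e * b_sh[i] + f * a_sh[i]
             + (e * f if i == 0 else 0)) % MOD for i in range(2)]
    assert sum(z_sh) % MOD == (x * y) % MOD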
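The FSS abstraction itself splits a function f into two keys whose evaluations add up to f(x) at every point, while each key on its own hides f. The toy sketch below realizes this interface by additively sharing f's entire truth table; it is correct and hiding, but its key size is exponential in the input length, which is precisely what practical FSS schemes (distributed point and comparison functions) avoid by compressing keys with a PRG. The 8-bit domain and all names here are illustrative assumptions, not the paper's constructions.

    import secrets

    MOD = 2**64
    N = 1 << 8  # toy 8-bit input domain; real FSS keys are succinct

    def fss_gen(f):
        # Additively share the full truth table of f; each key alone
        # is a uniformly random table and reveals nothing about f.
        k0 = [secrets.randbelow(MOD) for _ in range(N)]
        k1 = [(f(x) - k0[x]) % MOD for x in range(N)]
        return k0, k1

    def fss_eval(key, x):
        return key[x]  # each party evaluates locally, no interaction

    # Correctness: Eval(k0, x) + Eval(k1, x) = f(x) at every point x.
    f = lambda x: int(x >= 100)  # a comparison-style predicate, as in ReLU
    k0, k1 = fss_gen(f)
    for x in (3, 100, 200):
        assert (fss_eval(k0, x) + fss_eval(k1, x)) % MOD == f(x)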
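Finally, stochastic truncation has simple plaintext semantics: when dropping the s low bits of a fixed-point value x, round up with probability proportional to the discarded bits, so the result equals x/2^s in expectation; deterministic flooring instead introduces a downward bias that compounds over many gradient updates. The sketch below shows only this plaintext behavior under assumed names, not the paper's secure protocol for it.

    import secrets

    def stochastic_truncate(x, s):
        # Drop the s low bits, rounding up with probability
        # (x mod 2^s) / 2^s, so E[output] = x / 2^s exactly
        # for nonnegative x.
        low = x & ((1 << s) - 1)  # the bits being discarded
        round_up = secrets.randbelow(1 << s) < low
        return (x >> s) + int(round_up)

    # Averaging many truncations of the same value approaches x / 2^s.
    x, s = 1_000_003, 12
    est = sum(stochastic_truncate(x, s) for _ in range(10_000)) / 10_000
    print(est, x / 2**s)  # e.g. ~244.14 vs 244.141357...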