Project Natick is a full-scale, fully operational datacenter module, installed underwater in the North Sea, off the Scottish coast. Powered by renewable energy, Project Natick is a test of the feasibility of underwater datacenters. We’ve taken some of the traffic traveling between the Natick datacenter and Microsoft Research headquarters in Redmond, Washington, USA, and secured that traffic with an encrypted network tunnel protected with post-quantum cryptography.
Project Natick was the perfect testbed for this work: although it was built to mimic a Microsoft datacenter, Natick does not handle any critical business or customer data. Because we could not physically access the servers and network infrastructure inside the Natick pressure vessel to set up and manage the post-quantum (PQ) protected tunnel, the experiment more accurately reflected the real world, where hand-configuring devices in massive datacenters worldwide would be infeasible.
Quantum computers are coming. The exact timeline is uncertain, but a quantum computer powerful enough to break today’s asymmetric cryptography may come online in 10 to 15 years. Such a cryptographically relevant quantum computer would allow adversaries to break the encryption and digital signatures that protect today’s internet communications. So, before that happens, the entire world needs to start using post-quantum cryptography: cryptography designed to be secure against quantum attackers.
The migration will take a long time: all the applications, services, and infrastructure must be updated to support the new algorithms, and new credentials must be issued where they’re needed. While that migration is underway, we can use encrypted network tunnels to protect traffic from software and devices that are not yet fully protected.
Microsoft already operates encrypted tunnels between its datacenters to protect network traffic in transit outside a datacenter’s physical boundaries. The great-circle distance between the underwater datacenter and Microsoft Research headquarters in Redmond is approximately 4,300 miles, which let us set up an experiment facing real-world challenges similar to those of connections between production datacenters.
The Natick pressure vessel contains several racks of servers, all connected by a network inside the vessel. This network connects to the Microsoft global network through a set of underwater fiber-optic cables that run to the facility on shore. Connections between sites on the Microsoft global network are secured with classical cryptography to protect the network traffic traveling between sites.
One of the servers runs our modified version of OpenVPN; we call it our “router node.” The router node connects to another server in Redmond to establish a tunnel between the two sites that is encrypted with post-quantum cryptography. The router node is connected both to the main network in the vessel and to a second virtual local area network (VLAN), which we call the “post-quantum VLAN.” We then configured the networking hardware in the vessel to place several of the other servers on this VLAN, and we can change the number of servers on the post-quantum VLAN remotely. The router node routes all traffic from these servers across the tunnel to Redmond, where it continues to its final destination. Outside traffic headed back to these servers is likewise routed to the router node in Redmond, back across the tunnel, and into the vessel.
The main network in the vessel is connected normally to the Microsoft global network. In fact, the tunnel uses the regular network connection to route encrypted traffic between Redmond and Scotland. The typical round-trip time on this connection is approximately 180 milliseconds.
Each router node runs our modified version of OpenVPN in a virtual machine. The session key for data encryption is negotiated using a hybrid key exchange, which combines a post-quantum key exchange algorithm with a classical one. This pairs the time-tested security of the classical algorithm against conventional attackers with the quantum resistance of the post-quantum algorithm. In our first deployment, we combined the post-quantum Supersingular Isogeny Diffie-Hellman (SIDH) algorithm, as it existed in March 2018, with classical Elliptic Curve Diffie-Hellman (ECDH) over the NIST P-256 curve to derive the symmetric session key used to encrypt data traffic with AES-256. As of 27th March 2020, we have updated to the latest versions of the algorithms and to OpenVPN 2.4.8, and we now combine Supersingular Isogeny Key Encapsulation (SIKE), using the SIKEp434 parameter set, with classical ECDH (still over the NIST P-256 curve) to derive the AES-256 session key. With a configuration change, we can use any of the key exchange algorithms supported by the Open Quantum Safe (OQS) project’s fork of OpenSSL.
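To make the idea of a hybrid key exchange concrete, here is a minimal sketch of how two shared secrets can be merged into one AES-256 key. It assumes a concatenate-then-KDF combiner using HKDF-SHA256 (RFC 5869), a common pattern in hybrid designs; it is not necessarily the exact construction used inside OQS’s OpenSSL or our OpenVPN build. The function names `hkdf_sha256` and `derive_session_key` are hypothetical, and because the Python standard library has neither ECDH nor SIKE, random bytes stand in for the two shared secrets.

```python
import hashlib
import hmac
import os


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF with SHA-256 (RFC 5869): extract a pseudorandom key, then expand it."""
    hash_len = hashlib.sha256().digest_size
    if not salt:
        salt = b"\x00" * hash_len  # RFC 5869: missing salt means HashLen zero bytes
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()  # extract step
    okm, block, counter = b"", b"", 1
    while len(okm) < length:  # expand step: T(i) = HMAC(PRK, T(i-1) | info | i)
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_session_key(classical_secret: bytes, pq_secret: bytes, context: bytes) -> bytes:
    """Hypothetical hybrid combiner: both fixed-length secrets feed one KDF.

    The intent is that the 32-byte output stays secret as long as at least
    one of the two inputs does (assuming the KDF behaves like a random function).
    """
    ikm = classical_secret + pq_secret  # concatenate classical, then post-quantum
    return hkdf_sha256(ikm, salt=b"", info=b"hybrid-session-key|" + context, length=32)


# Usage with placeholder secrets (random bytes stand in for real ECDH and SIKE outputs).
ecdh_secret = os.urandom(32)
pq_secret = os.urandom(16)
key = derive_session_key(ecdh_secret, pq_secret, b"natick-tunnel")
```

The design choice a hybrid buys is insurance in both directions: if the newer post-quantum algorithm turns out to be weak, the classical secret still protects the key against conventional attackers, and vice versa against a quantum attacker.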
As is standard practice, session keys are regenerated regularly while the tunnel is running; currently we schedule a new key exchange once an hour. Data keeps flowing while the key exchange runs, so there is no interruption to traffic: data continues to pass using the previous session key until the exchange completes, at which point the router nodes switch to the new session key. Re-keying therefore does not cause any of the latency observed during initial tunnel setup.
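The hitless re-keying described above can be modeled as a small state machine on each endpoint. This is a hypothetical sketch, not code from our OpenVPN build: the class `RekeyingTunnel` and its methods are illustrative names, and the key exchange itself is abstracted away. (In stock OpenVPN, the renegotiation interval is controlled by the `reneg-sec` option, which defaults to 3600 seconds.) The sketch also assumes the previous key is retained briefly so packets already in flight can still be decrypted.

```python
from typing import List, Optional

REKEY_INTERVAL_S = 3600.0  # one new key exchange per hour


class RekeyingTunnel:
    """Hypothetical model of one tunnel endpoint that re-keys without pausing traffic."""

    def __init__(self, initial_key: bytes, now: float) -> None:
        self.current = initial_key              # key used to encrypt outgoing packets
        self.previous: Optional[bytes] = None   # kept to decrypt packets still in flight
        self.established_at = now
        self.rekey_in_progress = False

    def encryption_key(self) -> bytes:
        # Traffic always uses the newest *completed* key, so data keeps
        # flowing while a key exchange is still running.
        return self.current

    def decryption_keys(self) -> List[bytes]:
        return [k for k in (self.current, self.previous) if k is not None]

    def should_start_rekey(self, now: float) -> bool:
        """Return True exactly once when the interval elapses; the caller then runs the exchange."""
        due = now - self.established_at >= REKEY_INTERVAL_S
        if due and not self.rekey_in_progress:
            self.rekey_in_progress = True
            return True
        return False

    def finish_rekey(self, new_key: bytes, now: float) -> None:
        """Switch to the freshly negotiated key once the exchange completes."""
        self.previous, self.current = self.current, new_key
        self.established_at = now
        self.rekey_in_progress = False
```

For example, an endpoint created at time 0 keeps encrypting with its initial key through the whole exchange that starts at the one-hour mark, and only switches when `finish_rekey` is called with the new key.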
The post-quantum VLAN is assigned its own IP addresses, and the Microsoft network is configured to deliver traffic destined for those addresses to the router node in Redmond. That router node encrypts the traffic and sends it inside the tunnel across the global network to the pressure vessel in Scotland, where the router node there decrypts it and places it on the VLAN. Return traffic is encrypted by the vessel’s router node, sent back inside the tunnel across the global network to the router node in Redmond, decrypted there, and forwarded onward normally.
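The forwarding decisions at the two router nodes can be summarized as a small function. This is a simplified, hypothetical sketch: the subnet `192.0.2.0/24` is a documentation range (RFC 5737) standing in for the VLAN’s real addresses, the action strings are made-up labels, and in the actual deployment these decisions are made by network configuration and OpenVPN rather than by code like this.

```python
import ipaddress

# Hypothetical address block for the post-quantum VLAN (a documentation range).
PQ_VLAN = ipaddress.ip_network("192.0.2.0/24")


def forward(site: str, src: str, dst: str) -> str:
    """Return the (hypothetical) action a router node takes for a plaintext packet."""
    src_pq = ipaddress.ip_address(src) in PQ_VLAN
    dst_pq = ipaddress.ip_address(dst) in PQ_VLAN
    if site == "redmond":
        # The network delivers PQ-VLAN-bound traffic here: encrypt it into the tunnel.
        # Anything else (e.g. traffic just decrypted from the tunnel) goes on normally.
        return "encrypt-into-tunnel" if dst_pq else "forward-normally"
    if site == "vessel":
        # Outbound traffic from PQ VLAN servers must leave through the tunnel.
        if src_pq and not dst_pq:
            return "encrypt-into-tunnel"
        # Decrypted traffic for the VLAN is placed on it; the rest uses the main network.
        return "deliver-on-vlan" if dst_pq else "forward-normally"
    raise ValueError(f"unknown site: {site!r}")
```

A request from a VLAN server to an outside host is therefore tunneled at the vessel and forwarded normally in Redmond, while the reply is tunneled in Redmond and delivered onto the VLAN at the vessel.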
We have measured a maximum throughput of 250 Mbits/sec over the tunnel, well below the 2 to 3 Gbits/sec measured capacity of the underlying link. This is consistent with results from running an unmodified version of OpenVPN over the same link using only classical cryptography. The limit appears to be a known characteristic of tunnels running entirely in software on commodity hardware, not a consequence of adding the post-quantum key exchange.
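As a quick check on how far below link capacity the tunnel runs, dividing the measured tunnel maximum by each end of the link’s measured range gives the fraction of the link the tunnel can use:

```latex
\frac{250\ \text{Mbit/s}}{3\ \text{Gbit/s}} = \frac{250}{3000} \approx 8.3\%,
\qquad
\frac{250\ \text{Mbit/s}}{2\ \text{Gbit/s}} = \frac{250}{2000} = 12.5\%
```

In other words, the software tunnel uses roughly one-twelfth to one-eighth of the available link capacity.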
While the tunnel is operating, its latency is comparable to that of the underlying connection when that connection is operating normally. Variation in round-trip ping times is consistently less than 1 millisecond on a link with a typical round-trip ping time of 180 milliseconds.
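One way to check a “less than 1 millisecond” spread claim is to summarize a batch of round-trip samples. The text does not say which statistic was used, so this sketch assumes the max-minus-min range as the pass/fail measure and also reports the sample standard deviation. The function names are hypothetical, and collecting the samples (for example by parsing `ping` output) is left out.

```python
import statistics
from typing import Dict, Sequence


def rtt_spread_ms(samples_ms: Sequence[float]) -> Dict[str, float]:
    """Summarize how much round-trip times vary across ping samples (all values in ms)."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two RTT samples")
    return {
        "mean": statistics.mean(samples_ms),
        "stdev": statistics.stdev(samples_ms),       # sample standard deviation
        "range": max(samples_ms) - min(samples_ms),  # max minus min
    }


def within_jitter_budget(samples_ms: Sequence[float], budget_ms: float = 1.0) -> bool:
    """True if the max-minus-min spread of the RTT samples is below the budget."""
    return rtt_spread_ms(samples_ms)["range"] < budget_ms
```

Using the range rather than the standard deviation is the stricter choice, since a single outlier sample is enough to exceed the budget.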
We are currently using the tunnel to run volunteer computing workloads from BOINC, the Berkeley Open Infrastructure for Network Computing, on five nodes allocated to the PQ VLAN. Input data for volunteer jobs is downloaded over the tunnel and processed, and the results are uploaded back via the tunnel. Typical daily transfer over the tunnel is between 300 and 600 megabytes, not counting spikes due to operating system updates. Because these workloads are computation-heavy rather than communication-heavy, we would not expect them to strain the tunnel’s bandwidth.
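A little arithmetic shows why this workload is light. The sketch below assumes decimal megabytes (10^6 bytes) and, for the average, spreads the daily volume evenly over 24 hours; `daily_transfer_load` is a hypothetical helper. At the top of the range, 600 MB per day averages about 0.056 Mbit/s, roughly 0.02 percent of the 250 Mbit/s tunnel maximum, and would take about 19 seconds at full tunnel throughput.

```python
from typing import Tuple

SECONDS_PER_DAY = 24 * 60 * 60
TUNNEL_PEAK_MBIT_S = 250.0  # maximum throughput measured over the tunnel


def daily_transfer_load(megabytes_per_day: float) -> Tuple[float, float, float]:
    """Return (average Mbit/s, fraction of tunnel peak, seconds needed at peak)."""
    bits = megabytes_per_day * 1e6 * 8          # assumes decimal megabytes (10**6 bytes)
    avg_mbit_s = bits / SECONDS_PER_DAY / 1e6   # volume spread evenly over 24 hours
    fraction_of_peak = avg_mbit_s / TUNNEL_PEAK_MBIT_S
    seconds_at_peak = bits / (TUNNEL_PEAK_MBIT_S * 1e6)
    return avg_mbit_s, fraction_of_peak, seconds_at_peak


for mb in (300, 600):
    avg, frac, secs = daily_transfer_load(mb)
    print(f"{mb} MB/day: {avg:.3f} Mbit/s average, {frac:.4%} of peak, {secs:.1f} s at peak")
```

Real traffic is burstier than an even spread, but even compressed into a single burst the daily volume occupies the tunnel for well under a minute.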
We have already released a post-quantum cryptography-enabled Virtual Private Network (VPN) application based on OpenVPN, intended to protect connections from remote workers back to the home office as their traffic transits the internet. But encrypted tunnels like these are also used to protect the links between datacenters as data transits between them.