Thoughts from an Outsider - Part 1


Update: After some interesting discussions on LinkedIn, I will be sharing more thoughts in another post (Part 2), especially related to why P4 + PCIe might have been a good choice. For now, here's a good post from Tom Herbert on the topic: P4 vs DPDK vs eBPF/XDP. Also, keep in mind:

1. My original criticism was about why so much time was spent on the P4 choice (I'm starting to understand).
2. The eBPF comparison to DTrace (which doesn't consider eBPF as an offloader and packet processor, or eBPF's latest features and portability).
3. The use of VirtIO for their virtual machines.

I'll write more about all of this.

I always wanted to study a bit further what Oxide Computer Company is doing (the last time I played with OpenSolaris, or illumos in their case, Jonathan Schwartz was still Sun Microsystems' CEO =D). This was done in my free time, with no particular agenda.

I have no affiliation with anything competing with them, and I would, for sure, love to have one of their racks at home :P. I would likely use their VMM/hypervisor, but with Linux guest VMs only, and I would love to play with their performance and management data-gathering APIs.

<aside> ☝

For those who don't know me, I spent several years at Sun Microsystems working on SPARC, UltraSPARC CMT, HPC, and InfiniBand, followed by time at IBM on s390x systems. I'm well-versed in hardware architecture, having managed environments like z/VM, z/OS UNIX System Services, Solaris, and Beowulf clusters, along with other legacy systems, even though I now focus on OS internals and runtime security.

</aside>

<aside> ⏰

I dove into Oxide’s documentation tonight (a welcome break from my own codebase and the real-time security grind), and here are my thoughts for anyone curious.

</aside>

Helios

Helios is Oxide Computer Company’s customized operating system, derived from illumos with OpenSolaris roots, engineered to power the Oxide Rack, a rack-scale platform for on-premises cloud computing. It is optimized for Oxide’s integrated hardware, including compute sleds, P4-programmable network switches (using Intel Tofino 2 ASICs), and a unified DC bus power architecture, delivering a cohesive compute, storage, and networking solution.

While not fully open source, Helios supports building custom packages through its open-source components, with pre-built images available in the helios-engvm repository for streamlined deployment.

Within the rack's hyper-converged architecture, most workloads run inside virtual machines managed by Propolis, Oxide's Rust-based VMM built on the illumos port of bhyve (a hypervisor originally developed for FreeBSD), on compute nodes called sleds. Propolis leverages Oxide's hardware accelerations and the Oxide Packet Transformation Engine (OPTE) for efficient virtual networking.


Network Architecture

Network virtualization is achieved through Geneve encapsulation, enabling Virtual Private Clouds (VPCs) with overlapping customer IP ranges, supporting features like VPC routing, firewalls, NAT, and peering.
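
To make the overlay mechanics a bit more concrete, here is a minimal Rust sketch of the fixed 8-byte Geneve header defined in RFC 8926. The constants and helper function are mine, written for illustration; this is not Oxide's encapsulation code, just the header layout that makes overlapping tenant IP ranges possible (forwarding keys on the 24-bit VNI plus the inner address, not the inner address alone).

```rust
// Illustrative sketch of the fixed Geneve header (RFC 8926), not Oxide's code.
// The 24-bit VNI is what lets two tenants reuse the same inner IP ranges.

const GENEVE_UDP_PORT: u16 = 6081; // IANA-assigned Geneve destination port
const ETHERTYPE_TRANS_ETHER_BRIDGING: u16 = 0x6558; // payload is an inner Ethernet frame

fn geneve_header(vni: u32, opt_len_words: u8) -> [u8; 8] {
    assert!(vni < (1 << 24), "VNI is only 24 bits wide");
    assert!(opt_len_words < (1 << 6), "Opt Len is a 6-bit field (4-byte multiples)");

    let mut hdr = [0u8; 8];
    hdr[0] = opt_len_words & 0x3f; // Ver = 0 in the top 2 bits, Opt Len below
    hdr[1] = 0;                    // O and C flags clear, reserved bits zero
    hdr[2..4].copy_from_slice(&ETHERTYPE_TRANS_ETHER_BRIDGING.to_be_bytes());
    hdr[4..7].copy_from_slice(&vni.to_be_bytes()[1..]); // 24-bit VNI, network order
    hdr[7] = 0;                    // reserved
    hdr
}

fn main() {
    // Two tenants with overlapping inner 10.0.0.0/24 ranges, disambiguated by VNI.
    let tenant_a = geneve_header(0x2a, 0);
    let tenant_b = geneve_header(0x2b, 0);
    println!("tenant A header: {tenant_a:02x?}");
    println!("tenant B header: {tenant_b:02x?}");
    println!("both encapsulated in UDP to port {GENEVE_UDP_PORT}");
}
```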

The Oxide Packet Transformation Engine (OPTE), a Rust-based programmable engine running on each compute sled, handles packet transformations (e.g., routing, NAT, firewalling) and integrates with the virtio-net devices that the Propolis hypervisor exposes to guest VMs, giving broad OS compatibility and good performance through offloads (e.g., TSO and checksum offload).
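
To illustrate the general shape of such an engine (and only that), here is a hypothetical, heavily simplified Rust sketch of a layered match/action pipeline. Every type and name below is mine; the real OPTE is far more involved, with flow tables, a stateful firewall, and body transformations, and its actual API looks nothing like this toy.

```rust
// Hypothetical sketch of a layered match/action packet pipeline. Not OPTE's API.

#[derive(Clone, Copy, Debug)]
struct FlowKey {
    src: [u8; 4],
    dst: [u8; 4],
    dst_port: u16,
}

enum Match {
    Any,
    DstPort(u16),
    DstAddr([u8; 4]),
}

impl Match {
    fn hits(&self, f: &FlowKey) -> bool {
        match self {
            Match::Any => true,
            Match::DstPort(p) => f.dst_port == *p,
            Match::DstAddr(a) => f.dst == *a,
        }
    }
}

enum Action {
    Allow,
    Deny,
    RewriteDst([u8; 4]), // e.g., inbound NAT for a floating IP
    RewriteSrc([u8; 4]), // e.g., outbound SNAT to an external address
}

/// One layer is an ordered rule list; the first matching rule wins.
struct Layer {
    name: &'static str,
    rules: Vec<(Match, Action)>,
}

/// Run a flow through every layer in order, applying rewrites as we go.
fn process(layers: &[Layer], mut flow: FlowKey) -> Option<FlowKey> {
    for layer in layers {
        if let Some((_, action)) = layer.rules.iter().find(|(m, _)| m.hits(&flow)) {
            match action {
                Action::Deny => {
                    println!("{}: dropped {:?}", layer.name, flow);
                    return None;
                }
                Action::Allow => {}
                Action::RewriteDst(d) => flow.dst = *d,
                Action::RewriteSrc(s) => flow.src = *s,
            }
        }
    }
    Some(flow)
}

fn main() {
    let layers = vec![
        Layer {
            name: "firewall",
            rules: vec![(Match::DstPort(22), Action::Deny), (Match::Any, Action::Allow)],
        },
        Layer {
            name: "nat",
            // Traffic addressed to the floating IP is rewritten to the guest's
            // private address before being delivered to its virtio-net device.
            rules: vec![(Match::DstAddr([192, 0, 2, 10]), Action::RewriteDst([10, 0, 0, 5]))],
        },
    ];

    let inbound = FlowKey { src: [203, 0, 113, 7], dst: [192, 0, 2, 10], dst_port: 443 };
    println!("result: {:?}", process(&layers, inbound));
}
```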

Boundary Services, implemented on the Tofino 2 switches, manage external connectivity: they use BGP to advertise customer IPs upstream and perform the packet transformations needed for NAT and floating IPs, while staying economical with the switch's limited table resources so the design scales.
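
As a toy model of that mapping (not Oxide's implementation, whose state lives in P4 tables on the Tofino 2 programmed by the control plane), one can picture boundary services as a lookup from externally advertised addresses to internal targets. All names and types below are hypothetical.

```rust
// Toy model of boundary services: externally advertised customer IPs mapped to
// internal (sled, VNI, guest address) targets. Hypothetical names, illustration only.

use std::collections::HashMap;
use std::net::{IpAddr, Ipv4Addr, Ipv6Addr};

#[derive(Debug, Clone, Copy)]
struct InternalTarget {
    sled_underlay: Ipv6Addr, // underlay address of the hosting sled
    vni: u32,                // tenant's Geneve virtual network identifier
    guest_ip: Ipv4Addr,      // guest's private address (NAT rewrite target)
}

fn main() {
    // Floating IP -> internal target, as a control plane might program it.
    let mut nat_table: HashMap<Ipv4Addr, InternalTarget> = HashMap::new();
    nat_table.insert(
        Ipv4Addr::new(192, 0, 2, 10),
        InternalTarget {
            sled_underlay: "fd00::1001".parse().unwrap(),
            vni: 0x2a,
            guest_ip: Ipv4Addr::new(10, 0, 0, 5),
        },
    );

    // The externally reachable addresses are what BGP would advertise upstream.
    let advertised: Vec<IpAddr> = nat_table.keys().map(|ip| IpAddr::V4(*ip)).collect();
    println!("advertise via BGP: {advertised:?}");

    // Inbound packet to a floating IP: look up the target, rewrite, and tunnel.
    if let Some(t) = nat_table.get(&Ipv4Addr::new(192, 0, 2, 10)) {
        println!(
            "rewrite dst to {}, encapsulate with VNI {:#x}, send to sled {}",
            t.guest_ip, t.vni, t.sled_underlay
        );
    }
}
```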

<aside> 🗣

This setup requires guest OSes (e.g., Linux, Windows) to use virtio drivers to talk to Propolis's virtio device implementations, which are not Linux-derived but are standards-compliant. While this generally works well, compatibility or performance issues with guest virtio drivers could arise in principle, though I found no public complaints or widespread issues in available sources as of May 2025.

</aside>

The design draws from Microsoft’s VL2/Ananta/VFP, Google’s Andromeda, and Joyent’s fabrics, favoring L3 over L2 to avoid ARP/NDP overhead and enable scalable routing. Virtio is chosen over SR-IOV for guest networking due to broad OS support, migration ease, and tenancy scalability, with potential evolution toward SmartNICs or PCIe doorbells to reduce VM exits.

<aside> 🤔

Oxide chose virtio over SR-IOV for guest networking to ensure broad OS compatibility and easier instance migration, but their heavy investment in custom P4-based ASIC logic (via Tofino 2 and the x4c compiler) raises the question of why they didn't adopt an InfiniBand-like topology instead.

InfiniBand, an established and open technology, offers high bandwidth, low latency, and predictable performance through RDMA; it could potentially have simplified the network design by leveraging an existing, widely adopted standard rather than requiring custom programmable networking logic for their rack-scale platform.

</aside>