An operating system engineered from the kernel up for LLM training workloads. Zero scheduler ticks on training cores. Immutable, auditable, and purpose-built for GPU cloud infrastructure.
What separates nullhz from a tuning script: kernel recompilation, cluster-scale reproducibility, OS-level fault integration, and a security posture your customers can actually audit.
Ubuntu ships a precompiled generic kernel. You can tune sysctls at runtime, but you cannot enable CONFIG_PREEMPT_RT, CONFIG_NO_HZ_FULL, or strip out netfilter and audit subsystems without recompiling. nullhz is built specifically for training workloads — every kernel byte is intentional, nothing is inherited from a general-purpose distro.
Across thousands of nodes, Ubuntu + scripts drift — different kernel patch levels, different package versions, different uptime state. nullhz is an immutable image artifact. Every node is bit-for-bit identical. When a node straggles, you know it's not an OS configuration issue.
A script triggers checkpoints. nullhz intercepts Machine Check Exceptions at the kernel level, detects training stalls via a hardware watchdog module, and triggers atomic BTRFS snapshots before a node fails — not after. Recovery is built into the OS, not bolted on top.
Ubuntu runs with hundreds of background processes, apt, snapd, and avahi — even after tuning. nullhz contains exactly what LLM training requires and nothing else. Every package is documented, auditable, and reviewed. A complete manifest ships with every image.
The name says it. HZ=0 on training cores. The scheduler timer tick is fully suppressed via nohz_full. All hardware interrupts are pinned to OS-only cores. Training threads run uninterrupted — no kernel, no daemon, no NIC interrupt ever preempts them.
Every NVIDIA driver release gets a validated nullhz image within 14 days, with a 48-hour hotfix SLA for critical issues. Your cluster stays current without internal OS maintenance overhead. Kernel CVE patches are included in every release cycle.
Four optimization layers, independently benchmarkable, cumulatively delivering ~20% efficiency improvement on A100 and H100 clusters.
PREEMPT_RT kernel + nohz_full + CPU isolation eliminates scheduler jitter on training cores. Synchronized distributed training stops waiting for the slowest node.
RoCE v2 tuning, GRO/TSO disabled on training NICs, 256MB socket buffers, and ECN/PFC configuration optimized for NCCL all-reduce patterns.
16K pre-allocated 2MB huge pages, strict NUMA pinning to GPU-local memory, vm.swappiness=0, and disabled background writeback interruption.
No systemd, no apt, no snapd. A minimal init sequence with zero non-essential kernel threads competing for CPU time, cache, or memory bandwidth.
nullhz is designed for environments where your customers demand infrastructure audits. Every package is justified. Every kernel module is intentional.
The OS partition is mounted read-only after boot. No package manager, no shell access by default, no drift. What ships is what runs — always. Mutations require a signed image update.
Every binary in the image is documented. A full package manifest and kernel config diff vs upstream ships with every release — making SOC2 and infrastructure audits straightforward for your customers.
No avahi, no mDNS, no CUPS, no D-Bus, no systemd-resolved. The only network listeners are RDMA management and your training workload. Every open port is deliberate.
Every nullhz release is cryptographically signed. Nodes validate image integrity at boot using secure boot chains. Tampered or unsigned images are rejected before kernel handoff.
Critical kernel CVEs patched and a new validated image available within 48 hours. Standard CVEs within 14 days. Your cluster stays hardened without internal security engineering overhead.
Kernel-level eBPF probes provide an immutable audit trail of system calls, process launches, and network connections — without the overhead of userspace audit daemons. Logs ship directly to your SIEM.
These aren't sysctl tunables. They require a purpose-built kernel — which is exactly what nullhz ships.
Every component of nullhz exists to maximize GPU utilization and eliminate the OS as a bottleneck.
We'll run nullhz against your current stack on your hardware. You keep the results.