more than you wanted to know about Nix, NixOS, and my homelab
About a year ago, I totally reconfigured my homelab. I maintained this server remotely throughout university, but an upgrade was long overdue.
This is a comprehensive retrospective on everything that went into that reconfiguration: the tech I chose, the code I wrote, the decisions I made, and rationales for basically everything.
My goal is for an uninformed reader to:
This piece is dedicated to:
This piece has two high-level parts.
The first part resembles (I think) an engineering document. It breaks down the problem I’m trying to solve with this rewrite and identifies properties I want my solution to satisfy. Then, I go over my plan in broad strokes, discussing the approach and software components along with rationales for those choices.
The second part is more like an extremely literate runbook and code walkthrough. I also went to a lot of effort to teach the information needed to fully understand the system. My goal was to write a piece that would teach an uninformed programmer enough to both reproduce and extend the implemented system.
Often, systems are created to fulfill some perceived need. Of course, these perceived needs are limited by knowledge; younger me had never kept a production system (“production” haha) running. When I was choosing an approach, there was no way to know what system properties would be important for my workflow and personality.
The first time around, my only requirement was that the system could run Jellyfin and serve media from my external hard drives. If I believed that some configuration would do this one thing when I left for university in a week, it was good enough for me.
I did a little googling, and the 2021 Perfect Media Server (PMS) guide seemed like a perfect fit. It introduced me to mergerfs, which was an ideal abstraction for my heterogeneous set of random hard drives. It also showed a very clear path forward for running containerized hobbyist applications like Jellyfin. The website had a pretty thorough walkthrough of the installation, and it all seemed sensible to me.
Years later, I can look back on everything and say that my requirements were lacking, so I’m excited to do it all over with years of accrued knowledge. I have the chance to re-architect this system from first principles, picking technology and methods based on how I know I like to operate. In order to do that, I want to overview pain points that I plan to avoid.
One of the worst aspects of my original system was its accretion of undocumented changes. What began as a “simple” setup transformed into something I couldn’t confidently recreate.
The PMS installation guide boils down to installing Ubuntu and Docker on some machine and then adding a single fstab entry. Younger me thought, “trivial installations and just one edit to /etc/fstab? I’ll remember that! No need to write stuff down.”
But over time, changes started to pile up. I installed a specific NVIDIA driver package, I set some firewall rules, I tweaked some SSH server options. I made little state mutations all over the place, each one seemingly inconsequential in the grand scheme of things.
By the time I realized I was wrong and should have written things down, the information was long gone from my mind. With these gaps in my knowledge, I was no longer confident that, if necessary, I could start over and get back to this version of the system.
Updates will break things. I upgrade system packages, something breaks. I pull new Docker images, something breaks. “Updates will break things” sounds super obvious and is a relatively easy lesson to learn, but high school me just didn’t have enough experience to realize it.
Since my mental model was that updates fix things, I never invested in versioning aspects of my system or learning how to perform rollbacks. When I broke something (the most common issues had to do with my graphics card), I couldn’t guarantee I could go backwards to a system state that worked.
This lack of reproducibility has an important second-order effect: it made me less willing to change things as time went on. I never upgraded my Ubuntu release. I would only do image and package upgrades when I knew I had time to debug issues. Worst of all, if things were working as is, I would hesitate to even initiate upgrades or otherwise tweak things.
There’s also a positive feedback loop created here. A brittle-feeling setup and painful upgrades made me less willing to interact with the system. Without regular interaction, I forgot details and workflows, making future interactions even more painful.
The other big theme here was that my server didn’t feel legible. It didn’t take very long for it to feel like I was poking at a black box instead of configuring a well-understood system.
Setting up serious observability tools didn’t even cross my mind a few years ago, but even if I had been informed, I don’t think I would have invested. I figured that I only really cared about two states: service up or service down, so I could get away with ad-hoc measurements when things broke.
Operationally, this was mostly okay. Services rarely went down outright, and I don’t recall any issues triaging outages. I’m thankful for the care that went into these open-source homelab applications like Jellyfin; without the contributions of these developers, I bet my experience wouldn’t be so stable.
Unfortunately, I had subtler ongoing issues: performance dips. My experience with Jellyfin is illustrative: I would experience spurious playback degradations like some choppy video or lots of buffering. In many cases, the problem would disappear shortly afterward. Even with the best tools, these kinds of problems are not trivial to debug. Off the top of my head, a non-exhaustive list of root causes might include: a slow disk read, the server’s network, or the client’s network. If there was transcoding happening, the server might be suffering from GPU throttling, if the GPU was even being used as intended.
I did not have the best tools. There was no way for me to go back after the incident and see any information about the system in that moment. Did the available upload bandwidth suddenly decrease? Did one of my old hard drives just act up? Maybe something happened with my GPU? I couldn’t tell you.
I can’t blame the guides I looked at for not pushing observability either. First, I was a beginner, and the guides I read were stripped down for beginners. Second, I suspect that many hobbyist sysadmins really are fine with just keeping some logs. I wouldn’t be surprised if simple homelab systems with small workloads just work without issues. For a low-touch system like that, observability probably isn’t worth the time and effort.
Every fix for issues I ran into involved extensive Googling. A bromide about learning to work with an unfamiliar technology is that this is a normal experience. Back when I was first configuring this system, “common problems are likely well-represented on the Internet” was a selling point of Ubuntu. It was treated as a given that one would someday run into an issue, paste relevant information into Google, and hope someone on a forum knew exactly what commands and configuration changes were needed.
Some amount of Googling while fixing problems is expected and accepted. It should decrease over time though, as users get better at diagnosis and become knowledgeable enough to solve some problems with just their available tools. Although I have experienced this trend toward self-sufficiency when learning programming languages or frameworks, I never experienced it with Ubuntu system administration. It wasn’t fun feeling like each issue required starting from scratch and discovering some incantation that would fix my current problem.
I like thinking in code. I especially appreciate that the expressive power of code lets me find necessary context pretty efficiently. In sufficiently complex systems, it’s mostly not worth getting to know all the details of that system just to accomplish a particular task; I prefer to identify the code path I care about and build understanding by doing a BFS from the call stack.
Unfortunately, I don’t believe that Ubuntu configuration is conducive to that discoverability. Different aspects of the system are controlled by different utilities that use bespoke configuration systems and can’t easily share data. I remember interacting with systemd, apt, and ufw for configuration tasks; I don’t think there are significant shared conventions or languages for using these tools.
The idea of spending my night grokking forum posts or blindly poking at my server did not spark joy. Struggling is an important part of any worthwhile endeavor, but this flavor of added friction affected more than my immediate enjoyment. Because of the disparate systems and limited sources of information, I never felt like I was on track to one day solve my own problems. It just felt like I was doomed to put out fires forever, with no improvement.
My problems were minor and rare; the benefits of solving a given problem generally didn’t outweigh the expected pain of the process and my expectation of limited learning. This created its own terrible positive feedback loop: if solving a minor problem was going to be a pain, I would just tough it out. That guaranteed I wasn’t going to learn anything significant.
None of the rough edges I’ve written about were show-stoppers. In fact, they can all be overcome with a bit of investment or a bit of elbow grease. I could have rolled back a set of package upgrades by reading /var/log/apt/history.log and doing some scripting. I could have pinned my container images to a specific image hash, manually swapping the hash for each upgrade. I could have simply gotten good and read enough man pages and forum posts to really understand the different tools I was using. I could have kept a log of every shell command I ran along with its output. These are all doable solutions, but none of them sound particularly fun to me.
Despite not being a technical lesson, this bit of self-knowledge was my most significant takeaway from those three years. Running the server in this fashion taught me what styles of work I didn’t enjoy, and more importantly, it taught me that I must find joy in the work I expect to have to do. Ubuntu + Docker were technically sufficient; they’re good enough for a variety of production systems and were perfectly fine for me. However, ergonomics matter, and they matter much more in a for-fun environment.
With all these lessons, I came up with a new plan for the creation and ongoing maintenance of my server. I thought it was an interesting exercise in designing an approach from first principles.
I see a lot of value now in writing things down. I can’t rely on my memory, especially over the timescales that hobbyist work tends to get done. I’ve also convinced myself that writing solidifies and clarifies my thoughts; plans and ideas that seem sensible in my head have their flaws exposed when I have to commit them to the page.
That doesn’t mean that I’m just disregarding the expected tedium of, for example, maintaining a shell command execution history. I will still do that when it is needed, but identifying that I value writing things down guided other decisions I made for this new server.
Nix and NixOS have a bit of a reputation. When I was growing up, Reddit users and consumer Linux enthusiasts loved to promote Arch Linux as a clean and powerful distro. Funny enough, this actually got to me, and I had a mostly pleasant first year in university with a laptop that dual-booted Arch and Windows. I no longer keep up with or read this kind of content, but my understanding is that the online energy around Nix and NixOS is similar; there are some people praising the affordances of Nix, and the aesthetics and totalizing arguments turn other people off.
I don’t believe that Nix is The One True Way, nor do I believe that users of alternatives should mindlessly migrate. However, Nix is very powerful. Despite its rough edges, I believe it is the right tool for me.
In case you’re not familiar, Nix is a tool for creating “reproducible, declarative, and reliable systems”. There’s also an associated Turing-complete programming language, referred to as the Nix expression language, or just Nix. Because Nix is so powerful, it can be used to write arbitrary software. In practice, programmers write Nix expressions to describe how packages should be built and operating systems should be configured. The canonical example of the latter is NixOS, “a Linux distribution that can be configured fully declaratively and is based on Nix and Nixpkgs.”
What makes NixOS right for me? Fully declarative configuration means that almost any property of a Linux system can be codified with Nix code, whether it’s installed packages, systemd services, firewall configurations, or kernel parameters. Instead of letting these properties get modified ad-hoc, a NixOS system always derives them from the code; changing the system requires changing the code, and vice versa. This is a natural fit for my stated goal of having everything written down. Instead of keeping a log of every command I run, I can leverage the affordances of NixOS to get that codification property without sacrificing ergonomics.
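To make that concrete, here is a minimal sketch of a NixOS module that touches each of the four areas mentioned above. The packages, the port number, the kernel parameter, and the hello-once service name are all placeholders picked for illustration; none of them come from my actual configuration.

```nix
{ config, pkgs, ... }:

{
  # Installed packages: added to the system-wide profile.
  environment.systemPackages = with pkgs; [
    git
    htop
  ];

  # A systemd service: a one-shot unit that runs GNU hello at boot.
  # "hello-once" is a hypothetical name chosen for this sketch.
  systemd.services.hello-once = {
    description = "Example one-shot service";
    wantedBy = [ "multi-user.target" ];
    serviceConfig = {
      Type = "oneshot";
      ExecStart = "${pkgs.hello}/bin/hello";
    };
  };

  # Firewall configuration: allow inbound TCP on one port
  # (8096 is Jellyfin's default HTTP port, used here as a placeholder).
  networking.firewall.allowedTCPPorts = [ 8096 ];

  # Kernel parameters: appended to the kernel command line.
  boot.kernelParams = [ "quiet" ];
}
```

Deleting one of these definitions and rebuilding removes the corresponding piece of the system, which is the property a hand-maintained command log can't offer.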
It’s just as important to note that there’s no ergonomic cost for me, because I like the shape of NixOS. Thanks to many people who have worked on NixOS, just about anything can be configured in one place with the Nix expression language; I don’t have to edit my hosts file when there’s Nix code to turn my high-level list expression into that hosts file. This doesn’t make knowledge of Linux subsystems less important; if I hadn’t already learned about e.g. systemd units, I would have had to learn about them here. However, NixOS connects all these subsystems into a single expressive configuration system, making things much easier to discover via the NixOS options search page or by looking through the nixpkgs repo.
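The hosts file case might look something like the sketch below, which uses the networking.hosts option. The addresses and hostnames are made up for illustration.

```nix
{
  # Each attribute maps an IP address to the hostnames that should resolve
  # to it. NixOS renders these entries into /etc/hosts at build time.
  # All addresses and names below are hypothetical.
  networking.hosts = {
    "192.168.1.10" = [ "media.home.arpa" "media" ];
    "192.168.1.11" = [ "backup.home.arpa" ];
  };
}
```

Because the value is an ordinary attribute set, it can be computed, merged across modules, or shared with other parts of the configuration instead of being edited by hand.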
In fact, I like this so much that it bleeds over into my other goal of maximizing fun. I enjoy doing some functional programming and spelunking in the nixpkgs source code, and I would much rather think in code than get familiar with miscellaneous command-line tools. If that doesn’t resonate, then NixOS is probably not the right tool for you, just like Ubuntu ultimately wasn’t really the right tool for me.
I said that NixOS codifies almost any property of a Linux system because the boundary is roughly drawn at the filesystem. Unless you specify otherwise, like in the hosts file contents example, NixOS configuration will leave your files alone. For a lot of things, this is what you want. However, there are many important system properties that are derived from stateful file contents. That leaves me with a problem: I want to have enough information written down to reproduce my system from scratch, but I’m currently dependent on unknown file contents that I’m not tracking.
I can’t get away with just never writing things to disk, but not all file state is essential. Backing up my whole boot disk would give me the reproducibility I desire, but I would much rather just know what holes I have to fill. Graham Christensen’s blog post, “Erase your darlings”, makes this point very nicely. Instead of giving up and saving everything, I can start with a system that erases everything on boot. Then, I’ll selectively persist necessary files.
This is super easy to do in a Nix system, mostly thanks to the nix-community/impermanence project. This bit of code lets users specify persisted files and directories in Nix code and handles the bind mounts and symlinks that keep everything functional.
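As a rough sketch of what that looks like, the module below imports impermanence and persists a few paths. The /persist location and the chosen paths are assumptions for the example, and fetching the master branch is only for brevity; a pinned revision would be more reproducible. The persistent filesystem generally also has to be available early in boot for this to work.

```nix
{ config, pkgs, ... }:

let
  # Unpinned fetch, used here only to keep the sketch short.
  impermanence = builtins.fetchTarball
    "https://github.com/nix-community/impermanence/archive/master.tar.gz";
in
{
  imports = [ "${impermanence}/nixos.nix" ];

  # "/persist" is assumed to be a filesystem that survives reboots.
  environment.persistence."/persist" = {
    # Directories whose contents should survive a reboot.
    directories = [
      "/var/log"
      "/var/lib/nixos"
    ];
    # Individual files that should survive a reboot.
    files = [
      "/etc/machine-id"
    ];
  };
}
```

Anything not listed here, and not generated by the NixOS configuration itself, is gone after the next boot. That makes the list double as documentation of the state the system actually depends on.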
Tailscale is not related to any of the points I’ve been making, but it’s such a huge upgrade to the homelab experience that I have to mention it.
I run services on my server, and I would like to access those services anywhere I have internet access. Unfortunately, exposing a server on a residential network to the open internet requires a bit of fiddling (namely, port forwarding and dynamic DNS) and is also not really a good idea. I have years of sparse experience with this, dating back to the canonical “let’s host a Minecraft server”, and it’s not something I look back fondly on. The worst part is the mild anxiety that comes with knowing that anyone could hypothetically establish a connection with whatever service was listening on that port (a reverse proxy, most likely).
Tailscale solves both of these problems. Tailscale handles establishing WireGuard connections between devices; a relevant pair here is my server and my MacBook. Just establishing a WireGuard connection with my server requires the initiator to have a private key generated on my authorized devices. Tailscale also handles NAT traversal (described in one of my favorite technical writeups here), making the system robust to having peers on separate modern networks. Goodbye port forwarding and dynamic DNS! I can directly access my machines anywhere I have internet access without exposing them to the open internet.
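On the NixOS side, enabling Tailscale can be as small as the sketch below. Trusting the tailnet interface in the firewall is an assumption about how one might want to set things up, not a requirement, and tailscale0 is the module's default interface name.

```nix
{
  # Runs the tailscaled daemon. The machine still has to be authenticated
  # once after the first boot, for example with `tailscale up`.
  services.tailscale.enable = true;

  # Keep the regular firewall on for the physical network.
  networking.firewall.enable = true;

  # Optional: treat traffic arriving over the tailnet interface as trusted,
  # so services are reachable from my own devices but not from the LAN or
  # the open internet.
  networking.firewall.trustedInterfaces = [ "tailscale0" ];
}
```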
This walkthrough has two main components:
These two components serve different goals. The combination can be a bit much, so I want to add some context and instructions.
When I was young and installing a Linux distro for the first time, I relied a lot on very detailed walkthroughs that discussed specific commands. Having a targeted runbook to start with provides a sense of safety, in my experience; knowing what to expect and roughly what commands to read about can be the difference between merely thinking an install would be neat and actually going through with it.
However, that’s a lot of fluff for someone who has already installed a Linux distro or is just reading this for fun. Even if the minutiae of a manual NixOS install don’t interest you, I believe this piece is still valuable for its targeted teaching of Nix and NixOS. In my opinion, there’s a big knowledge gap between just going through the NixOS manual’s installation guide and owning a NixOS system with all its power; I hope to bridge that gap.
If you’re reading for fun or learning, I recommend you skim the normal Linux distro installation steps and just read about writing a NixOS configuration. That section contains the meat of the in-context explanation and doesn’t depend on the first part.
Before doing anything NixOS-related, installing NixOS looks a lot like installing any other Linux distro: we need to boot from the install media and then we need a filesystem to install to.
This topic is well-covered online and in the NixOS manual. For posterity and completeness, I’ll note what these steps looked like for me.
Making an installer USB stick is super easy on macOS:
me@mbp:$ diskutil list
...
/dev/disk6 (external, physical):
#: TYPE NAME SIZE IDENTIFIER
0: GUID_partition_scheme *32.0 GB disk6
1: Microsoft Basic Data LEXAR 32.0 GB disk6s1
me@mbp:$ diskutil unmountDisk disk6
Unmount of all volumes on disk6 was successful
# it is significantly faster to write to /dev/rdisk6 when doing disk imaging
me@mbp:$ dd if=nixos-minimal-24.05.2028.e4509b3a560c-x86_64-linux.iso of=/dev/rdisk6
After booting to the installer, I’ll give myself ssh access.
nixos@nixos:$ mkdir .ssh
nixos@nixos:$ curl https://github.com/seridescent.keys >> ~/.ssh/authorized_keys
# downloading...
nixos@nixos:$ systemctl restart sshd
Then, the filesystem. Since I’m planning on using Impermanence, I want a root filesystem that supports easy snapshotting and rollbacks. ZFS seems like a reliable option that provides the desired features alongside other fault-tolerance features.
The ZFS organization system has two concepts: pools and datasets. Informally, a pool organizes one or more drives along with some pool-wide settings. One pool can then store multiple datasets, each of which can be individually snapshotted and additionally configured.
Before I start configuring ZFS, I will reconfigure my SSD. While I was reading about ZFS configuration, I learned that I should consider changing my SSD’s logical block address (LBA) size. Modern drives use 4096 byte LBAs for “increased storage density and improved error correction capabilities” (per the wonderful Arch Wiki), but some SSDs emulate 512 byte blocks for compatibility. Turns out my SSD is guilty of such emulation when it could be using the “Better” 4096 byte LBA format.
me@old-ubuntu-server:$ sudo nvme id-ns -H /dev/nvme0n1 | grep "Relative Performance"
LBA Format 0 : Metadata Size: 0 bytes - Data Size: 512 bytes - Relative Performance: 0x2 Good (in use)
LBA Format 1 : Metadata Size: 0 bytes - Data Size: 4096 bytes - Relative Performance: 0x1 Better
I should be able to change this when I format the SSD with the option --lbaf=1.
The Arch Wiki also now recommends that users “consider performing SSD memory cell clearing” to “restore factory default write performance”. I should be able to perform a user data erase with the option -s 1.
nixos@nixos:$ nvme format -s 1 --lbaf=1 /dev/nvme0n1
Next comes partitioning. Most pieces I read suggest just three partitions: one ESP, one for the pool, and one swap.
nixos@nixos:$ parted -s /dev/nvme0n1 -- mklabel gpt
nixos@nixos:$ parted -s /dev/nvme0n1 -- mkpart primary 512MiB -8GiB
nixos@nixos:$ parted -s /dev/nvme0n1 -- mkpart primary linux-swap -8GiB 100%
nixos@nixos:$ parted -s /dev/nvme0n1 -- mkpart ESP fat32 1MiB 512MiB
nixos@nixos:$ parted -s /dev/nvme0n1 -- set 3 esp on
With the SSD re-configured, I’ll create one pool using just the SSD:
# readable command (backslashes continue the command across lines)
zpool create \
  -o ashift=12 \
  -O acltype=posixacl \
  -O xattr=sa \
  -O dnodesize=auto \
  -O compression=zstd \
  -O mountpoint=none \
  rpool \
  /dev/nvme0n1p1
# one line
nixos@nixos:$ zpool create -o ashift=12 -O acltype=posixacl -O xattr=sa -O dnodesize=auto -O compression=zstd -O mountpoint=none rpool /dev/nvme0n1p1
- ashift=12: tells ZFS the block size exponent of the underlying disk, which is just set to 2^12 = 4096 bytes.
- acltype=posixacl: seems sensible to enable access control lists for the filesystem.
- xattr=sa: improves performance of features that rely on extended attributes like POSIX ACLs; this option is recommended to be set alongside acltype=posixacl.
- dnodesize=auto: this option is recommended to be set alongside xattr=sa.
- compression=zstd: sets the compression algorithm.
- mountpoint=none: both the zfsprops page and the “Mount Points” section of the zfsconcepts page left me confused on what exactly this does besides preventing the filesystem from being mounted. My guess is that since this is a root pool, it doesn’t make sense for it to be mounted anywhere like a pool used for data would be.
- rpool: name of this pool; “r” for root.
- /dev/nvme0n1p1: the partition to use as this pool’s virtual device.
Approximately following “Erase your darlings”, I will then make four datasets, plus a fifth that only reserves space:
## root dataset; gets rolled back to blank every reboot
nixos@:$ zfs create -o mountpoint=legacy rpool/root
## nix dataset
# these artifacts are created by Nix code and are only read during runtime,
# so no access time writes needed
nixos@:$ zfs create -o mountpoint=legacy -o atime=off rpool/nix
## home dataset
nixos@:$ zfs create -p -o mountpoint=legacy rpool/safe/home
## system-wide persisted dataset
nixos@:$ zfs create -o mountpoint=legacy rpool/safe/persist
## reserved space to maintain zfs performance
# see https://nixos.wiki/wiki/ZFS#Reservations
nixos@:$ zfs create -o refreservation=48G -o mountpoint=none rpool/reserved
Before doing anything else, I’ll snapshot the blank root dataset.
nixos@:$ zfs snapshot rpool/root@blank
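This snapshot is what makes the erase-on-boot scheme possible. As a sketch of where it gets used later, “Erase your darlings” rolls the root dataset back from the initrd with a snippet along these lines. The dataset and snapshot names match the ones created above; the choice of hook is an assumption that depends on the NixOS release and initrd type (some setups use boot.initrd.postResumeCommands instead, and a systemd-based initrd needs a different approach).

```nix
{ lib, ... }:

{
  # Run in the scripted initrd before the root filesystem is mounted.
  # lib.mkAfter orders this after any other commands defined for the
  # same hook.
  boot.initrd.postDeviceCommands = lib.mkAfter ''
    zfs rollback -r rpool/root@blank
  '';
}
```

With this in place, anything written to the root dataset during one boot is discarded at the start of the next.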
Finally, I’ll mount these datasets.
nixos@nixos:$ mkdir -p /mnt
nixos@nixos:$ mount -t zfs rpool/root /mnt
nixos@nixos:$ mkdir /mnt/nix /mnt/home /mnt/persist /mnt/boot
nixos@nixos:$ mount -t zfs rpool/nix /mnt/nix
nixos@nixos:$ mount -t zfs rpool/safe/home /mnt/home
nixos@nixos:$ mount -t zfs rpool/safe/persist /mnt/persist
The NixOS wiki suggests checking the pool status, which seems like a sensible checkpoint.
nixos@:$ zpool status
pool: rpool
state: ONLINE
config:
NAME STATE READ WRITE CKSUM
rpool ONLINE 0 0 0
nvme0n1p1 ONLINE 0 0 0
errors: No known data errors
Next, I have to format the boot and swap partitions.
nixos@nixos:$ mkfs.fat -F 32 -n ESP /dev/nvme0n1p3
nixos@nixos:$ mkswap -L swap /dev/nvme0n1p2
nixos@nixos:$ swapon /dev/nvme0n1p2
Finally, I need to note the UUID of my new boot partition and mount it.
nixos@nixos:$ sudo blkid | grep nvme
/dev/nvme0n1p3: LABEL_FATBOOT="ESP" LABEL="ESP" UUID="0632-1869" BLOCK_SIZE="4096" TYPE="vfat" PARTLABEL="ESP" PARTUUID="5679b059-859b-4035-aa0f-7fdb84ca2f62"
/dev/nvme0n1p1: LABEL="rpool" UUID="6469899763605391201" UUID_SUB="10634005405268408997" BLOCK_SIZE="4096" TYPE="zfs_member" PARTLABEL="primary" PARTUUID="d31c662f-9898-41f7-be5e-b388e41f045e"
/dev/nvme0n1p2: LABEL="swap" UUID="9ad578cd-8df9-4186-82e7-eca235f1aec8" TYPE="swap" PARTLABEL="primary" PARTUUID="5705784e-f4bf-4326-94f9-30003b78fa33"
nixos@nixos:$ mount -o umask=077 /dev/disk/by-uuid/0632-1869 /mnt/boot
With the drive partitioned and formatted, it’s finally time to work with Nix. The next step of the install process is to run nixos-generate-config, which creates a starting point based on visible hardware. My filesystem setup is weird enough that I would prefer specifying it myself, so I won’t generate filesystem configurations.
# --root *root*: treat the passed directory as the root of the filesystem
# --no-filesystems: omit filesystem and swap device configurations
nixos@nixos:$ nixos-generate-config --root /mnt --no-filesystems
This incantation creates hardware-configuration.nix and configuration.nix. I’ve included them for posterity, but reading their contents is optional. The hardware configuration is not very interesting, and I’ll be walking through writing and understanding a configuration.nix file from scratch.
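Since I passed --no-filesystems, the filesystem and swap declarations have to be written by hand. A sketch of what they could look like for the layout created above follows. The dataset names, the swap label, and the ESP UUID come from the earlier commands; the hostId value is a made-up placeholder, and marking /persist as needed for boot is an assumption about how persisted state will be used.

```nix
{
  # ZFS support. NixOS requires a hostId for ZFS pools; this value is a
  # placeholder (8 hexadecimal digits), not my real one.
  boot.supportedFilesystems = [ "zfs" ];
  networking.hostId = "deadbeef";

  # Datasets were created with mountpoint=legacy, so NixOS mounts them
  # like ordinary filesystems.
  fileSystems."/" = { device = "rpool/root"; fsType = "zfs"; };
  fileSystems."/nix" = { device = "rpool/nix"; fsType = "zfs"; };
  fileSystems."/home" = { device = "rpool/safe/home"; fsType = "zfs"; };
  fileSystems."/persist" = {
    device = "rpool/safe/persist";
    fsType = "zfs";
    neededForBoot = true; # assumption: persisted state is needed early
  };

  # The ESP, referenced by the UUID reported by blkid.
  fileSystems."/boot" = {
    device = "/dev/disk/by-uuid/0632-1869";
    fsType = "vfat";
    options = [ "umask=077" ];
  };

  # The swap partition, referenced by the label given to mkswap.
  swapDevices = [ { device = "/dev/disk/by-label/swap"; } ];
}
```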
hardware-configuration.nix
# Do not modify this file! It was generated by ‘nixos-generate-config’
# and may be overwritten by future invocations. Please make changes
# to /etc/nixos/configuration.nix instead.
{ config, lib, pkgs, modulesPath, ... }:

{
  imports =
    [ (modulesPath + "/installer/scan/not-detected.nix")
    ];

  boot.initrd.availableKernelModules = [ "xhci_pci" "ahci" "nvme" "usbhid" "uas" "usb_storage" "sd_mod" "rtsx_pci_sdmmc" ];
  boot.initrd.kernelModules = [ ];
  boot.kernelModules = [ "kvm-intel" ];
  boot.extraModulePackages = [ ];

  # Enables DHCP on each ethernet and wireless interface. In case of scripted networking
  # (the default) this is the recommended approach. When using systemd-networkd it's
  # still possible to use this option, but it's recommended to use it in conjunction
  # with explicit per-interface declarations with `networking.interfaces.<interface>.useDHCP`.
  networking.useDHCP = lib.mkDefault true;
  # networking.interfaces.enp109s0f1.useDHCP = lib.mkDefault true;
  # networking.interfaces.wlp110s0.useDHCP = lib.mkDefault true;

  nixpkgs.hostPlatform = lib.mkDefault "x86_64-linux";
  hardware.cpu.intel.updateMicrocode = lib.mkDefault config.hardware.enableRedistributableFirmware;
}
configuration.nix
# Edit this configuration file to define what should be installed on
# your system. Help is available in the configuration.nix(5) man page, on
# https://search.nixos.org/options and in the NixOS manual (`nixos-help`).

{ config, lib, pkgs, ... }:

{
  imports =
    [ # Include the results of the hardware scan.
      ./hardware-configuration.nix
    ];

  # Use the systemd-boot EFI boot loader.
  boot.loader.systemd-boot.enable = true;
  boot.loader.efi.canTouchEfiVariables = true;

  # networking.hostName = "nixos"; # Define your hostname.
  # Pick only one of the below networking options.
  # networking.wireless.enable = true; # Enables wireless support via wpa_supplicant.
  # networking.networkmanager.enable = true; # Easiest to use and most distros use this by default.

  # Set your time zone.
  # time.timeZone = "Europe/Amsterdam";

  # Configure network proxy if necessary
  # networking.proxy.default = "http://user:password@proxy:port/";
  # networking.proxy.noProxy = "127.0.0.1,localhost,internal.domain";

  # Select internationalisation properties.
  # i18n.defaultLocale = "en_US.UTF-8";
  # console = {
  #   font = "Lat2-Terminus16";
  #   keyMap = "us";
  #   useXkbConfig = true; # use xkb.options in tty.
  # };

  # Enable the X11 windowing system.
  # services.xserver.enable = true;

  # Configure keymap in X11
  # services.xserver.xkb.layout = "us";
  # services.xserver.xkb.options = "eurosign:e,caps:escape";

  # Enable CUPS to print documents.
  # services.printing.enable = true;

  # Enable sound.
  # services.pulseaudio.enable = true;
  # OR
  # services.pipewire = {
  #   enable = true;
  #   pulse.enable = true;
  # };

  # Enable touchpad support (enabled default in most desktopManager).
  # services.libinput.enable = true;

  # Define a user account. Don't forget to set a password with ‘passwd’.
  # users.users.alice = {
  #   isNormalUser = true;
  #   extraGroups = [ "wheel" ]; # Enable ‘sudo’ for the user.
  #   packages = with pkgs; [
  #     tree
  #   ];
  # };

  # programs.firefox.enable = true;

  # List packages installed in system profile.
  # You can use https://search.nixos.org/ to find more packages (and options).
  # environment.systemPackages = with pkgs; [
  #   vim # Do not forget to add an editor to edit configuration.nix! The Nano editor is also installed by default.
  #   wget
  # ];

  # Some programs need SUID wrappers, can be configured further or are
  # started in user sessions.
  # programs.mtr.enable = true;
  # programs.gnupg.agent = {
  #   enable = true;
  #   enableSSHSupport = true;
  # };

  # List services that you want to enable:

  # Enable the OpenSSH daemon.
  # services.openssh.enable = true;

  # Open ports in the firewall.
  # networking.firewall.allowedTCPPorts = [ ... ];
  # networking.firewall.allowedUDPPorts = [ ... ];
  # Or disable the firewall altogether.
  # networking.firewall.enable = false;

  # Copy the NixOS configuration file and link it from the resulting system
  # (/run/current-system/configuration.nix). This is useful in case you
  # accidentally delete configuration.nix.
  # system.copySystemConfiguration = true;

  # This option defines the first version of NixOS you have installed on this particular machine,
  # and is used to maintain compatibility with application data (e.g. databases) created on older NixOS versions.
  #
  # Most users should NEVER change this value after the initial install, for any reason,
  # even if you've upgraded your system to a new NixOS release.
  #
  # This value does NOT affect the Nixpkgs version your packages and OS are pulled from,
  # so changing it will NOT upgrade your system - see https://nixos.org/manual/nixos/stable/#sec-upgrading for how
  # to actually do that.
  #
  # This value being lower than the current NixOS release does NOT mean your system is
  # out of date, out of support, or vulnerable.
  #
  # Do NOT change this value unless you have manually inspected all the changes it would make to your configuration,
  # and migrated your data accordingly.
  #
  # For more information, see `man configuration.nix` or https://nixos.org/manual/nixos/stable/options#opt-system.stateVersion .
  system.stateVersion = "24.05"; # Did you read the comment?
}
It would be impractical for me to fully overview the Nix language here. Instead, I’ll touch on just enough syntax for a programmer to follow along with reading and writing a Nix configuration. For a full overview, check out the nix.dev language overview and the Nix manual.
Below is an attribute set, a widely used compound type. It is “a collection of name-value-pairs, where names must be unique”. This example from the nix.dev language overview shows a Nix attribute set alongside an equivalent object in JSON representation. It also shows off Nix’s basic types. I’ve reproduced it here:
{
  string = "hello";
  integer = 1;
  float = 3.141;
  bool = true;
  null = null;
  list = [ 1 "two" false ];
  attribute-set = {
    a = "hello";
    b = 1;
    c = 2.718;
    d = false;
  }; # comments are supported
}

{
  "string": "hello",
  "integer": 1,
  "float": 3.141,
  "bool": true,
  "null": null,
  "list": [1, "two", false],
  "object": {
    "a": "hello",
    "b": 1,
    "c": 2.718,
    "d": false
  }
}
Selecting a particular attribute from an attribute set can be done with the . operator:
nix-repl> { a = { b = 3; c = 4; }; }.a.b
3
Nix also has some sugar for defining nested attribute sets, with these two expressions being equivalent:
{
a = {
b = 3;
c = 4;
};
}
{
a.b = 3;
a.c = 4;
}
Finally, here is an anonymous function in Nix: x: x + 1. This expression is basically the same as lambda x: x + 1 in Python and x => x + 1 in JavaScript. As in many functional programming languages, function application in Nix looks like this:
nix-repl> (x: x + 1) 3
4
Although a basic NixOS configuration primarily assigns constants to attributes, you should note that an attribute’s paired value can be an arbitrary expression. For example:
nix-repl> { f = x: x + 1; }.f 3
4
With that out of the way, let’s focus on configuration.nix first. I’ll touch briefly on the big picture, but I recommend reading the NixOS Configuration File section and the Writing NixOS Modules section of the NixOS manual for more information. My hope is that this section is enough for a programmer to see that writing a NixOS configuration can be more like programming in a peculiar environment than merely configuring things in a weird language.
This is the structure of a NixOS configuration:
# reproduced from the NixOS Manual section "NixOS Configuration File"
{ config, pkgs, ... }: {
/* option definitions */
}
The starter NixOS configuration is defined as a Nix function. Instead of binding the argument to a simple name, the left-hand side is a pattern. This particular pattern only matches attribute sets that contain the attributes config and pkgs and then binds those values to config and pkgs within the scope of the function body. The ellipsis means that additional attributes are allowed in the argument attribute set. The function body indicates that this function, when applied, evaluates to another attribute set. Pretty simple so far.
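To make the pattern matching concrete, here is a minimal sketch that can be saved to a file and evaluated with nix-instantiate --eval --strict. The function name describe and the attributes name, editor, and extra are made up for illustration; the real config and pkgs values are far larger.

```nix
let
  # Hypothetical function using the same pattern shape as configuration.nix.
  describe = { config, pkgs, ... }: {
    summary = "${config.name} edits with ${pkgs.editor}";
  };
in
describe {
  config = { name = "agrotera"; };
  pkgs = { editor = "vim"; };
  # Accepted only because of the ellipsis in the pattern.
  extra = "ignored";
}
```

This should evaluate to { summary = "agrotera edits with vim"; }. If the ellipsis were removed from the pattern, Nix would reject the call because of the unexpected extra argument.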
The configuration’s output attribute set is composed of option definitions. To be extra clear, an option is a concept from NixOS and its module system (which will be defined later), not from Nix the language. For user-configuration purposes, defining an option just means binding a value to a particular attribute set key. Let’s start filling in this configuration.nix attribute set with some option definitions.
{ config, pkgs, lib, ... }:
{
time.timeZone = "America/New_York";
environment.systemPackages = [
pkgs.vim
];
}
…okay, but also like what? Where are these options coming from? And how do they get used?
I like to think of these options as stating overrides of defaults. Every option is declared somewhere, usually with a default value (an option declared without a default must be defined by someone). The user’s configuration module then defines the options it wants to change. Declared options are searchable with the NixOS options search. For example, here is the result for the time.timeZone option, which I defined with "America/New_York".
Many resources stop at “add option definitions to change your configuration”. After all, you can go a long way by only defining options with constants.
However, the suggested user NixOS configuration is still a function. For me, this raises the question, “what exactly are the inputs to this function?” Specifically, what arguments are provided for the config and pkgs “parameters” that so many examples name?
The manual does provide definitions for the possible “parameters” here, some more helpful than others. config, pkgs, and lib are the only ones generally relevant to a user configuring NixOS.
lib is pretty easy: it is just an attribute set defining the nixpkgs library. You can read the code here. It provides a variety of helpful functions that you can use.
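As a small taste of lib, here is a sketch that uses three of its functions outside of any NixOS configuration. It assumes <nixpkgs> is available on NIX_PATH; the attribute names joined, extras, and merged are only labels I picked for this example.

```nix
let
  # Assumption: <nixpkgs> resolves on this machine (e.g. via a channel).
  lib = import <nixpkgs/lib>;
in
{
  # Join a list of strings with a separator.
  joined = lib.concatStringsSep ", " [ "git" "vim" "ripgrep" ];

  # Return the list only when the condition holds, otherwise [ ].
  extras = lib.optionals false [ "mergerfs" ];

  # Merge nested attribute sets instead of replacing the top-level key.
  merged = lib.recursiveUpdate { a.b = 1; } { a.c = 2; };
}
```

Evaluating this with nix-instantiate --eval --strict should show the joined string, an empty list, and an attribute set a containing both b and c.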
I think pkgs is also fairly easy to wrap your head around. It is just a giant attribute set, mapping attributes like vim or ripgrep to a value referred to as a derivation. Diving into derivations is beyond the scope of this piece, but just know that a derivation defines something, like the vim package, that Nix knows how to build. In the vast majority of cases, the specific value of this giant pkgs attribute set is defined by the code in nixpkgs, which codifies build instructions for over 120,000 packages. In practice, these build artifacts get written to the Nix store. Just for your information, Nix store paths look like this:
/nix/store/${hash}-${package_name}
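To see the link between pkgs attributes and store paths, here is a rough sketch. It assumes <nixpkgs> is available on NIX_PATH, and the attribute names isDerivation and vimBinary are just labels for this example.

```nix
let
  # Assumption: <nixpkgs> resolves on this machine.
  pkgs = import <nixpkgs> { };
in
{
  # pkgs.vim is a derivation, not a string or a path.
  isDerivation = pkgs.lib.isDerivation pkgs.vim;

  # Interpolating a derivation into a string yields its store path,
  # so this becomes something like /nix/store/<hash>-vim-<version>/bin/vim.
  vimBinary = "${pkgs.vim}/bin/vim";
}
```

The exact hash and version depend on which nixpkgs revision is being evaluated, which is part of why pinning matters later on.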
config, on the other hand, confused me for a bit. The complete answer to this question is pretty interesting though, so bear with me.
The NixOS manual says that config is the result of “all options after merging the values from all modules together”. Not really helpful without a clear definition of a NixOS module.
The NixOS manual has a section explaining modules a little differently, but I think the best foothold comes from stating these two facts:
Fact #1: The canonical NixOS module is a function with the following structure (figure reproduced and lightly edited from the NixOS manual section, “Writing NixOS Modules”):
{ config, pkgs, lib, ... }:
{
imports =
[ # paths of other modules to evaluate
];
options = {
# option declarations
};
config = {
# option definitions
};
}
Fact #2: The configuration.nix from above is sugar for the following:
{ config, pkgs, lib, ... }:
{
imports = [];
options = {};
config = {
time.timeZone = "America/New_York";
environment.systemPackages = [
pkgs.vim
];
};
}
Now we can see one obvious source of the “values from all modules” being merged together: the user’s configuration, which is also a NixOS module.
configuration.nix is not the only NixOS module, of course. NixOS also defines built-in modules that follow the canonical structure; you can see the code here. Thus, the config “argument” is the giant attribute set created by merging all of the config attribute sets created by these NixOS modules. config is your complete NixOS configuration, already accessible when writing your configuration module.
Thanks to lazy evaluation, you can do funny things like this:
{ config, pkgs, lib, ... }:
{
imports = [];
options = {};
config = {
time.timeZone = "America/New_York";
environment.systemPackages = [
pkgs.vim
# pkgs.writeShellScriptBin is a descriptively-named function
# that takes two arguments: the script's name and its body.
# It adds the shebang line itself.
#
# $ print-my-time-zone
# America/New_York
(pkgs.writeShellScriptBin "print-my-time-zone" ''
echo "${config.time.timeZone}"
'')
];
};
}
That covers config, and we may as well complete the mental model of NixOS and its module system by talking about option declarations.
We established that a user provides option definitions by modifying the config attribute set. The valid keys of this attribute set are searchable with the NixOS options search, but where are the option declarations? The canonical structure of a NixOS module states that these declarations can be defined in the options attribute set.
Let’s look at a simplified example, connecting the time.timeZone option that was defined to its declaration in the NixOS source.
# significantly trimmed for brevity
{ config, lib, pkgs, ... }:
{
options = {
time.timeZone = lib.mkOption {
default = null;
type = timezone;
example = "America/New_York";
description = ''
The time zone used when displaying times and dates. See <https://en.wikipedia.org/wiki/List_of_tz_database_time_zones>
for a comprehensive list of possible values for this setting.
If null, the timezone will default to UTC and can be set imperatively
using timedatectl.
'';
};
};
config = {
environment.sessionVariables.TZDIR = "/etc/zoneinfo";
environment.etc =
{
# symlink share/zoneinfo in build output of pkgs.tzdata
# to /etc/zoneinfo
zoneinfo.source = "${pkgs.tzdata}/share/zoneinfo";
}
# // is the attribute set merge operator
# lib.optionalAttrs = cond: as: if cond then as else { };
// lib.optionalAttrs (config.time.timeZone != null) {
localtime.source = "/etc/zoneinfo/${config.time.timeZone}";
localtime.mode = "direct-symlink";
};
};
}
This example locale.nix module is representative of the code in the real locale.nix file. It declares an option, time.timeZone, which has a type (type definition omitted) and a default value of null.
One final note: options is like config, except that it is a big attribute set of option declarations. These declarations specify the valid keys and values of the config attribute set, which we can use to configure a NixOS system in an expressive fashion.
The coolest part is that you can declare an option in one module and then define it in another.
The values of these declared options can be referenced through config in option definition expressions, like in the argument to lib.optionalAttrs above.
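Declaring in one module and defining in another can be tried without a full NixOS system. The sketch below uses lib.evalModules, the nixpkgs function that evaluates a list of modules, and assumes <nixpkgs> is on NIX_PATH. The greeter.name and greeter.message options are hypothetical and exist only in this example.

```nix
let
  # Assumption: <nixpkgs> resolves on this machine.
  lib = import <nixpkgs/lib>;

  # Module A declares two hypothetical options and defines one of them
  # in terms of the other by reading the merged config.
  declares = { config, lib, ... }: {
    options.greeter.name = lib.mkOption {
      type = lib.types.str;
      default = "world";
      description = "Who to greet.";
    };
    options.greeter.message = lib.mkOption {
      type = lib.types.str;
      description = "The final greeting.";
    };
    config.greeter.message = "hello, ${config.greeter.name}";
  };

  # Module B uses the abbreviated form and only defines an option
  # that module A declared.
  defines = { ... }: {
    greeter.name = "agrotera";
  };

  result = lib.evalModules { modules = [ declares defines ]; };
in
# evaluates to "hello, agrotera"
result.config.greeter.message
```

Module A never mentions "agrotera", yet its definition of greeter.message picks it up, because config is the merged result of every module in the list.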
That brings it all together! To me, this is the minimum amount of information needed to read any NixOS module without feeling totally lost.
configuration.nix
Here is my initial annotated configuration.nix:
{ config, pkgs, lib, agenix, ... }: {
imports =
[ # Include the results of the hardware scan.
./hardware-configuration.nix
];
boot.loader.systemd-boot.enable = true;
nix.settings.experimental-features = [ "nix-command" "flakes" ];
time.timeZone = "America/New_York";
networking.hostName = "agrotera";
networking.hostId = "00000001";
## from https://xeiaso.net/blog/paranoid-nixos-2021-07-18/
security.sudo.execWheelOnly = true;
# learned the hard way that if your user has no password
# and you don't explicitly state that wheel does not need a password,
# it's just impossible to use sudo.
# this soft-locked me and required a re-install, because
# i couldn't update my NixOS configuration without sudo
security.sudo.wheelNeedsPassword = false;
age.secrets.ts_auth.file = ./secrets/ts_auth.age;
nix.settings.allowed-users = [ "@wheel" ];
## from https://xeiaso.net/blog/paranoid-nixos-2021-07-18/
services.openssh = {
enable = true;
settings = {
PermitRootLogin = "no";
PasswordAuthentication = false;
};
allowSFTP = false; # Don't set this if you need sftp
extraConfig = ''
AllowTcpForwarding yes
X11Forwarding no
AllowAgentForwarding no
AllowStreamLocalForwarding no
AuthenticationMethods publickey
'';
};
# disable creation of new users at runtime
users.mutableUsers = false;
users.users.seridescent = {
isNormalUser = true;
extraGroups = [ "wheel" ];
# disables logging in to this user using a password altogether
# this defaults to null, but i'll state it anyway
# see https://search.nixos.org/options?channel=24.05&show=users.users.%3Cname%3E.hashedPassword
hashedPassword = null;
openssh.authorizedKeys.keys =
[ "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAILRpk45aMtMZY+9MAysPHaWZA3hEPsB2feQUUz3Cn1mU mbp"
];
};
environment.defaultPackages = lib.mkForce [];
environment.systemPackages = with pkgs; [
git
vim
ripgrep
mergerfs
rsync
];
services.tailscale.enable = true;
services.tailscale.authKeyFile = config.age.secrets.ts_auth.path;
networking.firewall.enable = true;
networking.firewall.allowedTCPPorts = [ 22 ];
networking.firewall.trustedInterfaces = [ "tailscale0" ];
## boot disk configuration
fileSystems."/" =
{ device = "rpool/root";
fsType = "zfs";
};
fileSystems."/nix" =
{ device = "rpool/nix";
fsType = "zfs";
};
fileSystems."/home" =
{ device = "rpool/safe/home";
fsType = "zfs";
};
fileSystems."/persist" =
{ device = "rpool/safe/persist";
fsType = "zfs";
neededForBoot = true;
};
fileSystems."/boot" =
{ device = "/dev/disk/by-uuid/0632-1869";
fsType = "vfat";
options = [ "fmask=0077" "dmask=0077" ];
};
swapDevices =
[ { device = "/dev/disk/by-uuid/9ad578cd-8df9-4186-82e7-eca235f1aec8"; }
];
boot.supportedFilesystems.zfs = true;
# see https://search.nixos.org/options?channel=24.05&show=boot.zfs.forceImportRoot&from=0&size=50&sort=relevance&type=packages&query=boot.zfs
boot.zfs.forceImportRoot = false;
services.zfs.autoScrub.enable = true;
# "Erase your darlings" recommends disabling the disk scheduler when
# using ZFS in a set up where only part of the disk is ZFS.
# However, the kernel parameter "elevator=none" has since been deprecated,
# so I will use this udev rule from https://discourse.nixos.org/t/enable-none-in-the-i-o-scheduler/36566/3
services.udev.extraRules = ''
ACTION=="add|change", KERNEL=="sd[a-z]*[0-9]*|mmcblk[0-9]*p[0-9]*|nvme[0-9]*n[0-9]*p[0-9]*", ENV{ID_FS_TYPE}=="zfs_member", ATTR{../queue/scheduler}="none"
'';
boot.initrd.kernelModules = [ "zfs" ];
boot.initrd.systemd.enable = true;
# https://discourse.nixos.org/t/zfs-rollback-not-working-using-boot-initrd-systemd/37195/3
boot.initrd.systemd.services.rollback = {
description = "Rollback root filesystem to a pristine state on boot";
wantedBy = [
"initrd.target"
];
after = [
# reading this configuration back, i don't think i need this
"zfs-import-rpool.service"
];
before = [
"sysroot.mount"
];
path = with pkgs; [
zfs
];
unitConfig.DefaultDependencies = "no";
serviceConfig.Type = "oneshot";
script = ''
zfs rollback -r rpool/root@blank && echo " >> >> rollback complete << <<"
'';
};
services.sanoid = {
enable = true;
templates.backup = {
hourly = 36;
daily = 30;
monthly = 3;
autoprune = true;
autosnap = true;
};
datasets."rpool/safe" = {
useTemplate = [ "backup" ];
recursive = true;
};
};
fileSystems."/mnt/disks/internal" =
{ device = "/dev/disk/by-uuid/3b5c2d01-78ad-4a31-993c-0d4b6d5edef5";
fsType = "ext4";
};
fileSystems."/mnt/disks/seagate" =
{ device = "/dev/disk/by-uuid/507c5918-f81f-470a-9bac-36d4f6b883d2";
fsType = "ext4";
};
fileSystems."/mnt/disks/wd" =
{ device = "/dev/disk/by-uuid/5440af94-46b8-4108-8aff-e365173b052e";
fsType = "ext4";
};
fileSystems."/storage" =
{ device = "/mnt/disks/*";
fsType = "fuse.mergerfs";
options = [
"defaults"
"cache.files=off"
"moveonenospc=true"
"dropcacheonclose=true"
"minfreespace=200G"
];
};
# persistence added by Impermanence module
environment.persistence."/persist" = {
directories =
# recommended by the NixOS Manual
[ "/var/lib/nixos"
"/var/lib/systemd"
"/var/log/journal"
# /var/tmp is supposed to be persisted between boots, apparently
"/var/tmp"
# persist tailscale state
"/var/lib/tailscale"
# persist system configuration
"/etc/nixos"
];
files =
# recommended by the NixOS Manual
[ "/etc/zfs/zpool.cache"
"/etc/machine-id"
# for ssh service
"/etc/ssh/ssh_host_ed25519_key"
"/etc/ssh/ssh_host_ed25519_key.pub"
"/etc/ssh/ssh_host_rsa_key"
"/etc/ssh/ssh_host_rsa_key.pub"
];
};
# This option defines the first version of NixOS you have installed on this particular machine,
# and is used to maintain compatibility with application data (e.g. databases) created on older NixOS versions.
#
# Most users should NEVER change this value after the initial install, for any reason,
# even if you've upgraded your system to a new NixOS release.
#
# This value does NOT affect the Nixpkgs version your packages and OS are pulled from,
# so changing it will NOT upgrade your system - see https://nixos.org/manual/nixos/stable/#sec-upgrading for how
# to actually do that.
#
# This value being lower than the current NixOS release does NOT mean your system is
# out of date, out of support, or vulnerable.
#
# Do NOT change this value unless you have manually inspected all the changes it would make to your configuration,
# and migrated your data accordingly.
#
# For more information, see `man configuration.nix` or https://nixos.org/manual/nixos/stable/options#opt-system.stateVersion .
system.stateVersion = "24.05"; # Did you read the comment?
}
Since I’m not declaring any options, I’m using the abbreviated form of a NixOS module that lets users specify option definitions alongside some reserved keys like imports. If you’re curious like I was, you can actually just read the function that unifies these form variants here. It’s a nice little function for practicing reading some real NixOS code and teaching yourself new syntax.
I think you should read my configuration if you’re really curious what it specifically looks like, but you don’t have to. I did add comments indicating rationales for non-obvious option definitions, but a lot of it is just standard system stuff.
If you’re writing your own configuration, you can start with the generated one and/or collect option definitions that you discover online. I wouldn’t sweat it too much; NixOS is super easy to iterate on, and the build process will complain if your configuration is invalid for whatever reason.
There are still some loose ends to tie up. First of all, saying that pkgs is nixpkgs is true but imprecise. Like, packages have versions, but we haven’t specified them. How do updates work? Also, the configuration.nix referred to environment.persistence and age.secrets, which don’t show up in the NixOS options search.
To clear this up, I’ll show the last part of my initial configuration, flake.nix:
{
inputs = {
nixpkgs.url = "github:nixos/nixpkgs/nixos-unstable";
impermanence.url = "github:nix-community/impermanence";
agenix.url = "github:ryantm/agenix";
};
outputs = { self, nixpkgs, impermanence, agenix, ... }: {
nixosConfigurations.agrotera = nixpkgs.lib.nixosSystem {
system = "x86_64-linux";
modules = [
impermanence.nixosModules.impermanence
agenix.nixosModules.default
./configuration.nix
];
};
};
}
Another Nix concept, yay! Read more about the flake.nix file and Nix flakes generally here and here.
The Nix manual says that a flake is “a filesystem tree (typically fetched from a Git repository or a tarball) that contains a file named flake.nix in the root directory.” Thus, I have a flake that now looks like this:
nixos-configs/
├── configuration.nix
├── hardware-configuration.nix
└── flake.nix
A flake is generally referred to in Nix code by its metadata, specified in flake.nix. We’ll focus on outputs and inputs.
outputs is a function that takes an attribute set and returns an attribute set. The return value can have arbitrary values; its shape just has to match the expectations of whatever will use that return value, e.g. Nix tooling or third-party tools that consume flakes.
In my case, the NixOS tooling expects that I have an attribute, nixosConfigurations.<name>, that is mapped to a value it expects. The details don’t really matter for an end-user, and you can see that creating this NixOS-legible value involves calling the function nixpkgs.lib.nixosSystem with a concrete argument.
You can also pretty easily understand that I am including additional modules in my NixOS configuration. This highlights the composability of the NixOS module system. You can just add third-party code that extends your operating system and exposes a common API surface. These modules declare the additional options that are defined in my configuration.nix.
This whole discussion has been fairly abstract, with references to code and modules that are available to us but with no identified connection to real life. We can connect everything to concrete versions of code by explaining inputs.
A Nix flake can specify its dependencies with the inputs attribute set. Each key of this attribute set refers to another Nix flake. The exact type of this value is a flake reference, which generically tells Nix where to find a flake. The Nix flake tooling can understand a variety of reference formats, detailed here; in my flake, you can see that Nix knows how to fetch tarballs from GitHub repositories, letting me reference specific branches concisely.
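For a feel of what other flake references look like, here is a sketch of an inputs attribute set using three URL-style formats that I believe the flake tooling accepts. Only the first line matches my real flake; the scratch input and its path are hypothetical, and agenix is shown with a git+https reference purely for contrast.

```nix
{
  inputs = {
    # A branch of a GitHub repository (the form my flake uses).
    nixpkgs.url = "github:nixos/nixpkgs/nixos-unstable";

    # The same kind of repository, fetched as a Git repository over HTTPS.
    agenix.url = "git+https://github.com/ryantm/agenix";

    # A flake living in a local directory (hypothetical path).
    scratch.url = "path:/home/me/scratch-flake";
  };

  # Outputs left empty; this sketch is only about the input references.
  outputs = { self, ... }: { };
}
```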
Nix flakes don’t have to specify dependencies. The nixpkgs flake just specifies outputs and is built with the code in the nixpkgs repository. The vast majority of useful Nix flakes will have dependencies though.
When you use Nix flake tooling to build your flake, you can think of it as first “filling out” the inputs attribute set and converting it to an attribute set mapping flake names to their outputs. This attribute set becomes the argument that your flake’s outputs function is invoked with.
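That mental model can be written down as a toy in plain Nix. To be clear, this is not how the flake tooling is implemented; it is only a sketch of the "fill out the inputs, then call outputs" idea, with a made-up input named dep and a made-up greeting output.

```nix
let
  # A stand-in for the contents of a flake.nix.
  flake = {
    outputs = { self, dep, ... }: {
      greeting = "built against dep ${dep.version}";
    };
  };

  # Pretend the tooling already fetched each input and evaluated its outputs.
  resolvedInputs = {
    dep = { version = "1.0"; };
  };

  # self refers back to this flake's own outputs; lazy evaluation
  # is what makes the self-reference legal.
  outputs = flake.outputs (resolvedInputs // { self = outputs; });
in
# evaluates to "built against dep 1.0"
outputs.greeting
```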
However, git branches are moving targets; nixpkgs gets new commits every day, so just specifying a branch of nixpkgs is not very precise. Nix flakes provide reproducibility by pinning inputs. For example, a flake that is versioned by Git might be pinned to a specific commit. Generically, the contents of a flake at some point in time can be hashed to provide a unique content identifier.
When a Nix flake is built, the tooling either uses an existing flake.lock file or generates a new one that specifies all of the pinned inputs, thus tying a build to a specific version of the flake’s dependencies. All the information needed to reproduce a flake build can be tracked in a single git repository. Isn’t that awesome?! It’s the same power provided by language toolchains like Cargo, but generic!
Nix flakes also satisfy the ergonomics requirements. Like lockfiles in other ecosystems, a flake.lock can be handled like any other file in version control, providing a programmer-friendly handle for versioning and rollbacks. There’s also built-in tooling to update a flake’s inputs, sidestepping the pain of manually maintaining and updating long URIs in source code.
That covers all the code needed to perfectly reproduce my system! The last step is to finish the install. I copied my code over and ran one last incantation:
nixos@nixos:$ sudo nixos-install --flake '/etc/nixos#agrotera' --no-root-passwd
copying channel...
building the flake in path:/etc/nixos?lastModified=1720916609&narHash=sha256-kmvAFxwyBiyUIFp5EicnsIiRT3fLe2XQKa04qudY7kA%3D...
# this worked!!
nixos@nixos:$ reboot
Remove the install media, and we’re free!
…just kidding, the ZFS root pool failed to import on boot. However, I was warned about this by the NixOS options documentation for boot.zfs.forceImportRoot and some other sources; I booted with kernel parameter zfs_force=1 and then I was truly free.
I’m quite pleased by how everything turned out. I walked away from this confident that I could teach myself whatever I wanted to know in a sustainable fashion by reading Nix code. Maintaining this NixOS system for the past year has also been a pleasure, and I have happily tackled problems I would have shied away from before.
I would also recommend this general approach to doing manual installations. I didn’t want to fully script this process because a) I wanted to react to places where my expectations didn’t match reality and b) I wasn’t going to be repeating this installation. The installation process wasn’t just smooth sailing, but it was stress-free because of the upfront investment I made. When things broke spectacularly, a full re-install was just a few minutes of copy-pasting away.
After the installation and a lot of reading Nix reference materials, I found configuring observability and services quite easy. Being able to read the source code that the NixOS options search links to is an incredible superpower. I leave further configuration as an exercise for the reader, hopeful that you’ll find it similarly approachable :D
if you have any feedback on this piece, good or bad, i would love to hear it! see my home page for contact info :)
<3