
SSD caching on a NAS sounds clever, but it's the wrong upgrade for most…


Browse home lab forums for a few minutes, and you’re bound to come across dozens of tips to boost your Network-Attached Storage server’s capabilities. Many of these, like enabling SMB multichannel and using an SSD as the boot drive, actually make a lot of sense, and can improve your storage server’s performance. But you’re just as likely to encounter advice on enabling random services – some of which can cause more harm than good.

SSD caches are one such aspect, as there’s a lot more to them than meets the eye. If you use ZFS as the underlying file system as I do, you’ll often see posts about three types of caches (well, technically two, because calling the last one a cache would be somewhat incorrect), and how they can speed up your NAS tasks. But having tinkered with each of them, I have to admit that they’re just not worth it for the average consumer. At best, they can improve really niche tasks, while the more complicated setups can even end up taking your entire storage pool down with them.

Starting with the most common SSD cache folks recommend using, the Level 2 Adaptive Replacement Cache (L2ARC) is a read cache meant to supplement the RAM-based ARC. Let’s say you’ve got 16GB of memory in your NAS, with 50% of it assigned to the RAM cache by TrueNAS. Depending on the storage capacity of your NAS, 8GB of ARC might be a little too low, so you can assign hundreds of GBs from any old SSD as a secondary cache in the form of L2ARC. On paper, the extra capacity, alongside the faster (compared to HDDs, I mean) speeds of an SSD, should boost read speeds significantly.

Unfortunately, there are a couple of annoying quirks with an SSD-powered read cache. For starters, constantly writing to the cache can shorten your SSD’s lifespan. So, you might want to opt for high-endurance drives if you don’t want your expensive NVMe drive to get worn out by frequent caching operations. Then there’s the fact that L2ARC only enhances read speeds for data that you access pretty often, meaning everything from backups to occasional movie sessions won’t really benefit from an SSD cache. Essentially, you’d end up expending your SSD’s endurance without noticeably improving your home lab tasks.
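If you want a rough idea of whether your reads are even missing the RAM cache before you sacrifice an SSD, the hit and miss counters are a decent place to start. Here's a minimal sketch that parses text in the `name type data` layout that OpenZFS on Linux exposes at `/proc/spl/kstat/zfs/arcstats`. The sample counters below are invented for illustration, and the helper names are my own, so treat it as a starting point rather than a proper monitoring tool.

```python
def parse_arcstats(text):
    """Parse kstat-style 'name type data' rows into a dict of integer counters."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank lines and the multi-field kstat header
        name, _kind, value = fields
        try:
            stats[name] = int(value)
        except ValueError:
            continue  # skip the 'name type data' column header row
    return stats


def hit_ratio(hits, misses):
    """Fraction of lookups served from the cache; 0.0 if there were none."""
    total = hits + misses
    return hits / total if total else 0.0


# Invented sample in the arcstats layout; real counters will differ.
SAMPLE = """\
name                            type data
hits                            4    950000
misses                          4    50000
l2_hits                         4    1000
l2_misses                       4    49000
"""

if __name__ == "__main__":
    stats = parse_arcstats(SAMPLE)
    print(f"ARC hit ratio:   {hit_ratio(stats['hits'], stats['misses']):.1%}")
    print(f"L2ARC hit ratio: {hit_ratio(stats['l2_hits'], stats['l2_misses']):.1%}")
```

On a real system, you'd read the file's contents into `parse_arcstats` instead of using `SAMPLE`. If the ARC is already serving the overwhelming majority of reads, an SSD sitting behind it has very little left to catch.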

Of course, you should encounter faster speeds when accessing the same set of files via your L2ARC setup. But unless you run VMs off a NAS, query databases constantly, or run games off network drives (yes, I’ve attempted it, and it works surprisingly well), there’s not much of a point in configuring a read cache on your SSD. And that’s assuming L2ARC can even hasten read operations in the first place. Every block stored in the L2ARC needs an index entry in RAM, so on a RAM-starved system, you could end up spending precious memory on that mapping table instead of on the ARC itself, thereby causing extra latency. Combine all that with the occasional cache miss, and you can see why L2ARC isn’t worth using in a home lab environment.
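To put a number on the RAM cost of that mapping table, here's a back-of-the-envelope sketch. It assumes every cached block costs a fixed number of bytes of RAM for its index entry. The default of 70 bytes is a commonly quoted ballpark and very much an assumption on my part (older ZFS releases were reportedly heavier), so plug in whatever figure applies to your version.

```python
GIB = 1024 ** 3
KIB = 1024


def l2arc_header_ram(l2arc_bytes, record_bytes, header_bytes=70):
    """Estimate the RAM needed to index a completely full L2ARC.

    header_bytes is an ASSUMPTION: the per-block bookkeeping cost varies
    between ZFS versions, so adjust it for your system.
    """
    blocks = l2arc_bytes // record_bytes  # how many cached blocks fit
    return blocks * header_bytes


if __name__ == "__main__":
    cache = 500 * GIB  # hypothetical 500 GiB L2ARC device
    for record in (128 * KIB, 16 * KIB):
        ram = l2arc_header_ram(cache, record)
        print(f"{record // KIB:>4} KiB blocks -> {ram / GIB:.2f} GiB of RAM")
```

Under those assumptions, a 500 GiB cache full of 128 KiB blocks needs roughly 273 MiB of RAM for bookkeeping, while the same cache full of 16 KiB blocks (the kind of small blocks VM storage tends to use) needs a bit over 2 GiB. On a 16GB NAS with 8GB of ARC, that's a quarter of your RAM cache gone before the SSD serves a single read.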


The Separate Intent Log (SLOG) can technically improve performance for write tasks, but it only affects those that are synchronous in nature. For reference, the majority of tasks in a typical NAS are asynchronous, meaning a write is considered complete as soon as it lands in RAM, rather than when it reaches the HDD like you’d expect. The problem with asynchronous writes is that if the power were to go out before the file system could flush the data from RAM to the hard drive, whatever was still sitting in memory would be lost. Synchronous writes, on the other hand, are sent to RAM and also recorded in a persistent log (the ZFS Intent Log, or ZIL) before the operation is deemed complete.
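If the difference between the two feels abstract, you can see it from the application's side with a few lines of Python. This is only a sketch of the general idea, not anything ZFS-specific: a plain write returns once the data has been handed to the operating system, while a write followed by `os.fsync` blocks until the OS reports the data as committed to stable storage. The `write_file` helper and the 4 MiB payload are my own choices for illustration.

```python
import os
import tempfile
import time


def write_file(path, payload, sync):
    """Write payload to path and return the elapsed time in seconds.

    With sync=True, the call does not return until fsync has completed.
    """
    start = time.perf_counter()
    with open(path, "wb") as f:
        f.write(payload)
        if sync:
            f.flush()             # push Python's buffer down to the OS
            os.fsync(f.fileno())  # ask the OS to commit to stable storage
    return time.perf_counter() - start


if __name__ == "__main__":
    payload = os.urandom(4 * 1024 * 1024)  # 4 MiB of throwaway data
    with tempfile.TemporaryDirectory() as tmp:
        for sync in (False, True):
            path = os.path.join(tmp, f"sync_{sync}.bin")
            elapsed = write_file(path, payload, sync)
            print(f"sync={sync}: {elapsed * 1000:.2f} ms")
```

The timings depend entirely on the storage underneath, so I won't pretend to predict them. On an HDD-backed dataset, though, the gap between the two is the waiting time that sync-heavy workloads like databases and VM disks have to put up with.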

In a conventional HDD-only pool, synchronous operations can take a while to wrap up, as they won’t be considered complete until the data gets written to the terribly slow ZIL on your hard drive(s). Tossing in an SSD as a dedicated log device (the SLOG) moves the ZIL onto faster storage and can expedite synchronous tasks quite a bit, especially if you’re storing entire VMs or databases over a network share on your NAS. But any ol’ backup operations and media archival tasks won’t benefit from your SLOG storage. And considering that you don’t need the abysmally slow sync operations for typical NAS workloads anyway, SLOG becomes yet another unnecessary addition to an average tinkerer’s storage hub.

Up until now, every SSD cache I’ve mentioned has one thing in common: losing it wouldn’t affect the actual storage pool it serves. If the drive bearing the L2ARC read cache breaks down, you can just swap it out and continue your backup tasks as you normally would. For SLOG drives, it’s better to opt for a mirrored setup, not because losing one would break your array, but because a failed SLOG would send your synchronous operations back to the slower HDD-based ZIL. Sure, you might end up losing the most recent synchronous writes if the SLOG drive dies and your NAS crashes or loses power before they reach the pool, but the overall pool shouldn’t get corrupted.

But the situation is radically different if you go down the Special Metadata VDEV route. Heck, it’s not even a cache, but since I’ve seen many folks treat it as one, I need to get its drawbacks off my chest. If you configure an SSD as a Special Metadata VDEV, the pool will store its metadata (and, if you configure it to, small data blocks) on the blazing-fast drive instead of the HDDs. To be fair, this feature will actually improve performance when you access directories within your pools, run indexing operations on your archived media, and execute scrub tasks.

The caveat? The Special VDEV is a part of your storage pool, and if anything were to happen to this SSD, you’d end up losing all the data stored within the overarching pool. As such, you’ll have to invest in multiple high-endurance drives and configure them in a mirrored setup to ensure your Special VDEV doesn’t ascend to tech heaven and take the rest of your files with it. High-end consumer SSDs are pricey enough, let alone their enterprise-grade counterparts, and NAS units with enough drive bays to support redundant Special VDEV setups for all your storage pools can cost an arm and a leg. So, I can’t recommend enabling this feature on your storage server.
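To sum up how the three device types differ when it comes to redundancy, here's a small sketch that builds (but deliberately doesn't run) the `zpool add` command for each role. The pool and device names are hypothetical, the `zpool_add_command` helper is my own, and refusing a non-mirrored special device is a policy choice of this sketch based on the risk described above, not something I'm claiming the `zpool` tool enforces in the same way.

```python
def zpool_add_command(pool, role, devices, mirror=False):
    """Return the argv list for adding a cache, log, or special device.

    Nothing is executed; the caller decides whether to run the command.
    """
    if role not in {"cache", "log", "special"}:
        raise ValueError(f"unknown role: {role!r}")
    if not devices:
        raise ValueError("at least one device is required")
    if role == "cache" and mirror:
        # L2ARC devices hold disposable copies and cannot be mirrored.
        raise ValueError("cache devices cannot be mirrored")
    if role == "special" and not mirror:
        # Policy of this sketch: losing a special vdev loses the pool.
        raise ValueError("refusing to build a non-mirrored special vdev")
    if mirror and len(devices) < 2:
        raise ValueError("a mirror needs at least two devices")

    cmd = ["zpool", "add", pool, role]
    if mirror:
        cmd.append("mirror")
    cmd.extend(devices)
    return cmd


if __name__ == "__main__":
    # Hypothetical pool and device names, for illustration only.
    print(" ".join(zpool_add_command("tank", "cache", ["nvme0n1"])))
    print(" ".join(zpool_add_command("tank", "log", ["sda", "sdb"], mirror=True)))
    print(" ".join(zpool_add_command("tank", "special", ["sdc", "sdd"], mirror=True)))
```

The asymmetry is the whole point: a single SSD is acceptable for L2ARC, a mirror is nice to have for SLOG, and a mirror is the bare minimum for a Special VDEV, which is why the last one eats two drive bays before it does anything useful.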


To be brutally honest with you, I started researching SSD caches just so I could stop the high-speed NVMe and SATA drives I’d pulled out of old hardware from gathering dust. But here’s the thing: tossing them into a NAS as storage pools for frequently accessed data makes a lot more sense for typical home labs than configuring L2ARC or SLOG setups. As long as you’ve got decent Ethernet speeds, you can just fire up a network share on your old SSD and use it to transfer files at breakneck speeds. That’s pretty much how my experiment on running Steam games off a NAS came to be, and I’ve still got some storage-hogging titles stored on an iSCSI share backed by my PCIe Gen 3 drive. Likewise, it can serve as a neat pool for the boot disks of experimental virtual guests, though you might want to back them up to a proper array if you’re attempting anything even remotely important with the VMs and containers stored on your old SSD.