ZFS on Linux: zfs-fuse 0.4.0_alpha1
Friday, December 29th, 2006
As a slightly late Christmas present, Ricardo Correia posted the first version of zfs-fuse with write support, which is a major advance in bringing ZFS to Linux. The idea behind zfs-fuse is to port ZFS by reusing as much of the OpenSolaris code as possible, while talking to the Linux kernel through the FUSE interface. FUSE lets you implement a filesystem in user space rather than kernel space, which has a variety of pros and cons. On the pro side, since your code runs in user memory space, coding errors will not bring down the entire system. On the con side, a user space implementation will almost certainly be slower than a kernel space one. And on the legal side, until OpenSolaris is released under a GPL-compatible license, it would be difficult for anyone to distribute a port that put the CDDL-licensed ZFS code into the Linux kernel. By pushing that code into user space, you sidestep the license issue entirely without having to reimplement ZFS from scratch.
At this point, zfs-fuse carries huge warnings about performance and stability, but I was curious to see how slow it really was, so I took it for a spin on one of our AMD64 systems: a dual-core Athlon 64 5000+ with 2 GB of memory running Scientific Linux 4.4. Since I didn't have a spare disk for testing, I created the zpool on an 8 GB file on one of the existing partitions instead. To be fair, I created an ext2 filesystem in another 8 GB file and mounted it via the loopback device. During all tests, one unrelated CPU-intensive job was running.
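The file-backed setup described above can be sketched as follows. This is a minimal sketch, not the exact procedure used here: it assumes Python 3.8+, root access for the printed commands, a running zfs-fuse daemon, and that zfs-fuse's zpool accepts a regular file as a vdev the way OpenSolaris does. The pool name `testpool`, the helper names, and all paths are hypothetical. The script only prints the commands (a dry run) rather than executing them.

```python
import os
import shlex
import tempfile
from typing import List

GIB = 1 << 30


def make_backing_file(path: str, size_bytes: int) -> None:
    """Create a sparse file of size_bytes; disk blocks are allocated only when written."""
    with open(path, "wb") as f:
        f.truncate(size_bytes)


def setup_commands(zfs_file: str, ext2_file: str, pool: str, ext2_mount: str) -> List[List[str]]:
    """Hypothetical helper: commands (run as root) that turn the two files into test filesystems."""
    return [
        ["zpool", "create", pool, zfs_file],         # file-backed vdev (needs the zfs-fuse daemon)
        ["mkfs.ext2", "-F", "-q", ext2_file],        # -F: allow a regular file, not a block device
        ["mkdir", "-p", ext2_mount],
        ["mount", "-o", "loop", ext2_file, ext2_mount],  # ext2 via the loopback device
    ]


if __name__ == "__main__":
    # Demonstrate sparse-file creation with a small file in a throwaway directory.
    with tempfile.TemporaryDirectory() as d:
        demo = os.path.join(d, "demo.img")
        make_backing_file(demo, 64 * 1024 * 1024)  # the post used 8 GB files (8 * GIB)
        print(demo, os.path.getsize(demo), "bytes")
    # Dry run with hypothetical paths: print the commands instead of running them.
    for cmd in setup_commands("/scratch/zfs.img", "/scratch/ext2.img", "testpool", "/mnt/ext2test"):
        print(shlex.join(cmd))
```

Using sparse files means the 8 GB backing files cost almost no real disk space until the benchmark writes into them.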
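Here are the results from running bonnie++. The empty Per Chr columns mean the per-character tests were not run, and "+++++" means a test finished too quickly for bonnie++ to report a meaningful rate: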
ext2
Version 1.03        ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
nubar5.localdoma 4G           25557   5 15637   3           47999   4 118.3   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  6161  98 +++++ +++ +++++ +++  6573  98 +++++ +++ 19256  99
zfs
Version 1.03        ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
nubar5.localdoma 4G           23899   1 13453   2           40544   2 122.6   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  6173   6 17566   9  5813   6  3718   3 18367   9  6847   6
zfs (compression=on)
Then for fun, I turned zfs compression on and tried bonnie++ again:
Version 1.03        ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
nubar5.localdoma 4G           51171   2 37265   4          120310   5  1796   2
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  5653   6 18051  11  3204   3  2959   5 19936   7  5261   4
The first time I ran this test, the zfs-fuse daemon process died with a failed assertion. I had to manually unmount the zpool tree, restart the daemon, and then remount the filesystems with zfs mount. On the second attempt, everything worked. The results are highly skewed in favor of compression because bonnie++ writes highly non-random data: the compression ratio for the test files was 28x! With this enormous compression factor, most of the data fit into the disk cache, and the test went very fast. (Don't take these results seriously, of course.)
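To illustrate why highly non-random data inflates the compressed results, here is a small sketch that compares a repeating byte pattern with random bytes. It uses Python's zlib as a stand-in: ZFS's compression=on used the lzjb algorithm, not zlib, and the repeating pattern is an assumption, not the data bonnie++ actually writes.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Uncompressed size divided by compressed size, using zlib at its fastest level."""
    return len(data) / len(zlib.compress(data, 1))


SIZE = 1 << 20  # 1 MiB sample

# Hypothetical benchmark-like data: one short pattern repeated over and over.
repetitive = b"0123456789abcdef" * (SIZE // 16)
# Incompressible data for comparison.
random_data = os.urandom(SIZE)

print(f"repetitive: {compression_ratio(repetitive):.1f}x")
print(f"random:     {compression_ratio(random_data):.2f}x")
```

The repetitive buffer shrinks by a large factor, while the random buffer does not shrink at all (its ratio sits just under 1.0 because of zlib's framing overhead). A filesystem compressing the first kind of data writes very little to disk and keeps most of it in cache, which is the effect seen in the compression=on table.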
zfs (checksum=off)
Finally, I disabled the checksum option to see whether it had any visible impact on performance. Turning off checksums in ZFS disables them only for data blocks; metadata blocks are always checksummed.
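To make concrete what checksum=off skips, here is a sketch of a Fletcher-4-style block checksum (four running 64-bit sums over 32-bit words, in the style of one of the checksum algorithms ZFS offers) together with the rule described above: data blocks skip the checksum when it is off, metadata blocks never do. The helper `block_checksum` and its `checksum_prop` parameter are hypothetical, this is not the zfs-fuse code path, and it does not explain the slower reads measured below.

```python
import struct
from typing import Optional, Tuple

MASK64 = (1 << 64) - 1
Checksum = Tuple[int, int, int, int]


def fletcher4(block: bytes) -> Checksum:
    """Fletcher-4-style checksum over 32-bit little-endian words, with 64-bit wraparound sums."""
    if len(block) % 4:
        raise ValueError("block length must be a multiple of 4 bytes")
    a = b = c = d = 0
    for (word,) in struct.iter_unpack("<I", block):
        a = (a + word) & MASK64
        b = (b + a) & MASK64
        c = (c + b) & MASK64
        d = (d + c) & MASK64
    return a, b, c, d


def block_checksum(block: bytes, is_metadata: bool, checksum_prop: str) -> Optional[Checksum]:
    """Hypothetical helper: checksum=off skips data blocks only; metadata is always checksummed."""
    if checksum_prop == "off" and not is_metadata:
        return None
    return fletcher4(block)


if __name__ == "__main__":
    blk = struct.pack("<4I", 1, 2, 3, 4)
    print(block_checksum(blk, is_metadata=False, checksum_prop="off"))  # data block: skipped
    print(block_checksum(blk, is_metadata=True, checksum_prop="off"))   # metadata: still checksummed
```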
Version 1.03        ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
nubar5.localdoma 4G           24985   1 14295   2           32105   3 116.5   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  4615   6  5023   3  2439   3  1802   2 18812  11  7768   7
The block read speed is actually slower (32105 K/sec versus 40544 K/sec with checksums on)!? I don't understand this at all; clearly something strange is going on.
Conclusions
Given the handicap of running both filesystems effectively in loopback mode, the results are actually pretty promising. zfs-fuse block writes are about 6.5% slower than ext2 (rewrites about 14% slower), and block reads are about 15% slower. That's not bad, considering how unoptimized zfs-fuse is at this stage and all the non-performance benefits ZFS has to offer. Hopefully in a few weeks I can give zfs-fuse a whirl on some real disks and test a more realistic use case.
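The relative slowdowns can be recomputed from the block I/O columns of the ext2 and zfs tables. This short sketch copies those K/sec figures by hand; the dictionary keys are just labels chosen for illustration.

```python
# Block I/O throughput in K/sec, copied from the ext2 and zfs bonnie++ tables.
ext2 = {"block write": 25557, "rewrite": 15637, "block read": 47999}
zfs = {"block write": 23899, "rewrite": 13453, "block read": 40544}


def slowdown_percent(baseline: float, measured: float) -> float:
    """How much slower `measured` is than `baseline`, as a percentage of the baseline."""
    return 100.0 * (baseline - measured) / baseline


for test in ext2:
    pct = slowdown_percent(ext2[test], zfs[test])
    print(f"{test:12s} zfs-fuse is {pct:.1f}% slower than ext2")
# block write: 6.5%, rewrite: 14.0%, block read: 15.5%
```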
