Unfortunately, some HDDs do not respond properly to device inquiries about their sector sizes. ZFS looks to the physical sector size (aka physical block size) for its hint on how to optimize its use of the device. If the disk reports that the physical sector size is 512 bytes, then ZFS will use an internal sector size of 512 bytes. The problem is that some HDDs misrepresent 4KB sector disks as having a physical sector size of 512 bytes. The proper response should be that the logical sector size is 512 bytes and the physical sector size is 4KB. By 2011-2012, most HDDs were properly reporting logical and physical sector sizes. In some cases, the HDD vendors advertise the disks as "emulating 512 byte sectors" or "512e."
There is no functional or reliability problem with 4KB physical sectors being represented as 512 byte logical sectors. This technique has been used for decades in computer systems to allow expansion of device or address sizes. The general name for the technique is read-modify-write: when you need to write 512 bytes (or less than the physical sector size) then the device reads 4KB (the physical sector), modifies the data, and writes 4KB (because it can't write anything smaller). For HDDs, the cost can be a whole revolution, or 8.33 ms for a 7,200 rpm disk. Thus the performance impact for read-modify-write can be severe, and even worse for slower, consumer-grade, 5,400 rpm or variable speed "green" drives.
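The revolution figure quoted above follows directly from the spindle speed. A minimal sketch of the arithmetic, assuming the worst case where a read-modify-write costs one full extra revolution (the function name is only illustrative):

```python
def revolution_time_ms(rpm: float) -> float:
    """Time for one full platter revolution, in milliseconds."""
    # 60,000 ms per minute divided by revolutions per minute
    return 60_000.0 / rpm


# Compare a 7,200 rpm disk with a slower 5,400 rpm disk
for rpm in (7200, 5400):
    print(f"{rpm} rpm: {revolution_time_ms(rpm):.2f} ms per revolution")
```

At 5,400 rpm the same extra revolution takes roughly 11 ms, which is why slower drives are hit harder.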
Bottom line: for best performance, the HDD needs to properly communicate the physical block size via the inquiry commands.
Inside ZFS, the kernel variable that describes the physical sector size is ashift, the log2 of the physical sector size in bytes. A search of the ZFS archives turns up references to the ashift variable in discussions about sector sizes.
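The relationship between sector size and ashift can be sketched as follows. This is only an illustration of the log2 relationship, not ZFS code, and the helper name is hypothetical:

```python
def ashift_for(sector_size: int) -> int:
    """Return log2(sector_size) for a power-of-two sector size in bytes."""
    if sector_size <= 0 or sector_size & (sector_size - 1):
        raise ValueError("sector size must be a power of two")
    # For a power of two, bit_length() - 1 is the exact base-2 logarithm
    return sector_size.bit_length() - 1


print(ashift_for(512))   # 512 byte sectors
print(ashift_for(4096))  # 4KB sectors
```

The two calls yield 9 and 12, the values that appear throughout discussions of 512 byte and 4KB sector disks.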
- Some HDD models misrepresent their physical block size, resulting in unexpectedly poor performance for some workloads
- Attempting to replace an HDD that had 512 byte physical sectors with a new HDD that has 4KB logical and physical sectors can fail with a mismatched sector size error message
- Some, but not all, ZFS implementations offer command-line options to set the physical sector size, either in the zpool command or via other OS commands and configuration settings
- Older Solaris releases did not set the default sector boundaries on 4KB boundaries, negatively impacting performance (fixed in illumos, later OpenSolaris releases, Solaris 10 recent updates, and Solaris 11)
- 4KB sector disks are not as space-efficient as 512 byte sector disks, in part because ZFS metadata is compressed, dynamically allocated, and often less than 4KB physical size
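The space-efficiency point in the last item can be illustrated with a rough model in which every allocation is rounded up to a whole number of sectors. This is a simplified sketch: it ignores RAID-Z parity, padding, and other allocator details, and the function name is hypothetical.

```python
def allocated_bytes(payload: int, ashift: int) -> int:
    """Round a payload up to a whole number of 2**ashift byte sectors."""
    sector = 1 << ashift
    # Ceiling division, then scale back to bytes
    return -(-payload // sector) * sector


# A small compressed metadata block of 1,536 bytes (1.5KB)
for ashift in (9, 12):
    used = allocated_bytes(1536, ashift)
    print(f"ashift={ashift}: {used} bytes allocated")
```

Under this model the 1.5KB block fits exactly in three 512 byte sectors, but occupies a full 4KB sector when ashift is 12.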
This page will try to share some knowledge and offer good operating practices for ZFS on illumos.
zpool add Commands
The physical sector size is queried when the zpool create or zpool add commands are executed. In these cases, a new top-level virtual device (vdev) is created and the ashift value is set. If the disks have mixed physical sector sizes, then an error message is shown.
When 4KB physical sector size disks (ashift = 12) are added to a pool containing 512 byte physical sector disks (ashift = 9), or vice versa, the resulting pool contains top-level vdevs with mixed sector sizes. ZFS functions properly with mixed-size top-level vdevs.
Note: attempting to replace disks with 512 byte physical sectors with disks that only support 4KB logical sectors can fail, leading to operational issues with stocking spares.
Overriding the Physical Sector Size
illumos can override the physical sector size through configuration of the sd(7d) driver. Once the device vendor and identification strings are known, the /kernel/drv/sd.conf file can be modified to apply the override.
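As a rough illustration of what such an override entry might look like, the sketch below formats a line for sd.conf. It assumes the sd-config-list convention of a vendor ID padded with spaces to 8 characters followed by the product ID, and a physical-block-size property. The helper name and the device strings are hypothetical, and the exact syntax should be checked against the sd(7d) documentation for the release in use.

```python
def sd_config_entry(vendor: str, product: str, physical_block_size: int = 4096) -> str:
    """Format an sd-config-list line (assumed syntax, verify against sd(7d))."""
    if len(vendor) > 8:
        raise ValueError("SCSI vendor ID is at most 8 characters")
    # Assumption: vendor ID is space-padded to 8 characters, then the product ID
    device_id = vendor.ljust(8) + product
    return (
        f'sd-config-list = "{device_id}", '
        f'"physical-block-size:{physical_block_size}";'
    )


# "ATA" and "EXAMPLE-MODEL" are placeholder strings, not a real device
print(sd_config_entry("ATA", "EXAMPLE-MODEL"))
```

Generating the line this way mainly guards against miscounting the padding spaces in the vendor field, which is easy to get wrong by hand.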
- Using GNOP to emulate 4KB blocks over 512B (to create the pool via FreeBSD, such as a "zfsguru" LiveCD; you can then reboot into OI):