I have used my file server for quite a while now. It always bothered me that I didn’t use ZFS for it. How can I migrate from an ext4 RAID to ZFS without having to buy another file server with the same storage space available?
Well, I managed to do so with the following approach. I had to buy new hard disks, but I wanted to get rid of the WD Green disks anyway.
Current Situation
- CentOS 6
- 6×3 TB hard disks
- 4×WD Red
- 2×WD Green
- RAID 6, 12 TB, ext4
- 2 available SATA ports
+-md0-(RAID6, 12TB)------------------------------------+
| |
| +-sda-+ +-sdc-+ +-sdd-+ +-sde-+ +-sdg-+ +-sdh-+ |
| |sda1 | |sdc1 | |sdd1 | |sde1 | |sdg1 | |sdh1 | |
| +-----+ +-----+ +-----+ +-----+ +-----+ +-----+ |
| 3TB 3TB 3TB* 3TB* 3TB 3TB |
+------------------------------------------------------+
* WD Green
Goals
- Use ZFS instead of ext4
- Use mirrored vdevs instead of RAID6
- Migrate from CentOS 6 to FreeBSD
- Replace the 2 existing WD Green disks with WD Red
- Make it possible to extend the storage pool in the future without the need to buy 6 hard disks of the same size
Approach
I will describe the approach at a high level here, before describing each command and step to follow in part 2.
New hard disks
First of all, I bought 2 new hard disks. It’s important that these have at least double the size of the largest existing hard disk in the RAID6. In my case these are two 6 TB (WD Red) hard disks.
Partitioning
One disk gets one big partition spanning the full disk (minus a few sectors to achieve optimal alignment). The other one is partitioned into two partitions, each at least the size of the largest existing hard disk in the existing RAID6. Keep in mind to use optimal alignment (-a opt) for them.
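In parted terms, this is roughly what the partitioning looks like; the device names match the diagrams further down and will differ on other systems:

# One big partition on the first new 6 TB disk (sdf in the diagrams)
parted -s -a opt /dev/sdf mklabel gpt
parted -s -a opt /dev/sdf mkpart primary 0% 100%

# Two partitions on the second new 6 TB disk (sdi in the diagrams),
# each large enough to hold the contents of one 3 TB disk
parted -s -a opt /dev/sdi mklabel gpt
parted -s -a opt /dev/sdi mkpart primary 0% 50%
parted -s -a opt /dev/sdi mkpart primary 50% 100%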
Create ZFS pool
The three partitions on the two new disks are striped in a new ZFS pool called zstorage. Each of these partitions will later get a mirror, so that the pool ends up as a stripe of 3 mirrors of 2 disks each. This ZFS pool already has the target size (12 TB) but lacks redundancy.
+-md0-(RAID6, 12TB)------------------------------------+
| |
| +-sda-+ +-sdc-+ +-sdd-+ +-sde-+ +-sdg-+ +-sdh-+ |
| |sda1 | |sdc1 | |sdd1 | |sde1 | |sdg1 | |sdh1 | |
| +-----+ +-----+ +-----+ +-----+ +-----+ +-----+ |
| 3TB 3TB 3TB* 3TB* 3TB 3TB |
+------------------------------------------------------+
* WD Green
+-zstorage-(zpool, 12 TB)-+
| |
| +-sdi-+ |
| |sdi1 |---vdev1 |
| |sdi2 |---vdev2 |
| +-----+ |
| 6TB |
|                         |
| +-sdf-+ |
| |sdf1 |---vdev3 |
| | | |
| +-----+ |
| 6TB |
+-------------------------+
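Creating the pool is then a single command. This is only a sketch, assuming the pool is created with ZFS on Linux (under FreeBSD the command is the same, only the device names differ); ashift=12 is my assumption for 4K-sector disks:

# Stripe the three partitions into a new pool called zstorage
zpool create -o ashift=12 zstorage /dev/sdi1 /dev/sdi2 /dev/sdf1
zpool status zstorage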
All data is copied from the RAID6 to this ZFS pool. There is no way to convert the existing ext4 file system into ZFS in place, so the files have to be copied. The copy also provides some redundancy during the migration: all files now exist both on the RAID6 and on the ZFS pool.
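The copy itself can be a plain rsync run; /mnt/raid6 is a hypothetical mount point for the old md0 file system:

# Copy everything, preserving hard links, ACLs and extended attributes
rsync -aHAX /mnt/raid6/ /zstorage/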
Move redundant disks from RAID6 to ZFS pool
RAID6 allows 2 disks to fail without losing data. I fail 2 disks manually to remove them from the RAID6 and add them to the ZFS pool.
+-md0-(RAID6, 12TB)------------------------------------+
| |
| +-sda-+ +-sdc-+ +-sdd-+ +-sde-+ +-sdg-+ +-sdh-+ |
| |sda1 | |sdc1 | |sdd1 | |sde1 | |sdg1 | |sdh1 | |
| +-----+ +-----+ +-----+ +-----+ +-----+ +-----+ |
| 3TB 3TB 3TB* 3TB* | | |
+----------------------------------------|--------|----+
| |
+-zstorage-(zpool, 12 TB)-+ | |
| | | |
| +-sdi-+ | | |
| |sdi1 |---vdev1 <----------------------+ |
| |sdi2 |---vdev2 <-------------------------------+
| +-----+ |
| 6TB |
|                         |
| +-sdf-+ |
| |sdf1 |---vdev3 |
| | | |
| +-----+ |
| 6TB |
+-------------------------+
* WD Green
The ZFS pool then looks like this.
+-zstorage-(zpool, 12 TB)---+
| |
| +-sdi-+ +-sdg-+ |
| |sdi1 |---vdev1---|sdg1 | |
| |sdi2 |---vdev2-+ +-----+ |
| +-----+ | |
| 6TB | +-sdh-+ |
| +-|sdh1 | |
| +-sdf-+ +-----+ |
| |sdf1 |---vdev3 |
| | | |
| +-----+ |
| 6TB |
+---------------------------+
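In commands, this step is roughly the following. I reuse the freed partitions as they are, which assumes they are at least as large as the partitions on the split 6 TB disk they mirror:

# Fail and remove two redundant disks from the RAID6 (it keeps running, degraded)
mdadm /dev/md0 --fail /dev/sdg1 --remove /dev/sdg1
mdadm /dev/md0 --fail /dev/sdh1 --remove /dev/sdh1

# Wipe the md metadata so ZFS gets clean devices
mdadm --zero-superblock /dev/sdg1
mdadm --zero-superblock /dev/sdh1

# Attach them as mirrors of the two partitions on the split 6 TB disk
# (zpool may ask for -f if it still sees old signatures)
zpool attach zstorage sdi1 /dev/sdg1
zpool attach zstorage sdi2 /dev/sdh1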
Replace the split disk partitions with hard disks
Now that the previously split hard disk partitions have a mirror each, they can be replaced by the smaller devices from the existing RAID6 to free the large new disk (so we can add it as a mirror for the other large disk).
This will degrade the RAID6. If something goes wrong, data might get lost.
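Before taking the RAID6 below the number of disks it needs, it doesn’t hurt to let ZFS verify the copy first; a scrub reads everything back and checks the checksums:

# Verify the data on the pool before degrading the RAID6 any further
zpool scrub zstorage
zpool status zstorage   # shows scrub progress and any errors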
+-sda-+ +-sdc-+ +-sdd-+ +-sde-+
|sda1 | |sdc1 | |sdd1 | |sde1 |
+-----+ +-----+ +-----+ +-----+
| | 3TB* 3TB*
| +-------replace----------+
+---------replace---------------+ |
| |
+-zstorage-(zpool, 12 TB)--------+ | |
| | | |
| +--------------------------------+ |
| | +--------------------------------+
| | | |
| | | |
| | | +-sdi-+ +-sdg-+ |
| | +->|sdi1 |---vdev1---|sdg1 | |
| +--->|sdi2 |---vdev2-+ +-----+ |
| +-----+ | |
| 6TB | +-sdh-+ |
| +-|sdh1 | |
| +-sdf-+ +-----+ |
| |sdf1 |---vdev3 |
| | | |
| +-----+ |
| 6TB |
+--------------------------------+
The ZFS pool then looks like this.
+-zstorage-(zpool, 12 TB)---+
| |
| +-sda-+ +-sdg-+ |
| |sda1 |---vdev1---|sdg1 | |
| +-----+ +-----+ |
| |
| +-sdc-+ +-sdh-+ |
| |sdc1 |---vdev2---|sdh1 | |
| +-----+ +-----+ |
| |
| +-sdf-+ |
| |sdf1 |---vdev3 |
| | | |
| +-----+ |
| 6TB |
+---------------------------+
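In commands, this step might look like this. The old array cannot survive losing two more disks anyway, so I assume it is simply stopped at this point; /mnt/raid6 is again the hypothetical mount point:

# Unmount and stop the old RAID6, then wipe the md metadata of the freed disks
umount /mnt/raid6
mdadm --stop /dev/md0
mdadm --zero-superblock /dev/sda1
mdadm --zero-superblock /dev/sdc1

# Swap the two partitions of the split 6 TB disk for the freed 3 TB disks;
# ZFS resilvers each new device from its remaining mirror
zpool replace zstorage sdi1 /dev/sda1
zpool replace zstorage sdi2 /dev/sdc1

# Wait until the resilver has finished before touching sdi
zpool status zstorage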
Re-add the large disk to the ZFS pool
The two remaining hard disks are the WD Green disks, which I planned to replace anyway. They are now free and not used by any pool.
Now, the last step is simply to repartition the large hard disk that was replaced in the previous step and add it as a mirror for the other large disk.
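Sketched, again with parted and zpool attach (run only after the replace from the previous step has finished resilvering):

# Repartition the freed 6 TB disk with one partition spanning the whole disk
parted -s -a opt /dev/sdi mklabel gpt
parted -s -a opt /dev/sdi mkpart primary 0% 100%

# Attach it as a mirror for the partition on the other 6 TB disk
zpool attach zstorage sdf1 /dev/sdi1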
After that, the ZFS pool looks like this.
+-zstorage-(zpool, 12 TB)---+
| |
| +-sda-+ +-sdg-+ |
| |sda1 |---vdev1---|sdg1 | |
| +-----+ +-----+ |
| |
| +-sdc-+ +-sdh-+ |
| |sdc1 |---vdev2---|sdh1 | |
| +-----+ +-----+ |
| |
| +-sdf-+ +-sdi-+ |
| |sdf1 |---vdev3---|sdi1 | |
| | | | | |
| +-----+ +-----+ |
| 6TB |
+---------------------------+
By replacing two of the 3 TB disks (one mirror pair), I can now grow the ZFS pool as needed. The two new disks don’t necessarily have to be 6 TB; they could be any size that is available for a good price in the future.
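Growing the pool later could look roughly like this; /dev/sdj1 and /dev/sdk1 stand for hypothetical partitions on two new, larger disks:

# Let a vdev grow automatically once both sides of its mirror are larger
zpool set autoexpand=on zstorage

# Replace one 3 TB mirror pair disk by disk, waiting for each resilver to finish
zpool replace zstorage sda1 /dev/sdj1
zpool replace zstorage sdg1 /dev/sdk1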
Conclusion
Copying all files and resilvering multiple times puts the hard disks under some stress. As I wanted to replace the two WD Green disks anyway, I tried to spare them as much stress as possible.
So that is what happened:
- copy from RAID6 to ZFS pool
  - read from: sdg, sdc, sda, sde, sdh, sdd
  - write to: sdi, sdf
- resilver sdg and sdh with sdi
  - read from: sdi
  - write to: sdg, sdh
- resilver sdc with sdi and sdg
  - read from: sdi + sdg (50 % each)
  - write to: sdc
- resilver sda with sdi and sdh
  - read from: sdi + sdh (50 % each)
  - write to: sda
- resilver sdi with sdf
  - read from: sdf
  - write to: sdi
This adds up to the following (r = read, w = write, in chronological order):
- sdd (WD Green, 3 TB): r
- sde (WD Green, 3 TB): r
- sda (WD Red, 3 TB): rw
- sdc (WD Red, 3 TB): rw
- sdg (WD Red, 3 TB): rwr
- sdh (WD Red, 3 TB): rwr
- sdf (WD Red, 6 TB): wr
- sdi (WD Red, 6 TB): wrrrw
sdd and sde are the WD Green disks. Both have only been read once before being replaced. I didn’t want to risk a failure by doing more than that, even though S.M.A.R.T. didn’t indicate any risk.
sdg and sdh have been read more often than sda and sdc. One pair had to be read twice, and I chose the two youngest hard disks for that.
sdi was put through the most stress during this operation. This hard disk was new. I could have chosen sdf as well.
If you are curious how this looks in reality, the detailed description of each step can be found in the second part.