ZFS Home NAS

Now that the system is already in its third week of running on ZFS, it is time to write down what has happened so far. This post describes the migration of my good old NSLU2 Debian server to OpenSolaris, with ZFS being the main reason for this step.

Requirements/Hardware

The hardware requirements were focused on storage, that is ZFS, with an eye on some virtualization capabilities in order to get rid of the original Linux server. That server had quite a few services running, which preferably should be able to run on OpenSolaris, either natively or within a VM. This led me to the following hardware requirements:

  • low-power Atom-based board (Atom D510)
  • passive cooling
  • loads of memory (4GB)
  • separate OS and data disk drives (SATA-2)
  • redundant data store (at least for the data disks)
  • Gigabit ethernet
  • OpenSolaris compatibility

Finally I decided to buy the following components (prices are rounded):

Case: JCP MI-103 (incl. 60W AC/DC power supply) 50€
Mainboard: SuperMicro X7SPA-H (Atom D510, passively cooled) 180€
Memory: 4GB SO-DIMM (667 non-ECC) 70€
Disks: 2x 120GB (OS), 2x 500GB (Data), separate vendors each (form factor 2.5 inch) 190€
DVD: Slimline DVD-RW (temporarily, for installation) 40€
Cables: bunch of SATA/power connectors 20€

Basic Installation

  1. Hardware installation was pretty straightforward. Of course, there are not enough slots for four HDDs, so some of them are properly mounted and the rest are simply lying around inside the case. Once I got everything up and running, power usage was around 33W, which is slightly more than I expected, but still acceptable for a 24/7 system.
  2. The next step was tweaking the BIOS. Apart from the usual stuff of enabling quiet/quick boot and disabling everything you’ll never need for storage, the major step is to switch the SATA controller from IDE-compatibility mode to AHCI mode, although this can be done later as well. From my understanding, this is what is referred to as JBOD mode when dealing with the controller.
  3. The installation of OpenSolaris 2009.06 went smoothly, with all hardware components supported out of the box. I started out with a full-disk installation onto one of the OS disks and a DHCP-based network configuration for one of the Ethernet ports.
  4. The next step was to mirror the OS disks. Root pool mirroring is not offered by the installer. Furthermore, you cannot simply mirror the complete disk, since the root pool requires regular partitions (not EFI) in order to remain bootable. This link describes the necessary operations. Remember to test what you have done, that is, check that the system actually comes up from either of the two drives (switching them on the SATA controller is a good start).
  5. Next, I added the two data disks as a mirrored data pool. In that case, it is perfectly fine to grab the whole disk rather than dealing with partitions (slices).
  6. Finally I removed the DVD drive from the case, wondering why I actually bought one in the first place.
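
For steps 4 and 5, the commands boil down to roughly the following. This is only a sketch under assumptions, not the exact procedure from the linked article: the device names (c7d0, c8d0, c9d0, c10d0) are placeholders, the pool name tank01 is the one used in the CIFS example further down, and the script merely prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: the device names are hypothetical, look yours up with `format`.
OS_DISK1=c7d0      # disk the installer used (assumed name)
OS_DISK2=c8d0      # second OS disk (assumed name)
DATA_DISK1=c9d0    # data disks (assumed names)
DATA_DISK2=c10d0

# Print each command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

mirror_root_pool() {
    # Give the second disk a single Solaris fdisk partition (no EFI label).
    run fdisk -B "/dev/rdsk/${OS_DISK2}p0"
    # Copy the slice layout of the first disk to the second one.
    # This is a pipeline, so it is handled outside of run().
    echo "+ prtvtoc /dev/rdsk/${OS_DISK1}s2 | fmthard -s - /dev/rdsk/${OS_DISK2}s2"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        prtvtoc "/dev/rdsk/${OS_DISK1}s2" | fmthard -s - "/dev/rdsk/${OS_DISK2}s2"
    fi
    # Attach slice 0 of the second disk, turning rpool into a mirror.
    run zpool attach -f rpool "${OS_DISK1}s0" "${OS_DISK2}s0"
    # Install GRUB on the second disk so it can boot on its own.
    run installgrub /boot/grub/stage1 /boot/grub/stage2 "/dev/rdsk/${OS_DISK2}s0"
    # Resilvering must finish before you try booting from the new disk.
    run zpool status rpool
}

create_data_pool() {
    # Whole disks are fine here; ZFS labels them itself.
    run zpool create tank01 mirror "$DATA_DISK1" "$DATA_DISK2"
}

mirror_root_pool
create_data_pool
```

Run it once as is to review the printed commands, then again with DRY_RUN=0 once the device names match your system.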

Configuration

Next up were various configuration steps, which I no longer remember in chronological order. All of the following steps relate to the configuration of the main system (the so-called global zone). Descriptions for setting up zones and virtual machines will follow.

Boot environments

First of all, it is essential to become familiar with the concept of boot environments, unless you are keen on reinstalling the DVD drive once you have messed up your system. Boot environments are clones of the root file system, configured as separate boot entries within GRUB. The main reason why this works is that the root file system (by default) comprises all the usual file system candidates, such as /var and /usr. One would never put those into one filesystem in Linux, but in OpenSolaris there is no actual need to separate them. You should rather put your real data into separate file systems (preferably on the data pool) and put the services using them into zones. So the best way is to create a new boot environment right from the start, leaving you with a safe point to return to in case you need it.

# beadm create -d "a suitable description, shown in GRUB" opensolaris-001
# beadm activate opensolaris-001
# init 6

Once rebooted into the new environment, it was time to start with some real configuration. Always remember that /export/home is not part of the root file system, so you can rely on the files lying around there being available in any boot environment.

Basic Networking

First, it was time to get rid of network automagic (NWAM) including DHCP, so I created a static configuration:

# svcs '*network/physical*'
....
# svcadm disable network/physical:nwam
# echo "<your home network> <your netmask>" >> /etc/inet/netmasks
# echo "<static IP>" > /etc/hostname.e1000g0
# vi /etc/resolv.conf             # add/edit your name-server
# svcadm enable network/physical:default

Next, I figured that the second Ethernet port was wastefully idling around, so why not use it? In general, you have three choices here:

  1. Assigning a second IP to the second port, so you can distribute your services across multiple IPs, e.g. one for iSCSI and one for the rest.
  2. Combining them using a link aggregation, such as LACP, so they share the same MAC. This approach requires a suitable switch, though.
  3. Combining them using bonding, that is providing the same IP to both interfaces. This effectively provides a layer-3 aggregation.

While the second approach provides both failover and increased bandwidth in both directions, it requires support on the switch. Thus, the third approach won the race, providing me with link-based failover and increased outgoing bandwidth. That suits a NAS, which will hopefully send more traffic than it receives. OpenSolaris provides bonding capabilities by means of IP Multipathing (IPMP). You simply have to choose a logical name for your IPMP group (prod0 in my case):

# echo "<static ip> mavstore-prod" >> /etc/hosts                 # let's out-source our IP
# echo "group prod0 -failover up" > /etc/hostname.e1000g0    # prod0 = IPMP group name
# echo "group prod0 -failover up" > /etc/hostname.e1000g1
# echo "ipmp group prod0 mavstore-prod up" > /etc/hostname.prod0
# init 6
# ...
# ipmpstat -a  # check IPMP status
# ipmpstat -i   # display interface stats (more detailed)

Pulling either of the cables now results in the spare interface taking over. Furthermore, outgoing traffic is spread over both interfaces.

Advanced Networking

To be prepared for some more network fun when it comes to setting up virtual machines and zones, I decided to set up a virtual network infrastructure as well. When it comes to virtual interfaces, OpenSolaris’ key solution is called Crossbow. So I started with one virtual interface (and a switch) managed from within the global zone:

# dladm create-etherstub vswitch0
# dladm create-vnic -l vswitch0 vnic0
# echo "<internal network> <its netmask>" >> /etc/inet/netmasks
# echo "<internal IP>" > /etc/hostname.vnic0

In order to be able to reach those internal addresses from my home network, we need to enable routing (remember to reconfigure your home-router as well):

# routeadm -e ipv4-forwarding
# routeadm -e ipv4-routing
# routeadm -u

Adding a new virtual machine or Solaris zone to this playground boils down to adding another virtual interface. While for zones it is sufficient to stick with random MAC addresses, for VirtualBox you have to use the static address that matches your network configuration within VirtualBox (bridged mode), e.g.:

# dladm create-vnic -l vswitch0 -m <static_mac_address> vnic1

Those additional interfaces are neither active nor managed within the global zone, so you won’t be able to see them with ifconfig -a. Furthermore, all etherstub-based interfaces use MTU 9000, so be prepared to use a suitable network driver for your virtual machine, i.e. one that allows changing the MTU.

Most likely one wants to run services within those virtual machines or zones that are accessible from the outside. Since most home routers refuse port forwarding to this kind of address, I used NAT to redirect traffic by port, e.g.:

# echo "rdr prod0 0.0.0.0/0 port 80 -> <virtual IP> port 80 tcp" >> /etc/ipf/ipnat.conf
# svcadm enable ipfilter
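
When more than one service needs forwarding, the rules can be generated instead of typed by hand. A small sketch, assuming the IPMP interface prod0 from above; the helper name rdr_rule and the address 192.168.100.10 are made up for illustration. It only prints rules in the same form as the one above:

```shell
#!/bin/sh
# rdr_rule IFACE PORT TARGET_IP [PROTO]
# Prints one ipnat redirect rule that forwards PORT on IFACE to the same
# port on TARGET_IP. Hypothetical helper, not part of ipfilter itself.
rdr_rule() {
    iface=$1
    port=$2
    target=$3
    proto=${4:-tcp}    # protocol defaults to tcp
    echo "rdr $iface 0.0.0.0/0 port $port -> $target port $port $proto"
}

# Example: forward HTTP and HTTPS to a VM at an assumed internal address.
for port in 80 443; do
    rdr_rule prod0 "$port" 192.168.100.10
done
```

Append the output to /etc/ipf/ipnat.conf once it looks right. Afterwards, ipnat -l should list the rules that are actually loaded.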

NTP

Setting up the Network Time Protocol service is simple. The tricky part, however, is to add time servers by IP, not by name, when you plan on using NAT. As far as I could figure out, once ipfilter is used for NAT, the services on the system are checked for individual firewall rules to be added dynamically. NTP does exactly this, just in case your default policy is to block incoming traffic. However, firewall rules added with DNS names cause the NTP service to fail (it switches to maintenance). So we better stick to IPs:

# cat << EOF > /etc/inet/ntp.conf
driftfile /var/ntp/ntp.drift

# 0.debian.pool.ntp.org
server 85.214.29.92
EOF
# svcadm enable ntp

Auto Snapshotting

With ZFS providing snapshot capabilities, the next step was to set up automatic snapshotting, just in case we accidentally delete some files that we may need tomorrow. OpenSolaris 2009.06 ships with SMF services for auto-snapshotting, so I enabled the desired ones:

# svcs auto-snapshot
# ...
# svcadm enable auto-snapshot:daily
# svcadm enable auto-snapshot:weekly
# svcadm enable auto-snapshot:monthly
# ...
# zfs list -t snapshot -r rpool
# ...

The default behavior of those services is to recursively snapshot everything. That’s definitely not what you want, especially when it comes to snapshotting swap zvols, so I tuned the default behavior:

# zfs set com.sun:auto-snapshot=false rpool        # this is inherited
# zfs set com.sun:auto-snapshot=true rpool/<any desired filesystem>
# ...

Automatic snapshots are not kept forever. The retention can be configured per service instance using svccfg. I reduced the retention of the daily snapshots to seven days, relying on the weekly snapshots to take over from there.
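
A sketch of how that change can look. I am assuming here that the retention is stored in the zfs/keep property (type astring) of the auto-snapshot:daily instance, so check the output of the listprop call before changing anything. The script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
FMRI=svc:/system/filesystem/zfs/auto-snapshot:daily
KEEP=7    # number of daily snapshots to retain

# Print each command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

set_retention() {
    # Show the current zfs/* properties first (property names are an assumption).
    run svccfg -s "$FMRI" listprop 'zfs/*'
    # Keep only the last $KEEP snapshots taken by this instance.
    run svccfg -s "$FMRI" setprop zfs/keep = astring: "$KEEP"
    # Make the running service pick up the new value.
    run svcadm refresh "$FMRI"
    run svcadm restart "$FMRI"
}

set_retention
```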

Auto Scrubbing

While ZFS provides so-called self-healing by means of checksumming, this only happens when data is actually read from a pool. Furthermore, in a mirrored setup reads alternate between the disks of the mirror, so a single read does not verify every copy. In order to make sure all your data is healthy, you have to scrub your pools regularly. Unfortunately, there is no such thing as an auto-scrub service in OpenSolaris yet. This link provides a suitable SMF service, which I installed except for the graphical configuration panel. Be sure to configure different offsets (zfs/offset) when using the same interval for both pools, so the scrubs won’t run in parallel.

Rsync

A main reason for having a 24/7 home NAS is the ability to back up any client data using rsync. This article describes how to combine the regular setup with ZFS snapshotting, the idea being to always create a snapshot after a successful synchronization. This protects you when you accidentally delete files on one of your client machines and then sync that deletion to the server: the files are still available in an earlier snapshot.
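
The idea can be sketched in a few lines. This is not the script from the linked article, just a minimal illustration that pulls a client directory onto the NAS; the dataset tank01/backup/laptop, the host name laptop and the helper names are assumptions.

```shell
#!/bin/sh
# Sketch: pull a client's home directory, then snapshot the dataset.
# Dataset, host and path names are placeholders.
DATASET=tank01/backup/laptop
SOURCE=maverick@laptop:/home/maverick/
TARGET=/$DATASET/    # default ZFS mountpoint of the dataset

# snapshot_name DATASET [STAMP] prints DATASET@rsync-STAMP.
# STAMP defaults to the current date and time.
snapshot_name() {
    echo "$1@rsync-${2:-$(date +%Y-%m-%d_%H%M)}"
}

backup() {
    # --delete mirrors deletions too; earlier snapshots keep the old files.
    # The snapshot is only taken if rsync finished successfully.
    rsync -a --delete "$SOURCE" "$TARGET" &&
        zfs snapshot "$(snapshot_name "$DATASET")"
}

# Only do real work when called as: <script> run
if [ "${1:-}" = "run" ]; then backup; fi
```

Because the snapshot is chained with &&, a failed or interrupted transfer never produces a snapshot of a half-synced state.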

NFS

NFS export management is fully integrated with ZFS, so one simply has to enable the NFS service itself and get going:

# svcadm enable -r nfs/server
# echo "<client IP> client" >> /etc/hosts
# zfs set sharenfs=root=client,rw=client pool/dataset
# dfshares

This enables the client machine to mount the share, e.g.:

mount.nfs server:/pool/dataset /mnt

Currently there is one problem with NFS exports interfering with CIFS shares: CIFS shares are exported via NFS as well, even if the sharenfs option is off. Since I don’t have a proper solution to this, I run the following after each reboot:

# zfs unshare -a
# zfs share -a

CIFS

In order to share your data with Windows systems (or if you simply prefer the integrated SMB handling in KDE), you can choose between the CIFS service integrated in OpenSolaris and Samba. I picked the former, as it is fully integrated with ZFS. First of all, one has to enable the service (preferably in workgroup mode):

# svcadm enable -r smb/server
# smbadm join -w MAVNET
# echo "other  password required  pam_smb_passwd.so.1  nowarn" >> /etc/pam.conf
# passwd maverick
... reenter your password ...

Resetting your password is essential if you want to use the same user for accessing a CIFS share, because the SMB password is only generated when a password is set after the PAM entry has been added. Thus it has to be repeated for every user who should have access.

Now, sharing a ZFS dataset as a CIFS share is as easy as:

# zfs create -o casesensitivity=mixed -o nbmand=on tank01/someshare
# zfs set sharesmb=name=someshare tank01/someshare
# sharemgr show -vp

VirtualBox

Finally, setting up VirtualBox was necessary, since I need Linux for some of my services as well. After downloading and extracting the package, it can be installed as follows:

# pkgadd -d <absolute path to package>

Preferably, one should use a ZFS volume for installing the virtual OS, so one can use snapshots there as well:

# zfs create -V 3G rpool/vms/debian
# chown maverick /dev/zvol/rdsk/rpool/vms/debian
# VBoxManage internalcommands createrawvmdk -filename debian.vmdk -rawdisk /dev/zvol/rdsk/rpool/vms/debian

After importing that VMDK file into VirtualBox, one can happily start installing the OS and whatever else is needed. Remember that you can specify a virtual NIC on your etherstub for bridged-mode networking.

Warning: Do not use bridged-mode networking on one of the adapters that make up your IPMP group, since that breaks the failover capabilities once VirtualBox is running!

In order to run VirtualBox as an SMF service, I downloaded a service description and SVC script from here. I had to apply some changes to them, though:

  1. SVC method
  2. SVC configuration

After copying the SVC method file to /lib/svc/method, one has to import the configuration and enable the service:

# svccfg import svc_cfg.xml
# svcadm enable vbox:debian

The result is quite acceptable, though I haven’t tried yet how it deals with multiple VM instances. Keep in mind, though, that the service is restarted automatically, so shutting down the VM from inside will effectively result in a restart. Also, remember to use snapshots and clones where applicable, e.g. once you have set up the base system. As usual, keep your data separated from your VM!

Comments

  1. ziggy
    April 3rd, 2010 at 06:39 | #1

    Have you experienced any network dropoffs of your machine as documented in a similar build located here?:

    http://sorenragsdale.livejournal.com/19875.html

  2. maverick
    April 3rd, 2010 at 16:47 | #2

    No, so far I haven’t experienced any network problems. The system has been running 24/7 for about 3-4 weeks now (100MBit though). Neither can I confirm the problems related to USB, since I’m using SATA drives. My backup drive is USB, but a backup only takes about 1 hour, so no problems there either.

  3. Taras
    April 6th, 2010 at 06:30 | #3

    Hi, I found your guide very useful in setting up the NAS box, given the fact that I have had at least 2 weeks on OpenSolaris. But can you show an example of how to use: # echo "<client IP> client" >> /etc/hosts

    What does one put in for <client IP> if I simply want to allow access to all devices on a router with 192.168.2.1 or something like that?

  4. maverick
    April 6th, 2010 at 06:39 | #4

    Unfortunately, I am not quite sure what the syntax for granting access to a whole subnet is. It should be something like sharenfs=root=@192.168.2.0/24,rw=@192.168.2.0/24. In your case, adding the client to /etc/hosts is not the proper way; this is only useful if you have dedicated client IPs. Hope this helps.

  5. March 17th, 2011 at 10:35 | #5

    Great article, like it.

    I don’t much like this statement though, ‘/var and /usr. One would never put those into one filesystem in Linux, but in OpenSolaris there is no actual need to separate them.’ To my mind there is as much a reason to break out /var as a separate filesystem on Solaris as any other UNIX OS. I realise this is a point of discussion at times but the fact is, it’s the most volatile of OS filesystems, so breaking it out facilitates better monitoring of its growth and visibility and control of the space usage of the files within it.

  6. maverick
    March 17th, 2011 at 10:46 | #6

    Of course, one may as well create a dedicated ZFS dataset for /var; however, managing boot environments becomes more tricky then. On the other hand, I have never used /var for storing real data, that is, all my media files, images, etc. are stored in dedicated ZFS datasets on the data pool anyway. As such, log files are the only remaining artefacts that may actually start filling /var.
