Linux Costa Blanca: samba4 cluster for AD : DRBD ocfs2 CTDB

With proper DC fail-over just around the corner, we turn our attention to that most neglected of forest workhorses: the file server. We've always wanted a second file server for our domain, mainly for peace of mind. DFS was no good in a domain with both Windows and Linux workstations. It just had to be HA.

In true open source style, getting started in HA is strictly for rocket scientists. The documentation doesn't help: apart from the excellent plain-English DRBD guide, what there is tends to be out of date, has bits missing and assumes essential detail. So beware. The learning curve is steep and beginner support almost non-existent. Anyway, let's have a go. If you can build a Samba4 DC from source, you can do this, it says here.

aim
To build a 2 node cluster. Both nodes up. One fails, the other takes over. Simple.

diagram
All cluster articles have to have an unintelligible diagram with lines, arrows and IP addresses all over the place. So here's the only one which made any sense to us; from the DRBD link.

easy to understand 2 node cluster

As we understand it, we have smbd running on node 1. If that fails, both the ip and the smbd process will be transferred to node 2. We have a mirrored disk between the two.

dramatis personae
- an AD domain
- 2 spare computers
- 4 network interface cards
- a crossover cable
- 2 straight cables
- a table with everything close by
- 2 spare disks

Connect the first network card on each node to the switch using the straight cables. Connect the crossover cable between the two second cards. Some cards detect the type of cable you have attached, others don't. For this post, and to save me getting the screwdriver out again, I've done it with VMs.

overview
Each node has 2 physical ethernet cards. One each is LAN side. These will carry both the physical IP for the node on that subnet and the public IPs, allocated by CTDB, which are used as the IPs for the cluster itself. The other pair carry the synchronisation traffic on a private subnet and are connected with the crossover cable. Both nodes will run smbd and winbind and will be configured with the same netbios name. A third bonding interface must be configured to load balance the actual IP of the node and the domain IPs which the workstations will use when requesting data. The slave interface for the bonding will be the card out to LAN. Unlike the crossover internal network, this interface will not be configured with an IP at operating system level. I think the term is, 'the physical interface is enslaved in the bonded interface'.

domain
We shall join the cluster to the following domain:
hh3.site
DC: hh16, 192.168.1.16

cluster hostnames and addresses
node 1: smb1 192.168.1.82/24 192.168.0.10/24
node 2: smb2 192.168.1.83/24 192.168.0.11/24

interfaces

The 1.x subnet goes out to the LAN, the 0.x is the private crossover.

The physical interfaces are enp0s3 and enp0s8, with enp0s3 enslaved to bond0. bond0 carries the node's LAN address plus the fail-over IPs which CTDB allocates for the domain to use.

This is one place where openSUSE's Yast really saves you time and energy, otherwise it's back to editing files and wondering about the syntax.
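
If you'd rather not use YaST, the bonding described above can be written by hand. Here is a minimal sketch for node 1, assuming openSUSE's ifcfg files under /etc/sysconfig/network and active-backup mode. The mode and the miimon value are our assumptions, not something we tested against the alternatives. The script only writes into a scratch directory, so nothing live is touched.

```shell
#!/bin/sh
# Sketch only: writes example ifcfg files for node 1 into a scratch directory.
# Copy them to /etc/sysconfig/network/ by hand once you are happy with them.
# Assumption: active-backup mode with link monitoring every 100 ms.
write_bond_ifcfg() {
    outdir=$1
    mkdir -p "$outdir"

    # The bond carries the node's LAN address; CTDB adds the public IPs later.
    cat > "$outdir/ifcfg-bond0" <<'EOF'
STARTMODE='auto'
BOOTPROTO='static'
IPADDR='192.168.1.82/24'
BONDING_MASTER='yes'
BONDING_SLAVE0='enp0s3'
BONDING_MODULE_OPTS='mode=active-backup miimon=100'
EOF

    # The enslaved card gets no address of its own.
    cat > "$outdir/ifcfg-enp0s3" <<'EOF'
STARTMODE='hotplug'
BOOTPROTO='none'
EOF
}

write_bond_ifcfg "${1:-./ifcfg-sketch}"
```

For node 2, change IPADDR to 192.168.1.83/24. The crossover card, enp0s8, keeps an ordinary static address on the 0.x subnet.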

On VirtualBox, add a second network adapter with the cable connected, and add a 2 GB disk.

security
Kill the firewall and apparmor. Work out the ports and files later.
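
For when the firewall goes back on, here is a rough sketch of the ports we would expect to need. 7789 and 7777 are the ports set in the drbd.conf and cluster.conf further down. 4379 is CTDB's default port, and 445 and 139 are the usual SMB ports. The interface names and the use of plain iptables rules are assumptions. The script prints the rules rather than applying them.

```shell
#!/bin/sh
# Sketch: print (not apply) iptables rules for the cluster traffic.
# Assumptions: enp0s8 is the crossover card, bond0 faces the LAN,
# CTDB listens on its default TCP port 4379.
print_cluster_rules() {
    # private subnet: DRBD (drbd.conf), OCFS2/o2cb (cluster.conf), CTDB
    for port in 7789 7777 4379; do
        echo "iptables -A INPUT -i enp0s8 -p tcp --dport $port -j ACCEPT"
    done
    # LAN side: SMB over TCP and the NetBIOS session service
    for port in 445 139; do
        echo "iptables -A INPUT -i bond0 -p tcp --dport $port -j ACCEPT"
    done
}

print_cluster_rules
```

Treat the list as a starting point to check against your own logs, not a finished rule set.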

software
We're on openSUSE 13.1. Both nodes have un-provisioned Samba 4.1.9. You need DRBD 8.4; going by the documentation for 8.3, the two are different enough to matter. On openSUSE 13.1, this meant adding the ha and network:samba repositories. We had problems with CTDB 2.3, but 2.5.3 from the same repo went in fine. For the file system, forget ext4: it is not a cluster file system, and with both nodes primary it just crashes spectacularly. ocfs2 works fine and screams between the nodes with anything you can throw at it. You also need ocfs2-tools.

configuration
identical ON BOTH nodes unless specified otherwise:

make the DC the only DNS server:
/etc/resolv.conf
search hh3.site
nameserver 192.168.1.16

/etc/krb5.conf
[libdefaults]
default_realm = HH3.SITE
dns_lookup_realm = false
dns_lookup_kdc = true
default_ccache_name = /tmp/krb5cc_%{uid}

/etc/hosts

127.0.0.1 localhost.localdomain localhost

192.168.0.10 smb1

192.168.0.11 smb2

/etc/HOSTNAME

node 1: smb1

node 2: smb2

/etc/drbd.conf
global {
usage-count yes;
}
common {
net {
protocol C;
}
}

resource r0 {
handlers {
split-brain "/usr/lib/drbd/notify-split-brain.sh steve";
}

net {
allow-two-primaries;
after-sb-0pri discard-zero-changes;
after-sb-1pri discard-secondary;
after-sb-2pri disconnect;
}
startup {
become-primary-on both;
}
on smb1 {
device /dev/drbd1;
disk /dev/sdb1;
address 192.168.0.10:7789;
meta-disk internal;
}
on smb2 {
device /dev/drbd1;
disk /dev/sdb1;
address 192.168.0.11:7789;
meta-disk internal;
}
}

/etc/ocfs2/cluster.conf
(make the folder yourself!)
node:
ip_port = 7777
ip_address = 192.168.0.10
number = 1
name = smb1
cluster = ocfs2
node:
ip_port = 7777
ip_address = 192.168.0.11
number = 2
name = smb2
cluster = ocfs2
cluster:
node_count = 2
name = ocfs2
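
o2cb is picky about the layout of this file. As far as we know, each stanza header must start in the first column, its parameters must be indented with a tab, and stanzas want a blank line between them. Rather than fight the editor, here is a small sketch which prints the file with that layout, using the values from this post. The function names are ours.

```shell
#!/bin/sh
# Sketch: print cluster.conf with the tab indentation o2cb expects.
node_stanza() {
    # $1 = node number, $2 = node name, $3 = private IP
    printf 'node:\n'
    printf '\tip_port = 7777\n'
    printf '\tip_address = %s\n' "$3"
    printf '\tnumber = %s\n' "$1"
    printf '\tname = %s\n' "$2"
    printf '\tcluster = ocfs2\n\n'
}

write_cluster_conf() {
    node_stanza 1 smb1 192.168.0.10
    node_stanza 2 smb2 192.168.0.11
    printf 'cluster:\n\tnode_count = 2\n\tname = ocfs2\n'
}

# Prints to stdout; redirect to /etc/ocfs2/cluster.conf on both nodes when ready.
write_cluster_conf
```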

/etc/ctdb/nodes
192.168.0.10
192.168.0.11

/etc/ctdb/public_addresses
192.168.1.80/24 bond0
192.168.1.81/24 bond0

make these event scripts executable (chmod +x):
/etc/ctdb/events.d/
-rwxr--r-- 1 root root 7713 Jul 4 13:06 10.interface
-rwxr--r-- 1 root root 1102 Jul 4 13:06 11.routing
-rwxr--r-- 1 root root 1070 Jul 4 13:06 49.winbind
-rwxr--r-- 1 root root 3491 Jul 4 13:06 50.samba
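
To save typing on both nodes, the four scripts can be done in one go. A small sketch; the function name is ours and the directory defaults to the one used in this post.

```shell
#!/bin/sh
# Sketch: make the four CTDB event scripts we need executable.
# $1 defaults to the events.d directory used in this post.
enable_ctdb_events() {
    dir=${1:-/etc/ctdb/events.d}
    for script in 10.interface 11.routing 49.winbind 50.samba; do
        if [ -f "$dir/$script" ]; then
            chmod u+x "$dir/$script"
        else
            echo "missing: $dir/$script" >&2
        fi
    done
}
```

Run enable_ctdb_events as root with no argument on each node, then check the result with ls -l.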

/etc/sysconfig/ctdb
CTDB_PUBLIC_ADDRESSES=/etc/ctdb/public_addresses
CTDB_MANAGES_SAMBA=yes
CTDB_SAMBA_SKIP_SHARE_CHECK=yes
CTDB_NFS_SKIP_SHARE_CHECK=yes
CTDB_MANAGES_WINBIND=yes
CTDB_MANAGES_VSFTPD=no
CTDB_MANAGES_ISCSI=no
CTDB_MANAGES_NFS=no
CTDB_MANAGES_HTTPD=no
CTDB_INIT_STYLE=
CTDB_SERVICE_SMB=smb
CTDB_SERVICE_WINBIND=winbind
CTDB_NODES=/etc/ctdb/nodes
CTDB_NOTIFY_SCRIPT=/etc/ctdb/notify.sh
CTDB_DBDIR=/var/lib/ctdb
CTDB_DBDIR_PERSISTENT=/var/lib/ctdb/persistent
CTDB_EVENT_SCRIPT_DIR=/etc/ctdb/events.d
CTDB_SOCKET=/var/lib/ctdb/ctdb.socket
CTDB_TRANSPORT="tcp"
CTDB_MONITOR_FREE_MEMORY=100
CTDB_START_AS_DISABLED="yes"
CTDB_CAPABILITY_RECMASTER=yes
CTDB_CAPABILITY_LMASTER=yes
NATGW_PUBLIC_IP=
NATGW_PUBLIC_IFACE=
NATGW_DEFAULT_GATEWAY=
NATGW_PRIVATE_IFACE=
NATGW_PRIVATE_NETWORK=
NATGW_NODES=/etc/ctdb/natgw_nodes
CTDB_LOGFILE=/var/log/messages
CTDB_DEBUGLEVEL=2
CTDB_OPTIONS=

/etc/samba/smb.conf
[global]
workgroup = HH3
netbios name = SMBCLUSTER
realm = HH3.SITE
security = ADS
kerberos method = secrets only
winbind enum users = Yes
winbind enum groups = Yes
winbind use default domain = Yes
winbind nss info = rfc2307
idmap config * : backend = tdb
idmap config * : range = 19900-19999
idmap config HH3 : backend = ad
idmap config HH3 : range = 20000-4000000
idmap config HH3 : schema_mode = rfc2307
clustering = Yes
ctdbd socket = /var/lib/ctdb/ctdb.socket
[users]
path = /cluster/users
read only = No
[profiles]
path = /cluster/profiles
read only = No
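
The two idmap ranges in the [global] section decide where an id comes from: the HH3 range holds the uidNumber and gidNumber values stored in AD, and the small default range is for everything else, such as the BUILTIN groups. A quick sketch to check which range a given id falls into; the function name is ours and the numbers are the ones from the smb.conf above.

```shell
#!/bin/sh
# Sketch: say which idmap range from smb.conf a uid or gid falls into.
idmap_domain() {
    id=$1
    if [ "$id" -ge 20000 ] && [ "$id" -le 4000000 ]; then
        echo "HH3 (ad backend, rfc2307 attributes)"
    elif [ "$id" -ge 19900 ] && [ "$id" -le 19999 ]; then
        echo "* (tdb backend)"
    else
        echo "outside both ranges"
    fi
}

idmap_domain 3000092   # the uidNumber we give stevec later on
idmap_domain 20513     # the gidNumber for domain users
idmap_domain 19901     # an id handed out from the default range
```

Any uidNumber or gidNumber you set in AD has to land inside the HH3 range, or winbind will not map it.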

/etc/nsswitch.conf
passwd: files winbind
group: files winbind
hosts: files dns

partition the spare disk
use fdisk or YaST to end up with a single partition, /dev/sdb1, so that:
fdisk -l
gives:
Disk /dev/sda: 12.9 GB, 12884901888 bytes, 25165824 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x000f1bbc

Device Boot Start End Blocks Id System
/dev/sda1 2048 1525759 761856 82 Linux swap / Solaris
/dev/sda2 * 1525760 25165823 11820032 83 Linux

Disk /dev/sdb: 2147 MB, 2147483648 bytes, 4194304 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x000ab2e5

Device Boot Start End Blocks Id System
/dev/sdb1 2048 4194303 2096128 83 Linux

If you are on a VM, clone the machine and disk at this stage.

create the drbd metadata

drbdadm create-md r0

writing meta data...

initializing activity log

NOT initialized bitmap

New drbd meta data block successfully created.

Success.

start drbd
drbdadm up r0

start the sync

Bring up drbd on node 2 in the same way. We chose to synchronise from node 1. The direction only matters if you have data, so be careful: if either node already holds data and you have not repartitioned it, make that node the sync source. The other node will be overwritten.

node 1:

drbdadm primary --force r0

Wait until:

cat /proc/drbd

responds with:

version: 8.4.4 (api:1/proto:86-101)

GIT-hash: 3c1f46cb19993f98b22fdf7e18958c21ad75176d build by SuSE Build Service

1: cs:Connected ro:Secondary/Secondary ds:UpToDate/UpToDate C r-----

monitoring the initial synchronisation
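
Rather than re-running cat by hand, the wait can be scripted. A minimal sketch which polls the status text until the connection is up and both disks report UpToDate. The function names and the 5 second interval are ours.

```shell
#!/bin/sh
# Sketch: succeed once the DRBD status text shows both disks UpToDate.
# $1 is a file in /proc/drbd format (defaults to /proc/drbd itself).
drbd_in_sync() {
    grep -q 'cs:Connected.*ds:UpToDate/UpToDate' "${1:-/proc/drbd}" 2>/dev/null
}

# Poll every 5 seconds until the initial sync has finished.
wait_for_sync() {
    until drbd_in_sync "$@"; do
        sleep 5
    done
    echo "r0 is in sync"
}
```

Call wait_for_sync on either node after starting the sync. With only one resource defined, matching any line is good enough; with several you would want to match on the device number as well.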

setup ocfs2-tools

/etc/init.d/o2cb configure

Configuring the O2CB driver.

This will configure the on-boot properties of the O2CB driver.
The following questions will determine whether the driver is loaded on
boot. The current values will be shown in brackets ('[]'). Hitting
<ENTER> without typing an answer will keep that current value. Ctrl-C
will abort.

Load O2CB driver on boot (y/n) [n]: y

Cluster stack backing O2CB [o2cb]:

Cluster to start on boot (Enter "none" to clear) []: ocfs2

Specify heartbeat dead threshold (>=7) [31]:

Specify network idle timeout in ms (>=5000) [30000]:

Specify network keepalive delay in ms (>=1000) [2000]:

Specify network reconnect delay in ms (>=2000) [2000]:

Writing O2CB configuration: OK

Loading filesystem "configfs": OK

Loading stack plugin "o2cb": OK

Loading filesystem "ocfs2_dlmfs": OK

Creating directory '/dlm': OK

Mounting ocfs2_dlmfs filesystem at /dlm: OK

Setting cluster stack "o2cb": OK

Registering O2CB cluster "ocfs2": OK

Setting O2CB cluster timeouts : OK

create the file system on node 1

mkfs -t ocfs2 -N 2 -L stevescluster /dev/drbd1

mkfs.ocfs2 1.8.2

Cluster stack: classic o2cb

Label: ocfs2_drbd01

Features: sparse extended-slotmap backup-super unwritten inline-data strict-journal-super xattr indexed-dirs refcount discontig-bg

Block size: 4096 (12 bits)

Cluster size: 4096 (12 bits)

Volume size: 2146332672 (524007 clusters) (524007 blocks)

Cluster groups: 17 (tail covers 7911 clusters, rest cover 32256 clusters)

Extent allocator size: 4194304 (1 groups)

Journal size: 67108864

Node slots: 2

Creating bitmaps: done

Initializing superblock: done

Writing system files: done

Writing superblock: done

Writing backup superblock: 1 block(s)

Formatting Journals: done

Growing extent allocator: done

Formatting slot map: done

Formatting quota files: done

Writing lost+found: done

mkfs.ocfs2 successful

make a mount point for the cluster on both nodes:

mkdir /cluster

become primary on node 1
drbdadm primary r0

mount the cluster on node 1

mount /dev/drbd1 /cluster

make the samba shares:

mkdir /cluster/users

mkdir /cluster/profiles

join the domain from node 1 only:

net ads join -UAdministrator

Enter Administrator's password:

Using short domain name -- HH3

Joined 'SMBCLUSTER' to dns domain 'hh3.site'

Not doing automatic DNS update in a clustered setup.

on the dc, add the round robin DNS entries
samba-tool dns add hh16 hh3.site smbcluster A 192.168.1.80
samba-tool dns add hh16 hh3.site smbcluster A 192.168.1.81
samba-tool dns add hh16 1.168.192.in-addr.arpa 80 PTR smbcluster.hh3.site
samba-tool dns add hh16 1.168.192.in-addr.arpa 81 PTR smbcluster.hh3.site

on node 1, start CTDB (hold on tight!)

systemctl start ctdb && ctdb enable

on the DC, create a domain user, stevec
samba-tool user add stevec
then edit him:

ldbedit -e joe --url=/usr/local/samba/private/sam.ldb cn=stevec

to contain these attributes:

uidNumber: 3000092

gidNumber: 20513

unixHomeDirectory: /home/users/stevec

loginShell: /bin/bash

homeDrive: Z:

homeDirectory: \\smbcluster\users\stevec

profilePath: \\smbcluster\profiles\stevec

set the permissions on the profiles share

chmod 1777 /cluster/profiles

testing
with both nodes primary, mount /dev/drbd1 on each. Here is node 1:
drbdadm up r0
drbdadm primary r0
mount /dev/drbd1 /cluster
mount | grep drbd
/dev/drbd1 on /cluster type ocfs2 (rw,relatime,_netdev,heartbeat=local,nointr,data=ordered,errors=remount-ro,atime_quantum=60,coherency=full,user_xattr,acl)

test the synchronisation by creating a file on node 1 and checking that it appears on node 2, and vice versa
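
One way to do that without squinting at ls output is a marker file which records who wrote it. A small sketch; the function names and the file name sync-test.txt are ours.

```shell
#!/bin/sh
# Sketch: prove both nodes see the same file system.
# Run write_marker on one node, then check_marker on the other.
write_marker() {
    # $1 = mount point; records which host wrote the file
    echo "written by $(uname -n)" > "$1/sync-test.txt"
}

check_marker() {
    # $1 = mount point; prints the marker, or fails if it never arrived
    if [ -f "$1/sync-test.txt" ]; then
        cat "$1/sync-test.txt"
    else
        echo "no marker in $1" >&2
        return 1
    fi
}
```

On node 1 run write_marker /cluster, on node 2 run check_marker /cluster, then swap the nodes round and repeat.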

start ctdb
systemctl start ctdb

tail the logs in a second root terminal:
tail -f /var/log/messages

on node 2:

ctdb enable
on node 1:
ctdb enable

node 1 tail:

2014/07/13 08:52:52.755117 [recoverd: 2779]: Takeover run starting

2014/07/13 08:52:53.411916 [ 2614]: Takeover of IP 192.168.1.81/24 on interface bond0

2014-07-13T08:52:54.300701+02:00 smb1 avahi-daemon[384]: Registering new address record for 192.168.1.81 on bond0.IPv4.

2014/07/13 08:52:55.241254 [recoverd: 2779]: Takeover run completed successfully

node 2 responds:

2014/07/13 08:52:30.341070 [recoverd: 2775]: Reenabling takeover runs

2014/07/13 08:52:50.977141 [recoverd: 2775]: Node 0 has changed flags - now 0x0 was 0x4

2014/07/13 08:52:52.750525 [recoverd: 2775]: Disabling takeover runs for 60 seconds

2014/07/13 08:52:52.772956 [ 2611]: Release of IP 192.168.1.81/24 on interface bond0 node:0

2014/07/13 08:52:52.934313 [ 2611]: 10.interface: flock: failed to execute /sbin/iptables: No such file or directory

2014-07-13T08:52:53.272856+02:00 smb2 avahi-daemon[382]: Withdrawing address record for 192.168.1.81 on bond0.

2014/07/13 08:52:55.234398 [recoverd: 2775]: Reenabling takeover runs
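
With avahi and everything else chattering in the same log, the address moves are easy to miss. A sketch which pulls out only the takeover and release lines and trims them to action, address and interface. The function name is ours and the pattern is based on the log lines shown above.

```shell
#!/bin/sh
# Sketch: list the public IPs this node has taken over or released,
# reading CTDB's lines from a syslog-style file (default /var/log/messages).
ctdb_ip_moves() {
    grep -E 'Takeover of IP|Release of IP' "${1:-/var/log/messages}" |
        sed -E 's/.*(Takeover|Release) of IP ([0-9.]+)\/[0-9]+ on interface ([^ ]+).*/\1 \2 \3/'
}
```

Run ctdb_ip_moves on each node after enabling or disabling one of them, and the two lists should mirror each other.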

check that smbd and winbind are started:
ps aux|grep smbd

root 10360 0.9 1.4 48416 7328 ? Ss 21:18 0:00 /usr/sbin/smbd
root 10383 0.1 0.6 48416 3360 ? S 21:18 0:00 /usr/sbin/smbd
ps aux|grep win
root 10322 0.0 0.7 26396 3676 ? Ss 21:18 0:00 /usr/sbin/winbindd
root 10335 0.0 1.0 26496 5276 ? S 21:18 0:00 /usr/sbin/winbindd
root 10361 0.0 0.9 29388 4604 ? S 21:18 0:00 /usr/sbin/winbindd
root 10362 0.0 1.1 32532 5736 ? S 21:18 0:00 /usr/sbin/winbindd

root 10364 0.0 0.8 26436 4092 ? S 21:18 0:00 /usr/sbin/winbindd

winbind: this looks familiar

id stevec

uid=3000092(stevec) gid=20513(domain users) groups=20513(domain users),19901(BUILTIN\users)

getent group Domain\ Users

domain users:x:20513:

getent passwd stevec

stevec:*:3000092:20513:stevec:/home/users/stevec:/bin/bash

Windows domain clients

stevec on xp served from each of node 1 and node 2

Linux domain clients

To mount the same shares automatically, see our cifs autofs post. To test this manually, don't forget to give the mount a key. If the client's smb.conf has

kerberos method = system keytab

then joining the client to the domain will have left a suitable key in the client's keytab. That is on the client ONLY, not on the clustered file servers, which use 'secrets only'. On the client:

mount.cifs //smbcluster/users /home/users -osec=krb5,username=CATRAL$,multiuser

Where catral is the hostname of a Linux client. In this screenshot, the client is running sssd and began life on node 1.

stevec on a Linux client transparently served from node 2

That should be enough to get you started. It sure beats DFS. We now need to re-add the firewall and apparmor, automate start-up and then see if we can break it. I wonder how it will do under load? Then decide if and when we can go public. Maybe we should go the whole hog and add Pacemaker to the mix? Fence wobbly nodes? STONITH it?

13.7.14
