Effective backup/recovery process for OpenLDAP

Making sure you never lose a single piece of data is a genuinely difficult task. A point-in-time backup (snapshot) of a permanently living and changing environment cannot, on its own, meet a zero-data-loss expectation.

In today’s post the focus is on an OpenLDAP backup/recovery process designed to never lose a bit of data (well, maybe the last transaction in the case of a power outage).

Most online resources describe the OpenLDAP backup/recovery process as:

  • For backup: run a slapcat command in a cron job and send the output to a backup server
  • For recovery: fetch the last meaningful backup from the backup server and reload it with a slapadd command (both sketched just below)
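
In command form, that conventional plan boils down to two lines (the database number and paths are illustrative; note that slapadd must be run with slapd stopped):

#> slapcat -n 2 > /backup/maindit-bk.ldif    # nightly, from cron
#> slapadd -n 2 -l /backup/maindit-bk.ldif   # on recovery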

Simple, isn’t it? Well, it is simple, but it simply does not prevent significant data loss. Let’s highlight two cases that demonstrate the limits of this backup plan.

Case 1

Let’s take a moderately busy service that inserts an average of 1,000 new users per day into its directory. Backups are made (using the slapcat command) every day at midnight. Now, one day at 8.00pm, a hard drive crashes (no RAID), or the filesystem gets corrupted, or whatever other reason you want to come up with… It is time for recovery. We set up a new VM or a new drive, set up OpenLDAP again, fetch the last meaningful backup and load it with a slapadd command. The OpenLDAP server is back to yesterday’s state, but what about the roughly 900 entries that were inserted today? Simply gone. That is why you must have a redundant set of OpenLDAP servers via replication. But replication is not a backup plan in itself.

Case 2

As a precaution you set up a master/slave scheme (a.k.a. provider/consumer in LDAP terms). So even if the main OpenLDAP server crashes, you do have an up-to-date copy. But since to err is human, if an employee inadvertently removes an important set of data, this change will be replicated to all your slave OpenLDAP servers and the data won’t be recoverable from them. Restoring yesterday’s backup will leave you in the same state as Case 1, and data will have been lost.

Solution

Design of an infrastructure effective for the backup/recovery process

To be able to almost never lose a bit of OpenLDAP data, the infrastructure to deploy relies heavily on the accesslog overlay provided by OpenLDAP.

The accesslog overlay is used to keep track of all or selected operations on a particular DIT (the target DIT) by writing details of the operations as entries to another DIT (the accesslog DIT). The accesslog DIT can be searched using standard LDAP queries. Accesslog overlay parameters control whether to log all or a subset of LDAP operations (logops) on the target DIT, to save related information such as the previous contents of attributes or entries (logold and logoldattr) and when to remove log entries from the accesslog DIT.

Definition from http://www.zytrax.com/books/ldap/ch6/accesslog.html

Accesslogs are mainly used for replication and audit purposes. In the diagram above, our slaves will never be masters of any other OpenLDAP server; they use the accesslog as a real-time operations backup in case the master OpenLDAP server becomes unavailable for any reason.
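
Concretely, on the master, the overlay is attached to the main DIT so that every write gets recorded in the accesslog DIT. A minimal slapd.conf sketch, with an illustrative suffix and retention values:

# inside the main DIT database section: record writes into cn=accesslog
overlay accesslog
logdb "cn=accesslog"
logops writes
logold (objectclass=*)      # keep old attribute values, needed to audit/undo changes
logsuccess TRUE             # only record operations that actually succeeded
logpurge 07+00:00 01+00:00  # keep 7 days of log entries, purge-check daily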

Backup Process

Just as most resources out there describe it, the backup process is a slapcat command, run as a cron job, dumping the needed DITs and their related accesslog DITs:

#> slapcat -n 2 > maindit-bk.ldif     # the main DIT (database number 2 here)
#> slapcat -n 3 > maindital-bk.ldif   # its accesslog DIT (database number 3)
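
Run from cron, this could look like the sketch below (database numbers, paths and destination host are illustrative; with the mdb backend slapcat is safe on a live database, with older backends you may prefer to stop slapd first, as in the practice section further down):

# /etc/cron.d/ldap-backup -- nightly dump of the main DIT and its accesslog
0 0 * * * root slapcat -n 2 > /backup/maindit-bk.ldif; slapcat -n 3 > /backup/maindital-bk.ldif; scp /backup/*-bk.ldif backup@backuphost:/backup/ldap/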

Recovery Process

This is how the recovery process works:

  1. Load the last meaningful backup of the needed DIT with the slapadd command
  2. Load the accesslog from either the backup or the slave’s accesslog, whichever fits best; do not forget to clean the accesslog if you are trying to recover from an erroneous action (see the sketch after this list)
  3. Set the DIT to be a consumer of the freshly loaded accesslog
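
To clean an erroneous action out of the log, locate the offending operation and remove it before the replay. A minimal sketch for an accidental delete, assuming a rootdn with write access to the accesslog DIT and illustrative DNs/credentials (alternatively, simply remove the offending entry block from the accesslog LDIF before reloading it with slapadd):

# find the offending operation (accesslog entries are keyed by reqStart)
#> ldapsearch -x -D 'cn=accesslog' -w 'test' -b 'cn=accesslog' '(objectClass=auditDelete)' reqDN reqStart
# remove it so it is not replayed during recovery
#> ldapdelete -x -D 'cn=accesslog' -w 'test' 'reqStart=20170102030405.000001Z,cn=accesslog'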

Practice

Step 1: Simulate data loss

#> service slapd stop                   # quiesce the server before the snapshot
#> slapcat -n 2 > maindit-backup.ldif   # point-in-time backup of the main DIT
#> service slapd start
#> ldapadd -x -w 'test' -D 'cn=Manager,dc=domain,dc=com' -f user.ldif   # add an entry that is NOT in the backup
#> service slapd stop
#> slapcat -n 3 > maindit-accesslog-backup.ldif   # the accesslog backup does contain the add

At this time, there are two backup files:

  • maindit-backup.ldif, which has everything but the last entry
  • maindit-accesslog-backup.ldif, which does contain the addition of the user

Step 2: Recovering a clean OpenLDAP server

  1. Install a new VM with the appropriate packages and configuration [only if necessary]
  2. If you are reusing a corrupted OpenLDAP server, move all the database files of the corrupted database aside (mv /var/lib/ldap/{yourdbname}/*.bdb /backup/ldap/{yourdbname})
  3. Enable accesslog and syncprov modules
  4. Reload the needed DIT with slapadd
  5. Create an Accesslog db that will be used as provider
  6. Reload the accesslog db with its backup
  7. Configure syncrepl on the main DIT to be a consumer of the accesslog provider (see the configuration sketch after this list)
  8. Restart slapd
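
Steps 5 and 7 translate into slapd configuration. A sketch in slapd.conf style, with illustrative suffixes, paths and credentials; these are the standard delta-syncrepl directives, and the same settings apply under the corresponding olc* attributes if you use cn=config:

# step 5: the accesslog DB, exposed as a provider through syncprov
database mdb
suffix "cn=accesslog"
directory /var/lib/ldap/accesslog
rootdn "cn=accesslog"
rootpw test
index reqStart eq

overlay syncprov
syncprov-nopresent TRUE
syncprov-reloadhint TRUE

# step 7: the main DIT consumes the accesslog (delta-syncrepl)
database mdb
suffix "dc=domain,dc=com"
directory /var/lib/ldap/maindit
rootdn "cn=Manager,dc=domain,dc=com"

syncrepl rid=001
  provider=ldap://localhost       # or the slave holding the accesslog copy
  bindmethod=simple
  binddn="cn=accesslog"
  credentials=test
  searchbase="dc=domain,dc=com"
  logbase="cn=accesslog"
  logfilter="(&(objectClass=auditWriteObject)(reqResult=0))"
  type=refreshAndPersist
  retry="60 +"
  syncdata=accesslog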

At this point, your OpenLDAP server is back up to date, data-wise, and no data has been lost.
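
As a quick sanity check, confirm the recovered entry is present (the uid below is hypothetical, since the content of user.ldif was not shown):

#> ldapsearch -x -D 'cn=Manager,dc=domain,dc=com' -w 'test' -b 'dc=domain,dc=com' '(uid=jdoe)' dn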

Conclusion

Not that simple, right? It takes a bit more than two lines of shell script. A long-observed behavior is that people and companies make backups but do not test recovery: the procedure is tried once when the backup plan is created, then left aside and almost never exercised again. Some companies, on the other hand, take recovery to the extreme and deploy last night’s backup to production every day; this way the recovery process is well tested and they don’t fear failure. Whichever way you decide to go, make sure to always have a data-loss-less backup/recovery plan, up-to-date documentation that goes along with it, and your Nagios check_ldap plugin up and running. QED


