eDir Disaster Recovery on Solaris (and Linux)
I've been doing a lot of testing of restoring eDirectory 8.7.1x on Linux servers for the last few weeks. The documentation is pretty good, but if you lose all the servers in the tree, you can be in quite a bind. This is the procedure I followed to get a healthy(ish) tree back. It does require a bit of preparation: you need a very basic tree installed with Role Based Services configured. It has just the one user, admin, and he is the collection owner for all roles. Take a copy of the DIB set for this tree and tar it up, and keep it safe! This will be our 'get out of jail free' card when things go pear-shaped. To begin the restore, install eDir as usual on the new server, copy this DIB set into /var/nds/dib, bounce eDir and get the ball rolling.
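If you'd rather script the tar step than do it by hand, here's a minimal Python sketch using only the standard tarfile module (a plain tar czf from the shell does the same job). It assumes eDir is stopped while the files are copied and that the DIB lives in /var/nds/dib as above; the archive name in the example is hypothetical.

```python
import tarfile
from pathlib import Path


def archive_dib(dib_dir: str, archive_path: str) -> list[str]:
    """Tar up everything in the DIB directory and return the top-level names written.

    Stop eDirectory first so the DIB files are not changing while they are copied.
    """
    names = []
    with tarfile.open(archive_path, "w:gz") as tar:
        for entry in sorted(Path(dib_dir).iterdir()):
            # Store names relative to the DIB directory so the set can be unpacked anywhere.
            tar.add(str(entry), arcname=entry.name)
            names.append(entry.name)
    return names


def restore_dib(archive_path: str, dib_dir: str) -> list[str]:
    """Unpack a saved DIB set into dib_dir and return the member names restored."""
    target = Path(dib_dir)
    target.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        members = tar.getmembers()
        for member in members:
            # Refuse absolute paths and '..' components rather than writing outside dib_dir.
            if member.name.startswith("/") or ".." in Path(member.name).parts:
                raise ValueError(f"unsafe path in archive: {member.name}")
        tar.extractall(str(target))
    return [m.name for m in members]
```

On the prepared base-tree server you'd call something like archive_dib("/var/nds/dib", "/root/basetree-dib.tar.gz"), and on the new server restore_dib("/root/basetree-dib.tar.gz", "/var/nds/dib") before bouncing eDir. Note that the sketch overwrites matching files but leaves any other files from the fresh install in place, so check the directory afterwards.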
My test tree has three servers: two Linux and one NetWare, the NetWare box just to hold a replica.
I took my backups via iManager, with no roll-forward logs included but with NICI files included. It shouldn't matter whether RFLs are enabled or not: I'm betting that in a DR scenario it's better to get a 12-hour-old tree running quickly than to hunt around the backup tapes for the very latest and greatest.
Log in to eMBox and turn on advanced mode; this is a disaster, so we're automatically talking about an advanced restore. Ignore the scary warning about costs and stuff; that's just Novell covering themselves :)
    eMBox Client> login -s localhost -p 9443 -u admin.services.willeke.com -w yourpassword
    Login successful.
    eMBox Client> setmode -a
    ***********************************************************************
    WARNING: You have chosen to enable advanced features. If the advanced features are not used correctly, they can cause damage to your system. Do not use the advanced features unless you have been instructed to do so by Novell Technical Support.
    YOU ARE RESPONSIBLE FOR ALL COSTS ASSOCIATED WITH RESOLVING PROBLEMS RESULTING FROM THE USE OF THESE FEATURES.
    By typing "I AGREE," you accept the conditions and responsibilities of using the advanced features.
    ***********************************************************************
    >I Agree
At this stage we can restore the DIB without the -a (activate) switch, but we do need -v (override the failed verification) here, rather than a separate operation using restadv, since the two other servers in our replica ring are down. When eMBox restores a DIB set, it tries to contact the servers participating in the replica ring, which isn't possible in this case; if we don't tell the restore operation to override the failed verification, the restore will never complete.
    eMBox Client #> backup.restore -f /export/spare/cora/backup.bak -l /tmp/restore.log -e -r -o -n -v -k
    Log file name: /tmp/restore.log
    Restore started: 2005-2-25'T12:21:40
    Restore file name: /export/spare/cora/backup.bak
    Starting database restore...
    Restoring file /export/spare/cora/backup.bak
    Restore progress (bytes read) Start 0 Current 516096 End 3465216
    Restore progress (bytes read) Start 0 Current 1040384 End 3989504
    Restore progress (bytes read) Start 0 Current 1564672 End 4513792
    Restore progress (bytes read) Start 0 Current 2088960 End 5038080
    Restore progress (bytes read) Start 0 Current 2613248 End 5562368
    Restore progress (bytes read) Start 0 Current 2949120 End 5898240
    Restoring file /var/novell/nici/nicimud
    Restoring file /var/novell/nici/primenici
    Restoring file /var/novell/nici/xmgrcfg.wks
    Restoring file /var/novell/nici/0/xmgrcfg.ks2
    Restoring file /var/novell/nici/0/xmgrcfg.ks3
    Restoring file /var/novell/nici/0/xarchive.001
    Restoring file /var/novell/nici/0/nicisdi.key
    Restoring file /var/novell/nici/xmgrcfg.nif
    Restoring file /var/novell/nici/nicifk
    Restoring file /var/novell/nici/xarchive.000
    Restoring file /var/novell/nici/1003/xmgrcfg.ks2
    Restoring file /var/novell/nici/1003/xmgrcfg.ks3
    Restoring file /var/novell/nici/1003/xarchive.001
    Please enter the name of the next incremental file with ID 1 or leave blank to proceed without processing any more incremental files
    ?>
    Database restore finished
    Completion time 00:00:16
    Restore completed successfully
    *** END ***
    eMBox Client #>
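Before moving on, it's worth confirming that the restore really finished and that the NICI files came back. Here's a small sketch that summarises the output shown above. It assumes you've captured the console output as text; whether the -l log file records exactly the same lines is an assumption, not something shown here. It works whether the lines are wrapped or run together.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Patterns for the lines that appear in the eMBox restore output above.
PROGRESS_RE = re.compile(r"Restore progress \(bytes read\) Start (\d+) Current (\d+) End (\d+)")
FILE_RE = re.compile(r"Restoring file (\S+)")
SUCCESS_TEXT = "Restore completed successfully"


@dataclass
class RestoreSummary:
    succeeded: bool
    restored_files: List[str] = field(default_factory=list)
    last_progress: Optional[Tuple[int, int]] = None  # (current, end) bytes read

    @property
    def nici_files(self) -> List[str]:
        """Only the NICI files, which the backup carried along with the DIB."""
        return [f for f in self.restored_files if f.startswith("/var/novell/nici/")]


def summarize_restore_output(text: str) -> RestoreSummary:
    """Pull the success line, restored file names and last progress line out of the output."""
    files = FILE_RE.findall(text)
    progress = PROGRESS_RE.findall(text)
    last = (int(progress[-1][1]), int(progress[-1][2])) if progress else None
    return RestoreSummary(SUCCESS_TEXT in text, files, last)
```

If succeeded is False or nici_files comes back empty, stop and look at the restore again before touching the replica ring.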
Next, examine the /var/nds/dib directory and check whether any RST files are present; there shouldn't be any. Shut down and restart NDS, then attempt to log in using ndslogin as a sanity check. This should go fine; if not, check back and see what you might have changed. Now it's time to tidy the replica ring, since this server currently thinks there are two other servers in the tree and it's desperate to sync with them.
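The RST check is easy enough to eyeball with ls, but if you want it in a post-restore script, here's a sketch. It assumes that "RST files" means files with an .rst extension (in any case); that reading is an assumption.

```python
from pathlib import Path


def find_rst_files(dib_dir: str) -> list[str]:
    """Return the names of files in dib_dir with an .rst extension, in any case."""
    return sorted(
        p.name
        for p in Path(dib_dir).iterdir()
        if p.is_file() and p.suffix.lower() == ".rst"
    )
```

After a clean restore, find_rst_files("/var/nds/dib") should return an empty list.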
Fire up dsrmenu.sh and switch on advanced mode. Choose option 5) Advanced Menu Options, then 4) Replica and Partition Operations, then 1) [Root]. Choose option 5) Designate this server as the new master replica; this ensures we have a master on a known good server. Next, choose 10) View Replica Ring, select the server we want to zap from the tree and choose the 'remove this server from the replica ring' option. Do this for both of the dead servers' replicas, so that we're left with only our current server as master and no read/write replicas of this partition. Then run ndsrepair -U to tidy things up. You'll notice errors relating to the other servers, but we're about to fix that.
At this stage we've got a good replica ring, but the restored tree still contains objects for the other two servers, so we won't be able to ndsconfig them back in until we delete those objects. You can either use ConsoleOne to remove all the objects that relate to the other servers (easy), or, if you want to use iManager, reconfigure it to handle our newly restored tree (not so easy).
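To make the end state of the dsrmenu.sh replica steps concrete, here's a toy model of what they do to the [Root] replica ring. It is not an interface to eDirectory, just an illustration; the server names in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    server: str
    kind: str  # "Master", "Read/Write" or "Read Only"


def clean_replica_ring(ring: list[Replica], local_server: str, dead_servers: set[str]) -> list[Replica]:
    """Model of the ring after the menu steps: local server promoted, dead servers removed."""
    if local_server in dead_servers:
        raise ValueError("the local server cannot also be one of the dead servers")
    if not any(r.server == local_server for r in ring):
        raise ValueError("the local server holds no replica of this partition")
    cleaned = []
    for replica in ring:
        if replica.server in dead_servers:
            continue  # 'remove this server from the replica ring'
        if replica.server == local_server:
            cleaned.append(Replica(local_server, "Master"))  # 'designate ... new master replica'
        elif replica.kind == "Master":
            cleaned.append(Replica(replica.server, "Read/Write"))  # only one master per ring
        else:
            cleaned.append(replica)
    return cleaned
```

With both other servers marked dead, the ring collapses to a single entry, the local server as master, which is the state described above.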
To use iManager, we can do it the long way (pkgrm tomcat4 and httpd, tidy up the remnants of the /var/novell directories, then run iManagerInstall.bin) or, if the disaster hasn't corrupted your existing installation of iManager, we can apply a quick fix by doing the following. Open /var/opt/novell/tomcat4/webapps/nps/WEB-INF/PortalServlet.properties in a text editor. Change the line near the top from
(replacing the 10.44.82.14 with the current server's IP address)
and change the line near the bottom that reads
This will allow us to run the iManager configuration again without the login failed errors that you might see if the CA has been hosed. I know it's insecure, but we can change that later when things are running normally, and besides, we're using tomcat to talk to eDir on the same server, not across the network.
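If you're doing this on more than one box, the IP substitution in the first change can be scripted. This sketch only swaps one address for another, keeping a .bak copy of the original; it does not make the change near the bottom of the file, and it assumes the old address appears literally in the file.

```python
import re
import shutil
from pathlib import Path


def replace_ip_in_file(path: str, old_ip: str, new_ip: str) -> int:
    """Replace whole occurrences of old_ip with new_ip, keeping a .bak copy; return the count."""
    p = Path(path)
    text = p.read_text()
    # Only match the address on its own, not inside a longer one such as 10.44.82.140.
    pattern = re.compile(r"(?<![\d.])" + re.escape(old_ip) + r"(?!\.?\d)")
    new_text, count = pattern.subn(new_ip, text)
    if count:
        shutil.copy2(p, p.with_name(p.name + ".bak"))  # keep the original alongside
        p.write_text(new_text)
    return count
```

For example, replace_ip_in_file("/var/opt/novell/tomcat4/webapps/nps/WEB-INF/PortalServlet.properties", "10.44.82.14", "<this server's IP>"); a return value of 0 means the old address wasn't found and nothing was written.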
Next, browse to
Choose the usual options (including the standard pco object) and use the original admin's password to configure. Then have a break; this will take a minute or two.
Finally, bounce tomcat, log in to iManager with the original (pre-disaster) admin credentials and do some object deleting. We need to zap all objects relating to the dead servers. I get 19, but that's only because I have one or two extra objects from the NetWare box; typically it's nine objects per server that need to be deleted.
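Hunting down those objects by eye is error-prone under pressure. If you can get a plain list of object names out of the tree (how you export it is up to you), a sketch like this groups the candidates per dead server, so you can compare the counts against the rough nine-per-server figure before deleting anything. It assumes each server-related object carries the server's name as a whole word, which is an assumption; treat the output as a checklist, not a delete list. The object and server names in the example are hypothetical.

```python
import re


def objects_for_dead_servers(object_names: list[str], dead_servers: list[str]) -> dict[str, list[str]]:
    """Group object names by the dead server whose name they contain as a whole word."""
    patterns = {s: re.compile(r"\b" + re.escape(s) + r"\b", re.IGNORECASE) for s in dead_servers}
    grouped = {s: [] for s in dead_servers}
    for name in object_names:
        for server, pattern in patterns.items():
            if pattern.search(name):
                grouped[server].append(name)
    return grouped
```

The whole-word match keeps a server called NW1 from picking up objects belonging to NW10, but it relies on server names starting and ending with letters or digits.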
We've now got a functional tree, but we may not have a CA to handle all the certificate stuff. Rebuilding the PKI objects is a lengthy process, but some nice soul in NTS took the trouble to document it.
Go to the other servers, ndsconfig them back in, and then gaze at the beauty of it all. Now learn to do it in 20 minutes under pressure!