eDir Disaster Recovery on Solaris (and Linux)#

WARNING: This is a DANGEROUS OLD method for Solaris. See the NEW eDirectory documentation.

I've been doing a lot of testing of restoring eDirectory 8.7.1x on Linux servers for the last few weeks. The documentation is pretty good, but in the case where you lose all the servers in the tree, you can be in quite a bind. This is the procedure I followed to get a healthy(ish) tree back. It does require a bit of preparation: I keep a very basic tree installed with Role Based Services configured. It has just the one user - admin - and he is the collection owner for all roles. Take a copy of the dib set for this tree and tar it up - keep this safe! This will be our 'get out of jail free' card when things go pear-shaped. To begin the restore, install eDir as per usual on the new server, copy this dib set into /var/nds/dib, bounce eDir and get the ball rolling.
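The 'tar it up' step can be sketched as a small shell function. This is only a sketch under assumptions: archive_dib is a hypothetical helper name (not part of eDirectory), the output path in the usage comment is made up, and it assumes eDirectory is stopped while the copy is taken so the files are not changing underneath tar.

```shell
#!/bin/sh
# Sketch: archive a dib directory so it can be unpacked onto a fresh install.
# archive_dib is a hypothetical helper, not an eDirectory tool.
archive_dib() {
    dib_dir=$1      # directory holding the dib set, e.g. /var/nds/dib
    out_tar=$2      # where to write the archive (keep it outside dib_dir)
    [ -d "$dib_dir" ] || { echo "no such directory: $dib_dir" >&2; return 1; }
    # -C stores paths relative to the dib directory, so the archive
    # unpacks straight into a new /var/nds/dib
    tar -cf "$out_tar" -C "$dib_dir" . || return 1
    # Listing the archive is a basic check that it is readable
    tar -tf "$out_tar" >/dev/null
}

# Usage (assumed paths):
#   archive_dib /var/nds/dib /root/basic-tree-dib.tar
```

To put the set back on a new server, something like `tar -xf /root/basic-tree-dib.tar -C /var/nds/dib` with eDir stopped would mirror the copy step described above.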

My test tree is 3 servers, 2 x Linux and 1 x NetWare just to hold a replica.

I took my backups via iManager - no roll forward logs included, but NICI files included. It shouldn't matter whether RFLs are enabled or not; I'm betting that in a DR scenario it's better to get a 12-hour-old tree running quickly than to hunt around the backup tapes for the very latest and greatest.

Log in to eMBox and turn on advanced mode - this is a disaster so we're automatically talking about an advanced restore. Ignore the scary warning about costs and stuff, that's just Novell covering themselves :)

eMBox Client> login -s localhost -p 9443 -u admin.services.willeke.com -w yourpassword
Login successful.

eMBox Client> setmode -a
WARNING: You have chosen to enable advanced features. If the advanced
features are not used correctly, they can cause damage to your system.
Do not use the advanced features unless you have been instructed to do
so by Novell Technical Support.

By typing "I AGREE," you accept the conditions and responsibilities of
using the advanced features.
>I Agree

At this stage, we're able to restore the dib without the -a switch (activate), but we do need -v (override the failed verification) here, rather than a separate operation using restadv, since our two other servers in the replica ring are down. If we don't tell the restore operation to override the failed verification, the restore will never complete: as part of restoring a dib set, eMBox tries to contact the other servers participating in the replica ring, which is not possible in this case.

eMBox Client #> backup.restore -f /export/spare/cora/backup.bak -l /tmp/restore.log -e -r -o -n -v -k
Log file name: /tmp/restore.log
Restore started: 2005-2-25'T12:21:40
Restore file name: /export/spare/cora/backup.bak
Starting database restore...
Restoring file /export/spare/cora/backup.bak
Restore progress (bytes read)
Start 0 Current 516096 End 3465216
Restore progress (bytes read)
Start 0 Current 1040384 End 3989504
Restore progress (bytes read)
Start 0 Current 1564672 End 4513792
Restore progress (bytes read)
Start 0 Current 2088960 End 5038080
Restore progress (bytes read)
Start 0 Current 2613248 End 5562368
Restore progress (bytes read)
Start 0 Current 2949120 End 5898240
Restoring file /var/novell/nici/nicimud
Restoring file /var/novell/nici/primenici
Restoring file /var/novell/nici/xmgrcfg.wks
Restoring file /var/novell/nici/0/xmgrcfg.ks2
Restoring file /var/novell/nici/0/xmgrcfg.ks3
Restoring file /var/novell/nici/0/xarchive.001
Restoring file /var/novell/nici/0/nicisdi.key
Restoring file /var/novell/nici/xmgrcfg.nif
Restoring file /var/novell/nici/nicifk
Restoring file /var/novell/nici/xarchive.000
Restoring file /var/novell/nici/1003/xmgrcfg.ks2
Restoring file /var/novell/nici/1003/xmgrcfg.ks3
Restoring file /var/novell/nici/1003/xarchive.001

Please enter the name of the next incremental file with ID 1
or leave blank to proceed without processing any more incremental files
Database restore finished
Completion time 00:00:16
Restore completed successfully
*** END ***

eMBox Client #>
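If the restore is being driven from a script rather than watched by eye, the outcome can be checked from the log. A minimal sketch, assuming the log file named with -l contains the same "Restore completed successfully" line that the client prints above (check your own log before relying on this); restore_ok is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: succeed only if the given restore log contains the success line.
# restore_ok is a hypothetical helper; the matched text is copied from the
# client output shown above and is assumed to appear in the log as well.
restore_ok() {
    log=$1
    [ -r "$log" ] || { echo "cannot read $log" >&2; return 2; }
    grep -q 'Restore completed successfully' "$log"
}

# Usage:
#   restore_ok /tmp/restore.log && echo "restore looks good"
```

The function returns 0 when the line is found, 1 when it is not, and 2 when the log cannot be read, so a calling script can tell a failed restore from a missing log.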

Next we should examine the /var/nds/dib directory and check to see if there are any RST files present - there shouldn't be. Now it's time to tidy the replica ring, since this server currently thinks there are two other servers in the tree and it's desperate to synch with them. Shut down and restart nds, then attempt to log in using ndslogin as a sanity check. This should go fine; if not, check back and see what you might have changed.
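The RST check is easy to script. A minimal sketch, assuming the leftover restore files carry a .rst extension in either case (an assumption on my part; adjust the pattern if yours are named differently). Both function names are hypothetical:

```shell
#!/bin/sh
# Sketch: look for leftover RST files in a dib directory.
# list_rst_files and no_rst_files are hypothetical helpers.
list_rst_files() {
    dib_dir=$1      # e.g. /var/nds/dib
    [ -d "$dib_dir" ] || { echo "no such directory: $dib_dir" >&2; return 2; }
    # Match .rst in any mix of upper and lower case
    find "$dib_dir" -type f -name '*.[Rr][Ss][Tt]' -print
}

# Succeeds when no RST files are present; returns 2 if the directory is missing.
no_rst_files() {
    found=$(list_rst_files "$1") || return 2
    [ -z "$found" ]
}

# Usage:
#   no_rst_files /var/nds/dib || list_rst_files /var/nds/dib
```

If any files are listed, the restore has probably not been fully applied, so go back over the restore step before touching the replica ring.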

Fire up dsrmenu.sh and switch on advanced mode. Choose

5) Advanced Menu Options, 
4) Replica and Partition Operations, and 
1) [Root].

Choose option 5) Designate this server as the new master replica - 
this is to ensure we have a master on a known good server. 

Next choose 10) View Replica Ring, pick the server we want to zap from the tree, 
and choose the 'remove this server from the replica ring' option. 

Do this for both replicas so we're left with only our current server as master 
and no read/write versions of this partition. 

Then run ndsrepair -U to tidy things up. 
You'll notice errors relating to the other servers, but we're about to fix that.

At this stage, we've got a good replica ring, but we still have references to the other two servers as objects in our restored tree, so we won't be able to ndsconfig them back in until we destroy those objects. You can either use ConsoleOne to remove all the objects that relate to the other servers (easy), or, if you want to use iManager, reconfigure it to handle our newly restored tree (not so easy).

To use iManager, we can do it the long way - pkgrm tomcat4 and httpd, tidy up the remnants of the /var/novell directories, then run iManagerInstall.bin - or, if the disaster hasn't corrupted your existing installation of iManager, we can take a quick fix by doing the following. Edit /var/opt/novell/tomcat4/webapps/nps/WEB-INF/PortalServlet.properties in a text editor. Change the line near the top from


to read


(replacing the with the current server's IP address)

and change the line near the bottom that reads




This will allow us to run the iManager config again without getting the login failed errors that you might see if the CA has been hosed. I know it's insecure, but we can change that later when things are running normally, and besides, we're using tomcat to talk to eDir on the same server, not across the network.

Next, browse to

Choose the usual options (including the standard pco object) and use the original admin's password to configure...have a break, this will take a minute or two.

Finally, bounce tomcat, log in to iManager with the (original, pre-disaster) admin credentials and do some object deleting. We need to zap all objects relating to the dead servers - I get 19, but that's only because I have one or two extra objects from the NetWare box. It's typically nine objects per server that need to be deleted.

We've now got a functional tree, but we may not have a CA to handle all the certificate stuff. Rebuilding the PKI objects is a lengthy process, but some nice soul in NTS took the trouble to document it.

Go to the other servers, ndsconfig them back in and then gaze at the beauty of it all...now learn to do it in 20 minutes under pressure!

Files In Dib Directory
