Using a temporary Exchange Server to overcome database corruption

This is just a high-level overview of a procedure I used a couple of weeks ago to overcome database corruption at a business that had a single server and didn’t want a huge amount of downtime.

They were running Exchange 2007 on Windows Server 2003 x64 and their mailbox store was touching 75 GB, so we had them archive and delete a large amount of data from their mailboxes. That left 30 GB of white space in the database, which was great, but there were still latency issues, which meant we had to defrag the database.
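To put those figures in perspective, here is a quick back-of-the-envelope sketch in Python. It uses only the approximate sizes quoted above and assumes the database file itself stayed at roughly 75 GB after the clean-up, since deleting mail frees space inside the file rather than shrinking it.

```python
# Approximate figures quoted in the post, in GB.
store_size_gb = 75      # size of the mailbox store file
white_space_gb = 30     # free space inside the file after the clean-up

# What a defragmented (or freshly created) store would need to hold.
live_data_gb = store_size_gb - white_space_gb

# Share of the file that is reclaimable white space.
white_space_pct = white_space_gb / store_size_gb * 100

if __name__ == "__main__":
    print(f"Live data: ~{live_data_gb} GB")
    print(f"White space: ~{white_space_pct:.0f}% of the file")
```

In other words, a large share of the file was empty space that only a defrag or a move to a fresh database would hand back.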

When we kicked the defrag off, it bombed out after about 30 seconds, reporting database corruption.

In this situation, an intrusive database fix using eseutil could be a bit of a nightmare because:

  1. You never know how long it’s going to take
  2. The possible corruption may be huge
  3. You should run it twice to ensure the corruption has been fixed
  4. The impact on the end users is hard to determine and could be massive

We called it a night and decided to come back the following day with a spare server. We installed the server into the existing organization as you would with any migration and confirmed mail flow and all services were working on the new server.

We enabled circular logging temporarily, because moving that many mailboxes generates a large volume of transaction logs that could otherwise fill the log drive. Then, when the time suited the client, we gracefully moved all mailboxes to the new server. There was a little downtime, but only while the users’ mailboxes were being moved. Client access services were unaffected: they were running Exchange 2007, so any client access server can access any mailbox store (Exchange 2003 would need a little more attention).

Once the user mailboxes were moved we removed the corrupt mailbox store and created a new one.

Once we knew the new database was healthy, we moved all the mailboxes back and bingo: we had fixed the corruption, removed the white space and defragmented the database without running a database fix, which should always be avoided wherever possible.

We chose this method primarily because the downtime was quantifiable. We could tell the users how long they’d be down, which we couldn’t do with a database fix.
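As a rough illustration of what "quantifiable" means here, the sketch below estimates how long each mailbox would be unavailable during its move. It is not the tooling we used: the mailbox names, sizes and the throughput figure are all hypothetical, and it assumes mailboxes are moved one at a time at a steady rate. Time a test move in your own environment to get a realistic rate.

```python
from dataclasses import dataclass


@dataclass
class Mailbox:
    name: str
    size_gb: float


def move_minutes(size_gb: float, throughput_gb_per_hour: float) -> float:
    """Minutes one mailbox is unavailable while it is being moved."""
    if throughput_gb_per_hour <= 0:
        raise ValueError("throughput must be positive")
    return size_gb / throughput_gb_per_hour * 60


def plan(mailboxes, throughput_gb_per_hour):
    """Return (minutes per mailbox, total minutes) for moves run one at a time."""
    per_box = {
        m.name: move_minutes(m.size_gb, throughput_gb_per_hour)
        for m in mailboxes
    }
    return per_box, sum(per_box.values())


if __name__ == "__main__":
    # Hypothetical mailboxes and throughput, for illustration only.
    boxes = [Mailbox("alice", 2.0), Mailbox("bob", 0.5), Mailbox("carol", 4.0)]
    per_box, total = plan(boxes, throughput_gb_per_hour=5.0)
    for name, minutes in per_box.items():
        print(f"{name}: ~{minutes:.0f} min")
    print(f"total window: ~{total:.0f} min")
```

The point is that every input is something you can measure in advance, whereas the run time of a database repair depends on how much corruption it finds.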

This was just a heads-up on the way we completed the procedure, not a technical article. You’ll need to do more research into exactly how to complete it. Post in the Exchange forum if you have any questions or would like to know more.
