Thursday, October 08, 2009

Exchange 2007 SCR How to - Part 2

This is a continuation from Part 1 where we configured the SCR replication.


Failing over to the target SCR server


In this example, I am not having an ACTUAL failure; I am choosing to dismount the DB on the source server. I typically also check the "do not mount the database store on startup" box, so that if I end up stopping/starting any services later I don't accidentally remount the DB that I am attempting to fail over. Once I successfully fail over, I delete the now "old" SG and database from the EMC, as well as the EDB file and transaction logs that were associated with them. I like to keep out-of-date data tidy like this. If you disagree, at a minimum you should move these files into a well-labeled folder so you know exactly what the files are and when they are from, so that 6 months later (or wherever your comfort level is) you can do housecleaning in an educated manner.

To ensure we have live data, I sent myself an email just before dismounting the database. Get-StorageGroupCopyStatus, run at the same time, showed the status as Healthy, with the hard-coded 50 log replay lag.
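The "do not mount on startup" checkbox can most likely be set from the shell as well. A minimal sketch, assuming the source database identity is EXCHANGESOURCE\ExecDB (as used later in this post) and that the MountAtStartup parameter of Set-MailboxDatabase is what that checkbox maps to; verify it in your own environment before relying on it:

```powershell
# Keep the source database from remounting if services restart during the test.
# Assumption: the identity matches the source DB used later in this post.
Set-MailboxDatabase "EXCHANGESOURCE\ExecDB" -MountAtStartup:$false

# Confirm the setting took effect.
Get-MailboxDatabase "EXCHANGESOURCE\ExecDB" | ft Name,MountAtStartup
```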



First, we dismount the active database (in a real failure, something did this for you).


Dismount-Database "EXCHANGESOURCE\ExecDB"


Once the source database is dismounted, we can begin the SCR activation process. The first step is to create a new SG and a new DB on the target server. DO NOT USE the folders/files/paths of your SCR data! For example, create a new SG named "RecoverySG" and a new DB named "RecoveryDB", and point them at new, unused folders and paths. (This is the part I mentioned above that is not clear enough in the TechNet article.) Run PowerShell as admin so it can do file operations!

New-StorageGroup -Name RecoverySG -LogFolderPath 'D:\Exchange Logs\RecoverySG' -SystemFolderPath 'D:\Exchange Logs\RecoverySG'


New-MailboxDatabase -Name RecoveryDB -StorageGroup RecoverySG -EdbFilePath 'D:\Exchange Databases\RecoveryDB\RecoveryDB.edb'


Notice again - those are NEW and empty paths and files. If you attempt to use your SCR data at this point, you will have undesirable results! Now we run the restore command. This checks the status of the log shipping and will attempt to copy any missing log files if needed. It also disables the original SCR copy and makes the database viable for mounting.


Restore-StorageGroupCopy "EXCHANGESOURCE\ExecSG" -standbymachine EXCHANGETARGET



Now is a good time for a quick reminder on what SCR is and how it does log shipping. SCR essentially just copies log files to the target, and when a backup occurs, the target will also replay those log files. So if we skip the next step, we risk bringing up a database that is only as up to date as the last backup. If that was just before this test, it may not be a big deal, or may not even be noticed (especially in a lab where you don't have live mail flow, etc.). So now we need to run ESEUTIL /R to replay the log files. This is done by running eseutil from the location of the log files, like so:


eseutil /r E00


The Exx after /r is the log prefix for that database's logs (you can check by looking in the log folder for that storage group).
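If you would rather not change into the log folder first, eseutil also accepts the log, system, and database locations as switches. A sketch only, assuming the SCR copy lives under the D:\Exchange Logs\ExecSG and D:\Exchange Databases paths used in the commands further down; substitute your own paths and log prefix:

```powershell
# /l = log file folder, /s = system (checkpoint) folder, /d = database folder.
# Note: /d takes the folder that holds the EDB, not the path to the EDB itself.
# The paths and the E00 prefix are assumptions taken from this post's examples.
eseutil /r E00 /l "D:\Exchange Logs\ExecSG" /s "D:\Exchange Logs\ExecSG" /d "D:\Exchange Databases"
```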


This should replay the logs and bring the database to a clean shutdown state. You can confirm this by running eseutil /mh against the EDB file. The database state should be Clean Shutdown.
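The header check can be narrowed down to the one line that matters. A minimal sketch, assuming the EDB path matches the one used in the Move-DatabasePath step of this post:

```powershell
# Dump the database header and keep only the State line.
# The path is an assumption based on the EDB path used in this post.
eseutil /mh "D:\Exchange Databases\ExecDB.edb" | Select-String "State:"
```

If the line reports Dirty Shutdown instead, the log replay has not completed and the database should not be mounted yet.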


The below command updates the new RecoverySG's paths to match the paths of our SCR Database. The "configurationOnly" flag here tells it NOT to move the existing file, but to just change the configuration.

Move-StorageGroupPath "EXCHANGETARGET\RecoverySG" -SystemFolderPath "D:\Exchange Logs\ExecSG" -LogFolderPath "D:\Exchange Logs\ExecSG" -ConfigurationOnly


Now we need to do the same thing for the Database - point the "new" DB at our "recovered" data.

Move-DatabasePath "EXCHANGETARGET\RecoveryDB" -EdbFilePath "D:\Exchange Databases\ExecDB.edb" -ConfigurationOnly


Now we need to set the database to allow file restore. This is what will allow this database to be mounted.

Set-MailboxDatabase "EXCHANGETARGET\RecoveryDB" -AllowFileRestore:$true


If you skip the above step, when you attempt to mount the database, you will receive an error that appears to be permissions related.

Mount-Database "EXCHANGETARGET\RecoveryDB"
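To double-check that the mount succeeded, the database can be queried for its live status. A small sketch, assuming the recovery database is the one on EXCHANGETARGET as above:

```powershell
# -Status asks Exchange for the live mount state; without it, Mounted comes back blank.
Get-MailboxDatabase -Server EXCHANGETARGET -Status | ft Name,Mounted,AllowFileRestore
```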


This is where most admins breathe a sigh of relief, but we aren't done - we need to move users to this DB. Well, not really. What this actually does is update these users' AD objects so that their Exchange server and homeMDB point at their new location in the RecoveryDB.

Get-Mailbox -Database "EXCHANGESOURCE\ExecDB" | where {$_.ObjectClass -NotMatch '(SystemAttendantMailbox|ExOleDbSystemMailbox)'} | Move-Mailbox -ConfigurationOnly -TargetDatabase "EXCHANGETARGET\RecoveryDB"


The "where" clause in the middle of this prevents you from moving the system mailboxes that are unique to each DB. When you run this command - assuming your SCR target is in a different AD site - keep in mind that AD will need to replicate before users in the main site start coming back online. You can trigger this in AD Sites and Services or various other ways, or just wait for replication to occur. At this point the users and database should all be online, and client access to their data should be restored. If any HT servers in your organization queued mail for these users during the outage, it will be delivered now. I recommend using OWA to test, as you may also have client connectivity issues to troubleshoot with Outlook or Outlook Anywhere.
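A quick way to confirm the move and nudge AD replication along might look like the following. This is a sketch, not a required step: the repadmin call is just one of the possible ways to force replication, and it assumes you run it on a domain controller or a machine with the AD admin tools installed.

```powershell
# Confirm the mailboxes now point at the recovery database.
Get-Mailbox -Database "EXCHANGETARGET\RecoveryDB" | ft Name,ServerName,Database

# Force AD replication instead of waiting for the schedule.
# /A = all partitions, /d = show DNs, /e = cross-site, /P = push changes outward.
repadmin /syncall /AdeP
```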


We can see the test email that I sent just prior to dismounting the database on the SCR source, so we know the logs replayed correctly. If your data is "out of date", the likely culprit is the ESEUTIL /R log replay. You may be able to dismount the database and replay the logs again, but if you leave the DB mounted for a while and new messages flow in, your log sequence will likely be broken. If that happens, you will likely need to restore your DB from the previous night's backup and then attempt to replay the SCR log files again.


Reseeding back to the original source server


Reseeding back is the exact same process, but with your source and target swapped: you reseed the "live" data that is on your DR server back to your main server. First, clean up the old DB/SG and the files/folders under it on your target server (the original source). Then, you can rename/modify any of the DB or SG names or paths to your liking. (This can be skipped if wanted, but it needs to be done before you configure SCR.) Next, repeat the creation of an SCR replica and reseed the data back. Once data is seeded and healthy on the target, repeat the failover process to "fail back". Once you fail back, clean up all the SG/DB paths/names once again on the DR server. Don't forget to recreate the SCR seed to your DR location afterward!
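With the roles swapped, the seeding commands from Part 1 would look roughly like this. It is a hypothetical sketch that assumes the live storage group is still named RecoverySG on EXCHANGETARGET and that matching paths already exist on EXCHANGESOURCE; adjust the names if you renamed anything during cleanup.

```powershell
# The DR server is now the SCR source and the original server is the standby.
Enable-StorageGroupCopy -Identity "EXCHANGETARGET\RecoverySG" -StandbyMachine EXCHANGESOURCE -ReplayLagTime 0.1:0:00

# Run on the original server (now the SCR target) to seed the database.
Update-StorageGroupCopy -Identity "EXCHANGETARGET\RecoverySG" -StandbyMachine EXCHANGESOURCE

# Check that the copy reports Healthy before attempting to fail back.
Get-StorageGroupCopyStatus -Server EXCHANGETARGET -StandbyMachine EXCHANGESOURCE
```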


Tuesday, October 06, 2009

Exchange 2007 SCR How to - Part 1

All of this document is based on database portability offered in Exchange 2007 SP1 known as Standby Continuous Replication, or SCR. Microsoft's article is here: http://technet.microsoft.com/en-us/library/bb738132.aspx and one of the most overlooked items I felt in this document is this bit:

I will get into more detail on this below.

Before getting started
Storage group and database paths must match on the source and target. So D:\Exchange Databases\ for the EDBFilePath must be valid on both servers. Due to this, I recommend creating the folder/path structure on the source and target as you go and name everything really smartly so you know what logs you are looking at when you are in explorer. I typically use the two cmdlets below to make sure I have the info I need:

Get-StorageGroup -Server EXCHANGESOURCE | ft name,logfolderpath,systemfolderpath

Get-MailboxDatabase -Server EXCHANGESOURCE | ft name,edbfilepath

I recommend keeping the System Folder path in the same folder as the log file path, both for uniformity and to reduce the risk of missing it at this step and having to type out the default location of C:\Program Files\Microsoft\Exchange Server\Mailbox\Storage Group.

Also, only one database per storage group is supported for SCR log shipping to work and replay logs on the SCR target.

I recommend putting each .edb file into a separate subfolder as well, because later (in eseutil) you need to specify the database directory (not the path to the EDB) to replay logs, and it's a little intimidating if you have 4-5 edb files in the same directory.
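Creating the matching folder structure can be scripted so that both servers end up identical. A minimal sketch; the folder names below are assumptions modeled on the paths used in this series, so replace them with whatever Get-StorageGroup and Get-MailboxDatabase report for your source:

```powershell
# Run on both EXCHANGESOURCE and EXCHANGETARGET so the paths match.
# Folder names are assumptions based on the paths used in this series.
$folders = 'D:\Exchange Logs\ExecSG', 'D:\Exchange Databases\ExecDB'

foreach ($folder in $folders) {
    # Only create the folder if it is not already there.
    if (-not (Test-Path $folder)) {
        New-Item -Path $folder -ItemType Directory | Out-Null
    }
}
```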

Seeding the databases

On EXCHANGESOURCE, enable SCR:

Enable-StorageGroupCopy -StandbyMachine EXCHANGETARGET -Identity "EXCHANGESOURCE\ExecSG" -ReplayLagTime 0.1:0:00 -TruncationLagTime 0.2:0:0

Those values are in day.hour:minutes:seconds format. Writing one or two zeroes in a field makes no difference (0:00 and 0:0 are equivalent).
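If the format looks odd, you can sanity-check a value in the shell before handing it to the cmdlet. This sketch assumes Exchange reads these strings the same way the .NET TimeSpan parser does, which matches the behavior described here but is not something this post confirms:

```powershell
# Parse the same strings the cmdlet was given and show them in hours.
# The format is d.hh:mm:ss, so 0.1:0:00 is zero days and one hour.
[TimeSpan]::Parse('0.1:0:00').TotalHours   # ReplayLagTime from the example
[TimeSpan]::Parse('0.2:0:0').TotalHours    # TruncationLagTime from the example
```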

ReplayLagTime is the time that the target server will wait before replaying a log file into the EDB. Above, it is set to 1 hour. If not specified, the default is 24 hours. While this may work, it can mean replaying a lot of logs, so a lower setting is preferred. There is also a hard-coded lag of 50 log files here. This means you will always see a ReplayQueueLength of at least 50 when running Get-StorageGroupCopyStatus.

TruncationLagTime is the time that the SCR target will delay deleting a replayed log. This is helpful if there was ever an incident where restoring from backup had to be performed, and the replayed log files from an SCR source could be used to shorten the gap between backup and moment of failure. The Microsoft default for this is 0, however.

You may receive this warning:
WARNING: ExecSG copy is enabled but seeding is needed for the copy. The storage group copy is temporarily suspended. Please resume the copy after seeding the database.

Get-StorageGroupCopyStatus -StandbyMachine EXCHANGETARGET

This shows the SCR replication status, including the copy queue length, and a Suspended status because the DB is not there yet. If it is not suspended, suspend it with:

Suspend-StorageGroupCopy -Identity "EXCHANGESOURCE\ExecSG" -StandbyMachine EXCHANGETARGET

Now, on EXCHANGETARGET, we can seed the database.
Run EMS as administrator, or these will error with "Access to the path (edbfilelocation)\temp-seeding is denied"

Update-StorageGroupCopy -Identity "EXCHANGESOURCE\ExecSG" -StandbyMachine EXCHANGETARGET

This will then seed the data.

If you receive the following error:
Database Seeding Error: Error returned from an ESE function call (0xc7ff1004), error code (0x0).

You need to enable and allow Windows Powershell as a program in Windows firewall.

Once the seeding is completed, the suspended copy should automatically resume. If it does not, you can resume it manually with:

Resume-StorageGroupCopy -Identity "EXCHANGESOURCE\ExecSG" -StandbyMachine EXCHANGETARGET

Confirming that the Database seed is healthy

From the SCR source:
Get-StorageGroupCopyStatus -StandbyMachine EXCHANGETARGET

From the SCR target:
Get-StorageGroupCopyStatus -server EXCHANGESOURCE -StandbyMachine EXCHANGETARGET

This outputs something like:

Name    SummaryCopyStatus  CopyQueueLength  ReplayQueueLength  LastInspectedLogTime
ExecSG  Healthy            0                1187               10/6/2009

Obviously, "Healthy" is what you want to see here. If you see NotConfigured, either SCR really is not configured, OR you left the -StandbyMachine parameter off! If you have errors, check your application event logs, ensure the folder structure is correct, and read the next step below.

CopyQueueLength is the number of transaction logs waiting to be shipped. If this number is steadily growing, your WAN connection may not have sufficient bandwidth.

ReplayQueueLength is the number of logs in the SCR target's log directory waiting to be replayed. This number will increase continually until a full backup is taken on the SCR source, at which point the SCR target "replays" these logs and commits them to the EDB on the target server. It is important to know there is a hard coded lag of 50 log files that cannot be changed.
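These two counters lend themselves to a quick health filter. A sketch under stated assumptions: the threshold of 20 logs is a made-up value for illustration, and the server names follow the examples in this post.

```powershell
# Hypothetical threshold; tune it to your own log generation rate.
$maxCopyQueue = 20

# List only the copies that are unhealthy or falling behind on log shipping.
Get-StorageGroupCopyStatus -Server EXCHANGESOURCE -StandbyMachine EXCHANGETARGET |
    where { $_.SummaryCopyStatus -ne 'Healthy' -or $_.CopyQueueLength -gt $maxCopyQueue } |
    ft Name,SummaryCopyStatus,CopyQueueLength,ReplayQueueLength
```

No output means every copy is Healthy and under the threshold. ReplayQueueLength is shown for context only, since it is expected to stay at 50 or above.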

LastInspectedLogTime shows the date and time of the last log inspected on the SCR target. The time is usually truncated in the default PowerShell output, so run something like:
Get-StorageGroupCopyStatus -StandbyMachine EXCHANGETARGET | ft name, LastInspectedLogTime

Additionally, from the SCR target, you can run Test-ReplicationHealth to troubleshoot any issues with SCR. This cmdlet does not work from the source server; there it errors out saying that LCR (local continuous replication) is not configured. It also accepts a -Verbose argument, which displays a lot more detail.

Continue reading Part 2 which includes failover and failback as well.
