Insights

SQL Server Backup Recovery Rubook: Generic Runbook Template for IT Services

IT Strategy Remote DBA Support Runbook SQL Server Success Stories

SQL Server Backup Recovery Rubook: Generic Runbook Template for IT Services

1         Introduction

Every SQL Server in the production environment must have a backup plan in order to recover from partial or entire data loss for the database.  A backup Policy created and based on the business requirements can assure the recovery of lost data to a point in time. A full backup (a snapshot of the data at a point in time) can be used for these purposes.

2        Purpose

The purpose of this document is to outline the standard procedure to deal with SQL Server Backups, the schedule, the retention of the data files to maintain, and the storage method used with the backed up SQL data files.

This document will also outline how backups are tested to ensure their integrity.

3         Scope

The scope of the procedure described here is intended for the use of native SQL Server tools to perform Full SQL Server Backups following the best practices in the industry, limited only by operating environment restrictions.

 

If the SBU determines that changes are required in how backups are performed or  SQL Server backup retention because of new directives or to satisfy new SLA agreements using third party tools, a different procedure may be required.

 

4         Definitions and abbreviations

SBU: Strategic Business Unit. An organization within the Company focused on providing a specific function or serving a client.

Full SQL Backup:  A complete SQL backup from a database,  this makes a copy of all  physical files that make up the logical database as is known inside SQL Server.

Snapshot:  A copy of a SQL Server database that represents the state of the data at the point in the time that the snapshot was taken.

 

5         Full SQL Server Backup Overview

This SQL Server backup procedure takes into account the frequency of how often the data is changed.  It assumes that a backup must occur every working day because each working day there is new data coming to the databases.  It is mandatory to ensure that the new data is protected from any kind of loss.

In order to do the SQL Server backups, a Stored Procedure was created on each SQL Server.  It is executed by the SQL Agent via a SQL Job and scheduled to automatically run locally at pre-defined times every night to ensure that there is no overlap with any other production SQL jobs.

The SQL Server databases use a simple recovery model which truncate all transactions after completion. This means that backing up the log file is not useful.  Taking this as a base line, we are doing only Full database Backup from each SQL Server database.

The SQL Server databases files being backed up are safely kept in a Network Share storage to ensure that they can be easily retrieved in the event of a Server disk failure.

The SQL Server backup files are kept until  the retention period is satisfied and when that period is reached, the old SQL Server backed up  files are deleted in order to recycle the  disk space to be used by new and fresh SQL Server backups files.

The retention time of the SQL Server Backups files are limited by the physical disk space available so in the best scenario, the retention of files can be up to 3 days.

When the SQL Server Backup Job finishes, it sends an email the DBA team members notifying them of the success or failure of the SQL Server Backup Job.

 

 

6         Inputs / outputs

6.1          Inputs

 

  1. The availability of the Network Share is essential to ensure that the new SQL Server Backups that be generated can be saved
  2. The SQL Server DB mail must be enabled and configured to be able to send emails
  3. Is requested that no other maintenance SQL Job runs at the same time of the SQL Server Backup Job runs due to the nature of accessing the data files at the OS level.
  4. No SQL maintenance Job can be executed during the time that the SQL Backup Job is running and this includes:
  • Any Transaction Log Shrink action
  • Any Rebuild, Defrag or Reindex action over the database’ indexes being backed up
  • Any other action related to database file manipulation.

6.2         Outputs

  1. As a result of the SQL Server Backup Job execution, the SQL Server databases back up files are saved on the Network Share assigned to this purpose.
  2. An email is sent to the DBA Team members as result of a successful operation or an email is sent to Help Desk requesting to open an Incident Ticket as a result of a unsuccessfully operation.

 

 

7         Full SQL Server Backup Procedure description

The automatized SQL Server Backup Job starts at 12:05 am EST daily.

 

The SQL Backups will be saved as files on disk: \\Filer1\SQLBackups\

 

The retention time is 2 days due to space restrictions on the Network Share Onsite \\Filer1\SQLBackups\.

 

The Stored Procedure defined to execute the SQL Server Backup does the following actions:

 

  • Create an initial list of the last time that each database was backed up from the SQL Server and name it SQL_Devices.

 

  • Create an initial list of all databases present in the DATABASE catalog on the SQL Server, leaving out databases from Development, Test, installations or DEMO purposes and databases for Sorts purposes also.

 

  • Does a Full backup of the qualifying databases and writes it to the Network Share location.

 

  • After each database is backed up successfully, it deletes the corresponding old back up file on the Network Share where the retention date is greater than the retention time defined by the SQL Backup Job.

 

  • Validates that all databases were backed up doing a second cycle and checking that all databases have their new backup created.

 

 

If the Stored Procedure defined to do the SQL Backups finish successfully then, send an email listing the newly backed up databases to the DBA team members.

 

If the Stored Procedure defined to do the SQL Backups finishes unsuccessfully or with errors then, send an email to Help Desk requesting to create un Incident Ticket and assign it to the Database queue of the Division and SBU given the SQL Server name and IP address and describing that SQL Server Backup Job Fails.

 

The last email (Successfully or not) completes the SQL Backup Procedure.

 

If the job is unsuccessful an Incident ticket is generated and assigned to the members of the DBA team.  After the root cause of the issue is resolved, the DBA execute the SQL Server Backup manually once more in order to take a fresh full SQL backup.

 

8        Test SQL Server Backups files

As a regular task the DBA team does a weekly back up report each Monday in order to check the current state of each database backup from the last SQL Server Backup action. With this report we check to ensure we have a successful backup from each database.  If not, we investigate the root cause of the failure and resolve it. When this is fixed, we do a manually backup of that failed database backup. You can see a sample of this report in the Attachments section.

In order to assure that the SQL Server Backup procedure is working properly it is necessary to periodically verify the process and file integrity. To accomplish this task each quarter we randomly select and restore a set of backed up files on to the test SQL Server within a controlled environment of restore. 

 

8.1         How it works

With restoring each database from a previously backed up database , we are sure that the SQL Server Backup process described  in this procedure is working properly because when SQL Server does a database restore, it verifies the internal structures of the backed up database and all objects contained within such as tables, indexes, views, stored procedures functions, triggers, etc.

If the previously backed up database can be restored into the SQL Server then we are sure at 100% that SQL Server Backup process is working and the integrity of the backed up files is being consistently maintained over time.

 

 

8.2        How we verify

 

A  Stored Procedure is defined to execute the SQL Server Backup Restore Test that does the following actions for each database:

 

  • Create an initial list of the last time that each database was backed up from the SQL Server and name it SQL_Devices, it includes the databases name and path location.

 

  • Check the drive to ensure there is enough space to handle the restore.

 

  • Restore the qualifying database that was written on the Network Share location by the SQL Server Backup process naming it with the RESTORED prefix.

 

  • If the Restore action was successful the test of this database backup is successful.

 

  • If the test is successful, detach and delete the restored database, if was not successful, report and investigate the cause and do a remediation on why it was not able to be restored, identifying if the failure is due by some current action or by some issue with the backed up database file.

 

  • Document and log the Restore results of this test using the SQL Backup Test Review template and save it on the under folder “Shared Documents/ Backup Tests Reviews/Division/SBU Name” and select the proper folder for SQL instance Server also in there. These folders are over the Database Administration Team site (DBA’s sharepoint) located on https://internalnetwork.com/sites/DB/default.aspx . The template exists on the root folder of “Backup Tests Reviews/” of the same site.

 

If there was some failed restore due to backed up database file issue, resolve the cause of it and find out why this was not discovered before.