Monday 20 October 2008

High Availability System (P.A.S.S) - Choosing a high-availability option - SQL Server 2008

Choosing a high-availability option

You should ask these questions and more when selecting a High Availability option
  • What is the Service Level Agreement
  • Do they need to protect just a database or the whole SQL Server instance ?
  • Are your customers looking for 99.99% uptime?
  • Do they require automatic or manual failover capabilities?
  • And do they need to protect against a site going offline?

Failover clustering is the best choice to reduce system downtime and provide higher application availability for the whole instance of SQL Server within a site but it does not protect against disk failure and requires a SAN.

Database mirroring is a great choice if there is a need to provide high availability with or without automatic failover on one or more databases within an instance of SQL Server.

If the customer is trying to implement a high-availability or disaster recovery solution with minimal costs and wants multiple standby databases, log shipping would be a perfect fit, since the standby recovery database can span one or more geographical data centers.

You may wish to combine your high availability options to get maximum protection as there isnt one answer to address every requirement.

High Availability System (P.A.S.S) - Log Shipping & Replication - SQL Server 2008

Log shipping

Similar to database mirroring, log shipping offers increased database availability and database protection by maintaining a warm standby database on another instance of SQL Server 2008. Unlike database mirroring, log shipping can maintain one or more warm standby databases, and the standby database can be used for reporting purposes. However, log shipping failover is not seamless or automatic you must perform several manual steps, using Transact-SQL, to successfully complete a failover. Clients and applications using the database must be manually redirected from the primary database to the secondary database after manually bringing the secondary database online.

Replication

In SQL Server, there are many types of replication, allowing organisations to copy databases or portions of the database, known as articles, from one SQL Server instance to another SQL Server instance. Replication can be used for data distribution, synchronization, fault tolerance, disaster recovery, load balancing, reporting or testing. Replication uses a publish-subscribe model; in this model, a primary server, referred to as the publisher, distributes data to one or more secondary servers, or subscribers.

The high-availability replication option in SQL Server is called peer-to-peer transactional replication. When using peer-to-peer transactional replication, all participants in the replication topology are peers. There isn't a hierarchy as with normal transactional replication, and data can be updated on all databases configured in the peer-to-peer replication topology. Therefore, one of the advantages of peer-to-peer replication is that if one of the peers is unavailable, traffic can be redirected to another peer as a form of high availability. In addition, because all peers are updatable and support bidirectional replication, this model can be used for load balancing clients across multiple SQL Server instances.

It is wise to make sure that only one node of a peer-to-peer implentation is written to prevent conflicts on happening i.e lost data due to two nodes overwriting the same row in a table. Peer-to-peer replication in SQL Server 2008 introduces the option to enable conflict detection across a peer-to-peer topology. This option helps prevent the issues that are caused from undetected conflicts, including inconsistent application behavior and lost updates. By enabling this option, by default a conflicting change is treated as a critical error that causes the failure of the Distribution Agent. In the event of a conflict, the topology remains in an inconsistent state until the conflict is resolved manually and the data is made consistent across the topology.

Peer-to-peer replication is supported on both the Enterprise and Standard editions.

Tuesday 14 October 2008

High Availability System (P.A.S.S) - Database Mirroring - SQL Server 2008

Whats is database mirroring ?

Database mirroring maintains a hot standby database (known as the mirror database) that can quickly assume client connections in the event of a principal database outage.

Database mirroring involves two copies of a single database that reside on different computers. At any given time, only one copy of the database is available to clients. This copy is known as the principal database. Unlike log shipping which works by applying full transaction log backups to a warm standby database, database mirroring works by transferring and applying a stream of database log records from the principal database to the other copy of the database (the mirror database).

Database mirroring applies every database modification that is made on the principal database to the mirror database. This includes data changes as well as changes to the physical and logical structures in the database, such as database files, tables, and indexes.

Following are some basic database mirroring terms.

Principal: In a database mirroring configuration, there are two copies of a single database, but only one copy is accessible to the clients at any given time. The copy of the database that the applications connect to is called the principal database. The server that hosts the principal database is known as the principal server.

Mirror: The mirror is the copy of the principal database. The mirror is always in a restoring state; it is not accessible to the applications. To keep this database up-to-date, the log records are transferred from the principal and applied on the mirror database. The server that hosts the mirror database is known as the mirror server.

Witness: The optional witness is an SQL Server instance in a database mirroring configuration. It is separate from the principal and mirror instances. When database mirroring is used in synchronous mode, the witness provides a mechanism for automatic failover.

Send Queue: While sending the log records from the principal to the mirror, if the log records can’t be sent at the rate at which they are generated, a queue builds up at the principal. This is known as the send queue. The send queue does not use extra storage or memory. It exists entirely in the transaction log of the principal. It refers to the part of the log that has not yet been sent to the mirror.

Redo Queue: While applying log records on the mirror, if the log records can’t be applied at the rate at which they are received, a queue builds up at the mirror. This is known as the redo queue. Like the send queue, the redo queue does not use extra storage or memory. It exists entirely in the transaction log of the mirror. It refers to the part of the hardened log that remains to be applied to the mirror database to roll it forward.

Endpoint: An endpoint is a SQL Server object that enables SQL Server to communicate over the network. It encapsulates a transport protocol and a port number.

Failover: When the principal database (or the server hosting it) fails, database mirroring provides a mechanism to fail over to the mirror database.


Some important aspects of database mirroring are:
• The granularity of database mirroring is a database. Mirroring is configured for one database at a time. The whole instance is not mirrored.

• Two copies of the same database are involved in database mirroring, but only one database is accessible to the applications at any given time. You can create a snapshot on the mirror and use it for read-only purposes (a good solution for reporting requirements). However, you cannot directly access the mirror database or back up the mirror database.
• You cannot mirror the master, msdb, temp, or model databases.
• Database mirroring requires that the database use the full recovery model. You can’t use the simple or bulk-logged recovery models.
SQL Server 2005/2008 allows only one mirror database for each principal database.
• One instance can serve as the principal for one database, the mirror for another database, and the witness for yet another database.
• Multiple databases in an instance can be mirrored.
• Applications that connect to a database with ADO.NET or the SQL Native Client (SNAC) can automatically redirect connections when the database fails over to the mirror.
• Data between the principal and the mirror is transferred encrypted by default.
• A database which is mirrored to another server can also be the source database for a log-shipping scenario

Network bandwidth can be a bottleneck for mirrored databases where the rate of transaction log generation is high enough to saturate the network. On synchronously mirrored databases (where a transaction cannot commit on the principal until all the transaction log for it has been hardened in the log of the mirror database), this can even lead to workload throughput degradation.

Log compression which is a new feature of SQL Server 2008 is a way to fix this problem. The downside of this is that compression is not free - extra CPU must be used. The obvious CPU load increases comes from having to compress the transaction log on the principal and then decompress it again on the mirror before hardening it in the log. The non-obvious increase will come because the principal and mirror should end up being able to process more transactions/second, which means more CPU is needed.

This extra CPU could be a problem for some systems that are already pegged in terms of CPU load, and so they may actually see a drop in performance when log stream compression is enabled.

Apart from the obvious benefits of this feature for systems that can afford the extra CPU, it may allow more mirrored databases to perform effective database maintenance - in the form of fragmentation control and index maintenance. When database mirroring is enabled, the database has to be running in the full recovery mode. This means that operations such as index rebuilds are fully logged - that's a lot of extra log being generated if you're used to switching to bulk-logged recovery mode to perform index operations. Of course, this isn't an issue for reorganizing an index (with ALTER INDEX ... REORGANIZE) as this has been fully logged (and has to be that way because of how it works)

Anyway, for some people the fact that these two operations are fully logged means that running them produces so much additional log that, in conjunction with the log from the workload itself, the network becomes a bottleneck for mirroring and it slows down both the maintenance and reduces the workload throughput. The addition of log stream compression means that for these systems, the log may be compressed enough so that the network is no longer a bottleneck and more regular fragmentation control can take place. This in turn will increase workload throughput as well - a double benefit!

Wednesday 1 October 2008

High Availability System (P.A.S.S) - Hardware Methods

SQL Server Database Solutions P.A.S.S Checklist

There are many ways to design a database solution for SQL Server and I try to make sure that I stick to my P.A.S.S checklist
  • High Performance
  • High Availability
  • High Security
  • High Scalability
In the following articles, I will go through my checklist to show you how you can use it within your database designs and implementations.

High Availability

Some people may judge that a high availability system is a system that never has any unscheduled downtime. This can be a mistake as even a scheduled downtime can make your system a non-available system.

To judge if a database solution is a highly available system, you should look at it from a users point a view. A user should be able to do what ever he/she wants to do any time of the day and if that’s not the case, then your system in not a highly available solution and should be redesigned for any potential problems that could occur.

Other things can make your system non-available, from index rebuilding, backing up the database to locking of tables. All these potential problems should be thought of when designing a database solution to make your users more productive.


How can you make your database system, highly available?

One method is to understand your environment and what is meant by the recovery models of a SQL Server database. The longer it takes to recover a database from a failover, the more unavailable your system is. Recovery models can impact on how long your database solution can take to recover.

Another method of making your system more available is to have disk redundancy and placing your databases file on RAID which can also shorten the recovery time.

Other methods to make your system more available is using one or more of the following solutions

Hardware Methods
* Remote Mirroring
* Failover Clustering

Software Methods
* Database Mirroring
* Log Shipping
* Peer to Peer replication

Each one of the have pro and cons and I will try to explain the benefits of each

Remote Mirroring





Remote mirroring is the method of mirroring your database to another SAN (Storage Area Network). This is a good idea if you have two business locations and you want to make sure that if one building is affected by a unspecified problem, the second database at another location would be able to takeover. What happens is that two or more SANS are connected and through this connection, they can make sure that they are in sync. Depending on hardware being used, there are different ways for this method to be implemented.

As remote mirroring can be expensive, you should only use this technique if your organisation uses the same hardware throughout your organisation as each remote mirroring solution is different between vendors. Also you need a dedicated dark fibre connection between the SANS as DSL shared connections will not do.
A negative feature about remote mirroring is that there is not an automatic failover process, so you would have to manual recover the secondary database when a failure occurs.


Failover Clustering

The main features of Failover Clustering are :-

  • Windows Feature
  • Prevents Data Loss
  • Automatic Client Redirection
  • Requires Specialised Hardware (Certified Clustering Software)
  • Not as fast as Database Mirroring
  • Can be geographically dispensed

The way failover clusting works is that you have nodes (SQL Servers) that all connect using a shared hard drive(Shared Disk Array). and on the shared disk array, each node would be given its own resources to be the owners of

i.e.

Node A is the Owner of Drive D
Node B is the Owner of Drive E

Only one node can be the owner of one resource at anytime.

Each node listens to the other node via a heartbeat and if one node detects that the other node, is not functioning, the functioning node would take over the ownership of the non functioning node. This way the functioning node will be able to bring back the SQL Server instance back online.

i.e
Node B is not functioning
Node A is the Owner of Drive D
Node A is the Owner of Drive E

All uncommitted transactions are rolled back and can be restarted on the new server.

The good thing about failover clustering is that if a failure happens on a Node, the other nodes can easily take ownership and business tasks can continue.

Future parts will contain information on
* Database Mirroring
* Log Shipping

* Peer to Peer replication