Monday, April 28, 2008

Capacity Planning - Planning the Disk Subsystem

Planning the disk subsystem for a database infrastructure means choosing a RAID strategy that satisfies both the throughput requirements and the storage requirements. Best practice guidelines usually call for five separate physical drive sets, each mapped to its own logical drive: one each for the operating system, the transaction logs of the production database, the data files of the production database, TempDb and, lastly, backup file storage. Before going through each of these, let us discuss the standard RAID configurations and their characteristics. The following table shows the different standard RAID configurations and their characteristics.
We can see that, although the fault tolerant RAID configurations increase the number of physical IOs per second (each write costs more than one disk operation), those IOs are divided among the physical disks in the array. This means that we can add physical disks to increase the throughput. However, each individual disk has a maximum physical throughput, such as 300IOps, and the overall throughput also depends on the throughput limit of the IO bus.

Requirements for Different Components
Let us take the main components of a SQL Server instance and see how to design the disk requirements of each, following the five drive sets listed above.
The operating system and the SQL Server program components don’t have very high throughput or storage requirements. The main thing to consider here is recoverability. Given these requirements, we can assign a RAID1 mirror to the operating system and SQL Server program components so that we can easily recover in case of a failure.
For the production database data files, we know that there will be very large storage requirements and high throughput requirements. From the RAID specifications above, we can see that RAID1 cannot be used because it lacks storage capacity, and RAID0 cannot be used because it does not offer fault tolerance. Hence, the choice is between RAID5 and RAID10. Both offer good read performance, but RAID5 is a poor choice for writes. The choice between the two therefore depends on our application, that is, whether it is an OLTP application or an OLAP application.
Meeting the Throughput Requirements
To explain the throughput calculations for a database application, let us assume that the client technical team insists that disk IO utilization should not exceed 85%. Consider that we have 300IOps disks. This means that each individual disk should not exceed 85% of its throughput, i.e. the load on an individual disk should not exceed 300 * 0.85 = 255IOps. Now, let us assume that, during the peak hours, our application has approximately 600 reads and 200 writes per second. Assuming that we are considering only fault tolerant RAID configurations, let us take each possible configuration in turn.
Let us assume that we use RAID1; we know that for RAID1 the Throughput per disk = (Reads + (2 * Writes))/2 = (600 + 400)/2 = 500IOps. As per the requirements, the maximum throughput we can afford for an individual physical disk is 255IOps, while our application with RAID1 would put 500IOps on each disk. Hence, we can conclude that RAID1 is not possible for the data files of our application.
Now let us assume that we use RAID5; we know that for RAID5 the Throughput per disk = (Reads + (4 * Writes))/No. of Disks. That means No. of Disks = (600 + 800)/Throughput; that is, No. of Disks = 1400/255 ~= 5.5, which rounds up to 6. This means that we require at least 6 disks to satisfy the throughput requirements if we adopt a RAID5 configuration for the data files.
Now, let us assume that we use RAID10; we know that for RAID10 the Throughput per disk = (Reads + (2 * Writes))/No. of Disks; so, No. of Disks = (Reads + (2 * Writes))/Throughput. Replacing the variables, No. of Disks = (600 + 400)/255 ~= 3.9, which rounds up to 4. This means that we require only 4 disks to satisfy the throughput requirements if we adopt a RAID10 configuration for the data files. Note that a RAID10 configuration cannot have an odd number of disks, so the result must always be rounded up to an even number. For example, had the calculation come to 4.5 disks, we would need 6 disks to configure the RAID.
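
The throughput arithmetic above can be sketched in a few lines of Python. This is only an illustration under the assumptions stated in the text (300IOps disks, an 85% utilization cap, 600 reads and 200 writes per second); the function names and the WRITE_PENALTY table are hypothetical helpers introduced here, not part of any SQL Server tool.

```python
import math

# Physical IOs generated by one logical write, as used in the formulas above:
# 4 for RAID5 and 2 for RAID10. (Illustrative table, not a vendor specification.)
WRITE_PENALTY = {"RAID5": 4, "RAID10": 2}


def per_disk_limit(disk_iops, max_utilization_pct):
    """Highest IO rate a single disk may carry under the utilization cap."""
    return disk_iops * max_utilization_pct / 100


def disks_for_throughput(level, reads, writes, disk_iops, max_utilization_pct):
    """Smallest disk count that keeps every disk at or under the cap."""
    physical_io = reads + WRITE_PENALTY[level] * writes
    disks = math.ceil(physical_io / per_disk_limit(disk_iops, max_utilization_pct))
    if level == "RAID10" and disks % 2:
        disks += 1  # RAID10 is built from mirrored pairs, so round up to an even count
    return disks


print(per_disk_limit(300, 85))                            # 255.0
print(disks_for_throughput("RAID5", 600, 200, 300, 85))   # 6
print(disks_for_throughput("RAID10", 600, 200, 300, 85))  # 4
```

Rounding up with math.ceil matters here: 1400/255 is about 5.5, and 5 disks would push each disk past the 255IOps limit.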
Meeting the Storage Requirements
Once we have settled the throughput requirements and the number of disks needed to satisfy them, we need to consider the storage requirements for the application. During the throughput calculation, the main focus was the number of disks; in the storage calculation, we consider the capacity of each disk. Here also, let us assume that the client insists that only 85% of the total disk capacity should be utilized at any time. This means that at any point of time there should be at least 15% free space on the disk. If we expect our data files to have a total size of 115GB, we require a total space of 115/.85 ~= 136GB. Assuming that we are considering only fault tolerant RAID configurations as above, let us take each possible configuration to satisfy the total 136GB requirement.
Let us assume that we use RAID1, even though we ruled it out because it doesn’t satisfy our throughput requirements. We know that in a RAID1 configuration we can have only 2 disks, and each disk holds a full copy of the data. This means that the usable capacity is that of a single disk, so each disk should have an individual storage capacity of at least 136GB.
Now let us assume that we use RAID5; we know that in a RAID5 configuration, the capacity of one physical disk is used for parity. During the throughput calculations, we concluded that we require at least 6 disks if we are going for a RAID5 configuration, which leaves 5 disks' worth of space for data. Hence, the minimum storage capacity for an individual disk = 136/5 ~= 28GB.
Now let us assume that we use RAID10. During the throughput calculations, we concluded that we require at least 4 disks if we are going for a RAID10 configuration. Among these 4 disks, only 2 can be used for storage because of the mirrors. This means that the minimum storage capacity for each disk should be 136/2 = 68GB.
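
As a rough sketch, the same storage sizing can be expressed in Python for the two configurations that passed the throughput check. The helper names below are hypothetical and the numbers (115GB of data, an 85% fill limit, 6 disks for RAID5 and 4 for RAID10) are the assumptions used in the text.

```python
import math


def required_space_gb(data_gb, max_fill_pct):
    """Total usable space needed so the data fills at most max_fill_pct of it."""
    return math.ceil(data_gb * 100 / max_fill_pct)


def usable_disks(level, total_disks):
    """Number of disks' worth of capacity left for data (illustrative helper)."""
    if level == "RAID5":
        return total_disks - 1   # one disk's worth of capacity goes to parity
    if level == "RAID10":
        return total_disks // 2  # half of the disks hold mirror copies
    raise ValueError(f"unsupported RAID level: {level}")


def min_disk_size_gb(level, total_disks, data_gb, max_fill_pct):
    """Smallest per-disk capacity, in whole GB, that meets the space requirement."""
    space = required_space_gb(data_gb, max_fill_pct)
    return math.ceil(space / usable_disks(level, total_disks))


print(required_space_gb(115, 85))             # 136
print(min_disk_size_gb("RAID5", 6, 115, 85))  # 28
print(min_disk_size_gb("RAID10", 4, 115, 85)) # 68
```

In practice the result would be rounded up again to the nearest disk size that is actually available for purchase.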
