Friday, December 5, 2008

cache store flush

I came across this post while going through links for the cache store flush messages -
http://blogs.msdn.com/sqlprogrammability/archive/2007/01/17/10-0-plan-cache-flush.aspx

Tuesday, October 14, 2008

dmo with sql2k8

I have personally been a great fan of SQL DMO for many years while managing hundreds of SQL Server boxes with a small team. It helped me a lot in audit and common maintenance tasks. With the introduction of SMO, DMO became a deprecated feature in Sql2k5. With Sql2k8, MS has gone one step further and no longer installs DMO by default. If you still want to use DMO with Sql2k8, you can download the feature pack from here -
http://www.microsoft.com/downloads/details.aspx?FamilyId=C6C3E9EF-BA29-4A43-8D69-A2BED18FE73C&displaylang=en. That said, it would be better to start using SMO and PowerShell now.

refreshing intellisense cache

The IntelliSense shipped with sql2k8 SSMS is very helpful. There is a way to refresh the local IntelliSense cache to reflect the latest schema changes: go to Edit --> IntelliSense --> Refresh Local Cache.

recompilations by temp tables Vs table variables

There is an interesting post from the SQLCAT team on the difference in recompilations when using a temp table versus a table variable. See it here - http://blogs.msdn.com/sqlcat/archive/2008/10/09/table-variable-vs-temp-table-and-estimated-vs-actual-query-plan.aspx
There are a few other links that may be useful (a small illustration follows the links) -
http://blogs.msdn.com/sqlserverstorageengine/archive/2008/03/30/sql-server-table-variable-vs-local-temporary-table.aspx
http://databases.aspfaq.com/database/should-i-use-a-temp-table-or-a-table-variable.html
http://support.microsoft.com/default.aspx/kb/305977
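As a quick, hedged illustration of the difference the links above discuss (the object names here are made up):

-- Temp table: lives in tempdb, has column statistics, and changes to it can trigger
-- statement recompilations that give the optimizer accurate row estimates.
CREATE TABLE #Orders (OrderId INT PRIMARY KEY, OrderDate DATETIME);

-- Table variable: no statistics are maintained, so the optimizer typically assumes
-- very few rows and no recompilation is triggered for it.
DECLARE @Orders TABLE (OrderId INT PRIMARY KEY, OrderDate DATETIME);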

Monday, October 6, 2008

cu1 for Sql2k8

Hardly a month after the release of sql2k8, the first CU has been released. Find the link here - http://support.microsoft.com/kb/956717. For instructions on the installation, go here http://blogs.msdn.com/psssql/archive/2008/09/26/welcome-to-sql-server-2008-patching.aspx

Monday, September 22, 2008

changing default backup directory

Ever wondered how to change the default path for backups used by SSMS? Open the registry editor and go to HKLM\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL.x\MSSQLServer, where x is the instance number. In the right pane, modify the value of BackupDirectory to the path of your choice.
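If you prefer to do it from a query window instead of regedit, the same value can be written with the undocumented xp_instance_regwrite extended procedure. A minimal sketch, assuming a standard installation (verify the key and the target folder on your own server first):

-- Write the instance-relative BackupDirectory value; xp_instance_regwrite resolves
-- "MSSQLServer" to the correct MSSQL.x key for the instance you are connected to.
EXEC master.dbo.xp_instance_regwrite
    N'HKEY_LOCAL_MACHINE',
    N'Software\Microsoft\MSSQLServer\MSSQLServer',
    N'BackupDirectory',
    REG_SZ,
    N'D:\SQLBackups';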

Thursday, September 18, 2008

auditing table access in Sql2k8

It is difficult to audit SELECT statements in SQL Server. As triggers can't be used for this purpose, a trace seems to be the only option up to v2005. The new audit objects in v2008 are a cool new feature that improves auditing tasks. http://blogs.msdn.com/euanga/archive/2008/05/28/sql-2008-how-do-i-tell-if-a-specific-table-is-still-being-used-in-an-app.aspx
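A minimal sketch of what that can look like with the new audit objects (the audit name, file path, database and table names below are just placeholders):

-- Server-level audit that writes to files under E:\Audits\
USE master;
GO
CREATE SERVER AUDIT Audit_TableAccess TO FILE (FILEPATH = N'E:\Audits\');
ALTER SERVER AUDIT Audit_TableAccess WITH (STATE = ON);
GO
-- Database audit specification capturing SELECTs against a single table
USE MyDatabase;
GO
CREATE DATABASE AUDIT SPECIFICATION Audit_Select_Employee
FOR SERVER AUDIT Audit_TableAccess
ADD (SELECT ON OBJECT::dbo.Employee BY public)
WITH (STATE = ON);
GO
-- Read the captured events back
SELECT event_time, server_principal_name, statement
FROM sys.fn_get_audit_file(N'E:\Audits\*', DEFAULT, DEFAULT);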

Monday, September 1, 2008

Thursday, August 21, 2008

sql2k5 scale out articles

Here is a bunch of links covering the ins and outs of scaling out with SQL Server 2005 -
http://blogs.msdn.com/sqlcat/archive/2008/06/12/sql-server-scale-out.aspx

multi-server query with SSMS 2008

A cool new feature with SSMS 2008 http://blogs.msdn.com/sqlrem/archive/2008/02/04/multiple-server-query-execution-in-sql-server-2008.aspx

cumulative update 9 released

SQL Server 2005 Sp2 cumulative update 9 has been released. Happy installations!
http://support.microsoft.com/kb/953752

zeroing for log files

Instant file initialization is an excellent feature introduced with Windows 2003 that SQL Server 2005 takes advantage of. When instant file initialization is enabled, the zeroing process for SQL Server data files is skipped, which substantially reduces the time required for file creation and growth. However, instant file initialization cannot be used for SQL Server log files. The reason is that log file sectors carry parity bits that crash recovery relies on to detect the end of the log, and if zeroing were skipped, stale data already present in the sectors could be misinterpreted as valid log records and cause problems. There is a post from Paul Randal explaining the architectural reason. Find the post at http://www.sqlskills.com/blogs/paul/2008/08/20/SearchEngineQA24WhyCantTheTransactionLogUseInstantInitialization.aspx.
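If you want to watch the zeroing (or the lack of it) yourself, trace flags 3004 and 3605 write the file initialization messages to the error log. A rough sketch, with the database name as a placeholder:

-- 3004 prints file zeroing/initialization messages, 3605 sends them to the error log
DBCC TRACEON(3004, 3605, -1);
GO
CREATE DATABASE ZeroTest;   -- with instant initialization enabled, only the log file reports zeroing
GO
EXEC master.dbo.xp_readerrorlog;   -- look for the "Zeroing ..." messages
GO
DBCC TRACEOFF(3004, 3605, -1);
DROP DATABASE ZeroTest;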

Thursday, August 7, 2008

SQL Server 2008 Released!

After a long wait, Microsoft has officially announced the release of the new version of SQL Server! Read the full story here http://www.microsoft.com/Presspass/press/2008/aug08/08-06SQLServer2008PR.mspx
Along with that, a new edition - "Web Edition" - has also been announced, intended for internet-facing web applications http://blogs.msdn.com/irelandlicensing/archive/2008/07/11/microsoft-sql-server-2008-web-edition.aspx#8824528

Monday, July 28, 2008

Database Versions

As DBAs, these days we get databases of different kinds: databases created on the current version, and databases belonging to old applications that have gone through a few upgrades (from 7.0 to 2000 to 2005 etc). The question is how we can find out the original version of SQL Server on which a database was created. There is an undocumented command for this - DBCC DBINFO. You will see two entries in the output - dbi_version and dbi_createVersion. The value of dbi_version denotes the current version of the database, while dbi_createVersion denotes the version of SQL Server on which the database was originally created. A value of 515 stands for SQL Server 7.0, 539 for SQL Server 2000, 611 for SQL Server 2005 and 612 for SQL Server 2005 with the vardecimal option enabled. For more information on the DBCC DBINFO command, see Paul's post at http://www.sqlskills.com/blogs/paul/2008/07/11/SearchEngineQA20BootPagesAndBootPageCorruption
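A hedged example of looking these values up (the database name is a placeholder, and since the command is undocumented the output layout can differ between builds):

-- TABLERESULTS returns the boot page fields as rows instead of raw text
DBCC DBINFO ('MyDatabase') WITH TABLERESULTS;
-- In the output, look for the rows where Field = 'dbi_version' and Field = 'dbi_createVersion'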

Monday, July 21, 2008

Tools to avoid SQL Injection

Owing to the recent mass SQL injection attacks, Microsoft has come up with a few tools to reduce the chances of attacks of this type. Find the link here -
http://blogs.technet.com/swi/archive/2008/06/24/new-tools-to-block-and-eradicate-sql-injection.aspx

Thursday, July 17, 2008

Tracking FileGrowth

I have always been of the opinion that we should avoid relying on the "autogrow" option for production databases - whether small or large applications - simply because of the overhead it creates while expanding the file, the connection issues while the expansion happens, and most importantly, the amount of fragmentation it creates. Recently, while going through a production issue, a fellow administrator asked me how we can find out when a file growth happened (this was a database where autogrow is set to ON). I dug through some resources and found, somewhat accidentally, that this is tracked in the default trace. We can simply query the trace file to obtain the growth events; joining the result with sys.trace_events gives clearer results. Run the query given below and make sure the path to the trace file is correct -
SELECT TrcEv.Name, Trc.DatabaseName, Trc.FileName, Trc.StartTime
FROM fn_trace_gettable('E:\Microsoft SQL Server\MSSQL.1\MSSQL\LOG\log_6.trc', NULL) AS Trc
INNER JOIN sys.trace_events TrcEv ON Trc.EventClass = TrcEv.trace_event_id
WHERE TrcEv.Name LIKE '%Auto Grow'
ORDER BY Trc.StartTime
If you want to see how the initial size of the file and the growth factor affect the file growth (and the index fragmentation, of course), try the small example given below -

CREATE DATABASE GrowthTest
GO
USE GrowthTest
GO
CREATE TABLE dbo.Employee
(EmployeeName VARCHAR(3000))
GO

DECLARE @i INT
SET @i = 1
WHILE @i < 10000
BEGIN
INSERT INTO Employee VALUES(REPLICATE('A', 3000))
SET @i = @i + 1
END
GO
(I assume that the filegrowth settings for your model database are at the defaults.) After this, run the query given above once more to see how many times autogrowth happened.

Thursday, July 10, 2008

Sql2k8 on August pricelist

Microsoft has officially announced that SQL Server 2008 will be released in August, and that too with no price increase!
http://www.eweek.com/c/a/Application-Development/Microsoft-to-Deliver-SQL-Server-2008-in-August/
http://blogs.technet.com/dataplatforminsider/archive/2008/07/09/sql-server-2008-available-on-august-price-list.aspx

Backups with COPY_ONLY option

We have heard a lot that having regular backups is a must in any database environment. But there are cases where even regular full, differential and log backups will not ensure that the database is safe, and you can still lose hours or days of work. This happens when the backup chain breaks because someone takes an ad hoc full backup apart from the regular scheduled backups, and the subsequent differential backups are then based on that new full backup. The remedy is to use the COPY_ONLY option for such full backups, which doesn't break the chain. Paul Randal has illustrated the importance of this in a post which I noticed only recently http://www.sqlskills.com/blogs/paul/2007/10/15/BACKUPWITHCOPYONLYHowToAvoidBreakingTheBackupChain.aspx. So as a DBA, the next time your developers ask you for a backup of a production database, either restore it from a regular backup set or take a backup with the COPY_ONLY option.
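For reference, a minimal sketch of such a backup (the database name and path are placeholders):

-- A full backup that does not reset the differential base, so the scheduled
-- full/differential chain is left untouched
BACKUP DATABASE MyDatabase
TO DISK = N'E:\Backups\MyDatabase_copyonly.bak'
WITH COPY_ONLY, INIT, STATS = 10;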

Thursday, June 12, 2008

Reorganization of Non-Clustered Indexes

We have all seen people getting confused about the index reorganization process. One of the doubts that often pops up is whether reorganizing the clustered index of a table will automatically reorganize all the non-clustered indexes on the table. Well, the answer is that it depends on the version and on the type of the clustered index. In the case of SQL Server 2005, the answer is straightforward – the non-clustered indexes will NOT be reorganized at all. If the version is SQL Server 2000, it depends on the clustered index. Let us discuss SQL Server 2000 first. In SQL Server 2000, if the table is a heap, the non-clustered indexes locate the row corresponding to a key based on the physical record id, which consists of the page id and the record slot. This is a physical value, so in a heap the lookup reference is a physical location. If we create a clustered index on the heap, or remove the clustered index from a table, these physical locations may change due to the rearrangement of rows. Hence, the lookup references in the non-clustered indexes become invalid, which means the non-clustered indexes need to be rebuilt.
Now, let us consider that the table is not a heap and has a clustered index. In that case, the lookup is NOT based on the physical record id, because with a clustered index the lookup can uniquely identify a record using the clustered index key. This is a logical reference, because the reference points to a clustered index key and not to any physical location. However, if we drop the clustered index from the table, the logical lookup references based on the clustered index key become invalid, and hence the non-clustered indexes have to be rebuilt.
Now, let us see what happens when we reorganize an existing clustered index. We know that a clustered index can be unique or non-unique. To ensure the uniqueness of non-unique clustered index keys, SQL Server internally adds a uniquifier to the duplicate key values of the clustered index (this does not happen with a unique clustered key). Now, if the non-unique clustered index is reorganized, the uniquifiers are recreated, which in turn makes the lookup references invalid. So the non-clustered indexes have to be recreated to ensure their integrity. But if the clustered index is a unique clustered index, there is no uniquifier and therefore no uniquifier recreation is required during the reorganization. Hence, the non-clustered indexes need not be recreated.
All of the above is the same in SQL Server 2005, except that the uniquifiers are not recreated for a non-unique clustered index during an index reorganization; instead of recreating them, SQL Server reuses the old uniquifiers. This means that even when a non-unique clustered index is recreated, there is no chance of the lookup references becoming invalid. So there is no need to recreate the non-clustered indexes, whether the clustered index is unique or non-unique. Hence we can summarize as follows –
Version            Clustered index type   Non-clustered indexes recreated?
SQL Server 2000    Non-unique             Yes (uniquifiers are regenerated)
SQL Server 2000    Unique                 No
SQL Server 2005    Non-unique             No (uniquifiers are reused)
SQL Server 2005    Unique                 No
Now, you can go back and see whether your index maintenance plans and scripts require any modification!
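As a starting point for that review, a small hedged sketch (object names are placeholders; run it in the context of the database in question) of checking fragmentation and then working on the clustered index, which on SQL Server 2005 leaves the non-clustered indexes alone as discussed above:

-- Check fragmentation for all indexes on one table
SELECT index_id, index_type_desc, avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.Employee'), NULL, NULL, 'LIMITED');

-- Rebuild just the clustered index; on SQL Server 2005 the non-clustered indexes are not touched
ALTER INDEX PK_Employee ON dbo.Employee REBUILD;

-- Or reorganize everything in place
ALTER INDEX ALL ON dbo.Employee REORGANIZE;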

SQL Server 2008 RC0

SQL Server 2008 RC0 is now available for download; find it at http://download.microsoft.com/download/2/0/c/20cf6e18-0448-4e7b-8d8c-60a3c4802671/Download_Instructions_ENU.htm

Monday, May 26, 2008

Fragmentation with Shrink

Whenever I hear questions about database shrinking, my first question is always "why do you need to shrink?". It is not easy for me to understand why people look at shrinking a database whenever they see some free space within it. I always say, forget about that space; it can be of use later. I would recommend shrinking a database only under the following conditions –
1. The database is getting archived
2. We are damn sure that the database won’t grow anymore.
3. The database has a huge amount of free space and we need to restore the database to a server that doesn’t have enough disk space.
A database shrink has too many disadvantages –
1. Increased database fragmentation.
2. Increased file fragmentation.
3. The resource utilization to do the shrink and then the growth required later.
4. The user downtime during the growth and shrink process.
There is a small but great post from Paul Randal on the autoshrink and autogrowth processes, and how they increase fragmentation. Paul being one of the best SQL Server professionals in the world and a long-time program manager on the Storage Engine team, I don't think there are many who can speak as convincingly as he does on this topic. A must-read post for any production DBA. See his post here http://www.sqlskills.com/blogs/paul/2007/11/13/AutoshrinkTurnItOFF.aspx
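A quick check you can run to see which databases still have auto-shrink turned on, and how to turn it off (the database name below is a placeholder):

-- Find databases with auto_shrink enabled (SQL Server 2005 and later)
SELECT name FROM sys.databases WHERE is_auto_shrink_on = 1;

-- Turn it off for one of them
ALTER DATABASE MyDatabase SET AUTO_SHRINK OFF;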

Monday, May 19, 2008

Database Change Logging

Recently, I was asked to configure a process through which the developers would get to know about the changes happening on the dev boxes. This was important because we had a new application pilot running in parallel with the existing system, and there were different departments and a lot of teams involved in the development and QA tasks. I thought of sharing how we accomplished this.
The process we agreed upon was to use DDL triggers to track the schema changes, and to send the data on a weekly basis to the concerned people.
The first thing we did was to create a log table in the databases. After creating the table, we created a DDL trigger on the database so that the trigger fires on every schema change. The trigger captures the event data for each schema change and inserts information such as the login name, host name and the complete XML event data into the log table. This gave us a complete log of the changes happening to the database schema. Following is the code for the table creation and the DDL trigger.
--**********************************************************************************
-- Script: CreateTable_DatabaseLog.sql
-- Author: RAJAN JOHN
-- Date: 05/08/2008
--**********************************************************************************
USE [MyDatabase]
GO
SET ANSI_NULLS OFF
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[DatabaseLog](
[PostTime] [datetime] NULL,
[DatabaseUser] [varchar](100) NULL,
[hostname] [varchar](100) NULL,
[Event] [varchar](100) NULL,
[Schema] [varchar](100) NULL,
[Object] [varchar](100) NULL,
[XmlEvent] [xml] NULL
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF


--**********************************************************************************
-- Script: CreateTrigger_DatabaseChangeLog.sql
-- Author: RAJAN JOHN
-- Date: 05/08/2008
--**********************************************************************************
USE [MyDatabase]
GO
/****** Object: DdlTrigger [ddlDatabaseTrigger_DatabaseChangeLog] ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TRIGGER [ddlDatabaseTrigger_DatabaseChangeLog]
ON DATABASE
FOR DDL_DATABASE_LEVEL_EVENTS
AS
BEGIN
SET NOCOUNT ON;

DECLARE @data XML;
DECLARE @schema sysname;
DECLARE @object sysname;
DECLARE @eventType sysname;
DECLARE @LoginName sysname;
DECLARE @SPID sysname;
DECLARE @hostname varchar(100);

SET @data = EVENTDATA();
SET @eventType = @data.value('(/EVENT_INSTANCE/EventType)[1]', 'sysname');
SET @schema = @data.value('(/EVENT_INSTANCE/SchemaName)[1]', 'sysname');
SET @object = @data.value('(/EVENT_INSTANCE/ObjectName)[1]', 'sysname') ;
SET @LoginName =@data.value('(/EVENT_INSTANCE/LoginName)[1]', 'sysname') ;
SET @SPID =@data.value('(/EVENT_INSTANCE/SPID)[1]', 'sysname');
SELECT @hostname=hostname FROM master.dbo.sysprocesses WHERE spid=@SPID
INSERT [dbo].[DatabaseLog]
([PostTime],
[DatabaseUser],
[Event],
[Schema],
[Object],
[XmlEvent],
[hostname] )
VALUES
( GETDATE(),
@LoginName,
@eventType,
CONVERT(sysname, @schema),
CONVERT(sysname, @object),
@data,
@hostname );
SET NOCOUNT OFF
END;
GO
SET ANSI_NULLS OFF
GO
SET QUOTED_IDENTIFIER OFF
GO
ENABLE TRIGGER [ddlDatabaseTrigger_DatabaseChangeLog] ON DATABASE

Now, the next step was to provide a report on the schema changes to the concerned people (the development leads, QA leads etc). Initially, we thought of running a query on the log table on a weekly basis and copy-pasting the result set into an email. Then we decided to automate this process as well. We wrote a job that uses Database Mail to send the results of the query to the concerned people. Following is the code for the procedure that is run by the job.

--*************************************************************************************
-- Script: SendDatabaseChangeLog.sql
-- Author: RAJAN JOHN
-- Date: 05/08/2008
--*************************************************************************************
Declare @DeBug bit
Declare @strServerName varchar(255)
Declare @mailRecipients varchar(max),
@mailImporance varchar(6),
@subjectString nvarchar(255),
@mailBody nvarchar(max),
@mailQuery nvarchar(max),
@resultAttachFlag bit,
@resultAttachFileName nvarchar(255),
@resultColSeperator char(1),
@resultWidth int
--*************************************************************************************
set @DeBug = 1
select @strServerName = CASE CAST(isnull(ServerProperty('InstanceName'), '') AS VARCHAR(100))
WHEN '' THEN CAST(ServerProperty('MachineName') AS VARCHAR(100))
ELSE CAST(ServerProperty('MachineName') AS VARCHAR(100)) + '\' + CAST(isnull(ServerProperty('InstanceName'), '') AS VARCHAR(100))
END
set @mailImporance = 'High'
set @subjectString = N'Weekly Database Change Log - ' + @strServerName + '\MyDatabase'
set @mailBody = N'Please find below the database changes from ' + cast(dateadd(wk, -1, getdate()) as varchar)+ N' and ' + cast(getdate() as varchar)
set @mailQuery = N'SELECT [PostTime], [DatabaseUser], [HostName], [Event], [Schema], [Object], [XmlEvent]
FROM MyDatabase.dbo.DatabaseLog WHERE DATEDIFF(wk, [PostTime], GETDATE()) <= 1'
set @resultAttachFlag = 1
set @resultAttachFileName = N'Database Change Log'
set @resultColSeperator = char(9)
set @resultWidth = 32767
--*************************************************************************************
if @DeBug = 1
begin
set @mailRecipients = N'Rajan.JOhn@MyCompany.com'
print @SubjectString + char(13) + @mailBody
execute msdb.dbo.sp_send_dbmail
@profile_name = 'Database Change Log Profile',
@recipients = @mailRecipients,
@subject = @SubjectString,
@body = @MailBody,
@body_format = 'TEXT',
@importance = @mailImporance,
@query = @mailQuery,
@attach_query_result_as_file = @resultAttachFlag,
@query_attachment_filename = @resultAttachFileName,
@query_result_separator = @resultColSeperator,
@query_result_width = @resultWidth,
@query_no_truncate = 0

end

Capacity Planning - Sizing Memory

Physical memory is one of the most important hardware components of a database server, because adequate physical memory decreases the physical IO on the server by letting SQL Server reuse the data pages and query plans that already exist in cache.

SQL Server Memory Requirements
SQL Server does dynamic memory allocation. This means SQL Server will allocate for its own use whatever memory on the server is not being used by anything else. Whenever an application or the operating system requires additional memory, SQL Server releases memory for their use.
Different components of SQL Server require physical memory for their functioning. The following sections describe the components within SQL Server which are the main users of physical memory.
Buffer Cache Requirements
The SQL Server buffer cache is the area of physical memory where data pages and index pages are cached. The buffer cache is the largest consumer of physical memory in SQL Server, and the dynamic allocation and de-allocation of memory by SQL Server is done mostly by adjusting the size of the buffer pool. The server should be provided with enough memory to ensure a very high cache hit ratio and to keep the amount of physical IO at a minimum; a decrease in the buffer cache size increases physical IO activity.
Procedure Cache Requirements
Procedure Cache is the area in physical memory where SQL Server caches the compiled execution plans for queries and stored procedures. This is the second largest consumer of physical memory in SQL Server. A healthy size of procedure cache is required to reduce the number of query compilations and optimizations. A decrease in the size of procedure cache increases recompilations which in turn increases processor overhead.
Connection Context Requirements
Each user connection to SQL Server requires a certain amount of memory for handling the user session - on average around 500KB per connection. Hence, the larger the set of concurrent users accessing the database, the larger the amount of memory required to support the connections.
Other Requirements
There are other components in SQL Server that require memory. These include the lock manager etc.
The details on SQL Server memory allocation can be obtained by the command “DBCC MEMORYSTATUS”
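Alongside DBCC MEMORYSTATUS, a rough sketch of summarizing the same information per memory clerk with the SQL Server 2005/2008 DMVs (the column names changed in later versions, so treat this as version-specific):

-- Memory consumed by each clerk, largest first (buffer pool, plan cache, lock manager, ...)
SELECT type,
       SUM(single_pages_kb + multi_pages_kb) AS total_kb
FROM sys.dm_os_memory_clerks
GROUP BY type
ORDER BY total_kb DESC;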
See the hardware comparison details for different SQL Server editions.

Sizing Memory through Load Testing
The best method to estimate the physical memory requirements for a database application infrastructure is through load testing.
We need to simulate the application load with the maximum possible number of concurrent users and transactions, and measure the memory consumption. Enough physical memory has to be allocated to avoid any memory bottleneck.
A decrease in memory affects the whole system: a memory deficiency increases cache misses and paging, paging increases physical IO, and all of these together increase processor utilization.

Sizing Memory through Workload Analysis
It will not always be possible to create a prototype of the infrastructure and run simulations. In such cases, memory sizing can be carried out by manually doing a workload analysis and calculating the memory requirements of the different components of the server infrastructure.
Consider, for example, a sales database application that experiences annual user account growth of 15% and sales data growth of 20%. The following table shows a rough estimate of the memory requirements for the procedure cache, user connections etc., and how that affects the buffer cache.





This is just a rough estimate, assuming that one user connection requires around 500KB and that the SQL Server and OS program components together require around 1GB. Similarly, we can do more precise calculations and project the memory requirements.
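The arithmetic behind such an estimate can be sketched as below; all the figures are assumptions taken from the example (500KB per connection, about 1GB for the OS and SQL Server program components, 8GB of physical memory):

DECLARE @totalMB INT, @programMB INT, @connections INT;
SET @totalMB     = 8192;   -- physical memory on the server, assumed 8GB
SET @programMB   = 1024;   -- OS + SQL Server program components, ~1GB
SET @connections = 2000;   -- expected concurrent connections

-- What is left over for the buffer cache and procedure cache
SELECT @totalMB - @programMB - (@connections * 500) / 1024 AS remaining_for_caches_mb;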

With this, I am concluding the current Capacity Planning series. I will be demonstrating a few tools, trends etc in some future posts.

Tuesday, May 6, 2008

Capacity Planning - Planning the Processor Subsystem

Planning the processor subsystem for a database infrastructure is a straightforward task. Normally, all high-end servers will already be using the best available processors. If the individual servers are already loaded with the maximum supported number of the best available processors, the way to increase processing power is to increase the number of servers. These two ways of increasing processing power are explained in the sections below.
Scale-up vs. Scale-out
Scale up is the process of increasing the processing power of an already existing server either by adding more processors or by increasing the processing power of the processors. Scale out is the process of increasing the processing power of a database infrastructure by increasing the number of servers so that the load can be distributed among the different servers. Both these approaches have got their benefits and downsides.
Scale up is normally considered a quick fix to resolve processor bottlenecks for database applications. With the arrival of hot-swappable CPUs and databases that support them, this approach requires little downtime and less planning overhead. The major advantage is that, since it does not increase the number of servers, there is no extra overhead in terms of maintenance, monitoring, licensing etc.
Though scale up can be done easily as a quick fix, it has its downsides. The major disadvantage is that there is a ceiling on the gain we can get by scaling up, because there are limits on the maximum number of processors supported by a particular operating system as well as by the RDBMS. Hence, scaling up cannot be considered a permanent solution, especially when the application keeps growing.
Scale out is normally considered a permanent solution for increasing processing power. The most important advantage of this approach is expandability: we can add more servers and make the application use the newly added servers.
Since scale out is a more permanent solution, it requires a lot of planning and overhead. The most important disadvantage is that the application has to be redesigned to make use of the multiple servers. This may not be an easy task for ongoing applications, and will be a costly affair. There are other disadvantages as well, including the increased overhead of extra servers for maintenance, monitoring, licensing etc.
We can decide whether to go for scale up or scale out depending upon the current situation of the application and where it is headed.
Factors for Considerations
While selecting the processors, there are a few factors to consider, along with how they affect the application environment.
Ø Processor Architecture
Nowadays, processors are available in 32bit and 64bit architectures. We can decide which architecture our infrastructure requires based upon the cost and scalability requirements of the application.
The other important factors to consider when selecting processors are the following. Since they are known to most people, I am not explaining each of them.
o Processor Cores
o L2 Cache
o Hyper threading

Tuesday, April 29, 2008

SQL Injection Attack

As I mentioned in one of my earlier posts, dynamic SQL is often adopted as an easy way of coding. Improper use of dynamic TSQL creates a lot of performance issues because of the way it is treated by the query engine, and not all dynamic TSQL queries can be cached. More than performance, though, it is important to avoid dynamic SQL as much as possible because of the security threat of SQL injection. Right now there is a mass SQL injection attack going on which has already affected a large number of web pages worldwide. As per F-Secure, the injected code searches all the text fields in the database and adds a JavaScript snippet so that the web pages serve the malicious links automatically. We can read the complete story here – http://www.f-secure.com/weblog/archives/00001427.html. We all know this is an exploitation of an available feature and not a flaw in IIS or SQL Server; the attack happened only because of bad coding practices. Let us adopt the best coding practices and make our applications secure. Performance is important, but I do feel that security is more important than performance.
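Where dynamic TSQL genuinely cannot be avoided, parameterizing it removes the injection vector. A minimal sketch (the table and variable names are made up):

DECLARE @sql NVARCHAR(MAX), @city VARCHAR(50);
SET @city = 'TRV';   -- imagine this value arriving from user input

-- The user value is passed as a parameter, never concatenated into the statement text
SET @sql = N'SELECT EmpFirstName, EmpLastName FROM dbo.MyEmp WHERE EmpLocation = @loc';
EXEC sp_executesql @sql, N'@loc VARCHAR(50)', @loc = @city;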

Monday, April 28, 2008

Capacity Planning - Planning the Disk Subsystem

Planning the disk subsystem for a database infrastructure is the process of deciding on a RAID strategy that satisfies both the throughput requirements and the storage requirements. Usually, the best practice guidelines call for five separate physical drive sets, each mapped to a logical drive, for the operating system, the transaction logs of the production database, the data files of the production database, TempDb and, lastly, the backup files. Before going through each of these, let us look at the standard RAID configurations and their characteristics, summarized below.

RAID level   Fault tolerance          Usable capacity     Physical IOs per logical IO
RAID0        None (striping only)     100%                1 read; 1 write
RAID1        Mirroring (2 disks)      50%                 1 read; 2 writes
RAID5        Striping with parity     (n-1)/n of total    1 read; 4 IOs per write
RAID10       Mirrored stripes         50%                 1 read; 2 writes
We can see that, although the fault-tolerant RAID configurations increase the number of physical IOs generated per logical IO, we can increase the number of physical disks so that the IOs get divided among them. In other words, we can add physical disks to increase throughput. However, there is a limit on the maximum throughput of an individual disk (for example around 300 IOps), and the overall throughput also depends on the throughput limitation of the IO bus.

Requirements for Different Components
Let us take the main components of a SQL Server instance and see how to design the disk layout for each. As mentioned above, the best practice guidelines call for five separate physical drive sets, one each for the operating system, the transaction logs of the production database, the data files of the production database, TempDb, and the backup files.
The operating system and the SQL Server program components don't have very high throughput or storage requirements; the main consideration here is recoverability. Given those requirements, we can assign a RAID1 mirror for the operating system and the SQL Server program components so that we can recover easily in case of a failure.
For the production database data files, there will be very large storage requirements and high throughput requirements. From the RAID characteristics above, we can easily make out that RAID1 cannot be used because it lacks high storage capacity, and RAID0 cannot be used because it offers no fault tolerance. Hence, the choice comes down to RAID5 and RAID10. Both offer good read performance, while RAID5 is a poor choice for heavy writes. Depending on whether the application is an OLTP or an OLAP application, we can choose between these two.
Meeting the Throughput Requirements
To explain the throughput calculations for a database application, let us assume that the client technical team insists that disk IO utilization should not exceed 85%. Consider that we have 300 IOps disks; this means each individual disk should not exceed 85% of its throughput, i.e. the maximum throughput of an individual disk should not exceed 255 IOps. Now, let us assume that during peak hours our application generates approximately 600 reads and 200 writes per second. Considering only the fault-tolerant RAID configurations, let us take each possible configuration and check whether it satisfies the requirements.
If we use RAID1, the per-disk throughput = (Reads + (2 * Writes))/2 = (600 + 400)/2 = 500 IOps. The maximum throughput we can afford for an individual physical disk is 255 IOps, while our application on RAID1 would need 500 IOps per disk, so RAID1 is not possible for the data files of our application.
If we use RAID5, the per-disk throughput = (Reads + (4 * Writes))/No. of Disks, so No. of Disks = (600 + 800)/255 = 1400/255 ~= 6. This means we require at least 6 disks to satisfy the throughput requirements if we adopt a RAID5 configuration for the data files.
If we use RAID10, the per-disk throughput = (Reads + (2 * Writes))/No. of Disks, so No. of Disks = (600 + 400)/255 ~= 4. This means we require only 4 disks to satisfy the throughput requirements with a RAID10 configuration. Note that there cannot be an odd number of disks in a RAID10 configuration; if a calculation came out to, say, 4.5 disks, we would need 6 disks to configure the RAID.
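The same arithmetic, written out as a small TSQL sketch so the figures above (600 reads/s, 200 writes/s, 255 IOps allowed per disk) are easy to re-run with different numbers:

DECLARE @reads INT, @writes INT, @maxIOpsPerDisk INT;
SET @reads = 600; SET @writes = 200; SET @maxIOpsPerDisk = 255;

SELECT (@reads + 2.0 * @writes) / 2                        AS raid1_io_per_disk,   -- 500, over the 255 limit
       CEILING((@reads + 4.0 * @writes) / @maxIOpsPerDisk) AS raid5_disks_needed,  -- 6
       CEILING((@reads + 2.0 * @writes) / @maxIOpsPerDisk) AS raid10_disks_needed; -- 4 (round up to an even count)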
Meeting the Storage Requirements
Once we have decided the throughput requirements and the number of disks required to satisfy them, we need to consider the storage requirements of the application. During the throughput calculation the focus was on the number of disks; in the storage calculation we consider the capacity of each disk. Here too, let us assume that the client insists that only 85% of the total disk capacity should be utilized at any time, i.e. there should always be at least 15% free space on the disk. If we expect our data files to have a total size of 115GB, we require total space of 115/0.85 ~= 136GB. Considering only the fault-tolerant RAID configurations as above, let us take each configuration and see how the 136GB requirement can be met.
With RAID1 (even though we have already ruled it out on throughput grounds), we can have only 2 disks, so each disk should have an individual capacity of 136/2 = 68GB. With RAID5, one physical disk's worth of space is used for parity; since the throughput calculation concluded that we need at least 6 disks, the minimum capacity of each disk = 136/5 ~= 28GB. With RAID10, the throughput calculation concluded that we need at least 4 disks, of which only 2 hold unique data because of the mirrors, so the minimum capacity of each disk = 136/2 ~= 68GB.
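And the storage side of the same example as a small sketch (115GB of data, at most 85% of capacity in use):

DECLARE @dataGB DECIMAL(10,2), @neededGB DECIMAL(10,2);
SET @dataGB   = 115;
SET @neededGB = @dataGB / 0.85;                                -- ~136GB of usable space

SELECT CEILING(@neededGB / 2) AS gb_per_disk_raid1_or_raid10,  -- 2 data-bearing disks -> ~68GB each
       CEILING(@neededGB / 5) AS gb_per_disk_raid5;            -- 6 disks, 5 carry data -> ~28GB each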

Default Trace in SQL Server 2005

There have always been a few questions about keeping the default trace enabled on a SQL Server 2005 production box. Personally, I feel the trace is a very handy tool with which you can find out things that have happened. This is especially true for detecting changes in development or testing environments where the DBAs don't have much control over the activities. That said, tracking down the culprit will not be very successful if shared logins are being used. (DON'T let people use shared logins; it will trouble you sooner or later.) I have found the default trace most beneficial in environments where there aren't many experienced DBAs and there isn't a strong or standard set of security measures or a monitoring infrastructure; there, the DBAs normally look back to see what happened, and only when someone has created a real mess. There have been some discussions on the performance impact of letting the default trace run in a production environment. I haven't seen much performance benefit in my environments from disabling the trace, but I didn't do any load testing to confirm this. There is a post from Linchi Shea that I found recently with some load test results. You can find the post at http://sqlblog.com/blogs/linchi_shea/archive/2007/01/16/performance-impact-of-enabling-page-checksum-and-default-trace.aspx?CommentPosted=true. I can't say that Linchi's testing is the final word; I recommend you do your own load testing based on your applications before coming to a conclusion.
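For reference, a quick way to check whether the default trace is running on an instance and where it is writing:

-- Is the default trace enabled?
SELECT name, value_in_use
FROM sys.configurations
WHERE name = 'default trace enabled';

-- Where is it writing?
SELECT id, path, max_size, is_rollover
FROM sys.traces
WHERE is_default = 1;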

Wednesday, April 23, 2008

Capacity Planning - Sizing a Production Database Server

The process of designing a database server infrastructure to cater to the needs of the application or business before it goes live into a production environment is known as sizing.
It requires a lot of skilled requirement collection and analysis to project the requirements of the application well before it is deployed. Following are a few generic techniques that are used to size a database server infrastructure.

§ Capacity Planning Interviews
One of the best techniques for understanding an application well before it is deployed is interaction with the stakeholders. The stakeholders can be anyone associated with the application: the business analysts who did the system requirements collection and analysis, the technical team who designed and developed the application, or the end users who will use the system. We need to interact with the various teams and collect as much information as possible regarding the application architecture and its usage. Following are the different aspects we need to understand through the interviews.

Ø Business Requirements
Business requirements can be collected by meeting and interacting with the business users and their leaders. We need to understand the nature of the application, who will be using it, how frequently the users will be using it, how critical it is, how many users will be using it concurrently, how complex its usage will be (for example whether there will be simultaneous data input and data analysis processes), whether additional employees will be joining who will use the application, etc.
The business users' requirements are the most important factor to understand while doing capacity planning. However well the application is built and maintained, ultimately it is the usage pattern that determines the performance and availability of the application.

Ø Technical Requirements
While the business requirements of the application can be obtained from the business people, the technical requirements should be collected from the technical people who represent the application. The technical requirements to be collected include database response time, application uptime, maintenance windows, expected performance benchmarks, data growth patterns etc. These requirements have to be collected and documented so that we can size an infrastructure that meets them.

Ø Technical Environment
Collecting information about the technical environment of the application comes into the picture in business models where we will not be hosting the infrastructure ourselves. In such cases, the environment is as important as the business requirements, because we need to understand the technical environment in which the database application is going to run. This includes the network infrastructure, the security infrastructure, how efficient the client's system support teams are, whether they have a proper storage infrastructure etc.
We also need to understand whether the company normally maintains a staging or implementation server along with the production server. In such a case, any change can be applied and tested in the staging environment before being applied to the production server.

§ Collecting Performance Data and Baselines
This is perhaps the most important step in capacity planning. Having proper baselines in place is one of the things people tend to forget or ignore when a new application is being deployed. Rather than collecting baselines, the main focus at that time tends to be on having the application satisfy all the functional requirements and go live in production.
It is important to have baselines in place because, in the absence of a proper set of baselines and workload analysis data, once the application goes live we will not know how it is expected to behave under various workloads and environments. Workload analysis can be defined as the process of collecting information on the expected behavior of the application under various workloads and environments. If a performance issue is reported on a production application, the first thing we can do is take the baseline for that workload and compare it with the current response. Thus we can easily detect the bottleneck and resolve it accordingly.
Workload analysis requires three main components. They are the prototype server infrastructure, the load simulating tools, and the workload monitoring tools. Each of these three components is explained below.

Ø Prototype Server Infrastructure
A prototype infrastructure needs to be in place that exactly matches the production environment where the application will be deployed. The details required to simulate the production infrastructure have to be obtained from the client during the requirements analysis phase, and a proper prototype has to be established on which the workload analysis can be done. There are numerous occasions where a matching prototype cannot be established due to budget constraints, but it doesn't make sense to compare results obtained from an environment that doesn't match the production environment. Any analysis or tuning should be based on matching environments only.
A few factors that have to be considered while matching the environments are the following –
Hardware configuration
Processor power
Memory configuration
Network infrastructure
Storage configuration
Operating Systems
Operating systems version and patch level
Server build
Application implementation
Tier configurations
Database Instance configurations
In well maintained production environments, there will be proper documentation of the infrastructure configuration. Any new infrastructure change, or the introduction of a new device or asset, will be based on a predefined plan and documentation. The documentation includes details such as the production standard for the operating system and its edition and version, the procedure to build and harden a production server, the standard hardware used for a typical production server, the storage configuration for a database server etc. This documentation can be used to replicate the environment for the prototype infrastructure.

Ø Workload Simulation
Once the prototype infrastructure is in place, the next step is to simulate the workload on the application. For a typical multi-user database application, we can take the help of the various tools available for workload simulation, for example LoadRunner, Erwin etc.
Simulation tools can be used to simulate various conditions on the application, on the hardware side as well as on the application usage side. Initially, we need to define a fixed set of environments to be simulated, and then define the hardware and software conditions that make up each environment. The response of the application and the infrastructure should be measured against each of these simulated environments; the environments and the responses to them make up our performance baseline data. Baselines can be defined as performance measurements gathered over specific periods of time with specific sampling intervals. We need to ensure that we define a large number of possible environments and that performance baseline data is collected against each of them.

Ø Response Monitoring & Analysis
The application's response to the various predefined environments should be monitored and documented. The main purpose of having the monitoring logs in place is to have official documentation for current and future reference. The logs provide an indisputable track record of the application's response across various environments. Any SLA or other agreement should use these track records as the reference for the response capability of the application and its ability to sustain various pressures and adverse environments.
An example of the load response results for an application is shown below.
ConcurrentUsers ProcessorUtilization
10,000 40%
20,000 50%
30,000 60%
40,000 70%
50,000 80%
From the figures, we can easily make out the relation between the number of concurrent users and processor load. The document shows the "fact" that if the total number of users is less than 10,000, we should not expect a processor utilization of 80% unless something is wrong somewhere. This gives the management the confidence to commit to the client on the response level of the application, and gives the technical team a baseline reference for finding processor-related application bottlenecks.
There are various tools that can be used for this kind of monitoring. One of the main ones is the built-in Windows Performance Monitor (Perfmon). Perfmon can be used to track the application's response through various software and hardware counters. The tracked response levels can be used to project the response and growth trend of the application with respect to time and load, and serve as the main tool for the post-configuration capacity planning process. The upcoming sections explain monitoring and analysis using such counters specific to physical memory, the processor subsystem, the disk subsystem etc.
Ø Projecting the Trends
Projection is one of the main processes involved in post-configuration capacity planning. This is explained here because it comes as a continuation and makes use of the monitor logs that we obtained from the load simulation environments.
Projection is the process of estimating the response level of the application with the help of the already existing load response logs. In the above example of users vs. processor utilization, we have information for a maximum of 50,000 concurrent users. Assume that the client plans to have 10,000 more analysts use the application. We can create the trend line from the existing information as follows.

The graph shows us that the additional users will increase the processor subsystem utilization to 90% of the total processor capacity. This information can help the stakeholders to plan and take decisions that help them effectively use the application for their business.

Linear Growth and Geometric Growth
In the above example, we assumed a direct relationship between the number of concurrent users and the processor utilization. Such a relationship can be termed a linear projection. That may not always be the case: there may be an increased amount of data loaded along with the additional users, which means there may be an exponential relationship between the users and the processor utilization. Consider the following example, which shows the data growth of the application over time.

Period Total Data
200701 100
200702 110
200703 121
200704 133
200705 146
200706 161
200707 177
200708 195
200709 214
200710 236
200711 259
We can see that the data size is increasing with a pattern of d + d*(10/100), where d is the previous period's data size. When we need to project the disk requirements for period 200712, we can easily calculate 259 + (259*0.1), which is approximately 285. Here, the growth is geometric. The projection will be as follows –
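The arithmetic behind the projection, as a small re-runnable sketch (using the 10% period-over-period growth from the table above; the figures are approximate):

DECLARE @lastKnown DECIMAL(10,1);
SET @lastKnown = 259;   -- total data for period 200711

SELECT CAST(@lastKnown * POWER(1.1e0, 1) AS DECIMAL(10,1)) AS projected_next_period,  -- ~285 (200712)
       CAST(@lastKnown * POWER(1.1e0, 2) AS DECIMAL(10,1)) AS projected_plus_two,     -- ~313
       CAST(@lastKnown * POWER(1.1e0, 3) AS DECIMAL(10,1)) AS projected_plus_three;   -- ~345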













§ Final Decision on the Production Server Infrastructure
The final decision on the database server infrastructure is based upon the performance data and baselines collected in the performance testing phase. When we share the performance test results with the client, the client shares the parameters that have to be satisfied by the application, such as the number of concurrent users, uptime requirements etc. Based upon those requirements, we can decide whether the hardware we have for the prototype is sufficient. In the above example, assuming the client technical team agrees that processor utilization at any time can be up to 50%, the processor subsystem of the prototype infrastructure can be sufficient as long as the number of analysts accessing the application stays below 30,000. If the client business team wants more than 30,000 concurrent users, or if the client technical team wants the maximum processor utilization kept below 40% or so, we need to re-engineer the processor subsystem to increase its strength. Otherwise, we can decide that the processor subsystem is sufficient for the time being.
The final decisions will not always be this easy; there will be a lot of parameters to consider. For example, the client may require that the application always hold the past two years' data. In such cases, the load simulation environments should account for the processor overhead caused by the additional data. In short, the simulated environments should be smart enough to reflect the forecasted environments for the specified period (two years, three years etc).

Ø Finalizing the SLAs
The SLA is a mutual agreement between the client and the solution provider stating that the application will satisfy the predefined requirements. The client technical team will insist on agreements such as 99.99% uptime, a round-trip response time of less than 1 second etc. The final decision on the infrastructure should be taken such that the SLAs can be met. Though the SLAs come as part of the technical requirement specifications, it is possible that business and budget reasons will force adjustments to them once the prototype testing is done. Until the SLAs are met on the prototype infrastructure, the application should be fine-tuned.

Capacity Planning - Introduction

Capacity planning for a database infrastructure can be defined as the process of assessing the infrastructure requirements for a database environment before deployment, and of projecting and predicting the future requirements of the infrastructure after the environment is deployed. Capacity planning involves various tests to measure how an environment should and will behave under the various factors that influence it, and planning the infrastructure with the necessary resources to achieve that response in a predefined manner.

§ Pre-configuration & Post-configuration Capacity Planning
As mentioned above, the entire capacity planning process can be broadly categorized into the planning and subsequent changes that we make before the environment is deployed, and the planning and changes that we make after the environment is deployed and in production. We can call these "pre-configuration capacity planning" and "post-configuration capacity planning". Pre-configuration capacity planning is often referred to as "sizing" the database infrastructure. Post-configuration capacity planning is an ongoing cyclic process of monitoring, projecting, planning and modifying. In short, capacity planning for a database infrastructure can be pictorially represented as below.


In each of the upcoming sections, we will discuss these processes in detail, and how to carry out each of them for the various hardware and other requirements.

Capacity Planning - Series Start

I was asked by my department to take a session on how to do capacity planning for a database infrastructure. Since the audience includes data services professionals across a lot of platforms and communities, I tried to make the session a very generic one. I will be writing a series of posts to cover everything I presented in the session. I know there will be a few instances where things are explained with the help of SQL Server (as that is by far my expertise); forgive me for that :-)

Wednesday, April 16, 2008

Scripting SQL Authentication Logins

Last week, one of my development DBA friends reported an issue. We all know that one of the development DBA tasks involved in developing a database is to keep the databases in the development, QA, staging etc environments synchronized. Here, he has to ship the development database to the client's staging server. The way he does the syncing is by scripting all the objects and running the scripts at the destination (this is something the client insisted on). Anyway, he found an issue: when he scripted the SQL authentication login, the password in the generated script was a random one, so when he ran the script at the destination he got a different password! He had not seen the issue earlier because he had created the login manually on the dev and QA machines. We thought of creating the login manually, but that would make the client uneasy because the client wants everything as a script. We discussed it and found a solution: there are a couple of Microsoft-recommended stored procedures that script out logins with their hashed passwords. This is the link to the article - http://support.microsoft.com/kb/918992
Following is the workaround we used to recreate the SQL authentication logins. (Assume that the name of the SQL authentication login we are using for the application is "AppLogin".) We ran the scripts from the article on a SQL Server 2005 machine that has a SQL authentication login named "AppLogin", created with the standard password we want to use on our production, QA and development instances. The script created two stored procedures - sp_hexadecimal and sp_help_revlogin. We then executed sp_help_revlogin as follows:
EXEC sp_help_revlogin @login_name = 'AppLogin'
This generated the login creation script as follows:
CREATE LOGIN [AppLogin] WITH PASSWORD = 0x010031D72F18EC95B59D6B3682E46BFD7A6CD70B34F8783ECEBB HASHED, SID = 0x2B94B0D0C3BCCA4DA04C916FA3A4B3FD, DEFAULT_DATABASE = [master], CHECK_POLICY = OFF, CHECK_EXPIRATION = OFF
Note the "HASHED" keyword. It indicates that the hexadecimal value being passed is a password hash; when the login is created from this script, SQL Server stores that hash, so the login ends up with the same password without it ever appearing in the script in clear text. Once we generated the script with these stored procedures, we included it in our migration package so that a login named "AppLogin" with the standard password is created at the destination without the password being hard coded. Just note that the SID for the login is hard coded; if you want, you can remove the "SID = 0x2B94B0D0C3BCCA4DA04C916FA3A4B3FD" clause to ensure there will not be a conflict with another login that has the same SID. I am not sure whether differences in service pack level will generate a different hashed value; we can probably test this and confirm.

Saturday, April 12, 2008

Dynamic TSQL vs. CASE

One of the major purposes I have seen people use dynamic SQL for is dynamic sorting. This means the user should have the option to sort a data set by any column he desires and in the order (either ascending or descending) of his choice. I have seen many developers use EXEC to solve this, and in turn end up with solutions that are complicated and don't get cached. I will cover EXEC vs. sp_executesql in another post.

Following is an example of how you can use CASE to replace dynamic SQL. I sent the following script as an example to one of my friends; the intention was to have a CASE-based solution for a dynamic sort.

use Tempdb
go
create table MyEmp(
EmpFirstName varchar(25), EmpLastName varchar(25), EmpLocation varchar(25)
)
go

insert into MyEmp values ('TOM', 'JOHN', 'COK')
insert into MyEmp values ('MOHIT', 'MATHEW', 'TRV')
insert into MyEmp values ('JERRY', 'EAPEN', 'TRV')
insert into MyEmp values ('PETER', 'DANIEL', 'TRV')
insert into MyEmp values ('BIJU', 'KURIAN', 'BOM')
insert into MyEmp values ('JOHN', 'MATHEW', 'BOM')
insert into MyEmp values ('TOM', 'PAUL', 'BOM')
insert into MyEmp values ('TOM', 'DANIEL', 'MAA')
insert into MyEmp values ('TOM', 'EAPEN', 'MAA')
go

if exists(select name from sys.objects where type = 'P' and name = 'SortTest')
DROP PROC SortTest
go
create procedure SortTest @SortBy varchar(25), @SortDir varchar(4)
as
begin
select EmpFirstName, EmpLastname, EmpLocation from MyEmp
order by
case @SortDir
when 'ASC' then
case @SortBy
when 'EmpFirstName' then EmpFirstName
when 'EmpLastName' then EmpLastName
when 'EmpLocation' then EmpLocation
else EmpFirstName
end
end ASC,
case @SortDir
when 'DESC' then
case @SortBy
when 'EmpFirstName' then EmpFirstName
when 'EmpLastName' then EmpLastName
when 'EmpLocation' then EmpLocation
else EmpFirstName
end
end
DESC
End
go

EXEC SortTest @SortBy = 'EmpLocation', @SortDir = 'DESC'
go

It gave me the result set as below.

MOHIT MATHEW TRV
JERRY EAPEN TRV
PETER DANIEL TRV
TOM DANIEL MAA
TOM EAPEN MAA
TOM JOHN COK
BIJU KURIAN BOM
JOHN MATHEW BOM
TOM PAUL BOM

Now he came back to me saying that in his application he was getting errors like the following -
"Msg 8115, Level 16, State 2, Procedure sspGetExceptionReportListn, Line 40
Arithmetic overflow error converting expression to data type smalldatetime."
My bad, I didn't think about the data types: all branches of a CASE expression are converted to the single data type with the highest precedence among them, so as soon as a date column is mixed with varchar columns in the same CASE, the character values get implicitly converted to the date type and the conversion fails. I rewrote the example and sent it as follows.
create table EmpDetails
(EmpId int, EmpFirstName varchar(25), EmpLastName varchar(25), EmpJoiningDate datetime)
GO
INSERT INTO EmpDetails VALUES(1, 'TOM', 'JOHN', '2007-08-01 00:00:00')
INSERT INTO EmpDetails VALUES(2, 'MATHEW', 'JOHN', '2007-08-01 00:00:00')
INSERT INTO EmpDetails VALUES(3, 'SYRIAC', 'JOHN', '2007-10-01 00:00:00')
INSERT INTO EmpDetails VALUES(4, 'TOM', 'CHANDY', '2006-07-01 00:00:00')
INSERT INTO EmpDetails VALUES(5, 'IYPE', 'JOHN', '2005-05-01 00:00:00')
INSERT INTO EmpDetails VALUES(6, 'TOM', 'JOSEPH', '2006-09-01 00:00:00')
INSERT INTO EmpDetails VALUES(7, 'LOPUS', 'ALEXANDER', '2001-01-01 00:00:00')
INSERT INTO EmpDetails VALUES(8, 'JIM', 'MATHEW', '2007-08-01 00:00:00')
INSERT INTO EmpDetails VALUES(9, 'ANIL', 'MATHEW', '2007-08-01 00:00:00')
INSERT INTO EmpDetails VALUES(10, 'JOSEPH', 'THOMAS', '2007-08-01 00:00:00')
INSERT INTO EmpDetails VALUES(10, 'ANTONY', 'GOMAS', '2007-08-01 00:00:00')
INSERT INTO EmpDetails VALUES(10, 'CHRISTOPHER', 'THOMAS', '2007-08-01 00:00:00')
GO

IF EXISTS(SELECT NAME FROM SYS.OBJECTS WHERE TYPE = 'P' AND NAME = 'SortEmpDetails')
DROP PROC SortEmpDetails
GO
CREATE PROCEDURE SortEmpDetails @SortColumn varchar(25), @SortDirection varchar(4) AS
SELECT EmpId, EmpFirstName, EmpLastName, EmpJoiningDate FROM EmpDetails
ORDER BY
CASE @SortDirection
WHEN 'DESC' THEN
CASE @SortColumn
WHEN 'EmpId' THEN CAST(EmpId AS INT)
WHEN 'EmpFirstName' THEN EmpFirstName
WHEN 'EmpLastName' THEN EmpLastName
WHEN 'EmpJoiningDate' THEN CAST(EmpJoiningDate AS DATETIME)
END
END DESC,
CASE @SortDirection
WHEN 'ASC' THEN
CASE @SortColumn
WHEN 'EmpId' THEN CAST(EmpId AS INT)
WHEN 'EmpFirstName' THEN EmpFirstName
WHEN 'EmpLastName' THEN EmpLastName
WHEN 'EmpJoiningDate' THEN CAST(EmpJoiningDate AS DATETIME)
END
END ASC
GO

EXEC SortEmpDetails @SortColumn = 'EmpJoiningDate', @SortDirection = 'DESC'
I got a result set as follows –
3 SYRIAC JOHN 10/1/07 12:00 AM
1 TOM JOHN 8/1/07 12:00 AM
2 MATHEW JOHN 8/1/07 12:00 AM
8 JIM MATHEW 8/1/07 12:00 AM
9 ANIL MATHEW 8/1/07 12:00 AM
10 JOSEPH THOMAS 8/1/07 12:00 AM
10 ANTONY GOMAS 8/1/07 12:00 AM
10 CHRISTOPHER THOMAS 8/1/07 12:00 AM
6 TOM JOSEPH 9/1/06 12:00 AM
4 TOM CHANDY 7/1/06 12:00 AM
5 IYPE JOHN 5/1/05 12:00 AM
7 LOPUS ALEXANDER 1/1/01 12:00 AM
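
One thing worth keeping in mind with this pattern is that all branches of a single CASE are still converted to one data type (datetime here, as it has the highest precedence), so sorting by one of the character columns through the same CASE can still raise a conversion error. A variant that sidesteps this entirely is to give each sortable column its own CASE expression in the ORDER BY; a minimal sketch against the EmpDetails table above (the procedure name is just illustrative):

CREATE PROCEDURE SortEmpDetails2 @SortColumn varchar(25), @SortDirection varchar(4) AS
SELECT EmpId, EmpFirstName, EmpLastName, EmpJoiningDate FROM EmpDetails
ORDER BY
-- one CASE per column and direction, so no branch ever mixes data types
CASE WHEN @SortColumn = 'EmpId' AND @SortDirection = 'ASC' THEN EmpId END ASC,
CASE WHEN @SortColumn = 'EmpId' AND @SortDirection = 'DESC' THEN EmpId END DESC,
CASE WHEN @SortColumn = 'EmpFirstName' AND @SortDirection = 'ASC' THEN EmpFirstName END ASC,
CASE WHEN @SortColumn = 'EmpFirstName' AND @SortDirection = 'DESC' THEN EmpFirstName END DESC,
CASE WHEN @SortColumn = 'EmpLastName' AND @SortDirection = 'ASC' THEN EmpLastName END ASC,
CASE WHEN @SortColumn = 'EmpLastName' AND @SortDirection = 'DESC' THEN EmpLastName END DESC,
CASE WHEN @SortColumn = 'EmpJoiningDate' AND @SortDirection = 'ASC' THEN EmpJoiningDate END ASC,
CASE WHEN @SortColumn = 'EmpJoiningDate' AND @SortDirection = 'DESC' THEN EmpJoiningDate END DESC
GO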

Coming back to the CAST-based version: the next obvious question was the cost of the cast, because nothing comes for free. I did a small test on how costly the cast can become. The results showed that the cast can be costly when the column being cast has a clustered index, because the cast throws away the advantage of reading the index pages in order: the engine scans the index, applies the cast and then performs an explicit sort to produce the final result. The test used a dummy table with 55,000 records. Following were the readings -

Column - Without Cast - With Cast
Column with Clustered Index - 0.22 milliseconds - 4.18 milliseconds
Column with Non-Clustered Index - 4.17 milliseconds - 4.18 milliseconds

So casting and then sorting on a clustered index column reduces performance, while a nonclustered index column is not affected much. In any case, the overall impact depends on how large the table is.
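If you want to reproduce this kind of measurement yourself, a rough sketch is below; the table and column names are hypothetical, and the elapsed times reported by SET STATISTICS TIME (or the actual execution plan) can be compared between the two forms.

SET STATISTICS TIME ON

-- sorting directly on the clustered key can be satisfied by an ordered scan of the clustered index
SELECT * FROM dbo.BigTable ORDER BY KeyColumn

-- sorting on an expression over the key forces a scan followed by an explicit sort operator
SELECT * FROM dbo.BigTable ORDER BY CAST(KeyColumn AS varchar(30))

SET STATISTICS TIME OFF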

Friday, April 11, 2008

SQL Server 2005 Object Level Permissions

A few weeks back I was helping one of my friends extract the object level permissions granted to a bunch of developers. The DBA team wanted an initial list before restructuring the existing security. I wrote a script to do this; please find it below. It is a fairly raw one, so feel free to suggest modifications, or modify and reuse it yourself.


/*
SCRIPT TO FIND THE OBJECT LEVEL PERMISSIONS PROVIDED TO SPECIFIC LOGINS.
** THIS DOES NOT INCLUDE THE "ROLE BASED" PERMISSIONS SUCH AS PROVIDING THE "db_datareader" ROLE FOR A SPECIFIC LOGIN
** THE GRANTEDBY VALUE WILL ALWAYS RETAIN "dbo" IF THE PERMISSIONS WERE GRANTED BY SOMEONE WHO HAS A "sysadmin" ROLE
** TO BE USED WITH SQL SERVER 2005
Written By: RAJAN JOHN
*/
SET QUOTED_IDENTIFIER ON
GO
SET NOCOUNT ON
GO
SELECT dpm.class_desc as [AccessLevel],
ISNULL(OBJECT_NAME(dpm.major_id), 'N/A')
as [ObjectName], dpr1.name AS [GrantedTo], dpr2.name AS [GrantedBy],
CASE dpm.type
WHEN 'AL' THEN 'ALTER'
WHEN 'ALAK' THEN 'ALTER ANY ASYMMETRIC KEY'
WHEN 'ALAR' THEN 'ALTER ANY APPLICATION ROLE'
WHEN 'ALAS' THEN 'ALTER ANY ASSEMBLY'
WHEN 'ALCF' THEN 'ALTER ANY CERTIFICATE'
WHEN 'ALDS' THEN 'ALTER ANY DATASPACE'
WHEN 'ALED' THEN 'ALTER ANY DATABASE EVENT NOTIFICATION'
WHEN 'ALFT' THEN 'ALTER ANY FULLTEXT CATALOG'
WHEN 'ALMT' THEN 'ALTER ANY MESSAGE TYPE'
WHEN 'ALRL' THEN 'ALTER ANY ROLE'
WHEN 'ALRT' THEN 'ALTER ANY ROUTE'
WHEN 'ALSB' THEN 'ALTER ANY REMOTE SERVICE BINDING'
WHEN 'ALSC' THEN 'ALTER ANY CONTRACT'
WHEN 'ALSK' THEN 'ALTER ANY SYMMETRIC KEY'
WHEN 'ALSM' THEN 'ALTER ANY SCHEMA'
WHEN 'ALSV' THEN 'ALTER ANY SERVICE'
WHEN 'ALTG' THEN 'ALTER ANY DATABASE DDL TRIGGER'
WHEN 'ALUS' THEN 'ALTER ANY USER'
WHEN 'AUTH' THEN 'AUTHENTICATE'
WHEN 'BADB' THEN 'BACKUP DATABASE'
WHEN 'BALO' THEN 'BACKUP LOG'
WHEN 'CL' THEN 'CONTROL'
WHEN 'CO' THEN 'CONNECT'
WHEN 'CORP' THEN 'CONNECT REPLICATION'
WHEN 'CP' THEN 'CHECKPOINT'
WHEN 'CRAG' THEN 'CREATE AGGREGATE'
WHEN 'CRAK' THEN 'CREATE ASYMMETRIC KEY'
WHEN 'CRAS' THEN 'CREATE ASSEMBLY'
WHEN 'CRCF' THEN 'CREATE CERTIFICATE'
WHEN 'CRDB' THEN 'CREATE DATABASE'
WHEN 'CRDF' THEN 'CREATE DEFAULT'
WHEN 'CRED' THEN 'CREATE DATABASE DDL EVENT NOTIFICATION'
WHEN 'CRFN' THEN 'CREATE FUNCTION'
WHEN 'CRFT' THEN 'CREATE FULLTEXT CATALOG'
WHEN 'CRMT' THEN 'CREATE MESSAGE TYPE'
WHEN 'CRPR' THEN 'CREATE PROCEDURE'
WHEN 'CRQU' THEN 'CREATE QUEUE'
WHEN 'CRRL' THEN 'CREATE ROLE'
WHEN 'CRRT' THEN 'CREATE ROUTE'
WHEN 'CRRU' THEN 'CREATE RULE'
WHEN 'CRSB' THEN 'CREATE REMOTE SERVICE BINDING'
WHEN 'CRSC' THEN 'CREATE CONTRACT'
WHEN 'CRSK' THEN 'CREATE SYMMETRIC KEY'
WHEN 'CRSM' THEN 'CREATE SCHEMA'
WHEN 'CRSN' THEN 'CREATE SYNONYM'
WHEN 'CRSV' THEN 'CREATE SERVICE'
WHEN 'CRTB' THEN 'CREATE TABLE'
WHEN 'CRTY' THEN 'CREATE TYPE'
WHEN 'CRVW' THEN 'CREATE VIEW'
WHEN 'CRXS' THEN 'CREATE XML SCHEMA COLLECTION'
WHEN 'DL' THEN 'DELETE'
WHEN 'EX' THEN 'EXECUTE'
WHEN 'IM' THEN 'IMPERSONATE'
WHEN 'IN' THEN 'INSERT'
WHEN 'RC' THEN 'RECEIVE'
WHEN 'RF' THEN 'REFERENCES'
WHEN 'SL' THEN 'SELECT'
WHEN 'SN' THEN 'SEND'
WHEN 'SPLN' THEN 'SHOWPLAN'
WHEN 'SUQN' THEN 'SUBSCRIBE QUERY NOTIFICATIONS'
WHEN 'TO' THEN 'TAKE OWNERSHIP'
WHEN 'UP' THEN 'UPDATE'
WHEN 'VW' THEN 'VIEW DEFINITION'
WHEN 'VWDS' THEN 'VIEW DATABASE STATE'
END AS [Permission]
FROM sys.database_permissions dpm INNER JOIN sys.database_principals dpr1
ON dpm.grantee_principal_id = dpr1.principal_id INNER JOIN sys.database_principals dpr2
ON dpm.grantor_principal_id = dpr2.principal_id
WHERE dpr1.principal_id NOT IN (0,1,2,3,4,16384,16385,16386,16387,16388,16389,16390,16391,16392,16393)
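
To see what the output looks like, you can create a throwaway database principal and grant it something before running the script; the names below are purely illustrative (dbo.MyEmp is the table from the dynamic-sort example earlier), and the sample output rows are only approximate.

-- illustrative only: a test user with a couple of object level permissions
CREATE USER DevUser1 WITHOUT LOGIN
GO
GRANT SELECT, UPDATE ON dbo.MyEmp TO DevUser1
GO
-- running the script above in the same database should now return rows along the lines of:
-- OBJECT_OR_COLUMN   MyEmp   DevUser1   dbo   SELECT
-- OBJECT_OR_COLUMN   MyEmp   DevUser1   dbo   UPDATE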

Thursday, April 10, 2008

Restoring Reporting Services 2005 Databases

We have applications that use Reporting Services 2005 as the core report engine. Though the backup/restore process for the Reporting Services databases wasn't an issue during the initial application development phase, once we started deploying the application for various clients it became obvious that we needed a formal procedure for moving the databases between environments without breaking anything. I wrote a procedure to facilitate this, and I am sharing it here in case it helps you.

Reporting Services 2005 uses two databases to store its information and as a workspace for processing reports. For a default installation they are named "ReportServer" and "ReportServerTempDB". It becomes necessary at times to copy these databases to recreate the report engine of an application on another instance of Reporting Services, mostly for testing and analysis purposes. The following procedure explains the steps to follow while copying the Reporting Services databases from one instance of Reporting Services to another.

IMPORTANT: Ensure that both the source and destination Reporting Services instances are of the same version and edition before proceeding with the changes.
Execute the following steps in order to copy the Reporting Services databases from one instance to another.

§ Backup the Databases – Source Instance
1. Open "SQL Server Management Studio" from "Start" --> "Programs" --> "Microsoft SQL Server 2005" --> "SQL Server Management Studio" and connect to the Database Engine with the appropriate credentials.
2. Expand the "Databases" node and select the "ReportServer" database. Right-click it and select "Tasks" --> "Backup". This will open the "Backup Database" tool.
3. In the "General" page do the following:
· In the "Source" section, select "ReportServer" as the database and "Full" as the backup type.
· In the "Backup Set" section, ensure that "Backup set will expire" is set to "0 days".
· In the "Destination" section, select the radio button for "Disk" and click the "Add" button. Browse to "E:\SigTools" (or any other standard backups location) and set the file name to "ReportServer_<Date>", where <Date> is the current date.
4. Click the "OK" button to create the backup set.
5. Repeat steps 2 to 4 to back up "ReportServerTempDB" and save the backup set as "ReportServerTempDB_<Date>", where <Date> is the current date. (A T-SQL alternative to these GUI steps is sketched just after this list.)
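A minimal T-SQL sketch of the same backups; the path and date suffix are just the illustrative values used in this post.

BACKUP DATABASE ReportServer
TO DISK = 'E:\SigTools\ReportServer_20080410.bak'
WITH INIT, NAME = 'ReportServer full backup'
GO
BACKUP DATABASE ReportServerTempDB
TO DISK = 'E:\SigTools\ReportServerTempDB_20080410.bak'
WITH INIT, NAME = 'ReportServerTempDB full backup'
GO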
§ Backup the Encryption Key – Source Instance
Follow the procedure below to back up the encryption key that connects the Report Server to the Reporting Services databases.
1. Open the "Reporting Services Configuration" tool from "Start" --> "Programs" --> "Microsoft SQL Server 2005" --> "Configuration Tools" --> "Reporting Services Configuration" and connect to the appropriate Reporting Services instance.
2. Select the "Encryption Keys" link to open the "Encryption Key" page and do the following.
· Click the "Backup" button to open the "Encryption Key Information" box.
· In the "Password" box, enter the standard complex password, and in the "Key File" box, browse to "E:\SigTools" (or any other standard backups location) and set the file name to "EncKey_<Date>", where <Date> is the current date.
· Click the "OK" button to back up the key. Ensure that the "Task Status" pane shows no error messages. (A command-line alternative is sketched just after this list.)
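The key can also be extracted from the command line with the rskeymgmt utility that ships with Reporting Services; a rough sketch, assuming a default instance and the illustrative path and date used above (replace the password with your own):

rskeymgmt -e -f "E:\SigTools\EncKey_20080410.snk" -p YourStrongPassword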

§ Copying the Backups to the Destination Instance
Physically copy the backup sets to the destination server so as to make them available to the destination instance. If the backups were taken to a network share that both the source and destination instances have access to, this step can be skipped.

§ Backup the Databases – Destination Instance
1. Open "SQL Server Management Studio" from "Start" --> "Programs" --> "Microsoft SQL Server 2005" --> "SQL Server Management Studio" and connect to the Database Engine with the appropriate credentials.
2. Expand the "Databases" node and select the "ReportServer" database. Right-click it and select "Tasks" --> "Backup". This will open the "Backup Database" tool. In the "General" page do the following:
· In the "Source" section, select "ReportServer" as the database and "Full" as the backup type.
· In the "Backup Set" section, ensure that "Backup set will expire" is set to "0 days".
· In the "Destination" section, select the radio button for "Disk" and click the "Add" button. Browse to "E:\SigTools" (or any other standard backups location) and set the file name to "ReportServer_<Date>", where <Date> is the current date.
· Click the "OK" button to create the backup set.
3. Repeat step 2 to back up "ReportServerTempDB" and save the backup set as "ReportServerTempDB_<Date>", where <Date> is the current date.

§ Backup the Encryption Key – Destination Instance
Follow the procedure below to back up the encryption key that connects the Report Server to the Reporting Services databases.
1. Open the "Reporting Services Configuration" tool from "Start" --> "Programs" --> "Microsoft SQL Server 2005" --> "Configuration Tools" --> "Reporting Services Configuration" and connect to the appropriate Reporting Services instance.
2. Select the "Encryption Keys" link to open the "Encryption Key" page and do the following.
· Click the "Backup" button to open the "Encryption Key Information" box.
· In the "Password" box, enter the standard complex password, and in the "Key File" box, browse to "E:\SigTools" (or any other standard backups location) and set the file name to "EncKey_<Date>", where <Date> is the current date.
· Click the "OK" button to back up the key. Ensure that the "Task Status" pane shows no errors.

§ Stopping IIS and Report Server Services
1. Open the "Services" MMC console from "Start" --> "Programs" --> "Administrative Tools" --> "Services".
2. Select the "IISAdmin" service and stop it. A dialog box will ask to stop all IIS dependent services such as "WWW Publishing", "SMTP", "FTP Publishing" etc. Click "Yes" to continue. This stops IIS and its dependent services.
3. Open the "Reporting Services Configuration" tool from "Start" --> "Programs" --> "Microsoft SQL Server 2005" --> "Configuration Tools" --> "Reporting Services Configuration". On the "Server Status" page, click the "Stop" button to stop the Report Server.

§ Restoring the Databases – Destination Instance
1. Open "SQL Server Management Studio" from "Start" --> "Programs" --> "Microsoft SQL Server 2005" --> "SQL Server Management Studio" and connect to the Database Engine with the appropriate credentials.
2. Expand the "Databases" node and select the "ReportServer" database. Right-click it and select "Tasks" --> "Restore" --> "Database". This will open the "Restore Database" tool.
3. Select the "General" page and do the following:
· In the "Destination for restore" section, select "ReportServer" as the database.
· In the "Source for restore" section, select "From device" and click the browse button. Select "File" as the backup media type, click the "Add" button, select the backup set for the ReportServer database and click the "OK" button.
· In the "Backup sets to restore" section, select the appropriate backup set.
4. Select the "Options" page and do the following:
· In the "Restore options" section, ensure that the correct file paths are provided for the data files and log files.
5. Click the "OK" button to start the database restore process.
6. Ensure that the restore completed successfully.
7. Perform steps 2 to 6 to restore the ReportServerTempDB database. (A T-SQL equivalent of the restore is sketched just after this list.)
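A minimal T-SQL sketch of the same restore; the paths and date suffix are the illustrative values used above, and WITH REPLACE overwrites the databases already on the destination.

RESTORE DATABASE ReportServer
FROM DISK = 'E:\SigTools\ReportServer_20080410.bak'
WITH REPLACE
GO
RESTORE DATABASE ReportServerTempDB
FROM DISK = 'E:\SigTools\ReportServerTempDB_20080410.bak'
WITH REPLACE
GO
-- if the data/log file locations differ between servers, list the logical file names with
-- RESTORE FILELISTONLY FROM DISK = '...' and add the corresponding MOVE clauses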

§ Starting IIS and Report Server Services
1. Open the "Services" MMC console from "Start" --> "Programs" --> "Administrative Tools" --> "Services".
2. Select the "IISAdmin" service and start it. Start all the IIS dependent services such as "WWW Publishing", "SMTP", "FTP Publishing" etc. that were stopped when the "IISAdmin" service was stopped.
3. Open the "Reporting Services Configuration" tool from "Start" --> "Programs" --> "Microsoft SQL Server 2005" --> "Configuration Tools" --> "Reporting Services Configuration". On the "Server Status" page, click the "Start" button to start the Report Server.

§ Verifying the Reporting Services Configurations
Follow the procedure below to verify that the Reporting Services configurations on the source and destination instances match.
1. Open the "Reporting Services Configuration" tool for the source and destination instances from "Start" --> "Programs" --> "Microsoft SQL Server 2005" --> "Configuration Tools" --> "Reporting Services Configuration" and connect to the appropriate Reporting Services instances.
2. Verify that the configurations match exactly on both instances. This includes the following.
· Report Server virtual directory
· Report Manager virtual directory
· Windows Service Identity
· Web Service Identity
3. Click the “Apply” button to save the changes.

§ Restoring the Encryption Key – Destination Instance
Follow the procedure below to restore the encryption key backed up from the source server.
1. Open the "Reporting Services Configuration" tool for the destination instance from "Start" --> "Programs" --> "Microsoft SQL Server 2005" --> "Configuration Tools" --> "Reporting Services Configuration" and connect to the appropriate Reporting Services instance.
2. Open the "Encryption Keys" page and click the "Delete" button to delete the encryption key already existing on the destination.
NOTE: Ensure that you have backed up the encryption key before you perform any deletions.
3. Click the "Restore" button to open the "Encryption Key Information" box. Provide the password and click the browse button to locate the key file. Click the "OK" button to perform the restoration.
4. Ensure that there are no error messages in the "Task Status" pane.
5. Open the "SQL Server Configuration Manager" from "Start" --> "Programs" --> "Microsoft SQL Server 2005" --> "Configuration Tools" --> "SQL Server Configuration Manager" and restart the "SQL Server Reporting Services" service. (A command-line alternative for applying the key is sketched just after this list.)
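The backed-up key can also be applied from the command line with the rskeymgmt utility; a rough sketch, again assuming a default instance and the illustrative file name used earlier (replace the password with your own):

rskeymgmt -a -f "E:\SigTools\EncKey_20080410.snk" -p YourStrongPassword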

IMPORTANT: It is recommended that the destination server is rebooted after the changes.

Using Database Mail in SQL Server 2005

SQL Server 2005 Database Mail is an easy-to-use feature that can be called by any application or code to send notification emails. It is handy for DBAs monitoring jobs and batches across hundreds of database servers and instances that run a large number of scheduled jobs.
Database Mail is used to send emails from the SQL Server engine to a set of recipients. We can use the sp_send_dbmail stored procedure in the msdb database to send SMTP notification emails.

Using the sp_send_dbmail Stored Procedure
The stored procedure msdb.dbo.sp_send_dbmail can be called from the application whenever a notification email has to be sent. The only prerequisite is that the SQL Server instance sending the database mail must have an email profile created. The profile contains the required information such as the sender email address, reply address, SMTP server name, SMTP port number etc. The section "Configuring the Email Profile" below explains how to configure an email profile.
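Depending on how the instance was set up, the Database Mail feature itself may also need to be enabled first; a quick sketch using sp_configure (the Surface Area Configuration tool can do the same thing):

EXEC sp_configure 'show advanced options', 1
RECONFIGURE
GO
EXEC sp_configure 'Database Mail XPs', 1
RECONFIGURE
GO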
Following is the basic syntax for sp_send_dbmail (the angle-bracket values are placeholders):
EXECUTE msdb.dbo.sp_send_dbmail
@profile_name = '<profile name>',
@recipients = '<recipient list>',
@subject = '<subject>',
@body = '<message body>',
@importance = '<importance>',
@body_format = '<body format>'

@profile_name - The name of the mail profile to be used.
@recipients - The list of recipient email addresses, delimited by semicolons.
@subject - The subject of the email.
@body - The body of the email.
@importance - Either 'low', 'normal' or 'high'.
@body_format - Either 'TEXT' or 'HTML'.

Example -
EXECUTE msdb.dbo.sp_send_dbmail
@profile_name = 'Database Mail Profile',
@recipients = 'abc@myComp.com; xyz@myComp.com',
@subject = 'Backup Completed',
@body = 'AppData database backed up successfully!',
@importance = 'high',
@body_format = 'TEXT'
You can also assign values to variables dynamically and pass the variables as the parameter values, as in the sketch below.
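A small sketch along those lines, reusing the illustrative profile and recipient from the example above and building the body at run time:

DECLARE @body nvarchar(max)
SET @body = 'Backup of ' + DB_NAME() + ' completed at '
          + CONVERT(varchar(30), GETDATE(), 120) + '.'

EXECUTE msdb.dbo.sp_send_dbmail
@profile_name = 'Database Mail Profile',
@recipients = 'abc@myComp.com',
@subject = 'Backup Completed',
@body = @body,
@body_format = 'TEXT'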
Configuring the Email Profile
Follow the procedure below to configure SQL Server 2005 Database Mail.
§ Creating the Database Mail Profile
· Connect to the SQL Server instance for which Database Mail has to be configured using SQL Server Management Studio.
· Open the "Database Mail Configuration Wizard" from "Management" --> "Database Mail" --> "Configure Database Mail".
· If you are using Database Mail for the first time, select "Setup Database Mail" and click the "Next" button to proceed.
· In the "New Profile" window, do the following.
o In the “Profile Name” textbox, give the appropriate profile name. E.g. “TCO Database Profile”
o In the "Description" textbox, give the purpose of the profile you are creating.
o Under the “SMTP Accounts” section, click the “Add” button to add a new account. This will open up the window “Add a new Account to TCO Database Profile”. Click the “New Account” button.
1. In the “Account Name” textbox, enter the appropriate account name. E.g. “TCO Notification”
2. Give the purpose of the account in the “Description” textbox.
3. In the "Email Address" textbox, choose an appropriate email address. Database Mail will show this address as the sender. The address need NOT be a valid mailbox. E.g. TcoNotification@myComp.com.
4. In the “Display Name” section, enter any name that should be displayed as the sender name. E.g. “TCO Notification”
5. In the "Reply Email" section, enter a valid email address if a reply address is required.
6. In the “Server Name” section, enter the accurate name of the SMTP server through which the emails can be sent. E.g. “EmailServer-1.myComp.com”
7. In the "SMTP Authentication" section, select the appropriate authentication. By default, choose "Anonymous Authentication".
8. Click the “OK” button to save the changes. This will bring you back to “Database Mail Configuration” wizard.
NOTE: Contact your network administrator for the SMTP server details and the authentication information.
o Once the account has been created and configured, click the “Next” button to proceed.
o In the "Profile Security" section, select whether you want the profile to be a "Private Profile" or a "Public Profile". A private profile can be used only by a single user, while a public profile can be used by any user. Select the appropriate security and click the "Next" button to proceed.
o In the “Configure System Parameters” section, ensure that the parameter values are appropriate, else make any required change. Click the “Next” button to proceed.
o In the “Complete the Wizard” section, go through the various changes you have made. Click the “Finish” button to complete the configuration.
o Click the “Close” button to exit the wizard.

· You should test the profile you created to make sure it works correctly. Follow the steps below to send a test mail.
o Open the "Send Test Mail" window from "Management" --> "Database Mail" --> "Send Test Mail".
o Select the profile you created and enter your valid email address.
o Click the “Send Test Mail” button to send the test mail.
o Click the “OK” button to exit the window.
o Verify that you received the test mail. If you did not receive the test mail, review the configuration and try again.