Monday, May 26, 2008

Fragmentation with Shrink

Whenever I hear questions on database shrinking, my first question is always “why do you need to shrink?”. It is hard for me to understand why people look to shrink a database whenever they see some free space within it. I always say, leave the space there; it can be of use later. I would recommend database shrinking only under the following conditions –
1. The database is getting archived
2. We are damn sure that the database won’t grow anymore.
3. The database has a huge amount of free space and we need to restore the database to a server that doesn’t have enough disk space.
A database shrink has several disadvantages –
1. Increased database fragmentation.
2. Increased file fragmentation.
3. The resource utilization for the shrink and for the subsequent file growths.
4. The user downtime during the shrink and growth processes.
There is a small but great post from Paul Randal on the autoshrink and autogrowth processes, and how they increase fragmentation. Paul being one of the best SQL Server professionals in the world and a long-time program manager in the Storage Engine team, I don't think there are many who can speak as convincingly as Paul on this topic. A must-read post for any production DBA. See his post here http://www.sqlskills.com/blogs/paul/2007/11/13/AutoshrinkTurnItOFF.aspx
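If you do decide to shrink, it is worth measuring the fragmentation it introduces. Below is a minimal sketch (SQL Server 2005 and later) that can be run before and after the shrink to compare logical fragmentation; the 10% filter is just an arbitrary cutoff for readability.

-- Logical fragmentation for all indexes in the current database
SELECT OBJECT_NAME(ips.object_id) AS TableName,
    i.name AS IndexName,
    ips.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') ips
JOIN sys.indexes i ON i.object_id = ips.object_id AND i.index_id = ips.index_id
WHERE ips.avg_fragmentation_in_percent > 10
ORDER BY ips.avg_fragmentation_in_percent DESC;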

Monday, May 19, 2008

Database Change Logging

Recently, I was asked to configure a process by which the developers would get to know the changes happening on the dev boxes. This was important because we had a new application pilot running in parallel with the existing system, with different departments and a lot of teams involved in the development and QA tasks. I thought of sharing how we accomplished this.
The process we agreed upon was to use DDL triggers to track the schema changes, and to send the data on a weekly basis to the concerned people.
The first thing we did was to create a log table in the databases. After creating the table, we created a DDL trigger on the database, so that the trigger fires on every schema change in the database. The trigger captures the EVENTDATA() output for each schema change and inserts information such as the login name, host name, and the complete XML event data into the log table. This gave us a complete log of the changes happening to the database schema. Following is the code for the table creation and the DDL trigger.
--**********************************************************************************
-- Script: CreateTable_DatabaseLog.sql
-- Author: RAJAN JOHN
-- Date: 05/08/2008
--**********************************************************************************
USE [MyDatabase]
GO
SET ANSI_NULLS OFF
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[DatabaseLog](
    [PostTime] [datetime] NULL,
    [DatabaseUser] [varchar](100) NULL,
    [hostname] [varchar](100) NULL,
    [Event] [varchar](100) NULL,
    [Schema] [varchar](100) NULL,
    [Object] [varchar](100) NULL,
    [XmlEvent] [xml] NULL
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF


--**********************************************************************************
-- Script: CreateTrigger_DatabaseChangeLog.sql
-- Author: RAJAN JOHN
-- Date: 05/08/2008
--**********************************************************************************
USE [MyDatabase]
GO
/****** Object: DdlTrigger [ddlDatabaseTrigger_DatabaseChangeLog] ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TRIGGER [ddlDatabaseTrigger_DatabaseChangeLog]
ON DATABASE
FOR DDL_DATABASE_LEVEL_EVENTS
AS
BEGIN
SET NOCOUNT ON;

DECLARE @data XML;
DECLARE @schema sysname;
DECLARE @object sysname;
DECLARE @eventType sysname;
DECLARE @LoginName sysname;
DECLARE @SPID int;
DECLARE @hostname varchar(100);

SET @data = EVENTDATA();
SET @eventType = @data.value('(/EVENT_INSTANCE/EventType)[1]', 'sysname');
SET @schema = @data.value('(/EVENT_INSTANCE/SchemaName)[1]', 'sysname');
SET @object = @data.value('(/EVENT_INSTANCE/ObjectName)[1]', 'sysname') ;
SET @LoginName =@data.value('(/EVENT_INSTANCE/LoginName)[1]', 'sysname') ;
SET @SPID = @data.value('(/EVENT_INSTANCE/SPID)[1]', 'int');
-- Look up the client host name for the session that made the change
SELECT @hostname = hostname FROM master.dbo.sysprocesses WHERE spid = @SPID;
INSERT [dbo].[DatabaseLog]
([PostTime],
[DatabaseUser],
[Event],
[Schema],
[Object],
[XmlEvent],
[hostname] )
VALUES
( GETDATE(),
@LoginName,
@eventType,
CONVERT(sysname, @schema),
CONVERT(sysname, @object),
@data,
@hostname );
SET NOCOUNT OFF
END;
GO
SET ANSI_NULLS OFF
GO
SET QUOTED_IDENTIFIER OFF
GO
ENABLE TRIGGER [ddlDatabaseTrigger_DatabaseChangeLog] ON DATABASE
GO
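To verify the trigger is capturing events, one can make a throwaway schema change and check the log table; a quick sketch (the test table name is just for illustration):

-- Make a trivial schema change, then inspect the latest log entries
CREATE TABLE dbo.TriggerTest (id int);
DROP TABLE dbo.TriggerTest;
SELECT TOP 10 PostTime, DatabaseUser, hostname, [Event], [Schema], [Object]
FROM dbo.DatabaseLog
ORDER BY PostTime DESC;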

Now, the next step was to provide a report on the schema changes to the concerned people (the development leads, QA leads, etc.). Initially, we thought of running a query on the log table on a weekly basis, copy-pasting the resultset, and sending it via email. Then we decided to automate this process as well. We wrote a job that uses Database Mail to send the results of the query to the concerned people. Following is the code for the procedure that is run by the job.

--*************************************************************************************
-- Script: SendDatabaseChangeLog.sql
-- Author: RAJAN JOHN
-- Date: 05/08/2008
--*************************************************************************************
Declare @DeBug bit
Declare @strServerName varchar(255)
Declare @mailRecipients varchar(max),
    @mailImportance varchar(6),
    @subjectString nvarchar(255),
    @mailBody nvarchar(max),
    @mailQuery nvarchar(max),
    @resultAttachFlag bit,
    @resultAttachFileName nvarchar(255),
    @resultColSeparator char(1),
    @resultWidth int
--*************************************************************************************
set @DeBug = 1
-- Default instances have no instance name; named instances need Machine\Instance
select @strServerName = CASE CAST(isnull(ServerProperty('InstanceName'), '') AS VARCHAR(100))
    WHEN '' THEN CAST(ServerProperty('MachineName') AS VARCHAR(100))
    ELSE CAST(ServerProperty('MachineName') AS VARCHAR(100)) + '\' + CAST(isnull(ServerProperty('InstanceName'), '') AS VARCHAR(100))
END
set @mailImportance = 'High'
set @subjectString = N'Weekly Database Change Log - ' + @strServerName + '\MyDatabase'
set @mailBody = N'Please find below the database changes between ' + cast(dateadd(wk, -1, getdate()) as varchar) + N' and ' + cast(getdate() as varchar)
set @mailQuery = N'SELECT [PostTime], [DatabaseUser], [HostName], [Event], [Schema], [Object], [XmlEvent]
FROM MyDatabase.dbo.DatabaseLog WHERE DATEDIFF(wk, [PostTime], GETDATE()) <= 1'
set @resultAttachFlag = 1
set @resultAttachFileName = N'Database Change Log'
set @resultColSeparator = char(9)
set @resultWidth = 32767
--*************************************************************************************
if @DeBug = 1
begin
    -- In debug mode, the report goes to the author only
    set @mailRecipients = N'Rajan.John@MyCompany.com'
    print @subjectString + char(13) + @mailBody
    execute msdb.dbo.sp_send_dbmail
        @profile_name = 'Database Change Log Profile',
        @recipients = @mailRecipients,
        @subject = @subjectString,
        @body = @mailBody,
        @body_format = 'TEXT',
        @importance = @mailImportance,
        @query = @mailQuery,
        @attach_query_result_as_file = @resultAttachFlag,
        @query_attachment_filename = @resultAttachFileName,
        @query_result_separator = @resultColSeparator,
        @query_result_width = @resultWidth,
        @query_no_truncate = 0

end

Capacity Planning - Sizing Memory

Physical memory is one of the most important hardware components for a database server. This is because adequate physical memory reduces physical IO by reusing the data pages and query plans that already exist in the cache.

SQL Server Memory Requirements
SQL Server does dynamic memory allocation. This means SQL Server will allocate for its own use whatever memory on the server is left unused. Whenever any application or the operating system requires additional memory, SQL Server releases memory back for their use.
Different components of SQL Server require physical memory for their functioning. The following sections describe the components within SQL Server which are the main users of physical memory.
Buffer Cache Requirements
The SQL Server buffer cache is the area of physical memory where data pages and index pages are cached. The buffer cache is the largest consumer of physical memory in SQL Server; the dynamic allocation and de-allocation of memory by SQL Server is done mostly by adjusting the size of the buffer pool. The server should be provided with adequate memory to ensure a very high cache hit ratio and keep physical IO to a minimum. A decrease in the buffer cache size increases physical IO activity.
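To see where the buffer cache is actually going, the buffer descriptors DMV (SQL Server 2005 and later) can be aggregated per database; note that this scan can take a while on servers with a lot of memory:

-- Buffer cache usage per database (pages are 8KB)
SELECT DB_NAME(database_id) AS DatabaseName,
    COUNT(*) * 8 / 1024 AS BufferCacheMB
FROM sys.dm_os_buffer_descriptors
GROUP BY database_id
ORDER BY BufferCacheMB DESC;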
Procedure Cache Requirements
The procedure cache is the area of physical memory where SQL Server caches the compiled execution plans for queries and stored procedures. It is the second largest consumer of physical memory in SQL Server. A healthy procedure cache size is required to reduce the number of query compilations and optimizations. A decrease in the size of the procedure cache increases recompilations, which in turn increases processor overhead.
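The size and composition of the procedure cache can be checked with a query along these lines (SQL Server 2005 and later):

-- Procedure cache usage by plan type
SELECT objtype,
    COUNT(*) AS PlanCount,
    SUM(CAST(size_in_bytes AS bigint)) / 1024 / 1024 AS CacheMB
FROM sys.dm_exec_cached_plans
GROUP BY objtype
ORDER BY CacheMB DESC;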
Connection Context Requirements
Each user connection to SQL Server requires a certain amount of memory for handling the user session, averaging around 500KB per connection. Hence, the larger the set of concurrent users accessing the database, the larger the amount of memory required to support the connections.
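Using the ~500KB average from above, a rough estimate of current connection memory can be sketched like this (the 500KB figure is only an approximation):

-- Approximate memory used by current connections
SELECT COUNT(*) AS CurrentConnections,
    COUNT(*) * 500 / 1024.0 AS ApproxConnectionMemoryMB
FROM sys.dm_exec_connections;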
Other Requirements
There are other components in SQL Server that require memory, such as the lock manager.
The details of SQL Server memory allocation can be obtained with the command “DBCC MEMORYSTATUS”.
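For a more queryable breakdown than DBCC MEMORYSTATUS, the memory clerks DMV can be aggregated; the column names below are the SQL Server 2005/2008 ones:

-- Top memory consumers by clerk type
SELECT TOP 10 type,
    SUM(single_pages_kb + multi_pages_kb) / 1024 AS MemoryMB
FROM sys.dm_os_memory_clerks
GROUP BY type
ORDER BY MemoryMB DESC;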
See the hardware comparison details for different SQL Server editions.

Sizing Memory through Load Testing
The best method to estimate the physical memory requirements for a database application infrastructure is through load testing.
We need to simulate the application load with the maximum number of concurrent users and transactions possible, and measure the memory consumption. Enough physical memory has to be allocated to avoid any memory bottleneck.
A decrease in memory affects the whole system. Memory deficiency increases cache misses and paging; paging increases physical IO; and all of these increase processor utilization.
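During the load test, the memory pressure indicators can be sampled from within SQL Server itself; a minimal sketch (the object_name prefix differs for named instances):

-- Buffer Manager counters relevant to memory sizing
SELECT counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%Buffer Manager%'
AND counter_name IN ('Page life expectancy', 'Buffer cache hit ratio');
-- Note: 'Buffer cache hit ratio' must be divided by its base counter
-- ('Buffer cache hit ratio base') to get the actual percentage.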

Sizing Memory through Workload Analysis
It will not always be possible to create a prototype of the infrastructure and run simulations. In such cases, memory sizing can be carried out by manually doing a workload analysis and calculating the memory requirements of the different components of the server infrastructure.
Consider, for example, a sales database application that experiences annual user account growth of 15% and sales data growth of 20%. The following table shows a rough estimate of the memory requirements for the procedure cache, user connections, etc., and how they affect the buffer cache.
[Table: projected memory requirements by component (procedure cache, user connections, OS/SQL Server components, buffer cache) over the growth period]
This is just a rough estimate, assuming that one user connection requires around 500KB and that the SQL Server and OS program components together require around 1GB. Similarly, we can do more accurate calculations and project the memory requirements.
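As an illustration with made-up numbers: on a server with 8GB of RAM serving 2,000 concurrent connections, the connections would need roughly 2,000 x 500KB ≈ 1GB, the OS and SQL Server program components another 1GB, leaving around 6GB for the buffer and procedure caches. If the 15% user growth pushes the connection count to about 2,300 next year, the connection memory grows to roughly 1.15GB and the cache share shrinks accordingly, unless memory is added.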

With this, I am concluding the current Capacity Planning series. I will be demonstrating a few tools, trends, etc. in future posts.

Tuesday, May 6, 2008

Capacity Planning - Planning the Processor Subsystem

Planning the processor subsystem for a database infrastructure is a straightforward task. Normally, all the high-end servers will be using the best processors available. If the individual servers are already loaded with the maximum supported number of the best available processors, the way to increase processing power is to increase the number of servers. These two ways of increasing processing power are explained in the sections below.
Scale-up vs. Scale-out
Scale up is the process of increasing the processing power of an already existing server either by adding more processors or by increasing the processing power of the processors. Scale out is the process of increasing the processing power of a database infrastructure by increasing the number of servers so that the load can be distributed among the different servers. Both these approaches have got their benefits and downsides.
Scale up is normally considered a quick fix to resolve processor bottlenecks for database applications. With the arrival of hot-swappable CPUs and databases supporting them, this approach requires little downtime and less planning overhead. The major advantage is that, since it does not increase the number of servers, there is no extra overhead in terms of maintenance, monitoring, licensing, etc.
Though scale up can be done easily as a quick fix for issues, it has its downsides. The major disadvantage is that there is always a ceiling on the gain we can get by scaling up, because there are limits on the maximum number of processors supported by a particular operating system as well as by the RDBMS. Hence, scaling up cannot be considered a permanent solution, especially for an application that keeps growing.
Scale out is normally considered a permanent solution for increasing processing power. The most important advantage of this approach is expandability: we can add more servers and make the application use the newly added servers.
Since scale out is a more permanent solution, it requires a lot of planning and overhead. The most important disadvantage is that the application has to be redesigned to make use of the multiple servers. This may not be an easy task for existing applications, and it will be a costly affair. There are other disadvantages as well, including the increased overhead of extra servers for maintenance, monitoring, licensing, etc.
We can decide whether to go for scale up or scale out depending upon the current situation of the application and its future direction.
Factors for Considerations
While selecting processors, there are a few factors to consider, along with how they affect the application environment.
- Processor Architecture
Nowadays, processors are available in 32-bit and 64-bit architectures. We can decide which architecture our infrastructure requires based upon the cost and scalability requirements of the application.
The other important factors to consider when selecting processors are the following. Since they are known to most people, I am not explaining each of them; a quick way to check some of these details from within SQL Server is shown after the list.
- Processor Cores
- L2 Cache
- Hyper-threading
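For reference, a few of these details can be read from within SQL Server (2005 and later); the socket calculation below is a common approximation:

-- Logical CPUs and hyper-threading ratio as seen by SQL Server
SELECT cpu_count,
    hyperthread_ratio,
    cpu_count / hyperthread_ratio AS physical_sockets
FROM sys.dm_os_sys_info;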