Microsoft distributes Microsoft SQL Server 2012 fixes as one downloadable file. Because the fixes are cumulative, each new release contains all the hotfixes and all the security fixes that were included with the previous SQL Server 2012 fix release.
Symptoms
When you perform the Discovery activity or the Cleansing activity in SQL Server 2012 Data Quality Services (DQS), the DQS_PROJECTS database may grow very quickly. The transient X_ tables in the DQS_PROJECT database especially quickly.
In particular, you may experience the following issues for the Discovery activity:-
The DQS_PROJECTS data file (MDF) size may be about 30 times the size of source data. This transient data will be deleted when the Knowledge Base is published. The published Knowledge Base will be moved into the DQS_MAIN data file (MDF) and may be about 5 times the size of source data.
-
The DQS_PROJECTS transaction log (LDF) size grows frequently.
In particular, you may experience the following issues for the Cleansing activity:
-
The DQS_PROJECTS data file (MDF) size may be about 40 times the size of source data.
-
The DQS_PROJECTS transaction log (LDF) size may be about 60 to 100 times the size of source data.
Cause
This issue occurs for one or more of the following reasons:
-
Empty rows and column values consume space in transient tables. Most space is consumed by BigInt columns for empty rows and columns in tables that have the name X_INDEX_DATA_SOURCE_TABLE in each schema.
-
The transient data that is stored in X_INDEX_DATA_SOURCE_TABLE internal tables is not immediately released when the project is finished or the Knowledge Base is published.
Resolution
Cumulative update information
SQL Server 2012
The fix for this issue was first released in Cumulative Update 2 for SQL Server 2012 Service Pack RTM. For more information about this cumulative update package, click the following article number to view the article in the Microsoft Knowledge Base:
2703275 Cumulative update package 2 for SQL Server 2012Note Because the builds are cumulative, each new fix release contains all the hotfixes and all the security fixes that were included with the previous SQL Server 2012 fix release. We recommend that you consider applying the most recent fix release that contains this hotfix. For more information, click the following article number to view the article in the Microsoft Knowledge Base:
2692828 The SQL Server 2012 builds that were released after SQL Server 2012 was released Microsoft SQL Server 2012 hotfixes are created for specific SQL Server service packs. You must apply a SQL Server 2012 Service Pack RTM hotfix to an installation of SQL Server 2012 Service Pack RTM. By default, any hotfix that is provided in a SQL Server service pack is included in the next SQL Server service pack.
Status
Microsoft has confirmed that this is a problem in the Microsoft products that are listed in the "Applies to" section.
Workaround
To work around this issue, delete unnecessary data projects and knowledge bases in DQS to help minimize the footprint of the internal tables that are used by the projects.
After you apply this fix, the DQS_PROJECT database size will not automatically decrease. For more information about how to reduce the size of a database, go to the following Microsoft website:How to shrink a databaseNote Installing this hotfix alone does not correct the database growth problems that have occurred in the past, and only is helpful for new projects and activities which are created in the future, and projects and activities which are finished in the future. To manually reduce the size of DQS databases, review the knowledge bases that are no longer used or the data quality projects that are already finished. You can use the data quality client, and then click Open Knowledge Base or Open Data Quality Project to obtain the list of each knowledge base or each project. Right-click the knowledge base name or project name, and then click Delete. You can run the following Transact-SQL script to help estimate the size of each knowledge base and each project that is located in the DQS_MAIN and DQS_PROJECTS databases to better gauge the disk space that is consumed by each.
-- DQSSizes.sql
-- Version 1 - Nov 7, 2011 -JH -- Loops over each table in DQS_MAIN and DQS_PROJECTS to enumerate rows counts + index page counts to estimate size -- Reports by KB, then reports a drill in by Table within each KB -- Count Rows per object USE DQS_MAIN GO CREATE TABLE ##DQSRowCounts (ObjId BIGINT, DBName SYSNAME, SchemaName SYSNAME, TableName SYSNAME, RowCounts BIGINT); EXEC sp_msforeachtable 'INSERT INTO ##DQSRowCounts SELECT OBJECT_ID(''?''), ''DQS_MAIN'', OBJECT_SCHEMA_NAME(OBJECT_ID(''?'')), OBJECT_NAME(OBJECT_ID(''?'')), COUNT(*) FROM ? WITH (NOLOCK)'; GO USE DQS_PROJECTS GO EXEC sp_msforeachtable 'INSERT INTO ##DQSRowCounts SELECT OBJECT_ID(''?''), ''DQS_PROJECTS'', OBJECT_SCHEMA_NAME(OBJECT_ID(''?'')), OBJECT_NAME(OBJECT_ID(''?'')), COUNT(*) FROM ? WITH (NOLOCK)'; -- count pages per object CREATE TABLE ##DQSsizes(DbId INT, objid BIGINT, PageCount INT) INSERT INTO ##DQSsizes SELECT DB_ID('DQS_MAIN'), object_id, SUM(PS.page_count) FROM sys.dm_db_index_physical_stats (DB_ID('DQS_MAIN'),NULL, NULL, NULL, NULL) PS GROUP BY object_id INSERT INTO ##DQSsizes SELECT DB_ID('DQS_PROJECTS'), object_id, SUM(PS.page_count) FROM sys.dm_db_index_physical_stats (DB_ID('DQS_PROJECTS'),NULL, NULL, NULL, NULL) PS GROUP BY object_id GO -- Rows and Size per KB SELECT RC.DBName, RC.SchemaName, KB.Name AS KBProject, SUM(RC.RowCounts) AS TotalRowsAcrossAllTables, SUM(S.PageCount)*8 AS EstimatedKBSizeinKiloBytes, KB.ORIGIN_KNOWLEDGEBASE, KB2.Name AS OriginKBName, KB.CREATED_BY, KB.CREATE_DATE FROM ##DQSsizes S JOIN ##DQSRowCounts RC ON S.DbId=DB_ID(RC.DBName) AND S.objid=RC.ObjId LEFT OUTER JOIN DQS_MAIN.dbo.A_KNOWLEDGEBASE KB WITH (NOLOCK) ON RC.SchemaName = ( CASEWHEN (RC.SchemaName LIKE 'KnowledgeBase%') THEN 'KnowledgeBase'+CAST(KB.ID AS NVARCHAR) WHEN (RC.SchemaName LIKE 'DQProject%') THEN 'DQProject'+CAST(KB.ID AS NVARCHAR) WHEN (RC.SchemaName LIKE 'KnowledgeManagement%') THEN 'KnowledgeManagement'+CAST(KB.ID AS NVARCHAR) END) LEFT OUTER JOIN DQS_MAIN.dbo.A_KNOWLEDGEBASE KB2 WITH (NOLOCK) ON KB.ORIGIN_KNOWLEDGEBASE=KB2.ID GROUP BY DBName, SchemaName, KB.Name, KB.ORIGIN_KNOWLEDGEBASE, KB2.Name, KB.CREATED_BY, KB.CREATE_DATE ORDER BY DBName, SchemaName, KB.Name GO -- Rows and Size per Table per KB SELECT RC.DBName, RC.SchemaName, KB.Name AS KBProject, RC.TableName, RC.RowCounts, S.PageCount*8 AS EstKiloByteSize FROM ##DQSsizes S LEFT OUTER JOIN ##DQSRowCounts RC ON S.DbId=DB_ID(RC.DBName) AND S.objid=RC.ObjId LEFT OUTER JOIN DQS_MAIN.dbo.A_KNOWLEDGEBASE KB WITH (NOLOCK) ON RC.SchemaName = ( CASEWHEN (RC.SchemaName LIKE 'KnowledgeBase%') THEN 'KnowledgeBase'+CAST(KB.ID AS NVARCHAR) WHEN (RC.SchemaName LIKE 'DQProject%') THEN 'DQProject'+CAST(KB.ID AS NVARCHAR) WHEN (RC.SchemaName LIKE 'KnowledgeManagement%') THEN 'KnowledgeManagement'+CAST(KB.ID AS NVARCHAR) END) ORDER BY DBName, SchemaName, KBProject, TableName GO DROP TABLE ##DQSsizes DROP TABLE ##DQSRowCounts
More Information
After you install this hotfix, when DQS creates the internal table X_INDEX_DATA_SOURCE_TABLE within each project's schema, the SPARSE hint is used on the BigInt columns to minimize the space that is used by empty rows and columns to minimize the disk space that is required to hold those objects. This hotfix lets DQS release the transient X_INDEX_DATA_SOURCE_TABLE object. Therefore, the space may be reused, or the size of the database can be reduced after the knowledge base is published or the data quality project is finished. Therefore, when you click the Finish button at the end of the projects, you help free the disk space that is consumed by the activity.