< Back to Blog

Make More Room: How to Resolve Database Record Limitations

Make More Room: How to Resolve Database Record Limitations

Sometimes in the process of designing a database, developers can underestimate the scope of a project. If that project were to become successful, it would result in significant loads and put undue strain on the system and it’s developers down the line. Eventually, people realize that they are using an INT identity key with a maximum value of 2,147,483,647 as the primary key of a huge table.

On one hand, 2 billion values should be pretty much enough. But what if you may have more than 10 million records added on a daily basis, and you have already used more than one billion of them? That only gives you 114 days to make the necessary fixes or suffer the consequences. That’s not much, especially if the identity key is the kind that is “hardwired” to both the web interface and the client application.

Should you have reached that regrettable moment, have no fear, this article is here to help, even if it’s already too late to change something, and your application is working 24/7. In a worst case scenario, you should already have seen the following error message: “Arithmetic overflow error converting IDENTITY to data type INT”. We at Distillery can offer you several useful solutions for when this situation turns out to be an emergency in your MS SQL Server.

  1. Switch to BIGINT. In a perfect world, the best option would be to invent a time traveling machine, go back in time and change INT to BIGINT. However, changing INT to BIGINT on a more advanced stage of the development process will cause changes not only in the database, but in both server and client applications. If you have the opportunity to change INT to BIGINT without serious consequences, do it and do it fast.
  2. Use negative values. The increment step is set to one by default — IDENTITY(1,1). At the moment when you are about to reach 2 billion, you can use
    DBCC CHECKIDENT (TableWithPKViolation, −2147483647, reseed)

    thus adding additional time before switching to BIGINT. However, this can create a problem if your domain logic prevents the use of negative values. Make sure you domain logic can accommodate first before making the switch.

  3. Create a table with unused values. You need to find those identity key values that were skipped, thus creating a pool of free and unused values. This approach allows the creation of additional time before moving to BIGINT. In any situation when you use increment keys, you will see skipped values, because such values are reserved at the beginning of a transaction and never returned to the main pool in case of a rollback. Thus, a new transaction will get a new value. You can use more free values if you delete data from the table. Your main task is to form a table containing all available values and a logical field that works as a value reservation attribute. You can delete the reserved values from such a table according to the timetable, e.g. once a day. After this, you have to create a function that will provide the required number of identity keys, and use the received results with SET IDENTITY INSERT ON command. If you delete information from the main table on a regular basis, you can add such deleted values to your table as free values. This method can be used only if you don’t need to keep the order of the records in the table, in other words, in case you are not using ORDER BY Id. Otherwise, you have to select a different sorting option.

Below, you will find examples of the third method. In case you already deleted some information from the table while having less than one billion records, it will be more efficient to generate a full table with values, deleting identity keys in use.

  1. Create a table to generate a sequence and fill it with values from 1 to 2,147,483,647.
    CREATE TABLE [dbo].[IntRange](
    	[Id] [int] NOT NULL
    ) ON [PRIMARY]
    SET ANSI_NULLS ON
    GO
    
    SET QUOTED_IDENTIFIER ON
    GO
    
    CREATE PROCEDURE [dbo].[spNewIDPopulateInsert]
    	@batchsize INT = 10000,
    	@startFrom INT = NULL
    AS
    BEGIN
    	SET NOCOUNT ON;
    	SET XACT_ABORT ON;
    
    	IF @startFrom IS NULL
    	BEGIN
    		SELECT @startFrom = MAX(id)
    		FROM dbo.IntRange; 
    	END;
    
    	DECLARE @startId INT = ISNULL(@startFrom,0);
    	DECLARE @rowscount INT = @batchsize;
    	DECLARE @maxId INT = 2147483647;
    
    	WHILE @rowscount = @batchsize
    	BEGIN
    		INSERT INTO dbo.IntRange
    		(id)
    		SELECT id
    		FROM (
    			SELECT TOP (@batchsize)
    				@startId + ROW_NUMBER() OVER(ORDER BY TableWithPKViolationId) AS id
    			FROM dbo.TableWithPKViolation --any table where you have @batchsize rows
    			) AS genids
    		WHERE id < @maxId; 
    
    		SET @rowscount = @@ROWCOUNT;
    				
    		SET @startId = @startId + @rowscount;
    
    		PRINT CONVERT(VARCHAR(50),GETDATE(),121)+' '+ CAST(@startId AS VARCHAR(50));
    	END
    END
    
    GO
    exec dbo.spNewIDPopulateInsert
    	@batchsize = 10000000

    This script involves a TableWithPKViolation table in order to generate the sequence. However, you are free to use any desired or favorite method, including the SEQUENCE command integrated into MS SQL Server. We have selected this method as the most productive one.

    ALTER TABLE [dbo].[IntRange] ADD PRIMARY KEY CLUSTERED 
    (
    	[Id] ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, 
    	ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = OFF) ON [PRIMARY]
    GO
  2. Create a table for free values and fill it.
    SET ANSI_NULLS ON
    GO
    
    SET QUOTED_IDENTIFIER ON
    GO
    
    CREATE TABLE [dbo].[NewIds](
    	[NewId] [int] NOT NULL,
    	[DateUsedUtc] [datetime] NULL
    ) ON [PRIMARY]
    
    GO
    SET ANSI_NULLS ON
    GO
    
    SET QUOTED_IDENTIFIER ON
    GO
    
    CREATE PROCEDURE [dbo].[spNewIDPopulateInsertFiltered]
    	@batchsize INT = 10000,
    	@startFrom INT = NULL,
    	@endTill INT = NULL
    AS
    BEGIN
    	SET NOCOUNT ON;
    	SET XACT_ABORT ON;
    
    	IF @startFrom IS NULL
    	BEGIN
    		SELECT @startFrom = MAX([NewId])
    		FROM dbo.NewIds; 
    	END;
    
    	DECLARE @startId INT = ISNULL(@startFrom,0);
    	DECLARE @rowscount INT = @batchsize;
    	DECLARE @maxId INT = ISNULL(@endTill,2147483647);
    	DECLARE @endId INT = @startId + @batchsize;
    
    	WHILE @startId < @maxId BEGIN INSERT INTO [NewIds] ([NewId]) 
    		SELECT IR.id FROM [dbo].[IntRange] AS IR WHERE IR.id >= @startId
    		AND IR.id < @endId 
    		AND NOT EXISTS ( SELECT 1 FROM [dbo].[TableWithPKViolation] as Tb WITH (NOLOCK) 
    			WHERE Tb.TableWithPKViolationId = IR.id ); 
    			SET @rowscount = @@ROWCOUNT; 
    			SET @startId = @endId; 
    			SET @endId = @endId + @batchsize; 
    			IF @endId > @maxId SET @endId = @maxId;
    
    		PRINT CONVERT(VARCHAR(50),GETDATE(),121)+' '+ CAST(@startId AS VARCHAR(50));
    	END
    END
    
    GO
    -----
    -----Run each part in separate window in parallel
    -----
    
    --part 1
    exec dbo.spNewIDPopulateInsertFiltered @batchsize = 10000000, @startFrom = 1, 
    	@endTill= 500000000
    --end of part 1
    
    --part 2
    exec dbo.spNewIDPopulateInsertFiltered @batchsize = 10000000, @startFrom = 500000000, 
    	@endTill= 1000000000
    --end of part 2
    
    --part 3
    exec dbo.spNewIDPopulateInsertFiltered @batchsize = 10000000, @startFrom = 1000000000, 
    	@endTill= 1500000000
    --end of part 3
    
    --part 4
    DECLARE @maxId INT
    
    SELECT @maxId = MAX(TableWithPKViolationId)
    FROM dbo.TableWithPKViolation
    
    exec dbo.spNewIDPopulateInsertFiltered @batchsize = 10000000, @startFrom = 1500000000, 
    	@endTill= @maxId
    --end of part 4
    ALTER TABLE [dbo].[NewIds] ADD  CONSTRAINT [PK_NewIds] PRIMARY KEY CLUSTERED 
    (
    	[NewId] ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, 
    	ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    GO
    
    CREATE NONCLUSTERED INDEX [IX_NewIds_DateUsedUtc] ON [dbo].[NewIds]
    (
    	[DateUsedUtc] ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, 
    	ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = OFF)
    
    GO
    
    ALTER TABLE [dbo].[NewIds] SET ( LOCK_ESCALATION = DISABLE )
    GO

    Check that the table was generated properly. The NewId table shall not contain any identity keys that are included in main TableWithPKViolation table.

  3. Create a process that will mark and return all the available values.
    create PROCEDURE [dbo].[spGetTableWithPKViolationIds]
    	@batchsize INT = 1
    AS
    BEGIN
    	SET NOCOUNT ON;
    	SET XACT_ABORT ON;
    
    	DECLARE @rowcount INT,
    		@now DATETIME = GETUTCDATE();
    
    	BEGIN TRAN
    
        UPDATE TOP (@batchsize) dbo.NewIds
    	SET DateUsedUtc = @now
    	OUTPUT inserted.[NewId]
    	WHERE DateUsedUtc IS NULL;
    
    	SET @rowcount = @@ROWCOUNT; 
    
    	IF @rowcount != @batchsize
    	BEGIN 
    		DECLARE @msg NVARCHAR(2048);
    
    		SET @msg = 'TableWithPKViolationId out of ids.'
    			+ 'sp spGetTableWithPKViolationIds, table NewIds. '
    			+ 'Ids requested '
    			+ CAST(@batchsize AS NVARCHAR(255))
    			+ ', IDs available '
    			+ CAST(@rowcount AS NVARCHAR(255));
    
    		RAISERROR(@msg, 16,1);
    		ROLLBACK;
    	END
    	ELSE
    	BEGIN
    		COMMIT TRAN
    	END;
    END
    GO
  4. Take all the procedures that add information to the main table and return SCOPE_IDENTITY() and add procedure invocation. If your time is limited or your hardware is relatively powerful, you can create an INSTEAD OF INSERT trigger, thus avoiding changes in a huge number of procedures. I will show you an example of procedure changing.
    CREATE TABLE #tmp_Id (Id INT);
    
    INSERT INTO #tmp_Id
    EXEC spGetTableWithPKViolationIds @batchsize=@IDNumber;
    
    SELECT @newVersionId = Id
    FROM #tmp_Id;
    
    SET IDENTITY_INSERT [dbo].[TableWithPKViolation] ON;

    Please note that SET IDENTITY_INSERT option requires a user with enough rights to use ALTER command for TableWithPKViolation table.

  5. Next, we need to set identity key table clearing according to the time-table.
    create PROCEDURE dbo.spCleanupNewIds @batchSize INT = 4999
    AS
    BEGIN
    	SET NOCOUNT ON
    
    	DECLARE @minId INT
    	DECLARE @maxId INT
    
    	SELECT @minId = Min([NewId]), @maxId = MAX([NewId])
    	FROM dbo.NewIds WITH (NOLOCK)
    	WHERE DateUsedUtc IS NOT NULL;
    
    	DECLARE @totRowCount INT = 0
    	DECLARE @rowCount INT = @batchSize
    
    	WHILE @rowcount = @batchsize
    	BEGIN
    		DELETE TOP (@batchsize)
    		FROM dbo.NewIds
    		WHERE DateUsedUtc IS NOT NULL AND [NewId] >= @minId AND [NewId] <= @maxId
    
    		SET @rowcount = @@ROWCOUNT
    
    		SET @totRowCount = @totRowCount + @rowcount
    	END
    
    	PRINT 'Total records cleaned up - ' + CAST(@totRowCount AS VARCHAR(100))
    END
    GO

In the case that sequence of identity keys plays an important role, this method will be very helpful, because all new values are given incrementally. However, you need to solve the problem of the transition period. The best solution is to switch existing procedures requiring proper sequence to a different field in the same table, e.g. to the date field.

There you go! This smart solution allows you to buy some additional time before switching to BIGINT. Nevertheless, we highly recommend you avoid these sorts of difficult situations at any cost by planning your transition in advance. If you need help administering your database, our DBA team will be happy to help!

 

kristina-kucherova

About Author

Kristina Kucherova started her career as a software engineer at Russian payment service provider moving to Distillery in early 2016. Working as DBE she’s being focused on performance tuning and fault prediction. Database architecture is her true passion and Kristina is a frequent conferences speaker on the topic as well as a lecturer at Rostov-on-Don State College of Communications and Informatics.