Visual Basic To Load Text Files To Database 10 Times Faster ( .OpenTextFile / ADODB.Connection / Fixed Length )

 

Link to Visual Basic To Load Text Files To Database 10 Times Faster ( .OpenTextFile / ADODB.Connection / Fixed Length )

 

 

 

Load Text Files To A Database 10 Times Faster
By Robbe D. Morris

While completing a contract project for an Outbound Telemarketing company recently, I ran into a situation where a creative solution to an age old problem was needed.  We desperately needed to improve the performance of our desktop application that read in fixed position, fixed length, ASCII text files and handled various inserts and updates to multiple tables for each record.  To complicate matters, the load process had to take the following requirements into account:

1.
All access to the SQL Server database had to be done remotely.  Local access was not allowed.

2.
With each record read in from the text file, conditional statements as well as table inserts with the newly generated foreign keys needed to be handled accordingly.

3.
It had to be fast enough to be feasible over a very slow network.

4.
Needed to be simple enough for our computer operators to run without support from the programming group.

5.
Be able to insert characters that would typically foul up standard SQL insert/update statements.

These requirements really left me with the option of creating a desktop app that could read the text files line by line, pull out specific pieces of data by position on the record, and then perform the various inserts and updates required.  Typically, an approach using the ADO Command object and SQL Server stored procedures would be used on a record by record basis.  Of course, loading 50,000 to 500,000 records can be quite time consuming and I needed an approach that would be much faster.  So, I put on my thinking cap and began to look for alternatives that could at least double the speed of our current processes.  As it turned out, I was able to increase that speed by at least a factor of ten just working against a local instance of SQL Server.  Depending on the size of the text file (ie number of records to process), the speed improvements gradually improved above and beyond the factor of 10 when applied to our remote database servers over the network.

Here’s how I did it.  I assumed that the largest time consumer in the process was the repeated trips to the database for each record.  A little test I put together confirmed that.  With this in mind, a test app was written to compare the current approach against one that would send a thousand records at a time to the database to be processed.  Rather than parsing the text file locally in the desktop app for each record, I created a process to package up a thousand records at a time and send the whole string to a similar stored procedure.  The stored procedure would iterate through the string and parse out the individual pieces of data and react accordingly.  A key point to mention is that I used a slight variation of the string builder class mentioned in another article here on EggHeadCafe:  A Fast String Class for ASP Pages.  Using standard string concatenation techniques would have dramatically impacted the performance gains in this endeavor.

In your implementations, you may have to adjust the number of records you can send to the stored procedure at any given time.  If you don’t need unicode support, you can set the internal string to be varchar and support 8000 bytes.  Otherwise, if you use nvarchar, you may need to reduce that.  You may also opt to use a slightly different approach to parse the data directly from the varchar input parameter directly.  In either event, it is just something you need to consider.  The fewer trips you make to the database, the faster your process will be.

All of the test code has been included here for your review.  You’ll notice that I used a column in the tables called RecordNum.  I’ve often found it useful to flag records as they are being loaded to make it much easier to break out large text files into smaller groups of records to be used on predictive dialers.  As always, if you would like to comment on this article, feel free to email me at the address at the bottom of this article.

Visual Basic basMain.bas Module (Primary Test Oriented Code)

 

   Option Explicit

Public Const mksTestFileName = "egghead.txt"
Public UI As cUI

Public Enum ekTestOptions
            EggHeadTest1 = 1
            EggHeadTest2 = 2
End Enum
            
 
Public Sub Main()

   Set UI = New cUI
   
   UI.AppPath = App.Path & "\"
    
    If InStr(1, UI.AppPath, "VB98", vbTextCompare) > 0 Then
       UI.AppPath = "C:\EggHeadTest\"
       ' This is to prevent VB from using the default Visual Studio as it's root
       ' when developing in the IDE.  Simply change the path to fit the
       ' folder where your source code resides.  This has no effect on
       ' production executables.
    End If
    
   DeleteExistingTestRecords
   CreateTestFile
   
   UI.TimerInit
   LoadTest ekTestOptions.EggHeadTest1
   Debug.Print "Single Records: " & UI.TimerGetElapsedTime
   
   UI.TimerInit
   LoadTest ekTestOptions.EggHeadTest2
   Debug.Print "Multiple Records: " & UI.TimerGetElapsedTime
   
   ExitApp
   
End Sub
 

Public Sub ExitApp()
    Set UI = Nothing
    Err.Clear
    End
End Sub

Public Sub DeleteExistingTestRecords()
   On Error GoTo ErrHandler
   
    If UI.DBOpen = True Then
       UI.ADOCon.Execute "delete  from eggheadtest1"
       UI.ADOCon.Execute "delete  from eggheadtest1trans"
       UI.ADOCon.Execute "delete  from eggheadtest2"
       UI.ADOCon.Execute "delete  from eggheadtest2trans"
    End If
 
ErrHandler:
     If Err.Number <> 0 Then MsgBox "Failure to delete test records. " & vbCrLf & Err.Description
          On Error Resume Next
          UI.DBClose
          Err.Clear
End Sub


Public Sub CreateTestFile()

            Dim lTot As Long
            Dim lCnt As Long
            Dim oFS As Scripting.FileSystemObject
            Dim oFSFile As Scripting.TextStream
 
            On Error GoTo ErrHandler
   
            Const ForWriting = 2

            Set oFS = New Scripting.FileSystemObject
 
             If oFS.FileExists(UI.AppPath & mksTestFileName) = True Then 
			                 oFS.DeleteFile UI.AppPath & mksTestFileName, True
   
                 Set oFSFile = oFS.OpenTextFile(UI.AppPath & mksTestFileName, ForWriting, True)
 
            lTot = 20005
         
           ' Record length = 100
         
            For lCnt = 1 To lTot
                    oFSFile.Write "4075551212" & "Rush" & Space(6) & "Limbaugh" & Space(2)
					& "12 Main St" & Space(2) & Space(44) & "Test" & vbCrLf
                   DoEvents
            Next
            
            oFSFile.Close
                   
          
ErrHandler:
          If Err.Number <> 0 Then MsgBox "Failure to open file for writing. " &
		                vbCrLf & Err.Description
          On Error Resume Next
          Set oFSFile = Nothing
          Set oFS = Nothing
          Err.Clear
End Sub


Public Sub LoadTest(ByRef ekTestOption As ekTestOptions)

      Dim sRecord As String
      Dim lTotRecCnt As Long
      Dim lRecLength As Long
      Dim sMassRec As String
      Dim lMassRecCnt As Long
      Dim lStartRecord As Long
      Dim lRecordCnt As Long
      Dim lRecordCntMax As Long
      Dim lRecs As String
     
      
      On Error GoTo ErrHandler
      
      ' If we use the Open/For Input method, it allows us to reference one record at
	  ' a time .v loading up a 40-50MB file into
      ' a string variable with the Scripting.FileSystemObject.  But, remember that it
	  ' removes the vbcrlf that was at the end of each
      ' record.  Thus, we adjusted the previous record length of 100 to 98.
      
      UI.HourGlass True
      
      lRecLength = 98
      lRecordCntMax = 1000
      
      UI.DBOpen
      UI.DBClearParameters
      
      Open UI.AppPath & mksTestFileName For Input As #1
      
       Do Until EOF(1)
       
              lTotRecCnt = lTotRecCnt + 1
              Line Input #1, sRecord
              DoEvents
              Select Case ekTestOption
              
                          Case ekTestOptions.EggHeadTest1
                          
                               With UI.ADOCom
                                 .CommandText = "spEggHeadTest" & ekTestOption
                                 .CommandType = adCmdStoredProc
                                 .Parameters.Append 
                                 .CreateParameter("@RETURN_VALUE", 
                                        adInteger, adParamReturnValue, 0)
                                        .Parameters.Append
                                 .CreateParameter("@Phone",
                                adVarWChar, adParamInput, 10, Mid(sRecord, 1, 10))
                                 .Parameters.Append .CreateParameter("@FName",
                                  adVarWChar, adParamInput, 50, Mid(sRecord, 11, 10))
                                 .Parameters.Append .CreateParameter("@LName",
                                  adVarWChar, adParamInput, 50, Mid(sRecord, 21, 10))
                                 .Parameters.Append .CreateParameter("@RecordData",
                                   adVarWChar, adParamInput, 500, Mid(sRecord, 31, 64))
                                 .Parameters.Append .CreateParameter("@LastPieceOfRecord", 
                                  adVarWChar, adParamInput, 50, Mid(sRecord, 95, 4))
                                 .Parameters.Append .CreateParameter("@RecordNum",
				         adInteger, adParamInput, , lTotRecCnt)
                                        .Execute lRecs, , adExecuteNoRecords
                                    End With
 
                                    UI.DBClearParameters
                          
                          
                          Case ekTestOptions.EggHeadTest2
                          
                                  
                                     lMassRecCnt = lMassRecCnt + 1
                                     
                                    If lMassRecCnt = 1 Then
                                       UI.StringBuilderInit
                                       lStartRecord = lTotRecCnt
                                    End If
                                    
                                      UI.StringBuilderAppend sRecord
                                      
                                      If lTotRecCnt Mod lRecordCntMax = 0 Then
                                          
                                         sMassRec = UI.StringBuilderConcat
                                         UI.StringBuilderInit
                                         
                                            With UI.ADOCom
                                                  .CommandText = "spEggHeadTest" & ekTestOption
                                                  .CommandType = adCmdStoredProc
                                                  .Parameters.Append
                                             .CreateParameter("@RETURN_VALUE", adInteger,
                                                   adParamReturnValue, 0)
                                                  .Parameters.Append 
                                              .CreateParameter("@RecLength", adInteger,													
                                          adParamInput, , lRecLength)
                                              .Parameters.Append
                                           .CreateParameter("@StartRecord",
                                              adInteger, adParamInput, , lStartRecord)
                                           .Parameters.Append 
                                           .CreateParameter("@CurRecordCnt",
                                             adInteger, adParamInput, , lMassRecCnt)
                                           .Parameters.Append 
                                           .CreateParameter("@RecordData", adLongVarWChar,
                                              adParamInput, 1073741823, sMassRec)
                                                  .Execute lRecs, , adExecuteNoRecords
                                              End With
                                             
                                              UI.DBClearParameters
                                               lMassRecCnt = 0
                                   
                                    End If
                                     
              End Select
              
       Loop
       
         Close #1
       
       If ekTestOption = ekTestOptions.EggHeadTest2 Then
         If ((lMassRecCnt < lRecordCntMax) And (lMassRecCnt > 0)) Then
         
             sMassRec = UI.StringBuilderConcat
              
              With UI.ADOCom
                       .CommandText = "spEggHeadTest" & ekTestOption
                       .CommandType = adCmdStoredProc
                       .Parameters.Append .CreateParameter("@RETURN_VALUE",
			                      adInteger, adParamReturnValue, 0)
                       .Parameters.Append .CreateParameter("@RecLength", 
                       adInteger, adParamInput, , lRecLength)
                       .Parameters.Append .CreateParameter("@StartRecord",
			                  adInteger, adParamInput, , lStartRecord)
                       .Parameters.Append .CreateParameter("@CurRecordCnt",
			                adInteger, adParamInput, , lMassRecCnt)
                       .Parameters.Append .CreateParameter("@RecordData",
			        adLongVarWChar, adParamInput, 1073741823, sMassRec)
                       .Execute lRecs, , adExecuteNoRecords
              End With
                            
             UI.DBClearParameters
            UI.StringBuilderInit
            
        End If
      End If
      
  

ErrHandler:
          UI.HourGlass False
          If Err.Number <> 0 Then MsgBox "Error loading test file. "
		                & vbCrLf & Err.Description
          On Error Resume Next
          Close #1
          UI.DBClose
          Err.Clear
End Sub

Visual Basic cUI.cls (Support Class Module With Common Methods)

   
  Option Explicit

Private msStringBuilderArray() As String
Private mlStringBuilderGrowthRate As Long
Private mlStringBuilderNumItems As Long
Private mlStartTimer As Long
Public AppPath As String
Public ADOCon As ADODB.Connection
Public ADOCom As ADODB.Command
 
Private Sub Class_Initialize()
   Set ADOCon = New ADODB.Connection
   Set ADOCom = New ADODB.Command
End Sub

Private Sub Class_Terminate()
   On Error Resume Next
   ADOCon.Close
   Set ADOCom = Nothing
   Set ADOCon = Nothing
   Err.Clear
End Sub

Public Function DBOpen() As Boolean
   On Error Resume Next
     Me.ADOCon.Close
     Err.Clear
   On Error GoTo ErrHandler
       DBOpen = False
                 
      Me.ADOCon.Open "Provider=SQLOLEDB;Data Source=(local);
	                 User ID=yourusr;Password=yourpwd;Initial Catalog=YOURDB"
      Set ADOCom.ActiveConnection = Me.ADOCon
       DBOpen = True

ErrHandler:
         If Err.Number <> 0 Then MsgBox "Unable to connect to " &
		          vbCrLf & Err.Description
         Err.Clear
End Function

Public Sub DBClose()
     On Error Resume Next
     Me.ADOCon.Close
     Err.Clear
End Sub
 
Public Sub HourGlass(ByVal fOn As Boolean)
     If fOn = True Then
        Screen.MousePointer = vbHourglass
     Else
        Screen.MousePointer = vbDefault
     End If
End Sub

   
Public Function WriteFile(ByVal sFilePathAndName As String,
                          ByVal sFileContents As String) As Boolean

  Dim oFS As Scripting.FileSystemObject
  Dim oFSFile
 
  On Error GoTo ErrHandler
  
  WriteFile = False
  
  Const ForWriting = 2

  Set oFS = New Scripting.FileSystemObject
 
  If oFS.FileExists(sFilePathAndName) = True Then oFS.DeleteFile sFilePathAndName, True
   
  Set oFSFile = oFS.OpenTextFile(sFilePathAndName, ForWriting, True)

  oFSFile.Write (sFileContents)
  
  oFSFile.Close
  
  WriteFile = True
  
  
ErrHandler:
          If Err.Number <> 0 Then 
            MsgBox "Failure to open file for writing. " &
			        vbCrLf & sFilePathAndName & vbCrLf & Err.Description
           end if
          On Error Resume Next
          Set oFSFile = Nothing
          Set oFS = Nothing
          Err.Clear
End Function
 
Public Sub TimerInit()
           mlStartTimer = Timer
End Sub
Public Function TimerGetElapsedTime() As Long
            TimerGetElapsedTime = Timer - mlStartTimer
End Function
       
Public Sub CatStr(ByRef sOrgVal As String, ByRef sVal As String)
      ' Look at the egghead article for using arrays as a string builder.
      ' Since this sub is not part of the article's speed evaluation, I didn't bother.
           sOrgVal = sOrgVal & sVal
End Sub

Public Sub DBClearParameters()
   
   Dim nCnt
   On Error Resume Next
  
   If Me.ADOCom.Parameters.Count < 1 Then Exit Sub
   For nCnt = 0 To Me.ADOCom.Parameters.Count
       Me.ADOCom.Parameters.Delete 0
   Next
  Err.Clear
End Sub

Public Sub StringBuilderInit()
     On Error Resume Next
     Erase msStringBuilderArray
     Me.StringBuilderInit
     Err.Clear
     mlStringBuilderGrowthRate = 50: mlStringBuilderNumItems = 0
     ReDim msStringBuilderArray(mlStringBuilderGrowthRate)
End Sub

Public Sub StringBuilderAppend(ByVal strValue As String)
       If mlStringBuilderNumItems > UBound(msStringBuilderArray) Then 
        ReDim Preserve msStringBuilderArray(UBound(msStringBuilderArray)
		          + mlStringBuilderGrowthRate)
       end if
       msStringBuilderArray(mlStringBuilderNumItems) = 
	            strValue: mlStringBuilderNumItems = mlStringBuilderNumItems + 1
End Sub

 

Public Function StringBuilderConcat()
           ReDim Preserve msStringBuilderArray(mlStringBuilderNumItems)
           StringBuilderConcat = Join(msStringBuilderArray, "")
End Function
 

SQL Server Table Schemas And Stored Procedures

 
   

CREATE TABLE [dbo].[EggHeadTest1] (
	[RecordID] [int] IDENTITY (1, 1) NOT NULL ,
	[Phone] [nvarchar] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
	[FName] [nvarchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
	[LName] [nvarchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
	[RecordData] [nvarchar] (500) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
	[LastPieceOfRecord] [nvarchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
	[RecordNum] [int] NOT NULL ,
	[LastUpdateTime] [datetime] NULL 
) ON [PRIMARY]
GO

CREATE TABLE [dbo].[EggHeadTest1Trans] (
	[TransID] [int] IDENTITY (1, 1) NOT NULL ,
	[RecordID] [int] NOT NULL ,
	[DispositionID] [int] NOT NULL ,
	[CallDateTime] [datetime] NULL ,
	[LastUpdateTime] [datetime] NOT NULL 
) ON [PRIMARY]
GO

CREATE TABLE [dbo].[EggHeadTest2] (
	[RecordID] [int] IDENTITY (1, 1) NOT NULL ,
	[Phone] [nvarchar] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
	[FName] [nvarchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
	[LName] [nvarchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
	[RecordData] [nvarchar] (500) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
	[LastPieceOfRecord] [nvarchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
	[RecordNum] [int] NOT NULL ,
	[LastUpdateTime] [datetime] NULL 
) ON [PRIMARY]
GO

CREATE TABLE [dbo].[EggHeadTest2Trans] (
	[TransID] [int] IDENTITY (1, 1) NOT NULL ,
	[RecordID] [int] NOT NULL ,
	[DispositionID] [int] NOT NULL ,
	[CallDateTime] [datetime] NULL ,
	[LastUpdateTime] [datetime] NOT NULL 
) ON [PRIMARY]
GO

ALTER TABLE [dbo].[EggHeadTest1] WITH NOCHECK ADD 
	CONSTRAINT [PK_EggHeadTest1] PRIMARY KEY  CLUSTERED 
	(
		[RecordID]
	)  ON [PRIMARY] 
GO

ALTER TABLE [dbo].[EggHeadTest1Trans] WITH NOCHECK ADD 
	CONSTRAINT [DF_EggHeadTest1Trans_DispositionID] DEFAULT (0) FOR [DispositionID],
	CONSTRAINT [PK_EggHeadTest1Trans] PRIMARY KEY  CLUSTERED 
	(
		[TransID]
	)  ON [PRIMARY] 
GO

ALTER TABLE [dbo].[EggHeadTest2] WITH NOCHECK ADD 
	CONSTRAINT [PK_EggHeadTest2] PRIMARY KEY  CLUSTERED 
	(
		[RecordID]
	)  ON [PRIMARY] 
GO

ALTER TABLE [dbo].[EggHeadTest2Trans] WITH NOCHECK ADD 
	CONSTRAINT [DF_EggHeadTest2Trans_DispositionID] DEFAULT (0) FOR [DispositionID],
	CONSTRAINT [PK_EggHeadTest2Trans] PRIMARY KEY  CLUSTERED 
	(
		[TransID]
	)  ON [PRIMARY] 
GO




CREATE PROCEDURE dbo.spEggHeadTest1
 (
  @Phone nvarchar(10), 
  @FName nvarchar(50), 
  @LName nvarchar(50), 
  @RecordData nvarchar(500), 
  @LastPieceOfRecord nvarchar(50),
  @RecordNum int
) AS

DECLARE @rc int
DECLARE @NewID int
 
BEGIN TRANSACTION EggHeadTest

    insert EggHeadTest1 (Phone,FName,LName,RecordData,LastPieceOfRecord,
               RecordNum,LastUpdateTime)
      values  (@Phone,@FName,@LName,@RecordData,@LastPieceOfRecord,
               @RecordNum,GetDate())
  
       if (@@ERROR <> 0)                
           BEGIN
              ROLLBACK TRANSACTION EggHeadTest
                 select @rc = @@Error
                 Goto OnExit   
          END

    SELECT @NewID = @@IDENTITY
  
    insert EggHeadTest1Trans (RecordID,DispositionID,
                             CallDateTime,LastUpdateTime)
        values (@NewID,0,null,GetDate())

   if (@@ERROR <> 0)                
         BEGIN
              ROLLBACK TRANSACTION EggHeadTest
                 select @rc = @@Error
                 Goto OnExit   
          END

   COMMIT TRANSACTION EggHeadTest
          
                         

OnExit:
  RETURN @rc
GO
SET QUOTED_IDENTIFIER OFF 
GO
SET ANSI_NULLS ON 
GO

SET QUOTED_IDENTIFIER OFF 
GO
SET ANSI_NULLS OFF 
GO

CREATE PROCEDURE dbo.spEggHeadTest2
 (
  @RecLength int,
  @StartRecord int,
  @CurRecordCnt int,
  @RecordData ntext
) AS

   
DECLARE @rc int
DECLARE @TotRecCnt int
DECLARE @NewID int
DECLARE @nCnt int
DECLARE @RecStart int
DECLARE @TmpData varchar(8000)
 
Select @TotRecCnt = @StartRecord - 1
Select @RecStart = 1
Select @nCnt = 1

 BEGIN TRANSACTION EggHeadTest 
 
WHILE (@nCnt <= @CurRecordCnt)
        BEGIN  
          
           Select @TotRecCnt = @TotRecCnt + 1      
		      /*  Increment our record count for the whole file */
           Select @TmpData = SUBSTRING(@RecordData,@RecStart,@RecLength)

   insert EggHeadTest2 (Phone,FName,LName,RecordData,
                 LastPieceOfRecord,RecordNum,LastUpdateTime)
   values  (SUBSTRING(@TmpData,1,10),SUBSTRING(@TmpData,11,10),
              SUBSTRING(@TmpData,21,10),
                SUBSTRING(@TmpData,31,64),
            SUBSTRING(@TmpData,95,4),@TotRecCnt, GetDate())
    
       if (@@ERROR <> 0)                
           BEGIN
               ROLLBACK TRANSACTION EggHeadTest 
                select @rc = @@Error 
                 Goto OnExit   
          END

    SELECT @NewID = @@IDENTITY
   
    insert EggHeadTest2Trans (RecordID,DispositionID,
                 CallDateTime,LastUpdateTime)
        values (@NewID,0,null,GetDate())

   if (@@ERROR <> 0)                
         BEGIN
          ROLLBACK TRANSACTION EggHeadTest 
          select @rc = @@Error  
           Goto OnExit   
          END





           Select @nCnt = @nCnt + 1 
           Select @RecStart  = @RecStart + @RecLength

        END   /* End of loop  */
 

     COMMIT TRANSACTION EggHeadTest  
          
                         

OnExit:
  RETURN @rc
GO
SET QUOTED_IDENTIFIER OFF 
GO
SET ANSI_NULLS ON 
GO
  
 

This entry was posted in Uncategorized. Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s