Bulk Insert in .NET applications, part 1


This is the first of a 2 series articles about how to perform bulk inserts in your .NET applications, using a SQL Server database.

In this article I’ll show how to create a wrapper object for SqlBulkCopy that can do a bulk insert for a collection of objects. In the second article I’ll show how that wrapper can be easily integrated with Entity Framework (creating extension methods for DbContext and ObjectContext).

Table of contents

The problem

I am working in an application that parses Excel files and creates an object for each line. After some validations and properties mapping the objects are then being inserted in a database table, one by one. At the beggining this wasn’t a big problem because the number of objects to insert in the database was small, but now there can be files with thousands of rows, so using Entity Framework isn’t the best way to do it (currently Entity Framework has no support for bulk insert operations).

The solution

I decided to use the object SqlBulkCopy because it seemed the best and easiest option for performing a bulk insert in a SQL Server database. I created a wrapper for SqlBulkCopy – BulkCopy.cs. This class works the same way as SqlBulkCopy but has some extra features.

The following properties are available:

  • DestinationTableName: Name of the destination table on the server
  • BatchSize (optional): Number of rows in each batch. At the end of each batch, the rows in the batch are sent to the server
  • ConnectionString: Database connection string
  • ExpressionFilter: Filters the properties to be included

And these are the methods available (see examples below):

public void WriteToServer<T>(IEnumerable<T> items) where T : class;
public void WriteToServer<T>(IEnumerable<T> items, SqlBulkCopyOptions options) where T : class;
public void WriteToServer<T>(IEnumerable<T> items, SqlBulkCopyOptions options, IEnumerable<SqlBulkCopyColumnMapping> columnMappings) where T : class;

BulkCopy.cs source code (click to expand):

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data.SqlClient;
using System.Data;
using System.Linq;
using System.Linq.Expressions;
using System.Text;

using BulkCopy.Extensions;

namespace BulkCopy
{
    /// <summary>
    /// Lets you efficiently bulk load a SQL Server table with data from another source.
    /// This is a wrapper class for <see cref="SqlBulkCopy"/>
    /// </summary>
    public class BulkCopy
    {
        /// <summary>
        /// Name of the destination table on the server
        /// </summary>
        public string DestinationTableName { get; set; }

        /// <summary>
        /// Number of rows in each batch. 
        /// At the end of each batch, the rows in the batch are sent to the server.
        /// </summary>
        public int? BatchSize { get; set; }

        /// <summary>
        /// Database connection string
        /// </summary>
        public string ConnectionString { get; set; }

        /// <summary>
        /// Filters the properties to be included
        /// </summary>
        public Func<PropertyDescriptor, bool> ExpressionFilter { get; set; }

		
		
        /// <summary>
        /// Initializes a new instance of the <see cref="BulkCopy&lt;T&gt;"/> class.
        /// </summary>
        public BulkCopy()
        {
        }

        /// <summary>
        /// Initializes a new instance of the <see cref="BulkCopy&lt;T&gt;"/> class.
        /// </summary>
        public BulkCopy(string connectionString)
        {
            this.ConnectionString = connectionString;
        }

		
		
        /// <summary>
        /// Copies all items in a collection to a destination table
        /// </summary>
        /// <param name="dataTable">The items that will be copied to the destination table</param>
        /// <param name="options">A combination of values from the System.Data.SqlClient.SqlBulkCopyOptions 
        /// enumeration that determines which data source rows are copied to the destination table. <see cref="SqlBulkCopyOptions"/></param>
        public virtual void WriteToServer<T>(IEnumerable<T> items) where T : class
        {
            WriteToServer(items, SqlBulkCopyOptions.Default);
        }

        /// <summary>
        /// Copies all items in a collection to a destination table
        /// </summary>
        /// <param name="dataTable">The items that will be copied to the destination table</param>
        /// <param name="options">A combination of values from the System.Data.SqlClient.SqlBulkCopyOptions 
        /// enumeration that determines which data source rows are copied to the destination table. <see cref="SqlBulkCopyOptions"/></param>
        public virtual void WriteToServer<T>(IEnumerable<T> items, SqlBulkCopyOptions options) where T : class
        {
            DataTable dataTable = (this.ExpressionFilter == null) ? items.ToDataTable() : items.ToDataTable(this.ExpressionFilter);

            WriteToServer(dataTable, options);
        }

        /// <summary>
        /// Copies all items in a collection to a destination table
        /// </summary>
        /// <param name="dataTable">The items that will be copied to the destination table</param>
        /// <param name="options">A combination of values from the System.Data.SqlClient.SqlBulkCopyOptions 
        /// enumeration that determines which data source rows are copied to the destination table. <see cref="SqlBulkCopyOptions"/></param>
        /// <param name="columnMappings">Returns a collection of System.Data.SqlClient.SqlBulkCopyColumnMapping items. 
        /// Column mappings define the relationships between columns in the data source and columns in the destination.</param>
        public virtual void WriteToServer<T>(IEnumerable<T> items, SqlBulkCopyOptions options, IEnumerable<SqlBulkCopyColumnMapping> columnMappings) where T : class
        {
            DataTable dataTable = (this.ExpressionFilter == null) ? items.ToDataTable() : items.ToDataTable(this.ExpressionFilter);

            WriteToServer(dataTable, options, columnMappings);
        }

        /// <summary>
        /// Copies all rows in the supplied System.Data.DataTable to a destination table
        /// </summary>
        /// <param name="dataTable">A System.Data.DataTable whose rows will be copied to the destination table</param>
        private void WriteToServer(DataTable dataTable)
        {
            WriteToServer(dataTable, SqlBulkCopyOptions.Default);
        }

        /// <summary>
        /// Copies all rows in the supplied System.Data.DataTable to a destination table
        /// </summary>
        /// <param name="dataTable">A System.Data.DataTable whose rows will be copied to the destination table</param>
        /// <param name="options">A combination of values from the System.Data.SqlClient.SqlBulkCopyOptions 
        /// enumeration that determines which data source rows are copied to the destination table. <see cref="SqlBulkCopyOptions"/></param>
        private void WriteToServer(DataTable dataTable, SqlBulkCopyOptions options)
        {
            var columnMappings = from x in dataTable.Columns.Cast<DataColumn>()
                                 select new SqlBulkCopyColumnMapping(x.ColumnName, x.ColumnName);

            WriteToServer(dataTable, options, columnMappings);
        }

        /// <summary>
        /// Copies all rows in the supplied System.Data.DataTable to a destination table
        /// </summary>
        /// <param name="dataTable">A System.Data.DataTable whose rows will be copied to the destination table</param>
        /// <param name="options">A combination of values from the System.Data.SqlClient.SqlBulkCopyOptions 
        /// enumeration that determines which data source rows are copied to the destination table. <see cref="SqlBulkCopyOptions"/></param>
        /// <param name="columnMappings">Returns a collection of System.Data.SqlClient.SqlBulkCopyColumnMapping items. 
        /// Column mappings define the relationships between columns in the data source and columns in the destination.</param>
        private void WriteToServer(DataTable dataTable, SqlBulkCopyOptions options, IEnumerable<SqlBulkCopyColumnMapping> columnMappings)
        {
            // table name matching:
            // checks for DestinationTableName value
            // if null or empty, checks for dataTable.TableName
            string destinationTableName =
                (string.IsNullOrWhiteSpace(DestinationTableName) ? null : DestinationTableName)
                ?? (string.IsNullOrWhiteSpace(dataTable.TableName) ? null : dataTable.TableName);

            if(string.IsNullOrWhiteSpace(destinationTableName))
                throw new ArgumentException("destinationTableName cannot be null or empty");

            using(var bulkCopy = new SqlBulkCopy(this.ConnectionString, options))
            {
                bulkCopy.DestinationTableName = destinationTableName;

                if(this.BatchSize.HasValue)
                    bulkCopy.BatchSize = this.BatchSize.Value;

                foreach(var mapping in columnMappings)
                    bulkCopy.ColumnMappings.Add(mapping);

                bulkCopy.WriteToServer(dataTable);
            }
        }
    }
}

The BulkCopy object is using internally some extension methods that converts a collection of objects to a DataTable. (taken from SO’s post Generic List to DataTable, with some small modifications).

Click to expand the source code:

using System;
using System.Collections.Generic;
using System.Collections.Specialized;
using System.Linq;
using System.Linq.Expressions;
using System.Text;
using System.Data;
using System.ComponentModel;

namespace BulkCopy.Extensions
{
    public static class DataExtensions
    {
        /// <summary>
        /// Basic data types 
        /// </summary>
        private static Type[] dataTypes = new[] {
            typeof(byte)
            ,typeof(sbyte)
            ,typeof(short)
            ,typeof(ushort)
            ,typeof(int)
            ,typeof(uint)
            ,typeof(long)
            ,typeof(ulong)
            ,typeof(float)
            ,typeof(double)
            ,typeof(decimal)
            ,typeof(bool)
            ,typeof(char)
            ,typeof(Guid)
            ,typeof(DateTime)
            ,typeof(DateTimeOffset)
            ,typeof(byte[])
            ,typeof(string)
        };

        /// <summary>
        /// Converts a generic List to a DataTable
        /// <see cref="http://stackoverflow.com/a/5805044"/>
        /// </summary>
        /// <typeparam name="T">Type of the object to convert to DataTable</typeparam>
        /// <param name="data">Data to be converted</param>
        /// <returns>The converted DataTable</returns>
        public static DataTable ToDataTable<T>(this IList<T> data)
        {
            IEnumerable<PropertyDescriptor> properties = from x in TypeDescriptor.GetProperties(typeof(T)).Cast<PropertyDescriptor>()
                                                         where IsBasicType(x.PropertyType)
                                                         select x;

            DataTable table = GetDataTable(data, properties);
            return table;
        }

        /// <summary>
        /// Converts a generic List to a DataTable
        /// <see cref="http://stackoverflow.com/a/5805044"/>
        /// </summary>
        /// <typeparam name="T">Type of the object to convert to DataTable</typeparam>
        /// <param name="data">Data to be converted</param>
        /// <returns>The converted DataTable</returns>
        public static DataTable ToDataTable<T>(this IList<T> data, Func<PropertyDescriptor, bool> expression)
        {
            var properties = TypeDescriptor.GetProperties(typeof(T))
                .Cast<PropertyDescriptor>()
                .Where(expression);

            DataTable table = GetDataTable(data, properties);
            return table;
        }

        /// <summary>
        /// Converts an IEnumerable to a DataTable
        /// <see cref="http://stackoverflow.com/a/5805044"/>
        /// </summary>
        /// <typeparam name="T">Type of the object to convert to DataTable</typeparam>
        /// <param name="data">Data to be converted</param>
        /// <returns>The DataTable</returns>
        public static DataTable ToDataTable<T>(this IEnumerable<T> data)
        {
            return data.ToList().ToDataTable();
        }

        /// <summary>
        /// Converts an IEnumerable to a DataTable
        /// <see cref="http://stackoverflow.com/a/5805044"/>
        /// </summary>
        /// <typeparam name="T">Type of the object to convert to DataTable</typeparam>
        /// <param name="data">Data to be converted</param>
        /// <param name="expression">Predicate to filter the properties of <typeparamref name="T"/> to be included to the DataTable</param>
        /// <returns>The DataTable</returns>
        public static DataTable ToDataTable<T>(this IEnumerable<T> data, Func<PropertyDescriptor, bool> expression)
        {
            return data.ToList().ToDataTable(expression);
        }

        #region Private methods

        private static bool IsBasicType(Type type)
        {
            type = Nullable.GetUnderlyingType(type) ?? type;

            return type.IsEnum || dataTypes.Contains(type);
        }

        private static DataTable GetDataTable<T>(this IList<T> data, IEnumerable<PropertyDescriptor> mappedProperties)
        {
            DataTable table = new DataTable();

            // columns
            foreach(PropertyDescriptor prop in mappedProperties)
            {
                table.Columns.Add(prop.Name, Nullable.GetUnderlyingType(prop.PropertyType) ?? prop.PropertyType);
            }

            // row values
            foreach(T item in data)
            {
                DataRow row = table.NewRow();

                foreach(PropertyDescriptor prop in mappedProperties)
                {
                    object value = prop.GetValue(item) ?? DBNull.Value;
                    row[prop.Name] = value;
                }

                table.Rows.Add(row);
            }

            return table;
        }

        #endregion
    }
}

Using the code

Assuming a table named dbo.Albums (I’m using Mvc Music Store model):

1. Basic usage

This example shows how to use a model object whose property names match the database column names from the table above:

public class Album
{
    public virtual int AlbumId { get; set; }
    public virtual int GenreId { get; set; }
    public virtual int ArtistId { get; set; }
    public virtual string Title { get; set; }
    public virtual decimal Price { get; set; }
    public virtual string AlbumArtUrl { get; set; }
}

All you need to do is to specify the connection string and the destination table:

IEnumerable<Album> data = GetData();

var bulkCopy = new BulkCopy() {
    ConnectionString = ConnectionString,
    DestinationTableName = "dbo.Albums"
};

bulkCopy.WriteToServer(data);

2. Specifying batch size and bulk options

Using the model from the previous example, you can specify some other options:

IEnumerable<Album> data = GetData();

var bulkCopy = new BulkCopy() {
    BatchSize = 200,
    ConnectionString = ConnectionString,
    DestinationTableName = "dbo.Albums"
};

// SqlBulkCopyOptions.CheckConstraints: Check constraints while data is being inserted. 
// By default, constraints are not checked.
bulkCopy.WriteToServer(data, SqlBulkCopyOptions.CheckConstraints);

3. Using column mappings

This example shows how to use a model object whose property names don’t match the database column names:

public class Album
{
    public virtual int Id { get; set; }
    public virtual int GenreId { get; set; }
    public virtual int ArtistId { get; set; }
    public virtual string AlbumTitle { get; set; }
    public virtual decimal Price { get; set; }
    public virtual string AlbumArtImage { get; set; }
}

In this case we have to create the column mappings. It can be done like this:

private static IEnumerable<SqlBulkCopyColumnMapping> GetColumnMappings()
{
    yield return new SqlBulkCopyColumnMapping("Id", "AlbumId");
    yield return new SqlBulkCopyColumnMapping("GenreId", "GenreId");
    yield return new SqlBulkCopyColumnMapping("ArtistId", "ArtistId");
    yield return new SqlBulkCopyColumnMapping("AlbumTitle", "Title");
    yield return new SqlBulkCopyColumnMapping("Price", "Price");
    yield return new SqlBulkCopyColumnMapping("AlbumArtImage", "AlbumArtUrl");
}

Finally, the bulk insert can be done this way:

IEnumerable<Album> data = GetData();
IEnumerable<SqlBulkCopyColumnMapping> mappings = GetColumnMappings();

var bulkCopy = new BulkCopy() {
    ConnectionString = ConnectionString,
    DestinationTableName = "dbo.Albums"
};

bulkCopy.WriteToServer(data,SqlBulkCopyOptions.Default, mappings);

4. Filtering properties to be copied

This example shows how to filter the properties of a model object to be used in the bulk insert:

public partial class Album
{
    public virtual int AlbumId { get; set; }
    public virtual int GenreId { get; set; }
    public virtual int ArtistId { get; set; }
    public virtual string Title { get; set; }
    public virtual decimal Price { get; set; }
    public virtual string AlbumArtUrl { get; set; }
    public virtual string P1 { get; set; }
    public virtual string P2 { get; set; }
}

Properties P1 and P2 don’t match any column of the table above, so they cannot be used in the bulk insert operation. Creating a filter to exclude those properties and using it can be done like this:

// properties to exclude from mapping
var nonMappedProperties = new string[] { "P1", "P2" };

Func<PropertyDescriptor, bool> expression = x => 
    !nonMappedProperties.Contains(x.Name);

IEnumerable<Album> data = GetData();

var bulkCopy = new BulkCopy() {
    BatchSize = 200,
    ConnectionString = ConnectionString,
    DestinationTableName = "dbo.Albums",
    ExpressionFilter = expression
};

bulkCopy.WriteToServer(data, SqlBulkCopyOptions.CheckConstraints);

That’s it! In the next article I’ll show you how to integrate easily this wrapper class in Entity Framework (creating extension methods for DbContext and ObjectContext).

References

Downloads

Download the demo project (VS2010): BulkCopy-part1.zip

Technorati Tags: , , ,
About these ads

7 thoughts on “Bulk Insert in .NET applications, part 1

  1. Yeah… I must say: Thanks you very much.
    I’ve looked for translator from webservice-response to sqlbulkcopy object. and your solution meet my expectation. Hey, I was searching for it in several days.. and very happy founding this article.
    Anyway, with minor modification, the project is running well in VS2008.

      • oh yes, VS2008 did not understand the “string.IsNullOrWhiteSpace”, so I replace all of it occurrence with “string.IsNullOrEmpty”.
        ————————————————————
        in *.csproj, I replace this :

        Debug
        x86
        8.0.30703

        with this:

        Debug
        x86
        9.0.30729

        and delete these lines:
        ” ”

        False
        Microsoft .NET Framework 4 Client Profile %28×86 and x64%29
        true

        —————————————————————

        And in file *.sln, I replace this:
        “Microsoft Visual Studio Solution File, Format Version 11.00
        # Visual Studio 2010″
        with this:
        “Microsoft Visual Studio Solution File, Format Version 10.00
        # Visual Studio 2008″
        ————————————————————-

        and everything are now going well. :)

      • oops, my prior reply contains xml tag, and it seem your wordpress-comment delete some part of it. it is now didnt similar as It written by my.
        Actually I just compare the BulkCopy.sln with any of my local *.sln, also for Bulkcopy.csproj. Without modifiying that, VS2008 reports about the solution was created by the Visual studio that is newer version and can not recognize the files. :)

  2. Hi Rui,

    Thanks for the great post. However, I miss the second part! I could REALLY use that wrapper and see how you implemented it! Could you at least upload the code to the article?

    Thanks a lot man!

  3. Hey, I really enjoyed this article, did you figure out how to integrate this wrapper with Entity Framework? I would like to see your implementation.

    • Hi Justin,

      Yes I found a way to integrate it with EF.
      I’ve been quite busy in the last months, I didn’t have the time yet to create the 2nd part of the article.

      Basically you need to find a way to get the connection for the DbContext or ObjectContext used, and how to get the mapped table name for a given entity. You can then use the class that I created in this article.

      Hope it helps!

      Regards,
      Rui

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s