Jan 28, 2012

Import data from Excel to SQLServer

Many times we come across a situation where we have readily available excel sheets having enormous data that needs to be imported in database system for enhanced querying capabilities or as a backend datasource for some sort of software development.


We could achieve this goal either by copy-pasting the excel rows directly to the destined database table in SQL Server Management Studio or by querying directly on the excel sheet from SQL Server Management Studio itself!


The former way (copy-pasting) looks pretty simple but consider a situation where you are having 10K records in a single excel sheet and wanted to fetch some selective/filtered records only! At that time, the earlier way is much efficient and ideal.


I've created this sample excel file to import/retrieve the data in SQL server, and is also the file I will be using in this article. Here is the screenshot for the same.


Excel sheet - Cars


OPENROWSET() to query excel files


Here, I've used the OPENROWSET() function to query excel files. This is a T-SQL function that can be used to access any OLE DB data source. All you need is the right OLE DB driver.


The following are the queries to retrieve data from Excel files.


--Excel 2007-2010
SELECT * --INTO #Cars
FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0',
'Excel 12.0 Xml;HDR=YES;Database=C:\cars.xlsx','SELECT * FROM [cars$]');

--Excel 97-2003
SELECT * --INTO #Cars
FROM OPENROWSET('Microsoft.ACE.OLEDB.4.0',
'Excel 8.0;HDR=YES;Database=C:\cars.xls','SELECT * FROM [cars$]');


These queries return excel data in the results window when executing with SQL Server Management Studio. To insert the data into a table, just uncomment the INTO clause. That will create a local temporary table with the retrieved data in the tempdb database.


Here, HDR=YES means the 1st row in the excel sheet is a column name. If that's not the case set HDR = No instead YES.


Note : If you receive an error message in executing the query, i've given some possible errors with their respective solution below.


If you've executed uncommented INTO clause query then you can query the temporary table (here, it's Cars table) just like any other normal sql table.

SELECT * FROM #Cars


CHECK DATA TYPES OF TEMP TABLE (#Cars)



Let's have a look how such temporary tables are structured when created using OPENROWSET(). Execute the following command in query window.



USE tempdb;
GO
sp_help '#Cars'


The following screenshot shows the data types of each field in the  #Cars  temp table.


Structure of Temporary Table #Cars in SQL Server
I discovered that even with the formatted cells in the excel file the derived type and length of the columns in the temporary table is solely based on SQL Server's own algorithm.


I've formatted ProductId column in the Excel sheet as a number with zero decimal places. Even though, it is stored as a float type in the temporary table!


If you wish to avoid such behavior, it's always better to import data directly in SQL tables that are created already with the desired data-types.


QUERY TO IMPORT DATA DIRECTLY IN SQL TABLES



--Excel 2007-2010
INSERT INTO DATABASE.SCHEMA.TABLENAME
SELECT * FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0',
'Excel 12.0 Xml;HDR=YES;Database=C:\cars.xlsx','SELECT * FROM [cars$]');

--Excel 97-2003
INSERT INTO DATABASE.SCHEMA.TABLENAME
SELECT * FROM OPENROWSET('Microsoft.ACE.OLEDB.4.0',
'Excel 8.0;HDR=YES;Database=C:\cars.xls','SELECT * FROM [cars$]');


It's up to you to decide where to store records. You can store either in  Temporary table or in SQL Table with the desired table structure and types.


Possible Issues


Enable 'AD HOC DISTRIBUTED QUERIES'


AD HOC DISTRIBUTED QUERIES option must be enabled on the server to use the OPENROWSET() function. If it's not enabled on your server then you will get the following error message.

Msg 15281, Level 16, State 1, Line 1
SQL Server blocked access to STATEMENT ‘OpenRowset/OpenDatasource’ of component ‘Ad Hoc Distributed Queries’ because this component is turned off as part of the security configuration for this server. A system administrator can enable the use of ‘Ad Hoc Distributed Queries’ by using sp_configure. For more information about enabling ‘Ad Hoc Distributed Queries’, see “Surface Area Configuration” in SQL Server Books Online.
You can enable this option through SQL Server Surface Area Configuration tool (Start > All Programs > Microsoft SQL Server > Configuration Tools > SQL Server Surface Area Configuration > Surface Area Configuration for Features). See the following screenshot.
Enabling ADHOC Distributed queries


OLE DB DRIVER IS NOT INSTALLED


OPENROWSET() function needs OLE DB drivers installed. If the right driver is not installed you will see error something like following one.
Msg 7302, Level 16, State 1, Line 1Cannot create an instance of OLE DB provider “Microsoft.ACE.OLEDB.12.0″ for linked server “(null)”.
To check the driver is installed or not open ODBC Data Source Administrator (Start > Run > type ODBCAD32.EXE and hit enter) and check under the Drivers tab. The following screenshot shows the both drivers JET 4.0 for Excel 97-2003 and new ACE Driver for Excel 2007.




In the absence of required drivers you can download the same from the Microsoft's site.
Excel 97-2003 JET 4.0 driver
Excel 2007 ACE driver
Excel 2010 ACE driver (beta)

That's it! Have fun! :)

The Evolution of Programming World