Microsoft SQL Server to Snowflake

This page provides you with instructions on how to extract data from Microsoft SQL Server and load it into Snowflake. (If this manual process sounds onerous, check out Stitch, which can do all the heavy lifting for you in just a few clicks.)

What is Microsoft SQL Server?

Microsoft SQL Server is a relational database management system developed by Microsoft. Its primary function is storing and retrieving data for applications, whether they run on a single machine or across a network such as the web. Folks use Microsoft SQL Server because it supports the .NET framework out of the box and integrates nicely into the Microsoft ecosystem.

About Snowflake

Snowflake is a data warehouse solution that is entirely cloud-based. It's a managed service: if you don't want to deal with hardware, software, or upkeep for a data warehouse, you're going to love Snowflake. It runs on the wicked fast Amazon Web Services architecture, using EC2 for compute and S3 for storage. Snowflake is designed to be flexible and easy to work with in ways that many other relational databases are not. One example is query execution: Snowflake creates virtual warehouses where query processing takes place. Each virtual warehouse runs on its own compute cluster, so a heavy query in one warehouse doesn't slow down the others. If you have ever had to wait for a query to complete, you know the value of speed and efficiency in query processing.

Getting data out of SQL Server

Depending on your particular needs and abilities, there are a few methods of getting data out of MSSQL.

The default for most folks who work with databases will be to use queries for extraction. The main benefit here is the flexibility that comes with putting together SELECT statements: you can filter, sort, and limit to your heart's content. This is especially useful if you only care about retrieving a certain subset of the data in your MSSQL database.
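For instance, an extraction query might look something like this — the schema, table, and column names here are hypothetical, just to show the shape of it:

    -- Hypothetical T-SQL extraction query; sales.orders and its
    -- columns are illustrative names, not a real schema.
    SELECT order_id,
           customer_id,
           order_total,
           created_at
    FROM sales.orders
    WHERE created_at >= '2020-01-01'   -- filter to just the rows you need
    ORDER BY order_id;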

If you need to export data in bulk, you can use Microsoft SQL Server Management Studio, which lets you export entire tables and databases as delimited text, CSV files, or SQL scripts that can re-create the database when run.

Preparing data for Snowflake

Depending on the structure your data is in, you may need to prepare it for loading. Take a look at the supported data types for Snowflake and make sure the data you've got will map neatly to them. If you have a lot of data, you should compress it; Snowflake supports gzip, bzip2, Brotli, Zstandard (v0.8 and higher), and deflate/raw deflate compression.
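As a rough sketch of the type-mapping step, here's how a hypothetical SQL Server table might translate into a Snowflake table. The names are made up, and the type choices are common conventions rather than hard rules:

    -- Hypothetical SQL Server source table:
    --   CREATE TABLE sales.orders (
    --     order_id    INT IDENTITY PRIMARY KEY,
    --     customer_id INT NOT NULL,
    --     order_total DECIMAL(10,2),
    --     created_at  DATETIME2
    --   );

    -- A Snowflake table with corresponding types:
    CREATE TABLE orders (
      order_id    INTEGER,
      customer_id INTEGER,
      order_total NUMBER(10,2),
      created_at  TIMESTAMP_NTZ  -- DATETIME2 carries no time zone, so NTZ fits
    );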

One important thing to note here is that you don't need to define a schema in advance when loading JSON data into Snowflake.
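That's because Snowflake can land raw JSON in a VARIANT column and let you pick fields out at query time. A minimal sketch, assuming a made-up table and stage:

    -- Land raw JSON in a single VARIANT column; no schema needed up front.
    CREATE TABLE raw_events (payload VARIANT);

    COPY INTO raw_events
    FROM @my_stage/events/              -- hypothetical stage path
    FILE_FORMAT = (TYPE = 'JSON');

    -- Individual fields can be extracted later, at query time:
    SELECT payload:customer_id::INTEGER AS customer_id
    FROM raw_events;

Onward to loading!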

Loading data into Snowflake

There is a good reference for this step in the Data Loading Overview section of the Snowflake documentation. If you don't have much data to load, you might be able to use the data loading wizard in the Snowflake web UI, but chances are its limitations will make it a non-starter as a reliable ETL solution. There are two main steps to getting data into Snowflake:

  • Use the PUT command to stage files
  • Use the COPY INTO table command to load the staged data into the waiting target table (both commands are sketched below)
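In SnowSQL, those two steps look roughly like this. The file path and table name are placeholders, and @%orders is the table's built-in stage:

    -- Step 1: stage a local file (PUT runs from the SnowSQL client,
    -- not the web UI). The path and table stage here are hypothetical.
    PUT file:///tmp/orders.csv.gz @%orders;

    -- Step 2: load the staged file into the waiting table.
    COPY INTO orders
    FROM @%orders
    FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1 COMPRESSION = 'GZIP');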

For the COPY step, you'll have the option of copying from your local drive or from Amazon S3. One of Snowflake's slick features lets you create a virtual warehouse dedicated to powering the insertion process.
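Spinning up such a warehouse is a one-liner plus a few options; the name and size below are arbitrary choices, not requirements:

    -- Hypothetical warehouse dedicated to loading; size it to the job.
    CREATE WAREHOUSE IF NOT EXISTS load_wh
      WAREHOUSE_SIZE = 'SMALL'
      AUTO_SUSPEND = 300      -- suspend after 5 idle minutes to save credits
      AUTO_RESUME = TRUE;

    USE WAREHOUSE load_wh;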

Keeping SQL Server data up to date

All set! You've written a script to move data from MSSQL into your data warehouse. What happens when there is new data in your MSSQL server that needs to be added to your data warehouse? Data freshness is one of the most important aspects of any analysis.

You could load the entire MSSQL database again each time, but that's almost guaranteed to be slow and painful, and it delays how quickly fresh data reaches your warehouse.

The key here is to build your script to recognize new and updated records in the source database. An auto-incrementing primary key works well for spotting new rows, and a last-modified timestamp column catches updates. Either one functions like a bookmark, so your script can resume where it left off. Once you've built in this functionality, you can set up your script as a cron job or continuous loop to pick up new data as it appears in MSSQL.
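On the SQL Server side, the incremental pull boils down to remembering the bookmark from the last run. A hypothetical sketch, using the same made-up orders table as above:

    -- @last_seen_id is the bookmark your script persisted after the
    -- previous run; the value here is just an example.
    DECLARE @last_seen_id INT = 421567;

    SELECT order_id, customer_id, order_total, created_at
    FROM sales.orders
    WHERE order_id > @last_seen_id   -- only rows added since the last run
    ORDER BY order_id;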

Easier and Faster Alternatives

If all this sounds a bit overwhelming, don’t be alarmed. If you have all the skills necessary to go through this process, chances are building and maintaining a script like this isn’t a very high-leverage use of your time.

Thankfully, products like Stitch were built to solve this problem automatically. With just a few clicks, Stitch starts extracting your Microsoft SQL Server data, structuring it in a way that is optimized for analysis, and inserting that data into your Snowflake data warehouse.