This page provides you with instructions on how to extract data from Microsoft SQL Server and load it into Snowflake. (If this manual process sounds onerous, check out Stitch, which can do all the heavy lifting for you in just a few clicks.)
What is Microsoft SQL Server?
Microsoft SQL Server is a relational database management system developed by Microsoft. Its primary function is storing and retrieving data for applications, whether they run on a single machine or across a network (including the web). Folks use Microsoft SQL Server because it supports the .NET framework out of the box and integrates nicely into the Microsoft ecosystem.
What is Snowflake?
Snowflake is a data warehouse solution that is entirely cloud based. It's a managed service: if you don't want to deal with hardware, software, or upkeep for a data warehouse, you're going to love Snowflake. It runs on the wicked fast Amazon Web Services architecture, using EC2 for compute and S3 for storage. Snowflake is designed to be flexible and easy to work with where other relational databases are not. One example of this is query execution. Snowflake creates virtual warehouses where query processing takes place. These virtual warehouses run on separate compute clusters, so querying one virtual warehouse doesn't slow down the others. If you have ever had to wait for a query to complete, you know the value of speed and efficiency in query processing.
Getting data out of SQL Server
Depending on your particular needs and abilities, there are a few methods of getting data out of MSSQL.
The default for most folks who work with databases will be to use queries for extraction. The main benefit here is the flexibility you get by putting together SELECT statements. You can filter, sort, and limit to your heart's content. This can be especially useful if you care only about retrieving a certain subset of data from your MSSQL database.
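As a minimal sketch of this approach, here's a small helper that assembles a filtered, sorted, limited T-SQL SELECT statement. The table and column names (`dbo.orders`, `order_id`, and so on) are hypothetical stand-ins; in a real script you'd run the resulting statement through a driver such as pyodbc.

```python
def build_extract_query(table, columns=None, where=None, order_by=None, limit=None):
    """Assemble a T-SQL SELECT statement for extracting a subset of a table."""
    cols = ", ".join(columns) if columns else "*"
    # T-SQL uses TOP rather than LIMIT to cap the row count
    top = f"TOP {int(limit)} " if limit else ""
    query = f"SELECT {top}{cols} FROM {table}"
    if where:
        query += f" WHERE {where}"
    if order_by:
        query += f" ORDER BY {order_by}"
    return query

# Example: only orders placed since 2023, newest first, capped at 1,000 rows
print(build_extract_query(
    "dbo.orders",
    columns=["order_id", "customer_id", "total"],
    where="order_date >= '2023-01-01'",
    order_by="order_date DESC",
    limit=1000,
))
# SELECT TOP 1000 order_id, customer_id, total FROM dbo.orders WHERE order_date >= '2023-01-01' ORDER BY order_date DESC
```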
If you need to export data in bulk, you can use Microsoft SQL Server Management Studio, which enables you to export entire tables and databases in formats like text or CSV, or as SQL queries that can restore the database if run.
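If you're exporting from a script rather than from Management Studio, CSV is the format you'll most likely want, since Snowflake loads it readily. Here's a minimal sketch that turns fetched rows into CSV text; the sample column names and rows are made up, standing in for whatever your extraction query returns.

```python
import csv
import io

def rows_to_csv(column_names, rows):
    """Serialize query results as CSV text, with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(column_names)
    writer.writerows(rows)
    return buf.getvalue()

# Hypothetical result set from an extraction query
csv_text = rows_to_csv(
    ["order_id", "customer_id", "total"],
    [[1, 42, "19.99"], [2, 7, "5.00"]],
)
```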
Preparing data for Snowflake
Depending on the structure your data is in, you may need to prepare it for loading. Take a look at the supported data types for Snowflake and make sure that the data you've got will map neatly to them. If you have a lot of data, you should compress it. Gzip, bzip2, Brotli, Zstandard (v0.8 and higher), and deflate/raw deflate compression types are all supported.
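Gzip is the easiest of those options to produce from a script. As a quick sketch, this compresses a CSV export to a .csv.gz file (the file name and contents here are illustrative):

```python
import gzip
import os
import tempfile

csv_text = "order_id,customer_id,total\n1,42,19.99\n2,7,5.00\n"

# Write the export as a gzip-compressed file, ready to stage into Snowflake
path = os.path.join(tempfile.gettempdir(), "orders.csv.gz")
with gzip.open(path, "wt", encoding="utf-8") as f:
    f.write(csv_text)

# Sanity check: the data round-trips intact
with gzip.open(path, "rt", encoding="utf-8") as f:
    assert f.read() == csv_text
```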
One important thing to note here is that you don't need to define a schema in advance when loading JSON data into Snowflake. Onward to loading!
Loading data into Snowflake
The most common route is a two-step bulk load: upload your data files to a Snowflake stage with the PUT command, then run COPY INTO to move the staged files into the target table. For small volumes you could generate INSERT statements instead, but COPY INTO is far faster for anything sizable.
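As a sketch of what those two steps look like, here are helpers that build the PUT and COPY INTO statements for a table's internal stage. The table name and file path are placeholders; actually executing these statements would require a Snowflake connection (for example via the snowflake-connector-python package), which is out of scope here.

```python
def put_statement(local_path, table):
    """Stage a local file into the table's internal stage (@%table)."""
    return f"PUT file://{local_path} @%{table}"

def copy_statement(table):
    """Load all staged files for the table, skipping the CSV header row."""
    return (
        f"COPY INTO {table} FROM @%{table} "
        "FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)"
    )

# Hypothetical file and table names
print(put_statement("/tmp/orders.csv.gz", "orders"))
print(copy_statement("orders"))
```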
Keeping SQL Server data up to date
All set! You've written a script to move data from MSSQL into your data warehouse. What happens when there is new data in your MSSQL server that needs to be added to your data warehouse? Data freshness is one of the most important aspects of any analysis.
You could load the entire MSSQL database again. This is almost guaranteed to be slow and painful, causing all kinds of latency.
The key here is to build your script to recognize new and updated records in the source database. Using an auto-incrementing field as a primary key is a great way to accomplish this. The key functions something like a bookmark, so your script can resume where it left off. When you've built in this functionality, you can set up your script as a cron job or continuous loop to get new data as it appears in MSSQL.
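The bookmark idea above can be sketched in a few lines. This example uses an in-memory list of dicts as a stand-in for the MSSQL table; a real script would instead run something like SELECT ... WHERE id > ? against the database and persist the bookmark between runs.

```python
def extract_new_rows(rows, bookmark):
    """Return rows whose id exceeds the bookmark, plus the updated bookmark."""
    fresh = [r for r in rows if r["id"] > bookmark]
    new_bookmark = max((r["id"] for r in fresh), default=bookmark)
    return fresh, new_bookmark

# Stand-in for the source table, with an auto-incrementing id column
table = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Bo"}]

batch, bookmark = extract_new_rows(table, 0)   # first run: everything is new
table.append({"id": 3, "name": "Cy"})          # new row arrives in MSSQL
batch, bookmark = extract_new_rows(table, bookmark)  # next run picks up only id 3
```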
Easier and Faster Alternatives
If all this sounds a bit overwhelming, don’t be alarmed. And even if you have all the skills necessary to go through this process, chances are building and maintaining a script like this isn’t a very high-leverage use of your time.
Thankfully, products like Stitch were built to solve this problem automatically. With just a few clicks, Stitch starts extracting your Microsoft SQL Server data via the API, structuring it in a way that is optimized for analysis, and inserting that data into your Snowflake data warehouse.