Loading BigQuery into Snowflake with Matillion Data Loader

Last week I wrote about the newly released Matillion Data Loader (MDL). Matillion Data Loader is a Code-free, Wizard-based Pipeline builder. My first MDL Data Pipeline was loading an Excel sheet into Snowflake. Creating Data Pipelines in Matillion Data Loader is relatively easy and straightforward. Time to try something else. This time I will be loading BigQuery (GBQ) into Snowflake. Again I will show how to go from Source to Cloud Data Warehouse in a few clicks.

Add Pipeline

First add the Google Big Query Pipeline to select data from GBQ.

Destination

To be able to load data into Snowflake, you need to make sure that MDL can connect to Snowflake with the correct credentials and with enough privileges. Therefor you can re-use a Snowflake Destination Configuration if you created one earlier. I re-use the one I created for the previous blogpost.

Connection details

Google Big Query

If you want to load data from Google Big Query, then you should specify the following:

  • Google OAuth — As I wrote in my previous blogpost, you will need Google Cloud Platform Credentials to connect to Google BigQuery. Follow the steps in the; Getting Started with Authentication and download the JSON-file to your local environment. Manage the GCP Cloud Credentials in MDL and upload the JSON-file.
  • ProjectId — In this case; ‘daanalyticsproject’
  • DatasetId — In this case; ‘new_york_citibike’

Data Sources

By selecting DatasetId; ‘new_york_citibike’ I can select ‘citibike_stations’ and the columns in this Data Source.

Staging & Target Configuration

In this step I specify how the tables are named and created in Snowflake. Both a staging,- as well as a target table will be created. By specifying a prefix in combination with the Data Source Name, you will get the Snowflake Table name. On top of that we specify which Warehouse to use and in which Database Schema we are going to create the table and load the data.

Finish

Almost all steps are performed. In the last step I can check wether Matillion Data Loader is ready to retrieve rows from the citibike_stations table in Google Big Query. As you can see on the right clicking the ‘Test’-button resulted in a ‘Success’-message.

Verify the results in Snowflake

In MDL I received a message that 936 rows have been loaded into Snowflake. I can verify that in Snowflake. In Snowflake I can see that there are also 936 rows in the TGT_CITIBIKE_STATIONS-table.

Snowflake History

The Matillion Data Loader job performs a series of steps in Snowflake. These steps can be checked via the History-tab in Snowflake.

Summary

Just as in the previous blogpost creating a Data Pipeline in Matillion is relatively easy. Also for the Google Big Query Pipeline it needs some preparation at both ends. After that it is a matter of selecting and configuring. Most things speak for itself. One downside of MDL is that you cannot reconfigure a Data Pipeline. If something is not working like expected, you need to start all over again. Luckily enough creating Data Pipelines in Matillion Data Loader is not that hard.

Thanks for reading.

Get started with Matillion Data Loader: https://dataloader.matillion.com

Originally published at https://daanalytics.nl on March 8, 2020.

--

--

--

Cloud ☁️ Data & Analytics 📊 Engineer @ DaAnalytics | Manager Data & Analytics @ Pong | Snowflake ❄️ Data Superhero | Modern Cloud ☁️ Data Stack enthusiast

Love podcasts or audiobooks? Learn on the go with our new app.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Daan Bakboord

Daan Bakboord

Cloud ☁️ Data & Analytics 📊 Engineer @ DaAnalytics | Manager Data & Analytics @ Pong | Snowflake ❄️ Data Superhero | Modern Cloud ☁️ Data Stack enthusiast

More from Medium

Snowflake and ELVT vs [ELT|ETL] — Case Study Part 2, Real Time Availability for Single Row INSERTs

How To Generate an Audit Table with Python and Matillion ETL for Snowflake

Snowflake Micro-partitions, Data Clustering & Zero-copy Cloning

Snowflake: how to repartition staged data