Launching Matillion Data Loader

Introducing Matillion Data Loader to load data from GCP into Snowflake

As you might know, I manage the Data & Analytics unit at Pong. One of my responsibilities is setting the direction of this unit, and one of our focus points is Modern Cloud Analytics. We have therefore set up a few partnerships focused on the Snowflake ecosystem. You can find out more about these partnerships here (unfortunately only in Dutch, but Google Translate might help a bit). Matillion is one of those partners from the Snowflake ecosystem.

A few weeks ago I found out that Matillion is launching Matillion Data Loader (MDL), a free Software-as-a-Service (SaaS) Data Integration solution.

“Code-free, Wizard-based Pipeline builder”

With Matillion Data Loader you can:

  1. Easily and quickly load data into a Cloud Data Warehouse (Snowflake, Google BigQuery or Amazon Redshift) using a Code-free, Wizard-based Pipeline builder,
  2. Use the Scale & Speed capabilities of a Serverless environment,
  3. Monitor jobs from a central dashboard,
  4. Upgrade to Matillion ETL for Advanced Data Transformation, Data Integration, and Job Orchestration.

I was one of the lucky ones to receive an invite to play around with Matillion Data Loader. Getting started was relatively easy:

  1. Sign up and activate my account,
  2. Start building Data Pipelines,
  3. Start loading data into Snowflake.

From Source to Cloud Data Warehouse in a few clicks

In this blog I will show you my first steps on the Matillion Data Loader path. I will use MDL to load data from an Excel sheet into Snowflake (SF). The Excel sheet is located in a Google Cloud Platform (GCP) bucket.

GCP Cloud Credentials

Matillion Data Loader must be able to select the Excel sheet from the GCP bucket, so it is necessary to specify the Google Cloud Credentials. These can be downloaded from GCP: follow the steps in the Getting Started with Authentication guide and download the JSON file to your local environment. In MDL you can manage the GCP Cloud Credentials and upload the JSON file.
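If you want to sanity-check the downloaded key before uploading it to MDL, you can try reaching the file with it yourself. Below is a minimal sketch using the google-cloud-storage Python library; the bucket and file names are hypothetical examples, not anything MDL prescribes:

```python
# Verify that the downloaded service account key can read the bucket.
# "my-mdl-bucket" and "channels.xlsx" are hypothetical example names.
from google.cloud import storage

client = storage.Client.from_service_account_json("service-account.json")
bucket = client.bucket("my-mdl-bucket")
blob = bucket.blob("channels.xlsx")

print("Key can see the file:", blob.exists())  # True means the credentials work
```

If this prints True, the same JSON file should work once uploaded to MDL.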

Snowflake Destination Configuration

To be able to load data into Snowflake, you need to make sure that MDL can connect to Snowflake with credentials that have sufficient privileges (e.g. the 'CREATE VIEW' privilege in SF).
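What those grants look like depends on your setup; below is a sketch of the kind of statements an admin might run via snowflake-connector-python. The role, database and schema names are examples matching this post, not MDL requirements:

```python
# Grant a dedicated role the privileges MDL needs in the target schema.
# Account, user and object names are placeholders/examples.
import snowflake.connector

conn = snowflake.connector.connect(
    account="xy12345.eu-west-1",  # your Snowflake account identifier
    user="ADMIN_USER",
    password="***",
)
cur = conn.cursor()
for stmt in [
    "GRANT USAGE ON DATABASE DAANALYTICS TO ROLE MDL_ROLE",
    "GRANT USAGE ON SCHEMA DAANALYTICS.STAGING TO ROLE MDL_ROLE",
    "GRANT CREATE TABLE, CREATE VIEW ON SCHEMA DAANALYTICS.STAGING TO ROLE MDL_ROLE",
]:
    cur.execute(stmt)
conn.close()
```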

The two endpoints are now in place: there is a GCP bucket where the Excel file lives and to which we have access, and at the other end a Snowflake Database with a Schema where we are allowed to create a table and load data into. What's left is creating an MDL Pipeline.

Matillion Data Loader Pipeline

We need to create a Data Pipeline to get started. For the purpose of this blog, we choose an Excel sheet as the Source. Snowflake will be the Target Destination Warehouse.

Add the Pipeline

We select Snowflake as the Destination Warehouse, which means we can select the Destination Configuration we created earlier. The Pipeline gets a meaningful name.

Google Cloud Storage

In the next step we need to provide the URI of the GCP bucket. When you copy the URI, it seems to include a space at the beginning. If you do not remove this space when you enter the Connection URL in Matillion Data Loader, you will get an error: "That URL is not a valid GS connection URL". Copy the URI to the next step in the MDL Data Pipeline creation.
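A tiny guard against exactly this copy/paste issue, in plain Python; the URI below is an example:

```python
# Strip the sneaky leading space and validate the gs:// prefix
# before pasting the URI into MDL. The URI below is an example.
def clean_gs_uri(uri: str) -> str:
    uri = uri.strip()  # removes leading/trailing whitespace
    if not uri.startswith("gs://"):
        raise ValueError(f"Not a valid GS connection URL: {uri!r}")
    return uri

print(clean_gs_uri(" gs://my-mdl-bucket/channels.xlsx"))
# -> gs://my-mdl-bucket/channels.xlsx
```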

Google Cloud Connection Details

To be able to load the Excel file from Google Cloud Storage, we need to specify the Connection details. This means we select the Google Storage Type. After that we can select the GCP Credentials we created earlier. Finally we fill in the URI copied from the previous step; make sure it has the right format and no space at the beginning.

Data Sources

Without specifying Connection Options, we go straight to the selection of the Data Sources. First click the '+' sign and then select the appropriate sheet. Make sure you give the sheet in Excel a meaningful name, because this name will be used to name the table in Snowflake.
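If you want to double-check the sheet names before this step, you can list them locally. A quick sketch with pandas (my tooling choice, not part of MDL); the file name is an example:

```python
# List the sheet names in the workbook; the selected sheet's name
# will be used to name the table in Snowflake.
import pandas as pd  # reading .xlsx files also requires openpyxl

sheets = pd.ExcelFile("channels.xlsx").sheet_names
print(sheets)  # e.g. ['Channels']
```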

Data Selection

In the following step, you select the columns you want to add to the table in Snowflake. By default all columns are selected, including the empty ones.

Snowflake Configuration

We have arrived at the fifth step in the Data Pipeline setup. Here we tell Snowflake which table to create by specifying a prefix; combined with the sheet name, this will be the Target Table name. On top of that we specify which Warehouse to use and in which Database and Schema we are going to create the table and load the data.
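As an illustration of that naming convention: the target table name is the prefix combined with the sheet name. The exact casing and separator handling below is my assumption, not documented MDL behaviour:

```python
# Hypothetical illustration: prefix + sheet name = target table name.
def target_table(prefix: str, sheet_name: str) -> str:
    return f"{prefix}{sheet_name}".upper()

print(target_table("STG_", "Channels"))  # -> STG_CHANNELS
```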

Verify the results in Snowflake

If everything works out as planned, the MDL Data Pipeline will run and load the data into Snowflake. We can head over there and verify the results. As we can see, there is a STG_CHANNELS table in the DAANALYTICS database, in the STAGING schema. The columns match the columns specified in the sheet, and the data is exactly the same as in the sheet, apart from two Matillion-specific metadata columns.
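You can do the same check programmatically. A minimal sketch with snowflake-connector-python, using the object names from this post (account and credentials are placeholders):

```python
# Count the rows MDL loaded into the staging table.
import snowflake.connector

conn = snowflake.connector.connect(
    account="xy12345.eu-west-1",  # placeholder account identifier
    user="MDL_USER",
    password="***",
    database="DAANALYTICS",
    schema="STAGING",
)
cur = conn.cursor()
cur.execute("SELECT COUNT(*) FROM STG_CHANNELS")
print("Rows loaded:", cur.fetchone()[0])
conn.close()
```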

Creating a Data Pipeline in Matillion Data Loader is relatively easy. It needs some preparation at both ends, but after that it is a matter of selecting and configuring, and most things speak for themselves. One downside of MDL is that you cannot reconfigure a Data Pipeline: if something is not working as expected, you need to start all over again. Luckily, creating Data Pipelines in Matillion Data Loader is not that hard.

Thanks for reading.

Daan Bakboord

Originally published at https://daanalytics.nl on February 29, 2020.
