Introduction to Snowflake: What is Snowflake and why is it so popular?
Last month I attended the first Snowflake Summit in San Francisco. In this post I will share what I learned at the Summit and give a short introduction to Snowflake. But first: what is Snowflake? One of the keynotes at the Snowflake Summit was “The origin Story of Snowflake” by Benoît Dageville (current President of Products).
I wrote a blog post about that keynote earlier: “The origin Story of Snowflake”. Don’t confuse the name Snowflake with the database modelling technique. Snowflake explains its name in “Behind the Snowflake Name”.
The founders of Snowflake wanted to solve three challenges. Firstly, analysing machine-generated (Big) Data, which can come in enormous volume (scale) and great variety (structure). Secondly, elasticity and compute on demand, with the simplicity of Software-as-a-Service (SaaS). Thirdly, keeping all the good things from the RDBMS (e.g. SQL).
Snowflake’s architecture is built on three important pillars on top of a cloud-agnostic layer for AWS, Azure and, later this year, GCP. Snowflake makes use of the specific capabilities of the chosen cloud. Nevertheless, it’s possible to switch clouds, so you won’t be locked into one cloud provider.
I will try to compare these components with a PC.
- Storage: the hard drive. No limit on storage, logically separated by databases.
- Compute: the CPU, the processing power of the machine. Different virtual warehouses per workload, which can be suspended and resumed automatically or programmatically.
- Service: basically the software that tells the computer what to do. The Service Layer is the Brain of the System.
Snowflake’s unique architecture is based on the physical separation of storage, compute and services. While these three layers are physically separated, they are designed to work together as one logical system: “Multi-Cluster, Shared Data”.
Data is centrally stored and optimised, so there are no data silos. Storage is in a Snowflake proprietary columnar format in the cloud’s blob storage (AWS S3, Azure Blob Storage or Google Cloud Storage). Snowflake manages how and where the data is stored, including data replication, scaling and availability. There is basically no limit on storage. On top of that, Snowflake can store and query all kinds of data, both structured and semi-structured (JSON, AVRO).
The cost of storage depends on the chosen cloud: the cloud provider’s storage costs are charged one-to-one (pass-through). Because Snowflake compresses the data, you save on storage costs.
Snowflake Compute can be seen as a set of independent MPP clusters, in Snowflake terms: Virtual Warehouses. Depending on the workload you can choose from various T-shirt sizes, from X-Small to 4X-Large. Virtual Warehouses work independently of each other and do not compete for resources. In other words, the performance of each workload (e.g. loading, querying or machine learning) is not affected by the others.
Compute is charged on a per-second basis with a minimum of 60 seconds. To keep costs under control, Snowflake can automatically or programmatically suspend and resume a Virtual Warehouse. In addition, you can scale up or down depending on demand.
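To make the billing rule a bit more concrete, here is a minimal Python sketch of per-second charging with a 60-second minimum. The function names and the credits-per-hour rate are my own assumptions for illustration; they are not part of any Snowflake API, and the rate is not official pricing.

```python
import math

MINIMUM_SECONDS = 60  # minimum charge for one run of a warehouse


def billed_seconds(run_seconds: float) -> int:
    """Seconds billed for one continuous run of a Virtual Warehouse."""
    if run_seconds <= 0:
        return 0
    # Round partial seconds up, then apply the 60-second minimum.
    return max(MINIMUM_SECONDS, math.ceil(run_seconds))


def credits_used(run_seconds: float, credits_per_hour: float) -> float:
    """Credits for one run; credits_per_hour is a hypothetical rate."""
    return billed_seconds(run_seconds) * credits_per_hour / 3600


if __name__ == "__main__":
    for seconds in (20, 60, 95):
        print(seconds, "seconds run ->", billed_seconds(seconds), "seconds billed")
```

A query that keeps a warehouse busy for only 20 seconds is still billed as 60 seconds, which is why a sensible auto-suspend setting matters for short, bursty workloads.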
Snowflake is a true Software as a Service offering. In Dutch, DWaaS may sound silly, but Snowflake is DataWarehouse as a Service. The “Brain of the System” takes care of a series of Cloud Services such as authentication, infrastructure management, metadata management, query parsing and optimization, and access control. There is no need for manual intervention; Snowflake takes care of it all.
Why love Snowflake?
Apart from the Snowflake capabilities mentioned above, there are some additional things that make you love Snowflake.
ANSI SQL support. SQL is the language to query data. Snowflake is capable of querying both structured and semi-structured data. The latter makes it possible to query JSON directly via the VARIANT datatype.
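In Snowflake SQL you reach into a VARIANT value with a path expression such as v:customer.name (the column name v and the keys are made up for this example). The Python sketch below is only a local stand-in that mimics this kind of path lookup on parsed JSON. It does not talk to Snowflake, and the helper get_path is hypothetical.

```python
import json
from typing import Any


def get_path(variant: Any, path: str) -> Any:
    """Walk a dotted path through parsed JSON; return None when a key is missing."""
    current = variant
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None  # a missing path yields NULL rather than an error
        current = current[key]
    return current


# Hypothetical JSON document, as it might be loaded into a VARIANT column.
raw = '{"customer": {"name": "Ada", "city": "Utrecht"}, "amount": 42}'
record = json.loads(raw)

print(get_path(record, "customer.name"))   # Ada
print(get_path(record, "customer.phone"))  # None
```

The point is that no schema has to be defined up front: the structure is discovered at query time, and a missing key simply gives an empty result.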
Zero-copy cloning. Create multiple copies of the data (e.g. for Test or QA) without extra storage costs; extra storage is only needed for data that changes after the clone is made.
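The idea behind zero-copy cloning can be sketched with a small conceptual model in Python. This is not how Snowflake is implemented internally; the Storage and Table classes are hypothetical and only illustrate that a clone copies metadata (references to immutable partitions) instead of the data itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Storage:
    """Holds immutable data partitions, each stored once and keyed by id."""
    partitions: Dict[int, List[str]] = field(default_factory=dict)
    next_id: int = 0

    def write(self, rows: List[str]) -> int:
        pid = self.next_id
        self.partitions[pid] = list(rows)
        self.next_id += 1
        return pid


@dataclass
class Table:
    """A table is only a list of partition ids (metadata)."""
    storage: Storage
    partition_ids: List[int] = field(default_factory=list)

    def insert(self, rows: List[str]) -> None:
        self.partition_ids.append(self.storage.write(rows))

    def clone(self) -> "Table":
        # Copy the metadata only; no partition data is duplicated.
        return Table(self.storage, list(self.partition_ids))

    def rows(self) -> List[str]:
        return [r for pid in self.partition_ids for r in self.storage.partitions[pid]]


storage = Storage()
prod = Table(storage)
prod.insert(["order-1", "order-2"])

test = prod.clone()
partitions_after_clone = len(storage.partitions)  # unchanged by the clone

test.insert(["order-3"])  # only the new data takes extra storage
```

In this model the clone is instant and free, and the production table never sees the rows that were added to the test copy.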
Last but not least, Snowflake is secure by design. All data is automatically encrypted. The available security options depend on the chosen Snowflake edition: Standard, Premier, Enterprise, Enterprise for Sensitive Data (ESD) or Virtual Private Snowflake (VPS).
If you want to discuss the architecture of Snowflake in more detail, please let me know. I am happy to talk it through.
Thanks for reading.
Originally published at https://daanalytics.nl on July 29, 2019.