The Origin of Snowflake. Simply load and query data

Benoît Dageville (current President of Products) and Thierry Cruanes (current CTO) are the founders of Snowflake. What was the idea when they started Snowflake? Benoît and Thierry had a clear and simple vision: “Simply load and query data”. In his keynote, Benoît shared the Origin of Snowflake and walked us through the Snowflake Secret Sauce. Below are some notes from my side summarising his talk.

Simply load and query data

In August 2012 there were 3 challenges when it came to loading and querying data. The founders wanted Snowflake to solve them.

Architecture

Traditional architectures are based on a single cluster. This is often a bottleneck because a single cluster is not elastic and cannot scale. The Snowflake architecture should therefore be multi-cluster: any number of compute clusters should be able to query the same data independently, without competing for resources. Data should be shared and centralised. Above all, no data silos.

  • Secure Data Share: As a provider, create a secure view on your data that consumers can select from. Hence Snowflake's announcement of the Data Exchange.
  • Serverless capabilities like Snowpipe
  • Micro-Partitioning: Created automatically at runtime (background re-clustering service). Micro-partitions are small, columnar, and hold structured or semi-structured data. A Partition Map Index finds the partitions relevant to your query. Challenges: blob storage is immutable, and performance.
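
To make the Partition Map Index idea a bit more concrete, here is a minimal Python sketch. It is not Snowflake's actual implementation. It assumes that each micro-partition records the minimum and maximum value of one column, so a query with a range filter can skip partitions that cannot contain matching rows. The names MicroPartition, partition_map and prune are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MicroPartition:
    """Hypothetical stand-in for the metadata of one immutable micro-partition."""
    name: str
    min_value: int  # smallest value of the filter column in this partition
    max_value: int  # largest value of the filter column in this partition


def prune(partitions, low, high):
    """Keep only the partitions whose [min, max] range overlaps [low, high]."""
    return [p for p in partitions if p.max_value >= low and p.min_value <= high]


# Assumed metadata for three partitions of the same table.
partition_map = [
    MicroPartition("p1", 1, 100),
    MicroPartition("p2", 101, 200),
    MicroPartition("p3", 201, 300),
]

# A filter such as "value BETWEEN 150 AND 220" only needs p2 and p3.
relevant = prune(partition_map, 150, 220)
print([p.name for p in relevant])  # ['p2', 'p3']
```

Because only metadata is consulted, the decision which partitions to read can be made at query time without touching the immutable data files themselves.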

The Snowflake Approach

Try to eliminate these problems by solving them at runtime, because at that point you know which data the workload actually uses. Normally you would have to set up the following yourself:

  • physical design (index, partitioning, etc.)
  • query tuning (statistics, workload management, etc.)

What’s next?

Now that we know the Origin of Snowflake, the question is: what’s next?

  • All the data: Including non-structured data and all types of access.
  • Real-Time: The end-to-end latency, from the moment the data is born to the moment you can see it in Snowflake, should be as small as possible.
  • Beyond SQL: An extensible framework to overcome the challenges SQL cannot solve.
  • Data Services: Data Sharing plus added value, such as enriching data and potentially sharing it back.

Data & Analytics 📊 Consultant @daanalytics_nl & Partner @pongprof & Board Member @nl_OUG | Modern ☁️ Analytics #Oracle #Snowflake #Looker #Matillion #Fivetran
