
Integrating Snowflake With Trino

Trino makes accessing Snowflake simple, highlighting its versatility in delivering robust data management and analytics for modern enterprises.

By Artem Ervits
DZone Core · Mar. 18, 24 · Tutorial

In this article, we delve into accessing Snowflake via the Trino project, walking through the integration of Trino with Snowflake step by step and discussing its benefits and implications.

Previous Articles

Previous articles on Snowflake and Trino:

  • SaaS Galore: Integrating CockroachDB with Confluent Kafka, Fivetran, and Snowflake
  • Data federation with CockroachDB and Presto

Motivation

A common question among potential adopters of Snowflake is whether it plays well with on-premises data and cloud platforms like Azure. In this article, we address that question head-on, exploring how to access Snowflake alongside on-premises data through the Trino project. Let's unravel the possibilities together.

High-Level Steps

  • Deploy Trino in Docker
  • Get a trial Snowflake account
  • Connect the dots
  • Conclusion

Step-By-Step Instructions

Navigating the landscape of data integration can be daunting, especially when weighing the compatibility of Snowflake with on-premises environments. In this tutorial, we simplify the process by using a Docker environment to simulate on-premises conditions. The approach prioritizes simplicity, relying on standard Snowflake configuration and a basic Trino Docker setup. Consult the documentation for your specific scenario, but let's begin with the fundamentals.

Deploy Trino in Docker

I have a compose file called compose-trino.yaml with the following contents:

services:

  trino:
    container_name: trino
    hostname: trino
    build: trino/.
    ports:
      - "8080:8080"
    environment:
      - _JAVA_OPTIONS=-Dfile.encoding=UTF-8
    volumes:
      - ./trino/catalog:/etc/trino/catalog
      - ./trino/etc:/etc/trino
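
Before building anything, you can optionally sanity-check the Compose file by asking Docker Compose to render the resolved configuration (a generic Compose feature, not specific to this setup):

docker compose -f compose-trino.yaml config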


In the current directory, I have a folder called trino. Within that folder, I have a Dockerfile with the following contents:

FROM trinodb/trino:442
LABEL version="1.0"
LABEL description="trino container"
ENV REFRESHED_AT 2024_03_15


I also have two more folders called etc and catalog.
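
Putting the pieces together (and matching the volume mounts in the Compose file above), the project layout ends up looking roughly like this; the individual files are described below:

.
├── compose-trino.yaml
└── trino
    ├── Dockerfile
    ├── catalog
    │   └── snowflake.properties
    └── etc
        ├── config.properties
        ├── jvm.config
        └── node.properties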

Within the catalog directory, I've set up a snowflake.properties file with the following contents:

connector.name=snowflake
connection-url=jdbc:snowflake://<account>.snowflakecomputing.com
connection-user=root
connection-password=secret
snowflake.account=account
snowflake.database=database
snowflake.role=role
snowflake.warehouse=warehouse


If you encounter any hurdles along the way, don't hesitate to refer to the Trino Snowflake connector documentation. Let's dive in!

Once you set up the Snowflake environment (covered below), you can replace these placeholder values with your own.

Within the etc directory, I have a jvm.config with the following contents:

--add-opens=java.base/java.nio=ALL-UNNAMED
-Djdk.module.illegalAccess=permit


These particular JDK flags are Snowflake-specific.

I also have config.properties with the following contents:

coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
discovery.uri=http://example.net:8080


And finally, node.properties with the following contents:

node.environment=production
node.id=ffffffff-ffff-ffff-ffff-ffffffffffff
node.data-dir=/tmp/trino/data


With everything in place, you're ready to bring up the Compose environment. Execute the following command to start it: docker compose -f compose-trino.yaml up -d.

Upon successful configuration, you should observe a running container named trino. You can confirm this by executing the command: docker ps.

f426506aa443   snowflake-docker-trino   "/usr/lib/trino/bin/…"   53 minutes ago   Up 47 minutes (healthy)   0.0.0.0:8080->8080/tcp   trino


If you encounter any issues, you can further troubleshoot by examining the Trino logs using the following command: docker logs trino.
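
On a successful startup, the log output typically ends with a line containing SERVER STARTED; a quick way to check for it (assuming a POSIX shell on the host) is:

docker logs trino 2>&1 | grep "SERVER STARTED"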

You can access the Trino container with the following command:

docker exec -it trino trino


Once logged in, you can verify the correct configuration of the Snowflake catalog by executing the following command:

trino> show catalogs;
  Catalog  
-----------
 snowflake 
 system    


For the next phase of this tutorial, sign up for a Snowflake trial account. Opt for the Standard edition, since we won't be using any Enterprise features. During the signup process, I selected the Azure eastus2 region for my Snowflake deployment.

Upon completing the signup, you'll receive a verification email. Once verified, you'll gain access to your Snowflake environment. Retrieve the necessary details from the email sent by Snowflake, particularly the credentials, and populate the snowflake.properties file located in the trino/catalog directory.

Trial Snowflake account

Connect the Dots

Snowflake provides a variety of demo tutorials, including the Tasty Bytes series. For this tutorial, we'll focus on the "Load sample data with SQL from S3 bucket" worksheet. Alternatively, feel free to select a dataset of your preference.

---> set the Role
USE ROLE accountadmin;

---> set the Warehouse
USE WAREHOUSE compute_wh;

---> create the Tasty Bytes Database
CREATE OR REPLACE DATABASE tasty_bytes_sample_data;

---> create the Raw POS (Point-of-Sale) Schema
CREATE OR REPLACE SCHEMA tasty_bytes_sample_data.raw_pos;

---> create the Raw Menu Table
CREATE OR REPLACE TABLE tasty_bytes_sample_data.raw_pos.menu
(
    menu_id NUMBER(19,0),
    menu_type_id NUMBER(38,0),
    menu_type VARCHAR(16777216),
    truck_brand_name VARCHAR(16777216),
    menu_item_id NUMBER(38,0),
    menu_item_name VARCHAR(16777216),
    item_category VARCHAR(16777216),
    item_subcategory VARCHAR(16777216),
    cost_of_goods_usd NUMBER(38,4),
    sale_price_usd NUMBER(38,4),
    menu_item_health_metrics_obj VARIANT
);

---> confirm the empty Menu table exists
SELECT * FROM tasty_bytes_sample_data.raw_pos.menu;

---> create the Stage referencing the Blob location and CSV File Format
CREATE OR REPLACE STAGE tasty_bytes_sample_data.public.blob_stage
url = 's3://sfquickstarts/tastybytes/'
file_format = (type = csv);

---> query the Stage to find the Menu CSV file
LIST @tasty_bytes_sample_data.public.blob_stage/raw_pos/menu/;

---> copy the Menu file into the Menu table
COPY INTO tasty_bytes_sample_data.raw_pos.menu
FROM @tasty_bytes_sample_data.public.blob_stage/raw_pos/menu/;

---> how many rows are in the table?
SELECT COUNT(*) AS row_count FROM tasty_bytes_sample_data.raw_pos.menu;

---> what do the top 10 rows look like?
SELECT TOP 10 * FROM tasty_bytes_sample_data.raw_pos.menu;

---> what menu items does the Freezing Point brand sell?
SELECT 
   menu_item_name
FROM tasty_bytes_sample_data.raw_pos.menu
WHERE truck_brand_name = 'Freezing Point';

---> what is the profit on Mango Sticky Rice?
SELECT 
   menu_item_name,
   (sale_price_usd - cost_of_goods_usd) AS profit_usd
FROM tasty_bytes_sample_data.raw_pos.menu
WHERE 1=1
AND truck_brand_name = 'Freezing Point'
AND menu_item_name = 'Mango Sticky Rice';

---> to finish, let's extract the Mango Sticky Rice ingredients from the semi-structured column
SELECT 
    m.menu_item_name,
    obj.value:"ingredients"::ARRAY AS ingredients
FROM tasty_bytes_sample_data.raw_pos.menu m,
    LATERAL FLATTEN (input => m.menu_item_health_metrics_obj:menu_item_health_metrics) obj
WHERE 1=1
AND truck_brand_name = 'Freezing Point'
AND menu_item_name = 'Mango Sticky Rice';


We have a dataset in Snowflake, so let's now pivot back to Trino and access the Snowflake data from there.

If your Compose environment is currently running but still lacks essential configuration such as snowflake.database, snowflake.warehouse, or other pertinent Snowflake properties, stop the environment first. Ensure these properties are configured appropriately, then restart the Compose environment and continue with the integration.

docker compose -f compose-trino.yaml down


Back in the snowflake.properties file, change the properties to:

connection-user=snowflakeuser
connection-password=snowflakepassword
snowflake.database=tasty_bytes_sample_data
snowflake.role=accountadmin
snowflake.warehouse=compute_wh


Restart the environment and access the Trino shell.

Within the Trino shell, type:

use snowflake.raw_pos;


Since the Snowflake catalog is already configured with the tasty_bytes_sample_data database, we can omit the database name from the fully qualified table name. Pick any of the queries from the Snowflake worksheet above and try running them in the Trino container.
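
For example, the row-count query from the worksheet can be written either relative to the current schema or fully qualified; the snowflake catalog name simply comes from the name of the properties file in trino/catalog:

-- relative to the catalog and schema set by USE snowflake.raw_pos
SELECT COUNT(*) AS row_count FROM menu;

-- fully qualified, works without any USE statement
SELECT COUNT(*) AS row_count FROM snowflake.raw_pos.menu;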

trino:raw_pos> SELECT COUNT(*) AS row_count FROM raw_pos.menu;
 row_count 
-----------
       100 
(1 row)

Query 20240315_185131_00013_45g27, FINISHED, 1 node
Splits: 1 total, 1 done (100.00%)
0.84 [1 rows, 0B] [1 rows/s, 0B/s]
trino:raw_pos> SELECT 
            ->    menu_item_name
            -> FROM raw_pos.menu
            -> WHERE truck_brand_name = 'Freezing Point';
   menu_item_name   
--------------------
 Lemonade           
 Sugar Cone         
 Waffle Cone        
 Two Scoop Bowl     
 Bottled Water      
 Bottled Soda       
 Ice Tea            
 Ice Cream Sandwich 
 Mango Sticky Rice  
 Popsicle           
(10 rows)

Query 20240315_185212_00015_45g27, FINISHED, 1 node
Splits: 1 total, 1 done (100.00%)
1.23 [10 rows, 0B] [8 rows/s, 0B/s]


Indeed, accessing Snowflake datasets using Trino from our local environment demonstrates the flexibility and interoperability of these tools. This integration allows us to seamlessly work with data across different platforms, enhancing our analytical capabilities and workflow efficiency.

Additionally, you can access the Trino UI via http://localhost:8080. With the default configuration, no password is required, and the username is set to admin. By navigating to the "finished queries" section, you can review the queries you've executed, providing valuable insights into your workflow and facilitating debugging if needed. This feature enhances visibility and transparency into your data operations within the Trino environment.
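
If you prefer to stay in the CLI, similar information is available from Trino's built-in system catalog; for example, the following query (showing only a few of the available columns) lists recent queries and their state:

SELECT query_id, state, query FROM system.runtime.queries LIMIT 5;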

Trino cluster overview

Conclusion

Trino and its commercial version, Starburst, are potent tools for federating data across different sources. This article showcases how easily Snowflake can be accessed using local tools with Trino. The synergy between Snowflake and Trino offers a robust solution for data management and analytics, empowering organizations to leverage cloud data warehousing and distributed query processing for enhanced insights.


Published at DZone with permission of Artem Ervits. See the original article here.

Opinions expressed by DZone contributors are their own.
