DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Zones

Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks

Enterprise AI Trend Report: Gain insights on ethical AI, MLOps, generative AI, large language models, and much more.

2024 Cloud survey: Share your insights on microservices, containers, K8s, CI/CD, and DevOps (+ enter a $750 raffle!) for our Trend Reports.

PostgreSQL: Learn about the open-source RDBMS' advanced capabilities, core components, common commands and functions, and general DBA tasks.

AI Automation Essentials. Check out the latest Refcard on all things AI automation, including model training, data security, and more.

Related

  • Accelerating Insights With Couchbase Columnar
  • Which Tool Is Better for Code Completion — Azure Data Studio or dbForge SQL Complete?
  • Microsoft Azure Backup Service
  • Amazon RDS vs Azure SQL — Know the Key Differentiators and Choose the Best

Trending

  • Navigating the Digital Frontier: A Journey Through Information Technology Progress
  • RRR Retro and IPL for Rewards and Recognition
  • Minimum Viable Elevator [Comic]
  • Harnessing the Power of SIMD With Java Vector API
  1. DZone
  2. Data Engineering
  3. Databases
  4. Data Store Options for Operational Analytics/Data Engineering

Data Store Options for Operational Analytics/Data Engineering

Using Managed SQL server instance in Azure with Column-store indexes for better performing Data Analytics and Transaction Processing.

By 
Nagendran Sathananda Manidas user avatar
Nagendran Sathananda Manidas
·
Dec. 21, 23 · Opinion
Like (1)
Save
Tweet
Share
3.0K Views

Join the DZone community and get the full member experience.

Join For Free

In this article, we will delve into essential concepts within the domain of analytics databases, conducting a comparative analysis of the available offerings for Azure SQL databases based on these foundational principles. Let’s look at some key concepts before we delve into the data storage options in Azure.

Clustered Index

  • A B-Tree clustered index organizes rows physically in memory in sorted order, automatically created when the primary key is established.
  • The key advantage of a clustered index lies in the swift searching of a range of values. Internally utilizing a B-Tree data structure, the leaf node of the B-Tree clustered index contains the actual table data.
  • It is important to note that only one clustered index can be created for a table.

Non-Clustered Index 

  • A non-clustered index also employs a B-Tree data structure, with the distinction that the leaf node of the B-Tree or non-clustered index contains pointers to the pages containing the actual table data.
  • Unlike a clustered index, a non-clustered index does not organize rows physically in memory in a sorted order.
  • Importantly, it is permissible to create more than one non-clustered index for a table.

Clustered Column Store

Clustered column-store storage involves organizing all data in a table in a columnar format, significantly compressing the data and facilitating rapid execution of analytical queries and reports. Depending on the data characteristics, data size may be reduced by a factor of 10x to 100x. The clustered column-store model excels in the quick ingestion of substantial data volumes (bulk-load) as large batches exceeding 100,000 rows undergo compression before storage on disk. This model is particularly well-suited for classic data warehouse scenarios. 

Non-Clustered Column-Store

In the non-clustered column-store model, data is stored in the traditional row-store table, supplemented by an index in column-store format specifically designed for analytical queries. This configuration supports Hybrid Transactional-Analytic Processing (HTAP), allowing the seamless execution of real-time analytics alongside transactional workloads. Operational queries (OLTP) are processed on the row-store table, optimized for accessing a limited set of rows, while analytical queries (OLAP) leverage the column-store index, preferred for scans and complex analytics. The query optimizer dynamically selects between row-store and column-store formats based on the nature of the query. Importantly, non-clustered column-store indexes maintain the original data size, preserving the dataset in the original row-store table unaltered. However, the size of the additional column-store index is typically orders of magnitude smaller than an equivalent B-tree index. 

 

Data stored as rows vs data stored as columns

When and Where Should We Use Clustered Column-Store Index?

The Clustered Column-store index primarily targets analytics workloads. The table below shows the common scenarios that can be achieved using this technology.

Column-store Option

Workload

Compression

Clustered Column-store Index

  • Traditional Datawarehouse workload with Star or Snowflake schema: Commonly enable CCI on the FACT table but keep DIMENSION tables with row-store with PAGE compression.
     Additional Considerations: consider CCI for large dimension tables with > 1 million rows
  • Insert mostly workload: Many workloads such as insert large volume of data with minimal updates/deletes. These workloads can benefit with huge data compression as well as speed up of analytic queries.

 

10x on average

CCI/NCI (with one or more nonclustered indexes)

  • Similar to the ones mentioned with CCI but require (a) PK/FK enforcements (b) significant number of queries with equality predicate or short-range queries. NCIs speed up the query performance by avoiding full table scans (c) update/delete deletes of rows which can be efficiently located using NCIs.

10X on average + additional storage for NCIs

 

The selection of databases in Azure depends on the nature of the workload, the specific features offered by each Azure database service, and the performance characteristics required for analytics engineering tasks. Azure provides a range of SQL database services with different capabilities to cater to diverse use cases in the analytics domain.  

SQL Virtual Machines (IAAS) 

SQL virtual machines provide comprehensive administrative control over both the SQL Server instance and the underlying operating system during the migration to Azure. Key capabilities include:

  • Full access to SQL Server and the operating system.
  • Extensive support for various versions of SQL Server and the operating system.
  • Automated manageability features tailored for SQL Server.
  • This offering allows for the flexibility to choose the desired SQL Server version, compute resources, and storage options based on specific requirements.

Azure SQL Managed Instances: (PAAS)

SQL Managed Instance offers SQL Server access and feature compatibility, providing a seamless option for deploying on the infrastructure of your preference and facilitating the migration of SQL from on-premises to the cloud. Key capabilities include:

  • Deployment flexibility on the infrastructure of your choice.
  • Built-in management features for enhanced control.
  • Native support for virtual networks.
  • Fully managed as a service, reducing operational overheads.

Azure SQL Database (Platform as a Service or Database as a Service)

This flavor of database services is ideally suited for modern, cloud-native applications that demand a fully managed database with consistent performance. Key capabilities include:

  • Hyperscale storage, supporting up to 100TB of data.
  • Serverless computing, offering flexibility and cost savings.
  • Simplified management for increased operational efficiency.

Deployment Options:

  • Single Database: Enables swift deployment for individual database needs.
  • Elastic Pools: Facilitates resource sharing among a group of databases, reducing costs and streamlining management for specific applications.
  • Hyperscale: Allows databases to scale beyond the 4 TB limit of a single database, ideal for large-scale data requirements.
  • Serverless: Cost-effective option with auto-pause functionality for non-production workloads that do not require continuous database access.

Query Performance Benchmark (Provided by Microsoft)

Key point to note below is that with clustered columnstore index, the example query runs 5x faster on P1 compared to the same query running on P15 with rowstore with no tuning.  This can significantly lower the cost you need to pay to meet your workload requirements.

Pricing Tier

With Rowstore

With Columnstore

Performance Gains

P1

30.6 secs

4.2 secs

14x

P15

19.5 secs

0.319 secs

60x

 

Storage Size: The storage savings with column-store compared to PAGE or NONE compressed tables are shown below. While the cost of storage is already included with AzureDB, but lower storage can enable you to choose a lower tier. Note that this is generated test data, so the compression is lower than one would get for customer workloads.

 

Number of Rows

Size Rowstore (MB)

 

Size columnstore (MB)

Savings

3626191

212 (PAGE compression)

 

120

1.8x

3626191

756 (NONE compression)

 

120MB

6.2x

 

The best part of columnstore index technology is that it does not require any changes to your application. All you need to do is to either create or replace an existing index with columnstore index on your table(s). 

In conclusion, Azure SQL Database stands out as the optimal choice for operational analytics, offering a tailored solution that leverages columnar storage indices with robust transaction support. This strategic combination not only ensures efficient storage and retrieval of data but also provides a foundation for seamless transactional operations, making it the ideal platform for organizations seeking enhanced performance and analytical capabilities.

Analytics Database azure clustered sql Tree (data structure)

Opinions expressed by DZone contributors are their own.

Related

  • Accelerating Insights With Couchbase Columnar
  • Which Tool Is Better for Code Completion — Azure Data Studio or dbForge SQL Complete?
  • Microsoft Azure Backup Service
  • Amazon RDS vs Azure SQL — Know the Key Differentiators and Choose the Best

Partner Resources


Comments

ABOUT US

  • About DZone
  • Send feedback
  • Community research
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 100
  • Nashville, TN 37211
  • support@dzone.com

Let's be friends: