
Database in Fintech: How To Support 10,000 Dashboards Without Causing a Mess

This article introduces the lifecycle of financial metrics in a database, from how they're produced to how they're efficiently presented in data reports.

By Frank Z · Oct. 11, 2023 · Tutorial


In a data-intensive industry like finance, data flows in from numerous entry points and out to numerous exits. This status quo easily, and almost inevitably, leads to chaos in data analysis and management. For example, analysts from different business lines define their own financial metrics in data reports. When you pool these countless reports together in your data architecture, you will find that many metrics overlap or even contradict each other in definition. The consequence: developing even a simple data report requires lots of back-and-forth clarification, making the process more complicated and time-consuming than it should be.

As your business grows, your data management will arrive at a point where "standardization" is needed. In data engineering terms, that means you need a data platform where you can produce and manage all metrics. That is the architectural prerequisite for providing efficient financial services.

This article introduces the lifecycle of financial metrics in a database (in this case, Apache Doris), from how they're produced to how they're efficiently presented in data reports. You will get an inside view of what's behind those fancy financial dashboards. 

Define New Metrics and Add Them to Your Database

Fundamentally, metrics are fields in a table. To make this concrete, I will explain with an example from the banking industry.

Banks measure the assets of customers by AUM (Assets Under Management). In this scenario, AUM is an atomic metric, which is often a field in the source data table. On the basis of AUM, analysts derive a series of derivative metrics, such as "year-on-year AUM growth", "month-on-month AUM growth", and "AUM per customer".
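As a rough sketch in SQL, the atomic metric and its derivatives might look like the following. The table and column names (aum_snapshot, customer_id, aum) are illustrative assumptions, not from the article:

    -- Illustrative source table: one AUM snapshot per customer per month.
    CREATE TABLE aum_snapshot (
        customer_id    BIGINT,
        snapshot_month DATE,
        aum            DECIMAL(18, 2)  -- atomic metric: Assets Under Management
    )
    DUPLICATE KEY (customer_id)
    DISTRIBUTED BY HASH (customer_id) BUCKETS 10;

    -- Derivative metrics computed from the atomic AUM field:
    SELECT
        snapshot_month,
        SUM(aum) / COUNT(DISTINCT customer_id) AS aum_per_customer,
        SUM(aum) / LAG(SUM(aum), 1, NULL) OVER (ORDER BY snapshot_month) - 1
            AS mom_aum_growth,
        SUM(aum) / LAG(SUM(aum), 12, NULL) OVER (ORDER BY snapshot_month) - 1
            AS yoy_aum_growth
    FROM aum_snapshot
    GROUP BY snapshot_month;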

Once you define the new metrics, you add them to your data reports, which involves a few simple configurations in Apache Doris:

Developers update the metadata accordingly, register the base table from which the metrics are derived, configure the data granularity and update frequency of intermediate tables, and input the metric name and definition. Some engineers also monitor the metrics to identify anomalies and remove redundant metrics based on a metric evaluation system.

When the metrics are soundly in place, you can ingest new data into your database to populate your data reports. For example, if you ingest CSV files, we recommend the Stream Load method of Apache Doris and a file size of 1–10 GB per batch. Eventually, these metrics will be visualized in data charts.
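For reference, a Stream Load is an HTTP PUT request to a Doris frontend node, typically issued with curl. This is a minimal sketch; the host, port, credentials, label, and database/table names are placeholders, not values from the article:

    # Stream Load pushes one CSV batch over HTTP to a Doris FE node.
    # Host, port, credentials, label, and table names are placeholders.
    curl --location-trusted -u user:passwd \
         -H "label:aum_snapshot_20231011" \
         -H "column_separator:," \
         -T aum_batch.csv \
         http://fe_host:8030/api/finance_db/aum_snapshot/_stream_load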

Calculate Your Metrics

As mentioned, some metrics are produced by combining multiple fields in the source table. In data engineering, that is a multi-table join query. Based on the optimization experience of an Apache Doris user, we recommend flat tables instead of a star/snowflake schema: after this change, the user reduced the query response time on tables of 100 million rows from 5 s to 63 ms.

The flat table solution also eliminates jitter.
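A minimal sketch of that change, again with illustrative names (customer_dim and branch_dim stand in for the dimension tables of the original star schema):

    -- Materialize the join once into a flat table, then query a single table.
    CREATE TABLE aum_flat (
        snapshot_month DATE,
        customer_id    BIGINT,
        branch_name    VARCHAR(64),
        segment        VARCHAR(32),
        aum            DECIMAL(18, 2)
    )
    DUPLICATE KEY (snapshot_month, customer_id)
    DISTRIBUTED BY HASH (customer_id) BUCKETS 10;

    INSERT INTO aum_flat
    SELECT s.snapshot_month, s.customer_id, b.branch_name, c.segment, s.aum
    FROM aum_snapshot s
    JOIN customer_dim c ON s.customer_id = c.customer_id
    JOIN branch_dim   b ON c.branch_id   = b.branch_id;

    -- Report queries now scan one table, with no runtime join:
    SELECT snapshot_month, segment, SUM(aum) AS total_aum
    FROM aum_flat
    GROUP BY snapshot_month, segment;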

Enable SQL Caching To Reduce Resource Consumption

Analysts often check data reports of the same metrics on a regular basis. These reports are produced by the same SQL, so one way to further improve query speed is SQL caching. Here is how it turns out in a use case with SQL caching enabled:

  • All queries are answered within 10 ms;
  • When computing 30 metrics simultaneously (over 120 SQL commands), results are returned within 600 ms;
  • A TPS (transactions per second) of 300 is reached, with CPU, memory, disk, and I/O usage below 80%;
  • Under the recommended cluster size, over 10,000 metrics can be cached, saving substantial computation resources.
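In Apache Doris, SQL caching is toggled with the enable_sql_cache session variable; a minimal sketch (the dashboard query is illustrative):

    -- Serve repeated, identical report queries from the cache; cached results
    -- are invalidated when the underlying partitions receive new data.
    SET enable_sql_cache = true;           -- current session
    SET GLOBAL enable_sql_cache = true;    -- cluster-wide default

    -- A recurring dashboard query like this can now be answered from the cache:
    SELECT snapshot_month, SUM(aum) AS total_aum
    FROM aum_flat
    GROUP BY snapshot_month;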


Conclusion

The complexity of data analysis in the financial industry lies in the data itself rather than on the engineering side. Thus, the underlying data architecture should focus on facilitating the unified and efficient management of data. Apache Doris provides the flexibility of simple metric registration and the ability to compute metrics quickly and resource-efficiently. In this case, the user is able to handle 10,000 active financial metrics in 10,000 dashboards with 30% less ETL effort.


Published at DZone with permission of Frank Z.
