DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Zones

Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks

Enterprise AI Trend Report: Gain insights on ethical AI, MLOps, generative AI, large language models, and much more.

2024 Cloud survey: Share your insights on microservices, containers, K8s, CI/CD, and DevOps (+ enter a $750 raffle!) for our Trend Reports.

PostgreSQL: Learn about the open-source RDBMS' advanced capabilities, core components, common commands and functions, and general DBA tasks.

AI Automation Essentials. Check out the latest Refcard on all things AI automation, including model training, data security, and more.

Related

  • Keep Your Application Secrets Secret
  • Comparing the Efficiency of a Spring Boot Project to a Go Project
  • Container Security: The Art and Science of Safeguarding Cloud-Native Environments
  • Microsoft Azure Service Fabric

Trending

  • C4 PlantUML: Effortless Software Documentation
  • AWS Fargate: Deploy and Run Web API (.NET Core)
  • Code Complexity in Practice
  • The Impact of Biometric Authentication on User Privacy and the Role of Blockchain in Preserving Secure Data
  1. DZone
  2. Software Design and Architecture
  3. Cloud Architecture
  4. Shift-Left Monitoring Approach for Cloud Apps in Containers

Shift-Left Monitoring Approach for Cloud Apps in Containers

It is too costly to have an inefficient code in elastic container-based environments. Shift left monitoring approach with observability solution helps.

By 
Pandian Ramaiah user avatar
Pandian Ramaiah
·
Aug. 28, 23 · Tutorial
Like (3)
Save
Tweet
Share
7.4K Views

Join the DZone community and get the full member experience.

Join For Free

Shift-left is an approach to software development and operations that emphasizes testing, monitoring, and automation earlier in the software development lifecycle. The goal of the shift-left approach is to prevent problems before they arise by catching them early and addressing them quickly. 

When you identify a scalability issue or a bug early, it is quicker and more cost-effective to resolve it. Moving inefficient code to cloud containers can be costly, as it may activate auto-scaling and increase your monthly bill. Furthermore, you will be in a state of emergency until you can identify, isolate, and fix the issue.

The Problem Statement

I would like to demonstrate to you a case where we managed to avert a potential issue with an application that could have caused a major issue in a production environment. 

I was reviewing the performance report of the UAT infrastructure following the recent application change. It was a Spring Boot microservice with MariaDB as the backend, running behind Apache reverse proxy and AWS application load balancer. The new feature was successfully integrated, and all UAT test cases are passed. However, I noticed the performance charts in the MariaDB performance dashboard deviated from pre-deployment patterns.

This is the timeline of the events.

On August 6th at 14:13, The application was restarted with a new Spring Boot jar file containing an embedded Tomcat. 

Application restarts after migration

Application restarts after migration

At 14:52, the query processing rate for MariaDB increased from 0.1 to 88 queries per second and then to 301 queries per second. 

Increase in query rate

Increase in query rate

Additionally, the system CPU was elevated from 1% to 6%. 

Raise in CPU utilization

Raise in CPU utilization

Finally, the JVM time spent on the G1 Young Generation Garbage Collection increased from 0% to 0.1% and remained at that level. 

Increase in GC time on JVM

Increase in GC time on JVM

The application, in its UAT phase, is abnormally issuing 300 queries/sec, which is far beyond what it was designed to do. The new feature has caused an increase in database connection, which is why the increase in queries is so drastic. However, the monitoring dashboard showed that the problematic measures were normal before the new version was deployed.

The Resolution

It is a Spring Boot application that uses JPA to query a MariaDB. The application is designed to run on two containers for minimal load but is expected to scale up to ten. 

Web - app - db topology

Web - app - db topology

If a single container can generate 300 queries per second, can it handle 3000 queries per second if all ten containers are operational? Can the database have enough connections to meet the needs of the other parts of the application? 

We had no other choice but to go back to the developer's table to inspect the changes in Git.

The new change will take a few records from a table and process them. This is what we observed in the service class.

List<X> findAll = this.xRepository.findAll();

No, using the findAll() method without pagination in Spring's CrudRepository is not efficient. Pagination helps to reduce the amount of time it takes to retrieve data from the database by limiting the amount of data fetched. This is what our primary RDBMS education taught us. Additionally, pagination helps to keep memory usage low to prevent the application from crashing due to an overload of data, as well as reducing the Garbage Collection effort of Java Virtual Machine, which was mentioned in the problem statement above.

This test was conducted using only 2,000 records in one container. If this code were to move to production, where there are around 200,000 records in up to 10 containers, it could have caused the team a lot of stress and worry that day. 

The application was rebuilt with the addition of a WHERE clause to the method.

List<X> findAll = this.xRepository.findAllByY(Y);

The normal functioning was restored. The number of queries per second was decreased from 300 to 30, and the amount of effort put into garbage collection returned to its original level. Additionally, the system's CPU usage decreased. 

Query rate becomes normal

Query rate becomes normal

Learning and Summary

Anyone who works in Site Reliability Engineering (SRE) will appreciate the significance of this discovery. We were able to act upon it without having to raise a Severity 1 flag. If this flawed package had been deployed in production, it could have triggered the customer's auto-scaling threshold, resulting in new containers being launched even without an additional user load.

There are three main takeaways from this story. 

Firstly, it is best practice to turn on an observability solution from the beginning, as it can provide a history of events that can be used to identify potential issues. Without this history, I might not have taken a 0.1% Garbage Collection percentage and 6% CPU consumption seriously, and the code could have been released into production with disastrous consequences. Expanding the scope of the monitoring solution to UAT servers helped the team to identify potential root causes and prevent problems before they occur.

Secondly, performance-related test cases should exist in the testing process, and these should be reviewed by someone with experience in observability. This will ensure the functionality of the code is tested, as well as its performance.

Thirdly, cloud-native performance tracking techniques are good for receiving alerts about high utilization, availability, etc. To achieve observability, you may need to have the right tools and expertise in place. Happy Coding!

Container Java virtual machine Site reliability engineering Cloud Spring Boot MariaDB

Opinions expressed by DZone contributors are their own.

Related

  • Keep Your Application Secrets Secret
  • Comparing the Efficiency of a Spring Boot Project to a Go Project
  • Container Security: The Art and Science of Safeguarding Cloud-Native Environments
  • Microsoft Azure Service Fabric

Partner Resources


Comments

ABOUT US

  • About DZone
  • Send feedback
  • Community research
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 100
  • Nashville, TN 37211
  • support@dzone.com

Let's be friends: