DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Zones

Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks

Enterprise AI Trend Report: Gain insights on ethical AI, MLOps, generative AI, large language models, and much more.

2024 Cloud survey: Share your insights on microservices, containers, K8s, CI/CD, and DevOps (+ enter a $750 raffle!) for our Trend Reports.

PostgreSQL: Learn about the open-source RDBMS' advanced capabilities, core components, common commands and functions, and general DBA tasks.

AI Automation Essentials. Check out the latest Refcard on all things AI automation, including model training, data security, and more.

Related

  • The State of Observability 2024: Navigating Complexity With AI-Driven Insights
  • Understanding Status Page Aggregation: Inside the Technology of a Typical Status Page Aggregator
  • Bridging the Observability Gap for Modern Cloud Architectures
  • Making Waves: Dynatrace Perform 2024 Ushers in New Era of Observability

Trending

  • BPMN 2.0 and Jakarta EE: A Powerful Alliance
  • Architecture: Software Cost Estimation
  • Sprint Anti-Patterns
  • Initializing Services in Node.js Application
  1. DZone
  2. Testing, Deployment, and Maintenance
  3. Monitoring and Observability
  4. O11y Guide, Cloud-Native Observability Pitfalls: Controlling Costs

O11y Guide, Cloud-Native Observability Pitfalls: Controlling Costs

In part 2 of this series, take a look at how to control the costs and the broken cost models encountered with cloud-native observability.

By 
Eric D.  Schabell user avatar
Eric D. Schabell
DZone Core CORE ·
Jan. 29, 24 · Analysis
Like (3)
Save
Tweet
Share
2.5K Views

Join the DZone community and get the full member experience.

Join For Free

Are you looking at your organization's efforts to enter or expand into the cloud-native landscape and feeling a bit daunted by the vast expanse of information surrounding cloud-native observability? When you're moving so fast with Agile practices across your DevOps, SREs, and platform engineering teams, it's no wonder this can seem a bit confusing. Unfortunately, the choices being made have a great impact on both your business, your budgets, and the ultimate success of your cloud-native initiatives that hasty decisions upfront lead to big headaches very quickly down the road.

In the previous introduction, we looked at the problem facing everyone with cloud-native observability. It was the first article in this series. In this article, you'll find the first pitfall discussion that's another common mistake organizations make. By sharing common pitfalls in this series, the hope is that we can learn from them.

After laying the groundwork in the previous article, it's time to tackle the first pitfall, where we need to look at how to control the costs and the broken cost models we encounter with cloud-native observability.

O11y Costs Broken

One of the biggest topics of the last year has been how broken the cost models are for cloud-native observability. I previously wrote about why cloud-native observability needs phases, detailing how the second generation of observability tooling suffers from this broken model.

"The second generation consisted of application performance monitoring (APM) with the infrastructure using virtual machines and later cloud platforms. These second-generation monitoring tools have been unable to keep up with the data volume and massive scale that cloud-native architectures..."

knot the cost of observability metrics data?

They store all of our cloud-native observability data and charge for this, and as our business finds success, scaling data volumes means expensive observability tooling, degraded visualization performance, and slow data queries (rules, alerts, dashboards, etc.).

Organizations would not care how much data is being stored or what it costs if they had better outcomes, happier customers, higher levels of availability, faster remediation of issues, and, above all, more revenue. Unfortunately, as pointed out on TheNewStack, 

"It's remarkable how common this situation is, where an organization is paying more for their observability data than they do for their production infrastructure."

The issue quickly resolves itself around the answer to the question, "Do we need to store all our observability data?" The quick and dirty answer is, of course, not! There has been almost no incentive for any tooling vendors to provide insights into the data we are ingesting for what is actually being used and what is not.

It turns out that when you do take a good look at the data coming in and are able to filter all your data at ingestion for what is not touched by any user, not ad-hoc queried, not part of any dashboard, not part of any rule, and not used for any alerts. It turns out to make quite a difference in data costs.

A dashboard designed for a service status overview, initially while ingesting over 280K data points

In the example above, we designed a dashboard for a service status overview, initially while ingesting over 280K data points. With the ability to inspect and clarify that a lot of these data points were not used in the organization, the same ingestion flow was reduced to just 390 single data points being stored. The cost reduction here depends on your vendor pricing, but with the effect shown here, it's obviously going to be a dramatic cost control tool.

It's important to understand that we need to ingest what we can collect, but we really only want to store what we are actually going to use for queries, rules, alerts, and visualizations. Below is an architectural view of how we are assisted by having control plane functionality and tooling between our data ingestion and data storage. Any data we are not storing can later be passed through to storage should a future project require it.

Architectural view of how we are assisted by having control plane functionality and tooling between our data ingestion and data storage

Finally, without standards and ownership of the cost-controlling processes in an organization, there is little hope of controlling costs. To this end, the FinOps role has become critical to many organizations, and the entire field started a community in 2019 known as the FinOps Foundation. It's very important that cloud-native observability vendors join these efforts moving forward, and this should be a point of interest when evaluating new tooling. Today, 90% of the Fortune 50 now have FinOps teams.

The road to cloud-native success has many pitfalls, and understanding how to avoid the pillars and focusing instead on solutions for the phases of observability will save much wasted time and energy.

Coming Up Next

Another pitfall is when organizations focus on The Pillars in their observability solutions. In the next article in this series, I'll share why this is a pitfall and how we can avoid it wreaking havoc on our cloud-native observability efforts.

Data storage Observability Cloud Cost approach

Published at DZone with permission of Eric D. Schabell, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

Related

  • The State of Observability 2024: Navigating Complexity With AI-Driven Insights
  • Understanding Status Page Aggregation: Inside the Technology of a Typical Status Page Aggregator
  • Bridging the Observability Gap for Modern Cloud Architectures
  • Making Waves: Dynatrace Perform 2024 Ushers in New Era of Observability

Partner Resources


Comments

ABOUT US

  • About DZone
  • Send feedback
  • Community research
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 100
  • Nashville, TN 37211
  • support@dzone.com

Let's be friends: