Understanding the relationships between SLO, SLI, and SRE

An SLI is a measure of compliance with an SLO. This means there is no SLI without an SLO. This article looks into the importance of SLIs and SLOs in SRE and how to implement them.

By Alireza Chegini · Nov. 15, 21 · Opinion

Even after delivering a project to a client, the software engineer’s job is not complete. The next phase is ensuring service reliability. In site reliability engineering (SRE) practice, there are two key concepts that the engineer should know: the service level objective (SLO) and the service level indicator (SLI).

This article looks into the importance of SLIs and SLOs in SRE and how to implement them.

What are service level objectives?

A service level objective is an agreement about a specific metric like uptime or response time. In other words, SLOs are the individual promises made by a service provider to the client and used to set expectations of the service. SLOs also give the IT and DevOps teams a target to measure themselves against, so they can see how well they are performing.

A service may have more than one SLO, and SLOs apply to paying and non-paying customers alike, and even to internal clients in the same organization. For example, when a customer-facing team uses tools provided by another team in the same organization, the two teams need to have clearly defined service level objectives so that the customer-facing team can meet its contractual obligations.

For an SLO to be effective, it must not be vague, overly complicated, or impossible to measure. Only the relevant SLOs should be in the document, and they should be spelled out in plain language to provide clarity. It is also essential to factor in other issues like delays from the client.

Using an online service that is called by clients as an example, SLOs can cover system availability, how long a request takes to get a response, the error rate (how often an error is encountered, expressed as a fraction of requests), and the number of requests the service can handle per second.
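As a rough illustration, the four kinds of SLO mentioned for the online service could be written down as data that a team can check measurements against. This is only a minimal sketch: the `SLO` class, its field names, and every target value except the 99.95 percent availability figure are hypothetical and chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SLO:
    """One service level objective (hypothetical structure for illustration)."""
    name: str
    target: float
    unit: str
    higher_is_better: bool  # True for availability, False for latency or errors

    def is_met(self, measured: float) -> bool:
        # Compare a measured value against the target in the right direction.
        if self.higher_is_better:
            return measured >= self.target
        return measured <= self.target


# Assumed targets for an online service; only 99.95 comes from the article.
EXAMPLE_SLOS = [
    SLO("availability", 99.95, "percent", True),
    SLO("p95_response_time", 300.0, "milliseconds", False),
    SLO("error_rate", 0.001, "fraction of requests", False),
    SLO("throughput", 500.0, "requests per second", True),
]

if __name__ == "__main__":
    for slo in EXAMPLE_SLOS:
        print(f"{slo.name}: target {slo.target} {slo.unit}")
```

Keeping the direction of comparison in the objective itself avoids a common mistake, such as treating a lower error rate as a failure.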

What are service level indicators?

An SLI is a measure of compliance with an SLO. This means there is no SLI without an SLO.

Returning to the example of the online service, if the service level agreement (SLA) promises availability of 99.95 percent, then your SLO is 99.95 percent. Your SLI is then the actual availability reported by your system.

If your SLI is at or above 99.95 percent, then you have met your obligation to your client. While 100 percent availability is not possible, the goal is to get as close as possible.
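The comparison between an SLI and its SLO can be sketched in a few lines. The example below assumes availability is measured as the share of successful requests, which is one common choice but not the only one; the function names and the request counts are hypothetical.

```python
SLO_TARGET_PERCENT = 99.95  # the availability target used in the article


def availability_sli(successful_requests: int, total_requests: int) -> float:
    """Return measured availability as a percentage of successful requests."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return 100.0 * successful_requests / total_requests


def meets_slo(sli_percent: float, slo_percent: float = SLO_TARGET_PERCENT) -> bool:
    """The objective is met when the indicator is at or above the target."""
    return sli_percent >= slo_percent


if __name__ == "__main__":
    # Made-up counts for one reporting period.
    sli = availability_sli(successful_requests=999_600, total_requests=1_000_000)
    print(f"SLI: {sli:.2f} percent, SLO met: {meets_slo(sli)}")
```

With these made-up counts, 400 failed requests out of a million give an SLI of 99.96 percent, which satisfies the 99.95 percent objective; 600 failures would not.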

Some of the challenges of SLIs are choosing the relevant metrics to track and implementing how to track them as accurately as possible. Tracking metrics just because you can and not because they are essential to the client is a waste of resources.

How does SRE benefit from SLOs and SLIs?

Having excellent and practical SLOs and SLIs is fundamental to seamlessly transitioning from development to operations. SLOs help the team prioritize their work, while SLIs indicate areas where attention is needed to meet client expectations.

Now that you know what SLOs and SLIs stand for, we will look at the best practices for implementing them to improve your SRE practice.

Best practices for SLOs and SLIs

When formulating your SLOs within your SLA, it is important to pay attention to these points:

Take customers’ expectations into account

When drafting your SLA, it is important to know what your customers expect from your service or product. Once you understand what matters to your clients, your team can craft objectives that are practical and that the customer can work with.

Use the plainest language possible in your SLA

Your client might not read the document in your presence where they can ask you for clarification. If any part of your SLA, which includes the SLOs, is ambiguous, you and your client will probably have disagreements on expectations down the line.

Not every metric is an SLO

You will avoid a lot of trouble by limiting your SLOs to the practical and essential ones. Use as few SLOs as possible; do not cram in as many as you can just to impress with your metric-tracking capabilities.

Don’t promise the moon even if you can deliver it

While setting your SLOs, you do not need to promise clients your total capacity. For example, if your system can maintain an uptime of 99.99 percent, you do not have to set your SLO at 99.99 percent. It is better to leave some wiggle room by underpromising and overdelivering. This way, you can take care of unforeseen issues that can affect the service you provide.
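To see how much wiggle room a slightly lower promise buys, it helps to convert each uptime percentage into the downtime it allows. The sketch below assumes a 30-day measurement window and compares 99.99 percent with 99.95 percent, a lower target borrowed from the earlier availability example; the function name and the window length are assumptions for illustration.

```python
def allowed_downtime_minutes(slo_percent: float, window_days: int = 30) -> float:
    """Return the downtime, in minutes, that an uptime SLO permits in a window."""
    if not 0.0 <= slo_percent <= 100.0:
        raise ValueError("slo_percent must be between 0 and 100")
    total_minutes = window_days * 24 * 60
    return total_minutes * (100.0 - slo_percent) / 100.0


if __name__ == "__main__":
    for target in (99.99, 99.95):
        minutes = allowed_downtime_minutes(target)
        print(f"{target} percent over 30 days allows about {minutes:.1f} minutes of downtime")
```

Over 30 days, 99.99 percent allows roughly 4.3 minutes of downtime, while 99.95 percent allows roughly 21.6 minutes. Promising the lower figure while operating at the higher one leaves about 17 minutes a month to absorb unforeseen issues.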

Have a sound disaster recovery plan

Before committing to an SLO, prepare a detailed plan for what to do when your SLI drops below it. Failure to do this will result in an uncoordinated response that wastes your team’s time instead of fixing the problem.
