Anti-Patterns in Incident Response That You Should Unlearn

In this article, we will explore anti-patterns in incident response and why you should unlearn them.

By Vishal Padghan · Oct. 28, 22 · Opinion

It is worth investing time and effort in understanding why a system behaves the way it does and how it can be improved. Companies tend to stick with practices that have produced good results, but ignoring anti-patterns can be far worse than settling for rigid processes. The sections below cover the most common anti-patterns in incident response and what to do instead.

Common Anti-Patterns in Incident Response 

Just Get Everyone on the Call 

Alerting everyone each time an incident is detected is rarely a good practice. There are cases where notifying everyone is easier or genuinely adds value. For example:

  • Organizations have smaller teams, and it is easier to notify the entire team.
  • The issue is critical and getting everyone on board is a better option.

This practice stops working as teams scale. You end up notifying people who have nothing to do with the incident, which can lead to alert fatigue: people get used to ignoring alerts and then miss the incidents that do need their attention. 

On-call rotations and targeted alerting help route incidents efficiently and prevent burnout. 
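
As a rough illustration of targeted alerting, the sketch below notifies only the on-call engineer of the team that owns the affected service. All service names, team names, people, the weekly rotation length, and the `triage-lead` fallback are hypothetical; a real setup would normally rely on an on-call scheduling tool rather than hand-written code.

```python
from datetime import date

# Hypothetical ownership map: which team owns which service.
SERVICE_OWNERS = {
    "checkout-api": "payments",
    "search-index": "search",
}

# Hypothetical weekly rotations, in hand-over order.
ROTATIONS = {
    "payments": ["alice", "bob", "carol"],
    "search": ["dave", "erin"],
}

# Assumed start of the first rotation week (a Monday).
ROTATION_START = date(2024, 1, 1)

# Assumed catch-all contact for services nobody has claimed yet.
FALLBACK = ["triage-lead"]


def on_call(team, today):
    """Return the engineer on call for `team` in the week containing `today`."""
    members = ROTATIONS[team]
    weeks_elapsed = (today - ROTATION_START).days // 7
    return members[weeks_elapsed % len(members)]


def recipients(service, today):
    """Return the people to notify for an incident on `service`."""
    team = SERVICE_OWNERS.get(service)
    if team is None:
        # Unknown service: page the fallback contact instead of everyone.
        return list(FALLBACK)
    return [on_call(team, today)]
```

With this shape, an incident on `checkout-api` pages a single person from the payments rotation, and everyone else stays undisturbed unless the incident is escalated.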

Using Up Bandwidth to Give Status Updates 

Responders deal with critical incidents for which stakeholders expect constant status updates. Updates are valuable: they keep everyone in the loop and may surface additional solutions. With minor incidents, teams can resolve the problem quickly and then pass updates on to the people concerned. With critical incidents, however, teams may be forced to spend more effort on sending updates than on resolving the incident, which can compromise the resolution process. 

To address this, assign a dedicated person to handle communication and provide timely updates to stakeholders. 

Progress Follows Chaos 

There is a perception that, while dealing with critical incidents, people will be rushing around amid a lot of discussion, chaos, and panic. This does not have to be the case. When multiple people respond to an incident, it is critical that they collaborate and keep everyone in sync with the actions being taken. Chaos and panic worsen the situation and should be avoided by defining clear roles and responsibilities. Teams should have an incident commander who makes decisions and authorizes changes that can affect the outcome. Teams can also use chat rooms to share updates and keep a record of what was done. With these processes in place, teams can communicate effectively and prevent chaos and panic. 

Incident Severity and Policy Discussion During an Incident Call 

Debating the severity of an incident while it is in progress wastes people's time, which is better spent resolving it. It is important to define unambiguous severity levels, because responses, plans, and policies are chosen based on severity. Ideally, the rules should be technically driven, clear, and automated so that every incident arrives with a predefined severity level. 
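
One way to picture automated severity rules is a small function that maps measurable signals to a level. The sketch below is only an illustration: the `Signal` fields, the thresholds, and the `SEV1` to `SEV4` labels are assumptions, and each organization would substitute the metrics and cut-offs it has agreed on in advance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    """Measurements attached to an alert (hypothetical fields)."""
    error_rate: float        # fraction of failed requests, 0.0 to 1.0
    customers_affected: int  # number of customers seeing the problem
    data_loss: bool = False  # whether any data is known to be lost


def classify(signal):
    """Map a signal to a severity level using fixed, ordered rules.

    Rules are checked from most to least severe, so the first match wins.
    The thresholds are illustrative assumptions, not recommendations.
    """
    if signal.data_loss or signal.error_rate >= 0.50:
        return "SEV1"
    if signal.error_rate >= 0.10 or signal.customers_affected >= 1000:
        return "SEV2"
    if signal.error_rate >= 0.01 or signal.customers_affected > 0:
        return "SEV3"
    return "SEV4"
```

Because the rules are plain code, they can be reviewed, versioned, and tested ahead of time, which moves the severity debate out of the incident call.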

Conduct training and drills so that teams learn to handle these situations better.

Not Escalating Incidents to the Right Responders 

Teams fail to inform the right responders when they have no mechanism for associating incidents with the people who can fix them. Searching for the right person means going back and forth, which slows down the response. The right people also go unnotified when multiple teams are involved and team structures are complex. Every team should have an identifiable, reachable contact, and there should be a clear, well-tested mechanism for routing and escalating alerts to the right responders. 
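
An escalation mechanism of this kind can be sketched as a per-team policy: an ordered list of contacts, each paged once an incident has gone unacknowledged for a given number of minutes. The team names, contacts, and time thresholds below are hypothetical.

```python
# Hypothetical escalation policies: (minutes unacknowledged, contact to page),
# listed in escalation order for each team.
ESCALATION_POLICIES = {
    "payments": [
        (0, "payments-on-call"),
        (10, "payments-lead"),
        (30, "engineering-manager"),
    ],
    "search": [
        (0, "search-on-call"),
        (15, "search-lead"),
    ],
}


def contacts_to_page(team, minutes_unacknowledged):
    """Return everyone who should have been paged by now, in escalation order."""
    policy = ESCALATION_POLICIES[team]
    return [
        contact
        for threshold, contact in policy
        if minutes_unacknowledged >= threshold
    ]
```

For example, `contacts_to_page("payments", 12)` returns the on-call engineer and the team lead, while the engineering manager is only included once 30 minutes have passed without acknowledgement.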

Postmortem Failures 

Postmortems are important for incident response because they help you learn from past events and plan future actions. 

Postmortems fail for several reasons:

  • Some teams are constantly under pressure from deadlines and unplanned incidents, so once an incident is resolved, no postmortem is done.
  • Sometimes postmortems turn into blame games. A good postmortem happens only when people are willing to discuss problems honestly. If participants are afraid of being blamed, the postmortem loses its purpose, which is to find solutions to problems.
  • In some cases, postmortems are done only because the process demands it, not to find answers.

Without postmortems, you fail to recognize what is working and where you can improve. Most importantly, postmortems help you avoid repeating the same mistakes. They should therefore be an integral part of the incident response process and be carried out sincerely. 

Inflexible Policies and Processes 

Organizations find comfort in practices that have produced good results and like to continue with them. However, some events cannot be anticipated, and established solutions do not always work. Flexible policies and processes help you adapt to changing requirements and find the right solution when it is needed. This does not mean being reckless: introduce sensible changes, but don't be afraid to make them. Some changes will slow things down in the short term but deliver faster and better results in the long run. 

Putting on Multiple Hats 

Incidents are confusing at the best of times. People taking on different roles without telling anyone only adds to the confusion. In high-pressure situations, people are expected to act quickly, yet information is limited and it is often unclear who needs to do what, which makes the situation worse. It is therefore important to define roles and responsibilities clearly. Individuals, in turn, should keep others informed whenever they make a change or take on a different role. 

Conclusion 

Incident response is a field where we constantly look for processes and stability, but ignoring anti-patterns can be far worse than living with rigid processes. 

Incident response teams need to identify these issues early to save time, prevent frustration, and reduce refactoring in the long run. It is therefore important to unlearn anti-patterns and adopt processes that accelerate incident response.


Published at DZone with permission of Vishal Padghan.
