Chaos Day Playbook

Running a mini chaos event


Last updated 2 years ago


One of our clients found it beneficial to condense the Chaos Day process into a 2.5-hour mini chaos event.

Condensing a Chaos Day into 2.5 hours to increase understanding of digital services

In a recent engagement, there was limited time to spend on proactive failure investigation. The digital platform team acted as facilitators for nine delivery teams, introducing chaos engineering principles and helping the teams better understand the digital services they were building and operating. Because time for the exercises was limited across all the teams, we ran 2.5-hour sessions, each made up of two experiments and a post-incident review, instead of running full Chaos Days with multiple ongoing scenarios.

To choose experiments under those constraints, we ran experiment selection sessions with the team leads. In these sessions we selected potential failures to investigate and learn from, based on two factors:

  • the level of impact a potential failure could have on the user, team, or organisation

  • whether the response to that potential failure was known or unknown, whether by the service, the team, or other parts of the organisation

Working with the team leads, we prioritised and selected experiments that let each team investigate potential failures combining a high impact with an unknown response, because these provided the best conditions for the team to learn more about how their service worked.
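The selection rule above can be sketched as a simple filter over the impact/response matrix. This is a minimal Python sketch, not the method used in the engagement: the `CandidateFailure` record, the two-level `Impact` scale, and the assumption that candidates arrive already ordered by the team leads' priority are all illustrative. The default `limit=2` mirrors the two experiments per session.

```python
from dataclasses import dataclass
from enum import Enum


class Impact(Enum):
    LOW = "low"
    HIGH = "high"


@dataclass(frozen=True)
class CandidateFailure:
    """A potential failure proposed in an experiment selection session (hypothetical record)."""
    description: str
    impact: Impact        # impact on the user, team, or organisation
    response_known: bool  # is the response known to the service, team, or organisation?


def quadrant(failure: CandidateFailure) -> str:
    """Label where a candidate sits in the impact/response matrix."""
    response = "known" if failure.response_known else "unknown"
    return f"{failure.impact.value} impact/{response} response"


def select_experiments(candidates: list[CandidateFailure], limit: int = 2) -> list[CandidateFailure]:
    """Keep high-impact, unknown-response candidates, preserving the given order.

    Assumes `candidates` is already ordered by the team leads' priority.
    `limit=2` mirrors the two experiments run in each 2.5-hour session.
    """
    chosen = [c for c in candidates if c.impact is Impact.HIGH and not c.response_known]
    return chosen[:limit]


if __name__ == "__main__":
    # Hypothetical candidates; only the first is based on the playbook's example.
    candidates = [
        CandidateFailure("Request to authentication provider fails", Impact.HIGH, False),
        CandidateFailure("Payment method update times out", Impact.HIGH, True),
        CandidateFailure("Profile page renders slowly", Impact.LOW, False),
    ]
    for c in select_experiments(candidates):
        # prints: high impact/unknown response: Request to authentication provider fails
        print(f"{quadrant(c)}: {c.description}")
```

In practice the two factors were judged in conversation with the team leads rather than scored mechanically; the sketch only makes the selection criterion explicit.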

One team had built an authenticated user journey for managing personal details and payment methods. During an experiment selection exercise, we found a great example of a high-impact/unknown-response failure in this journey: if a request failed to be sent to the authentication provider, users could be prevented from logging in.

This potential failure had a high impact on the user experience, and because the authentication provider hosted the login pages, the team was unsure how the system would respond. It was not clear whether any alerts would fire, or whether the team would be notified.
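The shape of this experiment (inject the failure, then observe whether anyone is alerted) can be sketched in Python. This is a minimal, self-contained sketch under stated assumptions: the request to the authentication provider is modelled as a plain function, `FaultInjector` forces it to fail, and `FailureAlert` is a hypothetical stand-in for the team's monitoring. In the real experiment, whether anything like `FailureAlert` existed and notified anyone was exactly the unknown, so that observation would be made against the team's actual alerting.

```python
import random
from dataclasses import dataclass, field
from typing import Callable


class AuthProviderUnavailable(Exception):
    """Raised when a request to the (hypothetical) authentication provider fails."""


@dataclass
class FaultInjector:
    """Wraps a request function and makes a fraction of calls fail."""
    send: Callable[[str], str]
    failure_rate: float = 1.0  # 1.0 = every request fails; 0.0 = none fail
    rng: random.Random = field(default_factory=random.Random)

    def __call__(self, user_id: str) -> str:
        # random() returns a value in [0.0, 1.0), so rate 1.0 always fails
        if self.rng.random() < self.failure_rate:
            raise AuthProviderUnavailable(f"injected failure for {user_id}")
        return self.send(user_id)


@dataclass
class FailureAlert:
    """Hypothetical stand-in for a monitoring alert: fires once failures reach a threshold."""
    threshold: int = 5
    failures: int = 0
    fired: bool = False

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.threshold:
            self.fired = True


def attempt_login(send: Callable[[str], str], user_id: str, alert: FailureAlert) -> bool:
    """Try to start a login; record a failure if the provider request fails."""
    try:
        send(user_id)
        return True
    except AuthProviderUnavailable:
        alert.record_failure()
        return False


if __name__ == "__main__":
    alert = FailureAlert(threshold=3)
    # Hypothetical happy-path request: the provider returns a login redirect.
    faulty = FaultInjector(lambda user_id: f"redirect:{user_id}", failure_rate=1.0)
    results = [attempt_login(faulty, f"user-{i}", alert) for i in range(5)]
    print(results, "alert fired:", alert.fired)
```

With `failure_rate=1.0` every login attempt fails and the simulated alert fires after the third failure. If the team's real alerting stayed silent under the equivalent injected failure, that gap would be a natural finding to take into the post-incident review.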

| EE UK
Adam Hansrod