Measurement


These practices measure an operating model to decide if it's worthy of further investment. The cost effectiveness of business outcome protection is an informative, actionable measure of an operating model.

Leading and trailing measures can be used to gauge progress in deployment throughput, service reliability, and learning culture. It's important to select measures that are holistic, actionable, and do not create the wrong incentives. See Measuring Continuous Delivery by Steve Smith.

These practices are linked to our principles that operating models are insurance for business outcomes and operating models are powered by feedback.

Measure deployment throughput

Measure the deployment throughput of digital services as:

| Measure | Type | Description | Suggested Implementation |
| --- | --- | --- | --- |
| Loose Coupling | Leading | Is a product team able to change a digital service without orchestrating simultaneous changes with other digital services | Inspect deployment history. Calculate how many deployments of different digital services occurred in lockstep |
| Pre-Approved Changes | Leading | Does a product team have permission to pre-approve its own change requests for low risk, repeatable changes | Inspect ticketing system. Calculate how many digital services have pre-approved change request templates |
| Deployment Lead Time | Trailing | How long does a release candidate for a digital service take to reach deployment | Instrument deployment pipeline. Calculate days from production deployment back to build date |
| Deployment Frequency | Trailing | How often are digital services deployed to customers | Instrument deployment pipeline. Calculate rate of production deployments |
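As an illustration, the two trailing measures could be computed from deployment pipeline instrumentation along these lines. This is a minimal sketch with made-up deployment records; a real pipeline would supply the timestamps.

```python
from datetime import datetime

# Hypothetical deployment records: (build timestamp, production deployment timestamp).
# In practice these come from deployment pipeline instrumentation.
deployments = [
    (datetime(2024, 3, 1, 9, 0), datetime(2024, 3, 2, 14, 0)),
    (datetime(2024, 3, 4, 10, 0), datetime(2024, 3, 4, 16, 0)),
    (datetime(2024, 3, 8, 11, 0), datetime(2024, 3, 9, 9, 0)),
]

# Deployment Lead Time: days from build date to production deployment.
lead_times = [(deploy - build).total_seconds() / 86400 for build, deploy in deployments]
mean_lead_time_days = sum(lead_times) / len(lead_times)

# Deployment Frequency: production deployments per week over the observed window.
window_days = (deployments[-1][1] - deployments[0][1]).total_seconds() / 86400
deploys_per_week = len(deployments) / (window_days / 7)

print(f"Mean deployment lead time: {mean_lead_time_days:.1f} days")
print(f"Deployment frequency: {deploys_per_week:.1f} per week")
```

Segmenting the same calculation per digital service makes lockstep deployments (the Loose Coupling measure) visible as clusters of deployments with near-identical timestamps.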

Measure service reliability

Measure the reliability of digital services as:

| Measure | Type | Description | Suggested Implementation |
| --- | --- | --- | --- |
| Failure Design | Leading | Does a digital service include bulkheads, caches, circuit breakers | Survey product teams. Are they confident they can cope with downstream failures |
| Service Telemetry | Leading | Does a digital service have custom logging, monitoring, and alerts | Survey product teams. Are they confident they can quickly diagnose abnormal operating conditions |
| Availability | Trailing | What is the availability rate of a digital service, as a Nines Of Availability e.g. 99.9% | Instrument digital services. Calculate percentage of requests that are fully or partially successful |
| Time To Restore Availability | Trailing | How long does it take to re-attain an availability target once it has been lost | Instrument digital services. Calculate minutes from loss of availability target to re-attainment of availability target |
| Financial Loss Protection Effectiveness | Trailing | What amount of expected financial loss per incident is protected by a faster than expected Time To Restore Availability | Instrument digital services and incident financial losses. Calculate maximum incident financial loss, up to time to restore allotted by availability target. Calculate percentage of maximum incident financial loss protected by a time to restore faster than allotted by availability target |
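A minimal sketch of the Availability measure, assuming request logs where responses below HTTP 500 count as fully or partially successful; the status codes and the success threshold are illustrative.

```python
# Hypothetical request log entries; real data comes from service instrumentation.
requests = [
    {"status": 200}, {"status": 200}, {"status": 503},
    {"status": 200}, {"status": 206}, {"status": 200},
    {"status": 200}, {"status": 200}, {"status": 200}, {"status": 200},
]

# Availability: percentage of requests that are fully or partially successful.
# Here any response below HTTP 500 counts as successful (206 is partial content).
successful = sum(1 for r in requests if r["status"] < 500)
availability = 100 * successful / len(requests)

print(f"Availability: {availability:.1f}%")
```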

Financial loss protection effectiveness is a measure we've used with customers to gauge the cost effectiveness of You Build It You Run It at scale. It's a check of projected financial loss per incident that is unrealised, because the actual time to restore is faster than the projected time to restore. It's a comparison that can be made between digital services, product teams, and even operating models.

| Software Service | Maximum Financial Exposure In An Hour | Availability Target | Tolerable Unavailability In A Week |
| --- | --- | --- | --- |
| bedroom | $200K | 99.0% | 1:40:48 |

Assume the bedroom service is unavailable for 30 minutes, and the incident financial loss is calculated as $100K. The tolerable unavailability per week at 99.0% is 1 hour 41 minutes, which would have produced a $336K incident financial loss. Financial loss protection effectiveness is therefore 70%, as $236K of a theoretical $336K loss was unrealised thanks to a faster than expected time to restore.

| Software Service | Incident Duration | Incident Financial Loss | Incident Financial Loss At Tolerable Unavailability Per Week | Incident Financial Loss Protection | Incident Financial Loss Protection Effectiveness % |
| --- | --- | --- | --- | --- | --- |
| bedroom | 0:30:00 | $100K | $336K | $236K | 70% |
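The bedroom worked example above can be reproduced with a short calculation. The figures come from the tables; the variable names are illustrative.

```python
# Figures from the bedroom example.
max_exposure_per_hour = 200_000   # $200K maximum financial exposure in an hour
availability_target = 0.99        # 99.0% availability target
incident_loss = 100_000           # $100K actual loss from a 30 minute incident
week_hours = 7 * 24

# Tolerable unavailability per week at the availability target (~1:40:48).
tolerable_unavailability_hours = week_hours * (1 - availability_target)

# Maximum incident financial loss if the full tolerable unavailability were used.
max_incident_loss = max_exposure_per_hour * tolerable_unavailability_hours  # $336K

# Protection: the portion of the projected loss that went unrealised.
protection = max_incident_loss - incident_loss          # $236K
effectiveness = 100 * protection / max_incident_loss    # ~70%

print(f"Tolerable unavailability: {tolerable_unavailability_hours:.2f} hours")
print(f"Financial loss protection: ${protection / 1000:.0f}K ({effectiveness:.0f}%)")
```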

Measure learning culture

Measure the progress of a learning culture as:

| Measure | Type | Description | Suggested Implementation |
| --- | --- | --- | --- |
| Post-Incident Review Readers | Leading | How many readers does a post-incident review for a digital service have | Instrument post-incident reviews. Calculate how many unique visitors each review receives |
| Chaos Days Frequency | Leading | How often are Chaos Days run for a digital service | Instrument Chaos Day reviews. Calculate rate of review publication |
| Improvement Action Lead Time | Trailing | How long does an improvement action from a post-incident review for a digital service take to be implemented | Instrument post-incident reviews. Calculate days from improvement action definition to completion |
| Improvement Action Frequency | Trailing | How often are improvement actions implemented for a digital service | Instrument post-incident reviews. Calculate days between implementation of improvement actions |
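The two trailing measures could be derived from post-incident review data along these lines; the improvement action records are hypothetical.

```python
from datetime import date

# Hypothetical improvement actions from post-incident reviews:
# (date the action was defined, date it was completed).
actions = [
    (date(2024, 1, 10), date(2024, 1, 24)),
    (date(2024, 2, 3), date(2024, 2, 10)),
    (date(2024, 2, 20), date(2024, 3, 5)),
]

# Improvement Action Lead Time: days from definition to completion.
lead_times = [(done - defined).days for defined, done in actions]
mean_lead_time = sum(lead_times) / len(lead_times)

# Improvement Action Frequency: days between completed improvement actions.
completions = sorted(done for _, done in actions)
gaps = [(b - a).days for a, b in zip(completions, completions[1:])]
mean_gap = sum(gaps) / len(gaps)

print(f"Mean improvement action lead time: {mean_lead_time:.1f} days")
print(f"Mean days between improvement actions: {mean_gap:.1f}")
```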

In the earlier furniture retailer example, there was a bedroom digital service. It had a maximum financial exposure of $200K per hour, and a 99.0% availability target.

For more on measuring financial loss protection effectiveness, see our case study on How to do digital transformation at John Lewis & Partners.

Trends in post-incident reviews are of particular interest. Usage in training materials and citations in internal company documents could also be included. See Markers of progress in incident analysis by John Allspaw.

You Build It You Run It on the Xinja banking platform

This customer was a fintech startup, with very limited money and skilled employees. It was a greenfield project in every sense, including business processes as well as technical implementation. I worked on the single technical team responsible for delivery, and therefore the onus of the entire system was on us. It was the classic case for You Build It You Run It. To achieve this, automation was key. Adoption of Continuous Delivery enabled us to focus our energy on development, not deployments. All our changes were tested and deployed through a single automated pipeline, for simplicity and ease of tracking. Operational tasks needed to be in code as well, to allow for seamless deployments of infrastructure and application changes. Monitoring and system reliability were at the heart of the system design as well, since reduced issues meant more time for development. Code quality and system health awareness checks also steered the interests of our team. Since the business and operational knowledge was contained within our team, the customer only needed to contact us to raise their issues. This meant issues were resolved rapidly, and learnings from the issues were fed back into the design, testing and monitoring of the system. — Adrian Ng, Developer, EE Australia & New Zealand
