
Why We Still Suck at Resilience

By Adrian Hornsby

Your organisation does incident reviews, runs GameDays, and practises chaos engineering. So why do similar incidents keep happening? The practices operated, the boxes got checked, but the learning that would build resilience didn't accumulate. This book explains why — and what to do about it.

Get the Book on Leanpub. First chapter free.

From the people who read it.

"For thirty years, I have observed similar patterns recur across teams and companies — drift between how we think systems work and how they actually behave, prevention erasing its own evidence, practices decaying into theater. Adrian Hornsby's book is the first I have read that names them, explains why they happen, and gives teams a shared vocabulary to discuss them without talking past each other. I recommend this book to engineers and leaders building the organizational capacity to adapt to surprising situations — and a resilience program durable enough to keep that capacity alive. After reading it, you will recognize unwanted patterns, know what contributes to them, and help your organization stop sucking at resilience."
Jason Niemczyk
Senior Principal Site Reliability Engineer, Autodesk
"This book is the missing map to guide leaders and practitioners through modern operational challenges. Adrian grounds theory firmly in practice, putting words to everyday experiences that engineers and managers recognize but often can't articulate. He explores the full scope of resilience challenges: technical, organizational, and psychological. This is a must-read for anyone who cares about operational excellence."
Beth Adele Long
Principal, Adaptive Capacity Labs
"I can highly recommend this book. It has excellent structure, an easy-to-read writing style and the content really does live up to the title. The concept of 'Work As Imagined' and 'Work As Done' is a central theme which, when embraced, will fundamentally change the way you approach resilience."
Andy Cranston
Founder & Owner, Cranston Innovation
"This book doesn't hand you a checklist. It reshapes how you look at complex systems and the organizations that run them. It makes you think and reflect more than it gives you answers. The real failures are rarely technical; they're communication, incentives, and leadership gaps, and Adrian captures that well. Recommended for engineering leaders, SREs, and anyone who senses their organization has stopped learning."
Mikhail Kuklin
Senior Data Engineer, PhD

“Once you can name what’s happening, you can see it. Once you can see it, you can talk about it. Once you can talk about it, you can make different choices.”

From the preface

14 chapters across three parts.

Part 1 — Foundation
01 How Complex Systems Fail
Why failures in interconnected systems defy root cause analysis and what that means for how we build resilience.
02 What Learning Means
Single-loop, double-loop, and deutero-learning. Organisational learning vs. learning organisation. Why most organisations only do the first and how to tell the difference.
03 Why Practices Fail to Build Learning
How ORRs, load testing, chaos engineering, incident reviews, and GameDays become rituals that check boxes instead of building capability.
04 The Bedrock
Psychological safety, appropriate incentives, and leadership support — the conditions without which no practice works.
Part 2 — The Practices
05 Operational Readiness Reviews
How ORRs reveal the gap between Work-as-Imagined and Work-as-Done — and how they drift towards compliance theatre.
06 Load Testing
Why load tests confirm assumptions instead of challenging them, and what enables them to surface real capacity cliffs.
07 Chaos Engineering
How deliberate experimentation becomes validation theatre under organisational pressure, and what keeps it honest.
08 GameDays
How GameDays expose gaps between documented response procedures and how coordination actually works under pressure — and why scripted exercises confirm Work-as-Imagined without discovering Work-as-Done.
09 Incident Analysis
Why reviews stay shallow and carefully avoid the real problems, and what enables depth and genuine learning.
Part 3 — Navigation
10 Designing for Adaptability
Building systems and organisations that can respond to the unexpected, not just the predicted.
11 Making Drift Visible
How organisations unconsciously move towards brittleness through accumulated small decisions — and how to see it happening.
12 The Prevention Paradox
Why successful resilience work becomes its own enemy, and how to sustain investment when nothing is visibly breaking.
13 Organizing Resilience Work
Treating resilience as an ongoing practice rather than an achievable destination.
14 The Future of Resilience
When machines do the thinking, what do the humans learn? Why AI risks outsourcing the very learning that makes organisations resilient, why our tooling has got it wrong by optimising practices in isolation instead of building learning systems, and how to start where you are by exploring the adjacent possible.
Part 4 — Appendix
ORR Template + Incident Analysis Template
Ready-to-use templates you can adapt for your organisation, plus a glossary of key concepts and full references.
Engineers implementing practices who sense that resilience work has become a checkbox exercise rather than learning.
Leaders sponsoring programmes who are confused about why significant investment hasn't built the adaptive capacity they expected.
Architects and SREs concerned their organisation talks about resilience but doesn't actually behave in ways that build it.
Consultants advising organisations who need the vocabulary to articulate what's wrong beneath the surface of impressive-sounding programmes.
Adrian Hornsby

Adrian Hornsby spent nearly a decade at AWS, progressing from Solutions Architect to Principal Engineer on the AWS Fault Injection Service team. He authored much of AWS's resilience and chaos engineering guidance and worked with internal teams including Prime Video, Amazon Search, and Lambda. Today, through Resilium Labs, he helps Fortune 500 companies and growth-stage organisations diagnose why their resilience programmes aren't working.

Ready to find out why?

Available now as an ebook. No DRM, read on any device. First chapter free.

Get the Book on Leanpub or Book a Diagnostic Call