In your complex software and systems, even if everything is stable, and uptime is great, it’s only a matter of time. Stuff breaks. It’s a fact of life for modern software teams.
Teams at Amazon and Netflix have pioneered “chaos engineering” practices as a way to improve reliability. A new and important subset of this idea is “adversarial gameday testing,” where one team member introduces faults into a system and the rest of the team remediates those faults—in order to build a more resilient system overall.