Learning to 'Fail Better': 5 Iterative Development Best Practices

"Fail fast."

Software engineers and product development execs are constantly told that failing fast—shorthand for taking a highly iterative approach to software development and learning quickly from your mistakes—is the key to success in today’s complex and dynamic technology environments.

It's a valid premise—as far as it goes. But it doesn't go nearly far enough.

A software team isn't an iteration engine. You can't simply floor the accelerator and expect a successful outcome. Your team won't make the leap from "failing faster" to "going faster" without the right technology, the right data and analytical insights, and the right culture to support and sustain its effort.

Here's a better idea: Don’t just fail fast, learn how to fail better.

Learn how to turn iteration into a strategic process—one where your team's ability to experiment, adapt, innovate, and act contributes directly to meaningful business outcomes and to excellent customer experiences.

That may sound like a tall order for a concept rooted in the word "fail." So let's dig into an example: a New Relic engineering team that successfully deployed a major new feature for New Relic Insights in just three months.

A 'fail better' success story

Our team was tasked with building what would become the New Relic Insights metric explorer—a tool designed to search and chart any metric timeslice data sent to New Relic Insights via New Relic agents. Customers would be able to create and place metric charts within their existing Insights dashboards; this capability, in turn, would make it much easier for users to monitor the data most interesting to them in a single, centralized location.

The metric explorer project posed some interesting challenges. It would need to index and ingest millions of metrics per minute, and to query timeslices across hundreds of thousands of Insights customers. On the frontend, providing a querying interface and enabling customers to chart and modify metrics on dashboards would require fundamental changes to significant parts of the Insights user experience.

We consider the project a major success: Metric explorer has proven to be highly reliable, with strong adoption rates and stellar customer feedback. And as mentioned above, we built and launched metric explorer in just three months—an exceptional achievement for a relatively small team.

The metric explorer team's success was built on three factors: First, the team was packed full of very smart people. Second, those people worked very, very hard to deliver a great product on an aggressive schedule. Third, the team's structure, technology choices, and development processes exemplify how to adapt and apply a "fail better" mindset to achieve a strategic outcome.

Five keys to failing better

Specifically, the team implemented five key “fail better” best practices:

1. Make technology choices that promote a fast-paced, low-friction, iterative development process.

For the metric explorer team, this meant using containers anywhere and everywhere we could—from deploying code to running services in our staging and production environments. The benefits of containerization are now widely understood, but it's hard to overstate the role containers can play in creating simple, consistent, and reliable environments that allow for lightning-quick deployments, rapid iteration, and low-cost experimentation.

We also benefitted from Amazon Web Services (AWS) as a versatile and low-cost cloud environment for experimentation. When combined with a "containerize everything" approach and a microservices architecture (more on this in a moment), our technology choices made it much easier to take chances, to experiment aggressively, and to quickly roll what we learned into the development process.

2. Give engineering teams and product managers a shared understanding of priorities, performance, and success metrics.

The relationship between software engineers and product managers is a notorious source of friction—and, too often, failure—in the development process. We kept our teams on the same page with a shared, and always data-driven, understanding of:

Success metrics focused on user adoption, reliability, and performance
How to define and prioritize a set of minimum marketable features (MMFs)
Which customer problems to solve first
Why to cut or keep features when faced with competing priorities

True relationship building, however, takes more than the right mix of dashboards and data visualizations. Our PM worked as a fully integrated member of the team—participating in standup meetings and demos, performing reliability work, and taking direct responsibility for tasks within the development process.

3. Choose a team structure that promotes trust, teamwork, and empathy.

Integrating our PM with the rest of the team was part of a bigger effort to nurture a level of mutual trust, understanding, and empathy that would enable a faster and more efficient development process.

A big part of that came down to basic interaction: We spent a lot of time doing "mob programming"—literally writing code together as a team. We also required every team member to participate in on-call rotations, sharing responsibility (and the occasional sleepless night) for troubleshooting and resolving incidents.

Over time, each member of the metric explorer team also evolved into what HR experts call a "T-shaped" individual: Each of us brought deep expertise within a specific field, but we also developed a solid working knowledge across a number of other technologies.

A T-shaped team is better equipped to roll with the punches than a team composed of silo-bound specialists; it's easier for any team member to step in where needed, to make informed decisions on the fly, and to keep the process moving. Just as important, it's an approach that promotes empathy by giving everybody an opportunity to walk a mile in a colleague's shoes.

4. Employ a launch process that supports iteration and incremental delivery, using an array of relevant tools and tactics.

It's easy to talk about "iteration," but a truly iterative development process requires foresight and smart decision-making. Our team broke its work on metric explorer into 10 micro-features, each of which we delivered incrementally. For example, we launched a chart-creation wizard that represented just part of what we intended to ship, but which helped us gather critical feedback early in the process. This was especially important, given the challenges we faced releasing a product on such a short timeline.

We employed other tactics that supported this approach: We used feature flagging to separate our code deployments from feature releases—a useful way to enable testing in production environments and to maintain a consistently fast deployment pace. Dark launches let us minimize project risk while gaining realistic user feedback and assessing our infrastructure performance. And a rigorous approach to instrumentation gave us the ability to measure performance as we dialed up our ingest and query traffic—giving us confidence in our ability to scale without exposing customers to potentially sub-par experiences.
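To make the flag-and-dark-launch idea concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the code the metric explorer team actually ran: the FeatureFlag class, the render_chart function, the account IDs, and the chart functions are all hypothetical names invented for this example. It shows how a flag can keep deployed code hidden, how a stable hash can release a feature to a percentage of accounts, and how a dark launch can exercise the new code path without showing customers the result.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class FeatureFlag:
    """Hypothetical in-memory flag; a real system would load this from config."""
    name: str
    rollout_percent: int = 0                      # 0 = dark, 100 = fully released
    allow_list: set = field(default_factory=set)  # accounts that always see the feature

    def is_enabled(self, account_id: str) -> bool:
        if account_id in self.allow_list:
            return True
        # Hash flag name + account so each account lands in a stable bucket 0..99.
        digest = hashlib.sha256(f"{self.name}:{account_id}".encode()).hexdigest()
        return int(digest, 16) % 100 < self.rollout_percent


def legacy_chart(metric: str) -> str:
    return f"legacy chart for {metric}"


def new_chart(metric: str) -> str:
    return f"metric explorer chart for {metric}"


def render_chart(flag, account_id, metric, shadow_log=None):
    """Serve the new path only when the flag is on for this account."""
    if flag.is_enabled(account_id):
        return new_chart(metric)
    if shadow_log is not None:
        # Dark launch: run the new path, record the outcome, never show it.
        try:
            shadow_log.append(("ok", new_chart(metric)))
        except Exception as exc:
            shadow_log.append(("error", repr(exc)))
    return legacy_chart(metric)


if __name__ == "__main__":
    flag = FeatureFlag("metric_explorer", allow_list={"internal"})
    shadow = []
    print(render_chart(flag, "internal", "cpu"))        # new path for the team
    print(render_chart(flag, "acct-1", "cpu", shadow))  # customers still see legacy
    print(shadow)                                       # what the dark launch observed
```

Because the bucket is derived from a hash rather than a random draw, a given account gets the same answer on every request, so raising rollout_percent only ever adds accounts to the released group. In this sketch, the code ships in every deployment, and the release itself is just a change to the flag's values.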

5. Rely on New Relic for critical visibility, instrumentation, and real-time feedback.

At several points in the metric explorer development process, our use of the New Relic platform made a potentially game-changing difference.

As noted above, we used New Relic Insights to create dashboards that gave design, product management, and engineering stakeholders a shared understanding of usage patterns. New Relic APM 360 not only supported instrumentation for the project's Java services but also let us add our own instrumentation for custom services. New Relic Browser yielded real-time insights into our customers' client-side experience. And features such as programmatic alerting and deploy markers helped us detect, isolate, and resolve issues quickly and with a new level of confidence.

Fail your way to success—one step at a time

Sure, failing fast is better than failing slow: At least you know where you stand and that you still have work to do. But the real point of failing better is to use failure as a catalyst for achieving success—consistently, sustainably, and in ways that matter to your business.

Of course, your team's exact path to this goal will be different from our path, or anyone else's. With the right technology and culture, however, you can be sure that you're moving in the right direction, and that you'll have a fun and interesting time getting there.

By Henry Shapiro

Henry Shapiro is Vice President of Product, Observability Tools at New Relic. He’s spent most of his career in the data analytics and monitoring space, primarily in product and engineering roles. When he’s not working, he’s on any number of two-wheeled motorized vehicles.

The views expressed on this blog are those of the author and do not necessarily reflect the views of New Relic. Any solutions offered by the author are environment-specific and not part of the commercial solutions or support offered by New Relic. Please join us exclusively at the Explorers Hub (discuss.newrelic.com) for questions and support related to this blog post. This blog may contain links to content on third-party sites. By providing such links, New Relic does not adopt, guarantee, approve or endorse the information, views or products available on such sites.
