Site Reliability Engineering

Traditionally, corporations split accountability for development and operations across different teams. The consequence is typically a QA-intensive and time-consuming release procedure, which results in a time-to-market for new features that is not competitive in today’s highly digital customer journeys. We help our clients define and implement an operations strategy that meets the desired level of agility.

Site Reliability Engineering (SRE) has emerged as the best choice to balance business agility needs and available skills.


Benefits of Site Reliability Engineering

In the classic operations approach, a central team runs a software system and takes care of periodic housekeeping and incident procedures. SRE, on the other hand, introduces the idea that the software system should run autonomously and that any required intervention is essentially a bug that needs to be fixed in the software itself. Commonly the SRE team makes this fix itself, so over time the software system improves to the point where no intervention is needed anymore, except in rare incident cases.
SRE teams often work embedded in one or two development teams, which also removes the need for an explicit handover of releases and the negative impact that handover has on feature delivery agility.

Onboarding your Teams to SRE

In addition to helping you establish an SRE approach and culture, we also offer hands-on training for your teams to develop the skills needed to work as Site Reliability Engineers. We provide your teams with a path to improvement across a variety of infrastructure providers, products, and programming languages.

SRE skill set requirements are among the highest within the IT profession, making the investment in these skills a core contribution to a general digital transformation.

‘Follow-The-Sun’ 24x7

Experience shows that existing legal or contractual situations often make it difficult for non-classic operations teams to take over responsibilities outside normal business hours. We have filled that gap for clients in the past, taking over 24x7 coverage during off-business hours that ranges from on-call duty to subject-matter-expert-level support. Our geographically distributed locations also make it possible to provide you with a follow-the-sun approach.

Make Conscious Decisions About Target Service Levels

Increasing the target service level of any feature is not a free lunch. Site Reliability Engineering provides you with concepts for determining the target service levels you require and for verifying that the software system can support them.
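To illustrate why a higher target is not free, the short sketch below converts an availability target into the downtime it tolerates. It is only an illustration under stated assumptions: the function name `allowed_downtime_minutes`, the 30-day measurement window, and the three example targets are hypothetical choices, not figures from any client engagement.

```python
def allowed_downtime_minutes(target_percent: float, window_days: int = 30) -> float:
    """Return the minutes of unavailability a target tolerates in the window.

    Assumption: availability is measured as uptime over a fixed window
    (30 days by default); real service levels may be defined differently.
    """
    if not 0 < target_percent <= 100:
        raise ValueError("target_percent must be in the range (0, 100]")
    total_minutes = window_days * 24 * 60
    return total_minutes * (100 - target_percent) / 100


# Compare a few hypothetical targets over the same 30-day window.
for target in (99.0, 99.9, 99.99):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% target: about {minutes:.1f} minutes of downtime allowed")
```

Under these assumptions, a 99% target tolerates about 432 minutes of downtime in 30 days, 99.9% about 43 minutes, and 99.99% only about 4 minutes. Each additional "nine" cuts the room for failure by a factor of ten, which is why the target should be a conscious decision.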