Getting value to our customers as quickly and reliably as possible is core to what we do. That is why we are excited to introduce a new function, Site Reliability Engineering (SRE), in our engineering department. Up until now, our product and operations teams have fulfilled a wide variety of functions. This has worked well for us, but we are starting to see some seams between the responsibilities and difficulties juggling competing priorities. We are adding a team that will help shoulder the operations workload in support of our product teams and, ultimately, our customers.
Our SRE team will take on two main responsibilities: reliability and platform development. They will be the voice of reliability in the department, offering expertise on how best to approach reliability and sharing it with the teams through guidance and enablement. They will also take ownership of our platform (PaaS) and cater for our product teams' interactions with it. The platform will be treated like a product whose customers are our product teams. We want the SRE team to enable our product teams; we still believe in a DevOps culture.
Up until now, the work required to create and maintain our platform, along with our reliability responsibilities, was spread throughout our teams. As we scaled, we found that expertise was spread too thin across the department. Scheduling this type of work can be problematic: what is a priority for one team is not a priority for another, regardless of where the knowledge resides. We want experts who are able to dedicate themselves to these concerns.
The SRE team will evolve over time, and we already envision a future where the two responsibilities are split and we have a separate platform team.
Our product teams will still be responsible for deploying their own services and for monitoring and running them in production. This is the best place for this responsibility to live. However, as we get bigger, concentrating our platform development and reliability expertise will allow us to develop both more effectively. Reliability and our platform are first-class concerns and need to be treated with the respect they deserve.
We will evolve our approach as we scale over the coming years. What matters most to us is that our SRE team is motivated and enjoys the work, and that our product teams feel enabled by its introduction. If this sounds interesting to you, jump over to our careers page to take a look at the job spec, which has more details on the type of work and experience.