SRE vs DevOps
The question that I would like to answer today is: what is the difference between SRE and DevOps?
This is a question that I hear on a fairly regular basis, not just internally but from external customers as well. It’s one that I would like to walk you through so that you can figure out what makes sense in your organization, and I think the answer is probably going to surprise you a little bit.
Probably the most important thing to understand is that this isn’t a versus question. You don’t have to have one or the other. As a matter of fact, I would argue, and I think many people would agree, that SRE is an essential component of DevOps, and that a properly implemented DevOps method leads to the need for SRE when it comes to deployment. They are two sides of the same coin, and that’s obviously going to lead to a little bit of confusion. DevOps is the development methodology: it’s all about integrating your development teams and your operations teams. It’s about knocking down the silos between them. It’s about ensuring that everybody is singing from the same songbook, and that’s very important. SRE is in charge of automating all of the things and doing everything it can to make sure that you don’t go down.
They are really two parts of the same group, so let’s look at the differences, because they do have some. Probably the first and largest one shows up when we think about DevOps. The DevOps guys, particularly your developers, are doing the core development. They are answering the question “what do we want to do?” They’re working with product, with sales, and with marketing to develop, design, and deploy. What is it that we do? They’re working on the core.
SRE, on the other hand, is not working on the core development. What they are working on is the implementation of the core. They are working on the deployment, and they are constantly giving feedback to that core development group to say, “Hey, something that you guys have designed isn’t working exactly the way that you think it is.” If you want to think about it this way, DevOps is trying to develop, and SRE is asking how we deploy, maintain, and run it to solve this problem. It’s the theoretical versus the practical. Ideally they’re talking to each other every day, because SRE should be logging defects and tickets back with development. Probably most importantly, they need to understand that they have the same goals. These groups should never be aligned against one another, so they have to have a common understanding.
Now for the most important part: we’re going to talk about failure, because failure is not necessarily failure, it’s just a way of life. It doesn’t matter what you deploy or how well it goes, it’s going to happen. There is a failure budget, or error budget: an allowance for things going wrong. When it comes to failure, the SRE team is going to anticipate it, monitor it, log it, and record everything, and ideally they can identify a failure before it happens. They’re going to have predictive analytics that say, “All right, this thing is going to go bad based on what we’ve seen before.” So SRE is responsible for mitigating some of those failures through monitoring, logging, and the other preemptive work. SRE is also going to lead all of your incident management after an actual failure. They’re going to get you through the incident to begin with, and then they’re going to hot wash it. When that’s done, you have to bring Dev in, because these are the guys who are going to solve the core problem, although some RCAs might be solved by SRE internally. Then the SRE team will integrate the fix into their monitoring and logging efforts to make sure that we don’t end up in another RCA for the same kind of problem.
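To make the error budget idea a little more concrete, here is a minimal Python sketch. It assumes the budget is counted in failed requests against an availability target over some window, which is one common way to do it but not the only one. The function names, the 99.9% target, and the very naive linear projection standing in for "predictive analytics" are all illustrative assumptions of mine, not something prescribed above.

```python
def error_budget(slo_target: float, total_requests: int) -> int:
    """Requests allowed to fail in the window without breaching the target.

    slo_target is a fraction such as 0.999 (an assumed example value).
    """
    return round((1.0 - slo_target) * total_requests)


def budget_remaining(slo_target: float, total_requests: int,
                     failed_requests: int) -> float:
    """Fraction of the error budget still unspent.

    A negative value means the budget has been overspent.
    """
    budget = error_budget(slo_target, total_requests)
    if budget == 0:
        # No failures are allowed at all, so any failure spends everything.
        return 1.0 if failed_requests == 0 else 0.0
    return 1.0 - failed_requests / budget


def likely_to_exhaust(consumed_fraction: float,
                      elapsed_fraction: float) -> bool:
    """Naive linear projection: if the budget keeps burning at the rate
    seen so far, will it run out before the window ends?

    consumed_fraction: share of the budget already spent (0.0 to 1.0+).
    elapsed_fraction: share of the window that has already passed.
    """
    if elapsed_fraction <= 0:
        return False  # nothing observed yet, so nothing to project
    return consumed_fraction / elapsed_fraction > 1.0


if __name__ == "__main__":
    target, total, failed = 0.999, 1_000_000, 250
    print("budget:", error_budget(target, total))
    print("remaining:", budget_remaining(target, total, failed))
    print("at risk:", likely_to_exhaust(consumed_fraction=0.25,
                                        elapsed_fraction=0.1))
```

With these assumed numbers the budget is 1,000 failed requests and 250 failures leave 75% of it unspent. Spending a quarter of the budget in the first tenth of the window is the kind of signal SRE would want to act on before it becomes an incident.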
There are different skill sets. In core development on the DevOps side, these are the guys who really love writing software. SRE is a little bit more of an investigative mindset. You have to be willing to go and do that analysis, figure out what has gone wrong, and automate all of the things. But there’s a lot that they have in common. Everyone should be writing automation, and everyone should be getting rid of toil as much as possible, because we just don’t have the time to be doing manual tasks when we can put the computers in charge of them. Computers are not great at thinking on their own, but if you need the same thing done over and over again in exactly the same way, you can’t beat computing for that. So automation is key; you just have a slightly different mindset. DevOps is going to automate deployment, tasks, and features. SRE is going to automate redundancy, and they’re going to turn manual tasks into programmatic tasks to keep the stack up.
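As a small illustration of turning a manual task into a programmatic one, here is a hedged Python sketch of automated failover: instead of a person checking which replica is up and repointing traffic by hand, a function walks a list of replicas and picks the first healthy one. The replica names and the health check are hypothetical stand-ins; a real check would probably be an HTTP or TCP probe against your own stack.

```python
from typing import Callable, Optional, Sequence


def pick_healthy(replicas: Sequence[str],
                 is_healthy: Callable[[str], bool]) -> Optional[str]:
    """Return the first replica that passes its health check, or None.

    A health check that raises is treated the same as an unhealthy replica,
    so one broken probe does not stop the search.
    """
    for replica in replicas:
        try:
            if is_healthy(replica):
                return replica
        except Exception:
            continue  # treat a failing probe as "unhealthy" and move on
    return None


if __name__ == "__main__":
    # Hypothetical replica names and a fake health check for demonstration.
    status = {"primary": False, "standby-1": True, "standby-2": True}
    chosen = pick_healthy(list(status), lambda name: status[name])
    print("routing traffic to:", chosen)
```

The point is less the code than the habit: once the check is written down as a function, it runs the same way every time, and nobody has to be woken up to do it.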