Book review: Site Reliability Engineering

15 Mar 2018 · Two minute read · on Gianluca's blog

I bought Site Reliability Engineering many months ago. I read the ebook first, but I am the kind of person who also buys the paper edition when a book is good. If you work in a distributed, scalable environment, it’s something you should read.

Published by O’Reilly and edited by Betsy Beyer, Chris Jones, Jennifer Petoff and Niall Richard Murphy, it is written by many Google engineers and covers the experience they gained scaling services like Google Maps, Calendar, YouTube and all the other products.

I spoke with several people about this book and a lot of them told me that there is nothing new in it. It’s just cool because Google made it cool.

I have a different opinion. It’s a nice book because it is a complete source of information about design and processes in a highly scalable environment. Some of the topics are probably well known, but it’s hard to find all this information in a single place.

To be fair, it has 524 pages, so it’s not a fast read. It took me a few months, but I keep it around for when I need to explain concepts like how to size and measure load in a services environment. SLAs and SLOs, and how to use them properly to manage and measure risk, are well explained, as is circuit breaking. More generally, the book explains a lot of good practices around resiliency, teamwork and delivery.

There is a nice chapter about how to use metrics to set up a functional, smart alerting system that keeps on-call engineers in a safe and comfortable environment.

Another one covers how Google designs resilient applications and how they size their services. How much and how deeply they know their services impressed me a lot.

Site Reliability Engineering is a good mix of concepts that you can apply in your day-to-day not-at-Google job and all the Google-scale “freaky fun”.

So, in the end, I would define it as the bible for an engineer who wishes to work in a highly scalable environment. It doesn’t matter if you are not there yet or if you won’t ever serve millions of requests per second. It’s good to read and to keep around.

The HTML version of this book is now available online for free.
