What is a runbook?
Operations teams often lose time to incidents that recur in day-to-day operations. Even when an incident has a simple fix, the engineer handling it may be unfamiliar with it and spend many productive hours finding the solution. This is where a runbook comes into play.
A runbook is a set of guidelines for handling incidents and keeping systems running smoothly. It documents the experiential knowledge of the firm's engineers and is designed to help new engineers work through the bottlenecks they face in day-to-day operations.
Runbooks help teams tackle specific issues by drawing on the knowledge of subject matter experts (SMEs). A well-made runbook reduces the need for escalations by making the team self-sufficient. This in turn reduces downtime and improves the productivity of both the team and its management.
Types of runbooks
Runbooks are primarily classified by their level of automation:
- Manual: Step-by-step instructions that an operator performs by hand.
- Semi-automated: Some steps are automated, and an operator performs the rest.
- Automated: Every step runs without human effort.
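The three automation levels can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the `Step` class, the `automation_level` and `run_runbook` functions, and the example steps are illustrative names, not part of any real runbook tool. It assumes an automated step is a callable and a manual step is one the operator confirms through a `confirm` callback.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Step:
    """One runbook step (hypothetical structure for illustration)."""
    description: str
    action: Optional[Callable[[], None]] = None  # None means the step is manual


def automation_level(steps: List[Step]) -> str:
    """Classify a runbook by how many of its steps are automated."""
    automated = sum(1 for s in steps if s.action is not None)
    if automated == 0:
        return "manual"
    if automated == len(steps):
        return "automated"
    return "semi-automated"


def run_runbook(steps: List[Step], confirm: Callable[[str], bool]) -> List[str]:
    """Run steps in order; stop at the first manual step the operator does not confirm."""
    log = []
    for step in steps:
        if step.action is not None:
            step.action()  # automated step: no human effort needed
            log.append(f"auto: {step.description}")
        elif confirm(step.description):
            log.append(f"manual: {step.description}")
        else:
            log.append(f"stopped: {step.description}")
            break
    return log


# Hypothetical example: one automated step and one manual step.
steps = [
    Step("Clear the temporary cache", action=lambda: None),
    Step("Verify the dashboard shows healthy status"),
]
print(automation_level(steps))
```

In an interactive session, `confirm` could wrap `input()` so that the operator acknowledges each manual step before the runbook continues.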
Runbooks can also be divided by function:
1. General: For routine operations.
2. Specialized: For specific complex operations or incidents.
What is the difference between a runbook and a playbook?
Though the names are often used interchangeably, the two differ significantly. The basic difference is scope: a playbook generally covers much more ground than a runbook and may even contain multiple runbooks.
If a playbook is a set of instructions to dismantle a car, a runbook would be the section dedicated to dismantling the engine alone.
How do you create your own runbook?
The first step in creating a general runbook is to identify the bottlenecks your team faces in regular operations and the easiest way to resolve each of them.
For specialized runbooks, an in-depth post-mortem of past incidents improves the quality of the runbook.
Once written, the runbook should be tested and updated where necessary.
Typically, a runbook contains:
- Overview: A brief overview of the process.
- Authorization: Who gets what level of access to the runbook.
- Steps: Steps required to complete the process.
- Monitoring system information: Specifies all the monitoring alerts that could be triggered and instructions to mitigate them.
- DR (disaster recovery) plans: The applicable SLAs, protocols, and instructions for reporting and communication.
- Technical documentation: Critical system information, configurations, metrics, etc.
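One way to keep these sections consistent is to treat them as a template and check each runbook against it. The sketch below is only an assumption about how that could look: the `Runbook` class, its field names, and the `missing_sections` helper are hypothetical and simply mirror the list above.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class Runbook:
    """Hypothetical template mirroring the sections listed above."""
    overview: str = ""
    authorization: str = ""
    steps: List[str] = field(default_factory=list)
    monitoring: str = ""
    dr_plans: str = ""
    technical_documentation: str = ""


def missing_sections(runbook: Runbook) -> List[str]:
    """Return the names of sections that are still empty, in template order."""
    return [f.name for f in fields(runbook) if not getattr(runbook, f.name)]


# Hypothetical draft with only two sections filled in.
draft = Runbook(
    overview="Restart the billing worker",
    steps=["Drain the queue", "Restart the worker", "Confirm jobs resume"],
)
print(missing_sections(draft))
```

For this draft, the check reports `authorization`, `monitoring`, `dr_plans`, and `technical_documentation` as still to be written.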
Keep the following in mind while creating a runbook:
- It should be simple and easy to understand.
- It should be written in language the reader is comfortable with.
- It should be flexible and easily accommodate future amendments.
- It should be tested regularly to confirm that it still works. A runbook that adapts to changes stays up to date and relevant.
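Regular testing is easier to enforce when each runbook records when it was last verified. As a rough sketch, assuming a 90-day review interval and a last-tested date stored alongside each runbook (both are assumptions, as are the runbook names below), stale runbooks could be flagged like this:

```python
from datetime import date, timedelta
from typing import Dict, List

REVIEW_INTERVAL = timedelta(days=90)  # assumed policy; adjust to your team


def stale_runbooks(last_tested: Dict[str, date], today: date) -> List[str]:
    """Return names of runbooks not tested within REVIEW_INTERVAL, sorted by name."""
    return sorted(
        name for name, tested in last_tested.items()
        if today - tested > REVIEW_INTERVAL
    )


# Hypothetical runbook names and last-tested dates.
records = {
    "restart-worker": date(2024, 1, 10),
    "rotate-certificates": date(2024, 5, 1),
}
print(stale_runbooks(records, today=date(2024, 6, 1)))
```

Here only `restart-worker` is flagged, since it was last tested more than 90 days before the chosen date.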