When dealing with alerts in a SOC, it is often unclear what actions to take when a specific alert is triggered. Incident Response playbooks help reduce some of the confusion arising from such scenarios. But what are IR playbooks, and how does one create them? Let's learn in this room.
Objectives
In this room, we will be able to answer the following questions:
- What are IR playbooks, and how are they used?
- How do we capture the different steps of the IR process in a playbook?
- What roles and responsibilities are assigned to the different members of the SOC in a playbook?
Prerequisites
Before starting this room, please ensure that you have completed the following:
- Intro to IR and IM room
- Incident Response Module
- Advanced ELK Queries room
As organisations mature, they also try to document the scenarios people are expected to encounter in their daily work routines and the steps they should take when those scenarios arise. In most organisations with a mature SOC, processes govern everything, defining the roles and responsibilities of different people. In a SOC, since the primary routine is monitoring alerts, these processes primarily govern how to respond to different alerts.
The Incident Response Process
The Incident Response process is the primary governing process for responding to incidents. Although this process differs from one organisation to another in semantics, most functional parts of the process remain similar. This process defines different metrics that govern how incidents are responded to. Mainly, an incident response process will include the following:
- A RACI matrix that defines the roles and responsibilities of different people.
- An escalation matrix that defines how and when incidents are communicated up the escalation ladder to the management.
- A severity matrix that defines the criteria for assigning severity to an incident.
- The procedures for handling crises from a cyber security standpoint.
In essence, it ensures that when an incident happens, the organisation is prepared to handle it, and people don't have to think on their feet about what to do next. Although sometimes crises might dictate deviating from the written process, it helps to have the confidence that one knows what to do when an incident occurs.
The Need for Playbooks
Since organisations already have an incident response process, why are playbooks required? The difference is mainly in the granularity of the response process. The IR process defines the high-level steps needed during an incident. It superficially mentions the stages of the IR process and does not dig into details about what steps must be taken during each stage. For example, the IR process might mention that containment and eradication actions will be taken at a particular stage without going into details about containment and eradication.
In a crisis, differences of opinion might arise regarding what steps to take to contain and eradicate a threat. While some might prefer searching on one portal for threat intelligence related to a particular artefact, others might have more confidence in another portal, and others might believe it's unnecessary to search on any portal altogether.
To avoid such conflicts, processes that define in such granularity what steps might be taken for each type of incident are needed. We can call these processes IR playbooks. In short, playbooks are processes that define in detail what granular steps need to be taken for each different type of alert that we receive. So, there should be a different playbook for each type of incident: phishing, malware, account compromise, policy violation, ransomware, or other types of incidents.
Use Cases and Playbooks
When monitoring is set up in a SOC, one major part is building use cases. Since most organisations produce terabytes of logs daily, it is impossible to properly view and analyse them manually. So, use cases are built to flag suspicious activity that needs further investigation. Once the use case is triggered, the analyst analyses the logs which triggered the use case and decides what actions must be taken. These actions are written in the playbooks. Therefore, we can reasonably assume that every use case created by the detection engineering team needs to have a playbook assigned to it. However, while the detection engineering team might develop hundreds of use cases, only a dozen or so playbooks might be created for them. Each use case is assigned exactly one playbook, but multiple use cases can trigger the same playbook, so a single playbook maps to many use cases (a one-to-many mapping from playbooks to use cases).
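The relationship between use cases and playbooks can be sketched as a simple lookup table. This is only an illustration: the use-case names and the helper functions below are hypothetical, loosely based on the example use cases discussed later in this room.

```python
from collections import defaultdict

# Hypothetical use-case names mapped to the playbook each one triggers.
# Every use case points at exactly one playbook; several share the same one.
USE_CASE_TO_PLAYBOOK = {
    "email_from_recently_created_domain": "phishing",
    "email_with_known_malicious_link": "phishing",
    "email_with_suspicious_attachment": "phishing",
    "edr_marked_process_malicious": "malware",
    "file_executing_from_temp_directory": "malware",
}


def playbook_for(use_case: str) -> str:
    """Return the playbook assigned to a use case, or raise if none is assigned."""
    try:
        return USE_CASE_TO_PLAYBOOK[use_case]
    except KeyError:
        raise LookupError(f"no playbook assigned to use case {use_case!r}") from None


def use_cases_by_playbook() -> dict[str, list[str]]:
    """Invert the mapping to see which use cases feed each playbook."""
    grouped = defaultdict(list)
    for use_case, playbook in USE_CASE_TO_PLAYBOOK.items():
        grouped[playbook].append(use_case)
    return dict(grouped)
```

Raising an error for an unmapped use case makes the gap visible, which matches the idea that every use case should have a playbook assigned to it.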
Automating Playbooks
So far, we have identified that playbooks contain repeatable steps to follow whenever a use case triggers an alert. Depending on the organisational dynamics, some of these steps might be automated. The first step in automating something is often to divide it into small, repeatable steps, which the playbooks already do. Therefore, a SOC team might use a SOAR platform to automate playbooks partially or completely, relieving a lot of pressure on the analysts and making their lives easier. However, this process often takes time, and not everything can be automated. So, different organisations are at varying levels of automation at any given time, depending on their resources and goals. Nevertheless, having well-defined playbooks helps organisations take the next step towards automation and reduce alert fatigue for the analysts.
Creating playbooks, use cases, and an IR plan makes it much simpler for the analysts to understand what to do in any scenario. They have documents to help them perform their jobs effectively, which means that the SOC's output is consistent, and there is little room for ambiguity. Playbooks also help the SOC teams identify opportunities to automate repetitive tasks and make their jobs easier. In the coming tasks, we will discuss the playbooks in a way that helps us follow them and equips us to create them when the need arises.
Naturally, playbooks are designed to follow the steps of the IR process. Therefore, most organisations divide playbooks by the different steps of the IR process. Preparation is the first step, so let's start with that.
Preparation
Most organisations don't include this stage of the IR process in their playbooks. Since preparation is the stage of IR before the detection or identification of an incident, and a playbook is triggered after an incident is detected, it makes sense that this part is not present in most organisations' playbooks. However, this does not mean that no preparation needs to be done regarding playbook development. This phase is usually covered in the prerequisites part of the playbook.
Prerequisites of Playbooks
Like the IR process, the prerequisites of playbooks are created to ensure that the capability of detecting, investigating, and responding to an incident is available. Without this capability, the playbook might never trigger. As described previously, each playbook refers to a few use cases that will be the starting point of the playbook. Therefore, triggering use cases also need to be noted in the prerequisites of the playbook. In short, the following points should be ensured before a playbook is created, and these prerequisites can be indicated at the start of the playbook.
- All relevant logs are present and integrated into the SIEM.
- The logs are appropriately parsed, and the required fields are extracted and searchable.
- The logs contain all the required fields, such as machine information, IP information, process name, and more.
- Use cases are created on the specific behaviour that needs to be flagged and responded to.
- The recommended security controls are applied, representing the organisation's policies.
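As a rough illustration of the "required fields" prerequisite above, the following sketch checks whether a parsed log event carries the fields a use case depends on. The field names and the sample event are assumptions made for this example, not a standard schema.

```python
# Field names below are illustrative assumptions, not a standard schema.
REQUIRED_FIELDS = {"host_name", "source_ip", "process_name"}


def missing_fields(event: dict, required: set[str] = REQUIRED_FIELDS) -> set[str]:
    """Return the required fields that are absent or empty in a parsed log event."""
    return {name for name in required if event.get(name) in (None, "")}


# Hypothetical parsed event in which the process name was not extracted.
sample_event = {"host_name": "ws-01", "source_ip": "10.0.0.5", "process_name": ""}
print(missing_fields(sample_event))
```

A check like this could be run against a sample of events from each log source before a playbook is published, so that parsing gaps are found before an incident rather than during one.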
Prerequisites for Phishing Playbook
Now, let's use the above-mentioned general prerequisites to create more specific prerequisites for an example phishing playbook.
Relevant Logs |
|
Required Fields |
|
Possible Use Cases |
|
Recommended Security Controls |
|
These sample prerequisites can change based on an organisation's requirements, but they will mostly remain similar to the ones we mentioned here. If necessary, the organisation might add other security policies, such as vulnerability and patch management, recommended security baselines, etc., to the prerequisites for the playbooks.
Prerequisites for Malware Playbook
Similar to the above, we can also list the prerequisites for an example malware playbook.
Relevant Logs |
|
Required Fields |
|
Possible Use Cases |
|
Recommended Security Controls |
|
As with the phishing playbook, these sample prerequisites can be changed based on the organisation's requirements, and they might add general security guidelines based on their policies.
The prerequisites explained what our use cases require to trigger an incident. Once the incident is triggered, the detection and analysis part of the IR process will start. From here on, playbooks contain detailed information on granular steps to perform for verification of the incident.
Workflow Diagrams
Playbooks often contain workflow diagrams that help users understand the process. This workflow diagram can be the first step when creating a playbook, which can be explained and expanded to add more details. An example workflow diagram of this stage of the IR process will look like the following illustration.
We can add further details in this diagram as per the detailed steps of the playbook, making a separate workflow diagram for each playbook.
Detection and Analysis Checklist
As a checklist, we can ensure the following points are considered in the detection and analysis part of the playbook:
- Alert trigger
- Initial verification of data from the logs
- Verify potential IOC data (hashes, IP addresses, domain names, etc.) from e.g. OSINT, threat intelligence feeds, internal documentation
- Verify metadata of the IOCs (parent process, command line instructions, domain age, open ports, etc.) to understand the context
- Depending on the results of the above investigation, either close the incident or escalate it for containment, eradication and recovery
Example From Phishing Playbook
Let's use the above checklist to see how we can extrapolate that to create the detection and analysis part of the phishing playbook.
Detection

For phishing, the playbook can be triggered either through an alert on the SIEM (often relayed by the email security gateway or in-house use cases) or through a user-reported phishing email. Regardless of where the alert is triggered, the next steps for a phishing email will always be the same, and the trigger is the starting point of our phishing playbook workflow. The following are a few use cases that might be used as triggers for a phishing playbook, but the list is non-exhaustive.
- Email from a recently created domain
- An email containing a known malicious link
- An email containing a suspicious attachment file type (rar, zip, 7z, exe, chm, htm, ps1, etc.)
- Email from a domain with a bad reputation
As we mentioned, multiple use cases can have a single playbook; therefore, all of these use cases will trigger a single phishing playbook.
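The trigger use cases above can be sketched as simple checks over a summary of an email. This is a minimal sketch, not how an email gateway or SIEM rule is actually written: the `EmailSummary` structure, the use-case labels, and the 30-day threshold for a "recently created" domain are all assumptions made for illustration. The attachment extensions are the ones listed above.

```python
from dataclasses import dataclass, field
from pathlib import PurePath

# Extensions taken from the use case above; the 30-day threshold is an assumption.
SUSPICIOUS_EXTENSIONS = {".rar", ".zip", ".7z", ".exe", ".chm", ".htm", ".ps1"}
RECENT_DOMAIN_DAYS = 30


@dataclass
class EmailSummary:
    """Hypothetical, already-enriched view of one email."""
    sender_domain_age_days: int
    attachment_names: list[str] = field(default_factory=list)
    has_known_malicious_link: bool = False
    sender_domain_bad_reputation: bool = False


def triggered_use_cases(mail: EmailSummary) -> list[str]:
    """Return the labels of all use cases this email would trigger."""
    hits = []
    if mail.sender_domain_age_days < RECENT_DOMAIN_DAYS:
        hits.append("recently_created_domain")
    if mail.has_known_malicious_link:
        hits.append("known_malicious_link")
    if any(
        PurePath(name).suffix.lower() in SUSPICIOUS_EXTENSIONS
        for name in mail.attachment_names
    ):
        hits.append("suspicious_attachment_type")
    if mail.sender_domain_bad_reputation:
        hits.append("bad_reputation_domain")
    return hits
```

Whichever of these checks fires, the same phishing playbook is started; the list of hits only gives the analyst context about why.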
Analysis
Once the playbook is triggered, the analyst must verify the initial data. At this stage, the analyst will determine the following information from the logs and email headers:
- Identify the sender and recipient(s).
- Build context from the email subject, recipient's designation and department, and email body (if available).
- Identify the sender's IP address and domain.
- Extract any URLs, attachments, or QR codes from the email. Check if the QR code leads to a URL or IP address.
- Check threat intel sources for the reputation of the sender's email address, domain, and IP address. Also, check how old the domain is and whether it can be a typosquatting domain.
- Check if the extracted URLs, attachments, or QR codes are marked as malicious on VirusTotal, Hybrid Analysis, URLScan or other such platforms. Do this for all artefacts extracted from the email.
- Check if the URLs contain any credential phishing platform.
- Hunt for any suspicious logins from the email recipients to identify users who might have been phished.
Based on the results from the above analysis, the analyst will conclude whether the email is a phishing or a clean email. If the analyst identifies the email as phishing, they might escalate it to the next step. Otherwise, the incident will be closed here.
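The first few analysis steps (identifying the sender, the recipients, and any URLs) can be sketched with Python's standard `email` package. This is a simplified sketch under several assumptions: it reads only `text/plain` parts, uses a basic regular expression for URLs, and does not handle attachments or QR codes. The function name `extract_artefacts` is hypothetical.

```python
import re
from email import message_from_string
from email.utils import getaddresses, parseaddr

# Deliberately simple URL pattern; real emails may need a more careful parser.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def extract_artefacts(raw_email: str) -> dict:
    """Pull the sender, recipients, subject, and URLs out of a raw email."""
    msg = message_from_string(raw_email)
    sender = parseaddr(msg.get("From", ""))[1]
    recipients = [addr for _, addr in getaddresses(msg.get_all("To", []))]

    # Collect the decoded text of every plain-text part.
    body_parts = []
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True)
            if payload:
                charset = part.get_content_charset() or "utf-8"
                body_parts.append(payload.decode(charset, errors="replace"))
    body = "\n".join(body_parts)

    return {
        "sender": sender,
        "sender_domain": sender.rpartition("@")[2],
        "recipients": recipients,
        "subject": msg.get("Subject", ""),
        # Strip trailing punctuation that often follows a URL in prose.
        "urls": [url.rstrip(".,;)") for url in URL_PATTERN.findall(body)],
    }
```

The extracted values are then what the analyst would look up on threat intelligence sources in the later steps of the checklist.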
Example From Malware Playbook
Similar to the above, we can generate the detection and analysis part of an example malware playbook.
Detection
A malware detection will often trigger a malware playbook. Sometimes, malware playbooks can be further split into different playbooks such as ransomware, infostealer, process injection, and more. However, for this room, we will keep it generic. Some use cases that might trigger the malware playbook are as follows:
- EDR marked a process as malicious
- Suspicious file executing from temp directory
- Browser executing a suspicious script engine such as PowerShell or CMD
Although we can have separate playbooks for some types of malware, for the sake of this room, we will consider a single malware playbook that covers all these cases.
Analysis
Once any of the above-mentioned use cases get triggered, the analyst will perform the following steps to verify the alert:
- Identify the process that triggered the alert.
- To identify the binary's reputation, check the process's hash on VirusTotal, Hybrid Analysis, or similar platforms.
- On VirusTotal, check if the process is marked as safe, distributed by a known vendor, or if it is signed by that vendor. Important Note: Do not upload the binary to VirusTotal or any other third-party platform without consulting your management.
- If the process is clean but was used to execute a file (such as an MS Word file, PDF file, or PowerShell script), analyse the executed file using VirusTotal, Hybrid Analysis, or other similar platforms.
- Check the parent process of the process that triggered the alert. Using the above steps, see if the parent process is clean or malicious.
- Identify how the process/file landed on the affected system. Check network or proxy logs to identify suspicious downloads or email logs to identify if the file was delivered through a phishing email.
- If a phishing email was involved, trigger the phishing playbook.
- Execute the malware in a private sandbox (if available) to understand its behaviour.
- Note the malware's activities in a document, which will be used for later containment actions.
- Preserve evidence of malicious activity, for example by taking memory, disk, or triage images (if required), without turning off the affected systems.
- Perform forensic analysis of the collected forensic data to identify further IOCs.
- Perform an organisation-wide threat hunt to identify all affected machines.
Based on the above steps, the analyst can either close the incident if it is a False Positive or escalate it to the next level if it is a True Positive.
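Since the checklist above recommends looking up a hash rather than uploading the binary, it helps to be able to compute that hash locally. The following is a minimal sketch using Python's standard `hashlib`; the function name and the chunk size are arbitrary choices for this example.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Compute the SHA-256 of a file in chunks, so large binaries fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read until an empty bytes object signals end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The resulting hex string can be searched on VirusTotal, Hybrid Analysis, or a similar platform without the file itself ever leaving the organisation.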
The analyst will decide whether to close or escalate the incident based on the analysis results in the previous phase. If the incident is closed as a False Positive, the analyst might send the use case for fine-tuning to avoid such False Positives. If it is a True Positive, it might just be an authorised or expected activity, which the analyst will verify with the initiator of the activity, or it will be a True Positive that needs remediation. In the latter case, the incident will need to be escalated.
Escalation Process
In most organisations, the L1 analyst will work on the identification and analysis part of the IR process, also called triage. If the incident is a False Positive, it will be closed. However, it will be escalated to the L2 analyst if it is a True Positive. The L2 analyst will take it through the containment, eradication, and recovery part, taking help from the L3 analyst where required, mainly for forensic analysis, malware analysis, or other advanced analysis. The L3 analyst or the incident responder will also create and maintain the playbooks, updating them as required. Most organisations follow the process defined above. However, each organisation might make a few changes to the process to suit its needs.
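The triage decision described above can be written down as a small decision function. This is only a sketch of the logic in this task: the `Verdict` names and the wording of the returned actions are made up for illustration, and closing an incident after verifying authorised activity is an assumption about how such cases are usually handled.

```python
from enum import Enum


class Verdict(Enum):
    FALSE_POSITIVE = "false positive"
    AUTHORISED_ACTIVITY = "true positive, authorised or expected activity"
    NEEDS_REMEDIATION = "true positive, needs remediation"


def next_action(verdict: Verdict) -> str:
    """Map the L1 analyst's triage verdict to the next step."""
    if verdict is Verdict.FALSE_POSITIVE:
        return "close incident; send use case for fine-tuning"
    if verdict is Verdict.AUTHORISED_ACTIVITY:
        # Assumption: once the initiator confirms the activity, no escalation is needed.
        return "verify with the initiator, then close incident"
    return "escalate to L2 for containment, eradication, and recovery"
```

Spelling the decision out this way is also a useful first step if the escalation is later automated on a SOAR platform.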
Checklist for Containment, Eradication, and Recovery
The IR process's containment, eradication, and recovery phase is only triggered when the incident is a True Positive. Therefore, the checklist here will focus mainly on limiting and reversing the impact of the incident. We can chalk out the following checklist for this phase of the IR process:
- Identify the root cause of the incident.
- Identify the impact and the affected assets.
- Contain the threat by isolating the affected assets and limiting connectivity.
- Perform actions to remove the impact of the threat from the affected assets.
- Bring the assets back to the last known good configuration.
- Resume services as usual for all the affected assets.
Example From Phishing Playbook
Let's use the above checklist to build the phishing playbook's containment, eradication, and recovery part.
Containment
Once the analyst has identified that the phishing email is a True Positive, they must take steps to contain the threat. These steps need to be spelt out in the playbook to avoid ambiguity. The following steps can be taken using the above checklist to contain a phishing email.
- Extract artefacts from the phishing email, such as IP addresses, email addresses, file hashes, domains, etc.
- Block the sender's email address, domain, and IP address on the email gateway.
- Block file hashes on the EDR.
- Block phishing links on the web proxy so that no outbound connection to those links is possible.
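The containment steps above amount to sorting the extracted artefacts by the security control that should block them. A minimal sketch of that grouping follows; the dictionary keys and the function name `containment_blocks` are hypothetical, and the actual blocking would be done through each product's own interface, which is not shown here.

```python
def containment_blocks(artefacts: dict[str, list[str]]) -> dict[str, list[str]]:
    """Group phishing artefacts by the control that should block them."""
    gateway_items = (
        artefacts.get("sender_addresses", [])
        + artefacts.get("domains", [])
        + artefacts.get("ip_addresses", [])
    )
    return {
        # Sender address, domain, and IP address go to the email gateway.
        "email_gateway": sorted(set(gateway_items)),
        # File hashes go to the EDR.
        "edr": sorted(set(artefacts.get("file_hashes", []))),
        # Phishing links go to the web proxy.
        "web_proxy": sorted(set(artefacts.get("urls", []))),
    }
```

Deduplicating and sorting the entries keeps the block lists stable, which makes them easier to review before they are applied.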
Eradication
The previous steps will ensure that the threat is contained. However, if people already interacted with the email, further steps might be needed to eradicate the threat. They can be as follows.
- Remove the phishing emails from the inbox of all the affected users.
- If someone has already clicked on the link or downloaded the attachment, isolate their machine from the network, revoke their active logon sessions, reset their credentials, and initiate the malware playbook (if required).
Recovery
Once the threat has been eradicated, the affected systems and user accounts must be returned to normal service. The following steps might be taken to ensure this happens smoothly.
- Reset credentials of all the affected accounts and ensure that MFA is enabled and strong passwords are created as per policy.
- Audit the activity of the affected user accounts and reverse any suspicious activities that the malicious actor might have carried out.
- Reimage any machines that might have been affected by malicious attachments and restore them to their last known good configuration.
Example From Malware Playbook
Similarly, we can extrapolate the checklist from earlier in the task to flesh out this phase of the malware playbook.
Containment
The following steps can outline an effective containment strategy for malware.
- Isolate all the affected systems from the network.
- Revoke sessions of all the accounts logged in to the affected systems.
- Block any communication to potential C2 servers.
Eradication
Once the malware threat is contained, the following steps might be taken to eradicate it.
- Shut down the services run by the malicious process.
- Remove the binary of the malicious process from the affected systems, if possible.
- If required, reimage the affected systems.
- Run a complete EDR scan on the machines to ensure malware removal.
- Reset the credentials of all the affected user accounts.
Recovery
After eradication of the malware, the following steps can be taken for full recovery.
- Enable MFA on all affected accounts and ensure strong passwords are created as per policy (to cover for infostealers/spyware malware).
- Audit the malware's activity and revert its changes in the affected systems (e.g., adding database entries, changing a webpage or other such changes).
- Return the affected machines to the last known good configuration and resume services.
It might be noted that the containment, eradication, and recovery phases often overlap. Hence, we sometimes group them in a single stage of the IR process.
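To keep playbooks consistent with each other, their steps can be stored in a common structure keyed by IR phase. The sketch below is one possible way to do that and is not a standard playbook format: the `Playbook` class and the phase labels are assumptions, and the example steps are taken from the malware playbook above.

```python
from dataclasses import dataclass, field

# Phases covered by the playbooks in this room, in the order they are followed.
PHASES = ("detection", "analysis", "containment", "eradication", "recovery")


@dataclass
class Playbook:
    name: str
    steps: dict[str, list[str]] = field(default_factory=dict)

    def add_step(self, phase: str, step: str) -> None:
        """Append a step to a phase, rejecting phases the playbook does not cover."""
        if phase not in PHASES:
            raise ValueError(f"unknown phase: {phase}")
        self.steps.setdefault(phase, []).append(step)

    def checklist(self) -> list[str]:
        """Render all steps in phase order, numbered within each phase."""
        lines = []
        for phase in PHASES:
            for number, step in enumerate(self.steps.get(phase, []), start=1):
                lines.append(f"[{phase}] {number}. {step}")
        return lines


malware = Playbook("malware")
malware.add_step("recovery", "Return the affected machines to the last known good configuration.")
malware.add_step("containment", "Isolate all the affected systems from the network.")
malware.add_step("containment", "Block any communication to potential C2 servers.")
print("\n".join(malware.checklist()))
```

Because the checklist is rendered in phase order regardless of the order in which steps were added, the containment steps are listed before the recovery step.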
Post-incident activity is the last stage of the IR process. The steps to be taken at this stage vary from incident to incident and will depend on the gaps identified during the process. The following questions are addressed during post-incident activity.
- Why did the incident happen? (The 5 Whys method can help reach the root cause.)
- What gaps were identified, which, if plugged, would have helped avoid the incident?
- How can we improve People, Processes, and Technology to avoid such incidents in the future?
- What steps, if taken, could have minimised the impact of the incident?
As might be evident, this activity will be highly subjective based on the organisational dynamics, the type of incident, the impact, and the gaps found during the incident investigation. Therefore, it isn't easy to outline specific steps in playbooks that might be taken as part of this stage of the IR process. Hence, this part is generally not added to the playbooks. Instead, the post-incident activity is guided by broad guidelines outlined in the IR plan.
It might also be noted that the scope and guidelines of post-incident activity might differ based on the incident severity, the impact on the organisation, and the organisation's IR process. It is common for most organisations to perform post-incident activity only for high and critical-severity incidents. For these reasons, it is generally only added to the high-level IR plan and not to the playbooks. If required for an incident, the playbook can refer to the IR plan for initiating post-incident activity once recovery from the incident is complete.
What is the last stage of the IR process?
Now that we have a pretty good idea of how playbooks work and how we might even be able to create them, let's put them into practice. We have logs from a machine infected by malware in the attached VM. Let's see if we can use the guidelines we laid out for the malware playbook to answer the questions below.
Setting up and Connecting to the Machine
Start the virtual machine by clicking on the green Start Machine button below.
Let the VM load for around 3 minutes, as it will run in the background.
To access the dashboard, you can do it in one of the following ways:
- Connect via OpenVPN (more info here), then type the machine's IP http://MACHINE_IP in your browser's address bar.
- Follow the link https://LAB_WEB_URL.p.thmlabs.com/ using your browser.
You'll be presented with the Kibana login screen. Enter elastic for the username and elastic for the password.
Scenario
Deer Inc. is working on research that is of national importance. Therefore, they remain in close contact with the national CERT, which also has a sensor in its network. The sensor can identify network anomalies, which are then communicated to Deer Inc. They have received a communication from the national CERT regarding suspicious network activity detected in their network. The following are the details shared by the national CERT:
Date | Destination IP | Source IP | Source Port | Destination Port |
2024-08-28 | 171.25.193.9 | 192.168.64.4 | 59230 | 80 |
We have an instance of Elasticsearch and Kibana in the attached VM, which will act as an SIEM solution in this investigation.
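As a starting point for the investigation, the details shared by the national CERT can be turned into a Kibana Query Language (KQL) search. The sketch below builds such a query string. The field names (`source.ip`, `destination.port`, and so on) follow the Elastic Common Schema and are an assumption; the index in the attached VM may use different names, so check the available fields in Kibana first. The helper `build_kql` is hypothetical.

```python
def build_kql(filters: dict) -> str:
    """Join field/value pairs into a KQL query, quoting non-numeric values."""
    clauses = []
    for field_name, value in filters.items():
        rendered = str(value) if isinstance(value, int) else f'"{value}"'
        clauses.append(f"{field_name}: {rendered}")
    return " and ".join(clauses)


# Values from the CERT communication; field names are assumed (ECS-style).
cert_report = {
    "source.ip": "192.168.64.4",
    "source.port": 59230,
    "destination.ip": "171.25.193.9",
    "destination.port": 80,
}
print(build_kql(cert_report))
```

The printed query can be pasted into the Kibana search bar, with the time picker set to cover 2024-08-28, the date given in the report.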
Is this process malicious, as per VirusTotal? y/n
What is the name of the parent process of this process?
This process's parent was launched by another process, which is a notorious ransomware. Which ransomware is that?
Which playbook should be followed to respond to this incident?
Is this incident an FP (False Positive) or a TP (True Positive)?
In case the incident is a TP, what will be the next step in the IR process?
And that's a wrap. In this room, we learned about playbooks, especially:
- What playbooks are, and what their place in the documentation and processes hierarchy of an organisation is.
- How the different stages of the IR process are mapped to action items in a playbook.
- How we can use a playbook to investigate an incident.
Let us know what you think about this room on our Discord channel or Twitter account. See you around!