Glossary

Agent

An agent is a program installed on a physical server that executes various server processes.

Agile

Agile is an iterative set of software development best practices. The top priorities in agile software development include software quality assurance, user feedback integration, and the ability to “fail fast” and implement rapid changes as needed.

Alert

An alert notifies an organization of significant changes in its IT environment. Alerts can also indicate when a system has failed.

Alert Aggregation

Alert aggregation is the practice of connecting all IT monitoring tools so that every alert and all incident data can be viewed in one place.
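As a minimal sketch, assume two hypothetical monitoring tools that report alerts in different formats. Aggregation then amounts to normalizing every alert into one common record and merging the results into a single time-ordered stream. The field names (`msg`, `level`, `time`, `text`, `priority`, `epoch`) and the `Alert` record are illustrative assumptions, not any real tool's schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Alert:
    """Common alert record shared by every monitoring source (illustrative)."""
    source: str
    message: str
    severity: str
    timestamp: datetime


def from_tool_a(raw: dict) -> Alert:
    # Hypothetical tool A: ISO-8601 time string (assumed to include a UTC offset).
    return Alert("tool_a", raw["msg"], raw["level"].lower(),
                 datetime.fromisoformat(raw["time"]))


def from_tool_b(raw: dict) -> Alert:
    # Hypothetical tool B: Unix timestamp in seconds, interpreted as UTC.
    return Alert("tool_b", raw["text"], raw["priority"].lower(),
                 datetime.fromtimestamp(raw["epoch"], tz=timezone.utc))


def aggregate(*streams: list[Alert]) -> list[Alert]:
    """Merge normalized alerts from all sources into one list ordered by time."""
    merged = [alert for stream in streams for alert in stream]
    return sorted(merged, key=lambda a: a.timestamp)
```

Normalizing first keeps the merge step independent of how many tools are connected: adding a source means writing one more converter.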

Alert Fatigue

Alert fatigue occurs when many monitoring systems create an abundance of alerts that flood mailboxes. As a result, individual alerts become less meaningful, and IT team members often respond more slowly.

Alert Noise

Alert noise is a high volume of false alarms that makes it difficult to detect and respond to actual, important alerts.

Alert Rule

Alert rules are customized policies created by the user of an IT alerting system. They can be used to normalize the behavior of alerts based on time of day, type of alert and more.
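The sketch below shows one way a time-of-day rule could be evaluated, assuming a hypothetical rule format in which alerts of a given type that arrive inside a time window are reassigned a priority (for example, downgraded overnight). `AlertRule` and `apply_rules` are illustrative names, not the API of any particular alerting product.

```python
from dataclasses import dataclass
from datetime import time


@dataclass
class AlertRule:
    """Hypothetical rule: alerts of `alert_type` received between `start`
    and `end` are given `priority`."""
    alert_type: str
    start: time
    end: time
    priority: str

    def matches(self, alert_type: str, at: time) -> bool:
        if alert_type != self.alert_type:
            return False
        if self.start <= self.end:
            return self.start <= at < self.end
        # The window wraps past midnight, e.g. 22:00 to 06:00.
        return at >= self.start or at < self.end


def apply_rules(rules: list[AlertRule], alert_type: str, at: time,
                default: str = "normal") -> str:
    """Return the priority from the first matching rule, else the default."""
    for rule in rules:
        if rule.matches(alert_type, at):
            return rule.priority
    return default
```

For example, `apply_rules([AlertRule("disk_space", time(22, 0), time(6, 0), "low")], "disk_space", time(23, 30))` returns `"low"`, while the same alert at noon keeps the default priority.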

Analytics

Analytics is the application of statistics, research, and computing to gather insights and meaning from a set of data.

API

An API (application programming interface) acts as an intermediary between different programs, defining how they can request and share data with one another.

Application Release

Application release is a practice in which a software release is deployed across multiple environments and configurations with little to no human interaction.

Artifact

An artifact is a descriptive model used to create software; artifact examples include diagrams and UML models.

Assigned / Acknowledged Incident

An incident that has been assigned or acknowledged means that a specific team or individual has committed to taking accountability for and resolving the issue.

Automation

In IT alerting, automation is the technique of enabling the alerting process or notification system to operate automatically using alert rules.

Autonomy

Autonomy in DevOps is self-governance. An autonomous DevOps team empowers each member to act based on the situation and resources available without the need to defer to a superior.

Behavior-Driven Development

Behavior-driven development is a form of software development that involves ongoing communication between developers, business analysts, quality assurance teams, and other team members. It promotes constant collaboration and helps key stakeholders work together to achieve common software development goals.

Branching

Branching is a version control technique in which a copy of the source code is used to create a separate version of the software. This enables developers to modify the source code simultaneously without interfering with each other's work.

Capacity Test

A capacity test enables a DevOps team to determine the maximum number of end users an application, computer, or server can handle before it crashes.
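A capacity test typically raises the simulated load until the system fails. Assuming a hypothetical `handles(n)` function that runs one load test with `n` simulated users and reports whether the system stayed healthy, a binary search can locate the maximum supported load in relatively few test runs. This is a sketch of the search logic only; the load generator itself is not shown.

```python
from typing import Callable


def find_capacity(handles: Callable[[int], bool], low: int, high: int) -> int:
    """Return the largest user count in [low, high] for which handles(n) is True.

    Assumes handles(low) is True and that once the system fails at some load,
    it also fails at every higher load (a single pass/fail boundary).
    """
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if handles(mid):
            low = mid        # mid users succeeded: capacity is at least mid
        else:
            high = mid - 1   # mid users failed: capacity is below mid
    return low
```

Because each run halves the search range, checking a range of 10,000 user counts needs only about 14 load tests.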

Categorization

Categorization of incidents allows their impact, urgency, and severity to be easily understood and transparent.

Closed Incident

A closed incident is considered fully resolved, and it is confirmed that no additional action by a network operations center (NOC) or incident management team is necessary.

Closure

Closure is the confirmation that no further action needs to be taken on a resolved incident.

Communications Lead

In an incident response team, the communications lead facilitates communication about an incident to parties both inside and outside of the organization.

Complex-Adaptive System

A complex-adaptive system consists of an IT platform or project that includes multiple components. In this system, each component interacts with others in ways that cannot be accurately predicted or controlled.

Configuration Drift

Configuration drift occurs when a hardware or software infrastructure configuration gradually diverges from its intended configuration, for example when a production environment no longer matches its recovery or secondary configuration. It often results from inconsistent configuration changes across a set of computers or devices.
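Drift is usually detected by comparing each host's actual settings with a baseline. The sketch below assumes both are available as flat key/value dictionaries; real configuration management tools use richer formats, so the setting names and data layout here are illustrative assumptions.

```python
def detect_drift(baseline: dict[str, str], actual: dict[str, str]) -> dict[str, tuple]:
    """Compare a host's actual settings with the baseline.

    Returns {setting: (expected, found)} for every setting that differs.
    None marks a setting that is missing on one side.
    """
    drift = {}
    for key in baseline.keys() | actual.keys():
        expected, found = baseline.get(key), actual.get(key)
        if expected != found:
            drift[key] = (expected, found)
    return drift
```

An empty result means the host still matches its baseline; any entries point to the settings that need to be reconciled.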

Configuration Management

Configuration management is a system engineering process for creating and maintaining product consistency. It involves management of a product’s performance, function, and physical attributes relative to its design and requirements.

Containerization

Containerization is the use of virtual software containers that include operating system resources, memory, and services to run an application or service. It often helps a developer test production flows for services deployed in the cloud.

Containment

Containment is the third step in an incident response plan. The goal of this step is to limit the incident's impact quickly, typically with a temporary fix, before the threat is fully removed during eradication.

Continuous Delivery (CD)

Continuous delivery (CD) is a software engineering approach that uses short, frequent cycles to produce software that can be reliably released at any time.

Continuous Deployment

Continuous deployment utilizes automated software code testing. If code passes the automated test, the software automatically moves into a production environment.
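The gating logic of continuous deployment can be sketched as follows, assuming the automated tests and the deployment step are supplied as plain functions. `run_pipeline` is a hypothetical name; real pipelines are defined in a CI/CD tool's own configuration format.

```python
from typing import Callable


def run_pipeline(tests: list[Callable[[], bool]], deploy: Callable[[], None]) -> bool:
    """Run every automated test; deploy to production only if all of them pass.

    Returns True if the build was deployed.
    """
    for test in tests:
        if not test():
            return False  # a failing test stops the release automatically
    deploy()  # no human approval step: code that passes ships immediately
    return True
```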

Continuous Integration (CI)

Continuous integration (CI) is a software engineering practice in which developers frequently merge their code changes into a single shared repository. Each merge is then used to trigger automated software builds and tests.

Continuous Quality

Continuous quality is the integration of software quality reviews into the CD pipeline. It requires quality assurance team members to review software code as soon as it becomes available, and address any potential code issues during the software development cycle.

Continuous Testing

Continuous testing is the execution of automated tests as part of the software delivery pipeline. It enables a DevOps team to get feedback for identifying potential risks before software is publicly released.

Cyber-Attack

A cyber-attack is an attempt by hackers to penetrate a business’s IT networks or systems.

Data Breach

A data breach is the intentional or unintentional release of private information to an untrusted recipient. Data breaches can cause incidents.

Data Enrichment

Data enrichment is the process of merging data from a third-party source with an existing database. Brands implement data enrichment to enhance their data, improve data accuracy, and make more informed decisions.
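A minimal sketch of enrichment, assuming existing customer records and a hypothetical third-party dataset keyed by email address. The policy of letting existing values win when both sources define the same field is an illustrative choice; some teams prefer the opposite.

```python
def enrich(records: list[dict], third_party: dict[str, dict], key: str = "email") -> list[dict]:
    """Return copies of records with matching third-party fields merged in.

    When both sources define the same field, the existing record's value is kept.
    """
    enriched = []
    for record in records:
        extra = third_party.get(record.get(key), {})
        enriched.append({**extra, **record})  # record's own fields take precedence
    return enriched
```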

Deduplication

Deduplication refers to the elimination of duplicate or redundant alerts received by monitoring systems.
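One common approach is to give each alert a fingerprint and fold repeats that arrive within a short window into the first occurrence. The sketch below assumes alerts are dictionaries with hypothetical `source`, `check`, and `time` fields, and a five-minute window; both the fingerprint and the window length are illustrative choices.

```python
from datetime import datetime, timedelta


def deduplicate(alerts: list[dict], window: timedelta = timedelta(minutes=5)) -> list[dict]:
    """Collapse repeated alerts into the first occurrence.

    An alert is a duplicate if its (source, check) fingerprint matches an alert
    kept no more than `window` earlier. Alerts must be sorted by "time".
    Each kept alert carries a "count" of how many raw alerts it represents.
    """
    last_kept: dict[tuple, dict] = {}
    kept: list[dict] = []
    for alert in alerts:
        fingerprint = (alert["source"], alert["check"])
        previous = last_kept.get(fingerprint)
        if previous is not None and alert["time"] - previous["time"] <= window:
            previous["count"] += 1  # duplicate: fold it into the kept alert
            continue
        first = {**alert, "count": 1}  # copy so the input list is not mutated
        last_kept[fingerprint] = first
        kept.append(first)
    return kept


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 12, 0)
    raw = [
        {"source": "web01", "check": "cpu", "time": t0},
        {"source": "web01", "check": "cpu", "time": t0 + timedelta(minutes=2)},
        {"source": "web02", "check": "cpu", "time": t0 + timedelta(minutes=3)},
    ]
    for alert in deduplicate(raw):
        print(alert["source"], alert["count"])
```

Keeping a count rather than silently discarding repeats preserves the signal that a problem is recurring while still reducing alert noise.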

Deployment

Deployment comprises all the activities performed to make software available for use, up to its public release.

DevOps

DevOps (development and operations) is a culture in the IT industry that fosters collaboration between developers and operations teams.

DevSecOps

DevSecOps (development, security, and operations) is a “security as code” culture that fosters collaboration between software developers and information security teams.

Diagnosis

Diagnosis of an incident is a formulation of a hypothesis as to what caused the incident. An incident management team may be able to resolve an incident solely based on an initial diagnosis.

Documentation Lead

In an incident response team, the documentation lead documents the timeline of events during the incident response process.

Downtime

Downtime is the period during which a system is unable to perform its primary functions.

Eradication

Eradication is the fourth step in an incident response plan. It aims to completely remove the threat causing an incident.

Escalation

Escalation is bringing an issue to an individual or team with more authority or expertise within an organization. Incidents require escalation when more support is needed to resolve them; teams gather and log incident information to escalate an incident to other team members or executives. For example, if a customer service representative finds an issue that only the IT team can resolve, the issue can be escalated to an IT manager.
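Incident management tools often automate escalation with a time-based policy: if nobody in the current tier acknowledges an incident within a set time, it moves to the next tier. The sketch below assumes a hypothetical policy expressed as an ordered list of tiers; the tier names and timeouts are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    name: str
    ack_timeout_min: int  # minutes this tier has to acknowledge before escalation


def current_tier(policy: list[Tier], minutes_unacknowledged: int) -> Tier:
    """Return the tier that should own an incident that has gone
    unacknowledged for the given number of minutes."""
    elapsed = 0
    for tier in policy:
        elapsed += tier.ack_timeout_min
        if minutes_unacknowledged < elapsed:
            return tier
    return policy[-1]  # past every timeout: the incident stays with the top tier
```

With tiers of 15, 15, and 30 minutes, an incident unacknowledged for 20 minutes would sit with the second tier.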

Event

An event is any occurrence that causes a change in the IT environment.

Event-Driven Architecture

Event-driven architecture is a form of software architecture in which components communicate through events: producers publish events when something changes, and consumers detect and react to those events as they occur.
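The core publish/subscribe idea can be sketched with a minimal in-process event bus. This is an illustration only: production systems usually place a message broker between producers and consumers and deliver events asynchronously, whereas this hypothetical `EventBus` calls handlers synchronously for clarity.

```python
from collections import defaultdict
from typing import Any, Callable


class EventBus:
    """Minimal in-process publish/subscribe bus (illustrative only)."""

    def __init__(self) -> None:
        self._handlers: defaultdict[str, list[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Callable[[Any], None]) -> None:
        """Register a consumer for one type of event."""
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: Any) -> None:
        # The producer does not know which consumers exist; each subscriber reacts.
        for handler in self._handlers[event_type]:
            handler(payload)
```

Because producers only name the event type, new consumers (a pager, a logger, a dashboard) can be added without changing the code that emits the event.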

Event-Triggered Alert

An event-triggered alert is an automated alert sent via email, text, or phone call when a pre-determined event occurs.

Exploratory Testing

Exploratory testing is a strategy that provides human software testers with the ability to analyze different areas of a piece of software. It empowers human software testers with the flexibility to test potential software issues that may otherwise go undetected during automated tests.

HR/Legal Representative

Sometimes, incidents can cause an organization to be charged with a criminal offense. In an incident response team, the HR or legal representative must navigate any legal consequences of an incident.

Identification

Identification is the initial detection of an incident.

Identification / Detection and Analysis

Identification, also known as detection and analysis, is the second step in an incident response plan. In this step, research is done to find the cause of a detected incident.

Impact

Impact measures the effect of an incident on business processes. A high-impact incident may force business processes to come to a halt, whereas a low-impact incident has little or no effect on operations.

Incident

An incident is an unplanned and undesired event that interrupts business operations. Incidents can cause downtime, revenue loss, compliance penalties, and brand reputation damage. Incidents can also affect employees and customers.

Incident Log

An incident log includes the name of the person reporting an incident, the date and time of an incident and other incident details.

Incident Management

Incident management is the process of identifying, analyzing, and addressing the incidents or technical disruptions of a business.

Incident Management Model

An incident management model includes time frames for incident resolution, insight into how to escalate an incident and best practices for preserving data and key performance indicators (KPIs) during an incident.

Incident Management Tool

An incident management tool is used by organizations to both facilitate and improve incident management. Incident management tools can automate escalations, monitoring, and collaboration.

Incident Response Phase

An incident response phase is a stage of an incident response plan. There are generally six incident response phases: preparation, identification, containment, eradication, recovery and lessons learned. Each phase plays an important role in effective incident response.

Incident Response Plan

An IT incident response plan guides IT staff in detecting, understanding, and responding to incidents caused by issues like cybercrime, data loss, and outages.

Incident Response Team (NOC Team)

Incident response or incident management teams, also known as NOC teams, are trained to provide immediate solutions for incidents that disrupt an organization’s operations. An incident management team ensures that an incident is resolved or closed within the predefined time limit specified in a service-level agreement (SLA).

Incident Volume

Incident volume refers to the number of incidents received in a given time period.

Information Security Incident

An information security incident is an adverse event, such as a cyber-attack or insider threat, that negatively impacts an information system or a network. This type of incident poses a threat to the availability, integrity and confidentiality of a system.

Infrastructure-as-a-Service (IaaS)

Infrastructure as a service (IaaS) is a form of cloud computing that provides virtualized computing resources over the internet.

Integration

Integration of alerting tools allows users to streamline their alerts into one space. Integration interconnects data and notifications.

Integration Testing

Integration testing involves the evaluation of multiple software components together. During an integration test, software components are combined and analyzed as a single group.

IT Alerting System

An IT alerting system is a tool used by organizations to mitigate business risks and detect problems in the IT environment. It is a customizable tool that can automate and deduplicate alerts from various integrated monitoring sources, run analytics and create reports.

KPI

A key performance indicator (KPI) is a metric that clearly demonstrates how effectively and efficiently an organization is meeting its objectives. MTTR and MTTF are good KPI examples.

Lead Investigator

In an incident response team, the lead investigator analyzes an incident to find its root cause so that the team may start recovering from the incident and developing preventative measures as soon as possible.

Lessons Learned / Post-Incident Activity

Lessons learned, also known as post-incident activity, is the final step in an incident response plan. In this step, a resolved and closed incident is reviewed to identify steps that can improve a system and help prevent future incidents.

Mass Notification / Manual Paging

Mass notification, or manual paging, delivers information to a group of people by email, text, or phone call.

Mass Notification System (MNS)

A mass notification system (MNS) is a platform that delivers information to a group of people. The system is flexible in its configuration of messages, controls, recipients, and methods of communication.

Mean Time Between Failures (MTBF)

Mean time between failures (MTBF) is commonly used to measure hardware component or system reliability. It is calculated as an average of the time between hardware component or system failures.
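One common formulation divides the total operating (up) time in an observation period by the number of failures in that period. The sketch below assumes outages are recorded as hypothetical `(failed_at, restored_at)` pairs that fall inside the period; this data layout is an illustrative assumption.

```python
from datetime import datetime, timedelta


def mtbf(period_start: datetime, period_end: datetime,
         outages: list[tuple[datetime, datetime]]) -> timedelta:
    """MTBF = total operating (up) time / number of failures in the period.

    Each outage is a (failed_at, restored_at) pair inside the period.
    """
    if not outages:
        raise ValueError("MTBF is undefined when no failures occurred")
    downtime = sum((restored - failed for failed, restored in outages), timedelta())
    uptime = (period_end - period_start) - downtime
    return uptime / len(outages)
```

For example, a system observed for 100 hours with two 2-hour outages has 96 hours of uptime, giving an MTBF of 48 hours.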

Mean Time to Acknowledge (MTTA)

Mean time to acknowledge (MTTA) is the average time between an incident’s detection and its “acknowledgement,” when someone begins working to resolve the issue.

Mean Time to Detect (MTTD)

Mean time to detect (MTTD) is the average time it takes to identify an issue. It measures the time between the beginning of an outage and when the business identifies the issue.

Mean Time to Failure (MTTF)

Mean time to failure (MTTF) is the average amount of time a system or component operates before it fails. It is typically used for components that are replaced rather than repaired after a failure, and it is sometimes loosely described as a measure of “uptime.”

Mean Time to Recovery (MTTR)

Mean time to recovery (MTTR) is the average time it takes to return to production status after a hardware component or system fails.
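Following the definitions above, MTTD, MTTA, and MTTR can all be computed from the same set of incident timestamps. The sketch assumes each incident records four hypothetical timestamps (`started`, `detected`, `acknowledged`, `recovered`). It measures MTTR from the start of the failure, as the MTTR definition above describes; some teams measure it from detection instead.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class IncidentTimes:
    started: datetime       # when the outage actually began
    detected: datetime      # when the business identified the issue
    acknowledged: datetime  # when someone acknowledged it and began work
    recovered: datetime     # when production status was restored


def _mean(deltas: list[timedelta]) -> timedelta:
    return sum(deltas, timedelta()) / len(deltas)


def incident_metrics(incidents: list[IncidentTimes]) -> dict[str, timedelta]:
    """Average MTTD, MTTA, and MTTR over a non-empty list of incidents."""
    return {
        "MTTD": _mean([i.detected - i.started for i in incidents]),
        "MTTA": _mean([i.acknowledged - i.detected for i in incidents]),
        "MTTR": _mean([i.recovered - i.started for i in incidents]),
    }
```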

Message Template / Topic

A message template or topic is a template in an IT alerting system that eases the process of sending messages to stakeholders, employees, and customers.
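As an illustration of how a template eases message sending, the sketch below uses Python's standard `string.Template`, where `$name` placeholders are filled in for each incident. The template text and field names are hypothetical; each alerting product has its own template syntax.

```python
from string import Template

# Hypothetical stakeholder template; $-placeholders are filled per incident.
OUTAGE_TEMPLATE = Template(
    "[$severity] $service is unavailable. "
    "Our team is investigating; next update by $next_update."
)


def render(template: Template, **fields: str) -> str:
    """Fill in a message template; raises KeyError if a placeholder is missing."""
    return template.substitute(**fields)
```

Using `substitute` rather than `safe_substitute` means a message with an unfilled placeholder fails loudly instead of being sent with a literal `$service` in it.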

Microservices

Microservices, or microservices architecture, is a software development approach that structures an application as a collection of single-function modules with clearly defined interfaces and operations.

Mobile Incident Management

Mobile incident management tools allow users to complete incident management processes and tasks on a mobile device such as a smartphone or tablet.

Model-Based Testing

Model-based testing requires the use of test cases derived from visual models that represent the desired behavior of a system or environment. It is commonly used to generate manual tests, test data, and automated tests.

Monitoring

Monitoring or “logging” refers to the tracking of incident information (such as time, duration, and severity) in an incident log.

New Incident

A new incident is one that has been newly discovered by a team or individual and is yet to be assigned or acknowledged.

Notification

A notification is a message sent to an individual to alert them of any updates or issues.

Notification Channel

A notification channel is the channel used to deliver a notification. Notification channel examples include text, email, and phone call.

On-Call Management

On-call management is the management of an on-call team’s accountability, visibility, and responsibilities.

On-Call Team

An on-call team is the team scheduled to respond to messages or incidents at unpredictable times.
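On-call schedules are often built as rotations. The sketch below assumes a simple hypothetical round-robin in which each engineer covers one fixed-length shift (one week by default) in turn; real schedules also handle overrides, time zones, and handoff times.

```python
from datetime import datetime, timedelta


def on_call(engineers: list[str], rotation_start: datetime,
            at: datetime, shift: timedelta = timedelta(weeks=1)) -> str:
    """Return who is on call at `at` in a simple round-robin rotation.

    Each engineer covers one `shift` in turn, starting from rotation_start.
    """
    if at < rotation_start:
        raise ValueError("time precedes the start of the rotation")
    shifts_elapsed = (at - rotation_start) // shift
    return engineers[shifts_elapsed % len(engineers)]
```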

On-Hold Incident

An on-hold incident is one that has been assigned or acknowledged but is suspended. Incidents can be put on hold if more information is needed to resolve the issue.

OODA Loop

The OODA loop is a decision-making framework developed by U.S. Air Force Colonel John Boyd, a military strategist, and since adopted as an incident response strategy. The steps of the OODA loop are: Observe, Orient, Decide, and Act. In incident response, the OODA loop helps businesses quickly identify and respond to incidents.
