Glossary

Agent

An agent is a program installed on a physical server. An agent executes various server processes.

Agile Software Development

Agile is an iterative set of software development best practices. The top priorities in agile software development include software quality assurance, user feedback integration, and the ability to “fail fast” and implement rapid changes as needed.

Alert

An alert notifies an organization of significant changes in its IT environment. Alerts can also indicate when a system has failed.

Alert Aggregation

Alert aggregation refers to the connection of all IT monitoring tools to view all alerts and incident data in one place.

Alert Fatigue

Alert fatigue occurs when many monitoring systems create an abundance of alerts that flood mailboxes. This causes alerts to become less meaningful, and often decreases the responsiveness of IT team members.

Alert Noise

Alert noise is a high volume of false alarms that makes it difficult to detect and respond to actual, important alerts.

Alert Rule

Alert rules are customized policies created by the user of an IT alerting system. They can be used to normalize the behavior of alerts based on time of day, type of alert and more.
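
As a rough illustration of the idea, the sketch below routes an alert according to its type and the time of day it fired. The `Alert` and `AlertRule` classes, the `route` function, and the action labels ("ticket", "page", "email") are all hypothetical names chosen for this example, not the interface of any particular alerting product.

```python
from dataclasses import dataclass
from datetime import time


@dataclass
class Alert:
    kind: str        # hypothetical alert type, e.g. "disk" or "latency"
    raised_at: time  # time of day the alert fired


@dataclass
class AlertRule:
    kind: str    # alert type this rule applies to
    start: time  # start of the time window (inclusive)
    end: time    # end of the time window (exclusive)
    action: str  # hypothetical action label, e.g. "page" or "ticket"

    def matches(self, alert: Alert) -> bool:
        # A rule applies when both the type and the time of day match.
        return alert.kind == self.kind and self.start <= alert.raised_at < self.end


def route(alert: Alert, rules: list, default: str = "email") -> str:
    """Return the action of the first matching rule, or the default action."""
    for rule in rules:
        if rule.matches(alert):
            return rule.action
    return default


# Illustrative rules: disk alerts open a ticket during business hours,
# latency alerts page someone at (almost) any time of day.
rules = [
    AlertRule("disk", time(9, 0), time(17, 0), "ticket"),
    AlertRule("latency", time(0, 0), time(23, 59), "page"),
]
```

With these rules, a disk alert at 10:00 opens a ticket, while the same alert at 20:00 matches no rule and falls back to the default action.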

Analytics

Analytics are the application of statistics, research, and computing to gather insights and meaning from a set of data.

API

An API is used as an intermediary between different programs; an API ensures programs can share data with one another.

Application Release

Application release is a practice in which a software release is deployed across multiple environments and configurations with little to no human interaction.

Artifact

An artifact is a descriptive model used to create software; artifact examples include diagrams and UML models. 

Assigned / Acknowledged Incident

An incident that has been assigned or acknowledged means that a specific team or individual has committed to taking accountability for and resolving the issue.

Automation

In IT alerting, automation is the technique of enabling the alerting process or notification system to operate automatically using alert rules. 

Autonomy

Autonomy in DevOps is self-governance. An autonomous DevOps team empowers each member to act based on the situation and resources available without the need to defer to a superior.

Behavior-Driven Development

Behavior-driven development is a form of software development that involves ongoing communication between developers, business analysts, quality assurance teams, and other team members. It promotes constant collaboration and helps key stakeholders work together to achieve common software development goals.

Branching

Branching refers to a programming technique in which a source code copy is used to create two versions of software. This enables the source code to be simultaneously modified by two developers. 

Capacity Test

A capacity test enables a DevOps team to determine the maximum number of end users an application, computer, or server can handle before it crashes. 

Categorization

Categorization of incidents makes their impact, urgency, and severity transparent and easy to understand.

Closed Incident

A closed incident is considered fully resolved, and it is confirmed that no additional action by a network operations center (NOC) or incident management team is necessary.

Closure

Closure is the confirmation that no further action needs to be taken on a resolved incident.

Communications Lead

In an incident response team, the communications lead facilitates communication about an incident to parties both inside and outside of the organization.

Complex-Adaptive System

A complex-adaptive system consists of an IT platform or project that includes multiple components. In this system, each component interacts with others in ways that cannot be accurately predicted or controlled.

Configuration Drift

Configuration drift occurs when a hardware or software infrastructure configuration changes from a recovery or secondary configuration. It may occur due to inconsistent configurations across a set of computers or devices. 

Configuration Management

Configuration management is a system engineering process for creating and maintaining product consistency. It involves management of a product’s performance, function, and physical attributes relative to its design and requirements. 

Containerization

Containerization is the use of virtual software containers that include operating system resources, memory, and services to run an application or service. It often helps a developer test production flows for services deployed in the cloud.

Containment

Containment is the third step in an incident response plan. The goal of this step is to quickly patch up the cause of the incident.

Continuous Delivery (CD)

Continuous delivery (CD) is a software engineering approach that utilizes short, frequent cycles to produce software.

Continuous Deployment

Continuous deployment utilizes automated software code testing. If code passes the automated test, the software automatically moves into a production environment.

Continuous Integration (CI)

Continuous integration (CI) is a software engineering practice that merges developer code changes into a single repository. Once CI is complete, the merged code is used to automate software builds and tests.

Continuous Quality

Continuous quality is the integration of software quality reviews into the CD pipeline. It requires quality assurance team members to review software code as soon as it becomes available, and address any potential code issues during the software development cycle.

Continuous Testing

Continuous testing is the execution of automated tests as part of the software delivery pipeline. It enables a DevOps team to get feedback for identifying potential risks before software is publicly released.

Cyber Threat

A cyber threat is any risk that may lead to a cyber-attack; cyber threats include malware and ransomware.

Cyber-Attack

A cyber-attack is an attempt by hackers to penetrate a business’ IT networks or systems. 

Data Breach

A data breach is the intentional or unintentional release of private information to an untrusted recipient. Data breaches can cause incidents.

Data Enrichment

Data enrichment is the process of merging data from a third-party source with an existing database. Brands implement data enrichment to enhance their data, improve data accuracy, and make more informed decisions.

Deduplication

Deduplication refers to the elimination of duplicate or redundant alerts received by monitoring systems.
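
One possible way to deduplicate is to treat alerts with the same source and message as duplicates if they arrive within a short window of each other. The sketch below assumes each alert is a dictionary with `source`, `message`, and `timestamp` (in seconds) keys; those field names and the 300-second default window are assumptions made for illustration.

```python
def deduplicate(alerts, window_seconds=300):
    """Drop alerts that repeat an earlier kept alert within the window.

    Two alerts count as duplicates when they share the same
    (source, message) pair and the later one arrives less than
    window_seconds after the last alert that was kept for that pair.
    """
    last_kept = {}  # (source, message) -> timestamp of the last kept alert
    unique = []
    for alert in sorted(alerts, key=lambda a: a["timestamp"]):
        key = (alert["source"], alert["message"])
        previous = last_kept.get(key)
        if previous is None or alert["timestamp"] - previous >= window_seconds:
            unique.append(alert)
            last_kept[key] = alert["timestamp"]
    return unique
```

Measuring the window from the last kept alert, rather than from the last alert seen, means a steady stream of repeats still produces one alert per window instead of being silenced indefinitely.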

Deployment

Deployment is all the activities performed before software is publicly released.

DevOps

DevOps (development and operations) is a culture in the IT industry that fosters collaboration between developers and operations teams.

DevSecOps

DevSecOps (development and security operations) is a “security as code” culture that fosters collaboration between software developers and information security teams.

Diagnosis

Diagnosis of an incident is a formulation of a hypothesis as to what caused the incident. An incident management team may be able to resolve an incident solely based on an initial diagnosis.

Documentation Lead

In an incident response team, the documentation lead documents the timeline of events during the incident response process.

Downtime

Downtime is the period that a system is unable to perform its primary functions.

Eradication

Eradication is the fourth step in an incident response plan. It aims to completely remove the threat causing an incident.

Escalation

Incidents require escalation when more support is needed to resolve them. Teams gather and log incident information to prompt incident escalation to other team members or executives.

Escalation

Escalation is bringing an issue to an individual or team in a higher department within an organization. For example, if a customer service representative finds an issue that can be resolved only by the IT team, the issue can be escalated to an IT manager.

Event

An event is any occurrence that causes a change in the IT environment.

Event Management

Event management provides the ability to detect, interpret and act on the status of events in IT infrastructure and services. It aids in the automation of many service operations.

Event-Driven Architecture

Event-driven architecture is a form of software architecture that involves the creation of events by a system; the system then uses these events to identify or consume similar occurrences in the future.

Event-Triggered Alert

An event-triggered alert is an automated alert sent via email, text, or phone call when a pre-determined event occurs.

Exploratory Testing

Exploratory testing is a strategy that provides human software testers with the ability to analyze different areas of a piece of software. It empowers human software testers with the flexibility to test potential software issues that may otherwise go undetected during automated tests.

HR/Legal Representative

Sometimes, incidents can cause an organization to be charged with a criminal offense. In an incident response team, the HR or legal representative must navigate any legal consequences of an incident.

Identification

Identification is the initial detection of an incident.

Identification / Detection and Analysis

Identification, also known as detection and analysis, is the second step in an incident response plan. In this step, research is done to find the cause of a detected incident.

Impact

Impact measures the effect of an incident on business processes. A high-impact incident may force business processes to come to a halt, whereas a low-impact incident has little or no effect on operations.

In-Progress Incident

An in-progress incident is one that is in the process of being mitigated.

Incident

An incident is an unplanned and undesired event that interrupts business operations. Incidents can cause downtime, revenue loss, compliance penalties, and brand reputation damage. Incidents can also affect employees and customers. 

Incident Lifecycle

The incident lifecycle is a series of six stages that each incident goes through after being detected. These stages are: new, assigned/acknowledged, in-progress, on-hold, resolved, and closed.
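
The six stages can be modeled as a small state machine. In the sketch below, the stage names come from the definition above, but the set of allowed transitions is an assumption made for illustration; a real incident management tool may permit different moves between stages.

```python
from enum import Enum


class Stage(Enum):
    NEW = "new"
    ASSIGNED = "assigned/acknowledged"
    IN_PROGRESS = "in-progress"
    ON_HOLD = "on-hold"
    RESOLVED = "resolved"
    CLOSED = "closed"


# Assumed transitions, for illustration only.
ALLOWED = {
    Stage.NEW: {Stage.ASSIGNED},
    Stage.ASSIGNED: {Stage.IN_PROGRESS, Stage.ON_HOLD},
    Stage.IN_PROGRESS: {Stage.ON_HOLD, Stage.RESOLVED},
    Stage.ON_HOLD: {Stage.IN_PROGRESS},
    Stage.RESOLVED: {Stage.CLOSED, Stage.IN_PROGRESS},
    Stage.CLOSED: set(),
}


def advance(current: Stage, target: Stage) -> Stage:
    """Return the target stage if the move is allowed, else raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target
```

Rejecting disallowed moves in one place keeps an incident from, for example, jumping straight from new to closed without being acknowledged.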

Incident Log

An incident log includes the name of the person reporting an incident, the date and time of an incident and other incident details.

Incident Management

Incident management is the process of identifying, analyzing, and addressing the incidents or technical disruptions of a business.

Incident Management Model

An incident management model includes time frames for incident resolution, insight into how to escalate an incident and best practices for preserving data and key performance indicators (KPIs) during an incident.

Incident Management Tool

An incident management tool is used by organizations to both facilitate and improve incident management. Incident management tools can automate escalations, monitoring, and collaboration.

Incident Response Phase

An incident response phase is a stage of an incident response plan. There are generally six incident response phases: preparation, identification, containment, eradication, recovery and lessons learned. Each phase plays an important role in effective incident response.  

Incident Response Plan

An IT incident response plan guides IT staff in detecting, understanding, and responding to incidents caused by issues like cybercrime, data loss, and outages.

Incident Response Team (NOC Team)

Incident response or incident management teams, also known as NOC teams, are trained to provide immediate solutions for incidents that disrupt an organization’s operations. An incident management team ensures an incident is closed or resolved within a predefined time limit described in an SLA.

Incident Volume

Incident volume refers to the number of incidents received in a given time period.

Information Security Incident

An information security incident is an adverse event, such as a cyber-attack or insider threat, that negatively impacts an information system or a network. This type of incident poses a threat to the availability, integrity and confidentiality of a system.

Infrastructure-as-a-Service (IaaS)

Infrastructure as a service (IaaS) is a form of cloud computing that utilizes virtualized computing resources over the internet.

Integration

Integration of alerting tools allows users to streamline their alerts into one space. Integration interconnects data and notifications.

Integration Testing

Integration testing involves the evaluation of myriad software components. During an integration test, software components are combined and analyzed as a single group.

IT Alerting System

An IT alerting system is a tool used by organizations to mitigate business risks and detect problems in the IT environment. It is a customizable tool that can automate and deduplicate alerts from various integrated monitoring sources, run analytics and create reports.

KPI

A key performance indicator (KPI) is a measure that clearly demonstrates how effectively and efficiently an organization is meeting its objectives. MTTR and MTTF are good KPI examples.

Lead Investigator

In an incident response team, the lead investigator analyzes an incident to find its root cause so that the team may start recovering from the incident and developing preventative measures as soon as possible.

Lessons Learned / Post-Incident Activity

Lessons learned, also known as post-incident activity, is the final step in an incident response plan. In this step, a resolved and closed incident is reviewed to identify steps that can be taken to improve a system, and aid in prevention of future incidents.

Mass Notification / Manual Paging

Mass notification, or manual paging, delivers information to a group of people by email, text, or phone call.

Mass Notification System (MNS)

A mass notification system (MNS) is a platform that delivers information to a group of people. The system is flexible in its configuration of messages, controls, recipients, and methods of communication.

Mean Time Between Failures (MTBF)

Mean time between failures (MTBF) is commonly used to measure hardware component or system reliability. It is calculated as an average of the time between hardware component or system failures. 
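
Taken literally, the definition above is the mean of the gaps between consecutive failures. The sketch below computes that figure from a list of failure timestamps; the function name is hypothetical, and note that some organizations subtract repair time from each gap, which this simple version does not do.

```python
def mean_time_between_failures(failure_times):
    """Average gap between consecutive failures, in the unit of the input."""
    if len(failure_times) < 2:
        raise ValueError("need at least two failures to measure a gap")
    ordered = sorted(failure_times)
    # Pair each failure with the next one and measure the gap between them.
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps)
```

For failures recorded at hours 0, 100, and 300, the gaps are 100 and 200 hours, so the function returns 150.0.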

Mean Time to Acknowledge (MTTA)

Mean time to acknowledge (MTTA) is the average time between an incident’s detection and the beginning of assistance or “acknowledgement” to resolve the issue.
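
A minimal sketch of the calculation, assuming each incident is recorded as a dictionary with `detected_at` and `acknowledged_at` datetimes. The field names and the sample data are made up for illustration.

```python
from datetime import datetime


def mean_time_to_acknowledge(incidents):
    """Average number of seconds between detection and acknowledgement."""
    if not incidents:
        raise ValueError("no incidents to average")
    delays = [
        (incident["acknowledged_at"] - incident["detected_at"]).total_seconds()
        for incident in incidents
    ]
    return sum(delays) / len(delays)


# Made-up sample data: one incident acknowledged after 4 minutes,
# another after 10 minutes.
incidents = [
    {"detected_at": datetime(2024, 1, 1, 9, 0),
     "acknowledged_at": datetime(2024, 1, 1, 9, 4)},
    {"detected_at": datetime(2024, 1, 2, 14, 0),
     "acknowledged_at": datetime(2024, 1, 2, 14, 10)},
]
```

For this sample, the delays are 240 and 600 seconds, giving an MTTA of 420 seconds (7 minutes).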

Mean Time to Detect (MTTD)

Mean time to detect (MTTD) is the average time it takes to identify an issue. It measures the time between the beginning of an outage and when the business identifies the issue.

Mean Time to Failure (MTTF)

Mean time to failure (MTTF), sometimes loosely called “uptime,” is the average amount of time a system or component operates before it fails. It is typically applied to systems or components that are replaced rather than repaired.

Mean Time to Recovery (MTTR)

Mean time to recovery (MTTR) is the average time it takes to return to production status after a hardware component or system fails.

Message Template / Topic

A message template or topic is a template in an IT alerting system that eases the process of sending messages to stakeholders, employees, and customers.

Microservices

Microservices, or microservices architecture, is a software development methodology that involves building single-function modules with clearly defined interfaces and operations.

Mobile Incident Management

Mobile incident management tools allow users to complete incident management processes and tasks on a mobile device such as a smartphone or tablet. 

Model-Based Testing

Model-based testing requires the use of test cases derived from visual models that represent the desired behavior of a system or environment. It is commonly used to generate manual tests, test data, and automated tests.

Monitoring

Monitoring or “logging” refers to the tracking of incident information (such as time, duration, and severity) in an incident log.

New Incident

A new incident is one that has been newly discovered by a team or individual and is yet to be assigned or acknowledged.

Notification

A notification is a message sent to an individual to alert them of any updates or issues.

Notification Channel

A notification channel is the channel used to deliver a notification. Notification channel examples include text, email, and phone call.

On-Call Management

On-call management is the management of an on-call team’s accountability, visibility, and responsibilities.

On-Call Team

An on-call team is the team scheduled to respond to messages or incidents at unpredictable times.

On-Hold Incident

An on-hold incident is one that has been assigned or acknowledged but is suspended. Incidents can be put on hold if more information is needed to resolve the issue.

OODA Loop

The OODA loop is an incident response strategy developed by U.S. Air Force military strategist John Boyd. The steps of the OODA loop are: Observe, Orient, Decide, and Act. The OODA loop is designed to help businesses quickly identify and respond to incidents.

Open Application Programming Interface (API)

An open application programming interface (API) is a public API that is generally available to consumers and developers. 

Overload

Overload occurs when a service demand exceeds its capacity. This can cause errors, and even network, server, or system overloads, causing an incident.

Pair Programming

Pair programming is a software development technique in which two developers simultaneously work on a single feature. It promotes collaboration, as both developers can analyze each other’s code to bolster overall code quality.

Planning

Organizations can use planning to shorten incident response and resolution times. Organizations plan for incident management by identifying potential events that may cause incidents before they happen.

Platform-as-a-Service (PaaS)

Platform as a service (PaaS) is a form of cloud computing service that involves the use of a platform to develop, run, and control applications. With PaaS, a third-party provider delivers hardware and software tools to end users via the internet.

Preparation

Preparation is the first step in an incident response plan. In this step, all assets are compiled and ranked in order of importance.

Prioritization

Prioritization is the assessment of an incident and its impact on business processes and stakeholders. Different processes and workflows can be implemented depending on the priority level of an incident.

Problem Management

Problem management is a process for fixing system errors or weaknesses. Successful problem management limits the impact of incidents and ensures that an incident does not re-occur.

Recovery

Recovery is the fifth step in an incident response plan. In this step, the aim is to get any affected systems and processes to become operational again.

Release Engineering

Release engineering is the technical process of building reliable and fast pipelines to quickly transform source code into a product.

Release Management

Release management is the non-technical process of overseeing and scheduling software build stages such as testing and deployment.

Reporting

Reporting allows businesses to understand previous incidents and improve future incident management analysis, evaluation, and decision-making for reduced incident management costs.

Resolution

Resolution occurs after the necessary steps and processes to resolve an incident have been completed.

Resolved Incident

A resolved incident has been mitigated and all service has returned to SLA standards.

Retry Spike

If users are unable to access a service and repeatedly try to gain access, it causes a retry spike. Retry spikes can cause a service to shut down, causing an incident.

Rich Alerting

Rich alerting is a method of alerting that ensures each alert reaches the correct and most relevant recipient, depending on the type of alert and the recipient’s schedule.

Role-Based Security

Role-based security ensures that users can view only the data and alerts intended for them. For example, a software developer will not receive or be able to view alerts meant for a C-level employee.
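
The filtering idea can be sketched as follows, assuming each alert carries an `audience` set naming the roles allowed to see it. The `visible_alerts` function, the field names, and the role labels are hypothetical.

```python
def visible_alerts(alerts, role):
    """Return only the alerts whose audience includes the given role."""
    return [alert for alert in alerts if role in alert["audience"]]


# Made-up sample alerts, each tagged with the roles allowed to view it.
alerts = [
    {"title": "Build failed", "audience": {"developer"}},
    {"title": "Quarterly SLA breach", "audience": {"executive"}},
    {"title": "Site outage", "audience": {"developer", "executive"}},
]
```

Here a user with the "developer" role sees the build failure and the site outage, but not the alert addressed only to executives.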

Service-level Agreement (SLA)

A service-level agreement (SLA) is a commitment made between a service provider and its client. It can include elements such as reliability, responsiveness, obligations, and penalties to be implemented when the SLA is not followed.

Severity

Severity describes the impact of an incident on a business’ users. For a severe incident, a business may need to craft a public statement to its users. An incident of minor severity may require action but may not immediately affect users.

Stakeholder

A stakeholder is any person with an interest and concern in a business. Stakeholders can include investors, users, employees, and executives of an organization.

Standard Operating Procedure (SOP)

A standard operating procedure (SOP) in IT alerting is a set of instructions compiled by an organization to help teams carry out routine IT operations based on alerts they receive.

Streamlining

In IT alerting, streamlining alerts makes organizations more efficient and effective by employing simpler working methods for received alerts.

Team Leader

In an incident response team, the team leader’s role is to coordinate incident response activity to keep the team on track and minimize damage to the system and organization.

Technical Debt

Technical debt is a programming concept related to the implied cost of extra development work. A DevOps team may build technical debt if it implements a solution that delivers short-term results versus a long-term solution that requires additional time to produce and implement.

Test Automation

Test automation involves the use of software to perform tests and compare actual and predicted test outcomes.

Toolchain

In DevOps, a toolchain is a set of software and/or products used to create a new program or perform a complex software development task. For example, IT alerting tools and/or incident management tools can be part of a toolchain.

Trigger

A trigger is any event that starts the automated response process in an IT alerting system.

Unit Testing

Unit testing is a software testing methodology in which each part of an application (unit) is evaluated individually.
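
As a small example, the sketch below tests one function in isolation using Python's built-in `unittest` module. The function under test, `classify_severity`, and its 0.5 threshold are hypothetical and exist only to give the tests something to check.

```python
import unittest


def classify_severity(error_rate):
    """Hypothetical unit under test: map an error rate (0.0 to 1.0) to a label."""
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be between 0 and 1")
    return "high" if error_rate >= 0.5 else "low"


class ClassifySeverityTest(unittest.TestCase):
    def test_high(self):
        self.assertEqual(classify_severity(0.9), "high")

    def test_low(self):
        self.assertEqual(classify_severity(0.1), "low")

    def test_rejects_out_of_range(self):
        with self.assertRaises(ValueError):
            classify_severity(1.5)
```

Saved to a file, these tests can be run with `python -m unittest <filename>`. Each test exercises one behavior of the unit, so a failure points directly at the case that broke.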

Urgency

Urgency is the amount of time before an incident has a significant business impact. For example, an incident with high urgency may result in immediate brand reputation and/or revenue loss.

Virtual Machine (VM)

A virtual machine (VM) is a computer file that acts like a computer system. It runs like a typical computer program and replicates the system experience.
