All Tasks

AlertManager Webhook Handler

TaskSet Run SLX Tasks with matching AlertManager Webhook commonLabels

Artifactory OK

SLI Check If Artifactory Endpoint Is Healthy

AWS Account Creation Notification

TaskSet Get The Recently Created AWS Accounts
SLI Get Count Of AWS Accounts In Organization

AWS ACM health

TaskSet List Unused ACM Certificates in AWS Region `${AWS_REGION}` in AWS Account `${AWS_ACCOUNT_ID}`
List Expiring ACM Certificates in AWS Region `${AWS_REGION}` in AWS Account `${AWS_ACCOUNT_ID}`
List Expired ACM Certificates in AWS Region `${AWS_REGION}` in AWS Account `${AWS_ACCOUNT_ID}`
List Failed Status ACM Certificates in AWS Region `${AWS_REGION}` in AWS Account `${AWS_ACCOUNT_ID}`
List Pending Validation ACM Certificates in AWS Region `${AWS_REGION}` in AWS Account `${AWS_ACCOUNT_ID}`
SLI Check for unused ACM certificates in AWS Region `${AWS_REGION}` in AWS account `${AWS_ACCOUNT_ID}`
Check for Expiring ACM certificates in AWS Region `${AWS_REGION}` in AWS account `${AWS_ACCOUNT_ID}`
Check for expired ACM certificates in AWS Region `${AWS_REGION}` in AWS account `${AWS_ACCOUNT_ID}`
Check for Failed Status ACM Certificates in AWS Region `${AWS_REGION}` in AWS Account `${AWS_ACCOUNT_ID}`
Check for Pending Validation ACM Certificates in AWS Region `${AWS_REGION}` in AWS Account `${AWS_ACCOUNT_ID}`
Generate Health Score

AWS Billing Period Costs by Tag

SLI Get All Billing Sliced By Tags

AWS CLI Command

TaskSet ${TASK_TITLE}
SLI ${TASK_TITLE}

AWS CLI Command with Issue

TaskSet ${TASK_TITLE}
SLI ${TASK_TITLE}

AWS CloudFormation Event Rate

SLI Fetch CloudFormation Stack Events

AWS CloudFormation Triage

TaskSet Get All Recent Stack Events

AWS CloudWatch Log Query (Pass/Fail)

SLI Running CloudWatch Log Query And Pushing 1 If No Results Found

AWS CloudWatch Log Query (Total Count)

SLI Running CloudWatch Log Query And Pushing The Count Of Results

AWS CloudWatch Logs health

TaskSet List CloudWatch Log Groups Without Retention Period in AWS Region `${AWS_REGION}` in AWS Account `${AWS_ACCOUNT_ID}`
Check CloudTrail Configuration in AWS Region `${AWS_REGION}` in AWS Account `${AWS_ACCOUNT_ID}`
Check for CloudTrail integration with CloudWatch Logs in AWS Region `${AWS_REGION}` in AWS Account `${AWS_ACCOUNT_ID}`
SLI Check CloudWatch Log Groups Without Retention Period in AWS Region `${AWS_REGION}` in AWS account `${AWS_ACCOUNT_ID}`
Check if CloudTrail exists and is configured for multi-region in AWS Region `${AWS_REGION}` in AWS Account `${AWS_ACCOUNT_ID}`
Check CloudTrail Without CloudWatch Logs in AWS Region `${AWS_REGION}` in AWS Account `${AWS_ACCOUNT_ID}`
Generate Health Score

AWS CloudWatch Metric Query Dashboard

TaskSet Get CloudWatch MetricQuery Insights URL

AWS CloudWatch Overutilized EC2 Inspection

TaskSet Check For Overutilized EC2 Instances

AWS CloudWatch Tag Metric Query

SLI Run CloudWatch Metric Query Across Set Of IDs And Push Metric

AWS Costs by Tag

TaskSet Get All Billing Sliced By Tags

AWS EBS Health

TaskSet List Unattached EBS Volumes in AWS Region `${AWS_REGION}` in AWS account `${AWS_ACCOUNT_ID}`
List Unencrypted EBS Volumes in AWS Region `${AWS_REGION}` in AWS account `${AWS_ACCOUNT_ID}`
List Unused EBS Snapshots in AWS Region `${AWS_REGION}` in AWS account `${AWS_ACCOUNT_ID}`
SLI Check Unattached EBS Volumes in `${AWS_REGION}`
Check Unencrypted EBS Volumes in `${AWS_REGION}`
Check Unused EBS Snapshots in `${AWS_REGION}`
Generate EBS Score

AWS EC2 Health

TaskSet List stale AWS EC2 instances in AWS Region `${AWS_REGION}` in AWS account `${AWS_ACCOUNT_ID}`
List stopped AWS EC2 instances in AWS Region `${AWS_REGION}` in AWS account `${AWS_ACCOUNT_ID}`
List invalid AWS Auto Scaling Groups in AWS Region `${AWS_REGION}` in AWS account `${AWS_ACCOUNT_ID}`
SLI Check for stale AWS EC2 instances in AWS Region `${AWS_REGION}` in AWS account `${AWS_ACCOUNT_ID}`
Check for stopped AWS EC2 instances in AWS Region `${AWS_REGION}` in AWS account `${AWS_ACCOUNT_ID}`
Check for invalid AWS Auto Scaling Groups in AWS Region `${AWS_REGION}` in AWS account `${AWS_ACCOUNT_ID}`
Generate Health Score

AWS EC2 Security Check

TaskSet Check For Untagged instances
Check For Dangling Volumes
Check For Open Routes
Check For Overused Instances
Check For Underused Instances
Check For Underused Volumes
Check For Overused Volumes

AWS EKS Cluster Health

TaskSet Check EKS Fargate Cluster Health Status in AWS Region `${AWS_REGION}`
Check Amazon EKS Cluster Health Status in AWS Region `${AWS_REGION}`
Monitor EKS Cluster Health in AWS Region `${AWS_REGION}`
SLI Check Amazon EKS Cluster Health Status in AWS Region `${AWS_REGION}`

AWS EKS Nodegroup Status Check

TaskSet Check EKS Nodegroup Status in `${EKS_CLUSTER_NAME}`

AWS ElastiCache Health Check

TaskSet Scan AWS Elasticache Redis Status in AWS Region `${AWS_REGION}`
SLI Scan ElastiCaches in AWS Region `${AWS_REGION}`

AWS Lambda Health Check

TaskSet List Lambda Versions and Runtimes in AWS Region `${AWS_REGION}`
Analyze AWS Lambda Invocation Errors in Region `${AWS_REGION}`
Monitor AWS Lambda Performance Metrics in AWS Region `${AWS_REGION}`
SLI Analyze AWS Lambda Invocation Errors in Region `${AWS_REGION}`

AWS network health

TaskSet List Publicly Accessible Security Groups in AWS account `${AWS_ACCOUNT_ID}`
List unused Elastic IPs in AWS account `${AWS_ACCOUNT_ID}`
List unused ELBs in AWS account `${AWS_ACCOUNT_ID}`
List VPCs with Flow Logs Disabled in AWS account `${AWS_ACCOUNT_ID}`
SLI Check for publicly accessible security groups in AWS account `${AWS_ACCOUNT_ID}`
Check for unused Elastic IPs in AWS account `${AWS_ACCOUNT_ID}`
Check for unused ELBs in AWS account `${AWS_ACCOUNT_ID}`
Check for VPCs with Flow Logs disabled in AWS account `${AWS_ACCOUNT_ID}`
Generate Health Score

AWS RDS health

TaskSet List Unencrypted RDS Instances in AWS Region `${AWS_REGION}` in AWS Account `${AWS_ACCOUNT_ID}`
List Publicly Accessible RDS Instances in AWS Region `${AWS_REGION}` in AWS Account `${AWS_ACCOUNT_ID}`
List RDS Instances with Backups Disabled in AWS Region `${AWS_REGION}` in AWS Account `${AWS_ACCOUNT_ID}`
SLI Check for unencrypted RDS instances in AWS Region `${AWS_REGION}` in AWS account `${AWS_ACCOUNT_ID}`
Check for publicly accessible RDS instances in AWS Region `${AWS_REGION}` in AWS account `${AWS_ACCOUNT_ID}`
Check for disabled backup RDS instances in AWS Region `${AWS_REGION}` in AWS account `${AWS_ACCOUNT_ID}`
Generate Health Score

AWS S3 Bucket Info Report

TaskSet Check AWS S3 Bucket Storage Utilization

AWS S3 Health

TaskSet List S3 Buckets With Public Access in AWS Account `${AWS_ACCOUNT_NAME}`
SLI Count S3 Buckets With Public Access in AWS Account `${AWS_ACCOUNT_NAME}`

AWS S3 Stale Check

TaskSet Create Report For Stale Buckets

AWS VM Triage

TaskSet Get Max VM CPU Utilization In Last 3 Hours
Get Lowest VM CPU Credits In Last 3 Hours
Get Max VM CPU Credit Usage In Last 3 Hours
Get Max VM Memory Utilization In Last 3 Hours
Get Max VM Volume Usage In Last 3 Hours

aws-cloudwatch-metricquery

SLI Running CloudWatch Metric Query And Pushing The Result

Azure ACR Image Sync

TaskSet Sync Container Images into Azure Container Registry `${ACR_REGISTRY}`
SLI Count Outdated Images in Azure Container Registry `${ACR_REGISTRY}`

Azure AKS Triage

TaskSet Check for Resource Health Issues Affecting AKS Cluster `${AKS_CLUSTER}` In Resource Group `${AZ_RESOURCE_GROUP}`
Check Configuration Health of AKS Cluster `${AKS_CLUSTER}` In Resource Group `${AZ_RESOURCE_GROUP}`
Check Network Configuration of AKS Cluster `${AKS_CLUSTER}` In Resource Group `${AZ_RESOURCE_GROUP}`
Fetch Activities for AKS Cluster `${AKS_CLUSTER}` In Resource Group `${AZ_RESOURCE_GROUP}`
SLI Check for Resource Health Issues Affecting AKS Cluster `${AKS_CLUSTER}` In Resource Group `${AZ_RESOURCE_GROUP}`
Fetch Activities for AKS Cluster `${AKS_CLUSTER}` In Resource Group `${AZ_RESOURCE_GROUP}`
Check Configuration Health of AKS Cluster `${AKS_CLUSTER}` In Resource Group `${AZ_RESOURCE_GROUP}`
Generate AKS Cluster Health Score

Azure App Service Operations

TaskSet Restart App Service `${APP_SERVICE_NAME}` in Resource Group `${AZ_RESOURCE_GROUP}`
Swap Deployment Slots for App Service `${APP_SERVICE_NAME}` in Resource Group `${AZ_RESOURCE_GROUP}`
Scale Up App Service `${APP_SERVICE_NAME}` in Resource Group `${AZ_RESOURCE_GROUP}`
Scale Down App Service `${APP_SERVICE_NAME}` in Resource Group `${AZ_RESOURCE_GROUP}`
Scale Out Instances for App Service `${APP_SERVICE_NAME}` in Resource Group `${AZ_RESOURCE_GROUP}` by ${SCALE_OUT_FACTOR}x
Scale In Instances for App Service `${APP_SERVICE_NAME}` in Resource Group `${AZ_RESOURCE_GROUP}` to 1/${SCALE_IN_FACTOR}
Redeploy App Service `${APP_SERVICE_NAME}` from Latest Source in Resource Group `${AZ_RESOURCE_GROUP}`

Azure App Service Triage

TaskSet Check for Resource Health Issues Affecting App Service `${APP_SERVICE_NAME}` In Resource Group `${AZ_RESOURCE_GROUP}`
Check App Service `${APP_SERVICE_NAME}` Health in Resource Group `${AZ_RESOURCE_GROUP}`
Fetch App Service `${APP_SERVICE_NAME}` Utilization Metrics In Resource Group `${AZ_RESOURCE_GROUP}`
Get App Service `${APP_SERVICE_NAME}` Logs In Resource Group `${AZ_RESOURCE_GROUP}`
Check Configuration Health of App Service `${APP_SERVICE_NAME}` In Resource Group `${AZ_RESOURCE_GROUP}`
Check Deployment Health of App Service `${APP_SERVICE_NAME}` In Resource Group `${AZ_RESOURCE_GROUP}`
Fetch App Service `${APP_SERVICE_NAME}` Activities In Resource Group `${AZ_RESOURCE_GROUP}`
Check Logs for Errors in App Service `${APP_SERVICE_NAME}` In Resource Group `${AZ_RESOURCE_GROUP}`
SLI Check for Resource Health Issues Affecting App Service `${APP_SERVICE_NAME}` In Resource Group `${AZ_RESOURCE_GROUP}`
Check App Service `${APP_SERVICE_NAME}` Health Check Metrics In Resource Group `${AZ_RESOURCE_GROUP}`
Check App Service `${APP_SERVICE_NAME}` Configuration Health In Resource Group `${AZ_RESOURCE_GROUP}`
Check Deployment Health of App Service `${APP_SERVICE_NAME}` In Resource Group `${AZ_RESOURCE_GROUP}`
Fetch App Service `${APP_SERVICE_NAME}` Activities In Resource Group `${AZ_RESOURCE_GROUP}`
Generate App Service Health Score for `${APP_SERVICE_NAME}` in resource group `${AZ_RESOURCE_GROUP}`

Azure Application Gateway Health

TaskSet Check for Resource Health Issues Affecting Application Gateway `${APP_GATEWAY_NAME}` In Resource Group `${AZ_RESOURCE_GROUP}`
Check Configuration Health of Application Gateway `${APP_GATEWAY_NAME}` In Resource Group `${AZ_RESOURCE_GROUP}`
Check Backend Pool Health for Application Gateway `${APP_GATEWAY_NAME}` In Resource Group `${AZ_RESOURCE_GROUP}`
Fetch Log Analytics for Application Gateway `${APP_GATEWAY_NAME}` In Resource Group `${AZ_RESOURCE_GROUP}`
Fetch Metrics for Application Gateway `${APP_GATEWAY_NAME}` In Resource Group `${AZ_RESOURCE_GROUP}`
Check SSL Certificate Health for Application Gateway `${APP_GATEWAY_NAME}` In Resource Group `${AZ_RESOURCE_GROUP}`
Check Logs for Errors with Application Gateway `${APP_GATEWAY_NAME}` In Resource Group `${AZ_RESOURCE_GROUP}`
List Related Azure Resources for Application Gateway `${APP_GATEWAY_NAME}` In Resource Group `${AZ_RESOURCE_GROUP}`
SLI Check for Resource Health Issues Affecting Application Gateway `${APP_GATEWAY_NAME}` In Resource Group `${AZ_RESOURCE_GROUP}`
Check Configuration Health of Application Gateway `${APP_GATEWAY_NAME}` In Resource Group `${AZ_RESOURCE_GROUP}`
Check Backend Pool Health for Application Gateway `${APP_GATEWAY_NAME}` In Resource Group `${AZ_RESOURCE_GROUP}`
Fetch Metrics for Application Gateway `${APP_GATEWAY_NAME}` In Resource Group `${AZ_RESOURCE_GROUP}`
Check SSL Certificate Health for Application Gateway `${APP_GATEWAY_NAME}` In Resource Group `${AZ_RESOURCE_GROUP}`
Check Logs for Errors with Application Gateway `${APP_GATEWAY_NAME}` In Resource Group `${AZ_RESOURCE_GROUP}`
Generate Application Gateway Health Score

Azure CLI Command

TaskSet ${TASK_TITLE}
SLI ${TASK_TITLE}

Azure CLI Command with Issue

TaskSet ${TASK_TITLE}
SLI ${TASK_TITLE}

Azure Internal LoadBalancer Triage

TaskSet Check Activity Logs for Azure Load Balancer `${AZ_LB_NAME}`

Azure VM Scale Set Triage

TaskSet Check Scale Set `${VMSCALESET}` Key Metrics In Resource Group `${AZ_RESOURCE_GROUP}`
Fetch VM Scale Set `${VMSCALESET}` Config In Resource Group `${AZ_RESOURCE_GROUP}`
Fetch Activities for VM Scale Set `${VMSCALESET}` In Resource Group `${AZ_RESOURCE_GROUP}`
SLI Check Scale Set `${VMSCALESET}` Key Metrics In Resource Group `${AZ_RESOURCE_GROUP}`

Cert-manager Expirations

SLI Inspect Certification Expiration Dates

Cert-Manager Health Check

SLI Health Check cert-manager Pods

Cortex Metrics Ingester Health

TaskSet Fetch Ingester Ring Member List and Status
SLI Determine Cortex Ingester Ring Health

cURL CLI Command

TaskSet ${TASK_TITLE}
SLI ${TASK_TITLE}

cURL CLI Command with Issue

TaskSet ${TASK_TITLE}
SLI ${TASK_TITLE}

cURL Generic Report

TaskSet Run Curl Command and Add to Report
SLI Run Curl Command and Push Metric

cURL HTTP OK

TaskSet Check HTTP URL Availability and Timeliness for `${URL}`
SLI Validate HTTP URL Availability and Timeliness for `${URL}`

Datadog Metric

SLI Query Datadog Metrics

Datadog System Load

SLI Check Datadog System Load

Discord Send Message

TaskSet Send Chat Message

DNS Latency

SLI Check DNS latency for Google Resolver

ElasticSearch Health

SLI Check Elasticsearch Cluster Health

GCP CLI Command

TaskSet ${TASK_TITLE}
SLI ${TASK_TITLE}

GCP CLI Command with Issue

TaskSet ${TASK_TITLE}
SLI ${TASK_TITLE}

GCP Cloud Function Health

TaskSet List Unhealthy Cloud Functions in GCP Project `${GCP_PROJECT_ID}`
Get Error Logs for Unhealthy Cloud Functions in GCP Project `${GCP_PROJECT_ID}`
SLI Count unhealthy GCP Cloud Functions in GCP Project `${GCP_PROJECT_ID}`

GCP GCloud Generic Report

TaskSet Run Gcloud CLI Command and Push metric
SLI Run Gcloud CLI Command and Push metric

GCP Gcloud Log Inspection

TaskSet Inspect GCP Logs For Common Errors in GCP Project `${GCP_PROJECT_ID}`

GCP Node Preempt List

TaskSet List all nodes in an active preempt operation for GCP Project `${GCP_PROJECT_ID}` within the last `${AGE}` hours
SLI Count the number of nodes in active preempt operation in project `${GCP_PROJECT_ID}`

GCP Operations Suite Log Query

SLI Running GCE Logging Query And Pushing Result Count Metric

GCP Operations Suite Log Query Dashboard URL

TaskSet Get GCP Log Dashboard URL For Given Log Query

GCP Operations Suite Metric Query

SLI Running GCP OpsSuite Metric Query

GCP Operations Suite Prometheus Query

SLI Run Prometheus Instant Query Against Google Prom API Endpoint

GCP Service Status

SLI Get Number of GCP Incidents Affecting My Workspace

GCP Storage Bucket Health

TaskSet Fetch GCP Bucket Storage Utilization for `${PROJECT_IDS}`
Add GCP Bucket Storage Configuration for `${PROJECT_IDS}` to Report
Check GCP Bucket Security Configuration for `${PROJECT_IDS}`
Fetch GCP Bucket Storage Operations Rate for `${PROJECT_IDS}`
SLI Fetch GCP Bucket Storage Utilization for `${PROJECT_IDS}`
Check GCP Bucket Security Configuration for `${PROJECT_IDS}`
Fetch GCP Bucket Storage Operations Rate for `${PROJECT_IDS}`
Generate Bucket Score in Project `${PROJECT_IDS}`

GitHub - Create Issue From RunSession

TaskSet Create GitHub Issue in Repository `${GITHUB_REPOSITORY}` from RunSession

GitHub Actions Artifact Analysis

TaskSet Analyze artifact from GitHub workflow `${WORKFLOW_NAME}` in repository `${GITHUB_REPO}`
SLI Analyze artifact from GitHub Workflow `${WORKFLOW_NAME}` in repository `${GITHUB_REPO}` and push metric

GitHub Actions Workflow Timing

SLI Get Average Run Time For Workflow

GitHub API Latency

TaskSet Check Latency When Creating a New GitHub Issue
SLI Check GitHub Latency With Get Repos

GitHub Service Status

SLI Get Availability of GitHub or Individual GitHub Components

GitHub Status Incidents

SLI Get Number of Incidents Affecting GitHub

GitHub Status Maintenance

SLI Get Scheduled and Active GitHub Maintenance Windows

GitLab Availability

TaskSet Check GitLab Server Status
SLI Check GitLab Server Status

GitLab Get Repo Latency

SLI Check GitLab Latency With Get Repos

GKE Kong Ingress Host Triage

TaskSet Check If Kong Ingress HTTP Error Rate Violates HTTP Error Threshold in GCP Project `${GCP_PROJECT_ID}`
Check If Kong Ingress HTTP Request Latency Violates Threshold in GCP Project `${GCP_PROJECT_ID}`
Check If Kong Ingress Controller Reports Upstream Errors in GCP Project `${GCP_PROJECT_ID}`

GKE Nginx Ingress Host Triage

TaskSet Fetch Nginx HTTP Errors From GMP for Ingress `${INGRESS_OBJECT_NAME}`
Find Owner and Service Health for Ingress `${INGRESS_OBJECT_NAME}`

Google Chat Send Message

TaskSet Send Chat Message

Grafana Health

SLI Check Grafana Server Health

gRPC cURL Unary

TaskSet Create a new Jira Issue
SLI Search Jira Issues By Current User

gRPC cURL Unary

TaskSet Run gRPCurl Command and Show Output
SLI Run gRPCurl Command and Push Metric

HashiCorp Vault Health

SLI Check If Vault Endpoint Is Healthy

HTTP Latency

SLI Check HTTP Latency to Well Known URL

HTTP OK

SLI Checking HTTP URL Is Available And Timely

K8s Jaeger Query

TaskSet Query Traces in Jaeger for Unhealthy HTTP Response Codes in Namespace `${NAMESPACE}`

K8s OpenTelemetry Collector Health

TaskSet Query Collector Queued Spans in Namespace `${NAMESPACE}`
Check OpenTelemetry Collector Logs For Errors In Namespace `${NAMESPACE}`
Query OpenTelemetry Logs For Dropped Spans In Namespace `${NAMESPACE}`

Kong Ingress Health (GCP PromQL)

SLI Get Access Token
Get HTTP Error Rate
Get Upstream Health
Get Request Latency Rate
Generate Kong Ingress Score

Kubeprometheus Operator Troubleshoot

TaskSet Check Prometheus Service Monitors in namespace `${NAMESPACE}`
Check For Successful Rule Setup in Kubernetes Namespace `${NAMESPACE}`
Verify Prometheus RBAC Can Access ServiceMonitors in Namespace `${PROM_NAMESPACE}`
Inspect Prometheus Operator Logs for Scraping Errors in Namespace `${NAMESPACE}`
Check Prometheus API Healthy in Namespace `${PROM_NAMESPACE}`

Kubernetes API Server Health

SLI Running Kubectl Check Against API Server

Kubernetes Application Log Health

TaskSet Scan ${WORKLOAD_TYPE} `${WORKLOAD_NAME}` Logs for Errors in Namespace `${NAMESPACE}`
Scan ${WORKLOAD_TYPE} `${WORKLOAD_NAME}` Logs for Stack Traces in Namespace `${NAMESPACE}`
Scan ${WORKLOAD_TYPE} `${WORKLOAD_NAME}` Logs for Connection Failures in Namespace `${NAMESPACE}`
Scan ${WORKLOAD_TYPE} `${WORKLOAD_NAME}` Logs for Timeout Errors in Namespace `${NAMESPACE}`
Scan ${WORKLOAD_TYPE} `${WORKLOAD_NAME}` Logs for Authentication and Authorization Failures in Namespace `${NAMESPACE}`
Scan ${WORKLOAD_TYPE} `${WORKLOAD_NAME}` Logs for Null Pointer and Unhandled Exceptions in Namespace `${NAMESPACE}`
Scan ${WORKLOAD_TYPE} `${WORKLOAD_NAME}` for Log Anomalies in Namespace `${NAMESPACE}`
Scan ${WORKLOAD_TYPE} `${WORKLOAD_NAME}` Logs for Application Restarts and Failures in Namespace `${NAMESPACE}`
Scan ${WORKLOAD_TYPE} `${WORKLOAD_NAME}` Logs for Memory and CPU Resource Warnings in Namespace `${NAMESPACE}`
Scan ${WORKLOAD_TYPE} `${WORKLOAD_NAME}` Logs for Service Dependency Failures in Namespace `${NAMESPACE}`
SLI Scan ${WORKLOAD_TYPE} `${WORKLOAD_NAME}` Logs for Errors in Namespace `${NAMESPACE}`
Scan ${WORKLOAD_TYPE} `${WORKLOAD_NAME}` Logs for Stack Traces in Namespace `${NAMESPACE}`
Scan ${WORKLOAD_TYPE} `${WORKLOAD_NAME}` Logs for Connection Failures in Namespace `${NAMESPACE}`
Scan ${WORKLOAD_TYPE} `${WORKLOAD_NAME}` Logs for Timeout Errors in Namespace `${NAMESPACE}`
Scan ${WORKLOAD_TYPE} `${WORKLOAD_NAME}` Logs for Authentication and Authorization Failures in Namespace `${NAMESPACE}`
Scan ${WORKLOAD_TYPE} `${WORKLOAD_NAME}` Logs for Null Pointer and Unhandled Exceptions in Namespace `${NAMESPACE}`
Scan ${WORKLOAD_TYPE} `${WORKLOAD_NAME}` for Log Anomalies in Namespace `${NAMESPACE}`
Scan ${WORKLOAD_TYPE} `${WORKLOAD_NAME}` Logs for Application Restarts and Failures in Namespace `${NAMESPACE}`
Scan ${WORKLOAD_TYPE} `${WORKLOAD_NAME}` Logs for Memory and CPU Resource Warnings in Namespace `${NAMESPACE}`
Scan ${WORKLOAD_TYPE} `${WORKLOAD_NAME}` Logs for Service Dependency Failures in Namespace `${NAMESPACE}`
Generate Application Gateway Health Score

Kubernetes Application Troubleshoot

TaskSet Get `${CONTAINER_NAME}` Application Logs from Workload `${WORKLOAD_NAME}` in Namespace `${NAMESPACE}`
Scan `${CONTAINER_NAME}` Application For Misconfigured Environment
Tail `${CONTAINER_NAME}` Application Logs For Stacktraces in Workload `${WORKLOAD_NAME}`
SLI Measure Application Exceptions in `${NAMESPACE}`

Kubernetes ArgoCD Application Health & Troubleshoot

TaskSet Fetch ArgoCD Application Sync Status & Health for `${APPLICATION}`
Fetch ArgoCD Application Last Sync Operation Details for `${APPLICATION}`
Fetch Unhealthy ArgoCD Application Resources for `${APPLICATION}`
Scan For Errors in Pod Logs Related to ArgoCD Application `${APPLICATION}`
Fully Describe ArgoCD Application `${APPLICATION}`

Kubernetes ArgoCD HelmRelease TaskSet

TaskSet Fetch all available ArgoCD Helm releases in namespace `${NAMESPACE}`
Fetch Installed ArgoCD Helm release versions in namespace `${NAMESPACE}`

Kubernetes Artifactory Triage

TaskSet Check Artifactory Liveness and Readiness Endpoints in `${NAMESPACE}`

Kubernetes cert-manager Healthcheck

TaskSet Get Namespace Certificate Summary for Namespace `${NAMESPACE}`
Find Unhealthy Certificates in Namespace `${NAMESPACE}`
Find Failed Certificate Requests and Identify Issues for Namespace `${NAMESPACE}`
SLI Count Unready and Expired Certificates in Namespace `${NAMESPACE}`

Kubernetes CLI Command

TaskSet ${TASK_TITLE}
SLI ${TASK_TITLE}

Kubernetes CLI Command

TaskSet ${TASK_TITLE}
SLI ${TASK_TITLE}

Kubernetes Cluster Node Health

TaskSet Check for Node Restarts in Cluster `${CONTEXT}` within Interval `${INTERVAL}`
SLI Check for Node Restarts in Cluster `${CONTEXT}`
Generate Namespace Score in Kubernetes Cluster `${CONTEXT}`

Kubernetes Cluster Resource Health

TaskSet Identify High Utilization Nodes for Cluster `${CONTEXT}`
Identify Pods Causing High Node Utilization in Cluster `${CONTEXT}`
SLI Identify High Utilization Nodes for Cluster `${CONTEXT}`

Kubernetes Daemonset Health Check

SLI Health Check Daemonset

Kubernetes Daemonset Triage

TaskSet Get DaemonSet Logs for `${DAEMONSET_NAME}` and Add to Report
Get Related Daemonset `${DAEMONSET_NAME}` Events in Namespace `${NAMESPACE}`
Check Daemonset `${DAEMONSET_NAME}` Replicas

Kubernetes Decommission Workload

TaskSet Generate Decommission Commands

Kubernetes Deployment Operations

TaskSet Restart Deployment `${DEPLOYMENT_NAME}` in Namespace `${NAMESPACE}`
Force Delete Pods in Deployment `${DEPLOYMENT_NAME}` in Namespace `${NAMESPACE}`
Rollback Deployment `${DEPLOYMENT_NAME}` in Namespace `${NAMESPACE}` to Previous Version
Scale Down Deployment `${DEPLOYMENT_NAME}` in Namespace `${NAMESPACE}`
Scale Up Deployment `${DEPLOYMENT_NAME}` in Namespace `${NAMESPACE}` by ${SCALE_UP_FACTOR}x
Clean Up Stale ReplicaSets for Deployment `${DEPLOYMENT_NAME}` in Namespace `${NAMESPACE}`
Scale Down Stale ReplicaSets for Deployment `${DEPLOYMENT_NAME}` in Namespace `${NAMESPACE}`

Kubernetes Deployment Triage

TaskSet Check Deployment Log For Issues with `${DEPLOYMENT_NAME}`
Fetch Deployments Logs for `${DEPLOYMENT_NAME}` in Namespace `${NAMESPACE}` and Add to Report
Check Liveness Probe Configuration for Deployment `${DEPLOYMENT_NAME}`
Check Readiness Probe Configuration for Deployment `${DEPLOYMENT_NAME}` in Namespace `${NAMESPACE}`
Inspect Container Restarts for Deployment `${DEPLOYMENT_NAME}` in Namespace `${NAMESPACE}`
Inspect Deployment Warning Events for `${DEPLOYMENT_NAME}` in Namespace `${NAMESPACE}`
Fetch Deployment Workload Details For `${DEPLOYMENT_NAME}` in Namespace `${NAMESPACE}`
Inspect Deployment Replicas for `${DEPLOYMENT_NAME}` in namespace `${NAMESPACE}`
Check Deployment Event Anomalies for `${DEPLOYMENT_NAME}` in Namespace `${NAMESPACE}`
Check ReplicaSet Health for Deployment `${DEPLOYMENT_NAME}` in Namespace `${NAMESPACE}`

Kubernetes Event Query

SLI Get Number Of Matching Events

Kubernetes Flux Chaos Testing

TaskSet Suspend the Flux Resource Reconciliation for `${FLUX_RESOURCE_NAME}` in namespace `${FLUX_RESOURCE_NAMESPACE}`
Select Random FluxCD Workload for Chaos Target in Namespace `${FLUX_RESOURCE_NAMESPACE}`
Execute Chaos Command on `${TARGET_RESOURCE}` in Namespace `${TARGET_NAMESPACE}`
Execute Additional Chaos Command on ${FLUX_RESOURCE_TYPE} `${FLUX_RESOURCE_NAME}` in Namespace `${FLUX_RESOURCE_NAMESPACE}`
Resume Flux Resource Reconciliation in `${TARGET_NAMESPACE}`

Kubernetes FluxCD HelmRelease TaskSet

TaskSet List all available FluxCD Helmreleases in Namespace `${NAMESPACE}`
Fetch Installed FluxCD Helmrelease Versions in Namespace `${NAMESPACE}`
Fetch Mismatched FluxCD HelmRelease Version in Namespace `${NAMESPACE}`
Fetch FluxCD HelmRelease Error Messages in Namespace `${NAMESPACE}`
Check for Available Helm Chart Updates in Namespace `${NAMESPACE}`

Kubernetes FluxCD Kustomization TaskSet

TaskSet List all available FluxCD Kustomization objects in Namespace `${NAMESPACE}`
List Unready FluxCD Kustomizations in Namespace `${NAMESPACE}`

Kubernetes Fluxcd Reconciliation Report

TaskSet Check FluxCD Reconciliation Health in Kubernetes Namespace `${FLUX_NAMESPACE}`
SLI Health Check Flux Reconciliation

Kubernetes GitOps GitHub Remediation

TaskSet Remediate Readiness and Liveness Probe GitOps Manifests in Namespace `${NAMESPACE}`
Increase ResourceQuota Limit for Namespace `${NAMESPACE}` in GitHub GitOps Repository
Adjust Pod Resources to Match VPA Recommendation in `${NAMESPACE}`
Expand Persistent Volume Claims in Namespace `${NAMESPACE}`

Kubernetes Grafana Loki Health Check

TaskSet Check Loki Ring API for Unhealthy Shards in Kubernetes Cluster `${NAMESPACE}`
Check Loki API Ready in Kubernetes Cluster `${NAMESPACE}`

Kubernetes Image Check

TaskSet Check Image Rollover Times for Namespace `${NAMESPACE}`
List Images and Tags for Every Container in Running Pods for Namespace `${NAMESPACE}`
List Images and Tags for Every Container in Failed Pods for Namespace `${NAMESPACE}`
List ImagePullBackOff Events and Test Path and Tags for Namespace `${NAMESPACE}`

Kubernetes Ingress GCE & GCP HTTP Load Balancer Healthcheck

TaskSet Search For GCE Ingress Warnings in GKE Context `${CONTEXT}`
Identify Unhealthy GCE HTTP Ingress Backends in GKE Namespace `${NAMESPACE}`
Validate GCP HTTP Load Balancer Configurations in GCP Project `${GCP_PROJECT_ID}`
Fetch Network Error Logs from GCP Operations Manager for Ingress Backends in GCP Project `${GCP_PROJECT_ID}`
Review GCP Operations Logging Dashboard in GCP project `${GCP_PROJECT_ID}`

Kubernetes Ingress Healthcheck

TaskSet Fetch Ingress Object Health in Namespace `${NAMESPACE}`
Check for Ingress and Service Conflicts in Namespace `${NAMESPACE}`

Kubernetes Jenkins Healthcheck

TaskSet Query The Jenkins Kubernetes Workload HTTP Endpoint in Kubernetes StatefulSet `${STATEFULSET_NAME}`
Query For Stuck Jenkins Jobs in Kubernetes Statefulset Workload `${STATEFULSET_NAME}`

Kubernetes Labeled Pod Count

SLI Measure Number of Running Pods with Label in `${NAMESPACE}`

Kubernetes Namespace Chaos Engineering

TaskSet Kill Random Pods In Namespace `${NAMESPACE}`
OOMKill Pods In Namespace `${NAMESPACE}`
Mangle Service Selector In Namespace `${NAMESPACE}`
Mangle Service Port In Namespace `${NAMESPACE}`
Fill Random Pod Tmp Directory In Namespace `${NAMESPACE}`

Kubernetes Namespace Inspection

TaskSet Inspect Warning Events in Namespace `${NAMESPACE}`
Inspect Container Restarts In Namespace `${NAMESPACE}`
Inspect Pending Pods In Namespace `${NAMESPACE}`
Inspect Failed Pods In Namespace `${NAMESPACE}`
Inspect Workload Status Conditions In Namespace `${NAMESPACE}`
Get Listing Of Resources In Namespace `${NAMESPACE}`
Check Event Anomalies in Namespace `${NAMESPACE}`
Check Missing or Risky PodDisruptionBudget Policies in Namespace `${NAMESPACE}`
Check Resource Quota Utilization in Namespace `${NAMESPACE}`
SLI Get Error Event Count within ${EVENT_AGE} and calculate Score
Get Container Restarts and Score in Namespace `${NAMESPACE}`
Get NotReady Pods in `${NAMESPACE}`
Generate Namespace Score in `${NAMESPACE}`

Kubernetes Namespace Troubleshoot

TaskSet Trace Namespace Errors
Fetch Unready Pods
Triage Namespace
Object Condition Check
Namespace Get All
SLI Get Event Count and Score
Get Container Restarts and Score
Get NotReady Pods
Generate Namespace Score

Kubernetes Patroni Health Check

SLI Determine Patroni Health

Kubernetes Patroni Lag Health

TaskSet Determine Patroni Health
SLI Measure Patroni Member Lag

Kubernetes Persistent Volume Healthcheck

TaskSet Fetch Events for Unhealthy Kubernetes PersistentVolumeClaims in Namespace `${NAMESPACE}`
List PersistentVolumeClaims in Terminating State in Namespace `${NAMESPACE}`
List PersistentVolumes in Terminating State in Namespace `${NAMESPACE}`
List Pods with Attached Volumes and Related PersistentVolume Details in Namespace `${NAMESPACE}`
Fetch the Storage Utilization for PVC Mounts in Namespace `${NAMESPACE}`
Check for RWO Persistent Volume Node Attachment Issues in Namespace `${NAMESPACE}`
SLI Fetch the Storage Utilization for PVC Mounts in Namespace `${NAMESPACE}`
Generate Namespace Score for Namespace `${NAMESPACE}`

Kubernetes Pod Resources Health

TaskSet Show Pods Without Resource Limit or Resource Requests Set in Namespace `${NAMESPACE}`
Check Pod Resource Utilization with Top in Namespace `${NAMESPACE}`
Identify VPA Pod Resource Recommendations in Namespace `${NAMESPACE}`
Identify Overutilized Pods in Namespace `${NAMESPACE}`

Kubernetes Postgres Healthcheck

TaskSet List Resources Related to Postgres Cluster `${OBJECT_NAME}` in Namespace `${NAMESPACE}`
Get Postgres Pod Logs & Events for Cluster `${OBJECT_NAME}` in Namespace `${NAMESPACE}`
Get Postgres Pod Resource Utilization for Cluster `${OBJECT_NAME}` in Namespace `${NAMESPACE}`
Get Running Postgres Configuration for Cluster `${OBJECT_NAME}` in Namespace `${NAMESPACE}`
Get Patroni Output and Add to Report for Cluster `${OBJECT_NAME}` in Namespace `${NAMESPACE}`
Fetch Patroni Database Lag for Cluster `${OBJECT_NAME}` in Namespace `${NAMESPACE}`
Check Database Backup Status for Cluster `${OBJECT_NAME}` in Namespace `${NAMESPACE}`
Run DB Queries for Cluster `${OBJECT_NAME}` in Namespace `${NAMESPACE}`
SLI Check Patroni Database Lag in Namespace `${NAMESPACE}` on Host `${HOSTNAME}` using `patronictl`
Check Database Backup Status for Cluster `${OBJECT_NAME}` in Namespace `${NAMESPACE}`
Generate Namespace Score for Namespace `${NAMESPACE}`

Kubernetes PostgreSQL Query

TaskSet Run Postgres Query And Add Results to Report
SLI Run Postgres Query And Return Result As Metric

Kubernetes PostgreSQL Triage

TaskSet Get Standard Resources
Describe Custom Resources
Get Pod Logs & Events
Get Pod Resource Utilization
Get Running Configuration
Get Patroni Output
Run DB Queries

Kubernetes Redis Healthcheck

TaskSet Ping `${DEPLOYMENT_NAME}` Redis Workload
Verify `${DEPLOYMENT_NAME}` Redis Read Write Operation in Kubernetes

Kubernetes Restart resource

TaskSet Get Current Resource State with Labels `${LABELS}`
Get Resource Logs with Labels `${LABELS}`
Restart Resource with Labels `${LABELS}` in `${CONTEXT}`

Kubernetes Run Shell Command

TaskSet Running Kubectl And Adding Stdout To Report

Kubernetes Service Account Check

TaskSet Test Service Account Access to Kubernetes API Server in Namespace `${NAMESPACE}`

Kubernetes StatefulSet Triage

TaskSet Check Readiness Probe Configuration for StatefulSet `${STATEFULSET_NAME}`
Check Liveness Probe Configuration for StatefulSet `${STATEFULSET_NAME}`
Troubleshoot StatefulSet Warning Events for `${STATEFULSET_NAME}`
Check StatefulSet Event Anomalies for `${STATEFULSET_NAME}` in Namespace `${NAMESPACE}`
Fetch StatefulSet Logs for `${STATEFULSET_NAME}` in Namespace `${NAMESPACE}` and Add to Report
Get Related StatefulSet `${STATEFULSET_NAME}` Events
Fetch Manifest Details for StatefulSet `${STATEFULSET_NAME}` in Namespace `${NAMESPACE}`
List Unhealthy Replica Counts for StatefulSets in Namespace `${NAMESPACE}`

Kubernetes Synthetic PVC Test

SLI Run Canary Job

Kubernetes Tail Application Logs

TaskSet Get `${CONTAINER_NAME}` Application Logs in Namespace `${NAMESPACE}`
Tail `${CONTAINER_NAME}` Application Logs For Stacktraces
SLI Tail `${CONTAINER_NAME}` Application Logs For Stacktraces

Kubernetes Top

SLI Running Kubectl Top And Extracting Metric Data

Kubernetes Triage Deployment Replicas

TaskSet Fetch Logs
Get Related Events
Check Deployment Replicas

Kubernetes Triage Patroni

TaskSet Get Patroni Status
Get Pods Status
Fetch Logs

Kubernetes Triage StatefulSet

TaskSet Check StatefulSets Replicas Ready
Get Events For The StatefulSet
Get StatefulSet Logs
Get StatefulSet Manifests Dump

Kubernetes Troubleshoot Deployment

TaskSet Troubleshoot Resourcing
Troubleshoot Events
Troubleshoot PVC
Troubleshoot Pods

Kubernetes Vault Triage

TaskSet Fetch Vault CSI Driver Logs in Namespace `${NAMESPACE}`
Get Vault CSI Driver Warning Events in `${NAMESPACE}`
Check Vault CSI Driver Replicas
Fetch Vault Pod Workload Logs in Namespace `${NAMESPACE}` with Labels `${LABELS}`
Get Related Vault Events in Namespace `${NAMESPACE}`
Fetch Vault StatefulSet Manifest Details in `${NAMESPACE}`
Fetch Vault DaemonSet Manifest Details in Kubernetes Cluster `${NAMESPACE}`
Verify Vault Availability in Namespace `${NAMESPACE}` and Context `${CONTEXT}`
Check Vault StatefulSet Replicas in `${NAMESPACE}`

Kubernetes Workload Chaos Engineering

TaskSet Test `${WORKLOAD_NAME}` High Availability in Namespace `${NAMESPACE}`
OOMKill `${WORKLOAD_NAME}` Pod
Mangle Service Selector For `${WORKLOAD_NAME}` in `${NAMESPACE}`
Mangle Service Port For `${WORKLOAD_NAME}` in `${NAMESPACE}`
Fill Tmp Directory Of Pod From `${WORKLOAD_NAME}`

Kubernetes Workload Metric

SLI Running Kubectl Get And Pushing The Metric

Microsoft Teams Send Message

TaskSet Send a Message to an MS Teams Channel

MongoDB Health (GCP PromQL)

SLI Get Access Token
Get Instance Status
Get Connection Utilization Rate
Get MongoDB Member State Health
Get MongoDB Replication Lag
Get MongoDB Queue Size
Get Assertion Rate
Generate MongoDB Score

OpsGenie Create Alert

TaskSet Get Opsgenie System Info
Create An Alert

PagerDuty Webhook Handler

TaskSet Run SLX Tasks with matching PagerDuty Webhook Service ID

Ping Host Availability

SLI Ping host and collect packet loss percentage

Pingdom Health

SLI Check Pingdom Health

Prometheus Query (Instant) Metric

SLI Querying Prometheus Instance And Pushing Aggregated Data

Prometheus Query (Range) Metric

SLI Querying Prometheus Instance And Pushing Aggregated Data

rds-mysql-conn-count

TaskSet Run Bash File
SLI Querying Prometheus Instance And Pushing Aggregated Data

REST Metric

SLI Request Data From Rest Endpoint

REST Metric (Basic Auth)

SLI Request Data From Rest Endpoint

REST Metric (Explicit OAuth2 with BasicAuth)

SLI Request Data From Rest Endpoint

REST Metric (Explicit OAuth2 with Bearer Token)

SLI Request Data From Rest Endpoint

RocketChat Send Message

TaskSet Send Chat Message

RunWhen Local Helm Update (ACR)

TaskSet Apply Available RunWhen Helm Images in ACR Registry `${REGISTRY_NAME}`
SLI Check for Available RunWhen Helm Images in ACR Registry `${REGISTRY_NAME}`

RunWhen Platform Azure ACR Image Sync

TaskSet Sync CodeCollection Images to ACR Registry `${REGISTRY_NAME}`
Sync RunWhen Local Image Updates to ACR Registry `${REGISTRY_NAME}`
SLI Check for CodeCollection Updates against ACR Registry `${REGISTRY_NAME}`
Check for RunWhen Local Image Updates against ACR Registry `${REGISTRY_NAME}`
Count Images Needing Update and Push Metric

Slack - Send Issue Summary From RunSession

TaskSet Send Slack Notification to Channel `${SLACK_CHANNEL}` from RunSession

Slack Send Message

TaskSet Send Chat Message

SLI Alert Threshold

SLI Check If SLI Within Incident Threshold

Sysdig Monitor Metric

SLI Query Sysdig Metric Data And Push Metric

Sysdig Monitor PromQL Metric

SLI Querying PromQL Endpoint And Pushing Metric Data

Terraform Cloud Workspace Lock Check

TaskSet Checking whether the Terraform Cloud Workspace '${TERRAFORM_WORKSPACE_NAME}' is in a locked state

Twitter Query Handle

TaskSet Query Twitter
SLI Query Twitter

Uptime.com Component Health

SLI Check If Vault Endpoint Is Healthy

Web Triage

TaskSet Validate Platform Egress
Perform Inspection On URL