All Tasks

AlertManager Webhook Handler

TaskSet Run SLX Tasks with matching AlertManager Webhook commonLabels

Artifactory OK

SLI Check If Artifactory Endpoint Is Healthy

AWS Account Creation Notification

TaskSet Get The Recently Created AWS Accounts
SLI Get Count Of AWS Accounts In Organization

AWS Billing Period Costs by Tag

SLI Get All Billing Sliced By Tags

AWS CLI Command

TaskSet ${TASK_TITLE}
SLI ${TASK_TITLE}

AWS CLI Command with Issue

TaskSet ${TASK_TITLE}
SLI ${TASK_TITLE}

AWS CloudFormation Event Rate

SLI Fetch CloudFormation Stack Events

AWS CloudFormation Triage

TaskSet Get All Recent Stack Events

AWS CloudWatch Log Query (Pass/Fail)

SLI Running CloudWatch Log Query And Pushing 1 If No Results Found

AWS CloudWatch Log Query (Total Count)

SLI Running CloudWatch Log Query And Pushing The Count Of Results

AWS CloudWatch Logs health

TaskSet List CloudWatch Log Groups Without Retention Period in AWS Region `${AWS_REGION}` in AWS Account `${AWS_ACCOUNT_ID}`
Check CloudTrail Configuration in AWS Region `${AWS_REGION}` in AWS Account `${AWS_ACCOUNT_ID}`
Check for CloudTrail integration with CloudWatch Logs in AWS Region `${AWS_REGION}` in AWS Account `${AWS_ACCOUNT_ID}`
SLI Check CloudWatch Log Groups Without Retention Period in AWS Region `${AWS_REGION}` in AWS account `${AWS_ACCOUNT_ID}`
Check if CloudTrail exists and is configured for multi-region in AWS Region `${AWS_REGION}` in AWS Account `${AWS_ACCOUNT_ID}`
Check CloudTrail Without CloudWatch Logs in AWS Region `${AWS_REGION}` in AWS Account `${AWS_ACCOUNT_ID}`
Generate Health Score

AWS CloudWatch Metric Query Dashboard

TaskSet Get CloudWatch MetricQuery Insights URL

AWS CloudWatch Overutilized EC2 Inspection

TaskSet Check For Overutilized EC2 Instances

AWS CloudWatch Tag Metric Query

SLI Run CloudWatch Metric Query Across Set Of IDs And Push Metric

AWS Costs by Tag

TaskSet Get All Billing Sliced By Tags

AWS EBS Health

TaskSet List Unattached EBS Volumes in AWS Region `${AWS_REGION}` in AWS account `${AWS_ACCOUNT_ID}`
List Unencrypted EBS Volumes in AWS Region `${AWS_REGION}` in AWS account `${AWS_ACCOUNT_ID}`
List Unused EBS Snapshots in AWS Region `${AWS_REGION}` in AWS account `${AWS_ACCOUNT_ID}`
SLI Check Unattached EBS Volumes in `${AWS_REGION}`
Check Unencrypted EBS Volumes in `${AWS_REGION}`
Check Unused EBS Snapshots in `${AWS_REGION}`
Generate EBS Score

AWS EC2 Health

TaskSet List stale AWS EC2 instances in AWS Region `${AWS_REGION}` in AWS account `${AWS_ACCOUNT_ID}`
List stopped AWS EC2 instances in AWS Region `${AWS_REGION}` in AWS account `${AWS_ACCOUNT_ID}`
List invalid AWS Auto Scaling Groups in AWS Region `${AWS_REGION}` in AWS account `${AWS_ACCOUNT_ID}`
SLI Check for stale AWS EC2 instances in AWS Region `${AWS_REGION}` in AWS account `${AWS_ACCOUNT_ID}`
Check for stopped AWS EC2 instances in AWS Region `${AWS_REGION}` in AWS account `${AWS_ACCOUNT_ID}`
Check for invalid AWS Auto Scaling Groups in AWS Region `${AWS_REGION}` in AWS account `${AWS_ACCOUNT_ID}`
Generate Health Score

AWS EC2 Security Check

TaskSet Check For Untagged instances
Check For Dangling Volumes
Check For Open Routes
Check For Overused Instances
Check For Underused Instances
Check For Underused Volumes
Check For Overused Volumes

AWS EKS Cluster Health

TaskSet Check EKS Fargate Cluster Health Status
Check EKS Cluster Health Status
List EKS Cluster Metrics
SLI Check EKS Cluster Health Status

AWS EKS Nodegroup Status Check

TaskSet Check EKS Nodegroup Status

AWS ElastiCache Health Check

TaskSet Scan AWS ElastiCache Redis Status
SLI Scan ElastiCaches

AWS Lambda Health Check

TaskSet List Lambda Versions and Runtimes
Analyze AWS Lambda Invocation Errors
Monitor AWS Lambda Performance Metrics
SLI Analyze AWS Lambda Invocation Errors

AWS network health

TaskSet List Publicly Accessible Security Groups in AWS account `${AWS_ACCOUNT_ID}`
List unused Elastic IPs in AWS account `${AWS_ACCOUNT_ID}`
List unused ELBs in AWS account `${AWS_ACCOUNT_ID}`
List VPCs with Flow Logs Disabled in AWS account `${AWS_ACCOUNT_ID}`
SLI Check for publicly accessible security groups in AWS account `${AWS_ACCOUNT_ID}`
Check for unused Elastic IPs in AWS account `${AWS_ACCOUNT_ID}`
Check for unused ELBs in AWS account `${AWS_ACCOUNT_ID}`
Check for VPCs with Flow Logs disabled in AWS account `${AWS_ACCOUNT_ID}`
Generate Health Score

AWS RDS health

TaskSet List Unencrypted RDS Instances in AWS Region `${AWS_REGION}` in AWS Account `${AWS_ACCOUNT_ID}`
List Publicly Accessible RDS Instances in AWS Region `${AWS_REGION}` in AWS Account `${AWS_ACCOUNT_ID}`
List RDS Instances with Backups Disabled in AWS Region `${AWS_REGION}` in AWS Account `${AWS_ACCOUNT_ID}`
SLI Check for unencrypted RDS instances in AWS Region `${AWS_REGION}` in AWS account `${AWS_ACCOUNT_ID}`
Check for publicly accessible RDS instances in AWS Region `${AWS_REGION}` in AWS account `${AWS_ACCOUNT_ID}`
Check for disabled backup RDS instances in AWS Region `${AWS_REGION}` in AWS account `${AWS_ACCOUNT_ID}`
Generate Health Score

AWS S3 Bucket Info Report

TaskSet Check AWS S3 Bucket Storage Utilization

AWS S3 Health

TaskSet List S3 Buckets With Public Access in AWS Account `${AWS_ACCOUNT_NAME}`
SLI Count S3 Buckets With Public Access in AWS Account `${AWS_ACCOUNT_NAME}`

AWS S3 Stale Check

TaskSet Create Report For Stale Buckets

AWS VM Triage

TaskSet Get Max VM CPU Utilization In Last 3 Hours
Get Lowest VM CPU Credits In Last 3 Hours
Get Max VM CPU Credit Usage In Last 3 Hours
Get Max VM Memory Utilization In Last 3 Hours
Get Max VM Volume Usage In Last 3 Hours

aws-cloudwatch-metricquery

SLI Running CloudWatch Metric Query And Pushing The Result

Azure ACR Image Sync

TaskSet Sync Container Images into Azure Container Registry `${ACR_REGISTRY}`
SLI Count Outdated Images in Azure Container Registry `${ACR_REGISTRY}`

Azure AKS Triage

TaskSet Check for Resource Health Issues Affecting AKS Cluster `${AKS_CLUSTER}` In Resource Group `${AZ_RESOURCE_GROUP}`
Check Configuration Health of AKS Cluster `${AKS_CLUSTER}` In Resource Group `${AZ_RESOURCE_GROUP}`
Check Network Configuration of AKS Cluster `${AKS_CLUSTER}` In Resource Group `${AZ_RESOURCE_GROUP}`
Fetch Activities for AKS Cluster `${AKS_CLUSTER}` In Resource Group `${AZ_RESOURCE_GROUP}`
SLI Check for Resource Health Issues Affecting AKS Cluster `${AKS_CLUSTER}` In Resource Group `${AZ_RESOURCE_GROUP}`
Fetch Activities for AKS Cluster `${AKS_CLUSTER}` In Resource Group `${AZ_RESOURCE_GROUP}`
Check Configuration Health of AKS Cluster `${AKS_CLUSTER}` In Resource Group `${AZ_RESOURCE_GROUP}`
Generate AKS Cluster Health Score

Azure App Service Triage

TaskSet Check for Resource Health Issues Affecting App Service `${APP_SERVICE_NAME}` In Resource Group `${AZ_RESOURCE_GROUP}`
Check App Service `${APP_SERVICE_NAME}` Health Check Metrics In Resource Group `${AZ_RESOURCE_GROUP}`
Fetch App Service `${APP_SERVICE_NAME}` Utilization Metrics In Resource Group `${AZ_RESOURCE_GROUP}`
Get App Service `${APP_SERVICE_NAME}` Logs In Resource Group `${AZ_RESOURCE_GROUP}`
Check Configuration Health of App Service `${APP_SERVICE_NAME}` In Resource Group `${AZ_RESOURCE_GROUP}`
Check Deployment Health of App Service `${APP_SERVICE_NAME}` In Resource Group `${AZ_RESOURCE_GROUP}`
Fetch App Service `${APP_SERVICE_NAME}` Activities In Resource Group `${AZ_RESOURCE_GROUP}`
Check Logs for Errors in App Service `${APP_SERVICE_NAME}` In Resource Group `${AZ_RESOURCE_GROUP}`
SLI Check for Resource Health Issues Affecting App Service `${APP_SERVICE_NAME}` In Resource Group `${AZ_RESOURCE_GROUP}`
Check App Service `${APP_SERVICE_NAME}` Health Check Metrics In Resource Group `${AZ_RESOURCE_GROUP}`
Check App Service `${APP_SERVICE_NAME}` Configuration Health In Resource Group `${AZ_RESOURCE_GROUP}`
Check Deployment Health of App Service `${APP_SERVICE_NAME}` In Resource Group `${AZ_RESOURCE_GROUP}`
Fetch App Service `${APP_SERVICE_NAME}` Activities In Resource Group `${AZ_RESOURCE_GROUP}`
Generate App Service Health Score

Azure Application Gateway Health

TaskSet Check for Resource Health Issues Affecting Application Gateway `${APP_GATEWAY_NAME}` In Resource Group `${AZ_RESOURCE_GROUP}`
Check Configuration Health of Application Gateway `${APP_GATEWAY_NAME}` In Resource Group `${AZ_RESOURCE_GROUP}`
Check Backend Pool Health for Application Gateway `${APP_GATEWAY_NAME}` In Resource Group `${AZ_RESOURCE_GROUP}`
SLI Check for Resource Health Issues Affecting Application Gateway `${APP_GATEWAY_NAME}` In Resource Group `${AZ_RESOURCE_GROUP}`
Check Configuration Health of Application Gateway `${APP_GATEWAY_NAME}` In Resource Group `${AZ_RESOURCE_GROUP}`
Check Backend Pool Health for Application Gateway `${APP_GATEWAY_NAME}` In Resource Group `${AZ_RESOURCE_GROUP}`
Generate Application Gateway Health Score

Azure CLI Command

TaskSet ${TASK_TITLE}
SLI ${TASK_TITLE}

Azure CLI Command with Issue

TaskSet ${TASK_TITLE}
SLI ${TASK_TITLE}

Azure Internal LoadBalancer Triage

TaskSet Check Activity Logs for Azure Load Balancer `${AZ_LB_NAME}`

Azure VM Scale Set Triage

TaskSet Check Scale Set `${VMSCALESET}` Key Metrics In Resource Group `${AZ_RESOURCE_GROUP}`
Fetch VM Scale Set `${VMSCALESET}` Config In Resource Group `${AZ_RESOURCE_GROUP}`
Fetch Activities for VM Scale Set `${VMSCALESET}` In Resource Group `${AZ_RESOURCE_GROUP}`
SLI Check Scale Set `${VMSCALESET}` Key Metrics In Resource Group `${AZ_RESOURCE_GROUP}`

Cert-manager Expirations

SLI Inspect Certification Expiration Dates

Cert-Manager Health Check

SLI Health Check cert-manager Pods

cli-test

TaskSet Run CLI and Parse Output For Issues
Exec Test
Local Process Test

cmd-test

TaskSet Run CLI Command
Run Bash File
Log Suggestion

Cortex Metrics Ingester Health

TaskSet Fetch Ingester Ring Member List and Status
SLI Determine Cortex Ingester Ring Health

cURL CLI Command

TaskSet ${TASK_TITLE}
SLI ${TASK_TITLE}

cURL CLI Command with Issue

TaskSet ${TASK_TITLE}
SLI ${TASK_TITLE}

cURL Generic Report

TaskSet Run Curl Command and Add to Report
SLI Run Curl Command and Push Metric

cURL HTTP OK

TaskSet Checking HTTP URL Is Available And Timely
SLI Checking HTTP URL Is Available And Timely

Datadog Metric

SLI Query Datadog Metrics

Datadog System Load

SLI Check Datadog System Load

Discord Send Message

TaskSet Send Chat Message

DNS Latency

SLI Check DNS latency for Google Resolver

ElasticSearch Health

SLI Check Elasticsearch Cluster Health

GCP CLI Command

TaskSet ${TASK_TITLE}
SLI ${TASK_TITLE}

GCP CLI Command with Issue

TaskSet ${TASK_TITLE}
SLI ${TASK_TITLE}

GCP Cloud Function Health

TaskSet List Unhealthy Cloud Functions in GCP Project `${GCP_PROJECT_ID}`
Get Error Logs for Unhealthy Cloud Functions in GCP Project `${GCP_PROJECT_ID}`
SLI Count unhealthy GCP Cloud Functions in GCP Project `${GCP_PROJECT_ID}`

GCP GCloud Generic Report

TaskSet Run Gcloud CLI Command and Push metric
SLI Run Gcloud CLI Command and Push metric

GCP Gcloud Log Inspection

TaskSet Inspect GCP Logs For Common Errors

GCP Node Preempt List

TaskSet List all nodes in an active preempt operation for GCP Project `${GCP_PROJECT_ID}`
SLI Count the number of nodes in active preempt operation

GCP Operations Suite Log Query

SLI Running GCE Logging Query And Pushing Result Count Metric

GCP Operations Suite Log Query Dashboard URL

TaskSet Get GCP Log Dashboard URL For Given Log Query

GCP Operations Suite Metric Query

SLI Running GCP OpsSuite Metric Query

GCP Operations Suite Prometheus Query

SLI Run Prometheus Instant Query Against Google Prom API Endpoint

GCP Service Status

SLI Get Number of GCP Incidents Affecting My Workspace

GCP Storage Bucket Health

TaskSet Fetch GCP Bucket Storage Utilization for `${PROJECT_IDS}`
Add GCP Bucket Storage Configuration for `${PROJECT_IDS}` to Report
Check GCP Bucket Security Configuration for `${PROJECT_IDS}`
Fetch GCP Bucket Storage Operations Rate for `${PROJECT_IDS}`
SLI Fetch GCP Bucket Storage Utilization for `${PROJECT_IDS}`
Check GCP Bucket Security Configuration for `${PROJECT_IDS}`
Fetch GCP Bucket Storage Operations Rate for `${PROJECT_IDS}`
Generate Bucket Score

GitHub Actions Artifact Analysis

TaskSet Analyze artifact from GitHub workflow `${WORKFLOW_NAME}` in repository `${GITHUB_REPO}`
SLI Analyze artifact from GitHub Workflow `${WORKFLOW_NAME}` in repository `${GITHUB_REPO}` and push metric

GitHub Actions Workflow Timing

SLI Get Average Run Time For Workflow

GitHub API Latency

TaskSet Check Latency When Creating a New GitHub Issue
SLI Check GitHub Latency With Get Repos

GitHub Service Status

SLI Get Availability of GitHub or Individual GitHub Components

GitHub Status Incidents

SLI Get Number of Incidents Affecting GitHub

GitHub Status Maintenance

SLI Get Scheduled and Active GitHub Maintenance Windows

GitLab Availability

TaskSet Check GitLab Server Status
SLI Check GitLab Server Status

GitLab Get Repo Latency

SLI Check GitLab Latency With Get Repos

GKE Kong Ingress Host Triage

TaskSet Check If Kong Ingress HTTP Error Rate Violates HTTP Error Threshold
Check If Kong Ingress HTTP Request Latency Violates Threshold
Check If Kong Ingress Controller Reports Upstream Errors

GKE Nginx Ingress Host Triage

TaskSet Fetch Nginx HTTP Errors From GMP for Ingress `${INGRESS_OBJECT_NAME}`
Find Owner and Service Health for Ingress `${INGRESS_OBJECT_NAME}`

Google Chat Send Message

TaskSet Send Chat Message

Grafana Health

SLI Check Grafana Server Health

gRPC cURL Unary

TaskSet Run gRPCurl Command and Show Output
SLI Run gRPCurl Command and Push Metric

Jira

TaskSet Create a new Jira Issue
SLI Search Jira Issues By Current User

HashiCorp Vault Health

SLI Check If Vault Endpoint Is Healthy

HTTP Latency

SLI Check HTTP Latency to Well Known URL

HTTP OK

SLI Checking HTTP URL Is Available And Timely

K8s Jaeger Query

TaskSet Query Traces in Jaeger for Unhealthy HTTP Response Codes in Namespace `${NAMESPACE}`

K8s OpenTelemetry Collector Health

TaskSet Query Collector Queued Spans in Namespace `${NAMESPACE}`
Check OpenTelemetry Collector Logs For Errors In Namespace `${NAMESPACE}`
Scan OpenTelemetry Logs For Dropped Spans In Namespace `${NAMESPACE}`

k8s-kubectl-cmd

TaskSet Run User Provided Kubectl Command
SLI Run User Provided Kubectl Command

Kong Ingress Health (GCP PromQL)

SLI Get Access Token
Get HTTP Error Rate
Get Upstream Health
Get Request Latency Rate
Generate Kong Ingress Score

Kubeprometheus Operator Troubleshoot

TaskSet Check Prometheus Service Monitors
Check For Successful Rule Setup
Verify Prometheus RBAC Can Access ServiceMonitors
Identify Endpoint Scraping Errors
Check Prometheus API Healthy

Kubernetes API Server Health

SLI Running Kubectl Check Against API Server

Kubernetes Application Troubleshoot

TaskSet Get `${CONTAINER_NAME}` Application Logs
Scan `${CONTAINER_NAME}` Application For Misconfigured Environment
Tail `${CONTAINER_NAME}` Application Logs For Stacktraces
SLI Measure Application Exceptions

Kubernetes ArgoCD Application Health & Troubleshoot

TaskSet Fetch ArgoCD Application Sync Status & Health for `${APPLICATION}`
Fetch ArgoCD Application Last Sync Operation Details for `${APPLICATION}`
Fetch Unhealthy ArgoCD Application Resources for `${APPLICATION}`
Scan For Errors in Pod Logs Related to ArgoCD Application `${APPLICATION}`
Fully Describe ArgoCD Application `${APPLICATION}`

Kubernetes ArgoCD HelmRelease TaskSet

TaskSet Fetch all available ArgoCD Helm releases in namespace `${NAMESPACE}`
Fetch Installed ArgoCD Helm release versions in namespace `${NAMESPACE}`

Kubernetes Artifactory Triage

TaskSet Check Artifactory Liveness and Readiness Endpoints

Kubernetes cert-manager Healthcheck

TaskSet Get Namespace Certificate Summary for Namespace `${NAMESPACE}`
Find Unhealthy Certificates in Namespace `${NAMESPACE}`
Find Failed Certificate Requests and Identify Issues for Namespace `${NAMESPACE}`
SLI Count Unready and Expired Certificates

Kubernetes CLI Command

TaskSet ${TASK_TITLE}
SLI ${TASK_TITLE}

Kubernetes CLI Command with Issue

TaskSet ${TASK_TITLE}
SLI ${TASK_TITLE}

Kubernetes Cluster Node Health

TaskSet Check for Node Restarts in Cluster `${CONTEXT}`
SLI Check for Node Restarts in Cluster `${CONTEXT}`
Generate Namespace Score

Kubernetes Cluster Resource Health

TaskSet Identify High Utilization Nodes for Cluster `${CONTEXT}`
Identify Pods Causing High Node Utilization in Cluster `${CONTEXT}`
SLI Identify High Utilization Nodes for Cluster `${CONTEXT}`

Kubernetes Daemonset Health Check

SLI Health Check Daemonset

Kubernetes Daemonset Triage

TaskSet Get DaemonSet Logs for `${DAEMONSET_NAME}` and Add to Report
Get Related Daemonset `${DAEMONSET_NAME}` Events
Check Daemonset `${DAEMONSET_NAME}` Replicas

Kubernetes Decommission Workload

TaskSet Generate Decommission Commands

Kubernetes Deployment Operations

TaskSet Restart Deployment `${DEPLOYMENT_NAME}` in Namespace `${NAMESPACE}`
Force Delete Pods in Deployment `${DEPLOYMENT_NAME}` in Namespace `${NAMESPACE}`
Rollback Deployment `${DEPLOYMENT_NAME}` in Namespace `${NAMESPACE}` to Previous Version
Scale Down Deployment `${DEPLOYMENT_NAME}` in Namespace `${NAMESPACE}`
Scale Up Deployment `${DEPLOYMENT_NAME}` in Namespace `${NAMESPACE}` by ${SCALE_UP_FACTOR}x
Clean Up Stale ReplicaSets for Deployment `${DEPLOYMENT_NAME}` in Namespace `${NAMESPACE}`
Scale Down Stale ReplicaSets for Deployment `${DEPLOYMENT_NAME}` in Namespace `${NAMESPACE}`

Kubernetes Deployment Triage

TaskSet Check Deployment Log For Issues with `${DEPLOYMENT_NAME}`
Fetch Deployment Logs for `${DEPLOYMENT_NAME}` and Add to Report
Check Liveness Probe Configuration for Deployment `${DEPLOYMENT_NAME}`
Check Readiness Probe Configuration for Deployment `${DEPLOYMENT_NAME}`
Inspect Container Restarts for Deployment `${DEPLOYMENT_NAME}` Namespace `${NAMESPACE}`
Inspect Deployment Warning Events for `${DEPLOYMENT_NAME}`
Get Deployment Workload Details For `${DEPLOYMENT_NAME}` and Add to Report
Inspect Deployment Replicas for `${DEPLOYMENT_NAME}`
Check Deployment Event Anomalies for `${DEPLOYMENT_NAME}`
Check ReplicaSet Health for Deployment `${DEPLOYMENT_NAME}`

Kubernetes Event Query

SLI Get Number Of Matching Events

Kubernetes Flux Chaos Testing

TaskSet Suspend the Flux Resource Reconciliation
Find Random FluxCD Workload as Chaos Target
Execute Chaos Command
Execute Additional Chaos Command
Resume Flux Resource Reconciliation

Kubernetes Flux Suspend Namespace

TaskSet Flux Suspend Namespace ${NAMESPACE}
Unsuspend Flux for Namespace ${NAMESPACE}

Kubernetes FluxCD HelmRelease TaskSet

TaskSet List all available FluxCD HelmReleases in Namespace `${NAMESPACE}`
Fetch Installed FluxCD HelmRelease Versions in Namespace `${NAMESPACE}`
Fetch Mismatched FluxCD HelmRelease Version in Namespace `${NAMESPACE}`
Fetch FluxCD HelmRelease Error Messages in Namespace `${NAMESPACE}`
Check for Available Helm Chart Updates in Namespace `${NAMESPACE}`

Kubernetes FluxCD Kustomization TaskSet

TaskSet List all available Kustomization objects in Namespace `${NAMESPACE}`
Get details for unready Kustomizations in Namespace `${NAMESPACE}`

Kubernetes FluxCD Reconciliation Report

TaskSet Health Check Flux Reconciliation
SLI Health Check Flux Reconciliation

Kubernetes GitOps GitHub Remediation

TaskSet Remediate Readiness and Liveness Probe GitOps Manifests in Namespace `${NAMESPACE}`
Increase ResourceQuota for Namespace `${NAMESPACE}`
Adjust Pod Resources to Match VPA Recommendation in `${NAMESPACE}`
Expand Persistent Volume Claims in Namespace `${NAMESPACE}`

Kubernetes Grafana Loki Health Check

TaskSet Check Loki Ring API
Check Loki API Ready

Kubernetes Image Check

TaskSet Check Image Rollover Times for Namespace `${NAMESPACE}`
List Images and Tags for Every Container in Running Pods for Namespace `${NAMESPACE}`
List Images and Tags for Every Container in Failed Pods for Namespace `${NAMESPACE}`
List ImagePullBackOff Events and Test Path and Tags for Namespace `${NAMESPACE}`

Kubernetes Ingress GCE & GCP HTTP Load Balancer Healthcheck

TaskSet Search For GCE Ingress Warnings in GKE
Identify Unhealthy GCE HTTP Ingress Backends
Validate GCP HTTP Load Balancer Configurations
Fetch Network Error Logs from GCP Operations Manager for Ingress Backends
Review GCP Operations Logging Dashboard

Kubernetes Ingress Healthcheck

TaskSet Fetch Ingress Object Health in Namespace `${NAMESPACE}`
Check for Ingress and Service Conflicts in Namespace `${NAMESPACE}`

Kubernetes Jenkins Healthcheck

TaskSet Query The Jenkins Kubernetes Workload HTTP Endpoint
Query For Stuck Jenkins Jobs

Kubernetes Labeled Pod Count

SLI Measure Number of Running Pods with Label

Kubernetes Namespace Chaos Engineering

TaskSet Kill Random Pods In Namespace `${NAMESPACE}`
OOMKill Pods In Namespace `${NAMESPACE}`
Mangle Service Selector In Namespace `${NAMESPACE}`
Mangle Service Port In Namespace `${NAMESPACE}`
Fill Random Pod Tmp Directory In Namespace `${NAMESPACE}`

Kubernetes Namespace Inspection

TaskSet Inspect Warning Events in Namespace `${NAMESPACE}`
Inspect Container Restarts In Namespace `${NAMESPACE}`
Inspect Pending Pods In Namespace `${NAMESPACE}`
Inspect Failed Pods In Namespace `${NAMESPACE}`
Inspect Workload Status Conditions In Namespace `${NAMESPACE}`
Get Listing Of Resources In Namespace `${NAMESPACE}`
Check Event Anomalies in Namespace `${NAMESPACE}`
Check Missing or Risky PodDisruptionBudget Policies in Namespace `${NAMESPACE}`
Check Resource Quota Utilization in Namespace `${NAMESPACE}`
SLI Get Event Count and Score
Get Container Restarts and Score
Get NotReady Pods
Generate Namespace Score

Kubernetes Namespace Troubleshoot

TaskSet Trace Namespace Errors
Fetch Unready Pods
Triage Namespace
Object Condition Check
Namespace Get All
SLI Get Event Count and Score
Get Container Restarts and Score
Get NotReady Pods
Generate Namespace Score

Kubernetes Patroni Health Check

SLI Determine Patroni Health

Kubernetes Patroni Lag Health

TaskSet Determine Patroni Health
SLI Measure Patroni Member Lag

Kubernetes Persistent Volume Healthcheck

TaskSet Fetch Events for Unhealthy Kubernetes PersistentVolumeClaims in Namespace `${NAMESPACE}`
List PersistentVolumeClaims in Terminating State in Namespace `${NAMESPACE}`
List PersistentVolumes in Terminating State in Namespace `${NAMESPACE}`
List Pods with Attached Volumes and Related PersistentVolume Details in Namespace `${NAMESPACE}`
Fetch the Storage Utilization for PVC Mounts in Namespace `${NAMESPACE}`
Check for RWO Persistent Volume Node Attachment Issues in Namespace `${NAMESPACE}`
SLI Fetch the Storage Utilization for PVC Mounts in Namespace `${NAMESPACE}`
Generate Namespace Score

Kubernetes Pod Resources Health

TaskSet Show Pods Without Resource Limit or Resource Requests Set in Namespace `${NAMESPACE}`
Get Pod Resource Utilization with Top in Namespace `${NAMESPACE}`
Identify VPA Pod Resource Recommendations in Namespace `${NAMESPACE}`
Identify Resource Constrained Pods In Namespace `${NAMESPACE}`

Kubernetes Postgres Healthcheck

TaskSet List Resources Related to Postgres Cluster `${OBJECT_NAME}` in Namespace `${NAMESPACE}`
Get Postgres Pod Logs & Events for Cluster `${OBJECT_NAME}` in Namespace `${NAMESPACE}`
Get Postgres Pod Resource Utilization for Cluster `${OBJECT_NAME}` in Namespace `${NAMESPACE}`
Get Running Postgres Configuration for Cluster `${OBJECT_NAME}` in Namespace `${NAMESPACE}`
Get Patroni Output and Add to Report for Cluster `${OBJECT_NAME}` in Namespace `${NAMESPACE}`
Fetch Patroni Database Lag for Cluster `${OBJECT_NAME}` in Namespace `${NAMESPACE}`
Check Database Backup Status for Cluster `${OBJECT_NAME}` in Namespace `${NAMESPACE}`
Run DB Queries for Cluster `${OBJECT_NAME}` in Namespace `${NAMESPACE}`
SLI Fetch Patroni Database Lag
Check Database Backup Status for Cluster `${OBJECT_NAME}` in Namespace `${NAMESPACE}`
Generate Namespace Score

Kubernetes PostgreSQL Query

TaskSet Run Postgres Query And Add Results to Report
SLI Run Postgres Query And Return Result As Metric

Kubernetes PostgreSQL Triage

TaskSet Get Standard Resources
Describe Custom Resources
Get Pod Logs & Events
Get Pod Resource Utilization
Get Running Configuration
Get Patroni Output
Run DB Queries

Kubernetes Redis Healthcheck

TaskSet Ping `${DEPLOYMENT_NAME}` Redis Workload
Verify `${DEPLOYMENT_NAME}` Redis Read Write Operation

Kubernetes Restart Resource

TaskSet Get Current Resource State with Labels `${LABELS}`
Get Resource Logs with Labels `${LABELS}`
Restart Resource with Labels `${LABELS}`

Kubernetes Run Shell Command

TaskSet Running Kubectl And Adding Stdout To Report

Kubernetes Service Account Check

TaskSet Test Service Account Access to Kubernetes API Server in Namespace `${NAMESPACE}`

Kubernetes StatefulSet Triage

TaskSet Check Readiness Probe Configuration for StatefulSet `${STATEFULSET_NAME}`
Check Liveness Probe Configuration for StatefulSet `${STATEFULSET_NAME}`
Troubleshoot StatefulSet Warning Events for `${STATEFULSET_NAME}`
Check StatefulSet Event Anomalies for `${STATEFULSET_NAME}`
Fetch StatefulSet Logs for `${STATEFULSET_NAME}` and Add to Report
Get Related StatefulSet `${STATEFULSET_NAME}` Events
Fetch Manifest Details for StatefulSet `${STATEFULSET_NAME}`
List StatefulSets with Unhealthy Replica Counts In Namespace `${NAMESPACE}`

Kubernetes Synthetic PVC Test

SLI Run Canary Job

Kubernetes Tail Application Logs

TaskSet Get `${CONTAINER_NAME}` Application Logs
Tail `${CONTAINER_NAME}` Application Logs For Stacktraces
SLI Tail `${CONTAINER_NAME}` Application Logs For Stacktraces

Kubernetes Top

SLI Running Kubectl Top And Extracting Metric Data

Kubernetes Triage Deployment Replicas

TaskSet Fetch Logs
Get Related Events
Check Deployment Replicas

Kubernetes Triage Patroni

TaskSet Get Patroni Status
Get Pods Status
Fetch Logs

Kubernetes Triage StatefulSet

TaskSet Check StatefulSets Replicas Ready
Get Events For The StatefulSet
Get StatefulSet Logs
Get StatefulSet Manifests Dump

Kubernetes Troubleshoot Deployment

TaskSet Troubleshoot Resourcing
Troubleshoot Events
Troubleshoot PVC
Troubleshoot Pods

Kubernetes Vault Triage

TaskSet Fetch Vault CSI Driver Logs
Get Vault CSI Driver Warning Events
Check Vault CSI Driver Replicas
Fetch Vault Logs
Get Related Vault Events
Fetch Vault StatefulSet Manifest Details
Fetch Vault DaemonSet Manifest Details
Verify Vault Availability
Check Vault StatefulSet Replicas

Kubernetes Workload Chaos Engineering

TaskSet Test `${WORKLOAD_NAME}` High Availability
OOMKill `${WORKLOAD_NAME}` Pod
Mangle Service Selector For `${WORKLOAD_NAME}`
Mangle Service Port For `${WORKLOAD_NAME}`
Fill Tmp Directory Of Pod From `${WORKLOAD_NAME}`

Kubernetes Workload Metric

SLI Running Kubectl get and push the metric

Microsoft Teams Send Message

TaskSet Send a Message to an MS Teams Channel

MongoDB Health (GCP PromQL)

SLI Get Access Token
Get Instance Status
Get Connection Utilization Rate
Get MongoDB Member State Health
Get MongoDB Replication Lag
Get MongoDB Queue Size
Get Assertion Rate
Generate MongoDB Score

OpsGenie Create Alert

TaskSet Get Opsgenie System Info
Create An Alert

PagerDuty Webhook Handler

TaskSet Run SLX Tasks with matching PagerDuty Webhook Service ID

Ping Host Availability

SLI Ping host and collect packet loss percentage

Pingdom Health

SLI Check Pingdom Health

Prometheus Query (Instant) Metric

SLI Querying Prometheus Instance And Pushing Aggregated Data

Prometheus Query (Range) Metric

SLI Querying Prometheus Instance And Pushing Aggregated Data

rds-mysql-conn-count

TaskSet Run Bash File
SLI Querying Prometheus Instance And Pushing Aggregated Data

REST Metric

SLI Request Data From Rest Endpoint

REST Metric (Basic Auth)

SLI Request Data From Rest Endpoint

REST Metric (Explicit OAuth2 with BasicAuth)

SLI Request Data From Rest Endpoint

REST Metric (Explicit OAuth2 with Bearer Token)

SLI Request Data From Rest Endpoint

RocketChat Send Message

TaskSet Send Chat Message

RunWhen Local Helm Update (ACR)

TaskSet Apply Available RunWhen Helm Images in ACR Registry `${REGISTRY_NAME}`
SLI Check for Available RunWhen Helm Images in ACR Registry `${REGISTRY_NAME}`

RunWhen Platform Azure ACR Image Sync

TaskSet Sync CodeCollection Images to ACR Registry `${REGISTRY_NAME}`
Sync RunWhen Local Image Updates to ACR Registry `${REGISTRY_NAME}`
SLI Check for CodeCollection Updates against ACR Registry `${REGISTRY_NAME}`
Check for RunWhen Local Image Updates against ACR Registry `${REGISTRY_NAME}`
Count Images Needing Update and Push Metric

Slack Send Message

TaskSet Send Chat Message

SLI Alert Threshold

SLI Check If SLI Within Incident Threshold

Sysdig Monitor Metric

SLI Query Sysdig Metric Data And Pushing Metric

Sysdig Monitor PromQL Metric

SLI Querying PromQL Endpoint And Pushing Metric Data

Terraform Cloud Workspace Lock Check

TaskSet Checking whether the Terraform Cloud Workspace is in a locked state

Test Issues

TaskSet Raise Full Issue

Twitter Query Handle

TaskSet Query Twitter
SLI Query Twitter

Uptime.com Component Health

SLI Check If Vault Endpoint Is Healthy

Web Triage

TaskSet Validate Platform Egress
Perform Inspection On URL