All Tasks
AlertManager Webhook Handler |
||
TaskSet | Run SLX Tasks with matching AlertManager Webhook commonLabels | |
Artifactory OK |
||
SLI | Check If Artifactory Endpoint Is Healthy | |
AWS Account Creation Notification |
||
TaskSet | Get The Recently Created AWS Accounts | |
SLI | Get Count Of AWS Accounts In Organization | |
AWS ACM health |
||
TaskSet | List Unused ACM Certificates in AWS Region `${AWS_REGION}` in AWS Account `${AWS_ACCOUNT_ID}` List Expiring ACM Certificates in AWS Region `${AWS_REGION}` in AWS Account `${AWS_ACCOUNT_ID}` List Expired ACM Certificates in AWS Region `${AWS_REGION}` in AWS Account `${AWS_ACCOUNT_ID}` List Failed Status ACM Certificates in AWS Region `${AWS_REGION}` in AWS Account `${AWS_ACCOUNT_ID}` List Pending Validation ACM Certificates in AWS Region `${AWS_REGION}` in AWS Account `${AWS_ACCOUNT_ID}` |
|
SLI | Check for unused ACM certificates in AWS Region `${AWS_REGION}` in AWS account `${AWS_ACCOUNT_ID}` Check for Expiring ACM certificates in AWS Region `${AWS_REGION}` in AWS account `${AWS_ACCOUNT_ID}` Check for expired ACM certificates in AWS Region `${AWS_REGION}` in AWS account `${AWS_ACCOUNT_ID}` Check for Failed Status ACM Certificates in AWS Region `${AWS_REGION}` in AWS Account `${AWS_ACCOUNT_ID}` Check for Pending Validation ACM Certificates in AWS Region `${AWS_REGION}` in AWS Account `${AWS_ACCOUNT_ID}` Generate Health Score |
|
AWS Billing Period Costs by Tag |
||
SLI | Get All Billing Sliced By Tags | |
AWS CLI Command |
||
TaskSet | ${TASK_TITLE} | |
SLI | ${TASK_TITLE} | |
AWS CLI Command with Issue |
||
TaskSet | ${TASK_TITLE} | |
SLI | ${TASK_TITLE} | |
AWS CloudFormation Event Rate |
||
SLI | Fetch CloudFormation Stack Events | |
AWS CloudFormation Triage |
||
TaskSet | Get All Recent Stack Events | |
AWS CloudWatch Log Query (Pass/Fail) |
||
SLI | Running CloudWatch Log Query And Pushing 1 If No Results Found | |
AWS CloudWatch Log Query (Total Count) |
||
SLI | Running CloudWatch Log Query And Pushing The Count Of Results | |
AWS CloudWatch Logs health |
||
TaskSet | List CloudWatch Log Groups Without Retention Period in AWS Region `${AWS_REGION}` in AWS Account `${AWS_ACCOUNT_ID}` Check CloudTrail Configuration in AWS Region `${AWS_REGION}` in AWS Account `${AWS_ACCOUNT_ID}` Check for CloudTrail integration with CloudWatch Logs in AWS Region `${AWS_REGION}` in AWS Account `${AWS_ACCOUNT_ID}` |
|
SLI | Check CloudWatch Log Groups Without Retention Period in AWS Region `${AWS_REGION}` in AWS account `${AWS_ACCOUNT_ID}` Check if CloudTrail exists and is configured for multi-region in AWS Region `${AWS_REGION}` in AWS Account `${AWS_ACCOUNT_ID}` Check CloudTrail Without CloudWatch Logs in AWS Region `${AWS_REGION}` in AWS Account `${AWS_ACCOUNT_ID}` Generate Health Score |
|
AWS CloudWatch Metric Query Dashboard |
||
TaskSet | Get CloudWatch MetricQuery Insights URL | |
AWS CloudWatch Overutilized EC2 Inspection |
||
TaskSet | Check For Overutilized Ec2 Instances | |
AWS CloudWatch Tag Metric Query |
||
SLI | Run CloudWatch Metric Query Across Set Of IDs And Push Metric | |
AWS Costs by Tag |
||
TaskSet | Get All Billing Sliced By Tags | |
AWS EBS Health |
||
TaskSet | List Unattached EBS Volumes in AWS Region `${AWS_REGION}` in AWS account `${AWS_ACCOUNT_ID}` List Unencrypted EBS Volumes in AWS Region `${AWS_REGION}` in AWS account `${AWS_ACCOUNT_ID}` List Unused EBS Snapshots in AWS Region `${AWS_REGION}` in AWS account `${AWS_ACCOUNT_ID}` |
|
SLI | Check Unattached EBS Volumes in `${AWS_REGION}` Check Unencrypted EBS Volumes in `${AWS_REGION}` Check Unused EBS Snapshots in `${AWS_REGION}` Generate EBS Score |
|
AWS EC2 Health |
||
TaskSet | List stale AWS EC2 instances in AWS Region `${AWS_REGION}` in AWS account `${AWS_ACCOUNT_ID}` List stopped AWS EC2 instances in AWS Region `${AWS_REGION}` in AWS account `${AWS_ACCOUNT_ID}` List invalid AWS Auto Scaling Groups in AWS Region `${AWS_REGION}` in AWS account `${AWS_ACCOUNT_ID}` |
|
SLI | Check for stale AWS EC2 instances in AWS Region `${AWS_REGION}` in AWS account `${AWS_ACCOUNT_ID}` Check for stopped AWS EC2 instances in AWS Region `${AWS_REGION}` in AWS account `${AWS_ACCOUNT_ID}` Check for invalid AWS Auto Scaling Groups in AWS Region `${AWS_REGION}` in AWS account `${AWS_ACCOUNT_ID}` Generate Health Score |
|
AWS EC2 Security Check |
||
TaskSet | Check For Untagged instances Check For Dangling Volumes Check For Open Routes Check For Overused Instances Check For Underused Instances Check For Underused Volumes Check For Overused Volumes |
|
AWS EKS Cluster Health |
||
TaskSet | Check EKS Fargate Cluster Health Status in AWS Region `${AWS_REGION}` Check Amazon EKS Cluster Health Status in AWS Region `${AWS_REGION}` Monitor EKS Cluster Health in AWS Region `${AWS_REGION}` |
|
SLI | Check Amazon EKS Cluster Health Status in AWS Region `${AWS_REGION}` | |
AWS EKS Nodegroup Status Check |
||
TaskSet | Check EKS Nodegroup Status in `${EKS_CLUSTER_NAME}` | |
AWS ElastiCache Health Check |
||
TaskSet | Scan AWS Elasticache Redis Status in AWS Region `${AWS_REGION}` | |
SLI | Scan ElastiCaches in AWS Region `${AWS_REGION}` | |
AWS Lambda Health Check |
||
TaskSet | List Lambda Versions and Runtimes in AWS Region `${AWS_REGION}` Analyze AWS Lambda Invocation Errors in Region `${AWS_REGION}` Monitor AWS Lambda Performance Metrics in AWS Region `${AWS_REGION}` |
|
SLI | Analyze AWS Lambda Invocation Errors in Region `${AWS_REGION}` | |
AWS network health |
||
TaskSet | List Publicly Accessible Security Groups in AWS account `${AWS_ACCOUNT_ID}` List unused Elastic IPs in AWS account `${AWS_ACCOUNT_ID}` List unused ELBs in AWS account `${AWS_ACCOUNT_ID}` List VPCs with Flow Logs Disabled in AWS account `${AWS_ACCOUNT_ID}` |
|
SLI | Check for publicly accessible security groups in AWS account `${AWS_ACCOUNT_ID}` Check for unused Elastic IPs in AWS account `${AWS_ACCOUNT_ID}` Check for unused ELBs in AWS account `${AWS_ACCOUNT_ID}` Check for VPCs with Flow Logs disabled in AWS account `${AWS_ACCOUNT_ID}` Generate Health Score |
|
AWS RDS health |
||
TaskSet | List Unencrypted RDS Instances in AWS Region `${AWS_REGION}` in AWS Account `${AWS_ACCOUNT_ID}` List Publicly Accessible RDS Instances in AWS Region `${AWS_REGION}` in AWS Account `${AWS_ACCOUNT_ID}` List RDS Instances with Backups Disabled in AWS Region `${AWS_REGION}` in AWS Account `${AWS_ACCOUNT_ID}` |
|
SLI | Check for unencrypted RDS instances in AWS Region `${AWS_REGION}` in AWS account `${AWS_ACCOUNT_ID}` Check for publicly accessible RDS instances in AWS Region `${AWS_REGION}` in AWS account `${AWS_ACCOUNT_ID}` Check for disabled backup RDS instances in AWS Region `${AWS_REGION}` in AWS account `${AWS_ACCOUNT_ID}` Generate Health Score |
|
AWS S3 Bucket Info Report |
||
TaskSet | Check AWS S3 Bucket Storage Utilization | |
AWS S3 Health |
||
TaskSet | List S3 Buckets With Public Access in AWS Account `${AWS_ACCOUNT_NAME}` | |
SLI | Count S3 Buckets With Public Access in AWS Account `${AWS_ACCOUNT_NAME}` | |
AWS S3 Stale Check |
||
TaskSet | Create Report For Stale Buckets | |
AWS VM Triage |
||
TaskSet | Get Max VM CPU Utilization In Last 3 Hours Get Lowest VM CPU Credits In Last 3 Hours Get Max VM CPU Credit Usage In Last 3 hours Get Max VM Memory Utilization In Last 3 Hours Get Max VM Volume Usage In Last 3 Hours |
|
aws-cloudwatch-metricquery |
||
SLI | Running CloudWatch Metric Query And Pushing The Result | |
Azure ACR Image Sync |
||
TaskSet | Sync Container Images into Azure Container Registry `${ACR_REGISTRY}` | |
SLI | Count Outdated Images in Azure Container Registry `${ACR_REGISTRY}` | |
Azure AKS Triage |
||
TaskSet | Check for Resource Health Issues Affecting AKS Cluster `${AKS_CLUSTER}` In Resource Group `${AZ_RESOURCE_GROUP}` Check Configuration Health of AKS Cluster `${AKS_CLUSTER}` In Resource Group `${AZ_RESOURCE_GROUP}` Check Network Configuration of AKS Cluster `${AKS_CLUSTER}` In Resource Group `${AZ_RESOURCE_GROUP}` Fetch Activities for AKS Cluster `${AKS_CLUSTER}` In Resource Group `${AZ_RESOURCE_GROUP}` |
|
SLI | Check for Resource Health Issues Affecting AKS Cluster `${AKS_CLUSTER}` In Resource Group `${AZ_RESOURCE_GROUP}` Fetch Activities for AKS Cluster `${AKS_CLUSTER}` In Resource Group `${AZ_RESOURCE_GROUP}` Check Configuration Health of AKS Cluster `${AKS_CLUSTER}` In Resource Group `${AZ_RESOURCE_GROUP}` Generate AKS Cluster Health Score |
|
Azure App Service Operations |
||
TaskSet | Restart App Service `${APP_SERVICE_NAME}` in Resource Group `${AZ_RESOURCE_GROUP}` Swap Deployment Slots for App Service `${APP_SERVICE_NAME}` in Resource Group `${AZ_RESOURCE_GROUP}` Scale Up App Service `${APP_SERVICE_NAME}` in Resource Group `${AZ_RESOURCE_GROUP}` Scale Down App Service `${APP_SERVICE_NAME}` in Resource Group `${AZ_RESOURCE_GROUP}` Scale Out Instances for App Service `${APP_SERVICE_NAME}` in Resource Group `${AZ_RESOURCE_GROUP}` by ${SCALE_OUT_FACTOR}x Scale In Instances for App Service `${APP_SERVICE_NAME}` in Resource Group `${AZ_RESOURCE_GROUP}` to 1/${SCALE_IN_FACTOR} Redeploy App Service `${APP_SERVICE_NAME}` from Latest Source in Resource Group `${AZ_RESOURCE_GROUP}` |
|
Azure App Service Triage |
||
TaskSet | Check for Resource Health Issues Affecting App Service `${APP_SERVICE_NAME}` In Resource Group `${AZ_RESOURCE_GROUP}` Check App Service `${APP_SERVICE_NAME}` Health in Resource Group `${AZ_RESOURCE_GROUP}` Fetch App Service `${APP_SERVICE_NAME}` Utilization Metrics In Resource Group `${AZ_RESOURCE_GROUP}` Get App Service `${APP_SERVICE_NAME}` Logs In Resource Group `${AZ_RESOURCE_GROUP}` Check Configuration Health of App Service `${APP_SERVICE_NAME}` In Resource Group `${AZ_RESOURCE_GROUP}` Check Deployment Health of App Service `${APP_SERVICE_NAME}` In Resource Group `${AZ_RESOURCE_GROUP}` Fetch App Service `${APP_SERVICE_NAME}` Activities In Resource Group `${AZ_RESOURCE_GROUP}` Check Logs for Errors in App Service `${APP_SERVICE_NAME}` In Resource Group `${AZ_RESOURCE_GROUP}` |
|
SLI | Check for Resource Health Issues Affecting App Service `${APP_SERVICE_NAME}` In Resource Group `${AZ_RESOURCE_GROUP}` Check App Service `${APP_SERVICE_NAME}` Health Check Metrics In Resource Group `${AZ_RESOURCE_GROUP}` Check App Service `${APP_SERVICE_NAME}` Configuration Health In Resource Group `${AZ_RESOURCE_GROUP}` Check Deployment Health of App Service `${APP_SERVICE_NAME}` In Resource Group `${AZ_RESOURCE_GROUP}` Fetch App Service `${APP_SERVICE_NAME}` Activities In Resource Group `${AZ_RESOURCE_GROUP}` Generate App Service Health Score for `${APP_SERVICE_NAME}` in resource group `${AZ_RESOURCE_GROUP}` |
|
Azure Application Gateway Health |
||
TaskSet | Check for Resource Health Issues Affecting Application Gateway `${APP_GATEWAY_NAME}` In Resource Group `${AZ_RESOURCE_GROUP}` Check Configuration Health of Application Gateway `${APP_GATEWAY_NAME}` In Resource Group `${AZ_RESOURCE_GROUP}` Check Backend Pool Health for Application Gateway `${APP_GATEWAY_NAME}` In Resource Group `${AZ_RESOURCE_GROUP}` Fetch Log Analytics for Application Gateway `${APP_GATEWAY_NAME}` In Resource Group `${AZ_RESOURCE_GROUP}` Fetch Metrics for Application Gateway `${APP_GATEWAY_NAME}` In Resource Group `${AZ_RESOURCE_GROUP}` Check SSL Certificate Health for Application Gateway `${APP_GATEWAY_NAME}` In Resource Group `${AZ_RESOURCE_GROUP}` Check Logs for Errors with Application Gateway `${APP_GATEWAY_NAME}` In Resource Group `${AZ_RESOURCE_GROUP}` List Related Azure Resources for Application Gateway `${APP_GATEWAY_NAME}` In Resource Group `${AZ_RESOURCE_GROUP}` |
|
SLI | Check for Resource Health Issues Affecting Application Gateway `${APP_GATEWAY_NAME}` In Resource Group `${AZ_RESOURCE_GROUP}` Check Configuration Health of Application Gateway `${APP_GATEWAY_NAME}` In Resource Group `${AZ_RESOURCE_GROUP}` Check Backend Pool Health for Application Gateway `${APP_GATEWAY_NAME}` In Resource Group `${AZ_RESOURCE_GROUP}` Fetch Metrics for Application Gateway `${APP_GATEWAY_NAME}` In Resource Group `${AZ_RESOURCE_GROUP}` Check SSL Certificate Health for Application Gateway `${APP_GATEWAY_NAME}` In Resource Group `${AZ_RESOURCE_GROUP}` Check Logs for Errors with Application Gateway `${APP_GATEWAY_NAME}` In Resource Group `${AZ_RESOURCE_GROUP}` Generate Application Gateway Health Score |
|
Azure CLI Command |
||
TaskSet | ${TASK_TITLE} | |
SLI | ${TASK_TITLE} | |
Azure CLI Command with Issue |
||
TaskSet | ${TASK_TITLE} | |
SLI | ${TASK_TITLE} | |
Azure Internal LoadBalancer Triage |
||
TaskSet | Check Activity Logs for Azure Load Balancer `${AZ_LB_NAME}` | |
Azure VM Scale Set Triage |
||
TaskSet | Check Scale Set `${VMSCALESET}` Key Metrics In Resource Group `${AZ_RESOURCE_GROUP}` Fetch VM Scale Set `${VMSCALESET}` Config In Resource Group `${AZ_RESOURCE_GROUP}` Fetch Activities for VM Scale Set `${VMSCALESET}` In Resource Group `${AZ_RESOURCE_GROUP}` |
|
SLI | Check Scale Set `${VMSCALESET}` Key Metrics In Resource Group `${AZ_RESOURCE_GROUP}` | |
Cert-manager Expirations |
||
SLI | Inspect Certification Expiration Dates | |
Cert-Manager Health Check |
||
SLI | Health Check cert-manager Pods | |
Cortex Metrics Ingester Health |
||
TaskSet | Fetch Ingester Ring Member List and Status | |
SLI | Determine Cortex Ingester Ring Health | |
cURL CLI Command |
||
TaskSet | ${TASK_TITLE} | |
SLI | ${TASK_TITLE} | |
cURL CLI Command with Issue |
||
TaskSet | ${TASK_TITLE} | |
SLI | ${TASK_TITLE} | |
cURL Generic Report |
||
TaskSet | Run Curl Command and Add to Report | |
SLI | Run Curl Command and Push Metric | |
cURL HTTP OK |
||
TaskSet | Check HTTP URL Availability and Timeliness for `${URL}` | |
SLI | Validate HTTP URL Availability and Timeliness for `${URL}` | |
Datadog Metric |
||
SLI | Query Datadog Metrics | |
Datadog System Load |
||
SLI | Check Datadog System Load | |
Discord Send Message |
||
TaskSet | Send Chat Message | |
DNS Latency |
||
SLI | Check DNS latency for Google Resolver | |
ElasticSearch Health |
||
SLI | Check Elasticsearch Cluster Health | |
GCP CLI Command |
||
TaskSet | ${TASK_TITLE} | |
SLI | ${TASK_TITLE} | |
GCP CLI Command with Issue |
||
TaskSet | ${TASK_TITLE} | |
SLI | ${TASK_TITLE} | |
GCP Cloud Function Health |
||
TaskSet | List Unhealthy Cloud Functions in GCP Project `${GCP_PROJECT_ID}` Get Error Logs for Unhealthy Cloud Functions in GCP Project `${GCP_PROJECT_ID}` |
|
SLI | Count unhealthy GCP Cloud Functions in GCP Project `${GCP_PROJECT_ID}` | |
GCP GCloud Generic Report |
||
TaskSet | Run Gcloud CLI Command and Push metric | |
SLI | Run Gcloud CLI Command and Push metric | |
GCP Gcloud Log Inspection |
||
TaskSet | Inspect GCP Logs For Common Errors in GCP Project `${GCP_PROJECT_ID}` | |
GCP Node Preempt List |
||
TaskSet | List all nodes in an active preempt operation for GCP Project `${GCP_PROJECT_ID}` within the last `${AGE}` hours | |
SLI | Count the number of nodes in active preempt operation in project `${GCP_PROJECT_ID}` | |
GCP Operations Suite Log Query |
||
SLI | Running GCE Logging Query And Pushing Result Count Metric | |
GCP Operations Suite Log Query Dashboard URL |
||
TaskSet | Get GCP Log Dashboard URL For Given Log Query | |
GCP Operations Suite Metric Query |
||
SLI | Running GCP OpsSuite Metric Query | |
GCP Operations Suite Prometheus Query |
||
SLI | Run Prometheus Instant Query Against Google Prom API Endpoint | |
GCP Service Status |
||
SLI | Get Number of GCP Incidents Affecting My Workspace | |
GCP Storage Bucket Health |
||
TaskSet | Fetch GCP Bucket Storage Utilization for `${PROJECT_IDS}` Add GCP Bucket Storage Configuration for `${PROJECT_IDS}` to Report Check GCP Bucket Security Configuration for `${PROJECT_IDS}` Fetch GCP Bucket Storage Operations Rate for `${PROJECT_IDS}` |
|
SLI | Fetch GCP Bucket Storage Utilization for `${PROJECT_IDS}` Check GCP Bucket Security Configuration for `${PROJECT_IDS}` Fetch GCP Bucket Storage Operations Rate for `${PROJECT_IDS}` Generate Bucket Score in Project `${PROJECT_IDS}` |
|
GitHub - Create Issue From RunSession |
||
TaskSet | Create GitHub Issue in Repository `${GITHUB_REPOSITORY}` from RunSession | |
GitHub Actions Artifact Analysis |
||
TaskSet | Analyze artifact from GitHub workflow `${WORKFLOW_NAME}` in repository `${GITHUB_REPO}` | |
SLI | Analyze artifact from GitHub Workflow `${WORKFLOW_NAME}` in repository `${GITHUB_REPO}` and push metric | |
GitHub Actions Workflow Timing |
||
SLI | Get Average Run Time For Workflow | |
GitHub API Latency |
||
TaskSet | Check Latency When Creating a New GitHub Issue | |
SLI | Check GitHub Latency With Get Repos | |
GitHub Service Status |
||
SLI | Get Availability of GitHub or Individual GitHub Components | |
GitHub Status Incidents |
||
SLI | Get Number of Incidents Affecting GitHub | |
GitHub Status Maintenance |
||
SLI | Get Scheduled and Active GitHub Maintenance Windows | |
GitLab Availability |
||
TaskSet | Check GitLab Server Status | |
SLI | Check GitLab Server Status | |
GitLab Get Repo Latency |
||
SLI | Check GitLab Latency With Get Repos | |
GKE Kong Ingress Host Triage |
||
TaskSet | Check If Kong Ingress HTTP Error Rate Violates HTTP Error Threshold in GCP Project `${GCP_PROJECT_ID}` Check If Kong Ingress HTTP Request Latency Violates Threshold in GCP Project `${GCP_PROJECT_ID}` Check If Kong Ingress Controller Reports Upstream Errors in GCP Project `${GCP_PROJECT_ID}` |
|
GKE Nginx Ingress Host Triage |
||
TaskSet | Fetch Nginx HTTP Errors From GMP for Ingress `${INGRESS_OBJECT_NAME}` Find Owner and Service Health for Ingress `${INGRESS_OBJECT_NAME}` |
|
Google Chat Send Message |
||
TaskSet | Send Chat Message | |
Grafana Health |
||
SLI | Check Grafana Server Health | |
gRPC cURL Unary |
||
TaskSet | Create a new Jira Issue | |
SLI | Search Jira Issues By Current User | |
gRPC cURL Unary |
||
TaskSet | Run gRPCurl Command and Show Output | |
SLI | Run gRPCurl Command and Push Metric | |
HashiCorp Vault Health |
||
SLI | Check If Vault Endpoint Is Healthy | |
HTTP Latency |
||
SLI | Check HTTP Latency to Well Known URL | |
HTTP OK |
||
SLI | Checking HTTP URL Is Available And Timely | |
K8s Jaeger Query |
||
TaskSet | Query Traces in Jaeger for Unhealthy HTTP Response Codes in Namespace `${NAMESPACE}` | |
K8s OpenTelemetry Collector Health |
||
TaskSet | Query Collector Queued Spans in Namespace `${NAMESPACE}` Check OpenTelemetry Collector Logs For Errors In Namespace `${NAMESPACE}` Query OpenTelemetry Logs For Dropped Spans In Namespace `${NAMESPACE}` |
|
Kong Ingress Health (GCP PromQL) |
||
SLI | Get Access Token Get HTTP Error Rate Get Upstream Health Get Request Latency Rate Generate Kong Ingress Score |
|
Kubeprometheus Operator Troubleshoot |
||
TaskSet | Check Prometheus Service Monitors in namespace `${NAMESPACE}` Check For Successful Rule Setup in Kubernetes Namespace `${NAMESPACE}` Verify Prometheus RBAC Can Access ServiceMonitors in Namespace `${PROM_NAMESPACE}` Inspect Prometheus Operator Logs for Scraping Errors in Namespace `${NAMESPACE}` Check Prometheus API Healthy in Namespace `${PROM_NAMESPACE}` |
|
Kubernetes API Server Health |
||
SLI | Running Kubectl Check Against API Server | |
Kubernetes Application Log Health |
||
TaskSet | Scan ${WORKLOAD_TYPE} `${WORKLOAD_NAME}` Logs for Errors in Namespace `${NAMESPACE}` Scan ${WORKLOAD_TYPE} `${WORKLOAD_NAME}` Logs for Stack Traces in Namespace `${NAMESPACE}` Scan ${WORKLOAD_TYPE} `${WORKLOAD_NAME}` Logs for Connection Failures in Namespace `${NAMESPACE}` Scan ${WORKLOAD_TYPE} `${WORKLOAD_NAME}` Logs for Timeout Errors in Namespace `${NAMESPACE}` Scan ${WORKLOAD_TYPE} `${WORKLOAD_NAME}` Logs for Authentication and Authorization Failures in Namespace `${NAMESPACE}` Scan ${WORKLOAD_TYPE} `${WORKLOAD_NAME}` Logs for Null Pointer and Unhandled Exceptions in Namespace `${NAMESPACE}` Scan ${WORKLOAD_TYPE} `${WORKLOAD_NAME}` for Log Anomalies in Namespace `${NAMESPACE}` Scan ${WORKLOAD_TYPE} `${WORKLOAD_NAME}` Logs for Application Restarts and Failures in Namespace `${NAMESPACE}` Scan ${WORKLOAD_TYPE} `${WORKLOAD_NAME}` Logs for Memory and CPU Resource Warnings in Namespace `${NAMESPACE}` Scan ${WORKLOAD_TYPE} `${WORKLOAD_NAME}` Logs for Service Dependency Failures in Namespace `${NAMESPACE}` |
|
SLI | Scan ${WORKLOAD_TYPE} `${WORKLOAD_NAME}` Logs for Errors in Namespace `${NAMESPACE}` Scan ${WORKLOAD_TYPE} `${WORKLOAD_NAME}` Logs for Stack Traces in Namespace `${NAMESPACE}` Scan ${WORKLOAD_TYPE} `${WORKLOAD_NAME}` Logs for Connection Failures in Namespace `${NAMESPACE}` Scan ${WORKLOAD_TYPE} `${WORKLOAD_NAME}` Logs for Timeout Errors in Namespace `${NAMESPACE}` Scan ${WORKLOAD_TYPE} `${WORKLOAD_NAME}` Logs for Authentication and Authorization Failures in Namespace `${NAMESPACE}` Scan ${WORKLOAD_TYPE} `${WORKLOAD_NAME}` Logs for Null Pointer and Unhandled Exceptions in Namespace `${NAMESPACE}` Scan ${WORKLOAD_TYPE} `${WORKLOAD_NAME}` for Log Anomalies in Namespace `${NAMESPACE}` Scan ${WORKLOAD_TYPE} `${WORKLOAD_NAME}` Logs for Application Restarts and Failures in Namespace `${NAMESPACE}` Scan ${WORKLOAD_TYPE} `${WORKLOAD_NAME}` Logs for Memory and CPU Resource Warnings in Namespace `${NAMESPACE}` Scan ${WORKLOAD_TYPE} `${WORKLOAD_NAME}` Logs for Service Dependency Failures in Namespace `${NAMESPACE}` Generate Application Gateway Health Score |
|
Kubernetes Application Troubleshoot |
||
TaskSet | Get `${CONTAINER_NAME}` Application Logs from Workload `${WORKLOAD_NAME}` in Namespace `${NAMESPACE}` Scan `${CONTAINER_NAME}` Application For Misconfigured Environment Tail `${CONTAINER_NAME}` Application Logs For Stacktraces in Workload `${WORKLOAD_NAME}` |
|
SLI | Measure Application Exceptions in `${NAMESPACE}` | |
Kubernetes ArgoCD Application Health & Troubleshoot |
||
TaskSet | Fetch ArgoCD Application Sync Status & Health for `${APPLICATION}` Fetch ArgoCD Application Last Sync Operation Details for `${APPLICATION}` Fetch Unhealthy ArgoCD Application Resources for `${APPLICATION}` Scan For Errors in Pod Logs Related to ArgoCD Application `${APPLICATION}` Fully Describe ArgoCD Application `${APPLICATION}` |
|
Kubernetes ArgoCD HelmRelease TaskSet |
||
TaskSet | Fetch all available ArgoCD Helm releases in namespace `${NAMESPACE}` Fetch Installed ArgoCD Helm release versions in namespace `${NAMESPACE}` |
|
Kubernetes Artifactory Triage |
||
TaskSet | Check Artifactory Liveness and Readiness Endpoints in `${NAMESPACE}` | |
Kubernetes cert-manager Healthcheck |
||
TaskSet | Get Namespace Certificate Summary for Namespace `${NAMESPACE}` Find Unhealthy Certificates in Namespace `${NAMESPACE}` Find Failed Certificate Requests and Identify Issues for Namespace `${NAMESPACE}` |
|
SLI | Count Unready and Expired Certificates in Namespace `${NAMESPACE}` | |
Kubernetes CLI Command |
||
TaskSet | ${TASK_TITLE} | |
SLI | ${TASK_TITLE} | |
Kubernetes CLI Command |
||
TaskSet | ${TASK_TITLE} | |
SLI | ${TASK_TITLE} | |
Kubernetes Cluster Node Health |
||
TaskSet | Check for Node Restarts in Cluster `${CONTEXT}` within Interval `${INTERVAL}` | |
SLI | Check for Node Restarts in Cluster `${CONTEXT}` Generate Namespace Score in Kubernetes Cluster `${CONTEXT}` |
|
Kubernetes Cluster Resource Health |
||
TaskSet | Identify High Utilization Nodes for Cluster `${CONTEXT}` Identify Pods Causing High Node Utilization in Cluster `${CONTEXT}` |
|
SLI | Identify High Utilization Nodes for Cluster `${CONTEXT}` | |
Kubernetes Daemonset Health Check |
||
SLI | Health Check Daemonset | |
Kubernetes Daemonset Triage |
||
TaskSet | Get DaemonSet Logs for `${DAEMONSET_NAME}` and Add to Report Get Related Daemonset `${DAEMONSET_NAME}` Events in Namespace `${NAMESPACE}` Check Daemonset `${DAEMONSET_NAME}` Replicas |
|
Kubernetes Decommission Workload |
||
TaskSet | Generate Decommission Commands | |
Kubernetes Deployment Operations |
||
TaskSet | Restart Deployment `${DEPLOYMENT_NAME}` in Namespace `${NAMESPACE}` Force Delete Pods in Deployment `${DEPLOYMENT_NAME}` in Namespace `${NAMESPACE}` Rollback Deployment `${DEPLOYMENT_NAME}` in Namespace `${NAMESPACE}` to Previous Version Scale Down Deployment `${DEPLOYMENT_NAME}` in Namespace `${NAMESPACE}` Scale Up Deployment `${DEPLOYMENT_NAME}` in Namespace `${NAMESPACE}` by ${SCALE_UP_FACTOR}x Clean Up Stale ReplicaSets for Deployment `${DEPLOYMENT_NAME}` in Namespace `${NAMESPACE}` Scale Down Stale ReplicaSets for Deployment `${DEPLOYMENT_NAME}` in Namespace `${NAMESPACE}` |
|
Kubernetes Deployment Triage |
||
TaskSet | Check Deployment Log For Issues with `${DEPLOYMENT_NAME}` Fetch Deployments Logs for `${DEPLOYMENT_NAME}` in Namespace `${NAMESPACE}` and Add to Report Check Liveness Probe Configuration for Deployment `${DEPLOYMENT_NAME}` Check Readiness Probe Configuration for Deployment `${DEPLOYMENT_NAME}` in Namespace `${NAMESPACE}` Inspect Container Restarts for Deployment `${DEPLOYMENT_NAME}` in Namespace `${NAMESPACE}` Inspect Deployment Warning Events for `${DEPLOYMENT_NAME}` in Namespace `${NAMESPACE}` Fetch Deployment Workload Details For `${DEPLOYMENT_NAME}` in Namespace `${NAMESPACE}` Inspect Deployment Replicas for `${DEPLOYMENT_NAME}` in namespace `${NAMESPACE}` Check Deployment Event Anomalies for `${DEPLOYMENT_NAME}` in Namespace `${NAMESPACE}` Check ReplicaSet Health for Deployment `${DEPLOYMENT_NAME}` in Namespace `${NAMESPACE}` |
|
Kubernetes Event Query |
||
SLI | Get Number Of Matching Events | |
Kubernetes Flux Chaos Testing |
||
TaskSet | Suspend the Flux Resource Reconciliation for `${FLUX_RESOURCE_NAME}` in namespace `${FLUX_RESOURCE_NAMESPACE}` Select Random FluxCD Workload for Chaos Target in Namespace `${FLUX_RESOURCE_NAMESPACE}` Execute Chaos Command on `${TARGET_RESOURCE}` in Namespace `${TARGET_NAMESPACE}` Execute Additional Chaos Command on ${FLUX_RESOURCE_TYPE} '${FLUX_RESOURCE_NAME}' in namespace '${FLUX_RESOURCE_NAMESPACE}' Resume Flux Resource Reconciliation in `${TARGET_NAMESPACE}` |
|
Kubernetes FluxCD HelmRelease TaskSet |
||
TaskSet | List all available FluxCD Helmreleases in Namespace `${NAMESPACE}` Fetch Installed FluxCD Helmrelease Versions in Namespace `${NAMESPACE}` Fetch Mismatched FluxCD HelmRelease Version in Namespace `${NAMESPACE}` Fetch FluxCD HelmRelease Error Messages in Namespace `${NAMESPACE}` Check for Available Helm Chart Updates in Namespace `${NAMESPACE}` |
|
Kubernetes FluxCD Kustomization TaskSet |
||
TaskSet | List all available FluxCD Kustomization objects in Namespace `${NAMESPACE}` List Unready FluxCD Kustomizations in Namespace `${NAMESPACE}` |
|
Kubernetes Fluxcd Reconciliation Report |
||
TaskSet | Check FluxCD Reconciliation Health in Kubernetes Namespace `${FLUX_NAMESPACE}` | |
SLI | Health Check Flux Reconciliation | |
Kubernetes GitOps GitHub Remediation |
||
TaskSet | Remediate Readiness and Liveness Probe GitOps Manifests in Namespace `${NAMESPACE}` Increase ResourceQuota Limit for Namespace `${NAMESPACE}` in GitHub GitOps Repository Adjust Pod Resources to Match VPA Recommendation in `${NAMESPACE}` Expand Persistent Volume Claims in Namespace `${NAMESPACE}` |
|
Kubernetes Grafana Loki Health Check |
||
TaskSet | Check Loki Ring API for Unhealthy Shards in Kubernetes Cluster `${NAMESPACE}` Check Loki API Ready in Kubernetes Cluster `${NAMESPACE}` |
|
Kubernetes Image Check |
||
TaskSet | Check Image Rollover Times for Namespace `${NAMESPACE}` List Images and Tags for Every Container in Running Pods for Namespace `${NAMESPACE}` List Images and Tags for Every Container in Failed Pods for Namespace `${NAMESPACE}` List ImagePullBackOff Events and Test Path and Tags for Namespace `${NAMESPACE}` |
|
Kubernetes Ingress GCE & GCP HTTP Load Balancer Healthcheck |
||
TaskSet | Search For GCE Ingress Warnings in GKE Context `${CONTEXT}` Identify Unhealthy GCE HTTP Ingress Backends in GKE Namespace `${NAMESPACE}` Validate GCP HTTP Load Balancer Configurations in GCP Project `${GCP_PROJECT_ID}` Fetch Network Error Logs from GCP Operations Manager for Ingress Backends in GCP Project `${GCP_PROJECT_ID}` Review GCP Operations Logging Dashboard in GCP project `${GCP_PROJECT_ID}` |
|
Kubernetes Ingress Healthcheck |
||
TaskSet | Fetch Ingress Object Health in Namespace `${NAMESPACE}` Check for Ingress and Service Conflicts in Namespace `${NAMESPACE}` |
|
Kubernetes Jenkins Healthcheck |
||
TaskSet | Query The Jenkins Kubernetes Workload HTTP Endpoint in Kubernetes StatefulSet `${STATEFULSET_NAME}` Query For Stuck Jenkins Jobs in Kubernetes Statefulset Workload `${STATEFULSET_NAME}` |
|
Kubernetes Labeled Pod Count |
||
SLI | Measure Number of Running Pods with Label in `${NAMESPACE}` | |
Kubernetes Namespace Chaos Engineering |
||
TaskSet | Kill Random Pods In Namespace `${NAMESPACE}` OOMKill Pods In Namespace `${NAMESPACE}` Mangle Service Selector In Namespace `${NAMESPACE}` Mangle Service Port In Namespace `${NAMESPACE}` Fill Random Pod Tmp Directory In Namespace `${NAMESPACE}` |
|
Kubernetes Namespace Inspection |
||
TaskSet | Inspect Warning Events in Namespace `${NAMESPACE}` Inspect Container Restarts In Namespace `${NAMESPACE}` Inspect Pending Pods In Namespace `${NAMESPACE}` Inspect Failed Pods In Namespace `${NAMESPACE}` Inspect Workload Status Conditions In Namespace `${NAMESPACE}` Get Listing Of Resources In Namespace `${NAMESPACE}` Check Event Anomalies in Namespace `${NAMESPACE}` Check Missing or Risky PodDisruptionBudget Policies in Namespace `${NAMESPACE}` Check Resource Quota Utilization in Namespace `${NAMESPACE}` |
|
SLI | Get Error Event Count within ${EVENT_AGE} and calculate Score Get Container Restarts and Score in Namespace `${NAMESPACE}` Get NotReady Pods in `${NAMESPACE}` Generate Namespace Score in `${NAMESPACE}` |
|
Kubernetes Namespace Troubleshoot |
||
TaskSet | Trace Namespace Errors Fetch Unready Pods Triage Namespace Object Condition Check Namespace Get All |
|
SLI | Get Event Count and Score Get Container Restarts and Score Get NotReady Pods Generate Namespace Score |
|
Kubernetes Patroni Health Check |
||
SLI | Determine Patroni Health | |
Kubernetes Patroni Lag Health |
||
TaskSet | Determine Patroni Health | |
SLI | Measure Patroni Member Lag | |
Kubernetes Persistent Volume Healthcheck |
||
TaskSet | Fetch Events for Unhealthy Kubernetes PersistentVolumeClaims in Namespace `${NAMESPACE}` List PersistentVolumeClaims in Terminating State in Namespace `${NAMESPACE}` List PersistentVolumes in Terminating State in Namespace `${NAMESPACE}` List Pods with Attached Volumes and Related PersistentVolume Details in Namespace `${NAMESPACE}` Fetch the Storage Utilization for PVC Mounts in Namespace `${NAMESPACE}` Check for RWO Persistent Volume Node Attachment Issues in Namespace `${NAMESPACE}` |
|
SLI | Fetch the Storage Utilization for PVC Mounts in Namespace `${NAMESPACE}` Generate Namespace Score for Namespace `${NAMESPACE}` |
|
Kubernetes Pod Resources Health |
||
TaskSet | Show Pods Without Resource Limit or Resource Requests Set in Namespace `${NAMESPACE}` Check Pod Resource Utilization with Top in Namespace `${NAMESPACE}` Identify VPA Pod Resource Recommendations in Namespace `${NAMESPACE}` Identify Overutilized Pods in Namespace `${NAMESPACE}` |
|
Kubernetes Postgres Healthcheck |
||
TaskSet | List Resources Related to Postgres Cluster `${OBJECT_NAME}` in Namespace `${NAMESPACE}` Get Postgres Pod Logs & Events for Cluster `${OBJECT_NAME}` in Namespace `${NAMESPACE}` Get Postgres Pod Resource Utilization for Cluster `${OBJECT_NAME}` in Namespace `${NAMESPACE}` Get Running Postgres Configuration for Cluster `${OBJECT_NAME}` in Namespace `${NAMESPACE}` Get Patroni Output and Add to Report for Cluster `${OBJECT_NAME}` in Namespace `${NAMESPACE}` Fetch Patroni Database Lag for Cluster `${OBJECT_NAME}` in Namespace `${NAMESPACE}` Check Database Backup Status for Cluster `${OBJECT_NAME}` in Namespace `${NAMESPACE}` Run DB Queries for Cluster `${OBJECT_NAME}` in Namespace `${NAMESPACE}` |
|
SLI | Check Patroni Database Lag in Namespace `${NAMESPACE}` on Host `${HOSTNAME}` using `patronictl` Check Database Backup Status for Cluster `${OBJECT_NAME}` in Namespace `${NAMESPACE}` Generate Namespace Score for Namespace `${NAMESPACE}` |
|
Kubernetes PostgreSQL Query |
||
TaskSet | Run Postgres Query And Results to Report | |
SLI | Run Postgres Query And Return Result As Metric | |
Kubernetes PostgreSQL Triage |
||
TaskSet | Get Standard Resources Describe Custom Resources Get Pod Logs & Events Get Pod Resource Utilization Get Running Configuration Get Patroni Output Run DB Queries |
|
Kubernetes Redis Healthcheck |
||
TaskSet | Ping `${DEPLOYMENT_NAME}` Redis Workload Verify `${DEPLOYMENT_NAME}` Redis Read Write Operation in Kubernetes |
|
Kubernetes Restart Resource |
||
TaskSet | Get Current Resource State with Labels `${LABELS}` Get Resource Logs with Labels `${LABELS}` Restart Resource with Labels `${LABELS}` in `${CONTEXT}` |
|
Kubernetes Run Shell Command |
||
TaskSet | Running Kubectl And Adding Stdout To Report | |
Kubernetes Service Account Check |
||
TaskSet | Test Service Account Access to Kubernetes API Server in Namespace `${NAMESPACE}` | |
Kubernetes StatefulSet Triage |
||
TaskSet | Check Readiness Probe Configuration for StatefulSet `${STATEFULSET_NAME}` Check Liveness Probe Configuration for StatefulSet `${STATEFULSET_NAME}` Troubleshoot StatefulSet Warning Events for `${STATEFULSET_NAME}` Check StatefulSet Event Anomalies for `${STATEFULSET_NAME}` in Namespace `${NAMESPACE}` Fetch StatefulSet Logs for `${STATEFULSET_NAME}` in Namespace `${NAMESPACE}` and Add to Report Get Related StatefulSet `${STATEFULSET_NAME}` Events Fetch Manifest Details for StatefulSet `${STATEFULSET_NAME}` in Namespace `${NAMESPACE}` List Unhealthy Replica Counts for StatefulSets in Namespace `${NAMESPACE}` |
|
Kubernetes Synthetic PVC Test |
||
SLI | Run Canary Job | |
Kubernetes Tail Application Logs |
||
TaskSet | Get `${CONTAINER_NAME}` Application Logs in Namespace `${NAMESPACE}` Tail `${CONTAINER_NAME}` Application Logs For Stacktraces |
|
SLI | Tail `${CONTAINER_NAME}` Application Logs For Stacktraces | |
Kubernetes Top |
||
SLI | Running Kubectl Top And Extracting Metric Data | |
Kubernetes Triage Deployment Replicas |
||
TaskSet | Fetch Logs Get Related Events Check Deployment Replicas |
|
Kubernetes Triage Patroni |
||
TaskSet | Get Patroni Status Get Pods Status Fetch Logs |
|
Kubernetes Triage StatefulSet |
||
TaskSet | Check StatefulSets Replicas Ready Get Events For The StatefulSet Get StatefulSet Logs Get StatefulSet Manifests Dump |
|
Kubernetes Troubleshoot Deployment |
||
TaskSet | Troubleshoot Resourcing Troubleshoot Events Troubleshoot PVC Troubleshoot Pods |
|
Kubernetes Vault Triage |
||
TaskSet | Fetch Vault CSI Driver Logs in Namespace `${NAMESPACE}` Get Vault CSI Driver Warning Events in `${NAMESPACE}` Check Vault CSI Driver Replicas Fetch Vault Pod Workload Logs in Namespace `${NAMESPACE}` with Labels `${LABELS}` Get Related Vault Events in Namespace `${NAMESPACE}` Fetch Vault StatefulSet Manifest Details in `${NAMESPACE}` Fetch Vault DaemonSet Manifest Details in Kubernetes Cluster `${NAMESPACE}` Verify Vault Availability in Namespace `${NAMESPACE}` and Context `${CONTEXT}` Check Vault StatefulSet Replicas in `${NAMESPACE}` |
|
Kubernetes Workload Chaos Engineering |
||
TaskSet | Test `${WORKLOAD_NAME}` High Availability in Namespace `${NAMESPACE}` OOMKill `${WORKLOAD_NAME}` Pod Mangle Service Selector For `${WORKLOAD_NAME}` in `${NAMESPACE}` Mangle Service Port For `${WORKLOAD_NAME}` in `${NAMESPACE}` Fill Tmp Directory Of Pod From `${WORKLOAD_NAME}` |
|
Kubernetes Workload Metric |
||
SLI | Running Kubectl get and push the metric | |
Microsoft Teams Send Message |
||
TaskSet | Send a Message to an MS Teams Channel | |
MongoDB Health (GCP PromQL) |
||
SLI | Get Access Token Get Instance Status Get Connection Utilization Rate Get MongoDB Member State Health Get MongoDB Replication Lag Get MongoDB Queue Size Get Assertion Rate Generate MongoDB Score |
|
OpsGenie Create Alert |
||
TaskSet | Get Opsgenie System Info Create An Alert |
|
PagerDuty Webhook Handler |
||
TaskSet | Run SLX Tasks with matching PagerDuty Webhook Service ID | |
Ping Host Availability |
||
SLI | Ping host and collect packet loss percentage | |
Pingdom Health |
||
SLI | Check Pingdom Health | |
Prometheus Query (Instant) Metric |
||
SLI | Querying Prometheus Instance And Pushing Aggregated Data | |
Prometheus Query (Range) Metric |
||
SLI | Querying Prometheus Instance And Pushing Aggregated Data | |
rds-mysql-conn-count |
||
TaskSet | Run Bash File | |
SLI | Querying Prometheus Instance And Pushing Aggregated Data | |
REST Metric |
||
SLI | Request Data From Rest Endpoint | |
REST Metric (Basic Auth) |
||
SLI | Request Data From Rest Endpoint | |
REST Metric (Explicit OAuth2 with BasicAuth) |
||
SLI | Request Data From Rest Endpoint | |
REST Metric (Explicit OAuth2 with Bearer Token) |
||
SLI | Request Data From Rest Endpoint | |
RocketChat Send Message |
||
TaskSet | Send Chat Message | |
RunWhen Local Helm Update (ACR) |
||
TaskSet | Apply Available RunWhen Helm Images in ACR Registry `${REGISTRY_NAME}` | |
SLI | Check for Available RunWhen Helm Images in ACR Registry `${REGISTRY_NAME}` | |
RunWhen Platform Azure ACR Image Sync |
||
TaskSet | Sync CodeCollection Images to ACR Registry `${REGISTRY_NAME}` Sync RunWhen Local Image Updates to ACR Registry `${REGISTRY_NAME}` |
|
SLI | Check for CodeCollection Updates against ACR Registry `${REGISTRY_NAME}` Check for RunWhen Local Image Updates against ACR Registry `${REGISTRY_NAME}` Count Images Needing Update and Push Metric |
|
Slack - Send Issue Summary From RunSession |
||
TaskSet | Send Slack Notification to Channel `${SLACK_CHANNEL}` from RunSession | |
Slack Send Message |
||
TaskSet | Send Chat Message | |
SLI Alert Threshold |
||
SLI | Check If SLI Within Incident Threshold | |
Sysdig Monitor Metric |
||
SLI | Query Sysdig Metric Data And Push Metric | |
Sysdig Monitor PromQL Metric |
||
SLI | Querying PromQL Endpoint And Pushing Metric Data | |
Terraform Cloud Workspace Lock Check |
||
TaskSet | Checking whether the Terraform Cloud Workspace '${TERRAFORM_WORKSPACE_NAME}' is in a locked state | |
Twitter Query Handle |
||
TaskSet | Query Twitter | |
SLI | Query Twitter | |
Uptime.com Component Health |
||
SLI | Check If Vault Endpoint Is Healthy | |
Web Triage |
||
TaskSet | Validate Platform Egress Perform Inspection On URL |