All Tasks
AlertManager Webhook Handler |
||
TaskSet | Run SLX Tasks with matching AlertManager Webhook commonLabels | |
Artifactory OK |
||
SLI | Check If Artifactory Endpoint Is Healthy | |
AWS Account Creation Notification |
||
TaskSet | Get The Recently Created AWS Accounts | |
SLI | Get Count Of AWS Accounts In Organization | |
AWS Billing Period Costs by Tag |
||
SLI | Get All Billing Sliced By Tags | |
AWS CLI Command |
||
TaskSet | ${TASK_TITLE} | |
SLI | ${TASK_TITLE} | |
AWS CLI Command with Issue |
||
TaskSet | ${TASK_TITLE} | |
SLI | ${TASK_TITLE} | |
AWS CloudFormation Event Rate |
||
SLI | Fetch CloudFormation Stack Events | |
AWS CloudFormation Triage |
||
TaskSet | Get All Recent Stack Events | |
AWS CloudWatch Log Query (Pass/Fail) |
||
SLI | Running CloudWatch Log Query And Pushing 1 If No Results Found | |
AWS CloudWatch Log Query (Total Count) |
||
SLI | Running CloudWatch Log Query And Pushing The Count Of Results | |
AWS CloudWatch Logs health |
||
TaskSet | List CloudWatch Log Groups Without Retention Period in AWS Region `${AWS_REGION}` in AWS Account `${AWS_ACCOUNT_ID}` Check CloudTrail Configuration in AWS Region `${AWS_REGION}` in AWS Account `${AWS_ACCOUNT_ID}` Check for CloudTrail integration with CloudWatch Logs in AWS Region `${AWS_REGION}` in AWS Account `${AWS_ACCOUNT_ID}` |
|
SLI | Check CloudWatch Log Groups Without Retention Period in AWS Region `${AWS_REGION}` in AWS account `${AWS_ACCOUNT_ID}` Check if CloudTrail exists and is configured for multi-region in AWS Region `${AWS_REGION}` in AWS Account `${AWS_ACCOUNT_ID}` Check CloudTrail Without CloudWatch Logs in AWS Region `${AWS_REGION}` in AWS Account `${AWS_ACCOUNT_ID}` Generate Health Score |
|
AWS CloudWatch Metric Query Dashboard |
||
TaskSet | Get CloudWatch MetricQuery Insights URL | |
AWS CloudWatch Overutilized EC2 Inspection |
||
TaskSet | Check For Overutilized EC2 Instances | |
AWS CloudWatch Tag Metric Query |
||
SLI | Run CloudWatch Metric Query Across Set Of IDs And Push Metric | |
AWS Costs by Tag |
||
TaskSet | Get All Billing Sliced By Tags | |
AWS EBS Health |
||
TaskSet | List Unattached EBS Volumes in AWS Region `${AWS_REGION}` in AWS account `${AWS_ACCOUNT_ID}` List Unencrypted EBS Volumes in AWS Region `${AWS_REGION}` in AWS account `${AWS_ACCOUNT_ID}` List Unused EBS Snapshots in AWS Region `${AWS_REGION}` in AWS account `${AWS_ACCOUNT_ID}` |
|
SLI | Check Unattached EBS Volumes in `${AWS_REGION}` Check Unencrypted EBS Volumes in `${AWS_REGION}` Check Unused EBS Snapshots in `${AWS_REGION}` Generate EBS Score |
|
AWS EC2 Health |
||
TaskSet | List stale AWS EC2 instances in AWS Region `${AWS_REGION}` in AWS account `${AWS_ACCOUNT_ID}` List stopped AWS EC2 instances in AWS Region `${AWS_REGION}` in AWS account `${AWS_ACCOUNT_ID}` List invalid AWS Auto Scaling Groups in AWS Region `${AWS_REGION}` in AWS account `${AWS_ACCOUNT_ID}` |
|
SLI | Check for stale AWS EC2 instances in AWS Region `${AWS_REGION}` in AWS account `${AWS_ACCOUNT_ID}` Check for stopped AWS EC2 instances in AWS Region `${AWS_REGION}` in AWS account `${AWS_ACCOUNT_ID}` Check for invalid AWS Auto Scaling Groups in AWS Region `${AWS_REGION}` in AWS account `${AWS_ACCOUNT_ID}` Generate Health Score |
|
AWS EC2 Security Check |
||
TaskSet | Check For Untagged instances Check For Dangling Volumes Check For Open Routes Check For Overused Instances Check For Underused Instances Check For Underused Volumes Check For Overused Volumes |
|
AWS EKS Cluster Health |
||
TaskSet | Check EKS Fargate Cluster Health Status Check EKS Cluster Health Status List EKS Cluster Metrics |
|
SLI | Check EKS Cluster Health Status | |
AWS EKS Nodegroup Status Check |
||
TaskSet | Check EKS Nodegroup Status | |
AWS ElastiCache Health Check |
||
TaskSet | Scan AWS Elasticache Redis Status | |
SLI | Scan ElastiCaches | |
AWS Lambda Health Check |
||
TaskSet | List Lambda Versions and Runtimes Analyze AWS Lambda Invocation Errors Monitor AWS Lambda Performance Metrics |
|
SLI | Analyze AWS Lambda Invocation Errors | |
AWS network health |
||
TaskSet | List Publicly Accessible Security Groups in AWS account `${AWS_ACCOUNT_ID}` List unused Elastic IPs in AWS account `${AWS_ACCOUNT_ID}` List unused ELBs in AWS account `${AWS_ACCOUNT_ID}` List VPCs with Flow Logs Disabled in AWS account `${AWS_ACCOUNT_ID}` |
|
SLI | Check for publicly accessible security groups in AWS account `${AWS_ACCOUNT_ID}` Check for unused Elastic IPs in AWS account `${AWS_ACCOUNT_ID}` Check for unused ELBs in AWS account `${AWS_ACCOUNT_ID}` Check for VPCs with Flow Logs disabled in AWS account `${AWS_ACCOUNT_ID}` Generate Health Score |
|
AWS RDS health |
||
TaskSet | List Unencrypted RDS Instances in AWS Region `${AWS_REGION}` in AWS Account `${AWS_ACCOUNT_ID}` List Publicly Accessible RDS Instances in AWS Region `${AWS_REGION}` in AWS Account `${AWS_ACCOUNT_ID}` List RDS Instances with Backups Disabled in AWS Region `${AWS_REGION}` in AWS Account `${AWS_ACCOUNT_ID}` |
|
SLI | Check for unencrypted RDS instances in AWS Region `${AWS_REGION}` in AWS account `${AWS_ACCOUNT_ID}` Check for publicly accessible RDS instances in AWS Region `${AWS_REGION}` in AWS account `${AWS_ACCOUNT_ID}` Check for disabled backup RDS instances in AWS Region `${AWS_REGION}` in AWS account `${AWS_ACCOUNT_ID}` Generate Health Score |
|
AWS S3 Bucket Info Report |
||
TaskSet | Check AWS S3 Bucket Storage Utilization | |
AWS S3 Health |
||
TaskSet | List S3 Buckets With Public Access in AWS Account `${AWS_ACCOUNT_NAME}` | |
SLI | Count S3 Buckets With Public Access in AWS Account `${AWS_ACCOUNT_NAME}` | |
AWS S3 Stale Check |
||
TaskSet | Create Report For Stale Buckets | |
AWS VM Triage |
||
TaskSet | Get Max VM CPU Utilization In Last 3 Hours Get Lowest VM CPU Credits In Last 3 Hours Get Max VM CPU Credit Usage In Last 3 hours Get Max VM Memory Utilization In Last 3 Hours Get Max VM Volume Usage In Last 3 Hours |
|
aws-cloudwatch-metricquery |
||
SLI | Running CloudWatch Metric Query And Pushing The Result | |
Azure ACR Image Sync |
||
TaskSet | Sync Container Images into Azure Container Registry `${ACR_REGISTRY}` | |
SLI | Count Outdated Images in Azure Container Registry `${ACR_REGISTRY}` | |
Azure AKS Triage |
||
TaskSet | Check for Resource Health Issues Affecting AKS Cluster `${AKS_CLUSTER}` In Resource Group `${AZ_RESOURCE_GROUP}` Check Configuration Health of AKS Cluster `${AKS_CLUSTER}` In Resource Group `${AZ_RESOURCE_GROUP}` Check Network Configuration of AKS Cluster `${AKS_CLUSTER}` In Resource Group `${AZ_RESOURCE_GROUP}` Fetch Activities for AKS Cluster `${AKS_CLUSTER}` In Resource Group `${AZ_RESOURCE_GROUP}` |
|
SLI | Check for Resource Health Issues Affecting AKS Cluster `${AKS_CLUSTER}` In Resource Group `${AZ_RESOURCE_GROUP}` Fetch Activities for AKS Cluster `${AKS_CLUSTER}` In Resource Group `${AZ_RESOURCE_GROUP}` Check Configuration Health of AKS Cluster `${AKS_CLUSTER}` In Resource Group `${AZ_RESOURCE_GROUP}` Generate AKS Cluster Health Score |
|
Azure App Service Triage |
||
TaskSet | Check for Resource Health Issues Affecting App Service `${APP_SERVICE_NAME}` In Resource Group `${AZ_RESOURCE_GROUP}` Check App Service `${APP_SERVICE_NAME}` Health Check Metrics In Resource Group `${AZ_RESOURCE_GROUP}` Fetch App Service `${APP_SERVICE_NAME}` Utilization Metrics In Resource Group `${AZ_RESOURCE_GROUP}` Get App Service `${APP_SERVICE_NAME}` Logs In Resource Group `${AZ_RESOURCE_GROUP}` Check Configuration Health of App Service `${APP_SERVICE_NAME}` In Resource Group `${AZ_RESOURCE_GROUP}` Check Deployment Health of App Service `${APP_SERVICE_NAME}` In Resource Group `${AZ_RESOURCE_GROUP}` Fetch App Service `${APP_SERVICE_NAME}` Activities In Resource Group `${AZ_RESOURCE_GROUP}` Check Logs for Errors in App Service `${APP_SERVICE_NAME}` In Resource Group `${AZ_RESOURCE_GROUP}` |
|
SLI | Check for Resource Health Issues Affecting App Service `${APP_SERVICE_NAME}` In Resource Group `${AZ_RESOURCE_GROUP}` Check App Service `${APP_SERVICE_NAME}` Health Check Metrics In Resource Group `${AZ_RESOURCE_GROUP}` Check App Service `${APP_SERVICE_NAME}` Configuration Health In Resource Group `${AZ_RESOURCE_GROUP}` Check Deployment Health of App Service `${APP_SERVICE_NAME}` In Resource Group `${AZ_RESOURCE_GROUP}` Fetch App Service `${APP_SERVICE_NAME}` Activities In Resource Group `${AZ_RESOURCE_GROUP}` Generate App Service Health Score |
|
Azure Application Gateway Health |
||
TaskSet | Check for Resource Health Issues Affecting Application Gateway `${APP_GATEWAY_NAME}` In Resource Group `${AZ_RESOURCE_GROUP}` Check Configuration Health of Application Gateway `${APP_GATEWAY_NAME}` In Resource Group `${AZ_RESOURCE_GROUP}` Check Backend Pool Health for Application Gateway `${APP_GATEWAY_NAME}` In Resource Group `${AZ_RESOURCE_GROUP}` |
|
SLI | Check for Resource Health Issues Affecting Application Gateway `${APP_GATEWAY_NAME}` In Resource Group `${AZ_RESOURCE_GROUP}` Check Configuration Health of Application Gateway `${APP_GATEWAY_NAME}` In Resource Group `${AZ_RESOURCE_GROUP}` Check Backend Pool Health for Application Gateway `${APP_GATEWAY_NAME}` In Resource Group `${AZ_RESOURCE_GROUP}` Generate Application Gateway Health Score |
|
Azure CLI Command |
||
TaskSet | ${TASK_TITLE} | |
SLI | ${TASK_TITLE} | |
Azure CLI Command with Issue |
||
TaskSet | ${TASK_TITLE} | |
SLI | ${TASK_TITLE} | |
Azure Internal LoadBalancer Triage |
||
TaskSet | Check Activity Logs for Azure Load Balancer `${AZ_LB_NAME}` | |
Azure VM Scale Set Triage |
||
TaskSet | Check Scale Set `${VMSCALESET}` Key Metrics In Resource Group `${AZ_RESOURCE_GROUP}` Fetch VM Scale Set `${VMSCALESET}` Config In Resource Group `${AZ_RESOURCE_GROUP}` Fetch Activities for VM Scale Set `${VMSCALESET}` In Resource Group `${AZ_RESOURCE_GROUP}` |
|
SLI | Check Scale Set `${VMSCALESET}` Key Metrics In Resource Group `${AZ_RESOURCE_GROUP}` | |
Cert-manager Expirations |
||
SLI | Inspect Certificate Expiration Dates | |
Cert-Manager Health Check |
||
SLI | Health Check cert-manager Pods | |
cli-test |
||
TaskSet | Run CLI and Parse Output For Issues Exec Test Local Process Test |
|
cmd-test |
||
TaskSet | Run CLI Command Run Bash File Log Suggestion |
|
Cortex Metrics Ingester Health |
||
TaskSet | Fetch Ingester Ring Member List and Status | |
SLI | Determine Cortex Ingester Ring Health | |
cURL CLI Command |
||
TaskSet | ${TASK_TITLE} | |
SLI | ${TASK_TITLE} | |
cURL CLI Command with Issue |
||
TaskSet | ${TASK_TITLE} | |
SLI | ${TASK_TITLE} | |
cURL Generic Report |
||
TaskSet | Run Curl Command and Add to Report | |
SLI | Run Curl Command and Push Metric | |
cURL HTTP OK |
||
TaskSet | Checking HTTP URL Is Available And Timely | |
SLI | Checking HTTP URL Is Available And Timely | |
Datadog Metric |
||
SLI | Query Datadog Metrics | |
Datadog System Load |
||
SLI | Check Datadog System Load | |
Discord Send Message |
||
TaskSet | Send Chat Message | |
DNS Latency |
||
SLI | Check DNS latency for Google Resolver | |
ElasticSearch Health |
||
SLI | Check Elasticsearch Cluster Health | |
GCP CLI Command |
||
TaskSet | ${TASK_TITLE} | |
SLI | ${TASK_TITLE} | |
GCP CLI Command with Issue |
||
TaskSet | ${TASK_TITLE} | |
SLI | ${TASK_TITLE} | |
GCP Cloud Function Health |
||
TaskSet | List Unhealthy Cloud Functions in GCP Project `${GCP_PROJECT_ID}` Get Error Logs for Unhealthy Cloud Functions in GCP Project `${GCP_PROJECT_ID}` |
|
SLI | Count unhealthy GCP Cloud Functions in GCP Project `${GCP_PROJECT_ID}` | |
GCP GCloud Generic Report |
||
TaskSet | Run Gcloud CLI Command and Push metric | |
SLI | Run Gcloud CLI Command and Push metric | |
GCP Gcloud Log Inspection |
||
TaskSet | Inspect GCP Logs For Common Errors | |
GCP Node Preempt List |
||
TaskSet | List all nodes in an active preempt operation for GCP Project `${GCP_PROJECT_ID}` | |
SLI | Count the number of nodes in active preempt operation | |
GCP Operations Suite Log Query |
||
SLI | Running GCE Logging Query And Pushing Result Count Metric | |
GCP Operations Suite Log Query Dashboard URL |
||
TaskSet | Get GCP Log Dashboard URL For Given Log Query | |
GCP Operations Suite Metric Query |
||
SLI | Running GCP OpsSuite Metric Query | |
GCP Operations Suite Prometheus Query |
||
SLI | Run Prometheus Instant Query Against Google Prom API Endpoint | |
GCP Service Status |
||
SLI | Get Number of GCP Incidents Affecting My Workspace | |
GCP Storage Bucket Health |
||
TaskSet | Fetch GCP Bucket Storage Utilization for `${PROJECT_IDS}` Add GCP Bucket Storage Configuration for `${PROJECT_IDS}` to Report Check GCP Bucket Security Configuration for `${PROJECT_IDS}` Fetch GCP Bucket Storage Operations Rate for `${PROJECT_IDS}` |
|
SLI | Fetch GCP Bucket Storage Utilization for `${PROJECT_IDS}` Check GCP Bucket Security Configuration for `${PROJECT_IDS}` Fetch GCP Bucket Storage Operations Rate for `${PROJECT_IDS}` Generate Bucket Score |
|
GitHub Actions Artifact Analysis |
||
TaskSet | Analyze artifact from GitHub workflow `${WORKFLOW_NAME}` in repository `${GITHUB_REPO}` | |
SLI | Analyze artifact from GitHub Workflow `${WORKFLOW_NAME}` in repository `${GITHUB_REPO}` and push metric | |
GitHub Actions Workflow Timing |
||
SLI | Get Average Run Time For Workflow | |
GitHub API Latency |
||
TaskSet | Check Latency When Creating a New GitHub Issue | |
SLI | Check GitHub Latency With Get Repos | |
GitHub Service Status |
||
SLI | Get Availability of GitHub or Individual GitHub Components | |
GitHub Status Incidents |
||
SLI | Get Number of Incidents Affecting GitHub | |
GitHub Status Maintenance |
||
SLI | Get Scheduled and Active GitHub Maintenance Windows | |
GitLab Availability |
||
TaskSet | Check GitLab Server Status | |
SLI | Check GitLab Server Status | |
GitLab Get Repo Latency |
||
SLI | Check GitLab Latency With Get Repos | |
GKE Kong Ingress Host Triage |
||
TaskSet | Check If Kong Ingress HTTP Error Rate Violates HTTP Error Threshold Check If Kong Ingress HTTP Request Latency Violates Threshold Check If Kong Ingress Controller Reports Upstream Errors |
|
GKE Nginx Ingress Host Triage |
||
TaskSet | Fetch Nginx HTTP Errors From GMP for Ingress `${INGRESS_OBJECT_NAME}` Find Owner and Service Health for Ingress `${INGRESS_OBJECT_NAME}` |
|
Google Chat Send Message |
||
TaskSet | Send Chat Message | |
Grafana Health |
||
SLI | Check Grafana Server Health | |
gRPC cURL Unary |
||
TaskSet | Run gRPCurl Command and Show Output | |
SLI | Run gRPCurl Command and Push Metric | |
gRPC cURL Unary |
||
TaskSet | Create a new Jira Issue | |
SLI | Search Jira Issues By Current User | |
HashiCorp Vault Health |
||
SLI | Check If Vault Endpoint Is Healthy | |
HTTP Latency |
||
SLI | Check HTTP Latency to Well Known URL | |
HTTP OK |
||
SLI | Checking HTTP URL Is Available And Timely | |
K8s Jaeger Query |
||
TaskSet | Query Traces in Jaeger for Unhealthy HTTP Response Codes in Namespace `${NAMESPACE}` | |
K8s OpenTelemetry Collector Health |
||
TaskSet | Query Collector Queued Spans in Namespace `${NAMESPACE}` Check OpenTelemetry Collector Logs For Errors In Namespace `${NAMESPACE}` Scan OpenTelemetry Logs For Dropped Spans In Namespace `${NAMESPACE}` |
|
k8s-kubectl-cmd |
||
TaskSet | Run User Provided Kubectl Command | |
SLI | Run User Provided Kubectl Command | |
Kong Ingress Health (GCP PromQL) |
||
SLI | Get Access Token Get HTTP Error Rate Get Upstream Health Get Request Latency Rate Generate Kong Ingress Score |
|
Kubeprometheus Operator Troubleshoot |
||
TaskSet | Check Prometheus Service Monitors Check For Successful Rule Setup Verify Prometheus RBAC Can Access ServiceMonitors Identify Endpoint Scraping Errors Check Prometheus API Healthy |
|
Kubernetes API Server Health |
||
SLI | Running Kubectl Check Against API Server | |
Kubernetes Application Troubleshoot |
||
TaskSet | Get `${CONTAINER_NAME}` Application Logs Scan `${CONTAINER_NAME}` Application For Misconfigured Environment Tail `${CONTAINER_NAME}` Application Logs For Stacktraces |
|
SLI | Measure Application Exceptions | |
Kubernetes ArgoCD Application Health & Troubleshoot |
||
TaskSet | Fetch ArgoCD Application Sync Status & Health for `${APPLICATION}` Fetch ArgoCD Application Last Sync Operation Details for `${APPLICATION}` Fetch Unhealthy ArgoCD Application Resources for `${APPLICATION}` Scan For Errors in Pod Logs Related to ArgoCD Application `${APPLICATION}` Fully Describe ArgoCD Application `${APPLICATION}` |
|
Kubernetes ArgoCD HelmRelease TaskSet |
||
TaskSet | Fetch all available ArgoCD Helm releases in namespace `${NAMESPACE}` Fetch Installed ArgoCD Helm release versions in namespace `${NAMESPACE}` |
|
Kubernetes Artifactory Triage |
||
TaskSet | Check Artifactory Liveness and Readiness Endpoints | |
Kubernetes cert-manager Healthcheck |
||
TaskSet | Get Namespace Certificate Summary for Namespace `${NAMESPACE}` Find Unhealthy Certificates in Namespace `${NAMESPACE}` Find Failed Certificate Requests and Identify Issues for Namespace `${NAMESPACE}` |
|
SLI | Count Unready and Expired Certificates | |
Kubernetes CLI Command |
||
TaskSet | ${TASK_TITLE} | |
SLI | ${TASK_TITLE} | |
Kubernetes CLI Command |
||
TaskSet | ${TASK_TITLE} | |
SLI | ${TASK_TITLE} | |
Kubernetes Cluster Node Health |
||
TaskSet | Check for Node Restarts in Cluster `${CONTEXT}` | |
SLI | Check for Node Restarts in Cluster `${CONTEXT}` Generate Namespace Score |
|
Kubernetes Cluster Resource Health |
||
TaskSet | Identify High Utilization Nodes for Cluster `${CONTEXT}` Identify Pods Causing High Node Utilization in Cluster `${CONTEXT}` |
|
SLI | Identify High Utilization Nodes for Cluster `${CONTEXT}` | |
Kubernetes Daemonset Health Check |
||
SLI | Health Check Daemonset | |
Kubernetes Daemonset Triage |
||
TaskSet | Get DaemonSet Logs for `${DAEMONSET_NAME}` and Add to Report Get Related Daemonset `${DAEMONSET_NAME}` Events Check Daemonset `${DAEMONSET_NAME}` Replicas |
|
Kubernetes Decommission Workload |
||
TaskSet | Generate Decommission Commands | |
Kubernetes Deployment Operations |
||
TaskSet | Restart Deployment `${DEPLOYMENT_NAME}` in Namespace `${NAMESPACE}` Force Delete Pods in Deployment `${DEPLOYMENT_NAME}` in Namespace `${NAMESPACE}` Rollback Deployment `${DEPLOYMENT_NAME}` in Namespace `${NAMESPACE}` to Previous Version Scale Down Deployment `${DEPLOYMENT_NAME}` in Namespace `${NAMESPACE}` Scale Up Deployment `${DEPLOYMENT_NAME}` in Namespace `${NAMESPACE}` by ${SCALE_UP_FACTOR}x Clean Up Stale ReplicaSets for Deployment `${DEPLOYMENT_NAME}` in Namespace `${NAMESPACE}` Scale Down Stale ReplicaSets for Deployment `${DEPLOYMENT_NAME}` in Namespace `${NAMESPACE}` |
|
Kubernetes Deployment Triage |
||
TaskSet | Check Deployment Log For Issues with `${DEPLOYMENT_NAME}` Fetch Deployments Logs for `${DEPLOYMENT_NAME}` and Add to Report Check Liveness Probe Configuration for Deployment `${DEPLOYMENT_NAME}` Check Readiness Probe Configuration for Deployment `${DEPLOYMENT_NAME}` Inspect Container Restarts for Deployment `${DEPLOYMENT_NAME}` Namespace `${NAMESPACE}` Inspect Deployment Warning Events for `${DEPLOYMENT_NAME}` Get Deployment Workload Details For `${DEPLOYMENT_NAME}` and Add to Report Inspect Deployment Replicas for `${DEPLOYMENT_NAME}` Check Deployment Event Anomalies for `${DEPLOYMENT_NAME}` Check ReplicaSet Health for Deployment `${DEPLOYMENT_NAME}` |
|
Kubernetes Event Query |
||
SLI | Get Number Of Matching Events | |
Kubernetes Flux Chaos Testing |
||
TaskSet | Suspend the Flux Resource Reconciliation Find Random FluxCD Workload as Chaos Target Execute Chaos Command Execute Additional Chaos Command Resume Flux Resource Reconciliation |
|
Kubernetes Flux Suspend Namespace |
||
TaskSet | Flux Suspend Namespace `${NAMESPACE}` Unsuspend Flux for Namespace `${NAMESPACE}` |
|
Kubernetes FluxCD HelmRelease TaskSet |
||
TaskSet | List all available FluxCD Helmreleases in Namespace `${NAMESPACE}` Fetch Installed FluxCD Helmrelease Versions in Namespace `${NAMESPACE}` Fetch Mismatched FluxCD HelmRelease Version in Namespace `${NAMESPACE}` Fetch FluxCD HelmRelease Error Messages in Namespace `${NAMESPACE}` Check for Available Helm Chart Updates in Namespace `${NAMESPACE}` |
|
Kubernetes FluxCD Kustomization TaskSet |
||
TaskSet | List all available Kustomization objects in Namespace `${NAMESPACE}` Get details for unready Kustomizations in Namespace `${NAMESPACE}` |
|
Kubernetes Fluxcd Reconciliation Report |
||
TaskSet | Health Check Flux Reconciliation | |
SLI | Health Check Flux Reconciliation | |
Kubernetes GitOps GitHub Remediation |
||
TaskSet | Remediate Readiness and Liveness Probe GitOps Manifests in Namespace `${NAMESPACE}` Increase ResourceQuota for Namespace `${NAMESPACE}` Adjust Pod Resources to Match VPA Recommendation in `${NAMESPACE}` Expand Persistent Volume Claims in Namespace `${NAMESPACE}` |
|
Kubernetes Grafana Loki Health Check |
||
TaskSet | Check Loki Ring API Check Loki API Ready |
|
Kubernetes Image Check |
||
TaskSet | Check Image Rollover Times for Namespace `${NAMESPACE}` List Images and Tags for Every Container in Running Pods for Namespace `${NAMESPACE}` List Images and Tags for Every Container in Failed Pods for Namespace `${NAMESPACE}` List ImagePullBackOff Events and Test Path and Tags for Namespace `${NAMESPACE}` |
|
Kubernetes Ingress GCE & GCP HTTP Load Balancer Healthcheck |
||
TaskSet | Search For GCE Ingress Warnings in GKE Identify Unhealthy GCE HTTP Ingress Backends Validate GCP HTTP Load Balancer Configurations Fetch Network Error Logs from GCP Operations Manager for Ingress Backends Review GCP Operations Logging Dashboard |
|
Kubernetes Ingress Healthcheck |
||
TaskSet | Fetch Ingress Object Health in Namespace `${NAMESPACE}` Check for Ingress and Service Conflicts in Namespace `${NAMESPACE}` |
|
Kubernetes Jenkins Healthcheck |
||
TaskSet | Query The Jenkins Kubernetes Workload HTTP Endpoint Query For Stuck Jenkins Jobs |
|
Kubernetes Labeled Pod Count |
||
SLI | Measure Number of Running Pods with Label | |
Kubernetes Namespace Chaos Engineering |
||
TaskSet | Kill Random Pods In Namespace `${NAMESPACE}` OOMKill Pods In Namespace `${NAMESPACE}` Mangle Service Selector In Namespace `${NAMESPACE}` Mangle Service Port In Namespace `${NAMESPACE}` Fill Random Pod Tmp Directory In Namespace `${NAMESPACE}` |
|
Kubernetes Namespace Inspection |
||
TaskSet | Inspect Warning Events in Namespace `${NAMESPACE}` Inspect Container Restarts In Namespace `${NAMESPACE}` Inspect Pending Pods In Namespace `${NAMESPACE}` Inspect Failed Pods In Namespace `${NAMESPACE}` Inspect Workload Status Conditions In Namespace `${NAMESPACE}` Get Listing Of Resources In Namespace `${NAMESPACE}` Check Event Anomalies in Namespace `${NAMESPACE}` Check Missing or Risky PodDisruptionBudget Policies in Namespace `${NAMESPACE}` Check Resource Quota Utilization in Namespace `${NAMESPACE}` |
|
SLI | Get Event Count and Score Get Container Restarts and Score Get NotReady Pods Generate Namespace Score |
|
Kubernetes Namespace Troubleshoot |
||
TaskSet | Trace Namespace Errors Fetch Unready Pods Triage Namespace Object Condition Check Namespace Get All |
|
SLI | Get Event Count and Score Get Container Restarts and Score Get NotReady Pods Generate Namespace Score |
|
Kubernetes Patroni Health Check |
||
SLI | Determine Patroni Health | |
Kubernetes Patroni Lag Health |
||
TaskSet | Determine Patroni Health | |
SLI | Measure Patroni Member Lag | |
Kubernetes Persistent Volume Healthcheck |
||
TaskSet | Fetch Events for Unhealthy Kubernetes PersistentVolumeClaims in Namespace `${NAMESPACE}` List PersistentVolumeClaims in Terminating State in Namespace `${NAMESPACE}` List PersistentVolumes in Terminating State in Namespace `${NAMESPACE}` List Pods with Attached Volumes and Related PersistentVolume Details in Namespace `${NAMESPACE}` Fetch the Storage Utilization for PVC Mounts in Namespace `${NAMESPACE}` Check for RWO Persistent Volume Node Attachment Issues in Namespace `${NAMESPACE}` |
|
SLI | Fetch the Storage Utilization for PVC Mounts in Namespace `${NAMESPACE}` Generate Namespace Score |
|
Kubernetes Pod Resources Health |
||
TaskSet | Show Pods Without Resource Limit or Resource Requests Set in Namespace `${NAMESPACE}` Get Pod Resource Utilization with Top in Namespace `${NAMESPACE}` Identify VPA Pod Resource Recommendations in Namespace `${NAMESPACE}` Identify Resource Constrained Pods In Namespace `${NAMESPACE}` |
|
Kubernetes Postgres Healthcheck |
||
TaskSet | List Resources Related to Postgres Cluster `${OBJECT_NAME}` in Namespace `${NAMESPACE}` Get Postgres Pod Logs & Events for Cluster `${OBJECT_NAME}` in Namespace `${NAMESPACE}` Get Postgres Pod Resource Utilization for Cluster `${OBJECT_NAME}` in Namespace `${NAMESPACE}` Get Running Postgres Configuration for Cluster `${OBJECT_NAME}` in Namespace `${NAMESPACE}` Get Patroni Output and Add to Report for Cluster `${OBJECT_NAME}` in Namespace `${NAMESPACE}` Fetch Patroni Database Lag for Cluster `${OBJECT_NAME}` in Namespace `${NAMESPACE}` Check Database Backup Status for Cluster `${OBJECT_NAME}` in Namespace `${NAMESPACE}` Run DB Queries for Cluster `${OBJECT_NAME}` in Namespace `${NAMESPACE}` |
|
SLI | Fetch Patroni Database Lag Check Database Backup Status for Cluster `${OBJECT_NAME}` in Namespace `${NAMESPACE}` Generate Namespace Score |
|
Kubernetes PostgreSQL Query |
||
TaskSet | Run Postgres Query And Add Results to Report | |
SLI | Run Postgres Query And Return Result As Metric | |
Kubernetes PostgreSQL Triage |
||
TaskSet | Get Standard Resources Describe Custom Resources Get Pod Logs & Events Get Pod Resource Utilization Get Running Configuration Get Patroni Output Run DB Queries |
|
Kubernetes Redis Healthcheck |
||
TaskSet | Ping `${DEPLOYMENT_NAME}` Redis Workload Verify `${DEPLOYMENT_NAME}` Redis Read Write Operation |
|
Kubernetes Restart resource |
||
TaskSet | Get Current Resource State with Labels `${LABELS}` Get Resource Logs with Labels `${LABELS}` Restart Resource with Labels `${LABELS}` |
|
Kubernetes Run Shell Command |
||
TaskSet | Running Kubectl And Adding Stdout To Report | |
Kubernetes Service Account Check |
||
TaskSet | Test Service Account Access to Kubernetes API Server in Namespace `${NAMESPACE}` | |
Kubernetes StatefulSet Triage |
||
TaskSet | Check Readiness Probe Configuration for StatefulSet `${STATEFULSET_NAME}` Check Liveness Probe Configuration for StatefulSet `${STATEFULSET_NAME}` Troubleshoot StatefulSet Warning Events for `${STATEFULSET_NAME}` Check StatefulSet Event Anomalies for `${STATEFULSET_NAME}` Fetch StatefulSet Logs for `${STATEFULSET_NAME}` and Add to Report Get Related StatefulSet `${STATEFULSET_NAME}` Events Fetch Manifest Details for StatefulSet `${STATEFULSET_NAME}` List StatefulSets with Unhealthy Replica Counts In Namespace `${NAMESPACE}` |
|
Kubernetes Synthetic PVC Test |
||
SLI | Run Canary Job | |
Kubernetes Tail Application Logs |
||
TaskSet | Get `${CONTAINER_NAME}` Application Logs Tail `${CONTAINER_NAME}` Application Logs For Stacktraces |
|
SLI | Tail `${CONTAINER_NAME}` Application Logs For Stacktraces | |
Kubernetes Top |
||
SLI | Running Kubectl Top And Extracting Metric Data | |
Kubernetes Triage Deployment Replicas |
||
TaskSet | Fetch Logs Get Related Events Check Deployment Replicas |
|
Kubernetes Triage Patroni |
||
TaskSet | Get Patroni Status Get Pods Status Fetch Logs |
|
Kubernetes Triage StatefulSet |
||
TaskSet | Check StatefulSets Replicas Ready Get Events For The StatefulSet Get StatefulSet Logs Get StatefulSet Manifests Dump |
|
Kubernetes Troubleshoot Deployment |
||
TaskSet | Troubleshoot Resourcing Troubleshoot Events Troubleshoot PVC Troubleshoot Pods |
|
Kubernetes Vault Triage |
||
TaskSet | Fetch Vault CSI Driver Logs Get Vault CSI Driver Warning Events Check Vault CSI Driver Replicas Fetch Vault Logs Get Related Vault Events Fetch Vault StatefulSet Manifest Details Fetch Vault DaemonSet Manifest Details Verify Vault Availability Check Vault StatefulSet Replicas |
|
Kubernetes Workload Chaos Engineering |
||
TaskSet | Test `${WORKLOAD_NAME}` High Availability OOMKill `${WORKLOAD_NAME}` Pod Mangle Service Selector For `${WORKLOAD_NAME}` Mangle Service Port For `${WORKLOAD_NAME}` Fill Tmp Directory Of Pod From `${WORKLOAD_NAME}` |
|
Kubernetes Workload Metric |
||
SLI | Running Kubectl get and push the metric | |
Microsoft Teams Send Message |
||
TaskSet | Send a Message to an MS Teams Channel | |
MongoDB Health (GCP PromQL) |
||
SLI | Get Access Token Get Instance Status Get Connection Utilization Rate Get MongoDB Member State Health Get MongoDB Replication Lag Get MongoDB Queue Size Get Assertion Rate Generate MongoDB Score |
|
OpsGenie Create Alert |
||
TaskSet | Get Opsgenie System Info Create An Alert |
|
PagerDuty Webhook Handler |
||
TaskSet | Run SLX Tasks with matching PagerDuty Webhook Service ID | |
Ping Host Availability |
||
SLI | Ping host and collect packet loss percentage | |
Pingdom Health |
||
SLI | Check Pingdom Health | |
Prometheus Query (Instant) Metric |
||
SLI | Querying Prometheus Instance And Pushing Aggregated Data | |
Prometheus Query (Range) Metric |
||
SLI | Querying Prometheus Instance And Pushing Aggregated Data | |
rds-mysql-conn-count |
||
TaskSet | Run Bash File | |
SLI | Querying Prometheus Instance And Pushing Aggregated Data | |
REST Metric |
||
SLI | Request Data From Rest Endpoint | |
REST Metric (Basic Auth) |
||
SLI | Request Data From Rest Endpoint | |
REST Metric (Explicit OAuth2 with BasicAuth) |
||
SLI | Request Data From Rest Endpoint | |
REST Metric (Explicit OAuth2 with Bearer Token) |
||
SLI | Request Data From Rest Endpoint | |
RocketChat Send Message |
||
TaskSet | Send Chat Message | |
RunWhen Local Helm Update (ACR) |
||
TaskSet | Apply Available RunWhen Helm Images in ACR Registry `${REGISTRY_NAME}` | |
SLI | Check for Available RunWhen Helm Images in ACR Registry `${REGISTRY_NAME}` | |
RunWhen Platform Azure ACR Image Sync |
||
TaskSet | Sync CodeCollection Images to ACR Registry `${REGISTRY_NAME}` Sync RunWhen Local Image Updates to ACR Registry `${REGISTRY_NAME}` |
|
SLI | Check for CodeCollection Updates against ACR Registry `${REGISTRY_NAME}` Check for RunWhen Local Image Updates against ACR Registry `${REGISTRY_NAME}` Count Images Needing Update and Push Metric |
|
Slack Send Message |
||
TaskSet | Send Chat Message | |
SLI Alert Threshold |
||
SLI | Check If SLI Within Incident Threshold | |
Sysdig Monitor Metric |
||
SLI | Query Sysdig Metric Data And Pushing Metric | |
Sysdig Monitor PromQL Metric |
||
SLI | Querying PromQL Endpoint And Pushing Metric Data | |
Terraform Cloud Workspace Lock Check |
||
TaskSet | Checking whether the Terraform Cloud Workspace is in a locked state | |
Test Issues |
||
TaskSet | Raise Full Issue | |
Twitter Query Handle |
||
TaskSet | Query Twitter | |
SLI | Query Twitter | |
Uptime.com Component Health |
||
SLI | Check If Vault Endpoint Is Healthy | |
Web Triage |
||
TaskSet | Validate Platform Egress Perform Inspection On URL |