OPENSHIFT
  
Retrieve aggregate data via the kubectl top command.

Tasks:
- Running Kubectl Top And Extracting Metric Data
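
A minimal sketch of the extraction step, assuming the default `kubectl top pods` layout (NAME, CPU(cores), MEMORY(bytes)) for a single namespace. With `--all-namespaces`, an extra NAMESPACE column appears and the positions shift. The function names are hypothetical and the codebundle's own parsing may differ.

```python
"""Sketch: parse `kubectl top pods` text output into numeric metrics.

Assumes CPU is reported in millicores (e.g. "250m") or whole cores (e.g. "1"),
and memory in binary units (e.g. "128Mi").
"""
from __future__ import annotations

# Binary suffixes that kubectl top commonly prints for memory, in bytes.
_MEMORY_UNITS = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4}


def parse_cpu_millicores(value: str) -> int:
    """Convert '250m' to 250, or a whole-core value like '2' to 2000."""
    if value.endswith("m"):
        return int(value[:-1])
    return int(float(value) * 1000)


def parse_memory_bytes(value: str) -> int:
    """Convert a quantity like '128Mi' to bytes; bare numbers are bytes."""
    for suffix, factor in _MEMORY_UNITS.items():
        if value.endswith(suffix):
            return int(value[: -len(suffix)]) * factor
    return int(value)


def parse_top_pods(output: str) -> list[dict]:
    """Turn each data row of `kubectl top pods` output into a dict."""
    rows = []
    for line in output.strip().splitlines()[1:]:  # skip the header row
        name, cpu, memory = line.split()[:3]
        rows.append({
            "name": name,
            "cpu_millicores": parse_cpu_millicores(cpu),
            "memory_bytes": parse_memory_bytes(memory),
        })
    return rows


sample = """NAME        CPU(cores)   MEMORY(bytes)
web-7d9f    250m         128Mi
worker-1c2  1            1Gi
"""
metrics = parse_top_pods(sample)
total_cpu = sum(row["cpu_millicores"] for row in metrics)
print(total_cpu)  # 1250
```

Normalizing to millicores and bytes up front lets later steps sum or compare values without re-parsing unit suffixes.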
 
Uses kubectl (or equivalent) to query the state of a Patroni cluster and determine whether it is healthy.

Tasks:
- Determine Patroni Health
 
Checks the health of a Kubernetes API server using kubectl.
Returns 1 when the API server is healthy, or 0 when it is unhealthy.

Tasks:
- Running Kubectl Check Against API Server
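
A sketch of how the 1/0 result could be derived, assuming the check is `kubectl get --raw=/readyz`, which prints `ok` when the API server reports ready. The codebundle may use a different kubectl command; the function names here are hypothetical.

```python
"""Sketch: map an API server readiness probe to a 1 (healthy) / 0 (unhealthy) score."""
from __future__ import annotations

import subprocess
from typing import Optional


def score_from_readyz(output: str, returncode: int) -> int:
    """Return 1 only if kubectl succeeded and the endpoint answered 'ok'."""
    return 1 if returncode == 0 and output.strip() == "ok" else 0


def check_api_server(context: Optional[str] = None) -> int:
    """Run the probe with kubectl (assumed to be on PATH) and score the result."""
    cmd = ["kubectl", "get", "--raw=/readyz"]
    if context:
        cmd += ["--context", context]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return 0  # kubectl missing, or no answer within the timeout
    return score_from_readyz(result.stdout, result.returncode)
```

Keeping the scoring in a pure function separates the decision logic from the kubectl call, so it can be checked without a cluster.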
 
A taskset for troubleshooting general issues associated with typical Kubernetes deployment resources.
Supports API interactions via both the API client and the kubectl binary through RunWhen Shell Services.

Tasks:
- Troubleshoot Resourcing
- Troubleshoot Events
- Troubleshoot PVC
- Troubleshoot Pods
 
This codebundle runs an arbitrary kubectl command and writes the stdout to a report.
Typically used in conjunction with other codebundles.

Tasks:
- Running Kubectl And Adding Stdout To Report
 
This taskset runs general troubleshooting checks against all applicable objects in a namespace, checks error events, and searches pod logs for error entries.

Tasks:
- Trace Namespace Errors
- Fetch Unready Pods
- Triage Namespace
- Object Condition Check
- Namespace Get All
 
This SLI uses kubectl to score namespace health. Produces a value between 0 (completely failing the test) and 1 (fully passing the test). Looks for container restarts, events, and pods not ready.

Tasks:
- Get Event Count and Score
- Get Container Restarts and Score
- Get NotReady Pods
- Generate Namespace Score
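
One way to combine the three checks into a 0-1 value, sketched under assumptions not stated in the source: each check scores 1 when its count is at or below a threshold, and the final score is the mean. The thresholds and function names are hypothetical.

```python
"""Sketch: combine per-check results into a 0-1 namespace health score.

Hypothetical rule: a check passes (1) if its count is within a threshold,
otherwise fails (0); the namespace score is the average of the checks.
"""


def check_score(count: int, max_allowed: int) -> int:
    """1 if the observed count is within the allowed maximum, else 0."""
    return 1 if count <= max_allowed else 0


def namespace_score(error_events: int, container_restarts: int, notready_pods: int,
                    max_events: int = 2, max_restarts: int = 3,
                    max_notready: int = 0) -> float:
    scores = [
        check_score(error_events, max_events),
        check_score(container_restarts, max_restarts),
        check_score(notready_pods, max_notready),
    ]
    return sum(scores) / len(scores)


print(namespace_score(error_events=0, container_restarts=5, notready_pods=0))  # 0.6666666666666666
```

Averaging binary sub-scores keeps the result between 0 and 1, with intermediate values indicating that some but not all checks failed.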
 
This codebundle runs a kubectl get command that produces a value and pushes it as a metric.
Uses JMESPath for filtering and supports calculations such as count, sum, and avg on specified fields.

Tasks:
- Running Kubectl get and push the metric
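
A stdlib-only sketch of the count/sum/avg step. The codebundle filters with JMESPath (an expression such as `items[].status.containerStatuses[].restartCount`); this example swaps in plain Python traversal instead, so it does not reproduce the codebundle's filtering syntax. Field names follow the Kubernetes Pod API; the function names are hypothetical.

```python
"""Sketch: reduce values extracted from `kubectl get pods -o json` to one metric."""
from __future__ import annotations

import json


def extract_restart_counts(pod_list: dict) -> list[int]:
    """Collect status.containerStatuses[].restartCount from every pod."""
    values = []
    for pod in pod_list.get("items", []):
        for status in pod.get("status", {}).get("containerStatuses", []):
            values.append(status.get("restartCount", 0))
    return values


def aggregate(values: list[int], calculation: str) -> float:
    """Apply one of the supported calculations to the extracted values."""
    if calculation == "count":
        return len(values)
    if calculation == "sum":
        return sum(values)
    if calculation == "avg":
        return sum(values) / len(values) if values else 0.0
    raise ValueError(f"unsupported calculation: {calculation}")


raw = json.dumps({"items": [
    {"status": {"containerStatuses": [{"restartCount": 1}, {"restartCount": 3}]}},
    {"status": {"containerStatuses": [{"restartCount": 2}]}},
]})
restarts = extract_restart_counts(json.loads(raw))
print(aggregate(restarts, "sum"), aggregate(restarts, "avg"))  # 6 2.0
```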
 
A taskset for troubleshooting issues with StatefulSets and their related resources.

Tasks:
- Check StatefulSets Replicas Ready
- Get Events For The StatefulSet
- Get StatefulSet Logs
- Get StatefulSet Manifests Dump
 
Runs multiple Kubernetes and psql commands to report on the health of a Postgres cluster.

Tasks:
- Get Standard Resources
- Describe Custom Resources
- Get Pod Logs & Events
- Get Pod Resource Utilization
- Get Running Configuration
- Get Patroni Output
- Run DB Queries
 
Searches a namespace for matching objects and provides the commands to decommission them.

Tasks:
- Generate Decommission Commands
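
A sketch of the command-generation idea, assuming the objects come from something like `kubectl get all -n NAMESPACE -o json` and are matched by a name regex. The commands are only built and returned, not executed, in line with "provides the commands". The function name and matching rule are hypothetical.

```python
"""Sketch: build (but do not run) kubectl delete commands for matching objects."""
from __future__ import annotations

import re


def decommission_commands(object_list: dict, namespace: str, name_pattern: str) -> list[str]:
    regex = re.compile(name_pattern)
    commands = []
    for obj in object_list.get("items", []):
        name = obj.get("metadata", {}).get("name", "")
        kind = obj.get("kind", "").lower()
        if kind and regex.search(name):
            # kubectl accepts the KIND/NAME form, e.g. deployment/legacy-api.
            commands.append(f"kubectl delete {kind}/{name} -n {namespace}")
    return commands


objects = {"items": [
    {"kind": "Deployment", "metadata": {"name": "legacy-api"}},
    {"kind": "Service", "metadata": {"name": "legacy-api"}},
    {"kind": "Deployment", "metadata": {"name": "checkout"}},
]}
for command in decommission_commands(objects, "shop", r"^legacy-"):
    print(command)
```

Emitting commands for review, rather than deleting directly, leaves the destructive step to an operator.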
 
Returns the number of events with matching messages as an SLI metric.

Tasks:
- Get Number Of Matching Events
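
A sketch of the counting step, assuming the input is `kubectl get events -o json` and the match is a case-insensitive regex on each event's `message`. It counts event objects and ignores each event's own repetition `count` field; that choice, the pattern, and the sample events are assumptions.

```python
"""Sketch: count Kubernetes events whose message matches a pattern."""
import re


def count_matching_events(event_list: dict, pattern: str) -> int:
    regex = re.compile(pattern, re.IGNORECASE)
    return sum(
        1 for event in event_list.get("items", [])
        if regex.search(event.get("message", ""))
    )


events = {"items": [
    {"message": "Back-off restarting failed container"},
    {"message": "Successfully pulled image"},
    {"message": "Liveness probe failed: HTTP probe failed"},
]}
print(count_matching_events(events, r"fail"))  # 2
```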
 
Triages issues related to a deployment's replicas.

Tasks:
- Fetch Logs
- Get Related Events
- Check Deployment Replicas
 
Detects and reinitializes lagging Patroni cluster members that are unable to catch up in replication, using kubectl and patronictl.

Tasks:
- Determine Patroni Health
 
Measures the maximum replica lag across a Patroni cluster.

Tasks:
- Measure Patroni Member Lag
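
A sketch of the lag measurement, assuming JSON output from `patronictl list -f json` in which each member carries a `Lag in MB` field matching the default table column. The leader usually has no numeric lag and is skipped. Field names may vary across Patroni versions; the function name is hypothetical.

```python
"""Sketch: compute the maximum replica lag (MB) across Patroni members."""
import json


def max_replica_lag_mb(patronictl_json: str) -> float:
    lags = []
    for member in json.loads(patronictl_json):
        lag = member.get("Lag in MB")
        # Skip empty or non-numeric values (e.g. the leader, or "unknown").
        if isinstance(lag, (int, float)):
            lags.append(float(lag))
    return max(lags, default=0.0)


sample = json.dumps([
    {"Member": "pg-0", "Role": "Leader", "State": "running", "Lag in MB": ""},
    {"Member": "pg-1", "Role": "Replica", "State": "streaming", "Lag in MB": 0},
    {"Member": "pg-2", "Role": "Replica", "State": "streaming", "Lag in MB": 12},
])
print(max_replica_lag_mb(sample))  # 12.0
```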
 
Checks that the current state of a daemonset is healthy and returns a score of either 1 (healthy) or 0 (unhealthy).

Tasks:
- Health Check Daemonset
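
A sketch of the 1/0 decision using fields from the Kubernetes DaemonSetStatus API, as returned by `kubectl get daemonset NAME -o json`. Which fields the codebundle actually compares is an assumption here. `numberAvailable` and `updatedNumberScheduled` may be omitted when zero, so they default to 0.

```python
"""Sketch: score a DaemonSet as healthy (1) or unhealthy (0) from its status."""


def daemonset_health(daemonset: dict) -> int:
    status = daemonset.get("status", {})
    desired = status.get("desiredNumberScheduled", 0)
    healthy = (
        status.get("numberReady", 0) == desired
        and status.get("numberAvailable", 0) == desired
        and status.get("updatedNumberScheduled", 0) == desired
        and status.get("numberMisscheduled", 0) == 0
    )
    return 1 if healthy else 0
```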
 
Creates an ad hoc one-shot job that mounts a PVC as a canary test; the job is polled for success before being torn down.

Tasks:
- Run Canary Job
 
A taskset to triage issues related to Patroni.

Tasks:
- Get Patroni Status
- Get Pods Status
- Fetch Logs
 
Runs a Postgres SQL query and adds the returned result to a report.
During execution, the SQL query is passed to a Kubernetes workload that has access to the psql binary.
The workload runs the query and returns the results via stdout.

Tasks:
- Run Postgres Query And Results to Report
 
Triages issues related to a StatefulSet and its pods, including persistent volumes and ordered deployment characteristics.

Tasks:
- Analyze Application Log Patterns for StatefulSet `STATEFULSET_NAME` in Namespace `NAMESPACE`
- Detect Log Anomalies for StatefulSet `STATEFULSET_NAME` in Namespace `NAMESPACE`
- Check Liveness Probe Configuration for StatefulSet `STATEFULSET_NAME`
- Check Readiness Probe Configuration for StatefulSet `STATEFULSET_NAME` in Namespace `NAMESPACE`
- Check for Container Restarts in StatefulSet `STATEFULSET_NAME` in Namespace `NAMESPACE`
- Inspect StatefulSet Warning Events for `STATEFULSET_NAME` in Namespace `NAMESPACE`
- Fetch StatefulSet Workload Details For `STATEFULSET_NAME` in Namespace `NAMESPACE`
- Inspect StatefulSet Replicas for `STATEFULSET_NAME` in namespace `NAMESPACE`
- Check StatefulSet PersistentVolumeClaims for `STATEFULSET_NAME` in Namespace `NAMESPACE`
- Identify Recent Configuration Changes for StatefulSet `STATEFULSET_NAME` in Namespace `NAMESPACE`
 
Provides a list of tasks that can remediate configuration issues with manifests in GitHub-based GitOps repositories.

Tasks:
- Remediate Readiness and Liveness Probe GitOps Manifests in Namespace `NAMESPACE`
- Increase ResourceQuota Limit for Namespace `NAMESPACE` in GitHub GitOps Repository
- Adjust Pod Resources to Match VPA Recommendation in `NAMESPACE`
- Expand Persistent Volume Claims in Namespace `NAMESPACE`
 
Checks the overall health of certificates in a namespace that are managed by cert-manager.

Tasks:
- Get Namespace Certificate Summary for Namespace `NAMESPACE`
- Find Unhealthy Certificates in Namespace `NAMESPACE`
- Find Failed Certificate Requests and Identify Issues for Namespace `NAMESPACE`
 
Counts the number of unhealthy cert-manager managed certificates in a namespace.

Tasks:
- Count Unready and Expired Certificates in Namespace `${NAMESPACE}`
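
A sketch of the count, assuming input from `kubectl get certificates -n NAMESPACE -o json`. It relies on the cert-manager Certificate status fields `conditions` (type `Ready`) and `notAfter`. Treating a certificate as unhealthy when it is not Ready or its `notAfter` has passed is an assumption; the function names are hypothetical.

```python
"""Sketch: count cert-manager Certificates that are not Ready or have expired."""
from __future__ import annotations

from datetime import datetime, timezone
from typing import Optional


def is_ready(cert: dict) -> bool:
    for condition in cert.get("status", {}).get("conditions", []):
        if condition.get("type") == "Ready":
            return condition.get("status") == "True"
    return False  # no Ready condition reported yet


def is_expired(cert: dict, now: datetime) -> bool:
    not_after = cert.get("status", {}).get("notAfter")
    if not not_after:
        return False
    # notAfter is an RFC 3339 timestamp such as "2024-01-31T12:00:00Z".
    expiry = datetime.fromisoformat(not_after.replace("Z", "+00:00"))
    return expiry <= now


def count_unhealthy(cert_list: dict, now: Optional[datetime] = None) -> int:
    now = now or datetime.now(timezone.utc)
    return sum(1 for cert in cert_list.get("items", [])
               if not is_ready(cert) or is_expired(cert, now))


certs = {"items": [
    {"status": {"conditions": [{"type": "Ready", "status": "True"}], "notAfter": "2025-01-01T00:00:00Z"}},
    {"status": {"conditions": [{"type": "Ready", "status": "False"}], "notAfter": "2025-01-01T00:00:00Z"}},
    {"status": {"conditions": [{"type": "Ready", "status": "True"}], "notAfter": "2024-01-01T00:00:00Z"}},
]}
print(count_unhealthy(certs, now=datetime(2024, 6, 1, tzinfo=timezone.utc)))  # 2
```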
 
This taskset queries the HTTP endpoint of a Jenkins workload running as a Kubernetes StatefulSet and checks for stuck Jenkins jobs.

Tasks:
- Query The Jenkins Kubernetes Workload HTTP Endpoint in Kubernetes StatefulSet `STATEFULSET_NAME`
- Query For Stuck Jenkins Jobs in Kubernetes Statefulset Workload `STATEFULSET_NAME`
 
This taskset provides tasks to troubleshoot service accounts in a Kubernetes namespace.

Tasks:
- Test Service Account Access to Kubernetes API Server in Namespace `NAMESPACE`
 
This codebundle runs a series of tasks to identify potential Helm release issues related to Flux-managed Helm objects.

Tasks:
- List all available FluxCD Helmreleases in Namespace `NAMESPACE`
- Fetch Installed FluxCD Helmrelease Versions in Namespace `NAMESPACE`
- Fetch Mismatched FluxCD HelmRelease Version in Namespace `NAMESPACE`
- Fetch FluxCD HelmRelease Error Messages in Namespace `NAMESPACE`
- Check for Available Helm Chart Updates in Namespace `NAMESPACE`
 
Performs operational tasks for a Kubernetes deployment.

Tasks:
- Restart Deployment `DEPLOYMENT_NAME` in Namespace `NAMESPACE`
- Force Delete Pods in Deployment `DEPLOYMENT_NAME` in Namespace `NAMESPACE`
- Rollback Deployment `DEPLOYMENT_NAME` in Namespace `NAMESPACE` to Previous Version
- Scale Down Deployment `DEPLOYMENT_NAME` in Namespace `NAMESPACE`
- Scale Up Deployment `DEPLOYMENT_NAME` in Namespace `NAMESPACE` by SCALE_UP_FACTORx
- Clean Up Stale ReplicaSets for Deployment `DEPLOYMENT_NAME` in Namespace `NAMESPACE`
- Scale Down Stale ReplicaSets for Deployment `DEPLOYMENT_NAME` in Namespace `NAMESPACE`
- Scale Up HPA for Deployment `DEPLOYMENT_NAME` in Namespace `NAMESPACE` by HPA_SCALE_FACTORx
- Scale Down HPA for Deployment `DEPLOYMENT_NAME` in Namespace `NAMESPACE` to Min HPA_MIN_REPLICAS
- Increase CPU Resources for Deployment `DEPLOYMENT_NAME` in Namespace `NAMESPACE`
- Increase Memory Resources for Deployment `DEPLOYMENT_NAME` in Namespace `NAMESPACE`
- Decrease CPU Resources for Deployment `DEPLOYMENT_NAME` in Namespace `NAMESPACE`
- Decrease Memory Resources for Deployment `DEPLOYMENT_NAME` in Namespace `NAMESPACE`
 
This taskset restarts a resource with a given set of labels and is typically used with other tasksets.

Tasks:
- Get Current Resource State with Labels `LABELS`
- Get Resource Logs with Labels `LABELS`
- Restart Resource with Labels `LABELS` in `CONTEXT`
 
Detects and analyzes stacktraces/tracebacks in Kubernetes workload logs for troubleshooting application issues.

Tasks:
- Analyze Workload Stacktraces for WORKLOAD_TYPE `WORKLOAD_NAME` in Namespace `NAMESPACE`
 
This SLI monitors stacktrace health in Kubernetes workload application logs. Produces a value between 0 (stacktraces detected) and 1 (no stacktraces found). Focuses specifically on application error detection through stacktrace analysis.

Tasks:
- Get Stacktrace Health Score for ${WORKLOAD_TYPE} `${WORKLOAD_NAME}`
- Generate Stacktrace Health Score for `${WORKLOAD_NAME}`
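
A sketch of the 0/1 scoring, using a few common stacktrace signatures (a Python traceback header, JVM `at ...(...)` frames, and Go goroutine dumps). These patterns are hypothetical and almost certainly narrower than the codebundle's actual detection rules.

```python
"""Sketch: score logs 1 if no stacktrace signature appears, else 0."""
import re

# Hypothetical signatures; real detection rules may differ.
STACKTRACE_PATTERNS = [
    re.compile(r"^Traceback \(most recent call last\):"),  # Python
    re.compile(r"^\s+at [\w$.<>]+\(.*\)$"),                 # Java / JVM frames
    re.compile(r"^goroutine \d+ \["),                       # Go panics
]


def stacktrace_health_score(log_text: str) -> int:
    for line in log_text.splitlines():
        if any(pattern.search(line) for pattern in STACKTRACE_PATTERNS):
            return 0
    return 1
```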
 
This taskset investigates the logs, state, and health of the Prometheus Operator in Kubernetes.

Tasks:
- Check Prometheus Service Monitors in namespace `NAMESPACE`
- Check For Successful Rule Setup in Kubernetes Namespace `NAMESPACE`
- Verify Prometheus RBAC Can Access ServiceMonitors in Namespace `PROM_NAMESPACE`
- Inspect Prometheus Operator Logs for Scraping Errors in Namespace `NAMESPACE`
- Check Prometheus API Healthy in Namespace `PROM_NAMESPACE`
 
Evaluates cluster node health using kubectl.

Tasks:
- Check for Node Restarts in Cluster `CONTEXT` within Interval `RW_LOOKBACK_WINDOW`
 
Evaluates cluster node health using kubectl.

Tasks:
- Check for Node Restarts in Cluster `${CONTEXT}`
- Generate Namespace Score in Kubernetes Cluster `${CONTEXT}`
 
This taskset collects information and runs general troubleshooting checks against ArgoCD application objects within a namespace.

Tasks:
- Fetch ArgoCD Application Sync Status & Health for `APPLICATION`
- Fetch ArgoCD Application Last Sync Operation Details for `APPLICATION`
- Fetch Unhealthy ArgoCD Application Resources for `APPLICATION`
- Scan For Errors in Pod Logs Related to ArgoCD Application `APPLICATION`
- Fully Describe ArgoCD Application `APPLICATION`
 
Triages the open-source version of Artifactory in a Kubernetes cluster.

Tasks:
- Check Artifactory Liveness and Readiness Endpoints in `NAMESPACE`
 
Triages issues related to ingress objects and services.

Tasks:
- Fetch Ingress Object Health in Namespace `NAMESPACE`
- Check for Ingress and Service Conflicts in Namespace `NAMESPACE`
 
Performs application-level troubleshooting by inspecting the logs of a workload for parsable exceptions and attempts to determine next steps.

Tasks:
- Get `CONTAINER_NAME` Application Logs in Namespace `NAMESPACE`
- Tail `CONTAINER_NAME` Application Logs For Stacktraces
 
Measures the number of exception stacktraces present in an application's logs over a time period.

Tasks:
- Tail `${CONTAINER_NAME}` Application Logs For Stacktraces
 
Identifies resource constraints or issues in a cluster.

Tasks:
- Identify High Utilization Nodes for Cluster `CONTEXT`
- Identify Pods Causing High Node Utilization in Cluster `CONTEXT`
- Identify Pods with Resource Limits Exceeding Node Capacity in Cluster `CONTEXT`
 
Counts the number of nodes above 90% CPU or memory utilization from kubectl top.

Tasks:
- Identify High Utilization Nodes for Cluster `${CONTEXT}`
- Identify Pods with Resource Limits Exceeding Node Capacity in Cluster `${CONTEXT}`
- Generate Cluster Resource Health Score
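
A sketch of the node count over `kubectl top nodes` text output. The columns are located by header name, and both `CPU%` and `CPU(%)` spellings are accepted in case the kubectl version in use labels them differently. Nodes reporting `<unknown>` are skipped. The 90% threshold comes from the description; the function names are hypothetical.

```python
"""Sketch: count nodes above a CPU or memory utilization threshold."""
from __future__ import annotations


def _column_index(header: list[str], *names: str) -> int:
    """Find a column by any of its accepted header names."""
    for name in names:
        if name in header:
            return header.index(name)
    raise ValueError(f"none of {names} found in header {header}")


def count_high_utilization_nodes(output: str, threshold: float = 90.0) -> int:
    lines = output.strip().splitlines()
    header = lines[0].split()
    cpu_idx = _column_index(header, "CPU%", "CPU(%)")
    mem_idx = _column_index(header, "MEMORY%", "MEMORY(%)")
    count = 0
    for line in lines[1:]:
        fields = line.split()
        try:
            cpu = float(fields[cpu_idx].rstrip("%"))
            mem = float(fields[mem_idx].rstrip("%"))
        except (ValueError, IndexError):
            continue  # e.g. "<unknown>" for nodes that report no metrics
        if cpu > threshold or mem > threshold:
            count += 1
    return count


sample = """NAME     CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
node-a   3500m        92%    6000Mi          40%
node-b   1000m        25%    15000Mi         95%
node-c   3600m        90%    14200Mi         90%
node-d   <unknown>    <unknown>   <unknown>   <unknown>
"""
print(count_high_utilization_nodes(sample))  # 2
```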
 
This taskset is used to suspend a Flux resource for the purpose of executing chaos tasks.

Tasks:
- Suspend the Flux Resource Reconciliation for `FLUX_RESOURCE_NAME` in namespace `FLUX_RESOURCE_NAMESPACE`
- Select Random FluxCD Workload for Chaos Target in Namespace `FLUX_RESOURCE_NAMESPACE`
- Execute Chaos Command on `TARGET_RESOURCE` in Namespace `TARGET_NAMESPACE`
- Execute Additional Chaos Command on FLUX_RESOURCE_TYPE 'FLUX_RESOURCE_NAME' in namespace 'FLUX_RESOURCE_NAMESPACE'
- Resume Flux Resource Reconciliation in `TARGET_NAMESPACE`
 
This codebundle runs a series of tasks to identify potential Kustomization issues related to Flux-managed Kustomization objects.

Tasks:
- List All FluxCD Kustomization objects in Namespace `NAMESPACE` in Cluster `CONTEXT`
- List Suspended FluxCD Kustomization objects in Namespace `NAMESPACE` in Cluster `CONTEXT`
- List Unready FluxCD Kustomizations in Namespace `NAMESPACE` in Cluster `CONTEXT`
 
This codebundle checks for unhealthy or suspended FluxCD Kustomization objects.

Tasks:
- List Suspended FluxCD Kustomization objects in Namespace `${NAMESPACE}` in Cluster `${CONTEXT}`
- List Unready FluxCD Kustomizations in Namespace `${NAMESPACE}` in Cluster `${CONTEXT}`
- Generate FluxCD Kustomization Health Score for Namespace `${NAMESPACE}` in Cluster `${CONTEXT}`
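
A sketch of a health score over Flux Kustomizations, assuming input from `kubectl get kustomizations -n NAMESPACE -o json`. It uses the `spec.suspend` flag and the `Ready` status condition. The scoring rule (fraction of Kustomizations that are Ready and not suspended, with an empty namespace treated as healthy) is a hypothetical choice, not the codebundle's documented formula.

```python
"""Sketch: fraction of Flux Kustomizations that are Ready and not suspended."""


def kustomization_health_score(kustomization_list: dict) -> float:
    items = kustomization_list.get("items", [])
    if not items:
        return 1.0  # nothing to evaluate; treated as healthy (assumption)
    healthy = 0
    for ks in items:
        suspended = ks.get("spec", {}).get("suspend", False)
        ready = any(
            c.get("type") == "Ready" and c.get("status") == "True"
            for c in ks.get("status", {}).get("conditions", [])
        )
        if ready and not suspended:
            healthy += 1
    return healthy / len(items)
```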
 
This codebundle runs a series of tasks to identify potential Helm release issues related to ArgoCD-managed Helm objects.

Tasks:
- Fetch all available ArgoCD Helm releases in namespace `NAMESPACE`
- Fetch Installed ArgoCD Helm release versions in namespace `NAMESPACE`
 
Inspects the resources provisioned for a given set of pods and raises issues or recommendations as necessary.

Tasks:
- Show Pods Without Resource Limit or Resource Requests Set in Namespace `NAMESPACE`
- Check Pod Resource Utilization with Top in Namespace `NAMESPACE`
- Identify VPA Pod Resource Recommendations in Namespace `NAMESPACE`
- Identify Overutilized Pods in Namespace `NAMESPACE`
 
Triages issues related to a DaemonSet and its pods, including node scheduling and resource constraints.

Tasks:
- Analyze Application Log Patterns for DaemonSet `DAEMONSET_NAME` in Namespace `NAMESPACE`
- Detect Log Anomalies for DaemonSet `DAEMONSET_NAME` in Namespace `NAMESPACE`
- Identify Recent Configuration Changes for DaemonSet `DAEMONSET_NAME` in Namespace `NAMESPACE`
- Check Liveness Probe Configuration for DaemonSet `DAEMONSET_NAME`
- Check Readiness Probe Configuration for DaemonSet `DAEMONSET_NAME` in Namespace `NAMESPACE`
- Check for Container Restarts in DaemonSet `DAEMONSET_NAME` in Namespace `NAMESPACE`
- Inspect DaemonSet Warning Events for `DAEMONSET_NAME` in Namespace `NAMESPACE`
- Fetch DaemonSet Workload Details For `DAEMONSET_NAME` in Namespace `NAMESPACE`
- Inspect DaemonSet Status for `DAEMONSET_NAME` in namespace `NAMESPACE`
- Check Node Affinity and Tolerations for DaemonSet `DAEMONSET_NAME` in Namespace `NAMESPACE`
 
Checks Istio proxy sidecar injection status, high memory and CPU usage, warnings and errors in logs, certificate validity, and configuration, and verifies the Istio installation.

Tasks:
- Verify Istio Sidecar Injection for Cluster `CONTEXT`
- Check Istio Sidecar Resource Usage for Cluster `CONTEXT`
- Validate Istio Installation in Cluster `CONTEXT`
- Check Istio Controlplane Logs For Errors in Cluster `CONTEXT`
- Fetch Istio Proxy Logs in Cluster `CONTEXT`
- Verify Istio SSL Certificates in Cluster `CONTEXT`
- Check Istio Configuration Health in Cluster `CONTEXT`
 
Checks Istio proxy sidecar injection status, high memory and CPU usage, warnings and errors in logs, certificate validity, and configuration, and verifies the Istio installation.

Tasks:
- Verify Istio Sidecar Injection for Cluster `${CONTEXT}`
- Check Istio Sidecar Resource Usage for Cluster `${CONTEXT}`
- Validate Istio Installation in Cluster `${CONTEXT}`
- Check Istio Controlplane Logs For Errors in Cluster `${CONTEXT}`
- Fetch Istio Proxy Logs in Cluster `${CONTEXT}`
- Verify Istio SSL Certificates in Cluster `${CONTEXT}`
- Check Istio Configuration Health in Cluster `${CONTEXT}`
- Generate Health Score for Cluster ${CONTEXT}
 
Performs application-level troubleshooting by inspecting the logs of a workload for parsable exceptions and attempts to determine next steps.

Tasks:
- Get `CONTAINER_NAME` Application Logs from Workload `WORKLOAD_NAME` in Namespace `NAMESPACE`
- Scan `CONTAINER_NAME` Application For Misconfigured Environment
- Tail `CONTAINER_NAME` Application Logs For Stacktraces in Workload `WORKLOAD_NAME`
 
Measures the number of exception stacktraces present in an application's logs over a time period.

Tasks:
- Measure Application Exceptions in `${NAMESPACE}`
 
This taskset provides detailed information about the images used in a Kubernetes namespace.

Tasks:
- Check Image Rollover Times for Namespace `NAMESPACE`
- List Images and Tags for Every Container in Running Pods for Namespace `NAMESPACE`
- List Images and Tags for Every Container in Failed Pods for Namespace `NAMESPACE`
- List ImagePullBackOff Events and Test Path and Tags for Namespace `NAMESPACE`
 
This taskset runs general troubleshooting checks against all applicable objects in a namespace. Looks for warning events, odd or frequent normal events, restarting containers, and failed or pending pods.

Tasks:
- Inspect Warning Events in Namespace `NAMESPACE`
- Inspect Container Restarts In Namespace `NAMESPACE`
- Inspect Pending Pods In Namespace `NAMESPACE`
- Inspect Failed Pods In Namespace `NAMESPACE`
- Inspect Workload Status Conditions In Namespace `NAMESPACE`
- Get Listing Of Resources In Namespace `NAMESPACE`
- Check Event Anomalies in Namespace `NAMESPACE`
- Check Missing or Risky PodDisruptionBudget Policies in Namespace `NAMESPACE`
- Check Resource Quota Utilization in Namespace `NAMESPACE`
 
This SLI uses kubectl to score namespace health. Produces a value between 0 (completely failing the test) and 1 (fully passing the test). Looks for container restarts, events, and pods not ready.

Tasks:
- Get Error Event Count within ${EVENT_AGE} and calculate Score
- Get Container Restarts and Score in Namespace `${NAMESPACE}`
- Get NotReady Pods in `${NAMESPACE}`
- Generate Namespace Score in `${NAMESPACE}`
 
This taskset collects information about storage such as PersistentVolumes and PersistentVolumeClaims to validate health or help troubleshoot potential storage issues.

Tasks:
- Fetch Events for Unhealthy Kubernetes PersistentVolumeClaims in Namespace `NAMESPACE`
- List PersistentVolumeClaims in Terminating State in Namespace `NAMESPACE`
- List PersistentVolumes in Terminating State in Namespace `NAMESPACE`
- List Pods with Attached Volumes and Related PersistentVolume Details in Namespace `NAMESPACE`
- Fetch the Storage Utilization for PVC Mounts in Namespace `NAMESPACE`
- Check for RWO Persistent Volume Node Attachment Issues in Namespace `NAMESPACE`
 
This SLI collects information about storage such as PersistentVolumes and PersistentVolumeClaims and generates an aggregated health score for the namespace: 1 = healthy, 0 = failed, and a value between 0 and 1 = degraded.

Tasks:
- Fetch the Storage Utilization for PVC Mounts in Namespace `${NAMESPACE}`
- Generate Namespace Score for Namespace `${NAMESPACE}`
 
Triages issues related to a deployment and its replicas.

Tasks:
- Analyze Application Log Patterns for Deployment `DEPLOYMENT_NAME` in Namespace `NAMESPACE`
- Detect Log Anomalies for Deployment `DEPLOYMENT_NAME` in Namespace `NAMESPACE`
- Fetch Deployment Logs for `DEPLOYMENT_NAME` in Namespace `NAMESPACE`
- Check Liveness Probe Configuration for Deployment `DEPLOYMENT_NAME`
- Check Readiness Probe Configuration for Deployment `DEPLOYMENT_NAME` in Namespace `NAMESPACE`
- Inspect Deployment Warning Events for `DEPLOYMENT_NAME` in Namespace `NAMESPACE`
- Check Deployment Replica Status for `DEPLOYMENT_NAME` in Namespace `NAMESPACE`
- Inspect Container Restarts for Deployment `DEPLOYMENT_NAME` in Namespace `NAMESPACE`
- Identify Recent Configuration Changes for Deployment `DEPLOYMENT_NAME` in Namespace `NAMESPACE`
- Check HPA Health for Deployment `DEPLOYMENT_NAME` in Namespace `NAMESPACE`
 
This SLI uses kubectl to score deployment health. Produces a value between 0 (completely failing the test) and 1 (fully passing the test). Looks for container restarts, critical log errors, pods not ready, deployment status, and recent events.

Tasks:
- Get Container Restarts and Score for Deployment `${DEPLOYMENT_NAME}`
- Get Critical Log Errors and Score for Deployment `${DEPLOYMENT_NAME}`
- Get NotReady Pods Score for Deployment `${DEPLOYMENT_NAME}`
- Get Deployment Replica Status and Score for `${DEPLOYMENT_NAME}`
- Get Recent Warning Events Score for `${DEPLOYMENT_NAME}`
- Generate Deployment Health Score for `${DEPLOYMENT_NAME}`
 
This codebundle measures the number of running pods that match a provided set of labels.

Tasks:
- Measure Number of Running Pods with Label in `${NAMESPACE}`
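
A sketch of the count over `kubectl get pods -n NAMESPACE -o json` output. On the command line, roughly the same filter is `kubectl get pods -l app=web --field-selector=status.phase=Running`. Exact-match label semantics and the function name are assumptions for illustration.

```python
"""Sketch: count pods in phase Running whose labels include all given key/value pairs."""
from __future__ import annotations


def count_running_pods(pod_list: dict, labels: dict[str, str]) -> int:
    count = 0
    for pod in pod_list.get("items", []):
        pod_labels = pod.get("metadata", {}).get("labels", {})
        matches = all(pod_labels.get(key) == value for key, value in labels.items())
        if matches and pod.get("status", {}).get("phase") == "Running":
            count += 1
    return count


pods = {"items": [
    {"metadata": {"labels": {"app": "web", "tier": "frontend"}}, "status": {"phase": "Running"}},
    {"metadata": {"labels": {"app": "web"}}, "status": {"phase": "Pending"}},
    {"metadata": {"labels": {"app": "api"}}, "status": {"phase": "Running"}},
]}
print(count_running_pods(pods, {"app": "web"}))  # 1
```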