OPENSHIFT

1 Troubleshooting Command

Contributed by Jonathan Funk

Codecollection: rw-public-codecollection


Retrieves aggregate CPU and memory usage data via the kubectl top command.

Tasks:
  • Running Kubectl Top And Extracting Metric Data

1 Troubleshooting Command

Contributed by Jonathan Funk

Codecollection: rw-public-codecollection


Uses kubectl (or an equivalent CLI) to query the state of a Patroni cluster and determine whether it is healthy.

Tasks:
  • Determine Patroni Health
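
A minimal sketch of how a health verdict could be derived, assuming the cluster state is taken from `patronictl list -f json` run inside a Patroni pod (for example through `kubectl exec`). The `Role` and `State` keys mirror patronictl's table columns; the set of accepted state values is an assumption and differs between Patroni versions. `patroni_healthy` is a hypothetical helper, not the codebundle's own code.

```python
import json

# Assumption: replica states differ by Patroni version ("running" vs "streaming").
HEALTHY_STATES = {"running", "streaming"}


def patroni_healthy(patronictl_json: str) -> bool:
    """True when the cluster has exactly one leader and every member is in a healthy state."""
    members = json.loads(patronictl_json)
    leaders = [m for m in members if m.get("Role") == "Leader"]
    all_healthy = all(m.get("State") in HEALTHY_STATES for m in members)
    return bool(members) and len(leaders) == 1 and all_healthy


sample = json.dumps([
    {"Member": "pg-0", "Role": "Leader", "State": "running"},
    {"Member": "pg-1", "Role": "Replica", "State": "streaming"},
])
print(patroni_healthy(sample))
```

Requiring exactly one leader catches both a leaderless cluster and a split-brain report in the same check.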

1 Troubleshooting Command

Contributed by Jonathan Funk

Codecollection: rw-public-codecollection


Checks the health of a Kubernetes API server using kubectl. Returns 1 when the API server is healthy and 0 when it is unhealthy.

Tasks:
  • Running Kubectl Check Against API Server
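
A minimal sketch of the 1/0 mapping, assuming the check queries the API server's readiness endpoint with `kubectl get --raw /readyz` (a real kubectl invocation; whether this codebundle uses that exact endpoint is an assumption). Both function names are hypothetical.

```python
import subprocess


def api_server_score(readyz_output: str, exit_code: int) -> int:
    """Map a /readyz result to the SLI convention: 1 = healthy, 0 = unhealthy.

    Assumes a healthy API server answers /readyz with the body "ok" and that a
    non-zero kubectl exit code means the probe failed.
    """
    return 1 if exit_code == 0 and readyz_output.strip() == "ok" else 0


def check_api_server(context: str) -> int:
    """Run the probe; requires kubectl on PATH and a kubeconfig containing `context`."""
    proc = subprocess.run(
        ["kubectl", "--context", context, "get", "--raw", "/readyz"],
        capture_output=True,
        text=True,
    )
    return api_server_score(proc.stdout, proc.returncode)
```

Keeping the parsing in `api_server_score` separate from the subprocess call makes the scoring rule testable without a cluster.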

4 Troubleshooting Commands

Contributed by Jonathan Funk

Codecollection: rw-public-codecollection


A taskset for troubleshooting general issues associated with typical Kubernetes deployment resources. Supports API interactions through both the API client and the kubectl binary via RunWhen Shell Services.

Tasks:
  • Troubleshoot Resourcing
  • Troubleshoot Events
  • Troubleshoot PVC
  • Troubleshoot Pods

1 Troubleshooting Command

Contributed by Jonathan Funk

Codecollection: rw-public-codecollection


This codebundle runs an arbitrary kubectl command and writes its stdout to a report. It is typically used in conjunction with other codebundles.

Tasks:
  • Running Kubectl And Adding Stdout To Report

5 Troubleshooting Commands

Contributed by Jonathan Funk

Codecollection: rw-public-codecollection


This taskset runs general troubleshooting checks against all applicable objects in a namespace, checks error events, and searches pod logs for error entries.

Tasks:
  • Trace Namespace Errors
  • Fetch Unready Pods
  • Triage Namespace
  • Object Condition Check
  • Namespace Get All

4 Troubleshooting Commands

Contributed by Shea Stewart

Codecollection: rw-public-codecollection


This SLI uses kubectl to score namespace health. It produces a value between 0 (completely failing the test) and 1 (fully passing the test) by looking for container restarts, events, and pods that are not ready.

Tasks:
  • Get Event Count and Score
  • Get Container Restarts and Score
  • Get NotReady Pods
  • Generate Namspace Score
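
One way the three signals could be combined into a single value between 0 and 1, assuming each check contributes a pass/fail sub-score against a threshold and the final score is their mean. The thresholds and function names are hypothetical; the codebundle's actual weighting may differ.

```python
def binary_score(count: int, threshold: int) -> int:
    """1 when the observed count is within the allowed threshold, otherwise 0."""
    return 1 if count <= threshold else 0


def namespace_score(error_events: int, container_restarts: int, not_ready_pods: int,
                    event_threshold: int = 0, restart_threshold: int = 0,
                    not_ready_threshold: int = 0) -> float:
    """Average of the three pass/fail sub-scores, so the result lies in [0, 1]."""
    sub_scores = [
        binary_score(error_events, event_threshold),
        binary_score(container_restarts, restart_threshold),
        binary_score(not_ready_pods, not_ready_threshold),
    ]
    return sum(sub_scores) / len(sub_scores)


print(namespace_score(error_events=2, container_restarts=0, not_ready_pods=0))
```

Averaging binary sub-scores means a single failing signal lowers the score by one third rather than driving it straight to 0.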

1 Troubleshooting Command

Contributed by Shea Stewart

Codecollection: rw-public-codecollection


This codebundle runs a kubectl get command that produces a value and pushes it as a metric. It uses JMESPath for filtering and supports calculations such as count, sum, and average on specified fields.

Tasks:
  • Running Kubectl get and push the metric
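
The codebundle filters with JMESPath; to stay dependency-free, this sketch substitutes a simple dotted-path lookup instead (it is not JMESPath and supports no projections or filter expressions). It parses the JSON printed by `kubectl get <kind> -o json` and applies a count, sum, or average to one field. Function names are hypothetical.

```python
import json
from typing import Any


def get_path(obj: Any, dotted: str) -> Any:
    """Follow a dotted path of dict keys, e.g. "spec.replicas"; return None if absent."""
    for key in dotted.split("."):
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj


def aggregate(kubectl_json: str, field: str, calculation: str) -> float:
    """Apply count, sum, or avg to one field across the items of a kubectl JSON list."""
    items = json.loads(kubectl_json).get("items", [])
    values = [get_path(item, field) for item in items]
    present = [v for v in values if v is not None]
    numbers = [float(v) for v in present if isinstance(v, (int, float))]
    if calculation == "count":
        return float(len(present))
    if calculation == "sum":
        return float(sum(numbers))
    if calculation == "avg":
        return sum(numbers) / len(numbers) if numbers else 0.0
    raise ValueError(f"unsupported calculation: {calculation}")


deployments = json.dumps({"items": [{"spec": {"replicas": 2}}, {"spec": {"replicas": 4}}]})
print(aggregate(deployments, "spec.replicas", "avg"))
```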

4 Troubleshooting Commands

Contributed by Jonathan Funk

Codecollection: rw-public-codecollection


A taskset for troubleshooting issues for StatefulSets and their related resources.

Tasks:
  • Check StatefulSets Replicas Ready
  • Get Events For The StatefulSet
  • Get StatefulSet Logs
  • Get StatefulSet Manifests Dump

7 Troubleshooting Commands

Contributed by Shea Stewart

Codecollection: rw-public-codecollection


Runs multiple Kubernetes and psql commands to report on the health of a PostgreSQL cluster.

Tasks:
  • Get Standard Resources
  • Describe Custom Resources
  • Get Pod Logs & Events
  • Get Pod Resource Utilization
  • Get Running Configuration
  • Get Patroni Output
  • Run DB Queries

1 Troubleshooting Command

Contributed by Jonathan Funk

Codecollection: rw-public-codecollection


Searches a namespace for matching objects and provides the commands to decommission them.

Tasks:
  • Generate Decomission Commands
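
A sketch of how matching objects could be turned into delete commands, assuming the input is the JSON list printed by `kubectl get <kinds> -n <namespace> -o json` and that "matching" means the object name contains a given substring (a hypothetical criterion). The commands are only built for review, not executed.

```python
import json
import shlex
from typing import List


def decommission_commands(objects_json: str, name_substring: str) -> List[str]:
    """Build one `kubectl delete` command per object whose name contains name_substring."""
    commands = []
    for obj in json.loads(objects_json).get("items", []):
        meta = obj.get("metadata", {})
        name = meta.get("name", "")
        if name_substring not in name:
            continue
        resource = f"{obj['kind'].lower()}/{name}"  # e.g. "deployment/legacy-api"
        namespace = meta.get("namespace", "default")
        commands.append(shlex.join(["kubectl", "delete", resource, "-n", namespace]))
    return commands


listing = json.dumps({"items": [
    {"kind": "Deployment", "metadata": {"name": "legacy-api", "namespace": "shop"}},
    {"kind": "Service", "metadata": {"name": "legacy-api", "namespace": "shop"}},
    {"kind": "Deployment", "metadata": {"name": "checkout", "namespace": "shop"}},
]})
for command in decommission_commands(listing, "legacy"):
    print(command)
```

`shlex.join` quotes any name containing shell-special characters, so the printed commands can be pasted into a shell safely.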

1 Troubleshooting Command

Contributed by Jonathan Funk

Codecollection: rw-public-codecollection


Returns the number of events with matching messages as an SLI metric.

Tasks:
  • Get Number Of Matching Events
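
A minimal sketch of the metric, assuming events are fetched with `kubectl get events -n <namespace> -o json` and their `message` fields are matched with a case-insensitive regular expression. The function name and the example pattern are illustrative.

```python
import json
import re


def count_matching_events(events_json: str, pattern: str) -> int:
    """Count events whose message matches a case-insensitive regular expression."""
    regex = re.compile(pattern, re.IGNORECASE)
    items = json.loads(events_json).get("items", [])
    return sum(1 for event in items if regex.search(event.get("message") or ""))


events = json.dumps({"items": [
    {"reason": "BackOff", "message": "Back-off restarting failed container"},
    {"reason": "Pulled", "message": "Successfully pulled image"},
    {"reason": "BackOff", "message": "Back-off pulling image"},
]})
print(count_matching_events(events, r"back-off"))
```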

3 Troubleshooting Commands

Contributed by Jonathan Funk

Codecollection: rw-public-codecollection


Triages issues related to a deployment's replicas.

Tasks:
  • Fetch Logs
  • Get Related Events
  • Check Deployment Replicas

1 Troubleshooting Command

Contributed by Jonathan Funk

Codecollection: rw-public-codecollection


Detects and reinitializes lagging Patroni cluster members that are unable to catch up in replication, using kubectl and patronictl.

Tasks:
  • Determine Patroni Health

1 Troubleshooting Command

Contributed by Jonathan Funk

Codecollection: rw-public-codecollection


Measures the maximum replica lag across a Patroni cluster.

Tasks:
  • Measure Patroni Member Lag
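
A minimal sketch of the measurement, assuming the member list comes from `patronictl list -f json` and that each row carries a `Lag in MB` key mirroring patronictl's table column. The leader's lag is usually empty, and the exact key name may vary by Patroni version. `max_member_lag_mb` is a hypothetical helper.

```python
import json


def max_member_lag_mb(patronictl_json: str) -> float:
    """Return the largest replication lag (MB) across all members, or 0.0 if none."""
    lags = []
    for member in json.loads(patronictl_json):
        raw = member.get("Lag in MB")
        try:
            lags.append(float(raw))
        except (TypeError, ValueError):
            # Empty, missing, or non-numeric values (e.g. "unknown") count as 0 here;
            # a stricter check might flag them instead, since they can hide a broken replica.
            lags.append(0.0)
    return max(lags, default=0.0)


rows = json.dumps([
    {"Member": "pg-0", "Role": "Leader", "Lag in MB": ""},
    {"Member": "pg-1", "Role": "Replica", "Lag in MB": 0},
    {"Member": "pg-2", "Role": "Replica", "Lag in MB": 12},
])
print(max_member_lag_mb(rows))
```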

1 Troubleshooting Command

Contributed by Jonathan Funk

Codecollection: rw-public-codecollection


Checks that the current state of a DaemonSet is healthy and returns a score of either 1 (healthy) or 0 (unhealthy).

Tasks:
  • Health Check Daemonset
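
A minimal sketch of the 1/0 check, assuming the DaemonSet is read with `kubectl get daemonset <name> -o json`. The status field names come from the Kubernetes DaemonSetStatus API; which of them this codebundle actually compares is an assumption, and the helper name is hypothetical.

```python
import json


def daemonset_health(ds_json: str) -> int:
    """Return 1 when every scheduled pod is current, ready, and available, else 0."""
    status = json.loads(ds_json).get("status", {})
    desired = status.get("desiredNumberScheduled", 0)
    checks = [
        status.get("currentNumberScheduled", 0) == desired,
        status.get("numberReady", 0) == desired,
        status.get("numberAvailable", 0) == desired,  # omitted by the API when zero
        status.get("numberMisscheduled", 0) == 0,
    ]
    return 1 if all(checks) else 0
```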

1 Troubleshooting Command

Contributed by Jonathan Funk

Codecollection: rw-public-codecollection


Creates an ad hoc, one-shot job that mounts a PVC as a canary test; the job is polled for success before being torn down.

Tasks:
  • Run Canary Job

3 Troubleshooting Commands

Contributed by Jonathan Funk

Codecollection: rw-public-codecollection


A taskset to triage issues related to Patroni.

Tasks:
  • Get Patroni Status
  • Get Pods Status
  • Fetch Logs

1 Troubleshooting Command

Contributed by Shea Stewart

Codecollection: rw-public-codecollection


Runs a PostgreSQL query and pushes the returned result into a report. During execution, the SQL query is passed to a Kubernetes workload that has access to the psql binary; the workload runs the query and returns the results from stdout.

Tasks:
  • Run Postgres Query And Results to Report

10 Troubleshooting Commands

Contributed by stewartshea

Codecollection: rw-cli-codecollection


Triages issues related to a StatefulSet and its pods, including persistent volumes and ordered deployment characteristics.

Tasks:
  • Analyze Application Log Patterns for StatefulSet `STATEFULSET_NAME` in Namespace `NAMESPACE`
  • Detect Log Anomalies for StatefulSet `STATEFULSET_NAME` in Namespace `NAMESPACE`
  • Check Liveness Probe Configuration for StatefulSet `STATEFULSET_NAME`
  • Check Readiness Probe Configuration for StatefulSet `STATEFULSET_NAME` in Namespace `NAMESPACE`
  • Check for Container Restarts in StatefulSet `STATEFULSET_NAME` in Namespace `NAMESPACE`
  • Inspect StatefulSet Warning Events for `STATEFULSET_NAME` in Namespace `NAMESPACE`
  • Fetch StatefulSet Workload Details For `STATEFULSET_NAME` in Namespace `NAMESPACE`
  • Inspect StatefulSet Replicas for `STATEFULSET_NAME` in namespace `NAMESPACE`
  • Check StatefulSet PersistentVolumeClaims for `STATEFULSET_NAME` in Namespace `NAMESPACE`
  • Identify Recent Configuration Changes for StatefulSet `STATEFULSET_NAME` in Namespace `NAMESPACE`

4 Troubleshooting Commands

Contributed by stewartshea

Codecollection: rw-cli-codecollection


Provides a list of tasks that can remediate configuration issues with manifests in GitHub-based GitOps repositories.

Tasks:
  • Remediate Readiness and Liveness Probe GitOps Manifests in Namespace `NAMESPACE`
  • Increase ResourceQuota Limit for Namespace `NAMESPACE` in GitHub GitOps Repository
  • Adjust Pod Resources to Match VPA Recommendation in `NAMESPACE`
  • Expand Persistent Volume Claims in Namespace `NAMESPACE`

3 Troubleshooting Commands

Contributed by stewartshea

Codecollection: rw-cli-codecollection


Checks the overall health of certificates in a namespace that are managed by cert-manager.

Tasks:
  • Get Namespace Certificate Summary for Namespace `NAMESPACE`
  • Find Unhealthy Certificates in Namespace `NAMESPACE`
  • Find Failed Certificate Requests and Identify Issues for Namespace `NAMESPACE`

1 Troubleshooting Command

Contributed by stewartshea

Codecollection: rw-cli-codecollection


Counts the number of unhealthy cert-manager managed certificates in a namespace.

Tasks:
  • Count Unready and Expired Certificates in Namespace `${NAMESPACE}`
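
A minimal sketch of the count, assuming Certificates are read with `kubectl get certificates.cert-manager.io -n <namespace> -o json`. `status.conditions` (type `Ready`) and `status.notAfter` are cert-manager Certificate status fields; the helper names are hypothetical.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def is_ready(cert: dict) -> bool:
    """True when the Certificate has a Ready condition with status "True"."""
    for condition in cert.get("status", {}).get("conditions", []):
        if condition.get("type") == "Ready":
            return condition.get("status") == "True"
    return False


def is_expired(cert: dict, now: datetime) -> bool:
    """True when status.notAfter (RFC 3339, e.g. "2024-01-31T00:00:00Z") is not in the future."""
    not_after = cert.get("status", {}).get("notAfter")
    if not not_after:
        return False
    expiry = datetime.fromisoformat(not_after.replace("Z", "+00:00"))
    return expiry <= now


def count_unhealthy_certificates(certs_json: str, now: Optional[datetime] = None) -> int:
    """Count Certificates that are either not Ready or already expired."""
    now = now or datetime.now(timezone.utc)
    items = json.loads(certs_json).get("items", [])
    return sum(1 for cert in items if not is_ready(cert) or is_expired(cert, now))
```

Passing `now` explicitly keeps the check deterministic in tests; in normal use it defaults to the current UTC time.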

2 Troubleshooting Commands

Contributed by jon-funk

Codecollection: rw-cli-codecollection


This taskset collects information about persistent volumes and persistent volume claims to validate health or help troubleshoot potential issues.

Tasks:
  • Query The Jenkins Kubernetes Workload HTTP Endpoint in Kubernetes StatefulSet `STATEFULSET_NAME`
  • Query For Stuck Jenkins Jobs in Kubernetes Statefulset Workload `STATEFULSET_NAME`

1 Troubleshooting Command

Contributed by stewartshea

Codecollection: rw-cli-codecollection


This taskset provides tasks to troubleshoot service accounts in a Kubernetes namespace.

Tasks:
  • Test Service Account Access to Kubernetes API Server in Namespace `NAMESPACE`

5 Troubleshooting Commands

Contributed by stewartshea

Codecollection: rw-cli-codecollection


This codebundle runs a series of tasks to identify potential Helm release issues related to Flux-managed Helm objects.

Tasks:
  • List all available FluxCD Helmreleases in Namespace `NAMESPACE`
  • Fetch Installed FluxCD Helmrelease Versions in Namespace `NAMESPACE`
  • Fetch Mismatched FluxCD HelmRelease Version in Namespace `NAMESPACE`
  • Fetch FluxCD HelmRelease Error Messages in Namespace `NAMESPACE`
  • Check for Available Helm Chart Updates in Namespace `NAMESPACE`

13 Troubleshooting Commands

Contributed by stewartshea

Codecollection: rw-cli-codecollection


Performs operational tasks for a Kubernetes deployment.

Tasks:
  • Restart Deployment `DEPLOYMENT_NAME` in Namespace `NAMESPACE`
  • Force Delete Pods in Deployment `DEPLOYMENT_NAME` in Namespace `NAMESPACE`
  • Rollback Deployment `DEPLOYMENT_NAME` in Namespace `NAMESPACE` to Previous Version
  • Scale Down Deployment `DEPLOYMENT_NAME` in Namespace `NAMESPACE`
  • Scale Up Deployment `DEPLOYMENT_NAME` in Namespace `NAMESPACE` by SCALE_UP_FACTORx
  • Clean Up Stale ReplicaSets for Deployment `DEPLOYMENT_NAME` in Namespace `NAMESPACE`
  • Scale Down Stale ReplicaSets for Deployment `DEPLOYMENT_NAME` in Namespace `NAMESPACE`
  • Scale Up HPA for Deployment `DEPLOYMENT_NAME` in Namespace `NAMESPACE` by HPA_SCALE_FACTORx
  • Scale Down HPA for Deployment `DEPLOYMENT_NAME` in Namespace `NAMESPACE` to Min HPA_MIN_REPLICAS
  • Increase CPU Resources for Deployment `DEPLOYMENT_NAME` in Namespace `NAMESPACE`
  • Increase Memory Resources for Deployment `DEPLOYMENT_NAME` in Namespace `NAMESPACE`
  • Decrease CPU Resources for Deployment `DEPLOYMENT_NAME` in Namespace `NAMESPACE`
  • Decrease Memory Resources for Deployment `DEPLOYMENT_NAME` in Namespace `NAMESPACE`

3 Troubleshooting Commands

Contributed by jon-funk

Codecollection: rw-cli-codecollection


This taskset restarts a resource with a given set of labels, typically used with other tasksets.

Tasks:
  • Get Current Resource State with Labels `LABELS`
  • Get Resource Logs with Labels `LABELS`
  • Restart Resource with Labels `LABELS` in `CONTEXT`

1 Troubleshooting Command

Contributed by akshayrw25

Codecollection: rw-cli-codecollection


Detects and analyzes stacktraces/tracebacks in Kubernetes workload logs for troubleshooting application issues.

Tasks:
  • Analyze Workload Stacktraces for WORKLOAD_TYPE `WORKLOAD_NAME` in Namespace `NAMESPACE`

2 Troubleshooting Commands

Contributed by akshayrw25

Codecollection: rw-cli-codecollection


This SLI monitors stacktrace health in Kubernetes workload application logs. It produces a value between 0 (stacktraces detected) and 1 (no stacktraces found), focusing specifically on application error detection through stacktrace analysis.

Tasks:
  • Get Stacktrace Health Score for ${WORKLOAD_TYPE} `${WORKLOAD_NAME}`
  • Generate Stacktrace Health Score for `${WORKLOAD_NAME}`

5 Troubleshooting Commands

Contributed by jon-funk

Codecollection: rw-cli-codecollection


This taskset investigates the logs, state, and health of the Prometheus Operator in Kubernetes.

Tasks:
  • Check Prometheus Service Monitors in namespace `NAMESPACE`
  • Check For Successful Rule Setup in Kubernetes Namespace `NAMESPACE`
  • Verify Prometheus RBAC Can Access ServiceMonitors in Namespace `PROM_NAMESPACE`
  • Inspect Prometheus Operator Logs for Scraping Errors in Namespace `NAMESPACE`
  • Check Prometheus API Healthy in Namespace `PROM_NAMESPACE`

1 Troubleshooting Command

Contributed by stewartshea

Codecollection: rw-cli-codecollection


Evaluates cluster node health using kubectl.

Tasks:
  • Check for Node Restarts in Cluster `CONTEXT` within Interval `RW_LOOKBACK_WINDOW`

2 Troubleshooting Commands

Contributed by stewartshea

Codecollection: rw-cli-codecollection


Evaluates cluster node health using kubectl.

Tasks:
  • Check for Node Restarts in Cluster `${CONTEXT}`
  • Generate Namespace Score in Kubernetes Cluster `$${CONTEXT}`

5 Troubleshooting Commands

Contributed by stewartshea

Codecollection: rw-cli-codecollection


This taskset collects information and runs general troubleshooting checks against ArgoCD application objects within a namespace.

Tasks:
  • Fetch ArgoCD Application Sync Status & Health for `APPLICATION`
  • Fetch ArgoCD Application Last Sync Operation Details for `APPLICATION`
  • Fetch Unhealthy ArgoCD Application Resources for `APPLICATION`
  • Scan For Errors in Pod Logs Related to ArgoCD Application `APPLICATION`
  • Fully Describe ArgoCD Application `APPLICATION`

1 Troubleshooting Command

Contributed by jon-funk

Codecollection: rw-cli-codecollection


Triages the open-source version of Artifactory running in a Kubernetes cluster.

Tasks:
  • Check Artifactory Liveness and Readiness Endpoints in `NAMESPACE`

2 Troubleshooting Commands

Contributed by stewartshea

Codecollection: rw-cli-codecollection


Triages issues related to ingress objects and services.

Tasks:
  • Fetch Ingress Object Health in Namespace `NAMESPACE`
  • Check for Ingress and Service Conflicts in Namespace `NAMESPACE`

2 Troubleshooting Commands

Contributed by jon-funk

Codecollection: rw-cli-codecollection


Performs application-level troubleshooting by inspecting the logs of a workload for parsable exceptions, and attempts to determine next steps.

Tasks:
  • Get `CONTAINER_NAME` Application Logs in Namespace `NAMESPACE`
  • Tail `CONTAINER_NAME` Application Logs For Stacktraces

1 Troubleshooting Command

Contributed by jon-funk

Codecollection: rw-cli-codecollection


Measures the number of exception stacktraces present in an application's logs over a time period.

Tasks:
  • Tail `${CONTAINER_NAME}` Application Logs For Stacktraces
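
A rough heuristic for counting stacktraces in a block of log text, assuming logs are fetched with `kubectl logs` and that traces follow the usual Python (`Traceback (most recent call last):`) or Java (indented `at pkg.Class.method(File.java:N)` frames) layouts. This is an illustrative sketch, not the codebundle's parser; for example, each Java `Caused by:` section is counted as a separate trace.

```python
import re

PY_HEADER = re.compile(r"^Traceback \(most recent call last\):")
JAVA_FRAME = re.compile(r"^\s+at [\w$.<>]+\(.*\)\s*$")


def count_stacktraces(log_text: str) -> int:
    """Count Python traceback headers plus runs of consecutive Java stack frames."""
    count = 0
    in_java_trace = False
    for line in log_text.splitlines():
        if PY_HEADER.match(line):
            count += 1
            in_java_trace = False
        elif JAVA_FRAME.match(line):
            if not in_java_trace:
                count += 1  # first frame of a new Java trace
                in_java_trace = True
        else:
            in_java_trace = False
    return count


log = (
    "INFO start\n"
    "Traceback (most recent call last):\n"
    '  File "app.py", line 1, in <module>\n'
    "ValueError: bad\n"
    "java.lang.IllegalStateException: boom\n"
    "\tat com.example.App.run(App.java:10)\n"
    "\tat com.example.App.main(App.java:5)\n"
    "INFO done\n"
)
print(count_stacktraces(log))
```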

3 Troubleshooting Commands

Contributed by stewartshea

Codecollection: rw-cli-codecollection


Identifies resource constraints or issues in a cluster.

Tasks:
  • Identify High Utilization Nodes for Cluster `CONTEXT`
  • Identify Pods Causing High Node Utilization in Cluster `CONTEXT`
  • Identify Pods with Resource Limits Exceeding Node Capacity in Cluster `CONTEXT`

3 Troubleshooting Commands

Contributed by stewartshea

Codecollection: rw-cli-codecollection


Counts the number of nodes above 90% CPU or memory utilization, as reported by kubectl top.

Tasks:
  • Identify High Utilization Nodes for Cluster `${CONTEXT}`
  • Identify Pods with Resource Limits Exceeding Node Capacity in Cluster `${CONTEXT}`
  • Generate Cluster Resource Health Score
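
A minimal sketch of the 90% count, assuming the input is the text output of `kubectl top nodes` in its five-column layout (name, CPU cores, CPU percent, memory bytes, memory percent). Nodes without metrics print `<unknown>` and are skipped here. The function name is hypothetical.

```python
def count_high_utilization_nodes(top_output: str, threshold: float = 90.0) -> int:
    """Count nodes whose CPU or memory percentage is above the threshold."""
    count = 0
    for line in top_output.strip().splitlines():
        fields = line.split()
        if len(fields) < 5 or fields[0] == "NAME":
            continue  # skip the header row or malformed lines
        try:
            cpu_pct = float(fields[2].rstrip("%"))
            mem_pct = float(fields[4].rstrip("%"))
        except ValueError:
            continue  # e.g. "<unknown>" when metrics are unavailable
        if cpu_pct > threshold or mem_pct > threshold:
            count += 1
    return count


top = (
    "node-a   250m        12%         2048Mi      55%\n"
    "node-b   3900m       97%         4096Mi      60%\n"
    "node-c   100m        5%          7900Mi      93%\n"
    "node-d   <unknown>   <unknown>   <unknown>   <unknown>\n"
)
print(count_high_utilization_nodes(top))
```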

5 Troubleshooting Commands

Contributed by stewartshea

Codecollection: rw-cli-codecollection


This taskset suspends a Flux resource so that chaos tasks can be executed against it.

Tasks:
  • Suspend the Flux Resource Reconciliation for `FLUX_RESOURCE_NAME` in namespace `FLUX_RESOURCE_NAMESPACE`
  • Select Random FluxCD Workload for Chaos Target in Namespace `FLUX_RESOURCE_NAMESPACE`
  • Execute Chaos Command on `TARGET_RESOURCE` in Namespace `TARGET_NAMESPACE`
  • Execute Additional Chaos Command on FLUX_RESOURCE_TYPE 'FLUX_RESOURCE_NAME' in namespace 'FLUX_RESOURCE_NAMESPACE'
  • Resume Flux Resource Reconciliation in `TARGET_NAMESPACE`

3 Troubleshooting Commands

Contributed by stewartshea

Codecollection: rw-cli-codecollection


This codebundle runs a series of tasks to identify potential Kustomization issues related to Flux-managed Kustomization objects.

Tasks:
  • List All FluxCD Kustomization objects in Namespace `NAMESPACE` in Cluster `CONTEXT`
  • List Suspended FluxCD Kustomization objects in Namespace `NAMESPACE` in Cluster `CONTEXT`
  • List Unready FluxCD Kustomizations in Namespace `NAMESPACE` in Cluster `CONTEXT`

3 Troubleshooting Commands

Contributed by stewartshea

Codecollection: rw-cli-codecollection


This codebundle checks for unhealthy or suspended FluxCD Kustomization objects.

Tasks:
  • List Suspended FluxCD Kustomization objects in Namespace `${NAMESPACE}` in Cluster `${CONTEXT}`
  • List Unready FluxCD Kustomizations in Namespace `${NAMESPACE}` in Cluster `${CONTEXT}`
  • Generate FluxCD Kustomization Health Score for Namespace `${NAMESPACE}` in Cluster `${CONTEXT}`

2 Troubleshooting Commands

Contributed by nmadhok

Codecollection: rw-cli-codecollection


This codebundle runs a series of tasks to identify potential Helm release issues related to ArgoCD-managed Helm objects.

Tasks:
  • Fetch all available ArgoCD Helm releases in namespace `NAMESPACE`
  • Fetch Installed ArgoCD Helm release versions in namespace `NAMESPACE`

4 Troubleshooting Commands

Contributed by jon-funk

Codecollection: rw-cli-codecollection


Inspects the resources provisioned for a given set of pods and raises issues or recommendations as necessary.

Tasks:
  • Show Pods Without Resource Limit or Resource Requests Set in Namespace `NAMESPACE`
  • Check Pod Resource Utilization with Top in Namespace `NAMESPACE`
  • Identify VPA Pod Resource Recommendations in Namespace `NAMESPACE`
  • Identify Overutilized Pods in Namespace `NAMESPACE`

10 Troubleshooting Commands

Contributed by stewartshea

Codecollection: rw-cli-codecollection


Triages issues related to a DaemonSet and its pods, including node scheduling and resource constraints.

Tasks:
  • Analyze Application Log Patterns for DaemonSet `DAEMONSET_NAME` in Namespace `NAMESPACE`
  • Detect Log Anomalies for DaemonSet `DAEMONSET_NAME` in Namespace `NAMESPACE`
  • Identify Recent Configuration Changes for DaemonSet `DAEMONSET_NAME` in Namespace `NAMESPACE`
  • Check Liveness Probe Configuration for DaemonSet `DAEMONSET_NAME`
  • Check Readiness Probe Configuration for DaemonSet `DAEMONSET_NAME` in Namespace `NAMESPACE`
  • Check for Container Restarts in DaemonSet `DAEMONSET_NAME` in Namespace `NAMESPACE`
  • Inspect DaemonSet Warning Events for `DAEMONSET_NAME` in Namespace `NAMESPACE`
  • Fetch DaemonSet Workload Details For `DAEMONSET_NAME` in Namespace `NAMESPACE`
  • Inspect DaemonSet Status for `DAEMONSET_NAME` in namespace `NAMESPACE`
  • Check Node Affinity and Tolerations for DaemonSet `DAEMONSET_NAME` in Namespace `NAMESPACE`

7 Troubleshooting Commands

Contributed by Nbarola

Codecollection: rw-cli-codecollection


Checks Istio proxy sidecar injection status, high memory and CPU usage, warnings and errors in logs, certificate validity, and configuration, and verifies the Istio installation.

Tasks:
  • Verify Istio Sidecar Injection for Cluster `CONTEXT`
  • Check Istio Sidecar Resource Usage for Cluster `CONTEXT`
  • Validate Istio Installation in Cluster `CONTEXT`
  • Check Istio Controlplane Logs For Errors in Cluster `CONTEXT`
  • Fetch Istio Proxy Logs in Cluster `CONTEXT`
  • Verify Istio SSL Certificates in Cluster `CONTEXT`
  • Check Istio Configuration Health in Cluster `CONTEXT`

8 Troubleshooting Commands

Contributed by Nbarola

Codecollection: rw-cli-codecollection


Checks Istio proxy sidecar injection status, high memory and CPU usage, warnings and errors in logs, certificate validity, and configuration, and verifies the Istio installation.

Tasks:
  • Verify Istio Sidecar Injection for Cluster `${CONTEXT}`
  • Check Istio Sidecar Resource Usage for Cluster `${CONTEXT}`
  • Validate Istio Installation in Cluster `${CONTEXT}`
  • Check Istio Controlplane Logs For Errors in Cluster `${CONTEXT}`
  • Fetch Istio Proxy Logs in Cluster `${CONTEXT}`
  • Verify Istio SSL Certificates in Cluster `${CONTEXT}`
  • Check Istio Configuration Health in Cluster `${CONTEXT}`
  • Generate Health Score for Cluster ${CONTEXT}

3 Troubleshooting Commands

Contributed by jon-funk

Codecollection: rw-cli-codecollection


Performs application-level troubleshooting by inspecting the logs of a workload for parsable exceptions, and attempts to determine next steps.

Tasks:
  • Get `CONTAINER_NAME` Application Logs from Workload `WORKLOAD_NAME` in Namespace `NAMESPACE`
  • Scan `CONTAINER_NAME` Application For Misconfigured Environment
  • Tail `CONTAINER_NAME` Application Logs For Stacktraces in Workload `WORKLOAD_NAME`

1 Troubleshooting Command

Contributed by jon-funk

Codecollection: rw-cli-codecollection


Measures the number of exception stacktraces present in an application's logs over a time period.

Tasks:
  • Measure Application Exceptions in `${NAMESPACE}`

4 Troubleshooting Commands

Contributed by jon-funk

Codecollection: rw-cli-codecollection


This taskset provides detailed information about the images used in a Kubernetes namespace.

Tasks:
  • Check Image Rollover Times for Namespace `NAMESPACE`
  • List Images and Tags for Every Container in Running Pods for Namespace `NAMESPACE`
  • List Images and Tags for Every Container in Failed Pods for Namespace `NAMESPACE`
  • List ImagePullBackOff Events and Test Path and Tags for Namespace `NAMESPACE`

2 Troubleshooting Commands

Contributed by jon-funk

Codecollection: rw-cli-codecollection


This taskset collects information on your Redis workload in your Kubernetes cluster and raises issues if any health checks fail.

Tasks:
  • Ping `DEPLOYMENT_NAME` Redis Workload
  • Verify `DEPLOYMENT_NAME` Redis Read Write Operation in Kubernetes

9 Troubleshooting Commands

Contributed by stewartshea

Codecollection: rw-cli-codecollection


This taskset runs general troubleshooting checks against all applicable objects in a namespace. Looks for warning events, odd or frequent normal events, restarting containers and failed or pending pods.

Tasks:
  • Inspect Warning Events in Namespace `NAMESPACE`
  • Inspect Container Restarts In Namespace `NAMESPACE`
  • Inspect Pending Pods In Namespace `NAMESPACE`
  • Inspect Failed Pods In Namespace `NAMESPACE`
  • Inspect Workload Status Conditions In Namespace `NAMESPACE`
  • Get Listing Of Resources In Namespace `NAMESPACE`
  • Check Event Anomalies in Namespace `NAMESPACE`
  • Check Missing or Risky PodDisruptionBudget Policies in Namepace `NAMESPACE`
  • Check Resource Quota Utilization in Namespace `NAMESPACE`

4 Troubleshooting Commands

Contributed by stewartshea

Codecollection: rw-cli-codecollection


This SLI uses kubectl to score namespace health. It produces a value between 0 (completely failing the test) and 1 (fully passing the test) by looking for container restarts, events, and pods that are not ready.

Tasks:
  • Get Error Event Count within ${RW_LOOKBACK_WINDOW} and calculate Score
  • Get Container Restarts and Score in Namespace `${NAMESPACE}`
  • Get NotReady Pods in `${NAMESPACE}`
  • Generate Namespace Score in `${NAMESPACE}`

6 Troubleshooting Commands

Contributed by stewartshea

Codecollection: rw-cli-codecollection


This taskset collects information about storage such as PersistentVolumes and PersistentVolumeClaims to validate health or help troubleshoot potential storage issues.

Tasks:
  • Fetch Events for Unhealthy Kubernetes PersistentVolumeClaims in Namespace `NAMESPACE`
  • List PersistentVolumeClaims in Terminating State in Namespace `NAMESPACE`
  • List PersistentVolumes in Terminating State in Namespace `NAMESPACE`
  • List Pods with Attached Volumes and Related PersistentVolume Details in Namespace `NAMESPACE`
  • Fetch the Storage Utilization for PVC Mounts in Namespace `NAMESPACE`
  • Check for RWO Persistent Volume Node Attachment Issues in Namespace `NAMESPACE`

2 Troubleshooting Commands

Contributed by stewartshea

Codecollection: rw-cli-codecollection


This SLI collects information about storage such as PersistentVolumes and PersistentVolumeClaims and generates an aggregated health score for the namespace: 1 = healthy, 0 = failed, and any value between 0 and 1 = degraded.

Tasks:
  • Fetch the Storage Utilization for PVC Mounts in Namespace `${NAMESPACE}`
  • Generate Namespace Score for Namespace `${NAMESPACE}`

10 Troubleshooting Commands

Contributed by stewartshea

Codecollection: rw-cli-codecollection


Triages issues related to a deployment and its replicas.

Tasks:
  • Analyze Application Log Patterns for Deployment `DEPLOYMENT_NAME` in Namespace `NAMESPACE`
  • Detect Event Anomalies for Deployment `DEPLOYMENT_NAME` in Namespace `NAMESPACE`
  • Fetch Deployment Logs for `DEPLOYMENT_NAME` in Namespace `NAMESPACE`
  • Check Liveness Probe Configuration for Deployment `DEPLOYMENT_NAME`
  • Check Readiness Probe Configuration for Deployment `DEPLOYMENT_NAME` in Namespace `NAMESPACE`
  • Inspect Deployment Warning Events for `DEPLOYMENT_NAME` in Namespace `NAMESPACE`
  • Check Deployment Replica Status for `DEPLOYMENT_NAME` in Namespace `NAMESPACE`
  • Inspect Container Restarts for Deployment `DEPLOYMENT_NAME` in Namespace `NAMESPACE`
  • Identify Recent Configuration Changes for Deployment `DEPLOYMENT_NAME` in Namespace `NAMESPACE`
  • Check HPA Health for Deployment `DEPLOYMENT_NAME` in Namespace `NAMESPACE`

6 Troubleshooting Commands

Contributed by stewartshea

Codecollection: rw-cli-codecollection


This SLI uses kubectl to score deployment health. Produces a value between 0 (completely failing the test) and 1 (fully passing the test). Looks for container restarts, critical log errors, pods not ready, deployment status, and recent events.

Tasks:
  • Get Container Restarts and Score for Deployment `${DEPLOYMENT_NAME}`
  • Get Critical Log Errors and Score for Deployment `${DEPLOYMENT_NAME}`
  • Get NotReady Pods Score for Deployment `${DEPLOYMENT_NAME}`
  • Get Deployment Replica Status and Score for `${DEPLOYMENT_NAME}`
  • Get Recent Warning Events Score for `${DEPLOYMENT_NAME}`
  • Generate Deployment Health Score for `${DEPLOYMENT_NAME}`

1 Troubleshooting Command

Contributed by stewartshea

Codecollection: rw-cli-codecollection


This codebundle fetches the number of running pods that match the provided set of labels.

Tasks:
  • Measure Number of Running Pods with Label in `${NAMESPACE}`
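
A minimal sketch of the measurement, assuming pods are listed with `kubectl get pods -n <namespace> -l <labels> -o json` and counted by `status.phase`. The filter could also be pushed server-side with kubectl's `--field-selector=status.phase=Running`. The function name is hypothetical.

```python
import json


def count_running_pods(pods_json: str) -> int:
    """Count pods in the list whose status.phase is "Running"."""
    items = json.loads(pods_json).get("items", [])
    return sum(1 for pod in items if pod.get("status", {}).get("phase") == "Running")


pods = json.dumps({"items": [
    {"metadata": {"name": "web-1"}, "status": {"phase": "Running"}},
    {"metadata": {"name": "web-2"}, "status": {"phase": "Pending"}},
    {"metadata": {"name": "web-3"}, "status": {"phase": "Running"}},
]})
print(count_running_pods(pods))
```

Note that phase `Running` only means at least one container has started; it does not imply the pod is Ready.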