OPENSHIFT

2 Troubleshooting Commands

Contributed by stewartshea

Codecollection: rw-cli-codecollection


This SLI collects information about storage such as PersistentVolumes and PersistentVolumeClaims and generates an aggregated health score for the namespace: 1 = Healthy, 0 = Failed, and any value between 0 and 1 = Degraded.

Tasks:
  • Fetch the Storage Utilization for PVC Mounts in Namespace `${NAMESPACE}`
  • Generate Namespace Score for Namespace `${NAMESPACE}`
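
As a rough illustration of the scoring scale described above, the sketch below averages per-check results into one namespace score and maps it to Healthy, Degraded, or Failed. Averaging is an assumption made for illustration, and the function names are hypothetical; the codebundle may weight its checks differently.

```python
def namespace_score(check_scores):
    """Average per-check scores (each between 0 and 1) into one value.

    Assumption: a plain mean; the real SLI may aggregate differently.
    """
    if not check_scores:
        raise ValueError("at least one check score is required")
    return sum(check_scores) / len(check_scores)


def classify(score):
    """Map a score to the labels used in the description above."""
    if score == 1:
        return "Healthy"
    if score == 0:
        return "Failed"
    return "Degraded"
```

For example, one passing check and one failing check average to 0.5, which falls in the Degraded band.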

6 Troubleshooting Commands

Contributed by stewartshea

Codecollection: rw-cli-codecollection


This taskset collects information about storage such as PersistentVolumes and PersistentVolumeClaims to validate health or help troubleshoot potential storage issues.

Tasks:
  • Fetch Events for Unhealthy Kubernetes PersistentVolumeClaims in Namespace `NAMESPACE`
  • List PersistentVolumeClaims in Terminating State in Namespace `NAMESPACE`
  • List PersistentVolumes in Terminating State in Namespace `NAMESPACE`
  • List Pods with Attached Volumes and Related PersistentVolume Details in Namespace `NAMESPACE`
  • Fetch the Storage Utilization for PVC Mounts in Namespace `NAMESPACE`
  • Check for RWO Persistent Volume Node Attachment Issues in Namespace `NAMESPACE`

10 Troubleshooting Commands

Contributed by stewartshea

Codecollection: rw-cli-codecollection


Triages issues related to a deployment and its replicas.

Tasks:
  • Check Deployment Log For Issues with `DEPLOYMENT_NAME`
  • Fetch Deployments Logs for `DEPLOYMENT_NAME` in Namespace `NAMESPACE` and Add to Report
    Common scenarios that might relate to this command or script:
    1. Troubleshooting a Kubernetes CrashLoopBackOff event: The DevOps or Site Reliability Engineer may need to use this command to view the logs of the deployment to identify any errors or issues causing the pod to continuously restart.
    2. Monitoring application performance: The engineer may use this command to view the logs of a specific deployment to monitor the performance of the application and identify any bottlenecks or errors affecting its functionality.
    3. Investigating security incidents: In the event of a security incident or breach, the engineer may need to review the logs of a specific deployment to identify any unauthorized access or unusual activity within the Kubernetes cluster.
    4. Debugging an application issue: When troubleshooting an issue with a specific application running on the Kubernetes cluster, the engineer may use this command to view the logs and identify the root cause of the problem.
    5. Analyzing resource usage: The engineer may use this command to view the logs of a deployment to analyze the resource usage of the application and identify any inefficiencies or areas for optimization within the Kubernetes environment.
  • Check Liveness Probe Configuration for Deployment `DEPLOYMENT_NAME`
  • Check Readiness Probe Configuration for Deployment `DEPLOYMENT_NAME` in Namespace `NAMESPACE`
  • Inspect Container Restarts for Deployment `DEPLOYMENT_NAME` in Namespace `NAMESPACE`
  • Inspect Deployment Warning Events for `DEPLOYMENT_NAME` in Namespace `NAMESPACE`
  • Fetch Deployment Workload Details For `DEPLOYMENT_NAME` in Namespace `NAMESPACE`
  • Inspect Deployment Replicas for `DEPLOYMENT_NAME` in namespace `NAMESPACE`
  • Check Deployment Event Anomalies for `DEPLOYMENT_NAME` in Namespace `NAMESPACE`
  • Check ReplicaSet Health for Deployment `DEPLOYMENT_NAME` in Namespace `NAMESPACE`

1 Troubleshooting Commands

Contributed by stewartshea

Codecollection: rw-cli-codecollection


Counts the number of unhealthy cert-manager managed certificates in a namespace.

Tasks:
  • Count Unready and Expired Certificates in Namespace `${NAMESPACE}`
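
The count described above can be sketched over the JSON returned by `kubectl get certificates -o json`. This is a minimal illustration, not the codebundle's implementation: it assumes each item carries the cert-manager `Ready` condition and a `status.notAfter` timestamp in `YYYY-MM-DDTHH:MM:SSZ` form, and the function names are hypothetical.

```python
from datetime import datetime, timezone


def is_ready(cert):
    """True only if the Certificate reports a Ready condition of "True"."""
    for cond in cert.get("status", {}).get("conditions", []):
        if cond.get("type") == "Ready":
            return cond.get("status") == "True"
    return False  # no Ready condition reported yet


def is_expired(cert, now):
    """True if status.notAfter is present and already in the past."""
    not_after = cert.get("status", {}).get("notAfter")
    if not not_after:
        return False
    expiry = datetime.strptime(not_after, "%Y-%m-%dT%H:%M:%SZ")
    return expiry.replace(tzinfo=timezone.utc) < now


def count_unhealthy_certificates(cert_list, now=None):
    """Count items that are unready or expired in a parsed certificate list."""
    now = now or datetime.now(timezone.utc)
    return sum(
        1
        for cert in cert_list.get("items", [])
        if not is_ready(cert) or is_expired(cert, now)
    )
```

A certificate that is both unready and expired is counted once, since the metric is a count of unhealthy certificates rather than of individual problems.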

3 Troubleshooting Commands

Contributed by stewartshea

Codecollection: rw-cli-codecollection


Checks the overall health of certificates in a namespace that are managed by cert-manager.

Tasks:
  • Get Namespace Certificate Summary for Namespace `NAMESPACE`
  • Find Unhealthy Certificates in Namespace `NAMESPACE`
  • Find Failed Certificate Requests and Identify Issues for Namespace `NAMESPACE`

2 Troubleshooting Commands

Contributed by stewartshea

Codecollection: rw-cli-codecollection


Triages issues related to ingress objects and services.

Tasks:
  • Fetch Ingress Object Health in Namespace `NAMESPACE`
  • Check for Ingress and Service Conflicts in Namespace `NAMESPACE`

8 Troubleshooting Commands

Contributed by jon-funk

Codecollection: rw-cli-codecollection


Triages issues related to a StatefulSet and its replicas.

Tasks:
  • Check Readiness Probe Configuration for StatefulSet `STATEFULSET_NAME`
  • Check Liveness Probe Configuration for StatefulSet `STATEFULSET_NAME`
  • Troubleshoot StatefulSet Warning Events for `STATEFULSET_NAME`
  • Check StatefulSet Event Anomalies for `STATEFULSET_NAME` in Namespace `NAMESPACE`
  • Fetch StatefulSet Logs for `STATEFULSET_NAME` in Namespace `NAMESPACE` and Add to Report
    Common scenarios that might relate to this command or script:
    1. Troubleshooting a Kubernetes CrashLoopBackOff event for a specific StatefulSet in a critical production environment to identify the root cause of the issue and minimize downtime.
    2. Monitoring the performance and behavior of a StatefulSet in a Kubernetes cluster by analyzing its logs to detect any abnormal patterns or errors.
    3. Investigating a reported issue with a specific application running as a StatefulSet in a Kubernetes cluster by examining its recent logs for potential error messages or exceptions.
    4. Analyzing the impact of recent changes or updates on a StatefulSet in a Kubernetes cluster by reviewing its log data to assess any abnormalities or unexpected behaviors.
    5. Performing routine maintenance or troubleshooting tasks related to a specific StatefulSet in a Kubernetes cluster, such as identifying and resolving errors or warnings in its logs.
  • Get Related StatefulSet `STATEFULSET_NAME` Events
    Common scenarios that might relate to this command or script:
    1. Troubleshooting a Kubernetes CrashLoopBackOff event in a production environment to ensure that the application is running smoothly and efficiently.
    2. Monitoring and managing resource utilization within a specific namespace to optimize performance and prevent potential issues.
    3. Investigating networking or connectivity issues within a Kubernetes cluster to ensure seamless communication between pods and services.
    4. Resolving deployment failures or errors within a StatefulSet to maintain the availability and stability of the application.
    5. Performing routine maintenance and checks on Kubernetes clusters to proactively identify and address any potential issues before they escalate.
  • Fetch Manifest Details for StatefulSet `STATEFULSET_NAME` in Namespace `NAMESPACE`
  • List Unhealthy Replica Counts for StatefulSets in Namespace `NAMESPACE`

1 Troubleshooting Commands

Contributed by stewartshea

Codecollection: rw-cli-codecollection


Counts the number of nodes above 90% CPU or Memory Utilization from kubectl top.

Tasks:
  • Identify High Utilization Nodes for Cluster `${CONTEXT}`
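
A minimal sketch of the count described above, working on the text printed by `kubectl top nodes`. It assumes the usual five-column layout (NAME, CPU(cores), CPU%, MEMORY(bytes), MEMORY%); the function name and the default threshold parameter are illustrative, not taken from the codebundle.

```python
def count_high_utilization_nodes(top_output, threshold=90):
    """Count nodes whose CPU% or MEMORY% is strictly above the threshold."""
    count = 0
    for line in top_output.strip().splitlines():
        fields = line.split()
        # Skip the header row and anything that is not a full data row.
        if len(fields) < 5 or fields[0] == "NAME":
            continue
        try:
            cpu = int(fields[2].rstrip("%"))
            mem = int(fields[4].rstrip("%"))
        except ValueError:
            # Non-numeric cells (for example, metrics not yet available).
            continue
        if cpu > threshold or mem > threshold:
            count += 1
    return count
```

Rows that sit exactly at the threshold are not counted, matching the "above 90%" wording.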

2 Troubleshooting Commands

Contributed by stewartshea

Codecollection: rw-cli-codecollection


Identify resource constraints or issues in a cluster.

Tasks:
  • Identify High Utilization Nodes for Cluster `CONTEXT`
  • Identify Pods Causing High Node Utilization in Cluster `CONTEXT`

5 Troubleshooting Commands

Contributed by stewartshea

Codecollection: rw-cli-codecollection


This codebundle runs a series of tasks to identify potential Helm release issues related to Flux-managed Helm objects.

Tasks:
  • List all available FluxCD Helmreleases in Namespace `NAMESPACE`
  • Fetch Installed FluxCD Helmrelease Versions in Namespace `NAMESPACE`
  • Fetch Mismatched FluxCD HelmRelease Version in Namespace `NAMESPACE`
  • Fetch FluxCD HelmRelease Error Messages in Namespace `NAMESPACE`
  • Check for Available Helm Chart Updates in Namespace `NAMESPACE`

1 Troubleshooting Commands

Contributed by stewartshea

Codecollection: rw-cli-codecollection


This taskset provides tasks to troubleshoot service accounts in a Kubernetes namespace.

Tasks:
  • Test Service Account Access to Kubernetes API Server in Namespace `NAMESPACE`

4 Troubleshooting Commands

Contributed by stewartshea

Codecollection: rw-cli-codecollection


This SLI uses kubectl to score namespace health. Produces a value between 0 (completely failing the test) and 1 (fully passing the test). Looks for container restarts, events, and pods not ready.

Tasks:
  • Get Error Event Count within ${EVENT_AGE} and calculate Score
  • Get Container Restarts and Score in Namespace `${NAMESPACE}`
  • Get NotReady Pods in `${NAMESPACE}`
  • Generate Namespace Score in `${NAMESPACE}`

9 Troubleshooting Commands

Contributed by stewartshea

Codecollection: rw-cli-codecollection


This taskset runs general troubleshooting checks against all applicable objects in a namespace. Looks for warning events, odd or frequent normal events, restarting containers and failed or pending pods.

Tasks:
  • Inspect Warning Events in Namespace `NAMESPACE`
  • Inspect Container Restarts In Namespace `NAMESPACE`
  • Inspect Pending Pods In Namespace `NAMESPACE`
  • Inspect Failed Pods In Namespace `NAMESPACE`
  • Inspect Workload Status Conditions In Namespace `NAMESPACE`
  • Get Listing Of Resources In Namespace `NAMESPACE`
  • Check Event Anomalies in Namespace `NAMESPACE`
  • Check Missing or Risky PodDisruptionBudget Policies in Namespace `NAMESPACE`
  • Check Resource Quota Utilization in Namespace `NAMESPACE`

2 Troubleshooting Commands

Contributed by jon-funk

Codecollection: rw-cli-codecollection


This taskset collects information on your Redis workload in your Kubernetes cluster and raises issues if any health checks fail.

Tasks:
  • Ping `DEPLOYMENT_NAME` Redis Workload
    Common scenarios that might relate to this command or script:
    1. Troubleshooting a Kubernetes CrashLoopBackOff event for a specific Redis deployment to see if the server is running properly and responding to commands.
    2. Performing routine health checks on Redis deployments within the Kubernetes cluster to ensure that the servers are operational and responsive.
    3. Checking the status of the Redis server after a recent deployment or upgrade to ensure that it is functioning as expected within the Kubernetes environment.
    4. Verifying the status of the Redis server in response to user-reported issues or errors related to data storage or retrieval.
    5. Investigating performance or latency issues within the Kubernetes cluster by inspecting the responsiveness of the Redis servers using the redis-cli PING command.
  • Verify `DEPLOYMENT_NAME` Redis Read Write Operation in Kubernetes
    Common scenarios that might relate to this command or script:
    1. Troubleshooting application performance issues related to Redis in a Kubernetes environment.
    2. Investigating and resolving connectivity issues between a Kubernetes deployment and the Redis database.
    3. Monitoring and diagnosing potential data inconsistencies or corruption in the Redis database within a Kubernetes cluster.
    4. Analyzing and troubleshooting CrashLoopBackOff events related to the Redis deployment in Kubernetes.
    5. Providing support for developers by retrieving specific key values from the Redis database within a Kubernetes environment for debugging purposes.

2 Troubleshooting Commands

Contributed by jon-funk

Codecollection: rw-cli-codecollection


This taskset collects information about persistent volumes and persistent volume claims to validate health or help troubleshoot potential issues.

Tasks:
  • Query The Jenkins Kubernetes Workload HTTP Endpoint in Kubernetes StatefulSet `STATEFULSET_NAME`
  • Query For Stuck Jenkins Jobs in Kubernetes Statefulset Workload `STATEFULSET_NAME`

2 Troubleshooting Commands

Contributed by nmadhok

Codecollection: rw-cli-codecollection


This codebundle runs a series of tasks to identify potential Helm release issues related to ArgoCD-managed Helm objects.

Tasks:
  • Fetch all available ArgoCD Helm releases in namespace `NAMESPACE`
  • Fetch Installed ArgoCD Helm release versions in namespace `NAMESPACE`

4 Troubleshooting Commands

Contributed by jon-funk

Codecollection: rw-cli-codecollection


This taskset provides detailed information about the images used in a Kubernetes namespace.

Tasks:
  • Check Image Rollover Times for Namespace `NAMESPACE`
  • List Images and Tags for Every Container in Running Pods for Namespace `NAMESPACE`
  • List Images and Tags for Every Container in Failed Pods for Namespace `NAMESPACE`
  • List ImagePullBackOff Events and Test Path and Tags for Namespace `NAMESPACE`

4 Troubleshooting Commands

Contributed by stewartshea

Codecollection: rw-cli-codecollection


Provides a list of tasks that can remediate configuration issues with manifests in GitHub-based GitOps repositories.

Tasks:
  • Remediate Readiness and Liveness Probe GitOps Manifests in Namespace `NAMESPACE`
  • Increase ResourceQuota Limit for Namespace `NAMESPACE` in GitHub GitOps Repository
  • Adjust Pod Resources to Match VPA Recommendation in `NAMESPACE`
  • Expand Persistent Volume Claims in Namespace `NAMESPACE`

2 Troubleshooting Commands

Contributed by stewartshea

Codecollection: rw-cli-codecollection


Evaluate cluster node health using kubectl.

Tasks:
  • Check for Node Restarts in Cluster `${CONTEXT}`
  • Generate Namespace Score in Kubernetes Cluster `${CONTEXT}`

1 Troubleshooting Commands

Contributed by stewartshea

Codecollection: rw-cli-codecollection


Evaluate cluster node health using kubectl.

Tasks:
  • Check for Node Restarts in Cluster `CONTEXT` within Interval `INTERVAL`

2 Troubleshooting Commands

Contributed by stewartshea

Codecollection: rw-cli-codecollection


This codebundle runs a series of tasks to identify potential Kustomization issues related to Flux managed Kustomization objects.

Tasks:
  • List all available FluxCD Kustomization objects in Namespace `NAMESPACE`
  • List Unready FluxCD Kustomizations in Namespace `NAMESPACE`

1 Troubleshooting Commands

Contributed by jon-funk

Codecollection: rw-cli-codecollection


Performs a triage on the Open Source version of Artifactory in a Kubernetes cluster.

Tasks:
  • Check Artifactory Liveness and Readiness Endpoints in `NAMESPACE`

1 Troubleshooting Commands

Contributed by jon-funk

Codecollection: rw-cli-codecollection


Measures the number of exception stacktraces present in an application's logs over a time period.

Tasks:
  • Measure Application Exceptions in `${NAMESPACE}`
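
One simple way to approximate the measurement described above is to count the lines that open a stack trace in a chunk of log text. The sketch below recognizes only two common openers (Python's `Traceback (most recent call last):` and Java's `Exception in thread`); the pattern list and function name are assumptions for illustration, and the codebundle's own parsers may cover more formats.

```python
import re

# Markers that typically start a stack trace. This list is an assumption
# and is deliberately small; extend it for other runtimes as needed.
TRACE_START = re.compile(
    r"Traceback \(most recent call last\):|Exception in thread "
)


def count_stacktraces(log_text):
    """Return how many stack-trace openers appear in the log text."""
    return len(TRACE_START.findall(log_text))
```

The time period would come from how the logs are fetched (for example, the `--since` option of `kubectl logs`), not from this function.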

3 Troubleshooting Commands

Contributed by jon-funk

Codecollection: rw-cli-codecollection


Performs application-level troubleshooting by inspecting the logs of a workload for parsable exceptions, and attempts to determine next steps.

Tasks:
  • Get `CONTAINER_NAME` Application Logs from Workload `WORKLOAD_NAME` in Namespace `NAMESPACE`
  • Scan `CONTAINER_NAME` Application For Misconfigured Environment
  • Tail `CONTAINER_NAME` Application Logs For Stacktraces in Workload `WORKLOAD_NAME`

1 Troubleshooting Commands

Contributed by stewartshea

Codecollection: rw-cli-codecollection


This codebundle fetches the number of running pods with the set of provided labels, letting you measure the number of running pods.

Tasks:
  • Measure Number of Running Pods with Label in `${NAMESPACE}`

7 Troubleshooting Commands

Contributed by stewartshea

Codecollection: rw-cli-codecollection


Perform operational tasks for a Kubernetes deployment.

Tasks:
  • Restart Deployment `DEPLOYMENT_NAME` in Namespace `NAMESPACE`
  • Force Delete Pods in Deployment `DEPLOYMENT_NAME` in Namespace `NAMESPACE`
  • Rollback Deployment `DEPLOYMENT_NAME` in Namespace `NAMESPACE` to Previous Version
  • Scale Down Deployment `DEPLOYMENT_NAME` in Namespace `NAMESPACE`
  • Scale Up Deployment `DEPLOYMENT_NAME` in Namespace `NAMESPACE` by SCALE_UP_FACTORx
  • Clean Up Stale ReplicaSets for Deployment `DEPLOYMENT_NAME` in Namespace `NAMESPACE`
  • Scale Down Stale ReplicaSets for Deployment `DEPLOYMENT_NAME` in Namespace `NAMESPACE`

4 Troubleshooting Commands

Contributed by jon-funk

Codecollection: rw-cli-codecollection


Inspects the resources provisioned for a given set of pods and raises issues or recommendations as necessary.

Tasks:
  • Show Pods Without Resource Limit or Resource Requests Set in Namespace `NAMESPACE`
  • Check Pod Resource Utilization with Top in Namespace `NAMESPACE`
  • Identify VPA Pod Resource Recommendations in Namespace `NAMESPACE`
  • Identify Overutilized Pods in Namespace `NAMESPACE`

3 Troubleshooting Commands

Contributed by jon-funk

Codecollection: rw-cli-codecollection


Triages issues related to a DaemonSet and its available replicas.

Tasks:
  • Get DaemonSet Logs for `DAEMONSET_NAME` and Add to Report
    Common scenarios that might relate to this command or script:
    1. Monitoring the health and performance of a specific DaemonSet in a Kubernetes cluster to troubleshoot any issues or anomalies.
    2. Investigating frequent CrashLoopBackOff events for a particular DaemonSet to identify the root cause and potential solutions.
    3. Analyzing the logs of a specific DaemonSet to track down errors or issues related to resource utilization, connectivity, or application functionality.
    4. Troubleshooting networking problems or intermittent failures for a DaemonSet by reviewing its recent log entries to identify patterns or recurring issues.
    5. Performing regular maintenance or checks on a specific DaemonSet to proactively identify and address any potential issues before they impact production environments.
  • Get Related Daemonset `DAEMONSET_NAME` Events in Namespace `NAMESPACE`
  • Check Daemonset `DAEMONSET_NAME` Replicas
    Common scenarios that might relate to this command or script:
    1. Troubleshooting Kubernetes CrashLoopBackOff events in a production environment to identify the root cause and resolve the issue.
    2. Conducting a routine check on various DaemonSets in a Kubernetes cluster to ensure they are running as expected and have the correct configuration.
    3. Investigating performance issues related to a specific DaemonSet in a Kubernetes cluster and using the command to gather detailed information for analysis.
    4. Auditing the status and configuration of all DaemonSets in a Kubernetes cluster as part of regular maintenance tasks.
    5. Resolving connectivity or networking issues affecting a specific DaemonSet by examining its current status and configuration with the command.

5 Troubleshooting Commands

Contributed by stewartshea

Codecollection: rw-cli-codecollection


This taskset is used to suspend a Flux resource for the purposes of executing chaos tasks.

Tasks:
  • Suspend the Flux Resource Reconciliation for `FLUX_RESOURCE_NAME` in namespace `FLUX_RESOURCE_NAMESPACE`
  • Select Random FluxCD Workload for Chaos Target in Namespace `FLUX_RESOURCE_NAMESPACE`
  • Execute Chaos Command on `TARGET_RESOURCE` in Namespace `TARGET_NAMESPACE`
  • Execute Additional Chaos Command on FLUX_RESOURCE_TYPE 'FLUX_RESOURCE_NAME' in namespace 'FLUX_RESOURCE_NAMESPACE'
  • Resume Flux Resource Reconciliation in `TARGET_NAMESPACE`

5 Troubleshooting Commands

Contributed by jon-funk

Codecollection: rw-cli-codecollection


This taskset investigates the logs, state, and health of the Kubernetes Prometheus operator.

Tasks:
  • Check Prometheus Service Monitors in namespace `NAMESPACE`
  • Check For Successful Rule Setup in Kubernetes Namespace `NAMESPACE`
  • Verify Prometheus RBAC Can Access ServiceMonitors in Namespace `PROM_NAMESPACE`
  • Inspect Prometheus Operator Logs for Scraping Errors in Namespace `NAMESPACE`
  • Check Prometheus API Healthy in Namespace `PROM_NAMESPACE`

1 Troubleshooting Commands

Contributed by jon-funk

Codecollection: rw-cli-codecollection


Measures the number of exception stacktraces present in an application's logs over a time period.

Tasks:
  • Tail `${CONTAINER_NAME}` Application Logs For Stacktraces

2 Troubleshooting Commands

Contributed by jon-funk

Codecollection: rw-cli-codecollection


Performs application-level troubleshooting by inspecting the logs of a workload for parsable exceptions, and attempts to determine next steps.

Tasks:
  • Get `CONTAINER_NAME` Application Logs in Namespace `NAMESPACE`
  • Tail `CONTAINER_NAME` Application Logs For Stacktraces

3 Troubleshooting Commands

Contributed by jon-funk

Codecollection: rw-cli-codecollection


This taskset restarts a resource with a given set of labels, typically used with other tasksets.

Tasks:
  • Get Current Resource State with Labels `LABELS`
  • Get Resource Logs with Labels `LABELS`
  • Restart Resource with Labels `LABELS` in `CONTEXT`

5 Troubleshooting Commands

Contributed by stewartshea

Codecollection: rw-cli-codecollection


This taskset collects information and runs general troubleshooting checks against ArgoCD application objects within a namespace.

Tasks:
  • Fetch ArgoCD Application Sync Status & Health for `APPLICATION`
  • Fetch ArgoCD Application Last Sync Operation Details for `APPLICATION`
  • Fetch Unhealthy ArgoCD Application Resources for `APPLICATION`
  • Scan For Errors in Pod Logs Related to ArgoCD Application `APPLICATION`
  • Fully Describe ArgoCD Application `APPLICATION`

3 Troubleshooting Commands

Contributed by Jonathan Funk

Codecollection: rw-public-codecollection


Triages issues related to a deployment's replicas.

Tasks:
  • Fetch Logs
  • Get Related Events
  • Check Deployment Replicas

4 Troubleshooting Commands

Contributed by Jonathan Funk

Codecollection: rw-public-codecollection


A taskset for troubleshooting general issues associated with typical Kubernetes deployment resources. Supports API interactions via both the API client and the kubectl binary through RunWhen Shell Services.

Tasks:
  • Troubleshoot Resourcing
  • Troubleshoot Events
  • Troubleshoot PVC
  • Troubleshoot Pods

1 Troubleshooting Commands

Contributed by Jonathan Funk

Codecollection: rw-public-codecollection


Searches a namespace for matching objects and provides the commands to decommission them.

Tasks:
  • Generate Decommission Commands

1 Troubleshooting Commands

Contributed by Jonathan Funk

Codecollection: rw-public-codecollection


Measures the maximum replica lag across a Patroni cluster.

Tasks:
  • Measure Patroni Member Lag
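
The "maximum replica lag" measurement can be sketched as a reduction over the member list of a Patroni cluster. The example assumes members shaped like the JSON printed by `patronictl list -f json`, with a numeric `Lag in MB` field on replicas; that field name is an assumption and may differ between Patroni versions, so it is exposed as a parameter. The function name is hypothetical.

```python
def max_member_lag(members, lag_key="Lag in MB"):
    """Return the largest numeric lag among members, or 0 if none report one.

    Members without the field (such as the leader) or with a non-numeric
    value (such as "unknown") are ignored.
    """
    lags = []
    for member in members:
        value = member.get(lag_key)
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            lags.append(value)
    return max(lags, default=0)
```

Taking the maximum rather than the mean means a single badly lagging replica is enough to move the metric.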

1 Troubleshooting Commands

Contributed by Jonathan Funk

Codecollection: rw-public-codecollection


Detects and reinitializes laggy Patroni cluster members that are unable to catch up in replication, using kubectl and patronictl.

Tasks:
  • Determine Patroni Health

7 Troubleshooting Commands

Contributed by Shea Stewart

Codecollection: rw-public-codecollection


Runs multiple Kubernetes and psql commands to report on the health of a postgres cluster.

Tasks:
  • Get Standard Resources
  • Describe Custom Resources
  • Get Pod Logs & Events
  • Get Pod Resource Utilization
  • Get Running Configuration
  • Get Patroni Output
  • Run DB Queries

1 Troubleshooting Commands

Contributed by Jonathan Funk

Codecollection: rw-public-codecollection


Check the health of a Kubernetes API server using kubectl. Returns 1 when OK, or a 0 in the case of an unhealthy API server.

Tasks:
  • Running Kubectl Check Against API Server

1 Troubleshooting Commands

Contributed by Shea Stewart

Codecollection: rw-public-codecollection


Runs a postgres SQL query and pushes the returned result into a report. During execution, the SQL query should be passed to a Kubernetes workload that has access to the psql binary. The workload will run the query and return the results from stdout.

Tasks:
  • Run Postgres Query And Results to Report

4 Troubleshooting Commands

Contributed by Shea Stewart

Codecollection: rw-public-codecollection


This SLI uses kubectl to score namespace health. Produces a value between 0 (completely failing the test) and 1 (fully passing the test). Looks for container restarts, events, and pods not ready.

Tasks:
  • Get Event Count and Score
  • Get Container Restarts and Score
  • Get NotReady Pods
  • Generate Namespace Score

5 Troubleshooting Commands

Contributed by Jonathan Funk

Codecollection: rw-public-codecollection


This taskset runs general troubleshooting checks against all applicable objects in a namespace, checks error events, and searches pod logs for error entries.

Tasks:
  • Trace Namespace Errors
  • Fetch Unready Pods
  • Triage Namespace
  • Object Condition Check
  • Namespace Get All

1 Troubleshooting Commands

Contributed by Jonathan Funk

Codecollection: rw-public-codecollection


Creates an ad hoc one-shot job that mounts a PVC as a canary test and is polled for success before being torn down.

Tasks:
  • Run Canary Job

1 Troubleshooting Commands

Contributed by Jonathan Funk

Codecollection: rw-public-codecollection


Retrieves aggregate data via the kubectl top command.

Tasks:
  • Running Kubectl Top And Extracting Metric Data

1 Troubleshooting Commands

Contributed by Shea Stewart

Codecollection: rw-public-codecollection


This codebundle runs a kubectl get command that produces a value and pushes the metric. Uses jmespath for filtering and allows calculations such as count, sum, avg on specified fields.

Tasks:
  • Running Kubectl get and push the metric
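
To illustrate the count, sum, and avg calculations mentioned above, the sketch below aggregates one field across the `items` of a parsed `kubectl get ... -o json` result. It deliberately swaps jmespath for a plain dotted-path lookup so that the example needs only the standard library; the function names and the path syntax are assumptions, not the codebundle's interface.

```python
def extract(obj, dotted_path):
    """Follow a dotted path such as "status.readyReplicas" through dicts."""
    for key in dotted_path.split("."):
        obj = obj[key]
    return obj


def aggregate(items, dotted_path, calculation):
    """Apply count, sum, or avg to one field across a list of objects."""
    values = [extract(item, dotted_path) for item in items]
    if calculation == "count":
        return len(values)
    if calculation == "sum":
        return sum(values)
    if calculation == "avg":
        return sum(values) / len(values) if values else 0
    raise ValueError(f"unsupported calculation: {calculation}")
```

With two deployments reporting 2 and 3 ready replicas, `aggregate(items, "status.readyReplicas", "sum")` yields 5 and `"avg"` yields 2.5.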

1 Troubleshooting Commands

Contributed by Jonathan Funk

Codecollection: rw-public-codecollection


Returns the number of events with matching messages as an SLI metric.

Tasks:
  • Get Number Of Matching Events

3 Troubleshooting Commands

Contributed by Jonathan Funk

Codecollection: rw-public-codecollection


Taskset to triage issues related to Patroni.

Tasks:
  • Get Patroni Status
  • Get Pods Status
  • Fetch Logs

4 Troubleshooting Commands

Contributed by Jonathan Funk

Codecollection: rw-public-codecollection


A taskset for troubleshooting issues for StatefulSets and their related resources.

Tasks:
  • Check StatefulSets Replicas Ready
  • Get Events For The StatefulSet
  • Get StatefulSet Logs
  • Get StatefulSet Manifests Dump

1 Troubleshooting Commands

Contributed by Jonathan Funk

Codecollection: rw-public-codecollection


Checks that the current state of a daemonset is healthy and returns a score of either 1 (healthy) or 0 (unhealthy).

Tasks:
  • Health Check Daemonset
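
A binary DaemonSet score like the one described above can be sketched from the object's status. The example assumes a DaemonSet parsed from `kubectl get daemonset <name> -o json` and treats it as healthy when `status.numberReady` has reached `status.desiredNumberScheduled`; that criterion and the function name are assumptions, and the codebundle may check additional fields.

```python
def daemonset_health(daemonset):
    """Return 1 if every desired pod is ready, otherwise 0."""
    status = daemonset.get("status", {})
    desired = status.get("desiredNumberScheduled")
    ready = status.get("numberReady")
    # A DaemonSet that has not reported status yet is treated as unhealthy.
    if desired is None or ready is None:
        return 0
    return 1 if ready >= desired else 0
```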

1 Troubleshooting Commands

Contributed by Jonathan Funk

Codecollection: rw-public-codecollection


Uses kubectl (or equivalent) to query the state of a Patroni cluster and determine if it's healthy.

Tasks:
  • Determine Patroni Health

1 Troubleshooting Commands

Contributed by Jonathan Funk

Codecollection: rw-public-codecollection


This codebundle runs an arbitrary kubectl command and writes the stdout to a report. Typically used in conjunction with other codebundles.

Tasks:
  • Running Kubectl And Adding Stdout To Report