GCP

1 Troubleshooting Commands

Contributed by stewartshea

Codecollection: rw-cli-codecollection


Count the number of Cloud Functions in an unhealthy state for a GCP Project.

Tasks:
  • Count unhealthy GCP Cloud Functions in GCP Project `${GCP_PROJECT_ID}`
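
The counting step can be sketched as follows. This is a minimal illustration, not the codebundle's implementation: it assumes the function list has already been fetched as JSON (for example with `gcloud functions list --format=json`), that 1st gen entries expose a `status` field and 2nd gen entries a `state` field, and that only `ACTIVE` counts as healthy. The `count_unhealthy` helper is hypothetical.

```python
import json

# Assumption: any status/state other than ACTIVE is treated as unhealthy.
HEALTHY = {"ACTIVE"}


def count_unhealthy(functions_json: str) -> int:
    """Count functions whose status/state is not in HEALTHY."""
    functions = json.loads(functions_json)
    unhealthy = 0
    for fn in functions:
        # Assumed field names: "status" (1st gen) or "state" (2nd gen).
        status = fn.get("status") or fn.get("state") or "UNKNOWN"
        if status not in HEALTHY:
            unhealthy += 1
    return unhealthy


# Hypothetical sample data standing in for real gcloud output.
sample = json.dumps([
    {"name": "fn-a", "status": "ACTIVE"},
    {"name": "fn-b", "state": "FAILED"},
    {"name": "fn-c"},  # no status at all is counted as unhealthy
])
print(count_unhealthy(sample))
```

A function with no recognizable status is counted as unhealthy here so that missing data does not hide a problem.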

2 Troubleshooting Commands

Contributed by stewartshea

Codecollection: rw-cli-codecollection


Identify problems related to GCP Cloud Function deployments

Tasks:
  • List Unhealthy Cloud Functions in GCP Project `GCP_PROJECT_ID`
  • Get Error Logs for Unhealthy Cloud Functions in GCP Project `GCP_PROJECT_ID`

4 Troubleshooting Commands

Contributed by stewartshea

Codecollection: rw-cli-codecollection


This SLI uses the GCP API or gcloud to score bucket health. It produces a value between 0 (completely failing the test) and 1 (fully passing the test). It flags buckets whose usage exceeds a threshold and buckets that are public.

Tasks:
  • Fetch GCP Bucket Storage Utilization for `${PROJECT_IDS}`
  • Check GCP Bucket Security Configuration for `${PROJECT_IDS}`
  • Fetch GCP Bucket Storage Operations Rate for `${PROJECT_IDS}`
  • Generate Bucket Score in Project `${PROJECT_IDS}`
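
One way such a 0-to-1 score could be computed is sketched below. The listing does not state the actual formula, so treating the score as the fraction of buckets that pass both checks is an assumption, as are the `Bucket` type, its fields, and the `usage_threshold_tb` parameter.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    # Hypothetical record; the real data would come from the GCP API or gcloud.
    name: str
    size_tb: float
    is_public: bool


def bucket_score(buckets: list[Bucket], usage_threshold_tb: float) -> float:
    """Return the fraction of buckets passing both checks (1.0 if there are none)."""
    if not buckets:
        return 1.0
    passing = sum(
        1 for b in buckets
        if b.size_tb <= usage_threshold_tb and not b.is_public
    )
    return passing / len(buckets)


buckets = [
    Bucket("logs", size_tb=0.2, is_public=False),
    Bucket("assets", size_tb=0.1, is_public=True),    # fails: public
    Bucket("backups", size_tb=3.0, is_public=False),  # fails: over threshold
    Bucket("tmp", size_tb=0.0, is_public=False),
]
print(bucket_score(buckets, usage_threshold_tb=1.0))
```

With this definition, 1 means every bucket passes and 0 means every bucket fails at least one check.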

4 Troubleshooting Commands

Contributed by stewartshea

Codecollection: rw-cli-codecollection


Inspect GCP Storage bucket usage and configuration.

Tasks:
  • Fetch GCP Bucket Storage Utilization for `PROJECT_IDS`
  • Add GCP Bucket Storage Configuration for `PROJECT_IDS` to Report
    Common scenarios that might relate to this command or script:
    1. Troubleshooting Kubernetes CrashLoopBackoff events: When a container in a Kubernetes pod repeatedly crashes and restarts, a DevOps or Site Reliability Engineer may need to use the Google Cloud Platform to gather information on the crash events and analyze the data in order to diagnose and resolve the issue.
    2. Accessing metadata for multiple buckets in different GCP projects: When managing multiple GCP projects and needing to retrieve metadata for all the buckets within each project, a DevOps or Site Reliability Engineer might use this Bash script to efficiently collect and organize the necessary data for analysis or reporting purposes.
    3. Creating a backup of bucket metadata: In preparation for a migration or data transfer, a DevOps or Site Reliability Engineer could use this script to generate a JSON file containing the metadata for all buckets in multiple GCP projects as part of a backup process.
    4. Auditing bucket access permissions: As a security measure, a DevOps or Site Reliability Engineer might utilize this script to regularly audit and review the access permissions for all buckets across various GCP projects to ensure compliance and proper data protection measures.
    5. Automating routine tasks: When needing to frequently gather and consolidate bucket metadata from multiple GCP projects for monitoring or reporting purposes, a DevOps or Site Reliability Engineer could employ this script to automate the process and streamline their workflow.
  • Check GCP Bucket Security Configuration for `PROJECT_IDS`
  • Fetch GCP Bucket Storage Operations Rate for `PROJECT_IDS`

5 Troubleshooting Commands

Contributed by stewartshea

Codecollection: rw-cli-codecollection


Troubleshoot GCE Ingress Resources related to GCP HTTP Load Balancer in GKE

Tasks:
  • Search For GCE Ingress Warnings in GKE Context `CONTEXT`
  • Identify Unhealthy GCE HTTP Ingress Backends in GKE Namespace `NAMESPACE`
  • Validate GCP HTTP Load Balancer Configurations in GCP Project `GCP_PROJECT_ID`
  • Fetch Network Error Logs from GCP Operations Manager for Ingress Backends in GCP Project `GCP_PROJECT_ID`
  • Review GCP Operations Logging Dashboard in GCP project `GCP_PROJECT_ID`

1 Troubleshooting Commands

Contributed by jon-funk

Codecollection: rw-cli-codecollection


Fetches logs from GCP using a configurable query and raises an issue with details on the most common errors.

Tasks:
  • Inspect GCP Logs For Common Errors in GCP Project `GCP_PROJECT_ID`

5 Troubleshooting Commands

Contributed by stewartshea

Codecollection: rw-cli-codecollection


Identifies issues affecting GKE Clusters in a GCP Project and creates a health score. A score of 1 is healthy; a score below 1 indicates unhealthy components.

Tasks:
  • Identify GKE Service Account Issues in GCP Project `${GCP_PROJECT_ID}`
  • Fetch GKE Recommendations for GCP Project `${GCP_PROJECT_ID}`
  • Fetch GKE Cluster Health for GCP Project `${GCP_PROJECT_ID}`
  • Check for Quota Related GKE Autoscaling Issues in GCP Project `${GCP_PROJECT_ID}`
  • Generate GKE Cluster Health Score

4 Troubleshooting Commands

Contributed by stewartshea

Codecollection: rw-cli-codecollection


Identify issues affecting GKE Clusters in a GCP Project

Tasks:
  • Identify GKE Service Account Issues in GCP Project `GCP_PROJECT_ID`
  • Fetch GKE Recommendations for GCP Project `GCP_PROJECT_ID`
  • Fetch GKE Cluster Health for GCP Project `GCP_PROJECT_ID`
  • Check for Quota Related GKE Autoscaling Issues in GCP Project `GCP_PROJECT_ID`

1 Troubleshooting Commands

Contributed by stewartshea

Codecollection: rw-cli-codecollection


Counts nodes that have been preempted within the defined time interval.

Tasks:
  • Count the number of nodes in active preempt operation in project `${GCP_PROJECT_ID}`
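
Counting preemptions inside a time window can be sketched as below. The sketch assumes the operations were already retrieved as a list of dictionaries (for example from `gcloud compute operations list --format=json`), that preemptions appear with `operationType` equal to `compute.instances.preempted`, and that `insertTime` is an ISO 8601 timestamp with a UTC offset. The `count_preempted` helper and the sample records are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def count_preempted(operations: list[dict], now: datetime, max_age: timedelta) -> int:
    """Count preemption operations whose insertTime is within max_age of now."""
    cutoff = now - max_age
    count = 0
    for op in operations:
        # Assumed marker for a preemption event.
        if op.get("operationType") != "compute.instances.preempted":
            continue
        started = datetime.fromisoformat(op["insertTime"])
        if started >= cutoff:
            count += 1
    return count


now = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)
operations = [
    {"operationType": "compute.instances.preempted",
     "insertTime": "2024-05-01T11:30:00.000+00:00"},  # inside the window
    {"operationType": "compute.instances.preempted",
     "insertTime": "2024-05-01T09:00:00.000+00:00"},  # too old
    {"operationType": "insert",
     "insertTime": "2024-05-01T11:45:00.000+00:00"},  # not a preemption
    {"operationType": "compute.instances.preempted",
     "insertTime": "2024-05-01T04:40:00.000-07:00"},  # 11:40 UTC, inside
]
print(count_preempted(operations, now, timedelta(hours=1)))
```

Comparing timezone-aware datetimes keeps the window correct even when the timestamps carry different UTC offsets.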

1 Troubleshooting Commands

Contributed by stewartshea

Codecollection: rw-cli-codecollection


List all GCP nodes that have been preempted in the previous time interval.

Tasks:
  • List all nodes in an active preempt operation for GCP Project `GCP_PROJECT_ID` within the last `AGE` hours
    Common scenarios that might relate to this command or script:
    1. Investigating and resolving performance issues in a Kubernetes cluster, such as high CPU or memory utilization, by identifying and addressing any bottlenecked or underperforming pods.
    2. Troubleshooting and resolving Kubernetes CrashLoopBackoff events, which occur when a container repeatedly crashes after starting up, by identifying the root cause and implementing appropriate fixes.
    3. Monitoring and optimizing resource allocation in a Kubernetes cluster to ensure efficient usage of compute instances and avoid potential overutilization or underutilization.
    4. Managing service account authentication for accessing resources within a Google Cloud Platform project and ensuring proper authorization and permissions are in place for preempted compute instances.
    5. Implementing automated alerting and reporting for preempted compute instances within a specific time frame in a Google Cloud Platform project, in order to gain insights into potential service disruptions and take proactive measures to mitigate them.

2 Troubleshooting Commands

Contributed by jon-funk

Codecollection: rw-cli-codecollection


Collects Nginx ingress host controller metrics from GMP on GCP and inspects the results for ingresses with an HTTP error code rate greater than zero over a configurable duration. Raises issues based on the number of ingresses with error codes.

Tasks:
  • Fetch Nginx HTTP Errors From GMP for Ingress `INGRESS_OBJECT_NAME`
  • Find Owner and Service Health for Ingress `INGRESS_OBJECT_NAME`

3 Troubleshooting Commands

Contributed by stewartshea

Codecollection: rw-cli-codecollection


Collects Kong ingress host metrics from GMP on GCP and inspects the results for ingresses with an HTTP error code rate greater than zero over a configurable duration. Raises issues based on the number of ingresses with error codes.

Tasks:
  • Check If Kong Ingress HTTP Error Rate Violates HTTP Error Threshold in GCP Project `GCP_PROJECT_ID`
  • Check If Kong Ingress HTTP Request Latency Violates Threshold in GCP Project `GCP_PROJECT_ID`
  • Check If Kong Ingress Controller Reports Upstream Errors in GCP Project `GCP_PROJECT_ID`

1 Troubleshooting Commands

Contributed by Jonathan Funk

Codecollection: rw-public-codecollection


Generate a link to the GCP Log Explorer.

Tasks:
  • Get GCP Log Dashboard URL For Given Log Query
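
Building such a link can be sketched as follows. The URL layout (`/logs/query;query=<encoded query>?project=<id>`) reflects how the Cloud Console's Logs Explorer links are commonly observed to look and is an assumption here, not something taken from the codebundle. The `log_explorer_url` helper is hypothetical.

```python
from urllib.parse import quote


def log_explorer_url(project_id: str, query: str) -> str:
    """Return a Logs Explorer link with the query percent-encoded into the path."""
    # safe="" forces every reserved character in the query to be encoded.
    encoded = quote(query, safe="")
    return (
        "https://console.cloud.google.com/logs/query"
        f";query={encoded}?project={project_id}"
    )


print(log_explorer_url("my-project", "severity>=ERROR"))
```

Percent-encoding the whole query matters because log filters routinely contain `=`, `"`, and spaces, which would otherwise break the link.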

1 Troubleshooting Commands

Contributed by Jonathan Funk

Codecollection: rw-public-codecollection


This codebundle sets up a monitor for a specific region and GCP product. It periodically checks for ongoing incidents using the history available at https://status.cloud.google.com/incidents.json, filtered by severity level.

Tasks:
  • Get Number of GCP Incidents Effecting My Workspace
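
The filtering step might look like the sketch below, applied to an already downloaded and parsed copy of the feed. The field names used (`end`, `severity`, `affected_products[].title`, `currently_affected_locations[].id`) are assumptions about the feed's schema and should be checked against the live JSON. The `ongoing_incidents` helper and the sample records are hypothetical.

```python
def ongoing_incidents(incidents: list[dict], product: str, region: str,
                      severities: set[str]) -> list[dict]:
    """Return incidents that are unresolved and match product, region, and severity."""
    matches = []
    for inc in incidents:
        # Assumption: a populated "end" field means the incident is resolved.
        if inc.get("end"):
            continue
        if inc.get("severity") not in severities:
            continue
        products = {p.get("title") for p in inc.get("affected_products", [])}
        locations = {loc.get("id") for loc in inc.get("currently_affected_locations", [])}
        if product in products and region in locations:
            matches.append(inc)
    return matches


# Hypothetical records shaped like the assumed schema.
incidents = [
    {"id": "a1", "severity": "high",
     "affected_products": [{"title": "Google Kubernetes Engine"}],
     "currently_affected_locations": [{"id": "us-central1"}]},
    {"id": "b2", "severity": "high", "end": "2024-05-01T10:00:00+00:00",
     "affected_products": [{"title": "Google Kubernetes Engine"}],
     "currently_affected_locations": [{"id": "us-central1"}]},
    {"id": "c3", "severity": "low",
     "affected_products": [{"title": "Google Kubernetes Engine"}],
     "currently_affected_locations": [{"id": "us-central1"}]},
]
hits = ongoing_incidents(incidents, "Google Kubernetes Engine", "us-central1", {"high", "medium"})
print(len(hits))
```

The count of matching incidents is what a periodic check would report as its metric.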

1 Troubleshooting Commands

Contributed by Jonathan Funk

Codecollection: rw-public-codecollection


Retrieve the number of results of a GCP Log Explorer query.

Tasks:
  • Running GCE Logging Query And Pushing Result Count Metric

1 Troubleshooting Commands

Contributed by

Codecollection: rw-public-codecollection


Performs a metric query using a Google MQL statement on the Ops Suite API and pushes the result as an SLI metric.

Tasks:
  • Running GCP OpsSuite Metric Query

5 Troubleshooting Commands

Contributed by Shea Stewart

Codecollection: rw-public-codecollection


Uses PromQL on the Ops Suite API to determine the health of a Kong-managed ingress resource and pushes the result as an SLI metric. Produces a 1 for a healthy resource, or 0 for an unhealthy resource.

Tasks:
  • Get Access Token
  • Get HTTP Error Rate
  • Get Upstream Health
  • Get Request Latency Rate
  • Generate Kong Ingress Score

8 Troubleshooting Commands

Contributed by Shea Stewart

Codecollection: rw-public-codecollection


Uses PromQL on the Ops Suite API to determine the health of a MongoDB database instance and pushes the result as an SLI metric. Produces a 1 for a healthy resource, or 0 for an unhealthy resource.

Tasks:
  • Get Access Token
  • Get Instance Status
  • Get Connection Utilization Rate
  • Get MongoDB Member State Health
  • Get MongoDB Replication Lag
  • Get MongoDB Queue Size
  • Get Assertion Rate
  • Generate MongoDB Score

1 Troubleshooting Commands

Contributed by Jonathan Funk

Codecollection: rw-public-codecollection


Run arbitrary gcloud commands and parse their output (for example, JSON) for values to be submitted as a metric.

Tasks:
  • Run Gcloud CLI Command and Push metric

1 Troubleshooting Commands

Contributed by Jonathan Funk

Codecollection: rw-public-codecollection


Run arbitrary gcloud commands and capture the stdout in a report.

Tasks:
  • Run Gcloud CLI Command and Push metric

1 Troubleshooting Commands

Contributed by Shea Stewart

Codecollection: rw-public-codecollection


Performs a metric query using a PromQL statement on the Ops Suite API and pushes the result as an SLI metric.

Tasks:
  • Run Prometheus Instant Query Against Google Prom API Endpoint
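
Preparing such an instant query can be sketched as follows. The sketch assumes the Prometheus-compatible endpoint of Google Cloud Managed Service for Prometheus at `monitoring.googleapis.com/v1/projects/<project>/location/global/prometheus/api/v1/query` and an OAuth access token obtained separately. The `build_instant_query` helper is hypothetical, and the request is only constructed, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_instant_query(project_id: str, promql: str, token: str) -> Request:
    """Build (but do not send) a GET request for a PromQL instant query."""
    base = (
        "https://monitoring.googleapis.com/v1/projects/"
        f"{project_id}/location/global/prometheus/api/v1/query"
    )
    # urlencode percent-encodes the PromQL expression for the query string.
    url = f"{base}?{urlencode({'query': promql})}"
    return Request(url, headers={"Authorization": f"Bearer {token}"})


req = build_instant_query("my-project", "up", "TOKEN")
print(req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would execute the query; the JSON response would then be reduced to a single value before being pushed as the SLI metric.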

1 Troubleshooting Commands

Contributed by stewartshea

Codecollection: rw-generic-codecollection


Runs a user-provided gcloud command and pushes the metric to the RunWhen Platform. The supplied command must produce a single, distinct metric value. Command line tools like jq are available.

Tasks:
  • ${TASK_TITLE}

1 Troubleshooting Commands

Contributed by stewartshea

Codecollection: rw-generic-codecollection


Runs a user-provided gcloud command.

Tasks:
  • TASK_TITLE