GCP

1 Troubleshooting Commands

Contributed by Jonathan Funk

Codecollection: rw-public-codecollection


Run arbitrary gcloud commands and parse their output (for example, JSON) for a value to be submitted as a metric.

Tasks:
  • Run Gcloud CLI Command and Push metric
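
As a rough illustration of the parsing step, the sketch below pulls one numeric value out of JSON text using a dotted path. The helper name `extract_metric`, the path syntax, and the sample output are assumptions made for this example; the codebundle's actual parsing options are not shown here.

```python
import json
from functools import reduce


def extract_metric(stdout: str, path: str) -> float:
    """Parse JSON text and return the numeric value found at a dotted path.

    List elements are addressed by integer segments, e.g. "0.size".
    """
    def step(node, key):
        # Index into lists with an integer, into dicts with the raw key.
        return node[int(key)] if isinstance(node, list) else node[key]

    value = reduce(step, path.split("."), json.loads(stdout))
    return float(value)


# Hypothetical, trimmed output of a command such as
# `gcloud compute instances list --format=json`.
sample_stdout = '[{"name": "vm-1", "disks": [{"diskSizeGb": "10"}]}]'
metric = extract_metric(sample_stdout, "0.disks.0.diskSizeGb")
print(metric)
```

The value is converted to `float` so that it can be submitted as a single metric; a path that does not resolve raises `KeyError` or `IndexError` rather than silently returning a default.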

1 Troubleshooting Commands

Contributed by Jonathan Funk

Codecollection: rw-public-codecollection


Run arbitrary gcloud commands and capture the stdout in a report.

Tasks:
  • Run Gcloud CLI Command and Push metric

1 Troubleshooting Commands

Contributed by Jonathan Funk

Codecollection: rw-public-codecollection


Retrieve the number of results of a GCP Log Explorer query.

Tasks:
  • Running GCE Logging Query And Pushing Result Count Metric

1 Troubleshooting Commands

Contributed by Jonathan Funk

Codecollection: rw-public-codecollection


Generate a link to the GCP Log Explorer.

Tasks:
  • Get GCP Log Dashboard URL For Given Log Query
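
A minimal sketch of how such a link could be built is shown below. The console URL layout (query percent-encoded after `;query=`, project passed as `?project=`) is an assumption based on how the Cloud Console currently formats Log Explorer links, and `log_explorer_url` is a hypothetical helper, not the codebundle's own keyword.

```python
from urllib.parse import quote


def log_explorer_url(project_id: str, query: str) -> str:
    """Build a Log Explorer link for a project and a logging query."""
    # Assumed layout: the query is percent-encoded into the path segment
    # after ";query=", and the project goes in the query string.
    encoded_query = quote(query, safe="")
    encoded_project = quote(project_id, safe="")
    return (
        "https://console.cloud.google.com/logs/query"
        f";query={encoded_query}?project={encoded_project}"
    )


# Hypothetical project ID and query, for illustration only.
url = log_explorer_url(
    "my-project", 'severity>=ERROR resource.type="k8s_container"'
)
print(url)
```

Encoding with `safe=""` keeps characters such as `=`, `>` and `"` from being interpreted as part of the URL structure.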

1 Troubleshooting Commands

Contributed by Shea Stewart

Codecollection: rw-public-codecollection


Performs a metric query using a PromQL statement on the Ops Suite API and pushes the result as an SLI metric.

Tasks:
  • Run Prometheus Instant Query Against Google Prom API Endpoint
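
The sketch below shows one way the result of an instant query might be reduced to a single SLI value. It assumes the Prometheus-compatible query endpoint documented for Google Cloud Managed Service for Prometheus and the standard Prometheus HTTP API response shape; the helper names and the sample response are hypothetical, and no network call is made.

```python
import json


def gmp_query_url(project_id: str) -> str:
    """Return the assumed Prometheus-compatible instant query endpoint."""
    return (
        "https://monitoring.googleapis.com/v1/projects/"
        f"{project_id}/location/global/prometheus/api/v1/query"
    )


def first_sample_value(response_body: str) -> float:
    """Extract the value of the first series from an instant query response."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    results = payload["data"]["result"]
    if not results:
        raise ValueError("query returned no series")
    # Each vector sample is a pair: [unix_timestamp, "value-as-string"].
    _, value = results[0]["value"]
    return float(value)


# Hypothetical response body in the standard Prometheus vector format.
sample_response = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [{"metric": {"job": "demo"}, "value": [1700000000, "0.25"]}],
    },
})
print(gmp_query_url("my-project"))
print(first_sample_value(sample_response))
```

In practice the request would also carry an OAuth access token and the PromQL statement as the `query` parameter; both are omitted here to keep the example self-contained.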

1 Troubleshooting Commands

Contributed by

Codecollection: rw-public-codecollection


Performs a metric query using a Google MQL statement on the Ops Suite API and pushes the result as an SLI metric.

Tasks:
  • Running GCP OpsSuite Metric Query

8 Troubleshooting Commands

Contributed by Shea Stewart

Codecollection: rw-public-codecollection


Uses PromQL on the Ops Suite API to determine the health of a MongoDB database instance and pushes the result as an SLI metric. Produces a 1 for a healthy resource, or 0 for an unhealthy resource.

Tasks:
  • Get Access Token
  • Get Instance Status
  • Get Connection Utilization Rate
  • Get MongoDB Member State Health
  • Get MongoDB Replication Lag
  • Get MongoDB Queue Size
  • Get Assertion Rate
  • Generate MongoDB Score

1 Troubleshooting Commands

Contributed by Jonathan Funk

Codecollection: rw-public-codecollection


This codebundle sets up a monitor for a specific region and GCP product, then periodically checks for ongoing incidents using the history available at https://status.cloud.google.com/incidents.json, filtered by severity level.

Tasks:
  • Get Number of GCP Incidents Affecting My Workspace
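
The filtering described above could look roughly like the sketch below. The field names (`end`, `severity`, `affected_products[].title`, `currently_affected_locations[].id`) are assumptions about the incident feed's schema and should be checked against the live JSON; the function name and the sample records are hypothetical.

```python
def count_ongoing_incidents(incidents, product, region, severities):
    """Count incidents that are still open and match product, region, severity."""
    count = 0
    for incident in incidents:
        # Assumed convention: an incident without an "end" value is ongoing.
        ongoing = not incident.get("end")
        products = {p.get("title") for p in incident.get("affected_products", [])}
        regions = {
            loc.get("id")
            for loc in incident.get("currently_affected_locations", [])
        }
        if (
            ongoing
            and incident.get("severity") in severities
            and product in products
            and region in regions
        ):
            count += 1
    return count


# Hypothetical records shaped like the assumed schema.
sample_incidents = [
    {"severity": "high", "end": None,
     "affected_products": [{"title": "Google Kubernetes Engine"}],
     "currently_affected_locations": [{"id": "us-central1"}]},
    {"severity": "low", "end": None,
     "affected_products": [{"title": "Google Kubernetes Engine"}],
     "currently_affected_locations": [{"id": "us-central1"}]},
    {"severity": "high", "end": "2024-01-01T00:00:00+00:00",
     "affected_products": [{"title": "Google Kubernetes Engine"}],
     "currently_affected_locations": []},
]
print(count_ongoing_incidents(
    sample_incidents, "Google Kubernetes Engine", "us-central1", {"high", "medium"}
))
```

Only the first record is counted: the second is filtered out by severity and the third has already ended.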

5 Troubleshooting Commands

Contributed by Shea Stewart

Codecollection: rw-public-codecollection


Uses PromQL on the Ops Suite API to determine the health of a Kong managed ingress resource and pushes the result as an SLI metric. Produces a 1 for a healthy resource, or 0 for an unhealthy resource.

Tasks:
  • Get Access Token
  • Get HTTP Error Rate
  • Get Upstream Health
  • Get Request Latency Rate
  • Generate Kong Ingress Score

1 Troubleshooting Commands

Contributed by stewartshea

Codecollection: rw-cli-codecollection


Count the number of Cloud Functions in an unhealthy state for a GCP Project.

Tasks:
  • Count unhealthy GCP Cloud Functions in GCP Project `${GCP_PROJECT_ID}`

2 Troubleshooting Commands

Contributed by stewartshea

Codecollection: rw-cli-codecollection


Identify problems related to GCP Cloud Function deployments.

Tasks:
  • List Unhealthy Cloud Functions in GCP Project `GCP_PROJECT_ID`
  • Get Error Logs for Unhealthy Cloud Functions in GCP Project `GCP_PROJECT_ID`

5 Troubleshooting Commands

Contributed by stewartshea

Codecollection: rw-cli-codecollection


Troubleshoot GCE Ingress resources related to the GCP HTTP Load Balancer in GKE.

Tasks:
  • Search For GCE Ingress Warnings in GKE
    Common scenarios that might relate to this command or script:
    1. Troubleshooting Kubernetes CrashLoopBackoff events: A DevOps or Site Reliability Engineer might use this command to gather information on abnormal events related to an Ingress and its associated Services in order to identify and fix any issues causing the CrashLoopBackoff events.
    2. Investigating service disruption: If there are reports of service disruption within a specific namespace, a DevOps or Site Reliability Engineer might use this command to retrieve events related to the Ingress and Services to identify any abnormal events causing the disruption.
    3. Debugging failed deployments: When a deployment fails within a specified namespace, a DevOps or Site Reliability Engineer might use this command to gather information on any abnormal events related to the Ingress and Services that could be contributing to the failed deployment.
    4. Monitoring for unusual behavior: As part of routine monitoring and maintenance, a DevOps or Site Reliability Engineer might use this command to regularly check for abnormal events related to Ingress and Services within a specific namespace for any unusual behavior that could indicate potential issues.
    5. Identifying resource conflicts: In a multi-tenant environment, a DevOps or Site Reliability Engineer might use this command to retrieve events related to the Ingress and Services in order to identify any resource conflicts or issues arising from interactions between different applications or services within the same namespace.
  • Identify Unhealthy GCE HTTP Ingress Backends
    Common scenarios that might relate to this command or script:
    1. Troubleshooting Kubernetes CrashLoopBackoff events: A DevOps or Site Reliability Engineer may use this task to quickly identify and address unhealthy backends that are causing CrashLoopBackoff events in a Kubernetes cluster.
    2. Monitoring and alerting: This task can be used to set up automated monitoring and alerting for unhealthy backends in a Kubernetes cluster, allowing the team to proactively address any issues before they impact the system.
    3. Incident response: In the event of a system outage or performance degradation, a DevOps or SRE may use this task to quickly identify and address any unhealthy backends that are contributing to the issue.
    4. Capacity planning: This task can be used to analyze the health and status of backends in a Kubernetes cluster, allowing the team to make informed decisions about capacity planning and resource allocation.
    5. Continuous improvement: By regularly using this task to monitor and analyze the health of backends in a Kubernetes cluster, a DevOps or SRE can identify areas for improvement and optimize the system for better performance and reliability.
  • Validate GCP HTTP Load Balancer Configurations
  • Fetch Network Error Logs from GCP Operations Manager for Ingress Backends
    Common scenarios that might relate to this command or script:
    1. Monitoring and troubleshooting an Ingress controller in Kubernetes when it goes into CrashLoopBackoff due to unhealthy backends.
    2. Investigating and resolving issues with backend services not responding or returning errors within a Kubernetes cluster.
    3. Troubleshooting and identifying the root cause of failures in Kubernetes pods or deployments that result in CrashLoopBackoff events.
    4. Analyzing GCP logging data for error messages related to Kubernetes workloads and diagnosing issues such as connectivity problems or service outages.
    5. Automating the process of identifying and retrieving error logs from GCP logging for specific backends in an Ingress controller in Kubernetes.
  • Review GCP Operations Logging Dashboard
    Common scenarios that might relate to this command or script:
    1. Troubleshooting Kubernetes CrashLoopBackoff events: The DevOps or Site Reliability Engineer might use this command to quickly access and review logs for unhealthy backends in order to identify the root cause of the CrashLoopBackoff events.
    2. Investigating high error rates in a specific GCP project or namespace: The engineer might use the command to easily gather and analyze logs from specific environments to identify patterns or issues causing high error rates.
    3. Monitoring and analyzing traffic spikes or anomalies in a GCP ingress: The command could be used to generate logs for a specific ingress and quickly review them for unusual traffic patterns or anomalies.
    4. Troubleshooting performance issues in a particular GCP context: The engineer might utilize the command to gather and analyze logs for a specific context to diagnose and resolve performance-related issues.
    5. Investigating failures in a specific environment or application namespace: The command could be used to quickly access and investigate logs for a specific environment or application namespace experiencing failures or errors.

1 Troubleshooting Commands

Contributed by jon-funk

Codecollection: rw-cli-codecollection


Fetches logs from GCP using a configurable query and raises an issue with details on the most common errors.

Tasks:
  • Inspect GCP Logs For Common Errors

1 Troubleshooting Commands

Contributed by stewartshea

Codecollection: rw-cli-codecollection


Counts nodes that have been preempted within the defined time interval.

Tasks:
  • Count the number of nodes in active preempt operation

1 Troubleshooting Commands

Contributed by stewartshea

Codecollection: rw-cli-codecollection


List all GCP nodes that have been preempted in the previous time interval.

Tasks:
  • List all nodes in an active preempt operation for GCP Project `GCP_PROJECT_ID`

4 Troubleshooting Commands

Contributed by stewartshea

Codecollection: rw-cli-codecollection


This SLI uses the GCP API or gcloud to score bucket health. Produces a value between 0 (completely failing the test) and 1 (fully passing the test). Looks for usage above a threshold and for public buckets.

Tasks:
  • Fetch GCP Bucket Storage Utilization for `${PROJECT_IDS}`
  • Check GCP Bucket Security Configuration for `${PROJECT_IDS}`
  • Fetch GCP Bucket Storage Operations Rate for `${PROJECT_IDS}`
  • Generate Bucket Score
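
One plausible way to arrive at a score between 0 and 1 is to take the fraction of buckets that pass every check, as sketched below. The aggregation rule, the record fields (`size_tb`, `public`), and the function name are assumptions for illustration; the codebundle may weight or combine its checks differently.

```python
def bucket_score(buckets, usage_threshold_tb: float) -> float:
    """Return the fraction of buckets that are under the usage threshold
    and not publicly accessible (1.0 when there are no buckets)."""
    if not buckets:
        return 1.0
    passing = sum(
        1
        for bucket in buckets
        if bucket["size_tb"] <= usage_threshold_tb and not bucket["public"]
    )
    return passing / len(buckets)


# Hypothetical bucket records, for illustration only.
sample_buckets = [
    {"name": "logs", "size_tb": 0.5, "public": False},
    {"name": "backups", "size_tb": 3.0, "public": False},  # over threshold
    {"name": "assets", "size_tb": 0.1, "public": True},    # public
    {"name": "tmp", "size_tb": 0.2, "public": False},
]
print(bucket_score(sample_buckets, usage_threshold_tb=1.0))
```

With this rule, two of the four sample buckets pass, so the score is 0.5; a project with no failing buckets scores 1.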

4 Troubleshooting Commands

Contributed by stewartshea

Codecollection: rw-cli-codecollection


Inspect GCP Storage bucket usage and configuration.

Tasks:
  • Fetch GCP Bucket Storage Utilization for `PROJECT_IDS`
  • Add GCP Bucket Storage Configuration for `PROJECT_IDS` to Report
    Common scenarios that might relate to this command or script:
    1. Troubleshooting Kubernetes CrashLoopBackoff events: When a container in a Kubernetes pod repeatedly crashes and restarts, a DevOps or Site Reliability Engineer may need to use the Google Cloud Platform to gather information on the crash events and analyze the data in order to diagnose and resolve the issue.
    2. Accessing metadata for multiple buckets in different GCP projects: When managing multiple GCP projects and needing to retrieve metadata for all the buckets within each project, a DevOps or Site Reliability Engineer might use this Bash script to efficiently collect and organize the necessary data for analysis or reporting purposes.
    3. Creating a backup of bucket metadata: In preparation for a migration or data transfer, a DevOps or Site Reliability Engineer could use this script to generate a JSON file containing the metadata for all buckets in multiple GCP projects as part of a backup process.
    4. Auditing bucket access permissions: As a security measure, a DevOps or Site Reliability Engineer might utilize this script to regularly audit and review the access permissions for all buckets across various GCP projects to ensure compliance and proper data protection measures.
    5. Automating routine tasks: When needing to frequently gather and consolidate bucket metadata from multiple GCP projects for monitoring or reporting purposes, a DevOps or Site Reliability Engineer could employ this script to automate the process and streamline their workflow.
  • Check GCP Bucket Security Configuration for `PROJECT_IDS`
  • Fetch GCP Bucket Storage Operations Rate for `PROJECT_IDS`

3 Troubleshooting Commands

Contributed by stewartshea

Codecollection: rw-cli-codecollection


Collects Kong ingress host metrics from GMP on GCP, inspects the results for ingresses with an HTTP error code rate greater than zero over a configurable duration, and raises issues based on the number of ingresses with error codes.

Tasks:
  • Check If Kong Ingress HTTP Error Rate Violates HTTP Error Threshold
    Common scenarios that might relate to this command or script:
    1. Troubleshooting Kubernetes CrashLoopBackoff events: A DevOps or Site Reliability Engineer may use this command to monitor HTTP error rates for a specific service running on Google Cloud, in order to identify and resolve any issues causing the CrashLoopBackoff events.
    2. Investigating performance issues: If there are performance issues with a service running on Google Cloud, the DevOps or Site Reliability Engineer may utilize this command to monitor and analyze the HTTP error rates for the service, and identify any potential bottlenecks or issues affecting performance.
    3. Conducting routine monitoring and analysis: As part of regular maintenance and monitoring tasks, the DevOps or Site Reliability Engineer may use this command to periodically check and track HTTP error rates for specific services running on Google Cloud, in order to ensure that they are operating efficiently and within expected parameters.
    4. Incident response and troubleshooting: In the event of an incident or outage involving a specific service on Google Cloud, the DevOps or Site Reliability Engineer can use this command to quickly gather information about HTTP error rates and help diagnose the root cause of the issue.
    5. Performance optimization and capacity planning: When optimizing the performance of services running on Google Cloud or planning for future capacity needs, the DevOps or Site Reliability Engineer may use this command to gather data on HTTP error rates and make informed decisions about resource allocation and infrastructure adjustments.
  • Check If Kong Ingress HTTP Request Latency Violates Threshold
    Common scenarios that might relate to this command or script:
    1. Troubleshooting a Kubernetes CrashLoopBackoff event: DevOps or SRE may need to use this command to monitor the request latency for services running on Kubernetes to identify if any service is causing the crash loop.
    2. Investigating performance issues in a microservices architecture: The command can be used to check the HTTP request latency for different services and identify any potential bottlenecks or performance issues.
    3. Monitoring and alerting for SLA violations: DevOps or SRE may use this command as part of their monitoring and alerting systems to detect if any service is not meeting its Service Level Agreement (SLA) in terms of request latency.
    4. Conducting regular performance checks and optimizations: This command can be automated to run at regular intervals to proactively identify and optimize the request latency for different services, improving overall system performance.
    5. Investigating customer-reported performance complaints: If customers report slow response times from a particular service, DevOps or SRE can use this command to investigate and validate the reported performance issues.
  • Check If Kong Ingress Controller Reports Upstream Errors
    Common scenarios that might relate to this command or script:
    1. Monitoring and troubleshooting Kubernetes CrashLoopBackoff events to identify the root cause of application crashes and implement remediation strategies.
    2. Checking the health of a particular service in Google Cloud Platform and identifying any issues or anomalies in its performance or availability.
    3. Investigating and resolving issues with service accounts in Google Cloud Platform, such as permissions errors or misconfigurations.
    4. Automating the monitoring and health checks of multiple services in a Kubernetes cluster to ensure continuous availability and performance.
    5. Integrating this command into a larger incident response and alerting system to proactively detect and address potential service disruptions or failures.

2 Troubleshooting Commands

Contributed by jon-funk

Codecollection: rw-cli-codecollection


Collects Nginx ingress host controller metrics from GMP on GCP, inspects the results for ingresses with an HTTP error code rate greater than zero over a configurable duration, and raises issues based on the number of ingresses with error codes.

Tasks:
  • Fetch Nginx HTTP Errors From GMP for Ingress `INGRESS_OBJECT_NAME`
  • Find Owner and Service Health for Ingress `INGRESS_OBJECT_NAME`

1 Troubleshooting Commands

Contributed by stewartshea

Codecollection: rw-generic-codecollection


Runs a user-provided gcloud command and pushes the metric to the RunWhen Platform. The supplied command must result in a single distinct metric. Command-line tools like jq are available.

Tasks:
  • ${TASK_TITLE}

1 Troubleshooting Commands

Contributed by stewartshea

Codecollection: rw-generic-codecollection


Runs a user-provided gcloud command.

Tasks:
  • TASK_TITLE