GCP

GCP Storage Bucket Health

4 Troubleshooting Commands

Contributed by stewartshea

Codecollection: rw-cli-codecollection

Inspect GCP Storage bucket usage and configuration.

Tasks:

Fetch GCP Bucket Storage Utilization for `PROJECT_IDS`
Add GCP Bucket Storage Configuration for `PROJECT_IDS` to Report
Common scenarios that might relate to this command or script:
1. Troubleshooting Kubernetes CrashLoopBackOff events: When a container in a Kubernetes pod repeatedly crashes and restarts, a DevOps or Site Reliability Engineer may need to use Google Cloud Platform to gather information on the crash events and analyze the data to diagnose and resolve the issue.
2. Accessing metadata for multiple buckets in different GCP projects: When managing multiple GCP projects and needing to retrieve metadata for all the buckets within each project, a DevOps or Site Reliability Engineer might use this Bash script to efficiently collect and organize the data for analysis or reporting.
3. Creating a backup of bucket metadata: In preparation for a migration or data transfer, a DevOps or Site Reliability Engineer could use this script to generate a JSON file containing the metadata for all buckets in multiple GCP projects as part of a backup process.
4. Auditing bucket access permissions: As a security measure, a DevOps or Site Reliability Engineer might use this script to regularly audit the access permissions for all buckets across various GCP projects to ensure compliance and proper data protection.
5. Automating routine tasks: When frequently gathering and consolidating bucket metadata from multiple GCP projects for monitoring or reporting, a DevOps or Site Reliability Engineer could use this script to automate the process.
Check GCP Bucket Security Configuration for `PROJECT_IDS`
Fetch GCP Bucket Storage Operations Rate for `PROJECT_IDS`

Source Code

Discoverable

Troubleshooting CheatSheet

GCP Storage Bucket Health

4 Troubleshooting Commands

Contributed by stewartshea

Codecollection: rw-cli-codecollection

This SLI uses the GCP API or gcloud to score bucket health. It produces a value between 0 (completely failing the test) and 1 (fully passing the test), looking for usage above a threshold and for public buckets.

Tasks:

Fetch GCP Bucket Storage Utilization for `${PROJECT_IDS}`
Check GCP Bucket Security Configuration for `${PROJECT_IDS}`
Fetch GCP Bucket Storage Operations Rate for `${PROJECT_IDS}`
Generate Bucket Score in Project `${PROJECT_IDS}`
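The SLI description says the score ranges from 0 to 1 and is driven by usage above a threshold and by public buckets. A minimal Python sketch of one way such a score could be combined, assuming hypothetical `BucketInfo` records (name, usage in TB, public flag) that have already been collected, and assuming the two checks are weighted equally; the codebundle's actual scoring may differ:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BucketInfo:
    """Hypothetical per-bucket summary; field names are illustrative."""
    name: str
    used_tb: float  # storage used, in terabytes
    public: bool    # True if the bucket grants access to allUsers/allAuthenticatedUsers


def bucket_health_score(buckets: List[BucketInfo], usage_threshold_tb: float) -> float:
    """Return a score in [0, 1]; 1.0 means no bucket is over the threshold or public."""
    if not buckets:
        return 1.0  # nothing to check, treat as healthy
    over_threshold = sum(1 for b in buckets if b.used_tb > usage_threshold_tb)
    public = sum(1 for b in buckets if b.public)
    usage_score = 1 - over_threshold / len(buckets)
    public_score = 1 - public / len(buckets)
    # Assumption: both checks carry equal weight.
    return (usage_score + public_score) / 2


buckets = [
    BucketInfo("logs-archive", used_tb=0.5, public=False),
    BucketInfo("ml-datasets", used_tb=2.0, public=False),
    BucketInfo("static-site", used_tb=0.1, public=True),
]
score = bucket_health_score(buckets, usage_threshold_tb=1.0)
print(f"bucket health score: {score:.2f}")
```

Averaging per-check fractions keeps the result between 0 and 1 and lets a single bad bucket lower the score without zeroing it.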

Source Code

GKE Kong Ingress Host Triage

3 Troubleshooting Commands

Contributed by stewartshea

Codecollection: rw-cli-codecollection

Collects Kong ingress host metrics from GMP on GCP, inspects the results for ingresses with an HTTP error code rate greater than zero over a configurable duration, and raises issues based on the number of ingresses with error codes.

Tasks:

Check If Kong Ingress HTTP Error Rate Violates HTTP Error Threshold in GCP Project `GCP_PROJECT_ID`
Check If Kong Ingress HTTP Request Latency Violates Threshold in GCP Project `GCP_PROJECT_ID`
Check If Kong Ingress Controller Reports Upstream Errors in GCP Project `GCP_PROJECT_ID`
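The Kong triage flags ingress hosts whose HTTP error rate is above zero (or a configured threshold) and raises issues based on how many there are. A minimal Python sketch of that comparison, assuming the per-host error rates have already been fetched from GMP into a plain dict; the host names and rates are made up:

```python
def ingresses_violating_threshold(error_rates: dict, threshold: float = 0.0) -> list:
    """Return the hosts whose HTTP error rate exceeds `threshold`, sorted by name."""
    return sorted(host for host, rate in error_rates.items() if rate > threshold)


# Hypothetical error rates (errors per second) per ingress host.
rates = {"api.example.com": 0.0, "shop.example.com": 0.25, "auth.example.com": 1.5}
violations = ingresses_violating_threshold(rates)
print(f"{len(violations)} ingress host(s) with errors: {violations}")
```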

Source Code

Troubleshooting CheatSheet

Raises Issues

GCP Node Preempt List

1 Troubleshooting Command

Contributed by stewartshea

Codecollection: rw-cli-codecollection

List all GCP nodes that have been preempted in the previous time interval.

Tasks:

List all nodes in an active preempt operation for GCP Project `GCP_PROJECT_ID` within the last `AGE` hours
Common scenarios that might relate to this command or script:
1. Investigating and resolving performance issues in a Kubernetes cluster, such as high CPU or memory utilization, by identifying and addressing bottlenecked or underperforming pods.
2. Troubleshooting Kubernetes CrashLoopBackOff events, which occur when a container repeatedly crashes after starting, by identifying the root cause and implementing appropriate fixes.
3. Monitoring and optimizing resource allocation in a Kubernetes cluster to ensure efficient use of compute instances and avoid overutilization or underutilization.
4. Managing service account authentication for accessing resources within a Google Cloud Platform project, and ensuring proper authorization and permissions are in place for preempted compute instances.
5. Implementing automated alerting and reporting for compute instances preempted within a specific time frame in a Google Cloud Platform project, to gain insight into potential service disruptions and take proactive measures to mitigate them.
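The preempt list task narrows compute operations down to preemptions within the last `AGE` hours. A minimal Python sketch of that filter, assuming the input is the JSON list produced by something like `gcloud compute operations list --filter="operationType=compute.instances.preempted" --format=json`, and assuming each operation carries `operationType`, `insertTime` (ISO 8601 with offset), and `targetLink` fields; the sample operations are invented:

```python
from datetime import datetime, timedelta, timezone


def preempted_since(operations: list, age_hours: float, now: datetime) -> list:
    """Return instance names preempted within `age_hours` before `now`."""
    cutoff = now - timedelta(hours=age_hours)
    names = []
    for op in operations:
        if op.get("operationType") != "compute.instances.preempted":
            continue  # ignore inserts, deletes, and other operation types
        started = datetime.fromisoformat(op["insertTime"])
        if started >= cutoff:
            # targetLink ends with .../instances/<instance-name>
            names.append(op["targetLink"].rsplit("/", 1)[-1])
    return names


ops = [
    {"operationType": "compute.instances.preempted",
     "insertTime": "2024-05-01T10:00:00.000+00:00",
     "targetLink": "https://www.googleapis.com/compute/v1/projects/demo/zones/us-central1-a/instances/gke-pool-a-1"},
    {"operationType": "compute.instances.preempted",
     "insertTime": "2024-04-29T10:00:00.000+00:00",
     "targetLink": "https://www.googleapis.com/compute/v1/projects/demo/zones/us-central1-a/instances/gke-pool-a-2"},
    {"operationType": "compute.instances.insert",
     "insertTime": "2024-05-01T11:00:00.000+00:00",
     "targetLink": "https://www.googleapis.com/compute/v1/projects/demo/zones/us-central1-a/instances/gke-pool-a-3"},
]
now = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)
recent = preempted_since(ops, age_hours=24, now=now)
print(recent)
```

Passing `now` explicitly keeps the function deterministic; a live script would pass `datetime.now(timezone.utc)`.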

Source Code

Discoverable

Troubleshooting CheatSheet

Raises Issues

GCP Node Preempt List

1 Troubleshooting Command

Contributed by stewartshea

Codecollection: rw-cli-codecollection

Counts nodes that have been preempted within the defined time interval.

Tasks:

Count the number of nodes in an active preempt operation in project `${GCP_PROJECT_ID}`

Source Code

GKE Cluster Health

7 Troubleshooting Commands

Contributed by stewartshea

Codecollection: rw-cli-codecollection

Identifies issues affecting GKE clusters in a GCP project.

Tasks:

Identify GKE Service Account Issues in GCP Project `GCP_PROJECT_ID`
Fetch GKE Recommendations for GCP Project `GCP_PROJECT_ID`
Fetch GKE Cluster Health for GCP Project `GCP_PROJECT_ID`
Check for Quota Related GKE Autoscaling Issues in GCP Project `GCP_PROJECT_ID`
Validate GKE Node Sizes for GCP Project `GCP_PROJECT_ID`
Fetch GKE Cluster Operations for GCP Project `GCP_PROJECT_ID`
Check Node Pool Health for GCP Project `GCP_PROJECT_ID`

Source Code

Discoverable

GKE Cluster Health

6 Troubleshooting Commands

Contributed by stewartshea

Codecollection: rw-cli-codecollection

Identifies issues affecting GKE clusters in a GCP project and creates a health score. A score of 1 is healthy; a score between 0 and 1 indicates unhealthy components.

Tasks:

Identify GKE Service Account Issues in GCP Project `${GCP_PROJECT_ID}`
Fetch GKE Recommendations for GCP Project `${GCP_PROJECT_ID}`
Fetch GKE Cluster Health for GCP Project `${GCP_PROJECT_ID}`
Check for Quota Related GKE Autoscaling Issues in GCP Project `${GCP_PROJECT_ID}`
Quick Node Instance Group Health Check for GCP Project `${GCP_PROJECT_ID}`
Generate GKE Cluster Health Score
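The GKE health SLI reports 1 when healthy and a value between 0 and 1 when some components are unhealthy. One simple way to produce such a value, sketched in Python under the assumption that each check task yields a pass/fail result and that all checks count equally (the real codebundle may weight them differently); the check names are hypothetical labels for the tasks listed:

```python
def cluster_health_score(check_results: dict) -> float:
    """Mean of pass/fail results: 1.0 if every check passed, lower as checks fail."""
    if not check_results:
        return 1.0  # no checks ran; treat as healthy
    passed = sum(1 for ok in check_results.values() if ok)
    return passed / len(check_results)


results = {
    "service_account_issues": True,
    "recommendations": True,
    "cluster_health": False,
    "quota_autoscaling": True,
    "node_instance_groups": True,
}
score = cluster_health_score(results)
print(f"GKE cluster health score: {score}")
```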

Source Code

GCP Cloud Function Health

2 Troubleshooting Commands

Contributed by stewartshea

Codecollection: rw-cli-codecollection

Identifies problems related to GCP Cloud Function deployments.

Tasks:

List Unhealthy Cloud Functions in GCP Project `GCP_PROJECT_ID`
Get Error Logs for Unhealthy Cloud Functions in GCP Project `GCP_PROJECT_ID`

Source Code

Discoverable

Troubleshooting CheatSheet

GCP Cloud Function Health

1 Troubleshooting Command

Contributed by stewartshea

Codecollection: rw-cli-codecollection

Count the number of Cloud Functions in an unhealthy state for a GCP Project.

Tasks:

Count unhealthy GCP Cloud Functions in GCP Project `${GCP_PROJECT_ID}`
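Counting unhealthy Cloud Functions amounts to listing the functions and counting those not in a healthy state. A minimal Python sketch, assuming JSON roughly in the shape returned by `gcloud functions list --format=json`, where 2nd gen functions report a `state` field and 1st gen functions a `status` field, and assuming `ACTIVE` is the only healthy value; check these field names against your gcloud version:

```python
import json

HEALTHY_STATES = {"ACTIVE"}  # assumption: any other state counts as unhealthy


def unhealthy_functions(functions_json: str) -> list:
    """Return short names of functions whose state/status is not healthy."""
    unhealthy = []
    for fn in json.loads(functions_json):
        # 2nd gen functions expose "state"; 1st gen expose "status" (assumed).
        state = fn.get("state") or fn.get("status")
        if state not in HEALTHY_STATES:
            # Full names look like projects/<p>/locations/<region>/functions/<name>
            unhealthy.append(fn["name"].rsplit("/", 1)[-1])
    return unhealthy


sample = json.dumps([
    {"name": "projects/demo/locations/us-central1/functions/ok-fn", "state": "ACTIVE"},
    {"name": "projects/demo/locations/us-central1/functions/bad-fn", "state": "FAILED"},
    {"name": "projects/demo/locations/us-central1/functions/old-fn", "status": "OFFLINE"},
])
bad = unhealthy_functions(sample)
print(f"{len(bad)} unhealthy function(s): {bad}")
```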

Source Code

Kubernetes Ingress GCE & GCP HTTP Load Balancer Healthcheck

5 Troubleshooting Commands

Contributed by stewartshea

Codecollection: rw-cli-codecollection

Troubleshoots GCE ingress resources related to GCP HTTP load balancers in GKE.

Tasks:

Search For GCE Ingress Warnings in GKE Context `CONTEXT`
Identify Unhealthy GCE HTTP Ingress Backends in GKE Namespace `NAMESPACE`
Validate GCP HTTP Load Balancer Configurations in GCP Project `GCP_PROJECT_ID`
Fetch Network Error Logs from GCP Operations Manager for Ingress Backends in GCP Project `GCP_PROJECT_ID`
Review GCP Operations Logging Dashboard in GCP project `GCP_PROJECT_ID`

Source Code

Troubleshooting CheatSheet

Raises Issues

GKE Nginx Ingress Host Triage

2 Troubleshooting Commands

Contributed by jon-funk

Codecollection: rw-cli-codecollection

Collects Nginx ingress host controller metrics from GMP on GCP, inspects the results for ingresses with an HTTP error code rate greater than zero over a configurable duration, and raises issues based on the number of ingresses with error codes.

Tasks:

Fetch Nginx HTTP Errors From GMP for Ingress `INGRESS_OBJECT_NAME`
Find Owner and Service Health for Ingress `INGRESS_OBJECT_NAME`

Source Code

Troubleshooting CheatSheet

Raises Issues

GCP Gcloud Log Inspection

1 Troubleshooting Command

Contributed by jon-funk

Codecollection: rw-cli-codecollection

Fetches logs from a GCP project using a configurable query and raises an issue with details on the most common errors.

Tasks:

Inspect GCP Logs For Common Errors in GCP Project `GCP_PROJECT_ID`
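Surfacing the most common errors from a log query can be as simple as grouping similar messages and counting them. A minimal Python sketch, assuming the log entries' messages are already in a list, and assuming that replacing digit runs is an acceptable way to group near-identical messages (a heuristic, not necessarily what the codebundle does); the messages are invented:

```python
import re
from collections import Counter


def most_common_errors(messages: list, top_n: int = 3) -> list:
    """Group messages that differ only in numbers; return the top_n (pattern, count) pairs."""
    # Heuristic (assumption): digit runs such as IPs, ports, and durations become "N".
    normalized = [re.sub(r"\d+", "N", msg) for msg in messages]
    return Counter(normalized).most_common(top_n)


messages = [
    "connection refused to 10.0.0.1:5432",
    "connection refused to 10.0.0.2:5432",
    "timeout after 30s",
    "permission denied",
]
top = most_common_errors(messages, top_n=2)
for pattern, count in top:
    print(f"{count}x {pattern}")
```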

Source Code

Raises Issues

GCP Vertex AI Model Garden Health

8 Troubleshooting Commands

Contributed by stewartshea

Codecollection: rw-cli-codecollection

Troubleshooting and remediation tasks for GCP Vertex AI Model Garden using the Google Cloud Monitoring Python SDK.

Required IAM roles:
- roles/monitoring.viewer (for metrics access)
- roles/logging.privateLogViewer (for audit log access)
- roles/serviceusage.serviceUsageConsumer (for service status checks)

Required permissions:
- monitoring.timeSeries.list
- logging.privateLogEntries.list
- serviceusage.services.list

Tasks:

Discover All Deployed Vertex AI Models in `GCP_PROJECT_ID`
Analyze Vertex AI Model Garden Error Patterns and Response Codes in `GCP_PROJECT_ID`
Investigate Vertex AI Model Latency Performance Issues in `GCP_PROJECT_ID`
Monitor Vertex AI Throughput and Token Consumption Patterns in `GCP_PROJECT_ID`
Check Vertex AI Model Garden API Logs for Issues in `GCP_PROJECT_ID`
Check Vertex AI Model Garden Service Health and Quotas in `GCP_PROJECT_ID`
Generate Vertex AI Model Garden Health Summary and Next Steps for `GCP_PROJECT_ID`
Generate Normalized Health Report Table for `GCP_PROJECT_ID`
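Before running these tasks, it can help to confirm the caller holds the required permissions from the description. A minimal Python sketch of that comparison, assuming the granted permissions were obtained separately (for example from the Resource Manager `projects.testIamPermissions` method, which returns the subset of requested permissions the caller has); `missing_permissions` is a hypothetical helper:

```python
REQUIRED_PERMISSIONS = {
    "monitoring.timeSeries.list",
    "logging.privateLogEntries.list",
    "serviceusage.services.list",
}


def missing_permissions(granted) -> list:
    """Return required permissions absent from `granted`, sorted for stable output."""
    return sorted(REQUIRED_PERMISSIONS - set(granted))


granted = ["monitoring.timeSeries.list"]  # e.g. the subset echoed back by testIamPermissions
missing = missing_permissions(granted)
if missing:
    print("missing permissions:", ", ".join(missing))
```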

Source Code

Troubleshooting CheatSheet

GCP Vertex AI Model Garden Health SLI

7 Troubleshooting Commands

Contributed by stewartshea

Codecollection: rw-cli-codecollection

Calculates an SLI for GCP Vertex AI Model Garden health using the Google Cloud Monitoring Python SDK.

Required IAM roles:
- roles/monitoring.viewer (for metrics access)
- roles/logging.privateLogViewer (for the quick log health check)

Required permissions:
- monitoring.timeSeries.list
- logging.privateLogEntries.list

Tasks:

Quick Vertex AI Log Health Check for `${GCP_PROJECT_ID}`
Calculate Error Rate Score for `${GCP_PROJECT_ID}`
Calculate Latency Performance Score for `${GCP_PROJECT_ID}`
Calculate Throughput Usage Score for `${GCP_PROJECT_ID}`
Discover All Deployed Models for `${GCP_PROJECT_ID}`
Check Service Availability Score for `${GCP_PROJECT_ID}`
Generate Final Vertex AI Model Garden Health Score for `${GCP_PROJECT_ID}`

Source Code

Kong Ingress Health (GCP PromQL)

5 Troubleshooting Commands

Contributed by Shea Stewart

Codecollection: rw-public-codecollection

Uses PromQL on the Ops Suite API to determine the health of a Kong-managed ingress resource and pushes the result as an SLI metric. Produces 1 for a healthy resource or 0 for an unhealthy resource.

Tasks:

Get Access Token
Get HTTP Error Rate
Get Upstream Health
Get Request Latency Rate
Generate Kong Ingress Score

Source Code

MongoDB Health (GCP PromQL)

8 Troubleshooting Commands

Contributed by Shea Stewart

Codecollection: rw-public-codecollection

Uses PromQL on the Ops Suite API to determine the health of a MongoDB database instance and pushes the result as an SLI metric. Produces 1 for a healthy resource or 0 for an unhealthy resource.

Tasks:

Get Access Token
Get Instance Status
Get Connection Utilization Rate
Get MongoDB Member State Health
Get MongoDB Replication Lag
Get MongoDB Queue Size
Get Assertion Rate
Generate MongoDB Score
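The MongoDB SLI collapses several PromQL checks into a single 1 (healthy) or 0 (unhealthy). A minimal Python sketch of that all-or-nothing rule, assuming the metric values have already been queried and that each has an upper limit; the metric names and limits below are hypothetical, not the codebundle's defaults:

```python
def mongodb_score(metrics: dict, limits: dict) -> int:
    """Return 1 only if the instance is up and every metric is within its limit."""
    if not metrics.get("instance_up"):
        return 0
    for name, limit in limits.items():
        value = metrics.get(name)
        # A missing metric is treated as unhealthy rather than silently ignored.
        if value is None or value > limit:
            return 0
    return 1


LIMITS = {
    "connection_utilization": 0.8,  # fraction of max connections in use
    "replication_lag_seconds": 10.0,
    "queue_size": 50,
    "assertion_rate": 0.5,          # assertions per second
}
healthy = {"instance_up": 1, "connection_utilization": 0.4,
           "replication_lag_seconds": 2.0, "queue_size": 3, "assertion_rate": 0.0}
lagging = dict(healthy, replication_lag_seconds=45.0)
print(mongodb_score(healthy, LIMITS), mongodb_score(lagging, LIMITS))
```

Treating a missing metric as a failure is a deliberate choice here: a silent scrape gap should not read as healthy.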

Source Code

GCP Service Status

1 Troubleshooting Command

Contributed by Jonathan Funk

Codecollection: rw-public-codecollection

This codebundle sets up a monitor for a specific region and GCP product, then periodically checks it for ongoing incidents using the incident history at https://status.cloud.google.com/incidents.json, filtered by severity level.

Tasks:

Get Number of GCP Incidents Affecting My Workspace
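The service-status monitor reads the public incident history and keeps only ongoing incidents that match a product, a region, and a minimum severity. A minimal Python sketch of that filter, assuming incidents carry `id`, `end`, `severity` (`low`/`medium`/`high`), `affected_products[].title`, and `currently_affected_locations[].id` fields; verify these against the live feed before relying on them. The sample data is invented:

```python
import json

SEVERITY_ORDER = {"low": 1, "medium": 2, "high": 3}


def ongoing_incidents(incidents_json: str, product: str, region: str,
                      min_severity: str = "low") -> list:
    """Return IDs of unresolved incidents affecting `product` in `region`."""
    floor = SEVERITY_ORDER[min_severity]
    matches = []
    for inc in json.loads(incidents_json):
        if inc.get("end"):
            continue  # resolved incidents carry an end timestamp
        if SEVERITY_ORDER.get(inc.get("severity"), 0) < floor:
            continue
        products = {p.get("title") for p in inc.get("affected_products", [])}
        locations = {loc.get("id") for loc in inc.get("currently_affected_locations", [])}
        if product in products and region in locations:
            matches.append(inc["id"])
    return matches


sample = json.dumps([
    {"id": "inc-1", "severity": "high", "affected_products": [{"title": "Cloud Storage"}],
     "currently_affected_locations": [{"id": "us-central1"}]},
    {"id": "inc-2", "severity": "high", "end": "2024-05-01T12:00:00+00:00",
     "affected_products": [{"title": "Cloud Storage"}],
     "currently_affected_locations": [{"id": "us-central1"}]},
    {"id": "inc-3", "severity": "low", "affected_products": [{"title": "Cloud Storage"}],
     "currently_affected_locations": [{"id": "us-central1"}]},
])
found = ongoing_incidents(sample, "Cloud Storage", "us-central1", min_severity="medium")
print(f"{len(found)} ongoing incident(s): {found}")
```

Against the live feed, the JSON string would come from fetching the incidents URL (for example with `urllib.request.urlopen`) instead of the inline sample.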

Source Code

GCP Operations Suite Log Query

1 Troubleshooting Command

Contributed by Jonathan Funk

Codecollection: rw-public-codecollection

Retrieve the number of results of a GCP Log Explorer query.

Tasks:

Running GCE Logging Query And Pushing Result Count Metric

Source Code

GCP Operations Suite Prometheus Query

1 Troubleshooting Command

Contributed by Shea Stewart

Codecollection: rw-public-codecollection

Performs a metric query using a PromQL statement on the Ops Suite API and pushes the result as an SLI metric.

Tasks:

Run Prometheus Instant Query Against Google Prom API Endpoint
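Managed Service for Prometheus exposes a Prometheus-compatible HTTP API, so an instant query is a GET against its `query` endpoint with an OAuth access token (for example from `gcloud auth print-access-token`). A minimal Python sketch that builds the request URL and parses a standard Prometheus instant-vector response; the endpoint pattern is an assumption to verify against Google's documentation, the metric name is illustrative, and the sample response is invented:

```python
from urllib.parse import urlencode

# Assumed endpoint pattern for the Managed Service for Prometheus query API.
GMP_QUERY_URL = ("https://monitoring.googleapis.com/v1/projects/{project}"
                 "/location/global/prometheus/api/v1/query")


def gmp_query_url(project_id: str, promql: str) -> str:
    """Build the GET URL for a PromQL instant query."""
    return GMP_QUERY_URL.format(project=project_id) + "?" + urlencode({"query": promql})


def instant_vector_values(response: dict) -> list:
    """Return (labels, value) pairs from a Prometheus instant-vector response."""
    if response.get("status") != "success":
        raise ValueError(f"query failed: {response.get('error')}")
    # Each sample's "value" is [unix_timestamp, "string_value"].
    return [(r["metric"], float(r["value"][1])) for r in response["data"]["result"]]


# Illustrative query; substitute the metric names your exporters actually emit.
url = gmp_query_url("my-project", 'sum(rate(kong_http_requests_total{code=~"5.."}[5m]))')
sample_response = {
    "status": "success",
    "data": {"resultType": "vector",
             "result": [{"metric": {}, "value": [1714564800, "0.25"]}]},
}
print(url)
print(instant_vector_values(sample_response))
```

To send the request, attach the token as an `Authorization: Bearer <token>` header (for example via `urllib.request.Request`).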

Source Code

GCP Operations Suite Metric Query

1 Troubleshooting Command

Contributed by

Codecollection: rw-public-codecollection

Performs a metric query using a Google MQL statement on the Ops Suite API and pushes the result as an SLI metric.

Tasks:

Running GCP OpsSuite Metric Query

Source Code

GCP GCloud Generic Report

1 Troubleshooting Command

Contributed by Jonathan Funk

Codecollection: rw-public-codecollection

Run arbitrary gcloud commands and capture the stdout in a report.

Tasks:

Run Gcloud CLI Command and Push metric

Source Code

GCP GCloud Generic Metric

1 Troubleshooting Command

Contributed by Jonathan Funk

Codecollection: rw-public-codecollection

Run arbitrary gcloud commands and parse their output (for example, JSON) for values to submit as a metric.

Tasks:

Run Gcloud CLI Command and Push metric
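The generic-metric bundle parses command output down to a single value. A minimal Python sketch of that parsing step, assuming the command's stdout is JSON and using a hypothetical dotted-path syntax (`quotas.0.usage`) to pick the value; the bundle itself may rely on tools such as jq instead, and the sample stdout only approximates a region quota listing:

```python
import json


def extract_metric(stdout: str, path: str) -> float:
    """Walk a dotted path through parsed JSON; numeric segments index into lists."""
    value = json.loads(stdout)
    for key in path.split("."):
        value = value[int(key)] if isinstance(value, list) else value[key]
    return float(value)


sample_stdout = '{"quotas": [{"metric": "CPUS", "usage": 12.0, "limit": 24.0}]}'
usage = extract_metric(sample_stdout, "quotas.0.usage")
print(usage)
```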

Source Code

GCP Operations Suite Log Query Dashboard URL

1 Troubleshooting Command

Contributed by Jonathan Funk

Codecollection: rw-public-codecollection

Generate a link to the GCP Log Explorer.

Tasks:

Get GCP Log Dashboard URL For Given Log Query
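A Logs Explorer link can be built by URL-encoding the query into the console URL. A minimal Python sketch, assuming the `https://console.cloud.google.com/logs/query;query=...?project=...` layout the console uses today; this layout is not a documented API and may change:

```python
from urllib.parse import quote


def log_explorer_url(project_id: str, query: str) -> str:
    """Return a Logs Explorer URL that opens with `query` prefilled."""
    encoded = quote(query, safe="")  # encode spaces, quotes, =, >, newlines, etc.
    return (f"https://console.cloud.google.com/logs/query;query={encoded}"
            f"?project={quote(project_id, safe='')}")


print(log_explorer_url("my-project", 'resource.type="k8s_container" severity>=ERROR'))
```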

Source Code

GCP CLI Command

1 Troubleshooting Command

Contributed by stewartshea

Codecollection: rw-generic-codecollection

Runs a user-provided gcloud command.

Tasks:

TASK_TITLE

Source Code

Metric from GCP CLI Command

1 Troubleshooting Command

Contributed by stewartshea

Codecollection: rw-generic-codecollection

Runs a user-provided gcloud command and pushes the metric to the RunWhen Platform. The supplied command must produce a single, distinct metric value. Command-line tools such as jq are available.

Tasks:

${TASK_TITLE}
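Because the supplied command must produce exactly one value, a wrapper can reject anything else before pushing it. A minimal Python sketch of that validation, assuming the command has already run (for example `gcloud compute instances list --format=json | jq length`) and its stdout was captured as a string; `parse_single_metric` is a hypothetical helper:

```python
def parse_single_metric(stdout: str) -> float:
    """Return the command's output as a float, or raise if it is not exactly one value."""
    tokens = stdout.split()
    if len(tokens) != 1:
        raise ValueError(f"expected exactly one value, got {len(tokens)}: {stdout!r}")
    return float(tokens[0])  # raises ValueError for non-numeric output such as "N/A"


print(parse_single_metric("42\n"))
```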

Source Code