GCP
Runs a user-provided gcloud command and pushes the resulting metric to the RunWhen Platform. The supplied command must produce a single, distinct metric. Command-line tools such as jq are available.
Tasks:
- ${TASK_TITLE}
List all GCP nodes that have been preempted in the previous time interval.
Tasks:
- List all nodes in an active preempt operation for GCP Project `GCP_PROJECT_ID` within the last `AGE` hours
Counts nodes that have been preempted within the defined time interval.
Tasks:
- Count the number of nodes in active preempt operation in project `${GCP_PROJECT_ID}`
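The counting step can be pictured with a minimal sketch. It assumes the operations have already been fetched as JSON (for example from a gcloud operations listing) and that each record carries an `operationType` of `compute.instances.preempted` and an ISO 8601 `insertTime` with an explicit UTC offset. The function name and the sample records are hypothetical and are not taken from the codebundle.

```python
from datetime import datetime, timedelta, timezone


def count_recent_preemptions(operations, age_hours, now):
    """Count preemption operations inserted within the last `age_hours`."""
    cutoff = now - timedelta(hours=age_hours)
    return sum(
        1
        for op in operations
        # Assumed field names; adjust to the real operation records.
        if op.get("operationType") == "compute.instances.preempted"
        and datetime.fromisoformat(op["insertTime"]) >= cutoff
    )


# Hypothetical sample data for illustration only.
NOW = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
SAMPLE_OPERATIONS = [
    {"operationType": "compute.instances.preempted",
     "insertTime": "2024-01-01T11:00:00.000+00:00"},
    {"operationType": "compute.instances.preempted",
     "insertTime": "2023-12-31T00:00:00.000+00:00"},
    {"operationType": "delete",
     "insertTime": "2024-01-01T11:30:00.000+00:00"},
]
recent = count_recent_preemptions(SAMPLE_OPERATIONS, age_hours=2, now=NOW)
```

Passing `now` explicitly keeps the function deterministic, which makes the time window easy to test.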
Inspect GCP Storage bucket usage and configuration.
Tasks:
- Fetch GCP Bucket Storage Utilization for `PROJECT_IDS`
- Add GCP Bucket Storage Configuration for `PROJECT_IDS` to Report
- Check GCP Bucket Security Configuration for `PROJECT_IDS`
- Fetch GCP Bucket Storage Operations Rate for `PROJECT_IDS`
This SLI uses the GCP API or gcloud to score bucket health. Produces a value between 0 (completely failing the test) and 1 (fully passing the test). Looks for usage above a threshold and for public buckets.
Tasks:
- Fetch GCP Bucket Storage Utilization for `${PROJECT_IDS}`
- Check GCP Bucket Security Configuration for `${PROJECT_IDS}`
- Fetch GCP Bucket Storage Operations Rate for `${PROJECT_IDS}`
- Generate Bucket Score in Project `${PROJECT_IDS}`
Identify issues affecting GKE Clusters in a GCP Project
Tasks:
- Identify GKE Service Account Issues in GCP Project `GCP_PROJECT_ID`
- Fetch GKE Recommendations for GCP Project `GCP_PROJECT_ID`
- Fetch GKE Cluster Health for GCP Project `GCP_PROJECT_ID`
- Check for Quota Related GKE Autoscaling Issues in GCP Project `GCP_PROJECT_ID`
- Validate GKE Node Sizes for GCP Project `GCP_PROJECT_ID`
- Fetch GKE Cluster Operations for GCP Project `GCP_PROJECT_ID`
- Check Node Pool Health for GCP Project `GCP_PROJECT_ID`
Identifies issues affecting GKE Clusters in a GCP Project and creates a health score. A score of 1 is healthy; a score between 0 and 1 indicates unhealthy components.
Tasks:
- Identify GKE Service Account Issues in GCP Project `${GCP_PROJECT_ID}`
- Fetch GKE Recommendations for GCP Project `${GCP_PROJECT_ID}`
- Fetch GKE Cluster Health for GCP Project `${GCP_PROJECT_ID}`
- Check for Quota Related GKE Autoscaling Issues in GCP Project `${GCP_PROJECT_ID}`
- Quick Node Instance Group Health Check for GCP Project `${GCP_PROJECT_ID}`
- Generate GKE Cluster Health Score
Troubleshooting and remediation tasks for GCP Vertex AI Model Garden using the Google Cloud Monitoring Python SDK.
Required IAM Roles:
- roles/monitoring.viewer (for metrics access)
- roles/logging.privateLogViewer (for audit logs access)
- roles/serviceusage.serviceUsageConsumer (for service status checks)
Required Permissions:
- monitoring.timeSeries.list
- logging.privateLogEntries.list
- serviceusage.services.list
Tasks:
- Discover All Deployed Vertex AI Models in `GCP_PROJECT_ID`
- Analyze Vertex AI Model Garden Error Patterns and Response Codes in `GCP_PROJECT_ID`
- Investigate Vertex AI Model Latency Performance Issues in `GCP_PROJECT_ID`
- Monitor Vertex AI Throughput and Token Consumption Patterns in `GCP_PROJECT_ID`
- Check Vertex AI Model Garden API Logs for Issues in `GCP_PROJECT_ID`
- Check Vertex AI Model Garden Service Health and Quotas in `GCP_PROJECT_ID`
- Generate Vertex AI Model Garden Health Summary and Next Steps for `GCP_PROJECT_ID`
- Generate Normalized Health Report Table for `GCP_PROJECT_ID`
Calculates an SLI for GCP Vertex AI Model Garden health using the Google Cloud Monitoring Python SDK.
Required IAM Roles:
- roles/monitoring.viewer (for metrics access)
- roles/logging.privateLogViewer (for quick log health check)
Required Permissions:
- monitoring.timeSeries.list
- logging.privateLogEntries.list
Tasks:
- Quick Vertex AI Log Health Check for `${GCP_PROJECT_ID}`
- Calculate Error Rate Score for `${GCP_PROJECT_ID}`
- Calculate Latency Performance Score for `${GCP_PROJECT_ID}`
- Calculate Throughput Usage Score for `${GCP_PROJECT_ID}`
- Discover All Deployed Models for `${GCP_PROJECT_ID}`
- Check Service Availability Score for `${GCP_PROJECT_ID}`
- Generate Final Vertex AI Model Garden Health Score for `${GCP_PROJECT_ID}`
Collects Kong ingress host metrics from GMP on GCP and inspects the results for ingresses with an HTTP error code rate greater than zero
over a configurable duration, then raises issues based on the number of ingresses with error codes.
Tasks:
- Check If Kong Ingress HTTP Error Rate Violates HTTP Error Threshold in GCP Project `GCP_PROJECT_ID`
- Check If Kong Ingress HTTP Request Latency Violates Threshold in GCP Project `GCP_PROJECT_ID`
- Check If Kong Ingress Controller Reports Upstream Errors in GCP Project `GCP_PROJECT_ID`
Troubleshoot GCE Ingress Resources related to GCP HTTP Load Balancer in GKE
Tasks:
- Search For GCE Ingress Warnings in GKE Context `CONTEXT`
- Identify Unhealthy GCE HTTP Ingress Backends in GKE Namespace `NAMESPACE`
- Validate GCP HTTP Load Balancer Configurations in GCP Project `GCP_PROJECT_ID`
- Fetch Network Error Logs from GCP Operations Manager for Ingress Backends in GCP Project `GCP_PROJECT_ID`
- Review GCP Operations Logging Dashboard in GCP project `GCP_PROJECT_ID`
Collects Nginx ingress host controller metrics from GMP on GCP and inspects the results for ingresses with an HTTP error code rate greater than zero
over a configurable duration, then raises issues based on the number of ingresses with error codes.
Tasks:
- Fetch Nginx HTTP Errors From GMP for Ingress `INGRESS_OBJECT_NAME`
- Find Owner and Service Health for Ingress `INGRESS_OBJECT_NAME`
Fetches logs from GCP using a configurable query and raises an issue with details on the most common errors found.
Tasks:
- Inspect GCP Logs For Common Errors in GCP Project `GCP_PROJECT_ID`
This taskset performs comprehensive DNS health monitoring and validation tasks.
Includes DNS zone record validation, broken DNS resolution detection,
forward lookup zone testing, external resolution validation, and latency monitoring.
Provides detailed issue reporting with severity levels and actionable next steps.
Supports multiple FQDNs, zones, and generic DNS monitoring scenarios.
Tasks:
- Check DNS Zone Records
- Detect Broken Record Resolution
- Test Forward Lookup Zones
- External Resolution Validation
- DNS Latency Check
This SLI measures DNS health metrics including resolution success rates,
latency measurements, DNS zone health, and external DNS resolver availability.
Provides binary scoring (0/1) for each metric and calculates an overall DNS health score.
Supports multiple FQDNs, DNS zones, forward lookup zones, and external resolver testing.
Tasks:
- DNS Resolution Success Rate
- DNS Query Latency
- DNS Zone Health
- External DNS Resolver Availability
- Generate DNS Health Score
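The description states that each metric is scored 0 or 1 and that an overall score is then calculated, but it does not say how the per-metric scores are combined. The sketch below assumes an unweighted mean; the function name and check names are illustrative only.

```python
from typing import Mapping


def dns_health_score(checks: Mapping[str, bool]) -> float:
    """Average binary (0/1) check results into one score in [0, 1].

    Assumption: all checks carry equal weight.
    """
    if not checks:
        raise ValueError("at least one check result is required")
    return sum(1 if passed else 0 for passed in checks.values()) / len(checks)


# Hypothetical check results, one per metric named in the task list.
example_score = dns_health_score({
    "resolution_success": True,
    "query_latency": True,
    "zone_health": False,
    "external_resolvers": True,
})
```

With equal weights, one failing check out of four lowers the score to 0.75 rather than to 0, so partial degradation stays visible.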
Identify problems related to GCP Cloud Function deployments
Tasks:
- List Unhealthy Cloud Functions in GCP Project `GCP_PROJECT_ID`
- Get Error Logs for Unhealthy Cloud Functions in GCP Project `GCP_PROJECT_ID`
Count the number of Cloud Functions in an unhealthy state for a GCP Project.
Tasks:
- Count unhealthy GCP Cloud Functions in GCP Project `${GCP_PROJECT_ID}`
Generate a link to the GCP Log Explorer.
Tasks:
- Get GCP Log Dashboard URL For Given Log Query
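As a rough illustration of link generation, the sketch below percent-encodes a log query and places it in a console URL. The URL layout (`/logs/query;query=...?project=...`) is an assumption based on how the Cloud console commonly formats these links, and the function name is hypothetical; the codebundle may build the link differently.

```python
from urllib.parse import quote


def log_explorer_url(project_id: str, query: str) -> str:
    """Build a Log Explorer link for a query (assumed URL layout)."""
    # Encode every reserved character so the query survives inside the path.
    encoded = quote(query, safe="")
    return (
        "https://console.cloud.google.com/logs/query;"
        f"query={encoded}?project={project_id}"
    )


example_url = log_explorer_url("my-project", "severity>=ERROR")
```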
Retrieve the number of results of a GCP Log Explorer query.
Tasks:
- Running GCE Logging Query And Pushing Result Count Metric
Performs a metric query using a PromQL statement on the Ops Suite API
and pushes the result as an SLI metric.
Tasks:
- Run Prometheus Instant Query Against Google Prom API Endpoint
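A minimal sketch of an instant query follows. It assumes the Prometheus-compatible endpoint under `monitoring.googleapis.com`, a bearer access token obtained elsewhere, and the standard Prometheus instant-query response shape. The helper names are hypothetical, and the request is only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_instant_query(project_id: str, promql: str, access_token: str) -> Request:
    """Prepare (but do not send) a PromQL instant query request."""
    base = (
        f"https://monitoring.googleapis.com/v1/projects/{project_id}"
        "/location/global/prometheus/api/v1/query"
    )
    url = f"{base}?{urlencode({'query': promql})}"
    return Request(url, headers={"Authorization": f"Bearer {access_token}"})


def first_sample_value(response_body: dict) -> float:
    """Return the first sample of a vector result as a float."""
    result = response_body["data"]["result"]
    if not result:
        raise ValueError("query returned no samples")
    # Each sample is [timestamp, "value-as-string"].
    return float(result[0]["value"][1])


request = build_instant_query("my-project", "up", "example-token")
```

Sending the request with `urllib.request.urlopen(request)` and decoding the JSON body would yield the dictionary that `first_sample_value` expects.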
Performs a metric query using a Google MQL statement on the Ops Suite API
and pushes the result as an SLI metric.
Tasks:
- Running GCP OpsSuite Metric Query
This codebundle sets up a monitor for a specific region and GCP Product. The monitor periodically checks for
ongoing incidents in the history available at https://status.cloud.google.com/incidents.json, filtered by severity level.
Tasks:
- Get Number of GCP Incidents Affecting My Workspace
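The filtering described above might look like the following sketch, which runs over an already-downloaded incident list. The field names (`end`, `severity`, `affected_products`, `title`) are assumptions about the feed's schema, and the function name and severity ordering are hypothetical. A region filter would follow the same pattern and is omitted for brevity.

```python
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2}


def count_ongoing_incidents(incidents, product, min_severity="low"):
    """Count incidents that are still open, match the product, and meet the severity floor."""
    threshold = SEVERITY_RANK[min_severity]
    count = 0
    for incident in incidents:
        # Assumption: an incident with an "end" timestamp is resolved.
        if incident.get("end"):
            continue
        if SEVERITY_RANK.get(incident.get("severity"), -1) < threshold:
            continue
        titles = {p.get("title") for p in incident.get("affected_products", [])}
        if product in titles:
            count += 1
    return count


# Hypothetical sample records for illustration only.
SAMPLE_INCIDENTS = [
    {"id": "a", "severity": "high",
     "affected_products": [{"title": "Google Kubernetes Engine"}]},
    {"id": "b", "severity": "low", "end": "2024-01-01T00:00:00+00:00",
     "affected_products": [{"title": "Google Kubernetes Engine"}]},
    {"id": "c", "severity": "low",
     "affected_products": [{"title": "Cloud Storage"}]},
]
```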
Run arbitrary gcloud commands and capture the stdout in a report.
Tasks:
- Run Gcloud CLI Command and Push metric
Run arbitrary gcloud commands and parse their output (for example, JSON) to extract a value that is submitted as a metric.
Tasks:
- Run Gcloud CLI Command and Push metric
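To illustrate the parsing step, the sketch below pulls one numeric value out of JSON text such as the stdout of a gcloud command run with JSON output. The dotted-path convention and the function name are hypothetical conveniences, not part of the codebundle.

```python
import json


def extract_metric(stdout: str, path: str) -> float:
    """Walk a dotted path through parsed JSON and return the value as a float.

    Numeric path segments index into lists; other segments are dict keys.
    """
    value = json.loads(stdout)
    for part in path.split("."):
        value = value[int(part)] if isinstance(value, list) else value[part]
    return float(value)


# Hypothetical command output for illustration only.
sample_stdout = '{"quota": {"usage": 7, "limit": 24}}'
usage = extract_metric(sample_stdout, "quota.usage")
```

Converting to `float` at the end means numeric strings in the output are accepted, while a non-numeric value raises an error instead of silently producing a bad metric.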
Uses PromQL on the Ops Suite API to determine the health of a MongoDB database instance
and pushes the result as an SLI metric. Produces a 1 for a healthy resource, or 0 for an unhealthy resource.
Tasks:
- Get Access Token
- Get Instance Status
- Get Connection Utilization Rate
- Get MongoDB Member State Health
- Get MongoDB Replication Lag
- Get MongoDB Queue Size
- Get Assertion Rate
- Generate MongoDB Score
Uses PromQL on the Ops Suite API to determine the health of a Kong managed ingress resource
and pushes the result as an SLI metric. Produces a 1 for a healthy resource, or 0 for an unhealthy resource.
Tasks:
- Get Access Token
- Get HTTP Error Rate
- Get Upstream Health
- Get Request Latency Rate
- Generate Kong Ingress Score