GMP

Icon

Icon 1 1 Troubleshooting Commands

Icon 2 Contributed by Shea Stewart

Icon 2 Codecollection: rw-public-codecollection


Performs a metric query using a PromQL statement on the Ops Suite API and pushes the result as an SLI metric.

Tasks:
  • Run Prometheus Instant Query Against Google Prom API Endpoint

Icon 1 8 Troubleshooting Commands

Icon 2 Contributed by Shea Stewart

Icon 2 Codecollection: rw-public-codecollection


Uses promql on the Ops Suite API to determine the health of a MongoDB database instance and pushes the result as an SLI metric. Produces a 1 for a healthy resource, or 0 for an unhealthy resource.

Tasks:
  • Get Access Token
  • Get Instance Status
  • Get Connection Utilization Rate
  • Get MongoDB Member State Health
  • Get MongoDB Replication Lag
  • Get MongoDB Queue Size
  • Get Assertion Rate
  • Generate MongoDB Score

Icon 1 5 Troubleshooting Commands

Icon 2 Contributed by Shea Stewart

Icon 2 Codecollection: rw-public-codecollection


Uses promql on the Ops Suite API to determine the health of a Kong managed ingress resource and pushes the result as an SLI metric. Produces a 1 for a healthy resource, or 0 for an unhealthy resource.

Tasks:
  • Get Access Token
  • Get HTTP Error Rate
  • Get Upstream Health
  • Get Request Latency Rate
  • Generate Kong Ingress Score

Icon 1 3 Troubleshooting Commands

Icon 2 Contributed by stewartshea

Icon 2 Codecollection: rw-cli-codecollection


Collects Kong ingress host metrics from GMP on GCP and inspects the results for ingress with a HTTP error code rate greater than zero over a configurable duration and raises issues based on the number of ingress with error codes.

Tasks:
  • Check If Kong Ingress HTTP Error Rate Violates HTTP Error Threshold Show More
    Common scenarios that might relate to this command or script:
    1. Troubleshooting Kubernetes CrashLoopBackoff events: A DevOps or Site Reliability Engineer may use this command to monitor HTTP error rates for a specific service running on Google Cloud, in order to identify and resolve any issues causing the CrashLoopBackoff events. 2. Investigating performance issues: If there are performance issues with a service running on Google Cloud, the DevOps or Site Reliability Engineer may utilize this command to monitor and analyze the HTTP error rates for the service, and identify any potential bottlenecks or issues affecting performance. 3. Conducting routine monitoring and analysis: As part of regular maintenance and monitoring tasks, the DevOps or Site Reliability Engineer may use this command to periodically check and track HTTP error rates for specific services running on Google Cloud, in order to ensure that they are operating efficiently and within expected parameters. 4. Incident response and troubleshooting: In the event of an incident or outage involving a specific service on Google Cloud, the DevOps or Site Reliability Engineer can use this command to quickly gather information about HTTP error rates and help diagnose the root cause of the issue. 5. Performance optimization and capacity planning: When optimizing the performance of services running on Google Cloud or planning for future capacity needs, the DevOps or Site Reliability Engineer may use this command to gather data on HTTP error rates and make informed decisions about resource allocation and infrastructure adjustments.
  • Check If Kong Ingress HTTP Request Latency Violates Threshold Show More
    Common scenarios that might relate to this command or script:
    1. Troubleshooting a Kubernetes CrashLoopBackoff event: DevOps or SRE may need to use this command to monitor the request latency for services running on Kubernetes to identify if any service is causing the crash loop. 2. Investigating performance issues in a microservices architecture: The command can be used to check the HTTP request latency for different services and identify any potential bottlenecks or performance issues. 3. Monitoring and alerting for SLA violations: DevOps or SRE may use this command as part of their monitoring and alerting systems to detect if any service is not meeting its Service Level Agreement (SLA) in terms of request latency. 4. Conducting regular performance checks and optimizations: This command can be automated to run at regular intervals to proactively identify and optimize the request latency for different services, improving overall system performance. 5. Investigating customer-reported performance complaints: If customers report slow response times from a particular service, DevOps or SRE can use this command to investigate and validate the reported performance issues.
  • Check If Kong Ingress Controller Reports Upstream Errors Show More
    Common scenarios that might relate to this command or script:
    1. Monitoring and troubleshooting Kubernetes CrashLoopBackoff events to identify the root cause of application crashes and implement remediation strategies. 2. Checking the health of a particular service in Google Cloud Platform and identifying any issues or anomalies in its performance or availability. 3. Investigating and resolving issues with service accounts in Google Cloud Platform, such as permissions errors or misconfigurations. 4. Automating the monitoring and health checks of multiple services in a Kubernetes cluster to ensure continuous availability and performance. 5. Integrating this command into a larger incident response and alerting system to proactively detect and address potential service disruptions or failures.

Icon 1 2 Troubleshooting Commands

Icon 2 Contributed by jon-funk

Icon 2 Codecollection: rw-cli-codecollection


Collects Nginx ingress host controller metrics from GMP on GCP and inspects the results for ingress with a HTTP error code rate greater than zero over a configurable duration and raises issues based on the number of ingress with error codes.

Tasks:
  • Fetch Nginx HTTP Errors From GMP for Ingress `INGRESS_OBJECT_NAME`
  • Find Owner and Service Health for Ingress `INGRESS_OBJECT_NAME`