Rules

container_cpu_usage_is_high

alert: POD_CPU_IS_HIGH
expr: sum by(container, pod, namespace) (rate(container_cpu_usage_seconds_total{container!=""}[5m])) * 100 > 90
for: 1m
labels:
  severity: critical
annotations:
  description: Container {{ $labels.container }} CPU usage inside POD {{ $labels.pod }} is high in {{ $labels.namespace }}
  summary: POD {{ $labels.pod }} CPU Usage is high in {{ $labels.namespace }}
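
For reference, a group like this normally lives in a rule file loaded through Prometheus's rule_files setting (or a PrometheusRule resource when the operator is used). A minimal sketch of that wrapping, assuming an illustrative file named spinnaker-alerts.yaml; the file name and group wiring are assumptions, only the rule body comes from the listing above:

  # spinnaker-alerts.yaml -- illustrative rule file layout
  groups:
    - name: container_cpu_usage_is_high
      rules:
        - alert: POD_CPU_IS_HIGH
          expr: sum by(container, pod, namespace) (rate(container_cpu_usage_seconds_total{container!=""}[5m])) * 100 > 90
          for: 1m
          labels:
            severity: critical
          annotations:
            description: Container {{ $labels.container }} CPU usage inside POD {{ $labels.pod }} is high in {{ $labels.namespace }}
            summary: POD {{ $labels.pod }} CPU Usage is high in {{ $labels.namespace }}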

container_memory_usage_is_high

alert: POD_MEMORY_USAGE_IS_HIGH
expr: (sum by(container, pod, namespace) (container_memory_working_set_bytes{container!=""}) / sum by(container, pod, namespace) (container_spec_memory_limit_bytes > 0) * 100) > 80
for: 1m
labels:
  severity: critical
annotations:
  description: |-
    Container Memory usage is above 80%
    VALUE = {{ $value }}
    LABELS = {{ $labels }}
  summary: Container {{ $labels.container }} Memory usage inside POD {{ $labels.pod }} is high in {{ $labels.namespace }}
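
The denominator of this rule is the container's configured memory limit, so containers without a limit report 0 and are filtered out by the > 0 guard. A minimal sketch of the pod spec fragment that gives container_spec_memory_limit_bytes a non-zero value; the pod and container names here are hypothetical:

  # Illustrative only: resources.limits.memory is what cAdvisor exposes
  # as container_spec_memory_limit_bytes for this container.
  apiVersion: v1
  kind: Pod
  metadata:
    name: example-pod          # hypothetical name
  spec:
    containers:
      - name: example-app      # hypothetical name
        image: example/app:latest
        resources:
          limits:
            memory: "512Mi"    # without this, the rule's ratio has no denominator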

node_cpu_greater_than_80

alert: NODE_CPU_IS_HIGH
expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 90
for: 1m
labels:
  severity: critical
annotations:
  description: node {{ $labels.kubernetes_node }} cpu is high
  summary: node cpu is greater than 90 percent

node_disk_space_too_low

alert: NODE_DISK_SPACE_IS_LOW
expr: (100 * ((node_filesystem_avail_bytes{fstype!="rootfs",mountpoint="/"}) / (node_filesystem_size_bytes{fstype!="rootfs",mountpoint="/"}))) < 10
for: 1m
labels:
  severity: critical
annotations:
  description: node {{ $labels.node }} disk space is only {{ printf "%0.2f" $value }}% free.
  summary: node disk space remaining is less than 10 percent

node_down

alert: NODE_DOWN
expr: up{component="node-exporter"} == 0
for: 3m
labels:
  severity: warning
annotations:
  description: '{{ $labels.job }} job failed to scrape instance {{ $labels.instance }} for more than 3 minutes. Node seems to be down'
  summary: Node {{ $labels.kubernetes_node }} is down
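
A rule like NODE_DOWN can be exercised offline with promtool's rule unit tests before it is loaded into Prometheus. A minimal sketch, assuming the rule is saved in a file named node-rules.yaml; the file names, the instance label, and the sample series are illustrative assumptions:

  # node_down_test.yaml -- run with: promtool test rules node_down_test.yaml
  rule_files:
    - node-rules.yaml              # assumed file holding the NODE_DOWN rule
  evaluation_interval: 1m
  tests:
    - interval: 1m
      input_series:
        # node-exporter target stops responding from t=0 onwards
        - series: 'up{component="node-exporter", job="node-exporter", instance="10.0.0.1:9100"}'
          values: '0 0 0 0 0'
      alert_rule_test:
        - eval_time: 4m            # past the 3m "for" duration, so the alert should be firing
          alertname: NODE_DOWN
          exp_alerts:
            - exp_labels:
                severity: warning
                component: node-exporter
                job: node-exporter
                instance: 10.0.0.1:9100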

node_memory_left_lessser_than_10

alert: NODE_MEMORY_LESS_THAN_10%
expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 10
for: 1m
labels:
  severity: critical
annotations:
  description: node {{ $labels.kubernetes_node }} memory left is low
  summary: node memory left is less than 10 percent

Front50-cache

alert: front50:storageServiceSupport:cacheAge__value
expr: front50:storageServiceSupport:cacheAge__value > 300000
for: 2m
labels:
  severity: warning
annotations:
  description: front50 cacheAge for {{ $labels.pod }} in namespace {{ $labels.namespace }} has value = {{ $value }}
  summary: front50 cacheAge too high

autopilot-component-jvm-errors

alert: jvm-memory-filling-up-for-oes-audit-client
expr: (sum by(instance, kubernetes_pod_name, component, kubernetes_namespace) (jvm_memory_used_bytes{app="oes",area="heap",component="auditclient"}) / sum by(instance, kubernetes_pod_name, component, kubernetes_namespace) (jvm_memory_max_bytes{app="oes",area="heap",component="auditclient"})) * 100 > 90
for: 5m
labels:
  severity: warning
annotations:
  description: |-
    JVM memory is filling up for {{ $labels.kubernetes_pod_name }} in namespace {{ $labels.kubernetes_namespace }} (> 90%)
    VALUE = {{ $value }}
  summary: JVM memory filling up for {{ $labels.component }} for pod {{ $labels.kubernetes_pod_name }} in namespace {{ $labels.kubernetes_namespace }}

alert: jvm-memory-filling-up-for-oes-autopilot
expr: (sum by(instance, kubernetes_pod_name, component, kubernetes_namespace) (jvm_memory_used_bytes{app="oes",area="heap",component="autopilot"}) / sum by(instance, kubernetes_pod_name, component, kubernetes_namespace) (jvm_memory_max_bytes{app="oes",area="heap",component="autopilot"})) * 100 > 90
for: 5m
labels:
  severity: warning
annotations:
  description: |-
    JVM memory is filling up for {{ $labels.kubernetes_pod_name }} in namespace {{ $labels.kubernetes_namespace }} (> 90%)
    VALUE = {{ $value }}
  summary: JVM memory filling up for {{ $labels.component }} for pod {{ $labels.kubernetes_pod_name }} in namespace {{ $labels.kubernetes_namespace }}

alert: jvm-memory-filling-up-for-oes-dashboard
expr: (sum by(instance, kubernetes_pod_name, component, kubernetes_namespace) (jvm_memory_used_bytes{app="oes",area="heap",component="dashboard"}) / sum by(instance, kubernetes_pod_name, component, kubernetes_namespace) (jvm_memory_max_bytes{app="oes",area="heap",component="dashboard"})) * 100 > 90
for: 5m
labels:
  severity: warning
annotations:
  description: |-
    JVM memory is filling up for {{ $labels.kubernetes_pod_name }} in namespace {{ $labels.kubernetes_namespace }} (> 90%)
    VALUE = {{ $value }}
  summary: JVM memory filling up for {{ $labels.component }} for pod {{ $labels.kubernetes_pod_name }} in namespace {{ $labels.kubernetes_namespace }}

alert: jvm-memory-filling-up-for-oes-platform
expr: (sum by(instance, kubernetes_pod_name, component, kubernetes_namespace) (jvm_memory_used_bytes{app="oes",area="heap",component="platform"}) / sum by(instance, kubernetes_pod_name, component, kubernetes_namespace) (jvm_memory_max_bytes{app="oes",area="heap",component="platform"})) * 100 > 90
for: 5m
labels:
  severity: warning
annotations:
  description: |-
    JVM memory is filling up for {{ $labels.kubernetes_pod_name }} in namespace {{ $labels.kubernetes_namespace }} (> 90%)
    VALUE = {{ $value }}
  summary: JVM memory filling up for {{ $labels.component }} for pod {{ $labels.kubernetes_pod_name }} in namespace {{ $labels.kubernetes_namespace }}

alert: jvm-memory-filling-up-for-oes-sapor
expr: (sum by(instance, kubernetes_pod_name, component, kubernetes_namespace) (jvm_memory_used_bytes{app="oes",area="heap",component="sapor"}) / sum by(instance, kubernetes_pod_name, component, kubernetes_namespace) (jvm_memory_max_bytes{app="oes",area="heap",component="sapor"})) * 100 > 90
for: 5m
labels:
  severity: warning
annotations:
  description: |-
    JVM memory is filling up for {{ $labels.kubernetes_pod_name }} in namespace {{ $labels.kubernetes_namespace }} (> 90%)
    VALUE = {{ $value }}
  summary: JVM memory filling up for {{ $labels.component }} for pod {{ $labels.kubernetes_pod_name }} in namespace {{ $labels.kubernetes_namespace }}

alert: jvm-memory-filling-up-for-oes-visibility
expr: (sum by(instance, kubernetes_pod_name, component, kubernetes_namespace) (jvm_memory_used_bytes{app="oes",area="heap",component="visibility"}) / sum by(instance, kubernetes_pod_name, component, kubernetes_namespace) (jvm_memory_max_bytes{app="oes",area="heap",component="visibility"})) * 100 > 90
for: 5m
labels:
  severity: warning
annotations:
  description: |-
    JVM memory is filling up for {{ $labels.kubernetes_pod_name }} in namespace {{ $labels.kubernetes_namespace }} (> 90%)
    VALUE = {{ $value }}
  summary: JVM memory filling up for {{ $labels.component }} for pod {{ $labels.kubernetes_pod_name }} in namespace {{ $labels.kubernetes_namespace }}

autopilot-component-latency-too-high

alert: oes-audit-client-latency-too-high
expr: sum by(kubernetes_pod_name, method, outcome, status, component, kubernetes_namespace, uri) (rate(http_server_requests_seconds_sum{component="auditclient"}[2m])) / sum by(kubernetes_pod_name, method, outcome, status, component, kubernetes_namespace, uri) (rate(http_server_requests_seconds_count{component="auditclient"}[2m])) > 0.5
for: 2m
labels:
  severity: warning
annotations:
  description: Latency of the component {{ $labels.component }} is {{ $value }} seconds for {{ $labels }}
  summary: Latency of the component {{ $labels.component }} in namespace {{ $labels.kubernetes_namespace }} is high

alert: oes-autopilot-latency-too-high
expr: sum by(kubernetes_pod_name, method, outcome, status, component, kubernetes_namespace, uri) (rate(http_server_requests_seconds_sum{component="autopilot"}[2m])) / sum by(kubernetes_pod_name, method, outcome, status, component, kubernetes_namespace, uri) (rate(http_server_requests_seconds_count{component="autopilot"}[2m])) > 0.5
for: 2m
labels:
  severity: warning
annotations:
  description: Latency of the component {{ $labels.component }} is {{ $value }} seconds for {{ $labels }}
  summary: Latency of the component {{ $labels.component }} in namespace {{ $labels.kubernetes_namespace }} is high

alert: oes-dashboard-latency-too-high
expr: sum by(kubernetes_pod_name, method, outcome, status, component, kubernetes_namespace, uri) (rate(http_server_requests_seconds_sum{component="dashboard"}[2m])) / sum by(kubernetes_pod_name, method, outcome, status, component, kubernetes_namespace, uri) (rate(http_server_requests_seconds_count{component="dashboard"}[2m])) > 0.5
for: 2m
labels:
  severity: warning
annotations:
  description: Latency of the component {{ $labels.component }} is {{ $value }} seconds for {{ $labels }}
  summary: Latency of the component {{ $labels.component }} in namespace {{ $labels.kubernetes_namespace }} is high

alert: oes-platform-latency-too-high
expr: sum by(kubernetes_pod_name, method, outcome, status, component, kubernetes_namespace, uri) (rate(http_server_requests_seconds_sum{component="platform"}[2m])) / sum by(kubernetes_pod_name, method, outcome, status, component, kubernetes_namespace, uri) (rate(http_server_requests_seconds_count{component="platform"}[2m])) > 0.5
for: 2m
labels:
  severity: warning
annotations:
  description: Latency of the component {{ $labels.component }} is {{ $value }} seconds for {{ $labels }}
  summary: Latency of the component {{ $labels.component }} in namespace {{ $labels.kubernetes_namespace }} is high

alert: oes-sapor-latency-too-high
expr: sum by(kubernetes_pod_name, method, outcome, status, component, kubernetes_namespace, uri) (rate(http_server_requests_seconds_sum{component="sapor"}[2m])) / sum by(kubernetes_pod_name, method, outcome, status, component, kubernetes_namespace, uri) (rate(http_server_requests_seconds_count{component="sapor"}[2m])) > 0.5
for: 2m
labels:
  severity: warning
annotations:
  description: Latency of the component {{ $labels.component }} is {{ $value }} seconds for {{ $labels }}
  summary: Latency of the component {{ $labels.component }} in namespace {{ $labels.kubernetes_namespace }} is high

alert: oes-visibility-latency-too-high
expr: sum by(kubernetes_pod_name, method, outcome, status, component, kubernetes_namespace, uri) (rate(http_server_requests_seconds_sum{component="visibility"}[2m])) / sum by(kubernetes_pod_name, method, outcome, status, component, kubernetes_namespace, uri) (rate(http_server_requests_seconds_count{component="visibility"}[2m])) > 0.5
for: 2m
labels:
  severity: warning
annotations:
  description: Latency of the component {{ $labels.component }} is {{ $value }} seconds for {{ $labels }}
  summary: Latency of the component {{ $labels.component }} in namespace {{ $labels.kubernetes_namespace }} is high

autopilot-scrape-target-is-down

alert: oes-audit-client-scrape-target-is-down
expr: up{component="auditclient"} == 0
labels:
  severity: critical
annotations:
  description: The scrape target endpoint of component {{ $labels.component }} in namespace {{ $labels.kubernetes_namespace }} is down
  summary: oes-audit-client scrape target is down

alert: oes-autopilot-scrape-target-is-down
expr: up{component="autopilot"} == 0
labels:
  severity: critical
annotations:
  description: The scrape target endpoint of component {{ $labels.component }} in namespace {{ $labels.kubernetes_namespace }} is down
  summary: oes-autopilot scrape target is down

alert: oes-dashboard-scrape-target-is-down
expr: up{component="dashboard"} == 0
labels:
  severity: critical
annotations:
  description: The scrape target endpoint of component {{ $labels.component }} in namespace {{ $labels.kubernetes_namespace }} is down
  summary: oes-dashboard scrape target is down

alert: oes-platform-scrape-target-is-down
expr: up{component="platform"} == 0
labels:
  severity: critical
annotations:
  description: The scrape target endpoint of component {{ $labels.component }} in namespace {{ $labels.kubernetes_namespace }} is down
  summary: oes-platform scrape target is down

alert: oes-sapor-scrape-target-is-down
expr: up{component="sapor"} == 0
labels:
  severity: critical
annotations:
  description: The scrape target endpoint of component {{ $labels.component }} in namespace {{ $labels.kubernetes_namespace }} is down
  summary: oes-sapor scrape target is down

alert: oes-visibility-scrape-target-is-down
expr: up{component="visibility"} == 0
labels:
  severity: critical
annotations:
  description: The scrape target endpoint of component {{ $labels.component }} in namespace {{ $labels.kubernetes_namespace }} is down
  summary: oes-visibility scrape target is down

igor-needs-attention

alert: igor-needs-attention
expr: igor:pollingMonitor:itemsOverThreshold__value > 0
labels:
  severity: critical
annotations:
  description: Igor in namespace {{ $labels.namespace }} needs human help
  summary: Igor needs attention

jvm-too-high

alert: clouddriver-rw-pod-may-be-evicted-soon
expr: (sum by(instance, area) (clouddriver_rw:jvm:memory:used__value) / sum by(instance, area) (clouddriver_rw:jvm:memory:max__value)) > 0.9
labels:
  severity: warning
annotations:
  description: Service {{ $labels.service }} in namespace {{ $labels.namespace }} may be evicted soon
  summary: Clouddriver-rw JVM memory too high

alert: clouddriver-ro-pod-may-be-evicted-soon
expr: (sum by(instance, area) (clouddriver_ro:jvm:memory:used__value) / sum by(instance, area) (clouddriver_ro:jvm:memory:max__value)) > 0.9
labels:
  severity: warning
annotations:
  description: Service {{ $labels.service }} in namespace {{ $labels.namespace }} may be evicted soon
  summary: Clouddriver-ro JVM memory too high

alert: clouddriver-caching-pod-may-be-evicted-soon
expr: (sum by(instance, area) (clouddriver_caching:jvm:memory:used__value) / sum by(instance, area) (clouddriver_caching:jvm:memory:max__value)) > 0.9
labels:
  severity: warning
annotations:
  description: Service {{ $labels.service }} in namespace {{ $labels.namespace }} may be evicted soon
  summary: Clouddriver-caching JVM memory too high

alert: gate-pod-may-be-evicted-soon
expr: (sum by(instance, area) (gate:jvm:memory:used__value) / sum by(instance, area) (gate:jvm:memory:max__value)) > 0.9
labels:
  severity: warning
annotations:
  description: Service {{ $labels.service }} in namespace {{ $labels.namespace }} may be evicted soon
  summary: gate JVM memory too high

alert: orca-pod-may-be-evicted-soon
expr: (sum by(instance, area) (orca:jvm:gc:liveDataSize__value) / sum by(instance, area) (orca:jvm:gc:maxDataSize__value)) > 0.9
labels:
  severity: warning
annotations:
  description: Service {{ $labels.service }} in namespace {{ $labels.namespace }} may be evicted soon
  summary: orca JVM memory too high

alert: igor-pod-may-be-evicted-soon
expr: (sum by(instance, area) (igor:jvm:memory:used__value) / sum by(instance, area) (igor:jvm:memory:max__value)) > 0.9
labels:
  severity: warning
annotations:
  description: Service {{ $labels.service }} in namespace {{ $labels.namespace }} may be evicted soon
  summary: igor JVM memory too high

alert: echo-scheduler-pod-may-be-evicted-soon
expr: (sum by(instance, area) (echo_scheduler:jvm:memory:used__value) / sum by(instance, area) (echo_scheduler:jvm:memory:max__value)) > 0.9
labels:
  severity: warning
annotations:
  description: Service {{ $labels.service }} in namespace {{ $labels.namespace }} may be evicted soon
  summary: echo-scheduler JVM memory too high

alert: echo-worker-pod-may-be-evicted-soon
expr: (sum by(instance, area) (echo_worker:jvm:memory:used__value) / sum by(instance, area) (echo_worker:jvm:memory:max__value)) > 0.9
labels:
  severity: warning
annotations:
  description: Service {{ $labels.service }} in namespace {{ $labels.namespace }} may be evicted soon
  summary: echo-worker JVM memory too high

alert: front50-pod-may-be-evicted-soon
expr: (sum by(instance, area) (front50:jvm:memory:used__value) / sum by(instance, area) (front50:jvm:memory:max__value)) > 0.9
labels:
  severity: warning
annotations:
  description: Service {{ $labels.service }} in namespace {{ $labels.namespace }} may be evicted soon
  summary: Front50 JVM memory too high

kube-api-server-is-down

alert: kube-api-server-down
expr: up{job="kubernetes-apiservers"} == 0
for: 2m
labels:
  severity: critical
annotations:
  description: Kubernetes API Server service went down LABELS = {{ $labels }}
  summary: Kube API Server job {{ $labels.job }} is down

kubernetes-api-server-experiencing-high-error-rate

alert: kube-api-server-errors
expr: sum(rate(apiserver_request_total{code=~"^(?:5..)$",job="kubernetes-apiservers"}[2m])) / sum(rate(apiserver_request_total{job="kubernetes-apiservers"}[2m])) * 100 > 3
for: 2m
labels:
  severity: critical
annotations:
  description: |-
    Kubernetes API server is experiencing high error rate
    VALUE = {{ $value }}
    LABELS = {{ $labels }}
  summary: Kubernetes API server errors (instance {{ $labels.instance }})
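
The expression is the share of API server requests returning a 5xx code over the last two minutes. For example, if the API servers are handling 20 requests per second in total and 1 request per second returns a 5xx code, the expression evaluates to 1 / 20 * 100 = 5%, which exceeds the 3% threshold; once that has held for the 2m for window, the alert fires.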

latency-too-high

alert: clouddriver-ro-latency-too-high
expr: sum by(controller, instance, method, success, statusCode, service, namespace) (rate(clouddriver_ro:controller:invocations__total{service="spin-clouddriver-ro"}[5m])) / sum by(controller, instance, method, success, statusCode, service, namespace) (rate(clouddriver_ro:controller:invocations__count_total{service="spin-clouddriver-ro"}[5m])) > 1
for: 15m
labels:
  severity: warning
annotations:
  description: Latency of the Service {{ $labels.service }} is {{ $value }} seconds for {{ $labels }}
  summary: Latency of the service {{ $labels.service }} in namespace {{ $labels.namespace }} is high

alert: clouddriver-rw-latency-too-high
expr: sum by(controller, instance, method, success, statusCode, service, namespace) (rate(clouddriver_rw:controller:invocations__total{service="spin-clouddriver-rw"}[5m])) / sum by(controller, instance, method, success, statusCode, service, namespace) (rate(clouddriver_rw:controller:invocations__count_total{service="spin-clouddriver-rw"}[5m])) > 0.5
for: 15m
labels:
  severity: warning
annotations:
  description: Latency of the Service {{ $labels.service }} is {{ $value }} seconds for {{ $labels }}
  summary: Latency of the service {{ $labels.service }} in namespace {{ $labels.namespace }} is high

alert: clouddriver-caching-latency-too-high
expr: sum by(controller, instance, method, success, statusCode, service, namespace) (rate(clouddriver_caching:controller:invocations__total{service="spin-clouddriver-caching"}[5m])) / sum by(controller, instance, method, success, statusCode, service, namespace) (rate(clouddriver_caching:controller:invocations__count_total{service="spin-clouddriver-caching"}[5m])) > 5
for: 15m
labels:
  severity: warning
annotations:
  description: Latency of the Service {{ $labels.service }} is {{ $value }} seconds for {{ $labels }}
  summary: Latency of the service {{ $labels.service }} in namespace {{ $labels.namespace }} is high

alert: clouddriver_ro_deck-latency-too-high
expr: sum by(controller, instance, method, success, statusCode, service, namespace) (rate(clouddriver_ro_deck:controller:invocations__total{service="spin-clouddriver-ro-deck"}[5m])) / sum by(controller, instance, method, success, statusCode, service, namespace) (rate(clouddriver_ro_deck:controller:invocations__count_total{service="spin-clouddriver-ro-deck"}[5m])) > 5
for: 15m
labels:
  severity: warning
annotations:
  description: Latency of the Service {{ $labels.service }} is {{ $value }} seconds for {{ $labels }}
  summary: Latency of the service {{ $labels.service }} in namespace {{ $labels.namespace }} is high

alert: gate-latency-too-high
expr: sum by(controller, instance, method, success, statusCode, service, namespace) (rate(gate:controller:invocations__total{service="spin-gate"}[5m])) / sum by(controller, instance, method, success, statusCode, service, namespace) (rate(gate:controller:invocations__count_total{service="spin-gate"}[5m])) > 0.5
for: 15m
labels:
  severity: warning
annotations:
  description: Latency of the Service {{ $labels.service }} is {{ $value }} seconds for {{ $labels }}
  summary: Latency of the service {{ $labels.service }} in namespace {{ $labels.namespace }} is high

alert: orca-latency-too-high
expr: sum by(controller, instance, method, success, statusCode, service, namespace) (rate(orca:controller:invocations__total{service="spin-orca"}[5m])) / sum by(controller, instance, method, success, statusCode, service, namespace) (rate(orca:controller:invocations__count_total{service="spin-orca"}[5m])) > 0.5
for: 15m
labels:
  severity: warning
annotations:
  description: Latency of the Service {{ $labels.service }} is {{ $value }} seconds for {{ $labels }}
  summary: Latency of the service {{ $labels.service }} in namespace {{ $labels.namespace }} is high

alert: igor-latency-too-high
expr: sum by(controller, instance, method, success, statusCode, service, namespace) (rate(igor:controller:invocations__total{service="spin-igor"}[5m])) / sum by(controller, instance, method, success, statusCode, service, namespace) (rate(igor:controller:invocations__count_total{service="spin-igor"}[5m])) > 0.5
for: 15m
labels:
  severity: warning
annotations:
  description: Latency of the Service {{ $labels.service }} is {{ $value }} seconds for {{ $labels }}
  summary: Latency of the service {{ $labels.service }} in namespace {{ $labels.namespace }} is high

alert: echo_scheduler-latency-too-high
expr: sum by(controller, instance, method, success, statusCode, service, namespace) (rate(echo_scheduler:controller:invocations__total{service="spin-echo-scheduler"}[5m])) / sum by(controller, instance, method, success, statusCode, service, namespace) (rate(echo_scheduler:controller:invocations__count_total{service="spin-echo-scheduler"}[5m])) > 0.5
for: 15m
labels:
  severity: warning
annotations:
  description: Latency of the Service {{ $labels.service }} is {{ $value }} seconds for {{ $labels }}
  summary: Latency of the service {{ $labels.service }} in namespace {{ $labels.namespace }} is high

alert: echo_worker-latency-too-high
expr: sum by(controller, instance, method, success, statusCode, service, namespace) (rate(echo_worker:controller:invocations__total{service="spin-echo-worker"}[5m])) / sum by(controller, instance, method, success, statusCode, service, namespace) (rate(echo_worker:controller:invocations__count_total{service="spin-echo-worker"}[5m])) > 0.5
for: 15m
labels:
  severity: warning
annotations:
  description: Latency of the Service {{ $labels.service }} is {{ $value }} seconds for {{ $labels }}
  summary: Latency of the service {{ $labels.service }} in namespace {{ $labels.namespace }} is high

alert: front50-latency-too-high
expr: sum by(controller, instance, method, success, statusCode, service, namespace) (rate(front50:controller:invocations__total{service="spin-front50"}[5m])) / sum by(controller, instance, method, success, statusCode, service, namespace) (rate(front50:controller:invocations__count_total{service="spin-front50"}[5m])) > 0.5
for: 15m
labels:
  severity: warning
annotations:
  description: Latency of the Service {{ $labels.service }} is {{ $value }} seconds for {{ $labels }}
  summary: Latency of the service {{ $labels.service }} in namespace {{ $labels.namespace }} is high

alert: fiat-latency-too-high
expr: sum by(controller, instance, method, success, statusCode, service, namespace) (rate(fiat:controller:invocations__total{service="spin-fiat"}[5m])) / sum by(controller, instance, method, success, statusCode, service, namespace) (rate(fiat:controller:invocations__count_total{service="spin-fiat"}[5m])) > 0.5
for: 15m
labels:
  severity: warning
annotations:
  description: Latency of the Service {{ $labels.service }} is {{ $value }} seconds for {{ $labels }}
  summary: Latency of the service {{ $labels.service }} in namespace {{ $labels.namespace }} is high

alert: rosco-latency-too-high
expr: sum by(controller, instance, method, success, statusCode, service, namespace) (rate(rosco:controller:invocations__total{service="spin-rosco"}[5m])) / sum by(controller, instance, method, success, statusCode, service, namespace) (rate(rosco:controller:invocations__count_total{service="spin-rosco"}[5m])) > 0.5
for: 15m
labels:
  severity: warning
annotations:
  description: Latency of the Service {{ $labels.service }} is {{ $value }} seconds for {{ $labels }}
  summary: Latency of the service {{ $labels.service }} in namespace {{ $labels.namespace }} is high

orca-queue-issue

alert: orca-queue-depth-high
expr: (sum by(instance) (orca:queue:ready:depth__value{namespace!=""})) > 10
labels:
  severity: warning
annotations:
  description: Orca queue ready depth is {{ $value }} for {{ $labels.instance }}
  summary: Orca queue depth is high

alert: orca-queue-lag-high
expr: sum by(instance, service, namespace) (rate(orca:controller:invocations__total[2m])) / sum by(instance, service, namespace) (rate(orca:controller:invocations__count_total[2m])) > 0.5
labels:
  severity: warning
annotations:
  description: Service {{ $labels.service }} in namespace {{ $labels.namespace }} has a lag value of {{ $value }}
  summary: Orca queue lag is high

prometheus-job-down

alert: prometheus-job-is-down
expr: up{job="prometheus"} == 0
for: 5m
labels:
  severity: warning
annotations:
  description: Default Prometheus Job is Down LABELS = {{ $labels }}
  summary: The Default Prometheus Job is Down (job {{ $labels.job }})

spinnaker-service-is-down

alert: clouddriver-rw-is-down
expr: up{job="opsmx_spinnaker_metrics",service="spin-clouddriver-rw"} == 0
labels:
  severity: critical
annotations:
  description: Service {{ $labels.service }} with pod name {{ $labels.pod }} in namespace {{ $labels.namespace }} is not responding
  summary: Clouddriver-rw Spinnaker service is down

alert: clouddriver-ro-is-down
expr: up{job="opsmx_spinnaker_metrics",service="spin-clouddriver-ro"} == 0
labels:
  severity: critical
annotations:
  description: Service {{ $labels.service }} with pod name {{ $labels.pod }} in namespace {{ $labels.namespace }} is not responding
  summary: Clouddriver-ro Spinnaker service is down

alert: clouddriver-caching-is-down
expr: up{job="opsmx_spinnaker_metrics",service="spin-clouddriver-caching"} == 0
labels:
  severity: critical
annotations:
  description: Service {{ $labels.service }} with pod name {{ $labels.pod }} in namespace {{ $labels.namespace }} is not responding
  summary: Clouddriver-caching Spinnaker service is down

alert: clouddriver-ro-deck-is-down
expr: up{job="opsmx_spinnaker_metrics",service="spin-clouddriver-ro-deck"} == 0
labels:
  severity: critical
annotations:
  description: Service {{ $labels.service }} with pod name {{ $labels.pod }} in namespace {{ $labels.namespace }} is not responding
  summary: Clouddriver-ro-deck Spinnaker service is down

alert: gate-is-down
expr: up{job="opsmx_spinnaker_metrics",service="spin-gate"} == 0
labels:
  severity: critical
annotations:
  description: Service {{ $labels.service }} with pod name {{ $labels.pod }} in namespace {{ $labels.namespace }} is not responding
  summary: Gate Spinnaker service is down

alert: orca-is-down
expr: up{job="opsmx_spinnaker_metrics",service="spin-orca"} == 0
labels:
  severity: critical
annotations:
  description: Service {{ $labels.service }} with pod name {{ $labels.pod }} in namespace {{ $labels.namespace }} is not responding
  summary: Orca Spinnaker service is down

alert: igor-is-down
expr: up{job="opsmx_spinnaker_metrics",service="spin-igor"} == 0
labels:
  severity: critical
annotations:
  description: Service {{ $labels.service }} with pod name {{ $labels.pod }} in namespace {{ $labels.namespace }} is not responding
  summary: Igor Spinnaker service is down

alert: echo-scheduler-is-down
expr: up{job="opsmx_spinnaker_metrics",service="spin-echo-scheduler"} == 0
labels:
  severity: critical
annotations:
  description: Service {{ $labels.service }} with pod name {{ $labels.pod }} in namespace {{ $labels.namespace }} is not responding
  summary: Echo-Scheduler Spinnaker service is down

alert: echo-worker-is-down
expr: up{job="opsmx_spinnaker_metrics",service="spin-echo-worker"} == 0
labels:
  severity: critical
annotations:
  description: Service {{ $labels.service }} with pod name {{ $labels.pod }} in namespace {{ $labels.namespace }} is not responding
  summary: Echo-worker Spinnaker service is down

alert: front50-is-down
expr: up{job="opsmx_spinnaker_metrics",service="spin-front50"} == 0
labels:
  severity: critical
annotations:
  description: Service {{ $labels.service }} with pod name {{ $labels.pod }} in namespace {{ $labels.namespace }} is not responding
  summary: Front50 Spinnaker service is down

alert: fiat-is-down
expr: up{job="opsmx_spinnaker_metrics",service="spin-fiat"} == 0
labels:
  severity: critical
annotations:
  description: Service {{ $labels.service }} with pod name {{ $labels.pod }} in namespace {{ $labels.namespace }} is not responding
  summary: Fiat Spinnaker service is down

alert: rosco-is-down
expr: up{job="opsmx_spinnaker_metrics",service="spin-rosco"} == 0
labels:
  severity: critical
annotations:
  description: Service {{ $labels.service }} with pod name {{ $labels.pod }} in namespace {{ $labels.namespace }} is not responding
  summary: Rosco Spinnaker service is down

volume-is-almost-full (< 10% left)

alert: pvc-storage-full
expr: kubelet_volume_stats_available_bytes / kubelet_volume_stats_capacity_bytes * 100 < 10
for: 2m
labels:
  severity: warning
annotations:
  description: |-
    Volume is almost full (< 10% left)
    VALUE = {{ $value }}
    LABELS = {{ $labels }}
  summary: Kubernetes Volume running out of disk space (persistentvolumeclaim {{ $labels.persistentvolumeclaim }} in namespace {{ $labels.namespace }})
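
Every rule above carries a severity label of either warning or critical, so Alertmanager can route notifications on that label. A minimal routing sketch keyed on severity; the receiver names and grouping choices are illustrative assumptions, not taken from this installation:

  # Illustrative Alertmanager routing tree keyed on the severity label
  route:
    receiver: default-notifications        # hypothetical receiver
    group_by: ['alertname', 'namespace']
    routes:
      - match:
          severity: critical
        receiver: oncall-pager             # hypothetical receiver
      - match:
          severity: warning
        receiver: team-chat                # hypothetical receiver
  receivers:
    - name: default-notifications
    - name: oncall-pager
    - name: team-chat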