New to Stash? Please start here.

Monitoring Stash

Stash has native support for monitoring via Prometheus. You can use either the builtin Prometheus scraper or the CoreOS Prometheus Operator to monitor Stash. This tutorial shows how monitoring works with Stash and how to enable it.

Overview

Stash uses Prometheus PushGateway to export the metrics for backup and recovery operations. The following diagram shows the logical structure of the Stash monitoring flow.

  Monitoring Structure

The Stash operator runs two containers. The operator container runs the controller and other necessary components, and the pushgateway container runs the prom/pushgateway image. Stash sidecars from different workloads push their metrics to this pushgateway. The Prometheus server then scrapes these metrics through the stash-operator service. The Stash operator itself also exposes some metrics at the /metrics path on port 8443.
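As a rough illustration of the last point, the operator's own endpoint can be addressed as sketched below. Port 8443 and the /metrics path come from the description above; the helper name `operator_metrics_url`, the kube-system namespace, the service port used for forwarding, and the use of `curl -k` are assumptions for illustration only. Your cluster may also require authentication on this endpoint.

```shell
# Hypothetical helper: build the URL of the operator's own metrics endpoint.
# Port 8443 and /metrics are taken from the text; the host is whatever
# address the stash-operator service is reachable on (default: localhost).
operator_metrics_url() {
  local host="${1:-localhost}"
  printf 'https://%s:8443/metrics\n' "$host"
}

# Possible usage (not executed here; namespace and service port are assumptions):
#   kubectl port-forward -n kube-system svc/stash-operator 8443:8443 &
#   curl -k "$(operator_metrics_url localhost)"
```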

Backup & Recovery Metrics

The following metrics are available for Stash backup and recovery operations. These metrics are accessible through the pushgateway endpoint of the stash-operator service.

| Metric | Uses |
|---|---|
| restic_session_success | Indicates if session was successfully completed |
| restic_session_fail | Indicates if session failed |
| restic_session_duration_seconds_total | Seconds taken to complete restic session for all FileGroups |
| restic_session_duration_seconds | Seconds taken to complete restic session for a FileGroup |
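When inspecting the pushgateway output by hand, it can help to keep only the four series listed above. The following is a minimal sketch that filters Prometheus text-format output read from standard input. The function name `filter_restic_metrics` is hypothetical, and the pushgateway address in the usage comment is a placeholder to fill in for your cluster.

```shell
# Hypothetical helper: keep only the Stash backup/recovery series from
# Prometheus text-format metrics on stdin. Comment lines (# HELP, # TYPE)
# and unrelated series are dropped.
filter_restic_metrics() {
  # "|| true" keeps the exit status zero when no series match.
  grep -E '^restic_session_(success|fail|duration_seconds(_total)?)([{ ]|$)' || true
}

# Possible usage (not executed here; the address is a placeholder):
#   curl -s http://<pushgateway-address>/metrics | filter_restic_metrics
```

The pattern anchors on the full metric name, so a differently named series that merely starts with `restic_session_` is not picked up.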

Operator Metrics

The following metrics are available for the Stash operator. These metrics are accessible through the api endpoint of the stash-operator service.

  • apiserver_audit_event_total
  • apiserver_client_certificate_expiration_seconds_bucket
  • apiserver_client_certificate_expiration_seconds_count
  • apiserver_client_certificate_expiration_seconds_sum
  • apiserver_current_inflight_requests
  • apiserver_request_count
  • apiserver_request_latencies_bucket
  • apiserver_request_latencies_count
  • apiserver_request_latencies_sum
  • apiserver_request_latencies_summary
  • apiserver_request_latencies_summary_count
  • apiserver_request_latencies_summary_sum
  • apiserver_storage_data_key_generation_failures_total
  • apiserver_storage_data_key_generation_latencies_microseconds_bucket
  • apiserver_storage_data_key_generation_latencies_microseconds_count
  • apiserver_storage_data_key_generation_latencies_microseconds_sum
  • apiserver_storage_envelope_transformation_cache_misses_total
  • authenticated_user_requests
  • etcd_helper_cache_entry_count
  • etcd_helper_cache_hit_count
  • etcd_helper_cache_miss_count
  • etcd_request_cache_add_latencies_summary
  • etcd_request_cache_add_latencies_summary_count
  • etcd_request_cache_add_latencies_summary_sum
  • etcd_request_cache_get_latencies_summary
  • etcd_request_cache_get_latencies_summary_count
  • etcd_request_cache_get_latencies_summary_sum

How to Enable Monitoring

You can enable monitoring through flags when installing, upgrading, or updating Stash via either the script or Helm. You can also choose which monitoring agent to use. Stash will configure the respective resources accordingly. The available flags and their uses are listed below:

| Script Flag | Helm Values | Acceptable Values | Default | Uses |
|---|---|---|---|---|
| --monitoring-agent | monitoring.agent | prometheus.io/builtin or prometheus.io/coreos-operator | none | Specify which monitoring agent to use for monitoring Stash. |
| --monitoring-backup | monitoring.backup | true or false | false | Specify whether to monitor Stash backup and recovery. |
| --monitoring-operator | monitoring.operator | true or false | false | Specify whether to monitor Stash operator. |
| --prometheus-namespace | monitoring.prometheus.namespace | any namespace | same namespace as Stash operator | Specify the namespace where Prometheus server is running or will be deployed. |
| --servicemonitor-label | monitoring.serviceMonitor.labels | any label | For Helm installation, app: <generated app name> and release: <release name>. For script installation, app: stash | Specify the labels for ServiceMonitor. Prometheus crd will select ServiceMonitor using these labels. Only usable when monitoring agent is prometheus.io/coreos-operator. |

You have to provide these flags while installing, upgrading, or updating Stash. The following examples, for both Helm and script installation, enable monitoring with a prometheus.io/coreos-operator Prometheus server for backup & recovery and operator metrics.

Helm:

$ helm install appscode/stash --name stash-operator --version 0.8.2 --namespace kube-system \
  --set monitoring.agent=prometheus.io/coreos-operator \
  --set monitoring.backup=true \
  --set monitoring.operator=true \
  --set monitoring.prometheus.namespace=monitoring \
  --set monitoring.serviceMonitor.labels.k8s-app=prometheus

Script:

$ curl -fsSL https://raw.githubusercontent.com/appscode/stash/0.8.2/hack/deploy/stash.sh  | bash -s -- \
  --monitoring-agent=prometheus.io/coreos-operator \
  --monitoring-backup=true \
  --monitoring-operator=true \
  --prometheus-namespace=monitoring \
  --servicemonitor-label=k8s-app=prometheus
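A mistyped agent name or boolean is easy to miss in a long command line. The sketch below checks the three core values against the acceptable values documented in the flag table before printing the script-style flags. The function `stash_monitoring_flags` is a hypothetical helper, not part of Stash or its installer.

```shell
# Hypothetical helper: validate monitoring settings against the documented
# acceptable values, then print the corresponding script flags on one line.
stash_monitoring_flags() {
  local agent="$1" backup="${2:-false}" operator="${3:-false}" value

  # --monitoring-agent accepts exactly these two agents.
  case "$agent" in
    prometheus.io/builtin|prometheus.io/coreos-operator) ;;
    *) echo "unsupported monitoring agent: $agent" >&2; return 1 ;;
  esac

  # --monitoring-backup and --monitoring-operator accept only true or false.
  for value in "$backup" "$operator"; do
    case "$value" in
      true|false) ;;
      *) echo "expected true or false, got: $value" >&2; return 1 ;;
    esac
  done

  printf '%s\n' "--monitoring-agent=$agent --monitoring-backup=$backup --monitoring-operator=$operator"
}
```

The printed flags could then be appended after `bash -s --` in the script command, alongside any `--prometheus-namespace` and `--servicemonitor-label` values you need.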

Next Steps

  • Learn how to monitor Stash using built-in Prometheus from here.
  • Learn how to monitor Stash using CoreOS Prometheus Operator from here.
  • Learn how to use Grafana dashboard to visualize monitoring data from here.