Day 2: Metrics, Monitoring and Prometheus

💁 Metrics vs Monitoring

❇ Metrics - Time-stamped, historical data about events in a system, used to understand the health of that system.

Example: A patient in the ICU is checked periodically, and readings such as blood pressure (BP), heart beat (HB) and blood glucose levels are recorded.

10:00am - HB - 72

10:15am - HB - 78

Each of these time-stamped data points captures the state of the patient (the system) at that moment. The series of such data points is what we call metrics.
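To make "time-stamped data points" concrete, here is a minimal sketch that models the HB readings as a series of (timestamp, value) samples. The calendar date is a hypothetical placeholder (the example only gives times), and `current_state` / `history` are illustrative helper names, not part of any library.

```python
from datetime import datetime

# Hypothetical readings from the ICU example: (timestamp, heart beat) pairs.
# The date is made up; a metric is simply a named series of such samples.
heart_beat = [
    (datetime(2024, 1, 1, 10, 0), 72),
    (datetime(2024, 1, 1, 10, 15), 78),
]


def current_state(series):
    """Return the most recent (timestamp, value) sample of a series."""
    return max(series, key=lambda sample: sample[0])


def history(series, since):
    """Return all samples recorded at or after `since`, oldest first."""
    return sorted(s for s in series if s[0] >= since)


if __name__ == "__main__":
    ts, value = current_state(heart_beat)
    print(f"latest HB at {ts:%H:%M}: {value}")
    print(len(history(heart_beat, datetime(2024, 1, 1, 10, 0))))
```

The latest sample tells you the current state, while the full list is the history you look back on when something goes wrong.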

❇ Monitoring: `Metrics + Visualization + Alerting`

All these metrics are fed to a monitoring platform such as Prometheus, which scrapes them from the system. The data is then visualized in dashboards, and alerts are sent when the metric data meets certain conditions.

Example:

Metrics - CPU utilization of the Kubernetes (k8s) nodes, collected in real time.
Alerting - on abnormalities, e.g. if CPU utilization > 80%, send an alert to Alertmanager.
Visualization - dashboards showing the data as pie charts or graphs.
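The "CPU > 80% → alert" idea can be sketched as a tiny rule evaluator. This is a simplified model, not Prometheus' implementation: it borrows the real alert states (inactive, pending, firing) and mimics the `for` clause of an alerting rule by requiring the condition to hold for several consecutive evaluations. The class name, rule name and CPU samples are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ThresholdAlert:
    """Simplified alert rule: fire when a value stays above `threshold` for
    at least `for_evaluations` consecutive evaluations (loosely modelled on
    the `for` clause of a Prometheus alerting rule)."""

    name: str
    threshold: float
    for_evaluations: int = 1
    _breaches: int = field(default=0, init=False, repr=False)

    def evaluate(self, value: float) -> str:
        """Return 'inactive', 'pending' or 'firing' for the latest sample."""
        if value > self.threshold:
            self._breaches += 1
        else:
            self._breaches = 0  # condition cleared: reset the counter
        if self._breaches == 0:
            return "inactive"
        if self._breaches >= self.for_evaluations:
            return "firing"  # in Prometheus, firing alerts go to Alertmanager
        return "pending"


if __name__ == "__main__":
    rule = ThresholdAlert("HighNodeCPU", threshold=80.0, for_evaluations=2)
    for cpu in [65.0, 85.0, 91.0, 70.0]:  # hypothetical CPU % samples
        print(cpu, rule.evaluate(cpu))
```

With these samples the rule goes inactive, pending, firing, inactive. Requiring two breaches in a row filters out a single short CPU spike, which is the same reason real alerting rules use a `for` duration.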

❇ Prometheus: Open-source monitoring platform, widely used with Kubernetes

Prometheus server:

At its core is the Prometheus server, which is responsible for scraping metrics from different targets and storing them in a time-series database (TSDB).

Components:

1️⃣ Retrieval - handles the scraping of metrics from the different targets.

2️⃣ TSDB - the scraped data is stored in this time-series database. Each series is identified by a metric name plus labels (key-value pairs) and holds (timestamp, value) samples.

3️⃣ HTTP server - provides the web UI and an HTTP API through which PromQL queries are run against the Prometheus server to retrieve the data.

Storage - the TSDB persists the scraped data on local disk (HDD/SSD).
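A rough sketch of how Retrieval and the TSDB fit together: parse one scrape in the Prometheus text exposition format and append each sample to an in-memory store keyed by metric name plus labels. `TinyTSDB` and `parse_exposition` are hypothetical names, the parser is deliberately simplified (no escaped quotes or commas inside label values, explicit timestamps ignored), and the real Prometheus TSDB is far more elaborate (compressed on-disk blocks, a write-ahead log). This only models the data layout.

```python
import re
import time
from collections import defaultdict

# Matches simple exposition-format sample lines such as:
#   node_cpu_seconds_total{cpu="0",mode="idle"} 1234.5
LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_exposition(text):
    """Yield (metric_name, labels_dict, value) for each sample line."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE
            continue
        match = LINE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, raw_value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        yield name, labels, float(raw_value)


class TinyTSDB:
    """In-memory stand-in for a TSDB: a series key (metric name plus sorted
    label pairs) maps to a list of (timestamp, value) samples."""

    def __init__(self):
        self.series = defaultdict(list)

    def ingest(self, text, scrape_time=None):
        """'Retrieval' step: store every sample from one scrape."""
        ts = time.time() if scrape_time is None else scrape_time
        for name, labels, value in parse_exposition(text):
            key = (name, tuple(sorted(labels.items())))
            self.series[key].append((ts, value))

    def latest(self, name, **labels):
        """Return the newest value of one series, or None if unknown."""
        key = (name, tuple(sorted(labels.items())))
        samples = self.series.get(key)
        return samples[-1][1] if samples else None
```

Each scrape adds one sample per series, so repeated scrapes build up exactly the kind of time series described in the metrics section.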

🤔 How does Prometheus pull metrics from its various sources (targets)?

Node-level metrics - Node Exporter runs as a DaemonSet (DS), one pod per node, and exposes node-level metrics such as CPU, memory and disk usage.
K8s resource metrics - kube-state-metrics queries the Kubernetes API server to generate metrics about Kubernetes resources such as Pods, Deployments and Services.
Application-level metrics - developers expose application-specific metrics at a /metrics endpoint.
In each case the source exposes its metrics over HTTP, and Prometheus periodically scrapes (pulls) those endpoints.
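For the application-level case, a minimal sketch of an app exposing its own /metrics endpoint using only Python's standard library. In practice you would normally use an official Prometheus client library; here the metric name `app_requests_total`, the handler, and port 8000 are hypothetical choices. Every non-/metrics request counts as application traffic, and /metrics reports the counter in the text exposition format that Prometheus scrapes.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

_lock = threading.Lock()
_request_count = 0  # hypothetical application metric


def render_metrics(request_count):
    """Render one counter in the Prometheus text exposition format."""
    return (
        "# HELP app_requests_total Total requests served by the app.\n"
        "# TYPE app_requests_total counter\n"
        f"app_requests_total {request_count}\n"
    )


class AppHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        global _request_count
        with _lock:
            if self.path == "/metrics":
                # Prometheus scrapes this path; report the current count.
                body = render_metrics(_request_count).encode()
                content_type = "text/plain; version=0.0.4"
            else:
                # Any other path is "real" application traffic: count it.
                _request_count += 1
                body = b"hello\n"
                content_type = "text/plain"
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block forever, serving the app and its /metrics endpoint."""
    ThreadingHTTPServer(("", port), AppHandler).serve_forever()
```

Calling `serve()` and opening `http://localhost:8000/metrics` should show the counter. A Prometheus scrape config would then list this host and port as a target.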

Alertmanager sends notifications for the alerts fired by the alerting rules configured in Prometheus.
Grafana is used for data visualization, with Prometheus configured as a data source.
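Grafana (and the Prometheus UI) read data through the HTTP server's API. The sketch below sends a PromQL instant query to Prometheus' `/api/v1/query` endpoint and unpacks a vector result. It assumes Prometheus is reachable at `http://localhost:9090` (for example after port forwarding), and the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

PROMETHEUS_URL = "http://localhost:9090"  # assumption: e.g. after port-forwarding


def query_url(base_url, promql):
    """Build an instant-query URL for Prometheus' HTTP API."""
    params = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{params}"


def parse_vector(payload):
    """Turn an instant-vector response into [(labels, value), ...]."""
    if payload.get("status") != "success":
        raise RuntimeError(payload.get("error", "query failed"))
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Each result's "value" is [unix_timestamp, "value as string"].
    return [(item["metric"], float(item["value"][1])) for item in data["result"]]


def instant_query(promql, base_url=PROMETHEUS_URL, timeout=5.0):
    """Run a PromQL instant query (needs a reachable Prometheus server)."""
    with urllib.request.urlopen(query_url(base_url, promql), timeout=timeout) as resp:
        return parse_vector(json.load(resp))
```

For example, `instant_query("up")` would return one entry per scraped target, with value 1.0 for targets that are up.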

❇ All Prometheus needs is targets to monitor. What are those targets?

Targets - a server, an application or application service, or a database service.
What is measured on those targets:
System - memory and disk usage.
Application - number of exceptions, number of requests, and request durations.
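The application units above (requests, exceptions, durations) map naturally onto two counters and a histogram. This is a hypothetical in-process tracker, not a client library: the bucket bounds are arbitrary, and it only illustrates that Prometheus histogram buckets are cumulative (`le="1.0"` counts every observation ≤ 1.0 s, and `le="+Inf"` equals the total count).

```python
import bisect
import time


class RequestMetrics:
    """Hypothetical tracker for request count, exception count and a
    cumulative request-duration histogram."""

    BUCKETS = (0.1, 0.5, 1.0, 2.5)  # upper bounds in seconds (arbitrary)

    def __init__(self):
        self.requests = 0
        self.exceptions = 0
        self.duration_sum = 0.0
        self.bucket_counts = [0] * (len(self.BUCKETS) + 1)  # last slot = +Inf

    def observe(self, duration, failed=False):
        """Record one request of `duration` seconds."""
        self.requests += 1
        if failed:
            self.exceptions += 1
        self.duration_sum += duration
        # bisect_left puts a value equal to a bound into that bucket (le = "<=").
        self.bucket_counts[bisect.bisect_left(self.BUCKETS, duration)] += 1

    def cumulative_buckets(self):
        """Return [(le, count), ...] with cumulative counts."""
        totals, running = [], 0
        bounds = [*map(str, self.BUCKETS), "+Inf"]
        for bound, count in zip(bounds, self.bucket_counts):
            running += count
            totals.append((bound, running))
        return totals

    def track(self, func, *args, **kwargs):
        """Call func, timing it and counting any exception it raises."""
        start = time.perf_counter()
        failed = False
        try:
            return func(*args, **kwargs)
        except Exception:
            failed = True
            raise
        finally:
            self.observe(time.perf_counter() - start, failed)
```

Cumulative buckets let the server estimate percentiles (for example, "what share of requests finished within 0.5 s") without storing every individual duration.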

❇ Installation and configuration:

Step-1: Create an EKS cluster with a managed node group.
Step-2: Associate an IAM OIDC provider with the cluster.
Step-3: Update the kubeconfig file to access the cluster API.
Step-4: Install kube-prometheus-stack using its Helm chart, deploying it into a new namespace "monitoring" (along with the Alertmanager configuration).
Step-5: Verify the installation, then access the Prometheus, Grafana and Alertmanager UIs after port forwarding.
Step-6: Clean up - uninstall the Helm chart, then delete the namespace and the cluster.
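The six steps can be collected into a dry-run checklist that prints one command per step. This is a sketch, not the author's exact procedure. The cluster name, region, release name, node count and the optional values file are hypothetical placeholders. `prometheus-operated` and `alertmanager-operated` are the services the Prometheus Operator creates, and the Grafana service is assumed to be named `<release>-grafana` on port 80. Check with `kubectl get svc -n monitoring` before port forwarding.

```python
import shlex

# Hypothetical names; replace with your own.
CLUSTER, REGION = "prom-demo", "us-east-1"
RELEASE, NAMESPACE = "monitoring", "monitoring"


def walkthrough(cluster=CLUSTER, region=REGION, release=RELEASE,
                ns=NAMESPACE, values_file=None):
    """Return (step, argv) pairs mirroring Steps 1-6."""
    chart = "prometheus-community/kube-prometheus-stack"
    install = ["helm", "install", release, chart,
               "--namespace", ns, "--create-namespace"]
    if values_file:  # e.g. a values file carrying your Alertmanager config
        install += ["-f", values_file]
    return [
        ("Step-1", ["eksctl", "create", "cluster", "--name", cluster,
                    "--region", region, "--nodegroup-name", "workers",
                    "--nodes", "2", "--managed"]),
        ("Step-2", ["eksctl", "utils", "associate-iam-oidc-provider",
                    "--cluster", cluster, "--region", region, "--approve"]),
        ("Step-3", ["aws", "eks", "update-kubeconfig",
                    "--name", cluster, "--region", region]),
        ("Step-4", ["helm", "repo", "add", "prometheus-community",
                    "https://prometheus-community.github.io/helm-charts"]),
        ("Step-4", ["helm", "repo", "update"]),
        ("Step-4", install),
        ("Step-5", ["kubectl", "get", "pods", "-n", ns]),
        # Each port-forward blocks, so run them in separate terminals.
        ("Step-5", ["kubectl", "port-forward", "-n", ns,
                    "svc/prometheus-operated", "9090:9090"]),
        ("Step-5", ["kubectl", "port-forward", "-n", ns,
                    f"svc/{release}-grafana", "3000:80"]),
        ("Step-5", ["kubectl", "port-forward", "-n", ns,
                    "svc/alertmanager-operated", "9093:9093"]),
        ("Step-6", ["helm", "uninstall", release, "-n", ns]),
        ("Step-6", ["kubectl", "delete", "namespace", ns]),
        ("Step-6", ["eksctl", "delete", "cluster", "--name", cluster,
                    "--region", region]),
    ]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for step, argv in walkthrough():
        print(f"{step}: {shlex.join(argv)}")
```

Printing the commands rather than executing them keeps the destructive Step-6 commands under manual control.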