Day 2: Metrics, Monitoring and Prometheus
Metrics vs Monitoring
Metrics - time-stamped historical data about events, used to understand the health of a system.
Example: A patient in the ICU is checked periodically, and vitals such as blood pressure (BP), heart beat (HB) and blood glucose level are recorded.
10:00am - HB - 72
10:15am - HB - 78
These time-stamped data points, which together describe the state of the system over time, are metrics.
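To make "time-stamped data points" concrete, here is a minimal sketch that stores the ICU heart-beat readings as samples. It assumes the Prometheus-style idea that a series is identified by a metric name plus key-value labels; the metric name `patient_heart_rate_bpm`, the label `patient="bed-7"` and the calendar date are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Sample:
    """One time-stamped data point of a metric."""
    timestamp: datetime
    value: float


# A series key = metric name + a set of key-value labels (names here are hypothetical).
SeriesKey = tuple[str, frozenset]


def series_key(name: str, **labels: str) -> SeriesKey:
    return (name, frozenset(labels.items()))


# In-memory store: each series key maps to its list of samples, oldest first.
store: dict[SeriesKey, list[Sample]] = {}


def record(key: SeriesKey, ts: datetime, value: float) -> None:
    store.setdefault(key, []).append(Sample(ts, value))


hr = series_key("patient_heart_rate_bpm", patient="bed-7")
record(hr, datetime(2024, 1, 1, 10, 0), 72.0)   # 10:00am - HB - 72
record(hr, datetime(2024, 1, 1, 10, 15), 78.0)  # 10:15am - HB - 78

latest = store[hr][-1]
print(latest.timestamp.strftime("%H:%M"), latest.value)
```

Each new reading appends to the same series, so the history of the patient's heart beat is kept, not just the current value.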
Monitoring = Metrics + Visualization + Alerting
A monitoring platform such as Prometheus scrapes (pulls) the metrics from the system, stores them, makes them available for visualization in dashboards, and sends alerts when the metric data meets certain conditions.
Example:
Data: real-time CPU utilization of the Kubernetes (k8s) nodes.
Alerting: on abnormalities, e.g. if CPU utilization > 80%, send an alert to Alertmanager.
Visualization: a dashboard showing the data as a pie chart or graph.
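The alerting condition above can be sketched as a small check over recent CPU readings. This is a hypothetical illustration, not how Prometheus is implemented: it assumes an alert should fire only when the last few readings of a node are all above 80%, loosely similar in spirit to a rule that must hold for a while before firing. Node names and the alert name `HighNodeCPU` are made up.

```python
from dataclasses import dataclass

THRESHOLD_PERCENT = 80.0  # from the example: alert when CPU utilization > 80%


@dataclass
class Alert:
    node: str
    value: float


def evaluate(cpu_by_node: dict[str, list[float]], pending_samples: int = 3) -> list[Alert]:
    """Return an alert for each node whose last `pending_samples` readings all exceed the threshold."""
    alerts = []
    for node, readings in cpu_by_node.items():
        recent = readings[-pending_samples:]
        # Require enough history and a sustained breach, so a single spike does not alert.
        if len(recent) == pending_samples and all(v > THRESHOLD_PERCENT for v in recent):
            alerts.append(Alert(node, recent[-1]))
    return alerts


# Hypothetical CPU utilization readings (percent) per node, oldest first.
samples = {
    "node-a": [45.0, 50.0, 48.0],
    "node-b": [82.0, 91.0, 88.5],  # sustained high CPU
    "node-c": [95.0, 40.0, 85.0],  # spikes, but not sustained
}

for alert in evaluate(samples):
    print(f"ALERT HighNodeCPU node={alert.node} cpu={alert.value}%")
```

In a real setup the condition lives in a Prometheus alerting rule, and Alertmanager only takes care of delivering the resulting notifications.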
Prometheus: open-source monitoring and alerting toolkit, widely used for Kubernetes
Prometheus server:
At the core is the Prometheus server, which scrapes metrics from different targets and stores them in a time-series database (TSDB).
Components:
1️⃣ Retrieval - handles scraping metrics from the different targets.
2️⃣ TSDB - stores the scraped data as time series; each series is identified by a metric name plus key-value labels and holds time-stamped values.
3️⃣ HTTP server - provides the web UI and the HTTP API that accept PromQL queries, so users and tools can query the data stored in the Prometheus server.
Storage - the scraped data is stored on the local disk (HDD/SSD) of the Prometheus server.
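The HTTP server component can be queried programmatically. A minimal sketch using only the standard library, assuming Prometheus is reachable at `http://localhost:9090` (for example after port-forwarding) and using the instant-query endpoint `/api/v1/query`. The helper names are hypothetical, and keying results by the `instance` label is a simplification for this example.

```python
import json
import urllib.parse
import urllib.request

PROMETHEUS_URL = "http://localhost:9090"  # assumption: e.g. after port-forwarding the Prometheus service


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the /api/v1/query endpoint."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def parse_vector(body: str) -> dict[str, float]:
    """Map each series' 'instance' label to its value from an instant-vector JSON response."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    result = {}
    for series in payload["data"]["result"]:
        _timestamp, value = series["value"]  # [unix_time, "value as string"]
        result[series["metric"].get("instance", "")] = float(value)
    return result


def instant_query(promql: str, base_url: str = PROMETHEUS_URL) -> dict[str, float]:
    """Send a PromQL query to a running Prometheus server and parse the result."""
    with urllib.request.urlopen(build_query_url(base_url, promql), timeout=5) as resp:
        return parse_vector(resp.read().decode("utf-8"))
```

Against a running server, `instant_query("up")` would return one entry per scraped target, with `1.0` for targets whose last scrape succeeded and `0.0` otherwise.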
How does Prometheus pull metrics from its various sources (targets)?
Node-level metrics - Node Exporter runs as a DaemonSet (one pod on every node) and exposes node-level metrics such as CPU, memory and disk usage.
Kubernetes resource metrics - kube-state-metrics queries the Kubernetes API server to generate metrics about Kubernetes objects such as Pods, Deployments and Services.
Application level metrics - Developers expose application-specific metrics at the /metrics endpoint. Prometheus scrapes these endpoints.
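Alerting - Prometheus evaluates the alerting rules configured in it and sends firing alerts to Alertmanager, which groups them and routes the notifications (e.g. email, Slack).
Visualization - Grafana displays the data, with Prometheus configured as a data source.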
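The pull model can be demonstrated end to end with the standard library: an application serves `/metrics` in the Prometheus text format, and a scraper pulls and parses it, roughly what the retrieval component does. This is a simplified sketch: the metric `app_requests_total` and its values are hypothetical, and the parser ignores optional timestamps and other format details. In practice you would use an official client library (such as `prometheus_client` for Python) rather than rendering the text by hand.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical application metric: requests handled, per HTTP method.
REQUESTS_TOTAL = {"GET": 0, "POST": 0}


def render_metrics() -> str:
    """Render the counter in the Prometheus text exposition format."""
    lines = [
        "# HELP app_requests_total Total HTTP requests handled.",
        "# TYPE app_requests_total counter",
    ]
    for method, count in sorted(REQUESTS_TOTAL.items()):
        lines.append(f'app_requests_total{{method="{method}"}} {count}')
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):  # keep the demo output quiet
        pass


def scrape(url: str) -> dict[str, float]:
    """Pull the target once and parse sample lines (simplified: assumes no timestamps)."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        text = resp.read().decode("utf-8")
    samples = {}
    for line in text.splitlines():
        if line and not line.startswith("#"):
            series, value = line.rsplit(" ", 1)
            samples[series] = float(value)
    return samples


# Start the "application" on a free local port, then scrape it like Prometheus would.
server = HTTPServer(("127.0.0.1", 0), MetricsHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
REQUESTS_TOTAL["GET"] += 3  # pretend the app handled three GET requests

scraped = scrape(f"http://127.0.0.1:{server.server_address[1]}/metrics")
print(scraped)

server.shutdown()
server.server_close()
```

Note the direction: the application only exposes its current values, and the scraper decides when to collect them. That is the pull model Prometheus uses for Node Exporter, kube-state-metrics and application `/metrics` endpoints alike.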
All Prometheus needs is targets to monitor. What are those targets?
Targets - anything Prometheus monitors: a server, an application or application service, or a database service.
Units - the specific things measured on those targets:
System - memory and disk usage
Application - number of exceptions, number of requests and request durations
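The application units listed above (requests, exceptions, durations) are typically collected by instrumenting code. A minimal, library-free sketch, assuming a hypothetical handler `get_order` and in-process dictionaries as the metric store. It keeps a duration sum next to the request count, mirroring the `_sum`/`_count` pair that Prometheus summaries and histograms expose, so an average duration is sum divided by count.

```python
import functools
import time
from collections import defaultdict

# Hypothetical in-process metric stores, keyed by handler name.
requests_total = defaultdict(int)
exceptions_total = defaultdict(int)
duration_seconds_sum = defaultdict(float)


def instrumented(func):
    """Count calls, count raised exceptions, and accumulate call durations for `func`."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        name = func.__name__
        requests_total[name] += 1
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        except Exception:
            exceptions_total[name] += 1
            raise
        finally:
            # Runs on success and on failure, so every request's duration is recorded.
            duration_seconds_sum[name] += time.perf_counter() - start
    return wrapper


@instrumented
def get_order(order_id: int) -> dict:
    if order_id < 0:
        raise ValueError("invalid order id")
    return {"id": order_id}


get_order(1)
get_order(2)
try:
    get_order(-1)
except ValueError:
    pass

avg = duration_seconds_sum["get_order"] / requests_total["get_order"]
print(requests_total["get_order"], exceptions_total["get_order"], f"avg={avg:.6f}s")
```

Exposed through a `/metrics` endpoint, these counters let PromQL compute request and error rates, and the ratio of the sum and count rates gives the average duration over a time window.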
Installation and configuration:
Step-1: Create an EKS cluster with a managed node group.
Step-2: Associate an IAM OIDC provider with the cluster.
Step-3: Update the kubeconfig file to access the cluster API.
Step-4: Install kube-prometheus-stack using its Helm chart, deploying it into a new namespace "monitoring" (along with an Alertmanager configuration).
Step-5: Verify the installation, then port-forward and access the Prometheus, Grafana and Alertmanager UIs.
Step-6: Clean up: uninstall the Helm chart, then delete the namespace and the cluster.
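The six steps above can be written down as a dry-run plan that prints the CLI commands instead of running them. This is a sketch under assumptions: the cluster name, region, node type, node count, Helm release name `prometheus` and the values file `alertmanager-values.yaml` (holding the Alertmanager config) are all hypothetical. The port-forward service names depend on the release name, so confirm them with `kubectl get svc -n monitoring` before using them.

```python
import shlex

# Hypothetical values: adjust to your own account and naming.
CLUSTER = "observability-demo"
REGION = "us-east-1"
NAMESPACE = "monitoring"
RELEASE = "prometheus"                    # Helm release name (assumption)
VALUES_FILE = "alertmanager-values.yaml"  # hypothetical file with the Alertmanager config

STEPS = {
    "1 create cluster": [
        ["eksctl", "create", "cluster", "--name", CLUSTER, "--region", REGION,
         "--nodegroup-name", "workers", "--node-type", "t3.medium", "--nodes", "2", "--managed"],
    ],
    "2 associate OIDC provider": [
        ["eksctl", "utils", "associate-iam-oidc-provider", "--cluster", CLUSTER,
         "--region", REGION, "--approve"],
    ],
    "3 update kubeconfig": [
        ["aws", "eks", "update-kubeconfig", "--name", CLUSTER, "--region", REGION],
    ],
    "4 install chart": [
        ["helm", "repo", "add", "prometheus-community",
         "https://prometheus-community.github.io/helm-charts"],
        ["helm", "repo", "update"],
        ["helm", "install", RELEASE, "prometheus-community/kube-prometheus-stack",
         "--namespace", NAMESPACE, "--create-namespace", "-f", VALUES_FILE],
    ],
    "5 verify": [
        ["kubectl", "get", "pods", "-n", NAMESPACE],
        ["kubectl", "get", "svc", "-n", NAMESPACE],
        # Assumed service names for release "prometheus"; each port-forward blocks, so run each in its own terminal.
        ["kubectl", "port-forward", "-n", NAMESPACE, f"svc/{RELEASE}-kube-prometheus-prometheus", "9090:9090"],
        ["kubectl", "port-forward", "-n", NAMESPACE, f"svc/{RELEASE}-grafana", "3000:80"],
        ["kubectl", "port-forward", "-n", NAMESPACE, f"svc/{RELEASE}-kube-prometheus-alertmanager", "9093:9093"],
    ],
    "6 cleanup": [
        ["helm", "uninstall", RELEASE, "--namespace", NAMESPACE],
        ["kubectl", "delete", "namespace", NAMESPACE],
        ["eksctl", "delete", "cluster", "--name", CLUSTER, "--region", REGION],
    ],
}


def print_plan() -> None:
    """Print every step's commands, shell-quoted, without executing anything."""
    for step, commands in STEPS.items():
        print(f"# Step-{step}")
        for cmd in commands:
            print(shlex.join(cmd))


if __name__ == "__main__":
    print_plan()
```

Keeping the commands as argument lists makes it easy to switch from printing to executing them with `subprocess.run(cmd, check=True)` once the values have been reviewed.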