DAY-3: Prometheus Hands-On Explanation
After installing the Prometheus, Grafana, and Alertmanager stack in the EKS cluster, verify that the services are running. They are all ClusterIP services, so they are reachable only from within the cluster network.
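One way to make this check repeatable is a small Python helper that reads the JSON printed by `kubectl get svc -n <namespace> -o json` and lists each service's type, cluster IP, and ports. This is only a sketch: the service names, IPs, and ports in `SAMPLE` are hypothetical placeholders, not output from a real cluster.

```python
import json

def summarize_services(svc_list):
    """Return (name, type, clusterIP, ports) for each item in a
    `kubectl get svc -o json` document."""
    rows = []
    for item in svc_list.get("items", []):
        spec = item.get("spec", {})
        ports = [p["port"] for p in spec.get("ports", [])]
        rows.append((item["metadata"]["name"], spec.get("type"),
                     spec.get("clusterIP"), ports))
    return rows

def non_cluster_ip(svc_list):
    """Names of services whose type is anything other than ClusterIP."""
    return [name for name, kind, _, _ in summarize_services(svc_list)
            if kind != "ClusterIP"]

# Hypothetical sample shaped like `kubectl get svc -o json` output.
SAMPLE = json.loads("""
{"items": [
  {"metadata": {"name": "prometheus-server"},
   "spec": {"type": "ClusterIP", "clusterIP": "10.100.0.10",
            "ports": [{"port": 9090}]}},
  {"metadata": {"name": "node-exporter"},
   "spec": {"type": "ClusterIP", "clusterIP": "10.100.0.12",
            "ports": [{"port": 9100}]}}
]}
""")

if __name__ == "__main__":
    for row in summarize_services(SAMPLE):
        print(row)
```

An empty list from `non_cluster_ip` means every service is ClusterIP, which matches the setup described above.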
Metric Endpoint Verification
SSH into a node and curl the service at <service IP>:<port>/metrics.
Node Exporter: gathers node-level metrics
kube-state-metrics: collects Kubernetes resource-level metrics
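The same check can be sketched in Python for anyone who prefers a script over curl. It has to run from somewhere that can reach the ClusterIP, such as a node or a pod. The IP in the usage comment is a hypothetical placeholder. Port 9100 is Node Exporter's default, but the port your service actually exposes may differ.

```python
from urllib.request import urlopen

def metrics_url(host, port):
    """Build the /metrics URL for an exporter service."""
    return f"http://{host}:{port}/metrics"

def fetch_metrics(host, port, timeout=5.0):
    """Fetch the raw text that the exporter serves at /metrics."""
    with urlopen(metrics_url(host, port), timeout=timeout) as resp:
        return resp.read().decode("utf-8")

# Usage (hypothetical IP; run from inside the cluster network):
# print(fetch_metrics("10.100.0.12", 9100)[:500])
```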
Kube-state-metrics
Exposed at the /metrics endpoint.
Sample metrics:
kube_pod_container_status_restarts_total
kube_pod_container_status_restarts_total{namespace="default"}
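To show what the text at /metrics looks like and how the namespace label narrows it down, here is a minimal Python sketch that parses the Prometheus text format and sums restarts for one namespace. The sample lines, pod names, and values are made up for illustration. The parser handles only simple lines and is not a full implementation of the exposition format.

```python
import re

# metric_name{label="value",...} value
SAMPLE_LINE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)')
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')

def parse_samples(text):
    """Return a list of (metric_name, labels_dict, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and HELP/TYPE lines
            continue
        match = SAMPLE_LINE_RE.match(line)
        if not match:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples

def restarts_in_namespace(samples, namespace):
    """Sum kube_pod_container_status_restarts_total for one namespace."""
    return sum(value for name, labels, value in samples
               if name == "kube_pod_container_status_restarts_total"
               and labels.get("namespace") == namespace)

# Hypothetical scrape output; pod names and values are illustrative only.
SAMPLE_TEXT = """\
# TYPE kube_pod_container_status_restarts_total counter
kube_pod_container_status_restarts_total{namespace="default",pod="busybox",container="busybox"} 4
kube_pod_container_status_restarts_total{namespace="kube-system",pod="coredns-abc",container="coredns"} 0
"""

if __name__ == "__main__":
    print(restarts_in_namespace(parse_samples(SAMPLE_TEXT), "default"))
```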
Practical Demonstration
Creating a Crash-Loop Pod
kubectl run busybox --image=busybox -- /bin/sh -c "exit 1"
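For reference, the command above creates roughly the Pod sketched below as a Python dict. The dict can be dumped to JSON and passed to `kubectl apply -f`. Two assumptions are relied on here: arguments after `--` become container args unless `--command` is given, and `kubectl run` defaults to `restartPolicy: Always`. That default is why a container that exits with code 1 is restarted repeatedly and ends up in CrashLoopBackOff.

```python
import json

def crashloop_pod(name="busybox", image="busybox"):
    """Approximate the Pod that `kubectl run` creates for the command above."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"run": name}},
        "spec": {
            # Always restart, so the failing container is retried with backoff.
            "restartPolicy": "Always",
            "containers": [{
                "name": name,
                "image": image,
                # The shell exits with status 1 immediately.
                "args": ["/bin/sh", "-c", "exit 1"],
            }],
        },
    }

if __name__ == "__main__":
    print(json.dumps(crashloop_pod(), indent=2))
```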
Metrics Collection Flow
When we send an instruction such as kubectl run, it goes to the API server. The scheduler then assigns the pod to a node, and the kubelet on that node starts it. kube-state-metrics continuously watches the API server, reads the state of the pod, and exposes it at its /metrics endpoint in a format Prometheus understands. Prometheus scrapes that endpoint, and PromQL queries sent to the Prometheus HTTP server retrieve specific metrics data.
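The last step of this flow, sending a PromQL query to the Prometheus HTTP server, can be sketched in Python against the `/api/v1/query` endpoint. The base URL `http://localhost:9090` assumes you have port-forwarded the Prometheus service to your machine; adjust it for your setup. `SAMPLE_RESPONSE` is a hand-written illustration of the response shape, not real output.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def instant_query_url(base_url, promql):
    """Build the URL for an instant query against the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"

def run_instant_query(base_url, promql, timeout=5.0):
    """Send the query and return the decoded JSON response."""
    with urlopen(instant_query_url(base_url, promql), timeout=timeout) as resp:
        return json.load(resp)

def vector_values(response):
    """Extract (labels, value) pairs from an instant-vector response."""
    if response.get("status") != "success":
        raise ValueError(f"query failed: {response}")
    return [(r["metric"], float(r["value"][1]))
            for r in response["data"]["result"]]

# Hand-written example of the response shape (illustrative values only).
SAMPLE_RESPONSE = {
    "status": "success",
    "data": {"resultType": "vector", "result": [
        {"metric": {"namespace": "default", "pod": "busybox"},
         "value": [1700000000, "4"]},
    ]},
}

# Usage (assumes a port-forward to Prometheus on localhost:9090):
# q = 'kube_pod_container_status_restarts_total{namespace="default"}'
# print(vector_values(run_instant_query("http://localhost:9090", q)))
```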
Grafana
Add Prometheus as a data source in Grafana for better visualization. Authentication and authorization can also be set up by integrating with IAM for access control.
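The data source is usually added through the Grafana UI, but it can also be created through Grafana's HTTP API (`POST /api/datasources`). The Python sketch below only builds the request and does not send it. The Grafana URL, the token, and the in-cluster Prometheus address are hypothetical placeholders to replace with your own values.

```python
import json
from urllib.request import Request

def prometheus_datasource_request(grafana_url, api_token, prometheus_url):
    """Build (but do not send) a request that registers Prometheus in Grafana."""
    payload = {
        "name": "Prometheus",
        "type": "prometheus",
        "url": prometheus_url,
        "access": "proxy",  # Grafana's backend makes the calls to Prometheus
        "isDefault": True,
    }
    return Request(
        f"{grafana_url.rstrip('/')}/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {api_token}"},
        method="POST",
    )

# Usage (all three values are placeholders):
# from urllib.request import urlopen
# req = prometheus_datasource_request(
#     "http://localhost:3000", "<grafana-token>",
#     "http://prometheus-server.monitoring.svc.cluster.local:9090")
# urlopen(req)
```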
- Sum Up All CPU Usage: This PromQL query, run in Grafana, aggregates CPU usage across all nodes.
- Average Memory Usage per Namespace: This query provides the average memory usage grouped by namespace.
- Pod Container Restarts: This query gives the total number of container restarts.
- Memory Utilization of Nodes
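The notes do not record the exact queries, so the Python mapping below lists commonly used PromQL for each panel as an assumption, not as the queries from the session. The metric names come from Node Exporter (`node_*`), cAdvisor (`container_*`), and kube-state-metrics (`kube_*`). The 5m rate window and the label filters are illustrative choices.

```python
# Candidate PromQL for the panels above (assumed, not taken from the session).
PANEL_QUERIES = {
    # Busy CPU cores summed over every node.
    "Sum Up All CPU Usage":
        'sum(rate(node_cpu_seconds_total{mode!="idle"}[5m]))',
    # Mean container working-set memory, grouped by namespace.
    "Average Memory Usage per Namespace":
        'avg by (namespace) (container_memory_working_set_bytes{container!=""})',
    # Restart counter totals per pod.
    "Pod Container Restarts":
        'sum by (namespace, pod) (kube_pod_container_status_restarts_total)',
    # Percentage of memory in use on each node.
    "Memory Utilization of Nodes":
        '100 * (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)',
}

if __name__ == "__main__":
    for title, query in PANEL_QUERIES.items():
        print(f"{title}: {query}")
```

Each string can be pasted into the query editor of a Grafana panel that uses the Prometheus data source.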