❗Understanding Scaling in Kubernetes
Scaling in Kubernetes means adjusting the number of workloads, nodes, or resources to meet demand. It is different from maintaining a fixed number of replicas, which is handled by the ReplicaSet controller (high availability).
❓The Need for Autoscaling
Autoscaling becomes important during high-demand situations, such as sales events (e.g., Flipkart's Big Billion Days). Without it, applications may face resource constraints, leading to CPU throttling, high latency, and low throughput.
🌟 Types of Autoscaling in Kubernetes
1️⃣ Horizontal Pod Autoscaler (HPA):
Scales out/in by adjusting the number of identical pods.
Suitable for customer-facing, mission-critical applications.
No pod restart required.
2️⃣ Vertical Pod Autoscaler (VPA):
Resizes existing pods by adjusting their resource allocation.
Better suited for non-mission-critical, stateless applications.
Requires a pod restart, which may lead to temporary downtime (a minimal VPA manifest sketch follows after this list).
3️⃣ Cluster Autoscaler:
Manages node-level scaling in cloud-based clusters (e.g., AWS EKS).
Adds or removes nodes based on pod resource requirements and pending pod status.
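For reference, a minimal VerticalPodAutoscaler manifest might look like the sketch below. This assumes the VPA components (CRDs and controllers) are installed in the cluster, since they are not part of core Kubernetes; the Deployment name my-app is hypothetical.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app          # hypothetical Deployment to resize
  updatePolicy:
    updateMode: "Auto"    # VPA evicts and recreates pods with updated resource requests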
🌟 Prerequisites for HPA
- Make sure the Metrics Server is deployed in the cluster; it is not installed by default in most distributions. The HPA controller itself is enabled by default, since it ships as part of the kube-controller-manager in the Kubernetes control plane.
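You can quickly verify that the Metrics Server is running and serving metrics (assuming the standard metrics-server Deployment name in kube-system):
kubectl get deployment metrics-server -n kube-system
kubectl top nodes   # returns node CPU/memory usage only if the Metrics Server is working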
🤔 How HPA Works
How does the HPA know about the resource usage of pods? Where does it gather metrics data from?
The Metrics Server is deployed in the kube-system namespace, but it runs as a Deployment across the cluster, which means it can run on any worker node.
Function: The Metrics Server collects resource usage metrics (CPU and memory) from the kubelets running on each node and exposes these metrics via the Kubernetes API server.
By default, the HPA queries the API server for metrics data every 15 seconds and works in conjunction with the controller manager to make sure the desired state is always maintained.
HPA: Decides when scaling is needed, based on metrics and the configured scaling policy.
HPA Controller: Implements the scaling actions to maintain the desired state and meet demand.
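To see the same data the HPA consumes, you can query the Metrics API yourself; kubectl top reads from the same metrics.k8s.io endpoint (jq here is optional, just for pretty-printing):
kubectl top pods
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/default/pods" | jq .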
🌟 Other Autoscaling Approaches
Event-based Autoscaling: Using tools like KEDA.
Cron/Schedule-based Autoscaling: For predictable traffic patterns (a KEDA-based sketch follows below).
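For illustration, a schedule-based scale-out with KEDA could be expressed as a ScaledObject like the sketch below. This assumes KEDA is installed in the cluster; the Deployment name my-app, the timezone, and the schedule are all hypothetical values.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-app-cron-scaler
spec:
  scaleTargetRef:
    name: my-app                # hypothetical Deployment to scale
  triggers:
    - type: cron
      metadata:
        timezone: Asia/Kolkata
        start: 0 9 * * *        # scale up at 09:00
        end: 0 21 * * *         # scale back down at 21:00
        desiredReplicas: "10"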
🌟 Cloud vs Kubernetes Autoscaling
Cloud: Uses Auto Scaling Groups (ASG) for instance-level scaling.
Kubernetes:
HPA for pod-level scaling.
Cluster Autoscaler for node-level scaling in cloud environments.
VPA for existing pod resource adjustments.
Node auto-provisioning for creating appropriately sized nodes when pending pods cannot fit on the existing ones.
🌟 TASK
Make sure the metrics-server is deployed in the cluster using this
metrics-server.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
    rbac.authorization.k8s.io/aggregate-to-admin: "true"
    rbac.authorization.k8s.io/aggregate-to-edit: "true"
    rbac.authorization.k8s.io/aggregate-to-view: "true"
  name: system:aggregated-metrics-reader
rules:
  - apiGroups:
      - metrics.k8s.io
    resources:
      - pods
      - nodes
    verbs:
      - get
      - list
      - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
rules:
  - apiGroups:
      - ""
    resources:
      - nodes/metrics
    verbs:
      - get
  - apiGroups:
      - ""
    resources:
      - pods
      - nodes
    verbs:
      - get
      - list
      - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server-auth-reader
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: extension-apiserver-authentication-reader
subjects:
  - kind: ServiceAccount
    name: metrics-server
    namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server:system:auth-delegator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
  - kind: ServiceAccount
    name: metrics-server
    namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:metrics-server
subjects:
  - kind: ServiceAccount
    name: metrics-server
    namespace: kube-system
---
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  ports:
    - name: https
      port: 443
      protocol: TCP
      targetPort: https
  selector:
    k8s-app: metrics-server
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  strategy:
    rollingUpdate:
      maxUnavailable: 0
  template:
    metadata:
      labels:
        k8s-app: metrics-server
    spec:
      containers:
        - args:
            - --cert-dir=/tmp
            - --secure-port=10250
            - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
            - --kubelet-use-node-status-port
            - --kubelet-insecure-tls
            - --metric-resolution=15s
          image: registry.k8s.io/metrics-server/metrics-server:v0.7.1
          imagePullPolicy: IfNotPresent
          livenessProbe:
            failureThreshold: 3
            httpGet:
              path: /livez
              port: https
              scheme: HTTPS
            periodSeconds: 10
          name: metrics-server
          ports:
            - containerPort: 10250
              name: https
              protocol: TCP
          readinessProbe:
            failureThreshold: 3
            httpGet:
              path: /readyz
              port: https
              scheme: HTTPS
            initialDelaySeconds: 20
            periodSeconds: 10
          resources:
            requests:
              cpu: 100m
              memory: 200Mi
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop:
                - ALL
            readOnlyRootFilesystem: true
            runAsNonRoot: true
            runAsUser: 1000
            seccompProfile:
              type: RuntimeDefault
          volumeMounts:
            - mountPath: /tmp
              name: tmp-dir
      nodeSelector:
        kubernetes.io/os: linux
      priorityClassName: system-cluster-critical
      serviceAccountName: metrics-server
      volumes:
        - emptyDir: {}
          name: tmp-dir
---
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  labels:
    k8s-app: metrics-server
  name: v1beta1.metrics.k8s.io
spec:
  group: metrics.k8s.io
  groupPriorityMinimum: 100
  insecureSkipTLSVerify: true
  service:
    name: metrics-server
    namespace: kube-system
  version: v1beta1
  versionPriority: 100
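Apply the manifest and confirm the Metrics Server is up and serving metrics:
kubectl apply -f metrics-server.yaml
kubectl get pods -n kube-system -l k8s-app=metrics-server
kubectl top nodes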
Deploy the php-apache server using the following YAML file:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-apache
spec:
  selector:
    matchLabels:
      run: php-apache
  template:
    metadata:
      labels:
        run: php-apache
    spec:
      containers:
        - name: php-apache
          image: registry.k8s.io/hpa-example
          ports:
            - containerPort: 80
          resources:
            limits:
              cpu: 500m
            requests:
              cpu: 200m
---
apiVersion: v1
kind: Service
metadata:
  name: php-apache
  labels:
    run: php-apache
spec:
  ports:
    - port: 80
  selector:
    run: php-apache
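Save the manifest (for example as php-apache.yaml) and apply it:
kubectl apply -f php-apache.yaml
kubectl get deployment,svc php-apache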
Create the HorizontalPodAutoscaler:
kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10
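The imperative command above is roughly equivalent to the following declarative manifest using the autoscaling/v2 API (a sketch you could apply instead of running kubectl autoscale):
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50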
You can check the current status of the newly created HorizontalPodAutoscaler by running:
kubectl get hpa
The current CPU consumption is 0% as there are no clients sending requests to the server.
Increase the load using the following command:
# Run this in a separate terminal
# so that the load generation continues and you can carry on with the rest of the steps
kubectl run -i --tty load-generator --rm --image=busybox:1.28 --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://php-apache; done"
Now run the following command to watch the load:
kubectl get hpa php-apache --watch
Here, CPU consumption has increased to 150% of the request. As a result, the Deployment was resized to 7 replicas.
You should see that the pod replica count is now 7.
This shows that pods are scaled dynamically (by the HPA in this case) to meet the demand of the load as per the scaling policy.
#Kubernetes #HPA #VPA #ClusterAutoscaler #40DaysofKubernetes #CKASeries