Day-1: Fundamentals of Observability
❇ Observability
Observability means getting the big picture of a system's internal state, whether that is the application, the infrastructure, or the network (latency, HTTPS traffic), in order to improve reliability and performance.
💡 Three Pillars of Observability
Metrics
Logs
Traces
Metrics:
Numeric measurements that describe the state of the system.
Examples: CPU utilization, memory usage, disk utilization, HTTP request statistics
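To make "HTTP request statistics" concrete, here is a minimal sketch that uses only the Python standard library. The `RequestMetrics` class and its method names are hypothetical and exist only for illustration; a real service would normally use a metrics client library and export the values to a metrics backend.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class RequestMetrics:
    """Hypothetical in-memory metrics store (illustration only)."""
    status_counts: Counter = field(default_factory=Counter)
    latencies_ms: list = field(default_factory=list)

    def observe(self, status_code: int, latency_ms: float) -> None:
        # Count the request by status code and keep its latency.
        self.status_counts[status_code] += 1
        self.latencies_ms.append(latency_ms)

    def total_requests(self) -> int:
        return sum(self.status_counts.values())

    def error_rate(self) -> float:
        # Fraction of requests that returned a 5xx status.
        total = self.total_requests()
        if total == 0:
            return 0.0
        errors = sum(n for code, n in self.status_counts.items() if code >= 500)
        return errors / total

    def average_latency_ms(self) -> float:
        if not self.latencies_ms:
            return 0.0
        return sum(self.latencies_ms) / len(self.latencies_ms)


metrics = RequestMetrics()
# Made-up sample data: (status code, latency in ms)
for status, latency in [(200, 12.0), (200, 18.0), (500, 40.0), (200, 10.0)]:
    metrics.observe(status, latency)

print(metrics.total_requests())      # 4
print(metrics.error_rate())          # 0.25
print(metrics.average_latency_ms())  # 20.0
```

Numbers like these are what dashboards plot and alerts are defined on.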
Logs:
Records of events that help analyze an issue. They give detailed insight into application behavior, which helps in root cause investigation.
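As a rough sketch of what a useful log line can look like, the example below emits structured (JSON) logs with Python's standard `logging` module. The logger name `checkout` and the fields `order_id` and `user_id` are assumptions made up for this example.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Fields passed via `extra=` become attributes on the record.
        for key in ("order_id", "user_id"):  # hypothetical field names
            if hasattr(record, key):
                payload[key] = getattr(record, key)
        return json.dumps(payload)


stream = io.StringIO()  # stands in for a log file or stdout
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("checkout")  # hypothetical service name
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False  # keep the example output in `stream` only

logger.error("payment failed", extra={"order_id": "A-1001", "user_id": 42})

log_line = stream.getvalue().strip()
print(log_line)
```

Because every line is a JSON object with consistent fields, a log platform can filter by `order_id` or `level` instead of searching free text.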
Traces:
Used for debugging, troubleshooting, and remediation. Traces follow a request as it moves through the system and provide detailed information about how its components interact.
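The idea behind a trace can be sketched with the standard library alone: every unit of work is a span, and spans that belong to the same request share a trace ID and point to their parent. The `Span` class, the `span` helper, and the operation names below are hypothetical. This is a single-process, single-threaded illustration, not how a tracing library is implemented; real systems also pass the trace ID across service boundaries.

```python
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Span:
    trace_id: str
    span_id: str
    parent_id: Optional[str]
    name: str
    duration_ms: float = 0.0


finished_spans: List[Span] = []
_active: List[Span] = []  # stack of currently open spans (not thread-safe)


@contextmanager
def span(name: str):
    parent = _active[-1] if _active else None
    current = Span(
        # Child spans inherit the trace ID; a root span starts a new trace.
        trace_id=parent.trace_id if parent else uuid.uuid4().hex,
        span_id=uuid.uuid4().hex[:16],
        parent_id=parent.span_id if parent else None,
        name=name,
    )
    _active.append(current)
    start = time.perf_counter()
    try:
        yield current
    finally:
        current.duration_ms = (time.perf_counter() - start) * 1000
        _active.pop()
        finished_spans.append(current)


with span("GET /checkout") as root:
    with span("query inventory"):
        time.sleep(0.01)  # stands in for a database call
    with span("charge card"):
        time.sleep(0.01)  # stands in for a call to another service

for s in finished_spans:
    print(s.name, s.parent_id, round(s.duration_ms, 1))
```

Each printed line shows a span, its parent, and how long it took, which is the information needed to see where a slow request spent its time.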
❇ Observability vs Monitoring
Monitoring: A subset of observability that focuses on metrics, alerts, and dashboards.
Reactive approach: firefighting and remediation after issues occur.
Observability: A broader approach that combines metrics, traces, and logs for proactive system understanding.
Proactive approach: catch issues before they reach production.
🤔 Why Observability Matters?
- To ensure high availability and optimal performance of the system, and to meet the promised SLAs (service level agreements) and SLOs (service level objectives).
SLO examples:
99.9% availability
99.995% success rate over 10,000 requests
Response time under 30 ms with a 200 status code
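The arithmetic behind such targets is simple to work out. The sketch below turns an availability target into a downtime budget and checks a measured success rate against a target. The function names are hypothetical, and the 30-day window is an assumption, since the examples above do not state a measurement period.

```python
def allowed_downtime_minutes(availability_pct: float, period_days: int = 30) -> float:
    """Downtime budget, in minutes, for an availability target over a period."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


def success_rate_pct(total_requests: int, failed_requests: int) -> float:
    """Percentage of requests that succeeded."""
    return 100 * (total_requests - failed_requests) / total_requests


def meets_slo(measured_pct: float, target_pct: float) -> bool:
    return measured_pct >= target_pct


budget = allowed_downtime_minutes(99.9)  # assumes a 30-day window
print(round(budget, 1))                  # 43.2

rate = success_rate_pct(10_000, 1)       # one failed request out of 10,000
print(rate)                              # 99.99
print(meets_slo(rate, 99.995))           # False
```

So 99.9% availability leaves about 43 minutes of downtime in 30 days, and a 99.995% success target over 10,000 requests is missed after a single failed request.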
❇ Responsibilities
Developers need to implement metrics, logs, and traces in the application, and DevOps engineers set up the observability stack that puts them to use.
❇ Tools:
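Prometheus, Nagios, Splunk: Metrics (Splunk is also widely used for log analysis)
ELK Stack (Elasticsearch, Logstash, Kibana), EFK Stack (Elasticsearch, Fluentd, Kibana): Logs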
Jaeger: Distributed Tracing
Observability is a collaborative effort to maintain system reliability and performance and to address potential issues proactively, before they impact users.