Day-1: Fundamentals of Observability

❇ Observability

Observability means getting the big picture of a system's internal state, whether that system is an application, infrastructure, or the network (latency, HTTPS traffic). The goal is to improve system reliability and performance.

💡 Three Pillars of Observability

Metrics
Logs
Traces

Metrics:

Numeric measurements that show the state of the system over time.
Ex: CPU utilization, memory usage, disk utilization, HTTP request statistics
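To make the HTTP request statistics example concrete, here is a minimal sketch of an in-process metrics holder. The `RequestMetrics` class and its field names are hypothetical, made up for illustration; a real setup would more likely use a metrics client library that exposes values to a collector.

```python
from collections import Counter
from statistics import mean


class RequestMetrics:
    """Hypothetical in-process metrics holder (illustration only, not a real library API)."""

    def __init__(self):
        self.status_counts = Counter()  # counter: number of requests per status code
        self.latencies_ms = []          # raw latency samples used for summary stats

    def observe(self, status_code, latency_ms):
        # Record one finished HTTP request.
        self.status_counts[status_code] += 1
        self.latencies_ms.append(latency_ms)

    def snapshot(self):
        # Summarize the current state of the system as plain numbers.
        total = sum(self.status_counts.values())
        errors = sum(n for code, n in self.status_counts.items() if code >= 500)
        return {
            "requests_total": total,
            "error_rate": errors / total if total else 0.0,
            "avg_latency_ms": mean(self.latencies_ms) if self.latencies_ms else 0.0,
        }


metrics = RequestMetrics()
# Sample data invented for the example: (status code, latency in ms).
for status, latency in [(200, 12.0), (200, 18.0), (500, 40.0), (200, 10.0)]:
    metrics.observe(status, latency)

snap = metrics.snapshot()
print(snap)
```

The point of the sketch is that a metric is just a number tracked over time: cheap to store and easy to graph or alert on.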

Logs:

Help analyze an issue and provide detailed insights into application behavior, which helps in root cause investigation.
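Logs are easiest to search during an investigation when each line is structured. The following is a rough sketch using Python's standard `logging` module with a small JSON formatter. The logger name `checkout` and the fields `order_id` and `user_id` are assumptions chosen for the example.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Values passed through `extra=` become attributes on the record.
        # The field names below are hypothetical examples.
        for key in ("order_id", "user_id"):
            if hasattr(record, key):
                payload[key] = getattr(record, key)
        return json.dumps(payload)


# Write to an in-memory stream so the example is self-contained;
# a real service would typically log to stdout or a file.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("checkout")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False  # keep the example output limited to our handler

logger.error("payment declined", extra={"order_id": "A-1001", "user_id": 42})

log_line = stream.getvalue().strip()
print(log_line)
```

With fields like `order_id` in every line, a log backend can filter all events for one order instead of relying on free-text search.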

Traces:

Used for debugging, troubleshooting, and remediation. A trace follows a request as it moves through the system and provides extensive information to understand how components interact.
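A trace is usually made up of spans: timed steps that share one trace ID and point to their parent step. The toy recorder below is only a sketch of that idea using the standard library. The `span` helper and the span names are hypothetical; a real system would use a tracing library and export spans to a tracing backend.

```python
import time
import uuid
from contextlib import contextmanager

finished_spans = []  # a real tracer would export these to a tracing backend
_active = []         # stack of currently open spans, used to find the parent


@contextmanager
def span(name, trace_id):
    """Record one timed step of a request (hypothetical helper, not a library API)."""
    parent_id = _active[-1]["span_id"] if _active else None
    record = {
        "trace_id": trace_id,
        "span_id": uuid.uuid4().hex[:8],
        "parent_id": parent_id,
        "name": name,
    }
    _active.append(record)
    start = time.perf_counter()
    try:
        yield record
    finally:
        record["duration_ms"] = (time.perf_counter() - start) * 1000
        _active.pop()
        finished_spans.append(record)


trace_id = uuid.uuid4().hex  # one ID shared by every span of this request
with span("GET /checkout", trace_id) as root:
    with span("query inventory", trace_id):
        time.sleep(0.01)  # stand-in for a database call
    with span("charge card", trace_id):
        time.sleep(0.02)  # stand-in for a call to another service

for s in finished_spans:
    print(f'{s["name"]:<16} parent={s["parent_id"]} {s["duration_ms"]:.1f} ms')
```

Reading the parent links and durations together shows which step of the request was slow, which is the kind of question metrics and logs alone answer poorly.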

❇ Observability vs Monitoring

Monitoring: A subset of observability that focuses on Metrics + Alerts + Dashboards.

Reactive approach: Firefighting and remediation after issues occur.
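The "Metrics + Alerts" part of monitoring can be sketched as a simple threshold rule. The function name `evaluate_alert`, the 90% threshold, and the sample values are all assumptions for illustration; real alerting tools have their own rule syntax.

```python
def evaluate_alert(samples, threshold, min_consecutive):
    """Return True when the last `min_consecutive` samples all exceed `threshold`."""
    if len(samples) < min_consecutive:
        return False
    return all(value > threshold for value in samples[-min_consecutive:])


# Invented CPU utilization samples (percent), oldest first.
cpu_percent = [62, 71, 93, 95, 97]

firing = evaluate_alert(cpu_percent, threshold=90, min_consecutive=3)
print("ALERT: high CPU" if firing else "ok")
```

Requiring several consecutive samples above the threshold is one common way to avoid alerting on a single short spike. Note that the alert only fires after the problem has started, which is why this approach is reactive.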

Observability: A broader approach that combines Metrics + Traces + Logs for proactive system understanding.

Proactive approach: Catch issues before they reach production.

🤔 Why Observability Matters?

To ensure high availability and optimal performance of the system, and to fulfill the promised SLAs (Service Level Agreements) and SLOs (Service Level Objectives).

SLO examples:

99.9% availability
99.995% success rate over 10,000 requests
Response time under 30 ms with a 200 status code
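The SLO numbers above translate into concrete budgets with a little arithmetic. The sketch below assumes a 30-day measurement window, which the examples do not specify, and the function names are made up for illustration.

```python
def allowed_downtime_minutes(slo_percent, window_days=30):
    """Downtime budget implied by an availability SLO (30-day window is an assumption)."""
    return (1 - slo_percent / 100) * window_days * 24 * 60


def meets_success_slo(successes, total, target_percent):
    """Check a success-rate SLO against observed request counts."""
    return total > 0 and successes / total * 100 >= target_percent


budget = allowed_downtime_minutes(99.9)
print(f"99.9% availability allows about {budget:.1f} minutes of downtime in 30 days")

# One failed request out of 10,000 is a 99.99% success rate.
one_failure_ok = meets_success_slo(9_999, 10_000, 99.995)
no_failure_ok = meets_success_slo(10_000, 10_000, 99.995)
print("one failure meets 99.995%:", one_failure_ok)
print("zero failures meets 99.995%:", no_failure_ok)
```

Under these assumptions, 99.9% availability leaves roughly 43 minutes of downtime per 30 days, and a 99.995% success target over only 10,000 requests leaves no room for even a single failure.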

❇ Responsibilities

Developers need to implement metrics, logs, and traces in the application, and DevOps sets up the observability stack that collects them.

❇ Tools:

Prometheus, Nagios, Splunk: Metrics
ELK Stack, EFK Stack: Logs
Jaeger: Distributed Tracing

Observability is a collaborative effort to maintain system reliability and performance, and to proactively address potential issues before they impact users.