Day-1: Fundamentals of Observability

Observability

Observability is the ability to understand the internal state of a system from the data it emits. It covers the application, the infrastructure, and the network (latency, HTTPS traffic), with the goal of improving system reliability and performance.

💡Three Pillars of Observability

  • Metrics

  • Logs

  • Traces

Metrics:

Metrics are numeric measurements collected over time that describe the state of the system.
Ex: CPU utilization, memory usage, disk utilization, HTTP request statistics
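
To make the idea of a metric concrete, here is a minimal sketch using only the Python standard library. The names `http_requests`, `record_request`, `disk_utilization_percent`, and `snapshot` are hypothetical and only for illustration; a real setup would normally export such values to a metrics backend rather than keep them in memory.

```python
import shutil
import time
from collections import Counter

# Hypothetical in-process store: request counts keyed by HTTP status code.
http_requests = Counter()


def record_request(status_code: int) -> None:
    """Count one handled HTTP request under its status code."""
    http_requests[status_code] += 1


def disk_utilization_percent(path: str = "/") -> float:
    """Return the used share of the disk holding `path`, as a percentage."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def snapshot() -> dict:
    """Collect the current metric values with a timestamp."""
    return {
        "timestamp": time.time(),
        "disk_utilization_percent": disk_utilization_percent(),
        "http_requests_total": dict(http_requests),
    }


# Simulate three requests, then read the metrics once.
record_request(200)
record_request(200)
record_request(500)
print(snapshot())
```

Each value is a number that can be sampled repeatedly, which is what makes metrics suitable for dashboards and alerts.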

Logs:

Logs record detailed events about application behavior. They help analyze an issue and support root cause investigation.
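
As an illustration, the sketch below emits structured (JSON) log lines with Python's standard `logging` module. The `JsonFormatter` class, the `checkout` logger name, and the `order_id` field are assumptions made for this example, not part of any particular stack.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # `order_id` is a hypothetical context field passed via `extra=`.
        if hasattr(record, "order_id"):
            payload["order_id"] = record.order_id
        # Include the stack trace when the record carries an exception.
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def build_logger(name: str = "checkout") -> logging.Logger:
    """Create a logger that writes JSON lines to stderr."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid adding duplicate handlers
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    return logger


logger = build_logger()
try:
    1 / 0
except ZeroDivisionError:
    # logger.exception logs at ERROR level and attaches the traceback.
    logger.exception("payment calculation failed", extra={"order_id": "A-1001"})
```

Structured fields such as the order identifier make it easier to search logs during a root cause investigation.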

Traces:

Traces follow a single request as it moves through the system, giving extensive information about how components interact. They support debugging, troubleshooting, and remediation.
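
The idea behind a trace can be sketched with the standard library alone. This is a simplified illustration, not the API of Jaeger or any tracing library: the `Span` class, the `span` helper, and the operation names are all hypothetical. Every span in one request shares a trace ID, and child spans point at their parent.

```python
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Span:
    trace_id: str             # shared by every span of one request
    span_id: str              # unique per operation
    parent_id: Optional[str]  # None for the root span
    name: str
    duration_ms: float = 0.0


# Hypothetical in-memory sink; a real tracer would export spans to a backend.
finished_spans: List[Span] = []


@contextmanager
def span(name: str, parent: Optional[Span] = None):
    """Time one operation and link it to its parent span, if any."""
    current = Span(
        trace_id=parent.trace_id if parent else uuid.uuid4().hex,
        span_id=uuid.uuid4().hex[:16],
        parent_id=parent.span_id if parent else None,
        name=name,
    )
    start = time.perf_counter()
    try:
        yield current
    finally:
        current.duration_ms = (time.perf_counter() - start) * 1000
        finished_spans.append(current)


# One request that touches two downstream operations.
with span("GET /checkout") as root:
    with span("query inventory", parent=root):
        time.sleep(0.01)
    with span("charge card", parent=root):
        time.sleep(0.02)

for s in finished_spans:
    print(f"{s.name}: {s.duration_ms:.1f} ms (parent={s.parent_id})")
```

Reading the spans together shows which step of the request took the most time.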

Observability vs Monitoring

Monitoring: A subset of observability that focuses on Metrics + Alerts + Dashboards

Reactive approach: Firefighting and remediation after issues occur
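
A threshold alert is the simplest form of monitoring. The sketch below is an illustration only: `AlertRule`, `evaluate`, the metric names, and the threshold values are hypothetical and do not reflect the rule syntax of any specific tool.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AlertRule:
    metric: str       # name of the metric to watch
    threshold: float  # alert fires when the value is above this


def evaluate(rules: List[AlertRule], metrics: Dict[str, float]) -> List[str]:
    """Return one message for every rule whose metric exceeds its threshold."""
    firing = []
    for rule in rules:
        value = metrics.get(rule.metric)
        if value is not None and value > rule.threshold:
            firing.append(f"ALERT {rule.metric}={value} exceeds {rule.threshold}")
    return firing


# Hypothetical rules and one sample of current metric values.
rules = [
    AlertRule("cpu_utilization_percent", 90.0),
    AlertRule("disk_utilization_percent", 80.0),
]
current = {"cpu_utilization_percent": 95.5, "disk_utilization_percent": 42.0}

for message in evaluate(rules, current):
    print(message)
```

The alert only fires once the threshold is already crossed, which is why this style of monitoring is described as reactive.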

Observability: A broader approach that combines Metrics + Traces + Logs for proactive system understanding.

Proactive approach: Catch issues before they reach production

🤔 Why Observability Matters

  • To ensure high availability and optimal performance of the system, and to meet the promised SLAs (Service Level Agreements) and SLOs (Service Level Objectives).

SLO Examples:

  • 99.9% availability

  • 10,000 requests with a 99.995% success rate.

  • Response time under 30 ms with a 200 status code.
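
The targets above translate into concrete budgets. The sketch below works through the arithmetic; the function names are hypothetical, and the 30-day measurement window is an assumption, since the notes do not state one.

```python
def allowed_downtime_minutes(availability_percent: float, window_days: int = 30) -> float:
    """Minutes of downtime permitted by an availability target over the window.

    The 30-day default window is an assumption for this example.
    """
    total_minutes = window_days * 24 * 60
    return total_minutes * (1 - availability_percent / 100)


def allowed_failures(total_requests: int, success_percent: float) -> float:
    """Number of failed requests permitted by a success-rate target."""
    return total_requests * (1 - success_percent / 100)


def meets_latency_slo(status_code: int, latency_ms: float, limit_ms: float = 30.0) -> bool:
    """True when a request returned 200 and finished under the latency limit."""
    return status_code == 200 and latency_ms < limit_ms


print(round(allowed_downtime_minutes(99.9), 2))   # downtime budget in minutes
print(round(allowed_failures(10_000, 99.995), 2))  # failure budget in requests
print(meets_latency_slo(200, 12.5))
print(meets_latency_slo(200, 45.0))
```

Under these assumptions, 99.9% availability leaves about 43.2 minutes of downtime per 30 days, and a 99.995% success rate over 10,000 requests leaves room for less than one failed request.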

Responsibilities

Developers instrument the application with metrics, logs, and traces. DevOps engineers set up and maintain the observability stack that collects them.

Tools:

  • Prometheus, Nagios, Splunk: Metrics

  • ELK Stack, EFK Stack: Logs

  • Jaeger: Distributed Tracing

Observability is a collaborative effort to maintain system reliability and performance, and to proactively address potential issues before they impact users.