News
Building a Production-Grade Observability Platform for the Anvila API with LGTM, SLOs, DORA Metrics, and Game Day Testing
3+ hour, 46+ min ago (1474+ words) For the HNG DevOps Stage 6 task, our team built a production-grade observability and reliability platform for the Anvila API. The goal was not just to check whether a server was up or down. We needed to build a monitoring…...
Open Source Software Maker Grafana Labs Was Hacked
8+ hour, 25+ min ago (460+ words) Grafana Labs, the company behind the open-source analytics and visualization web application of the same name, has confirmed it suffered a data breach. The hackers have threatened to release the company's codebase, but Grafana Labs says it won't be paying…...
Prometheus and Grafana: from zero to production monitoring
6+ hour, 45+ min ago (222+ words) We started shipping monitoring after a string of outages where customers paged us before our own dashboards did. This is the stack we landed on, written like you're standing it up tomorrow. After trying CloudWatch, Datadog, and New Relic, we…...
Grafana Labs Confirms Security Incident Involving GitHub Codebase Access
22+ hour, 27+ min ago (424+ words) Grafana Labs has confirmed a security incident involving unauthorized access to its internal GitHub environment, after a threat actor obtained a compromised access token and downloaded portions of the company's codebase. The disclosure, made via an official statement on…...
Distributed Tracing in NestJS: End-to-End Request Visibility with OpenTelemetry
1+ day, 6+ hour ago (734+ words) In a monolithic application, debugging a slow or failing request is straightforward: you have one codebase, one log stream, and one execution context to reason about. In a microservices architecture, a single user request can touch a dozen services, three…...
Observability | OpenRouter Python SDK
2+ day, 8+ hour ago (128+ words) Observability - Python SDK The Python SDK and docs are currently in beta. Report issues on GitHub. List the observability destinations configured for the authenticated entity's default workspace. Use the workspace_id query parameter to scope the result to a different workspace....
Tracers - GLaaS
3+ day, 9+ hour ago (902+ words) A tracer is the component that observes what your command does as it runs. When you type roar run python train.py, the tracer is what records the read/write/mmap/rename/unlink events that flow through your process - and…...
How to Make Time-Shifted Compare Metrics in Grafana Across Datasources
3+ day, 14+ hour ago (19+ words) How to Compare Today vs Yesterday Metrics in Grafana Across Datasources When building... Tagged with grafana, timeshift, datasource, observability....
SLO Alerting with OpenTelemetry and Prometheus
5+ day, 15+ hour ago (173+ words) To address this problem, we can leverage OpenTelemetry and Prometheus to implement SLO-based alerting. OpenTelemetry provides a standardized way to collect and manage telemetry data, while Prometheus offers a robust alerting framework. Here's an example of how to…...
Why Prometheus couldn't see Cilium metrics at 2 a.m.
1+ week, 1+ day ago (276+ words) Tame the Kubernetes "integration tax." Learn how to wire CNCF projects like Prometheus and Cilium for production-grade reliability....