Behind the scenes: Running the Power BI service

I’m excited to share a new whitepaper that describes the Power BI team’s approach to maintaining a reliable, performant, and scalable service for our customers.

It covers monitoring service health, mitigating incidents, managing releases, and acting on necessary improvements. We created this document to share knowledge with our customers, who often ask about our site reliability engineering practices. The intention is to offer transparency into how the Power BI team minimizes service disruption through safe deployment, continuous monitoring, and rapid incident response. The techniques described here also provide a blueprint for teams hosting service-based solutions to build foundational live site processes that are efficient and effective at scale.

As service owners, we need to make sure our customers can rely on Power BI for mission-critical work. That trust is reflected in the service's rapid growth: six straight years of triple-digit paid growth since launch. Power BI is now used by 97% of Fortune 500 companies.

The improvements shown in the table below are the direct result of engineering, tools, and culture changes made by the Power BI team over the past few years.

| Metric | Actual (Dec 2018) | Actual (May 2021) | % Improvement |
| --- | --- | --- | --- |
| Time to Notify (TTN) Customers of Incidents – P75 | 110 min | 14 min | 87% |
| Time to Acknowledge (TTA) When Incidents Occur – P75 | 11 min | 0.76 min | 93% |
| Time to Mitigate (TTM) Issue – P50 | 49.3 min | 2.8 min | 94% |
| % Alerts Automated (Enrichment) | 7% | 88% | 1,157% |
| % Alerts Mitigated w/o human intervention | 0% | 82% | New Capability |
| % Incidents Escalated to SMEs (Subject Matter Expert) | 6.7% | 0.34% | 95% |
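
For readers who want to check the "% Improvement" column, here is a minimal sketch of the arithmetic. It assumes the column is the relative change between the Dec 2018 and May 2021 values, which reproduces the published figures; the post itself does not state the formula, and the function name `relative_change_pct` is made up for this illustration. The "mitigated without human intervention" row is left out because its baseline is 0%, so a relative change is undefined (the table labels it "New Capability").

```python
# Values transcribed from the table above: (Dec 2018, May 2021).
metrics = {
    "TTN P75 (min)": (110.0, 14.0),
    "TTA P75 (min)": (11.0, 0.76),
    "TTM P50 (min)": (49.3, 2.8),
    "Alerts automated (%)": (7.0, 88.0),
    "Incidents escalated to SMEs (%)": (6.7, 0.34),
}


def relative_change_pct(before, after):
    """Size of the change from `before` to `after`, as a percentage of `before`.

    Assumed formula (not stated in the post): |after - before| / before * 100.
    """
    return abs(after - before) / before * 100.0


for name, (before, after) in metrics.items():
    print(f"{name}: {relative_change_pct(before, after):.0f}%")
```

Using the absolute difference lets one helper cover both the metrics where lower is better (the time-based rows and escalations) and the one where higher is better (alert automation).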

Read our service admin site reliability service model whitepaper
