SLA in Cloud

SLAs for Azure products and services

There are three key characteristics of SLAs for Azure products and services:

  1. Performance Targets
  2. Uptime and Connectivity Guarantees
  3. Service Credits

Performance Targets

An SLA defines performance targets for an Azure product or service. The performance targets that an SLA defines are specific to each Azure product and service. For example, performance targets for some Azure services are expressed as uptime guarantees or connectivity rates.

Uptime and Connectivity Guarantees

A typical SLA specifies performance-target commitments that range from 99.9 percent (“three nines”) to 99.999 percent (“five nines”), for each corresponding Azure product or service. These targets can apply to such performance criteria as uptime or response times for services.

The following table lists the potential cumulative downtime for various SLA levels over different durations:

SLA %     Downtime per week   Downtime per month   Downtime per year
99        1.68 hours          7.2 hours            3.65 days
99.9      10.1 minutes        43.2 minutes         8.76 hours
99.95     5 minutes           21.6 minutes         4.38 hours
99.99     1.01 minutes        4.32 minutes         52.56 minutes
99.999    6 seconds           25.9 seconds         5.26 minutes

For example, the SLA for the Azure Cosmos DB (database) service offers 99.99 percent uptime, along with low-latency commitments of less than 10 milliseconds on database read operations and less than 15 milliseconds on database write operations.
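The downtime figures in the table above follow from simple arithmetic, which the following Python sketch reproduces. It assumes a 7-day week, a 30-day month, and a 365-day year; the function name `allowed_downtime_seconds` is made up for this illustration and is not part of any Azure tooling.

```python
# Sketch: convert an SLA percentage into the downtime it permits.
# Assumption: a 7-day week, a 30-day month, and a 365-day year.
SECONDS_PER_PERIOD = {
    "week": 7 * 24 * 3600,
    "month": 30 * 24 * 3600,
    "year": 365 * 24 * 3600,
}


def allowed_downtime_seconds(sla_percent: float, period: str) -> float:
    """Return the maximum downtime (in seconds) that still meets the SLA."""
    unavailable_fraction = 1.0 - sla_percent / 100.0
    return unavailable_fraction * SECONDS_PER_PERIOD[period]


if __name__ == "__main__":
    for sla in (99.0, 99.9, 99.95, 99.99, 99.999):
        # Report each period in minutes so the rows are easy to compare.
        row = ", ".join(
            f"{allowed_downtime_seconds(sla, period) / 60:.2f} min per {period}"
            for period in SECONDS_PER_PERIOD
        )
        print(f"{sla}%: {row}")
```

For a 99.9 percent SLA this gives about 43.2 minutes per month, matching the table.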

Service Credits

SLAs also describe how Microsoft will respond if an Azure product or service fails to perform to its governing SLA’s specification.

For example, customers may have a discount applied to their Azure bill, as compensation for an under-performing Azure product or service. The table below explains this example in more detail.

The first column in the table below shows monthly uptime percentage SLA targets for a single-instance Azure Virtual Machine. The second column shows the service credit percentage you receive if the actual uptime for that month falls below the corresponding target.

Monthly uptime percentage   Service credit percentage
< 99.9                      10
< 99                        25
< 95                        100
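The credit tiers above amount to a threshold lookup, sketched here in Python. The thresholds are copied from the table in this post and may not match the SLA terms currently in force; `service_credit_percent` is a hypothetical helper name, not an Azure API.

```python
def service_credit_percent(monthly_uptime_percent: float) -> int:
    """Return the service credit (as a percentage of the bill) for a month.

    Thresholds are taken from the table in this post (assumption: they apply
    to a single-instance virtual machine and are checked lowest tier first).
    """
    if monthly_uptime_percent < 95.0:
        return 100
    if monthly_uptime_percent < 99.0:
        return 25
    if monthly_uptime_percent < 99.9:
        return 10
    return 0  # SLA met: no credit is due


if __name__ == "__main__":
    for uptime in (99.95, 99.5, 97.0, 90.0):
        print(f"uptime {uptime}% -> credit {service_credit_percent(uptime)}%")
```

Checking the lowest tier first ensures that a month with, say, 90 percent uptime earns the 100 percent credit rather than stopping at the 10 percent tier.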

Composing SLAs across services

When you combine SLAs across different service offerings, the resulting SLA is called a composite SLA. The composite SLA can provide higher or lower uptime values, depending on your application architecture.

Calculating downtime

Consider an App Service web app that writes to Azure SQL Database. These Azure services currently have the following SLAs:

Figure: a web app with an SLA uptime value of 99.95 percent, connected to a SQL database with an SLA value of 99.99 percent.

In this example, if either service fails, the whole application fails. The failure probabilities of the individual services are generally treated as independent, so the composite SLA for this application is the product of the individual SLAs:

99.95 percent × 99.99 percent = 99.94 percent

This means the combined probability of failure is higher than the failure probability of either service on its own, so the composite SLA is lower than each individual SLA. This isn’t surprising, because an application that relies on multiple services has more potential failure points.
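As a minimal sketch of the multiplication above, the following Python function computes the composite SLA for services that must all be available at once. It assumes independent failures, as the text does; `serial_composite_sla` is a name chosen for this illustration.

```python
from math import prod


def serial_composite_sla(*sla_percents: float) -> float:
    """Composite SLA (percent) when every listed service must be up.

    Assumption: the services fail independently, so their availabilities
    multiply.
    """
    return prod(p / 100.0 for p in sla_percents) * 100.0


if __name__ == "__main__":
    # Web app (99.95 percent) that depends on SQL Database (99.99 percent).
    composite = serial_composite_sla(99.95, 99.99)
    print(f"composite SLA: {composite:.2f}%")
```

With the two values from this example the result is roughly 99.94 percent, and adding more dependencies to the call can only lower it further.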

Conversely, you can improve the composite SLA by creating independent fallback paths. For example, if SQL Database is unavailable, you can put transactions into a queue for processing at a later time.

Figure: the web app (SLA 99.95 percent) writing to the SQL database (SLA 99.99 percent), with a queue as the fallback path.

With this design, the application is still available even if it can’t connect to the database. However, it fails if both the database and the queue fail simultaneously.

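The database (99.99 percent SLA) is unavailable with probability 0.0001, and the queue (taken here to have a 99.9 percent SLA) is unavailable with probability 0.001. If the two fail independently, the expected fraction of time in which both are down at once is 0.0001 × 0.001, so the composite SLA for this combined path of a database or queue is: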

1.0 − (0.0001 × 0.001) = 99.99999 percent

Therefore, if we add the queue to our web app, the total composite SLA is:

99.95 percent × 99.99999 percent = ~99.95 percent

Notice that the composite SLA has improved, from 99.94 percent to about 99.95 percent. However, there are trade-offs to using this approach: the application logic is more complicated, you pay more to add the queue support, and there may be data-consistency issues to deal with because of retry behavior.
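The fallback calculation above can be sketched in Python as follows. The sketch assumes independent failures and a 99.9 percent SLA for the queue (the value implied by the 0.001 failure probability used above); both function names are hypothetical.

```python
from math import prod


def serial_composite_sla(*sla_percents: float) -> float:
    """Composite SLA (percent) when every listed service must be up."""
    return prod(p / 100.0 for p in sla_percents) * 100.0


def fallback_composite_sla(*sla_percents: float) -> float:
    """Composite SLA (percent) when any one of the listed services suffices.

    Assumption: failures are independent, so the path is down only when
    every alternative is down at the same time.
    """
    all_down = prod(1.0 - p / 100.0 for p in sla_percents)
    return (1.0 - all_down) * 100.0


if __name__ == "__main__":
    # Database (99.99 percent) with a queue (assumed 99.9 percent) as fallback.
    storage_path = fallback_composite_sla(99.99, 99.9)
    # The web app (99.95 percent) still has to be up for the app to work.
    total = serial_composite_sla(99.95, storage_path)
    print(f"database-or-queue path: {storage_path:.5f}%")
    print(f"whole application:      {total:.2f}%")
```

The web app itself remains a single point of failure, which is why the total cannot exceed 99.95 percent no matter how reliable the storage path becomes.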

  • Source and Credit – Microsoft Official
