SLAs in the Cloud
SLAs for Azure products and services
There are three key characteristics of SLAs for Azure products and services:
- Performance Targets
- Uptime and Connectivity Guarantees
- Service Credits
Performance Targets
An SLA defines performance targets for an Azure product or service. The performance targets that an SLA defines are specific to each Azure product and service. For example, performance targets for some Azure services are expressed as uptime guarantees or connectivity rates.
Uptime and Connectivity Guarantees
A typical SLA specifies performance-target commitments that range from 99.9 percent (“three nines”) to 99.999 percent (“five nines”), for each corresponding Azure product or service. These targets can apply to such performance criteria as uptime or response times for services.
The following table lists the potential cumulative downtime for various SLA levels over different durations:
|SLA %||Downtime per week||Downtime per month||Downtime per year|
|99||1.68 hours||7.2 hours||3.65 days|
|99.9||10.1 minutes||43.2 minutes||8.76 hours|
|99.95||5 minutes||21.6 minutes||4.38 hours|
|99.99||1.01 minutes||4.32 minutes||52.56 minutes|
|99.999||6 seconds||25.9 seconds||5.26 minutes|
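The downtime figures in the table follow directly from the SLA percentage: the allowed downtime is the unavailable fraction (1 − SLA) multiplied by the length of the period. The sketch below reproduces the table, assuming a 7-day week, a 30-day month, and a 365-day year. `allowed_downtime_minutes` is a hypothetical helper name, not part of any Azure API.

```python
# Sketch: maximum downtime that still meets an SLA percentage.
# Assumption: a 7-day week, a 30-day month, and a 365-day year.

HOURS_PER_PERIOD = {
    "week": 7 * 24,
    "month": 30 * 24,
    "year": 365 * 24,
}


def allowed_downtime_minutes(sla_percent: float, period: str) -> float:
    """Return the downtime, in minutes, permitted by the SLA over the period."""
    unavailable_fraction = 1.0 - sla_percent / 100.0
    return HOURS_PER_PERIOD[period] * 60 * unavailable_fraction


if __name__ == "__main__":
    for sla in (99.0, 99.9, 99.95, 99.99, 99.999):
        row = ", ".join(
            f"{period}: {allowed_downtime_minutes(sla, period):.2f} min"
            for period in HOURS_PER_PERIOD
        )
        print(f"{sla}% -> {row}")
```

For example, 99.9 percent over a 30-day month gives 0.001 × 43,200 minutes = 43.2 minutes, matching the table.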
For example, the SLA for the Azure Cosmos DB service offers 99.99 percent uptime, which includes low-latency commitments of less than 10 milliseconds on read operations and less than 15 milliseconds on write operations.
Service Credits
SLAs also describe how Microsoft responds if an Azure product or service fails to meet the performance specified in its SLA.
For example, customers may have a discount applied to their Azure bill as compensation for an underperforming Azure product or service. The table below explains this example in more detail.
The first column in the table below shows monthly uptime percentage SLA targets for a single-instance Azure Virtual Machine. The second column shows the service credit you receive if the actual uptime for that month falls below the corresponding target.
|MONTHLY UPTIME PERCENTAGE||SERVICE CREDIT PERCENTAGE|
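Whether a credit applies at all depends on the month's actual uptime percentage. The sketch below assumes uptime is computed as (available minutes − downtime minutes) / available minutes × 100 over a 30-day month, and uses a 99.9 percent target purely as an illustrative value. It does not model the credit tiers; `monthly_uptime_percent` and `credit_is_due` are hypothetical helper names.

```python
# Sketch: decide whether a month's downtime breaches an SLA target.
# Assumptions: 30-day month; the target value passed in is illustrative.

MINUTES_PER_MONTH = 30 * 24 * 60


def monthly_uptime_percent(downtime_minutes: float,
                           available_minutes: float = MINUTES_PER_MONTH) -> float:
    """Uptime as a percentage of the minutes the service should have been available."""
    if not 0 <= downtime_minutes <= available_minutes:
        raise ValueError("downtime must be between 0 and the available minutes")
    return (available_minutes - downtime_minutes) / available_minutes * 100.0


def credit_is_due(downtime_minutes: float, sla_target_percent: float) -> bool:
    """True when the month's actual uptime fell below the SLA target."""
    return monthly_uptime_percent(downtime_minutes) < sla_target_percent


if __name__ == "__main__":
    # 60 minutes of downtime in a 30-day month is roughly 99.86 percent uptime.
    print(f"{monthly_uptime_percent(60):.3f}%")
    print(credit_is_due(60, sla_target_percent=99.9))  # True
    print(credit_is_due(30, sla_target_percent=99.9))  # False
```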
Composing SLAs across services
When you combine SLAs across different service offerings, the resulting SLA is called a composite SLA. Depending on your application architecture, the composite SLA can be higher or lower than the SLAs of the individual services.
Consider an App Service web app that writes to Azure SQL Database. These Azure services currently have the following SLAs:
- App Service: 99.95 percent
- SQL Database: 99.99 percent
In this example, if either service fails, the whole application fails. Because the failures of the two services are generally independent, the composite SLA for this application is the product of the individual SLAs:
99.95 percent × 99.99 percent = 99.94 percent
The composite SLA is therefore lower than either individual SLA: the probability that the application fails is higher than the probability that any single service fails. This isn't surprising, because an application that relies on multiple services has more potential failure points.
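When every service must be available, independent availabilities multiply. The sketch below shows that calculation, assuming independent failures as the text does; `serial_composite_sla` is a hypothetical helper name.

```python
import math


def serial_composite_sla(*slas_percent: float) -> float:
    """Composite SLA (percent) when the app needs every listed service.

    Assumes the services fail independently, so their availabilities multiply.
    """
    return math.prod(s / 100.0 for s in slas_percent) * 100.0


if __name__ == "__main__":
    app_service, sql_database = 99.95, 99.99
    print(f"{serial_composite_sla(app_service, sql_database):.2f}%")  # 99.94%
```

Each extra dependency multiplies in another factor below 1.0, which is why adding services in series can only lower the composite SLA.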
Conversely, you can improve the composite SLA by creating independent fallback paths. For example, if SQL Database is unavailable, you can put transactions into a queue for processing at a later time.
With this design, the application is still available even if it can’t connect to the database. However, it fails if both the database and the queue fail simultaneously.
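Here 0.0001 is the database's expected fraction of downtime (1 − 0.9999) and 0.001 is the queue's (equivalent to a 99.9 percent SLA). If the two fail independently, the expected fraction of time during which both are down is 0.0001 × 0.001, so the composite SLA for this combined database-or-queue path is: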
1.0 − (0.0001 × 0.001) = 99.99999 percent
Therefore, if we add the queue to our web app, the total composite SLA is:
99.95 percent × 99.99999 percent = ~99.95 percent
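For redundant paths the rule inverts: the path fails only if every alternative fails, so the failure probabilities multiply. The sketch below combines both rules to reproduce the numbers above. It assumes independent failures, and the 99.9 percent queue SLA is inferred from the 0.001 figure in the text rather than quoted from a published SLA. `parallel_composite_sla` and `serial_composite_sla` are hypothetical helper names.

```python
import math


def parallel_composite_sla(*slas_percent: float) -> float:
    """Composite SLA (percent) for redundant paths; fails only if all paths fail.

    Assumes the paths fail independently, so failure probabilities multiply.
    """
    all_fail = math.prod(1.0 - s / 100.0 for s in slas_percent)
    return (1.0 - all_fail) * 100.0


def serial_composite_sla(*slas_percent: float) -> float:
    """Composite SLA (percent) when every component must be available."""
    return math.prod(s / 100.0 for s in slas_percent) * 100.0


if __name__ == "__main__":
    app_service, sql_database = 99.95, 99.99
    queue = 99.9  # assumption: inferred from the 0.001 failure fraction
    db_or_queue = parallel_composite_sla(sql_database, queue)
    total = serial_composite_sla(app_service, db_or_queue)
    print(f"database or queue: {db_or_queue:.5f}%")  # 99.99999%
    print(f"with fallback:     {total:.2f}%")        # 99.95%
```

The App Service now dominates: even a near-perfect database-or-queue path cannot lift the total above App Service's own 99.95 percent.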
Compared with the 99.94 percent of the original design, the fallback path raises the composite SLA to about 99.95 percent. However, this approach has trade-offs: the application logic is more complicated, you pay more to add queue support, and retry behavior may introduce data-consistency issues that you'll have to handle.
- Source and Credit – Microsoft Official