The worker instances of customer sites are virtual machines running a server OS. The VMs and their host servers are upgraded regularly to keep them secure and healthy.
A well-designed upgrade process keeps customer sites highly available throughout maintenance. IaaS and PaaS use different processes.
IaaS provides virtual machines to cloud customers. Cloud customers manage the software environment of their VMs.
IaaS uses the following process to upgrade the host environment of a virtual machine.
Steps | Cloud Platform | Site Admin | Website Requests Routing | Web Application |
---|---|---|---|---|
1 | Notify Admin | | Instance 0 (Active) - To be upgraded; Instance 1 (Inactive) | Running in both instances |
2 | | Switch | Instance 0 (Inactive) - To be upgraded; Instance 1 (Active) | Running in both instances |
3 | Apply Maintenance | | Instance 0 (Inactive) - Upgrading; Instance 1 (Active) | Running in both instances |
4 | | Switch back | Instance 0 (Active) - Upgraded; Instance 1 (Inactive) | Running in both instances |
The web application does not need to restart in any instance.
The process tolerates web applications that are slow or fragile to start up.
PaaS provides worker instance virtual machines to cloud customers. The worker instances are ready to host customer websites: the OS of the VM, the web server, the programming language runtimes, etc. are all maintained by PaaS.
PaaS uses the following process to upgrade the host environment and the worker instances’ app stacks.
Steps | Cloud Platform | Website Requests Routing | Web Application |
---|---|---|---|
1 | | Instance 0 (Active) - To be upgraded | Running in #0 |
2 | Allocate new | Instance 0 (Active) - To be upgraded; Instance 1 (Inactive) | Running in #0; 🟩 Starting up in #1 |
3 | Overlapped | Instance 0 (Active) - To be upgraded; Instance 1 (Active) | Running in both |
4 | Switch | Instance 0 (Inactive) - To be upgraded; Instance 1 (Active) | Running in both |
5 | Shut down | Instance 0 (Inactive) - To be upgraded; Instance 1 (Active) | Shutting down from #0; Running in #1 |
6 | Deallocate | Instance 1 (Active) | Running in #1 |
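The six steps above can be modeled as a small state table to check the key property of the overlapped process: at every step, at least one instance both receives traffic and has the application running. This is only an illustrative sketch. The step label `(initial state)`, the state names, and the helper functions are assumptions made for the example, not part of any platform API.

```python
# Hypothetical model of the six PaaS steps in the table above.
# Each step maps an instance id to (routing_active, app_state).
STEPS = [
    ("(initial state)", {0: (True, "running")}),
    ("Allocate new",    {0: (True, "running"), 1: (False, "starting")}),
    ("Overlapped",      {0: (True, "running"), 1: (True, "running")}),
    ("Switch",          {0: (False, "running"), 1: (True, "running")}),
    ("Shut down",       {0: (False, "stopping"), 1: (True, "running")}),
    ("Deallocate",      {1: (True, "running")}),
]


def serving_instances(instances):
    """Return the ids of instances that receive traffic and have the app running."""
    return sorted(
        instance_id
        for instance_id, (active, app_state) in instances.items()
        if active and app_state == "running"
    )


def site_stays_available(steps):
    """True if every step has at least one instance able to serve requests."""
    return all(serving_instances(instances) for _, instances in steps)


for label, instances in STEPS:
    print(f"{label:16} serving: {serving_instances(instances)}")
print("available throughout:", site_stays_available(STEPS))
```

Note that the model only stays available because the application reaches `running` in instance 1 before the switch; that is the start-up requirement discussed next.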
The web application needs to start up in a new instance quickly and successfully.
In other words, it requires a cloud-friendly web application.
It is difficult to predict accurately when the maintenance upgrades a specific instance.
See Scale Unit
There are more upgrade activities, because the PaaS process updates both the host environment and the app stack inside the virtual machine.
For site admins who are used to the IaaS maintenance process, the PaaS process is a new concept to understand.
As illustrated in the above process, PaaS allocates a new instance to the site and deallocates the original instance after the maintenance. That implies PaaS has a pool of available instances.
Taking Azure App Service for example: App Service has hundreds of instances in each scale unit. The instances in a scale unit can be allocated to various roles and different customer sites of that scale unit dynamically.
An App Service scale unit uses a global maintenance job to upgrade all its instances, instead of one job per site. The instances are divided into a few update domains. The job shuts down instances in turn to upgrade them. It could take hours or days to walk through all the instances in each update domain. It is difficult to predict when a specific instance of a site in a scale unit will be upgraded during the long maintenance batch.
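To see why the upgrade time of one instance is hard to predict, consider a toy model of such a job. Everything in it is an assumption made for illustration: the round-robin assignment of instances to update domains, the fixed duration per instance, the strictly sequential walk, and the names `plan_upgrade` and `w000`. The real scheduling logic of App Service is not described in this article.

```python
def plan_upgrade(instance_ids, num_domains, minutes_per_instance):
    """Return (domains, offsets) for a toy sequential maintenance job.

    domains: list of lists, instances assigned round-robin (an assumption).
    offsets: minutes after job start at which each instance is upgraded.
    """
    domains = [instance_ids[d::num_domains] for d in range(num_domains)]
    offsets = {}
    elapsed = 0
    for domain in domains:            # walk the update domains in turn
        for instance_id in domain:    # upgrade instances one after another
            offsets[instance_id] = elapsed
            elapsed += minutes_per_instance
    return domains, offsets


# 300 hypothetical instances, 5 update domains, 10 minutes per instance.
instance_ids = [f"w{n:03d}" for n in range(300)]
domains, offsets = plan_upgrade(instance_ids, num_domains=5, minutes_per_instance=10)

for instance_id in ("w000", "w001", "w299"):
    print(instance_id, "upgraded", offsets[instance_id] / 60, "hours after job start")
```

Even in this simplified model, two instances with neighboring ids can be upgraded many hours apart, and a site admin normally does not know which domain or position an instance has.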
What if my web application fails to start up quickly and successfully in the new instance?
A cloud-ready web application should be resilient to the restarts in a cloud environment. It should start up quickly and successfully when needed.
The Ultimate Guide to Running Healthy Apps in the Cloud:
Modern-day data centers are extremely complex and have many moving parts. VMs can restart or move, systems are upgraded, and file servers are scaled up and down. All these events are to be expected in a cloud environment. However, you can make your cloud application resilient to these events by following best practices…
As mentioned above, your instances are expected to and will restart…
Some safeguarding features can be used to handle unexpected startup failures.
Taking Azure App Service for example:
Use Warm Up to reserve more time for the web application to start up. It prevents the frontend load balancer from routing requests to a worker instance before the web application is ready there.
Use Auto Heal to automatically restart the application in a worker instance if it does not start up successfully there.
Use Health Check to isolate and then replace an unhealthy instance.
Note: Most start-up errors are caused by the web application rather than by a bad instance. They can be healed by restarting the worker process of the web application alone, without replacing the instance. Auto Heal restarts are lightweight actions, and they are not limited by the replacement quota that applies to Health Check.
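Warm Up and Health Check both rely on the application telling the platform whether it is ready. A minimal sketch of such a readiness endpoint is shown below, assuming the probe treats a 2xx response as healthy and anything else as not ready. The path `/healthz` and the names `StartupState`, `health_response`, and `HealthHandler` are hypothetical and chosen only for this example.

```python
import threading
from http.server import BaseHTTPRequestHandler


class StartupState:
    """Tracks whether the (hypothetical) application has finished initializing."""

    def __init__(self):
        self._ready = threading.Event()

    def mark_ready(self):
        self._ready.set()

    def is_ready(self):
        return self._ready.is_set()


STATE = StartupState()


def health_response(state):
    """Return (status_code, body) for a warm-up or health probe."""
    if state.is_ready():
        return 200, b"ready"
    # 503 signals "not ready yet", so no traffic should be routed here.
    return 503, b"starting"


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/healthz":  # hypothetical probe path
            self.send_error(404)
            return
        status, body = health_response(STATE)
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To try it locally, the handler can be served with `http.server.HTTPServer(("", 8000), HealthHandler).serve_forever()`, calling `STATE.mark_ready()` once caches, connections, and other start-up work are done.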
I received maintenance notification emails from the cloud platform. Will my site be down through the maintenance?
You receive email notifications either because you previously subscribed to the future maintenance activities of a scale unit or because the platform enrolled you.
Routine (planned) maintenance for App Service:
Platform maintenance isn’t expected to impact application uptime or availability. Applications continue to stay online while platform maintenance occurs. Platform maintenance may cause applications to be cold started on new virtual machines, which can lead to cold start delays. An application is still considered to be online, even while cold-starting. For best practices to minimize/avoid cold starts, …
Please review and follow the best practices in The Ultimate Guide to Running Healthy Apps in the Cloud for a resilient web application in the cloud.
I see the keyword “Unplanned” in the notification email. Is there a 0-day security vulnerability that you are patching in a hurry?
Having unplanned maintenance does not imply there is a 0-day vulnerability found in the cloud.
For example: PaaS environment updates are rolled out through deployment rings. If a regression is found in an outer ring, the deployment is paused, and an unplanned fix needs to be applied to the inner ring instances that previously received the same build.
PaaS maintains not only the host servers but also the app stack inside the worker instances. As a result, PaaS may have more planned and unplanned maintenance activities, while it guarantees the same high availability as IaaS.
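The ring example can be sketched in a few lines to show where the unplanned work comes from. This is a simplified illustration: the ring names, the function `roll_out`, and the rule that every ring already holding the build needs the fix are assumptions for the example, not a description of the actual rollout tooling.

```python
def roll_out(rings, regression_found_in=None):
    """Deploy a build ring by ring, inner ring first.

    Returns a dict with the rings that received the build, the rings that
    need an unplanned fix, and the rings the paused rollout never reached.
    """
    deployed = []
    for ring in rings:
        deployed.append(ring)
        if ring == regression_found_in:
            # Pause here: every ring that already has the build needs the fix.
            return {
                "deployed": deployed,
                "needs_unplanned_fix": list(deployed),
                "not_reached": rings[len(deployed):],
            }
    return {"deployed": deployed, "needs_unplanned_fix": [], "not_reached": []}


rings = ["ring0", "ring1", "ring2", "ring3"]  # hypothetical ring names
print(roll_out(rings, regression_found_in="ring2"))
```

In this model, sites in `ring0` and `ring1` would receive an "unplanned" maintenance notification even though no security issue is involved.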
Why was my site restarted during business hours?
An App Service scale unit can take hours or days to upgrade. Although the maintenance job starts outside local business hours, its execution can extend into business hours, so a specific instance may restart during business hours.
On the other hand, a restart during business hours does not mean that site availability is impacted. Please review and follow the best practices in The Ultimate Guide to Running Healthy Apps in the Cloud for a resilient web application in the cloud.
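A short piece of date arithmetic makes the point concrete. The start time, the offsets, and the 09:00 to 17:00 weekday definition of business hours are all made-up values for illustration.

```python
from datetime import datetime, timedelta


def in_business_hours(moment, start_hour=9, end_hour=17):
    """True for Monday to Friday between start_hour and end_hour (assumed definition)."""
    return moment.weekday() < 5 and start_hour <= moment.hour < end_hour


job_start = datetime(2024, 3, 4, 22, 0)  # hypothetical job start, a Monday at 22:00

# Hypothetical offsets at which two instances are reached by the job.
early_restart = job_start + timedelta(hours=2)
late_restart = job_start + timedelta(hours=14)

print(early_restart, in_business_hours(early_restart))
print(late_restart, in_business_hours(late_restart))
```

The job starts well outside business hours, yet an instance reached 14 hours into the batch restarts at noon the next day.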
My worker instances depend on a file storage instance. What is the maintenance process of upgrading a file storage instance?
File storage instances follow the same overlapped restart process of PaaS for maintenance.
The web applications that depend on that file storage need to restart in all the worker instances to apply the change.
To achieve high availability, these restarts are overlapped in each worker instance by default. A new worker process first starts up in each worker instance to host the web application linked to the new file storage instance. After the web application starts up in the new worker process, the original process is shut down.
Again, the web application needs to start up in the new worker process quickly and successfully. Even with an overlapped restart on that instance, high availability still depends on how quickly the app starts in the new worker process.
Further, because two worker processes run side by side in one worker instance for each web application, memory, CPU, and IO resources can be strained during the overlapped restarts. App Service could switch to non-overlapped restarts if density is too high. In that case, there can be a service interruption until the web application starts up successfully in the new worker process.
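The difference between the two restart modes comes down to the order of the same four actions. The sketch below uses a made-up rule (enough free memory for a second copy of the app) to choose between them; the real criteria App Service uses to judge density are not stated here, and all names in the code are hypothetical.

```python
OVERLAPPED = [
    "start new worker process",
    "wait until the app is ready in the new process",
    "route requests to the new process",
    "shut down the original process",
]
NON_OVERLAPPED = [
    "shut down the original process",
    "start new worker process",
    "wait until the app is ready in the new process",
    "route requests to the new process",
]


def restart_plan(free_memory_mb, app_memory_mb):
    """Pick a restart order; the memory headroom rule is an assumption."""
    if free_memory_mb >= app_memory_mb:
        return OVERLAPPED
    return NON_OVERLAPPED


def has_interruption(plan):
    """True if the original process stops before requests reach the new one."""
    return plan.index("shut down the original process") < plan.index(
        "route requests to the new process"
    )


for free_mb in (1500, 500):
    plan = restart_plan(free_memory_mb=free_mb, app_memory_mb=1000)
    print(free_mb, "MB free -> interruption:", has_interruption(plan))
```

In both orders, the length of the "wait until the app is ready" step is set by the application itself, which is why fast start-up matters either way.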
Please review and follow the best practices in The Ultimate Guide to Running Healthy Apps in the Cloud for a resilient web application in the cloud.
How do I configure my site for the outer (late) upgrade deployment ring? Or even apply the upgrades manually?
Upgrade preference for App Service Environment planned maintenance:
With App Service Environment v3, you can specify your preference for when and how the planned maintenance is applied. The upgrade can be applied automatically or manually. Even with your preference set to automatic, you have some options to influence the timing.