We sincerely apologise for the major incident that impacted BlastIQ customers globally on Monday morning AEST.
During the incident an expired SSL certificate prevented users, integrated systems and field devices from logging in to BlastIQ systems.
BlastIQ uses Microsoft’s Azure Front Door (AFD) service to provide reverse proxy services for BlastIQ systems and to accelerate customer access to BlastIQ. When connecting to BlastIQ, a user is automatically routed to the nearest AFD edge node, which then routes traffic privately to BlastIQ using Azure’s global fibre network.
BlastIQ uses Azure’s managed SSL service to automatically issue and update the security certificates used by BlastIQ. After a new certificate is issued, Azure automatically deploys the new certificate to all AFD edge nodes.
The BlastIQ team have worked with engineers at Microsoft who investigated the cause of the issue. Microsoft have identified that the SSL management automation correctly issued a new certificate to replace the expiring certificate; however, a bug in their deployment tools prevented the new certificate from being deployed to the AFD edge nodes.
When BlastIQ engineers informed Microsoft of the issue impacting customers, Microsoft engineers performed a manual update of the certificate on the AFD edge nodes. Approximately 10% of AFD edge nodes were not updated successfully, resulting in intermittent failures for some users. Microsoft engineers then performed additional checks to manually identify the AFD edge nodes that still had an expired certificate and updated them.
Microsoft have identified the bug in their automated certificate deployment tooling and will correct it to prevent recurrence for any Azure customers.
BlastIQ will continue to use Azure’s automated management tools and global network services to provide fast, reliable service for BlastIQ customers globally. We will introduce additional monitoring checks so that, where possible, we detect when automation has failed to perform its functions correctly.
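As an illustration only, a monitoring check of this kind could look roughly like the Python sketch below. It is not the check BlastIQ has deployed: the function names, the 14-day warning threshold and the status labels are all assumptions made for the example. The sketch connects to a host, reads the expiry date of the certificate actually being served, and classifies it.

```python
import socket
import ssl
from datetime import datetime, timezone

WARN_DAYS = 14  # assumed alert threshold, not taken from the incident report


def days_remaining(not_after: str, now: datetime) -> float:
    """Return the days between `now` and a certificate's notAfter string."""
    expires = ssl.cert_time_to_seconds(not_after)  # seconds since epoch (UTC)
    return (expires - now.timestamp()) / 86400


def classify(not_after: str, now: datetime, warn_days: int = WARN_DAYS) -> str:
    """Map the remaining lifetime to 'expired', 'expiring' or 'ok'."""
    remaining = days_remaining(not_after, now)
    if remaining <= 0:
        return "expired"
    if remaining <= warn_days:
        return "expiring"
    return "ok"


def check_host(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Check the certificate served to this client for `host`."""
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                cert = tls.getpeercert()
    except ssl.SSLCertVerificationError:
        # The handshake is rejected when the served certificate is already
        # expired or otherwise fails verification.
        return "invalid"
    return classify(cert["notAfter"], datetime.now(timezone.utc))
```

Calling `check_host("example.com")` (a placeholder hostname) returns one of the four labels. Because each client is routed to its nearest AFD edge node, a single run only tests the certificate on that node; a check like this would need to run from several regions to catch a partial deployment such as the one described above.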
We sincerely apologise to our customers who experienced interruption and disruption in their work due to this outage. We have investigated this issue and are confident that the root cause is being addressed.
We would like to thank the engineers at Microsoft who worked with us to resolve the issue and restore service.