The advice, which is specifically for virtual machines using Azure, shows that sometimes the solution to a catastrophic failure is to turn it off and on again. And again.
Can't read the rest of the article because of the paywall, but apparently users have chimed in saying rebooting 15 times worked for them. Whether they were serious or not remains a question. I can also imagine it was a time-related thing: after 15 reboots, enough time had passed for it to be fixed, so the user thought 15 was the magic number.
"We have received feedback from customers that several reboots (as many as 15 have been reported) may be required, but overall feedback is that reboots are an effective troubleshooting step at this stage."
So fuck the headline, the real message is: yes, rebooting is an effective troubleshooting step at the moment.
I don't know the exact timing of that message in the timeline of the incident, so it could be an early "please restart and see if the issue persists" or a late "something was updated, rebooting will probably help".
My understanding is that it's just a matter of whether the CrowdStrike updater service manages to stay connected to the internet long enough to download the patch before the core service takes a shit.
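If that race really is what's going on, the "reboot up to 15 times" advice makes some sense as plain probability. A rough sketch, assuming each boot is an independent try and the updater wins the race with some fixed chance `p_single` (a made-up number for illustration, nobody has published a real one):

```python
def chance_fixed_within(reboots: int, p_single: float) -> float:
    """Chance that at least one of `reboots` boots lets the updater win the race.

    Assumes every boot is an independent attempt with the same success
    chance `p_single`. Both the independence and the value of `p_single`
    are assumptions for illustration, not measured figures.
    """
    if reboots < 0:
        raise ValueError("reboots must be non-negative")
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be between 0 and 1")
    # Chance that every boot loses the race, subtracted from 1.
    return 1.0 - (1.0 - p_single) ** reboots


if __name__ == "__main__":
    # Hypothetical 20% chance per boot that the patch downloads in time.
    for n in (1, 5, 15):
        print(n, round(chance_fixed_within(n, 0.2), 3))
```

With that hypothetical 20% per boot, a single reboot rarely works but 15 tries almost always do, which would fit both the "one reboot fixed it" and the "took 15" reports without 15 being a magic number.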
My team and I last night were lucky enough that our computers came back up after a single reboot following the BSOD. VPN and certain applications were wonky the rest of the night.
We had nearly 300 servers impacted, almost 100 still down. All planned maintenance for the weekend has been cancelled. Incidents like these make me very glad I climbed the IT ladder enough to not be in a support role anymore.
Hilariously, if Microsoft required WHQL certification for AV kernel drivers like they do for every other kernel driver, this would never have happened. It would easily have been discovered in the first hour of testing.