Facebook hasn’t had a great week. Or a great month, really. Leaked internal documents show that the company has intentionally promoted anger and misinformation on its platform, a revelation that’s forced Facebook to defend itself in front of Congress. But as it struggled to control its public image on Monday, something odd happened—Facebook and its services disappeared from the internet for six hours.
The unexpected outage, which is the longest Facebook has experienced since 2008, took down all of its apps and services. People across the globe were unable to use Instagram, WhatsApp, Oculus, and other Facebook-owned platforms during the outage. And because many people use Facebook to log into third-party apps, they found themselves locked out of games, fitness apps, and other software.
So what happened? Well, we’ve known the basic details since yesterday. Facebook and its domains were pulled from global routing tables, effectively blocking anyone from connecting with the company’s servers. The “facebook.com” domain disappeared from the internet, and even showed up as “for sale” on domain websites (an accident, but still).
To the huge community of people and businesses around the world who depend on us: we're sorry. We’ve been working hard to restore access to our apps and services and are happy to report they are coming back online now. Thank you for bearing with us.
— Meta (@Meta) October 4, 2021
Because Facebook operates its own registrar, we concluded that something bad happened within the company’s facilities. A successful hacking attempt at this scale is unlikely, so we were left with two possibilities—either Facebook’s server infrastructure encountered a critical failure, or a Facebook employee pulled the plug. The latter option seemed like a strong possibility, given the bombshell 60 Minutes interview with a Facebook whistleblower that occurred on Sunday.
But Facebook now says that a “routine maintenance job” led to the outage. The company’s engineers issued a command to assess Facebook’s global network capacity, and for whatever reason, the command “unintentionally took down all the connections in our backbone network, effectively disconnecting Facebook data centers globally.”
Facebook’s networks could no longer respond to DNS queries, making them totally inaccessible. This problem required a hands-on fix from engineers, who had trouble getting on-site because Facebook facilities are defended by smart security systems, such as network-connected keycards. Unfortunately, Facebook hosts these security systems on its own servers, which were unresponsive.
We’re not really sure how Facebook engineers got to the company’s servers. Reports that they used an angle grinder to break down doors and cages have not been verified by Facebook or independent sources. Either way, Facebook managed to resolve the issue, but it had to bring its services back online slowly. Turning everything on at once could have caused a surge in traffic and a sudden jump in power consumption, which Facebook says could have triggered a new round of crashes and put its electrical systems and server hardware at risk.
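To make the idea of a staged restart concrete, here is a minimal Python sketch of a ramp-up plan. It is only an illustration, not Facebook's actual procedure: the function name `ramp_schedule`, the choice of equal linear steps, and the example numbers are all assumptions of ours.

```python
def ramp_schedule(target_load, steps):
    """Return the load to admit at each stage of a gradual restart.

    Hypothetical helper: load rises in equal linear steps until it
    reaches target_load, instead of jumping there all at once.
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    # Stage i (1-based) admits i/steps of the full load.
    return [target_load * i / steps for i in range(1, steps + 1)]


if __name__ == "__main__":
    # Example: bring 100 units of traffic back in four stages.
    for stage, load in enumerate(ramp_schedule(100, 4), start=1):
        print(f"stage {stage}: admit {load} units")
```

A real rollout would presumably check power draw and error rates between stages and hold or roll back if something looked wrong; the sketch only shows the shape of the schedule.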
The consequences of this outage may not seem obvious. After all, you probably had a pretty productive workday without Instagram! But in some countries, namely India, WhatsApp is the primary form of mobile communication. If the Facebook outage had gone on for a full week, or even just a few days, it could have had a serious impact on Indian business, emergency medicine, and society.
And as documented by Cloudflare, people began to repeatedly refresh Facebook and its services after they went down, leading to a 30x traffic increase. While this traffic increase probably didn’t hurt Facebook’s efforts to bring its servers back to life, it did put a small strain on non-Facebook networks, a sign that future outages could cripple internet infrastructure as a whole.
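One common way software avoids adding to this kind of pile-on is to wait longer between each retry and to randomize the wait, a technique usually called exponential backoff with jitter. The Python sketch below is a generic illustration and not something the affected apps are known to use; the function name `backoff_delays` and its default values are our own assumptions.

```python
import random


def backoff_delays(attempts, base=1.0, cap=60.0, rng=None):
    """Return a list of wait times (in seconds) for successive retries.

    Hypothetical helper: the upper bound doubles with each attempt,
    is capped at `cap`, and the actual wait is drawn uniformly between
    zero and that bound so that clients do not all retry in lockstep.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(attempts):
        ceiling = min(cap, base * 2 ** attempt)
        delays.append(rng.uniform(0, ceiling))
    return delays


if __name__ == "__main__":
    for attempt, delay in enumerate(backoff_delays(6), start=1):
        print(f"retry {attempt}: wait {delay:.2f}s")
```

A client that retries on a schedule like this sends far fewer requests during a long outage than one that retries immediately every time it fails.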
While we enjoyed taking a six-hour break from social media, we’re worried that Facebook’s recent outage may be a sign of things to come. One small mistake brought down an empire for six hours and tanked Facebook stock. When a mistake is this easy to make, it will almost certainly happen again. Even if you’re conspiratorially minded and think that a disgruntled employee brought down Facebook, the idea that one engineer could pull off such a feat isn’t very reassuring.
Source: Facebook (1, 2) via BleepingComputer