As many of you reported, TV Tropes went offline at 2:40am EST on July 7th. It was offline for over 14 hours, the worst outage in many years. We did receive email and text alerts when it happened, but unfortunately it was a major hardware failure, which took quite a while to get under control.
The cause was a total failure of our database cluster. 6 of our 8 hard drives failed simultaneously, resulting in a complete loss of data. We had to have our server company replace the cluster, and then we had to rebuild the site from database backups. We do automatic backups every morning. Unfortunately, the failure happened hours before the next backup, so 24 hours of changes were lost.
To make it worse, the history of wiki changes is only backed up twice a week because it is over 1TB in size. We are working on restoring that now, so the history tab is blank on all pages until it's done. Editing will be offline for another 24 hours until we get that fixed. It also means we'll lose 72 hours of wiki history due to the timing of the last backup.
We will be working on optimizing our database structure so we can increase the frequency of our database backups to protect the data in the future.
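To illustrate why backup frequency matters, here is a rough sketch of the arithmetic. It assumes backups are evenly spaced through the week, which may not match our real schedule, and the function names `worst_case_loss` and `loss_at_failure` are made up for this example. The timestamps in the usage lines are placeholders, not our actual backup times.

```python
from datetime import datetime, timedelta


def worst_case_loss(backups_per_week: int) -> timedelta:
    """Longest possible gap between the last backup and a failure,
    assuming backups are evenly spaced (an assumption, not our real schedule)."""
    if backups_per_week <= 0:
        raise ValueError("backups_per_week must be positive")
    return timedelta(days=7) / backups_per_week


def loss_at_failure(last_backup: datetime, failure: datetime) -> timedelta:
    """Amount of work lost when restoring from last_backup after a failure."""
    if failure < last_backup:
        raise ValueError("failure must not be earlier than last_backup")
    return failure - last_backup


def hours(delta: timedelta) -> float:
    """Convert a timedelta to hours for easier reading."""
    return delta.total_seconds() / 3600


if __name__ == "__main__":
    # Daily backups: at most one day of changes can be lost.
    print("daily:", hours(worst_case_loss(7)), "hours")
    # Twice-weekly backups: the window grows to several days.
    print("twice weekly:", hours(worst_case_loss(2)), "hours")
    # Placeholder timestamps, for illustration only.
    lost = loss_at_failure(datetime(2020, 1, 1, 6, 0), datetime(2020, 1, 2, 2, 0))
    print("example loss:", hours(lost), "hours")
```

The takeaway is simply that the worst-case loss equals the gap between backups, so backing up more often is the only way to shrink it.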
We have redundant web servers on a load balancer, redundant database servers in a cluster, and redundant hard drives in every server. So how did this happen? According to our server company, there was a manufacturer's bug in the firmware of the specific drive model used for 6 of our 8 hard drives. That bug caused the disks to die after a certain number of hours running. We don't yet have all the details. They are reaching out to the manufacturer to get more information. I'll update here as I learn more.
UPDATE: (July 8th)
Editing is now enabled! History should be restored as of July 4th 10am EST. The history database imported faster than I expected (4 hours to decompress a 1.1TB SQL file, 12.5 hours to import).
The only thing I haven't done yet is purge the CDN cache. You must log out to view a cached page. Logged-in users get the live site. Not all pages are still cached, and cached pages do expire.
I'll hold off on purging the cache for a few more hours. If there is some specific edit you remember doing during those 24 hours that were lost, you may be able to find it by logging out and viewing the cached page. Then log in and make that edit again.
Edited by itcdr on Jul 8th 2020 at 7:12:45 AM
Well then, where would I start this project?
Short Term probably.
Currently Working On: Incorruptible Pure Pureness
Nyehhhh this probably would've been a good time for the Page Source tab to be viewable by people not logged in.
Got my edits besides the Characters.Paper Mario 64 (which is the one with the most work, but c'est la vie I guess), at any rate.
Jawbreakers on sale for 99¢
EDIT: Archive.vn seems to work, so now we have a method we can be sure we can use.
Edited by ImperialMajestyXO on Jul 7th 2020 at 7:10:02 AM
Oh thank christ, I thought the site was a goner after that long an outage.
Glad things are mostly back to normal.
Silence is golden, noise is platinum. Keelah se'lai
You interested in taking part in my project to save page histories that still exist before the cache updates?
Posted a bulletin so people know why editing is off.
"For a successful technology, reality must take precedence over public relations, for Nature cannot be fooled." - Richard Feynman
6 out of 8 hard drives failing simultaneously?! Do I even want to know the likelihood of that happening, even if it's the same design glitch?
Out of curiosity, would losing all 8 have been any worse?
Politics is the skilled use of blunt objects.
Well, thanks for the explanation. Gee, what terrible timing.
"Bingo! If two species hate each other, they will wipe each other out on their own."-
Note the words were:
It's like a Y2K bug. Happens "after a certain number of hours". Like clockwork.
If you start up all the drives at the same time, like you would when using them together... Then they all fail together, since they're identical.
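A toy sketch of that idea, purely for illustration: the 40,000-hour threshold, the `Drive` class, and the start times are all made up here, since the actual firmware details haven't been shared yet.

```python
from dataclasses import dataclass

# Hypothetical threshold; the real value for the affected model is not known here.
FAIL_AFTER_HOURS = 40_000


@dataclass
class Drive:
    name: str
    affected: bool        # True if the drive runs the buggy firmware
    started_at_hour: int  # hour at which the drive was first powered on

    def failed(self, now_hour: int) -> bool:
        """An affected drive dies once its power-on hours reach the threshold."""
        return self.affected and (now_hour - self.started_at_hour) >= FAIL_AFTER_HOURS


def failed_count(drives: list[Drive], now_hour: int) -> int:
    """Number of drives that have failed by now_hour."""
    return sum(d.failed(now_hour) for d in drives)


# 8 drives, 6 of them the affected model, all powered on together.
same_start = [Drive(f"d{i}", affected=i < 6, started_at_hour=0) for i in range(8)]
# Same drives, but each one powered on 1,000 hours after the previous one.
staggered = [Drive(f"d{i}", affected=i < 6, started_at_hour=i * 1000) for i in range(8)]

if __name__ == "__main__":
    for label, drives in (("same start", same_start), ("staggered", staggered)):
        before = failed_count(drives, FAIL_AFTER_HOURS - 1)
        after = failed_count(drives, FAIL_AFTER_HOURS)
        print(f"{label}: {before} failed one hour before the threshold, {after} at it")
```

With identical start times every affected drive crosses the threshold in the same hour, which is why ordinary drive redundancy doesn't help against this kind of bug.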
So, what I'm taking from this, is "Wait until tomorrow to edit."... Ok.
Edited by Malady on Jul 7th 2020 at 7:30:01 AM
Disambig Needed: Help with those issues! tvtropes.org/pmwiki/posts.php?discussion=13324299140A37493800&page=24#comment-576
Yes, yes it would. That would mean everything would be lost. Sure, there may have been backups even then, but still.
Edited by CustardAndPie on Jul 7th 2020 at 9:34:53 AM
Hey how you doing well I'm doing just fine I lied I'm dying inside
A belated Hell Yeah to the mod team for rising to this unexpected challenge. You all have done well in saving this website and all the hard work and love the tropers have poured into it.
I have just noticed that some page edits (including one about removal of inappropriate content) were reverted since the restoration.
135 - 169 - 273 - 191 - 188 - 230 - 300
6 out of 8... Man. Technology sure is something else.
A cruel, sick joke is still a joke, and sometimes all you can do is laugh.
Kudos to all of you for restoring the site the best you could!
But are you also working to restore the lost forum posts? I know a lot of RPs were affected, and the players are scrambling to recreate lost scenes already.
"Anemone dear, I know you want to be more independent from me, but... please take care, okay?"
My understanding is that we don't yet know for sure how much can be retrieved, but we're operating under the assumption that aside from some cached edit histories, none of it can.
I'm sorry, but forum posts fall under the same criteria. We have lost everything made after 6AM PST yesterday.
The crash could have been much worse - the last time catastrophic hardware failure struck the wiki, 12 years ago, three months of edits and forum posts were destroyed.
Not to sugar-coat the many edits and posts from yesterday that were lost in today's crash, but TV Tropes seems to have come a long way since then - based on nombretomado's statement, less than a day's worth of posts was confirmed incinerated.
Might be worth an Administrivia page?
Edited by Albert3105 on Jul 7th 2020 at 11:14:38 AM
Funnily enough, it happened a few hours after I made a post reciting "Peter Percival Patterson's Pet Pig Porky" on the Trash Heap. Needless to say, I think Porky popped so big it made the hard drives fail.
she/her/they | wall | sandbox
If you followed a page or forum before the crash, did that also get reverted? Or did I just mistakenly think I followed certain forum posts that I actually didn't?
I do some cleanup and then I enjoy shows you probably think are cringe.
Well damn, I’m in a lot of RPs that run at a steady pace and I lost some posts. Oh well.
Edited by TabbyGirl4 on Jul 7th 2020 at 11:22:12 AM
"I'm Mary Poppins, Y'all!" - Yondu, 2017
That's a shame, but who knows? We might still be able to find a way.
I'm in awe at how the catastrophic failure of 6 out of 8 drives (when the system was designed to allow for only 4 crashes) only managed to wipe out July 6, 2020's work and not more. Especially since in the 2008 crash, one broken drive nuked 3 months of work in one swoop.
Edited by Albert3105 on Jul 7th 2020 at 11:38:17 AM
That's the wonderful thing about daily backups.
Just glad the site is back. TV Tropes is, in my personal opinion, one of the mainstays of the internet; it going totally offline is too horrific to think about.