As many of you reported, TV Tropes went offline at 2:40am EST on July 7th. It was offline for over 14 hours, the worst outage in many years. We did receive email and text alerts when it happened, but unfortunately it was a major hardware failure which took quite a while to get under control.
The cause was a total failure of our database cluster. 6 of our 8 hard drives failed simultaneously, resulting in a complete loss of data. We had to have our server company replace the cluster, and then we had to rebuild the site from database backups. We do automatic backups every morning. Unfortunately, the failure happened hours before the next backup, so 24 hours of changes were lost.
To make it worse, the history of wiki changes is only backed up twice a week because it is over 1TB in size. We are working on restoring that now, so the history tab is blank on all pages until it's done. Editing will be offline for another 24 hours until we get that fixed. And it means we'll lose 72 hours of wiki history due to the timing of the last backup.
We will be working on optimizing our database structure so we can increase the frequency of our database backups to protect the data in the future.
We have redundant web servers on a load balancer, redundant database servers in a cluster, and redundant hard drives in every server. So how did this happen? According to our server company, there was a manufacturer's bug in the firmware of the specific drive model used for 6 of our 8 hard drives. That bug caused the disks to die after a certain number of hours running. We don't yet have all the details. They are reaching out to the manufacturer to get more information. I'll update here as I learn more.
UPDATE: (July 8th)
Editing is now enabled! History should be restored as of July 4th 10am EST. The history database imported faster than I expected (4 hours to decompress the 1.1TB SQL file, 12.5 hours to import).
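As a rough check on the numbers, here is a minimal sketch of the history that was lost, using the two timestamps given in this post. It assumes both times are EST and the year is 2020 (taken from the edit stamp below); the function and constant names are made up for illustration and are not part of the site's code.

```python
from datetime import datetime

# Timestamps quoted in the post. Both are EST, so naive datetimes are
# enough for a simple difference. The year 2020 is an assumption taken
# from the "Edited by" stamp.
OUTAGE_START = datetime(2020, 7, 7, 2, 40)           # site went offline
HISTORY_RESTORE_POINT = datetime(2020, 7, 4, 10, 0)  # history restored as of


def lost_window_hours(last_backup: datetime, failure: datetime) -> float:
    """Hours of changes made after the last backup and before the failure."""
    if failure < last_backup:
        raise ValueError("failure time must not precede the last backup")
    return (failure - last_backup).total_seconds() / 3600


history_lost = lost_window_hours(HISTORY_RESTORE_POINT, OUTAGE_START)
print(f"{history_lost:.1f} hours of wiki history lost")  # prints 64.7
```

That comes to a little under 65 hours, which is in line with the roughly 72 hours (three days) quoted above. In general the worst case equals the gap between backups, which is why more frequent backups shrink the window.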
The only thing I haven't done yet is purge the CDN cache. You must log out to view a cached page. Logged-in users get the live site. Not all pages are still cached, and cached pages do expire.
I'll hold off on purging the cache for a few more hours. If there is some specific edit you remember doing during those 24 hours that were lost, you may be able to find it by logging out and viewing the cached page. Then log in and make that edit again.
Edited by itcdr on Jul 8th 2020 at 7:12:45 AM
Can the admins try to restore this? https://tvtropes.org/pmwiki/pmwiki.php/Fanfic/GrimmFall
So can the edit history restoration restore the edits up to 7th July, or not?
We can never truly eradicate the coronavirus, but we can suppress its threat like influenza
According to the pinned comment, we've lost all history since July 4.
Bless you guuuuys, I feel like a trope veteran now.
I don't think that much eventful stuff happened over the last three days (I certainly didn't make any major edits), but it's good to hear they'll likely optimise this. I don't know if there's much that can be done since most history seems like plaintext, but they could purge pre-2010/2011/2012 history or automatically search for pages with 1500+ edits and purge their history, since it's unlikely anyone would want to load that much and search for edits anyway.
Edited by Piterpicher on Jul 8th 2020 at 2:26:05 PM
Currently mostly inactive. An incremental game I tested: https://galaxy.click/play/176 (Gods of Incremental)
I'm so glad things are mostly back to normal, I was among those really worried.
Long live TV tropes.
So basically, we'll be getting the videos that were uploaded on July 6 back soon?
Everything that can be restored has been restored as far as the wiki is concerned, except for the article history which should be back soon.
If it's not here now, it's gone forever and you'll have to redo it.
"It's Occam's Shuriken! If the answer is elusive, never rule out ninjas!"I'm here to give my late kudos to all the restoration efforts. It must have been a really busy day yesterday.
Unfortunately, I don't remember what edits (if any) I made on the 6th. Perhaps it's Laser-Guided Amnesia at its finest...
Are mods able to edit or is it down for you guys too?
The Protomen enhanced my life.
Hats off to you steely-eyed missile men!
I tried to go to the edit history, except now all the edits are erased? What should I do now?
"Now and forever, I'll show them I care!"Editing is now enabled! History should be restored as of July 4th 10am EST. The history database imported faster than I expected (4 hours to decompress 1.1TB sql file, 12.5 hours to import)
Let me know if you see any new bugs
Edited by itcdr on Jul 8th 2020 at 7:05:10 AM
The admins are working to restore that. I’ve seen people say they’ve been able to see page histories by logging out; haven’t tried it myself.
Edited by jandn2014 on Jul 8th 2020 at 10:07:05 AM
back lol
Hallelujah!
...That is an admin.
Edited by Crossover-Enthusiast on Jul 8th 2020 at 10:07:21 AM
Jawbreakers on sale for 99¢
Glad to see that editing is back up
I was typing my post before said admin posted. I was trying to reply to the post above them; I’ve fixed my own post since then.
Edited by jandn2014 on Jul 8th 2020 at 10:10:50 AM
back lol
The only thing I haven't done yet is purge the CDN cache. You must log out to view a cached page. Logged-in users get the live site. Not all pages are still cached, and cached pages do expire.
I'll hold off on purging the cache for a few more hours. If there is some specific edit you remember doing during those 24 hours that were lost, you may be able to find it by logging out and viewing the cached page. Then log in and make that edit again.
I wish I could post a list of pages that were changed in those 24 hours but that information was lost in the crash.
Thank you very much, again. I still think you should take into consideration deleting page history that's extremely old like pre-2012 or for pages that have 1500+ edits (I likely would have posted about it a long time ago, but I was lazy), but still, this is fast service for users and tons of hard work.
Edited by Piterpicher on Jul 8th 2020 at 4:18:22 PM
Currently mostly inactive. An incremental game I tested: https://galaxy.click/play/176 (Gods of Incremental)
Yuck! 75% disk loss is a worst-case scenario...and very suspicious. Does your hosting company keep up with firmware, driver, and OS updates? The only time I've ever had a system fail in such a way was due to very old firmware on the drives causing the RAID controller to not be able to properly handle a single drive failure (as designed), causing multiple drives to fail and wiping out the entire array. It was a hard-learned lesson early in my career.
All this caused by one programming mistake the staff could never have known about or had control over. I'm just interested to know what the dodgy firmware did to the hard drives to cause them to crash. (Mainly for Idiot Programming purposes; something TV Tropes itself fell victim to seems perfect for that page.)
(Edited out something because I fell victim to the same thing N8tureGrace did.)
Edited by nm3youtube on Jul 8th 2020 at 3:24:32 PM
What we've been told is that the drives bricked after a certain amount of time in operation, indicating that there may have been a date/time-related bug in the firmware. Updating the firmware on a hard drive is a tricky business.
"It's Occam's Shuriken! If the answer is elusive, never rule out ninjas!"Ok, what about some of the Video Examples uploaded days ago as in, 2 days ago?
Edited by Agent2583 on Jul 8th 2020 at 3:42:31 PM
Editing is back?!
https://www.youtube.com/watch?v=zCPOo3bQIO8
Content Warning: My posts may involve my actions dealing with R-rated or Not Safe for Work content. Same for my edit history.
Perhaps we need another Kickstarter to buy new hardware?
I can wait until editing is up again. Just having the site up and running is good enough for me.
(also wow, 7 pages in less than 12 hours?! And I opened the 8th while I was composing?!? That's why I love these forums.)
Edited by AndyLA on Jul 8th 2020 at 7:18:31 AM
One Nation Under WiFi