As many of you reported, TV Tropes went offline at 2:40am EST on July 7th. It was offline for over 14 hours, the worst outage in many years. We did receive email and text alerts when it happened, but unfortunately it was a major hardware failure which took quite a while to get under control.
The cause was a total failure of our database cluster. 6 of our 8 hard drives failed simultaneously, resulting in a complete loss of data. We had to have our server company replace the cluster and then we had to rebuild the site from database backups. We do automatic backups every morning. Unfortunately the failure happened hours before the next backup, so 24 hours of changes were lost.
To make it worse, the history of wiki changes is only backed up twice a week because it is over 1TB in size. We are working on restoring that now, so the history tab is blank on all pages until it's done. Editing will be offline for another 24 hours until we get that fixed. And it means we'll lose 72 hours of wiki history due to the timing of the last backup.
We will be working on optimizing our database structure so we can increase the frequency of our database backups to protect the data in the future.
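As a rough illustration of why backup frequency matters: the most that can be lost in a failure is whatever changed since the last good backup, so the worst case is about one full backup interval. The sketch below assumes backups are evenly spaced through the week, which is an assumption (the exact schedule isn't given here), and the function name is made up for the example.

```python
HOURS_PER_WEEK = 7 * 24


def worst_case_loss_hours(backups_per_week: int) -> float:
    """Longest possible gap between a failure and the previous backup.

    Assumes backups are evenly spaced across the week (an assumption
    made for illustration, not the site's actual schedule).
    """
    if backups_per_week <= 0:
        raise ValueError("need at least one backup per week")
    return HOURS_PER_WEEK / backups_per_week


if __name__ == "__main__":
    # Compare a few schedules; "hourly" is a hypothetical target.
    for label, per_week in [("daily", 7), ("twice a week", 2), ("hourly", 7 * 24)]:
        hours = worst_case_loss_hours(per_week)
        print(f"{label}: up to {hours:.1f} hours of changes lost")
```

How much is actually lost depends on where in the interval the failure lands. This time it landed shortly before the next daily backup, which is close to the worst case.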
We have redundant web servers on a load balancer, redundant database servers in a cluster, and redundant hard drives in every server. So how did this happen? According to our server company, there was a manufacturer's bug in the firmware of the specific drive model that 6 of our 8 hard drives were. That bug caused the disks to die after a certain number of hours running. We don't yet have all the details. They are reaching out to the manufacturer to get more information. I'll update here as I learn more.
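To show why the redundancy didn't save us, here is a small sketch: if a firmware bug kills a drive after a fixed number of running hours, drives of the same model that were powered on together all reach that limit together. The hour limit and power-on times below are invented placeholders (the real figures aren't known yet), and the function names are hypothetical.

```python
from datetime import datetime, timedelta

# Placeholder value, NOT the real firmware limit (that figure is not known here).
HYPOTHETICAL_LIMIT_HOURS = 10_000


def predicted_failure(powered_on: datetime, limit_hours: int) -> datetime:
    """Time at which a drive reaches the running-hours limit."""
    return powered_on + timedelta(hours=limit_hours)


def failure_spread(power_on_times, limit_hours: int) -> timedelta:
    """Gap between the first and last predicted failure in a set of drives."""
    failures = [predicted_failure(t, limit_hours) for t in power_on_times]
    return max(failures) - min(failures)


if __name__ == "__main__":
    # Made-up install time: six drives powered on a minute apart.
    install = datetime(2019, 1, 1, 12, 0)
    affected = [install + timedelta(minutes=i) for i in range(6)]
    spread = failure_spread(affected, HYPOTHETICAL_LIMIT_HOURS)
    print(f"All six affected drives fail within {spread} of each other")
```

Redundant drives protect against one or two disks dying at random times. They don't help when most of the disks share the same countdown.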
UPDATE: (July 8th)
Editing is now enabled! History should be restored as of July 4th 10am EST. The history database imported faster than I expected (4 hours to decompress the 1.1TB SQL file, 12.5 hours to import).
The only thing I haven't done yet is purge the CDN cache. You must log out to view a cached page. Logged-in users get the live site. Not all pages are still cached, and cached pages do expire.
I'll hold off on purging the cache for a few more hours. If there is some specific edit you remember doing during those 24 hours that were lost, you may be able to find it by logging out and viewing the cached page. Then log in and make that edit again.
Edited by itcdr on Jul 8th 2020 at 7:12:45 AM
I was expecting near total deletion of everything, glad that's not the case!
New theme music also a box
Hey, thanks for the update and I hope things work out. Dude, 6 out of 8 HDDs failed. The poor site was like a B-17 after raiding Berlin......
Edited by TairaMai on Jul 7th 2020 at 6:08:54 PM
All night at the computer, cuz people ain't that great. I keep to myself so I won't be on The First 48
It could have been 8 of 8, it could have been much, much worse. Trust me, I've been on websites where EVERYTHING was lost.
New theme music also a box
I was looking everywhere for a thread to talk about the website going offline!
When I couldn't log back in or restart my password, I was worried that I would have to start from scratch and make up a new handle.
Hope everything works out!
Glad we’re back! Better to just have a little bit lost rather than the alternative
Thanks for keeping us posted. Hopefully the worst of the troubles are over with 💚
oh, that's why I need this binary mind // ⌘
It sounds like you've all had a very tough time putting it back together.
I think you'll all need a cup of tea and a posh biscuit after this.
If my post doesn't mention a giant flying sperm whale with oversized teeth and lionfish fins for flippers, it just isn't worth reading.
I only lost 1 avatar, which I uploaded yesterday. I am happy to see things are mostly back to normal. Shame about lost posts in threads, but better that than lose it all.
It was so scary to see TV Tropes devoid of anything for a brief time.
Still, our gratitude for the comeback is immeasurable and I hope it wasn't hell for you guys. Kick ass! And take a break, you guys more than deserve a break.
Edited by Dhiruxide on Jul 7th 2020 at 1:19:16 PM
I was hoping to find a thread on TV Tropes to discuss the crash...except that I wouldn't be able to because the site that crashed was TV Tropes.
I was scared that everything had been wiped, but it's good to know that we could recover the site.
Glad we're back! I was scared we'd lose 12 years worth of history!
Currently Working On: Incorruptible Pure Pureness
Good work getting it back up this fast, you guys. From the sound of it, this could have taken a few days to fix.
Edited by TheLovecraftian on Jul 7th 2020 at 8:18:23 AM
Noticed that a video that I submitted which was showing as "approved" yesterday is now back in "awaiting review" status, but I'll gladly take that over the near-total loss of everything that people were hearing. Glad for the site being back up to view again!
I’m surprised that this much of TV Tropes was properly archived, I was kinda expecting a Pantheon Continuity Reboot.
As bummed as I am that I lost some detailed entries and my first go at image pickin', I'm so grateful the staff pulled it together when things seemed dire.
So no editing for 24 hours, what about forums and TLP that had work reverted yesterday? Are those to be left alone too?
I do some cleanup and then I enjoy shows you probably think are cringe.
Glad we only lost 24 hours as opposed to the months we lost with The Great Crash.
Keet cleanup
So those 24 hours of changes mentioned in the second paragraph are definitely gone for good? Or is there a possibility that they can be retrieved? Just wondering if I'll need to redo them once editing is back.
Thank fuck this wasn't the Great Crash all over again.
Yeah, could've been a lot worse, so good job on putting things back together so fast.
Not a bad idea for those on Facebook to Like the TV Tropes page. It's an info channel if this site is down.
This freaked me out, good to see that the loss wasn’t too bad.
back lol
Wasn't able to get on at all due to the database server being down. I'm glad this place is back, and that they're doing all they can to get things back to normal.
As far as it's understood, all activity from pre-6AM Pacific Standard Time yesterday (7/6) can be retrieved. Anything made since then is lost. That's certainly the case with edits, I believe it's the same case for forums, TLP, etc.
Edited by nombretomado on Jul 7th 2020 at 4:30:34 AM
VERY glad and relieved to see the site is back. I was fearing the worst there for a little while, so glad the loss is relatively minor.
so glad that everything got solved, for the most part
however, this got me thinking: we should have a way to archive any tv tropes stuff on another platform, say something like this happens again
and the public won't dwell on my transmission cause it wasn't televised.
Good to hear it's all mostly back. (Of all the days to make my first article.)
Say, is there a reason the page source view is limited to logged-in users? Back when some pages of the site were sporadically available, I did try to save some newly created, at-risk content, only to find out logins were broken too.
(Though I assume content that was available during the crash is content that sat on either of the surviving two drives, in which case the question becomes relevant whether this surviving content could be merged into the backup we reverted to.)
Edited by LupoCani on Jul 7th 2020 at 10:34:07 AM