As many of you reported, TV Tropes went offline at 2:40am EST on July 7th. It was offline for over 14 hours, the worst outage in many years. We did receive email and text alerts when it happened, but unfortunately it was a major hardware failure, which took quite a while to get under control.
The cause was a total failure of our database cluster. 6 of our 8 hard drives failed simultaneously, resulting in a complete loss of data. We had to have our server company replace the cluster, and then we had to rebuild the site from database backups. We do automatic backups every morning. Unfortunately, the failure happened hours before the next backup, so 24 hours of changes were lost.
To make it worse, the history of wiki changes is only backed up twice a week because it is over 1TB in size. We are working on restoring that now, so the history tab is blank on all pages until it's done. Editing will be offline for another 24 hours until we get that fixed. And it means we'll lose 72 hours of wiki history due to the timing of the last backup.
We will be working on optimizing our database structure so we can increase the frequency of our database backups to protect the data in the future.
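To illustrate why backup frequency matters here, below is a minimal sketch of the worst-case loss window for a given schedule. It assumes backups are evenly spaced, which the post does not state, and the function name is made up for the example.

```python
HOURS_PER_WEEK = 7 * 24


def worst_case_loss_hours(backups_per_week: float) -> float:
    """Longest gap between two backups, assuming they are evenly spaced.

    Anything changed right after one backup is unprotected until the
    next one, so this gap is the most data a failure can cost.
    """
    return HOURS_PER_WEEK / backups_per_week


# Schedules mentioned in the post (even spacing is an assumption).
daily = worst_case_loss_hours(7)         # main database: every morning
twice_weekly = worst_case_loss_hours(2)  # wiki history: twice a week

# A hypothetical hourly schedule, only for comparison.
hourly = worst_case_loss_hours(7 * 24)

print(f"daily: up to {daily:.0f} h lost")
print(f"twice weekly: up to {twice_weekly:.0f} h lost")
print(f"hourly: up to {hourly:.0f} h lost")
```

Under that assumption, a twice-weekly backup can lose up to three and a half days of history, which is why more frequent backups shrink the exposure directly.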
We have redundant web servers on a load balancer, redundant database servers in a cluster, and redundant hard drives in every server. So how did this happen? According to our server company, there was a manufacturer's bug in the firmware of the specific drive model that 6 of our 8 hard drives were. That bug caused the disks to die after a certain number of hours running. We don't yet have all the details. They are reaching out to the manufacturer to get more information. I'll update here as I learn more.
UPDATE: (July 8th)
Editing is now enabled! History should be restored as of July 4th 10am EST. The history database imported faster than I expected (4 hours to decompress a 1.1TB SQL file, 12.5 hours to import).
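For a rough sense of those numbers, here is a small worked calculation. It assumes the year is 2020 (from the edit timestamps in this thread), that 1.1TB is the decompressed size in decimal terabytes, and that both times are in the same time zone. The variable and function names are made up for the example.

```python
from datetime import datetime

# Timestamps taken from the posts above; the year is an assumption.
failure = datetime(2020, 7, 7, 2, 40)               # outage start, EST
history_restore_point = datetime(2020, 7, 4, 10, 0)  # history restored as of, EST

# Length of the gap in wiki history, in hours.
history_gap_hours = (failure - history_restore_point).total_seconds() / 3600

SQL_FILE_BYTES = 1.1e12  # assumption: decimal TB, size after decompression


def throughput_mb_per_s(total_bytes: float, hours: float) -> float:
    """Average throughput in megabytes per second over the given duration."""
    return total_bytes / (hours * 3600) / 1e6


decompress_rate = throughput_mb_per_s(SQL_FILE_BYTES, 4)
import_rate = throughput_mb_per_s(SQL_FILE_BYTES, 12.5)

print(f"history gap: {history_gap_hours:.2f} h")
print(f"decompress: {decompress_rate:.1f} MB/s")
print(f"import: {import_rate:.1f} MB/s")
```

On these assumptions the history gap comes out somewhat under the 72 hours estimated earlier, and the import averages roughly a third of the decompression rate.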
The only thing I haven't done yet is purge the CDN cache. You must log out to view a cached page; logged-in users get the live site. Not all pages are still cached, and cached pages do expire.
I'll hold off on purging the cache for a few more hours. If there is some specific edit you remember doing during those 24 hours that were lost, you may be able to find it by logging out and viewing the cached page. Then log in and make that edit again.
Edited by itcdr on Jul 8th 2020 at 7:12:45 AM
While the site was down for most of the day, I was naively not as concerned as I probably should have been, knowing now what happened. I'll admit I was antsy without the site running, and I give full respect to the team for getting it back up so quickly with so little lost. Oddly enough, I hadn't made any edits or forum posts in the time that was borked, so I'm lucky I didn't lose anything I did. It's scary that this happened, but it could have been so much worse.
Two questions:
1. How are images uploaded yesterday still functional?
2. Have you considered talking to expert data recoverers? I know there are people who can basically work miracles.
I think only some images were lost, not all? I've seen people say they still have their images, but both times I searched for some Paper Mario 64 images I uploaded a few hours before the crash and came up with nothing.
Jawbreakers on sale for 99¢
I don't know about the image database. I suspect that any images you're seeing from yesterday are cached by the CDN and are no longer actually on our servers.
As for data recovery, that costs a lot of money and can take weeks. If we do that just to retrieve 24-48 hours of data, it would not be worth it.
"It's Occam's Shuriken! If the answer is elusive, never rule out ninjas!"
Well, FWIW, I'm sure there are a lot of people who would be willing to help pay for it. I certainly am.
Ah good, the New Video Examples are back!
So what is the CDN cache and how do I access it?
We can never truly eradicate the coronavirus, but we can suppress its threat like influenza
Save your money for something more necessary, IMXO. And from what I can see, you don't have the ad-free pass; you could help the site by buying that.
Edited by Piterpicher on Jul 8th 2020 at 6:04:16 PM
Currently mostly inactive. An incremental game I tested: https://galaxy.click/play/176 (Gods of Incremental)
You would have to log out of your TV Tropes account. You will see the cached version of the site, including the history pages. You may copy any edits you want to make from that history, then re-do them in the live site after logging in.
Note that we will flush the cache very soon, so there is a narrow window when this is possible.
"It's Occam's Shuriken! If the answer is elusive, never rule out ninjas!"
"You will see the cached version of the site, including the history pages."
After I log out, where do I see the cached version of the site? How do I access it?
We can never truly eradicate the coronavirus, but we can suppress its threat like influenza
So what are we gonna call this event? None of my ideas are G-rated.
my brain is a computer with 4k of ram. this is a jokes wiki
I just made like 3 huge edits on Indivisible, and all the tropes I had added there were also crosswicked onto the trope pages themselves. It's just ridiculous that I make my biggest edit since joining this site, clear it off my computer, and then this happens. How do I access the cache?
Edited by CaptainJJC on Jul 8th 2020 at 12:15:03 PM
Excuse me, FGO is missing its Heartwarming, Recap, and Ensemble Dark Horse pages altogether; the links in-site point to empty pages. Any idea how this happened? Is there any way to double check pages that were similarly puzzlingly lost in this fashion?
For anyone asking about caches, what I did was log out, close the site in my browser, Google (or Yahoo) the name of the page I wanted (I did "tv tropes [insert page name here]"), then click on it when it showed up in the search results.
Edited by Albert3105 on Jul 8th 2020 at 12:19:47 PM
https://tvtropes.org/pmwiki/posts.php?discussion=15942220080A15257500
Just a little something I thought might help people deal with what happened. I'm already getting over it (or trying to) but losing hours of work isn't a walk in the park, especially if you usually don't tend to edit a lot and just happened to do so on the day of the outage.
I don't want to draw attention away from the fact that losing one day instead of weeks or months really was the best possible outcome, but it's still natural to feel sad.
"I think if you're capable of entertaining people, then you are doing a good thing." - Stan Lee
Thanks, I'll try that out. Hopefully I can easily recover stuff from the cache, since I recall editing a lot before the site crashed.
BTW, how many hours left until the cache gets updated? Is it an automated process by Google?
Edited by DanteVin on Jul 9th 2020 at 12:31:52 AM
With Great Power, Comes Great Motivation
@Piterpicher: You know, I think I'll do that. Thanks for the suggestion.
I've updated the bulletin so that it refers to the cache technique mentioned by Fighteer above.
"For a successful technology, reality must take precedence over public relations, for Nature cannot be fooled." - Richard Feynman
How else can we give financial support, by the way?
I especially cannot thank you enough for managing to recover my LEGO Rewind pages! While it currently appears as though the page markup will have to be re-written from the ground up, the fact that it was preserved at all is a miracle in my book.
Edited by Noah1 on Jul 8th 2020 at 12:42:38 PM
An open mind and compassionate heart are among the most important qualities we can have.
This issue is not related to, caused by, nor exacerbated by TV Tropes' financial situation. Anyone is welcome to subscribe to ad-free if they want, and the revenue will help us keep things running and hopefully get back to active development on the site.
I'm not aware of any other donation campaigns that are currently open. The admins may do something like that again in the future if they feel it's appropriate.
Edited by Fighteer on Jul 8th 2020 at 12:44:10 PM
"It's Occam's Shuriken! If the answer is elusive, never rule out ninjas!"
You're welcome, Noah.
@Fighteer: I'm aware that more money wouldn't have prevented this problem, but even leaving aside the expense of potential data recovery, I think monetary contributions might help with the recovery process. At the very least, it would be a firm show of support for the wiki.
I discussed this with my dad while we were walking, and we came to the conclusion that an uptime integer must have overflowed and either gone negative or caused a data conflict.
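To show what that kind of overflow would look like, here is a minimal sketch. It assumes the firmware keeps power-on hours in a signed 16-bit counter; that width is purely an assumption for illustration, since the actual counter size and failure threshold haven't been shared, and the function name is made up.

```python
def as_int16(value: int) -> int:
    """Wrap an integer into the signed 16-bit range (-32768..32767),
    the way a two's-complement counter of that width would."""
    return (value + 2**15) % 2**16 - 2**15


# Hypothetical power-on-hours counter stored as a signed 16-bit integer.
last_good = as_int16(32767)  # largest value that still fits
wrapped = as_int16(32768)    # one hour later the stored value goes negative

# How long a drive would run before reaching that hour count.
years_until_wrap = 32768 / (24 * 365)

print(last_good, wrapped)
print(f"about {years_until_wrap:.2f} years of continuous running")
```

If drives of the same model were all powered on at the same time, a counter like this would wrap on all of them within the same hour, which would fit with several disks failing at once.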
Edited by nm3youtube on Jul 8th 2020 at 5:53:59 PM
I don't know what happens to video examples in that situation. If they were uploaded to our Vimeo account, they should still be there, but the linking data may be lost from the TV Tropes end.
Editing is back.
The hardware is owned by the hosting company and was replaced at no charge to us. I'm fairly sure that we can't fix this problem by throwing money at it.
Edited by Fighteer on Jul 8th 2020 at 11:14:33 AM
"It's Occam's Shuriken! If the answer is elusive, never rule out ninjas!"