Converting all data to UTF8 (now complete)

Hey tropers, we are about to undergo a large update to the TVTropes database to convert its character encoding. During this time the login system will be offline. You'll still be able to read TVTropes, but you won't be able to post or edit anything during the conversion.

How long will the login system be down?

My most recent test run took 9 hours (my first test took 20+ hours, so this is after a lot of optimizations). Update: it will now take 11 hours; see the update below.

Can you give me more details?

TVTropes was originally hosted on Windows servers back in the day, and all content on the site is encoded in Windows-1252 (a superset of ISO-8859-1, aka latin1).
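A minimal Python sketch of what converting a single stored value involves. It's an illustration only, not the migration script itself (which isn't shown in this thread), and the function name cp1252_to_utf8 is hypothetical. It also shows where Windows-1252 differs from ISO-8859-1: the 0x80-0x9F range, which holds curly quotes and similar punctuation in Windows-1252 but control characters in ISO-8859-1.

```python
def cp1252_to_utf8(raw: bytes) -> bytes:
    """Hypothetical helper: decode Windows-1252 bytes, re-encode the text as UTF-8."""
    text = raw.decode("cp1252")  # interpret the stored bytes as Windows-1252
    return text.encode("utf-8")  # emit the same characters as UTF-8 bytes

# "é" is the single byte 0xE9 in Windows-1252 but two bytes (C3 A9) in UTF-8.
stored = "café".encode("cp1252")
print(stored)                  # b'caf\xe9'
print(cp1252_to_utf8(stored))  # b'caf\xc3\xa9'

# 0x93/0x94 are curly quotes in Windows-1252 but C1 control codes in ISO-8859-1.
smart = b"\x93quoted\x94"
print(smart.decode("cp1252"))         # “quoted”
print(repr(smart.decode("latin-1")))  # '\x93quoted\x94'
```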

According to W3Techs, only 1.3% of websites use ISO-8859-1. I'm guessing a large share of that is TVTropes, considering we have millions of pages served with that charset.

The majority of the web is encoded in UTF8. This conversion will bring us up to modern standards with the rest of the web and make it easier to support other languages and icons. It will also let us use more modern editing tools, such as adding a WYSIWYG editor option. And it will help with code development, as we often have to add special workarounds to keep supporting this long-deprecated charset.

Are we changing anything else?

While the login system is offline, we are also going to upgrade the edit history database table to include a sequence number. We've always wanted to do this, but it couldn't be done while the site was online, since that table has 40M+ rows of data. With this change, you'll be able to jump straight to any page when filtering edits by a specific page or a specific user. We already have this for the forum, which is why you can jump to, for example, page 500 of a long-running thread without a long delay.
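To illustrate why a sequence number helps, here is a small Python/SQLite sketch. The schema, the per-user seq column, and the page size are all hypothetical, since the thread doesn't describe the real table. An OFFSET query has to step over every earlier row before it returns a page, while a sequence number lets page N be computed as a range that an index can seek to directly. The sketch assumes sequence numbers have no gaps; deleted or hidden edits would need extra handling.

```python
import sqlite3

PAGE_SIZE = 3  # hypothetical page size, kept small for the demo

conn = sqlite3.connect(":memory:")
# Hypothetical schema: each edit row carries a per-user sequence number (seq).
conn.execute("CREATE TABLE edits (id INTEGER PRIMARY KEY, user TEXT, seq INTEGER)")
conn.execute("CREATE INDEX idx_user_seq ON edits (user, seq)")
for i in range(1, 11):
    conn.execute("INSERT INTO edits (user, seq) VALUES (?, ?)", ("alice", i))

def page_with_offset(user, page):
    # OFFSET: the database walks past every earlier row before returning results.
    rows = conn.execute(
        "SELECT seq FROM edits WHERE user = ? ORDER BY seq LIMIT ? OFFSET ?",
        (user, PAGE_SIZE, (page - 1) * PAGE_SIZE))
    return [r[0] for r in rows]

def page_with_seq(user, page):
    # Sequence number: page N maps directly to a seq range the index can seek to.
    lo = (page - 1) * PAGE_SIZE + 1
    hi = page * PAGE_SIZE
    rows = conn.execute(
        "SELECT seq FROM edits WHERE user = ? AND seq BETWEEN ? AND ? ORDER BY seq",
        (user, lo, hi))
    return [r[0] for r in rows]

print(page_with_offset("alice", 3))  # [7, 8, 9]
print(page_with_seq("alice", 3))     # [7, 8, 9]
```

Both queries return the same page; the difference is how much work the database does to find it as the table grows.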

When will this happen?

UPDATE: (take two) This is now scheduled for Tue Dec 5th at 8:30PM PST until Wed Dec 6th at 7:30AM PST.

At 8:30PM PST you will be logged off of TVTropes, so make sure to save anything before then. Once the conversion is done the next morning, I'll point the code to the new server and everyone should be logged back in automatically. If not, go ahead and try logging in again once you see the all-clear announcement at the top of the page.

The process will take approximately 11 hours. I had it down to 9 hours until I found examples of UTF8-encoded values inside latin1 columns, so I had to add some extra checks to make sure that data doesn't get double-encoded in the process.
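One way to guard against double encoding, sketched in Python under a stated assumption (this is not the actual check used in the migration): try decoding each stored value as UTF-8 first, and only fall back to Windows-1252 if that fails. Accented Windows-1252 text almost never happens to form valid UTF-8, but it can in rare cases (for example, text that literally contains "Ã©"), so treat this as a heuristic. The function name to_utf8_text is hypothetical.

```python
def to_utf8_text(raw: bytes) -> str:
    """Hypothetical helper: decode a latin1-column value without double encoding.

    Heuristic: bytes that are valid UTF-8 are assumed to already be UTF-8;
    anything else is treated as legacy Windows-1252.
    """
    try:
        return raw.decode("utf-8")   # already UTF-8 (pure ASCII also lands here)
    except UnicodeDecodeError:
        return raw.decode("cp1252")  # genuine legacy Windows-1252 value

legacy = "Thérèse".encode("cp1252")       # b'Th\xe9r\xe8se'
already_utf8 = "Thérèse".encode("utf-8")  # b'Th\xc3\xa9r\xc3\xa8se'

print(to_utf8_text(legacy))        # Thérèse
print(to_utf8_text(already_utf8))  # Thérèse

# The bug being guarded against: blindly treating UTF-8 bytes as Windows-1252.
print(already_utf8.decode("cp1252"))  # ThÃ©rÃ¨se
```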

If you have any issues during the migration, please send an email to thestaff@tvtropes.org.

UPDATE: (complete) The migration is officially complete. It ended up taking nearly 12 hours, but we got there. All data on TVTropes (hundreds of millions of rows) has now been converted from latin1 to UTF8. We now use the same encoding as 99% of other websites and can support special characters and other languages. It will also allow us to build other tools, such as a WYSIWYG editor.

Edited by itcdr on Dec 6th 2023 at 8:48:40 AM

itcdr Since: Aug, 2014
#1: Dec 1st 2023 at 12:53:01 PM

Zuxtron Berserk Button: misusing Nightmare Fuel from Node 03 (On A Trope Odyssey)
#2: Dec 1st 2023 at 1:00:16 PM

So this is going to fix the various bugs surrounding accented characters?

itcdr Since: Aug, 2014
#4: Dec 1st 2023 at 1:11:01 PM

@Zuxtron, yes, it should make it easy for us to work with accented characters, characters in many other languages, and even things like iPhone emoji if we wanted to support those. In most areas of the site we try to convert things like accented characters to HTML entities (e.g. &eacute;) because of the issues they cause in the old Windows encoding. After this migration we won't have to, and you'll be able to just type the accented character on your keyboard and have it stay that way.
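For illustration, here is how an HTML entity like &eacute; relates to the literal character, using Python's standard html module (this is not the site's own code). Once text is stored as UTF8, the literal é can be saved directly instead of the entity.

```python
import html

# Text saved under the old setup, with accents stored as HTML entities.
stored = "Pok&eacute;mon and Am&eacute;lie"
print(html.unescape(stored))  # Pokémon and Amélie

# In UTF8 the character itself is stored: é becomes the two bytes C3 A9.
print("Pokémon".encode("utf-8"))  # b'Pok\xc3\xa9mon'
```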

skewview Since: Jun, 2013
#5: Dec 1st 2023 at 1:45:08 PM

Oh yes! Thank you very much!

Please note: I didn't want to annoy anyone with wishlist/bug reports on this issue, so if I did, I'm sorry!

AFK with issues, will return
ilovewildkratts1 from a peaceful and quiet meadow (Experienced Trainee) Relationship Status: Yes, I'm alone, but I'm alone and free
#6: Dec 1st 2023 at 1:48:32 PM

Just want to clarify: after this update, if I go to a troper's edit history and click on an edit, will it take me right to the edit they made versus just the trope's/work's history page?

Because if so, that would be a very helpful feature, especially for pages that accrue a lot of edits.

Kaito is an alien and he is kinda spacey, coming from the universe to party and go crazy!
miraculous Goku Black (Apprentice)
#7: Dec 1st 2023 at 1:50:14 PM

So, to clarify: will our accounts not be able to post during this time?

"That's right mortal. By channeling my divine rage into power, I have forged a new instrument in which to destroy you."
Yinyang107 from the True North (Decatroper) Relationship Status: Tongue-tied
#8: Dec 1st 2023 at 1:52:36 PM

As stated in that first post, no posting or editing will be possible and we won't be able to log in.

kory Admin from a universe without doors (The New Guy) Relationship Status: watch?v=dQw4w9WgXcQ
#10: Dec 1st 2023 at 1:56:17 PM

ilovewildkratts1, I just added that to the list of Upcoming Editing Improvements.

Now monitoring Wishlist and Bugs
JHD0919 One-Track Mind (he/him) from a 12-pack of Diet Coke (Troper in training) Relationship Status: Abstaining
#11: Dec 1st 2023 at 1:56:24 PM

Didn't know this site wasn't already in UTF8. This is good news!

Edited by JHD0919 on Dec 1st 2023 at 4:57:06 AM

This is Idol Tap. (My Troper Wall)
gjjones Musician/Composer from South Wales, New York Since: Jul, 2016
#12: Dec 1st 2023 at 2:04:23 PM

This is really good news, indeed. I'm also looking forward to the data conversion.

Edited by gjjones on Dec 1st 2023 at 5:09:30 AM

He/His/Him. No matter who you are, always Be Yourself.
Noaqiyeum Trans Siberian Anarchestra (it/they) from the gentle and welcoming dark (Time Abyss) Relationship Status: Arm chopping is not a love language!
#13: Dec 1st 2023 at 2:10:07 PM

Should we expect any special complications logging in afterwards?

The Revolution Will Not Be Tropeable
rmctagg09 The Wanderer from Brooklyn, NY (USA) (Time Abyss) Relationship Status: I won't say I'm in love
#14: Dec 1st 2023 at 2:11:49 PM

Does this have a chance of breaking any current URLs once complete?

Eating a Vanilluxe will give you frostbite.
itcdr Since: Aug, 2014
#15: Dec 1st 2023 at 2:15:08 PM

@rmctagg09, I am copying all data to a new server to do this conversion, for safety, so if anything goes wrong we can always switch back to the old server. If everything goes as planned, all characters should convert properly and no one should notice any difference when reading the site.

Amonimus the Retromancer from <<|Wiki Talk|>> (Sergeant) Relationship Status: In another castle
#16: Dec 1st 2023 at 2:16:53 PM

And all current wikiwords are forced to Latin characters anyway, so I don't think it can affect existing links.

TroperWall / WikiMagic Cleanup
ChillyBeanBAM KIRBY CAR from Ontario, Canada Since: Jan, 2020
#17: Dec 1st 2023 at 2:20:42 PM

Good to know, thanks! [tup]

he/him
DopamineMess-14-qq Since: Jan, 2023 Relationship Status: Owner of a lonely heart
#20: Dec 1st 2023 at 5:43:41 PM

I'd like to be sure, and I don't want to cause any trouble, but last year I somehow started working on the Russian translation of TV Tropes right on the site, and I guess we all know that translations here are somewhat problematic. It seems like your upgrade will make things easier. What can I, as a user of the Cyrillic alphabet, expect? For example, will pages' URLs still be in English? Also, when I edit pages in Russian, almost all characters turn into HTML codes (honestly, not a really big problem thanks to Page Source); will that be fixed too? Will there be any changes to namespaces? And does all of this apply to other languages as well? (If my memory serves me right, Chinese and Japanese had trouble too, and unlike me they could not even start their translations.)
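Cyrillic letters have no slot in Windows-1252 at all, which is a likely reason they end up as HTML codes. The Python sketch below shows one common fallback, numeric character references via the standard xmlcharrefreplace error handler; whether the site uses exactly this mechanism is an assumption. UTF8 can store the same letters directly.

```python
text = "Привет"  # "Hello" in Russian

try:
    text.encode("cp1252")
except UnicodeEncodeError as err:
    print("Not representable in Windows-1252:", err.reason)

# Fallback: each letter becomes a numeric character reference.
print(text.encode("cp1252", errors="xmlcharrefreplace").decode("ascii"))
# &#1055;&#1088;&#1080;&#1074;&#1077;&#1090;

# In UTF-8 every letter fits directly (two bytes each for these Cyrillic letters).
print(len(text.encode("utf-8")))  # 12
```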

Cutegirl920fire CG for short from NYC apparently (Rule of Three) Relationship Status: Paris holds the key to my heart
#21: Dec 1st 2023 at 7:04:19 PM

So since the update will fix accented characters, would this include accented characters in links?

Take this link to the Wikipedia article on Marie-Thérèse Charlotte, for example. Unless you replace the accented characters with non-accented ones and hope the link still works afterwards (it's not guaranteed), the link won't work because of the accented characters.
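Part of why accented links can break is that the same character turns into different bytes, and therefore different percent-encoded URLs, depending on the encoding used. A Python sketch with the standard urllib.parse functions (the title comes from the example above; whether this is the exact cause of the link bug here is an assumption):

```python
from urllib.parse import quote, unquote

title = "Marie-Thérèse_Charlotte"

# Browsers and sites such as Wikipedia expect UTF-8 percent-encoding.
print(quote(title))                      # Marie-Th%C3%A9r%C3%A8se_Charlotte
# The same title percent-encoded from Latin-1 bytes gives a different URL.
print(quote(title, encoding="latin-1"))  # Marie-Th%E9r%E8se_Charlotte

# Decoding the Latin-1 form as UTF-8 fails, leaving U+FFFD replacement characters.
print(unquote("Marie-Th%E9r%E8se_Charlotte"))
```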

Victor of HGS S320 | "There's rosemary, that's for remembrance. Pray you, love, remember."
ilovewildkratts1 from a peaceful and quiet meadow (Experienced Trainee) Relationship Status: Yes, I'm alone, but I'm alone and free
#22: Dec 1st 2023 at 7:08:05 PM

[up] x12: Ah, good to know. Thank you!

Kaito is an alien and he is kinda spacey, coming from the universe to party and go crazy!
Lymantria Tyrannoraptoran Reptiliomorph from Toronto Since: Apr, 2015 Relationship Status: Historians will say we were good friends.
#23: Dec 1st 2023 at 7:09:26 PM

[up][up] Agreed that that should be fixed. Why should fairly basic diacritics have to break links?

Edited by Lymantria on Dec 1st 2023 at 3:09:38 PM

Join the Five-Man Band cleanup project!
dvorak The World's Least Powerful Man from Hiding in your shadow (Elder Troper) Relationship Status: love is a deadly lazer
#24: Dec 1st 2023 at 7:33:58 PM

"Server updates/tests" really explains what happened to my account over the past few months. For one, I had to change my password because I could've sworn I got hacked back in the summer; for another, my password reset itself halfway through October (which was upsetting, to say the least).

I wish you godspeed and good luck, but I'm still a little worried.

Now everyone pat me on the back and tell me how clever I am!
skewview Since: Jun, 2013
#25: Dec 1st 2023 at 9:14:25 PM

~Cutegirl920fire, that is the beauty and advantage of UTF8: an accented character such as á is treated by a search engine, and even by the browser, as if it were the unaccented a. Unfortunately, this is not the case when the character is written as &aacute; or %E1 or %C3%A1; some browsers just ignore those characters completely. You can test this out here, or go to that page on Wikipedia, open your browser's find dialog, and type the name as if it had no accents.
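Accent-insensitive matching like this is usually implemented with Unicode normalization: the text is decomposed so that á becomes a plus a combining accent mark, and the mark is then dropped. A minimal Python sketch of that approach using the standard unicodedata module (fold_accents is a hypothetical name; real search engines use more elaborate collation rules):

```python
import unicodedata

def fold_accents(text: str) -> str:
    """Strip diacritics: decompose to NFD, then drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(fold_accents("Marie-Thérèse"))  # Marie-Therese
print(fold_accents("á") == "a")       # True
```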

I'm fairly confident that the special HTML URL character encoding will similarly be of no great concern. Please note, however, that it is a different situation: it relies on the browser to encode the characters into ASCII in order to actually request the linked page. Even so, there are few sites left online that still keep that as their primary encoding for content.

edit: some clarification

Edited by skewview on Dec 1st 2023 at 5:22:35 PM

AFK with issues, will return

Total posts: 331