Switch to Unicode?

SaniOKh Since: Jan, 2001
#1: Feb 1st 2011 at 3:09:15 PM

Hi.

I would like to know if it is possible to fix the biggest problem that prevents a Russian version of TV Tropes from being made.

Basically, everything is said in this discussion: https://tvtropes.org/pmwiki/posts.php?discussion=o33artr8mtj9yyizysn8116m&page=0 but I'll bring up the most important point in case nobody wants to read the whole discussion (which I understand): Cyrillic (Russian, Ukrainian, etc.) characters become entity codes when they are used in links.

For instance, here is plain text: Им Было Всё Равно. No anomalies here.

Here is the same text used as a link: Им Было Всё Равно. Ouch.

From what I understand, it seems like a codepage issue. My browser indicates that the encoding TV Tropes uses is ISO-8859-1, whereas all languages could have coexisted peacefully if it were in Unicode.

Or at least, if switching to Unicode would take too much time, a much quicker idea: is it possible to not convert "&" to "& amp;" when rendering a link (which would also fix the issue)?
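The effect described above can be reproduced in a few lines. This is only a sketch of the suspected mechanism, not the site's actual code: it assumes the page text is forced into ISO-8859-1 with numeric-entity fallback and that the link text is then HTML-escaped a second time.

```python
import html

title = "Им Было Всё Равно"

# ISO-8859-1 has no Cyrillic, so each letter falls back to a numeric entity.
as_entities = title.encode("iso-8859-1", errors="xmlcharrefreplace").decode("ascii")

# Escaping the result again (assumed to happen in link rendering) turns
# every "&" into "&amp;", so the browser shows the raw entity codes.
double_escaped = html.escape(as_entities, quote=False)

print(as_entities)     # renders as Cyrillic in a browser
print(double_escaped)  # renders as visible "&#...;" codes
```

A browser decodes the first string back to the original title, which matches the plain-text case; the second string is what the link case looks like.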

edited 1st Feb '11 3:10:25 PM by SaniOKh

Fighteer Lost in Space from The Time Vortex (Time Abyss) Relationship Status: TV Tropes ruined my love life
#2: Feb 1st 2011 at 4:13:25 PM

Can't be done without a base level rewrite of the pmwiki codebase, which predates Unicode. It's on Eddie's way, way, way far to-do list.

"It's Occam's Shuriken! If the answer is elusive, never rule out ninjas!"
SilentReverence adopting kitteh from 3 tiles right 1 tile up Since: Jan, 2010
#3: Feb 1st 2011 at 5:41:44 PM

Just how old is the pmwiki codebase? Wiki Matrix and the official Pmwiki documentation both say Unicode=yes, although not without caveats; in particular, as that link indicates, a tool like iconv may have to be used to convert existing content. In my opinion that shouldn't be an issue, as I recall very vividly that there was a time, not too long ago actually, when internationalized characters did work in links. Maybe not the entire UTF-8 range, but definitely Latin-2 and Coptic did. The conversion process itself can be tool- and editor-as-in-human assisted, flagging first the pages that are problematic because of their content and that have a known attentive editor.
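As a rough illustration of the iconv-style conversion plus flagging mentioned above, here is a minimal sketch. The function names are made up for this example, and it assumes the stored page text really is ISO-8859-1 bytes, which the thread does not confirm.

```python
def latin1_to_utf8(raw: bytes) -> bytes:
    """Re-encode ISO-8859-1 bytes as UTF-8 (roughly `iconv -f ISO-8859-1 -t UTF-8`)."""
    return raw.decode("iso-8859-1").encode("utf-8")


def needs_review(raw: bytes) -> bool:
    """Flag pages that already carry numeric entities, so a human editor can check them."""
    return b"&#" in raw


page = b"caf\xe9 &#1048;&#1084;"
converted = latin1_to_utf8(page)
flagged = needs_review(page)
```

The byte-level step is mechanical; the flagged pages are the ones where an attentive editor would have to decide what the entities were meant to be.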

I don't like to see that this feature is "in the far future". Internationalization of content is one of these things that, because of the nature of the site, will only get exponentially worse as time is left to do its work.

Fanfic Recs orwellianretcon'd: cutlocked for committee or for Google?
Noelemahc Noodle Implements FTW! from Moscow, Russia Since: Nov, 2010 Relationship Status: Gay for Big Boss
#4: Feb 1st 2011 at 10:34:26 PM

Yeah, it may snowball, and that just won't do. The French version of this wiki uses punctuated titles for every single instance of accented vowels and consonants, and that makes potholing things a complicated horror. All the Unicode-less approaches to adding the Russian language will be counter-intuitive to the wiki magic (such as romanization/translit or, again, punctuated titles). How many signatures will we need for this? =)

Videogames do not make you a worse person... Than you already are.
Tzetze DUMB from a converted church in Venice, Italy Since: Jan, 2001
#5: Feb 1st 2011 at 10:42:03 PM

Just how old is the pmwiki codebase?

Quick test shows we don't have features from 2004. I don't think we've ever had (:markup:)...

[1] This facsimile operated in part by synAC.
Stratadrake Dragon Writer Since: Oct, 2009
#6: Feb 1st 2011 at 11:16:14 PM

Just what version of pmwiki is TV Tropes running on, then?

An Ear Worm is like a Rickroll: It is never going to give you up.
Tzetze DUMB from a converted church in Venice, Italy Since: Jan, 2001
#7: Feb 1st 2011 at 11:17:50 PM

It's a fork. Fast Eddie and Janitor have edited the original pmwiki code to the point where it's not really pmwiki any more.

edited 1st Feb '11 11:17:58 PM by Tzetze

[1] This facsimile operated in part by synAC.
Fighteer Lost in Space from The Time Vortex (Time Abyss) Relationship Status: TV Tropes ruined my love life
#8: Feb 2nd 2011 at 6:21:01 AM

Yeah, the original site grabbed a pmwiki build from way before Unicode was mainstream and it's divergent enough as to constitute a unique entity. Rolling in a new version of pmwiki would be impossible; might as well rewrite the whole thing.

"It's Occam's Shuriken! If the answer is elusive, never rule out ninjas!"
CentralAvenue Literally A Princess from The Palace of Serenity Since: Sep, 2014
#9: Feb 2nd 2011 at 7:32:06 AM

Would it be feasible to adopt a newer, less "modified" wiki engine (like the latest pmwiki or MediaWiki) and do a one-time conversion of the existing content? It would be a hassle now, but it might save a lot of work/headaches down the road.

edited 2nd Feb '11 7:32:20 AM by CentralAvenue

Heapers’ Hangout
Fighteer Lost in Space from The Time Vortex (Time Abyss) Relationship Status: TV Tropes ruined my love life
Tzetze DUMB from a converted church in Venice, Italy Since: Jan, 2001
#11: Feb 2nd 2011 at 9:30:01 AM

I've been thinking that might be a good idea, despite the pain, because continuing to find security problems and bugs makes me worry.

[1] This facsimile operated in part by synAC.
Stratadrake Dragon Writer Since: Oct, 2009
#12: Feb 2nd 2011 at 10:10:57 AM

IMHO, I wouldn't mind seeing TV Tropes articles running on a Mediawiki framework (easier templating system, no accursed ptitles) ... but that's another story.

edited 2nd Feb '11 10:12:33 AM by Stratadrake

An Ear Worm is like a Rickroll: It is never going to give you up.
Madrugada Zzzzzzzzzz Since: Jan, 2001 Relationship Status: In season
#13: Feb 2nd 2011 at 10:32:43 AM

How many signatures will we need for this? =)
You'll need more than signatures. As has been stated several times, this would basically mean replacing the wiki software entirely, which would not be a simple or short job that Fast Eddie could whip up in his spare time. Throw a big enough pile of money at him to hire someone to work on it full-time until it's done, and you might get somewhere.

edited 2nd Feb '11 10:32:55 AM by Madrugada

...if you don’t love you’re dead, and if you do, they’ll kill you for it.
Noelemahc Noodle Implements FTW! from Moscow, Russia Since: Nov, 2010 Relationship Status: Gay for Big Boss
#14: Feb 2nd 2011 at 11:30:43 AM

Define "big enough", then. I'm fairly certain that the actual content movement will be supported by how Wiki Magic works, so it's more or less a matter of either upgrading the current software or replicating its features on a newer foundation, right?

Videogames do not make you a worse person... Than you already are.
Madrugada Zzzzzzzzzz Since: Jan, 2001 Relationship Status: In season
#15: Feb 2nd 2011 at 11:35:21 AM

We can't even get more than a few people to make a concerted effort to move the subjective and YMMV entries off of works pages and onto the dedicated subpages. What makes you think that wiki magic is going to migrate the whole wiki?

...if you don’t love you’re dead, and if you do, they’ll kill you for it.
SaniOKh Since: Jan, 2001
#16: Feb 2nd 2011 at 11:51:44 AM

I think finding a way to update this engine is preferable to moving to Media Wiki. I fear changing the site completely could cause a severe case of They Changed It, Now It Sucks!.

I don't know how this version differs from the current version of PmWiki (BTW, what version is this?) nor how much of it was changed, but I suppose creating diff patches against the original version and then applying them to the current version of PmWiki could considerably shorten the time needed for updating.

edited 2nd Feb '11 11:52:27 AM by SaniOKh

Tzetze DUMB from a converted church in Venice, Italy Since: Jan, 2001
#17: Feb 2nd 2011 at 11:53:36 AM

We're really far away from PmWiki now. Seriously, just look at the pmwiki website, it's completely different now.

edited 2nd Feb '11 11:53:56 AM by Tzetze

[1] This facsimile operated in part by synAC.
shimaspawn from Here and Now Since: May, 2010 Relationship Status: In your bunk
#18: Feb 2nd 2011 at 12:04:53 PM

[up][up] The only bit of PmWiki left is in the URL.

edited 2nd Feb '11 12:05:08 PM by shimaspawn

Reality is that, which when you stop believing in it, doesn't go away. -Philip K. Dick
CentralAvenue Literally A Princess from The Palace of Serenity Since: Sep, 2014
#19: Feb 2nd 2011 at 9:39:59 PM

Well, I was basically thinking that if the problems with the current parser are so deep that the only real fix is a total re-write, the time and effort might be better spent installing one of the more modern—and, honestly, better—wiki parsers available out there.

Heapers’ Hangout
shimaspawn from Here and Now Since: May, 2010 Relationship Status: In your bunk
#20: Feb 2nd 2011 at 10:00:16 PM

[up] Are you volunteering?

Reality is that, which when you stop believing in it, doesn't go away. -Philip K. Dick
Tzetze DUMB from a converted church in Venice, Italy Since: Jan, 2001
#21: Feb 2nd 2011 at 10:04:43 PM

I'm not sure what there is to volunteer, as it's all backend work.

[1] This facsimile operated in part by synAC.
Shinr Since: Jun, 2009
#22: Feb 3rd 2011 at 12:23:48 AM

Unlike the situation with the YMMV, which unfortunately has a lot of negative stigma that prevents Wiki Magic, a more agreeable "Transfer all content from the obsolete engine to the new shiny one" will probably generate a lot more Wiki Magic.

SilentReverence adopting kitteh from 3 tiles right 1 tile up Since: Jan, 2010
#23: Feb 3rd 2011 at 7:54:48 AM

I concur with Shinr. I haven't done collaboration on the YMMV because, among other things, I don't agree with the move, and even if I did I don't agree on some specific tropes having been moved to YMMV, and then if I did, I tend to get lost mixing it with Subjective / Audience Reaction, and then if I didn't, etcetera. I haven't even been able to follow up one quarter of the discussions on the why. I'm sure I'm not the only one. On the other hand, working on a feature that is objectively necessary, and that everyone can agree should be implemented, should make it easier both to call on the right people to contribute and to have random (good) tropers available for live testing.

For the record, I don't think Media Wiki is a good choice for a new engine. Although the substitution system works great and it is already visually nice, it depends too heavily on a good DBMS setup for everything, not just the content of the pages, and I fear that with the number of articles and possible concurrent edits, we'd have to somehow be able to continually support specs as good as or better than the most well-known Wikias. I'd say the best choices are either a) (best) migrating and extending (not forking!!!!!!) the newest pmwiki or b) (acceptable) migrating to a text-backend wiki that supports Creole and/or extendable parsing stages.

edited 3rd Feb '11 7:56:08 AM by SilentReverence

Fanfic Recs orwellianretcon'd: cutlocked for committee or for Google?
SaniOKh Since: Jan, 2001
#24: Feb 4th 2011 at 8:54:07 AM

If there's no possibility for now to switch the engine, there's always a quick fix that would at least take care of the entities-in-links issue: find the code in the parser that converts "&" to "& amp;" and put a regex or something that would prevent it where it's not necessary (however, if it uses the Html Entities function, it will be impossible). Or add a line that would convert the generated link from

<a href="...">& amp;#[digits];& amp;#[digits];& amp;#[digits];</a>

to

<a href="...">& #[digits];& #[digits];& #[digits];</a>

Just a suggestion.
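A minimal sketch of the second idea, written in Python only for illustration (the real parser is not shown anywhere in this thread, and the function name is made up): it restores "&" only where it is followed by a numeric entity body, so an ordinary escaped ampersand is left alone.

```python
import re

# Matches an escaped ampersand that is followed by "#digits;".
DOUBLE_ESCAPED = re.compile(r"&amp;(#\d+;)")


def unescape_numeric_entities(link_html: str) -> str:
    """Turn '&amp;#1048;' back into '&#1048;' without touching other '&amp;'."""
    return DOUBLE_ESCAPED.sub(r"&\1", link_html)


fixed = unescape_numeric_entities('<a href="...">&amp;#1048;&amp;#1084;</a>')
```

Because it only matches the numeric form, text such as "Tom &amp; Jerry" passes through unchanged.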

edited 4th Feb '11 8:55:48 AM by SaniOKh

FastEddie Since: Apr, 2004
#25: Feb 4th 2011 at 9:22:25 AM

We're branched off of 1.2.something. The 2003 code. At this point the only pmwiki code remaining is that god-awful camelcase wiki word parser which is the bane of my life. Link production by delimitation (the bracketed mediawiki style) is so much better.

A number of attempts have been made by Janitor and me to eliminate the ptitle system that attempts to fumble its way past non-latin-8/url-encoded character issues. The attempts have been defeated by processing issues. We have to be able to produce the 5-6 million pages a day that our traffic requires (on our slender budget), so each little increment of processing time per page view really adds up fast.

The current thinking is to require all punctuated/mbyte-char links to be delimited by some kind of brackets to trigger special processing on save, so we don't have to do it on read. The camelcase parser would still be supported, for latin8-ble stuff.

A learning curve for the contributors, but nowhere as hairy/ugly as the ptitle thing has been. Once we have that, we do some data conversion, then flip the database into unicode mode and live like Rajahs.
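To make the save-time idea concrete, here is a hedged sketch. Everything specific in it is an assumption for illustration: the "[[...]]" delimiters, the "[[title|target]]" stored form, and the function name are not the site's actual syntax. The point is only that the expensive step runs once per edit, so a page view can use the precomputed target as-is.

```python
import re
from urllib.parse import quote

# Hypothetical delimiter: double square brackets with no "|" inside yet.
BRACKETED = re.compile(r"\[\[([^\[\]|]+)\]\]")


def process_on_save(source: str) -> str:
    """Precompute a URL-safe target for each bracketed link when a page is saved."""
    def resolve(match: re.Match) -> str:
        title = match.group(1)
        # Drop spaces (wiki-word style) and percent-encode the UTF-8 bytes.
        target = quote(title.replace(" ", ""), safe="")
        return f"[[{title}|{target}]]"

    return BRACKETED.sub(resolve, source)


stored = process_on_save("See [[Им]] and CamelCase")
```

Plain camelcase words are not matched by the pattern, so they would still go through the existing parser, and running the function again on already-processed text changes nothing.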

Postscript: We have a gazillion bells and whistles that would have to be utterly re-done before a migration to Media Wiki, too many to even consider. Also — minor point, but a point — anything that differentiates us from wikipedia is good for us. There is something about using the same tools that reinforces the urge to adopt the same attitudes.

Goal: Clear, Concise and Witty

Total posts: 34