???
I'm going to have to check that when/if I get to that point, but everything I've seen thus far says you can fix that with proper indexing.
I remember I had a non-forum database not too long ago that had several million records in it, and was able to navigate its browser with relative ease once it was indexed properly. Before that, hoo boy. Let's just say I'm not too happy with the documentation ethic of a lot of open source projects.
Quick-edit-to-add: I know that a table per thread gives you a separate file for each. It just seems like a pain, and I'd think the code for handling threads in the system is a bit dizzying as a result. But if it works, hey.
edited 3rd Dec '10 7:24:49 AM by GoggleFox
Sakamoto demands an explanation for this shit.The fact is, it works great. MySql doesn't work well at all when you get above a million rows (and degrades well before that number), no matter how much or how smart your indexing is. It's the indexing btree that's dumb, in fact.
Using a more robust engine might lead to a different design. Oracle-for-real is pretty good at scaling up.
Goal: Clear, Concise and WittyHeh, my Access tables get wonky running queries at a few hundred thousand rows. Of course, Access isn't exactly a "robust" database engine but I keenly feel that particular pain.
"It's Occam's Shuriken! If the answer is elusive, never rule out ninjas!"Yeah Access isn't known for being a high performance database. But My SQL seems to run fine for me, and passes benchmarks on billions of rows... in InnoDB engine mode, anyhow. MyISAM is notoriously slow, but InnoDB laps up the searches if you set the indices correctly (it works best if you have indexes suited to the queries you're doing, i.e. multiple column indices, and it can take some experimentation — which we don't really have the option for on a database this heavily used.) I haven't really had much of a reason to use MyISAM lately, as its only strength is FullText searches.
Sakamoto demands an explanation for this shit.We use InnoDB. And multicolumn indices. Your billions benchmarks ... rows containing no text or varchar cols, I suppose. Maybe.
It might be interesting to see those results. Out benchmarking here, and that we've seen elsewhere, don't support your findings.
Anyway. The problem is not the data model. The problem is proliferation of BS threads. With this design we can drop the BS without locking everything for a full table lock delete. You'll note that we bomb useless/troll threads all the time, without interuppting service.
Also, you do occasionally get a table crash in InnoDB, much more rarely than with myisam, but still. Doing a rebuild on one table while everything else goes about its business is a lot more pleasant than going out of business altogether for the rebuild.
Also, the mysql replication engine just plain pukes blood when shadowing mondo-sized tables. We have learned the hard way that having only one copy of the 'hot' db is a bad, bad idea. We needs us our replication engine.
Ish. You could cut the geek in air here with a knife.
Goal: Clear, Concise and WittyI am reminded of a time when I was still a professional, and someone on the UO beta dev team (gives you a timeframe, huh?) called our company asking if anyone had at least two years experience with the latest version of MS SQL Server. Some facts:
- MS SQL Server runs on the Jet database engine: The exact same engine as Access, but without the front end.
- Jet had always had a bizarre problem with queries on relational databases: with some databases, the relationships worked just fine. With others, reversing the order of the JOIN statement is required for no adequately explainable reason, or the query grinds down to a slow crawl.
- The relationship between users and characters in their beta server was one of those cases.
- The latest version of SQL Server, 4.0 at the time, was about six months old.
I suppose it does depend on that, yeah... large text fields might have a nasty effect. I wasn't sure how large of one. (And yes, the large tables I was using were devoid of large text fields — they had varchars, and the compression for InnoDB makes them very small, but all the large text fields were in their own table under that app.) Replication is indeed a necessity. No backup, no brains.
I'm mostly just trying to work out how to handle this, should I ever have to deal with this much text data and run into the walls you seem to have before. I'd rather not split things off into their own tables, since that's just hell to code. I'd planned on separating the text fields into their own table, and linking by ID to the main data tables. Maybe that'll be enough for me, maybe not. Either way, I was pretty sure that any regular backups would require shutting down the service for a little while and copying either the entire table or the day's transaction history to the backup. I expect this to make enormous backup files, but sometimes we need that.
But yeah, getting back on topic a bit, shit threads don't really need to stay put. Archival should be for things people want archived, not for every little thing. That helps with memory space, search time, keeping the page list clean... the works.
Now where do we put the archives, and in what format? I'm assuming we'd want to make them read-only, which probably makes it a lot easier to save the files.
Sakamoto demands an explanation for this shit.One last whack at the derail. It is not hell to code. It simplifies the coding greatly. A thread is an object that is instantiated. You have to do things with it like you would any other object. Delete, rename, move, lots of stuff.
If your thread is just a logical view of a collection of posts, you have to code up the view every time you want to do something for the thread object. Put it this way: making a thread an instantiation has utility and avoids certain performance woes. There is no advantage to the logical-view approach, and performance problems can be expected.
edited 3rd Dec '10 10:40:42 AM by FastEddie
Goal: Clear, Concise and WittyI'd imagine that that particular problem could be avoided by having an included library that does all of it for you. Wouldn't solve all the other problems, of course. 8-)
I guess semantically I always thought of a database as a set of tables where each table was kind of an OO "class" or well-defined relationship. A database which has thousands of tables is hard for me to grasp in terms of design. But I guess if it's the best technical way of doing things then there's not much choice in the matter. 8-)
Planescape Hijack
To rerail: is restoring the lost threads and flagging them for archiving still going to happen? I can understand that it's not a priority, but though I know not many people may have read it, I really did like being able to point to my liveblog and say "See, the person who made this thought my playthrough was so noteworthy they had to come comment on it".
edited 6th Dec '10 4:27:31 AM by Haven
Productivity is for people without internet connections. -Count DorkuI don't understand the techbabble above, and sorry that I'm just bitching again, but as a bit more time passed, this "cleanup" seems to be more and more inconvinient.
In the TRS, we can only guess why we did or didn't go through with some page action a few months ago. This problm came up in multiple threads, where people either tried to base their argument on a precedent but couldn't find it, or someone wanted to change a trope that was reportedly already changed once, but now we can't find the details.
In the Media forums, many works have single "official" threads, with only three or four fans having a long discussion about the work, until they run out of stuff to say, so they hibernate. This miht take months. When a newcomer sees the thread, he bumps it up, and discussion restarts with a new perspective. This is the normal cycle of these threads.
Actually, this supposedly standard behavior of looking for "active" threads on the first page, and considering everything else "dead", only works for a minority of our forums. On-Topic Conversations, Yack Fest, IJBM, IJAM, etc.
So, yeah, Fast Eddie, if there is any way to do avoid more of these cleanups in the future, please consider these points in its favor.
Will we be able to select "Edit Post" on the LB forum after it's museum'd? It would save a lot of hassle in migrating my Willy Beamish LP.
But soft! What rock through yonder window breaks? It is a brick! And Juliet is out cold.FYI, I'm loving the new system. It's actually making one of my LP's read better.
But soft! What rock through yonder window breaks? It is a brick! And Juliet is out cold.Yup. We'll likely get into that this weekend. And, yup:right here
.
edited 10th Dec '10 6:49:37 PM by FastEddie
Goal: Clear, Concise and Witty

Thanks for the design help, but the reason for a table per thread is performance. If we had all the posts in one thread, the table would be around 10M rows and pulling up a thread would be painfully slow. In fact, if you ever visit a forum that is painfully slow, you now know the reason why.
Goal: Clear, Concise and Witty