NineChime forum

Furry stuff, oekaki stuff, and other stuff.

You are not logged in.

#1 02-21-2011 13:55:29


Tech discussion: database upgrade suggestions

Edit: I think I found an answer to my problem.  This post has been fully rewritten.

Hey, everyone.  It's time to let you all know what's going on with the next version of the oekaki.

Thanks to the tireless efforts of Fibonacci and Trunski, Wacintaki 1.5.6 will have Spanish and German language packs.  With the new language packs, I came the realization that I would have to bite the bullet and make Wacintaki fully UTF-8 compliant.  So far, this is going well.

The database is probably the oldest part of the oekaki, and remains fundamentally unchanged from the original OekakiPoteto database structure from 8 years ago.  Yes, it's really been that long.  One major issue is that OP created its database using the Latin1 (ISO-8859-1) character set.  I'm trying to convert everything to UTF-8, and this has been a bit frustrating.  First, PHP does not support UTF-8 natively, second, the MySQL client defaults to Latin1 regardless of what the database wants, and third, the only way to "properly" port a Latin1 database to UTF-8 means to back it up, wipe it out, and restore, which is a complicated process and beyond the technical ability of many Wacintaki owners.

I've worked out a compromise.  If you have an oekaki board already installed, the updater will use the MySQL REGEXP command to search for any columns in the wrong format.  This works reliably so long as the database is using the Latin1 character set.  The columns are then recoded from Latin1 or Big5 into UTF-8 as they are found.  No dirty log or temp columns needed.  The only thing I'm not sure about is how the collation affects how data is pulled from the database.

The real plus is that this is a very reliable and safe way to update the database.  Nothing gets wiped out, and only non-ASCII characters get updated.

After the recode, it's possible to do a database re-import to fix the storage limits.  These issues are annoying, but characters will still display correctly, so fixing this is optional.  Most queries are sorted by date, so collation won't be that big of a problem with, for example, Chinese characters.

The only thing I'm not sure about is how the collation affects inserts and updates.  It appears that utf8_general_ci does not enforce character set correctness, while uft8_unicode_ci does, so trying to set a Latin1 database to Unicode raises an error whenever you send information to the database for storage.  I'm testing different collations now.

New installs of the oekaki will be fully UTF-8 compliant, and use Unicode collation, so they should always work correctly with new versions of MySQL and PHP.

I have no ETA as to when the next version will be available, but I should have a public beta available within a couple weeks.  There's still a lot of work to do.

Last edited by Waccoon (02-21-2011 21:20:23)


Board footer

Yep, still running PunBB
© Copyright 2002–2008 PunBB