Breaking Through Language Barriers

According to a study at the University of Guelph, 70% of the internet is in English.  Considering that only about one in five humans can speak English to some level of competence, I’d say the internet is just too darned monolingual.

Thanks to translation services at AltaVista and Google, people can offer their pages in several other languages (WordPress bloggers can use GlobalTranslator, which simplifies the process), but machine translation often leaves much to be desired.  Slang comes out wrong.  Misspelled or misused words can make sentences unreadable in other languages (a prime example is the misuse of “their”, “there” and “they’re”).  On top of that, most of us who can speak several languages tend to use only one when we write something online (who wants to write a blog post or static web page two or three times in different languages, after all?).

But despite the current limitations of machine-translated pages, they’re still better than nothing.

Of course, there are some benefits to offering your site in multiple languages.  I’ve had GlobalTranslator on this site since January and have found that a full 38% of my readers are reading content in a language other than English.  Spanish seems to be the most popular, with a full 22% of the readership, followed closely by Traditional Chinese and German.  If the study’s 70% English figure is accurate, then all those “make money online” blogs are potentially missing out on a massive market.

And this language issue is the crux of my problem.

I read hundreds of blogs every week from every continent save Africa (I don’t think they’ve discovered blogging, yet), and 80% of these blogs have a fatal flaw in their website:  English-Only Commenting.  Perhaps I should rephrase that … because it’s not the comments themselves that I want to see in other languages … it’s the name fields.

Since I read so many blogs, there are times when I want to leave a comment or two.  Because there are so many other people with the name Jason, I try to mix it up a bit by entering “ジェイソン (Jason)”.  This does a few things for me.  By writing my name in Japanese Katakana, I make people a bit more likely to click the link back to my site (I’ve noticed a significant jump in visits when using non-English characters in my name when commenting), and I set myself apart from all the other Jasons on the internet.  I’ll be the first to admit that there are several people with this name here in Japan (I’ll be working with one shortly, too), but the odds of others using this visual link-clicking tactic have been slim thus far.  Unfortunately, when I start using Unicode characters, websites start to complain.

What I’m usually presented with is a lovely MySQL error that spits out the raw query and tells me which tables or columns are rejecting the data.  This is most often seen with privately hosted WordPress blogs, as the larger blogging sites (Blogger, WordPress.com, etc.) tend to be happy with Unicode characters.  I can understand that most people don’t really give much thought to the table types and collations when setting up a database for their blog (I also understand that most people don’t even know about these things to begin with), but I’m curious to know why the default tends to be less friendly towards the East Asian and Middle Eastern character systems.  There are far more people speaking Mandarin and Arabic than English.
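I haven’t seen the schemas behind the blogs that reject my name, so what follows is only a guess at the mechanism.  It assumes the name column uses a single-byte Western European character set, with Python’s “latin-1” codec standing in for whatever the database was really configured with.  The function name fits_charset is my own invention for the illustration.

```python
def fits_charset(text: str, charset: str) -> bool:
    """Return True if every character in text can be encoded in charset."""
    try:
        text.encode(charset)
    except UnicodeEncodeError:
        # At least one character has no representation in this charset.
        return False
    return True


name = "ジェイソン (Jason)"

# Assumption: "latin-1" approximates a single-byte column charset.
latin_ok = fits_charset(name, "latin-1")

# UTF-8 can represent every Unicode character, Katakana included.
utf8_ok = fits_charset(name, "utf-8")

print(f"latin-1 accepts the name: {latin_ok}")
print(f"utf-8 accepts the name:   {utf8_ok}")
```

If this guess is right, the fix sits on the database side: a column declared with a Unicode character set has room for the Katakana, while a single-byte one has nowhere to put it.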

According to the Internet World Stats page, approximately 17.8% of the world’s population has access to the internet.  This is roughly the same percentage of people that also speak English on some level.  As more of us connect to the global community, I’d love to see webpages become more language-friendly.  Whether it’s in the form of a machine-translation service, or a nice UTF-8 Unicode-capable site that can handle people’s names and comments in their native language, any progress towards greater communication would be beneficial.  Of course, if people are posting comments in languages other than the site’s primary target, these comments would need to be translated before being entered into the database (perhaps making the original comment available in the foreign language on request?).
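To make that last idea a little more concrete, here is a rough sketch of how a comment might carry both its original text and a translation.  Everything in it is hypothetical: the Comment class, the store_comment function, and the translate callable are placeholders of my own, not part of WordPress or of any real translation service.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Comment:
    author: str
    original_text: str
    original_lang: str
    translated_text: Optional[str] = None  # None when no translation was needed

    def display(self, show_original: bool = False) -> str:
        """Show the translation by default, or the original on request."""
        if show_original or self.translated_text is None:
            return self.original_text
        return self.translated_text


def store_comment(
    author: str,
    text: str,
    lang: str,
    site_lang: str,
    translate: Callable[[str, str, str], str],
) -> Comment:
    """Translate only when the comment's language differs from the site's.

    translate(text, source_lang, target_lang) is a hypothetical hook for
    whatever machine-translation service the site owner chooses.
    """
    translated = None if lang == site_lang else translate(text, lang, site_lang)
    return Comment(author, text, lang, translated)


def fake_translate(text: str, source: str, target: str) -> str:
    # Stand-in translator so the sketch runs without any external service.
    return f"[{source}->{target}] {text}"


comment = store_comment("ジェイソン (Jason)", "こんにちは", "ja", "en", fake_translate)
print(comment.display())
print(comment.display(show_original=True))
```

Keeping the original alongside the translation means nothing is lost when the machine translation goes wrong, which, as noted earlier, it often does.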

The Internet has been called a Global Village.  This might have been an apt analogy at first, but the tiny village quickly grew into a big city with its own Chinatown, Little Italy, and every other pocket of culture that we see in the physical world today.

Should we try to reduce the language barriers that exist on the Internet today?  Or is it better to leave things as they are?