Global language stats for the web

There was a time when I regularly visited Bill Dunlap’s pioneering Global Reach site to check the fast-changing figures on language presence on the Internet. It no longer seems to feature in the data sources I come across, but someone’s blog prompted me to check out the latest (September) figures. Global Reach provides two essentials: year-on-year historical data showing how things have evolved over time, and the sources for its figures.

English-language users now account for 35% of the 801 million Web users, with Chinese users next at 13.7%, trailed by Spanish (9%) and Japanese (8.4%). The standout figure, of course, is the forecast 100% leap in the online Chinese population between 2004 and 2005, after it had already grown faster in recent years than any other language constituency. See here for more details on the sources of these figures (in French).
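To put those percentages into head counts, here is a quick back-of-the-envelope sketch. It uses only the total and the shares quoted above; the variable and function names are my own, and the results are rough estimates rather than anything published by Global Reach.

```python
# Figures quoted in the post: 801 million Web users in total,
# with each language's share given as a percentage of that total.
TOTAL_USERS_MILLIONS = 801
SHARES_PERCENT = {
    "English": 35.0,
    "Chinese": 13.7,
    "Spanish": 9.0,
    "Japanese": 8.4,
}


def users_in_millions(share_percent, total_millions=TOTAL_USERS_MILLIONS):
    """Convert a percentage share into an approximate head count in millions."""
    return total_millions * share_percent / 100


# Round to whole millions; the source percentages are not precise enough for more.
estimates = {
    lang: round(users_in_millions(pct)) for lang, pct in SHARES_PERCENT.items()
}

for lang, millions in estimates.items():
    print(f"{lang}: roughly {millions} million users")
```

On these numbers, a doubling of the online Chinese population would add on the order of another hundred million users.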

People once used Global Reach online language data to demonstrate the need for localized e-commerce websites in a global marketplace. Now that this message has been widely taken on board, there is a need for more detailed stats that drill down inside these broad brush-stroke figures. I see a need for at least two critical sets of global stats, and I imagine there are technologies out there that can track this stuff automatically:

· the language spread found in globalized websites

· online language populations vs. country populations, to get a snapshot of movements in minority or immigrant language groups (see here, for example, for a report on the relevance of Latino language marketing in the U.S.)

Both these data sets should be visualizable synchronically and diachronically, so we can find out how fast web localization is proceeding, where it’s happening, and which languages are joining the localization pack. And governments as well as e-marketers will need national stats to inform educational, e-government and other policies.

If this is already being done, let’s hear more about it.

Andrew Joscelyne
European, a language technology industry watcher since Electric Word was first published, sometime journalist, consultant, market analyst and animateur of projects. Interested in technologies for augmenting human intellectual endeavour, multilingual méssage, the history of language machines, the future of translation, and the life of the digital mindset.


