Beyond tomAYto and toMARto

There’s an intriguing grassroots website that lets non-native English speakers record their pronunciation of a test paragraph, which is then digitized. The recordings feed a growing database of English speech accents that can be used for teaching, testing and other projects. Steven Weinberger (SW) of George Mason University, near Washington, D.C., has been masterminding it. Sound useful? Read on.

Why do dozens (hundreds?) of speakers with non-native English linguistic profiles read this test phrase?

"Please call Stella. Ask her to bring these things with her from the store: Six spoons of fresh snow peas, five thick slabs of blue cheese, and maybe a snack for her brother Bob. We also need a small plastic snake and a big toy frog for the kids. She can scoop these things into three red bags, and we will go meet her Wednesday at the train station."

SW: We constructed the paragraph so that it was short, used words familiar to non-native speakers, and contained the sounds we wanted to test. The elicitation paragraph contains most of the consonants, vowels, and clusters of standard American English. The distribution of sounds can be viewed on the archive's website. We digitize the speech samples at 44.1 kHz, 16-bit mono.
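For readers who work with audio files, here is a rough sketch of what "44.1 kHz, 16-bit mono" means in practice. It assumes uncompressed PCM stored as WAV, which the interview does not specify, and the function names are hypothetical illustrations, not part of the archive's own tooling.

```python
import io
import wave

SAMPLE_RATE_HZ = 44_100   # 44.1 kHz, as quoted in the interview
SAMPLE_WIDTH_BYTES = 2    # 16-bit samples
CHANNELS = 1              # mono


def bytes_per_second(rate=SAMPLE_RATE_HZ, width=SAMPLE_WIDTH_BYTES,
                     channels=CHANNELS):
    """Uncompressed PCM data rate for the given format."""
    return rate * width * channels


def matches_archive_format(wav_file):
    """Return True if a WAV file (path or file object) is 44.1 kHz, 16-bit, mono."""
    with wave.open(wav_file, "rb") as wav:
        return (wav.getframerate() == SAMPLE_RATE_HZ
                and wav.getsampwidth() == SAMPLE_WIDTH_BYTES
                and wav.getnchannels() == CHANNELS)


def make_silent_wav(seconds=1):
    """Build an in-memory WAV of silence in the assumed format (demo data only)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav.setframerate(SAMPLE_RATE_HZ)
        wav.writeframes(b"\x00" * bytes_per_second() * seconds)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(bytes_per_second())                         # 88200 bytes per second
    print(matches_archive_format(make_silent_wav()))  # True
```

At this rate, each second of speech takes 88,200 bytes before any compression, so even a short reading of the paragraph yields a file of a megabyte or more.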

Do you keep statistics on how the archive is used and by whom?

SW: We don’t keep statistics, but we do know that more than 1 million visitors have seen the site. This is fabulous, given the arcane nature of the archive. This summer, we will be rolling out a completely new version of the archive, database-driven and much more searchable. Then we will keep statistics on visitors. We get lots of mail from our visitors, who range from academics and engineers to stay-at-home moms. Everyone seems to like to listen to accents.

How about plans to use this database in new ways?

SW: We are letting the users of the archive exploit it as they wish, as long as they cite us as the source. We have a Creative Commons license, so people can use our stuff free of charge (so long as they do not sell it elsewhere).

We get lots of mail from speech engineers who use our recordings for speech recognition research, and from English as a Second Language (ESL) teachers who are designing lesson plans from the samples. We even heard from a composer who was writing saxophone music to accompany the archive's speech samples! We simply want to be kept informed of how people are using our stuff.

Andrew Joscelyne
European, a language technology industry watcher since Electric Word was first published, sometime journalist, consultant, market analyst and animateur of projects. Interested in technologies for augmenting human intellectual endeavour, multilingual méssage, the history of language machines, the future of translation, and the life of the digital mindset.


MultiLingual Media LLC